robot-id: abcdatos
robot-name: ABCdatos BotLink
robot-cover-url: http://www.abcdatos.com/
robot-details-url: http://www.abcdatos.com/botlink/
robot-owner-name: ABCdatos
robot-owner-url: http://www.abcdatos.com/
robot-owner-email: botlink@abcdatos.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windows
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent: BotLink
robot-noindex: no
robot-host: 217.126.39.167
robot-from: no
robot-useragent: ABCdatos BotLink/1.0.2 (test links)
robot-language: basic
robot-description: This robot is used to verify the availability of the ABCdatos directory entries (http://www.abcdatos.com) by sending HTTP HEAD requests. The robot runs twice a week. On HTTP 5xx error responses, or when it is unable to connect, it repeats the verification some hours later to check whether the failure was temporary.
robot-history: This robot was developed by the ABCdatos team to help with directory maintenance.
robot-environment: commercial
modified-date: Thu, 29 May 2003 01:00:00 GMT
modified-by: ABCdatos

robot-id: Acme.Spider
robot-name: Acme.Spider
robot-cover-url: http://www.acme.com/java/software/Acme.Spider.html
robot-details-url: http://www.acme.com/java/software/Acme.Spider.html
robot-owner-name: Jef Poskanzer - ACME Laboratories
robot-owner-url: http://www.acme.com/
robot-owner-email: jef@acme.com
robot-status: active
robot-purpose: indexing maintenance statistics
robot-type: standalone
robot-platform: java
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: A Java utility class for writing your own robots.
robot-history:
robot-environment:
modified-date: Wed, 04 Dec 1996 21:30:11 GMT
modified-by: Jef Poskanzer

robot-id: ahoythehomepagefinder
robot-name: Ahoy! The Homepage Finder
robot-cover-url: http://www.cs.washington.edu/research/ahoy/
robot-details-url: http://www.cs.washington.edu/research/ahoy/doc/home.html
robot-owner-name: Marc Langheinrich
robot-owner-url: http://www.cs.washington.edu/homes/marclang
robot-owner-email: marclang@cs.washington.edu
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: UNIX
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ahoy
robot-noindex: no
robot-host: cs.washington.edu
robot-from: no
robot-useragent: 'Ahoy! The Homepage Finder'
robot-language: Perl 5
robot-description: Ahoy! is an ongoing research project at the University of Washington for finding personal homepages.
robot-history: Research project at the University of Washington in 1995/1996
robot-environment: research
modified-date: Fri June 28 14:00:00 1996
modified-by: Marc Langheinrich

robot-id: Alkaline
robot-name: Alkaline
robot-cover-url: http://www.vestris.com/alkaline
robot-details-url: http://www.vestris.com/alkaline
robot-owner-name: Daniel Doubrovkine
robot-owner-url: http://cuiwww.unige.ch/~doubrov5
robot-owner-email: dblock@vestris.com
robot-status: development active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix windows95 windowsNT
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: AlkalineBOT
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: AlkalineBOT
robot-language: c++
robot-description: Unix/NT internet/intranet search engine
robot-history: Vestris Inc. search engine designed at the University of Geneva
robot-environment: commercial research
modified-date: Thu Dec 10 14:01:13 MET 1998
modified-by: Daniel Doubrovkine

robot-id:anthill
robot-name:Anthill
robot-cover-url:http://www.anthill.org/index.html
robot-details-url:http://www.anthill.org/index.html
robot-owner-name:Torsten Kaubisch
robot-owner-url:http://www.anthill.org/index.html
robot-owner-email:info@anthill.org
robot-status:development
robot-purpose:indexing
robot-type:standalone
robot-platform:independent
robot-availability:not yet
robot-exclusion:no (soon in V1.2)
robot-exclusion-useragent:anthill
robot-noindex:no
robot-host:anywhere
robot-from:no
robot-useragent:AnthillV1.1
robot-language:java
robot-description:Anthill is used to gather price information automatically from online stores, with support for international versions.
robot-history:This is a research project at the University of Mannheim in Germany, professorship Prof. Martin Schader, assistant Dr. Stefan Kuhlins
robot-environment:research
modified-date:Thu, 6 Dec 2001 01:55:00 GMT
modified-by:Torsten Kaubisch

robot-id: appie
robot-name: Walhello appie
robot-cover-url: www.walhello.com
robot-details-url: www.walhello.com/aboutgl.html
robot-owner-name: Aimo Pieterse
robot-owner-url: www.walhello.com
robot-owner-email: aimo@walhello.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windows98
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: appie
robot-noindex: yes
robot-host: 213.10.10.116, 213.10.10.117, 213.10.10.118
robot-from: yes
robot-useragent: appie/1.1
robot-language: Visual C++
robot-description: The appie-spider is used to collect and index web pages for the Walhello search engine.
robot-history: The spider was built in March/April 2000.
robot-environment: commercial
modified-date: Thu, 20 Jul 2000 22:38:00 GMT
modified-by: Aimo Pieterse

robot-id: arachnophilia
robot-name: Arachnophilia
robot-cover-url:
robot-details-url:
robot-owner-name: Vince Taluskie
robot-owner-url: http://www.ph.utexas.edu/people/vince.html
robot-owner-email: taluskie@utpapa.ph.utexas.edu
robot-status:
robot-purpose:
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: halsoft.com
robot-from:
robot-useragent: Arachnophilia
robot-language:
robot-description: The purpose of this run (undertaken by HaL Software) was to collect approximately 10k HTML documents for testing automatic abstract generation.
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: arale
robot-name: Arale
robot-cover-url: http://web.tiscali.it/_flat
robot-details-url: http://web.tiscali.it/_flat
robot-owner-name: Flavio Tordini
robot-owner-url: http://web.tiscali.it/_flat
robot-owner-email: flaviotordini@tiscali.it
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix, windows, windows95, windowsNT, os2, mac, linux
robot-availability: source, binary
robot-exclusion: no
robot-exclusion-useragent: arale
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: no
robot-language: java
robot-description: A Java multithreaded web spider. It downloads entire web sites or specific resources from the web, and renders dynamic sites to static pages.
robot-history: This is brand new.
robot-environment: hobby
modified-date: Thu, 09 Jan 2001 17:28:52 GMT
modified-by: Flavio Tordini

robot-id: araneo
robot-name: Araneo
robot-cover-url: http://esperantisto.net
robot-details-url: http://esperantisto.net/araneo/
robot-owner-name: Arto Sarle
robot-owner-url: http://esperantisto.net
robot-owner-email: araneo@esperantisto.net
robot-status: development
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: araneo
robot-noindex: yes
robot-nofollow: yes
robot-host: *.esperantisto.net
robot-from: yes
robot-useragent: Araneo/0.7 (araneo@esperantisto.net; http://esperantisto.net)
robot-language: Python, Java
robot-description: Araneo is a web robot developed for crawling and indexing web pages written in the international language Esperanto. The database will be used to build a web search engine and auxiliary services to be published at esperantisto.net.
robot-history: (The name Araneo means "spider" in Esperanto.)
robot-environment: hobby, research
modified-date: Fri, 16 Nov 2001 08:30:00 GMT
modified-by: Arto Sarle

robot-id: araybot
robot-name: AraybOt
robot-cover-url: http://www.araykoo.com/
robot-details-url: http://www.araykoo.com/araybot.html
robot-owner-name: Guti
robot-owner-url: http://www.araykoo.com/
robot-owner-email: robot@araykoo.com
robot-status: active
robot-purpose: indexing maintenance
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: AraybOt
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: AraybOt/1.0 (+http://www.araykoo.com/araybot.html)
robot-language: perl5
robot-description: AraybOt is the agent software of AraykOO!, which crawls web sites listed in http://dmoz.org/Adult/ in order to build an adult search engine.
robot-history:
robot-environment: service
modified-date: Sat, 19 Jun 2004 20:25:00 GMT+1
modified-by: Guti

robot-id: architext
robot-name: ArchitextSpider
robot-cover-url: http://www.excite.com/
robot-details-url:
robot-owner-name: Architext Software
robot-owner-url: http://www.atext.com/spider.html
robot-owner-email: spider@atext.com
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: *.atext.com
robot-from: yes
robot-useragent: ArchitextSpider
robot-language: perl 5 and c
robot-description: Its purpose is to generate a Resource Discovery database, and to generate statistics. The ArchitextSpider collects information for the Excite and WebCrawler search engines.
robot-history:
robot-environment:
modified-date: Tue Oct 3 01:10:26 1995
modified-by:

robot-id: aretha
robot-name: Aretha
robot-cover-url:
robot-details-url:
robot-owner-name: Dave Weiner
robot-owner-url: http://www.hotwired.com/Staff/userland/
robot-owner-email: davew@well.com
robot-status:
robot-purpose:
robot-type:
robot-platform: Macintosh
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A crude robot built on top of Netscape and Userland Frontier, a scripting system for Macs
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: ariadne
robot-name: ARIADNE
robot-cover-url: (forthcoming)
robot-details-url: (forthcoming)
robot-owner-name: Mr. Matthias H. Gross
robot-owner-url: http://www.lrz-muenchen.de/~gross/
robot-owner-email: Gross@dbs.informatik.uni-muenchen.de
robot-status: development
robot-purpose: statistics, development of focused crawling strategies
robot-type: standalone
robot-platform: java
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ariadne
robot-noindex: no
robot-host: dbs.informatik.uni-muenchen.de
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: The ARIADNE robot is a prototype of an environment for testing focused crawling strategies.
robot-history: This robot is part of a research project at the University of Munich (LMU), started in 2000.
robot-environment: research
modified-date: Mo, 13 Mar 2000 14:00:00 GMT
modified-by: Mr. Matthias H. Gross

robot-id:arks
robot-name:arks
robot-cover-url:http://www.dpsindia.com
robot-details-url:http://www.dpsindia.com
robot-owner-name:Aniruddha Choudhury
robot-owner-url:
robot-owner-email:aniruddha.c@usa.net
robot-status:development
robot-purpose:indexing
robot-type:standalone
robot-platform:PLATFORM INDEPENDENT
robot-availability:data
robot-exclusion:yes
robot-exclusion-useragent:arks
robot-noindex:no
robot-host:dpsindia.com
robot-from:no
robot-useragent:arks/1.0
robot-language:Java 1.2
robot-description:The Arks robot is used to build the database for the dpsindia/lawvistas.com search service. The robot runs weekly, and visits sites in a random order.
robot-history:Finds its roots in a software development project for a portal
robot-environment:commercial
modified-date:6th November 2000
modified-by:Aniruddha Choudhury

robot-id: aspider
robot-name: ASpider (Associative Spider)
robot-cover-url:
robot-details-url:
robot-owner-name: Fred Johansen
robot-owner-url: http://www.pvv.ntnu.no/~fredj/
robot-owner-email: fredj@pvv.ntnu.no
robot-status: retired
robot-purpose: indexing
robot-type:
robot-platform: unix
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host: nova.pvv.unit.no
robot-from: yes
robot-useragent: ASpider/0.09
robot-language: perl4
robot-description: ASpider is a CGI script that searches the web for keywords given by the user through a form.
robot-history:
robot-environment: hobby
modified-date:
modified-by:

robot-id: atn.txt
robot-name: ATN Worldwide
robot-details-url:
robot-cover-url:
robot-owner-name: All That Net
robot-owner-url: http://www.allthatnet.com
robot-owner-email: info@allthatnet.com
robot-status: active
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent: ATN_Worldwide
robot-noindex:
robot-nofollow:
robot-host: www.allthatnet.com
robot-from:
robot-useragent: ATN_Worldwide
robot-language:
robot-description: The ATN robot is used to build the database for the AllThatNet search service operated by All That Net. The robot runs weekly, and visits sites in a random order.
robot-history:
robot-environment:
modified-date: July 09, 2000 17:43 GMT

robot-id: atomz
robot-name: Atomz.com Search Robot
robot-cover-url: http://www.atomz.com/help/
robot-details-url: http://www.atomz.com/
robot-owner-name: Mike Thompson
robot-owner-url: http://www.atomz.com/
robot-owner-email: mike@atomz.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: service
robot-exclusion: yes
robot-exclusion-useragent: Atomz
robot-noindex: yes
robot-host: www.atomz.com
robot-from: no
robot-useragent: Atomz/1.0
robot-language: c
robot-description: Robot used for web site search service.
robot-history: Developed for Atomz.com, launched in 1999.
robot-environment: service
modified-date: Tue Jul 13 03:50:06 GMT 1999
modified-by: Mike Thompson

robot-id: auresys
robot-name: AURESYS
robot-cover-url: http://crrm.univ-mrs.fr
robot-details-url: http://crrm.univ-mrs.fr
robot-owner-name: Mannina Bruno
robot-owner-url: ftp://crrm.univ-mrs.fr/pub/CVetud/Etudiants/Mannina/CVbruno.htm
robot-owner-email: mannina@crrm.univ-mrs.fr
robot-status: robot actively in use
robot-purpose: indexing,statistics
robot-type: Standalone
robot-platform: Aix, Unix
robot-availability: Protected by Password
robot-exclusion: Yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: crrm.univ-mrs.fr, 192.134.99.192
robot-from: Yes
robot-useragent: AURESYS/1.0
robot-language: Perl 5.001m
robot-description: AURESYS is used to build a personal database for someone searching for information. The database is structured so that it can be analysed. AURESYS can find new servers by incrementing IP addresses. It generates statistics...
robot-history: This robot finds its roots in a research project at the University of Marseille in 1995-1996
robot-environment: used for Research
modified-date: Mon, 1 Jul 1996 14:30:00 GMT
modified-by: Mannina Bruno

robot-id: backrub
robot-name: BackRub
robot-cover-url:
robot-details-url:
robot-owner-name: Larry Page
robot-owner-url: http://backrub.stanford.edu/
robot-owner-email: page@leland.stanford.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.stanford.edu
robot-from: yes
robot-useragent: BackRub/*.*
robot-language: Java.
robot-description:
robot-history:
robot-environment:
modified-date: Wed Feb 21 02:57:42 1996.
modified-by:

robot-id:
robot-name: bayspider
robot-cover-url: http://www.baytsp.com/
robot-details-url: http://www.baytsp.com/
robot-owner-name: BayTSP.com,Inc
robot-owner-url:
robot-owner-email: marki@baytsp.com
robot-status: Active
robot-purpose: Copyright Infringement Tracking
robot-type: Stand Alone
robot-platform: NT
robot-availability: 24/7
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent: BaySpider
robot-language: English
robot-description:
robot-history:
robot-environment:
modified-date: 1/15/2001
modified-by: Marki@baytsp.com

robot-id: bbot
robot-name: BBot
robot-cover-url: http://www.otthon.net/search
robot-details-url: http://www.otthon.net/search/bbot
robot-owner-name: Istvan Fulop
robot-owner-url: http://www.otthon.net
robot-owner-email: poluf1 at yahoo dot co dot uk
robot-status: development
robot-purpose: indexing, maintenance
robot-type: standalone
robot-platform: windows
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bbot
robot-noindex: yes
robot-nofollow: yes
robot-host: *.netcologne.de
robot-from: yes
robot-useragent: bbot/0.100
robot-language: perl
robot-description: Mainly intended for site-level search, sometimes set loose.
robot-history: Started project in 11/2000. Called BBot since 24/04/2003.
robot-environment: hobby
modified-date: Sun, 04 May 2003 10:15:00 GMT
modified-by: Istvan Fulop

robot-id: bigbrother
robot-name: Big Brother
robot-cover-url: http://pauillac.inria.fr/~fpottier/mac-soft.html.en
robot-details-url:
robot-owner-name: Francois Pottier
robot-owner-url: http://pauillac.inria.fr/~fpottier/
robot-owner-email: Francois.Pottier@inria.fr
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: mac
robot-availability: binary
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: not as of 1.0
robot-useragent: Big Brother
robot-language: c++
robot-description: Macintosh-hosted link validation tool.
robot-history:
robot-environment: shareware
modified-date: Thu Sep 19 18:01:46 MET DST 1996
modified-by: Francois Pottier

robot-id: bjaaland
robot-name: Bjaaland
robot-cover-url: http://www.textuality.com
robot-details-url: http://www.textuality.com
robot-owner-name: Tim Bray
robot-owner-url: http://www.textuality.com
robot-owner-email: tbray@textuality.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Bjaaland
robot-noindex: no
robot-host: barry.bitmovers.net
robot-from: no
robot-useragent: Bjaaland/0.5
robot-language: perl5
robot-description: Crawls sites listed in the ODP (see http://dmoz.org)
robot-history: None, yet
robot-environment: service
modified-date: Monday, 19 July 1999, 13:46:00 PDT
modified-by: tbray@textuality.com

robot-id: blackwidow
robot-name: BlackWidow
robot-cover-url: http://140.190.65.12/~khooghee/index.html
robot-details-url:
robot-owner-name: Kevin Hoogheem
robot-owner-url:
robot-owner-email: khooghee@marys.smumn.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: 140.190.65.*
robot-from: yes
robot-useragent: BlackWidow
robot-language: C, C++.
robot-description: Started as a research project and is now used to find links for a random link generator. It is also used to research the growth of specific sites.
robot-history:
robot-environment:
modified-date: Fri Feb 9 00:11:22 1996.
modified-by:

robot-id: blindekuh
robot-name: Die Blinde Kuh
robot-cover-url: http://www.blinde-kuh.de/
robot-details-url: http://www.blinde-kuh.de/robot.html (German language)
robot-owner-name: Stefan R. Mueller
robot-owner-url: http://www.rrz.uni-hamburg.de/philsem/stefan_mueller/
robot-owner-email:maschinist@blinde-kuh.de
robot-status: development
robot-purpose: indexing
robot-type: browser
robot-platform: unix
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: minerva.sozialwiss.uni-hamburg.de
robot-from: yes
robot-useragent: Die Blinde Kuh
robot-language: perl5
robot-description: The robot is used for indexing and checking the registered URLs in the German-language search engine for kids. It is a non-commercial, one-woman project of Birgit Bachmann, who lives in Hamburg, Germany.
robot-history: The robot was developed by Stefan R. Mueller to help with the manual checking of registered links.
robot-environment: hobby
modified-date: Mon Jul 22 1998
modified-by: Stefan R. Mueller

robot-id:Bloodhound
robot-name:Bloodhound
robot-cover-url:http://web.ukonline.co.uk/genius/bloodhound.htm
robot-details-url:http://web.ukonline.co.uk/genius/bloodhound.htm
robot-owner-name:Dean Smart
robot-owner-url:http://web.ukonline.co.uk/genius/bloodhound.htm
robot-owner-email:genius@ukonline.co.uk
robot-status:active
robot-purpose:Web Site Download
robot-type:standalone
robot-platform:Windows95, WindowsNT, Windows98, Windows2000
robot-availability:Executable
robot-exclusion:No
robot-exclusion-useragent:Ukonline
robot-noindex:No
robot-host:*
robot-from:No
robot-useragent:None
robot-language:Perl5
robot-description:Bloodhound will download a whole web site, depending on the number of links to follow specified by the user.
robot-history:First version was released on 1 July 2000
robot-environment:Commercial
modified-date:1 july 2000
modified-by:Dean Smart

robot-id: borg-bot
robot-name: Borg-Bot
robot-cover-url:
robot-details-url: http://www.skunkfarm.com/borgbot.htm
robot-owner-name: James Bragg
robot-owner-url: http://www.skunkfarm.com
robot-owner-email: botdev@skunkfarm.com
robot-status: development
robot-purpose: indexing statistics
robot-type: standalone
robot-platform: Linux Windows2000
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: borg-bot/0.9
robot-noindex: yes
robot-host: 24.11.13.173
robot-from: yes
robot-useragent: borg-bot/0.9
robot-language: python
robot-description: Developmental crawler to feed a search engine
robot-history:
robot-environment: research service
modified-date: Sat, 20 Oct 2001 04:00:00 GMT
modified-by: Sat, 20 Oct 2001 04:00:00 GMT

robot-id: boxseabot
robot-name: BoxSeaBot
robot-cover-url: http://www.boxsea.com/crawler
robot-details-url: http://www.boxsea.com/crawler
robot-owner-name: BoxSea Search Engine
robot-owner-url: http://www.boxsea.com
robot-owner-email: boxseasearch@yahoo.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: linux
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent: boxseabot
robot-noindex:
robot-host:
robot-from:
robot-useragent: BoxSeaBot/0.5 (http://boxsea.com/crawler)
robot-language: java
robot-description: This robot is used to find pages for building the BoxSea search engine indices.
robot-history: The robot code uses Nutch. Earlier experimental crawls were done under various user agent names such as NutchCVS(boxsea).
robot-environment:
modified-date: Fri, 23 Jul 2004 11:58:00 PST
modified-by: BoxSeaBot

robot-id: brightnet
robot-name: bright.net caching robot
robot-cover-url:
robot-details-url:
robot-owner-name:
robot-owner-url:
robot-owner-email:
robot-status: active
robot-purpose: caching
robot-type:
robot-platform:
robot-availability: none
robot-exclusion: no
robot-noindex:
robot-host: 209.143.1.46
robot-from: no
robot-useragent: Mozilla/3.01 (compatible;)
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date: Fri Nov 13 14:08:01 EST 1998
modified-by: brian d foy

robot-id: bspider
robot-name: BSpider
robot-cover-url: not yet
robot-details-url: not yet
robot-owner-name: Yo Okumura
robot-owner-url: not yet
robot-owner-email: okumura@rsl.crl.fujixerox.co.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bspider
robot-noindex: yes
robot-host: 210.159.73.34, 210.159.73.35
robot-from: yes
robot-useragent: BSpider/1.0 libwww-perl/0.40
robot-language: perl
robot-description: BSpider crawls within the Japanese domain for indexing.
robot-history: Started Apr 1997 in a research project at Fuji Xerox Corp. Research Lab.
robot-environment: research
modified-date: Mon, 21 Apr 1997 18:00:00 JST
modified-by: Yo Okumura

robot-id: cactvschemistryspider
robot-name: CACTVS Chemistry Spider
robot-cover-url: http://schiele.organik.uni-erlangen.de/cactvs/spider.html
robot-details-url:
robot-owner-name: W. D. Ihlenfeldt
robot-owner-url: http://schiele.organik.uni-erlangen.de/cactvs/
robot-owner-email: wdi@eros.ccc.uni-erlangen.de
robot-status:
robot-purpose: indexing.
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: utamaro.organik.uni-erlangen.de
robot-from: no
robot-useragent: CACTVS Chemistry Spider
robot-language: TCL, C
robot-description: Locates chemical structures in Chemical MIME formats on WWW and FTP servers and downloads them into a database searchable with structure queries (substructure, full structure, formula, properties, etc.)
robot-history:
robot-environment:
modified-date: Sat Mar 30 00:55:40 1996.
modified-by:

robot-id: calif
robot-name: Calif
robot-details-url: http://www.tnps.dp.ua/calif/details.html
robot-cover-url: http://www.tnps.dp.ua/calif/
robot-owner-name: Alexander Kosarev
robot-owner-url: http://www.tnps.dp.ua/~dark/
robot-owner-email: kosarev@tnps.net
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: calif
robot-noindex: yes
robot-host: cobra.tnps.dp.ua
robot-from: yes
robot-useragent: Calif/0.6 (kosarev@tnps.net; http://www.tnps.dp.ua)
robot-language: c++
robot-description: Used to build searchable index
robot-history: In development stage
robot-environment: research
modified-date: Sun, 6 Jun 1999 13:25:33 GMT

robot-id: cassandra
robot-name: Cassandra
robot-cover-url: http://post.mipt.rssi.ru/~billy/search/
robot-details-url: http://post.mipt.rssi.ru/~billy/search/
robot-owner-name: Mr. Oleg Bilibin
robot-owner-url: http://post.mipt.rssi.ru/~billy/
robot-owner-email: billy168@aha.ru
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: crossplatform
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: www.aha.ru
robot-from: no
robot-useragent:
robot-language: java
robot-description: The Cassandra search robot is used to create and maintain an indexed database for a widespread Information Retrieval System
robot-history: Master of Science degree project at Moscow Institute of Physics and Technology
robot-environment: research
modified-date: Wed, 3 Jun 1998 12:00:00 GMT

robot-id: cgireader
robot-name: Digimarc Marcspider/CGI
robot-cover-url: http://www.digimarc.com/prod_fam.html
robot-details-url: http://www.digimarc.com/prod_fam.html
robot-owner-name: Digimarc Corporation
robot-owner-url: http://www.digimarc.com
robot-owner-email: wmreader@digimarc.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: 206.102.3.*
robot-from:
robot-useragent: Digimarc CGIReader/1.0
robot-language: c++
robot-description: Similar to Digimarc Marcspider, Marcspider/CGI examines image files for watermarks, but is more focused on CGI URLs. In order not to waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific CGI URLs of interest to us. If a URL is to a page of interest (via CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages.
robot-history: First operation in December 1997
robot-environment: service
modified-date: Fri, 5 Dec 1997 12:00:00 GMT
modified-by: Dan Ramos

robot-id: checkbot
robot-name: Checkbot
robot-cover-url: http://www.xs4all.nl/~graaff/checkbot/
robot-details-url:
robot-owner-name: Hans de Graaff
robot-owner-url: http://www.xs4all.nl/~graaff/checkbot/
robot-owner-email: graaff@xs4all.nl
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix,WindowsNT
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Checkbot/x.xx LWP/5.x
robot-language: perl 5
robot-description: Checkbot checks links in a given set of pages on one or more servers. It reports links which returned an error code.
robot-history:
robot-environment: hobby
modified-date: Tue Jun 25 07:44:00 1996
modified-by: Hans de Graaff

robot-id: christcrawler
robot-name: ChristCrawler.com
robot-cover-url: http://www.christcrawler.com/search.cfm
robot-details-url: http://www.christcrawler.com/index.cfm
robot-owner-name: Jeremy DeYoung
robot-owner-url: http://www.christcentral.com/aboutus/index.cfm
robot-owner-email: jeremy.deyoung@christcentral.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Windows NT 4.0 SP5
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: christcrawler
robot-noindex: yes
robot-host: 64.51.218.*, 64.51.219.*, 12.107.236.*, 12.107.237.*
robot-from: yes
robot-useragent: Mozilla/4.0 (compatible; ChristCrawler.com, ChristCrawler@ChristCENTRAL.com)
robot-language: Cold Fusion 4.5
robot-description: A Christian internet spider that searches web sites to find Christian-related material
robot-history: Developed because of the growing need for more of God's influence on the Internet.
robot-environment: service
modified-date: Fri, 27 Jun 2001 00:53:12 CST
modified-by: Jeremy DeYoung

robot-id: churl
robot-name: churl
robot-cover-url: http://www-personal.engin.umich.edu/~yunke/scripts/churl/
robot-details-url:
robot-owner-name: Justin Yunke
robot-owner-url: http://www-personal.engin.umich.edu/~yunke/
robot-owner-email: yunke@umich.edu
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A URL checking robot, which stays within one step of the local server
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: cienciaficcion
robot-name: cIeNcIaFiCcIoN.nEt
robot-cover-url: http://www.cienciaficcion.net/
robot-details-url: http://www.cienciaficcion.net/
robot-owner-name: David Fernández
robot-owner-url: http://www.cyberdark.net/
robot-owner-email: root@cyberdark.net
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: linux
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: yes
robot-host: epervier.cqhost.net
robot-from: no
robot-useragent: cIeNcIaFiCcIoN.nEt Spider (http://www.cienciaficcion.net)
robot-language: php,perl
robot-description: Robot responsible for indexing the pages for www.cienciaficcion.net
robot-history: Alcorkón (Madrid), Europe, 2000/2001
robot-environment: hobby
modified-date: Sat, 18 Aug 2001 00:38:52 GMT
modified-by: David Fernández

robot-id: cmc
robot-name: CMC/0.01
robot-details-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=robot
robot-cover-url: http://www2.next.ne.jp/music/
robot-owner-name: Shinobu Kubota.
robot-owner-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=profile
robot-owner-email: shinobu@po.next.ne.jp
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: CMC/0.01
robot-noindex: no
robot-host: haruna.next.ne.jp, 203.183.218.4
robot-from: yes
robot-useragent: CMC/0.01
robot-language: perl5
robot-description: The CMC/0.01 robot collects information from the pages registered with the music specialty search service.
robot-history: The CMC/0.01 robot was made for the computer music center on November 4, 1997.
robot-environment: hobby
modified-date: Sat, 23 May 1998 17:22:00 GMT

robot-id:Collective
robot-name:Collective
robot-cover-url:http://web.ukonline.co.uk/genius/collective.htm
robot-details-url:http://web.ukonline.co.uk/genius/collective.htm
robot-owner-name:Dean Smart
robot-owner-url:http://web.ukonline.co.uk/genius/collective.htm
robot-owner-email:genius@ukonline.co.uk
robot-status:development
robot-purpose:Collective is a highly configurable program designed to interrogate online search engines and online databases. It ignores web pages that lie about their content, as well as dead URLs, and it can be super strict: it searches each web page it finds for your search terms to ensure those terms are present. Any positive URLs are added to an HTML file for you to view at any time, even before the program has finished. Collective can wander the web for days if required.
robot-type:standalone
robot-platform:Windows95, WindowsNT, Windows98, Windows2000
robot-availability:Executable
robot-exclusion:No
robot-exclusion-useragent:
robot-noindex:No
robot-host:*
robot-from:No
robot-useragent:LWP
robot-language:Perl5 (With Visual Basic front-end)
robot-description:Collective is the cleverest Internet search engine, with all found URLs guaranteed to contain your search terms.
robot-history:Development started on August 3, 2000 robot-environment:Commercial modified-date:August, 03, 2000 modified-by:Dean Smart robot-id: combine robot-name: Combine System robot-cover-url: http://www.ub2.lu.se/~tsao/combine.ps robot-details-url: http://www.ub2.lu.se/~tsao/combine.ps robot-owner-name: Yong Cao robot-owner-url: http://www.ub2.lu.se/ robot-owner-email: tsao@munin.ub2.lu.se robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: combine robot-noindex: no robot-host: *.ub2.lu.se robot-from: yes robot-useragent: combine/0.0 robot-language: c, perl5 robot-description: An open, distributed, and efficient harvester. robot-history: A complete re-design of the NWI robot (w3index) for DESIRE project. robot-environment: research modified-date: Tue, 04 Mar 1997 16:11:40 GMT modified-by: Yong Cao robot-id: confuzzledbot robot-name: ConfuzzledBot robot-cover-url: http://www.blue.lu/ robot-details-url: http://bot.confuzzled.lu/ robot-owner-name: Britz Thibaut robot-owner-url: http://www.confuzzled.lu/ robot-owner-email: bot@confuzzled.lu robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: Linux,Freebsd robot-availability: none robot-exclusion: yes robot-exclusion-useragent: confuzzledbot robot-noindex: yes robot-nofollow: yes robot-host: *.ion.lu robot-from: no robot-useragent: Confuzzledbot/X.X (+http://www.confuzzled.lu/bot/) robot-language: perl5 robot-description: The robot is used to build a searchable database of Luxembourgish sites. It only indexes .lu domains and Luxembourgish sites added to the directory. robot-history: Developed 2000-2002.
Only minor changes recently. robot-environment: hobby modified-date: Tue, 11 May 2004 17:45:00 CET modified-by: Britz Thibaut robot-id: coolbot robot-name: CoolBot robot-cover-url: www.suchmaschine21.de robot-details-url: www.suchmaschine21.de robot-owner-name: Stefan Fischerlaender robot-owner-url: www.suchmaschine21.de robot-owner-email: info@suchmaschine21.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: CoolBot robot-noindex: yes robot-host: www.suchmaschine21.de robot-from: no robot-useragent: CoolBot robot-language: perl5 robot-description: The CoolBot robot is used to build and maintain the directory of the German search engine Suchmaschine21. robot-history: none so far robot-environment: service modified-date: Wed, 21 Jan 2001 12:16:00 GMT modified-by: Stefan Fischerlaender robot-id: core robot-name: Web Core / Roots robot-cover-url: http://www.di.uminho.pt/wc robot-details-url: robot-owner-name: Jorge Portugal Andrade robot-owner-url: http://www.di.uminho.pt/~cbm robot-owner-email: wc@di.uminho.pt robot-status: robot-purpose: indexing, maintenance robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: shiva.di.uminho.pt, from www.di.uminho.pt robot-from: no robot-useragent: root/0.1 robot-language: perl robot-description: Parallel robot developed at the University of Minho in Portugal to catalog relations among URLs and to support a special navigation aid. robot-history: First versions since October 1995. robot-environment: modified-date: Wed Jan 10 23:19:08 1996.
modified-by: robot-id: cosmos robot-name: XYLEME Robot robot-cover-url: http://xyleme.com/ robot-details-url: robot-owner-name: Mihai Preda robot-owner-url: http://www.mihaipreda.com/ robot-owner-email: preda@xyleme.com robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: cosmos robot-noindex: no robot-nofollow: no robot-host: robot-from: yes robot-useragent: cosmos/0.3 robot-language: c++ robot-description: index XML, follow HTML robot-history: robot-environment: service modified-date: Fri, 24 Nov 2000 00:00:00 GMT modified-by: Mihai Preda robot-id: cruiser robot-name: Internet Cruiser Robot robot-cover-url: http://www.krstarica.com/ robot-details-url: http://www.krstarica.com/eng/url/ robot-owner-name: Internet Cruiser robot-owner-url: http://www.krstarica.com/ robot-owner-email: robot@krstarica.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Internet Cruiser Robot robot-noindex: yes robot-host: *.krstarica.com robot-from: no robot-useragent: Internet Cruiser Robot/2.1 robot-language: c++ robot-description: Internet Cruiser Robot is Internet Cruiser's prime index agent. 
robot-history: robot-environment: service modified-date: Fri, 17 Jan 2001 12:00:00 GMT modified-by: tech@krstarica.com robot-id: cusco robot-name: Cusco robot-cover-url: http://www.cusco.pt/ robot-details-url: http://www.cusco.pt/ robot-owner-name: Filipe Costa Clerigo robot-owner-url: http://www.viatecla.pt/ robot-owner-email: clerigo@viatecla.pt robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: any robot-availability: none robot-exclusion: yes robot-exclusion-useragent: cusco robot-noindex: yes robot-host: *.cusco.pt, *.viatecla.pt robot-from: yes robot-useragent: Cusco/3.2 robot-language: Java robot-description: The Cusco robot is part of the CUCE indexing system. It gathers information from several sources: HTTP, databases, or the filesystem. At the moment, its universe is the .pt domain, and the information it gathers is available at the Portuguese search engine Cusco (http://www.cusco.pt/). robot-history: The Cusco search engine started at the company ViaTecla as a project to demonstrate our development capabilities and to fill the need for a Portuguese-specific search engine. We are now developing new features that cannot be found in any other online search engine.
robot-environment:service, research modified-date: Mon, 21 Jun 1999 14:00:00 GMT modified-by: Filipe Costa Clerigo robot-id: cyberspyder robot-name: CyberSpyder Link Test robot-cover-url: http://www.cyberspyder.com/cslnkts1.html robot-details-url: http://www.cyberspyder.com/cslnkts1.html robot-owner-name: Tom Aman robot-owner-url: http://www.cyberspyder.com/ robot-owner-email: amant@cyberspyder.com robot-status: active robot-purpose: link validation, some html validation robot-type: standalone robot-platform: windows 3.1x, windows95, windowsNT robot-availability: binary robot-exclusion: user configurable robot-exclusion-useragent: cyberspyder robot-noindex: no robot-host: * robot-from: no robot-useragent: CyberSpyder/2.1 robot-language: Microsoft Visual Basic 4.0 robot-description: CyberSpyder Link Test is intended to be used as a site management tool to validate that HTTP links on a page are functional and to produce various analysis reports to assist in managing a site. robot-history: The original robot was created to fill a widely seen need for an easy-to-use link-checking program.
robot-environment: commercial modified-date: Tue, 31 Mar 1998 01:02:00 GMT modified-by: Tom Aman robot-id: cydralspider robot-name: CydralSpider robot-cover-url: http://www.cydral.com/ robot-details-url: http://en.cydral.com/help.html robot-owner-name: Cydral robot-owner-url: http://www.cydral.com/ robot-owner-email: cydral@cydral.com robot-status: active robot-purpose: gather Web content for image search engine service robot-type: standalone robot-platform: unix; windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: cydralspider robot-noindex: yes robot-host: *.cydral.com robot-from: yes robot-useragent: CydralSpider/X.X (Cydral Web Image Search; http://www.cydral.com/) robot-language: c++ robot-description: Advanced image spider for www.cydral.com robot-history: Developed in 2003, the robot uses new methods to discover Web sites and index images. robot-environment: commercial modified-date: Tue, 17 Jun 2004, 11:50:30 GMT modified-by: cydral@cydral.com robot-id: desertrealm robot-name: Desert Realm Spider robot-cover-url: http://www.desertrealm.com robot-details-url: http://spider.desertrealm.com robot-owner-name: Brian B. robot-owner-url: http://www.desertrealm.com robot-owner-email: spider@desertrealm.com robot-status: robot actively in use robot-purpose: indexing robot-type: standalone robot-platform: cross platform robot-availability: none robot-exclusion: yes robot-exclusion-useragent: desertrealm, desert realm robot-noindex: yes robot-nofollow: yes robot-host: * robot-from: no robot-useragent: DesertRealm.com; 0.2; [J]; robot-language: java 1.3, java 1.4 robot-description: The spider indexes fantasy and science fiction sites by using a customizable keyword algorithm. Only home pages are indexed, but all pages are looked at for links. Pages are visited randomly to limit impact on any one webserver. robot-history: The spider originally was created to learn more about how search engines work.
robot-environment: hobby modified-date: Fri, 19 Sep 2003 08:57:52 GMT modified-by: Brian B. robot-id: deweb robot-name: DeWeb(c) Katalog/Index robot-cover-url: http://deweb.orbit.de/ robot-details-url: robot-owner-name: Marc Mielke robot-owner-url: http://www.orbit.de/ robot-owner-email: dewebmaster@orbit.de robot-status: robot-purpose: indexing, mirroring, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: deweb.orbit.de robot-from: yes robot-useragent: Deweb/1.01 robot-language: perl 4 robot-description: Its purpose is to generate a Resource Discovery database, perform mirroring, and generate statistics. Uses a combination of the Informix(tm) database and WN 1.11 server software for indexing/resource discovery, full-text search, and text excerpts. robot-history: robot-environment: modified-date: Wed Jan 10 08:23:00 1996 modified-by: robot-id: dienstspider robot-name: DienstSpider robot-cover-url: http://sappho.csi.forth.gr:22000/ robot-details-url: robot-owner-name: Antonis Sidiropoulos robot-owner-url: http://www.csi.forth.gr/~asidirop robot-owner-email: asidirop@csi.forth.gr robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: sappho.csi.forth.gr robot-from: robot-useragent: dienstspider/1.0 robot-language: C robot-description: Indexing and searching the NCSTRL(Networked Computer Science Technical Report Library) and ERCIM Collection robot-history: The version 1.0 was the developer's master thesis project robot-environment: research modified-date: Fri, 4 Dec 1998 0:0:0 GMT modified-by: asidirop@csi.forth.gr robot-id: digger robot-name: Digger robot-cover-url: http://www.diggit.com/ robot-details-url: robot-owner-name: Benjamin Lipchak robot-owner-url: robot-owner-email: admin@bulldozersoftware.com robot-status: active robot-purpose: indexing
robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: digger robot-noindex: yes robot-host: robot-from: yes robot-useragent: Digger/1.0 JDK/1.3.0 robot-language: java robot-description: indexing web sites for the Diggit! search engine robot-history: robot-environment: service modified-date: modified-by: robot-id: diibot robot-name: Digital Integrity Robot robot-cover-url: http://www.digital-integrity.com/robotinfo.html robot-details-url: http://www.digital-integrity.com/robotinfo.html robot-owner-name: Digital Integrity, Inc. robot-owner-url: robot-owner-email: robot@digital-integrity.com robot-status: Production robot-purpose: WWW Indexing robot-type: robot-platform: unix robot-availability: none robot-exclusion: Conforms to robots.txt convention robot-exclusion-useragent: DIIbot robot-noindex: Yes robot-host: digital-integrity.com robot-from: robot-useragent: DIIbot robot-language: Java/C robot-description: robot-history: robot-environment: modified-date: modified-by: robot-id: directhit robot-name: Direct Hit Grabber robot-cover-url: www.directhit.com robot-details-url: http://www.directhit.com/about/company/spider.html robot-status: active robot-description: Direct Hit Grabber indexes documents and collects Web statistics for the Direct Hit Search Engine (available at www.directhit.com and our partners' sites) robot-purpose: Indexing and statistics robot-type: standalone robot-platform: unix robot-language: C++ robot-owner-name: Direct Hit Technologies, Inc. 
robot-owner-url: www.directhit.com robot-owner-email: DirectHitGrabber@directhit.com robot-exclusion: yes robot-exclusion-useragent: grabber robot-noindex: yes robot-host: *.directhit.com robot-from: yes robot-useragent: grabber robot-environment: service modified-by: grabber@directhit.com robot-id: dnabot robot-name: DNAbot robot-cover-url: http://xx.dnainc.co.jp/dnabot/ robot-details-url: http://xx.dnainc.co.jp/dnabot/ robot-owner-name: Tom Tanaka robot-owner-url: http://xx.dnainc.co.jp robot-owner-email: tomatell@xx.dnainc.co.jp robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix, windows, windows95, windowsNT, mac robot-availability: data robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: xx.dnainc.co.jp robot-from: yes robot-useragent: DNAbot/1.0 robot-language: java robot-description: A search robot written in 100% Java, with its own built-in database engine and web server. Currently in Japanese. robot-history: Developed by DNA, Inc. (Niigata City, Japan) in 1998.
robot-environment: commercial modified-date: Mon, 4 Jan 1999 14:30:00 GMT modified-by: Tom Tanaka robot-id: download_express robot-name: DownLoad Express robot-cover-url: http://www.jacksonville.net/~dlxpress robot-details-url: http://www.jacksonville.net/~dlxpress robot-owner-name: DownLoad Express Inc robot-owner-url: http://www.jacksonville.net/~dlxpress robot-owner-email: dlxpress@mediaone.net robot-status: active robot-purpose: graphic download robot-type: standalone robot-platform: win95/98/NT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: downloadexpress robot-noindex: no robot-host: * robot-from: no robot-useragent: robot-language: visual basic robot-description: automatically downloads graphics from the web robot-history: robot-environment: commercial modified-date: Wed, 05 May 1998 modified-by: DownLoad Express Inc robot-id: dragonbot robot-name: DragonBot robot-cover-url: http://www.paczone.com/ robot-details-url: robot-owner-name: Paul Law robot-owner-url: robot-owner-email: admin@paczone.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: DragonBot robot-noindex: no robot-host: *.paczone.com robot-from: no robot-useragent: DragonBot/1.0 libwww/5.0 robot-language: C++ robot-description: Collects web pages related to East Asia robot-history: robot-environment: service modified-date: Mon, 11 Aug 1997 00:00:00 GMT modified-by: robot-id: dwcp robot-name: DWCP (Dridus' Web Cataloging Project) robot-cover-url: http://www.dridus.com/~rmm/dwcp.php3 robot-details-url: http://www.dridus.com/~rmm/dwcp.php3 robot-owner-name: Ross Mellgren (Dridus Norwind) robot-owner-url: http://www.dridus.com/~rmm robot-owner-email: rmm@dridus.com robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: java robot-availability: source, binary, data robot-exclusion: yes
robot-exclusion-useragent: dwcp robot-noindex: no robot-host: *.dridus.com robot-from: dridus@dridus.com robot-useragent: DWCP/2.0 robot-language: java robot-description: The DWCP robot is used to gather information for Dridus' Web Cataloging Project, which is intended to catalog domains and urls (no content). robot-history: Developed from scratch by Dridus Norwind. robot-environment: hobby modified-date: Sat, 10 Jul 1999 00:05:40 GMT modified-by: Ross Mellgren robot-id: e-collector robot-name: e-collector robot-cover-url: http://www.thatrobotsite.com/agents/ecollector.htm robot-details-url: http://www.thatrobotsite.com/agents/ecollector.htm robot-owner-name: Dean Smart robot-owner-url: http://www.thatrobotsite.com robot-owner-email: smarty@thatrobotsite.com robot-status: Active robot-purpose: email collector robot-type: Collector of email addresses robot-platform: Windows 9*/NT/2000 robot-availability: Binary robot-exclusion: No robot-exclusion-useragent: ecollector robot-noindex: No robot-host: * robot-from: No robot-useragent: LWP:: robot-language: Perl5 robot-description: e-collector, in the simplest terms, is an e-mail address collector, hence the name e-collector. So what? Have you ever wanted the email addresses of as many companies as possible that sell or supply, for example, "dried fruit"? I personally don't, but this is just an example. Those of you who may use this type of robot will know exactly what you can do with the information; first, don't spam with it. For those still not sure what this type of robot will do for you, take this example: you are an international distributor of "dried fruit", and your boss has told you that if you raise sales by 10% he will buy you a new car (wish I had a boss like that). There are thousands of shops, distributors, etc. that you could be doing business with, but you don't know who they are, because they are in other countries, or in the nearest town but you have never heard of them before.
Has the penny dropped yet? No? Well, now you have the opportunity to find out who they are, with an internet address and a person to contact in each company, just by downloading and running e-collector. Plus it's free: you don't have to do any legwork, just run the program, sit back, and watch your potential customers arrive. robot-history: - robot-environment: Service modified-date: Weekly modified-by: Dean Smart robot-id:ebiness robot-name:EbiNess robot-cover-url:http://sourceforge.net/projects/ebiness robot-details-url:http://ebiness.sourceforge.net/ robot-owner-name:Mike Davis robot-owner-url:http://www.carisbrook.co.uk/mike robot-owner-email:mdavis@kieser.net robot-status:Pre-Alpha robot-purpose:statistics robot-type:standalone robot-platform:unix(Linux) robot-availability:Open Source robot-exclusion:yes robot-exclusion-useragent:ebiness robot-noindex:no robot-host: robot-from:no robot-useragent:EbiNess/0.01a robot-language:c++ robot-description:Used to build a url relationship database, to be viewed in 3D robot-history:Dreamed it up over some beers robot-environment:hobby modified-date:Mon, 27 Nov 2000 12:26:00 GMT modified-by:Mike Davis robot-id: eit robot-name: EIT Link Verifier Robot robot-cover-url: http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html robot-details-url: robot-owner-name: Jim McGuire robot-owner-url: http://www.eit.com/people/mcguire.html robot-owner-email: mcguire@eit.COM robot-status: robot-purpose: maintenance robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: robot-useragent: EIT-Link-Verifier-Robot/0.2 robot-language: robot-description: Combination of an HTML form and a CGI script that verifies links from a given starting point (with some controls to prevent it from going off-site or running without limit) robot-history: Announced on 12 July 1994 robot-environment: modified-date: modified-by: robot-id: elfinbot robot-name:ELFINBOT
robot-cover-url:http://letsfinditnow.com robot-details-url:http://letsfinditnow.com/elfinbot.html robot-owner-name:Lets Find It Now Ltd robot-owner-url:http://letsfinditnow.com robot-owner-email:admin@letsfinditnow.com robot-status:Active robot-purpose:Indexing for the Lets Find It Now search Engine robot-type:Standalone robot-platform:Unix robot-availability:None robot-exclusion: yes robot-exclusion-useragent:elfinbot robot-noindex:yes robot-host:*.letsfinditnow.com robot-from:no robot-useragent:elfinbot robot-language:Perl5 robot-description:ELFIN is used to index and add data to the "Lets Find It Now Search Engine" (http://letsfinditnow.com). The robot runs every 30 days. robot-history: robot-environment: modified-date: modified-by: robot-id: emacs robot-name: Emacs-w3 Search Engine robot-cover-url: http://www.cs.indiana.edu/elisp/w3/docs.html robot-details-url: robot-owner-name: William M. Perry robot-owner-url: http://www.cs.indiana.edu/hyplan/wmperry.html robot-owner-email: wmperry@spry.com robot-status: retired robot-purpose: indexing robot-type: browser robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: Emacs-w3/v[0-9\.]+ robot-language: lisp robot-description: Its purpose is to generate a Resource Discovery database. This code has not been looked at in a while, but will be spruced up for the Emacs-w3 2.2.0 release sometime this month. It will honor the /robots.txt file at that time.
robot-history: robot-environment: modified-date: Fri May 5 16:09:18 1995 modified-by: robot-id: emcspider robot-name: ananzi robot-cover-url: http://www.empirical.com/ robot-details-url: robot-owner-name: Hunter Payne robot-owner-url: http://www.psc.edu/~hpayne/ robot-owner-email: hpayne@u-media.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: bilbo.internal.empirical.com robot-from: yes robot-useragent: EMC Spider robot-language: java robot-description: This spider is still in the development stages, but it will be hitting sites while I finish debugging it. robot-history: robot-environment: modified-date: Wed May 29 14:47:01 1996. modified-by: robot-id: esculapio robot-name: esculapio robot-cover-url: http://esculapio.cype.com robot-details-url: http://esculapio.cype.com/details.htm robot-owner-name: CYPE Ingenieros robot-owner-url: http://www.cype.com robot-owner-email: imasd@cype.com robot-status: active robot-purpose: link validation robot-type: standalone robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: esculapio robot-noindex: yes robot-host: 80.34.92.45 robot-from: yes robot-useragent: esculapio/1.1 robot-language: C++ robot-description: Checks the integrity of the links between several domains. robot-history: First, a research project. Now, an internal tool. Next, ???.
robot-environment: research, service modified-date: Mon, 6 Jun 2004 08:25 +1 GMT modified-by: robot-id: esther robot-name: Esther robot-details-url: http://search.falconsoft.com/ robot-cover-url: http://search.falconsoft.com/ robot-owner-name: Tim Gustafson robot-owner-url: http://www.falconsoft.com/ robot-owner-email: tim@falconsoft.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix (FreeBSD 2.2.8) robot-availability: data robot-exclusion: yes robot-exclusion-useragent: esther robot-noindex: no robot-host: *.falconsoft.com robot-from: yes robot-useragent: esther robot-language: perl5 robot-description: This crawler is used to build the search database at http://search.falconsoft.com/ robot-history: Developed by FalconSoft. robot-environment: service modified-date: Tue, 22 Dec 1998 00:22:00 PST robot-id: evliyacelebi robot-name: Evliya Celebi robot-cover-url: http://ilker.ulak.net.tr/EvliyaCelebi robot-details-url: http://ilker.ulak.net.tr/EvliyaCelebi robot-owner-name: Ilker TEMIR robot-owner-url: http://ilker.ulak.net.tr robot-owner-email: ilker@ulak.net.tr robot-status: development robot-purpose: indexing Turkish content robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: N/A robot-noindex: no robot-nofollow: no robot-host: 193.140.83.* robot-from: ilker@ulak.net.tr robot-useragent: Evliya Celebi v0.151 - http://ilker.ulak.net.tr robot-language: perl5 robot-history: robot-description: crawls pages under the ".tr" domain or having Turkish character encoding (iso-8859-9 or windows-1254) robot-environment: hobby modified-date: Fri Mar 31 15:03:12 GMT 2000 robot-id: nzexplorer robot-name: nzexplorer robot-cover-url: http://nzexplorer.co.nz/ robot-details-url: robot-owner-name: Paul Bourke robot-owner-url: http://bourke.gen.nz/paul.html robot-owner-email: paul@bourke.gen.nz robot-status: active robot-purpose: indexing, statistics robot-type: standalone
robot-platform: UNIX robot-availability: source (commercial) robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: bitz.co.nz robot-from: no robot-useragent: explorersearch robot-language: c++ robot-history: Started in 1995 to provide a comprehensive index to WWW pages within New Zealand. Now also used in Malaysia and other countries. robot-environment: service modified-date: Tues, 25 Jun 1996 modified-by: Paul Bourke robot-id: fastcrawler robot-name: FastCrawler robot-cover-url: http://www.1klik.dk/omos/ robot-details-url: http://www.1klik.dk/omos/ robot-owner-name: 1klik.dk A/S robot-owner-url: http://www.1klik.dk robot-owner-email: crawler@1klik.dk robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Windows 2000 Adv. Server robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fastcrawler robot-noindex: yes robot-host: 1klik.dk robot-from: yes robot-useragent: FastCrawler 3.0.X (crawler@1klik.dk) - http://www.1klik.dk robot-language: C++ robot-description: FastCrawler is used to build the databases for search engines used by 1klik.dk and its partners robot-history: Robot started in April 1999 robot-environment: commercial modified-date: 05-08-2001 modified-by: Kim Gam-Jensen robot-id:fdse robot-name:Fluid Dynamics Search Engine robot robot-cover-url:http://www.xav.com/scripts/search/ robot-details-url:http://www.xav.com/scripts/search/ robot-owner-name:Zoltan Milosevic robot-owner-url:http://www.xav.com/ robot-owner-email:zoltanm@nickname.net robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:unix;windows robot-availability:source;data robot-exclusion:yes robot-exclusion-useragent:FDSE robot-noindex:yes robot-host:yes robot-from:* robot-useragent:Mozilla/4.0 (compatible: FDSE robot) robot-language:perl5 robot-description:Crawls remote sites as part of a shareware search engine program robot-history:Developed in late 1998 over three pots of coffee
robot-environment:commercial modified-date:Fri, 21 Jan 2000 10:15:49 GMT modified-by:Zoltan Milosevic robot-id: felix robot-name: Felix IDE robot-cover-url: http://www.pentone.com robot-details-url: http://www.pentone.com robot-owner-name: The Pentone Group, Inc. robot-owner-url: http://www.pentone.com robot-owner-email: felix@pentone.com robot-status: active robot-purpose: indexing, statistics robot-type: standalone robot-platform: windows95, windowsNT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: FELIX IDE robot-noindex: yes robot-host: * robot-from: yes robot-useragent: FelixIDE/1.0 robot-language: visual basic robot-description: Felix IDE is a retail personal search spider sold by The Pentone Group, Inc. It supports the proprietary exclusion "Frequency: ??????????" in the robots.txt file. Question marks represent an integer indicating number of milliseconds to delay between document requests. This is called VDRF(tm) or Variable Document Retrieval Frequency. Note that users can re-define the useragent name. robot-history: This robot began as an in-house tool for the lucrative Felix IDS (Information Discovery Service) and has gone retail. robot-environment: service, commercial, research modified-date: Fri, 11 Apr 1997 19:08:02 GMT modified-by: Kerry B. Rogers robot-id: ferret robot-name: Wild Ferret Web Hopper #1, #2, #3 robot-cover-url: http://www.greenearth.com/ robot-details-url: robot-owner-name: Greg Boswell robot-owner-url: http://www.greenearth.com/ robot-owner-email: ghbos@postoffice.worldnet.att.net robot-status: robot-purpose: indexing maintenance statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Hazel's Ferret Web hopper, robot-language: C++, Visual Basic, Java robot-description: The Wild Ferret web hoppers are designed as specific agents to retrieve data from all available sources on the internet.
They work in an onion format, hopping from spot to spot one level at a time across the internet. The information is gathered into different relational databases, known as "Hazel's Horde". The information is publicly available and will be free for browsing at www.greenearth.com. The effective date of the data posting is to be announced. robot-history: robot-environment: modified-date: Mon Feb 19 00:28:37 1996. modified-by: robot-id: fetchrover robot-name: FetchRover robot-cover-url: http://www.engsoftware.com/fetch.htm robot-details-url: http://www.engsoftware.com/spiders/ robot-owner-name: Dr. Kenneth R. Wadland robot-owner-url: http://www.engsoftware.com/ robot-owner-email: ken@engsoftware.com robot-status: active robot-purpose: maintenance, statistics robot-type: standalone robot-platform: Windows/NT, Windows/95, Solaris SPARC robot-availability: binary, source robot-exclusion: yes robot-exclusion-useragent: ESI robot-noindex: N/A robot-host: * robot-from: yes robot-useragent: ESIRover v1.0 robot-language: C++ robot-description: FetchRover fetches Web Pages. It is an automated page-fetching engine. FetchRover can be used stand-alone or as the front-end to a full-featured Spider. Its database can use any ODBC compliant database server, including Microsoft Access, Oracle, Sybase SQL Server, FoxPro, etc. robot-history: Used as the front-end to SmartSpider (another Spider product sold by Engineering Software, Inc.)
robot-environment: commercial, service modified-date: Thu, 03 Apr 1997 21:49:50 EST modified-by: Ken Wadland robot-id: fido robot-name: fido robot-cover-url: http://www.planetsearch.com/ robot-details-url: http://www.planetsearch.com/info/fido.html robot-owner-name: Steve DeJarnett robot-owner-url: http://www.planetsearch.com/staff/steved.html robot-owner-email: fido@planetsearch.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fido robot-noindex: no robot-host: fido.planetsearch.com, *.planetsearch.com, 206.64.113.* robot-from: yes robot-useragent: fido/0.9 Harvest/1.4.pl2 robot-language: c, perl5 robot-description: fido is used to gather documents for the search engine provided in the PlanetSearch service, which is operated by the Philips Multimedia Center. The robot runs on an ongoing basis. robot-history: fido was originally based on the Harvest Gatherer, but has since evolved into a new creature. It still uses some support code from Harvest. robot-environment: service modified-date: Sat, 2 Nov 1996 00:08:18 GMT modified-by: Steve DeJarnett robot-id: finnish robot-name: Hämähäkki robot-cover-url: http://www.fi/search.html robot-details-url: http://www.fi/www/spider.html robot-owner-name: Timo Metsälä robot-owner-url: http://www.fi/~timo/ robot-owner-email: Timo.Metsala@www.fi robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: no robot-exclusion: yes robot-exclusion-useragent: Hämähäkki robot-noindex: no robot-host: *.www.fi robot-from: yes robot-useragent: Hämähäkki/0.2 robot-language: C robot-description: Its purpose is to generate a Resource Discovery database from the Finnish (top-level domain .fi) www servers. The resulting database is used by the search engine at http://www.fi/search.html. robot-history: (The name Hämähäkki is just Finnish for spider.)
robot-environment: modified-date: 1996-06-25 modified-by: Jaakko.Hyvatti@www.fi robot-id: fireball robot-name: KIT-Fireball robot-cover-url: http://www.fireball.de robot-details-url: http://www.fireball.de/technik.html (in German) robot-owner-name: Gruner + Jahr Electronic Media Service GmbH robot-owner-url: http://www.ems.guj.de robot-owner-email:info@fireball.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: KIT-Fireball robot-noindex: yes robot-host: *.fireball.de robot-from: yes robot-useragent: KIT-Fireball/2.0 libwww/5.0a robot-language: c robot-description: The Fireball robots gather web documents in German language for the database of the Fireball search service. robot-history: The robot was developed by Benhui Chen in a research project at the Technical University of Berlin in 1996 and was re-implemented by its developer in 1997 for the present owner. robot-environment: service modified-date: Mon Feb 23 11:26:08 1998 modified-by: Detlev Kalb robot-id: fish robot-name: Fish search robot-cover-url: http://www.win.tue.nl/bin/fish-search robot-details-url: robot-owner-name: Paul De Bra robot-owner-url: http://www.win.tue.nl/win/cs/is/debra/ robot-owner-email: debra@win.tue.nl robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: www.win.tue.nl robot-from: no robot-useragent: Fish-Search-Robot robot-language: c robot-description: Its purpose is to discover resources on the fly. A version exists that is integrated into the Tübingen Mosaic 2.4.2 browser (also written in C). robot-history: Originated as an addition to Mosaic for X robot-environment: modified-date: Mon May 8 09:31:19 1995 modified-by: robot-id: fouineur robot-name: Fouineur robot-cover-url: http://fouineur.9bit.qc.ca/ robot-details-url:
http://fouineur.9bit.qc.ca/informations.html robot-owner-name: Joel Vandal robot-owner-url: http://www.9bit.qc.ca/~jvandal/ robot-owner-email: jvandal@9bit.qc.ca robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fouineur robot-noindex: no robot-host: * robot-from: yes robot-useragent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca) robot-language: perl5 robot-description: This robot automatically builds a database that is used by our own search engine. It auto-detects the language (French, English and Spanish) used in the HTML page. Each database record generated by this robot includes: date, URL, title, total words, size and de-HTMLized text. It also supports server-side and client-side IMAGEMAP. robot-history: No existing robot did everything we needed. robot-environment: service modified-date: Thu, 9 Jan 1997 22:57:28 EST modified-by: jvandal@9bit.qc.ca robot-id: francoroute robot-name: Robot Francoroute robot-cover-url: robot-details-url: robot-owner-name: Marc-Antoine Parent robot-owner-url: http://www.crim.ca/~maparent robot-owner-email: maparent@crim.ca robot-status: robot-purpose: indexing, mirroring, statistics robot-type: browser robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: zorro.crim.ca robot-from: yes robot-useragent: Robot du CRIM 1.0a robot-language: perl5, sqlplus robot-description: Part of RISQ's Francoroute project for researching francophone Web resources. Uses the Accept-Language header and reduces demand accordingly. robot-history: robot-environment: modified-date: Wed Jan 10 23:56:22 1996.
modified-by: robot-id: freecrawl robot-name: Freecrawl robot-cover-url: http://euroseek.net/ robot-owner-name: Jesper Ekhall robot-owner-email: ekhall@freeside.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Freecrawl robot-noindex: no robot-host: *.freeside.net robot-from: yes robot-useragent: Freecrawl robot-language: c robot-description: The Freecrawl robot is used to build a database for the EuroSeek service. robot-environment: service robot-id: funnelweb robot-name: FunnelWeb robot-cover-url: http://funnelweb.net.au robot-details-url: robot-owner-name: David Eagles robot-owner-url: http://www.pc.com.au robot-owner-email: eaglesd@pc.com.au robot-status: robot-purpose: indexing, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: earth.planets.com.au robot-from: yes robot-useragent: FunnelWeb-1.0 robot-language: c and c++ robot-description: Its purpose is to generate a Resource Discovery database and statistics. Localised South Pacific Discovery and Search Engine, plus distributed operation under development.
robot-history: robot-environment: modified-date: Mon Nov 27 21:30:11 1995 modified-by: robot-id: gama robot-name: gammaSpider, FocusedCrawler robot-details-url: http://www.gammasite.com, http://www.gammasite.com/gammaSpider.html robot-cover-url: http://www.gammasite.com robot-owner-name: gammasite robot-owner-url: http://www.gammasite.com robot-owner-email: support@gammasite.com robot-status: active robot-purpose: indexing, maintenance robot-type: standalone robot-platform: unix, windows, windows95, windowsNT, linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gammaSpider robot-noindex: no robot-nofollow: no robot-host: * robot-from: no robot-useragent: gammaSpider xxxxxxx ()/ robot-language: c++ robot-description: Information gathering: focused crawling on a specific topic. Uses gammaFetcherServer. The product is for sale. The robot's user agent may be changed by the user. More features are being added; the product is constantly under development. AKA FocusedCrawler robot-history: AKA FocusedCrawler robot-environment: service, commercial, research modified-date: Sun, 25 Mar 2001 18:49:52 GMT robot-id: gazz robot-name: gazz robot-cover-url: http://gazz.nttrd.com/ robot-details-url: http://gazz.nttrd.com/ robot-owner-name: NTT Cyberspace Laboratories robot-owner-url: http://gazz.nttrd.com/ robot-owner-email: gazz@nttrd.com robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gazz robot-noindex: yes robot-host: *.nttrd.com, *.infobee.ne.jp robot-from: yes robot-useragent: gazz/1.0 robot-language: c robot-description: This robot is used for research purposes. robot-history: Its roots are in the TITAN project at NTT.
robot-environment: research modified-date: Wed, 09 Jun 1999 10:43:18 GMT modified-by: noto@isl.ntt.co.jp robot-id: gcreep robot-name: GCreep robot-cover-url: http://www.instrumentpolen.se/gcreep/index.html robot-details-url: http://www.instrumentpolen.se/gcreep/index.html robot-owner-name: Instrumentpolen AB robot-owner-url: http://www.instrumentpolen.se/ip-kontor/eng/index.html robot-owner-email: anders@instrumentpolen.se robot-status: development robot-purpose: indexing robot-type: browser+standalone robot-platform: linux+mysql robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gcreep robot-noindex: yes robot-host: mbx.instrumentpolen.se robot-from: yes robot-useragent: gcreep/1.0 robot-language: c robot-description: Indexing robot to learn SQL robot-history: Spare-time project begun in late '96, maybe early '97 robot-environment: hobby modified-date: Fri, 23 Jan 1998 16:09:00 MET modified-by: Anders Hedstrom robot-id: getbot robot-name: GetBot robot-cover-url: http://www.blacktop.com.zav/bots robot-details-url: robot-owner-name: Alex Zavatone robot-owner-url: http://www.blacktop.com/zav robot-owner-email: zav@macromedia.com robot-status: robot-purpose: maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: ??? robot-language: Shockwave/Director robot-description: GetBot's purpose is to index all the sites it can find that contain Shockwave movies. It is the first bot or spider written in Shockwave. The bot was originally written at Macromedia on a hungover Sunday as a proof of concept. - Alex Zavatone 3/29/96 robot-history: robot-environment: modified-date: Fri Mar 29 20:06:12 1996.
modified-by: robot-id: geturl robot-name: GetURL robot-cover-url: http://Snark.apana.org.au/James/GetURL/ robot-details-url: robot-owner-name: James Burton robot-owner-url: http://Snark.apana.org.au/James/ robot-owner-email: James@Snark.apana.org.au robot-status: robot-purpose: maintenance, mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: no robot-useragent: GetURL.rexx v1.05 robot-language: ARexx (Amiga REXX) robot-description: Its purpose is to validate links, perform mirroring, and copy document trees. Designed as a tool for retrieving web pages in batch mode without the encumbrance of a browser. Can be used to describe a set of pages to fetch and to maintain an archive or mirror. It is not run by a central site and accessed by clients; it is run by the end user or archive maintainer. robot-history: robot-environment: modified-date: Tue May 9 15:13:12 1995 modified-by: robot-id: golem robot-name: Golem robot-cover-url: http://www.quibble.com/golem/ robot-details-url: http://www.quibble.com/golem/ robot-owner-name: Geoff Duncan robot-owner-url: http://www.quibble.com/geoff/ robot-owner-email: geoff@quibble.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: golem robot-noindex: no robot-host: *.quibble.com robot-from: yes robot-useragent: Golem/1.1 robot-language: HyperTalk/AppleScript/C++ robot-description: Golem generates status reports on collections of URLs supplied by clients. Designed to assist with editorial updates of Web-related sites or products. robot-history: Personal project turned into a contract service for private clients.
robot-environment: service, research modified-date: Wed, 16 Apr 1997 20:50:00 GMT modified-by: Geoff Duncan robot-id: googlebot robot-name: Googlebot robot-cover-url: http://www.googlebot.com/ robot-details-url: http://www.googlebot.com/bot.html robot-owner-name: Google Inc. robot-owner-url: http://www.google.com/ robot-owner-email: googlebot@google.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: googlebot robot-noindex: yes robot-host: googlebot.com robot-from: yes robot-useragent: Googlebot/2.X (+http://www.googlebot.com/bot.html) robot-language: c++ robot-description: Google's crawler robot-history: Developed by Google Inc. robot-environment: commercial modified-date: Thu Mar 29 21:00:07 PST 2001 modified-by: googlebot@google.com robot-id: grapnel robot-name: Grapnel/0.01 Experiment robot-cover-url: varies robot-details-url: mailto:v93_kat@ce.kth.se robot-owner-name: Philip Kallerman robot-owner-url: v93_kat@ce.kth.se robot-owner-email: v93_kat@ce.kth.se robot-status: Experimental robot-purpose: Indexing robot-type: robot-platform: WinNT robot-availability: None, yet robot-exclusion: Yes robot-exclusion-useragent: No robot-noindex: No robot-host: varies robot-from: Varies robot-useragent: robot-language: Perl robot-description: Resource Discovery Experimentation robot-history: None, hoping to make some robot-environment: modified-date: 7 Feb 1997 modified-by: robot-id:griffon robot-name:Griffon robot-cover-url:http://navi.ocn.ne.jp/ robot-details-url:http://navi.ocn.ne.jp/griffon/ robot-owner-name:NTT Communications Corporate Users Business Division robot-owner-url:http://navi.ocn.ne.jp/ robot-owner-email:griffon@super.navi.ocn.ne.jp robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:unix robot-availability:none robot-exclusion:yes robot-exclusion-useragent:griffon robot-noindex:yes robot-nofollow:yes 
robot-host:*.navi.ocn.ne.jp robot-from:yes robot-useragent:griffon/1.0 robot-language:c robot-description:The Griffon robot is used to build the database for the OCN navi search service operated by NTT Communications Corporation. It mainly gathers pages written in Japanese. robot-history:Its roots are in the TITAN project at NTT. robot-environment:service modified-date:Mon, 25 Jan 2000 15:25:30 GMT modified-by:toka@navi.ocn.ne.jp robot-id: gromit robot-name: Gromit robot-cover-url: http://www.austlii.edu.au/ robot-details-url: http://www2.austlii.edu.au/~dan/gromit/ robot-owner-name: Daniel Austin robot-owner-url: http://www2.austlii.edu.au/~dan/ robot-owner-email: dan@austlii.edu.au robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Gromit robot-noindex: no robot-host: *.austlii.edu.au robot-from: yes robot-useragent: Gromit/1.0 robot-language: perl5 robot-description: Gromit is a targeted Web spider that indexes legal sites contained in the AustLII legal links database. robot-history: This robot is based on the Perl5 LWP::RobotUA module. robot-environment: research modified-date: Wed, 11 Jun 1997 03:58:40 GMT modified-by: Daniel Austin robot-id: gulliver robot-name: Northern Light Gulliver robot-cover-url: robot-details-url: robot-owner-name: Mike Mulligan robot-owner-url: robot-owner-email: crawler@northernlight.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gulliver robot-noindex: yes robot-host: scooby.northernlight.com, taz.northernlight.com, gulliver.northernlight.com robot-from: yes robot-useragent: Gulliver/1.1 robot-language: c robot-description: Gulliver is a robot used to collect web pages for indexing and subsequent searching of the index.
robot-history: Oct 1996: development; Dec 1996-Jan 1997: crawl & debug; Mar 1997: crawl again; robot-environment: service modified-date: Wed, 21 Apr 1999 16:00:00 GMT modified-by: Mike Mulligan robot-id: gulperbot robot-name: Gulper Bot robot-cover-url: http://yuntis.ecsl.cs.sunysb.edu/ robot-details-url: http://yuntis.ecsl.cs.sunysb.edu/help/robot/ robot-owner-name: Maxim Lifantsev robot-owner-url: http://www.cs.sunysb.edu/~maxim/ robot-owner-email: gulperbot@ecsl.cs.sunysb.edu robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gulper robot-noindex: yes robot-nofollow: yes robot-host: yuntis*.ecsl.cs.sunysb.edu robot-from: no robot-useragent: Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot) robot-language: c++ robot-description: The Gulper Bot is used to collect data for the Yuntis research search engine project. robot-history: Developed in a research project at SUNY Stony Brook. robot-environment: research modified-date: Tue, 28 Aug 2001 21:40:47 GMT modified-by: maxim@cs.sunysb.edu robot-id: hambot robot-name: HamBot robot-cover-url: http://www.hamrad.com/search.html robot-details-url: http://www.hamrad.com/ robot-owner-name: John Dykstra robot-owner-url: robot-owner-email: john@futureone.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix, Windows95 robot-availability: none robot-exclusion: yes robot-exclusion-useragent: hambot robot-noindex: yes robot-host: *.hamrad.com robot-from: robot-useragent: robot-language: perl5, C++ robot-description: Two HamBot robots are used (standalone and browser-based) to aid in building the database for HamRad Search: The Search Engine for Search Engines. The robots are run intermittently and perform nearly identical functions. robot-history: A non-commercial (hobby?)
project to aid in building and maintaining the database for the HamRad search engine. robot-environment: service modified-date: Fri, 17 Apr 1998 21:44:00 GMT modified-by: JD robot-id: harvest robot-name: Harvest robot-cover-url: http://harvest.cs.colorado.edu robot-details-url: robot-owner-name: robot-owner-url: robot-owner-email: robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: bruno.cs.colorado.edu robot-from: yes robot-useragent: yes robot-language: robot-description: Harvest's motivation is to index community- or topic-specific collections, rather than to locate and index all HTML objects that can be found. Also, Harvest allows users to control the enumeration in several ways, including stop lists and depth and count limits. Therefore, Harvest provides a much more controlled way of indexing the Web than is typical of robots. Pauses 1 second between requests by default. robot-history: robot-environment: modified-date: modified-by: robot-id: havindex robot-name: havIndex robot-cover-url: http://www.hav.com/ robot-details-url: http://www.hav.com/ robot-owner-name: hav.Software and Horace A. (Kicker) Vallas robot-owner-url: http://www.hav.com/ robot-owner-email: havIndex@hav.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Java VM 1.1 robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: havIndex robot-noindex: yes robot-host: * robot-from: no robot-useragent: havIndex/X.xx[bxx] robot-language: Java robot-description: havIndex allows individuals to build a searchable word index of (user-specified) lists of URLs. havIndex does not crawl; rather, it requires one or more user-supplied lists of URLs to be indexed. havIndex can optionally save URLs parsed from indexed pages. robot-history: Developed to answer client requests for URL-specific index capabilities.
robot-environment: commercial, service modified-date: 6-27-98 modified-by: Horace A. (Kicker) Vallas robot-id: hi robot-name: HI (HTML Index) Search robot-cover-url: http://cs6.cs.ait.ac.th:21870/pa.html robot-details-url: robot-owner-name: Razzakul Haider Chowdhury robot-owner-url: http://cs6.cs.ait.ac.th:21870/index.html robot-owner-email: a94385@cs.ait.ac.th robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: yes robot-useragent: AITCSRobot/1.1 robot-language: perl 5 robot-description: Its purpose is to generate a Resource Discovery database. This robot traverses the net and creates a searchable database of Web pages. It stores the title string of each HTML document and its absolute URL. A search engine provides Boolean AND and OR query models, with or without stop-word filtering. A feature lets Web page owners add their URL to the searchable database. robot-history: robot-environment: modified-date: Wed Oct 4 06:54:31 1995 modified-by: robot-id: hometown robot-name: Hometown Spider Pro robot-cover-url: http://www.hometownsingles.com robot-details-url: http://www.hometownsingles.com robot-owner-name: Bob Brown robot-owner-url: http://www.hometownsingles.com robot-owner-email: admin@hometownsingles.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: * robot-noindex: yes robot-host: 63.195.193.17 robot-from: no robot-useragent: Hometown Spider Pro robot-language: delphi robot-description: The Hometown Spider Pro is used to maintain the indexes for Hometown Singles.
robot-history: Innerprise URL Spider Pro robot-environment: commercial modified-date: Tue, 28 Mar 2000 16:00:00 GMT modified-by: Hometown Singles robot-id: wired-digital robot-name: Wired Digital robot-cover-url: robot-details-url: robot-owner-name: Bowen Dwelle robot-owner-url: robot-owner-email: bowen@hotwired.com robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: hotwired robot-noindex: no robot-host: gossip.hotwired.com robot-from: yes robot-useragent: wired-digital-newsbot/1.5 robot-language: perl-5.004 robot-description: this is a test robot-history: robot-environment: research modified-date: Thu, 30 Oct 1997 modified-by: bowen@hotwired.com robot-id: htdig robot-name: ht://Dig robot-cover-url: http://www.htdig.org/ robot-details-url: http://www.htdig.org/howitworks.html robot-owner-name: Andrew Scherpbier robot-owner-url: http://www.htdig.org/author.html robot-owner-email: andrew@contigo.com robot-owner-name2: Geoff Hutchison robot-owner-url2: http://wso.williams.edu/~ghutchis/ robot-owner-email2: ghutchis@wso.williams.edu robot-status: robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: htdig robot-noindex: yes robot-host: * robot-from: no robot-useragent: htdig/3.1.0b2 robot-language: C,C++. robot-history:This robot was originally developed for use at San Diego State University. 
robot-environment: modified-date: Tue, 3 Nov 1998 10:09:02 EST modified-by: Geoff Hutchison robot-id: htmlgobble robot-name: HTMLgobble robot-cover-url: robot-details-url: robot-owner-name: Andreas Ley robot-owner-url: robot-owner-email: ley@rz.uni-karlsruhe.de robot-status: robot-purpose: mirror robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: tp70.rz.uni-karlsruhe.de robot-from: yes robot-useragent: HTMLgobble v2.2 robot-language: robot-description: A mirroring robot. It is configured to stay within a directory and sleeps between requests; the next version will use HEAD to check whether the entire document needs to be retrieved. robot-history: robot-environment: modified-date: modified-by: robot-id: hyperdecontextualizer robot-name: Hyper-Decontextualizer robot-cover-url: http://www.tricon.net/Comm/synapse/spider/ robot-details-url: robot-owner-name: Cliff Hall robot-owner-url: http://kpt1.tricon.net/cgi-bin/cliff.cgi robot-owner-email: cliff@tricon.net robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: no robot-language: Perl 5 robot-description: Takes an input sentence and marks up each word with an appropriate hypertext link. robot-history: robot-environment: modified-date: Mon May 6 17:41:29 1996.
modified-by: robot-id: iajabot robot-name: iajaBot robot-cover-url: robot-details-url: http://www.scs.carleton.ca/~morin/iajabot.html robot-owner-name: Pat Morin robot-owner-url: http://www.scs.carleton.ca/~morin/ robot-owner-email: morin@scs.carleton.ca robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: no robot-exclusion-useragent: iajabot robot-noindex: no robot-host: *.scs.carleton.ca robot-from: no robot-useragent: iajaBot/0.1 robot-language: c robot-description: Finds adult content robot-history: None, brand new. robot-environment: research modified-date: Tue, 27 Jun 2000, 11:17:50 EDT modified-by: Pat Morin robot-id: ibm robot-name: IBM_Planetwide robot-cover-url: http://www.ibm.com/%7ewebmaster/ robot-details-url: robot-owner-name: Ed Costello robot-owner-url: http://www.ibm.com/%7ewebmaster/ robot-owner-email: epc@www.ibm.com robot-status: robot-purpose: indexing, maintenance, mirroring robot-type: standalone and robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: www.ibm.com www2.ibm.com robot-from: yes robot-useragent: IBM_Planetwide, robot-language: Perl5 robot-description: Restricted to IBM-owned or related domains. robot-history: robot-environment: modified-date: Mon Jan 22 22:09:19 1996.
modified-by: robot-id: iconoclast robot-name: Popular Iconoclast robot-cover-url: http://gestalt.sewanee.edu/ic/ robot-details-url: http://gestalt.sewanee.edu/ic/info.html robot-owner-name: Chris Cappuccio robot-owner-url: http://sefl.satelnet.org/~ccappuc/ robot-owner-email: chris@gestalt.sewanee.edu robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix (OpenBSD) robot-availability: source robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: gestalt.sewanee.edu robot-from: yes robot-useragent: gestaltIconoclast/1.0 libwww-FM/2.17 robot-language: c,perl5 robot-description: This guy likes statistics robot-history: This robot has a history in mathematics and English robot-environment: research modified-date: Wed, 5 Mar 1997 17:35:16 CST modified-by: chris@gestalt.sewanee.edu robot-id: Ilse robot-name: Ingrid robot-cover-url: robot-details-url: robot-owner-name: Ilse c.v. robot-owner-url: http://www.ilse.nl/ robot-owner-email: ilse@ilse.nl robot-status: Running robot-purpose: Indexing robot-type: Web Indexer robot-platform: UNIX robot-availability: Commercial as part of a search engine package robot-exclusion: Yes robot-exclusion-useragent: INGRID/0.1 robot-noindex: Yes robot-host: bart.ilse.nl robot-from: Yes robot-useragent: INGRID/0.1 robot-language: C robot-description: robot-history: robot-environment: modified-date: 06/13/1997 modified-by: Ilse robot-id: imagelock robot-name: Imagelock robot-cover-url: robot-details-url: robot-owner-name: Ken Belanger robot-owner-url: robot-owner-email: belanger@imagelock.com robot-status: development robot-purpose: maintenance robot-type: robot-platform: windows95 robot-availability: none robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: 209.111.133.* robot-from: no robot-useragent: Mozilla 3.01 PBWF (Win95) robot-language: robot-description: Searches for image links robot-history: robot-environment: service modified-date: Tue, 11 Aug 
1998 17:28:52 GMT modified-by: brian@smithrenaud.com robot-id: incywincy robot-name: IncyWincy robot-cover-url: http://osiris.sunderland.ac.uk/sst-scripts/simon.html robot-details-url: robot-owner-name: Simon Stobart robot-owner-url: http://osiris.sunderland.ac.uk/sst-scripts/simon.html robot-owner-email: simon.stobart@sunderland.ac.uk robot-status: robot-purpose: robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: osiris.sunderland.ac.uk robot-from: yes robot-useragent: IncyWincy/1.0b1 robot-language: C++ robot-description: Various Research projects at the University of Sunderland robot-history: robot-environment: modified-date: Fri Jan 19 21:50:32 1996. modified-by: robot-id: informant robot-name: Informant robot-cover-url: http://informant.dartmouth.edu/ robot-details-url: http://informant.dartmouth.edu/about.html robot-owner-name: Bob Gray robot-owner-name2: Aditya Bhasin robot-owner-name3: Katsuhiro Moizumi robot-owner-name4: Dr. George V. Cybenko robot-owner-url: http://informant.dartmouth.edu/ robot-owner-email: info_adm@cosmo.dartmouth.edu robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: no robot-exclusion-useragent: Informant robot-noindex: no robot-host: informant.dartmouth.edu robot-from: yes robot-useragent: Informant robot-language: c, c++ robot-description: The Informant robot continually checks the Web pages that are relevant to user queries. Users are notified of any new or updated pages. The robot runs daily, but the number of hits per site per day should be quite small, and these hits should be randomly distributed over several hours. Since the robot does not actually follow links (aside from those returned from the major search engines such as Lycos), it does not fall victim to the common looping problems. The robot will support the Robot Exclusion Standard by early December, 1996. 
robot-history: The robot is part of a research project at Dartmouth College. The robot may become part of a commercial service (at which time it may be subsumed by some other, existing robot). robot-environment: research, service modified-date: Sun, 3 Nov 1996 11:55:00 GMT modified-by: Bob Gray robot-id: infoseek robot-name: InfoSeek Robot 1.0 robot-cover-url: http://www.infoseek.com robot-details-url: robot-owner-name: Steve Kirsch robot-owner-url: http://www.infoseek.com robot-owner-email: stk@infoseek.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: corp-gw.infoseek.com robot-from: yes robot-useragent: InfoSeek Robot 1.0 robot-language: python robot-description: Its purpose is to generate a Resource Discovery database. Collects WWW pages for both InfoSeek's free WWW search and commercial search. Uses a unique proprietary algorithm to identify the most popular and interesting WWW pages. Very fast, but never has more than one request per site outstanding at any given time. Has been refined for more than a year. robot-history: robot-environment: modified-date: Sun May 28 01:35:48 1995 modified-by: robot-id: infoseeksidewinder robot-name: Infoseek Sidewinder robot-cover-url: http://www.infoseek.com/ robot-details-url: robot-owner-name: Mike Agostino robot-owner-url: http://www.infoseek.com/ robot-owner-email: mna@infoseek.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Infoseek Sidewinder robot-language: C robot-description: Collects WWW pages for InfoSeek's free WWW search services. Uses a unique, incremental, very fast proprietary algorithm to find WWW pages. robot-history: robot-environment: modified-date: Sat Apr 27 01:20:15 1996.
modified-by: robot-id: infospider robot-name: InfoSpiders robot-cover-url: http://www-cse.ucsd.edu/users/fil/agents/agents.html robot-owner-name: Filippo Menczer robot-owner-url: http://www-cse.ucsd.edu/users/fil/ robot-owner-email: fil@cs.ucsd.edu robot-status: development robot-purpose: search robot-type: standalone robot-platform: unix, mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: InfoSpiders robot-noindex: no robot-host: *.ucsd.edu robot-from: yes robot-useragent: InfoSpiders/0.1 robot-language: c, perl5 robot-description: Application of an artificial life algorithm to adaptive distributed information retrieval robot-history: UC San Diego, Computer Science Dept. PhD research project (1995-97) under the supervision of Prof. Rik Belew robot-environment: research modified-date: Mon, 16 Sep 1996 14:08:00 PDT robot-id: inspectorwww robot-name: Inspector Web robot-cover-url: http://www.greenpac.com/inspector/ robot-details-url: http://www.greenpac.com/inspector/ourrobot.html robot-owner-name: Doug Green robot-owner-url: http://www.greenpac.com robot-owner-email: doug@greenpac.com robot-status: active: robot significantly developed, but still undergoing fixes robot-purpose: maintenance: link validation, HTML validation, image size validation, etc. robot-type: standalone robot-platform: unix robot-availability: free service and more extensive commercial service robot-exclusion: yes robot-exclusion-useragent: inspectorwww robot-noindex: no robot-host: www.corpsite.com, www.greenpac.com, 38.234.171.* robot-from: yes robot-useragent: inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html robot-language: c robot-description: Provides inspection reports that give advice to WWW site owners on missing links, image resize problems, syntax errors, etc.
robot-history: Development started in Mar 1997 robot-environment: commercial modified-date: Tue Jun 17 09:24:58 EST 1997 modified-by: Doug Green robot-id: intelliagent robot-name: IntelliAgent robot-cover-url: http://www.geocities.com/SiliconValley/3086/iagent.html robot-details-url: robot-owner-name: David Reilly robot-owner-url: http://www.geocities.com/SiliconValley/3086/index.html robot-owner-email: s1523@sand.it.bond.edu.au robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: sand.it.bond.edu.au robot-from: no robot-useragent: 'IAGENT/1.0' robot-language: C robot-description: IntelliAgent is still in development. Indeed, it is very far from completion. I'm planning to limit the depth at which it will probe, so hopefully IAgent won't cause anyone much of a problem. Once it is complete, I hope to publish both the raw data and the original source code. robot-history: robot-environment: modified-date: Fri May 31 02:10:39 1996. modified-by: robot-id: irobot robot-name: I, Robot robot-cover-url: http://irobot.mame.dk/ robot-details-url: http://irobot.mame.dk/about.phtml robot-owner-name: [mame.dk] robot-owner-url: http://www.mame.dk/ robot-owner-email: irobot@chaos.dk robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: irobot robot-noindex: yes robot-host: *.mame.dk, 206.161.121.* robot-from: no robot-useragent: I Robot 0.4 (irobot@chaos.dk) robot-language: c robot-description: I Robot is used to build a fresh database for the emulation community. Its primary focus is information on emulation, especially old arcade machines. Primarily English-language sites will be indexed, and only if they have their own domain. Sites are added manually, based on submissions, after they have been evaluated.
robot-history: The robot was started in June 2000 robot-environment1: service robot-environment2: hobby modified-date: Fri, 27 Oct 2000 09:08:06 GMT modified-by: BombJack mameadm@chaos.dk robot-id:iron33 robot-name:Iron33 robot-cover-url:http://verno.ueda.info.waseda.ac.jp/iron33/ robot-details-url:http://verno.ueda.info.waseda.ac.jp/iron33/history.html robot-owner-name:Takashi Watanabe robot-owner-url:http://www.ueda.info.waseda.ac.jp/~watanabe/ robot-owner-email:watanabe@ueda.info.waseda.ac.jp robot-status:active robot-purpose:indexing, statistics robot-type:standalone robot-platform:unix robot-availability:source robot-exclusion:yes robot-exclusion-useragent:Iron33 robot-noindex:no robot-host:*.folon.ueda.info.waseda.ac.jp, 133.9.215.* robot-from:yes robot-useragent:Iron33/0.0 robot-language:c robot-description:The robot "Iron33" is used to build the database for the WWW search engine "Verno". robot-history: robot-environment:research modified-date:Fri, 20 Mar 1998 18:34 JST modified-by:Watanabe Takashi robot-id: israelisearch robot-name: Israeli-search robot-cover-url: http://www.idc.ac.il/Sandbag/ robot-details-url: robot-owner-name: Etamar Laron robot-owner-url: http://www.xpert.com/~etamar/ robot-owner-email: etamar@xpert.co robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: dylan.ius.cs.cmu.edu robot-from: no robot-useragent: IsraeliSearch/1.0 robot-language: C robot-description: Complete software designed to collect information using a distributed workload; it supports context queries. Intended to be a complete, up-to-date resource for Israeli sites and information related to Israel or Israeli society. robot-history: robot-environment: modified-date: Tue Apr 23 19:23:55 1996.
modified-by: robot-id: javabee robot-name: JavaBee robot-cover-url: http://www.javabee.com robot-details-url: robot-owner-name:ObjectBox robot-owner-url:http://www.objectbox.com/ robot-owner-email:info@objectbox.com robot-status:Active robot-purpose:Stealing Java Code robot-type:standalone robot-platform:Java robot-availability:binary robot-exclusion:no robot-exclusion-useragent: robot-noindex:no robot-host:* robot-from:no robot-useragent:JavaBee robot-language:Java robot-description:This robot is used to grab Java applets and run them locally, overriding the implemented security. robot-history: robot-environment:commercial modified-date: modified-by: robot-id: JBot robot-name: JBot Java Web Robot robot-cover-url: http://www.matuschek.net/software/jbot robot-details-url: http://www.matuschek.net/software/jbot robot-owner-name: Daniel Matuschek robot-owner-url: http://www.matuschek.net robot-owner-email: daniel@matuschek.net robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: Java robot-availability: source robot-exclusion: yes robot-exclusion-useragent: JBot robot-noindex: no robot-host: * robot-from: - robot-useragent: JBot (but can be changed by the user) robot-language: Java robot-description: Java web crawler to download web sites. robot-history: - robot-environment: hobby modified-date: Thu, 03 Jan 2000 16:00:00 GMT modified-by: Daniel Matuschek robot-id: jcrawler robot-name: JCrawler robot-cover-url: http://www.nihongo.org/jcrawler/ robot-details-url: robot-owner-name: Benjamin Franz robot-owner-url: http://www.nihongo.org/snowhare/ robot-owner-email: snowhare@netimages.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: jcrawler robot-noindex: yes robot-host: db.netimages.com robot-from: yes robot-useragent: JCrawler/0.2 robot-language: perl5 robot-description: JCrawler is currently used to build the Vietnam
topic-specific WWW index for VietGATE. It schedules visits randomly but will not visit a site more than once every two minutes. It uses a subject-matter relevance pruning algorithm to determine which pages to crawl and index, and will not generally index pages with no Vietnam-related content. It uses Unicode internally and detects and converts several different Vietnamese character encodings. robot-history: robot-environment: service modified-date: Wed, 08 Oct 1997 00:09:52 GMT modified-by: Benjamin Franz robot-id: askjeeves robot-name: AskJeeves robot-cover-url: http://www.ask.com robot-details-url: robot-owner-name: Ask Jeeves, Inc. robot-owner-url: http://www.ask.com robot-owner-email: postmaster@ask.com robot-status: active robot-purpose: indexing, maintenance robot-type: standalone robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: "Teoma" or "Ask Jeeves" or "Jeeves" robot-noindex: Yes robot-host: ez*.directhit.com robot-from: No robot-useragent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma) robot-language: c++ robot-description: Ask Jeeves / Teoma spider robot-history: Developed by Direct Hit Technologies, which was acquired by Ask Jeeves in 2000. robot-environment: service modified-date: Fri Jan 17 15:20:08 EST 2003 modified-by: brucep@ask.com robot-id: jobo robot-name: JoBo Java Web Robot robot-cover-url: http://www.matuschek.net/software/jobo/ robot-details-url: http://www.matuschek.net/software/jobo/ robot-owner-name: Daniel Matuschek robot-owner-url: http://www.matuschek.net robot-owner-email: daniel@matuschek.net robot-status: active robot-purpose: downloading, mirroring, indexing robot-type: standalone robot-platform: unix, windows, os/2, mac robot-availability: source robot-exclusion: yes robot-exclusion-useragent: jobo robot-noindex: no robot-host: * robot-from: yes robot-useragent: JoBo (can be modified by the user) robot-language: java robot-description: JoBo is a web site download tool.
The core web spider can be used for any purpose. robot-history: JoBo was developed as a simple download tool and became a full-featured web spider during development. robot-environment: hobby modified-date: Fri, 20 Apr 2001 17:00:00 GMT modified-by: Daniel Matuschek robot-id: jobot robot-name: Jobot robot-cover-url: http://www.micrognosis.com/~ajack/jobot/jobot.html robot-details-url: robot-owner-name: Adam Jack robot-owner-url: http://www.micrognosis.com/~ajack/index.html robot-owner-email: ajack@corp.micrognosis.com robot-status: inactive robot-purpose: robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: supernova.micrognosis.com robot-from: yes robot-useragent: Jobot/0.1alpha libwww-perl/4.0 robot-language: perl 4 robot-description: Its purpose is to generate a Resource Discovery database. Intended to seek out sites of potential "career interest", hence the name Job Robot. robot-history: robot-environment: modified-date: Tue Jan 9 18:55:55 1996 modified-by: robot-id: joebot robot-name: JoeBot robot-cover-url: robot-details-url: robot-owner-name: Ray Waldin robot-owner-url: http://www.primenet.com/~rwaldin robot-owner-email: rwaldin@primenet.com robot-status: robot-purpose: robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: JoeBot/x.x, robot-language: java robot-description: JoeBot is a generic web crawler implemented as a collection of Java classes which can be used in a variety of applications, including resource discovery, link validation, mirroring, etc. It currently limits itself to one visit per host per minute. robot-history: robot-environment: modified-date: Sun May 19 08:13:06 1996.
modified-by: robot-id: jubii robot-name: The Jubii Indexing Robot robot-cover-url: http://www.jubii.dk/robot/default.htm robot-details-url: robot-owner-name: Jakob Faarvang robot-owner-url: http://www.cybernet.dk/staff/jakob/ robot-owner-email: jakob@jubii.dk robot-status: robot-purpose: indexing, maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: any host in the cybernet.dk domain robot-from: yes robot-useragent: JubiiRobot/version# robot-language: visual basic 4.0 robot-description: Its purpose is to generate a Resource Discovery database and validate links. Used for indexing the .dk top-level domain, as well as other Danish sites, for a Danish web database, and for link validation. robot-history: Will be in constant operation from spring 1996 robot-environment: modified-date: Sat Jan 6 20:58:44 1996 modified-by: robot-id: jumpstation robot-name: JumpStation robot-cover-url: http://js.stir.ac.uk/jsbin/jsii robot-details-url: robot-owner-name: Jonathon Fletcher robot-owner-url: http://www.stir.ac.uk/~jf1 robot-owner-email: j.fletcher@stirling.ac.uk robot-status: retired robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: *.stir.ac.uk robot-from: yes robot-useragent: jumpstation robot-language: perl, C, c++ robot-description: robot-history: Originated as a weekend project in 1993. robot-environment: modified-date: Tue May 16 00:57:42 1995.
modified-by: robot-id: kapsi robot-name: image.kapsi.net robot-cover-url: http://image.kapsi.net/ robot-details-url: http://image.kapsi.net/index.php?page=robot robot-owner-name: Jaakko Heusala robot-owner-url: http://huoh.kapsi.net/ robot-owner-email: Jaakko.Heusala@kapsi.net robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: image.kapsi.net robot-noindex: no robot-host: addr-212-50-142-138.suomi.net robot-from: yes robot-useragent: image.kapsi.net/1.0 robot-language: perl robot-description: The image.kapsi.net robot is used to build the database for the image.kapsi.net search service. The robot currently runs at random times. robot-history: The robot was built for image.kapsi.net's database in 2001. robot-environment: hobby, research modified-date: Thu, 13 Dec 2001 23:28:23 EET modified-by: robot-id: katipo robot-name: Katipo robot-cover-url: http://www.vuw.ac.nz/~newbery/Katipo.html robot-details-url: http://www.vuw.ac.nz/~newbery/Katipo/Katipo-doc.html robot-owner-name: Michael Newbery robot-owner-url: http://www.vuw.ac.nz/~newbery robot-owner-email: Michael.Newbery@vuw.ac.nz robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: Macintosh robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: Katipo/1.0 robot-language: c robot-description: Watches all the pages you have previously visited and tells you when they have changed.
robot-history: robot-environment: commercial (free) modified-date: Tue, 25 Jun 96 11:40:07 +1200 modified-by: Michael Newbery robot-id: kdd robot-name: KDD-Explorer robot-cover-url: http://mlc.kddvw.kcom.or.jp/CLINKS/html/clinks.html robot-details-url: not available robot-owner-name: Kazunori Matsumoto robot-owner-url: not available robot-owner-email: matsu@lab.kdd.co.jp robot-status: development (to be active in June 1997) robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent:KDD-Explorer robot-noindex: no robot-host: mlc.kddvw.kcom.or.jp robot-from: yes robot-useragent: KDD-Explorer/0.1 robot-language: c robot-description: KDD-Explorer is used for indexing valuable documents which will be retrieved via an experimental cross-language search engine, CLINKS. robot-history: This robot was designed in the Knowledge-bases Information processing Laboratory, KDD R&D Laboratories, 1996-1997 robot-environment: research modified-date: Mon, 2 June 1997 18:00:00 JST modified-by: Kazunori Matsumoto robot-id:kilroy robot-name:Kilroy robot-cover-url:http://purl.org/kilroy robot-details-url:http://purl.org/kilroy robot-owner-name:OCLC robot-owner-url:http://www.oclc.org robot-owner-email:kilroy@oclc.org robot-status:active robot-purpose:indexing,statistics robot-type:standalone robot-platform:unix,windowsNT robot-availability:none robot-exclusion:yes robot-exclusion-useragent:* robot-noindex:no robot-host:*.oclc.org robot-from:no robot-useragent:yes robot-language:java robot-description:Used to collect data for several projects. Runs constantly and visits each site no more often than once every 90 seconds.
robot-history:none robot-environment:research,service modified-date:Thursday, 24 Apr 1997 20:00:00 GMT modified-by:tkac robot-id: ko_yappo_robot robot-name: KO_Yappo_Robot robot-cover-url: http://yappo.com/info/robot.html robot-details-url: http://yappo.com/ robot-owner-name: Kazuhiro Osawa robot-owner-url: http://yappo.com/ robot-owner-email: office_KO@yappo.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: ko_yappo_robot robot-noindex: yes robot-host: yappo.com,209.25.40.1 robot-from: yes robot-useragent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html) robot-language: perl robot-description: The KO_Yappo_Robot robot is used to build the database for the Yappo search service by k,osawa (part of AOL). The robot runs on random days and visits sites in a random order. robot-history: The robot is a hobby project of k,osawa, started in Tokyo in 1997 robot-environment: hobby modified-date: Fri, 18 Jul 1996 12:34:21 GMT modified-by: KO robot-id: labelgrabber.txt robot-name: LabelGrabber robot-cover-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm robot-details-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm robot-owner-name: Kyle Jamieson robot-owner-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm robot-owner-email: jamieson@mit.edu robot-status: active robot-purpose: Grabs PICS labels from web pages and submits them to a label bureau robot-type: standalone robot-platform: windows, windows95, windowsNT, unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: label-grabber robot-noindex: no robot-host: head.w3.org robot-from: no robot-useragent: LabelGrab/1.1 robot-language: java robot-description: The label grabber searches for PICS labels and submits them to a label bureau robot-history: N/A robot-environment: research modified-date: Wed, 28 Jan 1998 17:32:52 GMT modified-by: jamieson@mit.edu robot-id: larbin
robot-name: larbin robot-cover-url: http://para.inria.fr/~ailleret/larbin/index-eng.html robot-owner-name: Sebastien Ailleret robot-owner-url: http://para.inria.fr/~ailleret/ robot-owner-email: sebastien.ailleret@inria.fr robot-status: active robot-purpose: Your imagination is the only limit robot-type: standalone robot-platform: Linux robot-availability: source (GPL), mail me for customization robot-exclusion: yes robot-exclusion-useragent: larbin robot-noindex: no robot-host: * robot-from: no robot-useragent: larbin (+mail) robot-language: c++ robot-description: Crawling the web: that is my passion robot-history: French research group (INRIA Verso) robot-environment: hobby modified-date: 2000-3-28 modified-by: Sebastien Ailleret robot-id: legs robot-name: legs robot-cover-url: http://www.MagPortal.com/ robot-details-url: robot-owner-name: Bill Dimm robot-owner-url: http://www.HotNeuron.com/ robot-owner-email: admin@magportal.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: legs robot-noindex: no robot-host: robot-from: yes robot-useragent: legs robot-language: perl5 robot-description: The legs robot is used to build the magazine article database for MagPortal.com.
robot-history: robot-environment: service modified-date: Wed, 22 Mar 2000 14:10:49 GMT modified-by: Bill Dimm robot-id: linkidator robot-name: Link Validator robot-cover-url: robot-details-url: robot-owner-name: Thomas Gimon robot-owner-url: robot-owner-email: tgimon@mitre.org robot-status: development robot-purpose: maintenance robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Linkidator robot-noindex: yes robot-nofollow: yes robot-host: *.mitre.org robot-from: yes robot-useragent: Linkidator/0.93 robot-language: perl5 robot-description: Recursively checks all links on a site, looking for broken or redirected links. Checks all off-site links using HEAD requests and does not progress further. Designed to behave well and to be very configurable. robot-history: Built using the WWW-Robot-0.022 Perl module. Currently in beta test. Seeking approval for public release. robot-environment: internal modified-date: Fri, 20 Jan 2001 02:22:00 EST modified-by: Thomas Gimon robot-id:linkscan robot-name:LinkScan robot-cover-url:http://www.elsop.com/ robot-details-url:http://www.elsop.com/linkscan/overview.html robot-owner-name:Electronic Software Publishing Corp. (Elsop) robot-owner-url:http://www.elsop.com/ robot-owner-email:sales@elsop.com robot-status:Robot actively in use robot-purpose:Link checker, SiteMapper, and HTML Validator robot-type:Standalone robot-platform:Unix, Linux, Windows 98/NT robot-availability:Program is shareware robot-exclusion:No robot-exclusion-useragent: robot-noindex:Yes robot-host:* robot-from: robot-useragent:LinkScan Server/5.5 | LinkScan Workstation/5.5 robot-language:perl5 robot-description:LinkScan checks links, validates HTML and creates site maps robot-history: First developed by Elsop in January 1997 robot-environment:Commercial modified-date:Fri, 3 September 1999 17:00:00 PDT modified-by: Kenneth R.
Churilla robot-id: linkwalker robot-name: LinkWalker robot-cover-url: http://www.seventwentyfour.com robot-details-url: http://www.seventwentyfour.com/tech.html robot-owner-name: Roy Bryant robot-owner-url: robot-owner-email: rbryant@seventwentyfour.com robot-status: active robot-purpose: maintenance, statistics robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: linkwalker robot-noindex: yes robot-host: *.seventwentyfour.com robot-from: yes robot-useragent: LinkWalker robot-language: c++ robot-description: LinkWalker generates a database of links. We send reports of bad ones to webmasters. robot-history: Constructed late 1997 through April 1998. In full service April 1998. robot-environment: service modified-date: Wed, 22 Apr 1998 modified-by: Roy Bryant robot-id:lockon robot-name:Lockon robot-cover-url: robot-details-url: robot-owner-name:Seiji Sasazuka & Takahiro Ohmori robot-owner-url: robot-owner-email:search@rsch.tuis.ac.jp robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:UNIX robot-availability:none robot-exclusion:yes robot-exclusion-useragent:Lockon robot-noindex:yes robot-host:*.hitech.tuis.ac.jp robot-from:yes robot-useragent:Lockon/xxxxx robot-language:perl5 robot-description:This robot gathers only HTML documents. robot-history:This robot was developed at the Tokyo University of Information Sciences in 1998. robot-environment:research modified-date:Tue.
10 Nov 1998 20:00:00 GMT modified-by:Seiji Sasazuka & Takahiro Ohmori robot-id:logo_gif robot-name: logo.gif Crawler robot-cover-url: http://www.inm.de/projects/logogif.html robot-details-url: robot-owner-name: Sevo Stille robot-owner-url: http://www.inm.de/people/sevo robot-owner-email: sevo@inm.de robot-status: under development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: logo_gif_crawler robot-noindex: no robot-host: *.inm.de robot-from: yes robot-useragent: logo.gif crawler robot-language: perl robot-description: Meta-indexing engine for corporate logo graphics. The robot runs at irregular intervals and will only pull a start page and any associated image matching /.*logo\.gif/i (if any). It will be terminated once a statistically significant number of samples has been collected. robot-history: logo.gif is part of the design diploma of Markus Weisbeck, and tries to analyze the abundance of the logo metaphor in WWW corporate design. The crawler and image database were written by Sevo Stille and Peter Frank of the Institut für Neue Medien, respectively. robot-environment: research, statistics modified-date: 25.5.97 modified-by: Sevo Stille robot-id: lycos robot-name: Lycos robot-cover-url: http://lycos.cs.cmu.edu/ robot-details-url: robot-owner-name: Dr. Michael L.
Mauldin robot-owner-url: http://fuzine.mt.cs.cmu.edu/mlm/home.html robot-owner-email: fuzzy@cmu.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: fuzine.mt.cs.cmu.edu, lycos.com robot-from: robot-useragent: Lycos/x.x robot-language: robot-description: This is a research program providing information retrieval and discovery on the WWW, using a finite memory model of the web to guide intelligent, directed searches for specific information needs. robot-history: robot-environment: modified-date: modified-by: robot-id: macworm robot-name: Mac WWWWorm robot-cover-url: robot-details-url: robot-owner-name: Sebastien Lemieux robot-owner-url: robot-owner-email: lemieuse@ERE.UMontreal.CA robot-status: robot-purpose: indexing robot-type: robot-platform: Macintosh robot-availability: none robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: hypercard robot-description: A French keyword-searching robot for the Mac. The author has decided not to release this robot to the public. robot-history: robot-environment: modified-date: modified-by: robot-id: magpie robot-name: Magpie robot-cover-url: robot-details-url: robot-owner-name: Keith Jones robot-owner-url: robot-owner-email: Keith.Jones@blueberry.co.uk robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: *.blueberry.co.uk, 194.70.52.*, 193.131.167.144 robot-from: no robot-useragent: Magpie/1.0 robot-language: perl5 robot-description: Used to obtain information from a specified list of web pages for local indexing. Runs every two hours and visits only a small number of sites. robot-history: Part of a research project. Alpha testing from 10 July 1996, beta testing from 10 September.
robot-environment: research modified-date: Wed, 10 Oct 1996 13:15:00 GMT modified-by: Keith Jones robot-id: marvin robot-name: marvin/infoseek robot-details-url: robot-cover-url: http://www.infoseek.de/ robot-owner-name: WSI Webseek Infoservice GmbH & Co KG. robot-owner-url: http://www.infoseek.de/ robot-owner-email: marvin-team@webseek.de robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: marvin robot-noindex: yes robot-nofollow: yes robot-host: arthur*.sda.t-online.de robot-from: yes robot-useragent: marvin/infoseek (marvin-team@webseek.de) robot-language: java robot-description: robot-history: day of birth: 4.2. 2001; replaces Infoseek Sidewinder robot-environment: commercial modified-date: Fri, 11 May 2001 17:28:52 GMT robot-id: mattie robot-name: Mattie robot-cover-url: http://www.mcw.aarkayn.org robot-details-url: http://www.mcw.aarkayn.org/web/mattie.asp robot-owner-name: Matt robot-owner-url: http://www.mcw.aarkayn.org robot-owner-email: matt@mcw.aarkayn.org robot-status: Active robot-purpose: Procurement Spider robot-type: Standalone robot-platform: UNIX robot-availability: None robot-exclusion: Yes robot-exclusion-useragent: mattie robot-noindex: N/A robot-nofollow: Yes robot-host: mattie.mcw.aarkayn.org robot-from: Yes robot-useragent: M/3.8 robot-language: C++ robot-description: Mattie is an all-source procurement spider. robot-history: Created 2000 Mar. 03 Fri. 18:48:16 -0500 GMT (R) as an MP3 spider, Mattie was reborn 2002 Jul. 07 Sun. 03:47:29 -0500 GMT (R) as an all-source procurement spider.
robot-environment: Hobby modified-date: Fri, 13 Sep 2002 00:36:13 GMT modified-by: Matt robot-id: mediafox robot-name: MediaFox robot-cover-url: none robot-details-url: none robot-owner-name: Lars Eilebrecht robot-owner-url: http://www.home.unix-ag.org/sfx/ robot-owner-email: sfx@uni-media.de robot-status: development robot-purpose: indexing and maintenance robot-type: standalone robot-platform: (Java) robot-availability: none robot-exclusion: yes robot-exclusion-useragent: mediafox robot-noindex: yes robot-host: 141.99.*.* robot-from: yes robot-useragent: MediaFox/x.y robot-language: Java robot-description: The robot is used to index meta information of a specified set of documents and update a database accordingly. robot-history: Project at the University of Siegen robot-environment: research modified-date: Fri Aug 14 03:37:56 CEST 1998 modified-by: Lars Eilebrecht robot-id:merzscope robot-name:MerzScope robot-cover-url:http://www.merzcom.com robot-details-url:http://www.merzcom.com robot-owner-name:(Client based robot) robot-owner-url:(Client based robot) robot-owner-email: robot-status:actively in use robot-purpose:WebMapping robot-type:standalone robot-platform: (Java Based) unix,windows95,windowsNT,os2,mac etc. robot-availability:binary robot-exclusion: yes robot-exclusion-useragent: MerzScope robot-noindex: no robot-host:(Client Based) robot-from: robot-useragent: MerzScope robot-language: java robot-description: The robot is part of a web-mapping package called MerzScope, to be used mainly by consultants and webmasters to create and publish maps on and of the World Wide Web.
robot-history: robot-environment: modified-date: Fri, 13 March 1997 16:31:00 modified-by: Philip Lenir, MerzScope lead developer robot-id: meshexplorer robot-name: NEC-MeshExplorer robot-cover-url: http://netplaza.biglobe.or.jp/ robot-details-url: http://netplaza.biglobe.or.jp/keyword.html robot-owner-name: web search service maintenance group robot-owner-url: http://netplaza.biglobe.or.jp/keyword.html robot-owner-email: web-dir@mxa.meshnet.or.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: NEC-MeshExplorer robot-noindex: no robot-host: meshsv300.tk.mesh.ad.jp robot-from: yes robot-useragent: NEC-MeshExplorer robot-language: c robot-description: The NEC-MeshExplorer robot is used to build the database for the NETPLAZA search service operated by NEC Corporation. The robot searches URLs around sites in Japan (JP domain). The robot runs every day and visits sites in a random order. robot-history: A prototype version of this robot was developed at C&C Research Laboratories, NEC Corporation. The current robot (version 1.0) is based on the prototype and has more functions.
robot-environment: research modified-date: Jan 1, 1997 modified-by: Nobuya Kubo, Hajime Takano robot-id: MindCrawler robot-name: MindCrawler robot-cover-url: http://www.mindpass.com/_technology_faq.htm robot-details-url: robot-owner-name: Mindpass robot-owner-url: http://www.mindpass.com/ robot-owner-email: support@mindpass.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: MindCrawler robot-noindex: no robot-host: * robot-from: no robot-useragent: MindCrawler robot-language: c++ robot-description: robot-history: robot-environment: modified-date: Tue Mar 28 11:30:09 CEST 2000 modified-by: robot-id: mnogosearch robot-name: mnoGoSearch search engine software robot-cover-url: http://www.mnogosearch.org robot-details-url: http://www.mnogosearch.org/features.html robot-owner-name: Lavtech.com corp. robot-owner-url: http://www.mnogosearch.org robot-owner-email: support@mnogosearch.org robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix, windows, mac robot-availability: source robot-exclusion: yes robot-exclusion-useragent: udmsearch robot-noindex: yes robot-host: * robot-from: no robot-useragent: UdmSearch robot-language: c robot-description: mnoGoSearch search engine software (formerly known as UDMSearch) is an advanced search solution for large-scale websites and intranets. It is based on an SQL database and supports numerous features. robot-history: Formerly known as UDMSearch, it was developed as the search engine for the Russian republic of Udmurtia.
robot-environment: commercial modified-date: Wed, 12 Sept 2001 modified-by: Dmitry Tkatchenko robot-id:moget robot-name:moget robot-cover-url: robot-details-url: robot-owner-name:NTT-ME Information Xing, Inc robot-owner-url:http://www.nttx.co.jp robot-owner-email:moget@goo.ne.jp robot-status:active robot-purpose:indexing,statistics robot-type:standalone robot-platform:unix robot-availability:none robot-exclusion:yes robot-exclusion-useragent:moget robot-noindex:yes robot-host:*.goo.ne.jp robot-from:yes robot-useragent:moget/1.0 robot-language:c robot-description: This robot is used to build the database for the search service operated by goo. robot-history: robot-environment:service modified-date:Thu, 30 Mar 2000 18:40:37 GMT modified-by:moget@goo.ne.jp robot-id: momspider robot-name: MOMspider robot-cover-url: http://www.ics.uci.edu/WebSoft/MOMspider/ robot-details-url: robot-owner-name: Roy T. Fielding robot-owner-url: http://www.ics.uci.edu/dir/grad/Software/fielding robot-owner-email: fielding@ics.uci.edu robot-status: active robot-purpose: maintenance, statistics robot-type: standalone robot-platform: UNIX robot-availability: source robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: MOMspider/1.00 libwww-perl/0.40 robot-language: perl 4 robot-description: Used to validate links and generate statistics. It may be run from any host. robot-history: Originated as a research project at the University of California, Irvine, in 1993. Presented at the First International WWW Conference in Geneva, 1994.
robot-environment: modified-date: Sat May 6 08:11:58 1995 modified-by: fielding@ics.uci.edu robot-id: monster robot-name: Monster robot-cover-url: http://www.neva.ru/monster.list/russian.www.html robot-details-url: robot-owner-name: Dmitry Dicky robot-owner-url: http://wild.stu.neva.ru/ robot-owner-email: diwil@wild.stu.neva.ru robot-status: active robot-purpose: maintenance, mirroring robot-type: standalone robot-platform: UNIX (Linux) robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: wild.stu.neva.ru robot-from: robot-useragent: Monster/vX.X.X -$TYPE ($OSTYPE) robot-language: C robot-description: The Monster has two parts: a Web searcher and a Web analyzer. The searcher is intended to compile the list of WWW sites in a desired domain (for example, it can compile a list of all WWW sites in the mit.edu, com, org, etc. domains). In the User-Agent field, $TYPE is set to 'Mapper' for the Web searcher and 'StAlone' for the Web analyzer. robot-history: The full (I suppose) list of ex-USSR sites has now been produced. robot-environment: modified-date: Tue Jun 25 10:03:36 1996 modified-by: robot-id: motor robot-name: Motor robot-cover-url: http://www.cybercon.de/Motor/index.html robot-details-url: robot-owner-name: Mr. Oliver Runge, Mr. Michael Goeckel robot-owner-url: http://www.cybercon.de/index.html robot-owner-email: Motor@cybercon.technopark.gmd.de robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: mac robot-availability: data robot-exclusion: yes robot-exclusion-useragent: Motor robot-noindex: no robot-host: Michael.cybercon.technopark.gmd.de robot-from: yes robot-useragent: Motor/0.2 robot-language: 4th Dimension robot-description: The Motor robot is used to build the database for the www.webindex.de search service operated by CyberCon.
The robot is under development; it runs at random intervals and visits sites in a priority-driven order (.de/.ch/.at first, root and robots.txt first) robot-history: robot-environment: service modified-date: Wed, 3 Jul 1996 15:30:00 +0100 modified-by: Michael Goeckel (Michael@cybercon.technopark.gmd.de) robot-id: msnbot robot-name: MSNBot robot-cover-url: http://search.msn.com robot-details-url: http://search.msn.com/msnbot.htm robot-owner-name: Microsoft Corp. robot-owner-url: http://www.microsoft.com robot-owner-email: msnbot@microsoft.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Windows Server 2000, Windows Server 2003 robot-availability: none robot-exclusion: yes robot-exclusion-useragent: msnbot robot-noindex: yes robot-host: robot-from: yes robot-useragent: MSNBOT/0.1 (http://search.msn.com/msnbot.htm) robot-language: C++ robot-description: MSN Search Crawler robot-history: Developed by Microsoft Corp. robot-environment: commercial modified-date: June 23, 2003 modified-by: msnbot@microsoft.com robot-id: muncher robot-name: Muncher robot-details-url: http://www.goodlookingcooking.co.uk/info.htm robot-cover-url: http://www.goodlookingcooking.co.uk robot-owner-name: Chris Ridings robot-owner-url: http://www.goodlookingcooking.co.uk robot-owner-email: muncher@ridings.org.uk robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: muncher robot-noindex: yes robot-nofollow: yes robot-host: www.goodlookingcooking.co.uk robot-from: no robot-useragent: yes robot-language: perl robot-description: Used to build the index for www.goodlookingcooking.co.uk. Seeks out cooking and recipe pages.
robot-history: Private project September 2001 robot-environment: hobby modified-date: Wed, 5 Sep 2001 19:21:00 GMT robot-id: muninn robot-name: Muninn robot-cover-url: http://people.freenet.de/Muninn/eyrie.html robot-details-url: http://people.freenet.de/Muninn/ robot-owner-name: Sandra Groth robot-owner-url: http://santana.dynalias.net/ robot-owner-email: muninn_bot@gmx.net robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source, data robot-exclusion: yes robot-exclusion-useragent: muninn robot-noindex: yes robot-nofollow: yes robot-host: santana.dynalias.net, 80.185.*, * robot-from: yes robot-useragent: Muninn/0.1 libwww-perl-5.76 (http://people.freenet.de/Muninn/) robot-language: Perl5 robot-description: Muninn looks at museums within my reach and tells me about current exhibitions. robot-history: It's hard to keep track of things. Automation helps. robot-environment: hobby modified-date: Thu Jun 3 16:36:47 CEST 2004 modified-by: Sandra Groth robot-id: muscatferret robot-name: Muscat Ferret robot-cover-url: http://www.muscat.co.uk/euroferret/ robot-details-url: robot-owner-name: Olly Betts robot-owner-url: http://www.muscat.co.uk/~olly/ robot-owner-email: olly@muscat.co.uk robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: MuscatFerret robot-noindex: yes robot-host: 193.114.89.*, 194.168.54.11 robot-from: yes robot-useragent: MuscatFerret/ robot-language: c, perl5 robot-description: Used to build the database for the EuroFerret robot-history: robot-environment: service modified-date: Tue, 21 May 1997 17:11:00 GMT modified-by: olly@muscat.co.uk robot-id: mwdsearch robot-name: Mwd.Search robot-cover-url: (none) robot-details-url: (none) robot-owner-name: Antti Westerberg robot-owner-url: (none) robot-owner-email: Antti.Westerberg@mwd.sci.fi robot-status: active robot-purpose: indexing
robot-type: standalone robot-platform: unix (Linux) robot-availability: none robot-exclusion: yes robot-exclusion-useragent: MwdSearch robot-noindex: yes robot-host: *.fifi.net robot-from: no robot-useragent: MwdSearch/0.1 robot-language: perl5, c robot-description: Robot for indexing Finnish (top-level domain .fi) web pages for the search engine called Fifi. Visits sites in random order. robot-history: (none) robot-environment: service (+ commercial) modified-date: Mon, 26 May 1997 15:55:02 EEST modified-by: Antti.Westerberg@mwd.sci.fi robot-id: myweb robot-name: Internet Shinchakubin robot-cover-url: http://naragw.sharp.co.jp/myweb/home/ robot-details-url: robot-owner-name: SHARP Corp. robot-owner-url: http://naragw.sharp.co.jp/myweb/home/ robot-owner-email: shinchakubin-request@isl.nara.sharp.co.jp robot-status: active robot-purpose: find new links and changed pages robot-type: standalone robot-platform: Windows98 robot-availability: binary as bundled software robot-exclusion: yes robot-exclusion-useragent: sharp-info-agent robot-noindex: no robot-host: * robot-from: no robot-useragent: User-Agent: Mozilla/4.0 (compatible; sharp-info-agent v1.0; ) robot-language: Java robot-description: Makes a list of new links and changed pages based on the user's most frequently clicked pages in the past 31 days. The client may run this software one or a few times every day, either manually or at a specified time.
robot-history: shipped for SHARP's PC users since Feb 2000 robot-environment: commercial modified-date: Fri, 30 Jun 2000 19:02:52 JST modified-by: Katsuo Doi robot-id: NDSpider robot-name: NDSpider robot-cover-url: http://www.NationalDirectory.com/addurl robot-details-url: http://www.NationalDirectory.com/addurl robot-owner-name: NationalDirectory.com robot-owner-url: http://www.NationalDirectory.com robot-owner-email: dns3@NationalDirectory.com robot-status: Active robot-purpose: Indexing robot-type: Standalone robot-platform: Unix platform robot-availability: None robot-exclusion: Yes robot-exclusion-useragent: robot-noindex: robot-host: Blowfish.NationalDirectory.net robot-from: robot-useragent: NDSpider/1.5 robot-language: C robot-description: It is designed to index the web. robot-history: Development started on 05 December 1996 robot-environment: UNIX modified-date: 14 March 2004 modified-by: robot-id: netcarta robot-name: NetCarta WebMap Engine robot-cover-url: http://www.netcarta.com/ robot-details-url: robot-owner-name: NetCarta WebMap Engine robot-owner-url: http://www.netcarta.com/ robot-owner-email: info@netcarta.com robot-status: robot-purpose: indexing, maintenance, mirroring, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: NetCarta CyberPilot Pro robot-language: C++. robot-description: The NetCarta WebMap Engine is a general-purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider that works with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information. If the robot sees a published map, it will return the published map rather than continuing its crawl.
Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request out to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID. robot-history: robot-environment: modified-date: Sun Feb 18 02:02:49 1996. modified-by: robot-id: netmechanic robot-name: NetMechanic robot-cover-url: http://www.netmechanic.com robot-details-url: http://www.netmechanic.com/faq.html robot-owner-name: Tom Dahm robot-owner-url: http://iquest.com/~tdahm robot-owner-email: tdahm@iquest.com robot-status: development robot-purpose: Link and HTML validation robot-type: standalone with web gateway robot-platform: UNIX robot-availability: via web page robot-exclusion: Yes robot-exclusion-useragent: WebMechanic robot-noindex: no robot-host: 206.26.168.18 robot-from: no robot-useragent: NetMechanic robot-language: C robot-description: NetMechanic is a link validation and HTML validation robot run using a web page interface. robot-history: robot-environment: modified-date: Sat, 17 Aug 1996 12:00:00 GMT modified-by: robot-id: netscoop robot-name: NetScoop robot-cover-url: http://www-a2k.is.tokushima-u.ac.jp/search/index.html robot-owner-name: Kenji Kita robot-owner-url: http://www-a2k.is.tokushima-u.ac.jp/member/kita/index.html robot-owner-email: kita@is.tokushima-u.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: none robot-exclusion: yes robot-exclusion-useragent: NetScoop robot-host: alpha.is.tokushima-u.ac.jp, beta.is.tokushima-u.ac.jp robot-useragent: NetScoop/1.0 libwww/5.0a robot-language: C robot-description: The NetScoop robot is used to build the database for the NetScoop search engine. 
robot-history: The robot has been used in a research project at the Faculty of Engineering, Tokushima University, Japan, since Dec. 1996. robot-environment: research modified-date: Fri, 10 Jan 1997. modified-by: Kenji Kita robot-id: newscan-online robot-name: newscan-online robot-cover-url: http://www.newscan-online.de/ robot-details-url: http://www.newscan-online.de/info.html robot-owner-name: Axel Mueller robot-owner-url: robot-owner-email: mueller@newscan-online.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Linux robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: newscan-online robot-noindex: no robot-host: *newscan-online.de robot-from: yes robot-useragent: newscan-online/1.1 robot-language: perl robot-description: The newscan-online robot is used to build a database for the newscan-online news search service operated by smart information services. The robot runs daily and visits predefined sites in a random order. robot-history: This robot has its roots in pre-release news-filtering software for Lotus Notes from 1995.
robot-environment: service modified-date: Fri, 9 Apr 1999 11:45:00 GMT modified-by: Axel Mueller robot-id: nhse robot-name: NHSE Web Forager robot-cover-url: http://nhse.mcs.anl.gov/ robot-details-url: robot-owner-name: Robert Olson robot-owner-url: http://www.mcs.anl.gov/people/olson/ robot-owner-email: olson@mcs.anl.gov robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: *.mcs.anl.gov robot-from: yes robot-useragent: NHSEWalker/3.0 robot-language: perl 5 robot-description: to generate a Resource Discovery database robot-history: robot-environment: modified-date: Fri May 5 15:47:55 1995 modified-by: robot-id: nomad robot-name: Nomad robot-cover-url: http://www.cs.colostate.edu/~sonnen/projects/nomad.html robot-details-url: robot-owner-name: Richard Sonnen robot-owner-url: http://www.cs.colostate.edu/~sonnen/ robot-owner-email: sonnen@cs.colostat.edu robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: *.cs.colostate.edu robot-from: no robot-useragent: Nomad-V2.x robot-language: Perl 4 robot-description: robot-history: Developed in 1995 at Colorado State University. robot-environment: modified-date: Sat Jan 27 21:02:20 1996. 
modified-by: robot-id: northstar robot-name: The NorthStar Robot robot-cover-url: http://comics.scs.unr.edu:7000/top.html robot-details-url: robot-owner-name: Fred Barrie robot-owner-url: robot-owner-email: barrie@unr.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: frognot.utdallas.edu, utdallas.edu, cnidir.org robot-from: yes robot-useragent: NorthStar robot-language: robot-description: Recent runs (26 April 94) will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data) as well as indexing. robot-history: robot-environment: modified-date: modified-by: robot-id: objectssearch robot-name: ObjectsSearch robot-cover-url: http://www.ObjectsSearch.com/ robot-details-url: robot-owner-name: Software Objects, Inc robot-owner-url: http://www.thesoftwareobjects.com/ robot-owner-email: support@thesoftwareobjects.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: ObjectsSearch robot-noindex: yes robot-host: robot-from: yes robot-useragent: ObjectsSearch/0.01 robot-language: java robot-description: Objects Search Spider robot-history: Developed by Software Objects Inc. 
robot-environment: commercial modified-date: Friday March 05, 2004 modified-by: support@thesoftwareobjects.com robot-id: occam robot-name: Occam robot-cover-url: http://www.cs.washington.edu/research/projects/ai/www/occam/ robot-details-url: robot-owner-name: Marc Friedman robot-owner-url: http://www.cs.washington.edu/homes/friedman/ robot-owner-email: friedman@cs.washington.edu robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Occam robot-noindex: no robot-host: gentian.cs.washington.edu, sekiu.cs.washington.edu, saxifrage.cs.washington.edu robot-from: yes robot-useragent: Occam/1.0 robot-language: CommonLisp, perl4 robot-description: The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache. Currently the only user is me. robot-history: The robot is a descendant of Rodney, an earlier project at the University of Washington. robot-environment: research modified-date: Thu, 21 Nov 1996 20:30 GMT modified-by: friedman@cs.washington.edu (Marc Friedman) robot-id: octopus robot-name: HKU WWW Octopus robot-cover-url: http://phoenix.cs.hku.hk:1234/~jax/w3rui.shtml robot-details-url: robot-owner-name: Law Kwok Tung, Lee Tak Yeung, Lo Chun Wing robot-owner-url: http://phoenix.cs.hku.hk:1234/~jax robot-owner-email: jax@cs.hku.hk robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no. robot-exclusion-useragent: robot-noindex: robot-host: phoenix.cs.hku.hk robot-from: yes robot-useragent: HKU WWW Robot, robot-language: Perl 5, C, Java. robot-description: HKU Octopus is an ongoing project for resource discovery in the Hong Kong and China WWW domain.
It is a research project conducted by three undergraduates at the University of Hong Kong. robot-history: robot-environment: modified-date: Thu Mar 7 14:21:55 1996. modified-by: robot-id: OntoSpider robot-name: OntoSpider robot-cover-url: http://ontospider.i-n.info robot-details-url: http://ontospider.i-n.info robot-owner-name: C. Fenijn robot-owner-url: http://ontospider.i-n.info robot-owner-email: ontospider@int-org.com robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: ontospider.i-n.info robot-from: no robot-useragent: OntoSpider/1.0 libwww-perl/5.65 robot-language: perl5 robot-description: Focused crawler for research purposes robot-history: Research robot-environment: research modified-date: Sun Mar 28 14:39:38 modified-by: C. Fenijn robot-id: openfind robot-name: Openfind data gatherer robot-cover-url: http://www.openfind.com.tw/ robot-details-url: http://www.openfind.com.tw/robot.html robot-owner-name: robot-owner-url: robot-owner-email: robot-response@openfind.com.tw robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: 66.7.131.132 robot-from: robot-useragent: Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html) robot-language: robot-description: robot-history: robot-environment: modified-date: Thu, 26 Apr 2001 02:55:21 GMT modified-by: stanislav shalunov robot-id: orb_search robot-name: Orb Search robot-cover-url: http://orbsearch.home.ml.org robot-details-url: http://orbsearch.home.ml.org robot-owner-name: Matt Weber robot-owner-url: http://www.weberworld.com robot-owner-email: webernet@geocities.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes
robot-exclusion-useragent: Orbsearch/1.0 robot-noindex: yes robot-host: cow.dyn.ml.org, *.dyn.ml.org robot-from: yes robot-useragent: Orbsearch/1.0 robot-language: Perl5 robot-description: Orbsearch builds the database for the Orb Search Engine. It runs when requested. robot-history: This robot was started as a hobby. robot-environment: hobby modified-date: Sun, 31 Aug 1997 02:28:52 GMT modified-by: Matt Weber robot-id: packrat robot-name: Pack Rat robot-cover-url: http://web.cps.msu.edu/~dexterte/isl/packrat.html robot-details-url: robot-owner-name: Terry Dexter robot-owner-url: http://web.cps.msu.edu/~dexterte robot-owner-email: dexterte@cps.msu.edu robot-status: development robot-purpose: both maintenance and mirroring robot-type: standalone robot-platform: unix robot-availability: none at the moment; source when developed. robot-exclusion: yes robot-exclusion-useragent: packrat or * robot-noindex: no, not yet robot-host: cps.msu.edu robot-from: robot-useragent: PackRat/1.0 robot-language: perl with libwww-5.0 robot-description: Used for local maintenance and for gathering web pages so that local statistical information can be used in artificial intelligence programs. Funded by NEMOnline. robot-history: In the making... robot-environment: research modified-date: Tue, 20 Aug 1996 15:45:11 modified-by: Terry Dexter robot-id: pageboy robot-name: PageBoy robot-cover-url: http://www.webdocs.org/ robot-details-url: http://www.webdocs.org/ robot-owner-name: Chihiro Kuroda robot-owner-url: http://www.webdocs.org/ robot-owner-email: pageboy@webdocs.org robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: pageboy robot-noindex: yes robot-nofollow: yes robot-host: *.webdocs.org robot-from: yes robot-useragent: PageBoy/1.0 robot-language: c robot-description: The robot visits at regular intervals.
robot-history: none robot-environment: service modified-date: Fri, 21 Oct 1999 17:28:52 GMT modified-by: webdocs robot-id: parasite robot-name: ParaSite robot-cover-url: http://www.ianett.com/parasite/ robot-details-url: http://www.ianett.com/parasite/ robot-owner-name: iaNett.com robot-owner-url: http://www.ianett.com/ robot-owner-email: parasite@ianett.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: ParaSite robot-noindex: yes robot-nofollow: yes robot-host: *.ianett.com robot-from: yes robot-useragent: ParaSite/0.21 (http://www.ianett.com/parasite/) robot-language: c++ robot-description: Builds the index for the ianett.com search database. Runs continuously. robot-history: Second generation of ianett.com spidering technology, originally called Sven. robot-environment: service modified-date: July 28, 2000 modified-by: Marty Anstey robot-id: patric robot-name: Patric robot-cover-url: http://www.nwnet.net/technical/ITR/index.html robot-details-url: http://www.nwnet.net/technical/ITR/index.html robot-owner-name: toney@nwnet.net robot-owner-url: http://www.nwnet.net/company/staff/toney robot-owner-email: webmaster@nwnet.net robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: patric robot-noindex: yes robot-host: *.nwnet.net robot-from: no robot-useragent: Patric/0.01a robot-language: perl robot-description: (contained at http://www.nwnet.net/technical/ITR/index.html) robot-history: (contained at http://www.nwnet.net/technical/ITR/index.html) robot-environment: service modified-date: Thurs, 15 Aug 1996 modified-by: toney@nwnet.net robot-id: pegasus robot-name: pegasus robot-cover-url: http://opensource.or.id/projects.html robot-details-url: http://pegasus.opensource.or.id robot-owner-name: A.Y.Kiky Shannon robot-owner-url: http://go.to/ayks
robot-owner-email: shannon@opensource.or.id robot-status: inactive - open source robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source, binary robot-exclusion: yes robot-exclusion-useragent: pegasus robot-noindex: yes robot-host: * robot-from: yes robot-useragent: web robot PEGASUS robot-language: perl5 robot-description: pegasus gathers information from HTML pages (7 important tags). Indexing can be started from one or more starting URLs or from a range of IP addresses. robot-history: This robot was created as a final project in the Informatics Engineering Department, Institute of Technology Bandung, Indonesia. robot-environment: research modified-date: Fri, 20 Oct 2000 14:58:40 GMT modified-by: A.Y.Kiky Shannon robot-id: perignator robot-name: The Peregrinator robot-cover-url: http://www.maths.usyd.edu.au:8000/jimr/pe/Peregrinator.html robot-details-url: robot-owner-name: Jim Richardson robot-owner-url: http://www.maths.usyd.edu.au:8000/jimr.html robot-owner-email: jimr@maths.su.oz.au robot-status: robot-purpose: robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: yes robot-useragent: Peregrinator-Mathematics/0.7 robot-language: perl 4 robot-description: This robot is being used to generate an index of documents on Web sites connected with mathematics and statistics. It ignores off-site links, so it does not stray from an initially specified list of servers.
robot-history: commenced operation in August 1994 robot-environment: modified-date: modified-by: robot-id: perlcrawler robot-name: PerlCrawler 1.0 robot-cover-url: http://perlsearch.hypermart.net/ robot-details-url: http://www.xav.com/scripts/xavatoria/index.html robot-owner-name: Matt McKenzie robot-owner-url: http://perlsearch.hypermart.net/ robot-owner-email: webmaster@perlsearch.hypermart.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: perlcrawler robot-noindex: yes robot-host: server5.hypermart.net robot-from: yes robot-useragent: PerlCrawler/1.0 Xavatoria/2.0 robot-language: perl5 robot-description: The PerlCrawler robot is designed to index and build a database of pages relating to the Perl programming language. robot-history: Originated in modified form on 25 June 1998 robot-environment: hobby modified-date: Fri, 18 Dec 1998 23:37:40 GMT modified-by: Matt McKenzie robot-id: phantom robot-name: Phantom robot-cover-url: http://www.maxum.com/phantom/ robot-details-url: robot-owner-name: Larry Burke robot-owner-url: http://www.aktiv.com/ robot-owner-email: lburke@aktiv.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: Macintosh robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Duppies robot-language: robot-description: Designed to allow webmasters to provide a searchable index of their own site as well as to other sites, perhaps with similar content. robot-history: robot-environment: modified-date: Fri Jan 19 05:08:15 1996. 
modified-by: robot-id: phpdig robot-name: PhpDig robot-cover-url: http://phpdig.toiletoine.net/ robot-details-url: http://phpdig.toiletoine.net/ robot-owner-name: Antoine Bajolet robot-owner-url: http://phpdig.toiletoine.net/ robot-owner-email: phpdig@toiletoine.net robot-status: * robot-purpose: indexing robot-type: standalone robot-platform: all supported by Apache/php/mysql robot-availability: source robot-exclusion: yes robot-exclusion-useragent: phpdig robot-noindex: yes robot-host: yes robot-from: no robot-useragent: phpdig/x.x.x robot-language: php 4.x robot-description: Small robot and search engine written in PHP. robot-history: first written 2001-03-30 robot-environment: hobby modified-date: Sun, 21 Nov 2001 20:01:19 GMT modified-by: Antoine Bajolet robot-id: piltdownman robot-name: PiltdownMan robot-cover-url: http://profitnet.bizland.com/ robot-details-url: http://profitnet.bizland.com/piltdownman.html robot-owner-name: Daniel Vilà robot-owner-url: http://profitnet.bizland.com/aboutus.html robot-owner-email: profitnet@myezmail.com robot-status: active robot-purpose: statistics robot-type: standalone robot-platform: windows95, windows98, windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: piltdownman robot-noindex: no robot-nofollow: no robot-host: 62.36.128.*, 194.133.59.*, 212.106.215.* robot-from: no robot-useragent: PiltdownMan/1.0 profitnet@myezmail.com robot-language: c++ robot-description: The PiltdownMan robot is used to get a list of links from the search engines in our database. These links are followed, and the pages they refer to are downloaded to gather some statistics from them. The robot runs roughly once a month and visits the first 10 pages listed in every search engine for a group of keywords. robot-history: To maintain a database of search engines, we needed an automated tool. That's why we