spider (web) (Or "robot", "crawler") A program that automatically explores the web by retrieving a document and recursively retrieving some or all of the documents that are referenced in it. This contrasts with a normal web browser operated by a human, which does not automatically follow links other than inline images and URL redirections.
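The retrieve-and-follow loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular spider is implemented: the names `LinkExtractor`, `crawl`, `fetch`, `max_pages` and the `PAGES` table are all hypothetical, the traversal is breadth-first with a queue rather than literally recursive, and documents are served from an in-memory table so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first traversal starting at start_url.

    fetch(url) is a caller-supplied function (an assumption of this
    sketch) that returns the document's HTML as a string, or None if
    the document cannot be retrieved.
    Returns the URLs successfully retrieved, in visiting order.
    """
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # broken reference: nothing to follow
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative references and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP retrieval.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">broken</a>',
}

print(crawl("http://example.test/", PAGES.get))
```

The `seen` set is what keeps the spider from retrieving the same document twice when pages link back to each other, and `max_pages` is a simple guard against unbounded exploration. A real spider would replace `PAGES.get` with an HTTP client.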
The algorithm used to pick which references to follow depends strongly on the program's purpose. Index-building spiders usually retrieve a significant proportion of the references. At the other extreme are spiders that try to validate the references in a set of documents; these usually do not retrieve any of the links apart from redirections. The standard for robot exclusion is designed to avoid some problems with spiders. Early examples of spiders were Lycos and WebCrawler.
Last updated: 2001-04-30