The robots.txt file is a plain-text file in the root directory of a website that tells web crawlers (spiders) which parts of the site they may visit. It can allow or disallow access to specific pages or directories, either for all crawlers or for individual crawlers identified by their user-agent name. Compliance is voluntary: a crawler can ignore robots.txt, but reputable crawlers fetch it first and crawl only the pages it permits. All of the major search engines honour robots.txt. The Wayback Machine long did as well, although the Internet Archive has since stopped relying on it for some sites.
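The following sketch shows how a well-behaved crawler might check robots.txt rules before fetching a page. It uses Python's standard-library `urllib.robotparser` module. The robots.txt contents, the crawler names `BadBot` and `GoodBot`, the helper `allowed`, and the `example.com` URLs are all hypothetical, chosen only for illustration. The rules block `BadBot` from the whole site and keep every other crawler out of `/private/`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (not from any real site):
# - the crawler named "BadBot" may not fetch anything ("Disallow: /")
# - every other crawler ("*") is kept out of the /private/ directory
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# Parse the rules from a string instead of downloading them over the network.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Hypothetical helper: True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


for agent, url in [
    ("BadBot", "https://example.com/index.html"),
    ("GoodBot", "https://example.com/index.html"),
    ("GoodBot", "https://example.com/private/report.html"),
]:
    print(agent, url, allowed(agent, url))
```

A real crawler would normally call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` to download the live file before checking URLs. Nothing in the protocol enforces the answer, though. Honouring `can_fetch` is entirely up to the crawler.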