The robots.txt file contains instructions from a website to the bots that crawl the internet, most notably search engine crawlers. This plain text file, placed at the root of the site, tells these web crawlers or web robots what they can and cannot do. In most cases, these robots want to index websites so they can easily be searched, but some have other purposes. Generally, web crawlers are programmed to look for a robots.txt file first and to follow any special instructions that the website developer has left for them.
These special instructions may tell the robots that they cannot crawl certain directories or add certain files to search engine indexes. There are several reasons why a website owner may want to do this. Some of these files may need to be kept out of search results for privacy or security reasons. Other times, the owner may not want the files indexed because they contain information that is irrelevant or that doesn’t fit with the website’s categorization.
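As a rough illustration of how a well-behaved crawler might apply such instructions, the sketch below parses a small set of rules with Python's standard urllib.robotparser module and checks two URLs against them. The rules, the paths /private/ and /drafts/, the host www.example.com, the bot name ExampleBot, and the helper is_allowed are all assumptions made up for this example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# "User-agent: *" applies the rules to every crawler;
# each "Disallow" line names a path prefix that should not be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://www.example.com/index.html"))
print(is_allowed("https://www.example.com/private/report.pdf"))
```

Under these assumed rules, the first URL is permitted and the second is not, because its path begins with a disallowed prefix. The check is purely advisory: it only has an effect when the crawler chooses to run it.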
For websites that have multiple subdomains, the website owner needs to create a robots.txt file for each of those subdomains.
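The per-subdomain requirement follows from how crawlers locate the file: they request /robots.txt from the root of whichever host they are visiting. A minimal sketch of that lookup, using Python's standard urllib.parse module, is shown here. The function name robots_url and the example.com hosts are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt URL for the host that serves page_url.

    Only the scheme and host (netloc) are kept; the path is replaced
    with /robots.txt and any query string or fragment is dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Two subdomains of the same (hypothetical) site map to two separate files.
print(robots_url("https://blog.example.com/posts/1"))
print(robots_url("https://shop.example.com/cart?item=42"))
```

Because the two hosts differ, the two lookups resolve to different files, so rules published on one subdomain do not carry over to the other.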
Note that these bots don’t always read the robots.txt file, and even those that do may be programmed to ignore the instructions. In fact, hackers often create bots that intentionally seek out the directories and files listed in the robots.txt file because they believe they may contain protected information.