Check any website for a robots.txt file (visit /robots.txt at the site's root) and see what it contains.
What is robots.txt?
The robots.txt file is a plain text file placed at the root of a website that contains instructions for search engine bots. It follows a loosely enforced standard known as the Robots Exclusion Protocol. When a search engine sends its bots to crawl your website, they first check this file to determine any limitations or rules the webmaster has set.
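The rule-checking that a well-behaved bot performs can be sketched with Python's standard-library robots.txt parser. The rules and URLs below are hypothetical examples:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, supplied as lines rather than fetched over the network
rules = """User-agent: *
Disallow: /private/""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A compliant crawler asks before fetching each URL
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))         # True
```

In practice a crawler would call `rp.set_url("https://example.com/robots.txt")` and `rp.read()` to fetch the live file before making these checks.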
For example, a webmaster can explicitly disallow a search engine from crawling part of their website, or disallow all crawling of the website except for sections specifically allowed by the rules defined in robots.txt.
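A minimal robots.txt illustrating both patterns might look like the following (the paths and bot name are hypothetical):

```
# Keep all crawlers out of one section of the site
User-agent: *
Disallow: /private/

# Block everything for one particular bot, except an allowed section
User-agent: ExampleBot
Disallow: /
Allow: /public/
```

Each record starts with a User-agent line naming the bot it applies to, followed by the Disallow and Allow rules for that bot.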
How Reliable Is robots.txt?
Using the robots.txt file to provide instructions to search bots is not a secure method of protecting content from being indexed. While a well-behaved search bot is generally expected to follow the rules in the robots.txt file, nothing in the protocol actually prevents a crawler from ignoring them and crawling your entire site.
Robots.txt is not a form of security or authentication for your website, and some spider bots simply ignore the rules placed in this file. However, for webmasters who wish to leave content on their site accessible without authentication but do not want it to be indexed, robots.txt is the most suitable way to tell search engines what to avoid.