Friday, 13 January 2012

Robots.txt

“The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web crawlers and other web robots from accessing all or part of a website which is otherwise publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.” (Wikipedia, 2011)
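To illustrate the convention described above, here is a minimal sketch of what a robots.txt file might look like; the directory names are hypothetical examples, not taken from any real site:

```
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The `User-agent: *` line addresses all cooperating crawlers, `Disallow` asks them to skip the named directory, and the optional `Sitemap` line shows how the file can point to the sitemap standard mentioned in the quote.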

I looked at numerous websites to gain an understanding of robots.txt, how it works, and what it is used for. A robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling the site. This might be, for example, out of a preference for keeping pages out of search engine results, a belief that the content of the selected directories might be misleading or irrelevant to the categorisation of the site as a whole, or a desire that an application only operate on certain data. I hope this research will feed into the live client project. It may take me a few attempts to get a robots.txt file functioning on the live site, but the format itself seems straightforward.
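One way to check how a crawler would interpret the rules before putting them live is Python's standard-library `urllib.robotparser` module; the rules and URLs below are made-up examples, not from the client project:

```python
from urllib import robotparser

# Hypothetical robots.txt rules for illustration only
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /",
]

# Parse the rules as a cooperating crawler would
rp = robotparser.RobotFileParser()
rp.parse(rules)

# Ask whether a given user agent may fetch a given URL
print(rp.can_fetch("*", "https://example.com/private/data.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))         # True
```

This only simulates a well-behaved robot, of course; robots.txt is a request, not an enforcement mechanism, so crawlers that ignore the convention will not be stopped by it.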
