Short question:
Has anybody got any C# code to parse robots.txt and then evaluate URLs against it, to see whether they would be excluded or not?
Long question:
I have been creating a sitemap for a new site yet to be released to Google. The sitemap has two modes, a user mode (like a traditional sitemap) and an 'admin' mode.
The admin mode will show all possible URLs on the site, including customized entry URLs and URLs for a specific outside partner, such as example.com/oprah for anyone who sees our site on Oprah. I want to track published links somewhere other than in an Excel spreadsheet.
I would have to assume that someone might publish the /oprah link on their blog or somewhere. We don't actually want this 'mini-Oprah site' to be indexed, because it would result in non-Oprah viewers being able to find the special Oprah offers.
So at the same time I was creating the sitemap, I also added URLs such as /oprah to be excluded in our robots.txt file.
Then (and this is the actual question) I thought 'wouldn't it be nice to be able to show on the sitemap whether or not files are indexed and visible to robots?' This would be quite simple - just parse robots.txt and then evaluate a link against it.
However this is a 'bonus feature' and I certainly don't have time to go off and write it (even though it's probably not that complex) - so I was wondering if anyone has already written any code to parse robots.txt?
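For context, this untested sketch is roughly the level of parsing I have in mind. It only looks at the User-agent: * section and simple Disallow: prefixes (no Allow: rules, wildcards, or per-bot sections), and the class and method names are just placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Rough sketch only: collects Disallow prefixes under "User-agent: *"
// and checks paths against them. Ignores Allow, wildcards and Sitemap lines.
public class SimpleRobotsTxt
{
    private readonly List<string> _disallowedPrefixes = new List<string>();

    public static SimpleRobotsTxt Parse(string robotsTxtContent)
    {
        var robots = new SimpleRobotsTxt();
        bool inGlobalSection = false;

        foreach (var rawLine in robotsTxtContent.Split('\n'))
        {
            // Strip comments and surrounding whitespace.
            var line = rawLine.Split('#')[0].Trim();
            if (line.Length == 0) continue;

            var separatorIndex = line.IndexOf(':');
            if (separatorIndex < 0) continue;

            var field = line.Substring(0, separatorIndex).Trim().ToLowerInvariant();
            var value = line.Substring(separatorIndex + 1).Trim();

            if (field == "user-agent")
            {
                // Only track the catch-all "*" section in this sketch.
                inGlobalSection = value == "*";
            }
            else if (field == "disallow" && inGlobalSection && value.Length > 0)
            {
                robots._disallowedPrefixes.Add(value);
            }
        }

        return robots;
    }

    // True if the given path (e.g. "/oprah") matches a Disallow prefix.
    public bool IsDisallowed(string path)
    {
        return _disallowedPrefixes.Any(prefix =>
            path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

Usage would then be something like:

```csharp
var robots = SimpleRobotsTxt.Parse(File.ReadAllText("robots.txt"));
Console.WriteLine(robots.IsDisallowed("/oprah"));    // True if robots.txt has "Disallow: /oprah"
Console.WriteLine(robots.IsDisallowed("/products")); // False unless explicitly disallowed
```

If there is already a library or a fuller implementation out there, I would much rather use that than polish this up myself.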