Short question:
Has anybody got any C# code to parse robots.txt and then evaluate URLs against it, to see whether they would be excluded or not?
Long question:
I have been creating a sitemap for a new site yet to be released to Google. The sitemap has two modes, a user mode (like a traditional sitemap) and an 'admin' mode.
The admin mode will show all possible URLs on the site, including customized entry URLs and URLs for a specific outside partner, such as example.com/oprah for anyone who sees our site on Oprah. I want to track published links somewhere other than in an Excel spreadsheet.
I would have to assume that someone might publish the /oprah link on their blog or somewhere. We don't actually want this 'mini-Oprah site' to be indexed, because it would result in non-Oprah viewers being able to find the special Oprah offers.
So at the same time I was creating the sitemap, I also added URLs such as /oprah to be excluded in our robots.txt file.
Then (and this is the actual question) I thought 'wouldn't it be nice to be able to show on the sitemap whether or not files are indexed and visible to robots?' This would be quite simple - just parse robots.txt and then evaluate a link against it.
However this is a 'bonus feature' and I certainly don't have time to go off and write it (even though it's probably not that complex) - so I was wondering if anyone has already written any code to parse robots.txt?
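For context, this untested sketch is roughly the level of parsing I have in mind. It only looks at the User-agent: * section and simple Disallow: prefixes (no Allow: rules, wildcards, or per-bot sections), and the class and method names are just placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Rough sketch only: collects Disallow prefixes under "User-agent: *"
// and checks paths against them. Ignores Allow, wildcards and Sitemap lines.
public class SimpleRobotsTxt
{
    private readonly List<string> _disallowedPrefixes = new List<string>();

    public static SimpleRobotsTxt Parse(string robotsTxtContent)
    {
        var robots = new SimpleRobotsTxt();
        bool inGlobalSection = false;

        foreach (var rawLine in robotsTxtContent.Split('\n'))
        {
            // Strip comments and surrounding whitespace.
            var line = rawLine.Split('#')[0].Trim();
            if (line.Length == 0) continue;

            var separatorIndex = line.IndexOf(':');
            if (separatorIndex < 0) continue;

            var field = line.Substring(0, separatorIndex).Trim().ToLowerInvariant();
            var value = line.Substring(separatorIndex + 1).Trim();

            if (field == "user-agent")
            {
                // Only track the catch-all "*" section in this sketch.
                inGlobalSection = value == "*";
            }
            else if (field == "disallow" && inGlobalSection && value.Length > 0)
            {
                robots._disallowedPrefixes.Add(value);
            }
        }

        return robots;
    }

    // True if the given path (e.g. "/oprah") matches a Disallow prefix.
    public bool IsDisallowed(string path)
    {
        return _disallowedPrefixes.Any(prefix =>
            path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

Usage would then be something like:

```csharp
var robots = SimpleRobotsTxt.Parse(File.ReadAllText("robots.txt"));
Console.WriteLine(robots.IsDisallowed("/oprah"));    // True if robots.txt has "Disallow: /oprah"
Console.WriteLine(robots.IsDisallowed("/products")); // False unless explicitly disallowed
```

If there is already a library or a fuller implementation out there, I would much rather use that than polish this up myself.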