If I am creating a simple web scraper (from a root URL, grab all links, then grab all emails from the pages those links point to), would it be worthwhile to use HTML Agility Pack? I am not actually looking through HTML tags; I am simply scanning the entire document for emails.

Would it be more efficient to use HTML Agility Pack?

I am scraping them strictly because I need these emails. There are about 100 links, and only about 500 emails will be scraped. No worries, I'm keeping ethics in mind here.

+2  A: 

There are many questions on SO about this. Most of the ones I read say: don't use regular expressions for web scraping.

On the other hand, if all you want is to parse text regardless of its HTML nature (which you do, if I understand you correctly), regular expressions may be the better choice.
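To illustrate the idea, here is a minimal sketch in Python (not the asker's .NET setup) that treats the page as plain text and pulls out anything that looks like an email address. The pattern and the `extract_emails` name are my own assumptions, not something from a library; the pattern is deliberately simplified and will not cover every address that is valid under the RFCs.

```python
import re

# Simplified pattern (an assumption): matches common addresses only,
# not the full grammar that the email RFCs allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return email-like strings from text, de-duplicated case-insensitively,
    in order of first appearance. The input is scanned as raw text, so it
    does not matter whether it is HTML or not."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


if __name__ == "__main__":
    page = '<a href="mailto:Bob@example.com">Bob</a> or alice@mail.example.org'
    for email in extract_emails(page):
        print(email)
```

Since no DOM is built, this should be cheap for roughly 100 pages. An HTML parser would still be the safer tool for the link-gathering step, where the tag structure does matter.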

Dror
Thank you, this is exactly why I posted this. I've read numerous threads on this, but none on the case where you don't care whether the text contains HTML.
cam