I am writing a program that will help me find out which sites my competitors are linking to.
To do that, the program will parse an HTML file and produce two lists: internal links and external links.
I will use the internal links to crawl the website further; the external links are what I am actually looking for.
Using .NET Regex, how do I parse an HTML file and find (1) the external links and (2) the internal links?
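To make the goal concrete, here is a rough sketch of the kind of split I have in mind. It is only an illustration under some assumptions: the names LinkSorter and Classify are made up for this example, "internal" is taken to mean "same host as the page being parsed" (so www.example.com and example.com would count as different sites), and the pattern is deliberately naive: it only handles quoted href values and can be fooled by comments or script blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not an existing library type.
public static class LinkSorter
{
    // Naive pattern: captures the quoted href value of an <a> tag.
    // Unquoted attributes are not matched.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Appends each link found in html to internalLinks or externalLinks.
    // pageUri is the absolute address of the page the HTML came from.
    public static void Classify(string html, Uri pageUri,
        List<Uri> internalLinks, List<Uri> externalLinks)
    {
        foreach (Match m in HrefPattern.Matches(html))
        {
            // Resolve relative hrefs (e.g. "/about") against the page address.
            Uri link;
            if (!Uri.TryCreate(pageUri, m.Groups[1].Value, out link))
                continue;

            // Skip mailto:, javascript: and other non-web schemes.
            if (link.Scheme != Uri.UriSchemeHttp && link.Scheme != Uri.UriSchemeHttps)
                continue;

            // Assumption: same host means internal, anything else is external.
            bool sameHost = string.Equals(
                link.Host, pageUri.Host, StringComparison.OrdinalIgnoreCase);
            (sameHost ? internalLinks : externalLinks).Add(link);
        }
    }
}
```

The idea would be to call Classify once per downloaded page, feed the internal list back into the crawl queue, and keep the external list as the result.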
Thanks in advance, Eytan Levit.
Edit: In response to the question: no, I am not bound to regex; I am open to any other ideas.