I want to load a URL as a string, and then use a regex to write out the matches in C#.
+3
A:
Downloading a URL as a string is easy using System.Net.WebClient.DownloadString.
Finding matches in the HTML is easy using System.Text.RegularExpressions.Regex.Match.
Both links have good examples on usage.
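As a rough sketch of how the two calls fit together: the URL, the pattern, and the names PageMatcher and FindMatches below are made up for illustration. It uses Regex.Matches rather than Regex.Match so that every hit is written out, not just the first one.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class PageMatcher
{
    // Hypothetical helper: returns the text of every match of pattern in text.
    public static List<string> FindMatches(string text, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Placeholder URL and pattern; substitute your own.
        const string url = "http://example.com/";
        const string pattern = @"href=""[^""]*""";

        using (var client = new WebClient())
        {
            // Download the whole page as one string.
            string html = client.DownloadString(url);

            // Write out each match on its own line.
            foreach (string hit in FindMatches(html, pattern))
            {
                Console.WriteLine(hit);
            }
        }
    }
}
```

Keeping the matching in its own method means you can try a pattern against a plain string without touching the network.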
bzlm
2009-02-22 17:21:35
+1
A:
It's really not a good idea to use regular expressions to match content in HTML. It is better to use regular expressions to match tokens in the HTML, and then parse those tokens. But at that point, you might as well use an existing parser.
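To show what "matching tokens" could look like, here is a deliberately naive sketch. The class name HtmlTokenizer and the token pattern are my own assumptions, and it ignores comments, script blocks, and attribute values that contain ">", which is exactly the kind of thing a real parser handles for you.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlTokenizer
{
    // Assumed token rule: a tag is "<...>", and anything else up to the
    // next "<" is a text token. Real HTML needs far more care than this.
    private static readonly Regex Token = new Regex(@"<[^>]+>|[^<]+");

    // Splits the input into a flat list of tag and text tokens.
    public static List<string> Tokenize(string html)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(html))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // Prints each token on its own line.
        foreach (string t in Tokenize("<p>Hello <b>world</b></p>"))
        {
            Console.WriteLine(t);
        }
    }
}
```

A parser would then walk this token list and build a tree from the open and close tags, which is the work an existing library already does.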
Devin Jeanpierre
2009-02-22 17:27:10
Well, for traditional screen scraping, traversing the DOM rarely works. I've never done it without having to resort to regular expressions. But I agree that screen scraping is inherently impractical.
bzlm
2009-02-22 17:29:20
I can't speak for C#, but I've had very good experiences with extremely permissive HTML parsers in Python, specifically BeautifulSoup. In such cases, it works about as well as you'd expect.
Devin Jeanpierre
2009-02-22 17:32:38
I agree, but screen scraping and HTML parsing are two completely different things. I have a feeling that what the OP wants here is the former. But really, who cares?
bzlm
2009-02-22 17:35:48