views:

226

answers:

2

I want to load a URL as a string, and then use a regex to write out the matches in C#.

+3  A: 

Downloading a URL as a string is easy using System.Net.WebClient.DownloadString.

Finding matches in the HTML is easy using System.Text.RegularExpressions.Regex.Match (or Regex.Matches if you want every match rather than just the first).

Both links include good usage examples.
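
As a rough illustration of the two steps combined, here is a minimal sketch. The URL, the href pattern, and the LinkScraper/ExtractHrefs names are placeholders made up for this example, not anything from the question, and the pattern only handles double-quoted attributes. On newer .NET versions WebClient is marked obsolete in favour of HttpClient, but it still works for a quick test.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class LinkScraper
{
    // Hypothetical helper: returns the value of every double-quoted
    // href attribute found in the given HTML string.
    public static List<string> ExtractHrefs(string html)
    {
        var results = new List<string>();
        // Group 1 captures the text between the quotes.
        foreach (Match m in Regex.Matches(html, "href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    static void Main()
    {
        using (var client = new WebClient())
        {
            // Placeholder URL; substitute the page you actually want to read.
            string html = client.DownloadString("http://example.com/");
            foreach (string href in ExtractHrefs(html))
            {
                Console.WriteLine(href);
            }
        }
    }
}
```

Keeping the matching in its own method means the pattern can be tried against a literal string before pointing it at a live page.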

bzlm
+1  A: 

It's really not a good idea to use regular expressions to match content in HTML. It is better to use regular expressions only to match tokens in the HTML, and then parse those tokens. But at that point, you might as well use an existing parser.

Devin Jeanpierre
Well, for traditional screen scraping, traversing the DOM rarely works. I've never done it without having to resort to regular expressions. But I agree that screen scraping is inherently impractical.
bzlm
I can't speak for C#, but I've had very good experiences using extremely permissive HTML parsers in Python, specifically BeautifulSoup. In such cases, it works about as well as you'd expect.
Devin Jeanpierre
I agree, but I think screen scraping and HTML parsing are two completely different things. I have a feeling that what the OP wants here is the former. But really, who cares?
bzlm