This question is a lazy way of collecting examples of parsing HTML with a variety of languages and parsing libraries. Individual answers will be linked to in comments on questions about how to parse HTML with regexes, as a way of showing the right way to do things (similar to how I use "Can you provide some examples of why it is hard to parse XML and HTML with a regex?").
For the sake of consistency, I ask that the example parse an HTML file for the href in anchor tags. To make this question easy to search, I ask that you follow this format:
language:
library:
<example code>
Please make the library a link to the documentation for the library. If you want to provide an example other than extracting links, please include a
purpose:
after the "library:".
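To illustrate the requested format, here is a minimal sketch of what an entry might look like. It assumes Python 3 and the standard library's html.parser module; the names LinkExtractor, extract_links, and SAMPLE are invented for this illustration and are not part of any library.

language: Python
library: html.parser (https://docs.python.org/3/library/html.parser.html)

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every anchor tag it encounters.

    LinkExtractor is a hypothetical helper name used only for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us,
        # so <A HREF=...> is matched as well.
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(html_text):
    """Return the href values of all anchor tags, in document order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample input; a real entry would read the HTML from a file.
SAMPLE = """
<html><body>
<a href="http://example.com/one">One</a>
<A HREF='/two'>Two</A>
<a name="no-href">No link</a>
<!-- <a href="/commented-out">hidden</a> -->
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

The anchor without an href and the anchor inside the HTML comment are both skipped, which is the kind of case that tends to trip up regex-based approaches.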
Note: the tags on this question have been changed over time to draw in other languages. Here is a history of the tags this post has had: c#, perl, python, ruby, vb.net, and parsing.