ansaurus

Question

Parsing Random Web Pages

Answer 1

A:

Use Python. http://www.python.org/
Use Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/

S.Lott 2010-09-21 10:10:10

Thanks! I am planning to use .NET.

Venkateshwar 2010-09-21 20:50:57

@Venkateshwar: Please **update** your question with all the facts. Python and Beautiful Soup work perfectly in .NET.

S.Lott 2010-09-21 22:38:49

Answer 2

+1 A:

Please see this answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Daniel Cassidy 2010-09-21 10:11:23

Answer 3

A:

You need to use a proper HTML parser, and extract the elements you’re interested in via the parser’s API (or via the DOM).

Since I don’t know what language you’re programming in, it’s difficult to recommend a specific parser, but some well-known ones are Jericho for Java and Beautiful Soup for Python.
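To illustrate the idea of going through a parser's API rather than pattern matching, here is a minimal sketch in Python. It assumes you want the link targets from a page, and it uses the standard library's `html.parser` module instead of Beautiful Soup so that it runs without installing anything. The names `LinkExtractor`, `extract_links`, and `sample` are made up for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy markup: uppercase tag and attribute, unclosed <p> and <html>.
sample = (
    '<html><body><p>See <A HREF="/docs">docs</A>'
    '<p><a href="http://example.com/">site</a></body>'
)
print(extract_links(sample))
```

The parser copes with the unclosed tags and mixed case without any special handling, which is the main reason to prefer it on pages you don't control. Beautiful Soup offers a higher-level, tree-based interface on top of the same idea.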

Daniel Cassidy 2010-09-21 10:18:55
