ansaurus

Question

Answer 1

+3 A:

Do you want to parse XML, as you state in your question's title, or HTML, as you show in the text of the question? For the latter, I recommend BeautifulSoup: download and install it, build a soup object from the HTML, and then you can easily locate the tag with a given id (or other attribute), e.g.:

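from bs4 import BeautifulSoup  # Beautiful Soup 4; answers from 2009 used BS3
soup = BeautifulSoup(html, 'html.parser')  # html: the page source as a string
errp = soup.find(attrs={'id': 'ErrorPanel'})
if errp is not None:
    print('Error:', errp.string)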

and similarly for the other case (easily adapted, e.g. into a loop over all matches if the attribute you're looking for isn't unique, and so on).

Alex Martelli 2009-12-01 04:52:00
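For the non-unique case, Beautiful Soup can return every match (find_all in Beautiful Soup 4). If installing a third-party package isn't an option, the standard library's html.parser.HTMLParser can do a rougher version of the same search. This is a swapped-in technique, not how Beautiful Soup works: the sketch assumes you only need the matching tags and their attributes (not their text), and it compares attribute values exactly, so class="error big" would not match "error". AttrMatcher and find_all_by_attr are hypothetical names.

```python
from html.parser import HTMLParser


class AttrMatcher(HTMLParser):
    """Record every start tag whose attribute `name` has exactly the value `value`.

    Only start tags are examined and no tree is built, so unclosed or
    mismatched tags in sloppy HTML do not cause errors.
    """

    def __init__(self, name, value):
        super().__init__()
        self.name = name
        self.value = value
        self.matches = []  # (tag, attrs_dict) pairs in document order

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attrs = dict(attrs)
        if attrs.get(self.name) == self.value:
            self.matches.append((tag, attrs))


def find_all_by_attr(html, name, value):
    parser = AttrMatcher(name, value)
    parser.feed(html)
    parser.close()
    return parser.matches


page = """
<div class="error" id="ErrorPanel">Disk full
<p class=error>Retry later
<span class="note">ignored</span>
"""
for tag, attrs in find_all_by_attr(page, "class", "error"):
    print(tag, attrs.get("id"))
# div ErrorPanel
# p None
```

Because it never builds a tree, the unclosed div and p above don't trip it up; the trade-off is that it can't tell you what a matching element contains.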

I would like to parse HTML (XHTML).

Russell 2009-12-01 05:08:12

Keep in mind that Beautiful Soup is designed to parse sloppy HTML, and it might not handle strictly valid XHTML correctly. On the other hand, a strict XML parser will choke on malformed input that Beautiful Soup parses without complaint.

wisty 2009-12-01 05:11:55
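The XML-parser side of this trade-off can be seen with Python's standard-library xml.etree.ElementTree (a swapped-in example; neither answer uses it). The sketch assumes the page is well-formed XHTML in the XHTML namespace; find_by_id is a hypothetical helper name. ElementTree's limited XPath support is enough for an id lookup, and input that isn't well-formed raises ET.ParseError instead of being repaired.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def find_by_id(xhtml, element_id):
    """Return the first element whose id attribute equals element_id, or None.

    Raises ET.ParseError if the input is not well-formed XML; ElementTree
    does not try to repair broken markup.
    """
    root = ET.fromstring(xhtml)
    # ElementTree supports a small XPath subset, including [@attr='value'].
    # ".//*" searches all descendants of the root, whatever their namespace.
    # Assumes element_id contains no single quote.
    return root.find(f".//*[@id='{element_id}']")


page = f"""<html xmlns="{XHTML_NS}">
  <body><div id="ErrorPanel">Disk full</div></body>
</html>"""

el = find_by_id(page, "ErrorPanel")
if el is not None:
    # Tags come back qualified as {namespace}localname.
    print(el.tag, el.text)  # {http://www.w3.org/1999/xhtml}div Disk full

broken = "<html><body><p>unclosed</body></html>"
try:
    find_by_id(broken, "ErrorPanel")
except ET.ParseError as exc:
    print("not well-formed:", exc)
```

The same id lookup succeeds on the valid page and fails outright on the mismatched tags, which is exactly the strictness wisty describes.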

@wisty, I've never had any problem using BeautifulSoup to parse XHTML -- can you think of any specific example? (URL would be welcome, tx!)

Alex Martelli 2009-12-01 05:44:22

Answer 2

+3 A:

You can also do it with lxml. It handles HTML very well, and you can query the DOM with CSS selectors, which makes it particularly attractive if you use libraries like jQuery regularly.

Imran 2009-12-01 06:36:39


How do I query XHTML using python?
