I'm trying to count the number of tags in the 'soup' from a BeautifulSoup result. I'd like to use a regular expression but am having trouble. The code I've tried is as follows:

reg_exp_tag = re.compile("<[^>*>")
tags = re.findall(reg_exp_tag, str(soup))

but re will not compile reg_exp_tag, giving an "unexpected end of regular expression" error.

Any help would be much appreciated!

Thanks

+1  A: 

Shouldn't that be "<[^>]*>" instead of "<[^>*>"?

(The character class needs to be closed with a ].)
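
As a minimal sketch of the fix, assuming a small hand-written HTML string (the `html` value is a hypothetical sample, not the asker's data), the corrected pattern compiles while the original one is rejected. Note that this pattern matches closing tags too, so the count is of tag tokens rather than elements:

```python
import re

# Hypothetical sample markup, used only for illustration.
html = '<div><p class="intro">Hello <b>world</b></p><br/></div>'

# The original pattern never closes the character class, so it fails to compile.
try:
    re.compile("<[^>*>")
    original_compiles = True
except re.error:
    original_compiles = False

# Corrected pattern: the class [^>] is closed with ']' before the '*' quantifier.
reg_exp_tag = re.compile(r"<[^>]*>")

# Every '<...>' token is matched, opening and closing tags alike.
tags = reg_exp_tag.findall(html)
print(original_compiles)
print(tags)
print(len(tags))
```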

Bart Kiers
Thanks a lot! I'd been staring at it so long that I wasn't seeing the simple typos!
db90
Hehe, when a programmer decides to solve a problem with regex, he ends up with two problems.
Kugel
+4  A: 

If you've already parsed the HTML with BeautifulSoup, why parse it again? Try this:

num_tags = len(soup.findAll())
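
A self-contained sketch of the same idea, assuming the `bs4` package is installed and using a hypothetical sample string. It uses `find_all`, the current spelling of `findAll` in `bs4`, and the built-in `html.parser` backend. Called with no arguments, `find_all()` returns every tag in the document, so each element is counted once rather than once per opening and closing token:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = '<div><p class="intro">Hello <b>world</b></p><br/></div>'

# Parse once with the standard-library backend.
soup = BeautifulSoup(html, "html.parser")

# With no filter, find_all() yields every Tag in the tree.
all_tags = soup.find_all()
num_tags = len(all_tags)

print([tag.name for tag in all_tags])
print(num_tags)
```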
Ned Batchelder
+1 no regex can possibly parse HTML correctly, which is why most people use BeautifulSoup. BeautifulSoup plus serialisation to HTML plus regex is just a bagful of wrong!
bobince