ansaurus

Question

HTML tag replacement using regex and python

Answer 1

+1 A:

search and replace with this regex: search for: <.*?> replace with: "

ennuikiller 2009-09-27 21:53:42

Answer 2

+1 A:

Check out lxml, a really nice python library for dealing with xml. You can use drop_tag to accomplish what you are looking for.

from lxml import html 
h = html.fragment_fromstring('<doc>Hello <b>World!</b></doc>')
h.find('*').drop_tag()
print(html.tostring(h, encoding=unicode))

<doc>Hello World!</doc>

moorej 2009-09-28 02:54:23

Answer 3

+3 A:

For what you are trying to accomplish I would use BeautifulSoup rather than regex.

http://www.crummy.com/software/BeautifulSoup/

gnibbler 2009-09-28 03:11:57

+1. BeautifulSoup is the definite answer.

muhuk 2009-09-29 07:20:30

ansaurus

tags:

views:

answers:

HTML tag replacement using regex and python

related questions