Hi,

I am working with simple regular expressions in Python.

I am trying re.split, but I get things like ['\r\n', '\r\n'] instead of the text I expected. Can someone please tell me how to display the actual text?

I tried this statement:

t_html = re.split("<[a-zA-Z0-9\s\w\W]*>[a-zA-Z0-9\s\w\W]*</[a-zA-Z0-9\s\w\W]*>" ,s)

Thanks

A: 

re.split, by its very nature, splits on the pattern and does not keep the text the pattern matched; what you get back is only the text between the matches, which is why you see leftovers like '\r\n'. If you want the matched text returned as well, put capturing parentheses around the pattern inside the pattern string: re.split("(R)", string), where R is your expression. If you want to find all non-overlapping matches, use re.findall, which returns a list. See here for more details and options.
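As a rough illustration of the difference, here is a minimal sketch. The sample string `s` and the simplified `pattern` are assumptions made up for this example (the question does not show the input, and this is not the asker's original pattern); only `re.split` and `re.findall` come from the answer above.

```python
import re

# Hypothetical input: the original post does not show what `s` contains.
s = "<b>one</b>\r\n<i>two</i>\r\n"

# Simplified, assumed pattern: an opening tag, non-greedy content, a closing tag.
pattern = r"<\w+>.*?</\w+>"

# Plain split: the matched elements are dropped, only the text between them is kept.
without_group = re.split(pattern, s)
print(without_group)  # ['', '\r\n', '\r\n']

# Split with a capturing group: the matched elements are kept in the result.
with_group = re.split("(" + pattern + ")", s)
print(with_group)  # ['', '<b>one</b>', '\r\n', '<i>two</i>', '\r\n']

# findall: only the matches themselves, nothing in between.
matches = re.findall(pattern, s)
print(matches)  # ['<b>one</b>', '<i>two</i>']
```

For the case in the question, where the matched text is what you actually want, re.findall is probably the more direct choice.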

fridder
A: 

If you want to use a regex to parse html, see here.

Matt Anderson