Hello everyone. For the sake of this question, I'll include a basic example of what I'm trying to do. I have been looking for a method using regex which would allow me to have an input such as this:

<a>$4<br>.00</a>

and match it so that a single sub-group contains 4.00

I have tried numerous approaches, all along the lines of:

<a>\$([0-9]+<br>\.[0-9]+)</a>
or
<a>\$([0-9]+(?:<br>)\.[0-9]+)</a>
            ^-- Excludes <br> from being placed in a match group, but it does not
                exclude <br> from its parent match group, so we still get 4<br>.00

Both of the methods above match 4<br>.00

My question is: Are there any other Regex operators that allow me to exclude certain sub-expressions from their parent sub-expressions? (Match 4<br>.00 but exclude <br> giving 4.00 in 1 sub-group)

A: 

If you want to use regex, you don't have to do it all in one step; you can break it up. For example, grab the text between <a> and </a> with /<a>(.*?)<\/a>/, save it to a variable, and then strip the remaining tags:

>>> import re
>>> s = "<a>$4<br>.00</a>"
>>> # Step 1: keep only what sits between <a> and </a>
>>> var = re.sub(r"<a>(.*?)</a>", r"\1", s)
>>> var
'$4<br>.00'
>>> # Step 2: strip any remaining tags
>>> re.sub(r"<.*?>", "", var)
'$4.00'
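The two steps can also be wrapped in one small helper, so the caller gets back a single clean value. This is only a sketch: the function name extract_price and the constant names are made up for illustration, and the patterns assume the simple <a>$...</a> shape from the question. Note that a capture group in Python's re module always returns one contiguous slice of the input, which is why the inner tag is removed in a second pass rather than inside the group itself.

```python
import re

# Hypothetical names; the patterns assume the <a>$...</a> shape from the question.
ANCHOR_RE = re.compile(r"<a>\$(.*?)</a>")  # group 1: everything after the "$"
TAG_RE = re.compile(r"<[^>]*>")            # any tag left inside the captured text


def extract_price(html):
    """Return the text inside <a>$...</a> with inner tags removed, or None."""
    match = ANCHOR_RE.search(html)
    if match is None:
        return None
    # The group still holds the inner tag (e.g. "4<br>.00"), so strip it here.
    return TAG_RE.sub("", match.group(1))


print(extract_price("<a>$4<br>.00</a>"))
```

For anything beyond a fixed, simple snippet like this one, an HTML parser is likely a safer choice than regular expressions.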
ghostdog74
This is the problem I'm trying to work around. I could just as well have the two strings returned as 2 sub-groups and concatenate them, but I need to be able to match using only 1 expression, as well as having 4.00 returned as 1 sub-group (ignoring the <br> in between).
Parazuce