I'm trying to figure out the regex for the following:

String</td><td>[number 0-100]%</td><td>[number 0-100]%</td><td>String</td><td>String</td>

Also, some of these td tags may have style attributes at some point. I tried this:

String<.*>

and that returned

String</td>

but trying

String<.*><.*>

returned nothing. Why is this?

+1  A: 
(.+)</td><td>(1?\d?\d)%</td><td>(1?\d?\d)%</td><td>(.+)</td><td>(.+)</td>
Mentalikryst
This is good, but the tags won't always be a plain <td>; sometimes they will have attributes, e.g. <td style=....>
codersarepeople
+1  A: 

Use a negated character class, like <td[^>]*>, which matches both <td> and <td class="abc">.
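
For what it's worth, here is a rough Python sketch of how that character class could be dropped into a full pattern for the row in the question. The sample rows, the names `TD`, `PCT`, `ROW` and the helper `parse_row` are made up for illustration. I also swapped `1?\d?\d` for `100|\d{1,2}`, since `1?\d?\d` would accept values up to 199 as well.

```python
import re

# Hypothetical sample rows; the cell text is invented for illustration.
plain = "Name</td><td>42%</td><td>100%</td><td>foo</td><td>bar</td>"
styled = ('Name</td><td style="color:red">42%</td><td class="abc">100%</td>'
          '<td>foo</td><td>bar</td>')

TD = r"</td><td[^>]*>"   # closing tag, then an opening <td> with optional attributes
PCT = r"(100|\d{1,2})%"  # a number from 0 to 100 followed by a percent sign
ROW = re.compile(
    r"([^<]+)" + TD + PCT + TD + PCT + TD + r"([^<]+)" + TD + r"([^<]+)</td>"
)

def parse_row(text):
    """Return the five captured cell values, or None if the row does not match."""
    m = ROW.search(text)
    return m.groups() if m else None

print(parse_row(plain))
print(parse_row(styled))
```

Using `[^<]+` instead of `.+` for the string cells keeps each group inside its own cell rather than letting it run across tags.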

Nikhil Jain
+1  A: 

You probably shouldn't be trying to use a regex to parse HTML, because that way lies madness.
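
If a parser is an option, something along these lines might do, using Python's standard-library `html.parser`. This is only a sketch: the class name `CellCollector`, the helper `td_cells` and the sample markup are my own assumptions, not anything from the question.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, regardless of its attributes."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # append text to the current cell

def td_cells(html):
    """Return the text content of each <td> in the given markup."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells

# Hypothetical sample row, including an attribute value that contains '>'.
sample = ('<tr><td>Name</td><td style="color:red">42%</td>'
          '<td title="a>b">100%</td><td>foo</td><td>bar</td></tr>')
print(td_cells(sample))
```

The range check on the percentages can then be done on the extracted strings instead of inside a regex.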

michaeltwofish
Nice article :)
Zafer
+1  A: 

Try the following:

(.+)(<[^>]+>){2}(1?\d?\d)%(<[^>]+>){2}(1?\d?\d)%(<[^>]+>){2}(.+)(<[^>]+>){2}(.+)<[^>]+>

EDIT: Although this will work most of the time, it fails if an attribute value in one of the tags contains a > character, because [^>]+ stops at the first > it sees.
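
A small Python sketch of that failure case. The names `NAIVE_TAG`, `QUOTE_AWARE_TAG` and `first_tag` are made up for illustration, and the quote-aware pattern is just one possible workaround, not a complete HTML tag matcher.

```python
import re

NAIVE_TAG = re.compile(r"<[^>]+>")
# Assumption: attribute values are quoted; quoted runs are consumed as a unit
# so that a '>' inside them does not end the tag.
QUOTE_AWARE_TAG = re.compile(r"""<(?:"[^"]*"|'[^']*'|[^>"'])+>""")

def first_tag(pattern, text):
    """Return the tag matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None

simple = '<td class="abc">'
tricky = '<td title="a > b">'

print(first_tag(NAIVE_TAG, simple))        # the whole tag
print(first_tag(NAIVE_TAG, tricky))        # cut short at the '>' inside the attribute
print(first_tag(QUOTE_AWARE_TAG, tricky))  # the whole tag
```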

Zafer
`>` is allowed in an attribute value.
Gumbo
I was writing this as an edit :).
Zafer