I have a small problem: I want to find the foo in <tr><td>3</td><td>foo</td><td>2</td>. I use $<tr><td>\d</td><td>(.*)</td>$ to find it, but it doesn't work, because the pattern matches the </td> at the end of the string instead of the </td> right after foo.

A: 

Use:

^<tr><td>\d</td><td>(.*?)</td>

(insert obligatory comment about not using regex to parse xml)

Senseful
+2  A: 

You have to make the .* lazy instead of greedy. Read more about lazy vs greedy here.
Your end of string anchors ($) also don't make sense. Try:

<tr><td>\d<\/td><td>(.*?)<\/td>

(As seen on rubular.)
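To see the difference concretely, here is a minimal sketch in Python (the question does not say which regex engine is in use, so the choice of Python's re module and the variable names are assumptions). It runs the greedy and the lazy pattern against the row from the question, without the anchors:

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# Greedy: .* grabs as much as possible, so the match ends at the LAST </td>.
greedy = re.search(r"<tr><td>\d</td><td>(.*)</td>", row)

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST </td>.
lazy = re.search(r"<tr><td>\d</td><td>(.*?)</td>", row)

print(greedy.group(1))  # foo</td><td>2
print(lazy.group(1))    # foo
```

In Python a forward slash needs no escaping, so </td> is written as-is; the \/ form is only needed where / delimits the pattern.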

NOTE: I don't advocate using regex to parse HTML. But sometimes the task at hand is simple enough to be handled by regex, and a full-blown XML parser would be overkill (for example: this question). Knowing how to pick the "right tool for the job" is an important skill in programming.

NullUserException
Explain the downvote.
NullUserException
I'm just going to say it wasn't me (even though I did downvote another post for saying HTML isn't regular and should not be parsed with regex). You're actually answering the question. (EDIT: +1 for you)
Platinum Azure
+1 Good answer and thanks for catching my mistake.
Senseful
A: 

Your leading $ should be a ^.

If you don't want to match all the way to the end of the string, don't use a $ at the end. However, since * is greedy, it'll grab as much as it can. Some regex implementations have a non-greedy version (*?) which would work, but you probably just want to change (.*) to ([^<]*), which cannot run past the next tag.
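As a rough illustration of the negated character class, here is a short Python sketch (Python's re module and the variable names are assumptions; the question does not name an engine). Because [^<] cannot consume a <, the capture has to stop where the closing tag begins:

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# [^<]* matches any run of characters that are not '<', so it stops
# at the start of the next tag even though * is still greedy.
match = re.search(r"^<tr><td>\d</td><td>([^<]*)", row)

print(match.group(1))  # foo
```

This form does not depend on lazy quantifiers being available, but it only works while the cell contains no nested tags.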

dash-tom-bang
Indeed, I'm curious what was wrong enough about this answer to demand a downvote. Alas.
dash-tom-bang