This is the sort of HTML string I will be performing matches on:

<span class="q1">+12 Spell Power and +10 Hit Rating</span>

I want to get +12 Spell Power and +10 Hit Rating out of the above HTML. This is the code I wrote:

preg_match('/<span class="q1">(.*)<\/span>/', $gem, $match);

But I think the \/ in <\/span> is escaping the / of </span>, so the match doesn't stop there and I get a lot more data than I want.

How can I escape the / in </span> while still having it part of the pattern?

Thanks.

+2  A: 
  1. Don't use regex to parse HTML.
  2. Use DOM, particularly the loadHTML method and getElementsByTagName('span').


    $doc = new DOMDocument();
    $doc->loadHTML($htmlString);
    $spans = $doc->getElementsByTagName('span');
    // Each matched DOMElement exposes its text via textContent.
    foreach ($spans as $span) {
        echo $span->textContent, "\n";
    }
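
To pick out only the spans with class q1 rather than every span, an XPath query on top of the same DOMDocument is one option. This is a minimal sketch, assuming the class attribute is exactly "q1"; the variable names and the sample string are only illustrative:

```php
<?php
// Sample input taken from the question; $htmlString is an illustrative name.
$htmlString = '<span class="q1">+12 Spell Power and +10 Hit Rating</span>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htmlString);

// Select only <span> elements whose class attribute is exactly "q1".
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//span[@class="q1"]') as $span) {
    $texts[] = $span->textContent;
}

print_r($texts);
```

If the real markup carries several classes on one span, the exact-match test on @class would need to be loosened.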
meder
+2  A: 

Don't use regex to parse HTML. Use an HTML parser. See "Robust, Mature HTML Parser for PHP".

Jason
A: 

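I think your regex is getting more than you want because * is greedy: it matches as much as possible, so (.*) runs on to the last </span> in the string. The \/ is not the problem; it just matches a literal / inside a pattern delimited by /. Instead, use *?, which will match as little as possible: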

preg_match('/<span class="q1">(.*?)<\/span>/', $gem, $match);
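
The difference only shows up when the string contains more than one </span>. Here is a rough sketch with a made-up two-span input (the $gem value below is hypothetical, not the real data); it also shows that switching the delimiter to # avoids escaping the / at all:

```php
<?php
// Hypothetical input with two spans, to make the difference visible.
$gem = '<span class="q1">+12 Spell Power</span><span class="q1">+10 Hit Rating</span>';

// Greedy: (.*) runs on to the LAST </span> in the string.
preg_match('/<span class="q1">(.*)<\/span>/', $gem, $greedy);

// Lazy: (.*?) stops at the FIRST </span>.
preg_match('/<span class="q1">(.*?)<\/span>/', $gem, $lazy);

// Same lazy pattern with # as the delimiter, so / needs no escaping.
preg_match('#<span class="q1">(.*?)</span>#', $gem, $alt);

echo $greedy[1], "\n";
echo $lazy[1], "\n";
echo $alt[1], "\n";
```

With this input the greedy capture includes the closing and opening tags between the two spans, while both lazy patterns capture only the first span's text.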
dvcolgan
That works, thanks. The reason I don't want to use the DOMDocument class is that it's a very small piece of HTML and this code will only be run once; I'm collecting data to put into a database. No need to complicate things. :)
VIVA LA NWO