Stuck on a (rather simple) regex problem in PHP.

Buried in a mess of text is this section:

  &lt;tr&gt;
        &lt;td id=&quot;descriptionArea&quot;&gt;
            Customer request to remove &quot;Intro - 01/13/09&quot; video clip.
            &lt;br/&gt;
        &lt;/td&gt;
    &lt;/tr&gt;

I want whatever is between:

descriptionArea&quot;&gt;

...and...

&lt;/td&gt;

A friend suggested:

$pattern = '&lt;td=&quot;descriptionArea&quot;&gt;\s*(.*?)\s*&lt;';
$clean = preg_replace("'[\n\r\s\t]'","",$text); // to rid of line breaks
preg_match($pattern, $clean, $matches);
print_r($matches);

But I get the following error:

Warning: preg_match() [function.preg-match]: Unknown modifier 'q'

I suppose the second question is whether preg_match is the correct PHP function for this, also. Should I be using ereg instead? Thanks for your help.

+1  A: 

I suspect it's interpreting the ampersands as control characters of some kind. I can't find a reference to support this however.

Try replacing all of the instances of & with [&].

wombleton
+2  A: 

You'll want to escape out the "&", like wombleton says, and also enclose your pattern with forward slashes, like $pattern = "/pattern/";

The below code returns an array with some ugly stuff in it but at least it returns a match.. :)

$description = " &lt;tr&gt;
        &lt;td id=&quot;descriptionArea&quot;&gt;
            Customer request to remove &quot;Intro - 01/13/09&quot; video clip.
            &lt;br/&gt;
        &lt;/td&gt;
    &lt;/tr&gt;";

$pattern = "/[&]lt;td.*[&]quot;descriptionArea[&]quot;[&]gt;\s*(.*?)\s*.*?lt/";
$clean = preg_replace("'[\n\r\s\t]'","",$description); // strips all whitespace (spaces too), not just line breaks

preg_match($pattern, $clean, $matches);
var_dump($matches);

EDIT

Here's a nicer version. Get rid of all the HTML encoding so you can use a standard HTML-parsing regex:

$pattern = '/<.*?id="descriptionArea">(.*?)<\/td>/';
$clean = preg_replace("'[\n\r\t]'","",htmlspecialchars_decode($description)); 
preg_match($pattern, $clean, $matches);
SkippyFlipjack
A: 

If you want to grab the text between two constants, wouldn't it be easier to use good ol' strpos?

EDIT

e.g.

$string = 'text to be >searched< within';
$const1 = '>';
$const2 = '<';
$start = strpos($string, $const1) + strlen($const1); // first character after $const1
$end = strpos($string, $const2, $start);
$result = substr($string, $start, $end - $start); // "searched"

I haven't run it, so it might be buggy, but you should get the idea.
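A slightly fuller sketch of the same idea, applied to the markers from the question. The helper name textBetween and the one-line sample input are my own assumptions for illustration, not something from the original post; the only additions over the snippet above are the checks for a missing marker.

```php
<?php
// Hypothetical helper: returns the text between the first $open marker and
// the next $close marker, or null if either marker is missing.
function textBetween(string $haystack, string $open, string $close): ?string
{
    $start = strpos($haystack, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);                    // skip past the opening marker
    $end = strpos($haystack, $close, $start);   // search only after the opener
    if ($end === false) {
        return null;
    }
    return substr($haystack, $start, $end - $start);
}

// Assumed sample input: the question's cell, entity-encoded, on one line.
$text = '&lt;td id=&quot;descriptionArea&quot;&gt; Customer request &lt;br/&gt; &lt;/td&gt;';
$between = textBetween($text, 'descriptionArea&quot;&gt;', '&lt;/td&gt;');
$missing = textBetween($text, 'noSuchMarker', '&lt;/td&gt;');

echo trim($between), "\n";
```

No regex is involved, so there are no delimiters or modifiers to get wrong. The trade-off is that the markers have to match the text exactly.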

ya23
A: 

What if you used the following for $pattern?

$pattern = '(?s:descriptionArea&quot;&gt;(.*)&lt;/td&gt;)';

I don't know PHP, but the RegEx appears to work within Regular Expression Designer when I tested it. The option of (?s:) is 'Singleline On'.
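For PHP specifically, the pattern probably needs its own delimiters around it, since preg_match would otherwise try to read the outer parentheses as the delimiter pair. A minimal sketch, assuming "~" as the delimiter and a made-up multi-line sample input:

```php
<?php
// Assumed sample input: entity-encoded markup spread over several lines.
$text = "&lt;td id=&quot;descriptionArea&quot;&gt;\n"
      . "Customer request\n"
      . "&lt;/td&gt;";

// "~" is an arbitrary delimiter choice; it does not occur in the pattern,
// so the "/" in "&lt;/td&gt;" needs no escaping.
// (?s: ... ) switches on "dot matches newline" for the enclosed group only.
$pattern = '~(?s:descriptionArea&quot;&gt;(.*)&lt;/td&gt;)~';

$found = preg_match($pattern, $text, $matches);
echo trim($matches[1]), "\n";
```

Note that (.*) is greedy, so with several cells in the input it would run to the last &lt;/td&gt;. Using (.*?) stops at the first one.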

Mark

lordhog
+3  A: 

When using the preg_* functions, the first character of the pattern is treated as the delimiter:

The expression must be enclosed in the delimiters, a forward slash (/), for example. Any character can be used for delimiter as long as it's not alphanumeric or backslash (\). If the delimiter character has to be used in the expression itself, it needs to be escaped by backslash. Since PHP 4.0.4, you can also use Perl-style (), {}, [], and <> matching delimiters.
Regular Expressions (Perl-Compatible) – Introduction

So you don’t need to escape or replace the & characters as others said. Instead use proper delimiters and escape those characters inside the expression:

'/&lt;td id=&quot;descriptionArea&quot;&gt;(.*?)&lt;\/td&gt;/'
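A minimal sketch of that pattern in use. The sample text is the question's snippet as I assume it looks (still entity-encoded), and the trailing "s" modifier is my addition so that "." also matches the line breaks inside the cell; without it you would have to strip the line breaks first.

```php
<?php
// Assumed input: the question's snippet, entity-encoded, with line breaks.
$text = "&lt;tr&gt;\n"
      . "    &lt;td id=&quot;descriptionArea&quot;&gt;\n"
      . "        Customer request to remove &quot;Intro - 01/13/09&quot; video clip.\n"
      . "        &lt;br/&gt;\n"
      . "    &lt;/td&gt;\n"
      . "&lt;/tr&gt;";

// "/" delimits the pattern, so the "/" in "&lt;/td&gt;" is escaped.
// The "s" after the closing delimiter makes "." match newlines as well.
$pattern = '/&lt;td id=&quot;descriptionArea&quot;&gt;(.*?)&lt;\/td&gt;/s';

$description = null;
if (preg_match($pattern, $text, $matches) === 1) {
    // Decode the entities and drop the surrounding whitespace.
    $description = trim(htmlspecialchars_decode($matches[1]));
    echo $description, "\n";
}
```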
Gumbo
Note also that the OP's regex was incomplete: it started "&lt;td=&quot;" instead of "&lt;td id=&quot;". But the main problem was the delimiters, and this is the first answer to correctly address that issue.
Alan Moore
A: 

The specific error you are getting comes from preg_* functions using the first character of the pattern as a delimiter (in this case "&"), and everything after the second occurrence of the delimiter as modifiers (such as "i" for case-insensitivity.)

In this case, it thinks you are looking for lt;td= and you want modifiers quot;descriptionArea&quot;&gt;\s*(.*?)\s*&lt;. The first modifier "q" does not make sense, and it bails.
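A small sketch of both cases, assuming a one-line entity-encoded input and "~" as an arbitrary delimiter; the variable names are made up for the demo.

```php
<?php
// Assumed sample input, entity-encoded and on one line.
$text = '&lt;td id=&quot;descriptionArea&quot;&gt;Customer request&lt;/td&gt;';

// No explicit delimiters: "&" becomes the delimiter, the pattern proper is
// "lt;td id=", and everything after the second "&" is read as modifiers.
$bad = '&lt;td id=&quot;descriptionArea&quot;&gt;(.*?)&lt;';
$badResult = @preg_match($bad, $text, $ignored); // @ hides the warning here
var_dump($badResult);                            // false: pattern did not compile

// Same expression wrapped in "~": nothing after the closing "~", so there
// are no modifiers to misread, and "/" needs no escaping.
$good = '~&lt;td id=&quot;descriptionArea&quot;&gt;(.*?)&lt;/td&gt;~';
$goodResult = preg_match($good, $text, $m);
echo $m[1], "\n";
```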

Mike Boers