I am using preg_replace() for some string replacement.

$str = "<aa>Let's find the stuff qwe in between <id>12345</id> these two previous brackets</h>";

$do = preg_match("/qwe(.*)12345/", $str, $matches);

which is working just fine and gives the following result

$matches[0] = "qwe in between <id>12345"
$matches[1] = " in between <id>"

But I am using the same logic to extract from the following string:

<text>
  <src><![CDATA[<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Arial" SIZE="36" COLOR="#999999" LETTERSPACING="0" KERNING="0">r1 text 1  </FONT></P></TEXTFORMAT>]]></src>
  <width>45%</width>
  <height>12%</height>
  <left>30.416666666666668%</left>
  <top>3.0416666666666665%</top>
  <begin>2s</begin>
  <dur>10s</dur>
  <transIn>fadeIn</transIn>
  <transOut>fadeOut</transOut>
  <id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
</text>

I want to extract the string from

r1text1

to

</id>

The regular expression I currently have is:

preg_match('/r1text1(.*)</id\>/', $metadata, $matches);

where $metadata is the above string.

$matches does not contain anything, for some reason. How do I do it? Thanks in advance.

A: 

You probably need to parse your string/file and extract the value inside the FONT tag, then insert that value into the id tag.

Try googling for PHP parsing.

Konstantinos
+2  A: 

If you want to extract the text, you will probably want to use preg_match. The following might work:

preg_match('#\<P[^\>]*\>\<FONT[^\>]*\>(.*\</id\>)#', $string, $matches)

Whatever gets matched in the parentheses can be found later in the $matches array. In this case that is everything between a <P> tag followed by a <FONT> tag and </id>, including the latter.

The above regex is untested but might give you a general idea of how to do it. Adapt it if your needs are a bit different :)

Joey
Hi Johannes. Thanks for the snippet. It did not seem to work. Also, I didn't get why we need to worry about <p><font>. Can we just pick from <text>? I need to replace the string "r1text1" based on the unique identifier <id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id> for various other data sets.
I hope that gives you more direction on what I am trying to do. Thanks in advance for helping me out. :) :)
If you use / as the character to delimit your regex (as you have done in your edit), you need to escape any literal / with \/ (such as the one in </id>). That's why I used # :)
Joey
And sure you can just get from <text> to </id> but that wasn't what you wanted in your question. The best way would probably be to just let PHP parse the XML (there are XML parsers somewhere) and then examine the DOM. Parsing XML with regular expressions is really a PITA.
Joey
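As a rough illustration of the parser-based approach suggested above, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). The <root> wrapper, the shortened CDATA payload, and the 'replacement text' string are assumptions made up for the example; only the element names and the id value come from the question.

```php
<?php
// Hypothetical, shortened version of the XML from the question.
$xml = <<<'XML'
<root>
  <text>
    <src><![CDATA[<P ALIGN="LEFT"><FONT FACE="Arial">r1 text 1  </FONT></P>]]></src>
    <id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
  </text>
</root>
XML;

$wantedId = 'E2159292994B083ACA7ABC7799BBEF3F7198FFA2';

$doc = new DOMDocument();
$doc->loadXML($xml);

foreach ($doc->getElementsByTagName('text') as $text) {
    // Find the <text> block whose <id> matches the one we are looking for.
    $idNode = $text->getElementsByTagName('id')->item(0);
    if ($idNode === null || trim($idNode->textContent) !== $wantedId) {
        continue;
    }

    $src = $text->getElementsByTagName('src')->item(0);

    // To the XML parser the CDATA payload is plain text, so a simple
    // string replacement is enough here.
    $updated = str_replace('r1 text 1', 'replacement text', $src->textContent);

    // Swap the old CDATA node for a new one holding the updated payload.
    while ($src->firstChild) {
        $src->removeChild($src->firstChild);
    }
    $src->appendChild($doc->createCDATASection($updated));
}

$result = $doc->saveXML();
echo $result;
```

Selecting the block by its <id> element this way does not depend on line breaks or on the exact spacing inside the markup, which is where the regex attempts run into trouble.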
A: 

Try this:

preg_match('/r1text1(.*)<\/id\>/', $metadata, $matches);

You are using / as the pattern delimiter, but your pattern also contains a literal / in </id>. You can use \ as the escape character.
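
To illustrate the delimiter issue, here is a small sketch; the $subject string is a made-up example, not the data from the question. The first two patterns are equivalent, and the third shows that preg_quote() can do the escaping when the closing tag comes from a variable.

```php
<?php
// Hypothetical one-line subject, so that . does not need to cross newlines.
$subject = "r1text1 some data <id>12345</id>";

// / as the delimiter: the literal / in </id> must be escaped.
$a = preg_match('/r1text1(.*)<\/id>/', $subject, $m1);

// # as the delimiter: the / needs no escaping.
$b = preg_match('#r1text1(.*)</id>#', $subject, $m2);

// preg_quote() escapes regex metacharacters, plus the delimiter
// passed as its second argument.
$c = preg_match('/r1text1(.*)' . preg_quote('</id>', '/') . '/', $subject, $m3);

var_dump($m1[1] === $m2[1] && $m2[1] === $m3[1]); // bool(true)
```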

uuɐɯǝʃǝs
A: 

In the sample you have "r1 text 1 ", yet your regular expression has "r1text1". The regular expression doesn't match because there are spaces in the string you are trying to match it against. You should include the spaces in the regular expression.

Rich Adams
+1  A: 

Even if I don't know why you would match the regex against an incomplete XML fragment (starting inside a <![CDATA[ section and ending right before the closing XML tag </id>), you do have three obvious problems with your regex:

  1. As Amri said: you have to escape the / character in the closing XML tag because you use / as the pattern delimiter. By the way, you don't have to escape the > character. That gives you: '/r1text1(.*)<\/id>/' Alternatively you can change the pattern delimiter to # for example: '#r1text1(.*)</id>#' (I will use the first pattern to further develop the expression).

  2. As Rich Adams already said: the text in your example data is "r1_text_1" (_ is a space character) but you match against '/r1text1(.*)<\/id>/'. You have to include the spaces in your regex or allow for an arbitrary number of spaces, such as '/r1(?:\s*)text(?:\s*)1(.*)<\/id>/' (the ?: is the syntax for non-capturing subpatterns).

  3. The . (dot) in your regex does not match newlines by default. You have to add the s (PCRE_DOTALL) pattern modifier to let the . (dot) match against newlines as well: '/r1(?:\s*)text(?:\s*)1(.*)<\/id>/s'
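
Putting the three fixes above together, a minimal sketch might look like the following. The $metadata value is a shortened, hypothetical stand-in for the XML in the question, and the \s* groups are written without the optional (?: ) wrapper.

```php
<?php
// Shortened, hypothetical stand-in for $metadata from the question.
$metadata = <<<'XML'
<text>
  <src><![CDATA[<FONT FACE="Arial">r1 text 1  </FONT>]]></src>
  <width>45%</width>
  <id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
</text>
XML;

// Fix 1: the literal / in </id> is escaped because / is the delimiter.
// Fix 2: \s* tolerates the spaces in "r1 text 1".
// Fix 3: the s modifier (PCRE_DOTALL) lets . match newlines.
$found = preg_match('/r1\s*text\s*1(.*)<\/id>/s', $metadata, $matches);

var_dump($found); // int(1)
echo $matches[1];
```

Dropping any one of the three fixes should make preg_match() either fail to compile the pattern or return 0 for this input.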

Stefan Gehrig