I have an HTML file where I'd like to get all the text inside these two tags:

<div class="articleTitle">
</div>

I'm not sure how to write the PHP regex. (I know there are no HTML tags inside the div, so nested tags are not a concern.)

Update: when I try the solutions given, I get this: Warning: preg_match() [function.preg-match]: Unknown modifier 'd' on line 29

A: 

This would be more correct, as other solutions would also match an empty <div class="articleTitle"></div> by itself, which is probably undesirable.

preg_match('/<div class="articleTitle">(.+?)<\/div>/', $test_string, $matches);
GONeale
This will only capture a single character; if there is more than one (even whitespace), it will not match at all, and nothing will be captured.
Unkwntech
Wow, is it that easy?
That expression does not capture the text inside; it only matches it. And since it's greedy, it may match the wrong closing tag.
Török Gábor
@raj No; as I said in my comment, this will only capture a single character, and there must not be any other characters in the text.
Unkwntech
I was going to ask whether it needed to be greedy or not.
GONeale
@Unkwntech actually, it will match 0 to unlimited characters, which is probably wrong anyway. We both should be using '.+?'
GONeale
@GONeale thanks
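A minimal sketch of the `.+?` versus `.*?` point discussed above, using made-up sample strings. It shows that `.+?` refuses an empty div, while `.*?` matches it and captures an empty string. It also shows a less obvious consequence: when another `</div>` follows on the same line, `.+?` does not simply fail on an empty title but runs on to that next closing tag.

```php
<?php
// Made-up sample inputs for illustration.
$filled  = '<div class="articleTitle">Hello world</div>';
$empty   = '<div class="articleTitle"></div>';
$twoDivs = '<div class="articleTitle"></div><div>x</div>';

$plus = '/<div class="articleTitle">(.+?)<\/div>/'; // one or more, lazy
$star = '/<div class="articleTitle">(.*?)<\/div>/'; // zero or more, lazy

preg_match($plus, $filled, $m1);
echo $m1[1], "\n";                       // Hello world

echo preg_match($plus, $empty), "\n";    // 0: an empty div does not match
preg_match($star, $empty, $m2);
var_dump($m2[1]);                        // string(0) ""

// With another </div> later on the same line, .+? does not fail;
// it skips past the empty title to the next closing tag.
preg_match($plus, $twoDivs, $m3);
echo $m3[1], "\n";                       // </div><div>x
```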
+4  A: 
preg_match('/<div class="articleTitle">(.*?)<\/div>/i', $source, $matches);
print_r($matches);

This is the "Explanation" from RegexBuddy:

<div class="articleTitle">(.*?)</div>

Options: case insensitive

Match the characters “<div class="articleTitle">” literally «<div class="articleTitle">»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is not a line break character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “</div>” literally «</div>»

Created with RegexBuddy

(.*?) captures everything between what comes before it and what comes after it, and the captured text is placed in the $matches array (as $matches[1]).

I assumed that the HTML is in the $source variable.
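A small usage sketch for the code above, assuming `$source` holds a made-up HTML snippet (the file name in the commented-out line is hypothetical). `$matches[0]` holds the whole matched tag and `$matches[1]` holds only the title text. If the page can contain several titles, `preg_match_all()` collects all of them instead of stopping at the first.

```php
<?php
// Made-up HTML standing in for the real file contents.
$source = '<html><body>'
        . '<div class="articleTitle">My First Article</div>'
        . '<div class="body">Some text</div>'
        . '</body></html>';
// To read a real file instead (hypothetical path):
// $source = file_get_contents('article.html');

if (preg_match('/<div class="articleTitle">(.*?)<\/div>/i', $source, $matches)) {
    // $matches[0] is the full match, $matches[1] is the first capture group.
    echo $matches[1], "\n"; // My First Article
} else {
    echo "No title found\n";
}

// Several titles: preg_match_all() puts every group-1 capture in $all[1].
$html = '<div class="articleTitle">One</div><div class="articleTitle">Two</div>';
preg_match_all('/<div class="articleTitle">(.*?)<\/div>/i', $html, $all);
echo implode(', ', $all[1]), "\n"; // One, Two
```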

I suggest you look into RegexBuddy; it costs US$39.95, but it is worth every penny. It can help you build regexes for almost every major regex flavor, and it can help you learn regex.

Unkwntech
Just fixed my preg_match: I was missing the / delimiters and the case-insensitive flag.
Unkwntech
When I try that, I get this: Warning: preg_match() [function.preg-match]: Unknown modifier 'd' on line 29
Escape the forward slash in '</div>', i.e. write '<\/div>' instead.
Török Gábor
Got it to work, thanks.
Thanks, Torok; I didn't test the code, so that slipped by. I'll update the answer.
Unkwntech
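To show why the warning above mentions the modifier 'd', here is a sketch with a made-up input. With `/` as the delimiter, the slash in `</div>` ends the pattern early, and PHP reads the remaining `div>/` as modifiers, starting with the unknown `d`. The broken call is left commented out; the three working variants (escaping the slash, picking another delimiter, or letting `preg_quote()` escape the literal parts) all capture the same text.

```php
<?php
$html = '<div class="articleTitle">Escaping test</div>';

// Broken: the "/" in "</div>" closes the pattern early, so PHP treats
// "div>/" as modifiers and warns about the unknown modifier 'd'.
// preg_match('/<div class="articleTitle">(.*?)</div>/', $html, $m);

// Fix 1: escape the slash inside the pattern.
preg_match('/<div class="articleTitle">(.*?)<\/div>/', $html, $m1);

// Fix 2: use a delimiter that does not appear in the pattern.
preg_match('#<div class="articleTitle">(.*?)</div>#', $html, $m2);

// Fix 3: let preg_quote() escape the literal parts, passing the delimiter.
$open  = preg_quote('<div class="articleTitle">', '/');
$close = preg_quote('</div>', '/');
preg_match('/' . $open . '(.*?)' . $close . '/', $html, $m3);

echo $m1[1], ' | ', $m2[1], ' | ', $m3[1], "\n"; // Escaping test | Escaping test | Escaping test
```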
A: 
'/<div class="articleTitle">(.*?)<\/div>/'

This would generally work; however, if you need to account for other possible attributes in the div tag, it gets a little more complex.

Tim Lytle
That expression does not capture the text inside; it only matches it.
Török Gábor
Yes, I missed that; add the () to capture. I was just trying to give the basic idea; there are other, far better (and a little harder to parse) examples among the answers.
Tim Lytle
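As a rough sketch of the "other attributes" case, assuming the class attribute is double-quoted and holds exactly `articleTitle` (a class list such as `class="big articleTitle"` would not match, and an attribute like `data-class="articleTitle"` would wrongly be accepted), the opening tag can allow anything except `>` before and after the class attribute. The inputs are made up.

```php
<?php
// Allow other attributes before or after class="articleTitle".
// Assumption: double-quoted value equal to exactly "articleTitle".
$pattern = '#<div\b[^>]*\bclass="articleTitle"[^>]*>(.*?)</div>#is';

$samples = [
    '<div class="articleTitle">Plain</div>',
    '<div id="t1" class="articleTitle">Id first</div>',
    '<div class="articleTitle" data-x="1">Attr after</div>',
    "<div\n  class=\"articleTitle\">Newline in tag</div>",
];

$found = [];
foreach ($samples as $html) {
    if (preg_match($pattern, $html, $m)) {
        $found[] = $m[1]; // text between the opening and closing tag
    }
}
echo implode("\n", $found), "\n";
```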
+2  A: 

Wrong answers!

preg_match('#<div\s+[^>]*class="articleTitle"[^>]*>(.*)</\s*div>#ims', $str, $matches);
  1. DIV can be empty, so patterns like (.+) are wrong.
  2. You should use the "m" modifier: content can be multiline.
  3. You should use the "s" modifier so that the dot metacharacter also matches newlines.
  4. I just wonder: why escape the slash when patterns in PHP can use ANY delimiter? I usually use # as the delimiter in this case.
  5. DIV can have additional attributes and/or space characters (including newlines).

Sorry, I haven't had time to test the pattern thoroughly, but it seems to be correct and should work in any case.
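A quick check of the pattern above against the points in the list, using made-up inputs that each contain a single div: an empty div, multiline content, and extra attributes plus whitespace in both tags. For comparison, the same pattern without the `s` modifier fails on the multiline input, because the dot then stops at the newline.

```php
<?php
// The pattern from the answer above, unchanged.
$pattern = '#<div\s+[^>]*class="articleTitle"[^>]*>(.*)</\s*div>#ims';
// Same pattern without the "s" modifier, for comparison.
$noS = '#<div\s+[^>]*class="articleTitle"[^>]*>(.*)</\s*div>#im';

// Made-up inputs, each containing a single div.
$cases = [
    'empty'     => '<div class="articleTitle"></div>',
    'multiline' => "<div class=\"articleTitle\">Line one\nLine two</div>",
    'attrs'     => "<div\n id=\"a\" class=\"articleTitle\" >Title</ div>",
];

$found = [];
foreach ($cases as $name => $html) {
    $found[$name] = preg_match($pattern, $html, $m) ? $m[1] : null;
}
var_dump($found);

// Without "s", the dot stops at the newline, so the multiline case fails.
echo preg_match($noS, $cases['multiline']), "\n"; // 0
```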

PS: GONeale, about greediness: the pattern must be greedy, and it IS greedy without the "U" modifier.
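To illustrate what greediness changes in practice, here is a sketch on a made-up input containing two divs. With the `s` modifier, greedy `(.*)` extends to the last `</div>` in the string, while lazy `(.*?)` stops at the first one. The `U` modifier inverts greediness for the whole pattern, so `(.*)` under `U` behaves like `(.*?)`.

```php
<?php
// Made-up input: a title div followed by another div.
$html = "<div class=\"articleTitle\">Title</div>\n<div class=\"body\">Body</div>";

$greedy   = '#<div\s+[^>]*class="articleTitle"[^>]*>(.*)</\s*div>#is';
$lazy     = '#<div\s+[^>]*class="articleTitle"[^>]*>(.*?)</\s*div>#is';
$ungreedy = '#<div\s+[^>]*class="articleTitle"[^>]*>(.*)</\s*div>#isU'; // U flips greediness

preg_match($greedy, $html, $g);   // runs to the LAST </div>
preg_match($lazy, $html, $l);     // stops at the FIRST </div>
preg_match($ungreedy, $html, $u); // same as lazy because of U

echo $g[1], "\n---\n", $l[1], "\n---\n", $u[1], "\n";
```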

Jet
Welcome, new challenger!
GONeale
I had a look at the patterns again and noticed that (.+?) should work, but why use 2 quantifiers when together they are equivalent to "*"? Or am I missing something, and they work differently? PS: thanks, GONeale, but why "challenger"?
Jet
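On the question above: in PCRE, a `?` placed directly after a quantifier does not add a second quantifier; it makes that quantifier lazy. So `.+?` means "one or more characters, as few as possible". A sketch on made-up inputs comparing it with `(?:.+)?`, which really is `+` made optional by a separate `?`, and with `.*`: the optional form behaves like `.*`, while the lazy one does not.

```php
<?php
// "+?" is ONE lazy quantifier: one or more, as few as possible.
$lazyPlus = '#<div class="articleTitle">(.+?)</div>#s';
// "(?:.+)?" really is two quantifiers: one or more, made optional (greedy).
$optPlus  = '#<div class="articleTitle">((?:.+)?)</div>#s';
// "*": zero or more, greedy.
$star     = '#<div class="articleTitle">(.*)</div>#s';

// Made-up inputs.
$empty = '<div class="articleTitle"></div>';
$two   = '<div class="articleTitle">A</div><div>B</div>';

$results = [];
foreach (['lazyPlus' => $lazyPlus, 'optPlus' => $optPlus, 'star' => $star] as $name => $p) {
    $results[$name] = [
        // null means the pattern did not match at all
        'empty' => preg_match($p, $empty, $m) ? $m[1] : null,
        'two'   => preg_match($p, $two, $m) ? $m[1] : null,
    ];
}
print_r($results);
```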