I want to parse HTML content that looks something like this:

<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>

I need to capture just the "Lorem<br> <b>Ipsun</b>" inside the first div. How can I achieve this?

PS: The HTML inside the first div spans multiple lines; it's an article.

Thanks

A: 

Assuming that the id is known:

preg_match('#<div id="sometext">(.*?)</div>#s', $text, $match); // the s modifier lets . match newlines too
echo $match[1]; // the captured inner HTML of the div
kemp
This would not work if the `div` has other attributes besides `id`.
Felix Kling
It also won't work if the `div` changes to a `<p>`, so what? I'm sticking to the question.
kemp
This works for me... thanks!
rizidoro
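A sketch of a slightly more tolerant regex, addressing the attribute comment above. The helper name `innerHtmlByRegex()` is hypothetical. It accepts other attributes on the `div` and keeps multi-line content via the `s` modifier, but the lazy `(.*?)` still stops at the first `</div>`, so a nested `div` inside the article cuts the match short.

```php
<?php
// Hypothetical helper: a regex variant that tolerates extra attributes on the div.
// It still assumes the wanted content contains no nested </div>.
function innerHtmlByRegex(string $html, string $id): ?string
{
    // [^>]* allows other attributes before or after id="..."; the s modifier lets . match newlines.
    $pattern = '#<div\b[^>]*\bid="' . preg_quote($id, '#') . '"[^>]*>(.*?)</div>#s';
    return preg_match($pattern, $html, $match) === 1 ? $match[1] : null;
}

// Multi-line content and an extra class attribute are both matched.
$article = "<div class=\"post\" id=\"sometext\">Lorem<br>\n<b>Ipsun</b></div><div id=\"block\">lorem2</div>";
echo innerHtmlByRegex($article, 'sometext'), "\n";

// Limitation: the lazy (.*?) stops at the first </div>, so this prints "a<div>b".
$nested = '<div id="sometext">a<div>b</div>c</div>';
echo innerHtmlByRegex($nested, 'sometext'), "\n";
```

If the article can contain its own `div` elements, a real HTML parser is the safer route.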
+4  A: 

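Parsing HTML with regular expressions is fragile, because HTML isn't a regular language: arbitrarily nested elements, such as a `div` inside the article, can't be matched reliably by a pattern like `(.*?)</div>`. A better alternative is to use an HTML parser such as Simple HTML DOM or PHP's built-in DOM extension.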
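With PHP's built-in DOM extension, a minimal sketch could look like the following. `DOMDocument::loadHTML()`, `DOMXPath` and `DOMDocument::saveHTML()` belong to that extension; the helper name `innerHtmlById()` is hypothetical. Because the parser builds a real tree, serializing each child of the matched element rebuilds its inner HTML even when it contains nested `div`s.

```php
<?php
// Minimal sketch using PHP's built-in DOM extension.
// innerHtmlById() is a hypothetical helper name, not a library function.
function innerHtmlById(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Find the first element with the given id (ids containing '"' are not handled).
    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return null;
    }

    // Serialize each child node to rebuild the element's inner HTML.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$input = '<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>';
echo innerHtmlById($input, 'sometext'), "\n";
```

`loadHTML()` treats input as ISO-8859-1 unless the markup declares a charset, so a UTF-8 article may need a charset declaration prepended.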

Simple HTML DOM Example:

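include 'simple_html_dom.php'; // Simple HTML DOM library file (path assumed)
$html = str_get_html('<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>');
// find() with index 0 returns the first matching element; innertext is its inner HTML
echo $html->find('div[id=sometext]', 0)->innertext;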
Yacoby
@Yacoby, thanks for recommending this library. I think it's great and it solves the OP's issue with the snap of a finger :)
macek