Since the line number is what matters to you here and not the actual contents of the div, I'd be inclined not to use regex at all. I'd probably explode() the string into an array and loop through that array looking for your marker. Like so:
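<?php
$myContent = "[your string of html here]";
$myArray = explode("\n", $myContent); // one array element per line
$arraylen = count($myArray); // count once, not on every iteration
$lineNo = 0; // stays 0 if the marker is never found
for ($i = 0; $i < $arraylen; $i++)
{
    // strpos() returns false when the marker is absent but 0 for a match
    // at the very start of the line, hence the strict !== comparison
    $pos = strpos($myArray[$i], 'id="Alpha"');
    if ($pos !== false)
    {
        $lineNo = $i + 1; // array indexes start at 0, line numbers at 1
        break;
    }
}
?>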
Disclaimer: I haven't got a PHP installation readily available to test this, so some debugging may be required.
Hope this helps. I think implementing a parsing engine would probably be a waste of your time for something so simple, especially if it's a one-off.
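If you'd rather skip the loop, the same idea can be sketched by finding the marker in the whole string and counting the newlines in front of it. This is only an alternative sketch, not something from the question: findMarkerLine() is a made-up helper name, the sample HTML is invented, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the 1-based line number of the first
// occurrence of $marker in $content, or 0 if the marker is absent.
function findMarkerLine(string $content, string $marker): int
{
    $pos = strpos($content, $marker);
    if ($pos === false) {
        return 0;
    }
    // Every "\n" before the match ends one earlier line.
    return substr_count(substr($content, 0, $pos), "\n") + 1;
}

// Invented sample input; the div opens on the third line.
$html = "<html>\n<body>\n<div id=\"Alpha\">\nHello\n</div>\n</body>\n</html>";
$lineNo = findMarkerLine($html, 'id="Alpha"');
echo $lineNo, "\n"; // 3
```

Like the loop, it returns 0 when the marker is missing, so you can tell "not found" apart from line 1.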
Edit: if the content is important to you at this stage too, you can use this in combination with the other answers, which provide an adequate regex for the job.
Edit #2: Oh what the hey... here's my two cents:
"/<div[^>]*id=\"Alpha\"[^>]*>(?:<div.*?<\/div>|.)*?<\/div>/s"
The <div.*?<\/div> alternative tells the regex engine that it may find nested div tags and should incorporate them into the match if it finds them, rather than just stopping at the first </div>. The [^>]* parts keep the id check inside the opening tag. However, this only solves the problem if there is only one level of nesting. If there's more, then regex is not for you, sorry :(.
The /s modifier also makes . match linebreaks (/m would only change where ^ and $ match), so you don't have to dirty up your expressions with [\S\s] everywhere.
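Here's roughly how a one-level-nesting pattern could be tried out together with the line number lookup. Treat it as a sketch: the sample HTML is invented, the pattern is one possible variation (at each step it consumes either a whole nested div or a single character), and the array destructuring assumes PHP 7.1 or later.

```php
<?php
// Invented input: the target div holds one nested div and spans several lines.
$html = "<p>intro</p>\n<div id=\"Alpha\">\n  before\n  <div>inner</div>\n  after\n</div>\n<p>outro</p>";

// Lazily consume either a whole nested div or a single character until the
// closing </div>. The s modifier lets "." match newlines.
$pattern = '/<div[^>]*id="Alpha"[^>]*>(?:<div.*?<\/div>|.)*?<\/div>/s';

$block = null;
$lineNo = 0;
if (preg_match($pattern, $html, $matches, PREG_OFFSET_CAPTURE) === 1) {
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    [$block, $offset] = $matches[0];
    // Newlines before the match tell us which line it starts on.
    $lineNo = substr_count(substr($html, 0, $offset), "\n") + 1;
}

echo $lineNo, "\n"; // 2
echo $block, "\n";  // the whole Alpha div, including the nested one
```

With two or more levels of nesting the match would end too early, which is the limitation mentioned above.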
Again, sorry, I've no environment to test this in at the moment so you may need to debug.
Cheers
Iain