Before anybody asks, I am not doing any kind of screen scraping.
I'm trying to parse an HTML string to find a div with a certain id. I cannot for the life of me get this to work. The following expression worked in one instance but not in another, and I'm not sure whether extra elements in the HTML are the cause.
<div\s*?id=(\""|"|")content(\""|"|").*?>\s*?(?>(?! <div\s*?> | </div> ) | <div\s*?>(?<DEPTH>) | </div>(?<-DEPTH>) | .?)*(?(DEPTH)(?!))</div>
It finds the first div with the right id correctly, but it then stops at the first closing div it encounters rather than at the closing div that matches the one it opened with.
<div id="firstdiv">begining content<div id="content">some other stuff
<div id="otherdiv">other stuff here</div>
more stuff
</div>
</div>
This should bring back
<div id="content">some other stuff
<div id="otherdiv">other stuff here</div>
more stuff
</div>
, but for some reason it does not. It is bringing back:
<div id="content">some other stuff
<div id="otherdiv">other stuff here</div>
Does anybody have an easier expression to handle this?
To clarify, this is in .NET, and DEPTH is not a keyword but the name of a group I'm using with .NET's balancing group syntax. You can find more details here.
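To make the expected result concrete, the matching could be sketched procedurally as a depth counter instead of a single expression. This is only a rough C# illustration of the intended behavior, not the expression in question: the `DivExtractor` class and `ExtractDiv` helper are hypothetical names, and the sketch assumes a double-quoted id and does not account for comments, scripts, or malformed markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Matches any opening <div ...> tag or any closing </div> tag.
    private static readonly Regex DivTag =
        new Regex(@"<div\b[^>]*>|</div>", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the div with the given id, up to and
    // including its matching closing tag, or null if none is found.
    public static string ExtractDiv(string html, string id)
    {
        // Locate the opening tag that carries the wanted (double-quoted) id.
        Match open = Regex.Match(
            html,
            @"<div\b[^>]*\bid=""" + Regex.Escape(id) + @"""[^>]*>",
            RegexOptions.IgnoreCase);
        if (!open.Success) return null;

        int depth = 1; // counts the opening tag we just found
        Match tag = DivTag.Match(html, open.Index + open.Length);
        while (tag.Success)
        {
            // Closing tags start with "</"; anything else is an opening tag.
            depth += tag.Value.StartsWith("</", StringComparison.Ordinal) ? -1 : 1;
            if (depth == 0)
            {
                int end = tag.Index + tag.Length;
                return html.Substring(open.Index, end - open.Index);
            }
            tag = tag.NextMatch();
        }
        return null; // input ended before the div was closed
    }
}
```

Called as `DivExtractor.ExtractDiv(html, "content")` on the sample above, the counter goes to 2 at the `otherdiv` opening tag, back to 1 at its closing tag, and reaches 0 only at the closing tag that belongs to the `content` div, which is the span I expect the expression to return.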