Is it possible/practical to build a single regular expression that matches hierarchical data?
For example:
<h1>Action</h1>
<h2>Title1</h2><div>data1</div>
<h2>Title2</h2><div>data2</div>
<h1>Adventure</h1>
<h2>Title3</h2><div>data3</div>
I would like to end up with matches.
"Action", "Title1", "data1"
"Action", "Title2", "data2"
"Adventure", "Title3", "data3"
As I see it this would require knowing that there is a hierarchical structure at play here and if I code the pattern to capture the H1, it only matches the first entry of that hierarchy. If I don't code for H1 then I can't capture it. Was wondering if there are any special tricks I an employ to solve this.
This is a .NET project.