I was wondering if somebody could help me use string split to get all occurrences of text in between <p> and </p> tags in an HTML document?
views: 171
answers: 5
+2
A:
That's rather a large problem for String.Split(). I'd recommend using an XML parser instead.
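As a rough illustration of the XML-parser route, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It assumes the document is well-formed XML (for example XHTML); ordinary HTML with unclosed tags will make the parser raise an error. The names SAMPLE and paragraph_texts are made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the markup is well-formed XML (XHTML). Real-world HTML often is not.
SAMPLE = """<html><body>
<p>First paragraph.</p>
<div><p class="article">Second, with <b>bold</b> text.</p></div>
</body></html>"""

def paragraph_texts(xml_source):
    """Return the text of every <p> element, including text inside child tags."""
    root = ET.fromstring(xml_source)
    # iter("p") walks the whole tree, so attributes and nesting depth do not matter.
    return ["".join(p.itertext()) for p in root.iter("p")]

print(paragraph_texts(SAMPLE))
```

Because the parser works on elements rather than raw characters, a tag such as <p class="article"> needs no special handling.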
harriyott
2009-11-18 15:42:37
It doesn't even have to get the whole lot. Maybe just four or five occurrences? Would that be cool? ...XML doesn't like me :(
baeltazor
2009-11-18 15:43:38
Ah, OK. Is there any chance the opening <p> tag will have attributes (e.g. <p class="article">)?
harriyott
2009-11-18 15:45:16
There is a chance. But if handling attributes is too hard, I will settle for just plain <p> for now :)
baeltazor
2009-11-18 16:00:42
+6
A:
Sounds like you want to look at the HTML Agility Pack. It works very well on dodgy HTML documents!
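To show what a forgiving HTML parser buys you, here is a minimal sketch of the same idea. It is not the HTML Agility Pack; it uses Python's standard html.parser as a stand-in, and the names ParagraphCollector and extract_paragraphs are made up for this example. With the Agility Pack the equivalent would roughly be loading the markup with HtmlDocument.LoadHtml and selecting the nodes with DocumentNode.SelectNodes("//p").

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collects the text found inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs
        self._current = None   # text pieces of the open <p>, or None if outside one

    def handle_starttag(self, tag, attrs):
        if tag == "p":         # tag names arrive lower-cased
            self._flush()      # a new <p> implicitly closes an unclosed one
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def _flush(self):
        if self._current is not None:
            self.paragraphs.append("".join(self._current).strip())
            self._current = None

    def close(self):
        super().close()
        self._flush()          # keep a trailing <p> that was never closed

def extract_paragraphs(html):
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs

# Unclosed <p>, an attribute, and upper-case tags: typical "dodgy" input.
print(extract_paragraphs('<P class="article">One<p>Two <b>bold</b></P><p>Three'))
```

The sample input would defeat a strict XML parser, but here each paragraph still comes out as its own entry.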
RichardOD
2009-11-18 15:43:11
+2
A:
Take a look at regular expressions. String split is not a good solution.
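For a quick job where only a handful of matches are needed, a regular expression could look something like the sketch below (Python, for illustration). It assumes that <p> elements are not nested, that every <p> has a closing </p>, and that no attribute value contains a ">" character; it will give wrong results when those assumptions fail. The names P_PATTERN and paragraphs_by_regex are made up for this example.

```python
import re

# \b stops <pre> or <param> from matching; [^>]* allows attributes on the tag.
# DOTALL lets a paragraph span several lines; the lazy .*? stops at the first </p>.
P_PATTERN = re.compile(r"<p\b[^>]*>(.*?)</p\s*>", re.IGNORECASE | re.DOTALL)

def paragraphs_by_regex(html, limit=None):
    """Return the inner markup of up to `limit` <p> elements (all if limit is None)."""
    found = [m.group(1) for m in P_PATTERN.finditer(html)]
    return found if limit is None else found[:limit]

sample = '<p>one</p><pre>skip</pre><p class="article">two\nlines</p><P>three</P>'
print(paragraphs_by_regex(sample))
print(paragraphs_by_regex(sample, limit=2))
```

Note that the captured text is raw markup, so any tags inside a paragraph are returned as-is.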
rerun
2009-11-18 15:44:12
+1
A:
For the benefit of the folks who suggest RegEx, can I just point to this answer:
RegEx match open tags except XHTML self-contained tags (Stack Overflow)
Just say no.
Kev
2009-11-18 15:53:09
A:
I've been doing this manually: just traverse the string in a loop and count the <p> tags. If you find one <p, then another <p, and then another, and then you suddenly hit a </p>, you must wait until you find the third </p>, and there you have it.
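The counting approach described above can be sketched roughly as follows (Python, for illustration). It assumes lower-case tags and a closing tag written exactly as </p>, and the name outermost_paragraphs is made up for this example. Keep in mind that in real HTML a new <p> implicitly closes the previous one, so counting depth like this treats the markup as if paragraphs could nest.

```python
def outermost_paragraphs(html):
    """Return the contents of each outermost <p>...</p>, counting nested <p> tags."""
    results, depth, start, pos = [], 0, 0, 0
    while True:
        open_at = html.find("<p", pos)
        close_at = html.find("</p>", pos)
        if close_at == -1:
            break                          # no closing tag left: stop
        if open_at != -1 and open_at < close_at:
            tag_end = html.find(">", open_at)
            if tag_end == -1:
                break
            # Only count real <p> tags, not <pre>, <param>, and so on.
            if html[open_at + 2] in "> \t\r\n/":
                if depth == 0:
                    start = tag_end + 1    # content begins after the opening tag
                depth += 1
            pos = tag_end + 1
        else:
            if depth > 0:
                depth -= 1
                if depth == 0:             # the matching close of the outermost <p>
                    results.append(html[start:close_at])
            pos = close_at + len("</p>")
    return results

print(outermost_paragraphs("<p>a<p>b<p>c</p>d</p>e</p><p>x</p>"))
```

With three opening tags in a row, the first paragraph is only emitted once the third </p> has been seen.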
Omu
2009-11-18 15:58:27