
I was wondering if somebody could help me use String.Split to get all occurrences of the text between <p> and </p> tags in an HTML document?

+2  A: 

That's rather a large problem for String.Split(). I'd recommend using an XML parser instead.
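
To illustrate the parser-based approach, here is a minimal sketch. The thread is about .NET, so this is not the answerer's code: it uses Python's standard-library `html.parser` (a tolerant HTML parser rather than a strict XML one) purely to show the idea. The names `ParagraphCollector` and `extract_paragraphs` are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text found inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraph texts
        self._buffer = None   # None while we are outside a <p>

    def _flush(self):
        # Store the paragraph collected so far, if any.
        if self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends an unclosed one.
            self._flush()
            # Attributes such as class="article" arrive in attrs and are ignored.
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Keep text only while inside a <p>; inline tags like <b> are dropped.
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing <p> that was never closed
    return parser.paragraphs


print(extract_paragraphs('<p>one</p><div>x</div><p class="article">two <b>bold</b></p>'))
```

Because the parser handles tag boundaries itself, a `<p>` with attributes needs no special treatment, which is the main thing a plain split on `"<p>"` gets wrong.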

harriyott
It doesn't even have to get the whole lot. Maybe just four or five occurrences? Would that be cool? ...XML doesn't like me :(
baeltazor
Ah, OK. Is there any chance the opening <p> tag will have attributes (e.g. <p class="article">)?
harriyott
There is a chance. But if that makes it too hard, I will settle for just <p> for now :)
baeltazor
+6  A: 

Sounds like you want to look at the HTML Agility Pack. It works very well on dodgy HTML documents!

RichardOD
lol... are you one of the people who made that? BTW, thank you very much for the link. I'm downloading it now, it sounds awesome!
baeltazor
No, but I've used it and recommend it.
RichardOD
+2  A: 

Take a look at regular expressions. String split is not a good solution.

rerun
Just say NO to using RegEx for parsing HTML. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Greg
@Greg: that's funny. I hadn't seen that answer before.
RichardOD
+1  A: 

For the benefit of the folks who suggest RegEx, can I just point to this answer:

RegEx match open tags except XHTML self-contained tags (Stack Overflow)

Just say no.

Kev
A: 

I've been doing this manually: traverse the string in a loop and count the <p> tags. If you find one <p, then another <p, and then another, and only then hit a </p>, you must wait until you find the third </p>, and there you have it.
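
A rough sketch of that counting loop, written in Python only for illustration (the thread itself is about .NET). It assumes tags look like `<p>`, `<p ...>` and `</p>`, and that no attribute value contains a `>`. The function name `outermost_paragraphs` is invented for this example.

```python
def outermost_paragraphs(html):
    """Return the contents of each outermost <p>...</p> span."""
    results = []
    depth = 0          # how many <p> tags are currently open
    content_start = 0  # index just after the outermost opening tag
    lower = html.lower()  # match tags case-insensitively
    i = 0
    while i < len(lower):
        if lower.startswith("</p>", i):
            if depth > 0:
                depth -= 1
                if depth == 0:
                    # The closing tag that balances the first <p>.
                    results.append(html[content_start:i])
            i += 4
        elif (lower.startswith("<p", i) and i + 2 < len(lower)
              and lower[i + 2] in "> \t\r\n"):
            # The character check keeps <pre>, <param> etc. from matching.
            end = lower.find(">", i)
            if end == -1:
                break  # malformed tail: opening tag never closes
            depth += 1
            if depth == 1:
                content_start = end + 1
            i = end + 1
        else:
            i += 1
    return results


print(outermost_paragraphs("<p>a<p>b</p>c</p><pre>x</pre><p class='k'>d</p>"))
```

The inner `<p>b</p>` stays inside the first result because the span is emitted only when the depth returns to zero. Real-world HTML (comments, scripts, unclosed tags) will still trip this up, which is why the other answers point to a proper parser.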

Omu