Before we start, strip_tags() doesn't work.

Now, I've got some data that needs to be parsed. The problem is that I need to get rid of all the HTML, which has been formatted very strangely. The tags look like this (notice the spaces):

< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >

All the regexes I've tried aren't working, and I don't know enough about regex syntax to make them work. I don't care about preserving anything inside the tags, and would prefer to get rid of the text inside a link if I could.

Anyone have any idea?

(I really need to just sit down and learn regular expressions one day)

+4  A: 

Does

preg_replace('/<[^>]*>/', '', $content)

work?
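
For illustration, here is a minimal sketch of that pattern applied to the sample string from the question. The helper name strip_spaced_tags() is made up for this example, and the whitespace cleanup at the end is an extra assumption, not part of the one-liner above.

```php
<?php
// Hypothetical helper: removes anything that looks like <...>,
// then tidies up the whitespace left behind.
function strip_spaced_tags(string $content): string
{
    // '<', then any run of characters that are not '>', then '>'.
    // Spaces inside the brackets make no difference to this pattern.
    $text = preg_replace('/<[^>]*>/', '', $content);

    // Assumption: collapse leftover runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >';
echo strip_spaced_tags($html), "\n";
// prints: blah blah blah blah blah blah
```

Note that the pattern is not a parser: a literal '>' inside an attribute value, or a stray '<' in ordinary text, would likely confuse it.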

chaos
+1  A: 

A solution that isn't foolproof, but will work for what you posted:

s/<[^>]*>//g
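
Since the question also asks about dropping the text inside links, the same substitution could be extended along these lines in PHP. This is only a sketch: the helper name strip_tags_and_links() is hypothetical, and it assumes links are never nested and that no attribute value contains a literal '>'.

```php
<?php
// Hypothetical helper: drops whole <a>...</a> elements (link text included),
// then strips the remaining tags.
function strip_tags_and_links(string $content): string
{
    // An opening <a ...>, the shortest possible run of text, and the closing </a>,
    // allowing whitespace after '<' and around '/'.
    // 's' lets '.' match newlines, 'i' ignores case.
    $noLinks = preg_replace('~<\s*a\b[^>]*>.*?<\s*/\s*a\s*>~is', '', $content);

    // The same substitution as above, for every remaining tag.
    $text = preg_replace('/<[^>]*>/', '', $noLinks);

    // Assumption: collapse leftover runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >';
echo strip_tags_and_links($html), "\n";
// prints: blah blah blah
```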
strager
+1  A: 

Formatted strangely? That is valid HTML though, right? In that case I wouldn't touch it with regular expressions. Examples of how this can go wrong and why it's a bad idea are legion. Instead I'd use HTML Tidy on it to, for example, clean up unnecessary whitespace.

cletus
I was going to post this, but was too tired to word it intelligibly. +1.
strager
When I run the string through HTML Tidy, it changes the < and > signs to &lt; and &gt;, so strip_tags() still won't work on those. I was using both tidy_parse_string() and tidy_repair_string(). Is there another function that will work that I don't see?
Me1000
A: 

http://ca3.php.net/strip_tags is probably what you need.

Ian
strip_tags() doesn't work (as noted by the first line of my question) because PHP doesn't recognize the tags as HTML due to the formatting. That was my first thought as well.
Me1000
A: 

Try this out and let me know.

<?php
$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';
echo strip_tags($text);
echo "\n";
echo strip_tags($text, '<p><a>');
?>
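
If strip_tags() leaves the spaced-out tags in place, one possible workaround is to remove the whitespace after each '<' first, so that the tags look like ordinary ones. The sketch below assumes that any '<' in the input starts a tag (text such as "a < b" would be mangled), and the helper name normalize_and_strip() is made up for this example.

```php
<?php
// Hypothetical helper: tightens '< p >' into '<p >' and '< / p >' into '</p >',
// then lets strip_tags() do the actual stripping.
function normalize_and_strip(string $text): string
{
    // Drop the whitespace that follows '<' and surrounds an optional '/'.
    $normalized = preg_replace('~<\s*(/?)\s*~', '<$1', $text);

    $stripped = strip_tags($normalized);

    // Assumption: collapse leftover runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';
echo normalize_and_strip($text), "\n";
```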
CodeToGlory
strip_tags() doesn't work (as noted by the first line of my question) because PHP doesn't recognize the tags as HTML. That was my first thought as well.
Me1000
Did you add that later? I totally missed that. Did you try using preg_replace()?
CodeToGlory
Nope, the post hasn't been edited at all. I was asking about a regex I could use. chaos' answer is most likely the one I'll end up using, but if I could use HTML Tidy to clean up the code and then use strip_tags(), that would be fine. I can't find a function in HTML Tidy that does what I need, though, which is why I haven't marked chaos' answer as accepted yet. :)
Me1000
A: 

Thank you for providing the code! This simple code really helped me.

rabin