views:

117

answers:

2

Hi,

I am a little stuck and need help. On my site, the comment count for each article comes from the comment module. The HTML tag around the count is being displayed as literal text, so it looks something like this: (<em>1</em>). I recently upgraded my site from 5 to 6, and everything else works fine apart from this. Please help.

Thanks!!!

A: 

Assuming you want to get rid of <em>1</em>, try this pattern: \<em([^>]*)\>(\d[^>]*)\</(em[^>]*)\> It matches an <em> element whose text starts with a digit; replace the match with an empty string to remove it.

And if this is clumsy, shh, I only learnt regexes yesterday.
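
For anyone who wants to try the pattern, here is a minimal sketch in Python. The pattern is copied unchanged from the answer; the two function names and the choice of replacement strings are assumptions for illustration, since the thread does not say whether the number should be kept or removed along with the tags.

```python
import re

# Pattern copied verbatim from the answer above.
# Group 2 captures the text between the tags (it must start with a digit).
EM_COUNT = re.compile(r"\<em([^>]*)\>(\d[^>]*)\</(em[^>]*)\>")


def drop_em_element(text: str) -> str:
    """Remove the whole <em>...</em> element, number included."""
    return EM_COUNT.sub("", text)


def unwrap_em_element(text: str) -> str:
    """Drop only the tags and keep the text that was between them."""
    return EM_COUNT.sub(r"\2", text)


if __name__ == "__main__":
    sample = "Comments (<em>1</em>)"
    print(drop_em_element(sample))
    print(unwrap_em_element(sample))
```

If the goal is to show the count without the markup, the second function is probably the one to use; the first also throws the number away.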

Nazarius Kappertaal
Parsing HTML with regexes... this is folklore now: don't do it! The what-if flood is coming. What if the brackets contain an opening tag and not the closing tag? What if there is an unrelated bracket containing a number? * matches 0 or more, so you would also match anything in brackets without a digit. Matching is also greedy, so an opening bracket near the beginning of the document and a closing one near the end would be matched, and the whole doc goes. And so on. Search for 'regex html' on SO to see the horror.
Phil H
I've seen bobince's cry for help. I use lxml to parse my trees. However, I will continue to use regexes to my heart's content. Perhaps to my own detriment, but at least I'll learn something, huh XD.
Nazarius Kappertaal
+1  A: 

You can use the strip_tags() function to remove HTML from a string.
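
strip_tags() is a PHP function. To show what that kind of tag stripping does to the string from the question, here is a rough Python sketch built on the standard library's html.parser module. The name strip_tags_sketch is hypothetical, and this is not a drop-in equivalent of the PHP function: it simply keeps the text nodes and discards the tags.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for the text between tags; tags themselves are skipped.
        self.parts.append(data)


def strip_tags_sketch(html: str) -> str:
    """Return the text content of an HTML fragment with the tags removed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags_sketch("(<em>1</em>)"))
```

Because this goes through a parser rather than a pattern, it does not depend on the text between the tags starting with a digit.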

451F