I am consuming a couple of feeds at the same time and assembling them into a single feed. When grabbing and 'cleaning up' the description for a particular tag, I find bullet characters that I cannot, for the life of me, remove from the output.

Doing a simple str_replace to find the • character (just like that, not an li or ASCII value) does nothing at all for me. I'm scratching my head and wondering why. This does not seem to be an encoding issue, simply a bullet point being sent over in a non-ASCII-safe format.

Anyone run into this? A character you couldn't identify or remove?

Here is some example text:

Required Qualifications:
•BSME or equivalent four year degree
•Minimum four years in blahblah industry experience

The above is an example of a description I wish to clean up (I would love to replace the bullet with a -, but would settle for just removing it).

Ideas?

EDIT -------

Based on feedback, here is some additional detail. The character just comes through as is: •. I doubt it is an encoding issue, as this particular source outputs this data set either to HTML (a webpage with the details) or to an XML feed (HTML tags packaged inside the description field).

I consume the multiple XML feeds using xml2array (PHP). I have not had any issues with it before. I am pretty sure the data is UTF-8; just the bullet comes through.

To assemble the feeds, I build my own array server side, and once I correlate the proper values from the other feeds, I output the final 'built' xml feed (which I then have an internal app consume).

The reason for consuming multiple sources? Gaps in the data that are not available in 1 format.

MORE EDITING -------

OK, it looks like this is an encoding issue after all, but I have yet to remove the bullet. When I convert it using utf8_encode, I get odd symbols that don't copy identically, something like â[]¢.

Again, I am doing something like xml2array(URL), which converts the XML at the URL to an array, and then simply grabbing data from the built array.
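The â[]¢ output is what you would expect if the bullet was already UTF-8 and then got encoded a second time. The sketch below reproduces that effect on a hard-coded bullet so the bytes can be compared with what the feed actually delivers. It uses iconv from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode (same conversion), and the variable names are only illustrative.

```php
<?php
// A UTF-8 bullet (U+2022) is the three bytes E2 80 A2.
$bullet = "\xE2\x80\xA2";

// Treating those bytes as ISO-8859-1 and converting them to UTF-8 again
// (the conversion utf8_encode performs) encodes each byte separately.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $bullet);

echo bin2hex($bullet), "\n";        // e280a2
echo bin2hex($doubleEncoded), "\n"; // c3a2c280c2a2
```

Running bin2hex on a description pulled from the built array shows which of the two byte sequences is really there. A single byte 95 instead would point to Windows-1252 input.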

A: 

The HTML entity for that character is &bull; and the numeric code is &#8226;. You might try searching for those.

btw: maybe a preg_replace() will do the trick

$str2 = preg_replace("/•/", "", $str);
krike
I think you should replace the actual bullet with the bullet code, &bull;? If the feed comes from a webpage then that would be its code. Otherwise just do two str_replace() calls, one on the code and one on the actual bullet?
etbal
@etbal: There is no need to *ever* replace actual characters with their entity reference counterparts in XML.
Tomalak
cool! I learned something new today @Tomalak Thanx :)
etbal
None of the 3 suggestions above work. It's really odd.
Jakub
I've tested the code I provided and it works, of course the text to replace was just a string.
krike
In the end, this worked, but it was an encoding issue that I had to tackle, converting to UTF-8, then stripping the garbage that was converted out... meh.. encoding issues.
Jakub
A: 

If the feed contains a literal bullet character, check if the encoding of your PHP file matches the encoding of the feed. Otherwise str_replace will miss the char.
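One way to take the PHP file's own encoding out of the picture is to write the bullet as byte escapes and normalize the feed text to UTF-8 first. This is a rough sketch only: `toUtf8` and `replaceBullets` are hypothetical helpers, and the fallback to Windows-1252 for anything that is not valid UTF-8 is an assumption about the feed, not something the question confirms.

```php
<?php
// Hypothetical helper: returns the text as UTF-8, ASSUMING that anything
// which is not already valid UTF-8 arrived as Windows-1252.
function toUtf8(string $text): string
{
    // An empty pattern with the /u modifier only matches valid UTF-8 subjects.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = iconv('Windows-1252', 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

// Hypothetical helper: replaces bullets with "-" once the text is UTF-8.
function replaceBullets(string $text): string
{
    // Byte escapes keep the search string independent of this file's encoding.
    return str_replace("\xE2\x80\xA2", '-', toUtf8($text));
}

echo replaceBullets("\xE2\x80\xA2BSME or equivalent four year degree"), "\n"; // UTF-8 input
echo replaceBullets("\x95Minimum four years of experience"), "\n";            // Windows-1252 input
```

Both lines should come out starting with "-" regardless of how the script file itself is saved.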

chiborg
A: 

Try preg_replace and search for \x{2022} with the /u modifier (PHP's PCRE does not accept the \u2022 spelling).

U+2022 is the Unicode code point for the bullet character.
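A minimal sketch of the code-point approach, assuming the description is already valid UTF-8. In PHP's PCRE syntax U+2022 is written \x{2022} and needs the /u modifier. The sample text is the one from the question, and the variable names are illustrative.

```php
<?php
$description = "Required Qualifications:\n"
    . "\u{2022}BSME or equivalent four year degree\n"
    . "\u{2022}Minimum four years in blahblah industry experience";

// \x{2022} is the PCRE spelling of U+2022; the /u modifier makes PCRE
// treat both the pattern and the subject as UTF-8.
$cleaned = preg_replace('/\x{2022}\s*/u', '- ', $description);

// preg_replace returns null on error, e.g. when the subject is not valid UTF-8.
if ($cleaned === null) {
    $cleaned = $description;
}

echo $cleaned, "\n";
```

If preg_replace returns null here, the input is probably not UTF-8 yet and needs converting before the replacement can match.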

Vantomex