ansaurus

Question

How to strip out strange characters when consuming a feed?

Answer 1

+1 A:

The HTML entity for that character is &bull; and the numeric code is &#8226;. You might try searching for those.

By the way, maybe a preg_replace() will do the trick:

$str2 = preg_replace("/•/", "", $str);
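A feed may carry the bullet either as an entity or as a literal character, so one option is to strip every form in a single pass. The following is only a minimal sketch: `strip_bullets()` is a hypothetical helper name, and it assumes the text is already UTF-8.

```php
<?php
// Sketch: remove a bullet whether it arrives as an entity or as a literal
// UTF-8 character. strip_bullets() is a hypothetical helper name.
function strip_bullets($str)
{
    $forms = array(
        '&bull;',        // named HTML entity
        '&#8226;',       // decimal numeric reference
        '&#x2022;',      // hexadecimal numeric reference
        "\xE2\x80\xA2",  // literal U+2022 encoded as UTF-8
    );
    // str_replace() accepts an array of needles and removes each one.
    return str_replace($forms, '', $str);
}

echo strip_bullets("one &bull; two &#8226; three \xE2\x80\xA2 four"), "\n";
```

Plain str_replace() is enough here because every needle is a fixed string; a regular expression is only needed when matching a pattern.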

krike 2010-10-14 14:11:37

I think you should replace the actual bullet with the bullet code (&bull;)? If the feed comes from a web page then that would be its code. Otherwise just do two str_replace() calls, one on the code and one on the actual bullet?

etbal 2010-10-14 14:20:25

@etbal: There is no need to *ever* replace actual characters with their entity reference counterparts in XML.

Tomalak 2010-10-14 14:45:50

cool! I learned something new today @Tomalak Thanx :)

etbal 2010-10-14 14:55:05

None of the 3 suggestions above work. It's really odd...

Jakub 2010-10-14 14:55:51

I've tested the code I provided and it works; of course, the text to replace was just a string.

krike 2010-10-14 15:16:04

In the end this worked, but the real problem was an encoding issue that I had to tackle: converting the feed to UTF-8, then stripping the garbage that the conversion produced... meh, encoding issues.
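The "convert to UTF-8, then strip" approach could look roughly like the sketch below. The thread does not say which encoding the feed actually used, so Windows-1252 is an assumption, as is the helper name `clean_feed_text()`.

```php
<?php
// Sketch of "convert to UTF-8, then strip". The source encoding
// (Windows-1252) and the name clean_feed_text() are assumptions.
function clean_feed_text($raw, $sourceEncoding = 'Windows-1252')
{
    // Re-encode the raw bytes as UTF-8; iconv() returns false on failure.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        return $raw; // leave the input untouched if conversion fails
    }
    // After conversion the bullet is the three-byte UTF-8 sequence for U+2022.
    return str_replace("\xE2\x80\xA2", '', $utf8);
}

// "\x95" is the bullet in Windows-1252.
echo clean_feed_text("Item \x95 one"), "\n";
```

Converting first means the strip step only ever has to deal with one byte representation of the character.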

Jakub 2010-10-15 17:59:15

Answer 2

A:

If the feed contains a literal bullet character, check whether the encoding of your PHP file matches the encoding of the feed. Otherwise str_replace() compares different byte sequences and will miss the character.
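To illustrate why the match fails, here is a small sketch with made-up sample data. It assumes the PHP source is UTF-8 and the feed is Windows-1252; `is_valid_utf8()` is a hypothetical helper.

```php
<?php
// The same bullet is a different byte sequence in each encoding, so a
// needle typed in a UTF-8 source file cannot match a Windows-1252 feed.
$needle = "\xE2\x80\xA2";   // U+2022 as UTF-8 (three bytes)
$feed   = "Item \x95 one";  // U+2022 as Windows-1252 (one byte); sample data

// The feed comes back unchanged because the needle bytes never occur in it.
var_dump(str_replace($needle, '', $feed) === $feed); // bool(true)

// is_valid_utf8() is a hypothetical helper: an empty pattern with the
// u modifier only matches when the subject is well-formed UTF-8.
function is_valid_utf8($str)
{
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8($feed));   // bool(false): the feed is not UTF-8
var_dump(is_valid_utf8($needle)); // bool(true)
```

A check like this can tell you whether the feed needs converting before any replacement is attempted.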

chiborg 2010-10-14 14:15:55

Answer 3

A:

Try preg_replace and search for \x{2022} with the u (UTF-8) modifier. PHP's PCRE does not accept the \u2022 form.

U+2022 is the Unicode code point for the bullet character.
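As a rough sketch of the code-point approach: PCRE, as used by preg_replace(), writes a code point as `\x{2022}` and needs the `u` modifier, and it rejects `\u2022`. The sample string and the fallback behaviour below are assumptions for illustration.

```php
<?php
// Sketch: match the bullet by code point. The u modifier makes PCRE
// treat both the pattern and the subject as UTF-8.
$str = "First \xE2\x80\xA2 second"; // sample UTF-8 input (assumption)

$result = preg_replace('/\x{2022}/u', '', $str);

// With the u modifier, preg_replace() returns null for invalid UTF-8 input.
if ($result === null) {
    $result = $str; // hypothetical fallback: keep the original text
}

echo $result, "\n";
```

The null check matters for feeds in another encoding: the call fails instead of silently leaving the character in place.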

Vantomex 2010-10-14 14:20:22
