Hi,

I'm currently parsing an RSS feed and sub-parsing the HTML in the description field in order to create a custom XML structure.

The description field contains ‘ and ’ characters, and PHP outputs them as regular question marks. How come?

I've tried different encodings such as UTF-8 and ISO-8859-1, but nothing works.

This is the XML I'm parsing: http://www.ilovetechno.be/artists%5Frss.xml

This is how it should get parsed: http://www.crowdsurferapp.com/clients/ilovetechno/

Greets, Nick

A: 

You also have to set the correct encoding in your HTML meta tags and/or in your HTTP headers.

knittl
Like this you mean? header("Content-Type: application/xml; charset=iso-8859-1");
Bundy
Yes, although I prefer Unicode ;)
knittl
+3  A: 

The encoding of an XML document is determined in a predefined order:

  1. charset parameter in the HTTP header field Content-Type:

    Content-Type: application/xml; charset=<character encoding>
  2. encoding attribute in the XML declaration:

    <?xml version="1.0" encoding="<character encoding>"?>

If both are missing, the default character encoding (UTF-8 or UTF-16) is used.

So in order to parse the XML document with the proper encoding, you need to look for that information. Take a look at the question "PHP: Detect encoding and make everything UTF-8" for a solution from me.

I also recommend using UTF-8 for your internal processing and as the output encoding, since it is one of the default character encodings for XML.
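
The detection order above can be sketched roughly as follows. This is only an illustration, not a complete solution: `detectXmlEncoding` and `toUtf8` are hypothetical helper names, the Content-Type value is assumed to be available as a plain string, the declared charset is assumed to be one that mbstring recognises, and byte order mark handling (for UTF-16, or a BOM in front of the XML declaration) is left out.

```php
<?php
// Hypothetical helper: find the declared encoding of an XML document.
// $contentType is the raw Content-Type header value (null if unknown),
// $xml is the raw document as fetched.
function detectXmlEncoding(?string $contentType, string $xml): string
{
    // 1. charset parameter in the HTTP Content-Type header
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 2. encoding attribute in the XML declaration
    if (preg_match('/^<\?xml[^>]*\bencoding\s*=\s*["\']([\w.:-]+)["\']/i', $xml, $m)) {
        return strtoupper($m[1]);
    }
    // Neither is present: assume UTF-8 (UTF-16 detection via BOM is omitted)
    return 'UTF-8';
}

// Hypothetical helper: convert the document to UTF-8 for internal processing.
function toUtf8(string $xml, string $encoding): string
{
    if ($encoding === 'UTF-8') {
        return $xml;
    }
    $xml = mb_convert_encoding($xml, 'UTF-8', $encoding);
    // Keep the XML declaration consistent with the converted bytes,
    // otherwise a parser would still decode them with the old encoding.
    return preg_replace(
        '/^(<\?xml[^>]*\bencoding\s*=\s*["\'])[\w.:-]+/i',
        '${1}UTF-8',
        $xml,
        1
    );
}
```

A call such as `toUtf8($xml, detectXmlEncoding($contentType, $xml))` would then give you a UTF-8 document to hand to the parser. Remember that the output you generate from it must also be declared as UTF-8.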

Gumbo
… you missed 3.: the byte order mark. Not sure whether 2. or 3. takes precedence, though.
Konrad Rudolph
@Konrad Rudolph: You’re right. But I think it’s rather used to choose between the two default encodings if none of the above is present.
Gumbo
A: 
<?xml version="1.0" encoding="iso-8859-1"?>

Change this to UTF-8.

FractalizeR
Already tried both ISO-8859-1 and UTF-8, nothing works.
Bundy
Then you need to provide your parser code.
FractalizeR