Hello all.

I need to import data from a database that uses the ISO-8859-1 character encoding into a new site that uses UTF-8. The site the data comes from is old, which I presume is why it is still in ISO-8859-1.

I have tried the following solutions with no results:

iconv

Nevertheless, after it published a proposed rule in March 2008 that would have banned such items altogether, the Postal Service received numerous comments opposing its planned action for four main reasons: (1) the original language was vague and overly broad, so the Postal Service has changed the word “munitions†to “explosive devices,†(2) some respondents questioned whether such a problem even existed, though the Postal Service says it has “recorded numerous incidents involving the discovery of mail that exhibited characteristics of possible explosives,†(3) the proposed rule supposedly violated the Second Amendment, and (4) the Postal Service lacks the authority to ban the mailing of such items.

to

Nevertheless, after it published a proposed rule in March 2008 that would have banned such items altogether, the Postal Service received numerous comments opposing its planned action for four main reasons: (1) the original language was vague and overly broad, so the Postal Service has changed the word “munitions†to “explosive devices,†(2) some respondents questioned whether such a problem even existed, though the Postal Service says it has “recorded numerous incidents involving the discovery of mail that exhibited characteristics of possible explosives,†(3) the proposed rule supposedly violated the Second Amendment, and (4) the Postal Service lacks the authority to ban the mailing of such items.

mb_convert_encoding

Same exact result as above.

utf8_encode

Same exact result as above.

utf8_decode

Pulls back an interesting result with all of the ? replacements:

Nevertheless, after it published a proposed rule in March 2008 that would have banned such items altogether, the Postal Service received numerous comments opposing its planned action for four main reasons: (1) the original language was vague and overly broad, so the Postal Service has changed the word ?munitions? to ?explosive devices,? (2) some respondents questioned whether such a problem even existed, though the Postal Service says it has ?recorded numerous incidents involving the discovery of mail that exhibited characteristics of possible explosives,? (3) the proposed rule supposedly violated the Second Amendment, and (4) the Postal Service lacks the authority to ban the mailing of such items.


Not exactly sure what to do here.

Any help would be appreciated!

Thanks!

A: 

That's not ISO 8859-1, that's Windows code page 1252:

>>> a = u'“'
>>> print(a.encode('cp1252').decode('utf-8'))
“
>>>
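A slightly longer sketch of the same idea, run forwards and then backwards. It assumes the text was stored as UTF-8 and read as Windows-1252 somewhere along the way; the variable names are only illustrative.

```python
# Assumption: the stored text was UTF-8 that got read as Windows-1252 somewhere.
original = "\u201cmunitions\u201d"      # the curly-quoted word from the question
utf8_bytes = original.encode("utf-8")   # b'\xe2\x80\x9cmunitions\xe2\x80\x9d'

# Python's cp1252 codec has no character for byte 0x9D, so errors="ignore"
# drops it here; other decoders may treat that byte differently.
garbled = utf8_bytes.decode("cp1252", errors="ignore")
assert garbled == "\u00e2\u20ac\u0153munitions\u00e2\u20ac"

# The opening quote (the first three garbled characters) reverses cleanly.
repaired_open = garbled[:3].encode("cp1252").decode("utf-8")
assert repaired_open == "\u201c"

# True ISO-8859-1 never yields a euro sign: bytes 0x80-0x9F decode to
# invisible C1 control characters instead.
as_latin1 = utf8_bytes.decode("latin-1")
assert "\u20ac" not in as_latin1
assert "\x80" in as_latin1
```

The closing quote loses its last byte in this sketch, which is consistent with the two-character remnant after "munitions" in the question, so a full repair would need the original bytes and not the already-garbled text.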
Ignacio Vazquez-Abrams
A: 

You're going to have to be very thorough with this. Between the database and the web browser, there are many places where the encoding can become fouled up.

  • The database server's charset and collation
  • The database's charset and collation
  • The database connection's charset and collation
  • Each database table's charset and collation
  • Various PHP functions (such as htmlentities)
  • The HTTP Content-Type header

Any one of these could potentially be the culprit. You may have successfully converted your data from ISO-8859-1 to UTF-8, but that still doesn't mean you're manipulating it or displaying it properly.
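One way to avoid converting data that is already UTF-8 a second time is to test the bytes before transcoding. This is only a heuristic sketch in Python; the function name `to_utf8` and the `cp1252` default are assumptions for illustration, not something from your setup.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "cp1252") -> bytes:
    """Return raw unchanged if it already decodes as UTF-8, else transcode it.

    Hypothetical helper: legacy_encoding is a guess about the source data.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        # Bytes with no mapping in the legacy code page become U+FFFD.
        return raw.decode(legacy_encoding, errors="replace").encode("utf-8")


already_utf8 = "\u201cmunitions\u201d".encode("utf-8")
legacy_bytes = b"\x93munitions\x94"  # the same text as Windows-1252 bytes

assert to_utf8(already_utf8) == already_utf8
assert to_utf8(legacy_bytes) == already_utf8
```

Some legacy byte sequences happen to be valid UTF-8 by coincidence, so a check like this can misjudge short strings; treat it as a first pass.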

To check your database settings (except for the table-specific ones), run this query:

select @@character_set_server
     , @@collation_server
     , @@character_set_database
     , @@collation_database
     , @@character_set_client
     , @@character_set_connection
     , @@collation_connection
     , @@character_set_results
;

Inspect your tables' CREATE statements for that info (you can copy/paste those into your question if you need help).

To address the HTTP Content-Type (i.e., the output character encoding), make sure you have this in your PHP somewhere before any output is sent:

ini_set( 'default_charset', 'UTF-8' );

Finally, if this doesn't help, give us some more detail. What parameters are you using with iconv?

Peter Bailey
Worked perfectly: ini_set( 'default_charset', 'UTF-8' ); Thank you, sir!
Shane
A: 

Hello Peter. The output from the query you gave me was this:

latin1 latin1_swedish_ci latin1 latin1_swedish_ci latin1 latin1 latin1_swedish_ci latin1

As for the default charset set at the beginning of the PHP file, it works for some data, but not all of it.

Anyways, still looking into it here.

Shane