views:

886

answers:

2

Hi there,

I have a MYSQL database which needs to be accessed by both PHP and MySQL scripts, this works fine in most cases, but some "special" characters e.g. double quotes, apostrophes don't display correctly in the ASP scripts.

E.g the MySQL database is from a Drupal installation and contains a table with a field containing the text “A double quote” (the quotes are smart quotes but don't seem to dislay on stack overflow). This displays fine in a php script, but not an ASP script. I've written a simple script in both PHP and ASP to loop through the string and print the character codes here are the outputs:

PHP

“ 147
A 65
32
d 100
o 111
u 117
b 98
l 108
e 101
32
q 113
u 117
o 111
t 116
e 101
” 148

ASP

� 8220
A 65
 32
d 100
o 111
u 117
b 98
l 108
e 101
 32
q 113
u 117
o 111
t 116
e 101
� 8221

As you can see, the double quotes are coming out as different characters in PHP and ASP, and the ASP ones aren't rendering correctly.

I'm running MySQL 5 on a windows machine using a standard Drupal install with PHP 5. ASP uses the MySQL ODBC 3.51 Driver and I'm not running any other commands in either PHP or ASP except to open a connection and run the select statement.

Edit As requested here is the asp script

Dim strConn, objConn, objRS, strQ
Dim i, strBody

strConn = "DRIVER={MySQL ODBC 3.51 Driver}; SERVER=" & strDBServer & "; DATABASE=studential; UID=" & strDBUser & ";PASSWORD=" & strDBPass & "; OPTION=3"

Set objConn = Server.CreateObject("ADODB.Connection")     
objConn.Open(strConn)

strQ = "select body from drupal_node_revisions where nid = 261"
Set objRS = objConn.Execute(strQ)

strBody = objRS("body")

For i = 1 To len(strBody)    
    Response.write(Mid(strBody, i, 1) & " " & AscW(Mid(strBody, i, 1))  & "<br />")
Next

objRS.Close
objConn.Close

Set objRS = Nothing
Set objConn = Nothing

Further edit

When replacing the AscW with Asc in the line below:

Response.write(Mid(strBody, i, 1) & " " & AscW(Mid(strBody, i, 1))  & "<br />")

The character codes now match up, but the quote characters still display incorrctly. My page contains the utf-8 charset tag, so it may well be something before that is not using utf-8 encoding - any ideas what it may be or how I can fix it?

Thanks for your help,

Tom

A: 

Your ASP script appears to be using Unicode - 8220 = 0x201C which is the Unicode "LEFT DOUBLE QUOTATION MARK". You're probably seeing garbage on the screen because your ASP script is not outputting a valid encoding of this unicode string, but we'd have to see the code to pin down exactly why.

Paul Dixon
I have added the ASP script to the original question - thanks, Tom
Loftx
AscW() would give you the unicode char, I'm guessing it is converting the input char 147 and translating to unicode. What happens if you use Asc()?It's hard to say but I'm going to guess your ASP script is telling the browser its getting UTF-8, and you are passing Latin1.
Paul Dixon
Hi Paul, you're correct when I replace it with Asc the codes appear the same. Any ideas what I might need to change in my script to pass utf-8 instead of latin1?
Loftx
See http://stackoverflow.com/questions/777144/asp-net-converting-iso-8859-file-to-utf-8 for details of how you could convert your latin1/iso8859 into UTF-8 before you output it.
Paul Dixon
I tried setting the codepage and this appears to fix the example - I will try it in my actual scripts. Thanks, Tom
Loftx
+2  A: 

There seem to be several things going on here:

I'm going to assume that in the database, the column body in the table drupal_node_revisions is indeed set to a Unicode character set. Further, I'd assume that it indeed starts with the code point U+201C LEFT DOUBLE QUOTATION MARK.

Now, the PHP appears to be connecting to the database in Latin1. This causes MySQL to convert the data on being read to Windows-1252 ("Latin1" in MySQL really means Windows-1252). Hence converting the first chracter to the single byte 147. Then when you output this from PHP, I'm guessing you don't indicate the character encoding of the web page, which causes it to default to Latin1, which (sigh) almost all browsers treat as Windows-1252. Hence, the double quotes display correctly, but in fact two mistake have been made, which will cause other Unicode characters to fail:

  1. You need to execute SET NAMES utf8; in the connection ensure all connection variables to MySQL (there are three!) are working in UTF-8.

  2. You need to ensure the web page's content-type indicates a charset of UTF-8. This can be done with a meta element: <meta http-equiv="content-type" content="text/html;charset=utf-8">

The ASP code seems to be connecting to the database in some Unicode encoding. This is indidcated as the expression AscW(Mid(strBody, i, 1)) returns 8220 for the first character. The problem in the output, generating the unknown character glyphs is again that the HTML page's charset has probably been left to default, and not to a Unicode compatible encoding.

I don't know enough about ASP to know how the Response.write() method determines what character set encoding to use, or if it expects the string to already be encoded, so I can't help with figuring out how to ensure that that data path is Unicode clean end to end.

MtnViewMark