I inserted data that includes Arabic script into a MySQL database. The output displays the Arabic correctly, but the data stored in MySQL looks like garbage, something like this:

 'صَومُ ثَلاثَةِ أيّامٍ مِن كُلِّ شَهرٍ ـ أربَعاءُ بَينَ خَ

Should I be worried about this? If so, how do I make it appear as proper Arabic script in MySQL? Thanks.

+2  A: 

Those are HTML entities.

If this text

صَومُ ثَلاثَةِ أيّامٍ مِن كُلِّ شَهرٍ ـ أربَعاءُ بَينَ خَ

is what is supposed to be in your database, everything's most likely fine: your Arabic input gets converted into those entities at some point along the way.

To view the actual Arabic characters like above, paste the entities into a text file, name it something.htm and open it in your browser.

You could also convert it into "proper" native characters in a UTF-8 encoded MySQL table, but to get any pointers on how to do that you would have to tell us what languages/platforms you are working with.
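
Whatever the platform turns out to be, the conversion itself is just decoding numeric character references. A minimal sketch in Python (chosen only for illustration, not because the asker uses it), assuming the stored value is plain text containing `&#NNNN;` references like the sample in the question:

```python
import html

# First five entities copied from the stored value shown in the question.
stored = "&#1589;&#1614;&#1608;&#1605;&#1615;"

# html.unescape turns numeric character references back into characters.
decoded = html.unescape(stored)

print(decoded)                            # the word as real Arabic characters
print([hex(ord(ch)) for ch in decoded])   # the code points behind it
```

Each reference is simply the decimal Unicode code point of one character, so nothing is lost as long as every reference is decoded the same way.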

Pekka
Don't you subscribe to the "store raw (mysql_real_escape_blah...), encode on output" school of thought?
middaparka
@middaparka I usually do, but if he is working with a pre-built 3rd party application (which is what it looks like) there's no reason to start fiddling with its character handling unless really necessary IMO.
Pekka
True - I can see it being more pain in the long run (the joy of searching, etc.), but if that's what he's stuck with, then it might not be worth the short-term unpleasantness.
middaparka
@middaparka yeah. Still, good point, it would be worth considering to switch to raw data for that if at all possible. Maybe the OP will clarify a bit on what he's doing.
Pekka
i'm working with php/mysql. the default character set i selected while configuring mysql was 'latin1'. the arabic displays correctly when i view it in the browser.
fuz3d
@fusion I don't think it's the database that is doing this. I'd guess there is a call somewhere in your code turning the characters into entities. In PHP, that would be `htmlentities()`.
Pekka
i haven't used htmlentities() anywhere in my code. could it be mysql_real_escape_string?
fuz3d
It could also happen if you are serving your pages in an encoding that doesn't support Arabic, such as `iso-8859-1` (or if you serve pages without a character set and the browser guesses one). When someone submits a form with unsupported characters in a field, the browser encodes them as character references as a last resort. (This is a disaster because you can no longer tell the difference between a character reference and a real ampersand. And if you're outputting user input without HTML-encoding it on the way back out to the page, your app is not only broken but also vulnerable.)
bobince
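
The last-resort behaviour described above can be imitated outside a browser. A rough sketch in Python (used only for illustration), assuming the page encoding is ISO-8859-1; the helper name `submit_as_latin1` is made up for this example, and real browsers differ in details:

```python
def submit_as_latin1(field_value: str) -> bytes:
    """Imitate a form field being encoded for an ISO-8859-1 page.

    Assumption: characters the encoding cannot represent are replaced
    by numeric character references, as described in the comment above.
    """
    return field_value.encode("latin-1", errors="xmlcharrefreplace")


arabic = "\u0635\u064e"        # two Arabic code points typed by a user
literal = "&#1589;&#1614;"     # a user who literally typed this ASCII text

print(submit_as_latin1(arabic))    # b'&#1589;&#1614;'
print(submit_as_latin1(literal))   # b'&#1589;&#1614;'
```

Both submissions arrive as identical bytes, which is the ambiguity in question: the server cannot tell an encoded Arabic character from text that really contained an ampersand.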
@bobince hats off to the depth of your knowledge. Seriously. That comment would be worth an answer in itself IMO.
Pekka
@bobince, thanks for your reply. how do i HTML-encode the input, at the point of output?
fuz3d
@fusion what encoding are you using right now to serve your pages?
Pekka
@Pekka, if i add this: `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">` i get this error: `Error: Incorrect string value: '\xE4\xEE\xC3\xD8\xEF\xE6...' for column 'cQuotes' at row 1`
fuz3d
@fusion Where did you add the meta tag? Where do you get the error?
Pekka
i added it right after the opening head tag in all my pages. the error is thrown on insertion, when i try to insert the data. do you need the code?
fuz3d
@fusion What part of your application is outputting the error? PHP? At which point? Probably worth opening a new question to have the space to post code etc.
Pekka
i'm assuming - php. i'll post the code in the next 'answer'.
fuz3d
@fusion all right. Post a link to it here if you want.
Pekka
here it is: http://stackoverflow.com/questions/2598919/html-encode-output-incorrect-string-error
fuz3d
@fusion: ‘how do I HTML-encode’: `htmlspecialchars`. That's for security/XSS-prevention reasons, not anything to do with the Arabic. Don't use `htmlentities`.
bobince
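
To illustrate the distinction: escaping at output only needs to neutralise the handful of characters that are special in HTML, and the Arabic can pass through untouched. A minimal sketch in Python, where `html.escape` plays the role that `htmlspecialchars` plays in PHP; the function name `render_quote` is hypothetical:

```python
import html


def render_quote(raw_text: str) -> str:
    """Escape only the HTML-special characters; everything else passes through."""
    return "<p>" + html.escape(raw_text) + "</p>"


user_input = "<script>alert(1)</script> \u0635\u064e\u0648\u0645\u064f & more"
print(render_quote(user_input))
```

The markup characters come out as `&lt;`, `&gt;` and `&amp;`, while the Arabic word stays as real characters, which only displays correctly if the page is actually served as UTF-8.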
+1  A: 

As @Pekka says, those are HTML entities.

However, I can't help but think using UTF-8 (for both the database connection and HTML encoding) might save you some pain in the long run. Likewise, if at all possible (i.e.: if this is a "new" system rather than an existing codebase) I'd recommend storing the data raw in the database (using mysql_real_escape_string to prevent SQL injection, etc.) and HTML encoding at the point of output.

In general, this will make it easier to search the data, etc.
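
A small sketch of why raw storage helps with searching. It uses Python with an in-memory SQLite database purely as a stand-in for PHP and MySQL; the table name `quotes` is made up, the column name `cQuotes` is taken from the error message in the comments, and placeholders stand in for whatever escaping or parameter binding the real code uses:

```python
import html
import sqlite3

# The same word in both forms: entity-encoded and raw.
entity_form = "&#1589;&#1614;&#1608;&#1605;&#1615;"
raw_form = html.unescape(entity_form)

conn = sqlite3.connect(":memory:")   # SQLite stands in for MySQL here
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, cQuotes TEXT)")

# Placeholders keep the values out of the SQL text (injection-safe).
conn.executemany(
    "INSERT INTO quotes (cQuotes) VALUES (?)",
    [(raw_form,), (entity_form,)],   # row 1 is raw, row 2 is entity-encoded
)

# Search for two consecutive Arabic letters that occur inside the word.
needle = "\u0648\u0645"
rows = conn.execute(
    "SELECT id FROM quotes WHERE cQuotes LIKE ?",
    ("%" + needle + "%",),
).fetchall()

print(rows)   # only the raw row matches
```

The entity-encoded row contains nothing but ASCII, so a search for Arabic text can never match it unless the search term is entity-encoded in exactly the same way first.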

middaparka
i did all that you suggested. now it's throwing this error: `Incorrect string value: '\xD8\xB3\xD9\x8F\xD8\xA6...' for column 'cQuotes' at row 1`
fuz3d
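
The bytes quoted in this error can be inspected directly. A quick check in Python (used only for illustration) suggests they are well-formed UTF-8 for Arabic characters, so the page side now appears to be sending UTF-8. Assuming the column or connection is still `latin1`, as mentioned earlier in the thread, that mismatch would be a plausible cause, though the thread does not confirm it:

```python
# The six bytes visible in the error message.
reported = bytes.fromhex("d8b3d98fd8a6")

# They decode cleanly as UTF-8 into three Arabic code points.
text = reported.decode("utf-8")
print([hex(ord(ch)) for ch in text])   # ['0x633', '0x64f', '0x626']

# The same characters cannot be represented in latin-1 at all.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(fits_latin1)   # False
```

If that assumption holds, the table column and the database connection would both need to use a UTF-8 character set before raw Arabic text can be stored.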