Hi,

I'm building a website that fetches text from another page and inserts it into the database.

The problem is that all the special characters are saved in the database as HTML entities, so I then need to convert the output using:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

In other words, instead of saving the character " ' ", the HTML version " &#x27; " is saved in the database. The same happens with Spanish and other special characters: instead of the letter " ñ ", for example, " &ntilde; " is saved.

This wastes space in the database, and I also need to convert the output later using the content-type, so:

How can I convert the text or set the charset before it is saved, or just let MySQL convert it?

In case you need to know, here's how I connect to the database:

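function dbConnect() {
    // "new" always returns an object, so "or die" never fires; check connect_error instead
    $conn = new mysqli(DB_SERVER, DB_USER, DB_PASSWORD, DB_NAME);
    if ($conn->connect_error) {
        die('Error.');
    }
    return $conn;
}

$conn = dbConnect();
$stmt = $conn->stmt_init();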

Hope you can help me! Thanks.

A:

You can use html_entity_decode() to convert from HTML to a (real) character encoding.

<?php echo html_entity_decode("&ntilde;", ENT_QUOTES, "UTF-8"); ?>
ñ

Please note that "HTML" isn't a character encoding in the usual sense, so it isn't understood by libraries such as iconv, nor by MySQL itself.

I'd also recommend (per the example above) having the whole application use UTF-8. Single-byte character encodings such as ISO-8859-1 are effectively obsolete now that Unicode is so widely supported.
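As a minimal sketch of that advice, assuming the scraped text arrives as a PHP string and the rest of the application is UTF-8, the entities can be decoded once, before the INSERT, so the database stores real characters. The function name decodeScrapedText is hypothetical. ENT_QUOTES is used here rather than ENT_COMPAT because ENT_COMPAT leaves single-quote entities such as &#x27; undecoded.

```php
<?php
// Hypothetical helper: turn HTML entities in scraped text back into real
// UTF-8 characters before the text is written to the database.
function decodeScrapedText(string $text): string
{
    // ENT_QUOTES also decodes quote entities such as &#x27; and &#039;;
    // ENT_HTML5 uses the HTML5 named-entity table (&ntilde;, &apos;, ...).
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decodeScrapedText('Espa&ntilde;a &amp; it&#x27;s'), "\n"; // España & it's
```

Decoding once on input means the stored text needs no entity handling afterwards; it only needs escaping (e.g. with htmlspecialchars) at the moment it is echoed into a page.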

Alnitak
A: 

Maybe you should use htmlspecialchars rather than htmlentities: the former only replaces the HTML special characters &, <, > and ", not every character that can be represented by a named character entity reference, as the latter does.
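A small comparison of the two functions on the same input, assuming UTF-8 text. It shows why htmlentities turns ñ into &ntilde; while htmlspecialchars leaves ñ as a real character and only escapes the markup characters. The sample string is made up for illustration.

```php
<?php
$input = 'Peña <b>&</b>';

// htmlentities: every character that has a named entity is converted.
$allEntities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// htmlspecialchars: only &, <, > and quotes are converted; ñ is left alone.
$specialOnly = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $allEntities, "\n"; // Pe&ntilde;a &lt;b&gt;&amp;&lt;/b&gt;
echo $specialOnly, "\n"; // Peña &lt;b&gt;&amp;&lt;/b&gt;
```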

Gumbo
Can you explain how to use htmlspecialchars in my case?
Jonathan
Well, how do you store the data in the database? Or are you just reading the data from it?
Gumbo
htmlspecialchars doesn't help because it's for _encoding_ HTML entities, not _decoding_ them.
Alnitak
But not encoding them in the first place would avoid this problem.
Gumbo
A:

I suggest using UTF-8 if there are any non-English characters. You can run the SQL

SET NAMES utf8mb4

right after connecting to the database, so that the connection uses UTF-8.

When you do this, you shouldn't use "htmlspecialchars" or "htmlentities" while saving the data.
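A minimal sketch of this approach with mysqli, assuming the target columns themselves use a UTF-8 character set such as utf8mb4. In mysqli the connection charset is usually set with $conn->set_charset('utf8mb4'), which the PHP manual recommends over issuing SET NAMES yourself. The function names, the table pages and its column content are hypothetical placeholders.

```php
<?php
// Sketch only: function names, table `pages` and column `content` are hypothetical.
function dbConnectUtf8(string $host, string $user, string $pass, string $name): mysqli
{
    $conn = new mysqli($host, $user, $pass, $name);
    if ($conn->connect_error) {
        die('Error.');
    }
    // Sets the connection charset (like SET NAMES) and also tells mysqli
    // about it, so its escaping functions use the same charset.
    if (!$conn->set_charset('utf8mb4')) {
        die('Could not set charset.');
    }
    return $conn;
}

function saveText(mysqli $conn, string $text): bool
{
    // Store the raw UTF-8 text: no htmlentities()/htmlspecialchars() here.
    $stmt = $conn->prepare('INSERT INTO pages (content) VALUES (?)');
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('s', $text);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}
```

The prepared statement passes the text to MySQL unchanged, so " ñ " is stored as one character rather than as " &ntilde; ".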

BYK