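Why not run the string through htmlentities() and output the result to see what it turns that character into, so you know what to use in your replace expression? (htmlspecialchars() only converts &, <, > and quotes, so it would leave this character untouched.)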
To replace it:
If your script file is encoded in the same encoding as the data you are trying to do the replacement in, it should work the way you posted it. If you're working with UTF-8 data, make sure the script is encoded in UTF-8 and it's not your editor silently transliterating the character when you paste it.
If it won't work, try escaping it as described below and see what code it returns.
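One way to check for such a mismatch is to compare raw bytes rather than glyphs. The following is only a minimal sketch, assuming the character is U+2019 and the data is either UTF-8 or Windows-1252 (both are assumptions, and the variable names and sample text are made up for illustration):

```php
<?php
// U+2019 written as explicit bytes, so this file's own encoding does not matter.
$utf8Quote   = "\xE2\x80\x99"; // UTF-8: three bytes
$cp1252Quote = "\x92";         // Windows-1252: one byte

// Hypothetical sample data; in practice this comes from your form or database.
$string = "It" . $utf8Quote . "s here";

echo bin2hex($utf8Quote), "\n";   // e28099
echo bin2hex($cp1252Quote), "\n"; // 92

// str_replace() compares bytes, so the needle must use the data's encoding.
$fixed     = str_replace($utf8Quote, "'", $string);   // needle matches the data
$unchanged = str_replace($cp1252Quote, "'", $string); // wrong encoding: no match
```

If bin2hex() of the character in your data differs from bin2hex() of the needle in your script, the two are in different encodings and the replacement cannot match.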
To escape it:
If your source file is encoded in UTF-8, this should work:
$string = htmlentities($string, ENT_QUOTES, "UTF-8");
The default character set of html... is iso-8859-1 (before PHP 5.4; newer versions default to UTF-8). Anything differing from that must be explicitly stated.
For more complex character conversion issues, always check out the User Contributed Notes to functions like htmlentities(); there are often real gems to be found there.
In General:
Bobince is right in his comment: systemic character set problems should be sorted out systematically so they don't bite you in the ass, if only by defining which character set is used at every step of the way:
- How the script file is encoded;
- How the document is served;
- How the data is stored in the database;
- How the database connection is encoded.
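The four points above could look roughly like this in a script. This is a sketch only: it assumes MySQL accessed through mysqli, and open_utf8_connection() and its parameters are hypothetical names, not something from the question.

```php
<?php
// 1. Script file: save it as UTF-8 in your editor (this cannot be enforced from code).

// 2. Document: declare the charset in the HTTP header.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Tell the mbstring functions which encoding to assume by default.
mb_internal_encoding('UTF-8');

// 3. Storage: the tables and columns should use a UTF-8 charset (utf8mb4 in MySQL).

// 4. Connection: hypothetical helper; host, credentials and database name are placeholders.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    $conn->set_charset('utf8mb4'); // make the connection itself speak UTF-8
    return $conn;
}
```

The helper is only defined here, not called, so the snippet runs without a database.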
This has happened to me too. A couple of things:
Use the htmlentities function for your text:
$my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');
More info about the htmlentities function.
Use a proper document type; this did the trick for me.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Use utf-8 encoding type in your page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Here is the final prototype for your page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<?php
// your code related to database
$my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');
?>
</body>
</html>
If you want to replace it, however, try the mb_ereg_replace function.
Example:
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
$my_text = mb_ereg_replace("’","'", $string);
If you are using non-ASCII characters in your PHP code, you need to make sure that you’re using the same character encoding as in the data you are processing. Your attempt probably fails because you are using a different character encoding in your PHP script than in $string.
Additionally, if you’re using a multibyte character encoding such as UTF-8, you should also use the multibyte aware string functions.
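To illustrate the difference, here is a small sketch comparing the byte-oriented functions with their mbstring counterparts. The sample text and variable names are made up, and the character is written as explicit UTF-8 bytes so the result does not depend on how this snippet is saved:

```php
<?php
$quote = "\xE2\x80\x99"; // U+2019 as UTF-8 bytes

$byteLength = strlen($quote);             // counts bytes: 3
$charLength = mb_strlen($quote, 'UTF-8'); // counts characters: 1

$text = "don" . $quote . "t";

// substr() cuts after 4 bytes, i.e. in the middle of the quote character.
$firstFourBytes = substr($text, 0, 4);

// mb_substr() cuts after 4 characters and keeps the quote intact.
$firstFourChars = mb_substr($text, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($firstFourBytes, 'UTF-8')); // bool(false): broken sequence
var_dump(mb_check_encoding($firstFourChars, 'UTF-8')); // bool(true)
```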
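To find out what character it is, run it through the ord function. Note that ord only looks at the first byte, so for a multibyte character it gives you the value of that byte, not a code for the whole character:
echo ord('’'); // 226 (0xE2) if the script is saved as UTF-8
In UTF-8 the full character is the three bytes 0xE2 0x80 0x99. Now that you know what it is, you can replace it by its bytes:
$string = str_replace("\xE2\x80\x99", "'", $string);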
This character you have is the Right Single Quotation Mark.
To replace it with a pattern, you'll want to do something like this:
$string = preg_replace( "/\\x{2019}/u", 'replacement', $string );
But that really only addresses the symptom. The problem is that you don't have consistent use of character encodings throughout your application, as others have noted.
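If the underlying cause is mixed encodings, one option is to normalize incoming text to UTF-8 once, before any replacing. A minimal sketch, where to_utf8() is a hypothetical helper and the fallback to Windows-1252 is purely an assumption about where the stray bytes come from:

```php
<?php
// Hypothetical helper: returns $input as UTF-8.
// Assumes anything that is not valid UTF-8 is Windows-1252, which is a guess
// that must be checked against where the data actually comes from.
function to_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($input, 'UTF-8', 'Windows-1252');
}

$fromLegacy  = to_utf8("It" . "\x92" . "s");         // quote as one Windows-1252 byte
$alreadyUtf8 = to_utf8("It" . "\xE2\x80\x99" . "s"); // quote as three UTF-8 bytes

var_dump($fromLegacy === $alreadyUtf8); // bool(true)
```

After this step every string holds the same byte sequence for the character, so a single replacement rule covers both sources.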
Gumbo said it right:
- save your script as a UTF-8 file
- and use http://php.net/mbstring (as Sarfraz pointed out in his last example)
Don't use any regex functions (preg_replace or mb_ereg_replace). They are way too heavy for this. A plain str_replace on the character's UTF-8 bytes is enough:
$string = str_replace("\xE2\x80\x99", "'", $string);
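The same approach extends to several characters at once, since str_replace accepts arrays. A sketch under the assumption that the data is UTF-8; the choice of characters in the map and the sample string are only illustrative:

```php
<?php
// Typographic quotes as UTF-8 byte sequences, mapped to ASCII equivalents.
$map = [
    "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
    "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
    "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
    "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
];

// Hypothetical input: “Don’t” with curly quotes.
$string = "\xE2\x80\x9C" . "Don" . "\xE2\x80\x99" . "t" . "\xE2\x80\x9D";

// One pass, no regex engine involved.
$plain = str_replace(array_keys($map), array_values($map), $string);

echo $plain, "\n"; // "Don't"
```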
If your needle is a multibyte character, you may have better luck with this bespoke function:
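<?php
// Multibyte-aware replacement of every occurrence of $needle in $haystack.
function mb_str_replace($needle, $replacement, $haystack) {
    // An empty needle would never advance the search, so return early.
    if ($needle === '') {
        return $haystack;
    }
    $needle_len = mb_strlen($needle);
    $replacement_len = mb_strlen($replacement);
    $pos = mb_strpos($haystack, $needle);
    while ($pos !== false)
    {
        // Splice the replacement in place of the match.
        $haystack = mb_substr($haystack, 0, $pos) . $replacement
            . mb_substr($haystack, $pos + $needle_len);
        // Resume searching right after the inserted replacement.
        $pos = mb_strpos($haystack, $needle, $pos + $replacement_len);
    }
    return $haystack;
}
?>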
Credit for this last function: http://www.php.net/manual/en/ref.mbstring.php#86120