I would like to make sure that everything I know about UTF-8 is correct. I have been trying to use UTF-8 for a while now but I keep stumbling across more and more bugs and other weird things that make it seem almost impossible to have a 100% UTF-8 site. There is always a gotcha somewhere that I seem to miss. Perhaps someone here can corr...
So we have the XSS cheat sheet to test our XSS filtering - but other than an example benign page I can't find any evil or malformed test data to make sure that my UTF-8 code can handle misbehaving data.
Where can I find some good uh.. bad data to test with? Or what is a tricky sequence of chars?
...
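A few classes of byte sequence are commonly used as "bad" UTF-8 test data. The sketch below is only an illustration, not an exhaustive suite: the sample table and the `is_valid_utf8` helper are hypothetical names, and Python's strict decoder is used as the reference for what counts as malformed. Markus Kuhn's "UTF-8 decoder capability and stress test" file covers many more cases.

```python
# Hypothetical sample table: well-known classes of invalid UTF-8.
MALFORMED_SAMPLES = {
    "lone continuation byte": b"\x80",
    "truncated 2-byte sequence": b"\xc3",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded UTF-16 surrogate": b"\xed\xa0\x80",
    "code point above U+10FFFF": b"\xf4\x90\x80\x80",
    "byte that never appears in UTF-8": b"\xff",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly with the strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


for label, raw in MALFORMED_SAMPLES.items():
    print(f"{label}: valid={is_valid_utf8(raw)}")
```

Feeding the same bytes to your own filter and checking that it rejects or replaces them, rather than passing them through, is one way to test it.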
I have a Unicode (UTF-8 without BOM) text file within a jar, that's loaded as a resource.
URL resource = MyClass.class.getResource("datafile.csv");
InputStream stream = resource.openStream();
BufferedReader reader = new BufferedReader(
    new InputStreamReader(stream, Charset.forName("UTF-8")));
This works fine on Windows, but on Lin...
I am just trying to retrieve a web page, but somehow a foreign character is embedded in the HTML file. This character is not visible when I use "View Source."
isbn = 9780141187983
url = "http://search.barnesandnoble.com/booksearch/isbninquiry.asp?ean=%s" % isbn
opener = urllib2.build_opener()
url_opener = opener.open(url)
page = url_ope...
The perldoc page for length() tells me that I should use bytes::length(EXPR) to find the length of a Unicode string in bytes, and the bytes page echoes this.
use bytes;
$ascii = 'Lorem ipsum dolor sit amet';
$unicode = 'Lørëm ípsüm dölör sît åmét';
print "ASCII: " . length($ascii) . "\n";
print "ASCII bytes: " . bytes::length($ascii) . "\n";
prin...
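The difference between character length and byte length can be shown in a few lines. This is a Python sketch of the same comparison, not a Perl answer; `char_and_byte_length` is a hypothetical helper, and the text is normalised to NFC on the assumption that each accented letter should count as one code point.

```python
import unicodedata

ascii_text = "Lorem ipsum dolor sit amet"
# Normalise to NFC so each accented letter is a single code point,
# whatever form this source file happens to be saved in.
unicode_text = unicodedata.normalize("NFC", "Lørëm ípsüm dölör sît åmét")


def char_and_byte_length(text: str, encoding: str = "utf-8"):
    """Return (number of code points, number of encoded bytes)."""
    return len(text), len(text.encode(encoding))


print("ASCII:", char_and_byte_length(ascii_text))
print("Unicode:", char_and_byte_length(unicode_text))
```

The ASCII string has the same count both ways; in the accented string each of the nine non-ASCII letters takes two bytes in UTF-8, so the byte count is larger than the character count.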
I am not that good with encoding but I am even falling over with the basics here.
I am trying to create a file that is recognised as UTF-8
header("Content-Type: text/plain; charset=utf-8");
header("Content-disposition: attachment; filename=test.txt");
echo "test";
exit();
also tried
header("Content-Type: text/plain; charset=utf-8");...
Hi,
I have an Ubuntu server and PHP5, and the PHP script files and all output are in UTF-8.
I'm trying to send an image to the output stream, but only garbled Chinese characters show up in the output:
$im = imagecreatetruecolor(120, 20);
$text_color = imagecolorallocate($im, 233, 14, 91);
imagestring($im, 1, 5, 5, 'A Simple Text Stri...
I have a file watcher that is grabbing content from a growing file encoded with utf-16LE. The first bit of data written to it has the BOM available -- I was using this to identify the encoding against UTF-8 (which MOST of my files coming in are encoded in). I catch the BOM and re-encode to UTF-8 so my parser doesn't freak out. The proble...
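One way to handle a growing file is to sniff the BOM once, on the first chunk, and then keep a stateful decoder for every later chunk. The sketch below assumes only UTF-8 and UTF-16 inputs; `sniff_encoding` and `make_decoder` are hypothetical names, and the chunk boundaries are made up for the demonstration.

```python
import codecs

# Only the encodings mentioned in the question are handled here.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(first_chunk: bytes, default: str = "utf-8"):
    """Return (encoding name, BOM length) based on the leading bytes."""
    for bom, name in BOMS:
        if first_chunk.startswith(bom):
            return name, len(bom)
    return default, 0


def make_decoder(first_chunk: bytes):
    """Return an incremental decoder plus the first chunk without its BOM."""
    encoding, skip = sniff_encoding(first_chunk)
    decoder = codecs.getincrementaldecoder(encoding)()
    return decoder, first_chunk[skip:]


# Simulated growing file: the second chunk starts in the middle of a
# 2-byte code unit and carries no BOM.
data = codecs.BOM_UTF16_LE + "héllo wörld".encode("utf-16-le")
chunks = [data[:7], data[7:]]

decoder, rest = make_decoder(chunks[0])
text = decoder.decode(rest)
text += decoder.decode(chunks[1])
text += decoder.decode(b"", final=True)  # flush; raises if bytes are left over
utf8_bytes = text.encode("utf-8")
print(text)
```

Because the decoder remembers both the encoding and any partial code unit, later chunks do not need a BOM and may be split at arbitrary byte offsets.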
I am in the process of fixing some bad UTF8 encoding. I am currently using PHP 5 and MySQL
In my database I have a few instances of bad encodings that print like: î
The database collation is utf8_general_ci
PHP is using a proper UTF8 header
Notepad++ is set to use UTF8 without BOM
database management is handled in phpMyAdmin
not al...
We have an Oracle database that contains Traditional Chinese characters and English, and the environment is:
PARAMETER                VALUE
NLS_LANGUAGE             AMERICAN
NLS_TERRITORY            AMERICA
NLS_CURRENCY             $
NLS_ISO_CURRENCY         AMERICA
NLS_NUMERIC_CHARACTERS   .,
NLS_CHARACTERSET         WE8PC850
NLS_CALENDAR             GREGORIAN
NLS_DATE_FORMAT          DD-MON-RR
...
In a project all internal strings are kept in utf-8 encoding. The project is ported to Linux and Windows. There is a need for a to_lower functionality now.
On a POSIX OS I could use std::ctype_byname("ru_RU.UTF-8"). But with g++ (Debian 4.3.4-1), ctype::tolower() doesn't recognize Russian UTF-8 characters (Latin text is lowercased fine).
O...
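The underlying issue is that a byte-at-a-time lowercase routine only ever sees the individual bytes of a multi-byte character. The following is a Python sketch of the idea, not the C++ solution asked for: decode the UTF-8 bytes to code points, lowercase, then encode again. `to_lower_utf8` is a hypothetical helper name.

```python
def to_lower_utf8(data: bytes) -> bytes:
    """Lowercase UTF-8 encoded text by working on code points, not bytes."""
    return data.decode("utf-8").lower().encode("utf-8")


sample = "ПРИВЕТ Hello".encode("utf-8")

# bytes.lower() only touches ASCII letters, much like a byte-wise tolower.
bytewise = sample.lower()
# Decoding first lets the Cyrillic letters be lowercased as well.
lowered = to_lower_utf8(sample)

print(bytewise.decode("utf-8"))
print(lowered.decode("utf-8"))
```

In C++ the equivalent step would be converting to a wide or code-point representation before calling any case-mapping function.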
Hi, I have a problem reading a txt file to insert into a MySQL DB table. Here is a snippet of the code.
The first line of the file contains: "aclaración"
archivo = open('file.txt',"r")
for line in archivo.readlines():
    body = body + line
model = MyModel(body=body)
model.save()
I get a DjangoUnicodeDecodeError:
'utf8' codec can't...
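An error like this usually means bytes were read without stating their encoding, or the file is not actually UTF-8. The sketch below assumes Python 3 (the question's code looks like Python 2, where `io.open` accepts the same `encoding` argument); `read_text` is a hypothetical helper, and the temporary file stands in for `file.txt`.

```python
import os
import tempfile


def read_text(path: str, encoding: str = "utf-8") -> str:
    """Read a whole text file, decoding it with an explicit encoding."""
    with open(path, "r", encoding=encoding) as archivo:
        return "".join(archivo)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    # Write the sample word as UTF-8 so the demo is self-contained.
    with open(path, "w", encoding="utf-8") as f:
        f.write("aclaraci\u00f3n\n")
    body = read_text(path)

# The same word saved as Latin-1 is not valid UTF-8, which reproduces
# the kind of decode error described in the question.
latin1_bytes = "aclaraci\u00f3n".encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(body.strip(), latin1_is_valid_utf8)
```

If the file turns out to be Latin-1, passing `encoding="latin-1"` to the reader would be the corresponding change.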
Currently in my application the UTF-8 encoded data is mangled by PHP's internal encoding.
How can I make it consistent with UTF-8?
EDIT: To show examples, please tell me how to output the current internal encoding in PHP.
In php.ini I found the following:
default_charset = "iso-8859-1"
Which means Latin1.
How to change it to utf8,say,what...
My cmd prompt's default code page is 936.
I need to change it to utf8.
chcp 65001
The above doesn't work. What's the correct one?
...
This is what I tried so far, by modifying php.ini:
default_charset = "utf-8"
This is how MySQL is configured:
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----...
For a C++ console application compiled with Visual Studio 2008 on English Windows (XP, Vista or 7), is it possible to print to the console and correctly display UTF-8 encoded Japanese using cout or wcout?
...
Hi folks,
I am having a very strange problem with pound signs displaying incorrectly (or not at all) on a web page.
I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window.
The worst part is that this is inconsistent. I have three different things happening:
1. Pound sign doe...
Hi!
I'm writing a small application in C that reads a simple text file and then outputs the lines one by one. The problem is that the text file contains special characters like Æ, Ø and Å among others. When I run the program in a terminal, those characters are displayed as "?".
Is there an easy fix?
...
I have a MySQL database that I recently migrated to another server. Unfortunately, MySQL dumps its data in Latin1 with any UTF-8 characters represented by composite bytes (ex. – instead of —).
Is it possible to run a simple query or script that would convert these composite bytes to UTF-8 within my tables? It's impossible to do it row...
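The "composite bytes" are typically UTF-8 bytes that were decoded as a single-byte code page. For a single string the damage can often be reversed by re-encoding with the wrong code page and decoding as UTF-8. This is a sketch in Python rather than a MySQL query; `repair_mojibake` is a hypothetical helper and cp1252 is an assumed guess at the code page involved.

```python
def repair_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo one round of UTF-8-read-as-single-byte damage, if possible."""
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of this kind; leave the text untouched.
        return text


# Build a broken value the same way the damage happens:
# an em dash (U+2014) encoded as UTF-8, then decoded as cp1252.
broken = "\u2014".encode("utf-8").decode("cp1252")
fixed = repair_mojibake(broken)

print(len(broken), len(fixed))
```

Text that was stored correctly, such as a plain "café", fails the round trip and is returned unchanged, which makes the function reasonably safe to run over mixed data. Trying it on a copy of the table first would still be prudent.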
I've seen this post:  characters appended to the beginning of each file.
In that case, the author was manually reading the source file and writing the contents. In my case, I'm abstracting it away via HttpRequest.TransmitFile():
public void ProcessRequest(HttpContext context)
{
HttpRequest req = context.Request;
HttpResponse ...
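Those three characters are what the UTF-8 byte order mark looks like when its bytes are displayed as a single-byte code page. The sketch below is in Python, not C#, and only illustrates where the characters come from and how a BOM-aware codec drops them; the variable names are made up for the demonstration.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF. Decoded as cp1252 they
# become the characters U+00EF, U+00BB and U+00BF.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")

raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain UTF-8 codec keeps the BOM as an invisible U+FEFF character.
plain = raw.decode("utf-8")
# The "utf-8-sig" codec strips a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

print(repr(bom_as_cp1252), repr(plain), repr(without_bom))
```

When a file is streamed byte for byte, as a call like TransmitFile presumably does, a BOM saved in the file is sent to the client unchanged, so one option is to save the file without a BOM in the first place.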