questions about utf-8 | ansaurus

utf-8

Problem with gzip compressing utf-8 encoded php page. HELP!

I use this at the top of my PHP page: if (substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) ob_start("ob_gzhandler"); else ob_start(); When the page is saved with ANSI encoding, it is compressed. But when I change the page encoding to UTF-8, compression fails. Please help! I tested compression on www.gidnetwork.com/tools/gzip-t...

character-encoding

WebDav example project in Apache Wink throws unsupported encoding exception

I am trying to compile and run the WebDav example project supplied as part of the examples of the Apache Wink project. I have successfully deployed the project into JBoss and can reach it through HTTP. However, when I try to use Total Commander with the WebDav plug-in, I get the following exception: 15:13:41,595 ERROR [[restSdkService]] Servlet.service()...

Create a UTF-8 CSV file in Python.

I can't create a UTF-8 CSV file in Python. I'm trying to read its docs, and in the examples section, it says: For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or wri...
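For comparison, a minimal sketch of writing a UTF-8 CSV file, assuming Python 3, where the csv module works on text and the encoding is chosen when the file is opened. The function name `write_utf8_csv` and the sample rows are hypothetical, not taken from the question.

```python
import csv
import os
import tempfile


def write_utf8_csv(path, rows):
    """Write rows (iterables of str) to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data with non-ASCII characters.
    target = os.path.join(tempfile.mkdtemp(), "example.csv")
    write_utf8_csv(target, [["name", "city"], ["Zoë", "Kraków"]])
```

This does not apply to the Python 2 csv module the question quotes, which is why its documentation suggests the UnicodeReader and UnicodeWriter wrappers.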

How can I read a UTF-8 text file from a jar using the Java method Class.getResourceAsStream()?

I have a UTF-8 file stored inside a java jar file. I am trying to read it using the method getResourceAsStream(), but the input stream reader that is returned by the function uses the default encoding, which is the ANSI one under Windows. How can I read a UTF-8 text file from inside a jar? ...

inputstreamreader

printing utf8 in glib

Hi. Why can't UTF-8 symbols be printed via glib functions? Source code: #include "glib.h" #include <stdio.h> int main() { g_print("марко\n"); fprintf(stdout, "марко\n"); } Build it like this: gcc main.c -o main $(pkg-config glib-2.0 --cflags --libs) You can see that glib can't print UTF-8 and fprintf can: [marko@marko-...

Unescaping HTML entities (&#nnnn;) into plain UTF-8

Hello, We have HTML source files which contain special characters encoded as &#nnnn; like in the word: au&#223;ergew&#246;hnlich We would like to convert them into plain UTF-8: außergewöhnlich Is there any small tool to do that? ...
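One way the conversion could be sketched, assuming Python 3 and only decimal references of the &#nnnn; form. The function name `decode_numeric_entities` is hypothetical. Named entities such as &amp;amp; are deliberately left alone so that the markup keeps its meaning.

```python
import re

# Matches decimal numeric character references such as &#223;
_NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def decode_numeric_entities(text):
    """Replace each decimal &#nnnn; reference with the character it names."""
    return _NUMERIC_ENTITY.sub(lambda m: chr(int(m.group(1))), text)


if __name__ == "__main__":
    word = decode_numeric_entities("au&#223;ergew&#246;hnlich")
    # Encoding the result as UTF-8 gives the bytes to write to disk.
    utf8_bytes = word.encode("utf-8")
```

Note that this sketch would also decode references to markup characters (for example &#60; for "<"), so a real tool may need to skip those.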

Does IIS 5.0 Require Unique Configuration Settings To Support UTF-8?

[Note: I can only reproduce this issue with a Win2k web server running IIS 5.0. I can't reproduce this issue with a Windows XP web server (localhost) running IIS 5.1.] I've uncovered a lot of information pertinent to UTF-8 encoding. If I've learned one thing, it's this. EDIT: MSDN offered that for IIS 5.0 and earlier, Response.CodePag...

windows-server-2000

EDOM Parse Error (invalid character was found in text)/ Korean characters Problem

How can I fix this? Tdm = class(TDataModule) HTTP: TIdHTTP; XMLDoc: TXMLDocument; ... var sStory: String; ... sStory:= GetHTTP('http://localhost/MultiPlay_PHP/contentlesson.php'); begin xmlDoc.XML.Text := sStory; xmlDoc.Active :=true; StartItemNode := XMLDoc.DocumentElement.ChildNodes.First; ANode := StartItemNo...

MySQL varchar change from Latin1 to UTF8

In a MySQL table I'm using the Latin1 character set to store text in a varchar field. As our website is now supported in more countries, we need to support UTF8 instead. What will happen if I change these fields to UTF8? Is it safe to do this, or will it mess up the data inside these fields? Is it something I need to think about whe...

character-encoding

How do I use unicode (UTF-8) characters in Clojure regular expressions?

This is a double question for you amazingly kind Stacked Overflow Wizards out there. How do I set emacs/slime/swank to use UTF-8 when talking with Clojure, or use UTF-8 at the command-line REPL? At the moment I cannot send any non-roman characters to swank-clojure, and using the command-line REPL garbles things. It's really easy to do ...

XmlReader breaks on UTF-8 BOM

I have the following XML Parsing code in my application: public static XElement Parse(string xml, string xsdFilename) { var readerSettings = new XmlReaderSettings { ValidationType = ValidationType.Schema, Schemas = new XmlSchemaSet() }; readerSettings.Schemas.Add(null, xsdF...

Implementing a WebSockets server: WebKit sends invalid UTF-8 Strings... sometimes

I'm trying to implement a WebSockets server in C and so far, everything seems to be fine. I tested my implementation on Mac OS X 10.6.4 using Safari Version 5.0 (6533.16) and Google Chrome 5.0.375.70. As they both use WebKit, they unsurprisingly both yield the same results: Handshake works and sending UTF-8 string from and to my server w...

How is utf8 data supposed to look when stored in a database?

I need a bit of help understanding how utf8 data is supposed to look when stored inside the database. I'm using mysql and php, the database is set to utf8, the collation on the column "p_name" is set to "utf8_unicode_ci". When I insert the data I pass it through this function function convert_charset($in_str) { $cur_encod...

Reading UTF-8 data from MySQL shows ? instead of ı

Here is how I read the data: <?php $id = $_GET["id"]; $number = mysql_real_escape_string($id); $result = mysql_query('SELECT * FROM `mystory` where `id` = ' . "$number" . ' LIMIT 1'); $row = mysql_fetch_assoc($result); echo $row['story']; ?> The data is encoded as utf8_bin. Instead of ı, PHP outputs ? Why is that? What am I doing wrong...

utf8_encode or decode isn't doing what I expect...

Hi all, I am taking an XML file and reading it into various strings, before writing to a database, however I am having difficulty with German characters. The XML file starts off <?xml version="1.0" encoding="UTF-8"?> Then an example of where I am having problems is this part <name><![CDATA[PONS Großwörterbuch Deutsch als Fremdsprac...

All characters that may be bullet points (e.g. "*") or "dash" points

This question is a simple point (pardon the pun): What are all the characters that may, when starting a paragraph, be reasonably interpreted as indicating (in the Anglo-Saxon demographic) that the paragraph was meant to be a bullet point or a "dash" point? Here are the ones I would expect, so far: Bullets Asterisk: "*", HTML entity ...

Can Ruby 1.9.1 finally get a list of filenames if the filenames have unicode characters on Windows?

Can Ruby 1.9.1 finally get a list of filenames if the filenames have unicode characters on Windows? I think back in the Ruby 1.8.6 and 1.8.7 days, that wasn't possible on Windows. ...

internationalization

Ruby works well with Unicode characters in filenames on Mac OS X and on Linux, so why did it take at least 2 years to make it work on Windows?

Ruby works well with Unicode characters in file paths and filenames on Mac OS X and on Linux, so why did it take more than 2 years to make it work on Windows? I was just looking at Google Code Jam. People are solving non-trivial problems within a few hours. At work, I can imagine solving a filename or path issue having unicode characters ...

internationalization

Adding BOM to UTF-8 files

Hello, I'm searching (without success) for a script that would work as a batch file and allow me to prepend a BOM to a UTF-8 text file if it doesn't have one. Neither the language it is written in (perl, python, c, bash) nor the OS it works on matters to me. I have access to a wide range of computers. I've found a lot of scripts to do t...
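A minimal sketch of such a script, assuming Python 3 and files small enough to read into memory. The function name `ensure_utf8_bom` is hypothetical. It works on raw bytes and does not check that the rest of the file is valid UTF-8.

```python
import codecs
import sys


def ensure_utf8_bom(path):
    """Prepend a UTF-8 BOM to the file at `path` if it lacks one.

    Returns True if the file was modified, False if a BOM was already present.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


if __name__ == "__main__":
    # Usage: python add_bom.py file1.txt file2.txt ...
    for name in sys.argv[1:]:
        changed = ensure_utf8_bom(name)
        print(name, "BOM added" if changed else "already has BOM")
```

Running it twice on the same file is harmless, because the second pass finds the BOM and leaves the file unchanged.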

How do I write data to disk in UTF-8 encoding in Python?

The following Python code ... html_data = urllib2.urlopen(some_url).read() f = codecs.open(filename, 'w', encoding='utf-8') f.write(html_data) f.close() ... sometimes fails with UnicodeDecodeError ... File "/.../lib/python2.6/codecs.py", line 686, in write return self.writer.write(data) File "/.../lib/python2.6/codecs.py", line 351...
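A likely cause, assuming Python 2 semantics, is that read() returns a byte string and the codecs writer first decodes it as ASCII before encoding to UTF-8, which fails on any non-ASCII byte. A hedged sketch of the explicit approach in Python 3 follows: decode the bytes with the page's real encoding, then write text as UTF-8. The function name `save_as_utf8` and the `source_encoding` parameter are hypothetical, and the question does not say which encoding the page actually uses.

```python
def save_as_utf8(raw, filename, source_encoding):
    """Decode raw bytes using source_encoding and write them as UTF-8 text."""
    # Decoding explicitly avoids any implicit ASCII decode step.
    text = raw.decode(source_encoding)
    # newline="" keeps the original line endings untouched.
    with open(filename, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

If the downloaded bytes are already UTF-8, writing them unchanged to a file opened in binary mode would also work.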
