utf-8

Decoding if it's not unicode

I want my function to take an argument that could be a unicode object or a utf-8 encoded string. Inside my function, I want to convert the argument to unicode. I have something like this: def myfunction(text): if not isinstance(text, unicode): text = unicode(text, 'utf-8') ... Is it possible to avoid the use of isins...

Unicode character in octets to hexadecimal

A Unicode character in octets is something like 110xxxxx 10xxxxxx. How can I transform these octets into hexadecimal notation like U+XXXX? ...
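
One reading of the question above is that the payload bits (the x positions) of each octet are concatenated to form the code point. A minimal Python sketch of that idea follows; the function name `octets_to_codepoint_notation` is hypothetical, and the input is assumed to be exactly one well-formed UTF-8 sequence (overlong forms and surrogates are not checked).

```python
def octets_to_codepoint_notation(octets: bytes) -> str:
    """Return U+XXXX notation for a single UTF-8 encoded character.

    Assumption: `octets` holds exactly one character. Overlong encodings
    and surrogate code points are not rejected in this sketch.
    """
    if not octets:
        raise ValueError("empty input")

    first = octets[0]
    # The leading octet announces the sequence length and carries the top bits.
    if first < 0x80:                 # 0xxxxxxx
        value, expected = first, 1
    elif first >> 5 == 0b110:        # 110xxxxx
        value, expected = first & 0x1F, 2
    elif first >> 4 == 0b1110:       # 1110xxxx
        value, expected = first & 0x0F, 3
    elif first >> 3 == 0b11110:      # 11110xxx
        value, expected = first & 0x07, 4
    else:
        raise ValueError("invalid leading octet")

    if len(octets) != expected:
        raise ValueError("wrong number of octets for this leading octet")

    # Each continuation octet (10xxxxxx) contributes six payload bits.
    for octet in octets[1:]:
        if octet >> 6 != 0b10:
            raise ValueError("invalid continuation octet")
        value = (value << 6) | (octet & 0x3F)

    return f"U+{value:04X}"


# 11000011 10100100 is the two-octet encoding of 'ä'.
print(octets_to_codepoint_notation(bytes([0b11000011, 0b10100100])))  # U+00E4
```

In everyday code the same result can be had by decoding the bytes and calling `ord()` on the resulting character; the manual version is only meant to show where the bits go.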

Size difference when reading UTF8 encoded file

I'm trying to read a UTF8 encoded file (.torrent). In the file there is a 'pieces' section. Directly following that is the length of the text that contains a sequence of SHA1 hashes. The file reports a length (say 130100) to read, but when reading I end up going past EOF. I'm not sure why this is happening. The files are good (I've t...

Serializing an object as UTF-8 XML in .NET

Proper object disposal removed for brevity, but I'd be shocked if this is the simplest way to encode an object as UTF-8 in memory. There has to be an easier way, doesn't there? var serializer = new XmlSerializer(typeof(SomeSerializableObject)); var memoryStream = new MemoryStream(); var streamWriter = new StreamWriter(memoryStream, System.T...

Failing to write German 'umlauts' (äöü) from console to text file with Java

Hi, currently I'm desperately trying to write German umlauts, read from the console, into a utf8 encoded text file on Windows 7. Here is the code to set up the scanner: Scanner scanner = new Scanner(System.in, "UTF8"); Here is the code to read the string: String s = scanner.nextLine(); Here is the code to write into a file: ...

How to output utf8 encoded characters normally in a C/C++ console application?

Here's what I'm getting now by wprintf: 1胩?鳧?1敬爄汯?瑳瑡獵猆慴畴?? Is utf8 just not supported by windows? ...

Asian characters in IE 8 get garbled on the server; is this due to the HTTP header Content-Type?

One of the request parameters in an http request made by the client contains Japanese characters. If I make this request in Firefox and look at the parameter as soon as it reaches the server by debugging in Eclipse, the characters look fine. If I do the same request using IE 8, the characters get garbled when I look at them at the same p...

Should I change from UTF-8 to UTF-16 to accommodate Chinese characters in my HTML?

I am using ASP.NET MVC, MS SQL and IIS. I have a few users that have used Chinese characters in their profile info. However, when I display this information it shows up as æŽå¼·è¯ but they are correct in my database. Currently my UTF for my HTML pages is set to UTF-8. Should I change it to UTF-16? I und...

LAMP UTF-8 saving incorrectly to MySQL Database

I've converted my database from Latin 1 to UTF8, and using phpMyAdmin you can enter data and display it correctly. However, viewing it in the pages I've developed in PHP and editing it using my simple CMS saves characters that must be incorrectly encoded. I've spent a few hours researching and eventually came up with this code snippet: mysq...

Force XDocument to write to String with UTF-8 encoding

I want to be able to write XML to a String with the declaration and with UTF-8 encoding. This seems mighty tricky to accomplish. I have read around a bit and tried some of the popular answers for this, but they all have issues. My current code correctly outputs as UTF-8 but does not maintain the original formatting of the XDocument...

Passing UTF-8 encoded string to SOAP

I am using a third party Web Service. I am passing a string to a function in that service, and I am reading that string from a UTF-8 text file. The problem is that the string contains some non-ASCII characters. Now if I save that text file in ANSI format, read it into a string and pass that string to the Service, then it works smoothly but ...

How to initialize a const char* and/or const std::string in C++ with a sequence of UTF-8 characters?

How to initialize a const char* and/or const std::string in C++ with a sequence of UTF-8 characters? I'm using a regular expression API that accepts UTF8 string as const char*. The initialization code should be platform independent. ...

Why is PHP's utf8_encode breaking my utf-8 string?

I'm doing a kind of roundabout experiment thing where I'm pulling data from tables in a remote page to turn it into an ICS so that I can find out when this sports team is playing (because I can't find anywhere that the information is more readily available than in this table), but that's just to give you some context. I pull this data u...

php include ÅÄÖ = ??? / UTF8 problem

index.php <?php include("header.php"); ?> header.php <?php echo"<a href='add.php'>Lägg Till</a>"; ?> result L?gg Till The document is utf8 within the head tags and all, so it's a PHP thing. The problem only occurs when I get text from an include: I cannot have ÅÄÖ in included PHP files. How do I make it work? ...

Batch convert on Mac OS X html files to UTF-8 with Unix (LF)

I am on Mac OS X Snow Leopard. I need to batch convert a lot of .htm files that were originally created on Windows to UTF-8 with Unix (LF) line breaks. I can batch rename all of the files to .html with NameMangler. I can do a search/replace of all of the files to update all hyperlinks to reflect the extension change to .html u...

Android: Resolved: Unable to parse currency data text obtained from UTF-8 data

Hi, I am trying to fetch this xml response <?xml version="1.0" encoding="utf-8"?> <desc c="¥99"/> but on my Android each time I get Â¥99 after parsing the xml, instead of the correct data (i.e. ¥99). Is there any way to parse this currency data correctly? Please correct me if I am missing something. EDIT: Here is the code that is used to get x...

Is UTF-8 safe for HTTP?

Hi, if I have UTF-8 encoded data, is it safe to send it in an HTTP body? The thing is that UTF-8 data could include control characters, including the null character (binary zero), which are not allowed by the HTTP RFC of course. So what should I do with such data? Encode it with base64? On the other hand, the data which I have in UTF-8 is XML ...

How can SELECT HEX(CHAR(0x4E8C USING ucs2)) return '4E01' instead of '4E8C' ?

I converted a kanji column in my database to UCS-2 codes with this, it works: SELECT hex(convert('二' using ucs2)); => 0x4E8C aka &#x4E8C aka Unicode Code Point 20108 But if I want to convert my SQL results back to kanji, I get the wrong character: SELECT CHAR(0x4E8C USING ucs2); Returns 丁 which has code point 0x4E01 Inste...

JSF 2.0 request.getParameter returns a string with wrong encoding

Hi, I'm writing an application in JSF 2.0 which supports many languages, among them ones with special characters. I use String value = request.getParameter("name") and the POST method, the page encoding is set to UTF-8 and the app is deployed on Apache Tomcat 6, which has the connector set correctly to utf-8 in a server.xml file: <Connector...

Is replacing a line break UTF-8 safe?

If I have a UTF-8 string and want to replace line breaks with the HTML <br> tag, is this safe? $var = str_replace("\r\n", "<br>", $var); I know str_replace isn't UTF-8 safe but maybe I can get away with this. I ask because there isn't an mb_strreplace function. ...
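
The byte-level reasoning behind this question can be checked with a small experiment. In UTF-8, every octet of a multi-byte sequence has its high bit set (0x80 or above), so the ASCII octets 0x0D and 0x0A cannot appear inside one. The sketch below uses Python rather than the PHP of the question, and the helper name `replace_line_breaks` is hypothetical; it only illustrates the property and is not a statement about PHP's str_replace internals.

```python
def replace_line_breaks(utf8_bytes: bytes) -> bytes:
    """Byte-wise replacement of CRLF with <br>, with no decoding step."""
    return utf8_bytes.replace(b"\r\n", b"<br>")


# Sample text mixing ASCII, a two-octet character and a three-octet character.
original = "Lägg Till\r\n二\r\n"
encoded = original.encode("utf-8")

# Every octet belonging to a non-ASCII character is >= 0x80, so it can never
# be mistaken for CR (0x0D) or LF (0x0A).
for char in original:
    char_bytes = char.encode("utf-8")
    if len(char_bytes) > 1:
        assert all(octet >= 0x80 for octet in char_bytes)

# The result still decodes cleanly as UTF-8 after the byte-wise replacement.
result = replace_line_breaks(encoded).decode("utf-8")
print(result)
```

The same argument applies to any search string made only of ASCII octets; it does not cover searching for arbitrary byte fragments of multi-byte characters.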