unicode

Python Unicode and MIMEE

Hi Guys, Can someone who is way smarter than I tell me what I'm doing wrong.. Shouldn't this simply process... # encoding: utf-8 from email.MIMEText import MIMEText msg = MIMEText("hi") msg.set_charset('utf-8') print msg.as_string() a = 'Ho\xcc\x82tel Ste\xcc\x81phane ' b = unicode(a, "utf-8") print b msg = MIMEText(b) msg.set_cha...

convert html entities to unicode(utf-8) strings in c?

This question is very similar to that one, but I need to do the same thing in C, not python. Here are some examples of what the function should do: input output &lt; < &gt; > &auml; ä &#x00DF; ß The function should have the signature char *html2str(char *html) or similar. I'm not reading byte by byte from a stream. Is t...
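For reference, the expected input/output pairs above can be reproduced with a short Python sketch. This only illustrates the behaviour being asked for, not a C implementation; `html_to_str` is a hypothetical name standing in for the requested `html2str`.

```python
import html

def html_to_str(text):
    """Replace named and numeric HTML character references with their characters."""
    # html.unescape handles named (&auml;), decimal and hex (&#x00DF;) references.
    return html.unescape(text)

samples = ["&lt;", "&gt;", "&auml;", "&#x00DF;"]
decoded = [html_to_str(s) for s in samples]
print(decoded)
```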

How do I recover a document that has been sent through the character encoding wringer?

Until recently, my blog used mismatched character encoding settings for PHP and MySQL. I have since fixed the underlying problem, but I still have a ton of text that is filled with garbage. For instance, ï has become Ã¯. Is there software that can use pattern recognition and statistics to automatically discover broken text and fix it? ...
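A minimal sketch of one common repair, assuming the damage came from UTF-8 bytes being read as Latin-1 exactly once (the actual PHP/MySQL settings are not given here, so that is a guess). `repair_mojibake` is a hypothetical helper name.

```python
def repair_mojibake(text):
    """Undo one round of 'UTF-8 bytes decoded as Latin-1', if that is what happened."""
    try:
        # Recover the original bytes, then decode them the way they were meant to be read.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit the assumed damage pattern; leave it untouched.
        return text

broken = "na\u00c3\u00afve"   # U+00C3 U+00AF is how the UTF-8 bytes of U+00EF look in Latin-1
fixed = repair_mojibake(broken)
print(fixed)
```

Text that was never damaged usually fails the round trip and is returned unchanged, but short strings can pass by coincidence, so this is best checked against a sample before being run over a whole database.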

Which programming languages were designed with Unicode support from the beginning?

Which widely used programming languages were designed ground-up with Unicode support? A lot of programming languages have added Unicode support as an afterthought in later versions, but which widely used languages were released with Unicode support from day one? ...

MySQL utf8_general character mapping table

From what I understand, when MySQL compares a string stored in utf8_general collation, it first converts its characters to their ASCII equivalents. In other words ḩ = h, ţ = t, ā = a, í = i, etc... Is there a mapping table which I could use to implement similar comparison function in PHP or JavaScript? I know there are alternatives in p...
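A rough approximation of that folding, sketched in Python: decompose each character and drop the combining marks. This is not MySQL's actual mapping table and will differ from it for some characters; `strip_accents` is a hypothetical helper name.

```python
import unicodedata

def strip_accents(text):
    """Fold accented letters to their base letters via canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class; drop them.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The four examples from the question, written as escapes to avoid encoding ambiguity.
folded = strip_accents("\u1e29\u0163\u0101\u00ed")
print(folded)
```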

How do I represent a Unicode character in a literal string ISO/ANSI C when the character set is ASCII?

In Perl, I can say my $s = "r\x{e9}sum\x{e9}"; to assign "résumé" to $s. I want to do something similar in C. Specifically, I want to say sometype_that_can_hold_utf8 c = get_utf8_char(); if (c < '\x{e9}') { /* do something */ } ...

Servlet request.getParameters non english character help!

Heya guys, I'm in desperate need of help. I have a Java servlet that is accessed by an HTTP GET URL with eight parameters in it. The problem is that the parameters are not exclusive to English. Any other language can be in those parameters, like Hebrew, for example. Now, when I send the data - either from the class that is supposed to...

Delphi < 2009, unicode replacement for JvAppStorage.

I'm looking for the best option to store my application settings. I decided to write own class that inherits from TPersistent which would store all the config options available. Currently I'm looking for the best way to save it - and I found JvAppStorage which looked very promising (as I'm using JVCL in my project anyway...) but it doesn...

Printing Unicode code point to console using int instead of \uNNNN

Hello, apologies if this is silly. How do I print a Unicode character, say \u20ac using an integer? So, instead of Console.WriteLine("\u20ac");, I would like to pass the integer 8364. Thanks. ...
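The idea, sketched in Python rather than C#: turn the integer code point into a one-character string and print that. The variable names are illustrative only.

```python
code_point = 8364          # decimal value of U+20AC
euro = chr(code_point)     # build a string from the integer code point
print(euro)
```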

Printing Astral Plane Unicode code point to console using int

Please see here for a related question. However, char goes to 0xffff (or 65535). I need to write 0xd800df46 (or 66374), Gothic letter Faihu, so casting that int to char will not work. I do the conversion ok, that is, I get the correct integer, meaning I calculate the surrogate pairs ok, but I don't know how to "render" it, convert it t...
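The surrogate arithmetic for the code point mentioned above (66374, U+10346) can be sketched in Python as follows. `to_surrogate_pair` is a hypothetical helper; the last line cross-checks the result against Python's own UTF-16 encoder.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low

high, low = to_surrogate_pair(66374)
faihu = chr(66374)                     # the character itself, as a one-character string
utf16_units = faihu.encode("utf-16-be")
print(hex(high), hex(low), utf16_units.hex())
```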

In C# String/Character Encoding what is the difference between GetBytes(), GetString() and Convert()?

We are having trouble getting a Unicode string to convert to a UTF-8 string to send over the wire: // Start with our unicode string. string unicode = "Convert: \u10A0"; // Get an array of bytes representing the unicode string, two for each character. byte[] source = Encoding.Unicode.GetBytes(unicode); // Convert the Unicode bytes to U...
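The same round trip, sketched in Python to show what each step produces. The only assumption is that C#'s `Encoding.Unicode` means UTF-16 little-endian; the variable names are illustrative.

```python
text = "Convert: \u10A0"

# Step 1: the string as UTF-16LE bytes, two per character for this text.
utf16_bytes = text.encode("utf-16-le")

# Step 2: transcode those bytes to UTF-8 by decoding and re-encoding.
utf8_bytes = utf16_bytes.decode("utf-16-le").encode("utf-8")

print(len(utf16_bytes), len(utf8_bytes), utf8_bytes)
```

The transcoded bytes are identical to encoding the string as UTF-8 directly, so the intermediate UTF-16 step is only needed when the input already arrives as UTF-16 bytes.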

Python 3 smtplib send with unicode characters

I'm having a problem emailing unicode characters using smtplib in Python 3. This fails in 3.1.1, but works in 2.5.4: import smtplib from email.mime.text import MIMEText sender = to = '[email protected]' server = 'smtp.DEF.com' msg = MIMEText('€10') msg['Subject'] = 'Hello' msg['From'] = sender msg['To'] = to s = smtplib.SM...
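One thing worth trying, sketched under the assumption that the failure comes from the message body not declaring a charset: pass the charset to `MIMEText` explicitly so the serialized message is pure ASCII. The address below is a placeholder, and nothing is actually sent.

```python
from email.mime.text import MIMEText

# Declare the charset explicitly; the payload is then transfer-encoded for the wire.
msg = MIMEText("\u20ac10", "plain", "utf-8")
msg["Subject"] = "Hello"
msg["From"] = "sender@example.invalid"   # placeholder address
msg["To"] = "sender@example.invalid"     # placeholder address

wire_text = msg.as_string()
round_trip = msg.get_payload(decode=True).decode("utf-8")
print(wire_text)
```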

Is it possible to convert between Unicode normal forms in PHP?

For example, in one Unicode normal form á is always represented as an unaccented letter a and a combining accent mark, in another it must be a single pre-combined Unicode character. How would I convert between these forms in PHP? ...
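To make the two forms concrete, here is the conversion sketched in Python (not PHP) with the standard `unicodedata` module: NFD is the decomposed form and NFC the pre-combined one.

```python
import unicodedata

# NFC: letter plus combining acute accent becomes one pre-combined character.
composed = unicodedata.normalize("NFC", "a\u0301")

# NFD: the pre-combined character becomes letter plus combining accent.
decomposed = unicodedata.normalize("NFD", "\u00e1")

print(len(composed), len(decomposed))
```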

Automatic Unicode string formatting in Java

I just came across something like this: String sample = "somejunk+%3cfoobar%3e+morestuff"; Printed out, sample looks like this: somejunk+<foobar>+morestuff How does that work? U+003c and U+003e are the Unicode codes for the less than and greater than signs, respectively, which seems like more than a coincidence, but I've never ...
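The `%3c` and `%3e` sequences look like URL percent-encoding, where each `%XX` is a byte value in hex. A Python sketch of that decoding, offered as a likely explanation rather than a statement about what the Java code in question does:

```python
from urllib.parse import unquote

sample = "somejunk+%3cfoobar%3e+morestuff"

# unquote decodes %XX escapes and leaves '+' characters as they are.
decoded = unquote(sample)
print(decoded)
```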

Writing a string to a TFileStream in Delphi 2010

I have Delphi 2007 code that looks like this: procedure WriteString(Stream: TFileStream; var SourceBuffer: PChar; s: string); begin StrPCopy(SourceBuffer,s); Stream.Write(SourceBuffer[0], StrLen(SourceBuffer)); end; I call it like this: var SourceBuffer : PChar; MyFile: TFileStream; .... SourceBuffer := StrAlloc(1024); MyFi...

WCF Unicode UrlEncoded Get not coming over nicely

I have a RESTful WCF service which accepts GET verbs with Unicode encoded urls. The Unicode characters are translated as little boxes strangely when I get the data on the server. Is there something I have to tell the service contract to do in order to get Unicode UrlEncoded Gets to translate into nice strings? Here's my contract: [Ope...

How to properly trim whitespaces from a string in Java?

The JDK's String.trim() method is pretty naive, and only removes ASCII control characters. Apache Commons' StringUtils.strip() is slightly better, but uses the JDK's Character.isWhitespace(), which doesn't recognize non-breaking space as whitespace. So what would be the most complete, Unicode-compatible, safe and proper way to trim a s...
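For comparison, this is the behaviour being asked for, sketched in Python, whose `str.strip()` uses the Unicode whitespace definition and so treats the non-breaking space as trimmable. It shows the target behaviour only, not a Java solution.

```python
# U+00A0 is NO-BREAK SPACE, U+2003 is EM SPACE; both surround the text here.
padded = "\u00a0 text \u2003"

# With no arguments, strip() removes every leading/trailing character
# for which str.isspace() is true.
trimmed = padded.strip()
print(repr(trimmed))
```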

Error C2679 when attempting to use std::wcout << wstring-var; vc++ 2008 express

I'm getting a rather odd error message when attempting to wcout a wstring in vc++ 2008 express: error C2679: binary '<<' : no operator found which takes a right-hand operand of type 'std::wstring' (or there is no acceptable conversion) If I understand this correctly it's reporting that wcout does not accept a wstring? I ask someone...

Get supported characters of a font - in C#

I have a third party font with support for Japanese characters which I need to use for an application. Whenever a character is not supported by this font, the often seen rectangle ("default character") is drawn. Obviously not all Japanese characters are supported, because if I try to draw the translations that our translation office gave...

Request Object not decoding UrlEncoded

C#, ASP.NET 3.5 I create a simple URL with an encoded querystring: string url = "http://localhost/test.aspx?a=" + Microsoft.JScript.GlobalObject.escape("áíóú"); which becomes nicely: http://localhost/test.aspx?a=%E1%ED%F3%FA (that is good) When I debug test.aspx I get strange decoding: string badDecode = Request.QueryString[...
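The escaped value `%E1%ED%F3%FA` is one byte per character, which matches Latin-1 rather than UTF-8. A Python sketch of how the two interpretations differ, assuming the receiving side decodes as UTF-8 (the truncated question does not confirm this):

```python
from urllib.parse import unquote

escaped = "%E1%ED%F3%FA"

# Reading the bytes as Latin-1 recovers the original four letters.
as_latin1 = unquote(escaped, encoding="latin-1")

# Reading the same bytes as UTF-8 fails; invalid sequences become U+FFFD.
as_utf8 = unquote(escaped, encoding="utf-8", errors="replace")

print(as_latin1, as_utf8)
```

If this is the cause, producing the query string with a UTF-8 percent-encoder instead of `escape` would make both sides agree.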