unicode

Transition to Unicode for an application that handles text files

My Win32 Delphi app analyzes text files produced by other applications that do not support Unicode. Thus, my app needs to read and write ANSI strings, but I would like to provide a better-localized user experience by using Unicode in the GUI. The app does some pretty heavy character-by-character analysis of strings in objects descend...

Most Efficient Unicode Hash Function for Delphi 2009

I need the fastest possible hash function in Delphi 2009 that creates hash values from a Unicode string and distributes them fairly randomly into buckets. I originally started with Gabr's HashOf function from GpStringHash: function HashOf(const key: string): cardinal; asm xor edx,edx { result := 0 } and eax,eax ...
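The HashOf routine quoted in this question is cut off, so it cannot be reconstructed here. As a point of comparison, here is a sketch of a different, well-known byte-oriented hash, 32-bit FNV-1a, written in Python only for illustration (fnv1a_32 and bucket_of are hypothetical names). It hashes the UTF-16LE bytes of the string, on the assumption that this mirrors the 2-byte code units of a Delphi 2009 UnicodeString; whether it beats HashOf in speed would have to be measured in Delphi itself.

```python
def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a over the UTF-16LE bytes of `text` (illustrative sketch)."""
    h = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for byte in text.encode("utf-16-le"):
        h ^= byte                           # xor in the next byte first...
        h = (h * 0x01000193) & 0xFFFFFFFF   # ...then multiply by the FNV prime
    return h


def bucket_of(text: str, bucket_count: int) -> int:
    """Map a string to one of `bucket_count` buckets."""
    return fnv1a_32(text) % bucket_count


print(bucket_of("straße", 1024))
```

In a compiled port the per-byte loop is essentially the whole cost, which is why FNV-1a is a common baseline; it is worth benchmarking against HashOf on real keys before switching.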

Capitalizing non-ASCII words in Python

How do I capitalize words containing non-ASCII characters in Python? Is there a way to make the string capitalize() method handle them? ...
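In Python 3, str holds Unicode code points, so the built-in case methods already know about accented and non-Latin letters. The problem usually appears in Python 2 when capitalize() is called on a UTF-8 byte string, which it processes byte by byte; decoding to unicode first avoids that. A minimal Python 3 sketch, where capitalize_words is a hypothetical helper name:

```python
def capitalize_words(text: str) -> str:
    """Capitalize each space-separated word, including non-ASCII ones."""
    # str.capitalize() capitalizes the first character and lower-cases the
    # rest using Unicode case rules, so "é" -> "É" works out of the box.
    return " ".join(word.capitalize() for word in text.split(" "))


print(capitalize_words("élan über ñandú"))  # prints "Élan Über Ñandú"
```

str.title() is an alternative, but it also treats apostrophes as word boundaries, turning "they're" into "They'Re".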

Converting Symbols and Accented Letters to the English Alphabet

Dear friends, the problem is that there are thousands of characters in the Unicode chart, and I want to convert all the similar-looking characters to letters of the English alphabet. For instance, here are a few conversions: ҥ->H Ѷ->V Ȳ->Y Ǭ->O Ƈ->C tђє Ŧค๓เℓy --> the Family ... and I saw that there are more than 20 v...
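A common partial answer is Unicode compatibility decomposition (NFKD): it splits a letter such as "Ȳ" into "Y" plus a combining macron and maps stylistic variants such as "ℓ" to "l", after which the combining marks can be dropped. This Python sketch (strip_accents is a hypothetical name) shows the idea, but it does not cover every example in the question: look-alikes from other scripts (Cyrillic "ҥ", Thai "ค") and letters with hooks or strokes ("Ƈ", "Ŧ") have no decomposition and would still need a hand-made mapping table.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Fold letters with diacritics to their base letters using NFKD."""
    # NFKD splits "Ȳ" into "Y" + COMBINING MACRON and also applies
    # compatibility mappings such as "ℓ" (SCRIPT SMALL L) -> "l".
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the nonspacing combining marks (category "Mn") left behind.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Ȳ Ǭ café ℓ"))  # prints "Y O cafe l"
```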

Check the language of string based on glyphs in PHP

I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script. I'd like my output HTML to look something like this: <h3>A book</h3> <h3>كتاب <em>(kitaab)</em></h3> <h3>Another book</h3> Is there a way for PHP to determine the language o...
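Detecting the script of a string is much easier than detecting its language: checking whether any character falls in the Arabic Unicode block is enough to separate the two kinds of titles in this example, although Persian or Urdu text would match too. The sketch below is in Python rather than PHP; in PHP the same test can be written with preg_match('/\p{Arabic}/u', $title). The transliterate argument stands in for the question's PHP transliteration class and is hypothetical.

```python
import html
from typing import Callable


def contains_arabic(text: str) -> bool:
    """True if any character lies in the Arabic block (U+0600..U+06FF)."""
    return any("\u0600" <= ch <= "\u06ff" for ch in text)


def render_title(title: str, transliterate: Callable[[str], str]) -> str:
    """Render a book title, adding a transliteration for Arabic titles."""
    safe = html.escape(title)
    if contains_arabic(title):
        latin = html.escape(transliterate(title))
        return f"<h3>{safe} <em>({latin})</em></h3>"
    return f"<h3>{safe}</h3>"
```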

C++ File Reading Library - ANSI and Unicode

I've read a few answers here about reading Unicode files etc., and most people point to UTF8-CPP or iconv. None of the libraries I found work for both ANSI and Unicode files. Ideally, I want one function that I pass a filename to and that returns the contents of that file regardless of its encoding, or is that not ...
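No function can identify an arbitrary encoding with certainty, but a practical heuristic is: trust a byte order mark (BOM) if there is one, otherwise try strict UTF-8 (which rejects most non-ASCII ANSI text), and fall back to a legacy code page. Here is that strategy sketched in Python rather than C++ (decode_text and read_text_file are hypothetical names). The cp1252 fallback is an assumption about which ANSI code page the files use, and BOM-less UTF-16 is not detected. In C++ the same decision logic could sit in front of a converter such as iconv.

```python
import codecs


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode file contents, using a BOM if present, else UTF-8, else ANSI."""
    # Order matters: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, encoding in boms:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    try:
        return data.decode("utf-8")  # BOM-less UTF-8, which includes plain ASCII
    except UnicodeDecodeError:
        # Assumed legacy "ANSI" code page; replace bytes it leaves undefined.
        return data.decode(fallback, errors="replace")


def read_text_file(path: str, fallback: str = "cp1252") -> str:
    """Read a whole file and decode it with decode_text()."""
    with open(path, "rb") as f:
        return decode_text(f.read(), fallback)
```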

Twitter text compression challenge

Rules. Your program must have two modes: encoding and decoding. When encoding, your program must take as input some human-readable Latin-1 text, presumably English. It doesn't matter if you ignore punctuation marks. You only need to worry about actual English words, not L337. Any accented letters may be converted to simple ASCII. You m...

How to convert Unicode characters to escape codes

So, I have a bunch of strings like this: {\b\cf12 よろてそ }. I'm thinking I could iterate over each character and replace any non-ASCII character (Edit: anything where AscW(char) > 127 or < 0) with a Unicode escape code (\u###). However, I'm not sure how to do so programmatically. Any suggestions? Clarification: I have a string like {\b\cf12 よろてそ...
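Since the string is RTF, the escape to produce is most likely RTF's \uN control word, where N is the UTF-16 code unit written as a signed 16-bit decimal number (values above 32767 become negative, which is presumably why the edit also mentions AscW(char) < 0), followed by a fallback character for readers that do not understand \u. A Python sketch under that assumption, with rtf_escape as a hypothetical helper name; characters outside the BMP are written as two escapes, one per surrogate.

```python
def rtf_escape(text: str) -> str:
    """Replace every non-ASCII character with RTF \\uN? escapes."""
    out = []
    data = text.encode("utf-16-le")
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")  # one UTF-16 code unit
        if unit < 128:
            out.append(chr(unit))  # ASCII, including the RTF markup itself
        else:
            # RTF expects a signed 16-bit value: 0x8000..0xFFFF become negative.
            signed = unit - 0x10000 if unit > 0x7FFF else unit
            out.append(f"\\u{signed}?")  # "?" is the fallback character
    return "".join(out)


print(rtf_escape("{\\b\\cf12 よろてそ }"))
```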

Displaying partially Unicode encoded data via AJAX/innerHTML

Hello everyone, I am trying to get some data from the server via an AJAX call and then display the result using responseDiv.innerHTML. The data from the server comes partially encoded with Unicode escape sequences, like: za\u010Dat test. When I set the innerHTML of the response div, this is just displayed as-is. That is, the Unicode is not conv...
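\u010D is a JavaScript/JSON string escape, not an HTML entity, so innerHTML shows it literally. The cleanest fix is usually on the server: send the text as real UTF-8, or send valid JSON and parse it with JSON.parse on the client. If the escapes are already baked into the data, they can be decoded before output; here is a Python sketch of that server-side step, assuming the escapes always have exactly four hex digits (decode_u_escapes is a hypothetical name).

```python
import re

# Matches a literal backslash, "u", and four hex digits, e.g. \u010D.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace literal \\uXXXX sequences with the characters they denote."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_u_escapes(r"za\u010Dat test"))  # prints "začat test"
```

Escaped surrogate pairs such as \uD83D\uDE00 would come out here as two lone surrogates; if the whole payload is JSON, json.loads combines them correctly.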

Unicode string to flat file from vba

I want to store a Unicode string in a flat file on a Windows box from an Excel/VBA macro. The macro converts a normal string to its Unicode representation, and I need to store it in a file and retrieve it later. ...

OpenFileDialog filename as UTF8

Hi all, C# question here. I have a UTF-8 string that is being interpreted by a non-Unicode program in C++. This text, which is displayed improperly but as far as I can tell is intact, is then used as an output filename. Anyway, in a C# project, I am trying to open this file with a System.Windows.Forms.OpenFileDialog object. ...
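When UTF-8 bytes are decoded with an ANSI code page, each multibyte character turns into two or three unrelated-looking characters (for example "é", bytes C3 A9, shows as "Ã©" in Windows-1252), but the underlying bytes usually survive, which fits the observation that the text looks wrong yet is intact. A Python sketch of reversing that step, assuming the non-Unicode program used Windows-1252; the real ANSI code page depends on the system locale, and a few byte values are undefined in cp1252, in which case encode() raises UnicodeEncodeError. undo_ansi_misread is a hypothetical name.

```python
def undo_ansi_misread(garbled: str, ansi: str = "cp1252") -> str:
    """Recover text whose UTF-8 bytes were decoded with an ANSI code page."""
    # Re-encoding with the code page that misread the text gives back the
    # original UTF-8 bytes, which can then be decoded correctly.
    return garbled.encode(ansi).decode("utf-8")


print(undo_ansi_misread("cafÃ©"))  # prints "café"
```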

What's the internal format of a .NET String?

I'm writing some pretty string-manipulation-intensive code in C#.NET and got curious about some Joel Spolsky articles I remember reading a while back: http://www.joelonsoftware.com/articles/fog0000000319.html http://www.joelonsoftware.com/articles/Unicode.html So, how does .NET do it? Two bytes per char? There ARE some Unicode chars^H...
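.NET strings are sequences of 16-bit UTF-16 code units (System.Char), so characters outside the Basic Multilingual Plane occupy two Chars, a surrogate pair, and String.Length counts code units rather than characters. Python 3's len() counts code points instead, which makes the difference easy to demonstrate; utf16_length is a hypothetical helper that reproduces what String.Length would report.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit .NET's String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


for sample in ["abc", "é", "😀"]:
    # len() counts code points; utf16_length() counts UTF-16 code units.
    print(repr(sample), len(sample), utf16_length(sample))
```

Only the emoji differs: it is one code point but two UTF-16 code units.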

urllib2 read to Unicode

I need to store the content of a site that can be in any language, and I need to be able to search the content for a Unicode string. I have tried something like: import urllib2 req = urllib2.urlopen('http://lenta.ru') content = req.read() The content is a byte stream, so I can't search it for a Unicode string. I need some way that wh...
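The bytes returned by read() have to be decoded before they can be searched for text, ideally with the charset the server declares in its Content-Type header. This sketch uses Python 3's urllib.request, the successor of urllib2 (fetch_text and decode_body are hypothetical names). Falling back to UTF-8 when no charset is declared is an assumption, and a charset declared only in an HTML <meta> tag is not detected.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str], default: str = "utf-8") -> str:
    """Decode an HTTP body using the declared charset, else a default."""
    return raw.decode(charset or default, errors="replace")


def fetch_text(url: str, default: str = "utf-8") -> str:
    """Download `url` and return its body as a str."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset()  # from Content-Type
        return decode_body(resp.read(), charset, default)
```

Once decoded, for example with content = fetch_text("http://lenta.ru"), an ordinary substring test such as "новости" in content works on text rather than bytes.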

Do I need supplementary plane?

Hi, I think the question is pretty simple: do I need everything in Unicode beyond the Basic Multilingual Plane? What kind of characters are included there, and are they really needed (and for what purposes)? Thanks. ...
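Whether you need the supplementary planes depends on your data. They contain, among other things, emoji, the CJK Unified Ideographs extensions (rare characters that still appear in personal and place names), many historic scripts, and the Mathematical Alphanumeric Symbols. In practice the more pressing question is whether your code handles them correctly, since in UTF-16 each one is a surrogate pair. A small Python sketch for auditing a sample of real text (supplementary_chars is a hypothetical name):

```python
def supplementary_chars(text: str) -> list:
    """Return the characters in `text` that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Double-struck A (U+1D538), an emoji (U+1F600), a CJK Ext. B ideograph (U+20000).
print(supplementary_chars("A 𝔸 😀 𠀀"))
```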

MFC DialogBased

When should I use the _TCHAR character types and the _T (_TEXT) and L macros? What is the difference between them? ...

Is this a good description of Unicode?

Here's my description of Unicode. Please correct and comment. Unicode separates the representation of a character from the mechanism of storing a character. This is different from ANSI in which each character is represented by a byte. An ANSI code page maps characters to byte representations. Unicode maps characters to code poin...
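The separation described here between a character's code point and its stored bytes can be seen directly in Python: the euro sign has a single code point, U+20AC, but a different byte sequence in each encoding, and in a single-byte "ANSI" code page such as Windows-1252 it is just one byte.

```python
euro = "\u20ac"  # EURO SIGN

# One abstract code point...
print(f"U+{ord(euro):04X}")            # prints U+20AC
# ...stored as different bytes depending on the encoding.
print(euro.encode("utf-8").hex())      # prints e282ac
print(euro.encode("utf-16-le").hex())  # prints ac20
print(euro.encode("cp1252").hex())     # prints 80
```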

Comparing a char to a code-point?

What is the "correct" way of comparing a code-point to a Java character? For example: int codepoint = String.codePointAt(0); char token = '\n'; I know I can probably do: if (codepoint==(int) token) { ... } but this code looks fragile. Is there a formal API method for comparing codepoints to chars, or converting the char up to a cod...

Unicode characters not showing in Zend_Pdf?

require_once 'Zend/Pdf.php'; $pdf = new Zend_Pdf(); $page = $pdf->newPage(Zend_Pdf_Page::SIZE_A4); $pdf->pages[] = $page; $page->setFont(Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_HELVETICA), 10); $page->drawText("Bogus Russian: это фигня", 100, 400, "UTF-8"); $pdfData = $pdf->render(); header("Content-Disposition: inline; filename=...

Converting a UCS2 string into UTF8 in Ruby

How do I convert a UCS-2 string (2 bytes per character) into a UTF-8 string in Ruby? ...
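UCS-2 is the BMP-only subset of UTF-16, so a UTF-16 decoder reads it correctly; the main thing to know is the byte order. A Python sketch, assuming there is no byte order mark and little-endian is the default (as on Windows); ucs2_to_utf8 is a hypothetical name. In Ruby 1.9 and later the analogous call is str.force_encoding("UTF-16LE").encode("UTF-8").

```python
def ucs2_to_utf8(raw: bytes, byte_order: str = "le") -> bytes:
    """Convert BOM-less UCS-2/UTF-16 bytes to UTF-8 bytes."""
    # UTF-16 is a superset of UCS-2, so its decoder also reads plain UCS-2.
    codec = "utf-16-le" if byte_order == "le" else "utf-16-be"
    return raw.decode(codec).encode("utf-8")


print(ucs2_to_utf8("Aé".encode("utf-16-le")))
```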

Unicode output on Windows command line?

Hi, I wrote a small Java application whose output includes Unicode characters. When I run it in Eclipse, I see all the output as expected. The people who are supposed to use the application will run it as a jar file. I thought they could use the standard cmd window, but in that window the Unicode characters appear as gibberish. Is there a way t...