My Win32 Delphi app analyzes text files produced by other applications that do not support Unicode. Thus, my app needs to read and write ANSI strings, but I would like to provide a better-localized user experience through the use of Unicode in the GUI. The app does some pretty heavy character-by-character analysis of strings in objects descend...
I need the fastest possible hash function in Delphi 2009 that creates hash values from a Unicode string and distributes them fairly randomly into buckets.
I originally started with Gabr's HashOf function from GpStringHash:
function HashOf(const key: string): cardinal;
asm
xor edx,edx { result := 0 }
and eax,eax ...
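For comparison, here is a sketch of one widely used alternative, 32-bit FNV-1a. It is written in Python for illustration rather than in Delphi, and it is not Gabr's HashOf. It assumes the key is hashed as UTF-16LE bytes, which matches how Delphi 2009 stores UnicodeString data; `fnv1a_32` and `bucket_for` are hypothetical names.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash over a byte sequence."""
    h = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for byte in data:
        h ^= byte                           # XOR in the next byte first (the "1a" variant)
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by the FNV prime, keep 32 bits
    return h


def bucket_for(key: str, bucket_count: int) -> int:
    """Hash the UTF-16LE bytes of `key` (assumed Delphi-style storage) and map to a bucket."""
    return fnv1a_32(key.encode("utf-16-le")) % bucket_count


print(hex(fnv1a_32(b"a")))  # 0xe40c292c (a standard FNV-1a test vector)
print(bucket_for("Grüße", 1024))
```

FNV-1a is short and usually distributes short keys well; whether a Delphi port beats the original assembly would have to be measured.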
How do I capitalize words containing non-ASCII characters in Python? Is there a way to make the string capitalize() method do that?
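A minimal sketch, assuming Python 3, where `str` is Unicode and `str.capitalize()` already handles letters like é and ü. In Python 2 the usual culprit is calling the method on a byte string instead of a `unicode` object. `capitalize_words` is a hypothetical helper name.

```python
def capitalize_words(text: str) -> str:
    """Capitalize each space-separated word; str.capitalize() is Unicode-aware in Python 3."""
    # Splitting on a single space (not split()) keeps runs of spaces intact on rejoin.
    return " ".join(word.capitalize() for word in text.split(" "))


print(capitalize_words("élan vital über alles"))  # Élan Vital Über Alles
```

Note that `capitalize()` also lowercases the rest of each word, so "McDonald" becomes "Mcdonald"; `str.title()` is an alternative but treats apostrophes as word boundaries.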
...
Dear friends,
The problem is that, as you know, there are thousands of characters in the Unicode chart, and I want to convert all the look-alike characters to the corresponding letters of the English alphabet.
For instance, here are a few conversions:
ҥ->H
Ѷ->V
Ȳ->Y
Ǭ->O
Ƈ->C
tђє Ŧค๓เℓy --> the Family
...
and I saw that there are more than 20 v...
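One partial approach, sketched in Python: decompose with NFKD and drop the combining marks, which turns Ȳ into Y and Ǭ into O. Characters with no Unicode decomposition (Ƈ, and Cyrillic or Thai look-alikes such as ҥ or ค) still need a hand-made table; the `EXTRA_MAP` below is a hypothetical two-entry example, and the Unicode confusables data from UTS #39 is one possible source for a fuller one.

```python
import unicodedata

# Hypothetical manual table for characters NFKD cannot reduce to a Latin letter.
EXTRA_MAP = {
    "\u0187": "C",  # LATIN CAPITAL LETTER C WITH HOOK has no decomposition
    "\u0188": "c",  # LATIN SMALL LETTER C WITH HOOK
}


def to_ascii_lookalike(text: str) -> str:
    """Strip diacritics via NFKD, then apply the manual fallback table."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop nonspacing combining marks (category Mn) left by the decomposition.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return "".join(EXTRA_MAP.get(ch, ch) for ch in base)


print(to_ascii_lookalike("\u0232 \u01EC \u0187"))  # Y O C
```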
I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script.
I'd like my output HTML to look something like this:
<h3>A book</h3>
<h3>كتاب <em>(kitaab)</em></h3>
<h3>Another book</h3>
Is there a way for PHP to determine the language o...
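Detecting the language itself is hard, but detecting the script is easy. A sketch in Python (not PHP), assuming that "contains characters from the main Arabic block, U+0600 to U+06FF" is a good enough signal for deciding when to add a transliteration. It cannot tell Arabic from Persian or Urdu, and it ignores the Arabic Supplement and Presentation Forms blocks; `contains_arabic` is a hypothetical name.

```python
def contains_arabic(text: str) -> bool:
    """True if any character falls in the main Arabic block (U+0600 to U+06FF)."""
    return any("\u0600" <= ch <= "\u06ff" for ch in text)


for title in ["A book", "كتاب", "Another book"]:
    # Only titles flagged here would be sent through the transliteration class.
    print(title, contains_arabic(title))
```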
I've read a few answers on here about reading Unicode files, etc., and most people point to UTF8-CPP or iconv.
None of the libraries that I found work for both ANSI and Unicode files. Ideally I want one function that takes a filename and returns the contents of that file regardless of its encoding, or is that not ...
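No method is foolproof, but a common heuristic is: trust a byte order mark (BOM) if there is one, otherwise try UTF-8, otherwise fall back to an ANSI code page. A Python sketch of that idea, assuming cp1252 as the ANSI fallback; `decode_any` and `read_text_file` are hypothetical names. UTF-16 files without a BOM are not detected.

```python
import codecs

# Check UTF-32 BOMs first: the UTF-32LE BOM starts with the UTF-16LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_any(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Decode by BOM if present, else as UTF-8, else as an ANSI code page (assumed cp1252)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    try:
        return data.decode("utf-8")  # strict: invalid UTF-8 raises and triggers the fallback
    except UnicodeDecodeError:
        return data.decode(ansi_codepage, errors="replace")


def read_text_file(path: str) -> str:
    """Read raw bytes and let decode_any pick the encoding."""
    with open(path, "rb") as f:
        return decode_any(f.read())
```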
Rules
Your program must have two modes: encoding and decoding.
When encoding:
Your program must take as input some human-readable Latin-1 text, presumably English.
It doesn't matter if you ignore punctuation marks.
You only need to worry about actual English words, not L337.
Any accented letters may be converted to simple ASCII.
You m...
So, I have a bunch of strings like this: {\b\cf12 よろてそ } . I'm thinking I could iterate over each character and replace any Unicode character (Edit: anything where AscW(char) > 127 or < 0) with a Unicode escape code (\u###). However, I'm not sure how to do so programmatically. Any suggestions?
Clarification:
I have a string like {\b\cf12 よろてそ...
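A sketch of the per-character loop, in Python rather than VB. RTF's `\uN` control word takes a signed 16-bit decimal value per UTF-16 code unit (so values above 32767 come out negative, like AscW), and with the default `\uc1` a reader skips one fallback character after it; the `?` used here as that fallback is an assumption. `rtf_escape` is a hypothetical name.

```python
def rtf_escape(text: str) -> str:
    """Replace every non-ASCII character with RTF Unicode escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII (including existing RTF control words) passes through
            continue
        units = ch.encode("utf-16-le")  # characters beyond the BMP become two code units
        for i in range(0, len(units), 2):
            unit = int.from_bytes(units[i:i + 2], "little")
            signed = unit - 0x10000 if unit > 0x7FFF else unit  # RTF wants signed 16-bit
            out.append(f"\\u{signed}?")  # backslash-u, decimal value, fallback char
    return "".join(out)


print(rtf_escape(r"{\b\cf12 よろてそ }"))
# {\b\cf12 \u12424?\u12429?\u12390?\u12381? }
```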
Hello everyone,
I am trying to get some data from the server via an AJAX call and then display the result using responseDiv.innerHTML. The data from the server comes partially encoded as Unicode escape sequences, like: za\u010Dat test. When I set the innerHTML of the response div, this is just displayed as is. That is, the Unicode is not conv...
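The cleanest fix is usually for the server to send real JSON, which `JSON.parse` decodes in the browser. If the escapes must be decoded by hand, the idea is a regular-expression substitution, sketched here in Python with the hypothetical `decode_u_escapes`. It assumes exactly four hex digits per escape and does not join surrogate pairs such as `\ud83d\ude00`.

```python
import re

# Matches a literal backslash, "u", and four hex digits, e.g. \u010D.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace each literal backslash-u escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_u_escapes(r"za\u010Dat test"))  # začat test
```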
I want to store a Unicode string in a flat file on a Windows box from an Excel/VBA macro. The macro converts a normal string to its Unicode representation; I need to store it in a file and retrieve it later.
...
Hi all,
C# question here.
I have a UTF-8 string that is being interpreted by a non-Unicode C++ program. This text, which is displayed improperly but as far as I can tell is intact, is then used as an output filename.
Anyway, in a C# project, I am trying to open this file with a System.Windows.Forms.OpenFileDialog object. ...
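If the filename is classic mojibake (UTF-8 bytes decoded with an ANSI code page), the damage can often be reversed by re-encoding with that code page and decoding as UTF-8. A Python sketch, assuming the non-Unicode program used cp1252; `repair_mojibake` is a hypothetical name. In C#, the same idea uses `Encoding.GetEncoding(1252).GetBytes` followed by `Encoding.UTF8.GetString`.

```python
def repair_mojibake(garbled: str, ansi_codepage: str = "cp1252") -> str:
    """Undo UTF-8 bytes that were mistakenly decoded with an ANSI code page."""
    # Strict mode on purpose: bytes cp1252 leaves undefined cannot round-trip and should fail loudly.
    return garbled.encode(ansi_codepage).decode("utf-8")


print(repair_mojibake("cafÃ©"))  # café
```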
I'm writing some pretty string-manipulation-intensive code in C#.NET and got curious about some Joel Spolsky articles I remembered reading a while back:
http://www.joelonsoftware.com/articles/fog0000000319.html
http://www.joelonsoftware.com/articles/Unicode.html
So, how does .NET do it? Two bytes per char? There ARE some Unicode chars^H...
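For context: a .NET `String` is a sequence of UTF-16 code units (`System.Char`), so characters outside the Basic Multilingual Plane take two chars (a surrogate pair) and `String.Length` counts code units, not characters. The Python sketch below only illustrates that counting; `utf16_code_units` is a hypothetical name.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16 (what .NET's Length counts)."""
    return len(text.encode("utf-16-le")) // 2


for sample in ["A", "é", "\U0001D11E"]:  # last is U+1D11E MUSICAL SYMBOL G CLEF
    # Python's len() counts code points; the helper counts UTF-16 units.
    print(repr(sample), len(sample), utf16_code_units(sample))
```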
I need to store the content of a site that can be in any language. And I need to be able to search the content for a Unicode string.
I have tried something like:
import urllib2
req = urllib2.urlopen('http://lenta.ru')
content = req.read()
The content is a byte stream, so I can search it for a Unicode string.
I need some way that wh...
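The usual approach is to decode the bytes to text first, using the charset the server declares, and then search the decoded string. A Python 3 sketch that takes the body and the Content-Type header value, so it runs without network access; `decode_body` is a hypothetical name. With `urllib.request`, the response's `headers.get_content_charset()` returns the same charset directly. Pages that declare their charset only in an HTML meta tag are not handled here.

```python
import re


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode an HTTP body using the charset from its Content-Type header, if any."""
    match = re.search(r'charset="?([\w-]+)', content_type, re.IGNORECASE)
    encoding = match.group(1) if match else default
    return body.decode(encoding, errors="replace")


page = decode_body("Новости дня".encode("cp1251"), "text/html; charset=windows-1251")
print("Новости" in page)  # True
```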
Hi,
I think the question is pretty simple: do I need all the rest of Unicode beyond the Basic Multilingual Plane? What kind of characters are included there, and are they really needed (and for what purposes)?
Thanks.
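Plane 1 (the Supplementary Multilingual Plane) holds emoji, historic scripts, musical notation, and mathematical alphanumeric symbols; plane 2 holds rarer CJK ideographs (Extension B and later), which matter for some names and place names. A small Python sketch that reports a character's plane; `plane_of` is a hypothetical name.

```python
import unicodedata


def plane_of(ch: str) -> int:
    """Unicode plane number: 0 is the BMP, 1 the SMP, 2 the SIP, and so on."""
    return ord(ch) >> 16  # each plane spans 0x10000 code points


for ch in ["A", "\U0001D11E", "\U0001F600", "\U00020000"]:
    print(f"U+{ord(ch):04X}", plane_of(ch), unicodedata.name(ch, "?"))
```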
...
When should I use the _TCHAR character types and the _T (_TEXT) and L macros? What is the difference between them?
...
Here's my description of Unicode. Please correct and comment.
Unicode separates the representation of a character from the mechanism of storing a character. This is different from ANSI, in which each character is represented by a single byte (or, in double-byte code pages such as Shift JIS, by one or two bytes).
An ANSI code page maps characters to byte representations. Unicode maps characters to code poin...
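The distinction can be seen directly: one character has one code point, but its stored bytes depend on the encoding. A Python sketch; `describe` is a hypothetical helper, and the code pages chosen are only examples.

```python
def describe(ch, encodings):
    """Map each encoding name to the bytes that store `ch` in it."""
    return {enc: ch.encode(enc) for enc in encodings}


ch = "é"  # code point U+00E9 regardless of encoding
print(f"code point: U+{ord(ch):04X}")
for enc, data in describe(ch, ["utf-8", "utf-16-le", "cp1252", "cp437"]).items():
    print(enc, data.hex())

# A double-byte code page needs two bytes for one character.
print(len("日".encode("shift_jis")))
```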
What is the "correct" way of comparing a code-point to a Java character? For example:
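String text = "\nabc"; // example input (hypothetical)
int codepoint = text.codePointAt(0); // codePointAt is an instance method, not static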
char token = '\n';
I know I can probably do:
if (codepoint==(int) token)
{ ... }
but this code looks fragile. Is there a formal API method for comparing codepoints to chars, or converting the char up to a cod...
require_once 'Zend/Pdf.php';
$pdf = new Zend_Pdf();
$page = $pdf->newPage(Zend_Pdf_Page::SIZE_A4);
$pdf->pages[] = $page;
$page->setFont(Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_HELVETICA), 10);
$page->drawText("Bogus Russian: это фигня", 100, 400, "UTF-8");
$pdfData = $pdf->render();
header("Content-Disposition: inline; filename=...
How do I convert a string that is in UCS-2 (2 bytes per character) into a UTF-8 string in Ruby?
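Since UCS-2 is a subset of UTF-16, the conversion is "decode as UTF-16, encode as UTF-8". The sketch below is in Python, assuming little-endian input without a BOM; `ucs2_to_utf8` is a hypothetical name. In Ruby 1.9 and later, `String#encode("UTF-8", "UTF-16LE")` follows the same idea.

```python
def ucs2_to_utf8(data: bytes, byteorder: str = "le") -> bytes:
    """Re-encode UCS-2 bytes (assumed BOM-less) as UTF-8 bytes."""
    codec = "utf-16-le" if byteorder == "le" else "utf-16-be"
    return data.decode(codec).encode("utf-8")


print(ucs2_to_utf8("Grüße".encode("utf-16-le")))  # b'Gr\xc3\xbc\xc3\x9fe'
```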
...
Hi,
I wrote a small Java application whose output includes Unicode characters. When I run it in Eclipse, I see all the output as expected.
The people who are supposed to use the application will run it as a JAR file. I thought they could use a standard cmd window, but in that window the Unicode characters appear as gibberish.
Is there a way t...