unicode

Force C# (.Net 3.5) to use ASCII

I'm working on an application in C#, and need to read and write from a particular datafile format. The only issue at the moment is that the format uses strictly single byte characters, and C# keeps trying to throw in Unicode when I use a writer and a char array (which doubles filesize, among other serious issues). I've been working on mo...

How to find out if Python is compiled with UCS-2 or UCS-4?

Just what the title says. $ ./configure --help | grep -i ucs --enable-unicode[=ucs[24]] Searching the official documentation, I found this: sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode cha...
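A minimal sketch of the sys.maxunicode check quoted in the excerpt. It assumes the usual reading that a value of 0xFFFF means a narrow (UCS-2) build and anything larger means a wide (UCS-4) build. The helper name unicode_build_width is made up for illustration. CPython 3.3 and later always report 0x10FFFF, so the distinction only matters on older interpreters.

```python
import sys


def unicode_build_width(maxunicode=sys.maxunicode):
    """Classify a build from its largest supported code point.

    Hypothetical helper: 0xFFFF is taken to mean a narrow (UCS-2) build,
    anything larger a wide (UCS-4) build.
    """
    return "UCS-4" if maxunicode > 0xFFFF else "UCS-2"


if __name__ == "__main__":
    print(hex(sys.maxunicode), unicode_build_width())
```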

How to convert a null-terminated string to an AnsiString?

I have some code that compiles fine with D7 but fails with D2010. Obviously it is a Unicode issue: The compile error is: E2251 Ambiguous overloaded call to 'StrPas' Here is the whole procedure: procedure GetVersionInfo; type PLangCharSetInfo = ^TLangCharSetInfo; TLangCharSetInfo = record Lang: Word; CharSet: Word; end; ...

How to send mail with binary word in mail subject using PHP

I am going to send mail through a PHP website. The client may customize the mail subject, and I will get the post data in UTF-8. But when I send out an HTML mail using the PHP mail() function, I found that the mail subject does not display properly while the mail body does. How do I send Chinese words with the PHP mail function? Thanks. ...

Mysql ASCII vs Unicode

Just a quick one: Will a SELECT ... WHERE name LIKE '...' query be faster if the name column is ASCII rather than UTF-8? Thanks! ...

Elegant way for handling this string issue. (Unicode-PAnsiString issue)

Consider the following scenario: type PStructureForSomeCDLL = ^TStructureForSomeCDLL; TStructureForSomeCDLL = record pName: PAnsiChar; end function FillStructureForDLL: PStructureForSomeDLL; begin New(Result); // Result.pName := PAnsiChar(SomeObject.SomeString); // Old D7 code working all right Result.pName := Utf8ToAnsi(UTF...

ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars

I am looking for an algorithm that can map between characters with diacritics (tilde, circumflex, caret, umlaut, caron) and their "simple" character. For example: ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n á --> a ä --> a ấ --> a ṏ --> o etc UPDATE 1) I want to do this in Java, although I suspect it should be something unicode-y a...
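One common approach to this mapping, sketched here in Python rather than the Java the asker wants (java.text.Normalizer provides the same decomposition step there): normalize to NFD, then drop the nonspacing combining marks. The function name strip_diacritics is hypothetical, and this is only a sketch of the decomposition technique, not a complete solution.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(strip_diacritics("\u0144 \u00f1 \u00e1 \u1ea5 \u1e4f"))
```

Letters such as ɲ, ƞ, ᶇ, ɳ and ȵ have no canonical decomposition, so they pass through this function unchanged. Mapping those to a plain n would need an explicit lookup table.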

How do I do a strtr on UTF-8 in PHP?

I'm looking for a UTF-8 compatible strtr for PHP. ...

Dummy's guide to Unicode

Could anyone give me concise definitions of Unicode, UTF7, UTF8, UTF16, UTF32 and Codepages, and explain how they differ from Ascii/Ansi/Windows 1252? I'm not after wikipedia links or incredible detail, just some brief information on how and why the huge variations in Unicode have come about and why you should care as a programmer. ...

Can Python encode a string to match ASP.NET membership provider's EncodePassword

I'm working on a Python script to create hashed strings from an existing system similar to that of ASP.NET's MembershipProvider. Using Python, is there a way to take a hexadecimal string and convert it back to a binary and then do a base64 encoding, somehow treating the original string as Unicode. Let's try some code. I'm looking to re...
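The hex-to-binary-to-base64 step described above can be sketched with the standard library alone. The helper name hex_to_base64 is made up for illustration, and this covers only the conversion, not the hashing scheme of the membership provider itself. If the original string has to be treated the way .NET's Encoding.Unicode treats it, that encoding is UTF-16 little-endian, which corresponds to "utf-16-le" in Python.

```python
import base64
import binascii


def hex_to_base64(hex_string):
    """Turn a hexadecimal string into raw bytes, then base64-encode them."""
    raw = binascii.unhexlify(hex_string)
    return base64.b64encode(raw).decode("ascii")


def to_dotnet_unicode_bytes(text):
    """Encode text as UTF-16LE, the byte layout of .NET's Encoding.Unicode."""
    return text.encode("utf-16-le")


if __name__ == "__main__":
    print(hex_to_base64("48656c6c6f"))
```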

A regular expression for \b

I am writing regular expressions for unicode text in Java. However, for the particular script that I am using - Devanagari (0900 - 097F) - there is a problem with word boundaries. \b matches characters which are dependent vowels (like 093E-094C) as they are treated like space characters. Example: Suppose I have the string: "कमल कमाल कम्हल क...

Charts with proper unicode support

I want to create simple charts like pies and bars in python. I tried out CairoPlot and pycha. Both look amazing, but they seem not to be able to handle unicode characters properly. CairoPlot.pie_plot(name='test.png', width=800, height=600, data={'eins':100, 'zwei':48, 'drei':90, 'vier':98,u'fünf':187}) result in f...

Finding type of break in icu::BreakIterator

I'm trying to understand how to use icu::BreakIterator to find specific words. For example, I have the following sentence: To be or not to be? That is the question... The word instance of the break iterator would put breaks here: |To| |be| |or| |not| |to| |be|?| |That| |is| |the| |question|.|.|.| Now, not every pair of break points is a...

Writing unicode strings via sys.stdout in Python

Assume for a moment that one cannot use print (and thus enjoy the benefit of automatic encoding detection). So that leaves us with sys.stdout. However, sys.stdout is so dumb as to not do any sensible encoding. Now one reads the Python wiki page PrintFails and goes to try out the following code: $ python -c 'import sys, codecs, locale; ...

Generate random UTF-8 string in Python

I'd like to test the Unicode handling of my code. Is there anything I can put in random.choice() to select from the entire Unicode range, preferably not an external module? Neither Google nor StackOverflow seems to have an answer. Edit: It looks like this is more complex than expected, so I'll rephrase the question - Is the following co...
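A rough sketch of one way to do this with only the standard library, assuming Python 3 (where chr accepts any code point up to 0x10FFFF). The name random_unicode_string is hypothetical. It draws code points uniformly from the whole range and skips the surrogate block, since surrogates cannot be encoded as UTF-8. Many of the resulting code points are unassigned, which may or may not matter for a given test.

```python
import random


def random_unicode_string(length, rng=random):
    """Build a string of `length` random code points, skipping surrogates."""
    chars = []
    while len(chars) < length:
        code_point = rng.randint(0, 0x10FFFF)
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates are not valid in UTF-8
        chars.append(chr(code_point))
    return "".join(chars)


if __name__ == "__main__":
    sample = random_unicode_string(10)
    print(len(sample.encode("utf-8")), "bytes of UTF-8")
```

Passing a seeded random.Random instance as rng makes a failing test case reproducible.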

Delphi, charset detection ([Uni]SynEdit) - Utf8Decode problem

I'm using Unicode SynEdit, which (in theory) has basic file/stream encoding detection. It worked fine until I tried opening the file which was generated by my PHP script. The file I'm talking about is detected by UniSynEdit as utf8 with no BOM. Unfortunately, it doesn't open - the loaded string is empty. I debugged it, and it seems that ...

For RegEx, what are the advantages/disadvantages of using a UTF-8 string instead of unicode?

Usually, the best practice in python, when using international languages, is to use unicode and to convert early any input to unicode and to convert late to a string encoding (UTF-8 most of the times). But when I need to do RegEx on unicode I don't find the process really friendly. For example, if I need to find the 'é' character follow...

c# How to process the string?

Hi! I connect to a webservice that gives me a response something like this (this is not the whole string, but you get the idea): sResponse = "{\"Name\":\" Bod\u00f8\",\"homePage\":\"http:\/\/www.example.com\"}"; As you can see, the "Bod\u00f8" is not as it should be. Therefore I tried to convert the unicode (\u00f8) to char by doing this...

How to get a CString object from a file with CFile::Read() in Unicode?

The charset is Unicode. I want to write a string of CString type into a file, and then read it out from the file afterwards. I write the string into a file with CFile::Write() method: int nLen = strSample.GetLength()*sizeof(TCHAR); file.Write(strSample.GetBuffer(), nLen); Here is the question: I want to obtain the CString from the fil...

Ñ not displayed in google app engine website

I'm using google app engine to build a website and I'm having problems with special characters. I think I've reduced the problem to these two code samples: request = urlfetch.fetch( url=self.WWW_INFO, payload=urllib.urlencode(inputs), method=urlfetch.POST, headers={'Content-Type': 'application/x-www-form-urlencode...