unicode

Unicode BOM for UTF-16LE vs UTF-32LE

It seems like there's an ambiguity between the byte order marks used for UTF-16LE and UTF-32LE. In particular, consider a file that contains the following 8 bytes: FF FE 00 00 00 00 00 00. How can I tell whether this file contains the UTF-16LE BOM (FF FE) followed by 3 null characters, or the UTF-32LE BOM (FF FE 00 00) followed by one ...
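
The ambiguity described in this question can be reproduced with a short Python sketch. It only uses the standard `codecs` module; the helper name `candidate_readings` is made up for illustration, and the sketch does not claim to resolve the ambiguity, only to show that both readings decode cleanly.

```python
import codecs

# The 8 bytes from the question.
data = bytes.fromhex("FF FE 00 00 00 00 00 00")

def candidate_readings(raw: bytes) -> dict:
    """Return every BOM-based reading that is consistent with the raw bytes.

    Hypothetical helper for illustration only.
    """
    readings = {}
    # The UTF-32LE BOM is FF FE 00 00, so it is checked first (longest match).
    if raw.startswith(codecs.BOM_UTF32_LE):
        body = raw[len(codecs.BOM_UTF32_LE):]
        readings["utf-32-le"] = body.decode("utf-32-le")
    # The UTF-16LE BOM is FF FE, a prefix of the UTF-32LE BOM.
    if raw.startswith(codecs.BOM_UTF16_LE):
        body = raw[len(codecs.BOM_UTF16_LE):]
        readings["utf-16-le"] = body.decode("utf-16-le")
    return readings

readings = candidate_readings(data)
for name, text in readings.items():
    print(name, repr(text))
```

Both entries are present in `readings`: one null character under UTF-32LE and three under UTF-16LE, so the bytes alone do not settle which encoding was meant.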

Give me an example of Unicode XML parsing using PugiXML

I have tried to find some code for this job in the tutorials and by googling, but had no luck. If someone has used PugiXml, could you please help me out? My main trouble is Unicode; otherwise the library is very easy to use. Thanks in advance. ...

MySQL and UTF-8

In MySQL, what is the difference between SET NAMES 'utf8' and SET CHARACTER SET 'utf8'? I've taken a look at the Connection Character Sets and Collations page of the MySQL documentation, but I'm still a bit confused. Do both commands need to be issued in order to make MySQL UTF-8 aware, or is SET NAMES enough? ...

How do I remove the warning "large integer implicitly truncated" for SQLite Unicode support?

I use the solution from http://ioannis.mpsounds.net/2007/12/19/sqlite-native-unicode-like-support/ for my POS app for the iPhone, and it works great. However, as noted in the comments: For instance, sqlite_unicode.c line 1861 contains integral constants that are greater than 0xffff but are declared as unsigned short. I wonder how I should cope with ...

Unicode to string conversion in Java

I am building a language, a toy language. The syntax \#0061 is supposed to convert the given Unicode code point to a character: String temp = yytext().substring(2); I then tried to append '\u' to the string and noticed that this generated an error. I also tried "\\" + "u" + temp; this does not do any conversion. I am basically trying...

Monospace Unicode font

Can anybody please tell me which monospace font covers the most Unicode characters? If there is none, then which monospace font covers most of the European language character sets? ...

What kind of text code is %62%69%73%68%6F%70?

On a specific webpage, when I hover over a link, I can see the text as "bishop" but when I copy-and-paste the link to TextPad, it shows up as "%62%69%73%68%6F%70". What kind of code is this, and how can I convert it into text? Thanks! ...
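
The text in this question has the shape of percent-encoding (URL encoding), where each `%XX` is the hexadecimal value of one byte. A minimal Python sketch, assuming the bytes are plain ASCII as they are here, decodes it with the standard library's `urllib.parse.unquote`; the variable names are illustrative only.

```python
from urllib.parse import unquote

encoded = "%62%69%73%68%6F%70"

# Standard-library decoding of percent-encoded text.
decoded = unquote(encoded)

# The same result by hand: drop the '%' signs and read the hex digits as bytes.
# This shortcut only works because every character in this sample is encoded.
manual = bytes.fromhex(encoded.replace("%", "")).decode("ascii")

print(decoded)
print(manual)
```

Both approaches give back the text that the browser shows on hover.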

Why Doesn't my Application Display Unicode?

I have created an MFC application from scratch, being careful from the start to use Unicode-aware structures such as CStringW, LPCWSTR, etc. to store and process data. Unicode is also defined in the project. Since I only speak one language, I tried the following test to ensure that a Unicode string was processed and stored correctly...

How do Java's 16-bit chars support Unicode?

Java's char is 16 bits, yet Unicode has far more characters. How does Java deal with that? ...
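
For context, a Java char holds one UTF-16 code unit, and code points above U+FFFF are stored as a pair of such units (a surrogate pair). The Python sketch below illustrates the UTF-16 encoding itself rather than any Java API; the helper name `utf16_code_units` is made up for this example.

```python
import struct

def utf16_code_units(text: str) -> list:
    """Split text into its 16-bit UTF-16 code units (hypothetical helper)."""
    raw = text.encode("utf-16-be")  # big-endian, no BOM
    return [unit for (unit,) in struct.iter_unpack(">H", raw)]

bmp_units = utf16_code_units("A")                    # U+0041, inside the BMP
supplementary_units = utf16_code_units("\U0001F600")  # U+1F600, outside the BMP

print([hex(u) for u in bmp_units])
print([hex(u) for u in supplementary_units])
```

The first string needs a single 16-bit unit, while the second needs two, which is the situation a 16-bit char type has to accommodate.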

Accented literals in Java

I tried to type char literals for accented vowels in Java, but the compiler says something like: unclosed character literal. This is what I'm trying to do: char [] a = {'à', 'á', 'â', 'ä' }; I've tried using Unicode escapes such as '\u00E0', but for some reason they don't match in my code: for( char c : string.toCharArray() ) { if( c == ...

Objective-C: Unicode Date Format

Hi guys, I am trying to work out the Unicode date format pattern for Sun, 03 May 2009 19:58:58 -0700, something like eee, dd MMM yyyy HH:mm:s ZZZZ. I can't seem to get this working precisely ...

How can I embed Chinese characters in my Perl source?

Hi guys, in my script I need to "qw" some Chinese characters into a string. When I run the script, Perl complains that there is an unrecognized character in the script. I know it must be related to encoding, but I don't know how to solve it, so I turn to you for help. Thanks in advance. ...

Open source, multi-language / Unicode fonts?

Hi, we need a selection of fonts that we can use in PDFs. Our PDF library only works with TrueType fonts, and we want these fonts to be as multi-language friendly as possible, i.e. Chinese/Japanese support ideally, or at the very least a wide range of European characters. http://stackoverflow.com/questions/458566/unicode-implementation-...

Reading a Unicode string from the registry

I'm using CodeGear C++Builder 2007. I'm trying to read a string value containing a path from the registry. This path can contain Unicode characters, for example Russian. I have added a string value with regedit and verified by exporting that the value really contains the expected Unicode characters. The result in S1, S2 and S3 below all cont...

Diacritic signs

Hi, how should I write "mąka" in Python without an exception? I've tried var = u"mąka", var = unicode("mąka"), etc. Nothing helps :/ ...

Is it safe to use non-ASCII Unicode characters like ♺ in web sites?

Taking into account web browsers, operating systems, iPhones, BlackBerries, etc. ...

Unicode-aware CSV parser in Java

I'm looking for a Java implementation of a CSV (comma-separated values) parser with proper handling of Unicode data, e.g. UTF-8 CSV files with Chinese text. I suppose such a parser should internally use code-point-related methods while iterating, comparing, etc. An Apache 2 license or similar would work best. ...

Unicode Strings in Ruby 1.9

I've written a Ruby script that reads a file (File.read()) that contains Unicode characters, and it works fine from the command line. However, when I try to put it into an Automator Workflow (Mac OS X), I get this error: 2009-12-23 17:55:15 -0500: /Users/jeffreyaylesworth/bin/symbols:19:in `split': invalid byte sequence in US-ASCI...

How do I return a Unicode value in this SQL example?

I need to return Russian text from a SQL Server 2005 database table. In the following example, which is a simple way of describing my dilemma, the @Test variable prints out question marks: DECLARE @Test nvarchar(max) SET @Test = 'Баннер' PRINT @Test (Note that the @Test value is Russian text, for those who don't have the font ins...

How to detect locale/language if the locale doesn't have a codepage?

Hi, I need to detect the language from a Unicode wide string. I have tried using the iMultiLang2 interface, and that works properly if the locale has a codepage. Some locales/languages do not have codepages and are mapped to Unicode only. How can I get the LCID for those? Georgian, Hindi and many other languages do not have codepages and ...