ucs2

C++ strings: UTF-8 or 16-bit encoding?

I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string, with additional UTF-8-specific functions when necessary) or a 16-bit string type (implemented as std::wstring). The project is a programming language and environment (like VB, it is a combination of both). There are a few wish...
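A small sketch of the trade-off, written in Python 3.3+ rather than C++ purely for brevity. It counts code points, UTF-8 bytes and UTF-16 code units for a few sample strings. The function name encoded_sizes is illustrative, not something from the question. The point it shows is that neither encoding is fixed-width once characters outside the Basic Multilingual Plane appear: UTF-8 needs up to 4 bytes, and UTF-16 needs a surrogate pair (2 units).

```python
def encoded_sizes(text: str) -> dict:
    """Return the code points, UTF-8 bytes and UTF-16 code units needed for text."""
    return {
        "code_points": len(text),                           # Python 3.3+ str length = code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # '-le' variant emits no BOM
    }

# ASCII, a CJK character inside the BMP, and U+1D11E (outside the BMP)
for sample in ["Hello", "\u4e8c", "\U0001D11E"]:
    print(ascii(sample), encoded_sizes(sample))
```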

Storing UTF-16/Unicode data in SQL Server

According to this, SQL Server 2K5 uses UCS-2 internally. It can store UTF-16 data in UCS-2 (with the appropriate data types, such as nchar), but a supplementary character is stored as two UCS-2 characters. This causes obvious issues with the string functions, namely that what is one character is treated as 2 by SQL Serve...
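To illustrate why a supplementary character occupies two UCS-2 slots, here is a Python sketch of the standard UTF-16 surrogate-pair arithmetic, plus a hypothetical helper ucs2_length that counts 16-bit units the way a UCS-2-based length function would. It is not SQL Server code; it only shows where the "one character counted as 2" comes from.

```python
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low

def ucs2_length(text: str) -> int:
    """Count 16-bit units, i.e. what a UCS-2-based length function would report."""
    return len(text.encode("utf-16-le")) // 2

high, low = to_surrogate_pair(0x1D11E)   # U+1D11E MUSICAL SYMBOL G CLEF
print(hex(high), hex(low))               # 0xd834 0xdd1e
print(ucs2_length("\U0001D11E"))         # 2
```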

Converting a UCS2 string into UTF8 in Ruby

How do I convert a string that is in UCS-2 (2 bytes per character) into a UTF-8 string in Ruby? ...
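The conversion itself is language-independent, so here is a minimal Python sketch (not Ruby), assuming the input has no BOM and a known byte order. Because UCS-2 is a subset of UTF-16, a UTF-16 decoder reads it unchanged, and the resulting text is re-encoded as UTF-8. The helper name ucs2_to_utf8 is hypothetical.

```python
def ucs2_to_utf8(data: bytes, byteorder: str = "little") -> bytes:
    """Convert UCS-2 bytes (2 bytes per character, no BOM) to UTF-8 bytes."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must contain an even number of bytes")
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    # UCS-2 is a subset of UTF-16, so a UTF-16 decoder reads valid UCS-2 unchanged.
    return data.decode(codec).encode("utf-8")

print(ucs2_to_utf8(b"H\x00i\x00"))  # b'Hi'
```

One caveat of this shortcut: a UTF-16 decoder also accepts surrogate pairs, which strict UCS-2 would reject.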

UCS-2LE text file parsing

I have a text file that was created by a Microsoft reporting tool. The file begins with the BOM (bytes FF FE), followed by ASCII characters with nulls between them (i.e. "F.i.e.l.d.1."). I can use iconv to convert this to UTF-8, using UCS-2LE as the input format and UTF-8 as the output format, and it works great. My pr...
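A Python sketch of the same conversion without iconv, assuming the file is UCS-2LE and may begin with the FF FE BOM: read the raw bytes, drop the BOM if present, and decode the rest as UTF-16LE. read_ucs2le_text is a hypothetical helper, and the usage lines create a throwaway file only to demonstrate it. Writing the returned text out with encoding="utf-8" would complete the iconv equivalent.

```python
import codecs
import os
import tempfile

def read_ucs2le_text(path: str) -> str:
    """Read a UCS-2LE text file that may start with the byte-order mark FF FE."""
    with open(path, "rb") as f:
        raw = f.read()
    # codecs.BOM_UTF16_LE is b'\xff\xfe'; strip it so it does not end up in the text.
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]
    return raw.decode("utf-16-le")

# Usage: write a throwaway file with a BOM followed by "Field1" in UTF-16LE.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF16_LE + "Field1".encode("utf-16-le"))
print(read_ucs2le_text(tmp.name))  # Field1
os.remove(tmp.name)
```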

How to find out if Python is compiled with UCS-2 or UCS-4?

Just what the title says. Running ./configure --help | grep -i ucs prints --enable-unicode[=ucs[24]]. Searching the official documentation, I found this: sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode cha...
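A sketch of the check based on sys.maxunicode: narrow (UCS-2) builds report 0xFFFF and wide (UCS-4) builds report 0x10FFFF. Since Python 3.3 (PEP 393) the internal representation is chosen per string and sys.maxunicode is always 0x10FFFF, so the distinction only matters for older interpreters. The code avoids newer syntax so it also runs under Python 2; unicode_build is a hypothetical name.

```python
import sys

def unicode_build():
    """Return which internal Unicode width this interpreter was built with."""
    # Narrow (UCS-2) builds cap code points at U+FFFF; wide (UCS-4) builds at U+10FFFF.
    if sys.maxunicode == 0xFFFF:
        return "UCS-2 (narrow build)"
    return "UCS-4 (wide build)"

print("%d %s" % (sys.maxunicode, unicode_build()))
```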

UCS2/HexEncoded characters

Can anyone help me? How can I get UCS-2/hex-encoded characters, so that 'Hello' returns "00480065006C006C006F"? These are the hex-encoded values: 0048 = H, 0065 = e, 006C = l, 006C = l, 006F = o. Likewise, Arabic text (!مرحبا عالم) should return 06450631062d0628064b06270020063906270644064500200021. How can I get the UCS-2 encoding in PHP? ...
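A Python sketch of the same encoding. The question asks about PHP, where mb_convert_encoding to 'UCS-2BE' followed by bin2hex would presumably be the analogue. The sketch encodes the text as big-endian UTF-16, which is identical to UCS-2 for characters up to U+FFFF, and rejects anything beyond that. to_ucs2_hex is a hypothetical name.

```python
def to_ucs2_hex(text: str) -> str:
    """Encode text as big-endian UCS-2 and return uppercase hex, 4 digits per character."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    return text.encode("utf-16-be").hex().upper()

print(to_ucs2_hex("Hello"))  # 00480065006C006C006F
```

The output uses uppercase digits; the Arabic example in the question uses lowercase, which denotes the same values.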

UCS2/HexEncoded characters to UTF8 in php

Hi guys, I previously asked how to get a UCS-2/hex-encoded string from UTF-8 and got some help at the following link: UCS2/HexEncoded characters. Now I need to get the correct UTF-8 back from a UCS-2/hex-encoded string in PHP. For example, 00480065006C006C006F should return 'Hello', and 06450631062d0628...
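The reverse direction, again sketched in Python with a hypothetical helper: parse the hex digits into bytes and decode them as big-endian UTF-16. In PHP, hex2bin combined with mb_convert_encoding from 'UCS-2BE' to 'UTF-8' would presumably do the same. The Python function returns text; calling .encode("utf-8") on it gives the UTF-8 bytes.

```python
def from_ucs2_hex(hex_string: str) -> str:
    """Decode hex digits (4 per big-endian UCS-2 code unit) back into text."""
    if len(hex_string) % 4:
        raise ValueError("expected 4 hex digits per character")
    return bytes.fromhex(hex_string).decode("utf-16-be")

text = from_ucs2_hex("00480065006C006C006F")
print(text)                   # Hello
print(text.encode("utf-8"))   # b'Hello'
```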

How to calculate the length of a UCS-2 string and its size in C++?

I have a string in UCS-2 encoding, and I need to copy it to another UCS-2 string. Before copying, I need to calculate the length of the UCS-2 string for memory allocation. How do I calculate the length of a UCS-2 string? ...
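A Python sketch of the usual approach, assuming the UCS-2 data is a null-terminated buffer of 16-bit units (little-endian by default): count units until the first 0x0000, then allocate (length + 1) * 2 bytes so the terminator fits. In C++ the same loop would run over a 16-bit integer type; ucs2_strlen and ucs2_size_in_bytes are hypothetical names.

```python
import struct

def ucs2_strlen(buf: bytes, byteorder: str = "<") -> int:
    """Count 16-bit units before the first 0x0000 terminator (a 2-byte analogue of wcslen)."""
    fmt = byteorder + "H"            # '<H' little-endian, '>H' big-endian unsigned 16-bit
    length = 0
    for offset in range(0, len(buf) - 1, 2):
        (unit,) = struct.unpack_from(fmt, buf, offset)
        if unit == 0:
            break
        length += 1
    return length

def ucs2_size_in_bytes(length: int) -> int:
    """Bytes needed for a copy: 2 per unit plus 2 for the terminating 0x0000."""
    return (length + 1) * 2

buf = "Hi".encode("utf-16-le") + b"\x00\x00" + b"junk"
n = ucs2_strlen(buf)
print(n, ucs2_size_in_bytes(n))   # 2 6
```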

How NOT to skip BOM info (FF FE) when using fread or ifstream?

I'm trying to use fread/ifstream to read the first 2 bytes of a .csv file with BOM info. But the following code always skips the first two bytes (which are 'FF FE'): ifstream is; is.open(fn, ios::binary); char buf[2]; is.read(buf, 2); is.close(); Using FILE*/fread does no better. ...
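For comparison, a Python sketch that opens the file in binary mode, reads the first few bytes and reports which byte-order mark (if any) is present. In binary mode Python performs no decoding, so the BOM bytes come back untouched. sniff_bom and the labels it returns are illustrative only.

```python
import codecs
import os
import tempfile

# Check longer BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE / UCS-2LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE / UCS-2BE"),
]

def sniff_bom(path):
    """Read the first bytes in binary mode (nothing is decoded or skipped) and name the BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, head[:len(bom)]
    return None, b""

# Usage: a UTF-16LE BOM followed by "A" in UTF-16LE.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff\xfe" + "A".encode("utf-16-le"))
print(sniff_bom(tmp.name))   # ('UTF-16LE / UCS-2LE', b'\xff\xfe')
os.remove(tmp.name)
```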

2-byte (UCS-2) wide strings under GCC

Hi all, when porting my Visual C++ project to GCC, I found out that the wchar_t datatype is 4-byte UTF-32 by default. I could override that with a compiler option, but then the whole wcs* (wcslen, wcscmp, etc.) part of the RTL is rendered unusable, since it assumes 4-byte wide strings. For now, I've reimplemented 5-6 of these functions fro...

How to set UCS2 in numpy?

I'm trying to build numpy 1.2.1 as a module for a third-party Python interpreter (custom-built, Python 2.4, Linux x86_64) so that I can make calls to numpy from within it. Let's call this one interpreter A. The thing is, the system-wide Python interpreter from the vendor (also Python 2.4, let's call it B) is built with --enable-unicode=ucs4, while...

How can SELECT HEX(CHAR(0x4E8C USING ucs2)) return '4E01' instead of '4E8C' ?

I converted a kanji column in my database to UCS-2 codes with this, and it works: SELECT hex(convert('二' using ucs2)); => 0x4E8C, i.e. &#x4E8C;, Unicode code point 20108. But if I want to convert my SQL results back to kanji, I get the wrong character: SELECT CHAR(0x4E8C USING ucs2); returns 丁, which has code point 0x4E01 Inste...