utf

UTF usage in C++ code

What is the difference between UTF and UCS? What are the best ways to represent non-European character sets (using UTF) in C++ strings? I would like to know your recommendations for: internal representation inside the code; string manipulation at run-time; using the string for display purposes; best storage representation (i.e...

How do I implement a custom code page used by a serial device so I can convert text to it in Python?

I have a scrolling LED sign that takes messages in either ASCII or (using some specific code) characters from a custom code page. For example, the euro sign should be sent as <U00> and ä is <U64>. (You can find the full code page in the documentation.) My question is, what is the most Pythonic way to implement this custom code page...
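One possible shape for this, as a minimal sketch only: a plain lookup table plus a small encoder function. The table below holds just the two mappings quoted in the question; `SIGN_CODE_PAGE` and `encode_for_sign` are hypothetical names, and the pass-through rule for printable ASCII is an assumption about the device.

```python
# Hypothetical partial table: only the two entries quoted in the question.
# The remaining entries would have to come from the sign's documentation.
SIGN_CODE_PAGE = {
    "€": "<U00>",
    "ä": "<U64>",
}


def encode_for_sign(text: str) -> str:
    """Translate text into the sign's message format (sketch)."""
    out = []
    for ch in text:
        if ch in SIGN_CODE_PAGE:
            # Character has an entry in the custom code page.
            out.append(SIGN_CODE_PAGE[ch])
        elif ch.isascii() and ch.isprintable():
            # Assumption: printable ASCII is sent unchanged.
            out.append(ch)
        else:
            raise ValueError(f"no mapping for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)


print(encode_for_sign("5 € for ä"))  # prints: 5 <U00> for <U64>
```

Raising on unmapped characters makes gaps in the table visible early. The sketch does not deal with a literal `<` in the input, which the device might confuse with the start of a code.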

How to convert (not necessarily programmatically) between Windows' wchar_t and GCC/Linux one?

Suppose I have these Windows wchar_t strings: L"\x4f60\x597d" and L"\x00e4\x00a0\x597d" and would like to convert them (not necessarily programmatically; it will be a one-time thing) to GCC/Linux wchar_t format, which is UTF-32 AFAIK. How do I do it? (A general explanation would be nice, but an example based on this concrete case would be...
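For a one-time conversion like this, a short script can decode the UTF-16 code units and print the resulting code points. A minimal sketch in Python (the function name `utf16_units_to_code_points` is made up for illustration):

```python
import struct


def utf16_units_to_code_points(units):
    """Decode a sequence of UTF-16 code units into Unicode code points."""
    # Pack the 16-bit units as little-endian bytes, then let the codec
    # combine any surrogate pairs into single code points.
    raw = struct.pack(f"<{len(units)}H", *units)
    return [ord(ch) for ch in raw.decode("utf-16-le")]


for units in ([0x4F60, 0x597D], [0x00E4, 0x00A0, 0x597D]):
    points = utf16_units_to_code_points(units)
    print(", ".join(f"0x{p:08x}" for p in points))
# prints:
# 0x00004f60, 0x0000597d
# 0x000000e4, 0x000000a0, 0x0000597d
```

Every character in the two example strings lies in the Basic Multilingual Plane, so the code point values match the UTF-16 code units and only the storage width changes. The values would differ only for surrogate pairs.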

How do I check that string has only international letters and spaces in UTF8 in PHP?

In Python I could convert it to Unicode and do a '(?u)^[\w ]+$' regex search, but PHP doesn't seem to understand international \w, or does it? ...
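For reference, the Python approach mentioned above might look like the sketch below. In Python 3, `str` patterns are Unicode-aware by default, so the `(?u)` flag is no longer needed. Since `\w` also matches digits and the underscore, this sketch narrows it to letters with `[^\W\d_]`; the function name is hypothetical.

```python
import re

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# which is a reasonable approximation of "a letter in any script".
LETTERS_AND_SPACES = re.compile(r"(?:[^\W\d_]| )+")


def only_letters_and_spaces(text: str) -> bool:
    """Return True if text is non-empty and holds only letters and spaces."""
    return LETTERS_AND_SPACES.fullmatch(text) is not None


print(only_letters_and_spaces("héllo wörld"))  # prints: True
print(only_letters_and_spaces("abc 123"))      # prints: False
```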

findstr or grep that autodetects character encoding (UTF-16)

I want to do this: findstr /s /c:some-symbol * or the grep equivalent grep -R some-symbol * but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte-order mark FFFE in them, so I'm not even looking for heroic autodetection. Any suggestions? Thanks,...
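If no existing tool fits, BOM-based detection is small enough to script. A rough sketch in Python, assuming every file of interest starts with a byte-order mark and falling back to UTF-8 otherwise (`decode_with_bom` and `matching_lines` are illustrative names, not an existing utility):

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes using the encoding named by a leading BOM, if any."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding, errors="replace")
    # Assumption: files without a BOM are treated as the fallback encoding.
    return data.decode(fallback, errors="replace")


def matching_lines(data: bytes, needle: str):
    """Return the decoded lines that contain needle."""
    return [line for line in decode_with_bom(data).splitlines() if needle in line]
```

To search a directory tree, `matching_lines(path.read_bytes(), "some-symbol")` could be called for each `path` yielded by `pathlib.Path(".").rglob("*")` that is a regular file.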

JSP UTF encoding

I'm having a hard time figuring out how to handle this problem: I'm developing a web tool for an Italian university, and I have to display words with accents (such as è, ù, ...); sometimes I get these words from a PostgreSQL table (UTF8-encoded), but mostly I have to read long passages from a file. These files are encoded as UTF-8 XML, ...

Character encoding seems to work on a MAMP server but not on a WAMP server?

Hi, I've been working on a web application that should be able to accept tags and search queries in multiple languages. That's not asking too much, is it? Now, on my development MAMP server everything is great. I add multilingual tags, search in any language I want etc. On the other hand, on the production WAMP server, multilingual characte...

Do I need the supplementary planes?

Hi, I think the question is pretty simple: do I need all the rest of the stuff in Unicode beyond the Basic Multilingual Plane? What kind of stuff is included, and is it really needed (and for what purposes)? Thanks. ...

What are the limitations of primitive character types in D?

I am currently exploring the specification of the Digital Mars D language, and am having a little trouble understanding the complete nature of the primitive character types. The book Learn to Tango With D is similarly vague on the capabilities and limitations of the language in this area. The types are given on the website as: char; ...

Strings in Erlang - what libraries and techniques should I be examining?

I am working on a project that will require internationalisation support down the track. I want to get started on the right foot with UTF support, and I was wondering what the best practice for handling UTF in Erlang is. From my current research it seems there are a couple of issues with Erlang's built-in string handling for some use ca...

How to get boost wdirectory_iterator to return UTF32 on the Mac

directory_iterator returns UTF8 using both Visual Studio and Xcode as expected. wdirectory_iterator, however, returns UTF16 using Visual Studio, and UTF8 using Xcode, despite returning a wchar_t string. What can I change to get wdirectory_iterator to return UTF32? An answer to a question I asked previously suggests that changing the l...

Special characters are encoded differently by Firefox and IE

Hi guys, I've got a multilingual site that allows users to input text to search a form field, but the text goes through JavaScript before heading off to the backend. Special characters like "欢" are handled properly in Firefox, but not in any version of IE. Can someone help me understand what's going on? Thanks! ...

ISO-8859-1 vs UTF-8 ?

Which should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still have importance in specific conditions? Is the character set related to geographic region? Edit: Is there any benefit to putting this code @charset "utf-8"; or something like this <link type="text/css; charset=utf-8" rel="stylesheet" href=".." /> at the t...

PHP MySQL database strange characters

Hello, I'm trying to output product information stored in a MySQL database, but it's writing out some strange characters, like a diamond with a question mark inside of it. I think it may be an encoding/UTF8 issue, but I've specified the encoding I want: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> Is this rig...

XML UTF-16 issue

Hi, I am receiving an XML document through an HTTP request. The format is like this "<.?.x.m.l. .v.e.r.s.i.o.n.=.\".1...0.\". .e.n.c.o.d.i.n.g.=.\".u.t.f.-.1.6.\".?.>| etc Then I'm getting an error: {"Name cannot begin with the '.' character, hexadecimal value 0x2E. Line 1, position 2."} Trying to convert it to ASCII like this doesn't solve the ...
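A dump with a filler character after every ASCII character is typical of UTF-16 bytes being read one byte at a time. The sketch below reproduces that pattern in Python, assuming the dots in the dump stand for NUL bytes and the payload is little-endian UTF-16; the sample document is made up.

```python
# Made-up sample document, encoded the way the dump suggests.
payload = '<?xml version="1.0" encoding="utf-16"?><greeting>ciao</greeting>'.encode("utf-16-le")

# Reading the bytes one at a time shows a NUL after every ASCII character.
byte_view = payload.decode("latin-1").replace("\x00", ".")
print(byte_view[:20])  # prints: <.?.x.m.l. .v.e.r.s.

# Decoding the same bytes as UTF-16 recovers the real text.
text = payload.decode("utf-16-le")
print(text[:19])       # prints: <?xml version="1.0"
```

Converting such data to ASCII does not help, because the interleaved bytes are still there. Decoding the raw bytes as UTF-16 before they reach the XML parser is the usual direction to look in.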

Silverlight to Javascript interop UTF encoding/decoding

How do I get both alerts, one invoked from Silverlight and the other invoked from JavaScript, to show the same data in the same way? E.g. ���� != ýÿýÿý System.Windows.Browser.HtmlPage.Window.Alert( data ); alert(parameters); Silverlight 3 code, sending data to a JavaScript function: System.Windows.Browser.HtmlPage.Window....

Char to UTF code in VBScript

I'd like to create a .properties file to be used in a Java program from a VBScript. I'm going to use some strings in languages that use characters outside the ASCII range, so I need to replace these characters with their UTF codes. This would be \u0061 for a, \u0062 for b and so on. Is there a way to get the UTF code for a char in VBScript? ...
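The escaping rule itself is language-independent: each character outside ASCII becomes `\uXXXX`, with one escape per UTF-16 code unit. A minimal sketch of that rule, written in Python rather than VBScript purely for illustration (the function name is hypothetical):

```python
def to_properties_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes (sketch)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        # Java escapes are UTF-16 code units, so a character outside the
        # Basic Multilingual Plane becomes two escapes (a surrogate pair).
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(to_properties_escapes("año"))  # prints: a\u00f1o
```

The sketch leaves ASCII untouched, so it does not escape backslashes or other characters that have a special meaning in .properties files.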

Create a file in Java for loading into an nvarchar field in SQLServer 2005 using BCP and UTF-16

Hi All, I want to use BCP to load into a SQL Server 2005 table with an nvarchar field using a loader control file. As I understand it, SQL Server 2005 only supports UTF-16 (and I believe it is UTF-16 LE). The file is being output by a Java program. The way I have it currently set up is as follows: An XML format BCP loader file (cre...

Parse UTF codes in VBScript

Is there a way to parse UTF codes in VBScript? What I'd like to do is replace all codes like "\u00f1" in a string with their corresponding characters. ...
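The general technique is a regular-expression replace with a callback that turns the four hex digits into a character. A sketch of the idea in Python, not VBScript, with a hypothetical function name:

```python
import re

# Matches a backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def from_unicode_escapes(text: str) -> str:
    """Replace \\uXXXX escapes with the characters they name (sketch)."""
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Escaped surrogate pairs arrive as two separate code units;
    # a round trip through UTF-16 joins them into one character.
    return decoded.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(from_unicode_escapes(r"ma\u00f1ana"))  # prints: mañana
```

The sketch does not treat a doubled backslash (`\\u00f1`) as an escaped backslash, and an unpaired surrogate escape makes the final decode raise an error.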

Windows game: UTF-8, UTF-16, DirectX and Lua

I'm developing a game for Windows for learning purposes (I'm learning DirectX). I would like it to have UTF support. Reading this question I learned that Windows uses wchar_t, which is UTF-16. I want my game to have Lua scripting support, and Lua doesn't really like Unicode much. It simply treats strings as a "stream of bytes"; this wo...