unicode

Is there a way to see if a character is using 1 or 2 bytes in Delphi 2009?

Hi, Delphi 2009 has changed its string type to use 2 bytes to represent a character, which allows support for Unicode character sets. The number of bytes a string's characters occupy is now Length(S) * SizeOf(Char), with SizeOf(Char) currently being 2. What I am interested in is whether anyone knows of a way which on a character by character basis it ...
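
In Delphi 2009 every Char is a 2-byte UTF-16 code unit, so the useful per-character question is usually whether a character needs one code unit or two (a surrogate pair, used for code points above U+FFFF). The sketch below is Python rather than Delphi and only illustrates that UTF-16 rule; the helper names utf16_units and is_surrogate_pair are hypothetical.

```python
def utf16_units(ch: str) -> int:
    """Return how many UTF-16 code units (2 bytes each) one code point needs."""
    # Encode without a BOM so only the character's own bytes are counted.
    return len(ch.encode("utf-16-le")) // 2


def is_surrogate_pair(ch: str) -> bool:
    """True if the code point lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF


# U+1D11E (MUSICAL SYMBOL G CLEF) is outside the BMP.
for ch in ["A", "\u00e9", "\u3042", "\U0001D11E"]:
    units = utf16_units(ch)
    print(f"U+{ord(ch):04X}: {units} unit(s), {units * 2} bytes")
```

Anything that reports two units is stored as a high/low surrogate pair, i.e. it occupies two Char positions in a Delphi string.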

Unicode issues with acts_as_taggable_on_steroids

I'm implementing a blog with tags, some of which contain French characters. My question has to do with how to deal with spaces and Unicode (UTF-8) characters in the URL. Let's say I have a tag called ohlàlà! and I have the following code in my tag cloud: <%= link_to h(tag.name.capitalize), { :controller => :blog, :action => :tag, :id => h(tag.name...
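
Whatever the Rails side looks like, a tag such as ohlàlà! has to reach the browser percent-encoded as UTF-8 bytes. This Python sketch shows only that encoding step, not the Rails helpers, which may escape slightly differently.

```python
from urllib.parse import quote, unquote

tag = "ohl\u00e0l\u00e0!"  # "ohlàlà!"

# quote() encodes the text as UTF-8 and percent-escapes every byte outside
# the always-safe set (ASCII letters, digits and "_.-~") plus "/".
slug = quote(tag)
print(slug)           # ohl%C3%A0l%C3%A0%21
print(unquote(slug))  # ohlàlà!

# A space becomes %20 with quote(); quote_plus() would produce "+" instead.
print(quote("mon tag"))  # mon%20tag
```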

How to check if a string in Python is in ASCII?

Hello, I am struggling to understand how to check in Python whether a string is ASCII or not. I am aware of ord(); however, when I try ord('é'), I get TypeError: ord() expected a character, but string of length 2 found. I understood it is caused by the way I built Python (as explained in the documentation for ord()). So my questio...
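
A minimal sketch, assuming Python 3, where 'é' is a one-character str and ord('é') returns 233. The length-2 error usually means the literal is a UTF-8 byte string (two bytes) rather than a Unicode string, which was common in Python 2. The helper name is_ascii is hypothetical.

```python
def is_ascii(s: str) -> bool:
    """Return True if every code point in s is below 128."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii("hello"))        # True
print(is_ascii("h\u00e9llo"))   # False
# Python 3.7+ also offers str.isascii(), which performs the same check.
print("h\u00e9llo".isascii())   # False
```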

Dealing with a string containing multiple character encodings.

I'm not exactly sure how to ask this question, and I'm nowhere near finding an answer, so I hope someone can help me. I'm writing a Python app that connects to a remote host and receives back byte data, which I unpack using Python's built-in struct module. My problem is with the strings, as they include multiple character e...
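
The usual approach is to keep the raw bytes from struct and decode each string field with the encoding the protocol specifies for it, rather than decoding the whole packet at once. In this Python sketch the packet layout (a 2-byte little-endian length prefix before each string, one UTF-8 field followed by one UTF-16-LE field) is a hypothetical example, and read_string is a made-up helper.

```python
import struct
from typing import Tuple


def read_string(data: bytes, offset: int, encoding: str) -> Tuple[str, int]:
    """Read a 2-byte little-endian length prefix, then decode that many bytes.

    Returns the decoded string and the offset just past it.
    """
    (length,) = struct.unpack_from("<H", data, offset)
    offset += 2
    raw = data[offset:offset + length]
    return raw.decode(encoding), offset + length


# Build a sample packet with two fields in different encodings.
name = "Zo\u00eb".encode("utf-8")            # "Zoë"
title = "\u6771\u4eac".encode("utf-16-le")   # "東京"
packet = (struct.pack("<H", len(name)) + name
          + struct.pack("<H", len(title)) + title)

first, pos = read_string(packet, 0, "utf-8")
second, pos = read_string(packet, pos, "utf-16-le")
print(first, second)
```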

PHP function imagettftext() and unicode

I'm using the PHP function imagettftext() to convert text into a GIF image. The text I am converting has Unicode characters including Japanese. Everything works fine on my local machine (Ubuntu 7.10), but on my webhost server, the Japanese characters are mangled. What could be causing the difference? Everything should be encoded as UTF-8...

UTF usage in C++ code

What is the difference between UTF and UCS? What are the best ways to represent non-European character sets (using UTF) in C++ strings? I would like to know your recommendations for: the internal representation inside the code; string manipulation at run time; using the string for display purposes; the best storage representation (i.e...
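
UTF-8, UTF-16 and UTF-32 are different encodings of the same Unicode code points. UCS-2 is the older fixed-width 2-byte form, which covers only the Basic Multilingual Plane (U+0000 to U+FFFF), and UCS-4 is essentially UTF-32. This Python sketch only compares encoded sizes for a sample string; it does not settle which representation suits a given C++ program.

```python
# Three CJK characters, a space, and U+1D11E, which lies outside the BMP.
text = "\u65e5\u672c\u8a9e \U0001D11E"

sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
for enc, n in sizes.items():
    print(f"{enc:10} {n:3} bytes")

# UCS-2 has no way to represent code points above U+FFFF;
# UTF-16 stores them as a surrogate pair (two 16-bit units).
outside_bmp = [c for c in text if ord(c) > 0xFFFF]
print("needs a surrogate pair in UTF-16:", [f"U+{ord(c):X}" for c in outside_bmp])
```

CJK-heavy text is often smaller in UTF-16 than in UTF-8, while mostly-ASCII text is smaller in UTF-8; that trade-off is one input to the storage question.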

How do I set the byte order mark for Unicode files?

I know this is not a "real" programming question, but it relates to programming, so I am going to ask it anyway. I have a program that I need to test which reads the byte order mark (BOM) of a file to see if it is UTF-8 or UTF-16. My problem is that I cannot find a program or text editor that will let me set the byte order mark. Can anyb...
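
Instead of relying on an editor, test files with and without a BOM can be generated from a script. A minimal Python 3 sketch; the file names and the detect_bom helper are hypothetical. Python's "utf-8-sig" codec writes a UTF-8 BOM, "utf-16" writes a BOM in the machine's native byte order, and plain "utf-8" writes none.

```python
import codecs
import os
import tempfile

# UTF-32 BOMs are checked first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path):
    """Return the encoding named by the file's BOM, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


tmp = tempfile.mkdtemp()
results = {}
for enc in ("utf-8-sig", "utf-16", "utf-8"):
    path = os.path.join(tmp, f"test-{enc}.txt")
    with open(path, "w", encoding=enc) as f:
        f.write("h\u00e9llo")
    results[enc] = detect_bom(path)
print(results)
```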

How to use special characters in Java/Eclipse

How can I use/display characters like ♥, ♦, ♣, or ♠ in Java/Eclipse? When I try to use them directly, i.e. in the source code, Eclipse cannot save the file. What can I do? Edit: How can I find the Unicode escape sequence? ...
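
A Java \u escape is just the character's code point in four hex digits, so it can be computed from ord(). This Python sketch (java_escape is a hypothetical helper) prints the escapes; characters above U+FFFF are written as two \u escapes, because Java strings are UTF-16.

```python
def java_escape(text: str) -> str:
    """Turn non-ASCII characters into Java-style \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Outside the BMP: emit the UTF-16 surrogate pair as two escapes.
            units = ch.encode("utf-16-be")
            hi = int.from_bytes(units[:2], "big")
            lo = int.from_bytes(units[2:], "big")
            out.append(f"\\u{hi:04X}\\u{lo:04X}")
    return "".join(out)


print(java_escape("\u2665 \u2666 \u2663 \u2660"))  # \u2665 \u2666 \u2663 \u2660
```

In Java source, "\u2665" then yields ♥ regardless of the file's encoding.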

How do I match only fully-composed characters in a Unicode string in Perl?

I'm looking for a way to match only fully composed characters in a Unicode string. Is [:print:] dependent upon locale in any regular expression implementation that incorporates this character class? For example, will it match the Japanese character 'あ', since it is not a control character, or is [:print:] always going to be ASCII codes 0x20...
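
In Perl itself, \X matches one extended grapheme cluster (a base character plus any combining marks). As a language-neutral illustration of "fully composed", this Python sketch applies NFC normalization and then lists any combining marks that remain because Unicode defines no precomposed form for them; uncomposed_marks is a hypothetical helper.

```python
import unicodedata


def uncomposed_marks(text: str) -> list:
    """Return combining marks that remain after NFC composition.

    NFC merges a base letter and a following mark into one precomposed
    code point when Unicode defines one (e + U+0301 -> U+00E9).
    """
    nfc = unicodedata.normalize("NFC", text)
    return [c for c in nfc if unicodedata.combining(c)]


decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT
print(len(decomposed), len(unicodedata.normalize("NFC", decomposed)))  # 2 1
print([hex(ord(c)) for c in uncomposed_marks("q\u0301")])  # ['0x301']
print("\u3042".isprintable())  # True: Python's printable check is Unicode-aware
```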

Macron in VBA editor

I've been creating a simple program in VBA that I can use to review Chinese vocabulary. I've gotten a fair bit working so far, but have run into a huge problem with inputting a character with a macron, such as "ā" (Unicode code point 257, U+0101). The specific application I am working on right now involves changing the contents of the text-box form so that an "...
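
Since the VBA editor typically cannot display such characters in source code, the usual workaround is to build them from their code points at run time; in VBA that is ChrW(257). This Python sketch shows the same idea and the code points of the five tone-1 pinyin vowels.

```python
import unicodedata

a_macron = chr(257)  # decimal 257 == hex 0x0101
print(a_macron, f"U+{ord(a_macron):04X}", unicodedata.name(a_macron))
# ā U+0101 LATIN SMALL LETTER A WITH MACRON

# The five tone-1 pinyin vowels, built from their code points.
tone1 = "".join(chr(cp) for cp in (0x0101, 0x0113, 0x012B, 0x014D, 0x016B))
print(tone1)  # āēīōū
```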

How do I input 4-byte UTF-8 characters?

I am writing a small app which I need to test with UTF-8 characters of different byte lengths. I can input Unicode characters that are encoded in UTF-8 with 1, 2 and 3 bytes just fine by doing, for example: string in = "pi = \u03a0"; But how do I get a Unicode character that is encoded with 4 bytes? I have tried: str...
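
Only code points above U+FFFF take 4 bytes in UTF-8, and a \u escape holds exactly four hex digits, so it cannot reach them. C# and Python accept an eight-digit \U escape such as \U0001D11E, while Java needs the surrogate pair \uD834\uDD1E. This Python sketch prints one sample per UTF-8 length; the sample characters are arbitrary choices.

```python
samples = {
    "A": "A",                # U+0041
    "Pi": "\u03a0",          # U+03A0 GREEK CAPITAL LETTER PI
    "Euro": "\u20ac",        # U+20AC EURO SIGN
    "G clef": "\U0001D11E",  # U+1D11E, outside the BMP
}
for label, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{label:7} U+{ord(ch):04X} -> {len(encoded)} bytes: {encoded.hex(' ')}")
```

Note that U+03A0 is itself a 2-byte character in UTF-8.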

CMapStringToOb::Lookup problem in Japanese

Does anyone know why CMapStringToOb::Lookup doesn't work in Japanese? The code loads a string from the string table, and puts it into a CMapStringToOb object. Later it loads the same string from the string table (so it is guaranteed to be exactly the same) and calls CMapStringToOb::Lookup to find it. It works in all languages that we'v...

How to purge my junk mail folder of messages in the Cyrillic alphabet (Outlook 2007)?

Recently, my junk mail folder has been filling up with messages composed in what appears (to me) to be the Cyrillic alphabet. If a message body or a message subject is in Cyrillic, I want to permanently delete it. On my screen I see Cyrillic characters, but when I iterate through the messages in VBA within Outlook, the "Subject" proper...
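
Once the subject or body is available as real Unicode text, the test itself is a code point range check. This Python sketch uses the Cyrillic (U+0400 to U+04FF) and Cyrillic Supplement (U+0500 to U+052F) blocks; the mostly_cyrillic helper and its 50% threshold are a hypothetical filter rule, not anything Outlook provides.

```python
import re

# Cyrillic and Cyrillic Supplement blocks.
CYRILLIC = re.compile("[\u0400-\u052f]")


def mostly_cyrillic(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: True if at least `threshold` of the letters are Cyrillic."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    hits = sum(1 for c in letters if CYRILLIC.match(c))
    return hits / len(letters) >= threshold


print(mostly_cyrillic("\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440"))  # "Привет, мир"
print(mostly_cyrillic("Hello world"))
```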

Redirecting console output containing pseudo-loc (Unicode) strings in C#

I'm running a console app (myApp.exe) which outputs a pseudo-localized (Unicode) string to standard output. If I run this in a regular command prompt (cmd.exe), the Unicode data gets lost. If I run this in a Unicode command prompt (cmd.exe /u) or set the console font to "Lucida Console", then the Unicode string is maintaine...

Writing UTF-16 to a file in binary mode

I'm trying to write a wstring to file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried: ofstream outFile("test.txt", std::ios::out | std::ios::binary); wstring hello = L"hello"; outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t)); outFile.close(); Opening test.txt in for ...
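
The C++ code writes the raw in-memory wchar_t bytes, so the result depends on the platform: wchar_t is 2 bytes (UTF-16) on Windows but typically 4 bytes on Linux, and no BOM is written, so an editor may not recognize the file as UTF-16. This Python sketch shows the bytes a little-endian UTF-16 file with a BOM should contain; the file location is arbitrary.

```python
import codecs
import os
import tempfile

hello = "hello"
# Little-endian UTF-16 preceded by its BOM (FF FE), as Windows tools usually expect.
payload = codecs.BOM_UTF16_LE + hello.encode("utf-16-le")

path = os.path.join(tempfile.gettempdir(), "test.txt")
with open(path, "wb") as f:  # binary mode: bytes are written untranslated
    f.write(payload)

with open(path, "rb") as f:
    data = f.read()
print(data)  # b'\xff\xfeh\x00e\x00l\x00l\x00o\x00'
```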

How do I use Unicode characters in Pod and perldoc?

I need to use UTF-8 characters in my Perl documentation. If I run perldoc MyMod.pm, I see strange characters. If I run pod2text MyMod.pm, everything is fine. I use Ubuntu/Debian. $ locale LANG=de_DE.UTF-8 LC_CTYPE="de_DE.UTF-8" LC_NUMERIC="de_DE.UTF-8" LC_TIME="de_DE.UTF-8" LC_COLLATE="de_DE.UTF-8" LC_MONETARY="de_DE.UTF-8" LC_ME...

REPLACE and Unicode characters in SQL

I have some data with messed-up accented characters. For example, in the data we have things like ClΘmentine that should read Clémentine. I'd like to clean it up with a script, but when I do this, for example Select Replace('ClΘmentine', 'Θ', 'é'), this is what I get: Clémenéine. Apparently Θ matches both Θ and t. Any ideas ...
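
One plausible explanation, which is an assumption not confirmed by the excerpt: in Windows-1252, é is byte 0xE9, and in code page 437 the same byte is Θ, so the text was probably written in one code page and read in the other. If so, reversing the mis-decoding repairs every affected character at once, without REPLACE. This Python sketch demonstrates the round trip; separately, if the database is SQL Server, writing the literals as N'Θ' (Unicode) may be worth checking for the t mismatch.

```python
broken = "Cl\u0398mentine"  # "ClΘmentine"

# Hypothesis: Windows-1252 bytes were decoded as code page 437.
# Re-encode with cp437 to recover the original bytes, then decode them correctly.
fixed = broken.encode("cp437").decode("cp1252")
print(fixed)  # Clémentine
```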

Should I use Ansi or Unicode charset with dllimport?

When you use DllImport to import a function, you can specify a CharSet to use. I noticed that in C#, C++ and Visual Basic the .NET runtime defaults to Ansi instead of Unicode for this. So for any system call that has an A and a W version, the A version will be called by default. .NET uses Unicode internally and if I'm not mistaken ne...
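
With CharSet.Ansi, each string is converted from UTF-16 to the system ANSI code page before the A function is called, and characters that code page lacks are lost. This Python sketch models only that conversion step, assuming the ANSI code page is Windows-1252; Windows' own conversion may use different replacement characters. CharSet.Unicode calls the W function and skips the conversion.

```python
# Assumption: the system ANSI code page is Windows-1252.
ANSI_CODE_PAGE = "cp1252"


def to_ansi(text: str) -> bytes:
    """Model of the UTF-16 -> ANSI step; unmappable characters become '?'."""
    return text.encode(ANSI_CODE_PAGE, errors="replace")


print(to_ansi("caf\u00e9"))            # b'caf\xe9': survives
print(to_ansi("\u65e5\u672c\u8a9e"))   # b'???': information lost
```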

How do you properly use WideCharToMultiByte?

I've read the documentation here: http://msdn.microsoft.com/en-us/library/ms776420(VS.85).aspx I'm stuck on this parameter: "lpMultiByteStr [out]: Pointer to a buffer that receives the converted string." I'm not quite sure how to properly initialize the variable and feed it into the function ...
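
The common pattern is to call WideCharToMultiByte twice: first with cbMultiByte set to 0, which returns the required buffer size in bytes (including the terminating NUL when cchWideChar is -1), then again with a buffer of that size. The Python sketch below only mimics that two-pass pattern; fake_wide_char_to_multi_byte is a hypothetical stand-in for a UTF-8 conversion, not the Win32 function. In C or C++ the allocation step would typically be a std::vector<char> or new char[needed].

```python
from typing import Optional


def fake_wide_char_to_multi_byte(src: str, buf: Optional[bytearray]) -> int:
    """Hypothetical stand-in for WideCharToMultiByte(CP_UTF8, ..., -1, ...).

    buf=None plays the role of cbMultiByte = 0: return the required size,
    including the NUL. Otherwise fill buf and return the bytes written,
    or 0 if buf is too small.
    """
    encoded = src.encode("utf-8") + b"\x00"
    if buf is None:
        return len(encoded)
    if len(buf) < len(encoded):
        return 0  # the real API also sets ERROR_INSUFFICIENT_BUFFER
    buf[:len(encoded)] = encoded
    return len(encoded)


text = "\u65e5\u672c\u8a9e"                              # "日本語"
needed = fake_wide_char_to_multi_byte(text, None)       # pass 1: size query
buffer = bytearray(needed)                              # allocate that much
written = fake_wide_char_to_multi_byte(text, buffer)    # pass 2: convert
print(needed, written, bytes(buffer[:-1]).decode("utf-8"))
```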

Looking for a UTF-8 text editor

I am looking for a (simple) text editor that can handle text in different encodings in the same document. I need to develop some sites with mixed Japanese and English text, and the editors I have now (on an English Windows system) are unable to display the Japanese text. jEdit files don't display the Japanese text I have entered, but whe...