questions about unicode | ansaurus

unicode

Is there a Windows command shell that will display Unicode characters?

Assuming I have fonts installed which have the appropriate glyphs in them, is there a command shell for Windows XP that will display Unicode characters? At a minimum, two things that should display Unicode correctly: Directory listings. I don't care what I have to type (dir, ls, get-childitem, etc.), so long as files with Unicode chara...

Handling a Unicode String in Delphi Versions <= 2007

Background: This question relates to versions of Delphi below 2009 (i.e. without built-in Unicode support). I have a specification that requires me to transmit a Unicode-encoded string over a TCP connection, but I do not have Delphi 2009. Question: Is there a single function or very small library (I don't need too much bulk) that I can use...

Validating Kana Input

I am working on an application that allows users to input Japanese language characters. I am trying to come up with a way to determine whether the user's input is a Japanese kana (hiragana, katakana, or kanji). There are certain fields in the application where entering Latin text would be inappropriate and I need a way to limit certain ...
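One way to approach this kind of check is to test each character against the relevant Unicode block ranges. The following is only a minimal sketch, assuming that the Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and CJK Unified Ideographs (U+4E00 to U+9FFF) blocks are enough for the fields in question. The names `JAPANESE_RANGES` and `is_japanese_text` are hypothetical, and the ranges deliberately ignore half-width katakana, CJK extension blocks, and Japanese punctuation.

```python
# Hypothetical helper: block ranges chosen for illustration only.
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def is_japanese_text(text: str) -> bool:
    """Return True if text is non-empty and every character falls in one of the ranges."""
    if not text:
        return False
    return all(
        any(low <= ord(ch) <= high for low, high in JAPANESE_RANGES)
        for ch in text
    )
```

A field that must reject Latin text could call `is_japanese_text(value)` before accepting the input. Extending the tuple of ranges is the place to add other blocks if they turn out to be needed.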

language-agnostic

Unicode characters in Windows command line - how?

We have a project in TFS that has a non-English character (š) in its name. When trying to script a few build-related things we stumbled upon a problem: we can't pass the letter š to the command-line tools. The command prompt, or something else along the way, mangles it, and the tf.exe utility can't find the specified project. I've tried different formats fo...

How to read/store unicode with STL strings and streams

I need to modify my program to accept Unicode, which may come from any of UTF-8 and the various UTF-16 and UTF-32 encodings. I don't really know much about Unicode (though I've read Joel Spolsky's article and the Wikipedia page). Right now I'm using an std::istream and reading my input char by char, and then storing (when necessary) in...

Unicode appnames in Django

Hi, I live in Norway, and when I make Django apps I would like to be able to name my apps with characters like "æøå". These characters work fine in Unicode, but when I try to use them in app names, or in field display text, I get an error. Even better, I would like to name my apps by the English convention, but have somethi...

character-encoding

MD5 Hashing in Delphi 2009

In Borland Delphi 7, and even in Delphi 2007, everything worked, but in Delphi 2009 it just returns the wrong hash! I use the wcrypt2 script (http://pastebin.com/m2f015cfd). Just have a look: string: "123456" hash: Delphi 7: "e10adc3949ba59abbe56e057f20f883e" - real hash. Delphi 2007: "e10adc3949ba59abbe56e057f20f883e" - real hash too....

Do I have to use the prefix N in the "insert into" statement for Unicode?

Like: insert into table (col) values (N'multilingual unicode strings') I'm using SQL Server 2008 and I already use nVarChar as the column data type. ...

How to remove the extra (?) when converting byte content from Unicode to ANSI

I need to convert Unicode characters to ANSI characters, and I use this piece of code: byte[] encode = Encoding.Convert(Encoding.Unicode, Encoding.Default, report); When I view the result, I find that an extra ? character has been added at the start: ?FF EE 20 12 ...

character-encoding

How to get code point number for a given character in a utf-8 string?

I want to get the UCS-2 code points for a given UTF-8 string. For example the word "hello" should become something like "0068 0065 006C 006C 006F". Please note that the characters could be from any language including complex scripts like the east asian languages. So, the problem comes down to "convert a given character to its UCS-2 code...
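The conversion described in the question can be sketched as decoding the UTF-8 bytes and formatting each character's code point in hexadecimal. This is only an illustration; the function name `code_points` is hypothetical. Note that UCS-2 covers only the Basic Multilingual Plane, so for characters above U+FFFF this sketch prints the full code point (five or six hex digits) rather than anything representable in UCS-2.

```python
def code_points(text: str) -> str:
    """Format each character's Unicode code point as at least four uppercase hex digits."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Input arrives as UTF-8 bytes, so decode first and then inspect code points.
utf8_bytes = "hello".encode("utf-8")
print(code_points(utf8_bytes.decode("utf-8")))  # prints: 0068 0065 006C 006C 006F
```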

Is there a standard way to do an fopen with a unicode string file path?

Is there a standard way to do an fopen with a unicode string file path? ...

How to make the Java.awt.Robot type unicode characters? (Is it possible?)

We have a user-provided string that may contain Unicode characters, and we want the robot to type that string. How do you convert a string into keyCodes that the robot will use? How do you do it so it is also Java version independent (1.3 -> 1.6)? What we have working for "ascii" chars is //char c = nextChar(); //char c = 'a'; // this...

Unicode Regex; Invalid XML characters

The list of valid XML characters is well known, as defined by the spec it's: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] My question is whether or not it's possible to make a PCRE regular expression for this (or its inverse) without actually hard-coding the codepoints, by using Unicode general categories. An...
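For reference, the character ranges quoted above can be written directly as a negated character class. The sketch below uses Python's `re` module rather than PCRE and hard-codes the code points, so it does not answer the question of whether general categories can replace them; it only shows the baseline being compared against. The names `INVALID_XML_CHARS` and `strip_invalid_xml_chars` are hypothetical.

```python
import re

# Negation of the XML 1.0 Char production quoted in the question:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that falls outside the quoted ranges."""
    return INVALID_XML_CHARS.sub("", text)
```

For example, `strip_invalid_xml_chars("a\x07b")` drops the U+0007 bell character and returns `"ab"`, while tabs, newlines, and supplementary-plane characters pass through unchanged.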

Best Default and User Selected Fonts for a Unicode Application

What is the best default font to use for a Unicode application in which the user can select the font he wants to use? The problem I notice is that not all Windows machines have every Unicode font, and not every Unicode font includes all the Unicode characters. So what would be the font that would have the best trade off between ava...

Finding System Fonts with Delphi

What is the best way to find all the system fonts a user has available so they can be displayed in a dropdown selection box? I would also like to distinguish between Unicode and non-Unicode fonts. I am using Delphi 2009 which is fully Unicode enabled, and would like a Delphi solution. ...

std::wstring VS std::string

I am not able to understand the differences between std::string and std::wstring. I know wstring supports wide characters such as Unicode characters. I have got the following questions: When should I use std::wstring over std::string? Can std::string hold the entire ASCII character set, including the special characters? Is std::wstring...

unicode char comparing to non unicode char, but no warning nor error

Why does the following code NOT give an error, nor any type of a warning about an implicit conversion? std::wstring str = L"hi"; if(str[0] == 'h') cout<<"strange"<<endl; The proper normal code is: std::wstring str = L"hi"; if(str[0] == L'h') cout<<"strange"<<endl; Compiler: visual studio 2005 Warning level: level 4 (hi...

Why are "control" characters illegal in XML?

There are a variety of characters that are not legally encodable in XML, e.g. U+0007 ('bell') and U+001B ('escape'). Most of the interesting ones are non-whitespace 'control' characters. It's clear from (e.g.) this question and others that it's the XML spec that's the issue -- but can anyone enlighten me as to why the XML spec forbi...

Java: how to check if character belongs to a specific unicode block?

I need to identify what character set my input belongs to. The goal is to distinguish between Arabic and English words in a mixed input (the input is Unicode and is extracted from XML text nodes). I have noticed the class Character.UnicodeBlock: is it related to my problem? How can I get it to work? Edit: The Character.UnicodeBlock ...

Is it possible to use libxml with unicode xmlchar?

Is it possible to use libxml with Unicode? For example, the xmlParseDoc function takes an xmlChar. xmlChar has the following definition: typedef unsigned char xmlChar; I would like libxml to interpret everything as 2-byte chars. I have a feeling that the following would not work properly with the lib: typedef unsigned short xmlChar; ...
