Questions about Unicode

What does it mean when my text is displayed as boxes?

I'm attempting to display some text in my program using (say) Windows GDI, and some of the Unicode characters are displayed as boxes. What is going on? See also: What does it mean when my text is displayed as Question Marks? ...

unicode

text

fonts

What does it mean when my text is displayed as Question Marks?

I'm attempting to display some text in my program using (say) Windows GDI, and some of the Unicode characters are displayed as question marks. What is going on? See also: What does it mean when my text is displayed as boxes? ...

unicode

text

fonts

Entering Unicode characters in LaTeX

How do I enter Unicode characters in LaTeX? What packages do I need to install and what escape sequence do I type to specify Unicode characters in an ASCII source file? ...

unicode

latex

Printable char in Java

Does anyone know how to detect printable characters in Java? After some trial and error I arrived at this method: public boolean isPrintableChar( char c ) { Character.UnicodeBlock block = Character.UnicodeBlock.of( c ); return (!Character.isISOControl(c)) && c != KeyEvent.CHAR_UNDEFINED && ...

java

unicode

non-printable

Converting Simplified Chinese GB 2312 text characters into UTF-8

How do I convert Simplified Chinese GB 2312 text, or similar multi-byte strings, into UTF-8 using C++? ...
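
A minimal sketch of the conversion itself, written in Python rather than the C++ the question asks for, and assuming the input really is GB 2312. The function name `gb2312_to_utf8` is made up for illustration. The idea carries over to any language: decode the legacy bytes to Unicode text first, then encode that text as UTF-8.

```python
def gb2312_to_utf8(raw: bytes) -> bytes:
    """Re-encode GB 2312 bytes as UTF-8 by going through Unicode text."""
    # Decoding raises UnicodeDecodeError if the input is not valid GB 2312.
    text = raw.decode("gb2312")
    return text.encode("utf-8")


if __name__ == "__main__":
    gb_bytes = "中文".encode("gb2312")
    print(gb_bytes.hex(), "->", gb2312_to_utf8(gb_bytes).hex())
```

If the data may contain characters outside GB 2312, Python's "gbk" or "gb18030" codecs cover larger repertoires and may be worth trying instead.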

What do I need to know about Unicode?

As an application developer, do I need to know Unicode? ...

unicode

internationalization

Should I use multi-byte overloading (mbstring.func_overload)?

I'm in the process of making my PHP site Unicode-aware. I'm wondering if anyone has experience with the mbstring.func_overload setting, which replaces the normal string functions (e.g. strlen) with their multi-byte equivalents (mb_strlen). There aren't any comments on the PHP manual page. Are there any potential problems that I should b...

php

unicode

How does GB18030 differ from Unicode?

How does the Chinese GB18030 code set differ from Unicode? What special techniques are required for handling GB18030? Are there any (open source) libraries for handling GB18030? ...

unicode

How do I replace accented Latin characters in Ruby?

I have an ActiveRecord model, Foo, which has a name field. I'd like users to be able to search by name, but I'd like the search to ignore case and any accents. Thus, I'm also storing a canonical_name field against which to search: class Foo validates_presence_of :name before_validate :set_canonical_name private def set_cano...

ruby

unicode

utf-8
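
One common way to build such a canonical form, sketched here in Python rather than Ruby, is to decompose the string with Unicode normalization and then drop the combining marks. This is only an illustration of the technique, not the asker's code; the function name `canonical_name` simply echoes the field name in the question.

```python
import unicodedata


def canonical_name(name: str) -> str:
    """Return a case- and accent-insensitive form of `name` for searching."""
    # NFKD splits "ü" into "u" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


if __name__ == "__main__":
    print(canonical_name("Crème Brûlée"))
```

Letters that have no decomposition, such as "ø", pass through unchanged, so a real application may still need a small hand-written mapping on top of this.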

Difference between Char.IsDigit() and Char.IsNumber() in C#

What's the difference between Char.IsDigit() and Char.IsNumber() in C#? ...

c#

.net

unicode
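
As far as I understand the .NET documentation, the two methods differ in which Unicode general categories they accept: Char.IsDigit covers decimal digits (Nd) only, while Char.IsNumber also covers letter numbers (Nl) and other numbers (No). The Python sketch below mimics that split with `unicodedata.category`; the function names are hypothetical and this is not the .NET implementation.

```python
import unicodedata


def is_digit_like(ch: str) -> bool:
    """Approximates Char.IsDigit: decimal digits only (category Nd)."""
    return unicodedata.category(ch) == "Nd"


def is_number_like(ch: str) -> bool:
    """Approximates Char.IsNumber: categories Nd, Nl and No."""
    return unicodedata.category(ch) in {"Nd", "Nl", "No"}


if __name__ == "__main__":
    for ch in ["7", "\u0663", "\u00bd", "\u2167", "x"]:
        print(repr(ch), unicodedata.category(ch),
              is_digit_like(ch), is_number_like(ch))
```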

Is TCHAR still relevant?

I'm new to Windows programming and after reading the Petzold book I wonder: is it still good practice to use the TCHAR type and the _T() function to declare strings, or should I just use wchar_t and L"" strings in new code? I will target only Windows 2000 and up, and my code will be i18n from the start. ...

Finding the Unicode codepoint of a character in GNU Emacs

In XEmacs this is done by calling the function char-to-ucs on a character. GNU Emacs does not seem to have this function. In GNU Emacs, characters seem to be ordinary integers. Running C-x = on a Latin character reveals that the Emacs codepoint is different from the Unicode codepoint for the corresponding character. How do I find...

emacs

unicode

Match Unicode in PLY's regexes

I'm matching identifiers, but now I have a problem: my identifiers are allowed to contain Unicode characters. Therefore the old way of doing things is not enough: t_IDENTIFIER = r"[A-Za-z](\\.|[A-Za-z_0-9])*" In my markup language parser I match Unicode characters by allowing all the characters except those I explicitly use, because my m...
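
One possible approach, sketched with Python 3's plain `re` module rather than PLY itself, is to lean on the Unicode-aware `\w` class. The pattern below is an assumption about what an identifier should be (a letter-like character followed by word characters), and it leaves out the escaped-character alternative from the original regex.

```python
import re

# Assumption: an identifier starts with a letter from any script and continues
# with letters, digits or underscores. [^\W\d_] reads as "a word character
# that is neither a decimal digit nor an underscore", which is roughly a letter.
IDENTIFIER = re.compile(r"[^\W\d_]\w*")


def is_identifier(s: str) -> bool:
    """Return True if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["name", "naïve", "变量_1", "1abc", "_x"]:
        print(candidate, is_identifier(candidate))
```

For `str` patterns in Python 3, `\w` and `\d` are Unicode-aware by default, so no extra flag is needed in this sketch.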

Simplest way to convert a Unicode codepoint into UTF-8

What's the simplest way to convert a Unicode codepoint into a UTF-8 byte sequence in C? The only way that springs to mind is using iconv to map from the UTF-32LE codepage to UTF-8, but that seems like overkill. ...

c

unicode

utf-8
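
The encoding can be done by hand with a few shifts and masks, without iconv. The sketch below is in Python rather than C, but the bit layout is the same in any language; the function name `codepoint_to_utf8` is made up for illustration.

```python
def codepoint_to_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as a UTF-8 byte sequence by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates and out-of-range values are not valid scalar values.
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0xFC, 0x20AC, 0x1F600):
        print(f"U+{cp:04X}", codepoint_to_utf8(cp).hex())
```

In Python itself `chr(cp).encode("utf-8")` does the same job; the manual version is shown only because it translates directly to C.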

Can I recover international characters mistakenly stored in a varchar field?

My client has an old MS SQL 2000 database that uses varchar(50) fields to store names. He tried to use this database to capture some data (via a web form). Some of the form-fillers are from other countries, and the varchar fields went nutty when some of these folks entered their names. Is it possible to recover the data somehow? Maybe by...

Unicode block of a character in Python

Is there a way to get the Unicode block of a character in Python? The unicodedata module doesn't seem to have what I need, and I couldn't find an external library for it. Basically, I need the same functionality as Character.UnicodeBlock.of() in Java. ...

python

unicode
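
One way to get this without a library is a lookup table of block ranges plus a binary search. The sketch below hard-codes only a handful of blocks for illustration; a complete table would have to be generated from the Blocks.txt file of the Unicode Character Database. The names `_BLOCKS` and `block_of` are hypothetical.

```python
import bisect

# A small, deliberately incomplete excerpt of block ranges, sorted by start.
_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
_STARTS = [start for start, _, _ in _BLOCKS]


def block_of(ch: str):
    """Return the block name for a single character, or None if not in the table."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its upper bound.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        start, end, name = _BLOCKS[i]
        if cp <= end:
            return name
    return None


if __name__ == "__main__":
    for ch in "Aéж中":
        print(ch, block_of(ch))
```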

Upgrading Google App Engine program to use Unicode

I have a simple Google App Engine app that I wrote using ordinary strings. I realize I want to make it handle Unicode. Are there any gotchas with this? I'm thinking of all the strings that I already have in the live database (from real users who I don't want to upset). ...

python

google-app-engine

unicode

Where can I find easy to understand information about Unicode?

Apart from Joel's article on the subject, where can I find information to help me get a deeper understanding of Unicode? ...

unicode

documentation

hyperlink

Delphi 2009 and Firebird 2.1 = Full Unicode?

Has anyone started making Unicode apps or converting existing apps to Unicode? How are you tweaking Firebird to have the fewest problems, especially the CHARSET attribute? Have you encountered any problems? Is there anything else that I should be aware of? I'm just preparing myself so that I get fewer surprises before jumping onto the Unicode...

delphi

unicode

firebird

HtmlEncode UTF-8

I'm using Server.HtmlEncode on a UTF-8 string in classic ASP, which works fine until there are accents in the string, e.g. Rüstü Recber, which appears as RÃ¼stÃ¼ Recber (RÃ¼stÃ¼ Recber in the source). I've tried setting the Response.Charset property to utf-8 but this doesn't make any difference. ...
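
The garbled output in the example is the pattern you get when UTF-8 bytes are interpreted as a single-byte encoding such as Latin-1. The Python sketch below only reproduces that symptom and its reversal; it makes no claim about where in the ASP pipeline the misinterpretation happens, and the function names are made up for illustration.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_misread(garbled: str) -> str:
    """Reverse the mistake: recover the bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "Rüstü Recber"
    garbled = misread_utf8_as_latin1(original)
    print(garbled)
    print(undo_misread(garbled))
```

Each "ü" is two bytes in UTF-8, which is why it turns into two characters once the bytes are read one at a time.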