questions about unicode | ansaurus

unicode

UTF-8 to Unicode using C#

Help me please. I have a problem with the encoding of the response string after a GET request: var m_refWebClient = new WebClient(); var m_refStream = m_refWebClient.OpenRead(this.m_refUri); var m_refStreamReader = new StreamReader(this.m_refStream, Encoding.UTF8); var m_refResponse = m_refStreamReader.ReadToEnd(); After calling this code my string ...

PHP and the euro symbol

Let's assume the following string is entered into a form and submitted to a PHP script. "€ should encode as &euro;" I would like to know how to actually get € to encode as &euro;. htmlentities() doesn't do it; what voodoo is needed in order to get that to encode properly (and others like it)? ...

Built-in function for converting between unicode characters and virtual keycodes in Cocoa?

Is there a way to convert a Unicode character to a Mac virtual keycode (without building my own table)? It looks like on Windows there is VkKeyScanEx, but I'm not aware of a similar function for Cocoa on OS X. I'm actually trying to do this for the iPad. I want to take characters from the keyboard and convert them into key code...

i18n : Umlaut not being displayed correctly in JSP

Hi All, I have a JSP that is supposed to display some German text from some .properties files by using fmt:message. The corresponding entry in the .properties file is: service.test.hware.test = Hardware prüfen (umlaut between r and f in 2nd word). On Internet Explorer this displays as: Hardware prÃ¼fen the umlaut being corr...
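The "prÃ¼fen" symptom is what UTF-8 bytes look like when they are decoded as ISO-8859-1 (Latin-1). The Python sketch below only reproduces and reverses that pattern; it is an illustration of one common cause, not a diagnosis of the asker's JSP setup, and the variable names are made up for the example.

```python
# Sketch: reproduce the "pr\u00c3\u00bcfen" pattern by decoding UTF-8 bytes as Latin-1.
original = "Hardware pr\u00fcfen"           # "Hardware prüfen"

utf8_bytes = original.encode("utf-8")       # ü becomes the two bytes C3 BC
garbled = utf8_bytes.decode("latin-1")      # each byte is read as its own character

# If the text was garbled exactly this way, the round trip can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
```

If the garbled form matches what the browser shows, the place to look is wherever bytes are decoded or served with a different charset than the one they were written in.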

character-encoding

regex in Vietnamese characters

I have a string and want to remove every character that is not covered by one of the cases below: it is in this list: ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂ ưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ; it is in [a-z 0-9 A-Z]; or it is _ or white space. Can anyone help me with this regex in PHP? ...

internationalized regular expression in postgresql

How can I write regular expressions to match names like 'José' in Postgres? In other words, I need to set up a constraint to check that only valid names are entered, but I want to allow Unicode characters as well. Regular expressions, unicode style has some reference on this, but it seems I can't write it in Postgres. If it is not possible ...

Does MySQL handle a single utf-8 character key as well as an integer?

I'm working on a Chinese/Japanese learning web app where many tables are indexed by the characters (the "glyphs") of those languages. I'm wondering if the integer codepoint value of the glyph would be better for performance than using a single utf8 character (for primary key and indexes)? Using a single utf8 character would be very usef...

How to classify Japanese characters as either Kanji or Kana

Given text such as 誰か確認上記これらのフ, how can I classify each character as kana or kanji? To get something like this: 誰 - kanji か - kanji 確 - kanji 認 - kanji 上 - kanji 記 - kanji こ - kana れ - kana ら - kana の - kana フ - kana (sorry if I did it incorrectly) ...
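One possible approach, sketched here in Python, is to classify each character by the Unicode block its code point falls in. The function name `classify` is hypothetical, and the sketch only covers the Hiragana, Katakana, and main CJK Unified Ideographs blocks; extension blocks and half-width kana are deliberately left out.

```python
def classify(ch):
    """Label one character as 'kana', 'kanji', or 'other' by Unicode block."""
    cp = ord(ch)
    # Hiragana: U+3040..U+309F, Katakana: U+30A0..U+30FF
    if 0x3040 <= cp <= 0x30FF:
        return "kana"
    # CJK Unified Ideographs (main block): U+4E00..U+9FFF
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


for ch in "誰か確認上記これらのフ":
    print(ch, "-", classify(ch))
```

Note that か sits in the Hiragana block, so this sketch labels it kana rather than kanji.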

Isn’t UTF-8's byte order different on big-endian machines than on little-endian machines? So why doesn’t UTF-8 require a BOM?

UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order. If UTF-8 stored all code-points in a single byte, then it would make sense why endianness doesn’t play any role and thus why BOM isn’t required. But since code points 128 and above are stored ...
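The point can be checked directly: UTF-8's code unit is a single byte, and the order of the bytes in a multi-byte sequence is fixed by the encoding itself, whereas UTF-16 uses 16-bit code units whose two bytes can be serialized in either order. A small Python sketch, using the euro sign as an arbitrary example character:

```python
text = "\u20ac"  # EURO SIGN, U+20AC

# UTF-8: one defined byte sequence, regardless of the machine's endianness.
utf8 = text.encode("utf-8")

# UTF-16: the same 16-bit code unit, serialized in two possible byte orders.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

print(utf8.hex(), utf16_le.hex(), utf16_be.hex())
```

Only the UTF-16 forms differ from each other, which is why a byte order mark carries information there and is merely a signature in UTF-8.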

string literal to `basic_string<unsigned char>`

When it comes to internationalization & Unicode, I'm an idiot American programmer. Here's the deal. #include <string> using namespace std; typedef basic_string<unsigned char> ustring; int main() { static const ustring my_str = "Hello, UTF-8!"; // <== error here return 0; } This emits a not-unexpected complaint: cannot conv...

internationalization

Objective C / iPhone: How do I extract the actual unicode date format strings for the current region?

I am completely new to Objective-C and iPhone development, so please be gentle (just started looking at the code for the first time tonight). According to this site: http://iphonedevelopertips.com/cocoa/date-formatter-examples.html there is a class that handles formatting, which takes in a set of constants/enums (e.g. NSDateFormatterSh...

How do I specify a range of unicode characters

How do I specify a range of unicode characters from ' ' (space) to \u00D7FF? I have a regular expression like r'[\u0020-\u00D7FF]' and it won't compile saying that it's a bad range. I am new to Unicode regular expressions so I haven't had this problem before. Is there a way to make this compile or a regular expression that I'm forgett...
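A `\u` escape takes exactly four hexadecimal digits, so the code point U+D7FF is written `\uD7FF`, not `\u00D7FF`. The excerpt does not say which Python version is in use; the sketch below assumes Python 3.3 or later, where the `re` module itself understands `\u` escapes in a raw string pattern. The names `bmp_below_surrogates` and `in_range` are made up for the example.

```python
import re

# Range from U+0020 (space) through U+D7FF, the last code point before the surrogates.
bmp_below_surrogates = re.compile(r"[\u0020-\uD7FF]+")


def in_range(s):
    """Return True if every character of s lies in U+0020..U+D7FF."""
    return bmp_below_surrogates.fullmatch(s) is not None


print(in_range("José 世界"))   # all characters inside the range
print(in_range("a\tb"))        # TAB is U+0009, below the range
```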

How does std::stringstream handle wchar_t* in operator<<?

Given that the following snippet doesn't compile: std::stringstream ss; ss << std::wstring(L"abc"); I didn't think this one would, either: std::stringstream ss; ss << L"abc"; But it does (on VC++ at least). I'm guessing this is due to the following ostream::operator<< overload: ostream& operator<< (const void* val ); Does this h...

Are email addresses allowed to contain non-alphanumeric characters?

I'm building a website using Django. The website could have a significant number of users from non-English-speaking countries. I just want to know if there are any technical restrictions on what types of characters an email address could contain. Are email addresses only allowed to contain English letters + numbers + "_" + "@" + "."? Are they al...

internationalization

How to detect unicode strings with unprintable characters?

I have Unicode strings stored in a database. Some of the character encodings are wrong and instead of displaying actual characters for the language, it's now displaying characters that make no sense. How do I fix this issue? Is there a way to detect if strings have a wrong encoding? ...

character-encoding

How to convert this kind of character to its corresponding Unicode character in Ruby?

I want to know how to convert this kind of character to its Unicode form, such as the following: Delphi_7.0%E6%95%B0%E6%8D%AE%E5%BA%93%E5%BC%80%E5%8F%91%E5%85%A5%E9%97%A8%E4%B8%8E%E8%8C%83%E4%BE%8B%E8%A7%A3%E6%9E%90 The Unicode characters of the string above are: Delphi_7.0数据库开发入门与范例解析 Does anybody know how to do the conversion ...
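The `%E6%95%B0...` runs are percent-encoded UTF-8 bytes, so the conversion is a URL-decoding step followed by UTF-8 decoding. The question asks about Ruby; the sketch below uses Python's standard library instead, purely to show the idea.

```python
from urllib.parse import unquote

encoded = (
    "Delphi_7.0%E6%95%B0%E6%8D%AE%E5%BA%93%E5%BC%80%E5%8F%91"
    "%E5%85%A5%E9%97%A8%E4%B8%8E%E8%8C%83%E4%BE%8B%E8%A7%A3%E6%9E%90"
)

# unquote() replaces each %XX escape with its byte and decodes the result as UTF-8.
decoded = unquote(encoded, encoding="utf-8")

print(decoded)
```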

character-encoding

Precompose Unicode Character Sequences in Python

Hi, How can I convert decomposed unicode character sequences like "LATIN SMALL LETTER E" + "COMBINING ACUTE ACCENT" (or U+0065 + U+0301) so they become the precomposed form: "LATIN SMALL LETTER E WITH ACUTE" (or U+00E9) using native Python 2.5+ functions? If it matters, I am on Mac OS X (10.6.4) and I have seen the question Converting ...
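The standard library's `unicodedata.normalize` with the `NFC` form performs this kind of composition. A minimal sketch follows, written for Python 3 (the question mentions Python 2.5+, where the same function takes a `unicode` object):

```python
import unicodedata

decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# NFC: canonical decomposition followed by canonical composition.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(unicodedata.name(composed))
```

Going the other way, `unicodedata.normalize("NFD", composed)` yields the decomposed sequence again.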

Displaying unicode text on ASP.NET page.

I have gone through my DB and code so far, and I have hit a problem. For a particular entry in my DB I pick up descriptions and tags (both are strings) for the Japanese language. Now, in an ASP.NET page, the description is shown fine but the tags, which are Japanese as well, are replaced with ? marks. What am I doing wrong here? The same page displa...

using unicode characters with wxPython

Hi everybody! I have a problem with wxPython and its rich text control when I try to insert Unicode characters... \xb2 prints a superscript '2', '\u2074' should print a superscript '4'... Edit: I use Windows Vista... and I tried 'coding cp1252 ' and 'utf-8' but with the same result... Edit 2: on Vista it crashes, on XP it shows a strange square (i g...

PHP and Unicode: Weirdness between Windows and Linux.

Look at IBM's Unicode for the working PHP programmer, especially listings 3 and 4. On Ubuntu Lucid I get the same output from the code as IBM does, viz: Здравсствуйте Array ( [1] => 65279 [2] => 1047 [3] => 1076 [4] => 1088 [5] => 1072 [6] => 1074 [7] => 1089 [8] => 1089 [9] => 1090 [10] => 1074...
