Questions about Unicode

JavaScript Fastest Local Database

What would be the best format for storing a relatively large amount of data (essentially a big hashmap) for quick retrieval using JavaScript? It would need to support Unicode as well. XML, JSON? ...

javascript

json

unicode

Adobe Flex fails on unicode / foreign input in Linux

Hello, I have been learning Flex for a few days and just noticed that entering Unicode / foreign characters on Linux into a TextInput, TextArea or RichTextEditor produces unreadable text composed of several characters (it looks like UTF-8 is causing the trouble). Output, on the other hand, is flawless. I was trying hard to find anything for...

flex

unicode

input

Ruby's String#gsub, unicode, and non-word characters

As part of a larger series of operations, I'm trying to take tokenized chunks of a larger string and get rid of punctuation, non-word gobbledygook, etc. My initial attempt used String#gsub and the \W regexp character class, like so:
    my_str = "Hello,"
    processed = my_str.gsub(/\W/,'')
    puts processed # => Hello
Super, super, super simple...

ruby

regex

unicode

Delphi 2010 or 2007 for upgrading Delphi 3 project?

I've just received an assignment to upgrade an old Delphi 3 project that I wrote in 1999 to a newer version and add features (I previously discussed this in related questions here and here). I was assuming that the appropriate route would be to first upgrade my development environment to Delphi 2010 and then port the application. I'm n...

Convert two strings to the same byte length

I have 2 strings in my PHP code: 1 is a parameter to my method and 1 is a string from an ini file. The problem is that they are not equal, although they have the same content, probably due to encoding issues. When using var_dump, it is reported that the first string's length is 23 and the second string's length is 47 (see the end of my q...

php

unicode

utf-8

CMemFile and Unicode

Am I right in thinking that the MFC class CMemFile cannot be used to write Unicode data because it uses BYTE*, where BYTE is defined as unsigned char? The line that actually writes the data in CMemFile::Write is Memcpy((BYTE*)m_lpBuffer + m_nPosition, (BYTE*)lpBuf, nCount); and if so can I replace BYTE with wchar_t in my...

unicode

mfc

IDLE and unicode chars (2.5.4)

Why does IDLE handle one symbol correctly but not another?
    >>> e = '€'
    >>> print unichr(ord(e)) # looks like a very thin rectangle on my system.
    >>> p = '£'
    >>> print unichr(ord(p))
    £
    >>> ord(e)
    128
    >>> ord(p)
    163
I tried adding various # coding lines, but that didn't help. EDIT: browser should be UTF-8, else this will look rat...

python

unicode

How does Windows identify non-Unicode applications?

I am building an MFC C++ application with "Use Unicode Character Set" selected in Visual Studio. I have UNICODE defined, my CStrings are 16-bit, I handle filenames with Japanese characters in them, etc. But, when I put Unicode strings containing Japanese characters in a CComboBox (using AddString), they show up as ?????. I'm running Wi...

Writing unicode characters with Batik doesn't work

Hi, I am writing a project with Batik that deals with multi-language images, so I need symbols like "sigma" or "alpha". I have to write the character as text, not as a polygon or as a glyph, because it has to be written by my project again. If I write a Unicode character in my SVGDocument it is shown correctly in the debugger, but ...

Why do I get this error when I try to print something in Putty?

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 38: ordinal not in range(128) I am downloading a website and then printing its contents...simple. Do I have to encode it somehow? ...
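
One way to reproduce and sidestep the error quoted above, as a minimal sketch. It assumes Python 3 syntax (the traceback in the question comes from Python 2), and the sample string and the helper name to_ascii_bytes are hypothetical, made up for illustration. U+2019 has no ASCII representation, so a strict ASCII encode fails; picking an explicit encoding or an error handler avoids the exception.

```python
# Hypothetical sample text containing U+2019 (RIGHT SINGLE QUOTATION MARK),
# the character named in the error message above.
text = "It\u2019s a test"


def to_ascii_bytes(s, errors="strict"):
    """Encode s as ASCII; 'errors' selects how unencodable characters are handled."""
    return s.encode("ascii", errors=errors)


# Strict ASCII encoding raises UnicodeEncodeError for U+2019.
failure_reason = None
try:
    to_ascii_bytes(text)
except UnicodeEncodeError as exc:
    failure_reason = exc.reason

# Option 1: keep ASCII output but substitute '?' for unencodable characters.
safe = to_ascii_bytes(text, errors="replace")

# Option 2: encode as UTF-8, which can represent every Unicode character.
utf8 = text.encode("utf-8")

print(failure_reason)
print(safe)
print(utf8)
```

Whether replacing characters or switching to UTF-8 is appropriate depends on what the terminal (here, PuTTY) is configured to display, which the question does not state.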

How to handle Unicode (non-ASCII) characters in Python?

I'm programming in Python and I'm obtaining information from a web page through the urllib2 library. The problem is that the page can provide me with non-ASCII characters, like 'ñ', 'á', etc. The moment urllib2 gets such a character, it raises an exception, like this: (more stack trace) File "c:\Python25\lib\httplib.py", line ...

python

unicode

character-encoding

Unicode/Japanese characters in a Java applet

I'm writing an applet that's supposed to show both English and Japanese (unicode) characters on a JLabel. The Japanese characters show up fine when I run the applet on my system, but all I get is mojibake when I run it from the web page. The page can display Japanese characters if they're hard-coded into the HTML, but not in the applet. ...

ConfigParser with Unicode items

Hi all, my troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it, it seems to be encoded into something else. I assumed it was latin-1 and I thought overriding optionxform could help: -- configfile.cfg -- [rules] Häjsan = 3 ☃ = my snowm...

python

unicode

configparser

Impact of IDNs on web developers?

Hi, So, the BBC just released the story that ICANN is going to approve non-Latin scripts for use in domain names (http://news.bbc.co.uk/1/hi/technology/8333194.stm). I'm wondering what influence this will have on us web developers. Are we going to see errors when we're grabbing referral URLs, or large numbers of Unicode issues when cr...

web-development

unicode

Unicode character categories in Ruby

Is there anything in Ruby that will return me an array of characters belonging to a certain Unicode category? In particular, I'd like to have the Mn category so that I can follow the advice on this answer. ...

Easy Q: UnicodeEncodeError: 'ascii' codec can't encode character

Hi, I'm trying to pass big strings of random html through regular expressions and my Python 2.6 script is choking on this: UnicodeEncodeError: 'ascii' codec can't encode character I traced it back to a trademark superscript on the end of this word: Protection™ -- and I expect to encounter others like it in the future. Is there a modu...

Unicode and fonts.

Hi, This is something that I don't see discussed much. I'm developing software that will support multiple languages, so I would need to use Unicode-compatible fonts, right? Where could I find such fonts, and how would I know for sure that they support Chinese, Korean, Japanese, and whatever else exists? It's a shame you can't use beau...

unicode

fonts

How to open a URL with non-UTF-8 arguments

Hello, Using Python I need to transfer non-UTF-8 encoded data (specifically Shift-JIS) to a URL via the query string. How should I transfer the data? Quote it? Encode it as UTF-8? Thanks ...

How can I write a Java function to return the standard name for a Unicode point?

I want to write a function String getName(int codePoint) { // ???? } which will return the standard name given to the character that the given code point represents. For example getName(0); would return the String "NULL" and getName(33); would return the String "EXCLAMATION POINT". Is there anything in the JDK for this? ...

java

unicode

character-encoding

Does anyone know of a site that has a searchable index of Unicode symbols?

Is there a site that would allow you to search for "arrow", for example, and provide all the Unicode symbols that match the keyword "arrow"? This would be very handy :) Ideally, it would also show the Unicode symbol rendered as an image for users without the requisite fonts who would otherwise see these chars as squares. ...

unicode

reference

symbols