questions about unicode | ansaurus

unicode

Cannot insert non-Latin symbols in MySQL

I'm writing a web app using MySQL 5.1.45, Tomcat 5.5.28 and Hibernate 3. When I try to save a string that contains non-Latin characters (for example Упячка), an error occurs: 1589 [main] WARN org.hibernate.util.JDBCExceptionReporter - SQL Error: 1366, SQLState: HY000 1589 [main] ERROR org.hibernate.util.JDBCExceptionReporter - Inco...

Problem using unicode in URLs with cgi.PATH_INFO in ColdFusion

Hi there, my ColdFusion (MX7 on IIS 6) site has search functionality that appends the search term to the URL, e.g. http://www.example.com/search.cfm/searchterm. The problem I'm running into is that this is a multilingual site, so the search term may be in another language, e.g. القاهرة, leading to a search URL such as http://www.example.com/s...
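The question does not show the ColdFusion side, so here is a language-neutral sketch in Python. The helper name `search_url` is hypothetical. The usual approach is to percent-encode the UTF-8 bytes of the search term before putting it into the path, then decode them again on the server. Whether IIS 6 and `cgi.PATH_INFO` decode the path as UTF-8 is a separate configuration question that this sketch does not address.

```python
from urllib.parse import quote, unquote

BASE = "http://www.example.com/search.cfm/"

def search_url(term: str) -> str:
    # quote() encodes the term as UTF-8 and percent-escapes every byte that
    # is not URL-safe; safe="" also escapes "/" so the term stays one segment.
    return BASE + quote(term, safe="")

url = search_url("القاهرة")
print(url)  # pure ASCII: http://www.example.com/search.cfm/%D8%A7%D9%84...

# The receiving side reverses the process (unquote also assumes UTF-8).
term = unquote(url[len(BASE):])
```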

Does the Boost library really support Unicode under Windows?

Under Windows the only way to get Unicode support is to use wchar_t (UTF-16 under Windows) instead of char. The problem is that I found that at least one of the Boost libraries (boost::program_options) doesn't support Unicode at all: you are not able to compile the examples as Unicode. Shouldn't Boost be able to be compiled with wide str...

Large static arrays are slowing down class load, need a better/faster lookup method

I have a class with a couple of static arrays: an int[] with 17,720 elements and a string[] with 17,720 elements. I noticed that when I first access this class it takes almost 2 seconds to initialize, which causes a pause in the GUI that's accessing it. Specifically, it's a lookup for Unicode character names. The first array is an index into the ...
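One common remedy has two parts: defer building the table until the first lookup, and search the sorted index array with a binary search. Below is a minimal Python sketch. The names `_table` and `char_name` are hypothetical, and the four entries stand in for the real 17,720-entry data. In Python itself, `unicodedata.name()` already provides this lookup.

```python
import bisect
from functools import lru_cache

@lru_cache(maxsize=None)
def _table():
    # Built on the first call instead of at import/class-load time.
    # Hypothetical sample data; a real table would be loaded from a resource file.
    pairs = [
        (0x41, "LATIN CAPITAL LETTER A"),
        (0xDF, "LATIN SMALL LETTER SHARP S"),
        (0x2013, "EN DASH"),
        (0x42, "LATIN CAPITAL LETTER B"),
    ]
    pairs.sort()  # binary search requires the code points to be sorted
    return [cp for cp, _ in pairs], [name for _, name in pairs]

def char_name(cp: int):
    codes, names = _table()
    i = bisect.bisect_left(codes, cp)  # O(log n) lookup in the index array
    if i < len(codes) and codes[i] == cp:
        return names[i]
    return None  # code point not in the table
```

Building the table lazily moves the cost from class load to the first lookup. If even that pause matters, the table can be built on a background thread at startup.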

latin1/unicode conversion problem with ajax request and special characters

Server is PHP5 and the HTML charset is latin1 (ISO-8859-1). With regular form POST requests, there's no problem with "special" characters like the en dash (–), for example. Although I don't know for sure why, it works, probably because there is a representable character for the browser at char code 150 (which is what I see in PHP on the serve...
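The character shown is U+2013 EN DASH. Strict ISO-8859-1 has no slot for it. Windows-1252, which browsers use in practice for pages labelled iso-8859-1, has it at byte 150 (0x96), and that matches what PHP sees for form posts. AJAX requests send UTF-8 instead, so the same character arrives as three bytes. A small Python sketch of the byte values involved:

```python
dash = "\u2013"  # EN DASH, the character from the question

print(dash.encode("cp1252"))  # b'\x96': byte 150, as in a regular form POST
print(dash.encode("utf-8"))   # b'\xe2\x80\x93': what an AJAX request sends

try:
    dash.encode("latin-1")    # strict ISO-8859-1 has no en dash
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")

# Server-side conversion of an AJAX value back to the page's single-byte charset.
ajax_bytes = b"\xe2\x80\x93"
print(ajax_bytes.decode("utf-8").encode("cp1252"))  # b'\x96'
```

In PHP, the corresponding conversion would typically be done with `iconv` or `mb_convert_encoding`.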

character-encoding

Why does passing a unicode string to str() throw an exception in Python?

For example, str(u'לשום') throws an error. How can I prevent this? ...
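In Python 2, calling `str()` on a `unicode` object encodes it with the default ASCII codec. That fails on any non-ASCII character, such as Hebrew letters. The fix is to encode explicitly, e.g. `u'לשום'.encode('utf-8')`. The sketch below uses Python 3 syntax to show the same explicit encode step:

```python
text = "\u05dc\u05e9\u05d5\u05dd"  # the Hebrew word from the question

try:
    text.encode("ascii")  # the implicit step that fails in Python 2's str()
except UnicodeEncodeError as exc:
    print("ascii codec failed:", exc.reason)

data = text.encode("utf-8")  # explicit encoding: each of these letters takes 2 bytes
print(len(data))             # 8
print(data.decode("utf-8") == text)  # True: lossless round trip
```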

Why does Python output a string and a unicode of the same value differently?

I'm using Python 2.6.5 and when I run the following in the Python shell, I get: >>> print u'Andr\xc3\xa9' AndrÃ© >>> print 'Andr\xc3\xa9' André >>> What's the explanation for the above? Given u'Andr\xc3\xa9', how can I display the above value properly in an html page so that it shows André instead of AndrÃ©? ...
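The unicode literal u'Andr\xc3\xa9' does not contain the two UTF-8 *bytes* of é. It contains the two *code points* U+00C3 and U+00A9 (Ã and ©), which is why it prints as AndrÃ©. If the data really is UTF-8 that was decoded as Latin-1 somewhere upstream, one way to repair it is to reverse that step. Here is a Python 3 sketch; fixing the upstream decode is the better long-term solution.

```python
garbled = "Andr\xc3\xa9"  # the UTF-8 bytes of "é" stored as two code points
# Printed as-is this shows "AndrÃ©": U+00C3 is "Ã" and U+00A9 is "©".

# latin-1 maps code points 0-255 one-to-one onto bytes, so this recovers
# the original UTF-8 bytes, which can then be decoded properly.
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed == "Andr\u00e9")  # True

# For an HTML page, a numeric character reference works in any page encoding.
print(fixed.encode("ascii", "xmlcharrefreplace"))  # b'Andr&#233;'
```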

How do I convert a unicode to a string at the Python level?

The following unicode and string can exist on their own if defined explicitly: >>> value_str='Andr\xc3\xa9' >>> value_uni=u'Andr\xc3\xa9' If I only have u'Andr\xc3\xa9' assigned to a variable like above, how do I convert it to 'Andr\xc3\xa9' in Python 2.5 or 2.6? EDIT: I did the following: >>> value_uni.encode('latin-1') 'Andr\xc3\...
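Encoding with latin-1 works here because Latin-1 maps code points U+0000 to U+00FF one-to-one onto byte values. Every code point below 256 therefore becomes the byte with the same number. In Python 3 terms, the result is a `bytes` object:

```python
value_uni = "Andr\xc3\xa9"

value_str = value_uni.encode("latin-1")
print(value_str)                                # b'Andr\xc3\xa9'
print(value_str.decode("utf-8") == "Andr\u00e9")  # True: the bytes are valid UTF-8

# The trick only works while every code point is below 256.
try:
    "Andr\u00e9 \u2013".encode("latin-1")
except UnicodeEncodeError:
    print("code points above U+00FF have no latin-1 byte")
```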

UTF-8 format in XML

I want to know how to store symbols like è in an XML file. If I store this symbol in an XML file, the file shows it as �. I inserted <?xml version="1.0" encoding="UTF-8"?> at the top of the XML file, but it still does not display correctly. Thanks in advance ...
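The XML declaration only *states* the encoding; the bytes in the file must actually be UTF-8. A � usually appears when the file was saved as Latin-1 or Windows-1252 while declaring UTF-8. Below is a Python sketch using the standard `xml.etree.ElementTree` module. The `xml_declaration` argument needs Python 3.8 or later, and the element name `word` is just an example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("word")
root.text = "\u00e8"  # the è from the question

# Serialize as real UTF-8 bytes with a matching declaration.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(data)  # è is stored as the two bytes \xc3\xa8

# Parsing the bytes back returns the same character.
print(ET.fromstring(data).text == "\u00e8")  # True

# A UTF-8 declaration on top of a Latin-1 byte (0xE8 for è) is not well-formed.
bad = b'<?xml version="1.0" encoding="UTF-8"?><word>\xe8</word>'
parse_failed = False
try:
    ET.fromstring(bad)
except ET.ParseError as exc:
    parse_failed = True
    print("parse error:", exc)
```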

Why do you need this method inside a Django model?

class mytable(models.Model): abc = ... xyz = ... def __unicode__(self): Why is the def __unicode__ necessary? ...
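On Python 2, Django calls `__unicode__` to get the human-readable text for a model instance, for example in the admin's object lists and in the shell. Without it you get a generic label built from the class name. On Python 3, Django uses `__str__` instead. Below is a Django-free sketch of the same mechanism; the class names `WithoutStr` and `WithStr` are hypothetical.

```python
class WithoutStr:
    def __init__(self, title):
        self.title = title

class WithStr(WithoutStr):
    def __str__(self):
        # The text shown wherever the object is displayed.
        return self.title

print(str(WithoutStr("Hello")))  # e.g. <__main__.WithoutStr object at 0x...>
print(str(WithStr("Hello")))     # Hello
```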

What is better for PHP developers - Unicode or UTF-8?

What is better for PHP developers - Unicode or UTF-8? I am going to create an international CMS. So I am going to have clients all over the world. They will speak all possible languages. What encoding format is better for browser recognition and for DB data storage? ...
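Unicode and UTF-8 are not alternatives. Unicode is the character set, a numbered list of code points. UTF-8 is one encoding that stores those code points as bytes, alongside others such as UTF-16. The practical choice is therefore which encoding to use, and UTF-8 is the usual one for web pages and databases. A short Python illustration of the difference:

```python
text = "Hello, \u4e16\u754c"  # "Hello, " plus two CJK characters

print(len(text))                      # 9 code points (the Unicode view)
print(len(text.encode("utf-8")))      # 13 bytes: 7 ASCII x 1 + 2 CJK x 3
print(len(text.encode("utf-16-le")))  # 18 bytes: 9 code points x 2
```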

printf field width: bytes or chars?

The printf/fprintf/sprintf family supports a width field in its format specifier. I have a question about the case of (non-wide) char array arguments: is the width field supposed to mean bytes or characters? What is the (correct or de facto) behaviour if the char array corresponds to (say) a raw UTF-8 string? (I know that normally I should...
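For narrow `char` strings, C's field width for `%s` counts bytes, because a "character" in C is a byte. A UTF-8 string with multibyte characters therefore ends up visually narrower than the width suggests. Python's bytes formatting follows the same byte-counting rule, which makes the difference easy to see next to text formatting:

```python
s = "\u00e9"  # é: one character, two bytes in UTF-8

# Byte-string formatting pads by bytes, like printf("%5s") on a char array.
print(b"[%5s]" % s.encode("utf-8"))   # b'[   \xc3\xa9]': 3 spaces + 2 bytes

# Text formatting pads by code points.
print(("[%5s]" % s).encode("utf-8"))  # b'[    \xc3\xa9]': 4 spaces + 1 character
```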

Comparing UTF-8 strings in Java

In my Java program, I am retrieving some data from XML. This XML has a few international characters and is encoded in UTF-8. I read this XML using an XML parser. Once I retrieve a particular international string from the XML parser, I need to compare it with a set of predefined strings. The problem is that when I use string.equals on the international string ...
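The question is cut off before the failure is described, so this shows only one common cause. The XML text and the predefined strings may represent the same accented character with different code point sequences: precomposed, or base letter plus combining mark. `equals` treats those as different strings. (Reading the file with the wrong charset is another frequent cause.) Normalizing both sides first fixes the normalization case; in Java that is `java.text.Normalizer.normalize(s, Normalizer.Form.NFC)`. A Python sketch of the same idea:

```python
import unicodedata

composed = "caf\u00e9"     # é as the single code point U+00E9
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT U+0301

print(composed == decomposed)  # False: different code point sequences

def same_text(a: str, b: str) -> bool:
    # Compare after bringing both strings to Normalization Form C.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(composed, decomposed))  # True
```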

Is it a good idea to use unicode symbols as Java identifiers?

I have a snippet of code that looks like this: double Δt = lastPollTime - pollTime; double α = 1 - Math.exp(-Δt / τ); average += α * (x - average); Just how bad an idea is it to use unicode characters in Java identifiers? Or is this perfectly acceptable? ...
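Python 3 also accepts Unicode letters in identifiers (PEP 3131), so the same trade-off exists there. One extra detail is that Python normalizes identifiers with NFKC, so visually different spellings can name the same variable. Below is the Java snippet translated as a sketch; `ema_update` is a hypothetical name.

```python
import math

def ema_update(average: float, x: float, Δt: float, τ: float) -> float:
    # Exponential moving average with time constant τ, as in the Java snippet.
    α = 1 - math.exp(-Δt / τ)
    return average + α * (x - average)

print(ema_update(0.0, 1.0, 1.0, 1.0))
```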

What is the universal way to use file I/O API with unicode filenames?

In Windows there is a common problem: filenames must be converted to the local code page before they are passed to open(). Of course, I could use Win32::API for that, but I don't want my script to be platform-dependent. At the moment I have to write something like: open IN, "<", encode("cp1251", $filename) or die $!;...
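The question is about Perl. For comparison, here is a Python sketch of what a platform-neutral API looks like: the script passes Unicode strings and lets the runtime convert them. On Windows, CPython uses the wide-character (W) file APIs for `str` paths. On POSIX, it encodes them with the filesystem encoding. The file name below is just an example.

```python
import os
import tempfile

name = "\u0444\u0430\u0439\u043b.txt"  # Cyrillic "файл.txt"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, name)
    # No manual cp1251 step: open() receives the Unicode path as-is.
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")
    found = name in os.listdir(d)  # listdir() returns str names for a str path
    print(found)  # True
```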

How do I read Unicode characters from an MS Access 2007 database through Java?

In Java, I have written a program that reads a UTF8 text file. The text file contains a SQL query of the SELECT kind. The program then executes the query on the Microsoft Access 2007 database and writes all fields of the first row to a UTF8 text file. The problem I have is when a row is returned that contains unicode characters, such as...

How do I properly implement Unicode passwords?

Adding support for Unicode passwords is an important feature that should not be ignored by developers. Still, adding support for Unicode in passwords is a tricky job because the same text can be encoded in different ways in Unicode, and you don't want to prevent people from logging in because of this. Let's say that you'll store the pa...
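A common approach is to normalize the password to one Unicode normalization form before encoding it as UTF-8 and hashing it. Canonically equivalent inputs then produce the same hash. Whether to use NFC or NFKC is a design decision; this sketch assumes NFKC. The function name `hash_password` and the iteration count are illustrative choices, not recommendations from the question.

```python
import hashlib
import os
import unicodedata

def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # Normalize first so equivalent code point sequences hash identically,
    # then hash a well-defined byte encoding (UTF-8).
    normalized = unicodedata.normalize("NFKC", password)
    return hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"), salt, iterations)

salt = os.urandom(16)
a = hash_password("caf\u00e9", salt)   # precomposed é
b = hash_password("cafe\u0301", salt)  # e + combining acute accent
print(a == b)  # True
```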

unicode-normalization

How to use Unicode characters in a vim script?

I'm trying to get vim to display my tabs as ⇥ so they cannot be mistaken for actual characters. I'd hoped the following would work: if has("multi_byte") set lcs=tab:⇥ else set lcs=tab:>- endif However, this gives me E474: Invalid argument: lcs=tab:⇥ The file is UTF-8 encoded and includes a BOM. Googling "vim encoding" or ...

Delphi Unicode String Type Stored Directly at its Address (or "Unicode ShortString")

I want a string type that is Unicode and that stores the string directly at the address of the variable, as is the case with the (Ansi-only) ShortString type. I mean, if I declare S: ShortString and let S := 'My String', then, at @S, I will find the length of the string (as one byte, so the string cannot contain more than 255 characters...

Python file input string: how to handle escaped unicode characters?

In a text file (test.txt), my string looks like this: Gro\u00DFbritannien Reading it, python escapes the backslash: >>> file = open('test.txt', 'r') >>> input = file.readline() >>> input 'Gro\\u00DFbritannien' How can I have this interpreted as unicode? decode() and unicode() won't do the job. The following code writes Gro\u00DFbr...
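The file contains the six literal characters `\u00DF`, not an escape that Python has already interpreted. The `unicode_escape` codec turns such sequences into real characters; in Python 2, the equivalent call is `input.decode('unicode_escape')`. The Python 3 sketch below assumes the rest of the line is plain ASCII, because `unicode_escape` treats other bytes as Latin-1.

```python
line = "Gro\\u00DFbritannien"  # the literal backslash sequence read from the file

decoded = line.encode("ascii").decode("unicode_escape")
print(decoded == "Gro\u00dfbritannien")  # True
```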
