questions about utf-8 | ansaurus

utf-8

Tiles encoding problem

Hello, I'm trying to use UTF-8 encoding for the Spring application I'm developing, but I have problems getting the correct encoding when inserting attributes from Tiles. I have this fragment in my JSP template: <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title><tiles:getAsString name="title"...

How to determine the actual encoding of text stored in a MySQL table field?

Hello guys, I have a very simple question. I just need to determine the encoding (UTF-8, latin1) of text stored in a MySQL table field. Thanks for your help! Guillermo ...

How do I get WordPress to output Hebrew (works when in plain HTML pages)?

I have a template I've created, and it displays Hebrew. However, when I take the HTML code and paste it into a WordPress template, the Hebrew letters display as question marks (????). I'm guessing this has something to do with the format the files are saved or output in? What do I need to do to have the Hebrew output as Hebrew?...

Unicode generated by toEscapedUnicode method is without spaces

For this word चौरेउत्तमयादव the escaped Unicode is: \u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940 \u0930\u0940\u091D\u0941\u092E\u0932 \u091C\u093F\u0935\u0924\u0930\u093E\u092E Note that it has spaces before \u0930 and \u091C. But when I try this in my code: String tempString=Strings.toEscapedUnicode(strString); This method to c...

MySQL CHAR() Function and UTF8 Output?

+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8 ...

How do I find the number of bytes within UTF-8 string with PHP?

I have the following function from the php.net site to determine the number of bytes in an ASCII and UTF-8 string:
<?php
/**
 * Count the number of bytes of a given string.
 * Input string is expected to be ASCII or UTF-8 encoded.
 * Warning: the function doesn't return the number of chars
 * in the string, but the number of bytes.
 * ...
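
For comparison, here is a minimal Python 3 sketch of the same idea (the question itself is about PHP). It contrasts the number of characters in a string with the number of bytes its UTF-8 encoding occupies. The helper name `utf8_byte_length` and the sample strings are made up for illustration.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Illustrative samples: ASCII, a precomposed accented Latin letter,
# Hebrew, and the euro sign.
samples = ["abc", "\u00e8", "\u05e9\u05dc\u05d5\u05dd", "\u20ac"]

for s in samples:
    # len(s) counts code points; utf8_byte_length(s) counts encoded bytes.
    print(repr(s), "chars:", len(s), "bytes:", utf8_byte_length(s))
```

ASCII characters take one byte each in UTF-8, while the other samples take two or three bytes per character, which is why the two counts diverge.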

How to get 'è' (and not 'e') with activerecord and ruby 1.8.7

Hello folks, I am writing a simple script to update a table's data. I am unable to get a record through a field named "Agliè"; the problem is "è". c = Comune.find_by_denominazione_italiano_tedesco('Agliè') I realised that the problem can be patched using "Aglie", but I need to preserve the accent difference (these are town names, some a...

GSM-7 conversion- and septet-encoding library in Ruby?

I am looking for a pure Ruby solution to convert UTF-8 to GSM-7 and back, and do septet encoding/decoding along the way. Background here is: Sending and receiving SMS via a gateway and via REST-requests. I found a solution with libiconv (http://mobiletidings.com/2009/07/06/gsm-7-encoding-gnu-libiconv/) (which works more or less, but is...

Signedness of char and Unicode in C++0x

According to the C++0x working draft, the new char types (char16_t and char32_t) for handling Unicode will be unsigned (uint_least16_t and uint_least32_t will be the underlying types). But as far as I can see (not very far perhaps) a type char8_t (based on uint_least8_t) is not defined. Why? And it's even more confusing when you see that a ...

python json loads and unicode

I have the following case where I get the result of a UTF-8 encoded HTTP response. I want to load the response content (JSON). However, I don't know why I have to call json.loads twice to get the final list:
result = urllib2.urlopen(req).read()
print result, type(result)
#=> "[{\"pk\": 66, \"model\": \"core.job\", \"fields\": {\"customer\"...
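
One plausible cause, offered only as an assumption since the excerpt is truncated: the printed body starts with a quoted string containing escaped quotes, which is what a JSON document looks like after being serialized twice. The Python 3 sketch below reproduces that situation with made-up data (`inner` and `body` are hypothetical stand-ins, not the asker's actual payload).

```python
import json

# Hypothetical payload standing in for the real response content.
inner = [{"pk": 66, "model": "core.job"}]

# Assumed server-side mistake: the list is serialized, then the resulting
# JSON text is serialized again as a JSON string.
body = json.dumps(json.dumps(inner))

first = json.loads(body)    # undoes the outer layer: still a str of JSON text
second = json.loads(first)  # undoes the inner layer: the actual list

print(type(first).__name__, type(second).__name__)
```

If this is what is happening, the cleaner fix would be on the producing side (serialize once) rather than calling `json.loads` twice on the client.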

SQLite, python, unicode, and non-utf data

I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode stri...

.Net using Chr() to parse text

I'm building a simple client-server chat system. The clients send data to the server and the server resends the data to all the other clients. I'm using the TcpListener and NetworkStream classes to send the data between the client and the server. The fields I need to send are, for example: name, text, timestamp, etc. I separate them u...

How to encode HTML non-ASCII data to UTF-8 in Python

I tried to do that, and I got these errors:
>>> import re
>>> x = 'Ingl\xeas'
>>> x
'Ingl\xeas'
>>> print x
Ingl�s
>>> x.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(...
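
A minimal sketch of what is likely going on, written for Python 3 rather than the Python 2.6 shown in the question: the byte 0xEA is "ê" in Latin-1 (ISO-8859-1) but is not a valid standalone byte in UTF-8, so decoding as UTF-8 fails. Treating the source as Latin-1 is an assumption here; the excerpt does not state the original encoding.

```python
raw = b"Ingl\xeas"  # the same bytes as in the question

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xEA announces a multi-byte sequence, but the next byte ('s')
    # is not a continuation byte, so the UTF-8 decoder rejects it.
    print("not valid UTF-8:", exc.reason)

# Assumption: the bytes are Latin-1. Decode to text first, then re-encode.
text = raw.decode("latin-1")
utf8_bytes = text.encode("utf-8")

print(text)
print(utf8_bytes)
```

The general pattern is to decode with the encoding the data actually has, work with text internally, and encode to UTF-8 only on output.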

Ruby install jcode

I'm trying to get 'jcode' for Ruby, but when I type "gem install jcode" it says nothing exists. Does anyone know why? I'm trying to manipulate UTF-8 encoded strings. Sorry for the noob question, as I'm a Ruby noob :o ...

python unicode implementation (using external programs: cygnative plink ssh rsync)

I have a backup application in Python that needs to work on Windows. It needs Unicode compatibility (to be able to back up directories whose names contain non-ASCII characters, like Italian accents). The problem is that it uses external programs (plink, cygwin, ssh and rsync) and I can't get them working. The prototype is 32 lines long, please take a look: # ...

Properly handling unicode characters in Rails

By default, Rails allows users of our application to input non-UTF-8 data, such as: ¶®«¼. However, when we attempt to retrieve the data from our database and render it in a template, Rails incorrectly assumes that it is in UTF-8 format and throws an error. ArgumentError: invalid byte sequence in UTF-8 What is the best way to handle this? ...

"Broken" unicode strings encoded in UTF-8?

I have been studying unicode and its Python implementation now for two days, and I think I'm getting a glimpse of what it is about. Just to get confident, I'm asking if my assumptions for my current problems are correct. In Django, forms give me unicode strings which I suspect to be "broken". Unicode strings in Python should be encoded ...

How to find if a character belongs to a particular codepage using C++ or calling the WinAPI

How can we find out if a character belongs to a particular codepage? Or how can we determine whether a character fits into the currently active IME for an application? ...

Change entire db's collation and solve illegal mix of collations

Hi, I'm having a problem when doing LIKE '' queries in MySQL. These are my variables:
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir C:\xampp\mysql\share\charsets\
1...

Ruby, text parsed from XML feed with Nokogiri having encoding issues on display through PHP

I'm grabbing an XML feed that claims to be <?xml version="1.0" encoding="ISO-8859-1"?> with Nokogiri and inserting the text into MySQL with ActiveRecord (OUTSIDE of Rails). Using HTMLEntities to decode. Here is an example (LON: SDM) I can't seem to handle some of these HTML special characters properly. When redisplaying...
