utf-8

Deployed (i.e. server-side) MySQL db is UTF-8 but local version isn't

At least I think that's the problem. In my staging and production databases (both on the same server) I have a table with a text field that holds HTML. Some of these have web quotes, which are displayed fine. However, locally I have my development database, which is a copy of the staging database (it was copied by taking a dump of the ...

Python UTF-8 comparison

a = {"a":"çö"} b = "çö" a['a'] >>> '\xc3\xa7\xc3\xb6' b.decode('utf-8') == a['a'] >>> False What is going on there? Edit: I'm sorry, it was my mistake. It is still False. I'm using Python 2.6 on Ubuntu 10.04. ...
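
A minimal sketch of what the excerpt seems to run into, written in Python 3 syntax rather than the Python 2.6 from the question (that switch is my assumption, as are the variable names). It only shows that a decoded text string and raw UTF-8 bytes never compare equal, and that the comparison works once both sides are the same type:

```python
# Sketch only: Python 3 syntax, not the Python 2.6 used in the question.
text = "çö"                  # text (Unicode code points)
raw = text.encode("utf-8")   # the UTF-8 bytes for the same characters

# Comparing text with bytes is False, even though they "mean" the same thing.
mixed = (text == raw)

# Bring both sides to the same type before comparing.
as_text = (text == raw.decode("utf-8"))    # compare as text
as_bytes = (text.encode("utf-8") == raw)   # compare as bytes

print(raw)       # the byte form, as in the question's '\xc3\xa7\xc3\xb6'
print(mixed, as_text, as_bytes)
```

In the question, b.decode('utf-8') is decoded text while a['a'] still holds the raw bytes, so decoding both sides (or neither) is the usual way out.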

Using preg_replace/preg_match with UTF-8 characters - specifically Māori macrons

I'm writing some autosuggest functionality which suggests page names that relate to the terms entered in the search box on our website. For example typing in "rubbish" would suggest "Rubbish & Recycling", "Rubbish Collection Centres" etc. I am running into a problem that some of our page names include macrons - specifically the macron ...

How can I detect non-western characters?

I want to disallow certain UTF-8 input (server-side), e.g. eastern languages, where example input might be " 伊 ". However, I do want to continue supporting other Latin or "Latin-like" characters, such as the Welsh ŵ and ŷ, so checking against Latin-1 is not possible. What are my options? (if language specific, PHP preferred) Thanks ve...

PHP encoding issue

Hi, I have trouble displaying Cyrillic characters properly. I looked in forums and tried a few different things, but nothing works. The site runs on PHP / MySQL. The MySQL tables' charset is utf8, and the collation is utf8_general_ci. The name entry in the DB looks correct (in phpMyAdmin): Sasha Рукина. Output on page http://www.sodaq.com/: Sasha ?????? Inside...

Sniffing and displaying TCP packets in UTF-8

Hi everyone, I am trying to use tcpdump to display the content of TCP packets flowing on my network. I have something like: tcpdump -i wlan0 -l -A. The -A option displays the content as ASCII text, but my text seems to be UTF-8. Is there a way to display UTF-8 properly using tcpdump? Do you know any other tools which could help? Many...

Unzip UTF-8 to ASCII

I took some files from Linux hosting to my Windows machine via FTP, and when I check the file encodings they are UTF-8 without BOM. Now I need to convert those files back to ASCII and send them to my other Linux server. I zipped the files. Can I do something like this while unzipping: if a file is a text file in UTF-8 format, then convert it to ASCII? I want to make con...

Transform project from windows-1256 to utf-8 charset, what are the right steps?

I have a PHP & MySQL script that uses the windows-1256 charset. I now want to modify the whole script so that it is built completely on the utf-8 charset, from the MySQL database to the PHP files. What are the right steps to achieve that? Note: I use a non-Latin language in the script (Arabic). ...

The browser shows me "???" instead of UTF-8 characters

The browser shows me "???" instead of UTF-8 characters. What is the cause and how can I fix it? Here is the HTML file: <HTML> <title>Search Engine</title> <form action='search.php' method='GET'> <font face='sans-serif' size='5'> <center> My Search Engine.<br> <input ...

LANG and sed on OSX

In a recent question it was noted that on OSX running sed on a non-ASCII file gave strange results. For instance, if you do (/usr/bin/cal is a random binary file) sed 's/[^A-Z]//' /usr/bin/cal sed will remove all of the printable characters other than A-Z, but many nonprintable characters remain. If, however, you do LANG='' sed 's/[...

JSP and tag files UTF-8 encoding

Hello, I am using Spring 3.0.3 + sitemesh + JSP and I am having trouble with the encoding of the result page. I have used Spring's CharacterEncodingFilter to encode response and request with UTF-8, and I have stated the appropriate contentType in JSTLViewResolver. I have also saved my JSPs and tag files in UTF-8 format. What I would really want t...

WSGI content encoding

If I execute the following Python 3.1 program, I see only � instead of the correct characters in my browser. The file itself is UTF-8 encoded and the same encoding is sent with the response. from wsgiref.simple_server import make_server page = "<html><body>äöü€ßÄÖÜ</body></html>" def application(environ, start_response): start_res...
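
Since the excerpt is cut off, the actual cause is not visible here. As a hedged sketch only, this is one way a WSGI application can make the body bytes and the declared charset agree: encode the page explicitly and name the same encoding in Content-Type. The PAGE constant and the header list are my own illustration, not the asker's code:

```python
# Illustrative sketch: PAGE and the header values are assumptions.
PAGE = "<html><body>äöü€ßÄÖÜ</body></html>"

def application(environ, start_response):
    # WSGI response bodies are bytes, so encode the text explicitly.
    body = PAGE.encode("utf-8")
    headers = [
        # Declare the same encoding that was used to produce the bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        # Length is counted in bytes, not in characters.
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The callable can be handed to wsgiref.simple_server.make_server, as in the question, to try it in a browser.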

How do I quote a UTF-8 String Literal in SQLite3

I'm looking to encode and store Unicode in a SQLite database. Is there any way to raw encode a UTF-8 (Unicode) string literal in a SQL query? I'm looking for something similar to Java, where I can toss a \u00E9 into a string and have it automagically upconvert to Unicode. ...

Removing hex characters from xml file with php

So to start, I have an array of XML files. These files need to be iterated through and checked for certain 'unrecognized' hexadecimal characters and replaced with normal UTF-8 text, or some kind of placeholder. I've tried iterating through the files and replacing the hex codes using both str_replace and preg_replace with no luck. My ul...

Why is PHP/MySQL inserting my Chinese characters differently?

G'day all, I have a baffling problem whilst trying to insert some Chinese characters into my MySQL database from PHP using mysqlnd. I have a form that accepts some details, e.g. Internal Name, External Name, Shot Name, etc... I enter "语言测试" (Language Testing) into all three fields in the form. I am submitting my information using an inn...

How to skip invalid characters in XML file using PHP

I'm trying to parse an XML file using PHP, but I get an error message: parser error : Char 0x0 out of allowed range in. I think it's because of the content of the XML; I think there is a special symbol "☆". Any ideas what I can do to fix it? I also get: parser error : Premature end of data in tag item line. What might be caus...

Emacs, unicode, xterm mouse escape sequences, and wide terminals

Hi all, Short version: When using emacs' xterm-mouse-mode, somebody (emacs? bash? xterm?) intercepts xterm's control sequences and replaces them with \0. This is a pain on wide monitors because only the first 223 columns have mouse support. What is the culprit, and how can I work around it? From what I can tell this has something to do with ...

Is "VARCHAR(255) CHARACTER SET utf8" 255 bytes or 255 characters?

I've declared a field in my INNODB/MySQL table as VARCHAR(255) CHARACTER SET utf8 NOT NULL; however, when inserting, my data is truncated at 255 bytes, not characters. This might chop the trailing two-byte code point in two, leaving an invalid character. Any ideas what I might be doing wrong? EDIT: A sample session is l...
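
To make the bytes-versus-characters distinction concrete, here is a small Python sketch. It does not talk to MySQL; the sample string and the 255-byte cut are assumptions chosen to mirror the question. It shows how cutting UTF-8 at a byte boundary can split a two-byte code point:

```python
# Illustrative sketch: the sample data is an assumption, not the asker's.
sample = "é" * 200                        # 200 characters, 2 bytes each

char_count = len(sample)                  # length in characters
byte_count = len(sample.encode("utf-8"))  # length in bytes

# Cut at 255 bytes: an odd count, so the last "é" is split in half.
cut = sample.encode("utf-8")[:255]

try:
    cut.decode("utf-8")
    split_code_point = False
except UnicodeDecodeError:
    # The final byte is a lead byte with no continuation byte after it.
    split_code_point = True

# Dropping the incomplete trailing byte yields valid text again.
safe = cut.decode("utf-8", errors="ignore")

print(char_count, byte_count, split_code_point, len(safe))
```

Here the 200-character string takes 400 bytes, and the 255-byte cut leaves 127 whole characters plus one dangling byte, which is the kind of invalid trailing character the question describes.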

Formatting string for xml attribute in php

I have some strings that are valid in my database, but when I include them in an attribute of a UTF-8 XML output they give me the following error: XML Parsing Error: not well-formed. My current code (simplified): header('Content-Type: text/xml'); echo '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'; echo '<root attribute=...

Should Unicode be allowed in usernames?

Why do most (all?) websites only support usernames in ASCII? Are there any security considerations if an admin decides to start accepting Unicode usernames? ...