utf-8

How to show Chinese words, not Unicode escapes

This is my code: from whoosh.analysis import RegexAnalyzer rex = RegexAnalyzer(re.compile(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)")) a=[(token.text) for token in rex(u"hi 中 000 中文测试中文 there 3.141 big-time under_score")] self.render_template('index.html',{'a':a}) and it shows this on the web page: [u'hi', u'\u4e2d', u'000', u'...
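One plausible reading of the output above is that the template renders the Python 2 repr of the list, which escapes non-ASCII characters, rather than the strings themselves. A minimal sketch in Python 3, assuming the standard `re` module as a stand-in for whoosh's RegexAnalyzer (so the example is self-contained) and `ascii()` as an imitation of the Python 2 repr:

```python
import re

# Same pattern as in the question, written as a Python 3 raw string.
pattern = re.compile(r"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)")
text = "hi 中 000 中文测试中文 there 3.141 big-time under_score"

# Plain regex tokenization; this stands in for the whoosh analyzer.
tokens = [m.group(0) for m in pattern.finditer(text)]

# ascii() escapes non-ASCII characters, much like a Python 2 list repr.
escaped = ascii(tokens)

# Joining the strings gives readable text to pass to a template.
readable = ", ".join(tokens)

print(escaped)
print(readable)
```

Passing the joined string (or iterating over the tokens in the template) avoids rendering the list's repr at all.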

inserting latin1-encoded text into utf8 tables (forgot to use mysql_set_charset)

I have a PHP web app with MySQL tables taking utf8 text. I recently converted the data from latin1 to utf8 along with the tables and columns accordingly. I did, however, forget to use mysql_set_charset and the latest incoming data I would assume came through the MySQL connection as latin1. I don't know what happens when latin1 comes in t...
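A rough illustration of what typically happens in this situation: UTF-8 bytes sent over a connection declared as latin1 get read one byte per character and are then re-encoded, producing double-encoded text. The sketch below uses Python's `latin-1` codec as an approximation (MySQL's latin1 differs slightly from ISO-8859-1), and the sample word is an assumption, not data from the question.

```python
original = "café"

# The application sends UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# A latin1 connection treats every byte as one latin1 character.
misread = utf8_bytes.decode("latin-1")

# Reversing the misreading recovers the text, provided no bytes were lost.
repaired = misread.encode("latin-1").decode("utf-8")

print(misread)
print(repaired)
```

Rows written before the mistake are already correct, so a repair like this should only be applied to the affected rows.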

How to decode UTF-8 encoded String using java?

I have a UTF-8 encoded string in an email and want to decode it. I use Java's MimeUtility.decodeText, but it doesn't decode properly. Example string: =?UTF-8?B?0J/RgNC40LLQtdGC?==?UTF-8?B?0JfQtNGA0LDQstGB0YLQstGD0LnRgtC1?= When I used MimeUtility.decodeText("=?UTF-8?B?0J/RgNC40LLQtdGC?==?UTF-8?B?0JfQtNGA0LD...
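The example string holds two RFC 2047 encoded-words with no whitespace between them, which may be why a strict decoder rejects it. As a sketch only, in Python rather than Java, each encoded-word can be pulled out with a regular expression and base64-decoded on its own. The helper name `decode_b_words` is hypothetical, and only the `B` (base64) encoding is handled.

```python
import base64
import re

# Matches =?charset?B?payload?= ; the Q encoding is deliberately not handled.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")

def decode_b_words(header):
    """Decode every base64 encoded-word in header and concatenate the results."""
    parts = []
    for charset, payload in ENCODED_WORD.findall(header):
        parts.append(base64.b64decode(payload).decode(charset))
    return "".join(parts)

example = "=?UTF-8?B?0J/RgNC40LLQtdGC?==?UTF-8?B?0JfQtNGA0LDQstGB0YLQstGD0LnRgtC1?="
decoded = decode_b_words(example)
print(decoded)
```

In Java, a comparable workaround would be to insert whitespace between adjacent encoded-words before decoding; whether that suits the mail source in question is an assumption.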

Malformed UTF-8 character error in regular expression in Perl

I get a 'Malformed UTF-8 character' error when I pass some scalar data to XML::Simple or Data::Dumper. There are regular expressions on the lines where the error occurs. Malformed UTF-8 character (fatal) at /usr/share/perl5/XML/Simple.pm line 1690. Malformed UTF-8 character (fatal) at /usr/lib/perl/5.10/Data/Dumper.pm line 682. At...

Rails send_data throws "invalid byte sequence in UTF-8"... but why?

I'm using Rails to generate a PDF with the executable wkhtmltopdf and then using send_data to send the result back to the user as a PDF file. view = ActionView::Base.new(ActionController::Base.view_paths, {}) html = "<h1>A heading</h1>" pdfdata = `echo '#{html}' | #{RAILS_ROOT}/lib/pdf/wkhtmltopdf-i386 - -` send_data pdfdata, :file...

cp1250_general_ci to UTF-8 help

Hello, I'm fetching data from an external database (I cannot edit it, so please don't suggest that) which has its internal encoding set to cp1250_general_ci. I need to display that data as UTF-8 but I cannot get it to work. I'm using this to fetch the data: $dsn = 'mysql:dbname=eklient;host=127.0.0.1'; $user = 'root'; $password = 'root'...

Bash equivalent to Python's string literal for utf string conversion.

I'm writing a bash script that needs to parse html that includes special characters such as @!'ó. Currently I have the entire script running and it ignores or trips on these queries because they're returned from the server as decimal unicode like this: &#39;. I've figured out how to parse and convert to hexadecimal and load these into py...
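Since the excerpt already mentions handing the values to Python, one option is to skip the manual hexadecimal step and let Python resolve numeric character references directly. A minimal sketch using the standard library's `html.unescape`; the sample strings are assumptions:

```python
import html

# Decimal numeric character references, as returned by the server.
apostrophe = html.unescape("&#39;")
accented = html.unescape("caf&#233;")

print(apostrophe)
print(accented)
```

From a bash script this could be called as a one-liner with `python3 -c`, reading the text from standard input.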

rails 2.3.5 with ruby 1.9.1p429 : incompatible character encodings: ASCII-8BIT and UTF-8

Hi, I tried the Ruby hacks for UTF-8 (from: http://gist.github.com/273741) ... and I'm still getting the following error: ActionView::TemplateError (incompatible character encodings: ASCII-8BIT and UTF-8) What is bizarre to me is that the same content, if retrieved with a POST action (searching the app with an HTML form), it is display...

Failed to getResource() utf file from package in Android

Hi, I have a custom Java library which calls getResource() on a UTF-8 encoded text file in the package. keyWordPairs = new Hashtable<String, Vector<String>>(); try { File pinYinDatabase = new File(this.getClass().getClassLoader().getResource("myCustomLibrary/NewPinYin.utf").getFile()); BufferedReader br = new BufferedReader(new Fi...

Django \u characters in my UTF8 strings

Hiya, I am adding UTF-8 data to a database in Django. As the data goes into the database, everything looks fine - the characters (for example): “Hello” are UTF-8 encoded. My MySQL database is UTF-8 encoded. When I examine the data from the DB by doing a select, my example string looks like this: ?Hello?. I assume this is showing the c...

Sending in UTF-8 from Mail_Queue PEAR Package

So, I have set up the PEAR Mail_queue package on my server, and I have it running fine, and sending emails out. I have it set to run by a cron-job every 15 minutes. Everything works fine, except the problem is that I need to send emails in Chinese, and when I send them using the Mail_queue package, I only get gibberish. I'm assuming that...

Japanese characters are not displaying correctly in IE 8...not sure about earlier versions

The URLs exhibiting this behavior are here: http://culturewithinaculture.org/introduction.php http://culturewithinaculture.org/about.php user: cwac pass: cwac2112 The site has not been launched officially. But my problem is on the right side, Japanese copy. I have my document type set to UTF-8 which is what I thought it should be. On...

Char Encoding: Changing file from MacRoman to UTF-8 breaks string

I am working on a CakePHP site saved in MacRoman char encoding. I want to change all the files to UTF-8 for internationalisation. For all the other files in the site this works fine. However, in the core.php file there is a security salt, which is a string with special characters ("!:* etc.). When I save this file as UTF-8 the salt g...

Java webstart character encoding issues

I have a JavaFX/Groovy application that I'm trying to localize. It turns out that when I use JavaFX standard execution with the Java VM arg "-Dfile.encoding=UTF-8" locally, all of my international characters (for example, ü) display correctly. However, if I invoke the app via a JNLP file, using java-vm-args="-Dfile.encoding=UTF-8" e.g....

URL charset encoding google vs yahoo

What's the best way to manage i18n URLs? It's strange because Google and Facebook encode UTF-8 (e.g. search ★。SмAck%2BтнAт。★ on Google) while Yahoo doesn't (e.g. search ★。SмAck%2BтнAт。★ on Yahoo). How do you manage UTF-8 URLs and which libraries do you use? -- edit: I tried on Firefox and the behavior is the same, so the question is: Do you have ...
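For reference, percent-encoding a non-ASCII character means encoding it as UTF-8 bytes and writing each byte as `%XX`. A small sketch with Python's `urllib.parse`, offered as one possible library rather than a recommendation from the question:

```python
from urllib.parse import quote, unquote

star = "★"  # U+2605

# quote() encodes as UTF-8 by default and escapes each byte as %XX.
encoded = quote(star)

# unquote() reverses the process, again assuming UTF-8.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

Browsers often display the decoded form in the address bar while still sending the percent-encoded form on the wire, which can make two sites look as if they behave differently.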

Zend DB and encoding

Hi guys, I have just encountered something rather strange. I use the Zend Framework 1.10 with the Zend_Db_Table module to read some data from a database. The database itself, the table and the fields in question all have their collation set to "utf8_general_ci" and all special chars appear correctly formatted in the DB when checked with p...

ruby 1.9 + sinatra incompatible character encodings: ASCII-8BIT and UTF-8

I'm trying to migrate a Sinatra application to Ruby 1.9. I'm using Sinatra 1.0, Rack 1.2.0 and ERB templates. When I start Sinatra it works, but when I request the web page from the browser I get this error: Encoding::CompatibilityError at / incompatible character encodings: ASCII-8BIT and UTF-8 All .rb files have this header: #!/usr/b...

How to best deal with Windows' 16-bit wchar_t ugliness?

I'm writing a wrapper layer to be used with mingw which provides the application with a virtual UTF-8 environment. Functions which deal with filenames are wrappers which convert from UTF-8 and call the corresponding "_w" functions, and so on. The big problem I've run into is that Windows' wchar_t is 16-bit. For filesystem operations, it...
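The 16-bit width matters because code points above U+FFFF do not fit in one UTF-16 code unit and are stored as a surrogate pair. The sketch below is in Python rather than C, purely to show the unit counts; the helper `utf16_units` and the sample characters are assumptions for illustration.

```python
def utf16_units(s):
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

bmp_char = "中"               # U+4E2D, inside the Basic Multilingual Plane
astral_char = "\U0001D11E"    # U+1D11E, outside the BMP

print(utf16_units(bmp_char))
print(utf16_units(astral_char))
print(len(astral_char.encode("utf-8")))
```

A wrapper that converts between UTF-8 and 16-bit wchar_t therefore cannot assume one wchar_t per character when sizing buffers.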

Convert Unicode (UTF-8) filenames to ANSI (DOS)

Directory listing with broken filenames encoding C:\Downloads\1>dir 18.01.2010 10:45 <DIR> РЎР?Р>Р?Р?С?Р+ 18.01.2010 10:45 <DIR> Р?Р?С'Р?Р>Р?Рє 18.01.2010 10:45 <DIR> Р"Р?С?Р?Р°С╪Р°-Р>РчС╪РчР+Р?Рё РєР?С?РїС?С? 18.01.2010 10:45 <DIR> Р•Р>Р•Р?РўР Р?Р?Р? Is there any tools for windows t...

Inserting utf8 characters in DB using Django

In Django, how do I use Unicode when inserting into the DB? Example: name =request.POST["name"] //This may be in Chinese or any other languages usr = Users(name=name) usr.save() The Python version used on CentOS is Python 2.4.3 and the mod_python version is 1.2.1_p2-1 ...