questions about utf-8 | ansaurus

utf-8

PLINK change character set translation to UTF-8 (utf8)

Does anyone know how to configure the character set translation in Plink (the command-line version of PuTTY) to UTF-8? I'm trying to SSH to a Linux server whose character set configuration is UTF-8. This can be easily achieved via PuTTY, but I can't seem to find that command line option on Plink... ...

character-encoding

Ruby to_yaml utf8 string

How can I make Ruby's to_yaml method store UTF-8 strings with their original characters rather than as escape sequences? ...

iPhone setting text from HTTP JSON response in UITableView displays corrupted characters

I'm trying to display text in a UITableView that's a response from a JSON containing something like "Congratulations you just won a prize!", and when I display it in a UITableView the first 9 characters are "corrupted" and display weird symbols. When I do an NSLog("%@", jsonString); I see the text correctly. Is there some type of UTF8...

Optimal MySQL-configuration (my.cnf)

The following is my default production MySQL configuration file (my.cnf) for a pure UTF-8 setup with InnoDB as the default storage engine. [server] bind-address=127.0.0.1 innodb_file_per_table default-character-set=utf8 default-storage-engine=innodb The setup does the following: Binds to localhost:3306 (loopback) instead of the defa...

PHP: How to remove all non printable characters in a string?

I imagine I need to remove chars 0-31 and 127. Is there a function or piece of code to do this efficiently? ...
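The question asks about PHP, but the idea carries over to any language with regular expressions. As a minimal sketch in Python, assuming "non printable" means exactly the ASCII control range the asker names (0-31 and 127), and using a hypothetical helper name `strip_control`:

```python
import re

# Matches ASCII control characters: code points 0-31 and 127 (DEL).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_control(text: str) -> str:
    """Return text with ASCII control characters removed."""
    return _CONTROL_CHARS.sub("", text)


if __name__ == "__main__":
    print(strip_control("a\x00b\tc\x7f"))
```

Note that this range includes tab, line feed, and carriage return, so they are removed as well; narrow the character class if those should survive.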

Decoding double encoded utf8 in Python

Hi, I've got a problem with strings that I get from one of my clients over XML-RPC. He sends me UTF-8 strings that are encoded twice :( so when I get them in Python I have a unicode object that has to be decoded one more time, but obviously Python doesn't allow that. I've notified my client; however, I need a quick workaround for now be...
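One common workaround can be sketched as follows. It assumes Python 3 (the question predates it) and assumes the extra round of encoding went through Latin-1, which is the usual cause but is not confirmed by the excerpt. The helper name `fix_double_encoded` is hypothetical.

```python
def fix_double_encoded(text: str) -> str:
    """Undo one extra round of UTF-8 encoding.

    Assumes the UTF-8 bytes were mistakenly decoded as Latin-1.
    If the text does not fit that pattern it is returned unchanged.
    """
    try:
        # Map each character back to the byte it came from,
        # then decode those bytes as the UTF-8 they really were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    mangled = "\u00c3\u00a9"  # what "é" looks like after double encoding
    print(fix_double_encoded(mangled))
```

If the intermediate step used Windows-1252 rather than Latin-1, swapping the codec name for `"cp1252"` is the usual adjustment, again as an assumption to verify against real data.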

Bizarre eclipse-pydev console behavior

Stumbled upon some seemingly random character mangling in eclipse-pydev console: specific characters are read from stdout as '\xd0?' (first byte correct, second "?") Is there some solution to this? (PyDEV 1.4.6, Python 2.6, console encoding - inherited UTF-8, Eclipse 3.5, WinXP with UK locale) Code: import sys if __name__ == "__main_...

How can I check whether a byte array contains a unicode string in Java

Given a byte array that is either a UTF-8 encoded string or arbitrary binary data, what approaches can be used in Java to determine which it is? The array may be generated by code similar to: byte[] utf8 = "Hello World".getBytes("UTF-8"); Alternatively it may have been generated by code similar to: byte[] messageContent = new byte[2...

Storing and displaying unicode string (हिन्दी) using PHP and MySQL

I have to store Hindi text in a MySQL database, fetch it using a PHP script and display it on a webpage. I did the following: I created a database and set its encoding to UTF-8 and also the collation to utf8_bin. I added a varchar field in the table and set it to accept UTF-8 text in the charset property. Then I set about adding data ...

internationalization

Why does anyone use an encoding other than UTF-8?

I want to know why any developer would need to use an encoding other than UTF-8. ...

Reading a UTF-8 Unicode file through non-unicode code.

I have to read a text file which is Unicode with UTF-8 encoding and have to write this data to another text file. The file has tab-separated data in lines. My reading code is C++ code without unicode support. What I am doing is reading the file line-by-line in a string/char* and putting that string as-is to the destination file. I can'...

Simple library to do UTF-8 in Haskell (since Streams no longer compile)

I just want to read (and maybe write) UTF-8 data. haskell.org still advertises System.Streams which does not compile with recent ghc: % runhaskell Setup.lhs configure Configuring Streams-0.2.1... runhaskell Setup.lhs build Preprocessing library Streams-0.2.1... Building Streams-0.2.1... [10 of 45] Compiling System.FD ( System/FD....

How do I encode UTF-8 using the XStream framework?

Per XStream's FAQ its default parser does not preserve UTF-8 document encoding, and one must provide their own encoder. How does one do this? Thanks! ...

Parsing XML Encoded in UTF-8

I am working with a Wikipedia XML dump that is encoded in UTF-8. Right now, I am reading in everything as std::string, so when I std::cout to the screen, foreign characters are displayed as gibberish. The actual parsing process only looks for ASCII characters though, but when I write the parsed file to disk, I want to preserve the fore...

How do I guarantee that UTF-8 characters are scraped accurately using cURL in PHP?

Hello, I am scraping webpages (using php's curl) that have accented characters (like "é"). In the source of those webpages, those characters are written using utf-8 (they are not html encoded.) However, when the result is produced using the following code, I get question marks instead of the accented characters. $ch = curl_init(); $ti...

screen-scraping

Convert UTF-16 hex to UTF-8 in PHP

I have the following output from strace and I want to convert it to UTF-8 using PHP: R\00f6dhakev\00e4gen 4 R\00e4ntm\00e4starv\00e4gen 24 K\00d8BENHAVN I think the above strings are UTF-16 hex. ...
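The question asks for PHP; the conversion itself can be sketched in Python. This assumes every escape is a backslash followed by exactly four hex digits naming a single code point in the Basic Multilingual Plane, which matches the sample strings but is not stated in the question. Surrogate pairs are not handled, and `decode_hex_escapes` is a hypothetical name.

```python
import re

# A backslash followed by exactly four hex digits, e.g. \00f6
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{4})")


def decode_hex_escapes(text: str) -> str:
    """Replace each \\XXXX escape with the character it names."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    decoded = decode_hex_escapes(r"R\00f6dhakev\00e4gen 4")
    print(decoded)
    # Encode explicitly when UTF-8 bytes are needed.
    print(decoded.encode("utf-8"))
```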

Best way to shorten UTF8 string based on byte length

A recent project called for importing data into an Oracle database. The program that will do this is a C# .Net 3.5 app and I'm using the Oracle.DataAccess connection library to handle the actual inserting. I ran into a problem where I'd receive this error message when inserting a particular field: ORA-12899 Value too large for column ...
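The question concerns C#, but the technique is language-independent: cut the encoded bytes at the limit, then drop any partial multi-byte sequence left at the end. A minimal Python sketch, with `truncate_utf8` as a hypothetical helper name:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing may cut a multi-byte character in half;
    # errors="ignore" discards the incomplete trailing bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(truncate_utf8("h\u00e9llo", 2))
    print(truncate_utf8("h\u00e9llo", 3))
```

This never splits a code point, but it can still separate a base character from a following combining mark, so it is only a starting point where that matters.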

Check if a char* buffer contains UTF8 characters?

In the absence of a BOM is there a quick and dirty way in which I can check if a char* buffer contains UTF8 characters? ...
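A quick check, sketched here in Python rather than C, is to attempt a strict UTF-8 decode and see whether it fails. The name `looks_like_utf8` is hypothetical. A successful decode only shows that the bytes are well-formed UTF-8, not that the author intended them as UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")  # strict by default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("caf\u00e9".encode("utf-8")))
    print(looks_like_utf8(b"\xff\xfe\x00\x01"))
```

Pure ASCII input also passes, since ASCII is a subset of UTF-8. Arbitrary binary data longer than a few bytes rarely passes by accident, which is why this works as a heuristic.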

PHP: Converting from UTF-8 HTML

I have a French site that I want to parse, but am running into problems converting the (UTF-8) HTML to Latin-1. The problem is shown in the following phpunit test case: class Test extends PHPUnit_Framework_TestCase { private static function fromHTML($str){ return html_entity_decode($str, ENT_QUOTES, 'UTF-8'); } publi...

Java: Converting String to and from ByteBuffer and associated problems

I am using Java NIO for my socket connections, and my protocol is text based, so I need to be able to convert Strings to ByteBuffers before writing them to the SocketChannel, and convert the incoming ByteBuffers back to Strings. Currently, I am using this code: public static Charset charset = Charset.forName("UTF-8"); public static Cha...

character-encoding
