utf-8

Java parsing UTF8

I have the following issue with UTF-8 files structured as follows: FIELD1§FIELD2§FIELD3§FIELD4. Looking at the hexadecimal values of the file, it uses A7 to encode §. According to this encoding it should be UTF-8, but that's strange, because A7 > 7F, so one byte shouldn't be enough to encode §. So I tried using a BufferedReader directly wi...
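For reference, a lone A7 byte is how § is stored in ISO-8859-1 (Latin-1) and Windows-1252. UTF-8 encodes the same character as the two bytes C2 A7, so a file that shows a bare A7 separator is most likely not UTF-8. The minimal Python sketch below is not the asker's Java code, and the sample line is taken from the question:

```python
# Compare how "§" (U+00A7) is encoded in UTF-8 versus ISO-8859-1 (Latin-1).
line = "FIELD1§FIELD2§FIELD3§FIELD4"

utf8_bytes = line.encode("utf-8")
latin1_bytes = line.encode("latin-1")

# In UTF-8, "§" needs two bytes (C2 A7); in Latin-1 it is the single byte A7.
print("§".encode("utf-8").hex(" "))    # c2 a7
print("§".encode("latin-1").hex(" "))  # a7
print(len(utf8_bytes), len(latin1_bytes))  # 30 27

# A byte stream containing a bare A7 is not valid UTF-8...
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

# ...but decoding it as Latin-1 recovers the separator, so the fields split cleanly.
fields = latin1_bytes.decode("latin-1").split("§")
print(fields)  # ['FIELD1', 'FIELD2', 'FIELD3', 'FIELD4']
```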

Can base64 encoding be applied to multibyte UTF-8 characters?

Can base64 encoding be applied to multibyte UTF-8 characters? How is a base64-encoded string converted back to a multibyte UTF-8 string? ...
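Yes. Base64 operates on bytes, not characters, so multibyte UTF-8 text round-trips as long as both sides treat the decoded bytes as UTF-8. A minimal Python sketch with a made-up sample string:

```python
import base64

# Base64 encodes bytes, so turn the text into UTF-8 bytes first.
text = "naïve café ✓ 日本"
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")

# To reverse it, base64-decode back to the same bytes, then decode them as UTF-8.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == text)  # True
```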

Convert non-breaking spaces to spaces in Ruby

I have cases where user-entered data from an html textarea or input is sometimes sent with \u00a0 (non-breaking spaces) instead of spaces when encoded as utf-8 json. I believe that to be a bug in Firefox, as I know that the user isn't intentionally putting in non-breaking spaces instead of spaces. There are also two bugs in Ruby, one o...
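After JSON parsing, each \u00a0 is simply the character U+00A0 NO-BREAK SPACE, so a plain string replacement normalizes it. The question is about Ruby, but the same idea is sketched here in Python with a hypothetical payload:

```python
import json

# Hypothetical payload: "\u00a0" is the JSON escape for a non-breaking space.
payload = '{"comment": "hello\\u00a0world"}'
comment = json.loads(payload)["comment"]

# Replace every U+00A0 NO-BREAK SPACE with an ordinary ASCII space.
cleaned = comment.replace("\u00a0", " ")
print(repr(cleaned))  # 'hello world'
```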

Using .NET, how do I convert ISO-8859-1 encoded text files that contain Latin-1 accented characters to UTF-8?

I am being sent text files saved in ISO-8859-1 format that contain accented characters from the Latin-1 range (as well as normal ASCII a-z, etc.). How can I convert these files to UTF-8 using C# so that the single-byte accented characters in ISO-8859-1 become valid UTF-8 characters? I have tried to use a StreamReader with ASCIIEncoding, and ...
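The usual approach is to decode with the encoding the files really use (ISO-8859-1) and then re-encode as UTF-8. ASCII has no mapping for the accented bytes above 0x7F, which is why an ASCII decoder loses them. The question asks about C#, so this is only a language-neutral sketch in Python; `latin1_to_utf8` and the file names are hypothetical:

```python
import tempfile
from pathlib import Path

def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Transcode a file from ISO-8859-1 to UTF-8 (hypothetical helper)."""
    # ISO-8859-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so decoding never fails
    # and accented bytes such as 0xE9 become the proper characters (here, "é").
    text = src.read_bytes().decode("iso-8859-1")
    # Re-encode as UTF-8; working on bytes leaves line endings untouched.
    dst.write_bytes(text.encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "out.txt"
    src.write_bytes(b"caf\xe9 na\xefve")  # "café naïve" saved as ISO-8859-1
    latin1_to_utf8(src, dst)
    result = dst.read_bytes()

print(result)  # b'caf\xc3\xa9 na\xc3\xafve'
```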

Saving XML in UTF-8 with MSXML

I'm trying to load a simple XML file (encoded in UTF-8):

    <?xml version="1.0" encoding="UTF-8"?>
    <Test/>

and save it with MSXML in VBScript:

    Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xmlDoc.Load("C:\test.xml")
    xmlDoc.Save "C:\test.xml"

The problem is that MSXML saves the file in ANSI instead of UTF-8 (despite the original file ...

Unicode and URI encoding, decoding and escaping in JavaScript

If you look at this table, it has a list of escape sequences for Unicode characters that don't actually work for me. For example, for "%96", which should be a –, I get an error when trying to decode it: decodeURIComponent("%96"); URIError: URI malformed. If I attempt to encode "–", I actually get: encodeURIComponent("–"); "%E2%80%93" ...
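"%96" is the en dash's single byte in Windows-1252. decodeURIComponent and encodeURIComponent work with UTF-8 bytes, and a lone 0x96 is not a valid UTF-8 sequence. Python's `urllib.parse` shows the same mismatch. This is an analogy, not the JavaScript API, and note that Python substitutes U+FFFD where JavaScript throws:

```python
from urllib.parse import quote, unquote

# encodeURIComponent-style escaping uses the UTF-8 bytes of the character.
print(quote("–"))  # %E2%80%93  (U+2013 EN DASH is E2 80 93 in UTF-8)

# "%96" is not valid UTF-8, so Python's default decoder yields the replacement character.
print(unquote("%96") == "\ufffd")  # True

# Interpreted as Windows-1252 (cp1252), the same byte is the en dash.
print(unquote("%96", encoding="cp1252") == "\u2013")  # True
```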

Cleaning up nasty characters in PHP

Hi folks, I've got a little issue: my client is pasting content from Word into my little text editor in a CMS, and the double quotes are coming back encoded in what looks like some form of UTF. Any ideas whether I can strip or replace these using PHP when they are displayed from my MySQL table? Here is the link to the page that spits out th...
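Word usually inserts typographic quotes: U+201C and U+201D for double quotes, plus U+2018 and U+2019 for single quotes and apostrophes. These look garbled when their UTF-8 bytes are displayed under a different charset. Fixing the charset end to end is generally the better cure. If the characters should be replaced instead, here is a Python sketch of that approach; the `straighten_quotes` helper is hypothetical, and the PHP version is not shown:

```python
# Map Word's typographic ("smart") quotes to plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})

def straighten_quotes(text: str) -> str:
    return text.translate(SMART_QUOTES)

print(straighten_quotes("\u201cHello,\u201d she said. It\u2019s fine."))
# "Hello," she said. It's fine.
```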

Detect CJK characters in PHP

Hello, I've got an input box that allows UTF8 characters -- can I detect whether the characters are in Chinese, Japanese, or Korean programmatically (part of some Unicode range, perhaps)? I would change search methods depending on whether MySQL's fulltext searching would work (it won't work for CJK characters). Thanks! ...
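Each of these scripts occupies known Unicode blocks, so comparing code points against those ranges is one way to detect them. Below is a Python sketch. The range list covers only common blocks and is an assumption you may need to extend:

```python
# A few common Unicode blocks used by Chinese, Japanese and Korean text.
# Illustrative, not exhaustive (it omits, e.g., CJK Extension B and later).
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]

def contains_cjk(text: str) -> bool:
    """Return True if any character falls inside one of the listed blocks."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)

print(contains_cjk("hello"))   # False
print(contains_cjk("東京"))     # True
print(contains_cjk("한국어"))    # True
```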

tomcat/jdbc/mysql: can insert ÿ (U+00FF) but not Ā (U+0100)

hi, my setup: mysql 5.1 show variables:
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 ...
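The boundary in the title is exactly the edge of ISO-8859-1. Every code point up to U+00FF has a one-byte Latin-1 encoding, and nothing above it does. That suggests, without proving it, that some layer between Tomcat and MySQL still converts through Latin-1. A Python sketch of the boundary (`fits_latin1` is a hypothetical helper):

```python
def fits_latin1(ch: str) -> bool:
    """Return True if the character has an ISO-8859-1 (Latin-1) encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("\u00ff"))  # True  (ÿ, the last Latin-1 code point)
print(fits_latin1("\u0100"))  # False (Ā, the first code point past Latin-1)
```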

Why do Unicode characters show up properly in database, but as ? when printed in Java via Hibernate?

I'm writing a webapp, and interfacing with MySQL using Hibernate 3.5. Using "デスクトップ ინგლისური" as my test string, I can input the string and see that it is properly persisted into the database. However, when I later pull the value out of the database and print to the console as a String, I see "?????? ?????????". If I use new OutputS...
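One common cause of this symptom is that the string is intact in memory but gets written to an output stream whose charset cannot represent the characters, so each one becomes "?". Python reproduces the exact pattern from the question. This only illustrates the symptom and does not diagnose the asker's Hibernate or console setup:

```python
# Encoding into a charset that lacks these characters, with a "replace" error
# handler, substitutes "?" for each unmappable character (spaces survive).
text = "デスクトップ ინგლისური"
print(text.encode("ascii", errors="replace").decode("ascii"))
# ?????? ?????????
```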

DBD::Oracle and utf8

Hello, I'm having some trouble inserting a UTF-8 string into an Oracle 10 database on Solaris, using the latest DBD::Oracle on Perl v5.8.4. These are my DB settings:
--------SELECT * from NLS_DATABASE_PARAMETERS-------------------------------
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $...

UTF-8 Database Problem

I have a MySQL table with a UTF-8 charset, and when I attempt to insert into it via a PHP form, the database gives the following error: PDOStatement::execute(): SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE8' for column ... The character in question is 'è', yet I don't see why this should be a problem cons...
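The byte in the error, 0xE8, is how è is encoded in ISO-8859-1 and Windows-1252. In UTF-8 the same character is C3 A8. That points to (but does not prove) the form data or connection sending Latin-1 rather than UTF-8. Below is a Python sketch of the two encodings and a simple validity check (`is_valid_utf8` is a hypothetical helper):

```python
# "\xE8" in the MySQL error is the single-byte Latin-1 / Windows-1252 form of "è";
# a utf8 column expects the two-byte UTF-8 sequence C3 A8 instead.
print("è".encode("latin-1"))  # b'\xe8'
print("è".encode("utf-8"))    # b'\xc3\xa8'

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(b"caf\xe8"))      # False
print(is_valid_utf8(b"caf\xc3\xa8"))  # True
```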

How to send parameters with same encoding from javascript?

I have a JavaScript file that lots of people have embedded in their pages. Since I am hosting the file, I have control over that JavaScript file; I cannot control the way it is embedded because lots of people are using it already. This JavaScript file sends GET requests to my servlets, and the parameters passed with the request are recor...

Does Process.StartInfo.Arguments support a UTF-8 string?

Can you use a UTF-8 string as the Arguments for a StartInfo? I am trying to pass a UTF-8 string (in this case a Japanese string) to an application as a console argument. Something like this (this is just an example! cmd.exe would be a custom app): var process = new System.Diagnostics.Process(); process.StartInfo.Arguments = "/K \"echo これはテス...

How to keep character encoding with database queries.

Hi, I am doing the following.
1) I am exporting a database and saving it to a file called dump.sql.
2) The file is then transferred to a different server via PHP ftp.
3) When the file has been successfully transferred the administrator has an option to run a 'dbtransfer' script on the new host.
4) This script blows up the script and ru...

Cannot insert UTF-8 into a MySQL database on Linux

When creating the table, I set charset = utf8. I created a stored procedure to insert data into the database. When I insert UTF-8 data into the database on Windows, it works OK (the data displays correctly), but it does not work on Linux (the data does not display correctly). The strange thing is that inserting UTF-8 works fine on Windows, but when I deploy MySQL on Linux, wh...

Using php to create a password system with chinese characters

Hi guys, I'm having an issue with validating Chinese characters against other Chinese characters. For example, I'm creating a simple password script that gets data from a database and gets the user input through GET. The issue I'm having is that for some reason, even though the characters look exactly the same when you echo them out, my if...
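Strings that look identical but compare unequal are often different byte sequences. One example is the same characters in two encodings; Unicode normalization differences are another possibility. Below is a Python sketch with a made-up password and an assumed UTF-8 versus GBK mismatch:

```python
# The same visible text can arrive as different byte sequences.
password = "密码"
stored = password.encode("utf-8")   # assumed: bytes kept in the database as UTF-8
submitted = password.encode("gbk")  # assumed: bytes from a form posted as GBK

print(stored == submitted)  # False: same characters, different encodings

# Compare decoded text (or convert both sides to one encoding) instead of raw bytes.
print(stored.decode("utf-8") == submitted.decode("gbk"))  # True
```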

Can str_replace be safely used on a UTF-8 encoded string if it's only given valid UTF-8 encoded strings as arguments?

PHP's str_replace() was intended only for ANSI strings and as such can mangle UTF-8 strings. However, given that it's binary-safe would it work properly if it was only given valid UTF-8 strings as arguments? Edit: I'm not looking for a replacement function, I would just like to know if this hypothesis is correct. ...
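The hypothesis holds for valid inputs because UTF-8 is self-synchronizing: a complete, valid UTF-8 needle can only match at character boundaries in a valid UTF-8 haystack. The Python sketch below uses byte strings to mimic a byte-oriented replace. It illustrates the property and does not run PHP:

```python
# UTF-8 is self-synchronizing: ASCII bytes (0xxxxxxx) and lead bytes (11xxxxxx)
# never look like continuation bytes (10xxxxxx), so a complete, valid UTF-8
# needle can only match at character boundaries in a valid UTF-8 haystack.
original = "Grüße aus Köln, grüße!"
haystack = original.encode("utf-8")
needle = "üße".encode("utf-8")
replacement = "ΥΣ".encode("utf-8")

# bytes.replace works on raw bytes, much like a binary-safe str_replace().
result = haystack.replace(needle, replacement)

# The byte-level result is still valid UTF-8 and equals the character-level replace.
expected = original.replace("üße", "ΥΣ")
print(result.decode("utf-8") == expected)  # True
```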

iPhone's NSXMLParser parsing RSS causes encoding problems

Hi, I'm working on a simple RSS reader. This reader loads data from the internet via this code: NSXMLParser *rss = [[NSXMLParser alloc] initWithURL:[NSURL URLWithString:@"http://twitter.com/statuses/user_timeline/50405236.rss"]]; My problem is with encoding. The RSS 2.0 file is supposed to be UTF-8 encoded according to the encoding attribute in the XML fi...

Hyphen encoding (minus) in Google Base RSS feed

I am trying to automate feed generation for data sent to Google Base using UTF-8 encoding. However, I am getting errors whenever hyphens are found, telling me that there is an encoding error in the relevant attribute (title, description, product_type). I am currently using: &minus; but I have also tried: &#8722...
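Unlike HTML, XML predefines only five named entities (amp, lt, gt, quot, apos). So &minus; is undefined in a feed unless a DTD declares it. A numeric reference such as &#8722; (U+2212 MINUS SIGN), or a plain ASCII hyphen, is well-formed XML. Whether Google Base accepts U+2212 is not checked here. A Python sketch using the standard library's XML parser:

```python
import xml.etree.ElementTree as ET

# XML predefines only five entities (amp, lt, gt, quot, apos); "minus" is HTML-only.
try:
    ET.fromstring("<title>A &minus; B</title>")
except ET.ParseError as exc:
    print("rejected:", exc)  # the parser reports an undefined entity

# A numeric character reference to U+2212 MINUS SIGN is well-formed in any XML document.
title = ET.fromstring("<title>A &#8722; B</title>").text
print(title == "A \u2212 B")  # True
```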