character-encoding

Character set issues with Oracle Gateways, SQL Server, and Application Express

I am migrating data from an Oracle database on VMS that accesses data on SQL Server using heterogeneous services (over ODBC) to Oracle on AIX accessing the SQL Server via Oracle Gateways (dg4msql). The Oracle VMS database used the WE8ISO8859P1 character set. The AIX database uses WE8MSWIN1252. The SQL Server database uses "Latin1-General, case-i...

Charset and POST request

Hi All! I have a Rails 2.3.5 application that is working fine with UTF-8 and international characters. I have now added an integration with a payment gateway where I POST some data, wait a while and get a POST back. The problem is that when I get that POST back, the international characters are broken. Instead of "sørensen" I get: "søre...

Problem with iconv

Hi all! If you are on Mac OS X 10.6, and you are familiar with character encoding AND the terminal please do this: Open a terminal and type the following commands: echo sørensen > test.txt iconv -f UTF8 -t ISO-8859-1 test.txt You will see the output: "sørensen". Can somebody explain what is going on? ...

HTML inside webView

I am posting some data to a server using the DefaultHttpClient class, and in the response stream I am getting an HTML file. I save the stream as a string and pass it on to another activity which contains a WebView to render this HTML on the screen: response = httpClient.execute(get); InputStream is = response.getEntity().getContent(); Buffered...

Unable to Retrieve Simplified Chinese Characters From Form

I have a page that displays content retrieved from XML with no problems: <?xml version="1.0" encoding="UTF-8"?> <Root> <Fields> <NamePrompt>名字</NamePrompt> </Fields> </Root> Page encoding is set to GB18030 and it displays perfectly. However, when I retrieve inputted text from HttpContext.Current.Request.Form that's be...

[Integrity] of a Http Post Request from Iphone to web server

Hey everyone, I am currently building a module that makes it possible to comment on a news item and, as you have probably guessed, I will need to insert this new comment into my web database. I know this stuff can be very tedious, so I would like to know if someone has a method which could assure the integrity of the request by checking some of the u...

Any tool to convert bulk php files to UTF-8 without BOM?

Hi, I have a very large script which contains a lot of PHP files, so I need a Windows tool or other software that converts all of those files to UTF-8 without BOM. I know this can be done with Notepad++, but then you have to convert each file individually. Thanks ...
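A minimal sketch of one way to do this in bulk, written in Python rather than as a Windows GUI tool. It assumes the files are already UTF-8 and only carry a leading byte order mark; files in some other encoding would need a real re-encoding step first. The names `strip_bom` and `strip_bom_in_tree` are made up for illustration.

```python
import codecs
from pathlib import Path


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_tree(root: str, pattern: str = "*.php") -> int:
    """Rewrite every matching file under root without its BOM.

    Returns the number of files that were actually changed.
    Assumption: the files are already UTF-8 apart from the BOM.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        original = path.read_bytes()
        cleaned = strip_bom(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling `strip_bom_in_tree("path/to/project")` on a copy of the project first is the safer way to try it, since the files are rewritten in place.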

Best practice to handle non-english characters in Ruby?

My program file is encoded in UTF-8 so "abc".length == 3 but "åäö".length == 6. I realize that å, ä, ö, etc. are stored as two bytes in UTF-8, and that a Ruby String is a sequence of bytes (not characters), but it is annoying! Is there a best practice to work around this problem? ...
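The difference between the two counts can be reproduced in a few lines. This sketch uses Python rather than Ruby, purely to illustrate the byte versus character distinction described above; it is not a Ruby fix.

```python
# "åäö" written with escapes so the example does not depend on the file's own encoding.
text = "\u00e5\u00e4\u00f6"

# Counting characters (code points) gives 3.
char_count = len(text)

# Counting bytes of the UTF-8 encoding gives 6, because each of these
# letters takes two bytes in UTF-8.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)
```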

Java Unicode encoding

A Java char is 2 bytes (max size of 65,536) but there are 95,221 Unicode characters. Does this mean that you can't handle certain Unicode characters in a Java application? Does this boil down to what character encoding you are using? ...
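For context, Java strings are UTF-16, so a 16-bit char holds one UTF-16 code unit and characters above U+FFFF take two of them (a surrogate pair). The sketch below counts UTF-16 code units in Python to show the idea; `utf16_code_units` is a hypothetical helper, not a Java API.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit UTF-16 code units needed to store text."""
    # Each UTF-16 code unit is two bytes; "utf-16-be" writes no BOM.
    return len(text.encode("utf-16-be")) // 2


bmp_char = "\u00e5"         # å, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # a character above U+FFFF

print(utf16_code_units(bmp_char))     # one code unit
print(utf16_code_units(astral_char))  # two code units (a surrogate pair)
```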

Understanding character encoding in typical Java web app

Some pseudocode: String a = "A bunch of text"; //UTF-16 saveTextInDb(a); //Write to Oracle VARCHAR(15) column String b = readTextFromDb(); //UTF-16 out.write(b); //Write to http response When you save the Java String (UTF-16) to Oracle VARCHAR(15) does Oracle also store this as UTF-16? Does the length of an Oracle VARCHAR refer to nu...

Character Encoding problem?

Hi, in my MySQL database I have the following information in a page name field: ç,Ç,ö,Ö,ü,Ü,ı,İ,ş,Ş,ğ,Ğ If I do a phpMyAdmin dump, the above is exported. I am using a different PHP script and instead of the above I am getting this. "ç,Ç,ö,Ö,ü,Ãœ,ı,Ä°,ÅŸ,Åž,ÄŸ,Äž" This is the snippet which is generating the output. $da...
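Output like "Ãœ" for "Ü" is what appears when UTF-8 bytes are decoded as Windows-1252. The sketch below reproduces that pattern in Python. It is only a guess at the cause, assuming the script or connection treats UTF-8 data as Windows-1252; the function names are hypothetical.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows-1252 (the mistake)."""
    return text.encode("utf-8").decode("cp1252")


def repair_cp1252_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes and decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "\u00dc"  # Ü
garbled = misread_utf8_as_cp1252(original)
print(garbled)                          # Ü shown as two characters
print(repair_cp1252_mojibake(garbled))  # back to Ü
```

The repair only works while the garbled text still maps cleanly back to the original bytes; the durable fix is to make every layer agree on one encoding.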

jQuery: AJAX umlauts & special characters are a mess

I've just created my first ajax function with jQuery which actually works, but unfortunately the character encoding (for characters like ä, ö, ü, ß, č, ć, å, ø) is a nightmare. My files and my database are all UTF-8. I've tried a multitude of options in the ajax function and the PHP function, none of which were satisfactory. This is ...

Convert a MySQL database from latin to UTF-8

I am converting a website from ISO to UTF-8, so I need to convert the MySQL database too. I have read various solutions on the Internet and don't know which one to choose. Do I really need to convert my varchar columns to binary, then to UTF-8 like that: ALTER TABLE t MODIFY col BINARY(150); ALTER TABLE t MODIFY col CHAR(150) CHARACTER SET ...

Java UTF-8 to ASCII conversion with supplements

Hi, we accept all sorts of national characters as UTF-8 strings on input, and we need to convert them to ASCII strings on output for some legacy use (we don't accept Chinese or Japanese characters, only European languages). We have a small utility to get rid of all the diacritics: public static final String toBaseCharacters(f...
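A common way to drop diacritics is to decompose the text (NFD) and remove the combining marks. The sketch below shows that approach in Python; it is not the utility from the question, whose body is cut off, and `to_base_characters` here is only a similarly named stand-in. Letters without a canonical decomposition, such as ø, pass through unchanged, which is where a supplementary mapping table would be needed.

```python
import unicodedata


def to_base_characters(text: str) -> str:
    """Decompose text (NFD) and drop combining marks, keeping base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_base_characters("\u00e5\u00e4\u00f6"))  # å ä ö lose their marks
print(to_base_characters("s\u00f8rensen"))       # ø has no decomposition, so it stays
```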

How do I convert character encodings with Javascript? JQuery.

Hi, I've got this in an XML file that I parse with jQuery: <title>L&#229;ng</title> I'm using .text() to pull out the text, but it's incorrectly encoded. How do I get it decoded to proper text? I want 'Lång' out of it. Edit: I'm using jQuery for getting data. Have it on my work computer, but something like this: $.ajax({ type: 'GET...
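`&#229;` is a numeric character reference for U+00E5 (å), so the text needs entity decoding rather than a change of character encoding. A minimal illustration in Python, using the standard library's `html.unescape`; this is not the jQuery code path from the question.

```python
import html

raw_title = "L&#229;ng"

# html.unescape resolves numeric and named character references.
decoded_title = html.unescape(raw_title)

print(decoded_title)
```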

How do browsers/PHP handle characters outside the declared character set?

I'm looking into how characters that fall outside a page's declared character set are handled. In this case the page is set to iso-8859-1, and the previous programmer decided to escape input using htmlentities($string,ENT_COMPAT). This is then stored into Latin1 tables in MySQL. As the table is set to the same character set as the pa...

Filtering Wikipedia's XML dump: error on some accents

I'm trying to index Wikipedia dumps. My SAX parser makes Article objects for the XML with only the fields I care about, then sends them to my ArticleSink, which produces Lucene Documents. I want to filter special/meta pages like those prefixed with Category: or Wikipedia:, so I made an array of those prefixes and test the title of each page ...

Django: Getting a Python encoding error when handling HTTP response in Latin1?

I'm working in Django, and using urllib2 and simplejson to parse some information from an API. The problem is that the API returns information in the Latin-1 encoding, and just once in a while there's a character in there that causes Django to crash horribly with an encoding error. This is my code: get_person_id_url = "http://www.domai...

Python: why does str() on some text from a UTF-8 file give a UnicodeDecodeError?

I'm processing a UTF-8 file in Python, and have used simplejson to load it into a dictionary. However, I'm getting a UnicodeDecodeError when I try to turn one of the dictionary values into a string: f = open('my_json.json', 'r') master_dictionary = json.load(f) #some json wrangling, then it fails on this line... mysql_string += " ('" + ...

Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in th...
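One simple heuristic for this kind of input is to try strict UTF-8 decoding per line and fall back to ISO-8859-1 when that fails. The sketch below shows it in Python rather than Perl; `lines_to_utf8` is a hypothetical name, and the approach assumes that a line which happens to be valid UTF-8 really is UTF-8, which can occasionally misjudge a Latin-1 line.

```python
def lines_to_utf8(junk: bytes) -> bytes:
    """Re-encode a mixed UTF-8 / ISO-8859-1 byte string as pure UTF-8, line by line."""
    fixed_lines = []
    for line in junk.split(b"\n"):
        try:
            # Strict decoding: any invalid UTF-8 sequence raises an error.
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            # ISO-8859-1 maps every byte to a character, so this cannot fail.
            text = line.decode("iso-8859-1")
        fixed_lines.append(text)
    return "\n".join(fixed_lines).encode("utf-8")


mixed = "s\u00f8rensen".encode("utf-8") + b"\n" + "s\u00f8rensen".encode("iso-8859-1")
print(lines_to_utf8(mixed).decode("utf-8"))
```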