character-encoding

Trouble using sqlldr.exe with NCLOB values when unicode characters not in ASCII code range are used.

When we use sqlldr to populate an NCLOB column with a text value from a LOB file and a character is outside the regular ASCII range, sqlldr bombs. Seemingly relevant sections from the log file: EXTENSIONDATA DERIVED ***** VARCHARC Maximum field length is -2147483639 Static LOBFIL...

using character entities in java properties file

I'm trying to add some text to a web app via a Java .properties file. I want the text to have an en-dash in it. If I add the character entity, thus: myProp=Foo &ndash; Bar or myProp=Foo &#8211; Bar I get the code in my output. If I add the literal character to the properties file (and save as UTF-8): mProp=Foo – Bar I get the li...
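A minimal sketch of one common workaround, written in Python purely for illustration (the question itself is about Java). It assumes the file is loaded with the classic ISO-8859-1 reader for .properties files, in which case non-ASCII characters are usually written as \uXXXX escapes. The helper name to_properties_escapes is hypothetical.

```python
def to_properties_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes (UTF-16 code units)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Encode as UTF-16 so characters outside the BMP become a
            # surrogate pair, which is the form Java string escapes use.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


# U+2013 is the en-dash from the question.
print("myProp=" + to_properties_escapes("Foo \u2013 Bar"))
```

The printed line can be pasted into the .properties file as-is; whether the web layer then needs further HTML escaping is a separate matter.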

Unicode characters show differently in different browsers

So... I'm still in Unicode hell... New problem... On my computer, everything shows perfectly. In all browsers. On a co-worker's computer, same story. Everything is good. Even in elinks and w3m on one of my Linux VPSes, all the exotic diacritics of Lithuanian and Latvian, and Nordic letters, show perfectly. However, I have had a few ca...

Content type vs HTML encoding

Hello! I'm building a site and I've set its content type to use charset UTF-8. I'm also using HTML encoding for the special characters, i.e. instead of having á I've got &aacute;. Now I wonder (still building the site) if it was really necessary to do both things. Looking for the answer I found this: http://www.w3.org/International/questi...

PHP: Problems converting "’" character from ISO-8859-1 to UTF-8

I'm having some issues with using PHP to convert ISO-8859-1 database content to UTF-8. I am running the following code to test: // Connect to a latin1 charset database // and retrieve "Georgia O’Keeffe", which contains a "’" character $connection = mysql_connect('*****', '*****', '*****'); mysql_select_db('*****', $connection); mysql_s...
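A hedged illustration in Python (not the asker's PHP) of why this character is awkward: strict ISO-8859-1 has no "’". In Windows-1252 it is byte 0x92, which ISO-8859-1 treats as a control character. The byte string below is an assumption about what the database returns, not something shown in the question.

```python
# Assumed stored bytes: 0x92 is the curly apostrophe in Windows-1252.
raw = b"Georgia O\x92Keeffe"

as_latin1 = raw.decode("iso-8859-1")  # 0x92 becomes the C1 control U+0092
as_cp1252 = raw.decode("cp1252")      # 0x92 becomes U+2019, the curly apostrophe

print(repr(as_latin1))
print(repr(as_cp1252))
print(as_cp1252.encode("utf-8"))
```

If the data really is Windows-1252, converting from that charset rather than from ISO-8859-1 is the usual fix; MySQL's latin1 charset is itself based on cp1252.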

Is there any benefit to adding accept-charset="UTF-8" to HTML forms, if the page is already in UTF-8?

For pages already specified (either by HTTP header, or by meta tag), to have a Content-Type with a UTF-8 charset... is there a benefit of adding accept-charset="UTF-8" to HTML forms? (I understand the accept-charset attribute is broken in IE for ISO-8859-1, but I haven't heard of a problem with IE and UTF-8. I'm just asking if there's a...

Charset problems from one page to the other

Hi. I'm having the following problem. On one page with a form, "user_report.php", all characters like 'ç' or 'ã' are correctly displayed. Now when submitting the data, anything with those characters gets displayed/transferred to the second page, "result.php", all wrong, for example: 'Restauração' gets transferred as 'Restauração'...
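One common cause of this kind of symptom is text sent as UTF-8 and then read as ISO-8859-1. That this is what happens between the two pages is an assumption, not something the excerpt confirms. A small Python sketch of the effect and its reversal:

```python
original = "Restauração"

# Bytes written as UTF-8 but interpreted as ISO-8859-1: each accented
# letter turns into two unrelated characters.
garbled = original.encode("utf-8").decode("iso-8859-1")
print(garbled)

# Undoing the wrong decode recovers the text, as long as no bytes were lost.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

The lasting fix is to make both pages, the form submission and any database connection agree on one charset, rather than repairing strings after the fact.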

C++ encode string to Unicode - ICU library

Hi, I need to convert a bunch of bytes in ISO-2022-JP and ISO-2022-JP-2 (and other variations of ISO-2022) into Unicode. I am trying to use ICU, but the following code doesn't work. std::string input = "\x1B\x28\x4A" "ABC\xA6\xA7"; //the first 3 chars are escape sequence to use JIS_X201 character set in GL/GR UErrorCode...

Help with Extended ASCII/Encoding in PHP!!

Good Evening folks. This is my code: static private function removeAccentedLetters($input){ for ($i = 0; $i < strlen($input); $i++) { $input[$i]=self::simplify($input[$i]); } return $input; } static private function simplify($in){ $ord=ord($in); switch ($ord) { case 193: //Á... return 'A'; ...
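For comparison, a sketch of an alternative approach in Python rather than PHP: work on decoded text instead of single bytes, decompose each letter with Unicode normalization, and drop the combining marks. A byte-by-byte loop like the one above only works for single-byte encodings. The function name remove_accents is hypothetical.

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Strip diacritics by decomposing letters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents("\u00c1ngel"))   # U+00C1 is the code 193 handled in the switch
print(remove_accents("Restauração"))
```

Letters that are not a base letter plus an accent (for example "ø" or "ß") pass through unchanged and would still need an explicit mapping.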

How to convert domain names with greek characters to an ascii URL?

For example: When typing παιχνιδια.com into Firefox, it is automatically converted to xn--kxadblxczv9d.com Please suggest a tool for making such a conversion. One of the easiest is this. Converts and checks for availability at the same time. ...
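As a scriptable alternative to an online tool, Python's built-in idna codec performs this kind of conversion. A minimal sketch; note that the codec implements the older IDNA 2003 rules, which is an acceptable assumption for a plain lowercase name like this one.

```python
domain = "παιχνιδια.com"

# Each non-ASCII label is converted to its "xn--" Punycode form.
ascii_form = domain.encode("idna").decode("ascii")
print(ascii_form)

# The conversion is reversible.
round_trip = ascii_form.encode("ascii").decode("idna")
print(round_trip)
```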

php curl, xml content character problem.

Hello, I have just started developing in PHP. What I want to do is get XML content from another site, but when I get it like this $options = array( CURLOPT_RETURNTRANSFER => true, // return web page CURLOPT_HEADER => false, // don't return headers CURLOPT_ENCODING => "UTF-8", // handle compressed CURLOPT_USERAGE...

Rails fixtures encoding error "incompatible character encodings: ASCII-8BIT and UTF-8"

Using ruby 1.9.2 and Rails 3 I get an encoding error when I try to run this in seeds.rb: Fixtures.create_fixtures("#{Rails.root}/db/seed", "countries") I am sure the .csv file is encoded in UTF-8 and it can be read and parsed using ruby's CSV class. Is this a Rails 3 encoding issue with fixtures? ...

Clojure failing to print non-ASCII chars on OS X

I've installed Clojure 1.2.0 using Homebrew package management system on Mac OS X 10.6.4. Running: $ clj -e '(println "русский язык\n")' in the Terminal results in: ??????? ???? While running in the same terminal: $ php -r 'echo "русский язык\n";' displays the Cyrillic text correctly. The same effect when running $ clj <some .c...

we have ? in a url and ajp converts it to %3F

With the mod_jk connector we have this in our /etc/apache2/sites-available file: RewriteRule \/$ /op_ugw/orderportal/home?switchprofile=RecyledPlants [L] This works fine, and www.recycledplants.com will get you to the correct place. However, on an Ubuntu 10.04 server we set up AJP instead of mod_jk, so we have ProxyPass / ajp://10.1.1....

How to Determine "Lowest" Encoding Possible?

Scenario: You have lots of XML files stored as UTF-16 in a database or on a server where space is not an issue. You need to send the large majority of these files to other systems as XML files, and it is critical that you use as little space as you can. Issue: In reality only about 10% of the files stored as UTF-16 ne...
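One way to approach this, sketched in Python under stated assumptions: try a short list of candidate encodings on the decoded text and keep the one that produces the fewest bytes. The candidate list and the function name smallest_encoding are illustrative choices, not part of the question.

```python
CANDIDATES = ("ascii", "iso-8859-1", "utf-8", "utf-16")


def smallest_encoding(text, candidates=CANDIDATES):
    """Return (encoding_name, byte_length) for the most compact usable candidate."""
    best = None
    for name in candidates:
        try:
            size = len(text.encode(name))
        except UnicodeEncodeError:
            continue  # this encoding cannot represent every character
        # Strict "<" keeps the earlier (simpler) encoding on ties.
        if best is None or size < best[1]:
            best = (name, size)
    return best


for sample in ("hello", "café", "日本語"):
    print(sample, smallest_encoding(sample))
```

For XML the encoding declaration in the prolog has to be rewritten to match whichever encoding is chosen, otherwise the receiving parser will misread the file.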

Foreign characters in meta tags not displaying correctly

I have a site that is replicated in many languages. The site itself displays characters correctly, but when viewing source the meta tags show the "unknown character" question mark instead of the foreign character. What do I need to do differently for meta tags? I have this tag already: <meta http-equiv="content-type" content="applicat...

Mac OS X and Mercurial

Hello there. I recently acquired a MacBook. I compiled Mercurial 1.6.3 and set it all up with NetBeans. The thing is, whenever I try to commit, since I'm writing the revision message and my name with accented characters (in Spanish), I get an error like: transaction abort! rollback completed abort: decoding near 'Naim? Batuta ...

Convert Special Characters for RTF

Can someone please assist me with converting special characters to something that can be correctly represented in an RTF file? I am taking text stored in a string on the iPad and outputting it as an RTF file using NSASCIIStringEncoding. So far so good. What I have not managed to do is take special characters into account (e.g....
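A sketch of the usual technique, shown in Python rather than Objective-C: RTF stays pure ASCII if each non-ASCII character is written as a \uN control word, where N is the UTF-16 code unit as a signed 16-bit decimal, followed by a fallback character for readers that do not understand Unicode. The function name rtf_escape and the choice of "?" as the fallback are assumptions made for illustration.

```python
def rtf_escape(text: str) -> str:
    """Escape text for the body of an RTF document using \\uN? sequences."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF's own special characters
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # One \uN per UTF-16 code unit; N is a signed 16-bit value,
            # so code units above 32767 come out negative.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                value = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % value)
    return "".join(out)


print(rtf_escape("caf\u00e9 {draft}"))
```

Line breaks are not handled here; in RTF they need their own control words such as \par or \line.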

Arabic characters appear as ??? after adding a Filter to JSP page

When I add a Filter to a particular JSP file, the Arabic characters in the output appear as ???, even when the page encoding has been set to UTF-8 by <% @page pageEncoding="UTF-8"%> and <% response.setCharacterEncoding("UTF-8");%>. The strange thing is, before I added the Filter, the output of all Arabic pages appeared with correct e...

encoding guidelines to avoid issues with streams

Hello everyone, what guidelines should I follow to avoid encoding issues when reading files or converting strings to bytes, bytes to streams, streams to readers, etc.? Any important notes or tutorials would also help. Best regards, Keshav ...