character-encoding

Oracle Database character set issue with the audit tables on Debian

I've got Oracle XE installed on Debian Linux and the character set is configured to AL32UTF8. Several client applications connect to the database from Windows under different locales (French and others, not English). The client data these applications put into the database is fine: nothing is converted and text data in Fren...

How can I convert a string from windows-1252 to utf-8 in Ruby?

I'm migrating some data from MS Access 2003 to MySQL 5.0 using Ruby 1.8.6 on Windows XP (writing a Rake task to do this). Turns out the Windows string data is encoded as windows-1252 and Rails and MySQL are both assuming utf-8 input so some of the characters, such as apostrophes, are getting mangled. They wind up as "a"s with an accent ...
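A minimal sketch of the conversion this question asks about, written in Python rather than the asker's Ruby 1.8.6, and assuming the source bytes really are windows-1252. The helper name `cp1252_to_utf8` and the sample bytes are made up for illustration.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Reinterpret windows-1252 bytes as text, then re-encode as UTF-8."""
    text = raw.decode("cp1252")   # bytes -> Unicode text
    return text.encode("utf-8")   # Unicode text -> UTF-8 bytes

# Hypothetical sample: 0x92 is the curly apostrophe in windows-1252.
sample = b"it\x92s"
converted = cp1252_to_utf8(sample)
print(converted)  # b'it\xe2\x80\x99s'
```

Python's `cp1252` codec raises `UnicodeDecodeError` on the few byte values the code page leaves undefined, so real data may need an explicit `errors=` policy.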

Unable to set DecoderFallback property of an Encoding type.

Hello, I'm attempting to set the DecoderFallback property of an arbitrary (but supported) encoding in my C# app. Essentially what I'm trying to do is this: ASCIIEncoding ascii = new ASCIIEncoding(); ascii.DecoderFallback = new DecoderExceptionFallback(); I'm getting an exception I've never seen before: System.InvalidOperationExc...

How to change character encoding of XmlReader

I have a simple XmlReader: XmlReader r = XmlReader.Create(fileName); while (r.Read()) { Console.WriteLine(r.Value); } The problem is that the XML file has ISO-8859-9 characters in it, which makes XmlReader throw an "Invalid character in the given encoding." exception. I can solve this problem by adding <?xml version="1.0" encoding="IS...

Unicode Problem with SQLAlchemy

I know I'm having a problem with a conversion from Unicode but I'm not sure where it's happening. I'm extracting data about a recent European trip from a directory of HTML files. Some of the location names have non-ASCII characters (such as é, ô, ü). I'm getting the data from a string representation of the file using regex. If I ...

How to force Internet Explorer to use the encoding given in the meta tag?

I'm trying to prepare a demo HTML page with mixed English and Arabic content. Basically it contains a small table with English phrases on the left, and the Arabic translation on the right side. Because I don't understand Arabic, I took the first three characters of the Arabic alphabet from the Unicode reference. First attempt, using th...

What is the most efficient binary to text encoding?

The closest contenders that I could find so far are yEnc (2% overhead) and ASCII85 (25% overhead). There seem to be some issues with yEnc, mainly that it uses an 8-bit character set. Which leads to another thought: is there a binary to text encoding based on the UTF-8 character set? ...
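The 25% figure for ASCII85 can be checked with a short Python sketch using the standard `base64` module. Base64 is included only as a familiar comparison point (the question does not mention it), yEnc is left out because the standard library has no codec for it, and the `overhead` helper and payload are made up for illustration.

```python
import base64

def overhead(encode, data: bytes) -> float:
    """Fractional size increase of the encoded form over the raw bytes."""
    return len(encode(data)) / len(data) - 1

# Hypothetical payload: 1200 bytes, none of them zero, so Ascii85's
# shortcut for all-zero 4-byte groups never applies.
payload = bytes((i % 251) + 1 for i in range(1200))

print(f"Ascii85: {overhead(base64.a85encode, payload):.0%}")  # Ascii85: 25%
print(f"Base64:  {overhead(base64.b64encode, payload):.0%}")  # Base64:  33%
```

Ascii85 maps every 4 input bytes to 5 output characters, which is where the 25% comes from; Base64 maps 3 bytes to 4 characters.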

Read UTF-8 XML with MSXML 4.0

I have a problem with classic ASP / VBScript trying to read a UTF-8 encoded XML file with MSXML. The file is encoded correctly, I can see that with all other tools. Constructed XML example: <?xml version="1.0" encoding="UTF-8"?> <itshop> <Product Name="Backup gewünscht" /> </itshop> If I try to do this in ASP... Set fso = Server...

Are there any problems converting between SHIFT_JIS and Unicode encodings?

I've heard there are (used to be?) ambiguous mappings between Unicode and SHIFT_JIS codes. This KB article somewhat proves this. So the question is: will I lose any data if I take SHIFT_JIS-encoded text, convert it to Unicode and back? Details: I'm talking about Windows (XP and on) and .NET (which in theory relies on NLS API). ...

Encoding problems in Linux & MySQL

I developed my Java/EE program on a Windows machine and everything worked perfectly there, but when I installed my WAR to JBoss on a Linux machine I got encoding issues with MySQL when I import CSV files. The CSV files are encoded as ISO-8859-1, and the file I import is encoded as ISO-8859-1. MySQL doesn't seem to get Strings encoded as UT...

PHP: Detect invalid characters in a text

Hello! I would like to parse user inputs with PHP. I need a function which tells me if there are invalid characters in the text or not. My draft looks as follows: <?php function contains_invalid_characters($text) { for ($i = 0; $i < 3; $i++) { $text = html_entity_decode($text); // decode html entities } // loop is used ...

PHP: 2 strings - which one is UTF-8 and which one not?

Hello! I have a database with lots of strings. Some of them are correctly UTF-8 encoded, some of them not. Therefore, I've set up a script which selects 100 strings from the db. The following function decides whether a string contains UTF-8 or not (no matter if it's correct): function detectUTF8($text) { return preg_match('%(?: ...
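One way to test the "correctly UTF-8 encoded" half of this question is a strict decode rather than a regular expression. The sketch below is in Python, not the asker's PHP, and is a swapped-in technique, not the `detectUTF8` function from the excerpt; the name `is_valid_utf8` and the sample strings are hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict mode: any malformed sequence raises
    except UnicodeDecodeError:
        return False
    return True

# Hypothetical samples: the same word encoded two different ways.
print(is_valid_utf8("café".encode("utf-8")))    # True
print(is_valid_utf8("café".encode("latin-1")))  # False
```

Pure ASCII input also passes this check, so it cannot by itself tell "UTF-8" apart from "plain ASCII"; it only rejects byte sequences that are not well-formed UTF-8.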

PHP Function to replace symbols with character codes to stop SQL Injection

I am trying to write a PHP function to stop MySQL injection attempts. What I am doing is using str_replace() to remove symbols and replace them with their HTML character code. My issue is that the codes all contain &#; but I also want to replace those symbols with their codes. How can I do this without changing the code into somet...

Using wrong encoding when writing to a file C#

Hi all, I'm creating a binary file to transmit to a third party that contains images and information about each image. The file uses a record length format, so each record is a particular length. The beginning of each record is the Record Length Indicator, which is 4 characters long and represents the length of the record in Big Endia...

How to type a new line character in SQL Server Management Studio

In the "datagrid" view of an open table of data, how can I type a new line character into an nvarchar field directly in SSMS? Is there an alt code? ...

Why are extended ASCII characters (â, é, etc.) getting replaced with <?> characters?

I attached a picture. I am using PHP to pull the data from MySQL, and some of these locations have extended characters. I am using the Arial font. You can see the screenshot here: http://img269.imageshack.us/i/funnychar.png/ Still happening...

I have latin1 encoded data sitting in a UTF-8 MySQL database, how do I fix this?

Unfortunately, there is no original data to go back to. I figured this much out because the only way I could display the data correctly was to set everything to latin1 in PHP, HTML and MySQL. Once this is completed, I can change everything back to utf-8 in my HTML an...

Do I really need to switch from VARCHAR to VARBINARY for UTF-8 in MySQL & PHP?

Do I really need to switch from VARCHAR to VARBINARY and TEXT to BLOB for UTF-8 in MySQL & PHP? Or can I stick with CHAR/TEXT fields in MySQL? ...

What are the differences between utf8_general_ci and utf8_unicode_ci?

I've got two options for Unicode that look promising for a MySQL database. utf8_general_ci: Unicode (multilingual), case-insensitive. utf8_unicode_ci: Unicode (multilingual), case-insensitive. Can you please explain what the difference is between utf8_general_ci and utf8_unicode_ci? What are the effects of choosing one over the other when...

Fix file encoding when downloading a file from Linux to Windows in PHP

OK, I have an issue. I have a Linux web server (RHEL 4 with Apache 2) that is used to house an application. Part of this application is a set of PHP scripts. I created a script that accepts some form variables and then downloads a file to the user. Here is the code: header('Content-Description: File Transfer'); header('Content-Type: ...