character-encoding

how to encode/decode escape sequence characters in python

How can I encode/decode an escape sequence character such as '\x13' in Python into a character that is valid in RSS or XML? My use case: I am getting data from arbitrary sources and building an RSS feed from that data. The data sources sometimes contain escape sequence characters, which break my RSS feed. So how can I sanitize the input data with e...
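One possible way to sanitize such input is sketched below: it removes every character that falls outside the `Char` production of XML 1.0 (tab, newline, carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF). The function name `strip_invalid_xml_chars` and the `replacement` parameter are made up for illustration; whether dropping the characters (rather than substituting something visible) suits the feed is an assumption.

```python
import re

# Matches any character NOT permitted by the XML 1.0 "Char" production.
# Control characters such as '\x13' fall into this set.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return `text` with XML-1.0-illegal characters replaced (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    dirty = "Breaking\x13 news"
    print(repr(strip_invalid_xml_chars(dirty)))
```

Markup characters such as `<` and `&` are still legal XML characters, so they would need ordinary escaping (for example with `xml.sax.saxutils.escape`) as a separate step.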

css renders with strange characters.

Hi friends, I have index.html and global.css files. When I open these files in Coda, TextMate, etc., everything looks fine. Then I try in Firefox: index.html loads the CSS from the right path, but it doesn't take effect. Then I tried to view the CSS code from Firefox, and I see signs like: ॵ氮扵汬整筰慤摩湧㨰‵灸‰′㕰硽畬⹢畬汥琠汩筬楳琭獴祬攺摩獣㭰慤摩湧㨲灸紮摲慷汩湥筢潲摥爭扯瑴潭㨱灸⁤慳桥...
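Output of this kind is what ASCII bytes look like when every pair of bytes is read as one UTF-16 code unit. The sketch below reproduces the effect in Python; that this is what happened to the file in the question is an assumption, and the CSS snippet and the helper name `misread_as_utf16` are invented for the demonstration.

```python
def misread_as_utf16(data: bytes) -> str:
    """Decode ASCII bytes as big-endian UTF-16 to show the resulting garbage."""
    if len(data) % 2:
        data += b" "  # pad so the bytes pair up into 16-bit code units
    return data.decode("utf-16-be")


if __name__ == "__main__":
    css = b"ul.bullet li{list-style:disc}"  # hypothetical stylesheet content
    garbled = misread_as_utf16(css)
    print(garbled)  # mostly CJK-looking characters

    # Reversing the mistake recovers the original text (plus any padding).
    print(garbled.encode("utf-16-be").decode("ascii"))
```

If this is the cause, checking how the editor saved the file and which charset the server declares for the stylesheet would be the places to look.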

Encoding detection library in python

This is somewhat related to my question here. I process tons of texts (mainly HTML and XML) fetched via HTTP. I'm looking for a library in Python that can do smart encoding detection based on different strategies and convert texts to Unicode using the best possible character-encoding guess. I found that chardet does auto-detection extrem...
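As a rough illustration of one such strategy, the sketch below tries a short list of candidate encodings in order and falls back to Latin-1, which can decode any byte sequence. It uses only the standard library and is not how chardet works; the function name `decode_best_effort` and the default candidate list are assumptions chosen for the example.

```python
from typing import Iterable, Tuple


def decode_best_effort(
    data: bytes, candidates: Iterable[str] = ("utf-8-sig", "cp1252")
) -> Tuple[str, str]:
    """Return (text, encoding_used) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this never raises.
    return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_best_effort("café".encode("utf-8")))
    print(decode_best_effort("café".encode("cp1252")))
```

A fuller pipeline would probably consult the HTTP `Content-Type` header and any in-document declaration before resorting to guessing.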

PHP/MySQL, discarding Unicode sent from a client

All of our tables are currently set with a LATIN1 character set. A user is currently capable of putting together Unicode sequences on the client and trying to embed them into our application. What's the best way to keep all Unicode characters from hitting our database? Even better, what's the best way to ensure that only characters ba...

Changing character sets on a live MySQL database

I currently have a bunch of tables using the latin1 charset in a MySQL 5.1.x DB. Problem is, we recently had a bunch of users trying to input text using UTF-8 encoding, and that seemed to break things. Is it safe to blindly update the table's character set? What are some best practices (besides obviously backing everything up) for a sit...

How do I encode diacritics for Twitter updates?

I have my own Twitter API and I've received a couple emails about a problem when trying to post a status update with accent marks and other diacritics. I would like to encode these so that the status update still has them. I know there are ways to remove the diacritic, but I would like to keep it. I read the Twitter Counting Character...

ASCII-EBCDIC Java converter, using COMTBLG Microsoft SNA Server GTable

Hi, I'm currently working on a Java host integration. The current system uses Microsoft SNA Server, where an ASCII-EBCDIC conversion is done based on a local COMTBLG GTable. Do you know the specification of this file? Has anyone written a Java program to read it? Thanks in advance. Esteve ...

Regular Expression with foreign languages

I have a function that I have used a bunch of times in various files, with a signature like: Translate("English Message", "Spanish Message", "French Message"). I want to pull out the English, Spanish and French messages and then output them into a CSV so that people who actually know these languages can tell me what I SHO...

Illegal mix of collations in mySQL

I need to transfer a column from one table to another. The source table has a different collation than the target table (latin1_general_ci and latin1_swedish_ci). I use UPDATE target LEFT JOIN source ON target.artnr = source.artnr SET target.barcode = source.barcode I get an "illegal mix of collations". What is a quick fix to ge...

Foreign characters turn into garbage in mysql

I am in the U.S. I have the following line in my web page: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> And my MySQL table is MyISAM latin1_swedish_ci. But when someone fills out a form with a foreign character, it gets stored in MySQL as garbage. An example would be an e with an accent over it, etc. - something...
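Garbage of this kind typically appears when UTF-8 bytes are interpreted as a single-byte encoding somewhere between the form and the table. The sketch below shows the effect in Python using its `latin-1` codec as a stand-in for the MySQL connection or column charset; that this is the exact path the data takes in the question is an assumption.

```python
original = "é"

# The browser submits the form as UTF-8: two bytes for this character.
utf8_bytes = original.encode("utf-8")

# If those bytes are then treated as Latin-1, each byte becomes its own character.
misread = utf8_bytes.decode("latin-1")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = misread.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(utf8_bytes, misread, repaired)
```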

How are these strange characters getting into Crystal Reports PDF Export?

We have a Crystal Reports 2008 report which merges database data with some SurveyMonkey free-text data stored in an Excel spreadsheet. The free text data looks OK in Excel, looks OK when copied/pasted to Notepad, and looks OK in the Crystal Report. But when we export the crystal report to PDF, a lot of strange box characters get append...

How would you design an 8-bit encoding?

How would you design an 8-bit encoding of a set of 256 characters from western languages (say, with the same characters as ISO 8859-1) if it did not have to be backward-compatible with ASCII? I'm thinking of rules of thumb like these: if ABC...XYZabc...xyz0123...89 were, in this order, the first characters of the set (codes from 0 to 61), th...

Appropriate character encoding / collation to store URLs?

My web application stores URL segments in a database. These URL segments are based on user-submitted content. What collation should I use for character strings that will appear in URLs? My assumption is ASCII General CI (?) based on this question: http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid ...

ruby string encoding

So, I'm trying to do some screen scraping off of a certain site using nokogiri, but the site owners failed to specify the proper encoding of the page in a <meta> tag. The upshot of this is that I'm trying to deal with strings that think they're utf-8, but really aren't. (If you care, here are the files I was using to test this: main ...

Dealing with eacute and other special characters using Oracle, PHP and Oci8

Hi, I am trying to store names in an Oracle database and fetch them back using PHP and oci8. However, if I insert the é directly into the Oracle database and use oci8 to fetch it back, I just receive an e. Do I have to encode all special characters (including é) into HTML entities (i.e. &eacute;) before inserting into the database ... or am ...

accented French characters

Is there any problem with ASPX rendering French accented characters? I am using UTF-8 to encode. I never had any problem like this before (but since this is the first time I am working on an ASP server, is there any fix?) e.g. Événements = Événements Journées fériées = Journées fériées Is this an encoding problem? or is there any ...

Turning HTML character entities to 'regular' letters... why is it only partially working?

I'm using all of the below to take a field called 'code' from my database, get rid of all the HTML entities, and print it 'as usual' to the site: <?php $code = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $code); $code = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $code); $code = html_entity_decode($code); ?> H...

Dealing with UTF-8 numbers in Python

Hi, I have read many similar questions, apologies if this is considered a duplicate. Suppose I am reading a file containing 3 comma-separated numbers. The file was saved with an unknown encoding; so far I am dealing with ANSI and UTF-8. If the file was in UTF-8 and it had 1 row with values 115,113,12 then: with open(file) as f: ...

Cannot select Unicode data from PostgreSQL with LIKE

I have a PostgreSQL database with some Unicode values. For example "vaishali" in Marathi. I want to fire a query SELECT * FROM table WHERE name LIKE vaishali (I type "vaishali" in Marathi, so I first convert to unicode in my prog). But it matches nothing. Why? ...

ASP.NET chart controls & character encoding issues

I'm trying to use the ASP.NET chart controls for a website that is localised for a number of languages. However, we've had issues with the charts since we recently added a Chinese localisation - all of the labels show squares where we actually want Chinese characters, as shown in my sample below (please note I don't know any Chinese so thi...