latin1

C#: "Swedish" characters in XPath when parsing Latin1-encoded docs.

I have a set of HTML docs that I need to parse. They are encoded in Latin-1. I'm using Html Agility Pack for "parsing". I have an XPath query (with Swedish characters) that I can't get to work, presumably because the encoding of the docs differs from the encoding VS stores the XPath query in. XPath query: doc.DocumentNode.SelectNodes(@"/...

Converting Composite Bytes to Unicode in MySQL

I have a MySQL database that I recently migrated to another server. Unfortunately, MySQL dumps its data in Latin1 with any UTF-8 characters represented by composite bytes (ex. – instead of —). Is it possible to run a simple query or script that would convert these composite bytes to UTF-8 within my tables? It's impossible to do it row...

What happens if I connect to a utf8 MySQL DB table using latin1?

Interesting question... if I have a MySQL table with CHARSET=utf8, and I open a connection with latin1 encoding, what happens? I tried this, and even characters such as ß and æ could be stored and retrieved properly. Those characters are represented with different byte sequences in utf8 and in latin1, so I didn't expect it to work. Is ...
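The byte-level difference mentioned in this excerpt can be checked directly. The following is a minimal Python sketch for illustration only: it does not touch MySQL or model what the connection does, and the variable names are hypothetical.

```python
# Encode the two characters from the question in both character sets
# and keep the resulting byte strings for comparison.
samples = ["ß", "æ"]

encodings = {}
for ch in samples:
    latin1_bytes = ch.encode("latin-1")  # a single byte in Latin-1
    utf8_bytes = ch.encode("utf-8")      # two bytes in UTF-8 for these characters
    encodings[ch] = (latin1_bytes, utf8_bytes)
    print(ch, latin1_bytes.hex(), utf8_bytes.hex())
```

Each character is one byte in Latin-1 and two bytes in UTF-8, so the stored bytes do differ depending on which encoding is in effect.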

Fixing older program: database text encoding, and incorrect field types.

I'm once again working on a program from when I was, umm... less capable. It has a number of problems: The database collation is latin1_swedish_ci. I would like to convert it to utf8. How would I do this? The database has some fields that are boolean values stored as 0 or 1. However, the fields are varchars instead of bools. How c...

Converting a database from one character encoding to another

I have a MySQL database. Text is currently stored in charset latin1, collation latin1_swedish_ci. These are the defaults and it wasn't a problem back in the day when the database was originally created. I want to switch over to UTF8 so the text encoding in the database matches our text encoding used elsewhere on the web site that uses t...

How can I convert an XML document from Latin-1 to UTF-8 in Perl?

We at the company want to convert all the sites we are hosting from Latin-1 to UTF-8. After a lot of googling, we have our Perl script almost complete. The only things missing now are the XML files. What is the best way to convert XML from Latin-1 to UTF-8 and is it useful? I am asking because we are unsure about it since most en...

Collate information missing when converting a MySQL table from Latin1 to UTF8

I'm converting an existing table such as this: CREATE TABLE `example`(`id` int(10) unsigned NOT NULL AUTO_INCREMENT, `column1` char(32) COLLATE latin1_general_ci NOT NULL DEFAULT '', `column2` char(64) COLLATE latin1_general_ci NOT NULL ...

PHP/MySQL, discarding Unicode sent from a client

All of our tables are currently set with a LATIN1 character set. A user is currently capable of putting together Unicode sequences on the client and trying to embed them into our application. What's the best way to keep all Unicode characters from hitting our database? Even better, what's the best way to ensure that only characters ba...

How can I convert Cyrillic stored as LATIN1 ( sql ) to true UTF8 Cyrillic with iconv?

I have a SQL dump file consisting of incorrectly stored Cyrillic Russian ( WINDOWS-1251 ) text, example Èðàíñêèå which should properly be displayed as Иранские. In the past I have successfully converted the sql file but memory fails in what I did and in what order. Logically it would make sense that since it's stored in LATIN1 I would ...
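The sample string in this excerpt can be used to illustrate the round trip. The question asks about iconv; the sketch below swaps in Python's built-in codecs to show the same two-step idea. It assumes the garbled text is really Windows-1251 bytes that were displayed as Latin-1, which holds for the sample but is not confirmed for the whole dump.

```python
# Sample text from the question, as it appears when CP1251 bytes
# are misread as Latin-1.
garbled = "Èðàíñêèå"

# Step 1: undo the wrong decoding to get the original bytes back.
raw = garbled.encode("latin-1")

# Step 2: decode those bytes with the encoding they were written in.
fixed = raw.decode("cp1251")
print(fixed)

# The corrected text can then be written out as real UTF-8.
utf8_bytes = fixed.encode("utf-8")
```

The order matters: the text has to be taken back to raw bytes through Latin-1 before it is reinterpreted as Windows-1251.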

Using .NET, how to convert ISO-8859-1 encoded text files that contain Latin-1 accented characters to UTF-8

I am being sent text files saved in ISO-8859-1 format that contain accented characters from the Latin-1 range (as well as normal ASCII a-z etc). How to convert these files to UTF-8 using C# so that the single-byte accented characters in ISO-8859-1 become valid UTF-8 characters? I have tried to use a StreamReader with ASCIIEncoding, and ...
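The conversion described here comes down to decoding the file's bytes as ISO-8859-1 and re-encoding them as UTF-8. The question is about C#; the sketch below uses Python instead, purely to illustrate the idea, and the function name is hypothetical. An ASCII decoder cannot work for this, since ASCII only defines byte values 0 to 127 and the accented characters sit above that range.

```python
from pathlib import Path


def convert_latin1_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a file from ISO-8859-1 to UTF-8 (hypothetical helper)."""
    # Work on raw bytes so line endings pass through unchanged.
    text = src.read_bytes().decode("iso-8859-1")
    dst.write_bytes(text.encode("utf-8"))
```

Every byte value is valid in ISO-8859-1, so the decode step never fails; it will, however, silently produce wrong characters if the input was actually in some other encoding.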

Differences between utf8 and latin1

What is the difference between utf8 and latin1? ...

latin1/unicode conversion problem with ajax request and special characters

Server is PHP5 and HTML charset is latin1 (iso-8859-1). With regular form POST requests, there's no problem with "special" characters like the en dash (–) for example. Although I don't know for sure, it works. Probably because there exists a representable character for the browser at char code 150 (which is what I see in PHP on the serve...
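The char code 150 mentioned here can be inspected on its own. This is a small Python sketch, not PHP, and it only looks at the single byte; it assumes nothing about what the browser or server in the question actually does.

```python
import unicodedata

raw = bytes([150])  # the char code reported in the question (0x96)

# Windows-1252 assigns a printable character to this byte.
as_cp1252 = raw.decode("cp1252")

# Strict ISO-8859-1 maps the same byte to a C1 control character.
as_latin1 = raw.decode("latin-1")

print(unicodedata.name(as_cp1252))  # name of the Windows-1252 character
print(hex(ord(as_latin1)))          # code point under ISO-8859-1
```

The two encodings agree on most bytes but not on the range 0x80 to 0x9F, which is where this byte falls.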

rails, mysql charsets & encoding: binary

Hi, I have a Rails app that runs using utf-8. It uses a MySQL database, all tables with MySQL's default charset and collation (i.e. latin1). Therefore the latin1 tables contain utf-8 data. Sure, that's not nice, but I'm not really interested in it. Everything works fine, because the connection encoding is latin1 as well and therefore mysq...

How to store characters like ♥☆ to DB?

Previous issue - was not able to store non-english characters: http://stackoverflow.com/questions/3008918/how-to-store-non-english-characters That was fixed by using UTF8. But realized today that symbols like ♥☆ are not stored correctly. They get converted to characters like ♥☆. How can this be fixed? ...

MySql varchar change from Latin1 to UTF8

In a MySQL table I'm using the Latin1 character set to store text in a varchar field. As our website is now supported in more countries we need support for UTF8 instead. What will happen if I change these fields to UTF8 instead? Is it safe to do this or will it mess up the data inside these fields? Is it something I need to think about whe...

Use Latin characters in App Engine

How can I store Latin characters in App Engine (e.g. "peña")? When I try to store this I get this error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 2: ordinal not in range(128) I can replace the ñ with n, but is there a better way? And if I encode the value, how can I print "Peña" again? ...
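The error quoted in this excerpt can be reproduced outside App Engine. The sketch below is plain Python 3 and assumes the incoming bytes are Latin-1, which matches the 0xf1 byte in the message; it does not model the datastore or the question's actual runtime.

```python
# "peña" as Latin-1 bytes: the ñ is the single byte 0xf1 at index 2.
raw = b"pe\xf1a"

error_position = None
try:
    raw.decode("ascii")  # ASCII only covers bytes 0 to 127
except UnicodeDecodeError as exc:
    error_position = exc.start

# Decoding with the right codec gives a proper text string.
text = raw.decode("latin-1")

# Encode explicitly for storage, decode again when reading back.
stored = text.encode("utf-8")
restored = stored.decode("utf-8")
print(restored)
```

The character survives the round trip as long as the same codec is named on the way in and on the way out.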

inserting latin1-encoded text into utf8 tables (forgot to use mysql_set_charset)

I have a PHP web app with MySQL tables taking utf8 text. I recently converted the data from latin1 to utf8 along with the tables and columns accordingly. I did, however, forget to use mysql_set_charset and the latest incoming data I would assume came through the MySQL connection as latin1. I don't know what happens when latin1 comes in t...

Python 3 chokes on CP-1252/ANSI reading

I'm working on a series of parsers where I get a bunch of tracebacks from my unit tests like: File "c:\Python31\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 112: character maps to <undefined> T...
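The traceback in this excerpt comes from a byte that Python's cp1252 codec leaves undefined. A minimal sketch of the behavior follows; the sample bytes are made up for illustration and are not taken from the question's data.

```python
# Hypothetical input: "café" in a single-byte encoding plus a stray 0x81.
raw = b"caf\xe9 \x81"

try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True  # 0x81 has no mapping in Python's cp1252 table

# Latin-1 maps every byte value, so it never raises.
as_latin1 = raw.decode("latin-1")

# Alternatively, keep cp1252 and substitute U+FFFD for unmapped bytes.
lenient = raw.decode("cp1252", errors="replace")
```

Which option is appropriate depends on what the files really contain: Latin-1 avoids the exception but may mislabel characters in the 0x80 to 0x9F range, while `errors="replace"` keeps cp1252 semantics and marks the bad bytes.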

Utf-8 characters displayed as ISO-8859-1

Hi there, I've got an issue with inserting/reading utf8 content from a db. All the checks I'm doing seem to point to the fact that the content in my DB should be utf8 encoded, however it seems to be Latin-1 encoded. The data are initially imported by a PHP script from the CLI. Configuration: Zend Framework Version: 1.10.5 mysql-ser...

How can I detect non-western characters?

I want to disallow certain UTF-8 input (server-side), e.g. Eastern languages, where example input might be " 伊 ". However, I do want to continue supporting other Latin or "Latin-like" characters, such as the Welsh ŵ and ŷ, so checking against Latin-1 is not possible. What are my options? (if language specific, PHP preferred) Thanks ve...