questions about utf-8 | ansaurus

utf-8

Dealing with ISO-encoding in AJAX requests (prototype)

I have an HTML page that's encoded in ISO-8859-1 and a Prototype AJAX call that's built like this: new Ajax.Request('api.jsp', { method: 'get', parameters: {...}, onSuccess: function(transport) { var ajaxResponse = transport.responseJSON; alert(ajaxResponse.msg); } }); The api.jsp returns its data in IS...

C++ UTF-8 output with ICU

I'm struggling to get started with the C++ ICU library. I have tried to get the simplest example to work, but even that has failed. I would just like to output a UTF-8 string and then go from there. Here is what I have: #include <unicode/unistr.h> #include <unicode/ustream.h> #include <iostream> int main() { UnicodeString s = UNI...

Get ant concat to ignore BOMs?

I have an ant build that concatenates my JavaScript into one file and then compresses it. The problem is that Visual Studio's default encoding attaches a BOM to every file. How do I configure ant to strip out BOMs that would otherwise appear in the middle of the resulting concatenated file? My googling revealed this discussion which i...

Unicode characters in URLs

In 2010, would you serve URLs containing UTF-8 characters in a large web portal? Unicode characters are forbidden as per the RFC on URLs (see here). They would have to be percent encoded to be standards compliant. My main point, though, is serving the unencoded characters for the sole purpose of having nice-looking URLs, so percent enc...
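The percent-encoding that the question refers to can be illustrated with a short Python sketch. The path below is a hypothetical example, not one taken from the question, and the sketch only shows the mechanics of the encoding, not whether serving unencoded URLs is advisable.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a non-ASCII character (not from the question).
pretty_path = "/wiki/Z\u00fcrich"

# quote() encodes the text as UTF-8 and percent-escapes every non-ASCII byte.
# The "/" separator is left untouched by default.
encoded_path = quote(pretty_path)

# unquote() reverses the process and decodes the bytes as UTF-8 again.
decoded_path = unquote(encoded_path)

print(encoded_path)
print(decoded_path)
```

Browsers commonly display the decoded form in the address bar while sending the percent-encoded form over the wire, which is one reason the two representations can coexist.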

SQL tables using VARCHAR with UTF8 (with respect to multi byte character length)

As with Oracle's VARCHAR( 60 CHAR ), I would like to specify a varchar field whose length is counted in characters rather than bytes. For example: create table X (text varchar(3)) insert into X (text) VALUES ('äöü') should be possible (with UTF8 as the default charset of the database). On DB2 I got this error: DB2 SQL Error: SQLCODE...
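The gap between character length and byte length that this question runs into can be checked with a small Python sketch. It only illustrates the arithmetic; it makes no claim about how DB2 or any other database counts a VARCHAR length.

```python
# The three-character string from the question, written with escapes so the
# example does not depend on the encoding of this file.
text = "\u00e4\u00f6\u00fc"  # 'äöü'

char_count = len(text)                  # number of code points
byte_count = len(text.encode("utf-8"))  # number of bytes once UTF-8 encoded

print(char_count, byte_count)
```

Each of the three letters needs two bytes in UTF-8, so a column limited to 3 bytes cannot hold the value even though it is only 3 characters long.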

Should I convert overlong UTF-8 strings to their shortest normal form?

I've just been reworking my Encoding::FixLatin Perl module to handle overlong UTF-8 byte sequences and convert them to the shortest normal form. My question is quite simply "is this a bad idea"? A number of sources (including this RFC) suggest that any over-long UTF-8 should be treated as an error and rejected. They caution against "n...
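As an illustration of what "overlong" means here, the following Python sketch checks a few byte sequences against Python's built-in strict UTF-8 decoder. The helper name `is_strict_utf8` is hypothetical, and the sketch shows the reject-on-error behaviour only; it is not the approach taken by Encoding::FixLatin.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+002F ("/") in its shortest form and in two overlong forms.
shortest_slash = b"/"
overlong_slash_2 = b"\xc0\xaf"      # 2-byte overlong encoding
overlong_slash_3 = b"\xe0\x80\xaf"  # 3-byte overlong encoding

for candidate in (shortest_slash, overlong_slash_2, overlong_slash_3):
    print(candidate, is_strict_utf8(candidate))
```

The classic security concern is that a filter looking for the single byte `/` would miss the overlong forms if a later stage decoded them leniently.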

setting the header of a response in python / django

This is my code: template = loader.get_template('blog/post.html') c = Context(parameterDict) return HttpResponse(template.render(c)) I am using this to render data (contained in parameterDict) into a template. The problem is that parameterDict contains certain UTF characters like ®. This is causing a problem in my template...

How can I output a UTF-8 encoded XML file with unix line-endings from ActivePerl on Windows?

I'm running ActivePerl 5.8.8 on WinXP. I'd like to output an XML file as UTF-8 with UNIX line endings. I've looked at the perldoc for binmode, but am unsure of the exact syntax (if I'm not barking up the wrong tree). The following doesn't do it (forgive my Perl - it's a learning process!): sub SaveFile { my($FileName, $Contents) = ...

How do you print raw UTF-8 characters from their numbers? [PHP]

Say I wanted to print a ÿ (latin small y with diaeresis) from its Unicode code point U+00FF or its UTF-8 bytes c3 bf. How can I do that in PHP? The reason is that I need to be able to create certain UTF-8 characters for testing my regex and string functions. However, since I have less than 200 keys on my keyboard I can't type them - a...
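The question asks about PHP; the same idea is sketched here in Python purely to show the two routes mentioned, going from the code point and going from the raw UTF-8 bytes. The variable names are illustrative only.

```python
# Route 1: build the character from its Unicode code point, U+00FF.
from_code_point = chr(0x00FF)

# Route 2: build it from the UTF-8 byte sequence c3 bf.
from_hex_bytes = bytes.fromhex("c3 bf").decode("utf-8")

# Both routes give the same one-character string.
print(from_code_point == from_hex_bytes)

# Going the other way recovers the bytes quoted in the question.
print(from_code_point.encode("utf-8").hex())
```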

How would you create a string of all UTF-8 characters? [PHP]

There are many ways to represent the +1 million UTF-8 characters. Take the latin capital "A" with macron (Ā). This is unicode code point U+0100, hex number 0xc4 0x80, decimal number 196 128, and binary 11000100 10000000. I would like to create a collection of the first 65,535 UTF-8 characters for use in testing applications. These are a...
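A minimal Python sketch of such a collection follows (the question itself targets PHP). It assumes "the first 65,535 characters" means the Basic Multilingual Plane, and it skips the surrogate range U+D800 to U+DFFF because those code points cannot be encoded as UTF-8. The function name `bmp_characters` is hypothetical.

```python
def bmp_characters():
    """Yield every BMP code point as a one-character string, minus surrogates."""
    for code_point in range(0x10000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates have no valid UTF-8 encoding
        yield chr(code_point)


chars = list(bmp_characters())
utf8_blob = "".join(chars).encode("utf-8")  # succeeds: every entry is encodable

# Check the figures quoted in the question for U+0100 (A with macron).
a_macron_bytes = "\u0100".encode("utf-8")
hex_form = [f"0x{b:02x}" for b in a_macron_bytes]
decimal_form = list(a_macron_bytes)
binary_form = [f"{b:08b}" for b in a_macron_bytes]

print(len(chars))
print(hex_form, decimal_form, binary_form)
```

Note that the list still contains control characters and code points that are unassigned or designated noncharacters, so a test suite may want to filter further depending on what it is exercising.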

cannot output a json encoded dict containing accents (noob inside)

Hi all, here is a fairly simple example which has been driving me nuts for a couple of days. Consider the following script: # -*- coding: utf-8 -* from json import dumps as json_dumps machaine = u"une personne émérite" print(machaine) output = {} output[1] = machaine jsonoutput = json_dumps(output) print(jsonoutput) The result of thi...
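The excerpt is cut off before the actual output, so the following is only a sketch of one behaviour that commonly surprises people here: by default `json.dumps` escapes non-ASCII characters as `\uXXXX`, and `ensure_ascii=False` keeps them as literal characters. It is written for Python 3 and may not match the exact problem the asker hit.

```python
import json

# Same data as in the question, written with escapes for portability.
machaine = "une personne \u00e9m\u00e9rite"
output = {1: machaine}

# Default: non-ASCII characters are written as \uXXXX escape sequences.
escaped = json.dumps(output)

# ensure_ascii=False: accented characters are emitted as-is.
literal = json.dumps(output, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms are valid JSON and decode back to the same text.
print(json.loads(escaped) == json.loads(literal))
```

Note that the integer key `1` becomes the string `"1"` in the JSON output, since JSON object keys are always strings.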

Problem with ajax and posting non-latin characters

Posting non-Latin text with ajax + jquery doesn't save the correct text to MySQL. What I have done is this: I am getting multiple translated words from Google's translation API. The ajax request is showing the correct translations for all languages. But when I try to insert this into the db it shows up in phpMyAdmin as g...

Why can't I assign a scalar value to a class using shorthand, but instead declare it first, then set its value?

I am writing a UTF-8 library for C++ as an exercise as this is my first real-world C++ code. So far, I've implemented concatenation, character indexing, parsing and encoding UTF-8 in a class called "ustring". It looks like it's working, but two seemingly equivalent ways of declaring a new ustring behave differently. The first way: ustri...

Rails 2.3.5, Ruby 1.9, SQLite 3 incompatible character encodings: UTF-8 and ASCII-8BIT

Hello, I know that a question with the same title was asked almost 6 months ago. I have Googled for this problem and I have not found any working solution. Have there been any fixes for this very critical problem? I need to get my website running ASAP. Just to get the site up and running I'm even ready to add utf8 conversion methods to ...

mysql replace accented characters

Hi, I would like to generate strictly alphanumeric logins from users' first and last names. Since many of them are foreigners, their names have special characters (é, è, ï, ...). I would like to remove the accents (e, e, i, ...) in the logins. Here is my query. Is there a character set that does not contain accents? UPDATE contacts...

character-encoding

How do I convert Windows 7 file-name encoding to UTF-8 for Ruby on Rails?

Hi (I've looked at the other questions - none seemed to quite fit my problem.) I have some file-names under Windows 7 that need to be translated into a MySQL database (UTF-8) with Ruby on Rails. An example file-name includes "íéó" in some kind of Windows 7 file-system encoding. I've tried many combinations of gsub and ActiveSupport::Mul...


Load JSON in Python as header character set

Hi everyone, I've always found character sets and encodings complicated to understand and here I'm faced with another problem. My apologies for any inaccuracies. I'll do my best. I'm requesting data from a server which returns JSON. In the HTTP headers it also returns the character set like so: Content-Type: text/html; charset=UTF-8 ...
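One way to approach this is to read the charset parameter out of the Content-Type header and use it to decode the body before parsing. The sketch below uses only the standard library; the response body is a hypothetical stand-in, since the question's actual data is not shown.

```python
import json
from email.message import Message

# Header value as quoted in the question.
content_type = "text/html; charset=UTF-8"

# Hypothetical response body (not from the question): UTF-8 encoded JSON.
raw_body = b'{"name": "Zo\xc3\xab"}'

# email.message.Message can parse the header's parameters for us.
header = Message()
header["Content-Type"] = content_type
charset = header.get_content_charset() or "utf-8"  # fall back if absent

# Decode the bytes with the declared charset, then parse the text as JSON.
data = json.loads(raw_body.decode(charset))

print(charset)
print(data["name"])
```

The fallback to UTF-8 is a design choice for this sketch; a server that declares no charset could be sending something else entirely.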


Outputting json with well formed accents

Hello, I have an annoying problem that is giving me a hard time these days... I would like to develop a few webservices for my own usage and currently I am fighting to get my damn French accents rendered correctly in my json outputs. Here is my scenario: I retrieve a number of lines from my database that I put in a dict. What I want...

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

Hey, I am writing a webapp in ZF and am having serious issues with UTF8. It's using multilingual content through Zend Form and it seems that ZF heavily escapes all of these characters and basically just won't show a field if there are diacritical elements such as 'é', and if I use the HTML entity equivalent it gets escaped so that the user...

How do I change mysql settings so that it is default UTF-8 for everything?

I am getting "ASCII encoding" errors when I insert into my database because I did a fresh install of MySQL. I'd like to change the default to UTF-8 again. This is the error I'm getting because MySQL is not set to UTF-8 mode: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128) ...
