utf-8

Is it just me, or are characters being rendered incorrectly more often lately?

I'm not sure if it's my system, although I haven't done anything unusual with it, but I've started noticing incorrectly rendered characters popping up in web pages and text files. I have a hunch it's related to the fairly recent trend to use Unicode for everything, which is a good thing I think, combined with fonts that don'...

How does Ruby 1.9 handle character cases in source code?

In Ruby 1.8 and earlier, Foo is a constant (a Class, a Module, or another constant), whereas foo is a variable. The key difference is as follows: module Foo bar = 7 BAZ = 8 end Foo::BAZ # => 8 Foo::bar # NoMethodError: undefined method 'bar' for Foo:Module That's all well and good, but Ruby 1.9 allows UTF-8 source code. ...

How can I set LANG to ascii?

I'm accessing an ubuntu machine using PuTTY, and using gcc. The default LANG environment variable on this machine is set to en_NZ.UTF-8, which causes GCC to think PuTTY is capable of displaying UTF-8 text, which it doesn't seem to be. Maybe it's my font, I don't know - it does this: foo.c:1: error: expected â=â, â,â, â;â, âasmâ or â__...

MySQL UTF/Unicode migration tips

Does anyone have any tips or gotchas to look out for when migrating MySQL tables from the default case-insensitive Swedish or ASCII charsets to UTF-8? Some of the projects that I'm involved in are striving for better internationalization and the database is going to be a significant part of this change. Before we look ...

Java, UTF-8 and Windows console

We are trying to use Java and UTF-8 on Windows. The application writes logs to the console, and we would like to use UTF-8 because the logs are internationalized. It is possible to configure the JVM to generate UTF-8 by passing -Dfile.encoding=UTF-8 as an argument. It works fine, but the output on a Windows console...

Struts2 Tiles Tomcat suspected of changing UTF-8 to ?????

I'm having some internationalisation woes: My UTF-8 string fields are being rendered in the browser as ???? after being returned from the database. After retrieval from the database using Hibernate, the String fields are presented correctly on inspection in the Eclipse debugger. However, Struts2/Tiles is rendering these strings as ?...

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Anything goes: one-liners in your favorite scripting language, command-line tools or other utilities for OS, web sites, etc. Best solutions so far: On Linux/UNIX/OS X/cy...
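
For a scripted route, here is a minimal Python sketch (Python is my choice, not something the question names). It decodes the bytes with the source charset and re-encodes them with the target. The function name transcode and its default arguments are illustrative only.

```python
def transcode(data: bytes, source: str = "utf-8", target: str = "iso-8859-15") -> bytes:
    """Re-encode bytes from one character set to another.

    Raises UnicodeDecodeError if the input is not valid in `source`, and
    UnicodeEncodeError if a character has no representation in `target`.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    utf8_bytes = "café €".encode("utf-8")
    latin9_bytes = transcode(utf8_bytes)
    # Round trip back to UTF-8 by swapping the arguments.
    print(transcode(latin9_bytes, "iso-8859-15", "utf-8") == utf8_bytes)
```

Failing loudly on characters the target cannot represent is a deliberate choice here; silently substituting "?" tends to hide data loss.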

What is the best way to change the encoding of text in PHP?

I want to run text through a filter to ensure it is all UTF-8 encoded. What is the recommended way to do this with PHP? ...

How to handle UTF-8 characters in sqlite2 to sqlite3 migration

Trying the easy approach: sqlite2 mydb.db .dump | sqlite3 mydb-new.db I got this error: SQL error near line 84802: no such column: Ð In that line the script is this: INSERT INTO vehiculo VALUES(127548,'21K0065217',Ñ,'PA007808',65217,279,1989,3,468,'1998-07-30 00:00:00.000000','14/697/98-07',2,'',1); My guess is that the...

C++ strings: UTF-8 or 16-bit encoding?

I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a programming language and environment (like VB, it's a combination of both). There are a few wish...

A script to change all tables and fields to the utf-8-bin collation in MySQL

Is there a SQL or PHP script that I can run that will change the default collation in all tables and fields in a database? I can write one myself, but I think that this should be something that is readily available at a site like this. If I can come up with one myself before somebody posts one, I will post it myself. ...

UTF-8 validation

I'm processing some data files that are supposed to be valid UTF-8 but aren't, which causes the parser (not under my control) to fail. I'd like to add a stage of pre-validating the data for UTF-8 well-formedness, but I've not yet found a utility to help do this. There's a web service at W3C which appears to be dead, and I've found a Wind...
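
One way to pre-validate, sketched in Python on the assumption that a short script is acceptable as the extra stage: a strict decode either succeeds or reports where the first malformed sequence starts. The function name first_invalid_utf8_offset is made up for this example.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first malformed UTF-8 sequence,
    or None if the whole input is well-formed."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    print(first_invalid_utf8_offset("café".encode("utf-8")))  # well-formed input
    print(first_invalid_utf8_offset(b"caf\xe9"))              # Latin-1 byte in a UTF-8 stream
```

Python's strict decoder also rejects overlong forms and encoded surrogates, so it checks well-formedness and not just byte ranges.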

Convert a UTF-8 string to/from 7-bit XML in PHP

How can UTF-8 strings (i.e. 8-bit strings) be converted to/from XML-compatible 7-bit strings (i.e. printable ASCII with numeric entities)? That is, an encode() function such that: encode("“£”") -> "&#8220;&#163;&#8221;". A decode() would also be useful: decode("&#8220;&#163;&#8221;") -> "“£”". PHP's htmlentities()/html_entity_decode() pair d...
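
To illustrate the round trip being asked for, here is a small sketch in Python, not PHP, so treat it as a description of the behaviour and not a drop-in answer. The names encode and decode mirror the question. Note that encode leaves &, < and > alone, and decode also resolves named entities such as &amp;.

```python
import html


def encode(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def decode(text: str) -> str:
    """Turn character references back into the characters they stand for."""
    return html.unescape(text)


if __name__ == "__main__":
    sample = "\u201c\u00a3\u201d"  # left double quote, pound sign, right double quote
    print(encode(sample))
    print(decode(encode(sample)) == sample)
```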

How do I truncate a Java string to fit in a given number of bytes, once UTF-8 encoded?

How do I truncate a Java String so that I know it will fit in a given number of bytes of storage once it is UTF-8 encoded? ...
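
The underlying idea is language-independent, so here is a sketch in Python instead of Java: encode, cut at the byte limit, then drop any multi-byte sequence that the cut left incomplete. The name truncate_utf8 is hypothetical, and max_bytes is assumed to be non-negative.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in `max_bytes`."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may end in the middle of a multi-byte character; because the
    # input was valid UTF-8, the only invalid bytes are that trailing partial
    # sequence, and errors="ignore" discards it.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(truncate_utf8("héllo", 2))
    print(truncate_utf8("héllo", 3))
```

This cuts on code point boundaries only; it can still separate a base character from a following combining mark.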

How to sort an array of UTF-8 strings?

I currently have no clue how to sort an array that contains UTF-8 encoded strings in PHP. The array comes from an LDAP server, so sorting via a database (which would be no problem) is not an option. The following does not work on my Windows development machine (although I'd think that this should be at least a possible solution): $array=arra...

Decode a UTF-8 email header

Hi, I have an email subject of the form: =?utf-8?B?T3.....?= The body of the email is utf-8 base64 encoded - and has decoded fine. I am currently using Perl's Email::MIME module to decode the email. What is the meaning of the =?utf-8 delimiter and how do I extract information from this string? ...
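
For context, the =?charset?encoding?text?= form is an RFC 2047 "encoded-word": the first field names the character set, and the second is B for base64 or Q for a quoted-printable variant. The question uses Perl; the sketch below uses Python's standard email.header module only to show the decoding step, and decode_subject is a name chosen for this example.

```python
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded-words."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    # "Y2Fmw6k=" is the base64 form of the UTF-8 bytes for "café".
    print(decode_subject("=?utf-8?B?Y2Fmw6k=?="))
```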

Unicode in PDF

My program generates relatively simple PDF documents on request, but I'm having trouble with Unicode characters, like kanji or odd math symbols. To write a normal string in PDF, you place it in brackets: (something) There is also the option to escape a character with octal codes: (\527) but this only goes up to 512 characters. How ...

How do I correct the character encoding of a file?

I have an ANSI encoded text file that should not have been encoded as ANSI as there were accented characters that ANSI does not support. I would rather work with UTF-8. Can the data be decoded correctly or is it lost in transcoding? What tools could I use? Here is a sample of what I have: ç é I can tell from context (café should...

How Do You Write Code That Is Safe for UTF-8?

We have a set of applications that were developed for the ASCII character set. Now, we're trying to install them in Iceland, and are running into problems where the Icelandic characters are getting garbled. We are working through our issues, but I was wondering: Is there a good "guide" out there for writing C++ code that is designed ...

Type double byte character into VBScript file

I need to convert &rarr; to a symbol I can type into an ANSI VBScript file. I am writing a script that translates a select set of HTML codes to their actual double byte symbols using a regex. Many languages accomplish this using "\0x8594;"... what is the equivalent in VBScript? ...