unicode

Internationalization in your projects

How have you implemented internationalization (i18n) in actual projects you've worked on? I took an interest in making software cross-cultural after I read the famous post by Joel, The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). However, I have yet to be able to take...

Python, Unicode, and the Windows console

When I try to print a Unicode string in a Windows console, I get a "UnicodeEncodeError: 'charmap' codec can't encode character ...." error. I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this? Is there any way I can make Python automatically print a "?" instead of faili...
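One possible workaround for the question above, as a minimal sketch: encode the text with the console's code page and `errors="replace"`, so characters the code page cannot represent become "?" instead of raising. The helper name `to_console_safe` and the `cp437` default are illustrative assumptions, not something the question specifies.

```python
def to_console_safe(text, encoding="cp437"):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    # errors="replace" substitutes '?' for anything the codec cannot encode
    # (cp437 is only an assumed example code page; pass the real one in practice).
    return text.encode(encoding, errors="replace").decode(encoding)


# The euro sign is not in cp437, so it is replaced rather than raising an error.
print(to_console_safe("price: 5 \u20ac"))
```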

Is it just me, or are characters being rendered incorrectly more lately?

I'm not sure if it's my system, although I haven't done anything unusual with it, but I've started noticing incorrectly rendered characters popping up in web pages, text files, like this: I have a hunch it's related to the fairly recent trend to use Unicode for everything, which is a good thing I think, combined with fonts that don'...

How to display unicode text in OpenGL?

Is there a good way of displaying Unicode text in OpenGL under Windows? For example, when you have to deal with different languages. The most common approach like #define FONTLISTRANGE 128 GLuint list; list = glGenLists(FONTLISTRANGE); wglUseFontBitmapsW(hDC, 0, FONTLISTRANGE, list); just won't do because you can't create enough list...

String To Lower/Upper in C++

What is the best way people have found to do String to Lower case / Upper case in C++? The issue is complicated by the fact that C++ isn't an English-only programming language. Is there a good multilingual method? ...

How can I get Unicode characters to display properly for the tooltip for the IMG ALT in IE7?

I've got some Japanese in the ALT attribute, but the tooltip is showing me ugly block characters. The rest of the content on the page renders correctly. So far, it seems to be limited to the tooltips. ...

Regex and unicode

I have a script that parses the filenames of TV episodes (show.name.s01e02.avi for example), grabs the episode name (from the www.thetvdb.com API) and automatically renames them into something nicer (Show Name - [01x02].avi). The script works fine, that is, until you try to use it on files that have Unicode show-names (something I never ...

Unicode vs UTF-8 confusion in Python / Django?

I stumbled over this passage in the Django tutorial: Django models have a default __str__() method that calls __unicode__() and converts the result to a UTF-8 bytestring. This means that unicode(p) will return a Unicode string, and str(p) will return a normal string, with characters encoded as UTF-8. Now, I'm confused because afaik Unicode...
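The distinction the question asks about can be illustrated with a small sketch. It uses Python 3's `str` and `bytes` as stand-ins for the Python 2 `unicode` and `str` types that the quoted tutorial passage refers to; that mapping is an assumption made for illustration.

```python
# A text string holds Unicode code points; it has no byte encoding of its own.
text = "\u00e6\u00f8\u00e5"  # the three characters æøå

# UTF-8 is one way to serialize those code points into bytes.
data = text.encode("utf-8")

print(len(text))  # number of code points
print(len(data))  # number of bytes; each of these characters needs two in UTF-8

# Decoding with the same encoding recovers the original text.
assert data.decode("utf-8") == text
```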

Are named entities in HTML still necessary in the age of Unicode aware browsers?

I did a lot of PHP programming in recent years and one thing that keeps annoying me is the weak support for Unicode and multibyte strings (to be sure, natively there is none). For example, "htmlentities" seems to be a much-used function in the PHP world and I found it to be absolutely annoying when you've put an effort into keeping ever...

How do I put unicode characters in my Antlr grammar?

I'm trying to build a grammar with the following: NUMERIC: INTEGER | FLOAT | INFINITY | PI ... INFINITY: '∞' PI: 'π' But Antlr refuses to load the grammar. ...

'Reliable' SMS Unicode & GSM Encoding in PHP

(Updated a little) I'm not very experienced with internationalisation using PHP, it must be said, and a deal of searching didn't really provide the answers I was looking for. I'm in need of working out a reliable way to convert only 'relevant' text to Unicode to send in an SMS message, using PHP (just temporarily, whilst a service is r...

Formatting tabular data using unicode characters

I need to produce a calculation trace file containing tabular data showing intermediate results. I am currently using a combination of the standard ascii pipe symbols (|) and dashes (-) to draw the table lines: E.g. Numerator | Denominator | Result ----------|-------------|------- 6 | 2 | 3 10 | ...

cross platform unicode support

I find getting Unicode support in my cross-platform apps a real pain in the butt. I need strings that can go from C code, to a database, to a Java application and into a Perl module. Each of these uses a different Unicode encoding (UTF-8, UTF-16) or some other code page. The biggest thing that I need is a cross-platform way of doin...

Reading Email using Pop3 in C#

I am looking for a method of reading emails using POP3 in C# 2.0. Currently, I am using code found on CodeProject. However, this solution is less than ideal. The biggest problem is that it doesn't support emails written in Unicode. ...

MySQL UTF/Unicode migration tips

Does anyone have any tips or gotcha moments to look out for when trying to migrate MySQL tables from the default case-insensitive Swedish or ASCII charsets to UTF-8? Some of the projects that I'm involved in are striving for better internationalization and the database is going to be a significant part of this change. Before we look ...

Getting international characters from a web page?

I want to scrape some information off a football (soccer) web page using simple Python regexps. The problem is that players such as the first chap, ÄÄRITALO, come out as &Auml;&Auml;RITALO! That is, HTML uses escaped markup for the special characters, such as &Auml; Is there a simple way of reading the HTML into the correct Python st...
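As a hedged sketch of one way to handle this in current Python: the standard library's `html.unescape` converts character references such as `&Auml;` back into the characters they stand for. The sample string below is made up for illustration and is not taken from the page being scraped.

```python
import html

# A scraped fragment in which the special characters arrive as HTML entities.
raw = "&Auml;&Auml;RITALO"

# html.unescape replaces named and numeric character references with text.
name = html.unescape(raw)

print(name)
```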

Unicode in C++

What's the best practice for Unicode processing in C++? ...

Are you fluent in Unicode yet?

Almost 5 years ago Joel Spolsky wrote this article, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Like many, I read it carefully, realizing it was high time I got to grips with this "replacement for ASCII". Unfortunately, 5 years later I feel I have slip...

What do I need to know to globalize an asp.net application?

I'm writing an ASP.NET application that will need to be localized to several regions other than North America. What do I need to do to prepare for this globalization? What are your top 1 to 2 resources for learning how to write a world-ready application? ...

international characters in Javascript

I am working on a web application, where I transfer data from the server to the browser in XML. Since I'm Danish, I quickly run into problems with the characters æøå. I know that in HTML, I use "&aelig;&oslash;&aring;" for æøå. However, as soon as the chars pass through JavaScript, I get black boxes with "?" in them when using æøå...