unicode

Square Bullet in XSL-FO

I am attempting to create a list in XSL-FO using a square bullet. I have been able to get it working using the standard Unicode bullet character (•) but I just can't seem to get it working for square bullets. I have tried using ■, but that does not seem to work. It is important that I can get the square bullets working bec...

Output Unicode strings in Windows console app

Hi, I was trying to output a Unicode string to a console with iostreams and failed. I found this: "Using unicode font in c++ console app", and this snippet works: SetConsoleOutputCP(CP_UTF8); wchar_t s[] = L"èéøÞǽлљΣæča"; int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL); char* m = new char[bufferSize]; WideChar...

Python expat can't parse an XML file with bad symbols. How to work around it?

I'm trying to parse an XML file (OSM data) with expat, and there are lines with some Unicode characters that expat can't parse: <tag k="name" v="абвгдежзиклмнопр�?туфхцчшщьыъ�?ю�?�?БВГДЕЖЗИКЛМ�?ОПРСТУФХЦЧШЩЬЫЪЭЮЯ" /> <tag k="name" v="Cin\x8e? Rex" /> (XML file encoding in the opening line is "UTF-8") The file is quite old, and there...

Can you get access to the NumberFormatter used by ICU MessageFormat?

This may be a niche question but I'm working with ICU to format currency strings. I've bumped into a situation that I don't quite understand. When using the MessageFormat class, is it possible to get access to the NumberFormat object it uses to format currency strings? When you create a NumberFormat instance yourself, you can specify a...

Does Lua support Unicode?

Based on the link below, I'm confused as to whether the Lua programming language supports Unicode. http://lua-users.org/wiki/LuaUnicode It appears it does but has limitations. I simply don't understand: are the limitations significant, or not a big deal? ...

How can I test if an input field contains foreign characters?

I have an input field in a form. Upon pushing submit, I want to validate that the user entered non-Latin characters only, that is, any foreign-language characters, such as Chinese among many others. Or at the very least, test that it does not contain any Latin characters. Could I use a regular expression for this? What would be th...
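
A minimal sketch of the "does not contain any Latin characters" check, written in Python as an assumption, since the excerpt does not say which language the form uses. The helper name `contains_latin` is hypothetical. It relies on Unicode character names rather than a regular expression, so accented letters such as é and fullwidth letters also count as Latin.

```python
import unicodedata


def contains_latin(text):
    """Return True if any character's Unicode name marks it as a Latin letter."""
    # unicodedata.name() raises ValueError for unnamed characters,
    # so an empty default is supplied for those.
    return any("LATIN" in unicodedata.name(ch, "") for ch in text)


if __name__ == "__main__":
    for sample in ["你好", "你好 abc", "Привет", "café"]:
        print(repr(sample), contains_latin(sample))
```

Digits, spaces, and punctuation are not flagged by this sketch, which may or may not be what the validation needs.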

Is it possible to tweak TStringField to work like TWideStringField in Delphi?

We want to use Unicode with Delphi 2009 and Interbase, and found that to switch the character encoding from WIN1252 to UNICODE_FSS or UTF8 we first have to replace all instances of TStringField with TWideStringField in all datamodules. For around 60 datamodules, we cannot simply do this over one weekend. I can see only two options for a...

Non-ASCII characters in Velocity templates are broken when displayed

Hi! I have non-ASCII characters in Velocity template files, and when processed they are garbled. The files are saved in UTF-8 encoding and the response header contentType is also set to text/html;charset=UTF-8. What else can be done? ...

Unicode strings in my C# App are shown with question marks

Hi, I have a header file in a C++/CLI project, which contains some strings in different languages: Arabic, English, German, Chinese, French, Japanese, etc. I have a second project written in C#. Here I access the strings stored in the header file of the C++/CLI project. The encoding of the header file is Unicode - Codepage 1200 or UTF...

Can HTTP URIs have non-ASCII characters?

I tried to find this in the relevant RFC, IETF RFC 3986, but couldn't figure it out. Do URIs for HTTP allow Unicode, or non-ASCII of any kind? Can you please cite the section and the RFC that supports your answer? NB: For those who might think this is not programming related - it is. It's related to an ISAPI filter I'm building. A...

Has anyone ported Snoop Component Suite version 3.0 to Delphi 2010? (i.e. Unicode issues)

Hi, has anyone ported "Snoop Component Suite version 3.0" by http://www.netlab.co.kr to Delphi 2010? It's a great WinPcap library. It just doesn't work on Delphi 2010 (Unicode). Thanks ...

String searching algorithm for Chinese characters.

There is Python code available for existing string searching algorithms, e.g. the Boyer-Moore algorithm. I am looking to use this on Chinese characters and it doesn't seem like the same implementation would work. What would I need to do in order to make the algorithm work on Chinese characters? I am referring to this: http://...
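
A minimal sketch of one way such a search can be made alphabet-independent, assuming Python 3 where `str` holds code points rather than bytes. It is not the implementation linked in the question (that link is truncated). It uses the Horspool simplification of Boyer-Moore and swaps the usual fixed 256-entry shift table for a dictionary, so any character, Chinese included, can be a key. The function name `horspool_search` is hypothetical.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Bad-character shifts keyed by character instead of by byte value.
    # Characters absent from the pattern fall back to a full shift of m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift according to the text character aligned with the pattern's end.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    print(horspool_search("我喜欢学习中文", "中文"))  # prints 5
```

If the input arrives as UTF-8 bytes, decoding it to `str` first keeps each Chinese character as a single unit for the shift table.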

C# Button Text Unicode characters.

C# doesn't want to put Unicode characters on buttons. If I put \u2129 in the Text attribute of the button, the button displays the \u2129, not the Unicode character, (example - I chose 2129 because I could see it in the font currently active on the machine). I saw this question before, link text, but the question isn't really answered, ...

How to get Unicode values from a Google Translate output string

On the Google Translate web site, if I type any word in English and select any other foreign language, it shows the exact word in the foreign language. I want the Unicode values of those foreign characters. How do I get them? ...

Convert or strip out "illegal" Unicode characters

I've got a database in MSSQL that I'm porting to SQLite/Django. I'm using pymssql to connect to the database and save a text field to the local SQLite database. However, for some characters, it explodes. I get complaints like this: UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 1916: ordinal not in range(128) Is ...
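
A minimal sketch of decoding such bytes explicitly instead of letting the default ASCII codec handle them. It assumes the column holds Windows-1252 text, which the excerpt does not confirm; in that code page, byte 0x97 is an em dash. The helper name `to_text` is hypothetical, and the example is written for Python 3.

```python
def to_text(raw, encoding="cp1252"):
    """Decode raw bytes, replacing undecodable bytes with U+FFFD."""
    # errors="replace" converts bytes the code page leaves undefined
    # into the replacement character instead of raising an exception.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = b"Oklahoma \x97 City"
    print(to_text(sample) == "Oklahoma \u2014 City")  # prints True
```

If the real source encoding differs, only the `encoding` argument needs to change.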

Convert UTF-8 bytes to some other encoding in Python

I need to do this in Python 2.4 (yes, 2.4 :-( ). I've got a plain string object, which represents some text encoded with UTF-8. It comes from an external library, which can't be modified. So, what I think I need to do, is to create a Unicode object using bytes from that source object, and then convert it to some other encoding (iso-8859-2,...
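
A minimal sketch of the decode-then-encode round trip described above. It is written for Python 3, where the input is a `bytes` object; in Python 2.4 a plain `str` plays that role and the same `decode`/`encode` method calls apply. The helper name `reencode` and the sample text are hypothetical.

```python
def reencode(data, source="utf-8", target="iso-8859-2"):
    """Convert encoded bytes from one character encoding to another."""
    # Step 1: bytes in the source encoding -> Unicode text.
    text = data.decode(source)
    # Step 2: Unicode text -> bytes in the target encoding.
    # This raises UnicodeEncodeError if the target cannot represent a character.
    return text.encode(target)


if __name__ == "__main__":
    utf8_bytes = "żółw".encode("utf-8")
    latin2_bytes = reencode(utf8_bytes)
    print(len(utf8_bytes), len(latin2_bytes))  # prints 7 4
```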

PHP: Convert curl_exec output to UTF8

I would like to only work with UTF8. The problem is I don't know the charset of every webpage. How can I detect it and convert to UTF8? <?php $url = "http://vkontakte.ru"; $ch = curl_init($url); $options = array( CURLOPT_RETURNTRANSFER => true, ); curl_setopt_array($ch, $options); $data = curl_exec($ch); // $data = magic($data); p...

Encoding gives "'ascii' codec can't encode character … ordinal not in range(128)"

I am working through the Django RSS reader project here. The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes. I get the Django error of "'ascii' codec can'...

GCC, Unicode and __FUNCTION__

I'm trying to make my project compile under GCC (Visual Studio compiles it flawlessly). I have a custom assert function which throws a wstring message. A part of it is the __FUNCTION__ macro, which I "unicodize" using the WIDEN macro from MSDN: #define WIDEN2(x) L ## x #define WIDEN(x) WIDEN2(x) It compiles okay in MSVC, but it prints ...

Normalizing (webdav) unicode paths

Hi guys, I'm working on a WebDAV implementation for PHP. In order to make it easier for Windows and other operating systems to work together, I need to jump through some character encoding hoops. Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8. My first approach was to ignore t...