unicode

Export MS Access Memo field and convert Unicode

I have an Access 2003 database. A table has a Memo field, and I'm having trouble getting that data out. Exporting that field to a txt or csv file chops it off at 255 characters. Exporting to Excel gives me strange characters for line breaks. Appending to a MySQL database via MyODBC gives an error about "incorrect string". Using VBA w...

How to convert html entities into symbols?

Hi, I have made some adaptations to the script from this answer, and I am having problems with Unicode. Some of the questions end up being written poorly. Some answers and responses end up looking like: Yeah.. I know.. I’m a simpleton.. So what’s a Singleton? (2) How can I get the ’ translated to the right cha...
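As a rough illustration of the two problems this question touches on, here is a minimal Python sketch. It assumes that the entities are standard HTML character references and that the ’ sequence comes from UTF-8 bytes being read as Windows-1252; the excerpt does not confirm either. The function names `entities_to_symbols` and `repair_mojibake` are made up for this example.

```python
import html


def entities_to_symbols(text):
    """Replace HTML character references (&rsquo;, &#8217;, ...) with the characters they name."""
    return html.unescape(text)


def repair_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Windows-1252 (assumed cause)."""
    return text.encode("cp1252").decode("utf-8")


print(entities_to_symbols("So what&rsquo;s a Singleton?"))
# The escapes below spell out the three garbled characters shown in the excerpt.
print(repair_mojibake("I\u00e2\u20ac\u2122m a simpleton"))
```

Repairing after the fact only works if the wrong decoding really was Windows-1252; reading the data with the right encoding in the first place is the more robust fix.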

copyright character in vim

I used to get this copyright symbol in vim through some key combination. Can someone help me with it now? I simply can't recall it. Also, if possible, share some more such characters... someone might need them sometime. ...

Delphi 2009: Search skipping diacritics in unicode utf-8

I have a UTF-8 encoded file containing Arabic text and I have to search it. My problem is diacritics: how do I search while skipping them? For example, if you load that text in Internet Explorer (after converting the text to HTML, of course), IE skips those diacritics. Any help? Edit1: The search is simply performed by the following code: var m1 : TMemo; /...
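One common way to search while ignoring diacritics is to decompose both strings and drop the combining marks before comparing. The sketch below is in Python rather than Delphi, and the helper names `strip_diacritics` and `contains_ignoring_diacritics` are invented for illustration. It assumes that every character with a non-zero canonical combining class should be ignored, which may be too aggressive for some scripts.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD, then drop characters with a non-zero combining class."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def contains_ignoring_diacritics(haystack, needle):
    """Substring test performed on the diacritic-free forms of both strings."""
    return strip_diacritics(needle) in strip_diacritics(haystack)


# An Arabic word written with vowel marks, and the same word without them.
vocalised = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
plain = "\u0645\u062D\u0645\u062F"
print(contains_ignoring_diacritics(vocalised, plain))
```

Note that a match position found in the stripped text does not map directly back to an offset in the original text; highlighting results would need an index map built during stripping.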

Converting TMemoryStream to String in Delphi 2009

We had the following code previous to Delphi 2009: function MemoryStreamToString(M: TMemoryStream): String; var NewCapacity: Longint; begin if (M.Size = 0) or (M.Memory = nil) then Result:= '' else begin if TMemoryStreamProtected(M).Capacity = M.Size then begin NewCapacity:= M.Size+1; TMemoryStreamProtec...

Ellipsis unviewable in HTML

When pulling data from a MySQL database onto a web page, all ellipses (...) in the data are displayed as a � in Firefox or a square box in IE7. Has anyone encountered this problem before? Thanks. Update 1: I just replaced the original ellipsis '…' with '...' (three dots) and now it works. Any idea what this could be? ...

Small open source Unicode library for C/C++

Does anyone know of a great small open source Unicode handling library for C or C++? I've looked at ICU, but it seems way too big. I need the library to support: all the normal encodings; normalization; finding character types - finding if a character should be allowed in identifiers and comments; validation - recognizing nonsense ...

Looking for a good tutorial for ICU

I was recently looking for a toolkit/library with good Unicode support. I checked ICU, Qt3, Qt4 and GLib. Unfortunately, all of them except ICU had some missing features or had implemented them incorrectly. Unfortunately, the ICU library has quite bad documentation and is very hard to use because it ignores most of modern C++ de...

How to work with unicode in Python

I am trying to clean all of the HTML out of a string so the final output is a text file. I have done some research on the various 'converters' and am starting to lean towards creating my own dictionary for the entities and symbols and running a replace on the string. I am considering this because I want to automate the process and ther...

Some Basic Python Questions

I'm a total Python noob so please bear with me. I want to have Python scan a page of HTML and replace instances of Microsoft Word entities with something UTF-8 compatible. My question is, how do you do that in Python (I've Googled this but haven't found a clear answer so far)? I want to dip my toe in the Python waters so I figure somet...

Where are the fields documented for the unicode.org file "UnicodeData.txt"?

I cannot find documentation for the actual fields of the UnicodeData.txt file. The data is available here. The document describing it is available here, but it doesn't list the actual field numbers and what each field is (as the document used to around version 3.0). I've searched the site and must be missing something that is rig...
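For orientation, each line of UnicodeData.txt is a semicolon-separated record of 15 fields. The Python sketch below splits one line into named fields. The field order reflects my reading of Unicode Standard Annex #44 and should be checked against it; the dictionary keys are my own labels, not official property names, and `parse_unicode_data_line` is a hypothetical helper.

```python
# Labels for fields 0..14, in file order (assumed from UAX #44; keys are informal).
FIELD_NAMES = (
    "code_point", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal_value", "digit_value",
    "numeric_value", "bidi_mirrored", "unicode_1_name", "iso_comment",
    "simple_uppercase", "simple_lowercase", "simple_titlecase",
)


def parse_unicode_data_line(line):
    """Split one UnicodeData.txt line into a dict keyed by the labels above."""
    values = line.rstrip("\n").split(";")
    if len(values) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(values)))
    return dict(zip(FIELD_NAMES, values))


record = parse_unicode_data_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
print(record["name"], record["general_category"], record["simple_lowercase"])
```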

Suppress the u'prefix indicating unicode' in python strings

Is there a way to globally suppress the unicode string indicator in python? I'm working exclusively with unicode in an application, and do a lot of interactive stuff. Having the u'prefix' show up in all of my debug output is unnecessary and obnoxious. Can it be turned off? ...

What's the difference between utf8_general_ci and utf8_unicode_ci

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance? ...

COM server AnsiString parameters in Delphi 2009

I have a simple COM DLL with a method that takes two strings. In the type library editor of Delphi these strings are defined as LPSTR. This translates to PChar in the TLB file. When upgrading from D2007 to D2009 this became a problem, since PChar has now changed from PAnsiChar to PWideChar (it still becomes PChar in the TLB file when it i...

non-unicode WM_CHAR in unicode windows

I have written a DLL which exports a function that creates a window using RegisterClassExW and CreateWindowExW. Every message is retrieved via GetMessageW(&msg, wnd_handle, 0, 0); TranslateMessage(&msg); DispatchMessageW(&msg); Also there is a program which loads the DLL and calls the function. Despite the Unicode window creation m...

XML encoding issue

Hello everyone, I want to know whether there is a quick way to find out whether an XML document is correctly encoded in UTF-8 and does not contain any characters that are not allowed in a UTF-8 encoded XML document. <?xml version="1.0" encoding="utf-8"?> Thanks in advance, George EDIT1: here is the content of my XML file, in both text form and in ...
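A quick check along these lines can be done in two steps: decode the bytes strictly as UTF-8, then scan for code points outside the XML 1.0 `Char` production (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The Python sketch below does only that; it does not test well-formedness, and `check_xml_utf8` is a hypothetical name.

```python
import re

# Anything NOT matched by the XML 1.0 Char production.
_INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_utf8(data):
    """Return a list of problems found in raw XML bytes; an empty list means none."""
    try:
        text = data.decode("utf-8")  # strict: raises on malformed byte sequences
    except UnicodeDecodeError as exc:
        return ["invalid UTF-8 at byte offset %d: %s" % (exc.start, exc.reason)]
    return [
        "character U+%04X at index %d is not allowed in XML 1.0"
        % (ord(m.group()), m.start())
        for m in _INVALID_XML_CHAR.finditer(text)
    ]


print(check_xml_utf8(b'<?xml version="1.0" encoding="utf-8"?><a>ok</a>'))
print(check_xml_utf8(b"<a>\xff</a>"))
```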

UTF-8 to EBCDIC in Java

Hello, our requirement is to send EBCDIC text to a mainframe. We have some Chinese characters, hence the UTF-8 format. So, is there a way to convert the UTF-8 characters to EBCDIC? Thanks, Raj Mohan ...

C#: Unicode from a string with MySQL

I'm trying to insert a string into a MySQL database. I can insert it by running the query on the server, but when I try to use my C# source file to insert "Iñtërnâtiônàlizætiøn", I get "Iñtërnâtiônàlizætiøn". I've tried adding it as a parameter and adding ;charset=utf8 to my connection string, but no luck. The table in the databas...

In PHP, how do I deal with the difference in encoded filenames on HFS+ vs. elsewhere?

I am creating a very simple file search, where the search database is a text file with one file name per line. The database is built with PHP, and matches are found by grepping the file (also with PHP). This works great on Linux, but not on Mac when non-ASCII characters are used. It looks like names are encoded differently on HFS+ (Ma...
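The usual explanation for this kind of mismatch is that HFS+ stores file names in a decomposed Unicode form (close to NFD), while names typed or created elsewhere tend to be precomposed (NFC). The sketch below shows the idea in Python rather than PHP: normalize both the query and the stored names to one form before comparing. The helper names are hypothetical, and the assumption that normalization is the only difference is mine.

```python
import unicodedata


def normalize_filename(name):
    """Bring a file name to NFC so composed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", name)


def find_matches(query, filenames):
    """Return the stored names that contain the query, ignoring normalization form."""
    wanted = normalize_filename(query)
    return [name for name in filenames if wanted in normalize_filename(name)]


# "café" typed precomposed, matched against a decomposed name as HFS+ would store it.
print(find_matches("caf\u00e9", ["cafe\u0301.txt", "tea.txt"]))
```

Normalizing once when the search database is built, and once per query, avoids doing the work on every comparison.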

How does a file with Chinese characters know how many bytes to use per character?

I have read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" but still don't understand all the details. An example will illustrate my issues. Look at this file below: I have opened the file in a binary editor to closely examine the last of t...
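If the file in question is UTF-8 (an assumption; the excerpt is cut off before saying so), the file itself carries no length table. Instead, the high bits of the first byte of each character say how many bytes the sequence occupies. A minimal Python sketch, with the hypothetical helpers `utf8_sequence_length` and `split_utf8`:

```python
def utf8_sequence_length(first_byte):
    """Number of bytes in a UTF-8 sequence, judged from its leading byte."""
    if first_byte < 0x80:             # 0xxxxxxx: ASCII, one byte
        return 1
    if 0xC0 <= first_byte < 0xE0:     # 110xxxxx: two-byte sequence
        return 2
    if 0xE0 <= first_byte < 0xF0:     # 1110xxxx: three-byte sequence
        return 3
    if 0xF0 <= first_byte < 0xF8:     # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and can never start a character.
    raise ValueError("not a valid UTF-8 leading byte: 0x%02X" % first_byte)


def split_utf8(data):
    """Cut a byte string into one chunk per encoded character (no validation of continuation bytes)."""
    chunks, i = [], 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


print(split_utf8("a\u4e2d".encode("utf-8")))
```

Other encodings used for Chinese text, such as GB2312 or Big5, use different rules, so the encoding has to be known or guessed before the bytes can be split.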