questions about unicode | ansaurus

unicode

Using Wide Character Constants with clang Gets "extraneous characters in wide character constant ignored" Error

I recently decided to switch to clang from gcc and I’m getting the following warning for my use of wide character constants: "extraneous characters in wide character constant ignored". Here is the code that gets the warning: wstring& line; … for (wstring::iterator ch = line.begin(); ch != line.end(); ++ch) switch (*ch) { cas...

Microsoft Access TransferText function: problem with codepage

I inherited a huge, bulky MS Access database and have been assigned to solve a problem in it. The problem is as follows: System A exports its data to a pipe-delimited .txt file. The file displays special characters correctly; for example, the value "Müller" shows when the file is opened in Notepad or Excel. Next, the Access DB imports ...

special-characters

Transform a Unicode plain-text string into an ordinary String

Hi, I got a Unicode string from an external server like this: 005400610020007400650020007400ED0020007400FA0020003F0020003A0029 and I have to decode it using Java. I know that the '\u' prefix does the magic (i.e. '\u0054' -> 'T'), but I don't know how to transform it for use as an ordinary string. Thanks in advance. Edit: Thanks to ever...
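One way to read the question above is that the string is a run of 4-digit hexadecimal groups, each group being one UTF-16 code unit (so 0054 is 'T'). That reading is an assumption, since the excerpt does not say how the server encodes its data. Under it, a minimal Java sketch could look like the following; the class name HexUtf16Decoder and the method decode are hypothetical names chosen for illustration.

```java
class HexUtf16Decoder {
    // Assumption: the input is a sequence of 4-digit hex groups,
    // each one UTF-16 code unit, with no separators or prefixes.
    static String decode(String hex) {
        if (hex.length() % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        StringBuilder out = new StringBuilder(hex.length() / 4);
        for (int i = 0; i < hex.length(); i += 4) {
            // Parse one group as base 16 and append it as a char.
            int codeUnit = Integer.parseInt(hex.substring(i, i + 4), 16);
            out.append((char) codeUnit);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample =
            "005400610020007400650020007400ED0020007400FA0020003F0020003A0029";
        System.out.println(decode(sample)); // prints: Ta te tí tú ? :)
    }
}
```

Because each group is appended as a raw code unit, surrogate pairs in the input would recombine into the right character on their own. Whether the console shows the accented letters correctly depends on the platform's output encoding.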

Lucene Search Problem

I have built an index on my database rows (each row as a document), which are of Unicode type in MySQL (i.e. Charset: utf8 and Collation: utf8-bin). But when I search for any word, English or non-English, it returns no results. It says: 0 total matching documents My code is the Lucene demo code for search, except that I have changed fie...

Mapping between Wingdings/Symbol characters and their Unicode equivalents

MS Word uses Wingdings and Symbol characters for bullets; by default their hex values are F0A7 and F0B7. I want to convert the bullets to their Unicode equivalents. Of course, it depends on the actual font used, so F0A7 in Wingdings would become Unicode 25AA (▪). I've found a partial mapping from Wingdings to Unicode and from Symbol to Uni...

Lucene Search Problem with Unicode Characters

I have indexed a database of texts, and the texts are Unicode-encoded. When I search for an English word with Lucene, everything works. But when I use a non-English query like "تو", it gives me the following exception: Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse '??': '' ...

Unicode not converted when displayed

Hi, I'm localizing an app to Spanish, and characters are encoded in the Localizable.strings file for that language using Unicode. For example, I have the entry: "login.saveSettings"="Guardar configuraci\\u00F3n:"; which is displayed in a UILabel exactly like that ("Guardar configuraci\\u00F3n:"), instead of "Guardar configuración:". I t...

nslocalizedstring

Unicode, char pointers and wcslen

Hi, I've been trying to use the cpw library for some graphics things. I got it set up, and I seem to be having a problem compiling. Namely, in a string header it provides support for unicode via #if defined(UNICODE) | defined(_UNICODE) #define altstrlen wstrlen #else #define altstrlen strlen Now, there's no such thing as wstrlen o...

How can I replace a Python 2.6.5 UCS-2 build with one built using UCS-4 without losing everything in my site-packages?

I downloaded the Python 2.6.5 source, built it for OS X 10.6.4 64-bit, and installed numerous dependencies. I opened a big project our team has been working on recently, ran the unit tests, and one of the tests failed because I had installed Python built using UCS-2 (I didn't know this was the default on OS X!) In a nutshell: I didn't sup...

Downgrade non-ASCII symbols to closest 7-bit ASCII equivalent (preferably Java)

Hello there, is there any simple/lightweight solution to change at least some non-ASCII symbols to their respective ASCII analogs? For example, the string abc-åäö.txt should be changed to abc-aao.txt. A bit of background: Zip tools do not reliably support UTF-8, hence the need to downgrade. AFAICR Google "download attachments as sin...
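For the abc-åäö.txt example above, one lightweight approach (a sketch, not necessarily what the asker ended up using) is to decompose the text with java.text.Normalizer and then drop the combining marks. The class name AsciiFolder, the method toAscii, and the choice of '_' as a placeholder for anything that still falls outside 7-bit ASCII are all assumptions made for illustration.

```java
import java.text.Normalizer;

class AsciiFolder {
    // Folds accented Latin letters to their base letters.
    // Assumption: characters with no canonical decomposition
    // (for example 'ø' or 'ß') are replaced by '_' (hypothetical choice).
    static String toAscii(String s) {
        // NFD splits "å" into "a" followed by a combining ring above.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove every combining mark (Unicode general category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Replace whatever is still outside the 7-bit ASCII range.
        return stripped.replaceAll("[^\\x00-\\x7F]", "_");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("abc-\u00E5\u00E4\u00F6.txt")); // prints: abc-aao.txt
    }
}
```

Normalization only covers characters that decompose into a base letter plus marks. Letters such as 'ø', 'æ', or 'ß' would need an explicit lookup table if a readable substitute matters more than a placeholder.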

character-encoding

Convert TCHAR * -> std::wstring in both unicode and non-unicode environments

I have some code in a library which has to internally work with wstring, that's all nice and fine. But it's called with a TCHAR string parameter, from both unicode and non-unicode projects, and I'm having trouble finding a neat conversion for both cases. I see some ATL conversions and so on but can't see the right way, without defining ...

C: sscanf problem

Hi, I have a text file like this: 2 A 10 5 B 31 2 C 6 6 I want to read the number on the first line into a variable, and then read each line's space-separated list of 3 values into 3 variables. I wrote this code: iF=fopen(fileName,"r"); fgets(tmp,255,iF); sscanf(tmp,"%d",&interval); while(!feof(iF)){ cur=(P *)malloc(sizeof(P)); fgets(tmp,255,i...

How to pdflatex with CJK characters/font/encoding

What's the best way to combine pdflatex with CJK characters/font/encoding? I'd like to generate a PDF that includes CJK characters and, in the future, all possible Unicode characters. I'm thinking about using 'The CJK package for LaTeX' for CJK characters specifically, but it seems not to have been maintained since 2006. Can you suggest somethi...

How do I find "wide characters" printed by perl?

A Perl script that scrapes static HTML pages from a website and writes them to individual files appears to work, but also prints many instances of "wide character in print at ./script.pl line n" to the console: one for each page scraped. However, a brief glance at the HTML files generated does not reveal any obvious mistakes in the scraping. ...

screen-scraping

Rendering Japanese text in Flex

The Flex documentation says that we need to include fonts for Japanese characters. Can't Flash Player access fonts from the system on which it is running? If my Flex application has to support all languages, should I embed the entire font library into the SWF file? In my case data is fed from MySQL, so I can't fall back to runtime loading. Is there any...

Unicode class names in C# - why do some work, when others don't?

I'm wondering why this is. I have two unicode characters from the same group Ll, which is allowed according to the specs: http://msdn.microsoft.com/en-us/library/aa664670%28VS.71%29.aspx One of them works, the other gives a compile error, and I can't find any documentation on why this is: This works: U+0467 CYRILLIC SMALL LETTER LITT...

special-characters

WideCharToMultiByte problem

I have the lovely functions from my previous question, which work fine if I do this: wstring temp; wcin >> temp; string whatever( toUTF8(getSomeWString()) ); // store whatever, copy, but do not use it as UTF8 (see below) wcout << toUTF16(whatever) << endl; The original form is reproduced, but the in between form often contains extr...

Unicode characters become question marks after inserting into database

When I insert text written in Unicode into the database, it becomes question marks. The database encoding is set to utf-8. What else may be incorrect? When I check in phpMyAdmin, only question marks have been inserted! This is the code I use for connecting to the database: define ("DB_HOST", "localhost"); // set database host define ("DB_USER...

isalpha equivalent for wchar_t

What is the equivalent of isalpha or isalnum for wchar_t? wctype? An example would also be nice. Thanks ...

Hebrew strings in Python for S60

I'm using Python for S60. I want to use strings in Hebrew, to show them in the GUI and to send them in SMS messages. It seems that the PythonScriptShell doesn't accept such expressions, for example: u"אבגדה". What can I do? Thanks. Update: I added the line # -*- coding: utf-8 -*- as the first line in the source ...
