questions about unicode | ansaurus

Zend Framework PDF generation unicode issue

Hi all, I'm having trouble using Zend Framework's PDF. When I create a PDF file I need to use UTF-8 as the encoding. This is the code I am using to generate a simple PDF file. The text is always displayed wrong: instead of 'Faktúra' the PDF file shows 'Faktú', and instead of 'Dodávateľ:' it shows 'Dodáva'. $pdf = new...

Converting LPCWSTR with WideCharToMultiByte. Need help.

I have a function like this: BOOL WINAPI MyFunction(HDC hdc, LPCWSTR text, UINT cbCount){ char AnsiBuffer[255]; int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer , 0, NULL, NULL); if(written > -1) AnsiBuffer[written] = '\0'; if(written>0){ ofstream myfile; myfile.open ("C:\\example.txt", ios::app); myfile.writ...

How do you echo a 4-digit Unicode character in Bash?

I'd like to add the Unicode skull and crossbones to my shell prompt (specifically the 'SKULL AND CROSSBONES' (U+2620)) but I can't figure out the magic incantation to make echo spit out it, or any other, 4-digit Unicode character. 2-digit ones are easy: echo -e "\x55", for example. In addition to the answers below it should be noted that, o...

wifstream equivalent to _wfopen's "mode" parameter?

I'm having trouble opening a Unicode file in C++ using fstreams instead of the older FILE-based file handling functions. When opening a file using _wfopen, I can specify a mode to tell it what character encoding to use, e.g.: _wfopen_s(&file, fileName, unicode ? L"r+, ccs=UTF-16LE" : L"r+" ); This works fine. When using wifstream thoug...

Java equivalent to JavaScript's encodeURIComponent that produces identical output?

I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function. My torture test string is: "A" B ± " If I enter the following JavaScript statement i...

When must we use NVARCHAR/NCHAR instead of VARCHAR/CHAR in SQL Server?

Is there a rule for when we must use the Unicode types? I have seen that most of the European languages (German, Italian, English, ...) are fine in the same database in VARCHAR columns. I am looking for something like: if you have Chinese --> use NVARCHAR; if you have German and Arabic --> use NVARCHAR. What about the collation of the...

Saving Unicode text to MS Access from VB.Net

Hey, I am making a project in VB.NET in which my text boxes have the Font property set to "TERAFONT-VARUN, 12pt", which is for the Gujarati language. Now I want to save the data of a text box into my MS Access database. I also want to retrieve that data for other purposes. Can you please tell me what to do? ...

C++ (Standard) Exceptions and Unicode

I'm running into an issue where I'm processing Unicode strings and I want to do some error reporting with standard exceptions. The error messages contained in standard exceptions are not Unicode. Usually that hasn't been a problem for me because I can define the error message in non-Unicode and have enough information, but in this case...

How do I tell VS 2008 to stop putting byte-order marks in front of my files?

By default, Visual Studio 2008 puts the Unicode byte-order mark in front of any file you save. You can override this on a per-file basis by going to File > Advanced Save Options and picking a different encoding. How do I tell VS to use a default encoding for all files in a particular project or solution? This is drastically screwing up ...

How to print tuples of unicode strings in original language (not u'foo' form)

I have a list of tuples of unicode objects: >>> t = [('亀',), ('犬',)] Printing this out, I get: >>> print t [('\xe4\xba\x80',), ('\xe7\x8a\xac',)] which I guess is a list of the utf-8 byte-code representation of those strings? but what I want to see printed out is, surprise: [('亀',), ('犬',)] but I'm having an inordinate amount o...

Unicode in VB.NET

How do I use the Unicode support available in VB6 in VB.NET? Is there any equivalent of VB6's Unicode support in VB.NET? ...

How to convert *.txt file into Unicode

I have a requirement where a client will supply an ANSI-encoded file, but my system can only successfully read a file in Unicode. So how do I tackle this issue? I know that when I "save as" the file with Unicode encoding, the file gets picked up. It's difficult to make the client comply with our request. So can I have any batch program f...

Why UTF-32 instead of UTF-16 if we have surrogate pairs?

If I understand correctly, UTF-32 can handle every character in the universe. So can UTF-16, through the use of surrogate pairs. So is there any good reason to use UTF-32 instead of UTF-16? ...
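As a rough illustration of the trade-off the question describes, the sketch below (Python 3, standard library only) prints the code units of a string in both encodings. The helper names `utf16_units` and `utf32_units` are made up for this example. A character outside the Basic Multilingual Plane takes two UTF-16 code units (a surrogate pair) but always exactly one UTF-32 code unit, so UTF-32 gives fixed-width indexing by code point at the cost of more space.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def utf32_units(text):
    """Return the UTF-32 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-32-le")
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


# 'A' is in the BMP; U+1F600 is outside it and needs a surrogate pair in UTF-16.
sample = "A\U0001F600"
print([hex(u) for u in utf16_units(sample)])  # ['0x41', '0xd83d', '0xde00']
print([hex(u) for u in utf32_units(sample)])  # ['0x41', '0x1f600']
```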

Validating a Unicode Name

In ASCII, validating a name isn't too difficult: just make sure all the characters are alphabetical. But what about in Unicode (UTF-8)? How can I make sure there are no commas or underscores (outside of ASCII scope) in a given string? (ideally in Python) ...
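One possible approach, sketched here in Python 3 with the standard `unicodedata` module, is to test each character's Unicode general category instead of a fixed ASCII range. The function name `is_valid_name` and the set of extra allowed characters are assumptions made for illustration, not a definitive rule for what counts as a name.

```python
import unicodedata

# Assumption: besides letters and combining marks, allow space, hyphen and apostrophe.
ALLOWED_EXTRAS = {" ", "-", "'"}


def is_valid_name(name):
    """Return True if every character is a letter, a mark, or an allowed extra."""
    if not name.strip():
        return False
    for ch in name:
        if ch in ALLOWED_EXTRAS:
            continue
        # General categories starting with "L" are letters, "M" are marks.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            return False
    return True


print(is_valid_name("José Ángel"))    # True
print(is_valid_name("Smith, John"))   # False (ASCII comma)
print(is_valid_name("foo\uFF3Fbar"))  # False (FULLWIDTH LOW LINE, outside ASCII)
```

Because punctuation such as the fullwidth comma (U+FF0C) falls in a "P" category, it is rejected without having to list every non-ASCII comma or underscore.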

Converting ú to u in JavaScript

How would I convert ú into u in JavaScript? I might possibly need it for other non-English characters too. ...

How can I output UTF-8 from Perl?

I am trying to write a Perl script using the "utf8" pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. All of my settings for both my editor and operating system are defaulted to writing files in utf-8 format. However, when I enter the following into a text file, save it as a ...

Decoding HTML Entities With Python

The following Python code uses BeautifulStoneSoup to fetch the LibraryThing API information for Tolkien's "The Children of Húrin". import urllib2 from BeautifulSoup import BeautifulStoneSoup URL = ("http://www.librarything.com/services/rest/1.0/" "?method=librarything.ck.getwork&id=1907912" "&apikey=2a2e596b887...

BeautifulSoup gives me unicode+html symbols, rather than straight up unicode. Is this a bug or misunderstanding?

I'm using BeautifulSoup to scrape a website. The website's page renders fine in my browser: Oxfam International’s report entitled “Offside! http://www.coopamerica.org/programs/responsibleshopper/company.cfm?id=271 In particular, the single and double quotes look fine. They look like HTML symbols rather than ASCII, though strangely wh...

What is the difference between EM Dash &#151; and &#8212;?

I have an ASCII file that contains an EM Dash (&#151; or &#8212; in HTML). The hex value is 0x97. When we pass this file through one application it arrives as UTF-8, and it converts the character to 0xC297, which is &#151; in HTML. However, when we pass this file through a different application it converts the character to 0xE28094 or &#8212;. ...
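The two byte sequences in the question can be reproduced with a short Python 3 sketch. It assumes, as one plausible explanation only, that the first application reads the byte 0x97 as ISO-8859-1 (Latin-1) and the second reads it as Windows-1252; the excerpt itself does not confirm which encodings the applications use.

```python
raw = b"\x97"  # the single byte from the source file

# Assumption: application 1 treats the input as ISO-8859-1,
# where 0x97 maps to the control character U+0097 (decimal 151).
as_latin1 = raw.decode("latin-1")
print(hex(ord(as_latin1)), as_latin1.encode("utf-8"))  # 0x97 b'\xc2\x97'

# Assumption: application 2 treats the input as Windows-1252,
# where 0x97 maps to EM DASH, U+2014 (decimal 8212).
as_cp1252 = raw.decode("cp1252")
print(hex(ord(as_cp1252)), as_cp1252.encode("utf-8"))  # 0x2014 b'\xe2\x80\x94'
```

Under these assumptions, 151 is the byte's position in Windows-1252, while 8212 is the Unicode code point of the em dash itself.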

Tell whether a character is a combining diacritic mark

If you're looping through the chars of a Unicode string in Python (2.x), say: ak.sɛp.tɑ̃ How can you tell whether the current char is a combining diacritic mark? For instance, the last char in the above string is actually a combining mark: ak.sɛp.tɑ̃ --> ̃ ...
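A minimal sketch of one way to check this uses the standard `unicodedata` module and treats any character whose general category starts with "M" (Mn, Mc, Me) as a mark. It is written for Python 3 rather than the 2.x mentioned in the question, and the helper names `is_mark` and `find_marks` are made up for illustration.

```python
import unicodedata


def is_mark(ch):
    """Return True if ch is in one of the Unicode mark categories (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def find_marks(text):
    """Return (index, character name) for every mark found in text."""
    return [(i, unicodedata.name(ch, "UNKNOWN")) for i, ch in enumerate(text) if is_mark(ch)]


# The string from the question, spelled with escapes: ɛ is U+025B, ɑ is U+0251,
# and the trailing tilde is U+0303 COMBINING TILDE.
sample = "ak.s\u025bp.t\u0251\u0303"
print(find_marks(sample))  # [(9, 'COMBINING TILDE')]
```

`unicodedata.combining(ch)` is a related check, but it returns the canonical combining class, which is 0 for some marks, so the category test is the broader of the two.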
