questions about unicode | ansaurus

unicode

Why does SQL Server consider N'㐢㐢㐢㐢' and N'㐢㐢㐢' to be equal?

We are testing our application for Unicode compatibility and have been selecting random characters outside the Latin character set for testing. On both Latin- and Japanese-collated systems the following equality is true (U+3422): N'㐢㐢㐢㐢' = N'㐢㐢㐢', but the following is not (U+30C1): N'チチチチ' = N'チチチ'. This was discovered when a test c...

Unicode string turns to garbage on the server side.

I have a situation. I have a label in ASP.NET 2.0 (C#). The label should display the Finnish text "Sähköpostiosoite". I tried setting Label.Text from both markup and code-behind, but what I see in the browser response is "SÃ¤hkÃ¶postiosoite". The originally assigned string "Sähköpostiosoite" gets replaced with "SÃ¤hkÃ¶postios...
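The garbled output quoted above matches the pattern produced when UTF-8 bytes are decoded as Latin-1. The following is a minimal Python sketch of that effect only; it does not reproduce the ASP.NET setup, and the variable names are illustrative.

```python
# Illustrative only: reproduce the garbling pattern from the excerpt.
original = "Sähköpostiosoite"

# Encode as UTF-8, then (wrongly) decode the bytes as Latin-1.
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the text, which suggests the bytes
# themselves were intact and only the declared encoding was off.
recovered = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(recovered)
```

If the same pattern appears in a browser, a mismatch between the encoding of the response bytes and the charset the browser assumes is a plausible place to look first.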

How can I convert from wostream to ostream?

I am using a function that receives an ostream, but I have a wostream. Is there a way to convert one to the other? In particular, I want to use boost::write_graphviz, which takes an ostream, but I am currently in an << operator for wostream. ...

Conversion of strings like \\uXXXX in Python

I have a string like \uXXXX (representation) and I need to convert it into Unicode. I receive it from a 3rd-party service, so the Python interpreter doesn't convert it and I need to do the conversion in my code. How do I do it in Python? >>> s u'\\u0e4f\\u032f\\u0361\\u0e4f' ...
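One common approach is the `unicode_escape` codec. The sketch below uses Python 3 syntax (the excerpt itself is Python 2) and assumes the incoming text contains only ASCII characters plus `\uXXXX` escapes; text that already holds non-ASCII characters would need different handling.

```python
# A string holding literal backslash-u sequences, as in the excerpt.
s = "\\u0e4f\\u032f\\u0361\\u0e4f"

# unicode_escape decodes bytes, so go through ASCII bytes first.
# Assumption: the input is pure ASCII apart from the escapes.
decoded = s.encode("ascii").decode("unicode_escape")

print(len(s), len(decoded))
```

Before decoding, each escape occupies six characters; afterwards each is a single code point.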

Python and Hebrew encoding/decoding error

Hey, I have an SQLite database into which I would like to insert values in Hebrew. I keep getting the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128). My code is as follows: runsql(u'INSERT into personal values(%(ID)d,%(name)s)' % {'ID':1,'name':fabricate_heb...
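The error text above is what Python reports when UTF-8 encoded Hebrew bytes are decoded as ASCII, since Hebrew letters encode to byte pairs beginning with 0xd7. The sketch below, in Python 3 with the standard `sqlite3` module, shows that failure and one way to avoid string formatting by passing values as bound parameters. The table layout is an assumption inferred from the excerpt's column names, and the sample word is made up.

```python
import sqlite3

# Sample Hebrew word, written with escapes to keep the source ASCII-only.
name = "\u05e9\u05dc\u05d5\u05dd"
raw = name.encode("utf-8")  # first byte is 0xd7

# Decoding those bytes as ASCII fails in the way the excerpt describes.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Assumed schema; the real table definition is not shown in the excerpt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personal (ID INTEGER, name TEXT)")

# Bound parameters: no % formatting, so no implicit encode/decode step.
conn.execute("INSERT INTO personal VALUES (?, ?)", (1, raw.decode("utf-8")))
stored = conn.execute("SELECT name FROM personal WHERE ID = 1").fetchone()[0]
conn.close()
```

Decoding the bytes explicitly as UTF-8 at the boundary and keeping text as Unicode from there on is the design choice illustrated here.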

How to convert Unicode strings (\u00e2, etc) into NSString for display?

I am trying to support arbitrary Unicode from a variety of international users. They have already put a bunch of data into SQLite databases on their iPhones, and now I want to capture the data into a database, then send it back to their device. Right now I am using a PHP page that is sending data back from an internet MySQL database. ...

Unicode escape characters not being read by XmlReader

I've got an XML document that I'm importing into an XmlReader that has some unicode formatting I need to preserve. I'm preserving the whitespace but it's dropping the encoded #x2028 which I assume should be expressed as a line break. Here's my code: var settings = new XmlReaderSettings { Prohi...

lxml unicode entity parse problems

I'm using lxml as follows to parse an exported XML file from another system: xmldoc = open(filename) etree.parse(xmldoc) But I'm getting: lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46 Obviously it's having problems with Unicode entity names, but how would I get round this? Via open() or parse...

Regular expression of unicode characters on string

I'm working in C# doing some OCR work and have extracted the text I need to work with. Now I need to parse a line using Regular Expressions. string checkNum; string routingNum; string accountNum; Regex regEx = new Regex(@"\u9288\d+\u9288"); Match match = regEx.Match(numbers); if (match.Success) checkNum = match.Value.Remove(0, 1).R...

Writing UTF8 text to file

I am using the following function to save text to a file (on IE-8 w/ActiveX). function saveFile(strFullPath, strContent) { var fso = new ActiveXObject( "Scripting.FileSystemObject" ); var flOutput = fso.CreateTextFile( strFullPath, true ); //true for overwrite flOutput.Write( strContent ); flOutput.Close(); } The code...

Dreaded Python encoding errors, how to stop them?

These have been plaguing me endlessly. Why? It seems that my console can't handle the encoding. I take it that my browser and word processor can handle it. I don't have a master list of all the possible characters that it's choking on. What is the best way to relieve this without modifying my data? 'charmap' codec can't encode chara...

Flexible string handling in Visual Studio 2008 C++

I'm slowly starting to get the hang of the _T stuff in Visual Studio 2008 C++, but a few things still elude me. I can see the benefit of the flexibility, but if I can't get the basics soon, I think I'll go back to the standard way of doing this, which is much less confusing. The idea with the code below is that it scans the parameters for -d a...

Output Unicode to Console Using C++

I'm still learning C++, so bear with me and my sloppy code. The compiler I use is Dev C++. I want to be able to output Unicode characters to the console using cout. Whenever I try things like: # #include directive here (include iostream) using namespace std; int main() { cout << "Hello World!\n"; cout << "Blah blah blah some ...

Can I turn off implicit Python unicode conversions to find my mixed-strings bugs?

When profiling our code I was surprised to find millions of calls to C:\Python26\lib\encodings\utf_8.py:15(decode). I started debugging and found that across our code base there are many small bugs, usually comparing a string to a unicode or adding a string and a unicode. Python graciously decodes the strings and performs the followin...

Allowed unicode characters in IDN host labels

Hi all, I'm currently working on a "proper" URI validator and currently it all comes down to hostname validation; the rest isn't that tricky. I'm stuck at IDN hostname labels (e.g. containing Unicode; possible punycode-encoded strings have been decoded at this point). My first idea was basically a regex for TLDs not supporting IDN and one...

Wrong reading file in UNICODE (fread) on C++

Hello, I'm trying to load into a string the content of a file saved on the disk. The file is .CS code, created in Visual Studio, so I suppose it's saved in UTF-8 encoding. I'm doing this: FILE *fConnect = _wfopen(connectFilePath, _T("r,ccs=UTF-8")); if (!fConnect) return; fseek(fConnect, 0, SEEK_END); lSize = ftell(fConnect)...

What new Unicode functions are there in C++0x?

It has been mentioned in several sources that C++0x will include better language-level support for Unicode (including types and literals). If the language is going to add these new features, it's only natural to assume that the standard library will as well. However, I am currently unable to find any references to the new standard librar...

What's the fastest way to strip and replace a document of high unicode characters using Python?

I am looking to replace all high Unicode characters in a large document, such as accented Es, left and right quotes, etc., with "normal" counterparts in the low range, such as a regular 'E' and straight quotes. I need to perform this on a very large document rather often. I see an example of this in what I think might be Perl here: ht...
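One way to sketch the replacement described above is with the standard `unicodedata` module: decompose accented letters and drop the combining marks, and map curly quotes through a small translation table. This is an illustrative Python 3 sketch, not a benchmarked answer to the "fastest" part of the question; `to_ascii_lookalikes` and `QUOTES` are hypothetical names, and the quote table covers only four characters.

```python
import unicodedata

# Hypothetical mapping for the curly quotes mentioned in the excerpt.
QUOTES = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}


def to_ascii_lookalikes(text):
    """Replace curly quotes and strip accents from decomposable letters."""
    text = text.translate(QUOTES)
    # NFKD splits e.g. an accented E into 'E' plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sample = "\u00c9cole \u201cna\u00efve\u201d"
print(to_ascii_lookalikes(sample))
```

Characters with no decomposition, such as ß, pass through unchanged, so a real pipeline would need an extra mapping for those.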

Show escaped string as Unicode in Python

Hello, I have only known Python for a few days. Unicode seems to be a problem with Python. I have a text file that stores a text string like this: '\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1'. I can read the file and print the string out, but it displays incorrectly. How can I print it out to the screen correctly ...

How can I convert Japanese characters to Unicode in Perl?

Can you point me to a tool to convert Japanese characters to Unicode? ...
