unicode

C++: Chr() and unichr() equivalent?

I could have sworn I used a chr() function 40 minutes ago but can't find the file. I know it can go up to 256 so I use this: std::string chars = ""; chars += (char) 42; //etc So that's alright, but I really want to access unicode characters. Can I do (w_char) 512? Or maybe something just like the unichr() function in python, I just ca...

$_SERVER['HTTP_REFERER'] vs Request.ServerVariables("HTTP_REFERER")

Why do $_SERVER['HTTP_REFERER'] (PHP) and Request.ServerVariables("HTTP_REFERER") (ASP) return different results if the query string has non-English characters? PHP returns the correct value but ASP does not: PHP: сабака ASP: ׁ׀°׀±׀°׀÷׀° ...

How to add a '-' apex in Python

I have a problem: I can't find the '-' apex character. I'm writing code for math functions and I want to insert representations like ², ³. I found that print '\xb2, \xb3' works well. Now I have to insert negative numbers at the apex, like ¯², so I need the ¯ character. How can I find it? ...

How do I fix this unicode/cPickle error in Python?

ids = cPickle.loads(gem.value) loads() argument 1 must be string, not unicode ...

Removing right-to-left mark and other unicode characters from input in Python

I am writing a forum in Python. I want to strip input containing the right-to-left mark and things like that. Suggestions? Possibly a regular expression? ...

How do I read UTF-8 characters via a pointer?

Suppose I have UTF-8 content stored in memory, how do I read the characters using a pointer? I presume I need to watch for the 8th bit indicating a multi-byte character, but how exactly do I turn the sequence into a valid Unicode character? Also, is wchar_t the proper type to store a single Unicode character? This is what I have in ...

Differentiate between TCHAR and _TCHAR

What are the various differences between the two symbols TCHAR and _TCHAR type defined in the Windows header tchar.h? Explain with examples. Briefly describe scenarios where you would use TCHAR as opposed to _TCHAR in your code. (10 marks) ...

.net Regular Expression to match any kind of letter from any language

Which regular expression can I use to match (allow) any kind of letter from any language? I need to match any letter including any diacritics (e.g. á, ü, ñ, etc.) and exclude any kind of symbol (math symbols, currency signs, dingbats, box-drawing characters, etc.) and punctuation characters. I'm using ASP.NET MVC 2 with .NET 4. I've trie...

What .NET UnmanagedType is Unicode (UTF-16)?

I am packing bytes into a struct, and some of them correspond to a Unicode string. The following works fine for an ASCII string: [StructLayout(LayoutKind.Sequential)] private struct PacketBytes { [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 64)] public string MyString; } I assumed that I could do [StructLayout(LayoutKind....

Python unicode Decode Error SUDs

OK, so I have # -*- coding: utf-8 -*- at the top of my script, and it worked for pulling data from the database that had funny chars (Ñ, Õ, é, —, –, ’, …) in it and storing that data in variables. But I have run into other problems: I pull my data, organize it, and then dump it into variables like so: title = product[1] Wher...

Which Perl module can handle a variety of date formats with Unicode characters?

My requirement is parsing XML files which contain a wide variety of timestamps based on the locales in which they were written. They may contain Unicode characters in the case of Chinese or Korean locales. I have to parse these timestamps and put them in a standard format, something like 2009-11-26 12:40:54, to put them in an Oracle database. S...

How to diagnose, and reverse (not prevent) Unicode mangling

Somewhere upstream of me, "something" happened that looks like Unicode mangling. One symptom is that a lowercase u umlaut (ü) gets converted to "Ã¼" (i.e., character FC gets converted to C3 BC). Assuming that I have no control over this upstream process, how can I reverse-engineer what's going on? And if that is possible, can I crank the s...

Qt and unicode escape string.

I'm getting data from a server using signals and slots. Here is the slot part: QString text(this->reply->readAll()); The problem is that the text variable will contain Unicode escapes, for example: \u043d\u0435 \u043f\u0430\u0440\u044c\u0441\u044f ;-) Is there any way to convert this? ...

Javascript parse error on '\u2028' unicode character

Whenever I use the \u2028 character literal in my JavaScript source with the content type set to "text/html; charset=utf-8" I get JavaScript parse errors. Example: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html lang="en"> <head> <meta http-equiv="Content-Type" content="text...

Validate Unicode String and Escape if Unicode is Invalid (C/C++)

I have a program that reads arbitrary data from a file system and outputs results in Unicode. The problem I am having is that sometimes filenames are valid Unicode and sometimes they aren't. So I want a function that can validate a string (in C or C++) and tell me if it is a valid UTF-8 encoding. If it is not, I want to have the invalid ...

Most Lite-Weight XML Parser with XPath and Wide-char Support

I want a lite-weight C++ XML parser/DOM that: Can take UTF-8 as input, and parse into UTF-16. Maybe it does this directly (ideal!), or perhaps it provides a hook for the conversion (such as taking a custom stream object that does the conversion before parsing). Offers some XPath support. I've been looking at RapidXML, the Kranf xmlP...

Is Django double encoding a Unicode (utf-8?) string?

I'm having trouble storing and outputting an ndash character as UTF-8 in Django. I'm getting data from an API. In raw form, as retrieved and viewed in a text editor, a given unit of data may be similar to: "I love this detergent \u2013 it is so inspiring." (\u2013 is &ndash; as an HTML entity). If I get this straight from an API and...

What's HTML character code 8203?

What does the HTML character code &#8203; mean? I found it in one of my jQuery scripts and wondered what it was. Thanks. Edit: Here is the script it was in (it was added to the end, found it in Firebug) {literal} <script src="http://code.jquery.com/jquery-latest.js" type="text/javascript"></script> <script type="text/javascript"> var $...

C++: Join an array of WCHAR[]s?

I have an array of WCHAR[]s. How can I join them? I know the array length. [L"foo", L"bar"] => "foo, bar" ...

How to test if a string has a certain unicode char?

Suppose you have a command-line executable that receives arguments. This executable is wide-char ready and you want to test whether one of these arguments starts with a HYPHEN, in which case it's an option: command -o foo How could you test it inside your code if you don't know the charset being used by the host? Should be not possible to a give...