unicode

What is the default content-type/charset?

According to this answer: http://stackoverflow.com/questions/1020892/python-urllib2-read-to-unicode I have to read the charset from the content-type header in order to convert to unicode. However, some websites don't send a "charset". For example, the ['content-type'] for this page (http://bit.ly/6IcCtf/) is "text/html", so I can't convert it to unicode. encodi...
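One way to cope with a header that carries no charset is to parse it and fall back to a caller-chosen default. The following is a minimal Python 3 sketch, not the asker's code: the helper name `charset_from_content_type` and the fallback value are assumptions made for illustration. The older HTTP/1.1 specification (RFC 2616) named ISO-8859-1 as the default for text types, but real pages often declare their encoding inside the HTML instead, so the fallback should be treated as a guess.

```python
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, or `default`.

    `default` is an assumption: it is only a guess for responses that
    do not declare a charset in the header.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


# Hypothetical usage: decode raw bytes with whatever the header declares.
raw = b"caf\xe9"
text = raw.decode(charset_from_content_type("text/html"))
```

If the fallback turns out to be wrong for a given page, decoding may raise or produce mojibake, so a stricter caller might inspect the document body for an encoding declaration before decoding.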

I just want to download this URL...but it is giving me an error! ...unicode.. (Python)

theurl = 'http://bit.ly/6IcCtf/' urlReq = urllib2.Request(theurl) urlReq.add_header('User-Agent',random.choice(agents)) urlResponse = urllib2.urlopen(urlReq) htmlSource = urlResponse.read() if unicode == 1: #print urlResponse.headers['content-type'] #encoding=urlResponse.headers['content-type'].split('charset=')[-1] #htmlSour...

Truncating unicode so it fits a maximum size when encoded for wire transfer

Given a Unicode string and these requirements: (1) the string will be encoded into some byte-sequence format (e.g. UTF-8 or JSON unicode escape); (2) the encoded string has a maximum length. For example, the iPhone push service requires JSON encoding with a maximum total packet size of 256 bytes. What is the best way to truncate the string so that...
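For the UTF-8 case, one possible approach is to encode first, cut the bytes, and discard any character that was split by the cut. This is a minimal Python 3 sketch under that assumption; the function name `truncate_utf8` is hypothetical, and the sketch does not handle JSON unicode escapes or keep combining sequences together.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of `text` whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting the bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence when decoding.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Hypothetical usage: "é" needs two bytes, so a 2-byte budget keeps only "h".
short = truncate_utf8("héllo", 2)
```

Because the input bytes come from a valid encode, the only invalid sequence after slicing is at the very end, which is why ignoring decode errors is safe here.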

Is a wide character string literal starting with L like L"Hello World" guaranteed to be encoded in Unicode?

I've recently tried to get the full picture about what steps to take to create platform-independent C++ applications that support unicode. A thing that is confusing to me is that most howtos and the like equate the character encoding (i.e. ANSI or Unicode) with the character datatype (char or wchar_t). As far as I've learned so far these ...

RichTextBox use to retrieve Text property in C++

I am using a hidden RichTextBox to retrieve the Text property from a RichEditCtrl. rtb->Text; returns the text portion in either English or national languages – just great! But I need this text as \u12232? \u32232? instead of national characters and symbols, to work with my db and RichEditCtrl. Any idea how to get from “пассажирским поезд...

Displaying ® symbol in Silverlight.

Folks! I am trying to display ® and superscript TM symbols in my Silverlight app. I want to save the text containing the symbols in a resx file. Things I have tried: Copy and paste the ® symbol from any document to the resx file. The ® symbol gets displayed in the resx file. But, when running the Silverlight app, xamlparseexception ...

How to pass a unicode char argument to ImageMagick?

Suppose the character "▣" is in somefont.ttf's glyph table. char = unichr(9635) subprocess.call(['convert', '-font', 'somefont.ttf', '-size', '50x50', '-label:%s' % char, 'output.png']) subprocess.call(['convert', '-font', 'somefont.ttf', '-size', '50x50', ('-label:%s' % char).encode('utf-8'), 'output.png']) Both create a blank imag...

How to detect if a Unicode character maps to the missing-symbol square?

Is there a way to detect whether a Unicode character is present in a font on the iPhone, i.e., to detect whether the character will map to a printable glyph or instead to the square "missing character" symbol? For example, if I want to generate a random Wingding character with this snippet: NSString *s = [NSString stringWithFormat:@"%C...

Python: How to get StringIO.writelines to accept unicode string?

I'm getting a UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 34: ordinal not in range(128) on a string stored in 'a.desc' below as it contains the '£' character. It's stored in the underlying Google App Engine datastore as a unicode string so that's fine. The cStringIO.StringIO.writelines function is t...
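A common way around this kind of error is to encode each line explicitly before it reaches a byte-oriented buffer. The sketch below is Python 3 and uses `io.BytesIO` as a stand-in for `cStringIO.StringIO`; that substitution, the helper name `write_lines_utf8`, and the sample strings are assumptions for illustration rather than the asker's code.

```python
import io


def write_lines_utf8(buffer, lines):
    """Encode each text line as UTF-8 and write it to a byte buffer."""
    buffer.writelines(line.encode("utf-8") for line in lines)


# Hypothetical usage with a line containing the pound sign.
buf = io.BytesIO()
write_lines_utf8(buf, ["Price: £5\n", "Done\n"])
data = buf.getvalue()
```

Choosing the encoding at the point of writing avoids the implicit ASCII conversion that produces the UnicodeEncodeError quoted in the question.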

Reading characters outside ASCII.

A friend of mine showed me a situation where reading characters produced unexpected behaviour. Reading the character '¤' caused his program to crash. I was able to conclude that '¤' is 164 decimal so it's over the ASCII range. We noticed the behaviour on '¤' but any character >127 seems to show the problem. The question is how would we ...
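The excerpt does not say what caused the crash. One possible explanation, offered here purely as an assumption, is that the byte value 164 was read into a signed 8-bit type and became negative, which can break table lookups. The Python 3 sketch below only illustrates that arithmetic; the helper name `signed_byte` is hypothetical.

```python
def signed_byte(value):
    """Reinterpret an unsigned byte value (0..255) as a signed 8-bit integer."""
    return value - 256 if value > 127 else value


# '¤' is a single byte with value 164 in Latin-1, above the ASCII range.
code = "¤".encode("latin-1")[0]
as_signed = signed_byte(code)
```

Any value above 127 maps to a negative number under this reinterpretation, which matches the observation that every such character showed the problem.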

Python unicode: how to test against unicode string

I have a script like this: #!/Python26/ # -*- coding: utf-8 -*- import sys import xlrd import xlwt argset = set(sys.argv[1:]) #----------- import ---------------- wb = xlrd.open_workbook("excelfile.xls") #----------- script ---------------- #Get the first sheet either by name sh = wb.sheet_by_name(u'Data') hlo = [] for i in range(...

Are there any tools/utilities to convert "string" to "AnsiString" in Pascal source files?

Delphi 2009 and above support unicode. I have a few legacy Pascal source files that I want to compile in Delphi 2009/2010 as well as Delphi 2007 and below. A quick and safe way is to replace String with AnsiString, PChar with PAnsiChar, and Char with AnsiChar. Is there any utility available that is able to parse a .pas file and make such replacem...

How to use TRACE with ascii under unicode MFC environment?

I am developing an MFC program under Windows CE. It is Unicode by default. I can use TRACE to print a message like this: TRACE(TEXT("Hey! we got a problem!\n")); It works fine if everything is Unicode. However, I have some ASCII strings to print. For example: // open the serial port m_Context = CreateFile(TEXT("COM1:"), ...); int ...

UrlEncodeUnicode and browser navigation errors

I want to redirect a request to some URL that may or may not contain non-ASCII characters (e.g. German umlauts). Doing this with the relevant part of the URL: var url = HttpUtility.UrlEncodeUnicode("öäü.pdf"); // -> "%u00f6%u00e4%u00fc.pdf" and then issuing the redirect: Response.Redirect(url, ...); will not produce the desired be...
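The `%uXXXX` form shown in the question is a non-standard escape; the URI standard (RFC 3986) expects the UTF-8 bytes of each character to be percent-encoded instead. The Python 3 sketch below illustrates that alternative with the same file name. It is not the .NET API from the question, and the helper name `percent_encode_utf8` is hypothetical.

```python
from urllib.parse import quote


def percent_encode_utf8(path_segment):
    """Percent-encode one URL path segment using its UTF-8 bytes."""
    # safe="" also escapes "/", so the result stays a single path segment.
    return quote(path_segment, safe="")


# Hypothetical usage with the file name from the question.
encoded = percent_encode_utf8("öäü.pdf")
```

Each umlaut becomes two escaped bytes here, so the encoded name is longer than the `%u` form but uses the standard percent-encoding.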

How to get unicode month name in Python?

I am trying to get a unicode version of calendar.month_abbr[6]. If I don't specify an encoding for the locale, I don't know how to convert the string to unicode. The example code below shows my problem: >>> import locale >>> import calendar >>> locale.setlocale(locale.LC_ALL, ("ru_RU")) 'ru_RU' >>> print repr(calendar.month_abbr[6]) '\x...

DB2 database using unicode

I have a problem with DB2 databases that should store unicode characters. The connection is established using JDBC. What do I have to do if I would like to insert a unicode string into the database? INSERT INTO my_table(id, string_field) VALUES(1, N'my unicode string'); or INSERT INTO my_table(id, string_field) VALUES(1, 'my unicod...

How can I set the encoding of shell-command-on-region output?

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki): (defun perltidy-command(start end) "The perltidy command we pass markers to." (shell-command-on-region start end "perltidy" t ...

Unicode string literals in C# vs C++/CLI

C#: char z = '\u201D'; int i = (int)z; C++/CLI: wchar_t z = '\u201D'; int i = (int)z; In C# "i" becomes, just as I expect, 8221 ($201D). In C++/CLI on the other hand, it becomes 65428 ($FF94). Can some kind soul explain this to me? EDIT: The size of wchar_t cannot be the issue here, because: C++/CLI: wchar_t z = (wchar_t)8221; int i = (...

Is an update to D2010 really meaningful?

I am trying to migrate my own projects to Delphi 2010. But it seems to be very difficult. I use TntControls for old projects. If I remove this library, I must re-implement some runtime functions myself. For instance: converting a UnicodeString to a specified code page. The "SizeOf", "Length", FillChar() still confuse me. Compiler wil...

PHP: Make Site Unicode Compatible

Hello, how can I make my site Unicode-compatible so that it supports languages other than English? Thanks ...