Guido van Rossum's presentation about Python 3000 mentions several things to make a transition from Python 2 to Python 3 easier eventually. He is specifically talking about text handling since the move to Unicode as the only representation of strings in Python 3 is one of the major changes.
As far as text handling goes, one slide (#14) ...
I ran the following code on Windows XP with Python 2.6.4,
but it raises an IOError.
How do I open a file whose name has to be given in UTF-8?
>>> open( unicode('한글.txt', 'euc-kr').encode('utf-8') )
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    open( unicode('한글.txt', 'euc-kr').encode('utf-8') )
IOError: [Errno 22] inva...
I need to take a string, and shorten it to 140 characters.
Currently I am doing:
if len(tweet) > 140:
    tweet = re.sub(r"\s+", " ", tweet) #normalize space
    footer = "… " + utils.shorten_urls(post['url'])
    avail = 140 - len(footer)
    words = tweet.split()
    result = ""
    for word in words:
        word += " "
        if l...
I need to extract the initial character from a Korean word in MS Excel and MS Access.
Left("한글",1) returns the first syllable, i.e. 한, but what I need is the initial consonant, i.e. ㅎ.
Is there a function to do this, or at least an idiom?
If you know how to get the Unicode value from the String I'd be able to work it out from...
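A minimal sketch of the arithmetic, in Python rather than VBA. Precomposed Hangul syllables occupy U+AC00 through U+D7A3 in a fixed order, with 588 syllables (21 vowels times 28 finals) per initial consonant, so the initial can be recovered by integer division on the code point. The names `initial_consonant` and `CHOSEONG` are illustrative only and do not come from any library.

```python
# Compatibility jamo for the 19 initial consonants, in Unicode syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"

HANGUL_BASE = 0xAC00      # first precomposed syllable
HANGUL_LAST = 0xD7A3      # last precomposed syllable
PER_INITIAL = 21 * 28     # 21 vowels x 28 finals (including "no final") = 588


def initial_consonant(ch):
    """Return the initial consonant of a precomposed Hangul syllable.

    Characters outside the syllable block are returned unchanged.
    """
    code = ord(ch)
    if HANGUL_BASE <= code <= HANGUL_LAST:
        return CHOSEONG[(code - HANGUL_BASE) // PER_INITIAL]
    return ch


print(initial_consonant("한글"[0]))  # ㅎ
```

The same subtraction and division should carry over to a spreadsheet or VBA function once the code point of the first character is available.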
I can see some duplicate characters in Unicode. For example, the character 'C' can be represented by the code points U+0043 and U+0421. Why is this so?
...
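A quick way to see that the two code points are different characters that merely look alike is to ask the Unicode database for their names. This is a small Python sketch using the standard `unicodedata` module:

```python
import unicodedata

# Print the official Unicode name of each look-alike code point.
for ch in ("\u0043", "\u0421"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0043 LATIN CAPITAL LETTER C
# U+0421 CYRILLIC CAPITAL LETTER ES
```

One belongs to the Latin script and the other to Cyrillic, where it denotes an "s" sound, so they are not duplicates in the encoding sense.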
Hi,
I am trying to pull data from another site, and I am getting garbled Unicode characters in my result, like this:
Amazon RDS – The Beginner’s Guide
How can I decode it in PHP?
Can someone help?
Thanks in advance
...
We have XML content that uses Wingdings to display ticks and possibly other characters. Our web content is generated dynamically from the XML content by an application written in Delphi.NET. It currently outputs <span style="font-family: Wingdings;">ü</span> which displays the tick correctly in Internet Explorer and Chrome, but displays...
Hi,
I need to implement a character encoding conversion function in C++ or C (preferably C) from a custom encoding scheme (designed to support multiple languages in a single encoding) to UTF-8.
Our encoding is fairly arbitrary; it looks like this:
Because the mapping is so irregular, I am thinking of using std::map for mapping our encoding t...
How can I spool data from a table to a file which contains Unicode characters?
I have a SQL file that I execute from the SQL*Plus prompt, and its contents are:
SET ECHO OFF
SET FEEDBACK OFF
SET HEADING OFF
SET PAGESIZE 0
SPOOL STREET_POINT_THR.BQSV
SELECT GEOX||'`'||GEOY||'`'||UNICODE_DESC||'`'||ASCII_DESC
FROM GEO.STREET_POINTS;
SPOOL OFF
...
I've got an SSRS report that contains a table and a chart. In the table, the name Café shows up with no problem, but in the chart it gets escaped (rendered as Café).
I'm not having any luck finding a solution to this, but I'm aware that it's probably something easy I'm doing wrong or overlooking. Can anyone provide some insight?
...
Hi all,
I have the following radio box:
<input type="radio" value="香">香</input>
As you can see, the value is Unicode. It represents the following Chinese character: 香
So far so good.
I have a VBScript that reads the value of that particular radio button and saves it into a variable. When I display the content with a mes...
I am working on a Korean document, and the HTML source code contains special symbols starting with &#char(w), e.g. 껰. I would like to convert such a symbol to its Unicode representation.
Is there a way to do so?
...
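If the symbols are HTML numeric character references, the number after `&#` is already the Unicode code point in decimal. A short Python sketch using the standard `html` module; the reference `&#44592;` is a made-up sample and is not taken from the question:

```python
import html

# "&#44592;" is a hypothetical sample reference, not the one from the question.
text = html.unescape("&#44592;")
print(text, f"U+{ord(text):04X}")  # 기 U+AE30
```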
What I want to ask is pretty simple. I have an HTML document hosted in a WebBrowser control.
When I select a word (a "Korean word") using the MSHTML range property,
range.htmlText and range.Text both show the "Korean word"; all I want to do is convert it to Unicode format.
Is that possible?
F...
I've tracked a problem I'm having down to the following inexplicable behaviour within the .NET System.Text.Encoding class:
byte[] original = new byte[] { 128 };
string encoded = System.Text.Encoding.UTF8.GetString(original);
byte[] decoded = System.Text.Encoding.UTF8.GetBytes(encoded);
Console.WriteLine(original[0] == decoded[0]);
Am ...
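The same round trip can be reproduced outside .NET. The Python sketch below shows what a lenient UTF-8 decoder does with the lone byte 128, which is likely the effect seen in the snippet above: the byte is not valid UTF-8 on its own, so it is replaced, and the replacement encodes to three different bytes.

```python
original = bytes([128])

# 0x80 is a continuation byte and cannot start a UTF-8 sequence, so a lenient
# decoder substitutes U+FFFD (REPLACEMENT CHARACTER).
decoded = original.decode("utf-8", errors="replace")
round_tripped = decoded.encode("utf-8")

print(ascii(decoded))             # '\ufffd'
print(round_tripped)              # b'\xef\xbf\xbd'
print(original == round_tripped)  # False
```

Arbitrary bytes only survive a decode and encode round trip in an encoding that assigns a character to every byte value, such as Latin-1.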
I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 4: ordinal not in range(128)
I tried setting many different codecs (in the header, like # -*- coding: utf8 -*-), or even using u"string", but it still appears.
How do I fix this?
Edit: I don't know the actual character that's causing this, but since...
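In Python 2 this error usually comes from mixing byte strings with unicode objects, which triggers an implicit ASCII decode; the coding header only affects literals in the source file. The sketch below uses Python 3 syntax and a hypothetical input to reproduce the message and to show an explicit decode. The choice of Latin-1 is an assumption, since the real encoding of the data is not known from the question.

```python
raw = b"voil\xe0"  # hypothetical input; byte 0xe0 sits at position 4

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # Same message as in the question: 0xe0 is outside the ASCII range.
    print(exc)

# Assumption: the bytes are Latin-1, where 0xe0 is "à".
text = raw.decode("latin-1")
print(text)  # voilà
```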
I'm using boost::filesystem for cross-platform path manipulation, but this breaks down when calls need to be made down into interfaces I don't control that won't accept UTF-8. For example when using the Windows API, I need to convert to UTF-16, and then call the wide-string version of whatever function I was about to call, and then conve...
Okay, I'm trying to work with UTF-8 text files. I'm constantly fighting the BOM (byte order mark) bytes that the writer drops in for UTF-8, which blow up pretty much anything I need to use to read the file, including serializers and other text readers.
I'm getting a leading six bytes of data:
0xEF
0xBB
0xBF
0xEF
0xBB
0xBF
(now that I'm looking at it...
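Those six bytes are the UTF-8 byte order mark written twice. As a rough illustration in Python (the payload `<root/>` is hypothetical), a BOM-aware decoder removes only the first mark, so a doubled one needs an explicit strip:

```python
import codecs

data = codecs.BOM_UTF8 * 2 + b"<root/>"  # hypothetical file contents

# "utf-8-sig" removes at most one leading BOM; the second survives as U+FEFF.
once = data.decode("utf-8-sig")
print(ascii(once))  # '\ufeff<root/>'

# Stripping every leading U+FEFF handles the doubled case as well.
clean = data.decode("utf-8").lstrip("\ufeff")
print(clean)  # <root/>
```

A doubled mark often suggests the file was written twice through a BOM-emitting writer, so fixing the writing side may be preferable to stripping on read.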
Hi there,
I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is in UTF-8 format and translates to the Greek capital letter 'A'. I'm trying to put that character into an NSString variable, but I fail miserably.
...
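A unichar holds a UTF-16 code unit, so two UTF-8 bytes packed into it will not be read as the intended letter. The Python sketch below only illustrates the byte arithmetic: splitting 0xce91 into its two bytes and decoding them as UTF-8 yields the code point that a unichar would be expected to contain.

```python
value = 0xCE91  # the 16-bit value from the question

# Split into its two bytes (high byte first) and decode them as UTF-8.
raw = value.to_bytes(2, "big")
char = raw.decode("utf-8")
print(char, f"U+{ord(char):04X}")  # Α U+0391
```

Under that reading, storing 0x0391 directly in the unichar, or building the string from the two UTF-8 bytes, would be the two consistent options.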
I have upgraded to Python 3 and can't figure out how to convert backslash escaped newlines to HTML. The browser renders the backslashes literally, so "\n" has no effect on the HTML source. As a result, my source page is all in one long line and impossible to diagnose.
I spent hours searching for the solution to no avail. Can anyone help...
I need to decide whether to render geometric symbols in a web GUI (e.g. arrows and triangles for buttons, menus, etc.) as Unicode symbols (MUCH easier and color-independent) or GIF/PNG files (lots of hassle I would like to avoid).
However, I have seen Windows clients that have trouble displaying even advanced punctuation symbols declare...