unicode

Python: data vs. text?

Guido van Rossum's presentation about Python 3000 mentions several things that should eventually make the transition from Python 2 to Python 3 easier. He talks specifically about text handling, since the move to Unicode as the only representation of strings in Python 3 is one of the major changes. As far as text handling goes, one slide (#14) ...

File open error when using the utf-8 codec in Python

I run the following code on Windows XP with Python 2.6.4, but it raises an IOError. How do I open a file whose name is UTF-8 encoded? >>> open( unicode('한글.txt', 'euc-kr').encode('utf-8') ) Traceback (most recent call last): File "<pyshell#0>", line 1, in <module> open( unicode('한글.txt', 'euc-kr').encode('utf-8') ) IOError: [Errno 22] inva...

Python: Split unicode string on word boundaries

I need to take a string, and shorten it to 140 characters. Currently I am doing: if len(tweet) > 140: tweet = re.sub(r"\s+", " ", tweet) #normalize space footer = "… " + utils.shorten_urls(post['url']) avail = 140 - len(footer) words = tweet.split() result = "" for word in words: word += " " if l...
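One way to approach the truncation described above is Python's standard `textwrap.shorten`, which collapses whitespace and cuts at a word boundary. This is only a minimal sketch: the helper name `shorten_to_limit` is hypothetical, the URL footer from the question is left out, and "word boundary" here means whatever `textwrap` treats as one (whitespace-delimited chunks), which may not suit every language.

```python
import textwrap

def shorten_to_limit(text: str, limit: int = 140, placeholder: str = "…") -> str:
    """Collapse runs of whitespace and cut at a word boundary.

    The result, including the placeholder, is at most `limit` characters
    (counted as Python str length, i.e. code points).
    """
    return textwrap.shorten(text, width=limit, placeholder=placeholder)

# Hypothetical sample input: 60 short words, far more than 140 characters.
tweet = "word " * 60
short = shorten_to_limit(tweet)
print(len(short), short)
```

Text that already fits within the limit is returned with only its whitespace normalized.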

How to extract characters from a Korean string in VBA

Need to extract the initial character from a Korean word in MS-Excel and MS-Access. When I use Left("한글",1) it returns the first syllable, i.e. 한; what I need is the initial character, i.e. ㅎ. Is there a function to do this, or at least an idiom? If you know how to get the Unicode value from the String I'd be able to work it out from...
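The arithmetic behind this can be sketched in Python (not VBA, so treat it as an illustration of the formula rather than a drop-in answer). Precomposed Hangul syllables occupy U+AC00 to U+D7A3 and are laid out as 19 initials x 21 vowels x 28 finals, so integer division by 21 * 28 recovers the index of the initial consonant. The function name `initial_consonant` and the choice to return compatibility jamo are assumptions made for this example.

```python
HANGUL_BASE = 0xAC00              # first precomposed syllable, 가
HANGUL_LAST = 0xD7A3              # last precomposed syllable, 힣
SYLLABLES_PER_INITIAL = 21 * 28   # 21 vowels x 28 finals (including "no final")

# The 19 initial consonants in Unicode order, as compatibility jamo.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"

def initial_consonant(syllable: str) -> str:
    """Return the initial consonant of one precomposed Hangul syllable.

    Characters outside the precomposed syllable block are returned unchanged.
    """
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return syllable
    return INITIALS[(code - HANGUL_BASE) // SYLLABLES_PER_INITIAL]

print(initial_consonant("한"))  # the first syllable of 한글
```

The same integer arithmetic should carry over to any language that can read a character's code point.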

Why are there duplicate characters in Unicode?

I can see some duplicate characters in Unicode. For example, the character 'C' can be represented by the code points U+0043 and U+0421. Why is this so? ...
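A quick way to see that the two code points are different characters that merely look alike is to ask the Unicode database for their names. This is a small Python sketch using the standard `unicodedata` module; the variable names are just for illustration.

```python
import unicodedata

latin_c = "\u0043"      # U+0043
cyrillic_es = "\u0421"  # U+0421

# Look up the official character names for both code points.
names = [unicodedata.name(ch) for ch in (latin_c, cyrillic_es)]
same = latin_c == cyrillic_es

print(names)
print(same)
```

The names come back as LATIN CAPITAL LETTER C and CYRILLIC CAPITAL LETTER ES, and the two strings do not compare equal.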

How to decode Unicode characters in PHP

Hi, I am trying to pull data from another site and I am getting Unicode characters in my result like this: Amazon RDS – The Beginner’s Guide. How can I decode it in PHP? Can someone help? Thanks in advance ...

If I know the font and the character, how can I programmatically find the Unicode equivalent?

We have XML content that uses Wingdings to display ticks and possibly other characters. Our web content is generated dynamically from the XML content by an application written in Delphi.NET. It currently outputs <span style="font-family: Wingdings;">ü</span> which displays the tick correctly in Internet Explorer and Chrome, but displays...

Character decoding Conversion Function Implementation

Hi, I need to implement a character encoding conversion function in C++ or C (preferred) from a custom encoding scheme (designed to support multiple languages in a single encoding) to UTF-8. Our encoding is pretty random; it looks like this: Because of the randomness of this mapping, I am thinking of using std::map for mapping our encoding t...

Oracle Unicode Spooling

How can I spool data from a table to a file which contains Unicode characters? I have a sql file which I execute from SQL*Plus screen and its content is: SET ECHO OFF SET FEEDBACK OFF SET HEADING OFF SET PAGESIZE 0 SPOOL STREET_POINT_THR.BQSV SELECT GEOX||'`'||GEOY||'`'||UNICODE_DESC||'`'||ASCII_DESC FROM GEO.STREET_POINTS; SPOOL OFF ...

SSRS + e-acute (é) + Chart = Rendering issue

I've got an SSRS report that contains a table and a chart. In the table, the name Café shows up with no problem, but in the chart it gets escaped (rendered as Caf&#233;). I'm not having any luck finding a solution to this, but I'm aware that it's probably something easy I'm doing wrong or overlooking. Can anyone provide some insight? ...

Unicode, VBScript and HTML

Hi all, I have the following radio box: <input type="radio" value="&#39321;">&#39321;</input> As you can see, the value is unicode. It represents the following Chinese character: 香 So far so good. I have a VBScript that reads the value of that particular radio button and saves it into a variable. When I display the content with a mes...

Convert &#char(w); to \uxxxx C#

I am working on a Korean document, and the HTML source code contains special symbols starting with &#char(w), e.g. 껰. I would like to convert this symbol to its Unicode representation. Is there a way to do so? ...
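The conversion from a numeric character reference to a \uxxxx escape can be sketched as follows. The sketch is in Python rather than C#, the helper name `to_u_escape` is hypothetical, and the sample reference &#39321; is borrowed from the radio-button question earlier in this listing. Because C# strings are UTF-16, the sketch emits one escape per UTF-16 code unit, so characters outside the Basic Multilingual Plane become a surrogate pair.

```python
import html

def to_u_escape(reference: str) -> str:
    """Decode HTML character references, then write each UTF-16 code unit
    of the result as a \\uxxxx escape."""
    text = html.unescape(reference)      # "&#39321;" -> the character itself
    units = text.encode("utf-16-be")     # two bytes per UTF-16 code unit
    return "".join(
        f"\\u{units[i]:02x}{units[i + 1]:02x}"
        for i in range(0, len(units), 2)
    )

print(to_u_escape("&#39321;"))
```

`html.unescape` also handles hexadecimal references and named entities, so the same helper covers input such as Caf&#233;.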

Convert Korean Text to Unicode

What I want to ask is pretty simple. I have an HTML document hosted in a WebBrowser control. When I select a Korean word using the MSHTML range property, range.htmlText and range.Text both show the Korean word; all I want to do is convert it to Unicode format. Is that possible? F...

System.Text.Encoding isn't

I've tracked a problem I'm having down to the following inexplicable behaviour within the .NET System.Text.Encoding class: byte[] original = new byte[] { 128 }; string encoded = System.Text.Encoding.UTF8.GetString(original); byte[] decoded = System.Text.Encoding.UTF8.GetBytes(encoded); Console.WriteLine(original[0] == decoded[0]); Am ...
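The same round trip can be reproduced in Python, which may help show why the bytes differ. A lone byte 128 (0x80) is a continuation byte and is not valid UTF-8 on its own, so a lenient decoder substitutes U+FFFD, and re-encoding yields that replacement character's three bytes. This is a Python analogue, not .NET code; that .NET's default decoder substitutes in the same way is an assumption here.

```python
original = bytes([128])

# errors="replace" substitutes U+FFFD for the invalid byte
# (assumed to mirror what the .NET decoder does by default).
text = original.decode("utf-8", errors="replace")
round_tripped = text.encode("utf-8")

# A single-byte encoding such as Latin-1 maps every byte to a character,
# so it round-trips arbitrary bytes without loss.
latin1_round_trip = original.decode("latin-1").encode("latin-1")

print(repr(text), round_tripped, latin1_round_trip == original)
```

Arbitrary binary data is generally not safe to push through a UTF-8 decode and encode cycle.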

Python, UnicodeDecodeError

I get this error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 4: ordinal not in range(128) I tried setting many different codecs (in the header, like # -*- coding: utf8 -*-), or even using u"string", but it still appears. How do I fix this? Edit: I don't know the actual character that's causing this, but since...
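An error of this shape can be reproduced with a byte string that has 0xe0 at index 4. The sketch below uses Python 3 syntax, made-up sample bytes, and Latin-1 as a guess at the real encoding; the question does not say what the data actually is, so both are assumptions.

```python
# Hypothetical sample: four ASCII bytes followed by 0xe0 at position 4.
data = b"abcd\xe0"

message = None
try:
    data.decode("ascii")          # ASCII only covers bytes 0..127
except UnicodeDecodeError as exc:
    message = str(exc)

# ASSUMPTION: the bytes are Latin-1. The real encoding must be known or
# detected; decoding with the wrong one silently yields the wrong text.
text = data.decode("latin-1")

print(message)
print(text)
```

The source-file coding header only affects how literals in the .py file are read, not how byte strings arriving at runtime are decoded, which would explain why changing it has no effect.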

Cross platform unicode path handling

I'm using boost::filesystem for cross-platform path manipulation, but this breaks down when calls need to be made down into interfaces I don't control that won't accept UTF-8. For example when using the Windows API, I need to convert to UTF-16, and then call the wide-string version of whatever function I was about to call, and then conve...

UTF8 Beginning of File characters are breaking serializer & readers

Okay, I'm trying to work with UTF8 text files. I'm constantly fighting the BOF chars that the writer drops in for UTF8, which blows up pretty much anything I need to use to read the file including serializers and other text readers. I'm getting a leading six bytes of data: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF (now that I'm looking at it...
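The six bytes listed above are the three-byte UTF-8 byte order mark (0xEF 0xBB 0xBF) twice over. A minimal Python sketch for removing any number of leading marks before handing the data to a parser is shown below; the function name `strip_utf8_boms` and the sample payload are hypothetical, and the sketch does not address why the writer emits the mark twice.

```python
import codecs

def strip_utf8_boms(data: bytes) -> bytes:
    """Remove every leading UTF-8 byte order mark (EF BB BF) from data."""
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical payload with a doubled mark, as in the question.
raw = b"\xef\xbb\xbf\xef\xbb\xbf<root/>"
print(strip_utf8_boms(raw))
```

Working on the raw bytes before decoding keeps the fix independent of whichever reader or serializer consumes the file afterwards.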

How to convert a unichar value to an NSString in Objective-C?

Hi there, I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is in UTF-8 format and translates to the Greek capital letter 'A'. I'm trying to put that character into an NSString variable, but I fail miserably. ...
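The distinction at the heart of this question can be checked in a few lines of Python (a sketch of the arithmetic, not Objective-C). The value 0xce91 is read two ways: as the two UTF-8 bytes CE 91, and as a single UTF-16 code unit, which is what a unichar normally holds. Variable names are for illustration only.

```python
import unicodedata

value = 0xCE91

# Interpretation 1: the two bytes CE 91 are a UTF-8 sequence.
as_utf8 = value.to_bytes(2, "big").decode("utf-8")

# Interpretation 2: the number itself is a UTF-16 code unit / code point.
as_code_unit = chr(value)

print(unicodedata.name(as_utf8))
print(unicodedata.name(as_code_unit))
```

The first reading gives GREEK CAPITAL LETTER ALPHA (U+0391) and the second a Hangul syllable, so an API that expects UTF-16 code units would need 0x0391 instead.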

Python 3: Write newlines to HTML

I have upgraded to Python 3 and can't figure out how to convert backslash escaped newlines to HTML. The browser renders the backslashes literally, so "\n" has no effect on the HTML source. As a result, my source page is all in one long line and impossible to diagnose. I spent hours searching for the solution to no avail. Can anyone help...

Unicode support in Web standard fonts

I need to decide whether to render geometric symbols in a web GUI (e.g. arrows and triangles for buttons, menus, etc.) as Unicode symbols (MUCH easier and color-independent) or GIF/PNG files (lots of hassle I would like to avoid). However, I have seen Windows clients that have trouble displaying even advanced punctuation symbols declare...