unicode

How can I make this Python 2.6 function work with Unicode?

I've got this function, which I modified from material in chapter 1 of the online NLTK book. It's been very useful to me but, despite reading the chapter on Unicode, I feel just as lost as before. def openbookreturnvocab(book): fileopen = open(book) rawness = fileopen.read() tokens = nltk.wordpunct_tokenize(rawness) nltk...
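One common approach, sketched here rather than taken from the question, is to decode the file while reading it so that everything downstream receives unicode text. The helper name `read_text` and the UTF-8 default are assumptions for illustration, and the NLTK tokenizing step from the question is left out.

```python
import io


def read_text(path, encoding="utf-8"):
    """Return the file's contents as decoded text (hypothetical helper)."""
    # io.open is available on Python 2.6+ and is the built-in open on Python 3.
    # The encoding is an assumption about how the book file was saved.
    with io.open(path, encoding=encoding) as handle:
        return handle.read()
```

The returned text could then be handed to the tokenizer in place of the raw byte string.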

What is the range of Unicode Printable Characters?

Can anybody please tell me the range of printable Unicode (UTF-8) characters? [e.g. the ASCII printable character range is \u0020 - \u007e] ...
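Printable characters in Unicode do not form one contiguous range, so a per-character check is usually more practical than a range test. The sketch below uses Python's `unicodedata` module to look at each character's general category. Treating the "C" categories (control, format, surrogate, private use, unassigned) and the line and paragraph separators as non-printable is an assumption made for this example, not an official definition, and `is_printable` is a hypothetical name.

```python
import unicodedata


def is_printable(ch):
    """Rough printability test based on the Unicode general category."""
    category = unicodedata.category(ch)
    # Assumption: "C*" categories and line/paragraph separators do not print.
    return not (category.startswith("C") or category in ("Zl", "Zp"))
```

Under this definition a space and a CJK ideograph count as printable, while DEL (U+007F) and the zero-width space (U+200B) do not.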

Force display text from Unicode in input field (from ajax)

Hello all, we are making an Ajax call to retrieve content from the database. Since our clients may use different languages, we encode everything as Unicode to store in the database (this saves worrying about collations and such). Now when we fetch such content to display in an input text field, it displays the Unicode codes. Checked the HTML 4 ...

Unicode characters are being saved incorrectly.

I have a MySQL database with Unicode text strings in it. My JSF application (running on Tomcat 6) can read these Unicode strings out and display them correctly in the browser. All the HTML charsets are set to UTF-8. When I save my object, even having made no changes, Hibernate persists it back to the database. If I look directly in the ...

Converting a string to LPCWSTR for CreateFile() to address a serial port

I seem to be having a bit of a TEXT / UNICODE problem when using the Windows CreateFile function for addressing a serial port. Can someone please help point out my error? I'm writing a Win32 console application in VC++ using VS 2008. I can create a handle to address the serial port like this: #include <iostream> #include <windows...

What happens when starting a .NET console application?

What exactly happens when a .NET console application starts? In the process explorer, when starting the exe I am wondering why I cannot see a "cmd.exe" process as a parent process for the console application. What exactly is displayed then? Is there a way to replace the "default" console window by another one? I guess this would mean m...

How to discover what codepage to use when converting RTF hex literals to Unicode

I'm parsing RTF 1.5+ files generated by Word 2003+ that may have content from other languages. This content is usually encoded as hex literals (\'xx). I would like to convert these literals to unicode values. I know my document's code page by looking for ansicpg (\ansi\ansicpg1252). When I use the ansicpg codepage to decode to Unicode,...

Python os.walk and japanese filename crash.

Possible Duplicate: Python, Unicode, and the Windows console I have a folder with a filename "01 - ナナナン塊.txt" I open Python at the interactive prompt in the same folder as the file and attempt to walk the folder hierarchy: Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copy...
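If the crash is a UnicodeEncodeError raised while printing the name to a console whose code page lacks those characters (an assumption, since the traceback is cut off above), one workaround is to escape whatever the target encoding cannot represent before printing. `console_safe` is a hypothetical helper, and the `ascii` default stands in for the real console encoding.

```python
def console_safe(text, encoding="ascii"):
    """Return text with unencodable characters replaced by backslash escapes."""
    # "backslashreplace" turns characters the encoding cannot represent
    # into \xNN or \uNNNN escapes instead of raising UnicodeEncodeError.
    return text.encode(encoding, "backslashreplace").decode(encoding)
```

Inside the walk loop this would be used as `print(console_safe(name))`, which trades readability of the Japanese characters for output that always prints.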

Why Does SetString Take Less Memory in Delphi (with Unicode)?

This is Delphi 2009, so Unicode applies. I had some code that was loading strings from a buffer into a StringList as follows: var Buffer: TBytes; RecStart, RecEnd: PChar; S: string; FileStream.Read(Buffer[0], Size); repeat ... find next record RecStart and RecEnd that point into the buffer; ...

how to generate Chinese Characters using Postscript?

Hi All, Does anyone know how to generate Chinese characters using Postscript or related tools? I'd like to use Unicode to represent Chinese characters but it seems that Postscript doesn't support Unicode, yet. In addition, I'd like to specify several fonts to generate the same character. Thus, I have two questions: 1. how to use unic...

Rails 2.x and Unicode

I've read several Stack Overflow questions re. this and haven't been able to find an answer. I'm running Rails 2.3.3 and am having an issue properly displaying Unicode characters from MySQL in my app and even in the Rails console. Connecting to MySQL via my ssh console and querying works fine. However, as soon as I retrieve a record co...

Ruby on Rails. Unicode routes

Hi. Is it possible to set a Unicode string as a segment of a path in Rails? I try the following: # app/controllers/magazines_controller.rb class MagazinesController < ApplicationController def index end end ...

How to do a Python split() on languages (like Chinese) that don't use whitespace as word separator?

I want to split a sentence into a list of words. For English and European languages this is easy, just use split() >>> "This is a sentence.".split() ['This', 'is', 'a', 'sentence.'] But I also need to deal with sentences in languages such as Chinese that don't use whitespace as a word separator. >>> u"这是一个句子".split() [u'\u8fd9\u662f\u...

[SOLVED] DjangoUnicodeDecodeError and force_unicode

I have a simple Django model for a news entry: class NewsEntry(models.Model): pub_date = models.DateTimeField('date published') title = models.CharField(max_length = 200) summary = models.TextField() content = models.TextField() def __unicode__(self): return self.title Adding new news (in Admin page) with English text wo...

Finding Unicode character name with Javascript

I need to find out the names of Unicode characters when the user enters the number for one. An example would be to enter 0041 and get "Latin Capital Letter A" as the result. Thanks ...

Working with files and utf8 in PHP

This is driving me crazy. Let's say I have a file called foo.txt encoded in utf8: aoeu qjkx ñpyf And I want to get an array that contains all the lines in that file (one line per index) that have the letters aoeuñpyf, and only the lines with these letters. I wrote the following code (also encoded as utf8): $allowed_letters=array("...

Python: any way to perform this "hybrid" split() on multi-lingual (e.g. Chinese & English) strings?

I have multilingual strings that consist of both languages that use whitespace as a word separator (English, French, etc.) and languages that don't (Chinese, Japanese, Korean). Given such a string, I want to separate the English/French/etc. part into words using whitespace as the separator, and to separate the Chinese/Japanese/Korean part ...

Encoding problem downloading HTML using mechanize and Python 2.6

browser = mechanize.Browser() page = browser.open(url) html = page.get_data() print html It shows some strange characters. I suppose that it is a UTF-8 string, but Python doesn't know that and cannot show it properly. How can I convert this string to a unicode string like u = u'test' ...
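A minimal sketch of the conversion being asked about, assuming the page really is UTF-8 (the Content-Type header or a meta tag in the page would be the place to check). The byte string below is a stand-in for what `page.get_data()` returns, so no network access or mechanize install is needed to try it.

```python
# Stand-in for the raw bytes returned by page.get_data()
html = u"caf\u00e9".encode("utf-8")

# Decode bytes into unicode text; raises UnicodeDecodeError if the
# data is not valid UTF-8, which would mean the assumption was wrong.
text = html.decode("utf-8")
```

If the strange characters persist after decoding, the page may be in a different encoding, or the terminal may be unable to display the result.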

node.js Nerve framework unicode response

Code: var nerve = require("./nerve"); var sitemap = [ ["/", function(req, res) { res.respond("Русский"); }] ]; nerve.create(sitemap).listen(8100); Shown in the browser: CAA:89 How can I make it display correctly? ...

Python unicode popen or Popen error reading unicode

Hello, I have a program that generates the following output: ┌───────────────────────┐ │10 day weather forecast│ └───────────────────────┘ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ Tonight Sep 27 Clear 54 0 % Tue Sep 28 Sunny 85/61 0 % Wed...