unicode

What is the better error handling method for decoding Python bytes to unicode strings?

Hi, I have an old C# program that is being ported to Python 3 for various reasons. Basically, the program fetches a website and searches its content (and processes it, but that is not really relevant). I have never really had any issues with the actual fetch-and-search routine, but once I ported it to Python it started compla...

Return unicode string from python via ajax

Hi! I have a small webapp that runs Python on the server side and JavaScript (jQuery) on the client side. Upon a certain request, my Python script returns a unicode string and the client is supposed to put that string inside a div in the browser. However, I get a unicode encode error from Python. If I run the script from the shell (b...

Really fast C++ html parser

Hello to all, I'm writing an HTML text feature extractor in C++. The program needs to be REALLY fast: I need to extract these features in milliseconds per HTML page, memory usage needs to be good, and Unicode support would be nice. I know how difficult it is to have all of these things, but I want a parser that at least comes close. ...

How can I use ToUnicode without breaking dead key support?

A similar question has already been asked, so I'm not going to waste time re-explaining it; an existing discussion can be found here: http://stackoverflow.com/questions/1964614/toascii-tounicode-in-a-keyboard-hook-destroys-dead-keys The reason I'm posting a new question, however, is that I seem to have come across a 'solution', but I'm no...

Why do Unicode characters show up properly in database, but as ? when printed in Java via Hibernate?

I'm writing a webapp, and interfacing with MySQL using Hibernate 3.5. Using "デスクトップ ინგლისური" as my test string, I can input the string and see that it is properly persisted into the database. However, when I later pull the value out of the database and print to the console as a String, I see "?????? ?????????". If I use new OutputS...

MySQL treats ÅÄÖ as AAO?!

These two queries give me the exact same result: select * from topics where name='Harligt'; select * from topics where name='Härligt'; How is this possible? It seems like MySQL translates åäö to aao when it searches. Is there some way to turn this off? I use UTF-8 encoding everywhere as far as I know. The same problem occurs both from t...

Flex TextField won't accept "ü" and other "German" characters

I'm having problems with Flex (3.5) auto converting "ü" into a "u". As soon as I paste the character in, it transforms. Is there something I need to turn on to enable these other character sets? I thought Flex supported UTF-8? Thanks! ...

How to send parameters with same encoding from javascript?

I have a JavaScript file that lots of people have embedded in their pages. Since I am hosting the file, I have control over that JavaScript file; I cannot control the way it is embedded because lots of people are using it already. This JavaScript file sends GET requests to my servlets, and the parameters passed with the request are recor...

Validation of user input or ���������

We're letting users search a database from a single text input and I'm having difficulties in filtering some user-supplied strings. For example, if the user submits: ��������� lcd SONY (Note the ?'s) I need to cancel the search. I include the base64-encoded version of the above string wrapped up so that it's easy to run: print(base64_d...

Reading and writing Greek to an Oracle database

I do not have a Unicode database. I should be able to read and write Greek to Oracle using nvarchar or nclobs. I got this to work with Oracle's SQL Developer by adding one line: in the sqldeveloper\sqldeveloper\bin directory, open the file sqldeveloper.conf and add another AddVMOption line below the other such lines: AddVMOption -Doracle...

Emacs Lisp: how to set encoding for call-process

I thought I knew how to set coding-system (or encoding): use process-coding-system-alist. Apparently, it's not working. ;; -*- coding: utf-8 -*- (require 'cl) (let ((process-coding-system-alist '(("cygwin/bin/bash" . (utf-8-dos . utf-8-unix))))) (setq my-words (list "Lilo" "ಠ_ಠ" "_ಠ" "ಠ_" "ಠ" "Stitch") my-cygwin-bash "C:/cygwin/b...

Convert Unicode char to closest (most similar) char in ASCII (.NET)

How do I convert different Unicode characters to their closest ASCII equivalents? Like Ä -> A. I googled but didn't find any suitable solution. The trick Encoding.ASCII.GetBytes("Ä")[0] didn't work. (Result was ?). I found that there is a class Encoder that has a Fallback property that is exactly for cases when char can't be convert...

How do I convert from unicode to single byte in C#?

How do I convert from unicode to single byte in C#? This does not work: int level =1; string argument; // and then argument is assigned if (argument[2] == Convert.ToChar(level)) { // does not work } And this: char test1 = argument[2]; char test2 = Convert.ToChar(level); produces funky results. test1 can be: 49 '1' while test2...

Why Read In UTF-16LE File Won't Convert "\r\n" Into "\n" In Windows

I am using Perl to read UTF-16LE files in Windows 7. If I read in an ASCII file with the following code, then each "\r\n" in the file is converted into a "\n" in memory: open CUR_FILE, "<", $asciiFile; If I read in a UTF-16LE (Windows 1200) file with following code, this inconsistency causes problems when I try to regexp lines with li...

Beautiful Soup Unicode encode error

I am trying the following code with a particular HTML file from BeautifulSoup import BeautifulSoup import re import codecs import sys f = open('test1.html') html = f.read() soup = BeautifulSoup(html) body = soup.body.contents para = soup.findAll('p') print str(para).encode('utf-8') I get the following error: UnicodeEncodeError: 'asci...

How do I best remove the unicode characters that XHTML regards as non-valid using php?

I run a forum designed to support an international mathematics group. I've recently switched it to unicode for better support of international characters. In debugging this conversion, I've discovered that not all unicode characters are considered as valid XHTML (the relevant website appears to be http://www.w3.org/TR/unicode-xml/). O...

Reading Greek text from JDBC / SQL Server 2005 and displaying it with a servlet

Well, the subject says it all but I will explain a little further. I have a database in MS SQL Server 2005 that contains Greek text. I have created a servlet that connects to that database using net.sourceforge.jtds.jdbc.Driver and receives some data with the following commands: Connection con = DriverManager.getConnection(connectionUr...

How to enable reading non-ascii characters in Servlets

How do I make the servlet accept non-ASCII (Arabic, Chinese, etc.) characters passed from JSPs? I've tried adding the following to the top of the JSPs: <%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%> And adding the following in each post/get method in the servlet: request.setCharacterEncoding("UTF-8"); res...

Parsing unicode XML with Python SAX on App Engine

I'm using xml.sax with unicode strings of XML as input, originally entered in a web form. On my local machine (Python 2.5, using the default xmlreader expat, running through App Engine), it works fine. However, the exact same code and input strings on production App Engine servers fail with "not well-formed". For example, it happ...

How do I output Unicode characters as a pair of ASCII characters?

How do I convert (as an example): Señor Coconut Y Su Conjunto - Introducciõn to: SeÃ±or Coconut Y Su Conjunto - IntroducciÃµn I've got an app that creates m3u playlists, but when the track filename, artist or title contains non-ASCII characters it doesn't get read properly by the music player so the track doesn't get played. ...