Questions about Unicode

How do I enter Unicode characters in Eclipse?

I am running Eclipse on Linux. In other apps such as Firefox, Shift+Ctrl+U followed by 1111 (where 1111 stands for the four hex digits of the Unicode character in question) works, but it fails in Eclipse. I just want to add some special chars to my Java string. ...

linux

eclipse

unicode

Python TypeError: unsupported operand type(s) for %: 'file' and 'unicode'

I'm working on a Django field validation and I can't figure out why I'm getting a TypeError for this section: def clean_tid(self): data = self.cleaned_data['tid'] stdout_handel = os.popen("/var/www/nsmweb/jre1.6.0_14/bin/java -jar /var/www/nsmweb/sla.jar -t %s grep -v DAN") % data result = stdout_handel.read() Do I have ...
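The error comes from where the `%` is applied: in the excerpt, `% data` is applied to the file object returned by `os.popen()`, not to the command string. A minimal sketch of the fix, written in Python 3 syntax (the question is Python 2, where `pipes.quote` plays the role of `shlex.quote`); `build_command` is a hypothetical helper, and the command template is copied verbatim from the excerpt:

```python
import shlex

# Command template copied from the question; %s marks where the tid goes.
CMD_TEMPLATE = ("/var/www/nsmweb/jre1.6.0_14/bin/java -jar "
                "/var/www/nsmweb/sla.jar -t %s grep -v DAN")


def build_command(tid):
    """Format the command string *before* handing it to os.popen().

    In the question, `os.popen(...) % data` applies % to the returned file
    object, which is what raises the TypeError. shlex.quote() is added as a
    precaution against shell metacharacters in user-supplied input.
    """
    return CMD_TEMPLATE % shlex.quote(tid)


# Usage inside clean_tid (not executed here, since it needs the Java tool):
#     stdout_handle = os.popen(build_command(data))
#     result = stdout_handle.read()
```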

ASP.NET - accept multilanguage input and display correctly

I have a basic web form in ASP.NET for recording some basic information. How do I ensure that I can accept information in any language, store it in SQL and then redisplay it correctly on another webpage? At the moment, accents on certain characters are displayed incorrectly. ...

Unicode UTF-8/UTF-16 encoding in Python

In Python: is u'\u3053\n' UTF-16? I'm not really familiar with Unicode and encodings, but this kind of thing comes up in my dataset, e.g. if I have a=u'\u3053\n'. print raises an exception, and so does decoding. a.encode("utf-16") > '\xff\xfeS0\n\x00' a.encode("utf-8") > '\xe3\x81\x93\n' print a.encode("utf-8...
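u'\u3053\n' is neither UTF-16 nor UTF-8: it is a Unicode string holding two code points, and UTF-16 and UTF-8 are two different byte encodings of it. A short sketch in Python 3 (where a plain "..." literal is Unicode, like u'...' in Python 2) reproducing the bytes from the question. Python's "utf-16" codec prepends a byte-order mark in the machine's native order, which is where the leading '\xff\xfe' in the question comes from; the "-le" codec used here omits it:

```python
import unicodedata

a = "\u3053\n"  # two code points: HIRAGANA LETTER KO and a newline

# Encoding turns code points into bytes; the str itself has no encoding.
utf8_bytes = a.encode("utf-8")
utf16_bytes = a.encode("utf-16-le")  # explicit byte order, so no BOM

# Decoding with the matching codec gives the original string back.
assert utf8_bytes.decode("utf-8") == a
assert utf16_bytes.decode("utf-16-le") == a

print(unicodedata.name(a[0]), len(a), utf8_bytes, utf16_bytes)
```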

Inserting Unicode characters with PHP -> ODBC -> MS SQL?

I have the following code: $sql = "update tbl_test set category = N'resumé'"; echo $sql; $rs = odbc_exec($conn, $sql); where $conn is a DSN ODBC connection to an MS SQL Server. The problem seems to be that somewhere between PHP and SQL Server (maybe in ODBC?) Unicode characters are converted to junk. If I copy and paste exactly what the echo says ...

php

unicode

odbc

Is it safe to convert varchar and char into nvarchar and nchar in SQL Server?

We currently have a number of columns in the database of type varchar. The application that uses them is written in C# and uses Linq2Sql for data access. We would like to support Unicode characters, which means we would have to convert the varchar columns to nvarchar. Is this a safe operation? Is it j...

Bypass the need to add the prefix N in string literals

We inherited an application that was not originally designed to consume localized data (localized strings such as Russian/Japanese, localized datetime formats, etc.). The original developers did not anticipate that there would be Unicode strings. Although the table's datatypes support Unicode characters (NVA...

sql

unicode

localization

Convert value into a URL friendly format - Unicode decomposition ähhh..

I need to convert a value "Convert value into a URL friendly format - Unicode decomposition ähhh" into "convert-value-into-a-url-friendly-format-unicode-decomposition-ahhh". Is this possible in SQL Server? All Unicode characters should be handled. I use SQL Server 2005, with 2008 as an option. EDIT: Bogdan had a solution that worked for...

sql-server

unicode
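The question asks for SQL Server, and this is not a T-SQL answer; it is a Python sketch of the Unicode-decomposition technique the title refers to, e.g. for doing the conversion in the application layer. `slugify` is a hypothetical helper name. NFKD splits "ä" into "a" plus a combining diaeresis, and dropping the combining marks leaves the base letter:

```python
import re
import unicodedata


def slugify(value):
    """Build a URL slug via Unicode decomposition (NFKD).

    Characters with no ASCII base letter after decomposition (e.g. "ß",
    CJK) are treated like punctuation and become hyphen separators.
    """
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop combining marks (accents) that NFKD split off the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")


print(slugify("Convert value into a URL friendly format - Unicode decomposition ähhh"))
```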

SharePoint SPUtility.SendEmail() is scrambling unicode characters in subject line

I'm using SharePoint's SPUtility.SendEmail() to send an email with non-ASCII characters in the subject line. The problem is that the Icelandic character 'ð' is scrambled to '?'. This only happens in the subject line; the message body is fine. The problem does not seem to be with the email client, since the problem appears both in Gm...
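The excerpt doesn't identify where the '?' is introduced. One mechanism worth ruling out: mail headers are ASCII-only, so a non-ASCII subject has to travel as an RFC 2047 encoded-word, and any step that instead converts the subject to a charset lacking 'ð' can replace it with '?'. This Python sketch uses the standard-library `email.header` module (not SharePoint's API) to show what a correctly encoded subject looks like; the subject text is made up:

```python
from email.header import Header, decode_header, make_header

subject = "Icelandic ð in the subject"

# Header() produces an RFC 2047 encoded-word, which is pure ASCII on the wire.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# A receiving mail client decodes the encoded-word back to the original text.
decoded = str(make_header(decode_header(encoded)))
assert decoded == subject
```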

Searching a Unicode file using Python

Setup: I'm writing a script to process and annotate build logs from Visual Studio. The build logs are HTML, and from what I can tell, Unicode (UTF-16?) as well. Here's a snippet from one of the files: c:\anonyfolder\anonyfile.c(17169) : warning C4701: potentially uninitialized local variable 'object_adrs2' used c:\anonyfolder...

python

unicode

encoding
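If the logs really are UTF-16 (a byte-order mark at the start of the file would confirm it), opening them with that codec lets ordinary string and regex searches work. A sketch under that assumption; `find_warnings`, the regex and the demo file are hypothetical, and the sample line is taken from the excerpt:

```python
import os
import re
import tempfile

# Matches e.g. "warning C4701:" and captures the warning code.
WARNING_RE = re.compile(r"warning (C\d{4}):")


def find_warnings(path, encoding="utf-16"):
    """Yield (line_number, warning_code) for each compiler warning in a log.

    The "utf-16" codec reads the BOM to pick the byte order and strips it.
    """
    with open(path, encoding=encoding) as log:
        for number, line in enumerate(log, start=1):
            match = WARNING_RE.search(line)
            if match:
                yield number, match.group(1)


# Self-contained demo: write a tiny UTF-16 log, search it, clean up.
sample = ("<html>\n"
          r"c:\anonyfolder\anonyfile.c(17169) : warning C4701: "
          "potentially uninitialized local variable 'object_adrs2' used\n")
with tempfile.NamedTemporaryFile("w", encoding="utf-16", suffix=".htm",
                                 delete=False) as tmp:
    tmp.write(sample)
result = list(find_warnings(tmp.name))
os.remove(tmp.name)
print(result)
```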

How do I use CharNext in the Windows API properly?

I have a multi-byte string containing a mixture of Japanese and Latin characters. I'm trying to copy parts of this string to a separate memory location. Since it's a multi-byte string, some of the characters use one byte and others use two. When copying parts of the string, I must not copy "half" a Japanese character. To be ab...

c++

unicode

multibyte

Entering Unicode data in Visual Studio, C#

Is there a good way to type Unicode symbols in a C# file? I'm looking for something to the effect of: press Alt, type the Unicode hex code, release Alt. Currently, I have to type the symbol into Word and copy-paste it into my source file. ...

visual-studio

unicode

Where did I go wrong with this unicode field in MySQL?

I have a table in my MySQL database with a field that contains strings. The MySQL version is 5.0.51a. The default character set for the table is 'utf8'. Many of the strings have Unicode characters such as \xae and \u2122 (the registered and trademark symbols, respectively). For example, suppose I have a row with a field this v...

mysql

unicode

Linux vs. Windows: How does the console render unicode characters?

This is quite a low-level (low in the sense of "closer to the metal") question. I was wondering if any of you could point me to documentation, explanations, etc. of how, upon receiving a Unicode character (or any character code, but I'm particularly interested in the Unicode Standard) the console in Windows, good ol' cmd.exe (using, say...

UCS-2LE text file parsing

I have a text file that was created using some Microsoft reporting tool. The text file includes the BOM 0xFFFE at the beginning and then ASCII character output with nulls between characters (e.g. "F.i.e.l.d.1."). I can use iconv to convert this to UTF-8 using UCS-2LE as the input format and UTF-8 as the output format... it works great. My pr...

unicode

wstring

ucs2
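The conversion iconv performs here (UCS-2LE in, UTF-8 out) can also be done in-process. A Python sketch, assuming the whole file fits in memory; `ucs2le_to_utf8` is a hypothetical helper. UCS-2 is the BMP-only subset of UTF-16, so Python's UTF-16 codecs decode it, and the plain "utf-16" codec consumes the FF FE byte-order mark and uses it to select little-endian order:

```python
def ucs2le_to_utf8(raw: bytes) -> bytes:
    """Convert UCS-2LE bytes (with or without a leading FF FE BOM) to UTF-8."""
    if raw.startswith(b"\xff\xfe"):
        text = raw.decode("utf-16")      # BOM detected and stripped
    else:
        text = raw.decode("utf-16-le")   # no BOM: assume little-endian
    return text.encode("utf-8")


# "Field1" as described in the question: BOM, then each ASCII byte + NUL.
sample = b"\xff\xfe" + "Field1".encode("utf-16-le")
print(ucs2le_to_utf8(sample))
```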

LaTeX and Unicode: how to write special symbols of other scripts, or import symbols?

I couldn't find the answer to this question on SO. I'll try to explain. I'm writing some text in which I need to change scripts very often. Say I want to write some Unicode character (not in the character script, but more in the transliteration of it, say \'a for á). What is the best way to do this in scripts such as Indic, Chinese, etc...

unicode

latex

Is UTF-8 acceptable for reading/writing Asian languages?

I am accepting user input via a web form (as UTF-8), saving it to a MySQL DB (using UTF-8 character set) and generating a text file later (encoded as UTF-8). I am wondering if there is any chance of text corruption using UTF-8 instead of something like UCS-2? Is UTF-8 good enough in this situation? ...

c#

unicode

utf-8
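UTF-8 can encode every Unicode code point, so Asian text survives a UTF-8 round trip unchanged; it is UCS-2 that has a gap, since it cannot represent characters outside the Basic Multilingual Plane (for example the CJK Extension B ideographs starting at U+20000). A quick Python check; the sample strings are arbitrary:

```python
# Arbitrary samples; the last one (U+20000) lies outside the BMP.
samples = ["日本語", "中文", "한국어", "ภาษาไทย", "\U00020000"]

for text in samples:
    data = text.encode("utf-8")
    assert data.decode("utf-8") == text  # lossless round trip
    print(repr(text), len(text), "code point(s) ->", len(data), "UTF-8 bytes")

# U+20000 needs a surrogate pair (4 bytes) in UTF-16; UCS-2 cannot store it.
print(len("\U00020000".encode("utf-16-le")))
```

The trade-off is size: most CJK characters take 3 bytes in UTF-8 versus 2 in UTF-16. Note also that MySQL's `utf8` character set stores at most 3 bytes per character, so it shares UCS-2's BMP-only limit; `utf8mb4` (MySQL 5.5.3 and later) lifts it.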

How do I convert unicode characters to floats in Python?

I am parsing a webpage which has Unicode representations of fractions. I would like to be able to take those strings directly and convert them to floats. For example: "⅕" would become 0.2 Any suggestions of how to do this in Python? ...
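The Unicode Character Database already records a numeric value for characters like "⅕", and Python's standard `unicodedata.numeric()` exposes it as a float. A minimal sketch; `fraction_to_float` is just a hypothetical wrapper name:

```python
import unicodedata


def fraction_to_float(ch):
    """Return the numeric value the Unicode database assigns to one character."""
    return unicodedata.numeric(ch)  # raises ValueError if ch has no numeric value


for ch in "⅕½¾⅞":
    print(ch, unicodedata.name(ch), fraction_to_float(ch))
```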

How do I calculate the numeric value of a string with unicode components in python?

Along the lines of my previous question, http://stackoverflow.com/questions/1263796/how-do-i-convert-unicode-characters-to-floats-in-python , I would like to find a more elegant solution to calculating the value of a string that contains unicode numeric values. For example, take the strings "1⅕" and "1 ⅕". I would like these to resolve...
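Building on `unicodedata.numeric()`, strings like "1⅕" and "1 ⅕" can be evaluated by reading the decimal digits as the whole part and summing the numeric values of the remaining characters. A sketch; `mixed_number_value` is a hypothetical helper, and it assumes well-formed input (it does not check that digits come before the fraction):

```python
import unicodedata


def mixed_number_value(text):
    """Value of strings such as "1⅕" or "1 ⅕".

    Decimal digits build the whole part; whitespace is skipped; every other
    character must carry a Unicode numeric value (e.g. a vulgar fraction),
    and those values are summed.
    """
    whole = 0
    fraction = 0.0
    for ch in text:
        if ch.isspace():
            continue
        if ch.isdecimal():
            whole = whole * 10 + unicodedata.decimal(ch)
        else:
            # Raises ValueError for characters with no numeric value, e.g. "."
            fraction += unicodedata.numeric(ch)
    return whole + fraction


for s in ["1⅕", "1 ⅕", "12½"]:
    print(repr(s), mixed_number_value(s))
```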

Does str() call the decode() method behind the scenes?

It seems to me that built-in functions __repr__ and __str__ have an important difference in their base definition. >>> t2 = u'\u0131\u015f\u0131k' >>> print t2 ışık >>> t2 Out[0]: u'\u0131\u015f\u0131k' t2.decode raises an error since t2 is a unicode string. >>> enc = 'utf-8' >>> t2.decode(enc) ---------------------------------------...
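In Python 2, str() on a unicode object *encodes* it (with the default ASCII codec); decode() goes the other way, from bytes to text, which is why calling it on a unicode string fails. Python 3 makes the direction explicit: str only has encode(), bytes only has decode(), and the escaped form that Python 2's repr showed for unicode objects is available through ascii(). A short Python 3 sketch using the string from the question:

```python
t2 = "\u0131\u015f\u0131k"  # "ışık"

print(str(t2))    # the text itself
print(ascii(t2))  # escaped form, like Python 2's repr of a unicode object

# Only bytes can be decoded; only str can be encoded.
raw = t2.encode("utf-8")
assert raw.decode("utf-8") == t2
assert not hasattr(t2, "decode")
```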