unicode

Avoid printing unicode replacement character in Java

In Java, why does Character.toString((char) 65533) print out this symbol: � ? I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this? ...
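For context, code point 65533 is U+FFFD, the Unicode replacement character, which decoders typically substitute for bytes that are invalid in the assumed encoding. The minimal sketch below uses Python rather than Java, purely for illustration. The sample bytes are an assumption and are not taken from the asker's program.

```python
# U+FFFD (decimal 65533) is the Unicode replacement character.
REPLACEMENT = chr(65533)

# Hypothetical input: "café" encoded as Latin-1, so the last byte is 0xE9.
latin1_bytes = b"caf\xe9"

# Decoding with the wrong charset substitutes U+FFFD for the invalid byte.
wrong = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the charset the bytes were actually written in keeps the text.
right = latin1_bytes.decode("latin-1")

print(REPLACEMENT in wrong)  # True
print(right)                 # café
```

If this is what happens in the program, the usual remedy is to decode with the charset the data was really written in, rather than filtering the character out afterwards.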

What is internal representation of string in Python 3.x

In Python 3.x, a string consists of items of Unicode ordinal. (See the quotation from the language reference below.) What is the internal representation of Unicode string? Is it UTF-16? The items of a string object are Unicode code units. A Unicode code unit is represented by a string object of one item and can hold either a ...

How to convert a UTF-8 byteOffset into a charOffset for a Java String?

I have a byte offset for a byte array containing a UTF-8 encoded string, how can I transform that into a char offset for the corresponding Java String? NB. this question used to read: I have a byte offset into a standard Java String, and I would like to convert that to a character offset. In practice this will mean a method like charO...
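One possible approach, sketched in Python for illustration rather than in Java: decode only the bytes before the offset, then count the UTF-16 code units in that prefix, since a Java String is indexed in UTF-16 code units. The function name `byte_offset_to_char_offset` is hypothetical, and the sketch assumes the offset falls on a character boundary.

```python
def byte_offset_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Map a byte offset in UTF-8 data to an offset in UTF-16 code units.

    Assumes byte_offset lies on a character boundary; otherwise decoding
    the prefix raises UnicodeDecodeError.
    """
    prefix = data[:byte_offset].decode("utf-8")
    # Each UTF-16 code unit is 2 bytes; code points above U+FFFF take two units.
    return len(prefix.encode("utf-16-le")) // 2


# Hypothetical sample: 'a' is 1 byte, 'é' is 2 bytes, '😀' is 4 bytes in UTF-8.
sample = "aé😀b".encode("utf-8")
print(byte_offset_to_char_offset(sample, 3))  # 2
print(byte_offset_to_char_offset(sample, 7))  # 4
```

Decoding the prefix costs time proportional to the offset. That is acceptable for occasional lookups but not for repeated queries over large inputs.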

How to convert std::wstring to a TCHAR*

std::wstring.c_str() returns a wchar_t*. How do I get from wchar_t* to TCHAR*, or from std::wstring to TCHAR*? Thanks ...

What is the easiest way to convert a char array to a WCHAR array?

In my code, I receive a const char array like the following: const char * myString = someFunction(); Now I want to postprocess it as a wchar array since the functions I use afterwards don't handle narrow strings. What is the easiest way to accomplish this goal? Perhaps MultiByteToWideChar? (However, since it is a narrow string w...

Python csv library with Unicode/UTF-8 support that "just works"

The csv module in Python doesn't work properly when there's UTF-8/Unicode involved. I have found snippets in the Python documentation (http://docs.python.org/library/csv.html) and on other webpages that work for specific cases, but you have to understand well what encoding you are handling and use the appropriate snippet. Is there any universa...

Reading web pages / unicode

Hello, I have this function in Delphi 2009/2010. It returns garbage. If I change the Char and PChar types to AnsiChar and PAnsiChar, it returns the text, but all foreign Unicode text is garbage. It is driving me bananas. I have been trying all kinds of things for 2 days now. I thought I understood this Unicode crap, but I guess I do not. Help please tha...

String causes rendering exception with utf-8 defined

One of my template tags should return a list of links; most of the elements get their name from the database with the exception of one, which I'll hardcode because it will never change. lista_menu = '<ul class="menu">\n\ <li><a href="' + reverse('profileloja', args=(s_loja,)) + '">' + \ loja.nome.title() + '</a></li>\n<li><a href="' + r...

Classic ASP: How to write unicode string data in classic ASP?

How can I show an nvarchar column that stores unicode data (Entered with the zawgyi1 font) in a classic ASP web page? When I retrieve and write the value to the page, it shows "?????". I set my ASP page's content type of UTF-8 with the following meta tag: <META http-equiv="Content-Type" content="text/html; charset=UTF-8"> Unfortunat...

How do I read a unicode url with PHP?

I have a unicode url: \test.php?sText=Московский I would like to use $_GET to work with the value of sText. The code I have for test.php is: <?php $sVar = $_GET['sText']; echo "Variable = $sVar"; ?> The problem is that the above comes back as: Variable = ?????????? What do I need to do? ...

Python: How do I read and parse a unicode utf-8 text file?

I am exporting UTF-8 text from Excel and I want to read and parse the incoming data using Python. I've read all the online info so I've already tried this, for example: txtFile = codecs.open( 'halout.txt', 'r', 'utf-8' ) for line in txtFile: print repr( line ) The error I am getting is: UnicodeDecodeError: 'utf8' codec can't dec...

Convert UTF-8 octets to unicode code points

Hi, I have a set of UTF-8 octets and I need to convert them back to Unicode code points. How can I do this in Python? E.g. the UTF-8 octets ['0xc5','0x81'] should be converted to the code point 0x141. ...
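A minimal sketch of one way to do this in Python 3, assuming the octets arrive as hex strings as in the example above. The function name `utf8_octets_to_code_points` is hypothetical.

```python
def utf8_octets_to_code_points(octets):
    """Decode hex-string octets such as ['0xc5', '0x81'] into code points."""
    # int(o, 16) accepts the '0x' prefix; bytes() packs the values together.
    raw = bytes(int(o, 16) for o in octets)
    # Decoding yields a str; ord() gives each character's code point.
    return [ord(ch) for ch in raw.decode("utf-8")]


code_points = utf8_octets_to_code_points(["0xc5", "0x81"])
print([hex(cp) for cp in code_points])  # ['0x141']
```

Invalid or truncated octet sequences raise UnicodeDecodeError, which is probably the safer default when the input is not trusted.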

unicode document viewer tool?

Is there a tool out there that provides unicode codepoint display? Kind of like a hex editor except the codepoints would be displayed rather than bytes. to clarify: I want to be able to display a document (either a file, or paste what's in the clipboard), and have two views of that document at once, the original text including unic...

C# Unicode (Japanese Characters)

Hello, I have a Japanese final coming up soon, so I made a program to help me study. But I can't seem to get VS2008 to display any Unicode in the Console. This is a sample I used to see if I could display Unicode: string diancai = new string(new char[]{ '\u70B9','\u83DC' }); Console.Write(diancai[0] + " " + d...

ANTLRWorks error compiling grammar: "syntax error: invalid char literal: INVALID"

I wrote a stub for a grammar (it only matches comments so far), and it's giving me the error "syntax error: invalid char literal: <INVALID>". Moreover, I've tracked the error down to the following rule: ... ~LINE_ENDING* ... LINE_ENDING : ( '\n' | '\r' | '\r\n'); Can someone help me fix this? ...

t-sql LIKE and special characters

"[" is not classed a unicode character http://en.wikipedia.org/wiki/List%5Fof%5FUnicode%5Fcharacters (my guess) as to why this wouldn't work: declare @v nvarchar(255) set @v = '[x]825' select 1 where @v like '[x]825' Ta! ...

Java Runtime Exec on Windows Fails with Unicode in Arguments

I want to launch a browser and load a web page using Java's Runtime exec. The exact call looks like this: String[] explorer = {"C:\\Program Files\\Internet Explorer\\IEXPLORE.EXE", "-noframemerging", "C:\\ ... path containing unicode chars ... \\Main.html"}; Runtime.getRuntime().exec(explorer); In my case, the path contains ...

Getting the size in bytes or in chars of a member of a struct or union in C/C++?

Let's say that I want to get the size in bytes or in chars for the name field from: struct record { int id; TCHAR name [50]; }; sizeof(record.name) does not work. ...

Unicode filenames on python 2.6 under Mac OS X

I'm using os.walk to create a list of all music files under a folder. Some of these filenames are non-ascii, for example: 01 空即是色.mp3 I'm using the mutagen library to parse metadata for this file, and it professes complete unicode support. The filename is being retrieved as unicode, and can be printed as unicode. However, no matte...

SQLITE update fails with error code 1 (SQLITE_ERROR)

Hey sqliters, I am having a strange or maybe not so strange problem with my sqlite db. I have a field of "Text" type and it worked like a charm with any English texts for ages. The text in the field used to come from an MFC CEdit. Now I switched to CRichEditCtrl to support formatting and UNICODE texts. The CRichEditCtrl dumps the for...