unicode

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Everything goes: one-liners in your favorite scripting language, command-line tools or other utilities for OS, web sites, etc. Best solutions so far: On Linux/UNIX/OS X/cy...
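One possible scripting one-liner approach, sketched in Python: decode the bytes with the source encoding, then re-encode with the target. The helper names `transcode` and `convert_file` are illustrative, not taken from the question, and the sketch assumes every character in the input exists in the target character set (otherwise it raises `UnicodeEncodeError`).

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes as `src` and re-encode them as `dst`.

    Raises UnicodeDecodeError / UnicodeEncodeError on bytes or characters
    that the source or target encoding cannot represent.
    """
    return data.decode(src).encode(dst)


def convert_file(src_path: str, dst_path: str, src: str, dst: str) -> None:
    """Convert a text file line by line; newline="" keeps line endings as-is."""
    with open(src_path, encoding=src, newline="") as fin, \
         open(dst_path, "w", encoding=dst, newline="") as fout:
        for line in fin:
            fout.write(line)


utf8_bytes = "café €5".encode("utf-8")
latin9_bytes = transcode(utf8_bytes, "utf-8", "iso-8859-15")
round_trip = transcode(latin9_bytes, "iso-8859-15", "utf-8")
```

The euro sign is a useful test character here because ISO-8859-15 has it and ISO-8859-1 does not.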

How to replace a character programmatically in Oracle 8.x series

Due to repetitive errors with one of our Java applications: Engine engine_0: Error in application action. org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13) was found in the element content of the document. I need to "fix" some Unicode character in an Oracle database, ideally in a programmatic fashion. Once identi...
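A minimal sketch of the filtering step, written in Python rather than inside the database: it removes every code point that falls outside the XML 1.0 `Char` production, which covers the 0x13 control character from the error message. The function name `strip_invalid_xml_chars` is an assumption for illustration, and how the cleaned values get written back to Oracle is left open.

```python
import re

# Anything NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return `text` with characters that XML 1.0 forbids replaced."""
    return _INVALID_XML_CHARS.sub(replacement, text)


cleaned = strip_invalid_xml_chars("ok\x13 value")
```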

Problem with unicode String literal in unit test

I have a JUnit test that tests adding Strings to a Dictionary custom type. Everything works fine for everyone else on a Linux/Windows machine, however, being the first dev in my shop on a mac, this unit test fails for me. The offending lines are where unicode string literals are used: dict.add( "Su字/会意pin", "Su字/会意pin" ); dict...

How to remove these kind of symbols (junk) from string?

Imagine I have a String in C#: "I Don’t see ya.." I want to remove these "’" symbols (replace them with nothing, etc.). How do I do this? ...
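Symbols like these are often what a UTF-8 right single quote looks like after being decoded as Windows-1252. Assuming that is what happened here (the excerpt does not confirm it), a Python sketch can reverse the mistake instead of deleting the characters. The name `repair_mojibake` is hypothetical, and the function returns the input unchanged when the round trip fails.

```python
def repair_mojibake(text: str) -> str:
    """Undo 'UTF-8 bytes read as Windows-1252', if the text fits that pattern."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage; leave the text alone.
        return text


# The garbled sequence is U+00E2, U+20AC, U+2122 (written as escapes for safety).
garbled = "I Don\u00e2\u20ac\u2122t see ya.."
repaired = repair_mojibake(garbled)
```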

Difference between VARCHAR2(11 BYTE) and VARCHAR2(11 CHAR)

In Oracle, what is the difference between : CREATE TABLE CLIENT ( NAME VARCHAR2(11 BYTE), ID_CLIENT NUMBER ) and CREATE TABLE CLIENT ( NAME VARCHAR2(11 CHAR), -- or even VARCHAR2(11) ID_CLIENT NUMBER ) Thank you. ...

Migrating an Existing Application to accept Unicode.

We are in the process of upgrading our application to full Unicode compatibility, as we have recently got Delphi 2009, which provides this out of the box. I am looking for anyone who has experience of upgrading an application to accept Unicode characters. Specifically, answers to any of the following questions. We need to change VarChar...

Unicode debug visualizer in Visual Studio 2008

Is there a unicode debug visualizer in Visual Studio 2008? I have an XML file that I'm pretty sure is in unicode. When I open it in WordPad, it shows the Japanese characters correctly. When I read the file into a string using File.ReadAllText (UTF8), all the Japanese characters show up as blocks in the string visualizer. If I use the xml ...

Non-unicode XML representation

I have xml where some of the element values are unicode characters. Is it possible to represent this in an ANSI encoding? E.g. <?xml version="1.0" encoding="utf-8"?> <xml> <value>受</value> </xml> to <?xml version="1.0" encoding="Windows-1252"?> <xml> <value>&#27544;</value> </xml> I deserialize the XML and then attempt to serializ...

What impact (if any) does Delphi 2009's switch to Unicode(/UTF16) have on executable size and memory footprint?

Here's one from the "No question's too dumb" department: Well, as the subject says: Is there an impact? If so, how much? Will all the string literals I have in my code and in my DFM resources now take up twice as much space inside the compiled binaries? What about runtime memory usage of compiled applications? Will all the string variab...

Will everything in the standard library treat strings as unicode in Python 3.0?

I'm a little confused about how the standard library will behave now that Python (from 3.0) is unicode-based. Will modules such as CGI and urllib use unicode strings or will they use the new 'bytes' type and just provide encoded data? ...

Smallest Unicode encodings for different languages?

What are the typical average bytes-per-character rates for different unicode encodings in different languages? E.g. if I wanted the smallest number of bytes to encode some English text, then on average UTF-8 would be 1 byte per character and UTF-16 would be 2, so I'd pick UTF-8. If I wanted some Korean text, then UTF-16 might average ab...
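A rough way to measure this on your own text, sketched in Python. The helper `bytes_per_char` and the two sample strings are illustrative only; real averages depend on the corpus. "Character" here means a Unicode code point, and `utf-16-le` is used so that no byte-order mark is counted.

```python
def bytes_per_char(text: str, encoding: str) -> float:
    """Average encoded bytes per code point for `text` in `encoding`."""
    return len(text.encode(encoding)) / len(text)


samples = {
    "English": "hello world",
    "Korean": "안녕하세요",
}

rates = {
    (language, encoding): bytes_per_char(text, encoding)
    for language, text in samples.items()
    for encoding in ("utf-8", "utf-16-le")
}

for (language, encoding), rate in rates.items():
    print(f"{language:8} {encoding:10} {rate:.2f} bytes/char")
```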

How to determine whether a character is a letter in Java?

How do you check if a one-character String is a letter - including any letters with accents? I had to work this out recently, so I'll answer it myself, after the recent VB6 question reminded me. ...
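For comparison, the same check sketched in Python (the question itself is about Java): a character counts as a letter when its Unicode general category starts with `L`. The name `is_letter` is an assumption, and the NFC normalization step is added so that a base letter followed by a combining accent is treated as one character where a precomposed form exists.

```python
import unicodedata


def is_letter(s: str) -> bool:
    """True if `s` is exactly one letter (any script, accents included)."""
    s = unicodedata.normalize("NFC", s)  # e.g. "e" + U+0301 becomes "é"
    return len(s) == 1 and unicodedata.category(s).startswith("L")
```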

How to render unicode characters in the correct font? (C#/WinForms)

My application correctly handles different kinds of character sets, but only internally - when it comes to displaying text in standard WinForms labels and textboxes, it seems to have problems with Chinese characters. The problem seems to be the font used (Tahoma), because when I copy&paste the text, or view it in the debugger, it is disp...

Is there a way to programmatically determine if a font file has a specific Unicode Glyph?

I'm working on a project that generates PDFs that can contain fairly complex math and science formulas. The text is rendered in Times New Roman, which has pretty good Unicode coverage, but not complete. We have a system in place to swap in a more unicode complete font for code points that don't have a glyph in TNR (like most of the "str...

What was the first version of MS Office to officially support Unicode?

I am doing some research on Unicode for a white-paper I am writing. Does anyone remember the first version of MS Office on the Windows platform that was fully Unicode compliant? Not having much luck Googling this answer out of the net. ...

Why does unicode() use str() on my object only when no encoding is given?

I start by creating a string variable with some non-ascii utf-8 encoded data on it: >>> text = 'á' >>> text '\xc3\xa1' >>> text.decode('utf-8') u'\xe1' Using unicode() on it raises errors... >>> unicode(text) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0...

Toad unicode input problem

In toad, I can see unicode characters that are coming from oracle db. But when I click one of the fields in the data grid into the edit mode, the unicode characters are converted to meaningless symbols, but this is not the big issue. While editing this field, the unicode characters are displayed correctly as I type. But as soon as I pre...

What is the best unicode library for C?

What is the best unicode library for C? Where "best" is defined by cross-platform support, compiler independence, and reasonable performance across the most common languages in use. ...

Convert a UTF-8 string to/from 7-bit XML in PHP

How can UTF-8 strings (i.e. 8-bit string) be converted to/from XML-compatible 7-bit strings (i.e. printable ASCII with numeric entities)? i.e. an encode() function such that: encode("“£”") -> "&#8220;&#163;&#8221;" decode() would also be useful: decode("&#8220;&#163;&#8221;") -> "“£”" PHP's htmlentities()/html_entity_decode() pair d...
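To make the intended behaviour concrete, here is a sketch of the `encode()`/`decode()` pair in Python rather than PHP. It relies on the standard `xmlcharrefreplace` error handler and on `html.unescape`; note that `encode` leaves `&`, `<` and `>` untouched, and that `html.unescape` also resolves named entities, which may or may not be wanted.

```python
import html


def encode(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def decode(text: str) -> str:
    """Turn numeric (and named) character references back into characters."""
    return html.unescape(text)


entities = encode("“£”")
restored = decode(entities)
```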

How do I truncate a java string to fit in a given number of bytes, once UTF-8 encoded?

How do I truncate a java String so that I know it will fit in a given number of bytes storage once it is UTF-8 encoded? ...
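The underlying technique, sketched in Python (a Java version would follow the same idea): encode, cut the byte array at the limit, then decode while discarding a partial multi-byte sequence at the end. The name `truncate_utf8` is illustrative. The sketch never splits a code point, but it can still separate a base character from a following combining mark.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Longest prefix of `text` whose UTF-8 encoding fits in `max_bytes`."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input was valid UTF-8, so the only possible error after slicing is
    # an incomplete sequence at the very end, which errors="ignore" drops.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```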