What is the fastest, easiest tool or method to convert text files between character sets?
Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.
Anything goes: one-liners in your favorite scripting language, command-line tools or other utilities for OS, web sites, etc.
Best solutions so far:
On Linux/UNIX/OS X/cy...
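As one possible illustration of the conversion asked about above, here is a minimal Python sketch. The helper name `recode` is hypothetical; it assumes the input really is valid in the source encoding and that every character exists in the target character set (otherwise Python raises a `UnicodeError`).

```python
def recode(data: bytes, from_enc: str = "utf-8", to_enc: str = "iso-8859-15") -> bytes:
    """Re-encode raw bytes from one character set to another.

    Hypothetical helper for illustration: decodes with from_enc, then
    encodes with to_enc. Unmappable characters raise UnicodeEncodeError.
    """
    return data.decode(from_enc).encode(to_enc)


# UTF-8 -> ISO-8859-15: the euro sign becomes the single byte 0xA4.
latin9 = recode("10 €".encode("utf-8"))

# ISO-8859-15 -> UTF-8: the reverse direction.
utf8 = recode(latin9, from_enc="iso-8859-15", to_enc="utf-8")
```

For whole files, the same idea applies: read the file's bytes, pass them through the helper, and write the result to a new file.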
Due to repetitive errors with one of our Java applications:
Engine engine_0: Error in application action.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13)
was found in the element content of the document.
I need to "fix" some Unicode character in an Oracle database, ideally in a programmatic fashion. Once identi...
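The character reported in the error (0x13) is a control character that XML 1.0 does not permit. As a rough sketch of a programmatic clean-up, shown in Python rather than in Java or Oracle SQL, the following removes every character outside the XML 1.0 `Char` production. The function name `strip_invalid_xml_chars` is an assumption made for this example, and whether deleting the characters (rather than replacing them) is acceptable depends on the data.

```python
import re

# Matches any character NOT allowed by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that an XML 1.0 parser would reject (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)
```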
I have a JUnit test that tests adding Strings to a Dictionary custom type. Everything works fine for everyone else on a Linux/Windows machine. However, as the first dev in my shop on a Mac, I find that this unit test fails for me. The offending lines are where Unicode string literals are used:
dict.add( "Su字/会意pin", "Su字/会意pin" );
dict...
Imagine I have a string in C#: "I Don’t see ya.."
I want to remove these "’" symbols (replace them with nothing, for example).
How do I do this?
...
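The sequence "’" is what the UTF-8 bytes of a right single quotation mark (U+2019) look like when they are decoded as Windows-1252. Assuming that is what happened to this string, a sketch of undoing the damage instead of deleting the symbols could look like this. It is written in Python for illustration, and `repair_mojibake` is a hypothetical name.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as Windows-1252.

    Assumption: the whole string went through exactly one wrong decode.
    If the round trip fails, the input is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = "I Don’t see ya.."
repaired = repair_mojibake(broken)       # the apostrophe is restored
removed = broken.replace("’", "")      # or simply drop the symbols
```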
In Oracle, what is the difference between:
CREATE TABLE CLIENT
(
NAME VARCHAR2(11 BYTE),
ID_CLIENT NUMBER
)
and
CREATE TABLE CLIENT
(
NAME VARCHAR2(11 CHAR), -- or even VARCHAR2(11)
ID_CLIENT NUMBER
)
Thank you.
...
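The distinction between the two declarations is byte length versus character length, which only differ when the database character set is multi-byte. The following Python sketch illustrates this, assuming a UTF-8 database character set such as AL32UTF8. The helper `fits` is hypothetical and only mimics the length check; it is not Oracle's implementation.

```python
def fits(value: str, limit: int, semantics: str, encoding: str = "utf-8") -> bool:
    """Mimic a VARCHAR2(limit BYTE|CHAR) length check (illustrative only)."""
    if semantics == "BYTE":
        # BYTE semantics: the limit applies to the encoded size.
        return len(value.encode(encoding)) <= limit
    if semantics == "CHAR":
        # CHAR semantics: the limit applies to the number of characters.
        return len(value) <= limit
    raise ValueError("semantics must be 'BYTE' or 'CHAR'")


name = "José Muñoz"          # 10 characters, 12 bytes in UTF-8
fits(name, 11, "CHAR")       # True: 10 characters
fits(name, 11, "BYTE")       # False: 12 bytes
```

A plain `VARCHAR2(11)` uses whichever semantics the session's `NLS_LENGTH_SEMANTICS` setting selects.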
We are in the process of upgrading our application to full Unicode compatibility, as we have recently got Delphi 2009, which provides this out of the box. I am looking for anyone who has experience of upgrading an application to accept Unicode characters, and specifically for answers to any of the following questions.
We need to change VarChar...
Is there a Unicode debug visualizer in Visual Studio 2008? I have an XML file that I'm pretty sure is in Unicode. When I open it in WordPad, it shows the Japanese characters correctly. When I read the file into a string using File.ReadAllText (UTF8), all the Japanese characters show up as blocks in the string visualizer. If I use the xml ...
I have XML where some of the element values are Unicode characters. Is it possible to represent this in an ANSI encoding?
E.g.
<?xml version="1.0" encoding="utf-8"?>
<xml>
<value>受</value>
</xml>
to
<?xml version="1.0" encoding="Windows-1252"?>
<xml>
<value>&#27544;</value>
</xml>
I deserialize the XML and then attempt to serializ...
Here's one from the "No question's too dumb" department:
Well, as the subject says: Is there an impact? If so, how much? Will all the string literals I have in my code and in my DFM resources now take up twice as much space inside the compiled binaries? What about runtime memory usage of compiled applications? Will all the string variab...
I'm a little confused about how the standard library will behave now that Python (from 3.0) is Unicode-based. Will modules such as CGI and urllib use Unicode strings, or will they use the new 'bytes' type and just provide encoded data?
...
What are the typical average bytes-per-character rates for different Unicode encodings in different languages?
E.g. if I wanted the smallest number of bytes to encode some English text, then on average UTF-8 would be 1 byte per character and UTF-16 would be 2, so I'd pick UTF-8.
If I wanted some Korean text, then UTF-16 might average ab...
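One way to get concrete numbers is to measure them on representative samples. The sketch below is only an illustration: the sample strings are made up for this example, `bytes_per_char` is a hypothetical helper, and "utf-16-le" is used so that the byte order mark is not counted.

```python
def bytes_per_char(text: str, encoding: str) -> float:
    """Average encoded size per character for a sample text."""
    return len(text.encode(encoding)) / len(text)


samples = {
    "English": "hello",
    "Korean": "안녕하세요",
}

for language, sample in samples.items():
    for encoding in ("utf-8", "utf-16-le"):
        print(language, encoding, bytes_per_char(sample, encoding))
```

On these two tiny samples the English text averages 1 byte per character in UTF-8 and 2 in UTF-16, while the Korean text averages 3 in UTF-8 and 2 in UTF-16. Real text with spaces, digits, and punctuation will shift the averages.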
How do you check if a one-character String is a letter - including any letters with accents?
I had to work this out recently, so I'll answer it myself, after the recent VB6 question reminded me.
...
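For comparison, the same check can be expressed with Unicode general categories. This is a Python sketch, not the self-answer referred to above; `is_letter` is a hypothetical name, and the NFC normalization step is an assumption so that a base letter plus a combining accent is treated as one accented letter.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if ch is a single letter, including accented letters."""
    # Compose sequences such as "e" + U+0301 into one code point where possible.
    ch = unicodedata.normalize("NFC", ch)
    # All letter categories (Lu, Ll, Lt, Lm, Lo) start with "L".
    return len(ch) == 1 and unicodedata.category(ch).startswith("L")
```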
My application correctly handles different kinds of character sets, but only internally. When it comes to displaying text in standard WinForms labels and textboxes, it seems to have problems with Chinese characters.
The problem seems to be the font used (Tahoma), because when I copy&paste the text, or view it in the debugger, it is disp...
I'm working on a project that generates PDFs that can contain fairly complex math and science formulas. The text is rendered in Times New Roman, which has pretty good Unicode coverage, but not complete. We have a system in place to swap in a more Unicode-complete font for code points that don't have a glyph in TNR (like most of the "str...
I am doing some research on Unicode for a white-paper I am writing. Does anyone remember the first version of MS Office on the Windows platform that was fully Unicode compliant? Not having much luck Googling this answer out of the net.
...
I start by creating a string variable with some non-ASCII, UTF-8 encoded data in it:
>>> text = 'á'
>>> text
'\xc3\xa1'
>>> text.decode('utf-8')
u'\xe1'
Using unicode() on it raises errors...
>>> unicode(text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0...
In Toad, I can see Unicode characters that are coming from an Oracle DB. But when I click one of the fields in the data grid to enter edit mode, the Unicode characters are converted to meaningless symbols. This is not the big issue, though.
While editing this field, the unicode characters are displayed correctly as I type. But as soon as I pre...
What is the best Unicode library for C, where "best" is defined by cross-platform support, compiler independence, and reasonable performance across the most common languages in use?
...
How can UTF-8 strings (i.e. 8-bit string) be converted to/from XML-compatible 7-bit strings (i.e. printable ASCII with numeric entities)?
i.e. an encode() function such that:
encode("“£”") -> "&#8220;&#163;&#8221;"
decode() would also be useful:
decode("&#8220;&#163;&#8221;") -> "“£”"
PHP's htmlentities()/html_entity_decode() pair d...
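For illustration, here is how such a pair might look in Python using only the standard library. The names `encode` and `decode` mirror the question. Note two assumptions: this `encode` leaves ASCII markup characters such as `&` and `<` untouched, and this `decode` also expands named HTML entities, which may or may not be wanted.

```python
import html


def encode(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def decode(text: str) -> str:
    """Turn numeric (and named HTML) entities back into characters."""
    return html.unescape(text)
```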
How do I truncate a java String so that I know it will fit in a given number of bytes storage once it is UTF-8 encoded?
...
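The core idea is to cut the encoded bytes at the limit and then drop any multi-byte sequence that the cut left incomplete. A Python sketch of that idea follows (the question is about Java, so this is only an illustration; `truncate_utf8` is a hypothetical name). It guarantees the byte limit but does not try to keep combining marks together with their base characters.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing may split a multi-byte character; errors="ignore" discards
    # the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```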