I'm going to ask what is probably quite a controversial question: "Should one of the most
popular encodings, UTF-16, be considered harmful?"
Why do I ask this question?
How many programmers are aware that UTF-16 is actually a variable-length encoding? By this I mean that there are code points that, represented as surrogate ...
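The variable-length point can be checked directly. The following is a minimal Python 3 sketch, not taken from the question; the helper name `utf16_code_units` is made up for illustration. It counts how many 16-bit code units UTF-16 needs for a character inside the Basic Multilingual Plane and for one outside it.

```python
def utf16_code_units(text):
    """Return how many 16-bit code units UTF-16 needs for `text`."""
    # "utf-16-le" emits no BOM, so every 2 bytes is exactly one code unit.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral_char = "\U00010346"  # U+10346 GOTHIC LETTER FAIHU, outside the BMP

print(utf16_code_units(bmp_char))     # 1
print(utf16_code_units(astral_char))  # 2 (a surrogate pair)
```

Code that assumes one code unit per character works for the first case and silently miscounts the second.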
With this code:
test.py
import sys
import codecs
sys.stdout = codecs.getwriter('utf-16')(sys.stdout)
print "test1"
print "test2"
Then I run it as:
test.py > test.txt
In Python 2.6 on Windows 2000, I'm finding that the newline characters are being output as the byte sequence \x0D\x0A\x00, which of course is wrong for UTF-16.
Am I...
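One likely explanation (an assumption, since the excerpt is cut off) is that stdout is in text mode on Windows, so the byte 0x0A is expanded to 0x0D 0x0A after the text has already been encoded. A minimal Python 3 sketch of the usual way around this is to do the newline translation before encoding, on top of a binary stream. Here `io.BytesIO` stands in for a binary-mode file or `sys.stdout.buffer`.

```python
import io

# BytesIO stands in for a binary stream (assumption: in real use this would
# be a file opened with "wb" or sys.stdout.buffer).
raw = io.BytesIO()

# newline="\r\n" makes the wrapper turn "\n" into "\r\n" as characters,
# and only then encode them, so each becomes a full 16-bit code unit.
writer = io.TextIOWrapper(raw, encoding="utf-16", newline="\r\n")
writer.write("test1\n")
writer.write("test2\n")
writer.flush()

data = raw.getvalue()
print(data.decode("utf-16") == "test1\r\ntest2\r\n")  # True
```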
Consider the following string. It's encoded in UTF-16-LE and saved into a PHP variable. I failed to get either mbstring or iconv to replace the ' with single quote. What would be a good way to sanitize it?
String : Carl Sagan's Cosmic Connection
...
I have a textfile encoded in UTF-16. Each line contains a number of columns separated by tabs. For those who care, the file is a playlist TXT export from iTunes. Column #27 contains a filename.
I am reading it using Perl 5.8.8 in Linux using code similar to:
binmode STDIN, ":encoding(UTF-16)";
while(<>)
{
    chomp;
    my @cols = s...
I'm trying to update a series of XML files by changing names that they reference. I have a table of names that have changed, with a column for the current name and a column for the name to replace it with.
I looked for ways to script search and replace and found sed. It seemed like a good choice until I ran my first attempt. On inspecting the fi...
I have a file watcher that is grabbing content from a growing file encoded with utf-16LE. The first bit of data written to it has the BOM available -- I was using this to identify the encoding against UTF-8 (which MOST of my files coming in are encoded in). I catch the BOM and re-encode to UTF-8 so my parser doesn't freak out. The proble...
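The excerpt stops before the problem is stated. Assuming it is that later chunks of the growing file no longer start with a BOM (and may split a code unit in half), one approach is an incremental decoder that reads the BOM once and keeps its state between chunks. This is a Python sketch under that assumption; the function name `make_utf16_to_utf8_transcoder` is hypothetical.

```python
import codecs


def make_utf16_to_utf8_transcoder():
    """Return a function that converts successive UTF-16 chunks to UTF-8."""
    # The "utf-16" incremental decoder reads the BOM from the first bytes,
    # remembers the byte order, and buffers incomplete code units.
    decoder = codecs.getincrementaldecoder("utf-16")()

    def feed(chunk):
        return decoder.decode(chunk).encode("utf-8")

    return feed


feed = make_utf16_to_utf8_transcoder()
first = codecs.BOM_UTF16_LE + "line 1\n".encode("utf-16-le")
later = "line 2\n".encode("utf-16-le")  # no BOM on later chunks
print(feed(first) + feed(later))  # b'line 1\nline 2\n'
```

One decoder is needed per watched file, since the state belongs to a single stream.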
Please see here for a related question.
However, char goes to 0xffff (or 65535). I need to write 0xd800df46 (or 66374), Gothic letter Faihu, so casting that int to char will not work. I do the conversion ok, that is, I get the correct integer, meaning I calculate the surrogate pairs ok, but I don't know how to "render" it, convert it t...
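For reference, the surrogate-pair arithmetic the question describes can be written out as below. This is a Python sketch of the standard UTF-16 calculation, not the asker's code, and `to_surrogate_pair` is an illustrative name. In Java itself, `Character.toChars(int)` returns the corresponding `char[]`.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10346)  # GOTHIC LETTER FAIHU, decimal 66374
print(hex(high), hex(low))  # 0xd800 0xdf46
```

The two values are two separate 16-bit code units, so they go into the string as two consecutive chars rather than one cast from a 32-bit integer.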
Could anyone give me concise definitions of
Unicode
UTF7
UTF8
UTF16
UTF32
Codepages
How they differ from Ascii/Ansi/Windows 1252
I'm not after Wikipedia links or incredible detail, just some brief information on how and why the huge variations in Unicode have come about and why you should care as a programmer.
...
Hi,
Having ignored it all this time, I am currently forcing myself to learn more about Unicode in Java. There is an exercise I need to do about converting a UTF-16 string to 8-bit ASCII. Can someone please enlighten me how to do this in Java? I understand that you can't represent all possible Unicode values in ASCII, so in this case...
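The general shape of such a conversion is decode first, then re-encode with an explicit policy for characters the target cannot represent. The sketch below is in Python rather than Java and is illustrative only; the sample string and the choice of "?" as the replacement are assumptions.

```python
# Sample input: UTF-16 bytes (with BOM) containing two non-ASCII characters.
utf16_bytes = "caf\u00e9 \u4e2d".encode("utf-16")

# Step 1: decode the UTF-16 bytes into a string of code points.
text = utf16_bytes.decode("utf-16")

# Step 2: encode as ASCII; errors="replace" substitutes "?" for anything
# outside the 7-bit range instead of raising UnicodeEncodeError.
ascii_bytes = text.encode("ascii", errors="replace")

print(ascii_bytes)  # b'caf? ?'
```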
I have an ActionResult that returns XML for an embedded device. The relevant code is:
return Content(someString, "text/xml", Encoding.UTF8);
Even though UTF-8 is specified, the resulting XML is:
<?xml version="1.0" encoding="utf-16"?>
The ASP.NET MVC is compiled as AnyCPU and runs on a Windows 2008 server.
Why is it not returni...
I need to get the ASCII character for every character in a string. Actually, it's every character in a (small) file. The following first 3 lines successfully pull all of a file's contents into a string (per this recipe):
set fp [open "store_order_create_ddl.sql" r]
set data [read $fp]
close $fp
I believe I am correctly discerning the ASC...
Is it possible to know if a file has Unicode (16-bit per char) or 8-bit ASCII content?
...
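A byte-order mark, when present, is the cheapest signal. The sketch below is Python and only covers the BOM case; the function name `sniff_bom` is made up for illustration. A file without a BOM cannot be classified this way and would need a heuristic.

```python
import codecs

# UTF-32 is checked before UTF-16 because BOM_UTF32_LE begins with the
# same two bytes as BOM_UTF16_LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data):
    """Return a codec name based on a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom("hello".encode("utf-16")))  # utf-16
print(sniff_bom(b"hello"))                  # None
```

Reading just the first four bytes of the file is enough input for this check.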
The value of the parameters 'NLS_CHARACTERSET' and 'NLS_NCHAR_CHARACTERSET' is UTF-8 for the source database from which I am reading data, and AL32UTF8 and UTF-8 for the target database to which I am writing data. I am reading data from a text file that has English, European and Asian characters. I am using the UTF-16 code page to read from the source flat fi...
I'm using a variant on code seen in "How to make XMLDOMDocument include the XML Declaration?" (which can also be seen at MSDN). If I change the encoding to "UTF-16" one would think it would output as UTF-16... and it "does"... by looking at the output in a text editor; but checking it in a hex editor, the byte-order mark is missing (despi...
Following up on my previous question concerning the Windows 7 taskbar, I would like to diagnose why Windows isn't acknowledging that my application is independent of javaw.exe. I presently have the following JNA code to obtain the AppUserModelID:
public class AppIdTest {
    public static void main(String[] args) {
        NativeLibrar...
How do you find valid locale names?
I am currently using Mac OS X.
But information about other platforms would also be useful.
#include <fstream>
#include <iostream>
int main(int argc,char* argv[])
{
    try
    {
        std::wifstream data;
        data.imbue(std::locale("en_US.UTF-16"));
        data.open("Plop");
    }
    catch...
Extending from this question about locales
And described in this question: What I really wanted to do was install a codecvt facet into the locale that understands UTF-16 files.
I could write my own. But I am not a UTF expert and as such I am sure I would get it nearly correct; but it would break at the most inconvenient time. So I was ...
I am about to start working on something that requires reading bytes and creating strings. The bytes being read represent UTF-16 strings. So just to test things out I wanted to convert a simple byte array in UTF-16 encoding to a string. The first 2 bytes in the array must represent the endianness and so must be either 0xff 0xfe or 0xfe...
In my current implementation of a UISearchBarController I'm using [NSString compare:] inside the filterContentForSearchText:scope: delegate method to return relevant objects based on their name property to the results UITableView as you start typing.
So far this works great in English and Korean, but what I'd like to be able to do is se...
In the last section of the code I print what the Reader gives me. But it's just bogus. Where did I go wrong?
public static void read_impl(File file, String targetFile) {
    // Create zipfile input stream
    FileInputStream stream = new FileInputStream(file);
    ZipInputStream zipFile = new ZipInputStream(new BufferedInputStream(stream...