utf-16

Reading UTF-16 (or UTF-8) values from XML and displaying result with PHP

Hi, I'm having a lot of trouble with Unicode (UTF-16) values and PHP/XML. I want to read a set of Unicode values from XML and output the correct glyphs to the browser. I've tried with UTF-8 and I get the same problem. This is a simple working example I used for my first test: $text = "\x00\x41"; $text = mb_convert_encoding($text, "AS...
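The two bytes in the first test, 0x00 0x41, are the big-endian UTF-16 encoding of U+0041 ("A"). Below is a minimal sketch of the decoding step. It is written in Python rather than PHP and assumes the XML stores each value as four hex digits, which the excerpt does not confirm. The helper name `utf16be_hex_to_text` is hypothetical.

```python
def utf16be_hex_to_text(hex_values: str) -> str:
    """Turn hex UTF-16BE code units such as "00410042" into text.

    Hypothetical helper: assumes four hex digits per code unit and no BOM.
    """
    raw = bytes.fromhex(hex_values)   # "0041" -> b"\x00\x41"
    return raw.decode("utf-16-be")    # big-endian byte order, no BOM expected


text = utf16be_hex_to_text("004100E9")
print(text)                   # Aé
# A page served as UTF-8 would send these bytes to the browser:
print(text.encode("utf-8"))   # b'A\xc3\xa9'
```

Sending the UTF-8 bytes together with a matching charset (for example `Content-Type: text/html; charset=UTF-8`) is what lets the browser pick the right glyphs.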

Generating xml utf-16 sample from xsd

We use Visual Studio 2008 to generate a sample XML from an XSD. The XML that is generated is UTF-8, but we need UTF-16. Is there any way to do this? ...
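If Visual Studio itself cannot be told to emit UTF-16, one workaround is to re-encode the generated file afterwards. Here is a sketch with Python's standard-library ElementTree. The sample document is made up and stands in for the generated one. Serialising with `encoding="utf-16"` writes a BOM, an XML declaration naming utf-16, and then the document.

```python
import xml.etree.ElementTree as ET

# Made-up document standing in for the sample generated from the XSD.
utf8_xml = '<?xml version="1.0" encoding="utf-8"?><order id="1"><item>caf\u00e9</item></order>'.encode("utf-8")

root = ET.fromstring(utf8_xml)              # the parser honours the declared encoding
utf16_xml = ET.tostring(root, encoding="utf-16")

# The first two bytes are the BOM; its byte order follows the platform.
print(utf16_xml[:2])
print(utf16_xml.decode("utf-16"))           # the declaration now names utf-16
```

Writing `utf16_xml` to disk in binary mode keeps the bytes exactly as produced.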

utf-8 to/from utf-16 problem

I based these two conversion functions on an answer on StackOverflow, but converting back and forth doesn't work: std::wstring MultiByteToWideString(const char* szSrc) { unsigned int iSizeOfStr = MultiByteToWideChar(CP_ACP, 0, szSrc, -1, NULL, 0); wchar_t* wszTgt = new wchar_t[iSizeOfStr]; if(!wszTgt) assert(0); Mult...
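The excerpt's MultiByteToWideChar call passes CP_ACP, which is the system ANSI code page. If the narrow strings are meant to be UTF-8, that is a likely suspect, since the Windows identifier for UTF-8 is CP_UTF8. The Python sketch below illustrates the principle and is not a fix of the C++ code. A UTF-8 to UTF-16 round trip is lossless, while passing through a single-byte code page is not. Windows-1252 is assumed here as the ANSI code page.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    return data.decode("utf-16-le").encode("utf-8")


original = "na\u00efve \u65e5\u672c".encode("utf-8")   # "naïve 日本"
round_trip = utf16le_to_utf8(utf8_to_utf16le(original))
print(round_trip == original)   # True: both are complete Unicode encodings

# Going through a single-byte code page (Windows-1252 assumed) is lossy:
lossy = original.decode("utf-8").encode("cp1252", errors="replace")
print(lossy)                    # b'na\xefve ??'
```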

Java File parsing toolkit design, quick file encoding sanity check

(Disclaimer: I looked at a number of posts on here before asking and found this one particularly helpful; I was just looking for a bit of a sanity check from you folks, if possible.) Hi all, I have an internal Java product that I have built for processing data files for loading into a database (AKA an ETL tool). I have pre-rolled stages ...

Can I include characters such as "ã" and "ê" in UTF-8 encoded XML, or must it be UTF-16 encoded?

Can I include characters such as "ã" and "ê" in UTF-8 encoded XML, or must it be UTF-16 encoded? ...
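UTF-8 can encode every Unicode character, so "ã" and "ê" do not require UTF-16. The only requirement is that the bytes actually match the encoding named in the XML declaration. Here is a small check with Python's standard-library parser, using a made-up sample document:

```python
import xml.etree.ElementTree as ET

doc = '<?xml version="1.0" encoding="UTF-8"?><word>n\u00e3o \u00eaxito</word>'
data = doc.encode("utf-8")  # ã -> C3 A3, ê -> C3 AA: two bytes each in UTF-8
root = ET.fromstring(data)  # the parser decodes according to the declaration
print(root.text)            # não êxito
```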

What is Unicode, UTF-8, UTF-16?

What's the basis for Unicode and why the need for UTF-8 or UTF-16? I have researched this on Google and searched here as well but it's not clear to me. In VSS when doing a file comparison, sometimes there is a message saying the two files have differing UTF's. Why would this be the case? Please explain in simple terms. ...
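In short, Unicode assigns every character a number, called a code point. UTF-8 and UTF-16 are two different rules for turning those numbers into bytes. The same text therefore produces different bytes in each encoding, which is likely what the VSS comparison is reporting. A small Python illustration (the helper name is hypothetical):

```python
def encodings_of(ch: str):
    """Return (code point, UTF-8 bytes, UTF-16BE bytes) as printable strings."""
    return (f"U+{ord(ch):04X}",
            ch.encode("utf-8").hex(" "),
            ch.encode("utf-16-be").hex(" "))


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(" | ".join(encodings_of(ch)))
# U+0041 | 41 | 00 41
# U+00E9 | c3 a9 | 00 e9
# U+20AC | e2 82 ac | 20 ac
# U+1F600 | f0 9f 98 80 | d8 3d de 00
```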

Create a file in Java for loading into an nvarchar field in SQLServer 2005 using BCP and UTF-16

Hi All, I want to use BCP to load into a SQL Server 2005 table with an nvarchar field using a loader control file. As I understand it, SQL Server 2005 only supports UTF-16 (and I believe it is UTF-16 LE). The file is being output by a Java program. The way I have it currently set up is as follows: An XML format BCP loader file (cre...

Python UTF-16 WAVY DASH encoding question / issue

Hi. I was doing some work today, and came across an issue where something "looked funny". I had been interpreting some string data as utf-8, and checking the encoded form. The data was coming from ldap (Specifically, Active Directory) via python-ldap. No surprises there. So I came upon the byte sequence '\xe3\x80\xb0' a few times, which...
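Decoded as UTF-8, the bytes E3 80 B0 are the single code point U+3030, WAVY DASH. One hypothesis worth checking, which the excerpt does not confirm, concerns the UTF-16BE form of U+3030. Its bytes are 30 30, which is ASCII "00". Text that was really the digits "00" may therefore have been decoded as UTF-16BE somewhere upstream. A Python check:

```python
import unicodedata

raw = b"\xe3\x80\xb0"                 # the byte sequence seen in the data
ch = raw.decode("utf-8")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))   # U+3030 WAVY DASH

# The same code point as UTF-16BE is two bytes that are ASCII "0".
print(ch.encode("utf-16-be"))                      # b'00'
# Replaying the suspected mistake: ASCII "00" read as UTF-16BE.
print(b"00".decode("utf-16-be") == ch)             # True
```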

JavaScript - Convert string to UTF-16

I am working with JavaScript for one of the first times, and it's for a SHA-1 hash. I have found code to do this, but one of its dependencies is a method to convert the string to UTF-8, whereas the server I am comparing against uses UTF-16. I have looked around, and all my results keep showing up with UTF-8. Can anybody at least point me ...
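A hash is computed over bytes, not characters, so the client must produce exactly the bytes the server hashes. The sketch below assumes the server hashes UTF-16 little-endian with no byte order mark. That is an assumption; UTF-16BE or a leading BOM would give yet another digest. It is written in Python, and `sha1_hex` is a hypothetical helper:

```python
import hashlib


def sha1_hex(text: str, encoding: str) -> str:
    """SHA-1 of the text's bytes in the given encoding (hypothetical helper)."""
    return hashlib.sha1(text.encode(encoding)).hexdigest()


s = "password"
print(s.encode("utf-16-le"))      # b'p\x00a\x00s\x00s\x00w\x00o\x00r\x00d\x00'
print(sha1_hex(s, "utf-8"))
print(sha1_hex(s, "utf-16-le"))   # differs: the input bytes differ
# Python's plain "utf-16" codec prepends a BOM, which changes the digest again.
```

In JavaScript, strings are already sequences of UTF-16 code units. `charCodeAt(i)` returns each unit, and writing it out low byte first gives the little-endian form.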

Is it possible to reliably auto-decode user files to Unicode? [C#]

I have a web application that allows users to upload their content for processing. The processing engine expects UTF8 (and I'm composing XML from multiple users' files), so I need to ensure that I can properly decode the uploaded files. Since I'd be surprised if any of my users knew their files even were encoded, I have very little hop...
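Without metadata, decoding is only reliable when the file starts with a byte order mark; everything else is a guess. Below is a Python sketch of a common heuristic: check the BOM first, then try strict UTF-8, then fall back to a legacy code page. The Windows-1252 fallback and the name `decode_upload` are assumptions.

```python
import codecs

# Longer BOMs first: the UTF-32LE BOM begins with the UTF-16LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_upload(data: bytes) -> str:
    """Heuristic decoder for uploaded files (hypothetical helper)."""
    # 1. A byte order mark identifies the encoding unambiguously.
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # 2. Bytes that decode as strict UTF-8 are very likely UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 3. Assumed legacy fallback; bytes undefined in cp1252 become U+FFFD.
    return data.decode("cp1252", errors="replace")
```

BOM-less UTF-16 text often passes step 2 as well, because ASCII letters interleaved with NUL bytes are valid UTF-8. A production version may want an extra check for NUL bytes.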

Open mails in Outlook from Java using the protocol "mapi://"

I'm developing a Java application using Windows Desktop Search, from which I can retrieve some information about files on my computer, such as URLs (System.ItemUrl). An example of such a URL is file://c:/users/ausername/documents/aninterestingfile.txt for "normal" files. This field also gives URLs of mail items indexed from Outlook or Thunderb...

UTF-16BE to ISO-8859-1 in PHP

Hi, I need to convert UTF-16BE to ISO-8859-1 in PHP (I'm not an expert in encoding, so I don't know if UTF-16 and UTF-16BE are the same thing). I've read somewhere to use the mb_convert_encoding function, but I don't have that function because the multibyte extension isn't installed. So do you know an alternative method to do this? ...
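UTF-16BE is UTF-16 with the byte order fixed as big-endian and no BOM. For this purpose the two differ only in how the byte order is known. ISO-8859-1 maps the byte values 0x00 to 0xFF directly onto U+0000 to U+00FF, so the conversion needs only byte arithmetic. That ports to PHP's `ord()`/`chr()` without the multibyte extension. The Python sketch below uses hypothetical function names and turns characters above U+00FF into "?":

```python
def utf16be_to_latin1(data: bytes) -> bytes:
    """Reference version using the codec machinery (hypothetical helper)."""
    text = data.decode("utf-16-be")
    # Characters with no ISO-8859-1 byte (e.g. U+20AC, the euro sign) become "?".
    return text.encode("iso-8859-1", errors="replace")


def utf16be_to_latin1_manual(data: bytes) -> bytes:
    """Same idea with byte arithmetic only, easy to port to ord()/chr()."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        unit = (data[i] << 8) | data[i + 1]   # big-endian: high byte first
        # U+0000..U+00FF map 1:1 onto ISO-8859-1; anything else becomes "?".
        # (A surrogate pair yields two "?" here, but only one via the codec.)
        out.append(unit if unit <= 0xFF else ord("?"))
    return bytes(out)


sample = "caf\u00e9 \u20ac5".encode("utf-16-be")
print(utf16be_to_latin1(sample))         # b'caf\xe9 ?5'
print(utf16be_to_latin1_manual(sample))  # b'caf\xe9 ?5'
```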

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

I'm working on an English-only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS std::string". It was very helpful, but I still don't quite understand how to apply all of that infor...
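On Windows, wchar_t is 16 bits, and the Win32 "W" functions treat the contents of a std::wstring as UTF-16 code units. As a result, `size()` counts code units, not characters, and anything outside the Basic Multilingual Plane takes two units. The Python illustration below shows the difference; `utf16_code_units` is a hypothetical helper:

```python
def utf16_code_units(text: str) -> int:
    """Length the text would have as a UTF-16 std::wstring on Windows."""
    return len(text.encode("utf-16-le")) // 2


for s in ["hello", "\u00fcber", "\U0001F600"]:
    print(repr(s), "characters:", len(s), "UTF-16 units:", utf16_code_units(s))
```

For an English-only program, every character stays in the BMP, so the two counts agree. The distinction only matters once text such as emoji shows up.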

PHP utf encoding problem

How can I encode strings in UTF-16BE format in PHP? For "Demo Message!!!" the encoded string should be '00440065006D006F0020004D00650073007300610067006'. Also, I need to encode Arabic characters in this format. ...
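Hex of this kind writes each character's UTF-16 code unit as four hex digits, high byte first. Arabic letters work the same way, since they lie in the range U+0600 to U+06FF. Here is a Python sketch of the computation (the helper name is hypothetical):

```python
def to_utf16be_hex(text: str) -> str:
    """UTF-16BE bytes of the text as uppercase hex, no BOM."""
    return text.encode("utf-16-be").hex().upper()


print(to_utf16be_hex("Demo Message!!!"))
# 00440065006D006F0020004D006500730073006100670065002100210021
print(to_utf16be_hex("\u0645\u0631\u062d\u0628\u0627"))   # Arabic "مرحبا"
# 06450631062D06280627
```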

How to define a string literal containing non-ASCII characters?

I'm programming in VB.NET using Visual Studio 2008. I need to define a string literal containing the character "÷" equivalent to Chr(247). I understand that internally VS uses UTF-16 encoding, but when the source file is written to disk it contains the single byte value F7 for this character. This source file is processed by another pro...
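A single F7 byte on disk means the source file was saved in a single-byte code page such as Windows-1252. In UTF-8 the character would be stored as C3 B7, and in little-endian UTF-16 as F7 00. The Python check below writes the character as an escape, which keeps the source file's own bytes independent of the editor's save encoding. In VB.NET, building the character with ChrW(247) plays a similar role.

```python
ch = "\u00f7"   # DIVISION SIGN; the escape keeps this source file pure ASCII

print(ch.encode("cp1252"))     # b'\xf7'      one byte, as found on disk
print(ch.encode("utf-8"))      # b'\xc3\xb7'  what a UTF-8 file would contain
print(ch.encode("utf-16-le"))  # b'\xf7\x00'  one 16-bit code unit, 0x00F7
```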

Java, JavaCC: How to parse characters outside the BMP?

Hello, everyone! I am referring to the XML 1.1 spec. Look at the definition of NameStartChar: NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xE...
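Java `char` values are UTF-16 code units, so a character at or above U+10000 reaches a char-based lexer as a surrogate pair. The spec's final NameStartChar range, [#x10000-#xEFFFF], works out to a high surrogate between D800 and DB7F followed by any low surrogate between DC00 and DFFF. A grammar can match that pair of chars, which is one common approach. The Python sketch below shows the arithmetic; the helper name is hypothetical.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


first_high, first_low = to_surrogate_pair(0x10000)
last_high, last_low = to_surrogate_pair(0xEFFFF)
print(f"{first_high:04X} {first_low:04X}")  # D800 DC00
print(f"{last_high:04X} {last_low:04X}")    # DB7F DFFF
```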

How can I decode UTF-16 data in Perl when I don't know the byte order?

If I open a file ( and specify an encoding directly ) : open(my $file,"<:encoding(UTF-16)","some.file") || die "error $!\n"; while(<$file>) { print "$_\n"; } close($file); I can read the file contents nicely. However, if I do: use Encode; open(my $file,"some.file") || die "error $!\n"; while(<$file>) { print decode("UTF-16",...
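A likely reason the second snippet fails: without an :encoding layer, `<$file>` splits the raw bytes at each 0x0A. In UTF-16, however, a newline is two bytes (0A 00 or 00 0A). The lines are therefore cut mid-character, and only the first chunk carries a BOM. The Python sketch below shows the safer order of operations: decode the whole buffer with the byte order taken from the BOM, then split into lines. Treating BOM-less input as little-endian is an assumption, and the helper name is hypothetical.

```python
import codecs
from typing import List


def decode_utf16_lines(data: bytes) -> List[str]:
    """Decode UTF-16 of unknown byte order, then split into lines."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        text = data.decode("utf-16")     # the BOM selects the byte order and is dropped
    else:
        text = data.decode("utf-16-le")  # assumption: no BOM means little-endian
    return text.splitlines()            # split only after decoding


big = codecs.BOM_UTF16_BE + "one\ntwo".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "one\ntwo".encode("utf-16-le")
print(decode_utf16_lines(big), decode_utf16_lines(little))
```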

Python: UTF16 decoding adds a new blank line on Windows boxes

I'm running into an issue with extra newlines on windows versus *nix platforms. file = open('UTF16file.xml', 'rb') html = file.read().decode('utf-16') file.close() regexp = re.compile(self.originalurl, re.S) (html, changes) = regexp.subn(self.newurl, html) file = open('UTF16file-regexed.xml', 'w+') file.write(html.encode('utf-16')) f...
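A likely cause is that the excerpt opens the output file with 'w+', which is a text mode. On Windows, text mode inserts a 0D byte before every 0A byte on write. That corrupts UTF-16 data, and the decoded text already contains \r\n from the original file. In the original Python 2 code, opening with 'wb' avoids the translation. In Python 3, the equivalent is to let the file object do the encoding and switch off newline translation with `newline=""`. A sketch follows; `rewrite_utf16` is a hypothetical helper.

```python
import os
import tempfile


def rewrite_utf16(path_in: str, path_out: str, transform) -> None:
    """Read a UTF-16 file, transform the text, write it back as UTF-16."""
    # newline="" disables translation on read: "\r\n" stays "\r\n".
    with open(path_in, "r", encoding="utf-16", newline="") as f:
        text = f.read()  # the codec consumes the BOM
    # newline="" also disables translation on write, so no extra "\r" appears.
    with open(path_out, "w", encoding="utf-16", newline="") as f:
        f.write(transform(text))  # the codec writes a fresh BOM


# Usage: round-trip a small file with Windows line endings in a temp directory.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "UTF16file.xml")
dst = os.path.join(tmp, "UTF16file-regexed.xml")
with open(src, "wb") as f:
    f.write("<a>\r\nold\r\n</a>".encode("utf-16"))
rewrite_utf16(src, dst, lambda t: t.replace("old", "new"))
with open(dst, "rb") as f:
    result = f.read().decode("utf-16")
print(repr(result))  # '<a>\r\nnew\r\n</a>'
```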

Is there any reason to prefer UTF-16 over UTF-8?

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, in Java and C#, strings and chars appear to default to UTF-16. I was thinking that it might be for historical reasons, or perhaps for performance reasons, but couldn't find any information. Does anyone know why these languages...
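Part of the trade-off is size. UTF-8 uses one byte per ASCII character but three for most CJK characters. UTF-16 uses two bytes for anything in the Basic Multilingual Plane and four for everything else. Here is a quick Python comparison; the sample strings are chosen only for illustration:

```python
samples = {
    "ASCII": "hello world",
    "CJK": "\u65e5\u672c\u8a9e",      # 日本語, three BMP characters
    "emoji": "\U0001F600",           # one character outside the BMP
}
# Byte counts per encoding; "utf-16-le" is used so no BOM is counted.
sizes = {name: (len(s.encode("utf-8")), len(s.encode("utf-16-le")))
         for name, s in samples.items()}
for name, (utf8, utf16) in sizes.items():
    print(f"{name:6} utf8={utf8} utf16={utf16}")
# ASCII  utf8=11 utf16=22
# CJK    utf8=9 utf16=6
# emoji  utf8=4 utf16=4
```

Both encodings are variable-length once supplementary characters appear, so neither gives fixed-width indexing by character.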

Character Encoding

My text editor allows me to code in several different character encodings: ANSI, UTF-8, UTF-8 (no BOM), UTF-16LE, and UTF-16BE. What is the difference between them? What is commonly regarded as the best format (I'm using Python, if that makes a difference)? ...
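The options differ only in the bytes written for the same text. The Python sketch below reproduces each one for "café". It makes two assumptions that the question does not state: that "ANSI" means Windows-1252, the usual Western-European Windows code page, and that the editor writes a BOM for its UTF-16 options.

```python
import codecs

text = "caf\u00e9"
options = {
    "ANSI (assumed cp1252)": text.encode("cp1252"),
    "UTF-8 (with BOM)": text.encode("utf-8-sig"),
    "UTF-8 (no BOM)": text.encode("utf-8"),
    "UTF-16LE (BOM assumed)": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-16BE (BOM assumed)": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
}
for label, data in options.items():
    print(f"{label:24} {data.hex(' ')}")
```

For Python 3 source files, UTF-8 is the default source encoding (PEP 3120), so UTF-8 without a BOM needs no encoding declaration.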