unicode

findstr or grep that autodetects character encoding (UTF-16)

I want to do this: findstr /s /c:some-symbol * or the grep equivalent grep -R some-symbol * but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte order mark FFFE in them, so I'm not even looking for heroic autodetection. Any suggestions? Thanks,...
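
One possible approach, sketched in Python 3 rather than as a findstr or grep option: read each file as bytes, pick a codec from the byte order mark, then search the decoded text. The names `sniff_encoding` and `search_tree` are hypothetical helpers made up for this illustration, and the UTF-8 fallback for files without a BOM is an assumption.

```python
import codecs
from pathlib import Path

# BOMs are checked longest-first, so a UTF-32 LE file (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw, default="utf-8"):
    """Return a codec name based on the BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default  # assumption: BOM-less files are treated as UTF-8


def search_tree(root, needle):
    """Yield (path, line number, line) for every line containing `needle`."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM.
        text = raw.decode(sniff_encoding(raw), errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line
```

Calling `list(search_tree(".", "some-symbol"))` would walk the current directory. Binary files are decoded with replacement characters instead of being skipped, which a real tool would probably want to handle differently.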

Representing u1F000 as a Java string

I've got a bunch of Unicode characters from U+1F000 and upwards, and I'm wondering how to represent them in Java. A Java Unicode escape is of the form "\uXXXX" and the Java language specification says that "Representing supplementary characters requires two consecutive Unicode escapes". How does that apply to U+1F000? String mahjongTile =...
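
The "two consecutive Unicode escapes" are the UTF-16 surrogate pair for the code point. As a rough sketch of the arithmetic (written in Python 3 only to show the calculation; `surrogate_pair` and `java_escape` are hypothetical helper names, not part of any library):

```python
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def java_escape(code_point: int) -> str:
    """Format the surrogate pair as two Java-style \\uXXXX escapes."""
    high, low = surrogate_pair(code_point)
    return "\\u{:04X}\\u{:04X}".format(high, low)


print(java_escape(0x1F000))  # prints \uD83C\uDC00
```

For U+1F000 the offset is 0xF000, so the high surrogate is 0xD83C and the low surrogate is 0xDC00.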

How can I figure out what code page I am looking at?

I have a device with some documentation on how to send it text. It uses 0x00-0x7F to send 'special' characters like accented characters, euro signs, ... I am guessing they copied an existing code page and made some changes, but I have no idea how to figure out what code page is closest to the one in my documentation. In theory, this s...

Unicode characters in window caption

We're having trouble setting window captions using cyrillic or japanese characters. We either see question marks or random garbage, but not the text we want. We've tried using different encodings, SetWindowText(), SetWindowTextW(), SetWindowTextA(), and so on. We can't even get it to work by passing a string literal to SetWindowText(). ...

Easy UNICODE in C++

Is there a way to make all character sequences UNICODE by default? For instance, now I have to say: std::wstring wstr(L"rofl"); instead, I'd like to say std::wstring wstr("rofl"); Thanks! Visual C++ 8.0 ...

Unicode characters not showing in System.Windows.Forms.TextBox

These characters show fine when I cut-and-paste them here from the VisualStudio debugger, but both in the debugger, and in the TextBox where I am trying to display this text, it just shows squares. 说明\r\n海流受季风影响,3-9 月份其流向主要向北,流速为2 节,有时达3 节;10 月至次年4 月份其流向南至东南方向,流速为2 节。\r\n注意\r\n附近有火山爆发的危险,航行时严加注意\r\n I thought that the TextBox supported...

Why isn't the Byte Order Mark emitted from UTF8Encoding.GetBytes?

The snippet says it all :-) UTF8Encoding enc = new UTF8Encoding(true/*include Byte Order Mark*/); byte[] data = enc.GetBytes("a"); // data has length 1. // I expected the BOM to be included. What's up? ...

How can I properly display German characters in HTML?

My pages contain German characters and I have typed the text between the HTML tags, but the browser displays some characters differently. Do I need to include anything in the HTML to properly display German characters? <label> ausgefüllt </label> ...

Display ñ on a C# .NET application

I have a localization issue. One of my industrious coworkers has replaced all the strings throughout our application with constants that are contained in a dictionary. That dictionary gets various strings placed in it once the user selects a language (English by default, but target languages are German, Spanish, French, Portuguese, Man...

Portable and simple unicode string library for C/C++?

I'm looking for a portable and easy-to-use string library for C/C++, which helps me to work with Unicode input/output. In the best case, it will store its strings in memory in UTF-8, and allow me to convert strings from ASCII to UTF-8/UTF-16 and back. I don't need much more besides that (ok, a liberal license won't hurt). I have seen th...

(Encoded) String handling in C++ - questions / best practices?

What are the best practices for handling strings in C++? I'm wondering especially how to handle the following cases: File input/output of text and XML files, which may be written in different encodings. What is the recommended way of handling this, and how to retrieve the values? I guess, a XML node may contain UTF-16 text, and then I ...

How to reverse a Unicode string

It was hinted in a comment to an answer to this question that PHP can not reverse Unicode strings. As for Unicode, it works in PHP because most apps process it as binary. Yes, PHP is 8-bit clean. Try the equivalent of this in PHP: perl -Mutf8 -e 'print scalar reverse("ほげほげ")' You will get garbage, not "げほげほ". – jrockway ...
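
For comparison, a small Python 3 sketch of reversing by code point rather than by byte. The second helper, which keeps combining marks attached to their base character, is an added illustration and not something the quoted comment discusses. It is not full grapheme cluster segmentation, and both function names are hypothetical.

```python
import unicodedata


def reverse_code_points(text):
    """Reverse a string one code point at a time."""
    return text[::-1]


def reverse_keeping_marks(text):
    """Reverse, but keep combining marks attached to their base character."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


print(reverse_code_points("ほげほげ"))  # prints げほげほ
```

Reversing the UTF-8 bytes instead would split each multi-byte sequence, which is where the garbage comes from.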

What is the best way to store UTF-8 strings in memory in C/C++?

Looking at the unicode standard, they recommend to use plain chars for storing UTF-8 encoded strings. Does this work as expected with C++ and the basic std::string, or do cases exist in which the UTF-8 encoding can create problems? For example, when computing the length, it may not be identical to the number of bytes - how is this suppo...
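
To make the length question concrete, here is a sketch in Python 3 of the byte-level check that would also apply to UTF-8 held in a plain byte string: the byte count and the code point count differ, and code points can be counted by skipping continuation bytes (those of the form 10xxxxxx). `count_code_points` is a hypothetical helper, and it assumes the input is valid UTF-8.

```python
def count_code_points(utf8):
    """Count code points in valid UTF-8 by ignoring continuation bytes."""
    return sum(1 for b in utf8 if (b & 0xC0) != 0x80)


text = "ausgefüllt"
encoded = text.encode("utf-8")

print(len(text))                   # number of code points
print(len(encoded))                # number of bytes
print(count_code_points(encoded))  # code points recovered from the bytes
```

Here the ü takes two bytes, so the byte length is one greater than the number of code points.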

unicode() vs. str.decode() for a utf8 encoded byte string (python 2.x)

Is there any reason to prefer unicode(somestring, 'utf8') as opposed to somestring.decode('utf8')? My only thought is that .decode() is a bound method so python may be able to resolve it more efficiently, but correct me if I'm wrong. ...

Convert from unicode in jsp for JSON's eval function.

I'm getting a string (simplified) from the backend that should be: { "menu": "Reallocate:"} However, it arrives in the JSP as: { &#034;menu&#034;: &#034;Reallocate:&#034;} and I cannot pass this to: var data=eval("(" + src + ")"); as it just doesn't like it. How can I convert this to a usable format? I know that: src ...

How can I embed unicode string constants in a source file?

Hi all: I'm writing some unit tests which are going to verify our handling of various resources that use other character sets apart from the normal Latin alphabet: Cyrillic, Hebrew etc. The problem I have is that I cannot find a way to embed the expectations in the test source file: here's an example of what I'm trying to do... /// //...

Can i xslt if test = Unicode?

Hi guys, I have this block of XSLT if-else cases and was wondering if there's a way for me to do a straight comparison with a Unicode character? Something along the lines of the code shown below? Or does XSLT have some built-in function which I can use for this purpose? i.e. change the Unicode into HTML entities and compare via that method?...

How do I reverse Unicode decomposition using Python?

Using Python 2.5, I have some text stored in a unicode object: Dinis e Isabel, uma difı´cil relac¸a˜o conjugal e polı´tica This appears to be decomposed Unicode. Is there a generic way in Python to reverse the decomposition, so I end up with: Dinis e Isabel, uma difícil relação conjugal e política ...

Using PDF annotations in code

Hi, does anyone use the annotations functionality of Adobe PDFs remotely, e.g. accessing them via script or COM? I am having trouble getting Unicode info out of a PDF and wondered if anyone had come across similar issues? ...

what's the difference between encode/decode? (python 2.x)

I've never been sure that I understand the difference between str/unicode decode and encode. I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding, given that encoding name it will return a unicode string. I know that unicode().encode() converts unicode chars into a string of b...
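
The direction of each call may be easier to see in a short round trip. This sketch uses Python 3, where the Python 2.x `str` and `unicode` types correspond roughly to `bytes` and `str`; the sample value is made up for illustration.

```python
raw = b"caf\xc3\xa9"             # bytes known to be UTF-8 encoded

text = raw.decode("utf-8")       # decode: bytes -> text
print(text == "caf\u00e9")       # True

utf8 = text.encode("utf-8")      # encode: text -> bytes
latin1 = text.encode("latin-1")  # same text, different byte representation
print(utf8 == raw)               # True
print(latin1)                    # b'caf\xe9'
```

Decoding needs the encoding the bytes were written in, while encoding can target any encoding that can represent the characters.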