unicode

UTF Encoding in Java

I need to encode a message from request and write it into a file. Currently I am using the URLEncoder.encode() method for encoding. But it is not giving the expected result for special characters in French and Dutch. I have tried using URLEncoder.encode("msg", "UTF-8") also. Example: Original message: Pour gérer votre GSM After encod...
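
For reference, here is a minimal Python sketch of what UTF-8 form encoding of the sample message looks like. It is not the asker's Java code; `quote_plus` is Python's standard-library function and produces a form-encoded result similar in shape to what `URLEncoder.encode(msg, "UTF-8")` is documented to produce. The variable names are illustrative only.

```python
from urllib.parse import quote_plus, unquote_plus

# The sample message from the question; \u00e9 is "é".
message = "Pour g\u00e9rer votre GSM"

# Encode as application/x-www-form-urlencoded using UTF-8 bytes.
# "é" becomes the two UTF-8 bytes C3 A9, and spaces become "+".
encoded = quote_plus(message, encoding="utf-8")

# Decoding with the same charset restores the original text.
decoded = unquote_plus(encoded, encoding="utf-8")

print(encoded)
print(decoded == message)
```

If the decoded text shows two garbage characters in place of "é", the usual cause is that the UTF-8 bytes were decoded with a single-byte charset somewhere between encoding and display.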

How to set input charset to Unicode in VB.NET or VC++.NET

Hi there, fellow programmers. I am using the WebBrowser control in VB.NET 2005. The application I wrote shows a webpage on my computer which has two text areas, one for input and the other for output. My problem is that I need the charset of the whole program to be Unicode, because the charset of the webpage is UTF-8. And right now, when I process ...

Clean source code files of invisible characters

I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it I identified it as U+FEFF, or 'Zero width no-break space'. It shows up as a non-empty text node in my website and is causing a serious layout problem. The problem is,...

Comparing wstring while ignoring case

I am sure this has been asked before, but I couldn't find it. Is there any built-in way (i.e. using either std::wstring's methods or the algorithms) to do a case-insensitive comparison of two wstring objects? ...

Using awk to remove the Byte-order mark

Hi, does anyone have an idea what an awk script (presumably a one-liner) for removing a BOM would look like? Specification: print every line after the first (NR > 1) unchanged; for the first line, if it starts with #FE #FF or #FF #FE, remove those bytes and print the rest ...
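
The specification above can be sketched in Python rather than awk. This is only an illustration of the stated rules, working on raw bytes; the function name `strip_bom` is hypothetical. Note that FE FF and FF FE are the UTF-16 byte-order marks; the UTF-8 BOM is the three bytes EF BB BF and is not covered by the specification as quoted.

```python
# The two byte sequences named in the specification (UTF-16 BE and LE BOMs).
UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")


def strip_bom(lines: list[bytes]) -> list[bytes]:
    """Remove a leading BOM from the first line; pass other lines through."""
    result = []
    for index, line in enumerate(lines):
        if index == 0:
            for bom in UTF16_BOMS:
                if line.startswith(bom):
                    line = line[len(bom):]
                    break
        result.append(line)
    return result


cleaned = strip_bom([b"\xfe\xffabc\n", b"\xff\xfedef\n"])
print(cleaned)
```

Only the first line is examined, so the same bytes appearing at the start of a later line are left alone.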

.NET Stream Decoders behavior

Hello, I've got a process which attempts to decode different encodings of strings from a binary stream. I get some behavior which does not quite add up in my mind when I step through it. Specifically, what I do is: obtain the maximum number of bytes which would be used to encode a character in the given encoding grab the amount of b...

Looking for great character set/encoding resources or tools for PHP webapp development

Hi guys, I've been having a lot of trouble with character sets/encoding while writing a multi-lingual web app in PHP in different places such as the shell, inside PHP itself, and in the database. I want the whole application to be UTF-8 throughout, so that I won't have to worry about converting anything back and forth anymore. Does any...

Python Unicode UnicodeEncodeError

Hi, I am having issues with trying to convert a UTF-8 string to Unicode. I get the error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 73-75: ordinal not in range(128). I tried wrapping this in a try/except block but then Google was giving me a system administrator error which was one line. Can someone sugges...

Some good Unicode tutorials in C?

Does anyone know of some good Unicode tutorials with examples in C? I have to create a console app (to be run in xterm) with Unicode support, and it has to be in C. :( ...

Problem outputting unicode in Java

I'm trying to write unicode characters (♠) using System.out, and a question mark gets printed instead. I'm using IntelliJ on Windows, and trying to print within the IDE. ...

Should I use mb_* or iconv_* functions for multibyte strings?

Hi there! As we all know, handling multibyte strings is not that easy in PHP. For example, I want to get the length of the following string: ä

strlen('ä'); // 2, because ä equals 2 bytes
mb_strlen('ä', 'UTF-8'); // 1
iconv_strlen('ä', 'UTF-8'); // 1

Which functions should I use? The mb_* or iconv_*? Why? Considering that the encoding ...
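
The byte-versus-character distinction behind those three results can be shown in a short Python sketch. This is an analogy, not PHP: the helper names are hypothetical, and the string is written with an escape so that it is certainly the single code point U+00E4.

```python
def char_length(text: str) -> int:
    """Number of Unicode code points, like mb_strlen with 'UTF-8'."""
    return len(text)


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes after encoding, like PHP's plain strlen."""
    return len(text.encode(encoding))


a_umlaut = "\u00e4"  # "ä" as one precomposed code point

print(char_length(a_umlaut))  # code points
print(byte_length(a_umlaut))  # UTF-8 bytes
```

Both multibyte-aware functions in the question agree because they count code points; plain strlen differs because it counts the encoded bytes.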

RegEx for all letters (including Chinese, Greek, etc.)

I need a regex that also matches Chinese, Greek, Russian, ... letters. What I basically want to do is remove punctuation and numbers. Until now I removed punctuation and numbers "manually" but that does not seem to be very consistent. Another thing I have tried is /[\p{L}]/ but that is not supported by Mozilla (I use this in a Fi...
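
As a point of comparison, here is a minimal Python sketch of the same goal (keep letters from any script, drop punctuation and digits) without relying on \p{L} support. It is not JavaScript and not the asker's code; `keep_letters` is a hypothetical name, and keeping whitespace is an assumption made here so that words stay separated.

```python
def keep_letters(text: str) -> str:
    """Keep Unicode letters and whitespace; drop everything else.

    str.isalpha() is true for characters whose Unicode general
    category is a letter category (Lu, Ll, Lt, Lm, Lo).
    """
    return "".join(ch for ch in text if ch.isalpha() or ch.isspace())


cleaned = keep_letters("\u041f\u0440\u0438\u0432\u0435\u0442, \u4e16\u754c 123!")
print(cleaned)
```

One caveat of this approach: combining marks (category Mn) are not letters, so text in decomposed form would lose its accents unless it is normalized first.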

Matching Unicode control characters except for three with Regular Expressions

Hi, I need a regular expression that matches all Unicode control characters except for carriage return (0x0d), line feed (0x0a) and tab (0x09). Currently, my regular expression looks like this: /\p{C}/u I just need to define these three exceptions now. ...
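
The intended behavior can be sketched in Python without a regex. Python's built-in `re` module does not support \p{C}, so this sketch uses `unicodedata` instead: every general category starting with "C" (Cc, Cf, Cs, Co, Cn) is treated as the equivalent of \p{C}. The names `ALLOWED` and `remove_control_chars` are hypothetical.

```python
import unicodedata

# The three characters the question wants to keep.
ALLOWED = {"\r", "\n", "\t"}


def remove_control_chars(text: str) -> str:
    """Drop characters in the Unicode 'Other' categories, except CR, LF, tab."""
    return "".join(
        ch
        for ch in text
        if ch in ALLOWED or not unicodedata.category(ch).startswith("C")
    )


# NUL and DEL are Cc; U+200B (zero width space) is Cf.
cleaned = remove_control_chars("a\x00b\tc\r\nd\u200be\x7f")
print(repr(cleaned))
```

The allow-list check runs first, so the three exceptions are kept even though they belong to category Cc themselves.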

CA2W gave me a "'AtlThrowLastWin32': identifier not found" error

I got a strange compilation error when I followed the MSDN document to use CA2W to convert Big5 strings to Unicode strings in Visual Studio 2005. This is the code I wrote:

#include <string>
#include <atldef.h>
#include <atlconv.h>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    string chineseInBig5 = "\xA4\xA4\xA4\x...

Strategy for supporting unicode & multi language in PHP5

Hi, I have heard that PHP6 will natively support Unicode, which will hopefully make multi-language support much easier. However, PHP5 has pretty weak support for Unicode and multi-language (i.e. just a bunch of specialized string functions). I was wondering what your strategies are to enable Unicode and multi-language support in your ...

UTF8 Filenames in PHP and Different Unicode Encodings

I have a file containing Unicode characters on a server running linux. If I SSH into the server and use tab-completion to navigate to the file/folder containing unicode characters I have no problem accessing the file/folder. The problem arises when I try accessing the file via PHP (the function I was accessing the file system from was st...

Objective-C: unichar vs. char

I'm a little confused between a unichar and a char. Can I treat unichars similarly to chars? For example, can I do this:

-(BOOL)isNewLine:(unichar)c
{
    if(c == '\n')
        return YES;
    else
        return NO;
} ...

Is there a quick and dirty way to cast PAnsiChar to PChar in Delphi 2009?

I have a very large number of apps to convert to Delphi 2009, and there are a number of external interfaces that return PAnsiChars. Does anyone have a quick and simple way to cast these back to PChars? There is a lot on string to PAnsiChar, but not much I can find on the other way around. ...

PHP file uploads - Handling Arabic/Chinese/Japanese filenames

I have a system where a user uploads documents (PDF, Word, etc.). The problem is that foreign users are uploading files with names in Arabic, Chinese and Japanese, and the system, being able to handle them, is adding them to the database. The problem arises when trying to download the files using PHP: $result = mysql_query($query) or die(...

Is there some functionality in/for Delphi that converts a string with html named and numbered entities to unicode text?

I read data from a MySQL database that is filled by PHP scripts. All special characters are converted to named or numbered HTML entities (for example &amp; &#286;). I know of no way to convert these characters back to the original ones in Delphi as Unicode strings. Did anyone ever find or even create such a function? This wo...