unicode

What's the difference between Unicode and UTF-8?

Is it true that Unicode = UTF-16? UPDATE: Many are saying Unicode is a standard, not an encoding, but most editors actually support saving as "Unicode" encoding. ...
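One way to see the distinction raised in this question: Unicode assigns each character a code point, while UTF-8 and UTF-16 are different byte serializations of those code points. A minimal Python sketch, with an arbitrarily chosen sample character (the "Unicode" option in some editors is often just a label for UTF-16):

```python
# Illustrative only: one code point, two different encodings.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)            # the Unicode code point, independent of any encoding
utf8 = ch.encode("utf-8")       # byte sequence in UTF-8
utf16 = ch.encode("utf-16-le")  # byte sequence in UTF-16 (little-endian, no BOM)

print(hex(code_point), utf8, utf16)
```

The same code point yields different bytes under each encoding, which is why "Unicode" by itself does not say how text is stored.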

Extended ASCII question

Hi, I read Wikipedia, but I do not understand whether extended ASCII is still just ASCII and whether it is available on any computer that would run my console application. Also, if I understand it correctly, I can write an ASCII char only by using its Unicode code in VB or C#. Thank you ...

Convert ASCII to Unicode from a font map using PHP

I'm trying to convert old text in an ASCII non-English font to a new Unicode font, so the keys have to be mapped. I have two options. The first is that I have a map file like this, sample.map (text file): w=ം x=ഃ A=അ B=ആ C=ഇ Cu=ഈ D=ഉ Du=ഊ E=ഋ \p=ഌ F=എ G=ഠsF=ഠH=ഒ Hm=ഓ Hu=ഔ I=ക J=ഖ The code will have to replace all the l...

JavaScript ASCII to Unicode with this code

I'm trying to convert text written in a local ASCII non-English font to standard Unicode. The problem is that we have to use a map file to map each char to its Unicode char. Luckily, I found a ready-made open source piece of code within a Firefox add-on. It's part of a bigger application and I don't understand how I can use it independently. ...

Storing binary data in UTF-8 string

I want to use a WebSocket to transfer binary data, but you can only use WebSockets to transfer UTF-8 strings. Encoding it using base64 is one option, but my understanding is that base64 is most desirable when your text might be converted from one format to another. In this case, I know the data will always be UTF-8, so is there a better...
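The base64 option mentioned in this question can be sketched as follows. This is a minimal Python illustration with arbitrary sample bytes, not a claim that base64 is the best choice here; it grows the payload by roughly one third.

```python
import base64

# Arbitrary sample bytes that are not valid UTF-8 on their own.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Encode to an ASCII-only string, which is also valid UTF-8.
text = base64.b64encode(payload).decode("ascii")

# Decode back to the original bytes on the receiving side.
restored = base64.b64decode(text)

print(text, restored == payload)
```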

Replace Unicode character

Hi there. I am trying to replace a certain character in a string with another. They are quite obscure Latin characters. I want to replace character (hex) 259 with 4d9, so I tried this: str_replace("\x02\x59","\x04\xd9",$string); This didn't work. How do I go about this? ...
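A likely reason the attempt above fails, assuming the string is UTF-8: the bytes 02 59 and 04 D9 are the raw code point values, not the UTF-8 encodings of U+0259 and U+04D9. A minimal Python sketch of the replacement at the code point level (`replace_schwa` is a hypothetical helper name):

```python
def replace_schwa(text: str) -> str:
    """Replace U+0259 with U+04D9, working on code points rather than bytes."""
    return text.replace("\u0259", "\u04d9")

# In UTF-8 these characters are encoded as C9 99 and D3 99,
# not as the raw code point bytes 02 59 and 04 D9.
src_bytes = "\u0259".encode("utf-8")
dst_bytes = "\u04d9".encode("utf-8")

print(src_bytes, dst_bytes)
```

In a byte-oriented language, the same replacement would have to use those UTF-8 byte sequences as the search and replacement strings.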

JSF UTF-8 encoding?

Hello all, I am working with a friend who is Vietnamese and wants to create a website in the Vietnamese language, but we have a problem with UTF-8 encoding. I wrote a Filter class as follows: import java.io.IOException; import javax.servlet.Filter; import javax.servlet.FilterChain; import javax.servlet.FilterConfig; import javax.servlet.ServletExc...

C# console font

Hi, I cannot find out which font the console app uses by default. Is it guaranteed that everyone has that font (when running this .NET app)? I want to display some Unicode chars and need to be sure they are present in that font. Thanks ...

How to convert a string like "\u0131", "\u011f"

Hi, I get a string in a PHP script like "A\u011fr\u0131". How can I convert it to a normal string with umlauts? ...
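The `\uXXXX` sequences in this question look like JSON string escapes. Assuming that is what they are, a minimal Python sketch decodes them with a JSON parser (`decode_u_escapes` is a hypothetical helper name):

```python
import json

def decode_u_escapes(escaped: str) -> str:
    # Wrap the text in quotes so it parses as a JSON string literal.
    # Assumes the input contains no unescaped double quotes or control characters.
    return json.loads('"' + escaped + '"')

# Raw string literal, so the backslashes reach the function unchanged.
print(decode_u_escapes(r"A\u011fr\u0131"))
```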

How do I write a UTF-8 encoded string to a file on Windows, in C++

Hello all, I have a string that may or may not have Unicode characters in it, and I am trying to write it to a file on Windows. Below I have posted a sample bit of code. My problem is that when I fopen the file and read the values back on Windows, they are all being interpreted as UTF-16 characters. char* x = "Fool"; FILE* outFile = fopen( "Se...

What is the correct JNA mapping for UniChar on Mac OS X?

I have a C struct like this: struct HFSUniStr255 { UInt16 length; UniChar unicode[255]; }; I have mapped this in the expected way: public class HFSUniStr255 extends Structure { public UInt16 length; // UInt16 is just an IntegerType with length 2 for convenience. public /*UniChar*/ char[] unicode = new char[255]; ...

String class based on graphemes?

I'm wondering why we don't have some string classes that represent a string of Unicode grapheme clusters instead of code points or characters. It seems to me that in most applications it would be easier for programmers to access components of a grapheme when necessary than to have to organize them from code points, which appears necessa...
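The gap between code points and user-perceived characters that this question describes can be illustrated in Python. This sketch uses normalization only to show the mismatch; normalization is not grapheme cluster segmentation, and the sample string is an arbitrary choice.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, one visible character
composed = unicodedata.normalize("NFC", decomposed)  # precomposed form of the same character

print(len(decomposed))  # count of code points in the decomposed form
print(len(composed))    # count of code points after NFC composition
```

Both strings display as a single character, yet a code-point-based string type reports different lengths for them.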

Could there be encoding-related problems when storing Unicode strings in INI files?

There are already questions regarding Unicode and INI files, but many of them are rather domain-specific, so I am not sure if the answers can be applied to the general case. Motivation: I want to use INI files for storing simple data like some numbers and some strings. The strings are provided by users (input via GUI). The software could...

Python Unicode in and out of IDE

Hi, when I run my programs from within Eclipse IDE the following piece of code works perfectly: address_name = self.text_ctrl_address.GetValue().encode('utf-8') self.address_list = [i for i in data if address_name.upper() in i[5].upper().encode('utf-8')] but when running the same piece of code directly with python, I get an "UnicodeD...

Foreign Characters Appearing In Git-Managed Files

I am using git 1.7.2.3 via Cygwin on Windows 7 and seeing strange artifacts appearing in some of my source files when switching branches. git status reports everything as unchanged, yet the crazy characters are present. I've confirmed on GitHub that the files are as they should be in the repo. My Copy: ਍        ⼀⼀⼀ 㰀猀甀洀洀愀爀礀㸀ഀഀ ...

How can I deal with accented letters, german letters and other characters?

My Python script is working now, but I'm having a little trouble. Here is the output: from BeautifulSoup import BeautifulSoup import urllib langCode={ "arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN", "croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl", "english":"en", "finnish":"fi", "french":"fr", "german":"de",...

Encoding of Process.StartInfo.Arguments

I have a .NET application that fires up a process, passing a long argument list through Process.StartInfo.Arguments. The new process can only handle 8-bit characters in the arguments passed to its main() function. Therefore, I've encoded the string in Process.StartInfo.Arguments so that each character is an 8-bit value. The problem is...

Does Windows CE use UTF-16 or UCS-2?

Windows NT only supported UCS-2; then, starting with Windows 2000, it began to support UTF-16. But what about Windows CE? Does it still support only UCS-2, or is the native charset now UTF-16? ...
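For context on the distinction this question relies on: UCS-2 uses exactly one 16-bit unit per character and so cannot represent code points above U+FFFF, while UTF-16 uses a surrogate pair for them. A minimal Python sketch with arbitrary sample characters (it says nothing about what Windows CE itself does):

```python
bmp_char = "\u00e9"         # inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # outside the BMP, needs a surrogate pair in UTF-16

# Each UTF-16 code unit is 2 bytes, so divide the encoded length by 2.
bmp_units = len(bmp_char.encode("utf-16-le")) // 2
astral_units = len(astral_char.encode("utf-16-le")) // 2

print(bmp_units, astral_units)
```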

Solving the Unicode output in Python

Hi, I have written some code which sends queries to Google and returns the query results. Apparently the contents which are retrieved are in Unicode format, so when I put them in a list, for example, and print this list (the whole list together and not member by member), an annoying extra 'u' always appears in front of every member in this list...

Recover from using a bad code page in C#

I have read the string "ńîôč˙" from a file using code page windows-1251 instead of iso-8859-2. It should be some Cyrillic string. How can I implement a function that will do the following in C#: string res = Recover("ńîôč˙"); string Recover(string input) { ??? } Where res is the Cyrillic string that I would have got if I had used the correct code page wh...
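The usual recovery technique is to re-encode the garbled text with the code page that was wrongly applied, which restores the original bytes, and then decode those bytes with the intended one. The sketch below is in Python rather than C#, and it assumes the bytes were really Windows-1251 but were decoded as ISO-8859-2, since the sample string consists of Central European letters; `recover` and its parameter names are hypothetical.

```python
def recover(garbled: str, wrong: str = "iso-8859-2", right: str = "cp1251") -> str:
    # Re-encode with the wrongly applied code page to get the original bytes back,
    # then decode those bytes with the intended code page.
    return garbled.encode(wrong).decode(right)

# The sample string from the question, written as escapes to avoid ambiguity:
# U+0144 U+00EE U+00F4 U+010D U+02D9
sample = "\u0144\u00ee\u00f4\u010d\u02d9"
print(recover(sample))
```

This only works when the wrong decoding was lossless, meaning every original byte mapped to a distinct character in the wrongly applied code page.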