unicode

What is the current modern term for "Multi-byte Character Set"

I was confused for quite a while: http://stackoverflow.com/questions/2384160/confusion-on-unicode-and-multibyte-articles After reading the comments from all contributors, plus looking at an old article (year 2001), http://www.hastingsresearch.com/net/04-unicode-limitations.shtml, which talks about Unicode being a 16-bit ...
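The sketch below is a Python 3 illustration of the point behind the question. It is not taken from the linked articles. In current encodings a single character can take a varying number of bytes. Even UTF-16 needs four bytes (a surrogate pair) for characters above U+FFFF, so Unicode is no longer "16 bits per character". The helper name `byte_lengths` is made up for this example.

```python
# Hypothetical helper: compare code-point count with encoded byte length.
def byte_lengths(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),  # "-le" variant: no BOM is prepended
    )

# Samples: ASCII letter, e with acute, hiragana "o", and an emoji (U+1F600).
for sample in ["A", "\u00e9", "\u304a", "\U0001F600"]:
    print(repr(sample), byte_lengths(sample))
```

The last sample is one code point but four bytes in both UTF-8 and UTF-16.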

"Broken" unicode strings encoded in UTF-8?

I have been studying Unicode and its Python implementation for two days now, and I think I'm getting a glimpse of what it is about. Just to be confident, I'm asking whether my assumptions about my current problems are correct. In Django, forms give me unicode strings which I suspect to be "broken". Unicode strings in Python should be encoded ...
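One common meaning of "broken" is mojibake, where UTF-8 bytes are decoded with the wrong codec. The Python 3 sketch below shows how it happens and how a single round of damage can be undone. Python 3's `str` corresponds to Python 2's `unicode`. The helper names are hypothetical, and nothing here is Django-specific.

```python
def mojibake(text):
    """Simulate the classic bug: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(broken):
    """Undo mojibake(), assuming exactly one UTF-8 -> Latin-1 round of damage."""
    return broken.encode("latin-1").decode("utf-8")

original = "café"
broken = mojibake(original)
print(broken)          # cafÃ©
print(repair(broken))  # café
```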

ABAP WebAS Active Codepage

Hello! I need to concatenate different lines into a string. To do so, I need to use the CR + LF hexadecimal characters. The problem is that, in an 8-bit-per-character environment, I just need to do something like this: constants : c_lf type x value '10'. constants : c_cr type x value '13'. data : g_html type string. concatenate '<htm...

How to convert a unicode string written in font A to font B?

Assume all the required fonts are available on the client's machine. You can probably all see the following word 'stackoverflow' written in an Indic script: 'स्टैकओवरफ्लो'. A quick lookup using the Web Developer Tools shows that this word is written using the 'Arial, Liberation Sans..' font family. Does that mean Arial font supports Indi...

Python Unicode strings and the Python interactive interpreter

I'm trying to understand how Python 2.5 deals with unicode strings. Although by now I think I have a good grasp of how I'm supposed to handle them in code, I don't fully understand what's going on behind the scenes, particularly when you type strings at the interpreter's prompt. So Python before 3.0 has two string types, namely: str (...

Python: How can I replace full-width characters with half-width characters?

If this were PHP, I would probably do something like this: function no_more_half_widths($string){ $foo = array('１','２','３','４','５','６','７','８','９','１０'); $bar = array('1','2','3','4','5','6','7','8','9','10'); return str_replace($foo, $bar, $string); } I have tried the .translate function in Python and it indicates that the arrays a...
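In Python 3, `str.translate` takes a dict that maps code points to code points, so no equal-length arrays are needed. The sketch below assumes that only the full-width ASCII block (U+FF01 to U+FF5E) and the ideographic space should be converted. It also shows `unicodedata.normalize("NFKC", ...)` as a broader alternative that folds other compatibility characters too. The function names are hypothetical.

```python
import unicodedata

# Full-width ASCII variants occupy U+FF01..U+FF5E; each sits exactly
# 0xFEE0 above its ASCII counterpart (U+0021..U+007E).
FULL_TO_HALF = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x0020  # ideographic space -> ASCII space

def to_half_width(text):
    """Map full-width ASCII variants to half-width with str.translate."""
    return text.translate(FULL_TO_HALF)

def to_half_width_nfkc(text):
    """Alternative: NFKC also folds many other compatibility characters."""
    return unicodedata.normalize("NFKC", text)

print(to_half_width("１２３ＡＢＣ"))  # 123ABC
```

The explicit table changes only the characters listed in it. NFKC is shorter but also rewrites things like ligatures and superscripts, which may or may not be wanted.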

Writing to a file in Unicode

I am having some problems writing to a file in Unicode from my C program. I am trying to write a Unicode Japanese string to a file. When I check the file, though, it is empty. If I try a non-Unicode string it works just fine. What am I doing wrong? setlocale(LC_CTYPE, ""); FILE* f; f = _wfopen(COMMON_FILE_PATH,L"w"); fwprintf(f,L"...

iPhone app rejection for using ICU (Unicode extensions)

I received the following email from Apple regarding my application: *Thank you for submitting your update to Νομοθεσία to the App Store. During our review of your application we found it is using private APIs, which is in violation of the iPhone Developer Program License Agreement section 3.3.1; "3.3.1 Applications may only use Doc...

Text to a PNG on App Engine (Python)

Note: I am cross-posting this from the App Engine group because I got no answers there. As part of my site about Japan, I have a feature where the user can get a large PNG for use as a desktop background that shows the user's name in Japanese. After switching my site hosting entirely to App Engine, I removed this particular feature because...

jQuery POST works with Hebrew (Unicode) but not with spaces; GET doesn't work with Hebrew but does with spaces

I tried load and Hebrew didn't work for me, so I changed my code to $.ajax({ type: "post", url: "process-tb.asp", data: data, success: function(msg) (partial code), not knowing that POST vs. GET is the problem for my Hebrew query string. So now I can get my page to get the Hebrew and English bu...
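A plausible factor here is how the query string is percent-encoded. The Python 3 sketch below is not jQuery or ASP code, and `value` is a made-up sample. It shows the two common conventions: each UTF-8 byte of the Hebrew text becomes `%XX`, and a space becomes either `%20` or, in form encoding, `+`. Whatever the client sends, the server has to decode it with the same charset and the same space convention.

```python
from urllib.parse import quote, quote_plus, unquote_plus

value = "שלום עולם"  # two Hebrew words separated by a space

# quote() percent-encodes the UTF-8 bytes and writes a space as %20 ...
print(quote(value))
# ... while quote_plus() follows the form-encoding convention: space -> "+".
print(quote_plus(value))

# Decoding with the matching function restores the original text.
assert unquote_plus(quote_plus(value)) == value
```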

Java: How to get Unicode name of a character (or its type category)?

Hello, everyone! The Character class in Java defines methods which check a given char argument for equality with certain Unicode chars or for membership in some type category. These chars and type categories are named. As stated in the Javadoc, examples of named chars are HORIZONTAL TABULATION, FORM FEED, ...; an example of a named type ...
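Java 7 and later provide `Character.getName(int codePoint)` and `Character.getType(...)` for this. For comparison, the Python 3 sketch below uses the standard `unicodedata` module to return a character's name and its two-letter general category. The `describe` helper is hypothetical.

```python
import unicodedata

def describe(ch):
    """Return the Unicode name and general category of one character."""
    # Control characters such as TAB have no Name property, so supply a
    # default instead of letting unicodedata.name() raise ValueError.
    return unicodedata.name(ch, "<no name>"), unicodedata.category(ch)

for ch in ["A", "\t", "\u00e9", "\u2026"]:
    print(repr(ch), describe(ch))
```

Labels such as HORIZONTAL TABULATION are name aliases of control characters, not Name properties. That is why the sketch falls back to a default for `"\t"`, whose category is `Cc` (control).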

Tab / LF / CR unicode character

I have a Unicode file (UTF-16 FFFE little-endian BOM) which contains rows of tab-separated fields. After reading http://stackoverflow.com/questions/2308112/splitting-unicode-i-think-using-split-in-ruby, I am going to use Ruby's split (file to lines, then line to fields). BTW, what are the Unicode chars for LF, CR and Tab? Thanks! ...
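For reference, the three characters are U+0009 (Tab), U+000A (LF) and U+000D (CR). In a UTF-16LE file each one occupies two bytes; LF, for example, is `0A 00`. The asker plans to use Ruby. What follows is a Python 3 sketch of the same steps: decode the whole file first, then split it into lines and fields. `parse_tsv_utf16` is a hypothetical name and the sample bytes are made up.

```python
# Code points of the three control characters asked about.
TAB, LF, CR = "\u0009", "\u000A", "\u000D"

def parse_tsv_utf16(raw):
    """Split UTF-16 bytes (with BOM) into rows of tab-separated fields."""
    text = raw.decode("utf-16")  # the BOM selects the byte order and is dropped
    rows = text.replace(CR + LF, LF).split(LF)
    return [row.split(TAB) for row in rows if row]

# Tiny little-endian sample with an FF FE byte-order mark.
sample = b"\xff\xfe" + "a\tb\r\nc\td\r\n".encode("utf-16-le")
print(parse_tsv_utf16(sample))  # [['a', 'b'], ['c', 'd']]
```

Decoding before splitting matters. Splitting the raw bytes on `0A` alone would leave a stray `00` byte attached to the next line.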

Multilingual Unicode rendering in OpenGL

Hi folks, I have to extend an OpenGL rendering system to support international characters (especially Hebrew, Arabic and Cyrillic). The development platform is Windows (XP|Vista|7), alas using Embarcadero Delphi 2010. I currently use wglOutLineFont(...) to build my font's display list and glCallLists(length(m_Text), UNSIGNED_SHORT, PWcha...

Encoding problem with preg_replace() and scandir()

Hi, on OS X (PHP 5.2.11) I have a file siësta.doc (and thousands of others with Unicode filenames), and I want to convert the file names to a web-consumable format (a-zA-Z0-9.). If I hardcode the file name above, I can do the right conversion: <?php $file = 'siësta.doc'; echo preg_replace("/[^a-zA-Z0-9.]/u", '_', $file); // Output: si_s...
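A likely cause, assumed here rather than confirmed by the excerpt, is that HFS+ on OS X stores filenames in a decomposed form. In that case `scandir()` returns `e` followed by U+0308 COMBINING DIAERESIS, while the hardcoded literal contains the precomposed `ë` (U+00EB). The regex then replaces only the combining mark. The Python 3 sketch below shows the effect and the fix of normalizing to NFC before the replacement. `web_safe` is a hypothetical helper, and the filename is simulated rather than read from disk.

```python
import re
import unicodedata

PATTERN = re.compile(r"[^a-zA-Z0-9.]")

def web_safe(name):
    """Replace every character outside a-zA-Z0-9. with an underscore."""
    return PATTERN.sub("_", name)

typed = "si\u00ebsta.doc"                        # precomposed ë (NFC)
from_disk = unicodedata.normalize("NFD", typed)  # e + U+0308, simulating HFS+

print(web_safe(typed))                                    # si_sta.doc
print(web_safe(from_disk))                                # sie_sta.doc
print(web_safe(unicodedata.normalize("NFC", from_disk)))  # si_sta.doc
```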

A better way of converting Codepage-1251 in RTF to Unicode

I am trying to parse RTF (via MSEDIT) in various languages, all in Delphi 2010, in order to produce HTML in unicode. Taking Russian/Cyrillic as my starting point I find that the overall document codepage is 1252 (Western) but the Russian parts of the text are identified by the charset of the font (RUSSIAN_CHARSET 204). So far I am: 1...
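The Python 3 sketch below decodes RTF `\'hh` hex escapes using the code page implied by the font charset (charset 204 maps to cp1251 here). It assumes the caller already knows which font, and therefore which charset, is active. It ignores RTF groups, `\uN` escapes and font switching. `CHARSET_TO_CODEC` is a hypothetical table that lists only the two charsets mentioned above.

```python
import re

# Hypothetical mapping from RTF \fcharsetN values to Python codec names.
CHARSET_TO_CODEC = {0: "cp1252", 204: "cp1251"}

HEX_ESCAPE = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")

def decode_rtf_hex_run(run, charset):
    """Decode one run of consecutive \\'hh escapes with the font's code page."""
    data = bytes(int(h, 16) for h in re.findall(r"\\'([0-9a-fA-F]{2})", run))
    return data.decode(CHARSET_TO_CODEC[charset])

def decode_rtf_text(rtf_text, charset):
    """Replace every run of \\'hh escapes in rtf_text with decoded characters."""
    return HEX_ESCAPE.sub(lambda m: decode_rtf_hex_run(m.group(0), charset), rtf_text)

print(decode_rtf_text(r"\'cf\'f0\'e8\'e2\'e5\'f2", 204))  # Привет
```

The sketch decodes whole runs of escapes rather than single bytes. That keeps it correct for charsets where one character spans two bytes, such as the East Asian code pages, if they are added to the table.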

Delphi 10, .NET, how do I convert a hex UTF-8 string to its Unicode character?

Hi all, I am trying to make my web app compatible with international languages and I am stuck trying to convert escaped characters in my Delphi .NET DLL. The front-end code is passing the UTF-8 hex notation with an escape character, e.g. for お I pass \uE3818A. In my DLL I capture this and construct the following string '$E3818A'. ...
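The Python 3 sketch below illustrates two points. First, the hex digits can be turned into bytes and then decoded as UTF-8; in .NET the analogous call is `Encoding.UTF8.GetString` on the byte array. Second, a conventional `\uXXXX` escape carries the code point (U+304A for お), not the UTF-8 bytes, so `\uE3818A` is a non-standard notation. Both helper names are hypothetical.

```python
def utf8_hex_to_text(hex_digits):
    """Turn a hex string of UTF-8 bytes (e.g. "E3818A") into text."""
    return bytes.fromhex(hex_digits).decode("utf-8")

def code_point_escape(ch):
    """Standard \\uXXXX form: the code point, not the UTF-8 bytes."""
    return "\\u%04X" % ord(ch)

print(utf8_hex_to_text("E3818A"))  # お
print(code_point_escape("お"))      # \u304A
```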

Is there any need for me to use wstring in the following case?

Currently, I am developing an app for a customer in China. Chinese customers mostly switch their OS encoding to GB2312. I need to write a text file that will be encoded using GB2312. I use std::ofstream file. I compile my application in MBCS mode, not Unicode. I use the following code to convert CString to std::string, a...
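A common pattern is to keep text as Unicode internally and encode to GB2312 only when writing the file. The Python 3 sketch below shows that pattern. The function names and the temporary path are illustrative, not part of the original setup.

```python
import os
import tempfile

def write_gb2312(path, text):
    """Encode text as GB2312 only at the file boundary (strict errors)."""
    # newline="" disables newline translation so the byte count is predictable.
    with open(path, "w", encoding="gb2312", newline="") as f:
        f.write(text)

def read_gb2312(path):
    """Read a GB2312 file back into a Unicode string."""
    with open(path, encoding="gb2312", newline="") as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_gb2312(path, "\u4e2d\u6587\n")        # 中文 plus a newline
print(os.path.getsize(path))                # 5: two bytes per hanzi + LF
print(read_gb2312(path) == "\u4e2d\u6587\n")  # True
```

With the default strict error handling, any character that GB2312 cannot represent raises `UnicodeEncodeError` instead of being silently replaced.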

Characters with jquery json

Hi everyone, I'm using jQuery $.getJSON to retrieve a list of cities. Everything works fine, but I'm from Estonia (probably most of you don't know much about this country =D) and we use some characters like õ, ü, ä, ö. When I pass letters like these to the callback function, I keep getting empty strings. I've tried to base64 encode(serv...
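The excerpt doesn't show the server side, so the following is an assumption about what the server could do. One common approach is to let the JSON serializer escape non-ASCII characters as `\uXXXX`, which makes the response pure ASCII; any JSON parser turns the escapes back into õ, ü, ä, ö. The alternative is to emit raw UTF-8 and make sure the response really is UTF-8. A Python 3 sketch of both, with sample city names:

```python
import json

cities = ["Tõrva", "Pärnu", "Võru"]

# ensure_ascii=True (the default) writes non-ASCII characters as \uXXXX,
# so the payload is plain ASCII regardless of how the charset is labelled.
print(json.dumps(cities))
# ensure_ascii=False keeps the characters; the HTTP response must then
# actually be sent as UTF-8.
print(json.dumps(cities, ensure_ascii=False))
```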

UnicodeDecodeError when attempting to save a file through Django's default file-based backend

When I attempt to add a file with Russian characters in its name to the model instance through the default instance.file_field.save method, I get a UnicodeDecodeError (ascii decoding error, not in range(128)) from the storage backend (the stack trace ended on os.exist). If I write this file through Python's default file open/write, everything works. All fil...

jQuery :contains(unicode_characters)

I have an element like this: <span class="tool_tip" title="The full title">The ful&#8230;</span> This seems to work: jQuery('span:contains(…)'); But this does not: jQuery('span:contains(&#8230;)'); I am pretty sure that it would be bad to use the first one because if someone else saves the file, or the browser decides to get the...
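`&#8230;` is an HTML character reference. The HTML parser turns it into the single character U+2026 HORIZONTAL ELLIPSIS, and that character is what `:contains()` sees in the DOM text. A selector string is not HTML-decoded, so the entity form finds nothing. In JavaScript the same character can be written as the escape `'\u2026'`, which keeps the source file ASCII-only. A quick Python 3 check of the mapping:

```python
import html

entity = "&#8230;"
char = html.unescape(entity)  # decode the numeric character reference

print(hex(ord(char)))    # 0x2026
print(char == "\u2026")  # True
```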