unicode

Delphi 2006 system.delete for widestrings?

Hi all, is there a counterpart of the Delete procedure that could be used for widestrings? Or should I just use copy and concatenate the resulting WideStrings? ...

What is Microsoft using as the data type for Unicode Strings?

I am in the process of learning C++ and came across an article on the MSDN here: http://msdn.microsoft.com/en-us/magazine/dd861344.aspx In the first code example the one line of code which my question relates to is the following: VERIFY(SetWindowText(L"Direct2D Sample")); More specifically that L prefix. I had a little read up, and...

C++ wstring how to assign from NULL-terminated wchar_t array

Most texts on the C++ standard library mention wstring as being the equivalent of string, except parameterized on wchar_t instead of char, and then proceed to demonstrate string only. Well, sometimes, there are some specific quirks, and here is one: I can't seem to assign a wstring from a NULL-terminated array of 16-bit characters. The...

Create Unicode double underline string in C#

I want to use a Unicode character such as '\u033F' and build a continuous double-underline string. This would be used to underline totals in a report. Just using "===========" is not acceptable. How would I do this in C#? Everything I try just leaves me with a single character? Many thanks. ...

How to replace non ascii characters in string?

I have a string that looks like so: 6 918 417 712 The clear-cut way to trim this string (as I understand Python), assuming the string is in a variable called s, is: s.replace(' ', '') That should do the trick. But of course it complains that the Non-ASCII character '\xc2' in file blabla.py is not encoded. I never q...
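A minimal Python 3 sketch of one way to strip such separators. It assumes, which the excerpt does not confirm, that the gaps are no-break spaces (U+00A0), whose UTF-8 encoding starts with the byte 0xC2. The helper name `strip_spaces` is hypothetical, and the input is written with explicit escapes so the source file stays pure ASCII.

```python
def strip_spaces(text: str) -> str:
    """Remove every whitespace character, including U+00A0 (no-break space)."""
    # str.isspace() is True for U+00A0 as well as for the plain ASCII space.
    return "".join(ch for ch in text if not ch.isspace())


# Assumed input: digits separated by no-break spaces.
s = "6\u00a0918\u00a0417\u00a0712"
print(strip_spaces(s))
```

If the separators turn out to be a different character, printing `repr(s)` should show which code point to target.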

What Unicode character do you use in your website? (instead of image icons)

I am looking for character which could replace image icon, for example like ✘ (xmark) and ✔ (tick), maybe some symbol to "draft" or "new message"? EDIT: Fav: ❤ Draft: ✍ Message: ✉ ...

What makes a good test string for testing web forms for unicode compatibility?

What test text do you try and type into your web forms to check that they handle all the edge cases properly (esp. Unicode and XSS-style problems)? I am particularly interested in good Unicode strings that maybe do something odd if they are mis-encoded when they are displayed again. Text that contains potentially problematic characters...

I need help fixing Broken UTF8 encoding

I am in the process of fixing some bad UTF8 encoding. I am currently using PHP 5 and MySQL. In my database I have a few instances of bad encodings that print like: î. The database collation is utf8_general_ci. PHP is using a proper UTF8 header. Notepad++ is set to use UTF8 without BOM. Database management is handled in phpMyAdmin. not al...

NSDirectoryEnumerator and unicode file paths

I am using NSDirectoryEnumerator to get all file names in a particular directory. It works fine until it encounters a Japanese file name. When I print this string (NSString) in gdb it prints a sequence of "?" question mark characters for the unicode part of the file name. If I use fileSystemRepresentationWithPath: to get a c string repre...

How can I reverse a string that contains combining characters in Perl?

I have the string "re\x{0301}sume\x{0301}" (which prints like this: résumé) and I want to reverse it to "e\x{0301}muse\x{0301}r" (émusér). I can't use Perl's reverse because it treats combining characters like "\x{0301}" as separate characters, so I wind up getting "\x{0301}emus\x{0301}er" ( ́emuśer). How can I reverse the str...
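The idea of reversing by base character plus attached marks, rather than by code point, can be sketched as follows. This is in Python rather than Perl, and it is a simplification: it only groups characters whose canonical combining class is non-zero, so it does not cover every extended grapheme cluster (ZWJ emoji sequences, for instance). The function name `reverse_with_marks` is hypothetical.

```python
import unicodedata


def reverse_with_marks(text: str) -> str:
    """Reverse text while keeping combining marks attached to their base character."""
    clusters = []
    for ch in text:
        # A non-zero combining class means ch modifies the preceding character.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


word = "re\u0301sume\u0301"
print(reverse_with_marks(word) == "e\u0301muse\u0301r")
```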

"Unicode Error "unicodeescape" codec can't decode bytes... Cannot open text files in Python 3.

I am using Python 3.1, on a Windows 7 machine. Russian is the default system language, and utf-8 is the default encoding. Looking at the answer to a previous question, I have attempted using the "codecs" module to give me a little luck. Here's a few examples: >>> g = codecs.open("C:\Users\Eric\Desktop\beeline.txt", "r", encoding="utf...
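For context: in a Python 3 string literal, `\U` begins an eight-hex-digit Unicode escape, so a literal such as "C:\Users\..." fails to compile with the "unicodeescape" error from the title before any file is opened. A small sketch of three common ways to write the same Windows path safely (the variable names are illustrative only):

```python
# Raw string: backslashes are kept literally, no escape processing.
raw_path = r"C:\Users\Eric\Desktop\beeline.txt"

# Ordinary string with each backslash doubled.
escaped_path = "C:\\Users\\Eric\\Desktop\\beeline.txt"

# Forward slashes, which Windows file APIs generally accept as well.
forward_path = "C:/Users/Eric/Desktop/beeline.txt"

print(raw_path == escaped_path)
```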

Is it possible to display superscript characters in the alert() dialog?

Is it possible to display superscripted characters (not only numbers) in the alert(), confirm() or prompt() dialogue boxes in JavaScript? For some reason I need to insert the text 2 followed by a superscripted 'n' (2^n) into JavaScript alert, confirm and prompt boxes. A quick Google search helped, but not exactly: I found a way to dis...

Check unicode in PHP

How can I check whether a character is a Unicode character or not with PHP? ...

how to insert unicode text to SQL Server from query window

I'm using the following code: INSERT INTO tForeignLanguage ([Name]) VALUES ('Араб') This value is inserted as '????'. How do I insert Unicode text from the SQL Server Management Studio query window? ...

String losing data when assigning to TStringList

Hi there. I have this method, var s : TStringList; fVar : string; begin s := TStringList.Create; fVar := ZCompressStr('text'); ShowMessage( IntToStr(length(fVar) * SizeOf(Char)) ); //24 s.text := fVar; ShowMessage( IntToStr( length(s.text) * SizeOf(Char)) ); //18 end; The ZCompressStr is from http://www.base2ti.com/zlib.htm with...

How are unicode allocated for different languages?

This seems the most confusing issue to me. How is the beginning of a new character recognized? How are the code points allocated? Let's take Chinese characters for example. What range of code points is allocated to them, and why are they allocated that way? EDIT: Please describe it in your own words, not by citation. Or could you rec...

How to read unicode characters accurately

I have a text file containing what I am told are unicode characters, for example: \320\222\320\21015-25'ish per main or \320\222\320\21020-40'ish per starter Which should read: £15-25'ish per main or £20-40'ish per main starter However, when viewing this text in Firefox, the output is mangled with various unwanted characters. So, ar...

How to get the character from unicode value in PHP?

For example, how to get the character corresponding to 010F? ...

How is transformation of code point to final character implemented in Unicode?

Characters included in the BMP are specified by 4 digits, and those characters outside of the BMP contain 5 or 6 digits. But my doubt is: how is the final character drawn from the value of the code point? Are the pictures of each character stored on each computer and, when displaying, is the matching picture just shown? Or the final glyph is a comput...

What's the complete range for Chinese characters in Unicode?

U+4E00..U+9FFF is part of the complete set, but not all ...