unicode

I exported via mysqldump to a file. How do I find out the encoding of the file?

Given a text file on Ubuntu (or Debian, or Unix in general), how do I find out its encoding? Can I run od or hexdump on it to fingerprint the encoding? What should I be looking out for? ...
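The fingerprinting idea in this question can be sketched in Python: look at the first bytes for a byte-order mark, then test whether the data decodes as ASCII or UTF-8. This is only a heuristic under stated assumptions; the function name `guess_encoding` is hypothetical, and a file without a BOM that is not valid UTF-8 cannot be identified reliably this way.

```python
import codecs

# Byte-order marks, longest first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: return a best-guess encoding label for raw bytes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Bytes above 0x7F that are not valid UTF-8: some legacy
        # 8-bit encoding, which cannot be told apart from bytes alone.
        return "unknown 8-bit encoding"
```

To try it on a dump, read the file in binary mode, for example `guess_encoding(open(path, "rb").read())`, where `path` is whatever file mysqldump wrote.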

string.Empty.StartsWith(((char)10781).ToString()) always returns true?

I am trying to handle the following character: ⨝ (http://www.fileformat.info/info/unicode/char/2a1d/index.htm). If you check whether an empty string starts with this character, it always returns true, which does not make any sense! Why is that? // Visual Studio 2008 hides lines that contain this character literally (a bug in Visual Studio?!) so ...

Unicode regular expressions in Rails

How can I create a regular expression in Rails for Unicode characters? ...

ISO-8859-1 vs UTF-8?

Which should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still matter in specific conditions? Is the character set related to geographic region? Edit: Is there any benefit to putting this code @charset "utf-8"; or like this <link type="text/css; charset=utf-8" rel="stylesheet" href=".." /> at the t...

What is the benefit of adding @charset "ISO-8859-15"; at the top of a CSS file?

What is the benefit of adding @charset "ISO-8859-15"; or @charset "utf-8"; at the top of a CSS file? ...

Execute a query right after connection in CakePHP

I want to execute a MySQL query right after connecting to the database to enable UTF-8: SET NAMES 'utf-8' COLLATE 'utf8_unicode_ci'. I want an answer either for a specific model or for the whole application ...

How can I deal with Unicode in PHP without the mbstring extension?

I am using a shared hosting service to host my site, so I can't get direct access to the PHP configuration or install any extension. My problem is with UTF-8 strings that can't be processed by the standard PHP string functions, since I don't have the mbstring extension installed on the server. I am looking for another way to deal with unicode strin...

Unicode / Non-Unicode / UTF-8 Problems

An application I am working on stores data in an INI file. The application creates the INI file which in turn will be read by another application we also created. The INI file may also be hand edited. It is likely sooner or later that the INI file will contain different languages so we were careful to ensure that all data used in thi...

ruby 1.9: how do I get a byte-index-based slice of a String?

I'm working with UTF-8 strings. I need to get a slice using byte-based indexes, not char-based. I found references on the web to String#subseq, which is supposed to be like String#[], but for bytes. Alas, it seems not to have made it to 1.9.1. Now, why would I want to do that? There's a chance I'll end up with an invalid string should ...

Does Ruby auto-detect a file's codepage?

If I save a text file containing the character б (U+0431) as an ANSI code page file, Ruby returns ord = 63. Saving the file with UTF-8 as the code page returns ord = 208, 177. Should I be specifically telling Ruby to handle the input as encoded with a certain code page? If so, how do you do this? ...
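The two byte values reported here can be reproduced outside Ruby. The sketch below uses Python rather than Ruby, and assumes the "ANSI" code page is Windows-1252, which has no slot for б; the question does not say which code page was actually used.

```python
ch = "\u0431"  # CYRILLIC SMALL LETTER BE

# UTF-8 encodes U+0431 as two bytes.
utf8_bytes = list(ch.encode("utf-8"))

# Assumption: "ANSI" means cp1252. The character is not representable,
# so the encoder substitutes "?" (byte 63) when asked to replace.
ansi_bytes = list(ch.encode("cp1252", errors="replace"))

print(utf8_bytes)  # [208, 177]
print(ansi_bytes)  # [63]
```

So the 63 is a literal question mark written at save time, which suggests the character was already lost before any reader saw the file.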

Linux/Unix: Non-ASCII characters in home directory?

I am using getenv("HOME") in C to get the user's home directory in order to read/write a settings file. But is it possible that the home directory filename could contain characters that cannot be represented as an 8 bit char? (for example, unicode or UTF-8 encoded) Does this differ for various varieties of Linux and *BSD? Thanks in adv...

Finding WndProc Address

How can I find the address of a WndProc (of a window of another process). Even if I inject a DLL and try to find it with either GetClassInfoEx() or GetWindowLong() or GetWindowLongPtr() I always get values like 0xffff08ed, which is definitely not an executable address. It is according to MSDN: "... the address of the window procedure, or...

Unicode filenames on Windows with Python & subprocess.Popen()

Why does the following occur: >>> u'\u0308'.encode('mbcs') #UMLAUT '\xa8' >>> u'\u041A'.encode('mbcs') #CYRILLIC CAPITAL LETTER KA '?' >>> I have a Python application accepting filenames from the operating system. It works for some international users, but not others. For example, this unicode filename: u'\u041a\u0433\u044b\u044...

Ruby 1.9: how to properly upcase/downcase multibyte strings?

So matz took the questionable decision to keep upcase and downcase limited to /[A-Z]/i in ruby 1.9.1. ActiveSupport::Multibyte has long had great i18n case jiggering in ruby 1.8.x via String#mb_chars. However, when tried under ruby 1.9.1, it doesn't seem to work. Here's a simple test script I wrote, along with the output I'm getting: ...

Detecting individual Unicode character support with JavaScript

Is it possible to detect if the client supports a particular Unicode character or if it will be rendered as a missing glyph box? Important: support in as many browsers as possible. Not important: efficiency, speed, or elegance. The only method I can think of trying is using a canvas, so I figured I'd ask before I start going down that r...

How to completely sanitize a string of illegal characters in python?

My program has a feature where the user can upload a CSV file, which the program goes through and uses as input. One user is complaining that his input throws an error. The error is caused by an illegal character that is encoded incorrectly. The character is below: � Sometimes it appears as a di...

Can't open Unicode URL with Python

Using Python 2.5.2 on Debian Linux, I'm trying to get the content from a Spanish URL that contains a Spanish character ('í'): import urllib url = u'http://mydomain.es/índice.html' content = urllib.urlopen(url).read() I'm getting this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range...
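One common way around this kind of error is to percent-encode the non-ASCII part of the path before opening the URL. The sketch below uses Python 3's `urllib.parse` rather than the Python 2.5 `urllib` in the question, and assumes the server expects UTF-8 in the path and that the host name is already ASCII; `to_ascii_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII characters in the path."""
    parts = urlsplit(url)
    # quote() encodes to UTF-8 by default; keep "/" and existing "%" escapes.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(to_ascii_url("http://mydomain.es/índice.html"))
# http://mydomain.es/%C3%ADndice.html
```

If the server expects Latin-1 instead, `quote(parts.path, safe="/%", encoding="latin-1")` would produce `%ED` for the same character.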

Can BSTRs hold characters that take more than 16 bits to represent?

I am confused about Windows BSTRs, WCHARs, etc. WCHAR is a 16-bit character intended to allow for Unicode characters. What about characters that take more than 16 bits to represent? Some UTF-8 chars require more than that. Is this a limitation of Windows? Edit: Thanks for all the answers. I think I understand the Unicode aspec...

How do I find/replace � (unicode replacement character) in ColdFusion?

What CFML will replace � with another character of my choice? ...

BlackBerry - Unicode text display

Hi! I would like to display some Arabic text in a LabelField in a J2ME app on a BlackBerry device. Presume that an Arabic font is installed on the device. In the localization resources, if the Arabic locale is used, all text is saved as Unicode sequences. But even if I use such a format explicitly, also setting the Arabic locale, it's not working: Locale....