Given a text file on Ubuntu (or Debian/Unix in general), how do I find out the file's encoding? Can I run od or hexdump on it to fingerprint its encoding? What should I be looking out for?
...
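One thing a hex dump can reveal is a byte-order mark at the start of the file (EF BB BF for UTF-8, FF FE or FE FF for UTF-16). The following is a minimal sketch of that kind of fingerprinting in Python; the function name `guess_encoding` and the `unknown-8bit` label are made up for illustration, and the result is only a guess, because legacy 8-bit encodings such as ISO-8859-1 cannot be told apart from the bytes alone.

```python
import codecs

# Byte-order marks to look for at the start of the data. UTF-32 comes first
# so that its LE mark (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: return a best guess at the encoding of `data`."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("ascii")      # only bytes below 0x80
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")      # well-formed multi-byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"     # some legacy code page; cannot say which
```

To try it on a file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in.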
I'm trying to handle the following character: ⨝ (http://www.fileformat.info/info/unicode/char/2a1d/index.htm)
If you check whether an empty string starts with this character, it always returns true, which does not make any sense! Why is that?
// visual studio 2008 hides lines that have this char literally (bug in visual studio?!?) so ...
How can I create a regular expression in Rails for Unicode characters?
...
What should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still matter in specific conditions?
Is the character set related to geographic region?
Edit:
Is there any benefit to putting this code @charset "utf-8";
or like this <link type="text/css; charset=utf-8" rel="stylesheet" href=".." />
at the t...
What is the benefit of adding @charset "ISO-8859-15"; or @charset "utf-8"; at the top of a CSS file?
...
I want to execute a MySQL query right after connecting to the database to enable UTF-8:
SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'
and I want an answer either for a specific model or for the whole application
...
I am using a shared hosting service to host my site, so I can't get direct access to the PHP configuration or install any extensions. My problem is with UTF-8 strings, which can't be processed by the standard PHP string functions, and I don't have the mbstring extension installed on the server. I am looking for another way to deal with Unicode strin...
An application I am working on stores data in an INI file. The application creates the INI file which in turn will be read by another application we also created. The INI file may also be hand edited.
It is likely sooner or later that the INI file will contain different languages so we were careful to ensure that all data used in thi...
I'm working with UTF-8 strings. I need to get a slice using byte-based indexes, not char-based.
I found references on the web to String#subseq, which is supposed to be like String#[], but for bytes. Alas, it seems not to have made it to 1.9.1.
Now, why would I want to do that? There's a chance I'll end up with an invalid string should ...
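The risk with byte-based slicing is that a cut can land in the middle of a multi-byte UTF-8 sequence. As a rough illustration only, here is the idea sketched in Python rather than Ruby; `byte_slice` is a hypothetical helper, and dropping the broken edge bytes is just one possible policy (raising an error or widening the slice are others).

```python
def byte_slice(text: str, start: int, end: int) -> str:
    """Hypothetical helper: slice `text` by UTF-8 byte offsets."""
    raw = text.encode("utf-8")[start:end]
    # errors="ignore" silently drops a partial multi-byte sequence at either
    # edge of the slice instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="ignore")


# "aбc" is 4 bytes in UTF-8: 61 D0 B1 63
print(byte_slice("aбc", 0, 3))  # aб
print(byte_slice("aбc", 0, 2))  # a   (the lone D0 lead byte is dropped)
print(byte_slice("aбc", 2, 4))  # c   (the lone B1 continuation byte is dropped)
```

For Ruby itself, later releases (1.9.3 onward) added String#byteslice, which slices by byte offsets directly.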
If I save a text file containing the character б (U+0431) as an ANSI code page file, Ruby returns ord = 63.
Saving the file as UTF-8 returns ord = 208, 177.
Should I be specifically telling Ruby to handle the input encoded with a certain code page? If so, how do you do this?
...
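The two results are consistent with what the encodings themselves produce: 208, 177 are the two UTF-8 bytes of б, and 63 is "?", the usual substitute when a code page has no slot for the character. A small Python sketch of that, assuming the "ANSI" code page in question was Windows-1252 (an assumption; the question does not say which one was used):

```python
ch = "\u0431"  # б, CYRILLIC SMALL LETTER BE

# UTF-8 needs two bytes for this character.
utf8_bytes = list(ch.encode("utf-8"))

# Windows-1252 (assumed here to be the "ANSI" code page) has no б, so the
# encoder substitutes "?" when asked to replace unencodable characters.
ansi_bytes = list(ch.encode("cp1252", errors="replace"))

# A Cyrillic code page such as Windows-1251 does have a single byte for it.
cyr_bytes = list(ch.encode("cp1251"))

print(utf8_bytes)  # [208, 177]
print(ansi_bytes)  # [63]
print(cyr_bytes)   # [225]
```

If the file really contains byte 63, the character was already lost when the file was saved, so no setting on the reading side can recover it.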
I am using getenv("HOME") in C to get the user's home directory in order to read/write a settings file. But is it possible that the home directory filename could contain characters that cannot be represented as an 8-bit char (for example, Unicode or UTF-8 encoded)?
Does this differ for various varieties of Linux and *BSD?
Thanks in adv...
How can I find the address of a WndProc (of a window in another process)? Even if I inject a DLL and try to find it with either GetClassInfoEx() or GetWindowLong() or GetWindowLongPtr(), I always get values like 0xffff08ed, which is definitely not an executable address. It is according to MSDN: "... the address of the window procedure, or...
Why does the following occur:
>>> u'\u0308'.encode('mbcs') #UMLAUT
'\xa8'
>>> u'\u041A'.encode('mbcs') #CYRILLIC CAPITAL LETTER KA
'?'
>>>
I have a Python application accepting filenames from the operating system. It works for some international users, but not others.
For example, this Unicode filename:
u'\u041a\u0433\u044b\u044...
So Matz took the questionable decision to keep upcase and downcase limited to /[A-Z]/i in Ruby 1.9.1.
ActiveSupport::Multibyte has long had great i18n case jiggering in Ruby 1.8.x via String#mb_chars.
However, when tried under Ruby 1.9.1, it doesn't seem to work. Here's a simple test script I wrote, along with the output I'm getting:
...
Is it possible to detect if the client supports a particular Unicode character or if it will be rendered as a missing glyph box?
Important: Support in as many browsers as possible
Not important: Efficiency, speed, or elegance
The only method I can think of trying is using a canvas, so I figured I'd ask before I start going down that r...
I have a feature in my program where the user can upload a CSV file, which my program goes through and uses as input. One user is complaining that his input throws an error. The error is caused by an illegal character that is encoded incorrectly. The character is below:
�
Sometimes it appears as a di...
Using Python 2.5.2 on Debian Linux, I'm trying to get the content from a Spanish URL that contains a Spanish character ('í'):
import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url).read()
I'm getting this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range...
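A URL sent over HTTP has to be ASCII, so a non-ASCII path is normally percent-encoded before the request is made. Below is a minimal sketch using Python 3's urllib.parse (not the Python 2.5 API from the question); `quote_path` is a hypothetical helper, and it assumes the server expects the path percent-encoded as UTF-8, which is common but not guaranteed (some older servers expect Latin-1 instead).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_path(url: str) -> str:
    """Hypothetical helper: percent-encode only the path part of `url`."""
    parts = urlsplit(url)
    # quote() encodes non-ASCII characters as UTF-8 by default and leaves
    # "/" untouched, so the path structure is preserved.
    return urlunsplit(parts._replace(path=quote(parts.path)))


print(quote_path("http://mydomain.es/índice.html"))
# http://mydomain.es/%C3%ADndice.html
```

The resulting string is plain ASCII and can be passed to the URL-opening function without triggering an implicit ASCII encode.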
I am confused about Windows BSTRs and WCHARs, etc. WCHAR is a 16-bit character intended to allow for Unicode characters. What about characters that take more than 16 bits to represent? Some UTF-8 chars require more than that. Is this a limitation of Windows?
Edit: Thanks for all the answers. I think I understand the Unicode aspec...
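For reference, UTF-16 represents code points above U+FFFF as a pair of 16-bit units (a surrogate pair), so a single character can occupy two 16-bit slots. A short Python sketch that lists the 16-bit units of a string; `utf16_units` is a name made up for this illustration:

```python
import struct


def utf16_units(text: str) -> list:
    """Hypothetical helper: return the 16-bit UTF-16 code units of `text`."""
    raw = text.encode("utf-16-le")
    # Unpack the little-endian byte string as unsigned 16-bit integers.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


print([hex(u) for u in utf16_units("A")])           # ['0x41']
print([hex(u) for u in utf16_units("\U0001D11E")])  # ['0xd834', '0xdd1e']
```

U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range, so it comes out as the surrogate pair D834 DD1E.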
What CFML will replace � with another character of my choice?
...
Hi!
I would like to display some Arabic text in a LabelField in a J2ME app on a BlackBerry device.
Assume that an Arabic font is installed on the device.
In the localization resources, if the Arabic locale is used, all text is saved as Unicode escape sequences. But even if I use that format explicitly, and also set the Arabic locale, it doesn't work:
Locale....