questions about unicode | ansaurus

unicode

Short Unicode \N{} names for Latin-1 characters in Python?

Are there short Unicode u"\N{...}" names for Latin-1 characters in Python? \N{A umlaut} etc. would be nice; \N{LATIN SMALL LETTER A WITH DIAERESIS} etc. is just too long to type every time. (Added:) I use an English keyboard, but occasionally need German letters, as in "Löwenbräu Weißbier". Yes, one can cut-paste them singly, L cutpaste ö...
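
Python has no built-in short aliases such as \N{A umlaut}. A minimal workaround sketch, assuming a hand-made alias table (SHORT_NAMES and u() are hypothetical names, not part of Python), expands short names into the official ones with unicodedata.lookup():

```python
import unicodedata

# Hypothetical alias table: Python's \N{...} escape does NOT accept these
# short names; they are our own, expanded to official Unicode names.
SHORT_NAMES = {
    "a umlaut": "LATIN SMALL LETTER A WITH DIAERESIS",
    "o umlaut": "LATIN SMALL LETTER O WITH DIAERESIS",
    "u umlaut": "LATIN SMALL LETTER U WITH DIAERESIS",
    "sharp s": "LATIN SMALL LETTER SHARP S",
}


def u(short_name: str) -> str:
    """Return the character for one of the short aliases above."""
    return unicodedata.lookup(SHORT_NAMES[short_name.lower()])


beer = "L" + u("o umlaut") + "wenbr" + u("a umlaut") + "u Wei" + u("sharp s") + "bier"
print(ascii(beer))  # 'L\xf6wenbr\xe4u Wei\xdfbier'
```

Because every Latin-1 character has a code point below U+0100, the two-digit escapes that ascii() prints ("\xf6" for ö, "\xdf" for ß) are another short, if less readable, option.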

ToAscii/ToUnicode in a keyboard hook destroys dead keys.

It seems that if you call ToAscii() or ToUnicode() while in a global WH_KEYBOARD_LL hook, and a dead-key is pressed, it will be 'destroyed'. For example, say you've configured your input language in Windows as Spanish, and you want to type an accented letter á in a program. Normally, you'd press the single-quote key (the dead key), then...

Converting C-Strings from Local Encoding to UTF-8

I'm writing a small app in which I read some text from the console, which is then stored in a classic char* string. As it happens, I need to pass it to a lib which only takes UTF-8 encoded strings. Since the Windows console uses the local encoding, I need to convert from the local encoding to UTF-8. If I'm not mistaken I could use MultiByteTo...
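
On Windows the conversion is usually two steps: MultiByteToWideChar from the console's code page (for example the value returned by GetConsoleCP()) to UTF-16, then WideCharToMultiByte with CP_UTF8. The same two steps, sketched in Python rather than the Win32 C API and assuming the console uses code page 850 (local_to_utf8 is a hypothetical helper):

```python
def local_to_utf8(raw: bytes, codepage: str = "cp850") -> bytes:
    """Re-encode bytes from a legacy code page as UTF-8."""
    # Step 1, the role MultiByteToWideChar plays in C: decode the
    # code-page bytes into Unicode text.
    text = raw.decode(codepage)
    # Step 2, the role WideCharToMultiByte with CP_UTF8 plays: encode
    # that text as UTF-8 for the library.
    return text.encode("utf-8")


console_bytes = b"L\x94wenbr\x84u"   # "Löwenbräu" in code page 850
print(local_to_utf8(console_bytes))  # b'L\xc3\xb6wenbr\xc3\xa4u'
```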

What to do with Unicode non-awareness in PHP < 6?

I'm working on a project which needs to be Unicode aware. PHP provides a bunch of useful functions like str_word_count() to calculate the number of words in some input, but they won't work against UTF-8 data in PHP < 6, which is a shame. The same applies to strlen(), strrev(), etc. What should I do about this? PHP 6 is still not even out ...
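
The underlying mismatch is that these functions count and reorder bytes, while UTF-8 spends two or more bytes on every non-ASCII character. A short Python sketch (not PHP; both helper names are hypothetical) makes the difference visible; in PHP, the mbstring extension's functions such as mb_strlen($s, 'UTF-8') count characters instead of bytes:

```python
def utf8_lengths(s: str):
    """Return (byte count, character count) for s encoded as UTF-8."""
    return len(s.encode("utf-8")), len(s)


def bytewise_reverse_is_valid(s: str) -> bool:
    """Reverse the UTF-8 bytes, as a byte-oriented strrev() would, and
    report whether the result is still valid UTF-8."""
    try:
        s.encode("utf-8")[::-1].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_lengths("Löwenbräu"))               # (11, 9)
print(bytewise_reverse_is_valid("Löwenbräu"))  # False
```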

Unrecognized extra characters in file parsed with php

I've got a csv file I'm parsing with PHP. (Actually, it's tab-separated.) In a text editor, the file looks like this: Object Id Page/Master Id Page/Master Name ... Using this code: $f = file_get_contents($filepath); echo $f; I get this in the browser: ��O�b�j�e�c�t� �I�d� �P�a�g�e�/�M�a�s�t�e�r� �I�d� �P�a�g�e�/�M�a�s�t�e�r� �N�...
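
The browser output (two junk characters up front, then a junk character between every letter) suggests the file is UTF-16 with a byte-order mark (FF FE): each ASCII letter is followed by a 00 byte. A minimal Python sketch of BOM sniffing, assuming the file uses one of the common Unicode encodings (decode_with_bom is a hypothetical helper), turns such bytes back into text:

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_with_bom(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw bytes using the codec implied by a leading BOM, if any."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            # The "utf-32", "utf-16" and "utf-8-sig" codecs consume the BOM.
            return raw.decode(codec)
    return raw.decode(fallback)


sample = codecs.BOM_UTF16_LE + "Object Id\tPage/Master Id".encode("utf-16-le")
print(repr(decode_with_bom(sample)))  # 'Object Id\tPage/Master Id'
```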

How can I guess the encoding of a string in Perl?

I have a Unicode string and don't know what its encoding is. When this string is read by a Perl program, is there a default encoding that Perl will use? If so, how can I find out what it is? I am trying to get rid of non-ASCII characters from the input. I found this on some forum that will do it: my $line = encode('ascii', normalize('KD'...
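
The forum snippet relies on compatibility decomposition (NFKD) followed by an ASCII encode that discards whatever is left over. The same idea as a Python sketch (not a translation of the Perl code, which is cut off above; to_ascii is a hypothetical helper):

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate text in ASCII by stripping accents."""
    # NFKD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT and folds
    # compatibility characters; encoding with errors="ignore" then drops
    # every remaining non-ASCII code point, including those accents.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Löwenbräu café"))  # Lowenbrau cafe
print(to_ascii("Weißbier"))        # Weibier
```

Note the second result: characters with no decomposition, such as ß, are silently dropped rather than transliterated.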

Firefox setting to allow finding accented or other Unicode characters using a non-accented search term?

Howdy, I'm generating UTF-8 encoded web content that includes characters using diacritical marks, typically "accented" characters, e.g. "é". Firefox's Find (find in page) function requires that such characters be typed in order to find them, which makes sense, but makes for a usability problem. This is tricky for users who don't know ...

character-encoding

SendInput() and non-English characters and keyboard layouts.

I'm having trouble simulating character keypresses when a non-English input keyboard language is being used in Windows. I tried calling SendInput() with KEYEVENTF_UNICODE: KEYBDINPUT ki; INPUT input; int character = 0; ki.wVk = 0; ki.wScan = character; ki.dwFlags = KEYEVENTF_UNICODE; ki.time = 0; ki.dwExtraInfo = 0; input.type = IN...

keyboard-events

keyboard-layout

Ruby Unicode Question

I have a small plugin for Rails where I add permalinks to my model without storing them (the permalink) in the database (http://github.com/nhocki/make_permalink). I forked the plugin from a friend and changed the regex, but I don't really know how to make a more friendly and readable regex. I want to remove all the á, é, í (characters ...

ruby-on-rails-plugins
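
For the permalink question above: rather than listing every accented letter in the regex, one can decompose the string, drop the combining accents, and then apply a simple [^a-z0-9] rule. A Python sketch of that approach (the plugin itself is Ruby; permalink is a hypothetical helper):

```python
import re
import unicodedata


def permalink(title: str) -> str:
    """Build a URL slug: accents removed, lowercase, words joined by hyphens."""
    # Decompose accented letters ("á" -> "a" + U+0301) and keep only the
    # base characters by skipping combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of characters outside [a-z0-9] into one hyphen,
    # then trim hyphens at either end.
    return re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")


print(permalink("Canción de Año Nuevo!"))  # cancion-de-ano-nuevo
```

Letters without a decomposition (ß, for instance) fall outside [a-z0-9] and become separators, so "Weißbier" yields "wei-bier".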

File names with Japanese characters turn to garbage when written to a zip file using java.util.zip.*

I have a directory with a name that contains Japanese characters, and I need to use the zip utils in java.util.zip to write it to a zip file. Writing the zip file succeeds, but when I open the resulting zip file with either Windows' built-in compressed file utility or 7-Zip, the directory with Japanese characters in the name appears as ...

internationalization
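
For the zip question above: the zip format marks UTF-8 member names with bit 11 (0x800) of the general-purpose bit flag; when a reader does not see that flag, it generally decodes the name with a legacy code page, which garbles Japanese. A Python sketch (not Java) that writes a non-ASCII name and checks the flag; on the Java side, Java 7 added a ZipOutputStream(OutputStream, Charset) constructor, and what a given JDK writes is worth checking against its documentation:

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11, the "language encoding flag"
NAME = "日本語/readme.txt"


def zip_one_member(name: str, payload: bytes) -> bytes:
    """Write a single member into an in-memory zip and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # For a non-ASCII name, zipfile stores it as UTF-8 and sets bit 11.
        zf.writestr(name, payload)
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(zip_one_member(NAME, b"hello"))) as zf:
    info = zf.infolist()[0]

print(info.filename == NAME, bool(info.flag_bits & UTF8_NAME_FLAG))  # True True
```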

How can I escape '\xff\xfe' to a readable string?

I see a string in this code: data[:2] == '\xff\xfe'. I don't know what '\xff\xfe' is, so I want to escape it, but without success: import cgi print cgi.escape('\xff\xfe') #print \xff\xfe How can I get it? Thanks ...
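
'\xff\xfe' is not escaped text but two raw bytes, FF FE, which form the UTF-16 little-endian byte-order mark; a check like data[:2] == '\xff\xfe' tests whether the data starts with one. A small Python 3 sketch, where such data is a bytes object:

```python
import codecs

prefix = b"\xff\xfe"

# The two bytes are the UTF-16 little-endian byte-order mark.
print(prefix == codecs.BOM_UTF16_LE)  # True
# A readable hex rendering of the raw bytes.
print(prefix.hex())                   # fffe
# The "utf-16" codec uses the BOM to pick the byte order, then drops it.
print(b"\xff\xfeh\x00i\x00".decode("utf-16"))  # hi
```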

What does python print() function actually do?

I was looking at this question and started wondering what print actually does. I have never found out how to use string.decode() and string.encode() to get a Unicode string "out" in the Python interactive shell in the same format as print does. No matter what I do, I get either UnicodeEncodeError or the escaped string wit...

academic-interest
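
For the print() question above, in Python 3: print() converts each argument with str(), joins them with sep, appends end, and writes that text to sys.stdout; the stream then encodes it with its own encoding (sys.stdout.encoding), which is where a UnicodeEncodeError comes from when the console cannot represent a character. A rough, hypothetical re-implementation (rough_print, which ignores print()'s flush argument) makes those steps visible:

```python
import io
import sys


def rough_print(*args, sep=" ", end="\n", file=None):
    """Approximate Python 3's print(): str() each argument, join, write."""
    stream = sys.stdout if file is None else file
    # print() works with text; it never calls .encode() itself.
    stream.write(sep.join(str(arg) for arg in args) + end)


# The text stream, not print(), turns text into bytes, using its encoding.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
rough_print("Löwenbräu", 42, file=out)
out.flush()
print(raw.getvalue())  # b'L\xc3\xb6wenbr\xc3\xa4u 42\n'
```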

Accessing Unicode Telugu text from an MS Access database in Java

I have an MS Access database (an English-Telugu dictionary database) which contains a table storing English words and their Telugu meanings. I am writing a dictionary program in Java which queries the database for a keyword entered by the user and displays the Telugu meaning. My program is working fine till I get the data from the database, b...

Unicode Spreadsheet to MySQL

I am trying to get a document from Excel or OpenOffice that contains UTF-16 characters into MySQL, but I can't find the best way to export the document so that phpMyAdmin will read it. I can export out of NeoOffice in "Unicode" format, but the closest option in MySQL is ucs2. When I try that format, it just spins and thinks. UTF-8 doesn't...

How to convert a hexadecimal string to a corresponding integer in C++?

I have a Unicode mapping stored in a file, like the tab-delimited line below: a 0B85 0 0B85 The second column is a Unicode character. I want to convert it to 0x0B85, which is to be stored in an int variable. How do I do it? ...
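
Parsing the column is a base-16 integer conversion; in C++ that is typically std::stoi(s, nullptr, 16) or strtol(s, NULL, 16). The same step sketched in Python, assuming the fields are separated by single tabs (parse_mapping_line is a hypothetical helper):

```python
def parse_mapping_line(line: str):
    """Split one tab-delimited mapping line and parse its second column as hex."""
    fields = line.rstrip("\n").split("\t")
    # int(text, 16) accepts "0B85" with or without a "0x" prefix.
    code_point = int(fields[1], 16)
    return fields, code_point


fields, cp = parse_mapping_line("a\t0B85\t0\t0B85\n")
print(cp, hex(cp), ascii(chr(cp)))  # 2949 0xb85 '\u0b85'
```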

Best way to decode hex sequence of unicode characters to string

Hi! I'm working with C# .NET. I would like to know how to convert a string in Unicode escape form like "\u1D0EC" (note that it's above "\uFFFF") to its symbol... "" Thanks in advance! ...
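
In C#, the \u escape takes exactly four hex digits, so "\u1D0EC" is parsed as U+1D0E followed by the letter C. A code point above U+FFFF needs the eight-digit form "\U0001D0EC", or char.ConvertFromUtf32(0x1D0EC) at run time; either way the .NET string holds two UTF-16 code units, a surrogate pair. A sketch of that pair computation, written in Python (to_surrogates is a hypothetical helper):

```python
def to_surrogates(code_point: int):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000     # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogates(int("1D0EC", 16))
print(hex(high), hex(low))  # 0xd834 0xdcec
```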

Convert double-byte numbers and spaces in filenames to ASCII

Given a directory of filenames consisting of double-byte/full-width numbers and spaces (along with some half-width numbers and underscores), how can I convert all of the numbers and spaces to single-byte characters? For example, this filename consists of a double-byte number, followed by a double-byte space, followed by some single-byte...
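
Assuming the "double-byte" characters are the Unicode full-width digits U+FF10 to U+FF19 and the ideographic space U+3000 (the usual Unicode mapping of Shift-JIS double-byte digits and space), a Python 3 sketch with str.translate (NARROW, narrow_name and narrow_directory are hypothetical names). unicodedata.normalize("NFKC", name) would also narrow them, but it changes other compatibility characters too, such as half-width katakana:

```python
import os

# FULLWIDTH DIGIT ZERO..NINE (U+FF10..U+FF19) -> "0".."9",
# IDEOGRAPHIC SPACE (U+3000) -> ordinary space.
NARROW = {0xFF10 + i: ord("0") + i for i in range(10)}
NARROW[0x3000] = ord(" ")


def narrow_name(name: str) -> str:
    """Replace full-width digits and the ideographic space with ASCII ones."""
    return name.translate(NARROW)


def narrow_directory(directory: str) -> None:
    """Rename every entry in directory whose name changes under narrow_name."""
    for old in os.listdir(directory):
        new = narrow_name(old)
        target = os.path.join(directory, new)
        # Skip names whose ASCII form already exists so nothing is overwritten.
        if new != old and not os.path.exists(target):
            os.rename(os.path.join(directory, old), target)
```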

Is there a way to tell if a unicode character is a control, alpha, numeric or symbolic?

Assuming all you have is the binary data and no pre-canned functions, is there a pattern or algorithm to categorize the type of character? ...

language-agnostic

string-manipulation
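
For the character-category question above: there is no arithmetic pattern in the code point values that yields the category. Each code point's General_Category is listed in the Unicode Character Database (UnicodeData.txt), so code without library support has to embed some form of that table. A Python sketch that leans on unicodedata, which wraps the same table, to show the classification itself (MAJOR_CLASSES and classify are hypothetical names):

```python
import unicodedata

# First letter of the Unicode General_Category -> coarse class.
MAJOR_CLASSES = {
    "L": "alpha",
    "N": "numeric",
    "S": "symbol",
    "P": "punctuation",
    "Z": "separator",
    "M": "mark",
    "C": "other",
}


def classify(ch: str) -> str:
    """Coarse class of a single character, looked up from the UCD."""
    category = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Cc", "Sm"
    if category == "Cc":
        return "control"
    return MAJOR_CLASSES[category[0]]


print([classify(c) for c in "A5$\n"])  # ['alpha', 'numeric', 'symbol', 'control']
```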

What happens to Unicode in a System.Data.SQLCommand

Hello all. I have a SQLCommand : "Update Customers Set Name = @name where code = @code" and this code: cmd.Parameters[0].Value = "بهروز";//(some Unicode characters) cmd.Parameters[1].Value = 1; cmd.ExecuteNonQuery(); or this code: UpdateCommand.CommandText = "UPDATE [Customers] SET [Name] = @p1 WHERE (([Code]...

Python Hebrew input/filesystem format

import os import pprint import subprocess def Convert (dir): curDir = dir pathToBonk = "C:\\Program Files\\BonkEnc\\becmd.exe" #Where the becmd.exe file lives problemFiles = [] #A list of files that failed conversion # for item in os.listdir(curDir): if item.upper().endswith('.M4A'): fullPath = os...
