unicode

Javascript sorting to match SQL Server sorting

Can anyone point me towards a sorting algorithm in javascript that would sort the same way SQL Server does (for nvarchar/unicode columns)? For reference, my previous question about this behavior can be found here: http://stackoverflow.com/questions/3213717/sql-server-2008-different-sort-orders-on-varchar-vs-nvarchar-values Rather than ...

What is the _snowman param in Rails 3 forms for?

In Rails 3 (currently using Beta 4) I see that when using the form_tag or form_for helpers there is a hidden field named _snowman with the value of ☃ (Unicode U+2603, decimal 9731) showing up. So, what is this for? ...

How can I change console font?

I have a problem outputting Unicode in the Windows XP console. (Microsoft Windows XP [Version 5.1.2600]) The first code is this (from http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx): #include #include #include int main(void) { _setmode(_fileno(stdout), _O_U16TEXT); wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x67...

Python Unicode Encode Error

I'm reading and parsing an Amazon XML file, and while the XML file shows a ’, when I try to print it I get the following error: 'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128) From what I've read online thus far, the error is coming from the fact that the XML file is in UTF-8, but Python wants...
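A minimal sketch of the situation described in this excerpt, written for Python 3 rather than the Python 2 the asker was likely using. The sample string is hypothetical; it only needs to contain U+2019, the character named in the error message. It shows that the ASCII codec rejects the character while UTF-8 (or an explicit error handler) accepts it.

```python
# Hypothetical sample text containing U+2019 (right single quotation mark),
# the character reported in the error message.
text = u"Amazon\u2019s catalog"

# Encoding to ASCII fails because U+2019 is outside the 0-127 range.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds; U+2019 becomes the three bytes E2 80 99.
utf8_bytes = text.encode("utf-8")

# If the target really must be ASCII, an error handler avoids the exception
# at the cost of losing the character.
fallback = text.encode("ascii", "replace")
```

Which option fits depends on where the text is going: a terminal or file that accepts UTF-8 can take `utf8_bytes` unchanged, while the `"replace"` handler is lossy.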

Removing HTML tags from a unicode string in Python

Hey all, I have a string that I scraped from an XML file and it contains some HTML formatting tags (<b>, <i>, etc.). Is there a quick and easy way to remove all of these tags from the text? I tried str = str.replace("<b>","") and applied it several times to other tags, but that doesn't work ...
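One possible approach to the tag-stripping problem in this excerpt, sketched for Python 3 with the standard library's `html.parser` instead of repeated `str.replace` calls. The names `TagStripper` and `strip_tags` are made up for illustration, and this is not necessarily the answer the thread settled on.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text nodes and discards every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are ignored.
        self.parts.append(data)


def strip_tags(markup):
    """Return the markup with all tags removed, keeping the Unicode text."""
    stripper = TagStripper()
    stripper.feed(markup)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.parts)


cleaned = strip_tags(u"<b>bold</b> and <i>italic</i> \u2019text\u2019")
```

Letting a parser handle the tags avoids listing every tag name by hand, and non-ASCII characters pass through unchanged because the input is already a Unicode string.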

Python: Sanitize a string for unicode?

I have a string that I'm trying to make safe for the unicode() function: >>> s = " foo “bar bar ” weasel" >>> s.encode('utf-8', 'ignore') Traceback (most recent call last): File "<pyshell#8>", line 1, in <module> s.encode('utf-8', 'ignore') UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 5: ordinal not in ran...
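A sketch of what may be going on in this excerpt, written for Python 3. The byte 0x93 at position 5 matches a left curly quote in Windows-1252, so the example assumes the original byte string was cp1252-encoded; that encoding is an assumption, not something the excerpt states. The point is that bytes have to be decoded with their real encoding first, and only then encoded to UTF-8.

```python
# Assumed raw bytes: curly quotes stored as 0x93 / 0x94 (Windows-1252).
raw = b" foo \x93bar bar \x94 weasel"

# Decode with the encoding the bytes were actually written in (assumed cp1252).
text = raw.decode("cp1252")

# Once it is a Unicode string, encoding to UTF-8 is safe.
utf8_bytes = text.encode("utf-8")
```

In Python 2, calling `.encode()` on a byte string triggers an implicit ASCII decode first, which is consistent with the traceback reporting a decode error even though `encode` was called.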

How to open a file which has name with unicode symbols.

I created an .exe file and associated the .myFile extension with that .exe file. I want to double click on any .myFile file and get that file opened by the .exe. For that I have done the following: int main(int argc, char *argv[]) { QString fileName(QObject::tr(argv[1])); if ( fileName != "" ) { mainWin.loadFile(fileName);...

Unicode in MS Access using ODBC(PHP)

I am trying to insert , update some data into MS Access database using ODBC , that data came from $string=html_entity_decode($unicodedata, ENT_NOQUOTES,'UTF-8')."\n"; that look like this " ബാലകàµà´¯à´·àµà´£à´¨àµ*à´¨à ",actual data needed like this പ്രമോദ്കുമാര്‍, i have worked on this with Mysql , which has connection collatio...

unicode code table combination to support most languages

Hello I just coded the first version of an efficient glyph-to-texture function which takes ranges of unicode characters to store into one or more pov2 textures and am searching for information regarding which code charts are used in which language. I know that the Unicode Consortium gives this per glyph, but that would take really long ...

How to list subfolders of local area network shares if non-latin characters are in computer name?

I have a php script which lists workgroups/domains, computers in workgroups/domains and shares on computers. The apache is running under specific credentials (so not Local System), so it has access on LAN. Listing subfolders of the shares is working partially: only with latin characters. For example if I have a computer named ABC and a...

Python: Using .format() on a Unicode-escaped string

I am using Python 2.6.5. My code requires the use of the "more than or equal to" sign. Here it goes: >>> s = u'\u2265' >>> print s >>> ≥ >>> print "{0}".format(s) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)...
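A small sketch related to the error in this excerpt. In Python 2 the format string `"{0}"` is a byte string, so formatting a Unicode value into it forces an ASCII encode; making the format string itself Unicode avoids that. The code below uses the `u""` prefix, which Python 3 also accepts, so it is not tied to Python 2.6 specifically.

```python
s = u"\u2265"  # GREATER-THAN OR EQUAL TO

# Use a Unicode format string so no implicit ASCII encoding takes place.
formatted = u"{0}".format(s)

# Encode explicitly only at the output boundary, for example to UTF-8.
encoded = formatted.encode("utf-8")
```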

Spring MVC: How to store € character?

Hi there, I am using Spring 3 MVC and I have setup a form to capture input from a user. This form includes a textarea for a description String in my model object, Event. My corresponding controller looks like this: @RequestMapping(value = "/admin/event/{eventId}/edit", method = RequestMethod.POST) public String updateEvent(@ModelAtt...

Is there a situation where we should prefer using hex escape sequence over unicode escape sequence or vice versa?

1) Escape sequences are mostly used for character constants that either have a special meaning (such as " or \ ) or for characters that can't be represented graphically. Any character literal could be represented using hex ('\xhhhh') or unicode ('\uhhhh') escape sequences. Is there a situation where we should prefer using hex escape seq...

Why does the Java ecosystem use different character encodings throughout their software stack?

For instance class files use CESU-8 (sometimes also called MUTF-8), but internally Java first used UCS-2 and now it uses UTF-16. The specification about valid Java source files says that a minimal conforming Java compiler only has to accept ASCII characters. What's the reason for these choices? Wouldn't it make more sense to use the sam...

How to escape Unicode escapes in Groovy's /pattern/ syntax

The following Groovy commands illustrate my problem. First of all, this works (as seen on lotrepls.appspot.com) as expected (note that \u0061 is 'a'). >>> print "a".matches(/\u0061/) true Now let's say that we want to match \n, using the Unicode escape \u000A. The following, using "pattern" as a string, behaves as expected: >>> pri...

How can I find a single occurrence of a non-latin character using regular expressions?

I am using a regular expression to see if there is a single non-latin character within a string. $latin_check = '/[\x{0030}-\x{007f}]/u'; //This is looking for only latin characters if(preg_match($latin_check, $_POST['full_name'])) { $error = true; } This should be checking to see if there is at least one character pres...
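A sketch of the check this excerpt describes, in Python rather than the PHP of the question. A test for "at least one character outside a range" is usually written with a negated character class. The example treats "Latin" as plain ASCII (U+0000 to U+007F), which is an assumption based on the range in the excerpt; accented Latin letters such as é therefore count as non-ASCII here. The names `NON_ASCII` and `has_non_ascii` are hypothetical.

```python
import re

# Matches any single character outside the ASCII range U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def has_non_ascii(name):
    """Return True if the string contains at least one non-ASCII character."""
    return NON_ASCII.search(name) is not None


plain = has_non_ascii("John Smith")
accented = has_non_ascii(u"Jos\u00e9")
cyrillic = has_non_ascii(u"\u0418\u0432\u0430\u043d")
```

If the goal is to accept accented Latin letters and reject only other scripts, the excluded range would have to be widened; what counts as "Latin" is a decision the excerpt leaves open.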

SQL Inserting multilingual data - loses diacritic marks etc

Inserting multilingual data into a SQL 2008 database (nvarchar field) I notice that it seems to lose some special character marks. e.g. INSERT INTO [dbName].[dbo].[tbl_Question_i18n] ([QuestionId] ,[LanguageId] ,[QuestionText]) VALUES (@lastinsertedquestionid ...

Programming Languages that Make Use of Special Characters

I'm working on a general-purpose programming language. In addition to the modern requirement of Unicode support in strings and identifiers, I'm considering supplying alternate spellings of some operators, specifically: Relational (≤ ≥ ≠ for <= >= !=) Bitwise and Setwise (∩ ∪ for & |) Logical (∧ ∨ ¬ for && || !) Arrows (→ ⇒ for -> =>) I know th...

Webkit GTK Non English Text

I'm working on an app which involves the use of webkitgtk. It works fine except non-english characters. Example, webkitgtk widget does not render the following Russian text correctly. Пишу в English, значит все в порядке. Спасибо! It rather display this, Пишу в English, значит вÑе в порÑдке. СпаÑиÐ...

How do I get STL std::string to work with unicode on windows?

Hello All, At my company we have a cross-platform (Linux & Windows) library that contains our own extension of the STL std::string; this class provides all sorts of functionality on top of the string: split, format, to/from base64, etc. Recently we were given the requirement of making this string Unicode "friendly"; basically it needs to s...