unicode

python-re: How do I match an alpha character?

How can I match an alpha character with a regular expression? I want a character that is in \w but not in \d. I want it to be Unicode-compatible, which is why I cannot use [a-zA-Z]. ...
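
A minimal Python 3 sketch, assuming `str` patterns (where `\w` and `\d` are Unicode-aware by default). The class `[^\W\d_]` reads as "not a non-word character, not a decimal digit, not an underscore", which leaves letters. One caveat: Python's `\w` may also accept non-decimal numerals such as superscript digits, which this class would let through, so `str.isalpha()` is shown as the stricter alternative. The helper names are illustrative only.

```python
import re
from typing import List

# [^\W\d_]: a word character (\w) that is neither a decimal digit (\d) nor "_".
# With str patterns in Python 3, \w and \d are Unicode-aware by default.
ALPHA = re.compile(r"[^\W\d_]")

def alpha_chars(text: str) -> List[str]:
    """Return every character in `text` matched by the letter-only class."""
    return ALPHA.findall(text)

def alpha_chars_strict(text: str) -> List[str]:
    """Stricter alternative: keep only characters for which str.isalpha() is True."""
    return [ch for ch in text if ch.isalpha()]

print(alpha_chars("a1_\u00e9-\u0436"))  # ['a', 'é', 'ж']
```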

MonoDevelop - Arabic and other Unicode in code editor

When I paste in some upper Unicode, or even ANSI text like العربية, I get gibberish in MonoDevelop. I am using the MonoTouch framework. Any idea how to get it to let me paste in Arabic, Chinese, etc.? ...

How do I regex-search for weird non-ASCII characters in Python?

I'm basically using the following regex to search for and delete these characters: invalid_unicode = re.compile(ur'(Û|²|°|±|É|¹|Í)'). My source code is ASCII-encoded, and whenever I try to run the script it spits out: SyntaxError: Non-ASCII character '\xdb' in file ./release.py on line 273, but no encoding declared; see http://www.pyt...
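
The `SyntaxError` is Python 2's complaint that the file contains non-ASCII bytes without an encoding declaration (PEP 263, e.g. `# -*- coding: utf-8 -*-` on the first or second line). A sketch that sidesteps the issue by keeping the source pure ASCII, assuming Python 3: the same characters are written as `\u` escapes and collected into a single character class instead of an alternation. `strip_invalid` is a hypothetical helper name.

```python
import re

# Same characters as the question's pattern, written as ASCII-only escapes:
# U+00DB, U+00B2, U+00B0, U+00B1, U+00C9, U+00B9, U+00CD
invalid_unicode = re.compile("[\u00db\u00b2\u00b0\u00b1\u00c9\u00b9\u00cd]")

def strip_invalid(text: str) -> str:
    """Delete every occurrence of the listed characters."""
    return invalid_unicode.sub("", text)

print(repr(strip_invalid("25\u00b0C \u00b1 1")))  # prints '25C  1'
```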

C# ASCII or Unicode

Hi, I'm a beginner in programming and network development. I have a question regarding ASCII and Unicode encoding. MSDN and other web examples do the following: byte[] byteData = Encoding.ASCII.GetBytes(data); Is this because these code samples are old? Shouldn't it be: byte[] byteData = Encoding.Unicode.GetBytes(data); Thanks fo...
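
To see what the choice actually changes on the wire, here is a Python sketch of the byte layouts. Assumptions: .NET's `Encoding.Unicode` is UTF-16 little-endian (Python's `utf-16-le`), and .NET's `Encoding.ASCII` substitutes `?` for characters it cannot encode, which Python's `errors="replace"` imitates. UTF-8 is included only for comparison. Whatever the sender picks, the receiver must decode with the same encoding. `show_encodings` is an illustrative name.

```python
def show_encodings(data: str) -> dict:
    """Encode `data` roughly the way the .NET calls would (Python approximation)."""
    return {
        # ~ Encoding.ASCII.GetBytes(data): 1 byte per char, '?' for non-ASCII
        "ascii": data.encode("ascii", errors="replace"),
        # ~ Encoding.Unicode.GetBytes(data): UTF-16LE, no byte-order mark
        "utf-16-le": data.encode("utf-16-le"),
        # UTF-8 for comparison: ASCII stays 1 byte, other chars take 2 to 4
        "utf-8": data.encode("utf-8"),
    }

for name, raw in show_encodings("caf\u00e9").items():
    print(name, raw, len(raw))
```

With pure-ASCII text the ASCII and UTF-8 results are identical, which is one reason older samples got away with `Encoding.ASCII`; the difference only shows up once non-ASCII characters appear.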

How to replace old ParamText/StandardAlert with newer CFString replacements?

ParamText() is a really old way of replacing parameters in a string, based on Pascal strings. Also, StandardAlert is not quite Unicode-ready. The newer (though not so new) message box replacement is CFUserNotificationDisplayNotice, but it expects a CFString, and I found out that if I switch to using CFString I'm not able to u...

Converting these types of Unicode to UTF-8 in PHP

Hi, I am trying to convert this into readable UTF-8 text in PHP: Tel Aviv-Yafo (Hebrew: \u05ea\u05b5\u05bc\u05dc\u05be\u05d0\u05b8\u05d1\u05b4\u05d9\u05d1-\u05d9\u05b8\u05e4\u05d5\u05b9; Arabic: \u062a\u0644 \u0623\u0628\u064a\u0628\u200e, Tall \u02bcAb\u012bb), usually called Tel Aviv. Any ideas on how to do so? Tried several methods...
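
The text contains literal `\uXXXX` escape sequences of the kind JSON uses; in PHP, `json_decode` understands them when the input is a valid JSON string literal. As a language-neutral illustration, here is a Python sketch that replaces each escape with its character. Assumption: every escape is a single BMP code point (true for this sample); characters above U+FFFF written as surrogate pairs would need extra handling. `decode_u_escapes` is a hypothetical helper.

```python
import re

# A literal backslash, "u", then exactly four hex digits.
U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text: str) -> str:
    """Replace each \\uXXXX escape with the character at that code point."""
    return U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"Tall \u02bcAb\u012bb"
decoded = decode_u_escapes(raw)
print(decoded)                  # Tall ʼAbīb
utf8_bytes = decoded.encode("utf-8")  # the readable text as UTF-8 bytes
```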

Get ready for Delphi 2009 and up when developing with Delphi 7?

Hi, I'm developing a Word add-in in Delphi 7, but soon I'll upgrade it to Delphi 2010. As you know, since version 2009 Delphi has the new string type UnicodeString, which the keyword string now maps to. On the other hand, according to this thread we need to use WideString to communicate with COM. My question is, what should I do in...

What is the default VB6 charset?

Hi, we have an application written in Java that reads some text generated by a VB6 application. The problem is that this VB6 application generates its output with some special characters, like ç, ã, á, and we don't know which charset it uses. So the question is: is there a default charset used by VB6? Which is it? ...
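
VB6 keeps strings as Unicode internally but typically converts them to the system ANSI code page when writing text files, so the charset usually depends on the Windows locale. On Western European and Portuguese-language systems that is commonly Windows-1252 (Java's charset name `windows-1252`). A Python sketch, assuming Windows-1252, showing that such bytes decode cleanly as cp1252 but not as UTF-8; `decode_vb6_output` is a hypothetical name.

```python
def decode_vb6_output(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode bytes written by a VB6 program, ASSUMING the Windows-1252 code page."""
    return raw.decode(codepage)

# "ção" as VB6 would write it on a Windows-1252 system: one byte per character
sample = "\u00e7\u00e3o".encode("cp1252")
print(sample)                      # b'\xe7\xe3o'
print(decode_vb6_output(sample))   # ção

# The same bytes are not valid UTF-8, a hint that the file is not UTF-8
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```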

iPhone UIWebView Unicode Arabic HTML Slow Rendering

Hello, I use a UIWebView to load Arabic HTML encoded as UTF-8, but rendering is extremely slow, and so is scrolling. By contrast, with English HTML everything performs reasonably. Any advice on how to render Unicode HTML in a UIWebView? I'd appreciate your help! Thanks. ...

What encoding are filenames in NTFS stored as?

I'm just getting started on some programming to handle files with non-English names on a WinXP system. I've done some recommended reading on Unicode and I think I get the basic idea, but some parts are still not very clear to me. Specifically, what encoding (UTF-8, UTF-16LE/BE) are the file names (not the content, but the actual nam...

How do I export this Unicode table as CSV in Perl 5?

I have a table that has some Unicode in it. I know the Unicode data is fine, because it comes out as JSON on our web server just fine. But for some reason the CSV that I'm generating ends up mangled. Here's our current code: my $csv = Text::CSV->new ({ eol => "\015\012" }); open my $fh, '>:encoding(utf8)', 'Foo.csv'; my $sth...

Visual C++: Migrating traditional C and C++ string code to a Unicode world

I see that Visual Studio 2008 and later now start off a new solution with the Character Set set to Unicode. My old C++ code deals only with English ASCII text and is full of: literal strings like "Hello World"; the char type; char * pointers to allocated C strings; the STL string type; conversions from STL string to C string and vice versa using S...

Enumerating a string by grapheme instead of character

Strings are usually enumerated by character. But, particularly when working with Unicode and non-English languages, sometimes I need to enumerate a string by grapheme. That is, combining marks and diacritics should be kept with the base character they modify. What is the best way to do this in .Net? Use case: Count the distinct phonetic ...
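
.NET does ship `System.Globalization.StringInfo.GetTextElementEnumerator` for enumerating text elements. The core idea can be shown with a simplified Python sketch that attaches every character with a nonzero canonical combining class (`unicodedata.combining(ch) != 0`) to the preceding base character. Assumption, stated loudly: this covers ordinary combining diacritics only; full extended grapheme clusters (Unicode UAX #29, covering Hangul jamo, emoji sequences and more) need a dedicated segmenter. `graphemes` is an illustrative name.

```python
import unicodedata
from typing import Iterator

def graphemes(text: str) -> Iterator[str]:
    """Yield each base character together with the combining marks after it.

    Simplification: only characters with a nonzero canonical combining class
    are merged; this is NOT a full UAX #29 grapheme segmenter.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch):
            cluster += ch          # keep the mark with its base character
        else:
            if cluster:
                yield cluster
            cluster = ch           # start a new cluster
    if cluster:
        yield cluster

word = "e\u0301te\u0301"           # "été" spelled with combining acute accents
print(len(word), len(list(graphemes(word))))  # 5 3
print(len(set(graphemes(word))))              # 2 distinct clusters
```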

RegEx: \w - "_" + "-" in UTF-8

I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but doesn't match underscores (_). I tried these silly attempts without success: ([\w-^_])+, ([\w^_]-?)+, (\w[^_]-?)+. The \w is shorthand for [A-Za-z0-9_], but it also matches UTF-8 chars if I have the u modifier set. Can anyone help me out with this ...
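
In PCRE with the `u` modifier, Unicode properties express this directly, e.g. `[\p{L}\p{N}-]+`. Python's built-in `re` has no `\p{...}` support, so this sketch uses the "subtract from `\w`" trick instead: `[^\W_]` is a word character that is not an underscore, and the dash is added by alternation, because putting `-` inside that negated class would exclude it rather than include it. `is_valid` is a hypothetical helper name.

```python
import re

# (?:[^\W_]|-)+ : one or more of (word character except "_") or "-"
SLUG = re.compile(r"(?:[^\W_]|-)+")

def is_valid(text: str) -> bool:
    """True if `text` is made only of non-underscore word characters and dashes."""
    return SLUG.fullmatch(text) is not None

for s in ["\u00dcn\u00efcode-42", "na\u00efve-caf\u00e9", "under_score", "two words"]:
    print(s, is_valid(s))  # True, True, False, False
```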

Regex - Unicode Properties Reference and Examples

I feel lost with the Regex Unicode Properties presented by RegexBuddy. I cannot distinguish between any of the Number properties, and the Math Symbol property only seems to match +, but not -, *, / or ^, for instance. Is there any documentation or reference with examples on regular expression Unicode properties? ...
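
The behaviour follows from each character's General Category in the Unicode Character Database: `\p{Sm}` (Math Symbol) only matches characters whose category is Sm, and among the ASCII operators only `+` (plus `<`, `=`, `>`, `|`, `~`) is classified that way. The Number properties split into Nd (decimal digit), Nl (letter number) and No (other number). Python's standard `unicodedata.category` reports these values, so a short sketch can list them; `categories` is an illustrative helper.

```python
import unicodedata

def categories(chars: str) -> dict:
    """Map each character to its Unicode General Category (e.g. 'Sm', 'Nd')."""
    return {ch: unicodedata.category(ch) for ch in chars}

# Only '+' is Sm here; '-' is Pd (dash), '*' and '/' are Po, '^' is Sk.
print(categories("+-*/^"))

# Number properties: '7' is Nd, U+216B ROMAN NUMERAL TWELVE is Nl, U+00BD is No.
print(categories("7\u216b\u00bd"))

# The "real" math operators are Sm: U+2212 minus, U+00D7 times, U+00F7 divide.
print(categories("\u2212\u00d7\u00f7"))
```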

Search for Unicode text inside Windows XP

Is there a way of searching for Unicode characters inside a text file under Windows XP? For example, suppose I wish to find text documents containing the euro symbol. Although the standard XP search allows me to search for the euro symbol, it does not produce any matches when I know there should be at least a few. Wingrep has the same issue. I...
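
One plausible reason for the missing matches: the euro sign is stored as different bytes depending on each file's encoding, and a byte-oriented search only looks for one of those forms. This Python sketch (function names are hypothetical) checks raw file bytes for the three common encodings: Windows-1252 (`80`), UTF-8 (`E2 82 AC`) and UTF-16LE (`AC 20`). A hit is only a hint, since short byte sequences can occur by coincidence in other encodings.

```python
from pathlib import Path

EURO = "\u20ac"
# The byte sequence "€" becomes under each encoding we want to check
EURO_BYTES = {enc: EURO.encode(enc) for enc in ("cp1252", "utf-8", "utf-16-le")}

def euro_encodings(data: bytes) -> list:
    """Return the encodings whose euro-sign byte sequence occurs in `data`.

    A hit is only a hint: e.g. the single byte 0x80 can mean something else
    in files that are not Windows-1252.
    """
    return [enc for enc, seq in EURO_BYTES.items() if seq in data]

def scan(folder: str) -> None:
    """Print every .txt file under `folder` that seems to contain a euro sign."""
    for path in Path(folder).rglob("*.txt"):
        hits = euro_encodings(path.read_bytes())
        if hits:
            print(path, hits)
```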

Problem with diacritics and mb_substr

I am slicing a Unicode string with diacritics using the mb_substr function, but it behaves as if I were using the plain substr function. It splits Unicode characters in half, displaying a question-mark diamond. E.g. echo mb_substr('ááááá', 0, 5); //Displays áá� What might be wrong? ...
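
The output `áá�` is exactly what byte-based slicing produces: in UTF-8 each precomposed á takes two bytes, so the first five bytes hold two whole characters plus half of a third. That suggests `mb_substr` fell back to an internal encoding other than UTF-8; passing the encoding explicitly as the fourth argument, `mb_substr($s, 0, 5, 'UTF-8')`, is the usual fix. The Python sketch below reproduces both behaviours, assuming the string uses precomposed U+00E1.

```python
s = "\u00e1" * 5                  # "ááááá" as five precomposed U+00E1 characters

# Byte-based slicing (what substr does): 5 bytes = 2.5 characters
first_five_bytes = s.encode("utf-8")[:5]
print(first_five_bytes.decode("utf-8", errors="replace"))  # áá�

# Character-based slicing (what mb_substr with 'UTF-8' should do)
print(s[:5])                      # ááááá
```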

Standard Keyboard Layouts

I'm working on a FOSS project at http://unicode.codeplex.com. In this project we are trying to collect information about standard keyboard layouts. What we want to know is whether there is a place, document or ... that states the standard keyboard layout for a given language. I mean, if you are German or American or Arab or ..., what's t...

Python os.stat and unicode file names

In my Django application, a user has uploaded a file with a Unicode character in its name. When I'm downloading files, I call os.path.exists(media) to test that the file is there. This, in turn, seems to call st = os.stat(path), which then blows up with the error: UnicodeEncodeError: 'ascii' codec can't encode character u'...

Flex TextArea Unicode characters with control key

Hi experts, I am developing a Flex-based window application in which I use a TextArea. When I press key combinations like Ctrl+B, Ctrl+E or Ctrl+Q, it shows some square characters in the text area. I think these are Unicode characters, but why are they being entered? Unlike in the simple textArea control in the Adobe example, when I pr...