ansaurus

Question

How do I match non-ASCII characters with RegexKitLite?

Answer 1


Looks like an encoding problem to me. Either you're saving the source code in an encoding that can't handle that character (like ASCII), or the compiler is using the wrong encoding to read the source files. Going back to the original regex, try creating the subject string like this:

subjectString = @"define_a\xC3\xB1" @"adir";  // literal split so the \xB1 escape does not absorb the hex digits "ad"

or this:

subjectString = @"define_a\u00F1adir";

If that works, check the encoding of your source code files and make sure it's the same encoding the compiler expects.

EDIT: I've never worked with the iPhone technology stack, but according to this doc you should be using the stringWithUTF8String: method to create the NSString, not the @"" literal syntax. In fact, it says you should never use non-ASCII characters (that is, anything outside the range 0x00..0x7F) in your source code; that way you never have to worry about the source file's encoding. That's good advice no matter what language or toolset you're using.
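To make the ASCII-only advice concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C .m file). It spells "define_añadir" using only ASCII source characters by writing the two UTF-8 bytes of ñ (0xC3 0xB1) as escapes. The names SUBJECT_UTF8, subject_byte_length and is_n_tilde are made up for this illustration and are not part of RegexKitLite or Foundation.

```c
#include <stddef.h>
#include <string.h>

/* "define_añadir" written with ASCII-only source text.
 * The literal is split after \xB1 because a \x escape consumes every
 * hex digit that follows it; "\xB1adir" would be parsed as the single
 * (out-of-range) escape \xB1ad followed by "ir". */
static const char SUBJECT_UTF8[] = "define_a\xC3\xB1" "adir";

/* Number of bytes in the UTF-8 encoding (not the number of characters). */
size_t subject_byte_length(void) {
    return strlen(SUBJECT_UTF8);
}

/* Returns 1 if the two bytes starting at s are the UTF-8 encoding of
 * U+00F1 (ñ), otherwise 0. */
int is_n_tilde(const char *s) {
    return (unsigned char)s[0] == 0xC3 && (unsigned char)s[1] == 0xB1;
}
```

In Objective-C, a buffer like this could then be handed to [NSString stringWithUTF8String:SUBJECT_UTF8], so the result no longer depends on how the compiler decodes the source file.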

Alan Moore 2009-12-09 01:49:25

Correction: the example I posted does work. I simplified my code to keep it easy to read, but I may have more clues. My .m source file is UTF-8; I checked with the Unix command `file`. The string values are actually read from HTML files, which are also UTF-8. I printed the file contents with NSLog, and it shows "xn--define_aadir-hhb" where I expect "define_añadir" to be read from the HTML into subjectString. Where can I check the encoding the compiler expects, as you mentioned, Alan? Also, I've found that not all of my source files are UTF-8; some are ASCII. Could this be a problem?

ojreadmore 2009-12-09 18:20:06

ASCII is a subset of UTF-8, so every ASCII file is also a UTF-8 file. As for the rest, see my edit.
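The subset claim can be checked mechanically: UTF-8 encodes every code point in 0x00..0x7F as the same single byte ASCII uses, so a file with no byte above 0x7F is valid in both encodings. A small sketch in C, with is_pure_ascii as a hypothetical helper name:

```c
#include <stddef.h>

/* Returns 1 if every byte of buf is in the range 0x00..0x7F.
 * Such a buffer is valid ASCII and, byte for byte, valid UTF-8.
 * Returns 0 as soon as a byte above 0x7F is found. */
int is_pure_ascii(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F) {
            return 0;
        }
    }
    return 1;
}
```

A source file that passes this check reads the same whether the compiler assumes ASCII or UTF-8, which is why a mix of "ASCII" and "UTF-8" files reported by `file` is harmless.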

Alan Moore 2009-12-10 02:57:59
