Hey all,

So I'm trying to rip URLs out of an NSString using RegexKitLite, and I came across an odd problem.

NSLog(@"Array: %@", [message componentsMatchedByRegex:@"^(http://)[-a-zA-Z0-9+&@#/%?=~_()|!:,.;]*"]);

NSString *message is just some text with a URL within it. The strange thing is that it doesn't work with the ampersand in the pattern. If I take the ampersand out it works fine, but for obvious reasons I want to keep it in. I'm also a regex newb, so don't bash my search expression too much :)

Has anyone experienced this before with RegexKitLite, or with regexes in Objective-C in general?

A: 

I've no experience with RegexKitLite, and I've never encountered & as special inside a character class, but try putting a \ before it to see if that works?

NSLog(@"Array: %@", [message componentsMatchedByRegex:@"^(http://)[-a-zA-Z0-9+\&@#/%?=~_()|!:,.;]*"]);
Peter Boughton
No, I tried that before and it still doesn't work. Plus, Xcode throws a warning saying it's an unknown escape sequence.
Meroon
A:

In ICU regular expression character classes, & means intersection. For example, @"[[:letter:] & [a-z]]". So it needs to be quoted with a backslash, as Peter suggested, i.e. \& in the regular expression. However, \ has a special meaning in C strings, including Objective-C strings, so the \ itself has to be quoted with another \. So you need \\& in your string literal, i.e. [-a-zA-Z0-9+\\&@#/%?=~_()|!:,.;]
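
Because Objective-C string literals use the same escape rules as C, the effect of the doubled backslash can be checked in plain C (which is also valid Objective-C). This is an illustrative sketch only: kURLCharClass and ampersand_is_escaped are hypothetical names, and the code only shows what the compiler hands to the regex engine, not how ICU then parses the class.

```c
#include <assert.h>
#include <string.h>

/* The URL character class from the question, written the way it has to
   appear in C / Objective-C source. The compiler turns "\\&" into the two
   characters \ and &, so the regex engine receives \& and (per the answer
   above) treats the ampersand literally. Writing "\&" instead is an unknown
   escape sequence to the C compiler, which is what Xcode warns about. */
static const char kURLCharClass[] = "[-a-zA-Z0-9+\\&@#/%?=~_()|!:,.;]";

/* Hypothetical helper: returns 1 if the ampersand in kURLCharClass is
   preceded by a backslash in the string the regex engine will see. */
static int ampersand_is_escaped(void) {
    const char *amp = strchr(kURLCharClass, '&');
    assert(amp != NULL);
    return amp > kURLCharClass && amp[-1] == '\\';
}
```

In memory the class is 31 characters long, with the backslash sitting directly before the ampersand. Applied to the question, the call would become (an untested sketch using the same RegexKitLite method as the question):

NSLog(@"Array: %@", [message componentsMatchedByRegex:@"^(http://)[-a-zA-Z0-9+\\&@#/%?=~_()|!:,.;]*"]);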

Also, I'm not sure what your intention is with the ^ at the start of the pattern. If you want the regex to match anywhere in the string, you should use \b (word boundary, written \\b in an Objective-C string literal) instead. If you only want to match a URL at the very start of the message, then as written you will only ever get a single match. If you want to match URLs at the start of any line, add (?m) at the start of the regex to turn on multiline mode for ^ (and consider adding $ to the end of the regex).
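
To make the difference between the three anchoring choices concrete without linking against ICU, here is a hand-written C approximation of where ^http://[...]*, (?m)^http://[...]* and \bhttp://[...]* can begin a match, each followed by the greedy URL character class. It is a hypothetical helper (enum anchor, find_urls and the other names are made up for illustration), not how RegexKitLite or ICU work internally, and it simplifies by treating only '\n' as a line terminator and only ASCII letters, digits and _ as word characters; ICU's definitions are broader.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Where a match may begin, mirroring the three options discussed above. */
enum anchor {
    ANCHOR_START, /* ^http://...     : only at offset 0 of the message  */
    ANCHOR_LINE,  /* (?m)^http://... : at offset 0 or right after '\n'  */
    ANCHOR_WORD   /* \bhttp://...    : anywhere after a non-word char   */
};

/* Characters allowed after "http://": [-a-zA-Z0-9+&@#/%?=~_()|!:,.;] */
static int is_url_char(char c) {
    return isalnum((unsigned char)c) ||
           (c != '\0' && strchr("-+&@#/%?=~_()|!:,.;", c) != NULL);
}

/* Simplified ASCII word characters, used for the \b approximation. */
static int is_word_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Can a match start at text[i] under the given anchor? */
static int can_start_at(const char *text, size_t i, enum anchor a) {
    switch (a) {
    case ANCHOR_START: return i == 0;
    case ANCHOR_LINE:  return i == 0 || text[i - 1] == '\n';
    case ANCHOR_WORD:  return i == 0 || !is_word_char(text[i - 1]);
    }
    return 0;
}

/* Finds up to max_matches URLs in text, storing each match's start offset
   and length. Returns the number of matches found. */
static size_t find_urls(const char *text, enum anchor a,
                        size_t starts[], size_t lens[], size_t max_matches) {
    const char *prefix = "http://";
    size_t plen = strlen(prefix), len = strlen(text), i = 0, n = 0;
    assert(starts != NULL && lens != NULL);
    while (i < len && n < max_matches) {
        if (can_start_at(text, i, a) && strncmp(text + i, prefix, plen) == 0) {
            size_t end = i + plen;
            while (is_url_char(text[end]))   /* greedy [...]* */
                end++;
            starts[n] = i;
            lens[n] = end - i;
            n++;
            i = end;  /* resume scanning after the match */
        } else {
            i++;
        }
    }
    return n;
}
```

For example, for the message "see http://a.com/?x=1&y=2" followed by a newline and "http://b.org/ok", ANCHOR_START finds nothing (the message doesn't begin with a URL), ANCHOR_LINE finds only the URL on the second line, and ANCHOR_WORD finds both, with the & kept inside the first match. In Objective-C source the word-boundary pattern needs its backslashes doubled, e.g. @"\\bhttp://[-a-zA-Z0-9+\\&@#/%?=~_()|!:,.;]*".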

Peter N Lewis