ansaurus

Question

Extract known pattern substring from NSString (without regex)

Answer 1

+2 A:

I think you should go with your initial instinct. Use RegexKitLite. It's very small and simple to add to the project.

Another option, if this is for iPhone or iPad using iPhone OS 3.2: you can pass the new NSRegularExpressionSearch option to -rangeOfString:options:.

If I weren't going to use regular expressions, however, I would use a series of -rangeOfString: and -substringWithRange: calls. It'd probably only be half a dozen lines, but still not as simple and pretty as a regular expression.
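The find-and-slice approach can be sketched as follows. This is a minimal Python sketch, not Cocoa code: `str.find` plays the role of -rangeOfString: and slicing the role of -substringWithRange:. It assumes an ASCII header, that the parameter is spelled `charset=` (any letter case), and that a quoted value contains no semicolons; `charset_by_find` is a hypothetical name.

```python
from typing import Optional

def charset_by_find(header: str) -> Optional[str]:
    """Extract the charset parameter using only find() and slicing (no regex)."""
    # Lower-case copy for a case-insensitive search; for an ASCII header
    # the offsets are identical to those in the original string.
    lower = header.lower()
    start = lower.find("charset=")
    if start == -1:
        return None
    start += len("charset=")
    # The value runs up to the next semicolon, or to the end of the header.
    end = header.find(";", start)
    if end == -1:
        end = len(header)
    value = header[start:end].strip()
    # Tolerate the (non-standard) quoted form such as charset="UTF-8".
    return value.strip('"') or None
```

It stays short, but it will misfire on inputs such as a quoted parameter value that itself contains `charset=` or a semicolon, which is the kind of edge case a real parser has to handle.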

Cory Kilger 2010-05-26 05:04:54

Thanks, but surely picking chunks out of strings is such an everyday task that Apple expects programmers to be able to do it cleanly without third-party frameworks? I'm guessing Apple don't rely on projects like RegexKit in their own code because (unless the problem is complex) there are already Cocoa ways to do this?

d11wtq 2010-05-26 06:31:11

Actually, I think Apple does use RegexKit in at least some of their shipped code. However, there is a new `NSRegularExpression` class in iPhone OS 4.0 and I don't imagine it will be too long before it turns up in Mac OS X also. I agree that it's a major hole in the framework.

Rob Keniger 2010-05-26 06:58:55

Thanks. Since my regex needs are very minimal, I'd rather not bring in an extra dependency (this is an open source project I want to keep relatively self-contained). I've been playing around with NSScanner and I'm discovering that this little beast is a lot more powerful than I first thought. I'll post a solution using NSScanner once I've played a bit more. If I start hitting more complex pattern-matching needs I'll definitely bring in an external framework. NSScanner is probably faster in any case.

d11wtq 2010-05-26 08:41:23
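The scan-up-to / scan-string / scan-characters style that NSScanner encourages can be sketched like this. `MiniScanner` is a hypothetical Python stand-in loosely modelled on -scanUpToString:intoString:, -scanString:intoString: and -scanCharactersFromSet:intoString:; it is not NSScanner itself. Skipping whitespace and matching case-insensitively are choices made for this sketch, and the set of characters accepted in a charset name is an assumption. It also assumes an ASCII header.

```python
from typing import Optional

class MiniScanner:
    """A tiny cursor-based scanner, loosely modelled on NSScanner (not a port)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def _skip_whitespace(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def scan_up_to(self, stop: str) -> Optional[str]:
        """Consume text up to (not including) `stop`; None if nothing was consumed."""
        self._skip_whitespace()
        idx = self.text.lower().find(stop.lower(), self.pos)
        if idx == -1:
            idx = len(self.text)  # stop not found: consume the rest
        if idx == self.pos:
            return None
        chunk = self.text[self.pos:idx]
        self.pos = idx
        return chunk

    def scan_string(self, s: str) -> bool:
        """Consume `s` if it appears at the cursor (case-insensitive)."""
        self._skip_whitespace()
        end = self.pos + len(s)
        if self.text[self.pos:end].lower() == s.lower():
            self.pos = end
            return True
        return False

    def scan_chars(self, allowed: set) -> Optional[str]:
        """Consume the longest run of characters drawn from `allowed`."""
        self._skip_whitespace()
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in allowed:
            self.pos += 1
        return self.text[start:self.pos] or None


# Characters accepted in a charset name (an assumption: ASCII letters, digits, a few symbols).
CHARSET_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.:+")

def charset_by_scanning(header: str) -> Optional[str]:
    scanner = MiniScanner(header)
    scanner.scan_up_to("charset=")        # skip everything before the parameter
    if not scanner.scan_string("charset="):
        return None
    scanner.scan_string('"')              # optional opening quote
    return scanner.scan_chars(CHARSET_CHARS)
```

Because the value is read with a character set rather than "up to the next semicolon", a closing quote simply ends the scan, which is what makes this style tolerant of the quoted form.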

Answer 2

+1 A:

If these are HTTP Content-Type headers, technically, the second one is illegal according to my reading of RFC2616. You don't quote character set names. Having said that, you can't control your input and if you are getting them, you need to deal with them.

Anyway, assuming we are talking about HTTP headers, I'd be tempted to write a proper parser even if I did have a regex library to hand. Assuming you want to be a bit lazy, without a regex library or a parser, you need to do something like this:

Strip "Content-Type:".
Use -componentsSeparatedByString: to split at semicolons.

The MIME type is the first part, trimmed of leading and trailing whitespace.

Now comes the tricky part. Iterate through each of the remaining components.

For the part you are on, make sure the semicolon you split on was not embedded in a quoted string. The easiest way to do this is to count the unescaped double quote characters and make sure there are zero or two. If you did split on a quoted semicolon, join the next component back on (restoring the semicolon) and repeat.
Split at the = sign.
If the first part is charset (case-insensitive), you have found the one you are looking for. The second part is the actual character set: strip whitespace and enclosing double quotes.

The above is quite complex and there are probably edge cases it fails on, but then any regular expression you create to do the same will also be complex, have edge case failures, be unreadable and impossible to debug with the Xcode debugger.
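The steps above can be sketched as follows. This is a Python illustration of the algorithm, not the poster's code; `parse_content_type` and `_unescaped_quotes` are hypothetical names. It checks for an even number of unescaped quotes, a slight generalization of "zero or two", and like the steps it does not handle every RFC 2616 edge case.

```python
from typing import List, Optional, Tuple

def _unescaped_quotes(s: str) -> int:
    """Count double quotes that are not escaped with a backslash."""
    count, i = 0, 0
    while i < len(s):
        if s[i] == "\\":
            i += 2              # skip the backslash and the character it escapes
            continue
        if s[i] == '"':
            count += 1
        i += 1
    return count

def parse_content_type(header: str) -> Tuple[str, Optional[str]]:
    """Return (mime_type, charset) from a Content-Type header, without regex."""
    if header.lower().startswith("content-type:"):
        header = header[len("content-type:"):]
    parts: List[str] = header.split(";")
    mime_type = parts[0].strip()
    charset = None
    i = 1
    while i < len(parts):
        part = parts[i]
        # An odd quote count means we split on a semicolon inside a quoted
        # string: glue the next component back on and check again.
        while _unescaped_quotes(part) % 2 == 1 and i + 1 < len(parts):
            i += 1
            part += ";" + parts[i]
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "charset":
            charset = value.strip().strip('"')
            break
        i += 1
    return mime_type, charset
```

For example, `parse_content_type('text/plain; name="charset=x;y"; charset=us-ascii')` skips the quoted `name` parameter (whose value contains both `charset=` and a semicolon) and returns the real charset.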

JeremyP 2010-05-26 08:36:12

You're right, the quotes should not be there, but unfortunately sometimes they are. I've edited my post and added an example using NSScanner. This feels a little less clumsy. In theory you could lexically analyze a string pretty well with the scanner. Much more long-winded than a quick regex, granted.

d11wtq 2010-05-26 09:42:13

It doesn't look long-winded to me. A third of your code just looks like setting up the character sets. NB there is a class method +alphanumericCharacterSet for alphanumerics, so you don't need to construct your own.

JeremyP 2010-05-26 11:12:31

+alphanumericCharacterSet includes Unicode letters with diacritic marks etc., which made me decide to create my own, but in hindsight there's never going to be a scenario where it would pose an issue, so I should just ditch that. You're right ;)

d11wtq 2010-05-26 15:34:06
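The difference being discussed, a hand-built ASCII set versus a built-in Unicode-aware one, can be illustrated in Python. `str.isalnum()` is Unicode-aware much like +alphanumericCharacterSet, although its exact Unicode category coverage is not identical to NSCharacterSet's; the function names below are hypothetical.

```python
import string

# Hand-built ASCII set, analogous to constructing your own NSCharacterSet.
ASCII_ALNUM = frozenset(string.ascii_letters + string.digits)

def is_ascii_alnum(s: str) -> bool:
    """True only if every character is an ASCII letter or digit."""
    return bool(s) and all(c in ASCII_ALNUM for c in s)

def is_unicode_alnum(s: str) -> bool:
    """Built-in, Unicode-aware check: also accepts accented letters and non-Latin digits."""
    return s.isalnum()
```

A name like `café` or the Arabic-Indic digit `٣` passes the Unicode-aware check but fails the ASCII one; for charset names, which are ASCII in practice, either choice gives the same answer.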
