I'm really tempted to drop RegexKit (or my own libpcre wrapper) into my project in order to do this, but before I do that I want to know how Cocoa developers manage to do half of this basic stuff without really convoluted code or without linking with RegexKit or another regular expression library.
I find it gobsmacking that Cocoa does not include any regular expression matching features. I've become so accustomed to using regular expressions for all kinds of things that I'm lost without them. I can do what I need without them, but the code would be rather convoluted. So, Cocoa devs, I ask you, what's the "Cocoa way" to do this?
The problem is an everyday problem in programming as far as I'm concerned. Cocoa must have ways of doing this with the built-in features. Note that the position of the elements I want to match changes, and sometimes "quotes" are present. Whitespace is variable.
Take the following strings:
Content-Type: application/xml; charset=utf-8
Content-Type: text/html; charset="iso-8859-1"
Content-Type: text/plain;
charset=us-ascii
Content-Type: text/plain; name="example.txt"; charset=utf-8
From all of these strings, how would you go about determining the mime type (e.g. text/plain) and the charset (e.g. utf-8) using just the built-in Cocoa classes?
I'd end up performing a series of -rangeOfString: and substring calls, with conditional checks to deal with the optional quotes etc. Is there a way to do this with NSScanner? The NSScanner class seems to have a pretty naive API to me.
Something like C's sscanf() that works for NSString objects would be an ideal fit. Most of my string parsing needs are as simple as this example, so maybe regular expressions, while I'm accustomed to them, are overkill?
EDIT | The code is a bit long-winded, but it turns out NSScanner is actually quite easy to work with. It basically walks along your string doing as you tell it. The most annoying part is creating the NSCharacterSet instances it needs.
- (void)testNSScannerUseCase {
    NSString *testString = @"Content-type: application/xml; name=\"test\";\n charset=\"utf-8\"";
    unsigned int a = 'a', zero = '0';

    // There's probably a quicker way than to make these character sets this way
    NSMutableCharacterSet *alphaNumSet = [NSMutableCharacterSet characterSetWithRange:NSMakeRange(a, 26)];
    [alphaNumSet addCharactersInRange:NSMakeRange(zero, 10)];
    NSMutableCharacterSet *mimeTypeSet = [NSMutableCharacterSet characterSetWithCharactersInString:@"/-"];
    [mimeTypeSet formUnionWithCharacterSet:alphaNumSet];
    NSMutableCharacterSet *charsetSet = [NSMutableCharacterSet characterSetWithCharactersInString:@"-"];
    [charsetSet formUnionWithCharacterSet:alphaNumSet];

    // Initialize a case-insensitive scanner
    NSScanner *scanner = [NSScanner scannerWithString:testString];
    [scanner setCaseSensitive:NO];

    // Prepare to capture mime-type
    NSString *mimeType = nil;

    // Skip past the Content-Type: section
    if ([scanner scanUpToString:@":" intoString:NULL] && [scanner scanString:@":" intoString:NULL]) {
        [scanner scanCharactersFromSet:mimeTypeSet intoString:&mimeType];
    }
    GHAssertEqualStrings(@"application/xml", mimeType, @"Mime-type should be application/xml");

    // Prepare to look for the charset attribute
    NSString *charset = nil;

    // Ignore quotes as well as whitespace
    [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"\r\n\t \""]];

    // Skip past the charset attribute declaration
    if ([scanner scanUpToString:@"charset=" intoString:NULL]
        && [scanner scanString:@"charset=" intoString:NULL]) {
        [scanner scanCharactersFromSet:charsetSet intoString:&charset];
    }
    GHAssertEqualStrings(@"utf-8", charset, @"Charset should be utf-8");
}
This could be made a little smarter by using a while loop reading up to ";" then checking to see if it's the attribute I'm scanning for.
I dare say it benchmarks faster than a regex would, and that my rather long code could be refactored down to something much smaller.
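For what it's worth, the while-loop idea mentioned above might look roughly like the sketch below. The function name ParameterValue is hypothetical (it is not a Cocoa API), and the sketch assumes parameter values never contain a ";" inside quotes, which is good enough for the example strings here but not for every legal header.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of Cocoa: returns the value of the named
// parameter (e.g. @"charset") from a Content-Type header, or nil if absent.
// Assumption: quoted values never contain ';'.
static NSString *ParameterValue(NSString *header, NSString *wanted) {
    // Characters stripped from both ends of names and values
    NSCharacterSet *trim =
        [NSCharacterSet characterSetWithCharactersInString:@" \t\r\n\""];

    // By default the scanner skips whitespace and newlines before each scan
    NSScanner *scanner = [NSScanner scannerWithString:header];

    // Skip the header name and the mime type, i.e. everything before the first ';'
    [scanner scanUpToString:@";" intoString:NULL];

    // Each pass consumes one ';' and then one "name=value" segment
    while ([scanner scanString:@";" intoString:NULL]) {
        NSString *segment = nil;
        if (![scanner scanUpToString:@";" intoString:&segment]) {
            continue; // empty segment, e.g. a trailing ';'
        }

        NSRange eq = [segment rangeOfString:@"="];
        if (eq.location == NSNotFound) {
            continue; // not a name=value pair
        }

        NSString *name = [[segment substringToIndex:eq.location]
            stringByTrimmingCharactersInSet:trim];
        if ([name caseInsensitiveCompare:wanted] != NSOrderedSame) {
            continue; // some other attribute, keep looking
        }

        // Found it: drop surrounding whitespace and optional quotes
        return [[segment substringFromIndex:NSMaxRange(eq)]
            stringByTrimmingCharactersInSet:trim];
    }
    return nil;
}
```

Calling ParameterValue(header, @"charset") on the example strings should cope with the quoted, unquoted, and folded-line variants, and the same helper would pull out name or any other attribute without needing a character set per attribute.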