I'm receiving an NSString that uses commas as delimiters and a backslash as an escape character. I looked into splitting the string with componentsSeparatedByString:, but I found no way to specify the escape character. Is there a built-in way to do this, such as NSScanner or CFStringTokenizer?

If not, which would be better: splitting the string at the commas and then rejoining tokens that were falsely split (after checking each for an unescaped escape character at the end), or looping through each character looking for a comma, then looking back one character to see whether the comma is escaped (and one more character to see whether that escape character is itself escaped)?

Now that I think about it, I would need to check that the number of consecutive escape characters directly before a delimiter is even, because only then is the delimiter itself not escaped.
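The parity check described here could be sketched as a small helper. The function name `IsEscapedAtIndex` is hypothetical, and the sketch assumes the escape character fits in a single `unichar`:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if the character at `index` is preceded by
// an odd number of consecutive `esc` characters, meaning it is itself escaped.
static BOOL IsEscapedAtIndex(NSString *str, NSUInteger index, unichar esc)
{
    NSUInteger count = 0;

    // Walk backwards from the character just before `index`,
    // counting escape characters until something else (or the start) is hit.
    while (index > count && [str characterAtIndex:index - count - 1] == esc)
    {
        count++;
    }

    return (count % 2) == 1;
}
```

With this, a comma is a real delimiter only when `IsEscapedAtIndex` returns NO for its index. Note that the look-back makes the scan slower than a single forward pass on inputs with long runs of backslashes.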

If someone has a method that does this, I'd appreciate it if I could take a look at it.

+1  A: 

I think the most straightforward way to do this is to go through the string character by character, as you suggest, appending to new string objects. You can follow two simple rules:

  1. if you find a backslash, skip it but copy the next character (if there is one) unconditionally
  2. if you find a comma, end the current section

You could do this manually or use some of the functionality of NSScanner to help you (scanUpToCharactersFromSet:intoString:).
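As a rough sketch of the NSScanner variant, the two rules above might look like this. The function name `SplitEscaped` is made up for illustration, and the delimiter and escape character are hard-coded to a comma and a backslash as an assumption:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `input` at unescaped commas, with '\' as the escape.
static NSArray *SplitEscaped(NSString *input)
{
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil];  // do not silently drop whitespace

    NSCharacterSet *special = [NSCharacterSet characterSetWithCharactersInString:@",\\"];
    NSMutableArray *tokens = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    NSUInteger length = [input length];

    while (![scanner isAtEnd])
    {
        // Copy everything up to the next comma or backslash in one step.
        NSString *chunk = nil;
        if ([scanner scanUpToCharactersFromSet:special intoString:&chunk])
        {
            [current appendString:chunk];
        }
        if ([scanner isAtEnd])
        {
            break;
        }

        NSUInteger loc = [scanner scanLocation];
        if ([input characterAtIndex:loc] == '\\')
        {
            // Rule 1: skip the backslash, copy the next character if there is one.
            if (loc + 1 < length)
            {
                [current appendString:[input substringWithRange:NSMakeRange(loc + 1, 1)]];
            }
            [scanner setScanLocation:MIN(loc + 2, length)];
        }
        else
        {
            // Rule 2: an unescaped comma ends the current section.
            [tokens addObject:[NSString stringWithString:current]];
            [current setString:@""];
            [scanner setScanLocation:loc + 1];
        }
    }

    // Whatever is left after the last comma is the final section.
    [tokens addObject:[NSString stringWithString:current]];
    return tokens;
}
```

The scanner only saves work on long runs of ordinary characters; for short inputs a plain loop over characterAtIndex: is just as reasonable.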

wipolar
Thanks, I was almost done coding what I described in the original post, but I think your solution is quicker and easier to implement. I'm baffled that there's no framework function to do this; I would've expected it to be a much more common need.
noroom
A: 

I would prefer to use a regular-expression-based parser to weed out the escape characters, and then possibly do a split operation (of some type) on the string.
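One way this might look, as a sketch only: it assumes NSRegularExpression is available (iOS 4 / OS X 10.7 or later), hard-codes a comma and a backslash, and uses a hypothetical function name `SplitWithRegex`. It splits first at commas preceded by an even number of backslashes, then removes the escape characters from each piece:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: regex-based split at unescaped commas, '\' as the escape.
static NSArray *SplitWithRegex(NSString *input)
{
    // Matches a comma preceded by an even number (possibly zero) of backslashes.
    // As a regex this reads: (?<!\\)(?:\\\\)*,
    NSRegularExpression *unescapedComma =
        [NSRegularExpression regularExpressionWithPattern:@"(?<!\\\\)(?:\\\\\\\\)*,"
                                                  options:0
                                                    error:NULL];
    NSArray *matches = [unescapedComma matchesInString:input
                                               options:0
                                                 range:NSMakeRange(0, [input length])];

    NSMutableArray *tokens = [NSMutableArray array];
    NSUInteger start = 0;

    for (NSTextCheckingResult *match in matches)
    {
        // The comma is the last character of each match.
        NSUInteger commaIndex = NSMaxRange([match range]) - 1;
        [tokens addObject:[input substringWithRange:NSMakeRange(start, commaIndex - start)]];
        start = commaIndex + 1;
    }
    [tokens addObject:[input substringFromIndex:start]];

    // Unescape each raw token: replace "\x" with "x".
    // A lone trailing backslash has nothing to pair with and is left in place.
    for (NSUInteger i = 0; i < [tokens count]; i++)
    {
        NSString *raw = [tokens objectAtIndex:i];
        NSString *clean = [raw stringByReplacingOccurrencesOfString:@"\\\\(.)"
                                                         withString:@"$1"
                                                            options:NSRegularExpressionSearch
                                                              range:NSMakeRange(0, [raw length])];
        [tokens replaceObjectAtIndex:i withObject:clean];
    }

    return tokens;
}
```

The doubled backslashes (once for the Objective-C string literal, once for the regex) make the patterns hard to read, which is a fair argument for the character-by-character loop instead.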

Kangkan
A: 

Okay, (I hope) this is what wipolar suggested. It's the first implementation that works. I've just started with a non-garbage-collected language, so please post a comment if you think this code can be improved, especially in the memory-management department.

- (NSArray *) splitUnescapedCharsFrom: (NSString *) str atChar: (unichar) delim withEscape: (unichar) esc
{
    // Autoreleased objects, so nothing here needs a matching release.
    NSMutableArray * result = [NSMutableArray array];
    NSMutableString * currWord = [NSMutableString string];
    NSUInteger length = [str length];

    for (NSUInteger i = 0; i < length; i++)
    {
        unichar c = [str characterAtIndex:i];

        if (c == esc)
        {
            // Copy the character after the escape as-is. An escape at the
            // very end of the string has nothing to copy and is dropped.
            if (i + 1 < length)
            {
                [currWord appendFormat:@"%C", [str characterAtIndex:++i]];
            }
        }
        else if (c == delim)
        {
            // Unescaped delimiter: store a copy of the word and start a new one.
            [result addObject:[NSString stringWithString:currWord]];
            [currWord setString:@""];
        }
        else
        {
            // %C formats a unichar; %c would mangle non-ASCII characters.
            [currWord appendFormat:@"%C", c];
        }
    }

    // The last word has no trailing delimiter, so add it explicitly.
    [result addObject:[NSString stringWithString:currWord]];

    return [NSArray arrayWithArray:result];
}
noroom