Hi,

What is the smallest code I can use to count the number of occurrences of the newline character in a file with Objective-C / Cocoa Touch?

Thanks!

A:

This should get you going:

// `file` is the path to your file; the file is assumed to be UTF-8 encoded.
NSError *error = nil;
NSString *fileContents = [NSString stringWithContentsOfFile:file encoding:NSUTF8StringEncoding error:&error];
NSUInteger newlineCount = [fileContents numberOfOccurrencesOfString:@"\n"];

#import <Foundation/Foundation.h>

// A named category: an empty "()" would declare a class extension,
// which cannot be implemented for a class you don't own, such as NSString.
@interface NSString (OccurrenceCounting)

- (NSUInteger)numberOfOccurrencesOfString:(NSString *)aString;
- (NSUInteger)numberOfOccurrencesOfChar:(char)aChar;

@end

@implementation NSString (OccurrenceCounting)

- (NSUInteger)numberOfOccurrencesOfString:(NSString *)aString {
    NSUInteger length = [self length];
    NSUInteger count = 0;
    NSRange range = [self rangeOfString:aString];
    while (range.location != NSNotFound) {
        count++;
        // Continue searching just past the previous match.
        NSUInteger start = range.location + range.length;
        range = [self rangeOfString:aString options:0 range:NSMakeRange(start, length - start)];
    }
    return count;
}

- (NSUInteger)numberOfOccurrencesOfChar:(char)aChar {
    // Byte-wise scan of the UTF-8 representation; only meaningful for ASCII chars such as '\n'.
    const char *cString = [self cStringUsingEncoding:NSUTF8StringEncoding];
    NSUInteger stringLength = strlen(cString);
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < stringLength; i++) {
        if (cString[i] == aChar) {
            count++;
        }
    }
    return count;
}

@end

While "numberOfOccurrencesOfString:" allocates no additional memory and supports multi-character needles, "numberOfOccurrencesOfChar:" allocates an autoreleased C-string copy of the NSString and searches for a single char.
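
At the C level, the single-char scan can also be sketched with the standard `memchr` function, which jumps from one match to the next instead of comparing every byte in your own loop. This is a swapped-in technique rather than the answer's code: `count_char` is a hypothetical helper, and the sketch assumes an ASCII-compatible byte encoding such as UTF-8, where '\n' is always the single byte 0x0A.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times the byte `ch` occurs in the
   first `len` bytes of `buf`. memchr finds the next match directly, so the
   loop body runs once per hit rather than once per byte. */
size_t count_char(const char *buf, size_t len, char ch)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end) {
        const char *hit = memchr(p, (unsigned char)ch, (size_t)(end - p));
        if (hit == NULL) {
            break; /* no further occurrences */
        }
        count++;
        p = hit + 1; /* resume just past the match */
    }
    return count;
}
```

From Objective-C it could be called on the buffer returned by `[fileContents UTF8String]`, with that buffer's `strlen` as the length.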

As you were asking for a count of newlines (hence single chars), I figured a quick benchmark might be useful for this particular purpose. I took a test string of length 2486813 containing a total of 78312 '\n' characters (basically a variation of OS X's words file) and…
…ran [testString numberOfOccurrencesOfString:@"\n"] 100 times: 19.35 s
…ran [testString numberOfOccurrencesOfChar:'\n'] 100 times: 6.91 s
(Setup: 2.2 GHz Core 2 Duo MacBook Pro, run on a single thread.)

[Edit: small bug fixed; made second snippet into category string method.]

Regexident
Except..... Where does 'testRange' come from?
Fred Whitefield
Oops, fixed. A relic from a version before I tightened the code ;)
Regexident
A: 

You can scan through the string using substringWithRange:

Count up the number of times \n appears.

Warren Burton
You mean [string substringWithRange:NSMakeRange(i, 1)] for i = 0...n-1, compared with isEqualToString:@"\n"? This would create an autoreleased one-char substring for every single char. Not good, and comparatively slow. But I guess you probably meant "rangeOfString:" ;) See my answer using it.
Regexident
True. I hadn't considered the memory footprint of substringWithRange:. rangeOfString: will have much the same effect. But I was trying to avoid doing the OP's homework question. Sorry if that's not the case, but reading between the lines it seems like it.
Warren Burton
True point. I for one just figured I'd need it in a project of my own soon anyway, and thus didn't bother too much :P
Regexident
A: 

Both of the other answers are correct, but with the caveat that they require loading the entire file into memory to work.

The way around that is to load the file incrementally using an NSFileHandle. Something like this:

NSFileHandle *file = [NSFileHandle fileHandleForReadingAtPath:pathToFile];
NSUInteger chunkSize = 1024;
NSUInteger numberOfNewlines = 0;
while (YES) {
  // Drain each chunk's autoreleased NSData before reading the next one.
  @autoreleasepool {
    NSData *chunk = [file readDataOfLength:chunkSize];
    NSUInteger length = [chunk length];
    if (length == 0) {
      break; // end of file (or a nil file handle)
    }
    // Index the data as single bytes, not unichars: in UTF-8/ASCII '\n' is always 0x0A.
    // (Counts '\n' only; a per-byte newlineCharacterSet test misfires on multi-byte text.)
    const uint8_t *bytes = (const uint8_t *)[chunk bytes];
    for (NSUInteger index = 0; index < length; ++index) {
      if (bytes[index] == '\n') {
        numberOfNewlines++;
      }
    }
  }
}
[file closeFile];
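
The same incremental idea can be sketched in plain C with `fread`; because the buffer lives on the stack, memory use stays constant and no autorelease pool is involved. `count_newlines_in_file` and the 4096-byte chunk size are hypothetical choices, and the sketch counts only '\n' bytes, assuming an ASCII-compatible encoding such as UTF-8.

```c
#include <stdio.h>

/* Hypothetical helper: count '\n' bytes in the file at `path`, reading it
   in fixed-size chunks so memory use does not grow with file size.
   Returns -1 if the file cannot be opened or a read error occurs. */
long count_newlines_in_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    unsigned char chunk[4096]; /* arbitrary chunk size */
    long count = 0;
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (chunk[i] == '\n') {
                count++;
            }
        }
    }
    int failed = ferror(f); /* distinguish a read error from a normal EOF */
    fclose(f);
    return failed ? -1 : count;
}
```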
Dave DeLong
Nice one! Actually I had simply overlooked where Fred was getting the string from. Mine is universally usable for substring counting, but as an answer to Fred's actual question yours is obviously the better choice. The compiler throws an error on "(unichar)bytes[index];" for me though (Error: "Void value not ignored as it ought to be"), and for some unknown and creepy reason the count varies on each run and is way too low (by orders of magnitude). O_o
Regexident
Btw, as [fileHandle readDataOfLength:length] should return an autoreleased NSData (the docs don't state an exception, so by naming convention I have to assume it does), you will eventually end up with the same amount of data in memory, given that your code currently has no NSAutoreleasePool in place within the while loop. Correct me if I'm wrong.
Regexident
@Regexident WRT the error: indexing into a `void*` results in `void`, so the answer would be to use a different array type (edited answer). WRT memory: yes, you should probably be creating and draining autorelease pools to keep memory usage down.
Dave DeLong
A: 

Smallest, you say? That automatically turns this question into code golf.

#include <stdio.h>
FILE*f=fopen(path,"r");
int i=0,c=0;
while(1+c)i+=(c=fgetc(f))==10;
printf("%i",i);

(please don't ever actually use this code)
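
For real use, the golfed loop above might be written out like this; it is a sketch with the hypothetical name `count_newlines_fgetc`. It keeps the same `fgetc` approach but initializes the counter, stops explicitly on `EOF`, and checks that `fopen` succeeded. Since stdio streams are buffered, reading one character per call is cheaper than it may look.

```c
#include <stdio.h>

/* Hypothetical helper: readable version of the golfed loop. Reads one
   character at a time and counts '\n' until end of file.
   Returns -1 if the file cannot be opened. */
long count_newlines_fgetc(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    long count = 0;
    int c; /* int, not char, so EOF is distinguishable from every byte value */
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            count++;
        }
    }
    fclose(f);
    return count;
}
```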

cobbal