What is the appropriate way of dealing with large text files in Objective-C? Let's say I need to read each line separately and want to treat each line as an NSString. What is the most efficient way of doing this?

One solution is using the NSString method:

+ (id)stringWithContentsOfFile:(NSString *)path 
      encoding:(NSStringEncoding)enc 
      error:(NSError **)error

and then split the lines with a newline separator, and then iterate over the elements in the array. However, this seems fairly inefficient. Is there no easy way to treat the file as a stream, enumerating over each line, instead of just reading it all in at once? Kinda like Java's java.io.BufferedReader.

A: 

This answer is NOT ObjC but C.

Since ObjC is 'C' based, why not use fgets?

And yes, I'm sure ObjC has its own method - I'm just not proficient enough yet to know what it is :)
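For illustration, here is a minimal sketch of the fgets idea in plain C. The helper name next_line is hypothetical, and the sketch assumes each line fits in the caller's buffer; a longer line simply comes back in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (at most size - 1 bytes)
   and cuts it at the first CR or LF, so "\n" and "\r\n" endings are both
   dropped. Returns 1 if something was read, 0 at end of file or on error. */
int next_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}

/* Typical use:
       char line[1024];
       while (next_line(fp, line, sizeof line)) {
           ... work with line ...
       }
*/
```

Only one line is held in memory at a time, which is the main attraction over reading the whole file first.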

KevinDTimm
If you don't know how to do it in Objective-C, then why say it's not the answer? There are plenty of reasons not to drop down to straight C if you can do it otherwise. For example, C functions handle char* but it takes a lot more work to read something else, such as different encodings. Also, he wants NSString objects. All told, rolling this yourself is not only more code, but also error-prone.
Quinn Taylor
I agree with you 100%, but I have found that (sometimes) it's better to get an answer that works quickly, implement it and then when a more correct alternative appears, utilize that. This is especially important when prototyping, giving the opportunity to get something to work and then progressing from there.
KevinDTimm
I just realized that it began "This answer" not "The answer". Doh! I agree, it's definitely better to have a hack that works than elegant code that doesn't. I didn't downvote you, but throwing out a guess w/o knowing what Objective-C may have probably isn't very helpful, either. Even so, making an effort is always better than someone that knows and doesn't help... ;-)
Quinn Taylor
+2  A: 

You can use NSInputStream which has a basic implementation for file streams. You can read bytes into a buffer (read:maxLength: method). You have to scan the buffer for newlines yourself.

Diederik Hoogenboom
+5  A: 

That's a great question. I think @Diederik has a good answer, although it's unfortunate that Cocoa doesn't have a mechanism for exactly what you want to do.

NSInputStream allows you to read chunks of N bytes (very similar to java.io.BufferedReader), but you have to convert it to an NSString on your own, then scan for newlines (or whatever other delimiter) and save any remaining characters for the next read, or read more characters if a newline hasn't been read yet. (NSFileHandle lets you read an NSData which you can then convert to an NSString, but it's essentially the same process.)
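The read, scan, and carry-over loop described above can be sketched in plain C. This is only an illustration under stated assumptions: fread stands in for the stream's read:maxLength:, and the LineReader struct, the CHUNK size, and the function names are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK = 4096 };  /* bytes requested per read; an arbitrary choice */

/* Hypothetical reader state: bytes already read but not yet returned. */
typedef struct {
    FILE  *fp;
    char  *pending;  /* leftover bytes from earlier reads */
    size_t len;      /* number of bytes in pending */
    size_t cap;      /* allocated size of pending */
} LineReader;

/* Copies n bytes into a fresh NUL-terminated string (NULL if malloc fails). */
static char *copy_bytes(const char *src, size_t n)
{
    char *s = malloc(n + 1);
    if (s != NULL) {
        if (n > 0)
            memcpy(s, src, n);
        s[n] = '\0';
    }
    return s;
}

/* Returns the next line without its '\n' as a malloc'd string, or NULL when
   no data is left (or memory runs out). The caller frees the result. */
char *line_reader_next(LineReader *r)
{
    for (;;) {
        /* 1. If the leftover bytes already hold a newline, return that line. */
        char *nl = r->len ? memchr(r->pending, '\n', r->len) : NULL;
        if (nl != NULL) {
            size_t n = (size_t)(nl - r->pending);
            char *line = copy_bytes(r->pending, n);
            if (line == NULL)
                return NULL;
            r->len -= n + 1;                      /* drop line and newline */
            memmove(r->pending, nl + 1, r->len);  /* keep the remainder */
            return line;
        }

        /* 2. No newline yet: make room and read another chunk. */
        if (r->cap - r->len < CHUNK) {
            char *bigger = realloc(r->pending, r->len + CHUNK);
            if (bigger == NULL)
                return NULL;
            r->pending = bigger;
            r->cap = r->len + CHUNK;
        }
        size_t got = fread(r->pending + r->len, 1, CHUNK, r->fp);
        if (got == 0) {
            /* 3. End of input: hand back an unterminated last line, if any. */
            if (r->len == 0)
                return NULL;
            char *line = copy_bytes(r->pending, r->len);
            if (line != NULL)
                r->len = 0;
            return line;
        }
        r->len += got;
    }
}
```

A reader would be set up as LineReader r = { fp, NULL, 0, 0 }; and r.pending freed once line_reader_next returns NULL. Splitting on the newline byte before converting to a string sidesteps one pitfall for UTF-8 input: a chunk boundary can cut a multi-byte character in half, but the byte 0x0A never occurs inside one, so each returned line is a whole sequence.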

Apple has a Stream Programming Guide that can help fill in the details, and this SO question may help as well if you're going to be dealing with uint8_t* buffers.

If you're going to be reading strings like this frequently (especially in different parts of your program) it would be a good idea to encapsulate this behavior in a class that can handle the details for you, or even subclassing NSInputStream (it's designed to be subclassed) and adding methods that allow you to read exactly what you want.

For the record, I think this would be a nice feature to add, and I'll be filing an enhancement request for something that makes this possible. :-)


Edit: Turns out this request has been around for a few years. There's a Radar dating from 2006 for this (rdar://4742914 for Apple-internal people).

Quinn Taylor
See Dave DeLong's comprehensive approach to this problem here: http://stackoverflow.com/questions/3707427#3711079
Quinn Taylor
+5  A: 

This should do the trick:

#include <stdio.h>
#import <Foundation/Foundation.h>

NSString *readLineAsNSString(FILE *file)
{
    char buffer[4096];

    // tune this capacity to your liking -- larger buffer sizes will be faster, but
    // use more memory
    NSMutableString *result = [NSMutableString stringWithCapacity:256];

    // Read up to 4095 non-newline characters at a time until the line ends
    int charsRead;
    do
    {
        charsRead = 0;
        if(fscanf(file, "%4095[^\n]%n", buffer, &charsRead) == 1)
            [result appendFormat:@"%s", buffer];   // format must be an NSString literal
        else
            break;   // empty line or end-of-file: nothing matched
    } while(charsRead == 4095);

    // Read and discard the newline that ended the line (no-op at end-of-file)
    fgetc(file);

    return result;
}

Use as follows:

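FILE *file = fopen("myfile", "r");
if(file != NULL)
{
    // feof() only turns true after a read has hit end-of-file, so a file that
    // ends with a newline yields one final empty string
    while(!feof(file))
    {
        NSString *line = readLineAsNSString(file);
        // do stuff with line; line is autoreleased, so you should NOT release it (unless you also retain it beforehand)
    }
    fclose(file);
}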

This code reads non-newline characters from the file, up to 4095 at a time. If you have a line that is longer than 4095 characters, it keeps reading until it hits a newline or end-of-file.

Note: I have not tested this code. Please test it before using it.

Adam Rosenfield
Note that the format string has to be an NSString literal: write [result appendFormat:@"%s", buffer]; rather than [result appendFormat:"%s", buffer];
Codezy
A: 

The appropriate way to read text files in Cocoa/Objective-C is documented in Apple's String Programming Guide. The section for reading and writing files should be just what you're after.

PS: What's a "line"? Two sections of a string separated by "\n"? Or "\r"? Or "\r\n"? Or maybe you're actually after paragraphs? The previously mentioned guide also includes a section on splitting a string into lines or paragraphs. (This section is called "Paragraphs and Line Breaks", and is linked to in the left-hand-side menu of the page I pointed to above. Unfortunately this site doesn't allow me to post more than one URL as I'm not a trustworthy user yet.)

To paraphrase Knuth: premature optimisation is the root of all evil. Don't simply assume that "reading the whole file into memory" is slow. Have you benchmarked it? Do you know that it actually reads the whole file into memory? Maybe it simply returns a proxy object and keeps reading behind the scenes as you consume the string? (Disclaimer: I have no idea if NSString actually does this. It conceivably could.) The point is: first go with the documented way of doing things. Then, if benchmarks show that this doesn't have the performance you desire, optimise.

Stig Brautaset
+2  A: 

Mac OS X is Unix and Objective-C is a superset of C, so you can just use old-school fopen and fgets from <stdio.h>. It's guaranteed to work.

[NSString stringWithUTF8String:buf] will convert a C string to an NSString. There are also methods for creating strings in other encodings and for creating them without copying.
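One wrinkle with fgets is that a fixed buffer cuts long lines into pieces. The sketch below, with the hypothetical helper name read_whole_line, grows a heap buffer until it holds the complete line. It assumes the text contains no embedded NUL bytes and does minimal error handling; the returned C string is the kind of buf that could then be passed to stringWithUTF8String:.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the next line as a malloc'd C string without
   its trailing '\n' (a preceding '\r' is left in place), or NULL at end of
   file or if memory runs out. The caller frees the result. */
char *read_whole_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';   /* complete line: drop the newline */
            return line;
        }
        if (len + 1 == cap) {       /* buffer is full: double it and go on */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
    }

    if (len == 0) {                 /* nothing was read: end of file */
        free(line);
        return NULL;
    }
    return line;                    /* last line had no trailing newline */
}
```

Each call allocates a new string, so for very large files it may be worth reusing one buffer across calls instead.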

porneL
A: 

Use this script; it works great. Suppose the file contains:

First: abhi,Last: dhiman
First: sanjay,Last: dhiman

NSString *path = @"/Users/xxx/Desktop/names.txt";
NSError *error = nil;
NSString *stringFromFileAtPath = [NSString stringWithContentsOfFile:path
                                                           encoding:NSUTF8StringEncoding
                                                              error:&error];
if (stringFromFileAtPath == nil) {
    // Reading failed; error describes why
    NSLog(@"Error reading file at %@\n%@", path, [error localizedFailureReason]);
}
NSLog(@"Contents:%@", stringFromFileAtPath);

abhi