I'm writing a Cocoa app for Leopard that, given a directory of text files, will scan through them, looking for a search pattern (let's pretend they're source files and I just want to find C comments). It will then present the results to the user.

While I think I could certainly do this with Cocoa, it feels like it's really meant to be handed off to a scripting language. But which would be better for this task, and why? I'm thinking of going with Ruby (I own a book on it, but I've never had a good reason to learn it too well), but I'm certainly open to others (Perl obviously springs to mind).

The kind of searching I'll be doing isn't too advanced, but I would like to integrate this into my Cocoa app one way or another.

How should I best approach this?

+3  A: 

One braindead approach: fire off grep -l as an NSTask.

NSTask *task = [[NSTask alloc] init];
[task setLaunchPath: @"/usr/bin/grep"];
// -r recurses into the directory so that every file in it is searched
NSArray *args = [NSArray arrayWithObjects: @"-r", @"-I", @"-l", searchString, @"/path/to/textfiles", nil];
[task setArguments: args];
NSPipe *p = [NSPipe pipe];
[task setStandardOutput: p];
NSFileHandle *f = [p fileHandleForReading];
[task launch];

Then read the contents of the file handle and do whatever you like with the results. -I skips binary files, so only text files are searched; -l prints only the names of files that matched (each filename appears only once).
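
For comparison, here is a minimal sketch of the same "hand the search to grep and read back the file names" idea written in Python instead of Objective-C. The function name files_matching is hypothetical, and the sketch assumes a grep binary is available on the PATH; it is not the code from the answer above.

```python
import subprocess


def files_matching(pattern, directory):
    """Return the paths under `directory` whose text matches `pattern`."""
    result = subprocess.run(
        # -r recurses, -I skips binary files, -l prints only matching file names.
        # -e and -- stop a pattern or path starting with "-" being read as an option.
        ["grep", "-r", "-I", "-l", "-e", pattern, "--", directory],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 when something matched, 1 when nothing matched,
    # and a larger value when a real error occurred.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return sorted(result.stdout.splitlines())
```

Each line of grep's output is one file name, so splitting on newlines is all the parsing the caller has to do.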

Meredith L. Patterson
Alternatively, use the PCRE library (BSD license), which GNU grep can optionally use for Perl-style patterns (-P). Of course, it's somewhat harder to integrate a C library into a program than it is just to fire off a new process.
Adam Rosenfield
PCRE's fairly easy to integrate into Objective-C, at least, since Objective-C is a superset of C. OTOH, I'm a big fan of non-monolithic code, and discretizing separate tasks into separate processes is a major part of that. It's also trivial to run an NSTask in non-blocking mode, so that the user can continue to do stuff with the calling process rather than sitting on his thumbs till the results come back.
Meredith L. Patterson
I'm thinking this might be the best way to go. I don't need to do very complex searching, but I'll need to do a lot of it, quickly. Thanks for the answer.
jbrennan
+1  A: 

There is no "best" language. They all have tradeoffs. Though, if all you're doing is searching for patterns I doubt you'll write anything that is better than grep or awk.

If you're concerned about performance and want to write it yourself, your best bet is C. From a scripting language point of view, most will do fine (though likely noticeably slower than doing it in C). Personally I'd recommend Tcl, since it does regex very well and its handling of Unicode is for all intents and purposes completely transparent -- much better Unicode support than Python, for example.

Ruby, Python and Bash are all fine too, as are many other scripting languages. From an integration point of view, Tcl is very easy to integrate with other apps. Lua is also easy to integrate from what I've heard, though I personally don't see a compelling reason to choose it over Tcl from a technical standpoint. Lua has a lot of mind share these days though, if you want to pick a "hot" technology.

Personally I'd avoid Perl. I think its day has come and gone, though some people still swear by it. I think its syntax is a bit obtuse, and there are reasons it has a reputation for being a "write only" language.

Bryan Oakley
+3  A: 

If the searching isn't too advanced, then just do it yourself:

  • Scan the directory using -[NSFileManager contentsOfDirectoryAtPath:error:]

  • Read each file into a string using +[NSData dataWithContentsOfFile:] and -[NSString initWithData:encoding:] (consider which encoding you need, or just use MacRoman for ASCII searching, as you won't care what happens to the high-byte characters).

  • Search each string using -[NSString rangeOfString:] or a variant, or use RegexKit for regular expressions.

I doubt that code would be much harder than maintaining two different chunks of code in two languages and piping data between them.
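
The three steps above (list the directory, read each file into a string, search the string) can be sketched as follows. This is an illustration in Python, not the Cocoa calls named in the list: the function name search_directory is hypothetical, and latin-1 stands in for MacRoman as an encoding in which every byte decodes to some character.

```python
import os


def search_directory(directory, needle):
    """Map each file name directly inside `directory` to the offsets of `needle` in it."""
    if not needle:
        raise ValueError("needle must not be empty")
    hits = {}
    for name in sorted(os.listdir(directory)):      # step 1: scan the directory
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue                                # ignore subdirectories
        with open(path, "rb") as handle:            # step 2: read the raw bytes
            data = handle.read()
        # Assumption: latin-1 never fails to decode, so ASCII searches work
        # whatever the high bytes of the file happen to be.
        text = data.decode("latin-1")
        offsets = []
        start = text.find(needle)                   # step 3: plain substring search
        while start != -1:
            offsets.append(start)
            start = text.find(needle, start + len(needle))
        if offsets:
            hits[name] = offsets
    return hits
```

Files with no match are left out of the result, so an empty dictionary means nothing was found.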

Peter N Lewis
This would have enormous overhead if the text files are non-trivially long. I think the grep approach would be much faster and have lower overhead.
Dave DeLong
Sure, hence the "If the searching isn't too advanced". The OP said "given a directory of text files" and "The kind of searching I'll be doing isn't too advanced". If the intention is to do advanced searching, big files, lots of files, then a more complex solution would be worthwhile.
Peter N Lewis
+2  A: 

Maybe your example is fictional, but parsing C comments is not something regular expressions excel at. They do a good enough job 90% of the time, but it's easy to think of examples in the 10%:

[myString replaceOccurrencesOfString:@"/*" withString:@"*/"];

There is no comment there, and any claims your regular expression makes to the contrary are wrong.

[myString replaceOccurrencesOfString:@"/*" withString:@"//"]; /*Step 1 of converting winged comments to C99 single-line comments*/

There is a comment there, but it's much shorter than a hastily-built regular expression will think. In fact, there are actually two ways this sample can go wrong.

If you're not actually parsing C comments, then perhaps you can disregard this entire answer. But if you are, you're much better off with an actual parsing system, like lex/yacc or perhaps TDParseKit.
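
To make the difference concrete, here is a small hand-rolled scanner in Python that tracks whether it is inside a string or character literal before it treats /* or // as the start of a comment. It is only a sketch: the function name find_c_comments is hypothetical, it is not lex/yacc or TDParseKit, and it ignores details such as backslash line continuations and preprocessor directives.

```python
def find_c_comments(source):
    """Return the text of each comment in `source`, ignoring string and char literals."""
    comments = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if ch in "\"'":
            # Skip a string or character literal, honouring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                i += 2 if source[i] == "\\" else 1
            i += 1  # step past the closing quote
        elif two == "/*":
            # Block comment: runs to the next */ or to the end of the input.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            comments.append(source[i:end])
            i = end
        elif two == "//":
            # Single-line comment: runs to the end of the line.
            end = source.find("\n", i)
            end = n if end == -1 else end
            comments.append(source[i:end])
            i = end
        else:
            i += 1
    return comments
```

Run on the two samples above, it reports no comment for the first line and only the trailing /*Step 1 ...*/ comment for the second, because the /*, */ and // inside the string literals are skipped.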

Peter Hosey
Hmmm, good point. This is more of an experiment than anything else. But C comments seem to be "complex enough" to make me feel like I'm actually learning something. Thanks for your input.
jbrennan
A: 

Seems like the Spotlight API would be the way to go here. http://developer.apple.com/macosx/spotlight.html

The files you want to search are already going to be indexed, and the fact that applications can provide plugins means that Spotlight is going to be able to dig into files that you otherwise couldn't.

JimDusseau