views: 99
answers: 3

Hi there,

I need to search a pretty large text file for a particular string. What's the best way to go about doing that?

Thanks

Edit: I should have given more detail, my bad. It's a build log with about 5000 lines of text. Using regex shouldn't cause any problems, should it? I'll go ahead and read blocks of lines and use the simple find. Thanks, guys.

+2  A: 

You could do a simple find:

# Read the whole file into one string and search it.
with open('file.txt', 'r') as f:
    contents = f.read()
answer = contents.find('string')  # index of the first match, or -1 if absent

A simple find will be quite a bit quicker than a regex, if you can get away with it.
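
For comparison, a rough way to time both approaches yourself (a minimal sketch; the test string and counts are placeholders, and the numbers will vary with your data):

import re
import timeit

# Hypothetical setup: a large in-memory string with the needle at the end.
text = 'x' * 10000000 + 'string'

print(timeit.timeit(lambda: text.find('string'), number=10))
print(timeit.timeit(lambda: re.search('string', text), number=10))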

JoshD
+1  A: 

If there is no way to tell where the string will be (first half, second half, etc.), then there is really no optimized way to do the search other than the built-in find. You can, however, reduce I/O time and memory consumption by not reading the file all in one shot, but in 4 kB blocks (usually the size of a hard-disk block). This will not make the search faster, unless the string is in the first part of the file, but in all cases it will reduce memory consumption, which might be a good idea if the file is huge.
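
A minimal sketch of that chunked approach (the function and file names here are hypothetical; note the overlap handling so a match straddling two chunks isn't missed):

def find_in_file(path, needle, chunk_size=4096):
    """Return True if needle occurs in the file at path.

    Reads chunk_size characters at a time and keeps the last
    len(needle) - 1 characters as an overlap, so a match spanning
    two chunks is still found.
    """
    overlap = len(needle) - 1
    tail = ''
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            if needle in tail + chunk:
                return True
            tail = (tail + chunk)[-overlap:] if overlap else ''

# Hypothetical usage:
# find_in_file('build.log', 'error:')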

Michele Balistreri
Depends on how huge. If it's around 1 MB, I'd expect this way to be slower than loading the whole thing, because of the latency of each read across all 256 blocks. If anything, I'd prefer a larger chunk size for each read. Perhaps a test...
JoshD
The latency may indeed be higher, but not necessarily; the important thing is to read a multiple of the physical block size so that no read data is wasted. I wouldn't call a 1 MB text file "huge", though; I was thinking of something along the lines of a few hundred megabytes. I agree with you 100% that if the file is less than, say, 10 or even 50 MB, it's not worth reading it in chunks.
Michele Balistreri
+5  A: 

If it is a "pretty large" file, access the lines sequentially rather than reading the whole file into memory:

with open('largeFile', 'r') as inF:
    for line in inF:
        if 'myString' in line:
            do_something(line)  # placeholder: handle the matching line
eumiro