Right, I'm iterating through a large binary file

I need to minimise the time of this loop:

import math
import numpy as np

def NB2(self, ID_LEN):
    # read the 24-byte block header (see dTypes.NB_HDR below)
    r1=np.fromfile(ReadFile.fid,dTypes.NB_HDR,1)
    num_receivers=r1[0][0]
    num_channels=r1[0][1]
    num_samples=r1[0][5]

    blockReturn = np.zeros((num_samples,num_receivers,num_channels))

    # the hot loop: one two-value read from disk per sample
    for rec in range(0,num_receivers):
        for chl in range(0,num_channels):
            for smpl in range(0,num_samples):
                r2_iq=np.fromfile(ReadFile.fid,np.int16,2)
                blockReturn[smpl,rec,chl] = np.sqrt(math.fabs(r2_iq[0])*math.fabs(r2_iq[0]) + math.fabs(r2_iq[1])*math.fabs(r2_iq[1]))

    return blockReturn

So, what's going on is as follows: r1 is the header of the block; dTypes.NB_HDR is a numpy dtype I made:

NB_HDR = np.dtype([('f3', np.uint32), ('f4', np.uint32), ('f5', np.uint32), ('f6', np.int32), ('f7', np.int32), ('f8', np.uint32)])

That gets all the information about the forthcoming data block, and nicely puts us in the right position within the file (the start of the data block!).
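
As an aside, because NB_HDR is a structured dtype, the same fields can be pulled out by name rather than positional index, which makes the unpacking self-documenting (a minimal sketch; the field names are the ones defined above):

r1 = np.fromfile(ReadFile.fid, dTypes.NB_HDR, 1)
num_receivers = int(r1['f3'][0])   # was r1[0][0]
num_channels = int(r1['f4'][0])    # was r1[0][1]
num_samples = int(r1['f8'][0])     # was r1[0][5]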

In this data block there are 4096 samples per channel, 4 channels per receiver, and 9 receivers.

So num_receivers, num_channels, num_samples will always be the same (at the moment anyway), but as you can see this is a fairly large amount of data. Each 'sample' is a pair of int16 values that I want to find the magnitude of (hence Pythagoras).
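
For scale, the per-block byte count works out as follows (a quick sanity check; the 24-byte header follows from the six 4-byte fields in NB_HDR):

HEADER_BYTES = 6 * 4                        # six 32-bit fields in NB_HDR
SAMPLE_BYTES = 2 * 2                        # one I/Q pair of int16 values
data_bytes = 9 * 4 * 4096 * SAMPLE_BYTES    # receivers * channels * samples
block_bytes = HEADER_BYTES + data_bytes     # 589,848 bytes per block
print(block_bytes * 20900 / 1e9)            # ~12.3 GB, matching the file size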

This NB2 code is executed for each 'Block' in the file. For a 12GB file (which is how big they are) there are about 20,900 Blocks, and I've got to iterate through 1000 of these files (so, 12TB overall). Any speed advantage, even if it's milliseconds, would be massively appreciated.

EDIT: Actually it might be of help to know how I'm moving around inside the file. I have a function as follows:

def navigateTo(self, blockNum, indexNum):
    # seek to a pre-computed absolute offset (whence=0 means from file start)
    ReadFile.fid.seek(ReadFile.fileIndex[blockNum][indexNum], 0)
    ReadFile.currentBlock = blockNum
    ReadFile.index = indexNum

Before I run all this code I scan the file and build a list of index locations in ReadFile.fileIndex, which I browse using this function before 'seeking' to the absolute location - is this efficient?
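
Roughly, the scanning pass does something like this (a simplified sketch, not my exact code - the one-offset-per-block layout here is just illustrative):

def buildIndex(self):
    # record the absolute offset of each block header, then skip the
    # fixed-size I/Q data that follows it (4 bytes per sample pair)
    ReadFile.fileIndex = []
    while True:
        offset = ReadFile.fid.tell()
        hdr = np.fromfile(ReadFile.fid, dTypes.NB_HDR, 1)
        if hdr.size == 0:
            break                                  # end of file
        n_rec, n_chl, n_smp = hdr[0][0], hdr[0][1], hdr[0][5]
        ReadFile.fileIndex.append([offset])
        ReadFile.fid.seek(int(n_rec * n_chl * n_smp) * 4, 1)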

Cheers

A: 

This is more of an observation than a solution, but porting that function to C++ and loading it in with the Python C API would get you a lot of speed gain to begin with, before any loop optimization.

Xorlev
I don't know C++ at all I'm afraid (embarrassing, I know). Any ideas on how you would do it? I imagine 3-dimensional arrays, alongside binary extraction directly to int16s without bit-swapping and all that low-level nastiness, aren't too easy?
Duncan Tait
I don't think it's as bad as you'd think. That being said, I'm having a hard time envisioning your data structure, as I'm not the most fluent in Python. With the ifstream::seekg() function you can grab your numbers in sequence, depending on the number of bytes needed, and cast and store them in vectors.
Xorlev
@Duncan Tait: You have absolutely no reason to be embarrassed.
John Machin
+1  A: 

I'd try to use as few loops and as many constants as possible. Everything that can be done in a linear fashion should be done that way. If values don't change, use constants to reduce lookups and such, because those eat up CPU cycles.

This is from a theoretical point of view ;-)
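
In the code above, that would mean binding the attribute lookups to locals once, outside the hot loop (a minimal sketch of the idea; the gain is small next to vectorizing, but it is free):

# each ReadFile.fid / np.fromfile lookup inside the loop costs cycles,
# so resolve them once before the loop starts
fid = ReadFile.fid
fromfile = np.fromfile
int16 = np.int16
for rec in range(num_receivers):
    for chl in range(num_channels):
        for smpl in range(num_samples):
            r2_iq = fromfile(fid, int16, 2)
            ...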

If possible, use highly optimised libraries. I don't exactly know what you are trying to achieve, but I'd rather use an existing FFT lib than write it myself :>

One more thing: http://en.wikipedia.org/wiki/Big_O_notation (can be an eye-opener)

alex
+3  A: 
import numpy as np
def NB2(self, ID_LEN):
    r1=np.fromfile(ReadFile.fid,dTypes.NB_HDR,1)
    num_receivers=r1[0][0]
    num_channels=r1[0][1]
    num_samples=r1[0][5]

    # first, match your array bounds to the way you are walking the file
    blockReturn = np.zeros((num_receivers,num_channels,num_samples))

    for rec in range(0,num_receivers):
        for chl in range(0,num_channels):
            # second, read in all the samples at once if you have enough memory
            r2_iq=np.fromfile(ReadFile.fid,np.int16,2*num_samples)
            r2_iq.shape = (-1,2) # tell numpy that it is an array of two values

            # widen before squaring: int16*int16 overflows for values over 181
            r2_iq = r2_iq.astype(np.float64)

            # create dot product vector by squaring data elementwise, and then
            # adding those elements together.  Result is of length num_samples
            r2_iq = r2_iq * r2_iq
            r2_iq = r2_iq[:,0] + r2_iq[:,1]
            # get the distance by performing the square root "into" blockReturn
            np.sqrt(r2_iq, out=blockReturn[rec,chl,:])

    return blockReturn

This should help your performance. Two main numpy ideas are at work here. First, your result array's dimensions should match the order in which your loops walk the file, for memory locality.
Second, numpy is FAST. I've beaten hand-coded C with numpy, simply because it uses LAPACK and vector acceleration. However, to get that power you have to let it manipulate more data at a time. That is why your sample loop has been collapsed into one large read of all the samples for a given receiver and channel. Then use the supreme vector powers of numpy to calculate your magnitude by dot product.

There is a little more optimization to be had in the magnitude calculation, but numpy recycles buffers for you, making it less important than you might think. I hope this helps!
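
If you want to chase that last bit anyway, it would look something like this (a sketch using in-place ufunc calls to avoid the temporary arrays; r2_iq here is the freshly read int16 array, and blockReturn and the loop variables are as in the code above):

# square in place and accumulate into the first column, so no extra
# temporaries are allocated per receiver/channel pass
iq = r2_iq.astype(np.float64)          # widen once (int16 overflows when squared)
np.multiply(iq, iq, out=iq)            # elementwise square, in place
mag_sq = iq[:, 0]                      # a view, not a copy
np.add(mag_sq, iq[:, 1], out=mag_sq)   # i^2 + q^2
np.sqrt(mag_sq, out=blockReturn[rec, chl, :])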

Shane Holloway
Thanks a lot for this in-depth answer - I tried out both this and the one below, and that one was slightly faster.
Duncan Tait
+1  A: 

Most importantly, you shouldn't do file access at the lowest level of a triple nested loop, whether you do this in C or Python. You've got to read in large chunks of data at a time.

So to speed this up, read in large chunks of data at a time, and process that data using numpy indexing (that is, vectorize your code). This is particularly easy in your case since all your data is int16. Just read in big chunks of data, reshape the data into an array that reflects the (receiver, channel, sample) structure, and then use the appropriate indexing to multiply and add things for Pythagoras, and the 'sum' command to add up the terms in the resulting array.
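
A minimal sketch of that approach (assuming the whole block fits in memory, and widening to float before squaring so the int16 values don't overflow):

# one read for the whole block, then pure numpy from there on
count = num_receivers * num_channels * num_samples * 2
chunk = np.fromfile(ReadFile.fid, np.int16, count).astype(np.float64)
chunk = chunk.reshape(num_receivers, num_channels, num_samples, 2)
block = np.sqrt((chunk ** 2).sum(axis=-1))   # Pythagoras over each I/Q pair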

tom10
+2  A: 

Because you know the length of a block after you read the header, read the whole block at once. Then reshape the array (very fast, it only affects metadata) and use the np.hypot ufunc:

blockData = np.fromfile(ReadFile.fid, np.int16, num_receivers*num_channels*num_samples*2)
blockData = blockData.reshape((num_receivers, num_channels, num_samples, 2))
# np.hypot computes sqrt(i**2 + q**2) in floating point, so no int16 overflow
return np.hypot(blockData[:,:,:,0], blockData[:,:,:,1])

On my machine it runs in 11ms per block.
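
Dropped into the original method, the whole thing looks something like this (a sketch under the question's assumptions - dTypes.NB_HDR and ReadFile.fid as defined there):

def NB2(self, ID_LEN):
    # the header read leaves the file positioned at the start of the data
    r1 = np.fromfile(ReadFile.fid, dTypes.NB_HDR, 1)
    num_receivers = int(r1[0][0])
    num_channels = int(r1[0][1])
    num_samples = int(r1[0][5])

    blockData = np.fromfile(ReadFile.fid, np.int16,
                            num_receivers*num_channels*num_samples*2)
    blockData = blockData.reshape((num_receivers, num_channels, num_samples, 2))
    return np.hypot(blockData[:,:,:,0], blockData[:,:,:,1])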

Ants Aasma
This is an awesome solution, if you have enough memory to load all the receivers' channels into memory at the same time.
Shane Holloway
Awesome, literally, thanks! Doing it in under 10ms per block, that's a factor of 10 improvement!
Duncan Tait