Here is an example of my input CSV file:

...
0.7,0.5,0.35,14.4,0.521838919218
0.7,0.5,0.35,14.4,0.521893472678
0.7,0.5,0.35,14.4,0.521948026139
0.7,0.5,0.35,14.4,0.522002579599
...

I need to select the first row whose last float is greater than a random number. My current implementation is very slow (the script repeats this lookup many times inside outer loops):

for line in foo:
   if float(line[-1]) > random.random():
      res = line
      break
...

How can I make this better and faster?

EDIT:

I was advised to use bisect for this task, but I don't know how to do it.

+3  A: 

The fastest approach is to use bisect (assuming the float list is ordered). You can do it like this:

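import bisect
import random

# Convert the last column to float once; the values must be in ascending order.
float_list = [float(line[-1]) for line in foo]

# bisect.bisect returns the index of the first value greater than the random number.
index = bisect.bisect(float_list, random.random())
if index < len(float_list):
    result = foo[index]  # foo must be an indexable sequence, e.g. a list of rows
else:
    result = None  # no row has a last value greater than the random number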

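The float list has to be sorted in ascending order for this to work. Build it once, outside your loops, and reuse it for every lookup; rebuilding it each time costs as much as the linear scan you are trying to avoid.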
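
To make the "build once, look up many times" idea concrete, here is a minimal sketch. The helper names build_keys and pick_row, the optional threshold parameter, and the sample rows are hypothetical and only for illustration; it assumes the rows are already sorted by their last column.

```python
import bisect
import random


def build_keys(rows):
    """Return the last column of each row as floats (assumed ascending)."""
    return [float(row[-1]) for row in rows]


def pick_row(rows, keys, threshold=None):
    """Return the first row whose last value is > threshold, or None."""
    if threshold is None:
        threshold = random.random()
    # bisect_right gives the index of the first key strictly greater
    # than threshold, in O(log n) time.
    index = bisect.bisect_right(keys, threshold)
    return rows[index] if index < len(rows) else None


# Hypothetical sample data in the same shape as the CSV rows in the question.
rows = [
    ["0.7", "0.5", "0.35", "14.4", "0.25"],
    ["0.7", "0.5", "0.35", "14.4", "0.50"],
    ["0.7", "0.5", "0.35", "14.4", "0.75"],
]
keys = build_keys(rows)  # build once, outside any loops

for _ in range(3):
    print(pick_row(rows, keys))  # each lookup draws a fresh random number
```

Passing an explicit threshold is only there to make the function easy to check by hand; in the original use case you would leave it out and let random.random() supply the value.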

Nadia Alramli
+1  A: 

You might be able to use an appropriate SQL query if you import the CSV file into SQLite. Python's standard library includes the sqlite3 module, which you can use to query the database.
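
A rough sketch of what that could look like, using an in-memory database. The table name samples, the column names a, b, c, d, p, the index name, and the sample rows are all assumptions made up for this example; in practice the rows would come from your CSV file (for instance via csv.reader). Note that ORDER BY p returns the row with the smallest qualifying value, which matches "the first row" only if the file is sorted by its last column.

```python
import sqlite3

# In-memory database; a file path would work the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a REAL, b REAL, c REAL, d REAL, p REAL)")
# An index on the lookup column lets SQLite avoid a full table scan.
conn.execute("CREATE INDEX idx_samples_p ON samples (p)")

# Hypothetical rows standing in for the parsed CSV data.
rows = [
    (0.7, 0.5, 0.35, 14.4, 0.25),
    (0.7, 0.5, 0.35, 14.4, 0.50),
    (0.7, 0.5, 0.35, 14.4, 0.75),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def pick_row(conn, threshold):
    """Return the row with the smallest p greater than threshold, or None."""
    cur = conn.execute(
        "SELECT a, b, c, d, p FROM samples WHERE p > ? ORDER BY p LIMIT 1",
        (threshold,),
    )
    return cur.fetchone()


print(pick_row(conn, 0.3))
```

Whether this beats an in-memory bisect lookup depends on your data size and how often you query, so it is worth timing both.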

Jason Baker