views: 161
answers: 2

Hi, I have a very large CSV file containing only two fields (id, url). I want to do some indexing on the url field with Python. I know there are tools like Whoosh or PyLucene, but I can't get the examples to work. Can someone help me with this?

A: 

file.csv contents:

a,b
d,f
g,h

Python script that loads it all into one giant dictionary:

# Python 3.1
# Build one in-memory dict mapping id -> url (each line is assumed to be "id,url")
giant_dict = {key.strip(): url.strip()
              for key, url in (line.split(',') for line in open('file.csv', 'r'))}

print(giant_dict)
{'a': 'b', 'd': 'f', 'g': 'h'}
Hamish Grubijan
Dear lord, why are you parsing it yourself instead of using the CSV module??
moshez
The problem is that this file will be more than 5 GB, so I cannot load it into memory at once!
Hossein
What exactly are you trying to do? You can read the file line by line with this: for line in open('file.csv'). Also, why not just get 9 GB of RAM installed?
Hamish Grubijan
The URLs in this large file need to be compared with those in another large file, and for faster access I need to do some indexing on them.
Hossein
I still do not understand what you are trying to do. What if there is a match? What if there is no match? Describe the whole thing please. Indexing is no silver bullet.
Hamish Grubijan
@Hossein: Please add new facts to the question -- do not add new facts as comments to an answer.
S.Lott
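
A minimal sketch of the line-by-line reading suggested in the comments above, using the csv module instead of splitting by hand (process() is a placeholder for whatever per-row work is needed):

# Read the 5 GB file one row at a time; only the current row is held in memory.
import csv

def process(row_id, url):
    pass  # placeholder: compare, index, store, ...

with open('file.csv', newline='') as f:
    for row_id, url in csv.reader(f):  # each row is assumed to be exactly (id, url)
        process(row_id.strip(), url.strip())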
+1  A: 

PyLucene is very easy to work with, but as you haven't posted your example, I am not sure what problem you are facing.

Alternatively, when you have only key:value data, a better fit than PyLucene might be a DB like Berkeley DB (Python bindings: pybsddb). It works like a Python dictionary on disk and should be as fast as or faster than Lucene; you can try that.
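
For illustration, a sketch of that dict-like on-disk approach using the standard-library dbm module, which offers the same kind of interface (bsddb3/pybsddb would be used the same way for an actual Berkeley DB file); the file names here are placeholders:

# Build an on-disk id -> url index once, then reuse it for fast lookups.
import csv
import dbm

index = dbm.open('url_index.db', 'c')  # 'c': create the database file if it does not exist
with open('file.csv', newline='') as f:
    for row_id, url in csv.reader(f):  # each row is assumed to be exactly (id, url)
        index[row_id.strip()] = url.strip()  # written to disk, not kept in RAM
index.close()

# Later lookups reopen the index without rereading the whole CSV:
index = dbm.open('url_index.db', 'r')
print(index['a'])  # values come back as bytes, e.g. b'b'
index.close()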

Anurag Uniyal