Hello, I am working on an Information Retrieval project. I have built a full inverted index using Hadoop/Python. Hadoop outputs the index as (word, documentlist) pairs, which are written to a file. For quick access, I have built a dictionary (hash table) from that file. My question is: how can I store such an index on disk so that lookups are still fast? At present I store the dictionary with Python's pickle module and load it back from there, but this seems to bring the whole index into memory at once (or does it?). Please suggest an efficient way of storing and searching through the index.
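For reference, my current storage step looks roughly like the minimal sketch below. The toy words, document IDs and the file name `index.pkl` are hypothetical. As far as I understand, `pickle.load` rebuilds the complete object in a single call, so every posting list gets deserialized, not just the one being queried.

```python
import os
import pickle
import tempfile

# Hypothetical toy index in the nested-dict shape I use:
# {word: {doc_id: [positions, ...], ...}, ...}
index = {
    "hadoop": {"doc1": [0, 17], "doc3": [4]},
    "python": {"doc2": [3], "doc3": [9, 12]},
}

# Write the whole dictionary to disk in one go (file name is hypothetical).
path = os.path.join(tempfile.gettempdir(), "index.pkl")
with open(path, "wb") as f:
    pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back: pickle.load reconstructs the entire nested dict,
# so all posting lists end up in memory even if only one word is needed.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded["python"].keys()))
```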
My dictionary structure is as follows (using nested dictionaries):
{word: {doc1: [locations], doc2: [locations], ...}}
so that I can get the documents containing a word with dictionary[word].keys(), and so on.
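The lookups I do on this structure are roughly the following. This is a minimal sketch on a hypothetical toy index; the helper names (`docs_for`, `positions`, `docs_with_all`) are illustrative only, and the AND query via set intersection is just one example of the "and so on".

```python
# Hypothetical sample index with the structure {word: {doc: [locations]}}
index = {
    "hadoop": {"doc1": [0, 17], "doc3": [4]},
    "python": {"doc2": [3], "doc3": [9, 12]},
}

def docs_for(index, word):
    """Set of documents containing `word` (empty set if the word is unknown)."""
    return set(index.get(word, {}).keys())

def positions(index, word, doc):
    """Locations of `word` inside `doc` (empty list if absent)."""
    return index.get(word, {}).get(doc, [])

def docs_with_all(index, words):
    """Documents containing every word in `words` (a simple AND query)."""
    sets = [docs_for(index, w) for w in words]
    return set.intersection(*sets) if sets else set()

print(sorted(docs_for(index, "hadoop")))
print(positions(index, "python", "doc3"))
print(sorted(docs_with_all(index, ["hadoop", "python"])))
```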