views: 140

answers: 5
Hello, I am working on a project on Information Retrieval. I have built a full inverted index using Hadoop/Python. Hadoop outputs the index as (word, documentlist) pairs, which are written to a file. For quick access, I have created a dictionary (hashtable) from that file. My question is: how do I store such an index on disk so that it still has quick access time? At present I am storing the dictionary using Python's pickle module and loading from it, but that brings the whole index into memory at once (or does it?). Please suggest an efficient way of storing and searching through the index.

My dictionary structure is as follows (using nested dictionaries):

{word : {doc1:[locations], doc2:[locations], ....}}

so that I can get the documents containing a word with dictionary[word].keys(), and so on.

+2  A: 

Use shelve (the standard library's persistent, dictionary-like storage).

At present I am storing the dictionary using python pickle module and loading from it but it brings the whole of index into memory at once (or does it?).

Yes, it does bring it all in.

Is that a problem? If it's not an actual problem, then stick with it.

If it's a problem, what kind of problem do you have? Too slow? Too fast? Too colorful? Too much memory used? What problem do you have?
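The shelve suggestion above can be sketched as follows. This is a minimal illustration, not the questioner's actual index: the word keys and posting lists are made-up sample data, and the file name `inverted_index` is an assumption. The point is that shelve keeps the index on disk and loads only the postings for the key you look up, rather than unpickling the whole dictionary.

```python
# Minimal sketch: persisting an inverted index with the standard-library
# shelve module. Keys and postings below are made-up sample data.
import shelve

# Write: each word is a key mapped to its {doc: [locations]} postings.
with shelve.open("inverted_index") as db:
    db["hadoop"] = {"doc1": [4, 17], "doc2": [2]}
    db["python"] = {"doc1": [9]}

# Read: a single lookup; the rest of the index stays on disk.
with shelve.open("inverted_index") as db:
    postings = db["hadoop"]

docs = sorted(postings.keys())  # documents containing "hadoop"
```

Note that shelve pickles each value individually, so a lookup costs one key access plus one unpickle, independent of the total index size.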

S.Lott
Thank you all for your kind support ... I am supposed to build a text search engine that uses an inverted index. So far, I have built a full inverted index. The problem I think I may have is that as the index size grows, bringing it all in would probably consume too much memory(???). At present I am only working on a prototype with reduced functionality, so the index size is trivial, but when it's finished it will probably be a large file. That's the problem that I have. –
Siddharth Sharma
@Siddharth Sharma: "consume too much memory(???)". If you don't know, then don't start by trying to optimize it. First -- build using a simple dictionary until you can **prove** that it uses too much memory. Then -- and only after you have **proof** -- switch to shelve. Later, you'll have to switch to a bloom filter. But only after you can **prove** that shelve is too slow.
S.Lott
@S.Lott: thanks ... yes, Knuth said "premature optimization is the root of all evil" .... will try to keep it in mind.
Siddharth Sharma
@Siddharth Sharma: Keep your goals in mind. (1) Works. (2) Optimal use of resources. You can't do #2 until you've done #1.
S.Lott
A: 

Just store it in a string like this:

<entry1>,<entry2>,<entry3>,...,<entryN>

If <entry*> contains the ',' character, use some other delimiter such as '\t'. This is smaller than an equivalent pickled string.

If you want to load it, just do:

L = s.split(delimiter)
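A round trip of this scheme might look like the following sketch; the entry strings are made-up examples, and the `doc:locations` formatting inside each entry is an assumption (the answer leaves the per-entry format open):

```python
# Sketch of the delimiter-joined representation: join entries on write,
# split on read. Entries are made-up samples; '\t' is the delimiter
# because the entries themselves contain ','.
entries = ["doc1:4,17", "doc2:2"]

s = "\t".join(entries)   # this string is what you would write to disk

L = s.split("\t")        # loading is a single split
```

You would still need to parse each entry back into its document and locations, which is the main cost of this approach compared with pickle.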
OTZ
A: 

You could store the repr() of the dictionary and use that to re-create it.
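As a sketch of this round trip (with a made-up index), the safe way to re-create the dictionary from its repr() string is ast.literal_eval, which parses Python literals without the risks of eval():

```python
# Sketch: serialize the nested dict as its repr() string, then parse it
# back with ast.literal_eval. The index contents are made-up sample data.
import ast

index = {"hadoop": {"doc1": [4, 17], "doc2": [2]}}

text = repr(index)                 # this string is what you write to disk
restored = ast.literal_eval(text)  # safe parse back into a dict
```

Like pickle, this still loads the entire index into memory at once, so it only changes the on-disk format, not the memory profile.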

ikanobori
That'll be space-inefficient. My solution takes less space.
OTZ
Why does space matter? What problem does the original question actually have? Time? Space? Licensing fees for third party software? There's no hint as to what to optimize.
S.Lott
A: 

If it's taking a long time to load or using too much memory, you might need a database. There are many you might use; I would probably start with SQLite. Then your problem is "reduced" ;-) to simply formulating the right query to get what you need out of the database. This way you will only load what you need.
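A minimal sketch of the SQLite approach with the standard-library sqlite3 module follows. The table layout (one row per word/document/location) and all names and sample data are assumptions for illustration; an index on the word column is what makes single-word lookups fast.

```python
# Sketch: store postings as (word, doc, location) rows in SQLite, so a
# query loads only the rows for one word. Schema and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent index
conn.execute("CREATE TABLE postings (word TEXT, doc TEXT, location INTEGER)")
conn.execute("CREATE INDEX idx_word ON postings (word)")
conn.executemany(
    "INSERT INTO postings VALUES (?, ?, ?)",
    [("hadoop", "doc1", 4), ("hadoop", "doc1", 17), ("hadoop", "doc2", 2)],
)

# Fetch only the documents containing one word:
rows = conn.execute(
    "SELECT DISTINCT doc FROM postings WHERE word = ?", ("hadoop",)
)
docs = sorted(row[0] for row in rows)
```

With a file-backed database instead of ":memory:", only the queried pages are read from disk, which directly addresses the "whole index in memory" concern.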

kindall
A: 

I would use Lucene. Why reinvent the wheel?

Jay Askren