ansaurus

Question

Assistance with building an inverted-index

Answer 1

+1 A:

I would use a single file to store and retrieve the serialized string, with JSON as the serialization format.

Put the data

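// Group the words under their shared two-character prefix.
$string = "bad barley base";
$data = explode(" ", $string);
$hashmap = [];
$hashmap["ba"] = $data;

// Serialize the whole map to a single file.
$jsonContent = json_encode($hashmap);
file_put_contents("a-z.txt", $jsonContent);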

Get the data

$jsonContent = file_get_contents("a-z.txt");
// Pass true so JSON objects decode to associative arrays.
$hashmap = json_decode($jsonContent, true);

// Look the prefix up directly instead of scanning every key.
$wordCount = isset($hashmap['ba']) ? count($hashmap['ba']) : 0;

Brant 2010-04-03 04:09:46

I am working with a 29 MB text file. Don't you think a single file containing json_encode($hashmap) would be inefficient?

tipu 2010-04-03 05:09:03

You could split it up so that each letter has its own file: a.txt, b.txt, c.txt. For searching, yes, a single file would be taxing. You could also write to a-z.txt only when a word is added. It really depends on what you're using the data for.
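A minimal sketch of the per-letter split, assuming words are grouped by their first two ASCII letters as in the answer above. The function names and the .json file naming are hypothetical, not something the thread specifies.

```php
<?php
// Hypothetical helpers: one JSON file per first letter (a.json, b.json, ...).

// Group words by first letter, then by two-character prefix.
function shardWords(array $words): array
{
    $shards = [];
    foreach ($words as $word) {
        $word = strtolower($word);
        // Skip anything that does not start with two ASCII letters,
        // so the first letter is always safe to use in a file name.
        if (!preg_match('/^[a-z]{2}/', $word)) {
            continue;
        }
        $shards[$word[0]][substr($word, 0, 2)][] = $word;
    }
    return $shards;
}

// Write each shard to its own file so a lookup decodes one small file.
function writeShards(array $shards, string $dir): void
{
    foreach ($shards as $letter => $prefixes) {
        file_put_contents("$dir/$letter.json", json_encode($prefixes));
    }
}

// Count the words stored under a two-character prefix.
function countWordsWithPrefix(string $prefix, string $dir): int
{
    $prefix = strtolower($prefix);
    if (!preg_match('/^[a-z]{2}$/', $prefix)) {
        return 0;
    }
    $path = $dir . '/' . $prefix[0] . '.json';
    if (!is_file($path)) {
        return 0;
    }
    $prefixes = json_decode(file_get_contents($path), true);
    return count($prefixes[$prefix] ?? []);
}

// Usage with a throwaway directory.
$dir = sys_get_temp_dir() . '/index_' . uniqid();
mkdir($dir);
writeShards(shardWords(['bad', 'barley', 'base', 'cat']), $dir);
$baCount = countWordsWithPrefix('ba', $dir);
```

With this layout a lookup for "ba" only reads and decodes a.json's sibling b.json, not the whole index, at the cost of managing up to 26 files.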

Brant 2010-04-03 05:46:25

Answer 2

A:

You didn't explain the problem you are trying to solve. I'm guessing you are trying to make a full text search engine, but you don't have document ids in your hashmap so I'm not sure how you are using the hashmap to find matching documents.
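To illustrate what is missing, here is a rough sketch of a hashmap that does carry document ids, mapping each word to the documents containing it. The documents, their ids, and the function name are made up for the example.

```php
<?php
// Hypothetical documents keyed by id; the ids and text are illustrative only.
$documents = [
    1 => 'bad barley',
    2 => 'base barley',
    3 => 'bad base',
];

// Build a map from each word to the ids of the documents that contain it.
function buildInvertedIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $docId => $text) {
        // Split on non-word characters and drop empty pieces.
        $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        // array_unique keeps a document from being listed twice for one word.
        foreach (array_unique($words) as $word) {
            $index[$word][] = $docId;
        }
    }
    return $index;
}

$index = buildInvertedIndex($documents);
$barleyDocs = $index['barley'] ?? [];
```

A search for a word then becomes a single array lookup that returns the matching document ids.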

Assuming you want a full text search engine, I would look into using a trie for the data structure. You should be able to fit everything in it without it growing too large. Nodes that match a word you want to index would contain the ids of the documents containing that word.
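As a rough sketch of that idea, assuming a byte-wise trie over lowercased words (PHP 7.4 or later): the class and method names here are hypothetical, and a real index would also need persistence and multi-byte handling.

```php
<?php
// Hypothetical trie for a word -> document-id index; all names are illustrative.
final class TrieNode
{
    /** @var array<string, TrieNode> child nodes keyed by a single byte */
    public array $children = [];
    /** @var array<int, true> ids of documents containing the word ending here */
    public array $docIds = [];
}

final class TrieIndex
{
    private TrieNode $root;

    public function __construct()
    {
        $this->root = new TrieNode();
    }

    // Record that $word occurs in document $docId.
    public function add(string $word, int $docId): void
    {
        if ($word === '') {
            return;
        }
        $node = $this->root;
        foreach (str_split(strtolower($word)) as $char) {
            // Create the child on first use, then descend into it.
            $node = $node->children[$char] ??= new TrieNode();
        }
        $node->docIds[$docId] = true;
    }

    // Return the sorted ids of documents containing exactly $word.
    public function lookup(string $word): array
    {
        $node = $this->find($word);
        if ($node === null) {
            return [];
        }
        $ids = array_keys($node->docIds);
        sort($ids);
        return $ids;
    }

    // Count the distinct indexed words that start with $prefix.
    public function countPrefix(string $prefix): int
    {
        $node = $this->find($prefix);
        return $node === null ? 0 : $this->countWords($node);
    }

    // Walk the trie along $key; null means no indexed word has that prefix.
    private function find(string $key): ?TrieNode
    {
        $node = $this->root;
        if ($key === '') {
            return $node;
        }
        foreach (str_split(strtolower($key)) as $char) {
            if (!isset($node->children[$char])) {
                return null;
            }
            $node = $node->children[$char];
        }
        return $node;
    }

    // A node counts as a word only if at least one document id is stored on it.
    private function countWords(TrieNode $node): int
    {
        $count = $node->docIds === [] ? 0 : 1;
        foreach ($node->children as $child) {
            $count += $this->countWords($child);
        }
        return $count;
    }
}

$index = new TrieIndex();
$index->add('bad', 1);
$index->add('barley', 1);
$index->add('base', 2);
$index->add('bad', 3);
```

Here `$index->lookup('bad')` yields the ids of documents 1 and 3, and `$index->countPrefix('ba')` counts the three distinct words sharing that prefix, which replaces the two-character hashmap buckets from the question.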

jshen 2010-04-09 17:13:37

You're absolutely correct in assuming I'm making a full text search engine. I'm taking a look at the trie data structure now, and it is so much more efficient than what I'm currently doing (which is what I described above). I'm going to implement it now, thanks!

tipu 2010-04-10 09:36:02

Go to http://www.ics.uci.edu/~chenli/pubs.html and look at the paper titled "Efficient Interactive Fuzzy Keyword Search".

jshen 2010-04-12 16:14:33
