views:

56

answers:

1

Which datastructure would you use in the place of X to have efficient merges, sorts and additions as described below?

#1 Possible solution: one HashMap to X -datastructure

Having a HashMap pointing from fileID to some datastructure linking word, wordCount and wordID may be a good solution. However, I have not found a way to implement it.


I am not allowed to use Postgres or any similar tool to keep my data neutralized. I want to have efficient merges, sorts and additions according to fileID, wordID or wordCount for the type below.


I have the type Words which has the field fileID that points to a list of words and to relating pieces of information:

The Type

class Words
===================================
fileID: int 
[list of words] : ArrayList
[list of wordCounts] : ArrayList
[list of wordIDs] : ArrayList

Example of the data in

fileID   word   wordCount   wordID
      instance1 of words
1        He     123         1111
1        llo    321         2
      instance2 of words
2        Van    213         666
2        cou    777         932

Example of needed merge

fileID     wordID                 fileID     wordID
1          2
1          3           wordID=2
2          2           ========>  1          2
2          3                      2          2 

I cannot see any usage of set-operations such as intersections here because order is needed.

Having about three HashMaps makes sorting difficult:

  1. from word to wordID in a given fileID
  2. from wordID to fileID
  3. from wordID to wordCount in a given fileID
+3  A: 

Why don't you use a class to store the word, word count and word id together? Then you would need a single list for each fileID. This would IMO at least simplify the operations.

class Word {
    private String text;
    private long count;
    private long id;
    // getters, setters
}

class Words {
    private int fileID;
    private List<Word> words;
    // getters, setters
}

This would be the skeleton, which already automatically resolves your mapping 3. Then you can add the needed additional mappings to Words and/or Word.

I don't understand from your description, whether the same word always have the same wordID, or can it have different IDs in different files; without this I can't move forward with the design idea. But I hope this so far helps you get over the stalemate :-)

Péter Török
This seems to be an excellent answer. However, I need to first make a few experiments for merges, additions and such to verify that its efficient.
Masi
Yes, the same word must always have the same wordID. It cannot have different IDs.
Masi