views:

241

answers:

3

How can you make the efficient many-to-many -relation from fileID to Words and from word to fileIDs without database -tools like Postgres in Java?

I have the following classes. The relation from fileID to words is cheap, but not the reverse, since I need three for -loops for it.

alt text

My solution is not apparently efficient. Other options may be to create an extra class that have word as an ID with the ArrayList of fileIDs.

Reply to JacobM's answer

The relevant part of MyFile's constructors is:

            /**
             * Synopsis of data in wordToWordConutInFile.txt:
             * fileID|wordID|wordCount
             *
             * Synopsis of the data in the file wordToWordID.txt:
             * word|wordID
             **/        


    /**
     * Getting words by getting first wordIDs from wordToWordCountInFile.txt and then words in wordToWordID.txt.
     */
    InputStream in2 = new FileInputStream("/home/dev/wordToWordCountInFile.txt");
    BufferedReader fi2 = new BufferedReader(new InputStreamReader(in2));

    ArrayList<Integer> wordIDs = new ArrayList<Integer>();
    String line = null;
    while ((line = fi2.readLine()) != null) {
        if ((new Integer(line.split("|")[0]) == currentFileID)) {
            wordIDs.add(new Integer(line.split("|")[6]));
        }
    }
    in2.close();

    // Getting now the words by wordIDs.
    InputStream in3 = new FileInputStream("/home/dev/wordToWordID.txt");
    BufferedReader fi3 = new BufferedReader(new InputStreamReader(in3));

    line = null;
    while ((line = fi3.readLine()) != null) {
        for (Integer wordID : wordIDs) {
            if (wordID == (new Integer(line.split("|")[1]))) {
                this.words.add(new Word(new String(line.split("|")[0]), fileID));
                break;
            }
        }
    }
    in3.close();

    this.words.addAll(words);

The constructor of Word is at the paste.

+1  A: 

Wouldn't a more efficient approach be to assign the link from Word to MyFile at the point that you know the Word is in the File? That is to say, how do you build the list of Words in the MyFile object? If you're reading the words in to the MyFile out of, say, a file on the filesystem, than as you read in each word, you assign its MyFile to the current file.

//within MyFile constructor or setter for Words
while (//there's another word to add) {
   Word newWord = new Word(//read word from file);
   words.add(newWord);
   newWord.setMyFile(this);
}

This is akin to the typical way to manage a bidirectional parent-child relationship:

//in Parent
public void addChild(Child child) {
   myChildren.add(child);
   child.setParent(this);
}

It might help if you show us how you build the MyFile object.

Edited after you added the code that builds the list of Words:

OK, so having seen the code that builds your Words, I don't think setting up the relationship is the source of your inefficiencies. It looks like you are setting up the relationship in exactly the way I suggested (as you add each word, you give that word the fileID of the corresponding file).

It looks like the source of your inefficiencies are that, for each word, you have to match it up with various things that you currently have in a set of files (e.g. WordToWordId). So for every word you have to loop through every line of that file, and find the match. This is certainly inefficient.

The better approach is to have those pairings in memory in a HashMap, initialized at startup. That way, if you have a particular word and need the corresponding ID, or vice versa, you look them up in your HashMap, which is a constant-time operation. Similarly, for each word, you are looping through every file; again, do that loop ONCE, and store the result in a HashMap. Then lookups become constant time.

JacobM
I added the consructors of MyFile to my question.
Masi
OK, I see the constructor, but I still don't see where the list of words gets populated. That's what I'm interested in.
JacobM
Thank you for pointing that out! I added the relevant part of the constructor to my question.
Masi
+1  A: 

Both classes should override hashCode and equals. Thus you will decide what is equal.

Then you will create a set in each of your classes.

public class MyFile implements Comparable<MyFile> {
    //your fields here
    Set<Word> words = new HashSet<Word>(0);
    //Remember to override hashCode and equals
}

public class Word implements Comparable<Word> {
    //your fields here
    Set<MyFile> words = new HashSet<MyFile>(0);
    //Remember to override hashCode and equals
}

In your sets now you will have all the MyFiles.words and otherway around, all the Words.myFile

Shervin
This can also be used in conjuction with what @JacobM said
Shervin
A: 

I think you want that the file know it's words and the words know the files where it is used.

public class File {

private List<Word> words;
public File(){
words=new Vector<Word>();
}

/**
*The method add  word to word list.
**/
public addWord(Word word){
this.words.add(word);
word.addFile(this);
}
}
public class Word{
List<File> files;
public addFile(File file){
this.files.add(file);
}
}

or vice versa... but you should question GRASP Design pattern.Maybe your data type is wrong (I dont say wrong because itis your desing,so i respect).

ibrahimyilmaz
Yes, a typo from my part. I have fixed it now
Shervin