Hi all, I moved from PHP to Java and am not yet familiar with what Java can do. I have a couple of questions.

1 - I am reading a directory that contains more than 500 files, and I am storing the file names and contents in a HashMap. Is that good for speed (performance)? If not, which other data structure could I use?

2 - I use that HashMap in several classes. Each time I create an object of the class mentioned above and call its function, I think it processes all the files again and stores them in the HashMap, which kills the speed. Question: is there any way to store the HashMap globally, so that I can access its data anywhere in the project without calling that function again and again?

3 - I am using the JAMA Java library to calculate an SVD, and it took over an hour on a matrix with 500 columns and 600 rows. Any guess why?

Thanks

P.S: I am using Eclipse

EDITED: Is it better to keep all the file data in memory, or to read it again each time it is needed? Which one is faster? By the way, I need this data in every class :)

A: 

You'll find that PHP, being a true dynamic language, is more powerful than Java, which is toy-like in its simplicity.

To answer your question:

  1. Do you really need to hold a copy of the files in a hashmap? Can you just iterate through each file name and act on it?

A hashmap is OK, if you need access to the key and the value. But do you really need access to the key? Isn't this just a list of file names? An array may suit you better.

  2. You should avoid globals. At least create a new class to manage reading the directory, storing the file names, and controlling access to it.
ראובן
Yes, I want to keep all the file names and contents, because there are several steps (stopper, stemmer, dictionary creation), so I can't read the whole directory again and again. I couldn't understand your 2nd answer. Can you please point me to an example or tutorial about it?
Can we try and keep the flame-baiting to a minimum?
aperkins
Wow, this actually got an upvote? Does not compute!
Konrad Rudolph
+2  A: 
  1. A HashMap is OK; it is best used when you have lots of reads based on keys. If you simply need the file names and you access them sequentially, then an ArrayList should suffice and perform better.

  2. Instead of a global you can use a public static member, but that's not very elegant. In addition, if your application is multithreaded, you'll need to synchronize access to this variable. It would be better if your objects simply collaborate, passing the HashMap (or ArrayList) from the Reader object to the other objects that process it.
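
A minimal sketch of that collaboration, assuming text files encoded as UTF-8. The class names `DirectoryReader` and `WordCounter` are hypothetical and only stand in for "the Reader object" and one of "the other objects that process it":

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader: loads every regular file in a directory exactly once.
class DirectoryReader {
    static Map<String, String> readAll(Path dir) throws IOException {
        Map<String, String> contents = new HashMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    // Assumption: files are text encoded as UTF-8.
                    contents.put(p.getFileName().toString(),
                            new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
                }
            }
        }
        // Read-only view, so the processing steps cannot change the loaded data.
        return Collections.unmodifiableMap(contents);
    }
}

// Hypothetical processing step: it receives the loaded map through its
// constructor instead of reading the directory again.
class WordCounter {
    private final Map<String, String> contents;

    WordCounter(Map<String, String> contents) {
        this.contents = contents;
    }

    int countWords(String fileName) {
        String text = contents.get(fileName);
        if (text == null || text.trim().isEmpty()) {
            return 0;
        }
        return text.trim().split("\\s+").length;
    }
}
```

The directory is read once with `DirectoryReader.readAll(...)`, and the resulting map is handed to each object that needs it (stopper, stemmer, and so on), so no static field is required.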

LiorH
Synchronization is not needed unless he intends to modify the files. However, that is not among his functional requirements, and even then a `Map` cache would be a bad idea, because you need to take much more into account if you want the changes to be reflected back in the actual files on the file system.
BalusC
A: 

1: You can do so. Speed is not the concern; memory usage may be. You're basically duplicating part of the local disk file system into RAM (which is obviously faster than a hard disk). You can also consider retrieving only the data of interest each time, so that the information stays in its original place and the code is more memory efficient.

2: To have a "global variable" in Java, just declare it public static final. You can use a static initializer block to fill the Map during loading of the class. E.g.

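import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class Test {
    public static final Map<File, byte[]> FILE_CONTENTS = new HashMap<File, byte[]>();
    static {
        // Fill the map. This block runs once, when the class is first loaded.
    }
}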

The above is just an example, assuming that there may be mixed files containing either binary or character data. If the files contain only character data, you can replace byte[] with String. If necessary, you can also replace File with a String denoting a unique file identifier (the filename?).
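
As a sketch of what a filled-in static initializer might look like: the class name `FileCache` and the `files.dir` system property are assumptions made for this illustration, and the files are assumed to be UTF-8 text.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class FileCache {
    // Filled exactly once, when the JVM first initializes this class.
    static final Map<String, String> FILE_CONTENTS;

    static {
        // Assumption: the directory comes from a "files.dir" system property
        // and falls back to the current working directory.
        File dir = new File(System.getProperty("files.dir", "."));
        Map<String, String> loaded = new HashMap<>();
        File[] files = dir.listFiles(); // null if dir is not a readable directory
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    try {
                        loaded.put(f.getName(),
                                new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        // A static initializer cannot throw checked exceptions.
                        throw new ExceptionInInitializerError(e);
                    }
                }
            }
        }
        FILE_CONTENTS = Collections.unmodifiableMap(loaded);
    }
}
```

Any class can then read `FileCache.FILE_CONTENTS.get("some-file.txt")`. The disk is touched only during class initialization, not on each access.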

3: No idea as I don't do JAMA.

Once again, it may be memory hogging. Whenever you intend to request some content, just access the file in question directly. Or use a database instead of a local disk file system, so that you can make use of the SQL powers.

BalusC
Just a small question: when I access this HashMap variable from other classes, will it just give me the data, or will it go and read all the data from disk again? Also, when I tried to access this variable from other classes, it printed an empty HashMap. Where are the values?
Just fill it **only once** with the help of the static initializer (which I edited in at the last moment).
BalusC
+1  A: 

Since you're new to Java, here is a full example of a singleton that loads all file contents. If you want to access them by filename, a HashMap is the best choice; if you want to iterate through the filenames, you could use an ArrayList.

  • Have you considered using a database?

  • See main() how to use this thing.

import java.io.File;
import java.util.HashMap;

public class FileStore {

    private static FileStore instance;
    private static HashMap<String, String> fmap = new HashMap<String, String>();

    public static FileStore getInstance() {
        if ( instance == null ) {
            instance = new FileStore();
        }
        return instance;
    }

    public static String getFileContent( String fname ) {
        return fmap.get( fname );
    }

    // force it to be accessed through getInstance()
    private FileStore() {
    }

    public void setDirectory( String path ) {
        File root = new File( path );
        File[] list = root.listFiles();
        // listFiles() returns null if path is not a readable directory
        if ( list == null ) {
            return;
        }
        for ( File f : list ) {
            // add read file code here
            fmap.put( f.getName(), "content of :" + f.getName());
        }
    }

    public static void main(String[] args) {
        FileStore.getInstance().setDirectory("c:\\");
        System.out.println( fmap );
        // add this line everywhere in your app where you need access
        FileStore.getInstance().getFileContent("filename");
    }

}

stacker
Thanks for your great help and effort. It would be a plus if you could answer my questions too :)
Just store and check the timestamps of your files to determine whether you need to reread them in getFileContent(); best performance guaranteed.
stacker
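A rough sketch of that timestamp check, assuming UTF-8 text files. The class `TimestampedFileCache` and its `getDiskReads()` counter are hypothetical names added only for this illustration:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

class TimestampedFileCache {
    // Cached content together with the file timestamp seen when it was read.
    private static final class Entry {
        final long lastModified;
        final String content;

        Entry(long lastModified, String content) {
            this.lastModified = lastModified;
            this.content = content;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private int diskReads = 0; // illustration only: counts real reads from disk

    synchronized String getFileContent(File file) throws IOException {
        long modified = file.lastModified();
        Entry entry = cache.get(file.getPath());
        // Reread only if the file was never cached or its timestamp changed.
        if (entry == null || entry.lastModified != modified) {
            String content = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
            entry = new Entry(modified, content);
            cache.put(file.getPath(), entry);
            diskReads++;
        }
        return entry.content;
    }

    synchronized int getDiskReads() {
        return diskReads;
    }
}
```

One caveat with this approach: file timestamps have limited resolution on some file systems, so two writes in very quick succession may not be detected as a change.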
The nullcheck in `getInstance()` is superfluous if you just directly instantiate during declaration.
BalusC
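For illustration, a minimal sketch of a singleton instantiated directly at declaration, so that `getInstance()` needs no null check. The class name `EagerFileStore` and its `put` method are hypothetical; a `ConcurrentHashMap` is used here as one possible way to keep the map safe if several threads access it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EagerFileStore {
    // Created once, when the class is initialized; no null check is needed later.
    private static final EagerFileStore INSTANCE = new EagerFileStore();

    private final Map<String, String> contents = new ConcurrentHashMap<>();

    // Private constructor forces access through getInstance().
    private EagerFileStore() {
    }

    static EagerFileStore getInstance() {
        return INSTANCE;
    }

    void put(String fileName, String content) {
        contents.put(fileName, content);
    }

    String getFileContent(String fileName) {
        return contents.get(fileName);
    }
}
```

Because the JVM initializes a class only once, every caller of `EagerFileStore.getInstance()` gets the same object.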
@agazerboy What else do you need to know? Questions 1 and 2 are answered, though of course the code needs some customization and maybe minor optimizations, as BalusC mentioned.
stacker