Suppose I know the format of a text file: say, each line contains four space-separated fields, like this:

firstword secondword thirdword fourthword
firstword2 secondword2 thirdword2 fourthword2
...

I need to read the whole file into memory. I can use this approach:

open a text file
while not EOF
  read line by line
  split each line by a space
  create a new object with four fields extracted from each line
  add this object to a Set

OK, but is there anything better, such as a dedicated third-party Java library?

Something that would let us define the structure of each line beforehand and then parse the file with a single call, like this hypothetical API:

thirdpartylib.setInputTextFileFormat("format.xml");
thirdpartylib.parse(set, "pathToFile");

+1  A: 

If you know for certain what the separator will be, then your suggested approach will be fast and reliable and will need very little code. The upside of a third-party library (google "java text file library" for a long list) is that it is likely to include code for the odd cases its authors cared about. The downside is that it will probably be more code than you need if your text file format is simple and reliable.

The upside of doing this yourself is that you can tune the code to exactly your requirements, including scalability, which may well matter if you have a lot of data. Third-party libraries quite often read the entire file in one go, which may not be practical if you have, say, several million rows.
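To illustrate the scalability point, here is a minimal sketch of a hand-written reader that handles one line at a time instead of loading everything first. The class name StreamingFieldReader and the method forEachRecord are made up for this example and do not come from any library; the sample data is read from a string, but a file reader (for instance one obtained from Files.newBufferedReader) could be passed in the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
class StreamingFieldReader {

    // Reads one line at a time and passes its whitespace-separated fields
    // to the handler, so only the current line is held by this method.
    // Blank lines are skipped. Returns the number of records handled.
    static long forEachRecord(BufferedReader in, Consumer<String[]> handler) {
        long count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                handler.accept(trimmed.split("\\s+"));
                count++;
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the calling code short.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String sample = "firstword secondword thirdword fourthword\n"
                      + "firstword2 secondword2 thirdword2 fourthword2\n";
        try (BufferedReader in = new BufferedReader(new StringReader(sample))) {
            long n = forEachRecord(in, fields -> System.out.println(String.join("|", fields)));
            System.out.println(n + " records");
        }
    }
}
```

Whether the rows are then collected into a Set or processed and discarded is up to the handler, which is the kind of tuning a general-purpose library may not offer.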

My recommendation would be to spend an hour or so writing your own and see where you get. You may crack it with very little effort. If it turns out you have a complex problem to solve with different special issues around data format, then start looking for a library.

Simon
+1  A: 

You can do it like this:

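// Assuming a BufferedReader called in and a Set<Widget> called mySet;
// Widget is a class with a constructor that takes four Strings.

String line = in.readLine();
while (line != null)
{
  // Each line is expected to hold four fields separated by single spaces
  String[] splat = line.split(" ");
  mySet.add(new Widget(splat[0], splat[1], splat[2], splat[3]));
  line = in.readLine();
}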

But you really need to define what you mean by 'better'. The above approach will not behave nicely with 'bad' input (a line with fewer than four fields throws an ArrayIndexOutOfBoundsException), but it will be pretty fast. How fast really depends on the implementation of the Set: if you're constantly resizing it, you may incur a performance penalty.
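As a rough sketch of how the two concerns above could be addressed, the version below checks the field count of every line and pre-sizes a HashSet when the number of lines is known up front (for example after Files.readAllLines). FourFieldParser, parseLine and parseAll are hypothetical names, and a List<String> stands in for the Widget class, which is not shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
class FourFieldParser {
    static final int FIELD_COUNT = 4;

    // Splits one line on runs of whitespace and rejects it if it does not
    // contain exactly four fields, reporting the offending line number.
    static List<String> parseLine(String line, int lineNumber) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != FIELD_COUNT) {
            throw new IllegalArgumentException("Line " + lineNumber
                    + ": expected " + FIELD_COUNT + " fields but found " + fields.length);
        }
        return Arrays.asList(fields);
    }

    // Parses all lines into a set, skipping blank lines.
    static Set<List<String>> parseAll(List<String> lines) {
        // Pre-size for the default load factor of 0.75 so the set
        // should not need to rehash while it is being filled.
        Set<List<String>> records = new HashSet<>((int) (lines.size() / 0.75f) + 1);
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue;
            }
            records.add(parseLine(line, lineNumber));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "firstword secondword thirdword fourthword",
                "",
                "firstword2 secondword2 thirdword2 fourthword2");
        System.out.println(parseAll(lines));
        try {
            parseAll(Arrays.asList("only three fields"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing on the first malformed line is only one possible policy; logging and skipping such lines would be an equally reasonable choice, depending on the data.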

Using XML and defining a schema will allow you to validate the input before parsing and will probably streamline object creation, but you won't be able to just have four strings on each line (you'll need XML tags, etc.). See XMLBeans for an example of a third-party library.

Catchwa