Hello. I have created an example to introduce my question.

public class Algorithm
{
    // This is the best, but circumstances prevent me from doing this.
    /*public static void computeSomething(Data data)
    {
     // Compute some stuff
    }*/

    public static void computeSomething(DataFileReader reader) throws IOException
    {
     // Compute some stuff.
    }

    public static void computeSomething(File file) throws IOException, DataFormatException
    {
     DataFileReader reader = DataFileReaderFactory.newDataFileReader(file);

     // Compute some stuff.
    }
}

public class DataFileReaderFactory
{
    private enum FileExtension { XML, UNSUPPORTED_EXTENSION }

    private static final String XMLExtension = ".xml";

    public static DataFileReader newDataFileReader(File file) throws DataFormatException
    {
     switch(computeFileExtension(file))
     {
      case XML : return new XMLFileReader(file);

      default : throw new DataFormatException();
     }
    }

    private static FileExtension computeFileExtension(File file)
    {
     if(file.getName().endsWith(XMLExtension))
      return FileExtension.XML;
     else
      return FileExtension.UNSUPPORTED_EXTENSION;
    }
}

So, I would like to know if I should define my interface to take Files, or my own file readers, which ensure that the data is in a valid format. Obviously, I would like to be able to take the data itself as a Data object, but I am limited in this regard. The reason has to do with the data being very large and me having to serialize it for multiple objects. In this case, it is more practical to send a path to the data rather than the data itself.

Anyhow, with regard to the question, I am leaning toward the method that takes an instance of Java's File, as it seems more general, but I want to hear your advice. Thanks!

+4  A: 

Use something that allows you to create test programs in memory. For example, taking an InputStream instead of a File allows you to write a simple InputStream implementation for a test, instead of having to create a file on the file system, put things in it, and remove it when you are done.
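A minimal sketch of that idea, assuming the algorithm can be written against InputStream. StreamAlgorithm and its byte-counting body are hypothetical placeholders for the real computation, and the test uses the JDK's ByteArrayInputStream rather than a hand-written stream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the question's Algorithm class.
// It only counts bytes; the real computation would go in its place.
class StreamAlgorithm {
    static long computeSomething(InputStream in) throws IOException {
        long count = 0;
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            count += read;
        }
        return count;
    }
}

class StreamAlgorithmDemo {
    public static void main(String[] args) throws IOException {
        // No file is created: the test data lives in a byte array.
        byte[] fake = "<data><item/></data>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(fake)) {
            System.out.println(StreamAlgorithm.computeSomething(in));
        }
    }
}
```

In production the same method could be handed a FileInputStream, so the file handling stays outside the algorithm.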

If you have an interface for getting Data objects that would in my opinion be the best.

Thorbjørn Ravn Andersen
Unfortunately, streams aren't serializable, which is a necessary requirement for the algorithm. For testing and local execution, I will probably make a version of the algorithm like the one I commented out in the above example, which allows for the data to be passed in directly.
YOUR implementation could be serializable...
Thorbjørn Ravn Andersen
A: 

I agree with the above answer that you should really be using a Data object/interface. When you do your testing, you can create mocks of your data objects to allow for easier testing. Also, if you're reading data from different sources (databases, files, in memory, etc.), it may not always be easy to get it into the same stream format, but you could have an adapter for each source type that converts it to the correct Data format.

I noticed that your methods are static as well. You might want to consider having instance methods and creating an instance of the algorithm. Instance methods will allow you to store state if needed.

Jeff Storey
A: 

The big question I see here is: does your algorithm need to operate on the full data set all at once, or does it operate on the data set in a streamed format?

If you need the data set all at once to run your algorithm (i.e. to randomly navigate back and forth among the data elements), then you should keep that first method you have commented out. In your other methods, take the stream and read it into the full data set, then pass that full data set over to your algorithm method. Just because you need a particular interface doesn't mean you have to drop the whole implementation into that one location.
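As a rough sketch of the "read it all, then delegate" arrangement: the Data class, its line-per-record layout, and the middle-record computation below are all assumptions made for illustration, since the question does not show what Data looks like.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Data type: the full data set held in memory for random access.
class Data {
    private final List<String> records;

    Data(List<String> records) {
        this.records = records;
    }

    int size() {
        return records.size();
    }

    String get(int index) {
        return records.get(index);
    }
}

class RandomAccessAlgorithm {
    // Core method: works on the whole data set.
    // Placeholder computation: returns the middle record.
    static String computeSomething(Data data) {
        if (data.size() == 0) {
            throw new IllegalArgumentException("empty data set");
        }
        return data.get(data.size() / 2);
    }

    // Adapter: drains the source into a Data object, then delegates.
    static String computeSomething(Reader source) throws IOException {
        List<String> records = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            records.add(line);
        }
        return computeSomething(new Data(records));
    }
}
```

The algorithm itself lives in one place; the stream-taking overload is only a thin loader in front of it.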

If on the other hand this is an algorithm designed to operate on a stream of data (i.e. a routing algorithm) then keep your junk in that method and operate on the stream like you are supposed to...

Zak
A: 

Given your constraints, I would have both the method that takes a File and the method that takes a DataFileReader, and have the former call the latter. This is particularly true if you can extend DataFileReader to create an in-memory reader for testing.
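Something along these lines, as a sketch only. The question never shows DataFileReader's API, so the readLines() method, FileBackedReader, InMemoryReader, and the line-counting computation are all invented here for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical shape for DataFileReader; the real interface is not shown in the question.
interface DataFileReader {
    List<String> readLines() throws IOException;
}

// Reads from disk; used by the File overload.
class FileBackedReader implements DataFileReader {
    private final File file;

    FileBackedReader(File file) {
        this.file = file;
    }

    @Override
    public List<String> readLines() throws IOException {
        return Files.readAllLines(file.toPath());
    }
}

// Test double that keeps its lines in memory, so tests never touch the file system.
class InMemoryReader implements DataFileReader {
    private final List<String> lines;

    InMemoryReader(String... lines) {
        this.lines = Arrays.asList(lines);
    }

    @Override
    public List<String> readLines() {
        return lines;
    }
}

class DelegatingAlgorithm {
    // Convenience overload: builds a reader, then delegates.
    static int computeSomething(File file) throws IOException {
        return computeSomething(new FileBackedReader(file));
    }

    // Placeholder computation: counts non-blank lines.
    static int computeSomething(DataFileReader reader) throws IOException {
        int count = 0;
        for (String line : reader.readLines()) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

A test can then call DelegatingAlgorithm.computeSomething(new InMemoryReader("a", "", "b")) without creating any file.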

Kathy Van Stone
At no point in the body of the above method do I actually read in the data. In reality, I create a bunch of tasks, which are serialized and passed over a network. It is only there, in the tasks, that I read in the data. Unfortunately, it is not feasible to serialize a copy of the data for every task that is created.
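For what it's worth, that setup can be sketched as a task that serializes only the location of the data. ComputeTask and TaskRoundTrip are hypothetical names, the file-size computation is a placeholder, and the sketch assumes every worker can reach the same path (for example on a shared file system). java.io.File is itself Serializable, so it can be a field of the task:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;

// Hypothetical task: carries the location of the data, never the data itself.
class ComputeTask implements Serializable {
    private static final long serialVersionUID = 1L;

    private final File dataFile;

    ComputeTask(File dataFile) {
        this.dataFile = dataFile;
    }

    File getDataFile() {
        return dataFile;
    }

    // Meant to run on the worker after deserialization.
    // Placeholder: the real task would open the file and process its contents here.
    long run() throws IOException {
        return Files.size(dataFile.toPath());
    }
}

class TaskRoundTrip {
    // Simulates sending a task over the network by serializing and deserializing it.
    static ComputeTask roundTrip(ComputeTask task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ComputeTask) in.readObject();
        }
    }
}
```

The serialized form stays small no matter how large the data is, because only the path travels with each task.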