I have an application that may receive data via various methods and in various formats. I have pluggable receivers that somehow acquire the data (e.g. by polling a mailbox, listening for HTTP requests, watching the contents of a directory, etc.), associate it with a MIME type and then pass it on wrapped like this:

public class Transmission {
    private String origin;      // where the data came from
    private String destination; // where the data was sent to
    private String mime;        // the MIME type of the data
    private BLOB data;          // this is what I need an appropriate type for
}

Further down the line, the data is processed by specialized handlers according to the value of the mime field. I'm expecting things like ZIP files, Excel documents, SOAP, generic XML, plain text and more. At this point, the code should be agnostic as to what's in the data. What is an appropriate type for the data field? Object? InputStream? Byte[]?

+4  A: 

I would go with either byte[] or InputStream, preferring the stream since it is more flexible. You can use a ByteArrayInputStream to feed it an array of bytes, if need be. But you can't do it the other way around without reading the whole stream into memory.

There is also the benefit of memory efficiency, since the stream can handle large chunks of external data without much memory. If you use byte[] you need to load all the data into memory. In other words, the stream is lazy.
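
A minimal sketch of the wrapping described above, using only java.io. The class and method names (StreamWrapDemo, asStream, drain) are made up for illustration. It also shows why the reverse direction is the expensive one: turning a stream into an array means reading everything into memory first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class StreamWrapDemo {
    // byte[] -> InputStream: wraps the existing array, no copy is made.
    static InputStream asStream(byte[] bytes) {
        return new ByteArrayInputStream(bytes);
    }

    // InputStream -> byte[]: has to read the whole stream into memory.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "plain text payload".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = asStream(payload)) {
            byte[] copy = drain(in);
            System.out.println(new String(copy, StandardCharsets.UTF_8));
        }
    }
}
```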

Martinho Fernandes
byte[] yes, Byte[] absolutely NO!
Michael Borgwardt
An InputStream can only be read once, generally speaking. Not a good idea in a data model.
skaffman
Where's this stream reading from ? What implementation ?
Brian Agnew
@skaffman: I didn't know that (not a Java expert). You're right.
Martinho Fernandes
skaffman - surely that's the point of this "Transmission" object - passing it onto a "Transmission Consumer" that consumes it the once. If that means it consumes it into a byte[]/XML/hamsterdancevideo internally before processing it, that's fine.
JeeBee
@Brian: The implementation doesn't matter. It reads stuff from places. That's what's needed/intended.
Martinho Fernandes
@JeeBee - who says it's going to a TransmissionConsumer that consumes it once ?
Brian Agnew
Martinho - It's a good answer when you consider the context of the original question.
JeeBee
@Martinho - surely the implementation does matter, since you may instantiate the Transmission object from a datasource, and want to hold it separate from that source. e.g. instantiate and then your db goes down (maybe not - but that's why I think the implementation is important - you'll note I'm not fussed *what* that implementation is)
Brian Agnew
Brian - It is good to use an interface or higher up abstract class when that is sensible. Who cares if it is a ByteArrayInputStream, MemoryInputStream, ZipInputStream, VideoInputStream - only the consumers care, and they're instantiated from the mime type anyway and will throw an exception if the stream type isn't as expected.
JeeBee
@Brian: If you want to hold it separate from the db you can cache it into a byte[] (or something else) and pass ByteArrayInputStream (what I called MemoryStream in the answer). If you don't need to hold it separate, you can still use the InputStream... But you can't do that with the byte[]
Martinho Fernandes
A: 

In your class above I would make it a byte[]. Why not a java.sql.Blob? So that your Transmission object is SQL (or datastore) agnostic.

e.g. you may at some stage want to write it to a JavaSpace, CouchDB or something else that isn't a SQL database. By storing it as a byte array, this info is in its basic form, and you can translate it as you wish. If your byte[] is really sizable, then your Transmission object can handle caching via disk etc. But I would worry about that later.
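
A rough sketch of what the byte[] variant could look like, assuming the fields from the question. The class name ByteArrayTransmission, the accessors and the defensive copies are illustrative choices, not something the question prescribes. A stream can still be derived on demand, and repeatedly, because the bytes are kept.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical variant of the question's Transmission class.
final class ByteArrayTransmission {
    private final String origin;      // where the data came from
    private final String destination; // where the data was sent to
    private final String mime;        // the MIME type of the data
    private final byte[] data;        // raw payload, datastore agnostic

    ByteArrayTransmission(String origin, String destination, String mime, byte[] data) {
        this.origin = origin;
        this.destination = destination;
        this.mime = mime;
        this.data = data.clone(); // copy so later changes by the caller don't leak in
    }

    String getOrigin() { return origin; }
    String getDestination() { return destination; }
    String getMime() { return mime; }

    // Returns a copy so handlers cannot modify the stored payload.
    byte[] getData() { return data.clone(); }

    // Each call yields a fresh stream positioned at the first byte.
    InputStream openStream() { return new ByteArrayInputStream(data); }
}
```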

EDIT: Reference to SQL made since an old answer (now deleted) recommended java.sql.Blob. Unfortunately once the answer disappears it makes the reference here somewhat anomalous.

Brian Agnew
The original post never mentioned SQL! byte[] is fine, until he needs to handle streams. InputStream is the best option, IMO.
JeeBee
byte[] can be easily wrapped in a ByteArrayInputStream. Store the data, not the stream.
skaffman
What implementation of InputStream ? You need to recommend that unless you're talking about interface design
Brian Agnew
I don't see any need for a stream here. Derive a stream later if you need it, but it seems completely counter-intuitive for a store of bytes
Brian Agnew
Why should I wait half a day for my streaming video to be downloaded into a byte[] for your implementation, when I could be processing it as it comes in?
JeeBee
That's a requirement that wasn't specified (things that were specified - Excel/ZIP/SOAP)
Brian Agnew
So? ZIPs can be big as well. And it says: data agnostic somewhere.
Martinho Fernandes
If in doubt, go for the more flexible option :-). As I said in my first response, byte[] is fine, until his boss asks him to monitor some webcams for movement or something.
JeeBee
It says data-agnostic "further down the line"
Brian Agnew
If in doubt, implement what you have to now, and refactor later as/when you require. Otherwise we'll spend all day debating whether we want to handle latency from our moon probe :-)
Brian Agnew
Maybe the transmission object could have both a byte[] for direct data transmission, and an InputStream for deferred/streamed data transmission. What if we need to talk back to the data creator ...
JeeBee
+7  A: 

Multiple Possibilities:

  • byte[]
    • the most direct way
  • ByteBuffer
    • flexible
    • has random access and bulk operations
    • has operations for duplicating, slicing, etc
    • preferable if IO/Network intensive (NIO)
  • InputStream
    • allows pipelining if done right
    • has no support of random access or bulk operations.
    • Not as flexible as the ByteBuffer.

I would not use Blob, because putting DB-related stuff into our main model seems strange.
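
To illustrate the ByteBuffer entry in the list above, here is a small sketch of random access, bulk reads and slicing with java.nio.ByteBuffer. The class and helper names (ByteBufferDemo, wrapReadOnly, view) and the sample payload are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ByteBufferDemo {
    // Read-only view over the array, so handlers cannot alter the payload through it.
    static ByteBuffer wrapReadOnly(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asReadOnlyBuffer();
    }

    // View of a sub-range; shares the content but has its own position and limit,
    // so the source buffer's position is left untouched.
    static ByteBuffer view(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate();
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer data = wrapReadOnly("PK-header-and-body".getBytes(StandardCharsets.US_ASCII));
        byte first = data.get(0);              // absolute (random) access
        ByteBuffer header = view(data, 0, 2);  // slice of the first two bytes
        byte[] magic = new byte[header.remaining()];
        header.get(magic);                     // bulk read into an array
        System.out.println((char) first + " " + new String(magic, StandardCharsets.US_ASCII));
    }
}
```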

dmeister
A: 

Personally, I would use Spring's Resource abstraction. This provides a nicer wrapper around the idea of a resource that exists somewhere. It provides methods to retrieve an InputStream for when you want to consume the resource.

The easiest implementation for you might be ByteArrayResource, which encapsulates a byte[]. If that gets too big, then later you can switch to something like a FileSystemResource, or a UrlResource, or one of the various other implementations provided by Spring. But since you always talk to the Resource interface, your client code shouldn't change too much.

Also, since this is just a set of utility classes and interfaces in the Spring API, you can use Resource and its implementations in isolation, without using anything else from Spring.
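
The idea can be sketched without any dependency. Note that the code below is not Spring's API: StreamSource, InMemorySource and FileSource are hypothetical stand-ins written with the JDK only, to show how client code can ask for a fresh InputStream each time without knowing where the bytes live.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the concept; NOT Spring's Resource interface.
interface StreamSource {
    // Opens a new stream over the underlying data on every call.
    InputStream open() throws IOException;
}

// Backed by bytes held in memory.
final class InMemorySource implements StreamSource {
    private final byte[] bytes;

    InMemorySource(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override
    public InputStream open() { return new ByteArrayInputStream(bytes); }
}

// Backed by a file; nothing is loaded until a consumer reads the stream.
final class FileSource implements StreamSource {
    private final Path path;

    FileSource(Path path) { this.path = path; }

    @Override
    public InputStream open() throws IOException { return Files.newInputStream(path); }
}
```

A Transmission holding a StreamSource instead of a raw stream could then be read more than once, and an in-memory source could later be swapped for a file-backed one without touching the handlers.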

skaffman