Howdy,

Does anybody recommend a design pattern for taking a binary data file, parsing parts of it into objects and storing the resultant data into a database?

I think a similar pattern could be used for taking an XML or tab-delimited file and parsing it into its representative objects.

A common data structure would include:

(Header) (DataElement1) (DataElement1SubData1) (DataElement1SubData2)(DataElement2) (DataElement2SubData1) (DataElement2SubData2) (EOF)

I think a good design would include a way to change out the parsing definition based on the file type or some defined metadata included in the header. So a Factory Pattern would be part of the overall design for the Parser part.
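
To make the factory idea concrete, here is a minimal sketch in Python. The file layout is an assumption made up for illustration (a 4-byte magic `DATA`, a 1-byte version, then fixed-width records), and the names `parse_v1`, `parse_v2`, `PARSERS` and `parse_file` are hypothetical. The point is only that the header decides which parsing definition gets used.

```python
import io
import struct
from typing import BinaryIO, Callable, Dict, List

# Assumed header layout (hypothetical): 4-byte magic, then a 1-byte version.
HEADER = struct.Struct("<4sB")


def parse_v1(stream: BinaryIO) -> List[int]:
    """Assumed version 1 body: a run of unsigned 16-bit integers."""
    return [value for (value,) in struct.iter_unpack("<H", stream.read())]


def parse_v2(stream: BinaryIO) -> List[int]:
    """Assumed version 2 body: a run of unsigned 32-bit integers."""
    return [value for (value,) in struct.iter_unpack("<I", stream.read())]


# The "factory": maps header metadata to a parsing definition.
PARSERS: Dict[int, Callable[[BinaryIO], List[int]]] = {
    1: parse_v1,
    2: parse_v2,
}


def parse_file(stream: BinaryIO) -> List[int]:
    """Read the header, pick the matching parser, and parse the rest."""
    magic, version = HEADER.unpack(stream.read(HEADER.size))
    if magic != b"DATA":
        raise ValueError("not a DATA file")
    try:
        parser = PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported version {version}") from None
    return parser(stream)
```

Supporting a new file version then means adding one function and one entry in `PARSERS`, without touching `parse_file`.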

Keith

+1  A: 

The Strategy pattern is one you may want to look at, with the file-parsing algorithm as the strategy.

Then you want a separate strategy for database insertion.
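
A rough sketch of the two strategies in Python, assuming a tab-delimited input with a name and an integer per line and an SQLite table called `element`. All class, function and table names here are hypothetical, chosen only to show the shape of the idea.

```python
import sqlite3
from typing import Iterable, List, Protocol, Tuple

Record = Tuple[str, int]


class ParseStrategy(Protocol):
    def parse(self, raw: bytes) -> List[Record]: ...


class StoreStrategy(Protocol):
    def store(self, records: Iterable[Record]) -> None: ...


class TabDelimitedParser:
    """Parsing strategy for the assumed 'name<TAB>value' line format."""

    def parse(self, raw: bytes) -> List[Record]:
        records = []
        for line in raw.decode("utf-8").splitlines():
            if not line.strip():
                continue  # skip blank lines
            name, value = line.split("\t")
            records.append((name, int(value)))
        return records


class SqliteStore:
    """Storage strategy that inserts records into an SQLite table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS element (name TEXT, value INTEGER)"
        )

    def store(self, records: Iterable[Record]) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO element VALUES (?, ?)", records
            )


def import_data(raw: bytes, parser: ParseStrategy, store: StoreStrategy) -> None:
    """Glue code: knows nothing about the file format or the database."""
    store.store(parser.parse(raw))
```

Usage would look something like `import_data(data, TabDelimitedParser(), SqliteStore(sqlite3.connect(":memory:")))`. Either side can be swapped independently of the other.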

Campbell
A: 

Are you referencing this pattern? Strategy-Pattern

Keith Sirmons
+2  A: 
  1. Just write your file parser, using whatever techniques come to mind
  2. Write lots of unit tests for it to make sure all your edge cases are covered

Once you've done this, you will actually have a reasonable idea of the problem/solution.

Right now you just have theories floating around in your head, most of which will turn out to be misguided.

Step 3: Refactor mercilessly. Your aim should be to delete about half of your code.

You'll find that your code at the end will either resemble an existing design pattern, or you'll have created a new one. You'll then be qualified to answer this question :-)

Orion Edwards
Good answer. Speculating on which pattern to use is not really a good practice.
jop
+1  A: 

I fully agree with Orion Edwards, and it is usually the way I approach the problem; but lately I've been starting to see some patterns(!) to the madness.

For more complex tasks I usually use something like an interpreter (or a strategy) that uses some builder (or factory) to create each part of the data.

For streaming data, the entire parser would look something like an adapter, adapting from a stream object to an object stream (which usually is just a queue).
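
In Python, one way to sketch that adapter is a generator that takes a byte stream and yields objects one at a time. The record layout (a 16-bit id followed by a 32-bit value) and the names `Element`, `RECORD` and `iter_elements` are assumptions for illustration only.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class Element(NamedTuple):
    element_id: int
    value: int


# Assumed fixed-width record: little-endian uint16 id + uint32 value.
RECORD = struct.Struct("<HI")


def iter_elements(stream: BinaryIO) -> Iterator[Element]:
    """Adapt a stream of bytes into a stream of Element objects.

    Assumes a buffered stream (a regular file or io.BytesIO) where read(n)
    only returns fewer than n bytes at end of input.
    """
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            return  # clean end of input
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record at end of stream")
        yield Element(*RECORD.unpack(chunk))
```

The consumer just iterates, for example `for element in iter_elements(open(path, "rb")): ...`, and never sees raw bytes. If a real queue is needed, the same loop can push each object onto a `queue.Queue` instead of yielding it.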

For your example there would probably be one builder for the complete data structure (from head to EOF) which internally uses builders for the internal data elements (fed by the interpreter). Once the EOF is encountered an object would be emitted.
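
A simplified sketch of that arrangement, assuming the input has already been split into `(kind, payload)` tokens. The token kinds, the class names and the `interpret` function are all hypothetical. A real interpreter would read these from the file rather than from a list.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DataElement:
    name: str
    sub_data: Tuple[int, ...]


@dataclass(frozen=True)
class DataFile:
    header: str
    elements: Tuple[DataElement, ...]


class ElementBuilder:
    """Inner builder: collects the sub-data of one element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sub_data: List[int] = []

    def build(self) -> DataElement:
        return DataElement(self.name, tuple(self.sub_data))


class FileBuilder:
    """Outer builder: covers the structure from header to EOF."""

    def __init__(self) -> None:
        self.header: Optional[str] = None
        self.elements: List[DataElement] = []
        self.current: Optional[ElementBuilder] = None

    def start_element(self, name: str) -> None:
        self._finish_element()
        self.current = ElementBuilder(name)

    def add_sub_data(self, value: int) -> None:
        if self.current is None:
            raise ValueError("sub-data before any element")
        self.current.sub_data.append(value)

    def _finish_element(self) -> None:
        if self.current is not None:
            self.elements.append(self.current.build())
            self.current = None

    def build(self) -> DataFile:
        if self.header is None:
            raise ValueError("missing header")
        self._finish_element()
        return DataFile(self.header, tuple(self.elements))


def interpret(tokens: Iterable[Tuple[str, Any]]) -> DataFile:
    """Feed tokens to the builder; emit the object when EOF is seen."""
    builder = FileBuilder()
    for kind, payload in tokens:
        if kind == "HEADER":
            builder.header = payload
        elif kind == "ELEMENT":
            builder.start_element(payload)
        elif kind == "SUB":
            builder.add_sub_data(payload)
        elif kind == "EOF":
            return builder.build()
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    raise ValueError("input ended before EOF")
```

The interpreter only knows the token grammar, and the builders only know how to assemble objects, so either can change without the other.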

However, creating objects in a switch statement inside some factory function is probably the simplest approach for many lesser tasks. Also, I like keeping my data objects immutable, as you never know when someone shoves concurrency down your throat :)
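
For the simple case, something like the following would do. It is a sketch in Python, using an if-chain in place of a switch. The tag values and the `Temperature` and `Label` types are made up for illustration. Frozen dataclasses give the immutability.

```python
import dataclasses
import struct
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Temperature:
    celsius: float


@dataclass(frozen=True)
class Label:
    text: str


def make_element(tag: int, payload: bytes) -> Union[Temperature, Label]:
    """Factory function: the tag (assumed values 1 and 2) picks the type."""
    if tag == 1:
        (celsius,) = struct.unpack("<f", payload)  # 4-byte little-endian float
        return Temperature(celsius)
    if tag == 2:
        return Label(payload.decode("ascii"))
    raise ValueError(f"unknown element tag {tag}")
```

Assigning to a field of one of these objects raises `dataclasses.FrozenInstanceError`, so they can be passed between threads without defensive copies.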

Henrik Gustafsson
A: 

Use Lex and YACC. Unless you devote the next ten years exclusively to this subject, they will produce better and faster code every time.

Peter Wone