views:

482

answers:

7

I have an application that imports information from a CSV file or from a database and exports it to XML. This XML is currently being persisted to a file. However due to project needs I have decided it may be better to persist this XML to a database.

Currently I have CSV, XML and SQL repositories that deal with importing/exporting data. The XML repository persists the passed in object to a file. It currently is where the mapping of the object to XML is stored, hence it is the only place that knows about this structure (likewise, the other repositories for their respective structures).

Now that I want to store the XML in the database I am beginning to question this architecture. In order to do an insert into the database, the structure of the XML must be accessible from the SQL repository (n.b. data in other columns can be inserted into the DB along with the XML). This leads me to wonder if the XML representation should be stored in the object itself, or in a service layer or somewhere else.

What are the best ways to implement a solution to this problem?

UPDATE: A clarification to my question. The XML repository currently persists to a file. It seems to me that this is the wrong level to keep the knowledge of the structure of the XML at, as then I don't have flexibility in persisting the XML representation to whatever medium I desire. Is it poor design to let the object have knowledge of it's XML representation (or CSV representation, etc)? Should the knowledge of that structure be kept at another level, and what level would that be?

A: 

If you want the SQL repository to be able to use the existing xml layer, then I'm sure some kind of interface-based abstraction (provider / IoC / DI / etc) is possible, so that the SQL repository can consume the xml stack without having to know about it explicitly.

Of course, if the object model itself defined the xml structure (via the attributes for XmlSerializer etc), then this is simpler, and the xml repo becomes fairly trivial.

As for knowing the structure inside the db: it depends whether you need to query the data inside the database. If so, there is a lot you can do in SQL Server - for example, you can bind the xml column to an xsd, or you can promote portions of the xml (via udf/xquery) to persisted, indexed columns. However, if you are just using the database for storage, then this isn't necessary - and indeed, using xsd in the database is a major PITA if you need to change the xsd at a later date (it isn't trivial).

Marc Gravell
Thanks for the response, Marc. I updated the question, as I may not have been clear what I was trying to work out.
Laz
A: 

You could also try a native XML database, such as eXist, which would fit ok for your needs of XML storing and retrieving.

Update: Another solution could be using XML capabilities of your DBMS. Just almost any modern DBMS support XML types, so the best approach should go this way (using XML-types for columns where you would store data).

Update 2: If what you need is just to store structured information (you import from CSV and store it in a single XML file, if I am not wrong), why not using a plain old RDMBS? I find no advantage in storing data in a XML file that can grow without control. What happens when you want to retrieve back some data (summarized or not) from that file? The bigger it becomes the more time and computer resources will it take to have the job done. If you use SAX you would need to proccess the entire file, so accessing data at the end of the file would take more time than to access the start of it. Using DOM would be just worse, because the bigger the file it becomes, the more memory you need to process it.

On the other hand if I understood you wrong and what you need is to keep many XML files, and every XML file makes sense by itself as a piece of information (you must deliver XML files or pieces of an XML file as result) I would go the native XML database way, since it is the most natural way of storing and accessing this data.

Fernando Miguélez
Thanks for the response Fernando. I updated the question as I may have been unclear as to my problem.
Laz
+1  A: 

Wouldn't you just change your XML repository to have a SQL backend. It really shouldn't matter what yopur repositories do they are just meant to be 'black boxes' for the storage and retrieval of your data. Keep you secret XML knowledge in the XML repository and just backend it with SQL.

You could chain it to your SQL repository or keep it seperate.

Brody
Thanks for the suggestion Brody. However, I want insert a row containing other information besides the XML. Letting the XML repository persist to the database, means I would have to insert the other information with the SQL repository, and then do an update of 1 column in the XML repository.
Laz
A: 

Persistence could be considered a cross cutting concern meaning that the object should not care how the persistence is carried out only that it should be carried out; for example you could define a PersistentStoreFactory that creates objects that implement a PersistentStore interface that has a single Generic Persist method.

You could then decorate the the class with a Persist attribute that would use the factory to create the PersistentStore and then call the Persist method passing in the object instance. If you decided in the future that you wanted different persistence mechanisms for different types of classes you could extend your attribute to have an argument of type string to expose the PersistanceMedia ie Database, FileSystem, Cloud, URL etc

Michael Mann
A: 

If the structure of the XML is constant, you can see my blog post on Easy Xml Serialization. It allows to write XML but mainly to read it. Once all your classes are created for your XML, you can easily access your objects through the code and make/run batch script directly from your code while ensuring the data is valid and respect a certain schema.

Of course, the best way is still to do CSV > Database directly with some DTO if it's possible.

Maxim
A: 

It does seem to me that the processes of convert-object-to-text-format and store-text-somewhere could and should be separable. Which of those things would be "primary" I guess would depend on how exactly your application is structured. That is, whether you

  1. Tell the object, "go store yourself," and the configuration of the application determines that that means X format, and Y storage.
  2. Tell the object, "go store yourself in the database," and the database storage asks the object for its text format (or passes the object to a service layer that returns that text).
  3. Tell the object, "go store your XML," and then that serialization process knows which storage mechanism to choose.

When I say "configuration," that could be as simple as something in the app settings, or as complicated/robust as an IoC container instantiating particular components when the app launches.

You should be able to encapsulate the serialization and storage processes such that changing your mind about how they get wired together, or adding new options, doesn't require you to rewrite too much of your application.

Sixten Otto
+1  A: 

While many databases support storing XML directly I think you would be in a world of hurt (from experience) trying to do this. If the schema changes then you need to go and update each record to the new schema.

I would suggest, if you are going to store it in a DB, then store the data in a table which is easier to modify and manage.

Also, you should question why you are storing the data in a database. If you aren't going to be querying the data then storing in a flat file is going to be just as good.

I'm not saying what you are doing is wrong, but think long and hard about your requirements.

marshall