I am currently working on an XML-based CMS that saves data in chunks called "items". These can be used on the website to display content.

At the moment I have one separate XML file for every item. Since most pages on the website use about three to four of these items, even a rather small website with, e.g., 20 pages has about 100 different items, and therefore the same number of XML files in my /xml/items folder.

Would it be preferable to store all that data in one single items.xml file or is my current approach the better one?

Pro Single File - xml/items.xml

  • Fewer files (a large number of files may start to become a performance issue on a larger website with thousands of items)
  • Less disk access (especially in the administration area, which shows a list of all items)

Pro Multiple Files - xml/items/*.xml

  • Faster to access a single item, since only one small file needs to be parsed
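To make the two options concrete, here is a minimal sketch of reading one item under each layout with Python's standard `xml.etree.ElementTree`. The `<item id="..."><title>` markup, the `<items>` root element and both function names are assumptions for illustration only, not the CMS's actual format.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup: <item id="..."><title>...</title></item>;
# the single-file layout wraps every item in one <items> root.

def load_item_from_dir(items_dir: str, item_id: str) -> ET.Element:
    """Multiple files: only items/<item_id>.xml is opened and parsed."""
    return ET.parse(os.path.join(items_dir, f"{item_id}.xml")).getroot()

def load_item_from_single_file(items_file: str, item_id: str) -> Optional[ET.Element]:
    """Single file: all of items.xml is parsed just to reach one item."""
    root = ET.parse(items_file).getroot()
    for item in root.iter("item"):
        if item.get("id") == item_id:
            return item
    return None

# Build both layouts side by side in a temporary directory.
base = tempfile.mkdtemp()
items_dir = os.path.join(base, "items")
os.makedirs(items_dir)
all_items = ET.Element("items")
for n in range(1, 4):
    item = ET.SubElement(all_items, "item", id=f"item{n}")
    ET.SubElement(item, "title").text = f"Title {n}"
    ET.ElementTree(item).write(os.path.join(items_dir, f"item{n}.xml"))
items_file = os.path.join(base, "items.xml")
ET.ElementTree(all_items).write(items_file)

print(load_item_from_dir(items_dir, "item2").findtext("title"))          # Title 2
print(load_item_from_single_file(items_file, "item2").findtext("title"))  # Title 2
```

Both calls return the same data; the difference is how much XML has to be read and parsed to get there.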
A: 

If you're not simply going the database route, which, to me, feels obvious, I'd suggest several files. The primary reason is that if you use only one file and update it, your app needs to parse the entire file again when displaying a page, which is a bad thing(tm).
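A rough sketch of the re-parsing cost: a cache that re-parses a file only when its modification time changes. With a single items.xml, one cache entry covers every item, so saving any item invalidates all of them; with one file per item, only the edited item is parsed again. The class name is hypothetical, and mtime checks assume the filesystem records a new timestamp on each write (coarse timestamps can hide two writes in the same tick, which is why the demo bumps the mtime explicitly).

```python
import os
import tempfile
import xml.etree.ElementTree as ET

class ParsedFileCache:
    """Hypothetical helper: re-parses an XML file only when its mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, parsed root element)
        self.parse_count = 0

    def get(self, path: str) -> ET.Element:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged on disk: serve the parsed tree from memory
        root = ET.parse(path).getroot()
        self.parse_count += 1
        self._entries[path] = (mtime, root)
        return root

path = os.path.join(tempfile.mkdtemp(), "item1.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write('<item id="item1"><title>Old</title></item>')

cache = ParsedFileCache()
cache.get(path)
cache.get(path)  # second call is served from memory, no parse
with open(path, "w", encoding="utf-8") as f:
    f.write('<item id="item1"><title>New</title></item>')
st = os.stat(path)
# Force a distinct mtime for the demo, in case both writes landed in the same tick.
os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
print(cache.get(path).findtext("title"), cache.parse_count)  # New 2
```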

Buhb
+3  A: 

I think your current approach is the better of the two alternatives. Since your users edit the files through an interface you create, they will not be searching for files in a directory with many files anyway.

If a file does get corrupted, an advantage of many files is that you do not take one big hit, only a hit on a single file. Locking is also better, as only one file at a time is locked for writing instead of the complete 'master XML file'.
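A minimal sketch of both ideas for the one-file-per-item layout, assuming a plain lock file per item (created with `O_CREAT | O_EXCL`, so a second writer fails immediately) and an atomic write via a temporary file plus `os.replace`. The content is checked for well-formedness before anything touches the disk, so a broken save cannot replace a good item. The function names and the `.lock` suffix are hypothetical; note that a crashed process would leave a stale `.lock` file behind, which a real CMS would need to handle.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from contextlib import contextmanager

@contextmanager
def item_lock(path: str):
    """Hypothetical per-item lock: exclusive creation of '<path>.lock'.
    Raises FileExistsError if another writer already holds it."""
    lock_path = path + ".lock"
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

def save_item(path: str, xml_text: str) -> None:
    """Validate, then atomically replace a single item file while holding its lock."""
    ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML; nothing is written
    with item_lock(path):
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(xml_text)
            os.replace(tmp_path, path)  # swap in the new file in one step (same filesystem)
        except BaseException:
            os.remove(tmp_path)  # clean up the partial temp file, old item stays intact
            raise

d = tempfile.mkdtemp()
p = os.path.join(d, "item1.xml")
save_item(p, '<item id="item1"><title>v1</title></item>')
try:
    save_item(p, '<item id="item1"><title>broken</item>')
except ET.ParseError:
    pass  # invalid XML is rejected; the previous version stays on disk
print(ET.parse(p).getroot().findtext("title"))  # v1
```

Only the item being saved is locked, so edits to other items can proceed in parallel.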

Thies
Thank you for your comment on locking. At the moment the CMS works on a very small scale, but adding a locking mechanism to prevent data loss is something to keep in mind!
Jörg
@Jørg - In reference to a single XML file: data loss and locking are two different things. Suppose you have a large site with thousands of pages. If someone edits a single page, the complete XML file for all pages will be locked until the write is completed (which, depending on the code and disk speed, may take time). You also get into the fun stuff of versioning: what happens if two people edit two different pages at the same time? With one file, one person's changes are overwritten.
Thies
Hehe, I was talking about data loss in case two users open and save the same file at the same time. Locking the file will solve that, but it only works when working with multiple files. Using one large XML file makes the same task much more difficult (as you said: versioning, merging changes, etc.)
Jörg
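The scenario in this thread (two users open the same item, both save, and the second save silently overwrites the first) is a lost update, which a write lock alone does not prevent, because the two saves happen one after the other. One common remedy is optimistic concurrency. Here is a sketch assuming a hypothetical `rev` attribute on each item: a save is accepted only if the file is still at the revision the editor originally loaded. In practice the check-and-write would run while holding the item's write lock.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

class ConflictError(Exception):
    """Raised when the item changed on disk after the editor loaded it."""

def load(path: str):
    root = ET.parse(path).getroot()
    return root, int(root.get("rev", "0"))  # hypothetical 'rev' attribute

def save_if_unchanged(path: str, new_root: ET.Element, loaded_rev: int) -> int:
    """Write new_root only if the file is still at loaded_rev; return the new revision."""
    current_rev = int(ET.parse(path).getroot().get("rev", "0"))
    if current_rev != loaded_rev:
        raise ConflictError(f"loaded rev {loaded_rev}, file is now at rev {current_rev}")
    new_root.set("rev", str(current_rev + 1))
    ET.ElementTree(new_root).write(path, encoding="utf-8", xml_declaration=True)
    return current_rev + 1

path = os.path.join(tempfile.mkdtemp(), "item1.xml")
ET.ElementTree(ET.fromstring('<item id="item1" rev="0"><title>Start</title></item>')).write(path)

alice_root, alice_rev = load(path)  # two editors open the same item
bob_root, bob_rev = load(path)
alice_root.find("title").text = "Alice's edit"
save_if_unchanged(path, alice_root, alice_rev)  # accepted: file moves to rev 1
bob_root.find("title").text = "Bob's edit"
bob_rejected = False
try:
    save_if_unchanged(path, bob_root, bob_rev)  # rejected: Bob edited rev 0
except ConflictError as err:
    bob_rejected = True
    print("conflict:", err)
```

Instead of losing Alice's change, the CMS can show Bob the conflict and let him reload or merge.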
+1  A: 

Will your users work with the XML files directly, or is XML simply a way to store the data?

If the latter, it is a technical issue and disk access and parsing speed are relevant issues.

If the former, the most important question is what makes the most sense for the user. You can then work around the technical issues with caching and such. So assuming the user works directly with the XML files, you have to ask yourself whether having multiple files or a single file helps or hinders your user. If each item describes an individual component, and there are few or no relations with other items, I would put them in separate files. If you create a single file with lots of unrelated items, the user will spend a lot of time searching for the relevant item. If you have multiple files, he can use the file name to immediately select the right one.

beetstra
The user does not know it's XML data he is working on. It's just a way to store the data. That is why disk access and parsing speed are mentioned in my pros and cons.
Jörg
+1  A: 

I think it depends on how much memory your server has, how big the XML files are, and what parser you are using. If the server has plenty of memory, then I think one XML file would be preferable, as it could be cached in memory and then easily parsed. I think this would outweigh the IO overhead of opening/reading many files.
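As a sketch of the in-memory approach described above: parse items.xml once, keep a dictionary from item id to element, and answer page requests from that dictionary. The class name and markup are hypothetical; the whole document stays in memory, and `reload()` would have to be called after the administration area saves the file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

class SingleFileItemStore:
    """Hypothetical store: items.xml is parsed once and indexed by item id."""

    def __init__(self, items_file: str):
        self.items_file = items_file
        self.reload()

    def reload(self) -> None:
        # One open and one parse for the whole site; call again after each save.
        root = ET.parse(self.items_file).getroot()
        self._by_id: Dict[str, ET.Element] = {i.get("id"): i for i in root.iter("item")}

    def get(self, item_id: str) -> Optional[ET.Element]:
        return self._by_id.get(item_id)  # plain dictionary lookup, no disk access

    def all_ids(self) -> List[str]:
        return list(self._by_id)  # e.g. for the admin list of all items

path = os.path.join(tempfile.mkdtemp(), "items.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write('<items><item id="home"><title>Home</title></item>'
            '<item id="about"><title>About us</title></item></items>')

store = SingleFileItemStore(path)
print(store.get("about").findtext("title"))  # About us
print(store.all_ids())                        # ['home', 'about']
```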

Also, it would be much more maintainable and flexible for the future. For instance, should you want to generate a list of all items, or perhaps search them, this would be very difficult using lots of separate XML files. To use a database analogy: if you had common page data in a DB, would you create a separate table for each page? Of course not.
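For comparison, here is a simple case-insensitive text search written for both layouts, assuming the same hypothetical `<item id="...">` markup. With one items.xml, the search is a single parse followed by a walk over the tree. With one file per item, every file in the folder has to be opened and parsed, which is where the extra disk access and code come in.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import List

def _matches(item: ET.Element, term: str) -> bool:
    # Concatenate all text inside the item and compare case-insensitively.
    return term.lower() in "".join(item.itertext()).lower()

def search_single_file(items_file: str, term: str) -> List[str]:
    root = ET.parse(items_file).getroot()  # one open, one parse
    return [item.get("id") for item in root.iter("item") if _matches(item, term)]

def search_directory(items_dir: str, term: str) -> List[str]:
    hits = []
    for path in sorted(glob.glob(os.path.join(items_dir, "*.xml"))):  # one parse per item
        item = ET.parse(path).getroot()
        if _matches(item, term):
            hits.append(item.get("id"))
    return hits

samples = {"intro": "Welcome to our XML CMS", "news": "Latest news", "faq": "Why XML?"}
base = tempfile.mkdtemp()
items_dir = os.path.join(base, "items")
os.makedirs(items_dir)
root = ET.Element("items")
for item_id, text in samples.items():
    item = ET.SubElement(root, "item", id=item_id)
    ET.SubElement(item, "body").text = text
    ET.ElementTree(item).write(os.path.join(items_dir, f"{item_id}.xml"))
items_file = os.path.join(base, "items.xml")
ET.ElementTree(root).write(items_file)

print(search_single_file(items_file, "xml"))  # ['intro', 'faq']
print(search_directory(items_dir, "xml"))     # ['faq', 'intro']
```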

Dan Diplo
Actually, your comment about search (which, for some reason, I hadn't really thought about yet) really changed my current position. I will look into searching my data and what advantages one single file may have here.
Jörg
+1  A: 

Many thoughtful responses here already.

Either one big file or many small files should work just fine. The areas of concern to think about are more likely administration and maintenance. If it's difficult to maintain items because they are spread across many different files, then maybe one big file is the answer.

Some thoughts:

  • One big file means that a single mistake (invalid XML) could take down the whole application, while with many files only the pages using the affected item(s) would break. This risk is mitigated by not editing data in production.

  • Does each server have its own items file structure? Or are the items located on a single, highly available share? The more copies of the data you have lying around, the more likely it is that data on a particular server gets out of sync, which might be hard to track down.

  • Whether you choose one file or many files, you can likely solve/abstract any data access problems (locking, searching, etc.) in code. The more code you have to write for things like locking and searching, the more bugs you're likely to have to debug.

  • Consider caching items for a period of time to avoid disk access if performance starts to become a problem.
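The caching suggestion above can be sketched as a small time-to-live cache. The loader function, the 60-second default and the injectable clock are assumptions for illustration; the trade-off is that an edit may take up to `ttl` seconds to show on the site unless the entry is invalidated on save.

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Hypothetical helper: keeps loaded items for `ttl` seconds before reloading."""

    def __init__(self, loader: Callable[[str], Any], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # e.g. a function that parses xml/items/<id>.xml
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no disk access
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

calls = []
def fake_load(item_id: str) -> str:
    calls.append(item_id)  # stands in for reading and parsing the item file
    return f"<item id='{item_id}'/>"

fake_now = [0.0]
cache = TTLCache(fake_load, ttl=30.0, clock=lambda: fake_now[0])
cache.get("home")
cache.get("home")   # served from memory
fake_now[0] = 31.0  # the entry has now expired
cache.get("home")   # loaded again
print(calls)        # ['home', 'home']
```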

You might want to check out Scott Hanselman's dasBlog blogging engine. I believe it is essentially an XML/text-file-based content management system that took the many-file approach, and it might be helpful to review.

Zach Bonham