Q:

I am in a situation where I'm building a website without MySQL or any other database. After some consideration, I chose to use XML files for configuration and data storage (and actually I have no other choice).

My problem is this: if different users read and modify the same XML file at the same time, will there be synchronization problems or other issues? If so, is there a solution?

For a simple bulletin-board system, how does XML perform compared with the same system using MySQL? Any tips for improving performance?

A: 

An XML file is like any other file in the sense that if two people modify it concurrently, they will clobber each other's data. The usual solution is to use a database that can handle concurrent access, e.g. by splitting the data up into independently modifiable records.

If you really cannot use a database (why not?), a workaround would be to separate the data into one file per user, so that writes won't collide. But that won't work for data that applies to multiple users.
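
If that route fits, a minimal sketch of the one-file-per-user layout could look like the following Python; the data/users directory, the helper names, and the crude sanitizing are all assumptions, not anything prescribed here:

```python
import os
import xml.etree.ElementTree as ET

def user_file(username):
    # One XML file per user, so two users never write to the same file.
    safe = "".join(c for c in username if c.isalnum())  # crude path sanitizing
    return os.path.join("data", "users", safe + ".xml")

def save_user(username, root_element):
    # Serialize this user's data into their own file.
    path = user_file(username)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    ET.ElementTree(root_element).write(path, encoding="utf-8")

save_user("cauly", ET.Element("user", name="cauly"))
```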

You could use an XML database, but that would still be a database. Besides, XML databases are typically optimized for document storage rather than configuration and data storage.

LarsH
A: 

Yes, there will quite possibly be synchronization problems.

Are you deploying on UNIX/Linux? If so, I would do the following: each time you write a file, write it under a new temporary filename in the same directory. Once you're done, rename the temporary file to the name you actually want, overwriting any file that's already there (a code sketch follows the list below).

This has the following consequences:

  • If your program crashes halfway through writing an XML file, it doesn't matter: there is just one extra junk file on the disk.
  • Because the temporary file is in the same directory, it's on the same filesystem, so the rename operation is atomic: it either succeeds (and the new file is there) or fails (and the old file is still there). Either way you have a valid file.
  • If two processes try to write the same file at the same time, they will each create a different temporary file. One of them will win, and afterwards the disk will hold the version that process wanted to write. You lose data (the double-update problem), but at least you don't have corrupted data.
  • If another process is in the middle of reading the old file when it gets overwritten, the read continues against the old file, because its inode still exists; the OS deletes the file once the last process closes it. Again, the reader sees a consistent view of the file.
  • The main problem is the loss of data when two processes write the file at once, or when a "load data, process it, write it" cycle loses writes that happen during the "process it" phase. However, this is no different from "SELECT, process, UPDATE" against a database, and may be acceptable.
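
As a concrete illustration, here is a minimal sketch of the write-temp-then-rename pattern in Python, assuming a POSIX filesystem where os.rename atomically replaces the target (on Windows you would need os.replace). The function name and the sample call are hypothetical:

```python
import os
import tempfile

def write_atomically(path, data):
    directory = os.path.dirname(path) or "."
    # Create the temporary file in the target's own directory, so the
    # final rename never crosses a filesystem boundary and stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make sure the bytes are on disk
        os.rename(tmp_path, path)  # atomic: readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)        # on error, remove our temporary file
        raise

write_atomically("board.xml", b"<board><post>hello</post></board>")
```

Note that a hard crash (e.g. the process being killed mid-write) bypasses the cleanup branch, which is exactly why the junk-file sweep below is still needed.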

To get rid of junk files, you could run a find command from crontab that finds files older than a day matching the naming scheme of your temporary files, and deletes them.
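
For consistency with the other examples, here is a sketch of that same cleanup written in Python instead of find; the data directory and the ".tmp" suffix are assumptions tied to the write_atomically sketch above:

```python
import os
import time

DATA_DIR = "/var/www/data"   # hypothetical data directory
ONE_DAY = 24 * 60 * 60

now = time.time()
for entry in os.scandir(DATA_DIR):
    # Temporary files are assumed to use the ".tmp" suffix; anything older
    # than a day is junk left behind by a writer that died before renaming.
    if entry.name.endswith(".tmp") and now - entry.stat().st_mtime > ONE_DAY:
        os.unlink(entry.path)
```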

If you really need to lock files (i.e. a "read, process, write" cycle must not be interrupted by other processes), then use a file-locking API offered by the OS. Writing temporary "lock" files by hand is dangerous: if your process crashes, they stay around forever, and then nothing will ever be able to write to the file.
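
On UNIX, one such API is flock(2), available in Python as fcntl.flock. Unlike a hand-rolled lock file whose mere existence is the lock, a kernel lock is released automatically when the process exits or crashes, so it cannot go stale. A minimal sketch, reusing write_atomically from above (the dedicated lock-file path is an assumption):

```python
import fcntl

LOCK_PATH = "board.xml.lock"   # dedicated lock file; it is never renamed,
                               # so every writer locks the same inode

def locked_update(path, process):
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until we hold the lock;
                                          # the kernel drops it if we crash
        with open(path, "rb") as f:
            data = f.read()
        # The "process it" phase runs under the lock, so no update is lost.
        write_atomically(path, process(data))
        # The lock is released when the with-block closes the lock file.
```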

Adrian Smith
Should I block the other users while one user is writing a file?
Cauly