I was wondering how I should go about writing an XML data layer for a fairly simple PHP web site. The reasons for this are:

  1. A DB server is not available.
  2. The data schema is simple and can be expressed in XML.
  3. I like the idea of having a self-contained app, without server dependencies.
  4. I would possibly want to abstract it into a small framework for reuse in other projects.

The schema resembles a simple book catalog with a few lookup tables plus i18n. So, it is quite simple to express.

The main XML file is in the range of 100 KB to 15 MB, but it could grow at some point to ~100 MB.

I am actually considering extending my model classes to handle XML data. Currently I fetch data with a combination of XMLReader and SimpleXML, like this:

public function find($xpath) {

    // Stream the file; only one <book> element is held in memory at a time.
    while ($this->xml_reader->read()) {

        if ($this->xml_reader->nodeType === XMLReader::ELEMENT &&
            $this->xml_reader->localName == 'book') {

            // Copy the current <book> subtree into its own small DOM document.
            $node = $this->xml_reader->expand();
            $dom = new DOMDocument();
            $n = $dom->importNode($node, true);
            $dom->appendChild($n);
            $sx = simplexml_import_dom($n);

            // xpath() returns an array of matching SimpleXMLElement objects
            $res = $sx->xpath($xpath);

            if (isset($res[0]) && $res[0]) {
                $this->results[] = $res;
            }
        }
    }

    return $this->results;
}

So, instead of loading the whole XML file into memory, I create a SimpleXML object for each book element and run an XPath query on that object. The function returns an array of XPath result sets, each holding SimpleXML objects. For a conservative search I would probably break on the first item found.
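
The "break on the first item found" variant could look roughly like the sketch below. It is only an illustration under assumptions: `findFirstBook` is a hypothetical standalone function (not part of the model class above), and the file is assumed to contain `book` elements as in the code above.

```php
<?php
// Hypothetical helper (not from the code above): returns the first XPath match
// found inside any <book> element of $file, or null when nothing matches.
function findFirstBook(string $file, string $xpath): ?SimpleXMLElement
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    try {
        while ($reader->read()) {
            if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== 'book') {
                continue;
            }

            // Copy only the current <book> subtree into its own small document.
            $dom = new DOMDocument();
            $book = $dom->importNode($reader->expand(), true);
            $dom->appendChild($book);

            $matches = simplexml_import_dom($book)->xpath($xpath);
            if ($matches) {
                // Early exit: the rest of the file is never parsed.
                return $matches[0];
            }
        }
        return null;
    } finally {
        $reader->close();
    }
}
```

A call such as `findFirstBook('catalog.xml', 'title[.="Dune"]')` stops reading as soon as one book matches. A search with no match still has to read the whole file.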

The questions I have are:

  1. Would you consider this a viable solution, even for a medium to large data store?
  2. Are there any considerations/patterns to keep in mind when handling XML in PHP?
  3. Does the above code scale to large files (100 MB)?
  4. Can inserts and updates in large XML files be handled in a low-overhead manner?
  5. Would you suggest an alternative data format as a better option?
+1  A: 

I would go with SQLite instead, which is perfect for small websites and xcopy-style deployments.

XML-based data storage won't scale well.

"SQLite is an ACID-compliant embedded relational database management system contained in a relatively small (~225 kB) C programming library. The source code for SQLite is in the public domain.

Unlike client-server database management systems, the SQLite engine is not a standalone process with which the program communicates. Instead, the SQLite library is linked in and thus becomes an integral part of the program. It can also be called dynamically. The program uses SQLite's functionality through simple function calls, which reduces latency in database access as function calls within a single process are more efficient than inter-process communication. The entire database (definitions, tables, indices, and the data itself) is stored as a single cross-platform file on a host machine. This simple design is achieved by locking the entire database file at the beginning of a transaction."

Koistya Navin
Locking the entire database is not scalable at all; XML files can be locked or not, depending on what you tell the OS.
Robert Gould
+2  A: 

No, it won't scale. It's not feasible.

You'd be better off using e.g. SQLite. You don't need a server, it's bundled in with PHP by default and stores data in regular files.

vartec
SQLite and XML are both file-based, and a folder of flat XML files actually scales much better than SQLite if you know what you are doing. Even if you don't know what you are doing, it'll scale better. Not that I'd use XML files over a clustered DB, but it beats SQLite for scalability.
Robert Gould
Could you paste a URL with some benchmarks of SimpleXML vs. SQLite?
vartec
Nope, because I'm on a phone, but have you considered that since SQLite locks the database, you can't actually serve two clients at once? You can't run two sessions in parallel, so it becomes a single bottleneck. SQLite will outperform on a one-to-one basis, but XML will scale better.
Robert Gould
SQLite is great for prototyping and developing SQL-based systems, but if my system can't rely on a true database server in production, I'd dump SQL and use XML any day. If I knew I would have a real database server, I'd never use XML, unless I wanted easy editing.
Robert Gould
Robert, are you sure that SQLite prevents two clients being served at once? Any ACID compliant DB would need to be able to lock its data, especially when there are multiple concurrent users. And according to Wikipedia, multiple threads _can_ access an SQLite database concurrently with no problems.
Calvin
"Several computer processes or threads may access the same database without problems. Several read accesses can be satisfied in parallel. A write access can only be satisfied if no other accesses are currently being serviced..." -- http://en.wikipedia.org/wiki/SQLite
Calvin
@Calvin: any ACID DB indeed locks data. The difference is that the most advanced RDBMSs do it at row level, simpler ones at table level, and the simplest at file level. SQLite is the last case. But so would be XML or flat files.
vartec
@Robert: read locks are not exclusive; they only prevent writing at the same time. You can have as many threads reading at the same time as you wish. Locking is something you'd have to do with flat files or XML anyway. You can't concurrently update a flat file.
vartec
But you can have many smaller XML files, one per entry, and then select them by hashed naming, like how the iPod stores music, if you haven't seen it done before.
Robert Gould
From experience, if one process (session) is writing, all other processes halt, because they can't read until the write is done. That's why it's not really server-side production friendly. For client-side use and prototyping, SQLite rocks; I love it and use it every day.
Robert Gould
You can have many smaller DB files. Same thing. XML does not give you advantage over SQLite. Full-fledged RDBMS does.
vartec
+3  A: 

If you have a saw and you need to pound in a nail, don't use the saw. Get a hammer. (folk saying)

In other words, if you want a data store, use a data-base, not a markup language.

PHP has good support for various database systems via PDO; for small data sets, you can use SQLite, which doesn't need a server (it is stored in a normal file). Later, should you need to switch to a full-featured database, it is quite simple.
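
As a rough illustration of the PDO route, here is a minimal sketch. It assumes the `pdo_sqlite` extension is enabled and uses a hypothetical single `books` table; a real catalog would add the lookup and i18n tables. The in-memory DSN is only for demonstration.

```php
<?php
// Assumption: the pdo_sqlite extension is available.
// 'sqlite::memory:' keeps the demo self-contained; a DSN such as
// 'sqlite:/path/to/catalog.db' would store the data in a regular file.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema, invented for this example.
$pdo->exec('CREATE TABLE books (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    lang  TEXT NOT NULL
)');

// Prepared statements keep values separate from the SQL text.
$insert = $pdo->prepare('INSERT INTO books (title, lang) VALUES (:title, :lang)');
$insert->execute([':title' => 'Dune', ':lang' => 'en']);
$insert->execute([':title' => 'Solaris', ':lang' => 'pl']);

$select = $pdo->prepare('SELECT title FROM books WHERE lang = :lang');
$select->execute([':lang' => 'pl']);
$titles = $select->fetchAll(PDO::FETCH_COLUMN);
```

Because the calls go through PDO, moving to another database later mostly means changing the DSN and adjusting any SQLite-specific SQL.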

To answer your questions:

  1. Viable solution - no, definitely not. XML has its purposes, but simulating a database is not one, not even for a small data set.
  2. With XML, you're shuffling strings around all the time. That might be just bearable on read, but it is a real nightmare on write (slow to parse, large memory footprint, etc.). While you could subvert XML to work as a data store, it is simply the wrong tool for the job.
  3. No (everything will take forever, if you don't run out of memory before that).
  4. No, for many reasons (locking, re-writing the whole XML-string/file, not to mention memory again).

5a. SQLite was designed with very small and simple databases in mind - simple, no server dependencies (the db is contained in one file). As @Robert Gould points out in a comment, it doesn't scale for larger applications, but then

5b. for a medium to large data store, consider a relational database (and it is usually easier to switch databases than to switch from XML to a database).
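
To make point 4 concrete, here is a sketch of what changing a single title in an XML file involves. The function name `updateBookTitle` and the `<catalog><book id="..."><title>` layout are assumptions made up for the example. Note that the whole document is parsed and rewritten under an exclusive lock, even for a one-field change.

```php
<?php
// Hypothetical helper: changes one title, but has to parse and rewrite
// the entire file while holding an exclusive lock.
function updateBookTitle(string $file, int $id, string $title): bool
{
    $fh = fopen($file, 'c+');
    if ($fh === false) {
        return false;
    }
    if (!flock($fh, LOCK_EX)) { // other lock-aware readers/writers wait here
        fclose($fh);
        return false;
    }

    try {
        $xml = stream_get_contents($fh);
        $dom = new DOMDocument();
        if ($xml === false || $xml === '' || !$dom->loadXML($xml)) {
            return false;
        }

        $xpath = new DOMXPath($dom);
        $node = $xpath->query(sprintf('/catalog/book[@id="%d"]/title', $id))->item(0);
        if ($node === null) {
            return false;
        }

        // Replace the text content; createTextNode takes care of escaping.
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($dom->createTextNode($title));

        // The whole document is serialized and written back.
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, $dom->saveXML());
        fflush($fh);
        return true;
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

With a 100 MB file, every such call pays for a full parse, a full in-memory tree, and a full write.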

Piskvor
You bet it's easier :) Moving from XML to SQL is an epic pain! :) But it scales better than SQLite, and XML doesn't tie you to flat schemas. It's easier to map some models. So I don't think it's fair to call it a saw. SQL can be a saw as well. But for this question neither is a saw; I just don't want hype.
Robert Gould
@Robert Gould: Indeed, we don't really have enough information about the schema - in some cases, XML might fit better to the job, e.g. working with any kind of tree in SQL is hard.
Piskvor
Regardless of the schema and considering your answer, I think that I should give SQLite a try. Maybe I'll keep my XML handling code for configuration files etc.
dxrsm
A: 

Everyone loves to throw dirt on XML files, but in reality they work. I've seen large applications use them, and I know of an MMO that uses simple flat files for storage and it works fine (by the way, the MMO is among the top 5 worldwide, so it's not just a toy). However, my job right now is creating a better and more savvy persistence layer based on SQL. If your site will be big, SQL is the best solution, but XML is capable of massive (MMO) scalability if done well.

One caveat: migrating from XML to SQL is rough if the mapping isn't easy.

Robert Gould
It's not that "XML is baaad." Some tasks are better suited for XML, some for a database, some for something completely different. Of course, you can pound in a nail with a saw, or even make a high-performance special saw for pounding nails, but it would be much easier to use a hammer.
Piskvor
Do you have any hints/references concerning... doing it well?
dxrsm