Dropping my lurker status to finally ask a question...

I need to know how I can improve on the performance of a PHP script that draws its data from XML files.

Some background:

  • I've already traced the bottleneck to the CPU, but I want to optimize the script's performance before taking a hit on processor costs. Specifically, the most CPU-consuming part of the script is the XML loading.

  • The reason I'm using XML to store the object data is that it needs to be accessible via a browser-based Flash interface, and we want to provide fast user access in that area. The project is still in its early stages, though, so if best practice would be to abandon XML altogether, that would be a good answer too.

  • Lots of data: we're currently planning for roughly 100k objects, albeit mostly small ones, and ALL of them must be loaded by the script, with perhaps a few rare exceptions. The data set will only grow with time.

  • Frequent runs: ideally, we'd run the script ~50k times an hour; realistically, we'd settle for ~1k runs an hour. This, coupled with the data size, makes performance optimization imperative.

  • We've already taken one optimization step: making several runs on the same data rather than reloading it for each run. It's still taking too long, though, and the runs should generally use "fresh" data that includes the modifications made by users.

+1  A: 

If the XML stays relatively static, you could cache it as a PHP array, something like this:

<xml><foo>bar</foo></xml>

is cached in a file as

<?php return array('foo' => 'bar');

It should be faster for PHP to just include the arrayified version of the XML.
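
For illustration, a minimal sketch of that idea (the file names and the staleness check are my own assumptions, not part of the answer) could look like this:

<?php
// Rebuild the cached PHP array only when the XML file is newer than
// the cache, then include the plain PHP copy on every run.
function load_objects($xmlFile, $cacheFile)
{
    if (!file_exists($cacheFile) || filemtime($xmlFile) > filemtime($cacheFile)) {
        $xml  = simplexml_load_file($xmlFile);
        $data = json_decode(json_encode($xml), true); // crude SimpleXML -> array
        file_put_contents($cacheFile, '<?php return ' . var_export($data, true) . ';');
    }
    return include $cacheFile; // a plain PHP array; opcode caches can keep it hot
}

$objects = load_objects('objects.xml', 'objects.cache.php');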

Jani Hartikainen
This is a good answer, but we're already doing that for several runs at once; the XML isn't expected to stay static for more than a few seconds, but we're allowing a few minutes' worth of changes to slip by across a few runs. After that, we have to pick up all the changes, which means recreating the array. Still very CPU intensive.
Polymeron
+3  A: 

Just to clarify: is the data you're loading coming from XML files for processing in its current state and is it being modified before being sent to the Flash application?

It looks like you'd be better off using a database to store your data and pushing out XML as needed, rather than reading it in as XML first; if building the XML files gets slow, you could cache them as they're generated to avoid redundant generation of the same file.
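
As a rough, hypothetical sketch of that approach (SQLite, the table layout and the cache path are assumptions, not from the question):

<?php
// Objects live in a database; XML is generated on demand and cached
// on disk so the same file isn't rebuilt twice.
$db = new PDO('sqlite:objects.db');

function object_xml(PDO $db, $id, $cacheDir = 'xml-cache')
{
    $cacheFile = "$cacheDir/object-" . (int) $id . ".xml";
    if (file_exists($cacheFile)) {
        return file_get_contents($cacheFile); // reuse the generated file
    }

    $stmt = $db->prepare('SELECT name, value FROM objects WHERE id = ?');
    $stmt->execute(array($id));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    $xml = new SimpleXMLElement('<object/>');
    $xml->addChild('name', $row['name']);
    $xml->addChild('value', $row['value']);

    $out = $xml->asXML();
    file_put_contents($cacheFile, $out); // cache until the object changes
    return $out;
}

The cache file would have to be deleted or rewritten whenever a user edits the corresponding object, so stale XML never reaches the Flash client.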

Mathew Hall
This is what I was going to suggest. +1
ceejayoz
Yes, 100k objects are better kept in an embedded database, or a dedicated one if you can access it; then you can generate just the bits of the XML that the client needs.
Mercer Traieste
To clarify: the Flash interface and the runs are completely separate, except that the runs modify some data which will eventually be displayable. But the runs are independent of whether or not the objects are being queried by users. The data coming from XML is in its current state; when sent to Flash, it isn't modified. The users, however, have the ability to make changes to loaded files via the interface. The question is: faster user access notwithstanding, does working with a DB speed up the *runs*? We're more concerned about that currently.
Polymeron
In the case of the actual runs, it seems you might be able to gain a performance increase from a database; the overhead of loading the data will be significantly lower than parsing the XML each time. At the very least this would reduce the cost of each run.
Mathew Hall
A: 

~1k/hour against 3600 seconds per hour is a run roughly every 3.6 seconds (let alone the ~14 runs per second that 50k/hour would mean)...

There are many questions. Some of them are:

  • Does your PHP script need to read/process all records of the data source on each single run? If not, what kind of subset does it need (~size, criteria, ...)?
  • Same question for the Flash application. Also, who's sending the data? The PHP script? A "direct" request for the complete, static XML file?
  • What operations are performed on the data source?
  • Do you need some kind of concurrency mechanism?
  • ...

And just because you want to deliver XML data to the Flash clients, it doesn't necessarily mean that you have to store XML data on the server. If, for example, the clients only need a tiny subset of the available records, it's probably a lot faster not to store the data as XML but as something better suited to speed and "searchability", and then create the XML output for the subset on the fly, perhaps assisted by some caching depending on what data the clients request and how much/how often the data changes.
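
For instance, a hedged sketch of that subset idea (the query, request parameter and schema are invented for illustration) could stream just the requested records as XML:

<?php
// Pull only the subset the client asked for and stream it as XML,
// instead of keeping one big static XML file around.
$db   = new PDO('sqlite:objects.db');
$stmt = $db->prepare('SELECT id, name FROM objects WHERE region = ? LIMIT 100');
$stmt->execute(array($_GET['region']));

header('Content-Type: text/xml');
$w = new XMLWriter();
$w->openURI('php://output');       // write straight to the response
$w->startDocument('1.0', 'UTF-8');
$w->startElement('objects');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $w->startElement('object');
    $w->writeAttribute('id', $row['id']);
    $w->writeElement('name', $row['name']);
    $w->endElement();
}
$w->endElement();
$w->endDocument();
$w->flush();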

edit: Let's assume that you really, really need the whole dataset and need a continuous simulation. Then you might want to consider a continuous process that keeps the complete "world model" in memory and operates on this model on each run (world tick). This way you at least wouldn't have to load the data on each tick. But such a process is usually written in something other than PHP.
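
A very rough CLI sketch of that tick loop; load_world(), apply_user_changes() and tick() are hypothetical placeholders the real project would have to supply:

<?php
// Load the full world model once, then simulate forever in memory.
$world = load_world();

while (true) {
    $start = microtime(true);
    apply_user_changes($world);  // fold in edits made via the interface
    tick($world);                // one simulation pass over all objects
    $spent = microtime(true) - $start;
    // 50k runs/hour is ~14 ticks/second, i.e. a ~72 ms budget per tick
    usleep((int) max(0, (0.072 - $spent) * 1e6));
}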

VolkerK
To clarify, the runs should work in the background, processing data that will eventually be displayed to the users. We'll need every single object's data for every single run. When users are viewing the interface, it calls specific XML files in order to know what to display. No need for concurrency mechanisms; we're OK on that front, I think. Searchability is all well and good for the users, but would using the DB be more efficient for the background runs? That's the current concern.
Polymeron
If it's a background process, why do you need to read/load the whole dataset repeatedly? If you say you have to, we probably have to believe you ;-) but many times when such a question is asked in PHP forums, it boils down to "no, you don't need an (almost) continuous simulation for that". Can you be more specific about the dataset and the operations you want to perform on each run?
VolkerK
Then I would try to get rid of the files, or at least of the repeated load operations: i.e., a continuously running process that a) does the simulation, b) accepts and serves requests for subsets of the data, and c) handles requests to modify the data. So instead of uploading a file (that is stored as a file on the server), this process would integrate the new data into its world model (and probably store it in a database as a backup).
VolkerK