I was wondering, when dealing with a web service API that returns XML, whether it's better (faster) to call the external service each time and parse the XML (using ElementTree) for display on your site, or to save the records into the database (after parsing it once, or however many times you need to each day) and make database calls instead for that same information.

+3  A: 

Consuming the web service is more efficient because there are a lot more things you can do to scale your web services and web server (via caching, etc.). By consuming the middle layer, you also have the option to change the returned data format (e.g. you can decide to use JSON rather than XML). Scaling a database is much harder (involving replication, etc.), so in general, reduce hits on the DB if you can.

Boon
+6  A: 

First off -- measure. Don't just assume that one is better or worse than the other.

Second, if you really don't want to measure, I'd guess the database is a bit faster (assuming the database is relatively local compared to the web service). Network latency usually is more than parse time unless we're talking a really complex database or really complex XML.

Jonathan
+1  A: 

There is not enough information to be able to say for sure in the general case. Why don't you do some tests and find out? Since it sounds like you are using Python, you will probably want to use the timeit module; a rough harness follows at the end of this answer.

Some things that could affect the result:

  • Performance of the web service you are using
  • Reliability of the web service you are using
  • Distance between servers
  • Amount of data being returned

I would guess that if the data is cacheable, a cached version will be faster, but that does not necessarily mean using a local RDBMS; it might mean something like memcached or an in-memory cache in your application.
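A rough timeit harness along those lines might look like this (Python 3; the service URL, cache.db file, and records table are placeholder assumptions, not anything from the question):

    import sqlite3
    import timeit
    import urllib.request
    import xml.etree.ElementTree as ET

    # One remote fetch + parse per call (hypothetical endpoint).
    def remote():
        with urllib.request.urlopen('http://example.com/api/records.xml') as resp:
            ET.fromstring(resp.read())

    # One query against a hypothetical local cache database.
    def local():
        conn = sqlite3.connect('cache.db')
        conn.execute('SELECT * FROM records').fetchall()
        conn.close()

    # Best of 3 rounds of 10 calls each; compare the two numbers.
    print('web service:', min(timeit.repeat(remote, number=10, repeat=3)))
    print('local db:   ', min(timeit.repeat(local, number=10, repeat=3)))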

lambacck
Perhaps more importantly: the frequency of updates from the remote site vs. the frequency of access on the local site.
S.Lott
+1  A: 

It depends - who is calling the web service? Is the web service called every time a user hits the page? If so, I'd recommend introducing a caching layer of some sort - many web service APIs throttle the number of hits you can make per hour.

Whether you choose to parse the cached XML on the fly or pull the data from a database probably won't matter (unless we are talking enterprise scale here). Personally, I'd much rather make a simple SQL call than write a DOM parser (which is much more prone to exceptional scenarios).
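For illustration, a minimal time-based caching layer might look like this (the TTL, URL handling, and function name are made up for the sketch; a production setup would more likely use memcached or similar):

    import time
    import urllib.request

    _cache = {}            # url -> (fetched_at, raw_xml_bytes)
    TTL_SECONDS = 15 * 60  # hit the remote API at most every 15 minutes per URL

    def fetch_with_cache(url):
        """Return the raw XML for url, calling the service only when stale."""
        now = time.time()
        hit = _cache.get(url)
        if hit is not None and now - hit[0] < TTL_SECONDS:
            return hit[1]
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
        _cache[url] = (now, body)
        return body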

Andy Baird
A: 

It varies from case to case; you'll have to measure (or at least make an educated guess).

You'll have to consider several things.

Web service

  • it might hit a database itself
  • it can be cached
  • it will introduce network latency and might be unreliable
  • or it could be on the local network and faster than accessing even a local disk

DB

  • might be slow, since it needs to access disk (databases do have internal caches, but those are usually not tuned for this kind of access)
  • should be reliable

Technology itself doesn't mean much in terms of speed - in one case the database parses SQL, in the other an XML parser parses XML, and the database is usually accessed via a socket as well, so you have both parsing and network overhead in either case.

Caching data in your application, if applicable, is probably a good idea.

Domchi
A: 

As a few people have said, it depends, and you should test it.

Often external services are slow, and caching them locally (in a database, or in memory with e.g. memcached) is faster. But perhaps not.

Fortunately, it's cheap and easy to test.

+3  A: 

Everyone is being very polite in answering this question: "it depends"... "you should test"... and so forth.

True, the question does not go into great detail about the application and network topologies involved, but if the question is even being asked, it's likely that a) the DB is "local" to the application (on the same subnet, or the same machine, or in memory), and b) the web service is not. After all, the OP uses the phrases "external service" and "display on your site." The phrase "parsing it once or however many times you need to each day" also suggests a data set that doesn't exactly change every second.

The classic SOA myth is that the network is always available; going a step further, I'd say it's a myth that the network is always available with low latency. Unless your own internal systems are crap, sending an HTTP query across the Internet will always be slower than a query to a local DB or DB cluster. There are any number of reasons for this: number of hops to the remote server, outage or degradation issues that you can't control on the remote end, and the internal processing time for the remote web service application to analyze your request, hit its own persistence backend (aka DB), and return a result.

Fire up your app. Measure latency and response times to your DB. Now do the same against the remote web service. Unless your DB is also across the Internet, you'll notice a huge difference.

It's not at all hard for a competent technologist to scale a DB, or to take the DB out of the request path entirely with memcached and similar caching layers; the latency between servers sitting near each other in the datacentre is monumentally less than between machines over the Internet (and more secure, to boot). Even if achieving this scale requires some thought, it's under your control, unlike a remote web service whose scaling and latency are totally opaque to you. I, for one, would not be too happy with the idea that the availability and responsiveness of my site depend entirely on someone else.

Finally, what happens if the remote web service is unavailable? Imagine a world where every request to your site involves a request over the Internet to some other site. What happens if that other site is unavailable? Do your users watch a spinning cursor of death for several hours? Do they enjoy an Error 500 while your site borks on this unexpected external dependency?

If you find yourself adopting an architecture whose fundamental features depend on a remote Internet call for every request, think very carefully about your application before deciding if you can live with the consequences.

Jarret Hardie
Did you read the question carefully? It sounds like the primary result always comes from an external web service, so network outages are already something that needs to be handled. Also, it sounds like the web service in question is only external to the client host, but likely local in the grand scheme of things.
StaxMan
A: 

Test, definitely. As a rule of thumb, XML is good for communicating between apps, but once you have the data inside your app, everything should go into a database table. This may not apply in all cases, but 95% of the time it has for me. Any time I tried to store data another way (e.g. XML in a content management system), I ended up wishing I had just used good old sprocs and SQL Server.

Dale Wilbanks
A: 

It sounds like you essentially want to cache results, and are wondering if it's worth it. If so, I would NOT use a database (I assume you are thinking of a relational DB): RDBMSs are not good for caching, even though many people use them that way. You don't need persistence or ACID. If the choice were between Oracle/MySQL and the external web service, I would start with just using the service.

Instead, consider real caching systems, local or not (memcached, simple in-memory caches, etc.). Or if you must use a DB, use a key/value store; BDB works well. Store the response message in its serialized form (XML): try to fetch it from the cache, and on a miss, fetch it from the service and parse. Or, if there's a convenient and more compact serialization, store and fetch that.
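A minimal sketch of that flow, using Python's dbm module as a stand-in for BDB (the URL and cache filename are hypothetical):

    import dbm
    import urllib.request

    def get_xml(url):
        """Return the serialized XML response for url, from cache when possible."""
        with dbm.open('ws_cache', 'c') as db:  # 'c' creates the store if missing
            key = url.encode('utf-8')
            if key in db:
                return db[key]                 # cache hit: raw XML bytes
            with urllib.request.urlopen(url) as resp:
                body = resp.read()
            db[key] = body                     # cache the serialized response
            return body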

StaxMan
Why did this get voted down? I thought it was a good answer :)
rick
I was wondering the same thing... :)
StaxMan