views:

428

answers:

10

Lots of sites today have APIs that allow users to get data from the site as XML or JSON using a GET HTTP request. Flickr and del.icio.us are example of sites with APIs. These APIs require the server to access the database, and then output the result as either XML or JSON.

Why do we need this translation though? Why not just create a user on the database (for example MySQL)? The user would be given limited access to the database, only being allowed to SELECT, and only certain tables and certain columns in those tables. Wouldn't this be a lot more efficient for the server (it wouldn't have to deal with the HTTP request), and it would be easier for developers, who could now access exactly the data they need, the way they need it.

+28  A: 

Can you prevent a malicious individual from crafting a super-complex SQL query that will peg your database's CPU at 100%? Can you prevent a lot of innocent programmers from crafting inefficient queries that will never be optimized that will do the same thing?

Bob Aman
Aren't there tools available to set a timeout on an SQL script?
Marius
Sure. But queries come in asynchronously, and until you hit the timeout, the database (almost always your bottleneck even without this scheme) is still going to be doing a ton of CPU cycles. I don't care how short you set that timeout, I can exploit it.
Bob Aman
And even then, none of that covers the issues that everyone else raised: no caching, no scaling, no flexibility, no ability to alter schemas, no portability. The reasons for not doing this are countless.It's a great question, and I totally get why someone might briefly consider it, but no one should ever attempt it with something they care about.As an experiment, try exposing a database with a lock-down user on a virtual machine to the outside sometime. Invite people to play around on it. I'll be stunned if it lasts an hour.
Bob Aman
+32  A: 

Security considerations aside, so that you can change your database structure without affecting your clients. Also, poorly formed queries tie up your server, not the clients.

Jeff Ober
Oh, good point. The abstraction layer to give you flexibility is huge.
Bob Aman
+2  A: 

the web server gives you a buffer that you can control. if there is some bug in your sql server or whatever, you don't want it exposed directly to the internet. true, if the web server has bugs, it might be just as bad ... except you have that extra layer between the data and the world.

-don

Don Dickinson
+3  A: 

Portability too. Lets say for licensing reasons and scaling you make the business decision to move from MSSQL to MySql. Syntax ain't quite the same and your clients will all have to change their code.

Much better to just buffer it all off and keep the implementation abstracted away. Whose to say you're not persisting the state of the application using trained monkeys scratching marks on bottletops?

Quibblesome
+1  A: 

It's not as much a 'why not' than a 'why should you' question. Handling HTTP requests is a small penalty for complete control over what all data you allow or disallow a user from accessing. Further, should the nature / quantity / security level of data change in future, you will be better off with a JSON / XML response than allowing total access.

Crimson
+11  A: 

An API:

  • Makes it easier to montior and control usage (implementing 'limited queries per X' for DB users may be harder)
  • Allows for presenting simpler structures to the user than may be used in the DB.
  • Means the user doesn't have to understand your DB structure.
  • Allows for DB portability. (Oh you've grown massive and now need to implement: sharding, move to bigtable, etc. - With an API the user doesn't need to know)
  • Allows for different (better? / variable?) caching of requests.
  • Means you don't have to pay for extra DB users (If that's how the DB is licensed.)
Matt Lacey
+10  A: 

Coding to Contract - with APIs, you may change everything behind them without affecting outsiders use of them. Here you'd be tying them to not just MySQL but your schema

Caching - Allowing them any query almost removes any opportunity for caching that predictable queries over http that can be used. This is probably the number one way to remove the often number one bottleneck, the database.

Security - with this approach, it would be easy for a denial of service attack, even by accident. Not to mention the fact you'd have to give access to data layer, which is often put in a restricted zone where security can be tightened

Usability - not everyone is a developer or wants to understand a your internal domain. They probably prefer a pre baked straight forward and self-explaining API. An extreme example would be to give managers db privileges rather than reports.

dove
A: 

API is a kind of Wrapper around of database. Users do not know anything about database internal representation of data, he only need to send a number of unified requests and get unified response on it. How and when data will be processed on the server - it's not his headache.

Victor Vostrikov
+3  A: 

Security is the number 1 reason but I hope those reasons are obvious. The user tying up precious resources with bad queries is another good reason.

Beyond that though, why an abstraction layer?

  • Might you ever want to add some logging to database queries to diagnose speed or to help debug?
  • Might you ever go from MySQL to MS SQL or vice versa where SQL other than pure ANSI might break?
  • Should the customer really have to learn your schema rather than a more logical abstraction?
  • When a new programmer learns of normalization and can now see your whole schema including your carefully balanced denormalizations, do you want to put up with every uninformed criticism?
  • When a more experienced db person points out improvements, do you want to be stuck with your old schema?

Why to use an API is a question of why to use abstractions and my list here barely scratches the surface.

Dinah
+1  A: 

The thing to bear in mind when you're thinking of security issues is that it's really hard to anticipate all of the possible vectors that someone could use to attack you. For instance, are you really sure you've gotten your database permissions set so that people can't mess things up?

Therefore, you want to try restricting actions to only what you know to be good, not just trying to restrict the things you know to be bad. This can be done with a web service that you have absolute control over, but it's difficult to allow somebody to access the database directly and be sure that you're secure.

Jason Baker
And are you really certain your RDBMS is bug-free and that *no* user-elevation bugs exist in the SQL parser? If there is such a bug, it's fairly disasterous ("ooo lookie: I can do DROP DATABASE master!"). A simple well designed API can give an extra barrier (still possible if you bugger up or badly design your API though).
Chris J