One of my coworkers is going to build an API directly on top of the database. He wants to return XML for queries. Is this really a good idea?

We have to build an API for our partners and we are looking for a good architectural approach. We have a few million products, reviews and so on.

Some partners will take 1000 products, others would like to copy almost the whole database. We have to restrict access to some fields: for example, one partner will see productDescription, another only id and productName. Sometimes we would like to return only info about categories in the XML response; sometimes we would like to include 100 products for each category returned.

One of the programmers is pushing a solution based almost entirely on SQL Server 2005 FOR XML AUTO. He wants to build the query in the application, send it to the server and then return the XML to the partner, without any caching within the application.

Is it really a good idea?

A: 

It depends on who will consume this API - if it is going to be consumed from a wide range of different languages, then yes, it may make sense to expose the returned data in an XML format, as pretty much everything is able to parse XML.

If on the other hand the API is predominantly going to be consumed from only one language (e.g. C# / .NET), then you would be much better off writing the API in that language and directly exposing data in a format native to that language - exposing XML-based results will cause needless generation and subsequent parsing of XML.

Personally I would probably opt for a mixed approach - choose a suitable, commonly used language (for the customers of this API) to write the API in, and then on top of that expose an extra XML-based API if it turns out it's needed.

Kragen
+1  A: 

I have used this technique for a particular web application. I have mixed feelings about this approach.

One pro is that this is really convenient for simple requirements. Another pro is that it is really easy to translate a database schema change into a change in the XML format, since everything is in one place.

I found there are cons too. When your target XML gets more complex and has more nested structures, this solution can rapidly get out of hand. Consider for example this (taken from http://msdn.microsoft.com/en-us/library/ms345137(SQL.90).aspx#forxml2k5_topic5):

SELECT CustomerID as "CustomerID",
  (SELECT OrderID as "OrderID"
   FROM Orders "Order"
   WHERE "Order".CustomerID = Customer.CustomerID
   FOR XML AUTO, TYPE),
  (SELECT DISTINCT LastName as "LastName"
   FROM Employees Employee
   JOIN Orders "Order" ON "Order".EmployeeID = Employee.EmployeeID
   WHERE Customer.CustomerID = "Order".CustomerID
   FOR XML AUTO, TYPE)
FROM Customers Customer
FOR XML AUTO, TYPE

So essentially, you see that you begin writing SQL to mirror the structure of the XML output. And if you think about it, this is a bad thing - you're mixing the data retrieval logic with presentation logic - the fact that the presentation is in this case representation in a data exchange format, really does not change the fact that you're mixing two different things, making both of them harder.

For example, it is quite possible that the requirements for the exact structure of the XML change over time, whereas the actual associated data requirements remain unchanged. Then you would be rewriting queries even though there is nothing wrong with the actual dataset you are already retrieving. That's a code smell if you ask me.

Another consideration is performance/query tuning. I cannot say I have done much benchmarking of these types of queries, but I would typically avoid correlated subqueries like this whenever I can...and now, just because of this syntactic sugar, I would suddenly throw that overboard because of the convenience of generating XML with no intermediary language? I don't think it's a good idea.

So in short, I would use this technique if I could considerably simplify things. But if I could anticipate that I would need an intermediary language anyway to generate all the XML structures I need, I would choose to not use this technique at all. If you are going to generate XML, do it all in one place, don't put some in the query and some in your program because it will become a nightmare to manage change and maintain it.
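
To illustrate what "do it all in one place" could look like, here is a minimal Python sketch, assuming the application fetches two plain, flat result sets (customer/order pairs and customer/employee pairs) and builds the nesting itself. The function name, the row shapes and the element names are hypothetical and only mirror the example query above.

```python
import xml.etree.ElementTree as ET

def customers_to_xml(order_rows, employee_rows):
    """Build nested customer XML from two flat result sets.

    order_rows:    iterable of (customer_id, order_id)
    employee_rows: iterable of (customer_id, last_name)
    Both shapes are assumptions made for this sketch.
    """
    root = ET.Element("Customers")
    by_id = {}  # customer_id -> <Customer> element, created on first sight

    def customer(cid):
        if cid not in by_id:
            by_id[cid] = ET.SubElement(root, "Customer", CustomerID=str(cid))
        return by_id[cid]

    # The XML shape is decided here, not in the SQL text.
    for cid, order_id in order_rows:
        ET.SubElement(customer(cid), "Order", OrderID=str(order_id))
    for cid, last_name in employee_rows:
        ET.SubElement(customer(cid), "Employee", LastName=last_name)

    return ET.tostring(root, encoding="unicode")
```

With this split, a change to the XML structure only touches this function, while the queries keep returning the same rows.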

Roland Bouman
A: 

It's a bad idea. SQL Server can return data only through the TDS protocol, meaning it can only be a result set (rows). Returning XML means you still return a rowset, but a rowset of SQL data formatted as XML. Ultimately, you still need a TDS protocol client (i.e. SqlClient, OleDB, ODBC, JDBC etc.). You still need to deal with rows, columns and T-SQL errors. You are returning a column containing XML data, not an XML response.

So if your client has to be a database client, what advantage does XML give? Other than that you lose all result schema metadata in the process...

In addition, consider that stored procedures are an access API for everything, including SSIS tasks, maintenance and ETL jobs, other applications deployed on the database etc. Having everything presented to that layer as XML will be a mess. Two stored procedures from related apps, both in the same instance, exchanging calls via FOR XML and then XPath/XQuery? Why? Keep in mind, your database will outlive every application you have in mind today.

I understand XML as a good format for exchange of data between web services. But not between the database and the client. So the answer is that your partners should see XML, but from your web service layer, not from your database.

Remus Rusanu
A: 

In general, I see nothing wrong with exposing information stored in a RDBMS via a "read only" API that enforces access restrictions based on user privilege. Since your application is building the queries you can expose whatever the appropriate names are for your nouns (tables) and attributes (columns) in the user-facing API.
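
A minimal Python sketch of that idea, assuming a per-partner whitelist of columns kept in the application. The partner names, the Product table and the function name are hypothetical; only id, productName and productDescription come from the question.

```python
# Hypothetical whitelist: which columns each partner may see.
PARTNER_FIELDS = {
    "partner_a": ("id", "productName", "productDescription"),
    "partner_b": ("id", "productName"),
}

def build_product_query(partner, requested=None):
    """Return a parameterized SELECT limited to the partner's allowed columns.

    Column names are taken only from the whitelist, never from raw input,
    so the generated SQL text cannot contain partner-supplied identifiers.
    """
    allowed = PARTNER_FIELDS[partner]  # KeyError for an unknown partner
    if requested is None:
        columns = allowed
    else:
        # Silently drop anything the partner is not entitled to.
        columns = tuple(f for f in requested if f in allowed)
    if not columns:
        raise PermissionError("no permitted fields requested")
    return "SELECT " + ", ".join(columns) + " FROM Product WHERE id = ?"
```

The product id is left as a `?` placeholder so the value is bound by the database driver rather than concatenated into the query.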

Most DBs can cache queries (and although I don't know SQL Server at all I imagine it can do this), and the main advantage of not caching "downstream" in the application is simplicity - the data returned for each API call will be up to date, without any of the complexity of having to figure out when to refresh a "downstream" cache. You can always add caching later - when you're sure that everything works properly and you actually need the performance boost.
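
If caching does turn out to be needed later, it can often be bolted on as a thin wrapper. Below is a minimal Python sketch of a time-based cache, assuming a hypothetical `fetch(key)` function that runs the query and renders the response; it deliberately ignores eviction and thread safety.

```python
import time

def ttl_cached(fetch, ttl_seconds, clock=time.monotonic):
    """Wrap fetch(key) so results are reused for ttl_seconds.

    `fetch` and `clock` are injected, which keeps the wrapper easy to test.
    """
    cache = {}  # key -> (time stored, value)

    def wrapper(key):
        now = clock()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]          # still fresh: skip the database
        value = fetch(key)         # missing or stale: query again
        cache[key] = (now, value)
        return value

    return wrapper
```

The trade-off is the one described above: responses may be up to `ttl_seconds` out of date.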

As for keeping the queries and the XML in sync - if you're simply dumping data records which are generated from a single table then there's not much of an issue, here. It is true that as you start combining information from multiple tables on the back end it may become harder to generate data records with a single query, but fixing this with intermediate data structures in the web server application is an approach that (typically) scales poorly as the tables grow - you're often better off putting the data you need to expose in a single query/API call into a database "view".

If your XML is designed in such a way that you have to load all the data into memory and calculate statistics (to be displayed in XML element attributes, for example) before rendering begins, then you'll have scalability problems no matter what you do. So try to avoid designing your XML this way from the beginning.

Note also that XML can often be "normalized" (just as DB tables should be), using internal links and "GLOSSARY" elements that factor out repeated information. This reduces the size of XML generated, and by rendering a GLOSSARY element at the end of the XML document (perhaps with information extracted from a subsequent SQL query) you can avoid having to hold lots of data in web server memory while serving the API call.
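
A minimal Python sketch of this layout, assuming products that refer to their category by id and a GLOSSARY element emitted at the end. The element and attribute names (Catalog, Product, categoryRef) and the in-memory `category_names` lookup are hypothetical; in practice the glossary data could come from a second query.

```python
from xml.sax.saxutils import escape, quoteattr

def render_catalog(products, category_names):
    """Yield XML chunks so the whole document never sits in memory.

    products:       iterable of (product_id, name, category_id)
    category_names: dict mapping category_id -> category name
    """
    used = set()  # only the ids are remembered, not the product rows
    yield "<Catalog>"
    for pid, name, cat_id in products:
        used.add(cat_id)
        yield "<Product id=%s categoryRef=%s>%s</Product>" % (
            quoteattr(str(pid)), quoteattr(str(cat_id)), escape(name))
    # Repeated category details are factored out and written once, at the end.
    yield "<GLOSSARY>"
    for cat_id in sorted(used):
        yield "<Category id=%s>%s</Category>" % (
            quoteattr(str(cat_id)), escape(category_names[cat_id]))
    yield "</GLOSSARY>"
    yield "</Catalog>"
```

Each chunk can be written to the response as it is produced, for example with `"".join(render_catalog(rows, names))` in a test or by streaming the generator from the web framework.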

Peter
Peter, look at the example query in my answer: one master, two (or more) unrelated details. To the best of my knowledge there is no way to write an efficient query for that. If you can, enlighten me.
Roland Bouman