Currently we are saving all SOAP requests to a log file (rolling, one per day). When I get a request about who did what, when, how often, etc., I have to resort to heavy-duty text manipulation to extract that information. My idea was to save those SOAP requests to a database where I could run some queries.

A SOAP request consists of an envelope containing a header and a body. Within the header is the username of whoever is invoking the action. Within the body, the first element is the WS method name, followed by all the parameters to the method. Each WS method obviously has different parameters:

  • SOAP request
    • Envelope
      • Header
        • Username = XXX
      • Body
        • WS name
          • WS Parameters (different between methods, hierarchical)

I'd like to construct queries like these without too much difficulty (rough SQL sketches below):

  • Which WS methods user XXX has invoked within a certain period
  • How often each WS method was called
  • Show me all requests made to WS XXX where a particular parameter to the WS was YYY (e.g. every call to "changeParty" where partyId = 123)
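
Just for illustration, in SQL terms I mean something like the following (the soap_audit table and its columns are made up; they just stand for whatever fields end up being extracted from the requests):

  -- which WS methods user XXX invoked within a certain period
  select distinct ws_method
  from soap_audit
  where username = 'XXX'
    and request_date between '2010-08-01' and '2010-09-01';

  -- how often each WS method was called
  select ws_method, count(*) as calls
  from soap_audit
  group by ws_method;

  -- every call to "changeParty" where partyId = 123
  select *
  from soap_audit
  where ws_method = 'changeParty'
    and party_id = 123;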

Is this something that would be ideally suited to a "Document store type of database" / NoSQL, such as CouchDB/MongoDB? Would this be hard to set up? Any alternative suggestions / recommendations / thoughts?

Thanks!

EDIT:

  • In most cases I think I'd be fine with an index on user, WS method and requestDate (roughly the index sketched after this list). Any other field would probably not be indexed (although I would do some filtering on them). As it is for auditing purposes only, it wouldn't have to be lightning fast.
  • I was thinking about putting a NoSQL server on the monster app servers that we have and giving the "real" underlying database (DB2) a break from any kind of logging, as it is in pretty heavy use.
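
A minimal sketch of that index, against the same made-up soap_audit table as above:

  create index soap_audit_ix
    on soap_audit (username, ws_method, request_date);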
+1  A: 

Well, specifically with regard to Couch and Mongo, the first issue you will have to deal with is converting the SOAP XML payload into some relevant JSON payload, since that's what these DBs understand.

Next, most of these DBs require you to have a good understanding of what queries you want to make from them up front, during initial design, and ideally before you start loading data in to them. Adding new indexes can be expensive for existing data sets, particularly large ones.

It's all very doable, but you want to get out of your head any concept of an "ad hoc" query like in an RDBMS. They simply don't typically perform well in NoSQL systems.

Finally, since you have to convert the data from XML to JSON anyway, you may well be better off extracting the "interesting fields" from your SOAP requests (the ones you want to query on), populating your own index fields and rows, and storing the data in your (I assume) already existing SQL database.
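
A rough sketch of that approach (the table and column names are placeholders; adjust the types for your RDBMS):

  -- one row per logged SOAP request, with the interesting fields pulled out
  create table soap_audit (
    username     varchar(100),
    ws_method    varchar(200),
    request_date timestamp,
    raw_request  clob          -- the original XML, kept for reference
    -- ... plus columns for any parameters you want to filter on (party_id, ...)
  );

  -- your logging code extracts the fields and does a plain insert
  insert into soap_audit (username, ws_method, request_date, raw_request)
  values ('Bob', 'helloWorld', '2010-09-01 12:23:45', '...escaped XML goes here...');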

Edit for comment:

It's not transparent to you, the user, because the burden of converting to JSON falls on you: both of the DBs mentioned require a JSON document to be passed to them.

That said, you could simply extract the few bits you care about rather than marshal the entire document. In that case, though, you would have to extract everything you're interested in, including the fields you only want to filter on.

So, you can create a document:

{ "user":"Bob",
  "methodName" : "helloWorld",
  "soapDate" : "2010-09-01 12:23:45",
  "soapRequest" : "...escaped XML goes here..."
}

Then you could build views or indexes off of anything exposed within the JSON document. You won't be able to (easily) query anything from within the raw SOAP request; only elements you specifically extract and put into the JSON payload will be readily accessible through the DB.

My basic point, though, is that if you have to go through a marshalling process to convert your data for the database anyway, then you may as well use a database you already know and that is supported at your company (i.e. likely your existing RDBMS), rather than introduce another brick into your stack by adding a new DB.

If your data were already JSON, or if, say, you found a NoSQL-style DB that supported XML natively rather than JSON, then that might be worthwhile, since you could skip that part of the process.

But you can easily create a table (or tables) to handle this data in most any modern RDBMS.

Will Hartung
I guess the XML to JSON conversion could be done transparently? In most cases I think I'd be fine with an index on user, WS method name and soap-requestDate. Any other field would probably not be indexed (although I would do some filtering on them). As it is for auditing purposes only, it wouldn't have to be lightning fast, only reasonably fast.
AtliB
A: 

Why NoSQL? If you used Postgres, for example, you could dump the XML payload into an xml field and use a simple insert trigger to extract the data chunks you want straight into their own columns. You could have a table like this:

create table soaps (
  payload        xml,   -- the full SOAP request
  username       text,  -- "user" is a reserved word, hence renamed
  method_invoked text
  -- ... etc ...
);
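
The insert trigger could look something like this (only a sketch, in Postgres syntax; the XPath is illustrative and ignores SOAP namespaces):

  -- pull fields out of the XML payload as each row is inserted
  create or replace function soaps_extract() returns trigger as $$
  begin
    new.username := (xpath('//*[local-name()="Username"]/text()', new.payload))[1]::text;
    return new;
  end;
  $$ language plpgsql;

  create trigger soaps_extract_trg
    before insert on soaps
    for each row execute procedure soaps_extract();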

I'm sure other SQL databases could do that as well.

The only reason I can see to use a NoSQL solution is if you've got enormous volumes of data. Either way, you'll probably need some data lifecycle management tools to archive old data.

Robin
I'm actually using an old version of DB2 on AS/400 which I don't think supports XML natively. Besides, I didn't want to put additional load and storage space requirements on the database machine.
AtliB
+1  A: 

As @Robin said, there's no good reason to use a NoSQL database here. I have recently done this sort of thing with SQL Server, since it supports XML as a native data type. It is possible to just store the SOAP requests and responses in the database as columns of type XML. One can then query the database using XQuery to find any set of requests and responses you might need.
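
A rough sketch of what that can look like (assuming SQL Server 2005 or later; the table, column names and XPaths are just illustrative):

  create table SoapLog (
      LoggedAt datetime not null default getdate(),
      Request  xml not null
  );

  -- every changeParty request where partyId = 123, with the caller pulled out of the header
  select LoggedAt,
         Request.value('(//*[local-name()="Username"])[1]', 'varchar(100)') as Username
  from SoapLog
  where Request.exist('//*[local-name()="changeParty"]/*[local-name()="partyId"][text()="123"]') = 1;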

John Saunders
+1, hadn't seen this answer when I posted the same thing as part of my answer.
eglasius
A: 

If you have that data loaded into object instances in .NET or Java: db4o. Save the top-level instance and define some indexes.

If you have it in XML, I'd use something that supports that directly. I think SQL Server Express does.

eglasius