I need to handle a lot of incoming XMLs which may be rather complex. A typical situation is the following:

<SomeNode>
  <Request>
    <Id>1</Id>
    <!-- Request specific stuff -->
  </Request>
  <Request>
    <Id>2</Id>
    <!-- Request specific stuff -->
  </Request>
  <Response>
    <Id>1</Id>
    <!-- Feedback on request no. 1 -->
  </Response>
  <Response>
    <Id>2</Id>
    <!-- Feedback on request no. 2 -->
  </Response>
</SomeNode>

Note that SomeNode doesn't have to be a top node. I have to match these requests with requests already stored in my database, i.e. if a request in an incoming XML doesn't match a record in the db, I need to take action. Usually I will ask the user to manually match the parts of the XML that aren't recognized, and re-process the XML according to these manual rules. Every outcome (both failure and success) should be logged accordingly, preferably with some level of detail.

Finally, it is worth pointing out that there are many different types of XML coming in to my system - hard coding the processing logic is probably not what I want. Re-compiling and shipping a new executable just to handle a new kind of message is too cumbersome. And of course: time is money. Implementing support for new kinds of XML should be as fast and as reliable as possible.

At the moment, I'm more interested in technology than specific implementations. Is XQuery a good place to start? Or is this perhaps overkill? Will XPath 1.0 get us all the way, or do we have to use 2.0? Perhaps we don't need any sophisticated processing at all, so that basic XML parsing may suffice? What do you guys think?

I'm sorry for the long post, but we all know the GIGO principle, don't we? :)

+3  A: 

I see three parts to your problem:

  • you must first find a way to quickly and easily get the "identifying" information from the XML
  • you must then be able to check your database
  • and if it's not already present, you need to "handle" your XML somehow

Now for the first piece, you'll probably just need a clever XPath expression - something like //SomeNode/Response/Id in your example here - to define how to read the "ID" - whatever that might be. Store this XPath expression in a config, so you can change it "on the fly" without recompiling.
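To illustrate the config-driven idea, here is a minimal sketch in Python (not necessarily the language you are using). The `ID_XPATHS` mapping, the `some_node` key, the `Envelope` wrapper and the `extract_ids` helper are all made up for this example; in practice the mapping would be loaded from a config file. It reads the Request ids, since those are what you match against the database. Note that Python's standard library only supports a subset of XPath 1.0, which is enough for a path like this one.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: message type -> XPath that locates the identifying value.
# In a real system this would be read from a file or a database table.
ID_XPATHS = {
    "some_node": ".//Request/Id",
}

# SomeNode is deliberately not the top node here; Envelope is an invented wrapper.
SAMPLE = """
<Envelope>
  <SomeNode>
    <Request><Id>1</Id></Request>
    <Request><Id>2</Id></Request>
    <Response><Id>1</Id></Response>
  </SomeNode>
</Envelope>
"""

def extract_ids(xml_text, message_type):
    """Return the identifying values found by the configured XPath."""
    root = ET.fromstring(xml_text)
    xpath = ID_XPATHS[message_type]
    return [el.text.strip() for el in root.findall(xpath) if el.text]

print(extract_ids(SAMPLE, "some_node"))
```

Supporting a new kind of message then means adding an entry to the mapping rather than changing code.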

The second part is checking for existence - take the value retrieved by step no. 1 and check your database - you didn't provide any details here and this is not XML-related, so I guess this should be fairly simple to do.

The third step is handling the XML and again, you're not awfully explicit about what that involves. You will most likely need another XPath to select the node to handle from the original XML, and then do whatever it takes to "handle" this XML.
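As a rough sketch of that second XPath, the snippet below selects the whole Request node for a given id so it can be passed on to whatever "handling" means in your case. It is written in Python with the standard library; the `Payload` element and the `select_request` helper are invented for illustration, and the helper assumes ids never contain quote characters.

```python
import xml.etree.ElementTree as ET

# Payload is a hypothetical stand-in for the "request specific stuff".
SAMPLE = """<SomeNode>
  <Request><Id>1</Id><Payload>first</Payload></Request>
  <Request><Id>2</Id><Payload>second</Payload></Request>
</SomeNode>"""

def select_request(root, request_id):
    """Return the Request element whose Id child equals request_id, or None."""
    # Assumes request_id contains no quote characters.
    return root.find(f".//Request[Id='{request_id}']")

root = ET.fromstring(SAMPLE)
node = select_request(root, "2")
print(ET.tostring(node, encoding="unicode"))
```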

What you could do in this case is create an abstract base class that contains this logic - just the stubs of the methods to call - and thus defines the order in which the steps run.

For each kind of XML that you need to handle, create a concrete descendant class that actually implements these three steps for that particular message type.

That way, you can capture common problems and common tasks in the base class, and handle problem-specific logic in your descendant class.
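A minimal sketch of that structure, again in Python and under stated assumptions: the class names `XmlHandler` and `SomeNodeHandler`, the in-memory `known_ids` set standing in for the database, and the `pending` list standing in for "ask the user" are all hypothetical. The base class fixes the order of the three steps and does the logging; the descendant only supplies the message-specific parts.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET

class XmlHandler(ABC):
    """Base class: defines the order of the steps and the common logging."""

    def __init__(self, known_ids):
        self.known_ids = set(known_ids)  # stand-in for the database lookup
        self.log = []                    # stand-in for a real logger

    def process(self, xml_text):
        root = ET.fromstring(xml_text)
        unmatched = []
        for ident in self.extract_ids(root):           # step 1: read the ids
            if self.exists(ident):                     # step 2: check the db
                self.log.append(("matched", ident))
            else:
                self.log.append(("unmatched", ident))
                self.handle_unmatched(root, ident)     # step 3: handle it
                unmatched.append(ident)
        return unmatched

    def exists(self, ident):
        return ident in self.known_ids

    @abstractmethod
    def extract_ids(self, root):
        """Return the identifying values for this message type."""

    @abstractmethod
    def handle_unmatched(self, root, ident):
        """Deal with an id that has no matching record."""

class SomeNodeHandler(XmlHandler):
    """Concrete handler for the SomeNode message from the question."""

    def __init__(self, known_ids):
        super().__init__(known_ids)
        self.pending = []  # ids waiting for manual matching by the user

    def extract_ids(self, root):
        return [el.text for el in root.findall(".//Request/Id")]

    def handle_unmatched(self, root, ident):
        self.pending.append(ident)

SAMPLE = "<SomeNode><Request><Id>1</Id></Request><Request><Id>2</Id></Request></SomeNode>"
handler = SomeNodeHandler(known_ids={"1"})
print(handler.process(SAMPLE))
print(handler.log)
```

Combined with XPath expressions kept in a config, a descendant class would only be needed when a message type requires logic that a path expression alone cannot express.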

Marc

marc_s
Thanks marc_s! I'm pretty confident when it comes to the db side of things, that's why I didn't bring it up. And I didn't want to go into specifics regarding handling the XML, because people tend to get caught up in the examples. I've been thinking along those same lines: the abstract class does all the general stuff, and the actual implementations take care of the type-specific details. Just out of curiosity: how far can one go with XPath 1.0? Do I have to consider 2.0?
conciliator