tags:

views:

373

answers:

5

Hi,

I need to implement a Routing Table where there are a number of paramters.

For eg, i am stating five attributes in the incoming message below

Customer Txn Group Txn Type Sender Priority  Target
UTI       CORP     ONEOFF   ABC    LOW       TRG1
UTI       GOV      ONEOFF   ABC    LOW       TRG2

What is the best way to represent this data in XML so that it can be queried efficiently.

I want to store this data in XML and using Java i would load this up in memory and when a message comes in i want to identify the target based on the attributes.

Appreciate any inputs.

Thanks, Manglu

A: 

That depends on what is repeating and what could be empty. XML is not known for its efficient queryability, as it is neither fixed-length nor compact.

Stephan Eggermont
+1  A: 

If you're loading it into memory, it doesn't really matter what form the XML takes - make it the easiest to read or write by hand, I would suggest. When you load it into memory, then you should transform it into an appropriate data structure. (The exact nature of the data structure would depend on the exact nature of the requirements.)

EDIT: This is to counter the arguments made in comments by Dimitre:

I'm not sure whether you thought I was suggesting that people implement their own hashtable - I certainly wasn't. Just keep a straight hashtable or perhaps a MultiMap for each column which you want to use as a key. Developers know how to use hashtables.

As for the runtime efficiency, which do you think is going to be more efficient:

  • You build some XSLT (and bear in mind this is foreign territory, at least relatively speaking, for most developers)
  • XSLT engine parses it. This step may be avoidable if you're using an XSLT library which lets you just parameterise an existing query. Even so, you've got some extra work to do.
  • XSLT engine hits hashtables (you hope, at least) and returns a node
  • You convert the node into a more useful data structure

Or:

  • You look up appropriate entries in your hashtable based on the keys you've been given, getting straight to a useful data structure

I think I'd trust the second one, personally. Using XSLT here feels like using a screwdriver to bash in a nail...

Jon Skeet
It is not a "must" or even a "should" to transform the data into another data structure. Sometimes the time for such transformations (to and from) is bigger than the time for processing. Often an XML representation expresses well the data and the relationships and is best kept as is.
Dimitre Novatchev
@Jon Skeet It is not slower than loading the data into a hashtable, because a hashtable is the typical implementation for xslt keys. And this is much higher level, the XSLT processor does it for you and you simply cannot make any error that you could typically make working with the HT yourself.
Dimitre Novatchev
As being "a lot harder to work with for the vast majority of programmers" it's to the contrary: no chance of mistakes with your own hashtable, no time spent on designing, coding and debugging the additional data structures. You get the solution correct and faster and have more time to optimize it.
Dimitre Novatchev
@Jon Skeet Let's separate *feelings* from facts even if the former belong to Jon Skeet :) It does not matter whether I think it is faster (I never said that, only that it's not slower) or whether you think it's slower. What matters is the *fact* how fast/slow it is
Dimitre Novatchev
Well, I'm claiming that the XSLT version simply has to do more work - it has to do all the same things as a version which uses a hashtable directly, *and* it has to do XSLT parsing etc. But I agree, let's benchmark. You write up the framework and I'll port it to my version.
Jon Skeet
Then we can also take a poll on what proportion of developers find the "direct" version easier to understand than the XSLT version, seeing as a core part of my argument is that XSLT is relatively poorly understood by most Java programmers, whereas virtually all Java devs know hashtables.
Jon Skeet
(I would add that Java's relatively nasty built-in XML APIs and lack of LINQ don't help. A C#/LINQ to XML solution would probably be simpler still.)
Jon Skeet
@Jon Skeet The possibility for error is not great in table.get(key) but it is very real in the initial population of the hashtable -- something that XSLT does for you correctly and does not require you to do manually
Dimitre Novatchev
1) Implement immutable type with appropriate fields, equals and hashcode. All standard things for a Java dev to do. Lots of help available, Effective Java etc. 2) Transform each element from XML to that type. We'll have to know how to do that anyway to get useful results from XSLT. 3) Put in table.
Jon Skeet
But I look forward to seeing your implementation for benchmarking. I've asked the original questioner for more details of what we need to be able to query on.
Jon Skeet
A: 

I agree with the previous two posters - you should definitely not keep the internal representation of this data in XML when querying as messages come in.

The XML representation can be anything, you could do something like this:

<routes>
  <route customer="UTI" txn-group="CORP" txn-type="ONEOFF" .../>
  ...
  </routes>

My internal representation would depend on the format of the message coming in, and the language. A simple representation would be a map, mapping a structure of data (i.e. the key fields from which the routing decision is made) to the info on the target route.

Depending on your performance requirements, you could keep the key/target information as strings, though in any high performing system you'd probably want to do a straight memory comparison (in C/C++) or some form integer comparison.

Jamie Love
A: 

Yeah, your basic problem is that you're using "XML" and "efficient" in the same sentence.

Edit: No, seriously, yer killin' me. The fact that several people in this thread are using "highly efficient" to describe anything to do with operations on a data format that require string parsing just to find out where your fields are shows that several people in this thread do not even know what the word "efficient" means. Downvote me as much as you like for saying it. I can take it, coach.

chaos
To be fair to XML here, it's not like we have to do string parsing on every query - just to start with. That's unless we need to parse an XSLT template on each query... I would assume Dimitre is proposing a parameterised template though.
Jon Skeet
A: 

Here is a pure XML representation that can be processed very efficiently as is, without the need to be converted into any other internal data structure:

<table>
 <record Customer="UTI" Txn-Group="CORP" 
      Txn-Type="ONEOFF" Sender="ABC1" 
      Priority="LOW"  Target="TRG1"/>

 <record Customer="UTI" Txn-Group="Gov" 
      Txn-Type="ONEOFF" Sender="ABC2" 
      Priority="LOW"  Target="TRG2"/>


</table>

There is an extremely efficient way to query data in this format using the <xsl:key> instruction and the XSLT key() function:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
 <xsl:output omit-xml-declaration="yes"/>

 <xsl:key name="kRec" match="record"
  use="concat(@Customer,'+',@Sender)"/>

    <xsl:template match="/">
      <xsl:copy-of select="key('kRec', 'UTI+ABC2')"/>
    </xsl:template>
</xsl:stylesheet>

when applied on the above XML document produces the desired result:

<record Customer="UTI" 
        Txn-Group="Gov" Txn-Type="ONEOFF" 
        Sender="ABC2" Priority="LOW" 
        Target="TRG2"/>

Do note the following:

  1. There can be multiple <xsl:key>s defined that identify a record using different combinations of values to be concatenated together (whatever will be considered "keys" and/or "primary keys").

  2. If an <xsl:key> is defined to use the concatenation of "primary keys" then a unique record (or no record) will be found when the key() function is evaluated.

  3. If an <xsl:key> is defined to use the concatenation of "non-primary keys", then more than one record may be found when the key() function is evaluated.

  4. The <xsl:key> instruction is the equivalent of defining an index in a database. This makes using the key() function extremely efficient.

  5. In many cases it is not necessary to convert the above XML form to an intermediary data structure, due neither to reasons of understandability nor of efficiency.

Dimitre Novatchev
Wow, XSLT and extremely efficient in one sentence. That's an interesting value of efficient you're using there.
Stephan Eggermont
Glad you learned something today :) I am solving ProjectEuler problems and often my XSLT solutions are faster than some Java, C#, Python,Ruby,even C/C++ solutions. Certainly, it is the algorithm that matters more... As for *quering XML*, XQuery or XSLT are not less efficient than DBMS rel. data
Dimitre Novatchev
You think this is going to be quicker than loading the data into a hashtable to start with? I very, very much doubt that. Further more, I'd argue it's a lot harder to work with for the vast majority of programmers.
Jon Skeet
@Jon Skeet It is not slower than loading the data into a hashtable, because a hashtable is the typical implementation for xslt keys. And this is much higher level, the XSLT processor does it for you and you simply cannot make any error that you could typically make working with the HT yourself.
Dimitre Novatchev
No, because of course calling table.get(key) is *so* much more error prone than building XSLT... I'm sorry, but I just *really* feel XSLT is an inappropriate technology to use here. It might be faster for *you* to code that way, but I suspect that would be the case for very, *very* few other devs.
Jon Skeet
@Jon Skeet Let's separate *feelings* from facts even if the former belong to Jon Skeet :) It does not matter whether I think it is faster (I never said that, only that it's not slower) or whether you think it's slower. What matters is the *fact* how fast/slow it is.
Dimitre Novatchev
Given the rest of the responses here, I think the onus is on you to show that it's fast - both in terms of development time (for the *average* Java dev, not an XSLT expert) and execution time efficiency.
Jon Skeet
@Jon Skeet The possibility for error is not great in table.get(key) but it is very real in the initial population of the hashtable -- something that XSLT does for you correctly and does not require you to do manually.
Dimitre Novatchev