views:

903

answers:

8

We have developers who know Ruby, Python, .NET, and Java. We are developing a web application that will mainly handle XML documents. Most of the work is converting predefined XML files into database tables, mapping between XML documents through the database, creating reports from the database, and so on. Which language will be the easiest and fastest to work with?

+4  A: 

In .NET, C# 3.0 and VB9 provide excellent support for working with XML using LINQ to XML:

LINQ to XML Overview

cxfx
+2  A: 

Either C# or VB.NET, using LINQ to XML. LINQ to XML is very powerful and easy to implement.

FailBoy
+12  A: 

A dynamic language rules for this. Why? The mappings are easy to code and change. You don't have to recompile and rebuild.

Indeed, with a little cleverness, you can have your "XML XPATH to a Tag -> DB table-field" mappings as disjoint blocks of Python code that your main application imports.

The block of Python code is your configuration file. It's not an .ini file or a .properties file that describes a configuration. It is the configuration.

We use Python, xml.etree and SQLAlchemy (to separate the SQL out of your programs) for this because we're up and running with very little effort and a great deal of flexibility.
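
A minimal sketch of the "code is the configuration" idea, separate from the files below. Everything in it is an assumption made for illustration: the `product` table, the `PRODUCT_MAPPING` block, the `row_from_element` and `load` helpers, and the sample XML. It also uses the standard-library sqlite3 module instead of SQLAlchemy, purely to stay self-contained.

```python
import sqlite3
import xml.etree.ElementTree as etree

# Hypothetical mapping block: column name -> (ElementTree path, converter).
# In practice this would live in its own module that the loader imports,
# so changing a mapping means editing (or swapping) that one module.
PRODUCT_MAPPING = {
    "sku": ("SKU", int),
    "item_name": ("ItemName", str),
    "pages": ("Pages", int),
}

def row_from_element(element, mapping):
    """Apply a mapping to one XML element and return a dict of column values.

    A missing numeric element makes the converter raise, which is usually
    what you want while a mapping is still being debugged.
    """
    return {
        column: convert(element.findtext(path))
        for column, (path, convert) in mapping.items()
    }

def load(xml_text, connection, table="product", record_path="Product",
         mapping=PRODUCT_MAPPING):
    """Insert one row per matching element and return the number of rows.

    The table and column names come from the mapping code, not from user
    input, so they are formatted into the SQL; the values are bound as
    parameters.
    """
    root = etree.fromstring(xml_text)
    columns = list(mapping)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), ", ".join("?" for _ in columns))
    rows = [row_from_element(e, mapping) for e in root.iter(record_path)]
    connection.executemany(sql, [tuple(r[c] for c in columns) for r in rows])
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (sku INTEGER, item_name TEXT, pages INTEGER)")
    sample = ("<Export><Product><SKU>403276</SKU><ItemName>Trivet</ItemName>"
              "<Pages>0</Pages></Product></Export>")
    load(sample, conn)
    print(conn.execute("SELECT * FROM product").fetchall())
```

Supporting a new XML layout then means writing another mapping block and passing it to `load`; the loading code itself does not change.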


source.py

"""A particular XML parser.  Formats change, so sometimes this changes, too."""

import logging
import xml.etree.ElementTree as etree

class SSXML_Source( object ):
    ns0= "urn:schemas-microsoft-com:office:spreadsheet"
    ns1= "urn:schemas-microsoft-com:office:excel"
    def __init__( self, aFileName, *sheets ):
        """Initialize an XML source.
        XXX - Create better sheet filtering here, in the constructor.
        @param aFileName: the file name.
        """
        # object.__init__ takes no arguments, so don't pass the file name up.
        super( SSXML_Source, self ).__init__()
        self.log= logging.getLogger( "source.PCIX_XLS" )
        self.dom= etree.parse( aFileName ).getroot()
    def sheets( self ):
        # iter() replaces getiterator(), which was removed in Python 3.9.
        for wb in self.dom.iter("{%s}Workbook" % ( self.ns0, ) ):
            for ws in wb.iter( "{%s}Worksheet" % ( self.ns0, ) ):
                yield ws
    def rows( self ):
        for s in self.sheets():
            print( s.attrib["{%s}Name" % ( self.ns0, ) ] )
            for t in s.iter( "{%s}Table" % ( self.ns0, ) ):
                for r in t.iter( "{%s}Row" % ( self.ns0, ) ):
                    # The XML may not be really useful.
                    # In some cases, you may have to convert to something useful
                    yield r

model.py

"""This is your target object.  
It's part of the problem domain; it rarely changes.
"""
class MyTargetObject( object ):
    def __init__( self ):
        self.someAttr= ""
        self.anotherAttr= ""
        self.this= 0
        self.that= 3.14159
    def aMethod( self ):
        """etc."""
        pass

builder_today.py One of many mapping configurations

"""One of many builders.  This changes all the time to fit
specific needs and situations.  The goal is to keep this
short and to-the-point so that it has the mapping and nothing
but the mapping.
"""

import model

class MyTargetBuilder( object ):
    def makeFromXML( self, element ):
        result= model.MyTargetObject()
        result.someAttr= element.findtext( "Some" )
        result.anotherAttr= element.findtext( "Another" )
        result.this= int( element.findtext( "This" ) )
        result.that= float( element.findtext( "that" ) )
        return result

loader.py

"""An application that maps from XML to the domain object
using a configurable "builder".
"""
import sys

import model
import source
import builder_1
import builder_2
import builder_today

# Configure this: pick a builder that is appropriate for the data:
b= builder_today.MyTargetBuilder()

s= source.SSXML_Source( sys.argv[1] )
for r in s.rows():
    data= b.makeFromXML( r )
    # ... persist data with a DB save or file write


To make changes, you can correct a builder or create a new builder. You adjust the loader source to identify which builder will be used. You can, without too much trouble, make the selection of builder a command-line parameter. Dynamic imports in dynamic languages seem like overkill to me, but they are handy.
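
A minimal sketch of choosing the builder from a command-line parameter with a dynamic import. The `load_builder` helper is hypothetical, and it assumes every builder module exposes a class named `MyTargetBuilder`, as builder_today.py does above.

```python
import importlib

def load_builder(module_name, class_name="MyTargetBuilder"):
    """Import a builder module by name and return an instance of its builder.

    Raises ImportError if the module does not exist and AttributeError if
    the module has no class with the expected name.
    """
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()

# In loader.py, the hard-coded builder line could then become something like:
#     b = load_builder(sys.argv[1])          # e.g. "builder_today"
#     s = source.SSXML_Source(sys.argv[2])
```

The trade-off is that a typo in the builder name only shows up at run time, which is one reason to keep the plain `import` version until you really need the switch.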

S.Lott
Explain in a bit more detail, please.
Varun Mahajan
I'd also like to throw in a shout-out for lxml, which provides Pythonic bindings for libxml2. In addition to building etrees, it provides more complete XPath support than ElementTree alone.
Aaron Maenpaa
+1. I have tried XML processing in C#, PHP, Java and Python, and the last wins. It's just easier and clearer. And there are already tools to do most of the mapping automatically.
e-satis
You begin, "A dynamic language rules for this. Why? The mappings are easy to code and change. You don't have to recompile and rebuild." You are confusing dynamic with interpreted, if you're using static in the usual PL sense.
ja
@ja: "confusing dynamic with interpreted"? You'll have to provide me some help on the differences between dynamic and interpreted. I didn't know there was a difference.
S.Lott
+7  A: 

XSLT

I suggest using XSLT templates to transform the XML into INSERT statements (or whatever you need), as required.
You should be able to invoke XSLT from any of the languages you mention.

This will result in a lot less code than doing it the long way round.

AJ
XSLT won't work, as we also have to save the mapping between files in the database.
Varun Mahajan
Can you give an example?
AJ
You can use an existing (or your own) extension function or extension element to write/update table rows in a database. For example, Saxon has the following seven types of extension elements: sql:connect, sql:query, sql:insert, sql:column, sql:update, sql:delete, and sql:close.
Dimitre Novatchev
+4  A: 

I'll toss in a suggestion for Hpricot, a popular Ruby XML parser (although there are many similar options).

Example:

Given the following XML:

<Export>
  <Product>
    <SKU>403276</SKU>
    <ItemName>Trivet</ItemName>
    <CollectionNo>0</CollectionNo>
    <Pages>0</Pages>
  </Product>
</Export>

You parse simply by:

require 'hpricot'

FIELDS = %w[SKU ItemName CollectionNo Pages]

doc = Hpricot.parse(File.read("my.xml")) 
(doc/:product).each do |xml_product|
  product = Product.new
  for field in FIELDS
    product[field] = (xml_product/field.intern).first.innerHTML
  end
  product.save
end

It sounds like your application would be a very good fit for Rails. You could quickly prototype what you need, you'd have direct interaction with your database of choice, and you could output the data however you need to.

Here's another great resource page for parsing XML with Hpricot that might help, as well as the documentation.

mwilliams
+2  A: 

For quick turnaround I've found Groovy very useful.

rich
It's good to point out that using Groovy means having access to Java's XML parsing libraries, in particular their XPath support, without a lot of the boilerplate that goes along with a Java implementation.
Aaron Maenpaa
+2  A: 

An interesting solution could be Ruby. Simply use an XML-to-object mapper and then an object-relational mapper (ORM) to put the data into a database. I gave a short talk on XML mapping with Ruby; you could look at the slides and see what you like best: http://www.marc-seeger.de/2008/11/25/ruby-xml-mapping/

As for the ORM: ActiveRecord or DataMapper should be the way to go.

Marc Seeger
A: 

ECMAScript handles XML pretty nicely using E4X ("ECMAScript for XML"). This can be seen in Adobe's latest version of ActionScript, version 3. I believe JavaScript 2 (to be released with Firefox 4, I think) will support E4X as well.

I'm not sure whether the standalone JavaScript interpreters (e.g. Rhino) support it, which is what matters most to you, I suppose. But if it looks good to you, you can always look up their support for it (and report back to us :-)).

See http://en.wikipedia.org/wiki/E4X#Example for a simple example.