We have developers with knowledge of these languages - Ruby , Python, .Net or Java. We are developing an application which will mainly handle XML documents. Most of the work is to convert predefined XML files into database tables, providing mapping between XML documents through database, creating reports from database etc. Which language will be the easiest and fastest to work with? (It is a web-app)
In .NET, C# 3.0 and VB9 provide excellent support for working with XML using LINQ to XML:
either C# or VB.Net using LiNQ to XML. LiNQ to XML is very very powerful and easy to implement
A dynamic language rules for this. Why? The mappings are easy to code and change. You don't have to recompile and rebuild.
Indeed, with a little cleverness, you can have your "XML XPATH to a Tag -> DB table-field" mappings as disjoint blocks of Python code that your main application imports.
The block of Python code is your configuration file. It's not an .ini
file or a .properties
file that describes a configuration. It is the configuration.
We use Python, xml.etree and the SQLAlchemy (to separate the SQL out of your programs) for this because we're up and running with very little effort and a great deal of flexibility.
source.py
"""A particular XML parser. Formats change, so sometimes this changes, too."""
import xml.etree.ElementTree as xml
class SSXML_Source( object ):
ns0= "urn:schemas-microsoft-com:office:spreadsheet"
ns1= "urn:schemas-microsoft-com:office:excel"
def __init__( self, aFileName, *sheets ):
"""Initialize a XML source.
XXX - Create better sheet filtering here, in the constructor.
@param aFileName: the file name.
"""
super( SSXML_Source, self ).__init__( aFileName )
self.log= logging.getLogger( "source.PCIX_XLS" )
self.dom= etree.parse( aFileName ).getroot()
def sheets( self ):
for wb in self.dom.getiterator("{%s}Workbook" % ( self.ns0, ) ):
for ws in wb.getiterator( "{%s}Worksheet" % ( self.ns0, ) ):
yield ws
def rows( self ):
for s in self.sheets():
print s.attrib["{%s}Name" % ( self.ns0, ) ]
for t in s.getiterator( "{%s}Table" % ( self.ns0, ) ):
for r in t.getiterator( "{%s}Row" % ( self.ns0, ) ):
# The XML may not be really useful.
# In some cases, you may have to convert to something useful
yield r
model.py
"""This is your target object.
It's part of the problem domain; it rarely changes.
"""
class MyTargetObject( object ):
def __init__( self ):
self.someAttr= ""
self.anotherAttr= ""
self.this= 0
self.that= 3.14159
def aMethod( self ):
"""etc."""
pass
builder_today.py One of many mapping configurations
"""One of many builders. This changes all the time to fit
specific needs and situations. The goal is to keep this
short and to-the-point so that it has the mapping and nothing
but the mapping.
"""
import model
class MyTargetBuilder( object ):
def makeFromXML( self, element ):
result= model.MyTargetObject()
result.someAttr= element.findtext( "Some" )
result.anotherAttr= element.findtext( "Another" )
result.this= int( element.findtext( "This" ) )
result.that= float( element.findtext( "that" ) )
return result
loader.py
"""An application that maps from XML to the domain object
using a configurable "builder".
"""
import model
import source
import builder_1
import builder_2
import builder_today
# Configure this: pick a builder is appropriate for the data:
b= builder_today.MyTargetBuilder()
s= source.SSXML_Source( sys.argv[1] )
for r in s.rows():
data= b.makeFromXML( r )
# ... persist data with a DB save or file write
To make changes, you can correct a builder or create a new builder. You adjust the loader source to identify which builder will be used. You can, without too much trouble, make the selection of builder a command-line parameter. Dynamic imports in dynamic languages seem like overkill to me, but they are handy.
XSLT
I suggest using XSLT templates to transform the XML into INSERT statements (or whatever you need), as required.
You should be able to invoke XSLT from any of the languages you mention.
This will result in a lot less code than doing it the long way round.
I'll toss in a suggestion for Hpricot, a popular Ruby XML parser (although there are many similar options).
Example:
Given the following XML:
<Export>
<Product>
<SKU>403276</SKU>
<ItemName>Trivet</ItemName>
<CollectionNo>0</CollectionNo>
<Pages>0</Pages>
</Product>
</Export>
You parse simply by:
FIELDS = %w[SKU ItemName CollectionNo Pages]
doc = Hpricot.parse(File.read("my.xml"))
(doc/:product).each do |xml_product|
product = Product.new
for field in FIELDS
product[field] = (xml_product/field.intern).first.innerHTML
end
product.save
end
It sounds like your application would be very fit for a Rails application, You could quickly prototype what you need, you've got direct interaction with your database of choice and you can output the data however you need to.
Here's another great resource page for parsing XML with Hpricot that might help as well as the documentation.
An interesting solution could be Ruby. Simply use XML->Object mappers and then use an object-relational-mapper (ORM) to put it inside a database. I had to do a short talk on XML Mapping with ruby, you could look at the slides and see what you like best: http://www.marc-seeger.de/2008/11/25/ruby-xml-mapping/
As for the ORM: Active Record or Datamapper should be the way to go
ECMAScript handles XML pretty nicely using E4X ("ECMAScript for XML"). This can be seen in Adobe's latest version of ActionScript, version 3. I believe JavaScript 2 (to be released with Firefox 4, I think) will support E4X as well.
Not sure about the support of the standalone JavaScript interpreters (i.e. Rhino, et al) of this, which is what matters most to you I suppose... But if it looks good to you, you can always look up their support for it (and report back to us :-)).
See http://en.wikipedia.org/wiki/E4X#Example for a simple example.