
I know that any language is capable of parsing XML; I'm really just looking for advantages or drawbacks that you may have come across in your own experience. Perl would be my standard go-to here, but I'm open to suggestions.

Thanks!

UPDATE: I ended up going with XML::Simple, which did a nice job, but I have one piece of advice if you plan to use it: research the ForceArray option first. I had to rewrite a bunch of statements after learning that it is usually best practice to set ForceArray. This page had the clearest explanation that I could find. Frankly, I'm surprised this isn't the default behavior.
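XML::Simple is Perl, but the pitfall behind ForceArray can be sketched in Python. The helper `simple_xml_to_dict` below is hypothetical and is not XML::Simple's actual algorithm; it only imitates the folding behavior. Without forcing arrays, a tag that occurs once becomes a plain value while a repeated tag becomes a list, so any code reading the result has to cope with both shapes. Forcing arrays makes the shape predictable.

```python
import xml.etree.ElementTree as ET


def simple_xml_to_dict(elem, force_array=False):
    """Hypothetical helper imitating XML::Simple-style folding.

    Child elements are grouped by tag name. With force_array=False a tag
    that appears once becomes a single value and a repeated tag becomes a
    list, so the result's shape depends on the data. With force_array=True
    every child tag always maps to a list.
    """
    children = list(elem)
    if not children:
        return elem.text  # leaf element: just its text
    result = {}
    for child in children:
        result.setdefault(child.tag, []).append(
            simple_xml_to_dict(child, force_array))
    if not force_array:
        # Collapse one-element lists, like XML::Simple without ForceArray
        result = {tag: vals[0] if len(vals) == 1 else vals
                  for tag, vals in result.items()}
    return result


one = ET.fromstring("<order><item>apple</item></order>")
two = ET.fromstring("<order><item>apple</item><item>pear</item></order>")

print(simple_xml_to_dict(one))                    # {'item': 'apple'}
print(simple_xml_to_dict(two))                    # {'item': ['apple', 'pear']}
print(simple_xml_to_dict(one, force_array=True))  # {'item': ['apple']}
```

With the default, `result['item']` is a string for one order and a list for the other; with `force_array=True` it is always a list, which is the kind of consistency the update above recommends.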

+8  A: 

XML::Twig is very nice, especially because it’s not as awfully verbose as some of the other options.

zoul
A second for XML::Twig, especially if you have to handle enormous data sets.
squeeks
XML::Twig allows processing XML in mixed mode.
Alexandr Ciornii
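XML::Twig's approach of handling one chunk (a "twig") at a time and purging it afterwards has a rough Python counterpart in the standard library's `xml.etree.ElementTree.iterparse`. This is a sketch, not XML::Twig itself; it assumes a flat file of many `<record>` elements, each with an `<amount>` child (both tag names are made up for the example), and clears finished records so memory use stays roughly constant.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Stream through `source`, handling one <record> element at a time.

    Assumes a flat layout of many <record> elements under a single root,
    each with an optional <amount> child; the tag names are hypothetical.
    """
    count = 0
    total = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            total += int(elem.findtext("amount", default="0"))
            root.clear()  # drop finished records so they can be garbage-collected
    return count, total


data = io.BytesIO(b"<records><record><amount>3</amount></record>"
                  b"<record><amount>4</amount></record></records>")
print(count_records(data))  # (2, 7)
```

For a real file, pass the filename (or an open binary file) instead of the `BytesIO` object.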
+8  A: 

If you are using Perl then I would recommend XML::Simple:

As more and more Web sites begin using XML for their content, it's increasingly important for Web developers to know how to parse XML data and convert it into different formats. That's where the Perl module called XML::Simple comes in. It takes away the drudgery of parsing XML data, making the process easier than you ever thought possible.

Andrew Hare
XML::Simple is acceptable *sometimes*, but when it comes to complex, strictly-formatted data, it's often more trouble than it's worth. I would try XML::Twig or XML::LibXML instead.
hobbs
Or when the XML is large: using ANY DOM parser, including XML::Simple, is a Very Bad Idea.
Lemurik
+5  A: 

For pure XML parsing, I wouldn't use Java, C#, C++, C, etc. They tend to overcomplicate things: you want a banana and you get the gorilla holding it as well.

Higher-level and interpreted languages such as Perl, PHP, Python, and Groovy are more suitable. Perl is included in virtually every Linux distro, as is PHP for the most part.

I've used Groovy recently for exactly this and found it very easy. Mind you, though, a C parser will be orders of magnitude faster than Groovy, for instance.

_NT
Parsing XML in C# is straightforward; what have you used to report such a bad experience? Was it with something other than the standard libraries? And for the record, I would hardly place Perl and PHP as a "higher language" in comparison; they are not true fully-fledged object-oriented languages.
RedGlyph
I've used Mono (.NET 2.0 compatible). And I said higher-level language, not higher; do some Googling to see what that means.
_NT
Ah, your sentence looked like a comparative, but you simply meant _high_-level language then. And no, I don't usually use Google or Wikipedia to check word definitions, but I understand it's a common mistake others make ;-) In any case, System.Xml is also in Mono and I didn't find anything overcomplicated about it. That just highlights another important criterion: one must feel at ease with the programming language.
RedGlyph
@RedGlyph: Nothing much wrong with it; you can get work done. It's just more complicated than the others I mention.
_NT
RedGlyph: Look at Moose for Perl.
Alexandr Ciornii
+1  A: 

Python has some pretty good support for XML, from the standard library DOM packages to much more 'pythonic' libraries that parse XML directly into more usable object structures.

There isn't really a 'right' language though... there are good XML packages for most languages nowadays.

workmad3
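To illustrate the difference between the standard library's DOM package and a more 'pythonic' API, here is a small sketch pulling the same data out with `xml.dom.minidom` and with `xml.etree.ElementTree`, both of which ship with Python. The `<books>` document is invented for the example.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = ("<books><book id='1'><title>Dune</title></book>"
       "<book id='2'><title>Emma</title></book></books>")


def titles_with_minidom(text):
    """W3C DOM style: walk node lists and text nodes by hand."""
    dom = xml.dom.minidom.parseString(text)
    titles = []
    for book in dom.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # The text lives in child text nodes, not on the element itself
        titles.append("".join(n.data for n in title.childNodes
                              if n.nodeType == n.TEXT_NODE))
    return titles


def titles_with_elementtree(text):
    """ElementTree style: elements act like lists and expose .text directly."""
    root = ET.fromstring(text)
    return [book.findtext("title") for book in root.iter("book")]


print(titles_with_minidom(DOC))      # ['Dune', 'Emma']
print(titles_with_elementtree(DOC))  # ['Dune', 'Emma']
```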
+5  A: 

It's all going to be in the libraries.

Python has great libraries for XML. My preference is lxml. It uses libxml2/libxslt, so it's fast, but the Python bindings make it really easy to use. Perl may very well have equally awesome OO libraries.

Lennart Regebro
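Part of what makes lxml easy to use is that its `lxml.etree` module follows the ElementTree API, so simple code runs on either. A minimal sketch, assuming lxml may or may not be installed (it falls back to the standard library); the `<feed>` document is made up for the example, and only the `.xpath()` call is lxml-specific.

```python
try:
    from lxml import etree  # libxml2/libxslt-backed; install with `pip install lxml`
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API

XML = (b"<feed><entry><title>First</title></entry>"
       b"<entry><title>Second</title></entry></feed>")

root = etree.fromstring(XML)
# findall/findtext work the same way in lxml and in the standard library
titles = [entry.findtext("title") for entry in root.findall("entry")]
print(titles)  # ['First', 'Second']

if hasattr(root, "xpath"):
    # Only lxml elements have .xpath(), which supports full XPath 1.0
    print(root.xpath("//title/text()"))
```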
+3  A: 

Not exactly a scripting language, but you could also consider Scala. You can start from here.

kgiannakakis
+1 for Scala... and fifteen more characters
wheaties
+2  A: 

It's not a scripting language, but Scala is great for working with XML natively. Also, see this book (draft) by Burak.

geowa4
+3  A: 

I saw that people recommend XML::Simple if you decide on Perl.

While XML::Simple is, indeed, very simple to use and great, it is a DOM parser. As such, it is, sadly, completely unsuitable for processing large XML files, as your process would run out of memory (it's a common problem for any DOM parser, not limited to XML::Simple or Perl).

So, for large files, you must pick a SAX parser in whichever language you choose (IIRC, XML::Twig is SAX, as are many other XML parsers in Perl; I can't speak for other languages).

Lemurik
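For comparison, this is roughly what the SAX approach recommended for large files looks like with Python's standard `xml.sax` module: the parser fires callbacks as it reads, and only the state the handler keeps stays in memory. The `TitleCollector` class and the `<catalog>` document are illustrative, not from any particular answer.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects <title> text as events arrive; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


handler = TitleCollector()
xml.sax.parseString(b"<catalog><item><title>A</title></item>"
                    b"<item><title>B</title></item></catalog>", handler)
print(handler.titles)  # ['A', 'B']
```

For a file on disk, `xml.sax.parse(filename, handler)` streams it the same way instead of loading the whole document.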
A: 

Reading data out of XML files is dead easy with C# and LINQ to XML!

Somehow, although I really love Python, I found it hard to parse XML with the standard libraries.

Daren Thomas
+2  A: 

Scala's XML support is rather good, especially as XML can just be typed directly into Scala programs.

Microsoft also did some cool integrated stuff with LINQ to XML.

But I really like ElementTree, and that package alone is a good reason to use Python instead of Perl ;)

Here's an example:

```python
# ElementTree ships in the standard library as xml.etree.ElementTree (Python 2.5+);
# the older standalone package was imported as elementtree.ElementTree
import xml.etree.ElementTree as ET

# build a tree structure
root = ET.Element("html")

head = ET.SubElement(root, "head")

title = ET.SubElement(head, "title")
title.text = "Page Title"

body = ET.SubElement(root, "body")
body.set("bgcolor", "#ffffff")

body.text = "Hello, World!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)
tree.write("page.xhtml")
```
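Reading goes the other way with the same module. A short sketch that parses markup equivalent to what the example above writes; it is inlined as a string rather than read from page.xhtml, so it runs on its own.

```python
import xml.etree.ElementTree as ET

# Inline markup so the sketch does not depend on page.xhtml existing
text = ('<html><head><title>Page Title</title></head>'
        '<body bgcolor="#ffffff">Hello, World!</body></html>')

root = ET.fromstring(text)
print(root.findtext("head/title"))  # Page Title

body = root.find("body")
print(body.get("bgcolor"), body.text)  # #ffffff Hello, World!
```

To read the file instead, `ET.parse("page.xhtml").getroot()` gives the same root element.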
A: 

I would say it depends, like everything else. VB.NET 2008 has XML literals, IntelliSense for LINQ to XML, and a few power toys that help turn XML into XSD. So personally, if you are working in a .NET environment, I think this is the best choice.

Wade73
+1  A: 

If you're going to use Ruby to do it then you're going to want to take a look at Nokogiri or Hpricot. Both have their strengths and weaknesses. The language and package selection really comes down to what you want to do with the data after you've parsed it.

PatrickTulskie