I've been fiddling for 3 hours and I can't get this F*** parser to work. Sorry for cursing. I don't understand why I can't find one decent tutorial that does exactly what I want.

I just want to send the function a String of XML and then parse it. It's not that hard. In Python, I can do it with my eyes closed. Awesome, freaking documentation right here: http://www.crummy.com/software/BeautifulSoup/documentation.html

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(the_xml)
persons_name = soup.findAll('first_name')[0].string

Why can't I find good, simple documentation that teaches me how to parse XML? This is my current code for Java SAX, and it's not working, and I don't even know why.

  public static void parseit(String thexml)
    {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      try {   
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( thexml , new DefaultHandler() );
      } catch (Throwable err) {
            err.printStackTrace ();
      }
    }

Can someone just write me the code to parse the XML using SAX parser...please...It's just like 5 lines of code.

+3  A: 

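You have to extend DefaultHandler with your own handler. Note also that parse(String, ...) interprets the string as a URI rather than as XML content, so wrap the XML in an InputSource. For example, try this: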

saxParser.parse(new InputSource(new StringReader(thexml)), new DefaultHandler() {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        System.out.println("Hello " + qName);
    }
});
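
For reference, here is a self-contained sketch of the same idea with the imports filled in. The class name SaxElementNames and the method elementNames are made up for illustration; instead of printing, the handler collects each element name into a list so the result can be inspected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original answer.
class SaxElementNames {

    // Returns the qualified name of every element, in document order.
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        // parse(String, ...) would treat the string as a URI,
        // so the XML text is wrapped in an InputSource instead.
        saxParser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add(qName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<person><first_name>Ada</first_name></person>"));
    }
}
```

The sample XML in main is invented; any well-formed document should work the same way.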
Pierre
A: 

I don't know if this would be an option for you, but since Groovy and Java play nicely together, why not try one of the Groovy options for processing XML?

In particular, look at XmlSlurper (http://groovy.codehaus.org/Reading+XML+using+Groovy's+XmlSlurper)

def records = new XmlSlurper().parseText(thexml)
def persons_name = records.first_name[0]

In my opinion, that is as close as you'll get to BeautifulSoup in a Java-compatible way.

Heinrich Filter
+3  A: 

OK, so what you need to do is implement your own handler (instead of using the default one). So replace

saxParser.parse( thexml , new DefaultHandler() );

with

 saxParser.parse( thexml , new MyFreakingHandler() );

where MyFreakingHandler extends the DefaultHandler class (the older HandlerBase class is deprecated). Then simply override methods such as

public void startDocument() throws SAXException
public void endElement(String uri, String localName, String qName) throws SAXException
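
As a rough sketch of what such a handler could look like, the class below collects the text of every first_name element (the element name comes from the question's Python snippet). The class name FirstNameHandler and its static parse helper are assumptions made for this example, and it does not try to handle first_name elements nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <first_name> element.
class FirstNameHandler extends DefaultHandler {
    final List<String> firstNames = new ArrayList<>();
    private StringBuilder text; // non-null only while inside <first_name>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("first_name".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("first_name".equals(qName)) {
            firstNames.add(text.toString());
            text = null;
        }
    }

    // Convenience wrapper: parses XML held in a String, not a URI or file name.
    static List<String> parse(String xml) throws Exception {
        FirstNameHandler handler = new FirstNameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.firstNames;
    }
}
```

Calling FirstNameHandler.parse(thexml).get(0) would then be the rough equivalent of the BeautifulSoup one-liner in the question.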

I don't know, however, why you could not find any tutorial on the web. I haven't used SAXParser for at least 3 years, and in order to answer your post I simply asked Google for help.

EDIT:

OK, so to clear things up: there used to be an official Java tutorial for SAX that I somehow cannot find on the web now. However, there are still a number of decent unofficial tutorials that can be quite helpful. Try this one, for instance: http://www.java-samples.com/showtutorial.php?tutorialid=152

Paul Szulc
+2  A: 

You must extend DefaultHandler with your own implementation. The SAX parser is a good choice if you are working with large documents, because it streams the input instead of loading it all into memory. If not, you might be better off with another XML parser, for example dom4j.

Here's a simple SAX tutorial
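
For small documents, a tree-based parser can be sketched as follows. Note that this uses the DOM parser that ships with the JDK rather than dom4j, purely to avoid an extra dependency; the class name DomFirstName and the first_name element are assumptions carried over from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper using the JDK's built-in DOM parser (not dom4j).
class DomFirstName {

    // Returns the text of the first <first_name> element, or null if there is none.
    static String firstName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is loaded into memory as a tree.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("first_name");
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }
}
```

The trade-off is the one described above: the tree is convenient to query, but it holds the entire document in memory.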

Helgi
A: 

Using the Java XPath API

XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
XPathExpression xPathExpression = xPath.compile("//first_name");
// the_xml holds the XML text itself, so read it through a StringReader
NodeList nodes = (NodeList) xPathExpression.evaluate(
    new InputSource(new StringReader(the_xml)), XPathConstants.NODESET);

Yes, it is unnecessarily verbose.
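
If only the first match is needed as a string, a somewhat shorter variant may be enough. This sketch relies on XPath.evaluate(String, InputSource), which returns the string value of the first matching node (or an empty string when nothing matches); the class name XPathFirstName is made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
class XPathFirstName {

    // Returns the text of the first <first_name> element, or "" if none exists.
    static String firstName(String xml) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Evaluating a node-set expression as a string yields the first node's text.
        return xPath.evaluate("//first_name", new InputSource(new StringReader(xml)));
    }
}
```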

Michael Barker