
Hi everyone, this is my first question here, and I'm not a programmer.

I would like to generate a sitemap. I am crawling a website with webcrawler (crawler.dev.java.net). Is there any way to use a SAX parser on the data I get?

I also used JTidy and got the homepage HTML converted into an XML file.

I'm very confused: there are so many SAX parsers, and I don't know the difference between them or which one to choose.

I want to access the attributes of HTML tags, and I can't do that with webcrawler (or I don't know how).

What's the difference between org.xml.sax and all the other packages?

A: 

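Java provides a standard way of interacting with SAX parsers through JAXP (the javax.xml.parsers package, used in the code below). The org.xml.sax package defines the standard SAX interfaces (XMLReader, ContentHandler, Attributes, and so on) that every SAX parser implements. To switch between SAX parsers you typically just add the parser's jar to your class path; the code stays the same.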
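To make that pluggability visible, here is a small sketch (the class name ParserInfo is hypothetical) that asks JAXP which factory and XMLReader it actually loaded. On a plain JDK it typically reports the Xerces copy bundled with the JDK; with a different SAX parser jar on the class path, the printed class names may change while the code itself does not. JAXP also honours the javax.xml.parsers.SAXParserFactory system property if you need to force a specific implementation.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class ParserInfo {

    // Reports which SAX implementation JAXP found; the calling code only
    // ever touches the standard javax.xml.parsers and org.xml.sax types.
    static String describe() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return "factory=" + factory.getClass().getName()
                    + ", reader=" + reader.getClass().getName();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No usable SAX parser", e);
        }
    }

    public static void main(String[] args) {
        // The exact class names depend on the JDK and on the jars on the class path
        System.out.println(describe());
    }
}
```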

You can do SAX parsing as follows:

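import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        // JAXP locates whichever SAX implementation is on the class path
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xmlReader = sp.getXMLReader();
        // The handler receives a callback for each start tag, end tag, text run, etc.
        xmlReader.setContentHandler(new MyContentHandler());
        // Parse the file named on the command line (e.g. the XML produced by JTidy)
        xmlReader.parse(new File(args[0]).toURI().toString());
    }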

    private static class MyContentHandler implements ContentHandler {

        public void setDocumentLocator(Locator locator) {
        }

        public void startDocument() throws SAXException {
        }

        public void endDocument() throws SAXException {
        }

        public void startPrefixMapping(String prefix, String uri)
                throws SAXException {
        }

        public void endPrefixMapping(String prefix) throws SAXException {
        }

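        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            // Called for every start tag; qName is the tag name and atts
            // holds its attributes, e.g. atts.getValue("href") on an <a> tag
        }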

        public void endElement(String uri, String localName, String qName)
                throws SAXException {
        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {
        }

        public void ignorableWhitespace(char[] ch, int start, int length)
                throws SAXException {
        }

        public void processingInstruction(String target, String data)
                throws SAXException {
        }

        public void skippedEntity(String name) throws SAXException {
        }

    }

}
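Implementing ContentHandler directly means writing every callback, even the empty ones. A shorter route, sketched here as an illustrative example rather than part of the answer above, is to extend org.xml.sax.helpers.DefaultHandler and override only startElement. The names LinkExtractor, LinkHandler and extractLinks are hypothetical, and the inline string stands in for the XML that JTidy produced; collecting href values like this is one plausible first step toward building a sitemap.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LinkExtractor {

    // DefaultHandler supplies empty versions of every callback, so only
    // startElement needs to be overridden here.
    static class LinkHandler extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            // The factory is not namespace-aware by default, so qName
            // carries the tag name exactly as it appears in the document.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href"); // null if the attribute is absent
                if (href != null) {
                    links.add(href);
                }
            }
        }
    }

    // Returns the href value of every <a> element, in document order.
    static List<String> extractLinks(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            LinkHandler handler = new LinkHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"/index.html\">Home</a>"
                + "<p>Read <a href=\"/about.html\">about us</a>.</p>"
                + "<a name=\"top\">no href here</a>"
                + "</body></html>";
        System.out.println(extractLinks(page)); // prints [/index.html, /about.html]
    }
}
```

To read the JTidy output file instead of a string, SAXParser also offers parse(File, DefaultHandler), so the same handler can be reused unchanged.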
Blaise Doughan