I am currently developing an app that retrieves some data from the internet. For this purpose I use SAX; I have used it before for parsing simple XML files such as the Google Weather API. However, the websites I am interested in take parsing to the next level: the page is huge and looks messy. I only need to retrieve a few specific lines, and the rest is not useful to me. Is it possible to skip the useless lines/tags, or do I have to go through the document step by step?

+1  A: 

Yes, you can skip them. Just check for the tags you want in your handler, and it will only fetch the values of those tags.

RockOn
Thanks a lot, your answer will save me many lines of code!
Amine
+1  A: 

You can try XPath, which will use SAX behind the scenes to parse your XML. The downside is that the XML will be parsed on every call of the XPath evaluate method.
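
A minimal sketch of this approach, using only the standard `javax.xml.xpath` API. The class name `XPathSketch`, the helper `select`, and the sample document are made up for illustration, and the input is assumed to be well-formed XML. Because an `InputSource` is passed to `evaluate`, the document is parsed again on each call, which is the downside mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {

    // Hypothetical helper: returns the text content of every node matched by the expression.
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The XML is parsed from scratch here, on every call.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document: only the <price> elements are of interest.
        String xml = "<page><junk>x</junk><price>10</price><junk>y</junk><price>20</price></page>";
        System.out.println(select(xml, "//price"));
    }
}
```

If several expressions are needed, it may be cheaper to parse the document once into a DOM and evaluate each expression against that node instead.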

Georgy Bolyuba
Thanks for your response, I will look into this possibility!
Amine
+1  A: 

If you want to read specific tags, then a DOM parser is much faster than a SAX parser. A SAX parser is useful if you want to parse big XML files.

SAX parsing is much faster than DOM. DOM also requires that the entire document be brought into memory.
Blaise Doughan
Thanks a lot for your answer, I really appreciate your help. I will try DOM and see what output I get!
Amine
+1  A: 

Yes, you can do it: just ignore the tags you are not interested in. But note that the entire document will still have to be parsed (a DefaultHandler implementation):

public void startElement(String uri, String localName,
     String qName, Attributes attributes) {
  // localName is only populated by a namespace-aware parser; otherwise compare qName.
  if (localName.equals("myInterestingTag")) {
     // do your thing....
  }
}

public void endElement(String uri, String localName, String qName) {
  if (localName.equals("myInterestingTag")) {
     // do your thing....
  }
}

public void characters(char[] ch, int start, int length) {
  // if parsing myInterestingTag... do some stuff.
}
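
For reference, here is one way the handler above might be fleshed out into something runnable. It is only a sketch: the class `InterestingTagHandler`, the helper `extract`, and the sample document are hypothetical, and the wanted element is assumed not to be nested inside itself. It compares `qName` rather than `localName` because the default `SAXParserFactory` is not namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class InterestingTagHandler extends DefaultHandler {
    private final String wanted;
    private final List<String> values = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a wanted element

    InterestingTagHandler(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals(wanted)) {
            current = new StringBuilder(); // start collecting text
        }
        // every other tag is simply ignored
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called several times per element, so append.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && qName.equals(wanted)) {
            values.add(current.toString().trim());
            current = null; // stop collecting
        }
    }

    // Hypothetical convenience method: parse a string and return the collected values.
    static List<String> extract(String xml, String tag) {
        try {
            InterestingTagHandler handler = new InterestingTagHandler(tag);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<page><junk>x</junk><title>Hello</title>"
                + "<junk><title>World</title></junk></page>";
        System.out.println(extract(xml, "title"));
    }
}
```

The parser still walks the whole document; the handler just does no work for the tags it does not care about.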
naikus
Thanks, that's exactly how I usually do it ;)
Amine
+1  A: 

You can try a combination of TagSoup for creating a parseable XML document and XPath for fetching the interesting parts.

DaDaDom
Thanks DaDaDom for your answer, I'll look into that ASAP!
Amine
Sadly, using XPath is not that easy on Android.
Janusz
+1  A: 

See my answer to a similar question for a strategy of using SAX to skip/ignore tags:

http://stackoverflow.com/questions/3357247/skipping-nodes-with-sax/3366536#3366536

It involves switching ContentHandlers on the XMLReader. When you reach a portion of the XML document that you want to skip, you simply swap in a ContentHandler that does nothing with the events. When the end of the section to be ignored is reached, it passes control back to the content handler you were using to process the XML content.
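
A rough sketch of the handler-switching idea, written from the description above rather than copied from the linked answer. The names `SkipSectionSketch`, `SkippingHandler`, `MainHandler`, and `elementsOutside` are invented for this example. The skipping handler counts nesting depth so that it hands control back only when the element that triggered the skip is closed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SkipSectionSketch {

    /** Does nothing with events until the skipped element is closed. */
    static class SkippingHandler extends DefaultHandler {
        private final XMLReader reader;
        private final ContentHandler parent;
        private int depth = 1; // the skipped element itself is already open

        SkippingHandler(XMLReader reader, ContentHandler parent) {
            this.reader = reader;
            this.parent = parent;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (--depth == 0) {
                reader.setContentHandler(parent); // pass control back
            }
        }
    }

    /** Records element names, swapping in a SkippingHandler for unwanted sections. */
    static class MainHandler extends DefaultHandler {
        private final XMLReader reader;
        private final String skipTag;
        final List<String> seen = new ArrayList<>();

        MainHandler(XMLReader reader, String skipTag) {
            this.reader = reader;
            this.skipTag = skipTag;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(skipTag)) {
                reader.setContentHandler(new SkippingHandler(reader, this));
            } else {
                seen.add(qName);
            }
        }
    }

    // Hypothetical helper: names of all elements that are not inside a skipped section.
    static List<String> elementsOutside(String xml, String skipTag) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            MainHandler main = new MainHandler(reader, skipTag);
            reader.setContentHandler(main);
            reader.parse(new InputSource(new StringReader(xml)));
            return main.seen;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><keep/><noise><a><noise/></a><b/></noise><keep2/></root>";
        System.out.println(elementsOutside(xml, "noise")); // prints [root, keep, keep2]
    }
}
```

Note that in this sketch the main handler never sees the end event of the skipped element, since the skipping handler consumes it before switching back.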

Blaise Doughan
Your answer will certainly help me achieve my goal! Thanks a lot. Cheers
Amine
+3  A: 

I like commons-digester. It allows you to specify rules against particular tags. The rule gets executed only when the tag is encountered.

Digester is built on top of SAX and hence has all the SAX features, plus the specificity required for selectively parsing specific tags. It also maintains a stack: a new element is pushed when the corresponding tag is encountered and popped when the element ends.

I use it for parsing all my configuration files.

Check out digester at http://commons.apache.org/digester/

raja kolluru
Thanks a lot Raja, I will look into this solution!
Amine