I'm accessing some website and I need to extract some data. To be more specific - from this part:

<input type="hidden" value="1" name="d520783895194bd08750e47c744d553d">

I need to extract the "name" part. I heard that regular expressions are not the best solution, so I'd like to ask what the best way is to access this piece of data.

+1  A: 

Use an HTML parsing library. These libraries fix malformed HTML and make it easy to navigate the document to find and update elements. Here is a link to a list of Java/Groovy implementations:

http://www.wavyx.net/2009/01/13/looking-for-a-java-html-parser-or-groovy/

It looks like NekoHTML and TagSoup are popular, but I haven't used either of them, or Groovy for that matter. I have used HTML parsers in other languages, though.

tarn
+2  A: 

After parsing the page with NekoHTML or TagSoup (which should take care of the fact that your input tag is not closed), I suggest using an XPath expression:

//input[@type='hidden'][@value=1]/@name

In Groovy, you would apply it in the form of a GPath expression.
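
To try the XPath expression on its own, here is a minimal sketch in plain Java using only the JDK's built-in XML and XPath classes. It assumes the markup has already been cleaned into well-formed XML (which is what NekoHTML or TagSoup would do for you), so the input tag in the sample is closed by hand. The class name HiddenInputName and the helper extractName are made up for illustration. Also note that some HTML parsers change the case of element names, so the expression may need adjusting to match their output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HiddenInputName {

    // Hypothetical helper: returns the "name" attribute of the first
    // hidden input whose value is 1, or an empty string if none matches.
    // Assumes the markup is already well-formed XML.
    static String extractName(String wellFormedMarkup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Same expression as in the answer above.
            return xpath.evaluate("//input[@type='hidden'][@value=1]/@name", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query markup", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup; the input tag is self-closed so the XML parser accepts it.
        String markup = "<html><body><form>"
                + "<input type=\"hidden\" value=\"1\" name=\"d520783895194bd08750e47c744d553d\"/>"
                + "</form></body></html>";
        System.out.println(extractName(markup));
    }
}
```

With a real page, you would replace the DocumentBuilder step with the DOM produced by whichever HTML parser you choose and run the same XPath query against it.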

Skarab