answers:

3

I am querying a particularly ugly HTML file using XPath. I want to extract an HTML table that is buried deep within the document. However, instead of going down through the hierarchy from //html/, is there any way I can just reference the table's unique id attribute?

This would obviously have far less chance of breaking due to page edits, too.

+2  A: 
descendant::*[@id='whatever']

If the id is not unique, you will get a list of all matching nodes.

mihi
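A minimal sketch of evaluating this expression, assuming Java and the JDK's built-in `javax.xml.xpath` API (the question doesn't name a language), and assuming the HTML has already been cleaned up into well-formed XML, since the JDK's XML parser rejects tag-soup HTML. `IdQuery` and its helpers are hypothetical names. The sample page deliberately reuses an id to show the non-unique case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdQuery {
    // Parses a well-formed (XHTML-style) string into a DOM Document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Evaluates an XPath expression with the document as context node
    // and returns every matching node.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical page where the id 'whatever' is (incorrectly) used twice.
        Document doc = parse("<html><body><div><table id='whatever'/></div>"
                + "<p id='whatever'/><p id='other'/></body></html>");
        NodeList hits = select(doc, "descendant::*[@id='whatever']");
        // Prints the name of each matching element, in document order.
        for (int i = 0; i < hits.getLength(); i++) {
            System.out.println(hits.item(i).getNodeName());
        }
    }
}
```

Run as-is, it should print `table` and then `p`. With the document as the context node, `//*[@id='whatever']` selects the same two elements.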
Alternatively, `//*[@id='whatever']` will do the same job. But I don't want to take away from mihi's good answer. +1
Welbog
Wouldn't that only pull out the element when its id is equal to 'whatever'? I thought he was asking how to get the id attribute regardless of its value.
ChadNC
Thanks. And if it were unique among div elements, could I get a match via `//div[@id="datafiles"]`?
rutherford
No, they guessed correctly, but I can see the confusion. I wanted to use XPath to get a table/div element with a specified id.
rutherford
@rutherford: You're correct about the `//div[@id='datafiles']` question.
Welbog
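`//*` is not the same as `descendant::*`: `//` is short for `/descendant-or-self::node()/`, so `//*` always searches from the document root, whereas `descendant::*` searches below the current context node. See http://www.w3.org/TR/1999/REC-xpath-19991116#path-abbrev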
NickFitz
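A sketch of the difference NickFitz describes, assuming Java's `javax.xml.xpath` API and markup that is already well-formed XML; `AxisDemo` and its helpers are hypothetical names. Each expression is evaluated with a `<div>` (not the document) as the context node, which is where `descendant::*` and `//*` part ways.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisDemo {
    // Parses a well-formed (XHTML-style) string into a DOM Document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Counts the nodes an expression selects when `context` is the context node.
    static int count(Node context, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    // Returns the first node an expression selects, or null if it selects nothing.
    static Node first(Node context, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div id='outer'><table id='inner'/></div>"
                + "<table id='sibling'/></body></html>");
        Node outer = first(doc, "//div[@id='outer']");
        // descendant:: looks only below the context node.
        System.out.println(count(outer, "descendant::table"));          // 1
        // A leading // restarts at the document root, whatever the context node is.
        System.out.println(count(outer, "//table"));                     // 2
        // .// keeps the search below the context node.
        System.out.println(count(outer, ".//table"));                    // 1
        // The context node is not its own descendant...
        System.out.println(count(outer, "descendant::*[@id='outer']"));  // 0
        // ...but an absolute // search still finds it.
        System.out.println(count(outer, "//*[@id='outer']"));            // 1
    }
}
```

When the context node is the document itself, `descendant::*` and `//*` select the same elements, which is why the two forms are often interchangeable in practice.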
But `//*/*` is still shorter than `descendant::*` :)
mihi
A: 

Something like:

xpath.evaluate("/html/body/table[@id]");

It's been a while since I used XPath, so that may not be exactly correct, but it's close.

ChadNC
ChadNC, you are close. That will only match a table that is a direct child of the body element and has an attribute named "id" (with any value). What you want is something like what mihi (http://stackoverflow.com/questions/1744651/xpath-noob-querying-an-xml-element-using-xpath-and-a-unique-attribute/1744694#1744694) said, or xpath.evaluate("//table[@id = 'tableId']"), where 'tableId' is the unique id being searched for.
Jordan S. Jones
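To see the difference Jordan describes, here is a sketch assuming Java's `javax.xml.xpath` API (which the `xpath.evaluate(...)` calls in this thread resemble, though nobody says which library is in use) and markup that is already well-formed XML. `TableIdDemo` and `ids` are hypothetical names. A page with one table directly under `<body>` and one nested inside a `<div>` shows what each expression picks up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableIdDemo {
    // Parses a well-formed (XHTML-style) string into a DOM Document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns the id attribute of every element the expression selects, in document order.
    // Assumes the expression selects elements only.
    static List<String> ids(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical page: one table directly under <body>, one nested in a <div>.
        Document doc = parse("<html><body><table id='top'/>"
                + "<div><table id='tableId'/></div></body></html>");
        System.out.println(ids(doc, "/html/body/table[@id]"));   // [top]
        System.out.println(ids(doc, "//table[@id]"));            // [top, tableId]
        System.out.println(ids(doc, "//table[@id='tableId']"));  // [tableId]
    }
}
```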
Roger that. That was just meant to be a simple example that would only find exactly what you said it would; I figured he/she could adjust it to their specific needs. But I hadn't thought about just using "//table[@id='tableid']", which I think I'll be putting to use fairly soon :)
ChadNC
+1  A: 

You can also just use:

//table[@id='yourId']

The `//` matches the element anywhere in the document, `table` matches only table elements, and the predicate (between the square brackets) keeps only the element with your id.

jonstjohn
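A sketch of this lookup, assuming Java's `javax.xml.xpath` API and markup that is already well-formed XML; `TableLookup` and `tableById` are hypothetical names. Since the id is expected to be unique, it asks for a single node (`XPathConstants.NODE`), and it binds the id to an XPath variable `$id` instead of splicing it into the expression string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TableLookup {
    // Parses a well-formed (XHTML-style) string into a DOM Document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns the first <table> whose id equals `id`, or null if there is none.
    static Element tableById(Document doc, String id) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Every variable resolves to `id`; the expression only uses $id.
            xpath.setXPathVariableResolver(variableName -> id);
            return (Element) xpath.evaluate("//table[@id=$id]", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("lookup failed for id " + id, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical page with a div and a table that carry different ids.
        Document doc = parse("<html><body><div id='datafiles'/>"
                + "<table id='yourId'/></body></html>");
        Element table = tableById(doc, "yourId");
        System.out.println(table == null ? "no table" : table.getTagName());  // table
        // The `table` step filters out the <div>, so its id is not found.
        System.out.println(tableById(doc, "datafiles") == null);              // true
    }
}
```

Binding the value through `setXPathVariableResolver` means an id containing a quote character cannot break the expression, which plain string concatenation would allow.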