Using Hpricot, it is pretty easy to see how I can extract all elements with a given id or class using a CSS selector. Is it possible to extract elements from a document based on whether some attribute of those elements matches a regular expression?
A:
If you mean do something like:
doc.search("//div[@id=/regex/]")
then I don't think it can be done. The alternative is to find all candidate elements and then iterate through the results, deleting those that don't match the regex.
result = doc.search("//div")
# Drop every div whose id attribute does not match the pattern
result.delete_if { |x| x.attributes['id'].to_s !~ /regex/ }
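The same "fetch everything, then filter in Ruby" idea can be sketched without Hpricot itself. This is a minimal, self-contained illustration under stated assumptions: `FakeElem` and `filter_by_attribute` are hypothetical names, and `FakeElem` is only a stand-in whose `attributes` hash mimics the shape of an Hpricot element's attributes. It uses `select`, which returns a new array, instead of `delete_if`, which mutates the search result in place.

```ruby
# Hypothetical stand-in for a parsed element: a tag name plus an
# attribute hash, loosely mirroring an Hpricot element's attributes.
FakeElem = Struct.new(:name, :attributes)

# Keep only the elements whose +attr+ value matches +pattern+.
# A missing attribute is compared as an empty string (nil.to_s == "").
def filter_by_attribute(elements, attr, pattern)
  elements.select { |el| el.attributes[attr].to_s =~ pattern }
end

divs = [
  FakeElem.new("div", { "id" => "post-1" }),
  FakeElem.new("div", { "id" => "sidebar" }),
  FakeElem.new("div", { "id" => "post-22" }),
  FakeElem.new("div", {})
]

# Anchored pattern: ids of the form "post-<digits>" and nothing else
matches = filter_by_attribute(divs, "id", /\Apost-\d+\z/)
puts matches.map { |el| el.attributes["id"] }.inspect
# prints ["post-1", "post-22"]
```

With real Hpricot results, the only change would be passing the array returned by `doc.search("//div")` in place of `divs`.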
There are lots of alternative approaches; the thread "Hpricot and Regular Expression" has two other suggestions.
Note that, depending on exactly what you are trying to match, you may be able to use the "Supported, but different" syntaxes listed on the Hpricot wiki, e.g.:
E[@foo$="bar"]
Matches an E element whose "foo" attribute value ends exactly with the string "bar".
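If the regex you had in mind is only a suffix anchor, the selector above may already cover it. The pure-Ruby sketch below, which does not use Hpricot, assumes `$=` behaves as described (a plain "ends with" test) and compares it with the equivalent regex. The helper names `suffix_selector_match?` and `suffix_regex_match?` are hypothetical. Note the regex uses `\z`, not `$`.

```ruby
# Plain "ends with" test, the check E[@foo$="bar"] is described as doing.
def suffix_selector_match?(value, suffix)
  value.to_s.end_with?(suffix)
end

# The same test as a regular expression. \z anchors at the very end of the
# string; $ matches at the end of any line, so /bar$/ would also accept
# "bar" followed by a newline.
def suffix_regex_match?(value, suffix)
  value.to_s.match?(/#{Regexp.escape(suffix)}\z/)
end

samples = ["foobar", "bar", "barfoo", "foo bar\n", nil]
samples.each do |value|
  a = suffix_selector_match?(value, "bar")
  b = suffix_regex_match?(value, "bar")
  puts format("%-12s selector=%-5s regex=%-5s", value.inspect, a, b)
end
```

For patterns that are not simple prefix/suffix/substring tests (alternation, character classes, and so on), the filter-in-Ruby approach is still needed.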
i5m
2009-12-02 14:51:18