I'd like to know whether any wrapper induction libraries for Java exist, experimental or not.

Given a website of my choice, I would like to point my code at that site's product pages. The wrapper induction library should be able to:

- infer the 'wrapper' (schema) of the product pages from a couple of examples,
- offer an easy way of labeling the schema, to pinpoint the data that I want extracted,
- use the schema to extract the data.

Essentially, this means transforming semi-structured HTML into structured XML without knowing the transformation schema in advance.
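To make this concrete, here is a rough sketch of the kind of behaviour I mean, using plain Java and no external library. It is a toy left/right-delimiter (LR-style) wrapper, not any existing library's API: all names (`LrWrapperSketch`, `Example`, `Wrapper`, `induce`) are hypothetical, the sample pages are made up, and it assumes each labeled value occurs once per page and that trimming delimiters to the nearest tag is acceptable.

```java
import java.util.List;
import java.util.Optional;

/** Toy LR-style wrapper induction. All names here are hypothetical. */
final class LrWrapperSketch {

    /** A labeled example: page HTML plus the value the user marked in it. */
    static final class Example {
        final String html, value;
        Example(String html, String value) { this.html = html; this.value = value; }
    }

    /** The induced wrapper: text expected right before and right after the value. */
    static final class Wrapper {
        final String left, right;
        Wrapper(String left, String right) { this.left = left; this.right = right; }

        /** Returns the text between the first 'left' and the next 'right', if both occur. */
        Optional<String> extract(String html) {
            int start = html.indexOf(left);
            if (start < 0) return Optional.empty();
            start += left.length();
            int end = html.indexOf(right, start);
            return end < 0 ? Optional.empty() : Optional.of(html.substring(start, end));
        }
    }

    /** Learns delimiters shared by all examples, then checks them against the examples. */
    static Wrapper induce(List<Example> examples) {
        if (examples.isEmpty()) throw new IllegalArgumentException("need at least one example");
        String left = null, right = null;
        for (Example ex : examples) {
            int at = ex.html.indexOf(ex.value); // assumption: first occurrence is the labeled one
            if (at < 0) throw new IllegalArgumentException("label not found: " + ex.value);
            String before = ex.html.substring(0, at);
            String after = ex.html.substring(at + ex.value.length());
            left = (left == null) ? before : commonSuffix(left, before);
            right = (right == null) ? after : commonPrefix(right, after);
        }
        // Heuristic (an assumption): keep only the nearest tag on each side,
        // so unrelated markup changes elsewhere do not break the wrapper.
        int lt = left.lastIndexOf('<');
        if (lt >= 0) left = left.substring(lt);
        int gt = right.indexOf('>');
        if (gt >= 0) right = right.substring(0, gt + 1);
        if (left.isEmpty() || right.isEmpty())
            throw new IllegalStateException("examples share no usable delimiters");
        Wrapper w = new Wrapper(left, right);
        for (Example ex : examples) { // the trimmed wrapper must still reproduce every label
            if (!w.extract(ex.html).equals(Optional.of(ex.value)))
                throw new IllegalStateException("wrapper is ambiguous for: " + ex.value);
        }
        return w;
    }

    static String commonSuffix(String a, String b) {
        int n = 0;
        while (n < a.length() && n < b.length()
                && a.charAt(a.length() - 1 - n) == b.charAt(b.length() - 1 - n)) n++;
        return a.substring(a.length() - n);
    }

    static String commonPrefix(String a, String b) {
        int n = 0;
        while (n < a.length() && n < b.length() && a.charAt(n) == b.charAt(n)) n++;
        return a.substring(0, n);
    }

    public static void main(String[] args) {
        // Made-up product pages with the price labeled by hand.
        Wrapper price = induce(List.of(
            new Example("<h1>Red Kettle</h1><span class=\"price\">19.99</span>", "19.99"),
            new Example("<h1>Blue Toaster</h1><span class=\"price\">34.50</span>", "34.50")));
        // Apply the wrapper to an unseen page; no XML escaping is done here.
        price.extract("<h1>Green Mixer</h1><span class=\"price\">72.00</span>")
             .ifPresent(v -> System.out.println("<product><price>" + v + "</price></product>"));
    }
}
```

A real library would presumably work on the DOM rather than on raw strings and handle several fields and repeated records, but the three steps (induce from examples, label, extract) are the same.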

Ideally, the library would be fairly robust, in that wrappers stay valid under small changes to the HTML formatting (for example, by using CSS classes as markers).

Thanks