I am building an app in Java using Jena for semantic information scraping. I am looking for an RDFa parser that correctly extracts all the RDFa statements in a page. Specifically, I need one that reports which namespaces are used and, assuming the RDFa markup in the page is correct, produces correct triples that distinguish between object properties and data properties.
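For a single triple, the only signal in the markup itself is whether the object is a literal (suggesting a data property) or a resource (suggesting an object property). The sketch below assumes the parser's output has been serialized as N-Triples, which is an assumption and not something any particular parser is known to emit here. `TripleKind` and `classify` are hypothetical names. Inside Jena the equivalent check is `RDFNode.isLiteral()` on the result of `Statement.getObject()`. This is only a heuristic: whether a predicate is an `owl:ObjectProperty` or an `owl:DatatypeProperty` is declared in the vocabulary, not in the page.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: classifies the predicate of one N-Triples line by the
 * kind of object it points to. Heuristic only, not an OWL-aware check:
 * an IRI or blank-node object suggests an object property, a literal
 * object suggests a data (datatype) property.
 */
public class TripleKind {

    public enum Kind { OBJECT_PROPERTY, DATA_PROPERTY, UNKNOWN }

    /** Classifies a line such as: <s> <p> "literal"@en . */
    public static Kind classify(String line) {
        String trimmed = line.trim();
        // Skip blank lines and comments.
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Kind.UNKNOWN;
        }
        // In N-Triples, subject and predicate never contain whitespace,
        // so the object is everything after the second token.
        String[] parts = trimmed.split("\\s+", 3);
        if (parts.length < 3) {
            return Kind.UNKNOWN;
        }
        String object = parts[2];
        if (object.startsWith("\"")) {
            return Kind.DATA_PROPERTY;      // literal, possibly typed or language-tagged
        }
        if (object.startsWith("<") || object.startsWith("_:")) {
            return Kind.OBJECT_PROPERTY;    // IRI or blank node
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> \"Alice\"@en .");
        lines.add("<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .");
        for (String l : lines) {
            System.out.println(classify(l) + "  " + l);
        }
    }
}
```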
I went through all the Java RDFa parsers listed at http://rdfa.info/wiki/Consume. They all struggle to extract any RDFa statements: the Jena RDFa parser shows plenty of errors and then dies a terrible death, and the ones that do not crash process the data incorrectly and generally mix it up, so the output is of little use. I am a newbie in this area, so please be gentle :)
I was also thinking of using a library written in a different language, but I don't really know how to plug it into Java code. Any suggestions?
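One common way to use a non-Java library from Java is to run it as a separate process and read what it prints. The sketch below assumes the external RDFa tool can be invoked from the command line and writes its triples (for example as N-Triples) to standard output. The command itself is left to the caller, and `ExternalTool`/`run` are hypothetical names, not part of any RDFa library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: runs an external (non-Java) command as a child process and
 * collects its standard output, one list element per line. Nothing here
 * is specific to any particular RDFa implementation.
 */
public class ExternalTool {

    public static List<String> run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Keep stderr separate so error messages are not mistaken for
        // triples; pass it through to this JVM's console instead.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        // Treat a non-zero exit status as a failure of the external tool.
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("Command " + command + " exited with status " + exit);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExternalTool <command> [args...]");
            return;
        }
        for (String line : run(List.of(args))) {
            System.out.println(line);
        }
    }
}
```

For example, `java ExternalTool python3 extract_rdfa.py http://example.org/page.html` would run a hypothetical `extract_rdfa.py` script. If its output is N-Triples, the collected lines could then be joined and loaded into a Jena `Model` with `model.read(new StringReader(text), null, "N-TRIPLES")`. The process boundary keeps the foreign library's crashes out of the JVM, at the cost of starting a process per page.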