Hi. Does anyone know a good Python parser for document metadata on Unix-like systems? In Java, Apache Tika is great.
No COM solutions, please :)
Thanks
If you like Tika, you could always use Jython, which lets you reference Tika's Java classes directly.
hachoir_metadata works well with Excel documents: http://bitbucket.org/haypo/hachoir/wiki/Home
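A minimal sketch of reading metadata with hachoir. It assumes the current Python 3 `hachoir` package (`pip install hachoir`), in which the old `hachoir_parser` and `hachoir_metadata` modules live on as `hachoir.parser` and `hachoir.metadata`. It uses `createParser`, `extractMetadata` and `exportPlaintext()`. The helper names `plaintext_to_dict` and `read_metadata` are hypothetical and exist only for illustration.

```python
def plaintext_to_dict(lines):
    """Turn hachoir's exportPlaintext() lines ("- Key: value") into a dict of lists."""
    result = {}
    for line in lines:
        if not line.startswith("- "):
            continue  # skip group headers such as "Metadata:"
        key, sep, value = line[2:].partition(": ")
        if sep:
            # A key can repeat (e.g. several authors), so collect values in a list.
            result.setdefault(key, []).append(value)
    return result


def read_metadata(path):
    """Hypothetical helper: return the metadata hachoir finds in `path`, or {}."""
    # Imported lazily so plaintext_to_dict() can be used without hachoir installed.
    from hachoir.parser import createParser
    from hachoir.metadata import extractMetadata

    parser = createParser(path)
    if parser is None:
        return {}  # hachoir does not recognise this file format
    with parser:  # closes the underlying file stream when done
        metadata = extractMetadata(parser)
    if metadata is None:
        return {}
    return plaintext_to_dict(metadata.exportPlaintext())
```

Going through the plain-text export keeps the sketch short, but it flattens grouped metadata (for example, separate streams in a media file) into one dict.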
You don't have to use Jython to use Tika. You can call Java from Python using JCC. You can find decent instructions for this here.
When installing JCC, you'll have to apply one of the two provided setuptools patches so that it can build shared objects. The c7 version worked for me on Ubuntu 10.04.
Another option is to use Python's subprocess module to run Tika and capture its stdout.
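A minimal sketch of the subprocess approach. It assumes a Java runtime on the PATH and a downloaded tika-app JAR. The JAR path is a placeholder. It also assumes that tika-app's `--metadata` option prints one `Key: value` line per metadata entry. The function names are hypothetical.

```python
import subprocess


def tika_metadata_command(jar_path, document_path):
    # --metadata asks tika-app to print only the metadata, not the extracted text.
    return ["java", "-jar", jar_path, "--metadata", document_path]


def parse_tika_metadata(stdout):
    """Parse "Key: value" lines into a dict of lists (keys may repeat)."""
    metadata = {}
    for line in stdout.splitlines():
        # Split on the first ": " so keys such as "dc:title" stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            metadata.setdefault(key, []).append(value)
    return metadata


def tika_metadata(jar_path, document_path):
    """Run tika-app on one document and return its parsed metadata."""
    result = subprocess.run(
        tika_metadata_command(jar_path, document_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return parse_tika_metadata(result.stdout)
```

Usage would look like `tika_metadata("/path/to/tika-app.jar", "report.pdf")`. Starting a JVM per file is slow. For large batches, a long-running Tika server or the JCC route above may be worth the extra setup.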