I am looking for a solution similar to PDFBox, which Apache Tika uses for PDFs, but for PostScript (PS) files.
Thanks.
You could use Ghostscript to convert the file to a PDF (http://www.osalt.com/ghostscript); there are then various libraries for handling PDFs.
This has the advantage that you only ever extract from PDFs, so you can handle other formats too, as long as you can convert them to PDF.
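A minimal Python sketch of the conversion step. It assumes the Ghostscript executable is on your PATH as `gs` (on Windows it is usually `gswin64c`); the helper names `build_ps_to_pdf_command` and `ps_to_pdf` are hypothetical. The resulting PDF can then be passed to PDFBox/Tika as usual.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ps_to_pdf_command(ps_path: str, pdf_path: str, gs: str = "gs") -> List[str]:
    """Build a Ghostscript command line that rewrites a PS file as a PDF."""
    return [
        gs,
        "-dSAFER",    # restrict file operations performed by the PostScript program
        "-dBATCH",    # exit when done instead of dropping into the interactive prompt
        "-dNOPAUSE",  # do not pause between pages
        "-sDEVICE=pdfwrite",           # Ghostscript's PDF output device
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


def ps_to_pdf(ps_path: str, pdf_path: Optional[str] = None, gs: str = "gs") -> str:
    """Convert ps_path to PDF and return the output path (defaults to same name, .pdf)."""
    if pdf_path is None:
        pdf_path = str(Path(ps_path).with_suffix(".pdf"))
    # check=True raises CalledProcessError if Ghostscript reports a failure
    subprocess.run(build_ps_to_pdf_command(ps_path, pdf_path, gs), check=True)
    return pdf_path
```

Usage would be something like `pdf = ps_to_pdf("report.ps")`. Ghostscript also ships a `ps2pdf` wrapper script that performs the same conversion if you prefer to call that instead.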
Like James Black says, it's probably best just to convert to PDF and use your familiar tools.
However, there is also pstotext, which is available as its own package in, e.g., Ubuntu's universe repository.
Ghostscript itself also comes with ps2txt and ps2ascii, which can do this as well.
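If you would rather skip the PDF step, a small Python sketch can shell out to one of these tools directly. It assumes `pstotext` or Ghostscript's `ps2ascii` is installed, and that each prints the extracted text to stdout when given only an input file (check the man pages on your system). The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def build_extract_command(
    ps_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Pick a text-extraction command: prefer pstotext, fall back to ps2ascii."""
    if which("pstotext"):
        return ["pstotext", ps_path]
    if which("ps2ascii"):
        return ["ps2ascii", ps_path]
    raise FileNotFoundError("neither pstotext nor ps2ascii was found on PATH")


def extract_text(ps_path: str) -> str:
    """Run the chosen tool and return whatever it wrote to stdout."""
    cmd = build_extract_command(ps_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

The `which` parameter exists only so the tool lookup can be swapped out (for example in tests); in normal use `extract_text("report.ps")` is enough.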