Hi all, I am relatively new to the wonderful world of Solr and have the following question: what is the best way to process documents, in terms of extracting the document structure and passing it on to Solr for indexing?
I would like to be able to extract the text from Word documents, PDFs, spreadsheets, HTML pages, etc.; in fact, virtually any document that contains text.
I have taken a look at Windows Filters and at first glance they seem to provide the functionality I require.
Is this how you would do it?
sime