I have a digital library system where I store metadata and the path to the physical file in the database. The files may be anything: plain text, Word, PDF, MP3, JPEG, MP4...

How can I provide full-text search across both my domain objects and the physical files (or some text extraction of the files)?

Is my only choice to store the document text in the domain object? I need to be able to retrieve a list of domain objects regardless of whether a search hit comes from the domain object or from the physical document. The two can be connected through the file path, and I also drop each document in a folder named by a GUID, so the connection is there.

I need to do this in Grails, ideally using the Solr or Searchable plugin, but a Java solution would help.

A: 

You don't need to store the content in the domain object; just associate the content with the domain object when creating the index entry. I used Apache POI to extract my content, but there are higher-level services such as Apache Tika.

You could code it in Java using Lucene directly, but I would suggest Solr instead.

The Grails Searchable plugin is based on Compass, which is itself based on Lucene.
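To make the idea of associating file content with the domain object at index time more concrete, here is a minimal sketch in plain Java. It is not Solr, Searchable, or Lucene code: the class `LibraryIndexSketch`, its methods, and the id `guid-1` are hypothetical names invented for illustration, the in-memory map stands in for a real index, and `extractText` assumes plain-text files where a real setup would call an extractor such as Apache Tika. The point it shows is that metadata and extracted text are indexed under the same domain id, so a hit on either one leads back to the same domain object.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a real index such as Solr or Lucene.
class LibraryIndexSketch {

    // term -> ids of the domain objects whose metadata or file text contains it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one domain object: its metadata plus the text of the file it points to.
    // Both are stored under the same domain id, so a hit on either returns that id.
    void index(String domainId, String title, Path file) throws IOException {
        addTerms(domainId, title);
        addTerms(domainId, extractText(file));
    }

    // Assumption: plain-text files only. A real extractor (for example Apache Tika)
    // would be called here to handle Word, PDF, and other formats.
    static String extractText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Lower-case the text, split on non-alphanumeric characters, record each term.
    private void addTerms(String domainId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(domainId);
            }
        }
    }

    // Returns the ids of matching domain objects; the caller loads them from the database.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("doc", ".txt");
        Files.write(file, "Notes on whale migration".getBytes(StandardCharsets.UTF_8));

        LibraryIndexSketch index = new LibraryIndexSketch();
        index.index("guid-1", "Field report", file);

        System.out.println(index.search("report"));    // matched via the metadata
        System.out.println(index.search("migration")); // matched via the file content
        Files.delete(file);
    }
}
```

With a real search server the shape stays the same: build one index document per domain object, add the metadata fields and one field holding the extracted text, and use the domain id as the document key.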

Aaron Saunders
Thanks for the reply. Can you give me an example or point me to the right documentation?
Brad Rhoads
@Brad Rhoads I updated the answer with links to additional content.
Aaron Saunders
Thanks. I'm aware of Searchable; it seems it's being replaced by the Solr plugin. What I'm looking for is an example of *how* to get the physical docs indexed. E.g., how do I tell Solr (or Searchable) to 1) index my domain object and 2) follow the path to the physical document pointed to by Domain.uri and index the physical document along with the domain object? (1) is clearly documented, but I'm looking for help with (2). Thanks.
Brad Rhoads