Hi All,

I am planning to use Apache Lucene in one of my projects. I want to index files based on their properties (I won't be indexing the file contents), and I want Lucene to query the index so that I can quickly find a list of files based on those properties.

E.g.: give me all the files with an access time greater than 10/10/2005 and less than 10/04/2010 that were created by james.

Can I use Lucene for this kind of project? Or am I better off using Windows Search? Its footprint is very heavy (almost 5 MB :( ), and bundling it as part of my application seems tough.

Can you please suggest whether there are any better alternatives here?

A: 

Lucene is definitely a feasible option for indexing file properties. I have done something very similar in the past (searching for images based on image properties).

I am slightly concerned about how you will get the properties. Are you planning on using the APIs within the File class? Glancing quickly, I find that those APIs are very limiting; as a result, you will probably be getting these properties elsewhere and storing them in an intermediate medium.
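For what it is worth, on Java 7 and later the `java.nio.file` package exposes more than `java.io.File` does, including creation time, last access time and file owner. Below is a minimal sketch of collecting those three properties and checking an access-time window like the one in the question. The names `FileProps`, `Props`, `read` and `accessedBetween` are hypothetical, chosen only for illustration. It also assumes the file system records owners and access times; some file systems do not update the last access time, in which case the value returned may not be meaningful.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;

// Hypothetical helper: gathers the properties that would later be fed to an index.
final class FileProps {

    // Plain value holder for the properties of one file.
    static final class Props {
        final String path;
        final String owner;
        final Instant created;
        final Instant lastAccess;

        Props(String path, String owner, Instant created, Instant lastAccess) {
            this.path = path;
            this.owner = owner;
            this.created = created;
            this.lastAccess = lastAccess;
        }
    }

    // Reads owner, creation time and last access time using standard NIO calls.
    static Props read(Path file) {
        try {
            BasicFileAttributes attrs =
                    Files.readAttributes(file, BasicFileAttributes.class);
            String owner = Files.getOwner(file).getName();
            return new Props(file.toString(), owner,
                    attrs.creationTime().toInstant(),
                    attrs.lastAccessTime().toInstant());
        } catch (IOException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    // True if the last access time lies strictly between 'from' and 'to'.
    static boolean accessedBetween(Props p, Instant from, Instant to) {
        return p.lastAccess.isAfter(from) && p.lastAccess.isBefore(to);
    }

    public static void main(String[] args) {
        Props p = read(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println(p.path + " owner=" + p.owner
                + " created=" + p.created + " lastAccess=" + p.lastAccess);
    }
}
```

Values collected this way could then be written to whichever index you choose, so the intermediate medium may not be needed if these attributes cover your requirements.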

An alternative to Lucene is Sphinx. It seems more lightweight (based on my experience and observation, Lucene is better for larger datasets, in the millions range). I have never worked with Sphinx, but I have heard good things. It might be worthwhile to investigate before you commit.

Cambium
I don't have much expertise on the Java side. I will see whether Java has support for all the file properties I need; otherwise I will call a native Windows API from Java to get the file properties. I will also look into Sphinx. Thanks very much for the quick response.
sneha
A: 

Can I use Apache POI or Apache Tika so that I don't need to do a lot of processing?

sneha