I've been working on a site idea. The general concept is a full-text search of documents that also allows user ratings, and I want to use those ratings to boost an item's value in the Lucene index. I'm trying to work out whether I should extend JackRabbit or build directly on Lucene. Is there a good way to extend JackRabbit like this and affect the index, or would it be best to work directly with Lucene?

Either way, I am strongly leaning toward Groovy on Grails, with either the Searchable plugin or JackRabbit directly. Are there any major reasons I should just stick to Java?

Clarification:

I would like to boost an item based on its average user rating. Is JackRabbit open or extensible enough that I can capture user ratings and have them affect the index within JackRabbit, or is this so far outside the core of JackRabbit that I should just build up from Lucene?

+1  A: 

Are there any major reasons I should just stick to Java?

Not really. As you probably already know, you can use any Java library with Groovy/Grails, so there's nothing you can do in Java that you can't do in Groovy. The converse is also true, although in my experience it takes a lot more (boilerplate) code to get things done in Java.

Although Java is considerably faster than Groovy, this doesn't necessarily mean your app will be faster if written in Java, as the bottleneck could well be the database rather than code execution.

As for whether you should use Lucene/Searchable or JackRabbit, it's very difficult to say without knowing more about what you want to achieve. All you've told us so far is that you want to index documents and boost certain items in the index. You can certainly do both of those with Lucene.
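To make the "boost by average rating" idea concrete, here is a minimal sketch in plain Java of how an average rating could be turned into a multiplicative boost factor. Everything in it is an assumption for illustration: the 1 to 5 star scale, the linear mapping, and the names `RatingBoostSketch`, `averageRating`, `boostFor`, and `boostedScore` are hypothetical and are not part of Lucene or JackRabbit. How the resulting factor is handed to Lucene (at index time or at query time) depends on the Lucene version and is deliberately left out.

```java
import java.util.List;

/** Hypothetical helper: maps an average user rating to a multiplicative boost. */
final class RatingBoostSketch {

    // Assumed rating scale (not stated in the question): 1 to 5 stars.
    static final double MIN_RATING = 1.0;
    static final double MAX_RATING = 5.0;

    /** Average of the ratings, or the scale midpoint when nobody has voted yet. */
    static double averageRating(List<Integer> ratings) {
        if (ratings.isEmpty()) {
            return (MIN_RATING + MAX_RATING) / 2.0;
        }
        double sum = 0.0;
        for (int rating : ratings) {
            sum += rating;
        }
        return sum / ratings.size();
    }

    /** Linear map: 1 star gives 0.5, 3 stars give 1.0 (neutral), 5 stars give 1.5. */
    static double boostFor(double averageRating) {
        // Clamp so that out-of-range input cannot produce an extreme boost.
        double clamped = Math.max(MIN_RATING, Math.min(MAX_RATING, averageRating));
        return 0.5 + (clamped - MIN_RATING) / (MAX_RATING - MIN_RATING);
    }

    /** Combines a text-relevance score with the rating boost. */
    static double boostedScore(double textRelevance, double averageRating) {
        return textRelevance * boostFor(averageRating);
    }

    public static void main(String[] args) {
        double average = averageRating(List.of(5, 4, 5));
        System.out.println("boosted score: " + boostedScore(2.0, average));
    }
}
```

Keeping the neutral point at 1.0 means unrated items are neither rewarded nor penalised, and the narrow 0.5 to 1.5 range stops ratings from completely overriding text relevance. Both choices are tunable assumptions.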

Don
I have tried to clarify my question; the main question is JackRabbit vs. Lucene. With the Groovy question I'm just double-checking that there is no gotcha with either JackRabbit or Lucene.
Jeff Beck
+1  A: 

I would recommend using JCR/Jackrabbit on top of Lucene for a couple of reasons:

1) Your repository structure could readily support document nodes with child nodes that store all of your meta-data including owner, ratings, flagging, comments, etc.

2) JCR is ideal for document/node based app development, providing a lot of the heavy lifting at the framework level while not getting in your way at the app level.

Lance Weber
So is there a way to make a child node that holds metadata effectively boost the parent node in search?
Jeff Beck
+1  A: 

I recommend using JCR, with Jackrabbit as the implementation behind it. JCR lets you separate what you store from how you store it.

By staying within the JCR framework, you should be able to switch easily among JCR implementations. (There are several, not just Apache's.) Even within Jackrabbit there are several persistence managers to choose from for storage, while Lucene handles the search index. This flexibility is useful when you want to trade off storage space against performance.

JCR already includes full-text search, and user ratings can be stored as ordinary properties on your nodes. It should be a good fit for your project.

Chip Uni
+1  A: 

I would recommend Apache Sling; it comes with Jackrabbit/Lucene built in. Most of the committers are also involved with Jackrabbit, so it's designed to work well with it. Even better, it's designed to run on top of it.

One of the nice features of Sling is that it mounts the entire JCR repository in the URL space and exposes it via REST endpoints, so you can access your documents and metadata very easily with a simple HTTP request. It also allows you to write your own servlets and expose them as REST endpoints. (This is extremely easy: no fiddling about with applicationContext.xml files, just one annotation.)

It also lets you write scripts in JSP, ESP, Groovy, ...

Simon
Sounds interesting. How does it let you affect the search results?
Jeff Beck
I don't think you need to extend Jackrabbit/Lucene for this. I would probably add a property on the item called 'my:score' and increase its value each time positive feedback is left. Then I would run a standard query and order the items descending on 'my:score'. To keep things fast, you would probably have to create an index for the 'my:score' property.
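A rough sketch of that approach in plain Java, so it runs without a repository: `Item` is a hypothetical in-memory stand-in for a node carrying 'my:score', and `orderByScoreDescending` mimics what the repository's ordering would return. `buildQuery` shows what such a query might look like, assuming JCR 2.0's JCR-SQL2 syntax and the generic `nt:base` node type; both are assumptions, and on a JCR 1.0 repository an XPath or SQL query would be needed instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the 'my:score' approach; no JCR dependency. */
final class ScorePropertySketch {

    /** In-memory stand-in for a repository item with a numeric 'my:score' property. */
    static final class Item {
        final String path;
        long score; // mirrors the 'my:score' property

        Item(String path, long score) {
            this.path = path;
            this.score = score;
        }

        /** Called each time positive feedback is left on the item. */
        void addPositiveFeedback() {
            score++;
        }
    }

    /** Builds a full-text query ordered by 'my:score' (JCR-SQL2 syntax assumed). */
    static String buildQuery(String searchTerm) {
        // Double single quotes so the term cannot break out of the string literal.
        String escaped = searchTerm.replace("'", "''");
        return "SELECT * FROM [nt:base] AS item"
                + " WHERE CONTAINS(item.*, '" + escaped + "')"
                + " ORDER BY item.[my:score] DESC";
    }

    /** Returns a copy of the matches sorted the way the ORDER BY clause would. */
    static List<Item> orderByScoreDescending(List<Item> matches) {
        List<Item> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingLong((Item item) -> item.score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Item first = new Item("/docs/a", 1);
        Item second = new Item("/docs/b", 1);
        second.addPositiveFeedback();

        System.out.println(buildQuery("lucene"));
        for (Item item : orderByScoreDescending(List.of(first, second))) {
            System.out.println(item.path + " score=" + item.score);
        }
    }
}
```

Note that this orders purely by score and ignores text relevance, so a highly rated but weakly matching document would outrank a strong match with fewer votes.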
Simon