lucene

Can't search in a certain field using Solr

Hi, I'm setting up an environment using Nutch 1.0 + Solr 1.4. In Nutch I configured the subcollection plugin, which seems to work nicely. If I search as normal, adding fl=*, I can see the subcollection field is filled as intended (something like <str name="subcollection">mysite.com</str>). My problem is, I would like to be able to sear...

How to count term frequency for set of documents?

I have a Lucene index with the following documents: doc1 := { caldari, jita, shield, planet } doc2 := { gallente, dodixie, armor, planet } doc3 := { amarr, laser, armor, planet } doc4 := { minmatar, rens, space } doc5 := { jove, space, secret, planet } So these 5 documents use 14 different terms: [ caldari, jita, shield, planet, gallente...
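
A minimal sketch of the counting itself, in plain Java collections rather than the Lucene API. It assumes the documents are already available as term lists and counts, for each term, how many documents contain it. The class name TermCounts and its methods are hypothetical, and the excerpt is cut off, so the exact set of documents the asker wants to count over is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
final class TermCounts {

    /** Counts, for each term, how many of the given documents contain it. */
    static Map<String, Integer> countTerms(Collection<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> doc : docs) {
            // The set makes sure a term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The five documents listed in the question. */
    static List<List<String>> sampleDocs() {
        List<List<String>> docs = new ArrayList<>();
        docs.add(Arrays.asList("caldari", "jita", "shield", "planet"));
        docs.add(Arrays.asList("gallente", "dodixie", "armor", "planet"));
        docs.add(Arrays.asList("amarr", "laser", "armor", "planet"));
        docs.add(Arrays.asList("minmatar", "rens", "space"));
        docs.add(Arrays.asList("jove", "space", "secret", "planet"));
        return docs;
    }
}
```

Passing only a subset of the lists (for example the hits of a query) gives the counts for that subset.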

How to index a string like "aaa.bbb.ddd-fff" in Lucene?

Hi, I have to index a lot of documents that contain reference numbers like "aaa.bbb.ddd-fff". The structure can change, but it's always some arbitrary numbers or characters combined with "/", "-", "_" or some other delimiter. The users want to be able to search for any of the substrings like "aaa" or "ddd" and also for combinations like "aaa...
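
One possible approach, sketched in plain Java without the Lucene API: split the reference on every non-alphanumeric character and keep the whole reference as an extra token, so both the parts and the full string can be matched. The class ReferenceTokens is hypothetical. In a real index this logic would live in a custom analyzer or tokenizer, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class ReferenceTokens {

    /** Returns the full reference followed by its delimiter-separated parts. */
    static List<String> tokens(String reference) {
        List<String> out = new ArrayList<>();
        out.add(reference); // keep the whole reference for exact matches
        // Any run of characters other than letters and digits is a delimiter.
        for (String part : reference.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) { // skip the empty piece before a leading delimiter
                out.add(part);
            }
        }
        return out;
    }
}
```

For "aaa.bbb.ddd-fff" this yields the full string plus "aaa", "bbb", "ddd" and "fff".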

Zend Lucene search relevancy

What are the best practices for configuring Zend Lucene to make the search results more relevant? I have the following fields and types: productname (Text), description (Text), category (Keyword). Please give some sample code. ...

Apache Lucene or another Search in iPhone app

Hi, I would like to implement search functionality within my iPhone app that can search for terms within all the documents in the application. I believe I cannot use Apache Lucene directly since it is in Java. Can I use Lucy, which is a C port of Lucene (not sure if Perl and Ruby would work on it)? Or is there any other open-source s...

How to get the offset of a term in Lucene?

I want to get the offset of a term in Lucene. How can I get it? I indexed my content with Field.TermVector.WITH_POSITIONS_OFFSETS. Is there any method in Lucene that gives me the offset of the term in a document? ...

Lucene 2.2 Arabic analyzer

Is it possible to modify Lucene 2.2 to add an Arabic analyzer, and if anyone has done this already, where can I get the source/jar? ...

Lucene DuplicateFilter question

Hi, why doesn't DuplicateFilter work together with other filters? For example, after a small modification of the DuplicateFilterTest test, it appears that the filter is not applied together with the other filters and instead trims the results first: public void testKeepsLastFilter() throws Throwable { DuplicateFilter df = new DuplicateFi...

Custom Solr sorting

Hello everyone, I've been asked to do an evaluation of Solr as an alternative to a commercial search engine. The application currently has a very particular way of sorting results using something called "buckets". I'll try to explain in a bit of detail: In the interface they have 2 fields: "what" and "where". Both fields are actually ...

Rollback in Lucene

Is there a rollback in Lucene? I'm saving and updating the database repository and the Lucene repository simultaneously so that the Lucene index and the database are in sync, e.g. CustomerRepository.add(customer); SupplierRepository.add(supplier); CustomerLuceneRepository.add(customer); SupplierLuceneRepository.add(supplier); // If this here fails i...

Solr/Lucene user click based ranking

I am facing the problem of sorting Lucene results based on a user click log. I would like the most accessed results to come first. Does anyone know how to configure or implement such a feature in Lucene or Solr? Thank you very much. ...

Solr DataImportHandler, multiple results of the same type?

Hey guys, some help here would as always be greatly appreciated. I'm indexing data from a db using Solr. Each row in the first table, event_titles, can have more than one start date associated with it, contained in the table event_dates. Data-config is as follows: <entity name="events" query="select id,title_id,name,summary,descripti...

SOLR and Natural Language Parsing - Can I use it?

Hey guys, my requirements are pretty similar to this: Requirements: http://stackoverflow.com/questions/90580/word-frequency-algorithm-for-natural-language-processing Using Solr: While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP. I thought of SOL...

Lucene based database search engine

Hi all, I am planning to add a search feature to my web application. I am using the Struts 2 framework for the application, and the items that will be searched are stored in a relational database. In order to build a full-text search engine I have the following questions: For a database-based search engine, should I use just Lucene or some oth...

Apache Solr: sum of data resulting from group by

Hi, we have a requirement where we need to group our records by a particular field and take the sum of a corresponding numeric field, e.g. select userid, sum(click_count) from user_action group by userid; We are trying to do this using Apache Solr and found that there were 2 ways of doing this: Using the field collapsing feature (htt...
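
To make the expected result concrete, here is a small sketch of what the SQL above computes, written as client-side aggregation in plain Java. It does not use any Solr feature. The class ClickSums and its nested Action type are hypothetical stand-ins for rows of user_action.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr.
final class ClickSums {

    /** One row of user_action: a user id and its click count. */
    static final class Action {
        final String userId;
        final int clickCount;

        Action(String userId, int clickCount) {
            this.userId = userId;
            this.clickCount = clickCount;
        }
    }

    /** Equivalent of: select userid, sum(click_count) ... group by userid. */
    static Map<String, Integer> sumByUser(List<Action> actions) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Action a : actions) {
            // Add this row's clicks to the running total for its user.
            totals.merge(a.userId, a.clickCount, Integer::sum);
        }
        return totals;
    }
}
```

Doing this on the client only makes sense for small result sets, which is presumably why the question looks for a server-side option.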

Is there a way I can provide Lucene.NET with a list of predefined relevant terms?

I know I can, during search, specify a "boost factor" to a term as described in http://lucene.apache.org/java/2_4_0/queryparsersyntax.html. My question is: Can I provide Lucene with a predefined table of relevance? For instance, I could say that "chair" and "table" are relevant words with a boost factor of 4 and all subsequent searches...

Lucene search taking TOOO long.

I'm using Lucene.NET (2.9.2.2) on a (currently) 70 GB index. I can do a fairly complicated search and get all the document IDs back in 1 to 2 seconds. But to actually load up all the hits (about 700 thousand in my test queries) takes 5+ minutes. We aren't using Lucene for UI; this is a datastore between processes where we have hundreds...

Advice on reading indexes

Hello, I'm trying to figure out the right way to read a Lucene index only once while running the application multiple times. How can I do that in Java? The indexed data will not change, so reading it each time should not be necessary. Can someone explain to me the logic of reading it only once? Thank you. UPDATE: public List ini...

Problems using Lucene Highlighter

I am using Lucene Highlighter 2.4.1 for my application. I use the highlighter to get the best matching fragments and display them. I make a call to a function String[] getFragmentsWithHighlightedTerms(Analyzer analyzer, Query query, String fieldName, String fieldContents, int fragmentsNumber, int fragmentSize). For example: String te...

Lucene - escape word?

I am playing with Lucene for a location search based on city and state, and everything is going pretty well. The query parser fails when I pass it "state:OR" and disregards "state:or". Is there a way to tell the searcher/query parser that I am indeed searching for "OR"? Thanks. ...
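
A minimal sketch of one workaround, assuming the classic Lucene query parser syntax, where the upper-case words AND, OR and NOT are operators and a value in double quotes is treated as a term or phrase. The helper below only builds the query string. The class QueryTerms is hypothetical and does not call Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class QueryTerms {

    // Upper-case words the classic query parser reads as boolean operators.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    /** Builds field:value, quoting the value when it would parse as an operator. */
    static String fieldQuery(String field, String value) {
        if (OPERATORS.contains(value)) {
            return field + ":\"" + value + "\"";
        }
        return field + ":" + value;
    }
}
```

With this, fieldQuery("state", "OR") produces state:"OR". The lower-case "state:or" being ignored is probably a separate issue: "or" is in the default English stop word list, so an analyzer with stop words would drop it. Indexing and querying the state field without analysis (as a keyword) is a common way around that, though whether it fits here is an assumption.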