lucene

Hibernate Search for multiple classes

There are two tables with no relation defined, e.g. Bugs and Comments. For each bug id there are multiple comments. Suppose I am using a query like select b.bugid, b.bugtitle, c.comment from bugs b, comments c where b.bugid = c.bugid. In Hibernate Search, is there any method to write text queries for multi-field searches? In case, for a...
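One common way to make a join like this searchable as text is to denormalize it: build one index document per bug that carries both the title and all of its comments, so a single multi-field query can match either field. The sketch below models that idea in plain Python; the dictionaries stand in for index documents, and the helper names `build_docs` and `search` are hypothetical, not Hibernate Search API.

```python
from collections import defaultdict

def build_docs(bugs, comments):
    """Mimic the SQL join: one document per bug, comments concatenated."""
    by_bug = defaultdict(list)
    for bug_id, text in comments:
        by_bug[bug_id].append(text)
    return [
        {"bugid": bug_id, "bugtitle": title, "comment": " ".join(by_bug[bug_id])}
        for bug_id, title in bugs
    ]

def search(docs, word, fields=("bugtitle", "comment")):
    """Return ids of bugs where `word` occurs as a token in any listed field."""
    word = word.lower()
    return [d["bugid"] for d in docs
            if any(word in d[f].lower().split() for f in fields)]

bugs = [(1, "Crash on save"), (2, "Slow startup")]
comments = [(1, "happens with large files"),
            (2, "profiling shows disk crash logs"),
            (1, "fixed in trunk")]
docs = build_docs(bugs, comments)
print(search(docs, "crash"))  # [1, 2]
```

Bug 1 matches through its title and bug 2 through a comment, which is the behaviour a multi-field query over the denormalized document gives.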

Lucene number extracting

Hi, I have this number-extracting problem. I want to get all matches that don't have a certain number in them, e.g. 125501874, 125001873. Every number that has 55 at position 2 is not to be considered. The first number's range is 0 to 9 and the second's is 1-9, so the real range is [01-99] (we cannot have 00 as the first two digits). With L...
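The exclusion rule itself can be pinned down with a small predicate before worrying about how to express it in a query. This sketch assumes "position 2" means the zero-based offset 2 (the third and fourth digits), since that is the reading under which 125501874 is excluded and 125001873 is kept; it is plain Python regex, not a Lucene query.

```python
import re

# Assumption: "position 2" = zero-based offset 2, i.e. digits 3 and 4.
EXCLUDED = re.compile(r"^\d{2}55")

def keep(number: str) -> bool:
    """True if the number does NOT have '55' starting at offset 2."""
    return EXCLUDED.match(number) is None

numbers = ["125501874", "125001873"]
print([n for n in numbers if keep(n)])  # ['125001873']
```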

MediaWiki + Lucene: How To Strip Markup?

Hi, I have the Lucene search extension (http://www.mediawiki.org/wiki/Extension_talk:Lucene-search) integrated with my MediaWiki installation. It's all working really well; however, Lucene seems to have indexed all the MediaWiki/HTML markup as well, and it is showing up in the results, i.e. searching for "green" will return results with...

How does Lucene compute multifield score?

Here's Lucene's scoring equation: score(q,d) = coord(q,d) · queryNorm(q) · ∑ over t in q of ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) ). What about multi-field scoring? Does the score get summed directly, or averaged, or...? ...
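To make the formula concrete, here is a small worked version in Python. It assumes the classic DefaultSimilarity definitions from the Lucene 2.x/3.x era (tf = √freq, idf = 1 + ln(numDocs/(docFreq+1)), norm = 1/√numTerms, queryNorm = 1/√(sum of squared idf weights), coord = matching clauses / total clauses), sets every boost to 1, and ignores the lossy one-byte encoding Lucene uses for norms on disk. Under those assumptions a query over several fields is a BooleanQuery whose clauses are field-specific terms, so the clause contributions are summed and then scaled by coord rather than averaged.

```python
import math

# Classic DefaultSimilarity formulas (assumed), all boosts = 1.
def tf(freq):
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # norm(t,d) without index-time boosts.
    return 1.0 / math.sqrt(num_terms)

def score(clauses, doc, num_docs):
    """Score one document for a BooleanQuery of (field, term, doc_freq) clauses.

    `doc` maps field name -> list of tokens. A multi-field query is a set of
    clauses on different fields; their contributions are summed.
    """
    weights = [idf(df, num_docs) for _, _, df in clauses]
    query_norm = 1.0 / math.sqrt(sum(w * w for w in weights))
    total, matched = 0.0, 0
    for (field, term, _), w in zip(clauses, weights):
        tokens = doc.get(field, [])
        freq = tokens.count(term)
        if freq == 0:
            continue
        matched += 1
        total += tf(freq) * w * w * length_norm(len(tokens))  # tf · idf² · norm
    coord = matched / len(clauses)
    return coord * query_norm * total

doc = {"title": ["lucene", "in", "action"], "body": ["lucene", "lucene", "scoring"]}
both = [("title", "lucene", 2), ("body", "lucene", 5)]
print(score(both, doc, num_docs=10))
```

Dropping the body field from the document both removes that clause's term and halves coord, which is why matching in more fields raises the score more than linearly.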

Does StackOverflow use Lucene for tagged searches?

How has SO implemented the tagged search? Is it using Lucene or any other open-source search engine library for tagged searching? What is the best way to search documents (PDF, XML, HTML, MS Word) or a database? ...

What should go in my Lucene document?

I use Lucene.Net to index content, documents, etc. on our CMS. This has worked well so far, but now I have to take the following additions to web pages into account: publish date, expiry date, page 'is active', and user authorisation. So the search results should only show pages that are within the publish/expiry window, are 'active' an...
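Range checks over string terms compare lexicographically, so dates are usually indexed in a fixed-width form such as yyyymmdd (Java Lucene's DateTools class produces strings of this kind). The sketch below shows why that works for the publish/expiry window plus the 'is active' flag; it filters plain Python dictionaries, the field names are hypothetical, and user authorisation is left out.

```python
from datetime import date

def encode(d: date) -> str:
    """Encode a date as yyyymmdd so string order equals date order."""
    return d.strftime("%Y%m%d")

def visible(page: dict, today: date) -> bool:
    """Publish/expiry window check plus the 'is active' flag."""
    now = encode(today)
    return page["active"] == "true" and page["publish"] <= now <= page["expiry"]

pages = [
    {"id": "a", "publish": encode(date(2009, 1, 1)), "expiry": encode(date(2009, 12, 31)), "active": "true"},
    {"id": "b", "publish": encode(date(2010, 1, 1)), "expiry": encode(date(2010, 12, 31)), "active": "true"},
    {"id": "c", "publish": encode(date(2009, 1, 1)), "expiry": encode(date(2009, 12, 31)), "active": "false"},
]
print([p["id"] for p in pages if visible(p, date(2009, 6, 15))])  # ['a']
```

Because every encoded date has the same width, "20090615" sorts between "20090101" and "20091231" exactly as the dates do.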

How do I pass a list of 'allowed' IDs to filter a Lucene search?

I need to return just the documents that a user has access to from a Lucene search. I can get a list of IDs from a database that make up the 'allowed' subset. How can I pass these to Lucene? The articles I've found on the web suggest I need to use a BitSet and FieldCache (am I right?), but I'm having trouble finding good examples. Does a...
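The BitSet plus FieldCache idea can be modelled in a few lines: look up the stored database id of every internal document number once (the FieldCache role), set a bit for each document whose id is in the allowed list, and keep only hits whose bit is set. This is a Python model of the technique, not Lucene API; in Java it would roughly correspond to a custom Filter, and `id_per_doc` and the helper names are hypothetical.

```python
def allowed_bits(allowed_ids, id_per_doc):
    """One bool per internal doc number: True if that doc's id is allowed.

    `id_per_doc[n]` plays the FieldCache role: the stored database id of
    internal document number n.
    """
    allowed = set(allowed_ids)
    return [doc_id in allowed for doc_id in id_per_doc]

def filter_hits(hits, bits):
    """Keep only hits (internal doc numbers) whose bit is set."""
    return [n for n in hits if bits[n]]

id_per_doc = ["17", "42", "8", "99"]  # doc number -> database id (hypothetical)
bits = allowed_bits(["42", "99"], id_per_doc)
print(filter_hits([0, 1, 3], bits))  # [1, 3]
```

Building the bit array costs one pass over the index, after which each search only needs a constant-time lookup per hit, which is why this approach is preferred over filtering entities one by one.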

Zend_Search_Lucene crashes during indexing

I wanted to create a search engine for my webpage, but during indexing on the server it crashes with these errors: Warning: opendir(/admin/lucene/) [function.opendir]: failed to open dir: Too many open files in /admin/includes/Zend/Search/Lucene/Storage/Directory/Filesystem.php on line 159 Warning: readdir(): supplied argument is not a valid...

Sorting in lucene.net

I have a Lucene index with a field that needs to be sorted on. I have my query and I can make my Sort object. If I understand the javadoc correctly, I should be able to do query.SetSort(). But there seems to be no such method... I'm sure I'm missing something vital. Any suggestions? ...

How to find "FooBar" when searching "Foo Bar" in Zend Lucene

I'm building a search function for a PHP website using Zend Lucene and I'm having a problem. My website is a shop directory (something like that). For example, I have a shop named "FooBar", but my visitors search for "Foo Bar" and get zero results. Also, if a shop is named "Foo Bar" and a visitor searches "FooBar", nothing is found. I tried t...
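One possible approach, offered as an assumption rather than the accepted answer, is to index an extra normalized form of each shop name with case and whitespace removed, and normalize the visitor's query the same way before matching. The Python sketch below only handles whole-name queries; partial matches would need something like n-grams. The `squash` helper and the extra field are hypothetical.

```python
import re

def squash(text: str) -> str:
    """Lowercase and drop whitespace so 'Foo Bar' and 'FooBar' collide."""
    return re.sub(r"\s+", "", text.lower())

shops = ["FooBar", "Foo Bar", "Baz Shop"]
# Hypothetical extra field: the squashed name stored next to the normal one.
index = {name: squash(name) for name in shops}

def find(query: str):
    q = squash(query)
    return [name for name, key in index.items() if key == q]

print(find("Foo Bar"))  # ['FooBar', 'Foo Bar']
print(find("foobar"))   # ['FooBar', 'Foo Bar']
```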

Why isn't this encoding strategy in my Lucene index working?

In my datasource there are a lot of special characters like forward slash, minus, plus, etc. A lot of these characters cause problems for Lucene. That's why I decided to encode all the strings I put in the index. For example, apple/pear would become apple%2Fpear. I would imagine that searching for the very same string would then return me t...
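The encoding described here is ordinary percent-encoding, shown below in Python. Note that `quote()` leaves "/" unencoded by default, so `safe=""` is needed to get apple%2Fpear. For the round trip to work in a search, the same encoding has to be applied at index time and at query time, and the encoded string still contains characters such as "%" that some analyzers may split on; whether that happens depends on the analyzer in use.

```python
from urllib.parse import quote, unquote

# safe="" so "/" is encoded too; by default quote() keeps "/" as-is.
encoded = quote("apple/pear", safe="")
print(encoded)           # apple%2Fpear
print(unquote(encoded))  # apple/pear
```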

Lucene Performance: Retrieve all document from Searcher

I have approximately 10 million objects indexed using NIOFSDirectory. When I retrieve documents with MatchAllDocsQuery, the performance is significantly worse than with other types of queries, such as BooleanQuery. I ran some tests; performance is approximately 100 times worse. Since I am only interested in the top n documents anyway, is...

Linq to Lucene: "The predicate of a Lucene Term can not be the empty string."

I am trying to implement Linq to Lucene in my project, but when I search for something, I always get an "Enumeration yielded no results" result, and when I debug and try to open my [IndexContext].[TableProperty] in the Watch window, I get this message: The predicate of a Lucene Term can not be the empty string. ...

How to sort search results on multiple fields using a weighting function?

I have a Lucene index where every document has several fields which contain numeric values. Now I would like to sort the search results on a weighted sum of these fields. For example: field1=100, field2=002, field3=014. And the weighting function looks like: f(d) = field1 * 0.5 + field2 * 1.4 + field3 * 1.8. The results should be ordered ...
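For the example values the weighted sum works out to 100 · 0.5 + 2 · 1.4 + 14 · 1.8 = 50 + 2.8 + 25.2 = 78. The sketch below computes f(d) for each document and sorts by it in descending order. It runs outside Lucene on plain dictionaries (inside Lucene this would need custom scoring or a custom sort); the second document is invented for illustration.

```python
WEIGHTS = {"field1": 0.5, "field2": 1.4, "field3": 1.8}

def weighted(doc: dict) -> float:
    """f(d) = field1*0.5 + field2*1.4 + field3*1.8 (values stored as strings)."""
    return sum(int(doc[f]) * w for f, w in WEIGHTS.items())

docs = [
    {"id": "d1", "field1": "100", "field2": "002", "field3": "014"},
    {"id": "d2", "field1": "010", "field2": "050", "field3": "001"},  # hypothetical
]
ranked = sorted(docs, key=weighted, reverse=True)
print([d["id"] for d in ranked])  # ['d1', 'd2']
```

Here d2 scores 5 + 70 + 1.8 = 76.8, so d1 (78) ranks first.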

Security (aka Permissions) and Lucene: How? Should it be done?

First, some background to my question. Individual entities can have read permissions. If a user fails a read permission check, they can't see that instance. The problem relates to introducing Lucene and performing a search which simply returns a list of matching entity instances. My code would then need to filter entities one by one. Th...

Implementation of NGramTokenizer for PHP Zend Lucene?

How to implement NGramTokenizer in Zend Lucene? Is it already implemented somewhere? ...
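The core of an n-gram tokenizer is small: emit every substring whose length lies between a minimum and a maximum gram size. The Python sketch below shows only that logic; the grams are grouped by size, which may differ from the order Lucene's own NGramTokenizer emits, and wiring it into a Zend_Search_Lucene analyzer is not shown.

```python
def char_ngrams(text: str, min_gram: int = 1, max_gram: int = 2):
    """All character n-grams with min_gram <= n <= max_gram, grouped by size."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams

print(char_ngrams("foo", 2, 3))  # ['fo', 'oo', 'foo']
```

Indexing these grams for both the stored text and the query is what lets partial strings match inside longer words.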

Lucene performance

Hi guys, could you please suggest the steps to follow for Lucene performance, especially with large data (around 1TB of PDF files to be indexed)? ...

Using Lucene like a relational database

I am just wondering if we could achieve some RDBMS capabilities in Lucene. Example: 1) I have 10,000 project documents (PDF files) which have to be indexed with their content to make them available for search. 2) Every document is related to a SINGLE PROJECT. The project can contain details like project name, number, start date, end dat...

How to do a full text search in Cocoa?

I need something like Lucene to do an optimized full-text search in Cocoa. I am working on an iPhone app to search through a database. Has anybody had any luck with other databases? Any help is appreciated. So far, I can only find this: http://github.com/tcurdt/lucenekit/tree/master ...

Exception when updating Lucene index

Hi, I am a newbie to the Lucene search API. I keep getting the following exception when updating the Lucene index... why do I get this error and how do I avoid it? System.IO.IOException: Lock obtain timed out: SimpleFSLock@C:\Indexes\write.lock at Lucene.Net.Store.Lock.Obtain(Int64 lockWaitTimeout) at Lucene.Net.Index.IndexWriter.Init(Directory...