lucene

FastVectorHighlighter.Net returning null on GetBestFragment

Hi, I have a large index on which Highlighter.Net works fine, but FastVectorHighlighter returns null as the best fragment on some documents. The searcher works fine; it is just the highlighter. The field has been indexed in the same manner for all documents, so I fail to understand why it highlights some documents but not all. Using Lu...

How can I use Lucene for personal name (first name, last name) search?

I'm writing a search feature for a database of NFL players. The user enters a search string like "Jason Campbell" or "Campbell" or "Jason". I'm having trouble getting the appropriate results. Which Analyzer should I use when indexing? Which Query when querying? Should I distinguish between first name and last name or just index the f...

My Lucene queries only ever find one hit

I'm getting started with Lucene.Net (stuck on version 2.3.1). I add sample documents with this: Dim indexWriter = New IndexWriter(indexDir, New Standard.StandardAnalyzer(), True) Dim doc = Document() doc.Add(New Field("Title", "foo", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO)) doc.Add(New Field("Date",...

Lucene: Fastest way to return the document occurrence of a phrase?

Hi guys, I am trying to use Lucene (actually PyLucene!) to find out how many documents contain my exact phrase. My code currently looks like this, but it runs rather slowly. Does anyone know a faster way to return document counts? phraseList = ["some phrase 1", "some phrase 2"] #etc, a list of phrases... countsearcher = IndexSearcher(...

Return Entire field from GetBestFragment in FastVectorHighlighter

In Highlighter.Net, we can use NullFragmenter to return the entire field content. Is there any way we can do this in FastVectorHighlighter.Net? ...

Java: create a jar file for contrib packages in Lucene

I have downloaded the source code of Apache Lucene using svn. Now I want to create a jar file for a particular Java file in the contrib portion of the code. The problem is that when I run javac x.java to get a class file and package it into a jar file using jar cf jarfile.jar x.class, the package hierarchy is not preserved in the jar file...

Extending / changing how Zend_Search_Lucene searches

Hi, I am currently using Zend_Search_Lucene to index and search a number of documents, currently around 1000 or so. What I would like to do is change how the engine scores hits on a document from the current default. Zend_Search_Lucene scores on the number of hits within a document, so a document that has 10 matches ...

Get highest frequency terms from Lucene index

Hello! I need to extract the terms with the highest frequencies from several Lucene indexes, to use them for some semantic analysis. So, I want to get maybe the top 30 most frequently occurring terms (I have not decided on a threshold yet; I will analyze the results) and their per-index counts. I am aware that I might lose some precision because of potentially dr...
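Once per-index term counts are available, merging them and keeping a per-index breakdown is the easy part. A minimal sketch in plain Python, assuming each index's counts have already been read into a dict of term to count (the extraction from Lucene itself is not shown); `top_terms` and the index names are hypothetical, not part of any Lucene API:

```python
from collections import Counter


def top_terms(per_index_counts, n=30):
    """Return the n terms with the highest combined count.

    per_index_counts maps an index name to a dict of term -> count.
    Assumption: these dicts were extracted from each index beforehand.
    Each result is (term, combined_count, {index_name: count}).
    """
    total = Counter()
    for counts in per_index_counts.values():
        total.update(counts)  # adds counts term by term

    result = []
    for term, combined in total.most_common(n):
        # Per-index breakdown; a term missing from an index counts as 0.
        breakdown = {
            name: counts.get(term, 0)
            for name, counts in per_index_counts.items()
        }
        result.append((term, combined, breakdown))
    return result
```

Note that if each index only reports its own top terms, a term that falls just below the cutoff in every index is lost from the merged ranking, which is the precision concern raised in the question.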

Get starting and end index of a highlighted fragment in a searched field

"My search returns a highlighted fragment from a field. I want to know that in that field of particular searched document, where does that fragment starts and ends ?" for instance. consider i am searching "highlighted fragment" in above lines (consider the above para as single document). I am setting my fragmenter as : SimpleFragm...

Nested BooleanQuery?

I'm using a BooleanQuery to combine several queries. I find that if I add a BooleanQuery to the BooleanQuery, then no result is returned. The added BooleanQuery is a MUST_NOT one, like -city_id:100. But as Lucene's spec says, a BooleanQuery can be nested, which I think means it's okay to add such a BooleanQuery. Now I have to get all clau...

Running Long Process: Indexing 5GB docs with Lucene

Situation: I have an ASP.NET application that will search through docs using Lucene. I want to run the initial indexing (the index will be incremental after the initial run, so there won't be a need to index the whole directory again in the future). Currently, I have about 5GB of docs (45,000 files). Problem: My application times out before compl...

How do I get Lucene (.NET) to highlight correctly with wildcards?

I am using the Lucene.NET API directly in my ASP.NET/C# web application. When I search using a wildcard, like "fuc*", the highlighter doesn't highlight anything, but when I search for the whole word, like "fuchsia", it highlights fine. Does Lucene have the ability to highlight using the same logic it used to match with? Various maybe-...

Setting wildcard queries as default for QueryParser

When my users enter a term like "word" I would like it to be treated as a wildcard query "word*" so all terms beginning with "word" are found. Is there a way to tell the QueryParser to automatically create wildcard queries or do I have to parse the query myself? This shouldn't be a problem for simple queries but it may become tricky for more com...
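For the simple case, one option is to preprocess the query string before handing it to the parser. A minimal sketch in plain Python, assuming whitespace-separated terms with no quoted phrases, field prefixes, or grouping; `add_trailing_wildcards` is a hypothetical helper, not part of Lucene:

```python
def add_trailing_wildcards(query: str) -> str:
    """Append '*' to each bare term of a simple whitespace-separated query.

    Assumption: no quoted phrases, field prefixes, or parentheses.
    Those need real parsing and are not handled here.
    """
    operators = {"AND", "OR", "NOT"}
    rewritten = []
    for term in query.split():
        # Leave boolean operators and terms that already use wildcards alone.
        if term in operators or "*" in term or "?" in term:
            rewritten.append(term)
        else:
            rewritten.append(term + "*")
    return " ".join(rewritten)
```

For example, `add_trailing_wildcards("word")` gives `word*`. Anything beyond this simple shape (phrases, fielded terms, nested groups) is where string preprocessing stops being reliable.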

Lucene (.NET) Document structure and performance suggestions.

Hello, I am indexing about 100M documents that consist of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into NumericField, but I don't think it's the right choice here. My problem is that the query performance degrades quickly when I start adding OR criteria...

Zend Lucene - cannot search numbers

Using Zend Lucene, I cannot search numbers in description fields. I added the field like this: $doc->addField(Zend_Search_Lucene_Field::Text('description', $current_item['item_short_description'], 'utf-8')); Googling for this showed that applying the following code should solve the problem, but it did not: Zend_Search_Lucene_Analysis_Analyzer::s...

Different analyzers for each field

Hi, how can I enable different analyzers for each field in a document I'm indexing with Lucene? Example: RAMDirectory dir = new RAMDirectory(); IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Docu...

Custom Lucene Sharding with Hibernate Search

Does anyone have experience with custom Lucene sharding / partitioning using Hibernate Search? The documentation of Hibernate Search says the following about Lucene sharding: In some cases, it is necessary to split (shard) the indexing data of a given entity type into several Lucene indexes. This solution is not recommended unless...

Lucene: get matched terms in query

What is the best way to find out which terms in a query matched against a given document returned as a hit in Lucene? I have tried a weird method involving the hit highlighting package in Lucene contrib and also a method that searches for every word in the query against the topmost document ("docId: xy AND description: each_word_in_query")...

Lucene QueryParser needed that works with Custom Analyzer having stopfilter and porterstemfilter

With QueryParser, the stemfilter does not seem to work and with AnalyzingQueryParser, the stop filter is not effective. Is my observation correct? How can I solve this problem? Update: OK, so I did some experiments with the code. The AnalyzingQueryParser does not allow stopfilter and the QueryParser does not allow porterstemmerfilter with fu...

Using Lucene to Query File properties in Windows

Hi all, I am planning to use Apache Lucene in one of my projects. I want to index files based on the file properties (I won’t be indexing the data) and I want Lucene to query the index so that I can quickly find a list of files based on the properties. E.g.: give me all the files with access time greater than 10/10/2005 and access t...