lucene.net

Lucene and plurals

If you look at the comment here you'll see Lucene is very much the tool to do this. If you want apple and apples (plural) to match, you just need to be careful about using the correct language stemmer when indexing and querying the index. I'm new to Lucene and barely understand how adding and saving documents works. How do...

Lucene Search for Japanese characters

Hi all, I have implemented Lucene for my application and it works very well until you introduce something like Japanese characters. The problem is that if I have the Japanese string こんにちは、このバイネイです and I search with こ, the first character, then it works well, whereas if I use more than one Japanese character (こんにち) in the search token...

Distributed Lucene.NET

Hi, I have a terabyte of data, maybe more, which I'd like to index and search with Lucene. I'd like to be able to split the index out to different machines, similar to what Solr does (if I understand Solr correctly). Are there any existing tools to do this on the Windows platform? Thanks! Edit: I'm not very keen on running Java Luce...

Slow Lucene.Net search performance

I'm facing slow search performance using Lucene.Net (+ NHibernate.Search, but that doesn't matter). Luke toolbox overview: Number of fields: 33; Number of documents: 5607; Number of terms: 101377; Has deletions? / Optimized?: Yes (97478) / No. The index directory is ~200 MB. Query (using org.apache.lucene.analysis.SimpleAnalyzer)...

Lucene.NET - sorting by int

In the latest version of Lucene (or Lucene.NET), what is the proper way to get the search results back in sorted order? I have a document like this: var document = new Lucene.Document(); document.AddField("Text", "foobar"); document.AddField("CreationDate", DateTime.Now.Ticks.ToString()); // store the date as an int indexWriter.AddDoc...

Lucene Search for documents that have a particular field?

Lucene.Net - Is there a way to query for documents that contain a particular field? Let's say some of my documents have a field 'foo' and some do not. I want to find all documents that have the field 'foo', regardless of what the value of foo is. How do I do this? Is it some sort of TermQuery? ...

Lucene.Net PrefixQuery

Hi, I'm developing a suggest box for my site search service. It has to search fields like these: Visual Basic Enterprise Edition, Visual C++, Visual J++. My code is: Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", false); IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir, true); Term term = n...

Lucene.NET - Find documents that do not contain a specified field

Let's say I have two instances of a class called 'Animal'. Animal has 3 fields: Name, Age, and Type. The Name field is nullable, so before I insert an instance of Animal as a Lucene indexed document, I check if Animal.Name == null, and if it is, I do not insert it as a field in my document. If I were to retrieve all animals, I would see ...

Lucene and Special Characters

I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters. I index my field as such: Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", fa...

Lucene Analyzer to Use With Special Characters and Punctuation?

I have a Lucene index that has several documents in it. Each document has multiple fields such as: Id, Project, Name, Description. The Id field will be a unique identifier such as a GUID, Project is a user's ProjectID and a user can only view documents for their project, and Name and Description contain text that can have special characte...

Lucene - How to index a value with special characters

I have a value I am trying to index that looks like this: Test (Test). Using a StandardAnalyzer, I attempted to add it to my document using: Field.Store.YES, Field.Index.TOKENIZED. When I do a search with the value of 'Test (Test)' my QueryParser generates the following tags: +Name:test +Name:test. This operates as I expect because...

Lucene HTMLFormatter skipping last character

I have this simple Lucene search code (modified from http://www.lucenetutorial.com/lucene-in-5-minutes.html): class Program { static void Main(string[] args) { StandardAnalyzer analyzer = new StandardAnalyzer(); Directory index = new RAMDirectory(); IndexWriter w = new IndexWriter...

ASP.NET library to extract plain text from Open XML file formats

Is there a pre-existing library to extract plain text from Open XML file formats (e.g. docx, pptx, and xlsx)? I require this to populate a Lucene.Net index. I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something alread...

Building a case for Solr

Our product consists of multiple applications, all using Lucene. Two of the applications I am involved with have Lucene indexes of about 3 GB and 12 GB. Another team is building an application, for which they estimate the Lucene index size to be close to 1 terabyte. New documents are added to the indexes approximately every 15 days. We do not have...

FastVectorHighlighter.Net returning null on GetBestFragment

Hi, I have a large index, on which Highlighter.Net works fine, but FastVectorHighlighter returns null as the best fragment on some documents. The searcher works fine. It is just the highlighter. The field has been indexed in the same manner for all documents, so I fail to understand why it highlights some documents but not all. Using Lu...

My Lucene queries only ever find one hit

I'm getting started with Lucene.Net (stuck on version 2.3.1). I add sample documents with this: Dim indexWriter = New IndexWriter(indexDir, New Standard.StandardAnalyzer(), True) Dim doc = Document() doc.Add(New Field("Title", "foo", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO)) doc.Add(New Field("Date",...

Return Entire field from GetBestFragment in FastVectorHighlighter

In Highlighter.Net, we can use NullFragmenter to return the entire field content. Is there any way we can do this in FastVectorHighlighter.Net? ...

Proper LINQ to Lucene Index<T> usage pattern for ASP.NET?

What is the proper usage pattern for LINQ to Lucene's Index<T>? It implements IDisposable so I figured wrapping it in a using statement would make the most sense: IEnumerable<MyDocument> documents = null; using (Index<MyDocument> index = new Index<MyDocument>(new System.IO.DirectoryInfo(IndexRootPath))) { documents = index.Where(d...

Get start and end index of a highlighted fragment in a searched field

"My search returns a highlighted fragment from a field. I want to know where, in that field of the particular searched document, that fragment starts and ends." For instance, consider that I am searching for "highlighted fragment" in the above lines (consider the above paragraph as a single document). I am setting my fragmenter as: SimpleFragm...

How do I get Lucene (.NET) to highlight correctly with wildcards?

I am using the Lucene.NET API directly in my ASP.NET/C# web application. When I search using a wildcard, like "fuc*", the highlighter doesn't highlight anything, but when I search for the whole word, like "fuchsia", it highlights fine. Does Lucene have the ability to highlight using the same logic it used to match with? Various maybe-...