questions about lucene | ansaurus

lucene

Word importance in lucene index

Hi all! I need to find out how important a word is across the entire document collection indexed in a Lucene index. I need to extract some "representative words", let's say concepts that are common and can represent the whole collection, or collection "keywords". I did the full-text indexing and the only field I am using are te...

What is the correct way to rebuild Lucene's index

I have a forum-like web application written in ASP.NET MVC. I'm trying to implement Lucene.Net as the search engine. When I build my index, every now and then I get exceptions related to Lucene not being able to rename the deletable file. I think it's because I empty the index every time I want to rebuild it. Here is the code that deals ...

Lucene update problem

Hello, I am using this function to update the index: private static void insert_index(String url) throws Exception { System.out.println(url); IndexWriter writer = new IndexWriter( FSDirectory.open(new File(INDEX_DIR)), new StandardAnalyzer(Version.LUCENE_CURRENT), true, IndexWri...

Finding the start and end of a match with Lucene

I would like to find the start and end positions of a match from a Lucene (version 3.0.2 for Java) query. It seems like I should be able to get this info from Highlighter or FastVectorHighlighter, but these classes seem to return only a text fragment with the relevant text highlighted. Is there any way to get this info, either with a Highl...

Why are document stores like Lucene / Solr not included in NoSQL conversations?

All of us have come across the recent hype around NoSQL solutions. MongoDB, CouchDB, BigTable, Cassandra, and others have been listed as NoSQL options. Here's an example: http://architects.dzone.com/articles/what-nosql-store-should-i-use However, three years ago a co-worker and I were using Lucene.NET as what seems to fit the descr...

What is the Use of Lucene?

Hey friends, I have heard the name Lucene many times; whenever I look up details about web crawlers, it shows up most of the time. What is the use of Lucene? ...

I need to implement a Solr schema with a frequently updated field

I'm using Lucene/Solr for searching and navigation in a file upload application. I need to update the indexed value 'downloaded' for each document on each download. The same case happens on digg.com: they show how many "diggs" each link has while you are searching. Do I have to delete and insert a new document for each download, or is there something...

Lucene Search Problem

I have built an index on my database rows (each row as a document), which are stored as Unicode in MySQL (i.e. charset: utf8 and collation: utf8_bin). But when I search for any word, English or non-English, I get no results. It says: 0 total matching documents. My code is the Lucene demo search code, except that I have changed fie...

How to add "did you mean" to a Nutch/Lucene search engine

I am having trouble implementing this suggestion feature in my Bangla search engine. Could anyone kindly help me out? ...

Zend Search Lucene and Accented Characters

Hello, I'm trying to find a way in Zend_Search_Lucene to pull off the following scenario: let's say we have a user and her name is Aïcha (note the special character). If I'm searching the index for Aicha (with a plain i instead of the accented one), I'd like Aïcha to be returned in the results. Is there something special I need to do wh...
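
One common way to approach the Aïcha/Aicha case is to fold accents on both the indexed text and the query, so that both reduce to the same term. The sketch below only illustrates that idea with the plain JDK (`java.text.Normalizer`); it is not Zend_Search_Lucene code, and the class and method names (`AccentFoldingSketch`, `fold`) are made up for illustration.

```java
import java.text.Normalizer;

// Illustrative only: plain-JDK accent folding, not part of any Lucene API.
final class AccentFoldingSketch {

    // Decompose characters (NFD) so that an accented letter becomes
    // base letter + combining mark, then drop the combining marks.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "A\u00efcha" is "Aïcha" written with a Unicode escape.
        String indexed = fold("A\u00efcha");
        String query = fold("Aicha");
        System.out.println(indexed.equals(query));
    }
}
```

The same folding would have to be applied at index time and at query time; applying it on one side only would leave the terms mismatched.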

Lucene Search Problem with Unicode Characters

I have indexed a database of some texts, and the database texts are Unicode-encoded. When I search for an English word with Lucene search, everything goes OK. But when I use a non-English query like "تو", it gives me the following exception: Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse '??': '' ...

Getting the Vector Space Model (tf-idf) from a query on a lucene index

I need to get the Vector Space Model (with tf-idf weighting) from the results of a Lucene query, and can't figure out how to do it. It seems like it should be simple, and at this stage maybe one of you guys can point me in the right direction. I have been trying to figure out how to do this for a good while, and either I haven't copped h...
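
For readers unfamiliar with the term, the textbook tf-idf weight of a term in a document is its count in that document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The sketch below computes that textbook weight over a small in-memory corpus. It is an assumption-laden illustration: it does not read a Lucene index, Lucene's own scoring formula differs in detail, and the names `TfIdfSketch` and `weights` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook tf-idf over tokenized in-memory documents.
final class TfIdfSketch {

    // Returns term -> tf * ln(N / df) for one document of the corpus.
    static Map<String, Double> weights(List<String> doc, List<List<String>> corpus) {
        // Raw term frequency within the document.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Document frequency: number of documents containing the term.
            int df = 0;
            for (List<String> other : corpus) {
                if (other.contains(e.getKey())) {
                    df++;
                }
            }
            // Guard against df == 0 when doc is not itself part of the corpus.
            double idf = Math.log((double) corpus.size() / Math.max(1, df));
            result.put(e.getKey(), e.getValue() * idf);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = java.util.Arrays.asList(
                java.util.Arrays.asList("lucene", "index"),
                java.util.Arrays.asList("lucene", "query"));
        System.out.println(weights(corpus.get(0), corpus));
    }
}
```

A term that occurs in every document gets weight 0 under this formula, which is why very common words contribute nothing to the vector.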

Using NHibernate to Index Large Amounts of Data in Lucene.Net

We are using NHibernate as our data access layer. We have a table of 1.7 million records which we need to index one by one through Lucene for our search. As we run the console app we wrote to build our index, it starts off fast, but as it goes through the items, it gets progressively slower. Our first iteration was to just ...

How do I use ASCIIFoldingFilter in my Lucene app?

I have a standard Lucene app which searches an index. My index contains a lot of French terms and I'd like to use the ASCIIFoldingFilter. I've done a lot of searching and I have no idea how to use it. The constructor takes a TokenStream object; do I call the method on the analyzer that retrieves a TokenStream when you send it a...

Fuzzy Queries in Lucene

I am using Lucene in Java and indexing a table in our database based on company name. After indexing, I wish to do a fuzzy match (Levenshtein distance) on a value we wish to input into the database. The reason is that we do not want to be entering dupes because of spelling errors. For example if I have the company name "Widget Makers ...
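
As background for the fuzzy-matching idea in this question, the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a plain-JDK version of the classic dynamic-programming algorithm; it is not Lucene's fuzzy query implementation, the class name `LevenshteinSketch` is made up, and the misspelled company name is a hypothetical input.

```java
// Illustrative only: classic edit distance, independent of Lucene.
final class LevenshteinSketch {

    // Dynamic programming over two rolling rows of the distance table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so curr becomes the previous row of the next pass.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity scaled to [0, 1]; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        // "Widgit Makers" is a hypothetical misspelling used for illustration.
        System.out.println(distance("Widget Makers", "Widgit Makers"));
        System.out.println(similarity("Widget Makers", "Widgit Makers"));
    }
}
```

A duplicate check could flag any new name whose similarity to an existing name exceeds some threshold; the right threshold depends on the data and would need tuning.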

Query types within Lucene

Lucene NOOB alert! I consider myself to be a human of at least reasonable intelligence, however I am having enormous problems mentally grokking the query types within Lucene. In my particular instance I need to search a single string field in my document that is of only moderate length (avg around 50 chars). I want the user to be able...

How to map a component collection with Compass?

I need to map a collection of components with Compass (using XML mapping). Is there any way to achieve this? Thanks in advance for any suggestions. Example classes: class ClassA { private Set<ClassB> bs; // ... getBs/setBs ... } class ClassB {} Example mapping: <class name="com.package.ClassA" alias="classA"> <!-- no ...

Merge factor, minMergeDocs, Lucene

Hi, I am unable to understand the difference between mergeFactor and minMergeDocs. For example, I want to index 10,000 Documents and say 100 of those Documents fill up my RAM buffer, so Lucene will write out these 100 Documents as a file. Now if I set mergeFactor=5, when a fifth segment is to be written to the disk, Lucene will merge all t...

Query for lucene search result

I have a news store with the following fields (Title, Body, NewsDate). I need the best query for the following criteria: 1) the title is more important, but less so than the date; 2) the date should be compared to the current date: the closer a document's date is to the current date, the more valuable it is (NOTE: this doesn't mean sorting descending on n...

Zend Lucene: multiple search criteria = bad results

Hello, I'm new to Lucene, and I noticed something annoying. In my search bar, if I type "USA", it returns all the matches -> OK. If I type "Developper", it returns all the matches -> OK. BUT if I type "USA Developper", it does not return all the developers in the USA. It returns some developers in UK, DE, FR + Developpers, Sta...

zend-search-lucene
