lucene

Lucene.Net index directory usage in Java Lucene

Lucene.Net is a direct port of Java Lucene, so it stands to reason that I could use the index directory created by Lucene.Net directly from Lucene in Java. Is this assumption correct? ...

no segments* file found

Hi, I need to access a Lucene index (created by crawling several web pages using Nutch), but it gives the error shown above: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/<path>: files: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516) ...

Solr search for lots of values

I'm using Solr to search for a long list of IDs like so: ID:("4d0dbdd9-d6e1-b3a4-490a-6a9d98e276be" "4954d037-f2ee-8c54-c14e-fa705af9a316" "0795e3d5-1676-a3d4-2103-45ce37a4fb2c" "3e4c790f-5924-37b4-9d41-bca2781892ec" "ae30e57e-1012-d354-15fb-5f77834f23a9" "7bdf6790-de0c-ae04-3539-4cce5c3fa1ff" "b350840f-6e53-9da4...
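
The excerpt is cut off before the actual problem is stated. If it is the common one with long ID lists, namely Lucene's limit on the number of clauses in a single BooleanQuery (1024 by default, exposed in Solr as maxBooleanClauses in solrconfig.xml), one workaround is to split the IDs into batches and send one query per batch. The sketch below assumes that problem and the default OR operator between terms; the class and method names are hypothetical, and the batch size is a parameter rather than a value read from Solr.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a long list of IDs into several query strings so that no single
// query exceeds an assumed clause limit (maxClauses is supplied by the caller).
final class IdQueryBatcher {

    // Quote one value as a Lucene phrase: backslashes and double quotes
    // inside the phrase are escaped with a backslash.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build one field:("id1" "id2" ...) query string per batch of at most maxClauses IDs.
    static List<String> buildQueries(String field, List<String> ids, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxClauses) {
            List<String> batch = ids.subList(start, Math.min(start + maxClauses, ids.size()));
            StringBuilder query = new StringBuilder(field).append(":(");
            for (int i = 0; i < batch.size(); i++) {
                if (i > 0) {
                    query.append(' ');
                }
                query.append(quote(batch.get(i)));
            }
            queries.add(query.append(')').toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> ids = List.of(
                "4d0dbdd9-d6e1-b3a4-490a-6a9d98e276be",
                "4954d037-f2ee-8c54-c14e-fa705af9a316",
                "0795e3d5-1676-a3d4-2103-45ce37a4fb2c");
        // With a deliberately tiny limit of 2, three IDs produce two queries.
        for (String q : buildQueries("ID", ids, 2)) {
            System.out.println(q);
        }
    }
}
```

The results of the batches then have to be merged on the client. Depending on the Solr version in use, the terms query parser ({!terms f=ID}) may be a simpler option for this kind of lookup, since it is intended for long lists of exact values.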

Lucene as data store

Hi, is it possible to use Lucene as a full-fledged data store, like other NoSQL variants (MongoDB, CouchDB)? I know there are some limitations, e.g. documents newly updated by one indexer will not be visible to another indexer, so we need to restart the indexer to get the updates. But I stumbled upon Solr lately, and it seems these problems are avoid...

How Lucene scores results in a RegexQuery?

I can see how two values, when doing a regular/fuzzy full-text search, can be compared to determine which one is "better" (i.e. one value contains more keywords than the other, one contains fewer non-keywords than the other). However, how does Lucene compute the score for regex queries using RegexQuery? It is a boolean query - a field...

Install LuceneSail on sesame2

I am looking for a web resource with step-by-step instructions on how to install LuceneSail in an OpenRDF Sesame 2 server and how to enable the Sesame Workbench to create new LuceneSail repositories. Has anybody successfully installed this sail? ...

Why can I not search for a "0" field in Solr?

From schema.xml: <field name="myfield" type="integer" indexed="true" stored="false"/> The record with id 5 has myfield with value of 0, which I've confirmed by searching for plain id:5 and looking at the objectXml. A search for id:5 AND myfield:0 returns no records. A search for id:5 AND -myfield:1, however, returns the record I am ...

Solr/Lucene behaves weirdly with some word searches.

I have Solr installed with the default (out-of-the-box) configuration. I have the word "alternatives" in the index. Searching for any of the following gives empty results: 1. name:alterna 2. name:alterna 3. name:alterna* 4. name:*altern Obviously, I am expecting to find that entry given any part of the word "alternatives". Anybody with such an exper...

How to get Lucene explanation for a SolrDocument with Solrj?

I'm searching a Solr index with SolrJ and trying to get the Lucene explanation so I can log it for later use. The code goes like this: SolrServer server = new CommonsHttpSolrServer("solr_url"); SolrQuery solrquery = new SolrQuery(); solrquery.set("fl", "score, id"); // id is a String field solrquery.set("rows", "1000"...

Nutch crawler relative URLs problem

Has anyone experienced a problem with the way the standard HTML parser plugin handles relative URLs? There is a site, http://xxxx/asp/list_books.asp?id_f=11327, and when following a link whose href is set to '?id_r=442&id=41&order=', a browser will naturally take you to http://xxxx/asp/list_books.asp?id_r=442&id=41&order= However,...
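
The browser's behaviour matches the reference-resolution rules of RFC 3986 (section 5.2.2): a reference consisting only of a query keeps the base URL's scheme, authority and path (here /asp/list_books.asp) and replaces only the query. A minimal sketch of that one rule in Java, useful for checking what a parser should produce; this is not Nutch's own code, and the class and method names are hypothetical. Other references are handed to java.net.URI, which follows the older RFC 2396, so its edge cases may differ.

```java
import java.net.URI;

// Sketch of RFC 3986 resolution for the "pure query" case only:
// "?id_r=442" keeps the base path and replaces the base query.
final class RelativeUrlResolver {

    static String resolve(String base, String reference) {
        if (reference.startsWith("?")) {
            // Drop the base URL's fragment, then its query; keep scheme, authority and path.
            String withoutFragment = base;
            int hash = withoutFragment.indexOf('#');
            if (hash >= 0) {
                withoutFragment = withoutFragment.substring(0, hash);
            }
            int question = withoutFragment.indexOf('?');
            String basePath = question >= 0 ? withoutFragment.substring(0, question) : withoutFragment;
            return basePath + reference;
        }
        // All other relative references are delegated to java.net.URI.
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://xxxx/asp/list_books.asp?id_f=11327";
        // prints http://xxxx/asp/list_books.asp?id_r=442&id=41&order=
        System.out.println(resolve(base, "?id_r=442&id=41&order="));
    }
}
```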

custom Analyzer using ASCIIFoldingFilter not replacing diacritics

Hello experts, we have an issue with a custom Lucene.NET Analyzer which uses ASCIIFoldingFilter and LowerCaseFilter. While indexing our content, the lower case filter works and makes all terms lowercase, but the ASCIIFoldingFilter leaves the diacritics untouched (there are no errors, but characters like őŏő are not replaced with o, they ...
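
To know what folded output to expect independently of the analyzer chain, the JDK's java.text.Normalizer can approximate folding for accented Latin letters: decompose to NFD and drop the combining marks, so ő (o + double acute) and ŏ (o + breve) both become o. This is a swapped-in technique, not what ASCIIFoldingFilter does internally (it uses an explicit character mapping and also handles letters such as ø or ß that have no canonical decomposition); the class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Approximate diacritic folding with the JDK only: NFD decomposition,
// then removal of every combining mark (Unicode category M).
final class DiacriticFolding {

    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // The characters from the question: o with double acute, o with breve, o with double acute.
        System.out.println(fold("\u0151\u014F\u0151")); // prints ooo
    }
}
```

Comparing terms seen in the index (for example in Luke) against this expected output can help tell whether the folding filter ran at all during indexing.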

In Lucene, using a Standard Analyzer, I want to make fields with spaces searchable.

In Lucene, using a StandardAnalyzer, I want to make fields containing spaces searchable. I set Field.Index.NOT_ANALYZED and Field.Store.YES using the StandardAnalyzer. When I look at my index in Luke, the fields are as I expected, a field and a value such as: location -> 'New York'. Here I found that I can use the KeywordAnalyzer to find this v...

Lucene PorterStemmer question

Given the following code: Dim stemmer As New Lucene.Net.Analysis.PorterStemmer() Response.Write(stemmer.Stem("mattress table") & "<br />") ' Outputs: mattress t Response.Write(stemmer.Stem("mattress") & "<br />") ' Outputs: mattress Response.Write(stemmer.Stem("table") & "<br />") ' Outputs: tabl Could someone explain why the Port...
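
The outputs are consistent with the Porter algorithm being a single-word algorithm: given "mattress table", it stems the whole string as one long word, so the trailing "able" qualifies as a removable suffix and "mattress t" remains. Stemmed alone, "table" keeps "able" (the remaining stem "t" is too short) and only loses its final "e", giving "tabl". In a Lucene analyzer, stemming is normally applied per token (e.g. PorterStemFilter after a tokenizer). The sketch below shows the per-token idea with a pluggable stemmer; the toy stemmer in main is hypothetical and is not Porter, and the class name is made up.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Sketch: split text on whitespace and stem each token independently.
// 'stemmer' stands in for any single-word stemmer, so the sketch has no library dependency.
final class PerTokenStemming {

    static String stemEachToken(String text, UnaryOperator<String> stemmer) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(stemmer)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical toy stemmer for the demo only: strips a trailing "e".
        UnaryOperator<String> toyStemmer =
                w -> w.endsWith("e") ? w.substring(0, w.length() - 1) : w;
        System.out.println(stemEachToken("mattress table", toyStemmer)); // prints mattress tabl
    }
}
```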

What are the current java search options like Hibernate Search or Compass?

With Compass going the way of the dodo (or at least no longer being actively developed), I wonder what other technologies there are that fill a similar role. I'm aware of Hibernate Search, but nothing else really. It seems the direction things are going is towards full indexing agnostic of entities and relationships. Are there other tech...

How to handle very frequent updates to a Lucene index

I am trying to prototype an indexing/search application which uses very volatile indexing data sources (forums, social networks, etc.). Here are some of the performance requirements: very fast turnaround time (by this I mean that any new data (such as a new message on a forum) should be available in the search results very soon (less th...

Clustered search results for .net app on Win '03 / '08

Hey all, I'm building a .NET app using ASP.NET 3.5 on Windows Server 2003 or 2008 (not sure yet) with SQL Server 2008. A major part of the app is a powerful search function which has to cluster search results similar to this site, e.g. search for blindness and you see a cluster of results for blindness but also for visually impaired, eye...

What is the difference between the - and NOT operators in Lucene?

In the query syntax of Lucene it is said the following: The NOT operator excludes documents that contain the term after NOT. ... The "-" or prohibit operator excludes documents that contain the term after the "-" symbol I think the difference is that the - operator can be used alone, which is not the case for NOT. Is that it? ...
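
As far as the classic Lucene QueryParser goes, NOT and - both mark a clause as prohibited (MUST_NOT), and the query syntax documentation notes that NOT cannot be used with just one term. A prohibited clause only removes documents, so a purely negative query has nothing to subtract from and matches nothing in plain Lucene unless it is combined with a clause that matches documents (such as *:*). The toy set-based model below illustrates that behaviour; it is not Lucene code, and the index data and class name are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a Boolean query: optional (SHOULD) terms are unioned,
// prohibited (MUST_NOT) terms are subtracted. Not Lucene code.
final class ProhibitedClauseModel {

    static Set<Integer> evaluate(Map<String, Set<Integer>> index, Set<Integer> allDocs,
                                 Set<String> shouldTerms, Set<String> mustNotTerms,
                                 boolean matchAllDocs) {
        Set<Integer> result = new TreeSet<>();
        if (matchAllDocs) {
            result.addAll(allDocs); // plays the role of *:*
        }
        for (String term : shouldTerms) {
            result.addAll(index.getOrDefault(term, Set.of()));
        }
        // Prohibited clauses only remove documents; they never add any.
        for (String term : mustNotTerms) {
            result.removeAll(index.getOrDefault(term, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = Map.of(
                "jakarta", Set.of(1, 2),
                "apache", Set.of(2, 3));
        Set<Integer> all = Set.of(1, 2, 3);
        // "-apache" or "NOT apache" on its own: nothing to subtract from.
        System.out.println(evaluate(index, all, Set.of(), Set.of("apache"), false)); // []
        // "jakarta -apache"
        System.out.println(evaluate(index, all, Set.of("jakarta"), Set.of("apache"), false)); // [1]
        // "*:* -apache"
        System.out.println(evaluate(index, all, Set.of(), Set.of("apache"), true)); // [1]
    }
}
```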

Does MySQL consume a lot of memory and CPU?

Currently I have a project using Solr. Now I want to add some features, so I'm wondering whether I need to add MySQL to my solution. Since I use a VPS, I must consider memory and CPU consumption, so my question is: does MySQL use a lot of memory and CPU? I was also wondering whether Solr can provide the same functionality, so that I can reduce the dependent software us...

Is CLucene faster than Java Lucene?

Hello, I am using Java Lucene and I am moving my code from Java to C++ for some reasons, so I need to know about the performance of CLucene. Can anyone explain ...

Symfony with Zend Lucene and related models (with foreign keys)

Well, I was developing an application using Symfony 1.4 and Doctrine when I realized a major drawback in my Zend Lucene implementation. I have a model called Publication that is related (via foreign key relations) to a few other models (subjects, genres, languages, authors, etc.) and I'm getting their names when adding a new document ...