solr

Solr DataImportHandler with SQL Server

I'm having a problem getting Solr to talk to Microsoft SQL Server via the Microsoft JDBC Driver. I have the handler registered in solrconfig.xml: <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">C:\Program Files\Apache Software Foundation\Tomc...

Denormalizing relational data for lucene/solr

I have an architectural question about using Apache Solr/Lucene. I'm building a Solr index for searching a CV database. Basically, every CV on there will have some fields like rate of pay, address, and title; these fields are straightforward. The area I need advice on is skills and job history. For skills, someone might add an entry l...

Indexing Fields with SOLR and LowerCaseFilterFactory

I have a Field defined as <fieldType name="text_ws_lc" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll=...

Handling org.apache.lucene.store.LockObtainFailedException in Solr

Hi, I am using the apache-solr-1.4 server and solrj as my Solr client for indexing my documents. I am getting the org.apache.lucene.store.LockObtainFailedException when I index the documents. This exception is not thrown each time I try to index the documents. It occurs at intervals, and the solr/data/index dir contains lu...

Why does my solr slave index keep growing?

I have a 5-core Solr 1.4 master that is replicated to another 5-core Solr using Solr replication as described here. All writes are done against the master and replicated to the slave intermittently. This is done using the following sequence: commit on each master core; replicate on each slave core; optimize on each slave core; commit on e...

Solr: strip punctuation before index

I am having a problem with stripping punctuation from the Solr index. When a punctuation mark follows right after a word, that word is not indexed properly. For example: if we index "hello, John", the asset won't be found by the keyword "hello", while there is no issue if we remove the comma after the word "hello". Is there any FilterFactor...

How does MyISAM scale compared to Solr for Django searching?

Imagine you have a web application written in Django and Python 2.65, and MySQL 5.1 is your database of choice. Now, imagine you will need to scale your app to handle searching hundreds of thousands of documents, and potentially hundreds of thousands of users will be using it. Reality: Haystack 1.0 with PySolr and Solr 1.4.0 is proving slow i...

Literal field value for Solr CSV

Is there a way to provide a literal field value when adding a CSV document? The documents do not contain a necessary field, so I need a way to specify the value some other way. I've tried things like f.field.map=:VALUE to no avail. Setting a default value for the missing field in schema.xml works, but it's obviously not a practical so...

PHP, MySQL, spatial data and design

I'm building an application where vehicles' coordinates are being logged by GPS. I want to implement a couple of features to start with, such as: realtime tracking of vehicles; history tracking of vehicles; keeping locations and areas for customer records. I need some guidelines as to where to start on database and application design. Anyth...

Django + Haystack how to do this search

Hi all, I'm new to Haystack and to the search world so I didn't know how to ask this question. What I want to achieve is the following. Having a search query like: one two I would like to get returned any content like: This one one two two one something one here Is this possible with Haystack + solr/xapian? Is it also possible to ha...

Solr - use 64-bit Java, not 32-bit Java on Windows 7 64-bit

I have a Windows 7 64-bit machine. I have Java Runtime Environment for both 32-bit and 64 bit installed on the machine. How do I tell Solr to use the 64-bit version of JRE when I start up Solr? ...

Efficiently sorting and paging with Solr when index is changing

I'm working on a structured document viewer, where each Solr document is a "section" or "paragraph" in a large set of legal documents, along with assorted metadata. I have a corpus which will probably represent 10^12 or more of these sections. I want to provide paging for the user so that they can view N of these sections at a time in so...

Solr - multifaceting syntax

I'm having a hard time constructing the URL for a query that has more than one multifacet. I'm following the sample here: http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html For instance, take a look at the eBay screendump: what would the URL look like if you select 'Sony' and 'LG' in the 'Brand' section and the...

How do you think Reddit handles reindexing their posts to keep them accurately in order?

I can't imagine it indexing per vote. It would strain the server inappropriately. I mention this because I'm trying to do something similar on a project of mine, and can't figure out the best way to index objects after they have been voted on. I am using Sunspot-Solr. ...

How can I schedule data imports in Solr

The wiki page, http://wiki.apache.org/solr/DataImportHandler explains how to index data using DataImportHandler. But the example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis? ...

Wildcard searches using dismax handler?

Hi, I have successfully indexed files, and want to be able to search using wildcards. I am currently using the dismaxRequestHandler (QueryType = dismax) for the searches so that I can search all the fields for the query. A general search like 'computer' returns results but 'com*er' doesn't return any results. Similarly, a search like 'c...

Solr: Retrieve field names from a solr index?

How can I query a Solr instance for all (or prefixed) field names? I want to use dynamic fields like category_0_s category_1_s ... but I do not know how many may exist. So I want to retrieve all fields (preferably with the prefix "category_"). Any ideas? Thanks ...

NoSQL (MongoDB) vs Lucene (or Solr) as your database

With the NoSQL movement growing based on document-based databases, I've looked at MongoDB lately. I have noticed a striking similarity with how to treat items as "Documents", just like Lucene does (and users of Solr). So, the question: Why would you want to use NoSQL (MongoDB, Cassandra, CouchDB, etc) over Lucene (or Solr) as your "dat...

Solr for constantly updating index

I have a news site with 150,000 news articles. About 250 new articles are added daily to the database at an interval of 5-15 minutes. I understand that Solr is optimized for millions of records and my 150K won't be a problem for it. But I am worried the frequent updates will be a problem, since the cache gets invalidated with every upda...

How to perform Phonetic and Approximate search in Lucene.net

When I read the Lucene.net docs, the only analyzer that I find is the standard one. I want to make sure I can do phonetic or approximate search on my index. Is there some extra library I should use on top of Lucene.net? ...