Information Retrieval database formats? | ansaurus

+1 Q:

Information Retrieval database formats?

I'm looking for some documentation on how Information Retrieval systems (e.g., Lucene) store their indexes for speedy "relevancy" lookups. My Google-fu is failing me: I've found a page which describes Lucene's file format, but it's more focused on how many bits each number is than on how the database is used in producing speedy queries.

Surely someone has some useful bookmarks lying around that they can refer me to.

Thanks!

+2 A:

The Lucene index is an inverted index, so any search on that topic should turn up relevant material.

Pascal Dimassimo 2010-04-13 18:25:38
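
To make "inverted index" concrete, here is a minimal sketch in Python. It is not Lucene's on-disk format; the function name build_inverted_index, the whitespace tokenizer, and the sample documents are all assumptions made for illustration. The idea is only that each term maps to a postings list of the documents containing it, so a lookup never has to scan the documents themselves.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a postings dict of {doc_id: term_frequency}.

    Hypothetical helper for illustration; real engines add analysis
    (stemming, stop words) and compressed on-disk postings.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Made-up sample corpus.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}
index = build_inverted_index(docs)

# Looking up a term is a single dictionary access.
print(sorted(index["quick"]))
```

The lookup cost depends on the length of the postings list for the term, not on the total size of the collection, which is where the speed comes from.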

True, it's an inverted index, but if I have a 10-term query, is Lucene really looking up each term in the inverted index, intersecting the results, and ranking them?

jemfinch 2010-04-13 18:33:21

In essence, yes. If you look at the Lucene scoring formula (http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/Similarity.html), you'll see that each query term is used to build a vector that is then used to search the index.

Pascal Dimassimo 2010-04-13 18:38:29
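
A rough sketch of what a multi-term query can look like against such an index, in Python. This uses a simplified TF-IDF weighting chosen for illustration, not Lucene's actual Similarity formula, and the names build_index and search are hypothetical. Each query term is looked up once, its postings list is walked, and every matching document accumulates a partial score; documents matching more terms (or rarer terms) end up ranked higher.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Hypothetical helper: term -> {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(query, index, num_docs, top_k=10):
    """Score documents term by term and return the best matches.

    Assumed weighting: sqrt(tf) * log(1 + N / df). This is a common
    textbook-style variant, not the formula Lucene ships with.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term)
        if not postings:
            continue  # term not in the index: contributes nothing
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += math.sqrt(tf) * idf
    # Highest score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up sample corpus.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}
index = build_index(docs)
results = search("quick dog", index, num_docs=len(docs))
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Note that this sketch takes the union of the postings lists rather than a strict intersection, so a document matching only some of the terms still gets a (lower) score. Requiring every term would mean keeping only documents that appear in all of the postings lists.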
