indexing

Lucene.Net alternative sorting

Most of the time (>90%) I will want results to be sorted by the UpdatedOn field. When I do a search with this sort, the results take almost 500% longer than a search sorted by score. Is there some alternate way of indexing that will optimize this kind of sort? I use multiple indexes (MultiSearcher, if that matters). ...

How do I extract keywords used in text?

How do I data mine a pile of text to get keywords by usage (e.g. "Jacob Smith" or "fence")? Is there software that does this already, even semi-automatically? If it can filter out simple words like "the", "and", "or", then I could get to the topics quicker. ...
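One simple way to approach the question above is to count word frequencies after dropping common stop words. The following is only a minimal sketch: the `STOP_WORDS` set, the `top_keywords` helper, and the sample sentence are all assumptions made up for illustration, and it counts single words only, so a two-word name such as "Jacob Smith" would be split into two separate keywords.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "it"}


def top_keywords(text, n=5):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample text, for illustration only.
sample = "The fence is old. Jacob Smith fixed the fence and painted the fence."
print(top_keywords(sample, 3))
```

Extending this to multi-word phrases would need some form of n-gram counting or named-entity detection, which this sketch does not attempt.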

WAIT in Transaction - Firebird

Hi all, Can we set an index active while another transaction is in progress? Will Firebird wait until the other transaction completes its operation, so that the index becomes inactive/active (this is for reindexing) only after that? Thank you. Regards, Sabu ...

C++ sorting and keeping track of indexes

Using C++, and hopefully the STL, I want to sort a sequence of samples in ascending order, but I also want to remember the original indexes of the newly sorted samples. For example, I have a set, or vector, or matrix of samples A : [5, 2, 1, 4, 3]. I want to sort these to be B : [1,2,3,4,5], but I also want to remember the original indexes of th...
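The question asks about C++, but the underlying idea is language-independent: sort a list of index positions by the values they point at, then read the values back in that order. Below is a small Python sketch of that idea using the sample data from the question; the helper name `sort_with_indexes` is made up for illustration.

```python
def sort_with_indexes(samples):
    """Return (sorted_values, original_indexes) for a sequence of samples."""
    # Sort the positions 0..n-1 by the value stored at each position.
    order = sorted(range(len(samples)), key=lambda i: samples[i])
    return [samples[i] for i in order], order


A = [5, 2, 1, 4, 3]
B, original_indexes = sort_with_indexes(A)
print(B)                 # sorted values
print(original_indexes)  # position each sorted value had in A
```

In C++ the same approach would typically mean filling a vector of indexes and sorting it with a comparator that compares the referenced samples.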

Django: create Index: non-unique, multiple column

Given the following model, I want to index the fields (sequence,stock) class QuoteModel(models.Model): quotedate = models.DateField() high = models.FloatField() #(9,2) DEFAULT NULL, low = models.FloatField() #(9,2) DEFAULT NULL, close = models.FloatField() #(9,2) DEFAULT NULL, closeadj = models.FloatFi...

What indexing implementations can handle arbitrary column combinations?

I am developing a little data warehouse system with a web interface where people can do filtered searches. There are currently about 50 columns that people may wish to filter on, and about 2.5 million rows. A table scan is painfully slow. The trouble is that the queries I'm getting have no common prefixes. Right now I'm using sql...

Is an Index Organized Table appropriate here?

I recently was reading about Oracle Index Organized Tables (IOTs) but am not sure I quite understand WHEN to use them. So I have a small table: create table categories ( id VARCHAR2(36), group VARCHAR2(100), category VARCHAR2(100 ) create unique index (group, category, id) COMPRESS 2; The id column is a foreign ...

How do I delete old documents from Lucene/Lucene.NET

What is the idiomatic way to delete old documents from a Lucene Index? I have a date field (YYYYMMddhhmmss) on all of the documents, and I'd like to remove anything more than a day old (for example). Should I perform a filtered search or enumerate through the IndexReader's documents? I'm sure the question is the same regardless of whi...

os.popen subprocess conversion

Hi Pythoners. This snippet gets me the dotted quad of my BSD network interface. I would like to figure out how to use the subprocess module instead. ifcfg_lines = os.popen("/sbin/ifconfig fxp0").readlines() x = string.split(ifcfg_lines[3])[1] Seems as if I can't use subprocess in exactly the same way. I don't think I want shell=True ...
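A rough sketch of how the `os.popen` call above might be expressed with `subprocess`, without `shell=True`, is shown here. The helper names `command_lines` and `field` are made up for illustration, and the demo runs the Python interpreter as a stand-in command so that it works on machines without `/sbin/ifconfig`; `capture_output` assumes Python 3.7 or later.

```python
import subprocess
import sys


def command_lines(argv):
    """Run argv (a list, no shell involved) and return its stdout as lines."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def field(lines, line_no, field_no):
    """Return one whitespace-separated field from one line of output."""
    return lines[line_no].split()[field_no]


# Stand-in command for the demo: prints one ifconfig-like line.
demo = command_lines([sys.executable, "-c", "print('inet 192.0.2.1 netmask')"])
print(field(demo, 0, 1))
```

Under the assumption that the interface output has the layout implied by the question, the original snippet would correspond to `field(command_lines(["/sbin/ifconfig", "fxp0"]), 3, 1)`.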

MySQL question: Indexes on columns!

Hi guys... I have a MySQL question. I have two tables (posts and authors) in a one-to-many relationship (each post is written by one author, and an author can write multiple posts). Here are the tables: Authors: id:BIGINT, name:VARCHAR(255) Posts: id:BIGINT, author_id:BIGINT, body:TEXT I've got 700,000 posts and 60,000 au...

How can I index a bunch of files in Perl?

I'm trying to clean up a database by first finding unreferenced objects. I have extracted all the database objects into a list and all the DDL code into files, and I also have all the Java source code for the project. Basically what I want to do (preferably in Perl as it's the scripting language that I'm most familiar with) is to somehow i...

Implementing and indexing User Defined Fields in an SQL DB

I need to store a large table (several million rows) that contains a large number of user-defined fields (not known at compile time, but probably around 20 to 40 custom fields). It is very important (performance-wise) for me to be able to query the data based on those custom fields: i.e. "Select the rows where this attribute has that...

How do I optimize a database for superstring queries?

So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, i.e., all the rows for which the target string is a superstring of the column. At the moment I'm using a query along the lines of: SELECT * FROM table WHERE 'my supe...

How many fields should be indexed and how should I create them?

I've got a table in a MySQL database that has the following fields: ID | GENDER | BIRTHYEAR | POSTCODE Users can search the table using any of the fields in any combination (i.e., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000); or SELECT * FROM table WHERE BIRTHYEAR = 1973;) From the MySQL docs, it uses left inde...

Peculiar case with SQL Server, indices and parameters

I have a table, let's call it History. The primary key (aka Clustered Index) is called HIST_ID. The table has some 2300 rows in the development DB. Now consider the following two queries: Query 1: declare @x int set @x = 14289 select * from History where hist_id=@x Query 2: declare @x int set @x = 14289 select * from History where...

Lucene Index and Query Design Question - Searching People

I recently started working with Lucene (specifically, Lucene.Net) and have successfully created several indices with no problems. Previously having worked with Endeca, I find that Lucene is lightweight, powerful, and has a much lower learning curve (due mostly to a concise API). However, I have one specif...

How does indexing a list with a tuple work?

I am learning Python and came across this example: W = ((0,1,2),(3,4,5),(0,4,8),(2,4,6)) b = ['a','b','c','d','e','f','g','h','i'] for row in W: print b[row[0]], b[row[1]], b[row[2]] which prints: a b c d e f a e i c e g I am trying to figure out why! I get that for example the first time thru the expanded version is: print...
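The example in the question can be read as two steps: each `row` is a tuple of three integers, and each integer is then used as an ordinary index into the list `b`. The sketch below spells that out in Python 3 syntax; the helper name `letters_for` is made up for illustration.

```python
W = ((0, 1, 2), (3, 4, 5), (0, 4, 8), (2, 4, 6))
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']


def letters_for(row):
    """Look up each integer in row as a position in b."""
    return [b[i] for i in row]


# For row == (0, 4, 8): row[0] is 0, row[1] is 4, row[2] is 8,
# so the lookups are b[0], b[4], b[8].
rows = [" ".join(letters_for(row)) for row in W]
for line in rows:
    print(line)
```

So the list is never indexed with the tuple itself; the tuple is first indexed to get an integer, and that integer indexes the list.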

Does SQL Server jump leaves when using a composite clustered index?

Consider the following composite clustered index: CREATE UNIQUE CLUSTERED INDEX ix_mytable ON mytable(a, b) Obviously, a separate index on b will make searching for a particular value of b faster. However, if a separate index on b is not employed, it seems to me that the composite index can still be used to find tuples with a particu...

Apache Indexing question...

Hello all. Wondering how I might achieve this look with Apache's Indexing? Is there a module I can download? ...

How to search for text fragments in a database

Are there any open source or commercial tools available that allow for text fragment indexing of database contents and can be queried from Java? The background to the question is a large MySQL database table with several hundred thousand records, containing several VARCHAR columns. In these columns people would like to search for fragments ...