I need to index up to 500,000 entries for fastest read. The index needs to be rebuilt periodically, on disk. I am trying to decide between a simple file (like a hash on disk) and a single table in an embedded database. I have no need for an RDBMS engine.
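For context, the "hash on disk" option I have in mind is something along the lines of Python's standard dbm module (the file name is arbitrary):

    import dbm

    # A plain file-backed hash table from the standard library.
    # The actual backend (ndbm, gdbm, or dumb) varies by platform.
    with dbm.open('index', 'c') as db:  # 'c' creates the file if missing
        db[b'key42'] = b'value42'
        print(db[b'key42'])  # direct keyed lookup from the file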
I'm assuming you're referring to indexing tables on a relational DBMS (like MySQL, Oracle, or Postgres).
Indexes are secondary data stores that keep a record of a subset of fields for a table in a specific order.
If you create an index, any query whose WHERE clause filters on the indexed fields will perform faster.
However, adding indexes will reduce INSERT performance.
In general, indexes don't need to be rebuilt unless they become corrupted. They should be maintained on the fly by your DBMS.
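For example, with SQLite (an embedded database, accessible through Python's built-in sqlite3 module) you create the index once and the engine maintains it automatically; the table and column names here are just for illustration:

    import sqlite3

    conn = sqlite3.connect('entries.db')  # a single file on disk
    conn.execute('CREATE TABLE IF NOT EXISTS entries (key TEXT, value TEXT)')
    conn.executemany('INSERT INTO entries VALUES (?, ?)',
                     ((f'key{i}', f'value{i}') for i in range(500000)))

    # The index is what makes lookups on "key" fast; it is kept
    # up to date automatically on every INSERT/UPDATE/DELETE.
    conn.execute('CREATE INDEX IF NOT EXISTS idx_key ON entries (key)')
    conn.commit()

    # This WHERE clause can use idx_key instead of scanning the table.
    row = conn.execute('SELECT value FROM entries WHERE key = ?',
                       ('key42',)).fetchone()
    print(row)

Since SQLite is a single file with no server process, it may still satisfy the "no RDBMS engine" constraint in the question.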
If the data doesn't need to be completely up to date, you might also consider a data warehousing tool for OLAP purposes (such as MSOLAP). They can perform lightning-fast read-only queries based on pre-calculated data.
Perhaps BDB (Berkeley DB)? It is a high-performance embedded key-value store that doesn't require a full DBMS engine.
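A minimal sketch of this using the third-party bsddb3 bindings and their legacy hashopen interface (the file name is arbitrary):

    import bsddb3  # Berkeley DB bindings: pip install bsddb3

    # Open (or create) an on-disk hash table; keys and values are bytes.
    db = bsddb3.hashopen('index.bdb', 'c')
    for i in range(500000):
        db[b'key%d' % i] = b'value%d' % i
    db.sync()  # flush the hash file to disk

    print(db[b'key42'])  # keyed lookup straight from the file
    db.close()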
PyTables Pro claims that "for situations that don't require fast updates or deletions, OPSI is probably one of the best indexing engines available". I haven't used it personally, but the F/OSS version of PyTables already gives you good performance:
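A minimal sketch with a recent open-source PyTables release (column indexing has since been merged into the F/OSS version); the schema and file name are illustrative:

    import tables

    class Entry(tables.IsDescription):
        key = tables.Int64Col()
        value = tables.StringCol(64)

    with tables.open_file('entries.h5', mode='w') as h5:
        table = h5.create_table('/', 'entries', Entry)
        row = table.row
        for i in range(500000):
            row['key'] = i
            row['value'] = ('value%d' % i).encode()
            row.append()
        table.flush()

        # Build an on-disk index over "key"; rebuilding periodically
        # just means re-running this after a bulk load.
        table.cols.key.create_index()

        # In-kernel query that can use the index instead of a full scan.
        hits = [r['value'] for r in table.where('key == 42')]
        print(hits)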