views:

24

answers:

2

Active Data Structure

the concept is..

You have a table (one that holds lots of records)..

[data1] (call this the 'active' table)
---------
Data1ID (required)
SysID (required)
createdDate (required)
lastUpdatedDate (required)
[mydata] (user's fields)

you also have n tables holding the same columns as Data1.

[data2] (call these the 'inactive' tables)
---------
Data2ID (req.)
SysID (req.)
createdDate (req.)
lastUpdatedDate (req.)
[mydata] (user's fields)

...

[data(n)]
---------
Data(n)ID (req.)
SysID (req.)
createdDate (req.)
lastUpdatedDate  (req.)
[mydata] (user's fields)

When a user queries for a record, a Trigger is activated which initiates the search to happen on Data1 (the 'active' table).
If a record isn't found, it checks the 'inactive' tables in order down the line.

If a record is found in a 'inactive' table, the record is removed from that table and inserted into Data1 (the 'active' table).

New records go into the 'active' table.

There would never be redundant records across tables sharing the sysid.

Daily Job

  • Each table is assigned a 'Term' date range:

Data1: now - 3 months past.
Data2: 3months - 6 months
Data(n): anything previous to the last defined etc..

  • Any Row modifications are applied to the 'inactive' tables based on the 'active' table.

  • Any records with lastUpdatedDate in the 'active' table that fall within the 'Term' of the 'inactive' tables, are deleted from the 'active' table and moved to the appropriate 'inactive' table. Inactive tables are also checked in the same fasion.

Benefits

My thinking is that this structure could easily be added to existing bloated tables and speed the query performance as well as being a scalable solution by simply redefining your 'Term' ranges and adding 'inactive' tables.

Issues..

Searches on inactive data will obviously take longer.

A: 

Looks like roll your own partitioning. I'd recommend looking at the partitioning options available from your RDBMS vendor. They would not exactly fit this situation, but you may find something that allows partition pruning in most cases.

Other issues

If you are moving data from inactive to active "partitions" on select, the selects are going to have to slow down for data modification.

Data modification tend to take heavier locks then reading. Take for example, Oracle. Two queries are trying to read the same record(s). Normally both would simply read the values, but with this scheme, instead of just reading the record, both are going to try to modify, by moving between tables, and one will block the other.

Shannon Severance
+1  A: 

This is called sharding. SQL Server supports something called horizontal partitioning, which does this kind of thing. You can roll your own, or you can using something like NHibernate which can do this transparently.

You can't run a trigger on a select statement. You could create a view or use a service / repository which handles the database queries and keeps searching until you hit a match, but I'd look at using NHibernate or something that already has some tested support for this.

Jon Galloway