This is more of a generic SQL problem, but I'm using Firebird 2.5, in case anyone knows of a Firebird/InterBase-specific optimization. First, here is a simplified example schema to illustrate the issue I'm trying to solve:
CREATE TABLE users
(
    id INTEGER PRIMARY KEY,
    name VARCHAR(16)
);
CREATE TABLE data_set
(
    id INTEGER PRIMARY KEY,
    name VARCHAR(64)
);
CREATE UNIQUE INDEX data_set_name_idx ON data_set(name);
CREATE TABLE data
(
    user_id INTEGER,
    data_set_id INTEGER,
    data BLOB,
    PRIMARY KEY(user_id, data_set_id)
);
CREATE INDEX data_user_id_idx ON data(user_id);
CREATE INDEX data_data_set_id_idx ON data(data_set_id);
The query I'm trying to run is as follows:
SELECT users.name, data_set.name, data.data
FROM data
JOIN users ON users.id = data.user_id
JOIN data_set ON data_set.id = data.data_set_id
WHERE data.user_id = XXX
ORDER BY data_set.name;
Here 'XXX' is replaced with the *user_id* I want. In other words, I'm selecting all the rows from the data table that are owned by a particular user, sorted by the *data_set* name.
This works as it is, but the data table has over a billion rows in it and the *data_set* table is not small either. The result set for a single user id may be many hundreds of millions of rows. For the ORDER BY to work, the database has to create a massive amount of temporary sort data, which is incredibly slow and uses a lot of disk space. Without the ORDER BY the query is fast, but obviously the results are not sorted the way I need.
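For anyone who wants to check where the sort comes from, this is roughly how the plan can be inspected from isql. It is only a sketch: the literal 42 is a made-up stand-in for XXX, and I'm deliberately not pasting plan output. A plan that begins with SORT means the ordering is done by an external sort rather than by walking an index.

```sql
-- isql session: print the chosen plan without executing the query.
SET PLANONLY ON;

SELECT users.name, data_set.name, data.data
FROM data
JOIN users ON users.id = data.user_id
JOIN data_set ON data_set.id = data.data_set_id
WHERE data.user_id = 42   -- hypothetical user id standing in for XXX
ORDER BY data_set.name;

-- Switch back to normal execution.
SET PLANONLY OFF;
```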
One solution would be to copy the *data_set.name* values into a varchar column in data. That column could then be indexed and would be quick to sort on. The problem with this approach is that it duplicates a lot of data and would make the database absolutely massive.
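To make that first idea concrete, here is a minimal sketch of the denormalized variant. The column name data_set_name, the index name data_user_name_idx, and the user id 42 are all made up for illustration. Whether the Firebird 2.5 optimizer would actually read the rows in index order for this query is an assumption I have not verified.

```sql
-- Hypothetical extra column that duplicates data_set.name in every data row.
ALTER TABLE data ADD data_set_name VARCHAR(64);

-- Backfill from the lookup table (a very long-running statement at this size).
UPDATE data
SET data_set_name = (SELECT ds.name
                     FROM data_set ds
                     WHERE ds.id = data.data_set_id);

-- Composite index so one user's rows could be read already ordered by name.
CREATE INDEX data_user_name_idx ON data(user_id, data_set_name);

-- The data_set join is no longer needed for the name or the ordering.
SELECT u.name, d.data_set_name, d.data
FROM data d
JOIN users u ON u.id = d.user_id
WHERE d.user_id = 42   -- hypothetical user id standing in for XXX
ORDER BY d.user_id, d.data_set_name;
```

Besides the extra storage, the copied names would have to be kept in sync whenever a *data_set* row is renamed, for example with a trigger.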
Another solution would be something like an indexed view or an indexed computed column. As far as I know, neither of those is supported by Firebird.
Any other ideas?