I have encountered a problem when designing the table schema for our system.
Here is the situation:
our system has a lot of items ( more than 20 millions ), each item has an unique id, but for each item there can be lots of records. For example for the item with id 1 there are about 5000 records and each record has more than 20 attributes. The needs to be identified by its id and status of one or more of its attributes for use in
select
,update
ordelete
.I want to use innodb
But the problem is when using innodb, there must be an cluster index.
Due to the situation described above it seems had to find a cluster index so I can only use an auto_increment int
as the key
The current design is as follows:
create table record (
item_key int(10) unsigned NOT NULL AUTO_INCREMENT,
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_key`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=latin1
the sql statement:
select * from records
where item_id=1 and attribute_1='a1' and attribute_2 between 10 and 1000;
the update
and delete
statement are similar.
I don't think this a good design, but I can't think of anything else; all suggestions welcome.
Thank you very much!
Sorry if I didn't make the question clear.
What I want to access ( select, update, delete, insert) is the records, not the items. The items have there own attributes, but in the descritions above, the attributes that I mentioned are belongs to the records.
Every item can have many records, like item 1 have about 5000 records.
Every record have 42 attributes, some of them can be NULL, every record has an unique id, this id is unique among different items, but this id is an string not an number
I want to access the records in this way:
A. I will only get(or update or delte) the records that belongs to one specific item at on time or in one query
B. I will get or update the values of all attributes or some specific attributes in the query
C. The attributes that in the condition of the query may not the same as the attributes that I want.
So there could be some sql statements like:
Select attribute_1, attribute_N from record_table_1 where item_id=1 and attribute_K='some value' and attribute_M between 10 and 100
And the reasons that why I think the original design is not good are:
I can't choose an attribute or the record id as the primary key, because it is no use, in every query, I have to assign the item id and some attriubtes as the query condition ( like "where item_id=1 and attribute_1='value1' and attribte_2 between 2 and 3), so I can only use an auto_increment int number as the primary key. The result of this is that each query have to scna two b-trees, and it look like that scan of the secondary index is not effective.
Also compound keys seems useless, because the condition of the query could vary among many attributes.
With the original design, it seems that I have add a lot of indexs to satisfy different queries, otherwise I have to deal with the full talbe scan problem, but it is obviously that too many indexs is not good for update, delete, insert opertaions.
Thank you again for your reply.