I have encountered a problem when designing the table schema for our system.

Here is the situation:

  1. Our system has a lot of items (more than 20 million). Each item has a unique id, but each item can have many records. For example, the item with id 1 has about 5000 records, and each record has more than 20 attributes. A record needs to be identified by its item id and the state of one or more of its attributes for use in select, update or delete.

  2. I want to use InnoDB.

But the problem is that with InnoDB there must be a clustered index. Given the situation described above, it seems hard to find a natural clustered index, so I can only use an auto_increment int as the key.

The current design is as follows:

create table record (
item_key int(10) unsigned NOT NULL AUTO_INCREMENT,
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_key`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=latin1

The SQL statement:

select * from record
  where item_id=1 and attribute_1='a1' and attribute_2 between 10 and 1000;

The update and delete statements are similar.

I don't think this is a good design, but I can't think of anything else; all suggestions are welcome.

Thank you very much!


Sorry if I didn't make the question clear.

  1. What I want to access (select, update, delete, insert) is the records, not the items. The items have their own attributes, but in the description above, the attributes I mentioned belong to the records.

  2. Every item can have many records; for example, item 1 has about 5000 records.

  3. Every record has 42 attributes, some of which can be NULL. Every record has a unique id, which is unique across different items, but this id is a string, not a number.

  4. I want to access the records in this way:

A. I will only get (or update or delete) the records that belong to one specific item at one time or in one query.

B. I will get or update the values of all attributes or of some specific attributes in the query.

C. The attributes in the condition of the query may not be the same as the attributes that I want to retrieve.

So there could be some sql statements like:

Select attribute_1, attribute_N from record_table_1 where item_id=1 and attribute_K='some value' and attribute_M between 10 and 100

The reasons why I think the original design is not good are:

  1. I can't choose an attribute or the record id as the primary key, because it would be of no use: in every query I have to specify the item id and some attributes as the query condition (like "where item_id=1 and attribute_1='value1' and attribute_2 between 2 and 3"), so I can only use an auto_increment int as the primary key. The result is that each query has to scan two B-trees, and the scan of the secondary index does not appear to be efficient.

  2. Compound keys also seem useless, because the condition of the query can vary among many attributes.

  3. With the original design, it seems that I have to add a lot of indexes to satisfy the different queries; otherwise I have to deal with full table scans. But obviously, too many indexes are bad for update, delete and insert operations.

Thank you again for your reply.

A: 

If you want a clustered index and don't want to use the MyISAM engine, it sounds like you should use two tables: one for the unique properties of the items and the other for each instance of the item (with the specified attributes).
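
A minimal sketch of such a split, not a definitive schema. The names `item`, `item_name` and `record_id` are hypothetical (they do not appear in the question), and only two of the record attributes are shown:

```sql
-- One row per item: properties that belong to the item itself.
CREATE TABLE item (
  item_id   INT UNSIGNED NOT NULL,
  item_name VARCHAR(64)  NOT NULL,           -- hypothetical item-level property
  PRIMARY KEY (item_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- One row per record (instance) of an item.
CREATE TABLE record (
  item_id     INT UNSIGNED NOT NULL,
  record_id   CHAR(32)     NOT NULL,          -- hypothetical: the record's own string id
  attribute_1 CHAR(32)     NOT NULL,
  attribute_2 INT UNSIGNED NOT NULL,
  -- InnoDB uses the primary key as the clustered index, so a key that
  -- starts with item_id stores all records of one item next to each other.
  PRIMARY KEY (item_id, record_id),
  CONSTRAINT fk_record_item FOREIGN KEY (item_id) REFERENCES item (item_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```

Whether `(item_id, record_id)` is the right primary key depends on the real data; it is used here only because a primary key has to be unique.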

igelkott
A: 

You're right, the schema is wrong. Having attributes 1..20 as fields within the table is not the way to do this; you need a separate table to store this information. This table would have the item_key from this record together with its own key and a value, and therefore this second table would have indexes that allow much better searching.

Something like the following:

[image: simple database diagram]

Looking at the diagram, it is obvious that something is wrong because the record table is too empty. It doesn't look right to me, so maybe I'm missing something in the original question.
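
A minimal sketch of the separate attribute table described above. The table and column names (`record_attribute`, `attr_name`, `attr_value`) and the index names are assumptions made for illustration, not taken from the diagram:

```sql
-- The record table keeps only the keys.
CREATE TABLE record (
  item_key INT UNSIGNED NOT NULL AUTO_INCREMENT,
  item_id  INT UNSIGNED NOT NULL,
  PRIMARY KEY (item_key),
  KEY idx_item (item_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- One row per (record, attribute) pair.
CREATE TABLE record_attribute (
  item_key   INT UNSIGNED NOT NULL,   -- refers to record.item_key
  attr_name  VARCHAR(32)  NOT NULL,   -- hypothetical, e.g. 'attribute_1'
  attr_value VARCHAR(64)  NOT NULL,   -- hypothetical: every value stored as text
  PRIMARY KEY (item_key, attr_name),
  KEY idx_name_value (attr_name, attr_value)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Filtering on two attributes needs one join per attribute.
SELECT r.item_key
FROM record r
JOIN record_attribute a1
  ON a1.item_key = r.item_key AND a1.attr_name = 'attribute_1'
JOIN record_attribute a2
  ON a2.item_key = r.item_key AND a2.attr_name = 'attribute_2'
WHERE r.item_id = 1
  AND a1.attr_value = 'a1'
  AND CAST(a2.attr_value AS UNSIGNED) BETWEEN 10 AND 1000;
```

In this sketch each additional condition adds another join, and numeric comparisons need a cast because all values are stored as text.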

Compound Keys

I think maybe you are looking for a compound key rather than a clustered index, which is a different thing. You can achieve this with:

create table record (
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_id`,`attribute_1`,`attribute_2`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=latin1
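
As a rough check, and assuming the table above has been created and populated, the query from the question can be run through EXPLAIN to see which index the optimizer actually picks:

```sql
-- In InnoDB the primary key is the clustered index, so rows are stored in
-- (item_id, attribute_1, attribute_2) order and this condition can be
-- answered as a range scan on the primary key.
EXPLAIN
SELECT *
FROM record
WHERE item_id = 1
  AND attribute_1 = 'a1'
  AND attribute_2 BETWEEN 10 AND 1000;

-- Caveat: a primary key must be unique, so a second row with the same
-- (item_id, attribute_1, attribute_2) is rejected with a duplicate-key error.
```

This only helps queries that filter on the leading columns of the key; conditions on other attributes still need their own indexes or a scan of the item's rows.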
Richard Harrison
Martin
I'm aware of the limitations of EAV, and you're right that it's not a terribly good general pattern and can defeat many of the features that should be designed into a good schema. However, I still think it has a place where the attributes are not pre-defined. In this case I'm not sure, because the OP is vague, so I'll revise to better explain what I mean.
Richard Harrison
Hi Richard, thank you for your advice. Sorry if I didn't make myself clear. Actually, what I want to access is the records, not the items. Each record has a unique id and belongs to one item, and every record has more than 20 attributes.
One item can have many records. I want to get the values of some attributes of the records that belong to one specific item, with conditions on other attributes, such as: select attribute_1, attribute_2, attribute_3 from records where item_id=1 and attribute_4='hello' and attribute_5 between 3 and 6. Is the EAV model suitable for this situation? It seems I would have to access many tables.
And the problem is that there are too many items (more than 20 million), so even if I split the record table into several smaller tables with the same structure (such as ids between 1 and 10000 in table1, ids between 10001 and 20000 in table2), the original design still does not seem good, because there may be too many records that belong to the same item in one table.