views:

50

answers:

1

Some days ago I started using partitioning to make it possible to work with large amount of data.

I created master table as documentation suggest, created some child tables which inherit from the master one and then added a constraint for each child table to define the allowed key value. Constraint was like:

CHECK (site_id = 'google.com')

Now here is my question:

While reading the documentation I came across the following suggestion: "For each partition, create an index on the key column(s)"

Should I do that if the key will be the same for the whole partition? Will there be any benefits of having an index on site_id column?

When I first read it, I decided that if all rows have the same key (say site_id = 'google_com') then there is no need in index.


Here is SQL illustraing DB structure and typical usage:

CREATE TABLE site ( site_id varchar(50) PRIMARY KEY, descr text );

-- req is a master table
CREATE TABLE req ( site_id varchar(50), timestamp timestamp );

-- now let's create a few child tables

INSERT INTO site (site_id) VALUES ('google_com'); CREATE TABLE req_google_com ( CHECK (site_id = 'google_com') ) INHERITS (req);

INSERT INTO site (site_id) VALUES ('twitter_com'); CREATE TABLE req_twitter_com ( CHECK (site_id = 'twitter_com') ) INHERITS (req);

-- and now write some requests:
INSERT INTO req_google_com VALUES ('google_com', now()), ('google_com', now());

A: 

Yes, if all the data in the table has the same column value, you do not need an index. In fact, you do not really need the column at all.

If you do have an index, and it does get used (for example as the result of a WHERE clause), this may actually slow down things. Going through an index to filter rows is only benificial when you end up selected a small subset of the whole table (some say ten percent at most). Accessing the index and then ending up with every single row in there is a waste of energy.

While reading the documentation I came across the following suggestion: "For each partition, create an index on the key column(s)"

Are you sure the documentation was talking about the partition key? Not the global primary key?

Thilo
I have just edited original post to show the DB structure. I can't get rid of the site_id column. Although the value of site_id is always the same for a given fragment, it is different for another and will be used in WHERE clause to access data in a transparent way (by select from master req table).
Prikrutil
I have read it as "For each partition, create an index on the site_id column". Sorry, I don't understand the difference between "global primary key" and "partition key"
Prikrutil