I'm creating custom forum software for a site I'm building, which includes 2 tables (that are relevant to this question): topics
and posts
. A post belongs to a topic, and the topic contains the subject, while each post contains the body.
Here is the basic table structures with the columns relevant to my question:
CREATE TABLE topics (
id bigserial NOT NULL,
title varchar(128) NOT NULL,
created timestamp with time zone NOT NULL default NOW(),
updated timestamp with time zone NOT NULL default NOW(),
PRIMARY KEY (id)
);
CREATE TABLE posts (
id bigserial NOT NULL,
topic_id bigint NOT NULL REFERENCES topics(id) ON DELETE CASCADE,
body text NOT NULL,
created timestamp with time zone NOT NULL default NOW(),
updated timestamp with time zone NOT NULL default NOW(),
PRIMARY KEY (id)
);
Here are my two options on building fulltext indexes.
Option 1: Create dynamic tsvector indexes on title/body columns.
CREATE INDEX topics_title_idx ON topics USING gin(to_tsvector(title));
CREATE INDEX posts_body_idx ON posts USING gin(to_tsvector(body));
Option 2: Create extra columns to hold tsvector-ized title/body data, and add indexes on those.
ALTER TABLE topics ADD COLUMN topics_vector tsvector NOT NULL;
CREATE TRIGGER topics_ins BEFORE INSERT OR UPDATE ON topics FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(title_vector, 'pg_catalog.english', title);
CREATE INDEX topics_title_idx ON topics USING gin(title_vector);
ALTER TABLE posts ADD COLUMN posts_vector tsvector NOT NULL;
CREATE TRIGGER posts_ins BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(body_vector, 'pg_catalog.english', body);
CREATE INDEX posts_body_idx ON posts USING gin(body_vector);
I'm debating between the two since option 1 will save me disk space, but provide slower searches, and option 2 will require additional disk space while providing faster searches.
Let's pretend there are 20 new topics & 100 new posts per day. Which would you choose? What if the number of topics/posts per day way twice that? Five times that? Ten times? Does your decision of one vs. the other change?