I'm running PostgreSQL 8.3 on a 1.83 GHz Intel Core Duo Mac Mini with 1GB of RAM and Mac OS X 10.5.8. I have a stored a huge graph in my PostgreSQL database. It consists of 1.6 million nodes and 30 million edges. My database schema is like:
CREATE TABLE nodes (id INTEGER PRIMARY KEY,title VARCHAR(256));
CREATE TABLE edges (id INTEGER,link INTEGER,PRIMARY KEY (id,link));
CREATE INDEX id_idx ON edges (id);
CREATE INDEX link_idx ON edges (link);
The data in the table edges looks like
id link
1 234
1 88865
1 6
2 365
2 12
...
So it stores for each node with id x the outgoing link to id y.
The time for searching all the outgoing links is ok:
=# explain analyze select link from edges where id=4620;
QUERY PLAN
---------------------------------------------------------------------------------
Index Scan using id_idx on edges (cost=0.00..101.61 rows=3067 width=4) (actual time=135.507..157.982 rows=1052 loops=1)
Index Cond: (id = 4620)
Total runtime: 158.348 ms
(3 rows)
However, if I search for the incoming links to a node, the database is more than 100 times slower (although the resulting number of incoming links is only 5-10 times higher than the number of outgoing links):
=# explain analyze select id from edges where link=4620;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on edges (cost=846.31..100697.48 rows=51016 width=4) (actual time=322.584..48983.478 rows=26887 loops=1)
Recheck Cond: (link = 4620)
-> Bitmap Index Scan on link_idx (cost=0.00..833.56 rows=51016 width=0) (actual time=298.132..298.132 rows=26887 loops=1)
Index Cond: (link = 4620)
Total runtime: 49001.936 ms
(5 rows)
I tried to force Postgres not to use a Bitmap Scan via
=# set enable_bitmapscan = false;
but the speed of the query for incoming links didn't improve:
=# explain analyze select id from edges where link=1588;
QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using link_idx on edges (cost=0.00..4467.63 rows=1143 width=4) (actual time=110.302..51275.822 rows=43629 loops=1)
Index Cond: (link = 1588)
Total runtime: 51300.041 ms
(3 rows)
I also increased my shared buffers from 24MB to 512MB, but it didn't help. So I wonder why my queries for outgoing and incoming links show such an asymmetric behaviour? Is something wrong with my choice of indexes? Or should I better create a third table holding all the incoming links for a node with id x? But that would be quite a waste of disk space. But since I'm new into SQL databases maybe I'm missing something basic here?