ansaurus

Question

Answer 1

A:

4GB RAM -> 6GB+: you have 100M records, which isn't large, but on a desktop machine memory could matter. If this is not a desktop, I'm not sure why you would have such a small amount of memory.
AND "DateID">20100610 AND "DateID"<20100618; -> AND DateID BETWEEN 20100611 AND 20100617;
Create an index on DateID.
Get rid of all the double quotes around field names.
Instead of a VarChar, make RoadID a text field.
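
As a rough sketch of the index and BETWEEN suggestions, something like the following could be tried. The table and column names come from this thread; the index name "TrafficData_DateID_Idx" is made up for illustration, DateID is assumed to be an integer column, and the SELECT is only a placeholder because the original query is not shown here. The double quotes are kept in the sketch because PostgreSQL folds unquoted identifiers to lower case, so dropping them only works if the table and columns were created with lower-case names.

```sql
-- Hypothetical index name; "TrafficData" and "DateID" are taken from the thread.
CREATE INDEX "TrafficData_DateID_Idx" ON "TrafficData" ("DateID");

-- Refresh planner statistics so the new index is costed with current data.
ANALYZE "TrafficData";

-- BETWEEN is inclusive on both ends, so for integer DateID values this matches
-- the same rows as "DateID" > 20100610 AND "DateID" < 20100618.
EXPLAIN ANALYZE
SELECT *
FROM "TrafficData"
WHERE "DateID" BETWEEN 20100611 AND 20100617;
```

Comparing the EXPLAIN ANALYZE output before and after shows whether the planner uses the new index and where the time is actually spent.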

vol7ron 2010-10-24 16:20:50

A RAM upgrade isn't an option, unfortunately. More RAM also shouldn't really make any difference, since the index files are already "small" enough to load into memory. I've also tried DateID BETWEEN; sadly, it made no difference.

TroutKing 2010-10-24 17:53:06

well you also want to create an index on the DateID; see the additions

vol7ron 2010-10-24 19:09:45

Right, I've added an index on DateID. Sadly no difference.

TroutKing 2010-10-24 20:24:08

Answer 2

+2 A:

The slow part is obviously fetching the data from the tables, since the index access seems to be very fast. You might either optimize your RAM usage parameters (see http://wiki.postgresql.org/wiki/Performance_Optimization and http://www.varlena.com/GeneralBits/Tidbits/perf.html), or optimize the layout of the data in the table by issuing a CLUSTER command (see http://www.postgresql.org/docs/8.3/static/sql-cluster.html).

CLUSTER "TrafficData" USING "RoadDate_Idx";

should do it.
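
A minimal sketch of the steps around that command, assuming nothing beyond the table name used above: the current memory settings can be inspected before changing them, and planner statistics should be refreshed once the table has been rewritten. No particular values are recommended here, since suitable settings depend on the machine.

```sql
-- Inspect the memory-related settings discussed in the tuning guides.
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW work_mem;

-- CLUSTER rewrites the table once and holds an exclusive lock while it runs.
-- It does not keep later inserts in order, and the planner relies on
-- ordering statistics, so analyze the table after it finishes.
ANALYZE "TrafficData";
```

Re-running the slow query under EXPLAIN ANALYZE afterwards shows whether the heap scan now touches fewer blocks.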

Daniel 2010-10-24 18:43:23

I tried this out yesterday, but had to give up when the CLUSTER command was still running after eleven hours.

TroutKing 2010-10-26 08:56:34

Holla... Maybe a simple sorted select into another table might be faster? If CLUSTER is that slow, it really seems to be an I/O problem, and the data does not seem to match the index at all. If you can, try this on server-class hardware...
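
A sketch of what such a sorted copy might look like. It assumes that "RoadDate_Idx" covers ("RoadID", "DateID") in that order, which this thread does not confirm; "TrafficData_Sorted" and "RoadDate_Sorted_Idx" are hypothetical names.

```sql
-- Assumption: the desired physical order is by "RoadID", then "DateID".
-- "TrafficData_Sorted" is a hypothetical name for the new table.
CREATE TABLE "TrafficData_Sorted" AS
SELECT *
FROM "TrafficData"
ORDER BY "RoadID", "DateID";

-- CREATE TABLE AS copies only the rows, so the index has to be rebuilt on the copy.
CREATE INDEX "RoadDate_Sorted_Idx"
    ON "TrafficData_Sorted" ("RoadID", "DateID");

-- Collect statistics for the new table before querying it.
ANALYZE "TrafficData_Sorted";
```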

Daniel 2010-10-26 14:09:30

Answer 3

A:

Adding to Daniel's answer, the cluster operation is a one-off process that rearranges the data on disk. The intent is to get your 2000 result rows from fewer disk blocks.

As this is dummy data, being used to find out how you can quickly query it, I'd recommend reloading it, in a pattern closer to how it will be loaded as it is generated. I imagine that the data is generated one day at a time, which will effectively result in strong correlation between DateID and the location on disk. If that is the case, then I'd either cluster by DateID, or split your test data into 365 separate loads, and reload it.

Without that, and having randomly generated data, you're most likely having to perform over 2000 seeks of your disk head.
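
One way to check how well the on-disk order currently matches a column is the correlation statistic PostgreSQL keeps in the pg_stats view. This is a sketch using the table and column names from this thread; it assumes ANALYZE has been run so the statistics exist.

```sql
-- correlation ranges from -1 to +1; values near either end mean the physical
-- row order closely follows that column, values near 0 mean it is scattered.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'TrafficData'
  AND attname IN ('DateID', 'RoadID');
```

With randomly generated data the value for DateID would be expected to sit near zero, which fits the many-seeks explanation above; after loading day by day or clustering by DateID it should move towards 1.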

I'd also check that anything else you're running on Windows 7 isn't adding time to those reads that you don't need, such as ensuring that the blocks read do not contain virus signatures, or concurrently performing an automatically scheduled disk defragmentation (resulting in the disk head hardly ever being anywhere close to where it was last time a database block was read).

Stephen Denne 2010-10-26 22:21:26

Very slow bitmap heap scan in Postgres
