views: 907
answers: 3

I have a DB table with 25M rows, ~3K each (i.e. ~75GB), which together with the multiple indexes I use (an additional 15-20GB) will not fit entirely in memory (64GB on the machine). A typical query locates ~300 rows through an index, optionally filters them down to ~50-300 rows using other indexes, and finally fetches the matching rows. Response times vary between 20ms on a warm DB and 20 seconds on a cold DB. I have two related questions:

  1. At any given time how can I check what portion (%) of specific tables and indexes is cached in memory?

  2. What is the best way to warm up the cache before opening the DB to queries? E.g. "select *" forces a sequential scan (~15 minutes on a cold DB), but response times following it are still poor. Is there a built-in way to do this instead of via queries?

Thanks, feel free to also reply by email ([email protected])

-- Shaul

+1  A: 

Ad. 1 - I have absolutely no idea.

Ad. 2 - why don't you just pick some queries that you know are important, and run them on the cold server? The more queries you run, the better the warm-up will be.
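
e.g. even something as simple as this, run once per important access path, will pull the relevant index and table pages in (table and column names here are purely illustrative):

select count(*)
from orders
where customer_id between 1 and 100000;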

depesz
+1  A: 

2) I usually solve this by having a log of queries from a live system and replaying them. This warms up the typical parts of the data, rather than the parts that are less frequently used (which would otherwise waste RAM).
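
A minimal sketch of that setup, using PostgreSQL's standard statement logging (yourdb and warmup.sql are placeholders for your database and a file of SELECTs extracted from the log):

# in postgresql.conf on the live server: capture every statement
# (log_min_duration_statement = 0 has much the same effect)
log_statement = 'all'

# later, replay the captured SELECTs against the cold server:
psql -d yourdb -f warmup.sql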

Thomas
The problem is that I cannot guess users' queries - think "Amazon": what will the next 10,000 queries be? So I would have liked to run something that pulls specific tables and indexes into the cache.
Shaul Dar
Don't guess. Take an actual log from the last 5 minutes, or the last 10,000 queries. I've worked at "a leading search engine provider" and this works great. Or, if you have servers that are up and running and you want to warm up a new one, you can mirror the queries to the server that is to be warmed up.
Thomas
+2  A: 

Regarding your first point, the contrib module "pg_buffercache" allows you to inspect the contents of the buffer cache. I like to define this:

-- requires the pg_buffercache contrib module; "util" is just the schema
-- where I keep helper views
create or replace view util.buffercache_hogs as
select case
       -- buffers holding cluster-wide (shared) objects
       when pg_buffercache.reldatabase = 0
            then '- global'
       -- buffers belonging to some other database in the cluster
       when pg_buffercache.reldatabase <> (select pg_database.oid from pg_database where pg_database.datname = current_database())
            then '- database ' || quote_literal(pg_database.datname)
       when pg_namespace.nspname = 'pg_catalog'
            then '- system catalogues'
       when pg_class.oid is null and pg_buffercache.relfilenode > 0
            then '- unknown file ' || pg_buffercache.relfilenode
       -- label TOAST tables and their indexes with the owning table's name
       when pg_namespace.nspname = 'pg_toast' and pg_class.relname ~ '^pg_toast_[0-9]+$'
            then (substring(pg_class.relname, 10)::oid)::regclass || ' TOAST'::text
       when pg_namespace.nspname = 'pg_toast' and pg_class.relname ~ '^pg_toast_[0-9]+_index$'
            then ((rtrim(substring(pg_class.relname, 10), '_index'))::oid)::regclass || ' TOAST index'
       else pg_class.oid::regclass::text
       end as key,
       count(*) as buffers, sum(case when pg_buffercache.isdirty then 1 else 0 end) as dirty_buffers,
       -- fraction of shared_buffers occupied by this object
       round(count(*) / (SELECT pg_settings.setting FROM pg_settings WHERE pg_settings.name = 'shared_buffers')::numeric, 4) as hog_factor
from pg_buffercache
     left join pg_database on pg_database.oid = pg_buffercache.reldatabase
     left join pg_class on pg_class.relfilenode = pg_buffercache.relfilenode
     left join pg_namespace on pg_namespace.oid = pg_class.relnamespace
group by 1
order by 2 desc;
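
Then "select * from util.buffercache_hogs;" shows how many buffers each relation occupies. To answer your first question more directly - what percentage of a specific table or index is cached - something along these lines should work ('my_table' is a placeholder, and relpages is only updated by VACUUM/ANALYZE, so it's an estimate):

select c.relname,
       count(b.bufferid) as buffers_cached,
       c.relpages as total_pages,
       round(100.0 * count(b.bufferid) / greatest(c.relpages, 1), 1) as pct_cached
from pg_class c
     left join pg_buffercache b on b.relfilenode = c.relfilenode
where c.relname = 'my_table'  -- placeholder: a table or index name
group by c.relname, c.relpages;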

Additionally, the "pageinspect" contrib module allows you to access a specific page from a relation, so I suppose you could simply loop through all the pages in a relation, grabbing each one:

select count(get_raw_page('information_schema.sql_features', n))
from generate_series(0,
        (select relpages-1 from pg_class where relname = 'sql_features')) n;

This will load all of information_schema.sql_features into the cache.
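
Presumably the same loop can be used to pull a specific index into the cache, since get_raw_page() reads raw pages from any relation; for example ('my_index' is a placeholder for one of your index names):

select count(get_raw_page('my_index', n))
from generate_series(0,
        (select relpages - 1 from pg_class where relname = 'my_index')) n;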

araqnid