views:

77

answers:

3

Problem

String concatenation is slowing down a query:

date(extract(YEAR FROM m.taken)||'-1-1') d1,
date(extract(YEAR FROM m.taken)||'-1-31') d2

This is realized in code as part of a string, which follows (where the p_ variables are integers, provided as input by end users):

date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2

This part of the query runs in 3.2 seconds with the dates, and 1.5 seconds without, leading me to believe there is ample room for improvement.

The query's total run time is under 10 seconds; am looking to bring the entire query down to about 2 or 3 seconds. A hardware upgrade has already happened. ;-)

Version

PostgreSQL 8.4.4.

Question

What is a better way to create the date (presumably without concatenation)?

Update

This looks promising: PGTYPESdate_mdyjul

Many thanks!

A: 

Another alternative would be to create a function index on the concatenation. This works in more general cases where there isn't a better data type available.

John
@John: Thank you. I don't think I can do that in this case; the documentation shows `((first_name || ' ' || last_name))`, however I am not using the month and day from the table: I am creating the date using the year from the table and the month/day from the use. What columns would I index?
Dave Jarvis
+2  A: 

Sadly, I dont think there is other way to build a date without texts concatenation.

Yes, frankly, I dislike the aproach Postgresql have here. It seems that most date manipulation must be made by extracting date fields as integers, casting them as text, appending them to more texts to create a textual representation of a date, and then telling postgres to parse that text as date... This smells bad to me, I instictively feel that building a date by parsing a string should be only done from textual inputs. But, I think, postgresql ties too strongly the data types handling with their textual representations. And so, for example, if I want to build a date from three integer values (D,M,Y) I MUST (if I'm not mistaken) build a string and make PG parse it. I feel so unclean doing this...

Rant aside, I doubt that this can slow down much your performance.

leonbloy
@leonbloy: Using `date('1960-1-1') d1`, the performance is 2.6s. Using `date(extract(YEAR FROM m.taken)||'-1-1') d1`, the performance is 4.3s.
Dave Jarvis
@leonbloy: Using `date('1960'||'-1-1') d1`, the performance is 3.0s.
Dave Jarvis
Using date('1960-1-1') vs date(extract(YEAR FROM m.taken)||'-1-1') d1 sounds like using a constant is faster than an expression containing a column value - which is hardly any surprise. It doesn't sound like the concatenation is the issue...
rfusca
@rfusca: Fair enough; the problem still remains, though. I have to first convert integers to text then back into a date. Surely there must be a better way?
Dave Jarvis
@Dave Jarvis - I don't think there's a better way at least for that snippet of your query. My two cents.
rfusca
you can use a construct such as `date('1960-01-01'::date + year*'1 year'::interval + month*'1 month'::interval + day*'1 day'::interval)` but it's still going round the houses a bit.
araqnid
@araqnid : That's ok. But more to my point: having to write `d * '1 day'` to express "an interval of 'd' days" makes me cringe.
leonbloy
@leonbloy You can expose some of the C levels functions as per http://www.postgres.cz/index.php/Project_of_UDF_and_its_realization_at_C_for_PostgreSQL . They could take pure integers then.
rfusca
A: 

Wow. I'm suprised, but using the fuctions from this page - specifically the one to build a date value from three integers - which really do nothing more expose the internal C date functions, really is a lot faster. Benchmarking for me show that creating the dates that way was much faster.

First one is the implementation of the "dateserial" function:

postgres=# select to_date(a,1,3) 
postgres-# from generate_series(100,1000000) as v(a);

Time: 1365.851 ms

postgres=# select (a::text||'-01-03')::date from 
postgres-# generate_series(100,1000000) as v(a);

Time: 3454.224 ms

Full Solution

Edit dateserial.c:

#include "postgres.h"
#include "utils/date.h"
#include "utils/nabstime.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Datum dateserial(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1 (dateserial);

Datum
dateserial(PG_FUNCTION_ARGS) {
  int32 p_year = PG_GETARG_INT32(0);
  int32 p_month = PG_GETARG_INT32(1);
  int32 p_day = PG_GETARG_INT32(2);

  PG_RETURN_DATEADT( date2j( p_year, p_month, p_day ) - POSTGRES_EPOCH_JDATE );
}

Edit Makefile:

MODULES = dateserial
PGXS := $(shell pg_config --pgxs)
include $(PGXS)

Edit inst.sh (optional):

#!/bin/bash

make clean && make && strip *.so && make install && /etc/init.d/postgresql-8.4 restart

Run bash inst.sh.

Create a SQL function dateserial:

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
  RETURNS date AS
'$libdir/dateserial', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT
  COST 1;
ALTER FUNCTION dateserial(integer, integer, integer) OWNER TO postgres;

Test the function:

SELECT dateserial( 2007::int, 5, 5 )
rfusca
@rfusca: Yes. String concatenation is a (relatively) huge performance hit. I implemented the dateserial function and am now using it.
Dave Jarvis
@Dave Jarvis - I expected a penalty, just not as much as I saw.
rfusca
@rfusca: `dateserial( extract(YEAR FROM m.taken), 1, 1 )`; I should get another performance boost by using `dateserial( m.taken, 1, 1 )` (extracting the year within the dateserial function).
Dave Jarvis
@rfusca: Extracting the year in the function is only 1/10th of a second improvement; hardly worth the effort.
Dave Jarvis