views: 105

answers: 4
I have a Ruby on Rails application with a PostgreSQL database; several tables have created_at and updated_at timestamp attributes. When displayed, those dates are formatted in the user's locale; for example, the timestamp 2009-10-15 16:30:00.435 becomes the string 15.10.2009 - 16:30 (the date format for this example being dd.mm.yyyy - hh:mm).

The requirement is that the user must be able to search for records by date, as if they were strings formatted in the current locale. For example, searching for 15.10.2009 would return records with dates on October 15th 2009, searching for 15.10 would return records with dates on October 15th of any year, searching for 15 would return all dates that match 15 (be it day, month or year). Since the user can use any part of a date as a search term, it cannot be converted to a date/timestamp for comparison.

One (slow) way would be to retrieve all records, format the dates, and perform the search on that. This could be sped up by retrieving only the id and dates at first, performing the search, and then fetching the data for the matching records; but it could still be slow for large numbers of rows.

Another (not database-agnostic) way would be to cast/format the dates to the right format in the database with PostgreSQL functions or operators, and have the database do the matching (with the PostgreSQL regexp operators or whatnot).

Is there a way to do this efficiently (without fetching all rows) in a database-agnostic way? Or do you think I am going in the wrong direction and should approach the problem differently?

+1  A: 

Whatever the user enters, you should extract three values: Year, Month and Day, using their locale as a guide. Some values may be empty. (A parsing sketch follows the list below.)

  • If the user enters only Year use WHERE Date BETWEEN 'YYYY-01-01' AND 'YYYY-12-31'
  • If the user enters Year and Month use WHERE Date BETWEEN 'YYYY-MM-01' AND 'YYYY-MM-31' (may need adjustment for 30/29/28)
  • If the user enters the three values use SELECT .... WHERE Date = 'YYYY-MM-DD'
  • If the user enters Month and Day, you'll have to use the 'slow' way
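
A minimal Ruby sketch of that extraction step, assuming the locale's date format is available as a "dd.mm.yyyy"-style string (the helper name and format handling are illustrative, not part of the answer):

# Map each piece of the input to day/month/year by its position in the
# locale's format string. Missing pieces stay nil.
def parse_date_query(input, format = 'dd.mm.yyyy')
  keys   = format.split('.')              # => ["dd", "mm", "yyyy"]
  pieces = input.to_s.strip.split('.')    # "15.10" => ["15", "10"]
  parts  = {}
  keys.zip(pieces).each { |key, piece| parts[key] = piece && piece.to_i }
  { :day => parts['dd'], :month => parts['mm'], :year => parts['yyyy'] }
end

parse_date_query('15.10.2009')  # => { :day => 15, :month => 10, :year => 2009 }
parse_date_query('15.10')       # => { :day => 15, :month => 10, :year => nil }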
Carlos Gutiérrez
A: 

IMHO, the short answer is No. But definitely avoid loading all rows.

A few notes:

  • If you had only simple queries for exact dates or ranges, I would recommend using the ISO format for DATE (YYYY-MM-DD, e.g. 2010-02-01) or DATETIME. But since you seem to need queries like "all years for October 15th", you need custom queries anyway.
  • I suggest you create a "parser" that takes your date query and gives you the corresponding part of the SQL WHERE clause. I am certain that you will end up with fewer than a dozen cases, so you can have an optimal WHERE for each of them. This way you will avoid loading all records.
    • You definitely do not want to do anything locale-specific in the SQL. Therefore, convert the locale-specific input to some standard format in the non-SQL code, then use that to perform your query (basically, separate localization/globalization from query execution).
    • Then you can optimize. If you see that you have a lot of queries just for the year, you might create a COMPUTED COLUMN containing only the YEAR and put an index on it (see the migration sketch below).
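
As a rough illustration of that last point (a hedged sketch, not from the answer: the table and column names are hypothetical, and the "computed" column here is an ordinary column maintained by the application rather than a database-generated one):

class AddCreatedYearToRecords < ActiveRecord::Migration
  def self.up
    # Extracted year, kept in sync by the application (or a trigger),
    # so that year-only searches can hit a plain index.
    add_column :records, :created_year, :integer
    add_index  :records, :created_year
    execute "UPDATE records SET created_year = EXTRACT(YEAR FROM created_at)"
  end

  def self.down
    remove_column :records, :created_year
  end
end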
van
This sounds like a good compromise if I don't want to add (many) computed columns to the database. I'll look into it. Thanks.
Arcturus
+1  A: 

"Database agnostic way" is usually a synonym for "slow way", so the solutions will unlikely be efficient.

Parsing all records on the client side would be the least efficient solution in any case.

You can process your locale string on the client side and form a correct condition for a LIKE, RLIKE or REGEXP_SUBSTR operator. The client side, of course, should be aware of which database the system uses.

Then you apply the operator to a string formed according to the locale with a database-specific formatting function, like this (in Oracle):

SELECT  *
FROM    mytable
WHERE   TO_CHAR(mydate, 'dd.mm.yyyy - hh24.mi') LIKE '15.10%'
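
For the asker's Rails/PostgreSQL setup, the same idea could look roughly like this; PostgreSQL's to_char accepts a comparable format string, and the Record model name is illustrative:

# Format the timestamp in SQL and let the database do the matching.
# Remember to escape LIKE wildcards (% and _) in user-supplied input.
term = '15.10'
Record.find(:all,
  :conditions => ["to_char(created_at, 'DD.MM.YYYY - HH24:MI') LIKE ?", "#{term}%"])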

A more efficient way (which works only in PostgreSQL, though) would be to create a GIN index on the individual date parts:

CREATE INDEX ix_dates_parts
ON      dates
USING   GIN
        (
        (ARRAY
        [
        DATE_PART('year', date)::INTEGER,
        DATE_PART('month', date)::INTEGER,
        DATE_PART('day', date)::INTEGER,
        DATE_PART('hour', date)::INTEGER,
        DATE_PART('minute', date)::INTEGER,
        DATE_PART('second', date)::INTEGER
        ]
        )
        )

and use it in a query:

SELECT  *
FROM    dates
WHERE   ARRAY[11, 19, 2010] <@ (ARRAY
        [
        DATE_PART('year', date)::INTEGER,
        DATE_PART('month', date)::INTEGER,
        DATE_PART('day', date)::INTEGER,
        DATE_PART('hour', date)::INTEGER,
        DATE_PART('minute', date)::INTEGER,
        DATE_PART('second', date)::INTEGER
        ]
        )
LIMIT 10

This will select records having all three numbers (11, 19 and 2010) among their date parts: for example, all records from November 19, 2010, plus all records timestamped 19:11 in 2010, etc.
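
Called from Rails, the same query could be parameterized roughly like this (a sketch only; DateRecord stands for a model mapped to the dates table above, e.g. via set_table_name):

# Bind the parsed date parts into the array-containment query. The array
# expression must match the indexed expression exactly for the GIN index
# above to be usable.
parts = [15, 10, 2009]   # e.g. day, month, year parsed from "15.10.2009"
DateRecord.find(:all,
  :conditions => ["ARRAY[?] <@ ARRAY[
                     DATE_PART('year',   date)::INTEGER,
                     DATE_PART('month',  date)::INTEGER,
                     DATE_PART('day',    date)::INTEGER,
                     DATE_PART('hour',   date)::INTEGER,
                     DATE_PART('minute', date)::INTEGER,
                     DATE_PART('second', date)::INTEGER
                   ]", parts],
  :limit => 10)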

Quassnoi
This seems like a very efficient way. I might not use it for this project because I probably don't have the time to learn all the database-specific features. But I will look into GIN indexes and such. Thank you.
Arcturus
+2  A: 

Building on the answer from Carlos, this should allow all of your searches without full table scans, provided you have indexes on all the date and date-part fields. Function-based indexes would be better for the date-part columns, but I'm not using them here since the solution should not be database-specific.

CREATE TABLE mytable (
    col1 varchar(10),
    -- ...
    inserted_at timestamp,
    updated_at timestamp);

INSERT INTO mytable
VALUES
    ('a', '2010-01-02', NULL),
    ('b', '2009-01-02', '2010-01-03'),
    ('c', '2009-11-12', NULL),
    ('d', '2008-03-31', '2009-04-18');

ALTER TABLE mytable
    ADD inserted_at_month integer,
    ADD inserted_at_day integer,
    ADD updated_at_month integer,
    ADD updated_at_day integer;

-- you will have to find your own way to maintain these values...
UPDATE mytable
SET
    inserted_at_month = date_part('month', inserted_at),
    inserted_at_day = date_part('day', inserted_at),
    updated_at_month = date_part('month', updated_at),
    updated_at_day = date_part('day', updated_at);
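
One way to maintain those values from the Rails side is a save callback; a minimal sketch, assuming the columns live on an ActiveRecord model named after the table above (if updated_at is one of Rails' automatic timestamps, note that it is assigned during the save itself, so a database trigger or an after-save update may be a better fit):

class Mytable < ActiveRecord::Base
  before_save :refresh_date_parts

  private

  # Keep the searchable part columns in sync with the timestamps.
  def refresh_date_parts
    self.inserted_at_month = inserted_at && inserted_at.month
    self.inserted_at_day   = inserted_at && inserted_at.day
    self.updated_at_month  = updated_at && updated_at.month
    self.updated_at_day    = updated_at && updated_at.day
  end
end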

If the user enters only Year use WHERE Date BETWEEN 'YYYY-01-01' AND 'YYYY-12-31'

SELECT *
FROM mytable
WHERE
    inserted_at BETWEEN '2010-01-01' AND '2010-12-31'
    OR updated_at BETWEEN '2010-01-01' AND '2010-12-31';

If the user enters Year and Month use WHERE Date BETWEEN 'YYYY-MM-01' AND 'YYYY-MM-31' (may need adjustment for 30/29/28)

SELECT *
FROM mytable
WHERE
    inserted_at BETWEEN '2010-01-01' AND '2010-01-31'
    OR updated_at BETWEEN '2010-01-01' AND '2010-01-31';

If the user enters all three values use SELECT .... WHERE Date = 'YYYY-MM-DD' (for real timestamps with a time component, compare against a range covering the whole day instead)

SELECT *
FROM mytable
WHERE
    inserted_at = '2009-11-12'
    OR updated_at = '2009-11-12';

If the user enters Month and Day

SELECT *
FROM mytable
WHERE
    inserted_at_month = 3
    OR inserted_at_day = 31
    OR updated_at_month = 3
    OR updated_at_day = 31;

If the user enters Month or Day (you could optimize to not check values > 12 as a month)

SELECT *
FROM mytable
WHERE
    inserted_at_month = 12
    OR inserted_at_day = 12
    OR updated_at_month = 12
    OR updated_at_day = 12;
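
Tying these cases together on the Rails side, a rough dispatcher over the parsed parts might look like this (a sketch under the same assumptions as above; parse_date_query is the illustrative helper from earlier, and Mytable mirrors the example table):

def search_by_date(parts)
  year, month, day = parts[:year], parts[:month], parts[:day]

  if year && month && day
    # Note: for timestamps with a time component, a range covering the
    # whole day would be safer than equality.
    date = Date.new(year, month, day)
    Mytable.find(:all, :conditions => ["inserted_at = ? OR updated_at = ?", date, date])
  elsif year && month
    from = Date.new(year, month, 1)
    to   = (from >> 1) - 1   # last day of that month
    Mytable.find(:all, :conditions =>
      ["inserted_at BETWEEN ? AND ? OR updated_at BETWEEN ? AND ?", from, to, from, to])
  elsif year
    from, to = Date.new(year, 1, 1), Date.new(year, 12, 31)
    Mytable.find(:all, :conditions =>
      ["inserted_at BETWEEN ? AND ? OR updated_at BETWEEN ? AND ?", from, to, from, to])
  else
    # Month and/or day only: fall back to the maintained part columns.
    clauses = []
    values  = []
    { 'month' => month, 'day' => day }.each do |part, value|
      next unless value
      clauses << "inserted_at_#{part} = ? OR updated_at_#{part} = ?"
      values.concat([value, value])
    end
    return [] if clauses.empty?
    Mytable.find(:all, :conditions => [clauses.join(' OR '), *values])
  end
end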
cope360
I think I'll go with this solution for now. Computed columns (maintained by the application) should fulfill the requirements without using any of the database-specific features. Thank you for the examples.
Arcturus