views:

798

answers:

5

Consider two tables:

Transactions, with amounts in a foreign currency:

     Date  Amount
========= =======
 1/2/2009    1500
 2/4/2009    2300
3/15/2009     300
4/17/2009    2200
etc.

ExchangeRates, with the value of the primary currency (let's say dollars) in the foreign currency:

     Date    Rate
========= =======
 2/1/2009    40.1
 3/1/2009    41.0
 4/1/2009    38.5
 5/1/2009    42.7
etc.

Exchange rates can be entered for arbitrary dates - the user could enter them on a daily basis, weekly basis, monthly basis, or at irregular intervals.

In order to translate the foreign amounts to dollars, I need to respect these rules:

A. If possible, use the most recent previous rate; so the transaction on 2/4/2009 uses the rate for 2/1/2009, and the transaction on 3/15/2009 uses the rate for 3/1/2009.

B. If there isn't a rate defined for a previous date, use the earliest rate available. So the transaction on 1/2/2009 uses the rate for 2/1/2009, since there isn't an earlier rate defined.

This works...

Select 
    t.Date, 
    t.Amount,
    ConvertedAmount=(   
        Select Top 1 
            t.Amount/ex.Rate
        From ExchangeRates ex
        Where t.Date > ex.Date
        Order by ex.Date desc
    )
From Transactions t

... but (1) it seems like a join would be more efficient & elegant, and (2) it doesn't deal with Rule B above.

Is there an alternative to using the subquery to find the appropriate rate? And is there an elegant way to handle Rule B, without tying myself in knots?

+4  A: 

You could first do a self-join on the exchange rates which are ordered by date so that you have the start and the end date of each exchange rate, without any overlap or gap in the dates (maybe add that as view to your database - in my case I'm just using a common table expression).

Now joining those "prepared" rates with the transactions is simple and efficient.

Something like:

WITH IndexedExchangeRates AS (           
            SELECT  Row_Number() OVER (ORDER BY Date) ix,
                    Date,
                    Rate 
            FROM    ExchangeRates 
        ),
        RangedExchangeRates AS (             
            SELECT  CASE WHEN IER.ix=1 THEN CAST('1753-01-01' AS datetime) 
                    ELSE IER.Date 
                    END DateFrom,
                    COALESCE(IER2.Date, GETDATE()) DateTo,
                    IER.Rate 
            FROM    IndexedExchangeRates IER 
            LEFT JOIN IndexedExchangeRates IER2 
            ON IER.ix = IER2.ix-1 
        )
SELECT  T.Date,
        T.Amount,
        RER.Rate,
        T.Amount/RER.Rate ConvertedAmount 
FROM    Transactions T 
LEFT JOIN RangedExchangeRates RER 
ON (T.Date > RER.DateFrom) AND (T.Date <= RER.DateTo)

Notes:

  • You could replace GETDATE() with a date in the far future, I'm assuming here that no rates for the future are known.

  • Rule (B) is implemented by setting the date of the first known exchange rate to the minimal date supported by the SQL Server datetime, which should (by definition if it is the type you're using for the Date column) be the smallest value possible.

Lucero
This works perfectly, thanks so much.
Herb Caudill
Is there a reason not to use BETWEEN in the last line?
Herb Caudill
Yes, because BETWEEN would return false results, it's equivalent to `(T.Date >= RER.DateFrom) AND (T.Date <= RER.DateTo)` which is not what you want, since transaction dates equal to echange rate dates will result in 2 rows to be joined then (one of which will use the wrong exchange rate).
Lucero
Should the last line be: `ON (T.Date >= RER.DateFrom) AND (T.Date < RER.DateTo)` (reversing the conditions with the 'or equal' part)?
Jonathan Leffler
It depends, I had it like that at first but then changed it to match your sample, because the output was different.If a transaction and an exchange rates have the same date, shall this exchange rate be used or the previous one? In A, you state "the most recent *previous* date", which sounds like the current condition is correct.
Lucero
+1  A: 

I can't test this, but I think it would work. It uses coalesce with two sub-queries to pick the rate by rule A or rule B.

Select t.Date, t.Amount, 
  ConvertedAmount = t.Amount/coalesce(    
    (Select Top 1 ex.Rate 
        From ExchangeRates ex 
        Where t.Date > ex.Date 
        Order by ex.Date desc )
     ,
     (select top 1 ex.Rate 
        From ExchangeRates  
        Order by ex.Date asc)
    ) 
From Transactions t
Ray
A: 
SELECT 
    a.tranDate, 
    a.Amount,
    a.Amount/a.Rate as convertedRate
FROM
    (

    SELECT 
        t.date tranDate,
        e.date as rateDate,
        t.Amount,
        e.rate,
        RANK() OVER (Partition BY t.date ORDER BY
                         CASE WHEN DATEDIFF(day,e.date,t.date) < 0 THEN
                                   DATEDIFF(day,e.date,t.date) * -100000
                              ELSE DATEDIFF(day,e.date,t.date)
                         END ) AS diff
    FROM 
        ExchangeRates e
    CROSS JOIN 
        Transactions t
         ) a
WHERE a.diff = 1

The difference between tran and rate date is calculated, then negative values ( condition b) are multiplied by -10000 so that they can still be ranked but positive values (condition a always take priority. we then select the minimum date difference for each tran date using the rank over clause.

Paul Creasey
Right, now A seems to be implemented. However - I agree with it not being very elegant, and keep in mind a cross join with this kind of computation will result in slow performance with growing counts of transactions and exchange rates, since the number of rows to be computes is equal to their counts multiplied, e.g. 10 years with one exchange rate per day and 3 transactions per day would give you about 40 million rows to be computed.
Lucero
~Lurcero, yes your absolutely right.
Paul Creasey
+2  A: 

Suppose you had an extended exchange rate table that contained:

 Start Date   End Date    Rate
 ========== ========== =======
 0001-01-01 2009-01-31    40.1
 2009-02-01 2009-02-28    40.1
 2009-03-01 2009-03-31    41.0
 2009-04-01 2009-04-30    38.5
 2009-05-01 9999-12-31    42.7

We can discuss the details of whether the first two rows should be combined, but the general idea is that it is trivial to find the exchange rate for a given date. This structure works with the SQL 'BETWEEN' operator which includes the ends of the ranges. Often, a better format for ranges is 'open-closed'; the first date listed is included and the second is excluded. Note that there is a constraint on the data rows - there are (a) no gaps in the coverage of the range of dates and (b) no overlaps in the coverage. Enforcing those constraints is not completely trivial (polite understatement - meiosis).

Now the basic query is trivial, and Case B is no longer a special case:

SELECT T.Date, T.Amount, X.Rate
  FROM Transactions AS T JOIN ExtendedExchangeRates AS X
       ON T.Date BETWEEN X.StartDate AND X.EndDate;

The tricky part is creating the ExtendedExchangeRate table from the given ExchangeRate table on the fly. If it is an option, then revising the structure of the basic ExchangeRate table to match the ExtendedExchangeRate table would be a good idea; you resolve the messy stuff when the data is entered (once a month) instead of every time an exchange rate needs to be determined (many times a day).

How to create the extended exchange rate table? If your system supports adding or subtracting 1 from a date value to obtain the next or previous day (and has a single row table called 'Dual'), then a variation on this will work (without using any OLAP functions):

CREATE TABLE ExchangeRate
(
    Date    DATE NOT NULL,
    Rate    DECIMAL(10,5) NOT NULL
);
INSERT INTO ExchangeRate VALUES('2009-02-01', 40.1);
INSERT INTO ExchangeRate VALUES('2009-03-01', 41.0);
INSERT INTO ExchangeRate VALUES('2009-04-01', 38.5);
INSERT INTO ExchangeRate VALUES('2009-05-01', 42.7);

First row:

SELECT '0001-01-01' AS StartDate,
       (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual;

Result:

0001-01-01  2009-01-31      40.10000

Last row:

SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
       '9999-12-31' AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;

Result:

2009-05-01  9999-12-31      42.70000

Middle rows:

SELECT X1.Date     AS StartDate,
       X2.Date - 1 AS EndDate,
       X1.Rate     AS Rate
  FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
       ON X1.Date < X2.Date
 WHERE NOT EXISTS
       (SELECT *
          FROM ExchangeRate AS X3
         WHERE X3.Date > X1.Date AND X3.Date < X2.Date
        );

Result:

2009-02-01  2009-02-28      40.10000
2009-03-01  2009-03-31      41.00000
2009-04-01  2009-04-30      38.50000

Note that the NOT EXISTS sub-query is rather crucial. Without it, the 'middle rows' result is:

2009-02-01  2009-02-28      40.10000
2009-02-01  2009-03-31      40.10000    # Unwanted
2009-02-01  2009-04-30      40.10000    # Unwanted
2009-03-01  2009-03-31      41.00000
2009-03-01  2009-04-30      41.00000    # Unwanted
2009-04-01  2009-04-30      38.50000

The number of unwanted rows increases dramatically as the table increases in size (for N > 2 rows, there are (N-2) * (N - 3) / 2 unwanted rows, I believe).

The result for ExtendedExchangeRate is the (disjoint) UNION of the three queries:

SELECT DATE '0001-01-01' AS StartDate,
       (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual
UNION
SELECT X1.Date     AS StartDate,
       X2.Date - 1 AS EndDate,
       X1.Rate     AS Rate
  FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
       ON X1.Date < X2.Date
 WHERE NOT EXISTS
       (SELECT *
          FROM ExchangeRate AS X3
         WHERE X3.Date > X1.Date AND X3.Date < X2.Date
        )
UNION
SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
       DATE '9999-12-31' AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;

On the test DBMS (IBM Informix Dynamic Server 11.50.FC6 on MacOS X 10.6.2), I was able to convert the query into a view but I had to stop cheating with the data types - by coercing the strings into dates:

CREATE VIEW ExtendedExchangeRate(StartDate, EndDate, Rate) AS
    SELECT DATE('0001-01-01')  AS StartDate,
           (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
           (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
    FROM Dual
    UNION
    SELECT X1.Date     AS StartDate,
           X2.Date - 1 AS EndDate,
           X1.Rate     AS Rate
      FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
           ON X1.Date < X2.Date
     WHERE NOT EXISTS
           (SELECT *
              FROM ExchangeRate AS X3
             WHERE X3.Date > X1.Date AND X3.Date < X2.Date
            )
    UNION 
    SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
           DATE('9999-12-31') AS EndDate,
           (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
    FROM Dual;
Jonathan Leffler
@Jonathan, same basic idea I had (and I guess that you were working on it when I posted my answer), however I came up with a different (ans probably more efficient) approach to compute the ExtendedExchangeRate (called RangedExchangeRate in my code, and implementing the open-close date range) by actually using only one self-join. Keeping only one date in the table makes sense because - assuming correct computation - you will never end up with an inconsistent data structure which may break the following join with the transactions.
Lucero
@Lucero: As I noted, I didn't use any of the OLAP operations (such as OVER) in the SQL. Although my example uses inclusive ranges, I suspect that it would be better written using open-closed ranges. I stand by 'it would be better to make the ExtendedExchangeRate (XXR) table into the real table' (with the existing ExchangeRate table then being a simple projection view of the XXR table); the likely imbalance between the few modify operations on XXR and the many select operations on it would be be decisive for me. The actual operations for inserting a new value can be handled by a trigger.
Jonathan Leffler
@Lucero: We _are_ in agreement that the query is simpler when you have the explicit date range information available in an extended exchange rate table. And it is certainly possible to do it your way if the DBMS supports notations such as OVER.
Jonathan Leffler
@Jonathan: Thanks - this is very helpful. I am using SQL Server.
Herb Caudill
A: 

Many solutions will work. You should really find the one that works best (fastest) for your workload: do you search usually for one Transaction, list of them, all of them?

The tie-breaker solution given your schema is:

SELECT      t.Date,
            t.Amount,
            r.Rate
            --//add your multiplication/division here

FROM        "Transactions" t

INNER JOIN  "ExchangeRates" r
        ON  r."ExchangeRateID" = (
                        SELECT TOP 1 x."ExchangeRateID"
                        FROM        "ExchangeRates" x
                        WHERE       x."SourceCurrencyISO" = t."SourceCurrencyISO" --//these are currency-related filters for your tables
                                AND x."TargetCurrencyISO" = t."TargetCurrencyISO" --//,which you should also JOIN on
                                AND x."Date" <= t."Date"
                        ORDER BY    x."Date" DESC)

You need to have the right indices for this query to be fast. Also ideally you should not have a JOIN on "Date", but on "ID"-like field (INTEGER). Give me more schema info, I will create an example for you.

van
@Van - I do have a numeric primary key field on every table, e.g. TransactionID and ExchangeRateID.
Herb Caudill
updated the query; try it out
van
the solution above is only for the A) rule. Did not notice the B) initially. Does not make sense to add now since you have the answer that suits you.
van