tags:

views:

98

answers:

3

I have a list of values and dates for stock ticker symbols and want to calculate quarterly returns in SQL.

CREATE TABLE `symbol_details` (
  `symbol_header_id` INT(11) DEFAULT NULL,
  `DATE` DATETIME DEFAULT NULL,
  `NAV` DOUBLE DEFAULT NULL,
  `ADJ_NAV` DOUBLE DEFAULT NULL)

For fixed quarter start and end dates that works fine:

set @quarterstart='2008-12-31';
set @quarterend='2009-3-31';

select sha, (100*(aend-abegin)/abegin) as q1_returns from
(select symbol_header_id as sha, ADJ_NAV as abegin from symbol_details
  where date=@quarterstart) as a,
(select  symbol_header_id as she, ADJ_NAV as aend from symbol_details
  where date=@quarterend) as b
where sha=she;

This calculates all quarterly returns for all symbols. Sometimes that end of the quarter is a non-trading day, or the stock ceased operation, so I want to get the closest start and end dates to the quarter start and end dates.

The solution would be to somehow get only the one start and one end of quarter value for each symbol_header_id by some GROUP BY statement, like (1)

SET @quarterstart = '2009-03-01';
SET @quarterend = '2009-4-31';
SELECT  symbol_header_id, DATE, ADJ_NAV AS aend FROM symbol_details
WHERE
 DATE BETWEEN @quarterstart AND @quarterend
 AND symbol_header_id BETWEEN 18540 AND 18550
GROUP BY symbol_header_id asc;

This gives the ADJ_NAV value for the date closest to the start of the quarter for each of the symbol_header_id values.

So finally, (2)

SET @quarterstart = '2008-12-31';
SET @quarterend = '2009-3-31';

SELECT sh1, a.date, b.date, aend, abegin, 
       (100*(aend-abegin)/abegin) AS quarter_returns FROM
(SELECT  symbol_header_id sh1, DATE, ADJ_NAV AS abegin FROM symbol_details
WHERE
 DATE BETWEEN @quarterstart AND @quarterend
 GROUP BY symbol_header_id DESC) a,

(SELECT  symbol_header_id sh2, DATE, ADJ_NAV AS aend FROM symbol_details
WHERE
 DATE BETWEEN @quarterstart AND @quarterend
 GROUP BY symbol_header_id ASC) b
WHERE sh1 = sh2;

should calculate the quarterly returns for each symbols.

Unfortunately this does not work. For some reason when I restrict the IDs like in (1) the correct start and end dates are used, but when the "AND symbol_header_id BETWEEN 18540 AND 18550" statements are removed, the same start and end dates come up. Why????

The answer with unrolled JOINs is:

SET @quarterstart = '2008-12-31';
SET @quarterend = '2009-3-31';


SELECT tq.sym                                AS sym,
       (100*(alast.adj_nav - afirst.adj_nav)/afirst.adj_nav) AS quarterly_returns
FROM

  -- First, determine first traded days ("ftd") and last traded days
  -- ("ltd") in this quarter per symbol
  (SELECT symbol_header_id AS sym,
          MIN(DATE)      AS ftd,
          MAX(DATE)      AS ltd
   FROM symbol_details
   WHERE DATE BETWEEN @quarterstart AND @quarterend
   GROUP BY 1) tq

  JOIN symbol_details afirst
  -- Second, determine adjusted NAV for "ftd" per symbol (see WHERE)
   ON afirst.DATE BETWEEN @quarterstart AND @quarterend
   AND afirst.symbol_header_id = tq.sym

  JOIN
  -- Finally, determine adjusted NAV for "ltd" per symbol (see WHERE)
    symbol_details alast
    ON alast.DATE BETWEEN @quarterstart AND @quarterend
    AND alast.symbol_header_id = tq.sym

WHERE
  afirst.date = tq.ftd
  AND
  alast.date  = tq.ltd;
A: 

Are you sure that you're using GROUP BY ... ASC in your queries? GROUP BY groups those rows which satisfy some criteria and using ASC/DESC makes no sense. By the way, in your case your criterion is equality of symbol_header_id column which, assuming that "id" means that it is a key, effectively makes groups consisting of a single row.

Anyway, if your groups are really groups, you may have problems because you're selecting not only grouping attribute and aggregate functions, but also some other attributes. Their value is unpredictable in this case.

Dmitry
+1  A: 

UPDATE:

Incorporating @Hogan's suggestion fully ... and testing it. :) This EXPLAINs much simpler, and should be a better performer.

Again, presuming ANSI_QUOTES behavior:

SELECT tq.sym AS sym,
       (100*(alast.adj_nav - afirst.adj_nav)/afirst.adj_nav) AS quarterly_returns
FROM
  (SELECT symbol_header_id AS sym, -- find first/last traded day ("ftd", "ltd")
          MIN("date") AS ftd,
          MAX("date") AS ltd
   FROM symbol_details
   WHERE "date" BETWEEN @quarterstart AND @quarterend
   GROUP BY 1) tq
JOIN symbol_details afirst         -- JOIN for ADJ_NAV on first traded day
  ON tq.sym = afirst.symbol_header_id
     AND
     tq.ftd = afirst."date"
JOIN symbol_details alast          -- JOIN for ADJ_NAV on last traded day
  ON tq.sym = alast.symbol_header_id
     AND
     tq.ltd = alast."date"

ORIGINAL:

Presuming SET SESSION sql_mode = 'ANSI_QUOTES', try this:

SELECT tq.sym                                AS sym,
       (100*(adj_end - adj_begin)/adj_begin) AS quarterly_returns
FROM
  -- First, determine first traded days ("ftd") and last traded days
  -- ("ltd") in this quarter per symbol
  (SELECT symbol_header_id AS sym,
          MIN("date")      AS ftd,
          MAX("date")      AS ltd
   FROM symbol_details
   WHERE "date" BETWEEN @quarterstart AND @quarterend
   GROUP BY 1) tq
  JOIN
  -- Second, determine adjusted NAV for "ftd" per symbol (see WHERE)
  (SELECT symbol_header_id AS sym,
          "date"           AS adate,
          adj_nav          AS adj_begin
   FROM symbol_details) afirst
  ON afirst.sym = tq.sym
  JOIN
  -- Finally, determine adjusted NAV for "ltd" per symbol (see WHERE)
  (SELECT symbol_header_id AS sym,
          "date"           AS adate,
          adj_nav          AS adj_end
   FROM symbol_details) alast
  ON alast.sym = tq.sym
WHERE
  afirst.adate = tq.ftd
  AND
  alast.adate  = tq.ltd;
pilcrow
This looks really great, but when I try this on my MySQL DB with 2 million records it never finishes. Is there a performance optimization?
Martin Stein
unroll the sub-querys to real joins.
Hogan
This turns into real work. Thanks for the tip. Will try
Martin Stein
+1  A: 

Example of unrolling the last sub-query from the below query:

(note -- I did not test)

SELECT tq.sym                                AS sym,
       (100*(alast.adj_nav - adj_begin)/adj_begin) AS quarterly_returns
FROM
  -- First, determine first traded days ("ftd") and last traded days
  -- ("ltd") in this quarter per symbol
  (SELECT symbol_header_id AS sym,
          MIN("date")      AS ftd,
          MAX("date")      AS ltd
   FROM symbol_details
   WHERE "date" BETWEEN @quarterstart AND @quarterend
   GROUP BY 1) tq
  JOIN
  -- Second, determine adjusted NAV for "ftd" per symbol (see WHERE)
  (SELECT symbol_header_id AS sym,
          "date"           AS adate,
          adj_nav          AS adj_begin
   FROM symbol_details) afirst
  ON afirst.sym = tq.sym
  -- Finally, determine adjusted NAV for "ltd" per symbol (see WHERE)
  LEFT JOIN symbol_details alast ON tq.sym = alast.symbol_header_id AND tg.ltd = alast."date"

WHERE
  afirst.adate = tq.ftd
Hogan
+1. @Hogan, that works, with s/tg/tq/, and I've updated, unrolled further, and tested.
pilcrow