tags:

views:

114

answers:

2

Given rows:

symbol_id profit date
1         100    2009-08-18 01:01:00
1         100    2009-08-18 01:01:01
2         80     2009-08-18 01:01:02
2         -10    2009-08-18 01:01:03
1         156    2009-08-18 01:01:04
2         98     2009-08-18 01:01:05
1         -56    2009-08-18 01:01:06
1         18     2009-08-18 01:01:07
3         234    2009-08-18 01:01:08
3         167    2009-08-18 01:01:09
3         34     2009-08-18 01:01:10

I'm looking for average and largest runs/streaks of wins (profit >= 0) and losses (profit < 0) per symbol_id.

Looking at just symbol_id = 1:

symbol_id profit date
1         100    2009-08-18 01:01:00
1         100    2009-08-18 01:01:01
1         156    2009-08-18 01:01:04
1         -56    2009-08-18 01:01:06
1         18     2009-08-18 01:01:07

you can see there are 3 consecutive "wins", then a "loss", then a "win"

average of 2 wins ((3 + 1) / 2)
largest streak is 3
average of 1 loss (1 / 1)
largest streak is 1

Desired Query Result:

symbol_id avg_winning_streak largest_winning avg_losing_streak largest_losing_streak
1         2                  3               1                 1
2         1                  1               1                 1
3         3                  3               0                 0
+1  A: 

I wrote a solution for SQL Server 2005 or later. It's standard SQL but I don't know if MySQL supports row_number(). You can replace the CTEs with subqueries if need be.

Note that I count a profit of zero as both a win and a loss, which is only one way you might handle zeroes. You can change the inequalities in these two rows to count zeroes differently:

case when sp>=0 then 1.0*count(*) end as win_run_len_decimal,
case when sp<=0 then 1.0*count(*) end as loss_run_len_decimal

Here's the full query:

with Trk as (
  select
    symbol_id,
    sign(profit) as sp,
    row_number() over (
      partition by symbol_id
      order by d
    ) as rk,
    row_number() over (
      partition by symbol_id,sign(profit)
      order by d
    ) as rksp
  from T
), Trk_agg as (
  select
    symbol_id,
    sp,
    case when sp>=0 then 1.0*count(*) end as win_run_len_decimal,
    case when sp<=0 then 1.0*count(*) end as loss_run_len_decimal
  from Trk
  group by symbol_id, sp, rk-rksp
)
  select
    symbol_id,
    avg(win_run_len_decimal) as avg_winning_streak,
    max(win_run_len_decimal) as longest_winning_streak,
    avg(loss_run_len_decimal) as avg_losing_streak,
    max(loss_run_len_decimal) as longest_losing_streak
  from Trk_agg
  group by symbol_id;
Steve Kass
Brilliant solution in using rank function and CTE, especially the aggregation.
Irawan Soetomo
+1  A: 
SELECT  symbol_id,
        COALESCE(AVG(IF(res, cnt, NULL)), 0) AS avgwin,
        COALESCE(MAX(IF(res, cnt, NULL)), 0) AS maxwin,
        COALESCE(AVG(IF(NOT res, cnt, NULL)), 0) AS avglose,
        COALESCE(MAX(IF(NOT res, cnt, NULL)), 0) AS maxlose
FROM    (
        SELECT  symbol_id, streak, COUNT(*) AS cnt, res
        FROM    (
                SELECT  g.*,
                        @streak := @streak + ((profit > 0) XOR @result) AS streak,
                        @result := (profit > 0) AS res
                FROM    (
                        SELECT  @streak := 0,
                                @result := false
                        ) vars,
                        t_game g
                ORDER BY
                        symbol_id, date
                ) q
        GROUP BY
                symbol_id, streak
        ) q2
GROUP BY
        symbol_id
Quassnoi
i actually found another answer of your doing something very similar. modified it some and got it mostly working... but this works as is. thanks!
james