The problem is a bit more difficult than in your generalization. I would state it as follows:
SELECT a.group, func(a.group, avg_avg)
FROM a,
    (SELECT AVG(field1_avg) as avg_avg
    FROM (SELECT a.group, AVG(field1) as field1_avg
        FROM a
        WHERE (YOUR_CONDITION)
        GROUP BY a.group) as several_lines -- potentially
    ) as one_line -- always
WHERE (YOUR_CONDITION)
GROUP BY a.group -- again, potentially several lines
You have a subset of the data (limited by your condition), which is grouped, and an aggregate is computed for each group. Then you collapse those aggregates into a single value and want to apply a function of that value to each group again. Obviously, you cannot avoid repeating the condition unless the result of the grouped subquery can be referenced by name as an entity of its own.
In MSSQL and Oracle, you would use the WITH clause. In MySQL before 8.0, which has no WITH, the only option is to use a temporary table. I assume that there is more than one year in your report (otherwise, the query would be much simpler).
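As a rough sketch of both routes, assuming a hypothetical table a(grp, field1), a placeholder filter standing in for YOUR_CONDITION, and a simple difference standing in for func (none of these names come from the real report):

```sql
-- Hypothetical setup (MySQL syntax); "grp" replaces the reserved word "group".
CREATE TABLE a (grp VARCHAR(10), field1 DECIMAL(8,2));
INSERT INTO a VALUES ('x', 1), ('x', 3), ('y', 10), ('y', 20), ('z', -5);

-- Variant 1: WITH (MSSQL, Oracle, MySQL 8.0 and later).
-- The condition is written once, and the grouped result is referenced twice.
WITH several_lines AS (
    SELECT grp, AVG(field1) AS field1_avg
    FROM a
    WHERE field1 > 0                          -- stands in for YOUR_CONDITION
    GROUP BY grp
)
SELECT s.grp,
       s.field1_avg - o.avg_avg AS deviation  -- stands in for func(...)
FROM several_lines s
CROSS JOIN (SELECT AVG(field1_avg) AS avg_avg FROM several_lines) o;

-- Variant 2: temporary table (MySQL without WITH).
CREATE TEMPORARY TABLE several_lines AS
    SELECT grp, AVG(field1) AS field1_avg
    FROM a
    WHERE field1 > 0                          -- stands in for YOUR_CONDITION
    GROUP BY grp;

-- MySQL cannot refer to a TEMPORARY table twice in the same query,
-- so the single merged value is stored in a user variable first.
SELECT AVG(field1_avg) INTO @avg_avg FROM several_lines;

SELECT grp, field1_avg - @avg_avg AS deviation
FROM several_lines;

DROP TEMPORARY TABLE several_lines;
```

In both variants the condition appears exactly once. The price of the temporary table is that the work is split across several statements.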
UPD: I am sorry, I cannot post ready code right now (I can do it tomorrow), but I have an idea:
You can concatenate the data you need to output in the subquery with GROUP_CONCAT and split it back in the outer query with the FIND_IN_SET and SUBSTRING_INDEX functions. The outer query will JOIN only YEAR_REF and the result of the aggregation. The condition in the outer query will then be just WHERE FIND_IN_SET(year, concatenated_years).
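A minimal sketch of that pack-and-unpack idea on toy data. The tables yearly and year_list and all column names are hypothetical, and FIND_IN_SET is simply called twice here rather than having its result stored in a user variable:

```sql
-- Hypothetical toy data: one aggregated amount per year, plus a list of years.
CREATE TEMPORARY TABLE yearly (yr INT, amount DECIMAL(6,2));
INSERT INTO yearly VALUES (2001, 10.50), (2002, 11.25), (2003, 12.00);

CREATE TEMPORARY TABLE year_list (yr INT);
INSERT INTO year_list VALUES (2000), (2001), (2002), (2003);

SELECT
    y.yr,
    -- FIND_IN_SET gives the 1-based position of the year in the packed list.
    -- The inner SUBSTRING_INDEX keeps the first N amounts, the outer one keeps the last of them.
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(p.amounts, '|', FIND_IN_SET(y.yr, p.years)),
        '|', -1) AS amount,
    p.avg_amount
FROM year_list y
CROSS JOIN (
    -- One row: the aggregate plus the per-year data packed into two strings.
    -- The same ORDER BY in both lists keeps positions aligned.
    SELECT GROUP_CONCAT(yr ORDER BY yr)                   AS years,
           GROUP_CONCAT(amount ORDER BY yr SEPARATOR '|') AS amounts,
           AVG(amount)                                    AS avg_amount
    FROM yearly
) p
-- Years missing from the packed list (2000 here) get position 0 and are dropped.
WHERE FIND_IN_SET(y.yr, p.years) > 0;
```

One caveat with this approach: GROUP_CONCAT truncates its result at group_concat_max_len (1024 by default), so a long list of years or amounts may need a larger session value.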
UPD:
Here is the version that uses GROUP_CONCAT to pass the required data to the outer JOIN.
My comments start with "-- newtover:". By the way, 1) I do not think STRAIGHT_JOIN adds any benefit, and 2) COUNT(*) has a special meaning in MySQL (it counts rows, whether or not they contain NULLs) and is what you should use when you want to count rows.
SELECT STRAIGHT_JOIN
    -- newtover: extract the corresponding amount back
    SUBSTRING_INDEX(SUBSTRING_INDEX(GROUPED_AMOUNTS, '|', @pos),'|', -1) as AMOUNT,
    Y.YEAR * ymxb.SLOPE + ymxb.INTERCEPT as REGRESSION_LINE,
    Y.YEAR as YEAR,
    MAKEDATE(Y.YEAR,1) as AMOUNT_DATE,
    ymxb.SLOPE,
    ymxb.INTERCEPT,
    ymxb.CORRELATION,
    ymxb.MEASUREMENTS
FROM
    -- newtover: list of tables now contains only the subquery, YEAR_REF for grouping and init_vars to define the variable
    YEAR_REF Y,
    (SELECT
        SUM(MEASUREMENTS) as MEASUREMENTS,
        ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) /
        (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE,
        ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) -
        (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) /
        (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT,
        ((avg(t.AMOUNT * t.YEAR)) - avg(t.AMOUNT) * avg(t.YEAR)) /
        (stddev( t.AMOUNT ) * stddev( t.YEAR )) as CORRELATION,
        -- newtover: grouped fields for matching years and the corresponding amounts
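        -- the derived table is aliased t, so the year must be read from t;
        -- the shared ORDER BY keeps both lists aligned position by position
        GROUP_CONCAT(t.YEAR ORDER BY t.YEAR) as GROUPED_YEARS,
        GROUP_CONCAT(t.AMOUNT ORDER BY t.YEAR SEPARATOR '|') as GROUPED_AMOUNTS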
    FROM (
        SELECT STRAIGHT_JOIN
            COUNT(1) as MEASUREMENTS,
            AVG(D.AMOUNT) as AMOUNT,
            Y.YEAR as YEAR
        FROM
            CITY C,
            STATION S,
            STATION_DISTRICT SD,
            YEAR_REF Y,
            MONTH_REF M,
            DAILY D
        WHERE
            -- For a specific city ...
            $X{ IN, C.ID, CityCode } AND
            -- Find all the stations within a specific unit radius ...
            6371.009 *
            SQRT(
                POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
                (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
                POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= $P{Radius} AND
            SD.ID = S.STATION_DISTRICT_ID AND
            -- Gather all known years for that station ...
            Y.STATION_DISTRICT_ID = SD.ID AND
            -- The data before 1900 is shaky; insufficient after 2009.
            Y.YEAR BETWEEN 1900 AND 2009 AND
            -- Filtered by all known months ...
            M.YEAR_REF_ID = Y.ID AND
            -- Whittled down by category ...
            M.CATEGORY_ID = $P{CategoryCode} AND
            -- Into the valid daily climate data.
            M.ID = D.MONTH_REF_ID AND
            D.DAILY_FLAG_ID <> 'M'
        GROUP BY
            Y.YEAR
        ) t
    ) ymxb,
    (SELECT @pos:=NULL) as init_vars
WHERE
    -- newtover: check if the year is in the list and store the index into the variable
    @pos:=CAST(FIND_IN_SET(Y.YEAR, GROUPED_YEARS) as UNSIGNED)
GROUP BY
    Y.YEAR
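
Read as math, the aggregate expressions in the ymxb subquery appear to be the usual least-squares formulas. The notation below is only shorthand introduced for this note: x is t.YEAR, y is t.AMOUNT, n is count(1) (the number of yearly rows in t), bars denote avg(), and sigma denotes stddev(), which in MySQL is the population standard deviation.

```latex
\[
\begin{aligned}
\text{SLOPE} &= \frac{\sum x \sum y - n \sum xy}{\left(\sum x\right)^2 - n \sum x^2} \\[4pt]
\text{INTERCEPT} &= \frac{\sum x \sum xy - \sum y \sum x^2}{\left(\sum x\right)^2 - n \sum x^2} \\[4pt]
\text{CORRELATION} &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \, \sigma_y}
\end{aligned}
\]
```

Numerator and denominator of SLOPE and INTERCEPT both carry the opposite sign to the common textbook form, so the ratios are unchanged. REGRESSION_LINE in the outer query is then SLOPE * YEAR + INTERCEPT for each year.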