views:

125

answers:

3

I am writing a query to find employees who earn greater than the average salary within their department. I need to display the employee ID, salary, department id, and average salary of that department.

I have a query that just almost works but it keeps giving me "ORA-00904: "AVG_SAL": invalid identifier" errors. Am I doing this correctly. Why am i getting this invalid identifier error?

SELECT employee_id, salary, department_id,
  (SELECT ROUND(AVG(salary),2)
  FROM employees e_inner
  WHERE e_inner.department_id = e.department_id) AS avg_sal
FROM employees e
WHERE salary > avg_sal
ORDER BY avg_sal DESC
+3  A: 

I don't believe you can refer to a column alias (avg_sal in this case) in a WHERE clause.

You'll need to repeat that inner query, i.e.:

SELECT employee_id, salary, department_id,
  (SELECT ROUND(AVG(salary),2)
  FROM employees e_inner
  WHERE e_inner.department_id = e.department_id) AS avg_sal
FROM employees e
WHERE salary > 
 (SELECT ROUND(AVG(salary),2)
  FROM employees e_inner
  WHERE e_inner.department_id = e.department_id)
ORDER BY avg_sal DESC

Not great, with those two inner queries, but that's the most-straightforward way to correct the error.

Update: Haven't tested this, but try the following:

SELECT e.employee_id, e.salary, e.department_id, b.avg_sal
FROM employees e
INNER JOIN
(SELECT department_id, ROUND(AVG(salary),2) AS avg_sal
 FROM employees
 GROUP BY department_id) e_avg ON e.department_id = e_avg.department_id AND e.salary > e_avg.avg_sal
ORDER BY e_avg.avg_sal DESC
djacobson
+1: Correct - column aliases can not be referenced in the WHERE clause. In Oracle, the earliest you can reference them is the ORDER BY.
OMG Ponies
I was afraid I would have to brute force this and do two subqueries. Any idea on why I cannot refer to a column alias like this?
ChrisOPeterson
Why not use avg(salary) over (partition by department) like I showed below? (I don't know why the formatting didn't show up well.)
RussellH
@ChrisOPeterson The conceptual model of a SQL query was explained to me this way: Despite the SELECT clause preceding the WHERE clause *syntactically*, the WHERE clause is actually executed *first*. A little thought will tell us why: The resultset is first retrieved (FROM/JOIN), then filtered (WHERE) and, finally, selected from (SELECT). So such an alias is only applied to the result column *after* WHERE is evaluated.
djacobson
I like your answer RussellH but it is a bit more advanced than I am at right now. I really want to know how the column aliases work because they drive me insane in oracle. The query I wrote makes perfect sense to me but doesn't work. I want to know why.
ChrisOPeterson
@ChrisOPeterson Once you've got the aliasing concept down, though, do take a look at RussellH's answer. Those "windowing" functions (applying an aggregate over a group) are very useful. :)
djacobson
Repeating the subquery is a bad idea for performance and maintainability. You can reference the calculated column simply by wrapping another query around this one.
Dave Costa
@djacobson I will definitely be using RussellH's approach once i get this down. Thanks for your explanation. At least I know whats wrong now. @Dave Costa Can you give a code example. I'm having trouble visualizing what you mean.
ChrisOPeterson
+3  A: 

You could rewrite it as a join:

SELECT  e1.employee_id
,       e1.salary
,       e1.department_id
,       ROUND(AVG(e2.salary),2) as Avg_Sal
FROM    employees e
JOIN    employees e2
ON      e2.department_id = e.department_id
GROUP BY
        e1.employee_id
,       e1.salary
,       e1.department_id
HAVING  e1.salary > ROUND(AVG(e2.salary),2)

Or a subquery:

SELECT  *  
FROM    (
        SELECT  employee_id
        ,       salary
        ,       department_id
        ,       (
                SELECT  ROUND(AVG(salary),2)
                FROM    employees e_inner
                WHERE   e_inner.department_id = e.department_id
                ) AS avg_sal
        FROM    employees e
        ) as SubqueryAlias
WHERE   salary > avg_sal
Andomar
the second one is a very bad idea, very slow with much data...
Quandary
I don't see why the self join is necessary. Why not use round(avg(salary) over (partition by department_id), 2) avg_sal?
RussellH
@RussellH: A subquery is a second pass over the data, so unless there's something stellar in the EXECUTION PLAN then your point is moot.
OMG Ponies
@OMG Ponies, in Oracle 11G R2, both these queries did 2 scans of employees, whereas the query with the window function did a single scan when I tested them.
RussellH
@Quandary -- the second one is simply the OP's query nested inside a second query so the column alias can be used in the predicate. It's not likely to have worse performance than the accepted answer, which repeats the subquery multiple times. Overall though, the analytic answer would probably have the best performance.
Dave Costa
@Dave Costa: I can't speak for Oracle, but on MS-SQL, selecting from a subselect will create a temorary table for the subselect, which means if the subselect table contains some thoudsands of rows, it will get very very slow, because it will have to swap the temp table, because there won't be enough RAM. Don't ever do it, or you'll get performance and/or timeout problems in a production environment.
Quandary
+10  A: 

More efficient to use analytics:

select employee_id, salary, department_id, avg_sal
from
(
  SELECT employee_id, salary, department_id, 
    round(avg(salary) over (partition by department_id), 2) avg_sal
  from emp
)
where salary > avg_sal
order by avg_sal desc
RussellH
+1 Good idea. You can format by 1) four spaces before each row; 2) selecting the code and pressing CTRL-K; or 3) selecting the code and clicking the code button, the one with 1010101 as label
Andomar
Thanks for the tip.
RussellH
+1 OP might've been looking to understand the "referring to a column alias in the WHERE clause" problem in particular, but this technique is worth highlighting.
djacobson
+1, analytics are very powerful constructs, and in my experience more efficient than equivalent SQL using self-joins, etc.
DCookie