views:

1189

answers:

5

I have a query like this:

SELECT t1.id,
    (SELECT COUNT(t2.id)
     FROM t2
     WHERE t2.id = t1.id
          ) as num_things
FROM t1
WHERE num_things = 5;

The goal is to get the id of all the elements that appear 5 times in the other table. However, I get this error:

ERROR: column "num_things" does not exist
SQL state: 42703

I'm probably doing something silly here, as I'm somewhat new to databases. Is there a way to fix this query so I can access num_things? Or, if not, is there any other way of achieving this result?

+1  A: 

I think you could just rewrite your query like so:

SELECT t1.id
FROM t1
WHERE (SELECT COUNT(t2.id)
     FROM t2
     WHERE t2.id = t1.id
          ) = 5;
Brian Fisher
Oh, I should have re-read the goal. You're right. At first I was like, "but then you lose the count information, what if you need that?"
Kev
yep, that works. strange, i thought i tried that at first, but got a 'no aggregates allowed in where clause' or something. i must've done something different.is there a reason that the way i posted doesn't work, though?
Claudiu
I can't find it, but there was a thread once in the newsgroups about this, and there was a reason. All I can remember was someone saying that (if you did repeat the function) in most cases the overhead of calculating twice is really small.
Kev
If you alias a column in the select list you can't then use that alias anywhere else in the query. It is only used as the field label for the result.
Brian Fisher
@Brian: you can use column aliases in the GROUP BY, HAVING, and ORDER BY clauses, but not in the WHERE clause or in other expressions in the select-list.
Bill Karwin
Thanks for the correction, Bill.
Brian Fisher
+5  A: 

A few important points about using SQL:

  • You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got.
  • You can do your count better using a JOIN and GROUP BY than by using correlated subqueries. It'll be much faster.
  • Use the HAVING clause to filter groups.

Here's the way I'd write this query:

SELECT t1.id, COUNT(t2.id) AS num_things
FROM t1 JOIN t2 USING (id)
GROUP BY t1.id
HAVING num_things = 5;

I realize this query can skip the JOIN with t1, as in Charles Bretana's solution. But I assume you might want the query to include some other columns from t1.


Re: the question in the comment:

The difference is that the WHERE clause is evaluated on rows, before GROUP BY reduces groups to a single row per group. The HAVING clause is evaluated after groups are formed. So you can't, for example, change the COUNT() of a group by using HAVING; you can only exclude the group itself.

SELECT t1.id, COUNT(t2.id) as num
FROM t1 JOIN t2 USING (id)
WHERE t2.attribute = <value>
GROUP BY t1.id
HAVING num > 5;

In the above query, WHERE filters for rows matching a condition, and HAVING filters for groups that have at least five count.

The point that causes most people confusion is when they don't have a GROUP BY clause, so it seems like HAVING and WHERE are interchangeable.

WHERE is evaluated before expressions in the select-list. This may not be obvious because SQL syntax puts the select-list first. So you can save a lot of expensive computation by using WHERE to restrict rows.

SELECT <expensive expressions>
FROM t1
HAVING primaryKey = 1234;

If you use a query like the above, the expressions in the select-list are computed for every row, only to discard most of the results because of the HAVING condition. However, the query below computes the expression only for the single row matching the WHERE condition.

SELECT <expensive expressions>
FROM t1
WHERE primaryKey = 1234;

So to recap, queries are run by the database engine according to series of steps:

  1. Generate set of rows from table(s), including any rows produced by JOIN.
  2. Evaluate WHERE conditions against the set of rows, filtering out rows that don't match.
  3. Compute expressions in select-list for each in the set of rows.
  4. Apply column aliases (note this is a separate step, which means you can't use aliases in expressions in the select-list).
  5. Condense groups to a single row per group, according to GROUP BY clause.
  6. Evaluate HAVING conditions against groups, filtering out groups that don't match.
  7. Sort result, according to ORDER BY clause.
Bill Karwin
what is the difference between a HAVING and a WHERE clause? semantically, in english, they seem to mean the same thing
Claudiu
+4  A: 

All the other suggestions would work, but to answer your basic question it would be sufficient to write

  SELECT id  From T2
  Group By Id
  Having Count(*) = 5
Charles Bretana
Very compact query I like it.
Brian Fisher
Yes, given the exact problem described by the OP. But perhaps you need other columns from the t1 table.
Bill Karwin
if other columns are needed, this can still be used as subquery to restrict output from table with other columns.
Charles Bretana
@Charles: But then it's a correlated subquery, which is run for each row of t1. It's more efficient in that case to use JOIN and GROUP BY.
Bill Karwin
well, not all subquerys are correllated, In this case, if it's just to get data from another table for the single t1 value, then it would not be correllated... and second, modern optimizers often create identical query plans for logically equivilent corrrellated subqueries and Joins
Charles Bretana
A: 

try this

SELECT t1.id,
    (SELECT COUNT(t2.id) as myCount
     FROM t2
     WHERE t2.id = t1.id and myCount=5
          ) as num_things
FROM t1
Xian
This is wrong; you can't use a column alias in a WHERE clause.
Bill Karwin
A: 

I'd like to mention that in PostgreSQL there is no way to use aliased column in having clause.

i.e.

SELECT usr_id AS my_id FROM user HAVING my_id = 1

Wont work.

Another example that is not going to work:

SELECT su.usr_id AS my_id, COUNT(*) AS val FROM sys_user AS su GROUP BY su.usr_id HAVING val >= 1

There will be the same error: val column is not known.

Im highliting this because Bill Karwin wrote something not really true for Postgres:

"You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got."