A few important points about using SQL:
- You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got.
- You can do your count better using a JOIN and GROUP BY than by using correlated subqueries. It'll be much faster.
- Use the HAVING clause to filter groups.
Here's the way I'd write this query:
SELECT t1.id, COUNT(t2.id) AS num_things
FROM t1 JOIN t2 USING (id)
GROUP BY t1.id
HAVING num_things = 5;
I realize this query can skip the JOIN with t1, as in Charles Bretana's solution. But I assume you might want the query to include some other columns from t1.
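To try the query above against real data, here is a minimal sketch using Python's built-in sqlite3 module. The engine choice, the name column, and the sample rows are all assumptions made for illustration; SQLite happens to accept the alias in HAVING just as the query above expects, but other engines may not.

```python
import sqlite3

# In-memory SQLite database; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (id INTEGER, attribute TEXT);  -- t2.id refers to t1.id
    INSERT INTO t1 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
""")
# id 1 has five things, id 2 has three, id 3 has six.
things = [(1, "x")] * 5 + [(2, "x")] * 3 + [(3, "x")] * 6
conn.executemany("INSERT INTO t2 VALUES (?, ?)", things)

# JOIN + GROUP BY + HAVING, pulling an extra column from t1.
join_rows = conn.execute("""
    SELECT t1.id, t1.name, COUNT(t2.id) AS num_things
    FROM t1 JOIN t2 USING (id)
    GROUP BY t1.id
    HAVING num_things = 5
""").fetchall()

# Same count without the JOIN, when no t1 columns are needed.
no_join_rows = conn.execute("""
    SELECT id, COUNT(*) AS num_things
    FROM t2
    GROUP BY id
    HAVING num_things = 5
""").fetchall()

print(join_rows)     # [(1, 'alpha', 5)]
print(no_join_rows)  # [(1, 5)]
```

Only id 1 has exactly five matching rows, so both forms return that one group; the JOIN version is the one to use when you also want columns such as name from t1.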
Re: the question in the comment:
The difference is that the WHERE clause is evaluated on rows, before GROUP BY reduces groups to a single row per group. The HAVING clause is evaluated after groups are formed. So you can't, for example, change the COUNT() of a group by using HAVING; you can only exclude the group itself.
SELECT t1.id, COUNT(t2.id) as num
FROM t1 JOIN t2 USING (id)
WHERE t2.attribute = <value>
GROUP BY t1.id
HAVING num > 5;
In the above query, WHERE filters for rows matching a condition, and HAVING filters for groups whose count is greater than five.
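A small experiment can make the row-versus-group distinction concrete. This is a sketch only, assuming SQLite through Python's sqlite3 module and made-up sample data ('red' and 'blue' stand in for the <value> placeholder).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (id INTEGER, attribute TEXT);
    INSERT INTO t1 VALUES (1), (2);
""")
# Hypothetical data: id 1 has 7 red + 1 blue, id 2 has 4 red + 4 blue.
rows = [(1, "red")] * 7 + [(1, "blue")] + [(2, "red")] * 4 + [(2, "blue")] * 4
conn.executemany("INSERT INTO t2 VALUES (?, ?)", rows)

# WHERE removes non-red rows first, so it changes what COUNT() sees.
with_where = conn.execute("""
    SELECT t1.id, COUNT(t2.id) AS num
    FROM t1 JOIN t2 USING (id)
    WHERE t2.attribute = ?
    GROUP BY t1.id
    HAVING num > 5
    ORDER BY t1.id
""", ("red",)).fetchall()

# Without WHERE, every row is counted; HAVING can only keep or drop groups.
having_only = conn.execute("""
    SELECT t1.id, COUNT(t2.id) AS num
    FROM t1 JOIN t2 USING (id)
    GROUP BY t1.id
    HAVING num > 5
    ORDER BY t1.id
""").fetchall()

print(with_where)   # [(1, 7)]
print(having_only)  # [(1, 8), (2, 8)]
```

With the WHERE condition, group 2 shrinks to four rows and is then excluded by HAVING. Without it, both groups keep all eight of their rows, and HAVING has no way to alter those counts.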
What confuses most people is the case where there is no GROUP BY clause, so it seems like HAVING and WHERE are interchangeable.
WHERE is evaluated before expressions in the select-list. This may not be obvious because SQL syntax puts the select-list first. So you can save a lot of expensive computation by using WHERE to restrict rows.
SELECT <expensive expressions>
FROM t1
HAVING primaryKey = 1234;
If you use a query like the above, the expressions in the select-list are computed for every row, only to discard most of the results because of the HAVING condition. However, the query below computes the expression only for the single row matching the WHERE condition.
SELECT <expensive expressions>
FROM t1
WHERE primaryKey = 1234;
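One way to see how much work WHERE saves is to count how often the expression actually runs. The sketch below assumes SQLite via Python's sqlite3 module, with a hypothetical user-defined function named expensive standing in for <expensive expressions>. It compares an unfiltered query against the WHERE form only; it does not attempt the HAVING form, because support for HAVING without GROUP BY varies between engines.

```python
import sqlite3

# Counter for how many times the "expensive" function is invoked.
calls = {"n": 0}

def expensive(x):
    """Hypothetical stand-in for a costly select-list expression."""
    calls["n"] += 1
    return x * x

conn = sqlite3.connect(":memory:")
conn.create_function("expensive", 1, expensive)
conn.execute("CREATE TABLE t1 (primaryKey INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(k,) for k in range(1, 2001)])

# No row filter: the expression is evaluated for every row in the table.
conn.execute("SELECT expensive(primaryKey) FROM t1").fetchall()
calls_unfiltered = calls["n"]

# WHERE restricts rows before the select-list is evaluated.
calls["n"] = 0
result = conn.execute(
    "SELECT expensive(primaryKey) FROM t1 WHERE primaryKey = 1234"
).fetchall()
calls_with_where = calls["n"]

print(calls_unfiltered)  # 2000
print(calls_with_where)  # 1
```

The unfiltered count is the cost you pay whenever rows are discarded only after the select-list has been computed.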
So to recap, the database engine runs a query as a series of steps:
- Generate the set of rows from the table(s), including any rows produced by JOIN.
- Evaluate WHERE conditions against the set of rows, filtering out rows that don't match.
- Compute expressions in the select-list for each row in the set.
- Apply column aliases (note this is a separate step, which means you can't use aliases in expressions in the select-list).
- Condense groups to a single row per group, according to the GROUP BY clause.
- Evaluate HAVING conditions against groups, filtering out groups that don't match.
- Sort the result, according to the ORDER BY clause.
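The steps above can be mimicked in plain Python as a toy model. This is only a sketch of the logical order under made-up data; it is not how any real engine is implemented, and an optimizer is free to reorder work as long as the result is the same. The select-list, alias, and grouping steps are folded together here because the only expression is COUNT().

```python
from collections import defaultdict

# Hypothetical stand-ins for tables t1 and t2.
t1 = [{"id": 1}, {"id": 2}, {"id": 3}]
t2 = (
    [{"id": 1, "attribute": "red"}] * 7
    + [{"id": 2, "attribute": "red"}] * 4
    + [{"id": 2, "attribute": "blue"}] * 4
)

# Generate the set of rows, including the JOIN (t1 JOIN t2 USING (id)).
joined = [(a["id"], b["attribute"]) for a in t1 for b in t2 if a["id"] == b["id"]]

# WHERE: filter individual rows (attribute = 'red').
filtered = [row for row in joined if row[1] == "red"]

# Select-list, alias, and GROUP BY: one row per id, COUNT(...) AS num.
counts = defaultdict(int)
for t1_id, _attribute in filtered:
    counts[t1_id] += 1
grouped = [{"id": key, "num": n} for key, n in counts.items()]

# HAVING: filter whole groups (num > 5); counts cannot change here.
kept = [group for group in grouped if group["num"] > 5]

# ORDER BY id.
result = sorted(kept, key=lambda group: group["id"])

print(grouped)  # [{'id': 1, 'num': 7}, {'id': 2, 'num': 4}]
print(result)   # [{'id': 1, 'num': 7}]
```

Note that id 3 never appears because the inner join produces no rows for it, and id 2 is dropped at the HAVING stage after WHERE has already reduced its count to four.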