Nice question. Here's what I've done in the past (many things you've mentioned already):
- Check whether SELECT clause is present.
- If it's not, add
select count(*)
- Otherwise check whether it has DISTINCT or aggregate functions in it. If you're using ANTLR to parse your query, it's possible to work around those but it's quite involved. You're likely better off just wrapping the whole thing with
select count(*) from ()
.
- Remove
fetch all properties
- Remove
fetch
from joins if you're parsing HQL as string. If you're truly parsing the query with ANTLR you can remove left join
entirely; it's rather messy to check all possible references.
- Remove
order by
- Depending on what you've done in 1.2 you'll need to remove / adjust
group by
/ having
.
The above applies to HQL, naturally. For Criteria queries you're quite limited with what you can do because it doesn't lend itself to manipulation easily. If you're using some sort of a wrapper layer on top of Criteria, you will end up with equivalent of (limited) subset of ANTLR parsing results and could apply most of the above in that case.
Since you'd normally hold on to offset of your current page and the total count, I usually run the actual query with given limit / offset first and only run the count(*)
query if number of results returns is more or equal to limit AND offset is zero (in all other cases I've either run the count(*)
before or I've got all the results back anyway). This is an optimistic approach with regards to concurrent modifications, of course.
Update (on hand-assembling HQL)
I don't particularly like that approach. When mapped as named query, HQL has the advantage of build-time error checking (well, run-time technically, because SessionFactory has to be built although that's usually done during integration testing anyway). When generated at runtime it fails at runtime :-) Doing performance optimizations isn't exactly easy either.
Same reasoning applies to Criteria, of course, but it's a bit harder to screw up due to well-defined API as opposed to string concatenation. Building two HQL queries in parallel (paged one and "global count" one) also leads to code duplication (and potentially more bugs) or forces you to write some kind of wrapper layer on top to do it for you. Both ways are far from ideal. And if you need to do this from client code (as in over API), the problem gets even worse.
I've actually pondered quite a bit on this issue. Search API from Hibernate-Generic-DAO seems like a reasonable compromise; there are more details in my answer to the above linked question.