ansaurus

Question

Java coding best-practices for reusing part of a query to count

Answer 1

+2 A:

Well, I'm not sure this is a best practice, but it is my practice :)

If I have a query like:

select A.f1, A.f2, A.f3 from A, B where A.f2 = B.f2 order by A.f1, B.f3

and I just want to know how many results it will return, I execute:

select count(*) from ( select A.f1, ... order by A.f1, B.f3 )

Then I read the result as an Integer, without mapping the results to a POJO.

Parsing your query to remove parts such as 'order by' is very complicated. A good RDBMS will optimize the query for you.
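A minimal JDBC sketch of the wrapping approach, assuming plain SQL run through a java.sql.Connection rather than Hibernate. The names CountWrapper, wrapForCount and count, and the derived-table alias count_src, are illustrative only. Two database quirks are worth knowing: MySQL (and older PostgreSQL versions) require a derived table to have an alias, and SQL Server rejects an ORDER BY inside a derived table unless TOP or OFFSET is also present, so on that database the ORDER BY would have to be stripped after all.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CountWrapper {

    // Wraps any SELECT as a derived table so the database counts its rows.
    // "count_src" is an arbitrary alias; some databases insist on one.
    static String wrapForCount(String sql) {
        return "select count(*) from (" + sql + ") count_src";
    }

    // Runs the wrapped query, binding the same positional parameters as the original query.
    static long count(Connection conn, String sql, Object... params) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(wrapForCount(sql))) {
            for (int i = 0; i < params.length; i++) {
                ps.setObject(i + 1, params[i]); // JDBC parameters are 1-based
            }
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();            // count(*) always yields exactly one row
                return rs.getLong(1); // read as long; the JDBC type of count(*) varies by database
            }
        }
    }

    public static void main(String[] args) {
        // Prints the SQL that would be sent; no connection is needed for this part.
        System.out.println(wrapForCount(
                "select A.f1, A.f2, A.f3 from A, B where A.f2 = B.f2 order by A.f1, B.f3"));
    }
}
```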

Good question.

Sinuhe 2009-10-21 13:00:40

Thanks for your support :-) My typical query is HQL, either returning POJOs or a list of properties.

KLE 2009-10-21 14:23:40

Like you, I don't feel like parsing the query to remove some parts; but I don't really trust the RDBMS to optimize the query in all cases. I suspect some cases get messed up, and it's hard to predict which. **Is there a list of facts about these optimizations?**

KLE 2009-10-21 14:27:02

I don't know where to find such a list, or whether this information is public. I can't be completely sure that an RDBMS optimizes this way. But as a student, I saw some "old and basic" optimizations that seem more difficult than this one. In this case, for example, the RDBMS would reason: "They are asking me for a number of rows, so the ordering can be skipped."

Sinuhe 2009-10-22 01:14:15

Answer 2

+2 A:

Have you tried making your intentions clear to Hibernate by setting a projection on your (SQL?)Criteria? I've mostly been using Criteria, so I'm not sure how applicable this is to your case, but I've been using

getSession().createCriteria(persistentClass)
        .setProjection(Projections.rowCount())
        .uniqueResult();

and letting Hibernate figure out the caching / reusing / smart stuff by itself. I'm not really sure how much smart stuff it actually does, though. Anyone care to comment on this?

Tim 2009-10-21 13:47:18

Hibernate doesn't cache queries by itself; you have to do so explicitly. The problem with the above approach (and with using Criteria in general) is that the layer assembling the criteria has to create another copy of it just for counting. In other words, I can't just create a Criteria in the business layer, pass it to a service (or DAO) and get back one page of results plus the total count. Not a huge deal for small apps, but it leads to a LOT of unnecessary code in bigger ones.

ChssPly76 2009-10-22 20:52:03

Answer 3

A:

In a free-form HQL situation I would use something like this, but it is not reusable because it is quite specific to the given entities:

int count = ((Number) session.createQuery("select count(*) from ....").uniqueResult()).intValue();

(In Hibernate 3.2 and later, count(*) in HQL returns a Long, so casting through Number avoids a ClassCastException.) Run this once, then adjust the starting row accordingly as you page through.
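A small sketch of that paging loop, assuming the total has already been fetched once with the count query. PageOffsets and offsets() are hypothetical names; in Hibernate, each offset would then be passed to Query.setFirstResult() together with setMaxResults(pageSize).

```java
import java.util.ArrayList;
import java.util.List;

public class PageOffsets {

    // Starting row of every page, computed from a total obtained once with count(*).
    static List<Integer> offsets(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> starts = new ArrayList<>();
        for (long start = 0; start < total; start += pageSize) {
            starts.add((int) start); // setFirstResult() takes an int
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(offsets(25, 10)); // [0, 10, 20]
    }
}
```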

For Criteria, though, I use something like this:

// Types used: org.hibernate.Criteria, ScrollMode, ScrollableResults,
// org.hibernate.criterion.Criterion, Order, Restrictions; java.util.Collections, List
final Criteria criteria = session.createCriteria(clazz);
List<Criterion> restrictions = factory.assemble(command.getFilter());
for (Criterion restriction : restrictions) {
    criteria.add(restriction);
}
criteria.add(Restrictions.conjunction());
if (this.projections != null) {
    criteria.setProjection(factory.loadProjections(this.projections));
}
// "ASC".equals(...) avoids a NullPointerException when no direction is given
criteria.addOrder("ASC".equals(command.getDir())
        ? Order.asc(command.getSort())
        : Order.desc(command.getSort()));
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
try {
    if (scrollable.last()) { // returns true if the result set has at least one row
        // getRowNumber() is zero-based, hence the + 1
        genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
        criteria.setFirstResult(command.getStart())
                .setMaxResults(command.getLimit());
        genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
    }
} finally {
    scrollable.close(); // release the cursor even if listing the page fails
}
return genericDTO;

The drawback is that this performs the count on every request, because it calls ScrollableResults.last() each time.
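For comparison, the same trick in plain JDBC, sketched under the assumption that you hold a java.sql.Connection. One detail worth noting: JDBC's ResultSet.getRow() is 1-based, whereas Hibernate's ScrollableResults.getRowNumber() is 0-based, which is why the Criteria code above adds 1. Depending on the driver, reaching the last row of a scroll-insensitive cursor may mean fetching or buffering the whole result set, which is the cost paid on every request. ScrollCount and its methods are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ScrollCount {

    // Counts the rows of a scrollable ResultSet by jumping to its last row.
    static int countByScrolling(ResultSet rs) throws SQLException {
        if (!rs.last()) {
            return 0; // last() returns false when the result set is empty
        }
        int total = rs.getRow(); // JDBC row numbers start at 1, so the last row number is the count
        rs.beforeFirst();        // rewind so the caller can still read a page from the same cursor
        return total;
    }

    // The statement must be scrollable for last() to work on the result set.
    static int count(Connection conn, String sql) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                     sql, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = ps.executeQuery()) {
            return countByScrolling(rs);
        }
    }
}
```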

non sequitor 2009-10-21 15:48:03

Answer 4

+1 A:

Nice question. Here's what I've done in the past (you've mentioned many of these things already):

1. Check whether a SELECT clause is present.
   1.1. If it's not, add select count(*).
   1.2. Otherwise, check whether it has DISTINCT or aggregate functions in it. If you're using ANTLR to parse your query, it's possible to work around those, but it's quite involved. You're likely better off just wrapping the whole thing with select count(*) from ().
2. Remove fetch all properties.
3. Remove fetch from joins if you're parsing HQL as a string. If you're truly parsing the query with ANTLR, you can remove left joins entirely, though it's rather messy to check all possible references.
4. Remove order by.
5. Depending on what you've done in 1.2, you'll need to remove or adjust group by / having.
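The string-based variant of these steps can be sketched with a few regular expressions. This is deliberately naive and assumes a single top-level query: it does not understand string literals or subqueries, and instead of attempting the DISTINCT/aggregate work-arounds from 1.2 or the group by / having adjustments, it simply refuses those queries so the caller can fall back to another counting strategy (HQL before Hibernate 6 does not allow subqueries in the FROM clause, so the "wrap it" fallback means dropping to SQL). HqlCountRewriter and toCountQuery are hypothetical names, not Hibernate API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HqlCountRewriter {

    // ORDER BY is the last clause of an HQL query, so drop everything from it onwards.
    private static final Pattern ORDER_BY = Pattern.compile("(?is)\\s+order\\s+by\\s.*$");
    // "fetch all properties" only affects loading, not the number of rows.
    private static final Pattern FETCH_ALL = Pattern.compile("(?i)\\s+fetch\\s+all\\s+properties\\b");
    // "join fetch" becomes a plain join.
    private static final Pattern JOIN_FETCH = Pattern.compile("(?i)\\bjoin\\s+fetch\\b");
    // Cases this naive version refuses: DISTINCT, aggregates, GROUP BY.
    private static final Pattern UNSUPPORTED =
            Pattern.compile("(?i)\\b(distinct|count|sum|avg|min|max)\\b|\\bgroup\\s+by\\b");
    private static final Pattern FROM_FIRST = Pattern.compile("(?is)^from\\s.*");
    private static final Pattern SELECT_LIST = Pattern.compile("(?is)^select\\s+.*?\\s+(from\\s.*)$");

    static String toCountQuery(String hql) {
        String q = ORDER_BY.matcher(hql).replaceFirst("");
        q = FETCH_ALL.matcher(q).replaceAll("");
        q = JOIN_FETCH.matcher(q).replaceAll("join");
        q = q.trim();
        if (UNSUPPORTED.matcher(q).find()) {
            throw new UnsupportedOperationException(
                    "DISTINCT, aggregates or GROUP BY: count this query another way");
        }
        if (FROM_FIRST.matcher(q).matches()) {
            return "select count(*) " + q; // 1.1: no SELECT clause, so add one
        }
        Matcher m = SELECT_LIST.matcher(q);
        if (m.matches()) {
            return "select count(*) " + m.group(1); // simple select list: replace it
        }
        throw new IllegalArgumentException("Unrecognised query: " + hql);
    }

    public static void main(String[] args) {
        System.out.println(toCountQuery(
                "from Invoice i left join fetch i.lines where i.total > 100 order by i.date"));
        // select count(*) from Invoice i left join i.lines where i.total > 100
    }
}
```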

The above applies to HQL, naturally. For Criteria queries you're quite limited in what you can do, because the API doesn't lend itself easily to manipulation. If you're using some sort of wrapper layer on top of Criteria, you will end up with the equivalent of a (limited) subset of ANTLR parsing results and could apply most of the above in that case.

Since you'd normally hold on to the offset of your current page and the total count, I usually run the actual query with the given limit / offset first, and only run the count(*) query if the number of results returned is greater than or equal to the limit AND the offset is zero (in all other cases I've either run the count(*) before or I've got all the results back anyway). This is an optimistic approach with regard to concurrent modifications, of course.
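A sketch of that decision rule, assuming the caller keeps the previously computed total (or -1 when there is none) and passes the count query in as a LongSupplier. LazyCount and its parameters are hypothetical; the final fallback, for a later page requested without a prior count, is an assumption not covered by the rule above.

```java
import java.util.function.LongSupplier;

public class LazyCount {

    /**
     * Total row count for a page, running the count(*) query only when needed.
     *
     * @param offset     first row requested (0-based)
     * @param limit      page size requested
     * @param returned   rows the page query actually returned
     * @param knownTotal total from an earlier request, or -1 if there is none
     * @param countQuery runs the count(*) query (hypothetical hook)
     */
    static long total(int offset, int limit, int returned, long knownTotal, LongSupplier countQuery) {
        if (offset == 0) {
            // First page: a short page already contains every row; a full one needs a count.
            return returned < limit ? returned : countQuery.getAsLong();
        }
        if (knownTotal >= 0) {
            // Later pages reuse the earlier count (optimistic about concurrent changes).
            return knownTotal;
        }
        // Assumption beyond the original rule: no earlier count, so ask the database.
        return countQuery.getAsLong();
    }

    public static void main(String[] args) {
        LongSupplier count = () -> {
            System.out.println("running count(*)");
            return 57L;
        };
        System.out.println(total(0, 20, 7, -1, count));   // short first page: 7, no count query
        System.out.println(total(0, 20, 20, -1, count));  // full first page: runs the count, 57
        System.out.println(total(20, 20, 20, 57, count)); // later page: reuses 57
    }
}
```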

Update (on hand-assembling HQL)

I don't particularly like that approach. When mapped as a named query, HQL has the advantage of build-time error checking (well, run-time technically, because the SessionFactory has to be built, although that's usually done during integration testing anyway). When generated at runtime, it fails at runtime :-) Doing performance optimizations isn't exactly easy either.

The same reasoning applies to Criteria, of course, but they're a bit harder to screw up thanks to a well-defined API, as opposed to string concatenation. Building two HQL queries in parallel (the paged one and the "global count" one) also leads to code duplication (and potentially more bugs) or forces you to write some kind of wrapper layer on top to do it for you. Neither way is ideal. And if you need to do this from client code (as in over an API), the problem gets even worse.
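One possible shape for such a wrapper layer, sketched under the assumption that a query can be described by separate select / from / where / order-by fragments. QuerySpec is a hypothetical class, not the Hibernate-Generic-DAO Search API; it derives both the paged query and the count query from one description, so the two cannot drift apart. Like the parsing approach, it assumes no DISTINCT, aggregates, GROUP BY or join fetch in the fragments.

```java
public final class QuerySpec {
    private final String select;  // select list, or null for a plain "from ..." query
    private final String from;    // e.g. "Invoice i join i.customer c"
    private final String where;   // may be null
    private final String orderBy; // may be null; never used for counting

    public QuerySpec(String select, String from, String where, String orderBy) {
        if (from == null || from.trim().isEmpty()) {
            throw new IllegalArgumentException("from is required");
        }
        this.select = select;
        this.from = from;
        this.where = where;
        this.orderBy = orderBy;
    }

    // Query for one page; limit/offset are applied separately (e.g. setFirstResult/setMaxResults).
    public String toPageQuery() {
        StringBuilder sb = new StringBuilder();
        if (select != null) {
            sb.append("select ").append(select).append(' ');
        }
        sb.append("from ").append(from);
        appendWhere(sb);
        if (orderBy != null) {
            sb.append(" order by ").append(orderBy);
        }
        return sb.toString();
    }

    // Same FROM and WHERE, no ORDER BY: the order cannot change the count.
    public String toCountQuery() {
        StringBuilder sb = new StringBuilder("select count(*) from ").append(from);
        appendWhere(sb);
        return sb.toString();
    }

    private void appendWhere(StringBuilder sb) {
        if (where != null) {
            sb.append(" where ").append(where);
        }
    }

    public static void main(String[] args) {
        QuerySpec spec = new QuerySpec("i", "Invoice i", "i.total > :minTotal", "i.date desc");
        System.out.println(spec.toPageQuery());  // select i from Invoice i where i.total > :minTotal order by i.date desc
        System.out.println(spec.toCountQuery()); // select count(*) from Invoice i where i.total > :minTotal
    }
}
```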

I've actually pondered this issue quite a bit. The Search API from Hibernate-Generic-DAO seems like a reasonable compromise; there are more details in my answer to the question linked above.

ChssPly76 2009-10-21 17:25:03

+1 Thanks for these details on manipulating the query. Thanks also for the excellent point that **a count query should be run only after a first query**.

KLE 2009-10-22 08:27:28

I updated my question; would you add another answer covering the new part? I liked many of your other posts, and you are a Java expert :-) ...

KLE 2009-10-22 08:42:00

Thanks :-) I've updated my answer above.

ChssPly76 2009-10-22 21:11:56
