Hello all,

I have a table containing events with a "speed" property.

In order to see the statistical distribution of this property, I'd like to group the results by intervals, let's say:

[0-49.99 km/h] 3 objects
[50-99.99 km/h] 13 objects
[100-149.99 km/h] 50 objects
etc.

This would let me see which interval most objects fall into.

Obviously this could be done with several queries, each with the appropriate WHERE condition, such as:

select count(a) from GaEvent a where a.speed >= :min and a.speed < :max

but running one query per interval is extremely inefficient. Is there a better way of grouping these values?

Cheers!

A: 

A more efficient way to tackle this in SQL alone is to join the table in question against a derived table that contains the minimum and maximum value of each interval (bucket) in your histogram.

For example:

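select t.min, t.max, count(c.speed) as count  -- count(c.speed) skips the NULLs of unmatched intervals, so empty ones show 0
from (
    select 0 as min, 15 as max
    union all
    select 15, 30
    union all
    select 30, 45
    -- union all ... (one row per additional interval)
) t
-- half-open ranges [min, max): no speed can fall between two intervals
left outer join cars c on c.speed >= t.min and c.speed < t.max
group by t.min, t.max
order by t.min

min | max | count
----+-----+------
  0 |  15 |     1
 15 |  30 |     1
 30 |  45 |     2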

This is highly dependent on which database vendor you are using, though. PostgreSQL, for example, has built-in functions such as generate_series() and width_bucket() that can greatly simplify this type of query and save you from writing out the "histogram table" by hand.
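On PostgreSQL, a minimal sketch of the same histogram could let generate_series() produce the interval rows instead of a hand-written union. The sample rows, the 15 km/h width, the 0-135 range and the view name speed_histogram are all assumptions for illustration; the cars table mirrors the one in the join query above.

```sql
-- Hypothetical sample data standing in for the real cars table.
create temporary table cars (speed numeric);
insert into cars (speed) values (3.5), (22), (31), (44.9), (120);

-- generate_series(0, 135, 15) yields 0, 15, 30, ..., 135: one row per interval start.
create temporary view speed_histogram as
select b.lo           as min_speed,
       b.lo + 15      as max_speed,
       count(c.speed) as car_count  -- counts only matched rows, so empty intervals show 0
from generate_series(0, 135, 15) as b(lo)
left outer join cars c
  on c.speed >= b.lo and c.speed < b.lo + 15  -- half-open [min_speed, max_speed)
group by b.lo
order by b.lo;

select * from speed_histogram;
```

Every interval in the series appears in the result, including empty ones, because the series is on the outer side of the join.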

When it comes to Hibernate, though, there seems to be very little support in the Projections API or its aggregate functions for anything like this. This may very well be a scenario where you want to drop down to raw SQL for the query and/or do the calculations in Java itself.
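If you do drop down to raw SQL on PostgreSQL, width_bucket(operand, low, high, count) can assign each row to an equal-width interval directly, so the whole histogram becomes a single GROUP BY. This is only a sketch: ga_event is a hypothetical stand-in for whatever table GaEvent maps to, and the 0-150 km/h range split into three 50 km/h intervals mirrors the question's example.

```sql
-- Hypothetical stand-in for the table behind the GaEvent entity.
create temporary table ga_event (speed numeric);
insert into ga_event (speed) values (12), (49.99), (50), (75), (99.5), (100), (149), (180);

-- width_bucket(speed, 0, 150, 3): 1 = [0, 50), 2 = [50, 100), 3 = [100, 150),
-- 0 = below 0, 4 = 150 and above.
select width_bucket(speed, 0, 150, 3)            as bucket,
       (width_bucket(speed, 0, 150, 3) - 1) * 50 as min_speed,  -- bucket number back to its lower bound
       count(*)                                  as event_count
from ga_event
group by 1, 2
order by 1;
```

Values outside [low, high) are not dropped: they land in bucket 0 or bucket count + 1, which makes outliers easy to spot.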

matt b
Thank you! I'm using Hibernate on Grails with PostgreSQL. At the moment I've solved the issue with multiple queries, but it's very, very slow. Do you know if this can be done in HQL? I wouldn't like to lose DB independence. Cheers
Mulone
I highly doubt it, as HQL isn't really designed for the type of query you'd like to run. In fact, I don't think HQL can query against non-entity (non-mapped) tables, such as the derived histogram table. Also, HQL is geared towards returning instances of your entities rather than the results of arbitrary queries (where you would want to return the min/max/count of each row in the histogram, etc.).
matt b
A: 

If your intervals are all the same size, you can use something like this:

select 50 * trunc(c.speed / 50) as min_speed, count(*) from Car c group by 1 order by 1
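A minimal sketch of that query run against some made-up rows in PostgreSQL (the car table and its values are assumptions). 50 * trunc(speed / 50) maps each speed to the lower bound of its 50 km/h interval; note that intervals with no rows are simply absent from the result, unlike the join-based approach.

```sql
-- Hypothetical sample rows for the Car table (unquoted Car folds to car in PostgreSQL).
create temporary table car (speed numeric);
insert into car (speed) values (12), (49.99), (50), (75), (99.5), (100), (149), (180);

-- 50 * trunc(speed / 50) gives each row's interval lower bound:
-- 49.99 -> 0, 50 -> 50, 149 -> 100, 180 -> 150.
select 50 * trunc(c.speed / 50)      as min_speed,
       50 * trunc(c.speed / 50) + 50 as max_speed,  -- exclusive upper bound
       count(*)                      as car_count
from car c
group by 1, 2
order by 1;
```

trunc() rounds toward zero, which is the same as floor() for non-negative speeds; with negative values floor() would be the safer choice.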

Maurice Perry