
I am attempting to nest SELECT queries in Arel and/or Active Record in Rails 3 to generate the following SQL statement.

SELECT sorted.* FROM (SELECT * FROM points ORDER BY points.timestamp DESC) AS sorted GROUP BY sorted.client_id

An alias for the subquery can be created like this:

points = Table(:points)
sorted = points.order('timestamp DESC').alias

but then I'm stuck as to how to pass it into the parent query (short of calling #to_sql, which sounds pretty ugly).

How do you use a SELECT statement as a sub-query in Arel (or Active Record) to accomplish the above? Maybe there's an altogether different way to accomplish this query that doesn't use nested queries?

+2  A: 

The question is: why would you need a "nested query"? We do not need nested queries; that is thinking in the mindset of SQL, not relational algebra. In relational algebra we derive relations and use the output of one relation as the input to another, so the following would hold true:

points = Table(:points, {:as => 'sorted'}) # rename in the options hash
final_points = points.order('timestamp DESC').group(:client_id, :timestamp).project(:client_id, :timestamp)

It's best to leave the renaming to Arel unless it is absolutely necessary.

Here the projection of client_id AND timestamp is VERY important, since we cannot project all domains from the relation (i.e. sorted.*). You must explicitly project every domain that will be used in the grouping operation for the relation. The reason is that there is no value for * that would distinctly represent a grouped client_id. For instance, say you have the following table:

client_id   |   score
----------------------
    4       |    27
    3       |    35
    2       |    22
    4       |    69

Here, if you group by client_id, you cannot project the score domain, because for client_id 4 the value could be either 27 or 69; you could, however, project sum(score).
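To make the grouping argument concrete, here is a plain-Ruby sketch of the sample table above (no Arel or database involved; each row is just a Hash). Grouping by client_id leaves two candidate scores for client 4, while an aggregate such as SUM(score) reduces every group to exactly one value.

```ruby
# The sample table from the answer, one Hash per row.
ROWS = [
  { client_id: 4, score: 27 },
  { client_id: 3, score: 35 },
  { client_id: 2, score: 22 },
  { client_id: 4, score: 69 }
].freeze

# GROUP BY client_id: each group may contain several rows.
groups = ROWS.group_by { |row| row[:client_id] }

# "Projecting" score directly: client 4 has two candidate values,
# so no single score represents that group.
scores_per_client = groups.transform_values { |rows| rows.map { |r| r[:score] } }

# An aggregate such as SUM(score) collapses each group to one value.
sum_per_client = groups.transform_values { |rows| rows.sum { |r| r[:score] } }

p scores_per_client
p sum_per_client
```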

You may only project the domain attributes that have a single value per group (usually aggregate functions like sum, max, min). With your query it would not matter whether the points were sorted by timestamp, because in the end they would be grouped by client_id. The timestamp order is irrelevant since there is no single timestamp that could represent a grouping.

Please let me know how I can help you with Arel. Also, I have been working on a learning series for people to use Arel at its core. The first in the series is at http://Innovative-Studios.com/#pilot. I can tell you are starting to get the hang of it, since you used Table(:points) rather than the ActiveRecord model Point.

Snuggs
Thank you for the detailed response. "the timestamp order is irrelevant since there is no single timestamp that could represent a grouping." You're right; I see what you are saying. It appears MySQL works around this inconsistency by returning just the first row of each client_id group, which is what I was aiming for. I see now that this is not behavior I should count on. My goal is to return the most recent point for every client_id, i.e. a single point with the maximum timestamp per client_id group. It is important to do this in one query because it will be polled often.
Schrockwell
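The goal stated in this comment, the most recent point for each client_id, is commonly called the "greatest-n-per-group" problem. The sketch below computes it in plain Ruby over in-memory rows rather than in SQL, so it only illustrates the result a single query should return. The Point struct, its :value field, and the data are hypothetical; only client_id and timestamp come from the question.

```ruby
# Hypothetical row type: :value is made up so we can see which full row is kept.
Point = Struct.new(:client_id, :timestamp, :value)

# Made-up data; timestamps are plain integers (e.g. epoch seconds).
points = [
  Point.new(1, 100, :a),
  Point.new(2, 300, :b),
  Point.new(1, 250, :c),
  Point.new(2, 150, :d)
]

# For each client_id keep the whole row with the greatest timestamp.
# max_by names the representative row explicitly instead of relying on
# whichever row the database happens to return for a group.
latest = points.group_by(&:client_id)
               .transform_values { |rows| rows.max_by(&:timestamp) }

latest.each { |client_id, point| puts "#{client_id}: #{point.value}" }
```

Here client 1 maps to the row with timestamp 250 (:c) and client 2 to the row with timestamp 300 (:b); in SQL the same result needs either a join back against MAX(timestamp) per client or a database-specific construct.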
We would need to use an aggregate function. If we ask ourselves "What are we trying to do?", the answer is to find the most recent, or "maximum", date, so we would use max(timestamp) in SQL. This corresponds to Arel::Attribute::Expression::Maximum, which can be called with syntactic sugar on an Arel::Attribute, like sorted[:timestamp].maximum(). There is one caveat: make sure you add timestamp to the group operation, #group('client_id, timestamp'), or the entire grouping scenario will error. I know the MAX aggregate function works on dates in Postgres, and I'm sure it does in MySQL as well.
Snuggs
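For comparison, here is a plain-Ruby sketch (no Arel, made-up integer timestamps) of what MAX(timestamp) yields under the two grouping choices discussed above. Grouping by client_id alone gives one maximum per client; grouping by client_id and timestamp gives one group per distinct pair, so each "maximum" is just that row's own timestamp.

```ruby
# Hypothetical rows as [client_id, timestamp] pairs.
rows = [[1, 100], [1, 250], [2, 300], [2, 150]]

# MAX(timestamp) ... GROUP BY client_id: one result per client.
max_by_client = rows.group_by { |client_id, _ts| client_id }
                    .map { |client_id, group| [client_id, group.map { |_c, ts| ts }.max] }

# MAX(timestamp) ... GROUP BY client_id, timestamp: one result per
# distinct pair, so the maximum equals each row's own timestamp.
max_by_pair = rows.group_by { |client_id, ts| [client_id, ts] }
                  .map { |(client_id, _ts), group| [client_id, group.map { |_c, t| t }.max] }

p max_by_client # => [[1, 250], [2, 300]]
p max_by_pair   # => [[1, 100], [1, 250], [2, 300], [2, 150]]
```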
Firstly, sorting and ordering are not part of relational algebra; Arel defines them anyway. Secondly, whether or not subqueries are part of relational algebra is irrelevant. Conceptually, the result of a SELECT is not visible until the WHERE clause executes. Therefore, not all databases (e.g., Postgres) allow column aliases in WHERE clauses, and instead depend on subqueries. If Arel cannot handle subqueries, then names in the WHERE clause cannot be aliased. This can get messy when you can't depend on Arel to generate names.
Samuel Danielson