I have a table whose records represent certain objects. For the sake of simplicity I am going to assume that the table only has one column, and that is the unique ObjectId. Now I need a way to store combinations of objects from that table. The combinations have to be unique, but can be of arbitrary length. For example, if I have the ObjectIds

1,2,3,4

I want to store the following combinations:

{1,2}, {1,3,4}, {2,4}, {1,2,3,4}

The order of the objects within a combination does not matter. My current implementation is a table Combinations that maps ObjectIds to CombinationIds, so every combination receives a unique Id:

ObjectId | CombinationId
------------------------
1        | 1
2        | 1
1        | 2
3        | 2
4        | 2

This is the mapping for the first two combinations of the example above. The problem is that the query for finding the CombinationId of a specific combination seems to be very complex. The two main usage scenarios for this table will be to iterate over all combinations and to retrieve a specific combination. The table will be created once and never be updated. I am using SQLite through JDBC. Is there a simpler way or a best practice to implement such a mapping?

A: 

This may be heresy, but for your usage scenarios it might work better to use a denormalized structure where you store the combinations themselves as some kind of composite (text) value:

CombinationId | Combination
---------------------------
1             | |1|2|
2             | |1|3|4|

If you make the rule that you always sort the ObjectIds when generating the composite value, it's easy to retrieve the Combination for a given set of Objects.
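A minimal sketch of that idea, using Python's built-in sqlite3 module only so that it is runnable as-is (the same SQL should work through JDBC). The helper names `combination_key` and `find_combination_id` and the UNIQUE constraint are my own assumptions for illustration, not part of the question:

```python
import sqlite3

def combination_key(object_ids):
    """Build a canonical text key such as '|1|3|4|' from any iterable of ids."""
    # Sorting numerically and dropping duplicates makes the key
    # independent of the order in which the ids are supplied.
    return "|" + "|".join(str(i) for i in sorted(set(object_ids))) + "|"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Combinations ("
    " CombinationId INTEGER PRIMARY KEY,"
    " Combination TEXT NOT NULL UNIQUE)"  # assumed: UNIQUE enforces one row per set
)
for ids in ([1, 2], [1, 3, 4], [2, 4], [1, 2, 3, 4]):
    conn.execute(
        "INSERT INTO Combinations (Combination) VALUES (?)",
        (combination_key(ids),),
    )

def find_combination_id(conn, object_ids):
    """Return the CombinationId for exactly this set of ids, or None."""
    row = conn.execute(
        "SELECT CombinationId FROM Combinations WHERE Combination = ?",
        (combination_key(object_ids),),
    ).fetchone()
    return row[0] if row else None
```

Looking up a specific combination is then a single equality test on the key column, e.g. `find_combination_id(conn, [4, 3, 1])`, and iterating over all combinations is a plain `SELECT` over the table.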

David Gelhar
Well, possibly split that into a "combination" table and a "combination_object" table listing the rows (1,1), (1,2), etc. individually, with a trigger to recalculate the "combination" column whenever something is inserted into or deleted from combination_object. (Essentially this is making something like a function-based index.)
araqnid
A: 

Another option would be to use relation-valued attributes, which in SQL DBMSs are called multisets or nested tables.

Relation-valued attributes may make sense if there is no identifier for the set of objects other than the set itself. However, I don't think any SQL DBMS permits keys to be declared on columns of that type so that could be a problem if you don't have some alternative key you can use.

http://download.oracle.com/docs/cd/B10500_01/appdev.920/a96594/adobjbas.htm#458790

dportas
A: 

The problem is, that the query for finding the CombinationId of a specific Combination seems to be very complex.

Shouldn't be too bad. If you want all combinations containing the selected items (with additional items allowed), it's just something like:

SELECT combinationID
FROM Combination
WHERE objectId IN (1, 3, 4)
GROUP BY combinationID
HAVING COUNT(*) = 3 -- The number of items in the combination
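
To try this out, here is a small sketch using Python's built-in sqlite3 module with the four example combinations from the question. The function name `combinations_containing` and the composite primary key are assumptions of mine; the counting trick only works if each (objectId, combinationID) pair is stored once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Combination ("
    " objectId INTEGER, combinationID INTEGER,"
    " PRIMARY KEY (objectId, combinationID))"  # assumed: no duplicate pairs
)
conn.executemany(
    "INSERT INTO Combination VALUES (?, ?)",
    [(1, 1), (2, 1),                    # combination 1 = {1, 2}
     (1, 2), (3, 2), (4, 2),            # combination 2 = {1, 3, 4}
     (2, 3), (4, 3),                    # combination 3 = {2, 4}
     (1, 4), (2, 4), (3, 4), (4, 4)],   # combination 4 = {1, 2, 3, 4}
)

def combinations_containing(conn, object_ids):
    """Ids of all combinations that contain every given object (extras allowed)."""
    ids = sorted(set(object_ids))
    marks = ", ".join("?" for _ in ids)  # one placeholder per wanted id
    sql = (
        f"SELECT combinationID FROM Combination "
        f"WHERE objectId IN ({marks}) "
        f"GROUP BY combinationID "
        f"HAVING COUNT(*) = ? "          # all wanted ids must be present
        f"ORDER BY combinationID"
    )
    return [row[0] for row in conn.execute(sql, ids + [len(ids)])]
```

With this data, asking for (1, 3, 4) finds both {1,3,4} and {1,2,3,4}, since extra items are allowed at this stage.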

If you need only the specific combination (no extra items allowed), it can be more like:

SELECT candidates.combinationID FROM (
   -- The query from above: every combination that contains all 3 items
   SELECT combinationID
   FROM Combination
   WHERE objectId IN (1, 3, 4)
   GROUP BY combinationID
   HAVING COUNT(*) = 3
) AS candidates
-- This join gives us a row for each item in the candidates, including
-- the items we know about but also any 'extras'
INNER JOIN Combination ON (candidates.combinationID = Combination.combinationID)
GROUP BY candidates.combinationID
HAVING COUNT(*) = 3 -- Because we joined back on ALL, ones with extras will have > 3

You can also use a NOT EXISTS here (or in the original query), this seemed easier to explain.
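
For reference, one way the NOT EXISTS variant might look, again sketched with Python's sqlite3 module so it can be run directly. The function name `exact_combination` and the table setup are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Combination ("
    " objectId INTEGER, combinationID INTEGER,"
    " PRIMARY KEY (objectId, combinationID))"  # assumed: no duplicate pairs
)
conn.executemany(
    "INSERT INTO Combination VALUES (?, ?)",
    [(1, 1), (2, 1),                    # combination 1 = {1, 2}
     (1, 2), (3, 2), (4, 2),            # combination 2 = {1, 3, 4}
     (2, 3), (4, 3),                    # combination 3 = {2, 4}
     (1, 4), (2, 4), (3, 4), (4, 4)],   # combination 4 = {1, 2, 3, 4}
)

def exact_combination(conn, object_ids):
    """Id of the combination holding exactly these objects, or None."""
    ids = sorted(set(object_ids))
    marks = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT c.combinationID
        FROM Combination AS c
        WHERE c.objectId IN ({marks})
          AND NOT EXISTS (            -- reject combinations with an extra item
              SELECT 1
              FROM Combination AS extra
              WHERE extra.combinationID = c.combinationID
                AND extra.objectId NOT IN ({marks}))
        GROUP BY c.combinationID
        HAVING COUNT(*) = ?           -- every wanted item must be present
    """
    # Parameters are bound in order: IN list, NOT IN list, then the count.
    row = conn.execute(sql, ids + ids + [len(ids)]).fetchone()
    return row[0] if row else None
```

Here `exact_combination(conn, [1, 3, 4])` matches only {1,3,4}; the larger {1,2,3,4} is rejected by the NOT EXISTS test because it contains the extra item 2.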

Finally you could also be fancy and have a single, simple query

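SELECT candidates.combinationID
FROM Combination AS candidates
INNER JOIN Combination AS allItems ON 
  (candidates.combinationID = allItems.combinationID)
WHERE candidates.objectId IN (1, 3, 4)
GROUP BY candidates.combinationID
HAVING COUNT(*) = 9 -- The number of items in the combination, squared
   -- Guard: a 9-item combination holding just one of our items also gives 9 rows
   AND COUNT(DISTINCT candidates.objectId) = 3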

So in other words, if we're looking for {1, 2}, and there's a combination with {1, 2, 3}, we'll have a {candidates, allItems} JOIN result of:

{1, 1}, {1, 2}, {1, 3}, {2, 1}, {2, 2}, {2, 3}

The extra item 3 brings COUNT(*) to 6 rows after grouping instead of 4, so we know that's not the combination we're after.
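
The arithmetic can be checked with a small sketch in Python's sqlite3 module. Combination 3 below, {1, 5, 6, 7}, is a hypothetical case I added: it holds only one of the wanted items but has four members, so the self-join also yields 1 × 4 = 4 rows. Comparing the number of distinct matched items as well is one way to guard against that coincidence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Combination ("
    " objectId INTEGER, combinationID INTEGER,"
    " PRIMARY KEY (objectId, combinationID))"  # assumed: no duplicate pairs
)
conn.executemany(
    "INSERT INTO Combination VALUES (?, ?)",
    [(1, 1), (2, 1),                    # combination 1 = {1, 2}
     (1, 2), (2, 2), (3, 2),            # combination 2 = {1, 2, 3}
     (1, 3), (5, 3), (6, 3), (7, 3)],   # combination 3 = {1, 5, 6, 7} (hypothetical)
)

sql = """
SELECT candidates.combinationID,
       COUNT(*) AS joined_rows,                          -- matched items x total items
       COUNT(DISTINCT candidates.objectId) AS matched    -- wanted items actually present
FROM Combination AS candidates
INNER JOIN Combination AS allItems
  ON candidates.combinationID = allItems.combinationID
WHERE candidates.objectId IN (1, 2)
GROUP BY candidates.combinationID
ORDER BY candidates.combinationID
"""
counts = conn.execute(sql).fetchall()

# Looking for exactly {1, 2}: need 2 matched items and 2 * 2 = 4 joined rows.
exact = [cid for cid, joined_rows, matched in counts
         if joined_rows == 4 and matched == 2]
```

Each tuple in `counts` is (combinationID, joined_rows, matched), and `exact` keeps only the combination that passes both checks.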

Cowan
Thanks for that. I am not very good with SQL and my solution involved nested queries. This is indeed not so bad.
Space_C0wb0y