views:

43

answers:

1

Suppose a database table containing properties of some elements:

Table Element (let's say 1 000 000 rows):
ElementId    Property_1    Property_2    Property_3
-------      ----------    ----------    ----------
1            abc            1            1
2            bcd            1            2
3            def            2            4
...

The table is being frequently updated. I'd like to store definitions of sets of these elements so that using a single SQL statement I would get eg.

SetId    Element
---      -------
A        2
B        1
B        3
C        2
C        3
...

I'd also like to change the definitions when needed. So far I have stored the definitions of the sets as unions of intersections like this:

Table Subset (~1 000 rows):
SubsetId    Property    Value    Operator
--------    --------    -----    --------
1            1          bcd      =
1            3          1        >
2            2          3        <=
...

and

Table Set (~300 rows):
SetId    SubsetId
---      ------
...
E        3
E        4
F        7
F        9
...

In SQL I suppose I could generate lots of case expressions from the tables, but so far I've just loaded the tables and used an external tool to do essentially the same thing.

When I came up with this I was pleased (and also implemented it). Lately I've been wondering whether it is as wonderful as I thought. Is there a better way to store the definitions of the sets?

+1  A: 

I would think using duck-typing may be intuitive here, as an alternative.

For example all modern-languages (C#, Java, Python) have the concept of sets. If you are going to "intersect" or "union" (set operators) via SQL, then you have to store them in a relational way. Else, why not store them in a language native way ?. (as opposed to relational). By native way, I would mean that if it was done in Python and we used a Python set, then that is what I would persist. Same with Java or C#.

So if a set-id 10 had the members 1,4,5,6 it would be persisted in the DB as follows:

      SetId              Set
______________________________________
10                       1,4,5,6
11                       2,3
12                       null

Sure, this has the disadvantage that it could be proprietary, or maybe even non-performant - which you can perhaps tell as you have the complete problem definition. If you need SQL to analyze it, maybe my suggestion has further downsides.

In a sense, the set representation feature of each of these languages are like a DSL (Domain specific Language) - if you will need to 'talk' a lot of set-stuff between your application classes / objects, then why not use the natural fit?

blispr