views:

91

answers:

5

I have a table with:

id | parameter
 1 | A
 1 | B
 2 | A
 3 | A
 3 | B

That represent objects defined with the values as:

 1 -> A,B
 2 -> A
 3 -> A,B

I want to count the number of objects with different parameters using a SQL query, so in this case it would be 2 unique objects as 1 and 3 have the same parameters.

There is no constraint on the number of parameters, there can be 0, or any other number.

The database is a Microsoft SQL Server 2000. But I do not mind knowing the solution for other databases.

A: 

You can use a having clause to filter for two unique parameters:

select  count(*)
from    YourTable
group by
        id
having  count(distinct parameter) > 1
Andomar
I have edited the questing to clarify that: There is no constraint on the number of parameters there can be 0, or any other number.
Eduardo
+3  A: 

If I understand correctly, you want the number of distinct combinations of parameters per id represented in your table, possibly with the number of entities exhibiting each of those distinct combinations.

I can't speak for SQL Server, but under MySQL you could do something like this:

  SELECT parameter_set, COUNT(*) AS entity_count
    FROM (
          -- Here we "flatten" the different parameter combinations per id
             SELECT id,
                    GROUP_CONCAT(parameter ORDER BY parameter) AS parameter_set
               FROM tbl
           GROUP BY id
         ) d
GROUP BY parameter_set;

which will give you this:

 parameter_set | entity_count
---------------+--------------
 A,B           |            2   -- two entities have params A, B
 A             |            1   -- one entity has param A

and SELECT COUNT(DISTINCT parameter_set FROM (... flattening query ...)) d will give you the number of distinct parameter sets.

pilcrow
Yes, this would work for MySQL but I need the solution for Microsoft SQL Server 2000
Eduardo
If at all possible, I'd suggest implementing your own group concatenation function (http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/rowset-string-concatenation-which-method-is-best.aspx, for example, though searching for "row concatenation" would yield others). I think this solution is simple and intuitive.
Cheran S
This may not work as the OP desires. He says that an ID can have zero parameters -- which usually means a null value either directly in the table or in a left join. But `GROUP_CONCAT` ignores nulls. It should give an invalid result. (I say "should" verus "will" only because I don't have a MySQL server handy at the moment, to verify).
Brock Adams
Also, `GROUP_CONCAT` returns a string that defaults to 1024 characters and truncates with no warning. This invalidates the results if the *parameters* are too numerous and/or too long.
Brock Adams
+2  A: 

Okay, here's my attempt. It might be possible to implement this logic in a way that doesn't require 5 accesses to the same table, but I can't think of it right now.

The logic here is to first eliminate duplicate objects, then count the remaining IDs. The NOT IN subquery represents objects that have a matching object with a smaller ID. The subquery joins the parameters of two objects t1 and t2, then counts how many parameters matched for each t1/t2 pair. If the number of matching parameters is the same as the number of parameters in t1 and in t2, then t2 and t1 are matches and we should exclude t1 from the resultset.

DECLARE @tab TABLE (ID int, parameter varchar(2));

INSERT INTO @tab
SELECT 1, 'A' UNION ALL
SELECT 1, 'B' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 3, 'A' UNION ALL
SELECT 3, 'B' UNION ALL
SELECT 4, 'A' UNION ALL
SELECT 5, 'C' UNION ALL
SELECT 5, 'D';

SELECT
    COUNT(DISTINCT t.ID) AS num_groups
FROM
    @tab AS t
WHERE
    t.ID NOT IN
        (SELECT
             t1.ID AS ID1
         FROM
                 @tab AS t1
             INNER JOIN
                 @tab AS t2
             ON
                 t1.ID > t2.ID AND
                 t1.parameter = t2.parameter
         GROUP BY
             t1.ID,
             t2.ID
         HAVING
             COUNT(*) = (SELECT COUNT(*) FROM @tab AS dupe WHERE dupe.ID = t1.ID) AND
             COUNT(*) = (SELECT COUNT(*) FROM @tab AS dupe WHERE dupe.ID = t2.ID)
        );

Result on SQL Server 2008 R2:

num_groups
3

As for objects with 0 parameters, it depends on how they're stored, but generally, you'd just need to add one to the answer above if there are any objects with 0 parameters.

Cheran S
Actually this doesn't work. It will give incorrect values if more than one ID has zero parameters.
Brock Adams
+1  A: 

There is no foolproof way to do this in SQL Server 2000, with the conditions specified, but the following will work for most situations and it will warn you if it won't work.

Given table, "tbl":

ID  Parameter
 1      A
 1      B
 2      A
 3      A
 3      B
 4      A
 4      NULL
 5      C
 5      D
 6      NULL

.
Create this function:

CREATE FUNCTION MakeParameterListFor_tblID (@ID INT)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE
        @ParameterList  VARCHAR(8000),
        @ListLen        INT
    SET
        @ParameterList  = ''

    SELECT
        @ParameterList  = @ParameterList + COALESCE (Parameter, '*null*') + ', '
    FROM
        tbl
    WHERE
        ID  = @ID
    ORDER BY
        Parameter


    SET @ListLen        = LEN (@ParameterList)
    IF  @ListLen > 7800 -- 7800 is a SWAG.
        SET @ParameterList  = '*Caution: overflow!*' + @ParameterList
    ELSE
        SET @ParameterList  = LEFT (@ParameterList, @ListLen-1) -- Kill trailing comma.

    RETURN @ParameterList
END
GO

.
Then this query:

SELECT
    COUNT (ID)   AS NumIDs,
    NumParams,
    ParamList
FROM
    (
        SELECT
            ID,
            COUNT (Parameter)                   AS NumParams,
            dbo.MakeParameterListFor_tblID (ID) AS ParamList
        FROM
             tbl
        GROUP BY
            ID
    ) AS ParamsByID
GROUP BY
    ParamsByID.ParamList,
    ParamsByID.NumParams
ORDER BY
    NumIDs      DESC,
    NumParams   DESC,
    ParamList   ASC

.
Will give what you asked for.
Results:

NumIDs  NumParams   ParamList
  2         2         A, B
  1         2         C, D
  1         1         *null*, A
  1         1         A
  1         0         *null*
Brock Adams
A: 

I resolved the problem with the link Cheran S provided (as Microsoft SQL Server still does not have a GROUP_CONCAT() function) http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/rowset-string-concatenation-which-method-is-best.aspx

Eduardo