views:

245

answers:

5

Given a the following table:

Index | Element
---------------
  1   |    A
  2   |    B
  3   |    C
  4   |    D

We want to generate all the possible permutations (without repetitions) using the elements. the final result (skipping some rows) will look like this:

  Results
----------
   ABCD
   ABDC
   ACBD
   ACDB
   ADAC
   ADCA

   ...

   DABC
   DACB
   DBCA
   DBAC
   DCAB
   DCBA

  (24 Rows)

How would you do it?

+3  A: 
DECLARE @s VARCHAR(5);
SET @s = 'ABCDE';

WITH Subsets AS (
SELECT CAST(SUBSTRING(@s, Number, 1) AS VARCHAR(5)) AS Token,
CAST('.'+CAST(Number AS CHAR(1))+'.' AS VARCHAR(11)) AS Permutation,
CAST(1 AS INT) AS Iteration
FROM dbo.Numbers WHERE Number BETWEEN 1 AND 5
UNION ALL
SELECT CAST(Token+SUBSTRING(@s, Number, 1) AS VARCHAR(5)) AS Token,
CAST(Permutation+CAST(Number AS CHAR(1))+'.' AS VARCHAR(11)) AS
Permutation,
s.Iteration + 1 AS Iteration
FROM Subsets s JOIN dbo.Numbers n ON s.Permutation NOT LIKE
'%.'+CAST(Number AS CHAR(1))+'.%' AND s.Iteration < 5 AND Number
BETWEEN 1 AND 5
--AND s.Iteration = (SELECT MAX(Iteration) FROM Subsets)
)
SELECT * FROM Subsets
WHERE Iteration = 5
ORDER BY Permutation

Token Permutation Iteration
----- ----------- -----------
ABCDE .1.2.3.4.5. 5
ABCED .1.2.3.5.4. 5
ABDCE .1.2.4.3.5. 5
(snip)
EDBCA .5.4.2.3.1. 5
EDCAB .5.4.3.1.2. 5
EDCBA .5.4.3.2.1. 5

first posted a while ago here

However, it would be better to do it in a better language such as C# or C++.

AlexKuznetsov
@Alex: Thank you Alex. this is virtually the same approach. it uses recursive CTEs and a column (permutation) to remove duplicate elements from permutations
SDReyes
@Alex UPDATE: I think that a solution using native SQL would have better performance than one using C# integration. what do you think?
SDReyes
@SDReyes: at the time of this writing recursive CTEs are slow.
AlexKuznetsov
@Alex: OK. that's new for me :). I think make some bencharking using C# and the recursive CTE is fair. maybe I can do some tests this weekend. thank you Alex! +1
SDReyes
+1  A: 

Am I correctly understanding that you built Cartesian product n x n x n x n, and then filter out unwanted stuff? The alternative would be generating all the numbers up to n! and then using factorial number system to map them via element encoding.

Tegiri Nenashi
+1  A: 

Simpler than a recursive CTE:

declare @Number Table( Element varchar(MAX), Id varchar(MAX) )
Insert Into @Number Values ( 'A', '01')
Insert Into @Number Values ( 'B', '02')
Insert Into @Number Values ( 'C', '03')
Insert Into @Number Values ( 'D', '04')

select a.Element, b.Element, c.Element, d.Element
from @Number a
join @Number b on b.Element not in (a.Element)
join @Number c on c.Element not in (a.Element, b.Element)
join @Number d on d.Element not in (a.Element, b.Element, c.Element)
order by 1, 2, 3, 4

For an arbitrary number of elements, script it out:

if object_id('tempdb..#number') is not null drop table #number
create table #number (Element char(1), Id int, Alias as '_'+convert(varchar,Id))
insert #number values ('A', 1)
insert #number values ('B', 2)
insert #number values ('C', 3)
insert #number values ('D', 4)
insert #number values ('E', 5)

declare @sql nvarchar(max)
set @sql = '
select '+stuff((
  select char(13)+char(10)+'+'+Alias+'.Element'
  from #number order by Id for xml path (''), type
  ).value('.','NVARCHAR(MAX)'),3,1,' ')

set @sql += '
from #number '+(select top 1 Alias from #number order by Id)

set @sql += (
  select char(13)+char(10)+'join #number '+Alias+' on '+Alias+'.Id not in ('
    +stuff((
      select ', '+Alias+'.Id'
      from #number b where a.Id > b.Id
      order by Id for xml path ('')
      ),1,2,'')
    + ')'
  from #number a where Id > (select min(Id) from #number)
  order by Element for xml path (''), type
  ).value('.','NVARCHAR(MAX)')

set @sql += '
order by 1'

print @sql
exec (@sql)

To generate this:

select 
 _1.Element
+_2.Element
+_3.Element
+_4.Element
+_5.Element
from #number _1
join #number _2 on _2.Id not in (_1.Id)
join #number _3 on _3.Id not in (_1.Id, _2.Id)
join #number _4 on _4.Id not in (_1.Id, _2.Id, _3.Id)
join #number _5 on _5.Id not in (_1.Id, _2.Id, _3.Id, _4.Id)
order by 1
Peter
A: 

After making some perhaps snarky comments, this problem stuck in my brain all evening, and I eventually came up with the following set-based approach. I believe it definitely qualifies as "elegant", but then I also think it qualifies as "kinda dumb". You make the call.

First, set up some tables:

--  For testing purposes
DROP TABLE Source
DROP TABLE Numbers
DROP TABLE Results


--  Add as many rows as need be processed--though note that you get N! (number of rows, factorial) results,
--  and that gets big fast. The Identity column must start at 1, or the algorithm will have to be adjusted.
--  Element could be more than char(1), though the algorithm would have to be adjusted again, and each element
--  must be the same length.
CREATE TABLE Source
 (
   SourceId  int      not null  identity(1,1)
  ,Element   char(1)  not null
 )

INSERT Source (Element) values ('A')
INSERT Source (Element) values ('B')
INSERT Source (Element) values ('C')
INSERT Source (Element) values ('D')
--INSERT Source (Element) values ('E')
--INSERT Source (Element) values ('F')


--  This is a standard Tally table (or "table of numbers")
--  It only needs to be as long as there are elements in table Source
CREATE TABLE Numbers (Number int not null)
INSERT Numbers (Number) values (1)
INSERT Numbers (Number) values (2)
INSERT Numbers (Number) values (3)
INSERT Numbers (Number) values (4)
INSERT Numbers (Number) values (5)
INSERT Numbers (Number) values (6)
INSERT Numbers (Number) values (7)
INSERT Numbers (Number) values (8)
INSERT Numbers (Number) values (9)
INSERT Numbers (Number) values (10)


--  Results are iteratively built here. This could be a temp table. An index on "Length" might make runs
--  faster for large sets.  Combo must be at least as long as there are characters to be permuted.
CREATE TABLE Results
 (
   Combo   varchar(10)  not null
  ,Length  int          not null
 )

Here's the routine:

SET NOCOUNT on

DECLARE
  @Loop     int
 ,@MaxLoop  int


--  How many elements there are to process
SELECT @MaxLoop = max(SourceId)
 from Source


--  Initialize first value
TRUNCATE TABLE Results
INSERT Results (Combo, Length)
 select Element, 1
  from Source
  where SourceId = 1

SET @Loop = 2

--  Iterate to add each element after the first
WHILE @Loop <= @MaxLoop
 BEGIN

    --  See comments below. Note that the "distinct" remove duplicates, if a given value
    --  is to be included more than once
    INSERT Results (Combo, Length)
     select distinct
        left(re.Combo, @Loop - nm.Number)
        + so.Element
        + right(re.Combo, nm.Number - 1)
       ,@Loop
      from Results re
       inner join Numbers nm
        on nm.Number <= @Loop
       inner join Source so
        on so.SourceId = @Loop
      where re.Length = @Loop - 1

    --  For performance, add this in if sets will be large
    --DELETE Results
    -- where Length <> @Loop

    SET @Loop = @Loop + 1
 END

--  Show results
SELECT *
 from Results
 where Length = @MaxLoop
 order by Combo

The general idea is: when adding a new element (say "B") to any string (say, "A"), to catch all permutations you would add B to all possible positions (Ba, aB), resulting in a new set of strings. Then iterate: Add a new element (C) to each position in a string (AB becomes Cab, aCb, abC), for all strings (Cba, bCa, baC), and you have the set of permutations. Iterate over each result set with the next character until you run out of characters... or resources. 10 elements is 3.6 million permutations, roughly 48MB with the above algorithm, and 14 (unique) elements would hit 87 billion permutations and 1.163 terabytes.

I'm sure it could eventually be wedged into a CTE, but in the end all that would be is a glorified loop. The logic is clearer this way, and I can't help but think the CTE execution plan would be a nightmare.

Philip Kelley
In what way is this elegant?
Denis Valeev
It is set-based, and uses a bare minimum of string manipulation.
Philip Kelley
We need to stress test it against the cte solution to evaluate its merit in the optimization department.
Denis Valeev
stress test of 'abcdefghi':your solution took 24 seconds to complete'the answer' took 2 minutesAlexKuznetsov's solution took 1 minute and 25 secondsSo... you win!
Denis Valeev
Marked as new Answer. Thanks Denis : ) +1
SDReyes
A: 

Current solution using a recursive CTE.

-- The base elements
Declare @Number Table( Element varchar(MAX), Id varchar(MAX) )
Insert Into @Number Values ( 'A', '01')
Insert Into @Number Values ( 'B', '02')
Insert Into @Number Values ( 'C', '03')
Insert Into @Number Values ( 'D', '04')

-- Number of elements
Declare @ElementsNumber int
Select  @ElementsNumber = COUNT(*)
From    @Number;



-- Permute!
With Permutations(   Permutation,   -- The permutation generated
                     Ids,            -- Which elements where used in the permutation
                     Depth )         -- The permutation length
As
(
    Select  Element,
            Id + ';',
            Depth = 1
    From    @Number
    Union All
    Select  Permutation + ' ' + Element,
            Ids + Id + ';',
            Depth = Depth + 1
    From    Permutations,
            @Number
    Where   Depth < @ElementsNumber And -- Generate only the required permutation number
            Ids Not like '%' + Id + ';%' -- Do not repeat elements in the permutation (this is the reason why we need the 'Ids' column) 
)
Select  Permutation
From    Permutations
Where   Depth = @ElementsNumber
SDReyes