ansaurus

Question

Creating combinations that have no more one intersecting element

Answer 1

+6 A:

Say you have n letters (or students, or whatever), and every week want to partition them into subsets of size k (for a total of n/k subsets every week). This method will generate almost n/k subsets every week - I show below how to extend it to generate exactly n/k subsets.

Generating the Subsets (no partitioning)

First pick p, the largest prime <= n/k.

Let's consider every ordered pair (a,b) such that

0 <= a < k
0 <= b < p

We can map each pairing to one of our letters; thus, we can map p*k <= n letters this way (again, I show below how to map exactly n letters)

(0,0) => 'A'
(0,1) => 'B'
...
(0,p-1) => 'F'
(1,0)   => 'G'
(1,1)   => 'H'
...
(k-1,p-1) => 's'

Now, given

0 <= w < p  
0 <= i < p

We can create a set S_w(i) of our ordered pairs. Each pairing in S_w(i) will represent one letter (according to our mapping above), and the set S_w(i) itself represents one "grouping of letters" aka. one subset of size k.

The formula for S_w(i) is

S_w(i) = {(0,i mod p), (1,(w+i) mod p), (2,(2w+i) mod p),..., ((k-1),((k-1)*w+i) mod p)}
      = { (x,y) | 0 <= x < k and y = w*x + i (mod p)}

If we vary w and i over all possible values, we get p² total sets. When we take any two of these sets, they will have at most one intersecting element.

How it works

Say we have two sets S_w1(i₁) and S_w2(i₂). If S_w1(i₁) and S_w2(i₂) have more than one element in common, then there exists at least two x such that

w₁*x + i₁ = w₂*x + i₂ (mod p)  
(w₁-w₂)*x + (i₁-i₂) = 0 (mod p)

However, anyone who's taken modular arithmetic knows that if p is prime, either x has a unique solution or (w₁ = w₂ and i₁ = i₂); thus, there cannot be more than one x, and S_w1(i₁) and S_w2(i₂) can have at most one intersecting element.

Analysis

Since p < n/k, by Chebyshev's Theorem (which states there is a prime between x and 2x for x > 3)

n/2k < p <= n/k

Thus, this method generates at least (n/2k)² subsets of letters, though in practice p will be nearer to n/k, so the number will be nearer to (n/k)². Since a simple upper bound for the maximum possible such subsets is n(n-1)/(k(k-1)) (see BlueRaja's comment below), this means the algorithm is asymptotically optimal, and will generate near the optimal amount of sets (even in the worst case, it won't generate less than about 1/4th the optimal amount; see again the comment below)

Partitioning

You now want to group the letters into partitions each week: each week, all letters are included in exactly one group.
We do this by letting w be fixed to a certain value (representing the week) and letting i vary from 0 to p-1.

Proof

Consider the groups we created:

S_w(i) = { (x,y) | 0 <= x < k and y = w*x + i (mod p)}

Let's say w is fixed and i varies from 0 to p-1. Then we get p sets:

S_w(0), S_w(1), ..., S_w(p-1)

Now let's say S_w(i₁) and S_w(i₂) (with i₁ =/= i₂) intersect; then

w*x + i₁ = w*x + i₂ (mod p)

for some x, and hence i₁ = i₂. Thus, S_w(i₁) and S_w(i₂) don't intersect.

Since no two of our sets intersect, and there are exactly p of them (each with k elements), our sets form a partition of the k*p letters.

Generating n/k Subsets Each Week

The biggest disadvantage of this method is that it generates sets for p*k letters, rather than n letters. If the last letters can't be left out (as in your case, where the letters are students), there are two ways to generate exactly n/k subsets each week:

Find a set of prime numbers p₁, p₂, p₃, ... which sums up to exactly n/k. Then we can treat each group of p_i*k letters as an independent alphabet, so that rather than finding subsets of p*k letters, we find one group of subsets for p₁*k letters, another group of subsets for p₂*k letters, another group...
This has the disadvantage that letters from one group will never be paired with letters from another group, reducing the total number of subsets generated. Luckily, if n is even, by Goldbach's conjecture† you will only need two groups at the most (if n is odd, you will only need three at most)
This method guarantees subsets of size k, but doesn't generate as many subsets.
† Though unproven, it is known to be true for every ridiculously large number you will likely encounter for this problem
The other option is to use the smallest prime p >= n/k. This will give you p*k >= n letters - after the subsets have been generated, simply throw out the extra letters. Thus, in the end this gives you some subsets with size < k. Assuming k divides n evenly (ie. n/k is an integer), you could take the smaller subsets and mix them up by hand to make subsets of size k, but you risk having some overlap with past/future subsets this way.
This method generates at least as many subsets as the original method, but some may have size < k

Example

Take n = 15, k = 3. i.e. there are 15 students and we are making groups of three.

To begin with, we pick largest prime p <= n/k. n/k is prime (lucky us!), so p = 5.

We map the 15 students into the ordered pairs (a,b) described above, giving us (each letter is a student):

(0,0) = A
(0,1) = B
(0,2) = C
(0,3) = D
(0,4) = E

(1,0) = F
(1,1) = G
(1,2) = H
(1,3) = I
(1,4) = J

(2,0) = K
(2,1) = L
(2,2) = M
(2,3) = N
(2,4) = O

The method generates 25 groups of three. Thus, since we need to schedule n/k = 5 groups each week, we can schedule 5 weeks of activities (5 groups a week * 5 weeks = 25 groups).

For week 0, we generate the partition as

S₀(i), for i = 0 to 4.

S₀(0) = { (0,0), (1,0), (2,0) } = AFK
S₀(1) = { (0,1), (1,1), (2,1) } = BGL
S₀(2) = { (0,2), (1,2), (2,2) } = CHM
S₀(3) = { (0,3), (1,3), (2,3) } = DIN
S₀(4) = { (0,4), (1,4), (2,4) } = EJO

For week 4 it will be

S₄(i) for i = 0 to 4.

S₄(0) = { (0,0), (1, (4*1 + 0) mod 5), (2, (2*4 + 0) mod 5) }
      = { (0,0), (1,4), (2,3) }
      = AJN
S₄(1) = { (0,1), (1, (4*1 + 1) mod 5), (2, (4*2 + 1) mod 5) }
      = { (0,1), (1,0), (2,4) }
      = BFO
S₄(2) = { (0,2), (1, (4*1 + 2) mod 5), (2, (4*2 + 2) mod 5) }
      = { (0,2), (1,1), (2,0) }
      = CGK
S₄(3) = { (0,3), (1, (4*1 + 3) mod 5), (2, (4*2 + 3) mod 5) }
      = { (0,3), (1,2), (2,1) }
      = DHL
S₄(4) = { (0,4), (1, (4*1 + 4) mod 5), (2, (4*2 + 4) mod 5) }
      = { (0,4), (1,3), (2,2) }
      = EIM

Here's the schedule for all 5 weeks:

Week: 0
S₀(0) ={(0,0) (1,0) (2,0) } = AFK
S₀(1) ={(0,1) (1,1) (2,1) } = BGL
S₀(2) ={(0,2) (1,2) (2,2) } = CHM
S₀(3) ={(0,3) (1,3) (2,3) } = DIN
S₀(4) ={(0,4) (1,4) (2,4) } = EJO

Week: 1
S₁(0) ={(0,0) (1,1) (2,2) } = AGM
S₁(1) ={(0,1) (1,2) (2,3) } = BHN
S₁(2) ={(0,2) (1,3) (2,4) } = CIO
S₁(3) ={(0,3) (1,4) (2,0) } = DJK
S₁(4) ={(0,4) (1,0) (2,1) } = EFL

Week: 2
S₂(0) ={(0,0) (1,2) (2,4) } = AHO
S₂(1) ={(0,1) (1,3) (2,0) } = BIK
S₂(2) ={(0,2) (1,4) (2,1) } = CJL
S₂(3) ={(0,3) (1,0) (2,2) } = DFM
S₂(4) ={(0,4) (1,1) (2,3) } = EGN

Week: 3
S₃(0) ={(0,0) (1,3) (2,1) } = AIL
S₃(1) ={(0,1) (1,4) (2,2) } = BJM
S₃(2) ={(0,2) (1,0) (2,3) } = CFN
S₃(3) ={(0,3) (1,1) (2,4) } = DGO
S₃(4) ={(0,4) (1,2) (2,0) } = EHK

Week: 4
S₄(0) ={(0,0) (1,4) (2,3) } = AJN
S₄(1) ={(0,1) (1,0) (2,4) } = BFO
S₄(2) ={(0,2) (1,1) (2,0) } = CGK
S₄(3) ={(0,3) (1,2) (2,1) } = DHL
S₄(4) ={(0,4) (1,3) (2,2) } = EIM

More Practical Example

In your case, n = 1000 students and k = 4 in each group. Thus, we pick p as the largest prime <= (n/k = 1000/4 = 250), so p = 241. Without considering the alterations above under "Generating n/k Subsets Each Week", this method will generate a schedule for 961 students lasting 241 weeks.

(An upper-bound for the maximum number of subsets possible would be 1000*999/(4*3) = 83250, though the actual number is likely less than that. Even so, this method generates 58081 subsets, or about 70% of the theoretical maximum!)

If we use the first method above to generate a schedule for exactly 1000 students, we take p₁ = 113, p₂ = 137 (so that p₁ + p₂ = n/k). Thus, we can generate (113)^2 + (137)^2 = 31,538 subsets of students, enough to last 113 weeks.

If we use the second method above to generate a schedule for exactly 1000 students, we take p = 251. This will give us a schedule for 1004 students for 251 weeks; we remove the 4 phantom students from the schedule each week. Usually, this will result in four groups of 3 every week (though unlikely, it is also possible to have for example one group of 2 and two groups of 3). The groups with < 4 students will always have a multiple-of-4 total number of students, so you could manually place those students into groups of 4, at the risk of potentially having two of those students together again later in another group.

Final thoughts

One flaw of this algorithm is that it's not really flexible: if a student drops out, we are forever stuck with a phantom student. Also, there is no way to add new students to the schedule midway through the year (unless we allow for them by initially creating phantom students).

This problem falls under the category of Restricted Set Systems in combinatorics. See this paper for more information, especially Chapters 1 and 2. Since it is a postscript file, you will need gsview or something to view it.

Moron 2010-06-02 06:43:08

Thanks for the answer. I am trying to make sense of it.

khuss 2010-06-02 15:05:05

Do you know if the above logic could be mapped to a graph?

khuss 2010-06-02 18:27:48

@khuss: Yes, but depends. What kind of graph are you looking for?

Moron 2010-06-02 18:32:39

well. I am not sure. I was thinking if this could be mapped to standard graph algorithm, it could be easier to manage.

khuss 2010-06-02 20:59:28

@Khuss: Perhaps if you explain a bit more clearly what you need to do... like how many groups a week do you need? Do you need many groups are the same time? etc etc.

Moron 2010-06-02 21:28:34

@moron if there are 1000 users, we can generate up to 250 groups in a week. More than that doesn't make sense since the same person cannot be in two groups in the same week. The program will be run every week to generate ~250 new groups until we can no longer generate unique groups. By unique group, I mean the maximum number of intersecting elements in any two groups should not be more than 1.

khuss 2010-06-03 04:43:26

@khuss: Good news! I believe the construction I gave works :-). You don't need any graphs. I have added an EDIT: to my answer. Please read that and see if it works for you. Please let me know if you need any clarification regarding the edit (or the earlier portion).

Moron 2010-06-03 06:06:19

let me check and get back to you. Thanks for the help

khuss 2010-06-03 06:18:22

could you also add an example how you would use your procedure to generate partitions of size 3 from a set with 9 elements? that should make it much more clear or any potential problems more obvious.

Unreason 2010-06-03 15:03:29

@Unreason. Sure. I will try to add an example using the approach above, but it will probably only give 9 sets when n=9 and k=3. The method becomes more useful when n gets large. Perhaps I will try and give an example with larger n.

Moron 2010-06-03 17:04:50

@moron I am having a hard time following the logic. An example would be great.

khuss 2010-06-03 19:40:36

@khuss, @Unreason: I have added an example to make it clearer.

Moron 2010-06-03 21:46:32

Ideally I would like to run the program every week and create the groups so that we can account for students dropping out and joining every week. I guess I can run this every week until we reach n/k groups. Every time we form a group, it needs to be checked against a database to make sure that it is a unique group. Do you see any problem with this approach?

khuss 2010-06-03 22:51:10

@khuss: Sorry, this method is not really flexible. For flexibilty you would probably need some graph algorithms. What you are looking for is to find a spanning subgraph of a graph which is the disjoint union of copies of K_4 (the complete graph on 4 vertices). I have a feeling this is going to be a hard problem...

Moron 2010-06-03 23:28:45

@khuss: I have added another post/answer which talks in terms of graphs. Catering to students dropping joining etc, this becomes a hard problem. Sorry and good luck!

Moron 2010-06-04 01:47:17

Beautiful algorithm :) I really enjoy your answers!

Matthieu M. 2010-06-04 06:29:09

+1 Nice, as you say it provides a fair number of results. i have been brute forcing through all k-combinations and getting the number of results from 27 to 35, depending on where you start, for the same parameters as in your example. this is a nice problem, i wonder if the solution can be even improved.

Unreason 2010-06-04 11:03:54

@Matthieu: Thanks! I am glad you enjoy my answers :-)

Moron 2010-06-04 14:15:02

@Unreason: I believe it is still an open problem of even knowing what the maximum is. There might already be algorithms out there which gives a better number than what we have here. I wasn't able to find any by a web search though.

Moron 2010-06-04 14:18:49

@khuss: I've verified that this should indeed work. One thing that is sort of glossed over (but mentioned at the end) is that algorithm works for k*p students, but p <= n/k so k*p <= n. A simpler solution would be to use p=251, giving you 4 "ghost" students which you are free to ignore (meaning you'll need a few groups of 2 or 4, which you'd need anyways since 3 does not divide 1000). @Moron: I've never seen anything like this; this is more creative than I'd expect even from an algorithms grad student. Did you find this somewhere, or are you some sort of supergenius algorithms professor?

BlueRaja - Danny Pflughoeft 2010-06-04 15:11:46

(cont.) Also, would you mind if I polished up the formatting to make it more beginner-friendly? @Unreason @Moron: If we model this as a complete graph (every node a student, remove k(k-1)/2 edges every time we make a grouping), since there are n(n-1)/2 edges, an absolute upper-bound of the number of possible groupings is n(n-1)/(k(k-1)). Using Moron's minimum # of groupings formed above, this means that **in the absolute worst case** the number of groupings this algorithm generates is off from the max by a factor of n(k-1)/(4k(n-1)), or about 1/4. In practice it will be even better.

BlueRaja - Danny Pflughoeft 2010-06-04 15:24:50

@BlueRaja: Thanks for cleaning it up. This algorithm basically derives from the proof that there are alteast (n/2k)^2 such sets. I had known this before from a combinatorics course I took long time back. I believe the result is due to Frankl, and I think the result is in the ps file I mentioned in the answer. The basic idea is that polynomials of degree k over a finite field have no more than k roots, and so sets {(x, P(x)) | x in A} and {(x, Q(x))| x in A} can have atmost k intersections (roots of P-Q). We could actually also use prime powers: GF(p^t), instead of just primes

Moron 2010-06-04 16:50:34

@Moron: I have not changed it; I didn't want to butt heads.

BlueRaja - Danny Pflughoeft 2010-06-04 21:42:25

@BlueRaja: I meant go ahead, I don't mind :-)

Moron 2010-06-04 21:46:27

@Khuss: Check out the rewrite @Moron: okay, I've finished the cleanup - if you feel it is obtrusive, please rewrite it or revert back to the previous revision.

BlueRaja - Danny Pflughoeft 2010-06-15 19:39:12

@BlueRaja: Awesome! I wouldn't even dream of changing it back. Thanks!

Moron 2010-06-15 19:43:25

thank for the rewrite.. it is much more clear now

khuss 2010-06-21 05:31:14

Answer 2

+1 A:

Here is some outline of the algorithm.

First find all the pairs: AB BC CD DE EF FG GH HI AC BD CE DF EG FH GI AD BE CF DG EH FI AE BF CG DH EI AF BG CH DI AG BH CI AH BI AI

These can be stored in an array of sixe n(n-1)

Now start attempting to combine consecutive pairs using the following rules: a. Two pairs can be combined only when there is a common character. b. The combination is possible when the pair formed by leaving the common character is also available. e.g. if we want to combine AB and AC then we need to check if BC is also available. c. When the above rules are satisfied we combine the two pairs into a triple (e.g. AB and AC merged to form ABC) and mark all the three pairs, such as AB, AC and BC as unavailable.
Continue doing this looking for available pairs in the array and merging them to form triples and marking the pairs unavailable until there are no available pairs or no triples can be formed anymore.

Example: 1. combine AB + AC --> ABC; Mark AB, AC and BC unavailable. 2. combine AD + AE --> AED; Mark AD, AE and DE unavailable. 3. combine AF + AG --> AFG; Mark AF, AG and FG unavailable. ..

Anil 2010-06-02 07:40:50

will this work if the subset size is something other than 3, let us say 4.

khuss 2010-06-02 14:35:17

By applying recursively on the set of triples, I suppose. However in term of efficiency it's quite bad. The issue is that you can have many triples in a single 4-tuple, and you need to mark all of them as unavailable. One would need an inserting index structure I guess :)

Matthieu M. 2010-06-03 15:43:26

Answer 3

+1 A:

Here's an approach which will satisfy the requirements you have stated. Whether it does what you want I don't know.

On a large sheet of paper draw a regular grid with at least 250 squares.
Label the sides of the squares with the letters in your alphabet (250 squares x 4 sides == 1000).
Each square defines one of your subsets. Each shares one side (ie one letter) only with each of its (up to) 4 neighbours. No side (ie letter) is shared by more than 2 squares (subsets).

I'll leave it up to you to turn this into working code, but I don't think that it should be too difficult and it should scale well. Note that it should also work for any other size of subset, you can tile the plane with irregular n-gons for any n, though it might get difficult.

The other approach I thought of is:

On a large sheet of paper draw N dots, where N is the size of your alphabet. Label each dot with one of the letters of the alphabet.
Connect any n dots, where n is the size of the required subsets. That's your first subset.
Choose one of the already-connected dots, connect it to (n-1) more unconnected dots. That's your next subset.
Repeat step 3 until you are finished.

This requires a bit more book-keeping, and there are a number of corner cases to deal with, but again it shoudln't be too difficult to code up.

EDIT: It's easy to transform my first approach into a form which is more obviously an algorithm on a graph. Model each subset as a node, and each letter in the alphabet as an edge. Construct a graph where each node has degree n (number of elements in the subset) and each letter (edge) is used once.

High Performance Mark 2010-06-02 07:44:55

could you give an example on how this works. I am not sure how to represent the subsets and alphabets as nodes and edges. Simple example would suffice.

khuss 2010-06-03 19:42:33

@khuss: http://en.wikipedia.org/wiki/Graph_(data_structure) explains things better than I can.

High Performance Mark 2010-06-03 20:49:24

@High Performance Mark: in your first approach you will get only a (relatively small) fraction of possible subsets. Still +1.

Unreason 2010-06-04 13:25:01

Answer 4

A:

@khuss

Same method can be generalized. But it is not a linear algorithm and may be exponential.

For example: When subset size is 4, we pick 2 or more pairs with 4 unique characters.

e.g. "AB and CD" or "AB, AC & AD" only if the following conditions are satisfied:

All the pairs formed by the characters of the 4-tuple are available. e.g. if we want to form ABCD using AB, AC & AD then all the pairs fromed out A, B, C & D i.e. AB, AC, AD, BC, BD, CD all must be available.
If condition 1 is satisfied then we form ABCD and mark the C(4,2) = 6 pairs as unavailable.

We continue as before until no more 4-tuples can be formed or no more pairs are available.

Anil 2010-06-02 19:14:00

There are 1000C2 possible pairs, and thus (1000C2)C2 = 125 billion pairs-of-pairs. That might be too much to brute force on a teacher's salary :)

BlueRaja - Danny Pflughoeft 2010-06-03 22:11:42

Answer 5

+2 A:

I had to add another answer as the other one was too long already.

If you have the following constraints:

1) You need groups of 4 every week.

2) Each group in a certain week is disjoint and each student is in exactly one group.

3) If two students are in the same group, they need cannot be in the same group in future.

If you construct a graph G as follows:

1) Each student is a node.

2) Two students are joined by an edge iff they haven't been together in a group before.

With students dropping/joining arbitrarily, this becomes a hard problem! Even though you start out with a complete graph intially, after some weeks, the graph could become quite unpredictable.

Your problem: You need to find a spanning subgraph of G, such that it is a union of copies of K_4 or in other words a partition into K_4s.

Unfortunately, look like this problem is NP-Hard: Exact cover by 4-sets (which is NP-Complete) can be reduced to your problem (just like Exact cover by 3-sets can be reduced to partition into triangles).

Perhaps this will help give some aproximation algorithms: http://www.siam.org/proceedings/soda/2010/SODA10_122_chany.pdf

(Your problem can be reduced to Hypergraph matching and so algorithms for that can be used for your problem).

Exact Cover: http://en.wikipedia.org/wiki/Exact_cover

Partition into triangles: https://noppa.tkk.fi/noppa/kurssi/t-79.5103/viikkoharjoitukset/T-79_5103_solutions_5.pdf

Exact Cover by 4 sets = Given a set S of size 4m and a collection C of 4-element subsets of S, does there exist a subset C' of C, such that each element of S appears precisely once in C'.

Unfortunately, seems like you might have to change some constraints.

Moron 2010-06-04 01:41:43

I can see why dropping students would cause problems but why would adding new students(nodes) be an issue? Could you please explain?

khuss 2010-06-04 04:58:09

for dropped out students, can we mark corresponding nodes "inactive" somehow? This means while creating new sub graphs we can ignore those node that are inactive. Why wouldn't this work?

khuss 2010-06-04 05:00:59

Thanks for the links, some of them gave me new look on the problem.

Tomek Tarczynski 2010-06-04 08:26:54

As long as you are listing different approaches, there's also representation of selected elements as binary numbers (01100 on n=5, k=2 represents BC). In that notation the solution can be expressed as all the binary numbers that have hamming distance of k-1.

Unreason 2010-06-04 13:42:23

BlueRaja - Danny Pflughoeft 2010-06-04 14:04:38

@khuss: Yes. For dropped out students, we can delete the node altogether (or mark it as inactive to save that info about the student). After a week, you delete the edges corresponding the students pairs who were in the same group (6 edges for 1 group). So you graph changes pretty quickly and you would likely need a _general_ algorithm to determine the student partition each week, especially since the students dropping/adding is likely not in your hands. If there were no students dropping/adding you could use the other approach.

Moron 2010-06-04 14:04:44

@khuss: Continuing from previous comment. It might still be possible to do something clever and cater to students dropping/adding and not have to have a _general_ algorithm which you apply to your case, but seems like a hard combinatorics problem.

Moron 2010-06-04 14:11:28

@BlueRaja, erm, yes, i fouled it up. Let me try again: weight(a

Unreason 2010-06-04 14:48:16

would it be easier to solve this problem if we use a graph with a database. The graph could be used to represent the current state of groups as well as to create new groups. Every week, find unconnected nodes to form groups until 250 groups are reached. This information can be stored in the database. Next time, graph can load this information from the database, then do another search to find the next 250 groups. Still need to figure out the actual process to create the groups from the graph. Do you see any obvious problems with this approach?

khuss 2010-06-05 04:52:39

Since we are eliminating nodes as we go, I would think the processing time will not be too bad. We still have to determine the maximum number of groups possible. I think this is required to evenly distribute groups across nodes.

khuss 2010-06-05 04:56:13

@khuss: The problem is not with storage, the problem is that just 'picking groups till 250 is reached' might not even work. It might happen that unless you pick smartly, you might keep having to back-track your picking process for the week i.e. you will see that you have to pick a student already in a group even before 250 groups is reached. Also, we are eliminating edges as we go not nodes. Nodes disappear only if a student drops out.

Moron 2010-06-05 14:17:21