tags:

views:

270

answers:

6

Pascal's rule on counting the subset's of a set works great, when the set contains unique entities.

Is there a modification to this rule for when the set contains duplicate items?

For instance, when I try to find the count of the combinations of the letters A,B,C,D, it's easy to see that it's 1 + 4 + 6 + 4 + 1 (from Pascal's Triangle) = 16, or 15 if I remove the "use none of the letters" entry.

Now, what if the set of letters is A,B,B,B,C,C,D? Computing by hand, I can determine that the sum of subsets is: 1 + 4 + 8 + 11 + 11 + 8 + 4 + 1 = 48, but this doesn't conform to the Triangle I know.

Question: How do you modify Pascal's Triangle to take into account duplicate entities in the set?

+4  A: 

A set only contains unique items. If there are duplicates, then it is no longer a set.

Dima
Generally referred to as a multiset or bag.
Hank
I find it depressing that a quibble about terminology is the most popular response. Did anyone actually not understand the question that was asked?
Mathematics is built on very precise definitions, so "terminology" matters. If you arbitrarily change the definition of a set, then the operations defined on sets may change their meaning, or become meaningless altogether.
Dima
@Dima In this case a lecture on terminology is particularly unhelpful. If you are sophisticated enough to care about such things then you should also introduce the terms and forms that leaves the intention unchanged but makes the question well formed.
Steven Noble
Ok, I apologize for sounding like a pompous ass. But I think the point still stands. If you have a "set" that allows duplicates, then terms like "subset", "member", "union", and "intersection" do not mean what Pascal thought they meant.
Dima
Sorry Dima, but you're wrong. Multi-sets look like sets but they allow duplicates, and "subset", "member", "union", and "intersection" mean what Pascal thought they meant.
Let's just say, this kind of semantic quibbling, while of some minor benefit (if you're absolutely positive you're right), isn't going to get someone to mark it as "best answer".
Bill James
A: 

Even though mathematical sets do contain unique items, you can run into the problem of duplicate items in 'sets' in the real world of programming. See this thread on Lisp unions for an example.

Paul Reiners
+1  A: 

You don't need to modify Pascal's Triangle at all. Study C(k,n) and you'll find out -- you basically need to divide the original results to account for the permutation of equivalent letters.

E.g., A B1 B2 C1 D1 == A B2 B1 C1 D1, therefore you need to divide C(5,5) by C(2,2).

Pedro
That helps if you want the total number of sub multi-sets. It does not help if you want to solve the total number of sub multi-sets with, say, 5 elements.
C(5,5) = 1, as does C(2,2)... this isn't close to what I need.
Bill James
+1  A: 

Without duplicates (in a set as earlier posters have noted), each element is either in or out of the subset. So you have 2^n subsets. With duplicates, (in a "multi-set") you have to take into account the number the number of times each element is in the "sub-multi-set". If it m_1,m_2...m_n represent the number of times each element repeats, then the number of sub-bags is (1+m_1) * (1+m_2) * ... (1+m_n).

David Nehme
+3  A: 

Yes, if you don't want to consider sets, consider the idea of 'factors.' How many factors does:

p1^a1.p2^a2....pn^an

have if p1's are distinct primes. If the ai's are all 1, then the number is 2^n. In general, the answer is (a1+1)(a2+1)...(an+1) as David Nehme notes.

Oh, and note that your answer by hand was wrong, it should be 48, or 47 if you don't want to count the empty set.

Thomas Andrews
+4  A: 

It looks like you want to know how many sub-multi-sets have, say, 3 elements. The math for this gets very tricky, very quickly. The idea is that you want to add together all of the combinations of ways to get there. So you have C(3,4) = 4 ways of doing it with no duplicated elements. B can be repeated twice in C(1,3) = 3 ways. B can be repeated 3 times in 1 way. And C can be repeated twice in C(1,3) = 3 ways. For 11 total. (Your 10 you got by hand was wrong. Sorry.)

In general trying to do that logic is too hard. The simpler way to keep track of it is to write out a polynomial whose coefficients have the terms you want which you multiply out. For Pascal's triangle this is easy, the polynomial is (1+x)^n. (You can use repeated squaring to calculate this more efficiently.) In your case if an element is repeated twice you would have a (1+x+x^2) factor. 3 times would be (1+x+x^2+x^3). So your specific problem would be solved as follows:

(1 + x) (1 + x + x^2 + x^3) (1 + x + x^2) (1 + x)
  = (1 + 2x + 2x^2 + 2x^3 + x^4)(1 + 2x + 2x^2 + x^3)
  = 1    + 2x   + 2x^2 +  x^3 +
    2x   + 4x^2 + 4x^3 + 2x^4 +
    2x^2 + 4x^3 + 4x^4 + 2x^5 +
    2x^3 + 4x^4 + 4x^5 + 2x^6 +
    x^4  + 2x^5 + 2x^6 +  x^7
  = 1 + 4x + 8x^2 + 11x^3 + 11x^4 + 8x^5 + 4x^6 + x^7

If you want to produce those numbers in code, I would use the polynomial trick to organize your thinking and code. (You'd be working with arrays of coefficients.)