Hi all,

I was tasked with counting the number of distinct strings in a column in Excel. A quick Google search turned up the following formula:

=SUM(IF(FREQUENCY(MATCH(B2:B10,B2:B10,0),MATCH(B2:B10,B2:B10,0))>0,1))

Consider the data:

A B C D A B E C

Now, the MATCH function returns an array of positions (because its first argument is an array):

1 2 3 4 1 2 7 3
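For anyone who wants to check that array outside Excel, here is a small Python sketch of what MATCH with match type 0 appears to do when given an array as its first argument. The function name `match_exact` is made up for illustration, and this only models the behavior; it is not how Excel implements it.

```python
def match_exact(lookup_values, lookup_array):
    """Model of MATCH(..., 0): 1-based position of the first exact match."""
    # Excel's MATCH compares text without regard to case, so fold case first.
    folded = [str(v).lower() for v in lookup_array]
    return [folded.index(str(v).lower()) + 1 for v in lookup_values]


data = ["A", "B", "C", "D", "A", "B", "E", "C"]
positions = match_exact(data, data)
print(positions)  # [1, 2, 3, 4, 1, 2, 7, 3]
```

Every repeated string maps back to the position of its first occurrence, which is what the rest of the formula relies on.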

So far so good. What I don't understand is how the FREQUENCY function works here, in particular how it treats duplicated bins (for example, the bin 1 appears twice in the above data). The result of the FREQUENCY function is:

2 2 2 1 0 0 1 0 0

Thanks

Taras

A: 

EDIT: I realised how your solution was working - amended to reflect this.

FREQUENCY counts, for each bin, how many entries in the search array fall into that bin. Here's how it's working:

Search array: 1 2 3 4 1 2 7 3

Bins: 1 2 3 4 1 2 7 3

Bin 1 => there are two 1's => 2

Bin 2 => there are two 2's => 2

Bin 3 => there are two 3's => 2

Bin 4 => there is one 4 => 1

Bin 1 repeated => 1 already counted => 0

Bin 2 repeated => 2 already counted => 0

Bin 7 => there is one 7 => 1

Bin 3 repeated => 3 already counted => 0

It almost seems that the solution is exploiting a FREQUENCY quirk: it won't count the same bin twice. You might expect the second bin with value 1 to be non-zero as well, but only the first occurrence of a bin receives the count, and a duplicate bin gets 0. The ninth value in the result is the extra element that FREQUENCY always returns, which counts values above the highest bin (0 here). So the number of entries greater than zero gives you the number of distinct values.
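One way to make sense of the duplicate-bin behavior is to assume that each bin counts the values greater than the next lower bin and less than or equal to the bin itself. A duplicate bin then covers an empty interval and gets 0. The Python sketch below is a model built on that assumption (the function name `frequency` and the stable-sort tie-breaking are my own choices, not documented Excel internals), but it does reproduce the output from the question.

```python
def frequency(data, bins):
    """Rough model of Excel's FREQUENCY for numeric inputs."""
    counts = [0] * (len(bins) + 1)
    # Visit bins in ascending order; sorted() is stable, so among equal
    # bins the one that appears first in the sheet is visited first.
    order = sorted(range(len(bins)), key=lambda i: bins[i])
    lower = float("-inf")
    for i in order:
        upper = bins[i]
        # Count values in the half-open interval (lower, upper].
        counts[i] = sum(1 for x in data if lower < x <= upper)
        lower = upper
    # Extra last element: values above the highest bin.
    counts[-1] = sum(1 for x in data if x > lower)
    return counts


positions = [1, 2, 3, 4, 1, 2, 7, 3]
result = frequency(positions, positions)
print(result)  # [2, 2, 2, 1, 0, 0, 1, 0, 0]

# Equivalent of SUM(IF(FREQUENCY(...)>0,1))
distinct = sum(1 for c in result if c > 0)
print(distinct)  # 5
```

Under this model the zeros for repeated bins are a consequence of the interval definition rather than a special case.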

Here's an alternative approach which you might find useful. It can be used to calculate the number of distinct values:

Suppose your string range is B2:B10. In another column, enter the following in the first row and fill down:

=(MATCH(B2,B$2:B2,0)-(ROW(B2)-ROW(B$2)))>0

The relative references should change as you copy down, so the second row should be, for example:

=(MATCH(B3,B$2:B3,0)-(ROW(B3)-ROW(B$2)))>0

This returns TRUE if the current row contains the first instance of a string. With an exact match (match type 0), MATCH returns the position of the first occurrence within the range so far, and that position exceeds the row offset only when the first occurrence is the current row. Therefore, if you count the number of TRUEs with COUNTIF(), you should get the number of distinct strings.
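If it helps to see the fill-down logic in one place, here is a Python sketch of the same idea. It assumes an exact-match lookup (match type 0) over the range from the first row down to the current row, and the function name `first_occurrence_flags` is hypothetical.

```python
def first_occurrence_flags(values):
    """TRUE/FALSE per row: is this row the first instance of its value?"""
    flags = []
    for row, value in enumerate(values):
        # Exact-match position (1-based) within the rows seen so far.
        match_pos = values[: row + 1].index(value) + 1
        # Mirrors (MATCH(...) - (ROW(Bn) - ROW(B$2))) > 0, where row is the offset.
        flags.append(match_pos - row > 0)
    return flags


data = ["A", "B", "C", "D", "A", "B", "E", "C"]
flags = first_occurrence_flags(data)
print(flags)       # [True, True, True, True, False, False, True, False]
print(sum(flags))  # 5
```

The final sum plays the role of the COUNTIF() over the helper column.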

Joel Goodwin
That's what I thought the FREQUENCY function was doing, effectively ignoring duplicated bins. I was hoping that someone could say for certain that this is the documented behaviour of the function, but it looks like it's a well-known quirk. Thanks for the answer!
Taras
A: 

You could use a VBA routine:

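Sub Uniques()

    Dim rng As Range
    Dim c As Range
    Dim clnUnique As New Collection

    ' Range holding the values to count; adjust to suit your sheet
    Set rng = Range("A1:A8")

    ' Adding a duplicate key raises an error, so suppress errors and
    ' let the collection keep only the first occurrence of each value.
    ' Note that Collection keys are not case-sensitive.
    On Error Resume Next
    For Each c In rng
        clnUnique.Add c.Value, CStr(c.Value)
    Next c
    On Error GoTo 0

    MsgBox "Number of unique values = " & clnUnique.Count

End Sub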

If you need to display the unique results, you can just loop through the collection and write the values on your worksheet.
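For comparison with the formula approaches, here is the same keyed-collection idea sketched in Python, including the loop over the unique values. The names `unique_values` and `cells` are made up for illustration, and the case folding is there to mimic VBA Collection keys, which ignore case.

```python
def unique_values(values):
    """Keep the first value seen for each key, in original order."""
    seen = {}
    for value in values:
        key = str(value).lower()  # Collection keys ignore case
        # setdefault keeps the existing entry, like Add failing on a duplicate key.
        seen.setdefault(key, value)
    return list(seen.values())


cells = ["A", "B", "C", "D", "A", "B", "E", "C"]
uniques = unique_values(cells)
print("Number of unique values =", len(uniques))  # Number of unique values = 5
for value in uniques:
    print(value)
```

The values come out in the order they first appear, which is also the order you would get when looping through the collection.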

Hey Dendarii, I was actually asking how the FREQUENCY function worked. Your routine is appreciated though!
Taras
You're quite right! Looks like Joel gave the answer to your actual question :)