ansaurus

Question

Measures of association in R -- Kendall's tau-b and tau-c

Answer 1

+5 A:

Have you tried the function cor()? Its method argument can be set to "kendall" (there are also options for "pearson" and "spearman" if needed). I'm not sure it covers all the standard errors you are looking for, but it should get you started.
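Here is a small sketch of what that looks like. It uses the built-in mtcars data purely as an illustration, and the choice of columns is arbitrary. cor() accepts either two vectors or a data frame. With a data frame it returns a matrix of pairwise coefficients.

```r
# Compare the three methods available in base R's cor() on two mtcars columns
x <- mtcars$mpg
y <- mtcars$hp

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")
print(c(pearson = pearson, spearman = spearman, kendall = kendall))

# Given a data frame, cor() returns the full matrix of pairwise coefficients
kendall_matrix <- cor(mtcars[, c("mpg", "hp", "wt")], method = "kendall")
print(round(kendall_matrix, 4))
```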

Stedy 2010-04-01 04:06:45

Answer 2

+5 A:

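Just to expand on Stedy's answer: cor(x, y, method = "kendall") will give you the correlation, and cor.test(x, y, method = "kendall") will give you a p-value. Note that cor.test() only returns a confidence interval for method = "pearson", not for Kendall's tau.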
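This minimal sketch pulls those pieces out of the object returned by cor.test(), again using the built-in mtcars data as an arbitrary example. Because mpg and hp contain tied values, it passes exact = FALSE. That requests the normal approximation explicitly instead of leaving R to fall back to it with a warning.

```r
# Kendall's tau with a significance test on two mtcars columns
x <- mtcars$mpg
y <- mtcars$hp

# exact = FALSE: the data contain ties, so use the normal approximation
ct <- cor.test(x, y, method = "kendall", exact = FALSE)

tau   <- unname(ct$estimate)   # the same value cor(x, y, method = "kendall") gives
z     <- unname(ct$statistic)  # z statistic of the normal approximation
p_val <- ct$p.value

print(c(tau = tau, z = z, p.value = p_val))

# No confidence interval is included in the result for Kendall's tau
print(is.null(ct$conf.int))
```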

Also, take a look at the Kendall package, whose Kendall() function claims a better approximation for the p-value.

> library(Kendall)
> Kendall(x,y)

There is also the cor.matrix function in the Deducer package for nice printing:

> library(Deducer)
> cor.matrix(variables=d(mpg,hp,wt),,
+ data=mtcars,
+ test=cor.test,
+ method='kendall',
+ alternative="two.sided",exact=F)

                          Kendall's rank correlation tau                          

           mpg     hp      wt     
mpg    cor 1       -0.7428 -0.7278
         N 32      32      32     
    stat**         -5.871  -5.798 
   p-value         0.0000  0.0000 
----------                        
 hp    cor -0.7428 1       0.6113 
         N 32      32      32     
    stat** -5.871          4.845  
   p-value 0.0000          0.0000 
----------                        
 wt    cor -0.7278 0.6113  1      
         N 32      32      32     
    stat** -5.798  4.845          
   p-value 0.0000  0.0000         
----------                        
    ** z
    HA: two.sided

Ian Fellows 2010-04-01 06:07:25

Answer 3

A:

There's a routine for Kendall's coefficient in the psych package: corr.test(x, method = "kendall"). This function can be applied to a data.frame, and it also displays p-values for each pair of variables. I guess it displays the tau-a coefficient. The only downside is that it's actually a wrapper for the cor() function.

Wikipedia has a good reference on Kendall's coefficient, and check this link out. Also try the sos package and its findFn() function. I got a bunch of results when querying "tau a" and "tau b", but both searches ended with no luck. The results seem to converge on the Kendall package, as @Ian suggested.

aL3xa 2010-04-01 11:42:25

Answer 4

+7 A:

There are three Kendall tau statistics (tau-a, tau-b, and tau-c) and they're not interchangeable. None of the answers posted so far deals with the last two, which, again, are the subject of the OP's question.

I was unable to find functions to calculate tau-b or tau-c, either in the standard library (stats et al.) or in any of the packages, so that's the short answer to the OP's question.

In any event, it's easy to roll your own.

Writing R functions for the Kendall statistics is just a matter of translating these equations into code:

Kendall_tau_a = (P - Q) / (n*(n-1)/2)

Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5 

Kendall_tau_c = (P - Q) * (2*m) / (n^2 * (m-1))

tau-a: equal to concordant minus discordant pairs, divided by the total number of pairs, n(n-1)/2, so the normalization depends only on the sample size.

tau-b: explicitly accounts for ties, i.e., pairs in which both members have the same value on x, or on y. It is equal to concordant minus discordant pairs, divided by the geometric mean of the number of pairs not tied on x (P + Q + Y0) and the number of pairs not tied on y (P + Q + X0). Here X0 is the number of pairs tied on x only, and Y0 is the number tied on y only.

tau-c: a variant suited to larger and non-square (r x c) tables. It is equal to concordant minus discordant pairs, multiplied by a factor that adjusts for table size, 2m / (n^2 (m-1)).
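To make the three definitions concrete, here is a minimal brute-force sketch. It counts P, Q, X0 and Y0 directly from two paired vectors by looking at every pair of observations. The helper names kendall_counts() and kendall_taus() are hypothetical, and the O(n^2) approach is only meant for checking small examples. For tau-c, m is taken as the smaller number of distinct values, which is the smaller dimension of the corresponding contingency table. The check compares tau-b with cor(method = "kendall"), whose tie handling, as far as I know, uses this same denominator.

```r
# Hypothetical helper: classify every pair of observations (i < j)
kendall_counts <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  idx <- combn(length(x), 2)               # 2 x choose(n, 2) matrix of index pairs
  dx <- sign(x[idx[1, ]] - x[idx[2, ]])
  dy <- sign(y[idx[1, ]] - y[idx[2, ]])
  list(
    P  = sum(dx * dy > 0),                 # concordant pairs
    Q  = sum(dx * dy < 0),                 # discordant pairs
    X0 = sum(dx == 0 & dy != 0),           # tied on x only
    Y0 = sum(dx != 0 & dy == 0),           # tied on y only
    n  = length(x),                        # number of observations
    m  = min(length(unique(x)), length(unique(y)))  # smaller table dimension
  )
}

# Hypothetical helper: the three formulas above, applied to those counts
kendall_taus <- function(x, y) {
  k <- kendall_counts(x, y)
  with(k, c(
    tau_a = (P - Q) / (n * (n - 1) / 2),
    tau_b = (P - Q) / sqrt((P + Q + Y0) * (P + Q + X0)),
    tau_c = 2 * m * (P - Q) / (n^2 * (m - 1))
  ))
}

x <- c(1, 2, 2, 3, 4, 4)
y <- c(1, 3, 2, 2, 4, 5)
print(kendall_taus(x, y))
print(cor(x, y, method = "kendall"))   # cross-check for tau_b
```

Without ties, tau-a and tau-b coincide. In the tied example above, they differ.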

# number of concordant pairs 
P = function(t) {   
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])}, r = r_ndx, c = c_ndx))}

# number of discordant pairs
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx < c)])}, r = r_ndx, c = c_ndx))}

# sample size (total number of observations, not pairs)
n = sum(t)

# the lesser of number of rows or columns
m = min(dim(t))

These four quantities, P, Q, m and n (plus X0 and Y0 for tau-b), are all you need to calculate tau-a, tau-b, and tau-c.

For instance, the code for tau-c is:

kendall_tau_c = function(t){
  t = as.matrix(t)
  m = min(dim(t))
  n = sum(t)
  ks_tauc = (m*2 * (P(t)-Q(t))) / ((n^2)*(m-1))
  ks_tauc
}
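The answer gives code only for tau-c. Along the same lines, here is a sketch of tau-a and tau-b computed from a contingency table. It repeats the P and Q functions so that it runs on its own. X0(), Y0(), kendall_tau_a() and kendall_tau_b() are hypothetical names. X0 and Y0 are obtained by counting the pairs that share a row (or a column) and subtracting the pairs that share a cell. The usage lines compare tau-b with cor(method = "kendall") on the raw data. As far as I know, that function applies the same tie correction.

```r
# P (concordant) and Q (discordant) pairs, repeated from the answer so this runs alone
P = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx > c)]), r = r_ndx, c = c_ndx))
}
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx < c)]), r = r_ndx, c = c_ndx))
}

# Hypothetical helpers: X0 = pairs tied on the row variable (x) only,
# Y0 = pairs tied on the column variable (y) only
X0 = function(t) sum(choose(rowSums(t), 2)) - sum(choose(t, 2))
Y0 = function(t) sum(choose(colSums(t), 2)) - sum(choose(t, 2))

kendall_tau_a = function(t) {
  t = as.matrix(t)
  n = sum(t)
  (P(t) - Q(t)) / (n * (n - 1) / 2)
}

kendall_tau_b = function(t) {
  t = as.matrix(t)
  p = P(t)
  q = Q(t)
  (p - q) / sqrt((p + q + Y0(t)) * (p + q + X0(t)))
}

# usage: rows of the table are x, columns are y
x = c(1, 2, 2, 3, 3, 3, 1, 2)
y = c(1, 1, 2, 2, 3, 3, 2, 3)
tab = table(x, y)
print(kendall_tau_a(tab))
print(kendall_tau_b(tab))
print(cor(x, y, method = "kendall"))   # cross-check for tau-b
```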

(W/r/t where Kendall's tau statistics fit among the other tests used in categorical data analysis: all three Kendall tau statistics, along with Goodman and Kruskal's gamma, measure association for ordinal and binary data. So they are counterparts to the simple chi-square and Fisher's exact tests, both of which are (as far as I know) suitable only for nominal data. As I just learned, the Kendall tau statistics are more sophisticated alternatives to the gamma statistic, (P - Q)/(P + Q), which is what I had been using until I read this question and researched the subject.)
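For comparison, gamma uses the same two counts but ignores tied pairs altogether. The sketch below is a minimal version for an ordered r x c table, with plain loops instead of mapply. The name goodman_kruskal_gamma() is hypothetical. For a 2 x 2 table, gamma reduces to Yule's Q, (ad - bc)/(ad + bc), which makes a convenient check.

```r
# Hypothetical helper: Goodman and Kruskal's gamma, (P - Q) / (P + Q)
goodman_kruskal_gamma = function(t) {
  t = as.matrix(t)
  P = 0  # concordant pairs
  Q = 0  # discordant pairs
  for (i in seq_len(nrow(t))) {
    for (j in seq_len(ncol(t))) {
      P = P + t[i, j] * sum(t[row(t) > i & col(t) > j])  # cells below and to the right
      Q = Q + t[i, j] * sum(t[row(t) > i & col(t) < j])  # cells below and to the left
    }
  }
  (P - Q) / (P + Q)
}

# 2 x 2 example, filled column-wise: a = 10, c = 5, b = 3, d = 12
tab = matrix(c(10, 5, 3, 12), nrow = 2)
g = goodman_kruskal_gamma(tab)
print(g)
```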

example:

cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)

dfx = data.frame(CPA=cpa_group, LCV=revenue_per_customer_group, freq=weight)

# reshape data frame so 1 row for each event 
# (preliminary step to create the contingency table)
dfx2 = data.frame( lapply(dfx, function(x){rep(x, dfx$freq)}))

t = xtabs(~ LCV + CPA, dfx2)

kc = kendall_tau_c(t)

# returns about -0.53 (tabulating the unweighted dfx instead gives about -0.35)

doug 2010-04-02 16:41:58
