ansaurus

Question

Matlab test of independence

Answer 1

A:

The Statistics Toolbox will contain the functions you want.

EDIT: Or it doesn't

Here is someone who did it on their own. Chi^2 independence test

James 2010-07-28 18:31:15

@James...which function in the statistic toolbox? There are multiple chi square functions in there

Elpezmuerto 2010-07-28 18:33:11

Gah, doesn't look like they have that function in there. They only have goodness of fit, maybe i was thinking of maple...

James 2010-07-28 18:47:14

@James, Ya I looked in there before I posted and I am kinda surprised Matlab doesn't have anything, hence why I am here :)

Elpezmuerto 2010-07-28 18:48:50

For reference, the statistics package in Maple is probably easier to work with.

James 2010-07-28 18:48:58

@James, I went to the posted webpage too but the link where he posts the code is down. I have to use Matlab :p

Elpezmuerto 2010-07-28 18:50:39

@Elpezmuerto If i had the time i'd crack out my probability text and write you one, unfortunately i'm rather tied up at the moment. I'd say look around on the net, someones gotta have one.

James 2010-07-28 19:16:43

@James I spent a good while looking online and got nothing useful. I really hate to use stack overflow like this, but I am stuck.

Elpezmuerto 2010-07-28 19:19:25

@James I grabbed a stat book and wrote the code, thanks

Elpezmuerto 2010-07-28 20:09:23

Answer 2

+7 A:

Here is my own implementation that I use:

function [hNull pValue X2] = ChiSquareTest(o, alpha)
    %#  CHISQUARETEST  Pearson's Chi-Square test of independence
    %#
    %#    @param o          The Contignecy Table of the joint frequencies
    %#                      of the two events (attributes)
    %#    @param alpha      Significance level for the test
    %#    @return hNull     hNull = 1: null hypothesis accepted (independent)
    %#                      hNull = 0: null hypothesis rejected (dependent)
    %#    @return pValue    The p-value of the test (the prob of obtaining
    %#                      the observed frequencies under hNull)
    %#    @return X2        The value for the chi square statistic
    %#

    %# o:     observed frequency
    %# e:     expected frequency
    %# dof:   degree of freedom

    [r c] = size(o);
    dof = (r-1)*(c-1);

    %# e = (count(A=ai)*count(B=bi)) / N
    e = sum(o,2)*sum(o,1) / sum(o(:));

    %# [ sum_r [ sum_c ((o_ij-e_ij)^2/e_ij) ] ]
    X2 = sum(sum( (o-e).^2 ./ e ));

    %# p-value needed to reject hNull at the significance level with dof
    pValue = 1 - chi2cdf(X2, dof);
    hNull = (pValue > alpha);

    %# X2 value needed to reject hNull at the significance level with dof
    %#X2table = chi2inv(1-alpha, dof);
    %#hNull = (X2table > X2);

end

And an example to illustrate:

t = [3 999997 ; 10 999990]
[hNull pVal X2] = ChiSquareTest(t, 0.05)

hNull =
     1
pVal =
     0.052203
X2 =
       3.7693

Note that the results are different from yours because chisq.test performs a correction by default, according to ?chisq.test

correct: a logical indicating whether to apply continuity correction when computing the test statistic for 2x2 tables: one half is subtracted from all |O - E| differences.

Alternatively if you have the actual observations of the two events in question, you can use the CROSSTAB function which computes the contingency table and return the Chi2 and p-value measures:

X = randi([1 2],[1000 1]);
Y = randi([1 2],[1000 1]);
[t X2 pVal] = crosstab(X,Y)

t =
   229   247
   257   267
X2 =
     0.087581
pVal =
      0.76728

the equivalent in R would be:

chisq.test(X, Y, correct = FALSE)

Note: Both (MATLAB) approaches above require the Statistics Toolbox

Amro 2010-07-28 20:06:04

Ah, ninja'd. +1 for the code!

Jonas 2010-07-28 20:07:38

@Amro, How would you implement the `correct = true` for matlab?

Elpezmuerto 2010-07-28 20:29:12

well according to the R documentation just subtract half from |O-E|, so use the following instead: `X2 = sum(sum( (abs(o-e)-0.5).^2 ./ e ));` but you will have to manually check that this correction is only applied for 2x2 tables: http://en.wikipedia.org/wiki/Yates%27_correction_for_continuity

Amro 2010-07-28 20:37:33

@Amro...High Five! The correct works!

Elpezmuerto 2010-07-28 20:53:10

To calculate the alternative likelihood-ratio G2 statistic, use G2 = 2*sum(sum(o .* log(o./e)));

Elpezmuerto 2010-08-25 15:37:23

@Elpezmuerto: thanks for the tip, for those interested here's the Wikipedia entry on G-test: http://en.wikipedia.org/wiki/G-test

Amro 2010-08-25 21:47:38

Answer 3

A:

This function will test for independence using the Pearson chi-squared statistic and the Likelihood-Ratio statistic, along with calculating residuals. I know this can be vectorized further, but I am trying to show the math for each step.

function independenceTest(data)
df = (size(data,1)-1)*(size(data,2)-1); % Mean Degrees of Freedom
sd = sqrt(2*df);                        % Standard Deviation

u         = nan(size(data)); % Estimated expected frequencies 
p         = nan(size(data)); % Values used to calculate chi-square
lr        = nan(size(data)); % Values used to calculate likelihood-ratio
residuals = nan(size(data)); % Residuals

rowTotals    = sum(data,1);
colTotals    = sum(data,2);
overallTotal = sum(rowTotals);

%% Calculate estimated expected frequencies
for r=1:1:size(data,1)
    for c=1:1:size(data,2)
        u(r,c) = (rowTotals(c) * colTotals(r)) / overallTotal;
    end
end

%% Calculate chi-squared statistic
for r=1:1:size(data,1)
    for c=1:1:size(data,2)
        p(r,c) = (data(r,c) - u(r,c))^2 / u(r,c);
    end
end
chi = sum(sum(p)); % Chi-square statistic

%% Calculate likelihood-ratio statistic
for r=1:1:size(data,1)
    for c=1:1:size(data,2)
        lr(r,c) = data(r,c) * log(data(r,c) / u(r,c));
    end
end
G = 2 * sum(sum(lr)); % Likelihood-Ratio statisitc

%% Calculate residuals
for r=1:1:size(data,1)
    for c=1:1:size(data,2)
        numerator   = data(r,c) - u(r,c);
        denominator = sqrt(u(r,c) * (1 - colTotals(r)/overallTotal) * (1 - rowTotals(c)/overallTotal));
        residuals(r,c) = numerator / denominator;
    end
end

Elpezmuerto 2010-07-28 20:17:36

Check out @Amro's code. He does the same calculations without looping, and thus more concisely.

Jonas 2010-07-28 20:19:57

ansaurus

tags:

views:

answers:

Matlab test of independence

related questions