CHISQD: help and advice

Consult the reference manual for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.

The chi-square distribution

If Y is a normal random variable with mean mu and standard deviation sigma, then the transformed random variable

     Z = (Y - mu)/sigma
is unit normal, i.e. Z is Normal(0,1). The sum of the squares of N independent Z values of this kind, such as
     X = Z1^2 + Z2^2 + ... + ZN^2,
is chi-square distributed with N degrees of freedom, i.e. X is distributed chi^2(N).
Clearly N >= 1 and X >= 0.
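
The construction above can be checked by simulation. The following is only a minimal sketch in Python, not part of the program: the function name simulate_chi_square, the seed and the sample size are assumptions chosen for illustration. It builds X as a sum of N squared Normal(0,1) values and compares the sample mean with N, which is the mean of a chi^2(N) variable.

```python
import random

def simulate_chi_square(n_df, n_samples, seed=0):
    """Return n_samples simulated values of X = Z1^2 + ... + ZN^2."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    samples = []
    for _ in range(n_samples):
        # Each Z is Normal(0,1); X is the sum of N squared Z values.
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_df))
        samples.append(x)
    return samples

samples = simulate_chi_square(n_df=3, n_samples=20000)
sample_mean = sum(samples) / len(samples)
print(f"sample mean    = {sample_mean:.3f} (theory: N = 3)")
print(f"smallest value = {min(samples):.5f} (X is never negative)")
```

The sample mean should be close to N, and no simulated value can be negative, in line with X >= 0.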
If the probability density function is pdf(x), then the cumulative distribution function cdf(x) is the integral of pdf(u) from u = 0 to u = x.
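
The relationship between pdf(x) and cdf(x) can be illustrated numerically. This is a rough Python sketch under stated assumptions, not the method used by the program: the names chi2_pdf and chi2_cdf_midpoint are hypothetical, and the midpoint rule is used only because it follows the integral definition directly. It is adequate for N >= 2, but converges slowly for N = 1, where the density is unbounded at x = 0.

```python
import math

def chi2_pdf(x, n_df):
    """Density of chi^2(N) at x; returns 0 for x <= 0 (a simplification at x = 0)."""
    if x <= 0.0:
        return 0.0
    half = 0.5 * n_df
    # Work with logarithms to avoid overflow for large N.
    log_pdf = ((half - 1.0) * math.log(x) - 0.5 * x
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)

def chi2_cdf_midpoint(x, n_df, steps=20000):
    """Approximate cdf(x) as the integral of pdf(u) from u = 0 to u = x."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    # Midpoint rule: evaluate the density at the centre of each subinterval.
    return h * sum(chi2_pdf((i + 0.5) * h, n_df) for i in range(steps))

n_df = 4
x = 4.0
exact = 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)  # closed form valid for N = 4 only
print(f"pdf({x}) = {chi2_pdf(x, n_df):.6f}")
print(f"cdf({x}) = {chi2_cdf_midpoint(x, n_df):.6f} (closed form: {exact:.6f})")
```

For N = 4 the cdf has the closed form 1 - exp(-x/2)(1 + x/2), which gives an independent check on the numerical integral.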

Upper and lower tail probabilities are defined for a given alpha (0 =< alpha =< 1) and a corresponding x-value (say x = x-critical) by

     lower tail probability = P(X =< x-critical) = 1 - alpha
     upper tail probability = P(X >= x-critical) = alpha
where P(E) = probability of event E in the sample space. Sometimes percentage points 100(1 - alpha)% or 100*alpha% are preferred.

Option 1. Initialising the degrees of freedom
The value of N that you input is used for all subsequent calculations, except for the chi-square and Fisher exact tests (Options 6 and 7). Select Option 1 again to change N.

Option 2. Calculating point probabilities
You input x-values and the program calculates probability density functions, i.e. pdf(x) values.

Option 3. Calculating cumulative probabilities
You input x-values and the program calculates cumulative distribution functions, i.e. cdf(x) values.

Option 4. Calculating critical values
You input significance levels alpha and obtain x-critical values such that P(X >= x-critical) = alpha, i.e. x-critical is the inverse of the chi-square cdf evaluated at 1 - alpha.
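
One way to obtain such a critical value is to solve cdf(x) = 1 - alpha numerically. The Python sketch below is an illustration only and is not claimed to be the algorithm used by the program: the names chi2_cdf and chi2_critical are hypothetical, the cdf is evaluated by the standard power series for the regularized lower incomplete gamma function, and the root is found by bisection. It is intended for moderate values of alpha and N.

```python
import math

def chi2_cdf(x, n_df, tol=1e-14, max_terms=10000):
    """cdf(x) from the series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * n_df, 0.5 * x
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)  # next term z^n / (a (a+1) ... (a+n))
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def chi2_critical(alpha, n_df, tol=1e-10):
    """Return x-critical with P(X >= x-critical) = alpha, for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n_df) < target:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:  # bisection works because the cdf is increasing in x
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n_df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n_df in (1, 2, 10):
    print(f"N = {n_df:2d}, alpha = 0.05: x-critical = {chi2_critical(0.05, n_df):.4f}")
```

For N = 2 the result can be checked against the exact value -2 ln(alpha), since in that case cdf(x) = 1 - exp(-x/2).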

Option 5. Test for a chi-square distribution
You input measured values V1, V2, ..., VM and the program calculates the transforms W1, W2, ..., WM, where Wi = cdf(Vi), under the null hypothesis that V is distributed chi^2(N). Kolmogorov-Smirnov and chi-square tests are then performed on the transforms to test the hypothesis that W is uniformly distributed on the interval (0,1), which is equivalent to the null hypothesis.
Use MAKMAT/EDITMT to prepare/edit this V data vector.
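
The transform step and the Kolmogorov-Smirnov statistic can be sketched as follows. This Python fragment is an illustration under assumptions and does not reproduce the program: the function names are hypothetical, the V values are made up, and the closed-form cdf used here holds only for N = 2. It computes the statistic D only, not the associated significance level or the chi-square test on the transforms.

```python
import math

def ks_statistic_uniform(w_values):
    """Kolmogorov-Smirnov D for testing W against the uniform distribution on (0,1)."""
    w = sorted(w_values)
    m = len(w)
    d = 0.0
    for i, wi in enumerate(w, start=1):
        # Compare the empirical cdf just after and just before wi with the uniform cdf.
        d = max(d, i / m - wi, wi - (i - 1) / m)
    return d

def cdf_chi2_two_df(v):
    """cdf of chi^2(2), i.e. 1 - exp(-v/2); this closed form is valid only for N = 2."""
    return 1.0 - math.exp(-0.5 * v) if v > 0.0 else 0.0

v_values = [0.3, 0.9, 1.6, 2.8, 5.1]  # made-up measurements, not program data
w_values = [cdf_chi2_two_df(v) for v in v_values]  # Wi = cdf(Vi)
d_stat = ks_statistic_uniform(w_values)
print("W =", [round(w, 4) for w in w_values])
print(f"D = {d_stat:.4f}")
```

A small D indicates that the transforms are spread evenly over (0,1), as expected when V is distributed chi^2(N).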

Option 6. Chi-square test for observed and expected frequencies
Suppose you have observed M frequencies O1, O2, ..., OM and, for the same M partitions (bins), you have calculated the expected frequencies E1, E2, ..., EM. Then S defined by

     S = (O1 - E1)^2/E1 + (O2 - E2)^2/E2 + ... + (OM - EM)^2/EM
     
is a measure of goodness of fit between the O and E values. In fact, if k parameters are estimated from the O values and used to calculate the E values, then S is approximately chi-square distributed with M - k - 1 degrees of freedom.
Usually the sample size should be at least 4 or 5 times the number of bins, and the E values should not be too small (a common rule of thumb is E >= 5).
Use MAKMAT/EDITMT to prepare/edit vectors of observed values and vectors of expected values for input to Option 6. The number of observed values must equal the number of expected values, and all expected values must be nonzero.
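
The calculation of S and its degrees of freedom can be sketched in Python as follows. This is a minimal illustration, not the program's own code: the function name chi_square_statistic, the parameter k_estimated and the data values are assumptions made for the example. The two input checks mirror the requirements stated above.

```python
def chi_square_statistic(observed, expected, k_estimated=0):
    """Return (S, degrees of freedom) for observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e == 0 for e in expected):
        raise ValueError("expected values must be nonzero")
    # S = sum over bins of (O - E)^2 / E
    s = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # M - k - 1 degrees of freedom, where k parameters were estimated from the data
    df = len(observed) - k_estimated - 1
    return s, df

observed = [18, 22, 20]        # made-up frequencies for illustration
expected = [20.0, 20.0, 20.0]
s, df = chi_square_statistic(observed, expected)
print(f"S = {s:.4f} with {df} degrees of freedom")
```

Here S = 4/20 + 4/20 + 0 = 0.4 with 3 - 0 - 1 = 2 degrees of freedom.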

Option 7. Analysis of contingency tables
A contingency table is an M x N matrix of frequencies F(i,j) which can be tested for association using a chi-square test with (M - 1)(N - 1) degrees of freedom. The input data frequency matrix should be prepared in a file with a title, then a header with the number of rows and columns, followed by the matrix of frequencies. Programs MAKMAT/EDITMT should be used to prepare/edit such files.

This program first produces a reduced matrix (by discarding rows or columns with zero sums). Then it attempts to shrink further until all expected frequencies are >= 1 if possible.
There are then just two cases.

  1. The dimension of the reduced matrix is 2 by 2 and the sum of all frequencies is =< 40. The Fisher exact test gives p(i) values and a chi-square test is also done.
  2. Otherwise, only a chi-square test is performed.
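
The main steps of a contingency table analysis can be sketched in Python as follows. This is an illustration under assumptions and is not the program's implementation: the function names and data are hypothetical, the uncorrected Pearson statistic is used (whether the program applies a continuity correction is not stated here), the further shrinking step for small expected frequencies is not reproduced, and fisher_point_probability returns only the hypergeometric probability of a single 2 by 2 table with fixed margins, which is the building block of the Fisher exact test.

```python
from math import comb

def reduce_table(table):
    """Discard rows and columns whose sums are zero."""
    rows = [row for row in table if sum(row) > 0]
    if not rows:
        return []
    keep = [j for j in range(len(rows[0])) if sum(row[j] for row in rows) > 0]
    return [[row[j] for j in keep] for row in rows]

def chi_square_contingency(table):
    """Return (S, degrees of freedom) for an M x N table of frequencies."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    s = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_sums[i] * col_sums[j] / total  # expected frequency if no association
            s += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return s, df

def fisher_point_probability(a, b, c, d):
    """Hypergeometric probability of the 2 by 2 table [[a, b], [c, d]] with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

raw = [[10, 20], [0, 0], [20, 10]]  # made-up frequencies with one empty row
reduced = reduce_table(raw)
s, df = chi_square_contingency(reduced)
print("reduced table:", reduced)
print(f"S = {s:.4f} with {df} degree(s) of freedom")
print(f"point probability of [[3, 1], [1, 3]] = {fisher_point_probability(3, 1, 1, 3):.6f}")
```

In this example every expected frequency is 15, so S = 4 x 25/15, and the point probability is 16/70.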

Test files to use with this program

chisqd.tf1
Use option 1 to set the number of degrees of freedom equal to 10, then read these pseudo-random numbers into option 5 and see if they are consistent with a chi-square distribution.
chisqd.tf2 and chisqd.tf3
These are columns of observed and expected values that can be used to see how option 6 tests whether a set of observed values is consistent with a corresponding expected set.
chisqd.tf4
This is an example of a data set that can be used with option 7 to perform a contingency table analysis by the chi-square and Fisher exact procedures.
chisqd.tf5
Another contingency table for option 7 but now there are too many elements for a Fisher exact test.