BINOMIAL: help and advice

Consult the reference manual for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.

The binomial distribution

If there are x successes in N (independent) Bernoulli trials, each with probability of success equal to p, the random variable X is said to be distributed as a binomial random variable with parameters N and p, i.e. X is distributed b(N,p).
Clearly N >= 1, 0 =< p =< 1 and 0 =< X =< N.
If the probability mass function is pmf(x), then the cumulative distribution function cdf(x) is the sum of pmf(u) for u = 0 to u = x.

Upper and lower tail probabilities are defined for an alpha (0 =< alpha =< 1) and some X-value (say X = x-critical) by

     lower tail probability = P(X =< x-critical) = 1 - alpha
     upper tail probability = P(X > x-critical) = alpha,
where P(E) = probability of event E in the sample space. Note that, since X is an integer, the cdf is a step function. Sometimes percentage points 100(1 - alpha)% or 100alpha% are preferred.
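
The definitions above can be illustrated by a minimal Python sketch. It is not the code used by this program; the function names pmf, cdf and upper_tail are chosen here for illustration only, and only the Python standard library is assumed.

```python
from math import comb

def pmf(x, n, p):
    """P(X = x) for X distributed b(n, p)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def cdf(x, n, p):
    """Lower tail P(X =< x): the sum of pmf(u) for u = 0 to u = x."""
    return sum(pmf(u, n, p) for u in range(0, min(x, n) + 1))

def upper_tail(x, n, p):
    """Upper tail P(X > x) = 1 - cdf(x)."""
    return 1.0 - cdf(x, n, p)

n, p = 10, 0.5
print(pmf(5, n, p))         # 252/1024 = 0.24609375
print(cdf(5, n, p))         # 638/1024 = 0.623046875
print(upper_tail(5, n, p))  # 386/1024 = 0.376953125
```

Because the cdf is a step function, cdf(x) only changes at integer x, so a given alpha will not usually be attained exactly.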

Option 1. Initialising N, p, lambda
This is selected to initialise or change N, p, or lambda. You input N and p for all subsequent calculations except options 5, 6, 7 and 8 where N and x are input directly.

Option 2. Calculating point probabilities
You input x-values and the program calculates probability mass functions, i.e. pmf(x) values.

Option 3. Calculating cumulative probabilities
You input x-values and the program calculates cumulative distribution functions, i.e. cdf(x) values.

Option 4. Calculating critical values
You input significance levels alpha and obtain A,B-values where P(X > A) >= alpha, P(X > B) =< alpha (if possible), so that A and B define inverses of the binomial distribution.
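
One possible reading of this definition is sketched below in Python, as an illustration rather than the program's own algorithm. Here B is taken as the smallest integer with P(X > B) =< alpha and A = B - 1, so that P(X > A) > alpha; when B = 0 no such A exists and None is returned. The names upper_tail and critical_values are hypothetical.

```python
from math import comb

def upper_tail(x, n, p):
    """P(X > x) for X distributed b(n, p), summed directly over u > x."""
    return sum(comb(n, u) * p**u * (1.0 - p)**(n - u)
               for u in range(x + 1, n + 1))

def critical_values(alpha, n, p):
    """Return (A, B) with P(X > A) > alpha and P(X > B) =< alpha.

    Assumption: B is the smallest such integer and A = B - 1.
    A is None when B = 0, i.e. when no valid A exists.
    """
    for b in range(0, n + 1):
        # P(X > n) = 0, so the condition always holds by b = n.
        if upper_tail(b, n, p) <= alpha:
            a = b - 1 if b > 0 else None
            return a, b

print(critical_values(0.05, 10, 0.5))  # (7, 8)
```

For N = 10 and p = 0.5, P(X > 7) = 56/1024 exceeds 0.05 while P(X > 8) = 11/1024 does not, which is why the pair (7, 8) brackets alpha = 0.05.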

Option 5. Calculating binomial coefficients
The binomial coefficient N-choose-X or NCX(x) is defined as

           NCX(x) = N!/[x!(N - x)!] for 0 =< x =< N
         where N! = 1*2*3*...*N
           NCX(0) = NCX(N) = 1
       and pmf(x) = NCX(x)*[p^x]*[(1 - p)^(N - x)].
     
This program calculates NCX(x) and the sum of NCX(k) for 0 =< k =< x and any x =< N.
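
As an illustration of these formulas (not the program's own code), the following Python sketch evaluates NCX(x) from the factorial definition and forms the cumulative sum. The names ncx and ncx_sum are chosen here for illustration; math.comb is used only as an independent check.

```python
from math import comb, factorial

def ncx(n, x):
    """Binomial coefficient N!/[x!(N - x)!] for 0 =< x =< N."""
    if x < 0 or x > n:
        raise ValueError("x must satisfy 0 =< x =< N")
    return factorial(n) // (factorial(x) * factorial(n - x))

def ncx_sum(n, x):
    """Sum of NCX(k) for 0 =< k =< x."""
    return sum(ncx(n, k) for k in range(x + 1))

print(ncx(10, 3))       # 120
print(ncx_sum(10, 3))   # 1 + 10 + 45 + 120 = 176
print(ncx_sum(10, 10))  # 2^10 = 1024
print(ncx(10, 3) == comb(10, 3))  # True
```

Integer arithmetic is used throughout, so the results are exact for any N that Python can hold in memory.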

Option 6. Estimating p with confidence limits
This is for when you have x events (successes) in N trials. You supply pairs of N and x values and the program will then estimate p with an exact non-central 95% confidence range. Alternatively you can select 90% or 99% ranges.
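
A minimal sketch of one common way to obtain exact limits is given below. It assumes the Clopper-Pearson construction with equal tail areas, solved by bisection on the binomial cdf; the method actually used by this program may differ, and the names cdf, bisect_root and exact_limits are hypothetical.

```python
from math import comb

def cdf(x, n, p):
    """P(X =< x) for X distributed b(n, p)."""
    return sum(comb(n, u) * p**u * (1.0 - p)**(n - u) for u in range(x + 1))

def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_limits(x, n, level=0.95):
    """Return (p-hat, lower, upper) assuming equal-tail Clopper-Pearson limits."""
    tail = (1.0 - level) / 2.0
    # Lower limit: the p for which P(X >= x) equals the tail area.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1.0 - cdf(x - 1, n, p)) - tail, 0.0, 1.0)
    # Upper limit: the p for which P(X =< x) equals the tail area.
    upper = 1.0 if x == n else bisect_root(
        lambda p: cdf(x, n, p) - tail, 0.0, 1.0)
    return x / n, lower, upper

print(exact_limits(5, 10))        # 95% limits
print(exact_limits(5, 10, 0.99))  # 99% limits
```

With x = 0 the upper limit has the closed form 1 - (tail area)^(1/N), which gives a simple check on the bisection.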

Option 7. Estimating parameters
You input measured values x1, x2, ..., xM and the program calculates a sample estimate for p (i.e. p-hat) and performs a chi-square test according to the null hypothesis

        H0: X is a binomial random variable with parameters N and p
        
(or else N and p-hat). You can decide how many partitions or bins are used in the chi-square test by controlling the minimum number for observed and expected bin sizes. Use MAKMAT/EDITMT to prepare/edit the X input vector.
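
The calculation can be sketched as follows, under stated assumptions: p-hat is taken as the total number of successes divided by M*N, and adjacent cells are pooled from left to right until each bin has an expected count of at least min_expected. This pooling rule and the names p_hat and chi_square_statistic are illustrative only and need not match the program.

```python
from collections import Counter
from math import comb

def p_hat(xs, n):
    """Sample estimate of p from M observations, each out of n trials."""
    return sum(xs) / (len(xs) * n)

def chi_square_statistic(xs, n, p, min_expected=5.0):
    """Return (chi-square statistic, number of bins) for H0: X is b(n, p)."""
    m = len(xs)
    counts = Counter(xs)
    observed = [counts.get(k, 0) for k in range(n + 1)]
    expected = [m * comb(n, k) * p**k * (1.0 - p)**(n - k)
                for k in range(n + 1)]
    bins = []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:      # bin is large enough, so close it
            bins.append((o_acc, e_acc))
            o_acc = e_acc = 0.0
    if o_acc > 0 or e_acc > 0:         # fold any remainder into the last bin
        if bins:
            o_last, e_last = bins.pop()
            bins.append((o_last + o_acc, e_last + e_acc))
        else:
            bins.append((o_acc, e_acc))
    stat = sum((o - e) ** 2 / e for o, e in bins)
    return stat, len(bins)

xs = [0] * 15 + [1] * 5                   # 20 observations with N = 1
print(p_hat(xs, 1))                       # 0.25
print(chi_square_statistic(xs, 1, 0.5))   # (5.0, 2)
```

The degrees of freedom would normally be the number of bins minus 1 when p is specified, or minus 2 when p-hat is estimated from the same sample.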

Option 8. Analysis of proportions and Meta Analysis
This arises when there are many estimates of p(i) = x(i)/N(i) and it is necessary to test if all the p(i) are the same. Special procedures are possible for sets of 2 by 2 contingency tables. Analysis may be inaccurate if x(i) = 0 or x(i) = N(i), so you may wish to pool to remove such singular cases before using option 8. When you input such a set of x(i), N(i) values, the overall p and confidence limits are calculated and likelihood ratio and chi-square tests are performed to test H0: p(i) are identical. Relative risks, Odds, Log Odds Ratios, etc. can be plotted.
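
The two test statistics for H0: p(i) are identical can be sketched in Python as below. This assumes the usual Pearson chi-square and likelihood ratio forms on the k by 2 table of successes and failures, with k - 1 degrees of freedom; the program's own calculations may differ in detail, and the names xlogy and homogeneity_statistics are hypothetical. The pooled p must lie strictly between 0 and 1.

```python
from math import log

def xlogy(a, b):
    """a * ln(a / b), taking 0 * ln(0) as 0."""
    return 0.0 if a == 0 else a * log(a / b)

def homogeneity_statistics(data):
    """data is a list of (x, N) pairs.

    Returns (pooled p, Pearson chi-square, likelihood ratio, degrees of freedom).
    """
    total_x = sum(x for x, n in data)
    total_n = sum(n for x, n in data)
    p = total_x / total_n
    chi2 = 0.0
    lr = 0.0
    for x, n in data:
        e_succ = n * p           # expected successes under H0
        e_fail = n * (1.0 - p)   # expected failures under H0
        chi2 += (x - e_succ) ** 2 / e_succ + ((n - x) - e_fail) ** 2 / e_fail
        lr += 2.0 * (xlogy(x, e_succ) + xlogy(n - x, e_fail))
    return p, chi2, lr, len(data) - 1

print(homogeneity_statistics([(2, 10), (8, 10)]))
```

The xlogy helper keeps the likelihood ratio finite when x(i) = 0 or x(i) = N(i), but the chi-square approximation is still unreliable in such cases, as noted above.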

Testing for p(t)
You may wish to explore whether p varies systematically, as a function of some control variable t (e.g. space, time, etc.). To do this, you add a third column of t values to your sample and the program will plot p(t) with assorted additional lines to test for significant trends. If the t variable is arbitrary (e.g. for spacing) the program will generate successive integer values from a starting value. You can also create a curve fit file then explore parametric models for p(t) by nonlinear regression or generalised interactive modelling.

Option 9. Analysis of trinomial proportions
This arises when there are many estimates of x, y and z where
x = the number of X-type outcomes (e.g. male hatchlings)
y = the number of Y-type outcomes (e.g. female hatchlings)
z = the number of Z-type outcomes (e.g. infertile eggs), and
N = x + y + z is the total number of observations (e.g. eggs).
There are of course three probability estimates, namely px-hat = x/N, py-hat = y/N, and pz-hat = z/N, but only two are independent since
px-hat + py-hat + pz-hat = 1.

Testing for changes in px, py, pz (e.g. as functions of t)
You input data in the form of rows of x, y, N (note: not x, y, z) and the program does a chi-square test and plots selected x,y parameter-pair confidence regions at set percentage significance levels by contouring the X-transpose-A-X chi-square function (not the approximate ellipse). Disjoint regions indicate significantly different parameter-pair estimates. Note that some overlap is still consistent with statistically significant differences.

Option 10. Power and sample size
This allows you to explore the effect of sample size on the precision with which binomial parameters can be estimated, or the sample size needed to differentiate between proportions.

Option 11. Changing the significance levels
If for some reason you do not want the default 95% limits, you can select 90% or 99% limits.

Option 12. The Poisson distribution
When N is large and p is small, the binomial distribution can be approximated by a Poisson distribution with lambda = Np and

         pmf(x) = [(lambda)^x]exp(-lambda)/x!,
     
which is used for the analysis of counting data, e.g. number of cells in apoptosis in a microscope field. You can input x = no. observed then estimate lambda and confidence limits using a chi-square variable, and there are other Poisson options.
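
The quality of the approximation can be explored with a short Python sketch such as the following, which is an illustration only. The values N = 1000 and p = 0.002 are arbitrary choices, and the function names are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X distributed b(n, p)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def poisson_pmf(x, lam):
    """[(lambda)^x]exp(-lambda)/x!"""
    return lam**x * exp(-lam) / factorial(x)

n, p = 1000, 0.002   # large N and small p
lam = n * p          # lambda = Np = 2
for x in range(6):
    print(x, binomial_pmf(x, n, p), poisson_pmf(x, lam))
```

For these values the two columns agree to about three decimal places; the agreement deteriorates as p increases with Np held fixed.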

Test files for this program

binomial.tf1
50 random numbers from a binomial distribution with N = 10 and p = 0.5. Use option 1 to set N and p then read in this file to test if the numbers are consistent with b(N,p).

binomial.tf2
A set of x, N values to use for analysis of proportions.

binomial.tf3
A set of x, N, t values to use for analysis of proportions with an indexing parameter (e.g. variation of proportions with time or some other treatment).

meta.tf1
A set of 2 by 2 contingency tables for meta analysis.

trinom.tf1
Data for x, y, z illustrating the effect of sample size on the confidence limits for determining trinomial parameters.

trinom.tf2
Data for x, y, z illustrating the technique for identifying statistically significant changes in trinomial proportions by observing disjoint regions amongst the set of confidence contours.