SIMSTAT: Data Exploration

Exhaustive analysis: arbitrary vector
This is a very simple yet powerful item for exploring your data. You should always choose this option when investigating a sample, since it calculates all the summary statistics, displays a histogram and cdf, and uses the Shapiro-Wilk test to check whether a normal distribution is appropriate. The best way to find out how to use this option is to read in a Simfit test file such as normal.tf1, which just contains random numbers from a normal distribution. This option can also be used to prepare pdf, cdf and (1 - cdf) files for plotting or for fitting parametric statistical distribution models to samples, e.g. survival data.
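To make the idea of summary statistics, a sample cdf and a (1 - cdf) curve concrete, here is a minimal Python sketch using only the standard library. It is not how SIMSTAT computes its output; the function names summarize, empirical_cdf and survival_curve are hypothetical, and the quartile convention is simply the default of Python's statistics.quantiles.

```python
import statistics


def summarize(sample):
    """Return a few common summary statistics for a list of numbers."""
    # Quartiles use the default 'exclusive' method of statistics.quantiles;
    # other programs may use a different convention.
    q1, median, q3 = statistics.quantiles(sample, n=4)
    return {
        "n": len(sample),
        "mean": statistics.fmean(sample),
        "sd": statistics.stdev(sample),  # sample standard deviation (n - 1)
        "min": min(sample),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(sample),
    }


def empirical_cdf(sample):
    """Return sorted (x, F(x)) pairs with F stepping by 1/n at each value.

    With tied values, the last pair for a given x carries F(x).
    """
    xs = sorted(sample)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def survival_curve(sample):
    """Return (x, 1 - F(x)) pairs, e.g. for plotting survival data."""
    return [(x, 1.0 - f) for x, f in empirical_cdf(sample)]
```

The pairs returned by empirical_cdf or survival_curve could be written out one per line to produce a simple two-column file for plotting.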
Note that when a sample has been read in for exhaustive analysis it is possible to choose a run and sign test analysis. This is very useful if the data is normalised so as to be distributed either side of a zero median value but in random order, as with residuals.
Consult program NORMAL for more details about the normal distribution and program RSTEST to find out about run and sign tests.
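As an illustration of what a run and sign test looks at, the following Python sketch counts signs and runs of signs in a sequence such as residuals. It is an assumption-laden sketch, not the RSTEST algorithm: the sign test uses an exact two-sided binomial p value, the runs test uses the large-sample normal approximation, zeros are discarded, and the function names are hypothetical.

```python
import math


def sign_test(values):
    """Two-sided exact sign test of H0: P(value > 0) = 0.5 (zeros dropped)."""
    signs = [v > 0 for v in values if v != 0]
    n = len(signs)
    k = sum(signs)  # number of positive values
    # Probability of a count at least as extreme as k in the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return k, n, min(1.0, 2.0 * tail)


def runs_test(values):
    """Runs test for randomness of sign order (normal approximation)."""
    signs = [v > 0 for v in values if v != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative values")
    # A new run starts wherever the sign changes.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n_pos + n_neg
    mean = 2 * n_pos * n_neg / n + 1
    var = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return runs, z, p
```

Strictly alternating residuals such as 1, -1, 1, -1, ... pass the sign test (equal numbers of each sign) but give far too many runs, which is the kind of non-random order the runs test is meant to detect.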

Exhaustive analysis: arbitrary matrix
This option is provided so that you can investigate the properties of the individual rows and columns in a data matrix, or a library file where all columns have the same length. Read in a matrix test file such as matrix.tf2 and observe how the overall column and row statistics can be calculated, but note that an exhaustive analysis can also be done on any selected row or column. This option can also plot a matrix as a 2-D barchart (rows are cases and columns are variables), as a 3-D barchart (a(i,j) values are heights of bars at x = i, y = j) or as a box and whisker plot (medians and quartiles calculated for each column), and it can also calculate sums of squares and cross products, variance-covariance and correlation matrices. A maximum likelihood test is provided to test for sphericity, i.e. to see if the covariance matrix of the untransformed data is a multiple of the identity matrix. Another useful graphical option is to plot the rows as functions of the columns in the form of a scattergram, using a different symbol for each row and joining each row by dotted lines for clarity when matrices have 12 or fewer rows.
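The variance-covariance and correlation matrices mentioned above can be sketched in a few lines of standard-library Python. This assumes the matrix is held as a list of rows (cases) with columns as variables and uses the usual n - 1 divisor; the function names are hypothetical and the sketch makes no claim about SIMSTAT's own numerical methods.

```python
import math


def column_means(matrix):
    """Mean of each column of a matrix given as a list of equal-length rows."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


def covariance_matrix(matrix):
    """Sample variance-covariance matrix of the columns (divisor n - 1)."""
    n = len(matrix)
    means = column_means(matrix)
    centered = [[x - m for x, m in zip(row, means)] for row in matrix]
    p = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def correlation_matrix(matrix):
    """Correlation matrix obtained by scaling the covariance matrix."""
    cov = covariance_matrix(matrix)
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]  # fails for constant columns
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]
```

Under sphericity the covariance matrix would be close to a multiple of the identity, i.e. roughly equal diagonal entries and off-diagonal entries near zero.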

Exhaustive analysis: multivariate normal matrix
This option is intended for preliminary investigations of a data set before proceeding to use techniques like MANOVA which rely on multivariate normality. A diagnostic plot can be displayed which, for large samples, should be linear. The covariance matrix, along with its inverse and eigenvalues or determinant, can be calculated; there are tests for compound symmetry or sphericity; and a Hotelling T-squared test can be done for hypotheses concerning the mean vector.
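For reference, the standard textbook form of the one-sample Hotelling T-squared statistic is given below. The notation is assumed here rather than taken from the program: n observations on p variables, sample mean vector \bar{x}, sample covariance matrix S, and hypothesised mean vector \mu_0. The help text does not state exactly which form SIMSTAT reports.

```latex
T^2 = n\,(\bar{x} - \mu_0)^{\mathsf{T}} S^{-1} (\bar{x} - \mu_0),
\qquad
\frac{n - p}{p\,(n - 1)}\, T^2 \sim F_{p,\,n-p}
\quad \text{under } H_0:\ \mu = \mu_0 .
```

The F distribution result relies on multivariate normality, which is why the diagnostic plot is worth checking first.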

All possible comparisons
This option compares all possible pairs of samples in a library file referencing vectors only, but not necessarily of the same length. Read in the test file npcorr.tfl to see how t, Mann-Whitney U and Kolmogorov-Smirnov 2-sample tests are applied to all possible pairs. It is very useful when exploring a set of files in a library file, but please remember the Bonferroni principle when scanning the results. The p values can be regarded as providing a measure of the difference between the two samples in any pair (small p indicating a large difference) even if p is not less than alpha/n. If the sample sizes are comparable, and it is assumed that the samples are normal with the same variance, then 1-way ANOVA followed by a Tukey-Q test should be done. The procedure will, of course, fail with singular data sets, e.g. constant vectors.
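The idea of scanning all pairs against a Bonferroni threshold can be sketched as follows in standard-library Python. Several things here are assumptions rather than a description of SIMSTAT: only the Mann-Whitney U test is shown, its p value uses the normal approximation without tie or continuity corrections, n in alpha/n is taken to be the number of pairs compared, and the function names are hypothetical.

```python
import itertools
import math


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # U counts pairs with x_i > y_j, with ties counted as one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))


def compare_all_pairs(samples, alpha=0.05):
    """Test every pair in a dict of name -> sample (lengths may differ).

    Returns (name_a, name_b, U, p, significant) tuples, where significant
    means p < alpha / (number of pairs), i.e. a Bonferroni threshold.
    """
    pairs = list(itertools.combinations(sorted(samples), 2))
    threshold = alpha / len(pairs)
    results = []
    for a, b in pairs:
        u, p = mann_whitney_u(samples[a], samples[b])
        results.append((a, b, u, p, p < threshold))
    return results
```

Sorting the results by p, rather than looking only at the significant flag, matches the suggestion above of treating p as a measure of how different the two samples in a pair are.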
