SIMSTAT: Data Exploration
Exhaustive analysis: arbitrary vector
This is a simple yet powerful option for exploring your data.
You should always choose this option when investigating a sample, since
it calculates all the summary statistics, displays a histogram and
cdf, and uses the Shapiro-Wilk test to see if a normal distribution is
appropriate. The best way to find out how to use this option is to
read in a Simfit test file such as normal.tf1, which just contains
some random numbers from a normal distribution.
This option can also be used to prepare pdf, cdf and (1 - cdf) files
for plotting
or for fitting parametric statistical distribution models to samples,
e.g. survival data.
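
As a rough illustration of the quantities described above, the following
Python sketch computes summary statistics, a Shapiro-Wilk test and the
empirical cdf for one sample. It is not Simfit code: NumPy and SciPy are
assumed to be available, and the function name explore_sample and the
simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def explore_sample(x):
    """Summary statistics, Shapiro-Wilk test and empirical cdf of a 1-D sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    w, p = stats.shapiro(x)  # null hypothesis: the sample is normal
    return {
        "n": n,
        "mean": x.mean(),
        "median": np.median(x),
        "sd": x.std(ddof=1),  # sample standard deviation
        "quartiles": np.percentile(x, [25, 75]),
        "shapiro_W": w,
        "shapiro_p": p,
        # empirical cdf: step heights i/n at the sorted sample values
        "cdf": np.column_stack((x, np.arange(1, n + 1) / n)),
    }


# Hypothetical stand-in for a file of normal random numbers
rng = np.random.default_rng(0)
result = explore_sample(rng.normal(size=50))
```

The second column of result["cdf"] subtracted from 1 gives the (1 - cdf)
values mentioned above.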
Note that when a sample has been read in for exhaustive analysis it
is possible to choose a run and sign test analysis. This is very useful
if the data have been normalised so as to be distributed on either side
of a zero median value, but in random order, as with residuals.
Consult program NORMAL for more details about
the normal distribution and program RSTEST to find out about run and sign
tests.
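
The idea behind a sign test and a runs test on residuals can be sketched
in Python as follows. This is only an illustration under stated
assumptions: it uses an exact binomial sign test and the large-sample
normal approximation for the number of runs, which may differ from what
program RSTEST does, and the function name sign_and_runs_test is
hypothetical. It needs SciPy 1.7 or later for binomtest, and at least
one positive and one negative value.

```python
import math

import numpy as np
from scipy import stats


def sign_and_runs_test(residuals):
    """Sign test and runs test for values expected to scatter about zero."""
    r = np.asarray(residuals, dtype=float)
    signs = r[r != 0] > 0  # zeros carry no sign information, so drop them
    n_pos = int(signs.sum())
    n_neg = int(signs.size - n_pos)
    n = n_pos + n_neg

    # Sign test: under H0 positive and negative signs are equally likely
    sign_p = stats.binomtest(n_pos, n, 0.5).pvalue

    # Runs test: count maximal blocks of equal sign, then compare with the
    # mean and variance expected if the signs were in random order
    runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))
    mean_runs = 2.0 * n_pos * n_neg / n + 1.0
    var_runs = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n ** 2 * (n - 1.0)))
    z = (runs - mean_runs) / math.sqrt(var_runs)
    runs_p = 2.0 * stats.norm.sf(abs(z))  # two-sided normal approximation

    return {"n_pos": n_pos, "n_neg": n_neg, "sign_p": sign_p,
            "runs": runs, "z": z, "runs_p": runs_p}


# Hypothetical residuals that alternate in sign: balanced, but not random
out = sign_and_runs_test([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
```

In this example the signs are perfectly balanced, so the sign test sees
nothing unusual, but the runs test flags the alternating pattern because
there are more runs than random ordering would be expected to give.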
Exhaustive analysis: arbitrary matrix
This option is provided so that you can investigate the properties
of the individual rows and columns in a data matrix, or a library file
where all columns have the same length.
Read in a matrix test file such as matrix.tf2 and observe how the
overall column and row statistics can be calculated, but note that
an exhaustive analysis can also be done on any selected row or column.
This option can also plot a matrix as a 2-D barchart (rows are cases
and columns are variables), as a 3-D barchart (a(i,j) values are
heights of bars at x = i, y = j) or as a box and whisker plot
(medians and quartiles calculated for each column), and it can also
calculate sums of squares and cross products, variance-covariance and
correlation matrices. A maximum likelihood test is provided to test
for sphericity, i.e. to see if the covariance matrix of the
untransformed data is a multiple of the identity matrix.
Another useful graphical option is to plot the
rows as functions of the columns in the form of a scattergram, using
a different symbol for each row and joining each row by dotted lines
for clarity when matrices have 12 or fewer rows.
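
The matrix summaries and the sphericity test described above can be
sketched in Python as follows. The assumptions are: rows are cases and
columns are variables, the sums of squares and cross products are taken
about the column means, and the sphericity test is the usual likelihood
ratio statistic with Bartlett's chi-square correction, which may not be
identical to the statistic Simfit reports. The function name
matrix_summaries and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def matrix_summaries(a):
    """SSCP, covariance and correlation matrices plus a sphericity test."""
    a = np.asarray(a, dtype=float)
    n, p = a.shape  # n cases (rows), p variables (columns)
    centred = a - a.mean(axis=0)
    sscp = centred.T @ centred  # sums of squares and cross products
    cov = sscp / (n - 1)
    corr = np.corrcoef(a, rowvar=False)

    # Likelihood ratio criterion: det(S) / (trace(S)/p)^p equals 1 when S is
    # exactly a multiple of the identity and is smaller otherwise
    _, logdet = np.linalg.slogdet(cov)
    log_lambda = logdet - p * np.log(np.trace(cov) / p)
    stat = -(n - 1 - (2 * p * p + p + 2) / (6 * p)) * log_lambda
    df = p * (p + 1) // 2 - 1
    return {"sscp": sscp, "cov": cov, "corr": corr,
            "stat": stat, "df": df, "p": stats.chi2.sf(stat, df)}


# Hypothetical 40 by 3 data matrix with independent unit-variance columns
rng = np.random.default_rng(1)
data = rng.normal(size=(40, 3))
summary = matrix_summaries(data)
```

A small p value would suggest that the covariance matrix is not a
multiple of the identity matrix.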
Exhaustive analysis: multivariate normal matrix
This option is intended for preliminary investigations of a data set
before proceeding to use techniques like MANOVA
which rely on multivariate normality. A diagnostic plot can be displayed
which, for large samples, should be linear. The covariance matrix, along
with its inverse and eigenvalues or determinant, can be calculated, there
are tests for compound symmetry or sphericity, and a Hotelling
T-squared test can be done for hypotheses concerning the mean vector.
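
For reference, a one-sample Hotelling T-squared test of the hypothesis
that the mean vector equals a given vector can be sketched in Python as
follows. This is the textbook form, using the exact F transformation of
T-squared under multivariate normality, and is not taken from Simfit.
The function name hotelling_t2 and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def hotelling_t2(a, mu0):
    """One-sample Hotelling T-squared test of H0: mean vector equals mu0."""
    a = np.asarray(a, dtype=float)
    n, p = a.shape  # n cases (rows), p variables (columns); needs n > p
    diff = a.mean(axis=0) - np.asarray(mu0, dtype=float)
    cov = np.cov(a, rowvar=False)
    # T^2 = n * diff' S^{-1} diff, solved without forming the inverse
    t2 = n * diff @ np.linalg.solve(cov, diff)
    # Under H0, (n - p) / (p (n - 1)) * T^2 follows F(p, n - p)
    f = (n - p) / (p * (n - 1.0)) * t2
    return t2, f, stats.f.sf(f, p, n - p)


# Hypothetical 30 by 3 sample tested against a zero mean vector
rng = np.random.default_rng(2)
sample = rng.normal(size=(30, 3))
t2, f, p_value = hotelling_t2(sample, np.zeros(3))
```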
All possible comparisons
This option compares all possible pairs of samples in a library file
that references vectors only. The vectors need not be of the same length.
Read in the test file npcorr.tfl to see how t, Mann-Whitney U
and Kolmogorov-Smirnov 2-sample tests are applied to all possible
pairs. It is very useful when exploring a set of files in a
library file, but please remember the Bonferroni principle when scanning
the results. The p values can be regarded as providing a measure
of the difference between the two samples in any pair (small p indicating
a large difference) even if p is not less than alpha/n. If the sample sizes
are comparable, and it is assumed that the samples are normal with the
same variance, then 1-way ANOVA followed by a Tukey-Q test should be
done. The procedure will, of course, fail with singular data sets,
e.g. constant vectors.
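
The all-pairs procedure can be sketched in Python as follows. This is
only an illustration: it assumes that n in alpha/n is the number of
pairwise comparisons, it uses SciPy's default equal-variance t test, and
the function name all_pairs and the simulated samples are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def all_pairs(samples, alpha=0.05):
    """t, Mann-Whitney U and Kolmogorov-Smirnov tests on every pair of samples."""
    pairs = list(combinations(range(len(samples)), 2))
    threshold = alpha / len(pairs)  # Bonferroni: alpha over number of pairs
    rows = []
    for i, j in pairs:
        x, y = samples[i], samples[j]  # lengths may differ
        rows.append({
            "pair": (i, j),
            "p_t": stats.ttest_ind(x, y).pvalue,
            "p_u": stats.mannwhitneyu(x, y, alternative="two-sided").pvalue,
            "p_ks": stats.ks_2samp(x, y).pvalue,
        })
    return rows, threshold


# Hypothetical samples of unequal length; the third has a shifted mean
rng = np.random.default_rng(3)
samples = [rng.normal(0.0, 1.0, 20),
           rng.normal(0.0, 1.0, 25),
           rng.normal(2.0, 1.0, 30)]
rows, threshold = all_pairs(samples)
for row in rows:
    print(row["pair"], row["p_t"], row["p_u"], row["p_ks"])
```

Each p value should be compared with the returned threshold rather than
with alpha itself.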