ADDERR: help and advice

Consult the reference manual for further details and worked examples.
W. G. Bardsley, University of Manchester, U.K.
Summary

This program generates simulated experimental data by adding random errors to data files containing exact data, so that sensitivity analysis can be performed. Input files must have one of the following formats.

     x, y(x), s                  ... one variable (usual) case
     x1, x2, y(x1,x2), s         ... two variable case
     x1, x2, x3, y(x1,x2,x3), s  ... three variable case
After reading in such a file you can use this program in several ways to perturb the y-values by adding random errors. An output file can then be produced with the original x-values, the new y-values, and s-values set equal to, or approximately equal to, the standard error of y. This output file is ready for curve fitting.
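The round trip just described (read exact data, perturb y, write out x, new y, s) can be sketched as follows. This is only an illustration in Python, not ADDERR's actual code, and the file layout shown is a bare one-variable triple per line; the real SIMFIT data files have their own header conventions, which are not reproduced here.

```python
import random

random.seed(0)

# Stand-in for the contents of an input file: one "x y(x) s" triple per line.
lines = ["1.0 2.0 1.0", "2.0 4.0 1.0", "3.0 8.0 1.0"]

output = []
for line in lines:
    x, y, s = (float(v) for v in line.split())
    s_new = 0.1 * abs(y)                  # e.g. assume 10% constant relative error
    y_new = y + random.gauss(0.0, s_new)  # perturb y; x is kept unchanged
    output.append(f"{x} {y_new} {s_new}")
```

The original s-column (here all 1.0) is discarded and replaced by the assumed standard deviation, so the output is ready for weighted curve fitting.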

The input file must have exact values for y as a function of x and arbitrary s, e.g. s = 1. Such files can be generated by program MAKDAT, or by the editors MAKFIL (1 variable) or MAKMAT (2/3 variables), and they are not altered in any way by program ADDERR. The output files you save contain the simulated experimental data.

The statistical theory of experimental error assumes that

     y-perturbed = y-exact + random error
and you have several choices for random errors.

Variance models

Three commonly encountered models for s^2 (i.e. V(y), the variance of y) are given special prominence.

  1. Constant variance
    V(y) = sigma^2
  2. Constant relative error
    V(y) = (fraction*|y|)^2
  3. Mixed power law
    V(y) = sigma^2 + (coefficient*|y|)^power
To simulate models 1, 2 or 3, normally distributed pseudo-random numbers with zero mean and the appropriate variance are added to y-exact to give y-perturbed, and the s-values can then be set in several ways. Note that variance types 1 and 2 are really special cases of type 3, so the distinction is only for convenience.
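The three variance models, and the claim that types 1 and 2 are special cases of type 3, can be written out directly. The names sigma, fraction, coefficient and power below are illustrative, not ADDERR parameter names.

```python
def v_constant(y, sigma):
    """Type 1: constant variance."""
    return sigma ** 2

def v_relative(y, fraction):
    """Type 2: constant relative error."""
    return (fraction * abs(y)) ** 2

def v_mixed(y, sigma, coefficient, power):
    """Type 3: mixed power law."""
    return sigma ** 2 + (coefficient * abs(y)) ** power

# Types 1 and 2 recovered as special cases of type 3:
y = 5.0
assert v_constant(y, 0.1) == v_mixed(y, 0.1, 0.0, 2)
assert v_relative(y, 0.05) == v_mixed(y, 0.0, 0.05, 2)
```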

This program also allows you to generate pseudo random errors from a variety of distributions, so you can explore the effect of uniform, exponential, normal, or Cauchy random errors.
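The four error distributions mentioned can all be generated from standard pseudo-random sources; the sketch below shows one common construction for each (again an illustration, not ADDERR's internals). Note how the Cauchy case, via the inverse-CDF method, can produce arbitrarily large deviates, which is why it is useful for exploring outliers.

```python
import math
import random

random.seed(0)

def random_error(kind, scale):
    """Zero-centred pseudo-random error from one of four distributions."""
    if kind == "uniform":
        return random.uniform(-scale, scale)
    if kind == "exponential":
        # symmetrised exponential: random sign times an exponential deviate
        return random.choice((-1.0, 1.0)) * random.expovariate(1.0 / scale)
    if kind == "normal":
        return random.gauss(0.0, scale)
    if kind == "cauchy":
        # inverse-CDF method: scale * tan(pi * (U - 1/2)), U uniform on [0, 1)
        return scale * math.tan(math.pi * (random.random() - 0.5))
    raise ValueError(kind)
```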

Options for s (the standard deviation of y)

Weighted nonlinear least-squares regression analysis needs s to be the exact standard deviation of the y-value, but this can never be obtained in real life.

Three situations can occur.

  1. You assume a model for V(y), then substitute measured (i.e. perturbed) or best-fit y-values into a formula for V(y). A special case would be assuming constant variance, i.e. all s = 1. Another special case would be assuming constant relative error, where the variance is estimated as s^2 = alpha*y^2, or even s^2 = alpha*f(x)^2, with weights adjusted at each iteration of the optimisation.
  2. You use replicates to estimate s at each fixed x, then set s equal to the sample estimates of the standard deviations.
  3. You perform experiments to estimate s, then set s = F(y) or s = G(x), or use smoothing, e.g. by program EDITFL.
All these options are provided by SIMFIT, but users who are uncertain about weighting should set all s = 1, since inappropriate weighting is worse than no weighting at all. This program allows you to set exact values for s, or to use s-values that would be typical of situations 1, 2 or 3. You can assume single measurements or simulate replicates, and you can generate outliers and then re-calculate s if required.
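Situation 2 above, estimating s from replicates at a fixed x, amounts to taking the sample standard deviation of the repeated measurements. A minimal sketch, with made-up replicate values:

```python
import statistics

# Five illustrative replicate measurements of y at one fixed x:
replicates_at_x = [3.9, 4.1, 4.3, 4.0, 4.2]

y_mean = statistics.mean(replicates_at_x)   # the y-value used for fitting
s_hat = statistics.stdev(replicates_at_x)   # sample standard deviation, used as s
```

With small numbers of replicates such sample estimates of s are themselves noisy, which is one reason the advice above favours s = 1 when in doubt.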

Outliers

Outliers are y-values with errors that are improbably large, or that are not from the same distribution as the other errors. To avoid generating such errors inadvertently with the previous options, the normal distribution is truncated at 3 standard deviations.
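Truncating a normal distribution at 3 standard deviations can be done by simple rejection sampling, as in this sketch (one possible construction, not necessarily the one ADDERR uses):

```python
import random

random.seed(1)

def truncated_gauss(sigma, limit=3.0):
    """Zero-mean normal deviate, rejecting values beyond limit standard deviations."""
    while True:
        z = random.gauss(0.0, 1.0)
        if abs(z) <= limit:
            return sigma * z

errors = [truncated_gauss(0.5) for _ in range(1000)]
assert all(abs(e) <= 3.0 * 0.5 for e in errors)
```

Since only about 0.3% of normal deviates fall beyond 3 standard deviations, the rejection loop almost always succeeds on the first try.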

There are several ways you can add arbitrary errors to data to simulate outliers. You can use a Cauchy distribution, or add outliers directly to the original or perturbed data. The s-values can be left alone, or replaced by replicates re-calculated from the perturbed data set with outliers.

Positions and signs of outliers can be selected randomly or by the user, and the magnitude can be fixed in several ways: the outlier can be a fixed percentage of the exact |y|-value, it can be a set amount, or you can input individual errors, etc.
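For instance, the fixed-percentage scheme amounts to the following, where the position, sign and 20% figure are all arbitrary illustrative choices:

```python
y_exact = [1.0, 2.0, 4.0, 8.0]
y_outlier = list(y_exact)

position = 2     # chosen by the user or at random
sign = +1.0      # chosen by the user or at random
percent = 20.0   # outlier size as a percentage of |y|; 20 is illustrative

y_outlier[position] += sign * (percent / 100.0) * abs(y_exact[position])
# y_outlier[2] is now 4.8; all other values are unchanged
```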

Note that the effect of outliers can be very dramatic, especially when they occur at critical positions in small data sets.