CALCURVE: help and advice

Consult the reference manual and tutorials for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.
General information

This program is for data smoothing and calibration after measuring y = f(x) at known values of x. It creates a best-fit spline curve by minimising either the weighted sum of squared residuals, WSSQ, or the unweighted sum of squares, SSQ. You can use replicates to estimate s, the standard deviation of y at each distinct x, or you can use optional substitutions, such as s = 1 for unweighted fitting. Often, for instance, you will have measured V(y), a vector of y-signals recorded under the same conditions as the calibration data, and will wish to use these to predict x given y, possibly with approximate 95% confidence limits.
The program requires the following values.

     n    : total number of calibration measurements
     x(i) : calibration settings x (i = 1 to n)
     y(i) : calibration measurements y (i = 1 to n)
     s(i) : estimated std. dev. of y (or else s(i) = 1) (i = 1 to n)
     m    : number of prediction/evaluation measurements
     v(i) : x-evaluation or y-prediction values v (i = 1 to m)
The s-values are for weights (w = 1/s^2) and 95% confidence limits.
Use MAKFIL/EDITFL to prepare/edit/weight calibration files.
Use MAKMAT/EDITMT to prepare/edit prediction files.
With small data sets you can type values in directly.
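The objective function described above can be sketched as follows. This is an illustration in Python, not SIMFIT code: it evaluates WSSQ = sum of ((y(i) - f(x(i)))/s(i))^2 for a candidate curve f, which reduces to the unweighted SSQ when every s(i) = 1. The straight-line curve used for the demonstration is a hypothetical stand-in for the fitted spline.

```python
def wssq(x, y, s, f):
    """Weighted sum of squared residuals, WSSQ, for a candidate curve f.

    Weights are w = 1/s^2, so each residual is divided by its s(i).
    """
    return sum(((yi - f(xi)) / si) ** 2 for xi, yi, si in zip(x, y, s))

# Toy calibration data with a hypothetical straight-line curve f(x) = 2x.
x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.0]
s = [0.1, 0.1, 0.1]                       # estimated std. dev. of y at each x
print(wssq(x, y, s, lambda t: 2.0 * t))

# Unweighted fitting: substitute s(i) = 1, so WSSQ becomes SSQ.
print(wssq(x, y, [1.0] * 3, lambda t: 2.0 * t))
```

The spline fitter varies the curve to make this quantity small while keeping the curve smooth; the sketch only shows what is being minimised.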

Spline knot density

Interior spline knots are selected points lying inside the range set by the x-data, where the sections of the best-fit cubic spline curve join smoothly together. The number of knots used depends on the number of distinct data x-values. If you use too many spline knots, the calibration curve may be too flexible, allowing turning points.

The default spline type uses cross validation, with a knot at each distinct x-value and an algorithm that applies tension by calculating a smoothing parameter. The resulting curve attempts to balance goodness of fit against smoothness, as estimated using the second derivatives at the knots. With noisy data, or where the calibration data supplied cause numerical difficulties, an IFAIL message will be issued, and this could result in failure to create a good standard curve. In such cases you should change the configuration of program CALCURVE and select a simple weighted least squares spline curve, choosing a number of knots sufficient to create a good standard curve.

In such cases you should use the sparse, medium, dense, or solid option to control the smoothing by varying the number of interior knots. With noisy data the dense or solid options may generate wavy standard curves, in which case the sparse or medium knot densities should be preferred.
To calibrate with lines, quadratics, etc. use program POLNOM.
To calibrate with your own model, etc. use program QNFIT.
For bioassay, a GLM method is also provided by program SIMSTAT, and there are many other SIMFIT options for estimating LD50, etc.
Consult the tutorial documents about spline smoothing and use program SPLINE to understand these issues.
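One plausible way to picture the knot density options is as a thinning of the distinct interior x-values. The following Python sketch is purely illustrative: the step sizes chosen for each density are assumptions for the demonstration, not CALCURVE's actual selection rule.

```python
def interior_knots(x, density):
    """Thin the distinct interior x-values to a chosen knot density.

    The step sizes below are illustrative assumptions, not SIMFIT's rule;
    'solid' is treated like 'dense' in this toy version.
    """
    distinct = sorted(set(x))
    interior = distinct[1:-1]             # knots lie strictly inside the x-range
    step = {"sparse": 4, "medium": 2, "dense": 1, "solid": 1}[density]
    return interior[::step]

x = [0.5 * i for i in range(1, 13)]       # twelve distinct x settings
print(interior_knots(x, "sparse"))        # few knots: stiff, smooth curve
print(interior_knots(x, "dense"))         # knot at every interior x: flexible
```

With noisy data the dense choice above gives the spline enough freedom to chase the noise, which is exactly the wavy-curve risk described in this section.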

Using log(x) or transforming x to log(x) internally

In this program log(x) means logarithm to base 10.
You can only expect a good calibration curve if the settings for x are sufficiently dense and uniformly spaced over the range of x to prevent the best-fit curve from oscillating. Failure to present the program with appropriate x data can lead to turning points and ambiguous prediction of x if the calibration curve fluctuates too much about the data. For instance, your data might have been prepared by a process involving serial dilution of a stock solution, in which case log(x) is uniformly spaced but x is geometrically spaced. It would then be better to transform x into log(x) with EDITMT before input to the program. You could also try log(x) if your data approach a horizontal asymptote.

There are two distinct techniques for using log(x) instead of x: you can input your data as log(x) directly, or you can transform to log(x) interactively. If you use this internal transformation, then any spline or graphical coordinates saved will be in log(x), but predictions from y-input and the calibration curve will be x-values.
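The serial-dilution case above can be demonstrated directly. This Python sketch shows why the transformation helps: fitting is done in u = log10(x) (base 10, as in the program), where the settings are uniformly spaced, and predictions are mapped back to x-values.

```python
import math

def to_log10(x):
    """Transform x settings to u = log10(x) for fitting."""
    return [math.log10(xi) for xi in x]

def from_log10(u):
    """Map fitted/predicted u-values back to x = 10**u."""
    return [10.0 ** ui for ui in u]

# Serial dilution of a stock solution: x is geometrically spaced ...
x = [100.0 / 2 ** i for i in range(5)]    # 100, 50, 25, 12.5, 6.25
u = to_log10(x)
# ... but u = log10(x) is uniformly spaced, which suits spline fitting.
gaps = [u[i] - u[i + 1] for i in range(len(u) - 1)]
print(gaps)                               # each gap is log10(2)
back = from_log10(u)                      # predictions return as x-values
print(back)
```

This mirrors the internal option: saved coordinates would be the u-values, while predictions are reported as x.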

95% confidence limits

If a correct deterministic equation is fitted to calibration data, then the sum of squares and the best-fit equation can yield exact confidence limits. This approach is not possible with splines since, owing to their high flexibility and local properties depending on knot placement, sums of squares can be made arbitrarily small. Also, constant variance is often not appropriate if y covers a large range. This program uses the s values supplied, or else w from a weighting option (or SSQ if s = 1, or w = 1), to construct a local variance estimate. From this estimate it constructs 95% confidence curves, which are then used to predict confidence limits for x given y and for y given x. The confidence limits will only be approximate and should be interpreted with a fair degree of restraint and common sense. Remember: garbage in, garbage out. If you supply ridiculous w values for weighting, you must not be surprised to find the program giving equally ridiculous confidence limits. As the values you input or set for s or %|y| increase, the confidence limits will expand.
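The idea of inverting confidence curves to get limits on a predicted x can be sketched as follows. This is an illustration of the principle only, not CALCURVE's algorithm: the straight-line "calibration curve", the constant local s, and the 1.96 normal-theory factor are all simplifying assumptions made for the example.

```python
def bisect_x(g, target, lo, hi, tol=1e-10):
    """Solve g(x) = target by bisection for an increasing function g."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

curve = lambda x: 2.0 * x + 1.0           # hypothetical fitted curve
s = 0.2                                   # assumed local std. dev. of y
upper = lambda x: curve(x) + 1.96 * s     # approximate upper 95% curve
lower = lambda x: curve(x) - 1.96 * s     # approximate lower 95% curve

y_obs = 5.0                               # measured signal to invert
x_hat = bisect_x(curve, y_obs, 0.0, 10.0)
# Where y_obs crosses the confidence curves brackets the predicted x:
x_lo = bisect_x(upper, y_obs, 0.0, 10.0)
x_hi = bisect_x(lower, y_obs, 0.0, 10.0)
print(x_lo, x_hat, x_hi)
```

As the section warns, the limits widen directly with the s you supply: doubling s in this sketch doubles the half-width of (x_lo, x_hi).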

Expert mode

This technique is used when you have a set of data for a standard curve that you want to use repeatedly for predicting x given y. To achieve this, additional values are appended to the standard curve data file, so that the same standard curve is always generated from the same data file. To understand this you should compare the following two test files provided for program CALCURVE.

calcurve.tf1 demonstrates the EXPERT mode flags added to the end of the data.
calcurve.tf2 is the same data without weighting or EXPERT mode flags.
Expert mode is only activated locally, when additional data are appended as in calcurve.tf1, and the standard default is restored when the next standard curve without additional parameters is input.

When you have used the program for a while you will realise that there are just eight options involved with the standard weighted least squares method, and you will have a good idea which settings you personally require. If you choose expert mode you can paste a trailer onto the end of your files with an integer j, then I1,...,I8, and a number ERR, as follows.

      ...   ...   ...,
     x(n), y(n), s(n)                         (last line of data)
     j                                        (no. of text lines)
     I1, I2, I3, I4, I5, I6, I7, I8, ERR      (control  settings)
       ...                                    (j - 1  text lines)
I1 etc. will then override the default settings. The actual values for I1, I2, ..., I8 are those entered from the menus, and ERR is the percentage coefficient of variation of y, as summarised shortly.

Remember that, if you decide to use expert mode to run this program, you are living dangerously unless you are quite sure what values to substitute for I1, I2, ..., I8 and ERR. One advantage of expert mode is that, when you have found a set of control parameters that work well with your data, you can install them, pasted to the end of the calibration files, to use as defaults. The great advantage, however, is when you have one special data set that you want to use repeatedly to predict x given y, while being quite certain that the same standard curve is always generated.
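The trailer layout shown above (j on one line, then I1,...,I8, ERR, then j - 1 text lines) can be built and read back mechanically. This Python sketch assumes only the layout described in this help section; the function names are illustrative, not SIMFIT identifiers.

```python
def make_trailer(controls, err, comments=()):
    """Build expert-mode trailer lines to paste onto a calibration file.

    Layout assumed from the help text: j (number of trailer text lines),
    then the control line I1,...,I8, ERR, then j - 1 further text lines.
    """
    j = 1 + len(comments)                 # control line plus comment lines
    line = ", ".join(str(i) for i in controls) + ", " + str(err)
    return [str(j), line, *comments]

def parse_control_line(line):
    """Split 'I1, ..., I8, ERR' into eight integers and a float."""
    parts = [p.strip() for p in line.split(",")]
    return [int(p) for p in parts[:8]], float(parts[8])

# The control settings used in test file calcurve.tf1:
trailer = make_trailer([2, 2, 2, 1, 2, 3, 2, 2], 5.0)
controls, err = parse_control_line(trailer[1])
print(trailer)
print(controls, err)
```

Generating the trailer with a script like this, rather than typing it, is one way to avoid the "living dangerously" problem of mistyped control values.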

Options in expert mode

To use this mode you will have to prepare calibration input files with a title and data, followed by the values you choose to substitute for the integers I1, I2, ..., I8 and ERR. Print or browse the test file calcurve.tf1 for an example, which has the following additional line after the data.

     2, 2, 2, 1, 2, 3, 2, 2, 5.0 
     
The meaning of these values will now be explained.
I1: data input mode   1:=File/File 2:=File/Keyb 3:=Keyb/Keyb
I2: internal coord.   1:=x         2:=log(x)
I3: spline density    1:=Sparse    2:=Medium    3:=Dense    4:=Solid
I4: weights           1:=1/s^2     2:=1/%|y|^2  3:=1
I5: graph coord.      1:=x,y       2:=add 95%cl
I6: 95% con. lim.     1:=None      2:=Slack     3:=Medium   4:=Tight
I7: Reserved for future use
I8: Reserved for future use
ERR: If I4=1 ERR is not used as the s values supplied will be used.
     If I4=2 or I4=3 set ERR=percentage coefficient of variation, i.e. CV%
     where CV% = 100(sample standard deviation of y)/|y|
     ERR is then used to estimate approximate 95% confidence limits for 
     plotting or predicting x from y or y from x.
In expert mode I1, I2, ..., I8 and ERR override the defaults.
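The use of ERR when I4 = 2 follows directly from the definition above: CV% = 100*s/|y| implies s = (ERR/100)*|y|, and the weights are then w = 1/s^2. This Python sketch just applies those two formulae; the function names are illustrative, not SIMFIT identifiers.

```python
def s_from_cv(y, err_percent):
    """Estimated std. dev. of each y from a percentage CV:
    s = (CV% / 100) * |y|."""
    return [(err_percent / 100.0) * abs(yi) for yi in y]

def weights(s):
    """Weights w = 1/s^2 for weighted least squares."""
    return [1.0 / si ** 2 for si in s]

y = [2.0, 4.0, 8.0]
s = s_from_cv(y, 5.0)                     # ERR = 5 means s is 5% of each |y|
w = weights(s)
print(s)
print(w)
```

Note how the weights fall as |y| grows, which is the point of the CV% option when y spans a large range and constant variance is not appropriate.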