SPLINE: help and advice


Consult the reference manual for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.

Ways to use this program

There are three distinct ways to use this program.

  1. A data file (like compare.tf1) can be input and a piecewise cubic spline can be defined by weighted least squares curve fitting.

    This option is selected to search for trends in a data set, to simply fit a smooth curve for display purposes, or to fit a spline for calibration. A spline defined by this procedure overwrites any spline input from file and becomes the current spline.

  2. A spline file (like spline.tf1) can be input to define a set of knots and coefficients.

    This option is selected when a spline calculated previously is required for retrospective use. A spline defined by this procedure overwrites any curve fitting spline and becomes the current spline.

  3. The current spline can be saved to a spline file.

    This option is selected when a satisfactory spline has been fitted to data and the knots and coefficients are to be archived for future use. It is the only way to save a spline so that it can be re-used as a mathematical model in subsequent calculations.

Spline curves

Piecewise cubic spline curves are useful for smoothing noisy data sets to demonstrate trends, or interpolating between fixed points when the model has no known mathematical form, or where it is more convenient to use an empirical model. The splines are defined by sets of coefficients for each section between adjacent knots, where the coefficients will depend upon the knot positions and smoothing criteria used to select the particular spline. Splines can be used for plotting, or estimating areas and derivatives, as well as arc length s over an interval (A,B), defined as

s = integral of sqrt[1 + (dy/dx)^2]dx

and total absolute curvature K, defined as

K = integral of |(d^2y/dx^2)/(1 + (dy/dx)^2)^(3/2)|ds, or equivalently, since ds = sqrt[1 + (dy/dx)^2]dx,
K = integral of |(d^2y/dx^2)/(1 + (dy/dx)^2)|dx.

The arc length s indicates the overall length of a spline curve fitted to a data set, while the total absolute curvature K measures the amount of wave nature (as the overall angle turned, in radians) in a fitted curve. Both s and K, or alternatively the average absolute curvature K/s, are useful for summarising the amount of oscillatory behaviour in a best-fit curve. This matters because, when constructing a standard curve to use for predicting x given y, great care is needed to find a well-fitting but smooth, non-wavy spline. Once such a spline has been fitted, it can be saved to a spline file containing the knots and coefficients. This program can then use a spline, whether read from file or obtained by fitting, to plot, calculate areas, calibrate, etc., just as if the spline were available as a deterministic mathematical model.
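
The definitions of s and K above can be checked numerically. The following is only a minimal sketch, not the method used by this program: it assumes the first and second derivatives of the curve are available as functions, uses composite Simpson's rule, and takes y = sin(x) on (0, pi) as a stand-in for a fitted spline. The function names are hypothetical.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for the integral of g over (a, b)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def arc_length(dy, a, b):
    """s = integral of sqrt[1 + (dy/dx)^2] dx over (a, b)."""
    return simpson(lambda x: math.sqrt(1.0 + dy(x) ** 2), a, b)

def total_abs_curvature(dy, d2y, a, b):
    """K = integral of |(d^2y/dx^2)/(1 + (dy/dx)^2)| dx over (a, b)."""
    return simpson(lambda x: abs(d2y(x)) / (1.0 + dy(x) ** 2), a, b)

# Stand-in curve y = sin(x); a fitted spline would supply its own
# first and second derivatives here (assumption for illustration).
dy = math.cos
d2y = lambda x: -math.sin(x)

s = arc_length(dy, 0.0, math.pi)
K = total_abs_curvature(dy, d2y, 0.0, math.pi)
print("s =", s, " K =", K, " K/s =", K / s)
```

For this curve the slope falls from 1 to -1, so the tangent turns through a quarter circle and K should be close to pi/2 radians.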

Curve fitting files

Data files (like compare.tf1) from program MAKFIL, etc., must be formatted with values of x and y, or else x, y, and s, where s is either the standard error of y or s = 1 for unweighted fitting.

Fitting splines

There are four methods provided for fitting, as follows.

  1. Fixed knots

    The program calculates the coefficients of a best-fit spline that minimises the weighted sum of squares

    WSSQ = Summation of {[y(i) - f(x(i))]/s(i)}^2 for i = 1, 2,..., n

    subject only to the fixed spline knot positions.

    This is the method used by program CALCURVE and it allows users to use predefined knots, e.g. from a spline file, or to specify a fixed number of equally spaced knots. If predefined knots are used, then the new coefficients will overwrite any existing coefficients. The number of knots should be kept to a minimum, otherwise over-fitting will result.

    Note that, to use the current spline curve to define knots, the first four knots must equal the lowest x-value, the last four knots must equal the highest x-value, and intermediate knots must be in nondecreasing order, but need not be equally spaced. This technique is most useful when data sets have large local undulations, requiring closely packed knots in the region of possible spikes. Programs CALCURVE and CSAFIT allow users to specify such grouped knots in the data files.

  2. Automatic knots

    The program minimises the sum of squares of the discontinuity jump in the third order derivatives of the spline at interior knots subject to

    WSSQ <= F

    where F is a smoothing factor. Hence, large values of F lead to over-smoothing, while small values cause over-fitting.

    This is the method used by program COMPARE and it allows users to specify a smoothing factor, say of the order of the number of distinct x-values, which then allows the knots and coefficients to be calculated automatically.

  3. n - 1 interior knots: rho input by user

    This is an extremely versatile method, as the knots are fixed and smoothing depends only on the parameter rho, which can be varied interactively. The factor rho controls the amount of tension in the spline, as it multiplies the integral of the squared second derivative of the spline in the objective function, which is

    WSSQ + rho*(integral of [f''(x)]^2 dx)

    Again, large values of rho give smoother curves but a worse fit.

  4. n - 1 interior knots: rho estimated by generalised cross validation

    This could be the method of choice for determining trends in data by fitting an arbitrary smooth curve, as it is fully automatic. However, the previous technique (rho input by user) might be preferred if this leads to over-fitting.

In all cases it is imperative to view the fit to the data to avoid over-fitting or under-fitting, either of which would lead to erroneous estimates of derivatives, area, arc length, curvature, or x predicted from y.
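
The fixed-knot criterion (method 1) and the rho-penalised objective (method 3) can be illustrated with a small sketch. It makes several assumptions and is not the algorithm used by this program: the spline is written in a truncated power basis rather than with the stored knots and coefficients, WSSQ is minimised by solving the normal equations directly, and the penalised objective is only evaluated, not minimised. All function names and data values are hypothetical.

```python
def basis(x, knots):
    """Truncated power basis of a cubic spline with given interior knots."""
    return [1.0, x, x * x, x ** 3] + [max(x - k, 0.0) ** 3 for k in knots]

def spline_value(x, coef, knots):
    return sum(c * v for c, v in zip(coef, basis(x, knots)))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c

def fit_fixed_knots(x, y, s, knots):
    """Coefficients minimising WSSQ for fixed interior knots."""
    rows = [[v / si for v in basis(xi, knots)] for xi, si in zip(x, s)]
    rhs = [yi / si for yi, si in zip(y, s)]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(p)]
    return solve(A, b)

def wssq(x, y, s, coef, knots):
    """WSSQ = sum of {[y(i) - f(x(i))]/s(i)}^2."""
    return sum(((yi - spline_value(xi, coef, knots)) / si) ** 2
               for xi, yi, si in zip(x, y, s))

def second_derivative(x, coef, knots):
    return (2.0 * coef[2] + 6.0 * coef[3] * x
            + sum(6.0 * d * max(x - k, 0.0) for d, k in zip(coef[4:], knots)))

def roughness(coef, knots, a, b):
    """Integral of [f''(x)]^2 over (a, b), knots sorted and inside (a, b).
    f'' is linear between knots, so Simpson's rule is exact on each piece."""
    pts = [a] + list(knots) + [b]
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        g = [second_derivative(t, coef, knots) ** 2
             for t in (lo, 0.5 * (lo + hi), hi)]
        total += (hi - lo) * (g[0] + 4.0 * g[1] + g[2]) / 6.0
    return total

# Made-up noise-free data from a known spline with one interior knot.
knots = [1.0]
true_coef = [1.0, -2.0, 0.5, 0.3, -1.2]
x = [0.1 * i for i in range(21)]
y = [spline_value(xi, true_coef, knots) for xi in x]
s = [1.0] * len(x)  # s = 1 gives unweighted fitting

coef = fit_fixed_knots(x, y, s, knots)
fit_wssq = wssq(x, y, s, coef, knots)
rho = 0.5  # arbitrary illustrative value
objective = fit_wssq + rho * roughness(coef, knots, x[0], x[-1])
print(coef, fit_wssq, objective)
```

Because the data here are noise-free, the fit should recover the generating coefficients and give WSSQ close to zero. With real data, adding more knots always lowers WSSQ, which is why the number of knots must be kept small or a roughness penalty used.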

Spline files

Files (like spline.tf1) from programs CALCURVE or COMPARE must be formatted as follows.

Title
L 1
K(1)
...
K(M) (i.e. M knots where M = (L + 4)/2)
C(1)
...
C(N) (i.e. N coefficients where N = (L - 4)/2)
J
... (i.e. J further lines of detail)
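
The layout above can be read with a short routine. This is a hedged sketch only: it assumes the second line holds L followed by 1, that L = M + N, and that each knot or coefficient occupies one line. The knot check follows the conditions stated under "Fixed knots". The function names and all values in the sample text are made up for illustration and are not taken from spline.tf1.

```python
def parse_spline_file(text):
    """Return (title, knots, coefficients) from text in the layout above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0]
    L = int(lines[1].split()[0])
    # M = N + 4 for a cubic spline, so L = M + N must be even.
    if L % 2 or L < 12:
        raise ValueError("L must be even and at least 12")
    if len(lines) < 2 + L:
        raise ValueError("file is too short for the stated L")
    M = (L + 4) // 2  # number of knots
    N = (L - 4) // 2  # number of coefficients
    values = [float(ln.split()[0]) for ln in lines[2:2 + L]]
    knots, coefs = values[:M], values[M:]
    assert len(coefs) == N
    return title, knots, coefs

def knots_are_valid(knots, x_min, x_max):
    """First four knots at x_min, last four at x_max, nondecreasing order."""
    if len(knots) < 8:
        return False
    ends_ok = (all(k == x_min for k in knots[:4])
               and all(k == x_max for k in knots[-4:]))
    ordered = all(a <= b for a, b in zip(knots, knots[1:]))
    return ends_ok and ordered

# Hypothetical file contents: L = 14, so M = 9 knots and N = 5 coefficients.
sample = """Hypothetical spline file
14 1
0.0
0.0
0.0
0.0
1.0
2.0
2.0
2.0
2.0
0.5
1.2
0.8
1.9
2.4
1
made-up trailing detail line
"""

title, knots, coefs = parse_spline_file(sample)
print(title, len(knots), len(coefs), knots_are_valid(knots, 0.0, 2.0))
```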