POLNOM: help and advice


Consult the reference manual for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.
Alternative ways to use this program

There are several alternative interfaces to this program provided by Simfit.

Best-fit polynomials by weighted least squares

This program requires x,y,s data, where y is measured as a function of x, and s are estimated standard deviations of y for weighting (or all s = 1 for unweighted fitting). It then fits a sequence of polynomials

f(x) = p(0) + p(1)x + p(2)x^2 + ... + p(n)x^n.

However, inside the program, x is mapped into coordinates z where -1 <= z <= 1, so that the function can be represented in Chebyshev form as

g(z) = A0*T0(z)/2 + A1*T1(z) + A2*T2(z) + ... + An*Tn(z)

where Ti are Chebyshev polynomials of degree i.

All of your communication with the program is in external coordinates. The coefficients Ai are also output, however, so that if you choose to input your data as a Chebyshev orthogonal set, the internal and external coordinates will be the same and the Ai will have their special statistical properties.
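The Chebyshev form above can be illustrated with a short sketch. It assumes the mapping from x to z is the usual linear one that sends the smallest x to -1 and the largest x to 1; the text only says that x is "mapped", so this is an assumption, and the function names are hypothetical.

```python
def to_internal(x, x_min, x_max):
    """Map external x in [x_min, x_max] to internal z in [-1, 1].

    Assumption: a linear mapping, which the help text does not spell out.
    """
    return (2.0 * x - x_min - x_max) / (x_max - x_min)


def chebyshev_sum(coeffs, z):
    """Evaluate g(z) = A0*T0(z)/2 + A1*T1(z) + ... + An*Tn(z).

    Uses the recurrence T(k+1)(z) = 2*z*T(k)(z) - T(k-1)(z),
    starting from T0(z) = 1 and T1(z) = z.
    """
    t_prev, t_curr = 1.0, z
    total = 0.5 * coeffs[0]          # note the factor 1/2 on the A0 term
    for a in coeffs[1:]:
        total += a * t_curr
        t_prev, t_curr = t_curr, 2.0 * z * t_curr - t_prev
    return total


# With A = [2, 0, 1], g(z) = 1 + T2(z) = 1 + (2*z**2 - 1) = 2*z**2.
z = to_internal(7.5, 0.0, 10.0)      # z = 0.5
print(z, chebyshev_sum([2.0, 0.0, 1.0], z))
```

Note the halved A0 term: libraries that define the series without that factor will report a leading coefficient equal to A0/2.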

Predicting x given mean y (Inverse prediction/Calibration)

Choose the lowest degree n which passes the F and chi-square tests (at the 5% or 1% significance level), or else the lowest degree where the weighted sum of squares WSSQ, or SIGMA = SQRT(WSSQ/NDOF), stabilises. This should normally be 1, 2 or at most 3 or 4. The 95% confidence limits estimated by the program will only be exact if the true model is chosen and the values you have supplied as standard errors for y are proportional to the true standard deviations of y as a function of x.

Predicting x from y is ambiguous when turning points occur. If the best-fit polynomial you choose has turning points you will be warned. In the event of ambiguity you can choose to search upwards for the smallest x, or downwards for the largest x. You may also decide to replace x by log(x) or 1/x etc., but remember to change the standard errors of y as necessary if you transform the y data. Use LINFIT for linear calibration or CALCURVE to fit cubic splines, which may be better for inverse prediction with some complicated data sets.
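The ambiguity caused by a turning point can be seen in a small sketch. This is not the search used by POLNOM: it simply scans a grid for sign changes and refines each one by bisection, and the function names are hypothetical. The polynomial is the exact model quoted in the title of the test file polnom.tf1, which has a turning point at x = 10.

```python
def poly(p, x):
    """Evaluate p[0] + p[1]*x + ... + p[n]*x**n by Horner's rule."""
    result = 0.0
    for c in reversed(p):
        result = result * x + c
    return result


def solutions(p, y, x_lo, x_hi, steps=1000):
    """Return, in increasing order, the x in [x_lo, x_hi] where poly(p, x) = y.

    Scans `steps` equal subintervals for a sign change of poly(p, x) - y
    and refines each bracket by bisection.
    """
    roots = []
    h = (x_hi - x_lo) / steps
    for i in range(steps):
        a, b = x_lo + i * h, x_lo + (i + 1) * h
        fa, fb = poly(p, a) - y, poly(p, b) - y
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            for _ in range(60):
                mid = 0.5 * (a + b)
                fm = poly(p, mid) - y
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    if poly(p, x_hi) == y:
        roots.append(x_hi)
    return roots


p = [0.1, 2.0, -0.1]                  # y = 0.1 + 2x - 0.1x**2
roots = solutions(p, 5.0, 0.0, 20.0)  # range chosen to straddle the turning point
print("searching upwards finds  ", roots[0])
print("searching downwards finds", roots[-1])
```

Here y = 5 corresponds to two values of x, one on each side of the turning point, so the direction of search decides which one is reported.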

How to supply data

Simfit requires data sets for a data matrix X, that is x(i,j), where there are n rows and m columns, i.e., 1 <= i <= n and 1 <= j <= m.
This matrix can consist of any combination of integers (e.g. 7, 11, 23), numbers in decimal notation (e.g. 7.0, 11.0, 23.9), or numbers in scientific notation (e.g. 7.0E+00, 1.1E+01, 2.39E+02).
There must be no missing values. The file should start with a short meaningful title, followed by a line consisting of the integers n and m, then the matrix in either space-separated or comma-separated form.
After the data matrix the file may end, or it may continue with an integer k >= 1 followed by k lines of arbitrary extra text, or with special constructs such as row and column labels for plotting, or starting estimates and limits for curve fitting, etc.
Here, for example, is the Simfit test file MATRIX.TF1 using comma-separated variables in scientific notation.

Test file matrix.tf1: arbitrary 5 by 5 matrix
     5     5
  1.2000E+00,  4.5000E+00,  6.1000E+00,  7.2000E+00,  8.0000E+00
  3.0000E+00,  5.6000E+00,  3.7000E+00,  9.1000E+00,  1.2500E+01
  1.7100E+01,  2.3400E+01,  5.5000E+00,  9.2000E+00,  3.3000E+00
  7.1500E+00,  5.8700E+00,  9.9400E+00,  8.8200E+00,  1.0800E+01
  1.2400E+01,  4.3000E+00,  7.7000E+00,  8.9500E+00,  1.6000E+00
     1
Default line
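As an illustration of the layout just shown, here is a minimal sketch of a reader for such a file. It is not part of Simfit, the function name is hypothetical, and it handles only what the example demonstrates: a title, the n and m line, n rows that are comma-separated or space-separated, and an optional count k followed by k lines of text. Special constructs such as labels or starting estimates are not handled.

```python
SAMPLE = """Test file matrix.tf1: arbitrary 5 by 5 matrix
     5     5
  1.2000E+00,  4.5000E+00,  6.1000E+00,  7.2000E+00,  8.0000E+00
  3.0000E+00,  5.6000E+00,  3.7000E+00,  9.1000E+00,  1.2500E+01
  1.7100E+01,  2.3400E+01,  5.5000E+00,  9.2000E+00,  3.3000E+00
  7.1500E+00,  5.8700E+00,  9.9400E+00,  8.8200E+00,  1.0800E+01
  1.2400E+01,  4.3000E+00,  7.7000E+00,  8.9500E+00,  1.6000E+00
     1
Default line
"""


def parse_matrix_file(text):
    """Return (title, rows, extra_lines) from text in the layout shown above."""
    lines = text.splitlines()
    title = lines[0].strip()
    n, m = (int(token) for token in lines[1].split())
    rows = []
    for line in lines[2:2 + n]:
        # Treat commas as whitespace so both separators are accepted.
        values = [float(token) for token in line.replace(",", " ").split()]
        if len(values) != m:
            raise ValueError(f"expected {m} values, found {len(values)}")
        rows.append(values)
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, found {len(rows)}")
    trailer = lines[2 + n:]
    extra = []
    if trailer and trailer[0].strip():
        k = int(trailer[0].split()[0])   # number of extra text lines
        extra = trailer[1:1 + k]
    return title, rows, extra


title, rows, extra = parse_matrix_file(SAMPLE)
print(title)
print(len(rows), "rows,", len(rows[0]), "columns")
print(extra)
```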
Each time you run a Simfit program you can view the file format for the default test file supplied so you will know how to prepare your own data.
To prepare your own data you can use a text editor like Notepad, paste a matrix from the clipboard into the file-open control, create a data file using a Simfit program like MAKMAT, or use Excel with the simfit6.xls macro.
Note that some data sets must have restricted formats, like curve fitting files that must have column 1 in nondecreasing order and positive weights in column 3 if required, as shown next.
Here is the file POLNOM.TF1, consisting of triplicate observations at six values of the independent variable: x in nondecreasing order in column 1, observations y in column 2, and positive weighting factors s in column 3.
Note that, for unweighted fitting (i.e. assuming constant variance), column 3 can be omitted or set to 1.
Test file polnom.tf1: y = 0.1 + 2x - 0.1x**2 (with 5% rel. err.)
    18     3
  0.0000E+00,  9.8421E-02,  5.6072E-03
  0.0000E+00,  1.0950E-01,  5.6072E-03
  0.0000E+00,  1.0248E-01,  5.6072E-03
  2.0000E+00,  3.8448E+00,  5.2139E-02
  2.0000E+00,  3.8647E+00,  5.2139E-02
  2.0000E+00,  3.9434E+00,  5.2139E-02
  4.0000E+00,  6.8490E+00,  3.8867E-01
  4.0000E+00,  6.1469E+00,  3.8867E-01
  4.0000E+00,  6.2091E+00,  3.8867E-01
  6.0000E+00,  8.5864E+00,  2.2982E-01
  6.0000E+00,  9.0156E+00,  2.2982E-01
  6.0000E+00,  8.6585E+00,  2.2982E-01
  8.0000E+00,  9.8616E+00,  4.5524E-01
  8.0000E+00,  9.8748E+00,  4.5524E-01
  8.0000E+00,  9.0798E+00,  4.5524E-01
  1.0000E+01,  9.5218E+00,  5.1790E-01
  1.0000E+01,  9.3098E+00,  5.1790E-01
  1.0000E+01,  1.0294E+01,  5.1790E-01
     7
Test data for program POLNOM
Polynomial of degree (N - 1)
y(x) = P(1) + P(2)*x + ... + P(N)*(x**(N - 1))
NMOD = N
NPAR = N
Exact data prepared by program MAKDAT
RANDOM ERROR ADDED USING PROGRAM ADDERR
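
To show how WSSQ and SIGMA = SQRT(WSSQ/NDOF) behave as the degree increases, here is a minimal sketch that fits polynomials of degree 0 to 3 to the POLNOM.TF1 data above by weighted least squares. It is an illustration only: it solves the normal equations directly in the external x coordinates, whereas the program itself works with Chebyshev polynomials in internal coordinates, and the function names are hypothetical.

```python
import math

# Triplicate observations from POLNOM.TF1 as (x, [y1, y2, y3], s).
GROUPS = [
    (0.0, [9.8421e-02, 1.0950e-01, 1.0248e-01], 5.6072e-03),
    (2.0, [3.8448, 3.8647, 3.9434], 5.2139e-02),
    (4.0, [6.8490, 6.1469, 6.2091], 3.8867e-01),
    (6.0, [8.5864, 9.0156, 8.6585], 2.2982e-01),
    (8.0, [9.8616, 9.8748, 9.0798], 4.5524e-01),
    (10.0, [9.5218, 9.3098, 1.0294e+01], 5.1790e-01),
]
DATA = [(x, y, s) for x, ys, s in GROUPS for y in ys]


def solve(a, b):
    """Solve the linear system a*p = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (m[r][n] - tail) / m[r][r]
    return p


def fit(data, degree):
    """Return (p, WSSQ, SIGMA) for the weighted least squares polynomial."""
    k = degree + 1
    # Normal equations with weights 1/s**2.
    a = [[sum(x ** (i + j) / s ** 2 for x, _, s in data) for j in range(k)]
         for i in range(k)]
    b = [sum(y * x ** i / s ** 2 for x, y, s in data) for i in range(k)]
    p = solve(a, b)
    wssq = sum(((y - sum(c * x ** i for i, c in enumerate(p))) / s) ** 2
               for x, y, s in data)
    ndof = len(data) - k
    return p, wssq, math.sqrt(wssq / ndof)


for degree in range(4):
    p, wssq, sigma = fit(DATA, degree)
    print(f"degree {degree}: WSSQ = {wssq:.4g}, SIGMA = {sigma:.4g}")
```

The degree at which WSSQ and SIGMA stop falling appreciably is the candidate to examine; the F and chi-square tests reported by the program are not reproduced here.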

Advice

  1. Choose sensible units for x and y, say -1000 <= x,y <= 1000
  2. Use all data and not sample means of y for the data input
  3. Use sample standard deviations s for weighting only if the sample size at each x-setting is at least 5
  4. If in doubt set all s = 1 (unweighted) or use s = 7.5% of measured-y (constant relative error)
  5. Use MAKFIL/EDITFL to prepare/edit/weight your data files
  6. Use MAKMAT/EDITMT to prepare/edit evaluation/prediction files, i.e. vectors of x or y values
  7. Avoid choosing too high a degree for the best-fit polynomial
  8. Always check the best-fit polynomial graphically to make sure it is not oscillatory
  9. Use the option to input x and predict y only if you want details of the best-fit curve with upper and lower 95% confidence limits
  10. Remember that (Garbage in) = (Garbage out). If you input ridiculous values for the standard deviations of y, or choose a badly fitting best-fit curve, any predictions or confidence limits will be useless.