Fitting
Abstract
SIMFIT has a set of simple user-friendly programs for fitting
the well known models used in enzyme kinetics, ligand binding,
pharmacokinetics, survival analysis and growth curve estimation.
There are also programs for data smoothing and model-free fitting
as well as procedures that advanced users can employ for fitting
systems of nonlinear differential equations, functions of several
variables and user-supplied models.
Details
Curve fitting is an advanced area of data analysis which
requires that you are familiar with the following areas:
- Choosing a model and verifying that it fits adequately
- Collecting data of sufficient quality to be analysed
- Estimating sensible data weighting factors
- Selecting a suitable optimisation method
- Choosing starting estimates and data scaling so that the
  objective function, internal parameters and condition number
  of the projected Hessian are of order unity at the solution
- Interpreting goodness of fit criteria
Since all of this may be beyond the capacity of many users,
SIMFIT provides programs which attempt to do the above by
automatic methods for the models most often fitted in the
sciences (growth curves, survival data, ligand binding curves,
steady state enzyme kinetics, pharmacokinetics, etc.). There
are items for advanced users, but all users should realise that
optimisation can converge to non-unique local minima, and those
trying to fit systems of differential equations or functions of
several variables must be competent in mathematics and statistics.
SIMFIT advanced curve fitting functions
QNFIT is a program where you can choose a model from a library
or input your own model for curve-fitting by the quasi-Newton
method. With some versions you can also choose the Simplex,
Gauss-Newton, Levenberg-Marquardt or sequential quadratic
programming technique. Proceed as follows:
- Prepare files using program MAKFIL (or MAKDAT/ADDERR)
- Check and scale data in the files (use program EDITFL)
- Read the files into QNFIT and decide on the method to use.
- Select a curve-fitting equation (from library provided)
- Choose a first set of trial parameter starting estimates
- Experiment iteratively with random starts, some parameters
fixed, some constrained (restricted to chosen intervals,
etc.) until you have found a good solution point, where the
internal parameters are of order one, where the gradient
vector is small and the condition number of the internal
Hessian matrix is not too large. Examine the goodness of
fit criteria and residuals until satisfied that the best
possible fit has been obtained for the chosen model.
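
The iterative search described above can be sketched outside SIMFIT
as follows. This is only an illustration under stated assumptions
and is not QNFIT's own algorithm: it swaps in a damped Gauss-Newton
step for the quasi-Newton method, uses a Michaelis-Menten model with
hypothetical noise-free data, and the names model, wssq, gauss_newton
and multi_start are invented for the example. Random starting
estimates are drawn from a chosen interval and the solution with the
smallest WSSQ is kept.

```python
import random


def model(x, vmax, km):
    """Michaelis-Menten curve, chosen here only as an example model."""
    return vmax * x / (km + x)


def wssq(params, data):
    """Weighted sum of squares over rows (x, y, s)."""
    vmax, km = params
    return sum(((y - model(x, vmax, km)) / s) ** 2 for x, y, s in data)


def gauss_newton(start, data, iterations=50):
    """Damped Gauss-Newton from one start; parameters are kept positive."""
    vmax, km = start
    best = wssq((vmax, km), data)
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J'J) d = J'r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for x, y, s in data:
            r = (y - model(x, vmax, km)) / s
            j1 = x / (km + x) / s                # d(model)/d(vmax), weighted
            j2 = -vmax * x / (km + x) ** 2 / s   # d(model)/d(km), weighted
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * r
            b2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det <= 0.0:
            break
        d1 = (a22 * b1 - a12 * b2) / det
        d2 = (a11 * b2 - a12 * b1) / det
        # Halve the step until WSSQ decreases and both parameters stay positive.
        step = 1.0
        while step > 1e-6:
            trial = (vmax + step * d1, km + step * d2)
            if min(trial) > 0.0 and wssq(trial, data) < best:
                vmax, km = trial
                best = wssq(trial, data)
                break
            step /= 2.0
        else:
            break  # no improving step found: treat as converged
    return (vmax, km), best


def multi_start(data, n_starts=20, low=0.1, high=10.0, seed=0):
    """Fit from several random starts and keep the smallest WSSQ."""
    rng = random.Random(seed)
    best_params, best_wssq = None, float("inf")
    for _ in range(n_starts):
        start = (rng.uniform(low, high), rng.uniform(low, high))
        params, value = gauss_newton(start, data)
        if value < best_wssq:
            best_params, best_wssq = params, value
    return best_params, best_wssq


# Hypothetical noise-free data generated with Vmax = 2, Km = 0.5 and s = 1.
data = [(x, model(x, 2.0, 0.5), 1.0) for x in (0.1, 0.25, 0.5, 1.0, 2.0, 5.0)]
(vmax_fit, km_fit), wssq_fit = multi_start(data)
```

With real, noisy data different starts may settle on different
local minima, which is why the text recommends repeating the fit
and then examining the goodness of fit criteria and residuals.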
SIMFIT user-friendly curve-fitting programs
The only way to be sure of achieving a best fit to data with
a chosen model is interactive fitting as just described.
As this approach is complicated, a set of more user-friendly
programs is provided, where most of the decisions required
for model discrimination and parameter estimation are taken
automatically. These programs read in and then scale the data
before using some specialised techniques to choose possible
starting parameter estimates. After fitting, using one or a
sequence of models, best-fit parameters are output together
with statistics for you to evaluate the goodness of fit, and
any parameter redundancy.
Advice
Use MAKFIL to prepare a file with all the data (not
means of replicates) and x in increasing order.
POLNOM (certainly) and GCFIT (probably) will find
best-fit parameters but usually high order models
must be fitted several times to be sure of locating
an optimum solution point.
The user-friendly curve-fitting programs
exfit     Sums of exponentials (unconstrained)
gcfit     Growth models (exponential/monomolecular/Richards,
          Von Bertalanffy/Gompertz/Logistic/Preece-Baines)
          Also Weibull-type survival models can be fitted.
hlfit     High/low affinity ligand binding (constrained)
mmfit     Sum of Michaelis-Menten functions (constrained)
polnom    Polynomials (unconstrained-Chebyshev)
rffit     Positive n:n rational functions (constrained)
sffit     n:n saturation function with positive or negative
          cooperativity of ligand binding (constrained)
csafit    Flow cytometry histograms with stretch and shift
inrate    Calculates initial values and rates and any final
          inclined/horizontal asymptotes using polynomials,
          monomolecular, Hill, Michaelis-Menten or lag-phase
          equations.
calcurve  Cubic splines and calibration curves (fixed knots)
compare   Cubic splines under tension (using variable knots)
          Also calculates derivatives, areas, arc length and
          absolute curvature and writes spline files.
The chi-square test for goodness of fit
Let WSSQ = weighted sum of squares and NDOF = number of degrees
of freedom (no. points - no. parameters). If you have set the
weighting factor s = 1, WSSQ/NDOF estimates the (constant)
variance, sigma^2, so you can compare it with any other
independent estimate of the (constant) variance of response y.
If you have set s = exact standard error, WSSQ is a chi-square
variable, and you can consider rejecting a fit if the
probability of a chi-square variable exceeding WSSQ, i.e.
P(chisq >= WSSQ), is < 0.01 (1% significance level) or < 0.05
(5% significance level). Where standard error estimates are
based on 4-5 replicates, you can reasonably decrease the value
of WSSQ by some 20% before using this chi-square test.
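
As a rough illustration of the test just described, the sketch
below computes WSSQ, NDOF and P(chisq >= WSSQ) using only the
Python standard library. The function names and the data are
hypothetical, and the tail probability uses the closed-form series
that holds for integer degrees of freedom; it is not a description
of how SIMFIT performs the calculation.

```python
import math


def chi2_sf(x, ndof):
    """P(chisq >= x) for a chi-square variable with integer ndof >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Start from the known tail for 2 (even) or 1 (odd) degrees of freedom.
    if ndof % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Each pass adds the term that raises the degrees of freedom by 2.
    while a < ndof / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_test(y, fitted, s, n_params):
    """Return (WSSQ, NDOF, P(chisq >= WSSQ)) for one fit."""
    wssq = sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, fitted, s))
    ndof = len(y) - n_params
    return wssq, ndof, chi2_sf(wssq, ndof)


# Hypothetical data: five points, a two-parameter model, s = exact std. err.
y = [1.1, 1.9, 3.2, 3.9, 5.1]
fitted = [1.0, 2.0, 3.0, 4.0, 5.0]
s = [0.1] * 5
wssq, ndof, p = chi_square_test(y, fitted, s, n_params=2)
reject_at_5_percent = p < 0.05
reject_at_1_percent = p < 0.01
```

In this made-up case the fit would be rejected at the 5% level but
not at the 1% level.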
The t test for parameter redundancy
The number T = (0 - parameter estimate)/(standard error) can
be referred to the t(NDOF) distribution to assess parameter
redundancy, where P(t <= -|T|) = P(t >= |T|) = alpha.
A two tail p value is defined as p = 2*alpha, and a parameter
is significantly different from 0 if p < 0.01 (1%) or, less
stringently, p < 0.05 (5%).
Parameter correlation can be assessed from the corresponding
elements of the correlation matrix.
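
The two tail p value defined above can be approximated with the
standard library alone. In this sketch, the function names are
hypothetical and the tail area is obtained by Simpson's rule applied
to the t density, which is an assumption made for the example rather
than SIMFIT's method.

```python
import math


def t_pdf(t, ndof):
    """Density of the t distribution with ndof degrees of freedom."""
    log_c = (math.lgamma((ndof + 1) / 2.0) - math.lgamma(ndof / 2.0)
             - 0.5 * math.log(ndof * math.pi))
    return math.exp(log_c - (ndof + 1) / 2.0 * math.log1p(t * t / ndof))


def two_tail_p(estimate, std_err, ndof, intervals=2000):
    """p = 2*alpha, where alpha = P(t >= |T|) and T = (0 - estimate)/std_err."""
    big_t = abs((0.0 - estimate) / std_err)
    # Simpson's rule (even number of intervals) for P(0 <= t <= |T|).
    h = big_t / intervals
    total = t_pdf(0.0, ndof) + t_pdf(big_t, ndof)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, ndof)
    central = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * central)


# Hypothetical parameter: estimate 2.5 with standard error 1.0 and NDOF = 10.
p = two_tail_p(2.5, 1.0, 10)
different_from_zero_at_5_percent = p < 0.05
```

A parameter whose p value is large is a candidate for removal from
the model, subject to the correlation check mentioned above.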
The F test
The F test is very useful for discriminating between models
with up to say 3 or 4 parameters. For models with more than 4
parameters, calculated F test statistics are no longer even
approximately F distributed, but they do estimate the extent
to which model error is contributing to excess variance from
fitting a deficient model. It is unlikely that you will ever
have data that is good enough to discriminate between models
with more than 5 or 6 parameters in any case.
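
The text does not give the formula for the F statistic. The sketch
below assumes the usual extra sum of squares form for comparing a
smaller model with a larger one that contains it; the function name
and the numbers are hypothetical.

```python
def f_statistic(wssq_small, n_params_small, wssq_large, n_params_large, n_points):
    """Extra sum of squares F statistic (assumed form, nested models)."""
    # Mean reduction in WSSQ per extra parameter.
    extra = (wssq_small - wssq_large) / (n_params_large - n_params_small)
    # Residual mean square of the larger model.
    residual = wssq_large / (n_points - n_params_large)
    return extra / residual


# Hypothetical fits to 16 points: 2 parameters versus 4 parameters.
f_value = f_statistic(30.0, 2, 12.0, 4, 16)
degrees_of_freedom = (4 - 2, 16 - 4)
```

The value would be referred to the F distribution with the two
degrees of freedom shown, bearing in mind the warning above about
models with more than 4 parameters.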
Analysis of residuals
The plot of residuals (or better weighted residuals) against
dependent or independent variable or best-fit response is a
traditional (arbitrary) approach that should always be used.
The signs test is weak, so it should be taken rather seriously
if it does recommend rejection (P(signs <= observed) < 0.01
(or < 0.05)).
The run test conditional on the sum of positive and negative
residuals is similarly weak, but the run test conditional on
observed positive and negative residuals is quite reliable,
especially if the sample size is fairly large (> 20 ?). Reject
if P(runs <= observed) is < 0.01 (1%) (or < 0.05 (5%)).
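
The two tests can be sketched as follows. The function names are
hypothetical, the signs test is assumed to be a binomial test with
probability 1/2 on the count of the rarer sign, and the run test
shown is the exact one conditional on the observed numbers of
positive and negative residuals. Residuals must be supplied in the
order of the independent variable.

```python
from math import comb


def sign_counts(residuals):
    """Counts of positive and negative residuals (exact zeros are dropped)."""
    n_pos = sum(1 for r in residuals if r > 0)
    n_neg = sum(1 for r in residuals if r < 0)
    return n_pos, n_neg


def count_runs(residuals):
    """Number of runs of like signs, in the order given."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def sign_test_p(n_pos, n_neg):
    """P(count of the rarer sign <= observed) under Binomial(n, 1/2)."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def runs_pmf(r, n_pos, n_neg):
    """P(exactly r runs) given n_pos >= 1 positive and n_neg >= 1 negative signs."""
    k, odd = divmod(r, 2)
    if odd:
        ways = (comb(n_pos - 1, k) * comb(n_neg - 1, k - 1)
                + comb(n_pos - 1, k - 1) * comb(n_neg - 1, k))
    else:
        ways = 2 * comb(n_pos - 1, k - 1) * comb(n_neg - 1, k - 1)
    return ways / comb(n_pos + n_neg, n_pos)


def runs_test_p(residuals):
    """P(runs <= observed), conditional on the observed sign counts."""
    n_pos, n_neg = sign_counts(residuals)
    if n_pos == 0 or n_neg == 0:
        return 1.0  # only one arrangement is possible
    observed = count_runs(residuals)
    return sum(runs_pmf(r, n_pos, n_neg) for r in range(2, observed + 1))


# Hypothetical residuals: four positive followed by four negative.
residuals = [0.5, 0.3, 0.8, 0.2, -0.4, -0.6, -0.1, -0.7]
p_signs = sign_test_p(*sign_counts(residuals))
p_runs = runs_test_p(residuals)
```

Here the signs are balanced, so the signs test gives no warning,
while the run test flags the two long runs at the 5% level.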
How good is the fit?
If you have s = 1, WSSQ/NDOF should be about the same as the
square of the (constant?) standard deviation of y. Consider
rejecting the fit if there is poor agreement. If s = sample
standard deviation of y (which may be the best choice?) then
WSSQ is approximately chi-square and should be around NDOF.
Relative residuals do not depend on s. They should not be
larger than 25%; there should not be too many symbols ***,
****, or ***** in the residuals table; and the average
relative residual should not be much larger than 10%. These
are all convenient tests for the magnitude of the difference
between your data and the best-fit curve.
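
A minimal sketch of these magnitude checks follows. It assumes that
a relative residual is 100*|y - fitted|/|y| with nonzero y, which
may differ in detail from the definition SIMFIT uses, and it treats
the 25% and 10% guidelines as hard limits for simplicity. The
function names and data are hypothetical.

```python
def relative_residuals(y, fitted):
    """Percentage relative residuals, 100*|y - fitted|/|y| (assumed definition)."""
    return [100.0 * abs(yi - fi) / abs(yi) for yi, fi in zip(y, fitted)]


def check_relative_residuals(y, fitted, max_allowed=25.0, mean_allowed=10.0):
    """Summarise the largest and average relative residuals against the guidelines."""
    rel = relative_residuals(y, fitted)
    largest = max(rel)
    average = sum(rel) / len(rel)
    return {
        "largest": largest,
        "average": average,
        "largest_ok": largest <= max_allowed,
        "average_ok": average <= mean_allowed,
    }


# Hypothetical data in which the last point is fitted badly.
summary = check_relative_residuals([10.0, 20.0, 40.0, 80.0],
                                   [11.0, 19.0, 42.0, 50.0])
```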
A graph of the best-fit curve should show the data scattered
randomly above and below the fitted curve and the number of
positive and negative residuals should be about the same. The
table of residuals should be free from long runs of the same
signs and the plot of weighted residuals against independent
variable should be like a sample from a normal distribution
with mu = 0 and sigma = 1. The sign and run tests help you to
detect any correlations in the residuals.
Choosing a curve fitting program
Data increases/decreases exponentially              exfit
Multi-exponential-pharmacokinetics                  exfit
Sigmoid/monotonic increasing growth curve           gcfit
Sigmoid/monotonic decreasing survival fit           gcfit
One or more Michaelis-Menten isoenzymes             mmfit
One or more high/low affinity sites                 hlfit
Ligand binding to one/more linked sites             sffit
Pharmacological dose response curves                hlfit
Steady state enzyme kinetics (one variable)         rffit
Straight line or gentle curve                       polnom
Calibration curve (with 95% confidence limits)      calcurve
Data smoothing/arbitrary curve                      compare
Plot error bars/compare/slopes/area under curves    compare
Stretch/translation of flow cytometry histograms    csafit
Estimate final-asymptotes/initial-rates/lag-times   inrate
Linear and generalised linear models                linfit
Integrated progress curves/differential equations   deqsol
Many-variables/user-supplied-model/diff.-eqn.       qnfit