Fitting

Abstract

SIMFIT has a set of simple user-friendly programs for fitting the well known models used in enzyme kinetics, ligand binding, pharmacokinetics, survival analysis and growth curve estimation. There are also programs for data smoothing and model-free fitting as well as procedures that advanced users can employ for fitting systems of nonlinear differential equations, functions of several variables and user-supplied models.

Details

Curve fitting is a very advanced area of data analysis which requires that you are familiar with the following areas:-

Since all of this may be beyond the capacity of many users, SIMFIT provides programs which attempt to do the above by automatic methods for the models most often fitted in the sciences (growth curves, survival data, ligand binding curves, steady state enzyme kinetics, pharmacokinetics, etc.). There are items for advanced users, but all users should realise that optimisation can converge to non-unique local minima, and those trying to fit systems of differential equations or functions of several variables must be competent in maths and statistics.

SIMFIT advanced curve fitting functions

QNFIT is a program where you can choose a model from a library or input your own model for curve-fitting by the quasi-Newton method. With some versions you can also choose the Simplex, Gauss-Newton, Levenberg-Marquardt or sequential quadratic programming technique instead. Proceed as follows:

SIMFIT user-friendly curve-fitting programs

The only way to be sure of achieving a best fit to data with a chosen model is the interactive fitting as just described. As this approach is complicated, a set of more user-friendly programs is provided, where most of the decisions required for model discrimination and parameter estimation are taken automatically. These programs read in and then scale the data before using some specialised techniques to choose possible starting parameter estimates. After fitting, using one or a sequence of models, best-fit parameters are output together with statistics for you to evaluate the goodness of fit, and any parameter redundancy.

Advice

Use MAKFIL to prepare a file with all the data (not means of replicates) and x in increasing order. POLNOM (certainly) and GCFIT (probably) will find best-fit parameters but usually high order models must be fitted several times to be sure of locating an optimum solution point.

The user-friendly curve-fitting programs

exfit       Sums of exponentials (unconstrained)
gcfit       Growth models (exponential/monomolecular/Richards,
            Von Bertalanffy/Gompertz/Logistic/Preece-Baines)
            Also Weibull-type survival models can be fitted.
hlfit       High/low affinity ligand binding (constrained)
mmfit       Sum of Michaelis Menten functions (constrained)
polnom      Polynomials (unconstrained-Chebyshev)
rffit       Positive n:n rational functions (constrained)
sffit       n:n saturation function with positive or negative
            cooperativity of ligand binding (constrained)
csafit      Flow cytometry histograms with stretch and shift
inrate      Calculates initial values and rates and any final
            inclined/horizontal asymptotes using polynomials,
            monomol, Hill, Michaelis-Menten or lag-phase eqn.
calcurve    Cubic splines and calibration curves (fixed knots)
compare     Cubic splines under tension (using variable knots)
            Also calculates derivatives, areas, arc length and
            absolute curvature and writes spline files.

The chi-square test for goodness of fit

Let WSSQ = weighted sum of squares and NDOF = no. degrees of freedom (no. points - no. parameters). If you have set s = 1, WSSQ/NDOF estimates the (constant) variance, sigma^2, so you can compare it with any other independent estimate of the (constant) variance of response y. If you had set s = exact std. err., WSSQ would be a chi-square variable, and you could consider rejecting a fit if the probability of a chi-square variable exceeding WSSQ (i.e. P(chisq >= WSSQ)) is < 0.01 (1%) or < 0.05 (5% significance level). Where standard error estimates are based on 4-5 replicates, you can reasonably decrease the value of WSSQ by some 20% before using this chi-square test.
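The calculation just described can be sketched in a few lines of Python. This is only an illustration, not the code SIMFIT uses: the function names chi2_upper_tail and chi_square_fit_test are hypothetical, s is assumed to be a list holding one standard error per point, and the tail probability uses the standard finite-sum formulas for integer degrees of freedom.

```python
import math

def chi2_upper_tail(x, dof):
    """P(chisq >= x) for a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-z) * sum_{i=0}^{dof/2 - 1} z^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= z / i
            total += term
        return math.exp(-z) * total
    # Odd dof: normal tail plus a finite sum in odd powers of sqrt(x)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (dof + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(z)) + 2.0 * math.exp(-z) / math.sqrt(2.0 * math.pi) * total

def chi_square_fit_test(y, fitted, s, n_params):
    """Return (WSSQ, NDOF, P(chisq >= WSSQ)) for data y and best-fit values."""
    wssq = sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, fitted, s))
    ndof = len(y) - n_params
    return wssq, ndof, chi2_upper_tail(wssq, ndof)
```

For example, chi_square_fit_test(y, fitted, s, 2) for a two-parameter model returns the probability to compare with 0.01 or 0.05. The result is only meaningful when s holds realistic standard errors, as explained above.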

The t test for parameter redundancy

The number T = (0 - parameter estimate)/(standard error) can be referred to the t(NDOF) distribution to assess parameter redundancy, where P(t <= -|T|) = P(t >= |T|) = alpha. A two tail p value is defined as p = 2*alpha, and parameters are significantly different from 0 if p < 0.01 (1%) or p < 0.05 (5%). Parameter correlation can be assessed from the corresponding elements of the correlation matrix.
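As a rough illustration of the two tail p value defined above, the following Python sketch integrates the t(NDOF) density numerically by Simpson's rule. The function names t_two_tail_p and redundancy_test are hypothetical, and the numerical integration is a convenience chosen to avoid external libraries, not the method SIMFIT uses.

```python
import math

def t_two_tail_p(t_value, ndof, steps=2000):
    """Two tail p = 2*P(t >= |T|) for the t(ndof) distribution.

    Uses Simpson's rule (steps must be even) on the density over [0, |T|].
    """
    upper = abs(t_value)
    if upper == 0.0:
        return 1.0
    log_c = (math.lgamma((ndof + 1) / 2.0) - math.lgamma(ndof / 2.0)
             - 0.5 * math.log(ndof * math.pi))

    def density(u):
        return math.exp(log_c - 0.5 * (ndof + 1) * math.log1p(u * u / ndof))

    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central = total * h / 3.0          # P(0 <= t <= |T|)
    return max(0.0, 1.0 - 2.0 * central)

def redundancy_test(estimate, std_error, ndof):
    """Return (T, two tail p) for testing whether a parameter differs from 0."""
    t_value = (0.0 - estimate) / std_error
    return t_value, t_two_tail_p(t_value, ndof)
```

A parameter whose p value from redundancy_test is well above 0.05 is a candidate for removal from the model, subject to the correlation matrix check mentioned above.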

The F test

The F test is very useful for discriminating between models with up to say 3 or 4 parameters. For models with more than 4 parameters, calculated F test statistics are no longer even approximately F distributed, but they do estimate the extent to which model error is contributing to excess variance from fitting a deficient model. It is unlikely that you will ever have data that is good enough to discriminate between models with more than 5 or 6 parameters in any case.
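This page does not state the formula for the F statistic. One common choice for comparing two nested models is the extra sum of squares form, sketched below in Python on the assumption that this is the kind of statistic intended. The function name f_statistic and its arguments are hypothetical.

```python
def f_statistic(wssq_small, n_params_small, wssq_large, n_params_large, n_points):
    """Extra sum of squares F statistic for two nested models (an assumed form).

    wssq_small: WSSQ from the model with fewer parameters
    wssq_large: WSSQ from the model with more parameters
    The result would be referred to F(extra, ndof_large).
    """
    extra = n_params_large - n_params_small      # numerator degrees of freedom
    ndof_large = n_points - n_params_large       # denominator degrees of freedom
    return ((wssq_small - wssq_large) / extra) / (wssq_large / ndof_large)
```

For instance, with 24 points, WSSQ = 30 for a 2 parameter model and WSSQ = 20 for a 4 parameter model, the statistic is (10/2)/(20/20) = 5. A large value suggests that the extra parameters give a worthwhile reduction in WSSQ.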

Analysis of residuals

The plot of residuals (or better weighted residuals) against dependent or independent variable or best-fit response is a traditional (arbitrary) approach that should always be used. The signs test is weak, so a recommendation to reject should be taken rather seriously (P(signs <= observed) < 0.01 (or < 0.05)). The run test conditional on the sum of positive and negative residuals is similarly weak, but the run test conditional on observed positive and negative residuals is quite reliable, especially if the sample size is fairly large (> 20 ?). Reject if P(runs <= observed) is < 0.01 (1%) (or < 0.05 (5%)).
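The run test conditional on the observed numbers of positive and negative residuals can be illustrated with the exact distribution of the number of runs, as in the Python sketch below. The function names count_runs and runs_lower_tail are hypothetical, zero residuals are simply ignored here, and SIMFIT may compute the probability differently.

```python
import math

def count_runs(residuals):
    """Return (no. positive, no. negative, no. runs), ignoring zero residuals."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b) if signs else 0
    return n_pos, n_neg, runs

def runs_lower_tail(n_pos, n_neg, observed_runs):
    """Exact P(runs <= observed) given n_pos positive and n_neg negative signs."""
    if n_pos == 0 or n_neg == 0:
        return 1.0                      # only one run is possible
    arrangements = math.comb(n_pos + n_neg, n_pos)
    favourable = 0
    for r in range(2, observed_runs + 1):
        k = r // 2
        if r % 2 == 0:                  # r = 2k runs: k runs of each sign
            favourable += 2 * math.comb(n_pos - 1, k - 1) * math.comb(n_neg - 1, k - 1)
        else:                           # r = 2k + 1 runs: k + 1 runs of one sign
            favourable += (math.comb(n_pos - 1, k) * math.comb(n_neg - 1, k - 1)
                           + math.comb(n_pos - 1, k - 1) * math.comb(n_neg - 1, k))
    return favourable / arrangements
```

Too few runs, giving a small probability, indicates that residuals of the same sign are clustered together, which is the usual symptom of fitting a deficient model.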

How good is the fit?

If you have s = 1, WSSQ/NDOF should be about the same as the square of the (constant?) standard deviation of y. Consider rejecting the fit if there is poor agreement. If s = sample standard deviation of y (which may be the best choice?) then WSSQ is approximately chi-square and should be around NDOF. Relative residuals do not depend on s. They should not be larger than 25%, there should not be too many symbols ***, ****, or ***** in the residuals table, and the average relative residual should not be much larger than 10%. These are all convenient tests for the magnitude of the difference between your data and the best-fit curve. A graph of the best-fit curve should show the data scattered randomly above and below the fitted curve, and the number of positive and negative residuals should be about the same. The table of residuals should be free from long runs of the same sign, and the plot of weighted residuals against independent variable should be like a sample from a normal distribution with mu = 0 and sigma = 1. The sign and run tests help you to detect any correlations in the residuals.
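The rules of thumb above can be collected into a small summary, sketched here in Python. The function name fit_summary is hypothetical, and the relative residual is assumed to be 100*|y - fitted|/|y|, which may not be exactly how SIMFIT defines it.

```python
def fit_summary(y, fitted, s, n_params):
    """Collect the quantities used to judge a fit (illustrative only)."""
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    wssq = sum((r / si) ** 2 for r, si in zip(residuals, s))
    ndof = len(y) - n_params
    # Assumed definition: percentage of the observed value, skipping y = 0
    relative = [100.0 * abs(r) / abs(yi) for r, yi in zip(residuals, y) if yi != 0]
    return {
        "wssq": wssq,
        "ndof": ndof,
        "wssq_over_ndof": wssq / ndof,   # variance estimate when s = 1
        "max_relative_pct": max(relative, default=0.0),
        "mean_relative_pct": sum(relative) / len(relative) if relative else 0.0,
        "n_positive": sum(1 for r in residuals if r > 0),
        "n_negative": sum(1 for r in residuals if r < 0),
    }
```

With this summary you would look for wssq close to ndof when s holds sample standard deviations, max_relative_pct below about 25, mean_relative_pct not much above 10, and n_positive roughly equal to n_negative.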

Choosing a curve fitting program

Data increases/decreases exponentially              exfit
Multi-exponential-pharmacokinetics                  exfit
Sigmoid/monotonic increasing growth curve           gcfit
Sigmoid/monotonic decreasing survival fit           gcfit
One or more Michaelis Menten isoenzymes             mmfit
One or more high/low affinity sites                 hlfit
Ligand binding to one/more linked sites             sffit
Pharmacological dose response curves                hlfit
Steady state enzyme kinetics (one variable)         rffit
Straight line or gentle curve                       polnom
Calibration curve (with 95% con. limits)            calcurve
Data smoothing/arbitrary curve                      compare
Plot error bars/compare/slopes/area under curves    compare
Stretch/translation of flow cytometry histograms    csafit
Estimate final-asymptotes/initial-rates/lag-times   inrate
Linear and generalised linear models                linfit
Integrated progress curves/differential equations   deqsol
Many-variables/user-supplied-model/diff.-eqn.       qnfit
