SFFIT: help and advice

Consult the reference manual (w_manual.pdf) and tutorials (w_examples.pdf) for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.

Note: sv_sffit suppresses several SFFIT options to simplify the interface

Introduction

Program SFFIT can be used in one of three experimental situations.

  1. Fitting cooperative ligand binding data where the number of dependent binding sites n is known precisely (e.g., haemoglobin, where n = 4), so the observations can be normalised to lie between 0 and 1.
    In this case the only parameters to be estimated are the binding constants.
  2. Fitting cooperative binding data where the number of dependent binding sites may not be certain, or the proportionality constant between observations and proportional saturation may not be known exactly, or where there may be a background noise component contributing to the observations.
    In this case the values of these extra parameters have to be estimated in addition to the binding constants.
  3. Fitting noisy binding data where the ligand concentration is nonnegative and the observations are positive, and all that is required is to fit a smoothing curve, which may be sigmoid but cannot have turning points, in order to plot the data and estimate the half-saturation point.
It must be realised that adding more parameters to a curve-fitting model decreases the precision of the parameter estimates.

Fitting cooperative ligand binding saturation functions

To be precise, program SFFIT fits monotonically increasing saturation or dose response curves that are appropriate for a protein with n cooperatively linked binding sites. Usually the number of binding sites, n, is known independently, so that an appropriate model is the fractional saturation, or binding isotherm, y normalised so that

0 <= y(x) <= 1, for x >= 0.

However, there are often complications as follows.

To accommodate all these options, this program fits sequential order n saturation functions defined in terms of a binding polynomial p(x) and its first derivative p' by

y(x) = (Z/n)xp'/p + C, where

xp' = K(1)x + 2K(2)x^2 + ... + nK(n)x^n, and

p(x) = 1 + K(1)x + K(2)x^2 + ... + K(n)x^n.

Note that K(i) > 0, n is the number of binding sites, and there are two scaling factors, Z > 0 and C. When x = 0, y = C and, as x tends to infinity, then y tends to Z + C.
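As a concrete illustration, the model above can be evaluated directly from the binding constants. The following Python helper is only a sketch (the function name and interface are hypothetical, not part of SFFIT):

```python
import numpy as np

def saturation_curve(x, K, Z=1.0, C=0.0):
    """Sequential order-n saturation function y(x) = (Z/n)*x*p'(x)/p(x) + C,
    where p(x) = 1 + K[0]*x + K[1]*x^2 + ... + K[n-1]*x^n and K holds the
    overall association constants K(1)..K(n).  Hypothetical helper."""
    x = np.asarray(x, dtype=float)
    n = len(K)
    p = np.ones_like(x)      # p(x) = 1 + sum over i of K(i) x^i
    xdp = np.zeros_like(x)   # x p'(x) = sum over i of i K(i) x^i
    for i, Ki in enumerate(K, start=1):
        term = Ki * x**i
        p += term
        xdp += i * term
    return (Z / n) * xdp / p + C
```

For n = 1 this reduces to y = Kx/(1 + Kx), so y(0) = C and y tends to Z + C as x increases, as stated above.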

The asymptote and half saturation point are calculated numerically. Ideally you should normalise your data to Z = 1 and C = 0, so y(0) = 0 and y tends to 1 as x increases. This can be done with a preliminary run to estimate scaling factors Z and C, and then using program EDITFL to change baseline, units, etc.

Normally you would only require order n = 1 but, if your data are good, you can see if order n = 2 gives a statistically significant improvement. Only if the data are very extensive and of high quality is there any point in trying n >= 4, and the parameter estimates may be of little value anyway. Data for analysis must be in a formatted file that you can prepare, edit, and weight using programs MAKFIL and EDITFL.
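One way to judge whether a higher-order fit is a statistically significant improvement is an extra-parameters F test on the weighted sums of squares from the two nested fits; the sketch below is only illustrative (the function is hypothetical, and SFFIT reports its own discrimination statistics):

```python
def extra_ssq_F(wssq_low, ndof_low, wssq_high, ndof_high):
    """F statistic comparing a lower-order fit (wssq_low with ndof_low
    degrees of freedom) against a nested higher-order fit (wssq_high,
    ndof_high).  Large values favour the higher-order model."""
    extra = (wssq_low - wssq_high) / (ndof_low - ndof_high)
    return extra / (wssq_high / ndof_high)
```

The statistic would then be compared with critical values of the F distribution with (ndof_low - ndof_high, ndof_high) degrees of freedom.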

Alternative definitions for binding constants

With one binding site there are two conventions, but with two or more receptors there are eight.

Suppose receptor R and ligand L interact as in R + L = RL1, then there is an association constant Ka and a dissociation constant Kd, where Ka = [RL1]/([R][L]) = 1/Kd.
Suppose a second interaction takes place as in RL1 + L = RL2, then there are numerous alternatives, as now listed.

Overall association constants

K(1) = [RL1]/([R][L]), K(2) = [RL2]/([R][L]^2)

Adair association constants

A(1) = [RL1]/([R][L]), A(2) = [RL2]/([RL1][L])

Intrinsic association constants

B(1) = A(1)/2, B(2) = 2A(2)

Two independent sites R1 + L = R1L, R2 + L = R2L

Ka(1) = [R1L]/([R1][L]), Ka(2) = [R2L]/([R2][L])

There are also the corresponding reciprocals, i.e., dissociation constants.
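Converting between these conventions is mechanical, following the definitions above. The Python helpers below sketch the general sequential case (the function names are hypothetical, and intrinsic constants assume the statistical-factor relation c_i B(1)...B(i) = A(1)...A(i) with c_i the binomial coefficient):

```python
from math import comb

def adair_to_overall(A):
    """Overall constants are running products of Adair constants:
    K(i) = A(1)*A(2)*...*A(i)."""
    K, prod = [], 1.0
    for Ai in A:
        prod *= Ai
        K.append(prod)
    return K

def adair_to_intrinsic(A):
    """Intrinsic constants B(i) satisfy comb(n, i)*B(1)*...*B(i)
    = A(1)*...*A(i); solve for each B(i) in turn."""
    n = len(A)
    B, prodA, prodB = [], 1.0, 1.0
    for i, Ai in enumerate(A, start=1):
        prodA *= Ai
        Bi = prodA / (comb(n, i) * prodB)
        B.append(Bi)
        prodB *= Bi
    return B
```

For n = 2 this reproduces the relations quoted above, B(1) = A(1)/2 and B(2) = 2A(2).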

Binding polynomials and definitions for cooperativity

When x is free ligand activity [L], the binding polynomial can be expressed in various ways, such as

p(x) = 1 + K(1)x + K(2)x^2 + ... + K(n)x^n
     = 1 + A(1)x + A(1)A(2)x^2 + ... + A(1)A(2)...A(n)x^n
     = 1 + c1 B(1)x + c2 B(1)B(2)x^2 + ... + cn B(1)B(2)...B(n)x^n,

where c1, c2, etc. are binomial coefficients. Hence fractional saturation functions are simply the logarithmic differentials, i.e.

y = (1/n) dlog(p)/dlog(x) = (x/n)(dp/dx)/p.

Thermodynamic cooperativity depends on the sign of H(x), the Hessian of p, which using p' the first derivative of p and p'' the second derivative of p with respect to x, is

H(x) = npp'' - (n - 1)(p')^2, since

dlog[y/(1 - y)]/dlog(x) = 1 + x*H(x)/[p'(np - xp')] where np - xp' is positive.
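The sign of H(x) can be checked numerically from the binding constants; a minimal sketch (hypothetical helper, not the cooperativity analysis SFFIT itself performs):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def hessian_H(x, K):
    """H(x) = n*p*p'' - (n - 1)*(p')^2 for the binding polynomial
    p(x) = 1 + K(1)x + ... + K(n)x^n; a positive value indicates positive
    thermodynamic cooperativity at x."""
    n = len(K)
    c = np.array([1.0] + list(K))        # coefficients, constant term first
    p = P.polyval(x, c)
    dp = P.polyval(x, P.polyder(c))      # p'(x)
    d2p = P.polyval(x, P.polyder(c, 2))  # p''(x)
    return n * p * d2p - (n - 1) * dp**2
```

For n = 2 this simplifies to H = 4K(2) - K(1)^2, so K(2) > K(1)^2/4 gives positive cooperativity.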

Note that, if you select cooperativity analysis, SFFIT gives many of the indices in use to measure binding cooperativity.

For independent sites p(x) = (1 + Ka(1)x)...(1 + Ka(n)x), and you should use programs MMFIT or HLFIT, not this program.

For more details see Bardsley et al J. Theor. Biol. (1987) 126, 183-201 and J. Theor. Biol. (1989) 139, 85-102.

Starting estimates

First data ranges are estimated and used for re-scaling into internal coordinates.

Then analysis of low x and high x points is used to estimate the slope and intercept at the origin, and the value and rate of approach to the final horizontal asymptote. These will only be useful if your data approach close to the origin and to the asymptote, and they are used to get preliminary estimates for Z and C.

A wide-ranging random lognormal search, followed by a local search of the allowed parameter space, attempts to find feasible starting estimates. Then constrained minimisation of the over-determined linear system in the L1 norm, starting from the best random estimates, is used to improve them. Since these procedures are unlikely to succeed with noisy data or n > 1, especially if Z and/or C are varied, you can input starting estimates directly. The internal parameters and objective function are scaled to order unity at the solution point.
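The random lognormal stage of this strategy can be sketched as follows (an illustration only, not the SFFIT implementation; the function name and interface are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_search(residuals, n_params, n_trials=200, sigma=2.0):
    """Wide-ranging lognormal random search for feasible starting estimates.
    residuals(theta) must return the residual vector for positive parameter
    vector theta; the lowest sum of squares found is kept."""
    best_theta, best_ssq = None, np.inf
    for _ in range(n_trials):
        theta = rng.lognormal(mean=0.0, sigma=sigma, size=n_params)
        r = residuals(theta)
        ssq = float(np.dot(r, r))
        if ssq < best_ssq:
            best_theta, best_ssq = theta, ssq
    return best_theta, best_ssq
```

The winning parameter set would then be refined by a local search or the constrained L1 fit before the main optimisation begins.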

Input data

The program checks to make sure that x and y are nonnegative, and that x is in nondecreasing order, so that groups of replicates can be identified and starting estimates calculated.
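These checks and the grouping of replicates are straightforward to express in code; the following sketch mirrors them (hypothetical helper, not SFFIT source):

```python
import numpy as np

def check_and_group(x, y):
    """Verify that x and y are nonnegative and x is nondecreasing, then
    return the indices of replicate groups keyed by x value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x < 0) or np.any(y < 0):
        raise ValueError("x and y must be nonnegative")
    if np.any(np.diff(x) < 0):
        raise ValueError("x must be in nondecreasing order")
    groups = {}
    for i, xi in enumerate(x):
        groups.setdefault(float(xi), []).append(i)
    return groups
```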

Test files

The files sffit.tf1, sffit.tf2, sffit.tf3 and sffit.tf4 are examples of arbitrary exact data which can be fitted to see typical curves. Program ADDERR can be used to add error to these files, or else files like mmfit.tf4 can be analysed.

Precision

SFFIT terminates when either the relative change in objective function (wssq/ndof) or the infinity norm of the projected gradient vector falls below its tolerance value (set by factr and pgtol respectively). Finite differences and the default tolerances should be used unless you have a special need for analytic derivatives and high precision.
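The two termination tests can be illustrated with a toy one-parameter fit; this is only a sketch of the stopping logic under stated assumptions (a crude damped gradient step, not SFFIT's bounded quasi-Newton solver, and the function name is hypothetical):

```python
import numpy as np

def fit_K(x, y, K0=1.0, factr_tol=1e-10, pgtol=1e-8, max_iter=10000):
    """Minimise the sum of squares for the order n = 1 model y = Kx/(1+Kx),
    stopping when either the change in the objective or the magnitude of a
    finite-difference gradient falls below its tolerance."""
    def wssq(K):
        return float(np.sum((y - K * x / (1.0 + K * x)) ** 2))
    K = K0
    prev = wssq(K)
    for _ in range(max_iter):
        h = 1e-6 * max(abs(K), 1.0)
        g = (wssq(K + h) - wssq(K - h)) / (2 * h)  # finite-difference gradient
        if abs(g) < pgtol:                         # gradient test (cf. pgtol)
            break
        K = max(K - 0.1 * g, 1e-12)                # damped step, keep K > 0
        cur = wssq(K)
        if abs(prev - cur) <= factr_tol * max(abs(prev), abs(cur), 1.0):
            break                                  # objective test (cf. factr)
        prev = cur
    return K
```

Tightening factr_tol or pgtol trades extra iterations for higher precision, which is the trade-off the factr and pgtol settings control.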

Local minima

The greatest problem when fitting positive saturation functions of order 2:2 or greater is convergence to local minima. Fit a 2:2 function to mmfit.tf4 with a short and then an extensive random search and you may get two fits with very different parameter estimates. The problem is most acute when the starting estimates differ widely in size, as sometimes happens as a result of using a random search or an L1 overdetermined fit. The problem is then dominated by a subset of the model parameters, often the extreme ones, leading to false convergence.

The short, medium or extensive random searches used by SIMFIT employ different strategies, and are likely to locate different starting estimates, which may lead to alternative parameters at a variety of solution points. In extreme cases you may have to experiment to see what happens if you fit from starting values that you set interactively. In all cases of order 2:2 or greater you should run the program several times anyway. With very stubborn problems you should run program QNFIT, then plot WSSQ contours around alternative solution points to see what is happening.


Explanations for the run-time options provided by this program
  1. Choose the sequence of functions required
    • Lowest order (>= 1)
    • Highest order (<= 2)
    You can fit just one model if required by setting Lowest order = Highest order. Or, if the data are extensive (i.e. approaching the final horizontal asymptote) and of high quality, you can also try fitting the next model in the series to see if the better fit can be justified. Only in exceptional circumstances would order >= 3 be justified by the statistical tests employed by this program.

  2. Choose the method to use for parameter starting estimates
    • Short Random search
    • Medium Random search
    • Extensive Random search
    • You input starting estimates
    This program scales the data internally to order unity by estimating the slope at the origin from the first few data points and the final asymptote from the last few data points. To improve upon the starting estimates obtained by this process a short random search can be used to seek better starting estimates. When trying to fit higher order models a medium or extensive search can be used but, with order >= 3, it will normally be necessary to input your own starting estimates obtained by previous experimentation. In such cases it would probably be better to add the starting estimates to the data file using the {begin limits} ... {end limits} construction and use program QNFIT in expert mode.

  3. Choose the procedures required for this analysis
    • Display the goodness of fit analysis
    • Display values of starting estimates
    • Display details of any random search
    • Plot the best fit curves and data
    • Provide options to plot residuals
    • Display tables of (wtd.) residuals
    • Write residuals to results log file
    • Use analytic-gradient/high-precision
    • Do a cooperativity analysis (if n > 1)
    • Store/test parameters/covariance-matrix
    • Fit a scaling factor Z
    • Fit a baseline correction factor C

    If all these options are switched off, the program will simply calculate the fit and then display a table of parameter estimates. However, the default options also output goodness of fit criteria and plots of the data with best-fit curves. It is possible to use an explicitly calculated gradient vector rather than a finite difference estimation for the iterations, but this can slow the fitting down and is only ever required when investigating convergence with higher order models. There is also an option to store the estimated parameters and covariance matrices for retrospective investigation concerning model discrimination. Note that, when fitting models in sequence of increasing order, statistical tests are output for model discrimination. Note also that SIMFIT provides a facility to extract tables from the results log files to import into LaTeX documents or word processing programs.

    Sometimes a scaling factor Z can be estimated if the data cannot be normalised to 0 <= y <= 1.

    Similarly, sometimes there is a background signal, and the best way to remove this is to estimate it independently and then subtract it from the Y-data before fitting. Program SFFIT also allows the estimation of such background factors by estimating a correction parameter, but this should only be requested when absolutely necessary, such as when the background noise changes unavoidably between experiments, as it makes fitting much harder.