Summary of SIMFIT definitions, procedures and programs


Abstract

SIMFIT can be used for data analysis, graph plotting, simulation, model discrimination and parameter estimation using scientific models. There are separate programs for the following procedures.
  • Making data files by reading in clipboard data or typing in values.
  • Editing files to change units or baselines or do row and column arithmetic. The SIMFIT editors are specially designed to do the sort of checking that is often required, like making sure weights are positive and data values are in monotonically increasing order.
  • Fitting models, sequences of models or systems of differential equations. Some programs make all the decisions automatically but some are more suited for expert users who want to choose parameter limits, starting estimates, etc. interactively.
  • Plotting with automatic transformation of axes, and displaying pie charts, bar charts, space curves, contours and surfaces. Special facilities are provided for editing PostScript files to re-size, rotate, collect into collages, etc.
  • Data-smoothing, estimating areas, lag times, asymptotes and derivatives and constructing calibration-curves. Splines, polynomials, numerical methods or user defined models can be used.
  • Statistical tests and calculations. All the usual tests are supported as well as procedures dedicated to the standard densities and numerous specialised items, e.g. for parameter confidence limit estimation or for power and sample size calculations.
  • Simulating deterministic or stochastic models. Exact data can be calculated and then error added to simulate experiments, pseudo-random numbers can be generated, and 1, 2 or 3 dimensional random walks can be plotted.

Each program is free-standing and has a tutorial, and there are also test files called name.tf1, name.tf2, name.tf3, etc., where name is the filename of the program you are using. To run SIMFIT, click on the SIMFIT icon to run w_simfit.exe (or x64_simfit.exe), then look at the dedicated tutorial help screens, which tell you what each program does and how to run it. The tutorials presume that you have some grasp of mathematics and statistics, but a good way to learn to use the programs is to run them with the test files. Note that scientific (i.e. exponential) notation is used for large and small numbers, where E means "times 10 to the power of": 1.234E+02 = 123.4, 1.234E+00 = 1.234, 1.234E-02 = 0.01234. To stop any executing program press Ctrl+F4. Print out the w_readme.* files (indexed by w_readme.0) and the reference manuals (w_manual?.ps) for more details.
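
The E notation used by SIMFIT is the same convention used by most programming languages, so values copied from SIMFIT output can usually be read directly. The following minimal Python sketch converts in both directions; the helper names are illustrative and are not part of SIMFIT.

```python
def from_e_notation(text: str) -> float:
    """Parse a number written in E notation, e.g. '1.234E-02'."""
    return float(text)


def to_e_notation(value: float, digits: int = 3) -> str:
    """Write a number in E notation with `digits` decimals in the mantissa."""
    return f"{value:.{digits}E}"


for text in ["1.234E+02", "1.234E+00", "1.234E-02"]:
    value = from_e_notation(text)
    print(text, "=", value, "->", to_e_notation(value))
```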


Overview

SIMFIT has been developed for model discrimination and parameter estimation with the sorts of models that are often encountered in the sciences. Such models are usually nonlinear, so there are no exact methods for determining the best-fit parameters or assessing goodness of fit. With linear models and constant variance there are well understood tests, e.g. the chi-square, F, t, run and sign tests and the analysis of variance, but these classical statistical techniques are not valid with many experimental data sets.

For instance, if a polynomial is fitted, the algorithms are straightforward and the best-fit polynomial is unique. On the other hand, fitting a sum of exponentials, binding functions or positive rational functions needs specialised techniques for starting estimates, and the fitting is iterative, so it may converge to a local minimum rather than the global minimum.
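
To see why polynomial fitting is the easy case: a polynomial is linear in its coefficients, so the least-squares solution comes from a single linear solve, with no starting estimates and no iteration. The Python sketch below uses the normal equations for brevity (a QR decomposition is numerically safer for high degrees); the function name is illustrative and is not a SIMFIT routine.

```python
def polyfit_normal_equations(x, y, degree):
    """Least-squares polynomial fit by solving the normal equations.

    The model is linear in its coefficients c0..c_degree, so the fit is a
    single linear solve: no starting estimates, no iteration.
    """
    m = degree + 1
    # Build A^T A and A^T y, where A[i][j] = x_i ** j (Vandermonde matrix).
    ata = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    aty = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row[:] + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution gives the unique coefficient vector.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = s / aug[r][r]
    return coeffs  # [c0, c1, ..., c_degree]


# Exact data from y = 1 + 2x + 3x^2 are recovered without any starting values.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1 + 2 * xv + 3 * xv ** 2 for xv in xs]
print(polyfit_normal_equations(xs, ys, 2))
```

A sum of exponentials has no such closed form: its parameters enter nonlinearly, which is why an iterative optimiser and good starting estimates are needed.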

Unfortunately, experimental data are rarely generated by a polynomial or any other linear model, and the error rarely has constant variance, so an honest investigator must resort to weighted nonlinear least squares regression.

Weighted nonlinear least squares regression

Suppose the exact model is y = f(x,theta) and x can be fixed exactly but, at each value of x, experimental error e occurs. Then we can write y = f(x,theta) + e, and we want to estimate theta, a vector of unknown model parameters, e.g. rate constants or binding constants. If the errors e are independent and normally distributed, with mean zero and a standard deviation sigma that depends only on x, the principle of maximum likelihood says we should take as the best estimate of theta the value of theta that minimises the weighted sum of squares

WSSQ = sum from i = 1 to n of [(y_i - f(x_i,theta))/sigma_i]^2, where n is the number of data points.
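
The weighted sum of squares translates directly into code. This is a minimal Python sketch, assuming the model is supplied as a function f(x, theta); the exponential model and the data values are purely illustrative.

```python
import math
from typing import Callable, Sequence


def wssq(model: Callable[[float, Sequence[float]], float],
         theta: Sequence[float],
         x: Sequence[float], y: Sequence[float],
         sigma: Sequence[float]) -> float:
    """Weighted sum of squares: sum of ((y - f(x, theta)) / sigma)^2."""
    total = 0.0
    for xi, yi, si in zip(x, y, sigma):
        if si <= 0.0:
            raise ValueError("standard deviations must be positive")
        residual = (yi - model(xi, theta)) / si
        total += residual * residual
    return total


def exponential_decay(x: float, theta: Sequence[float]) -> float:
    """Hypothetical example model: y = A * exp(-k * x), with theta = (A, k)."""
    amplitude, rate = theta
    return amplitude * math.exp(-rate * x)


x = [0.0, 1.0, 2.0]
y = [2.1, 0.7, 0.3]
s = [0.1, 0.1, 0.1]
print(wssq(exponential_decay, (2.0, 1.0), x, y, s))
```

A fitting program searches over theta to make this value as small as possible; the function itself only evaluates WSSQ at one trial theta.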

Unfortunately, in the real world, x cannot be fixed exactly, sigma has to be estimated, we can never be certain that f() is the correct model, experimental errors are never exactly uncorrelated and normally distributed, and WSSQ minimisation is a formidable problem that is not guaranteed to give a unique or sensible solution with nonlinear models. With SIMFIT you can explore such problems.

The input data file

For curve fitting, the experimental data must be supplied as x, y and s, where
x = the fixed independent variable, e.g. time or concentration,
y = the measured dependent variable, e.g. size or initial rate, and
s = the estimated standard deviation of y at point x.
For statistics, either vectors (i.e. columns) of data are required, e.g. for a t test, or matrices, e.g. for multiple correlations. To run a program you have to prepare files that contain your input data in the correct format. Such input data files should be given meaningful names, e.g. data.001, data.002, data.003, data.004 or first.set, second.set, third.set, fourth.set, etc., so that the file name (e.g. data) and the file extension (e.g. 001) are useful for global copying and deleting. Some programs will accept data from the keyboard, but this is not recommended. Always prepare files using programs MAKFIL or MAKMAT, so that you have a permanent store of data that can be analysed repeatedly or edited using programs EDITFL or EDITMT.
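
The kind of checking described for x, y and s can be sketched in Python. This is an illustration only: it assumes a hypothetical layout of three whitespace-separated numbers per line and does not reproduce the SIMFIT data file layout, which is described in the reference manual. It also assumes that replicate x values are allowed, so x is only required to be nondecreasing.

```python
from typing import List, Tuple


def read_xys(lines: List[str]) -> Tuple[List[float], List[float], List[float]]:
    """Parse hypothetical 'x y s' lines and apply basic checks.

    Checks (similar in spirit to those made by the SIMFIT editors):
      * every non-blank row has exactly three numbers,
      * every s is positive, so the weights 1/s^2 are defined,
      * x values are in nondecreasing order (replicates allowed).
    """
    xs: List[float] = []
    ys: List[float] = []
    ss: List[float] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected x y s, got {len(fields)} fields")
        x, y, s = (float(f) for f in fields)
        if s <= 0.0:
            raise ValueError(f"line {lineno}: s must be positive, got {s}")
        if xs and x < xs[-1]:
            raise ValueError(f"line {lineno}: x values must not decrease")
        xs.append(x)
        ys.append(y)
        ss.append(s)
    return xs, ys, ss


sample = ["0.0 1.00 0.05", "1.0 1.90 0.05", "2.0 3.10 0.10"]
x, y, s = read_xys(sample)
print(x, y, s)
```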

The output data file

Programs will create output files on request, containing the details you will need for a permanent record of the analysis to store or print out retrospectively. You can call such files anything you like, e.g. save.it, store.it, a:forget.it, b:file.it, copyof.it, scrap.it etc. Even if you decide not to create a results file, SIMFIT keeps a record of all calculations done while a program is running, and you can access this data at any time to examine the progress of the analysis or copy results to the clipboard.

Why fit simulated data?

The statistical tests used in nonlinear regression are not exact, and the optimisation is not guaranteed to locate the best-fit parameter set. The outcome depends on the information content of your data and on parameter redundancy in the model. To see just how reliable your results are, you must perform a sensitivity analysis to see how the results change as your data set is altered, e.g. by small alterations in parameter values. For instance, suppose that program HLFIT tells you that your binding data require two binding sites by an F or run test, but only one binding constant can be estimated accurately, as shown by a t test. You can then use program MAKDAT to make an exact data set with the best-fit parameters found by HLFIT, use program ADDERR to simulate your experiment, and then fit the data using HLFIT. If you do this repeatedly, you will obtain distributions of weighted sums of squares, parameter estimates, t values, F values and run test statistics, and so be able to judge the reliability of the result with your experimental data. This is a Monte Carlo type of sensitivity analysis.
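
The simulate-and-refit loop can be sketched in Python. To keep the example self-contained, it uses a hypothetical one-parameter model y = theta*x, whose weighted least-squares fit has a closed form; with a nonlinear model such as a binding function, the refit step would be an iterative fit (the role HLFIT plays above). Adding Gaussian noise with standard deviation s mimics what ADDERR is used for here, but that error model is an assumption of this sketch.

```python
import random
import statistics


def fit_proportional(x, y, s):
    """Weighted least-squares estimate of theta in y = theta * x.

    Minimising sum(((y - theta*x)/s)^2) gives the closed form
    theta = sum(x*y/s^2) / sum(x^2/s^2).
    """
    num = sum(xi * yi / si ** 2 for xi, yi, si in zip(x, y, s))
    den = sum(xi ** 2 / si ** 2 for xi, si in zip(x, s))
    return num / den


def monte_carlo(theta_true, x, s, n_sim=2000, seed=1):
    """Simulate n_sim noisy data sets, refit each, summarise the estimates."""
    rng = random.Random(seed)
    exact = [theta_true * xi for xi in x]  # exact data from the "best-fit" value
    estimates = []
    for _ in range(n_sim):
        # Add normally distributed error with the stated standard deviations.
        noisy = [yi + rng.gauss(0.0, si) for yi, si in zip(exact, s)]
        estimates.append(fit_proportional(x, noisy, s))  # refit the simulated data
    return statistics.mean(estimates), statistics.stdev(estimates)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
s = [0.2, 0.2, 0.3, 0.3, 0.4]
mean, sd = monte_carlo(2.0, x, s)
# Theoretical standard error for this linear model: 1 / sqrt(sum(x^2 / s^2)).
se = 1.0 / sum(xi ** 2 / si ** 2 for xi, si in zip(x, s)) ** 0.5
print(f"mean {mean:.4f}, Monte Carlo SD {sd:.4f}, theoretical SE {se:.4f}")
```

For this linear model the Monte Carlo spread should agree closely with the theoretical standard error; with nonlinear models no such exact formula exists, which is exactly when the simulation is most informative.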

Advice

  • Never use a terminal to input data directly. Use programs MAKFIL or MAKMAT to prepare files, and EDITFL or EDITMT to edit them. Never type in means and standard deviations.
  • Read the tutorial help screens before running any program.
  • Do not attempt to fit high order models unless your data are of high quality. Several runs, from different starting estimates, are needed to locate global minima with high order equations.
  • Do not be satisfied with a result unless you find support for it from a simulation study.
  • Each program comes with a set of test files, so you can practise with data where the answer is known. These files have the same name as their parent programs, but with the extension .tf1, .tf2, etc. For instance, to use program NORMAL (to test whether numbers are normally distributed) read in normal.tf1, which contains some numbers generated by program RANNUM. To analyse Michaelis-Menten data using MMFIT, input mmfit.tf1 (generated by program MAKDAT) then mmfit.tf2 (generated from mmfit.tf1 by program ADDERR).
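
The advice to repeat fits from several starting points can be illustrated with a toy objective that has two minima. This Python sketch uses a simple step-halving descent as a stand-in for a real optimiser (it is not the algorithm SIMFIT uses); the objective function is hypothetical.

```python
def objective(t):
    """Toy objective: local minimum near t = 0.96, global minimum near t = -1.04."""
    return (t * t - 1.0) ** 2 + 0.3 * t


def local_search(f, t, step=0.5, tol=1e-8):
    """Step-halving descent: move downhill while possible, else halve the step."""
    ft = f(t)
    while step > tol:
        if f(t + step) < ft:
            t += step
        elif f(t - step) < ft:
            t -= step
        else:
            step /= 2.0
            continue
        ft = f(t)
    return t, ft


def multi_start(f, starts):
    """Run the local search from each start and keep the lowest minimum found."""
    results = [local_search(f, t0) for t0 in starts]
    return min(results, key=lambda r: r[1])


single = local_search(objective, 2.0)  # trapped in the local minimum
best = multi_start(objective, [-2.0, 0.0, 2.0])
print("single start:", single)
print("multi start: ", best)
```

Each local search only finds the minimum of the basin it starts in, so only by comparing several runs can you gain some confidence that the lowest value found is the global minimum.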


Technical details

Numerical analysis

This version of SIMFIT does not call the NAG library directly, but it does use public domain equivalents or specially written routines for many calculations, and the error trapping mechanism has been made very similar to the NAG library IFAIL system. Where possible, routines for root finding, quadrature, random number generation, numerical integration, optimisation, etc. have been used with the soft fail option (IFAIL = 1), and your attention is drawn to any exit with nonzero IFAIL. All the code for file handling, data processing, statistics, numerical analysis, goodness of fit, variance-covariance matrix estimation, starting estimates, scaling, sorting, logic, graph plotting and help/advice has been written by W. G. Bardsley. An attempt has been made to ensure that the input data set, parameters, objective function and condition number of the Hessian matrix are of order unity (in internal coordinates) at the final solution points, by using scaling factors fixed by the range of the original data and starting estimates. The objective function is WSSQ/NDOF, where NDOF is the number of degrees of freedom (the number of data points minus the number of fitted parameters); it has expectation 1 with correct weights and the right model. If the wrong model or weights are used, or if WSSQ/NDOF < 1.0E-06 or > 1.0E+06, the programs may not converge to a satisfactory solution.
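
The quoted limits on the objective function can be expressed as a simple check. This is a Python sketch, assuming NDOF is the number of data points minus the number of fitted parameters; the function names and message texts are hypothetical and are not SIMFIT output.

```python
def reduced_objective(wssq: float, n_points: int, n_params: int) -> float:
    """WSSQ/NDOF, with NDOF = number of data points - number of fitted parameters."""
    ndof = n_points - n_params
    if ndof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return wssq / ndof


def objective_warning(value: float, low: float = 1.0e-6, high: float = 1.0e6):
    """Return a hypothetical warning when WSSQ/NDOF is outside [low, high], else None."""
    if value < low:
        return "WSSQ/NDOF very small: are the standard deviations too large?"
    if value > high:
        return "WSSQ/NDOF very large: wrong model, or standard deviations too small?"
    return None


value = reduced_objective(12.3, n_points=20, n_params=3)
print(value, objective_warning(value))
```

Values near 1 are what correct weights and the right model should give; values far outside the stated range usually point to badly estimated standard deviations or an unsuitable model.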

Error messages

Errors will occur during program operation if a program is unable to read input correctly from files, or because the data lead to a singularity in evaluating some expression. For instance, the formula y = 1/x will lead to overflow when x is so close to zero that y is too large to be represented.

When a program detects an error, it displays a message labelled CAUTION, WARNING or *FATAL* (in increasing order of severity), along with information about the cause of the error and any remedial steps required.

Most errors occur because data files are not prepared in the correct format, or because you ask programs to do things they cannot do, such as fitting inappropriate or over-parameterised models. NAG library type messages look like this:

WARNING : IFAIL = 7 from E04FDF/DATFIT.

As it happens, you can ignore this particular message but, if you are curious to know what it means, look up IFAIL = 7 on exit from E04FDF in the NAG library documentation; the message shows that this exit occurred in subroutine DATFIT.
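
If you keep logs of SIMFIT sessions, messages of this kind can be split into their parts automatically. The Python sketch below assumes every IFAIL message follows exactly the layout of the example above (severity, IFAIL value, library routine, calling subroutine); other SIMFIT messages are not covered, and the pattern and names are hypothetical.

```python
import re

SEVERITY_ORDER = ["CAUTION", "WARNING", "*FATAL*"]  # increasing severity

MESSAGE = re.compile(
    r"^\s*(?P<level>CAUTION|WARNING|\*FATAL\*)\s*:\s*"
    r"IFAIL\s*=\s*(?P<ifail>-?\d+)\s+from\s+"
    r"(?P<routine>\w+)/(?P<caller>\w+)\.?\s*$"
)


def parse_ifail_message(text):
    """Split an IFAIL message into its parts, or return None if it does not match."""
    m = MESSAGE.match(text)
    if m is None:
        return None
    return {
        "level": m.group("level"),
        "severity": SEVERITY_ORDER.index(m.group("level")),  # 0 = least severe
        "ifail": int(m.group("ifail")),
        "routine": m.group("routine"),  # library-style routine that exited
        "caller": m.group("caller"),    # subroutine in which the exit occurred
    }


print(parse_ifail_message("WARNING : IFAIL = 7 from E04FDF/DATFIT."))
```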


Installation

  1. Get the self-extracting file simfit_setup.exe from http://www.simfit.man.ac.uk
  2. Install in the default folder C:\Program Files\simfit, unless you have some very special requirements.
  3. Create a shortcut to C:\Program Files\simfit\w_simfit.exe (or x64_simfit.exe).
  4. To run SIMFIT just click on the SIMFIT icon.
  5. If you upgrade SIMFIT you must upgrade all the binaries.
  6. Get and install the GSview/Ghostscript package. This is closely integrated with SIMFIT and is extremely useful for printing PS files or transforming them into other types of graphics file. The first time you run SIMFIT, read the configuration options and the tutorials for each program.


      SIMFIT information bulletin I0: install.txt = w_readme.i0
      ===============================
      bill.bardsley@man.ac.uk, 1st June 2004

      Installing Simfit and GSview/Ghostscript

      1) Installing and configuring Simfit (Version 5.5)
         -----------------------------------------------

      a) You can uninstall previous versions of Simfit or just
         allow the installation program to overwrite the old files.

      b) Download the file simfit_setup.zip from the Simfit website:

         http://www.simfit.man.ac.uk

         The file simfit_setup.zip contains:
         simfit_setup.exe ... the complete Simfit package
         install.txt      ... this document
         configure.txt    ... configuration details
         readme.txt       ... short summary of the package

         Unzip the file simfit_setup.zip into a temporary folder
         and browse the text files using Notepad.

      c) Double click on simfit_setup.exe from Explorer and,
         unless there is some compelling reason to the contrary,
         choose to install in the default folder
         C:\Program Files\simfit.

      d) Make a shortcut from your desktop to the Simfit program
         manager C:\Program Files\simfit\w_simfit.exe (or x64_simfit.exe).
         Execute this shortcut and read any warning messages. Finally,
         press the [Configure] button, then the [Check] button to correct
         the paths if necessary, and then the [Apply] button to complete
         the configuration.

      e) If you decide not to use the default folders, you will have
         to be more careful when configuring the package.

      f) Run w_simfit.exe (or x64_simfit.exe) at least once from the Simfit 
         folder with all flags switched on to check for missing or obsolete 
         files.

      g) Where Simfit is for group use, the test files *.tf? could be
         made read-only to prevent accidental editing.

      h) Print out the manual using GSview (w_manual.ps) or Adobe Acrobat
         (w_manual.pdf).

      i) Common errors include inconsistent .exe and .dll files, failing
         to configure Simfit so that it cannot use your editor, Acrobat
         reader, calculator or GSview, and not giving Simfit write
         permission to create configuration and results files.

      2) Installing and configuring GSview/Ghostscript
         ---------------------------------------------

      The best and most versatile vector graphics format for archiving
      and printing professional quality scientific graphs is PostScript.
      Simfit creates such high quality PostScript files, but it only
      displays a low resolution bitmap representation of the graph on
      screen. To use Simfit PostScript files you must install Ghostscript.
      Ghostscript is a package to interpret and transform PostScript
      files. GSview is a convenient front end to run Ghostscript from
      MS Windows and Simfit is closely integrated with this software which
      must be installed to get the most out of Simfit graphics. With this
      package you can drive any printer or make any graphics files.
      You should visit the GSview home page by activating the link to

      http://www.cs.wisc.edu/~ghost/gsview

      from http://www.simfit.man.ac.uk to read about the latest releases
      of Ghostscript and GSview.



Documentation

  • The help program (which you are now using) covers most of the SIMFIT procedures.
  • Each individual program contains a set of help screens which should be sufficient to explain what each program does and how to use it. Also there are test files *.tf? for each program.
  • There are some ASCII text files: w_readme.0, w_readme.i1, w_readme.g1, etc. each covering a different aspect of SIMFIT. You should print these out or browse using the viewer. Note that w_readme.0 is an index to all the w_readme.* files.
  • A comprehensive user manual is available as LaTeX source plus ps files for figures, or as a complete document. The Contents and four parts of the manual can be read on screen or printed using GSview, which should be used to print PostScript files if you do not have a PostScript printer. The manual is also provided in .pdf format in case you want to use the Adobe Acrobat reader to view or print the manual.
