MAKFIL: help and advice


Consult the reference manual for further details and worked examples.
W.G.Bardsley, University of Manchester, U.K.

Explaining program MAKFIL

MAKFIL can be used to generate files from relatively small sets of experimental data which will be compatible with the SIMFIT format for graph plotting, curve fitting, or generating standard curves for calibration analysis.
Large data sets can be pasted directly from your spreadsheet into SIMFIT programs or into program MAKSIM to generate data files. The macros simfit4.xls and simfit6.xls in the C:\Program Files\simfit\doc folder can also be used to create SIMFIT style data files from MS Excel.
The following summary describes the possibilities.


Data format for SIMFIT curve fitting files

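SIMFIT curve fitting programs need input files containing values of x(i), y(i), and s(i) for i = 1, 2, ..., n, where x(i) is the independent variable, y(i) is the measured response at x(i), and s(i) is the weighting factor for y(i), ideally its standard deviation.

The chosen SIMFIT curve fitting program then attempts to minimise the objective function WSSQ/NDOF, where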
     WSSQ = {[y(1) - f(x(1))]/s(1)}^2 + {[y(2) - f(x(2))]/s(2)}^2
                                + ... + {[y(n) - f(x(n))]/s(n)}^2, and
     NDOF = number of points - number of parameters,
with respect to the parameters in f(x), where f(.) is the model being fitted with k parameters, i.e.
     f(x) = f(x|p(1), p(2), ..., p(k)).
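
As an illustration only, the objective function above can be sketched in Python. The function name wssq_over_ndof, the straight-line model, and the data values are hypothetical and are not part of SIMFIT; the sketch simply evaluates WSSQ and NDOF as defined above for a given set of parameters.

```python
def wssq_over_ndof(x, y, s, f, params):
    """Return (WSSQ, NDOF, WSSQ/NDOF) for a model called as f(x, *params)."""
    if not (len(x) == len(y) == len(s)):
        raise ValueError("x, y and s must have the same length")
    ndof = len(x) - len(params)  # number of points - number of parameters
    if ndof <= 0:
        raise ValueError("need more data points than parameters")
    # Sum of squared residuals, each residual scaled by its s-value.
    wssq = sum(((yi - f(xi, *params)) / si) ** 2
               for xi, yi, si in zip(x, y, s))
    return wssq, ndof, wssq / ndof


def line(x, p1, p2):
    """Hypothetical two-parameter model: f(x) = p1 + p2*x."""
    return p1 + p2 * x


# Made-up data with all s = 1, i.e. unweighted regression.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 8.0]
s = [1.0, 1.0, 1.0, 1.0]
wssq, ndof, objective = wssq_over_ndof(x, y, s, line, (1.0, 2.0))
print(wssq, ndof, objective)
```

A fitting program would vary the parameters to make this quantity as small as possible; the sketch only evaluates it at one fixed parameter set.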
     

This program is designed to input relatively small data sets into an initially empty spreadsheet-type control which must be filled in sequentially so that the data can be checked item by item to make sure they conform to curve fitting data file format. With large data sets you can use spreadsheet programs, e.g. MS Excel with the simfit4.xls macro to make files, or copy and paste into the curve fitting program directly, or into program MAKSIM to make files.
This program takes your values for x(i) and y(i) then writes a file for fitting, further editing, plotting or weighting, formatted as follows.

     Title            (an informative title)
     n      3         (an n by 3 matrix)
     x(1), y(1), s(1) (first line of data)
     x(2), y(2), s(2) (second line of data)
        ...
        ...
        ...
     x(n), y(n), s(n) (last line of data)
     m                (m lines of text)
     Text(1)          (first line of text)
     Text(2)          (second line of text)
        ...
        ...
        ...
     Text(m)          (last line of text)
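
For readers who prepare such files outside MAKFIL, the layout above can be sketched in Python. This is a minimal sketch, not the code MAKFIL uses: the function name format_simfit_file is hypothetical, and the field widths are an assumption chosen to imitate the scientific notation that MAKFIL writes.

```python
def format_simfit_file(title, rows, text_lines=("Default line ... EOF",)):
    """Build the text of a curve fitting file: title, dimensions, data, trailer."""
    out = [title, f"{len(rows):6d}{3:6d}"]  # n rows by 3 columns
    for x, y, s in rows:
        # One line of data: x, y, s separated by commas.
        out.append(f"{x:13.4E},{y:12.4E},{s:12.4E}")
    out.append(f"{len(text_lines):6d}")      # m = number of text lines
    out.extend(text_lines)
    return "\n".join(out) + "\n"


# Made-up data set: six points with all s = 1.
rows = [(0.25, 40.0, 1.0), (0.25, 40.0, 1.0), (0.25, 40.0, 1.0),
        (0.50, 20.0, 1.0), (0.50, 20.0, 1.0), (1.00, 10.0, 1.0)]
file_text = format_simfit_file("Doubling dilution data", rows)
print(file_text, end="")
```

The returned string can be written to disk with an ordinary text-mode file write.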

Weighting and s-values

All curve fitting is actually weighted, either directly or by implicit assumption. With constant variance, the variance of y is independent of x and y, and it is legitimate to set all s = 1 to do unweighted regression. Unfortunately, this assumption is usually false. With constant relative error, the coefficient of variation (cv%) is independent of x but dependent on y, so it could be reasonable to set s(i) = cv%|y(i)|/100 for accurate y-values, or s(i) = alpha|f(x(i))| if a correct model is known. Again, this assumption or similar ones, e.g. s^2 = A + By^2, may not be justified. Possible cases and recommendations are:

  1. Single measurements or <= 3 replicates at each fixed x:
    Set s = 1 and do unweighted regression.
  2. 4 or 5 replicates at each fixed x:
    Use EDITFL to estimate a smooth weighting scheme.
  3. >= 5 replicates at each fixed x:
    Set s = sample standard deviations.
  4. You know the correct weighting scheme independently:
    Input s directly using this program or EDITFL.

Choosing the x-input mode for the current run

Curve fitting and plotting programs need x in nondecreasing order, so it is usual to check the data on input. However, there is also a dilution mode (as used for immunological dose-response curves) for cases where antibody concentrations are known only up to an arbitrary factor and dilutions have to be used instead, even though x values must still be supplied in order to plot, fit, calibrate, etc. So, e.g. with triplicate extremes and a doubling dilution scheme you might input a sequence like:

     x-input = 1, 1, 1, 2, 4, 8, 16, 32, 32, 32
and let the program rearrange and transform these values into
     x-calculated = 1/32, 1/32, 1/32, 1/16, 1/8, 1/4, 1/2, 1, 1, 1.
As an example, suppose the data input were for doubling dilution of an inhibitor, as follows
     x-input y-input
        1      10
        2      20
        2      20
        4      40
        4      40
        4      40
then the curve fitting file created would be rearranged with an extra column of s = 1 added, like this
         x      y    s
        0.25  40.0  1.0
        0.25  40.0  1.0
        0.25  40.0  1.0
        0.50  20.0  1.0
        0.50  20.0  1.0
        1.00  10.0  1.0
In other words, the data are rearranged into increasing order of x but, after fitting a model, the results can be plotted interactively in negative x-semilog space using powers of two to label the x-axis if required, i.e. in traditional doubling dilution space. You can also input x in arbitrary order but this is not wise as mistakes like 1, 2, 30, 4 instead of 1, 2, 3, 4 will be missed. As an example, here is the actual file that would be created for the previous doubling dilution example.

Doubling dilution data
     6     3
   2.5000E-01,  4.0000E+01,  1.0000E+00
   2.5000E-01,  4.0000E+01,  1.0000E+00
   2.5000E-01,  4.0000E+01,  1.0000E+00
   5.0000E-01,  2.0000E+01,  1.0000E+00
   5.0000E-01,  2.0000E+01,  1.0000E+00
   1.0000E+00,  1.0000E+01,  1.0000E+00
     1
Default line ... EOF
Note that data files created by MAKFIL can use scientific notation as in this example but, as will be clear from browsing the SIMFIT test files, any reasonable notation can be used if you create data files in text editors.
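
The rearrangement performed in dilution mode can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MAKFIL implementation: the function name dilution_to_rows is hypothetical, and it assumes that each dilution factor d is positive and is transformed to x = 1/d, as in the examples above.

```python
def dilution_to_rows(pairs, s_default=1.0):
    """Turn (dilution factor, y) pairs into (x, y, s) rows sorted by x = 1/d."""
    rows = []
    for d, y in pairs:
        if d <= 0:
            raise ValueError("dilution factors must be positive")
        rows.append((1.0 / d, float(y), s_default))
    # A stable sort keeps replicates at the same x in their input order.
    rows.sort(key=lambda row: row[0])
    return rows


# The doubling dilution example from the text.
pairs = [(1, 10), (2, 20), (2, 20), (4, 40), (4, 40), (4, 40)]
rows = dilution_to_rows(pairs)
for x, y, s in rows:
    print(f"{x:6.2f} {y:6.1f} {s:5.1f}")
```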

Choosing the s-input mode for the current run

It is usual to input x and y and then accept s = 1 as default. However, if you wish to use weights, you can input s-values as well as x and y values, set s-values to a constant, calculate s-values as percentages of y-values, or calculate s-values as sample standard deviations from groups of replicates.
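
The calculated s-input modes can be sketched in Python as below. The function names are hypothetical and the grouping rule (replicates are points sharing exactly the same x value) is an assumption made for this sketch; MAKFIL itself may group replicates differently.

```python
import statistics
from collections import defaultdict


def s_constant(y, value=1.0):
    """Set every s-value to the same constant."""
    return [value for _ in y]


def s_percent_of_y(y, cv_percent):
    """Set s(i) = cv% * |y(i)| / 100."""
    return [cv_percent * abs(yi) / 100.0 for yi in y]


def s_from_replicates(x, y):
    """Set s(i) to the sample standard deviation of the y-values sharing x(i)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sd = {}
    for xi, ys in groups.items():
        if len(ys) < 2:
            raise ValueError(f"x = {xi} has no replicates")
        sd[xi] = statistics.stdev(ys)  # n - 1 in the denominator
    return [sd[xi] for xi in x]


# Made-up replicate data: three y-values at each of two x-values.
x = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
y = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
print(s_percent_of_y([10.0, -20.0], 5.0))
print(s_from_replicates(x, y))
```

Any s-value that comes out as zero, e.g. from identical replicates or y = 0, would have to be replaced before fitting, since each residual is divided by s.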

Plotting to check what you have typed in

The best way to create hardcopy from the SIMFIT package is to make ASCII text data files to be used by the program SIMPLOT. For instance, output files from this program can be read into SIMPLOT to make simple (x,y) plots. However this program will also plot your data before writing the output file so you can easily detect points where you may have typed in wrong values. The main use for plotting options in this program is to check visually what you have typed in before making an output file.

Advice on some other plotting options in SIMFIT

  1. For error bar plots make sure you have sensible s values. Note that the correct way to create error bar files is to make a curve fitting file with all replicates, then allow EDITFL or SIMPLOT to calculate the means and error bars interactively.
  2. For histograms, make sure the x-values are equally spaced. Note that the correct way to make curve-fit, pdf, cdf or histogram-files from samples is to make vector files with MAKMAT to be read into SIMSTAT for exhaustive analysis.
  3. For bar charts, create spacing by using dummy y-values = 0 or use the simple bar chart options in SIMPLOT, or SIMSTAT.

How to use this program

  1. Make a table with a column of x then a column of y values
  2. Use all your x and y measurements not means of replicates
  3. Arrange the data so that x values are in increasing order
  4. Choose a sensible filename for the file, e.g. dataset.001
  5. Choose a meaningful title, e.g. Experiment 10 with Lipase
  6. Select the mode of use options then type in the x,y values
  7. Check a plot of the data and table for obvious mistakes
  8. Use the editor to correct any mistakes in your x,y values
  9. Add any details to the end of the file that may be useful later.
    Some programs need extra information at this point in the file to run in EXPERT mode, e.g. start estimates.
  10. Write an output file containing x, y and all s values = 1
  11. Keep this file as a main master file for further editing
  12. To make corrections, add/delete lines/files of data, swap replicates and s for mean-y and standard errors, edit the title/text, replace s by percentages of absolute y, alter baselines or units used for x,y, etc. use program EDITFL.
  13. To use dilution mode, input the dilution factors as in 1, 10, 100 for 1:1, 1:10, 1:100, etc.

Adding parameter starting estimates and limits

The simple SIMFIT curve fitting programs such as POLNOM, MMFIT, SFFIT, EXFIT, HLFIT, GCFIT, etc. are dedicated to well-defined specialised models where the data supplied can be used to calculate sensible starting estimates and parameter limits. However, in order to fit data sets using the advanced SIMFIT curve fitting programs such as QNFIT and DEQSOL, users must supply files with an additional section after the data defining the starting parameter estimates and limits. These values are necessary in order to scale the constrained nonlinear regression routines so that the internal parameters are of order unity at the solution point.

For instance, suppose there are four parameters to be estimated and it is believed that the best-fit values would be something like

P(1) approximately = 1
P(2) approximately = 10
P(3) approximately = -100
P(4) approximately = 0.
Then a section like the following would have to be added in order to run in EXPERT mode, where starting estimates and limits are read from the data file. Each line gives the lower limit, starting estimate, and upper limit for one parameter.
begin{limits}
  -5.0    1.0    5.0
   5.0   10.0   15.0 
-150.0 -100.0  -50.0
 -10.0    0.0   10.0
end{limits}
Program MAKFIL allows you to append such a section to the data file if you intend to use the SIMFIT advanced curve fitting programs to analyse your data. A default set is initialised, which you can then edit as appropriate for your needs.
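
If such a section is prepared by hand, a short Python sketch like the following can help avoid inconsistent entries. The function name limits_section is hypothetical, the column order (lower limit, starting estimate, upper limit) is read off the example above, and the column widths are an arbitrary choice for this sketch.

```python
def limits_section(triples):
    """Format (lower, start, upper) triples as a begin{limits} ... end{limits} block."""
    lines = ["begin{limits}"]
    for k, (lower, start, upper) in enumerate(triples, start=1):
        # Each starting estimate must lie within its own limits.
        if not lower <= start <= upper:
            raise ValueError(f"P({k}): need lower <= start <= upper")
        lines.append(f"{lower:8.1f}{start:8.1f}{upper:8.1f}")
    lines.append("end{limits}")
    return "\n".join(lines)


# The four-parameter example from the text.
section = limits_section([(-5.0, 1.0, 5.0),
                          (5.0, 10.0, 15.0),
                          (-150.0, -100.0, -50.0),
                          (-10.0, 0.0, 10.0)])
print(section)
```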