Simfit data file formats

The standard Simfit data file format, and how to add trailer sections with additional information, e.g. labels for plotting, starting estimates and limits for curve fitting, or variables to be suppressed in multivariate analysis.

Importing data into Simfit

All Simfit procedures that need data for curve fitting, statistical analysis, or plotting require a rectangular matrix of data values, such as the following table.
 1.1   1.2   1.3   1.4
 2.1   2.2   2.3   2.4
 3.1   3.2   3.3   3.4
 4.1   4.2   4.3   4.4
This table is a rectangular array of numbers with no missing values. Sometimes, however, e.g. in ANOVA, incomplete matrices are acceptable, where not all columns have the same number of rows. Commas are then required as separators between column values so that empty cells can be recognised.
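The reason commas matter can be seen in a short Python sketch (the function `parse_row` is purely illustrative, not part of Simfit). Splitting on commas keeps an empty field in its own column, whereas splitting on whitespace silently drops it and shifts the later values to the left.

```python
def parse_row(line):
    """Split one comma-separated row; an empty field becomes None (missing)."""
    values = []
    for field in line.split(","):
        field = field.strip()
        values.append(float(field) if field else None)
    return values

print(parse_row("1.1, , 1.3, 1.4"))   # [1.1, None, 1.3, 1.4]
# Without commas the gap is invisible: only three values survive.
print([float(v) for v in "1.1      1.3  1.4".split()])   # [1.1, 1.3, 1.4]
```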

Data values in Simfit data files can use integer, floating point, or scientific notation. Integer notation can be used for whole numbers such as 1, 7, -4, 23. Floating point notation is used for medium-sized values like 1.0, 7.0, -7.0, -4.0, 23.0. Scientific notation can be used for large or small numbers, but is also useful to indicate the number of significant digits. Here the number is followed by the symbol E+xyz, which means multiplying by 10 to the power xyz, as in

 1.234E+00 = 1.234 (because E+00 is 10 to the power 0, i.e. 1)
 1.234E+03 = 1234.0
 1.234E-03 = 0.001234 
 
Another useful feature of scientific notation is that it allows tables of figures to be listed where the order of magnitude can be seen at a glance.
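Python's built-in float() reads the E notation described above, and a format specification can write it back. This small sketch (the helper `to_sci` is hypothetical) reproduces the three conversions and shows how the format fixes the number of significant digits.

```python
def to_sci(value, sig=4):
    """Write a number in E notation with `sig` significant digits."""
    return f"{value:.{sig - 1}E}"

for text in ["1.234E+00", "1.234E+03", "1.234E-03"]:
    value = float(text)            # E+xyz means multiply by 10**xyz
    print(f"{text} = {value} -> {to_sci(value)}")

print(to_sci(1234.0, sig=2))       # 1.2E+03: fewer significant digits
```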

It is far better to prepare data files for input into Simfit, but the file open dialogue may also offer an option to type data in directly from the keyboard, in which case the only information required is a rectangular array of values. Naturally this option should only be used for small data sets. If it is used, a sequence of files called temporary_data.* (where * is txt, 001, 002, etc.) is created in your usr folder in case you wish to archive any typed-in data; the sequence starts again from scratch with each new session.

By far the easiest way to import such data into Simfit is to highlight a table (with or without labels) in your spreadsheet, and then copy the selected table to the clipboard. When Simfit requests new data the [Paste] button will then become activated, and by pressing this the table will be written to a temporary file in Simfit format for analysis.

Other possibilities are to export a data file in comma-separated-variable, tab-separated-variable, XML or HTML formats from your spreadsheet, as these can be read into Simfit program Maksim to be transformed into data files in Simfit format. Macros such as simfit6.xls are provided to export Simfit data files from Excel.
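If you prefer to script the conversion instead of using Maksim or the clipboard, the Python sketch below reads a purely numeric CSV export with the standard csv module and writes the basic Simfit layout: title, row and column counts, the data rows, and an optional trailer. The function name is hypothetical, this is not how Maksim works internally, and it does not handle label rows or empty cells.

```python
import csv
import io

def csv_to_simfit(csv_text, title, extra_lines=()):
    """Convert a numeric, rectangular CSV table into Simfit file text.

    Output: title, "rows columns", one line per data row, then (only if
    extra_lines is non-empty) a count followed by the extra lines.
    """
    rows = [[float(v) for v in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    ncols = len(rows[0])
    if any(len(r) != ncols for r in rows):
        raise ValueError("table is not rectangular")
    out = [title, f"{len(rows)} {ncols}"]
    out += [" ".join(str(v) for v in r) for r in rows]
    if extra_lines:
        out.append(str(len(extra_lines)))
        out.extend(extra_lines)
    return "\n".join(out) + "\n"

print(csv_to_simfit("1,2,3,4\n5,6,7,8\n", "Simple example",
                    ["An extra line", "Another extra line"]))
```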


Simfit file formats

Frequently it is necessary to import extra data into Simfit, such as starting estimates for K-means clusters, or parameter limits for constrained curve fitting, and for this purpose it is much better to use text files in Simfit file format, where the extra data values are appended to the data file. Such files can easily be made using a text editor, such as Notepad, or using the suite of Simfit programs provided for this purpose such as:
 Makfil ... prepare curve fitting files
 Editfl ... edit curve fitting files
 Makmat ... prepare rectangular data tables
 Editmt ... edit rectangular data tables
 Maklib ... make a library file
 Maksim ... transform a table into a Simfit format file
These programs have a great many features that make Simfit file preparation and editing much simpler, such as making sure X-values are in nondecreasing order or weights are nonnegative in curve fitting files. They can also be used to join files, transpose data sets, delete/add rows/columns, etc.

The Simfit file format is extremely simple: a title, a header with row and column dimensions, the data table, then an optional trailer section, as in the next example (where comments are added after the symbol ...).

 Simple example     ... the title
 2 4                ... the header (2 rows and 4 columns)
 1 2 3 4            ... row 1
 5 6 7 8            ... row 2
 2                  ... indicates 2 extra lines
 An extra line      ... extra line 1
 Another extra line ... extra line 2
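A reader for this layout needs only a few lines. The Python sketch below (the function `read_simfit` is hypothetical) assumes a complete table with values separated by spaces or commas, and treats the trailer count line as optional.

```python
def read_simfit(text):
    """Parse Simfit file text into (title, rows, trailer_lines)."""
    lines = text.splitlines()
    title = lines[0].strip()
    nrows, ncols = (int(v) for v in lines[1].replace(",", " ").split())
    rows = []
    for line in lines[2:2 + nrows]:
        values = [float(v) for v in line.replace(",", " ").split()]
        if len(values) != ncols:
            raise ValueError(f"expected {ncols} values, got {len(values)}")
        rows.append(values)
    trailer = []
    rest = lines[2 + nrows:]
    if rest and rest[0].strip():          # optional count of extra lines
        count = int(rest[0])
        trailer = rest[1:1 + count]
    return title, rows, trailer

example = """Simple example
2 4
1 2 3 4
5 6 7 8
2
An extra line
Another extra line
"""
title, rows, trailer = read_simfit(example)
print(title, rows, trailer)
```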
Simfit can also import large numbers of files at the same time, which is very useful, e.g. for fitting components of systems of differential equations, or for plotting many files simultaneously. This uses a library file or, alternatively, the project archive technique.

Sometimes the rows and columns will require labels for plotting as in the next table.

 Row/Col Column_1  Column_2 Column_3 Column_4
 Row_1        1.1       1.2      1.3      1.4
 Row_2        2.1       2.2      2.3      2.4
 Row_3        3.1       3.2      3.3      3.4
 Row_4        4.1       4.2      4.3      4.4
However, note that there must be a dummy label in the top left-hand cell, and row and column labels must not contain any spaces. The numbers can be in any format, e.g. 1, 1.0, 1.0E+00 for one, and the columns can be separated by commas or tabs if you want.
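The dummy top-left label and the no-spaces rule mean that every line splits into the same number of tokens, with the first token always a label. A Python sketch of that reading (the function name is illustrative only):

```python
def parse_labelled_table(text):
    """Return (column_labels, row_labels, values) from a labelled table.

    Commas and tabs are treated like spaces; the first token of the
    header line is the dummy top-left label and is discarded.
    """
    lines = [ln.replace(",", " ").split() for ln in text.strip().splitlines()]
    col_labels = lines[0][1:]
    row_labels, values = [], []
    for tokens in lines[1:]:
        if len(tokens) != len(col_labels) + 1:
            raise ValueError(f"row {tokens[0]!r} has the wrong number of values")
        row_labels.append(tokens[0])
        values.append([float(v) for v in tokens[1:]])
    return col_labels, row_labels, values

table = """Row/Col Column_1 Column_2
Row_1   1.1      1.2
Row_2   2.1      2.2"""
print(parse_labelled_table(table))
```

A label such as "Row 1" would split into two tokens, so that row would be rejected as having the wrong number of values.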


Example 1: vector.tf1 (5 rows and 1 column, i.e. a vector)

This simply illustrates a title, then a header with the row and column dimensions, followed by one column of data, and finally an unused trailer.
Test file vector.tf1: Vector with components 1, 2, 3, 4, 5
     5     1
  1.0
  2.0
  3.0
  4.0
  5.0
     1
Default line


Example 2: matrix.tf1 (5 rows and 5 columns, i.e. a matrix)

This time the data set is a matrix.
Test file matrix.tf1: arbitrary 5 by 5 matrix
     5     5
           1.2           4.5           6.1           7.2           8.0
           3.0           5.6           3.7           9.1          12.5
          17.1          23.4           5.5           9.2           3.3
          7.15          5.87          9.94          8.82          10.8
          12.4           4.3           7.7          8.95           1.6
     1
Default line

Example 3: binomial.tf3 (5 rows and 3 columns of integers)

Now the entries are integers and the trailer is used to describe the meaning of the observations.
Test file binomial.tf3: y,N,x for analysis of proportions
5, 3
  23,  84, 1
  12,  78, 2
  31, 111, 3
  65,  92, 4
  71,  93, 5
3
Column 1 = y, no. successes
Column 2 = N, no. Bernoulli trials, N >= y >= 0
Column 3 = x, x-coordinates for plotting
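The constraint N >= y >= 0 stated in the trailer is easy to check before analysis. This Python sketch (hypothetical helper) checks every row and pairs each x with the observed proportion y/N; rejecting N = 0 is an extra assumption, made because y/N would then be undefined.

```python
# Rows of binomial.tf3 as (y, N, x)
data = [(23, 84, 1), (12, 78, 2), (31, 111, 3), (65, 92, 4), (71, 93, 5)]

def proportions(rows):
    """Check N >= y >= 0 on every row and return (x, y/N) pairs.

    N = 0 is also rejected here (an assumption), because y/N
    would be undefined.
    """
    result = []
    for y, n, x in rows:
        if not n >= y >= 0 or n == 0:
            raise ValueError(f"invalid row: y = {y}, N = {n}")
        result.append((x, y / n))
    return result

for x, p in proportions(data):
    print(f"x = {x}: y/N = {p:.3f}")
```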


Example 4: cluster.tf1 (illustrating begin{labels} ... end{labels})

In this example the trailer contains the labels to be plotted. To use this method, the labels must be the entries in a begin{labels} ... end{labels} structure. Also appended to the file is an explanation of what each line of the file means.
Test file cluster.tf1: Cluster analysis data, e.g. dendrogram
    12   8
 1   4   2  11   6   4   3   9
 8   5   1  14  19   7  13  21
 3   1   3   1   3   6  23  37
 9   0   7   7   1   2  21   2
 7  12   9   5  14   9  12  14
 2  13  15   2  23   6  34   8
11   7   2   1   4  17  11   4
 6   3   7  12  11   8   8   0
 8  21   1  10  31   9   3  18
19  14  12   9  16  10   0  27
17  18  10   6  19  14   1  24
15  21   8   7  17  12   4  22
    28
begin{labels}
A-1
B-2
C-3
D-4
E-5
F-6
G-7
H-8
I-9
J-10
K-11
L-12
V1
V2
V3
V4
V5
V6
V7
V8
end{labels}
Meaning of the above 37 lines:-
Line 1 = dendrogram title
Line 2 = no. rows (cases) no. columns (variables)
Lines 3 to 14 =  data matrix
Line 15 = no. extra lines (not necessary, can be omitted)
Lines 16 to 37 = labels, from begin{labels} to end{labels}
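Picking the labels out of a trailer only requires scanning for the two markers. In the Python sketch below (function name hypothetical), splitting the 20 labels into 12 row labels and 8 column labels is our reading of this particular file, which has 12 rows and 8 columns, not a documented rule.

```python
def extract_labels(trailer_lines):
    """Return the lines between begin{labels} and end{labels}, stripped."""
    labels, inside = [], False
    for line in trailer_lines:
        token = line.strip()
        if token.startswith("begin{labels}"):
            inside = True
        elif token.startswith("end{labels}"):
            break
        elif inside:
            labels.append(token)
    return labels

# Trailer of cluster.tf1, rebuilt programmatically to save space
trailer = (["begin{labels}"]
           + [f"{c}-{i}" for i, c in enumerate("ABCDEFGHIJKL", start=1)]
           + [f"V{j}" for j in range(1, 9)]
           + ["end{labels}", "Meaning of the above 37 lines:-"])
labels = extract_labels(trailer)
rows, cols = labels[:12], labels[12:]   # assumed split: 12 cases, then 8 variables
print(rows[0], rows[-1], cols)
```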


Example 5: piechart.tf1 (advanced pie chart)

Normal pie charts can be plotted from a simple vector of nonnegative numbers proportional to segment areas, but here a method for creating advanced pie charts is illustrated.
Advanced pie chart 1:  fill styles
   10     4
  1.0,  1.0, 0.0, 15
  1.0,  2.0, 0.0, 15
  1.0,  3.0, 0.0, 15
  1.0,  4.0, 0.0, 15
  1.0,  5.0, 0.0, 15
  1.0,  6.0, 0.0, 15
  1.0,  7.0, 0.0, 15
  1.0,  8.0, 0.0, 15
  1.0,  9.0, 0.0, 15
  1.0, 10.0, 0.0, 15
    12
begin{labels}
Style 1
Style 2
Style 3
Style 4
Style 5
Style 6
Style 7
Style 8
Style 9
Style 10
end{labels}
1) Explanation of the above data values
line 1: title 

line 2: m = no. rows, n = no. columns 

lines 3 onwards:

   Each line of data looks like this:-
   A, B, C, D
   and has the following interpretation:-
    A = value for the sector (>= 0)
    B = fill style for the sector   (between 0 and 10)
    C = displacement for the sector (between 0 and 1)
    D = colour for the sector       (between 0 and 71)

line m + 3: no. of meaningful trailing text lines

a) begin{labels} indicates the start of the segment labels
b) the next m lines are the segment labels
c) end{labels} indicates the end of the segment labels 
2) More about A, B, C and D
A: values =< 0 are not allowed

B = 0  no display
B = 1  empty
B = 2  solid
B = 3  upward diagonals
B = 4  downward diagonals
B = 5  criss cross
B = 6  horizontal
B = 7  vertical
B = 8  dashes
B = 9  dots
B = 10 dots-dashes

C = 0  minimum displacement
C = ?  fractional displacement if between 0 and 1
C = 1  full displacement

D = 0  Black
D = 1  Blue
D = 2  Green
D = 3  Cyan
D = 4  Red
D = 5  Magenta
D = 6  Brown
D = 7  White
D = 8  Dark Gray
D = 9  Light Blue
D = 10 Light Green
D = 11 Light Cyan
D = 12 Light Red
D = 13 Light Magenta
D = 14 Yellow
D = 15 Intense White, etc. (see w_ps.cfg)
3) More about segment labels

It is also possible to add the labels directly after line m + 3, without begin{labels} ... end{labels}, as long as the integer on line m + 3 is >= m.

4) Using a vector file directly

There is also a short method to plot pie charts.

For instance, if you input a short column vector containing only nonnegative values, a default advanced pie chart file will be generated from it and plotted directly.
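The A, B, C, D constraints listed above can be checked before plotting. The Python sketch below (hypothetical helper, not a Simfit routine) follows the stricter rule in part 2 that A must be greater than 0, and treats B and D as whole-number codes, since the tables list only integers.

```python
FILL_STYLES = ["no display", "empty", "solid", "upward diagonals",
               "downward diagonals", "criss cross", "horizontal",
               "vertical", "dashes", "dots", "dots-dashes"]

def check_pie_row(a, b, c, d):
    """Validate one A, B, C, D row and return a short description."""
    if a <= 0:
        raise ValueError("A must be > 0")
    if b != int(b) or not 0 <= b <= 10:
        raise ValueError("B must be a whole number from 0 to 10")
    if not 0.0 <= c <= 1.0:
        raise ValueError("C must lie between 0 and 1")
    if d != int(d) or not 0 <= d <= 71:
        raise ValueError("D must be a whole number from 0 to 71")
    return f"value {a}, fill '{FILL_STYLES[int(b)]}', displacement {c}, colour {int(d)}"

# Second data row of piechart.tf1
print(check_pie_row(1.0, 2.0, 0.0, 15))
```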


Example 6: barchart.tf1 (advanced bar chart)

Normal bar charts can be plotted from a simple matrix where the columns represent groups and the rows independent sets of observations, but here the technique for creating an advanced bar chart is illustrated.
Advanced barchart 1: box and whisker plot
     5     9
  1.0, -2.0, -1.0,  0.0,  2.0,  3.0, 1.0, 1.0, 15.0
  3.0, -1.0,  0.0,  2.0,  4.0,  5.0, 1.0, 1.0, 15.0
  5.0,  1.0,  2.0,  3.0,  5.0,  6.0, 1.0, 1.0, 15.0
  7.0,  0.0,  1.0,  2.0,  4.0,  5.0, 1.0, 1.0, 15.0
  9.0,  1.0,  3.0,  5.0,  6.0,  7.0, 1.0, 1.0, 15.0
     7
begin{labels}
January
February
March
April
May
end{labels}
1) Explanation of the above data values
line 1: title

line 2: m = no. rows, n = no. columns

lines 3 onwards:
   Each line of data looks like this:-
   x, y1, y2, y3, y4, y5, f, w, c
   and has the following interpretation:-
    x = x coordinate for the bar (x in nondecreasing order)
   y1 = y for bottom of range    (i.e. bottom error bar)
   y2 = y for lower quartile     (i.e. bottom of box)
   y3 = y for median of data     (i.e. divider for box)
   y4 = y for upper quartile     (i.e. top of box)
   y5 = y for top of range       (i.e. top error bar)
    f = fill-style               (between 0 and 10)
    w = width                    (between 0 and 1)
    c = colour                   (between 0 and 71)

line m + 3: no. of important trailing text lines 

In this case the first m + 2 of these are as follows: 
a) first of all begin{labels} to indicate the start of the labels,
b) then the consecutive labels for data rows 1 to m, 
c) then finally end{labels} to indicate the end of the labels. 
2) Generating advanced bar chart files interactively

This advanced bar chart format is designed so that arbitrary bar charts with any possible configuration of stacks, overlaps, different box widths, hanging boxes, etc., can be created.

But note that simple default box and whisker and bar chart plots can also be generated directly from rectangular data matrices, using the exhaustive analysis of a matrix technique.

For such box and whisker plots, the boxes and whiskers are generated from the median and quartiles of the column values along each row, while for bar charts the column values along each row are taken to be values for the successive bars within each group. When a matrix is plotted in this way, an advanced bar chart file like this one is generated automatically.

3) The allowed values for x, y, f, w and c

x: consecutive values lead to adjacent bars in the plot, intervals of
   1 or more generate spaces between bars, identical x values lead to
   stacking
y: by setting appropriate y values you can suppress error and median
   bars and so create bars or stacks of bars with any y coordinates
   you like
f = 0  no display
f = 1  empty
f = 2  solid
f = 3  upward diagonals
f = 4  downward diagonals
f = 5  criss cross
f = 6  horizontal
f = 7  vertical
f = 8  dashes
f = 9  dots
f = 10 dots-dashes
w = 0  minimum width bar
w = ?  fractional width of the bar if between 0 and 1
w = 1  full width bar
c = 0  Black
c = 1  Blue
c = 2  Green
c = 3  Cyan
c = 4  Red
c = 5  Magenta
c = 6  Brown
c = 7  White
c = 8  Dark Gray
c = 9  Light Blue
c = 10 Light Green
c = 11 Light Cyan
c = 12 Light Red
c = 13 Light Magenta
c = 14 Yellow
c = 15 Intense White, etc. (see w_ps.cfg)
4) Details about labels

Note that, as long as the labels follow directly after line m + 3 and the integer on this line is at least m, begin{labels} and end{labels} can be omitted. Also, if the advanced bar chart file has m rows which are to be interpreted as k groups of l bars, so that m = k*l, then there must still be m labels: the first k are the labels for the groups and the rest are arbitrary, except that label m must be l*bars/group to indicate that there are l bars in each of the k groups.
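The automatic conversion described in part 2, where a box and whisker row is built from the range, quartiles, and median of a set of values, can be sketched in Python. Simfit's own quartile definition is not given here, so this sketch uses statistics.quantiles with its default method, which may give slightly different quartiles; the defaults f = 1, w = 1.0, c = 15 simply copy barchart.tf1, and the function name is hypothetical.

```python
import statistics

def box_row(x, values, fill=1, width=1.0, colour=15):
    """Build one advanced bar chart row (x, y1..y5, f, w, c) from raw values.

    y1/y5 = range, y2/y4 = quartiles, y3 = median. Quartiles come from
    statistics.quantiles' default method, which may differ from Simfit's.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (x, min(values), q1, median, q3, max(values), fill, width, colour)

row = box_row(1.0, [-2, -1, 0, 2, 3])
print(", ".join(f"{v:.1f}" for v in row))
# 1.0, -2.0, -1.5, 0.0, 2.5, 3.0, 1.0, 1.0, 15.0
```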


Example 7: kmeans.tf1 (various uses of begin{} ... end{})

When the begin{} ... end{} technique is used, the extra data can be placed anywhere in the trailer section.
Data for 5 variables on 20 soils (G03EFF, Kendall and Stuart)
20 5
77.3 13.0  9.7 1.5 6.4
82.5 10.0  7.5 1.5 6.5
66.9 20.6 12.5 2.3 7.0
47.2 33.8 19.0 2.8 5.8
65.3 20.5 14.2 1.9 6.9
83.3 10.0  6.7 2.2 7.0
81.6 12.7  5.7 2.9 6.7
47.8 36.5 15.7 2.3 7.2
48.6 37.1 14.3 2.1 7.2
61.6 25.5 12.9 1.9 7.3
58.6 26.5 14.9 2.4 6.7
69.3 22.3  8.4 4.0 7.0
61.8 30.8  7.4 2.7 6.4
67.7 25.3  7.0 4.8 7.3
57.2 31.2 11.6 2.4 6.5
67.2 22.7 10.1 3.3 6.2
59.2 31.2  9.6 2.4 6.0
80.2 13.2  6.6 2.0 5.8
82.2 11.1  6.7 2.2 7.2
69.7 20.7  9.6 3.1 5.9
39
Usage:
Select statistics, then run program simstat, choose
multivariate statistics, then go to K-means clustering

The next line defines the starting clusters for k = 3 
begin{values} <-- token to flag start of appended values
82.5 10.0  7.5 1.5 6.5
47.8 36.5 15.7 2.3 7.2
67.2 22.7 10.1 3.3 6.2
end{values}

The next line defines the variables as 1 = include, 0 = suppress
begin{indicators} <-- token to flag start of indicators
1 1 1 1 1
end{indicators}

The next line defines the row labels for plotting
begin{labels} <-- token to flag start of row labels
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
end{labels}
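Because the markers, not the line positions, identify each block, a reader can scan the whole trailer and collect every begin{name} ... end{name} section, ignoring free comment lines and any text after the begin token on the same line. A Python sketch using the standard re module (the function name is hypothetical):

```python
import re

BEGIN = re.compile(r"^\s*begin\{(\w+)\}")
END = re.compile(r"^\s*end\{(\w+)\}")

def trailer_sections(trailer_lines):
    """Map each begin{name} ... end{name} block to its stripped lines."""
    sections, current = {}, None
    for line in trailer_lines:
        begin, end = BEGIN.match(line), END.match(line)
        if current is None and begin:
            current = begin.group(1)          # text after the token is ignored
            sections[current] = []
        elif current is not None and end and end.group(1) == current:
            current = None
        elif current is not None:
            sections[current].append(line.strip())
    return sections                           # lines outside blocks are comments

trailer = [
    "The next line defines the starting clusters for k = 3",
    "begin{values} <-- token to flag start of appended values",
    "82.5 10.0  7.5 1.5 6.5",
    "47.8 36.5 15.7 2.3 7.2",
    "67.2 22.7 10.1 3.3 6.2",
    "end{values}",
    "The next line defines the row labels for plotting",
    "begin{labels} <-- token to flag start of row labels",
    "A", "B", "C",
    "end{labels}",
]
sections = trailer_sections(trailer)
centres = [[float(v) for v in row.split()] for row in sections["values"]]
print(len(centres), "starting clusters;", sections["labels"])
# 3 starting clusters; ['A', 'B', 'C']
```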

Example 8: gauss3.tf1 (adding starting estimates and parameter limits)

The advanced curve-fitting program qnfit must have starting estimates and parameter limits to use in expert mode model fitting, and this example shows how they are supplied using begin{limits} ... end{limits}. Each limit triple must be in nondecreasing order, i.e.
[lower limit] =< [starting estimate] =< [upper limit]
QNFIT EXPERT mode file: 3 Gaussians plus 7.5% relative error
   150     3
 -3.0000E+00,  4.1947E-03,  8.5276E-04
 -3.0000E+00,  5.8990E-03,  8.5276E-04
 -3.0000E+00,  4.9928E-03,  8.5276E-04
 -2.6330E+00,  1.3290E-02,  1.0403E-03
 -2.6330E+00,  1.3199E-02,  1.0403E-03
 -2.6330E+00,  1.5045E-02,  1.0403E-03
... 
... 138 rows suppressed for clarity
...
  1.4630E+01,  3.9144E-02,  1.7401E-03
  1.4630E+01,  3.7181E-02,  1.7401E-03
  1.4630E+01,  4.0651E-02,  1.7401E-03
  1.5000E+01,  3.3475E-02,  2.4596E-03
  1.5000E+01,  3.4022E-02,  2.4596E-03
  1.5000E+01,  2.9515E-02,  2.4596E-03
    27
Data from program MAKDAT using the model:
Sum of n Gauss pdfs: Z = sqrt(2pi), C = p(3n+1)
[p(1)/Zp(2n+1)]exp{-0.5[(x - p(n+1))/p(2n+1)]^2} +
[p(2)/Zp(2n+2)]exp{-0.5[(x - p(n+2))/p(2n+2)]^2} +...+
[p(n)/Zp(3n)]exp{-0.5[(x - p(2n))/p(3n)]^2} + C
 p( 1) =  1.000E+00
 p( 2) =  1.000E+00
 p( 3) =  1.000E+00
 p( 4) =  0.000E+00
 p( 5) =  4.000E+00
 p( 6) =  1.000E+01
 p( 7) =  1.000E+00
 p( 8) =  2.000E+00
 p( 9) =  3.000E+00
 p(10) =  0.000E+00
begin{limits}
  0,  1,  2
  0,  1,  2
  0,  1,  2
 -2,  0,  2
  2,  4,  6
  8, 10, 12
0.1,  1,  2
  1,  2,  3
  2,  3,  4
  0,  0,  0
end{limits}
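The model in this trailer, and the nondecreasing rule for the limit triples, can both be written out in a few lines of Python. This is a sketch for checking a file, not qnfit's implementation; the function names are hypothetical, and the 1-based p(k) of the text becomes p[k - 1] in the code.

```python
import math

def gauss_sum(x, p, n):
    """Sum of n Gaussian pdfs plus a constant, as in the gauss3.tf1 trailer.

    p(1..n) = areas, p(n+1..2n) = means, p(2n+1..3n) = standard deviations,
    p(3n+1) = the constant C; p(k) in the text is p[k - 1] here.
    """
    z = math.sqrt(2.0 * math.pi)
    total = p[3 * n]                               # C = p(3n+1)
    for i in range(n):
        area, mean, sd = p[i], p[n + i], p[2 * n + i]
        total += area / (z * sd) * math.exp(-0.5 * ((x - mean) / sd) ** 2)
    return total

def check_limits(triples):
    """Return 1-based numbers of (lower, start, upper) triples that are out of order."""
    return [k for k, (lo, start, hi) in enumerate(triples, 1)
            if not lo <= start <= hi]

# Parameter values and limits listed in the trailer (n = 3, 3n + 1 = 10)
p_true = [1.0, 1.0, 1.0, 0.0, 4.0, 10.0, 1.0, 2.0, 3.0, 0.0]
limits = [(0, 1, 2), (0, 1, 2), (0, 1, 2), (-2, 0, 2), (2, 4, 6),
          (8, 10, 12), (0.1, 1, 2), (1, 2, 3), (2, 3, 4), (0, 0, 0)]

print(check_limits(limits))                        # [] : every triple is valid
for x in (0.0, 4.0, 10.0):
    print(x, round(gauss_sum(x, p_true, 3), 4))
```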

Example 9: manova1.tf1 (adding group numbers in column 1)

The group identifiers must be integers in ascending order starting with 1.
MANOVA data: 2 groups, 5 variables (details at end of file)
 10 6
 1  11  18  15  18  15
 1  33  27  31  21  17
 1  20  28  27  23  19
 1  18  26  18  18   9
 1  22  23  22  16  10
 2  18  17  20  18  18
 2  31  24  31  26  20
 2  14  16  17  20  17
 2  25  24  31  26  18
 2  36  28  24  26  29
22
Data from Chatfield C and Collins A J
Introduction to multivariate analysis
Chapman and Hall 1980 table 7.3
column 1 = group
columns 2 to 6 = variables 1 to 5
The next section defines variables to include (1) or exclude (0)
begin{indicators}
 1  1  1  1  1
end{indicators}
The next section defines labels to identify cases in plots
begin{labels}
1.1
1.2
1.3
1.4
1.5
2.1
2.2
2.3
2.4
2.5
end{labels}
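The rule for the group column can be checked mechanically. In this Python sketch (hypothetical helper), rejecting gaps such as 1, 1, 3 is our own assumption about what "ascending order starting with 1" requires; the function returns the number of cases in each group.

```python
def check_groups(first_column):
    """Check that group numbers are integers, start at 1 and ascend.

    Gaps (e.g. 1, 1, 3) are also rejected, which is an assumption.
    Returns the number of cases in each group.
    """
    groups = [int(g) for g in first_column]
    if any(g != float(v) for g, v in zip(groups, first_column)):
        raise ValueError("group identifiers must be integers")
    if groups[0] != 1:
        raise ValueError("groups must start with 1")
    for prev, cur in zip(groups, groups[1:]):
        if cur < prev or cur > prev + 1:
            raise ValueError("groups must ascend 1, 2, 3, ... without gaps")
    return [groups.count(k) for k in range(1, groups[-1] + 1)]

# Column 1 of manova1.tf1
print(check_groups([1, 1, 1, 1, 1, 2, 2, 2, 2, 2]))   # [5, 5]
```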

Example 10: surface.tf1 (plotting a 3D surface or contours)

This illustrates how to supply the z = f(x,y) data as a vector, but a matrix of values can also be used.
z = f(x,y) data for SIMPLOT: surface with 4 peaks
  1606     1
  40
  40
  0.0000E+00
  1.0000E+00
  0.0000E+00
  1.0000E+00
  1.0552E-08
  4.7420E-08
...  
... 1593 data values suppressed for clarity
...
  2.8376E-03
  1.9514E-03
  1.2821E-03
  8.0478E-04
  4.8261E-04
     13
The format for z(x,y) expressed as a vector to plot a surface, 
contours, or a 3D barchart, assuming regular grid intervals. 
Line 1 = title
Line 2 = no. of rows,  no. of columns
Line 3 = NX, no. of X-divisions
Line 4 = NY, no. of Y-divisions 
Line 5 = X-start
Line 6 = X-end
Line 7 = Y-start
Line 8 = Y-end
Next NX*NY lines are values for z = f(x,y) stored with  
successive values for y = y(1) and x = x(1), ..., x(NX),
then y = y(2) and x = x(1), ..., x(NX), etc. 
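Because x varies fastest, row j of the z grid is simply the j-th run of NX consecutive values. The Python sketch below (hypothetical helper) rebuilds the grid; it assumes the grid points are equally spaced and include both end points, and that NX and NY are at least 2. For surface.tf1, 6 header values plus 40*40 = 1600 z values gives the 1606 rows stated on line 2.

```python
def vector_to_grid(values):
    """Turn a surface.tf1-style vector into x, y coordinates and a z grid.

    values[0:6] = NX, NY, X-start, X-end, Y-start, Y-end; then NX*NY z values
    with x varying fastest. Assumes equally spaced points including the ends.
    """
    nx, ny = int(values[0]), int(values[1])
    x0, x1, y0, y1 = values[2:6]
    z = values[6:]
    if len(z) != nx * ny:
        raise ValueError(f"expected {nx * ny} z values, found {len(z)}")
    xs = [x0 + (x1 - x0) * i / (nx - 1) for i in range(nx)]
    ys = [y0 + (y1 - y0) * j / (ny - 1) for j in range(ny)]
    # grid[j][i] is z at (xs[i], ys[j]): each row holds one y value
    grid = [z[j * nx:(j + 1) * nx] for j in range(ny)]
    return xs, ys, grid

xs, ys, grid = vector_to_grid([3, 2, 0.0, 1.0, 0.0, 1.0, 1, 2, 3, 4, 5, 6])
print(xs, ys, grid)   # [0.0, 0.5, 1.0] [0.0, 1.0] [[1, 2, 3], [4, 5, 6]]
```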

Example 11: images.tfl (library file)

This is a technique for collecting together files of similar type that are required by Simfit. It is more powerful than multiple file selection because, if the library file has been created by program Maklib, all the files are certain to exist and to be of a consistent type.

This file can be used by program editps to create a collage of images, but note that, in general, fully qualified path/filenames must be supplied, not just local file names.

Example of an EPS type library file
waves.eps
rosenbrock.eps
dendrogram.eps
trinom.eps
ukmap.eps
diffusion.eps
rose.eps
gauss3.eps
convolution.eps

Information about Simfit library files:
=======================================
1) Line 1 is an arbitrary title for the library file.
2) Lines  2 to  n + 1 are names of n files grouped together by the
   library file for plotting, statistics, curve-fitting, etc.
3) The first blank line is taken to be the end of the library file
   and everything after the first blank line is ignored.
4) Library files are usually only valid if all n files specified exist.
5) However, library files analysed by some Simfit programs (e.g. qnfit 
   and deqsol) can have % to indicate a missing data set.
6) This is a Simfit test file so local names are given for the
   Simfit test files grouped together for analysis. 
   Your own library files must have fully qualified file names
   i.e. path plus filename as in:

   C:\mydata\mydata.one 
   C:\mydata\mydata.two
   C:\mydata\mydata.three

   and not just local unqualified filenames as in

   mydata.one  
   mydata.two
   mydata.three   
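Rules 1 to 6 translate directly into a small reader. In this Python sketch (hypothetical helpers), a line holding only % is assumed to be how a missing data set is written, and the standard ntpath module is used so that Windows-style paths such as C:\mydata\mydata.one are recognised as fully qualified on any platform.

```python
import ntpath

def read_library(text):
    """Return (title, names) from Simfit library file text.

    Reading stops at the first blank line. A line holding only '%' is
    returned as None (assumed format for a missing data set).
    """
    lines = text.splitlines()
    names = []
    for line in lines[1:]:
        name = line.strip()
        if not name:
            break                      # first blank line ends the library file
        names.append(None if name == "%" else name)
    return lines[0].strip(), names

def unqualified(names):
    """List names that are not fully qualified Windows paths."""
    return [n for n in names if n is not None and not ntpath.isabs(n)]

library = """My library
waves.eps
rosenbrock.eps
%
C:\\mydata\\mydata.one

Information about Simfit library files:
"""
title, names = read_library(library)
print(names)               # ['waves.eps', 'rosenbrock.eps', None, 'C:\\mydata\\mydata.one']
print(unqualified(names))  # ['waves.eps', 'rosenbrock.eps']
```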