Calibration, bioassay, and dose-response curves.
Abstract
Here are some definitions commonly encountered in this branch of
data analysis, but note that there are many other equivalent
definitions which the context should clarify.
- Calibration
This involves fitting an arbitrary model y = f(x) to a
fixed training set of y(x) data measured at exactly known x-values,
usually in order to use such a reference calibration curve to predict x
given new y observations that are not in the training set.
- Bioassay
This means using a reference calibration curve, usually of restricted
type with limited shape possibilities, to predict x given y,
or evaluate y given x (with confidence limits if possible).
- Dose-response curve
This is a special type of best fit calibration curve
(usually monotonic with f(x) equal to zero when x is zero and
f(x) approaching a finite asymptote as x tends to infinity,
or vice versa) that has been
constructed to estimate curve shape parameters, such as half saturation
points or final asymptotes, without necessarily supplying any new data
for prediction.
- EC50
Estimated concentration giving half maximal response in a dose-response curve,
i.e. median effective dose, or ED50.
- IC50
Estimated concentration giving half maximal inhibition in a dose-response curve.
- LD50
Estimated concentration causing fifty percent mortality in a dose-response curve,
i.e. median lethal dose.
- AUC
Estimated area under a best-fit dose-response curve (usually from zero to infinity).
- t-half
Estimated time to half maximal response in a time-response curve.
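When a fitted curve has no closed-form half-maximal point, the EC50 (or t-half) defined above can be located numerically. The following is a minimal sketch in Python, not the method used by any Simfit program: it assumes f(x) increases monotonically from zero towards a known asymptote y_max, and the names half_saturation and two_site, together with the parameter values, are hypothetical.

```python
def half_saturation(f, y_max, lo=0.0, hi=1.0, tol=1e-10):
    """Find x where f(x) = y_max / 2 by bisection.

    Assumes f increases monotonically from f(0) = 0 towards y_max.
    """
    target = 0.5 * y_max
    # Expand the upper bracket until it lies at or beyond the target.
    while f(hi) < target:
        hi *= 2.0
    # Halve the bracket until it is small relative to its position.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_site(x, v1=1.0, k1=1.0, v2=1.0, k2=4.0):
    """Sum of two hyperbolas (illustrative parameter values only)."""
    return v1 * x / (k1 + x) + v2 * x / (k2 + x)


if __name__ == "__main__":
    # The asymptote of two_site is v1 + v2 = 2.
    print(half_saturation(two_site, y_max=2.0))
```

For a single hyperbola the half saturation point is simply the constant K, so a numerical search of this kind is only needed for higher order models.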
Calibration details
Calibration involves one or more of the following procedures.
- Measure responses y at fixed values of x (with replicates, if
possible, to estimate s, the sample standard deviation of y).
- Find a best fit curve y = f(x) to minimise the sum of
weighted squared residuals.
- Supply x-values and predict y with 95% confidence
limits (i.e. evaluation of y = f(x) if required).
- Supply y-values then predict x with 95% confidence
limits (i.e. inverse prediction for x = g(y)).
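As an illustration of the fitting, evaluation and inverse prediction steps listed above, here is a minimal sketch in Python for the simplest case, a weighted straight line. It is not the Simfit implementation: the function names are hypothetical, the weights are assumed to be 1/s^2, the data are invented for the example, and the 95% confidence limits are omitted for brevity.

```python
def fit_weighted_line(x, y, s):
    """Fit y = a + b*x by minimising sum(((y - a - b*x) / s)**2)."""
    w = [1.0 / (si * si) for si in s]          # weights 1/s^2
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def predict_y(a, b, x):
    """Evaluation: y = f(x)."""
    return a + b * x


def predict_x(a, b, y):
    """Inverse prediction: x = g(y), only defined for a non-zero slope."""
    if b == 0.0:
        raise ValueError("zero slope: inverse prediction is undefined")
    return (y - a) / b


if __name__ == "__main__":
    # Invented calibration data with a standard deviation for each y.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.1, 2.9, 5.2, 6.8, 9.1]
    s = [0.1, 0.1, 0.2, 0.2, 0.3]
    a, b = fit_weighted_line(x, y, s)
    print("intercept, slope:", a, b)
    print("y predicted at x = 2.5:", predict_y(a, b, 2.5))
    print("x predicted from y = 6.0:", predict_x(a, b, 6.0))
```

Setting every s to the same value reproduces unweighted regression, which makes it easy to see how much the weighting scheme changes the predictions.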
Sometimes s is known independently, but often it is simply assumed
constant and unweighted regression is used without justification.
Sometimes a deterministic model is used for f(x), e.g. a
sum of logistic or Michaelis-Menten functions, but this
is unwise. Calibration curves arise from the operation of
numerous effects and cannot usually be described by one
simple equation. Use of such equations can lead to biased
predictions and is not recommended. Polynomials are useful
for gentle curves as long as the degree is reasonably low
(say, less than 3 ?) but, for many purposes, a weighted least
squares data smoothing cubic spline is the best choice.
Unfortunately, polynomials and splines can be too flexible and
follow outliers, giving oscillating curves rather than the
data smoothing that is really required. Also, they cannot fit
horizontal asymptotes. You can help in several ways:
- Get good data in the first instance.
- If the data approach horizontal asymptotes, either leave
some data out (it is no use for prediction anyway), or at
least use log(x) rather than x to minimise the problem.
- Experiment with the weighting schemes, polynomial degrees,
spline knots or constraints to find optimum combinations
for your problem.
- Remember that predicted confidence limits depend on the
s values you supply, so get the weighting scheme right.
- You will be warned if f(x) has a turning point since this
can make inverse prediction ambiguous. You can then re-fit
to get a new curve, eliminate bad data points, get new data
etc., or carry on if the feature seems to be harmless.
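The turning point warning in the last item can be understood with a small check of the following kind. This is only a sketch in Python under stated assumptions, not the test Simfit performs: it takes a fitted polynomial with coefficients given lowest degree first, samples the derivative on a grid over the calibration range, and reports whether the sign changes. The function names and the grid size are hypothetical, and a grid can miss two sign changes that fall between neighbouring grid points.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients lowest degree first) by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def poly_derivative(coeffs):
    """Coefficients of the derivative, lowest degree first."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def has_turning_point(coeffs, x_min, x_max, n=1000):
    """Return True if the derivative takes both signs on [x_min, x_max]."""
    d = poly_derivative(coeffs)
    signs = set()
    for i in range(n + 1):
        x = x_min + (x_max - x_min) * i / n
        v = poly_eval(d, x)
        if v > 0.0:
            signs.add(1)
        elif v < 0.0:
            signs.add(-1)
    return len(signs) > 1


if __name__ == "__main__":
    # y = x^2 turns at x = 0, so one y can correspond to two x values.
    print(has_turning_point([0.0, 0.0, 1.0], -1.0, 1.0))
    # y = x + x^2 is monotonic on [0, 5], so inverse prediction is unique.
    print(has_turning_point([0.0, 1.0, 1.0], 0.0, 5.0))
```

If such a check reports a turning point inside the range where inverse prediction is needed, a given y may correspond to more than one x, which is why re-fitting or removing bad data points is suggested.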
Dose-response curve details
It should be pointed out first of all that Simfit is set up to use x
not log(x) as the independent variable, e.g., concentration and not
log(concentration). If a hyperbolic saturation curve is fitted to
y(x) data, then Simfit can automatically plot y as a function of
log(x) after fitting if sigmoidal hardcopy is required in
semilogarithmic format.
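The reason a hyperbola looks sigmoidal in semilogarithmic format can be seen from a short derivation. The symbols V (the asymptote) and K (the half saturation constant) are introduced here for illustration only.

```latex
y = \frac{V x}{K + x} = \frac{V}{1 + K/x},
\qquad u = \ln x
\;\Longrightarrow\;
y = \frac{V}{1 + e^{-(u - \ln K)}} .
```

So, as a function of u = log(x), the hyperbola is a logistic curve that is symmetrical about u = log(K), where y = V/2.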
Dose-response curves are special types of calibration curves
that are usually constructed in order to estimate
such parameters as maximal response, maximum growth rates, maximum
inhibition rates, half saturation points, times to half maximal
response, LD50, etc. It is also usual to require 95% confidence limits
when fitting such curves, and this can lead to complications.
Probably GCFIT is the most versatile Simfit program for this sort of
analysis, but the subject is complicated and controversial so a number
of issues should be considered.
First of all, polynomials or splines can be fitted, and this is sometimes
the best thing to do if the data are very noisy or you have no idea
what would be a satisfactory model. Unfortunately, such curves can
have turning points and do not fit asymptotes well. However, the sort of
models that are often fitted to dose-response curves to accommodate
asymptotes have restricted shapes, which can lead to biased parameter
estimates. For instance, hyperbolic kinetic
or binding curves cannot fit sigmoid curves, and the usual models
fitted to sigmoid curves (arctan, tanh, logit, logistic, probit, etc.)
often fail to fit because they are too symmetrical. Since every case
has to be treated on its own merits, the only way to achieve success
is to experiment by fitting alternative models until the best option
is discovered. Never accept a computer calculated value unless you
have inspected a plot and confirmed that the best-fit curve is
a sensible choice. A list of suggestions follows.
-
The y(x) data are in the form of a hyperbolic binding curve
and the final asymptotic saturation point must be estimated.
Try MMFIT (or HLFIT if you prefer association constants) but note
that you may require a model of order 2 or 3 to get a good fit,
in which case the half saturation point will be calculated
numerically without confidence limits.
-
The y(x) data are in the form of a sigmoidal or non-sigmoidal
binding curve and the data have been normalised to proportions, i.e.
to approach an asymptote of 1, say, at 100% saturation.
Try SFFIT but note that you may require order 2 or 3 for a good
fit, in which case the half saturation point will be calculated
numerically without confidence limits. Program INRATE can be used to fit the
Hill equation with a fixed or variable exponent to sigmoidal
y(x) data.
-
The y(x) data are in the form of a growth curve
and the maximum size at the final asymptote must be estimated.
Program GCFIT is preferred in this situation and it has the
advantage that many models can be fitted, and most can
estimate half time points with 95% confidence limits. Also,
some growth models (e.g. the exponential or logistic) in GCFIT
can analyse decreasing data sets to estimate LD50.
-
The y(x) data are in the form of an inhibitor or isotope displacement curve
and the initial saturation or half inhibition point
must be estimated.
Try MMFIT in the inhibitor/isotope-displacement mode.
-
The y(x) data are in the form of a survival curve and the data
have been normalised to proportions.
Try program GCFIT in survival mode. This is particularly
useful for determining LD50 values by fitting the Weibull
model.
-
The data are not of y(x) form but are in the form of times to failure,
possibly with censoring.
Try program GCFIT in survival time mode to determine the
time to half survival with 95% confidence limits.
-
The data are not of y(x) form but are in the form of numbers
surviving, say x in groups of size N.
This is frequently the case with bioassay when percentiles
(such as LD50) are required, and the simplified GLM procedure
should be used from either GCFIT or LINFIT in GLM mode, or SIMSTAT in
regression or analysis of proportion mode. The logistic and
probit models are very similar, but the complementary log-log
model is more versatile as it is not symmetrical about the
midpoint.
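For the GLM case in the last item above, the LD50 follows directly from the fitted linear predictor once the link function is chosen. The sketch below, in Python, assumes a two-parameter model eta = b0 + b1*x for the proportion responding p (x may be dose or log dose, depending on how the data were entered). The function name ld50 and the coefficient values are hypothetical, and no confidence limits are computed.

```python
import math


def ld50(b0, b1, link="logit"):
    """Return x at which the fitted proportion responding is 0.5.

    logit:   log(p / (1 - p))  = b0 + b1*x, so eta = 0 at p = 0.5
    probit:  inverse normal(p) = b0 + b1*x, so eta = 0 at p = 0.5
    cloglog: log(-log(1 - p))  = b0 + b1*x, so eta = log(log 2) at p = 0.5
    """
    if b1 == 0.0:
        raise ValueError("zero slope: LD50 is undefined")
    if link in ("logit", "probit"):
        eta_half = 0.0
    elif link == "cloglog":
        eta_half = math.log(math.log(2.0))
    else:
        raise ValueError("unknown link: " + link)
    return (eta_half - b0) / b1


if __name__ == "__main__":
    # Invented coefficients, for illustration only.
    print(ld50(-4.0, 2.0, link="logit"))
    print(ld50(-4.0, 2.0, link="cloglog"))
```

With the logit and probit links the midpoint is where the linear predictor is zero, whereas the complementary log-log link places it at log(log 2), which reflects the asymmetry of that model about the midpoint.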
Some advice
- Prepare y(x) type data input files for the calibration curves using
program MAKFIL.
- Prepare time series and GLM type data files using program MAKMAT.
- Prepare x vector type data input files for prediction (evaluation) and
inverse prediction using program MAKMAT.
- If the data form a shallow curve use POLNOM to fit a low
degree (<= 3 ?) polynomial or line.
Just choose degree = 1 for linear regression.
- Test files for POLNOM are polnom.tf1 (calibration), .tf2
(evaluation) and .tf3 (inverse prediction).
- If the data are of a more complicated form use the program
CALCURVE to fit splines, but keep the number of knots to a
minimum and explore alternative axes and weightings.
- Test files for CALCURVE are calcurve.tf1 (calibration),
.tf2 (evaluation) and .tf3 (inverse prediction).
- calcurve.tf1 is set up as a model showing you how to use
program CALCURVE in the expert mode.
- Test files for LD50 by GLM are ld50.tf1 and ld50.tf2.
- Experienced users can fit any model using QNFIT for
calibration, area under curves, etc.