Stats
S-shaped curves (February 12, 2004)
Category: Nonlinear regression
Competitive binding experiments will often need a nonlinear regression
model. This model has to level off at both extremes to represent almost no
binding at one end and saturated binding at the other end. This behavior is
usually represented by an S-shaped curve. In this web page, I will describe
some of the equations that you might use to represent an S-shaped curve.
A very simple S-shaped curve.
A very simple formula that serves as a starting point is

Here is a graph of this function.

Stretching and shrinking the S-shaped curve
Multiplying x by a constant,

yields S shaped curves that are either shrunk towards the center or
stretched out.

A negative value of b,

yields curves that rise rather than fall.

Shifting the S-shaped curve
You can subtract a constant

to shift the curve.

Negative values for a, of course, would shift the curve to the left.
Selecting starting values
When you fit an S shaped curve to your data, you need to provide some
reasonable starting values. They don't have to be perfect, but they do have
to be reasonable. Here's some general guidance.
The parameter "b" represents how spread out the data. When b=1, the middle
50% of the curve (0.25 to 0.75) will span a little more than 2 units on the
x-axis. So divide 2 by the estimated span of the middle 50% in your graph and
use that as a starting value for b. Also keep in mind that a negative value
for b represents an S-shaped curve which increases rather than decreases.
The ratio a/b represents the value where the S-shaped curve reaches 0.5,
the midway point on the curve. Sometimes this midpoint is called the IC50
because it is the concentration that inhibits 50% of the binding.
If you have a rough idea what the IC50 would be (either from previous
studies or an estimate off the line graph), you would multiply this estimate
by the starting value that you selected for b to get a starting value for a.
First example
Here's some data from a normalized assay of competitive binding. X is the
concentration in log10 units, and yn is the normalized signal.
| x |
yn |
| -6.0 |
0.02 |
| -6.3 |
0.10 |
| -6.6 |
0.29 |
| -7.0 |
0.20 |
| -7.3 |
0.80 |
| -7.6 |
1.00 |
Here is a graph of the data.

The middle 50% ranges from about -7.3 to -7.1. So a reasonable starting
values b is 2 / 0.2 = 10. The midway point looks like it is close to -7.2.
Setting the ratio a/b equal to -7.2, you should get -72 as a starting value
for a.
SPSS dialog boxes
Select ANALYZE | REGRESSION | NONLINEAR from the SPSS menu.

First, click on the parameters button.

Specify b and a as parameters with starting values of 10 and -72,
respectively. Then click on the CONTINUE button.
Now include yn as the dependent variable and specify the formula for the
equation as
1 / (1 + exp (b * x - a)).

Then click on the OK button.
SPSS output
This produces a lot of output. Since the output is so large, you need to
double click on it to get a scroll bar. Here's the top third of the output.

Non-linear regression uses an interative estimation approach, and SPSS
provides you with details of these iterations. The residual SS is a measure
of how good the estimates are: it starts at 0.10 and declines to 0.09 as the
estimates improve. Our values of b and a decline slightly as the model
improves.

The remaining iterations appear in the middle third of the screen. You
should examine the reason why iterations stopped. Here they stopped because
the improvements in Residual SS became too small--at the last step, for
example, the residual SS improved only in the ninth significant digit. This
reason for stopping is usually a good sign.
Also notice the Nonlinear Regression Summary Statistics, which is actually
an analysis of variance table. The Residual SS at the final step appears here
in a rounded form (.08982). Notice that the residual SS is small relative to
the Uncorrected Total and the Corrected Total. This is a sign that your model
fits the data reasonably well.
SPSS also produces an R squared value. The interpretation of R squared is
dependent, at times, on the particular nonlinear regression model. As a
general rule, though, an R squared value close to 1.0 is an indication that
the model fits the data well.

The bottom third of the output gives parameter estimates, confidence
intervals, and correlations among the parameters. The ratio of a/b gives an
estimate of the IC50 (-56.55 / 7.92 = 7.1).
Predicted values
You can get predicted values for the regression equation by clicking on the
SAVE button. If you add some extra x values with missing yn values, you can
also get predictions at some of the intermediate points. A plot of the
predicted values shows the general shape of the S-shaped curve.

Notice that this curve badly misses on two of the values (x = -6.3, and x =
-6.6), but comes very close to the remaining four data values. There is no
way, of course to get an S-shaped curve to come close to all the data when
the data has an extra "hump" so perhaps this is as good as we can get.
Poorly specified starting values
I often have difficulty with nonlinear regression. Many times it is because
I chose very bad starting values. To illustrate what can go wrong, I
deliberately chose unusual starting values.
In the first example, I chose a starting value of -10 rather than +10 for
b. This is easy to do if you forget whether the S-shaped curve is rising or
falling.

Notice how the values for b and a hop madly about. This is a sign of bad
starting values, but it looks like SPSS finally settled down on reasonable
values for a and b. Unfortunately, when you plot the predicted values, you
see an S-shaped curve that doesn't come anywhere close to your data.

The curve you see is just a small fragment of the S-shaped curve, with most
of the interesting portions extending off well beyond the range of the data.
As another example of what can go wrong, suppose I naively set the starting
value for a to 0. Here's what the output looks like.

There are three warning signs here. First the model stopped at the first
iteration. Second, it stopped for an unusual reason. Third, the model has
negative sum of squares.
The remainder of the output also has some warning signs.

Notice that the two parameters have zero standard errors, which almost
never happens with real data. Finally the correlation between b and a is
missing.
Alternative models
Go back and look at the model with good starting values. Even here, there
are some issues. Notice how wide the confidence intervals are in spite of
this being a reasonably good fit to the data. Also notice how strongly
correlated the two parameter estimates are. Both of these things are signs of
trouble. What is happening is that the parameters are so highly correlated
that it is difficult to estimate either one individually with any precision.
There are two ways to fix this. First, some (but not all) nonlinear
regression models work better if you center the independent variable(s). You
center a variable by subtracting its mean, so that the center of the data is
now around zero. Select ANALYZE | DESCRIPTIVE STATISTICS | DESCRIPTIVES to
find out that the mean of X is -6.8. Then select TRANSFORM | COMPUTE from the
SPSS menu to create a new variable, XC, which is X - (-6.8).

Here's what the centered data looks like.

Fit the same nonlinear regression model, with two changes:
- The variable "xc" replaces x, and
- The starting value for a is now -4 = (-7.2 - (-6.8)) * 10.
Here are the results.

There is some improvement--the correlation is down to -0.9081 from -0.9998,
but the confidence intervals are still rather wide.
The estimate of the IC50 is about the same, as long as we remember to add
back the mean
(-6.8 + (-2.71) / 7.92 = -7.1).
A second approach is to reparameterize the nonlinear formula. By "reparameterize",
I mean provide a formula that looks a bit different because some of the
parameters have been re-arranged. Here is a reparameterization that is worth
considering:

Here is the output.

Notice that the parameters u and b are almost perfectly uncorrelated. The
confidence interval for b is still wide, but the confidence interval for u is
narrower than the confidence interval for a would have been. Perhaps this is
comparing apples to oranges, but I believe that the reparameterized model is
more useful.
One aspect of this reparameterized model is that u, rather than the ratio
a/b, is a direct estimate of the IC50. We can get confidence limits for the
IC50 as well (-7.3 to -6.9).
S-shaped curves with arbitrary upper and lower bounds
In this experiment, a positive and negative control were used to set the
ceiling and floor of the S-shaped curve. But perhaps these controls were
themselves subject to random error, so it might be interesting to allow the
nonlinear regression model to set the upper and lower limits rather than
arbitrarily fixing them at the control levels.
Here's a plot of the data with the positive and negative controls. In
theory, the controls represent the limits at plus or minus infinity, but I
just set them well outside the range of the remaining data.

Notice how the signal at x = -7.6 extends above the ceiling defined by the
negative control. When we normalized this data, we truncated that value at
1.0 since the simple S-shaped curve could not extend above 1.0 or below 0.0.
With the raw or unnormalized data, we can allow the model itself to set the
ceiling, possibly at a value slightly larger than the negative control and
slightly smaller than the signal at -7.6.
The formula for the S-shaped curve adds two additional parameters, c and d.

The parameter c represents the floor, the minimum value of the S-shaped
curve, and the curve rises to a maximum value of c+d. For starting values, we
can set c to the positive control level (0.2) and if we set c+d to the
negative control level (2.0) then a starting value of 1.8 for d seems
reasonable. Here's the output.

The estimated floor and ceiling levels are a bit surprising. The floor (c)
is 0.4, which is larger than the positive control and the ceiling (c+d)
is 2.3, which is larger than the negative control. This model has an IC50
which is slightly smaller (-7.2) than what the earlier model had computed
(-7.1).
Here's a graph of the fitted curve.

This graph confirms the observations we noted above. Notice how the curve
levels off at a value well above the positive control and well about the
signal at x = -6.0.
Comparing two S-shaped curves
Often in competitive binding assays, we want to compare the IC50 for
different proteins. Using nonlinear regression, we can fit a pair of parallel
S-shaped curves to each protein.
| x |
in |
y |
| -6.0 |
0 |
0.02 |
| -6.3 |
0 |
0.10 |
| -6.6 |
0 |
0.29 |
| -7.0 |
0 |
0.20 |
| -7.3 |
0 |
0.80 |
| -7.6 |
0 |
1.00 |
| -6.0 |
1 |
0.00 |
| -6.3 |
1 |
0.18 |
| -6.6 |
1 |
0.33 |
| -7.0 |
1 |
0.27 |
| -7.3 |
1 |
1.00 |
| -7.6 |
1 |
1.00 |
The middle column is an indicator variable which equals zero for the first
protein and one for the second protein. Here's a plot of the data.

The blue circles represent the first protein. The green pluses represent
the second protein.
To compare the two proteins we will incorporate the indicator variable into
the nonlinear regression model and add a parameter to estimate the difference
between the two S-shaped curves.

Here is the output.

The estimate for u (-7.1) is the estimated IC50 for the first protein. The
estimate for v (0.07) is the difference in the IC50 between the two proteins.
The 95% confidence interval for v goes from -0.17 to 0.31, which indicates
that the estimated difference could easily result just from sampling error.
Here is a graph of the data.

There is a slight separation between the two S-shaped curves. Notice also
that both proteins show a small "hump" at x =-6.6 and -6.3 which neither
S-shaped curve can fit well. This should be investigated further.
07/08/2008.