Stats
A new and simple approach for monitoring safety data (November 18, 2007)
Many hospital administrators collect safety data, and for the most part
this data is not analyzed well. The people who collect the data are
well-meaning, but the simplistic tables and graphs that they use are
typically unable to reveal important trends and patterns in the data. Much of
the safety data represents a description of events (usually bad events) that
occur. The question that always seems to be on their minds is: is there a
sudden surge of events that we need to act on?
The groups that monitor research (Research Ethics Boards or Institutional
Review Boards) also examine safety data. The first thing they are looking for
is an unexpected adverse event that might require a more detailed
informed consent form. These review boards are also concerned with unduly
high rates of an adverse event that might tip the risk-benefit ratio the
wrong way and require that the research study be modified or shut down. Again,
much of the review is well-meaning, but it is too simplistic to provide an
accurate picture of what is going on.
It was in recognition of the special difficulties that these two groups
have with monitoring safety data that I started researching some adaptations
of the control chart. The work I've done so far is in four areas: analysis of
date gaps rather than rates, adjustments for patient load that provide
solutions analogous to the number needed to harm calculation, monitoring
targets with a CUSUM chart, and Bayesian prior distributions and their
application to safety data.
Date gaps rather than rates
Consider a series of n events that occur at times T_1, T_2,
..., T_n. The date gaps G_2, G_3, ..., G_n
are defined as
G_i = T_i - T_{i-1}.
You can optionally define an initial time T_0 that represents the
time that observation started and an initial date gap,
G_1 = T_1 - T_0.
Monitoring the date gaps will allow you to monitor important trends. If the
events are occurring more frequently than expected, the average time between
events will be smaller than expected. If the events are occurring less
frequently than expected, then the average time between events will be larger
than expected.
Consider a hypothetical research study that started in January 1997 with
the intention to recruit 12 patients per year (one per month) over a ten year
period, for a total sample size of 120 patients. By the end of June 2004
(roughly 7 1/2 years), the study has enrolled 42 patients (Table 1).
2/26/1997 4/ 4/1997 7/ 7/1997
7/25/1997 2/ 5/1998 2/15/1998
3/ 6/1998 7/ 3/1998 8/ 3/1998
2/ 8/1999 3/19/1999 4/20/1999
5/29/1999 6/21/1999 7/27/1999
9/ 6/1999 1/10/2000 1/11/2000
2/28/2000 3/ 3/2000 4/13/2000
5/30/2000 11/21/2000 12/18/2000
2/ 6/2001 4/30/2001 8/ 3/2001
1/20/2001 12/ 3/2001 12/ 7/2001
9/27/2002 10/ 1/2002 2/ 2/2003
3/ 3/2003 10/31/2003 11/ 4/2003
11/11/2003 1/ 5/2004 2/ 2/2004
4/15/2004 5/23/2004 6/ 2/2004
Note: this table uses the American format for dates (mm/dd/yyyy) rather
than the European format (dd/mm/yyyy).
Clearly this clinical trial has problems. The actual accrual rate is a
meager 5.6 patients per year, and now it is probably too late to fix things.
In order to finish on time, the researchers would have to recruit at a rate
of more than 30 patients per year over the remainder of the study. This is more
than 5 times faster than the current accrual rate and 2.5 times faster than
the original planned accrual rate.
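The arithmetic behind these rates can be checked with a short sketch. The figures come from the text; the variable names are only illustrative.

```python
# Figures taken from the text; the variable names are illustrative.
target_total = 120      # planned sample size
planned_years = 10.0    # planned length of the study
enrolled = 42           # patients enrolled by the end of June 2004
elapsed_years = 7.5     # January 1997 through June 2004

planned_rate = target_total / planned_years
actual_rate = enrolled / elapsed_years
# Patients still needed, divided by the time that is left.
required_rate = (target_total - enrolled) / (planned_years - elapsed_years)

print(f"planned rate:  {planned_rate:.1f} patients per year")
print(f"actual rate:   {actual_rate:.1f} patients per year")
print(f"required rate: {required_rate:.1f} patients per year")
print(f"required / actual:  {required_rate / actual_rate:.1f}")
print(f"required / planned: {required_rate / planned_rate:.1f}")
```

The same calculation, run with the enrollment count at any earlier point in the study, shows how quickly the required rate climbs as the deadline approaches.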
Wouldn't it be nicer if the researcher had noticed the problem two years
into the study rather than 7 1/2 years out? The researcher would still have
to hustle, but 14 patients per year would allow the study to still finish on
time and it represents only a modest increase over the planned rate.
An important aside: I am using the example of accrual in a clinical trial
for two reasons. First, it is easy to explain. There are some minor
complexities with tracking adverse events that make it more difficult to
discuss. Second, I have done a lot of the preliminary work in this area with
the understanding that it can be easily applied to other areas. From the
perspective of pharmacovigilance, imagine that the dates are not the dates
that patients entered a clinical trial, but rather the dates that a medical
device failed or the dates that a patient is hospitalized because of an
adverse drug reaction associated with the drug you are studying.
The traditional approach to examining rates is to set a time interval
(weeks, months, or years, for example) and count the number of events per
that time interval. For example, you could compute the monthly rates:
Jan97 0
Feb97 1
Mar97 0
Apr97 1
May97 0
Jun97 0
Jul97 2
etc.
The plot of monthly rates looks like this:

Or the yearly rates
1997 4
1998 5
1999 7
2000 8
etc.
which looks like this:

Or something in between like the quarterly rates
97Q1 1
97Q2 1
97Q3 2
97Q4 0
98Q1 3
etc.
which looks like this:

A narrow time interval allows you to respond very rapidly, but the
individual values (mostly zeros and ones) are so granular that the
information value of this approach may be limited. The yearly approach has
more information for any single time interval, but you have to wait a full
year or more to spot any important changes. A quarterly interval offers the
best (worst?) of both worlds.
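Here is a minimal sketch of the traditional approach, counting events per month, quarter, and year. It uses only the first nine enrollment dates from Table 1 (1997 and 1998), and the helper name count_by is made up for this illustration.

```python
from collections import Counter
from datetime import date

# First nine enrollment dates from Table 1 (1997 and 1998 only).
enrollment_dates = [
    date(1997, 2, 26), date(1997, 4, 4), date(1997, 7, 7),
    date(1997, 7, 25), date(1998, 2, 5), date(1998, 2, 15),
    date(1998, 3, 6), date(1998, 7, 3), date(1998, 8, 3),
]

def count_by(dates, label):
    """Count events per calendar interval; `label` maps a date to an interval key."""
    return Counter(label(d) for d in dates)

monthly = count_by(enrollment_dates, lambda d: (d.year, d.month))
quarterly = count_by(
    enrollment_dates,
    lambda d: f"{d.year % 100:02d}Q{(d.month - 1) // 3 + 1}",
)
yearly = count_by(enrollment_dates, lambda d: d.year)

# A Counter returns 0 for intervals with no events, such as 97Q4.
for key in ["97Q1", "97Q2", "97Q3", "97Q4", "98Q1"]:
    print(key, quarterly[key])
```

Notice that the interval width is a choice you have to make before you see the data, which is exactly the trade-off described above.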
Here is how you would compute the date gaps for this data set:
56 = ( 2/26/1997) - ( 1/ 1/1997)
37 = ( 4/ 4/1997) - ( 2/26/1997)
94 = ( 7/ 7/1997) - ( 4/ 4/1997)
etc.
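The same subtraction can be sketched in a few lines of code. The function name date_gaps is hypothetical, and the start date of January 1, 1997 is the one used in the worked example.

```python
from datetime import date

def date_gaps(start, event_dates):
    """Return gaps in days: the first from `start`, the rest between consecutive events."""
    times = [start] + sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]

study_start = date(1997, 1, 1)  # the initial time used in the worked example
first_three = [date(1997, 2, 26), date(1997, 4, 4), date(1997, 7, 7)]

gaps = date_gaps(study_start, first_three)
print(gaps)  # [56, 37, 94]
```

Each new event adds exactly one value to the list, so the series can be updated the moment an event is recorded.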
The date gaps offer three advantages over monthly, quarterly, or yearly
rates. First, the date gaps are self-scaling. Here's a plot of the date gaps:

I deliberately used a mixture of units on this graph to emphasize an
important point. One of the big advantages of using the date gap is that the
graphs are self-scaling. If you are examining events that occur frequently,
your date gaps will be in the lower portion of the graph, where the units are
expressed in days or weeks. If you are examining events that occur rarely,
your date gaps will be in the upper portion of the graph, where the units are
expressed in months, quarters, or even years.
Another advantage of the date gap is that it liberates you from arbitrary
calendar boundaries. Suppose that this chart were monitoring some type of
adverse event that was occurring infrequently (every other week or so), and
suddenly you noticed three adverse events on three consecutive days (December
2, 3, and 4). Do you tell yourself, "Hmmm, that's interesting. We'll have to
see what the monthly rate will be come December 31"? With a date gap model,
every time an event occurs, another date gap is added to the chart. You don't
have to wait until the end of the month, end of the quarter, or (heaven
forbid!) the end of the year before you draw your conclusion. The date gap
allows you to respond rapidly to a sudden surge of events.
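A third advantage of the date gap is that the terms in the series of date
gaps form a telescoping sum. If you computed the average date gap, for
example, it would be
(G_1 + G_2 + ... + G_n) / n = ((T_1 - T_0) + (T_2 - T_1) + ... + (T_n - T_{n-1})) / n
which simplifies to
(T_n - T_0) / n.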
When you divide the number of events by the total elapsed time, you get the
average rate. So what this formula is telling you is that the average date
gap is the inverse of the average rate. Take 42 patients and divide by 7.5
years and you get 5.6 patients per year. The average date gap is 65 days or
0.18 years. If you compute 1 / 0.18, you get 5.6.
This is hardly surprising if you think about it. If you are seeing one
event every fifteen days on average (half a month between events), that
represents a rate of 2 per month.
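As a rough check on the inverse relationship, here is a sketch that uses only the two endpoints of the series. It assumes observation started on January 1, 1997 and takes the last enrollment date in Table 1 as the final event. Because it uses exact dates rather than a rounded 7.5 years, the results differ slightly from the rounded figures quoted above.

```python
from datetime import date

n_events = 42
t_0 = date(1997, 1, 1)   # assumed start of observation
t_n = date(2004, 6, 2)   # last enrollment date in Table 1

# Telescoping sum: the average gap needs only the two endpoints.
average_gap_days = (t_n - t_0).days / n_events
average_gap_years = average_gap_days / 365.25
rate_per_year = 1 / average_gap_years

print(f"average gap: {average_gap_days:.1f} days ({average_gap_years:.2f} years)")
print(f"average rate: {rate_per_year:.2f} events per year")
```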
Adjustments for patient load and the number needed to harm calculations
I want to propose some adjustments to the date gap calculation. Let's
pretend that we are in a bizarre Einsteinian universe where time is not
always constant. This is not too hard to imagine: some days seem to go very
slowly and others fly by. There's a joke that is widely circulated about this
concept.
If I had only one hour to live, I would spend it in a Statistics
class. It would just seem to last so much longer.
Suppose the march of time is represented by a monotone nondecreasing
function F( ). It has to be nondecreasing because you don't want to allow for
the possibility of travel backwards in time. When the slope of F( ) is large,
time marches slowly. When the slope of F( ) is small, time whizzes by
quickly.

Think of the curve as a hill that you are climbing. When the hill is steep
you need a lot of time to move just a little bit, but when the hill is flat,
you can cover long distances quickly.
Define an adjusted date gap A_i by the formula
A_i = F(T_i) - F(T_{i-1})
Here's a simple example. Choose a function F that has slope 1 for five
days, is flat for two days, then repeats itself.

If you use this function to compute an adjusted gap, it treats some gaps
the same way: there are two days between Tuesday and Thursday, for example.
But when two time points straddle a weekend, the Saturday and Sunday are
ignored. So the adjusted gap between an event on Friday and an event on
Monday is only 1, not 3. This adjustment counts the number of working days
between two events.
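One way to sketch this working-day clock in code is shown below. The function names are invented for the illustration, and the choice to count weekdays from the start of the calendar is arbitrary, since only differences of F matter.

```python
from datetime import date

def working_day_clock(d):
    """F(d): number of weekdays (Monday to Friday) from 0001-01-01 through d."""
    # date.toordinal() numbers days from 1, and ordinal 1 falls on a Monday,
    # so the remainder counts days into the current Monday-based week.
    weeks, remainder = divmod(d.toordinal(), 7)
    return 5 * weeks + min(remainder, 5)

def adjusted_gap(earlier, later):
    """Adjusted date gap: the difference of F at the two event dates."""
    return working_day_clock(later) - working_day_clock(earlier)

friday, monday = date(2007, 11, 16), date(2007, 11, 19)
tuesday, thursday = date(2007, 11, 13), date(2007, 11, 15)

print(adjusted_gap(friday, monday))      # weekend is skipped
print(adjusted_gap(tuesday, thursday))   # same as the calendar gap
```

Any other monotone nondecreasing F can be swapped in for working_day_clock without changing adjusted_gap.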
Now in most medical situations, it makes little sense to ignore the
weekends because people don't stop taking medications during the weekend. A
more realistic use of adjustments involves tracking the cumulative number of
patients seen. In the example shown above, the graph of the cumulative number
of patients would be

These patients are undergoing peritoneal dialysis. Some of them experienced
complications during the placement of their catheters. The patients who
experienced problems were recruited on days 93, 579, 1675, and 2588. They
represented the 2nd, 9th, 27th, and 39th patients.

When you compute the adjusted date gaps, you are effectively looking at
distances in the vertical dimension rather than the horizontal dimension.
These adjusted gaps (2, 7, 18, and 12) represent the number of patients
that you have to wait between complications rather than the number of days
that you have to wait between complications.
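The average adjusted gap also simplifies because of a telescoping sum
(A_1 + A_2 + ... + A_n) / n = ((F(T_1) - F(T_0)) + (F(T_2) - F(T_1)) + ... + (F(T_n) - F(T_{n-1}))) / n
which simplifies to
(F(T_n) - F(T_0)) / n.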
In the example, the average adjusted gap is (2+7+18+12) / 4 = 39 / 4 =
9.75. The denominator, 4, represents the number of patients who experienced
problems and the numerator, 39, represents the number of patients seen up to
and including the fourth problem.
The fraction 4 / 39 represents the estimated probability that a patient
will experience catheter related problems. The inverse of that probability,
39 / 4, is known as the number needed to harm (NNH). This number tells you
that you would have to insert about 10 catheters in order to find one patient
who has trouble with the catheter.
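The patient-count adjustment and the resulting NNH estimate can be sketched as follows. The positions come from the dialysis example; the variable names are illustrative.

```python
# Ordinal positions (from the text) of the patients who had catheter complications.
complication_positions = [2, 9, 27, 39]

# Adjusted gaps: patients seen between consecutive complications, counting from zero.
previous = [0] + complication_positions[:-1]
adjusted_gaps = [cur - prev for prev, cur in zip(previous, complication_positions)]

# Telescoping sum: the total is just the position of the last complication.
nnh_estimate = sum(adjusted_gaps) / len(adjusted_gaps)
estimated_risk = len(adjusted_gaps) / sum(adjusted_gaps)

print(adjusted_gaps)             # [2, 7, 18, 12]
print(nnh_estimate)              # 9.75
print(round(estimated_risk, 3))  # 0.103
```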
Each time a new patient experiences an adverse event, you get an additional
adjusted gap which helps you refine the estimate of the NNH. The individual
adjusted gaps can even be thought of as individual point estimates of NNH and
they allow you to look for trends and patterns.
There are other adjustments that also make sense and lead to an NNH
calculation. If a patient can experience multiple adverse events (infections
or re-hospitalizations, for example), you might want to calculate the
cumulative number of patient days at risk. The adjusted chart then measures
the number of patient days between events.
Another possibility is to track the cumulative number of medications
dispensed by a hospital pharmacy. Then the adjusted chart would measure the
number of pills between events.
Finally, the holy grail of medical research is developing statistical
measures of acuity. It seems like the doctors who do the best jobs get
referrals for the toughest and most intractable patients. So a naive
comparison will end up making the best doctors look like the worst
performers. It is unclear what form these acuity adjustments will take, but
when they become available, a cumulative acuity score will allow you to look
at a risk adjusted time between events.
What is a reasonable value for NNH?
The NNH has tremendous value for safety data because it places the data in
a context where it is easy for medical professionals to make informed
decisions about the relative risks and benefits of a new drug or device.
Here's a simple example that I calculated from a research paper. A flu
vaccine has an efficacy of 17%. It prevents the flu in about one out of every
six people vaccinated. This tells you that the number needed to treat (NNT)
is 6. The vaccine does not come without side effects, however. One of the
side effects is fever. About 1.1% of all patients vaccinated develop a short
term fever. This tells you that the NNH is 90.
To see if the benefits are worth the risks, it is useful to examine the
ratio of NNH to NNT. This ratio, 15, tells you that the vaccine prevents 15
cases of flu for every additional short term fever that has to be endured.
I'm not a medical expert, but this seems like a very good tradeoff. The short
term fever seems relatively mild compared to the problems caused by a bout of
the flu. In fact, I'd be tempted to say that one short term fever, or even
more, for every case of flu prevented might still make the vaccine a
worthwhile endeavor.
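A small sketch makes the vaccine arithmetic explicit. It follows the text in treating the 17% and 1.1% figures as per-patient probabilities of benefit and harm; the unrounded values come out slightly different from the rounded 6, 90, and 15 quoted above.

```python
def number_needed(probability):
    """Inverse of a per-patient probability (of benefit for NNT, of harm for NNH)."""
    return 1.0 / probability

nnt = number_needed(0.17)    # vaccine prevents flu in about 17% of recipients
nnh = number_needed(0.011)   # about 1.1% develop a short term fever

# Cases of flu prevented for every additional short term fever.
benefit_per_harm = nnh / nnt

print(f"NNT = {nnt:.1f}")
print(f"NNH = {nnh:.1f}")
print(f"NNH / NNT = {benefit_per_harm:.1f}")
```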
So, to set an acceptable NNH target, ask yourself how serious the side
effect is relative to how beneficial a cure would be. Then set a target for
NNH that makes its ratio to NNT comparable to the relative severity. Suppose,
for example, that we found a drug that cured the common cold. In one out of
every four patients, the sniffling, sneezing, and coughing just disappeared
(an NNT of 4). But let's suppose that the drug produced a rare but serious
side effect, formation of kidney stones. Kidney stones are a very serious
matter. If you created as many kidney stone cases as you cured cases of
sniffling, sneezing, and coughing, that would be an unacceptable trade-off.
So how much worse are kidney stones: 10 times worse, 50 times worse, 100
times worse? If you believed that kidney stones were 50 times worse, meaning
that you would be willing to endure 50 cases of sniffling, sneezing, and
coughing rather than a single extra case of kidney stones, then you need to
make sure that the NNH is larger than 50*4 = 200.
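The target-setting rule can be written as a one-line calculation. The function names below are hypothetical; the logic is simply that the NNH has to be at least the severity ratio times the NNT for the harms, weighted by severity, not to outweigh the benefits.

```python
def minimum_acceptable_nnh(nnt, severity_ratio):
    """Threshold NNH: severity_ratio harms-worth of benefit per patient harmed."""
    return severity_ratio * nnt

def tradeoff_acceptable(nnh, nnt, severity_ratio):
    """True when the observed NNH is at or above the threshold."""
    return nnh >= minimum_acceptable_nnh(nnt, severity_ratio)

# Cold cure example: NNT of 4, kidney stones judged 50 times worse than a cold.
threshold = minimum_acceptable_nnh(nnt=4, severity_ratio=50)
print(threshold)                                              # 200
print(tradeoff_acceptable(500, nnt=4, severity_ratio=50))     # True
print(tradeoff_acceptable(100, nnt=4, severity_ratio=50))     # False
```

The severity ratio is a judgment call rather than something the data can supply, so it is worth trying several values.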
Now there are complex issues involving public perception, regulatory
scrutiny, etc. that may dominate your concerns and force you to adopt a
different standard. But setting the NNH so that it creates an acceptable
ratio to NNT offers a credible medical way of determining what safety level
is appropriate.
Monitoring targets with a CUSUM chart
The date gaps also provide an interesting pattern when you plot them in a
CUSUM plot. The CUSUM plot examines the cumulative deviation from a target.
In the example of the clinical trial, the original goal was to recruit 12
patients per year or one every 30 days. So the cumulative sums are
S1 = (30 - 56) = -26
which tells you that the first patient was recruited 26 days behind
schedule. The second cumulative sum is
S2 = (30 - 56) + (30 - 37) = -33
Since the second patient took seven days longer than your target, you have
fallen 7 more days behind for a total deficit of 33 days. With the third
cumulative sum,
S3 = (30 - 56) + (30 - 37) + (30 - 94) = -97
you have learned that you are now more than three months behind schedule.
Here's a plot of all the cumulative sums.

You can see that the pattern is consistent: with every patient recruited,
you are falling further and further behind. Once in a while you make a tiny
bit of progress upward, but the downward trend tells you that this study is
already 4 years behind schedule.
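The cumulative sums described above can be sketched with a running total of the target minus each date gap. The function name cusum is illustrative, and the 30 day target is the one used in the text.

```python
from itertools import accumulate

def cusum(gaps, target):
    """Running total of (target - gap); negative values mean behind schedule."""
    return list(accumulate(target - g for g in gaps))

first_gaps = [56, 37, 94]   # days, from the date gap example
sums = cusum(first_gaps, target=30)
print(sums)  # [-26, -33, -97]
```

A gap shorter than the target moves the running total up, and a gap longer than the target moves it down, so a steady slope in either direction indicates a persistent departure from the target.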
The rules for identifying a signal in a CUSUM chart are somewhat complex.
You set a vertical distance h and a horizontal distance d that define a
V-mask.

(Source:
www.itl.nist.gov/div898/handbook/pmc/section3/pmc323.htm)
The choices for h and d are not well defined. An alternative choice is to
set a Bayesian prior distribution, compute the posterior distribution for
each cumulative sum, and then examine the 2.5th and 97.5th percentiles
of this distribution. If the path of future cumulative sums stays inside the
2.5th and 97.5th percentiles, then the process is in control. If the path drops
below the 2.5th percentile, then events are occurring more frequently than the
previous trend might suggest. If the path rises above the 97.5th percentile,
then events are occurring less frequently than the previous trend might
suggest.
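To make the idea of percentile bands concrete, here is a rough simulation sketch. The model is an assumption made purely for illustration, since the text does not specify one: gaps are treated as exponential with an unknown rate, the rate gets a gamma prior, and the bands are for the cumulative time to each of the next few events. The prior values (shape 2, rate 60 days) are hypothetical and were chosen only so that the prior mean corresponds to one event every 30 days.

```python
import random

def predictive_band(prior_shape, prior_rate, observed_gaps, k_future,
                    n_draws=20000, seed=1):
    """Simulated 2.5th and 97.5th percentiles of cumulative time to the next k events.

    Assumed model (not from the text): exponential gaps, gamma prior on the rate.
    """
    rng = random.Random(seed)
    # Conjugate update: shape gains the event count, rate gains the total time.
    post_shape = prior_shape + len(observed_gaps)
    post_rate = prior_rate + sum(observed_gaps)

    paths = []
    for _ in range(n_draws):
        # random.gammavariate takes shape and scale, so scale = 1 / rate.
        event_rate = rng.gammavariate(post_shape, 1.0 / post_rate)
        total, path = 0.0, []
        for _ in range(k_future):
            total += rng.expovariate(event_rate)
            path.append(total)
        paths.append(path)

    bands = []
    for j in range(k_future):
        column = sorted(path[j] for path in paths)
        bands.append((column[int(0.025 * n_draws)], column[int(0.975 * n_draws)]))
    return bands

bands = predictive_band(2.0, 60.0, observed_gaps=[56, 37, 94], k_future=5)
for j, (lower, upper) in enumerate(bands, start=1):
    print(f"event +{j}: {lower:7.1f} to {upper:7.1f} days")
```

An observed path that falls below the lower limit would suggest events arriving faster than the earlier trend, and one that rises above the upper limit would suggest a slowdown. A real analysis would need a model chosen and checked for the data at hand.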
Here's an example:

This chart represents the cumulative patient years between exit site
infections in a cohort of patients undergoing peritoneal dialysis. Let's
suppose that a change in treatment options was made after the 20th event. You
want to examine the trend of the following events to see if the change led to
a substantial slowing of these bad events. Although the original trend
appears to persist for the next seven or eight events, the graph then takes a
sharp upward swing. This increase in the number of patient years between exit
site infections shows that the change eventually led to a lower rate of exit
site infections.
I'm not an expert on Bayesian methods, so most of the credit for this
approach belongs to a colleague of mine, Byron Gajewski. These ideas are
still in the early stages of development, which may lead to some vagueness in
my writing. My relative inexperience in Bayesian methods may also contribute
to some of the vagueness. Please bear with me, though, because the Bayesian
approach appears to be a very attractive one for safety data.
A common objection to the use of Bayesian prior distributions is that the
researcher should not go into the research with preconceived notions on how
the data should behave. That's a debate which I don't want to tackle today,
but it is worth noting that there are some notable exceptions to the rule
about preconceived notions.
First, the Bayesian approach always allows you to specify a vague prior.
The vague prior can either be your acknowledgement that you don't really have
a lot of information about how this experiment will come out or it can
represent your effort not to incorporate any preconceived notions into the
data analysis.
Second, the example that I just described involves accrual of patients into
a clinical trial. No researcher would start a project unless they had at
least an inkling of how many patients were out there who might qualify for
the research and how many of those might volunteer for the study.
This perspective is probably accurate for pharmacovigilance studies as
well. These studies are not done in a vacuum because you have already
accumulated some information about adverse events during the process of
getting your drug approved. It would be naive to ignore this information. In
fact, the careful and judicious use of Bayesian priors might represent a
formal way to combine safety information across Phase III and Phase IV
trials.
Third, a process of careful Bayesian analysis ought to include the
specification of not a single prior distribution, but several. It might be
wise to adopt both an optimistic and a pessimistic prior distribution for an
efficacy study, for example. If the Bayesian analysis midway through the
trial shows that even a pessimistic prior leads to a declaration of efficacy,
you have a strong case for stopping the trial for early evidence of efficacy.
After all, the data is convincing enough that even a pessimist has to admit
that the results are promising. If the Bayesian analysis midway through the
trial shows that even an optimistic prior leads to declaration of no effect,
you have a strong case for stopping the trial early for futility. After all,
if the data is so disappointing that even an optimist's hopes are dashed, why
go any further?
Conclusion
When you are monitoring safety for a newly marketed drug or device, the
control chart represents a simple approach that is easy to apply and easy to
understand. It is especially useful if the safety event is well defined. You
can improve the sensitivity of the control chart by computing the date gap.
Adjusting the date gap for the number of patients seen or the number of
medications dispensed provides a way for you to continually monitor the
number needed to harm. The CUSUM chart and Bayesian prior distributions allow
you to improve the sensitivity to small but consistent changes in the signal.
Category: Adverse events in clinical trials