The Trouble With Animal Models
Why did human trials fail?
By Andrea Gawrylewski
On October 26, 2006, on the opening day of the
Joint World Congress for Stroke in Cape Town,
South Africa, disappointing news spread quickly
among the attendees: The second Phase III clinical
trial for NXY-059 had failed. The drug, a
free-radical spin trap agent for ischemic stroke,
had been eagerly anticipated as a successful
neuroprotective agent for stroke patients. As the
drug developer, AstraZeneca, issued a press
release reporting the news, e-mails circulated
quickly within the stroke research community, many
with the subject line, "Have you heard the bad
news?"
"We were optimistic that this would be the new
stroke drug," says Marc Fisher, director of the
stroke program at the University of Massachusetts
Medical Center, who was at the conference in Cape
Town. "We were all talking about it. There were a
lot of long faces that day."
To the dismay of clinicians and researchers of
acute stroke, the compound showed limited efficacy
in neuroprotection versus the placebo. Instead,
NXY-059 joined the family of more than a dozen
failed neuroprotective agents, including glutamate
antagonists, calcium channel blockers,
anti-inflammatory agents, GABA agonists, opioid
antagonists, growth factors, and drugs of other
mechanisms. All had reached Phase III clinical
trials and failed miserably at doing what their
animal model tests had suggested they would: stop
the cascade of necrosis in the event of stroke,
and protect the remaining viable brain cells.
That NXY-059 had fallen victim to the same fate
was particularly disheartening to a stroke
roundtable group that had, in 1999, directly
addressed the disconnection between animal models
for stroke and their counterpart human trials. The
group had devised a set of guidelines whose aim
was to standardize the path to stroke
therapeutics. During its development, NXY-059 had
been its poster child. "This drug was being hailed
as the first one to follow the standards," says
Sean Savitz, assistant professor of neurology at
Harvard Medical School. "But it didn't do that."
"It was very disappointing to all of us to have
it fail, and it totally failed," says Sid Gilman,
director of the Michigan Alzheimer's Disease
Research Center in the Department of Neurology at
the University of Michigan, and one of the
consultants for AstraZeneca on the Phase III
clinical design.
Had the outcome of the Stroke Acute Ischemic NXY
Treatment (SAINT) trials been an anomaly,
investigators might have just shrugged it off. But
it was not: Nearly half of all molecular entities
that come into development fail, according to
Janet Woodcock, deputy commissioner of the Food
and Drug Administration. "There's no doubt about
the absence of an effect [of NXY-059], and that
called into question the many other studies in
stroke, and how good are the animal models?" says
Gilman. "So many agents appeared to be effective
in the animal model and failed in human
trials."
Because of these failures, hundreds of millions
of dollars, and a potential approach to stroke
treatment, have disappeared down the drain. The
failure of NXY-059 may have stalled the quest for
a neuroprotective agent, at least for some time.
"This trial has poisoned stroke studies," says
Gilman. "I'm doubtful that investors will want to
invest in clinical stroke trial[s] for a while."
The fault, it appears, may rest in the slipshod
use of animal models.
In
1998, Fisher flew from Boston to Germany to help a
drug company, along with academics specializing in
animal modeling, to examine two sets of clinical
trial results for a new stroke treatment that had
failed. They wanted to uncover where they had gone
wrong. (Fisher declines to disclose the company
and trials that were involved.)
[Image: Human brain scan of cerebral
hemorrhaging four weeks after a stroke; the
blue indicates internal bleeding. AstraZeneca
also tested NXY-059 in the CHANT clinical trials
to treat intracranial hemorrhage, but further
development past the phase II trial was culled
after the failure of SAINT II.]
On his return flight, it occurred to Fisher
that the chaos the stroke research field had been
facing for years might benefit from the kind of
meeting he had just attended: industry and
academia collaborating to develop standardized
practices. The next year, Fisher convened the
first Stroke Therapy Academic Industry Roundtable
(STAIR) group, which devised a set of
recommendations for preclinical and clinical
stroke drug development. On the preclinical side,
some of the recommendations seemed obvious: The
candidate drug should be evaluated in rodents and
also higher animal species; blind testing should
be performed; tests should be done in both sexes
and in varying ages of animals; and all data, both
positive and negative, should be published.
Approximately 26 million animals are used for
research each year in the United States and
European Union, according to estimates by the
Research Defense Society in the United Kingdom.
However, the number of animal procedures has been
reduced by half over the past 30 years, likely due
to stricter controls, improvements in animal
welfare, and scientific advances.
Still, unlike in human clinical trials, no
best-practice standards exist for animal testing.
STAIR is the stroke research community's attempt
at standardization. NXY-059 was the first
neuroprotective agent to be developed under the
auspices of the STAIR guidelines, though the
implementation of the guidelines may have been
just lip service. In particular, as Savitz wrote
in an article published online in Experimental
Neurology in May, the preclinical testing had
several holes, including weak statistical robustness
and flaws in the way the results were translated
into clinical design.1
The main problems, Savitz writes, were
randomization and bias. In the initial evaluation
of NXY-059 in rat models of focal ischemia,
reports didn't say whether researchers had been
blinded with regard to drug administration,
behavioral testing, and histologic analysis. The
results from the rodent study were mixed, showing
a range of reduction of cerebral infarction size
over a variety of intervals. However positive the
results might have been, Savitz notes, the clear
lack of statistical robustness calls any result
into question. A subsequent report on the effects
of NXY-059 in a rabbit embolic model showed a 35%
reduction in infarction after 48 hours, but it did
not indicate whether statistical analysis, blind
testing, physiologic measurements, blood flow
monitoring, or behavior assessments had been
done.
AstraZeneca maintains that the preclinical
animal tests and the clinical phase of SAINT
adhered to the STAIR guidelines: "The design of
the SAINT trials was sound and well considered in
light of the strong evidence for neuroprotection
that existed across the models and species tested
at the time," according to a statement sent to
The Scientist in response to Savitz's
paper. Gilman, also editor-in-chief of
Experimental Neurology, says he is not
aware of any official response being drafted or
submitted by AstraZeneca.
The
statistical troubles that mired some of the
NXY-059 preclinical trials are common in animal
models. Surveys of papers based on animal models
find errors in about half, according to Michael
Festing, a recently retired laboratory animal
scientist at the UK Medical Research Council and
board member of the National Center for Three Rs
(NC3Rs: replacement, refinement, and reduction),
an organization that advocates using fewer animals
in research and streamlining current animal tests.
"Whether those are serious enough that the
conclusions are invalid is debatable," Festing
says.
Even the numerous successful cases of animal
experimentation that led to effective treatments
for high blood pressure, asthma, transplant
rejection, and the polio, diphtheria, and whooping
cough vaccines were all carried out without
standardized testing methods.
"People don't report if studies are
randomized," says Ian Roberts, professor of
epidemiology at the London School of Hygiene and
Tropical Medicine. How animals are selected and
whether assessments were blinded are rarely
reported in the methods, which creates a
potential for bias. "Imagine a cage of 20 rats,
and you've got a treatment for some," explains
Roberts. "So you stick your hand in a cage, and
pull out a rat. The rats that are the most
vigorous are hardest to catch, so when you pull
out 10 rats, they're the sluggish ones, the tired
ones, they're not the same as the ones still in
the cage, and they're the control. Immediately
there's a difference between the two groups."
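Roberts's cage example shows what proper allocation has to rule out: letting the experimenter's hand decide which animal goes where. As a minimal sketch, assuming each animal carries an ID tag (the `rat-NN` labels, the `allocate_groups` function, and the seed value are illustrative and not drawn from any study discussed here), a seeded shuffle can fix the groups before anyone reaches into the cage:

```python
import random


def allocate_groups(animal_ids, n_treated, seed=None):
    """Randomly split tagged animals into treatment and control groups."""
    if not 0 <= n_treated <= len(animal_ids):
        raise ValueError("n_treated must be between 0 and the number of animals")
    rng = random.Random(seed)    # seeded generator, so the allocation can be audited
    shuffled = list(animal_ids)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    treated = sorted(shuffled[:n_treated])
    control = sorted(shuffled[n_treated:])
    return treated, control


if __name__ == "__main__":
    cage = [f"rat-{i:02d}" for i in range(1, 21)]  # hypothetical ID tags for 20 rats
    treated, control = allocate_groups(cage, n_treated=10, seed=2007)
    print("treated:", treated)
    print("control:", control)
```

Because group membership is decided by tag rather than by which rat is easiest to catch, vigor no longer correlates with treatment, and recording the seed in the methods would let a reader reproduce the allocation.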
The NC3Rs, in cooperation with the National
Institutes of Health, is surveying a group of 300
papers, half from the United Kingdom, half from
the United States, for their statistical quality
in mouse, rat, and primate model studies.
Researchers hope that by fall they will have a
report describing how well (or not) the studies
were randomized and whether they used the correct
statistical methods. In an initial pilot study of
12 papers conducted in 2001 for the Medical
Research Council, Festing reported: In six of the
papers the number of animals used wasn't clear;
only two of the papers reported randomization; and
only six of the papers specified the sex of the
animals tested. (For more on how gender can
influence results, see "Why
Sex Matters".)
Illustrations by Joelle Bolt
Statistics aren't the only problem. Methodology
is arbitrary, replication is lacking, and negative
results are often omitted. A report in Academic
Emergency Medicine by Vik Bebarta et al. in
2003 showed that animal experiments where
randomization and blind testing are not reported
are five times more likely to report positive
results.2
In a December 2006 paper in the British Medical
Journal, Pablo Perel et al. showed that in six
clinical trials for conditions including neonatal
respiratory distress syndrome, hemorrhage, and
osteoporosis, only three of the trials had
corresponding animal studies that agreed with
clinical results.3
The authors attribute this discrepancy to poor
methodology (i.e., bias in the animal models) and
the failure of the models to mimic the human
disease condition.
The
difficulties associated with using animal models
for human disease result from the metabolic,
anatomic, and cellular differences between humans
and other creatures, but the problems go even
deeper than that.
One of the major criticisms of the NXY-059
testing was the lack of correlation between how
the effects of the drug were monitored in animals
versus in humans. In the rodent model, researchers
induced an ischemic event, administered the drug
at various time intervals, and measured the size
of the infarction. During the clinical trials,
however, the drug's effect was evaluated in stroke
patients using behavioral indicators such as the
modified Rankin scale and the NIH Stroke Scale
(NIHSS). In the primate tests the behavior
assessments were based on a food-reward system,
showing that NXY-059 did improve left arm
weakness in the aftermath of a stroke. "Even if we
accept that NXY-059 does improve arm weakness,"
writes Savitz, "how would such a finding translate
to human acute stroke studies that use the
modified Rankin scale and NIHSS scores as primary
outcome measures?" Indeed, some consider the two
phases of testing, animal and human,
completely out of whack, and argue that SAINT I,
the first clinical trial, was deemed a success
only by statistical fluke.
Some say that animal research is best when
targeted at specific mechanisms of action.
"Animals are better used for understanding disease
mechanism and potential new treatments, rather
than predicting what will happen in humans," says
Simon Festing, executive director of RDS (and son
of Michael Festing). RDS is a UK organization that
advocates the understanding of animal research in
medicine. "The 2001 Nobel Prize in medicine
involved sea urchins and yeast, organisms that
evolved apart from humans by millions of years,"
says the younger Festing. "And yet, they are ideal
models for studying cell divisions, research
that is being used in cancer therapeutics in
humans now."
For specific models of human disease, Simon
Festing adds, the farther away from the human
species the animal studies get, the less
predictive the model will be. For example,
researchers studying some conditions, including
Parkinson disease, have established a clear animal
model. The primate model displays symptoms similar
to human symptoms, whereas a mouse model may not
be able to show the distinct tremor in the limbs.
While this difference in essence relates back to
fundamental anatomic variation among the various
species, finding the best model is inherently
difficult.
"The choice of animals is rather narrow," says
Michael Festing. "There are 4,000 species of
rodents, but we use only three or four of them.
Then there's a shortage of anything that's not
rodents, and in some cases we're restricted to
dogs and cats (which are a problem from the
ethical point of view) and primates, also a
problem from the ethical point of view. So
[choosing the right animal model is] sort of done
by default: Eliminate the ones that are not
suitable and choose from what's left."
Perhaps
because of its abundance and short gestation, the
mouse has become the flagship of animal testing,
especially useful with genetic modifications, gene
knockouts, and knockins. In 2003, NIH launched the
Knockout Mouse Project (KOMP) and has awarded more
than $50 million with the goal of creating a
library of mouse embryonic stem cell lines, each
with one gene knocked out.
Nonetheless, even genetically manipulated mice
have their problems. The current knockout mouse
model for amyotrophic lateral sclerosis (ALS) may
be completely wrong, according to John Trojanowski
at the University of Pennsylvania School of
Medicine. He and colleagues recently showed that
two versions of the disease, sporadic and
hereditary, are biochemically distinct, and that a
different mechanism controls the disease in each
case.4
In hereditary ALS the disease is associated with a
mutation in the SOD-1 gene, whereas the sporadic cases are
associated with the TDP-43 protein. Until now,
research has focused primarily on SOD-1 knockout
mice, with virtually no success in human trials.
The new findings relating to the TDP-43 protein
suggest that the SOD-1 knockout model for ALS
could be wrong. "There was this nagging doubt"
about the validity of the current models,
Trojanowski says. "And there may be a whole new
pathology characteristic, so we need models based
on TDP-43."
A recent study at the Massachusetts Institute
of Technology shows distinct differences in
gene regulation between human and mouse liver,
particularly in how the master regulatory proteins
function.5
In a comparison of 4,000 genes in humans and mice,
the researchers expected to see identical behavior,
that is, the binding of transcription factors
to the same sites in most pairs of homologous
genes. However, they found that transcription
factor binding sites differed between the species
in 41% to 89% of the cases.
Many of the underlying limitations associated
with mouse models involve the inherent nature of
animal testing. The laboratory environment can
have a significant effect on test results, as
stress is a common factor in caged life. Jeffrey
Mogil, a psychology researcher at McGill
University in Quebec, demonstrated last year that
laboratory mice feel "sympathy pains" for their
fellow labmates. In other words, seeing another
mouse in distress elevates the amount of distress
the onlooker displays. The average researcher,
when testing for toxicity effects in mice for
example, likely assumes that they are starting at
a pain baseline, when in truth the surrounding
environment is not benign and can significantly
affect results, Mogil says.
In new research, Mogil's group is demonstrating
that the very presence of a lab researcher can
alter behavior in mice. "The surprising thing is
that these effects are visual, not auditory or
olfactory," he says. "It's a huge surprise. Most
people think [mice] are mostly blind anyway. I'm
being convinced that the visual world of the mouse
is a lot richer than expected."
Although
the failure of NXY-059 may be one insult too many
for clinicians and patients eagerly awaiting a
neuroprotective agent, some experts feel that this
setback is far from being the final chapter.
Whether they blame weak animal test
standardization, poor clinical design, or
inadequate statistical analysis, questions often
return to NXY-059 itself as an indicator for
the future of neuroprotection. "This drug is known
to have antioxidant effects, but it was never
shown what its mechanism was on the brain. Early
studies were only hinting at possibilities,"
Savitz says.
In a field where much work is concentrating on
nitrone-based spin trap agents, NXY-059 became the
parent compound. But it's clear that it wasn't the
answer. "The drug probably isn't a good drug to
begin with," says Myron Ginsberg, professor of
neurology and clinician at the University of Miami
School of Medicine. Despite NXY-059's
disappointing failure, other neuroprotective
options are still in the pipeline. Ginsberg is in
the early stages of working on albumin as a
neuroprotective therapeutic, and researchers are
also considering hypothermia as a way of
preserving brain cells after ischemic stroke. "The
fact that this drug failed," Ginsberg says,
"doesn't say anything about the potential for
neuroprotection [in the future]."
comment:
Mouse Models
by Brian [Comment posted
2007-07-11 13:44:10]
Model organisms provide essential
windows into normal development. But, it is
strange that despite years of failure the NIH
continues to pour dollars into research for
therapeutics using rodent models. Obviously there
are technical challenges involved in developing
and refining human therapies, but mice appear to
be a very, very poor model for human diseases. The
most glaring example, I think, is cancer research. How many
times have cancer cures been observed in mice? It
seems that rodents are actually mostly immune to
cancer as a disease, but that doesn't stop people
from advocating their continued use, because as
one researcher told me, it's the best thing we have.
The research money could be much better spent
looking for better models that would be more
appropriate for translational
research.
comment:
Models are merely
models!
by Richard N. Sifers,
Ph.D. [Comment posted 2007-07-11
13:47:56]
We, as scientists, must admit
that models are simply models! Although
statistical robustness is certainly needed in
animal studies, it must be accepted that models do
not, and often cannot, recapitulate sophisticated
human physiology. Similarities exist between
apples and oranges (both are round and contain
seeds), but one had better focus on an apple tree
if interested in understanding the intimate
details of its fruit. In a similar manner, only in
very limited ways will any non-primate model
recapitulate human physiology. However, these vast
differences are not identified until one examines
systems at a biochemical level, and this has
become a very rare event. For too long, we have
studied evolution in terms of investigating
"similarities" between different species. These
examples gave us clues as to the existence of
evolution. HOWEVER, the evolutionary process, by
definition, actually refers to the vast
differences that exist between species, and even
between cells within a given species. Although
many of the genes are shared, the regulation of
their products can differ considerably! I
suspect that even if all the animal models
faithfully mimicked the actual primary defects
found in human diseases, they would still fall
short of mimicking the human situation. Finally,
model systems are certainly appropriate for some
endeavors, but they will likely fall short (more
times than not) when trying to identify drugs that
will correct human diseases. For this reason, I
sometimes wonder to what extent science actually
advanced (in terms of understanding human disease)
during the genomics era?
comment:
Rodent models can be useful
in other areas
by John R. Moffett
PhD [Comment posted 2007-07-12
06:13:53]
I have been involved in using
animal stroke models for a number of years and it
is clear to me that the rodent brain is far more
resilient after stroke than the human brain. It is
probably going to turn out that rodents do not
provide a useful model for ischemia research.
However, this does not mean that rodent
models are useless in science. Indeed, rodent
knockout models for genetic diseases have proven
themselves to be quite useful for basic research
and biomedical studies. Currently we are working
on both murine and rat models of a genetic
disorder known as Canavan disease, and we are
getting good results using a dietary supplement
that provides a missing metabolite (acetate) in
both models. It may turn out, as with ischemia, that
these rodent models do not reflect real-world
situations with humans suffering from the disease,
but we are very hopeful that our results will
translate to an efficacious treatment for infants
suffering from this genetic disorder.