Wilmar Igl, PhD¹ & John Constant, PhD²
¹ Biostatistics Consulting Services, ICON PLC, Sweden
² Scientific Affairs, ICON PLC, Canada
Ruberg et al. [2] recently published a perspective on the application of Bayesian approaches in drug development. They illustrated the principles of Bayesian statistics and compared them with frequentist statistics. In addition, the authors gave multiple examples of how Bayesian statistical methods have already contributed to drug development. Finally, they addressed barriers to the implementation of these methods and gave recommendations for their application in future drug development.
Although the article gave a very educational and seemingly persuasive overview of Bayesian approaches in drug development for a broader audience, we provide a different perspective here. We argue that the arguments and examples offered in support of Bayesian methods in the original article may in fact make these methods less attractive to drug developers and regulators. To harness the strengths that Bayesian methods offer, these issues need to be addressed.
In the following, we discuss some of the conceptual arguments and examples presented in the original article and suggest solutions where feasible. A more detailed review of the article is provided elsewhere [3].
Fundamental differences and enormous consequences
“There are two fundamental distinctions
between frequentist and Bayesian approaches. …
This … has enormous consequences logically and
for statistical inference, as the two statistical approaches
answer fundamentally different questions.”
Ruberg et al. (2023)
The authors emphasize the fundamental differences between frequentist and Bayesian approaches and point out their enormous consequences for logic and statistical inference. First, while frequentists make inferences within a single experiment, Bayesians synthesize information across multiple sources. Second, while frequentists make their inference about a parameter value of interest “indirectly” through “proof by contradiction”, i.e. they ask how likely the data are given an assumed parameter value, Bayesians draw their conclusion “directly” from the data, i.e. they ask how likely the parameter value is given the data.
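The two questions can be written side by side. The notation is ours and does not appear in the original article: θ is the parameter of interest, θ₀ its value under the null hypothesis, y the observed data, Y a hypothetical replication of the data and T a test statistic.

```latex
\begin{align}
\text{Frequentist (indirect):}\quad
  & p\text{-value} = \Pr\bigl(T(Y) \ge T(y) \mid \theta = \theta_0\bigr) \\
\text{Bayesian (direct):}\quad
  & p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}
                            {\int p(y \mid \theta')\, p(\theta')\, d\theta'}
\end{align}
```

Both expressions are built from the same likelihood p(y | θ). The Bayesian expression additionally requires the prior p(θ).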
We would argue that the priority for regulators (and hence for drug developers) is to be conservative and consistent rather than innovative, let alone disruptive. We therefore assume that regulators will feel very uncomfortable replacing conventional frequentist methods with fundamentally different ones that have enormous consequences. In our experience, the attitude of the EMA regulatory network still resembles a Catch-22 [4]: Bayesian methods are acceptable if they lead to the same conclusions as frequentist methods, at least for the regulatory approval of pivotal studies. Moreover, the prospect of companies presenting Bayesian re-analyses of rejected and approved trials and asking for a (re-)assessment of their drug approvals or drug labels is not something that regulators will look forward to.
While this argument of fundamental differences in principle between Bayesian and frequentist statistics seems important for driving academic development forward and for creating group identities of frequentists and Bayesians (cf. tribalism [5]), it is overstated [6, p. 62]. Both approaches share the goal of estimating unknown parameter values from known observations. Bayesian approaches and (likelihood-based) frequentist approaches will give similar, if not identical, results if a non-informative prior distribution is used (all other things being equal). They will also converge towards identical results as more data are collected, because the influence of the prior information diminishes and the likelihood based on the observed data dominates. Their intimate relationship is also illustrated by the hybrid concept of “Bayesian p values” [7] (i.e. posterior predictive probabilities), which allow Bayesian results to be interpreted with frequentist concepts. In summary, it would be wiser for drug developers who want to apply Bayesian methods in a regulatory context to emphasize the similarities between frequentist and Bayesian approaches instead of the (presumed) fundamental differences.
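The convergence described above can be illustrated with a small conjugate example. This is a minimal sketch under assumptions of our own: a binary response with a hypothetical true rate of 30%, idealised data that contain exactly 30% responders, a flat Beta(1, 1) prior, and a deliberately misplaced Beta(8, 2) prior. None of these numbers come from the original article.

```python
def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a response rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


def compare(trials, true_rate=0.30):
    """Return (MLE, flat-prior mean, informative-prior mean) for idealised data."""
    successes = round(true_rate * trials)   # idealised: exactly 30% responders
    mle = successes / trials                # frequentist maximum-likelihood estimate
    flat = posterior_mean(successes, trials, 1.0, 1.0)    # uniform Beta(1, 1) prior
    strong = posterior_mean(successes, trials, 8.0, 2.0)  # hypothetical prior, mean 0.8
    return mle, flat, strong


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        mle, flat, strong = compare(n)
        print(f"n={n:>6}  MLE={mle:.4f}  flat prior={flat:.4f}  "
              f"informative prior={strong:.4f}")
```

With the flat prior, the posterior mean stays close to the maximum-likelihood estimate even at n = 10. With the misplaced informative prior, the two differ clearly in small samples and agree to about three decimal places at n = 10,000.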
Use of existing empirical data
“Our aim is to engage stakeholders
in the process of considering
when the use of existing data is appropriate and
how Bayesian methods can be implemented more routinely
as an effective tool for doing so.”
Ruberg et al. (2023)
The authors propose the application of Bayesian methods for the use of existing data, e.g. observational “real-world” data or phase-2 data. However, in our opinion regulators are more concerned about the quality of the data generation process (“design”) than the quality of the data analysis process (“analysis”), since randomized controlled trials (RCTs) are still the reference standard, for example, in EU law [8]. Randomized controlled trials can control bias from known and unknown (!) confounders, while statistical methods (i.e. covariate adjustment) cannot remove the effect of unknown confounders. Even known confounders first have to be measured and then modelled under specific assumptions, which may be violated, before bias can be removed successfully. Therefore, the recommendation to use Bayesian methods for data from non-randomized, uncontrolled studies, or even from randomized controlled studies that may not represent the target population, would require reversing the principle of “design trumps analysis” [9], i.e. it would require that Bayesian analysis can compensate for poor data. Even if this were true in certain studies, one would have to convince regulators that complex, hard-to-understand Bayesian methods solve the problem of using complex, hard-to-understand existing data, and that the final results can be trusted to be precise and accurate. Of course, Ruberg et al. [2] have clearly stated that the application of Bayesian methods needs to depend on the context and on whether external data are appropriate to use. However, using Bayesian methods to quantify the effect of existing data on a clinical study, instead of just considering it narratively in one’s assessment, does not completely remove the role of subjective belief in the relevance of existing data. It only transfers part of that belief to another step in the interpretation process.
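The point about unknown confounders can be made concrete with a toy simulation. This is a sketch under assumptions of our own: a true treatment effect of 1.0, one unmeasured confounder that raises both the chance of being treated and the outcome, and effect sizes chosen purely for illustration. Because the confounder is unmeasured, no analysis of the observational data, Bayesian or frequentist, can adjust for it.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # hypothetical true treatment effect


def estimated_effect(randomized, n=20_000, seed=0):
    """Difference in mean outcome (treated minus control) in simulated data."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder
        if randomized:
            is_treated = rng.random() < 0.5  # allocation independent of u
        else:
            # Observational setting: patients with high u are treated more often.
            is_treated = rng.random() < (0.8 if u > 0 else 0.2)
        outcome = TRUE_EFFECT * is_treated + 2.0 * u + rng.gauss(0.0, 1.0)
        (treated if is_treated else control).append(outcome)
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print(f"randomized:    {estimated_effect(True):.2f}")
    print(f"observational: {estimated_effect(False):.2f}")
```

In this setup the randomized comparison recovers an effect close to 1.0. The unadjusted observational comparison overstates it severalfold, because treated patients differ systematically in the unmeasured variable.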
Use of existing knowledge
“Put simply, evolutions in
science, drug development, pharmacology,
data accessibility and data analysis methodology
should be matched by a similar evolution and
advances in inferential methods,
most notably by careful and explicit
use of existing knowledge and data.”
Ruberg et al. (2023)
Ruberg et al. [2] argue that not only existing data, but also other forms of knowledge could be used in Bayesian methods. However, there is a fundamental problem in modelling prior knowledge (including subjective beliefs). The current method seems to be to feed information into a natural neural network (a.k.a. an “expert”) and ask the expert to describe an informative prior distribution [10]. Although evidence exists that useful prior information can be extracted [11], establishing such a standardized, validated expert knowledge elicitation process is not trivial, if only because experts are hard to find. What is more, one’s subjective belief in the validity of a prior distribution needs to be agreed upon with regulators, which may be problematic [12].
Costs and complexity
“Thus, a Bayesian approach could confirm
the benefits of the new treatment in smaller
— but more complex and expensive — trials
while the entire clinical development programme
can be used in more efficient ways to build better evidence
regarding the safety of the new treatment.”
Ruberg et al. (2023)
Ruberg et al. (2023) [2] state that the application of a Bayesian approach in a clinical study could make it “smaller – but more complex and expensive”, while creating efficiencies in the entire clinical development programme. However, neither sponsors nor regulators are likely to be attracted to conducting or assessing more complex and expensive trials with the speculative prospect of creating some vaguely described efficiencies across a clinical development programme. Moreover, the argument leading up to this conclusion, namely that phase 3 clinical trials are overpowered for demonstrating efficacy because the requirements for demonstrating safety do not allow the sample size to be reduced anyway, does not strengthen the case for using Bayesian methods either.
Therapeutic Hypothermia studies
The authors present the example of Therapeutic Hypothermia to illustrate the effect of various prior distributions on the posterior result of a study. However, the Bayesian analysis leads to a more conservative result than the frequentist analysis (although not statistically significant) and therefore does not make a convincing argument for the adoption of Bayesian statistics (although the Bayesian results may be correct).
COVID-19 vaccine studies
The authors present the example of the COVID-19 vaccine study [13] by Pfizer/BioNTech, which evaluated the vaccine efficacy of the BNT162b2 (“Comirnaty”) mRNA vaccine. While there is considerable merit in using a Bayesian approach in such a high-profile clinical trial, the Bayesian approach did not add value relative to a frequentist analysis because of the very strong treatment effects [14]. In addition, the applied Bayesian methodology received some criticism [15]. For example, the sponsor’s statement that the prior distribution for vaccine efficacy, which was centered around 30%, was “pessimistic” and “minimally informative” (as also repeated by Ruberg et al.) was criticized [15]. In summary, while bringing attention to Bayesian methodology [16], the trial may not be the best example to highlight the added value of applying Bayesian statistics.
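A rough numerical sketch shows why the prior mattered so little here. To our understanding, the trial publication (reference 13) and its protocol describe a Beta(0.700102, 1) prior on θ = (1 − VE)/(2 − VE) and report 8 cases in the vaccine arm versus 162 in the placebo arm. These figures are not restated in the text above and should be checked against the source. The sketch further assumes equal surveillance time in both arms, which the actual analysis did not, and uses Monte Carlo draws instead of exact quantiles.

```python
import random

# Values as we understand them from the trial publication; verify before reuse.
PRIOR_A, PRIOR_B = 0.700102, 1.0
CASES_VACCINE, CASES_PLACEBO = 8, 162


def ve_from_theta(theta):
    """Invert theta = (1 - VE) / (2 - VE) to obtain vaccine efficacy."""
    return (1.0 - 2.0 * theta) / (1.0 - theta)


def ve_summary(a, b, draws=100_000, seed=1):
    """Monte Carlo 2.5%, 50% and 97.5% quantiles of VE for theta ~ Beta(a, b)."""
    rng = random.Random(seed)
    ve = sorted(ve_from_theta(rng.betavariate(a, b)) for _ in range(draws))
    return ve[int(0.025 * draws)], ve[draws // 2], ve[int(0.975 * draws)]


# ASSUMPTION: equal surveillance time, so theta is the vaccine-arm share of cases.
prior = ve_summary(PRIOR_A, PRIOR_B)
posterior = ve_summary(PRIOR_A + CASES_VACCINE, PRIOR_B + CASES_PLACEBO)
simple_estimate = 1.0 - CASES_VACCINE / CASES_PLACEBO  # no prior involved

if __name__ == "__main__":
    print("prior VE quantiles:    ", [round(q, 3) for q in prior])
    print("posterior VE quantiles:", [round(q, 3) for q in posterior])
    print("simple case-ratio VE:  ", round(simple_estimate, 3))
```

Under these assumptions the prior is extremely wide, the posterior is tight, and the posterior median sits close to the simple case-ratio estimate. This is consistent with the observation above that the data dominated the prior.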
On another note, the authors seemed to assume that the primary endpoint of these trials was “containing the spread of the virus”. To our knowledge, many COVID-19 vaccine trials, including the pivotal trial on BNT162b2 by Pfizer/BioNTech, targeted a primary endpoint of reducing some form of symptomatic COVID-19 disease, and did not aim at reducing the viral load of SARS-CoV-2 or infection rates as the authors indicated. Although this is a common misconception, one should be very clear on that distinction, since it has led to considerable political debate in the European Parliament [17]. There, (conservative) politicians accused the European Commission of using vaccines without any evidence of controlling the spread of the virus, thereby counteracting public health authorities’ efforts to vaccinate the population. However, the primary intention of these trials and of the EU vaccination policy was never to contain the spread of the SARS-CoV-2 virus, but to avoid overburdening the public health system with COVID-19 patients (i.e. “flattening the curve”).
Bayesian hierarchical models for subgroup analyses
The authors present the example of Bayesian hierarchical models (BHMs) to identify differential treatment effects between subgroups in clinical trials. While BHMs, which allow “borrowing information” between subgroups, can lead to more precise and accurate estimates, this approach is based on the assumption of a shared, common treatment effect. This assumption may be valid, for example, in the well-known “Eight schools” example [18, pp. 119], which assumes common treatment effects of educational programs in all included schools. Here, the BHM leads to more precise and accurate estimates of the program effects, especially in smaller schools, which are informed by the treatment effects of larger schools.
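The “borrowing” mechanism can be sketched with the eight schools data as tabulated in Bayesian Data Analysis. This is a simplified illustration, not the full hierarchical analysis: we fix the between-school standard deviation at a hypothetical value (`TAU = 5.0`) and use a flat prior on the common mean. A full analysis would also place a prior on the between-school standard deviation.

```python
# Estimated program effects and their standard errors for the eight schools.
EFFECTS = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
STD_ERRORS = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]
TAU = 5.0  # ASSUMPTION: between-school SD, fixed here for illustration only


def pooled_mean(effects, std_errors, tau):
    """Posterior mean of the common effect given tau (flat prior on the mean)."""
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def shrunken_effects(effects, std_errors, tau):
    """Conditional posterior means of the school effects given tau."""
    mu = pooled_mean(effects, std_errors, tau)
    shrunk = []
    for y, se in zip(effects, std_errors):
        weight_on_data = tau ** 2 / (tau ** 2 + se ** 2)  # small for noisy schools
        shrunk.append(weight_on_data * y + (1.0 - weight_on_data) * mu)
    return mu, shrunk


if __name__ == "__main__":
    mu, shrunk = shrunken_effects(EFFECTS, STD_ERRORS, TAU)
    print(f"pooled mean: {mu:.2f}")
    for name, y, s in zip("ABCDEFGH", EFFECTS, shrunk):
        print(f"school {name}: raw {y:6.1f} -> partially pooled {s:6.2f}")
```

Schools with large standard errors are pulled most strongly towards the pooled mean. The same mechanism would pull a subgroup with a truly different effect towards the others.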
However, in clinical studies the assumption of a shared treatment effect is often not plausible and is in fact the question under study, i.e. whether differential treatment effects between subgroups are present. Therefore, the application of a BHM approach for this objective may be questionable. For example, the original BHM proposed by Berry et al. [19] can inflate the type 1 error rate from the nominal 10% to over 50% [20]. Although Chu and Yuan [20] developed the BHM further into a calibrated BHM, which considerably improved on the original method, these authors state that even their model can still inflate the type 1 error rate from 10% to 20% in certain scenarios.
To address the question of differential treatment effects in subgroups, Bayesian predictive cross-validation models [21] should be used instead. These predict the treatment effect in a subgroup of interest from all other subgroups. If the treatment effect in the subgroup of interest lies outside the prediction interval, one can conclude that this subgroup shows a differential treatment effect. This Bayesian approach therefore does not assume a shared, common treatment effect for the subgroup of interest and will not show this bias. In summary, the example of analysing differential treatment effects in subgroups with Bayesian hierarchical models does not make a convincing case for the application of Bayesian analysis, although other, more suitable (Bayesian) models would have been available.
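A leave-one-subgroup-out check of this kind can be sketched as follows. This is a simplified illustration under assumptions of our own, not the procedure of the cited technical support document: normal sampling distributions, a flat prior on the common mean, a between-subgroup standard deviation fixed at a hypothetical value, and invented subgroup estimates. The observed estimate of the held-out subgroup is compared with a 95% predictive interval for that estimate.

```python
import math

# Hypothetical subgroup estimates and standard errors (not from any real trial).
EFFECTS = [0.30, 0.25, 0.35, 0.28, 1.20]
STD_ERRORS = [0.10, 0.12, 0.11, 0.10, 0.12]
TAU = 0.10  # ASSUMPTION: between-subgroup SD, fixed for illustration


def predictive_interval(effects, std_errors, held_out, tau, z=1.96):
    """95% predictive interval for the held-out estimate, using the others only."""
    others = [i for i in range(len(effects)) if i != held_out]
    weights = [1.0 / (std_errors[i] ** 2 + tau ** 2) for i in others]
    mu = sum(w * effects[i] for w, i in zip(weights, others)) / sum(weights)
    # Uncertainty in the mean + between-subgroup spread + sampling error.
    variance = 1.0 / sum(weights) + tau ** 2 + std_errors[held_out] ** 2
    half_width = z * math.sqrt(variance)
    return mu - half_width, mu + half_width


def flag_differential(effects, std_errors, tau):
    """True for each subgroup whose estimate falls outside its interval."""
    flags = []
    for j, y in enumerate(effects):
        low, high = predictive_interval(effects, std_errors, j, tau)
        flags.append(not (low <= y <= high))
    return flags


if __name__ == "__main__":
    print(flag_differential(EFFECTS, STD_ERRORS, TAU))
```

With these invented numbers only the last subgroup is flagged. The held-out subgroup never contributes to its own prediction, so its estimate is not pulled towards the other subgroups before the comparison is made.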
“We believe that the widespread adoption of Bayesian approaches
has the potential to be the single most impactful tool
for accelerating the development of new medicines,
reducing exposure of clinical trial participants
to suboptimal control arms and providing earlier access
to high-quality treatments for patients.”
Ruberg et al. (2023)
Ruberg et al. [2] have presented a highly educational paper championing the application of Bayesian approaches in drug development. While their final conclusion may still hold, the conceptual arguments and examples they present can be seen from a different perspective, which may actually make the use of Bayesian methods less attractive to drug developers and regulators.
In addition, the arguments, though presented elegantly, are not new and have largely failed to convince sponsors and regulators to use Bayesian statistics more widely over the last decades. Therefore, those who promote Bayesian methods should reflect in more depth on why these known arguments have not worked so far. Although we initially stated our expectation that Ruberg et al. would create “hype, not hope”, our hope is actually that the original paper will start a “virtuous cycle” of publications on “the theory that would not die” [22] that address the issues raised and make the advantages of Bayesian statistical rethinking [23] available to drug development and regulation.
1. Miller, B. Moneyball. (Columbia Pictures, Scott Rudin Productions, Michael De Luca Productions, 2011).
2. Ruberg, S. J. et al. Application of Bayesian approaches in drug development: starting a virtuous cycle. Nat Rev Drug Discov 1–16 (2023) doi:10.1038/s41573-023-00638-0.
3. Igl, W. A review of the “Virtuous Cycle” paper on Bayesian statistics by Ruberg et al. (2023). https://wilmarigl.de/?p=689 (2023).
4. Heller, J. Catch-22. (Simon & Schuster, 1961).
5. Clark, C. J., Liu, B. S., Winegard, B. M. & Ditto, P. H. Tribalism Is Human Nature. Curr Dir Psychol Sci 28, 587–592 (2019).
6. Gill, J. Bayesian Methods: A Social and Behavioral Sciences Approach. (Chapman and Hall/CRC, 2007).
7. Ghosh, J. K. & Delampady, M. Bayesian P-Values. in International Encyclopedia of Statistical Science (ed. Lovric, M.) 101–104 (Springer, 2011). doi:10.1007/978-3-642-04898-2_136.
8. Directive 2001/83/EC of the European Parliament and of the Council of 6 November 2001 on the Community code relating to medicinal products for human use. OJ L vol. 311 (2001).
9. Rubin, D. B. For objective causal inference, design trumps analysis. The Annals of Applied Statistics 2, 808–840 (2008).
10. Brownstein, N. C., Louis, T. A., O’Hagan, A. & Pendergast, J. The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making. The American Statistician 73, 56–68 (2019).
11. Holzhauer, B. et al. Eliciting judgements about dependent quantities of interest: The SHeffield ELicitation Framework extension and copula methods illustrated using an asthma case study. Pharm Stat 21, 1005–1021 (2022).
12. Center for Devices and Radiological Health. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. U.S. Food and Drug Administration https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials (2020).
13. Polack, F. P. et al. Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine. New England Journal of Medicine 383, 2603–2615 (2020).
14. Statistics. xkcd https://xkcd.com/2400/.
15. Senn, S. The design and analysis of vaccine trials for COVID-19 for the purpose of estimating efficacy. Pharmaceutical Statistics 21, 790–807 (2022).
16. Skorulski, B. Bayesian Statistics of Efficacy of Pfizer-BioNTech COVID-19 Vaccine — part I. Medium https://towardsdatascience.com/bayesian-statistics-of-efficacy-of-pfizer-biontech-covid-19-vaccine-part-i-efac8d4e0539 (2021).
17. Fact Check-Preventing transmission never required for COVID vaccines’ initial approval; Pfizer vax did reduce transmission of early variants. Reuters (2022).
18. Gelman, A. et al. Bayesian Data Analysis. (Chapman and Hall/CRC, 2015). doi:10.1201/b16018.
19. Berry, S. M., Broglio, K. R., Groshen, S. & Berry, D. A. Bayesian hierarchical modeling of patient subpopulations: Efficient designs of Phase II oncology clinical trials. Clinical Trials 10, 720–734 (2013).
20. Chu, Y. & Yuan, Y. A Bayesian basket trial design using a calibrated Bayesian hierarchical model. Clin Trials 15, 149–158 (2018).
21. Dias, S., Sutton, A. J., Welton, N. J. & Ades, A. E. NICE DSU Technical Support Document 3: Heterogeneity: subgroups, meta-regression, bias and bias-adjustment. http://www.nicedsu.org.uk (2011).
22. McGrayne, S. B. The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from. (Yale University Press, 2012).
23. McElreath, R. Statistical Rethinking – A Bayesian Course with Examples in R and STAN. (Chapman & Hall, 2020).