Wilmar Igl, PhD^{1} & John Constant, PhD^{2}

^{1}Biostatistics Consulting Services, ICON PLC, Sweden; ^{2}Scientific Affairs, ICON PLC, Canada

# Introduction

Ruberg et al.^{2} have recently published a perspective on the application of Bayesian approaches in drug development. They illustrated the principles of Bayesian statistics and compared them to frequentist statistics. In addition, the authors gave multiple examples of how Bayesian statistical methods have already contributed to drug development. Finally, they also addressed barriers for their implementation and gave recommendations for their application in future drug development.

Although the article gave a very educational and seemingly persuasive overview of Bayesian approaches in drug development for a broader audience, a different perspective is provided in the following. Here, we argue that the arguments and examples supporting the application of Bayesian methods in the original article may in fact make Bayesian methods less attractive to drug developers and regulators. Therefore, to harness the strengths that Bayesian methods offer, these issues need to be addressed.

In the following, we discuss some of the originally presented conceptual arguments and examples and suggest solutions where feasible. A more detailed review of the article is provided elsewhere^{3}.

# Conceptual arguments

## Fundamental differences and enormous consequences

Ruberg et al. (2023)

“There are two fundamental distinctions between frequentist and Bayesian approaches. … This … has enormous consequences logically and for statistical inference, as the two statistical approaches answer fundamentally different questions.”

The authors emphasize the fundamental differences between frequentist and Bayesian approaches and point out their enormous consequences on logic and statistical inference: First, while frequentists make inferences within a single experiment, Bayesians synthesize information across multiple sources of information. Second, while frequentists “indirectly” make their inference about a parameter value of interest by “proof by contradiction”, i.e. they ask how likely the data is based on an assumed parameter value, Bayesians draw their conclusion “directly” from the data, i.e. they ask how likely the parameter value is based on the data.

We would argue that the priority for regulators (and hence for drug developers) is on being conservative and consistent rather than on being innovative (and even less on being disruptive). Therefore, we assume that regulators will feel very uncomfortable with replacing conventional frequentist statistical methods with fundamentally different ones with enormous consequences. In our experience, the attitude of the EMA regulatory network still resembles a catch-22^{4}, in the sense that Bayesian methods are acceptable if they give the same conclusions as frequentist methods, at least for the regulatory approval of pivotal studies. Moreover, the prospect of companies presenting Bayesian re-analyses of rejected and approved trials and asking for a (re-)assessment of their drug approvals or drug labels is not something that regulators will look forward to.

While this argument of fundamental differences between Bayesian and frequentist statistics seems important for driving forward academic development and for creating group identities of frequentists and Bayesians (cf. tribalism^{5}), it is overstated^{6, p. 62}. Both the Bayesian and the frequentist approach have the same goal of estimating unknown parameter values from known observations. Bayesian approaches and (likelihood-based) frequentist approaches will give similar, if not identical, results if a non-informative prior distribution is used (all other things being equal). They will also converge towards identical results as more data are collected, because the influence of the prior information is reduced and the likelihood based on the observed data dominates. Their intimate relationship is also illustrated by the hybrid concept of “Bayesian p values”^{7} (i.e. posterior predictive probabilities), which allow the interpretation of Bayesian results with frequentist concepts. In summary, drug developers who want to apply Bayesian methods in a regulatory context would be wiser to emphasize the similarities of frequentist and Bayesian approaches instead of the (presumed) fundamental differences.
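This convergence is easy to demonstrate in the conjugate Beta-Binomial case. The following minimal Python sketch (with purely illustrative numbers) shows that with a flat prior the posterior mode equals the frequentist maximum-likelihood estimate, and that even an informative prior is swamped as the sample size grows:

```python
def posterior(successes, n, a=1.0, b=1.0):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + (n - successes)

def post_mode(a, b):
    """Mode of a Beta(a, b) distribution (requires a, b > 1)."""
    return (a - 1) / (a + b - 2)

# Flat Beta(1, 1) prior: the posterior mode equals the frequentist MLE.
a, b = posterior(12, 40)                          # 12 responders out of 40
assert abs(post_mode(a, b) - 12 / 40) < 1e-12

# An informative Beta(30, 70) prior (mean 0.30) is swamped as data accumulate.
for n in (40, 400, 4000):
    a, b = posterior(n // 2, n, a=30.0, b=70.0)   # observed response rate 0.50
    print(n, round(a / (a + b), 3))               # posterior mean drifts to 0.50
```

With n = 40 the posterior mean (0.357) still sits close to the prior mean of 0.30; by n = 4000 it has moved to 0.495, essentially the observed rate.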

## Use of existing empirical data

Ruberg et al. (2023)

“Our aim is to engage stakeholders in the process of considering when the use of existing data is appropriate and how Bayesian methods can be implemented more routinely as an effective tool for doing so.”

The authors propose the application of Bayesian methods for the use of existing data, e.g. observational “real-world” data or phase 2 data. However, in our opinion, regulators are more concerned about the quality of the data generation process (“design”) than the quality of the data analysis process (“analysis”), since randomized controlled trials (RCTs) are still the reference standard, for example, in EU law^{8}. Randomized controlled trials can control bias by known and unknown (!) confounders, while statistical methods (i.e. adjustment by covariates) cannot remove the effect of unknown confounders. Even known confounders have to be measured first and modelled based on specific assumptions, which may be violated, to successfully remove bias. Therefore, the recommendation to use Bayesian methods for data from non-randomized, uncontrolled studies, or even from randomized controlled studies that may not represent the target population, would require reversing the principle of “design trumps analysis”^{9}, i.e. it would require that Bayesian analysis can compensate for poor data. Even if this were true in certain studies, one would have to convince regulators that complex, hard-to-understand Bayesian methods solve the problem of using complex, hard-to-understand existing data, and to trust that the final results will be precise and accurate. Of course, Ruberg et al.^{2} have clearly stated that the application of Bayesian methods needs to depend on the context and on whether external data are appropriate to use. However, using Bayesian methods to quantify the effect of existing data on a clinical study, instead of just considering it narratively in one’s assessment, does not completely remove the role of subjective belief in the relevance of the existing data but transfers parts of it to another step in the interpretation process.

## Use of existing knowledge

Ruberg et al. (2023)

“Put simply, evolutions in science, drug development, pharmacology, data accessibility and data analysis methodology should be matched by a similar evolution and advances in inferential methods, most notably by careful and explicit use of existing knowledge and data.”

Ruberg et al.^{2} argue that not only existing data, but also other forms of knowledge could be used in Bayesian methods. However, there is a fundamental problem in modelling prior knowledge (including subjective beliefs). The current method seems to be to feed information into a natural neural network (a.k.a. an “expert”) and ask the expert to describe an informative prior distribution^{10}. Although evidence exists that useful prior information can be extracted^{11}, establishing such a standardized, validated expert knowledge elicitation process is not trivial, if nothing else because experts are hard to find. What is more, one’s subjective belief in the validity of a prior distribution needs to be agreed upon with regulators, which may be problematic^{12}.
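To make the elicitation step concrete, one common approach (quantile matching; the numbers below are hypothetical and this is a sketch, not the procedure prescribed by any particular elicitation framework) is to fit a parametric prior to an expert's stated quantiles:

```python
from scipy import optimize, stats

# Hypothetical elicited judgements about a response rate:
# the expert's median is 0.30 and their 90th percentile is 0.45.
elicited = {0.50: 0.30, 0.90: 0.45}

def loss(params):
    """Squared distance between Beta quantiles and the elicited quantiles."""
    a, b = params
    if a <= 0 or b <= 0:
        return 1e6  # penalize invalid shape parameters
    return sum((stats.beta.ppf(p, a, b) - q) ** 2 for p, q in elicited.items())

res = optimize.minimize(loss, x0=[2.0, 2.0], method="Nelder-Mead")
a, b = res.x
print(f"Fitted prior: Beta({a:.2f}, {b:.2f})")
```

The fitted Beta distribution then reproduces the elicited quantiles and can serve as an informative prior; disagreements between experts, or with regulators, surface directly as disagreements about these quantiles.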

## Costs and complexity

Ruberg et al. (2023)

“Thus, a Bayesian approach could confirm the benefits of the new treatment in smaller — but more complex and expensive — trials while the entire clinical development programme can be used in more efficient ways to build better evidence regarding the safety of the new treatment.”

Ruberg et al. (2023)^{2} state that the application of a Bayesian approach in a clinical study could make it “smaller – but more complex and expensive”, while creating efficiencies across the entire clinical development programme. However, neither sponsors nor regulators are likely to feel attracted to conducting or assessing more complex and expensive trials for the speculative prospect of creating some vaguely described efficiencies across a clinical development programme. The supporting argument, namely that phase 3 clinical trials are overpowered for demonstrating efficacy because the requirements for demonstrating safety do not allow the sample size to be reduced anyway, does not strengthen the case for using Bayesian methods either.

# Examples

## Therapeutic Hypothermia studies

The authors present the example of Therapeutic Hypothermia to illustrate the effect of various prior distributions on the posterior result of a study. However, the Bayesian analysis leads to a more conservative, not statistically significant, result than the frequentist analysis and, therefore, does not make a convincing argument for the adoption of Bayesian statistics (although the Bayesian results may be correct).

## COVID-19 vaccine studies

The authors present the example of the COVID-19 vaccine study^{13} by Pfizer/BioNTech evaluating the vaccine efficacy of the BNT162b2 (“Comirnaty”) mRNA vaccine. While there is considerable merit in using a Bayesian approach in such a high-profile clinical trial, the Bayesian approach did not add value relative to a frequentist analysis because of the very strong treatment effects^{14}. In addition, the applied Bayesian methodology received some criticism^{15}. For example, the sponsor’s statement that the prior distribution for vaccine efficacy, which was centered around 30%, was “pessimistic” and “minimally informative” (as also repeated by Ruberg et al.) was criticized^{15}. In summary, while bringing attention to Bayesian methodology^{16}, the trial may not be the best example to shine a light on the added value of applying Bayesian statistics.
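For context, the trial's Bayesian primary analysis reduces to a conjugate Beta-Binomial update on θ = (1 − VE)/(2 − VE), the probability that a confirmed case occurred in the vaccine arm. A minimal sketch, using the Beta(0.700102, 1) prior and the case split of 8 (vaccine) versus 162 (placebo) as reported for the trial^{13} (treat the exact figures as assumptions of this illustration):

```python
from scipy import stats

# Protocol prior: theta = (1 - VE) / (2 - VE) ~ Beta(0.700102, 1),
# which centres the prior vaccine efficacy (VE) near 30%.
a0, b0 = 0.700102, 1.0
cases_vaccine, cases_placebo = 8, 162   # reported case split at the analysis

# Conjugate Beta-Binomial update: theta is the probability that a
# case came from the vaccine arm.
a, b = a0 + cases_vaccine, b0 + cases_placebo

# VE > 30% corresponds to theta below (1 - 0.30) / (2 - 0.30).
theta_at_30 = (1 - 0.30) / (2 - 0.30)
prob_ve_above_30 = stats.beta.cdf(theta_at_30, a, b)
print(f"P(VE > 30% | data) = {prob_ve_above_30:.6f}")
```

With 8 versus 162 cases, the posterior probability that VE exceeds 30% is essentially 1 under virtually any plausible prior, which illustrates why the Bayesian machinery added little beyond a frequentist analysis here.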

On another note, the authors seem to assume that the primary endpoint of these trials was “containing the spread of the virus”. To our knowledge, many COVID-19 vaccine trials, including the pivotal trial on BNT162b2 by Pfizer/BioNTech, targeted a primary endpoint of reducing some form of symptomatic COVID-19 disease, and were not aiming at reducing the viral load of SARS-CoV-2 or infection rates as the authors indicated. Although this is a common misconception, one should be very clear about that distinction, since it has led to considerable political debate in the European Parliament^{17}. There, (conservative) politicians accused the European Commission of using vaccines without any evidence of controlling the spread of the virus, thereby counteracting public health authorities’ efforts to vaccinate the population. However, the primary intention of these trials and the EU vaccination policy was never to contain the spread of the SARS-CoV-2 virus, but to avoid overburdening the public health system (i.e. “flattening the curve”) with COVID-19 patients.

## Subgroups

The authors present the example of Bayesian hierarchical models (BHMs) to identify differential treatment effects between subgroups in clinical trials. While BHMs, which allow “borrowing information” between subgroups, can lead to more precise and accurate estimates, this approach is based on the assumption of a shared, common treatment effect. This assumption may be valid, for example, in the well-known “eight schools” example^{18, p. 119}, which assumes common treatment effects of educational programs in all included schools. Here, the BHM leads to more precise and accurate estimates of the program effects, especially in smaller schools, which are informed by the treatment effects of larger schools.
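The shrinkage mechanism can be sketched with the eight-schools data themselves. For brevity, the between-school standard deviation τ is fixed at an assumed value rather than given its own prior, so this is an illustration of partial pooling, not a full BHM fit:

```python
import numpy as np

# Eight-schools data (Gelman et al.^{18}): estimated coaching effects
# and their standard errors.
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

tau = 10.0  # assumed between-school SD, fixed for this sketch

# Precision-weighted common mean under y_j ~ N(mu, sigma_j^2 + tau^2)
w = 1.0 / (sigma**2 + tau**2)
mu = np.sum(w * y) / np.sum(w)

# Partial pooling: each school's estimate is pulled toward the common
# mean, most strongly where the school's own estimate is noisiest.
shrink = sigma**2 / (sigma**2 + tau**2)
theta = shrink * mu + (1 - shrink) * y
print(np.round(theta, 1))
```

Every shrunken estimate lies between the school's raw estimate and the common mean, which is exactly the behaviour that becomes problematic when the subgroups do not in fact share a common effect.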

However, in clinical studies the assumption of a shared treatment effect is often not plausible and is in fact the very question under study, i.e. whether differential treatment effects between subgroups are present. Therefore, the application of a BHM approach for this objective may be questionable. For example, the original BHM proposed by Berry et al.^{19} can inflate a nominal type 1 error rate of 10% to over 50%^{20}. Although Chu and Yuan^{20} developed the BHM further into a calibrated BHM, which considerably improved the original method, the authors state that even their model can still inflate the type 1 error rate from 10% to 20% in certain scenarios.

To address the question of differential treatment effects in subgroups, Bayesian predictive cross-validation models^{21}, which predict the treatment effect of a subgroup of interest from all other subgroups, should be used instead. If the treatment effect in the subgroup of interest lies outside the prediction interval, one can conclude that this subgroup shows a differential treatment effect. This Bayesian approach does not make the assumption of a shared, common treatment effect and, therefore, will not show this bias. In summary, the example of analysing differential treatment effects in subgroups with Bayesian hierarchical models does not make a convincing case for the application of Bayesian analysis, while other, more suitable (Bayesian) models would have been available.
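The leave-one-out idea can be sketched as follows. The subgroup estimates below are hypothetical, and the between-subgroup standard deviation τ is fixed at an assumed value rather than estimated, so this is a normal-approximation sketch of the cross-validation logic, not a full Bayesian model:

```python
import numpy as np

# Hypothetical subgroup treatment-effect estimates (e.g. log hazard
# ratios) and standard errors; the 4th subgroup looks discrepant.
est = np.array([-0.30, -0.25, -0.35, 0.40])
se = np.array([0.10, 0.12, 0.11, 0.15])

tau = 0.05  # assumed between-subgroup SD, fixed for this sketch

for j in range(len(est)):
    mask = np.arange(len(est)) != j
    # Pooled mean from all subgroups EXCEPT subgroup j
    w = 1.0 / (se[mask]**2 + tau**2)
    mu = np.sum(w * est[mask]) / np.sum(w)
    # Predictive SD: uncertainty in mu + between-subgroup
    # heterogeneity + sampling error of subgroup j
    pred_sd = np.sqrt(1.0 / np.sum(w) + tau**2 + se[j]**2)
    z = (est[j] - mu) / pred_sd
    flag = "differential" if abs(z) > 1.96 else "consistent"
    print(f"subgroup {j}: z = {z:+.2f} ({flag})")
```

Only the fourth subgroup falls outside its 95% prediction interval and is flagged, while the three mutually consistent subgroups are not, without ever forcing a common effect onto all subgroups.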

# Conclusion

“We believe that the widespread adoption of Bayesian approaches has the potential to be the single most impactful tool for accelerating the development of new medicines, reducing exposure of clinical trial participants to suboptimal control arms and providing earlier access to high-quality treatments for patients.”

Ruberg et al. (2023)

Ruberg et al.^{2} have presented a highly educational paper championing the application of Bayesian approaches in drug development. While their final conclusion may still hold, the presented conceptual arguments and examples can be seen from a different perspective, which may actually make the use of Bayesian methods less attractive to drug developers and regulators.

In addition, the arguments – though presented elegantly – are not new and have largely failed to convince sponsors and regulators to use Bayesian statistics more widely over the last decades. Therefore, those who promote Bayesian methods should reflect in more depth on why those known arguments have not worked so far. Although we initially stated our expectation that Ruberg et al.^{2} would create “hype, not hope”, our actual hope is that the original paper will start a “virtuous cycle” of publications on “the theory that would not die”^{22} that address the raised issues and make the advantages that Bayesian statistical rethinking^{23} has to offer available to drug development and regulation.

# References

1. Miller, B. *Moneyball*. (Columbia Pictures, Scott Rudin Productions, Michael De Luca Productions, 2011).

2. Ruberg, S. J. *et al.* Application of Bayesian approaches in drug development: starting a virtuous cycle. *Nat Rev Drug Discov* 1–16 (2023) doi:10.1038/s41573-023-00638-0.

3. Igl, W. A review of the “Virtuous Cycle” paper on Bayesian statistics by Ruberg et al. (2023). https://wilmarigl.de/?p=689 (2023).

4. Heller, J. *Catch-22*. (Simon & Schuster, 1961).

5. Clark, C. J., Liu, B. S., Winegard, B. M. & Ditto, P. H. Tribalism Is Human Nature. *Curr Dir Psychol Sci* **28**, 587–592 (2019).

6. Gill, J. *Bayesian Methods: A Social and Behavioral Sciences Approach*. (Chapman and Hall/CRC, 2007).

7. Ghosh, J. K. & Delampady, M. Bayesian P-Values. in *International Encyclopedia of Statistical Science* (ed. Lovric, M.) 101–104 (Springer, 2011). doi:10.1007/978-3-642-04898-2_136.

8. *Directive 2001/83/EC of the European Parliament and of the Council of 6 November 2001 on the Community code relating to medicinal products for human use*. *OJ L* vol. 311 (2001).

9. Rubin, D. B. For objective causal inference, design trumps analysis. *The Annals of Applied Statistics* **2**, 808–840 (2008).

10. Brownstein, N. C., Louis, T. A., O’Hagan, A. & Pendergast, J. The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making. *The American Statistician* **73**, 56–68 (2019).

11. Holzhauer, B. *et al.* Eliciting judgements about dependent quantities of interest: The SHeffield ELicitation Framework extension and copula methods illustrated using an asthma case study. *Pharm Stat* **21**, 1005–1021 (2022).

12. Center for Devices and Radiological Health. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. *U.S. Food and Drug Administration* https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials (2020).

13. Polack, F. P. *et al.* Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine. *New England Journal of Medicine* **383**, 2603–2615 (2020).

14. Statistics. *xkcd* https://xkcd.com/2400/.

15. Senn, S. The design and analysis of vaccine trials for COVID-19 for the purpose of estimating efficacy. *Pharmaceutical Statistics* **21**, 790–807 (2022).

16. Skorulski, B. Bayesian Statistics of Efficacy of Pfizer-BioNTech COVID-19 Vaccine — part I. *Medium* https://towardsdatascience.com/bayesian-statistics-of-efficacy-of-pfizer-biontech-covid-19-vaccine-part-i-efac8d4e0539 (2021).

17. Fact Check-Preventing transmission never required for COVID vaccines’ initial approval; Pfizer vax did reduce transmission of early variants. *Reuters* (2022).

18. Gelman, A. *et al.* *Bayesian Data Analysis*. (Chapman and Hall/CRC, 2015). doi:10.1201/b16018.

19. Berry, S. M., Broglio, K. R., Groshen, S. & Berry, D. A. Bayesian hierarchical modeling of patient subpopulations: Efficient designs of Phase II oncology clinical trials. *Clinical Trials* **10**, 720–734 (2013).

20. Chu, Y. & Yuan, Y. A Bayesian basket trial design using a calibrated Bayesian hierarchical model. *Clin Trials* **15**, 149–158 (2018).

21. Dias, S., Sutton, A. J., Welton, N. J. & Ades, A. E. *NICE DSU Technical Support Document 3: Heterogeneity: subgroups, meta-regression, bias and bias-adjustment*. http://www.nicedsu.org.uk (2011).

22. McGrayne, S. B. *The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from*. (Yale University Press, 2012).

23. McElreath, R. *Statistical Rethinking – A Bayesian Course with Examples in R and STAN*. (Chapman & Hall, 2020).