Today I would like to talk about a common problem in clinical trials in progressive disease (e.g. cancer) with a time-to-event endpoint (e.g. progression or death): the interpretation of interim results with low maturity. Low maturity can arise:
- “by design”: by choosing an analysis timepoint at which only a limited fraction of events is observable
- “by surprise”: by a deviation of the data from the planning assumptions, e.g. a lower event rate than assumed.
Especially in the latter situation, this can lead to the unfortunate outcome that the efficacy of a highly effective treatment, which prevents the event from occurring during follow-up, cannot be evaluated because of a lack of information (i.e. excessive censoring).
Therefore, I will discuss methodological issues and potential remedies for these situations, based on shifting the burden of proof from the primary (censored) endpoint towards secondary (non-censored) endpoints.
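To make the “by surprise” case concrete, here is a minimal sketch of the expected maturity (fraction of patients with an observed event) at an analysis under a planned and a lower-than-planned event rate. It assumes exponential event times, uniform recruitment, no dropout, and an analysis after the end of recruitment; the function name and all numbers are hypothetical.

```python
import math

def expected_event_fraction(median_months: float, accrual_months: float, analysis_month: float) -> float:
    """Expected fraction of randomized patients with an observed event at an analysis.

    Illustrative assumptions: exponential event times with the given median,
    uniform recruitment over [0, accrual_months], no dropout, and an analysis
    performed after recruitment has finished (analysis_month >= accrual_months).
    """
    if analysis_month < accrual_months:
        raise ValueError("this sketch only covers analyses after the end of recruitment")
    lam = math.log(2) / median_months  # exponential hazard implied by the median
    # Average over recruitment time u of P(no event within t - u months) = exp(-lam * (t - u))
    mean_event_free = (
        math.exp(-lam * analysis_month)
        * (math.exp(lam * accrual_months) - 1)
        / (lam * accrual_months)
    )
    return 1.0 - mean_event_free

# Hypothetical plan: median 12 months; what actually happens: median 24 months
planned = expected_event_fraction(12, 12, 18)
observed = expected_event_fraction(24, 12, 18)
print(f"planned maturity {planned:.2f}, maturity under lower event rate {observed:.2f}")
```

Under these assumptions, halving the event rate reduces the maturity at month 18 from roughly one half to well under one third, without any change to the design.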
When interpreting interim results with low maturity, a basic concern is that the treatment effect may be smaller in patients whose events are censored. Compared with the subpopulation of patients who have an event at the interim analysis, patients with censored events may plausibly be:
- (1) a random sample of the population
- (2) healthier at baseline (presence of a prognostic factor)
- (3) patients who respond better to the treatment (presence of a predictive factor)
In scenario (1), the estimate of the treatment effect at the interim analysis is an unbiased estimate of the true treatment effect and will deviate from the estimate at the final analysis only randomly. However, this scenario is unlikely: patients without an event at the interim will by definition have a longer time-to-event, i.e. survival time, than patients with an event at the interim, which provides a strong argument that patients without an event are relatively healthier.
In scenario (2), the estimate of the treatment effect at the interim analysis is an unbiased estimate of the true treatment effect only if the factors that define the subpopulation are prognostic but not predictive. The argument above, that patients without an event at the interim are relatively healthier, supports this scenario.
In scenario (3), the estimate of the treatment effect at the interim analysis is biased: it underestimates the true treatment effect and will deviate non-randomly from the final analysis. Such a conservative bias is acceptable from a methodological perspective. The reverse situation, an anti-conservative bias in which patients with censored events respond worse to treatment, is, however, likely in clinical trials in patients with curable disease and corresponding endpoints, e.g. time-to-cure.
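A small simulation can illustrate scenarios (2) and (3). The sketch below assumes two hypothetical baseline subgroups (“frail” and “healthy”) with exponential event times, estimates the hazard ratio in each subgroup as a ratio of events per unit of exposure, and pools the subgroup log hazard ratios with approximate inverse-variance weights. If the subgroup is only prognostic, the interim and final estimates agree up to random error. If it is also predictive, with healthier patients responding better, the interim estimate is dominated by the frail patients who contribute most of the early events, and is therefore conservative. All hazards, hazard ratios, and function names are illustrative assumptions.

```python
import numpy as np

def simulate_trial(rng, n, hr_frail, hr_healthy, hazard_frail=0.10, hazard_healthy=0.02, accrual=12.0):
    """Hypothetical 1:1 trial with two baseline subgroups and exponential event times (months)."""
    arm = rng.integers(0, 2, n)                   # 1 = treatment, 0 = control
    healthy = (rng.random(n) < 0.5).astype(int)   # 1 = healthy subgroup, 0 = frail subgroup
    base = np.where(healthy == 1, hazard_healthy, hazard_frail)
    hr = np.where(healthy == 1, hr_healthy, hr_frail)
    hazard = base * np.where(arm == 1, hr, 1.0)
    event_time = rng.exponential(1.0 / hazard)    # time from randomization to event
    entry = rng.uniform(0.0, accrual, n)          # calendar time of randomization
    return arm, healthy, entry, event_time

def cut(entry, event_time, calendar_time):
    """Data as observed at a given calendar time (administrative censoring only)."""
    follow_up = np.clip(calendar_time - entry, 0.0, None)
    observed = np.minimum(event_time, follow_up)
    event = (event_time <= follow_up).astype(int)
    return observed, event

def stratified_exp_hr(time, event, arm, stratum):
    """Pool per-subgroup exponential log HRs with weights 1 / (1/d1 + 1/d0)."""
    log_hrs, weights = [], []
    for s in np.unique(stratum):
        m = stratum == s
        d1, t1 = event[m & (arm == 1)].sum(), time[m & (arm == 1)].sum()
        d0, t0 = event[m & (arm == 0)].sum(), time[m & (arm == 0)].sum()
        if d1 == 0 or d0 == 0:
            continue  # no information on this subgroup's hazard ratio yet
        log_hrs.append(np.log((d1 / t1) / (d0 / t0)))
        weights.append(1.0 / (1.0 / d1 + 1.0 / d0))  # approximate inverse variance
    return float(np.exp(np.average(log_hrs, weights=weights)))

rng = np.random.default_rng(2024)
results = {}
for label, hr_frail, hr_healthy in [("prognostic only", 0.6, 0.6), ("predictive", 0.8, 0.4)]:
    arm, healthy, entry, event_time = simulate_trial(rng, 20000, hr_frail, hr_healthy)
    interim = stratified_exp_hr(*cut(entry, event_time, 15.0), arm, healthy)   # low maturity
    final = stratified_exp_hr(*cut(entry, event_time, 120.0), arm, healthy)    # near-complete data
    results[label] = (interim, final)
    print(f"{label}: interim HR {interim:.2f}, final HR {final:.2f}")
```

The exponential model is used only to keep the sketch dependency-free; a stratified Cox model would serve the same illustrative purpose.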
A researcher who performs an interim analysis with low maturity therefore bears the burden of proof to demonstrate that there is no predictive variable that could result in a reduced treatment effect in patients with a censored event. Since “absence of evidence is not evidence of absence”, demonstrating that no such treatment interaction exists will be difficult in practice. One may restrict the set of potential predictive variables to standard prognostic variables of health status at baseline. However, since clinical trials are usually not powered to test equivalence or non-inferiority of treatment effects in subgroups, i.e. with an acceptable type II error, this will also be difficult in practice. In addition, one would have to define an equivalence or non-inferiority margin for the treatment effect, which may be difficult to justify.
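The power problem can be quantified with Schoenfeld's approximation, under which the variance of an estimated log hazard ratio with 1:1 randomization is about 4/d for d events. An interaction contrast between two subgroups that each contribute d/2 events then has a variance of about 4/(d/2) + 4/(d/2) = 16/d, four times that of the overall effect, so detecting a subgroup difference as large as the overall effect needs roughly four times as many events. The sketch below assumes a two-sided Wald test at the 5% level; the event count and hazard ratio are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def wald_power(log_hr, variance, alpha=0.05):
    """Approximate power of a two-sided Wald test of H0: log_hr = 0."""
    se = math.sqrt(variance)
    crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(log_hr) / se - crit) + Z.cdf(-abs(log_hr) / se - crit)

def events_needed(log_hr, variance_factor, alpha=0.05, power=0.80):
    """Smallest d such that a variance of variance_factor / d gives the requested power."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil(variance_factor * z_sum ** 2 / log_hr ** 2)

d = 200                 # hypothetical number of events at the interim
effect = math.log(0.6)  # hypothetical log hazard ratio
power_main = wald_power(effect, 4 / d)          # overall treatment effect
power_interaction = wald_power(effect, 16 / d)  # subgroup difference of the same size
print(f"power: overall {power_main:.2f}, interaction {power_interaction:.2f}")
print("events for 80% power:", events_needed(effect, 4), "vs", events_needed(effect, 16))
```

With these numbers the overall effect is detected with about 95% power, but a subgroup difference of the same magnitude only with well under 50% power; an equivalence or non-inferiority test in subgroups is even more demanding.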
Given these considerations, I would recommend demonstrating superiority of the treatment effect in patients without an event at the interim, and shifting the burden of proof from the primary (censored) endpoint towards secondary (non-censored) endpoints, by performing the following analyses:
- Step 1: The researcher should test for differences in (standard) prognostic variables between patients with and without an event at the interim. The objective is to show that, at baseline, patients without an event at the interim are significantly healthier than patients with an event at the interim.
- Step 2: The researcher should test the main effect of the prognostic variable and its interaction with treatment, i.e. the predictive effect, on the primary (time-to-event) endpoint. The objective is to show that healthier patients with an event at the interim have a stronger treatment effect at the interim.
- Step 3: Since the primary endpoint is censored for the subpopulation in question in step 2, the same analysis should be performed, comparing patients with and without an event in the primary endpoint at the interim, on secondary endpoints that are not censored at the interim. The objective is to show that healthier patients without an event at the interim have a stronger treatment effect at the interim.
- Step 4: The researcher should test the association of the studied secondary endpoint(s) with the primary endpoint, and quantify the strength of the association and its uncertainty.
- Step 5: The effect of data maturity on the treatment effect (or the lack of such an effect) may be demonstrated by plotting the treatment effect against cumulative follow-up time (or time of recruitment); such a plot will reveal any trend. The analysis may also be performed by comparing subgroups defined by quantiles of recruitment or follow-up time, e.g. 10%, 20%, …, 100%.
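Steps 1 and 5 can be sketched on simulated data. The example below assumes a single hypothetical baseline risk score (higher means sicker), exponential event times, and an interim at calendar month 24. For step 1 it uses a one-sided permutation test of the difference in mean risk score between patients with and without an event; for step 5 it computes a crude events-per-exposure hazard ratio among the earliest-recruited 10%, 20%, …, 100% of patients (those with the longest follow-up), which could then be plotted against the recruitment quantile. Steps 2 to 4 are not shown, and all names and parameters are illustrative.

```python
import numpy as np

def exp_hr(time, event, arm):
    """Crude exponential-model hazard ratio: (events / exposure) treatment vs control."""
    rate1 = event[arm == 1].sum() / time[arm == 1].sum()
    rate0 = event[arm == 0].sum() / time[arm == 0].sum()
    return rate1 / rate0

def hr_by_recruitment_quantile(entry, time, event, arm, quantiles):
    """Step 5: HR among the earliest-recruited fraction q of patients, for each q."""
    order = np.argsort(entry)
    out = {}
    for q in quantiles:
        idx = order[: max(1, int(round(q * len(entry))))]
        out[q] = exp_hr(time[idx], event[idx], arm[idx])
    return out

def permutation_test_mean_diff(x_event, x_no_event, n_perm=2000, rng=None):
    """Step 1: one-sided permutation p-value for mean(x_event) > mean(x_no_event)."""
    if rng is None:
        rng = np.random.default_rng()
    observed = x_event.mean() - x_no_event.mean()
    pooled = np.concatenate([x_event, x_no_event])
    n = len(x_event)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if perm[:n].mean() - perm[n:].mean() >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical interim data set
rng = np.random.default_rng(7)
n = 2000
arm = rng.integers(0, 2, n)
risk = rng.normal(size=n)                         # baseline risk score (higher = sicker)
hazard = 0.03 * np.exp(0.7 * risk) * np.where(arm == 1, 0.6, 1.0)
event_time = rng.exponential(1.0 / hazard)
entry = rng.uniform(0, 24, n)
follow_up = 24 - entry                            # interim at calendar month 24
time = np.minimum(event_time, follow_up)
event = (event_time <= follow_up).astype(int)

diff, p = permutation_test_mean_diff(risk[event == 1], risk[event == 0], rng=rng)
trend = hr_by_recruitment_quantile(entry, time, event, arm, [0.1 * k for k in range(1, 11)])
print(f"step 1: mean risk difference {diff:.2f}, one-sided p = {p:.4f}")
for q, hr in trend.items():
    print(f"step 5: earliest {q:.0%} recruited, HR {hr:.2f}")
```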
If all analysis steps give consistent results indicating a superior treatment effect in patients without an event at the interim, this may provide sufficient evidence that the risk of the interim estimate overestimating the true treatment effect, i.e. having an anti-conservative bias, is minimal and acceptable. However, the lower the maturity of the interim data, the greater the burden of proof. These analyses should be designed as confirmatory analyses, but may also be supported by relevant exploratory analyses. In general, the totality of the results will be important for deciding whether the efficacy estimate at the interim is conservative.
To plan statistical analyses that support an early interim analysis, similar historical studies with mature data may be used to support the conclusions. Such historical clinical trial data may be used to evaluate:
- Bias of the efficacy estimate at the interim (relative to the true, or final, treatment effect)
- Probability of success of the supportive analysis (as described above)
Bias and probability of success may be quantified using simulations based on resampling from historical studies.
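A minimal sketch of such a resampling study, assuming a mature historical data set with patient-level recruitment time, observed time, event indicator, treatment arm, and one baseline risk score. Each bootstrap sample of patients is cut back to a hypothetical interim calendar time. The bias is estimated as the mean difference between the interim and final log hazard ratios (crude events-per-exposure estimates), and the probability of success as the proportion of samples in which a step-1-type comparison finds patients with an event significantly sicker at baseline (one-sided z-test at the 5% level). The simulated “historical” data, the interim month, and the success criterion are all assumptions standing in for a real historical trial and a prespecified supportive analysis.

```python
import numpy as np

def exp_log_hr(time, event, arm):
    """Crude exponential-model log hazard ratio (treatment vs control)."""
    rate1 = event[arm == 1].sum() / time[arm == 1].sum()
    rate0 = event[arm == 0].sum() / time[arm == 0].sum()
    return np.log(rate1 / rate0)

def cut_at(entry, time, event, calendar_month):
    """Reconstruct the data as they would have looked at an earlier calendar time."""
    follow_up = np.clip(calendar_month - entry, 0.0, None)
    return np.minimum(time, follow_up), event * (time <= follow_up)

def resampling_study(entry, time, event, arm, risk, interim_month, n_boot=500, seed=1):
    """Bootstrap estimate of interim bias (log HR scale) and of the probability of success."""
    rng = np.random.default_rng(seed)
    n = len(entry)
    bias, success = [], []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)  # resample patients with replacement
        t_int, e_int = cut_at(entry[i], time[i], event[i], interim_month)
        bias.append(exp_log_hr(t_int, e_int, arm[i]) - exp_log_hr(time[i], event[i], arm[i]))
        # Hypothetical success criterion: patients with an interim event are sicker at baseline
        x1, x0 = risk[i][e_int == 1], risk[i][e_int == 0]
        z = (x1.mean() - x0.mean()) / np.sqrt(x1.var(ddof=1) / len(x1) + x0.var(ddof=1) / len(x0))
        success.append(z > 1.645)
    return float(np.mean(bias)), float(np.mean(success))

# Simulated stand-in for a mature historical trial (final analysis at calendar month 96)
rng = np.random.default_rng(11)
n = 600
arm = rng.integers(0, 2, n)
risk = rng.normal(size=n)
hazard = 0.04 * np.exp(0.6 * risk) * np.where(arm == 1, 0.6, 1.0)
event_time = rng.exponential(1.0 / hazard)
entry = rng.uniform(0, 24, n)
time = np.minimum(event_time, 96 - entry)
event = (event_time <= 96 - entry).astype(int)

mean_log_bias, p_success = resampling_study(entry, time, event, arm, risk, interim_month=30)
print(f"mean bias of interim log HR: {mean_log_bias:+.3f}; probability of success: {p_success:.2f}")
```

In a real application the resampling would use the historical patient-level data directly, and the success criterion would be the prespecified supportive analyses rather than this single comparison.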
Additional aspects relevant to the evaluation are:
- sufficient exposure of patients to the treatment, i.e. time under treatment (to evaluate safety)
- the size of the safety analysis set
- the integrity of the trial
The safety profile in patients without an event at the interim needs to be evaluated, and the probability that it is worse than in patients with an event needs to be estimated, since a worse profile might lead to a negative benefit-risk balance. A typical example of drugs with such an inferior safety profile is drugs with accumulating toxicity. However, such analyses are not discussed further here.
If the interim analysis gives a null result, the integrity of the trial at the final analysis is expected to be maintained, since unblinding affects only the analysts and the researchers who decide whether to stop or continue the trial. If the interim analysis gives a positive result and the drug is submitted and approved, the integrity of the trial at the final analysis may be impaired, because all involved parties, including patients, clinicians, investigators, analysts and the sponsor, may be unblinded. Therefore, a researcher should make a well-informed decision about the benefit-risk balance of conducting an interim analysis with low maturity.