to conclude that the efficacy of an actually efficacious therapy cannot be proven and, as a consequence, to potentially withhold an efficacious therapy from patients.
Indeed, from a patient’s perspective the answer might not be straightforward. However, there is a clear answer to this question in clinical research (quick spoiler: situation A is worse!).
But why ask these questions when talking about intention-to-treat (ITT) vs. per-protocol (PP)? Well, let’s start with some definitions and general explanations:
The intention-to-treat principle states that every patient randomized in the clinical study should be included in the primary analysis. Accordingly, patients who drop out prematurely, are non-compliant with the study treatment, or even take the wrong study treatment are included in the primary analysis within the treatment group they were assigned to at randomization (“as randomized”).
Consequently, in an analysis according to the ITT principle, the original randomization and the number of patients in the treatment groups remain unchanged, the analysis population is as complete as possible, and potential bias due to the exclusion of patients is avoided. The patient set used for the primary analysis according to the ITT principle is called the “full analysis set”.
Only a few specific reasons justify excluding a patient from the full analysis set:
no treatment was applied at all
there are no data available after randomization
In addition, the ICH E9 guideline mentions “failure of major entry criteria” as a reason for exclusion. However, as these major entry criteria are quite specific and only valid under certain circumstances, they are not commonly used for the definition of a full analysis set.
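A minimal Python sketch of how the full analysis set could be derived from patient records. The `Patient` fields and the helper names `full_analysis_set` and `itt_arm` are hypothetical, chosen only for illustration; in a real trial these rules are fixed in the statistical analysis plan. The sketch keeps every randomized patient except those who were never treated or have no data after randomization, and assigns each patient to the arm from randomization, not to the arm actually received:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only
    patient_id: str
    randomized_arm: str                # arm assigned at randomization
    received_arm: Optional[str]        # arm actually taken; None = never treated
    has_post_randomization_data: bool  # any data collected after randomization?


def full_analysis_set(patients: List[Patient]) -> List[Patient]:
    """Keep every randomized patient except those never treated
    or without any data after randomization."""
    return [p for p in patients
            if p.received_arm is not None and p.has_post_randomization_data]


def itt_arm(p: Patient) -> str:
    """ITT: analyze 'as randomized', whatever treatment was actually taken."""
    return p.randomized_arm


patients = [
    Patient("01", "verum", "verum", True),
    Patient("02", "verum", "placebo", True),    # wrong treatment: stays in verum
    Patient("03", "placebo", None, False),      # never treated: excluded
    Patient("04", "placebo", "placebo", True),  # e.g. early dropout with some data: kept
]
fas = full_analysis_set(patients)
print([(p.patient_id, itt_arm(p)) for p in fas])
# [('01', 'verum'), ('02', 'verum'), ('04', 'placebo')]
```

Note that patient 02 is analyzed in the verum group even though placebo was taken, which is exactly the “as randomized” rule.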
While an analysis according to the ITT principle aims to preserve the original randomization and to avoid potential bias due to the exclusion of patients, a per-protocol (PP) analysis aims to identify the treatment effect that would occur under optimal conditions, i.e. to answer the question: what is the effect if patients are fully compliant? Therefore, some patients from the full analysis set need to be excluded from the population used for the PP analysis (PP population).
Usually, this applies to patients fulfilling any of the following criteria:
any major protocol deviations (e.g. intake of a concomitant medication affecting the primary endpoint)
non-availability of measurements of the primary endpoint
There might be further criteria for selecting a PP population; however, the following principles are essential:
The assignment to the PP analysis set needs to take place prior to the analysis (if possible in a blinded manner).
Deviations that might be affected by the actual treatment should not be used as exclusion criteria: e.g., “premature discontinuation from the study” might not be a good choice of criterion for exclusion from the PP analysis, if this discontinuation was due to lack of efficacy (and therefore associated with the treatment received).
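The selection rules above can be sketched in Python as a fixed set of exclusion criteria applied to the full analysis set. Everything here is a hypothetical illustration: the field names, the criterion labels and the helper `per_protocol_set` are assumptions, and the criteria are written down as constants to mirror the requirement that they are defined before the analysis. A treatment-dependent event such as discontinuation due to lack of efficacy is deliberately not listed as a criterion:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only
    patient_id: str
    arm: str
    major_deviation: bool              # e.g. prohibited concomitant medication
    primary_endpoint: Optional[float]  # None = primary endpoint not measured


# Exclusion criteria fixed BEFORE the analysis (ideally while still blinded).
# Treatment-dependent events (e.g. dropout due to lack of efficacy) are not used.
PP_EXCLUSION_CRITERIA: Dict[str, Callable[[Patient], bool]] = {
    "major protocol deviation": lambda p: p.major_deviation,
    "primary endpoint missing": lambda p: p.primary_endpoint is None,
}


def per_protocol_set(fas: List[Patient]) -> Tuple[List[Patient], Dict[str, List[str]]]:
    """Split the full analysis set into the PP population and a log of exclusion reasons."""
    included: List[Patient] = []
    excluded: Dict[str, List[str]] = {}
    for p in fas:
        reasons = [name for name, applies in PP_EXCLUSION_CRITERIA.items() if applies(p)]
        if reasons:
            excluded[p.patient_id] = reasons
        else:
            included.append(p)
    return included, excluded


fas = [
    Patient("01", "verum", False, 12.5),
    Patient("02", "verum", True, 10.1),     # took a prohibited concomitant medication
    Patient("03", "placebo", False, None),  # primary endpoint not measured
    Patient("04", "placebo", False, 8.7),
]
pp, log = per_protocol_set(fas)
print([p.patient_id for p in pp])  # ['01', '04']
print(log)  # {'02': ['major protocol deviation'], '03': ['primary endpoint missing']}
```

Keeping a log of reasons per excluded patient makes it easy to check whether exclusions are balanced between the treatment arms.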
Both approaches, ITT and PP, are valid but play different roles in the analysis of clinical studies. Let’s come back to the question at the beginning of this article: what is worse, situation A (claiming a non-existent effect) or situation B (missing an existing effect)?
To answer this, consider the essential difference between the two cases:
Situation A means that a statistically proven result is actually wrong: a result that might have dangerous consequences. Based on such a proof, an inefficacious treatment might be approved and patients put at risk. Situation B, on the other hand, means that efficacy was neither proven nor disproven. Non-proven efficacy does not equal proven inefficacy! From a scientific perspective, such a non-decision has less serious implications than a wrong proof.
Therefore, in clinical trials situation A (known as a type I error) is strictly controlled via a low, pre-defined level of significance: a level of 5%, for example, means that if there is actually no effect, the probability of situation A is at most 5%. Situation B (known as a type II error), on the other hand, is controlled via an adequate sample size calculation, but usually with a less strict criterion (e.g. 20%, corresponding to a power of 80%).
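To illustrate how both error probabilities enter the sample size calculation, here is a sketch of the common normal-approximation formula for comparing two response rates (without continuity correction). The two-sided 5% significance level, the 20% type II error and the response rates of 60% vs. 40% (taken from the example later in this article) are illustrative assumptions, and `sample_size_two_proportions` is a hypothetical helper name:

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p_a: float, p_b: float,
                                alpha: float = 0.05, beta: float = 0.20) -> int:
    """Approximate number of patients per arm to compare two response rates
    (two-sided significance level alpha, power 1 - beta), normal approximation
    without continuity correction."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # controls the type I error (situation A)
    z_beta = z(1 - beta)        # controls the type II error (situation B)
    p_bar = (p_a + p_b) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    return math.ceil(numerator / (p_a - p_b) ** 2)


n = sample_size_two_proportions(0.60, 0.40)
print(n)  # 97
```

Making either error probability smaller increases the required number of patients, so the strict control of the type I error is paid for with sample size.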
In conclusion, avoiding a wrong proof is more important than avoiding a wrong non-decision (which is also bad, but A is worse…). Consequently, the probability of situation A must be kept below the level of significance (e.g. 5%).
Thus, the common rule for clinical trial analyses is: be conservative! Here, “conservative” means: do not increase the probability of a type I error!
In a clinical trial (we only talk about superiority trials here, as the situation is different for non-inferiority trials), one wants to detect a benefit of treatment A (e.g. verum) compared to treatment B (e.g. placebo). The aim is to reject the so-called “null hypothesis” that “treatment A is not better than treatment B”. Rejecting it is taken as proof that “treatment A is actually better than treatment B” (that is the way statistical tests work).
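As a sketch of how such a test works for a response endpoint, the function below computes a one-sided p-value for the null hypothesis “A is not better than B” with the pooled two-proportion z-test. This is a normal approximation, and both the choice of test and the name `one_sided_z_test` are our assumptions, not the only option. The counts are the ITT numbers from the example later in this article (54 of 100 vs. 36 of 100 responders):

```python
import math
from statistics import NormalDist


def one_sided_z_test(resp_a: int, n_a: int, resp_b: int, n_b: int) -> float:
    """p-value for H0: p_A <= p_B ('A is not better than B') vs. H1: p_A > p_B,
    using the pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 1 - NormalDist().cdf(z)


p = one_sided_z_test(54, 100, 36, 100)
print(f"p = {p:.4f}")  # roughly 0.005
```

If the p-value falls below the pre-defined significance level, the null hypothesis is rejected, which is read as proof that A is better than B.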
Thus, a large estimated treatment effect leads to a successful trial (i.e. to proven efficacy). However, if you choose an overly optimistic method of analysis, i.e. one that over-estimates the effect, you are more likely to obtain a positive result, even if there is no real effect. In other words: you increase the probability of a type I error.
Therefore, in clinical trials any over-estimation of the effect needs to be avoided. To prevent type I errors, it is better to choose a method that under-estimates the effect (conservative approach) than a method that might over-estimate it.
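The claim that an over-estimating analysis inflates the type I error can be checked with a small Monte Carlo simulation. This is a sketch under assumptions of our own: 100 patients per arm, an identical true response rate of 40% in both arms (so the null hypothesis is true and every rejection is a type I error), a one-sided pooled z-test at 5%, and, as the “optimistic” analysis, a hypothetical rule that drops 10 non-responders from the verum arm only. The function names are illustrative:

```python
import math
import random
from statistics import NormalDist


def one_sided_p(resp_a: int, n_a: int, resp_b: int, n_b: int) -> float:
    """Pooled two-proportion z-test, H1: p_A > p_B."""
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # all or no responders: no evidence either way
        return 1.0
    z = (resp_a / n_a - resp_b / n_b) / se
    return 1 - NormalDist().cdf(z)


def simulate_type_i_error(n_sims=10_000, n=100, p=0.40, excluded=10,
                          alpha=0.05, seed=1):
    """Both arms share the same true response rate, i.e. H0 is true.
    Returns the rejection rates of (a) the analysis of all patients and
    (b) a hypothetical 'optimistic' analysis dropping non-responders from arm A."""
    rng = random.Random(seed)
    all_hits = optimistic_hits = 0
    for _ in range(n_sims):
        resp_a = sum(rng.random() < p for _ in range(n))
        resp_b = sum(rng.random() < p for _ in range(n))
        all_hits += one_sided_p(resp_a, n, resp_b, n) < alpha
        drop = min(excluded, n - resp_a)  # only non-responders are dropped
        optimistic_hits += one_sided_p(resp_a, n - drop, resp_b, n) < alpha
    return all_hits / n_sims, optimistic_hits / n_sims


all_rate, optimistic_rate = simulate_type_i_error()
print(f"all patients analyzed: {all_rate:.3f}")       # close to the nominal 0.05
print(f"optimistic analysis:   {optimistic_rate:.3f}")  # clearly above 0.05
```

With these settings the analysis of all patients rejects in roughly 5% of the simulated trials, while the optimistic analysis rejects noticeably more often; the exact figures depend on the seed and the number of simulations.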
What does this general rule mean for the choice of ITT vs. PP? What is the more conservative approach in this context? The simple answer is: it’s the analysis according to the ITT principle.
In this kind of analysis, actual treatment effects are usually watered down, or in other words: effects are under-estimated. This tendency is also described in common guidelines (e.g. ICH E9). It follows from the fact that the full analysis set also includes non-compliant patients, and non-compliance is generally associated with a negative outcome (e.g., patients who drop out at a very early stage of the study usually have a negative outcome). Provided that non-compliance occurs in all treatment arms, the differences between the treatments consequently diminish.
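This dilution can be made explicit with a short derivation under simplifying assumptions that are ours, not the guideline's: in both arms a proportion d of patients is non-compliant, and non-compliant patients respond with the same rate r regardless of the arm (the treatment has no effect on them). With p_A and p_B denoting the response rates of compliant patients, the response rates in the full analysis set and their difference are:

```latex
\begin{aligned}
\pi_A^{\mathrm{ITT}} &= (1-d)\,p_A + d\,r, \qquad
\pi_B^{\mathrm{ITT}} = (1-d)\,p_B + d\,r, \\
\Delta^{\mathrm{ITT}} &= \pi_A^{\mathrm{ITT}} - \pi_B^{\mathrm{ITT}}
  = (1-d)\,(p_A - p_B) = (1-d)\,\Delta .
\end{aligned}
```

Because 0 < d < 1, the ITT effect is pulled towards zero by the factor (1 - d); for example, d = 0.1 turns a true difference of 20 percentage points into 18.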
Consider a superiority trial with two treatment arms (verum vs. placebo) and a dichotomous outcome (response: yes/no). The true response rates, i.e. the response rates that are expected, are 60% under verum and 40% under placebo; thus, there is a true treatment effect of 20 percentage points.
Now assume that 10% of the patients in each study arm drop out of the study prematurely and are lost to follow-up (i.e., 10% dropouts, 90% completers). Due to their shortened observation period, none of the dropouts achieves a response (a reasonable assumption).
Nevertheless, according to the ITT principle, all patients (including dropouts) are included in the full analysis set. Let’s have a look at the outcome:
| Treatment arm | Subgroup | Response (responders) | ITT analysis (full analysis set) | Effect |
|---|---|---|---|---|
| Verum (N=100) | 90 completers | 60%, i.e. 54 responders | 54 of 100 patients are responders (54%) | Δ = 18 percentage points |
| | 10 dropouts | 0%, i.e. 0 responders | | |
| Placebo (N=100) | 90 completers | 40%, i.e. 36 responders | 36 of 100 patients are responders (36%) | |
| | 10 dropouts | 0%, i.e. 0 responders | | |
The estimated treatment effect in this analysis is 18 percentage points, i.e. the actual treatment difference of 20 percentage points is under-estimated. However, since the aim is to not increase the probability of a type I error, this “wrong” (or conservative) estimate is still better than an over-estimation of the effect.
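The arithmetic of the table can be reproduced in a few lines of Python. The helper `itt_response_rate` is a hypothetical name; it simply counts dropouts with their own response rate (0% here, as assumed above) and divides by all randomized patients:

```python
def itt_response_rate(n: int, dropout_frac: float, completer_rate: float,
                      dropout_rate: float = 0.0) -> float:
    """Response rate in the full analysis set: completers respond with
    completer_rate, dropouts with dropout_rate, denominator = all randomized."""
    completers = round(n * (1 - dropout_frac))
    dropouts = n - completers
    responders = completers * completer_rate + dropouts * dropout_rate
    return responders / n


verum = itt_response_rate(100, 0.10, 0.60)
placebo = itt_response_rate(100, 0.10, 0.40)
print(f"ITT: verum {verum:.0%}, placebo {placebo:.0%}, "
      f"effect {100 * (verum - placebo):.0f} percentage points")
# ITT: verum 54%, placebo 36%, effect 18 percentage points
```

Setting `dropout_frac=0.0` gives back the true rates of 60% and 40% and the full effect of 20 percentage points.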
How about the PP analysis in this context? Excluding patients from the analysis due to major protocol deviations can, of course, also bias the estimated treatment effect. This is particularly the case if the frequency of and the reasons for exclusion differ between the study groups. However, for a PP analysis it is not straightforward to predict the direction of the bias (over- or under-estimation). Some authors and guidelines (e.g. the ICH E9 guideline) claim that PP analyses tend to over-estimate an effect, although this cannot be derived mathematically.
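A small deterministic sketch shows why the direction is hard to predict. The numbers are hypothetical: true response rates of 60% (verum) and 40% (placebo) with 100 patients per arm, and 10 non-responders excluded because of protocol deviations, either only in the verum arm (scenario 1) or only in the placebo arm (scenario 2). The helper name `pp_effect` is illustrative:

```python
def pp_effect(resp_a: int, n_a: int, excluded_nonresp_a: int,
              resp_b: int, n_b: int, excluded_nonresp_b: int) -> float:
    """Difference in response rates after excluding the given numbers of
    non-responders from each arm (responder counts stay unchanged)."""
    rate_a = resp_a / (n_a - excluded_nonresp_a)
    rate_b = resp_b / (n_b - excluded_nonresp_b)
    return rate_a - rate_b


true_effect = 60 / 100 - 40 / 100
scenario_1 = pp_effect(60, 100, 10, 40, 100, 0)  # exclusions only under verum
scenario_2 = pp_effect(60, 100, 0, 40, 100, 10)  # exclusions only under placebo
print(f"true {100 * true_effect:.1f}, scenario 1 {100 * scenario_1:.1f}, "
      f"scenario 2 {100 * scenario_2:.1f}")
# true 20.0, scenario 1 26.7, scenario 2 15.6
```

The same number of exclusions pushes the estimate upwards in one scenario and downwards in the other, depending only on which arm is affected.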