Various bioassays give rise to replicated binomial count data. For example, in ecotoxicological assays, fish larvae or daphnids in several tanks are exposed to different dosages of a substance and the number of dead or immobile animals per tank is used to assess the hazardousness of the substance. Usually, a small number of replicated tanks are used for each dosage under consideration. If some experimental conditions differ between tanks such that the proportion of dead or immobile daphnids is effected, counted numbers may show higher variance than expected under the binomial assumption, i.e., extra-binomial variability or overdispersion. Similar situations arise in the in-vivo micronucleus assay: the number of cells showing micronuclei is counted for a given number of exposed cells for each (randomized) animal, with the aim to assess the substances' potential to cause cytogenetic damage. Also here, a limited number of replications per dosage is performed, such that differences between animals in the in-vivo micronucleus animals may cause overdispersion. In summary, bioassays that lead to binomial data, often contain clustered replication, and thus make it possible to account for overdispersion in the data. However, the number of replications or clusters per treatment group that allows to assess the extend of overdispersion, is rather limited.
In the statistical analysis of such bioassays, major interest is usually in comparisons to the untreated control group. While a test on significance for the overall effect of dosage of the substance may be of preliminary interest, usually more detailed interpretation for the single dosages is required: Confidence intervals for the effect of given dosages compared to the untreated control or a positive control are required to interpret the toxicological relevance of the observed effect size. Tests of non-inferiority (or equivalence) for given dosages compared to an untreated control1 may be more important than an overall test on significant change in the event rates: In toxicological assessment, confidence is needed primarily when claiming no effect. Both approaches require an interpretable definition of the change of the rate of the detrimental event (death or immobility, presence of micronculei, malformations, etc.) compared to the control treatment: for judging relevance of an effect size or for the definition of a particular non-inferiority margin.1
For this reason, this paper is focused on the ratio of proportions (risk ratio). Compared to the plenty of publications considering the construction of confidence intervals for a single binomial proportion, as well as for differences, ratios or odds-ratios of binomial proportions, the construction confidence intervals for risk ratios of overdispersed binomial data has received only little attention.2,3 The available methods are all asymptotic methods. Their construction and their evaluation is usually motivated by their application to epidemiological studies, where the absence or presence of a disease is counted for a given number of individuals in clusters. In this context, clustering of individuals may arise from humans being clustered in families or locations or from repeated measurements within a given animal, or repeated animals within a given farm when veterinary epidemiology is concerned. In these settings, there is usually a relatively large number of clusters available (many families or locations, many farms or animals), such that simple asymptotic methods work well and the estimation of the overdispersion parameters is rather precise. This is not the case in bioassays that are the focus of this paper. In epidemiological settings it is also rather improbable to observe no single disease case across all clusters in one of the groups to be compared, or, conversely, to observe only disease cases across all clusters in one group. Such outcomes, however, plausibly occur in the untreated control groups of bioassays and may cause problems with Wald-type test and related confidence intervals.4 Finally, in epidemiological studies, major interest is usually in estimation, such that the statistical evaluation of confidence interval methods for the risk ratio does plausibly focus on (two-sided) coverage probability and interval width. In the case of bioassays, estimation as well as related hypothesis tests are of interest. In testing, one-sided hypotheses are most relevant and may involve margins of non-inferiority.
In consequence, previous recommendations of confidence intervals methods for risk ratios of overdispersed data, motivated by epidemiological applications, cannot directly transferred to their application to bioassays. In this paper, previously proposed confidence intervals are thus investigated with special focus on small to very small number of clusters (i.e. replications) and the possibility that all observations in one group may show the same event. Due to the need of estimation as well as one-sided non-inferiority tests, two-sided as well as one-sided coverage probabilities are investigated for a wide range of parameter combinations. Further, a new method based on a straightforward combination of profile deviance intervals, e.g,5,6 and the method of variance recovery7 is proposed and is shown to clearly outperform existing approaches under the assumption of a common level of overdispersion.
Parameter and hypotheses of interest
Assume an experimental setup, where for each treatment i, there are Ji replicated experimental units (tanks, cages, animals, petri dishes, etc.) with index j = 1, …, Ji. In each experimental unit there is a number of nij biological units under observation, and the number of events of interest, xij, is counted in unit ij. These counted events may be death (or survival) of animals, mobility (or immobility) of daphnids, presence or absence of micronuclei, etc. Consider the comparison of one dose group, i = 1, to the untreated control, i = 0. Denote the unknown probability of events in the two groups by i, and parameter of interest is the risk ratio
. Beside estimating ρ and displaying the uncertainty of this estimate in terms of a 95% confidence interval, decisions concerning one-sided hypotheses on tests on non-inferiority or superiority may be of interest. The particular choice of non-inferiority margins, ρ0, may be fixed by convention, compare1 suggesting ρ0 = 0.75 or ρ0 = 0.8 for certain applications. In other situations, it might be a matter of debate. Although general focus is in valid two-sided confidence intervals (i.e., with coverage probability close to the nominal level), it will be further investigated whether confidence intervals do also provide valid upper and lower confidence limits and can thus be used to perform (approximate) level α test for one-sided hypotheses.
Overdispersed binomial data
There are two well-known approaches to model overdispersion in binomial data.8 The quasibinomial approach models overdispersion by assuming the variance-mean-dependency
,
where ϕ is the overdispersion parameter that inflates the binomial variance term by a common fold, independent of the sample size nij. In this parameterization, the binomial assumption,
, is met for ϕ = 1.
The beta-binomial distribution derives from a beta mixture of binomial distributions, i.e.,
, and
(1)
where
,
[9]. Denote the sum of the two parameters of the beta-parameters by
. The variance of beta-binomial counts,
is a function of nij, πij,
[5,9]. When
approaches
, the variance of the beta-binomial counts approaches that of binomial counts. Here, the overdispersion relative to the binomial variance,
, is denoted by
and is a function of nij and
. This allows to choose
given nij such that the over dispersion
is a constant factor, namely
.
Main interest here is in the performance of confidence interval methods in highly controlled laboratory settings. Under such conditions it can be assumed that nij is equal for all experimental units, ij. Under such conditions, the quasibinomial assumption on the variance mean dependency coincides with the variance-mean-dependency under the beta-binomial distribution.5,9 Hence, methods are considered that are explicitly constructed for the beta-binomial distribution, as well as methods that account for overdispersion under the quasibinomial assumption.
Confidence interval methods
Fieller-type intervals applied on the observed proportions (FN):
Naively, the t-test for ratios, with a common variance estimator and the assumption of normal distributed residuals, may be used to test the above hypotheses, treating the observed proportions as the variable of interest,
. The corresponding Fieller-type confidence interval can be obtained by analytically inverting the t-test statistic for ratios.10 The method assumes normal distribution and variance homogeneity for the observed proportions, which is clearly not the case in this application. This interval is referred to as FN.
Binomial MOVER-R-Wilson method after summing-up observed counts across replications (BM): It may be tempting to sum up the counts over experimental units within each treatment group,
and apply a confidence interval for risk ratios under the assumption of binomial distribution, i.e., ignoring possible extra-binomial variation. As a place holder for the many available options, here the MOVER-R method proposed by2 is used; this is computationally simple and was among the best methods in a recent comparative study under the binomial assumption by.11 It is referred to as BM.
The two above methods are merely included here to illustrate the effects of either ignoring the mean-variance-relation and skewness implied by binomial distribution (FN) or the effect of applying binomial methods (BM) when data are indeed overdispersed binomial.
Asymptotic method on the log-scale, accounting for overdispersion via residual variation (LOD): Among other methods,3 investigate and recommend a method based on the delta method applied for the log risk ratios (called MR1 therein). It can be computed from:
,
(2)
Where the variance is estimated from the residuals on the scale of the original observations. The interval is then given by:
(3)
By plugging-in the observed residual variance per treatment group i, this method does not assume a particular mean-variance relation and accounts for overdispersion in a more general way. However, if the number of replications per treatment, Ji, is small, these variance estimates might be unstable. Zaihra and Paul3 additionally consider a closely related method with a sandwich-type variance estimator, which performs worse in their simulation study, and is thus ignored here.
Fieller-Bailey-type interval accounting for overdispersion via residual variance (FOD): Using the estimators above,3 follow the approach of10 and12 that accounts for the skewed distribution of the original Fieller statistic
By considering the solutions of a cubic equation, with
The interval (referred to as MR4 by3) can then be computed by
If A > 0 and B2 - AC > 0. If these two restrictions are not met, the interval has unbounded or disjoint solutions which do not provide meaningful interpretations of ρ. In the simulation study below, the interval
is returned in such cases [3]. Again consider a closely related method that uses sandwich estimator for the variance of
instead. It is not considered here.
Delta method on the log-scale under the beta-binomial assumption (LBB): Lui et al.2 propose methods that are constructed under the assumption of the beta-binomial distribution. Theoretically, the variance of
under this assumption is
where
is the beta-binomial overdispersion factor, expressed as a function of the number under risk in each replication ij of treatment i,
and of ci, the intraclass correlation coefficient. Under beta-binomial sampling, the intraclass correlation depends on
via .2
Lui et al.2 estimate the intraclass correlation using
Based on the between and within mean squared error of the observations,
and
This leads to estimators for the beta-binomial overdispersion factor, and the related variance of the proportion estimator under the beta-binomial assumption, for each treatment group i separately:2
(5)
(6)
The asymptotic interval relying on the delta method applied for the log risk ratio [2] can be constructed by:
(7)
Fieller-Bailey-type interval under beta-binomial assumption: FBB: Lui et al.2 consider a Fieller-type interval and its modification according to12 under the beta-binomial distribution:
and
As in Eq. (4), a meaningful interval can be calculated if A > 0 and B2 - AC > 0:
(8)
In the simulation study below, the interval
is returned if A > 0 and B2-AC > 0. Following [2], in case of the extreme events
or
, is replaced by
in the LBB and FBB method and their subsequent modifications.
Modifications of LOD, FOD, LBB, FBB by pooling and restricting the variance estimates: Lui et al2 state, based on theoretical considerations and in the context of estimation problems that the intraclass correlation ci and the overdispersion
cannot fall below 0 and 1, respectively. However, their estimates may fall below the boundaries imposed by the binomial assumption. For example, the event
may lead to the unreasonable estimates
and
. Moreover, interest here is in small sample laboratory experiments and estimating ci and
separately for each treatment from very few replications Ji may result in over fitting. Then, the assumption of a common beta-binomial overdispersion parameter
may lead to a more stable estimation with small sample sizes. Therefore, the methods LBB and FBB as well as LOD an FOD are simulated with the following additional restrictions and pooling of variance estimates:
- LBB1, FBB1 refer to methods LBB, FBB with the beta-binomial overdispersion factor restricted to be at least 1, i.e., using
instead of
in the equations (5,6)ff.
- LBBp, FBBp refer to methods LBB, FBB with the intraclass correlation estimator using pooled observations across the groups,
and
, instead of
in equations (5, 6)ff.
- LBB1p, FBB1p combine the two approaches by using the group wise variance estimators with the pooled intraclass correlation estimator and restriction of the over dispersion parameter to be
, that is, using
instead of
in equations (5,6)ff. This procedure has been already suggested in the example evaluation of.2
- LOD1, FOD1: refer to methods LOD, FOD, but the group wise variance estimators are restricted to be greater than or equal to the binomial variance estimate: using
in Eq.(2)
MOVER-R for quasibinomial profile deviance intervals (QBM): An alternative option to obtain intervals for the risk ratio would be to fit a generalized linear model (GLM) under the quasibinomial assumption
, using a log-link,
. Then the risk ratio can be estimated via
and intervals for the difference
can be computed by the signed root profile deviance method.6,5 However, with the current user-level implementations in R, fitting this model (glm, stats) and obtaining profile-deviance intervals (profile, confint, package MASS) suffers from numerical difficulties if at least one of the groups shows estimated success probabilities close or equal to 1 or equal to 0.
As a numerically stable work-around, the QBM method is proposed: A GLM with the quasibinomial assumption and logit link is fitted
. If the estimated dispersion parameter in the model fit falls below
, a binomial model is used instead (i.e., assuming
). For the βi, (1-α)--signed root profile deviance intervals can be computed, with limits denoted
and estimates denoted
. In R, these computations can be done in several packages, e.g. package MASS,5 or the add-on package mcprofile.13 Again, in extreme cases, the automatic search of values for the grid of parameter values for the deviance profile may fail in both packages. The signed root deviance is then computed over a pre-specified grid of parameter values,
with elements
, k = 1,…, K. For each i and each k,
is computed, where
retains the sign of the difference
,
is the deviance when replacing
by
while leaving all other parameters at their ML estimates,
,
is the deviance at the ML estimates, and
is the dispersion estimate with all parameters at their ML estimates. For each parameter i, a cubic spline is fitted for tik depending on
and the cut points of the spline with quantiles of the t-distribution,
is determined by linear interpolation between fitted values. When the binomial model is used, tik is replaced by
And the quantiles of the standard normal distribution
, are used instead.
The interval bounds and ML estimates are transformed to the proportion scale using the inverse link,
. These estimators and confidence limits are then used to compute intervals for ρ by the MOVER-R method.7 Eq. (9) of7 is recalled in the following as:
Like the FOD and the FBB method, this method may yield (partially) unbounded intervals, particularly when
, the upper limit for the risk ratio is naturally
.
Simulation study
The beta-binomial distribution is chosen to simulate overdispersed data such that the resulting data are in line with the quasibinomial assumption, i.e., the assumption of the QBM method is met.
In the simulation, the number of experimental units per treatment group, Ji is chosen balanced, (J0, J1) = (3, 3), (5, 5) and (10, 10). The number of biological units under risk in each unit, nij , is chosen balanced nij = 10 or 20 for all i, j. Overdispersion is set at levels
or
, that is, for given nij ,
is chosen according to Section 4.2 to achieve the specified overdispersion: for each set of
,
and
in the beta distribution. The distribution of the counts xij, as well as that of the estimators of the proportions are skewed to different extent, depending on πi, especially if πi is close to the border of the parameter space. Thus, also the distribution of the estimator of ρ or log (ρ) can be skewed if π1 and π0 differ, particularly when they are close to the parameter space. Thus, the performance of large sample methods may severely depend on particular choices of π0, π1. To investigate these potential dependencies, the simulations have been run for a grid of all combinations of π0 = (0.02, 0.04, …, 0.96, 0.98) and π1 = (0.02, 0.04, …, 0.96, 0.98) that imply odds-ratio between 0.1 and 10.
Simulations have been performed with 10000 runs for all methods except the QBM method. Due to high computation times, only 5000 simulation runs are used for QBM, such that the standard error of the estimated coverage probabilities for QBM is by factor higher than for the remaining methods. The simulation study has been performed in R 3.1.2,14 the implementation of all confidence interval methods is available in the R-package pairwiseCI, version > 0.1-2515 FBB, LBB and related methods are implemented in the function Betabin.ratio, FOD, LOD and related methods are implemented in the function ODbin.ratio, and the QBM method is implemented in function Quasibin.ratio.
Figure 1 Boxplots of simulated coverage probabilities of nominal two-sided 95% intervals, for different numbers of replications (no.rep) and two different levels of overdispersion.
Figure 2 Simulated coverage probabilities (color scale) of nominal two-sided 95% intervals over a grid of true proportions π0, π1, for two-fold overdispersion (ϕ = 2) and nij = 10 biological units in each experimental unit.
Figure 3 Simulated coverage probabilities (color scale) of nominal two-sided 95% intervals over a grid of true proportions π0, π1, for moderate overdispersion (ϕ = 1.25) and nij = 20 biological units in each experimental unit.
Figure 4 Asymmetry of non-coverage: color scale shows the proportion of cases where the true parameter is excluded by the lower limit all cases where the parameter is excluded by nominal two-sided 95% intervals over a grid of true prop proportions π0, π1, for clear overdispersion (ϕ = 2) and nij = 10 biological units in each experimental unit.
Figure 5 Boxplots and observed proportions of surviving fathead minnow larvae per tank, for the untreated control group and 5 concentrations (left), and 95% confidence intervals of the proportions of surviving larvae in the treatment groups relative to the control group (right)
Coverage probability of two-sided 95% confidence intervals
Figure 1 shows the simulated coverage probabilities of all 13 methods under comparison: the NF method can either show too low or too high coverage probability, irrespective of increasing sample size, because the relation of mean and variance in the binomial data is ignored. However, its average coverage probability is closer to the nominal level than some of those methods which explicitly account for over dispersed binomial data (LBB, FBB, LBBp, FBBp, LOD and FOD). Simply ignoring the possibility of overdispersion and assuming the binomial distribution results in too low coverage probability even for moderate (1.25-fold) overdispersion: BM method has too low coverage probabilities in nearly all settings. For very small numbers of replications (Ji=3, 5), the far too low coverage probability of the asymptotic methods for overdispersed binomial data (LBB, FBB, LOD, FBB) can be improved slightly by using a pooled variance estimator (LBBp, FBBp) and can be largely improved by setting a lower limit to their variance estimators: If we replace variance estimates suggesting under dispersion by the corresponding estimates under the binomial assumptions, the coverage probabilities of these methods are much closer to the nominal level. The QBM method is always very close to nominal coverage probability but can have slightly too high average coverage probability when overdispersion is moderate and the number of replications is small.
Figures 2 and Figure 3 shows a detailed view on the coverage probabilities in dependence on the true underlying proportions, π0 and π1. This detailed view is restricted to those 6 methods which have an average coverage probability close to 0.95. Figure 2 shows the more difficult case with substantial overdispersion (ϕ = 2), and only nij = 10 biological units in each replication. The QBM methods is close to the nominal confidence levels for a wide range of proportions, but can have too low coverage probabilities if at least one of the proportions is close to 0 or 1. The LBB1p and LOD1 methods need more replications to have coverage probabilities close to 0.95 for a similar range of πi, and still are slightly liberal for almost all πi. LBB1p and LOD1 booth have too high coverage probability for πi close to 0 and too low coverage probabilities when πi close to 1. That is test decisions based on these two methods may be conservative if hypotheses are formulated in terms of mortalities which should be low in the control group, but will be liberal when a similar hypothesis is formulated in terms of the proportion of survivors. In this simulation setting, the Fieller-Bailey-type intervals FBB1p and FOD1 have lower coverage probabilities for almost all parameter combinations considered as compared to the LBB1p and LOD1 method, respectively.
Figure 3 shows results for the less problematic case of moderate over dispersion (ϕ = 1.25) and nij = 20 biological units in each experimental unit. The coverage probability of QBM rarely falls below 0.94, but is slightly too large (between 0.96 and 0.97) if there are only 3 or 5 experimental units per treatment. The LBB1p and FOD1 method have again slightly too low coverage probability if any πi is close to 1, and slightly too high coverage probability if any πi is close to 0. The two Fieller-Bailey-type intervals have slightly lower coverage probabilities than their counterparts based on the log-delta-method.
Symmetry of non-coverage
Figure 4 shows the simulated proportion of cases, where the true parameter was excluded by the lower bound, relative to all cases where the parameter was excluded by the interval. For valid one-sided decisions, methods are preferable that exclude the true parameter equally likely by the lower and upper bound, i.e., with probability α/2 for each limit. For brevity, only the challenging setting with nij = 10 and marked over dispersion (ϕ = 2) is shown, while conclusions for the remaining simulation settings are similar: The Fieller-Bailey-type methods (FBB1p and FOD1) show a wider range of parameter settings where probability of parameter exclusion is equal between lower and upper bounds, as compared to the corresponding methods based on the delta methods on the log scale, LBB1p and LOD1. The QBM method shows asymmetric non-coverage for similar parameter settings as do the Fieller-Bailey-type intervals, i.e. when any of the true proportions is close to 0, but is clearly more symmetric than the FBB1p and FOD1 for a wide range of parameter settings where at least one πi is close to 1.
Examples
Extreme cases: Table 1 shows four extreme cases: Cases 1 and 2 represent cases where proportions are close to 0 in the control treatment as could result from testing the ratio of mortality or immobility proportions. Cases 3 and 4 show data that could arise from test systems that assume proportions close to 1 in the control, for example, survival proportions as in the fathead minnow data below.
|
Case 1 |
Case 2 |
Case 3 |
Case 4 |
Method |
x0j = (0,0,0,2)
x1j = (1,2,5,6)
=0.35/0.05=7 |
x0j = (0,0,1,1)
x1j = (1,2,2,4)
=0.225/0.05=4.5 |
x0j = (8,10,10,10)
x1j = (6,7,9,9)
=0.775/0.95=0.816 |
x0j=(10,10,10,10)
x1j = (9,9,9,10)
=0.925/1=0.925 |
BM |
(1.909, 26.2) |
(1.15, 17.5) |
(0.656, 0.975) |
(0.801, 1.028) |
FBB |
(1.331, 151.9) |
(1.45, 19.8) |
(0.657, 1.005) |
(0.879, 0.997) |
FOD |
(1.269, 182.6) |
(1.41, 21.0) |
(0.654, 1.009) |
(0.885, 0.990) |
FBB1p |
(1.290, 190.0) |
(1.24, 28.9) |
(0.643, 1.022) |
(0.851, 1.028) |
FOD1 |
(1.269, 182.6) |
(1.14, 36.8) |
(0.654, 1.009) |
(0.846, 1.033) |
QBM |
(0.902, 574.1) |
(1.24, 27.6) |
(0.558, 1.083) |
(0.817, 0.998) |
Table 1 95% confidence intervals of selected methods for four extreme cases, assuming all nij = 10, and Ji = 4for two treatment groups, i = 0, 1
In cases 1 and 3, data lead to variance estimates exceeding that of the binomial distribution. Then, the BM method leads to shorter intervals than all other methods. In both cases, the QBM as at least slightly wider confidence intervals than the FBB1p and FOD1 method, which might correspond to the observation that these two have too low coverage probability for small samples. In cases 2 and 4, data show a variance below that of the binomial variance. Then the methods without restriction of variance estimates to that assumed by the binomial distribution (FOD and FBB) yield considerably shorter intervals, than methods which assume that under dispersion is implausible like BM, FBB1p, FOD1 and QBM.
Fathead minnow data: The toxicity of a compound to fathead minnow larvae was investigated using an untreated control group and 5 concentrations of a compound.16 The experiment comprised 24 tanks, 4 tanks in each treatment group, each tank contained 10 larvae. The observed proportions of surviving larvae are shown in (Figure 5, left side). Analyzing the data in a generalized linear model with quasibinomial assumption, logit link shows that there are highly significant differences between the mean proportion of surviving larvae between the treatments (p < 0.0001; F-test in analysis of deviance). An estimated dispersion parameter of 1.082 suggests that the observations are at most slightly overdispersed, i.e., the data are at least roughly in line with the binomial assumption. The right side of Figure 5 shows two-sided 95% confidence intervals for ratios of the proportion of surviving larvae in the treatment groups relative to that in concentration 0. In this case, confidence limits based on the quasibinomial assumption (QBM) based on the full data including all 6 treatment groups and confidence intervals under the binomial assumption (MOVER-R method for Wilson-Score intervals, BM) do hardly differ.