Sample size determination continues to be an important research area of statistics. Perhaps nowhere is this truer than in pharmaceutical statistics, where cost and time constraints have made finding the appropriate sample size before conducting a study of the utmost importance. The problem is quite simple: too small a sample can lead to under-powered studies, while too large a sample wastes precious resources. In this article we consider the sample size determination problem as it pertains to the two-sample testing of Poisson rates from a Bayesian perspective, subject to constraints on the operating characteristics.

There are several advantages to the Bayesian perspective when trying to determine a study's requisite sample size, a topic that is expounded in Adcock [1]. Chief among them is that Bayesian sample sizes do not depend on asymptotic approximations or bounds. Classical solutions to the sample size determination problem typically hinge on asymptotic arguments that require the researcher to specify one parameter value (perhaps vector valued) as representative of the entire parameter space, a process that is typically done using bounding arguments. This, for example, is what is done when determining the requisite sample size for a confidence interval of a fixed level and given length. The resulting sample sizes are consequently conservative. On the other hand, the Bayesian approach provides the statistician with the ability to model uncertainty about the parameter through expert knowledge or previous studies. As noted by Bayarri and Berger [2], this can allow the Bayesian approach to have better operating characteristics, such as smaller required sample sizes or better Type I and II error rates.

Various Bayesian sample size determination methods have been studied for binomial and Poisson data. Stamey et al. [3] considered one- and two-sample Poisson rates from the perspective of interval-based criteria such as coverage and width. Hand et al. [4] extend those ideas by considering both interval-based and test-based criteria, albeit without considering power. Katsis and Toman [5] used more decision-theoretic criteria for the two-sample binomial case, but only to the extent of controlling the posterior risk with a prespecified bound. Zhao et al. [6] extend those ideas by using computational methods to consider the expected Bayesian power of the test. In this article, we extend these results to the Poisson data model. We also consider the problem subject to Type I and Type II error constraints. This is thus an important extension of [6], because it

- extends the ideas to the Poisson case, and
- enables the incorporation of operating characteristics.

A subtle difference between the classical and Bayesian methods of sample size determination merits discussion before proceeding. One of the novel contributions of this article is an algorithmic solution to the sample size determination problem subject to operating characteristics constraints for Poisson data. However, since the entire problem is treated in a Bayesian context, the concept of Type I and Type II error rates is understood in an average, or expected, sense; see [7]. For example, the "power of a test" retains the interpretation of the probability that the decision rule rejects the null hypothesis when it is false; but rather than being a function defined over the alternative space, here it is averaged over that space and weighted by the prior distribution specified on the alternative hypothesis. To make the distinction clearer, we refer to this as the expected Bayesian power (EBP), as is done in [8]; alternatively, it may be referred to as the probability of a successful test. These ideas, though apparently not considered in the literature previously, can also apply to the concept of the significance level of a test. While frequentist methods typically report one value for the significance level, what they are really doing (for non-point null hypotheses) is taking the largest possible significance level; thus, taking an expectation of a significance level curve could be done as well. Consequently, in this article we also consider the expected Bayesian significance level (EBSL), defined as the expected rejection probability of the test under the prior distribution given on the null space. In both cases, any particular instance of the actual Type I and Type II error rates can be greater than or less than nominal.

This article proceeds as follows. In Section 2 we introduce the theoretical formulation of the sample size determination problem for two Poisson variates, including consideration of operating characteristics. In Section 3, we present an algorithmic solution to the sample size determination problem posed in Section 2. Section 4 contains an application of the method in the area of pharmaceutical statistics. We then conclude with a discussion.

**Problem Specification and the Bayes Rule**

We now follow the general framework of [6] in the development of this problem, adapting the binomial case to fit the Poisson data model. Suppose $Y_1 \sim \text{Pois}(t\lambda_1)$ and $Y_2 \sim \text{Pois}(t\lambda_2)$, independently, where $\lambda_1$ and $\lambda_2$ represent the rate parameters of interest and *t* represents a common sample (or “opportunity”) size. Together, we write these $Y=(Y_1,Y_2)^{\prime}$ with observations $y=(y_1,y_2)^{\prime}$. The sample size determination problem is to calculate the sample size required to test the hypotheses

$$H_0:\lambda_1=\lambda_2 \quad \text{vs} \quad H_1:\lambda_1\ne\lambda_2$$ (1)

using a given decision rule; here we use the Bayes rule. Denoting the parameter pair $\lambda =({\lambda}_{1},{\lambda}_{2}{)}^{\prime}$
, the associated null and alternative spaces are therefore ${\Lambda}_{0}=\{\lambda \in {R}_{+}^{2}:{\lambda}_{1}={\lambda}_{2}\},$
which we identify with ${R}_{+}$
with elements generically denoted $\lambda $
, and ${\Lambda}_{1}=\{\lambda \in {R}_{+}^{2}:{\lambda}_{1}\ne {\lambda}_{2}\}$
.

As this problem is being considered from the Bayesian perspective, we place prior probabilities of $\pi_0$ and $\pi_1=1-\pi_0$ on $H_0$ and $H_1$, respectively. Conditional on the null being true, we represent the expert opinion regarding $\lambda$ as $p_0(\lambda)$, defined over the set $\Lambda_0$. Alternatively, conditional on $H_1$ being true, we represent the belief concerning $\lambda_1\ne\lambda_2$ with a joint prior $p_1(\lambda_1,\lambda_2)$ with support $\Lambda_1$. Marginalizing over the hypotheses, we have the unconditional prior

$$p(\lambda_1,\lambda_2)=p_0(\lambda)\pi_0 I_{H_0}+p_1(\lambda_1,\lambda_2)\pi_1 I_{H_1},$$

where $I_{H_0}$ and $I_{H_1}$ are the indicator functions of $H_0$ and $H_1$, respectively.

In practice, the prior distributions on $\lambda$, $\lambda_1$, and $\lambda_2$ summarize expert opinion concerning the parameters in each of the two scenarios $H_0$ and $H_1$. This information can be obtained in a number of ways, including past data, prior elicitation of expert opinion (see especially [9]), or noninformative criteria. For simplicity, we consider conjugate priors for all three rate parameters, so that under $H_0$, $\lambda\sim\text{Gamma}(\alpha,\beta)$, and under $H_1$, $\lambda_i\sim\text{Gamma}(\alpha_i,\beta_i)$ independently. This assumption is not very restrictive and allows us to specify the parameters of the prior distributions instead of the distributions themselves.

We now derive the optimal Bayes (decision) rule in deciding between the hypotheses presented in (1). In solving the related problem between two binomial proportions, [6] use the classical decision theoretic setup using the 0-1 loss function [10]. Here we use the more general unequal loss function

$$L(H,a)=\begin{cases}c_1, & \text{if } H_0 \text{ is true and } a=1,\\ c_2, & \text{if } H_1 \text{ is true and } a=0,\\ 0, & \text{otherwise,}\end{cases}$$

where $a=\delta(y)$ is the decision rule, with *a* = 0 representing selection of $H_0$ and *a* = 1 selection of $H_1$. Thus, $c_1$ represents the loss associated with a Type I error, and $c_2$ that of a Type II error.

The Bayes action is simply the one that minimizes posterior expected loss [11]. The posterior expected loss associated with an action $a\in\{0,1\}$ is

$$\rho(a)=E_{\lambda|y}[L(H,a)]=\begin{cases}c_1P(H_0|Y=y), & \text{if } a=1,\\ c_2P(H_1|Y=y), & \text{if } a=0.\end{cases}$$

Setting $c=c_1/c_2$, we can express the optimal decision rule, $a^*$, as

$$a^*(y)=\begin{cases}0, & \text{if } P(H_1|Y=y)<cP(H_0|Y=y),\\ 1, & \text{if } P(H_1|Y=y)\ge cP(H_0|Y=y).\end{cases}$$ (2)

The rejection region, $W$, is therefore

$$W=\{y:P(H_1|Y=y)\ge cP(H_0|Y=y)\}.$$ (3)

The optimal rule in (2) can be nicely represented in terms of the Bayes factor. The Bayes factor is defined as the ratio of the posterior odds of $H_1$ to the prior odds of $H_1$, so that a large Bayes factor is evidence for rejecting the null hypothesis [12]. Specifically, the Bayes factor is defined as

$B=\frac{P({H}_{1}|Y=y)/P({H}_{0}|Y=y)}{{\pi}_{1}/{\pi}_{0}}=\frac{P({H}_{1}|Y=y){\pi}_{0}}{P({H}_{0}|Y=y){\pi}_{1}}.$
This ratio is useful in Bayesian inference because it is often interpreted as partially eliminating the influence of the prior on the posterior, instead emphasizing the role of the data. Moreover, the decision rule is a function of a Bayes factor:

$$\begin{aligned}W&=\left\{y:P(H_1|Y=y)\ge cP(H_0|Y=y)\right\}\\ &=\left\{y:P(H_1|Y=y)\frac{\pi_0}{\pi_1}\ge cP(H_0|Y=y)\frac{\pi_0}{\pi_1}\right\}\\ &=\left\{y:B\ge c\frac{\pi_0}{\pi_1}\right\}.\end{aligned}$$ (4)

This is particularly useful because it allows for the interpretation of the Bayes factor *B* as the test statistic for the decision rule in (2); this is the condition in (4).
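As a small illustration of this equivalence, the following Python sketch applies the rule in both forms: directly through the posterior probabilities, and through the Bayes factor threshold $c\pi_0/\pi_1$. The function names and the numerical values are hypothetical and are not taken from the article.

```python
def action_from_posterior(post_h1, c):
    """Bayes action: 1 (select H1) if P(H1|y) >= c * P(H0|y), else 0."""
    post_h0 = 1.0 - post_h1
    return 1 if post_h1 >= c * post_h0 else 0


def action_from_bayes_factor(post_h1, c, pi0):
    """Same rule written as B >= c * pi0 / pi1, with B = posterior odds / prior odds."""
    pi1 = 1.0 - pi0
    bayes_factor = (post_h1 / (1.0 - post_h1)) * (pi0 / pi1)
    return 1 if bayes_factor >= c * pi0 / pi1 else 0


if __name__ == "__main__":
    # Illustrative values only: loss ratio c = 3 and prior probability pi0 = 0.7.
    for post_h1 in (0.10, 0.50, 0.74, 0.76, 0.90):
        print(post_h1,
              action_from_posterior(post_h1, 3.0),
              action_from_bayes_factor(post_h1, 3.0, 0.7))
```

With $c=3$ the rule selects $H_1$ once $P(H_1|Y=y)$ reaches 0.75, whichever form is used; the prior odds cancel from both sides of the Bayes factor inequality.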

We now derive closed-form expressions for the posterior probabilities of the null and alternative hypotheses. Using Bayes’ theorem, we have

$$P(H_0|Y=y)=\frac{P(Y=y|H_0)\pi_0}{P(Y=y|H_0)\pi_0+P(Y=y|H_1)\pi_1}$$ (5)

and

$$P(H_1|Y=y)=\frac{P(Y=y|H_1)\pi_1}{P(Y=y|H_0)\pi_0+P(Y=y|H_1)\pi_1}.$$ (6)

Consequently, the posterior probabilities have closed-form expressions if the likelihoods do. Computing these, we have

$$\begin{aligned}P(Y=y|H_0)&=\int_0^\infty f(y|\lambda,H_0)\,p_0(\lambda|H_0)\,d\lambda\\ &=\int_0^\infty\frac{(\lambda t)^{y_1}e^{-\lambda t}}{y_1!}\frac{(\lambda t)^{y_2}e^{-\lambda t}}{y_2!}\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}\,d\lambda\\ &=\frac{t^{y_1+y_2}\beta^{\alpha}\Gamma(y_1+y_2+\alpha)}{y_1!\,y_2!\,\Gamma(\alpha)(2t+\beta)^{y_1+y_2+\alpha}},\end{aligned}$$ (7)

and

$$\begin{aligned}P(Y=y|H_1)&=\int_0^\infty f(y|\lambda,H_1)\,p_1(\lambda|H_1)\,d\lambda\\ &=\prod_{i=1}^{2}\int_0^\infty\frac{(\lambda_i t)^{y_i}e^{-\lambda_i t}}{y_i!}\frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)}\lambda_i^{\alpha_i-1}e^{-\beta_i\lambda_i}\,d\lambda_i\\ &=\prod_{i=1}^{2}\frac{t^{y_i}\beta_i^{\alpha_i}\Gamma(y_i+\alpha_i)}{y_i!\,\Gamma(\alpha_i)(t+\beta_i)^{y_i+\alpha_i}}.\end{aligned}$$ (8)
Note that the probability of the data under the alternative hypothesis is the product of two independent negative binomial likelihoods.
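The two closed forms can be checked numerically. The Python sketch below evaluates them on the log scale (via `math.lgamma`, to avoid overflow in the gamma functions) and confirms that each factor of $P(Y=y|H_1)$ coincides with a negative binomial pmf with size $\alpha_i$ and probability $\beta_i/(t+\beta_i)$. The function names and the test values $t=2$, $\alpha=\beta=4$ are illustrative assumptions, and the gamma priors are taken to be parameterized by shape and rate.

```python
from math import exp, lgamma, log


def log_marginal_h0(y1, y2, t, alpha, beta):
    """log P(Y = y | H0) for a common rate lambda ~ Gamma(alpha, beta), beta a rate."""
    s = y1 + y2
    return (s * log(t) + alpha * log(beta) + lgamma(s + alpha)
            - lgamma(y1 + 1) - lgamma(y2 + 1) - lgamma(alpha)
            - (s + alpha) * log(2 * t + beta))


def log_marginal_h1_term(y, t, alpha_i, beta_i):
    """log of one factor of P(Y = y | H1), for lambda_i ~ Gamma(alpha_i, beta_i)."""
    return (y * log(t) + alpha_i * log(beta_i) + lgamma(y + alpha_i)
            - lgamma(y + 1) - lgamma(alpha_i)
            - (y + alpha_i) * log(t + beta_i))


def log_negbin_pmf(y, r, p):
    """log pmf of NegBin(r, p): Gamma(y + r) / (y! Gamma(r)) * p^r * (1 - p)^y."""
    return (lgamma(y + r) - lgamma(y + 1) - lgamma(r)
            + r * log(p) + y * log(1.0 - p))


if __name__ == "__main__":
    t, a, b = 2.0, 4.0, 4.0  # illustrative values only
    # One H1 factor against the negative binomial pmf.
    print(exp(log_marginal_h1_term(3, t, a, b)),
          exp(log_negbin_pmf(3, a, b / (t + b))))
    # The H0 marginal should sum to (nearly) one over a generous grid.
    print(sum(exp(log_marginal_h0(y1, y2, t, a, b))
              for y1 in range(150) for y2 in range(150)))
```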

Combining (3) with (5) and (6) allows us to represent *W* in terms of the null and alternative likelihoods as follows:

$$\begin{aligned}W&=\left\{y:P(H_1|Y=y)\ge cP(H_0|Y=y)\right\}\\ &=\left\{y:P(Y=y|H_1)\pi_1\ge cP(Y=y|H_0)\pi_0\right\}.\end{aligned}$$

Consequently, (7) and (8) give explicit conditions for rejection under the optimal decision rule:

$$W=\left\{y:B=\frac{\Gamma(\alpha)(2t+\beta)^{y_1+y_2+\alpha}}{\beta^{\alpha}\Gamma(y_1+y_2+\alpha)}\prod_{i=1}^{2}\frac{\beta_i^{\alpha_i}\Gamma(y_i+\alpha_i)}{\Gamma(\alpha_i)(t+\beta_i)^{y_i+\alpha_i}}\ge c\frac{\pi_0}{\pi_1}\right\}.$$ (9)

Note that the left side of (9) is our test statistic and Bayes factor, B, so that (9) is an explicit formulation of the condition presented in (4).
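A minimal Python sketch of this test statistic follows. It evaluates $\log B$ rather than $B$, since the gamma functions and powers in (9) overflow quickly for moderate counts. The function names, the tuple layout of the prior parameters, and the values used in the usage lines are assumptions made for illustration.

```python
from math import exp, lgamma, log


def log_bayes_factor(y1, y2, t, alpha, beta, alpha1, beta1, alpha2, beta2):
    """log B from (9): marginal likelihood under H1 divided by that under H0."""
    s = y1 + y2
    log_b = (lgamma(alpha) + (s + alpha) * log(2 * t + beta)
             - alpha * log(beta) - lgamma(s + alpha))
    for y, a_i, b_i in ((y1, alpha1, beta1), (y2, alpha2, beta2)):
        log_b += (a_i * log(b_i) + lgamma(y + a_i)
                  - lgamma(a_i) - (y + a_i) * log(t + b_i))
    return log_b


def rejects(y1, y2, t, priors, c=1.0, pi0=0.5):
    """True when y lies in W, i.e. B >= c * pi0 / pi1 (compared on the log scale)."""
    return log_bayes_factor(y1, y2, t, *priors) >= log(c * pi0 / (1.0 - pi0))


if __name__ == "__main__":
    # Hypothetical prior parameters (alpha, beta, alpha1, beta1, alpha2, beta2).
    priors = (4.0, 4.0, 4.0, 4.0, 8.0, 4.0)
    print(exp(log_bayes_factor(0, 0, 1.0, *priors)))  # B for y = (0, 0), t = 1
    print(rejects(0, 50, 10.0, priors))               # very unequal counts
```

For $y=(0,0)$ every gamma function cancels and $B$ reduces to $\left(\frac{2t+\beta}{\beta}\right)^{\alpha}\prod_i\left(\frac{\beta_i}{t+\beta_i}\right)^{\alpha_i}$, which gives a convenient hand check of the implementation.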

**Sample Size Determination for the Bayes Rule**

The explicit description of the decision rule in (9) allows us to compute a number of quantities of interest. For given prior parameters $\pi_0$, $\pi_1$, $\alpha$, $\beta$, $\alpha_1$, $\beta_1$, $\alpha_2$, and $\beta_2$ and loss penalties $c_1$ and $c_2$ (or simply *c*), the Expected Bayesian Power (EBP) $\omega_t$ is defined as

$$\omega_t=P(Y\in W|H_1)=\sum_{y\in W}P(Y=y|H_1)=\sum_{y\in W}\prod_{i=1}^{2}\frac{t^{y_i}\beta_i^{\alpha_i}\Gamma(y_i+\alpha_i)}{y_i!\,\Gamma(\alpha_i)(t+\beta_i)^{y_i+\alpha_i}},$$ (10)

and the Expected Bayesian Significance Level (EBSL) $\alpha_t$ is

$$\alpha_t=P(Y\in W|H_0)=\sum_{y\in W}P(Y=y|H_0)=\sum_{y\in W}\frac{t^{y_1+y_2}\beta^{\alpha}\Gamma(y_1+y_2+\alpha)}{y_1!\,y_2!\,\Gamma(\alpha)(2t+\beta)^{y_1+y_2+\alpha}}.$$ (11)

Note three things. First, the inclusion of the *t* subscripts highlights the fact that these quantities depend on *t*. Second, both $\omega_t$ and $\alpha_t$ marginalize over the corresponding alternative and null spaces $\Lambda_1$ and $\Lambda_0$, respectively; this is the sense in which the power and significance level are expectations. Third, the constant *c* (or $c_1$ and $c_2$) enters the expressions through the rejection region $W_t$, which is itself dependent on *t*.

In their articles, [5, 12] demonstrate that as the sample size tends to infinity, the Bayes factor converges to one of its two extreme values. As a consequence, in the current context as the sample size *t* tends to infinity the Bayes factor $B$ converges to either 0 or $\infty$, so that [12] implies that ${\omega}_{t}\stackrel{a.s.}{\to}1$
and ${\alpha}_{t}\stackrel{a.s.}{\to}0$
. Thus, for any prespecified power $\omega$ and significance level $\alpha$, there exists a $t^*$ such that for all $t^{\prime}\ge t^*$, $\omega_{t^{\prime}}\ge\omega$ and $\alpha_{t^{\prime}}\le\alpha$. We define $t_{\alpha,\omega}^{*}$ to be the infimum of this collection of lower bounds, i.e.

$$t_{\alpha,\omega}^{*}=\inf_{t\in R_{+}}\left\{t:\alpha_t\le\alpha \text{ and } \omega_t\ge\omega\right\}.$$

${t}_{\alpha ,\omega}^{*}$
is said to be the optimal sample size for $\omega $
and $\alpha $
, and computing ${t}^{*}$
is called the sample size determination problem. If only a power is specified, or if only a significance level is specified, the other quantity is left off of the subscript and out of the definition. We often write simply ${t}^{*}$
for ${t}_{\alpha ,\omega}^{*}$
.

Were (10) and (11) monotonic and continuous in *t*, the sample size determination problem would be quite straightforward: simply run a numerical root-finder (e.g. Newton's method) on $\omega_t-\omega$ and $\alpha_t-\alpha$ and take the larger *t*. Unfortunately, however, as functions of *t* both the power and the significance level are discontinuous and non-monotonic. As a consequence, it is possible, for example, for there to be two sample sizes $t_1$ and $t_2$ with $t_1<t_2$ such that $\omega_{t_1}\ge\omega$ and yet $\omega_{t_2}<\omega$. This is a consequence of the dependence of $W_t$ on *t*: as *t* grows, the rejection region changes, and these changes result in discrete jumps in the power and significance level.

Practically speaking, in our experience these jumps decrease monotonically in magnitude and dissipate quite quickly in *t* so that, while there are jumps, they become relatively minor even for quite small *t*. Thus, while out-of-the-box numerical routines are insufficient for the task, a straightforward heuristic algorithm suffices to certify $t^*$ to a reasonable level of accuracy.

Initialize $t=t_0$ with a value small enough to satisfy $\omega_t<\omega$ and $\alpha_t>\alpha$; typically a value like 0.1 will suffice. Then, double *t* until both target conditions ($\omega_t\ge\omega$ and $\alpha_t\le\alpha$) are met. Once both conditions are met, decrement *t* by 1 until the conditions are no longer satisfied; then decrement by 0.1, and so on to achieve the desired precision. Alternatively, one may move in a binary search manner; this method is faster but loses the certificate of the solution up to the highest evaluated point. By contrast, the first heuristic certifies that the sample size achieved is optimal up to the largest *t* observed, which is of the form $2^{k}t_0$.
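The doubling-then-decrement heuristic can be sketched in Python as follows. The function `find_sample_size`, its arguments, and the toy predicate in the usage lines are hypothetical; in practice `meets_targets(t)` would evaluate $\omega_t$ and $\alpha_t$ and compare them with the targets. The sketch interprets "decrement until the conditions are no longer satisfied" as keeping the last value of *t* that still met them before switching to the finer step.

```python
def find_sample_size(meets_targets, t0=0.1, steps=(1.0, 0.1), t_max=1e6):
    """Doubling-then-decrement search for a small t that meets both targets.

    meets_targets(t) should return True when omega_t >= omega and alpha_t <= alpha.
    """
    t = t0
    # Phase 1: double t until both targets are met.
    while not meets_targets(t):
        t *= 2.0
        if t > t_max:
            raise ValueError("targets not met for any t up to t_max")
    # Phase 2: walk back down with successively finer steps,
    # always keeping the last t that still met the targets.
    for step in steps:
        while t - step > 0 and meets_targets(t - step):
            t -= step
    return t


if __name__ == "__main__":
    # Toy stand-in for the real check: targets are met exactly when t >= 5.37.
    print(round(find_sample_size(lambda t: t >= 5.37), 1))
```

Because the true power and significance level are not monotone in *t*, the value returned is only certified relative to the points actually evaluated, which is the caveat raised in the text.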

One computational detail is relevant for implementing this procedure. Since $Y_1$ and $Y_2$ are Poisson variates, their sample spaces are infinitely large, and thus computing $\omega_t$ and $\alpha_t$ exactly is not numerically possible: one cannot check every $y$ to determine whether or not it is included in $W_t$. There is, however, a very reasonable work-around for this problem using prior predictive distributions. Under $H_1$, the prior predictive distributions of $Y_1$ and $Y_2$ are both negative binomial. Specifically,

$Y_1\sim\text{NegBin}\left(\alpha_1,\frac{\beta_1}{t+\beta_1}\right)$
and

$Y_2\sim\text{NegBin}\left(\alpha_2,\frac{\beta_2}{t+\beta_2}\right).$

While one cannot enumerate the entire sample space of $Y$, one can be confident of a satisfactory approximation by computing small (e.g. 0.00001) and large (e.g. 0.99999) quantiles of these distributions and then simply taking every combination of the ranges from low to high. This is the approach we take to computing the sums listed in (10) and (11) above.
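A sketch of this truncated summation in Python is given below. It finds the negative binomial quantiles by accumulating the pmf, builds the grid of $(y_1,y_2)$ pairs, and adds up the marginal likelihoods of the points that fall in $W_t$. All names are hypothetical, the gamma priors are assumed to be parameterized by shape and rate, and the same $H_1$-based grid is reused for the EBSL sum, as the text suggests; a grid built from the prior predictive under $H_0$ could be substituted.

```python
from math import exp, lgamma, log


def log_nb_pmf(y, r, p):
    """log pmf of NegBin(r, p) with real-valued size r."""
    return (lgamma(y + r) - lgamma(y + 1) - lgamma(r)
            + r * log(p) + y * log(1.0 - p))


def nb_quantile(q, r, p):
    """Smallest y whose NegBin(r, p) CDF is at least q."""
    y, cdf = 0, exp(log_nb_pmf(0, r, p))
    while cdf < q:
        y += 1
        cdf += exp(log_nb_pmf(y, r, p))
    return y


def log_marginal_h0(y1, y2, t, a, b):
    """log P(Y = y | H0) from (7)."""
    s = y1 + y2
    return (s * log(t) + a * log(b) + lgamma(s + a) - lgamma(y1 + 1)
            - lgamma(y2 + 1) - lgamma(a) - (s + a) * log(2 * t + b))


def ebp_and_ebsl(t, a, b, a1, b1, a2, b2, c=1.0, pi0=0.5, tail=1e-5):
    """Approximate (omega_t, alpha_t) by summing over a truncated grid."""
    p1, p2 = b1 / (t + b1), b2 / (t + b2)
    range1 = range(nb_quantile(tail, a1, p1), nb_quantile(1.0 - tail, a1, p1) + 1)
    range2 = range(nb_quantile(tail, a2, p2), nb_quantile(1.0 - tail, a2, p2) + 1)
    log_threshold = log(c * pi0 / (1.0 - pi0))
    power = level = 0.0
    for y1 in range1:
        log_p1 = log_nb_pmf(y1, a1, p1)
        for y2 in range2:
            log_m1 = log_p1 + log_nb_pmf(y2, a2, p2)    # log P(Y = y | H1)
            log_m0 = log_marginal_h0(y1, y2, t, a, b)   # log P(Y = y | H0)
            if log_m1 - log_m0 >= log_threshold:        # y is in W_t
                power += exp(log_m1)
                level += exp(log_m0)
    return power, level


if __name__ == "__main__":
    # Illustrative prior parameters and sample size only.
    print(ebp_and_ebsl(5.0, 4.0, 4.0, 4.0, 4.0, 8.0, 4.0))
```

Passing a function such as `lambda t: ebp_and_ebsl(t, ...)` through a comparison with the target power and level gives the predicate needed by a search over *t*.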

**An Example from Cancer Therapeutics**

An example of the proposed methodology is readily available from the field of cancer therapeutics. Suppose that (1) Drug A is an industry-standard therapy for a certain type of cancer that is known to have the common side effect of mild seizures every hour of infusion; (2) Drug B is a novel compound believed to have the same side effect but at a lessened rate; and (3) the goal is to design a clinical trial that compares the two using the minimum resources required to meet a 5% significance level (EBSL) and 80% power (EBP). Moreover, suppose that the losses associated with Type I and Type II errors have a ratio of 1:1.

From past studies, it is known that the uncertainty in the rate of seizures with Drug A (per hour) is well represented by a Gamma(4, 4) distribution, so that $\lambda_A\sim\text{Gamma}(4,4)$. Drug B, by contrast, is believed to be a bit worse, with perhaps a rate that is double that of Drug A, with experts 90% sure the value is less than about 3. This translates into roughly $\lambda_B\sim\text{Gamma}(8,4)$. Figure 1 shows both of these priors graphically.
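The elicited statements can be checked against the priors with a few lines of Python. The sketch reads the Drug B prior as shape 8 and rate 4, the parameterization that appears in the rejection rule and in the simulation check later in this section; that reading, and the helper name `erlang_cdf`, are assumptions. For integer shape the gamma CDF has a closed form, so no external library is needed.

```python
from math import exp


def erlang_cdf(x, shape, rate):
    """CDF of Gamma(shape, rate) for integer shape:
    1 - exp(-rate * x) * sum_{k < shape} (rate * x)^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, shape):
        term *= rate * x / k
        total += term
    return 1.0 - exp(-rate * x) * total


if __name__ == "__main__":
    print(erlang_cdf(3.0, 8, 4.0))  # prior probability that the Drug B rate is below 3
    print(8 / 4.0, 4 / 4.0)         # prior means for Drug B and Drug A
```

Under this reading the prior mean for Drug B is twice that of Drug A, and the prior probability that the Drug B rate is below 3 is roughly 0.91, in line with the elicited "90% sure".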

**Figure 1:** Prior structures used in Poisson sample size determination example.

Assuming that the rates are the same, the past indicators of Drug A supersede the lack of evidence for Drug B, so that under the null hypothesis $\lambda=\lambda_1=\lambda_2\sim\text{Gamma}(4,4)$ is most appropriate. Assuming that the null and alternative hypotheses are given the same belief (50%), the rejection rule from (9) is therefore

$$W=\left\{y:\frac{\Gamma(4)(2t+4)^{y_1+y_2+4}}{4^{4}\Gamma(y_1+y_2+4)}\frac{4^{4}\Gamma(y_1+4)}{\Gamma(4)(t+4)^{y_1+4}}\frac{4^{8}\Gamma(y_2+8)}{\Gamma(8)(t+4)^{y_2+8}}\ge 3\right\}$$ (12)

$$=\left\{y:\frac{(2t+4)^{y_1+y_2+4}}{(t+4)^{y_1+y_2+12}}\ge\frac{15120}{65536}\frac{(y_1+y_2+3)!}{(y_1+3)!\,(y_2+7)!}\right\}$$ (13)

To design a test with 80% power at the 5% significance level, we need only run the algorithm described in Section 3. To illustrate the scenario, we plot the significance level and power for every *t* from 1 to 80; these are included in Figures 2 and 3, respectively. Note how quickly (in *t*) the functions become smooth, nullifying any concern about jumps. The 80% power level is achieved at *t* = 37, with a power of 80.1%, and the 5% significance level is achieved at *t* = 57, where the significance level is 4.9%. Thus, to achieve both, we select the higher sample size, *t* = 57.

We can also verify these results via simulation by generating one million random values of $\lambda_1$ and $\lambda_2$ from the priors given above. To validate the significance level result, we simply (1) sample a rate from the null prior, (2) generate two Poisson observations with mean equal to the sample in (1) times 57, and (3) determine whether the test rejects (1) or not (0). Averaging the million test results, the value is confirmed to within Monte Carlo error (code available upon request). To validate the power result, we (1) sample one number from a Gamma(4, 4) and one number from a Gamma(8, 4), (2) sample two Poisson observations from distributions with mean 37 times the two variates generated in (1), and (3) determine whether the test rejects or not. Averaging the million results validates the theoretical result.
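The authors' simulation code is not reproduced here; the following is an independent Python sketch of the same three-step check, using only the standard library. The function names, the Poisson sampler (which counts unit-rate exponential arrivals), the seed handling, and the default threshold are assumptions; `c` and `pi0` are left as arguments so the cutoff $c\pi_0/\pi_1$ can be set to whatever value the design uses.

```python
import random
from math import lgamma, log


def log_bayes_factor(y1, y2, t, a, b, a1, b1, a2, b2):
    """log B from (9), with gamma priors parameterized by shape and rate."""
    s = y1 + y2
    out = lgamma(a) + (s + a) * log(2 * t + b) - a * log(b) - lgamma(s + a)
    for y, a_i, b_i in ((y1, a1, b1), (y2, a2, b2)):
        out += a_i * log(b_i) + lgamma(y + a_i) - lgamma(a_i) - (y + a_i) * log(t + b_i)
    return out


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals falling in [0, mean]."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival <= mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def rejection_rate(t, n, under_null, priors, c=1.0, pi0=0.5, seed=0):
    """Monte Carlo estimate of P(Y in W | H0) or P(Y in W | H1)."""
    rng = random.Random(seed)
    a, b, a1, b1, a2, b2 = priors
    log_threshold = log(c * pi0 / (1.0 - pi0))
    hits = 0
    for _ in range(n):
        if under_null:
            # random.gammavariate takes a scale, so the rate is inverted.
            lam1 = lam2 = rng.gammavariate(a, 1.0 / b)
        else:
            lam1 = rng.gammavariate(a1, 1.0 / b1)
            lam2 = rng.gammavariate(a2, 1.0 / b2)
        y1, y2 = poisson(t * lam1, rng), poisson(t * lam2, rng)
        hits += log_bayes_factor(y1, y2, t, *priors) >= log_threshold
    return hits / n


if __name__ == "__main__":
    priors = (4.0, 4.0, 4.0, 4.0, 8.0, 4.0)  # null prior, then the two H1 priors
    print(rejection_rate(57.0, 20000, True, priors))   # estimate of the EBSL at t = 57
    print(rejection_rate(37.0, 20000, False, priors))  # estimate of the EBP at t = 37
```

With $n$ draws the Monte Carlo standard error of either estimate is at most $0.5/\sqrt{n}$, which indicates how many draws are needed before a simulated value can be compared meaningfully with the computed one.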