We propose three efficient sequential designs for software reliability estimation: the fully sequential design, the multistage sequential design, and the accelerated sequential design. These designs make allocation decisions dynamically throughout the testing process, refining the estimated reliabilities iteratively as we sample. Monte Carlo simulation seems to indicate that these sequential designs are second order optimal.
Keywords: software reliability, partition testing, fully sequential design, multistage sequential design, accelerated sequential design
Reliability of a system is an important aspect of any system design, since any user of the system expects some guarantee that the system will function to some level of confidence. Failing to meet such a guarantee can result in disastrous consequences. On the other hand, greatly exceeding the guaranteed level may incur additional and unnecessary expense for the developers. Moreover, for any non-trivial software system, exhaustive testing over the entire input domain can be very expensive. By adopting the partition testing strategy, we attempt to break the testable input domain of possible test cases into non-overlapping partitions, such that if test case $i$ belongs to partition $j$, then no partition other than $j$ contains $i$. Sayre and Poore1–11 have given several possible mechanisms to partition the domain into finitely many subdomains $D_1, D_2, \ldots, D_k$ such that:

$$D = \bigcup_{i=1}^{k} D_i; \quad D_i \cap D_j = \emptyset, \; i \neq j$$
which allows us to define the system reliability as a weighted sum of the reliabilities of these subdomains, i.e.

$$R = \sum_{i=1}^{k} p_i R_i$$

where $R$ denotes the system reliability; $R_i$ is the reliability of subdomain $D_i$; and the $p_i$, parameters of the operational profile, give the likelihood that a test case belongs to partition $D_i$, and are assumed to be known.12 As mentioned above, complete testing of any software system of non-trivial size is practically impossible, so the $R_i$ are usually unknown parameters. To gain knowledge about the $R_i$, we must distribute the test cases among these $k$ partitions and generate reasonable estimates for each. Specifically, we denote by $n_1, n_2, \ldots, n_k$ the sizes of the samples taken from subdomains $D_1, D_2, \ldots, D_k$, respectively, where $\sum_{i=1}^{k} n_i = N$.
We model the outcome of the $j$th test taken from the $i$th partition as a Bernoulli random variable $X_{ij}$ such that:

$$X_{ij} = \begin{cases} 1, & \text{if test } j \text{ taken from partition } i \text{ is processed correctly} \\ 0, & \text{otherwise} \end{cases}$$

and each $X_{ij}$ follows a Bernoulli distribution with parameter $R_i$. Then the estimate of the overall system reliability $R$, denoted by $\hat{R}$, can be defined as:

$$\hat{R} = \sum_{i=1}^{k} p_i \hat{R}_i$$

where $\hat{R}_i$ is the estimate of $R_i$ after $n_i$ test cases have been allocated to partition $D_i$, such that:

$$\hat{R}_i = \frac{\sum_{m=1}^{n_i} X_{im}}{n_i}$$

Ideally, we would like to execute all possible test paths through the software and determine the true overall reliability of the system. In practice, though, resources are often limited, so sample test cases must be chosen and allocated strategically to attain the best reliability estimate possible under the given constraints. One criterion for distributing test cases among the partitions proceeds from rewriting the variance of (1.1) as follows:
$$\mathrm{Var}(\hat{R}) = \frac{\left[\sum_{i=1}^{k} p_i \sqrt{R_i(1-R_i)}\right]^2}{N} + \frac{1}{N} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{\left[n_i p_j \sqrt{R_j(1-R_j)} - n_j p_i \sqrt{R_i(1-R_i)}\right]^2}{n_i n_j}$$
which is bounded below by the first term:
$$\mathrm{Var}(\hat{R}) \geq \frac{\left[\sum_{i=1}^{k} p_i \sqrt{R_i(1-R_i)}\right]^2}{N}$$
with equality in (2.1) holding if and only if:

$$\frac{n_i}{n_j} = \frac{p_i \sqrt{R_i(1-R_i)}}{p_j \sqrt{R_j(1-R_j)}}$$

for all $i \neq j$. Hence, the optimal allocation is determined by:

$$\frac{n_i}{N} = \frac{p_i \sqrt{R_i(1-R_i)}}{\sum_{j=1}^{k} p_j \sqrt{R_j(1-R_j)}}$$
for $i = 1, 2, \ldots, k-1$, and

$$n_k = N - \sum_{i=1}^{k-1} n_i$$
and the variance incurred by this optimal allocation is:
$$\mathrm{Var}(O) = \frac{\left[\sum_{j=1}^{k} p_j \sqrt{R_j(1-R_j)}\right]^2}{N}$$
Note that the optimal allocation depends on the actual reliabilities $R_1, R_2, \ldots, R_k$, which are unknown; therefore the optimal sampling scheme is not practical. It is this shortcoming that motivates the dynamic allocation approaches discussed in the following three sections.
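As a concrete illustration, the optimal allocation and its variance lower bound can be computed directly when the reliabilities are (hypothetically) known; the following Python sketch assumes known $R_i$ and serves only as a benchmark, with function and variable names of our own choosing:

```python
import math

def optimal_allocation(N, p, R):
    """Optimal allocation: n_i/N proportional to p_i * sqrt(R_i (1 - R_i)).

    Assumes the true reliabilities R are known -- exactly the impractical
    case discussed above -- so this serves only as a benchmark.
    """
    w = [pi * math.sqrt(Ri * (1 - Ri)) for pi, Ri in zip(p, R)]
    total = sum(w)
    # Allocate the first k-1 partitions by the optimal ratio (rounded down)
    # and give partition k the remainder so the sizes sum to N.
    n = [int(N * wi / total) for wi in w[:-1]]
    n.append(N - sum(n))
    var_opt = total ** 2 / N  # variance lower bound Var(O)
    return n, var_opt
```

For example, with equal usage probabilities and equal reliabilities, the allocation is balanced and the bound reduces to $[p \sqrt{R(1-R)} \cdot k]^2 / N$.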
By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wei5 proposed a fully sequential design shown to be first order optimal. The fully sequential design is defined as follows:
We first test one case from each of the partitions and estimate the reliability for each of these partitions.
Stage 1:
After $l$ cases have been allocated, $l \geq k$, the next test is taken from partition $i$ if, for all $j$,

$$\frac{n_{l,i}}{n_{l,j}} < \frac{p_i \sqrt{\hat{R}_{l,i}(1-\hat{R}_{l,i})}}{p_j \sqrt{\hat{R}_{l,j}(1-\hat{R}_{l,j})}}$$

where $n_{l,i}$ is the cumulative number of test cases allocated to partition $i$ after $l$ tests have been allocated, and the current estimated reliability of partition $i$ is:

$$\hat{R}_{l,i} = \frac{\sum_{m=1}^{n_{l,i}} X_{im}}{n_{l,i}}$$
Stage 2:
Repeat Stage 1 sequentially until all the test cases are allocated; the final estimate of the reliability of partition $i$ is:

$$\hat{R}_i = \frac{\sum_{m=1}^{n_{N,i}} X_{im}}{n_{N,i}}$$

Thus, the estimate of the overall reliability of the system is:

$$\hat{R} = \sum_{i=1}^{k} p_i \hat{R}_i$$
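The fully sequential rule above can be sketched in Python as follows. The $10^{-6}$ floor on the estimated weights (needed when an estimated reliability hits 0 or 1) and the function names are our own illustrative choices; the true reliabilities are used only to simulate Bernoulli test outcomes:

```python
import math
import random

def fully_sequential(N, p, R, rng=random.Random(0)):
    """Sketch of the fully sequential design: one test per partition to
    start, then repeatedly test the partition whose current share of
    tests falls furthest below the estimated optimal proportion."""
    k = len(p)
    n = [0] * k  # tests allocated per partition
    s = [0] * k  # successes observed per partition
    for i in range(k):  # initial test from each partition
        n[i] += 1
        s[i] += rng.random() < R[i]
    while sum(n) < N:
        # estimated weights p_i * sqrt(Rhat_i (1 - Rhat_i)), with a small
        # floor so a partition whose estimate hits 0 or 1 stays testable
        w = [p[i] * max(math.sqrt((s[i] / n[i]) * (1 - s[i] / n[i])), 1e-6)
             for i in range(k)]
        # the allocation rule: test partition i if n_i/n_j < w_i/w_j for
        # all j, i.e. the partition minimizing n_i / w_i
        i = min(range(k), key=lambda i: n[i] / w[i])
        n[i] += 1
        s[i] += rng.random() < R[i]
    R_hat = sum(p[i] * s[i] / n[i] for i in range(k))
    return n, R_hat
```

The `min` over $n_i / w_i$ is equivalent to the ratio condition above: the selected partition is the one most under-sampled relative to its estimated optimal share.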
By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wei6 proposed a multistage sequential design shown to be first order optimal. Instead of making an allocation decision one test at a time, the multistage sequential design allocates test cases among the partitions in $L$ stages, in batches of sizes $S_1, S_2, \ldots, S_L$, where both $L$ and the stage sizes are pre-specified. The multistage sequential design is defined as follows:
Stage 1:
We start with an initial sample of $S_1$ test cases, allocated among the $k$ partitions with a balanced allocation scheme, such that:

$$S_{1,i} = \frac{S_1}{k}$$

and estimate the reliability of partition $i$ by:

$$\hat{R}_{1,i} = \frac{\sum_{m=1}^{S_{1,i}} X_{im}}{S_{1,i}}$$

Therefore,

$$\hat{C}_i(S_1) = \frac{p_i \sqrt{\hat{R}_{1,i}(1-\hat{R}_{1,i})}}{\sum_{j=1}^{k} p_j \sqrt{\hat{R}_{1,j}(1-\hat{R}_{1,j})}}$$
Stage 2 through L:
At stage $j$, $2 \leq j \leq L$, for partition $i = 1, 2, \ldots, k-1$, the test cases are distributed in the following way:

$$S_{j,i} = \left(\sum_{l=1}^{j} S_l\right) \hat{C}_i(\bar{S}_{j-1})$$

and

$$S_{j,k} = \sum_{l=1}^{j} S_l - \sum_{i=1}^{k-1} S_{j,i}$$

where

$$\bar{S}_{j-1} = \sum_{y=1}^{j-1} S_y$$
At the final stage $L$, the cumulative number of tests allocated to partition $i$ is:

$$N_i = \min\left\{N - \sum_{j=1, j \neq i}^{k-1} S_{L-1,j}, \; \max\left(N \hat{C}_i(\bar{S}_{L-1}), \; S_{L-1,i}\right)\right\}$$

and

$$N_k = N - \sum_{i=1}^{k-1} N_i$$
Among the several ways of determining the number of cases at each sampling stage, the simplest is to select:

$$S_1 = S_2 = \ldots = S_L = \frac{N}{L}$$

However, choosing the stage sizes, especially the initial stage size, can follow some general criteria: a good initial stage size is $\sqrt{N}$ when a two-stage sampling scheme is considered, and more generally, Rekab9 shows that for a two-stage procedure, $S_1$ must be chosen such that:

$$\lim_{N \to \infty} S_1 = \infty \quad \text{and} \quad \lim_{N \to \infty} \frac{S_1}{N} = 0$$
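A simplified Python sketch of the multistage scheme is given below. It uses equal stage sizes of roughly $N/L$ and tops up the cumulative allocations toward the estimated proportions $\hat{C}_i$; it deliberately omits the min/max clipping of the final stage, so it approximates the design rather than implementing it faithfully:

```python
import math
import random

def multistage(N, L, p, R, rng=random.Random(0)):
    """Simplified sketch of the multistage design with ~equal stage sizes.
    Stage 1 is balanced; each later stage tops up the cumulative
    allocation toward the estimated optimal proportions C_i."""
    k = len(p)
    n = [0] * k  # cumulative tests per partition
    s = [0] * k  # cumulative successes per partition
    def test(i, m):
        n[i] += m
        s[i] += sum(rng.random() < R[i] for _ in range(m))
    stage = N // L            # requires N/L >= k so every partition gets data
    for i in range(k):        # stage 1: balanced allocation
        test(i, stage // k)
    for j in range(2, L + 1):
        budget = N if j == L else j * stage  # cumulative budget through stage j
        w = [p[i] * max(math.sqrt((s[i] / n[i]) * (1 - s[i] / n[i])), 1e-6)
             for i in range(k)]
        C = [wi / sum(w) for wi in w]
        for i in range(k - 1):  # top up toward budget * C_i, never past budget
            test(i, max(0, min(int(budget * C[i]), n[i] + budget - sum(n)) - n[i]))
        test(k - 1, max(0, budget - sum(n)))  # partition k takes the remainder
    return n, sum(p[i] * s[i] / n[i] for i in range(k))
```

Because allocations are made in batches, this scheme needs far fewer allocation decisions than the fully sequential design while following the same estimated optimal proportions.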
By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wei6 proposed an accelerated sequential design shown to be first order optimal. The accelerated sampling scheme combines the fully sequential sampling scheme and the multistage sampling scheme. It is defined as follows:
Stage 1:
The procedure starts with an initial sample $S_1$, which satisfies the conditions specified in (4.1). Then allocate $S_1$ equally among the $k$ partitions.
Stage 2 through L−1:

At stage $j$, $2 \leq j \leq L-1$, for partition $i = 1, 2, \ldots, k-1$, the test cases are distributed in the following way:

$$S_{j,i} = \left(\sum_{l=1}^{j} S_l\right) \hat{C}_i(\bar{S}_{j-1})$$

and

$$S_{j,k} = \sum_{l=1}^{j} S_l - \sum_{i=1}^{k-1} S_{j,i}$$

where

$$\bar{S}_{j-1} = \sum_{y=1}^{j-1} S_y$$
At each stage, $S_j$ must satisfy the same two conditions as $S_1$.
Stage L:

In the final stage, we adopt a fully sequential approach by allocating one test at a time to partition $i$ if, for all $j$,

$$\frac{n_{l,i}}{n_{l,j}} < \frac{p_i \sqrt{\hat{R}_{l,i}(1-\hat{R}_{l,i})}}{p_j \sqrt{\hat{R}_{l,j}(1-\hat{R}_{l,j})}}$$

until all the test cases have been allocated. Note that $n_{l,i}$, $n_{l,j}$ are the cumulative numbers of test cases allocated to partitions $i$ and $j$ after a total of $l$ test cases have been allocated, where

$$\sum_{j=1}^{L-1} S_{j,i} \leq l \leq \sum_{j=1}^{L-1} S_{j,i} + N - \sum_{j=1}^{L-1} S_j$$

Therefore, the estimate for the whole system is finally obtained as:

$$\hat{R} = \sum_{i=1}^{k} p_i \hat{R}_i$$
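The accelerated scheme can be sketched as follows; the stage size $N/(2L)$ and the floor on the estimated weights are illustrative choices of ours, not prescribed by the design:

```python
import math
import random

def accelerated(N, L, p, R, rng=random.Random(0)):
    """Sketch of the accelerated design: batch allocation (multistage-
    style) for stages 1..L-1, then one-at-a-time fully sequential
    allocation of the remaining budget."""
    k = len(p)
    n, s = [0] * k, [0] * k
    def test(i, m=1):
        n[i] += m
        s[i] += sum(rng.random() < R[i] for _ in range(m))
    def weights():  # p_i * sqrt(Rhat_i (1 - Rhat_i)), floored away from 0
        return [p[i] * max(math.sqrt((s[i] / n[i]) * (1 - s[i] / n[i])), 1e-6)
                for i in range(k)]
    stage = N // (2 * L)          # illustrative stage size
    for i in range(k):            # stage 1: balanced allocation
        test(i, stage // k)
    for j in range(2, L):         # stages 2..L-1: batch top-ups
        budget = j * stage
        w = weights()
        C = [wi / sum(w) for wi in w]
        for i in range(k - 1):
            test(i, max(0, min(int(budget * C[i]), n[i] + budget - sum(n)) - n[i]))
        test(k - 1, max(0, budget - sum(n)))
    while sum(n) < N:             # stage L: fully sequential on the remainder
        w = weights()
        test(min(range(k), key=lambda i: n[i] / w[i]))
    return n, sum(p[i] * s[i] / n[i] for i in range(k))
```

Batching the early stages keeps the number of allocation decisions small, while the sequential final stage lets the allocation track the estimated optimal proportions closely near the end of the budget.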
First order optimality of these three sequential designs was established by Rekab, Thompson & Wu,5–7 although the focus here is on minimizing the variance incurred by the sequential designs rather than the Bayes risk incurred by these designs. For estimating the mean difference of two independent normal populations, Woodroofe and Hardwick12 adopted a quasi-Bayesian approach. They determined an asymptotic lower bound for the integrated risk and proposed a three-stage design that is second order optimal. For estimating the mean difference of two general one-parameter exponential families, Rekab and Tahir10 adopted a fully Bayesian approach with conjugate priors. They determined an asymptotic second order lower bound for the Bayes risk.
We consider the case where the test domain is partitioned into two subdomains $D_1$ and $D_2$ with reliabilities $R_1$ and $R_2$, respectively, and equal usage probabilities $p_1 = p_2$. Second order optimality of the three sequential designs is investigated through Monte Carlo simulations.
| (R1,R2) | N=300 | N=500 | N=800 | N=2000 | N=5000 | N=8000 |
|---------|-------|-------|-------|--------|--------|--------|
| 0.1,0.9 | 28.1562 | 3.2104 | 0.739 | 2.4225 | 9.304 | 0.7133 |
| 0.5,0.2 | 29.038 | 6.102 | 0.63 | 7.9375 | 4.67 | 4.05 |
| 0.5,0.5 | 1.5972 | 0.146 | 0.212 | 0.9125 | 0.512 | 0.4933 |
| 0.5,0.9 | 80.7747 | 71.9032 | 19.951 | 7.96 | 5.796 | 1.2666 |
| 0.9,0.3 | 64.0246 | 53.8008 | 5.5497 | 3.2565 | 15.2546 | 2.6517 |

Table 1 $N^2(\mathrm{Var}(\Delta)-\mathrm{Var}(O))$ by Fully Sequential Scheme
| (R1,R2) | N=300 | N=500 | N=800 | N=2000 | N=5000 | N=8000 |
|---------|-------|-------|-------|--------|--------|--------|
| 0.1,0.9 | 10.9835 | 0.0288 | 4.0718 | 55.7962 | 2.332 | 1.8422 |
| 0.5,0.2 | 11.8285 | 19.172 | 2.0555 | 5.51 | 0.8432 | 7.1258 |
| 0.5,0.5 | 0.02749 | 0.0459 | 0.175 | 0.4568 | 0.47324 | 0.1064 |
| 0.5,0.9 | 37.6665 | 31.6257 | 17.6417 | 3.6377 | 6.8483 | 9.2952 |
| 0.9,0.3 | 20.2034 | 2.86352 | 7.6161 | 3.1589 | 11.7869 | 5.2018 |

Table 2 $N^2(\mathrm{Var}(\Delta)-\mathrm{Var}(O))$ by Multistage Scheme
| (R1,R2) | N=300 | N=500 | N=800 | N=2000 | N=5000 | N=8000 |
|---------|-------|-------|-------|--------|--------|--------|
| 0.1,0.9 | 30.5713 | 6.587 | 6.4568 | 17.4023 | 2.1138 | 1.0098 |
| 0.5,0.2 | 8.1424 | 6.1346 | 7.1683 | 0.8968 | 0.1462 | 2.1009 |
| 0.5,0.5 | 0.03801 | 0.23284 | 0.0319 | 0.7025 | 0.064 | 0.0392 |
| 0.5,0.9 | 37.6665 | 31.6257 | 17.6417 | 3.6377 | 6.8483 | 9.2952 |
| 0.9,0.3 | 48.4148 | 13.817 | 7.8473 | 8.4656 | 5.8734 | 1.6708 |

Table 3 $N^2(\mathrm{Var}(\Delta)-\mathrm{Var}(O))$ by Accelerated Scheme
Tables 1, 2 and 3 seem to indicate that $N^2(\mathrm{Var}(\Delta)-\mathrm{Var}(O))$ remains bounded as $N$ grows; that is, the variance of each design approaches the optimal variance at the second order rate.
Second order optimal designs are more efficient than first order optimal designs, especially when the total number of test cases is very large. This is the main argument that led us to investigate the second order optimality of these three dynamic designs. Simulation studies seem to indicate that these designs are second order optimal. We conjecture that second order optimality may be established theoretically as well.
It is very common in parametric estimation to use the squared error loss. However, in reliability estimation one should distinguish between the cost of overestimating and underestimating the system reliability. Examples of practical loss functions were presented by Stüger:2
$$l(\hat{R}, R) = c_o(\hat{R}-R)^2 I_{\{\hat{R}>R\}} + c_u(R-\hat{R})^2 I_{\{\hat{R}<R\}}$$
and by Granger:1
$$l(\hat{R}, R) = c_o(\hat{R}-R) I_{\{\hat{R}>R\}} + c_u(R-\hat{R}) I_{\{\hat{R}<R\}}$$
where $c_o$ and $c_u$ represent the overestimation and underestimation costs, respectively.
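For instance, the first (quadratic) loss can be written as a small Python function; the cost values shown are illustrative only, not taken from the sources cited above:

```python
def asymmetric_loss(R_hat, R, c_o=2.0, c_u=1.0):
    """Asymmetric squared-error loss: overestimating the reliability
    (shipping a weaker system than claimed) is penalised by c_o,
    underestimating it by c_u.  The default costs are illustrative."""
    err = R_hat - R
    return c_o * err ** 2 if err > 0 else c_u * err ** 2
```

With $c_o > c_u$, overestimating the reliability by a given margin costs more than underestimating it by the same margin, reflecting the greater practical danger of overstating how dependable the system is.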
None.
Authors declare that there are no conflicts of interests.