Statistical predictive or confidence decisions (under parametric uncertainty) about future random quantities (future outcomes, order statistics, etc.) based on past and current data are among the most prevalent forms of statistical inference. Predictive inferences for future random quantities are widely used in risk management, finance, insurance, economics, hydrology, materials science, telecommunications, and many other fields. Predictive inference (predictive distributions, prediction or tolerance limits or intervals, and confidence limits or intervals for future random quantities) based on past and present knowledge is a fundamental problem of statistics that arises in many contexts and has produced a variety of solutions. The approach used here is a special case of a more general framework that applies whenever the statistical problem is invariant under a group of transformations acting transitively on the parameter space [1–29].
- Adequate mathematical models of cumulative distribution functions of order statistics for constructing one-sided tolerance limits (or two-sided tolerance intervals) in new (future) data samples under parametric uncertainty
Theorem 1: Let Y1 ≤ … ≤ Yn be a new (future) random sample of n ordered observations from a known distribution whose probability density function (pdf) and cumulative distribution function (cdf) depend on a parameter (in general, a vector). Then adequate mathematical models of the cumulative distribution function of the kth order statistic Yk, k ∈ {1, 2, …, n}, for constructing one-sided γ-content tolerance limits (or a two-sided tolerance interval) for Yk with confidence level β are as follows:
- Adequate Applied Mathematical Model 1 of a Cumulative Distribution Function of the kth Order Statistic Yk is given by
(1)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(2)
where
(3)
is the probability density function (pdf) of the beta distribution
with shape parameters k and n−k+1.
Proof: It follows from (1) that
(4)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(5)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(6)
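As a numerical sanity check on the beta representation that Model 1 is built on, the sketch below assumes a fully known parameter and compares the standard identity P(Yk ≤ y) = P(Beta(k, n−k+1) ≤ F(y)) with a Monte Carlo estimate. The parent distribution (a two-parameter exponential), the parameter values and the helper names are hypothetical choices for illustration. The sketch does not implement formula (2), which additionally accounts for parametric uncertainty.

```python
import math
import random


def order_stat_cdf(u: float, k: int, n: int) -> float:
    """P(U_(k) <= u) for the kth smallest of n iid Uniform(0, 1) variables.

    This is the Beta(k, n-k+1) cdf at u; for integer shapes it equals the
    binomial tail P(Binomial(n, u) >= k).
    """
    return sum(math.comb(n, j) * u**j * (1 - u) ** (n - j) for j in range(k, n + 1))


def exp_cdf(y: float, mu: float, sigma: float) -> float:
    """Cdf of a two-parameter exponential parent (chosen only as an example)."""
    return 1.0 - math.exp(-(y - mu) / sigma) if y >= mu else 0.0


def simulate_kth(k: int, n: int, mu: float, sigma: float, reps: int, rng: random.Random) -> list:
    """Monte Carlo draws of the kth smallest of n exponential observations."""
    return [sorted(mu + rng.expovariate(1.0 / sigma) for _ in range(n))[k - 1]
            for _ in range(reps)]


rng = random.Random(12345)
k, n, mu, sigma = 5, 12, 1.0, 2.0   # hypothetical, fully known parameter values
y = 2.0
draws = simulate_kth(k, n, mu, sigma, 20000, rng)
empirical = sum(d <= y for d in draws) / len(draws)
model = order_stat_cdf(exp_cdf(y, mu, sigma), k, n)
print(f"empirical P(Y_k <= y) = {empirical:.4f}, beta model = {model:.4f}")
```

For integer shape parameters the beta cdf reduces to a binomial tail sum, which keeps the example within the Python standard library.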
- Adequate Applied Mathematical Model 2 of a Cumulative Distribution Function of the kth Order Statistic Yk is given by
(7)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(8)
where
(9)
is the probability density function (pdf) of the beta distribution
with shape parameters n−k+1 and k.
Proof: It follows from (7) that
(10)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(11)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(12)
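Models 1 and 2 express the same probability through two beta distributions. Assuming, as the shape parameters suggest, that Model 2 works with 1 − F(Yk), which follows Beta(n−k+1, k), while Model 1 works with F(Yk), which follows Beta(k, n−k+1), the short check below confirms numerically that the two representations agree for every k. The function names are hypothetical.

```python
import math


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for positive integer shapes.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def via_beta_k(u: float, k: int, n: int) -> float:
    """P(Y_k <= y) through Beta(k, n-k+1) evaluated at u = F(y)."""
    return beta_cdf_int(u, k, n - k + 1)


def via_beta_n_minus_k(u: float, k: int, n: int) -> float:
    """The same probability through Beta(n-k+1, k) evaluated at 1 - u = 1 - F(y)."""
    return 1.0 - beta_cdf_int(1.0 - u, n - k + 1, k)


n = 12
# Compare both forms on a grid of u = F(y) values for every k = 1..n.
max_gap = max(abs(via_beta_k(i / 100, k, n) - via_beta_n_minus_k(i / 100, k, n))
              for k in range(1, n + 1) for i in range(101))
print(f"largest discrepancy over k = 1..{n} and a grid of u: {max_gap:.1e}")
```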
- Adequate Applied Mathematical Model 3 of a Cumulative Distribution Function of the kth Order Statistic Yk is given by
(13)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(14)
where
(15)
is the probability density function (pdf) of the F distribution with k degrees of freedom for the numerator and n−k+1 degrees of freedom for the denominator.
Proof: It follows from (13) that
(16)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(17)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(18)
- Adequate Applied Mathematical Model 4 of a Cumulative Distribution Function of the kth Order Statistic Yk is given by
(19)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(20)
where
(21)
is the probability density function (pdf) of the F distribution with n−k+1 degrees of freedom for the numerator and k degrees of freedom for the denominator.
Proof: It follows from (19) that
(22)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(23)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(24)
- Adequate mathematical models of conditional cumulative distribution functions of order statistics for constructing one-sided tolerance limits (or two-sided tolerance intervals) in new (future) data samples under parametric uncertainty
Theorem 2: Let Y1 ≤ … ≤ Yn be a new (future) random sample of n ordered observations from a known distribution whose probability density function (pdf) and cumulative distribution function (cdf) depend on a parameter (in general, a vector). Then adequate mathematical models of the conditional cumulative distribution function (ccdf) of the lth order statistic Yl, l ∈ {2, …, n}, given Yk = yk (1 ≤ k < l ≤ n), for constructing one-sided γ-content tolerance limits (or a two-sided tolerance interval) for Yl with confidence level β are as follows:
- Adequate Applied Mathematical Model 5 of a Conditional Cumulative Distribution Function of the lth Order Statistic Yl is given by
(25)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(26)
where
(27)
is the probability density function (pdf) of the beta distribution
with shape parameters l−k and n−l+1.
Proof: It follows from (25) that
(28)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(29)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(30)
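Model 5 rests on the fact that, given Yk = yk, the rescaled quantity (F(Yl) − F(yk))/(1 − F(yk)) follows a beta distribution with shape parameters l−k and n−l+1. The sketch below checks this on the uniform scale (that is, with F known). It relies on the standard property that, conditionally on the kth order statistic being uk, the n−k larger observations behave as an independent Uniform(uk, 1) sample. The values of k, l, n and uk are hypothetical.

```python
import math
import random


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """I_x(a, b) for positive integer shapes: P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def conditional_draw(u_k: float, k: int, l: int, n: int, rng: random.Random) -> float:
    """Draw U_(l) given U_(k) = u_k, for n iid Uniform(0, 1) variables.

    Given U_(k) = u_k, the n - k larger values behave as an iid Uniform(u_k, 1)
    sample, so U_(l) is the (l - k)th smallest of them.
    """
    upper = sorted(rng.uniform(u_k, 1.0) for _ in range(n - k))
    return upper[l - k - 1]


rng = random.Random(2024)
k, l, n, u_k = 3, 7, 12, 0.25      # hypothetical example values
w = 0.5                            # threshold for (U_(l) - u_k) / (1 - u_k)
draws = [conditional_draw(u_k, k, l, n, rng) for _ in range(20000)]
empirical = sum((d - u_k) / (1 - u_k) <= w for d in draws) / len(draws)
model = beta_cdf_int(w, l - k, n - l + 1)
print(f"simulated {empirical:.4f}, Beta({l - k}, {n - l + 1}) cdf {model:.4f}")
```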
- Adequate Applied Mathematical Model 6 of a Conditional Cumulative Distribution Function of the lth Order Statistic Yl is given by
(31)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(32)
where
(33)
is the probability density function (pdf) of the beta distribution
with shape parameters n−l+1 and l−k.
Proof: It follows from (31) that
(34)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(35)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(36)
- Adequate Applied Mathematical Model 7 of a Conditional Cumulative Distribution Function of the lth Order Statistic Yl is given by
(37)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(38)
where
(39)
is the probability density function (pdf) of the F distribution with l−k degrees of freedom for the numerator and n−l+1 degrees of freedom for the denominator.
Proof: It follows from (37) that
(40)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(41)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(42)
- Adequate Applied Mathematical Model 8 of a Conditional Cumulative Distribution Function of the lth Order Statistic Yl is given by
(43)
In this case, an upper, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(44)
where
(45)
is the probability density function (pdf) of the F distribution with n−l+1 degrees of freedom for the numerator and l−k degrees of freedom for the denominator.
Proof: It follows from (43) that
(46)
This ends the proof.
A lower, one-sided γ-content tolerance limit with confidence level β can be obtained by using the following formula:
(47)
A two-sided tolerance interval with the specified content and confidence level can be obtained by using the following formula:
(48)
- Two-parameter exponential distribution
Let Y = (Y1 ≤ ... ≤ Ym) be the first m ordered observations (order statistics) in a sample of size h from the two-parameter exponential distribution with probability density function
$$f(y \mid \mu, \sigma) = \frac{1}{\sigma}\exp\left(-\frac{y-\mu}{\sigma}\right), \qquad y \ge \mu,\ \sigma > 0, \qquad (49)$$
and cumulative distribution function
$$F(y \mid \mu, \sigma) = 1 - \exp\left(-\frac{y-\mu}{\sigma}\right), \qquad y \ge \mu, \qquad (50)$$
where μ is the shift parameter and σ is the scale parameter. Both parameters are assumed to be unknown. In Type II censoring, which is of primary interest here, the number of survivors is fixed and Y is a random variable. In this case, the likelihood function is given by
(51)
where
(52)
is the complete sufficient statistic for ρ. The probability density function of S = (S1, Sm) is given by
(53)
where
(54)
(55)
(56)
is the pivotal quantity, the probability density function of which is given by
(57)
(58)
is the pivotal quantity, the probability density function of which is given by
(59)
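To make the Type II censoring setup concrete, the sketch below assumes that the sufficient statistic (52) has the usual form S1 = Y1 and Sm = Σ(Yi − Y1) + (h − m)(Ym − Y1), with the sum over i = 1, ..., m. This form is an assumption, since (52) is not reproduced here. The sketch checks by simulation two standard facts for the exponential with shift μ and scale σ: (S1 − μ)/σ is exponential with mean 1/h, and Sm/σ is gamma distributed with shape m − 1 (mean m − 1). Neither distribution depends on the unknown parameters. The values of μ and σ are hypothetical.

```python
import random
import statistics


def censored_sample(m: int, h: int, mu: float, sigma: float, rng: random.Random) -> list:
    """First m order statistics of a sample of size h (Type II censoring)."""
    full = sorted(mu + rng.expovariate(1.0 / sigma) for _ in range(h))
    return full[:m]


def sufficient_stats(y: list, h: int) -> tuple:
    """(S1, Sm) for y_1 <= ... <= y_m observed out of h items.

    Assumed form: S1 = y_1, Sm = sum_i (y_i - y_1) + (h - m) * (y_m - y_1).
    """
    m = len(y)
    s1 = y[0]
    sm = sum(yi - y[0] for yi in y) + (h - m) * (y[-1] - y[0])
    return s1, sm


rng = random.Random(7)
m, h, mu, sigma = 8, 10, 1.0, 2.0   # m, h as in the numerical example; mu, sigma hypothetical
v1_draws, v_draws = [], []
for _ in range(20000):
    s1, sm = sufficient_stats(censored_sample(m, h, mu, sigma, rng), h)
    v1_draws.append((s1 - mu) / sigma)   # standard result: exponential with mean 1/h
    v_draws.append(sm / sigma)           # standard result: Gamma(shape m - 1, scale 1)

mean_v1 = statistics.fmean(v1_draws)
mean_v = statistics.fmean(v_draws)
print(f"mean of (S1 - mu)/sigma: {mean_v1:.4f} (theory {1 / h:.4f})")
print(f"mean of Sm/sigma:        {mean_v:.4f} (theory {m - 1})")
```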
- Constructing an upper, one-sided γ-content tolerance limit with confidence level β for the case of Model 1
Theorem 3: Let Y1 ≤ … ≤ Ym be the first m ordered observations from a preliminary sample of size h from the two-parameter exponential distribution defined by the probability density function (49). Then the upper one-sided γ-content tolerance limit (with confidence level β) on the kth order statistic Yk from a set of n future ordered observations Y1 ≤ … ≤ Yn, also from the distribution (49), which satisfies
(60)
is given by
(61)
where
(62)
Proof: It follows from (2) and (3) that
(63)
where
(64)
It follows from (63) and (64) that
(65)
It follows from (64) and (65) that
(66)
It follows from (66) that
(67)
Then (61) follows from (67). This ends the proof.
- Constructing a lower, one-sided γ-content tolerance limit with confidence level β for the case of Model 1
Theorem 4: Let Y1 ≤ … ≤ Ym be the first m ordered observations from a preliminary sample of size h from the two-parameter exponential distribution defined by the probability density function (49). Then the lower one-sided γ-content tolerance limit (with confidence level β) on the kth order statistic Yk from a set of n future ordered observations Y1 ≤ … ≤ Yn, also from the distribution (49), which satisfies
(68)
is given by
(69)
where
(70)
Proof: It follows from (3) and (5) that
(71)
where
(72)
It follows from (57) and (71) that
(73)
It follows from (72) and (73) that
(74)
It follows from (74) that
(75)
Then (69) follows from (75). This ends the proof.
- Numerical Practical Example
Let us assume that k = 5, m = 8, h = 10, n = 12, γ = β = 0.95,
(76)
Then the upper, one-sided γ-content tolerance limit with confidence level β can be obtained from (61), where the required quantile is given by
(77)
(78)
It follows from (61), (76) and (78) that
(79)
The lower, one-sided γ-content tolerance limit with confidence level β can be obtained from (69), where the required quantile is given by
(80)
(81)
It follows from (69), (76) and (81) that
(82)
The two-sided tolerance interval with the specified content and confidence level can be obtained by using (6), (79) and (82):
(83)
- New intelligent transformation technique for derivation of the density function of Student's t distribution
Theorem 5: If X ~ N(0, 1) and V ~ χ²(ν) are independent random variables, then
(84)
where T = X/√(V/ν) follows Student's t distribution with ν degrees of freedom,
(85)
Proof.
(86)
where
(87)
It follows from (86) and (87) that
(88)
(89)
It follows from (88) and (89) that
(90)
This ends the proof.
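Theorem 5 can also be checked numerically. The sketch below writes the Student's t density in its standard closed form and simulates T = X/√(V/ν), with X standard normal and V chi-square with ν degrees of freedom. It then compares the simulated P(T ≤ t0) with the value obtained by integrating the density with the composite Simpson rule. The values of ν, t0 and the random seed are arbitrary illustrative choices.

```python
import math
import random


def t_pdf(t: float, nu: float) -> float:
    """Student's t density with nu degrees of freedom (standard closed form)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, steps, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, steps, 2))
    return s * h / 3


rng = random.Random(99)
nu, reps, t0 = 5, 40000, 1.0
# T = X / sqrt(V / nu), X ~ N(0, 1), V ~ chi-square(nu) = Gamma(shape nu/2, scale 2)
draws = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0) / nu)
         for _ in range(reps)]
empirical = sum(d <= t0 for d in draws) / reps
from_density = 0.5 + simpson(lambda x: t_pdf(x, nu), 0.0, t0)   # symmetry about 0
total_mass = simpson(lambda x: t_pdf(x, nu), -50.0, 50.0)
print(f"P(T <= {t0}): simulated {empirical:.4f}, from density {from_density:.4f}")
print(f"total mass of the density on [-50, 50]: {total_mass:.6f}")
```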
- Confidence interval for the difference of means of two different normal populations
In most applications, two populations are compared using the difference in their means. Let U1, U2, ..., Um be a sample of size m from a normal population with mean μ1 and variance σ², and let Z1, ..., Zn be a sample of size n from a different normal population with mean μ2 and the same variance σ². Suppose that the two samples are independent. We are interested in constructing a confidence interval for μ1 − μ2. To obtain this confidence interval, we need the distribution of the difference of the sample means, where
(91)
It follows from (91) that
(92)
It follows from (92) that
(93)
This is independent of
(94)
and
(95)
where
(96)
Taking (84), (93) and (96) into account, we have that
(97)
where T(m+n−2) is a t random variable with m + n − 2 degrees of freedom,
(98)
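Assuming that the pivot in (97) is the usual pooled two-sample t statistic, (Ū − Z̄ − (μ1 − μ2)) / (Sp √(1/m + 1/n)) with pooled variance Sp² = ((m−1)S_U² + (n−1)S_Z²)/(m+n−2), the simulation below checks that it behaves like a t variable with m + n − 2 degrees of freedom when both populations share a common variance. It uses m = 6 and n = 4 (so 8 degrees of freedom) and the tabulated quantile t0.975(8) ≈ 2.306. The population means and variance are hypothetical.

```python
import math
import random
import statistics


def pooled_t(u: list, z: list, delta: float) -> float:
    """Pooled two-sample t statistic for mean(U) - mean(Z) = delta (common variance)."""
    m, n = len(u), len(z)
    su2, sz2 = statistics.variance(u), statistics.variance(z)   # unbiased sample variances
    sp2 = ((m - 1) * su2 + (n - 1) * sz2) / (m + n - 2)          # pooled variance
    return (statistics.fmean(u) - statistics.fmean(z) - delta) / math.sqrt(sp2 * (1 / m + 1 / n))


rng = random.Random(31)
m, n, mu1, mu2, sigma = 6, 4, 10.0, 7.0, 2.0   # hypothetical population values
reps = 40000
draws = []
for _ in range(reps):
    u = [rng.gauss(mu1, sigma) for _ in range(m)]
    z = [rng.gauss(mu2, sigma) for _ in range(n)]
    draws.append(pooled_t(u, z, mu1 - mu2))

nu = m + n - 2
t_975 = 2.306                                  # tabulated 0.975-quantile of t with 8 df
coverage = sum(abs(t) <= t_975 for t in draws) / reps
print(f"variance {statistics.pvariance(draws):.3f} (theory {nu / (nu - 2):.3f}), "
      f"coverage {coverage:.4f}")
```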
Using (97) and (98), a 100(1−α)% confidence interval for the difference of means can be obtained from
(99)
by suitably choosing the decision variables t1 and t2. Hence, the confidence interval for the difference of means is given by
(100)
The length of this confidence interval is given by
(101)
To find the shortest-length confidence interval for the difference of means, we need a pair of decision variables t1 and t2 that minimizes (101).
It follows from (98) and (99) that
(102)
where p is a decision variable,
(103)
and
(104)
Then t2 represents the quantile given by
(105)
and t1 represents the quantile given by
(106)
The shortest-length confidence interval for the difference of means can be found as follows:
Minimize
(107)
subject to
(108)
The optimal numerical solution can be obtained using the standard "Solver" tool of Excel 2016. If
it follows from (101) that
(109)
If, for example, m = 58, n = 27, α = 0.05,
then the optimal numerical solution of (107) is given by
(110)
and it follows from (99) and (109) that the 100(1−α)% confidence interval of shortest length (i.e., equal-tailed) for the difference of means is given by
(111)
or
(112)
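The optimization in (107) and (108) can also be carried out without Excel's Solver. The sketch below assumes, consistent with the text, that the pivot follows Student's t with ν = m + n − 2 degrees of freedom. It parametrizes the problem by p = P(T ≤ t1), so that t1 is the p-quantile and t2 the (1 − α + p)-quantile; this parametrization is an assumed reading of (102) to (106). The length t2 − t1 is minimized by a golden-section search on p, with the t cdf computed by Simpson integration and inverted by bisection. All function names are hypothetical.

```python
import math


def t_cdf(t: float, nu: float, steps: int = 200) -> float:
    """P(T <= t) for Student's t: Simpson rule on [0, |t|] plus symmetry about 0."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    pdf = lambda x: math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))
    h = abs(t) / steps
    s = pdf(0.0) + pdf(abs(t))
    s += 4 * sum(pdf(i * h) for i in range(1, steps, 2))
    s += 2 * sum(pdf(i * h) for i in range(2, steps, 2))
    return 0.5 + math.copysign(s * h / 3, t)


def t_quantile(q: float, nu: float, lo: float = -20.0, hi: float = 20.0) -> float:
    """Invert t_cdf by bisection (t_cdf is increasing in t)."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shortest_t_interval(nu: float, alpha: float, iters: int = 40):
    """Minimize t2 - t1 subject to P(t1 < T < t2) = 1 - alpha.

    t1 is the p-quantile and t2 the (1 - alpha + p)-quantile, p in (0, alpha);
    the length is unimodal in p, so a golden-section search on p suffices.
    """
    def length(p: float) -> float:
        return t_quantile(1 - alpha + p, nu) - t_quantile(p, nu)

    g = (math.sqrt(5) - 1) / 2
    a, b = 1e-6, alpha - 1e-6
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = length(c), length(d)
    for _ in range(iters):
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = length(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = length(d)
    p = 0.5 * (a + b)
    return p, t_quantile(p, nu), t_quantile(1 - alpha + p, nu)


m, n, alpha = 58, 27, 0.05               # values from the example above
p, t1, t2 = shortest_t_interval(m + n - 2, alpha)
print(f"p = {p:.5f}, t1 = {t1:.4f}, t2 = {t2:.4f}")
```

For m = 58, n = 27 and α = 0.05 the search settles at p ≈ α/2 with t1 ≈ −t2 ≈ −1.99. This is the equal-tailed interval, as expected for a symmetric unimodal density such as Student's t.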
- Confidence interval for the ratio of means of two different normal populations
The ratio of means is used to compare two populations of positive data. Let U1, U2, ..., Um be a sample of size m from a normal population with mean μ1 and variance σ², and let Z1, ..., Zn be a sample of size n from a different normal population with mean μ2 and the same variance σ². Suppose that the two samples are independent. We are interested in constructing a confidence interval for the ratio of means μ1/μ2 of the two normal populations. To obtain this confidence interval, we need the distribution of
where
(113)
It can be shown that
(114)
or
(115)
This is independent of
(116)
and
(117)
where
(118)
It follows from (84), (115) and (118) that
(119)
where T(m+n−2) is a t random variable with m + n − 2 degrees of freedom. Taking Theorem 5 into account, we have that
(120)
Using (119) and (120), a 100(1−α)% confidence interval for the ratio of means can be obtained from
(121)
by suitably choosing the decision variables t1 and t2. Hence, the confidence interval for the ratio of means is given by
(122)
The length of this confidence interval is given by
(123)
To find the shortest-length confidence interval for the ratio of means, we need a pair of decision variables t1 and t2 that minimizes (123). It follows from (121) and (123) that
(124)
where p is a decision variable,
(125)
and
(126)
Then t2 represents the quantile given by
(127)
and t1 represents the quantile given by
(128)
The shortest-length confidence interval for the ratio of means can be found as follows:
Minimize
(129)
subject to
(130)
The optimal numerical solution can be obtained using the standard "Solver" tool of Excel 2016. If
it follows from (123) that
(131)
If, for example, m = 6, n = 4, α = 0.05,
then the optimal numerical solution of (129) is given by
(132)
and it follows from (121) and (131) that the 100(1−α)% confidence interval of shortest length (i.e., equal-tailed) for the ratio of means is given by
(133)
If it follows from (133) that
(134)
or
(135)
An analytical expression for the optimal value of the ratio of means of the two normal populations can be obtained from (121), where it is assumed that
and
(136)
Thus, it follows from (136) that
(137)