Research Article Volume 8 Issue 6
Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Tanta University, Egypt
Correspondence: Dina H. Abdel Hady, Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Tanta University, Egypt
Received: September 24, 2019 | Published: December 30, 2019
Citation: Abdelhady DH. Parameter estimation for the bivariate inverse lomax distribution by the EM algorithm based on censored samples. Biom Biostat Int J. 2019;8(6):223-229. DOI: 10.15406/bbij.2019.08.00293
In this article, we estimate the parameters of the Marshall-Olkin bivariate Inverse Lomax distribution based on right-censored samples. The EM algorithm is the natural choice because the vector of observed data is incomplete and can be viewed as an observable function of complete data; the EM algorithm then exploits the simplicity of maximum likelihood estimation for complete data. In addition, the standard deviations of the estimates of the bivariate Inverse Lomax distribution are derived.
A simulation study is conducted to compare the estimates obtained with and without the EM algorithm.
Keywords: bivariate lomax distribution, censored sample, EM algorithm, missing data, simulation study
The Inverse Lomax distribution is one of the inverted distributions; it is very flexible for analyzing situations with a non-monotonic failure rate (Singh et al.1). If a random variable Y has a Lomax distribution, then X = 1/Y has an Inverse Lomax distribution (ILD) (Singh et al.2). The ILD is used in stochastic modeling of the lifetimes of components that have a decreasing failure rate. Like other distributions in the generalized Beta family, the Inverse Lomax distribution also has applications in actuarial science and economics (Kleiber and Kotz3).
The Inverse Lomax distribution was applied to geophysical databases by McKenzie et al.,4 namely data on the sizes of land fires in the state of California. Rahman et al.5 carried out research on statistical inference and prediction for the Inverse Lomax distribution via Bayesian methods. Kleiber6 used the Inverse Lomax distribution to establish the Lorenz ordering relationship between order statistics.
In this article, maximum likelihood estimates (MLEs) of the parameters of the Marshall-Olkin bivariate Inverse Lomax distribution (MOBIL)7 are obtained based on censored samples. The censoring time T is assumed to be independent of the life times (X, Y) of the two components. This situation arises, for example, in medical studies of paired organs such as kidneys, eyes or lungs, which behave like an interdependent two-component system. Death of the individual may censor the failure of either one of the paired organs or of both.
This censoring scheme is right censoring. Similar situations arise in engineering whenever a sub-system has two components with life times (X, Y) that are independent of the life time T of the entire system; failure of the main system may censor failure of either one component or both. Maximum likelihood estimators of the parameters for the case of univariate right censoring were derived by Hanagal.8,9 Censoring may also occur in other ways: patients may be lost to follow-up during the study, may move elsewhere so that the experimenter cannot follow them, or may stop cooperating because of adverse side effects of the treatment. These cases are called withdrawal from the study. Censored observations still carry useful information and therefore should not be omitted from the analysis. Due to the lack of data on real processes, the data in this study are generated from the Marshall-Olkin bivariate Lomax (BVL) distribution7 with Matlab software, and the parameters are then estimated using the EM algorithm.
Let X be a random variable with the following CDF:

F(x) = (1 + θ/x)^(−α), x > 0, α > 0, θ > 0.

A distribution of this form is said to be an Inverse Lomax distribution with parameters α and θ, denoted by IL(α, θ). The PDF and the hazard function of the Inverse Lomax distribution with parameters (α, θ) are

f(x) = (αθ/x²)(1 + θ/x)^(−(α+1)), x > 0, and h(x) = f(x)/(1 − F(x)).
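Under the parameterization F(x) = (1 + θ/x)^(−α) above, these functions can be sketched in Python; the function names `il_pdf`, `il_cdf`, `il_hazard` and `il_quantile` are our own illustrative choices, not part of the paper:

```python
def il_cdf(x, alpha, theta):
    """CDF of the Inverse Lomax distribution IL(alpha, theta)."""
    return (1.0 + theta / x) ** (-alpha)

def il_pdf(x, alpha, theta):
    """PDF: derivative of the CDF with respect to x."""
    return alpha * theta / x**2 * (1.0 + theta / x) ** (-(alpha + 1.0))

def il_hazard(x, alpha, theta):
    """Hazard function h(x) = f(x) / (1 - F(x))."""
    return il_pdf(x, alpha, theta) / (1.0 - il_cdf(x, alpha, theta))

def il_quantile(u, alpha, theta):
    """Inverse CDF, solving (1 + theta/x)^(-alpha) = u for x;
    useful for inverse-transform sampling."""
    return theta / (u ** (-1.0 / alpha) - 1.0)
```

The quantile function makes simulation straightforward: feeding a uniform random number to `il_quantile` produces an IL(α, θ) variate.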
Suppose U1, U2 and U3 are three independent random variables such that Ui ~ IL(αi, θ) for i = 1, 2, 3. Define X = max(U1, U3) and Y = max(U2, U3).
Then we say that the bivariate vector (X, Y) has a bivariate Inverse Lomax distribution of Marshall-Olkin type.7
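Because the IL CDF has the power form (1 + θ/x)^(−α), CDFs of independent IL variables with a common θ multiply, so a Marshall-Olkin-type maximum X = max(U1, U3) has an IL(α1 + α3, θ) marginal. A small Monte Carlo sketch checks this empirically (an illustration under the parameterization F(x) = (1 + θ/x)^(−α); all names are ours):

```python
import random

def il_sample(alpha, theta, rng):
    # Inverse-transform sampling: solve (1 + theta/x)^(-alpha) = u for x.
    u = rng.random()
    return theta / (u ** (-1.0 / alpha) - 1.0)

def il_cdf(x, alpha, theta):
    return (1.0 + theta / x) ** (-alpha)

rng = random.Random(1)
a1, a3, theta = 1.6, 1.8, 1.0
n = 200_000
# X = max(U1, U3) should follow IL(a1 + a3, theta).
xs = [max(il_sample(a1, theta, rng), il_sample(a3, theta, rng)) for _ in range(n)]
for q in (0.5, 1.0, 2.0, 5.0):
    emp = sum(x <= q for x in xs) / n
    print(q, round(emp, 3), round(il_cdf(q, a1 + a3, theta), 3))
```

The empirical and theoretical CDF values agree to within Monte Carlo error, illustrating why the joint CDF factorizes over the regions x < y and x > y.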
The joint cumulative distribution function of X and Y, writing F_IL(·; α, θ) for the IL(α, θ) CDF, is

F(x, y) = F_IL(x; α1, θ) F_IL(y; α2, θ) F_IL(min(x, y); α3, θ),

where

F(x, y) = F_IL(x; α1 + α3, θ) F_IL(y; α2, θ), for x < y,

and

F(x, y) = F_IL(x; α1, θ) F_IL(y; α2 + α3, θ), for x ≥ y.

The joint probability density function of X and Y takes the form

f(x, y) = f1(x, y) for x < y, f2(x, y) for x > y, and f0(x) for x = y,

where

f1(x, y) = f_IL(x; α1 + α3, θ) f_IL(y; α2, θ) and f2(x, y) = f_IL(x; α1, θ) f_IL(y; α2 + α3, θ),

and the singular part is

f0(x) = (α3/(α1 + α2 + α3)) f_IL(x; α1 + α2 + α3, θ).
The density functions of and Z=min (X,Y) are given as follows:
This article aims at deriving an estimation method for the parameters of a bivariate Inverse Lomax distribution of Marshall-Olkin type via the EM algorithm. Section 2 presents the EM algorithm, Section 3 estimates the parameters by applying it, and Section 4 presents the results of a simulation study.
The expectation-maximization (EM) algorithm was introduced by Dempster et al.10 It is an iterative procedure for finding maximum likelihood estimates when data are missing, incomplete, unobserved or censored. Because it is easy to implement, the EM algorithm has had a great impact, not only as a computational tool but also as a method for solving complicated statistical problems.
The basic idea behind the method is to transform an incomplete-data problem into a complete-data problem for which the required maximization is computationally more tractable and numerically stable. Each iteration raises the likelihood, which finally converges to a local maximum. The complete data set x can be viewed as consisting of the vectors (t, z), where t is the observed incomplete data and z is the missing data.
The EM Algorithm method has been applied by several authors, for example, see Qin, et al.,11 Rudolf,12 Ning, et al.,13 Zanini, et al.,14 Açıkgöz,15 Kalabatsos,16 Attia, et al.17 and Hanagal and Ahmadi.18
The iterations
The objective is to draw inferences about the parameter vector Θ. We will use L(Θ; t) to denote the likelihood function, where t is the vector of observed data. Let z represent the vector of missing data. Starting with a guessed value Θ(0) for the parameter vector, carry out the following iterations:
Execution of the algorithm
The two steps are repeated iteratively until the difference between two successive iterations is less than 0.00001. This iterative procedure leads to a monotonic increase of the log-likelihood log L(Θ; t):
Since the likelihood increases at each step, the EM algorithm generally converges to a local maximum. When there is no closed-form solution in the M-step, a numerical algorithm such as the Newton-Raphson procedure may be used to compute Θ(k+1) iteratively. In fact, in this paper the Newton-Raphson procedure is used to obtain the maximum likelihood estimates at the (k+1)th iteration as follows:
The iterative procedure is carried out until the difference between two successive solutions is sufficiently small. For more details, see Hanagal and Ahmadi.18
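The generic loop can be illustrated on a simpler model where every quantity has a closed form: right-censored exponential data. There the complete-data MLE of the rate is n divided by the total of the life times, and the E-step replaces each censored life time c by its conditional expectation c + 1/λ (by the memoryless property). This toy example, not the paper's bivariate model, is meant only to show the E/M alternation and the 0.00001-type stopping rule:

```python
def em_exponential(obs, cens, lam=1.0, tol=1e-5, max_iter=1000):
    """EM for exponential life times under right censoring.
    obs: fully observed life times; cens: censoring times of censored units."""
    n = len(obs) + len(cens)
    for _ in range(max_iter):
        # E-step: expected life time of a unit censored at c is c + 1/lam.
        expected_total = sum(obs) + sum(c + 1.0 / lam for c in cens)
        # M-step: complete-data MLE of the rate.
        lam_new = n / expected_total
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam

obs = [0.5, 1.2, 0.3, 2.0]
cens = [1.0, 1.5]
lam_hat = em_exponential(obs, cens)
```

The fixed point of this iteration is the usual censored-data MLE, i.e. the number of observed failures divided by the total time at risk, which provides an easy correctness check.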
The density function of (X,Y) is given by
For the bivariate life time distribution, we use the univariate censoring scheme presented by Hanagal,8,9 in which persons do not join the study at the same time, and withdrawal or death of a person, or termination of the study, censors the life times of both components. The censoring time is independent of the life times of both components. This is the standard univariate right-censoring scheme.
Suppose that there are n independent pairs of components under study (for example, paired kidneys, lungs, eyes or ears of an individual), and let the ith pair of components have life times (Xi, Yi) and censoring time Ti. The observed life times associated with the ith pair of components are given by
for i = 1, 2, ..., n,
where Tix, Tiy and Tixy represent the unobserved random variables corresponding to the cases x < y, x > y and x = y, respectively.
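With univariate right censoring, each triple (Xi, Yi, Ti) falls into one of six disjoint outcome classes: both failures observed with x < y, x > y, or x = y; exactly one component censored; or both censored. One plausible labelling of the six ranges (our own convention for illustration; the paper's ordering did not survive extraction) can be coded as:

```python
def classify(x, y, t):
    """Return the outcome class j in 1..6 for life times (x, y)
    subject to a common censoring time t."""
    if x < t and y < t:          # both failures observed before t
        if x < y:
            return 1
        if x > y:
            return 2
        return 3                 # x == y: simultaneous failure observed
    if x < t <= y:
        return 4                 # X observed, Y censored at t
    if y < t <= x:
        return 5                 # Y observed, X censored at t
    return 6                     # both life times censored at t
```

Counting how many of the n pairs land in each class gives the quantities nj, j = 1, ..., 6, appearing in the likelihood below.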
The likelihood of a sample of size n, after discarding factors that do not contain any of the parameters of interest, is given as follows:
where
where nj, j = 1, 2, ..., 6, denote the numbers of realizations falling in the ranges corresponding to the six possible outcomes. The first three densities are density functions with respect to the Lebesgue measure on R2, while the remaining three are density functions with respect to the Lebesgue measure on R.
Let the range of variability corresponding to outcome j be denoted by Aj, j = 1, 2, ..., 6; the Aj are disjoint sets. With this notation, the log-likelihood function can be written as follows:
Parameter estimation with the EM algorithm
The E-step and M-step are obtained as follows:

E-step:
The unobserved random variables Tix, Tiy and Tixy follow the distributions stated in (7). Their conditional distributions are given as follows:
The values of the first moments of the conditional unobserved random variables are as follows.
The conditional expectation is obtained as follows:
Suppose θ is known and the conditional expectations are expressed in terms of the parameter values at the kth iteration.
M-step:
The following equations are obtained by equating to zero the partial derivatives of the expected complete-data log-likelihood with respect to the parameters:
where
The above likelihood equations are solved for the maximum likelihood estimates using the Newton-Raphson procedure. The observed elements of the symmetric information matrix at the current guess, which are required in the Newton-Raphson iterative procedure, are given below.
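As a one-parameter illustration of the Newton-Raphson step (again a sketch, not the paper's six-parameter system): with θ known, an IL(α, θ) sample has score n/α − S, where S = Σ log(1 + θ/xi), and information n/α², so the root can be checked against the closed form α̂ = n/S:

```python
import math

def nr_alpha_hat(xs, theta, alpha=1.0, tol=1e-8, max_iter=100):
    """Newton-Raphson for the score equation n/alpha - S = 0
    of an IL(alpha, theta) sample with theta known."""
    n = len(xs)
    S = sum(math.log(1.0 + theta / x) for x in xs)
    for _ in range(max_iter):
        score = n / alpha - S
        info = n / alpha**2            # observed information (minus second derivative)
        alpha_new = alpha + score / info
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha

alpha_hat = nr_alpha_hat([2.0, 3.0, 1.5, 4.0], 1.0)
```

In the paper's setting the same update is applied with the full score vector and the symmetric information matrix in place of the scalars above.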
As mentioned above, the iterative process is carried out until the following condition is met by two successive solutions:
Parameter estimation without the EM algorithm
The likelihood of a sample of size n, after discarding factors that do not contain any of the parameters of interest, is given as follows:
where
The log-likelihood function can be written as follows:
The following likelihood equations are obtained by equating to zero the partial derivatives of the log-likelihood with respect to the parameters:
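For comparison with the EM route, the direct ("without EM") approach can be illustrated on the same censored-exponential toy model used earlier: the observed-data log-likelihood is l(λ) = d log λ − λT, with d the number of observed failures and T the total time at risk, and its score equation d/λ − T = 0 is solved here by Newton-Raphson (a sketch on a simpler model, not the paper's system):

```python
def direct_mle_exponential(obs, cens, lam=1.0, tol=1e-8, max_iter=100):
    """Maximize the observed-data log-likelihood of right-censored
    exponential data by Newton-Raphson on the score d/lam - T."""
    d = len(obs)                       # number of observed failures
    T = sum(obs) + sum(cens)           # total time at risk
    for _ in range(max_iter):
        score = d / lam - T
        info = d / lam**2              # minus the second derivative
        lam_new = lam + score / info
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam

lam_direct = direct_mle_exponential([0.5, 1.2, 0.3, 2.0], [1.0, 1.5])
```

Both routes converge to the same maximizer here, mirroring the paper's finding that the two methods agree for large samples while differing in their finite-sample standard errors.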
The sample data are generated according to the following algorithm:
Step 1: Generate observations from the Lomax distribution with the specified parameters.
Step 2: Take the reciprocals, so that (X, Y) follows a bivariate Inverse Lomax distribution of Marshall-Olkin type.
Step 3: Generate ti using the Inverse Lomax distribution, where the ti are the censoring times.
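The three steps can be sketched as follows, assuming (as step 2 indicates) that inverting Marshall-Olkin bivariate Lomax components yields the bivariate Inverse Lomax pair; the parameter names and the specific censoring-distribution parameters are our illustrative choices:

```python
import random

def lomax_sample(alpha, theta, rng):
    # Inverse transform for the Lomax CDF 1 - (1 + y/theta)^(-alpha).
    u = rng.random()
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def il_sample(alpha, theta, rng):
    # The reciprocal of a Lomax variate is Inverse Lomax.
    return 1.0 / lomax_sample(alpha, theta, rng)

def generate_sample(n, a1, a2, a3, theta, ac, thetac, seed=0):
    """Generate n right-censored pairs from the Marshall-Olkin-type model.

    Step 1: draw independent Lomax variates u1, u2, u3.
    Step 2: x = 1/min(u1, u3), y = 1/min(u2, u3) gives the Inverse Lomax pair.
    Step 3: draw an Inverse Lomax censoring time t (parameters ac, thetac
    are an assumption for illustration).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u1 = lomax_sample(a1, theta, rng)
        u2 = lomax_sample(a2, theta, rng)
        u3 = lomax_sample(a3, theta, rng)
        x, y = 1.0 / min(u1, u3), 1.0 / min(u2, u3)
        t = il_sample(ac, thetac, rng)
        # Record the censored observations together with status flags.
        data.append((min(x, t), min(y, t), x <= t, y <= t))
    return data

sample = generate_sample(20, 1.6, 1.4, 1.8, 1.0, 0.6, 1.0)
```

Since the minimum of independent Lomax variates with a common θ is again Lomax, this construction reproduces the Marshall-Olkin bivariate Lomax before inversion.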
For two cases of parameter values, we generated 1000 sets of samples, each consisting of samples of sizes n = 20, 35, 50 and 100. The corresponding maximum likelihood estimates are displayed in Table 1 and Table 2 together with the empirical standard deviations. The estimates denoted by MLEem and SEem are obtained using the EM algorithm, while the estimates denoted by MLE and SE are obtained without the EM algorithm.
| Parameters | α1 | α2 | α3 | α4 | α5 | α6 |
|---|---|---|---|---|---|---|
| True values | 1.6 | 1.4 | 1.8 | 0.6 | 0.5 | 0.4 |
| n = 20 |  |  |  |  |  |  |
| MLEem | 1.6113 | 1.3882 | 1.8154 | 0.6131 | 0.4812 | 0.3874 |
| SEem | 0.0771 | 0.0682 | 0.0854 | 0.0865 | 0.0967 | 0.0793 |
| MLE | 1.6173 | 1.3845 | 1.8193 | 0.6216 | 0.4735 | 0.3818 |
| SE | 0.0943 | 0.0817 | 0.0975 | 0.0972 | 0.1076 | 0.0926 |
| n = 35 |  |  |  |  |  |  |
| MLEem | 1.6092 | 1.3914 | 1.8122 | 0.6092 | 0.4885 | 0.3917 |
| SEem | 0.0723 | 0.0637 | 0.0761 | 0.0782 | 0.0913 | 0.0725 |
| MLE | 1.6117 | 1.3902 | 1.8164 | 0.6173 | 0.4834 | 0.3861 |
| SE | 0.0788 | 0.0636 | 0.0826 | 0.0841 | 0.1015 | 0.0844 |
| n = 50 |  |  |  |  |  |  |
| MLEem | 1.6064 | 1.3931 | 1.8093 | 0.6062 | 0.4921 | 0.3943 |
| SEem | 0.0681 | 0.0586 | 0.0717 | 0.0726 | 0.0813 | 0.0637 |
| MLE | 1.6113 | 1.3915 | 1.8124 | 0.6125 | 0.4876 | 0.3903 |
| SE | 0.0714 | 0.0622 | 0.0773 | 0.0796 | 0.0972 | 0.0779 |
| n = 100 |  |  |  |  |  |  |
| MLEem | 1.6033 | 1.3947 | 1.8068 | 0.6046 | 0.4969 | 0.3973 |
| SEem | 0.0655 | 0.0561 | 0.0693 | 0.0597 | 0.0772 | 0.0589 |
| MLE | 1.6055 | 1.3968 | 1.8059 | 0.6071 | 0.4934 | 0.3942 |
| SE | 0.0658 | 0.0572 | 0.0714 | 0.0615 | 0.0786 | 0.0603 |

Table 1 Comparison of MLEs obtained with and without the EM algorithm
Table 1 and Table 2 show that the estimates obtained using the EM algorithm have smaller empirical standard errors than those obtained without the EM algorithm.
| Parameters | α1 | α2 | α3 | α4 | α5 | α6 |
|---|---|---|---|---|---|---|
| True values | 2.5 | 2 | 2.2 | 1.1 | 0.9 | 1.4 |
| n = 20 |  |  |  |  |  |  |
| MLEem | 2.613 | 2.221 | 2.414 | 1.289 | 1.089 | 1.603 |
| SEem | 0.141 | 0.135 | 0.157 | 0.167 | 0.149 | 0.163 |
| MLE | 2.777 | 2.335 | 2.575 | 1.432 | 1.176 | 1.699 |
| SE | 0.173 | 0.169 | 0.189 | 0.192 | 0.176 | 0.194 |
| n = 35 |  |  |  |  |  |  |
| MLEem | 2.587 | 2.198 | 2.385 | 1.176 | 1.054 | 1.547 |
| SEem | 0.109 | 0.119 | 0.132 | 0.123 | 0.135 | 0.152 |
| MLE | 2.711 | 2.282 | 2.513 | 1.356 | 1.103 | 1.621 |
| SE | 0.148 | 0.135 | 0.159 | 0.184 | 0.161 | 0.183 |
| n = 50 |  |  |  |  |  |  |
| MLEem | 2.534 | 2.145 | 2.313 | 1.142 | 0.987 | 1.498 |
| SEem | 0.089 | 0.097 | 0.114 | 0.114 | 0.119 | 0.124 |
| MLE | 2.665 | 2.214 | 2.478 | 1.258 | 1.067 | 1.557 |
| SE | 0.123 | 0.123 | 0.137 | 0.162 | 0.149 | 0.168 |
| n = 100 |  |  |  |  |  |  |
| MLEem | 2.516 | 2.061 | 2.257 | 1.122 | 0.931 | 1.433 |
| SEem | 0.078 | 0.068 | 0.096 | 0.079 | 0.088 | 0.104 |
| MLE | 2.608 | 2.111 | 2.319 | 1.207 | 0.987 | 1.519 |
| SE | 0.117 | 0.109 | 0.121 | 0.123 | 0.117 | 0.147 |

Table 2 Comparison of MLEs obtained with and without the EM algorithm
Moreover, the estimates MLEem are close to the true parameter values, and the standard errors SEem decrease as the sample size increases. For both methods, the reported values are the means of the 1000 maximum likelihood estimates and of the 1000 standard errors over the 1000 samples of each size n = 20, 35, 50 and 100.
Also note that, for large sample sizes, the estimates of the two methods (with and without the EM algorithm) approach each other, as do their standard errors. This means that for large samples both methods give comparable results, and whichever is more convenient may be applied.
©2019 Abdelhady. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.