Komal distribution with properties and application in survival analysis

Rama Shanker

doi:10.15406/bbij.2023.12.00381

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 12 Issue 2


Department of Statistics, Assam University, Silchar, Assam, India

Correspondence: Department of Statistics, Assam University, Silchar, Assam, India

Received: April 05, 2023 | Published: April 25, 2023

Citation: Shanker R. Komal distribution with properties and application in survival analysis. Biom Biostat Int J. 2023;12(2):40-44. DOI: 10.15406/bbij.2023.12.00381


Abstract

The modeling and analysis of lifetime data are becoming a challenge for statisticians and policy makers because lifetime data are, in general, stochastic in nature. During recent decades several one parameter lifetime distributions have been proposed, but, because of the nature of these distributions and the stochastic nature of the data, they do not always fit well. In this paper an attempt has been made to propose a new one parameter lifetime distribution, named the Komal distribution. Its statistical properties, the estimation of its parameter and an application to a lifetime dataset are presented.

Keywords: lifetime distributions, statistical properties, estimation of parameter, applications

Introduction

In the present era, modeling lifetime data is a serious challenge because lifetime data are stochastic in nature, and policy makers often struggle to find a suitable distribution for them. During recent decades several one parameter lifetime distributions have been proposed in the statistics literature, but, owing to the nature of either the distribution or the data, they do not always give a proper fit, and researchers in distribution theory continue to propose new lifetime distributions suited to the stochastic nature of lifetime data. Up to 1958, the exponential distribution was essentially the only one parameter lifetime distribution in use for the analysis and modeling of lifetime data. Lindley¹ then proposed the Lindley distribution, and Ghitany et al.², after a detailed study of its statistical properties and applications, observed that the Lindley distribution gives a much closer fit than the exponential distribution. In a comparative study of the exponential and Lindley distributions, Shanker et al.³ observed that the two distributions compete well, but that there are some datasets for which neither provides a good fit. Shanker⁴,⁵ proposed two new one parameter lifetime distributions, namely the Shanker and Akash distributions, which gave much better fits than both the exponential and Lindley distributions. Shanker et al.⁶ provided a comparative study of applications of the exponential, Lindley and Akash distributions. Further, Shanker & Hagos⁷ presented a detailed study of applications of the exponential, Lindley, Shanker and Akash distributions and showed that there are still some datasets for which these distributions do not provide a good fit. Shanker⁸ then introduced the Sujatha distribution, which provides a much better fit than the exponential, Lindley, Shanker and Akash distributions.
Again, Shanker⁹ proposed another one parameter lifetime distribution, named the Garima distribution, to model data arising from the behavioral sciences, but it also does not give a good fit to several real lifetime datasets. The question, then, is to find a distribution that is both flexible and tractable enough to capture the variation in such datasets. When a distribution does not fit well, some researchers prefer to transform the dataset to satisfy the assumptions of the distribution, but this is not preferable because the original nature of the dataset is lost. Others modify the distribution by adding an extra shape or scale parameter to suit the dataset. Rather than transforming the original dataset or modifying a distribution to suit it, it is better to search for a new distribution that provides a better fit when the existing distributions fail to do so.

In the present paper an attempt has been made to propose a new one parameter lifetime distribution, named the Komal distribution, with the aim of providing a better fit than the exponential, Lindley, Shanker, Akash and Sujatha distributions. Some of its statistical properties, the estimation of its parameter and an application to a real lifetime dataset are discussed.

Komal distribution

Taking the convex combination of exponential $(θ)$ and gamma $(2, θ)$ distributions with respective mixing proportions $\frac{θ (θ + 1)}{θ^{2} + θ + 1}$ and $\frac{1}{θ^{2} + θ + 1}$ , a new probability density function (pdf) can be expressed as

$f (x; θ) = \frac{θ^{2}}{θ^{2} + θ + 1} (1 + θ + x) e^{- θ x}; x > 0, θ > 0$

We call this new distribution the ‘Komal distribution’. Since, like several other one parameter lifetime distributions, it is a convex combination of exponential and gamma distributions, it is expected to give a better fit than the exponential distribution and other one parameter distributions derived from convex combinations of exponential and gamma distributions. The cumulative distribution function (cdf) of the Komal distribution is obtained by integrating the pdf:

$F (x; θ) = 1 - [1 + \frac{θ x}{θ^{2} + θ + 1}] e^{- θ x}; x > 0, θ > 0$
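The mixture construction and the closed-form cdf can be checked numerically. The Python sketch below uses only the standard library and is illustrative rather than part of the paper. The helper names `komal_pdf`, `komal_cdf`, `mixture_pdf` and `simpson`, and the value θ = 0.5, are hypothetical choices. The sketch compares the Komal pdf with the weighted exponential/gamma mixture, and compares the closed-form cdf with a Simpson-rule integral of the pdf. Both differences should be at rounding level.

```python
import math


def komal_pdf(x: float, theta: float) -> float:
    """Komal pdf: theta^2/(theta^2+theta+1) * (1+theta+x) * exp(-theta*x) for x > 0."""
    return theta**2 / (theta**2 + theta + 1) * (1 + theta + x) * math.exp(-theta * x)


def komal_cdf(x: float, theta: float) -> float:
    """Closed-form cdf: 1 - [1 + theta*x/(theta^2+theta+1)] * exp(-theta*x)."""
    return 1 - (1 + theta * x / (theta**2 + theta + 1)) * math.exp(-theta * x)


def mixture_pdf(x: float, theta: float) -> float:
    """Convex combination of the exponential(theta) and gamma(2, theta) densities."""
    a = theta**2 + theta + 1
    w_exp = theta * (theta + 1) / a          # mixing proportion of exponential(theta)
    w_gam = 1 / a                            # mixing proportion of gamma(2, theta)
    exp_pdf = theta * math.exp(-theta * x)
    gamma2_pdf = theta**2 * x * math.exp(-theta * x)
    return w_exp * exp_pdf + w_gam * gamma2_pdf


def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


theta = 0.5  # illustrative value, not taken from the text
for x in (0.5, 2.0, 5.0):
    numeric_cdf = simpson(lambda t: komal_pdf(t, theta), 0.0, x)
    print(f"x={x}: pdf-mixture={komal_pdf(x, theta) - mixture_pdf(x, theta):.2e}, "
          f"cdf-integral={komal_cdf(x, theta) - numeric_cdf:.2e}")
```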

The behaviour of the pdf and the cdf of the Komal distribution for varying values of the parameter $θ$ is presented in Figures 1 & 2, respectively.

Figure 1 Graphs of the pdf of Komal distribution for selected values of the parameter.

Figure 2 Graphs of the cdf of Komal distribution for selected values of the parameter.

Descriptive measures of Komal distribution

Moments are essential for obtaining descriptive measures of a distribution, such as the coefficient of variation, skewness, kurtosis and index of dispersion. Following the approach used by Shanker⁴,⁵ to obtain the $r$ th moments of the Shanker and Akash distributions, the $r$ th moment about the origin $μ_{r}^{'}$ of the Komal distribution is

$μ_{r}^{'} = E (X^{r}) = \frac{θ^{2}}{θ^{2} + θ + 1} \int_{0}^{\infty} x^{r} (1 + θ + x) e^{- θ x} d x$

$= \frac{θ^{2}}{θ^{2} + θ + 1} [\frac{(1 + θ) r!}{θ^{r + 1}} + \frac{(r + 1)!}{θ^{r + 2}}] = \frac{r! (θ^{2} + θ + r + 1)}{θ^{r} (θ^{2} + θ + 1)}; r = 1, 2, 3, \ldots$ (3.1)

Substituting $r = 1, 2, 3, 4$ in (3.1), the first four moments about origin of Komal distribution can be obtained as

$μ_{1}^{'} = \frac{θ^{2} + θ + 2}{θ (θ^{2} + θ + 1)}$ , $μ_{2}^{'} = \frac{2 (θ^{2} + θ + 3)}{θ^{2} (θ^{2} + θ + 1)}$

$μ_{3}^{'} = \frac{6 (θ^{2} + θ + 4)}{θ^{3} (θ^{2} + θ + 1)}$ , $μ_{4}^{'} = \frac{24 (θ^{2} + θ + 5)}{θ^{4} (θ^{2} + θ + 1)}$ .
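Formula (3.1) can be cross-checked against direct numerical integration of $x^{r} f (x; θ)$. In this sketch, `raw_moment` and `raw_moment_numeric` are hypothetical helper names. Simpson's rule stands in for exact integration, and the integral is truncated at x = 200, which is assumed to be large enough for the illustrative θ = 1.

```python
import math


def raw_moment(r: int, theta: float) -> float:
    """Closed form (3.1): r! (theta^2 + theta + r + 1) / (theta^r (theta^2 + theta + 1))."""
    return math.factorial(r) * (theta**2 + theta + r + 1) / (theta**r * (theta**2 + theta + 1))


def raw_moment_numeric(r: int, theta: float, upper: float = 200.0, n: int = 20000) -> float:
    """E(X^r) by composite Simpson's rule on [0, upper]; the tail beyond `upper` is ignored."""
    c = theta**2 / (theta**2 + theta + 1)

    def integrand(x: float) -> float:
        return x**r * c * (1 + theta + x) * math.exp(-theta * x)

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


theta = 1.0  # illustrative value
for r in (1, 2, 3, 4):
    print(r, raw_moment(r, theta), raw_moment_numeric(r, theta))
```

At θ = 1 the closed form gives $μ_{1}^{'} = 4/3$ and $μ_{4}^{'} = 24 \cdot 7/3 = 56$.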

The moments about the mean of Komal distribution, using relationship between moments about the mean and the moments about the origin, can thus be obtained as

$μ_{2} = \frac{θ^{4} + 2 θ^{3} + 5 θ^{2} + 4 θ + 2}{θ^{2} {(θ^{2} + θ + 1)}^{2}}$

$μ_{3} = \frac{2 (θ^{6} + 3 θ^{5} + 9 θ^{4} + 13 θ^{3} + 12 θ^{2} + 6 θ + 2)}{θ^{3} {(θ^{2} + θ + 1)}^{3}}$

$μ_{4} = \frac{3 (3 θ^{8} + 12 θ^{7} + 42 θ^{6} + 84 θ^{5} + 119 θ^{4} + 112 θ^{3} + 76 θ^{2} + 32 θ + 8)}{θ^{4} {(θ^{2} + θ + 1)}^{4}}$ .
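The closed-form central moments can be checked against the standard identities:

- $μ_{2} = μ_{2}^{'} - {μ_{1}^{'}}^{2}$
- $μ_{3} = μ_{3}^{'} - 3 μ_{2}^{'} μ_{1}^{'} + 2 {μ_{1}^{'}}^{3}$
- $μ_{4} = μ_{4}^{'} - 4 μ_{3}^{'} μ_{1}^{'} + 6 μ_{2}^{'} {μ_{1}^{'}}^{2} - 3 {μ_{1}^{'}}^{4}$

The sketch below evaluates both routes. The function names and θ values are hypothetical choices for illustration. At θ = 1 both routes give $μ_{2} = 14/9$, $μ_{3} = 92/27$ and $μ_{4} = 488/27$.

```python
import math


def raw_moment(r: int, theta: float) -> float:
    """Raw moment from (3.1)."""
    return math.factorial(r) * (theta**2 + theta + r + 1) / (theta**r * (theta**2 + theta + 1))


def central_moments_from_raw(theta: float):
    """mu2, mu3, mu4 via the usual raw-to-central moment identities."""
    m1, m2, m3, m4 = (raw_moment(r, theta) for r in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4


def central_moments_closed(theta: float):
    """The closed-form polynomials for mu2, mu3, mu4 given in the text."""
    t = theta
    a = t**2 + t + 1
    mu2 = (t**4 + 2*t**3 + 5*t**2 + 4*t + 2) / (t**2 * a**2)
    mu3 = 2 * (t**6 + 3*t**5 + 9*t**4 + 13*t**3 + 12*t**2 + 6*t + 2) / (t**3 * a**3)
    mu4 = 3 * (3*t**8 + 12*t**7 + 42*t**6 + 84*t**5 + 119*t**4
               + 112*t**3 + 76*t**2 + 32*t + 8) / (t**4 * a**4)
    return mu2, mu3, mu4


for theta in (0.2, 1.0, 3.0):  # illustrative values
    print(theta, central_moments_from_raw(theta), central_moments_closed(theta))
```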

The descriptive constants including coefficient of variation (CV), coefficient of skewness (CS), coefficient of kurtosis (CK) and the index of dispersion (ID) of Komal distribution are thus obtained as

$C V = \frac{\sqrt{μ_{2}}}{μ_{1}^{'}} = \frac{\sqrt{θ^{4} + 2 θ^{3} + 5 θ^{2} + 4 θ + 2}}{θ^{2} + θ + 2}$

$C S = \frac{μ_{3}^{2}}{μ_{2}^{3}} = \frac{4 {(θ^{6} + 3 θ^{5} + 9 θ^{4} + 13 θ^{3} + 12 θ^{2} + 6 θ + 2)}^{2}}{{(θ^{4} + 2 θ^{3} + 5 θ^{2} + 4 θ + 2)}^{3}}$

$C K = \frac{μ_{4}}{μ_{2}^{2}} = \frac{3 (3 θ^{8} + 12 θ^{7} + 42 θ^{6} + 84 θ^{5} + 119 θ^{4} + 112 θ^{3} + 76 θ^{2} + 32 θ + 8)}{{(θ^{4} + 2 θ^{3} + 5 θ^{2} + 4 θ + 2)}^{2}}$

$I D = \frac{μ_{2}}{μ_{1}^{'}} = \frac{θ^{4} + 2 θ^{3} + 5 θ^{2} + 4 θ + 2}{θ (θ^{2} + θ + 1) (θ^{2} + θ + 2)}$ .

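The behaviour of the coefficient of variation (CV), coefficient of skewness (CS), coefficient of kurtosis (CK) and index of dispersion (ID) of the Komal distribution for changing values of the parameter is shown in Figure 3. The CV, CS and CK increase with $θ$, whereas the ID decreases (for instance, $C V^{2} = 1 - \frac{2}{{(θ^{2} + θ + 2)}^{2}}$ is clearly increasing in $θ$). As $θ$ increases from 0 towards $\infty$, CV rises from $1/\sqrt{2}$ towards 1, CS rises from 2 towards 4 and CK rises from 6 towards 9. This is consistent with the Komal distribution approaching the gamma $(2, θ)$ distribution as $θ \to 0$ and the exponential $(θ)$ distribution as $θ \to \infty$.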

Figure 3 Graph of CV, CS, CK and ID of Komal distribution for values of the parameter.
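The trends plotted in Figure 3 can be reproduced from the closed forms above. This sketch evaluates CV, CS, CK and ID on a grid of θ values, with CS taken as $μ_{3}^{2}/μ_{2}^{3}$ as defined above. The grid and the function name are assumptions for illustration.

```python
import math


def descriptive_constants(theta: float):
    """Return (CV, CS, CK, ID) from the closed forms, with CS = mu3^2 / mu2^3 as in the text."""
    t = theta
    a = t**2 + t + 1
    p2 = t**4 + 2*t**3 + 5*t**2 + 4*t + 2
    p3 = t**6 + 3*t**5 + 9*t**4 + 13*t**3 + 12*t**2 + 6*t + 2
    p4 = (3*t**8 + 12*t**7 + 42*t**6 + 84*t**5 + 119*t**4
          + 112*t**3 + 76*t**2 + 32*t + 8)
    cv = math.sqrt(p2) / (t**2 + t + 2)
    cs = 4 * p3**2 / p2**3
    ck = 3 * p4 / p2**2
    index_of_dispersion = p2 / (t * a * (t**2 + t + 2))
    return cv, cs, ck, index_of_dispersion


grid = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]  # illustrative theta values
table = [descriptive_constants(t) for t in grid]
for t, (cv, cs, ck, idx) in zip(grid, table):
    print(f"theta={t:5.2f}  CV={cv:.4f}  CS={cs:.4f}  CK={ck:.4f}  ID={idx:.4f}")
```

On this grid CV, CS and CK increase with θ while ID decreases. For very small θ the kurtosis is close to 6 (the gamma(2, θ) value), and for large θ it is close to 9 (the exponential value).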

Reliability properties of Komal distribution

Hazard rate function

The hazard rate function of a random variable $X$ having pdf $f (x; θ)$ and cdf $F (x; θ)$ is defined as

$h (x; θ) = \lim_{Δ x \to 0} \frac{P (X < x + Δ x | X > x)}{Δ x} = \frac{f (x; θ)}{1 - F (x; θ)}$

Thus, the hazard rate function of the Komal distribution is

$h (x; θ) = \frac{θ^{2} (1 + θ + x)}{θ^{2} + θ + 1 + θ x}$ .

This gives $h (0; θ) = \frac{θ^{2} (θ + 1)}{θ^{2} + θ + 1} = f (0; θ)$ . Since $\frac{d}{d x} \frac{1 + θ + x}{θ^{2} + θ + 1 + θ x} = \frac{1}{{(θ^{2} + θ + 1 + θ x)}^{2}} > 0$ , the hazard rate of the Komal distribution is monotonically increasing in $x$, rising from $f (0; θ)$ towards $θ$ as $x \to \infty$. The behaviour of the hazard rate function for various values of the parameter $θ$ is shown in Figure 4: as $θ$ increases, the hazard rate curve shifts upward.

Figure 4 Graphs of the hazard rate function of Komal distribution for selected values of the parameter.
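As a short numerical check of the hazard rate, this sketch compares the closed form with $f/(1 - F)$ computed from the pdf and cdf. It uses an illustrative θ = 0.5, and the function names are hypothetical. The values should agree, increase with $x$, start at $f (0; θ)$ and stay below θ.

```python
import math


def komal_hazard(x: float, theta: float) -> float:
    """Closed form h(x; theta) = theta^2 (1 + theta + x) / (theta^2 + theta + 1 + theta x)."""
    return theta**2 * (1 + theta + x) / (theta**2 + theta + 1 + theta * x)


def hazard_from_definition(x: float, theta: float) -> float:
    """f(x; theta) / (1 - F(x; theta)) using the pdf and the closed-form cdf."""
    a = theta**2 + theta + 1
    pdf = theta**2 / a * (1 + theta + x) * math.exp(-theta * x)
    survival = (1 + theta * x / a) * math.exp(-theta * x)
    return pdf / survival


theta = 0.5  # illustrative value
xs = [0.0, 1.0, 5.0, 20.0, 100.0]
hs = [komal_hazard(x, theta) for x in xs]
for x, h in zip(xs, hs):
    print(f"x={x:6.1f}  h={h:.6f}  f/S={hazard_from_definition(x, theta):.6f}")
```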

Mean residual life function

Let $X$ be a random variable with support $(0, \infty)$ representing the lifetime of a system. The mean residual life (MRL) function gives the expected remaining lifetime of the system, given that it has survived up to time $x$ . Consider the conditional random variable $X_{x} = (X - x | X > x); x > 0$ . Then the MRL function, denoted by $m (x)$ , is defined as

$m (x) = E (X_{x}) = \frac{1}{S (x)} \int_{x}^{\infty} [1 - F (t)] d t$ , where $S (x) = 1 - F (x)$ is the survival function.

Using the equivalent form $m (x) = \frac{1}{S (x)} \int_{x}^{\infty} t f (t) d t - x$ , the MRL function of the Komal distribution is

$m (x) = \frac{1}{(θ^{2} + θ + 1 + θ x) e^{- θ x}} \int_{x}^{\infty} θ^{2} t (1 + θ + t) e^{- θ t} d t - x = \frac{θ^{2} + θ + 2 + θ x}{θ (θ^{2} + θ + 1 + θ x)}$ .

This gives $m (0) = \frac{θ^{2} + θ + 2}{θ (θ^{2} + θ + 1)} = μ_{1}^{'}$ . The behaviour of the mean residual life function of the Komal distribution for various values of the parameter is shown in Figure 5. Since $m (x) = \frac{1}{θ} [1 + \frac{1}{θ^{2} + θ + 1 + θ x}]$ , the mean residual life function is monotonically decreasing in $x$ and tends to $1/θ$ as $x \to \infty$.

Figure 5 Graphs of the mean residual life function of Komal distribution for values of parameter.
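The closed-form MRL can be compared with a direct evaluation of $\frac{1}{S (x)} \int_{x}^{\infty} S (t) d t$. This sketch truncates the integral at $x + 200$ and uses Simpson's rule. Both are assumptions, adequate for the illustrative θ = 0.5. The helper names are hypothetical.

```python
import math


def komal_survival(x: float, theta: float) -> float:
    """S(x) = [1 + theta x / (theta^2 + theta + 1)] exp(-theta x)."""
    return (1 + theta * x / (theta**2 + theta + 1)) * math.exp(-theta * x)


def komal_mrl(x: float, theta: float) -> float:
    """Closed form m(x) = (theta^2 + theta + 2 + theta x) / (theta (theta^2 + theta + 1 + theta x))."""
    return (theta**2 + theta + 2 + theta * x) / (theta * (theta**2 + theta + 1 + theta * x))


def mrl_numeric(x: float, theta: float, span: float = 200.0, n: int = 20000) -> float:
    """(1/S(x)) * integral of S(t) over [x, x + span], by composite Simpson's rule."""
    h = span / n
    total = komal_survival(x, theta) + komal_survival(x + span, theta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * komal_survival(x + i * h, theta)
    return (total * h / 3) / komal_survival(x, theta)


theta = 0.5  # illustrative value
xs = [0.0, 2.0, 10.0]
ms = [komal_mrl(x, theta) for x in xs]
for x, m in zip(xs, ms):
    print(f"x={x:5.1f}  closed form={m:.8f}  numeric={mrl_numeric(x, theta):.8f}")
```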

Stochastic ordering

In probability theory and statistics, a stochastic order quantifies the concept of one random variable being bigger than another. A random variable $X$ is said to be smaller than a random variable $Y$ in the

Stochastic order $(X \leq_{st} Y)$ if $F_{X} (x) \geq F_{Y} (x)$ for all $x$
Hazard rate order $(X \leq_{hr} Y)$ if $h_{X} (x) \geq h_{Y} (x)$ for all $x$
Mean residual life order $(X \leq_{mrl} Y)$ if $m_{X} (x) \leq m_{Y} (x)$ for all $x$
Likelihood ratio order $(X \leq_{lr} Y)$ if $\frac{f_{X} (x)}{f_{Y} (x)}$ decreases in $x$

The following results due to Shaked & Shanthikumar¹⁰ are well known for establishing the stochastic ordering of distributions:

$X \leq_{lr} Y \Rightarrow X \leq_{hr} Y \Rightarrow X \leq_{mrl} Y$ and $X \leq_{hr} Y \Rightarrow X \leq_{st} Y$

Theorem: Let $X \sim$ Komal $(θ_{1})$ and $Y \sim$ Komal $(θ_{2})$ . If $θ_{1} > θ_{2}$ , then $X \leq_{lr} Y$ and hence $X \leq_{hr} Y$ , $X \leq_{mrl} Y$ and $X \leq_{st} Y$ .

Proof: We have

$\frac{f_{X} (x; θ_{1})}{f_{Y} (x; θ_{2})} = [\frac{θ_{1}^{2} (θ_{2}^{2} + θ_{2} + 1)}{θ_{2}^{2} (θ_{1}^{2} + θ_{1} + 1)}] (\frac{1 + θ_{1} + x}{1 + θ_{2} + x}) e^{- (θ_{1} - θ_{2}) x}$ .

This gives

$\log [\frac{f_{X} (x; θ_{1})}{f_{Y} (x; θ_{2})}] = \log [\frac{θ_{1}^{2} (θ_{2}^{2} + θ_{2} + 1)}{θ_{2}^{2} (θ_{1}^{2} + θ_{1} + 1)}] + \log (\frac{1 + θ_{1} + x}{1 + θ_{2} + x}) - (θ_{1} - θ_{2}) x$

Therefore, $\frac{d}{d x} \log [\frac{f_{X} (x; θ_{1})}{f_{Y} (x; θ_{2})}] = \frac{θ_{2} - θ_{1}}{(1 + θ_{1} + x) (1 + θ_{2} + x)} - (θ_{1} - θ_{2})$

Thus, for $θ_{1} > θ_{2}$ , $\frac{d}{d x} \log [\frac{f_{X} (x; θ_{1})}{f_{Y} (x; θ_{2})}] < 0$ for all $x > 0$ , so the likelihood ratio is decreasing in $x$. This means that $X \leq_{lr} Y$ and hence $X \leq_{hr} Y$ , $X \leq_{mrl} Y$ and $X \leq_{st} Y$ .
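The key step of the proof can be checked numerically for a particular pair of parameters. This sketch uses the illustrative choice θ₁ = 1.5 > θ₂ = 0.5 (not from the text) and hypothetical function names. It compares the closed-form derivative of the log likelihood ratio with a central finite difference and confirms that the derivative is negative on a grid of $x$ values. It also checks the implied stochastic ordering $S_{X} (x) \leq S_{Y} (x)$.

```python
import math


def komal_pdf(x: float, theta: float) -> float:
    return theta**2 / (theta**2 + theta + 1) * (1 + theta + x) * math.exp(-theta * x)


def komal_survival(x: float, theta: float) -> float:
    return (1 + theta * x / (theta**2 + theta + 1)) * math.exp(-theta * x)


def log_ratio(x: float, t1: float, t2: float) -> float:
    """log[f_X(x; t1) / f_Y(x; t2)]."""
    return math.log(komal_pdf(x, t1)) - math.log(komal_pdf(x, t2))


def dlog_ratio(x: float, t1: float, t2: float) -> float:
    """Closed-form derivative of log_ratio in x, as derived in the proof."""
    return (t2 - t1) / ((1 + t1 + x) * (1 + t2 + x)) - (t1 - t2)


t1, t2 = 1.5, 0.5  # illustrative values with theta_1 > theta_2
xs = [0.1 * k for k in range(1, 301)]
h = 1e-5  # finite-difference step
max_err = max(
    abs(dlog_ratio(x, t1, t2) - (log_ratio(x + h, t1, t2) - log_ratio(x - h, t1, t2)) / (2 * h))
    for x in xs
)
print("max |closed form - finite difference|:", max_err)
print("derivative negative everywhere:", all(dlog_ratio(x, t1, t2) < 0 for x in xs))
print("S_X <= S_Y everywhere:", all(komal_survival(x, t1) <= komal_survival(x, t2) for x in xs))
```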

Parameter estimation of Komal distribution

Method of moment estimate

Since Komal distribution has one parameter, equating the population mean to the corresponding sample mean $(\bar{x})$ , we get the third-degree polynomial equation of parameter $θ$ in the form

$\bar{x} θ^{3} + (\bar{x} - 1) θ^{2} + (\bar{x} - 1) θ - 2 = 0$ .

Solving this third-degree polynomial equation using Newton-Raphson method, we can easily get the moment estimate of the parameter.
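Because $μ_{1}^{'}$ lies strictly between $1/θ$ and $2/θ$, the moment estimate lies between $1/\bar{x}$ and $2/\bar{x}$. By Descartes' rule of signs, the cubic has exactly one positive root. The Python sketch below applies Newton-Raphson starting at $2/\bar{x}$. The cubic is positive there, and it is increasing and convex between the root and $2/\bar{x}$, so the iterates decrease monotonically to the root. The function names are hypothetical. The value $\bar{x} = 10.4975$ is the sample mean of the 20 failure times used in the application section.

```python
def komal_mean(theta: float) -> float:
    """Population mean mu_1' = (theta^2 + theta + 2) / (theta (theta^2 + theta + 1))."""
    return (theta**2 + theta + 2) / (theta * (theta**2 + theta + 1))


def komal_mom(xbar: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Positive root of xbar t^3 + (xbar - 1) t^2 + (xbar - 1) t - 2 = 0 by Newton-Raphson."""
    def g(t):
        return xbar * t**3 + (xbar - 1) * t**2 + (xbar - 1) * t - 2

    def dg(t):
        return 3 * xbar * t**2 + 2 * (xbar - 1) * t + (xbar - 1)

    t = 2.0 / xbar  # g(2 / xbar) > 0, so the iteration starts to the right of the root
    for _ in range(max_iter):
        step = g(t) / dg(t)
        t -= step
        if abs(step) < tol * max(1.0, t):
            break
    return t


xbar = 10.4975  # sample mean of the bulb failure times in the application section
theta_mom = komal_mom(xbar)
print(theta_mom, komal_mean(theta_mom))
```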

Maximum likelihood estimate

Let $(x_{1}, x_{2}, x_{3}, \ldots, x_{n})$ be a random sample of size $n$ from the Komal distribution. The log likelihood function $\log L$ of the Komal distribution is given by

$\log L = \sum_{i = 1}^{n} \log f (x_{i}; θ) = n [2 \log θ - \log (θ^{2} + θ + 1)] + \sum_{i = 1}^{n} \log (1 + θ + x_{i}) - n θ \bar{x}$

The maximum likelihood estimate (MLE) $\hat{θ}$ of the parameter $θ$ of the Komal distribution is the solution of the following log likelihood equation

$\frac{d \log L}{d θ} = \frac{2 n}{θ} - \frac{n (2 θ + 1)}{θ^{2} + θ + 1} + \sum_{i = 1}^{n} \frac{1}{1 + θ + x_{i}} - n \bar{x} = 0$

Since $\frac{2}{θ} - \frac{2 θ + 1}{θ^{2} + θ + 1} = \frac{θ + 2}{θ (θ^{2} + θ + 1)}$ , this simplifies to

$\sum_{i = 1}^{n} \frac{1}{1 + θ + x_{i}} + \frac{n (θ + 2)}{θ (θ^{2} + θ + 1)} - n \bar{x} = 0$ .

This is a non-linear equation in $θ$ . It can be solved by the Newton-Raphson method (available, for example, in R software) to obtain the MLE of $θ$ , taking the moment estimate as the initial value. Note that the moment estimate of the parameter is, in general, not the same as the MLE.
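Below is a minimal Python sketch of this procedure for the bulb failure-time data of the next section. It assumes plain Newton-Raphson rather than any particular R routine, and the function names are hypothetical. Differentiating once more gives

$\frac{d^{2} \log L}{d θ^{2}} = - \sum_{i = 1}^{n} \frac{1}{{(1 + θ + x_{i})}^{2}} - \frac{n (2 θ^{3} + 7 θ^{2} + 4 θ + 2)}{θ^{2} {(θ^{2} + θ + 1)}^{2}} < 0$

so the log likelihood is strictly concave and the root is unique. The standard error is taken from the observed information, $[- d^{2} \log L / d θ^{2}]^{-1/2}$.

```python
import math

# Failure times of 20 electric bulbs (Murthy et al.), as listed in the application section.
DATA = [1.32, 12.37, 6.56, 5.05, 11.58, 10.56, 21.82, 3.60, 1.33, 12.62,
        5.36, 7.71, 3.53, 19.61, 36.63, 0.39, 21.35, 7.22, 12.42, 8.92]


def log_likelihood(theta, xs):
    n = len(xs)
    return (n * (2 * math.log(theta) - math.log(theta**2 + theta + 1))
            + sum(math.log(1 + theta + x) for x in xs) - theta * sum(xs))


def score(theta, xs):
    """Simplified log likelihood equation: d log L / d theta."""
    n = len(xs)
    return (sum(1 / (1 + theta + x) for x in xs)
            + n * (theta + 2) / (theta * (theta**2 + theta + 1)) - sum(xs))


def score_derivative(theta, xs):
    """d^2 log L / d theta^2 (always negative, so log L is strictly concave)."""
    n = len(xs)
    a = theta**2 + theta + 1
    return (-sum(1 / (1 + theta + x) ** 2 for x in xs)
            - n * (2 * theta**3 + 7 * theta**2 + 4 * theta + 2) / (theta**2 * a**2))


def moment_estimate(xbar, tol=1e-12):
    """Positive root of xbar t^3 + (xbar - 1) t^2 + (xbar - 1) t - 2 = 0 (Newton from 2/xbar)."""
    t = 2.0 / xbar
    for _ in range(100):
        g = xbar * t**3 + (xbar - 1) * t**2 + (xbar - 1) * t - 2
        dg = 3 * xbar * t**2 + 2 * (xbar - 1) * t + (xbar - 1)
        step = g / dg
        t -= step
        if abs(step) < tol:
            break
    return t


def komal_mle(xs, tol=1e-12, max_iter=100):
    """Newton-Raphson on the score, started at the moment estimate."""
    theta = moment_estimate(sum(xs) / len(xs))
    for _ in range(max_iter):
        step = score(theta, xs) / score_derivative(theta, xs)
        theta -= step
        if abs(step) < tol:
            break
    return theta


theta_hat = komal_mle(DATA)
se = 1 / math.sqrt(-score_derivative(theta_hat, DATA))  # from the observed information
print(f"theta_hat = {theta_hat:.4f} (SE {se:.4f}), "
      f"-2 log L = {-2 * log_likelihood(theta_hat, DATA):.2f}")
```

On these data the sketch should reproduce the Komal row of Table 2 up to rounding ($\hat{θ} \approx 0.1745$, standard error $\approx 0.0275$, $- 2 \log L \approx 133.33$). The moment estimate differs from the MLE in the fourth decimal place.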

Application of Komal distribution

The applicability and goodness of fit of the Komal distribution are illustrated with a failure time dataset.

Data set: The following right-skewed dataset, relating to the failure times of 20 electric bulbs, was discussed by Murthy et al.¹¹; the observations are:

1.32, 12.37, 6.56, 5.05, 11.58, 10.56, 21.82, 3.60, 1.33, 12.62, 5.36, 7.71, 3.53, 19.61, 36.63, 0.39, 21.35, 7.22, 12.42, 8.92.

For each of the considered distributions, the following have been computed for the given dataset and are presented in Table 2:

- the ML estimate of the parameter, with its standard error in parentheses;
- $- 2 \log L$;
- AIC (Akaike Information Criterion) and AICC (corrected Akaike Information Criterion);
- BIC (Bayesian Information Criterion);
- the K-S (Kolmogorov-Smirnov) statistic and its p-value.

The formulae for computing AIC, AICC, BIC and the K-S statistic are as follows:

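$\mathrm{AIC} = - 2 \log L + 2 k$ , $\mathrm{AICC} = \mathrm{AIC} + \frac{2 k (k + 1)}{n - k - 1}$ , $\mathrm{BIC} = - 2 \log L + k \log n$ , $D = \sup_{x} | F_{n} (x) - F_{0} (x) |$ ,

where $k$ is the number of parameters, $n$ is the sample size, $F_{n} (x)$ is the empirical cdf and $F_{0} (x)$ is the cdf of the fitted distribution.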
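Below is a minimal Python sketch of these formulae. The function names are hypothetical, and BIC is assumed to use the natural logarithm. `ks_statistic` evaluates $D$ at the jumps of the empirical cdf, where the supremum is attained for a continuous $F_{0}$.

Applied to the Komal row of Table 2 ($- 2 \log L = 133.33$, $k = 1$, $n = 20$), the formulae give:

- AIC = 135.33;
- AICC = AIC + 4/18 ≈ 135.55;
- BIC = 133.33 + ln 20 ≈ 136.33.

In every row of Table 2, the AICC and BIC entries instead exceed AIC and $- 2 \log L$ by about 0.57 and 2.19. These are the offsets the formulae give for $n = 9$ rather than $n = 20$. Since every fitted distribution has $k = 1$, the shift is the same for all rows and does not change their ranking.

```python
import math


def information_criteria(m2ll: float, k: int, n: int):
    """AIC, AICC and BIC from -2 log L, k parameters and sample size n (natural log in BIC)."""
    aic = m2ll + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = m2ll + k * math.log(n)
    return aic, aicc, bic


def ks_statistic(xs, cdf):
    """D = sup_x |F_n(x) - F_0(x)|, evaluated at the jumps of the empirical cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


# Komal row of Table 2: -2 log L = 133.33, one parameter, n = 20 observations.
aic, aicc, bic = information_criteria(133.33, k=1, n=20)
print(round(aic, 2), round(aicc, 2), round(bic, 2))
```

`ks_statistic(data, cdf)` accepts any fitted cdf as a one-argument function, for example `lambda x: 1 - math.exp(-0.0952 * x)` for the fitted exponential.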

The pdfs and cdfs of the fitted distributions are given in Table 1.

Distributions	Pdf	Cdf
Garima	$f (x; θ) = \frac{θ}{θ + 2} (1 + θ + θ x) e^{- θ x}$	$F (x; θ) = 1 - (1 + \frac{θ x}{θ + 2}) e^{- θ x}$
Sujatha	$f (x; θ) = \frac{θ^{3}}{θ^{2} + θ + 2} (1 + x + x^{2}) e^{- θ x}$	$F (x; θ) = 1 - [1 + \frac{θ x (θ x + θ + 2)}{θ^{2} + θ + 2}] e^{- θ x}$
Akash	$f (x; θ) = \frac{θ^{3}}{θ^{2} + 2} (1 + x^{2}) e^{- θ x}$	$F (x; θ) = 1 - [1 + \frac{θ x (θ x + 2)}{θ^{2} + 2}] e^{- θ x}$
Shanker	$f (x; θ) = \frac{θ^{2}}{θ^{2} + 1} (θ + x) e^{- θ x}$	$F (x; θ) = 1 - (1 + \frac{θ x}{θ^{2} + 1}) e^{- θ x}$
Lindley	$f (x; θ) = \frac{θ^{2}}{θ + 1} (1 + x) e^{- θ x}$	$F (x; θ) = 1 - [1 + \frac{θ x}{θ + 1}] e^{- θ x}$
Exponential	$f (x; θ) = θ e^{- θ x}$	$F (x; θ) = 1 - e^{- θ x}$

Table 1 The pdf and the Cdf of fitted distributions

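The fitted plots of the considered distributions for the dataset are presented in Figure 6. Table 2 shows that the Komal distribution has the smallest K-S statistic and the largest p-value of all the fitted distributions. It also has lower $- 2 \log L$, AIC, AICC and BIC than the exponential, Lindley, Shanker, Akash and Sujatha distributions; only the Garima distribution attains slightly lower values of these likelihood-based criteria. The goodness of fit in Table 2 and the fitted plots in Figure 6 therefore show that, for this dataset, the Komal distribution provides a better fit than the exponential, Lindley, Shanker, Akash and Sujatha distributions. It can thus be considered a suitable distribution for modeling lifetime data from biomedical science and engineering.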

Distributions	$\hat{θ}$ (SE)	$- 2 \log L$	AIC	AICC	BIC	K-S	P-value
Komal	0.1745 (0.0275)	133.33	135.33	135.90	135.52	0.0992	0.9914
Garima	0.1408 (0.0273)	133.18	135.18	135.75	135.37	0.1255	0.9218
Sujatha	0.2689 (0.0345)	137.54	139.54	140.11	139.73	0.1294	0.9037
Akash	0.2786 (0.0355)	138.47	140.47	141.04	140.66	0.1607	0.6434
Shanker	0.1885 (0.0292)	134.65	136.65	137.22	136.84	0.1172	0.9539
Lindley	0.1762 (0.0280)	133.44	135.44	136.01	135.63	0.1122	0.9684
Exponential	0.0952 (0.0212)	134.04	136.04	136.61	136.23	0.1255	0.9218

Table 2 ML estimates (standard errors in parentheses), $- 2 \log L$ , AIC, AICC, BIC, K-S statistics and p-values of the fitted distributions for the dataset

Figure 6 Fitted plots of distributions considered for dataset.

Conclusions and future works

In this paper a new one parameter lifetime distribution, named the Komal distribution, has been proposed for analysing and modeling lifetime data from biomedical science and engineering. Some of its important statistical properties, the estimation of its parameter and an application to a real lifetime dataset have been presented. Since the distribution is new, it is hoped that it will attract the attention of researchers working in biomedical science, engineering and insurance for modeling lifetime data in their respective fields. Given its flexibility, tractability and practicability, the distribution is expected to be useful to researchers in the biomedical sciences and engineering.