Fisher1 initially developed the concept of weighted distributions to represent ascertainment bias. Subsequently, Rao2 extended this idea in a unified way to model statistical data for which standard distributions are not appropriate because the observations are not recorded with equal probability. To capture the observations in such instances, weighted models are constructed using a weight function. Biased data arise, for example, when a frequency distribution is recorded only for families with at least one boy, at least one girl, or at least one migration.
Assume that the original observation $y$ comes from a distribution with probability density function (pdf) $f_0(y;\theta_1)$, where $\theta_1$ may be a vector of parameters, and that the observation $x$ is recorded with a probability that is re-weighted by a weight function $w(x;\theta_2) > 0$, where $\theta_2$ is a new parameter vector. The pdf of the recorded observation $x$ is then

$$f(x;\theta_1,\theta_2) = D\,w(x;\theta_2)\,f_0(x;\theta_1),$$

where $D$ is a normalizing constant, so that $1/D = E[w(Y;\theta_2)]$. Distributions of this kind are referred to as weighted distributions. The simple size-biased or length-biased distributions are the weighted distributions with the weight function $w(x) = x$. A few broad probability models that produce weighted probability distributions were examined by Patil3 and Rao,4 along with applications and the fact that $w(x) = x$ is a natural outcome in sampling-related situations.
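The re-weighting step can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the base model (an exponential density with rate 2), the helper name `weighted_pdf`, and the truncation of the integral at `upper` are all choices made here, not part of the original construction. It normalizes $w(x)f(x)$ with a midpoint rule and checks the size-biased case $w(x) = x$ against the known result that a size-biased exponential density is a gamma density with shape 2.

```python
import math

def weighted_pdf(base_pdf, weight, upper=60.0, steps=60000):
    """Return f_w(x) = w(x) f(x) / E[w(X)].

    E[w(X)] is approximated by a midpoint rule on (0, upper); the mass of
    w(x) f(x) beyond `upper` is assumed to be negligible.
    """
    h = upper / steps
    norm = h * sum(weight((i + 0.5) * h) * base_pdf((i + 0.5) * h)
                   for i in range(steps))
    return lambda x: weight(x) * base_pdf(x) / norm

# Illustrative base model (an assumption): exponential density with rate 2.
rate = 2.0
exp_pdf = lambda x: rate * math.exp(-rate * x)

# Size-biased version, i.e. weight function w(x) = x.
size_biased = weighted_pdf(exp_pdf, lambda x: x)

# For this base model the size-biased density is rate^2 * x * exp(-rate * x).
x0 = 1.0
print(size_biased(x0), rate ** 2 * x0 * math.exp(-rate * x0))
```

Replacing the lambda `lambda x: x` by another positive function gives the corresponding weighted density, provided the weighted integral is finite.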
In distribution theory, the weighting approach is a useful way to add a shape parameter to an existing distribution. With the additional parameter, the distribution gains flexibility while remaining tractable. Weighted distributions are used to model heterogeneity, clustered sampling, and extraneous variance in a dataset.
Weighted versions of one-parameter lifetime distributions have been derived by several researchers using a power weight function of the form $w(x;\omega) = x^{\omega-1}$. For example, Ghitany et al.5 proposed the weighted Lindley distribution (WLD) from the Lindley distribution of Lindley,6 Eyob and Shanker7 suggested the weighted Garima distribution (WGD) from the Garima distribution of Shanker,8 Ganaie et al.9 suggested the weighted Aradhana distribution (WAD) from the Aradhana distribution of Shanker,10 Shanker and Shukla11 suggested the weighted Sujatha distribution (WSD) from the Sujatha distribution of Shanker,12 Shanker et al.13 suggested the weighted Komal distribution (WKD) from the Komal distribution of Shanker,15 and Shanker et al.14 suggested the weighted Uma distribution (WUD) from the Uma distribution of Shanker.17 It has been noted that, from both a conceptual and an applied point of view, these weighted distributions do not provide a suitable fit for certain datasets. Therefore, a search for a better weighted distribution based on a recent lifetime distribution is required.
Shanker16 introduced the one-parameter Pratibha distribution, studied its statistical properties and applications, and observed that it provides a better fit than the exponential distribution, the Lindley distribution, the Sujatha distribution, the Shanker distribution of Shanker,18 the Akash distribution of Shanker19 and the Garima distribution. The pdf and cdf of the Pratibha distribution are given by
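As a hedged sketch rather than a quotation of Shanker16, the block below writes these two functions under the assumption that the Pratibha density is proportional to $(\theta + x + x^2)e^{-\theta x}$ on $x > 0$ with a single parameter $\theta > 0$. That functional form and the symbol $\theta$ are assumptions introduced here and should be checked against the cited source; the cdf is obtained by integrating the assumed pdf term by term.

```latex
% Assumed form of the Pratibha pdf (an assumption, to be checked against Shanker16)
f(x;\theta) = \frac{\theta^{3}}{\theta^{3}+\theta+2}\,
              \bigl(\theta + x + x^{2}\bigr)\,e^{-\theta x},
\qquad x>0,\ \theta>0 .

% cdf implied by integrating that pdf over (0, x)
F(x;\theta) = 1 - \left[\,1 + \frac{\theta x\,(\theta x + \theta + 2)}
              {\theta^{3}+\theta+2}\right] e^{-\theta x},
\qquad x>0,\ \theta>0 .
```

Under this assumption the density is a mixture of an exponential$(\theta)$, a gamma$(2,\theta)$ and a gamma$(3,\theta)$ density, with mixing weights $\theta^3$, $\theta$ and $2$, each divided by $\theta^3+\theta+2$.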
The primary goal is to study the weighted Pratibha distribution (WPD) and examine its properties. The WPD is proposed because it is expected to provide a better fit than the weighted counterparts of the Lindley, Sujatha, Komal, and Garima distributions, given that the Pratibha distribution fits better than these distributions.
Weighted Pratibha distribution
Let a random variable $X$ follow the WPD with parameters $\eta$ and $\omega$, obtained from the Pratibha distribution with the weight function $w(x;\omega) = x^{\omega-1}$. The pdf and cdf of WPD are obtained by applying this weight to the Pratibha pdf and normalizing. Here $\eta$ is a scale parameter and $\omega$ is a shape parameter of the distribution. When $\omega = 1$, the WPD reduces to the Pratibha distribution with parameter $\eta$. The plots of the pdf and cdf of WPD are shown in Figures 1 and 2, respectively. Figure 1 shows that, depending on the values of $\eta$ and $\omega$, the pdf is unimodal and positively skewed, monotonically increasing, or bimodal and positively skewed. The most important feature of WPD is that it can be unimodal or bimodal for different values of the parameters. Flood datasets generally show unimodal or bimodal shapes depending on the time period of the flood, which makes WPD a natural candidate for modelling flood data.
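A minimal Python sketch of this construction follows. It rests on two assumptions that are not confirmed by the text: that the Pratibha density is proportional to $(\eta + x + x^2)e^{-\eta x}$, and that the weight is $x^{\omega-1}$. Under those assumptions the normalized density is $\eta^{\omega+2}x^{\omega-1}(\eta + x + x^2)e^{-\eta x}/\{\Gamma(\omega)(\eta^3 + \omega\eta + \omega^2 + \omega)\}$. The names `wpd_pdf`, `wpd_cdf` and `pratibha_pdf` are illustrative, and the cdf is approximated by a midpoint rule rather than written in closed form.

```python
import math

def wpd_pdf(x, eta, omega):
    """Assumed WPD density: weight x^(omega-1) applied to (eta + x + x^2) exp(-eta x)."""
    if x <= 0:
        return 0.0
    # Work on the log scale for the gamma-type factor to avoid overflow.
    log_part = ((omega + 2) * math.log(eta) + (omega - 1) * math.log(x)
                - eta * x - math.lgamma(omega))
    norm = eta ** 3 + omega * eta + omega * (omega + 1)
    return math.exp(log_part) * (eta + x + x * x) / norm

def wpd_cdf(x, eta, omega, steps=20000):
    """Midpoint-rule approximation of the cdf on (0, x)."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(wpd_pdf((i + 0.5) * h, eta, omega) for i in range(steps))

def pratibha_pdf(x, eta):
    """Assumed one-parameter Pratibha density, used only to check the omega = 1 case."""
    return eta ** 3 / (eta ** 3 + eta + 2) * (eta + x + x * x) * math.exp(-eta * x)

# With omega = 1 the weight is constant, so the two densities should coincide.
print(wpd_pdf(2.0, 0.5, 1.0), pratibha_pdf(2.0, 0.5))
```

The midpoint rule is a deliberately simple choice; for $\omega < 1$ the density is unbounded near zero and a finer grid or a different quadrature rule would be advisable.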
Survival function
The survival function of WPD can be obtained as $S(x;\eta,\omega) = 1 - F(x;\eta,\omega)$, where $F(x;\eta,\omega)$ denotes the cdf of WPD.
Hazard function
The hazard function of WPD can be obtained as $h(x;\eta,\omega) = f(x;\eta,\omega)/S(x;\eta,\omega)$, where $f$ and $S = 1 - F$ denote the pdf and survival function of WPD.
The plots of the hazard function of WPD are shown in Figure 3. They exhibit different shapes, including monotonically increasing, monotonically decreasing, upside-down bathtub and bathtub shapes, which means that the distribution is applicable for modelling data with these hazard patterns.
Figure 3 Hazard function of WPD.
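The survival and hazard functions can be evaluated numerically as sketched below. The density used is an assumed form of the WPD pdf, $\eta^{\omega+2}x^{\omega-1}(\eta + x + x^2)e^{-\eta x}/\{\Gamma(\omega)(\eta^3 + \omega\eta + \omega^2 + \omega)\}$, which is not given explicitly in the text; the function names and the integration span are likewise illustrative choices.

```python
import math

def wpd_pdf(x, eta, omega):
    """Assumed WPD density (see the lead-in); zero outside x > 0."""
    if x <= 0:
        return 0.0
    log_part = ((omega + 2) * math.log(eta) + (omega - 1) * math.log(x)
                - eta * x - math.lgamma(omega))
    norm = eta ** 3 + omega * eta + omega * (omega + 1)
    return math.exp(log_part) * (eta + x + x * x) / norm

def wpd_survival(x, eta, omega, steps=20000):
    """S(x) as the integral of the pdf over (x, x + span), midpoint rule.

    The tail beyond x + span is treated as negligible; span = 80 / eta is a
    heuristic that works for moderate omega.
    """
    lo = max(x, 0.0)
    h = (80.0 / eta) / steps
    return h * sum(wpd_pdf(lo + (i + 0.5) * h, eta, omega) for i in range(steps))

def wpd_hazard(x, eta, omega):
    """h(x) = f(x) / S(x)."""
    return wpd_pdf(x, eta, omega) / wpd_survival(x, eta, omega)

# Hazard on a small grid for one illustrative parameter pair.
for x in (0.5, 1.0, 5.0, 20.0):
    print(x, wpd_hazard(x, 0.5, 1.5))
```

Integrating the tail directly, rather than computing one minus the cdf, keeps the hazard stable for large $x$, where the survival function is very small.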
Reverse hazard function
Mean residual life function
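The mean residual life function of WPD can be obtained as $m(x;\eta,\omega) = E(X - x \mid X > x) = \frac{1}{S(x;\eta,\omega)}\int_x^{\infty} S(t;\eta,\omega)\,dt$, where $S$ denotes the survival function of WPD.
The plots of the mean residual life function are shown in Figure 4. For the plotted parameter values, the mean residual life function is monotonically decreasing.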
Figure 4 Mean residual life function of WPD.
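A numerical sketch of the mean residual life is given below, using the equivalent form $m(x) = \int_x^\infty (t - x) f(t)\,dt / \int_x^\infty f(t)\,dt$. The density is an assumed form of the WPD pdf (weight $x^{\omega-1}$ applied to a density proportional to $(\eta + x + x^2)e^{-\eta x}$), and `wpd_mrl` is an illustrative name.

```python
import math

def wpd_pdf(x, eta, omega):
    """Assumed WPD density (see the lead-in); zero outside x > 0."""
    if x <= 0:
        return 0.0
    log_part = ((omega + 2) * math.log(eta) + (omega - 1) * math.log(x)
                - eta * x - math.lgamma(omega))
    norm = eta ** 3 + omega * eta + omega * (omega + 1)
    return math.exp(log_part) * (eta + x + x * x) / norm

def wpd_mrl(x, eta, omega, steps=20000):
    """Mean residual life E(X - x | X > x) by a midpoint rule on (x, x + 80/eta)."""
    lo = max(x, 0.0)
    h = (80.0 / eta) / steps
    surv = 0.0     # accumulates f(t), proportional to S(x)
    excess = 0.0   # accumulates (t - x) f(t)
    for i in range(steps):
        t = lo + (i + 0.5) * h
        ft = wpd_pdf(t, eta, omega)
        surv += ft
        excess += (t - lo) * ft
    return excess / surv  # the step width h cancels in the ratio

# m(0) equals the mean of the distribution; larger x gives the residual mean.
for x in (0.0, 1.0, 5.0, 10.0):
    print(x, wpd_mrl(x, 1.0, 1.0))
```

For $\eta = 1$ and $\omega = 1$ the assumed density integrates in closed form to $m(x) = (x^2 + 5x + 9)/(x^2 + 3x + 4)$, which decreases from $9/4$ towards $1$ and gives a convenient check of the numerical routine.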
Moments related measures
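The $r$th raw moment (moment about the origin) of WPD, $\mu_r' = E(X^r)$, can be obtained after a little algebraic simplification. Putting $r = 1, 2, 3, 4$ gives the first four raw moments $\mu_1', \mu_2', \mu_3', \mu_4'$, from which the central moments of WPD follow after simple algebraic simplification:

$$\mu_2 = \mu_2' - (\mu_1')^2,\qquad \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3,\qquad \mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4.$$

Thus, the coefficient of variation (C.V.), coefficient of skewness, coefficient of kurtosis, and index of dispersion of WPD are obtained as

$$\text{C.V.} = \frac{\sqrt{\mu_2}}{\mu_1'},\qquad \text{skewness} = \frac{\mu_3}{\mu_2^{3/2}},\qquad \text{kurtosis} = \frac{\mu_4}{\mu_2^{2}},\qquad \text{index of dispersion} = \frac{\mu_2}{\mu_1'}.$$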
When
, variance is greater than the mean. The plots of coefficient of variation, skewness, kurtosis and index of dispersion are shown in the following Figure 5.
Figure 5 Coefficient of variation, coefficient of skewness, coefficient of kurtosis and index of dispersion for different values of the parameters of WPD.
Figure 5 illustrates that, for fixed values of ω and increasing values of η, the coefficient of variation, coefficient of skewness and coefficient of kurtosis are monotonically increasing, whereas for fixed values of η and increasing values of ω they are monotonically decreasing. The index of dispersion, on the other hand, decreases both for fixed values of ω and increasing values of η and for fixed values of η and increasing values of ω.
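These moment-based measures can be computed as in the sketch below. It assumes the WPD density is $\eta^{\omega+2}x^{\omega-1}(\eta + x + x^2)e^{-\eta x}/\{\Gamma(\omega)(\eta^3 + \omega\eta + \omega^2 + \omega)\}$, a form not stated explicitly in the text. Under that assumption, integrating $x^r$ against the density term by term gives $\mu_r' = \Gamma(\omega + r)\{\eta^3 + (\omega + r)\eta + (\omega + r)(\omega + r + 1)\}/[\eta^r\,\Gamma(\omega)\{\eta^3 + \omega\eta + \omega(\omega + 1)\}]$. The function names are illustrative.

```python
import math

def wpd_raw_moment(r, eta, omega):
    """r-th raw moment under the assumed WPD density (see the lead-in)."""
    lead = math.exp(math.lgamma(omega + r) - math.lgamma(omega)) / eta ** r
    top = eta ** 3 + (omega + r) * eta + (omega + r) * (omega + r + 1)
    bottom = eta ** 3 + omega * eta + omega * (omega + 1)
    return lead * top / bottom

def wpd_shape_measures(eta, omega):
    """Mean, variance, C.V., skewness, kurtosis and index of dispersion."""
    m1, m2, m3, m4 = (wpd_raw_moment(r, eta, omega) for r in (1, 2, 3, 4))
    # Central moments from raw moments.
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "variance": mu2,
        "cv": math.sqrt(mu2) / m1,
        "skewness": mu3 / mu2 ** 1.5,
        "kurtosis": mu4 / mu2 ** 2,
        "index_of_dispersion": mu2 / m1,
    }

# Illustrative parameter values.
print(wpd_shape_measures(0.1, 1.5))
```

Evaluating `wpd_shape_measures` on a grid of $\eta$ and $\omega$ values is one way to examine how each measure moves with the parameters.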
Maximum likelihood estimation
Let $(x_1, x_2, \ldots, x_n)$ be a random sample of size $n$ from WPD. The log-likelihood function of WPD is $\ln L = \sum_{i=1}^{n} \ln f(x_i;\eta,\omega)$, and setting its partial derivatives with respect to $\eta$ and $\omega$ to zero gives the log-likelihood equations, in which $\psi(\omega) = \frac{d}{d\omega}\ln\Gamma(\omega)$ is the digamma function.
The log-likelihood equations are not solvable in closed form, so the likelihood is maximized numerically using R software. Iterations continue until successive parameter values are sufficiently close. These equations can be solved using Fisher's scoring method, which requires the second-order partial derivatives of the log-likelihood function.
To find the MLEs $(\hat\eta, \hat\omega)$ of the parameters $(\eta, \omega)$ of WPD, the scoring equations are solved starting from $\eta_0$ and $\omega_0$, the initial values of $\eta$ and $\omega$. These equations are solved iteratively until sufficiently close estimates of the parameters are obtained.
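The iteration can be sketched in Python as below. This is not the authors' R procedure: it swaps in a Newton-Raphson update with numerically differenced derivatives (the observed rather than the expected information) and adds step halving so that the log-likelihood never decreases. The density form (weight $x^{\omega-1}$ applied to a density proportional to $(\eta + x + x^2)e^{-\eta x}$), the function names, the starting values and the small sample are all assumptions made for illustration.

```python
import math

def wpd_loglik(eta, omega, data):
    """Log-likelihood under the assumed WPD density; -inf outside eta, omega > 0."""
    if eta <= 0 or omega <= 0:
        return -math.inf
    const = ((omega + 2) * math.log(eta) - math.lgamma(omega)
             - math.log(eta ** 3 + omega * eta + omega * (omega + 1)))
    return len(data) * const + sum(
        (omega - 1) * math.log(x) + math.log(eta + x + x * x) - eta * x
        for x in data)

def fit_wpd(data, eta0, omega0, rel_step=1e-4, tol=1e-8, max_iter=200):
    """Newton-Raphson with finite-difference derivatives and step halving."""
    f = lambda e, o: wpd_loglik(e, o, data)
    e, o = eta0, omega0
    for _ in range(max_iter):
        he, ho = rel_step * e, rel_step * o
        f0 = f(e, o)
        # Score vector by central differences.
        ge = (f(e + he, o) - f(e - he, o)) / (2 * he)
        go = (f(e, o + ho) - f(e, o - ho)) / (2 * ho)
        # Hessian by second-order central differences.
        hee = (f(e + he, o) - 2 * f0 + f(e - he, o)) / he ** 2
        hoo = (f(e, o + ho) - 2 * f0 + f(e, o - ho)) / ho ** 2
        heo = (f(e + he, o + ho) - f(e + he, o - ho)
               - f(e - he, o + ho) + f(e - he, o - ho)) / (4 * he * ho)
        det = hee * hoo - heo ** 2
        if hee < 0 and det > 0:
            # Newton direction -H^{-1} g for a negative definite Hessian.
            de = -(hoo * ge - heo * go) / det
            do = -(hee * go - heo * ge) / det
        else:
            # Fall back to the gradient direction.
            de, do = ge, go
        t = 1.0
        while t > 1e-12 and f(e + t * de, o + t * do) <= f0:
            t /= 2  # halve the step until the log-likelihood improves
        if t <= 1e-12:
            break
        e_new, o_new = e + t * de, o + t * do
        converged = abs(e_new - e) < tol * e and abs(o_new - o) < tol * o
        e, o = e_new, o_new
        if converged:
            break
    return e, o, f(e, o)

# Hypothetical sample used only to exercise the routine.
sample = [1.2, 0.8, 2.5, 3.1, 1.9, 4.2, 2.2, 0.6, 1.4, 2.8, 3.6, 1.1]
eta_hat, omega_hat, loglik_hat = fit_wpd(sample, 1.0, 1.0)
print(eta_hat, omega_hat, -2 * loglik_hat)
```

Because gamma-type likelihoods often have a long ridge in the two parameters, the result can depend on the starting values; trying several starts and comparing the attained log-likelihoods is a reasonable precaution.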
A simulation study
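To assess the performance of the maximum likelihood estimators for WPD, a simulation study has been conducted. The study examines the mean estimates, biases (B), mean square errors (MSEs), and variances of the maximum likelihood estimates (MLEs) of the WPD parameters, using the following formulas:

$$B = \frac{1}{N}\sum_{j=1}^{N}\bigl(\hat H_j - H\bigr),\qquad MSE = \frac{1}{N}\sum_{j=1}^{N}\bigl(\hat H_j - H\bigr)^2,\qquad \text{Variance} = MSE - B^2,$$

where $H = \eta, \omega$ and $\hat H_j = \hat\eta_j, \hat\omega_j$ are the estimates obtained from the $j$th of $N$ replications.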
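The summary measures of the simulation can be computed as in the short sketch below. It assumes the usual definitions (bias as the average estimation error, MSE as the average squared error, and variance as MSE minus squared bias); the function name and the four estimates in the example are hypothetical.

```python
def simulation_summary(estimates, true_value):
    """Mean, bias, MSE and variance of a list of replicated estimates."""
    n_rep = len(estimates)
    mean = sum(estimates) / n_rep
    bias = sum(est - true_value for est in estimates) / n_rep
    mse = sum((est - true_value) ** 2 for est in estimates) / n_rep
    variance = mse - bias ** 2
    return {"mean": mean, "bias": bias, "mse": mse, "variance": variance}

# Hypothetical estimates of a parameter whose true value is 1.0.
print(simulation_summary([0.9, 1.1, 1.0, 1.2], 1.0))
```

With this decomposition the variance is never larger than the MSE, and the two coincide only when the estimator is unbiased over the replications.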
The acceptance-rejection method has been employed to generate data. This method is commonly used in simulation studies to produce random samples from a target distribution. Generating random samples from the WPD involves the following steps:
- (a) Generate $Y$ from an exponential distribution with pdf $g(y)$.
- (b) Generate $U$ from the Uniform$(0, 1)$ distribution.
- (c) If $U \le f(Y)/\{M\,g(Y)\}$, then set $X = Y$ ("accept the sample"); otherwise "reject the sample" and repeat steps (a)-(c) until the required number of samples is obtained. Here $f$ is the pdf of WPD and $M$ is a constant such that $f(y) \le M\,g(y)$ for all $y > 0$.
- (d) Each sample size is replicated 10000 times.
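The steps above can be sketched as follows. Several ingredients are assumptions made here rather than details from the text: the WPD density form (weight $x^{\omega-1}$ applied to a density proportional to $(\eta + x + x^2)e^{-\eta x}$), the proposal rate $\eta/2$, and the bound $M$, which is approximated by a grid search and inflated by 10%. The sketch also requires $\omega \ge 1$, since otherwise the ratio $f/g$ is unbounded near zero.

```python
import math
import random

def wpd_pdf(x, eta, omega):
    """Assumed WPD density (see the lead-in); zero outside x > 0."""
    if x <= 0:
        return 0.0
    log_part = ((omega + 2) * math.log(eta) + (omega - 1) * math.log(x)
                - eta * x - math.lgamma(omega))
    norm = eta ** 3 + omega * eta + omega * (omega + 1)
    return math.exp(log_part) * (eta + x + x * x) / norm

def sample_wpd(n, eta, omega, rng, grid=4000):
    """Acceptance-rejection sampling with an exponential proposal (omega >= 1)."""
    rate = eta / 2.0  # proposal rate; must be smaller than eta (an assumption)
    g = lambda y: rate * math.exp(-rate * y)
    # Approximate M >= sup f/g on a grid and add a 10% safety margin.
    upper = 50.0 / eta
    M = 1.1 * max(wpd_pdf(upper * (i + 1) / grid, eta, omega)
                  / g(upper * (i + 1) / grid) for i in range(grid))
    samples = []
    while len(samples) < n:
        y = rng.expovariate(rate)                  # step (a)
        u = rng.random()                           # step (b)
        if u <= wpd_pdf(y, eta, omega) / (M * g(y)):
            samples.append(y)                      # step (c): accept
    return samples

rng = random.Random(2024)
draws = sample_wpd(5, 0.1, 1.5, rng)
print(draws)
```

Under the assumed density the WPD is also a finite mixture of gamma densities, so direct sampling with `random.gammavariate` would be an alternative; the rejection scheme is shown because it is the method described above.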
The biases, MSEs, and variances of the MLEs of the parameters decrease for increasing sample size as evident in Tables 1 & 2. This supports the first-order asymptotic theory of MLEs.
| Parameter | Sample size | Mean | Bias | MSE | Variance |
|---|---|---|---|---|---|
| η | 25 | 0.09587 | -0.00412 | 0.00001 | 0.00000 |
| | 50 | 0.09600 | -0.00399 | 0.00001 | 0.00000 |
| | 100 | 0.09629 | -0.00370 | 0.00001 | 0.00000 |
| | 200 | 0.09745 | -0.00254 | 0.00001 | 0.00000 |
| | 300 | 0.097994 | -0.00200 | 0.00000 | 0.00000 |
| ω | 25 | 1.48705 | -0.01294 | 0.00049 | 0.00033 |
| | 50 | 1.49011 | -0.00988 | 0.00038 | 0.00028 |
| | 100 | 1.49332 | -0.00667 | 0.00028 | 0.00023 |
| | 200 | 1.49509 | -0.00490 | 0.00021 | 0.00019 |
| | 300 | 1.49674 | -0.00325 | 0.00016 | 0.00015 |

Table 1 Descriptive constants of WPD for η = 0.1, ω = 1.5
| Parameter | Sample size | Mean | Bias | MSE | Variance |
|---|---|---|---|---|---|
| η | 25 | 0.23411 | 0.03411 | 0.00117 | 0.00001 |
| | 50 | 0.23359 | 0.03359 | 0.00113 | 0.00001 |
| | 100 | 0.23324 | 0.03324 | 0.00111 | 0.00001 |
| | 200 | 0.23299 | 0.03299 | 0.00109 | 0.00001 |
| | 300 | 0.23205 | 0.03205 | 0.00103 | 0.00001 |
| ω | 25 | 0.30356 | 0.00356 | 0.00026 | 0.00025 |
| | 50 | 0.30297 | 0.00297 | 0.00019 | 0.00018 |
| | 100 | 0.30267 | 0.00267 | 0.00014 | 0.00014 |
| | 200 | 0.30211 | 0.00211 | 0.00012 | 0.00011 |
| | 300 | 0.30130 | 0.00130 | 0.00010 | 0.00010 |

Table 2 Descriptive constants of WPD for η = 0.2, ω = 0.3
The variance-covariance matrices for the parameters $\eta$ and $\omega$ were also obtained.
To test the goodness of fit of WPD, we have considered a real lifetime dataset of flood discharges. The following right-skewed dataset, discussed by Montfort,20 presents the maximum annual flood discharges, in units of 1000 cubic feet per second, of the North Saskatchewan River at Edmonton over a period of 47 years.
19.885, 20.940, 21.820, 23.700, 24.888, 25.460, 25.760, 26.720, 27.500, 28.100, 28.600,
30.200, 30.380, 31.500, 32.600, 32.680, 34.400, 35.347, 35.700, 38.100, 39.020, 39.200,
40.000, 40.400, 40.400, 42.250, 44.020, 44.730, 44.900, 46.300, 50.330, 51.442, 57.220,
58.700, 58.800, 61.200, 61.740, 65.440, 65.597, 66.000, 74.100, 75.800, 84.100, 106.600,
109.700, 121.970, 121.970, 185.560.
The summary of the dataset and its total time on test (TTT) plot are shown in Table 3 and Figure 6, respectively. The goodness of fit of the WPD, along with other weighted and unweighted distributions, is shown in Table 4.
| Minimum | 1st Quartile | Median | Mean | 3rd Quartile | Maximum |
|---|---|---|---|---|---|
| 19.89 | 30.34 | 40.40 | 51.50 | 61.34 | 185.56 |

Table 3 Summary of the dataset
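The summary statistics can be recomputed from the listed values as sketched below. The quartile rule (`method="inclusive"`, a linear interpolation between order statistics) is an assumption about how the quartiles were computed, and the function name is illustrative; the data are the values exactly as transcribed in the list above.

```python
import statistics

# Maximum annual flood discharges as transcribed above (48 listed values).
flood = [
    19.885, 20.940, 21.820, 23.700, 24.888, 25.460, 25.760, 26.720, 27.500,
    28.100, 28.600, 30.200, 30.380, 31.500, 32.600, 32.680, 34.400, 35.347,
    35.700, 38.100, 39.020, 39.200, 40.000, 40.400, 40.400, 42.250, 44.020,
    44.730, 44.900, 46.300, 50.330, 51.442, 57.220, 58.700, 58.800, 61.200,
    61.740, 65.440, 65.597, 66.000, 74.100, 75.800, 84.100, 106.600,
    109.700, 121.970, 121.970, 185.560,
]

def six_number_summary(data):
    """Minimum, quartiles, mean and maximum of a dataset."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "min": min(data),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(data),
        "q3": q3,
        "max": max(data),
    }

summary = six_number_summary(flood)
print({key: round(value, 2) for key, value in summary.items()})
```

The mean well above the median is consistent with the right skewness noted for this dataset.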
| Distributions | MLE $\hat\theta$ | MLE $\hat\alpha$ | -2 log L | AIC | K-S | P-value |
|---|---|---|---|---|---|---|
| WPD | 0.0717 (0.0148) | 1.7187 (0.7103) | 443.17 | 447.17 | 0.12 | 0.47 |
| WSD | 0.0776 (0.0186) | 2.0226 (0.9041) | 443.34 | 447.34 | 0.15 | 0.24 |
| WKD | 0.0717 (0.0148) | 0.2055 (0.0521) | 443.18 | 447.18 | 0.27 | 0.00 |
| WLD | 0.0717 (0.0148) | 2.7180 (0.7104) | 443.17 | 447.17 | 0.16 | 0.16 |
| WGD | 0.0757 (0.0145) | 3.1524 (0.6626) | 443.78 | 447.78 | 0.14 | 0.28 |
| WAD | 0.0724 (0.0147) | 0.7822 (0.7061) | 443.31 | 447.31 | 0.26 | 0.00 |

Table 4 Goodness of fit of the dataset
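The AIC and K-S columns follow standard definitions, sketched below in Python. The sketch assumes AIC $= -2\log L + 2k$ with $k$ the number of estimated parameters, and the one-sample K-S statistic as the largest gap between the empirical cdf and a fitted cdf; the function names, the uniform cdf and the three data points in the example are illustrative only.

```python
def aic(neg2_loglik, n_params):
    """Akaike information criterion from -2 log L and the parameter count."""
    return neg2_loglik + 2 * n_params

def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov distance between the data and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    # Compare the cdf with the empirical cdf just after and just before each point.
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

# Two-parameter model with -2 log L = 443.17, as in the first row of the table.
print(aic(443.17, 2))

# Toy check of the K-S distance against a Uniform(0, 1) cdf.
print(ks_statistic([0.1, 0.5, 0.9], lambda x: x))
```

Since every distribution in the table has two parameters, the AIC ranking coincides with the ranking by -2 log L, and the K-S statistic is what separates models with equal likelihood values.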
Figure 6 TTT- plot of the observed and simulated samples of WPD respectively.
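The points of an empirical TTT plot can be computed as in the sketch below, which assumes the usual scaled TTT transform: for the ordered values $x_{(1)} \le \dots \le x_{(n)}$, the point at $r/n$ is $\{\sum_{i=1}^{r} x_{(i)} + (n - r)x_{(r)}\}/\sum_{i=1}^{n} x_{(i)}$. The function name and the three-value example are illustrative.

```python
def scaled_ttt(data):
    """Return the points (r/n, scaled total time on test) for r = 1, ..., n."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = []
    running = 0.0
    for r, x in enumerate(xs, start=1):
        running += x  # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * x) / total))
    return points

# Small illustrative sample.
print(scaled_ttt([3.0, 1.0, 2.0]))
```

A curve lying above the diagonal is usually read as an increasing hazard rate, one below it as a decreasing hazard rate, and a curve crossing the diagonal as a non-monotone hazard.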
Table 4 shows that WPD has the smallest K-S statistic, and that its -2 log L and AIC values are at least as small as those of WSD, WKD, WLD, WGD, and WAD (WLD attains the same -2 log L and AIC). We therefore conclude that WPD provides a better fit than WSD, WKD, WLD, WGD, and WAD. The fitted plot and the P-P plots of the considered distributions, presented in Figures 7 and 8, also indicate that WPD provides a better fit than the other considered distributions.
Figure 7 Fitted plot of the considered distributions of the dataset.
Figure 8 P-P plots of the theoretical and sample quantiles of the considered distributions of the dataset.