Research Article Volume 11 Issue 2
Department of Statistics, Federal University of Technology, Nigeria
Correspondence: Okafor Ikechukwu Boniface, Department of Statistics, Federal University of Technology, Owerri, Imo state, Nigeria
Received: November 19, 2021 | Published: April 25, 2022
Citation: Boniface OI, Chijioke OA, Justin OC, et al. Effect of correlated measurement errors on estimation of population mean with modified ratio estimator. Biom Biostat Int J. 2022;11(2):52-56. DOI: 10.15406/bbij.2022.11.00354
This paper proposes a class of modified ratio estimators of the population mean that uses the correlation coefficient between the study and auxiliary variables in the presence of correlated measurement errors under simple random sampling. The usual unbiased sample mean per unit estimator and the ratio- and product-type estimators belong to the suggested class. Properties of the proposed estimator are obtained under a large-sample approximation. Theoretical and empirical analyses show that the proposed class of estimators is more efficient than some existing estimators.
Keywords: Correlated measurement errors, ratio estimator, bias, mean squared error, correlation coefficient.
Many researchers have widely utilized auxiliary information while estimating population parameters. This has contributed immensely in advancing sampling theory as a result of its ability to improve the accuracy of sampling strategies and reduce their design variances. Due to the fact that sample sizes are not sufficiently large in most of the survey exercises, estimators of population parameters based on these survey exercises may not be satisfactory in terms of their variances. At the same time it is not unusual that some auxiliary information about the study variable may be available. Such additional information, if available, can be utilized to improve properties of estimators. Some of the auxiliary information about the population that is used to improve the accuracy of an estimator may include a known variable to which study variable is approximately related. Such estimators which utilize auxiliary information include ratio, product and regression estimators. Although use of auxiliary information may have improved the estimates of population parameters, measurement errors may still influence the efficiency of the estimators.
In survey sampling, the properties of estimators are derived on the presumption that observed values are the true values. However, repeated observations of the same quantity on the same subject often differ because of natural variation in the subject, variation in the observational process, or both. Hence, it is generally accepted that data available for statistical analysis are subject to error.
The differences between the individual observed values and their corresponding true values are referred to as measurement errors. They constitute an essential part of the errors in any sample survey data, and their presence is practically inevitable whatever precautions one takes. These measurement errors may arise at the data collection stage through respondent bias, enumerator bias or both, and during data collation and coding.1,2 The effect of measurement errors on statistical inference about a population parameter may sometimes be inconsequential. In other situations, however, it may be large enough to invalidate the inference drawn and lead to unfortunate consequences.
Shalabh3 examined the effect of observational (measurement) errors on the ratio estimator under simple random sampling. Following his work, other researchers further investigated the impact of measurement errors on estimators of population parameters under different sampling schemes. Manish and Singh4 considered a linear combination of the ratio estimator and the sample mean per unit and obtained a family of estimators of the population mean. They derived the bias and mean squared error of the proposed family when the sample data are contaminated with measurement errors. Using variable transformation, Diwakar et al.5 proposed an estimator of the population mean in the presence of measurement errors and obtained its properties. Comparing this estimator with those proposed by Manish and Singh4 and Shalabh3 when the study and auxiliary variables are contaminated with measurement errors, they observed that their proposed estimator is more efficient in a localized domain. Also using variable transformation, Viplav et al.6 studied a class of difference-type estimators for the population mean of the study variable when measurement errors are present, and generated some new estimators belonging to that class. Their empirical study showed that the suggested estimators gain efficiency over other existing estimators.
Gregoire and Salas7 studied systematic measurement errors as well as measurement errors that are assumed to be stochastic in nature. They obtained the statistical properties of three ratio estimators under these measurement error conditions and concluded that the ratio-of-means estimator appears to be less affected when the auxiliary variates are contaminated with measurement errors. A Monte Carlo study of the ratio and regression estimators by Sahoo et al.8, with the auxiliary variable contaminated by measurement errors, revealed that the efficiency of the regression estimator is more sensitive to measurement errors than that of the ratio estimator. The bias of both estimators is sensitive to measurement errors: it decreases as the sample size increases, and it increases as the regression line of y (study variable) on x (auxiliary variable) moves away from the origin.
All the work reviewed so far was based on the general assumption that measurement errors are uncorrelated, although the study and auxiliary variables are correlated. However, Shalabh and Jia-Ren9 relaxed this assumption and studied the performance of the ratio and product estimators of the population mean with correlated measurement errors.
In this work, we examine the performance of modified ratio-type estimator of population mean under the influence of correlated measurement errors using simple random sampling scheme.
Consider a finite population $U = (U_1, U_2, \ldots, U_N)$ of size $N$. Denote the study variable by $y$ and the auxiliary variable by $x$, taking the values $y_i$ and $x_i$ respectively on the $i$th unit $U_i$ $(i = 1, 2, \ldots, N)$. We denote the population means of $y$ and $x$ by $\mu_Y$ and $\mu_X$, and the population variances of $y$ and $x$ by $\sigma_Y^2$ and $\sigma_X^2$ respectively. Also let $\sigma_{XY}$ and $\rho$ denote the population covariance and the correlation coefficient between $y$ and $x$.
Assume a simple random sample without replacement (SRSWOR) of size $n$ is drawn from the population $U$. Let $\bar{y}$ and $\bar{x}$ be the sample means of $y$ and $x$ respectively. Suppose that, for the $i$th sampled unit $(i = 1, 2, \ldots, n)$, the values $(y_i, x_i)$ are observed instead of the true values $(y_i^*, x_i^*)$ of the two characteristics $(y, x)$. Let the measurement errors be defined as:
$$u_i = y_i - y_i^* \tag{1}$$
$$v_i = x_i - x_i^* \tag{2}$$
such that
$$E(u) = E(v) = 0$$
$$\mathrm{Var}(u) = \sigma_u^2, \quad \mathrm{Var}(v) = \sigma_v^2$$
$$\mathrm{cov}(u, v) = \rho^* \sigma_u \sigma_v$$
Thus, expressing the observed value as a function of the true value and the measurement error, we have
$$y_i = y_i^* + u_i \tag{3}$$
$$x_i = x_i^* + v_i \tag{4}$$
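The error model in (1) to (4) can be sketched numerically. The snippet below is a minimal illustration, assuming normally distributed errors (the text only fixes their means, variances and correlation) and using hypothetical true values and error parameters; the function names are introduced here for illustration only.

```python
import math
import random


def simulate_observed(true_pairs, sigma_u, sigma_v, rho_star, seed=0):
    """Add correlated, zero-mean measurement errors to true (y*, x*) pairs.

    Normal errors are an illustrative assumption; the model in the text
    only specifies E(u) = E(v) = 0, the two variances and corr(u, v).
    """
    rng = random.Random(seed)
    observed, errors = [], []
    for y_true, x_true in true_pairs:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = sigma_u * z1
        # Mix z1 and z2 so that Var(v) = sigma_v**2 and corr(u, v) = rho_star.
        v = sigma_v * (rho_star * z1 + math.sqrt(1.0 - rho_star**2) * z2)
        observed.append((y_true + u, x_true + v))  # equations (3) and (4)
        errors.append((u, v))
    return observed, errors


def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical true values and error parameters, for illustration only.
true_pairs = [(100.0 + i % 50, 150.0 + 2 * (i % 50)) for i in range(20000)]
observed, errors = simulate_observed(
    true_pairs, sigma_u=6.0, sigma_v=6.4, rho_star=0.5
)
print(round(correlation(errors), 2))  # should land near rho_star
```

With a large number of units, the empirical correlation of the simulated errors should be close to the chosen value of $\rho^*$, which is the only feature of the error distribution that the later formulas depend on beyond the two variances.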
Under the large-sample approximation, the finite population correction $1 - f$, where $f = n/N$, can be ignored.
We define the means and variances of the study variable $Y$ and the auxiliary variable $X$ as
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \quad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, \quad \sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})^2, \quad \sigma_Y^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2$$
Further, we define the coefficients of variation of $X$ and $Y$ as
$$C_X = \frac{\sigma_X}{\bar{X}} \quad \text{and} \quad C_Y = \frac{\sigma_Y}{\bar{Y}} \quad \text{respectively.}$$
The covariance of $Y$ and $X$, the correlation coefficient between $Y$ and $X$, and the correlation coefficient between $u$ and $v$ are defined as
$$\sigma_{XY} = \frac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}), \quad \rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \quad \text{and} \quad \rho^* = \frac{\sigma_{uv}}{\sigma_v \sigma_u} \quad \text{respectively.}$$
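Using delta notation, we define the following:
$$\delta_0 = \frac{\bar{y}}{\bar{Y}} - 1 \;\Rightarrow\; \bar{y} = \bar{Y}(1 + \delta_0) \tag{5}$$
$$\delta_1 = \frac{\bar{x}}{\bar{X}} - 1 \;\Rightarrow\; \bar{x} = \bar{X}(1 + \delta_1) \tag{6}$$
such that
$$E(\delta_0) = E(\delta_1) = 0 \tag{7}$$
$$E(\delta_0^2) = \frac{\sigma_Y^2}{n\,\theta_Y\,\bar{Y}^2} = \frac{C_Y^2}{n\,\theta_Y} \tag{8}$$
$$E(\delta_1^2) = \frac{\sigma_X^2}{n\bar{X}^2}\left(\frac{\sigma_X^2 + \sigma_v^2}{\sigma_X^2}\right) = \frac{\sigma_X^2}{n\,\theta_X\,\bar{X}^2} = \frac{C_X^2}{n\,\theta_X} \tag{9}$$
where
$$\theta_Y = \frac{\sigma_Y^2}{\sigma_Y^2 + \sigma_u^2} \quad \text{and} \quad \theta_X = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_v^2},$$
both of which lie between 0 and 1.
Also,
$$E(\delta_0 \delta_1) = \frac{1}{n}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right) \tag{10}$$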
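As a sketch of where the second moment of $\delta_0$ comes from, the steps below assume, in addition to what is stated above, that the measurement errors $u_i$ are uncorrelated with the true values $y_i^*$; this assumption is not spelled out in the text and is labeled here as such. The moment of $\delta_1$ follows in the same way with $x$, $v$ and $\theta_X$.

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} \left(y_i^* + u_i\right), \\
\mathrm{Var}(\bar{y}) &\approx \frac{\sigma_Y^2 + \sigma_u^2}{n}
  \quad \text{(finite population correction ignored; $u_i$ assumed uncorrelated with $y_i^*$)}, \\
E(\delta_0^2) &= \frac{\mathrm{Var}(\bar{y})}{\bar{Y}^2}
  = \frac{\sigma_Y^2 + \sigma_u^2}{n\,\bar{Y}^2}
  = \frac{\sigma_Y^2}{n\,\theta_Y\,\bar{Y}^2}
  = \frac{C_Y^2}{n\,\theta_Y},
  \qquad \text{since } \sigma_Y^2 + \sigma_u^2 = \frac{\sigma_Y^2}{\theta_Y}.
\end{align*}
```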
The traditional sample mean per unit estimator of the population mean when the sample data are contaminated with measurement errors is given by
$$t_0 = \bar{y} \tag{11}$$
Its variance is given by
$$V(t_0) = \frac{\bar{Y}^2}{n}\,\frac{C_Y^2}{\theta_Y} \tag{12}$$
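Shalabh and Jia-Ren9 proposed the following ratio and product estimators for the case where the usual assumption of uncorrelated measurement errors is relaxed:
$$t_1 = \bar{y}\,\frac{\bar{X}}{\bar{x}} \tag{13}$$
$$t_2 = \bar{y}\,\frac{\bar{x}}{\bar{X}} \tag{14}$$
They obtained the mean square errors of the ratio and product estimators as
$$\mathrm{MSE}(t_1) = \frac{\bar{Y}^2}{n}\left(\frac{C_Y^2}{\theta_Y} + \frac{C_X^2}{\theta_X} - 2\left(C_Y C_X \rho + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right) \tag{15}$$
$$\mathrm{MSE}(t_2) = \frac{\bar{Y}^2}{n}\left(\frac{C_Y^2}{\theta_Y} + \frac{C_X^2}{\theta_X} + 2\left(C_Y C_X \rho + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right) \tag{16}$$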
Motivated by Shalabh and Jia-Ren,9 we propose the following modified ratio estimator of the population mean in the presence of correlated measurement errors:
$$t_r = \bar{y}\left(\frac{\bar{X} + \rho}{\bar{x} + \rho}\right)^{\beta} \tag{17}$$
where $\beta$ is any real number, chosen so as to minimize the mean squared error of $t_r$. The proposed estimator is a class of estimators, and the following estimators are particular members of the class:
$$\beta = 0, \quad t_{r0} = \bar{y} \tag{18}$$
$$\beta = 1, \quad t_{r1} = \bar{y}\left(\frac{\bar{X} + \rho}{\bar{x} + \rho}\right) \tag{19}$$
$$\beta = -1, \quad t_{r2} = \bar{y}\left(\frac{\bar{x} + \rho}{\bar{X} + \rho}\right) \tag{20}$$
$$\beta = \tfrac{1}{2}, \quad t_{r3} = \bar{y}\left(\frac{\bar{X} + \rho}{\bar{x} + \rho}\right)^{1/2} \tag{21}$$
$$\beta = -\tfrac{1}{2}, \quad t_{r4} = \bar{y}\left(\frac{\bar{X} + \rho}{\bar{x} + \rho}\right)^{-1/2} \tag{22}$$
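The class in (17) is straightforward to evaluate on a sample. The following is a minimal sketch; the function name and the sample values are hypothetical, and the known population mean of $x$ and the correlation $\rho$ are passed in as given quantities, as the estimator requires.

```python
def modified_ratio_estimate(y_sample, x_sample, x_pop_mean, rho, beta):
    """Evaluate t_r = y_bar * ((X_bar + rho) / (x_bar + rho)) ** beta, as in (17).

    The ratio inside the power must be positive when beta is fractional.
    """
    y_bar = sum(y_sample) / len(y_sample)
    x_bar = sum(x_sample) / len(x_sample)
    return y_bar * ((x_pop_mean + rho) / (x_bar + rho)) ** beta


# Hypothetical observed sample, for illustration only.
y_sample = [12.0, 15.0, 11.0, 14.0]
x_sample = [20.0, 26.0, 19.0, 23.0]
x_pop_mean, rho = 21.5, 0.9

# beta = 0, 1, -1, 1/2, -1/2 give the special cases t_r0, ..., t_r4.
members = {
    beta: modified_ratio_estimate(y_sample, x_sample, x_pop_mean, rho, beta)
    for beta in (0.0, 1.0, -1.0, 0.5, -0.5)
}
for beta, value in members.items():
    print(f"beta = {beta:+.1f}: t_r = {value:.4f}")
```

Setting `beta=0.0` returns the plain sample mean, while `beta=1.0` and `beta=-1.0` return the ratio-type and product-type members respectively.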
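Using the notation defined in Section 3, we obtain the properties of the proposed estimators. Expressing (17) in terms of $\delta_i$ $(i = 0, 1)$ gives
$$t_r = \bar{Y}(1 + \delta_0)\left(\frac{\bar{X} + \rho}{\bar{X}(1 + \delta_1) + \rho}\right)^{\beta} \tag{23}$$
(23) can be rewritten as
$$t_r = \bar{Y}(1 + \delta_0)\left(1 + \frac{\bar{X}\rho}{\bar{X} + \rho}\,\delta_1\right)^{-\beta}$$
$$= \bar{Y}(1 + \delta_0)\left[1 - \beta\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\delta_1 + \frac{\beta(\beta + 1)}{2}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\delta_1^2 + O(\delta_1^3)\right]$$
$$t_r = \bar{Y} + \bar{Y}\left[\delta_0 - \beta\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\delta_1\delta_0 + \frac{\beta(\beta + 1)}{2}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\delta_1^2\delta_0 - \beta\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\delta_1 + \frac{\beta(\beta + 1)}{2}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\delta_1^2\right]$$
$$t_r - \bar{Y} = \bar{Y}\left[\delta_0 - \beta\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\delta_1\delta_0 + \frac{\beta(\beta + 1)}{2}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\delta_1^2\delta_0 - \beta\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\delta_1 + \frac{\beta(\beta + 1)}{2}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\delta_1^2\right] \tag{24}$$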
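Taking the expectation of both sides of (24), substituting (7), (9) and (10), and retaining terms up to the first order of approximation, we obtain the bias
$$\mathrm{Bias}(t_r) = E(t_r - \bar{Y}) = \frac{\bar{Y}\beta}{n}\left[\left(\frac{\beta + 1}{2}\right)\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\frac{C_X^2}{\theta_X} - \left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] \tag{25}$$
Squaring both sides of (24), taking expectations, substituting (8), (9) and (10), and retaining terms up to the first order of approximation, we obtain the mean square error
$$\mathrm{MSE}(t_r) = E(t_r - \bar{Y})^2 = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} + \beta^2\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\frac{C_X^2}{\theta_X} - 2\beta\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] \tag{26}$$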
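Minimizing (26) with respect to $\beta$, that is, setting its derivative with respect to $\beta$ equal to zero, we obtain the optimum value of $\beta$ as
$$\beta = \beta_{\mathrm{opt}} = \left(\frac{\bar{X} + \rho}{\bar{X}\rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\frac{\theta_X}{C_X^2} \tag{27}$$
Substituting (27) in (26), we obtain the minimum mean square error of $t_r$ as
$$\mathrm{MSE}_{\min}(t_r) = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} - \frac{\theta_X}{C_X^2}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)^2\right] \tag{28}$$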
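The minimization step can be written out explicitly. In the sketch below, $\lambda$ and $K$ are shorthand introduced only for this illustration: $\lambda$ stands for the coefficient $\bar{X}\rho/(\bar{X}+\rho)$ appearing in (26), and $K$ for the cross term $\rho C_Y C_X + \sigma_u\sigma_v\rho^*/(\bar{Y}\bar{X})$.

```latex
\begin{align*}
\mathrm{MSE}(t_r) &= \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y}
   + \beta^2\lambda^2\frac{C_X^2}{\theta_X} - 2\beta\lambda K\right], \\
\frac{\partial\,\mathrm{MSE}(t_r)}{\partial\beta}
   &= \frac{\bar{Y}^2}{n}\left[2\beta\lambda^2\frac{C_X^2}{\theta_X} - 2\lambda K\right] = 0
   \;\Rightarrow\; \beta_{\mathrm{opt}} = \frac{1}{\lambda}\,\frac{K\,\theta_X}{C_X^2}, \\
\mathrm{MSE}_{\min}(t_r)
   &= \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y}
      + \frac{K^2\theta_X}{C_X^2} - \frac{2K^2\theta_X}{C_X^2}\right]
    = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} - \frac{\theta_X}{C_X^2}K^2\right].
\end{align*}
```

The second derivative, $2\lambda^2 C_X^2 \bar{Y}^2/(n\theta_X)$, is positive, so the stationary point is a minimum.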
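The variance and the mean square errors of the estimators that are particular members of the proposed class are obtained by substituting $\beta = 0, 1, -1, \tfrac{1}{2}, -\tfrac{1}{2}$ in (26). Thus,
$$\mathrm{Var}(t_{r0}) = \frac{\bar{Y}^2}{n}\,\frac{C_Y^2}{\theta_Y} \tag{29}$$
$$\mathrm{MSE}(t_{r1}) = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} + \left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\frac{C_X^2}{\theta_X} - 2\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] \tag{30}$$
$$\mathrm{MSE}(t_{r2}) = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} + \left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\frac{C_X^2}{\theta_X} + 2\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] \tag{31}$$
$$\mathrm{MSE}(t_{r3}) = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} + \frac{1}{4}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\frac{C_X^2}{\theta_X} - \left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] \tag{32}$$
$$\mathrm{MSE}(t_{r4}) = \frac{\bar{Y}^2}{n}\left[\frac{C_Y^2}{\theta_Y} + \frac{1}{4}\left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)^2\frac{C_X^2}{\theta_X} + \left(\frac{\bar{X}\rho}{\bar{X} + \rho}\right)\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] \tag{33}$$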
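The optimum mean square error of $t_r$ was compared with those of the existing estimators $t_0$, $t_1$ and $t_2$. From (28) and (12), we observe that
$$\mathrm{MSE}_{\min}(t_r) - \mathrm{Var}(t_0) = -\frac{\bar{Y}^2}{n}\,\frac{\theta_X}{C_X^2}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)^2 \le 0 \tag{34}$$
Since the squared term is non-negative and $\theta_X / C_X^2$ is positive, (34) is negative whenever $\rho C_Y C_X + \sigma_u \sigma_v \rho^* / (\bar{Y}\bar{X}) \ne 0$, and the proposed estimator at its optimum $\beta$ is then more efficient than the usual unbiased sample mean per unit estimator.
From (28) and (15), we observe that
$$\mathrm{MSE}_{\min}(t_r) - \mathrm{MSE}(t_1) = \frac{\bar{Y}^2}{n}\left[-\frac{\theta_X}{C_X^2}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)^2 - \frac{C_X^2}{\theta_X} + 2\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] = -\frac{\bar{Y}^2}{n}\,\frac{\theta_X}{C_X^2}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}} - \frac{C_X^2}{\theta_X}\right)^2 \le 0 \tag{35}$$
From (28) and (16), we observe that
$$\mathrm{MSE}_{\min}(t_r) - \mathrm{MSE}(t_2) = \frac{\bar{Y}^2}{n}\left[-\frac{\theta_X}{C_X^2}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)^2 - \frac{C_X^2}{\theta_X} - 2\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}}\right)\right] = -\frac{\bar{Y}^2}{n}\,\frac{\theta_X}{C_X^2}\left(\rho C_Y C_X + \frac{\sigma_u \sigma_v \rho^*}{\bar{Y}\bar{X}} + \frac{C_X^2}{\theta_X}\right)^2 \le 0 \tag{36}$$
From (34), (35) and (36), the proposed estimator at its optimum $\beta$ is never less efficient than the sample mean per unit, ratio and product estimators in the presence of correlated measurement errors, and it is strictly more efficient whenever the squared terms above are non-zero.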
The efficiency of the proposed estimator tr is illustrated using hypothetical data set on income and expenditure from Gujarati and Porter.10
$y_i^*$ = household spending (true value)
$x_i^*$ = household earning (true value)
$y_i$ = household spending (observed value)
$x_i$ = household earning (observed value)
The following values of the parameter were obtained from the given data.
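| $N$ | $\bar{Y}$ | $\bar{X}$ | $\sigma_Y^2$ | $\sigma_X^2$ | $\sigma_u^2$ | $\sigma_v^2$ | $\rho$ | $\rho^*$ | $\theta_Y$ | $\theta_X$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 127 | 170 | 1278 | 3300 | 36 | 41 | 0.964 | -0.09087 | 0.975 | 0.988 |

Table 1 Values of the parameters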
Table 2 shows the percentage relative efficiency (PRE), with respect to the sample mean per unit $\bar{y}$, of the proposed estimator and some existing estimators. It is defined as
$$\mathrm{PRE}(\cdot) = \frac{\mathrm{Var}(\bar{y})}{\mathrm{MSE}(\cdot)} \times 100 \tag{37}$$
| Estimator | Mean square error | Percentage relative efficiency |
|---|---|---|
| $t_0$ | 131.3974 | 100 |
| $t_{r,\mathrm{opt}}$ | 14.4820 | 907.32 |
| $t_1$ | 22.5620 | 582.38 |
| $t_2$ | 613.1759 | 21.43 |
| $t_{r1}$ | 19.6744 | 667.86 |
| $t_{r2}$ | 611.8517 | 21.48 |
| $t_{r3}$ | 32.6882 | 401.97 |
| $t_{r4}$ | 315.8020 | 41.61 |

Table 2 Mean square error and percentage relative efficiency
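The entries of Table 2 can be approximately recomputed from the parameters in Table 1 using (15), (16), (26), (27), (28) and (37). The sketch below is illustrative only: the sample size `N_SAMPLE = 10` is an assumption (it is not stated in the text and is chosen because it reproduces the reported mean square error of $t_0$), and $\theta_Y$, $\theta_X$ are recomputed from the variances rather than taken from the rounded table values, so small differences from the published figures are to be expected.

```python
import math

# Population I parameters as reported in Table 1.
Y_BAR, X_BAR = 127.0, 170.0
VAR_Y, VAR_X = 1278.0, 3300.0
VAR_U, VAR_V = 36.0, 41.0
RHO, RHO_STAR = 0.964, -0.09087
N_SAMPLE = 10  # ASSUMPTION: not given in the text; it reproduces MSE(t0)

theta_y = VAR_Y / (VAR_Y + VAR_U)
theta_x = VAR_X / (VAR_X + VAR_V)
cy2 = VAR_Y / Y_BAR**2  # squared coefficient of variation of Y
cx2 = VAR_X / X_BAR**2  # squared coefficient of variation of X
# Cross term: rho*C_Y*C_X + rho_star*sigma_u*sigma_v / (Y_bar*X_bar)
cross = (
    RHO * math.sqrt(VAR_Y * VAR_X) + RHO_STAR * math.sqrt(VAR_U * VAR_V)
) / (Y_BAR * X_BAR)
lam = X_BAR * RHO / (X_BAR + RHO)  # coefficient as printed in (26)
scale = Y_BAR**2 / N_SAMPLE


def mse_tr(beta):
    """First-order MSE of t_r for a given beta, equation (26)."""
    return scale * (
        cy2 / theta_y + beta**2 * lam**2 * cx2 / theta_x - 2 * beta * lam * cross
    )


def mse_ratio_or_product(sign):
    """Equation (15) for sign = -1 (ratio t1), (16) for sign = +1 (product t2)."""
    return scale * (cy2 / theta_y + cx2 / theta_x + sign * 2 * cross)


beta_opt = cross * theta_x / (lam * cx2)                          # equation (27)
mse_min = scale * (cy2 / theta_y - theta_x * cross**2 / cx2)      # equation (28)

results = {
    "t0": mse_tr(0.0),
    "tr_opt": mse_min,
    "t1": mse_ratio_or_product(-1),
    "t2": mse_ratio_or_product(+1),
    "tr1": mse_tr(1.0),
    "tr2": mse_tr(-1.0),
    "tr3": mse_tr(0.5),
    "tr4": mse_tr(-0.5),
}
for name, mse in results.items():
    pre = 100.0 * results["t0"] / mse                              # equation (37)
    print(f"{name:7s} MSE = {mse:9.4f}  PRE = {pre:8.2f}")
```

Under these assumptions the recomputed values should land close to, though not exactly on, the figures in Table 2.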
The efficiency of the proposed estimator was further illustrated using another hypothetical data set, from Okafor,12 on land area available for cultivation and land area cultivated with maize, where
$y_i$ = the observed land area of the village cultivated with maize
$x_i$ = the observed land area of the village available for cultivation
$y_i^*$ = the true land area of the village cultivated with maize
$x_i^*$ = the true land area of the village available for cultivation
The following values for the population parameter were obtained from the given data.
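| $N$ | $\bar{Y}$ | $\bar{X}$ | $\sigma_Y^2$ | $\sigma_X^2$ | $\sigma_u^2$ | $\sigma_v^2$ | $\rho$ | $\rho^*$ | $\theta_Y$ | $\theta_X$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 530.08 | 829.16 | 61824.97 | 190361.30 | 9.57 | 9.31 | 0.814 | 0.998 | 0.99985 | 0.99995 |

Table 3 Values of the parameters, Population II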
Table 4 shows the mean squared error and percentage relative efficiency (PRE) of the proposed estimator and some estimators which are particular members of the proposed modified estimator with respect to sample mean per unit ˉy.
| Estimator | Mean square error | Percentage relative efficiency |
|---|---|---|
| $t_0$ | 3091.712 | 100.00 |
| $t_{r,\mathrm{opt}}$ | 0.892 | 346460.000 |
| $t_{r1}$ | 1073.425 | 288.023 |
| $t_{r2}$ | 10253.820 | 30.152 |
| $t_{r3}$ | 2587.140 | 119.503 |
| $t_{r4}$ | 4882.238 | 63.326 |
| $t_1$ | 1336.565 | 231.318 |
| $t_2$ | 12627.140 | 24.485 |

Table 4 Mean squared error and percentage relative efficiency
For different values of $\beta$, we also obtained the relative efficiency of $t_r$ over $t_0$, defined as
$$\mathrm{RE}(t_r) = \frac{\mathrm{Var}(t_0)}{\mathrm{MSE}(t_r)} \tag{38}$$
Table 5 represents the relative efficiency of tr with respect to t0 for different values of β .
| Value of $\beta$ | $\mathrm{MSE}(t_r)$ | Relative efficiency |
|---|---|---|
| 0.00 | 131.397 | 1.000 |
| 0.05 | 117.645 | 1.117 |
| 0.10 | 104.750 | 1.254 |
| 0.15 | 92.711 | 1.417 |
| 0.20 | 81.530 | 1.612 |
| 0.25 | 71.205 | 1.845 |
| 0.30 | 61.738 | 2.128 |
| 0.35 | 53.127 | 2.473 |
| 0.40 | 45.374 | 2.896 |
| 0.45 | 38.477 | 3.415 |
| 0.50 | 32.437 | 4.051 |
| 0.55 | 27.255 | 4.821 |
| 0.60 | 22.929 | 5.731 |
| 0.65 | 19.460 | 6.752 |
| 0.70 | 16.848 | 7.799 |
| 0.75 | 15.093 | 8.706 |
| 0.80 | 14.195 | 9.256 |
| $\beta_{\mathrm{opt}} = 0.828$ | 14.067 | 9.341 |
| 0.85 | 14.154 | 9.283 |
| 0.90 | 14.970 | 8.777 |
| 0.95 | 16.643 | 7.895 |
| 1.00 | 19.173 | 6.853 |
| 1.05 | 22.560 | 5.824 |
| 1.10 | 26.803 | 4.902 |
| 1.15 | 31.904 | 4.119 |
| 1.20 | 37.862 | 3.470 |
| 1.25 | 44.676 | 2.941 |
| 1.30 | 52.348 | 2.510 |
| 1.35 | 60.876 | 2.158 |
| 1.40 | 70.262 | 1.870 |
| 1.45 | 80.504 | 1.632 |
| 1.50 | 91.603 | 1.434 |
| 1.55 | 103.560 | 1.269 |

Table 5 Relative efficiency of $t_r$ with respect to $t_0$ for different values of $\beta$
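A sweep of this kind can be sketched as a simple grid search over $\beta$ using (26) and (38). The code below is a hedged illustration with the Population I parameters from Table 1; the helper name `make_mse` is hypothetical, and the sample size `n=10` is again an assumption not stated in the text. It checks that the best grid point sits next to the closed-form optimum from (27).

```python
import math


def make_mse(y_bar, x_bar, var_y, var_x, var_u, var_v, rho, rho_star, n):
    """Return (mse, beta_opt): mse(beta) follows (26), beta_opt follows (27)."""
    theta_y = var_y / (var_y + var_u)
    theta_x = var_x / (var_x + var_v)
    cy2 = var_y / y_bar**2
    cx2 = var_x / x_bar**2
    cross = (
        rho * math.sqrt(var_y * var_x) + rho_star * math.sqrt(var_u * var_v)
    ) / (y_bar * x_bar)
    lam = x_bar * rho / (x_bar + rho)  # coefficient as printed in (26)
    scale = y_bar**2 / n

    def mse(beta):
        return scale * (
            cy2 / theta_y
            + beta**2 * lam**2 * cx2 / theta_x
            - 2 * beta * lam * cross
        )

    beta_opt = cross * theta_x / (lam * cx2)
    return mse, beta_opt


# Population I parameters from Table 1; n = 10 is an ASSUMPTION.
mse, beta_opt = make_mse(
    127.0, 170.0, 1278.0, 3300.0, 36.0, 41.0, 0.964, -0.09087, n=10
)

grid = [round(0.05 * k, 2) for k in range(32)]  # 0.00, 0.05, ..., 1.55
table = [(b, mse(b), mse(0.0) / mse(b)) for b in grid]  # (beta, MSE, RE)
best_beta, best_mse, best_re = min(table, key=lambda row: row[1])

for b, m, re in table:
    print(f"beta = {b:4.2f}  MSE = {m:8.3f}  RE = {re:6.3f}")
print(f"closed-form beta_opt = {beta_opt:.3f}, best grid beta = {best_beta:.2f}")
```

Because (26) is quadratic in $\beta$, the relative efficiency rises smoothly up to $\beta_{\mathrm{opt}}$ and falls symmetrically beyond it, which is the pattern visible in Table 5.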
The main aim of this work was to ascertain the extent to which correlated measurement errors affect the quality of sample statistics that estimate population parameters. Since $\mathrm{Bias}(t_r)$ is a function of $\theta_X$, the bias of the proposed class of estimators is affected by the presence of correlated measurement errors in the auxiliary variable. Likewise, since $\mathrm{MSE}_{\min}(t_r)$ is a function of $\theta_Y$ and $\theta_X$, the mean squared error of the proposed class is affected by the presence of correlated measurement errors in both the study and auxiliary variables. The proposed modified ratio estimator at its optimum value of $\beta$ gains efficiency over some existing estimators in the presence of correlated measurement errors. The study also revealed that, even when $\beta$ deviates from its optimum value, there is still a range of values of $\beta$ for which the estimator remains more efficient than the sample mean per unit. Therefore, the proposed estimator should be preferred in practice.
None.
The authors declare that they have no conflict of interest.
©2022 Boniface, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.