Research Article Volume 13 Issue 2
Department of Statistics, Akwa Ibom State University, Mkpat Enin, Nigeria
Correspondence: Iseh Matthew Joshua, Department of Statistics, Akwa Ibom State University, Mkpat Enin, Nigeria, Tel +23480386405
Received: April 15, 2024 | Published: May 20, 2024
Citation: Ikot EE, Iseh MJ. Calibration for efficiency of ratio estimator in domains of study with sub-sampling the nonrespondents. Biom Biostat Int J. 2024;13(2):42-50. DOI: 10.15406/bbij.2024.13.00413
In a sample survey, information is expected from every selected unit, but in practice this is rarely achieved because of non-response: some units may not respond or cannot be contacted during the survey period. This work focuses on domain estimation of the population mean with sub-sampling of the non-respondents. We consider calibration as a method of correcting for non-response in domains of study by minimizing the chi-square distance between the weight of the main estimator and the calibrated weight, subject to the formulated constraints on the auxiliary variable. As a result, two estimators are proposed: a ratio estimator for the domain mean and a ratio estimator for double sampling. The bias and Mean Square Error (MSE) of the proposed estimators are derived.
We use an auxiliary variable to estimate the population mean, assuming that non-response is observed only on the study variable. The proposed and existing estimators were compared empirically, in terms of MSE and Percentage Relative Efficiency (PRE), in domains with small numbers of sampling units from two populations. We considered two cases: one where the non-response rate is uniform across the two strata at approximately 30%, and one where the rates differ, at 20% and 40% in strata 1 and 2 respectively. The proposed estimators are more efficient than the existing estimators.
Keywords: auxiliary variable, calibration, non-response, ratio estimator, sub-sampling, domain
Society cannot run effectively on hunches or trial and error; decisions based on data give better results than those based on intuition. Statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data, and it helps us turn data into information. In modern society the need for statistical information seems endless. In particular, data are regularly collected to satisfy the need for information about specified sets of elements, called finite populations. One of the most important modes of data collection for this purpose is the sample survey, that is, a partial investigation of the finite population; on the basis of this partial (sample) information one draws inferences about the finite population characteristics (parameters). A sample survey is less expensive than a complete enumeration, is usually less time consuming, and may even be more accurate. The term sample refers to the set of units, or portion of the aggregate of material, selected in the belief that it will be representative of the whole aggregate. Sampling theory deals with scientific and objective procedures for choosing an appropriate sampling design, i.e. selecting a sample that is representative of the population as a whole, and provides suitable procedures for estimating the population parameters. The greatest challenge to the representativeness of a sample is the effect of non-response on the estimation of the population parameter. Different authors have suggested different techniques for obtaining a reliable and efficient estimator, among which is the calibration technique.
Since its introduction by Deville JC, et al.,1 calibration estimation in sample surveys has developed into an established theory and method for estimating finite population parameters. Calibration of weights is a technique that uses population data on auxiliary variables to improve estimates in sample surveys: if auxiliary data are available, some improvement in the precision of the estimates may be achieved, and the incorporation of such data in the estimation process is known as calibration. In stratified random sampling, the calibration approach is used to obtain optimum strata weights for improving the precision of survey estimates of population parameters. Koyuncu N, et al.2 defined some calibration estimators in stratified random sampling for population characteristics, and Clement EP, et al.3 applied the concept of calibration estimators to domain totals in stratified random sampling. Clement EP, et al.4 combined some scalars with the mean of the auxiliary variable and proposed a calibration alternative ratio estimator of the mean in stratified sampling.
When a researcher is interested in obtaining information for a local or small area, estimation becomes challenging when the sample size in some of the areas of interest is small, and even more difficult when non-response occurs. Several authors have attempted to obtain reliable estimates in such areas of interest, popularly called domains of study. Among them is Godwin A, et al.,5 who modified some of the procedures for global ratio estimation in single-phase sampling with sub-sampling of the non-respondents proposed by Rao P6 to obtain an estimate of the mean for a small domain that cuts across constituent strata of a population with unknown weights. The bias and mean square error of each of the modified estimators were obtained for comparison. However, the estimators were not subjected to numerical tests to validate the analytical claims, most importantly in areas with small or zero sample sizes. Unlike in Rao P,6 the population mean of the auxiliary variable adopted by Godwin A, et al.5 is assumed to be unknown before the start of the survey, and hence double sampling was applied under stratified simple random sampling.
In a bid to improve on the efficiency of the estimators under non-response,7,8 adopted the concept of calibration with a single constraint to estimate the population mean and the result was encouraging. Cochran WG9 showed that knowledge of, of domain j that is of interest reduces the variance of the estimator of domain mean in a single-phase simple random design. The reduction in variance is shown to be greater when the proportion of non-domain elements in the population is large and the study variable varies little among the domain elements. Ashutosh10 proposed estimators for domain mean utilizing stratified sampling with non-response. The proposed estimator was compared to a direct ratio estimator for domain mean utilizing stratified sampling with non-response. Clement EP, et al.4 stated that in the presence of powerful auxiliary variables, the calibration estimation meets the objective of reducing both non-response bias and the sampling error. Etebong P11 develops a new approach to ratio estimation that produces a more efficient class of ratio estimators that do not depend on any optimality conditions for optimum performance using calibration weightings. Iseh MJ, et al.,12 Iseh MJ, et al.,13 Iseh MJ, et al.14 considered the challenges of population mean estimation in small area that is characterized by small or no sample size and in the presence of unit non-response and presents a calibration estimator that produces reliable estimates under stratified random sampling from a class of synthetic estimators using calibration approach with alternative distance measure. To overcome the challenges of poor performance of the ratio estimator in small area occasioned with small/no sample size as a result of non-response, this work considers the calibration approach using the constraints of equal weights adjustment criteria, unbiased estimator of the population mean and variance of the auxiliary variable.
In this paper, based on the attempt by Godwin A, et al.5 who suggested the global ratio estimation in single-phase sampling with sub-sampling the non-respondents to obtain an estimate of mean for a small domain that cuts across constituent strata of a population with unknown weights, a new improved ratio estimator for population mean in stratified random sampling is suggested using the theory of calibration estimation with three constraints to achieve optimal precision and efficiency.
Some existing estimators and theoretical underpinnings
This section considers some existing ratio estimators of the domain population mean and the theoretical underpinnings of the proposed ratio estimator. Although not much has been done in the area of domains of study in the presence of non-response, probably because of the intricate nature of the estimation, this paper highlights some existing estimators applicable to domain estimation that use the concept of sub-sampling the non-respondents.
Some existing estimators
Study notations and definitions
$N$ = population size under study
$N_{d}$ = population size for the $d$th domain
$N_{dh}$ = population size of the $h$th stratum in the $d$th domain
$n_{dh}$ = sample size for the $d$th domain in the $h$th stratum
$n_{dh}$ = domain sample size
$n_{1dh}$ = sample size of respondent units for the $d$th domain in the $h$th stratum
$n_{2dh}$ = sample size of non-respondent units for the $d$th domain in the $h$th stratum
$W^{*}_{dh}$ = the calibration weight
$W_{dh}$ = stratum weight
$W_{dh1}$ = response rate of the $d$th domain in the $h$th stratum
$W_{dh2}$ = non-response rate of the $d$th domain in the $h$th stratum
$\lambda_{1},\lambda_{2}$ and $\lambda_{3}$ = the Lagrange multipliers
$X$ = auxiliary variable
$Y$ = study variable
$\bar{x}_{dh}$ = sample mean of the auxiliary variable for the $d$th domain in the $h$th stratum
$\bar{y}^{*}_{dh}$ = unbiased estimator of the population mean of the study variable for the $d$th domain in the $h$th stratum
$\bar{X}_{dh}$ = population mean of the auxiliary variable for the $d$th domain in the $h$th stratum
$\bar{Y}_{dh}$ = population mean of the study variable for the $d$th domain in the $h$th stratum
$\bar{X}_{d}$ = population mean of the auxiliary variable for the $d$th domain
$\bar{Y}_{d}$ = population mean of the study variable for the $d$th domain
$S^{2}_{ydh}$ = mean square of the study variable for the $d$th domain in the $h$th stratum
$S^{2}_{xdh}$ = mean square of the auxiliary variable for the $d$th domain in the $h$th stratum
$C_{xdh}$ = coefficient of variation of the auxiliary variable for the $d$th domain in the $h$th stratum
$C_{ydh}$ = coefficient of variation of the study variable for the $d$th domain in the $h$th stratum
$S^{2}_{ydh2}$ = mean square of the study variable among the non-respondents for the $d$th domain in the $h$th stratum
$k_{dh}$ = inverse sampling rate
$Q_{dh}$ = tuning parameter
Udofia (2004) estimator
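An alternative ratio estimator for the domain mean, suggested by [5], is as follows:
$$t_{2j}=\sum_{h=1}^{k}W_{h}\frac{\bar{y}^{*}_{h}}{\bar{x}_{h}}\bar{X}_{h} \qquad (1)$$
with
$$Bias(t_{2j})=\sum_{h=1}^{k}W_{h}\frac{1-f_{h}}{n_{h}\bar{X}_{h}}\left(R_{h}S^{2}_{xh}-S_{xhyh}\right)$$
and
$$MSE(t_{2j})=\sum_{h=1}^{k}W^{2}_{h}\left[\frac{1-f_{h}}{n_{h}}\left(S^{2}_{yh}+R^{2}_{h}S^{2}_{xh}-2R_{h}S_{xhyh}\right)+\frac{W_{2h}(k-1)}{n_{h}}S^{2}_{2yh}\right] \qquad (2)$$
where
$$R_{h}=\frac{\bar{Y}_{h}}{\bar{X}_{h}},\qquad f_{h}=\frac{n_{h}}{N_{h}},\quad\text{so that}\quad \frac{1-f_{h}}{n_{h}}=\frac{1}{n_{h}}-\frac{1}{N_{h}}$$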
Pal and Singh HP estimator
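Pal and Singh15 proposed a class of ratio-cum-ratio-type exponential estimators for the population mean with sub-sampling of the non-respondents. The estimator and its mean square error are given as:
$$t_{ps1}=\alpha\bar{y}^{*}\left(\frac{\bar{X}}{\bar{x}}\right)+(1-\alpha)\bar{y}^{*}\exp\left(\frac{\bar{X}-\bar{x}}{\bar{X}+\bar{x}}\right)$$
and
$$MSE(t_{ps1})=\bar{Y}^{2}\left(\lambda C^{2}_{y}\left(1-\rho^{2}_{xy}\right)+\frac{W_{2}(Z-1)}{n}C^{2}_{y(2)}\right) \qquad (3)$$
where
$$W_{2}=\frac{n_{2}}{n},\qquad \lambda=\frac{1-f}{n},\qquad f=\frac{n}{N}$$
and $\alpha$ is a constant.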
Ashutosh estimator
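Ashutosh10 proposed a direct generalized ratio estimator for the domain mean through stratified sampling with non-response as:
$$T_{DG.st.\beta.d}=\bar{y}_{st.d}\left[\frac{\bar{x}_{st.d}}{\bar{X}_{st.d}}\right]^{\beta}$$
where $\beta$ is a suitably chosen constant and the stratified means of $y$ and $x$ for the $d$th domain can be written as:
$$\bar{y}_{st.d}=\sum_{h=1}^{H}W_{h.d}\bar{y}_{h.d},\qquad \bar{x}_{st.d}=\sum_{h=1}^{H}W_{h.d}\bar{x}_{h.d}$$
Members of the proposed class of estimators $T^{*}_{DG.st.\beta.d}$:
$T^{*}_{DG.st.0.d}=\bar{y}^{*}_{st.d}$ if $\beta=0$
$T^{*}_{DG.st.-1.d}=\dfrac{\bar{y}^{*}_{st.d}}{\bar{x}^{*}_{st.d}}\bar{X}_{st.d}$ if $\beta=-1$
$T^{*}_{DG.st.1.d}=\bar{y}^{*}_{st.d}\dfrac{\bar{x}^{*}_{st.d}}{\bar{X}_{st.d}}$ if $\beta=1$
$T^{*}_{DG.st.2.d}=\bar{y}^{*}_{st.d}\left[\dfrac{\bar{x}^{*}_{st.d}}{\bar{X}_{st.d}}\right]^{2}$ if $\beta=2$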
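The way the members above arise from a single choice of $\beta$ can be sketched in a few lines of Python. This is only an illustration of the general form $\bar{y}_{st}\,[\bar{x}_{st}/\bar{X}_{st}]^{\beta}$: the function name and the three input values are hypothetical and are not taken from the article's data.

```python
def generalized_ratio_estimator(y_st, x_st, X_st, beta):
    """Evaluate y_st * (x_st / X_st) ** beta for a chosen constant beta."""
    return y_st * (x_st / X_st) ** beta


# Hypothetical stratified means (illustrative values only).
y_st = 50.0   # stratified sample mean of the study variable
x_st = 5.0    # stratified sample mean of the auxiliary variable
X_st = 10.0   # known population mean of the auxiliary variable

members = {
    beta: generalized_ratio_estimator(y_st, x_st, X_st, beta)
    for beta in (0, -1, 1, 2)
}

for beta, value in members.items():
    print(f"beta = {beta:>2}: estimate = {value}")
```

With $\beta=-1$ the sample mean is scaled up by $\bar{X}_{st}/\bar{x}_{st}$ (the usual ratio estimator), while $\beta=1$ gives the product-type member.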
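The bias and mean square error of $T^{*}_{DG.st.-1.d}$ are given as:
$$Bias\left(T^{*}_{DG.st.-1.d}\right)=\sum_{h=1}^{H}W_{h.d}\bar{Y}_{h.d}\left[\frac{N_{h.d}-n_{h.d}}{N_{h.d}n_{h.d}}C^{2}_{Xh.d}+\frac{(g_{h.d}-1)W_{2h.d}}{n_{h.d}}C^{2}_{2Yh.d}\right]-\bar{Y}_{d}$$
$$MSE\left(T^{*}_{DG.st.-1.d}\right)=\sum_{h=1}^{H}W^{2}_{h.d}\bar{Y}^{2}_{h.d}\left[\frac{N_{h.d}-n_{h.d}}{N_{h.d}n_{h.d}}\left(C^{2}_{Yh.d}+C^{2}_{Xh.d}-2C_{YXh.d}\right)+\frac{(g_{h.d}-1)W_{2h.d}}{n_{h.d}}\left(C^{2}_{2Yh.d}+C^{2}_{2Xh.d}-2C_{2YXh.d}\right)\right] \qquad (4)$$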
Sampling design in single phase
Let $\pi=\{U_{1},U_{2},\ldots,U_{N}\}$ denote a finite population, the elements of which fall into $L$ known strata with $N_{dh}$ elements in the $h$th stratum, $h=1,2,\ldots,L$, $\sum_{h}N_{dh}=N_{d}$. It is assumed that $\pi$ can also be partitioned, according to the distribution of a variable $Z$, into an exhaustive set of $D$ sub-populations or domains of study, denoted by $\{A^{*}_{d};\ d=1,2,\ldots,D\}$. Each stratum consists of a substratum of $N_{1dh}$ respondents and a substratum of $N_{2dh}$ non-respondents, with $N_{1dh}+N_{2dh}=N_{dh}$ for all $h$. Let $A^{*}_{dh}$ denote the part of domain $d$ $(A^{*}_{d})$ in stratum $h$ and $N_{dhj}$ the unknown number of elements in $A^{*}_{dh}$. Let $y_{dhj}$ denote the value of characteristic $Y$ for element $j$ in $A^{*}_{dh}$.
Proposed estimator
Calibration is an estimation technique that adjusts the weights of an existing estimator to achieve better precision and improved efficiency. For household surveys and other economic data that require supplementary information, a new ratio estimator is suggested to enhance efficiency in domains of study, even in the presence of non-response. Motivated by the alternative ratio estimator for the domain mean in [5], we propose the following estimator:
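$$t^{*}_{cal}=\sum_{h=1}^{L}W^{*}_{dh}\frac{\bar{y}^{*}_{dh}}{\bar{x}_{dh}}\bar{X}_{dh} \qquad (5)$$
Equation (5) can be written as
$$t^{*}_{cal}=\sum_{h=1}^{L}W^{*}_{dh}\bar{y}_{dhr} \qquad (6)$$
where
$$\bar{y}_{dhr}=r_{dh}\bar{X}_{dh}$$
and
$$r_{dh}=\frac{\bar{y}^{*}_{dh}}{\bar{x}_{dh}},$$
$\bar{X}_{dh}$ is assumed to be known, and $W^{*}_{dh}$ is the calibration weight, which adjusts the existing weight in the estimators of [5] by minimizing the chi-square distance measure
$$\varphi=\sum_{h=1}^{L}\frac{\left(W^{*}_{dh}-W_{dh}\right)^{2}}{Q_{dh}W_{dh}}$$
subject to the following constraints:
$$\sum_{h=1}^{L}W^{*}_{dh}=1,\qquad \sum_{h=1}^{L}W^{*}_{dh}\bar{x}_{dh}=\sum_{h=1}^{L}W_{dh}\bar{X}_{dh},\qquad \sum_{h=1}^{L}W^{*}_{dh}s^{2}_{dh}=\sum_{h=1}^{L}W_{dh}S^{2}_{dh}$$
Thus the optimization problem is given by:
$$\varphi=\sum_{h=1}^{L}\frac{\left(W^{*}_{dh}-W_{dh}\right)^{2}}{Q_{dh}W_{dh}}-2\lambda_{1}\left(\sum_{h=1}^{L}W^{*}_{dh}-1\right)-2\lambda_{2}\left(\sum_{h=1}^{L}W^{*}_{dh}\bar{x}_{dh}-\sum_{h=1}^{L}W_{dh}\bar{X}_{dh}\right)-2\lambda_{3}\left(\sum_{h=1}^{L}W^{*}_{dh}s^{2}_{dh}-\sum_{h=1}^{L}W_{dh}S^{2}_{dh}\right)$$
where $\lambda_{1},\lambda_{2}$ and $\lambda_{3}$ are the Lagrange multipliers such that
$$\frac{\partial\varphi}{\partial W^{*}_{dh}}=\frac{2\left(W^{*}_{dh}-W_{dh}\right)}{Q_{dh}W_{dh}}-2\lambda_{1}-2\lambda_{2}\bar{x}_{dh}-2\lambda_{3}s^{2}_{dh}=0$$
$$\Rightarrow W^{*}_{dh}=W_{dh}+Q_{dh}W_{dh}\left(\lambda_{1}+\lambda_{2}\bar{x}_{dh}+\lambda_{3}s^{2}_{dh}\right)$$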
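The calibrated weights can also be obtained numerically: substituting the expression for the calibrated weight into the three constraints gives a 3×3 linear system in the Lagrange multipliers. The sketch below, in Python with NumPy, solves that system and then checks the constraints. The function name, the choice of unit tuning parameters, and all input values are hypothetical and serve only as an illustration. Four strata are used because the system is singular unless there are at least three strata whose vectors $(1,\bar{x}_{dh},s^{2}_{dh})$ are linearly independent.

```python
import numpy as np


def calibrated_weights(W, xbar, s2, Xbar, S2, Q=None):
    """Chi-square calibration with three constraints.

    Returns W* = W + Q*W*(lam1 + lam2*xbar + lam3*s2) and the multipliers,
    where the multipliers solve the linear system implied by the constraints.
    """
    W, xbar, s2, Xbar, S2 = (
        np.asarray(v, dtype=float) for v in (W, xbar, s2, Xbar, S2)
    )
    Q = np.ones_like(W) if Q is None else np.asarray(Q, dtype=float)

    # Columns: constant, sample mean, sample variance (one row per stratum).
    Z = np.column_stack([np.ones_like(W), xbar, s2])
    # A[j, k] = sum_h Q_h W_h Z[h, j] Z[h, k]
    A = Z.T @ (Z * (Q * W)[:, None])
    # Right-hand side: how far the original weights miss each constraint.
    b = np.array([1.0 - W.sum(), W @ (Xbar - xbar), W @ (S2 - s2)])

    lam = np.linalg.solve(A, b)
    W_star = W + Q * W * (Z @ lam)
    return W_star, lam


# Hypothetical inputs for four strata (not the article's data).
W = np.array([0.20, 0.30, 0.25, 0.25])     # design stratum weights
xbar = np.array([10.0, 14.0, 20.0, 31.0])  # sample means of x
Xbar = np.array([11.0, 13.5, 21.0, 30.0])  # population means of x
s2 = np.array([4.0, 9.0, 12.0, 30.0])      # sample variances of x
S2 = np.array([5.0, 8.0, 13.0, 28.0])      # population variances of x

W_star, lam = calibrated_weights(W, xbar, s2, Xbar, S2)
print("calibrated weights:", W_star)
print("sum of weights:", W_star.sum())
```

Solving the system directly avoids writing out the closed-form multipliers by hand and makes it easy to confirm that the calibrated weights reproduce the benchmark totals.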
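Substituting $W^{*}_{dh}$ in Eq. (6) gives
$$\hat{\bar{t}}_{cal}=\sum_{h=1}^{L}W_{dh}\bar{y}_{dhr}+\beta_{1(dh)}\left(1-\sum_{h=1}^{L}W_{dh}\right)+\beta_{2(dh)}\left(\sum_{h=1}^{L}W_{dh}\left(\bar{X}_{dh}-\bar{x}_{dh}\right)\right)+\beta_{3(dh)}\left(\sum_{h=1}^{L}W_{dh}\left(S^{2}_{dh}-s^{2}_{dh}\right)\right) \qquad (7)$$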
Where
$\langle u\rangle=\sum_{h=1}^{L}Q_{dh}W_{dh}u_{dh}$ is used as shorthand for a $Q_{dh}W_{dh}$-weighted sum over strata (stratum subscripts $dh$ are suppressed inside the brackets, and $\bar{y}_{r}$ stands for $\bar{y}_{dhr}$). Solving the three constraint equations for $\lambda_{1},\lambda_{2},\lambda_{3}$ by Cramer's rule gives
$$\beta_{1(dh)}=\frac{1}{\Delta}\Big[\langle\bar{y}_{r}\rangle\langle\bar{x}^{2}\rangle\langle s^{4}\rangle-\langle\bar{y}_{r}\rangle\langle\bar{x}s^{2}\rangle^{2}-\langle\bar{x}\rangle\langle\bar{x}\bar{y}_{r}\rangle\langle s^{4}\rangle+\langle\bar{x}\rangle\langle s^{2}\bar{y}_{r}\rangle\langle\bar{x}s^{2}\rangle+\langle s^{2}\rangle\langle\bar{x}\bar{y}_{r}\rangle\langle\bar{x}s^{2}\rangle-\langle s^{2}\rangle\langle s^{2}\bar{y}_{r}\rangle\langle\bar{x}^{2}\rangle\Big]$$
$$\beta_{2(dh)}=\frac{1}{\Delta}\Big[\langle 1\rangle\langle\bar{x}\bar{y}_{r}\rangle\langle s^{4}\rangle-\langle 1\rangle\langle s^{2}\bar{y}_{r}\rangle\langle\bar{x}s^{2}\rangle-\langle\bar{y}_{r}\rangle\langle\bar{x}\rangle\langle s^{4}\rangle+\langle\bar{y}_{r}\rangle\langle s^{2}\rangle\langle\bar{x}s^{2}\rangle+\langle s^{2}\rangle\langle\bar{x}\rangle\langle s^{2}\bar{y}_{r}\rangle-\langle s^{2}\rangle^{2}\langle\bar{x}\bar{y}_{r}\rangle\Big]$$
$$\beta_{3(dh)}=\frac{1}{\Delta}\Big[\langle 1\rangle\langle\bar{x}^{2}\rangle\langle s^{2}\bar{y}_{r}\rangle-\langle 1\rangle\langle\bar{x}s^{2}\rangle\langle\bar{x}\bar{y}_{r}\rangle-\langle\bar{x}\rangle^{2}\langle s^{2}\bar{y}_{r}\rangle+\langle\bar{x}\rangle\langle s^{2}\rangle\langle\bar{x}\bar{y}_{r}\rangle+\langle\bar{y}_{r}\rangle\langle\bar{x}\rangle\langle\bar{x}s^{2}\rangle-\langle\bar{y}_{r}\rangle\langle s^{2}\rangle\langle\bar{x}^{2}\rangle\Big]$$
with common denominator
$$\Delta=\langle 1\rangle\langle\bar{x}^{2}\rangle\langle s^{4}\rangle-\langle 1\rangle\langle\bar{x}s^{2}\rangle^{2}-\langle\bar{x}\rangle^{2}\langle s^{4}\rangle+2\langle\bar{x}\rangle\langle s^{2}\rangle\langle\bar{x}s^{2}\rangle-\langle s^{2}\rangle^{2}\langle\bar{x}^{2}\rangle$$
Bias and variance of the proposed estimator
From the proposed estimator above
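Let
$$e_{0}=\frac{\bar{y}^{*}_{dh}-\bar{Y}_{dh}}{\bar{Y}_{dh}},\qquad e_{1}=\frac{\bar{x}_{dh}-\bar{X}_{dh}}{\bar{X}_{dh}},\qquad e_{2}=\frac{s^{2}_{xdh}-S^{2}_{xdh}}{S^{2}_{xdh}}$$
where
$$\bar{y}^{*}_{dh}=\frac{\sum_{h=1}^{L}y^{*}_{dh}}{n_{dh}},\quad \bar{x}_{dh}=\frac{\sum_{h=1}^{L}x_{dh}}{n_{dh}},\quad \bar{X}_{dh}=\frac{\sum_{h=1}^{L}X_{dh}}{N_{dh}},\quad s^{2}_{dh}=\frac{\sum_{h=1}^{L}\left(\bar{x}_{dh}-\bar{X}_{dh}\right)^{2}}{n_{dh}-1}\quad\text{and}\quad S^{2}_{dh}=\frac{\sum_{h=1}^{L}\left(\bar{X}_{dh}-\bar{X}\right)^{2}}{N_{dh}-1}$$
Also,
$$\bar{y}^{*}_{dh}=\bar{Y}_{dh}(1+e_{0}),\qquad \bar{x}_{dh}=\bar{X}_{dh}(1+e_{1}),\qquad s^{2}_{xdh}=S^{2}_{xdh}(1+e_{2})$$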
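Then
$$\begin{aligned}
E[e_{0}^{2}]&=\frac{Var(\bar{y}^{*}_{dh})}{\bar{Y}^{2}_{dh}}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)C^{2}_{ydh}+\frac{(K_{dh}-1)}{n_{dh}}W_{dh2}C^{2}_{ydh2}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\frac{S^{2}_{ydh}}{\bar{Y}^{2}_{dh}}+\frac{(K_{dh}-1)}{n_{dh}\bar{Y}^{2}_{dh}}W_{dh2}S^{2}_{ydh2}\\
E[e_{1}^{2}]&=\frac{Var(\bar{x}_{dh})}{\bar{X}^{2}_{dh}}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)C^{2}_{xdh}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\frac{S^{2}_{xdh}}{\bar{X}^{2}_{dh}}\\
E[e_{2}^{2}]&=\frac{Var(s^{2}_{xdh})}{S^{2}_{xdh}}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\frac{S^{4}_{xdh}}{S^{2}_{xdh}}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}\\
E[e_{0}e_{1}]&=\frac{Cov(\bar{x}_{dh},\bar{y}^{*}_{dh})}{\bar{X}_{dh}\bar{Y}_{dh}}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}C_{ydh}C_{xdh}=\frac{1}{\bar{X}_{dh}\bar{Y}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}\\
E[e_{0}]&=E[e_{1}]=E[e_{2}]=0\\
E[e_{1}e_{2}]&=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)C_{xdh}\lambda_{03}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\frac{S_{xdh}}{\bar{X}_{dh}}\lambda_{03}
\end{aligned}$$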
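where
$$\lambda_{rs}=\frac{\mu_{rs}}{\mu_{20}^{r/2}\mu_{02}^{s/2}}$$
and
$$\mu_{rs}=\frac{1}{N_{dh}-1}\sum_{i=1}^{N_{dh}}\left(Y_{dhi}-\bar{Y}_{dh}\right)^{r}\left(X_{dhi}-\bar{X}_{dh}\right)^{s},\qquad \mu_{20}=S^{2}_{ydh},\qquad \mu_{02}=S^{2}_{xdh}$$
Hence
$$\lambda_{03}=\frac{\mu_{03}}{\mu_{20}^{0/2}\mu_{02}^{3/2}}$$
$$E[e_{0}e_{2}]=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)C_{ydh}\lambda_{12}=\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\frac{S_{ydh}}{\bar{Y}_{dh}}\lambda_{12}$$
where $\lambda_{12}=\dfrac{\mu_{12}}{\mu_{20}^{1/2}\mu_{02}}$.
Expanding $(1+e_{1})^{-1}$ and retaining terms up to the second order in the errors,
$$\bar{y}_{dhr}=\frac{\bar{y}^{*}_{dh}}{\bar{x}_{dh}}\bar{X}_{dh}=\bar{Y}_{dh}(1+e_{0})(1+e_{1})^{-1}=\bar{Y}_{dh}\left(1+e_{0}-e_{1}+e_{1}^{2}-e_{0}e_{1}\right)$$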
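The second-order expansion of the ratio can be checked numerically. The short Python sketch below compares the exact ratio $(1+e_{0})/(1+e_{1})$ with the approximation $1+e_{0}-e_{1}+e_{1}^{2}-e_{0}e_{1}$ for a few hypothetical relative errors; the function names and the error sizes are illustrative assumptions, not values from the article. If the neglected terms are of third order, halving the errors should shrink the discrepancy by roughly a factor of eight.

```python
def ratio_exact(e0, e1):
    """Exact value of (1 + e0) / (1 + e1)."""
    return (1.0 + e0) / (1.0 + e1)


def ratio_second_order(e0, e1):
    """Expansion of the ratio keeping terms up to second order."""
    return 1.0 + e0 - e1 + e1 ** 2 - e0 * e1


# Hypothetical relative errors: e0 = +scale, e1 = -scale.
scales = (0.1, 0.05, 0.025)
errors = [
    abs(ratio_exact(s, -s) - ratio_second_order(s, -s)) for s in scales
]

for s, err in zip(scales, errors):
    print(f"scale = {s:<5}: |exact - approximation| = {err:.2e}")
```

The shrinking discrepancy is what justifies dropping terms of power greater than two when the bias and mean square error are derived.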
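To obtain the bias,
$$\begin{aligned}
B(t^{*}_{cal})&=E\left[t^{*}_{cal}-\bar{Y}_{d}\right]\\
&=E\left[\sum_{h=1}^{L}W_{dh}\left[\bar{Y}_{dh}\left(1+e_{0}-e_{1}+e_{1}^{2}-e_{0}e_{1}\right)\right]-\beta_{2(dh)}\sum_{h=1}^{L}W_{dh}\bar{X}_{dh}e_{1}-\beta_{3(dh)}\sum_{h=1}^{L}W_{dh}S^{2}_{xdh}e_{2}-\bar{Y}_{d}\right]\\
&=E\left[\sum_{h=1}^{L}W_{dh}\left[\bar{Y}_{dh}\left(e_{0}-e_{1}+e_{1}^{2}-e_{0}e_{1}\right)\right]-\beta_{2(dh)}\sum_{h=1}^{L}W_{dh}\bar{X}_{dh}e_{1}-\beta_{3(dh)}\sum_{h=1}^{L}W_{dh}S^{2}_{xdh}e_{2}\right]\\
&=\sum_{h=1}^{L}W_{dh}\bar{Y}_{dh}\left[E(e_{1}^{2})-E(e_{0}e_{1})\right]
\end{aligned}$$
$$B(t^{*}_{cal})=\sum_{h=1}^{L}W_{dh}\bar{Y}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\frac{S^{2}_{xdh}}{\bar{X}^{2}_{dh}}\right]-\sum_{h=1}^{L}W_{dh}\bar{Y}_{dh}\left[\frac{1}{\bar{X}_{dh}\bar{Y}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}\right] \qquad (8)$$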
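The mean square error is
$$MSE(t^{*}_{cal})=E\left[\sum_{h=1}^{L}W_{dh}\left[\bar{Y}_{dh}\left(1+e_{0}-e_{1}+e_{1}^{2}-e_{0}e_{1}\right)\right]-\beta_{2(dh)}\sum_{h=1}^{L}W_{dh}\bar{X}_{dh}e_{1}-\beta_{3(dh)}\sum_{h=1}^{L}W_{dh}S^{2}_{xdh}e_{2}-\bar{Y}_{d}\right]^{2}$$
Ignoring terms of power greater than 2,
$$\begin{aligned}
MSE(t^{*}_{cal})={}&\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{ydh}+\frac{(K_{dh}-1)}{n_{dh}}W_{dh2}S^{2}_{ydh2}\right]-2\sum_{h=1}^{L}W^{2}_{dh}\left[\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}\right]\\
&-2\beta_{2(dh)}\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}\right]-2\beta_{3(dh)}\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}S_{ydh}\lambda_{12}\right]\\
&+\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}^{2}_{dh}}{\bar{X}^{2}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}+2\beta_{2(dh)}\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}\\
&+2\beta_{3(dh)}\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{3}_{xdh}\lambda_{03}\right]+\beta^{2}_{2(dh)}\sum_{h=1}^{L}W^{2}_{dh}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}\\
&+\beta^{2}_{3(dh)}\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{4}_{xdh}\left(\lambda_{04}-1\right)\right]
\end{aligned} \qquad (9)$$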
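To obtain the minimum variance, we differentiate (9) partially with respect to $\beta_{2(dh)}$ and $\beta_{3(dh)}$ and set the derivatives to zero, such that
$$\beta_{2(dh)}=\frac{\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}\right]-\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}}{\sum_{h=1}^{L}W^{2}_{dh}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}} \qquad (10)$$
$$\beta_{3(dh)}=\frac{\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}S_{ydh}\lambda_{12}\right]-\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{3}_{xdh}\lambda_{03}\right]}{\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{4}_{xdh}\left(\lambda_{04}-1\right)\right]} \qquad (11)$$
$$\begin{aligned}
\min MSE(t^{*}_{cal})={}&\sum_{h=1}^{L}W^{2}_{dh}\left[\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{ydh}+\frac{(K_{dh}-1)}{n_{dh}}W_{dh2}S^{2}_{ydh2}\right]-2\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}\\
&+\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}^{2}_{dh}}{\bar{X}^{2}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}-\frac{\left[\sum_{h=1}^{L}W^{2}_{dh}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)\rho_{xy}S_{xdh}S_{ydh}-\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}\right]^{2}}{\sum_{h=1}^{L}W^{2}_{dh}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}}\\
&-\frac{\left[\sum_{h=1}^{L}W^{2}_{dh}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{2}_{xdh}S_{ydh}\lambda_{12}-\sum_{h=1}^{L}W^{2}_{dh}\frac{\bar{Y}_{dh}}{\bar{X}_{dh}}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{3}_{xdh}\lambda_{03}\right]^{2}}{\sum_{h=1}^{L}W^{2}_{dh}\left(\frac{1}{n_{dh}}-\frac{1}{N_{dh}}\right)S^{4}_{xdh}\left(\lambda_{04}-1\right)}
\end{aligned} \qquad (12)$$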
Equation (12) is the minimum variance for the proposed estimator
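The step from (9) to (10)–(12) relies on the fact that (9) is, in each of the two coefficients separately, a quadratic of the form $a-2b\beta+c\beta^{2}$ with $c>0$, whose minimizer is $\beta=b/c$ and whose minimum value is $a-b^{2}/c$. The Python sketch below illustrates this with hypothetical values of $a$, $b$ and $c$; these numbers and the function names are assumptions for illustration and do not come from the article's populations.

```python
def quadratic_mse(a, b, c, beta):
    """Value of a - 2*b*beta + c*beta**2 for a given coefficient beta."""
    return a - 2.0 * b * beta + c * beta ** 2


def minimise_quadratic(a, b, c):
    """Return the minimiser b/c and the minimum a - b**2/c (requires c > 0)."""
    beta_opt = b / c
    return beta_opt, a - b * b / c


# Hypothetical coefficients (illustrative only).
a, b, c = 10.0, 3.0, 2.0

beta_opt, min_mse = minimise_quadratic(a, b, c)
print("optimal beta:", beta_opt)
print("minimum value:", min_mse)

# Any other choice of beta gives a larger value.
for beta in (beta_opt - 0.5, beta_opt + 0.5):
    print(f"beta = {beta}: value = {quadratic_mse(a, b, c, beta)}")
```

In (10), for example, $b$ corresponds to the numerator and $c$ to the denominator, and the term subtracted in (12) is the matching $b^{2}/c$.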
Percentage relative efficiency of the estimators
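The percentage relative efficiency of the proposed estimator with respect to the existing estimators is given as:
$$PRE=\frac{MSE(P)}{MSE(E)}\times 100$$
where $P$ denotes the proposed estimator and $E$ an existing estimator, so that the proposed estimator itself has a PRE of 100.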
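As a worked illustration of this formula, the Python sketch below recomputes the PRE and the average MSE (AMSE) for one block of results. The MSE values are those reported in Table 7 for Case 1, Population 1, for the estimator of [5] and the proposed estimator; the dictionary keys and function names are illustrative assumptions.

```python
# MSE values for domains 1-4 (Case 1, Population 1), as reported in Table 7.
mse_case1_pop1 = {
    "t2j": [1950061, 74298, 817651, 367172],         # existing estimator of [5]
    "t_cal": [389212, 68914.02, 613475, 41078.01],   # proposed estimator
}


def amse(values):
    """Average mean square error across domains."""
    return sum(values) / len(values)


def pre(mse_proposed, mse_existing):
    """Percentage relative efficiency: MSE(P) / MSE(E) * 100."""
    return mse_proposed / mse_existing * 100.0


pre_t2j = [
    pre(p, e)
    for p, e in zip(mse_case1_pop1["t_cal"], mse_case1_pop1["t2j"])
]
amse_t2j = amse(mse_case1_pop1["t2j"])
amse_t_cal = amse(mse_case1_pop1["t_cal"])

print("PRE of t2j by domain:", [round(v, 5) for v in pre_t2j])
print("AMSE of t2j:", amse_t2j)
print("AMSE of t_cal:", round(amse_t_cal, 1))
```

Under this definition a PRE below 100 for an existing estimator means that its MSE exceeds that of the proposed estimator.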
Empirical study
We take the Swedish municipalities population MU284,16 (Appendix B). The population is geographically divided into eight parts (domains) 1, 2, 3, 4, 5, 6, 7 and 8, of sizes 25, 48, 32, 38, 56, 41, 15 and 29 respectively. However, we considered only the four domains 1, 3, 7 and 8, because these domains have few units compared with the others. The proposed estimator is a calibration estimator. Quantities such as ndh1 and ndh were computed from existing information on the populations. Each domain is then classified, for convenience, into two homogeneous strata: values below 1500 (millions of kronor) and values above 1500 (millions of kronor). We consider two cases of non-response, 1 and 2, in both Population I and Population II.
Case 1: The non-response rate is the same in both strata (1 and 2) and across the domains (approximately 30%).
Case 2: The non-response rates differ between strata 1 and 2 (approximately 20% and 40% respectively).
Population I
Y: Real estate values according to 1984 assessment (in millions of kronor).
X: Total number of municipal employees in 1984.
Population II
A second population is considered ([16], Appendix B), which is classified into four domains, each with strata 1 and 2 defined by revenues below 100 (millions of kronor) and revenues above 100 (millions of kronor).
Y: Revenues of 1985 municipal taxation assessment (in millions of kronor).
X: 1985 population (in thousands).
This discussion is based on the empirical analysis carried out and the results presented in Tables 1–8. From Table 7 (Populations I and II), with respect to single-stage sampling (MSE of estimators for the domain mean), the mean square error of the proposed estimator is less than the MSE of the existing estimators in all the domains for which estimates could be computed. This holds in both cases of non-response, where the non-response rate was uniform across the strata and where it was non-uniform, as specified in the data. The Average Mean Squared Errors (AMSE) also confirm the behavior of the MSE in both populations and cases. From Table 8 (Populations I and II), with respect to single-stage sampling, the Percentage Relative Efficiency (PRE) of the proposed estimator, kept at a benchmark of 100%, shows greater gains in efficiency than the existing estimators for all these domains.
| Domain size | 25 | 25 | 32 | 32 | 15 | 15 | 29 | 29 |
|---|---|---|---|---|---|---|---|---|
| Stratum | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| $N_{dh}$ | 2 | 23 | 12 | 20 | 2 | 13 | 18 | 11 |
| $W_{dh}$ | 0.080 | 0.920 | 0.375 | 0.625 | 0.133 | 0.867 | 0.620 | 0.379 |
| $\bar{Y}_{dh}$ | 955.50 | 6888 | 1056.9 | 3364 | 1231 | 4020 | 723.2 | 4799 |
| $\bar{X}_{dh}$ | 529 | 4385 | 485.4 | 1816 | 493 | 1694 | 354.1 | 2205 |
| $S^{2}_{Ydh}$ | 40.5 | 136775663 | 71715.5 | 4652460 | 135721 | 5643626 | 81863.7 | 10236271 |
| $S^{2}_{Xdh}$ | 101250 | 81259476 | 29239.7 | 2530489 | 162 | 2354475 | 18128.3 | 2902966 |
| $S_{XYdh}$ | -2025 | 104701660 | 34761.1 | 3144594 | -4689 | 2999017 | 13306 | 3557068 |
| $\rho_{XYdh}$ | -1.000 | 0.993 | 0.759 | 0.916 | -1.000 | 0.823 | 0.345 | 0.653 |

Table 1 Values of the parameters of the strata (1 and 2) and domains
Source: Statistical computation from original data 2023.
| Domain | Stratum | $S^{2}_{Ydh2}$ | $S^{2}_{Xdh2}$ | $S_{XYdh2}$ | $K_{dh}$ | $n_{dh2}$ | $W_{dh2}$ | $n_{dh1}$ | $n_{dh}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 3 | 0 | 0.3 | 0 | 0 |
| 1 | 2 | 223415888 | 132328227 | 171255299 | 2 | 4 | 0.3 | 10 | 14 |
| 2 | 1 | 120583 | 9862.3 | 28095.5 | 2 | 1 | 0.3 | 2 | 3 |
| 2 | 2 | 3977507 | 2517165 | 2669307 | 2 | 3 | 0.3 | 8 | 11 |
| 3 | 1 | 0 | 0 | 0 | 2 | 0 | 0.3 | 0 | 0 |
| 3 | 2 | 987699 | 1771955 | 1137277 | 3 | 1 | 0.3 | 3 | 4 |
| 4 | 1 | 79129 | 20311.8 | 8114.3 | 2 | 3 | 0.3 | 6 | 9 |
| 4 | 2 | 141512 | 5618 | -28196 | 2 | 1 | 0.3 | 1 | 2 |

Table 2 The parameter values of strata (1 and 2) for domains (1, 2, 3 and 4) in case 1
Source: Statistical computation from original data 2023.
| Domain | Stratum | $S^{2}_{Ydh2}$ | $S^{2}_{Xdh2}$ | $S_{XYdh2}$ | $k_{dh}$ | $n_{dh2}$ | $W_{dh2}$ | $n_{dh1}$ | $n_{dh}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 2 | 0 | 0.2 | 0 | 0 |
| 1 | 2 | 256872328 | 152987958 | 197463945 | 3 | 5 | 0.4 | 7 | 12 |
| 2 | 1 | 120583 | 9862.3 | 28095.5 | 2 | 1 | 0.2 | 2 | 3 |
| 2 | 2 | 3977507 | 2517165 | 2669307 | 3 | 4 | 0.4 | 7 | 11 |
| 3 | 1 | 0 | 0 | 0 | 2 | 0 | 0.2 | 0 | 0 |
| 3 | 2 | 987699 | 1771955 | 1137277 | 4 | 2 | 0.4 | 2 | 4 |
| 4 | 1 | 79129 | 20311.8 | 8114.3 | 2 | 2 | 0.2 | 7 | 9 |
| 4 | 2 | 141512 | 5618 | -28196 | 3 | 1 | 0.4 | 1 | 2 |

Table 3 The parameter values of strata (1 and 2) for domains (1, 2, 3 and 4) in case 2
Source: Statistical computation from original data 2023.
| Domain size | 25 | 25 | 32 | 32 | 15 | 15 | 29 | 29 |
|---|---|---|---|---|---|---|---|---|
| Stratum | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| $N_{dh}$ | 2 | 23 | 14 | 18 | 7 | 8 | 20 | 9 |
| $W_{dh}$ | 0.08 | 0.92 | 0.438 | 0.563 | 0.467 | 0.533 | 0.69 | 0.31 |
| $\bar{Y}_{dh}$ | 75.5 | 594 | 67.5 | 260.6 | 73 | 315 | 44.55 | 345.2 |
| $\bar{X}_{dh}$ | 9.00 | 67.1 | 10.643 | 34.5 | 10.714 | 40.63 | 6.55 | 41.89 |
| $S^{2}_{Ydh}$ | 840.5 | 1551426 | 275.96 | 41200.8 | 369.67 | 51631.7 | 187.21 | 54848.9 |
| $S^{2}_{Xdh}$ | 18.00 | 16649.6 | 4.555 | 544.97 | 6.905 | 731.13 | 5.103 | 681.61 |
| $S_{XYdh}$ | 123 | 160633.5 | 32.038 | 4559.147 | 49.5 | 6116.143 | 29.839 | 6076.778 |
| $\rho_{XYdh}$ | 1.00 | 0.999 | 0.904 | 0.962 | 0.98 | 0.995 | 0.965 | 0.994 |
| $\lambda_{12}$ | 0.003414 | 0.0000163 | 0.0004078 | 0.0000537 | 0.005061 | 0.0001168 | 0.0003394 | 0.0000987 |
| $\lambda_{03}$ | 0.001047 | 0.0000066 | 0.000086 | 0.0000156 | 0.000813 | 0.0000207 | 0.0000813 | 0.0000208 |
| $\lambda_{04}$ | 0.250000 | 0.5941080 | 0.407718 | 0.543903 | 0.001194 | 0.417192 | 0.221445 | 0.283596 |

Table 4 The parameter values of the strata for the domains (1, 2, 3 and 4)
| Domain | Stratum | $S^{2}_{Ydh2}$ | $S^{2}_{Xdh2}$ | $S_{XYdh2}$ | $k_{dh}$ | $n_{dh2}$ | $W_{dh2}$ | $n_{dh1}$ | $n_{dh}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 2 | 0 | 0.3 | 0 | 0 |
| 1 | 2 | 2478255 | 26533 | 256331 | 2 | 4 | 0.3 | 10 | 14 |
| 2 | 1 | 373.7 | 4.200 | 35.05 | 2 | 2 | 0.3 | 3 | 5 |
| 2 | 2 | 64541.6 | 875.25 | 7146.75 | 2 | 3 | 0.3 | 6 | 9 |
| 3 | 1 | 0 | 0 | 0 | 2 | 0 | 0.3 | 0 | 0 |
| 3 | 2 | 0 | 0 | 0 | 2 | 0 | 0.3 | 0 | 0 |
| 4 | 1 | 168.16 | 5.018 | 27.945 | 3 | 3 | 0.3 | 8 | 11 |
| 4 | 2 | 0 | 0 | 0 | 2 | 0 | 0.3 | 0 | 0 |

Table 5 The parameter values of strata (1 and 2) for domains (1, 2, 3 and 4) in case 1
Source: Statistical computation from original data 2023.
| Domain | Stratum | $S^{2}_{Ydh2}$ | $S^{2}_{Xdh2}$ | $S_{XYdh2}$ | $k_{dh}$ | $n_{dh2}$ | $W_{dh2}$ | $n_{dh1}$ | $n_{dh}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 2 | 0 | 0.2 | 0 | 0 |
| 1 | 2 | 2654913 | 28395.1 | 274463.7 | 3 | 5 | 0.4 | 8 | 13 |
| 2 | 1 | 176.25 | 1.333 | 9.667 | 2 | 1 | 0.2 | 3 | 4 |
| 2 | 2 | 64184.8 | 874.3 | 7069.214 | 2 | 3 | 0.4 | 5 | 8 |
| 3 | 1 | 0 | 0 | 0 | 2 | 0 | 0.2 | 0 | 0 |
| 3 | 2 | 0 | 0 | 0 | 3 | 0 | 0.4 | 0 | 0 |
| 4 | 1 | 169.778 | 5.511 | 30 | 2 | 2 | 0.2 | 8 | 10 |
| 4 | 2 | 0 | 0 | 0 | 3 | 0 | 0.4 | 0 | 0 |

Table 6 Parameter values of strata (1 and 2) for each domain in case 2
| Estimator | Domain 1 | Domain 2 | Domain 3 | Domain 4 | AMSE |
|---|---|---|---|---|---|
| Case 1 (Population 1) | | | | | |
| $t_{2j}$ | 1950061 | 74298 | 817651 | 367172 | 802295.5 |
| $T^{*}_{DG.st.-1.d}$ | 969486 | 1524915 | 8225052 | 7635549 | 4588751 |
| $t_{exp1}$ | 3388887 | 2662270 | 2108968 | 1580771 | 2435224 |
| $t^{*}_{cal}$ | 389212 | 68914.02 | 613475 | 41078.01 | 278169.8 |
| Case 2 (Population 1) | | | | | |
| $t_{2j}$ | 1724942 | 69614.41 | 771605.6 | 36752.6 | 650728.7 |
| $T^{*}_{DG.st.-1.d}$ | 531387 | 334565 | 501732.17 | 1024511 | 598048.8 |
| $t_{exp1}$ | 1190074 | 28456.45 | 270455 | 927116.3 | 604025.4 |
| $t^{*}_{cal}$ | 31623 | 2178.416 | 35028.05 | 32543.11 | 25343.14 |
| Case 1 (Population 2) | | | | | |
| $t_{2j}$ | 716146 | 115492 | - | 721 | 208089.8 |
| $T^{*}_{DG.st.-1.d}$ | 279503 | 68413 | - | 409 | 87081.25 |
| $t_{exp1}$ | 30790869 | 615419.7 | - | 246 | 7851634 |
| $t^{*}_{cal}$ | 222071 | 21236 | - | 169 | 60869 |
| Case 2 (Population 2) | | | | | |
| $t_{2j}$ | 11048.4 | 246.8 | - | 9 | 2826.05 |
| $T^{*}_{DG.st.-1.d}$ | 11989.8 | 639.6 | - | 313 | 3235.6 |
| $t_{exp1}$ | 97184 | 942 | - | 206 | 24583 |
| $t^{*}_{cal}$ | 10431 | 127.3 | - | 4.8 | 2640.775 |

Table 7 MSE of estimators for the domain mean in cases 1 and 2 (Populations 1 and 2)
Note: AMSE, average mean square error.
Source: Statistical computation from original data 2023.
| Estimator | D1 | D2 | D3 | D4 |
|---|---|---|---|---|
| Case 1 (Population 1) | | | | |
| $t_{2j}$ | 19.95897 | 92.75353 | 75.02895 | 11.18767 |
| $T^{*}_{DG.st.-1.d}$ | 40.14622 | 4.519204 | 7.458615 | 0.537984 |
| $t_{exp1}$ | 11.48495 | 2.588544 | 29.08887 | 2.598606 |
| $t^{*}_{cal}$ | 100 | 100 | 100 | 100 |
| Case 2 (Population 1) | | | | |
| $t_{2j}$ | 1.833279 | 3.12926 | 4.539631 | 88.54642 |
| $T^{*}_{DG.st.-1.d}$ | 5.95103 | 0.651119 | 6.981424 | 3.176453 |
| $t_{exp1}$ | 2.65723 | 7.655263 | 12.95153 | 3.510143 |
| $t^{*}_{cal}$ | 100 | 100 | 100 | 100 |
| Case 1 (Population 2) | | | | |
| $t_{2j}$ | 31.00918 | 18.38742 | 0 | 23.43967 |
| $T^{*}_{DG.st.-1.d}$ | 79.4521 | 31.04088 | 0 | 41.32029 |
| $t_{exp1}$ | 0.721224 | 3.450653 | 0 | 68.69919 |
| $t^{*}_{cal}$ | 100 | 100 | 0 | 100 |
| Case 2 (Population 2) | | | | |
| $t_{2j}$ | 94.41186 | 51.58023 | 0 | 53.33333 |
| $T^{*}_{DG.st.-1.d}$ | 86.99895 | 19.90306 | 0 | 1.533546 |
| $t_{exp1}$ | 10.73325 | 13.5138 | 0 | 2.330097 |
| $t^{*}_{cal}$ | 100 | 100 | 0 | 100 |

Table 8 PRE of the estimators for the domain mean in cases 1 and 2 (Populations 1 and 2)
This study develops the concept of a calibration estimator for ratio estimation and proposes calibration ratio estimators of the population mean in single-stage sampling. The study contributes to the theory of domain estimation, in stratified random sampling, of the population mean of the study variable with sub-sampling of the non-respondents when there is non-response in the study variable and the auxiliary variable is free from non-response.
The proposed class of estimators provides the opportunity for different known values of the domain population parameters of the auxiliary variable to be incorporated in constructing estimators in the presence of non-response using the concept of calibration. The study revealed that, while the first constraint simply requires the calibration weights to sum to one, the third constraint, which involves the stratum variance, also contributes immensely to the efficiency of the proposed estimator. Furthermore, with the adoption of the procedure of sub-sampling the non-respondents, even with a ratio estimator, the study has revealed that subjecting an estimator to conditions where the study variable is affected by non-response while the auxiliary variable is free of non-response has no effect on the mean estimate.
From the efficiency comparison and empirical work, it is evident that the calibration technique has paid off in providing estimates of the population mean, with sub-sampling of the non-respondents, that show greater gains in efficiency than the existing estimators. This will provide useful results to users of statistics and researchers working on economic data that require the use of auxiliary data either from records or from a previous survey.
However, Table 7 shows that it was impossible to compute estimates for domain 3 in both cases of Population II; hence the mean square error was not computed and the PRE was assigned a value of zero. This is the result of there being no sample units for either the respondents or the non-respondents, as indicated in Tables 5 and 6. Future research is encouraged in the light of this through the use of synthetic estimation techniques.
None.
The authors declare there is not any conflict of interest.
None.
©2024 Ikot, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.