Calibration for efficiency of ratio estimator in domains of study with sub-sampling the nonrespondents

doi:10.15406/bbij.2024.13.00413

In sample surveys, information is expected to be collected from all selected units, but in practice this is rarely possible because of non-response: some units may not respond or cannot be contacted during the survey period. This work focuses on domain estimation of the population mean with sub-sampling of the non-respondents. We consider calibration as a method of correcting for non-response in domains of study, by minimizing the chi-square distance between the weight of the main estimator and the calibrated weight subject to formulated constraints on the auxiliary variable. As a result, two estimators are proposed: a ratio estimator for the domain mean and a ratio estimator for double sampling. The bias and Mean Square Error (MSE) of the proposed estimators are derived.

We use an auxiliary variable to estimate the population mean, assuming that non-response is observed only on the study variable. The proposed and existing estimators were compared empirically, in terms of the MSE and Percentage Relative Efficiency (PRE), in domains with small numbers of sampling units, and two populations were considered. We considered two cases: one where non-response is uniform across the two strata at approximately 30%, and one where the non-response rates differ, at 20% and 40% in strata 1 and 2 respectively. The proposed estimators are more efficient than the existing estimators.

Keywords: auxiliary variable, calibration, non-response, ratio estimator, sub-sampling, domain

It is obvious that society cannot run effectively on the basis of hunches or trial and error. Decisions based on data provide better results than those based on intuition or gut feeling. Statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data, and it helps us to turn data into information. In modern society, the need for statistical information seems endless. In particular, data are regularly collected to satisfy the need for information about specified sets of elements, called a finite population. One of the most important modes of data collection for satisfying such needs is the sample survey, that is, a partial investigation of the finite population; on the basis of this partial (sample) information one tries to draw inferences about the finite population characteristics (parameters). A sample survey is less expensive than a complete enumeration, is usually less time consuming, and may even be more accurate. The term sample is used for the set of units, or portion of the aggregate of material, that has been selected in the belief that it will be representative of the whole aggregate. Sampling theory deals with scientific and objective procedures for choosing an appropriate sampling design, i.e. selecting a sample that is representative of the population as a whole, and also provides suitable estimation procedures for the population parameters. The most challenging aspect of sample representation of the population is the effect of non-response on the estimation of the population parameter. Different authors have suggested different techniques for obtaining a reliable and efficient estimator, among which is the calibration technique.

Calibration estimation in sample surveys has, since its introduction by Deville JC, et al.,¹ developed into an established theory and method for estimating finite population parameters. Calibration of weights is a technique that uses population data on auxiliary variables to improve estimates in sample surveys. If auxiliary data are available, some improvement in the precision of the estimates may be achieved; incorporating auxiliary data in the estimation process in this way is known as calibration. In stratified random sampling, the calibration approach is used to obtain optimum strata weights for improving the precision of survey estimates of population parameters. Koyuncu N, et al.² defined some calibration estimators in stratified random sampling for population characteristics, and Clement EP, et al.³ applied the concept of calibration estimators to domain totals in stratified random sampling. Clement EP, et al.⁴ combined some scalars with the mean of the auxiliary variable and proposed a calibration alternative ratio estimator of the mean in stratified sampling.

When a researcher is interested in obtaining information from a local or small area, estimation becomes challenging when the sample size is small in some of the areas of interest, and even more difficult when non-response occurs. Several authors have attempted to obtain reliable estimates in such areas of interest, popularly called domains of study. Among them is Godwin A, et al.,⁵ who consider modifications of some of the procedures for global ratio estimation in single-phase sampling with sub-sampling the non-respondents proposed by Rao P⁶ to obtain an estimate of the mean for a small domain that cuts across constituent strata of a population with unknown weights. The bias and mean-square error of each of the modified estimators were obtained for comparison. However, the estimators were not subjected to numerical tests to validate the analytical claims, most importantly in areas of small or zero sample sizes. Unlike in Rao P,⁶ the population mean of the auxiliary variable adopted by Godwin A, et al.⁵ is assumed to be unknown before the start of the survey, and hence double sampling was applied under stratified simple random sampling.

In a bid to improve the efficiency of estimators under non-response, refs.⁷,⁸ adopted the concept of calibration with a single constraint to estimate the population mean, and the result was encouraging. Cochran WG⁹ showed that knowledge of, of domain j that is of interest reduces the variance of the estimator of the domain mean in a single-phase simple random design. The reduction in variance is shown to be greater when the proportion of non-domain elements in the population is large and the study variable varies little among the domain elements. Ashutosh¹⁰ proposed estimators for the domain mean utilizing stratified sampling with non-response. The proposed estimator was compared to a direct ratio estimator for the domain mean utilizing stratified sampling with non-response. Clement EP, et al.⁴ stated that, in the presence of powerful auxiliary variables, calibration estimation meets the objective of reducing both non-response bias and sampling error. Etebong P¹¹ developed a new approach to ratio estimation that uses calibration weightings to produce a more efficient class of ratio estimators that do not depend on any optimality conditions for optimum performance. Iseh MJ, et al.,¹² Iseh MJ, et al.,¹³ Iseh MJ, et al.¹⁴ considered the challenges of population mean estimation in small areas characterized by small or no sample size and by unit non-response, and presented a calibration estimator that produces reliable estimates under stratified random sampling from a class of synthetic estimators, using the calibration approach with an alternative distance measure. To overcome the poor performance of the ratio estimator in small areas with small or no sample size as a result of non-response, this work considers the calibration approach using the constraints of an equal weights adjustment criterion and unbiased estimators of the population mean and variance of the auxiliary variable.

In this paper, based on the attempt by Godwin A, et al.⁵ who suggested the global ratio estimation in single-phase sampling with sub-sampling the non-respondents to obtain an estimate of mean for a small domain that cuts across constituent strata of a population with unknown weights, a new improved ratio estimator for population mean in stratified random sampling is suggested using the theory of calibration estimation with three constraints to achieve optimal precision and efficiency.

Some existing estimators and theoretical underpinnings

This section considers some existing ratio estimators for the domain population mean and the theoretical underpinnings of the proposed ratio estimator. Although not much has been done in the area of domains of study in the presence of non-response, probably because of the intricate nature of the estimation, this paper highlights some existing estimators applicable to domain estimation that use the concept of sub-sampling the non-respondents.

Some existing estimators

Study notations and definitions

$N =$ population size under study

$N_{d} =$ population size for the $d^{t h}$ domain

$N_{d h} =$ population size of $h^{t h}$ stratum in $d^{t h}$ domain

$n_{d h} =$ sample size for the $d^{t h}$ domain in the $h^{t h}$ stratum

$n_{d} =$ domain sample size

$n_{1 d h} =$ sample size for respondent units for the $d^{t h}$ domain in the $h^{t h}$ Stratum

$n_{2 d h} =$ Sample size for nonrespondents units for the $d^{t h}$ domain in the $h^{t h}$ Stratum

$W_{d h}^{*} =$ The calibration weight

$W_{d h} =$ Stratum weight

$W_{d h 1}$ = Response rate of the $d^{t h}$ domain in the $h^{t h}$ Stratum

$W_{d h 2} =$ Non-response rate of the $d^{t h}$ domain in the $h^{t h}$ Stratum

$λ_{1}, λ_{2}$ and $λ_{3}$ = the Lagrange multipliers

$X =$ Auxiliary variable

$Y =$ Study variable

${\bar{x}}_{d h} =$ Sample mean for the $d^{t h}$ domain in the $h^{t h}$ Stratum of the auxiliary variable

${\bar{y}}^{*}_{d h} =$ Unbiased estimator of the population mean for the $d^{t h}$ domain in the $h^{t h}$ Stratum of the study variable

${\bar{X}}_{d h} =$ Population mean for $d^{t h}$ domain of the auxiliary variable in the $h^{t h}$ Stratum

${\bar{Y}}_{d h} =$ Population mean for $d^{t h}$ domain of the study variable in the $h^{t h}$ Stratum

${\bar{X}}_{d} =$ Population mean for $d^{t h}$ domain of the auxiliary variable

${\bar{Y}}_{d} =$ Population mean for $d^{t h}$ domain of the study variable

$S_{y d h}^{2} =$ Mean square of the $d^{t h}$ domain in the $h^{t h}$ Stratum of the study variable

$S_{x d h}^{2} =$ Mean square of the $d^{t h}$ domain in the $h^{t h}$ Stratum of the auxiliary variable

$C_{x d h} =$ Coefficient of variation for the $d^{t h}$ domain in the $h^{t h}$ Stratum of the auxiliary variable

$C_{y d h} =$ Coefficient of variation for the $d^{t h}$ domain in the $h^{t h}$ Stratum of the study variable

$S_{y d h 2}^{2} =$ Mean square of the non-respondents of the $d^{t h}$ domain in the $h^{t h}$ Stratum of the study variable

$k_{d h} =$ Inverse sampling rate

$Q_{d h} =$ Tuning parameter

Udofia (2004) estimator

An alternative ratio estimator for the domain mean suggested by [5] is as follows:

$t_{2 j} = \sum_{h = 1}^{k} W_{h} \frac{{\bar{y}}_{h}^{*}}{{\bar{x}}_{h}} {\bar{X}}_{h}$ (1)

With

$B i a s (t_{2 j}) = \sum_{h = 1}^{k} W_{h} \frac{1 - f_{h}}{n_{h} {\bar{X}}_{h}} (R_{h} S_{x_{h}}^{2} - S_{x_{h} y_{h}})$

and

$M S E (t_{2 j}) = \sum_{h = 1}^{k} W_{h}^{2} [\frac{1 - f_{h}}{n_{h}} (S_{y h}^{2} + R_{h}^{2} S_{x_{h}}^{2} - 2 R_{h} S_{x_{h} y_{h}} + \frac{W_{2 h} (k - 1)}{n_{h}} S_{2 y_{h}}^{2})]$ (2)

where

$R_{h} = \frac{{\bar{Y}}_{h}}{{\bar{X}}_{h}}, f_{h} = (\frac{1}{n_{h}} - \frac{1}{N_{h}})$
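As a rough illustration of how estimator (1) is evaluated, the following Python sketch computes a stratum-weighted ratio estimate. It assumes that ${\bar{y}}_{h}^{*}$ takes the usual Hansen-Hurwitz form for sub-sampled non-respondents, $(n_{1} {\bar{y}}_{1} + n_{2} {\bar{y}}_{2}^{\prime}) / n$, which is not written out in the text; the function names and all numbers are hypothetical.

```python
def hansen_hurwitz_mean(n1, ybar1, n2, ybar2_sub):
    """Combine the respondent mean with the mean of the sub-sampled
    non-respondents (assumed Hansen-Hurwitz form, not taken from the paper)."""
    n = n1 + n2
    return (n1 * ybar1 + n2 * ybar2_sub) / n


def separate_ratio_estimate(strata):
    """Equation (1): sum over strata of W_h * (ybar*_h / xbar_h) * Xbar_h."""
    return sum(s["W"] * s["ybar_star"] / s["xbar"] * s["Xbar"] for s in strata)


# Hypothetical two-stratum illustration (made-up numbers).
strata = [
    {"W": 0.4, "ybar_star": hansen_hurwitz_mean(8, 100.0, 2, 90.0),
     "xbar": 50.0, "Xbar": 52.0},
    {"W": 0.6, "ybar_star": hansen_hurwitz_mean(6, 210.0, 4, 190.0),
     "xbar": 98.0, "Xbar": 100.0},
]
t2j = separate_ratio_estimate(strata)
print(round(t2j, 4))
```

Each stratum contributes its own ratio ${\bar{y}}_{h}^{*} / {\bar{x}}_{h}$ scaled by the known ${\bar{X}}_{h}$, so a stratum with a sample mean of x below its population mean is adjusted upwards.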

Pal and Singh HP estimator

Pal and Singh¹⁵ proposed a class of ratio-cum-ratio-type exponential estimators for the population mean with sub-sampling of the non-respondents. The estimator and its mean square error are given as:

$t_{p s 1} = α {\bar{y}}^{*} (\frac{\bar{X}}{\bar{x}}) + (1 - α) {\bar{y}}^{*} \exp (\frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}})$

And

$M S E (t_{p s 1}) = {\bar{Y}}^{2} (λ C_{y}^{2} (1 - ρ_{x y}^{2}) + \frac{W_{2} (Z - 1)}{n} C_{y (2)}^{2})$ (3)

Where

$W_{2} = \frac{n_{2}}{n}, λ = \frac{1 - f}{n}, f = \frac{n}{N}$ and α is a constant

Ashutosh estimator

Ashutosh¹⁰ proposed a direct ratio generalized estimator for the domain mean under stratified sampling with non-response:

$T_{D G . s t . β . d} = {\bar{y}}_{s t . d} {[\frac{{\bar{x}}_{s t . d}}{{\bar{X}}_{s t . d}}]}^{β}$

where β is a chosen constant, and the stratified sample means of y and x for the $d^{t h}$ domain can be written as

$\begin{array}{l} {\bar{y}}_{s t . d} = \sum_{h = 1}^{H} W_{h . d} {\bar{y}}_{h . d} \\ {\bar{x}}_{s t . d} = \sum_{h = 1}^{H} W_{h . d} {\bar{x}}_{h . d} \end{array}$

Members of the proposed class of estimators $T_{D G . s t . β . d}^{*}$ are:

$T_{D G . s t .0. a}^{*} = {\bar{y}}_{s t . a}^{*}$ if $β = 0$

$T_{D G . s t . - 1. a}^{*} = \frac{{\bar{y}}_{s t . a}^{*}}{{\bar{x}}_{s t . a}^{*}} {\bar{X}}_{s t . a}$ if $β = - 1$

$T_{D G . s t .1. a}^{*} = {\bar{y}}_{s t . a}^{*} \frac{{\bar{x}}_{s t . a}^{*}}{{\bar{X}}_{s t . a}}$ if $β = 1$

$T_{D G . s t .2. a}^{*} = {\bar{y}}_{s t . a}^{*} {[\frac{{\bar{x}}_{s t . a}^{*}}{{\bar{X}}_{s t . a}}]}^{2}$ if $β = 2$

The bias and mean square error of $T_{D G . s t . - 1. a}^{*}$ are given as:

$B i a s (T_{D G . s t . - 1. a}^{*}) = \sum_{h = 1}^{H} W_{h . a} {\bar{Y}}_{h . a} [\frac{N_{h . a} - n_{h . a}}{N_{h . a} n_{h . a}} C_{X h . a}^{2} + \frac{(g_{h . a} - 1) W_{2 h . a}}{n_{h . a}} C_{2 Y h . a}^{2}] - {\bar{Y}}_{a}$

$M S E (T_{D G . s t . - 1 a}^{*}) = \sum_{h = 1}^{H} W_{h . a}^{2} {\bar{Y}}_{h . a}^{2} [\begin{array}{l} \frac{N_{h . a} - n_{h . a}}{N_{h . a} n_{h a}} (C_{Y h . a}^{2} + C_{X h . a}^{2} - 2 C_{Y X h a}) + \\ \frac{(g_{h . a} - 1) W_{2 h . a}}{n_{h . a}} (C_{2 Y h . a}^{2} + C_{2 X h . a}^{2} - 2 C_{2 Y X h a}) \end{array}]$ (4)

Sampling design in single phase

Let $π = \{U_{1}, U_{2}, ..., U_{N}\}$ denote a finite population, the elements of which fall into L known strata with $N_{d h}$ elements in the $h^{t h}$ stratum, $h = 1, 2, ..., L$, $\sum_{h} N_{d h} = N_{d}$. It is assumed that π can also be partitioned according to the distribution of a variable Z into an exhaustive set of D sub-populations or domains of study, denoted by $\{A_{d}^{*}; d = 1, 2, ..., D\}$. Each stratum consists of a substratum of $N_{1 d h}$ respondents and a substratum of $N_{2 d h}$ non-respondents, $N_{1 d h} + N_{2 d h} = N_{d h}$ for all h. Let $A_{d h}^{*}$ denote the part of domain $d (A_{d}^{*})$ in stratum h and $N_{d h j}$ the unknown number of elements in $A_{d h}^{*}$. Let $y_{d h j}$ denote the value of characteristic Y for element j in $A_{d h}^{*}$.

Proposed estimator

Calibration has proven to be an estimation technique that smooths an existing estimator for better precision and improved efficiency. For household surveys and other economic data that require knowledge of supplementary information, a new ratio estimator is suggested to enhance efficiency in domains of study even in the presence of non-response. Motivated by the alternative ratio estimator for the domain mean in [5], we propose the following estimator:

$t_{c a l}^{*} = \sum_{h = 1}^{L} W_{d h}^{*} \frac{{\bar{y}}_{d h}^{*}}{{\bar{x}}_{d h}} {\bar{X}}_{d h}$ (5)

(5) can be written as

$t_{c a l}^{*} = \sum_{h = 1}^{L} W_{d h}^{*} {\bar{y}}_{d h r}$ (6)

where

${\bar{y}}_{d h r} = r_{d h} {\bar{X}}_{d h}$

and

$r_{d h} = \frac{{\bar{y}}_{d h}^{*}}{{\bar{x}}_{d h}}$, ${\bar{X}}_{d h}$ is assumed to be known, and $W_{d h}^{*}$ is the calibration weight, which adjusts the existing weight in the estimator of [5] by minimizing the chi-square distance measure

$φ = \sum_{h = 1}^{L} \frac{{(W_{d h}^{*} - W_{d h})}^{2}}{Q_{d h} W_{d h}}$

subject to the following constraints:

$\begin{array}{l} \sum_{h = 1}^{L} W_{d h}^{*} = 1 \\ \sum_{h = 1}^{L} W_{d h}^{*} {\bar{x}}_{d h} = \sum_{h = 1}^{L} W_{d h} {\bar{X}}_{d h} \\ \sum_{h = 1}^{L} W_{d h}^{*} s_{d h}^{2} = \sum_{h = 1}^{L} W_{d h} S_{d h}^{2} \end{array}$

Thus the optimization problem is given by:

$\begin{array}{l} φ = \sum_{h = 1}^{L} \frac{{(W_{d h}^{*} - W_{d h})}^{2}}{Q_{d h} W_{d h}} - 2 λ_{1} (\sum_{h = 1}^{L} W_{d h}^{*} - 1) - 2 λ_{2} (\sum_{h = 1}^{L} W_{d h}^{*} {\bar{x}}_{d h} - \sum_{h = 1}^{L} W_{d h} {\bar{X}}_{d h}) \\ - 2 λ_{3} (\sum_{h = 1}^{L} W_{d h}^{*} s_{d h}^{2} - \sum_{h = 1}^{L} W_{d h} S_{d h}^{2}) \end{array}$

where $λ_{1}, λ_{2}$ and $λ_{3}$ are the Lagrange multipliers such that

$\frac{\partial φ}{\partial W_{d h}^{*}} = \frac{2 (W_{d h}^{*} - W_{d h})}{Q_{d h} W_{d h}} - 2 λ_{1} - 2 λ_{2} {\bar{x}}_{d h} - 2 λ_{3} s_{d h}^{2} = 0$

$\Rightarrow W_{d h}^{*} = W_{d h} + Q_{d h} W_{d h} (λ_{1} + λ_{2} {\bar{x}}_{d h} + λ_{3} s_{d h}^{2})$

Substituting $W_{d h}^{*}$ in Eq. 6 gives

$\begin{array}{l} t_{c a l}^{*} = \sum_{h = 1}^{L} W_{d h} {\bar{y}}_{d h r} + β_{1 (d h)} (1 - \sum_{h = 1}^{L} W_{d h}) + β_{2 (d h)} (\sum_{h = 1}^{L} W_{d h} ({\bar{X}}_{d h} - {\bar{x}}_{d h})) \\ + β_{3 (d h)} (\sum_{h = 1}^{L} W_{d h} (S_{d h}^{2} - s_{d h}^{2})) \end{array}$ (7)

Where

Writing, for brevity,

$\begin{array}{l} a_{00} = \sum_{h = 1}^{L} Q_{d h} W_{d h}, \quad a_{01} = \sum_{h = 1}^{L} Q_{d h} W_{d h} {\bar{x}}_{d h}, \quad a_{02} = \sum_{h = 1}^{L} Q_{d h} W_{d h} s_{d h}^{2}, \\ a_{11} = \sum_{h = 1}^{L} Q_{d h} W_{d h} {\bar{x}}_{d h}^{2}, \quad a_{12} = \sum_{h = 1}^{L} Q_{d h} W_{d h} {\bar{x}}_{d h} s_{d h}^{2}, \quad a_{22} = \sum_{h = 1}^{L} Q_{d h} W_{d h} s_{d h}^{4}, \\ g_{0} = \sum_{h = 1}^{L} Q_{d h} W_{d h} {\bar{y}}_{d h r}, \quad g_{1} = \sum_{h = 1}^{L} Q_{d h} W_{d h} {\bar{x}}_{d h} {\bar{y}}_{d h r}, \quad g_{2} = \sum_{h = 1}^{L} Q_{d h} W_{d h} s_{d h}^{2} {\bar{y}}_{d h r}, \end{array}$

and

$Δ = a_{00} a_{11} a_{22} - a_{00} a_{12}^{2} - a_{01}^{2} a_{22} + 2 a_{01} a_{02} a_{12} - a_{02}^{2} a_{11}$

the coefficients are

$β_{1 (d h)} = \frac{g_{0} a_{11} a_{22} - g_{0} a_{12}^{2} - a_{01} g_{1} a_{22} + a_{01} g_{2} a_{12} + a_{02} g_{1} a_{12} - a_{02} g_{2} a_{11}}{Δ}$

$β_{2 (d h)} = \frac{a_{00} g_{1} a_{22} - a_{00} g_{2} a_{12} - g_{0} a_{01} a_{22} + g_{0} a_{02} a_{12} + a_{01} a_{02} g_{2} - a_{02}^{2} g_{1}}{Δ}$

$β_{3 (d h)} = \frac{a_{00} a_{11} g_{2} - a_{00} a_{12} g_{1} - a_{01}^{2} g_{2} + a_{01} a_{02} g_{1} + g_{0} a_{01} a_{12} - g_{0} a_{02} a_{11}}{Δ}$
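The calibrated weights can also be obtained numerically rather than through the closed-form coefficients. The sketch below, in plain Python, substitutes $W_{d h}^{*} = W_{d h} + Q_{d h} W_{d h} (λ_{1} + λ_{2} {\bar{x}}_{d h} + λ_{3} s_{d h}^{2})$ into the three constraints, solves the resulting 3×3 linear system for the Lagrange multipliers, and returns the weights. The function names, the choice $Q_{d h} = 1$ and the four-stratum data are hypothetical and serve only as an illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            # Happens, for example, when there are fewer strata than constraints.
            raise ValueError("singular calibration system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def calibrated_weights(W, Q, xbar, s2, Xbar, S2):
    """Return W* minimising the chi-square distance under the three constraints."""
    L = range(len(W))
    z = [[1.0, xbar[h], s2[h]] for h in L]  # one row (1, xbar_h, s2_h) per stratum
    A = [[sum(Q[h] * W[h] * z[h][i] * z[h][j] for h in L) for j in range(3)]
         for i in range(3)]
    b = [
        1.0 - sum(W),
        sum(W[h] * (Xbar[h] - xbar[h]) for h in L),
        sum(W[h] * (S2[h] - s2[h]) for h in L),
    ]
    lam = solve_linear(A, b)
    return [W[h] + Q[h] * W[h] * sum(lam[i] * z[h][i] for i in range(3)) for h in L]


# Hypothetical data for four strata (not taken from the paper).
W = [0.1, 0.2, 0.3, 0.4]
Q = [1.0, 1.0, 1.0, 1.0]
xbar = [10.0, 20.0, 30.0, 45.0]
s2 = [4.0, 9.0, 25.0, 16.0]
Xbar = [11.0, 19.0, 31.0, 44.0]
S2 = [5.0, 8.0, 24.0, 18.0]

W_star = calibrated_weights(W, Q, xbar, s2, Xbar, S2)
print([round(w, 4) for w in W_star])
```

Whatever the data, the returned weights should reproduce all three constraints up to rounding error, which gives a quick check on any implementation of the closed-form expressions.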

Bias and variance of the proposed estimator

From the proposed estimator above

Let

$e_{0} = \frac{({\bar{y}}_{d h}^{*} - {\bar{Y}}_{d h})}{{\bar{Y}}_{d h}}$ ,

$e_{1} = \frac{({\bar{x}}_{d h} - {\bar{X}}_{d h})}{{\bar{X}}_{d h}}$ ,

$e_{2} = \frac{(s_{x d h}^{2} - S_{x d h}^{2})}{S_{x d h}^{2}}$

where

${\bar{y}}_{d h}^{*} = \frac{\sum_{i = 1}^{n_{d h}} y_{d h i}^{*}}{n_{d h}}$ , ${\bar{x}}_{d h} = \frac{\sum_{i = 1}^{n_{d h}} x_{d h i}}{n_{d h}}$ , ${\bar{X}}_{d h} = \frac{\sum_{i = 1}^{N_{d h}} X_{d h i}}{N_{d h}}$ , $s_{d h}^{2} = s_{x d h}^{2} = \frac{\sum_{i = 1}^{n_{d h}} {(x_{d h i} - {\bar{x}}_{d h})}^{2}}{n_{d h} - 1}$ and $S_{d h}^{2} = S_{x d h}^{2} = \frac{\sum_{i = 1}^{N_{d h}} {(X_{d h i} - {\bar{X}}_{d h})}^{2}}{N_{d h} - 1}$

Also,

$\begin{array}{l} {\bar{y}}_{d h}^{*} = {\bar{Y}}_{d h} (1 + e_{0}) \\ {\bar{x}}_{d h} = {\bar{X}}_{d h} (1 + e_{1}) \\ s_{x d h}^{2} = S_{x d h}^{2} (1 + e_{2}) \end{array}$

Let

$\begin{array}{l} E [e_{0}] = E [e_{1}] = E [e_{2}] = 0 \\ E [e_{0}^{2}] = \frac{V a r ({\bar{y}}_{d h}^{*})}{{\bar{Y}}_{d h}^{2}} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) C_{y d h}^{2} + \frac{(K_{d h} - 1)}{n_{d h}} W_{d h 2} C_{y d h 2}^{2} \\ = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) \frac{S_{y d h}^{2}}{{\bar{Y}}_{d h}^{2}} + \frac{(K_{d h} - 1)}{n_{d h} {\bar{Y}}_{d h}^{2}} W_{d h 2} S_{y d h 2}^{2} \\ E [e_{1}^{2}] = \frac{V a r ({\bar{x}}_{d h})}{{\bar{X}}_{d h}^{2}} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) C_{x d h}^{2} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) \frac{S_{x d h}^{2}}{{\bar{X}}_{d h}^{2}} \\ E [e_{2}^{2}] = \frac{V a r (s_{x d h}^{2})}{S_{x d h}^{4}} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) (λ_{04} - 1) \\ E [e_{0} e_{1}] = \frac{C o v ({\bar{x}}_{d h}, {\bar{y}}_{d h}^{*})}{{\bar{X}}_{d h} {\bar{Y}}_{d h}} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} C_{y d h} C_{x d h} \\ = \frac{1}{{\bar{X}}_{d h} {\bar{Y}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h} \\ E [e_{1} e_{2}] = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) C_{x d h} λ_{03} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) \frac{S_{x d h}}{{\bar{X}}_{d h}} λ_{03} \end{array}$

where

$λ_{r s} = \frac{μ_{r s}}{μ_{20}^{r / 2} μ_{02}^{s / 2}}$

And

$\begin{array}{l} μ_{r s} = \frac{1}{N_{d h} - 1} \sum_{i = 1}^{N_{d h}} {(Y_{d h i} - {\bar{Y}}_{d h})}^{r} {(X_{d h i} - {\bar{X}}_{d h})}^{s} \\ μ_{20} = S_{y d h}^{2} \\ μ_{02} = S_{x d h}^{2} \end{array}$

Hence

$λ_{03} = \frac{μ_{03}}{μ_{20}^{0 / 2} μ_{02}^{3 / 2}}$

$E [e_{0} e_{2}] = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) C_{y d h} λ_{12} = (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) \frac{S_{y d h}}{{\bar{Y}}_{d h}} λ_{12}$

Where $λ_{12} = \frac{μ_{12}}{μ_{20}^{1 / 2} μ_{02}}$

${\bar{y}}_{d h r} = \frac{{\bar{y}}_{d h}^{*}}{{\bar{x}}_{d h}} {\bar{X}}_{d h}$

$= {\bar{Y}}_{d h} (1 + e_{0} - e_{1} + e_{1}^{2} - e_{0} e_{1})$
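The second-order expansion of ${\bar{y}}_{d h r} / {\bar{Y}}_{d h} = (1 + e_{0}) / (1 + e_{1})$ used above can be checked numerically. The short Python sketch below compares the exact ratio with the truncated series for small, hypothetical relative errors; the function names are illustrative only.

```python
def ratio_exact(e0, e1):
    """Exact value of (1 + e0) / (1 + e1)."""
    return (1.0 + e0) / (1.0 + e1)


def ratio_second_order(e0, e1):
    """Series truncated after second-order terms: 1 + e0 - e1 + e1^2 - e0*e1."""
    return 1.0 + e0 - e1 + e1 ** 2 - e0 * e1


# Hypothetical relative errors of a few percent.
e0, e1 = 0.02, -0.03
gap = abs(ratio_exact(e0, e1) - ratio_second_order(e0, e1))
gap_half = abs(ratio_exact(e0 / 2, e1 / 2) - ratio_second_order(e0 / 2, e1 / 2))
print(gap, gap_half)
```

The discrepancy is of third order in the errors, so halving both errors shrinks it by roughly a factor of eight, which is why terms of power greater than two are dropped in the bias and MSE derivations.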

To obtain the bias

$\begin{array}{l} B (t_{c a l}^{*}) = E [t_{c a l}^{*} - {\bar{Y}}_{d}] \\ = E [\begin{array}{l} \sum_{h = 1}^{L} W_{d h} [{\bar{Y}}_{d h} (1 + e_{0} - e_{1} + e_{1}^{2} - e_{0} e_{1})] \\ - β_{2 (d h)} \sum_{h = 1}^{L} W_{d h} {\bar{X}}_{d h} e_{1} - β_{3 (d h)} \sum_{h = 1}^{L} W_{d h} S_{x d h}^{2} e_{2} - {\bar{Y}}_{d} \end{array}] \\ = E [\begin{array}{l} \sum_{h = 1}^{L} W_{d h} [{\bar{Y}}_{d h} (e_{0} - e_{1} + e_{1}^{2} - e_{0} e_{1})] \\ - β_{2 (d h)} \sum_{h = 1}^{L} W_{d h} {\bar{X}}_{d h} e_{1} - β_{3 (d h)} \sum_{h = 1}^{L} W_{d h} S_{x d h}^{2} e_{2} \end{array}] \\ = \sum_{h = 1}^{L} W_{d h} {\bar{Y}}_{d h} [(E (e_{1}^{2}) - E (e_{0} e_{1}))] \\ B (t_{c a l}^{*}) = \sum_{h = 1}^{L} W_{d h} {\bar{Y}}_{d h} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) \frac{S_{x d h}^{2}}{{\bar{X}}_{d h}^{2}}] - \\ \sum_{h = 1}^{L} W_{d h} {\bar{Y}}_{d h} [\frac{1}{{\bar{X}}_{d h} {\bar{Y}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h}] \end{array}$ (8)

The mean square error is

$M S E (t_{c a l}^{*}) = E {[t_{c a l}^{*} - {\bar{Y}}_{d}]}^{2} = E {[\begin{array}{l} \sum_{h = 1}^{L} W_{d h} [{\bar{Y}}_{d h} (1 + e_{0} - e_{1} + e_{1}^{2} - e_{0} e_{1})] - \\ β_{2 (d h)} \sum_{h = 1}^{L} W_{d h} {\bar{X}}_{d h} e_{1} - β_{3 (d h)} \sum_{h = 1}^{L} W_{d h} S_{x d h}^{2} e_{2} - {\bar{Y}}_{d} \end{array}]}^{2}$

which, ignoring terms of power greater than 2, gives

$\begin{array}{l} M S E (t_{c a l}^{*}) = \sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{y d h}^{2} + \frac{(K_{d h} - 1)}{n_{d h}} W_{d h 2} S_{y d h 2}^{2}] \\ - 2 \sum_{h = 1}^{L} W_{d h}^{2} [\frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h}] - \\ 2 β_{2 d h} \sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h}] - \\ 2 β_{3 (d h)} \sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} S_{y d h} λ_{12}] \\ + \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}^{2}}{{\bar{X}}_{d h}^{2}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} + \\ 2 β_{2 d h} \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} + \\ 2 β_{3 (d h)} \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{3} λ_{03}] \\ + β_{2 d h}^{2} \sum_{h = 1}^{L} W_{d h}^{2} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} + \\ β_{3 d h}^{2} \sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{d h}^{4} (λ_{04} - 1)] \end{array}$ (9)

To obtain the minimum variance, we differentiate (9) partially with respect to $β_{2 (d h)}$ and $β_{3 (d h)}$ and set the derivatives to zero, such that

$β_{2 (d h)} = \frac{\sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h}] - \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2}}{\sum_{h = 1}^{L} W_{d h}^{2} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2}}$ (10)

$β_{3 (d h)} = \frac{\sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} S_{y d h} λ_{12}] - \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{3} λ_{03}]}{\sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{d h}^{4} (λ_{04} - 1)]}$ (11)

$\begin{array}{l} \min M S E (t_{c a l}^{*}) = \sum_{h = 1}^{L} W_{d h}^{2} [(\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{y d h}^{2} + \frac{(K_{d h} - 1)}{n_{d h}} W_{d h 2} S_{y d h 2}^{2}] \\ - 2 \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h} + \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}^{2}}{{\bar{X}}_{d h}^{2}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} \\ - \frac{{[\sum_{h = 1}^{L} W_{d h}^{2} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) ρ_{x y} S_{x d h} S_{y d h} - \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2}]}^{2}}{\sum_{h = 1}^{L} W_{d h}^{2} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2}} \\ - \frac{{[\sum_{h = 1}^{L} W_{d h}^{2} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{2} S_{y d h} λ_{12} - \sum_{h = 1}^{L} W_{d h}^{2} \frac{{\bar{Y}}_{d h}}{{\bar{X}}_{d h}} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{x d h}^{3} λ_{03}]}^{2}}{\sum_{h = 1}^{L} W_{d h}^{2} (\frac{1}{n_{d h}} - \frac{1}{N_{d h}}) S_{d h}^{4} (λ_{04} - 1)} \end{array}$ (12)

Equation (12) is the minimum variance for the proposed estimator.
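To make the minimisation step concrete, the following Python sketch evaluates the MSE in (9) term by term, computes $β_{2 (d h)}$ and $β_{3 (d h)}$ from (10) and (11), and confirms numerically that these values minimise (9). It follows (9) as printed, so no $β_{2 (d h)} β_{3 (d h)}$ cross term is included. All stratum parameters and the dictionary keys are hypothetical.

```python
from math import sqrt


def fpc(s):
    """The factor (1/n_dh - 1/N_dh)."""
    return 1.0 / s["n"] - 1.0 / s["N"]


def mse_eq9(strata, b2, b3):
    """Equation (9), summed over strata, for given beta_2 and beta_3."""
    total = 0.0
    for s in strata:
        g, w2 = fpc(s), s["W"] ** 2
        R = s["Ybar"] / s["Xbar"]
        Sx, Sy = sqrt(s["Sx2"]), sqrt(s["Sy2"])
        total += w2 * (g * s["Sy2"] + (s["K"] - 1) / s["n"] * s["W2"] * s["Sy22"])
        total -= 2 * w2 * R * g * s["rho"] * Sx * Sy
        total -= 2 * b2 * w2 * g * s["rho"] * Sx * Sy
        total -= 2 * b3 * w2 * g * s["Sx2"] * Sy * s["lam12"]
        total += w2 * R ** 2 * g * s["Sx2"]
        total += 2 * b2 * w2 * R * g * s["Sx2"]
        total += 2 * b3 * w2 * R * g * Sx ** 3 * s["lam03"]
        total += b2 ** 2 * w2 * g * s["Sx2"]
        total += b3 ** 2 * w2 * g * s["Sx2"] ** 2 * (s["lam04"] - 1)
    return total


def optimal_betas(strata):
    """Equations (10) and (11)."""
    num2 = den2 = num3 = den3 = 0.0
    for s in strata:
        g, w2 = fpc(s), s["W"] ** 2
        R = s["Ybar"] / s["Xbar"]
        Sx, Sy = sqrt(s["Sx2"]), sqrt(s["Sy2"])
        num2 += w2 * g * s["rho"] * Sx * Sy - w2 * R * g * s["Sx2"]
        den2 += w2 * g * s["Sx2"]
        num3 += w2 * g * s["Sx2"] * Sy * s["lam12"] - w2 * R * g * Sx ** 3 * s["lam03"]
        den3 += w2 * g * s["Sx2"] ** 2 * (s["lam04"] - 1)
    return num2 / den2, num3 / den3


# Hypothetical stratum parameters (not taken from the paper's tables).
strata = [
    {"W": 0.4, "n": 10, "N": 100, "Ybar": 50.0, "Xbar": 25.0, "Sy2": 100.0,
     "Sx2": 36.0, "rho": 0.8, "K": 2, "W2": 0.3, "Sy22": 90.0,
     "lam12": 0.5, "lam03": 0.4, "lam04": 3.0},
    {"W": 0.6, "n": 15, "N": 150, "Ybar": 80.0, "Xbar": 40.0, "Sy2": 225.0,
     "Sx2": 64.0, "rho": 0.7, "K": 2, "W2": 0.3, "Sy22": 200.0,
     "lam12": 0.6, "lam03": 0.3, "lam04": 3.2},
]
b2, b3 = optimal_betas(strata)
min_mse = mse_eq9(strata, b2, b3)
print(b2, b3, min_mse)
```

Because (9) is a separate quadratic in each coefficient, moving either coefficient away from its value in (10) or (11) can only increase the computed MSE.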

Percentage relative efficiency of the estimators

The percentage relative efficiency of the proposed estimators with respect to the existing estimators is given as:

$P R E = \frac{M S E (P)}{M S E (E)} \times 100$

Empirical study

We use the MU284 population of Swedish municipalities¹⁶ (appendix B). The population is geographically sub-divided into eight parts (domains) 1, 2, 3, 4, 5, 6, 7 and 8, with sizes 25, 48, 32, 38, 56, 41, 15 and 29 respectively. However, we consider only four domains, 1, 3, 7 and 8, because these domains have fewer units than the others. The proposed estimator is a calibration estimator. Quantities such as $n_{d h 1}$ and $n_{d h}$ were computed from existing information on the populations. Each domain is then classified, for convenience, into two homogeneous strata: values below 1500 (millions of kronor) and values above 1500 (millions of kronor). We consider two cases, 1 and 2, of non-response (in both Population I and Population II).

Case 1: Non-response occurs at the same rate, approximately 30%, in both strata (1 and 2) of each domain.

Case 2: The non-response rates differ between the strata, at approximately 20% in stratum 1 and 40% in stratum 2.

Population I

Y: Real estate values according to 1984 assessment (in millions of kronor).

X: Total number of municipal employees in 1984.

Population II

Another population is considered ([16] appendix B), which is classified into four domains with strata 1 and 2 according to revenues below 100 (in millions of kronor) and revenues above 100 (in millions of kronor).

Y: Revenues of 1985 municipal taxation assessment (in millions of kronor).

X: 1985 population (in thousands).

This discussion is based on the empirical analysis carried out and the results presented in Tables 1–7. From Table 7 (Populations I and II), with respect to single stage sampling (MSE of estimators for domain mean), it is observed that the mean square error of the proposed estimator is less than the MSE of the existing estimators in all the domains. This is seen in both cases of non-response, where the non-response rate was uniform across the strata and where it was non-uniform as specified in the data. The Average Mean Squared Errors (AMSE) also confirm the behavior of the MSE in both populations and cases. From Table 8 (Populations I and II), with respect to single stage sampling, it is observed that the proposed estimators, with the Percentage Relative Efficiency (PRE) kept at a benchmark of 100%, had greater gains in efficiency than the existing estimators for all the domains.

Domain Parameter	Domain
Domain Size	25		32		15		29
Stratum	1	2	1	2	1	2	1	2
$N_{d h}$	2	23	12	20	2	13	18	11
$W_{d h}$	0.080	0.920	0.375	0.625	0.133	0.867	0.620	0.379
${\bar{Y}}_{d h}$	955.50	6888	1056.9	3364	1231	4020	723.2	4799
${\bar{X}}_{d h}$	529	4385	485.4	1816	493	1694	354.1	2205
$S_{Y d h}^{2}$	40.5	136775663	71715.5	4652460	135721	5643626	81863.7	10236271
$S_{X d h}^{2}$	101250	81259476	29239.7	2530489	162	2354475	18128.3	2902966
$S_{X Y d h}$	-2025	104701660	34761.1	3144594	-4689	2999017	13306	3557068
$ρ_{X Y d h}$	-1.000	0.993	0.759	0.916	-1.000	0.823	0.345	0.653

Table 1 Parameter values of the strata (1 and 2) and domains (Population I)
Source: Statistical computation from original data 2023.
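
As a consistency check on Table 1, the weight and correlation rows can be reproduced from the other rows. The sketch below assumes the standard definitions $W_{d h} = N_{d h}/N_{d}$ and $ρ_{X Y d h} = S_{X Y d h}/\sqrt{S_{X d h}^{2} S_{Y d h}^{2}}$; the function names are illustrative and do not come from the source.

```python
from math import sqrt


def stratum_weight(n_stratum: int, n_domain: int) -> float:
    """Stratum weight W_dh = N_dh / N_d (assumed standard definition)."""
    return n_stratum / n_domain


def correlation(s_xy: float, s_x2: float, s_y2: float) -> float:
    """Correlation rho_XYdh = S_XYdh / sqrt(S2_Xdh * S2_Ydh)."""
    return s_xy / sqrt(s_x2 * s_y2)


# Domain of size 32, stratum 1: values copied from Table 1.
w = stratum_weight(12, 32)
rho = correlation(34761.1, 29239.7, 71715.5)
print(round(w, 3), round(rho, 3))
```

Under these assumed definitions, the printed values can be compared with the $W_{d h}$ and $ρ_{X Y d h}$ entries for that stratum in Table 1.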

Domain	Strata	$S_{Y d h 2}^{2}$	$S_{X d h 2}^{2}$	$S_{X Y d h 2}$	$k_{d h}$	$n_{d h 2}$	$W_{d h 2}$	$n_{d h 1}$	$n_{d h}$
1	1	0	0	0	3	0	0.3	0	0
	2	223415888	132328227	171255299	2	4	0.3	10	14
2	1	120583	9862.3	28095.5	2	1	0.3	2	3
	2	3977507	2517165	2669307	2	3	0.3	8	11
3	1	0	0	0	2	0	0.3	0	0
	2	987699	1771955	1137277	3	1	0.3	3	4
4	1	79129	20311.8	8114.3	2	3	0.3	6	9
	2	141512	5618	-28196	2	1	0.3	1	2

Table 2 Parameter values of the strata (1 and 2) for domains (1, 2, 3 and 4) in case 1 (Population I)
Source: Statistical computation from original data 2023.

Domain	Strata	$S_{Y d h 2}^{2}$	$S_{X d h 2}^{2}$	$S_{X Y d h 2}$	$k_{d h}$	$n_{d h 2}$	$W_{d h 2}$	$n_{d h 1}$	$n_{d h}$
1	1	0	0	0	2	0	0.2	0	0
	2	256872328	152987958	197463945	3	5	0.4	7	12
2	1	120583	9862.3	28095.5	2	1	0.2	2	3
	2	3977507	2517165	2669307	3	4	0.4	7	11
3	1	0	0	0	2	0	0.2	0	0
	2	987699	1771955	1137277	4	2	0.4	2	4
4	1	79129	20311.8	8114.3	2	2	0.2	7	9
	2	141512	5618	-28196	3	1	0.4	1	2

Table 3 Parameter values of the strata (1 and 2) for domains (1, 2, 3 and 4) in case 2 (Population I)
Source: Statistical computation from original data 2023.
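
The respondent and non-respondent counts in Tables 2 and 3 appear consistent with splitting each stratum sample $n_{d h}$ by the non-response rate $W_{d h 2}$ and rounding to the nearest whole unit. The sketch below illustrates that reading. The rounding rule is an assumption inferred from the tabulated values, not a procedure stated in the text, and `split_sample` is a hypothetical helper name.

```python
from typing import Tuple


def split_sample(n_dh: int, nonresponse_pct: int) -> Tuple[int, int]:
    """Return (n_dh1, n_dh2): respondents and non-respondents in a stratum sample.

    Assumption: n_dh2 is n_dh times the non-response rate, rounded half up.
    The rate is given as an integer percentage so that the arithmetic stays
    in integers and avoids floating-point rounding surprises.
    """
    n_dh2 = (nonresponse_pct * n_dh + 50) // 100
    return n_dh - n_dh2, n_dh2


# Domain 1, stratum 2: n_dh = 14 at 30% (Table 2) and n_dh = 12 at 40% (Table 3).
print(split_sample(14, 30))
print(split_sample(12, 40))
```

A stratum with no sampled units returns `(0, 0)`, matching the all-zero rows in the tables.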

Domain Parameter	Domain
Domain Size	25		32		15		29
Stratum	1	2	1	2	1	2	1	2
$N_{d h}$	2	23	14	18	7	8	20	9
$W_{d h}$	0.08	0.92	0.438	0.563	0.467	0.533	0.69	0.31
${\bar{Y}}_{d h}$	75.5	594	67.5	260.6	73	315	44.55	345.2
${\bar{X}}_{d h}$	9.00	67.1	10.643	34.5	10.714	40.63	6.55	41.89
$S_{Y d h}^{2}$	840.5	1551426	275.96	41200.8	369.67	51631.7	187.21	54848.9
$S_{X d h}^{2}$	18.00	16649.6	4.555	544.97	6.905	731.13	5.103	681.61
$S_{X Y d h}$	123	160633.5	32.038	4559.147	49.5	6116.143	29.839	6076.778
$ρ_{X Y d h}$	1.00	0.999	0.904	0.962	0.98	0.995	0.965	0.994
$λ_{12}$	0.003414	0.0000163	0.0004078	0.0000537	0.005061	0.0001168	0.0003394	0.0000987
$λ_{03}$	0.001047	0.0000066	0.000086	0.0000156	0.000813	0.0000207	0.0000813	0.0000208
$λ_{04}$	0.250000	0.5941080	0.407718	0.543903	0.001194	0.417192	0.221445	0.283596

Table 4 Parameter values of the strata (1 and 2) and domains (Population II)

Domain	Strata	$S_{Y d h 2}^{2}$	$S_{X d h 2}^{2}$	$S_{X Y d h 2}$	$k_{d h}$	$n_{d h 2}$	$W_{d h 2}$	$n_{d h 1}$	$n_{d h}$
1	1	0	0	0	2	0	0.3	0	0
	2	2478255	26533	256331	2	4	0.3	10	14
2	1	373.7	4.200	35.05	2	2	0.3	3	5
	2	64541.6	875.25	7146.75	2	3	0.3	6	9
3	1	0	0	0	2	0	0.3	0	0
	2	0	0	0	2	0	0.3	0	0
4	1	168.16	5.018	27.945	3	3	0.3	8	11
	2	0	0	0	2	0	0.3	0	0

Table 5 Parameter values of the strata (1 and 2) for domains (1, 2, 3 and 4) in case 1 (Population II)
Source: Statistical computation from original data 2023.

Domain	Strata	$S_{Y d h 2}^{2}$	$S_{X d h 2}^{2}$	$S_{X Y d h 2}$	$k_{d h}$	$n_{d h 2}$	$W_{d h 2}$	$n_{d h 1}$	$n_{d h}$
1	1	0	0	0	2	0	0.2	0	0
	2	2654913	28395.1	274463.7	3	5	0.4	8	13
2	1	176.25	1.333	9.667	2	1	0.2	3	4
	2	64184.8	874.3	7069.214	2	3	0.4	5	8
3	1	0	0	0	2	0	0.2	0	0
	2	0	0	0	3	0	0.4	0	0
4	1	169.778	5.511	30	2	2	0.2	8	10
	2	0	0	0	3	0	0.4	0	0

Table 6 Parameter values of the strata (1 and 2) for domains (1, 2, 3 and 4) in case 2 (Population II)

Estimator	D₁	D₂	D₃	D₄	AMSE
	Case 1 (Population I)
$t_{2 j}$	1950061	74298	817651	367172	802295.5
$T_{D G . s t . - 1. d}^{*}$	969486	1524915	8225052	7635549	4588751
$t_{\exp 1}$	3388887	2662270	2108968	1580771	2435224
$t_{c a l}^{*}$	389212	68914.02	613475	41078.01	278169.8
	Case 2 (Population I)
$t_{2 j}$	1724942	69614.41	771605.6	36752.6	650728.7
$T_{D G . s t . - 1. d}^{*}$	531387	334565	501732.17	1024511	598048.8
$t_{\exp 1}$	1190074	28456.45	270455	927116.3	604025.4
$t_{c a l}^{*}$	31623	2178.416	35028.05	32543.11	25343.14
	Case 1 (Population II)
$t_{2 j}$	716146	115492	-	721	208089.8
$T_{D G . s t . - 1. d}^{*}$	279503	68413	-	409	87081.25
$t_{\exp 1}$	30790869	615419.7	-	246	7851634
$t_{c a l}^{*}$	222071	21236	-	169	60869
	Case 2 (Population II)
$t_{2 j}$	11048.4	246.8	-	9	2826.05
$T_{D G . s t . - 1. d}^{*}$	11989.8	639.6	-	313	3235.6
$t_{\exp 1}$	97184	942	-	206	24583
$t_{c a l}^{*}$	10431	127.3	-	4.8	2640.775

Table 7 MSE of the estimators for the domain mean in cases 1 and 2 (Populations I and II)
Note: AMSE, average mean square error.
Source: Statistical computation from original data 2023.
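
The AMSE column of Table 7 can be reproduced from the domain MSEs. The sketch below is a reading of the tabulated values rather than a definition stated in the text: in the Population II rows, where domain 3 has no MSE, the reported AMSE appears to be the sum of the three available MSEs still divided by four. The function name and the use of `None` for a missing entry are illustrative choices.

```python
from typing import Optional, Sequence


def amse(mses: Sequence[Optional[float]], n_domains: int = 4) -> float:
    """Average MSE over the domains.

    A missing MSE is passed as None and contributes nothing to the sum;
    the divisor stays at n_domains (an assumption read off Table 7).
    """
    return sum(m for m in mses if m is not None) / n_domains


# Row t_2j, Case 1, Population I.
print(amse([1950061, 74298, 817651, 367172]))
# Row t_cal*, Case 1, Population II (domain 3 not computed).
print(amse([222071, 21236, None, 169]))
```

If the average were taken over the three computed domains only, the Population II figures would be larger by a factor of 4/3; the ranking of the estimators within each case would not change.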

Estimator	D₁	D₂	D₃	D₄
	Case 1 (Population I)
$t_{2 j}$	19.95897	92.75353	75.02895	11.18767
$T_{D G . s t . - 1. d}^{*}$	40.14622	4.519204	7.458615	0.537984
$t_{\exp 1}$	11.48495	2.588544	29.08887	2.598606
$t_{c a l}^{*}$	100	100	100	100
	Case 2 (Population I)
$t_{2 j}$	1.833279	3.12926	4.539631	88.54642
$T_{D G . s t . - 1. d}^{*}$	5.95103	0.651119	6.981424	3.176453
$t_{\exp 1}$	2.65723	7.655263	12.95153	3.510143
$t_{c a l}^{*}$	100	100	100	100
	Case 1 (Population II)
$t_{2 j}$	31.00918	18.38742	0	23.43967
$T_{D G . s t . - 1. d}^{*}$	79.4521	31.04088	0	41.32029
$t_{\exp 1}$	0.721224	3.450653	0	68.69919
$t_{c a l}^{*}$	100	100	0	100
	Case 2 (Population II)
$t_{2 j}$	94.41186	51.58023	0	53.33333
$T_{D G . s t . - 1. d}^{*}$	86.99895	19.90306	0	1.533546
$t_{\exp 1}$	10.73325	13.5138	0	2.330097
$t_{c a l}^{*}$	100	100	0	100

Table 8 PRE of the estimators for the domain mean in cases 1 and 2 (Populations I and II)
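
The entries of Table 8 appear to follow from Table 7 as $\text{PRE}(t) = 100 \times \text{MSE}(t_{c a l}^{*}) / \text{MSE}(t)$, so that the proposed estimator sits at 100 and a less efficient estimator falls below it. The sketch below assumes that form, which is inferred from the tabulated numbers; `pre` is an illustrative name.

```python
def pre(mse_proposed: float, mse_other: float) -> float:
    """Percentage relative efficiency of an estimator against the proposed one.

    Assumed form: 100 * MSE(proposed) / MSE(other). Values below 100 mean
    the other estimator has the larger MSE.
    """
    return 100.0 * mse_proposed / mse_other


# t_2j in domain 1, Case 1, Population I (MSEs copied from Table 7).
print(round(pre(389212, 1950061), 5))
# The proposed estimator compared with itself gives the 100% benchmark.
print(pre(389212, 389212))
```

Under this convention the benchmark row is always 100, which is why the comparison in Table 8 reads as a shortfall of the existing estimators rather than as a gain above 100 for the proposed one.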

This study develops the concept of a calibration estimator for ratio estimation and proposes calibration ratio estimators of the population mean in single-stage sampling. It contributes to the theory of domain estimation of the population mean of the study variable in stratified random sampling with sub-sampling of the non-respondents, when the study variable is subject to non-response and the auxiliary variable is free from it.

The proposed class of estimators provides an opportunity to incorporate different known values of the domain population parameters of the auxiliary variable when constructing estimators in the presence of non-response, using the concept of calibration. The study revealed that the first constraint simply requires the calibration weights to sum to one, and that the third constraint, which involves the stratum variance, also contributes substantially to the efficiency of the proposed estimator. Furthermore, with the adoption of sub-sampling of the non-respondents, even with a ratio estimator, the study has revealed that subjecting an estimator to conditions where the study variable is affected by non-response while the auxiliary variable is free of non-response has no effect on the mean estimate.

From the efficiency comparison and the empirical work, it is evident that the calibration technique has paid off: it provides estimates of the population mean with sub-sampling of the non-respondents that are more efficient than those of the existing estimators. These results should be useful to users of statistics and to researchers working with economic data that require auxiliary data, either from records or from a previous survey.

However, Table 7 shows that estimates could not be computed for domain 3 in either case of Population II, and hence the mean square error was not computed. As a result, the PRE was assigned a value of zero. This is because no units were sampled in that domain, for either the respondents or the non-respondents, as indicated in Tables 5 and 6. Future research is encouraged in this direction through the use of synthetic estimation techniques.

Acknowledgments: None.

The authors declare that there is no conflict of interest.

Funding: None.

1. Deville JC, Särndal CE. Calibration estimators in survey sampling. JASA. 1992;87:376–382.
2. Koyuncu N, Kadilar C. Calibration estimators using different measures in stratified random sampling. International Journal of Modern Engineering Research. 2013;3(1):415–419.
3. Clement EP, Udofia GA, Enang EI. Sample design for domain calibration estimators. International Journal of Probability and Statistics. 2014;3(1):8–14.
4. Clement EP, Enang EI. Calibration approach alternative ratio estimator for population mean in stratified sampling. International Journal of Statistics and Economics. 2015;16(1):83–93.
5. Udofia GA. Ratio estimation for small domains with subsampling the non-respondents: an application of Rao strategy. Statistics in Transition. 2004;6(5):713–724.
6. Rao PSRS. Ratio estimation with sub-sampling the non-respondents. Survey Methodology. 1986;12:217–230.
7. Iseh MJ, Bassey MO. Calibration estimators for population mean with subsampling the nonrespondents under stratified sampling. Science Journal of Applied Mathematics and Statistics. 2022;10(4):45–56.
8. Iseh M, Bassey M. Smoothing of estimators of population mean using calibration technique with sample errors. Journal of Modern Applied Statistical Methods. 2024;23(1).
9. Cochran WG. Sampling Techniques. 3rd ed. New York: Wiley; 1977.
10. Ashutosh. Estimator of domain mean using stratified sampling in the presence of non-response. Sri Lankan Journal of Applied Statistics. 2021;22(1):13–29.
11. Clement EP, Inyang EJ. Improving the efficiency of ratio estimators by calibration weightings. International Journal of Statistics and Mathematics. 2021;8(1):164–172.
12. Iseh MJ, Bassey KJ. A new calibration estimator of population mean for small area with nonresponse. Asian Journal of Probability and Statistics. 2021a;12(2):14–51.
13. Iseh MJ, Bassey KJ. Calibration estimator for population mean in small sample size with non-response. European Journal of Statistics and Probability. 2021b;9(1):32–42.
14. Iseh MJ, Enang EI. A calibration synthetic estimator of population mean in small area under stratified sampling design. Statistics in Transition new series. 2021;22(3):15–30.
15. Pal SK, Singh HP. A class of ratio-cum-ratio-type exponential estimators for population mean with subsampling the non-respondents. Jordan Journal of Mathematics and Statistics. 2017;10(1):73–94.
16. Särndal CE, Swensson B, Wretman J. Model-assisted surveys. New York: Springer-Verlag; 1992.

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Ikot Ekemini E,

Iseh Matthew Joshua