On inference of partially correlated data

doi:10.15406/bbij.2015.02.00019

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Editorial Volume 2 Issue 1

Hani Samawi,

Robert Vogel

Department of Biostatistics, Georgia Southern University, USA

Correspondence: Hani Samawi, Department of Biostatistics, JPHCOPH, Georgia Southern University, Statesboro, GA 30460, USA, Tel 912-478-1345, Fax 912-478-5811

Received: January 22, 2015 | Published: January 26, 2015

Citation: Samawi H, Vogel R. On inference of partially correlated data. Biom Biostat Int J. 2015;2(1):5-6. DOI: 10.15406/bbij.2015.02.00019

Editorial

The need for statistical inferential methods in the social, behavioral, economic, biological, medical, epidemiologic, health, public health, and drug development sciences has grown exponentially in the last few decades. Study designs in these applied sciences give rise to correlated data and, when responses are missing, to partially correlated data. For instance, such data arise when subjects are matched to controls because of confounding factors and there are missing values in either or both groups. Other situations arise when subjects are measured repeatedly over time, as in repeated measures designs. One assumption to consider is that observations are missing completely at random (MCAR).¹,² However, Akritas et al.³ consider another missing value mechanism, missing at random (MAR). For quantitative responses, statistical methods, including linear and nonlinear models, are well established for correlated data. For partially correlated data, however, there are concerns that still need to be addressed because of the complexity of the analysis, in particular for small sample sizes and when the normality assumption for the underlying populations is not valid.

As an example of partially correlated data under the MCAR design, consider a researcher who compares two treatment regimens for eye redness or allergy and randomly assigns one treatment to each eye of each experimental subject. Some patients may drop out after the first treatment, while others may drop out before the first treatment and come back for the second. In this situation there are two groups of patients: the first group received both treatments, one in each eye, and provides paired (matched) data; the second group received only one of the treatments, in one eye, and provides unmatched data.
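The two groups in the eye example can be separated mechanically once the data are laid out one record per subject. The following is a minimal sketch, not a procedure from the editorial: it assumes each subject is stored as a pair of responses with `None` marking a missing value, and the function name `split_partially_paired` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One record per subject: (response to treatment A, response to treatment B).
# None marks a missing response (an assumed encoding for this illustration).
Record = Tuple[Optional[float], Optional[float]]


def split_partially_paired(
    records: Sequence[Record],
) -> Tuple[List[Tuple[float, float]], List[float], List[float]]:
    """Split records into complete pairs and the two unmatched samples."""
    paired: List[Tuple[float, float]] = []
    only_a: List[float] = []
    only_b: List[float] = []
    for a, b in records:
        if a is not None and b is not None:
            paired.append((a, b))   # subject received both treatments
        elif a is not None:
            only_a.append(a)        # subject has treatment A only
        elif b is not None:
            only_b.append(b)        # subject has treatment B only
        # subjects with no response at all contribute nothing
    return paired, only_a, only_b
```

The complete pairs form the correlated part of the data, and the two remaining lists form independent samples; the procedures discussed in this editorial differ in how they combine these parts.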

Additional examples of partially correlated data can be found in the literature.⁴–⁶ Several authors have presented tests for the problem of estimating the difference of the means of a bivariate normal distribution when some observations on either variable are missing. Under the assumption of bivariate normality and MCAR, Ekbohm⁷ summarized five procedures for testing the equality of two means. Using Monte Carlo results, Ekbohm⁷ indicated that two tests based on a modified maximum likelihood estimator are preferred: one due to Lin and Stivers⁸ when the number of complete pairs is large, and the other, proposed in Ekbohm's paper, otherwise, provided the variances of the two responses do not differ substantially. When the correlation coefficient between the two responses is small, two other tests may be used: a test proposed by Ekbohm when the homoscedasticity assumption is not strongly violated, and otherwise a Welch-type statistic suggested by Lin and Stivers⁸ (for further discussion, see Ekbohm⁷).

Alternatively, researchers tend to ignore some of the data, either the correlated or the uncorrelated observations, depending on the size of each subset. However, when the data are missing completely at random (MCAR), Looney and Jones⁹ argued that ignoring some of the correlated observations would bias the estimate of the variance of the difference in treatment means and would dramatically affect the performance of the statistical test in terms of controlling type I error rates and statistical power.¹⁰ They propose a corrected z-test to overcome the problems created by ignoring some of the correlated observations. However, our preliminary investigation shows that the method of Looney and Jones⁹ pertains to large samples and is not the most powerful test procedure. Furthermore, Rempala & Looney¹¹ studied the asymptotic properties of a two-sample randomized test for partially dependent data. They indicated that a linear combination of randomized t-tests is asymptotically valid and can be used for non-normal data. However, large-sample permutation tests are difficult to perform and have some optimal asymptotic properties only in the Gaussian family of distributions when the correlation between the paired observations is positive. Other researchers, such as Xu & Harra¹² and Konietschke et al.,¹³ also discuss the problem for continuous variables, including the normal distribution, by using weighted statistics. However, the procedure suggested by Xu & Harra¹² is a functional smoothing of the Looney & Jones⁹ procedure. As such, the Xu & Harra procedure is not a practical alternative for the non-statistician researcher. The procedure suggested by Konietschke et al.¹³ is a nonparametric procedure based on ranking.
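The idea behind a corrected z-test can be illustrated with a short sketch. It pools every observation on each response and adjusts the variance of the difference in means for the covariance contributed by the complete pairs: with n complete pairs and totals of n_x and n_y observations, Var(mean X - mean Y) = var_x/n_x + var_y/n_y - 2·n·cov_xy/(n_x·n_y). This follows from elementary variance algebra and is offered in the spirit of the corrected z-test described above; it is not guaranteed to match the published statistic of Looney and Jones⁹ in every detail. The function name `pooled_corrected_z` is hypothetical, and the normal reference distribution is a large-sample assumption.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def pooled_corrected_z(
    paired: Sequence[Tuple[float, float]],
    only_x: Sequence[float],
    only_y: Sequence[float],
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) using all paired and unpaired data."""
    n = len(paired)
    if n < 2:
        raise ValueError("need at least two complete pairs to estimate covariance")

    paired_x = [x for x, _ in paired]
    paired_y = [y for _, y in paired]
    all_x = paired_x + list(only_x)   # every observation on the first response
    all_y = paired_y + list(only_y)   # every observation on the second response
    n_x, n_y = len(all_x), len(all_y)

    # Sample covariance estimated from the complete pairs only.
    mx, my = mean(paired_x), mean(paired_y)
    cov_xy = sum((x - mx) * (y - my) for x, y in paired) / (n - 1)

    # Variance of the difference in pooled means, corrected for the pairing.
    var_diff = (
        variance(all_x) / n_x
        + variance(all_y) / n_y
        - 2.0 * n * cov_xy / (n_x * n_y)
    )
    if var_diff <= 0:
        raise ValueError("estimated variance is not positive; sample too small")

    z = (mean(all_x) - mean(all_y)) / math.sqrt(var_diff)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # large-sample approximation
    return z, p_value
```

Because the p-value relies on a normal approximation, this sketch shares the large-sample limitation noted above and should not be relied on for small samples or clearly non-normal data.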

Samawi & Vogel¹⁴ presented a weighted test procedure to combine the correlated and uncorrelated data. The aforementioned methods cannot be used for non-normal data with small or moderate sample sizes, or for categorical data. Samawi & Vogel¹⁵ introduced several weighted tests for the case where the variables of interest are categorical. They showed that their test procedures compete with other tests in the literature. Moreover, there have been several attempts to provide nonparametric test procedures under MCAR and MAR designs.¹–³,¹⁶,¹⁷ However, there is still a need for intensive investigation to develop more powerful nonparametric testing procedures for MCAR and MAR designs. Samawi et al.¹⁸ discussed and proposed some nonparametric testing procedures for partially correlated data that do not ignore the cases with missing responses. They introduced a more powerful testing procedure that combines all cases in the study. All of the procedures suggested above will be of special importance in meta-analysis, where partially correlated data are a concern when combining the results of various studies.
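As a rough illustration of what a weighted combination of tests can look like, the sketch below computes a paired t statistic from the complete pairs and a Welch-type statistic from the unmatched samples, then combines them as sqrt(w)·T_paired + sqrt(1 − w)·T_unpaired. This is a generic construction and an assumption of this example: the weight `w`, the choice of component statistics, and all function names are hypothetical, and the procedures of Samawi & Vogel¹⁴ may differ. If the two components are independent and each is approximately standard normal under the null hypothesis, the combination is also approximately standard normal, which again is a large-sample argument.

```python
import math
from statistics import NormalDist, mean, stdev, variance
from typing import Sequence, Tuple


def paired_t_statistic(paired: Sequence[Tuple[float, float]]) -> float:
    """One-sample t statistic on the within-pair differences."""
    diffs = [x - y for x, y in paired]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def welch_statistic(only_x: Sequence[float], only_y: Sequence[float]) -> float:
    """Two-sample statistic that does not assume equal variances."""
    se = math.sqrt(variance(only_x) / len(only_x) + variance(only_y) / len(only_y))
    return (mean(only_x) - mean(only_y)) / se


def weighted_combined_statistic(
    paired: Sequence[Tuple[float, float]],
    only_x: Sequence[float],
    only_y: Sequence[float],
    w: float = 0.5,  # hypothetical weight given to the paired component
) -> Tuple[float, float]:
    """Return (combined statistic, two-sided normal-approximation p-value)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    t_paired = paired_t_statistic(paired)
    t_unpaired = welch_statistic(only_x, only_y)
    # The squared weights sum to one, so the variance stays at one
    # when the two components are independent with unit variance.
    combined = math.sqrt(w) * t_paired + math.sqrt(1.0 - w) * t_unpaired
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(combined)))
    return combined, p_value
```

A natural choice is to let `w` grow with the share of subjects who have complete pairs, so that the larger subset carries more weight; the editorial does not specify a weight, so this is left as a parameter.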