Journal of Psychology & Clinical Psychiatry
eISSN: 2373-6445

Mini Review Volume 6 Issue 2

DARHUBER: A Computer Program for Effect Size Estimation in Linear Regression and for Calculating the Significance of Difference between Observed and Expected R2 Values

James B Hittner,1 N Clayton Silver2

1Department of Psychology, College of Charleston, USA
2Department of Psychology, University of Nevada, USA

Correspondence: James B Hittner, Department of Psychology, College of Charleston, 66 George Street, Charleston, SC 29424, USA, Tel 843-953-6734, Fax 843-953-7151

Received: June 06, 2016 | Published: July 5, 2016

Citation: Hittner JB, Silver NC (2016) DARHUBER: A Computer Program for Effect Size Estimation in Linear Regression and for Calculating the Significance of Difference between Observed and Expected R2 Values. J Psychol Clin Psychiatry 6(2): 00350. DOI: 10.15406/jpcpy.2016.06.00350


Abstract

In linear multiple regression it is common practice to test whether the squared multiple correlation coefficient, R2, differs significantly from zero. Although frequently used, this test is misleading because the expected value of R2 is not zero under the null hypothesis that ρ, the population value of the multiple correlation coefficient, equals zero. The non-zero expected value of R2 has implications both for significance testing and effect size estimation involving the squared multiple correlation coefficient. In this paper we discuss and offer a freely available computer program that calculates the expected value of R2, an adjusted R2 value and effect size measure that both incorporate the expected value of R2, and an F statistic that tests the significance of difference between the obtained R2 and the expected value of R2 under the null hypothesis that ρ = 0. The interactive, stand-alone program is written in FORTRAN 77 for a Windows environment. The user simply enters the value of a multiple correlation coefficient from a linear regression, the number of predictors, and the sample size. No knowledge of FORTRAN or any other statistical programming language is required.

Keywords: multiple correlation, regression, effect size, hypothesis testing, computer program, FORTRAN

Introduction

Imagine that a clinical psychologist wishes to predict frequency of depressive symptoms in a community sample of adults from levels of trait anxiety, pessimism, loneliness, and perceived social support. Such data are often modeled using linear multiple regression analysis, and it is common practice to examine whether the squared multiple correlation coefficient, R2, differs significantly from zero. Despite being widely used, this test is misleading because the expected value of R2 is not zero when the population parameter, ρ, equals 0 (where ρ is the population value of the multiple correlation coefficient). Instead, the expected value of R2 is equal to p / (n – 1), where p is the number of predictor variables and n is the sample size.1 The non-zero expected value of R2 has implications both for significance testing and effect size estimation involving the squared multiple correlation coefficient. One implication is that, in the context of statistical significance testing, the observed value of R2 should be evaluated against the expected value of R2, and not against zero. A second implication concerns effect size estimation. In particular, as pointed out by Huberty,2 the squared multiple correlation coefficient should be adjusted, or corrected, by explicitly incorporating the expected value of R2. In addition, Huberty2 suggested interpreting an effect size measure created by subtracting the expected value of R2 from the adjusted R2 value. With regard to hypothesis testing, Darlington3 presented an F statistic for testing the null hypothesis that the observed R2 equals the expected value of R2.
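To make the size of this expected value concrete, the following minimal Python sketch evaluates p / (n – 1) for four predictors at several sample sizes. It is an illustration only and is not part of the DARHUBER program; the function name expected_r_squared and the chosen sample sizes are assumptions introduced here.

```python
def expected_r_squared(p: int, n: int) -> float:
    """Expected value of R^2 under the null hypothesis rho = 0: p / (n - 1).

    p is the number of predictors and n is the sample size.
    (Hypothetical helper name; not taken from the DARHUBER source.)
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if n <= p + 1:
        raise ValueError("n must exceed p + 1")
    return p / (n - 1)


if __name__ == "__main__":
    # Four predictors, as in the depressive-symptoms example.
    for n in (20, 60, 200):
        print(f"n = {n:3d}: expected R^2 = {expected_r_squared(4, n):.4f}")
```

With four predictors the expected value is roughly 0.21 when n = 20 but only about 0.02 when n = 200, which shows why testing R2 against zero is most misleading in small samples.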

Unfortunately, the aforementioned statistical quantities are not routinely calculated by widely used statistical software packages such as IBM SPSS, Minitab, and SAS. To address this gap, Hittner4 wrote a SAS data step computer program that calculates these quantities. However, to implement the SAS data step program users must have access to SAS, which is an expensive commercial software package. To accommodate researchers who do not have SAS, we have written an interactive, stand-alone FORTRAN 77 computer program for the Windows environment. No knowledge of FORTRAN or any other statistical programming language is required to use the program.

Program description

The user is queried interactively for the multiple correlation coefficient, number of predictor variables, and sample size. The program responds with a restatement of the input values, the expected value of R2, Darlington’s3 F statistic for testing the null hypothesis that the observed R2 equals the expected value of R2, the observed probability value for Darlington’s F, Huberty’s2 adjusted R2 index, and Huberty’s2 effect size measure. The program, DARHUBER, is written in FORTRAN 77, compiled with the GNU FORTRAN compiler, and runs on a Windows PC or compatible. The output is written to darhuber.out.

Worked example

Let’s revisit the example mentioned at the beginning of the Introduction, in which a clinical psychologist wishes to predict frequency of depressive symptoms from four putative risk factors, such that the number of predictors, p, equals four. Suppose the sample size, n, is 60 and the multiple correlation coefficient, R, equals 0.56. The contents of darhuber.out are shown in Table 1. As these results indicate, the expected value of R2 is 0.0678, which is the value of R2 expected by chance alone. Darlington’s F statistic was 3.0308 with a p-value of 0.0140, indicating that, at the nominal alpha level of 0.05, the observed R2 (0.3136) was significantly greater than the expected value of R2 (0.0678). Thus, in our example the four predictors accounted for 31.36% of the variance in frequency of depressive symptoms. This proportion of variance (31.36%) is significantly greater (p < 0.05) than the proportion of predicted outcome variance (6.78%) that would be expected based on chance alone. Huberty’s adjusted R2 value was 0.2637. This value “corrects” the obtained R2 by explicitly incorporating the expected value of R2. One way to conceptualize Huberty’s adjusted R2 is that it accounts for sample-to-population shrinkage by directly modeling the expected value of R2. Finally, Huberty’s R2-based effect size estimate was 0.1959, which, according to Cohen’s5 criteria, is a medium-sized effect.

 SAMPLE R =   0.5600

 SAMPLE R-SQUARED =   0.3136

 SAMPLE SIZE =   60.0000

 NUMBER OF PREDICTORS =   4.0000

 EXPECTED VALUE OF R-SQUARED =   0.0678

 DARLINGTON F =   3.0308 WITH A PROBABILITY OF   0.0140

 NUMERATOR AND DENOMINATOR DF OF F =   5.2155, 55.0000

 HUBERTY ADJUSTED R-SQUARED =   0.2637

 HUBERTY EFFECT SIZE =   0.1959

Table 1 Sample Output from the DARHUBER FORTRAN Program
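
Readers without access to the executable can check several entries of Table 1 by hand. The Python sketch below is not the DARHUBER source; it assumes that Huberty’s adjusted R2 takes the form 1 – (1 – R2)(n – 1) / (n – p – 1), a formula that is not stated in this article but that reproduces the tabled value, and it takes the effect size as the adjusted R2 minus the expected value of R2, as described in the Introduction. The function and key names are hypothetical.

```python
def huberty_quantities(r: float, p: int, n: int) -> dict:
    """Recompute R^2, expected R^2, adjusted R^2, and the effect size.

    Assumption: the adjusted R^2 formula below is inferred from the
    values in Table 1, not copied from the DARHUBER program.
    """
    if n <= p + 1:
        raise ValueError("n must exceed p + 1")
    r_squared = r ** 2
    expected = p / (n - 1)  # expected R^2 when rho = 0
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
    effect_size = adjusted - expected  # adjusted R^2 minus expected R^2
    return {
        "r_squared": r_squared,
        "expected_r_squared": expected,
        "adjusted_r_squared": adjusted,
        "effect_size": effect_size,
    }


if __name__ == "__main__":
    # Inputs from the worked example: R = 0.56, p = 4, n = 60.
    for name, value in huberty_quantities(0.56, 4, 60).items():
        print(f"{name} = {value:.4f}")
```

Run with the worked-example inputs, the sketch gives 0.3136, 0.0678, 0.2637, and 0.1959, matching the corresponding lines of Table 1. Darlington’s F statistic and its degrees of freedom are left out because their formulas are not given in this article.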

Availability

DARHUBER.FOR and the executable version (DARHUBER.EXE) may be obtained at no charge by sending an e-mail request to N. Clayton Silver, Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV 89154-5030, at fdnsilvr@unlv.nevada.edu.

Acknowledgments

None.

Conflicts of interest

The authors declare that there are no conflicts of interest.

Funding

None.

References

  1. Morrison DF. Multivariate statistical methods. McGraw-Hill, New York, USA. 1990.
  2. Huberty CJ. A note on interpreting an R2 value. Journal of Educational and Behavioral Statistics. 1994;19:351‒356.
  3. Darlington RB. Regression and linear models. McGraw-Hill, New York, USA. 1990.
  4. Hittner JB. On correctly adjusting the squared multiple correlation coefficient in linear regression: Effect size estimation and significance testing with application to substance abuse research. Journal of Drug Abuse. 2016;2:2.
  5. Cohen J. A power primer. Psychological Bulletin. 1992;112(1):155‒159.

©2016 Hittner, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.