eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 11 Issue 4

A new method for identifying significant genes from gene expression data

Yiwen Cao,1 Jiajuan Liang,1,2 Na Gao,3 Zengrong Sun3

1 Department of Statistics and Data Science, BNU-HKBU, United International College, China
2Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, United International College, China
3School of Public Health, Tianjin Medical University, China

Correspondence: Jiajuan Liang, Department of Statistics and Data Science, United International College, Zhuhai, China

Received: November 07, 2022 | Published: November 23, 2022

Citation: Cao Y, Liang J, Gao N, et al. A new method for identifying significant genes from gene expression data. Biom Biostat Int J. 2022;11(4):140-146. DOI: 10.15406/bbij.2022.11.00368


Abstract

Testing the significance of a medical treatment on experimental subjects is very common in medical data analysis. Classical methods such as the traditional analysis of variance usually assume variance homogeneity across treatments or experimental groups of subjects. However, this assumption is often violated when there is a fundamental difference between experimental groups, such as male and female groups of patients. In this paper, we propose using a theoretically proven exact F-test for testing the significance of a medical treatment for subjects before and after the treatment. This new exact F-test is applicable regardless of variance homogeneity across groups. An illustration based on real experimental data from treatments on rats shows that the new exact F-test gives more convincing results than the traditional analysis of variance.

Keywords: analysis of variance; F-test; gene expression data; multiple mean comparison

Introduction

The significance of responses from a gene before and after an experiment can be tested by statistical methods for multiple mean comparison. While the traditional two-sample Student’s t-test has long been employed for comparing two normal population means, it assumes that the two populations have equal variances. This condition may not always be satisfied if a group of genes are correlated. Although the unequal-variance two-sample t-test was proposed to handle the two-sample mean comparison, it only provides an approximate solution. A natural extension of the two-sample t-test to a comparison among more than two means is the one-way (single-factor) analysis of variance (ANOVA), which also assumes equal variances across the populations whose means are compared. If the equal-variance assumption is not satisfied, the conclusion from one-way ANOVA is questionable. For example, a study comparing the mean responses from different genes of rats before and after an experiment is given by Gao et al.1 The purpose of the study is to find out which genes show a significant change in mean response. In the experiment, 24 pregnant rats were randomly assigned to four groups (sample size n = 6 per group) and treated with corn oil (vehicle control) or 2, 10 or 50 mg/kg DEHP (Alfa Aesar). The response data from the three experiments with different doses of DEHP were collected. Under each dose, the experimental purpose is to see which gene shows a significant change before and after the experiment. This is a representative type of experiment in medical research for identifying a significant treatment effect. Under the classical equal-variance normal assumption, the obvious solution is the two-sample t-test for comparing the effects of two doses, and the one-way ANOVA for comparing the effects of three or more doses. 
When the classical equal-variance assumption is known to be violated, there are some approximate methods available for a multiple mean comparison.2–7 In addition to comparing the mean difference between before and after treatment for each gene, there are many other methods for analysis of gene expression data in the literature.8–14
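
To make the contrast between the equal-variance and unequal-variance two-sample tests concrete, the following minimal Python sketch computes both statistics. The function names `pooled_t` and `welch_t` and the two small samples are hypothetical and chosen only for illustration; they do not come from the experiment described above.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Equal-variance (Student) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite approximate df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical data: group b has four times the sample variance of group a.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(pooled_t(a, b))
print(welch_t(a, b))
```

With equal group sizes the two statistics coincide and only the reference degrees of freedom differ (8 versus about 5.9 for these data). The fractional degrees of freedom are what make the unequal-variance test an approximation rather than an exact result.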

In this paper, we employ a new exact F-distribution-based method for the multiple mean comparison of gene-experiment data under the normal assumption on the sample data. The new exact F-test does not depend on the equal-variance assumption across groups, so it is especially suitable for multiple mean comparison with known variance heterogeneity across groups. We give a summary review of the basic statistical theory for the new F-test in the next section; details on the new F-test can be found in Liang et al.15 The section after that presents the application of the new F-test to significant gene identification, compared with the classical one-way ANOVA method. Some concluding remarks are given in the last section.

A simple review of the new F-test

Assume that there is a balanced sample design (with an equal sample size across the normal populations) to obtain i.i.d. (independent identically distributed) samples $\{x_i=(x_{i1},\ldots,x_{in})': n\times 1,\ i=1,\ldots,k\}$ from the normal distribution $N(\mu_i,\sigma_i^2)$ for each population $i=1,\ldots,k$ $(k\ge 2)$.
Here it is also assumed that samples from different populations $N(\mu_i,\sigma_i^2)$ and $N(\mu_j,\sigma_j^2)$ $(i\ne j)$ are independent.
We want to test the hypothesis of multiple mean comparison:

$$H_0:\ \mu_1=\cdots=\mu_k \quad\text{versus}\quad H_1:\ \text{at least two means differ} \tag{1}$$

Randomly selecting a population as population $k$, we construct the observation matrix

$$
X=\begin{pmatrix}
x_{11}-x_{k1} & x_{21}-x_{k1} & \cdots & x_{k-1,1}-x_{k1}\\
x_{12}-x_{k2} & x_{22}-x_{k2} & \cdots & x_{k-1,2}-x_{k2}\\
\vdots & \vdots & & \vdots\\
x_{1n}-x_{kn} & x_{2n}-x_{kn} & \cdots & x_{k-1,n}-x_{kn}
\end{pmatrix}: n\times(k-1) \tag{2}
$$
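
As a small worked case (the choice $k=3$ is ours, for illustration only), taking population 3 as the reference population gives an $n\times 2$ observation matrix whose entries have the following mean and variance:

$$
X=\begin{pmatrix}
x_{11}-x_{31} & x_{21}-x_{31}\\
\vdots & \vdots\\
x_{1n}-x_{3n} & x_{2n}-x_{3n}
\end{pmatrix},\qquad
E(x_{ij}-x_{3j})=\mu_i-\mu_3,\qquad
\mathrm{Var}(x_{ij}-x_{3j})=\sigma_i^2+\sigma_3^2 .
$$

Under the null hypothesis of equal means, every entry of this matrix therefore has mean zero whatever the values of the variances, and the $n$ rows are i.i.d. bivariate normal vectors. Testing equality of the means then amounts to testing whether the rows of the observation matrix have zero mean.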

 

  • Theorem. Let the observation matrix $X$ be given by (2). Define the following eigenvalue-eigenvector problem:15

$$\left(\frac{1}{n}X'X\right)d_i=\lambda d_i \tag{3}$$

where $d_i=(d_{i1},\ldots,d_{i,k-1})'$ for $i=1,\ldots,r$, with $r=\min(n,k-1)-1$ being the number of positive eigenvalues $\lambda_1>\cdots>\lambda_r>0$ in (3).

Define

$$
z_i=(z_{i1},\ldots,z_{in})'=Xd_i,\qquad
\bar z_i=\frac{1}{n}\sum_{j=1}^{n}z_{ij},\qquad
F_i(z_i)=\frac{n(\bar z_i)^2}{\dfrac{1}{n-1}\sum_{j=1}^{n}(z_{ij}-\bar z_i)^2} \tag{4}
$$

for $i=1,\ldots,r$. Under the null hypothesis (1), $F_i$ has an exact F-distribution $F(1,n-1)$ for $i=1,\ldots,r=\min(n,k-1)-1$.

Each of the $F_i$-statistics given by (4) can be employed to test the hypothesis (1). For any given $i=1,\ldots,r=\min(n,k-1)-1$, reject the null hypothesis in (1) at a given level $0<\alpha<1$ if $F_i>F(1-\alpha;1,n-1)$, where $F(1-\alpha;1,n-1)$ stands for the $100(1-\alpha)$-percentile of the traditional F-distribution $F(1,n-1)$. We suggest using the F-statistic $F_1(z_1)$ in (4) associated with the largest eigenvalue in (3), based on the Monte Carlo study in Liang et al.15 The F-test is called the PCA-test (principal component test).
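
The test procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`observation_matrix`, `leading_eigenvector`, `pca_f_statistic`, `f_upper_tail`), the use of power iteration to find the eigenvector of the largest eigenvalue, and the numerical integration of the t density to obtain the upper-tail probability of $F(1,n-1)$ are all our own choices, and the data are hypothetical.

```python
import math

def observation_matrix(samples):
    """n x (k-1) matrix of differences x_ij - x_kj; the last sample is the reference."""
    *others, ref = samples
    n = len(ref)
    if any(len(s) != n for s in others):
        raise ValueError("balanced design required: equal sample sizes")
    return [[s[j] - ref[j] for s in others] for j in range(n)]

def leading_eigenvector(M, iters=1000, tol=1e-13):
    """Power iteration on a symmetric positive semi-definite matrix.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    p = len(M)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("X'X is zero; no positive eigenvalue")
        w = [c / norm for c in w]
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v

def pca_f_statistic(samples):
    """F_1(z_1): project X on the leading eigenvector of X'X/n, then n*zbar^2 / s^2."""
    X = observation_matrix(samples)
    n, p = len(X), len(X[0])
    M = [[sum(X[j][a] * X[j][b] for j in range(n)) / n for b in range(p)]
         for a in range(p)]
    d1 = leading_eigenvector(M)
    z = [sum(X[j][a] * d1[a] for a in range(p)) for j in range(n)]
    zbar = sum(z) / n
    s2 = sum((zj - zbar) ** 2 for zj in z) / (n - 1)
    return n * zbar ** 2 / s2

def f_upper_tail(f, n, steps=2000):
    """P(F(1, n-1) > f), using F(1, v) = t_v^2 and Simpson's rule on the t density."""
    v = n - 1
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    pdf = lambda t: c * (1 + t * t / v) ** (-(v + 1) / 2)
    b = math.sqrt(f)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    total += sum((4 if m % 2 else 2) * pdf(m * h) for m in range(1, steps))
    return 1 - 2 * total * h / 3

# Hypothetical responses for k = 4 groups with n = 6 observations each.
groups = [
    [1.02, 0.95, 1.10, 0.99, 1.05, 0.97],
    [1.20, 0.80, 1.35, 0.90, 1.15, 1.05],
    [1.60, 0.70, 1.90, 0.85, 1.40, 1.10],
    [1.00, 1.01, 0.98, 1.02, 0.99, 1.00],
]
F1 = pca_f_statistic(groups)
print("F1 =", F1, "p-value =", f_upper_tail(F1, n=6))
```

The statistic depends on the data only through the differences from the reference population, so adding the same constant to every observation leaves it unchanged. In practice a numerical library would normally be used for the eigen-decomposition and the F quantile; the hand-written routines here only keep the sketch self-contained.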

Application of the exact F-test

A research project was carried out by Tianjin Medical University, China.1 Rats were given four different treatments (doses) to examine the treatment effects for 46 genes, with sample size $n=6$ (rats) for each treatment. In the experiment on 6 rats, the ratio of organ wet weight to body weight (organ coefficient) was observed. The purpose is to evaluate organ development during the treatment. Details on the experiment and medical analysis can be found in Gao et al.1 In one-way ANOVA, we can consider each factor level as a group or population. In the experiment on 6 male rats with 46 levels (genes), we consider whether the ratio of organ wet weight to body weight has changed during the treatment.

Let

$$\mu_{1i}=\text{the average ratio of organ wet weight to body weight for gene } i \text{ before treatment},$$

$$\mu_{2i}=\text{the average ratio of organ wet weight to body weight for gene } i \text{ after treatment} \tag{5}$$

for $i=1,\ldots,46$. Then we need to test the hypothesis

$$H_0:\ \mu_{1i}-\mu_{2i}=0 \quad\text{versus}\quad H_1:\ \mu_{1i}-\mu_{2i}\neq 0 \tag{6}$$

For each of the two groups of rats with four treatments, we employ the Bartlett test for variance homogeneity before and after the treatments.16 The p-values for testing homogeneity for those genes with significance at levels $\alpha=.05$ and $\alpha=.10$ are given in Tables 1–3 from the two groups of rats. They show that there is significant variance heterogeneity for those genes before and after the treatment. This implies that if one continues to use the traditional ANOVA (analysis of variance) method for testing the significance of the genes after the treatment, the conclusion is questionable because the data violate variance homogeneity.

| Gene | p-value |
|---|---|
| Avp | .0062 |
| Dbp | .0002 |
| Drd1a | .0294 |
| Gh1 | 0 |
| Ghrh | .0074 |
| Igf1 | .0011 |
| Kiss1r | .0056 |
| Lepr | .0000 |
| Cyp19a1 | .0542 |
| Nkx2-1 | .0045 |
| Pomc | .000 |

Table 1 p-values for testing homogeneity for genes in group male-neonatal

| Gene | p-value |
|---|---|
| Avp | .0000 |
| Bdnf | .0072 |
| Crh | .0030 |
| Crhr1 | .0542 |
| Drd1a | .0051 |
| Gh1 | .0000 |
| Grin2a | .0213 |
| Mtnr1a | .0000 |
| Oxt | .0000 |
| Oxtr | .0000 |
| Pgr | .0463 |
| Tacr3 | .0011 |

Table 2 p-values for testing homogeneity for genes in group male-ARC

| Gene | p-value |
|---|---|
| Drd1a | .0235 |
| Drd2 | .0000 |
| Gh1 | .0000 |
| Ghrh | .0047 |
| Gper | .0092 |
| Grin2a | .0000 |
| Mtnr1b | .0259 |
| Npy | .0036 |
| Pomc | .0003 |
| Tac2 | .0377 |
| Tacr3 | .0001 |

Table 3 p-values for testing homogeneity for genes in group male-AVPV
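
For readers who want to reproduce this kind of homogeneity check, the following is a minimal Python sketch of Bartlett's statistic and its chi-square p-value. The function names `bartlett_statistic` and `chi2_sf`, the series used for the chi-square tail, and the two small samples are our own assumptions for illustration; they are not the data or the software behind Tables 1–3. The sketch also assumes every group has a positive sample variance.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's statistic T and its chi-square degrees of freedom (k - 1)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    N = sum(sizes)
    # Pooled variance, weighting each group variance by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (N - k)
    numerator = (N - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (N - k)) / (3 * (k - 1))
    return numerator / correction, k - 1

def chi2_sf(x, df):
    """P(chi-square_df > x) via the series for the regularized lower incomplete gamma."""
    a, y = df / 2, x / 2
    if y <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    m = 1
    while term > 1e-16 * total:
        term *= y / (a + m)
        total += term
        m += 1
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return 1.0 - lower

# Hypothetical samples with very different spreads.
before = [1, 2, 3]
after = [0, 10, 20]
T, df = bartlett_statistic([before, after])
print("T =", T, "df =", df, "p-value =", chi2_sf(T, df))
```

A small p-value from such a check indicates variance heterogeneity, which is the situation in which the equal-variance ANOVA becomes questionable.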

We apply both the classical one-way ANOVA F-test and the new exact F-test $F=F_1(z_1)$ in (4) (called the PCA F-test) to test the significance of the genes in each group. The p-values from the two tests are summarized in Tables 4–7 below. The following conclusions can be summarized:

| Gene | ANOVA-F | PCA-F |
|---|---|---|
| Ar | .3073 | .0324 |
| Arntl | .0152 | .0445 |
| Avp | .4923 | .2663 |
| Avpr1a | .573 | .3545 |
| Bdnf | .6385 | .4595 |
| Clock | .8592 | .6501 |
| Crh | .6671 | .5668 |
| Crhr1 | .6914 | .5876 |
| Crhr2 | .1955 | .0203 |
| Dbp | .7558 | .4652 |
| Drd1a | .1906 | .097 |
| Drd2 | .8552 | .7526 |
| Esr1 | .9505 | .8538 |
| Esr2 | .0244 | .041 |
| Gh1 | .4179 | .3631 |
| Ghrh | .9457 | .6833 |
| Gper | .3892 | .2926 |
| Grin1 | .1939 | .194 |
| Grin2a | .5434 | .3387 |
| Grin2b | .7007 | .3057 |
| Grin2d | .3738 | .4023 |
| Hcrtr2 | .2066 | .0036 |
| Igf1 | .8900 | .5683 |
| Igf1r | .2030 | .1657 |
| Kiss1 | .9091 | .6052 |
| Kiss1r | .3700 | .2589 |
| Lepr | .7010 | .4588 |
| Cyp19a1 | .2118 | .0239 |
| Mc3r | .3918 | .2086 |
| Mtnr1a | .0745 | .0091 |
| Mtnr1b | .7636 | .4700 |
| Nkx2-1 | .2333 | .2107 |
| Npy | .8120 | .4419 |
| Nr3c1 | .9236 | .6632 |
| Oxt | .4260 | .1254 |
| Oxtr | .8607 | .7419 |
| Pdyn | .4441 | .1702 |
| Per1 | .3191 | .1874 |
| Per2 | .0534 | .0373 |
| Pgr | .1953 | .1394 |
| Pomc | .3404 | .3162 |
| Slc17a6 | .0708 | .0148 |
| Sst | .7489 | .3707 |
| Tac2 | .4924 | .1928 |
| Tacr3 | .1086 | .0816 |
| Trh | .3043 | .0871 |

Table 4 p-values for testing significance for genes in group male-neonatal

| Gene | ANOVA-F | PCA-F |
|---|---|---|
| Ar | .2738 | .1571 |
| Arntl | .6888 | .3826 |
| Avp | .3234 | .1975 |
| Avpr1a | .8551 | .4207 |
| Bdnf | .6795 | .3934 |
| Clock | .5743 | .3208 |
| Crh | .1842 | .2447 |
| Crhr1 | .3629 | .2159 |
| Crhr2 | .6945 | .3823 |
| Dbp | .9889 | .8757 |
| Drd1a | .5464 | .2261 |
| Drd2 | .7514 | .4878 |
| Esr1 | .0221 | .0069 |
| Esr2 | .0722 | .0485 |
| Gh1 | .4098 | .3612 |
| Ghrh | .0966 | .0432 |
| Gper | .7448 | .4959 |
| Grin1 | .7226 | .4386 |
| Grin2a | .5232 | .4889 |
| Grin2b | .9309 | .5286 |
| Grin2d | .6604 | .2644 |
| Hcrtr2 | .3190 | .1584 |
| Igf1 | .6347 | .4042 |
| Igf1r | .7416 | .4538 |
| Kiss1 | .3503 | .3023 |
| Kiss1r | .4138 | .7061 |
| Lepr | .3043 | .2564 |
| Cyp19a1 | .2991 | .2454 |
| Mc3r | .2582 | .3977 |
| Mtnr1a | .5545 | .3466 |
| Mtnr1b | .1711 | .0562 |
| Nkx2-1 | .3207 | .3647 |
| Npy | .0029 | .0190 |
| Nr3c1 | .6746 | .3246 |
| Oxt | .2143 | .1696 |
| Oxtr | .3939 | .2914 |
| Pdyn | .5596 | .5191 |
| Per1 | .5897 | .5415 |
| Per2 | .8595 | .4891 |
| Pgr | .2314 | .1275 |
| Pomc | .3705 | .2689 |
| Slc17a6 | .4243 | .3281 |
| Sst | .6276 | .7813 |
| Tac2 | .3626 | .4330 |
| Tacr3 | .6251 | .3860 |
| Trh | .5534 | .2318 |

Table 5 p-values for testing significance for genes in group male-ARC

| Gene | ANOVA-F | PCA-F |
|---|---|---|
| Ar | .7509 | .2708 |
| Arntl | .2764 | .5035 |
| Avp | .7339 | .5116 |
| Avpr1a | .1999 | .1643 |
| Bdnf | .2555 | .2446 |
| Clock | .9345 | .5705 |
| Crh | .8701 | .5202 |
| Crhr1 | .0136 | .0107 |
| Crhr2 | .0610 | .0250 |
| Dbp | .4226 | .2781 |
| Drd1a | .6432 | .3535 |
| Drd2 | .1514 | .2012 |
| Esr1 | .5466 | .1871 |
| Esr2 | .9320 | .7808 |
| Gh1 | .4493 | .3825 |
| Ghrh | .6673 | .6800 |
| Gper | .3409 | .0505 |
| Grin1 | .6511 | .3143 |
| Grin2a | .2186 | .2644 |
| Grin2b | .3453 | .0805 |
| Grin2d | .2355 | .4195 |
| Hcrtr2 | .1491 | .0502 |
| Igf1 | .3570 | .1385 |
| Igf1r | .1444 | .1944 |
| Kiss1 | .3790 | .2156 |
| Kiss1r | .8003 | .4514 |
| Lepr | .1226 | .0928 |
| Cyp19a1 | .4710 | .1250 |
| Mc3r | .6769 | .5365 |
| Mtnr1a | .5237 | .4859 |
| Mtnr1b | .2324 | .0508 |
| Nkx2-1 | .8186 | .8017 |
| Npy | .7305 | .3902 |
| Nr3c1 | .4726 | .2611 |
| Oxt | .9267 | .9816 |
| Oxtr | .8586 | .5902 |
| Pdyn | .9684 | .7579 |
| Per1 | .4784 | .1905 |
| Per2 | .8736 | .5027 |
| Pgr | .9046 | .6225 |
| Pomc | .7508 | .4901 |
| Slc17a6 | .6924 | .3291 |
| Sst | .2935 | .3589 |
| Tac2 | .8648 | .7606 |
| Tacr3 | .0993 | .1528 |
| Trh | .9017 | .7711 |

Table 6 p-values for testing significance for genes in group male-AVPV

Genes    Ar       Arntl    Avp      Avpr1a   Bdnf     Clock
ANOVA-F  .1539    .4363    .0157    .3085    .2654    .2786
PCA-F    .0373    .2631    .0019    .1560    .0835    .1930

Genes    Crh      Crhr1    Crhr2    Dbp      Drd1a    Drd2
ANOVA-F  .5818    .6505    .4481    .8795    .5372    .7997
PCA-F    .1744    .2764    .1376    .5565    .1448    .5654

Genes    Esr1     Esr2     Gh1      Ghrh     Gper     Grin1
ANOVA-F  .3685    .7953    .4393    .6965    .9670    .4227
PCA-F    .2178    .4330    .3175    .3349    .9512    .3256

Genes    Grin2a   Grin2b   Grin2d   Hcrtr2
ANOVA-F  .3246    .9973    .8915    .0244
PCA-F    .0202    .8985    .6386    .0657

Genes    Igf1     Igf1r    Kiss1    Kiss1r   Lepr     Cyp19a1
ANOVA-F  .7253    .2116    .4614    .7769    .4248    .2284
PCA-F    .4271    .1766    .2922    .6023    .2213    .0263

Genes    Mc3r     Mtnr1a   Mtnr1b   Nkx2-1   Npy      Nr3c1
ANOVA-F  .6119    .4281    .3589    .7073    .7314    .6476
PCA-F    .3823    .3642    .3105    .4190    .4107    .4370

Genes    Oxt      Oxtr     Pdyn     Per1     Per2     Pgr
ANOVA-F  .9217    .6444    .3707    .6318    .5211    .3892
PCA-F    .5645    .4699    .1437    .4065    .3449    .3447

Genes    Pomc     Slc17a6  Sst      Tac2     Tacr3    Trh
ANOVA-F  .7911    .3327    .6869    .2291    .1424    .4648
PCA-F    .4319    .1467    .3772    .1567    .0850    .2366

Table 7 p-values for testing significance for genes in group male-MPN

1) the red-colored genes with red p-values are significant based on both the ANOVA F-test and the PCA F-test for level α = .05;

2) the red-colored genes with a red p-value and a green-colored p-value are significant based on the PCA F-test for α = .05 or α = .10 but insignificant based on the ANOVA F-test. Some of the genes are significant based on the ANOVA F-test for α = .10;

3) the ANOVA F-test fails to identify several significant genes at level α = .05 or α = .10, namely those genes with red or green-colored p-values in Tables 4–7: genes Ar, Crhr2, Drd1a, Hcrtr2, Cyp19a1, Tacr3, and Trh in Table 4; gene Mtnr1b in Table 5; genes Gper, Grin2b, Hcrtr2, Lepr, and Mtnr1b in Table 6; and genes Ar, Bdnf, Grin2a, Cyp19a1, and Tacr3 in Table 7.
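The comparison described in the notes above can be reproduced mechanically from the tabulated p-values. The following is a minimal sketch, not the authors' code: it copies a subset of the Table 7 (male-MPN) p-values and labels each gene by which test rejects at a chosen level. The names `TABLE7` and `classify` are illustrative assumptions.

```python
# p-values copied from Table 7 (group male-MPN): gene -> (ANOVA-F, PCA-F).
# Only a subset of the genes is listed here for brevity.
TABLE7 = {
    "Ar": (0.1539, 0.0373),
    "Avp": (0.0157, 0.0019),
    "Bdnf": (0.2654, 0.0835),
    "Grin2a": (0.3246, 0.0202),
    "Hcrtr2": (0.0244, 0.0657),
    "Cyp19a1": (0.2284, 0.0263),
    "Tacr3": (0.1424, 0.0850),
    "Trh": (0.4648, 0.2366),
}


def classify(p_anova, p_pca, alpha=0.10):
    """Label a gene by which of the two tests rejects at level alpha."""
    anova_sig = p_anova < alpha
    pca_sig = p_pca < alpha
    if anova_sig and pca_sig:
        return "both"
    if pca_sig:
        return "PCA-F only"
    if anova_sig:
        return "ANOVA-F only"
    return "neither"


labels = {gene: classify(*pvals) for gene, pvals in TABLE7.items()}

# Genes flagged by the PCA F-test but missed by the ANOVA F-test at alpha = .10
missed_by_anova = [g for g, lab in labels.items() if lab == "PCA-F only"]
print(missed_by_anova)
```

With α = .10, the genes labeled "PCA-F only" in this subset are the Table 7 genes listed in item 3). Changing `alpha` to 0.05 shows how the classification shifts at the stricter level.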

The following box plots indicate that the variances are heterogeneous across the treatment groups. Because the PCA F-test does not assume equal variances, it gives more convincing conclusions than the ANOVA F-test when testing the mean difference. Furthermore, the ANOVA F-test fails to identify quite a few significant genes.
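Visual inspection of box plots can be complemented by a numerical check for unequal variances. The sketch below computes the Brown-Forsythe statistic (a one-way ANOVA F statistic applied to absolute deviations from each group's median) using only the Python standard library. It is an illustration rather than part of the authors' analysis, and the expression values are made up, since the raw data are not reproduced here. The function name `brown_forsythe` and the dictionary `groups` are assumptions for the example.

```python
from statistics import median, variance


def brown_forsythe(groups):
    """Brown-Forsythe statistic for equality of variances across groups.

    Under equal variances it is approximately F-distributed with
    (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its group median
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_bar = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_grand) ** 2 for zi, zb in zip(z, z_bar))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical expression values for one gene, six animals per dose group
groups = {
    "control": [1.02, 0.98, 1.05, 0.95, 1.01, 0.99],
    "2 mg/kg": [1.10, 0.85, 1.30, 0.70, 1.15, 0.90],
    "10 mg/kg": [1.40, 0.60, 1.75, 0.45, 1.20, 0.80],
    "50 mg/kg": [2.10, 0.30, 1.90, 0.50, 1.60, 0.70],
}

for name, values in groups.items():
    print(name, "sample variance:", round(variance(values), 4))

w = brown_forsythe(list(groups.values()))
print("Brown-Forsythe statistic:", round(w, 3))
```

A large value of the statistic relative to the F distribution with (k − 1, N − k) degrees of freedom suggests unequal variances. If SciPy is available, `scipy.stats.levene(*samples, center="median")` returns the same statistic together with a p-value.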

Figure 1 Box plots for four significant genes in group male-neonatal.

Figure 2 Box plots for four significant genes in group male-ARC.

Figure 3 Box plots for four significant genes in group male-AVPV.

Figure 4 Box plots for four significant genes in group male-MPN.

 

Concluding remarks

The new exact F-test in this paper is applicable to multiple mean comparisons without assuming homogeneity of variances across the populations. It is especially suitable for matched-pair mean comparisons before and after treatment in medical research. When different experimental subjects respond differently to the treatments, it is very likely that the variances are heterogeneous across the treatments. As a result, conclusions from the traditional ANOVA F-test or the classical two-sample t-test are questionable. While there exist solutions to the problem of two-sample mean comparison with heterogeneous variances, for example, Welch’s2 approximate t-test and the Dudewicz et al.7 method for an exact solution to the Behrens-Fisher problem, these methods are based either on an approximate null distribution of the test statistic or on approximate computation of the p-values. The method based on the new exact F-test in this paper provides an accurate solution to the problem of comparing two normal population means without any restriction on the population variances. The real data analysis shows that the new exact F-test can detect some mean differences that the traditional ANOVA F-test fails to detect. Therefore, the method based on the new exact F-test in this paper is recommended for use together with existing methods for the same purpose in problems of multiple mean comparisons.17–19

Acknowledgments

This work was partially supported by a UIC New Faculty Start-up Research Fund R72021106, and in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (UIC), project code 2022B1212010006.

Conflicts of interest

The authors declare that there are no conflicts of interest.

References

  1. Gao N, Hu R, Huang Y, et al. Specific effects of prenatal DEHP exposure on neuroendocrine gene expression in the developing hypothalamus of male rats. Arch Toxicol. 2018;92(1):501–512.
  2. Welch BL. The generalization of Student’s problem when several different population variances are involved. Biometrika. 1947;34(1–2):28–35.
  3. Tukey JW. Comparing individual means in the analysis of variance. Biometrics. 1949;5(2):99–114.
  4. Kramer CY. Extension of multiple range tests to group means with unequal numbers of replications. Biometrics. 1956;12(3):307–310.
  5. Best DJ, Rayner JCW. Welch’s approximate solution for the Behrens-Fisher problem. Technometrics. 1987;29(2):205–210.
  6. Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Amer Statist Assoc. 1952;47(260):583–621.
  7. Dudewicz EJ, Ma Y, Mai E, et al. Exact solutions to the Behrens-Fisher problem: asymptotically optimal and finite sample efficient choice among. J Statist Plann & Infer. 2007;137(5):1584–1605.
  8. Törönen P, Kolehmainen M, Wong G, et al. Analysis of gene expression data using self-organizing maps. FEBS Letters. 1999;451(2):142–146.
  9. Brazma A, Vilo J. Minireview: Gene expression data analysis. FEBS Letters. 2000;480(1):17–24.
  10. Sherlock G. Analysis of large-scale gene expression data. Current Opinion in Immunology. 2000;12(2):201–205.
  11. Yeung KY, Ruzzo WL. Principal component analysis for clustering gene expression data. Bioinformatics. 2001;17(9):763–774.
  12. Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nature Genetics Supplement. 2002;32:502–508.
  13. Parmigiani G, Garrett ES, Irizarry RA, et al. The Analysis of Gene Expression Data: An Overview of Methods and Software. Springer; 2003.
  14. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology. 2018;19:15.
  15. Liang J, Tang ML, Yang J, et al. An application of the theory of spherical distributions in multiple mean comparison. In: Fan J, Pan J, editors. Contemporary Experimental Design, Multivariate Analysis and Data Mining - Festschrift in Honour of Professor Kai-Tai Fang. Springer-Verlag; 2020;189–199.
  16. Bartlett MS. Properties of sufficiency and statistical tests. Proc Roy Soc London Ser A. 1937;160:268–282.
  17. Brown MB, Forsythe AB. Robust tests for the equality of variances. J Amer Statist Assoc. 1974;69(346):364–367.
  18. Fang KT, Kotz S, Ng KW. Symmetric Multivariate and Related Distributions. Chapman and Hall Ltd. London and New York;1990.
  19. Fang KT, Zhang YT. Generalized Multivariate Analysis. Science Press and Springer-Verlag. Beijing and Berlin;1990.

©2022 Cao, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.