Theory of estimation of parameters and genetic values under mixed models

doi:10.15406/ijawb.2024.08.00210

In animal breeding, it is essential to know genetic parameters such as heritability in order to predict genetic values (GV) and direct selection programs efficiently. A mixed model is a statistical model in which the researcher considers both fixed and random factors. Two models widely used in animal genetic improvement are the reproductive (sire) model and the animal model, which treat the sire or animal factor as random and a group of non-genetic effects as fixed. These mixed models provide both heritability estimates (h²) for a trait and genetic predictions such as the expected progeny difference (EPD) or the predicted transmitting ability (PTA) for each animal. An example of birth weight (BW) in cattle was used to calculate the GV, h² and e² using a mixed model with one fixed and one random factor. The ANOVA, ML and REML methods were used to calculate h², e² and the GV, first using all the information and then assuming that the last record was missing, under a reproductive model and an animal model. With balanced data, REML and ANOVA gave the same results; with unbalanced data the three methods gave different results, and ANOVA estimated a negative variance component. It can be concluded that genetic values and parameters can be estimated using ANOVA, ML and REML, but with the risk of estimating negative variance components with ANOVA, or null (or overestimated) heritabilities with likelihood-based methods, when the data structure or the model is not correct.

Keywords: heritability, ANOVA, REML, ML, mixed models

In animal breeding, it is essential to know genetic parameters such as heritability in order to be able to predict genetic values (GV) and efficiently conduct selection programs. Genetic parameters are ratios between estimated population variances, known as variance components, which are calculated using linear models containing fixed and random factors, generally known as mixed models.¹ For the correct estimation of parameters and genetic values, it is necessary to have a broad knowledge of estimation using mixed models. Therefore, this article reviews the estimation of variance components and genetic values using ANOVA, ML and REML under a reproductive model and an animal model, explaining the virtues and limitations of each method in balanced and unbalanced data.

Theoretical framework

Mixed models

A mixed model is one in which the researcher considers both fixed and random factors in a statistical model.² A model widely used in animal breeding is the reproductive model, or sire model, which considers the sire factor as random and a group of non-genetic effects as fixed.² The reproductive model provides both heritability estimates (h²) for a trait and genetic predictions such as the expected progeny difference (EPD) or the predicted transmitting ability (PTA) for each sire.³ In matrix algebra the reproductive model takes the following form:

$y = X b + Z s + e$

Where y is a vector of records, X is an incidence matrix relating the records to the fixed effects, b is a vector of unknown parameters for the fixed effects, Z is an incidence matrix relating the records to the random effects, s is a vector of unknown predictions for each sire, and e is a vector of residuals.

The covariance structure of the above model is:

$VAR [\begin{matrix} s \\ e \end{matrix}] = [\begin{matrix} I σ_{s}^{2} & 0 \\ 0 & I σ_{e}^{2} \end{matrix}]$

Where 𝐼 is an identity matrix, 𝜎_𝑠² is the variance between breeders and 𝜎_𝑒² is the residual variance.

The Henderson normal equations, necessary to find the genetic values of the breeders, for the above model are given by³:

$[\begin{matrix} X' X & X' Z \\ Z' X & Z' Z + I α \end{matrix}] [\begin{matrix} b_{i} \\ s_{i} \end{matrix}] = [\begin{matrix} X^{'} y \\ Z' y \end{matrix}]$

Where 𝛼 is a ratio of the residual variance to the variance between breeders:

$α = \frac{σ_{e}^{2}}{σ_{s}^{2}}$

According to Román and Aranguren,⁴ Iα can be replaced by A⁻¹α in the Henderson normal equations in order to improve the predictions by using all the parentage information between males. The new equations are therefore:

$[\begin{matrix} X' X & X' Z \\ Z' X & Z' Z + A^{- 1} α \end{matrix}] [\begin{matrix} b_{i} \\ s_{i} \end{matrix}] = [\begin{matrix} X' y \\ Z' y \end{matrix}]$

Where 𝐴⁻¹ is the inverse of the kinship matrix.
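As an illustration of where A and its inverse come from, the following minimal sketch builds the relationship matrix with the tabular method for a sire-only pedigree and then inverts it numerically. The function name `relationship_matrix` and the three-male pedigree (male 2 is a son of male 1, male 3 is unrelated) are assumptions made for this example, and the sketch assumes that dams are unknown, so no animal is inbred.

```python
import numpy as np

def relationship_matrix(sires):
    """Tabular method for a pedigree in which only sires are recorded.

    sires[i] is the 0-based index of the sire of individual i, or None if
    the sire is unknown. Parents must be listed before their offspring.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s = sires[i]
        A[i, i] = 1.0  # dam unknown, so the inbreeding coefficient is zero
        for j in range(i):
            # The relationship with an older individual comes only through the known sire.
            A[i, j] = A[j, i] = 0.5 * A[j, s] if s is not None else 0.0
    return A

# Hypothetical pedigree: male 2 is a son of male 1, male 3 is unrelated.
A = relationship_matrix([None, 0, None])
A_inv = np.linalg.inv(A)
print(A)
print(A_inv)
```

For this small pedigree the numerical inverse has 4/3 on the diagonal for the two related males, −2/3 between them, and 1 for the unrelated male. For large pedigrees, A⁻¹ is normally built directly from the pedigree rather than by inverting A.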

And the covariance structure taking into account the introduction of 𝐴 is⁵:

$VAR [\begin{matrix} s \\ e \end{matrix}] = [\begin{matrix} A σ_{s}^{2} & 0 \\ 0 & I σ_{e}^{2} \end{matrix}]$

Another model widely used in genetic evaluation is the animal model, which uses all the parentage information in the pedigree and, unlike the reproductive model, provides genetic predictions for all the animals in the herd, whether or not they have records:

$y = X b + Z a + e$

Where a is a vector of genetic predictions for each animal. The covariance structure of the above model is as follows:

$V A R [\begin{matrix} a \\ e \end{matrix}] = [\begin{matrix} A σ_{a}^{2} & 0 \\ 0 & I σ_{e}^{2} \end{matrix}] = [\begin{matrix} G & 0 \\ 0 & R \end{matrix}]$

Where G is the variance and covariance matrix of the random effects and R is the variance and covariance matrix of the residuals.

The Henderson normal equations for this model are given by:⁶

$[\begin{matrix} X' X & X' Z \\ Z' X & Z' Z + A^{-1} α \end{matrix}] [\begin{matrix} b_{i} \\ a_{i} \end{matrix}] = [\begin{matrix} X' y \\ Z' y \end{matrix}]$

Where 𝛼 in this model is a ratio of the residual variance to the additive variance:

$α = \frac{σ_{e}^{2}}{σ_{a}^{2}}$

Where σ²_a is the additive genetic variance.

Genetic parameters

Using mixed models, it is possible to estimate the variance components and, from them, to calculate the heritability, which is given by:⁵

$h^{2} = \frac{σ_{a}^{2}}{σ_{p}^{2}}$

Where h² is the heritability and σ²_p is the phenotypic variance; heritability is therefore defined as the quotient between the additive variance and the phenotypic variance. The additive variance in the numerator of h² can be estimated using several procedures. A well-known one is to use a reproductive (sire) model to estimate the variance between sires, which is ¼ of the additive variance; a formula to estimate σ²_a is therefore:

$σ_{a}^{2} = 4 σ_{s}^{2}$

Where 4σ²_s is four times the variance among sires; heritability can therefore be calculated as:⁷

$h^{2} = \frac{4 σ_{s}^{2}}{σ_{p}^{2}}$

If the heritability is known, the component σ²_a can be calculated using the following formula:⁸

$σ_{a}^{2} = h^{2} σ_{p}^{2}$

Another parameter of interest is the environmental proportion coefficient, which indicates how much of the differences observed in the phenotype (records) of the animals are due to non-genetic (environmental) factors. This coefficient has the following mathematical formula:⁹

$e^{2} = \frac{σ_{e n}^{2}}{σ_{p}^{2}}$

Where σ²_en is the environmental variance. The variance component σ²_en is calculated as the difference between σ²_p and σ²_a; therefore, the formula for σ²_en is:⁸

$σ_{e n}^{2} = σ_{p}^{2} - σ_{a}^{2}$

Finally, the variance 𝜎_𝑝² is the sum of the variance components:

$σ_{p}^{2} = σ_{a}^{2} + σ_{e n}^{2} = σ_{s}^{2} + σ_{e}^{2}$
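The formulas of this section can be chained together in a few lines. The sketch below is only an illustration: the function name `sire_model_parameters` and the input values (σ²_s = 2, σ²_e = 30) are made up for the example and do not come from the data analysed in this article.

```python
def sire_model_parameters(var_sire, var_resid):
    """Return (var_a, var_p, h2, e2) from sire-model variance components."""
    var_a = 4.0 * var_sire          # sigma_a^2 = 4 * sigma_s^2
    var_p = var_sire + var_resid    # sigma_p^2 = sigma_s^2 + sigma_e^2
    var_en = var_p - var_a          # sigma_en^2 = sigma_p^2 - sigma_a^2
    h2 = var_a / var_p              # heritability
    e2 = var_en / var_p             # environmental proportion
    return var_a, var_p, h2, e2

# Illustrative (hypothetical) variance components.
var_a, var_p, h2, e2 = sire_model_parameters(var_sire=2.0, var_resid=30.0)
print(f"h2 = {h2:.3f}, e2 = {e2:.3f}")   # h2 = 0.250, e2 = 0.750
```

By construction h² and e² add up to one, which is a quick check on any pair of estimates obtained this way.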

Variance component estimation using a reproductive model

Analysis of variance

There are several classical methods for estimating the variance components needed to compute h² and e², including analysis of variance (ANOVA), maximum likelihood (ML) and restricted maximum likelihood (REML).

ANOVA is a technique that attempts to separate σ²_p into different sources of variability. This involves the separation of sums of squares (SC), degrees of freedom (GL) and mean squares (CM) for each source of variation. Variance components are estimated with ANOVA by equating the expected value of each CM, E(CM), to its observed CM and solving the resulting system of equations.¹⁰ Each CM is the ratio of the SC to the GL of its source of variation:¹⁰

$C M = \frac{S C}{G L}$

In the case of a fixed factor and a random factor, without interaction, the reproductive model, in elementary algebra, is given by:

$y_{i j k} = μ + s_{i} + b_{j} + e_{i j k}$

And the ANOVA table for the above model is presented in Table 1.

Source of variation	SC	GL	CM	E(CM)
Fixed factor	$SC_{b}$	$n_{fijo} - 1$	$\frac{SC_{b}}{n_{fijo} - 1}$	
Sires	$SC_{s}$	$n_{s} - 1$	$\frac{SC_{s}}{n_{s} - 1}$	$E(CM_{s}) = σ_{e}^{2} + n_{fijo} k σ_{s}^{2}$
Residual	$SC_{total} - \sum SC_{other}$	$GL_{total} - \sum GL_{other}$	$\frac{SC_{e}}{GL_{total} - \sum GL_{other}}$	$E(CM_{e}) = σ_{e}^{2}$
Total	$y' y - R(μ)$	$n - 1$

Table 1 ANOVA for Henderson's method III

Where 𝑘 is the number of replicates of the design, 𝑛_{𝑓𝑖𝑗𝑜} is the number of levels of the fixed effect and 𝑛_𝑠 is the number of levels of the random factor. The variance components are calculated by equating the CM to their E (CM):

$C M s = σ_{e}^{2} + n_{f i j o} k σ_{s}^{2}$

$C M e = σ_{e}^{2}$

And the unique solution of this system of equations is:

$σ_{s}^{2} = \frac{C M s - C M e}{n_{f i j o} k}$

$σ_{e}^{2} = C M_{e}$

In balanced data, the SC can be estimated directly without the need for adjustment. For a model with two non-interacting factors:

$SC_{s} = \frac{\sum y_{s.}^{2}}{n_{fijo} k} - \frac{(\sum y)^{2}}{n}$

$SC_{b} = \frac{\sum y_{b.}^{2}}{w} - \frac{(\sum y)^{2}}{n}$

Where Σy²_s. is the sum of the squared totals of the records of each sire (each total is based on n_fijo k records), Σy²_b. is the sum of the squared totals of the records of each level of the fixed effect, and w is the number of records in each level of the fixed effect. For the unbalanced case, the SC for the random factor (sire) has to be calculated as a type III SC, since type III calculates the SC of an effect adjusted for any other effect that does not contain it and orthogonal to any effect (if one exists) that contains it. The type III SC can be expressed as:¹¹

$S C s = S C (μ, s, b) - S C (μ, b)$

Here SCs is corrected for the effects of μ and b, where μ is the intercept or herd mean effect. To find the value of SCs, it is necessary to fit the complete model to calculate SC(μ, s, b), and then subtract SC(μ, b), obtained from a reduced model.

Maximum likelihood

The maximum likelihood (ML) method is a classical method of parameter estimation proposed by Fisher,¹² but it was not until Hartley and Rao¹³ that it was used for mixed models in general. Since the likelihood function is a function of the parameters of a statistical model given some data, in ML we seek the estimators of the variance components that maximize the likelihood function, that is, the parameter values under which the observed data are most probable.

The likelihood function is defined as the product of the density functions of the observations, but in practice the natural logarithm of the likelihood function is used because it is more manageable. If the distribution of the data is normal, the natural logarithm of the likelihood function is defined in matrix algebra as:¹¹

$Ln(L) = -0.5 n Ln(2π) - 0.5 Ln|V| - 0.5 (y - Xb)' V^{-1} (y - Xb)$

Where Ln(L) is the natural logarithm of the likelihood function and V = ZGZ′ + R is the phenotypic variance and covariance matrix of the model. To find the estimators that maximize the likelihood, we need to find the maximum of Ln(L). This is achieved with different methodologies. For example, if the data structure is balanced and we have a mixed model with one random effect and one fixed effect with no interaction, setting the derivatives of Ln(L) with respect to the parameters σ²_s and σ²_e to zero leads to a system of equations whose solution is:

$σ_{s}^{2} = \frac{\frac{S C s}{n_{s}} - σ_{e}^{2}}{n_{f i j o} k}$

$σ_{e}^{2} = [1 - \frac{n_{f i j o} - 1}{n_{s} (n_{f i j o} k - 1)}] C M e$

An important point of ML estimation for this model is that, even with balanced data, the estimators may differ from those presented above: these solutions are valid only if the inequality CMs > CMe is met. If, on the other hand, CMs < CMe, the ML estimates for this model and balanced data are given by:¹¹

$σ_{s}^{2} = 0$

$σ_{e}^{2} = \frac{SC_{total}}{n_{s} n_{fijo} k}$

That is, all the phenotypic variability is residual, which may indicate that the model used is incorrect or that the number of records is insufficient, thus increasing the variability of the error. The sum of the variance components σ²_s and σ²_e gives an estimate of σ²_p, given mathematically by:

$σ_{p}^{2} = σ_{s}^{2} + σ_{e}^{2} = \frac{(y - X b)' (y - X b)}{n}$

This estimator is biased, since it is associated with n degrees of freedom. If the structure of the data is unbalanced, the partial derivatives of Ln(L) lead to nonlinear maximum likelihood equations for the parameters to be estimated, so the system of equations cannot be solved with direct methods. Faced with this problem, iterative numerical methods are used to approximate the maximum of Ln(L). These are applied to the log-likelihood itself and not to the equations resulting from its first derivative, so that the variance components and Ln(L) are calculated simultaneously; the value of Ln(L) can then be used to obtain fit criteria for the model, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).

Restricted maximum likelihood

The restricted maximum likelihood (REML) method, proposed by Patterson and Thompson,¹³ takes into account the degrees of freedom lost by including fixed effects in the statistical model. The resulting estimates of the variance components are therefore unbiased, since they are associated with n − rank(X) degrees of freedom, which leads to an estimate of the variance σ²_p defined as:

$σ_{p}^{2} = \frac{(y - Xb)' (y - Xb)}{n - rank(X)}$

Where rank(X) is the rank of the incidence matrix for the fixed effects of the model. When the only fixed effect is μ, the variance σ²_p is associated with n − 1 degrees of freedom.

As in ML, the objective in REML is to maximize the logarithm of a function of the parameters, in this case the restricted likelihood function, which in matrix algebra is defined as:¹⁴

$Ln(L_{r}) = -0.5 (n - p) Ln(2π) - 0.5 Ln|V| - 0.5 Ln|X' V^{-1} X| - 0.5 (y - Xb)' V^{-1} (y - Xb)$

Where Ln(L_r) is the logarithm of the restricted likelihood function and p is the rank of X. If the data structure is balanced, differentiating Ln(L_r) with respect to the variance components of the model above gives a system of equations whose solution is:¹¹

$σ_{s}^{2} = \frac{C M s - C M e}{n_{f i j o} k}$

$σ_{e}^{2} = C M e$

These are identical to the ANOVA estimates, since a property of REML is that, in a balanced data structure, the REML estimates equal the ANOVA estimates as long as the inequality CMs > CMe is satisfied. Otherwise, the ANOVA estimate of σ²_s would be negative, while in REML all the phenotypic variability is residual. In an unbalanced data structure, the derivatives of Ln(L_r) with respect to the variance components give rise to nonlinear equations, which cannot be solved directly; in these cases, as in ML, iterative numerical methods are used to approximate the values of the variance components.

REML estimates using kinship information in an animal model

In the case of a simple animal model, where each animal has only one record (and there are animals without records), the ANOVA method cannot be applied: the classification variable is the animal itself, and with a single record per animal it is not possible to estimate the variation within groups. The ML and REML estimations are applicable, however, since they allow kinship information to be introduced through the matrix A. In a mixed model, maximizing Ln(L_r) is equivalent to minimizing −2Ln(L_r). Therefore, the objective function to be minimized, in matrix algebra, can be defined as:¹⁴

$-2 Ln(L_{r}) = (n - p) Ln(2π) + Ln|R| + Ln|G| + Ln|C| + y' P y$

Where Ln|C| is the natural logarithm of the determinant of the coefficient matrix of the Henderson normal equations and y′Py is the generalized residual sum of squares. Minimizing −2Ln(L_r) requires iterative numerical methods, but this form has the advantage of being easier to work with than maximizing Ln(L_r) directly. Therefore, most specialized REML programs use sparse matrix algorithms and numerical methods to find the estimators that minimize −2Ln(L_r).

An example of birth weight (BW) in cattle was used to calculate the GV, h² and e² using a mixed model with one fixed and one random factor. The database is presented in Table 2. In this problem we want to remove the variability that exists between the sexes; therefore, the sex factor is considered fixed and the sire factor random, which leads to the following statistical model:

$BW = mean + sire + sex + error$

Sire	Animal	Sex	y
1	3	Male	36
1	4	Male	35
1	5	Female	33
1	6	Female	28
2	7	Female	31
2	8	Female	29
2	9	Male	28
2	10	Male	36
3	11	Male	38
3	12	Male	37
3	13	Female	29
3	14	Female	35

Table 2 Database of animal records, sex and BW

ANOVA, ML and REML methods were used to calculate h², e² and the GVs using the data in Table 2, first using all the information and then assuming that the last record was missing. For the animal model, a similar model was used:

$BW = mean + animal + sex + error$

Here, all the kinship information and the values of the variance components found with the previous model were used to solve the Henderson normal equations.

Balanced data in a reproductive model

To calculate the CM, it is necessary to calculate the SC and GL for each source of variation. For this model and data structure, they can be calculated using the formulas in Table 1.

$\begin{array}{l} G L_{s} = 3 - 1 = 2 \\ G L_{s e x o} = 2 - 1 = 1 \\ G L_{t o t a l} = 12 - 1 = 11 \\ G L_{e} = 11 - 1 - 2 = 11 - 3 = 8 \end{array}$

And since the design is balanced, the SCs are:

$\begin{array}{l} S C_{t o t a l} = 36^{2} + 35^{2} + 33^{2} + \dots + 35^{2} - \frac{395^{2}}{12} = 152.916 \\ S C_{s} = \frac{{(132)}^{2} + {(124)}^{2} + {(139)}^{2}}{4} - \frac{395^{2}}{12} = 28.166 \\ S C_{s e x o} = \frac{{(185)}^{2} + {(210)}^{2}}{6} - \frac{395^{2}}{12} = 52.083 \\ S C_{e} = 152.916 - (28.166 + 52.083) = 72.667 \end{array}$

And the CMs come from:

$\begin{array}{l} C M_{S} = \frac{28.166}{2} = 14.083 \\ C M_{e} = \frac{72.667}{8} = 9.083 \end{array}$

And from the CM we can calculate the variance components:

$σ_{s}^{2} = \frac{14.083 - 9.083}{2 (2)} = 1.25$

Therefore, ℎ² using ANOVA is:

$\begin{array}{l} h^{2} = \frac{4 (1.25)}{1.25 + 9.083} = 0.483 \\ e^{2} = \frac{10.33 - 5}{10.33} = 0.515 \end{array}$
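The balanced ANOVA calculations above can be reproduced with a few lines of NumPy. This is a minimal sketch, not the software used by the authors: the array layout (one row per sire, males in the first two columns) and the variable names are choices made for this example.

```python
import numpy as np

# Birth weights from Table 2: one row per sire; columns = Male, Male, Female, Female.
y = np.array([
    [36, 35, 33, 28],   # sire 1
    [28, 36, 31, 29],   # sire 2 (records reordered so that males come first)
    [38, 37, 29, 35],   # sire 3
], dtype=float)

n_s, n_fixed, k = 3, 2, 2        # sires, sex levels, replicates per sire-by-sex cell
n = y.size
cf = y.sum() ** 2 / n            # correction factor (sum y)^2 / n

sc_total = (y ** 2).sum() - cf
sc_sire = (y.sum(axis=1) ** 2).sum() / (n_fixed * k) - cf
sex_totals = np.array([y[:, :2].sum(), y[:, 2:].sum()])   # male total, female total
sc_sex = (sex_totals ** 2).sum() / (n_s * k) - cf
sc_e = sc_total - sc_sire - sc_sex

cm_sire = sc_sire / (n_s - 1)
cm_e = sc_e / ((n - 1) - (n_s - 1) - (n_fixed - 1))

var_s = (cm_sire - cm_e) / (n_fixed * k)   # sire variance component
var_e = cm_e                               # residual variance component
h2 = 4 * var_s / (var_s + var_e)
e2 = 1 - h2
print(round(var_s, 3), round(var_e, 3), round(h2, 3), round(e2, 3))
```

Small differences in the last decimal with respect to the hand calculation come only from rounding of the intermediate values.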

These ANOVA estimates are also the REML estimates, since the data structure is balanced and CMs > CMe. The GVs are then obtained from the solutions of the Henderson normal equations using the estimated variance components. Calculating the value of α, we have:

$α = \frac{9.083}{1.25} = 7.2664$

Therefore, the equations are:

$(\begin{matrix} 6 & 0 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ 0 & 6 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} \begin{matrix} 4 + 7.266 & 0 & 0 \end{matrix} \\ \begin{matrix} 0 & 4 + 7.266 & 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 4 + 7.266 \end{matrix} \end{matrix} \end{matrix}) [\begin{matrix} \begin{matrix} b_{1} \\ b_{2} \end{matrix} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}] \to [\begin{matrix} 6 & 0 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ 0 & 6 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} \begin{matrix} 11.266 & 0 & 0 \end{matrix} \\ \begin{matrix} 0 & 11.266 & 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 11.266 \end{matrix} \end{matrix} \end{matrix}] [\begin{matrix} b_{1} \\ b_{2} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}]$

In the previous equations the value of 𝜇 was forced to be zero in order to break the linear dependence between the rows and columns of the coefficient matrix. The solution of this system of equations is given by:

$[\begin{matrix} b_{1} \\ b_{2} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = {[[\begin{matrix} 6 & 0 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ 0 & 6 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} \begin{matrix} 11.266 & 0 & 0 \end{matrix} \\ \begin{matrix} 0 & 11.266 & 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 11.266 \end{matrix} \end{matrix} \end{matrix}]]}^{- 1} [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}] = [\begin{matrix} 30.83 \\ 35 \\ 0.029 \\ - 0.680 \\ 0.650 \end{matrix}]$
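The same system can be assembled and solved numerically from the incidence matrices. The following sketch assumes the coding used in the equations above (no intercept, b₁ = female and b₂ = male, unrelated sires); the variable names are choices made for this example.

```python
import numpy as np

# Records of Table 2 in their original order (animals 3 to 14).
sex  = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0])   # 0 = female, 1 = male
sire = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])   # 0-based sire index
y    = np.array([36, 35, 33, 28, 31, 29, 28, 36, 38, 37, 29, 35], dtype=float)

X = np.eye(2)[sex]      # 12 x 2 incidence matrix for sex
Z = np.eye(3)[sire]     # 12 x 3 incidence matrix for sires
alpha = 9.083 / 1.25    # ratio sigma_e^2 / sigma_s^2 from the ANOVA/REML estimates

# Coefficient matrix and right-hand side of the Henderson normal equations.
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + alpha * np.eye(3)]])
rhs = np.concatenate([X.T @ y, Z.T @ y])

sol = np.linalg.solve(C, rhs)
b_hat, s_hat = sol[:2], sol[2:]
print(np.round(b_hat, 3), np.round(s_hat, 3))
```

Changing `alpha` to the ratio obtained with another estimation method, or replacing `alpha * np.eye(3)` by `alpha` times an inverse relationship matrix, gives the other systems discussed in this section.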

ML estimates are:

$\begin{array}{l} σ_{s}^{2} = \frac{\frac{28.166}{3} - 8.073}{2 (2)} = 0.328 \\ σ_{e}^{2} = [1 - \frac{2 - 1}{3 (2 (2) - 1)}] 9.083 = 8.073 \end{array}$

And h² and e² using ML are:

$\begin{array}{l} h^{2} = \frac{4 (0.328)}{0.328 + 8.073} = 0.156 \\ e^{2} = \frac{8.401 - 1.3212}{8.401} = 0.842 \end{array}$
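The closed-form ML estimates for the balanced case are easy to script. The sketch below simply codes the two formulas given earlier for this model; the function name `ml_balanced` is hypothetical, and the function is only meaningful when CMs > CMe, which is the case for these data.

```python
def ml_balanced(sc_sire, cm_e, n_s, n_fixed, k):
    """Closed-form ML estimates for the balanced sire + one fixed factor model.

    Valid only when the sire mean square exceeds the residual mean square.
    """
    var_e = (1 - (n_fixed - 1) / (n_s * (n_fixed * k - 1))) * cm_e
    var_s = (sc_sire / n_s - var_e) / (n_fixed * k)
    return var_s, var_e

# Sums of squares and mean squares from the balanced ANOVA of Table 2.
var_s, var_e = ml_balanced(sc_sire=28.1667, cm_e=9.0833, n_s=3, n_fixed=2, k=2)
h2 = 4 * var_s / (var_s + var_e)
e2 = 1 - h2
print(round(var_s, 3), round(var_e, 3), round(h2, 3), round(e2, 3))
```

Compared with the ANOVA/REML estimates, ML shrinks both components, and the sire variance much more strongly, which is why the ML heritability is so much lower here.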

And the equations using ML are:

$[\begin{matrix} 6 & 0 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ 0 & 6 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} \begin{matrix} 28.612 & 0 & 0 \end{matrix} \\ \begin{matrix} 0 & 28.612 & 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 28.612 \end{matrix} \end{matrix} \end{matrix}] [\begin{matrix} b_{1} \\ b_{2} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}]$

Therefore the solution is:

$[\begin{matrix} b_{1} \\ b_{2} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = {[\begin{matrix} 6 & 0 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ 0 & 6 & \begin{matrix} 2 & 2 & 2 \end{matrix} \\ \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} 2 \\ 2 \\ 2 \end{matrix} & \begin{matrix} \begin{matrix} 28.612 & 0 & 0 \end{matrix} \\ \begin{matrix} 0 & 28.612 & 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 28.612 \end{matrix} \end{matrix} \end{matrix}]}^{- 1} [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}] = [\begin{matrix} 30.83 \\ 35 \\ 0.011 \\ - 0.267 \\ 0.256 \end{matrix}]$

Introduction of the parentage matrix in a reproductive model

Now it is assumed that animal 1 (sire 1) is the father of animal 2 (sire 2), so that the equations take into account the genealogy among males. First we have to calculate A⁻¹. Applying Henderson's rules⁵ (for an animal with one known parent, add 4/3 to its own diagonal element, −2/3 to the two animal-parent elements and 1/3 to the diagonal element of the parent; for an animal with no known parents, add 1 to its diagonal element), we have:

$A^{-1} = [\begin{matrix} 1 + 1/3 & -2/3 & 0 \\ -2/3 & 4/3 & 0 \\ 0 & 0 & 1 \end{matrix}] = [\begin{matrix} 4/3 & -2/3 & 0 \\ -2/3 & 4/3 & 0 \\ 0 & 0 & 1 \end{matrix}]$

Therefore,

$A^{-1} α = [\begin{matrix} \frac{4}{3} & -\frac{2}{3} & 0 \\ -\frac{2}{3} & \frac{4}{3} & 0 \\ 0 & 0 & 1 \end{matrix}] (7.2664) = [\begin{matrix} 9.6885 & -4.8443 & 0 \\ -4.8443 & 9.6885 & 0 \\ 0 & 0 & 7.2664 \end{matrix}]$

And adding the Z'Z matrix:

$Z' Z + A^{-1} α = [\begin{matrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{matrix}] + [\begin{matrix} 9.6885 & -4.8443 & 0 \\ -4.8443 & 9.6885 & 0 \\ 0 & 0 & 7.2664 \end{matrix}] = [\begin{matrix} 13.6885 & -4.8443 & 0 \\ -4.8443 & 13.6885 & 0 \\ 0 & 0 & 11.2664 \end{matrix}]$

Therefore, the Henderson normal equations are:

$[\begin{matrix} 6 & 0 & 2 & 2 & 2 \\ 0 & 6 & 2 & 2 & 2 \\ 2 & 2 & 13.6885 & -4.8443 & 0 \\ 2 & 2 & -4.8443 & 13.6885 & 0 \\ 2 & 2 & 0 & 0 & 11.2664 \end{matrix}] [\begin{matrix} b_{1} \\ b_{2} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}]$

And the solution of these equations is:

$[\begin{matrix} b_{1} \\ b_{2} \\ s_{1} \\ s_{2} \\ s_{3} \end{matrix}] = {[\begin{matrix} 6 & 0 & 2 & 2 & 2 \\ 0 & 6 & 2 & 2 & 2 \\ 2 & 2 & 13.6885 & -4.8443 & 0 \\ 2 & 2 & -4.8443 & 13.6885 & 0 \\ 2 & 2 & 0 & 0 & 11.2664 \end{matrix}]}^{-1} [\begin{matrix} 185 \\ 210 \\ 132 \\ 124 \\ 139 \end{matrix}] = [\begin{matrix} 30.9358 \\ 35.1024 \\ -0.2451 \\ -0.6767 \\ 0.6145 \end{matrix}]$

In this solution, the fixed effect (sex) is adjusted for the random effect, and the random effect is adjusted for the fixed effect.

Unbalanced data in a reproductive model

Assuming that the last record is missing, SCtotal is:

$SC_{total} = 36^{2} + 35^{2} + 33^{2} + \dots + 29^{2} - \frac{360^{2}}{11} = 148.1818182$

The GLs are:

$\begin{array}{l} GL_{s} = 3 - 1 = 2 \\ GL_{sexo} = 2 - 1 = 1 \\ GL_{total} = 11 - 1 = 10 \\ GL_{e} = 10 - 1 - 2 = 10 - 3 = 7 \end{array}$

To calculate the adjusted SC, the ordinary least squares solutions for the reduced model (without the sire factor) must first be estimated, using a generalized inverse of the coefficient matrix:

$b = {[\begin{matrix} 11 & 5 & 6 \\ 5 & 5 & 0 \\ 6 & 0 & 6 \end{matrix}]}^{-} [\begin{matrix} 360 \\ 150 \\ 210 \end{matrix}] = [\begin{matrix} 0.1666 & -0.1666 & 0 \\ -0.1666 & 0.3666 & 0 \\ 0 & 0 & 0 \end{matrix}] [\begin{matrix} 360 \\ 150 \\ 210 \end{matrix}] = [\begin{matrix} 35 \\ -5 \\ 0 \end{matrix}]$

Therefore, the SC for sex in the reduced model is:

$SC_{sexo (reduced)} = [\begin{matrix} 35 & -5 & 0 \end{matrix}] [\begin{matrix} 360 \\ 150 \\ 210 \end{matrix}] - \frac{360^{2}}{11} = (35) (360) + (-5) (150) - \frac{360^{2}}{11} = 68.1818$

To obtain the adjusted SC for sires, we calculate the SC for the full model and subtract from it the SC of the reduced model:

$SC_{s} = SC(μ, s, sexo) - SC(μ, sexo) = 83.6818182 - 68.1818 = 15.5$

And the 𝑆𝐶_𝑒 is:

$S C_{e} = 148.1818182 - (15.5 + 68.1818) = 64.5$

Therefore, the CMs are:

$\begin{array}{l} C M_{s} = \frac{15.5}{2} = 7.75 \\ C M_{e} = \frac{64.5}{7} = 9.214286 \end{array}$

Finally, the variance components are:

$\begin{array}{l} σ_{s}^{2} = \frac{7.75 - 9.214286}{3.6} = - 0.4067 \\ σ_{e}^{2} = 9.214286 \end{array}$
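The adjusted sums of squares for the unbalanced data can be checked by fitting the full and reduced models by least squares. This is a minimal sketch with assumed names (`model_ss` is a hypothetical helper); `numpy.linalg.lstsq` is used because the design matrices with an intercept are not of full rank, and the fitted values, and hence the sums of squares, do not depend on which solution is chosen.

```python
import numpy as np

# Table 2 without the last record (animal 14).
sex  = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0])   # 0 = female, 1 = male
sire = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2])   # 0-based sire index
y    = np.array([36, 35, 33, 28, 31, 29, 28, 36, 38, 37, 29], dtype=float)
n = y.size

def model_ss(X, y):
    """Uncorrected sum of squares explained by the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (X @ beta) @ y

mu = np.ones((n, 1))
S = np.eye(3)[sire]          # sire incidence matrix
B = np.eye(2)[sex]           # sex incidence matrix
cf = y.sum() ** 2 / n        # correction for the mean

sc_full = model_ss(np.hstack([mu, S, B]), y) - cf   # SC(mu, s, sex)
sc_reduced = model_ss(np.hstack([mu, B]), y) - cf   # SC(mu, sex)
sc_sire = sc_full - sc_reduced                      # sire SC adjusted for sex
sc_e = (y @ y - cf) - sc_full                       # residual SC

cm_sire, cm_e = sc_sire / 2, sc_e / 7
print(round(sc_sire, 4), round(sc_e, 4), round(cm_sire, 4), round(cm_e, 4))
```

Because the sire mean square is smaller than the residual mean square, any ANOVA-type estimator of the form (CMs − CMe) divided by a positive coefficient is negative for these data.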

The estimate of σ²_s is negative because CMs < CMe; therefore, the ML and REML estimates for σ²_s are:

σ²_s = 0

In other words, all the variability is residual. Table 3 shows the ML and REML iterations for the calculation of the residual variance with the Newton-Raphson method. As shown in Table 3, when an unbalanced database is used, the estimates of the variance components differ between ANOVA and REML: with ANOVA σ²_e = 9.2142 and with REML σ²_e = 8.88. In this particular case h² = 0 because the variance in the numerator is zero. To find a more credible estimate, one should increase the number of records used in the genetic evaluation, or try another model and compare the AIC.

ML				REML
Iteration	-2ln(L)	σ_s²	σ_e²	Iteration	-2ln(L_r)	σ_s²	σ_e²
1	53.0420	0	7.2727	1	48.6053	0	8.88
2	53.0420	0	7.2727	2	48.6053	0	8.88

Table 3 Iteration of variance components for ML and REML
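With σ²_s = 0 the model reduces to a fixed-effects model with V = Iσ²_e, so the converged values in Table 3 can be checked in closed form. The sketch below is an illustration under that assumption and uses a full-rank coding of X (one column per sex, no intercept); the value of −2ln(L_r) depends on this coding through the term Ln|X′V⁻¹X|.

```python
import numpy as np

# Table 2 without the last record (animal 14).
sex = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0])   # 0 = female, 1 = male
y   = np.array([36, 35, 33, 28, 31, 29, 28, 36, 38, 37, 29], dtype=float)

X = np.eye(2)[sex]                     # full-rank coding: one column per sex
n, p = X.shape
beta = np.linalg.solve(X.T @ X, X.T @ y)
sse = np.sum((y - X @ beta) ** 2)      # residual sum of squares

var_ml = sse / n                       # ML residual variance
var_reml = sse / (n - p)               # REML residual variance

# -2 ln(L) with V = I * var_ml
m2lnL_ml = n * np.log(2 * np.pi) + n * np.log(var_ml) + sse / var_ml

# -2 ln(L_r) with V = I * var_reml; ln|V| + ln|X'V^-1 X| = (n - p) ln(var) + ln|X'X|
_, logdet_xtx = np.linalg.slogdet(X.T @ X)
m2lnL_reml = ((n - p) * np.log(2 * np.pi) + (n - p) * np.log(var_reml)
              + logdet_xtx + sse / var_reml)

print(round(var_ml, 4), round(m2lnL_ml, 4))
print(round(var_reml, 4), round(m2lnL_reml, 4))
```

The only difference between the two residual variances is the divisor, n for ML and n − rank(X) for REML.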

Animal model

The solutions of the Henderson normal equations, using the values σ²_e = 9.083 and σ²_a = 5, are presented in Table 4. The Henderson normal equations are not shown for this case because of their large dimensions.

Animal	GV
1	0.422982
2	-0.984574
3	1.10566
4	0.217214
5	0.809321
6	-0.651756
7	-0.273233
8	-0.857664
9	-2.32642
10	0.0113062
11	1.33545
12	1.04324
13	-0.117947
14	1.63535

Table 4 Genetic values using an animal model

Genetic values and parameters can be estimated using ANOVA, ML and REML, but with the risk of estimating negative variance components using ANOVA or zero (or overestimated) heritabilities with likelihood-based methods when the data structure or model is not correct. When the data structure is unbalanced, mathematical calculations with ANOVA, ML and REML are more complex and require computational algorithms with higher performance.

Acknowledgments: None.

Conflicts of interest: The authors declared that there are no conflicts of interest.

Caballero J, Pablo E, Martinez C. Restricted maximum likelihood estimation of variance and covariance components of multiple traits under designs i and ii of North Carolina. Rev Fitotec Mex. 2003;26(1):53–66.
Castejón, Osiris. Design and analysis of experiments with statistix. Venezuela: Rafael Urdaneta University; 2008.
Elzo M, Garay O. Evolution of genetic improvement practices in domestic animal populations. Gainesville: University of Florida; 2012.
Román R, Aranguren A. Genetic evaluation of breeding stock: achievements and challenges From: Achievements and challenges in dual purpose cattle breeding; 2014.
Gutiérrez P. Introduction to animal genetic assessment methodology adapted to the EHEA. Madrid: Complutense; 2010.
Aranguren A, Román R. The simple animal model: a methodology for geneticists from: achievements and challenges of dual purpose cattle breeding; 2014.
Perez J, Montiel N. Heritability of the IBMI index of Italian buffaloes used in artificial insemination. Rev Cientif FCV-LUZ XXXIII. 2023;205–207.
Román R, Aranguren J, Garcidueñas R, et al. Association between reproductive characteristics and milk production in crossbred heifers. Revista Espamciencia. 2023;14(2):63–70.
Montgomery D. Experiments with random factors. From: Design and analysis of experiments. 2nd edn. Limusa-wiley; 2015.
Searle R, Casella G, Mcculloch C. Variance Components. Wiley series; 1992.
Aldrich J. R. A. Fisher and the making of maximum likelihood 1912–1922. Statist Sci. 1997;12(3):162–176.
Hartley HO, Rao JNK. Maximum likelihood estimation for the mixed analysis of variance model. Biometrika. 1967;54(1):93–108.
Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58:545–554.
Meyer K. Restricted maximum likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm. Genet Sel Evol. 1989;21:317–340.
Román R, Aranguren J, Villasmil Y, et al. Analysis of fertility at first service in dual-purpose heifers under an animal model. Revista Científica. 2010;20(4):383–389.


International Journal of Avian & Wildlife Biology (eISSN: 2574-9862)


Pérez González José Raúl,


Morales Valladares David Daniel
