Research Article Volume 8 Issue 1
Maracaibo Territorial Polytechnic University of Maracaibo, Venezuela
Correspondence: Pérez González José Raúl, Maracaibo Territorial Polytechnic University of Maracaibo, Venezuela
Received: February 20, 2024  Published: March 20, 2024
Citation: Raúl PGJ,Valladares M, Daniel D. Theory of estimation of parameters and genetic values under mixed models. Int J Avian & Wildlife Biol. 2024;8(1):2733. DOI: 10.15406/ijawb.2024.08.00210
In animal breeding, it is essential to know genetic parameters such as heritability, with the aim of being able to predict genetic values (GV) and efficiently direct selection programs. A mixed model refers to those cases where the researcher considers fixed and random factors in a statistical model. Models widely used in the area of animal genetic improvement are the reproductive model and the animal model, which consider the reproductive or animal factor as random and a group of nongenetic effects as fixed. These mixed models allow us to obtain both heritability values (h^{2}) for a trait, as well as genetic predictions such as the expected progeny difference (EPDs) or the predicted transmission ability (PTA) for each animal. An example of birth weight (BW) in cattle was used to calculate the VG, h2 and e2 using a mixed model, with a fixed and a random factor. The ANOVA, ML and REML methods were used to calculate h^{2}, e^{2} and the VG first using all the information and subsequently assuming the last lost data, under a reproductive model and an animal model. The results found using the 3 methods were the same for REML and ANOVA in balanced data and different for the 3 methods in unbalanced data, where in the unbalanced case the ANOVA estimated a negative variance component, therefore, it can be concluded that estimate genetic values and parameters using ANOVA, ML and REML, but with the risk of estimating negative variance components using ANOVA or null (or overestimated) heritabilities with likelihoodbased methods when the data structure or model is not the same correct.
Keywords: heritability, ANOVA, REML, ML, mixed models
In animal breeding, it is essential to know genetic parameters such as heritability in order to be able to predict genetic values (GV) and efficiently conduct selection programs. Genetic parameters are ratios between estimated population variances, known as variance components, which are calculated using linear models containing fixed and random factors, generally known as mixed models.^{1} For the correct estimation of parameters and genetic values, it is necessary to have a broad knowledge of estimation using mixed models. Therefore, this article reviews the estimation of variance components and genetic values using ANOVA, ML and REML under a reproductive model and an animal model, explaining the virtues and limitations of each method in balanced and unbalanced data.
Theoretical framework mixed models
A mixed model refers to those cases where the researcher considers both fixed and random factors in a statistical model.^{2} A model widely used in the area of animal breeding is the reproductive model or Sire Model, which considers the reproductive factor as random and a group of nongenetic effects as fixed.^{2} The reproductive model allows obtaining both heritability values (h^{2}) for a trait, as well as genetic predictions such as the expected difference of progeny (DEPs) or the predicted transmission ability (PTA) for each breeder.^{3} In matrix algebra the reproductive model takes the following form:
$y=Xb+Zs+e$
Where 𝑦 is a vector for the data, 𝑋 is an incidence matrix relating the data to the fixed effects, 𝑏 is a vector of unknown parameters for the fixed effects, 𝑍 is an incidence matrix relating the data to the random effects, 𝑠 is a vector of unknown predictions for each player, and 𝑒 is a vector of residuals.
The covariance structure of the above model is:
$VAR\left[\frac{s}{e}\right]=\left[\begin{array}{cc}I{\sigma}^{2}{}_{s}& 0\\ 0& I{\sigma}^{2}{}_{s}\end{array}\right]$
Where 𝐼 is an identity matrix, 𝜎_{𝑠}^{2} is the variance between breeders and 𝜎_{𝑒}^{2} is the residual variance.
The Henderson normal equations, necessary to find the genetic values of the breeders, for the above model are given by^{3}:
$\left[\begin{array}{cc}X\text{'}X& X\text{'}Z\\ Z\text{'}X& Z\text{'}Z+I\alpha \end{array}\right]\left[\begin{array}{c}{b}_{i}\\ {s}_{i}\end{array}\right]=\left[\begin{array}{c}{X}^{\prime}y\\ Z\text{'}y\end{array}\right]$
Where 𝛼 is a ratio of the residual variance to the variance between breeders:
$\alpha =\frac{{\sigma}_{e}^{2}}{{\sigma}_{s}^{2}}$
According to Román and Aranguren,^{4} it is possible to substitute 𝐼𝛼 by 𝐴^{−1}𝛼 in the normal Henderson equations, with the objective of improving predictions using all the parentage information between males, therefore, the new equations are:
$\left[\begin{array}{cc}X\text{'}X& X\text{'}Z\\ Z\text{'}X& Z\text{'}Z+{A}^{1}\alpha \end{array}\right]\left[\begin{array}{c}{b}_{i}\\ {s}_{i}\end{array}\right]=\left[\begin{array}{c}X\text{'}y\\ Z\text{'}y\end{array}\right]$
Where 𝐴^{−1} is the inverse of the kinship matrix.
And the covariance structure taking into account the introduction of 𝐴 is^{5}:
$VAR\left[\begin{array}{c}s\\ e\end{array}\right]\left[\begin{array}{cc}A{\sigma}_{s}^{2}& 0\\ 0& I{\sigma}_{e}^{2}\end{array}\right]$
Another model widely used in genetic evaluation is the animal model, which uses all the parentage information in the pedigree, and unlike the reproductive model, allows obtaining genetic predictions of all the animals in the herd, whether or not data is present or not:
$y=Xb+Za+e$
Where 𝑎 is a vector of genetic predictions for each animal the covariance structure of the above model is as follows:
$VAR\left[\begin{array}{c}a\\ e\end{array}\right]=\left[\begin{array}{cc}A{\sigma}_{a}^{2}& 0\\ 0& I{\sigma}_{e}^{2}\end{array}\right]=\left[\begin{array}{cc}G& 0\\ 0& R\end{array}\right]$
Where 𝐺 is a variance and covariance matrix for the random effects and 𝑅 is a matrix of residuals.
The Henderson normal equations for this model are given by:^{6}
$\left[\begin{array}{cc}X\prime X& X\prime {\rm Z}\\ {\rm Z}\prime X& {\rm Z}\prime {\rm Z}+{{\rm A}}^{1}\alpha \end{array}\right]\left[\begin{array}{c}{b}_{i}\\ {a}_{i}\end{array}\right]=\left[\begin{array}{c}X\prime y\\ Z\prime y\end{array}\right]$ >
Where 𝛼 in this model is a ratio of the residual variance to the additive variance:
$\alpha =\frac{{\sigma}_{e}^{2}}{{\sigma}_{a}^{2}}$
Where 𝜎_{𝑎}^{2} is additive genetic variance
Genetic parameters
Using mixed models, it is possible to estimate the variance components, and from them calculate the hereability, which is given by:^{5}
${h}^{2}=\frac{{\sigma}_{a}^{2}}{{\sigma}_{p}^{2}}$
Where ℎ^{2} is heritability, and 𝜎_{𝑝}^{2} is phenotypic variance, therefore, heritability is defined as a quotient between the additive variance and the phenotypic variance. The additive component (additive variance) of the numerator of the formula of ℎ^{2} can be estimated using several procedures, a wellknown one is to use a reproductive model to estimate the variance between breeders, which is ¼ of the additive variance, therefore, a formula to estimate 𝜎_{𝑎}^{2} is:
${\sigma}_{a}^{2}=4{\sigma}_{s}^{2}$
Where 4𝜎_{𝑠}^{2} is four times the variance among breeders, therefore, heritability can be calculated as:^{7}
${h}^{2}=\frac{{4}_{s}^{2}}{{\sigma}_{p}^{2}}$
If the heritability is known, the heritability component can be 𝜎_{𝑎}^{2} component can be calculated using the following formula:^{8}
${\sigma}_{a}^{2}={h}^{2}{\sigma}_{p}^{2}$
Another parameter of interest is the environmental proportion coefficient, which indicates how much of the differences observed in the phenotype (data) of the animals are due to nongenetic (environmental) factors, this coefficient has the following mathematical formula:^{9}
${e}^{2}=\frac{{\sigma}_{en}^{2}}{{\sigma}_{p}^{2}}$
Where 𝜎_{𝑒𝑛}^{2} is the environmental variance. The variance component 𝜎_{𝑒𝑛}^{2} is calculated using the difference between 𝜎_{𝑝}^{2} 𝜎_{𝑎}^{2} therefore, the formula for 𝜎_{𝑒𝑛}^{2} is:^{8}
${\sigma}_{en}^{2}={\sigma}_{p}^{2}{\sigma}_{a}^{2}$
Finally, the variance 𝜎_{𝑝}^{2} is the sum of the variance components:
${\sigma}_{p}^{2}={\sigma}_{a}^{2}+{\sigma}_{en}^{2}={\sigma}_{s}^{2}+{\sigma}_{e}^{2}$
Variance component estimation using a reproductive model analysis of variance
There are several classical methods for estimating the variance components needed to compute ℎ^{2} y 𝑒^{2}including analysis of variance (ANOVA), maximum likelihood (ML) and restricted maximum likelihood (REML).
ANOVA is a technique that attempts to separate out different sources of variability. 𝜎_{𝑝}^{2} into different sources of variability, this involves the separation of sums of squares (SC), degrees of freedom (GL) and mean squares (MS) for each source of variation. Variance components estimated using ANOVA are calculated by equating the expected values of the CM (E (CM)) for each source of variation, with their respective CM and solving the resulting system of equations.^{10} CMs are a ratio of SC to GLs for each source of variation:^{10}
$CM=\frac{SC}{GL}$
In the case of a fixed factor and a random factor, without interaction, the reproductive model, in elementary algebra, is given by:
${y}_{ijk}=\mu +{s}_{i}+{b}_{j}+{e}_{ijk}$
And the ANOVA square for the above model is presented in Table 1.
FV 
SC 
GL 
CM 
E ( CM) 
Factor Fijo 
SCb 
${n}_{fijo}$ 1 
$\frac{SCb}{{n}_{fijo}1}$ 

Padres 
SCs 
${n}_{s}$ 1 
$\frac{SCs}{{n}_{s}1}$ 
E(${\text{CM}}_{\text{s}})={\sigma}_{\text{e}}^{2}+{n}_{fijo}k{\sigma}_{s}^{2}$ 
residual 
$SCtotal{{\displaystyle \sum}}^{\text{}}SCresto$  $GLtotal{{\displaystyle \sum}}^{\text{}}GLresto$  $\frac{SCe}{GLtotalGLresto}$ 
$\text{E}\left({\text{CM}}_{\text{e}}\right)={\sigma}_{e}^{2}$ 
Total 
$y\prime yR\left(\mu \right)$ 
n1 


Table 1 ANOVA for Henderson's method III
Where 𝑘 is the number of replicates of the design, 𝑛_{𝑓𝑖𝑗𝑜} is the number of levels of the fixed effect and 𝑛_{𝑠} is the number of levels of the random factor. The variance components are calculated by equating the CM to their E (CM):
$CMs={\sigma}_{e}^{2}+{n}_{fijo}k{\sigma}_{s}^{2}$
$CMe={\sigma}_{e}^{2}$
And the unique solution of this system of equations is:
${\sigma}_{s}^{2}=\frac{CMsCMe}{{n}_{fijo}k}$
${\sigma}_{e}^{2}=C{M}_{e}$
In balanced data, the CS can be estimated directly without the need for adjustment, for a model with two noninteracting factors:
$SCs=\frac{\sum {y}_{s.}^{2}}{k}\frac{{\left(\sum y\right)}^{2}}{n}$
$SCb=\frac{\sum {y}_{b.}^{2}}{w}\frac{{\left(\sum y\right)}^{2}}{n}$
Where ∑ 𝑦_{𝑠}^{2}_{.} is the sum of the sum of the sum of the data for each player squared, ∑ 𝑦_{𝑏}^{2}_{.} is the sum of the sum of the sum of the data for each level of the fixed effect and w is the number of replicates for the fixed effect. For the unbalanced case, the SCs have to be calculated using the type III SCs for the random factor (sire), since type III calculates the SCs of an effect by correcting them with respect to any other effect that does not contain it and orthogonal to any effect (if it exists) that contains it. Type III CS can be expressed as:^{11}
$SCs=SC\left(\mu ,s,b\right)SC\left(\mu ,b\right)$
The 𝑆𝐶𝑠 is corrected for the effects of 𝜇 𝑦 𝑏where 𝜇 is the intercept or herd mean effect. In order to find the values of 𝑆𝐶𝑠 it is necessary to fit a complete model and calculate (𝜇, 𝑠, 𝑏, ) and subtract 𝑆𝐶(𝜇, 𝑏) a reduced model.
Maximum likelihood
The maximum likelihood (ML) method is a classical method of parameter estimation proposed by Fisher,^{12} but it was not until Hartley and Rao,^{13} that it was used for mixed models in general. Knowing the likelihood function as a function of the parameters of a statistical model given some data, in ML we try to obtain estimators of the variance components that maximize the likelihood function, that is, that have the maximum probability of representing the population parameters.
The likelihood function is defined as the product of the likelihood function of the data, but in practice, the natural logarithm of the likelihood function is used because it is more manageable, if the distribution of the data is normal, in matrix algebra the natural logarithm of the likelihood function is defined as:^{11}
$Ln\left(L\right)=0.5\left(n\right).In\left(2\pi \right)0.5In\leftV\right0.5\left(yXb\right)\prime {V}^{1}\left(yXb\right)$
Where (𝐿) is the natural logarithm of the likelihood function and 𝑉 = 𝑍𝐺𝑍^{′ }+ 𝑅 is the variance and phenotypic covariance matrix of the model. To find the estimators that maximize the likelihood, we need to find the maximum of equation (𝐿)This is achieved with different methodologies, for example, if the data structure is balanced and we have a mixed model, with a random effect and a fixed one with no interaction, the derivative of 𝐿𝑛(𝐿)with respect to the parameters to be estimated σ^{2}_{s} y σ^{2}_{e} will lead us to a system of equations whose solution is:
${\sigma}_{s}^{2}=\frac{\frac{SCs}{{n}_{s}}{\sigma}_{e}^{2}}{{n}_{fijo}k}$
${\sigma}_{e}^{2}=\left[1\frac{{n}_{fijo}1}{{n}_{s}\left({n}_{fijo}k1\right)}\right]CMe$
An important point of ML estimation, for this model, is that even with balanced data, it is possible to find estimators different from the ones presented above, since these solutions will be valid if the inequality 𝐶𝑀𝑠 > 𝐶𝑀𝑒is met, but on the other hand, if the inequality is 𝐶𝑀𝑠 < 𝐶𝑀𝑒 ML estimates for this model and balanced data are given by:^{11}
${\sigma}_{e}^{2}=\frac{\stackrel{{\sigma}_{s}^{2}=0}{SCtotal}}{\left({n}_{s}\right)\left({n}_{fijo}\right)\left(k\right)}$
That is all phenotypic variability is residual, which may indicate that the model used is incorrect or that the number of data is insufficient, thus increasing the variability of the error. The variance σ^{2}_{p} is the sum of the variance components σ^{2}_{e} y σ^{2}_{s} whose sum gives an estimate of σ^{2}_{p} given mathematically by:
${\sigma}_{p}^{2}={\sigma}_{s}^{2}+{\sigma}_{e}^{2}=\frac{\left(yXb\right)\prime \left(yXb\right)}{n}$
Which is biased, since it is associated with n degrees of freedom. If the structure of the daros is unbalanced, the partial derivatives of (𝐿) lead to nonlinear maximum likelihood equations for the parameters to be estimated, therefore, the system of equations cannot be solved with direct methods. Faced with this problem, iterative number methods are used to try to approximate the maximum of (𝐿) which are applied to the logarithmic likelihood itself and not to the equations resulting from its first derivative, in order to be able to simultaneously calculate the variance components and (𝐿)which we can use to find fit criteria for our model, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
Restricted maximum likelihood
The restricted maximum likelihood method (REML) is a method proposed by Paterson and Thompson,^{13} which takes into account the loss of degrees of freedom by including fixed effects in the statistical model, therefore, the estimation of variability components are unbiased, since they are associated to degrees of freedom, which leads to an estimation of variance of the model. 𝑛 − (𝑋) degrees of freedom, which leads to an estimate of the variance, which is defined as σ^{2}_{p}which is defined as:
${\sigma}_{p}^{2}=\frac{\left(yXb\right)\prime \left(yXb\right)}{nRango\left(X\right)}$
Where (𝑋) is the rank of the incidence matrix for the fixed effects of the model. For the case where the only fixed effect is 𝜇the variance σ^{2}_{p} is associated with 𝑛 − 1 degrees of freedom.
As in ML, in REML, the objective is to maximize the logarithm of a function of the parameters, but in this case restricted, which is known as restricted likelihood function, which in matrix algebra is defined as:^{14}
$Ln\left(Lr\right)=0.5\left(np\right).In\left(2\pi \right)0.5In\leftV\right0.5\leftX\prime {V}^{1}X\right0.5\left(yXb\right)\prime {V}^{1}\left(yXb\right)$
Where (𝐿_{𝑟}) is the logarithm of the restricted likelihood function. If we have a balanced data structure and by deriving (𝐿_{𝑟}) as a function of the variability components of the model (model above), we can solve a system of equations that give rise to estimates given by:^{11}
${\sigma}_{s}^{2}=\frac{CMsCMe}{{n}_{fijo}k}$
${\sigma}_{e}^{2}=CMe$
These are identical to estimates using an ANOVA, since a property of REML is that in a balanced data structure, REML estimates = ANOVA as long as the inequality is satisfied. 𝐶𝑀𝑠 > 𝐶𝑀𝑒Otherwise, estimates via ANOVA would be negative and in REML all phenotypic variability is residual. In unbalanced data structure, the derivative of (𝐿_{𝑟}) with respect to the variance components, gives rise to nonlinear equations, which cannot be solved directly, therefore, in these cases, as in ML, iterative numerical methods are used to approximate the value of the variance components.
REML estimates using kinship information in an animal model
In the case of a simple animal model, where each animal has only one data (and there are animals without data), the ANOVA method cannot be applied, since it is not possible to estimate the variation within groups using this methodology, because the classification variable is each animal that has a unique record, but the ML and REML estimations are applicable since they allow introducing kinship information in the matrix. 𝐴. In a mixed model, maximizing (𝐿_{𝑟}) is equivalent to minimize −2 (𝐿_{𝑟}) Therefore, the objective function to be minimized, in matrix algebra, can be defined as:^{14}
$2Ln\left(Lr\right)=\left(np\right).In\left(2\pi \right)+Ln\leftR\right+Ln\leftG\right+Ln\leftC\right+y\prime Py$
Where 𝐿𝑛𝐶 is the natural logarithm of the determinant of the coefficient matrix of the normal Henderson equations and ′𝑃𝑦 is the generalized residual sum of squares. Obviously to minimize −2𝐿𝑛(𝐿_{𝑟}) iterative numerical methods are needed, but it has the advantage that it is easier than maximizing 𝐿𝑛(𝐿_{𝑟}) Therefore, most specialized REML programs use sparse matrix algorithms and numerical methods to try to find estimators resulting from the minimization of −2𝐿𝑛(𝐿_{𝑟}).
An example of birth weight (BW) in cattle was used to calculate the VG, ℎ^{2} y 𝑒^{2} using a mixed model, with a fixed and a random factor. The database is presented in Table 2. In this problem, we want to eliminate the variability that exists between the sexes, therefore, the sex factor is considered as fixed and the father factor as random, which leads us to the statistical model for this problem:
$PN=media+padre+sexo+error$
Father 
Animal 
Sex 
y 
1 
3 
Male 
36 
1 
4 
Male 
35 
1 
5 
Female 
33 
1 
6 
Female 
28 
2 
7 
Female 
31 
2 
8 
Female 
29 
2 
9 
Male 
28 
2 
10 
Male 
36 
3 
11 
Male 
38 
3 
12 
Male 
37 
3 
13 
Female 
29 
3 
14 
female 
35 
Table 2 Database of animal records, sex and NP
ANOVA, ML, and REML methods were used to calculate ℎ^{2} , 𝑒^{2} and GVs using the data in Table 2, first using all the information and then assuming the last missing data. For the animal model, a similar model was used:
$PN=media+animal+sexo+error$
Where all the kinship information and the value of the variance components found in the previous model were used to solve the Henderson normal equations.
Balanced data in a reproductive model
To calculate the CM, it is necessary to calculate the SC and GL for each source of variation, for this model and our data structure, we can calculate them using the formulas in Table 1.
$\begin{array}{l}G{L}_{s}=31=2\\ G{L}_{sexo}=21=1\\ G{L}_{total}=121=11\\ G{L}_{e}=1112=113=8\end{array}$
And since the design is balanced, the SCs are:
$\begin{array}{l}S{C}_{total}={36}^{2}+{35}^{2}+{33}^{2}+\dots +{35}^{2}\frac{{395}^{2}}{12}=152.916\\ S{C}_{s}=\frac{{\left(132\right)}^{2}+{\left(124\right)}^{2}+{\left(139\right)}^{2}}{4}\frac{{395}^{2}}{12}=28.166\\ S{C}_{sexo}=\frac{{\left(185\right)}^{2}+{\left(210\right)}^{2}}{6}\frac{{395}^{2}}{12}=52.083\\ S{C}_{e}=152.916\left(28.166+52.083\right)=72.667\end{array}$
And the CMs come from:
$\begin{array}{l}C{M}_{S}=\frac{28.166}{2}=14.083\\ C{M}_{e}=\frac{72.667}{8}=9.083\end{array}$
And from the CM we can calculate the variance components:
${\sigma}_{s}^{2}=\frac{14.0839.083}{2\left(2\right)}=1.25$
Therefore, ℎ^{2} using ANOVA is:
$\begin{array}{l}{h}^{2}=\frac{4\left(1.25\right)}{1.25+9.083}=0.483\\ {e}^{2}=\frac{10.335}{10.33}=0.515\end{array}$
And these ANOVA estimates, too, are REML, since the data structure is balanced and the 𝐶𝑀𝑠 > 𝐶𝑀𝑒. Now the calculation of the GVs, using the REML estimates, comes from the solutions of the normal Henderson equations, using the estimated value of the variance components, and calculating the value of 𝛼 we have that:
$\alpha =\frac{9.083}{1.25}=7.2664$
Therefore, the equations are:
$\left(\begin{array}{ccc}6& 0& \begin{array}{ccc}2& 2& 2\end{array}\\ 0& 6& \begin{array}{ccc}2& 2& 2\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}4+7.266& 0& 0\end{array}\\ \begin{array}{ccc}0& 4+7.266& 0\end{array}\\ \begin{array}{ccc}0& 0& 4+7.266\end{array}\end{array}\end{array}\right)\left[\begin{array}{c}\begin{array}{c}{b}_{1}\\ {b}_{2}\end{array}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]=\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]\to \left[\begin{array}{ccc}6& 0& \begin{array}{ccc}2& 2& 2\end{array}\\ 0& 6& \begin{array}{ccc}2& 2& 2\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}11.266& 0& 0\end{array}\\ \begin{array}{ccc}0& 11.266& 0\end{array}\\ \begin{array}{ccc}0& 0& 11.266\end{array}\end{array}\end{array}\right]\left[\begin{array}{c}{b}_{1}\\ {b}_{2}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]=\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]$
In the previous equations the value of 𝜇 was forced to be zero in order to break the linear dependence between the rows and columns of the coefficient matrix. The solution of this system of equations is given by:
$\left[\begin{array}{c}{b}_{1}\\ {b}_{2}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]={\left[\left[\begin{array}{ccc}6& 0& \begin{array}{ccc}2& 2& 2\end{array}\\ 0& 6& \begin{array}{ccc}2& 2& 2\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}11.266& 0& 0\end{array}\\ \begin{array}{ccc}0& 11.266& 0\end{array}\\ \begin{array}{ccc}0& 0& 11.266\end{array}\end{array}\end{array}\right]\right]}^{1}\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]=\left[\begin{array}{c}30.83\\ 35\\ 0.029\\ 0.680\\ 0.650\end{array}\right]$
ML estimates are:
$\begin{array}{l}{\sigma}_{s}^{2}=\frac{\frac{28.166}{3}8.073}{2\left(2\right)}=0.328\\ {\sigma}_{e}^{2}=\left[1\frac{21}{3\left(2\left(2\right)1\right)}\right]9.083=8.073\end{array}$
Y ℎ^{2}y 𝑒^{2}using ML is:
$\begin{array}{l}{h}^{2}=\frac{4\left(0.328\right)}{0.328+8.073}=0.156\\ {e}^{2}=\frac{8.4011.3212}{8.401}=0.842\end{array}$
And the equations using ML are:
$\left[\begin{array}{ccc}6& 0& \begin{array}{ccc}2& 2& 2\end{array}\\ 0& 6& \begin{array}{ccc}2& 2& 2\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}28.612& 0& 0\end{array}\\ \begin{array}{ccc}0& 28.612& 0\end{array}\\ \begin{array}{ccc}0& 0& 28.612\end{array}\end{array}\end{array}\right]\left[\begin{array}{c}{b}_{1}\\ {b}_{2}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]=\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]$
Therefore the solution is:
$\left[\begin{array}{c}{b}_{1}\\ {b}_{2}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]={\left[\begin{array}{ccc}6& 0& \begin{array}{ccc}2& 2& 2\end{array}\\ 0& 6& \begin{array}{ccc}2& 2& 2\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}28.612& 0& 0\end{array}\\ \begin{array}{ccc}0& 28.612& 0\end{array}\\ \begin{array}{ccc}0& 0& 28.612\end{array}\end{array}\end{array}\right]}^{1}\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]=\left[\begin{array}{c}30.83\\ 35\\ 0.011\\ 0.267\\ 0.256\end{array}\right]$
Introduction of the parentage matrix in a reproductive model
Now it is assumed that animal 1 is the father of animal 2, therefore, the equations take into account all the genealogy between males. First we have to calculate 𝐴^{−1}. Applying Henderson's rules,^{5} we have:
${A}^{1}=\left[\begin{array}{ccc}1+1/3& 2/3& 0\\ 2/3& 1/4& 0\\ 0& 0& 1\end{array}\right]=\left[\begin{array}{ccc}1/4& 2/3& 0\\ 2/3& 1/4& 0\\ 0& 0& 1\end{array}\right]$
Therefore,
${A}^{1}\alpha =\left[\begin{array}{ccc}\frac{1}{4}& \frac{2}{3}& 0\\ \frac{2}{3}& \frac{1}{4}& 0\\ 0& 0& 1\end{array}\right]\left(7.2664\right)=\left[\begin{array}{ccc}1.8166& 4.8442& 0\\ 4.8442& 1.8166& 0\\ 0& 0& 7.2664\end{array}\right]$
And adding the Z'Z matrix:
$Z\prime Z+{A}^{1}\alpha =\left[\begin{array}{ccc}4& 0& 0\\ 0& 4& 0\\ 0& 0& 4\end{array}\right]+\left[\begin{array}{ccc}1.8166& 4.8442& 0\\ 4.8442& 1.8166& 0\\ 0& 0& 7.2664\end{array}\right]=\left[\begin{array}{ccc}5.8166& 0.8442& 0\\ 0.8442& 5.8166& 0\\ 0& 0& 11.2664\end{array}\right]$
Therefore, the Henderson normal equations are:
$\left[\begin{array}{ccc}6& 0& \begin{array}{ccc}2.0000& 2.0000& 2.0000\end{array}\\ 0& 6& \begin{array}{ccc}2.0000& 2.0000& 2.0000\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}5.8166& 0.8442& 0.0000\end{array}\\ \begin{array}{ccc}.08442& 5.8166& 0.0000\end{array}\\ \begin{array}{ccc}0.0000& 0.0000& 11.2664\end{array}\end{array}\end{array}\right]\left[\begin{array}{c}{b}_{1}\\ {b}_{2}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]=\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]$
And the solution of these equations is:
$\left[\begin{array}{c}{b}_{1}\\ {b}_{2}\\ {s}_{1}\\ {s}_{2}\\ {s}_{3}\end{array}\right]={\left[\begin{array}{ccc}6& 0& \begin{array}{ccc}2.0000& 2.0000& 2.0000\end{array}\\ 0& 6& \begin{array}{ccc}2.0000& 2.0000& 2.0000\end{array}\\ \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}2\\ 2\\ 2\end{array}& \begin{array}{c}\begin{array}{ccc}5.8166& 0.8442& 0.0000\end{array}\\ \begin{array}{ccc}0.8442& 5.8166& 0.0000\end{array}\\ \begin{array}{ccc}0.0000& 0.0000& 11.2664\end{array}\end{array}\end{array}\right]}^{1}\left[\begin{array}{c}185\\ 210\\ 132\\ 124\\ 139\end{array}\right]=\left[\begin{array}{c}31.6285\\ 35.7952\\ 0.7765\\ 1.9776\\ 0.3685\end{array}\right]$
In this solution, the fixed effect (sex) is adjusted for the random example, and the random effect is adjusted for the fixed effect.
Unbalanced data in a reproductive model
Assuming the last missing data, the 𝑆𝐶𝑡𝑜𝑡𝑎𝑙 is:
$SCtotal=S{C}_{total}={36}^{2}+{35}^{2}+{33}^{2}+\dots +{29}^{2}\frac{{360}^{2}}{11}=148.1818182$
The GLs are:
$\begin{array}{l}G{L}_{s}=31=2\\ G{L}_{sexo}=21=1\\ G{L}_{total}=111=10\\ G{L}_{e}=1012=113=7\end{array}$
To calculate the ordinary least squares solutions for the reduced model (without the sire factor) must be estimated:
$b={\left[\begin{array}{ccc}11& 5& 6\\ 5& 5& 0\\ 6& 0& 6\end{array}\right]}^{1}\left[\begin{array}{c}360\\ 150\\ 0\end{array}\right]=\left[\begin{array}{ccc}0.1666& 0.1666& 0\\ 0.1666& 0.3666& 0\\ 0& 0& 0\end{array}\right]\left[\begin{array}{c}360\\ 150\\ 0\end{array}\right]=\left[\begin{array}{c}35\\ 5\\ 0\end{array}\right]$
Therefore, the 𝑆𝐶𝑠𝑒𝑥𝑜 for the reduced model is:
$S{C}_{sexo\left(reducido\right)}=\left[3550\right]\left[\begin{array}{c}360\\ 150\\ 210\end{array}\right]\frac{{360}^{2}}{11}=\left(35\right)\left(360\right)+\left(5\right)\left(150\right)\frac{{360}^{2}}{11}=68.1818$
To obtain the adjusted SCs, we calculate the SC for the full model and subtract from it, respectively (𝑟𝑒𝑑𝑢𝑐𝑖𝑑𝑜):
$SCs=SC\left(\mu ,s,sexo,\right)SC\left(\mu ,sexo\right)=83.681818268.1818=15.5$
And the 𝑆𝐶_{𝑒} is:
$S{C}_{e}=148.1818182\left(15.5+68.1818\right)=64.5$
Therefore, the CMs are:
$\begin{array}{l}C{M}_{s}=\frac{15.5}{2}=7.75\\ C{M}_{e}=\frac{64.5}{7}=9.214286\end{array}$
Finally, the variance components are:
$\begin{array}{l}{\sigma}_{s}^{2}=\frac{7.759.214286}{3.6}=0.4067\\ {\sigma}_{e}^{2}=9.214286\end{array}$
The component σ^{2}_{s} component is a negative estimate of the variance, because the 𝐶𝑀𝑠 < 𝐶𝑀𝑒is negative, therefore, the ML and REML estimates for σ^{2}_{s} are:
σ^{2}_{s }= 0
In other words, all the total variability is residual. Table 3 shows the ML and REML iterations for the calculation of the residual variance with the NewtonRapson method: As shown in Table 3, when an unbalanced database is used, the estimates for the variance components are different in ANOVA and REML, since with ANOVA we have σ^{2}_{e} = 9.2142 and with REML σ^{2}_{e} = 8.88. For this particular case, ℎ^{2 }= 0 because the variance of the numerator is zero, obviously to find a more credible estimate, one should increase the number of data used in the genetic evaluation or try another model and compare the AIC.
ML 
REML 

Iteración 
2ln(L) 
σ_{s}^{2}  σ_{e}^{2} 
iteración 
2ln() 
σ_{s}^{2}  σ_{e}^{2} 
1 
53.0420 
0 
7.2727 
1 
48.6053 
0 
8.88 
2 
53.0420 
0 
7.2727 
2 
48.6053 
0 
8.88 
Table 3 Iteration of variance components for ML and REML
Animal model
The solutions of the Henderson normal equations, using the values of σ^{2}_{e} = 9.083 y σ^{2}_{a }= 5 are presented in Table 4 Henderson's normal equations are not presented for this case due to its large dimensions.
Animal 
VG 
1 
0.422982 
2 
0.984574 
3 
1.10566 
4 
0.217214 
5 
0.809321 
6 
0.651756 
7 
0.273233 
8 
0.857664 
9 
2.32642 
10 
0.113062E01 
11 
1.33545 
12 
1.04324 
13 
0.117947 
14 
1.63535 
Table 4 Genetic values using an animal model
Genetic values and parameters can be estimated using ANOVA, ML and REML, but with the risk of estimating negative variance components using ANOVA or zero (or overestimated) heritabilities with likelihoodbased methods when the data structure or model is not correct. When the data structure is unbalanced, mathematical calculations with ANOVA, ML and REML are more complex and require computational algorithms with higher performance.
None.
The authors declared that there are no conflicts of interest.
©2024 Raúl, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work noncommercially.