Review Article Volume 2 Issue 4
Department of Biology, Madda Walabu University, Ethiopia
Correspondence: Temesgen Bedassa Gudeta, Department of Biology, Madda Walabu University, P.O. Box 247, Robe, Ethiopia, Tel 2519-1178-5364
Received: January 28, 2018 | Published: July 3, 2018
Citation: Gudeta TB. Molecular marker based genetic diversity in forest tree populations. Forest Res Eng Int J. 2018;2(4):176-182. DOI: 10.15406/freij.2018.02.00044
Information on the genetic diversity of threatened tree species in any region of the world may contribute to the creation of effective strategies for their preservation and future use. Molecular markers have proven to be invaluable tools for assessing the genetic resources of trees by improving our understanding of the distribution and extent of genetic variation within and among species. Recently developed marker technologies uncover the extent of genetic variation in an unprecedented way through increased coverage of the genome. Markers have diverse applications in plant sciences, but certain marker types, due to their inherent characteristics, have also shown their limitations. A combination of diverse marker types is usually recommended to provide an accurate assessment of the extent of intra– and inter–population genetic diversity of naturally distributed plant species, on the basis of which proper conservation directives for species at risk of decline can be issued. Here, natural populations of forest trees are reviewed by summarizing published reports on the status of genetic variation in the pure species. In general, for outbred forest tree species, the genetic diversity within populations is larger than among populations of the same species, indicative of a negligible local spatial structure. Additionally, as is the case for plants in general, the diversity at the phenotypic level is also much larger than at the marker level, as selectively neutral markers are commonly used to capture the extent of genetic variation. Increasingly, however, nucleotide diversity within candidate genes underlying adaptive traits is studied for signatures of selection at single sites. This adaptive genetic diversity constitutes important potential for future forest management and conservation purposes.
Keywords: forest trees, genetic diversity, molecular markers
Forest trees are largely undomesticated and highly heterozygous, due to their outcrossing breeding systems, and therefore have large effective population sizes.1 Despite the high number of known species, only approximately 450 forest tree species are actively part of a deliberate domestication process through tree improvement programs (FAO).2 Knowledge of the genetic diversity of threatened tree species in any region of the world may contribute to the creation of effective strategies for their preservation and future use. The majority of the world's forests are natural forests (93%), with 12% designated as conservation forests. A major concern regarding forest health and resilience is the decline in forest genetic diversity, documented as early as 1967 (FAO conference). Genetic diversity serves several important purposes: (a) it is a resource for tree breeding and improvement programs to develop well–adapted tree varieties and to enhance the genetic gain for a multitude of useful traits; (b) it ensures the vitality of forests as a whole through their capacity to withstand diverse biotic and abiotic stressors under changing and unpredictable environmental conditions; and (c) it supports the livelihoods of indigenous and local communities that use traditional knowledge. Rich genetic diversity within and among forest tree species thus provides an important basis for maintaining food security and enabling sustainable development (FAO).3
Historically, three major areas have been important for molecular marker applications in plant improvement: (a) the determination of genetic diversity within and among populations; (b) the verification and characterization of genotypes; and (c) marker–assisted selection (MAS).4 In particular, for forest trees, which are outcrossing and largely undomesticated, molecular markers have proven to be invaluable tools with applications in: (1) genetic conservation efforts through the identification of genetic diversity hotspots; (2) the assembly of breeding populations in newly developed and advanced breeding programs; (3) the monitoring and characterization of population dynamics and gene flow; (4) the proper delineation of species taxonomy for management issues associated with conservation; (5) the assessment of gene flow (pollen contamination) in seed orchards, the authentication of “controlled crossings”, the assessment of inbreeding in breeding programs and studies of mating systems in non–industrial tree species; and (6) genetic fingerprinting in advanced breeding programs for quality control, to detect misidentified ramets in production and breeding populations.4 Although tree breeding programs would benefit significantly from early selection of clones with advantageous trait characteristics (particularly important for late–expressing wood quality traits), MAS was deemed not feasible for forest trees with limited genetic marker coverage.5,6 The main reasons for the infeasibility of MAS as a tool for forest tree improvement are characteristics specific to forest trees as compared to inbred agricultural crop plants: the polygenic nature of most economically important traits in forestry, the inconsistency in quantitative trait locus (QTL)–marker linkages among families originating from large outcrossed breeding populations, and the instability of QTLs from the same genetic material planted across different sites, due to strong genotype–by–environment (G×E) interactions. As highly efficient next generation SNP (single nucleotide polymorphism) genotyping platforms have become available, genome–wide selection approaches have become feasible for accelerating forest tree breeding.7,8
The use of DNA markers in plant and animal breeding has opened a new field in agriculture called molecular breeding. These markers are widely used because they are highly prevalent and detectable at different stages of the organism.9 This review of genetic diversity in forest tree species begins with a brief historical retrospect on the development of the marker types that have been widely employed for studying genetic variability in plants in general. The first and most easily accessible plant characteristics are morphological markers, which can easily be monitored based on simple inheritance.9 However, due to serious drawbacks with respect to dominance, the difficulty of distinguishing between multiple alleles or even between different loci,10,11 and variation in trait expression due to environmental and developmental effects (G × E interaction), their use was substantially reduced with the advent of DNA marker technologies. Another marker type that played an important role in assessing genetic diversity in plants was isozymes.12,13 Isozymes have a long history in genetic variability studies in forestry, where they were used to assess the genetic diversity present within natural forest stands14,15 or to determine whether domestication practices had led to a reduction in diversity.16–18 However, these biochemical marker assays are affected by plant phenological stage and are limited in availability, and therefore they would never allow for a genome–wide scan of variability (only 0.1% of the total variation is detectable by this technique).19 DNA–based markers, such as restriction fragment length polymorphisms (RFLPs), offered an invaluable alternative.20–22 Finally, the possibility to rapidly amplify specific DNA fragments in vitro via the polymerase chain reaction (PCR)23 revolutionized the generation of molecular markers, leading to diverse sets of diagnostic DNA–marker systems with or without a priori sequence knowledge, such as random amplified polymorphic DNA (RAPD),24 amplified fragment length polymorphism (AFLP),25 simple sequence repeats (SSRs or microsatellites),26 single nucleotide polymorphisms (SNPs)27,28 and variations thereof.29,30 For example, the number of loci sufficient for the study of the genetic diversity of M. caesalpiniaefolia was estimated through bootstrap analysis (Figure 1). As the number of loci analyzed in re–sampling increased, the correlation values increased and the Kruskal stress values decreased.
Figure 1 Values of the Pearson correlation (r) and Kruskal stress (E) as a function of the number of ISSR loci used to estimate the genetic diversity of nine M. caesalpiniaefolia individuals.36
Important issues were related to the reproducibility of the RAPD marker system,31 to other limitations, such as the presence of null alleles in SSR assays, which may lead to an underestimation of heterozygosity,32 or the dominant nature of the RAPD and AFLP marker systems, where heterozygous individuals cannot be distinguished from homozygous ones, and lastly, to the inexpensive generation of a vast abundance of highly polymorphic DNA markers to tackle genome–wide genetic diversity studies. Depending on the study focus, genetic markers were derived from nuclear or organelle sequences; for example, chloroplast– or mitochondrial–derived diagnostic markers,33–35 depending on the evidence of their maternal inheritance in the species, were used to trace back the colonization history of angiosperm forest tree species and conifers, respectively.36,37 Although it has been known that variability within protein–coding regions is far lower than within non–coding genomic regions, due to lower mutation rates and purifying selection that maintains proper protein function, the study of polymorphic sites within coding sequences has been deemed more relevant because of their putative functional associations and, in addition, the ease of their interspecific transferability for comparative genetic studies based on sequence conservation.
Thus, a major focus in plant studies has been the development of genetic markers located predominantly within such coding regions for high–throughput analysis of many samples using the inexpensive detection of PCR fragment length polymorphisms (e.g., eco–tilling, which circumvents expensive Sanger resequencing of PCR products for SNP detection and genotyping),38 although such approaches still relied on laborious PCR optimizations.39–42 The substantial, almost exponential drop in whole genome sequencing costs, thanks to the 1000 Genomes Project, which stimulated the development of highly cost–efficient high–throughput technologies, has also provided the plant research community with unprecedented opportunities for affordable in–depth characterization of plant genomes, including the genome–wide discovery of SSRs and SNPs and the detection of common as well as rare functional variants by next generation sequencing.43–49
Figure 2 Examples of mapping populations and their relationship.51 AC, anther culture; BC, backcross population; BIL, backcross inbred line; DH, doubled haploid; IM, intermating; NIL, near–isogenic line; RIL, recombinant inbred line; TC, testcross; TTC, triple testcross.
Figure 3 Marker assisted pyramiding of two disease resistance genes. Note that homozygotes can be selected from the F2 population.9
A number of evolutionary processes can impact the genetic diversity of natural populations: (a) spontaneously arising mutations; (b) gene flow via migration; (c) inbreeding; (d) natural selection; (e) the Wahlund effect; and (f) random genetic drift.50 Genetic drift introduces random changes in allele frequencies over generations and becomes important in finite populations and/or over a large number of generations. These random allele frequency changes can, over time, lead to allele fixation or extinction. Genetic drift thus represents a source of differences in genetic diversity among populations. Gene flow, on the other hand, evens out among–population genetic differences but increases genetic variation within populations through the introduction of new alleles. Selection influences within–population diversity, but the effects depend on the nature of the selection process (e.g., balancing selection). Furthermore, the effects of natural selection are interwoven with stochastic effects, such as genetic drift. Mutations can counterbalance the loss of allelic diversity; however, natural mutations are rare, and mutations that turn out to be harmful allelic variants are removed again by purifying selection. A population bottleneck causes a significant reduction in the effective population size and represents a major reason for the loss of allelic diversity, first through the loss of rare alleles, then through the successive loss of heterozygosity in the population.50 Inbreeding and the presence of a subpopulation structure, where gene flow is prevented by habitat fragmentation (the Wahlund effect), both cause a loss of heterozygosity.50 This, in turn, results in increased genetic diversity among populations.
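The link between subpopulation structure and reduced heterozygosity can be illustrated with a small numerical sketch. The Python example below is illustrative only and is not taken from the reviewed literature: it assumes a single biallelic locus, equally sized subpopulations that are each in Hardy–Weinberg equilibrium, and hypothetical allele frequencies. Under these assumptions, the heterozygosity expected for the pooled population exceeds the mean heterozygosity within subpopulations by twice the variance of the allele frequency among subpopulations.

```python
def expected_heterozygosity(p):
    """Hardy-Weinberg expected heterozygosity (2pq) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def wahlund_deficit(subpop_freqs):
    """Compare pooled and within-subpopulation heterozygosity.

    subpop_freqs: frequency of one allele in each subpopulation.
    Assumption (illustrative): all subpopulations have equal size.
    """
    if not subpop_freqs:
        raise ValueError("at least one subpopulation is required")
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k
    # Heterozygosity expected if the pooled sample were one random-mating unit
    h_pooled = expected_heterozygosity(p_bar)
    # Mean heterozygosity actually expected within the subpopulations
    h_within = sum(expected_heterozygosity(p) for p in subpop_freqs) / k
    # Variance of the allele frequency among subpopulations
    variance = sum((p - p_bar) ** 2 for p in subpop_freqs) / k
    return {
        "h_pooled": h_pooled,
        "h_within": h_within,
        "deficit": h_pooled - h_within,
        "variance": variance,
    }


if __name__ == "__main__":
    # Hypothetical frequencies in two isolated forest fragments
    print(wahlund_deficit([0.2, 0.8]))
```

The larger the allele frequency differences among fragments, the larger the heterozygote deficit in the pooled sample, which is why the same process that lowers heterozygosity also raises among–population differentiation.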
Within–population genetic variation using genotype data
A gene is defined as polymorphic in the population when the frequency of its most common allele is below 95%.50 Genetic diversity can be assessed by estimating the following parameters: the total number of different alleles in the population, the percentage of polymorphic loci, the mean number of alleles per locus, the allelic richness, the within–population genetic diversity, θ, the effective population size, Ne (i.e., θ divided by the per–generation mutation rate), the minor allele frequency (in the case of biallelic loci), the proportion of heterozygous individuals in the population for a given locus, both expected (the expected heterozygosity, HE, based on Hardy–Weinberg expectations that assume random mating of genotypes) and observed (the observed heterozygosity, HO), and the fixation index, F.50 Genomic diversity is estimated by genome–wide assessment of genetic diversity using a larger, random sample of loci. An estimate of the genome–wide genetic diversity in a population is then derived by averaging heterozygosity over the studied loci.
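Several of the parameters listed above can be computed directly from diploid genotype data. The following Python sketch is a minimal illustration for a single locus, not a replacement for dedicated population genetics software. The function name, the input format (a list of allele pairs) and the example genotypes are hypothetical, and the expected heterozygosity is computed without a small–sample correction.

```python
import math
from collections import Counter


def locus_summary(genotypes):
    """Summarize one locus from diploid genotypes given as (allele, allele) pairs."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    total = 2 * n
    freqs = {a: c / total for a, c in allele_counts.items()}
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies
    h_exp = 1.0 - sum(f * f for f in freqs.values())
    # Observed heterozygosity: proportion of heterozygous individuals
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Fixation index F = 1 - HO/HE (undefined for a monomorphic locus)
    f_index = 1.0 - h_obs / h_exp if h_exp > 0 else math.nan
    return {
        "n_alleles": len(freqs),
        "allele_freqs": freqs,
        "h_exp": h_exp,
        "h_obs": h_obs,
        "fixation_index": f_index,
        # 95% criterion: the most common allele has a frequency below 0.95
        "polymorphic": max(freqs.values()) < 0.95,
    }


if __name__ == "__main__":
    # Hypothetical sample: 4 AA, 2 AB and 4 BB individuals
    sample = [("A", "A")] * 4 + [("A", "B")] * 2 + [("B", "B")] * 4
    print(locus_summary(sample))
```

In this hypothetical sample both alleles have a frequency of 0.5, so HE is 0.5, whereas only 2 of 10 individuals are heterozygous (HO of 0.2); the resulting positive F indicates a heterozygote deficit relative to random mating. A genome–wide estimate would average such per–locus values over many loci.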
Between–/among–population genetic variation using genotype data
Differences in genetic diversity between or among (sub–)populations are assessed based on the presence of significant allele frequency differences; widely applied metrics to estimate such “genetic differentiation” include, for example, FST,51–53 RST,54 ΦST (Φ′ST),55,56 GST (G′ST),57,58 DST,57 HST59 or D.60 Some measures are marker–dependent: they are based on the assumption of infinite–allele or stepwise mutation models, respectively, and depend on whether biallelic or multi–allelic molecular markers or haplotype data were used in the analysis (FST; RST; ΦST). Moreover, the use of fixation measures for interpreting genetic differentiation has been found to be problematic when the populations under study exhibit high genetic diversity/heterozygosity.58,60 For such cases, “standardized” genetic differentiation metrics have been suggested;56,58,60 but see also the publication on the topic by Whitlock et al.,61 who emphasized the continued use of FST for intra–specific differentiation estimation when the mutation rate is small (relative to gene flow), while recommending ΦST and RST when the mutation rate is high (as in the case of SSRs). In any case, freely available software packages within the R environment62 implement these statistics for the estimation of population divergence from genotypic data (cf. “mmod”). Furthermore, genetic loci with allelic frequencies significantly different among populations and potentially under selection (“FST outlier loci”) can be efficiently detected using multilocus scans that compare the patterns of nucleotide diversity and genetic differentiation (based on the distribution of empirical FST estimates conditioned on HE) to the simulated genome–wide selectively–neutral genetic background.63,64
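To make the relationship among some of these metrics concrete, the Python sketch below computes the mean within–population diversity (HS), the total diversity (HT), DST, Nei's GST and Jost's D from population allele frequencies at one locus. It is an illustrative sketch under simplifying assumptions, namely equal population weights and no correction for sampling bias, and it does not reproduce the interface of the R package “mmod” or any other software; the function name and the example frequencies are hypothetical.

```python
import math


def differentiation(pop_freqs):
    """Differentiation statistics for one locus.

    pop_freqs: list of dicts mapping allele -> frequency, one dict per population.
    Assumptions (illustrative): equal weights, frequencies known without error.
    """
    k = len(pop_freqs)
    if k < 2:
        raise ValueError("at least two populations are required")
    alleles = set().union(*pop_freqs)
    # HS: mean expected heterozygosity within populations
    h_s = sum(1.0 - sum(f * f for f in pop.values()) for pop in pop_freqs) / k
    # HT: expected heterozygosity of the pooled (total) population
    mean_freq = {a: sum(pop.get(a, 0.0) for pop in pop_freqs) / k for a in alleles}
    h_t = 1.0 - sum(f * f for f in mean_freq.values())
    d_st = h_t - h_s
    g_st = d_st / h_t if h_t > 0 else math.nan
    # Jost's D rescales the among-population component by (1 - HS)
    jost_d = (d_st / (1.0 - h_s)) * (k / (k - 1))
    return {"h_s": h_s, "h_t": h_t, "d_st": d_st, "g_st": g_st, "jost_d": jost_d}


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations
    pops = [{"A": 0.2, "B": 0.8}, {"A": 0.8, "B": 0.2}]
    print(differentiation(pops))
```

Because GST is bounded by HS, it cannot reach high values when within–population heterozygosity is high, which is the motivation for the standardized alternatives mentioned above.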
Sequence divergence using sequence alignment data
Additional ways to look at genetic diversity and to study mutation and selection events, within populations and by comparing different populations, involve the characterization of DNA sequences of genes, with the diversity of nucleotides as the specific study entity.65–68 Widely used statistics and tests include nucleotide diversity, π,50,69–72 and the McDonald–Kreitman and HKA (Hudson–Kreitman–Aguadé) tests.73,74 Such tests are implemented in the freely available software package, DnaSP.75 The combination of results from such analyses has particular value for identifying past population size changes (population expansion or population bottleneck).
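Nucleotide diversity can be understood as the average number of pairwise differences per site among the sampled sequences. The short Python sketch below illustrates this calculation for a small alignment. It is not the DnaSP implementation: it assumes equal–length aligned sequences without gaps or missing data, and the example sequences are invented for illustration.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching sites over all pairs of sequences
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical alignment of four 10-bp sequences
    alignment = [
        "ACGTACGTAC",
        "ACGTACGTAC",
        "ACGTACGAAC",
        "ACCTACGAAC",
    ]
    print(round(nucleotide_diversity(alignment), 4))
```

In this invented alignment the six sequence pairs differ at 7 sites in total, giving 7/(6 × 10), or roughly 0.117 differences per site. Real gene–level estimates in forest trees are typically one to two orders of magnitude lower.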
One of the first comprehensive reviews on genetic diversity in forest tree populations was published by Hamrick et al.76 This early work summarized results based on isozymes and is especially valuable because it compares long–lived forest trees with other plant life forms, comprising in total 662 different species with representatively high sample sizes for the analysis of the genetic diversity parameters. Long–lived, woody species showed the highest genetic diversity (including a significantly higher percentage of polymorphic loci and more alleles per locus) among all plant species. Specifically, the genetic diversity within populations was significantly higher (HE=0.15) than in all other plant life forms (HE<0.10). However, heterogeneity in genetic diversity exists among woody species taxa, and this is due to the different evolutionary histories of species. For example, species from smaller founder populations, small disjunct populations or those with past population bottlenecks generally show less genetic diversity. Alseis blackiana, Picea glauca, Robinia pseudoacacia and Pinus sylvestris showed high diversity. At the other end of the spectrum were Acacia mangium, Pinus resinosa, P. torreyana and Populus balsamea with very low diversity.76 Other studies77,78 identified additional species with low intra–population diversity: Ficus carica and Thuja plicata.
While most studies identified high intra–population variation, the diversity among populations of long–lived, woody tree species based on the GST estimate was, by contrast, significantly the lowest (GST=0.08) compared to the herbaceous and annual life forms (GST>0.25).76 When woody angiosperms were compared to gymnosperms in terms of their intra–population genetic diversity, differences were not significant, yet the latter exhibited a significantly higher percentage of polymorphic allozyme loci, suggestive of a higher proportion of low frequency alleles in gymnosperm species.76 Angiosperm species showed higher among–population genetic diversity (GST). Recent research on conifer genome evolution, which involved alignments of thousands of orthologous coding sequences for gymnosperms and angiosperms, respectively, showed more specifically an overrepresentation of non–synonymous substitutions in protein–coding genes for conifers compared to angiosperms,79 while the average synonymous mutation rate in angiosperms is significantly higher, suggestive of a higher number of fixed adaptive mutations in conifers. As expected, the extent of the geographical range had a significant impact on genetic diversity within species and among populations.76 Geographically widespread species showed significantly higher intra–population genetic diversity than locally confined species, but the latter showed higher genetic diversity among populations.76 However, the “non–significant” inter–population differentiation sometimes reported in these isozyme studies (see above) can misdirect conservation efforts.
Other marker types that are able to cover a higher portion of the overall genetic variation (such as restriction fragment length polymorphisms of DNA) succeeded in uncovering significant among–population diversity in Pinus and Quercus, specifically with the application of organellar DNA markers.80,81 Differing outcomes of isozyme and organellar DNA studies on population divergence are frequent and were even reported for the same samples, as in Argania spinosa (L.) Skeels, an important multi–purpose tree for local communities in Morocco.82 It is also clear that variation at selectively neutral molecular markers commonly used to assess genetic diversity within or among populations may not covary with the phenotypic expression of a particular qualitative or quantitative trait of interest,29 such that population differentiation for adaptive traits (growth, morphology or fitness) is much higher than for isozymes, for example. In any case, total allelic richness was identified as a more adequate criterion than the HE estimate for conservation purposes, and highly polymorphic marker types, such as SSRs or DNA sequence–based data, are required for an accurate estimate.82 A recent study integrating molecular genetic analysis based on four SSR and five sequence loci with climate modeling83 forecasted the long–term decline of the late–successional Australian rainforest conifer, Podocarpus elatus, in its southern populations, due to habitat fragmentation (and the decline in Ne), for which conservation strategies are now being called for.
Isozyme markers (15 loci) were used to characterize the genetic diversity of Carapa procera, which occurs at low density within a tropical rain forest.15 Its characteristics were high within–population diversity (comparable to temperate gymnosperms), high heterozygosity and a lack of spatial structure, consistent with the highly outcrossing nature of the species, which leads to extensive pollen–mediated gene flow that prevents local genetic differentiation. When 63 SNPs (surveyed by eco–tilling) in nine different genes with broad functional properties were targeted to understand DNA variation in a small sample panel of western black cottonwood (Populus trichocarpa) representing 41 wild populations,40 it was found that heterozygosity was high (HO=0.47) and that overall nucleotide diversity at the gene level among populations was low (π=0.0018). Similarly low average π values of the segregating sites were obtained for other forest tree species, such as Populus nigra (π=0.0024)84 and Pinus sylvestris (π=0.0025).85 Much higher overall nucleotide diversity levels in a conifer were uncovered for Pinus taeda (π=0.00398).86 Among the studied poplars, interestingly, the European species, Populus tremula, showed the highest nucleotide diversity (π=0.007 or even π=0.0111, depending on the surveyed genes),87,88 but differences in diversity were also consistent with its different and complex demographic history. However, nucleotide diversity is best interpreted on a gene–by–gene basis, as population history and selection affect diversity estimates in a gene–specific manner.40,89 In a similar context, assessing the adaptive genetic diversity in forest trees is important to harness this adaptive potential for future forest management and conservation purposes.90 Candidate genes underlying a specific trait of interest are typically selected (cf. nine candidate genes for bud burst in Quercus petraea: π=0.00615;91 121 candidate genes for cold hardiness in Pseudotsuga menziesii var. menziesii: π=0.004;92 13 candidate genes for drought stress in Pinus pinaster: π=0.00548).64 While most of this detected variation was largely attributed to purifying selection (an excess of nucleotide diversity at synonymous vs. non–synonymous sites), as commonly observed in forest trees, patterns of strong diversifying selection in candidate genes were also uncovered.64
This review has focused on the value of integrating knowledge on adaptive complex traits as a companion to molecular markers for making informed management and conservation decisions. Integrative approaches using future climate modeling have been very successful in uncovering potential threats of decline in the genetic diversity and distribution of forest tree species, so that timely precautions to preserve the species can be undertaken. With the substantial drop in whole genome sequencing costs making the sequencing of genetically complex organisms more affordable, inventorying the complete portfolio of genetic resources has become feasible. This will also open new avenues for the conservation of previously marginalized and undervalued forest tree species that are considered of less economic value but nevertheless represent value to local ecosystems. While the present review focused primarily on the genetic diversity assessed for pure species, the importance of investigating natural species hybrid zones as important sources of population genetic diversity in forest tree management is also stressed. The potential of molecular marker applications for the management of forest gene conservation is summarized as follows:
Therefore, how to manage, which and how many materials to manipulate, and where to establish or protect gene resources all depend on whether we really know the genetic background of the particular species that we want to conserve.
None.
The author declares that there is no conflict of interest.
©2018 Gudeta. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.