Molecular marker based genetic diversity in forest tree populations

doi:10.15406/freij.2018.02.00044

eISSN: 2577-8307

Forestry Research and Engineering: International Journal

Review Article Volume 2 Issue 4

Temesgen Bedassa Gudeta

Department of Biology, Madda Walabu University, Ethiopia

Correspondence: Temesgen Bedassa Gudeta, Department of Biology, Madda Walabu University, P.O. Box 247, Robe, Ethiopia, Tel 2519-1178-5364

Received: January 28, 2018 | Published: July 3, 2018

Citation: Gudeta TB. Molecular marker based genetic diversity in forest tree populations. Forest Res Eng Int J. 2018;2(4):176-182. DOI: 10.15406/freij.2018.02.00044

Abstract

Knowledge of the genetic diversity of threatened tree species in any region of the world may contribute to the creation of effective strategies for their preservation and future use. Molecular markers have proven to be invaluable tools for assessing the genetic resources of trees, improving our understanding of the distribution and extent of genetic variation within and among species. Recently developed marker technologies uncover the extent of genetic variation in an unprecedented way through increased coverage of the genome. Markers have diverse applications in plant sciences, but certain marker types, due to their inherent characteristics, have also shown their limitations. A combination of diverse marker types is therefore usually recommended to provide an accurate assessment of the extent of intra– and inter–population genetic diversity of naturally distributed plant species, on which proper conservation directives for species at risk of decline can be based. Here, natural populations of forest trees are reviewed by summarizing published reports on the status of genetic variation in pure species. In general, for outbred forest tree species, the genetic diversity within populations is larger than among populations of the same species, indicative of a negligible local spatial structure. Additionally, as is the case for plants in general, the diversity at the phenotypic level is much larger than at the marker level, as selectively neutral markers are commonly used to capture the extent of genetic variation. Increasingly, however, nucleotide diversity within candidate genes underlying adaptive traits is studied for signatures of selection at single sites. This adaptive genetic diversity constitutes important potential for future forest management and conservation purposes.

Keywords: forest trees, genetic diversity, molecular markers

Introduction

Forest trees are largely undomesticated and highly heterozygous, due to their outcrossing breeding systems and, therefore, have large effective population sizes.¹ Despite the high number of known species, only approximately 450 different forest tree species are actively part of a deliberate domestication process through tree improvement programs (FAO).² Knowledge of the genetic diversity of threatened tree species in any region of the world may contribute to the creation of effective strategies for their preservation and future use. The majority of the world's forests are natural forests (93%), and 12% are designated as conservation forests. A major concern regarding forest health and resilience is the decline in forest genetic diversity, documented as early as 1967 (FAO conference). Genetic diversity serves several important purposes: (a) as a resource for tree breeding and improvement programs to develop well–adapted tree varieties and to enhance the genetic gain for a multitude of useful traits; (b) to ensure the vitality of forests as a whole through their capacity to withstand diverse biotic and abiotic stressors under changing and unpredictable environmental conditions; and (c) to sustain the livelihoods of indigenous and local communities that use traditional knowledge. Rich genetic diversity within and among forest tree species thus provides an important basis for maintaining food security and enabling sustainable development (FAO).³

Historically, for plant improvement, three major areas have always been important for molecular marker applications: (a) the determination of genetic diversity within and among populations; (b) verification and characterization of genotypes; and (c) marker–assisted selection (MAS).⁴ In particular, for forest trees, which are outcrossing and largely undomesticated plant species, molecular markers have proven to be invaluable tools with applications in: (1) genetic conservation efforts by identification of genetic diversity hotspots; (2) the assembly of breeding populations in newly developed and advanced breeding programs; (3) the monitoring and characterization of population dynamics and gene flow; (4) the proper delineation of species taxonomy for management issues associated with conservation; (5) assessment of gene flow (pollen contamination) in seed orchards and the authentication of “controlled crossings”, the assessment of inbreeding occurrence in breeding programs and studies of mating systems in non–industrial tree species; and (6) genetic fingerprinting in advanced breeding programs for the purpose of quality control to detect misidentified ramets in production and breeding populations.⁴ Although tree breeding programs would significantly benefit from an early selection of clones with advantageous trait characteristics (particularly important for late–expressing wood quality traits), MAS was deemed not feasible for forest trees with limited genetic marker coverage.^5,6 The main reasons for the infeasibility of MAS as a tool for forest tree improvement are characteristics specific to forest trees as compared to inbred agricultural crop plants: the polygenic nature of most of the economically important traits in forestry, the inconsistency in quantitative trait locus (QTL) marker linkages among families originating from large outcrossed breeding populations, and the instability of QTLs from the same genetic material planted across different sites, due to strong genotype–by–environment (G×E) interactions. As highly efficient next generation SNP (single nucleotide polymorphism) genotyping platforms have become available, genome–wide selection approaches have become feasible for accelerating forest tree breeding.^7,8

Types of molecular markers and their applications

The use of DNA markers in plant and animal breeding has opened a new field in agriculture called molecular breeding. These markers are widely used because of their abundance and their detectability at different developmental stages of the organism.⁹ This review of genetic diversity in forest tree species begins with a brief historical retrospect on the development of the marker types that have been widely employed for studying genetic variability in plants in general. The first, and most easily accessible, plant characteristics used were morphological markers, which can easily be monitored based on simple inheritance.⁹ However, due to serious drawbacks with respect to dominance, the difficulty of distinguishing between multiple alleles or even between different loci,^10,11 and variation in trait expression due to environmental and developmental factors (G × E interaction), their use was substantially reduced with the advent of DNA marker technologies. Another marker type that played an important role in assessing genetic diversity in plants was isozymes.^12,13 Isozymes had a long history in genetic variability studies in forestry, where they were used to assess the genetic diversity present within natural forest stands^14,15 or to determine whether domestication practices had led to a reduction in diversity.^16–18 However, these biochemical marker assays are affected by plant phenological stage and are limited in number, and therefore, they would never allow for a genome–wide scan of variability (as only 0.1% of the total variation is detectable by this technique).¹⁹ DNA–based markers, such as restriction fragment length polymorphisms (RFLPs), offered an invaluable alternative.^20–22 Finally, the possibility to rapidly amplify specific DNA fragments in vitro via polymerase chain reaction (PCR)²³ revolutionized the generation of molecular markers, leading to diverse sets of diagnostic DNA–marker systems with or without a priori sequence knowledge, such as random amplified polymorphic DNA (RAPD),²⁴ amplified fragment length polymorphism (AFLP),²⁵ simple sequence repeats (SSRs or microsatellites),²⁶ single nucleotide polymorphisms (SNPs)^27,28 and variations thereof.^29,30 For example, the number of ISSR loci sufficient for studying the genetic diversity of M. caesalpiniaefolia was estimated by bootstrap analysis (Figure 1): as the number of loci analyzed in re–sampling increased, the correlation values increased and the Kruskal stress values decreased.

Figure 1 Values of the Pearson correlation (r) and Kruskal stress (E) as a function of the number of ISSR loci used to estimate the genetic diversity of nine M. caesalpiniaefolia individuals.³⁶

Important issues remain: the reproducibility of the RAPD marker system;³¹ the presence of null alleles in SSR assays, which may lead to underestimation of heterozygosity;³² the dominant nature of the RAPD and AFLP marker systems, where heterozygous individuals cannot be distinguished from individuals homozygous for the dominant allele; and, lastly, the need for inexpensive generation of a vast abundance of highly polymorphic DNA markers to tackle genome–wide genetic diversity studies. Depending on the study focus, genetic markers were derived from nuclear or organelle sequences; for example, chloroplast– or mitochondrial–derived diagnostic markers,^33–35 depending on the evidence of their maternal inheritance in the species, were used to trace back the colonization history of angiosperm forest tree species and conifers, respectively.^36,37 Although it has been known that variability within protein–coding regions is far lower than within non–coding genomic regions, due to lower mutation rates and purifying selection to maintain proper protein functions, the study of polymorphic sites within coding sequences has been deemed more relevant because of their putative functional associations and, in addition, the ease of their interspecific transferability for comparative genetic studies based on sequence conservation.
Thus, a major focus in plant studies has been the development of genetic markers prevalently present within such coding regions for high–throughput analysis of many samples using the inexpensive detection method of PCR fragment length polymorphisms (e.g., eco–tilling to circumvent expensive Sanger resequencing of PCR products, as in the case of SNP detection and genotyping),³⁸ but that still relied on laborious PCR optimizations.^39–42 The substantial and almost exponential drop in whole genome sequencing costs, thanks to the 1000 Human Genome Project, which has stimulated the development of highly cost–efficient high–throughput technologies, has also provided for the plant research community unprecedented opportunities for affordable in–depth characterization of plant genomes that has involved the genome–wide discovery of SSRs and SNPs and the detection of common, as well as rare functional variants by next generation sequencing.^43–49

Advantages and applications of molecular markers in plants

Time saving: genomic DNA can be isolated from any part of the plant tissue at every stage of its development and target trait information can be obtained with linked DNA markers before pollination, thus allowing breeders to carry out more informed genetic crosses.^9,50
Stability and reliability: phenotypic evaluation of genetic traits is often complicated by environmental factors. However, DNA markers are mostly neutral to environmental variation. The breeder can therefore evaluate material independently of the environmental conditions, which may be favourable or unfavourable for the expression of morphological and/or biochemical markers.⁹
Biosafety: diagnostic tests for the presence or absence of traits for disease resistance can be conducted by DNA markers tightly linked to the target gene without resorting to pathogen inoculation in the field or greenhouse. Molecular markers also facilitate introgression of genes into elite cultivars in advance of the occurrence of certain races of diseases or biotypes of insects.
Performance: evaluation of breeding lines in early generations of the breeding process with DNA markers can allow breeders to reject progenies from the programme and improve the genetic quality of breeding materials.
Precise selection of complex traits: polygenic traits are often difficult to select for using conventional breeding approaches. DNA markers linked to quantitative trait loci (QTL) allow them to be treated as single Mendelian factors. Besides analyzing and selecting traits of interest, molecular markers also allow researchers to analyze wild species of potential interest for the breeding program.⁵¹ The construction of genetic linkage maps using molecular markers follows certain steps: (a) selecting the molecular markers and genotyping system; (b) selecting parental lines from germplasm collections that are highly polymorphic at the marker loci; (c) creating populations, or lines derived from these populations, in which a large number of molecular markers segregate (Figure 2); and (d) genotyping each individual/line with the molecular markers and building linkage maps from the marker information.

Figure 2 Examples of mapping populations and their relationship.⁵¹ AC, anther culture; BC, backcross population; BIL, backcross inbred line; DH, double haploid; IM, intermating; NIL, near–isogenic line; RIL, recombinant inbred line; TC, testcross; TTC, triple testcross.

Figure 3 Marker assisted pyramiding of two disease resistance genes. Note that homozygotes can be selected from the F₂ population.⁹

Marker–assisted pyramiding: pyramiding is the simultaneous integration of multiple genes/QTLs into a single genotype. Its most widespread application is the integration of several disease resistance genes (i.e., qualitative resistance genes) into a single genotype. The motivation is to develop "durable" or stable resistance to a disease, because pathogens usually overcome single–gene resistance over time through the emergence of new strains. The main advantage of molecular markers in gene pyramiding is their ability to detect multiple genes in plants whose phenotypic effects are difficult to separate.^9,51
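As a rough illustration of why pyramiding becomes demanding as more genes are combined, the sketch below computes the expected share of F₂ plants that are homozygous for the target allele at every locus. It assumes unlinked loci with simple Mendelian segregation (an assumption made here for illustration, not a statement from the cited sources); the function names are hypothetical.

```python
import math
from fractions import Fraction


def f2_homozygote_fraction(n_genes: int) -> Fraction:
    """Expected fraction of F2 plants homozygous for the target allele
    at all n_genes loci, assuming unlinked loci (1/4 per locus)."""
    if n_genes < 1:
        raise ValueError("n_genes must be at least 1")
    return Fraction(1, 4) ** n_genes


def min_f2_population(n_genes: int, prob: float = 0.95) -> int:
    """Smallest F2 population size that yields at least one fully
    homozygous plant with probability >= prob (same assumptions)."""
    p = float(f2_homozygote_fraction(n_genes))
    # Solve 1 - (1 - p)^N >= prob for the smallest integer N.
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - p))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f2_homozygote_fraction(n), min_f2_population(n))
```

Under these assumptions only 1 in 16 F₂ plants carries both resistance genes in the homozygous state, which is where co-dominant markers help: they identify those plants directly instead of relying on progeny testing.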

Evaluation of genetic diversity

A number of evolutionary processes can impact the genetic diversity of natural populations: (a) spontaneously arising mutations; (b) gene flow via migration; (c) inbreeding; (d) natural selection; (e) the Wahlund effect; and (f) random genetic drift.⁵⁰ Genetic drift introduces random changes in allele frequencies over generations and becomes important in small (finite) populations and/or over a large number of generations. These random allele frequency changes can, over time, lead to allele fixation or extinction. Genetic drift therefore represents a source of differences in genetic diversity among populations. Gene flow, on the other hand, evens out among–population genetic differences, but increases genetic variation within populations through the introduction of new alleles. Selection influences within–population diversity, but its effects depend on the nature of the selection process (e.g., balancing selection). Furthermore, the effects of natural selection are interwoven with stochastic effects, such as genetic drift. Mutations can counterbalance the loss of allelic diversity; however, natural mutations are rare, and mutations that turn out to be harmful allelic variants are removed again by purifying selection. A population bottleneck causes a significant reduction in the effective population size and represents a major reason for the loss of allelic diversity, first through the loss of rare alleles, then through the successive loss of heterozygosity in the population.⁵⁰ Inbreeding and the presence of a subpopulation structure, where gene flow is prevented by habitat fragmentation (the Wahlund effect), both cause a loss in heterozygosity.⁵⁰ This, in turn, results in increased genetic diversity among populations.
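Two of these processes can be made concrete with a small numerical sketch. It uses the textbook Wright–Fisher expectation for the decay of heterozygosity under drift, H_t = H_0 (1 − 1/(2Ne))^t, and the heterozygote deficit produced by pooling subpopulations with different allele frequencies (the Wahlund effect). The idealized assumptions (diploid, biallelic locus, equally sized subpopulations, no mutation, migration or selection) and the function names are introduced here for illustration only.

```python
def heterozygosity_after_drift(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after `generations` of pure drift in an
    idealized diploid population of effective size `ne`."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


def wahlund_deficit(subpop_freqs: list[float]) -> float:
    """Heterozygote deficit when equally sized subpopulations with allele
    frequencies `subpop_freqs` (biallelic locus) are treated as one
    population: pooled expectation minus mean within-subpopulation value.
    Equals twice the variance of the allele frequency among subpopulations."""
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_pooled = 2.0 * p_bar * (1.0 - p_bar)
    h_within = sum(2.0 * p * (1.0 - p) for p in subpop_freqs) / n
    return h_pooled - h_within


if __name__ == "__main__":
    # Drift erodes heterozygosity much faster in a small population.
    for ne in (25, 500):
        print(ne, round(heterozygosity_after_drift(0.5, ne, 50), 4))
    # Two fragments that have diverged to p = 0.2 and p = 0.8.
    print(round(wahlund_deficit([0.2, 0.8]), 4))
```

The second function shows why fragmentation lowers heterozygosity relative to Hardy–Weinberg expectations for the pooled sample while raising differentiation among the fragments.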

Within–population genetic variation using genotype data

A gene is defined as polymorphic in the population when its most common allele has a frequency below 95%.⁵⁰ Genetic diversity can be assessed by estimating the following parameters: the total number of different alleles in the population; the percentage of polymorphic loci; the mean number of alleles per locus; the allelic richness; the within–population genetic diversity, $θ$; the effective population size, Ne (obtained from $θ$ and the per–generation mutation rate, μ; for diploid nuclear loci, $θ$ = 4Neμ); the minor allele frequency (as in the case of biallelic loci); the proportion of heterozygous individuals expected in the population for a given locus (the expected heterozygosity, HE, based on Hardy–Weinberg expectations that assume random mating of genotypes); the observed heterozygosity (HO); and the fixation index, F.⁵⁰ Genomic diversity is estimated by genome–wide assessment of genetic diversity using a larger sample of randomly chosen loci. An estimate of the genome–wide genetic diversity in a population is then derived by averaging heterozygosity over the multitude of studied loci.
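Several of these parameters can be computed directly from diploid genotype calls. The following minimal sketch handles a single locus and uses the plain (uncorrected) estimators HE = 1 − Σp², HO = share of heterozygous individuals, and F = 1 − HO/HE, together with the 95% polymorphism criterion stated above. The data layout (a list of allele pairs) and the function name are assumptions made for illustration; dedicated population genetics software additionally applies small-sample corrections.

```python
from collections import Counter


def locus_summary(genotypes, threshold=0.95):
    """Summarize one locus from diploid genotypes given as (allele, allele)
    tuples. Returns allele count, polymorphism flag, HE, HO and F."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    freqs = {allele: c / total for allele, c in counts.items()}

    # Expected heterozygosity under Hardy-Weinberg proportions.
    he = 1.0 - sum(f * f for f in freqs.values())
    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # Fixation index; undefined for a monomorphic locus.
    f = 1.0 - ho / he if he > 0 else float("nan")

    return {
        "n_alleles": len(freqs),
        "polymorphic": max(freqs.values()) < threshold,
        "HE": he,
        "HO": ho,
        "F": f,
    }


if __name__ == "__main__":
    sample = [("A", "A")] * 4 + [("A", "B")] * 4 + [("B", "B")] * 2
    print(locus_summary(sample))
```

Averaging HE over many such loci gives the genome-wide estimate described above; a positive F indicates fewer heterozygotes than expected under random mating.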

Between–/among–population genetic variation using genotype data

Differences in the genetic diversity between/among (sub–)populations are assessed based on the presence of significant allele frequency differences; widely applied metrics to estimate such “genetic differentiation” include, for example, FST^51,52, $θ$⁵³, RST⁵⁴, ΦST (Φ′ST)^55,56, GST (G′ST)^57,58, DST⁵⁷, HST⁵⁹ or D.⁶⁰ Some measures are marker–dependent: they are based on the assumption of infinite–allele or stepwise mutation models, respectively, and depend on whether biallelic or multi–allelic molecular markers or haplotype data were used in the analysis (FST; RST; ΦST). Moreover, the use of fixation measures to interpret genetic differentiation has been found to be problematic when the populations under study exhibit high genetic diversity/heterozygosity.^58,60 For such cases, “standardized” genetic differentiation metrics have been suggested;^56,58,60 but see also the publication on the topic by Whitlock et al.,⁶¹ who emphasized the continued use of FST for intra–specific differentiation estimation when the mutation rate is small (relative to gene flow), and the use of ΦST and RST when the mutation rate is high (as in the case of SSRs). In any case, for the estimation of population divergence from genotypic data, software packages that implement these statistics are freely available within the R environment⁶² (cf. “mmod”). Furthermore, genetic loci with allelic frequencies significantly different among populations and potentially under selection (“FST outlier loci”) can be efficiently detected using multilocus scans that compare the patterns of nucleotide diversity and genetic differentiation (based on the distribution of empirical FST estimates conditioned on HE) to the simulated genome–wide selectively–neutral genetic background.^63,64
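To show how three of these metrics relate to one another, the sketch below partitions gene diversity at a single locus following Nei's definitions (HS, HT, DST = HT − HS, GST = DST/HT) and also reports Jost's D, [(HT − HS)/(1 − HS)]·[n/(n − 1)]. It assumes known allele frequencies and equally weighted populations, applies no sampling-bias corrections, and is not the implementation used by any particular R package; the function name and input layout are hypothetical.

```python
def diversity_partition(pop_freqs):
    """Partition gene diversity for one locus.

    pop_freqs: list of dicts mapping allele -> frequency, one per population.
    Returns HS, HT, DST, Nei's GST and Jost's D (uncorrected estimators).
    """
    n = len(pop_freqs)
    if n < 2:
        raise ValueError("at least two populations are required")
    alleles = set().union(*pop_freqs)

    # Mean within-population expected heterozygosity.
    hs = sum(1.0 - sum(f * f for f in pop.values()) for pop in pop_freqs) / n
    # Total expected heterozygosity from the mean allele frequencies.
    mean_freq = {a: sum(pop.get(a, 0.0) for pop in pop_freqs) / n for a in alleles}
    ht = 1.0 - sum(f * f for f in mean_freq.values())

    dst = ht - hs
    gst = dst / ht if ht > 0 else float("nan")
    jost_d = (dst / (1.0 - hs)) * (n / (n - 1)) if hs < 1 else float("nan")
    return {"HS": hs, "HT": ht, "DST": dst, "GST": gst, "D": jost_d}


if __name__ == "__main__":
    pops = [{"A": 0.2, "B": 0.8}, {"A": 0.8, "B": 0.2}]
    print(diversity_partition(pops))
```

Because GST is bounded above by 1 − HS, it stays low for highly polymorphic markers even when populations share few alleles, which is the motivation for the standardized measures mentioned above.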

Sequence divergence using sequence alignment data

Additional ways to examine genetic diversity and to study mutation and selection events, both within populations and by comparing different populations, involve the characterization of DNA sequences of genes, with nucleotide diversity as the specific study entity.^65–68 Widely used measures and tests include nucleotide diversity, $π$,^50,69–72 and the McDonald–Kreitman and HKA (Hudson–Kreitman–Aguade) tests,^73,74 respectively. Such tests are implemented in the freely available software package DnaSP.⁷⁵ The combination of results from such analyses has particular value for identifying past population size changes (population expansion or population bottleneck).
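Nucleotide diversity is simple to compute from an alignment: it is the average number of nucleotide differences per site between all pairs of sequences in the sample. The sketch below assumes equal-length, gap-free aligned sequences and is only an illustration of the definition; it is not the DnaSP implementation, and the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for aligned sequences
    of equal length; gaps and missing data are not handled."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / len(pairs) / length


if __name__ == "__main__":
    alignment = ["ATGCA", "ATGCA", "ATGTA", "CTGTA"]
    print(round(nucleotide_diversity(alignment), 4))
```

For real candidate-gene data, π is usually reported separately for synonymous and non-synonymous sites, which requires a codon-aware tool rather than this site-agnostic count.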

Diversity of forest tree population

One of the first comprehensive reviews on genetic diversity with regards to forest tree populations was published by Hamrick et al.⁷⁶ This early work summarized results based on isozymes and is especially valuable, as it compares long–lived forest trees with other life forms of plant species, in total comprising 662 different species with representatively high sample sizes for the analysis of the genetic diversity parameters. Long–lived, woody species showed the highest genetic diversity (including a significantly higher percentage of polymorphic loci and more alleles per locus) among all plant species. Specifically, the genetic diversity within populations was significantly the highest (HE=0.15) compared to all other plant life forms (HE<0.10). However, heterogeneity in genetic diversity exists among woody species taxa and this is due to the different evolutionary histories of species. For example, species from smaller founder populations, small disjunct populations or those with past population bottlenecks show generally less genetic diversity. Alseis blackiana, Picea glauca, Robinia pseudoacacia and Pinus sylvestris showed high diversity. On the other side of the spectrum were Acacia mangium, Pinus resinosa, P. torreyana and Populus balsamea with very low diversity.⁷⁶ Other studies^77,78 identified additional species with low intra–population diversity: Ficus carica and Thuja plicata.

While most studies identified high intra–population variation, by contrast, the diversity among populations of long–lived, woody tree species based on the GST estimate was significantly the lowest (GST =0.08) compared to the herbaceous and annual life forms (GST>0.25).⁷⁶ When woody angiosperms were compared to gymnosperms in terms of their intra–population genetic diversity, differences were not significant, yet the latter exhibited a significantly higher percentage of polymorphic allozyme loci, suggestive of a higher proportion of low frequency alleles in gymnosperm species.⁷⁶ Angiosperm species showed higher among–population genetic diversity (GST). Recent research on the conifer genome evolution, which involved orthologous coding sequence alignments for thousands of gymnosperms and angiosperm orthologous coding sequences, respectively, showed, more specifically, an overrepresentation of non–synonymous substitutions in protein–coding genes for conifers compared to angiosperms,⁷⁹ while the average synonymous mutation rate in angiosperms is significantly higher, suggestive of a higher number of fixed adaptive mutations in conifers. As expected, the extent of the geographical range had a significant impact on genetic diversity within species and among populations.⁷⁶ Geographically widespread species showed a significantly higher intra–population genetic diversity estimate compared to locally confined species, but the latter showed higher genetic diversity among populations.⁷⁶ However, the “non–significant” inter–population differentiation sometimes reported in these isozyme studies (see above) can mislead the directions of conservation efforts. 
Other marker types, those that are able to cover a higher portion of the overall genetic variation (such as restriction fragment length polymorphisms of DNA) succeeded in uncovering significant among–population diversity in Pinus and Quercus, specifically with the application of organellar DNA markers.^80,81 Differing outcomes for isozymes and organellar DNA studies on population divergence are frequent and were even reported within the same sample as for Argania spinosa (L.) Skeels, an important multi–purpose tree in the Moroccan local community.⁸² It is also clear that variation at selectively neutral molecular markers commonly used to assess genetic diversity within or among populations may not covary with the phenotypic expression of a particular qualitative or quantitative trait of interest,²⁹ such that population differentiation for adaptive traits (growth, morphology or fitness) is much higher than for isozymes, for example. In any case, the total allelic richness was identified as a more adequate directive than the HE estimate for conservation purposes, and marker types, such as SSRs or DNA sequence–based data, that are highly polymorphic are required for an accurate estimate.⁸² A recent study integrating molecular genetic analysis based on four SSR and five sequence loci along with climate modeling⁸³ forecasted the long–term decline of the late–successional Australian rainforest conifer, Podocarpus elatus, in its southern populations, due to habitat fragmentation (and the decline in Ne), for which conservation strategies are now invoked. 
Isozyme markers (15 loci) were used to characterize the genetic diversity of Carapa procera, which occurs in low density within a tropical rain forest.¹⁵ Its characteristics were high within–population diversity (comparable to temperate gymnosperms), high heterozygosity and a lack of spatial structure consistent with the highly outcrossing nature of the species, leading to extensive pollen–mediated gene flow that prevented local genetic differentiation. When 63 SNP polymorphisms (surveyed by eco–tilling) in nine different genes with broad functional properties were targeted as a feature for understanding DNA variation in 41 wild populations of a small western black cottonwood (P. trichocarpa) sample panel,⁴⁰ it was found that heterozygosity was high (HO=0.47) and that overall nucleotide diversity at the gene level (π=0.0018) among populations was low. Similarly, low average π values of the segregating sites were obtained for other forest tree species, such as P. nigra (π=0.0024)⁸⁴ and Pinus sylvestris (π=0.0025).⁸⁵ Much higher overall nucleotide diversity levels in a conifer were uncovered for P. taeda (π=0.00398).⁸⁶ Among the studied poplars, interestingly, the European species, P. tremula, showed the highest nucleotide diversity (π=0.007 or even π=0.0111,^87,88 dependent on the surveyed genes), but differences in diversity were also consistent with its different and complex demographic history. However, nucleotide diversity is best interpreted on a gene–by–gene basis, as population history and selection affect these mutation rates more specifically.^40,89 In a similar context, assessing the adaptive genetic diversity in forest trees is important to harness this adaptive potential for future forest management and conservation purposes.⁹⁰ Candidate genes underlying a specific trait of interest are typically selected (cf. nine candidate genes for bud burst in Quercus petraea: π=0.00615;⁹¹ 121 candidate genes for cold hardiness in Pseudotsuga menziesii var. menziesii: π=0.004;⁹² 13 candidate genes for drought stress in Pinus pinaster: π=0.00548).⁶⁴ While most of this detected variation was largely attributed to purifying selection (an excess of nucleotide diversity at synonymous vs. non–synonymous sites), as commonly observed in forest trees, patterns of strong diversifying selection in candidate genes were also uncovered.⁶⁴

Conclusion

This review focused on the value of integrating knowledge of adaptive complex traits with molecular marker data for making informed management and conservation decisions. Integrative approaches that use future climate modeling have been very successful in uncovering potential threats of decline in the genetic diversity and distribution of forest tree species, so that timely precautions to preserve the species can be undertaken. With the substantial drop in whole genome sequencing costs making the sequencing of genetically complex organisms more affordable, inventorying the complete portfolio of genetic resources has become feasible. This will also open new avenues for the conservation of previously marginalized and undervalued forest tree species that are considered of less economic value, but nevertheless represent value to local ecosystems. While the present review focused primarily on the genetic diversity assessed for pure species, it also stresses the importance of investigating natural species hybrid zones as important sources of population genetic diversity in forest tree management. The potential applications of molecular markers in the management of forest gene conservation are summarized as follows:

To clarify the identity of taxa and their relatedness, as well as to infer their evolutionary histories.
To verify the identity of clones and ramets in gene banks in order to avoid mislabeling, duplication and contamination.
To evaluate the amount, extent and distribution of genetic variation within and among populations.
To estimate the mating system (selfing and outcrossing rates) and gene flow.
To evaluate the status of genetic resources, using the genetic information provided as criteria for ex situ and in situ conservation.
To optimize the management of gene conservation by combining adaptive trait, ecogeographic and genetic surveys, both for ex situ collecting programs and for identifying sites for in situ conservation.

Therefore, decisions on how to manage genetic resources, which and how many materials to handle, and where to establish or protect them depend on how well we know the genetic background of the particular species we want to conserve.