Research Article Volume 5 Issue 2
1Molecular Biology Laboratory, American University of Science and Technology, Lebanon
2School of Criminal Justice, University of Lausanne, Switzerland
Correspondence: Issam Mansour, Molecular Biology Laboratory, Faculty of Health Sciences, American University of Science and Technology, Ashrafieh, Lebanon, Tel 961 03-649207
Received: July 20, 2017 | Published: July 26, 2017
Citation: Al-Azem M, El Andari A, Mansour I (2017) Estimation of Allele and Haplotype Frequencies for 23 YSTR Markers in the Lebanese Population. Forensic Res Criminol Int J 5(2): 00150. DOI: 10.15406/frcij.2017.05.00150
Y-STR analysis may in certain cases complement autosomal STR markers in forensic investigations, kinship testing and human identification. Hence, it would be informative to assess the probability of occurrence of Y-STR haplotypes in the Lebanese population. This study aimed to estimate Lebanese Y-STR allele and haplotype frequencies in 502 unrelated males using the Y-filer Kit (Applied Biosystems), which includes the nine loci of the minimal Y-STR haplotype (DYS19, DYS390, DYS391, DYS392, DYS393, DYS389I/II, DYS385a/b) plus 14 other Y-STR loci: DYS437, DYS438, DYS439, DYS456, DYS458, DYS448, DYS635, Y-GATA-H4, DYS576, DYS570, DYS549, DYS643, DYS533 and DYS481. All 23 Y-STR loci were highly polymorphic, with DYS385a/b being the most polymorphic. A total of 489 different haplotypes were identified; 476 individuals (94.8%) carried a unique haplotype, and the most common haplotypes appeared twice. This database has a discrimination capacity of 97.4% and a haplotype diversity of 0.9999. The data obtained serve as an essential prerequisite for using Y-chromosomal STRs in routine forensic practice in the Lebanese population, where the average endogamy rate is 88%.
Keywords: y-chromosome; y-str; haplotype; allele; frequency; lebanese population
Forensic investigation and human identification using DNA testing came into widespread use in the late 1980s,1-7 with autosomal short tandem repeats (STRs) being the polymorphic markers most commonly used in forensic casework. In some casework, such as rape cases or DNA mixtures, autosomal STRs fail to be informative.8,9 The amplification of Y-chromosome STR markers, which amplifies only the male DNA in the mixture, provides an attractive alternative.10,11 Y-STR profiling has a wide range of other applications, namely patrilineal relationships, familial search,12 disaster victim identification,13 studies of geographical or ethnic origins, archeogenetics,14,15 genealogical studies, reconstruction of human history and investigation of cases involving mixtures.16 Certain limitations have constrained the use of Y-STRs: the Y chromosome is always in a haploid state,2 most of the polymorphisms lie in the non-recombining region of the Y chromosome (NRY),17 and the NRY is inherited unchanged through paternal lineages unless a meiotic mutation occurs. Consequently, when a reference Y-STR profile matches a profile obtained from a crime scene trace, all members of the paternal lineage would probably match too, as would any male who shares a more distant paternal ancestry with the person who provided the reference sample.18 The probability of sharing Y-STR profiles increases in isolated populations.19
According to Balding and Nichols,20 the crime scene DNA profile may be more common within a suspect’s sub-population than in the wider population. Lebanon is a country with a population of around 4.5 million and an area of 10,452 km² situated on the eastern coast of the Mediterranean Sea. Several waves of migration affected this area, such as the Muslim expansion in the 7th century, the Crusades between the 11th and 13th centuries and the Ottoman Empire expansion in the 16th century, followed by the Armenian exodus, the French protectorate and the Palestinian exodus in the 20th century. These historical events produced a population divided among 18 different religious communities, with an average rate of endogamy of 88%.21 Endogamy is a common practice, and it promotes genetic differentiation by stratifying the population according to genetic relatedness. Consequently, population allele and haplotype frequencies are essential for a more accurate use of Y-STRs, since they provide the basis for random match probability calculations in forensics and human identification.2 Inter-population variability seems to be more pronounced for the Y chromosome than for unlinked autosomal markers, which makes local databases essential for the use of Y-specific markers.22 In a crime or paternal lineage study involving a suspect from an isolated sub-population or a geographically isolated region, the appropriate frequencies can be used to generate a random match probability.19 Such data are not available for the Lebanese population, where endogamy is widespread, and it would be highly informative to assess these frequencies. The aim of the present work is to compile Y-chromosome allele and haplotype frequencies of natives of Lebanon using 23 Y-chromosome STR markers, calculate their rate of occurrence and determine match probability for forensic casework.
Samples were collected from 502 unrelated male individuals selected on the basis of their geographic and religious distribution, which represents the Lebanese population according to the Ministry of Interior and Municipalities in 2009, as described by El Andari et al.23 Samples were either EDTA blood (n = 346) or buccal swabs (n = 156) (from the right cheek, left cheek and tongue). DNA was extracted from whole blood leukocytes using the salting out method and from buccal swabs using a modified phenol-chloroform method. Samples were quantified using a Nanodrop 2000 (Thermo Fisher Scientific Inc.) and diluted to approximately 1 ng/µl.
PCR amplification
DNA amplification of the 23 Y-STR loci was performed using two commercial kits: the Applied Biosystems Y-Filer® multiplex PCR Amplification kit (Applied Biosystems, Foster City, CA) and the Promega PowerPlex® Y23 System (Promega, Madison, USA). The 23 Y-STR systems include the 11 core loci recommended by the SWGDAM: DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS438, DYS439 and additional markers DYS437, DYS448, DYS456, DYS458, DYS635 and Y-GATA-H4. Amplifications typically contained 0.5-1.0 ng of extracted DNA. PCR reactions were carried out on GeneAmp PCR System 9700 (Applied Biosystems) using the cycling conditions as described in the manufacturers’ instructions.24,25
DNA typing
Electrophoretic separation and detection were performed using the ABI PRISM® 3130 Genetic Analyzer 4-capillary array system (ABI Prism 3130 Data Collection Software version 3.0) (Applied Biosystems, Foster City, CA). Size calling was performed using the GeneScan-500 Internal Lane Size Standard (LIZ-500) (Applied Biosystems) and the CC5 Internal Lane Standard 500 (Promega, Madison, USA). Genotyping was performed by comparison with the provided allelic ladders using GeneMapper v4.0 (Applied Biosystems).
Statistical analysis
Y-STR data from the GeneScan® software were transferred to in-house software, the Forensic Information Management System (FIMS), to estimate the allele frequencies for the 23 Y-STR systems. Gene diversity (GD) was calculated for each Y-STR according to the formula supplied by Nei:26 $GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $n$ is the number of samples and $p_i$ is the frequency of the $i$-th allele. Haplotype diversity (HD) was computed with the same equation using haplotype frequencies instead of allele frequencies. Unique haplotypes (UH), random match probability (RMP) and discrimination capacity (DC)27,2 were also calculated for the obtained data. Y-STR alleles are inherited together as haplotypes, so a haplotype frequency cannot be obtained as the product of the individual allele frequencies.29 Haplotype frequency was obtained using the counting method.30 Values were confirmed using Arlequin v3.5 software.31
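The indices described above can be sketched in a few lines of Python. This is a minimal illustration, not the FIMS or Arlequin implementation: the function names are hypothetical, DC is taken as the number of distinct haplotypes divided by the sample size, and RMP is taken as the sum of squared haplotype frequencies, which is a common convention but an assumption here because the text does not spell out those two formulas. The example data are synthetic labels built only to mirror the counts reported in this study (476 haplotypes seen once and 13 seen twice).

```python
from collections import Counter


def nei_diversity(counts):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


def haplotype_summary(haplotypes):
    """Summarise a list of haplotypes using the counting method.

    Each haplotype can be any hashable value, e.g. a tuple of allele calls.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return {
        "n": n,
        "distinct": len(counts),
        "unique": sum(1 for c in counts.values() if c == 1),
        "hd": nei_diversity(list(counts.values())),
        # Assumed definition: distinct haplotypes / sample size.
        "dc": len(counts) / n,
        # Assumed definition: sum of squared haplotype frequencies.
        "rmp": sum((c / n) ** 2 for c in counts.values()),
    }


if __name__ == "__main__":
    # Synthetic labels only: 476 singletons plus 13 haplotypes seen twice.
    sample = [f"single-{i}" for i in range(476)]
    sample += [f"pair-{i}" for i in range(13) for _ in range(2)]
    for key, value in haplotype_summary(sample).items():
        print(key, value)
```

The same `nei_diversity` function gives per-locus gene diversity when it is passed allele counts for a single Y-STR system instead of haplotype counts.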
Quality control
A proficiency testing quality control check was performed in conjunction with submission to the YHRD.org database.32
Allele frequencies for 23 Y-STR markers were estimated in the Lebanese population (Table 1). Although 23 Y-STR loci were amplified, they were treated as 22 systems because DYS385 is a duplicated, multi-copy Y-STR system that reflects variation at two loci simultaneously and was therefore analyzed as a phenotype.33 Table 2 presents the allele frequencies for system DYS385a/b in the Lebanese population. Micro-variants (partial alleles) were observed on several occasions in system DYS458 (Table 1), whereby individuals exhibited an addition or deletion of 2 bp (alleles 16.2, 17.2, 18.2, 19.2, 20.2, 21.2 and 23.2). New and rare alleles were also documented (Table 3) with reference to published results in the NIST Standard Reference Database.34 Null alleles and multiple peaks were observed at a number of loci (Table 3). The null alleles will be investigated and confirmed in the near future using a second set of primers. As observed previously,35,36 DYS19 exhibits a relatively high frequency of duplications (reported to be 0.12%).
| Allele | Frequency (N = 502) | Allele | Frequency (N = 502) | Allele | Frequency (N = 502) |
|---|---|---|---|---|---|
| 10/10 | 0.004 | 13/13 | 0.02 | 15/17 | 0.008 |
| 10/15 | 0.002 | 13/14 | 0.02 | 15/18 | 0.006 |
| 11/11 | 0.002 | 13/15 | 0.048 | 15/19 | 0.014 |
| 11/12 | 0.008 | 13/16 | 0.056 | 15/20 | 0.002 |
| 11/13 | 0.012 | 13/17 | 0.068 | 15/22 | 0.002 |
| 11/14 | 0.062 | 13/18 | 0.09 | 16/16 | 0.028 |
| 11/15 | 0.02 | 13/19 | 0.05 | 16/17 | 0.036 |
| 11/16 | 0.008 | 13/20 | 0.006 | 16/18 | 0.02 |
| 11/17 | 0.006 | 13/21 | 0.004 | 16/19 | 0.006 |
| 11/18 | 0.002 | 14/14 | 0.018 | 17/17 | 0.012 |
| 12/12 | 0.012 | 14/15 | 0.014 | 17/18 | 0.008 |
| 12/13 | 0.012 | 14/16 | 0.054 | 17/19 | 0.006 |
| 12/14 | 0.032 | 14/17 | 0.028 | 18/18 | 0.012 |
| 12/15 | 0.012 | 14/18 | 0.038 | 18/19 | 0.012 |
| 12/16 | 0.01 | 14/19 | 0.014 | 18/20 | 0.006 |
| 12/17 | 0.018 | 14/20 | 0.004 | 19/19 | 0.012 |
| 12/18 | 0.024 | 15/15 | 0.012 | 19/20 | 0.002 |
| 12/19 | 0.008 | 15/16 | 0.012 | 19/21 | 0.002 |
Table 2 Allele frequencies for multi-copy system DYS385a/b
Column groups: New Alleles; Rare Alleles; Null Alleles; Multiple Alleles

| Y-STR System | Allele | Number of Times Observed |
|---|---|---|
| DYS438 | 7 | 2 |
| DYS458 | 23.2 | 1 |
| DYS456 | 12 | 1 |
| DYS635 | 17 | 3 |
| DYS448 | | 1 |
| DYS19 | 15, 16 | 2 |
Table 3 New, rare, multiple and null Y-STR alleles in Lebanese Population
Diversity of Y-STR alleles
In a second step, the gene diversity values of the tested Y-STR systems in Lebanese and Caucasian males were compared (Table 4).33,37,38 Systems were ranked according to their gene diversity values. The two most polymorphic systems (DYS385a/b and DYS458) were the same in the Lebanese and Caucasian populations. However, the system with the lowest gene diversity differed between the two groups (DYS392 in the Lebanese population and DYS393 in the Caucasian population), indicating that markers that are highly informative and polymorphic in Caucasians may be of limited value in the analysis of the Lebanese population.
| System | Lebanese Gene Diversity | Rank | Caucasians Gene Diversity | Rank |
|---|---|---|---|---|
| DYS385a/b | 0.964 | 1 | 0.842 | 1 |
| DYS458 | 0.855 | 2 | 0.777 | 2 |
| DYS481 | 0.838 | 3 | 0.72 | 6 |
| DYS570 | 0.814 | 4 | 0.747 | 4 |
| DYS635 | 0.777 | 5 | 0.643 | 11 |
| DYS576 | 0.775 | 6 | 0.768 | 3 |
| DYS643 | 0.762 | 7 | 0.625 | 12 |
| DYS389II | 0.75 | 8 | 0.676 | 9 |
| DYS390 | 0.676 | 9 | 0.708 | 7 |
| DYS448 | 0.671 | 10 | 0.596 | 15 |
| DYS456 | 0.649 | 11 | 0.722 | 5 |
| DYS533 | 0.648 | 12 | 0.588 | 17 |
| DYS439 | 0.648 | 13 | 0.648 | 10 |
| DYS438 | 0.645 | 14 | 0.59 | 16 |
| DYS19 | 0.636 | 15 | 0.509 | 21 |
| DYS549 | 0.627 | 16 | 0.68 | 8 |
| DYS393 | 0.599 | 17 | 0.381 | 22 |
| DYS389I | 0.579 | 18 | 0.52 | 20 |
| Y-GATA-H4 | 0.562 | 19 | 0.599 | 14 |
| DYS437 | 0.557 | 20 | 0.576 | 18 |
| DYS391 | 0.495 | 21 | 0.546 | 19 |
| DYS392 | 0.438 | 22 | 0.604 | 13 |
Table 4 Gene diversity of the Y-STR systems in the Lebanese population compared with the Caucasian population
Given the high rates of endogamy in the Lebanese population, gene diversity was calculated for the different Lebanese sub-populations to assess whether genetic differences exist between religious sub-populations. Results showed that the polymorphism of a marker may vary among these sub-populations (Table 5). For example, DYS448 could discriminate 72.5% of individuals in the Muslim Shiite sub-population but only 59.1% of individuals in the Druze sub-population. Another difference was recorded for system DYS392, whose value in Armenian Orthodox (73%) differed greatly from that in Christian Orthodox (30%). The latter example in particular shows the effect of endogamous marriage: both communities are Orthodox, yet they differ in marker gene diversity because Armenian Orthodox and Christian Orthodox do not intermingle.
| System | Lebanese Population | Muslim Sunnite | Muslim Shiite | Christian Maronite | Christian Catholic | Christian Orthodox | Armenian Orthodox | Druze |
|---|---|---|---|---|---|---|---|---|
| Sample Size | 502 | 140 | 137 | 108 | 26 | 30 | 16 | 32 |
| DYS19 | 0.635 | 0.69 | 0.583 | 0.637 | 0.569 | 0.628 | 0.717 | 0.639 |
| DYS389I | 0.577 | 0.513 | 0.559 | 0.643 | 0.532 | 0.591 | 0.7 | 0.667 |
| DYS389II | 0.75 | 0.728 | 0.726 | 0.78 | 0.683 | 0.756 | 0.767 | 0.681 |
| DYS390 | 0.676 | 0.675 | 0.628 | 0.711 | 0.68 | 0.687 | 0.717 | 0.625 |
| DYS391 | 0.495 | 0.521 | 0.489 | 0.523 | 0.394 | 0.453 | 0.5 | 0.476 |
| DYS392 | 0.441 | 0.404 | 0.447 | 0.48 | 0.351 | 0.306 | 0.725 | 0.544 |
| DYS393 | 0.599 | 0.644 | 0.575 | 0.565 | 0.631 | 0.57 | 0.625 | 0.643 |
| DYS437 | 0.557 | 0.468 | 0.58 | 0.634 | 0.44 | 0.549 | 0.675 | 0.542 |
| DYS438 | 0.643 | 0.542 | 0.663 | 0.662 | 0.683 | 0.593 | 0.775 | 0.639 |
| DYS439 | 0.648 | 0.645 | 0.627 | 0.66 | 0.714 | 0.57 | 0.7 | 0.669 |
| DYS448 | 0.671 | 0.618 | 0.725 | 0.683 | 0.532 | 0.641 | 0.675 | 0.591 |
| DYS456 | 0.647 | 0.694 | 0.614 | 0.673 | 0.397 | 0.72 | 0.617 | 0.623 |
| DYS458 | 0.857 | 0.872 | 0.835 | 0.835 | 0.886 | 0.857 | 0.792 | 0.766 |
| DYS635 | 0.777 | 0.771 | 0.771 | 0.774 | 0.772 | 0.779 | 0.808 | 0.81 |
| Y-GATA-H4 | 0.563 | 0.538 | 0.568 | 0.541 | 0.603 | 0.618 | 0.633 | 0.591 |
| DYS385a/b | 0.964 | 0.96 | 0.956 | 0.961 | 0.963 | 0.986 | 0.958 | 0.96 |
| DYS570 | 0.815 | 0.805 | 0.782 | 0.819 | 0.822 | 0.848 | 0.792 | 0.891 |
| DYS576 | 0.774 | 0.773 | 0.803 | 0.753 | 0.766 | 0.743 | 0.533 | 0.647 |
| DYS481 | 0.838 | 0.856 | 0.801 | 0.829 | 0.886 | 0.86 | 0.792 | 0.875 |
| DYS643 | 0.762 | 0.736 | 0.751 | 0.773 | 0.846 | 0.777 | 0.708 | 0.732 |
| DYS533 | 0.648 | 0.619 | 0.642 | 0.674 | 0.674 | 0.609 | 0.725 | 0.677 |
| DYS549 | 0.627 | 0.617 | 0.633 | 0.655 | 0.502 | 0.625 | 0.4 | 0.714 |
Table 5 Gene diversity values for the Lebanese population and the major sub-populations (n ≥ 15)
This demonstrated possible heterogeneity of allele frequencies across the different Lebanese sub-populations and is in agreement with previous population studies which showed that sub-populations exhibit greater differentiation at certain loci with the possibility of identifying new and unique alleles.39
Lebanese Y-STR Haplotype Frequency
The haplotype diversity (HD) in the Lebanese population was 0.9999. A total of 489 distinct haplotypes were observed in the total data set (n = 502), with 476 haplotypes being unique and 13 haplotypes observed more than once (Table 6). Each of the 13 shared haplotypes was carried by two individuals, and several scenarios were seen: the two individuals came from the same religious community and geographical area; from the same religious community but different geographical areas, or vice versa; or from different religious communities and different geographical areas. Y-STR haplotypes were therefore not restricted to members of the same family: individuals from distinct geographical areas and/or religious communities shared a common haplotype. These cases could be explained by common ancestry, whereby unrelated individuals had a distant common ancestor whose haplotype was transmitted without mutation. Haplotype diversity (HD), discrimination capacity (DC), unique haplotypes (UH) and random match probability (RMP) were calculated to determine how common a Y-STR haplotype is in the population and how frequently a random match could occur between two unrelated individuals (Table 7).
| Population | No. of Distinct Haplotypes | No. of Haplotypes Observed Once | No. of Haplotypes Observed More than Once | Haplotype Diversity |
|---|---|---|---|---|
| Lebanese (n = 502) | 489 | 476 | 13 | 0.9999 |
Table 6 Haplotype diversity and number of distinct haplotypes in the Lebanese Population
| Lebanese Population (N = 502) | HD (%) | UH | RMP (%) | DC (%) |
|---|---|---|---|---|
| 23 Y-STR | 0.9999 | 476 | 0.0001 | 97.4 |
Table 7 Statistical indices for the Lebanese population
The occurrence of shared haplotypes raises the question of how often a haplotype is present in the population and what the chances of a match are in forensic cases. The frequency of the most common haplotype (n = 2) was 0.009, meaning that the probability of finding an individual sharing this haplotype is one in 111 (Table 8). When the occurrence of a haplotype was compared between the total population and sub-population datasets, the haplotype frequency, and consequently the discrimination capacity, varied. The haplotype frequency increased in the sub-populations, leading to a lower discrimination capacity and a higher match probability. A haplotype with a match frequency of only one in 111 in the total population has a match frequency of one in 30 among Muslim Sunnites and of one in seven among Druze. The match probability thus varies greatly depending on whether the total population or the sub-population dataset is used.
| Religious Community | Haplotype Frequency (Occurrence) | Match Frequency in Total Population | Match Frequency in Sub-Population |
|---|---|---|---|
| Muslim Sunnite (n = 140) | 2 | 1/111 | 1/30 |
| Druze (n = 32) | 2 | 1/111 | 1/7 |
Table 8 Rate of match of the most frequent haplotype in the total population and in the subpopulation where each haplotype occurred
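The text does not state how the match frequencies in Table 8 were derived from the haplotype counts. The sketch below is therefore an assumption, not the authors' procedure. It shows one widely used convention for Y-STR databases: the counting estimate p = x/N together with a normal-approximation upper 95% confidence bound, p + 1.96·sqrt(p(1 − p)/N). The function names are hypothetical. Under this assumption, a haplotype seen twice gives an upper bound of roughly 1 in 105 for N = 502, 1 in 29 for N = 140 and 1 in 7 for N = 32, which is close to, but not an exact reproduction of, the reported figures.

```python
import math


def counting_estimate(count, n):
    """Haplotype frequency by the counting method: observations / database size."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("require 0 <= count <= n and n > 0")
    return count / n


def upper_bound_normal(count, n, z=1.96):
    """Normal-approximation upper confidence bound on the counting estimate.

    z = 1.96 is assumed here (two-sided 95% interval); other choices are possible.
    """
    p = counting_estimate(count, n)
    return p + z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Database sizes taken from Table 8; the haplotype was observed twice.
    for label, n in [("Total population", 502), ("Muslim Sunnite", 140), ("Druze", 32)]:
        p = counting_estimate(2, n)
        ub = upper_bound_normal(2, n)
        print(f"{label}: p = {p:.4f}, upper bound = {ub:.4f} (about 1 in {1 / ub:.0f})")
```

Whatever the exact formula, the comparison illustrates the point made in the text: the same two observations yield a much higher match frequency when the reference database is a small sub-population rather than the national sample.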
Allele and haplotype frequencies for the Lebanese population were estimated. The study showed that Y-STR markers exhibit possible Y-chromosomal heterogeneity within and between populations and would seem to be very useful for tracing human evolutionary processes on a historical time-scale. While many Y-STR markers were highly discriminative in the Lebanese population, others, such as DYS392, were less so. In the future, if specific kits were to be developed for the Lebanese population and other closely related populations with similar socio-economic characteristics, this system (DYS392) and other poorly discriminating markers might be omitted. Match probability differed between the national database and the respective sub-population database, which raises the question of which dataset to use when a profile matches in a forensic case. To address this issue properly, more samples from each sub-population should be tested to determine whether allele and haplotype frequencies, along with the match probability, would change.
Evaluating possible Lebanese genetic sub-structures will be essential. If sub-structures exist, they should be accounted for when assessing the strength of DNA profile evidence in Y-STR analysis, with theta values calculated and incorporated into the calculations. These criteria should be used to validate and evaluate Y-STR haplotype frequencies used in match probability calculations in forensic cases, human identification and kinship studies. Results in this study showed that there is a certain level of endogamy in the Lebanese population. Hence, to better assess the effect of endogamy, a further Y-23 study will be performed on randomly selected villages with known high rates of endogamy. This article follows the guidelines for publication of population data requested by the journal.40
The authors wish to thank and acknowledge all the Lebanese volunteers for participating in this study, and Applied Biosystems and Promega for granting and supporting this project.
The authors declare that there are no conflicts of interest.
©2017 Al-Azem, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.