Human genetics of diabetic cardiovascular complications

Abbreviations: DCC, diabetic cardiovascular complications; mtDNA, mitochondrial DNA; SSRs, simple sequence repeats; STRs, short tandem repeats; CPI, combined paternity index; LOD, logarithm of odds; SNPs, single nucleotide polymorphisms; GWAS, genome-wide association studies; HDL, high density lipoprotein; MI, myocardial infarction; CAD, coronary artery disease; Lp, lipoprotein; CNV, copy number variation; T2DM, type 2 diabetes mellitus; CKD, chronic kidney disease


Introduction
Diabetic cardiovascular complications (DCC) involve the cardiovascular system and are often classified as macrovascular complications. 1-3 A large body of evidence indicates that major risk factors, such as long-term diabetes, poor control of blood glucose, and elevated blood pressure, are responsible for the onset and progression of diabetic complications. 4 However, patients cannot be stratified with respect to their risk of developing DCC based only upon clinical or procedural risk factors. A large body of evidence for the role of genetic factors in DCC has been generated over several decades. Genetic studies have reported that 40%-50% of the variance in indices of the extent of atherosclerosis, i.e. coronary calcium and carotid intima-media thickness, can be attributed to familial factors among subjects with diabetes. 5,6 DCC can therefore be considered a classic example of a human complex disease attributed to genetic factors, environmental factors and the interactions between them.
Currently, multiple genetic approaches are used to identify the genetic loci or genes that are risk factors for developing this complex disease. Genetic linkage analyses have been performed under the 'rare variant' hypothesis to identify genetic loci using extended families or sibling pairs. Genetic association analyses under the 'common variant' hypothesis have identified genetic susceptibility variants via a dense marker map. 7 Genetic analyses of mitochondrial DNA (mtDNA), including mtDNA genome association analysis and copy number analysis, have provided insights into the underlying mechanisms of DCC. 8 This review focuses on the current knowledge of the genetic and epigenetic basis of DCC and summarizes data from previous genetic studies regarding susceptibility variants that influence DCC. Ultimately, the identification of genetic variants and structural variants will benefit the development of personalized medicine in DCC.

Genetic Linkage Analysis
Genetic linkage analysis detects the chromosomal location of disease genes and is based on the observation that genes that are physically close together on a chromosome tend to remain linked during meiosis. 9,10 For this approach to be successful, it is very important to define a specific phenotype associated with each gene. Two classical approaches, parametric tests and non-parametric tests, are commonly used in genetic linkage analyses. [11][12][13][14] It is also critical to choose proper genetic markers for genetic linkage analysis. Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are repeating sequences of 2-5 base pairs of DNA. They are used as molecular markers in STR analysis for kinship, population and other studies, and also for studies of gene duplication or deletion, marker-assisted selection and fingerprinting. Furthermore, these markers are the most widely used because microsatellite loci are highly polymorphic and yield a high combined paternity index (CPI). 15 The LOD score (logarithm (base 10) of odds) is a statistical test often used for linkage analysis in human, animal, and plant populations. The LOD score compares the likelihood of obtaining the test data if the two loci are indeed linked to the likelihood of observing the same data purely by chance, i.e. if the loci are unlinked. Positive LOD scores favor the presence of linkage, whereas negative LOD scores indicate that linkage is less likely.
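The likelihood comparison behind the LOD score can be illustrated with a minimal sketch. It assumes the simplest textbook setting, which is not taken from the studies reviewed here: phase-known, fully informative meioses in which recombinants can be counted directly. The function names `lod_score` and `max_lod` and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (a simplification).
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    # log10 likelihood of the data if the loci are linked at distance theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # log10 likelihood under free recombination (theta = 0.5), i.e. no linkage
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, meioses, grid_points=50):
    """Scan theta over (0, 0.5] and return (best_lod, best_theta)."""
    thetas = [0.5 * i / grid_points for i in range(1, grid_points + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in thetas)


if __name__ == "__main__":
    # Hypothetical data: 2 recombinants observed among 20 meioses
    lod, theta = max_lod(recombinants=2, meioses=20)
    print(f"max LOD = {lod:.2f} at theta = {theta:.2f}")
```

At theta = 0.5 the two likelihoods coincide and the LOD score is zero. By convention, a maximum LOD score of 3 or more is commonly taken as evidence for linkage. Real pedigree data with unknown phase or missing genotypes require dedicated linkage software rather than this direct count.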
For candidate gene analysis, candidate genes of known sequence and location that may be involved in disease pathogenesis are identified, often on the basis of their physiological functions. In contrast, genome-wide screens are a more powerful approach that can be used to screen the whole human genome for gene linkage or association with a disease without making any assumptions regarding disease pathogenesis. 16,17 This type of approach has been used successfully to identify susceptibility loci for DCC. Genetic linkage analysis often consists of the following steps: identifying linked loci, confirming linked loci, fine mapping of confirmed loci and then testing genes in the linked region in functional studies. 18 Two whole-genome linkage analyses for DCC have detected several chromosomal regions, such as 19q, 3p and 11p, with evidence of linkage to the complications. 19,20 Elbein et al. 19 performed a genome-wide linkage study for DCC in Caucasians and detected the strongest linkage evidence at 19q13.2. Chromosomes 3p, 11p and 19p-q were identified with linkage evidence to DCC in a whole-genome scan of Caucasian, Hispanic and African-American subjects, respectively. 20 Interestingly, although the two genome-wide linkage studies were conducted in different ethnic groups, linkage evidence to chromosome 19q was consistently detected (Table 1).

Genetic Association Analysis
Genetic association studies are more sensitive than linkage studies and may detect minor susceptibility genes contributing less than 5% of the total genetic contribution to a disease. 7,9 The approach used for this type of analysis is based on comparing the frequency of the allele studied in unrelated patients with that in matched controls. If the allele appears significantly more frequently in patients than in controls, then it is considered to be associated with the disease. 21 Single nucleotide polymorphisms (SNPs) are the most important genetic markers for genetic association analysis, owing to the abundance of SNPs covering the entire human genome at a high density. [22][23][24] Candidate gene association analyses use candidate genes of known sequence and location that are considered to be involved in the disease pathology. However, approaches based on a prior hypothesis have limited power to detect novel genetic variants. Instead, a hypothesis-free approach is more powerful for identifying gene(s) associated with a disease by screening the whole human genome. Genome-wide association studies (GWAS) became a reality following the publication of the HapMap of the human genome. 25 Recently, several genes associated with type 2 diabetes have been reproducibly identified using genome-wide association studies. 26 Genetic association-based gene mapping consists of the following steps: genome-wide association using tag SNPs, confirming SNP association, gene identification and then functional studies.
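The case-control comparison of allele frequencies described above can be sketched as a 2x2 allelic test. The example below is illustrative only: the function name `allelic_association` and the allele counts are hypothetical, and it uses a plain Pearson chi-square test with one degree of freedom, without the covariate adjustment or multiple-testing correction that a real study would need.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) and odds ratio for a 2x2 allele table.

    Counts are numbers of alleles (two per genotyped individual).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    row_cases, row_ctrls = a + b, c + d
    col_alt, col_ref = a + c, b + d
    if min(row_cases, row_ctrls, col_alt, col_ref) == 0 or b * c == 0:
        raise ValueError("table has an empty margin or a zero cell")
    # Shortcut formula for the Pearson chi-square statistic of a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / (row_cases * row_ctrls * col_alt * col_ref)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return {"chi2": chi2, "p_value": p_value, "odds_ratio": odds_ratio}


if __name__ == "__main__":
    # Hypothetical counts: risk allele seen 60/100 times in patients, 40/100 in controls
    result = allelic_association(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
    print(result)
```

In a genome-wide setting the same test is repeated for every tag SNP, so the significance threshold has to be made far more stringent than the usual 0.05.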
Many candidate gene association studies have been performed to identify genes linked to cardiovascular disease (CVD) and/or diabetes mellitus, and some gene associations have been consistently reported. 27 For example, polymorphisms in genes related to lipid metabolism or fibrinolysis, including APOE, APOB, APOC, PON, CETP, and PAI1, have been shown to increase the risk of ischemic vascular disease in diabetic patients. It is well known that lipid factors and their oxidation influence the development of diabetic metabolic syndrome and CVD. Further, APOE, APOB, or APOC gene polymorphisms have been reported to associate with macrovascular complications of diabetes, 28,29 although these results have yet to be reproduced. Paraoxonase is an enzyme associated with high density lipoprotein (HDL), and PON is considered a candidate gene for DCC. Interestingly, an association between polymorphisms in this gene and the risk of CVD has been consistently described in patients with T2DM from different ethnic backgrounds, [30][31][32][33][34] and three polymorphisms (rs662, rs854560 and Q191R) have been linked with the risk of CVD in patients with T2DM. 35-37 Patients with diabetes carrying the G allele of the rs662 polymorphism have been found to have more than double the risk of myocardial infarction (MI) compared with patients with other alleles. 35 The exonic rs854560 polymorphism causes a leucine to methionine change in the encoded protein and strongly influences gene expression and serum levels of the enzyme. 36 Furthermore, the Q191R polymorphism was previously identified as an independent risk factor for CVD in patients with diabetes. 37 There is now a large body of evidence implicating this polymorphism as a genetic determinant for the risk of ischemic vascular disease in T2DM. CETP plays a key role in the metabolism of HDL, which regulates uptake of cholesterol by hepatocytes, and CETP polymorphism is a strong and independent risk factor for atherosclerotic vascular disease.
Interestingly, the CETP rs1800774 polymorphism has been reported to associate with macrovascular disease in male T2DM patients independently of lipid levels. 38-42 It is well known that PAI-1 is the main circulating inhibitor of fibrinolysis, the process that causes thrombus dissolution. A single-base insertion/deletion polymorphism, rs1799889, in the promoter of the PAI1 gene can partially determine the levels of PAI-1, 43 and a possible association between this polymorphism and the risk of CVD in patients with T2DM has been reviewed in a meta-analysis. 44 An interaction of ACE genotype with the PAI1 genotype has also been reported. 45 In addition, two studies show apparently contradictory results regarding the rs2227631 (-455G/A) polymorphism of the fibrinogen gene. In one study, this allele was found to be associated with higher levels of fibrinogen and an increased risk of coronary disease in Chinese diabetic patients. 46 However, a second study conducted in an English T2DM population suggested that the G allele was associated with an increased risk of coronary artery disease (CAD), without affecting circulating fibrinogen levels. 47 No published reports were found of GWAS for CAD performed specifically in the diabetic population. However, findings from several GWAS for CAD conducted in the general population showed a potential relation to diabetic subjects. 48,49 Twelve loci with genome-wide significance have been found to associate with either CAD or MI in the general population. Two of the 12 genes, LDLR and PCSK9, are mutated in Mendelian forms of hypercholesterolemia, 50 as are genes in the SLC22A3-LPAL2-LPA cluster, which includes the gene for the atherogenic lipoprotein (Lp) (a). Moreover, variations at chromosome 9p21 have been found to significantly associate with CVD in the general population. 51,52 In addition, the locus has a larger effect on CVD risk among patients with T2DM. 53
In general, common genetic variants with small effects do not significantly improve predictive algorithms for other complex disorders; however, the chromosome 9p21 locus points to some genes associated with CVD. 54 For example, haplotype analyses have found an interesting CVD association with a group of SNPs residing in a 60 kb region that includes ANRIL. [55][56] A decreased risk associated with the long to short variant ratio has been reported for this allele. 57,58 ANRIL links to the CDKN2A and CDKN2B genes, which are involved in the control of cell proliferation, cell aging and apoptosis. 59,60 Hyperglycemia and variation in the 9p21 locus may induce vascular smooth muscle cell proliferation. 61 It is thought that several others of the 12 genes identified in the general population can also influence the risk of CAD in the diabetic population. 62 In addition, the chromosome 6p24 locus, which includes the PHACTR1 gene, has been found to promote CAD with a strong effect, second only to that of the 9p21 locus. 63 Interestingly, the RAGE gene has been found to associate with both DN and DR, the AGE gene with both DN and DCC, and the VEGF gene with both DR and DCC. Unfortunately, no shared gene has been found to associate with all three complications (Table 2).

mtDNA association analysis
mtDNA is non-nuclear DNA located within mitochondria, which are the structures within eukaryotic cells that convert the chemical energy from food into ATP. The mitochondrial genome is highly compact, consisting of double-stranded circular mtDNA greater than 16 kb in length. In humans, each cell contains between several hundred and more than a thousand mitochondria, and each mitochondrion contains 2-10 copies of mtDNA. The number of mitochondria and mtDNA copies can vary dramatically in response to energy demand and under different physiological conditions, and is tightly controlled by mitochondrial biogenesis.
The consequence of an mtDNA mutation may be a change in the protein-coding sequence, which may affect the metabolism of the organism. Alterations in mitochondrial biogenesis may be underlying pathological factors for several human complex diseases such as diabetes mellitus or DCC. [64][65][66] Further, there is compelling evidence for a genetic predisposition to diabetes complications. [67][68][69][70][71][72][73] Single mtDNA mutations and mitochondrial haplogroups are associated with T2DM, and many studies have evaluated mtDNA variation in T2DM patients. These subjects also showed a slight decrease in HDL cholesterol, indicating that the entire haplogroup H might play an important role in diabetic cardiovascular complications. This compelling analysis of grouped complications provides some initial clues concerning the role of mitochondrial haplogroups in modulating the course of diabetes mellitus.

Copy number variant analysis
Recent discoveries have revealed that large segments of DNA can vary in copy number between individuals. A copy number variation (CNV) is a segment of DNA in which copy number differences have been found in two or more genomes; the segment may range from one kilobase to several megabases in size. 74 CNVs can encompass genes, leading to dosage imbalances, and this may play an important role both in human disease and in drug response. It was first realized that CNV is a widespread and common phenomenon among humans after the completion of the Human Genome Project. 75,76 CNVs can lead to variations in dosage-sensitive genes, which may contribute to a substantial amount of human phenotypic variability and disease susceptibility. 77,78 In genome-wide association studies, the raw intensity data generated from SNP genotyping can be mined for copy number information. 79,80 Unfortunately, no published genetic study to date has performed copy number variation analysis to identify associations between CNVs and DCC.
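One common way to mine SNP-array intensities for copy number is the log R ratio, the base-2 logarithm of the observed total probe intensity over the intensity expected for two copies. The sketch below is a rough illustration under stated assumptions: the function names, the example intensities and the cutoffs of -0.5 and 0.3 are hypothetical choices, not values taken from the cited studies, and production CNV callers use segmentation or hidden Markov models rather than a fixed threshold on a window mean.

```python
import math
from statistics import mean


def log_r_ratio(observed, expected):
    """log2 of observed over expected total intensity for one SNP probe."""
    if observed <= 0 or expected <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(observed / expected)


def classify_segment(observed, expected, loss_cutoff=-0.5, gain_cutoff=0.3):
    """Label a run of adjacent probes as 'loss', 'gain' or 'neutral'.

    The cutoffs are illustrative assumptions; an ideal one-copy loss gives
    log2(1/2) = -1 and an ideal one-copy gain gives log2(3/2), about 0.58.
    """
    if len(observed) != len(expected) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    segment_mean = mean(log_r_ratio(o, e) for o, e in zip(observed, expected))
    if segment_mean <= loss_cutoff:
        return "loss", segment_mean
    if segment_mean >= gain_cutoff:
        return "gain", segment_mean
    return "neutral", segment_mean


if __name__ == "__main__":
    # Hypothetical intensities for five adjacent probes
    expected = [1.0, 1.1, 0.9, 1.0, 1.2]
    observed = [0.5, 0.6, 0.45, 0.5, 0.6]
    print(classify_segment(observed, expected))
```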

Gene-environment interaction analysis
Current genetic association analyses are designed to detect strong and direct associations of a SNP, or clusters of SNPs, with disease. 81,82 However, in the context of complex diseases, scanning for strong associations may miss important genetic variants specific to subpopulations defined by their exposure to particular environmental factors. Interactions of functional gene polymorphisms with environmental factors play a substantial role in disease risk. 83 Thus, in a genome-wide association study, gene-environment interactions are worth further investigation. 84 Gene-environment interactions can reveal fundamental biological mechanisms and the effects of individual components of a complex mixture, and can be important for risk prediction and for evaluating the benefit of changes in modifiable environmental factors. [85][86][87] Gene-environment studies have been performed for exposure-related diseases such as asthma, lung cancer and T2DM. 88,89 Unfortunately, no published study to date has performed a gene-environment interaction analysis to identify interactions for DCC.
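The point that a pooled scan can miss an exposure-specific variant can be illustrated with stratified odds ratios. This is a minimal sketch with hypothetical counts and function names. It measures interaction on the multiplicative scale as the ratio of the two stratum-specific odds ratios, which is one of several possible definitions, and it omits the confidence intervals and regression modelling used in practice.

```python
def odds_ratio(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Odds ratio for carrying the variant, cases versus controls."""
    if case_noncarrier * ctrl_carrier == 0:
        raise ValueError("odds ratio undefined for a zero off-diagonal cell")
    return (case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier)


def stratified_interaction(exposed, unexposed):
    """Compare the genetic effect across two exposure strata.

    Each argument is a tuple:
    (case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier).
    Returns the stratum-specific odds ratios, the pooled odds ratio and the
    ratio of stratum odds ratios (1.0 means no multiplicative interaction).
    """
    or_exposed = odds_ratio(*exposed)
    or_unexposed = odds_ratio(*unexposed)
    pooled = tuple(e + u for e, u in zip(exposed, unexposed))
    return {
        "or_exposed": or_exposed,
        "or_unexposed": or_unexposed,
        "or_pooled": odds_ratio(*pooled),
        "interaction_ratio": or_exposed / or_unexposed,
    }


if __name__ == "__main__":
    # Hypothetical counts: the variant matters only in exposed subjects
    result = stratified_interaction(exposed=(40, 60, 20, 80),
                                    unexposed=(25, 75, 25, 75))
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

With these hypothetical counts the pooled odds ratio sits between the two stratum-specific values, so an analysis that ignores exposure sees a weaker signal than the one present in exposed subjects.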

Conclusion
With the development of GWAS, many genetic polymorphisms with a possible impact on DCC have been identified. However, GWAS have reported inconsistent identifications of the genetic variants underlying susceptibility to DCC. Standardization of phenotypes and genotyping protocols is considered essential for GWAS because it allows individual patient-level data to be pooled in meta-analyses, which increases their power. For cardiovascular complications, 9p21 has been found to be associated with both coronary artery disease and diabetes mellitus, which suggests a potential shared candidate gene associated with both diseases. The successful identification of the disease at an early stage, leading to changes in lifestyle and dietary behavior, is important for prevention and control of the disease. It is expected that characterization of the genetic factors involved in the development of diabetes mellitus and its complications will lead to an understanding of the molecular pathogenesis and to the development of novel therapeutic approaches. Although there are many differences between mtDNA and nuclear DNA, there is coordinated expression and interaction between the gene products of the mitochondrial and nuclear genomes.
Because DCC involves an important human system, the cardiovascular system, in a diabetic background, we can consider that shared genes influence the development of both this system and diabetes mellitus. Thus, based on data from genome-wide scans for DCC, we can perform multivariate genome-wide association analysis for this system and for diabetes mellitus. We have performed bivariate genome-wide linkage analyses for obesity and osteoporosis 90 and found several novel chromosomal regions that may influence both. We also plan to perform multivariate genome-wide association analysis for chronic kidney disease (CKD) and T2DM, and for CVD and T2DM. Recently, pathway-based genome-wide association analyses have been conducted to identify pathways underlying complex human disease, based on data from genome-wide scans. Similar analyses should reveal pathway-based genome-wide associations for DCC.
Whole-genome sequencing with "next generation sequencing" technology is an efficient strategy for sequencing the human genome in order to identify novel genes associated with rare and common disorders. 91 Whole-genome sequencing will eventually become a standard approach and allow us to gain a deeper understanding of the genetic variation found in populations. 92