Genetic Variation and its Impact on Human Diseases
Abstract
Variation in the human genome is common; studies suggest that, on average, any two individuals differ at around 1 in 1000 base pairs, that is, at roughly 3 million positions. While millions of variants are known, the vast majority are common in the general population and are unlikely to affect an individual's risk of developing disease. A minority, however, are rare and may predispose to disease. Whether a given variant can be linked to disease is likely to depend on the type of mutation, its position within the gene, the proportion of individuals affected, and the availability of suitable nuclear or animal models. Most human genetic disease is due to a single mutation in a protein-coding gene, so the search for the genetic cause is usually simplified by looking for protein-altering mutations in coding exons and adjacent splice sites. To date, mutations in ≈4000 human genes have been associated with disease. The focus here is on human disease, although much of the research that shapes our understanding comes from the study of animal models that share similar or related genes.
The human genome comprises roughly 3 billion bp of DNA. Most of this DNA is present within the nucleus, as chromosomes, but there is also a small amount of DNA in the mitochondria; this organelle has its own circular genome, inherited maternally, that encodes proteins integral to cellular respiration. In most individuals, each somatic cell carries 23 pairs of chromosomes (46 in total), so much of the DNA content is present in two copies, one from our mother and one from our father. Each parent contributes one complete set of 23 chromosomes, so the total number is maintained across successive generations.
The following are the basic features of the human genome. The human nuclear genome encodes roughly 20,000 protein-coding genes, which typically consist of both protein-coding (exon) and non-coding (intron) sequences.
These transcript types can be classified by biogenesis as small nuclear (sn)RNA, small nucleolar (sno)RNA, micro (mi)RNA, and long non-coding (lnc)RNA.
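The headline figures quoted earlier (about 3 million differences between two individuals, and 46 chromosomes per somatic cell) follow from simple arithmetic on the numbers given in the text. The sketch below only restates that arithmetic; the function and constant names are illustrative, and the inputs are the rounded values from the text rather than measured data.

```python
# Rounded values taken from the text; names are illustrative only.
GENOME_SIZE_BP = 3_000_000_000   # roughly 3 billion bp
BP_PER_DIFFERENCE = 1000         # about 1 differing position per 1000 bp
CHROMOSOMES_PER_PARENT = 23      # one complete set from each parent


def expected_differences(genome_size_bp: int, bp_per_difference: int) -> int:
    """Approximate number of positions at which two genomes differ."""
    return genome_size_bp // bp_per_difference


def somatic_chromosome_count(per_parent: int) -> int:
    """Total chromosomes in a somatic cell: one set from each of two parents."""
    return 2 * per_parent


differences = expected_differences(GENOME_SIZE_BP, BP_PER_DIFFERENCE)
chromosomes = somatic_chromosome_count(CHROMOSOMES_PER_PARENT)

print(f"Expected differing positions: {differences:,}")
print(f"Chromosomes per somatic cell: {chromosomes}")
```

Because both inputs are order-of-magnitude estimates, the result should be read as "a few million" rather than as a precise count.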