Proteome-wide Analysis of Non-synonymous Single-nucleotide Variations in Active Sites of Human Proteins
An enzyme's active site is essential to normal protein activity, so any disruption at this site may lead to dysfunction and disease. Non-synonymous single-nucleotide variations (nsSNVs), which alter the amino acid sequence through a single-nucleotide substitution in a codon, are one type of genomic change that can alter the active site. When this occurs, enzyme activity is assumed to change because the site is critical to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active-site-impacting nsSNVs in the human genome, and we search for all pathways associated with this dataset to assess the likely consequences. We find 934 unique nsSNVs that alter the active sites of 559 proteins. Comparing the nsSNV-impacted active site residues to all possible proteomic active site residues shows an over-representation of arginine and an under-representation of cysteine, phenylalanine and tyrosine, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over- or under-represented in the active site nsSNV dataset, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation-substrate/product pairs that can be used in targeted metabolomics experiments to assay for the presence, and quantify the effects, of specific variations. Additionally, we find a significant prevalence of aspartic acid to histidine variation in 8 proteins associated with 9 diseases, including glycogen storage diseases, lacrimo-auriculo-dento-digital syndrome, Parkinson's disease and several cancers.
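The residue over- and under-representation comparison described above can be illustrated with a small sketch. The abstract does not state which statistical test was used, so the one-sided hypergeometric test below is an assumption, and the function names and all counts are hypothetical placeholders rather than values from the study.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n residues without replacement from N,
    of which K are the residue type of interest."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def representation_p_values(k, N, K, n):
    """Return (p_over, p_under) for observing k residues of one type.

    N: all annotated active site residues in the proteome (background)
    K: background residues of the type of interest (e.g. arginine)
    n: active site residues affected by an nsSNV
    k: affected residues of the type of interest
    """
    lowest = max(0, n - (N - K))   # smallest possible value of X
    highest = min(n, K)            # largest possible value of X
    # Upper tail P(X >= k): small value suggests over-representation.
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    # Lower tail P(X <= k): small value suggests under-representation.
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under


# Hypothetical counts for illustration only (NOT taken from the study):
# 1000 background active site residues, 100 of them arginine;
# 50 residues hit by an nsSNV, 12 of them arginine (5 expected by chance).
p_over, p_under = representation_p_values(k=12, N=1000, K=100, n=50)
```

With these placeholder counts, a small `p_over` would indicate that arginine is varied more often than its background frequency predicts. In a real analysis the test would be repeated for each of the 20 amino acids, so a multiple-testing correction would also be needed.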