Loss and Gain of N-linked Glycosylation Sequons in Cancers Open Access
Abstract of Thesis
Years of sequence feature curation have culminated in a rich resource of information about genes and proteins, including positions where variants occur and positions where glycosylation is known or predicted to occur. Commonly used databases that house this information include UniProtKB/Swiss-Prot, NCBI-CDD, and RefSeq. Despite the availability of these data, few efforts have integrated mutation data, the effect of such variations on N-linked glycosylation sequons, the cancer-relatedness of variants at these positions, and supporting evidence from literature mining. We propose to integrate non-synonymous single nucleotide variations (nsSNVs) in cancers that are of interest to the glycobiology community based on the function of the genes in which they occur. Our previous study showed that 1,091 proteins of the human glycoproteome have either loss or gain of a glycosylation sequon (LOG or GOG, respectively) due to known nsSNVs. We have now expanded that study to include known somatic mutations in various cancer types. We collected all the real and possible N-linked glycosylation sequons (NLGs) in the human proteome and mapped them to the polymorphism and cancer mutation data. Of 20,199 human proteins, 15,314 contain a total of 59,220 Asn-X-Ser/Thr sequons, where X is not proline, and 4,598 human proteins are annotated as glycoproteins in UniProtKB/Swiss-Prot. We identified 59,341 non-redundant NLGs in the human proteome by three different methods, 16,125 of which are verified, high-confidence sequons. Among distinct somatic mutations associated with three or more cancer types in membrane or secreted proteins, 42 NLGs are abolished by 42 mutations in 41 proteins (LOG), and 46 sequons are created by 46 mutations in 45 proteins (GOG).
This work is expected to identify potential N-glycosylation disease biomarkers for further studies.
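As an illustration of the sequon rule stated in the abstract (Asn-X-Ser/Thr, where X is not proline), the following minimal Python sketch scans a protein sequence for sequons and reports whether a single amino acid substitution abolishes one (LOG) or creates one (GOG). It is not the pipeline used in the thesis: the function names `find_sequons` and `classify_variant` and the toy sequence are hypothetical, and the sketch ignores everything beyond the sequence pattern itself, such as subcellular location or experimental verification.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq):
    """Return the 1-based positions of the Asn residue of every sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(seq)}


def classify_variant(seq, pos, alt):
    """Compare sequons before and after substituting residue `pos` (1-based) with `alt`.

    Returns the Asn positions of sequons lost (LOG) and gained (GOG).
    """
    mutated = seq[:pos - 1] + alt + seq[pos:]
    before, after = find_sequons(seq), find_sequons(mutated)
    return {"LOG": sorted(before - after), "GOG": sorted(after - before)}


if __name__ == "__main__":
    toy = "MKNGTAANPSQNAT"  # hypothetical sequence, not a real protein
    print(find_sequons(toy))
    print(classify_variant(toy, 5, "A"))  # Thr5 -> Ala
    print(classify_variant(toy, 9, "A"))  # Pro9 -> Ala
```

In the toy sequence, replacing Thr5 removes the sequon anchored at Asn3, while replacing Pro9 turns Asn8-Pro9-Ser10 into a valid sequon. Comparing the sets of sequon positions before and after the substitution, rather than inspecting only the mutated residue, catches changes at any of the three sequon positions.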