A first step across the omics gap?

In a recent blog we talked about the omics gap and the problems it is causing across life sciences and in drug development. Well, it seems that some of the super smart people out there have done more than sit and ponder the problem: they have set about finding a solution by integrating some of our different analytical tools so that we can really get to the bottom of things.

The C-HPP (Chromosome Centric Human Proteome Project) aims to catalogue the proteins expressed by our 20,300 or so genes in the context of the chromosomal gene sequences, to improve collaboration with molecular biologists. An important part of this challenge is the missing proteins, which have been estimated to account for roughly 20% of the total coding proteins. The gaps in the proteome include uncharacterised products of known protein-coding genes, variants generated by alternative splicing and coding SNPs, and the lack of a comprehensive characterisation of major post-translational modifications (PTMs).

ENCODE, short for the Encyclopedia of DNA Elements, is an amazing genomic resource at UCSC that has been ongoing since 2002. It offers one way of trying to make sense of the masses of DNA sequencing and expression data that have been rapidly accumulating out there in the ether. You can find more about ENCODE here.

The team working on chromosome 19 as part of C-HPP have now worked out ways of combining the data from the transcriptomic resource of ENCODE with the proteomic technology pillars (mass spec, antibodies and bioinformatics) to create a powerful strategy for discovering and identifying missing proteins and novel proteoforms. This all-encompassing approach can be seen in their paper published in the Journal of Proteome Research.

Using a model cell system of glioma stem cells (GSCs), they analysed the gene activity of chromosome 19 in a number of GSC lines. This uncovered the differential regulation of over 200 genes between different subtypes of GSCs. The transcriptomic data were compared to the neXtProt database, which identified 15 uncertain or predicted proteins. Transcripts of 41 zinc-finger proteins and all 8 olfactory receptors were also identified, and the team will continue to examine whether any of these, or the other 290 transcripts listed as transcript only within the neXtProt database, are actually present in protein form.
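To make the comparison step a little more concrete, here is a minimal sketch of how detected transcripts might be sorted by the level of protein evidence recorded for them. It is not the team's actual pipeline: the function name, the gene identifiers and the toy evidence table are all hypothetical, and the labels simply mirror the kind of protein-existence categories (protein level, transcript level, predicted, uncertain) that a resource like neXtProt assigns.

```python
from collections import defaultdict


def group_by_evidence(detected_genes, evidence_by_gene):
    """Group detected genes by evidence level, skipping those already
    confirmed at the protein level.

    detected_genes: iterable of gene identifiers seen in the RNA-seq data.
    evidence_by_gene: dict mapping gene identifier -> evidence label
                      (hypothetical labels, assumed for illustration).
    """
    groups = defaultdict(list)
    for gene in sorted(detected_genes):
        # Genes absent from the table get their own bucket.
        level = evidence_by_gene.get(gene, "not in database")
        if level != "protein level":
            groups[level].append(gene)
    return dict(groups)


# Toy data: every identifier below is made up.
evidence = {
    "GENE_A": "protein level",
    "GENE_B": "transcript level",
    "GENE_C": "predicted",
    "GENE_D": "uncertain",
}
detected = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}

candidates = group_by_evidence(detected, evidence)
for level, genes in candidates.items():
    print(level, genes)
```

Anything left in the output is a transcript that was seen in the cells but still lacks firm protein-level evidence, which is exactly the kind of candidate worth following up.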

They translated their data from bioinformatic searches of ENCODE into a searchable database for proteomics, yielding 80 previously unpredicted proteins, 3 of which were from chromosome 19: 1 was the result of a fusion between a known chromosome 19 exon and an unknown exon, and 2 were the result of the fusion of two unknown exons. These are all now being progressed as high priority for antibody targeting, so that we can understand more about their biological roles through the creation of protein arrays, sensitive quantitation in mass spectrometry studies and immunohistochemistry analysis.
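The general idea of turning transcript sequences into something a proteomics search engine can use can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the method used in the paper: it translates the forward strand in three reading frames with the standard genetic code, keeps stop-free stretches above a minimum length, and writes them out as FASTA text. The function names, the header format and the `min_len` parameter are all invented for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate a nucleotide sequence in one forward frame ('*' marks a stop)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases are translated as 'X'.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def candidate_peptides(seq, min_len=10):
    """Yield (frame, peptide) for stop-free stretches of at least min_len residues."""
    for frame in range(3):
        for chunk in translate(seq, frame).split("*"):
            if len(chunk) >= min_len:
                yield frame, chunk


def to_fasta(transcripts, min_len=10):
    """Build FASTA text from a dict of transcript name -> nucleotide sequence."""
    lines = []
    for name, seq in transcripts.items():
        for n, (frame, pep) in enumerate(candidate_peptides(seq, min_len), 1):
            lines.append(f">{name}|frame{frame}|orf{n}")  # hypothetical header format
            lines.append(pep)
    return "\n".join(lines)


# Toy transcript, far shorter than anything real.
print(to_fasta({"toy_transcript": "ATGGCCTAA"}, min_len=2))
```

A real workflow would also need to handle the reverse strand, splice junctions and redundancy between entries, but the principle is the same: spectra that match nothing in the standard protein database get a second chance against sequences predicted from the transcripts.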

This integration of RNA-seq and proteomic data by C-HPP has demonstrated that novel spliceforms detected as transcripts are in fact translated into proteins. These previously unidentified proteins may hold much promise for pharmaceutical companies as targets in disease processes.

Within proteomics, or any other field where we're making thousands, millions or billions of observations, false discoveries have always been a problem. But false discoveries can only be scored against what we currently know to be true. The data coming out of C-HPP may help to refine our categorisation of true and false discoveries. It seems that a whole host of observations that we were previously unable to explain, and that were scored as false in genomics, may be explained by unmatched spectra from proteomics runs.

So while it seems that we are incredibly complex creatures and any one technique can only ever peek at a small fraction of the biological picture, integrating our technologies in this way may help us to drill a little deeper and uncover a little more in our quest to understand the way we function.