The Human Proteome Re-mapped

The newly published tissue-based map of the human proteome includes a catalogue of 13 million annotated images from 32 different tissues, alongside RNA and protein expression data covering a total of 44 tissues.

The team of researchers, led by Professor Mathias Uhlén from the Royal Institute of Technology in Stockholm, Sweden, probed samples of 32 tissue types with a panel of antibodies against 16,975 human proteins: 24,456 antibodies generated ‘in-house’ as part of the Human Protein Atlas (HPA) project and 3,572 externally supplied antibodies.

Using this antibody-based approach, they generated a compendium of immunohistochemistry (IHC) images, which were annotated by a team of pathologists from Mumbai, India. The database shows the intracellular localisation of each of these proteins and is freely available online as an interactive resource, presented as part of the HPA project.

Taken together, this version of the proteome shows us not just which proteins are expressed, at what levels and in which tissues, but also where each protein is located in the cell, and whether it is secreted, soluble or membrane-bound. This complements the previously published human proteome maps, which used mass spectrometry to analyse protein levels across different tissues. While mass spectrometry can identify proteins and quantify their levels, the IHC approach taken in the new paper provides cellular resolution within a tissue, which mass spectrometry cannot.

Uhlén and colleagues found that approximately 44% of all proteins expressed in any tissue were ubiquitous. These include proteins involved in metabolism, cell structure, transcription, translation and replication, and their ubiquitous expression is consistent with their roles as housekeeping proteins. RNA transcripts from the different tissue samples showed that tissue-enriched transcripts typically made up only a small fraction of the total, but in the pancreas they accounted for 70% of all transcripts, and in the liver for 35%.

Of the almost 17,000 proteins analysed, only 2% of the ubiquitously expressed proteins were supported by RNA data alone, yet antibody-based evidence was lacking for 18% of the proteins elevated in specific tissues, demonstrating how many issues can arise with antibodies.

Of the proteins analysed, the authors show that approximately 3,000 are secreted from cells and a further 5,500 are localised to the cells' membrane systems. Comparing these with the 618 proteins targeted by currently approved drugs, they show that 70% of current drug targets are secreted or membrane-bound proteins, and that 30% are expressed ubiquitously across all tissues and organs, which may help to explain some drug side effects.

In addition to the 44 primary tissues analysed, the Swedish team also looked at 46 different cell lines. In line with previous studies suggesting that cell lines dedifferentiate in vitro, many of the tissue-enriched genes were switched off in the cell lines. This has implications for how accurately these ‘model’ cells can be used to infer the basic biology of specific tissues or the outcomes of drug treatment.

Combining the data from this study with the UniProt database and the recent proteogenomic mass spectrometry study of the human proteome, the authors show that 17,132 proteins from the putative proteome have been identified by at least one of the three methods, and 13,841 have evidence from at least two of them. A further 2,546 proteins are supported at the RNA level, either by UniProt annotation or by the RNA-sequencing data from this latest tissue-based proteome. So, of the 20,356 putative protein-coding genes in the human genome, only 677 (3.3%) lack any experimental evidence of expression. Many of these 677 genes have since been removed from later Ensembl releases, or are now thought to be non-coding and are likely to be removed in future, suggesting that we may be close to revealing the whole human proteome.
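The evidence tally above comes down to simple arithmetic. As a minimal sketch, the Python below re-expresses each reported count as a share of the 20,356 putative protein-coding genes, using only the figures quoted in the paragraph. The names `EVIDENCE_COUNTS` and `share_percent` are illustrative only. The categories overlap (the 13,841 proteins with two lines of evidence are a subset of the 17,132), so the percentages are not meant to add up to 100%.

```python
# Counts as quoted in the article; names are illustrative only.
TOTAL_PUTATIVE_GENES = 20_356

EVIDENCE_COUNTS = {
    "protein evidence from at least one of the three methods": 17_132,
    "protein evidence from at least two of the methods": 13_841,
    "additional RNA-level evidence only": 2_546,
    "no experimental evidence": 677,
}


def share_percent(count: int, total: int = TOTAL_PUTATIVE_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # Print each evidence category with its share of the putative genes.
    # Note: categories overlap, so the shares do not sum to 100%.
    for label, count in EVIDENCE_COUNTS.items():
        print(f"{label}: {count:,} genes ({share_percent(count)}%)")
```

The last line printed reproduces the 3.3% figure for genes lacking any experimental evidence.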

The ultimate goal in exploring the proteome is to define the structure, function, expression, localisation and interactions of all human proteins. The next stages of the project, they say, will focus on the ‘missing’ 677 proteins, in an attempt to produce a definitive list of the human proteome, and on mapping the isoform proteome to better understand the role of this diverse proteome across different tissues and organs.

The HPA project is doing an amazing job of generating and validating antibodies to the entire human proteome. However, according to Fredrik Ponten, the HPA’s Vice Programme Director, well over 50,000 antibodies were generated to obtain the 24,456 used in this study, with approximately 50% failing the project's rigorous validation.

Yet even the antibodies that pass this validation process are not all of equally high quality: if nothing is known about a particular protein besides its sequence, and the corresponding antibody produces a distinct signal in IHC, the project team will approve the antibody provided no evidence exists to dispute it.

Furthermore, although the antibodies used to build the HPA are well validated and serve as an excellent resource for medical and biological science, stocks will not last forever. With batch-to-batch variation acknowledged as a significant problem in the life sciences, validating antibody reagents will be a constant game of catch-up.

Affimer technology is an alternative to antibodies, offering the specificity and affinity you would associate with antibodies, but without the batch-to-batch variation and long generation times. We can even produce Affimer proteins against targets where antibodies fail, and the small size of the Affimer molecule allows it to penetrate tissues more effectively in applications such as IHC, making Affimer proteins an ideal affinity reagent.