In the last two blogs in this series, we looked at the human genome, and tried to define the subunits of the human proteome. Having come up with a definition for these subunits (a proteome subunit is a protein or a complex made up of two or more proteins), we figured out that there might be a billion subunits in the human proteome alone. And for this we ignored the additional layer of complexity that is post-translational modification (PTM), which means that each protein might exist in any one of a multiplicity of chemically modified forms.
In this blog, we’ll first look at how much more complex this might make proteomic analysis, and then (probably) dismiss the whole post-translational modification thing as a red herring, at least in terms of the complexity of proteomic analysis.
There are many different chemical modifications that can be added to proteins post-translationally, including phosphorylation, methylation, acetylation, hydroxylation and glycosylation. Some of these are mutually exclusive because they compete with each other for the same amino acid side chain. On the other hand, many of them are combinatorial: they can each be added independently to the same protein, and sometimes even to the same amino acid side chain! In fact there is no consensus on how large the post-translational repertoire actually is, not least because of the dearth of tools to detect PTMs (there are still no antibodies that reliably recognise phospho-serine and phospho-threonine, two of the earliest PTMs to be described) and because it is unclear what proportion of a PTM is lost during mass spec.
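To get a feel for how quickly combinatorial modification multiplies the number of forms a single protein can take, here is a minimal Python sketch. It rests on simplifying assumptions that are mine rather than anything measured: every modifiable residue is treated as independent of the others, and each residue is either unmodified or carries exactly one of its competing modifications. The function name and the example protein are hypothetical.

```python
from math import prod


def count_modified_forms(options_per_site):
    """Count the distinct modified forms of one protein.

    options_per_site: for each modifiable residue, the number of mutually
    exclusive modifications that compete for it. Assumption: each residue
    is either unmodified or carries exactly one of its options, and the
    residues are modified independently of one another.
    """
    # Each residue contributes (options + 1) states: its options plus "unmodified".
    return prod(k + 1 for k in options_per_site)


# Hypothetical protein: three residues that can each be phosphorylated, plus
# one lysine that can be either acetylated or methylated (but not both).
example_sites = [1, 1, 1, 2]
forms = count_modified_forms(example_sites)
print(forms)  # 2 * 2 * 2 * 3 = 24
```

Even this toy protein with four modifiable residues has 24 possible forms, and the count at least doubles with every extra site. Real proteins will deviate from this because modifications at different sites are often not independent.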
Khoury et al (2011) mined the UniProt protein database for experimentally-proven and predicted-but-unproven PTMs on all proteins in the database (i.e. not just proteins encoded by the human genome). They found more than 300 different experimentally confirmed PTMs in UniProt (counting phospho-serine separately from phospho-threonine or phospho-tyrosine) with 47,673 occurrences on 66,260 different proteins.
When looking at all of UniProt, 27 PTMs account for 86% of the experimentally-described instances of a PTM, with protein phosphorylation alone accounting for 68% of all described PTMs. Surprisingly (at least to someone who has been raised on a protein phosphorylation diet) the proportion drops when looking at mammalian proteins, where only 53% of modifications are phosphorylations. Less surprisingly, perhaps, in mammalian proteins phosphorylation on serine (19,871) vastly outweighs phosphorylation on threonine (3,759) and tyrosine (1,871). The number of mammalian proteins with a detectable ubiquitylation is only 703, with 347 instances of sumoylation. Given the importance of ubiquitylation in protein turnover, and the fact that most proteins in yeast (the prototypical eukaryotic cell, predicted to be highly similar at the molecular level to mammalian cells) are ubiquitylated, one suspects these may be underestimates. Indeed, presumably after Khoury et al mined UniProt, two papers described 19,000 ubiquitylation sites in 5,000 human proteins and 11,000 sites in 4,700 human proteins, respectively (Kim et al, 2011 and Wagner et al, 2011). Updated versions of Khoury et al’s findings can be found here.
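For readers who like to see the arithmetic, the short Python sketch below turns the counts quoted above into proportions. It uses only the numbers given in this post; the variable names are mine, and the "sites per protein" figures are simple averages that say nothing about how sites are actually distributed across proteins.

```python
# Mammalian phosphorylation counts quoted above (Khoury et al, 2011).
phospho_counts = {"serine": 19871, "threonine": 3759, "tyrosine": 1871}

total_phospho = sum(phospho_counts.values())
phospho_shares = {
    residue: count / total_phospho for residue, count in phospho_counts.items()
}

for residue, share in phospho_shares.items():
    print(f"phospho-{residue}: {share:.1%} of mammalian phosphorylations")

# Average ubiquitylation sites per protein in the two later studies,
# using the rounded figures quoted above.
ubiquitylation_studies = {
    "Kim et al 2011": (19000, 5000),     # (sites, proteins)
    "Wagner et al 2011": (11000, 4700),
}
sites_per_protein = {
    study: sites / proteins
    for study, (sites, proteins) in ubiquitylation_studies.items()
}

for study, average in sites_per_protein.items():
    print(f"{study}: about {average:.1f} ubiquitylation sites per protein")
```

On these figures serine carries roughly three quarters of the mammalian phosphorylations, and both ubiquitylation studies report several thousand modified human proteins, compared with the 703 mammalian proteins annotated at the time of the original survey.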
As Khoury et al (2011) discuss, these numbers may be wildly misleading. Protein phosphorylation dominates the list, which might reflect its importance in cellular regulation, but this could be an artefact: phosphorylation is easy to detect, whereas other PTMs aren’t, and so these may be abundant, even predominant, but uncounted. In their own words: “it may be important to focus substantial efforts on developing new methods for identifying PTMs that are not widely annotated (…), as they may have a major undiscovered impact on the cell”. Ubiquitylation is a case in point here. The fact that ubiquitin forms long chains is likely to make it difficult to quantify by mass spec, while the lack of antibodies specific for each of the 7 potential linkages that ubiquitin can make (Komander and Rape, 2012) means that their prevalence is unknown. This highlights the fact that the proteome-wide search for post-translationally-modified proteins remains a challenging task, even when we think we have a good understanding of the PTM itself.
Back then to the suggestion that this might all be a red herring. Well, in the previous blog we defined the subunits of the proteome as proteins and protein complexes. The reason why post-translational modification might be a red herring here is that what the modifications actually do is either stabilise or destabilise the modified protein, or drive it to form or to leave a particular complex. In other words, if we have the tools to study protein complexes, we can understand the proteome. Of course it would be great if we could study both PTM and complex formation at the same time, and correlate each PTM with a protein’s response. But without the tools, and with the huge number of chemical modifications known (87,308 according to Khoury et al, 2011) and predicted (234,938, Khoury et al, 2011), and the billion protein complexes we predicted in the last post, it will take some time.