In the first blog in this series, we looked at the complexity of the human proteome compared with the relative simplicity of the human genome (which, of course, is not really that simple). We touched on the units of each ‘-ome’, genes and proteins, and on the way proteomic complexity grows as those units are made and then modified. In this blog, we’ll question whether proteins really are the subunits of the proteome, and ask what ‘proteomics’ should be aiming to achieve.
Using mass spectrometry to detect and measure proteins in cell lysates, two groups showed that U2OS and HeLa cells each express roughly 10,000 proteins (Beck et al, 2011; Nagaraj et al, 2011). This is a pleasingly small number: we can realistically plan to study this many proteins in a single experiment. But it is also a very large number, because it grows exponentially once combinatorial post-translational modifications and combinatorial protein-protein interactions are taken into account. One major unanswered question is: just how many protein complexes are there in a cell?
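To make the ‘exponential’ growth concrete: if a protein has n sites that can each be either modified or unmodified, it can exist in up to 2^n distinct forms. The short Python sketch below assumes, purely for illustration, that the sites are independent and that each has a fixed number of possible states. Real modification patterns are far more constrained, so treat the result as a ceiling rather than an estimate. The function name `max_proteoforms` is hypothetical.

```python
def max_proteoforms(n_sites: int, states_per_site: int = 2) -> int:
    """Upper bound on distinct modification patterns for a single protein.

    Hypothetical model: each of `n_sites` sites independently takes one of
    `states_per_site` states (2 = modified or unmodified), so the number of
    patterns is states_per_site ** n_sites.
    """
    return states_per_site ** n_sites


# Each extra two-state site doubles the number of possible patterns.
for n in (1, 5, 10, 20):
    print(f"{n:>2} sites -> up to {max_proteoforms(n):,} forms")
```

With 10 two-state sites this gives 1,024 possible forms, and with 20 sites just over a million, before any protein-protein interactions are considered.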
Starting from just over 5,500 clones expressing human proteins, Stelzl et al (2005) found that 1,705 of them could bind at least one other protein in the same set, and that together these 1,705 proteins made 3,186 pairwise interactions. Of the 1,705 proteins, nearly half (47%, or 804) bind only one partner, while 24 (1.4%) have at least 30 partners; these are designated ‘hubs’ and probably play essential roles in cell biology.
What would this mean for proteomics? If Nagaraj et al and Beck et al are right that a cell contains around 10,000 proteins, and if the proportions found by Stelzl et al hold across all of them, then about 4,700 would bind just one other protein, and about 140 would be hubs with 30 or more potential partners. Proteins will undoubtedly behave differently depending on which partners they bind, so understanding the proteome means first cataloguing, then measuring, not just the 10,000 proteins themselves but also all of the protein complexes they form. In other words, the subunits of the proteome are not simply proteins: each protein and each protein complex is a proteome subunit.
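The extrapolation above is simple proportional scaling, sketched here in Python. The counts are the ones quoted from Stelzl et al (2005) and the proteome size is the roughly 10,000 proteins reported by Beck et al and Nagaraj et al. The key assumption, stated in the code, is that proportions measured among the 1,705 proteins with at least one detected partner carry over unchanged to all 10,000 proteins; the helper name `extrapolate` is hypothetical, and it rounds down to whole proteins.

```python
# Counts reported by Stelzl et al (2005), as quoted in the text.
INTERACTING_PROTEINS = 1_705  # proteins with at least one partner
SINGLE_PARTNER = 804          # proteins with exactly one partner
HUBS = 24                     # proteins with at least 30 partners

# Proteome size reported for U2OS and HeLa cells (Beck et al; Nagaraj et al).
PROTEOME_SIZE = 10_000


def extrapolate(count: int, sample: int, population: int) -> int:
    """Scale a count from a sample to a population, rounding down.

    Assumes the sample proportion holds unchanged in the population.
    """
    return count * population // sample


print(f"single-partner share: {SINGLE_PARTNER / INTERACTING_PROTEINS:.1%}")
print(f"hub share:            {HUBS / INTERACTING_PROTEINS:.1%}")

est_single = extrapolate(SINGLE_PARTNER, INTERACTING_PROTEINS, PROTEOME_SIZE)
est_hubs = extrapolate(HUBS, INTERACTING_PROTEINS, PROTEOME_SIZE)
print(est_single, est_hubs)  # 4715 140
```

Rounding down gives 4,715 single-partner proteins and 140 hubs; the 4,700 quoted in the text is the same estimate rounded to the nearest hundred.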
So how many proteome subunits are there? Doing some quick sums with the data from the three papers mentioned above, we need to look at 2 × 4,715 (about 9,400) proteome subunits formed by the single-partner proteins, each either on its own or in its pair; and roughly 140 billion proteome subunits formed by the more promiscuous hub proteins, picking any 2 or more proteins out of 30 for each of the 140 hubs. (Thanks to Avacta’s Kurt Baldwin for the maths!)
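The counts above can be checked directly. The Python sketch below reads ‘picking any 2 or more proteins out of 30’ literally, as the sum of C(30, k) for k = 2 to 30, which equals 2^30 - 31, and multiplies by the 140 extrapolated hubs. It assumes every combination of a hub’s partners is possible, which is exactly what makes this an upper limit. Counted this way, each hub contributes about 1.07 billion combinations and the total is about 150 billion, a little above the 140 billion quoted; the quoted figure is consistent with treating 2^30 as roughly one billion, so both numbers are best read as order-of-magnitude estimates.

```python
from math import comb

HUB_PARTNERS = 30  # hubs have at least 30 partners (Stelzl et al, 2005)
N_HUBS = 140       # extrapolated number of hubs in a 10,000-protein cell
N_SINGLE = 4_715   # extrapolated number of single-partner proteins

# Each single-partner protein can exist alone or bound to its one partner.
single_subunits = 2 * N_SINGLE

# For each hub, pick any 2 or more proteins out of its 30 partners:
# the sum of C(30, k) for k = 2..30, which equals 2**30 - 30 - 1.
per_hub = sum(comb(HUB_PARTNERS, k) for k in range(2, HUB_PARTNERS + 1))
hub_subunits = N_HUBS * per_hub

print(single_subunits)        # 9430
print(per_hub)                # 1073741793
print(f"{hub_subunits:.3e}")  # 1.503e+11
```

The single-partner term is negligible next to the hub term, which is why the total is dominated by the handful of highly connected proteins.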
In one sense this is probably an upper limit: not every protein will be able to interact with all of its partners at once, because some partners compete for the same binding surface. In another sense it is an underestimate, because it ignores the myriad extra possibilities introduced by alternative splicing and post-translational chemical modifications. Both are difficult to follow with bottom-up mass spectrometry, in which every protein is digested into peptides: this makes alternative splice variants hard to detect, and not all peptides fly predictably (or possibly at all) once they are post-translationally modified.
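The effect of competing partners can be illustrated with a toy model. Suppose, hypothetically, that a hub’s 30 partners are grouped by the binding surface they use, and that only one partner can occupy a given surface at a time. Counting complexes as the hub plus at least one partner, the number of possibilities drops from 2^30 - 1 to the product of (partners per surface + 1), minus 1. The even three-way split and the function names are illustrative assumptions, not data from the papers cited.

```python
from math import prod


def unconstrained_partner_sets(n_partners: int) -> int:
    """Non-empty sets of partners if every combination could bind at once."""
    return 2 ** n_partners - 1


def compatible_partner_sets(surface_sizes: list[int]) -> int:
    """Non-empty sets of partners that can bind a hub simultaneously.

    Hypothetical model: partners are grouped by the binding surface they use
    on the hub, and at most one partner can occupy each surface at a time.
    For each surface we either leave it empty or pick one of its partners,
    hence the product of (n + 1), minus 1 for the all-empty case.
    """
    return prod(n + 1 for n in surface_sizes) - 1


# Illustrative only: 30 partners split evenly across 3 binding surfaces.
print(unconstrained_partner_sets(30))         # 1073741823
print(compatible_partner_sets([10, 10, 10]))  # 1330
# With one partner per surface there is no competition, so the counts agree.
print(compatible_partner_sets([1] * 30))      # 1073741823
```

Even this crude model shows how quickly competition for binding surfaces shrinks the combinatorial count, which is why the totals above are best treated as ceilings.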
This represents a major challenge for mass spectrometry in terms of sensitivity and specificity, since we need not only to detect the protein molecules but also to identify the protein complexes they form. This relatively new challenge is being addressed by researchers such as Carol Robinson and Matthias Mann. It is also a challenge for more traditional technologies. For example, we could aim to build immunoassays for proteins and complexes, as the Human Protein Atlas is aiming to do. But this too is proving difficult, not only because it requires a renewable supply of mono-specific reagents, but also because the assays must allow the antibodies to dissect each protein complex into the individual proteins that make it up.
Part III coming soon!