Charting the Proteome, part 4: What do we need to do?

By Paul Ko Ferrigno

In the previous blog, we looked at how many protein complexes there might be in a single cell if just 10,000 proteins are expressed, as suggested by Nagaraj et al and Beck et al. The answer was of the order of 1 billion, not taking splice variants or post-translationally modified protein isoforms into account. The next questions are, first, how many copies of each complex are there, and second, how quickly do we need to be able to look at them? If we simply sum the abundances of all the individual proteins quantified by Nagaraj et al, we arrive at an estimate of 2 billion protein molecules in each HeLa cell. Some of these proteins are expressed at very high levels (20 million copies per cell), although the median expression level is 18,000 copies per cell. Nagaraj et al also suggest that the 40 most abundant proteins make up 25% of the proteome by mass, while the 600 most abundant proteins make up 75% of the total protein mass in a HeLa cell (Nagaraj et al, 2011).

Can we sense-check these numbers? Many of these abundant proteins will be ribosomal components, and there are approximately 10 million ribosomes in a metabolically active cell. According to this excellent piece, each ribosome is capable of joining 200 amino acids per minute. Assuming 1000 amino acids for a protein of average size (say, 100 kDa), that means a single ribosome can turn out a brand new protein on average every 5 minutes, or 288 proteins in a day. Assuming also a metabolically active, proliferating HeLa or U2os cell with 10 million ribosomes and a cell division time of 24 hours, that’s over 2.8 billion new protein molecules per cell, per day. Given that each protein has a half-life shorter than the life of the cell (the half-life of a typical protein inside a cell ranges from 45 minutes to 22.5 hours; Eden et al, 2011), these numbers do make sense, and we can agree that cells are making roughly one and a half times as many protein molecules over the course of a day as they harbour at any given instant (2.88 billion made against about 2 billion present). This probably also means that the complexes themselves are dynamic. For simple housekeeping they must be able to disassemble (either partially or completely) so that new protein molecules can be inserted in place of older ones. We know that mechanisms exist for this: for example, the paradigm of dynamic protein phosphorylation has long told us that the reversible phosphorylation of particular residues on a protein will serve to change the conformation and/or behaviour of that protein, presumably including its insertion into or removal from protein complexes.
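
The sense-check above is easy to reproduce. Here is a minimal Python sketch that uses only the round figures quoted in the text; the variable and function names are our own, and it assumes (generously) that every ribosome is translating continuously.

```python
# Back-of-envelope check of daily protein synthesis per cell.
# All inputs are the round figures quoted in the text; names are illustrative.

AA_PER_MINUTE = 200              # elongation rate per ribosome
AA_PER_PROTEIN = 1000            # "average" protein of about 100 kDa
RIBOSOMES_PER_CELL = 10_000_000
MINUTES_PER_DAY = 24 * 60
STANDING_POOL = 2_000_000_000    # protein molecules per HeLa cell (summed abundances)


def proteins_made_per_day(ribosomes=RIBOSOMES_PER_CELL):
    """Proteins synthesised per cell per day, assuming every ribosome
    translates continuously (an idealisation)."""
    minutes_per_protein = AA_PER_PROTEIN / AA_PER_MINUTE   # 5 minutes
    per_ribosome = MINUTES_PER_DAY / minutes_per_protein   # 288 per day
    return ribosomes * per_ribosome


made = proteins_made_per_day()
print(f"new proteins per day: {made:,.0f}")
print(f"ratio to standing pool: {made / STANDING_POOL:.2f}")
```

Changing any one input (a slower elongation rate, fewer active ribosomes) shows how sensitive the daily total is to these rough assumptions.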

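If we assume that there are a billion protein complexes that each disassemble and reassemble just twice in the 24 hour life cycle of a cell, then more than 23,000 complexes are being modified every second, in every cell. (A billion complexes divided by the 86,400 seconds in a day gives about 11,574 per second for a single cycle.)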

Another way to look at this is to start from the median copy number of a protein, which is 18,000 molecules per cell. Knowing that half of these molecules will be turned over at least once in each cell cycle, that’s 9000 turnover events in every 86,400 seconds. In other words, for any given typical protein, one complex containing it turns over roughly every 10 seconds (86,400 / 9000 = 9.6). So to be able to monitor the life of a proteomic subunit in detail, we need to be working with a technology that is capable of taking snapshots faster than once every 10 seconds.
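
Both rate estimates above come down to dividing a count by the number of seconds in a day. The sketch below is illustrative only: the function names are ours, and the inputs are the figures from the text (a billion complexes, a median of 18,000 copies, half of them turned over per cycle).

```python
# Rough remodelling rates over a 24 hour cell cycle.
# Function names are illustrative; inputs are the figures quoted in the text.

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def events_per_second(n_complexes, cycles_per_day):
    """Complex remodelling events per second across the whole cell."""
    return n_complexes * cycles_per_day / SECONDS_PER_DAY


def seconds_between_turnovers(copy_number, fraction_turned_over=0.5):
    """Average gap between turnover events for one protein species."""
    events = copy_number * fraction_turned_over
    return SECONDS_PER_DAY / events


for cycles in (1, 2):
    rate = events_per_second(1_000_000_000, cycles)
    print(f"{cycles} cycle(s) per day: {rate:,.0f} complexes per second")

print(f"median protein: one turnover every {seconds_between_turnovers(18_000):.1f} s")
```

For a much more abundant protein the gap shrinks in proportion, so the 10 second figure is best read as a requirement for a typical protein rather than a universal one.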

So a complete picture of the life of a cell, and the relative contribution to changes in protein number over time, will probably need to look at some 2 billion protein molecules, in perhaps a billion distinct protein complexes (see previous blog), every 10 seconds over a 24 hour period. Daunting.
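
To put a number on "daunting", here is a small sketch of how many observations that sampling scheme implies. It assumes one observation per complex per snapshot, which is our simplification; the constant and function names are illustrative.

```python
# Scale of the measurement problem: snapshots every 10 seconds for 24 hours.
# Assumes one observation per complex per snapshot (a simplification).

SECONDS_PER_DAY = 24 * 60 * 60
SNAPSHOT_INTERVAL_S = 10
N_COMPLEXES = 1_000_000_000


def snapshots_per_day(interval_s=SNAPSHOT_INTERVAL_S):
    """Number of snapshots taken in one 24 hour period."""
    return SECONDS_PER_DAY // interval_s


def complex_observations(n_complexes=N_COMPLEXES, interval_s=SNAPSHOT_INTERVAL_S):
    """Total complex-level observations across one 24 hour period."""
    return n_complexes * snapshots_per_day(interval_s)


print(f"snapshots per day: {snapshots_per_day():,}")
print(f"complex observations per cell per day: {complex_observations():.2e}")
```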

But at least we can now write a roadmap that can guide us towards our ‘proteomics’ goal, of understanding life at the level of the protein molecules that do the work.

Task I: make multiple antibodies, non-antibody binding proteins, or other affinity reagents against 10,000 proteins (to start with)
Task II: assign these antibodies/affinity reagents into pairs that are capable of detecting interacting pairs of proteins
Task III: given that each ‘pair’ of interacting proteins defines a complex with potentially 30 or more other proteins present, determine how many other proteins are present in each ‘pair’ to define proteomic subunits
Task IV: devise a system capable of looking at millions to billions of such complexes
Task V: understand how these contribute to the changing life of the system over time, sampling at intervals shorter than 10 seconds.
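
For a sense of scale on Task II, the sketch below counts every possible pairing among 10,000 proteins. This is an upper bound under our own simplifying assumptions (one reagent per protein, every pairing considered); only a fraction of these pairs will correspond to real interactions, and the function name is illustrative.

```python
# Upper bound on the number of reagent pairs for Task II.
# Assumes one reagent per protein and counts every unordered pair.

from math import comb

N_PROTEINS = 10_000


def candidate_pairs(n_proteins=N_PROTEINS):
    """Number of unordered pairs that can be drawn from n_proteins."""
    return comb(n_proteins, 2)


print(f"candidate pairs: {candidate_pairs():,}")
```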

Relying solely on antibodies for the first task would be a big risk. These reagents will need to be mono-specific, renewable, stable and capable of producing reproducible results over time. The Human Protein Atlas is making terrific inroads here, but their reagents are unfortunately largely non-renewable and, as their own data show, antibodies in general are usually anything but mono-specific. The ultimate goal of Avacta Life Sciences is to address these types of problems, on a small scale to start with, but with the aim of generating a large catalogue of mono-specific, high affinity, stable and renewable Affimer reagents that can serve as replacements for antibodies in any assay. Perhaps most exciting is that we know that Affimer molecules are stable on surfaces, and that we can immobilise them in their tens of thousands on microscope slides. With the appropriate fluorescent multiplexing ability, this will potentially allow the interrogation of tens of thousands of proteomic subunits in parallel, in a single experiment.

There’s a long way to go, but we’re making a start!

References

Beck M, Schmidt A, Malmstroem J, Claassen M, Ori A, Szymborska A, Herzog F, Rinner O, Ellenberg J, Aebersold R. 2011 The quantitative proteome of a human cell line. Mol Syst Biol. 7:549. doi: 10.1038/msb.2011.82.

Eden E, Geva-Zatorsky N, Issaeva I, Cohen A, Dekel E, Danon T, Cohen L, Mayo A, Alon U. 2011 Proteome half-life dynamics in living human cells. Science 331:764-768.

Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, Pääbo S, Mann M. 2011 Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol. 7:548. doi: 10.1038/msb.2011.81.