The title of a paper declaring that proteomics needs more statistics is sure to put many on edge, and not just those of us whose statistical analysis skills mean consulting a textbook at every step.
With reproducibility across science as a whole under criticism, it was only a matter of time before statistics took its turn in the spotlight. In some scientific fields this isn’t much of a problem, but between statistical misuse, misunderstanding and the analysis of ever-larger datasets in the life sciences, it is a problem that needs to be tackled. One of the main criticisms is that scientists use statistics such as p values and false discovery rates blindly: without examining the trends in the data, without understanding what they are calculating and without appreciating what the statistics mean. P values, as a measure of the significance and credibility of data, were never meant to be used the way they are today, as a proxy for whether a hypothesis is true.
P-hacking, as it has become known, is the biased processing of data: running multiple methods of analysis until the desired result is achieved, or monitoring data as they are collected and stopping once the result looks significant. An incomplete understanding of the statistical methods being used, together with inherent biases in our analysis towards extreme results, means that we may subconsciously p-hack our own results. Many journals are keen to accept only statistically significant studies. In today’s research ethos of ‘publish or perish’ this has inevitably led to a focus on small p values, which only increases the bias towards publishing statistically unsound findings.
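To make the ‘monitoring data as they are collected’ form of p-hacking concrete, here is a minimal Python simulation. It is a sketch under simplifying assumptions: normally distributed data with a known standard deviation of 1 and a simple two-sided z-test, and the function names (`two_sided_p`, `false_positive_rate`) are hypothetical. Every simulated dataset has a true mean of zero, so every ‘significant’ result is a false positive. The honest analysis tests once at the planned sample size; the p-hacked analysis tests after every batch and stops as soon as p < 0.05.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided z-test p value for H0: mean == 0, assuming a known SD of 1."""
    n = len(sample)
    # Sample mean divided by its standard error, which is 1 / sqrt(n) here.
    z = (sum(sample) / n) * math.sqrt(n)
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_trials, max_n, peek_every, alpha=0.05, seed=0):
    """Fraction of null datasets declared significant by (a) one test at max_n
    and (b) testing after every peek_every observations and stopping early."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_trials):
        # The null hypothesis is true for every simulated dataset.
        data = [rng.gauss(0.0, 1.0) for _ in range(max_n)]
        # Honest analysis: a single test at the planned sample size.
        if two_sided_p(data) < alpha:
            fixed_hits += 1
        # P-hacked analysis: peek after each batch, stop at the first p < alpha.
        for n in range(peek_every, max_n + 1, peek_every):
            if two_sided_p(data[:n]) < alpha:
                peeking_hits += 1
                break
    return fixed_hits / n_trials, peeking_hits / n_trials


if __name__ == "__main__":
    fixed, peeking = false_positive_rate(n_trials=2000, max_n=100, peek_every=10)
    print(f"single planned test: {fixed:.3f}, peeking every 10: {peeking:.3f}")
```

The single planned test should flag roughly 5% of these null datasets, as a 0.05 threshold promises, while peeking after every batch flags noticeably more, even though there is no real effect in any of them. With these settings the last peek falls at the planned sample size, so the peeking rate can never be lower than the honest one.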
A major study published in 2005 suggested that most published research findings are false because of the incorrect evaluation of results. This has prompted many scientists to re-examine how they evaluate their results. At the same time, statisticians are looking for better ways of thinking about data, to help scientists interpret their results and avoid missing important information.
Any quick trawl through PubMed will show that p values are very popular, and any reform of statistical analysis would have to be strong enough to take on this entrenched culture. Many are calling for a more all-encompassing approach to statistics, in which several different statistical methods are applied to the same dataset to assess the findings. If the different methods give different interpretations of the dataset, researchers need to find out why. This can give a more thorough idea of what is actually going on in the data and the reasons behind it.
In terms of proteomics, the two draft maps of the human proteome show remarkably high standards of sample preparation, instrumentation and methodology. Yet both relied on standard statistics such as the customary 1% false discovery rate. In datasets this large and complex, these statistical margins look weak. 1% bad matches sounds pretty good until you have a billion observations: at that scale, a 1% error rate means on the order of ten million incorrect matches.
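One common way proteomics pipelines estimate this false discovery rate is the target-decoy approach: spectra are also searched against decoy (for example, reversed) sequences, and the number of decoy matches above a score cutoff is taken as an estimate of the number of incorrect target matches above it. The Python sketch below is a simplified illustration under that assumption, not the procedure used in the draft proteome studies; the function names `fdr_cutoff` and `simulate` and the score distributions are hypothetical. It shows that holding the estimated FDR at 1% does not hold the number of false matches fixed: that number grows with the size of the dataset.

```python
import random


def fdr_cutoff(target_scores, decoy_scores, max_fdr=0.01):
    """Return (cutoff, accepted_targets, estimated_false) for the most permissive
    score cutoff whose estimated FDR (decoys / targets at or above it) is <= max_fdr.
    Higher scores are assumed to indicate better matches."""
    # Pool target and decoy scores, best first; the flag marks decoys.
    pooled = sorted([(s, False) for s in target_scores] +
                    [(s, True) for s in decoy_scores], reverse=True)
    best = (None, 0, 0)
    n_target = n_decoy = 0
    for i, (score, is_decoy) in enumerate(pooled):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Only evaluate a cutoff once every match tied at this score is counted.
        last_of_tie = i + 1 == len(pooled) or pooled[i + 1][0] != score
        if last_of_tie and n_target and n_decoy / n_target <= max_fdr:
            best = (score, n_target, n_decoy)
    return best


def simulate(n, seed=0):
    """Hypothetical search: n correct and n incorrect target matches plus
    n decoy matches, with made-up normal score distributions."""
    rng = random.Random(seed)
    targets = ([rng.gauss(4.0, 1.0) for _ in range(n)] +   # correct matches
               [rng.gauss(0.0, 1.0) for _ in range(n)])    # incorrect matches
    decoys = [rng.gauss(0.0, 1.0) for _ in range(n)]       # mimic incorrect ones
    return fdr_cutoff(targets, decoys)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        cutoff, accepted, est_false = simulate(n)
        print(f"n={n}: accepted {accepted}, estimated false matches {est_false}")
```

In every run the estimated false matches stay at or below 1% of the accepted ones, but their absolute count scales with the dataset, which is the point the paragraph above makes about billion-observation studies.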
In an era when our instruments generate more data than we could ever examine manually, we are going to need to rely more and more on algorithms to sort things out. But this doesn’t mean we should abandon our own scientific judgement. Integrating newer and more sophisticated statistical analyses to handle large datasets should be the starting point, followed by a return to our own scientific judgement in assessing these statistics and what they mean, rather than the pursuit of statistical significance for its own sake.