Errors in Excel (and everything else)

Autoformatting in Microsoft Excel is apparently responsible for errors that appear in 20% of the genetics papers containing Excel spreadsheets across the top scientific journals. The study, from researchers in Australia, examined the occurrence of these errors across nearly 3600 papers published in journals such as Nature, Science and PLoS One.

These are typically simple errors generated when gene names in a spreadsheet are automatically converted to dates or floating-point numbers. For example, the gene Septin-2 is commonly shortened to SEPT2, but entering that name into Excel changes it to 2-SEP, which is stored as the date 2 September 2016. Unfortunately for the genetics researchers using this software, once Excel has decided that a gene name is actually a date or a number, the value originally typed into the cell cannot be retrieved. The name is lost, and unless the correction is obvious to the user there is no way to fix the work retrospectively. There is also no way to switch off this autocorrect function, so researchers are stuck with it.
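
As a rough illustration of how such conversions might be spotted after the fact, the Python sketch below scans a column of identifiers for values that look like Excel-generated dates (such as 2-SEP) or numbers in scientific notation. The two patterns and the function name find_suspect_identifiers are assumptions made for this example; they are not an exhaustive list of what Excel can produce and may flag some legitimate identifiers.

```python
import re

# Assumed patterns, for illustration only: day-month strings such as "2-Sep"
# and scientific-notation numbers such as "2.31E+13".
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def find_suspect_identifiers(values):
    """Return (index, value) pairs that look like converted gene names."""
    suspects = []
    for index, value in enumerate(values):
        text = value.strip()
        if DATE_LIKE.match(text) or FLOAT_LIKE.match(text):
            suspects.append((index, value))
    return suspects


# Hypothetical gene-name column exported from a spreadsheet.
column = ["TP53", "2-SEP", "BRCA1", "1-Mar", "2.31E+13"]
print(find_suspect_identifiers(column))
# [(1, '2-SEP'), (3, '1-Mar'), (4, '2.31E+13')]
```

A check like this can only flag suspicious cells for manual review. It cannot recover the original name, because that information is no longer in the file.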

Of course, these mistakes can easily be averted if users format Excel columns as text before entering data and stay alert to the possibility of autocorrect-generated errors. They can also be avoided by using alternative software, such as Google Sheets or R, the statistics software. These steps may seem simple enough, but this isn’t the first time that the risks of Excel’s autocorrect function have been pointed out. In 2004 an article in BMC Bioinformatics titled ‘Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics’ highlighted the problem to the community. Yet the new study shows that this warning, issued over a decade ago, hasn’t been heeded, and that researchers and journal editors aren’t thoroughly checking work before it goes to publication.
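
The same idea of treating identifiers as text can be shown outside a spreadsheet. The sketch below is one possible approach, not a method described in the study: it uses Python's standard csv module, which returns every field as a string, so names such as SEPT2 are never reinterpreted. The table contents and the helper name read_gene_table are hypothetical.

```python
import csv
import io

# Hypothetical tab-separated gene table; in practice this would be a file on disk.
raw = "gene\tfold_change\nSEPT2\t1.8\nMARCH1\t0.6\n"


def read_gene_table(handle):
    """Read a tab-separated table, leaving every field as plain text."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


rows = read_gene_table(io.StringIO(raw))
for row in rows:
    # Only the numeric column is converted, and only on request.
    print(row["gene"], float(row["fold_change"]))
```

Because nothing is converted unless the code asks for it, the gene column comes back exactly as it was written.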

These issues have been identified and are apparently easily overcome, yet they echo the recent problems of reproducibility that have arisen within the life sciences. Validation of working reagents and results is simply not occurring at the level it needs to, and many researchers are unaware of proper validation methods and of the problems that poor validation can cause. Consequently, the number of retractions of published papers and data has increased and there is a growing lack of confidence in published data.

When users are aware of the potential pitfalls of their tools, whether those tools are affinity reagents or software programmes, they are better able to spot problems as they arise and rectify them before they cause errors in the literature that hinder future science. Yet there are issues that extra vigilance alone will not fix; instead, manufacturers need to address problems with the tools they produce. Lack of reagent reproducibility between lots, for example, is something that antibody manufacturers need to address. Similarly, the Excel autocorrect function that has caused flaws in a high number of genetics research papers could be modified so that users can turn it off, reducing the errors in research.

Some are starting to address these issues. For antibody validation, meetings will be held this autumn to outline tangible solutions for reagent validation. Online discussions are being held with the scientific community prior to the meeting to crowdsource a consensus on validation. These discussions already recognise that validating affinity reagents will need different approaches for different applications, as we already do for Affimer proteins. Equally, many scientists in these discussions have reflected the need to move away from standard monoclonal and polyclonal antibodies and towards recombinant affinity proteins, like Affimer proteins, for increased reproducibility going forward.

It is hoped that both manufacturers and end-users will adopt these steps to improve the accuracy and reproducibility of future results. Once a consensus exists on the appropriate validation techniques for reagents used in different applications, we can begin to eliminate some of the errors in research, and science can build upon itself in a reliable way, as it’s supposed to do.

Understanding the tools we use to conduct research, and the problems they can bring, can help prevent those problems becoming ingrained in the scientific literature. Meanwhile, manufacturers must do their utmost to provide tools that perform at their optimum. At Avacta Life Sciences we are happy to work with you to understand how you want to use Affimer® reagents in your assay. We offer validation that fits the desired end use of the Affimer protein, to help return the quality data that you need from your assays.