Scientists rename 27 human genes to prevent Microsoft Excel from mislabeling them as dates

For years, genomics researchers have faced a strange predicament: Microsoft Excel keeps mislabeling some of the human genes they have identified and named. With its default settings, Excel automatically converts certain gene symbols into dates and floating-point numbers; for example, SEPT2 becomes “2-Sep” and MARCH1 becomes “1-Mar”. A programmatic scan of leading genomics journals found that approximately one-fifth of papers with supplementary Excel gene lists contained erroneous gene name conversions.

The human genome is the complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. Scientists label each gene in the genome with a name and a short alphanumeric symbol so that it can be identified easily in research, papers, and other work.

The problem lies with Excel, the spreadsheet software that genomics researchers commonly use. The issue was first described in 2004, when researchers found that Excel’s date conversions affected at least 30 gene names, while floating-point conversions affected at least 2,000 identifiers when RIKEN identifiers were included. A 2016 report examined the gene lists shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.
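
To illustrate why symbols such as SEPT2 and MARCH1, or a RIKEN-style identifier such as 2310009E13, are vulnerable, here is a rough Python sketch that flags identifiers likely to be read as a date or as a number in scientific notation. The function name `excel_risk`, the month list, and both regular expressions are assumptions made for this example; they only approximate Excel’s automatic conversion, whose exact behavior depends on locale and is not published as a set of rules.

```python
import re
from typing import Optional

# Month words that Excel (in English locales) may read as the month part of a
# date. This list is an illustrative assumption, not Excel's actual rule set.
MONTHS = (
    "JAN|JANUARY|FEB|FEBRUARY|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|"
    "JUL|JULY|AUG|AUGUST|SEP|SEPT|SEPTEMBER|OCT|OCTOBER|NOV|NOVEMBER|"
    "DEC|DECEMBER"
)

# A month word immediately followed by a 1- or 2-digit day, e.g. SEPT2, MARCH1.
DATE_LIKE = re.compile(rf"^(?:{MONTHS})(\d{{1,2}})$", re.IGNORECASE)

# Digits, the letter E, then digits, e.g. 2310009E13, which can be read as
# scientific notation (2.31E+13).
FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def excel_risk(symbol: str) -> Optional[str]:
    """Heuristically classify an identifier as 'date', 'float', or None."""
    match = DATE_LIKE.match(symbol)
    # Only treat it as a date if the trailing number is a plausible day.
    if match and 1 <= int(match.group(1)) <= 31:
        return "date"
    if FLOAT_LIKE.match(symbol):
        return "float"
    return None


if __name__ == "__main__":
    for symbol in ["SEPT2", "MARCH1", "2310009E13", "MARCHF1", "TP53"]:
        # Prints: SEPT2 date, MARCH1 date, 2310009E13 float,
        # MARCHF1 None, TP53 None (one per line).
        print(symbol, excel_risk(symbol))
```

A check like this can only warn about risky identifiers before a file is opened in Excel; it does not change how Excel itself behaves.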

This automatic conversion of gene symbols to dates and floating-point numbers made Excel a persistent headache for scientists. The report also noted that workarounds had been published roughly a decade earlier, yet the errors kept turning up, especially in published research papers.

Now the scientists who name genes are tackling the problem from their side. Over the past year, the names of some 27 genes have been changed in this way, Elspeth Bruford, the coordinator of HGNC, told The Verge, although the guidelines themselves weren’t formally announced until this week. “We consulted the respective research communities to discuss the proposed updates, and we also notified researchers who had published on these genes specifically when the changes were being put into effect,” says Bruford.

“We always have to imagine a clinician having to explain to a parent that their child has a mutation in a particular gene,” says Bruford. “For example, HECA used to have the gene name ‘headcase homolog (Drosophila),’ named after the equivalent gene in the fruit fly, but we changed it to ‘hdc homolog, cell cycle regulator’ to avoid potential offense.”

This week, the HGNC (HUGO Gene Nomenclature Committee) published new guidelines for gene naming, including for “symbols that affect data handling and retrieval.” From now on, they say, human genes and the proteins they express will be named with one eye on Excel’s auto-formatting. That means the symbol MARCH1 has now become MARCHF1, while SEPT1 has become SEPTIN1, and so on. HGNC will keep a record of old symbols and names to avoid confusion in the future.
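
For anyone maintaining gene lists compiled before the change, updating them is essentially a lookup from retired symbols to current ones. The minimal Python sketch below assumes a hypothetical, hard-coded map containing only the two renamings mentioned above; the name `update_symbols` is likewise illustrative, and a real workflow would draw on HGNC’s own record of previous symbols instead.

```python
from typing import Dict, List

# Hypothetical excerpt: only these two pairs come from the renamings described
# in the article. A real pipeline would load HGNC's full record of previous
# symbols rather than listing them by hand.
PREVIOUS_TO_CURRENT: Dict[str, str] = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
}


def update_symbols(symbols: List[str]) -> List[str]:
    """Swap retired symbols for their current ones; keep all others as-is."""
    return [PREVIOUS_TO_CURRENT.get(symbol, symbol) for symbol in symbols]


print(update_symbols(["MARCH1", "TP53", "SEPT1"]))
# prints ['MARCHF1', 'TP53', 'SEPTIN1']
```

Symbols that are not in the map, such as TP53 here, pass through unchanged, so the same function can be run over an entire list.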

You might wonder why it is the human genes that are being renamed rather than Excel’s behavior that is being changed. Bruford’s view is that changing Excel is simply not worth the trouble. “This is quite a limited use case of the Excel software,” she says. “There is very little incentive for Microsoft to make a significant change to features that are used extremely widely by the rest of the massive community of Excel users.”

