RNA is now a first-class bioinformatics molecule.

RNA research is expanding very quickly, and a public resource for these extremely valuable datasets has been long overdue.

Some 30 years ago, scientists realised that RNA was not just an intermediary between DNA and protein (with a couple of functions on the side), but a polymer that could fold into complex shapes and catalyse countless reactions. The importance of RNA was cemented when the structure of the ribosome was determined (work for which Venki Ramakrishnan, Ada E. Yonath and Tom Steitz won the Nobel Prize; see, for example, Venki’s Nobel lecture) and it was confirmed that the core function of ribosomes – making a peptide bond between two amino acids – was catalysed by ribosomal RNA and not by proteins. It’s also likely that RNA – not protein, not DNA – was the first active biomolecule in the primordial soup that gave rise to Life. Indeed, one could easily see DNA as an efficient storage scheme for RNA information, and proteins as an extension of single-stranded RNA’s catalytic capabilities, enabled by the monstrous enzyme, ribosomal RNA.

Even focusing on RNA’s established role as the cell’s information carrier, the textbook mRNA, RNA-based interactions are widely recognised as being important. A real insight was the discovery of microRNA (miRNA): small RNAs whose actions lead to the down-regulation of transcripts by suppressing translation efficiency and cleaving mRNAs. MicroRNA has brought to life a whole new world of other small RNAs, many of which are involved in suppressing “genome parasites” – repeat sequences that every organism needs to manage.

And then there are the long RNAs in mammalian genomes that do not encode proteins (i.e. long non-coding RNA – lincRNA). These have long been recognised as having some significance – but what do they do? Some are clearly important, like the non-coding RNA poster child Xist, which inactivates one of the X chromosomes in female mammals to ensure the correct dosage of gene products. Others are involved in imprinting/epigenetic processes, for example the curiously named HOTAIR, which influences transcription on a different chromosome.

RNA: something missing

Discoveries in RNA biology have expanded the molecular biologist’s toolkit considerably in recent years. For instance, the cleavage systems from small RNAs can be used (in siRNA and shRNA ways) to knock down genes at the transcript level. The current “wow” technology, CRISPR/Cas9, is a bacterial phage defence system that uses an RNA-based component to adapt to new phages easily. This system has been repurposed for gene editing in (seemingly) all species – every genetics grant written these days probably has a CRISPR/Cas9 component.

And yet in terms of bioinformatics, RNA data was – until this past September – rather uncoordinated. There wasn’t a good way to talk consistently about well-known RNAs across all types, although this was sometimes coordinated within sub-fields, for example by Sam Griffiths-Jones’ excellent miRBase for miRNAs, or Todd Lowe’s gtRNAdb resource for tRNAs. But because RNA data was mostly handled in one-off schemes, researchers working in this area were hindered. Computational research couldn’t progress to the next stages, for example capturing molecular function and process terms with GO or collecting protein–RNA interactions in a consistent way.

RNAcentral in the bioinformatics toolkit

So I’m delighted to see the RNAcentral project emerge (http://rnacentral.org/). RNAcentral is coordinating the excellent individual developments emerging in different RNA subdisciplines: miRNAs, piRNAs, lincRNAs, rRNAs, tRNAs and many more besides. It provides a common way to talk about RNA, which in turn allows other resources – such as the Gene Ontology or drug interaction databases – to slot in, usually in precisely the same “place” as the protein identifier.

Alex Bateman, who leads the RNAcentral project, has been exploring a more federated approach, quite deliberately gathering the hard-earned, community-driven expertise of member databases in specific, specialised areas of RNA biology.
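
To make the idea of “a common way to talk about RNA” a little more concrete, here is a minimal Python sketch of a federated registry: each member database keeps its own accessions, and the registry maps them onto one shared identifier that annotations can hang from. To be clear, this is not RNAcentral’s actual implementation or API. The class names, the database names, the `RNA000001`-style identifier format and the decision to key entries on the sequence itself are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedEntry:
    """One RNA sequence plus everything attached to its shared identifier."""
    unified_id: str
    sequence: str
    xrefs: set = field(default_factory=set)          # (database, accession) pairs
    annotations: list = field(default_factory=list)  # e.g. ontology terms


class FederatedRegistry:
    """Hypothetical registry mapping member-database accessions to one ID."""

    def __init__(self):
        self._by_sequence = {}  # normalised sequence -> UnifiedEntry
        self._by_id = {}        # unified ID -> UnifiedEntry
        self._by_xref = {}      # (database, accession) -> UnifiedEntry

    def register(self, database, accession, sequence):
        """Record a member-database accession and return the shared ID."""
        seq = sequence.upper().replace("T", "U")  # treat DNA-style input as RNA
        entry = self._by_sequence.get(seq)
        if entry is None:
            # Made-up ID format: a running counter, assumed for this sketch.
            unified_id = f"RNA{len(self._by_sequence) + 1:06d}"
            entry = UnifiedEntry(unified_id, seq)
            self._by_sequence[seq] = entry
            self._by_id[unified_id] = entry
        entry.xrefs.add((database, accession))
        self._by_xref[(database, accession)] = entry
        return entry.unified_id

    def annotate(self, unified_id, term):
        """Attach an annotation to the shared ID, not to any one database."""
        self._by_id[unified_id].annotations.append(term)

    def lookup(self, database, accession):
        """Find the shared entry starting from a member-database accession."""
        return self._by_xref[(database, accession)]


# Two hypothetical member databases submit the same toy sequence.
registry = FederatedRegistry()
first = registry.register("db_one", "ACC-1", "ACGUACGU")
second = registry.register("db_two", "X42", "acgtacgt")
registry.annotate(first, "example ontology term")

print(first == second)
print(registry.lookup("db_two", "X42").annotations)
```

The point of the design is that the annotation is added once, against the shared identifier, and is then visible from whichever member database a researcher happens to start from. That is the sense in which other resources can “slot in”.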

RNAs were, potentially, the first things on our planet that could be considered “alive”. They are critical components in biology, not just volatile intermediaries. In terms of bioinformatics, giving RNA the same love, care and attention as proteins is long overdue, and I look forward to seeing RNAcentral provide the cohesion and stability this area of science so richly deserves.