The First Genome of Christmas: E. coli (and friends)

Inspired by a very boring train stoppage last year, I am going to add, one a day, to this list of great / interesting genomes until Christmas Day.

On the first day of Christmas, my true love sent to me:

Escherichia coli and its associated phages. This humble bacterium is one of our commensal organisms; it hangs out in our gut being, usually, useful to us. But the reason why every molecular biologist knows about this critter is that it is also the bedrock of DNA manipulation. Molecular biologists constantly shuttle DNA from all sorts of different organisms through E. coli. It is the assembly line for much of molecular biology, where you capture, grow up and extract DNA. The smell of the media used to grow E. coli infuses all molecular biology labs. E. coli has its own parasites, the phages, which are viruses that infect E. coli, and these are as useful as their bacterial host.

As well as being a practical laboratory workhorse, these organisms play a key role in understanding life. Much of what we first learned about how genes are laid out and how proteins are translated came from studies in E. coli. A number of the classic experiments on gene regulation involved the lambda phage infecting E. coli. It remains one of the “best” complete systems to study, with work ranging from complete metabolic modelling through to understanding how bacteria sense chemicals and change swimming direction.

The two common experimental phages of E. coli – PhiX174 and lambda phage – are probably the two most sequenced pieces of DNA in the world. PhiX174 is tiny, only 5,386 bases long, and was the first DNA genome ever sequenced, by Fred Sanger and his team. Lambda phage is bigger (~48.5 kb). Both PhiX174 and lambda are useful not only because of their small size, but also because you can make bucket loads of DNA that is near identical not only in its sequence, but also in the details of its precise behaviour.

PhiX174 was run as a “control lane” on Illumina machines for almost five years. With one lane in eight given over to it, 1/8th of the world’s sequencing capacity was dedicated to this phage. Lambda phage is the “burn-in” experiment provided with Oxford Nanopore’s kit to check that the machines work successfully in the remote laboratory. Seeing traces of PhiX174 or lambda phage in a sequencing experiment is commonplace; in the large centres, there has been micro-evolution of some phages, which you can track through their publicly accessible projects.

The E. coli genome itself – a mere 4.6 Mb – is a bit of a sad tale. In theory it should clearly have been the first bacterial genome to be sequenced, but the consortium set up to sequence it got locked in too early to an unautomated sequencing technology, and took a rather painstaking approach to completing it (finished in 1997). It was soundly beaten by Haemophilus influenzae, sequenced in 1995 by Craig Venter’s lab with a scale-up of the shotgun technique on (at the time) state-of-the-art automated fluorescent-dye sequencers. This would not be the last technological tussle in genomics.

The final thing to note about E. coli is that, like all bacteria, it doesn’t really have one genome. Different individual bacteria swap around substantial amounts of genomic sequence all the time, like a giant game of trading cards, endlessly looking for the perfect combination for the situation each individual is in.

Unfortunately for us, sometimes this shuffling leads to nasty, sometimes deadly, combinations, where additional pieces of DNA change this mild-mannered laboratory stalwart into a dangerous infectious enemy. One example was the outbreak in 2011, eventually traced to a German bean sprout farm (but not before many other possibilities were touted). This was notable in that, for the first time, there was a crowdsourced, internet-based response to both sequencing and analysing the genome.

(Many thanks to Mark Pallen for catching both factual and grammatical errors)