‘Big Data’, genetics and translation

Today sees the announcement of a new public–private partnership in science – the Centre for Therapeutic Target Validation – between EMBL-EBI, the Sanger Institute and GlaxoSmithKline (GSK). The collaboration is dedicated to developing a framework for biological target validation so that we can reduce the amount of time it takes to discover new therapies.

This is a really exciting initiative for me personally, both because the science is challenging and because I have been appointed as interim Head of the CTTV over the next year whilst we look for a long-term Head to steer the collaboration. It has already been a fascinating journey for me to understand the pharmaceutical industry in more depth, and really get to grips with an important scientific problem.

What’s the problem?

In very broad terms, there are three phases in drug discovery: validating biological targets, making small molecules and testing them in clinical trials.

The purpose of target validation is to figure out, for a particular disease, which molecule (usually a protein) you need to change in order to change the disease. This molecule is called the target.

Drugs act on these targets. Drugs are often small molecules and sometimes proteins, like antibodies, that will change the activity of a biological target. A drug has to be able to enter the body, do what it was designed to do and not mess around doing other things. Pharmaceutical companies are really good at making these.

Clinical trials involve giving a new (or repurposed) drug to a group of consenting people. They are jaw-droppingly expensive. Unfortunately, the vast majority (90%) of drugs ultimately fail to make it through clinical trials. A large proportion of those fail because the information on which they were based, from right at the start – the target protein – was not quite right.

What’s the problem? In short, it is that a billion-dollar phase III clinical trial is a very expensive way to discover that your drug wasn’t changing the right target.

How do you make it better?

Clearly, validating those biological targets is extremely important and it can be done a lot better. What the CTTV is aiming to do is change the landscape for the initial phase of drug discovery by pooling our knowledge and resources to improve target validation.

GSK realised that this is not something that they (or any other commercial organisation) can do easily in house. Wisely, they decided that this work is best carried out pre-competitively, in the public domain. Around a year ago, members of GSK’s senior leadership came to visit the Genome Campus to explore a way forward, and the CTTV concept was born.

It isn’t going to be easy

So this sounds great! What’s not to like? A large company is funding public domain work for the greater good. But… it’s not a walk in the park – this problem is actually quite hard. At what point can one say, definitively, that this protein is a good target for a drug to act on to change the course of a disease?

To resolve this you would ideally create a specific perturbation that changes a specific molecule, verify its safety in humans and give it to people with the disease. In short, develop a drug. But the CTTV aims to get a good handle on target validation without actually making drugs, so what information are we working with, and what are we hoping to deliver?

Genetics is powerful…

For this task, genetics provides powerful tools. For example, some people carry a genetic change and, as a result, are natural knockouts for a protein. Ideally, you would study a large number of people with this profile and determine whether, because they lack this protein, they are either protected from or more vulnerable to the disease than others. The Broad Institute and deCode recently published an excellent case study of this, finding a natural knockout of a zinc transporter that protects against type II diabetes.

This is just one way to use genetics. I am also excited about using genetics to get negative information – that a particular protein is not a good drug target for a disease. How to do this? Well, if you can convince yourself that a variant is definitely changing the activity of a protein, even only a little, and this change has no impact on disease risk, then the protein is unlikely to be a good target. This area of work will borrow statistical techniques from epidemiology, in particular the rather impressive-sounding “Mendelian Randomisation” approach.
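
To make that reasoning concrete, here is a minimal sketch of the simplest Mendelian Randomisation estimator, the Wald ratio for a single variant. It is only an illustration and not a description of what the CTTV will actually run; the function name and all the numbers are hypothetical. It assumes you already have the variant's estimated effect on protein activity and its estimated effect on disease risk (say, a log odds ratio with a standard error), and it uses the common first-order approximation for the standard error.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant Wald ratio estimate (hypothetical helper).

    beta_exposure: effect of the variant on protein activity
    beta_outcome:  effect of the variant on disease risk (e.g. log odds)
    se_outcome:    standard error of beta_outcome

    Returns (estimate, standard error, 95% interval, two-sided p-value).
    The standard error is a first-order approximation that ignores the
    uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("the variant must change the protein's activity")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    interval = (estimate - 1.96 * se, estimate + 1.96 * se)
    # Two-sided p-value under a normal approximation.
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))
    return estimate, se, interval, p_value


# Made-up numbers: the variant clearly shifts protein activity (0.5 units)
# but barely moves disease risk (log odds 0.01 with standard error 0.02).
estimate, se, interval, p_value = wald_ratio(0.5, 0.01, 0.02)
print(f"effect per unit of activity: {estimate:.3f} (SE {se:.3f})")
print(f"95% interval: {interval[0]:.3f} to {interval[1]:.3f}, p = {p_value:.2f}")
```

With numbers like these, the interval sits comfortably around zero, which is the kind of negative information described above. A real analysis would need many more checks, not least that the variant affects disease only through the protein in question.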

… but there’s more to it.

Genetics is just the beginning. Most ‘good’ drug targets, from which people have made successful medicines, do not coincide with the strongest genetic signals, even though they may be in the same pathway. Much of the genetic signal (i.e. for common diseases) is in regulatory regions, with uncertain links to the proteins of interest.

But hindsight is 20/20: it’s far easier to do this analysis post-hoc, knowing the right answer. The scale and diversity of the data – sequence, expression, interactions, reactions – make this solidly a ‘big data’ problem that requires both engineering and statistical sophistication.

Molecular biology moves very quickly; often, a new kind of experiment will unlock a problem or narrow the possibilities from thousands to tens of targets.

Not to get too geeky, but, like most experimentalists, I’m particularly excited about the CRISPR/Cas9 technology because it makes it possible to introduce specific mutations into cell lines. This should be particularly powerful in oncology, where systematic cancer sequencing efforts are giving rise to a number of robust targets. Cancer research has been transformed by the ability to find oncogenes and tumour suppressors more systematically via sequencing, leading to targeted therapies such as BRAF inhibitors. The question is: which of them, in a now-established cancer cell, do you need to target to slow (or stop) its growth?

Open data, of course

Like every other project we do at the EBI and Sanger, this collaboration is solidly committed to open data, open methods, open web sites, peer-reviewed publication and public discussion. The results will be distributed in accordance with the data-sharing policies of both institutes. Full stop.

But why would a big company find that attractive?

GSK, in its ‘enlightened self interest’, has set an ambitious goal of fundamentally shifting the way drugs are developed, and has backed the effort with substantial funds. But this isn’t just self interest – the results of work done at the CTTV will benefit all other drug companies and anyone working in drug development. And while GSK has put up the funding, the institutes on the Genome Campus have staked considerable in-kind contributions of people and resources.

A full-bodied blend

The skill sets of the three institutions are complementary: GSK brings a deep understanding of finding and verifying drug targets, albeit in a bespoke manner. The Sanger Institute brings to the table substantial expertise in genetics, genomics, cancer genomics and cellular genetics, in terms of both experimentation and analysis. The EBI provides large-scale reference datasets, engineering at scale and innovative analysis.

The CTTV is kicking off with a motivated team of people who understand and respect these skill sets. I’m really lucky to be starting with a strong science team, and we’re already learning from each other, scientifically and organisationally. What is most exciting right now is grappling with some of these key problems in human disease biology. It’s going to be a really interesting year for me, and I hope a great start for the CTTV.

Bring it on!