Mendelian randomization is one of the techniques EPoCH uses to understand whether parents’ lifestyles in the prenatal period causally affect the health of their children.

The technique (which cool kids call ‘MR’) is based on the idea that genetics can tell us about non-genetic factors and their effects on health and disease.

Many of the people I work with at the University of Bristol (most notably the director of the MRC Integrative Epidemiology Unit, George Davey Smith) have championed MR, and over the past decade there has been a huge increase in the number of published MR studies.

To be honest, I struggled with understanding MR when I started working at the University of Bristol. At one early department meeting I asked whether there was “a course or something on MR for Dummies?” Thus charming my new colleagues with my cutesy self-deprecating wit.

And I know it’s not just me, because I have tried to explain MR to multiple confused faces belonging to students, academics, clinicians, members of the public, and, most bemused of all, my friends and family (“WHY is she telling us this?”).

But now that I’m the principal investigator on a grant that includes MR, I want to be able to explain it in a way that other people understand, so here goes…

MR helps us tell the difference between correlation and causation

MR uses the chance (or “random”) distribution of genes in a population to tell us about whether certain non-genetic characteristics or behaviours cause other characteristics or disease. I’ve written more about why distinguishing correlation from causation is important here.

Genetic data as a “proxy”

MR uses genetic information as a “proxy” for non-genetic information. For example, people with a certain variant of the ALDH2 gene (let’s call it variant 1) are much more likely to drink alcohol than people with another variant (let’s call it variant 2). So if we want to study the effects of parents’ alcohol consumption on offspring birth weight, we can compare the average birth weight in babies of parents with variant 1 to the average in babies of parents with variant 2.

This can give us a better idea of causation than if we just tried to study this by asking parents how much they drink (i.e. if we didn’t use their genetic information).

That’s because genetic information is randomised at conception, so the chance of someone getting variant 1 or variant 2 is random and not affected by any confounding factors. On the other hand, how much alcohol parents drink will be heavily influenced by confounding factors such as how much money they have, where they live, their religion, etc.

Parents might also forget how much alcohol they have consumed, or under-report it, which would introduce reporting bias. Genetic information is measured objectively and therefore not affected by this type of bias.

Also, because genetic variants are assigned at conception and then can’t be changed, there’s no chance that the outcome (birth weight) can influence the exposure (parents’ alcohol intake), so reverse causation is not an issue in MR either.

(…reverse causation is unlikely to be much of an issue in EPoCH anyway, because we know that the exposure (e.g. drinking alcohol during pregnancy) always comes before the outcome (e.g. weight at birth) in this study.)
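For anyone who finds code easier to follow than prose, here is a rough sketch of the "proxy" idea as a toy simulation. Every number in it is invented for illustration (the effect sizes, the 50/50 split of variants, and "wealth" as the only confounder are all assumptions, not EPoCH data). It compares a naive estimate based on how much parents drink with a genetic comparison: the difference in average birth weight between the two variant groups, divided by the difference in their average alcohol intake (often called a Wald ratio).

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the toy example is repeatable

TRUE_EFFECT = -50.0  # assumed: grams of birth weight per unit of alcohol


def slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


variant, alcohol, birth_weight = [], [], []
for _ in range(200_000):
    has_variant_1 = float(random.random() < 0.5)  # assigned by chance at conception
    wealth = random.gauss(0, 1)                   # confounder we pretend not to measure
    drinks = 2.0 * has_variant_1 + wealth + random.gauss(0, 1)
    weight = 3400 + TRUE_EFFECT * drinks + 200 * wealth + random.gauss(0, 300)
    variant.append(has_variant_1)
    alcohol.append(drinks)
    birth_weight.append(weight)

# Naive approach: relate birth weight directly to alcohol intake (confounded by wealth).
observational_estimate = slope(alcohol, birth_weight)

# Genetic approach: for a yes/no variant, each slope is just a difference in group means.
mr_estimate = slope(variant, birth_weight) / slope(variant, alcohol)

print(f"True effect:            {TRUE_EFFECT:.1f}")
print(f"Observational estimate: {observational_estimate:.1f}")
print(f"Genetic (MR) estimate:  {mr_estimate:.1f}")
```

In this made-up world the observational estimate is pulled away from the truth by wealth, while the genetic comparison lands close to the true effect, because the variant is handed out independently of wealth.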

How EPoCH will use MR

In EPoCH, we’ll use MR to study the causal effects of maternal and paternal health behaviours on childhood health. Although there is lots of observational evidence suggesting that parental (particularly maternal) factors are associated with the health of their children, very few studies have looked at whether these associations are causal.

The assumptions of MR

MR can be a really useful tool for “causal inference”, but there are many things to consider before drawing conclusions. In particular, we need to check that the following main “assumptions of MR” are being met.

The relevance assumption

To be suitable for MR, a genetic variant should be very strongly associated with the exposure being studied. So in our example, having variant 1 of ALDH2 must be very strongly associated with drinking more alcohol.
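One common way to put a number on "strongly associated" is an F-statistic from a regression of the exposure on the genetic variant, with values above roughly 10 often used as a rule of thumb. The sketch below is only an illustration with simulated data: the sample size, the effect sizes and the function name `first_stage_f` are all my own assumptions.

```python
import random
from statistics import fmean


def first_stage_f(genotype, exposure):
    """F-statistic for a regression of the exposure on a single genetic variant."""
    n = len(genotype)
    mean_g, mean_e = fmean(genotype), fmean(exposure)
    ss_g = sum((g - mean_g) ** 2 for g in genotype)
    ss_e = sum((e - mean_e) ** 2 for e in exposure)
    ss_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotype, exposure))
    r_squared = ss_ge ** 2 / (ss_g * ss_e)  # share of exposure variance explained
    return r_squared * (n - 2) / (1 - r_squared)


rng = random.Random(2)
genotype = [float(rng.random() < 0.5) for _ in range(2000)]

# Assumed effect sizes: one variant that shifts alcohol intake a lot, one that barely does.
strong_exposure = [2.0 * g + rng.gauss(0, 1) for g in genotype]
weak_exposure = [0.02 * g + rng.gauss(0, 1) for g in genotype]

f_strong = first_stage_f(genotype, strong_exposure)
f_weak = first_stage_f(genotype, weak_exposure)
print(f"F-statistic, strong variant: {f_strong:.1f}")
print(f"F-statistic, weak variant:   {f_weak:.1f}")
```

A variant like the second one tells us almost nothing about who drinks more, so it would make a poor proxy however randomly it is distributed.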

The independence assumption

The genetic variant must not be affected by any of the other factors that affect the outcome, i.e. the association between the genetic variant and the outcome must not be confounded. So smoking (etc) shouldn’t affect the chances of having ALDH2 variant 1 or 2.

The exclusion restriction assumption

The effect of the genetic variant on the outcome should not act via any pathway that doesn’t involve the exposure. So having ALDH2 variant 1 should not affect birth weight through any pathway that doesn’t involve an effect on drinking alcohol. E.g. ALDH2 variant 1 should not affect how much a person smokes, because this might then have a causal effect on offspring birth weight independently of any effect of alcohol consumption.
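To see why this matters, here is a toy simulation (again with invented numbers) in which the variant either does or does not also nudge smoking. The assumed true effect of alcohol on birth weight is -50 in both scenarios; the only thing that changes is whether the variant has a second route to the outcome. The function name `mr_estimate` and all parameter values are hypothetical.

```python
import random
from statistics import fmean


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return covariance / sum((x - mean_x) ** 2 for x in xs)


def mr_estimate(variant_effect_on_smoking, n=200_000, seed=3):
    """Ratio estimate of the alcohol effect when the variant may also alter smoking."""
    rng = random.Random(seed)
    variant, alcohol, birth_weight = [], [], []
    for _ in range(n):
        has_variant_1 = float(rng.random() < 0.5)
        drinks = 2.0 * has_variant_1 + rng.gauss(0, 1)
        smokes = variant_effect_on_smoking * has_variant_1 + rng.gauss(0, 1)
        # Assumed true effects: -50 per unit of alcohol, -100 per unit of smoking.
        weight = 3400 - 50 * drinks - 100 * smokes + rng.gauss(0, 300)
        variant.append(has_variant_1)
        alcohol.append(drinks)
        birth_weight.append(weight)
    return slope(variant, birth_weight) / slope(variant, alcohol)


assumption_holds = mr_estimate(variant_effect_on_smoking=0.0)
assumption_violated = mr_estimate(variant_effect_on_smoking=1.0)
print(f"Variant acts only via alcohol:    {assumption_holds:.1f}")
print(f"Variant also affects smoking:     {assumption_violated:.1f}")
```

When the variant also changes smoking, the estimate blames alcohol for damage that was actually done by cigarettes, roughly doubling the apparent effect in this example.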

How can we be sure MR is giving us the right answer?

Neil Davies, Michael Holmes and George Davey Smith have written an excellent introduction to MR that outlines how we can check that the assumptions of MR are being met. We’ll be doing all we can in EPoCH to check these assumptions. However, even with multiple checks, it will be difficult to tell whether an answer is “right” using MR alone.

That’s why we’re combining MR with other causal inference techniques, such as sibling comparisons and negative control designs, to “triangulate” the evidence. If all these different strands of evidence point towards the same answer, then that will strengthen our confidence that the answer is correct.

A two minute explainer

Well that was my attempt to explain MR, but if you’re still confused, George manages to explain it much more eloquently in just two minutes in the following animation…

(I probably should’ve just posted this and saved everyone’s time… soz).

One of the main aims of EPoCH is to understand whether parents’ lifestyles in the prenatal period causally affect the health of their children. The project will use several techniques to try to tease apart correlation from causation. In our first blog post, I’m going to try to explain why correlation does not equal causation and why we often need to separate the two.

Example: does drinking a small amount of alcohol every day reduce your chance of developing heart disease?

We might see in the news that a study finds that people who drink a small amount of alcohol every day have lower rates of coronary heart disease than people who don’t drink at all (in fact, here are a few examples of news articles along these lines). The article might then say that this means we can all reduce our risk of heart disease by enjoying a lovely drop of booze every day.

Observational evidence

More often than not, these sorts of studies are based on “observational evidence”, i.e. evidence that has been gathered by observing a group of people enrolled in either a “cohort” or a “case-control” study. There is no experiment, the researchers haven’t changed anything, and the people taking part in the study haven’t received any treatment or intervention. Instead, the researchers have compared the rate of heart disease in people who say they drink a little and people who say they don’t drink, and they have observed that, proportionally, more people who don’t drink have heart disease.

All of the studies used in EPoCH are observational studies. They are birth cohorts that enrol mothers, partners and children at the time of pregnancy or the child’s birth and then observe them over time (not in a creepy way, but with their full consent, using questionnaires and interviews, etc). So, it’s really important that we are aware of the different interpretations of observational evidence.

Potential interpretations of observational evidence

In the alcohol-heart disease example, let’s think of possible reasons why people who drink small amounts of alcohol might have a lower risk of heart disease:

1. Drinking small amounts of alcohol reduces the chance of getting heart disease.

This is the explanation the news article has homed in on (although, without further evidence, there is no reason to believe this explanation over the others below). It suggests that drinking alcohol causes lower risk of heart disease, i.e. there is a true causal effect from alcohol –> lower heart disease.

2. Having heart disease makes people give up drinking.

This isn’t as enticing as the first explanation, because it doesn’t mean we can all indulge in our penchant for Babycham on a nightly basis, but without further evidence, it is equally plausible. In fact, given what we already know (from other studies) about alcohol generally being bad for us, we might consider this explanation more likely. Most people who drink probably change their drinking habits if they find out they’ve got heart disease. They might give up completely. They might even be in hospital, where alcohol is not available. This is an example of reverse causation, i.e. the direction of causal effect is from heart disease –> less alcohol.

For EPoCH, where we’re interested in prenatal influences on childhood health, reverse causation is less likely. For example, a child’s IQ at age 7 can’t feasibly affect whether the child’s father drank coffee before they were born. In fancier words, the temporal order of events makes reverse causation impossible.

3. People who drink small amounts of alcohol have something else about them that reduces their chance of getting heart disease, and it’s that thing (not the alcohol) that’s responsible for the link.

The “something else about them” could be anything, but say for example that people who drink small amounts of alcohol are wealthier than people who don’t. If this is true, then wealth affects both alcohol and risk of heart disease, thereby creating a false, or stronger-than-real, relationship between those two factors. If we imagine that many of the people who are drinking a small amount of alcohol every day are middle class people drinking a glass of wine with their dinner, then we might imagine that these people can also afford a high quality diet, better access to health care and many of the other health advantages of having a higher socioeconomic position. So it makes sense that, in addition to being more likely to drink in moderation, wealthy people might be less likely to have heart disease. In this case, the relationship between alcohol and heart disease would be explained partly, or even completely, by the differences in wealth between the two groups. This is an example of “confounding” (where the confounder in this example is wealth), i.e. higher wealth –> alcohol, and, higher wealth –> lower heart disease, with no (or a weaker-than-estimated) direct causal effect between alcohol and heart disease in either direction.
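Here is a small simulated illustration of that confounding story. All the probabilities are made up, and I've deliberately built the simulation so that alcohol has no effect on heart disease at all: only wealth matters. Even so, the crude comparison makes drinking look protective, and the apparent benefit disappears once we compare like with like (wealthy with wealthy, less wealthy with less wealthy).

```python
import random

rng = random.Random(4)


def simulate_person():
    wealthy = rng.random() < 0.5
    # Assumed: wealthy people are more likely to drink a little every day...
    drinks_a_little = rng.random() < (0.7 if wealthy else 0.3)
    # ...and less likely to get heart disease. Note there is no alcohol term here.
    heart_disease = rng.random() < (0.05 if wealthy else 0.15)
    return wealthy, drinks_a_little, heart_disease


def risk_difference(people):
    """Heart disease risk in drinkers minus risk in non-drinkers."""
    drinkers = [ill for _, drinks, ill in people if drinks]
    abstainers = [ill for _, drinks, ill in people if not drinks]
    return sum(drinkers) / len(drinkers) - sum(abstainers) / len(abstainers)


people = [simulate_person() for _ in range(200_000)]

crude = risk_difference(people)
within_wealthy = risk_difference([p for p in people if p[0]])
within_less_wealthy = risk_difference([p for p in people if not p[0]])

print(f"Crude risk difference:        {crude:+.3f}")
print(f"Among wealthy people:         {within_wealthy:+.3f}")
print(f"Among less wealthy people:    {within_less_wealthy:+.3f}")
```

In real studies we rarely have the luxury of knowing (or measuring) every confounder, which is exactly why this explanation is so hard to rule out.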

4. Drinking small amounts of alcohol is not related to having a lower risk of heart disease in the general population, but it appears to be in this study because of bias.

There are lots of types of bias that can distort the results of a study so that they don’t match up with what we would see if we had perfect data on the whole population. One example is reporting bias, whereby participants in the study misreport their behaviour (for example, because they can’t remember, or because they feel uncomfortable reporting the truth). In this case, people with heart disease might say they don’t drink because they have been told by a doctor that they should limit their alcohol intake and they want to give the “right” answer, even though, in reality, they do drink. Another example is loss-to-follow-up bias, whereby participants die or drop out of the study because of the thing being studied before all of their data can be collected. In this case, drinking a little may in fact cause people to either die or drop out of the study before they are recorded as having heart disease. Therefore the group that doesn’t drink will be left with a higher proportion of people with heart disease than the group that drinks small amounts. In both these examples, bias would distort the numbers to make it appear as though people who don’t drink have a higher risk of heart disease than those who drink a small amount (i.e. the results of the observational study).
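The reporting bias example can be sketched in a few lines. In this invented scenario, drinking and heart disease are completely unrelated, but I've assumed that half of the drinkers who have heart disease tell the researchers they don't drink. The proportions are made up purely for illustration.

```python
import random

rng = random.Random(6)


def simulate_person():
    drinks = rng.random() < 0.5
    heart_disease = rng.random() < 0.10  # assumed: the same risk whether or not you drink
    # Assumed: half of drinkers with heart disease give the "right" answer instead.
    hides_drinking = drinks and heart_disease and rng.random() < 0.5
    reports_drinking = drinks and not hides_drinking
    return drinks, reports_drinking, heart_disease


def risk_difference(drinking_flags, disease_flags):
    """Heart disease risk in 'drinkers' minus risk in 'non-drinkers'."""
    in_drinkers = [ill for drinks, ill in zip(drinking_flags, disease_flags) if drinks]
    in_others = [ill for drinks, ill in zip(drinking_flags, disease_flags) if not drinks]
    return sum(in_drinkers) / len(in_drinkers) - sum(in_others) / len(in_others)


people = [simulate_person() for _ in range(100_000)]
truly_drinks = [p[0] for p in people]
says_drinks = [p[1] for p in people]
ill = [p[2] for p in people]

true_difference = risk_difference(truly_drinks, ill)
reported_difference = risk_difference(says_drinks, ill)
print(f"Using true drinking:     {true_difference:+.3f}")
print(f"Using reported drinking: {reported_difference:+.3f}")
```

Nothing about anyone's health has changed between the two lines of output. The only difference is what people said on the questionnaire.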

5. Drinking small amounts of alcohol is not related to having a lower risk of heart disease in the general population, but it appears to be in this study because of chance.

Because the study is based on just a sample of people, the researchers might have found a fluke association that they wouldn’t find if they were to re-run their study using a different sample, or using everyone in the population. The researchers will have tried to rule out this explanation by conducting statistical tests that give a measure of “precision” (i.e. P-values and confidence intervals), but chance findings can never be ruled out completely.
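To get a feel for how often chance alone can produce a "finding", the sketch below re-runs an imaginary study 1,000 times in a world where drinking has no relationship with heart disease. The group size, the 10% risk and the use of a simple two-proportion z-test are all assumptions I've made for the example. At the conventional P < 0.05 threshold, we'd expect roughly 1 in 20 of these studies to report a difference anyway.

```python
import random
from statistics import NormalDist


def two_sided_p_value(cases_a, cases_b, group_size):
    """Two-proportion z-test for two equal-sized groups (normal approximation)."""
    pooled = (cases_a + cases_b) / (2 * group_size)
    standard_error = (pooled * (1 - pooled) * 2 / group_size) ** 0.5
    if standard_error == 0:
        return 1.0  # identical all-or-nothing groups: no evidence of a difference
    z = (cases_a - cases_b) / group_size / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(5)
STUDIES, GROUP_SIZE, RISK = 1000, 200, 0.10  # assumed, made-up values

flukes = 0
for _ in range(STUDIES):
    # Both groups share the same true risk, so any difference is down to sampling.
    drinkers_ill = sum(rng.random() < RISK for _ in range(GROUP_SIZE))
    abstainers_ill = sum(rng.random() < RISK for _ in range(GROUP_SIZE))
    if two_sided_p_value(drinkers_ill, abstainers_ill, GROUP_SIZE) < 0.05:
        flukes += 1

fluke_rate = flukes / STUDIES
print(f"Studies with P < 0.05 despite no real effect: {fluke_rate:.1%}")
```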

So correlation does not imply causation… but why do we care?

The news article’s claim that drinking alcohol can slash the risk of heart disease (i.e. the interpretation of the observational evidence as evidence of explanation 1 – a true causal effect) is a classic example of taking correlative evidence and making causative claims, i.e. confusing correlation and causation.

But why is it important to know whether something causes a disease or is merely correlated?

The clue is in the way the news article has framed the evidence, i.e. that we can all reduce our risk of heart disease by drinking alcohol every day. Identifying causal relationships helps us identify things we can modify to improve our health. If there were a lot of strong causal evidence that drinking a small amount every day really does reduce our risk of heart disease, then doctors would start prescribing alcohol.

What will EPoCH do?

A lot of the current health advice given to parents (mostly mums) is based on correlative rather than causative findings from observational studies, so we desperately need better, causal evidence to support this advice.

In EPoCH, we’ll use statistical techniques to try to tease apart correlation from causation. We’ll also work with the media and groups like WRISK to help make sure our findings are interpreted accurately by journalists, policy makers, healthcare professionals and ultimately parents.

You can find out more about our plans, methods and results by exploring our blog.
