Mendelian randomization is one of the techniques EPoCH uses to understand whether parents’ lifestyles in the prenatal period causally affect the health of their children.

The technique (which cool kids call ‘MR’) is based on the idea that genetics can tell us about non-genetic factors and their effects on health and disease.

Many of the people I work with at the University of Bristol (most notably the director of the MRC Integrative Epidemiology Unit, George Davey Smith) have championed MR, and over the past decade there has been a huge increase in the number of published MR studies.

To be honest, I struggled with understanding MR when I started working at the University of Bristol. At one early department meeting I asked whether there was “a course or something on MR for Dummies?” Thus charming my new colleagues with my cutesy self-deprecating wit.

And I know it’s not just me, because I have tried to explain MR to multiple confused faces belonging to students, academics, clinicians, members of the public, and, most bemused of all, my friends and family (“WHY is she telling us this?”).

But now that I’m the principal investigator on a grant that includes MR, I want to be able to explain it in a way that other people understand, so here goes…

MR helps us tell the difference between correlation and causation

MR uses the chance (or “random”) distribution of genes in a population to tell us about whether certain non-genetic characteristics or behaviours cause other characteristics or disease. I’ve written more about why distinguishing correlation from causation is important here.

Genetic data as a “proxy”

MR uses genetic information as a “proxy” for non-genetic information. For example, people with a certain variant of the ALDH2 gene (let’s call it variant 1) are much more likely to drink alcohol than people with another variant (let’s call it variant 2). So if we want to study the effects of parents’ alcohol consumption on offspring birth weight, we can compare the average birth weight in babies of parents with variant 1 to the average in babies of parents with variant 2.

This can give us a better idea of causation than if we just tried to study this by asking parents how much they drink (i.e. if we didn’t use their genetic information).

That’s because genetic information is randomised at conception, so the chance of someone getting variant 1 or variant 2 is random and not affected by any confounding factors. On the other hand, how much alcohol parents drink will be heavily influenced by confounding factors such as how much money they have, where they live, their religion, etc.

Parents might also forget how much alcohol they have consumed, or under-report it, which would introduce reporting bias. Genetic information is measured objectively and therefore not affected by this type of bias.

Also, because genetic variants are assigned at conception and then can’t be changed, there’s no chance that the outcome (birth weight) can influence the exposure (parents’ alcohol intake), so reverse causation is not an issue in MR either.

(…reverse causation is unlikely to be much of an issue in EPoCH anyway, because we know that the exposure (e.g. drinking alcohol during pregnancy) always comes before the outcome (e.g. weight at birth) in this study).

How EPoCH will use MR

In EPoCH, we’ll use MR to study the causal effects of maternal and paternal health behaviours on childhood health. Although there is lots of observational evidence suggesting that parental (particularly maternal) factors are associated with the health of their children, very few studies have looked at whether these associations are causal.

The assumptions of MR

MR can be a really useful tool for “causal inference”, but there are many things to consider before drawing conclusions. In particular, we need to check that the following main “assumptions of MR” are being met.

The relevance assumption

To be suitable for MR, a genetic variant should be very strongly associated with the exposure being studied. So in our example, having variant 1 of ALDH2 must be very strongly associated with drinking more alcohol.

The independence assumption

The genetic variant must not be affected by any of the other factors that affect the outcome, i.e. the association between the genetic variant and the outcome must not be confounded. So smoking (etc) shouldn’t affect the chances of having ALDH2 variant 1 or 2.

The exclusion restriction assumption

The effect of the genetic variant on the outcome should not act via any pathway that doesn’t involve the exposure. So having ALDH2 variant 1 should not affect birth weight through any pathway that doesn’t involve an effect on drinking alcohol. E.g. ALDH2 variant 1 should not affect how much a person smokes, because this might then have a causal effect on offspring birth weight independently of any effect of alcohol consumption.

How can we be sure MR is giving us the right answer?

Neil Davies, Michael Holmes and George Davey Smith have written an excellent introduction to MR that outlines how we can check that the assumptions of MR are being met. We’ll be doing all we can in EPoCH to check these assumptions. However, even with multiple checks, it will be difficult to tell that an answer is “right” using MR alone.

That’s why we’re combining MR with other causal inference techniques, such as sibling comparisons and negative control designs, to “triangulate” the evidence. If all these different strands of evidence point towards the same answer, then that will strengthen our confidence that the answer is correct.

A two minute explainer

Well that was my attempt to explain MR, but if you’re still confused, George manages to explain it much more eloquently in just two minutes in the following animation…

(I probably should’ve just posted this and saved everyone’s time… soz).

What is “Mendelian randomization”?