What happens when the evidence is mixed?
EIF Director of Evidence Tom McBride outlines how EIF's programme assessment process and online Guidebook of programmes reflect and represent the variation in the evidence-base for high-profile programmes such as Multisystemic Therapy (MST) and Functional Family Therapy (FFT).
Today we have added Multisystemic Therapy (MST) to our online EIF Guidebook of programmes. MST receives a level 4+ rating on the EIF scale for the quality of evidence of impact, the highest rating available. This may raise eyebrows, given that a recent high-profile UK evaluation by Professor Peter Fonagy and colleagues published in the Lancet (Fonagy et al., 2018) found that MST was no more effective than management as usual at preventing out-of-home placements or at reducing the number of offences committed by adolescents who have a history of persistent and enduring violent and aggressive interpersonal behaviour.
MST sits alongside other programmes, such as Functional Family Therapy (FFT) and Family Nurse Partnership (FNP), which have received high EIF ratings (level 3 or 4) despite having equivocal evidence and, in particular, disappointing findings from UK evaluations. It is therefore worth saying a little about these programmes and their evaluations, our process for assessing programmes for inclusion in our Guidebook, and how we will factor mixed findings of this type into our assessments.
Which programmes have mixed findings?
Multisystemic Therapy (MST) is an intensive family and community-based intervention for families with a young person aged 12–17 who is at risk of going into care due to serious antisocial and/or offending behaviour. MST therapists provide the young person and their parents with individual and family therapy over a four-to-six-month period, with the aim of doing ‘whatever it takes’ to improve the family’s functioning and the young person’s behaviour.
- MST has a strong international evidence-base, with over 20 studies (from the UK, US, Norway and other countries) demonstrating positive impact across a range of criminal justice system, behaviour and parenting outcomes. This includes one robust study conducted in the UK (Butler et al., 2011), demonstrating reductions in youth offending behaviour, as well as aggressive and delinquent behaviour.
- However, there are also robust studies of MST with more equivocal findings. For example, one study conducted in Sweden (Sundell, 2008) found no programme impact on a range of outcomes, including social skills, self-reported delinquency and substance misuse. The more recent Fonagy trial from the UK also failed to demonstrate that MST was consistently more effective than standard services at improving the evaluation's primary outcome (out-of-home placements). Moreover, the study suggests that for one outcome (namely, criminal offences at 12-month follow-up), participants receiving standard services improved more than those receiving MST.
Functional Family Therapy (FFT) is a family therapy programme for young people between 10 and 18 years old involved in serious antisocial behaviour and/or substance misuse. The young person, often referred into FFT through the youth justice system at the time of conviction, attends between eight and 30 weekly sessions with his or her parents to learn strategies for improving family functioning and addressing their own behaviour.
- FFT is underpinned by a number of studies, including one robust trial conducted in the US (Waldron et al., 2001) demonstrating reductions in the number of days of marijuana use in a sample of adolescents.
- However, a couple of studies, including a recently published and robust trial from the UK (Humayun et al., 2017), failed to demonstrate that FFT was consistently more effective than standard services at improving the primary outcomes of the evaluation (including delinquency and offending).
Family Nurse Partnership (FNP) is a home-visiting programme for young mothers expecting their first child. The programme is delivered by highly trained and supervised nurses or midwives over the course of 64 visits, delivered from pregnancy to the child’s second birthday, and aims to improve child health and development by helping parents provide responsible and competent care.
- FNP also has a strong international evidence-base, with multiple trials (from the US and the Netherlands) demonstrating positive impact across a range of outcomes, including child abuse and neglect, criminal justice system outcomes, child behaviour outcomes and parenting outcomes.
- However, a recently published and robust trial from the UK (Robling et al., 2016) had more equivocal findings. Although there were some positive results, including higher development quotients for children in the FNP group (at age 2), the trial failed to demonstrate that FNP was consistently more effective than standard services at improving the primary outcomes of the evaluation (including smoking cessation, birthweight, rates of second pregnancies, and emergency hospital visits for the child).
So how can MST get a level 4+?
Our evidence rating is based on the highest quality evidence available, regardless of country of origin. This is because we want our evidence rating to summarise the extent to which a programme has been proven to work anywhere.
Moreover, programmes can receive a level 3 or 4 rating – on the basis of having robust evidence that the programme has been effective – even if there is also robust evidence suggesting that the programme has had no impact in some cases. This is because our rating is intended to communicate the extent to which a programme has demonstrated ‘proof of concept’ – that is, whether it has been demonstrated that it can work.
However, even multiple randomised controlled trials (RCTs) showing impact do not guarantee that a programme will work in all situations, and so we endeavour to report the complexity of findings and give prominence to UK evaluations in the write-up of programmes on the EIF Guidebook.
For a programme to receive a 4+ on our rating scale it needs to have:
- Multiple high-quality evaluations showing statistically significant impact on a valid and reliable measure of child outcomes
- Evidence of long-term impact: outcomes have been shown to have improved at least one year after the intervention has stopped being delivered
- Evidence of impact using a form of measurement that is independent of the study participants (and also independent of those who deliver the programme)
- At least one evaluation which is independent of the programme developers.
As MST and FNP achieve this, they receive our highest rating. However, commissioning decisions should not be made on the strength of evidence rating alone.
How will EIF factor mixed findings of this type into its assessments going forward?
We do not plan to fundamentally change our approach to rating evidence. However, it is important to us that we are transparent in the reporting of mixed evidence.
Therefore, for assessments conducted from 2018 onwards (which include MST and FFT), we will be adding a clarification to evidence ratings where the evidence is mixed, to draw greater attention to these findings. Specifically:
- If a programme has strong evidence of impact from a single robust study, but other robust studies provide strong evidence that it did not achieve impact, the programme will receive a level 3+ rating, with mixed findings. This is denoted by an asterisk alongside the evidence rating on the Guidebook, to reflect the fact that while the programme has robustly demonstrated that it can work in one setting, there are also robust studies with more equivocal findings.
- If a programme has strong evidence of impact from multiple robust studies, but other robust studies provide strong evidence that it did not achieve impact, the programme will receive a level 4+ rating, with mixed findings. This is also denoted by an asterisk alongside the evidence rating on the Guidebook.
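As a rough illustration only, the 4+ criteria and the mixed-findings rules above can be read as a simple decision procedure. The Python sketch below is a hypothetical simplification, not the Guidebook's actual assessment process, which relies on expert judgement at every step. The `Study` fields and the `rating` function are invented for this example; it assumes each study has already been judged robust or not, and that the long-term, independent-measurement and independent-evaluation criteria need only be met by at least one of the studies showing impact. Cases the rules above do not describe (for example, a single robust study with no conflicting evidence) return a placeholder rather than a rating.

```python
from dataclasses import dataclass
from typing import List

# Placeholder for cases this post's rules do not cover (hypothetical).
OUTSIDE_SKETCH = "not determined by this sketch"


@dataclass
class Study:
    """One evaluation as an assessor might summarise it (hypothetical fields)."""
    robust: bool                         # judged a high-quality (robust) evaluation
    significant_impact: bool             # statistically significant impact on a valid, reliable child outcome
    long_term: bool = False              # impact still present at least one year after delivery stopped
    independent_measure: bool = False    # measure independent of participants and of those delivering
    independent_evaluator: bool = False  # evaluation independent of the programme developers


def rating(studies: List[Study]) -> str:
    """Map study summaries to a 4+ or 3+ label, with '*' marking mixed findings."""
    robust = [s for s in studies if s.robust]
    positive = [s for s in robust if s.significant_impact]
    no_impact = [s for s in robust if not s.significant_impact]
    # Any robust study finding no impact makes the findings "mixed".
    asterisk = "*" if no_impact else ""

    # Assumption: the remaining 4+ criteria must hold for at least one positive study.
    meets_4_plus = (
        len(positive) >= 2
        and any(s.long_term for s in positive)
        and any(s.independent_measure for s in positive)
        and any(s.independent_evaluator for s in positive)
    )
    if meets_4_plus:
        return "4+" + asterisk
    if len(positive) == 1 and no_impact:
        # A single robust positive study alongside robust null studies.
        return "3+*"
    return OUTSIDE_SKETCH


strong = Study(robust=True, significant_impact=True, long_term=True,
               independent_measure=True, independent_evaluator=True)
no_effect = Study(robust=True, significant_impact=False)
print(rating([strong, strong]))             # 4+
print(rating([strong, strong, no_effect]))  # 4+*
print(rating([strong, no_effect]))          # 3+*
```

The point the sketch makes explicit is that robust null findings never lower the level in this scheme; they only add the asterisk, which is why reading the full assessment alongside the rating matters.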
Shouldn’t EIF give more weight to UK findings?
This is something we are actively looking at, as we know that UK commissioners are most interested in programmes which have been shown to deliver outcomes here. As the evidence-base grows and develops for some programmes, we need to consider how we weigh multiple and conflicting results produced in a number of settings and what prominence we should give to UK evaluations. However, there is no right way to collapse multiple evaluations into a simple rating, and any approach will have strengths and weaknesses. That is why we encourage everyone using our Guidebook to read the entirety of our assessment.
Why haven’t these programmes been as successful in the UK?
The truth is no one knows – an RCT can only establish if something has worked, not why.
However, there are a number of theories as to why the latest MST evaluation has been so disappointing:
- Fidelity: While treatment fidelity was monitored in this evaluation, and findings suggest that the vast majority of MST sites were delivering the intervention with adequate fidelity, the authors do note that treatment fidelity typically improves over time. Indeed, monitoring data suggests that the current fidelity ratings for MST services in the UK are higher than they were in this study, which evaluated what could be considered a ‘first generation’ MST delivery. This suggests that current implementations of MST, being higher-fidelity deliveries, may be more effective and might show greater improvements if evaluated.
- Differences between the UK and United States: The social security systems of the two countries are very different, and the impact of MST in the US (as opposed to the UK) might be explained in part by differences in the services received by those in the control groups – that is to say, the usual treatment in the US may be less effective than the usual treatment in the UK.
- The intensity and flexibility of management-as-usual: A paper on this is forthcoming this summer; however, the latest evaluation implies that those in the control group, who received usual services, were getting a large amount of input from highly trained staff. This may imply that those in the study were reasonably high-needs individuals and that management-as-usual services available in the areas of the trial were of high quality. Furthermore, the authors of the evaluation note that the management-as-usual services varied considerably, and that they may have offered more flexibility in addressing the young person’s specific needs than MST.
So, should I commission MST?
Deciding whether a service is appropriate for your area is a complex decision. Evidence on impact is part of the picture, but it is also important to consider the needs of the local population, capacity and capability of staff to deliver a programme, and fit with current services. Given MST has been shown to be effective in numerous evaluations, including one in the UK, we feel it is too soon to conclude that it cannot or does not work here. However, we encourage commissioners to think carefully about the target population and type and intensity of intervention they would otherwise receive, and what MST is likely to add over and above this. Finally, if you are delivering MST, gathering and analysing high-quality monitoring data will be essential in gauging whether it is working in your area.