A place for everything and everything in its place: using the Outcomes Stars in combination with validated measures of impact
Virginia Ghiara, co-author of our recent review of measures used in relationship support interventions, looks at the popular Outcomes Stars family of measures. Beyond their value in facilitating conversations and cooperation between practitioners and service users, what is their role in the evaluation of the impact of services and interventions?
At EIF, when we assess the strength of evidence underpinning an early intervention programme, there is one question we always ask: have children’s outcomes improved? Answering this question is important: it is necessary for understanding whether the intervention has worked, and if so, by how much.
Selecting the most appropriate measures to assess the intended outcomes of an intervention is often difficult. There is a wide variety of measures available, each focusing on different outcomes and each with distinct strengths and limitations.
At EIF, when we assess a programme’s evidence, we accept a study as providing at least preliminary evidence only if there is a significant impact on child outcomes as assessed with a valid and reliable measure.
Measures are accepted as valid if they have been shown to measure what they claim to measure. A measure assessing symptoms of anxiety, for example, needs to measure anxiety, not another concept such as stress or depression.
Measures are accepted as reliable if they are stable and do not vary randomly, so that they can accurately capture change over time. A measure of anxiety, for example, should elicit the same result if it is completed again by the same individual a week or a month from now, unless something significant – like an intervention – has occurred.
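One common way to put a number on this kind of stability is test-retest reliability: the correlation between scores from two administrations of the same measure to the same people, with nothing significant happening in between. The Python sketch below computes a Pearson correlation for that purpose. It is an illustration under stated assumptions, not a description of EIF’s own criteria: the scores are invented, the function name `pearson_r` is hypothetical, and Pearson’s r is only one option (intraclass correlation coefficients are often preferred because they also reflect absolute agreement, not just rank order).

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two administrations of the same measure."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Covariance and spread terms (the 1/n factors cancel in the ratio)
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = math.sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = math.sqrt(sum((b - mean_2) ** 2 for b in second))
    return cov / (spread_1 * spread_2)

# Hypothetical anxiety totals for eight people, completed one week apart
# with no intervention in between (illustrative numbers, not real data).
week_1 = [12, 18, 9, 22, 15, 30, 7, 19]
week_2 = [13, 17, 10, 21, 16, 28, 8, 20]

r = pearson_r(week_1, week_2)
print(f"test-retest r = {r:.2f}")
```

A value close to 1 means people keep roughly the same relative position across the two occasions; a low value would suggest that scores fluctuate for reasons unrelated to any real change, which would make it hard to attribute a pre/post difference to an intervention.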
We recently conducted an in-depth review of measures of parental conflict and its impact on child outcomes. This prompted us to develop a set of specific criteria for determining whether instruments are valid, reliable and practical for measuring an intervention’s impact. It also forced us to reconsider how we view measures that have limited information on validity and reliability but are used extensively by local areas. An obvious example is the Outcomes Stars, a set of 30 different measures developed by Triangle and commonly used by local authorities across the country. Each measure (or star) focuses on a distinct sector, such as criminal justice, families and children, or education. As the name suggests, each tool is presented in a star shape, with individual outcomes measured on a five-stage ‘journey of change’.
In a previous blog post and in our evaluating early help guide, we questioned the robustness of Outcomes Stars, due to a lack of information about their validity and reliability for measuring impact. We argued that valid and reliable measures – that is, those which can be scored in a standard manner and lend themselves to reliable comparison of results – should be preferred in order to objectively evaluate the effectiveness of interventions. In light of our newly developed assessment criteria, however, it is appropriate to review that position.
In our experience, the Outcomes Stars appear to be widely used and popular with both families and practitioners. At the Reducing Parental Conflict conferences, held in London and Manchester in June 2019, for instance, we met representatives from several local authorities that were using the Family Outcomes Star on a routine basis.
These measures have been developed to be ‘done with, not done to’: their aim is to facilitate dialogue between practitioners and service users. For this reason, the Outcomes Stars can be used to strengthen the therapeutic relationship, set therapeutic or practice goals, and help service users reflect on their personal experiences. A case in point is the New Mum Star, a tool for young, first-time mums, developed in collaboration with the Family Nurse Partnership (FNP). It aims to help nurses engage with young parents, identify their needs and aspirations, and reflect together on how best to adapt the FNP programme to their specific situation. Used in this manner, the tool is likely to add considerable value to practice, as described in FNP’s recent evaluation.
However, we continue to question the suitability of such measures for evaluating impact.
Most of the Outcomes Stars now have both a psychometric factsheet, published by Triangle, which provides information on their validity and reliability, and a user manual with scoring guidance intended to increase objectivity. We welcome the publication of these psychometric results, which are promising in a number of ways: for instance, the internal consistency values reported for the Family Star would meet our criteria. Nevertheless, we feel further analysis is needed to fully assess whether these measures are appropriate for measuring pre- and post-intervention change. In particular, some of them, such as the Family Star and the Shooting Star (developed for school students, to assess both academic and non-academic achievements), do not report sufficient information to establish construct or criterion validity. Triangle is conducting further analysis on some measures, such as the Family Star Plus, and we would welcome additional psychometric evidence showing that such tools are valid and reliable.
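Internal consistency is often summarised with Cronbach’s alpha, which asks whether the items in a scale move together as if they were tapping the same underlying construct. This post does not say which statistic the Triangle factsheets report or what threshold our criteria apply, so the Python sketch below is only an illustration of the general idea: the ratings are invented and the function name `cronbach_alpha` is hypothetical. A commonly quoted rule of thumb treats values of around 0.7 or above as acceptable, but that is a convention, not a statement of EIF’s criteria.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings from six people on four items, each scored 1-5
# (invented numbers, loosely echoing a five-stage scale).
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 3, 4],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

A high alpha shows only that the items hang together. On its own it says nothing about whether the scale measures the intended construct (validity) or whether scores are stable over time, which is why internal consistency, construct or criterion validity and test-retest reliability are assessed as separate questions.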
Furthermore, it is still unclear whether the scoring guidance has actually increased objectivity in scoring, which is vital to ensure a reliable comparison of results and a good understanding of how the intervention will work for new participants. While Triangle only allows licensed users to use the Stars, and provides implementation training and support to increase objectivity, no tests have been conducted to verify whether consistency in use has, in fact, increased.
Finally, we still have some concerns about the fact that the measures are generally completed as part of a conversation with the practitioner. Information collected in this way is unlikely to be truly self-reported. This approach also runs the risk of gathering biased information: respondents might, for instance, report improvements only to please the questioner, a measurement problem often referred to as ‘social desirability bias’.
Taking this into account, we conclude that, while practitioners report that the Outcomes Stars facilitate dialogue with users and support the co-creation of goals, such measures do not currently meet all our standards for assessing impact in a research evaluation context. As recommended by Triangle, we would caution against using the Outcomes Stars as the sole impact measure of an evaluation, and would advise that the data collected be regarded primarily as management information on progress, to be used for monitoring and organisational learning.
Ultimately, the question we need to answer when assessing the strength of a measure is not just whether the tool is good enough, but whether it is fit for purpose, and how it can be used in the best possible way. It is a good sign that practitioners use the Outcomes Stars to facilitate dialogue with service users and create shared goals, and to provide management information. However, it is important not to confuse these activities with more formal evaluation of the effectiveness of an intervention, which requires validated measures of impact.