EIF Evidence Standards


The EIF assessment process was developed specifically to inform judgments about the extent to which a programme has been found effective in at least one rigorously conducted evaluation study.

These standards were developed in concert with other What Works Centres to assess interventions in terms of their impact and cost. They are broadly similar to the Maryland Scale and other critical appraisal systems that recognise stages of development, and were formally approved by our Evidence Panel during the set-up phase of the organisation.

This approach differs from that taken by other evidence synthesis organisations (such as Cochrane and NICE), which use meta-analytic methods to synthesise findings from multiple interventions with similar aims and objectives. These alternative methods result in an aggregate score or statement thought to provide a robust estimate of the quality of evidence for a given practice type. They do not, however, facilitate comparisons between programmes on the basis of their evaluation evidence, as the EIF methodology does.

EIF evidence ratings

This rating system distinguishes five levels of strength of evidence of impact. This is not a rating of the scale of impact but of the degree to which a programme has been shown to have a positive, causal impact on specific child outcomes.

  • Level 4 recognises programmes with evidence of a long-term positive impact through multiple high-quality evaluations.
  • Level 3 recognises programmes with evidence of a short-term positive impact from at least one high-quality evaluation.
  • Level 2 recognises programmes with preliminary evidence of improving a child outcome, but where an assumption of causal impact cannot be drawn.

The term ‘evidence-based’ is frequently applied to programmes with Level 3 evidence or higher, because this is the point at which there is sufficient confidence that a causal relationship can be assumed. The term ‘preliminary’ is applied to programmes at Level 2 to indicate that causal assumptions are not yet possible.

NL2 distinguishes programmes whose most robust evaluation evidence does not meet the Level 2 threshold for a child outcome, and which therefore do not yet have direct evidence about the scale of the programme’s impact at a ‘preliminary’ level.

The category of NE, ‘Found not to be effective in at least one rigorously conducted study’, is reserved for programmes where a high-quality evaluation has found that the programme did not provide significant benefits for children. This rating should not be interpreted to mean that the programme will never work, but it does suggest that the programme will need to adapt and improve its model, learning from the evaluation. The best-evidenced programmes have normally had null findings along the way to demonstrating proof of concept. Some developers with such evidence have terminated their programme; others are working out how to adapt and improve their model in response to the evidence.

A more dynamic description of these standards, which recognises the importance of evidence development, is shown in the following diagram. This shows typical stages of development of evidence of effectiveness for a programme. For full details of these ratings and criteria, please download this summary guide.


This presentation of the standards represents a shift from how the standards of evidence are currently used in the EIF Guidebook, although the underlying criteria by which programmes are assessed have not changed. The standards of evidence as used in the Guidebook can be found here.