
What can you trust? Some guidance to support the critical review of evidence – Mitch Waterman, Geri Akerman & Kieran McCartan

It is accepted, and evidenced, that being a confident and competent practitioner leads to better outcomes in almost every sphere of practice; confidence lends credibility and engenders trust among clients. However, we must be cautious: confidence is not a sign of best practice or credibility in its own right. It needs to be rooted in good, effective working practices, supported by evidence; overconfidence with challenging tools and practices can be damaging. Published research is often cited as providing clarity on good practice and thereby reinforcing confidence. It is seen as the evidence that an approach or intervention is effective and/or is working in the best way (or otherwise, of course), supporting informed decision-making. This article offers guidance to readers to support them in critically reviewing evidence, and was the subject of a workshop at the NOTA Conference, 2019.

Yet there are now thousands of academic journals reporting research, hundreds of which may contain research relevant to the NOTA membership. There are also government (UK and otherwise), NGO and charity reports; though these are typically not subject to the same level of ‘peer review’ (the measure of quality assurance) as academic journals, they often also contain ‘evidence’, that is, material that may support or undermine an intervention. Furthermore, regardless of peer review, the bodies behind some of these publications (government papers, for example) have vested interests beyond the data, which may call the outcomes into question. It is therefore always worth considering the “who, what, where, when and why” of research and all related publications. In such a world, how can one be sure that what one is advised, or chooses, to read to strengthen one’s confidence is trustworthy?

A growing sense of disquiet about the quality of research, reporting and publishing has emerged in recent years, though it has a long history. It is perhaps most notably exemplified by the piece by John Ioannidis, published in 2005 in PLoS Medicine and alarmingly titled “Why most published research findings are false”, and more recently by David Colquhoun’s searching 2017 paper in Royal Society Open Science, which examined the problems of reproducibility. The concerns focus on a number of challenges for researchers and publication, which collectively serve to question the quality of published evidence. Some critique targets the nature of today’s academic publishing: far too many publications, with findings split across them; too little detail provided about researchers’ conflicts of interest; sample and effect sizes too small to be meaningful in the real world; and generalisation from qualitative studies. Other critique targets biases, that is, systematic error which may inflate or deflate the size of effects. There is always error in research; in any form of measurement there is measurement error (even two simple mercury thermometers won’t give quite the same reading in a cup of boiling water), but when bias is added, trusting research findings becomes much more problematic.
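The difference between random measurement error and bias can be made concrete with a small simulation. The following is only a minimal sketch in Python: the "thermometers", the noise level of 0.5 degrees and the bias of 1.5 degrees are illustrative assumptions, not figures from any study. It shows that random error tends to average out over many readings, whereas a systematic bias does not, however many readings are taken.

```python
import random
import statistics


def simulate_readings(true_value, noise_sd, bias, n, seed):
    """Return n simulated readings: true value + fixed bias + random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_TEMP = 100.0  # boiling water in degrees Celsius (illustrative value)

# Hypothetical thermometer with random error only.
unbiased = simulate_readings(TRUE_TEMP, noise_sd=0.5, bias=0.0, n=10_000, seed=1)

# Hypothetical thermometer with the same random error plus a systematic
# bias of +1.5 degrees (an assumed value, chosen for illustration).
biased = simulate_readings(TRUE_TEMP, noise_sd=0.5, bias=1.5, n=10_000, seed=2)

mean_unbiased = statistics.mean(unbiased)
mean_biased = statistics.mean(biased)

print(f"Random error only: mean reading = {mean_unbiased:.2f}")
print(f"Random error + bias: mean reading = {mean_biased:.2f}")
```

In this sketch, averaging more readings brings the first thermometer ever closer to the true value, while the second settles just as confidently on the wrong one. That is why a large study is not automatically a trustworthy one if bias has not been controlled.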

So, what can we do? First, we should acknowledge that some types of research are inherently less likely to be biased than others; that is, they have greater internal validity (the degree to which what one aims to measure is actually measured) and greater external validity (the degree to which the findings might be generalisable to a population beyond those participating in the study). Studies likely to be more trustworthy tend to be on a larger scale and subject to greater control of the factors which undermine validity, though these studies are far less numerous than, for example, case reports or expert comments. A brief general comparison may be illustrative here.

Let’s compare a systematic review, which examines numerous reports (selected using detailed inclusion criteria) of, for example, the effectiveness of a particular intervention, with a publication reporting interview data from a small number of participants, perhaps on their experiences or motivation for treatment. The review will very likely have followed PRISMA[1] guidelines, will have an explicit method detailing its inclusion criteria, and will typically show all the results considered and present a synthesis, giving the reader the general picture on effectiveness for that intervention. In contrast, the interview study will present an analysis of how the small group of participants responded. Although the authors will probably have provided quite detailed information about how those participants were recruited, the participants will still be volunteers whose motivation for participating might be uncertain, and the degree to which they are typical of a larger population is probably even more uncertain. Such studies are therefore inherently likely to suffer from more bias, though they can be invaluable in helping shape hypotheses for future studies. Some phenomena that researchers may want to investigate can realistically only be examined through such approaches. Figure 1 outlines a quality ‘hierarchy’, and in virtually all practice-based activity, be it medicine, management, or work with offenders, you will find something very similar.

Figure 1: The hierarchy of research evidence; note that study types towards the top are far less numerous than those towards the bottom of the table.

If researchers have been explicit about the study design (not always the case!), you can generally trust that a systematic review or controlled trial will have more trustworthy findings, probably generalisable to a wider population, than, for example, a cross-sectional study (e.g. a survey) or a series of case reports.

Second, we must try to be mindful of likely biases. Bias takes many forms – some are intrinsic to the study design, others a feature of how research might have been conducted, some a product of how research is reported, and some of course, our own. An exhaustive list is beyond the scope of this piece, but the following sources of bias are ones to watch out for:

  • Selection bias – that groups in a trial or a case-control study are not sufficiently similar to be compared.
  • Performance bias – that the groups are treated differently.
  • Detection bias – that the groups are measured differently.
  • Attrition bias – that withdrawal from the groups, or loss to follow-up, differs systematically between them.
  • Reporting bias – systematic differences between reported and unreported outcomes (e.g. reporting only significant findings).
  • Acquiescence bias – the tendency for participants to agree with a proposition.
  • Social desirability bias – the tendency for participants to respond in ways they think are expected, or that in some way protect their image.
  • Habituation or central tendency response bias – the tendency for participants to give similar responses, often in the middle of scales.
  • Confirmation bias – the tendency of researchers to see results as confirming their hypotheses.
  • Measurement biases – examples here include question ordering, leading questions/items in scales, cultural biases and interviewer biases.
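To illustrate one item from this list, reporting bias, here is a minimal Python sketch of what happens when only statistically significant findings are reported. Everything in it is an assumption made for illustration: the true effect of 0.2 standard deviations, the 20 participants per group, the known standard deviation of 1 and the conventional 1.96 cut-off are hypothetical choices, not values from any real study.

```python
import random
import statistics
from math import sqrt


def run_study(rng, true_effect, n_per_group):
    """Simulate one small two-group study.

    Returns (estimated_effect, is_significant), where significance is
    judged with a simple two-sided z-test (SD assumed known and equal to 1).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    standard_error = sqrt(2.0 / n_per_group)
    return diff, abs(diff / standard_error) > 1.96


rng = random.Random(42)
TRUE_EFFECT = 0.2  # assumed true effect, in standard deviation units

# Run many hypothetical small studies of the same intervention.
results = [run_study(rng, TRUE_EFFECT, n_per_group=20) for _ in range(5000)]

all_effects = [effect for effect, _ in results]
significant_effects = [effect for effect, significant in results if significant]

mean_all = statistics.mean(all_effects)
mean_significant = statistics.mean(significant_effects)

print(f"True effect:                      {TRUE_EFFECT:.2f}")
print(f"Mean estimate, all studies:       {mean_all:.2f}")
print(f"Mean estimate, significant only:  {mean_significant:.2f}")
print(f"Share of studies 'significant':   {len(significant_effects) / len(results):.1%}")
```

Under these assumptions, the average across all simulated studies sits close to the true effect, while the average across only the significant ones is considerably larger. A reader who sees only the significant results would overestimate how well the intervention works.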

Third, we can use published tools to help us be critical. Amongst the most useful are:

  • STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) checklists, which indicate what researchers should report and what readers should expect to find, though you’ll be astonished at how often details are missing. Sometimes, though, this is a product of strict word limits, another problem with publication. The following link takes you to the STROBE Statement homepage, from which checklists for a number of different research designs can be found (https://www.strobe-statement.org/index.php?id=strobe-home).
  • The Cochrane Collaboration’s Risk of Bias Tool (now in its second version, RoB 2) was designed to help researchers both identify sources of bias and introduce controls. The tools also help readers determine whether there is bias in a published study; they focus on randomised and non-randomised trials, but are useful for most experimental designs too (https://www.riskofbias.info/).
  • The Mixed Methods Appraisal Tool (MMAT) was designed to support readers in assessing the methodological quality of most types of research, both quantitative and qualitative, and is probably the best option for mixed methods publications (http://mixedmethodsappraisaltoolpublic.pbworks.com/w/file/fetch/127916259/MMAT_2018_criteria-manual_2018-08-01_ENG.pdf).

Finally, we should remember a few key messages to help us appreciate if what we are reading is worth treating as valid evidence:

  1. Be critical and use the tools if in doubt;
  2. Publication does not guarantee that the evidence is sound, so look at a range of publications on the same issue (if they exist) to make an informed decision;
  3. Findings that have been replicated are likely to be more trustworthy;
  4. Findings that have been replicated in different populations or using different methods that broadly say the same thing are more trustworthy;
  5. Think about who the researcher, publisher or organisation is and whether they have a vested interest in the findings;
  6. Findings based on large samples and showing large effects are also likely to be more trustworthy;
  7. Be mindful of your own biases;
  8. Ask questions – contact authors, researchers you know, members of the Research Committee (Research@NOTA.co.uk);
  9. Look at the EQUATOR Network site: a fantastic repository of guidance and tools to enhance the quality and transparency of (health) research: https://www.equator-network.org/;
  10. Be positive; lots of people are trying to provide evidence, almost always with the best of intentions.
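One reason larger samples earn more trust (message 6 above) is that the uncertainty around an estimate shrinks as the sample grows. The short Python sketch below uses the standard normal-approximation formula for the half-width of a confidence interval around a mean, z × SD / √n. The standard deviation of 1 and the sample sizes shown are illustrative assumptions only.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sd / sqrt(n)


# Illustrative sample sizes, with an assumed standard deviation of 1.
for n in (10, 100, 1000):
    print(f"n = {n:>4}: estimate is within +/- {ci_half_width(1.0, n):.3f} (95% CI)")
```

Because the half-width falls with the square root of the sample size, a study needs roughly 100 times as many participants to narrow its interval tenfold. A small study reporting a small effect may therefore be unable to distinguish that effect from zero.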

Professor Mitch Waterman, Academic, University of Leeds

Professor Kieran McCartan, Academic, University of the West of England

Dr Geri Akerman, Honorary Professor & Therapy Manager, Enhanced Assessment Unit, HMP Grendon.

NOTA Research Committee



Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of p-values. Royal Society Open Science. https://doi.org/10.1098/rsos.171085

Higgins, J.P.T., Altman, D.G., Gøtzsche, P.C., Jüni, P., et al. (2011). The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. https://doi.org/10.1136/bmj.d5928

Hong, Q.N., Pluye, P., Fàbregues, S., Bartlett, G., et al. (2018). Mixed Methods Appraisal Tool (MMAT) Version 2018. McGill University. http://mixedmethodsappraisaltoolpublic.pbworks.com/w/file/fetch/127916259/MMAT_2018_criteria-manual_2018-08-01_ENG.pdf

Ioannidis, J.P. (2005). Why most published research findings are false. PLoS Medicine. https://doi.org/10.1371/journal.pmed.0020124

PRISMA: Transparent Reporting of Systematic Reviews and Meta-Analyses http://www.prisma-statement.org/

University of Bern. (2009). The STROBE Statement. https://www.strobe-statement.org/index.php?id=strobe-home

[1] PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines detail the evidence-based set of requirements for reports of systematic reviews and meta-analyses; though originally devised for reviews of randomised trials, the requirements apply just as well for reviews of any type of intervention.