
Levels of Evidence and Recommendations

Below are the criteria we use to rank the level of evidence and to grade our recommendations. We have chosen to follow well-established, widely accepted standards that are also used by other organizations. These criteria include:

  • Levels of Evidence from the Centre for Evidence-Based Medicine (CEBM), Oxford
  • Grade of Recommendation (per CEBM)
  • Quality of Evidence Rating (per GRADE criteria)

We are always open to constructive criticism and feedback. If you feel that we have made an error or graded the evidence inappropriately, please send us objective, respectful, and constructive feedback so that we can all benefit from this free service.

Levels of Evidence

  • The following criteria come from the Centre for Evidence-Based Medicine (CEBM), Oxford.

    Therapy, Prevention, Etiology, Harm:

    • 1a = Systematic reviews (SR; with homogeneity) of randomized controlled trials (RCTs)
    • 1b = Individual RCT (with narrow confidence interval)
    • 1c = All or none.  Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it.
    • 2a = SR (with homogeneity) of cohort studies
    • 2b = Individual cohort study (including low quality RCT; e.g., <80% follow-up)
    • 2c = "Outcomes" research; Ecological studies
    • 3a = SR (with homogeneity) of case-control studies
    • 3b = Individual case-control study
    • 4   = Case-series (and poor quality cohort and case-control studies)
    • 5   = Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
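The "all or none" criterion for level 1c is a simple logical test with two alternative conditions. The following is a minimal Python sketch, assuming we have death counts for patients treated before the treatment (Rx) became available and for patients receiving it now; the function and parameter names are hypothetical and not part of the CEBM criteria.

```python
def is_all_or_none(deaths_before: int, n_before: int,
                   deaths_now: int, n_now: int) -> bool:
    """Check the 'all or none' (level 1c) condition for a therapy.

    Hypothetical helper: deaths_before/n_before describe patients before the
    Rx became available; deaths_now/n_now describe patients now receiving it.
    """
    if n_before <= 0 or n_now <= 0:
        raise ValueError("both groups need at least one patient")
    # Case 1: all patients died before the Rx, but some now survive on it.
    all_died_some_survive = deaths_before == n_before and deaths_now < n_now
    # Case 2: some patients died before the Rx, but none now die on it.
    some_died_none_die = deaths_before > 0 and deaths_now == 0
    return all_died_some_survive or some_died_none_die


print(is_all_or_none(40, 40, 25, 40))  # True: all died before, some survive now
print(is_all_or_none(12, 40, 0, 40))   # True: some died before, none die now
print(is_all_or_none(12, 40, 5, 40))   # False: neither condition holds
```

Counts are used instead of proportions so that "all" and "none" are exact comparisons rather than floating-point checks.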

    Prognosis:

    • 1a = Systematic reviews (SR; with homogeneity) of inception cohort studies; clinical decision rule (CDR) validated in different populations
    • 1b = Individual inception cohort study with > 80% follow-up; CDR validated in a single population
    • 1c = All or none case-series
    • 2a = SR (with homogeneity) of either retrospective cohort studies or untreated control groups
    • 2b = Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR or validated on split-sample only (split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into "derivation" and "validation" samples)
    • 2c = "Outcomes" research
    • 4  = Case-series (and poor quality prognostic cohort studies). A poor quality prognostic cohort study is one in which sampling was biased in favor of patients who already had the target outcome, outcomes were measured in <80% of study patients, outcomes were determined in an unblinded, non-objective way, or there was no correction for confounding factors.
    • 5 = Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
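Split-sample validation, mentioned under level 2b, divides one data set collected in a single tranche into a "derivation" sample and a "validation" sample. A minimal Python sketch, assuming random assignment with a fixed seed; the helper name, the 50/50 default, and the seed are illustrative assumptions, not a prescribed procedure.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_sample(records: Sequence[T], derivation_fraction: float = 0.5,
                 seed: int = 0) -> tuple[list[T], list[T]]:
    """Divide one data set into derivation and validation samples.

    Hypothetical helper: the fraction and seed are assumptions for illustration.
    """
    if not 0.0 < derivation_fraction < 1.0:
        raise ValueError("derivation_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible random assignment
    cut = int(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]


patients = list(range(100))  # stand-ins for patient records collected together
derivation, validation = split_sample(patients, derivation_fraction=0.5, seed=1)
print(len(derivation), len(validation))  # 50 50
```

Because both samples come from the same tranche, a rule validated this way has not been tested in a different population, which is why such validation ranks below level 1.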

    Diagnosis:

    • 1a = Systematic reviews (with homogeneity) of Level 1 diagnostic studies; clinical decision rule (CDR) with 1b studies from different clinical centers
    • 1b = Validating cohort study with good reference standards; or CDR tested within one clinical center
    • 1c = Absolute SpPins and SnNouts. An "Absolute SpPin" is a diagnostic finding whose Specificity is so high that a Positive result rules in the diagnosis. An "Absolute SnNout" is a diagnostic finding whose Sensitivity is so high that a Negative result rules out the diagnosis.
    • 2a = SR (with homogeneity) of Level >2 diagnostic studies
    • 2b = Exploratory cohort study with good reference standards; CDR after derivation, or validated only on split-sample or databases
    • 3a = SR (with homogeneity) of 3b and better studies
    • 3b = Non-consecutive study or without consistently applied reference standards
    • 4  = Case-control study, poor or non-independent reference standard
    • 5  = Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
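To make the SpPin and SnNout definitions concrete, the sketch below computes sensitivity and specificity from the four cells of a diagnostic 2x2 table (standard definitions) and flags a finding when either value reaches a cut-off. The 0.99 threshold and the function names are hypothetical; the definitions above describe "absolute" findings qualitatively and give no numeric cut-off.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of patients WITH the target disorder who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of patients WITHOUT the target disorder who test negative."""
    return true_neg / (true_neg + false_pos)


def classify_finding(tp: int, fn: int, tn: int, fp: int,
                     threshold: float = 0.99) -> list[str]:
    """Flag a finding as a SpPin and/or SnNout candidate.

    The default threshold is an assumption for illustration only.
    """
    labels = []
    if specificity(tn, fp) >= threshold:
        labels.append("SpPin")   # a Positive result would rule the diagnosis in
    if sensitivity(tp, fn) >= threshold:
        labels.append("SnNout")  # a Negative result would rule the diagnosis out
    return labels


# Hypothetical counts: 100 patients with the disorder, 1000 without.
print(sensitivity(90, 10), specificity(995, 5))       # 0.9 0.995
print(classify_finding(tp=90, fn=10, tn=995, fp=5))   # ['SpPin']
```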

    Differential Diagnosis, Symptom Prevalence Study:

    • 1a = Systematic review (with homogeneity) of prospective cohort studies
    • 1b = Prospective cohort study with good follow-up
    • 1c = All or none case-series
    • 2a = SR (with homogeneity) of 2b and better studies
    • 2b = Retrospective cohort study or poor follow-up
    • 2c = Ecological studies
    • 3a = SR (with homogeneity) of 3b and better studies
    • 3b = Non-consecutive cohort study, or very limited population 
    • 4  = Case-series or superseded reference standards
    • 5  = Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"

    Economic and Decision Analysis:

    • 1a = SR (with homogeneity) of Level 1 economic studies
    • 1b = Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses
    • 1c = Absolute better-value or worse-value analyses. Better-value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse-value treatments are as good and more expensive, or worse and equally or more expensive.
    • 2a = SR (with homogeneity) of Level >2 economic studies
    • 2b = Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses
    • 2c = Audit or outcomes research
    • 3a = SR (with homogeneity) of 3b and better studies
    • 3b = Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations
    • 4  = Analysis with no sensitivity analysis
    • 5  = Expert opinion without explicit critical appraisal, or based on economic theory or "first principles"
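The level 1c definition of better-value and worse-value treatments is effectively a small decision table. A hedged Python sketch, assuming each comparison with the alternative has already been judged and encoded as -1, 0, or +1; in practice deciding that a treatment is "as good" requires a clinically meaningful margin, which this hypothetical helper does not model.

```python
def value_category(effect: int, cost: int) -> str:
    """Classify a treatment against a comparator (economic level 1c).

    effect: +1 better, 0 as good, -1 worse than the comparator.
    cost:   -1 cheaper, 0 same cost, +1 more expensive.
    """
    if effect not in (-1, 0, 1) or cost not in (-1, 0, 1):
        raise ValueError("effect and cost must be -1, 0, or +1")
    # Better value: as good but cheaper, or better at the same or reduced cost.
    if (effect == 0 and cost == -1) or (effect == 1 and cost <= 0):
        return "better-value"
    # Worse value: as good and more expensive, or worse and equally or more expensive.
    if (effect == 0 and cost == 1) or (effect == -1 and cost >= 0):
        return "worse-value"
    # Remaining combinations involve a trade-off (or no difference at all).
    return "not absolute"


print(value_category(effect=1, cost=0))   # better-value
print(value_category(effect=-1, cost=1))  # worse-value
print(value_category(effect=1, cost=1))   # not absolute
```

Combinations such as "better but more expensive" fall outside the absolute categories and need a full economic analysis.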

    Notes and Definitions:

    • Clinical Decision Rule (CDR) = An algorithm or scoring system that leads to a prognostic estimate or a diagnostic category.
    • Homogeneity = A systematic review with homogeneity is one that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. Studies displaying worrisome heterogeneity should be tagged with a "-" at the end of their designated level (e.g., 1a-).
    • Poor Quality Cohort Study = One that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both exposed and non-exposed individuals and/or failed to identify or appropriately control known confounders and/or failed to carry out a sufficiently long and complete follow-up of patients. Similarly, a poor quality case-control study is one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both cases and controls and/or failed to identify or appropriately control known confounders.

Grade of Recommendation per CEBM

  • The grade of recommendation is based on the criteria set forth by the Oxford Centre for Evidence-Based Medicine (CEBM). The study levels listed below refer to the levels of evidence (LOE) defined above.

    • A = Consistent level 1 studies
    • B = Consistent level 2 or 3 studies or extrapolations from level 1 studies
    • C = Level 4 studies or extrapolations from level 2 or 3 studies
    • D = Level 5 evidence or troublingly inconsistent or inconclusive studies at any level
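The grading rules above can be read as a mapping from the level of evidence to a letter grade. A minimal Python sketch, assuming the caller supplies the best supporting level (with the "-" tag from the notes on homogeneity when heterogeneity is worrisome) plus two judgment flags; the function and flag names are hypothetical, and the handling of extrapolation from level 4 is an assumption because the grades above do not cover it.

```python
def grade_of_recommendation(level: str, troubling_inconsistency: bool = False,
                            extrapolated: bool = False) -> str:
    """Map a level of evidence (e.g. "1a", "2b-", "4") to grade A-D.

    level: best level supporting the recommendation; a trailing "-" marks
        worrisome heterogeneity, treated here as inconclusive evidence.
    troubling_inconsistency: the supporting studies disagree troublingly.
    extrapolated: the evidence is extrapolated rather than direct.
    """
    inconclusive = level.endswith("-")
    major = int(level.rstrip("-")[0])
    if major not in range(1, 6):
        raise ValueError(f"unknown level: {level!r}")
    if inconclusive or troubling_inconsistency or major == 5:
        return "D"
    if major == 1:
        return "B" if extrapolated else "A"
    if major in (2, 3):
        return "C" if extrapolated else "B"
    # Level 4 gives C; extrapolation from level 4 is not covered above,
    # so this sketch assumes it drops one grade, to D.
    return "D" if extrapolated else "C"


print(grade_of_recommendation("1b"))                     # A
print(grade_of_recommendation("1b", extrapolated=True))  # B
print(grade_of_recommendation("2a-"))                    # D
```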

Quality of Evidence per GRADE Criteria

  • Where applicable, we may also rate the quality of evidence using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach.

    • High Quality (++++) = Further research is very unlikely to change our confidence in the estimate of effect.
    • Moderate Quality (+++-) = Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
    • Low Quality (++--) = Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
    • Very Low Quality (+---) = Any estimate of effect is very uncertain. 


    General notes about the use of the GRADE criteria:

    • Randomized controlled trials (RCTs) start as "high-quality" evidence and observational studies start as "low-quality" evidence.
    • The quality of evidence may be rated down if there are limitations in study design or implementation, imprecise estimates (e.g., wide confidence intervals), variability in results, indirect evidence, or likely publication bias.
    • The quality of evidence may be rated up if there is a large magnitude of effect, a dose-response gradient, or if all plausible biases would reduce an apparent treatment effect.
    • There are several limitations to the use of the GRADE criteria: 1) It was developed to address questions about alternative management strategies, interventions, or policies, not questions about risk or prognosis. 2) Its application to "ill-defined" recommendations may prove problematic for a guideline committee. 3) Implementation is time-consuming and requires following a number of steps. 4) Most applications have been in evaluating preventive and therapeutic interventions and in addressing clinical questions, rather than public health and health systems questions. 5) It cannot eliminate disagreements that arise when judging the relevance or importance of outcomes.
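The starting points and adjustments described above can be sketched as simple arithmetic over four ordered quality levels. This is a simplification under stated assumptions: real GRADE ratings are structured judgments, not sums, and the function name and the idea of counting whole levels up or down are hypothetical choices made for illustration.

```python
# Quality levels, highest first, with the symbols used above.
QUALITY = {
    4: ("High", "++++"),
    3: ("Moderate", "+++-"),
    2: ("Low", "++--"),
    1: ("Very Low", "+---"),
}


def grade_quality(design: str, downgrades: int = 0,
                  upgrades: int = 0) -> tuple[str, str]:
    """Rate the quality of a body of evidence in the spirit of GRADE.

    design: "rct" starts at High; "observational" starts at Low.
    downgrades: levels removed for design limitations, imprecision,
        variability in results, indirectness, or publication bias.
    upgrades: levels added for a large effect, a dose-response gradient,
        or plausible biases that would reduce the apparent effect.
    """
    starts = {"rct": 4, "observational": 2}
    if design not in starts:
        raise ValueError(f"unknown design: {design!r}")
    score = starts[design] - downgrades + upgrades
    score = max(1, min(4, score))  # clamp to the four quality levels
    return QUALITY[score]


print(grade_quality("rct", downgrades=1))          # ('Moderate', '+++-')
print(grade_quality("observational", upgrades=1))  # ('Moderate', '+++-')
```

Clamping keeps the result within the four categories, so an RCT cannot be rated above High and heavily downgraded evidence bottoms out at Very Low.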