Vol. 14. Issue 2.
Pages 65-67 (March - April 2018)
DOI: 10.1016/j.reumae.2017.07.002
GRADE system, systematic and transparent evaluation
Sistema GRADE, evaluación sistemática y transparente
Claudia Mendoza Pinto (a, b), Mario García Carrasco (a, b; corresponding author)
a Unidad de Investigación de Enfermedades Autoinmunes Sistémicas, Hospital General Regional N.º 36, IMSS, Puebla, Puebla, Mexico
b Departamento de Reumatología e Inmunología, Facultad de Medicina, Benemérita Universidad Autónoma de Puebla, Puebla, Puebla, Mexico

In recent years, the GRADE system (Grading of Recommendations Assessment, Development and Evaluation) has been used in published guidelines and recommendations, mostly concerning treatment with drug and non-drug therapies, in different areas of medicine [1–3]. More recently, a number of clinical practice guidelines in rheumatology have employed this system, including guidelines on polymyalgia rheumatica, rheumatoid arthritis and fibromyalgia, among others [4–8].

Evidence-based medicine helps clinicians make headway on uncertain terrain and aids clinical decision making. Evidence-based guidelines have undergone a profound transformation in recent years. A number of systems exist to evaluate scientific evidence and assign grades of recommendation. However, these systems have drawbacks: they do not weigh risks against benefits, they do not take resource use or costs into account, and most were developed by consensus of expert opinion and have not been validated. Members of clinical practice guideline panels can hold strong opinions or academic biases about an area in which they have clinical experience, or other biases arising from their relationships with academic colleagues or with collaborators in industry [9]. To address these problems, GRADE requires systematic and pragmatic literature searches and evidence summaries, ideally based on pooled treatment effects and prepared by panel members with no conflicts of interest or by independent methodologists. This helps to ensure an impartial and reproducible evaluation of the literature addressing a specific clinical question.

The GRADE working group is a multidisciplinary international collaboration of experts that develops a common, transparent and sensible system for rating the quality of evidence and the strength of recommendations [10]. The GRADE system has been adopted by more than 80 agencies and organizations recognized worldwide, such as the World Health Organization (WHO), the Cochrane Collaboration and the United Kingdom's National Institute for Health and Clinical Excellence (NICE), among others.

In developing GRADE, the authors considered a wide range of clinical questions, including diagnosis, screening, prevention and treatment. To go from a generic clinical question to one formulated so that it facilitates literature searches and the drafting of recommendations, the Patients – Intervention – Comparison – Outcome (PICO) method was preferred. Framing clinical questions in PICO format defines the issue specifically and without ambiguity. Moreover, because each type of question corresponds to a study design suited to answering it, the format aids the literature search [11].
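
As an illustration of how a PICO question separates its four components, the following minimal Python sketch stores them in a small record and derives both a readable question and a naive search string. The class name `PicoQuestion`, its methods and the example question about methotrexate are hypothetical and do not come from the GRADE guidance; a real search strategy would use database-specific syntax and controlled vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """A clinical question split into its four PICO components (illustrative only)."""

    patients: str
    intervention: str
    comparison: str
    outcome: str

    def as_sentence(self) -> str:
        # Phrase the four components as one unambiguous question.
        return (
            f"In {self.patients}, does {self.intervention}, "
            f"compared with {self.comparison}, affect {self.outcome}?"
        )

    def search_query(self) -> str:
        # Naive boolean query: each component quoted as a phrase, joined with AND.
        parts = [self.patients, self.intervention, self.comparison, self.outcome]
        return " AND ".join(f'"{part}"' for part in parts)


# Hypothetical example question, chosen only for illustration.
question = PicoQuestion(
    patients="adults with rheumatoid arthritis",
    intervention="methotrexate",
    comparison="placebo",
    outcome="disease activity",
)
print(question.as_sentence())
print(question.search_query())
```

Keeping the components separate makes it harder to leave one of them implicit, which is the ambiguity the PICO format is meant to remove.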

The advantages of the GRADE approach are that it: (a) carefully considers the relative importance of the outcome variables and selects those that are most significant; (b) provides detailed descriptions of the criteria for quality of evidence with respect to specific outcomes and uses explicit definitions and sequential judgements during the categorization process; (c) separates the quality of evidence from the strength of the recommendations; and (d) considers the balance between benefits and risks, patient values and resource use or costs. It also provides tables showing the so-called evidence profiles (EP) and summaries of findings (SoF). Finally, software with associated help files has been developed that facilitates the preparation of EP tables and of the SoF tables based on them [12,13].

The GRADE system comprises 8 criteria for evaluating quality of evidence. Five of them can downgrade the quality of evidence, even that from a randomized controlled trial (RCT): risk of bias, inconsistency of the results across studies, indirectness, imprecision and publication bias [14,15]. For example, an RCT conducted with inadequate concealment of the allocation sequence and a high dropout rate should not be considered equivalent to a well-performed RCT. The 3 remaining criteria can increase confidence: a strong association without confounders, a dose–response gradient in studies free of bias or imprecision, and evidence that all plausible confounders or biases would have reduced the observed effect [16]. Thus, an observational study such as a case–control study, whose design would otherwise suggest weak evidence, can under the GRADE approach yield evidence at the level of an RCT if it shows a strong association and a dose–response gradient. The GRADE method proposes 4 levels to express the quality of evidence: high, moderate, low and very low [14].
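
The way the 8 criteria move a body of evidence across the 4 levels can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the GRADE procedure itself: it assumes that evidence from RCTs starts at "high" and evidence from observational studies starts at "low" (the usual GRADE convention, which the text above does not state explicitly), and that each criterion moves the rating by exactly one level, whereas real assessments may move it by two. The function and constant names are hypothetical.

```python
LEVELS = ("very low", "low", "moderate", "high")

# The five criteria that can lower the rating and the three that can raise it.
DOWNGRADE_CRITERIA = frozenset(
    {"risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias"}
)
UPGRADE_CRITERIA = frozenset(
    {"strong association", "dose-response gradient", "confounding would reduce effect"}
)

# Assumed starting points (index into LEVELS) by study design.
START_INDEX = {"rct": 3, "observational": 1}


def rate_quality(design, downgrades=(), upgrades=()):
    """Return a quality level, moving one step per criterion (a simplification)."""
    if design not in START_INDEX:
        raise ValueError(f"unknown design: {design!r}")
    down, up = set(downgrades), set(upgrades)
    if not down <= DOWNGRADE_CRITERIA or not up <= UPGRADE_CRITERIA:
        raise ValueError("unknown criterion")
    index = START_INDEX[design] - len(down) + len(up)
    # Clamp to the four available levels.
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


print(rate_quality("rct"))                                # high
print(rate_quality("rct", downgrades=["risk of bias"]))   # moderate
print(
    rate_quality(
        "observational",
        upgrades=["strong association", "dose-response gradient"],
    )
)                                                         # high
```

The last call mirrors the case–control example in the text: two upgrading criteria lift observational evidence from "low" to the level at which an RCT starts.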

The first GRADE criterion, risk of bias or design limitations, is conceptually a matter of the internal validity of a scientific study. The degree of risk of bias can be determined by carefully reading the methods section of each original study and evaluating how well the authors planned and carried out the study [17]. There is evidence that methodological flaws in an RCT can influence the estimated effect, which is usually exaggerated [18].

The second criterion, inconsistency of the results across the studies included in a systematic review, means that the results deviate from one another, which naturally decreases confidence in the effect estimate. If the original studies are clinically homogeneous (they address the question posed by the investigation) and their methodological quality is high, but the results are inconsistent, statistical analyses will probably demonstrate heterogeneity in the results [19].
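
One common way to quantify such heterogeneity is Cochran's Q together with the I² statistic. The text does not name a specific test, so the choice of these two statistics is an assumption made here for illustration. The sketch below uses inverse-variance weights; the function name `heterogeneity` and the three effect sizes are invented and carry no clinical meaning.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom and I^2 (in percent)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared


# Invented effect estimates with equal variances, for illustration only.
q, df, i_squared = heterogeneity([0.10, 0.30, 0.80], [0.01, 0.01, 0.01])
print(f"Q = {q:.2f} on {df} df, I^2 = {i_squared:.1f}%")
```

A high I² for studies that are clinically similar and well conducted is the situation the paragraph describes: the inconsistency is unexplained, and confidence in the pooled estimate falls.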

The third GRADE criterion, indirectness or absence of direct evidence, refers to any deviation of the studies included in the systematic review from the research question. When there are no direct comparisons between the interventions being considered, or when the available studies differ substantially from the population, interventions or outcomes specified in the question of interest, only indirect information may be available. Surrogate outcomes, for instance, may not be associated with the primary outcome. This may cause problems of applicability [20].

The fourth GRADE criterion, imprecision, conceptually reflects the random variation in the estimate of the outcome and is distinct from internal validity. If the original studies in a systematic review are clinically homogeneous and all have a low risk of bias, it is appropriate to conduct a meta-analysis and obtain an overall estimate. The 95% confidence interval (CI) is frequently interpreted to mean that, with 95% certainty, the true value of a parameter lies within the given range. The width of this interval can be used as the basis for clinical inference. For example, it shows whether the CI lies entirely beyond the minimal clinically important difference [21].
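
The comparison between a pooled CI and the minimal clinically important difference can be sketched as follows. This is a simplified illustration under stated assumptions: a fixed-effect inverse-variance meta-analysis, a normal approximation with z = 1.96, and a single threshold. GRADE's full assessment of imprecision involves further considerations. The function names and all numbers are hypothetical.

```python
import math


def pooled_ci(effects, std_errors, z=1.96):
    """Fixed-effect pooled estimate with an approximate 95% CI."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Standard error of the pooled estimate under the fixed-effect model.
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


def crosses_threshold(lower, upper, threshold):
    """True when the threshold lies strictly inside the interval."""
    return lower < threshold < upper


# Invented study results, for illustration only.
estimate, lower, upper = pooled_ci([0.5, 0.7], [0.1, 0.1])
print(f"pooled = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("crosses 0.5:", crosses_threshold(lower, upper, 0.5))
print("crosses 0.3:", crosses_threshold(lower, upper, 0.3))
```

When the interval crosses the threshold, the data are compatible with both an important and an unimportant effect, which is the kind of imprecision that would lower confidence in the estimate.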

The fifth GRADE criterion that can reduce confidence in the results of a systematic review is publication bias. The selective reporting of outcomes is a matter of the internal validity of a given study and should be assessed under the criterion “limitations for study quality”. When individual studies are not published, by contrast, the systematic review itself can be biased, and publication bias is therefore one of the potential sources of bias in systematic reviews. The obligatory registration of clinical trials has improved the chances of identifying publication bias [22].

We must recognize that the system has certain limitations. Firstly, the method was initially developed to answer questions on alternative interventions, especially for treatment or prevention, not questions of risk or prognosis, and it has problems with respect to diagnostic tests, public health issues and health care systems. However, in recent years, adaptations of the method have been designed for diagnostic [23] and prognostic studies [24], and these are now being used in systematic reviews. Secondly, although the system employs highly systematic, transparent and reproducible judgements, it does not completely eliminate disagreements in the evaluation of evidence or in the choice between alternative courses of action, given that every judgement has a subjective component. Finally, a number of researchers working on systematic reviews of complex interventions have identified difficulties in applying the GRADE criteria for the evaluation of quality [25].

1. G.H. Guyatt, S.L. Norris, S. Schulman, J. Hirsh, M.H. Eckman, E.A. Akl, et al. Methodology for the development of antithrombotic therapy and prevention of thrombosis guidelines: Antithrombotic Therapy and Prevention of Thrombosis, 9th ed: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines.
2. B.A. Lipsky, A.R. Berendt, P.B. Cornia, J.C. Pile, E.J.G. Peters, D.G. Armstrong, et al. 2012 Infectious Diseases Society of America clinical practice guideline for the diagnosis and treatment of diabetic foot infections. Clin Infect Dis, 54 (2012), pp. e132-e173
3. G. Raghu, B. Rochwerg, Y. Zhang, C.A.C. Garcia, A. Azuma, J. Behr, et al. An Official ATS/ERS/JRS/ALAT clinical practice guideline: treatment of idiopathic pulmonary fibrosis. An update of the 2011 clinical practice guideline. Am J Respir Crit Care Med, 192 (2015), pp. e3-e19
4. M.C. Hochberg, R.D. Altman, K.T. April, M. Benkhalti, G. Guyatt, J. McGowan, et al. American College of Rheumatology 2012 recommendations for the use of nonpharmacologic and pharmacologic therapies in osteoarthritis of the hand, hip, and knee. Arthritis Care Res (Hoboken), 64 (2012), pp. 465-474
5. C. Dejaco, Y.P. Singh, P. Perel, A. Hutchings, D. Camellino, S. Mackie, et al. 2015 recommendations for the management of polymyalgia rheumatica: a European League Against Rheumatism/American College of Rheumatology collaborative initiative. Ann Rheum Dis, 74 (2015), pp. 1799-1807
6. M.M. Ward, A. Deodhar, E.A. Akl, A. Lui, J. Ermann, L.S. Gensler, et al. American College of Rheumatology/Spondylitis Association of America/Spondyloarthritis Research and Treatment Network 2015. Recommendations for the treatment of ankylosing spondylitis and nonradiographic axial spondyloarthritis. Arthritis Rheumatol (Hoboken, NJ), 68 (2016), pp. 282-298
7. J.A. Singh, K.G. Saag, S.L.J. Bridges, E.A. Akl, R.R. Bannuru, M.C. Sullivan, et al. 2015 American College of Rheumatology guideline for the treatment of rheumatoid arthritis. Arthritis Rheumatol (Hoboken, NJ), 68 (2016), pp. 1-26
8. G.J. Macfarlane, C. Kronisch, L.E. Dean, F. Atzeni, W. Hauser, E. Fluss, et al. EULAR revised recommendations for the management of fibromyalgia. Ann Rheum Dis, 76 (2017), pp. 318-328
9. G. Guyatt, E.A. Akl, J. Hirsh, C. Kearon, M. Crowther, D. Gutterman, et al. The vexing problem of guidelines and conflict of interest: a potential solution.
10. G. Guyatt, A.D. Oxman, E.A. Akl, R. Kunz, G. Vist, J. Brozek, et al. GRADE guidelines. 1. Introduction—GRADE evidence profiles and summary of findings tables. J Clin Epidemiol, 64 (2011), pp. 383-394
11. G.H. Guyatt, A.D. Oxman, R. Kunz, D. Atkins, J. Brozek, G. Vist, et al. GRADE guidelines. 2. Framing the question and deciding on important outcomes. J Clin Epidemiol, 64 (2011), pp. 395-400
12. G.H. Guyatt, A.D. Oxman, N. Santesso, M. Helfand, G. Vist, R. Kunz, et al. GRADE guidelines. 12. Preparing summary of findings tables—binary outcomes. J Clin Epidemiol, 66 (2013), pp. 158-172
13. G.H. Guyatt, K. Thorlund, A.D. Oxman, S.D. Walter, D. Patrick, T.A. Furukawa, et al. GRADE guidelines. 13. Preparing summary of findings tables and evidence profiles—continuous outcomes. J Clin Epidemiol, 66 (2013), pp. 173-183
14. H. Balshem, M. Helfand, H.J. Schünemann, A.D. Oxman, R. Kunz, J. Brozek, et al. GRADE guidelines. 3. Rating the quality of evidence. J Clin Epidemiol, 64 (2011), pp. 401-406
15. G. Guyatt, A.D. Oxman, S. Sultan, J. Brozek, P. Glasziou, P. Alonso-Coello, et al. GRADE guidelines. 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J Clin Epidemiol, 66 (2013), pp. 151-157
16. G.H. Guyatt, A.D. Oxman, S. Sultan, P. Glasziou, E.A. Akl, P. Alonso-Coello, et al. GRADE guidelines. 9. Rating up the quality of evidence. J Clin Epidemiol, 64 (2011), pp. 1311-1316
17. G.H. Guyatt, A.D. Oxman, G. Vist, R. Kunz, J. Brozek, P. Alonso-Coello, et al. GRADE guidelines. 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol, 64 (2011), pp. 407-415
18. L. Wood, M. Egger, L.L. Gluud, K.F. Schulz, P. Juni, D.G. Altman, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study.
19. G.H. Guyatt, A.D. Oxman, R. Kunz, J. Woodcock, J. Brozek, M. Helfand, et al. GRADE guidelines. 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol, 64 (2011), pp. 1294-1302
20. G.H. Guyatt, A.D. Oxman, R. Kunz, J. Woodcock, J. Brozek, M. Helfand, et al. GRADE guidelines. 8. Rating the quality of evidence—indirectness. J Clin Epidemiol, 64 (2011), pp. 1303-1310
21. G.H. Guyatt, A.D. Oxman, R. Kunz, J. Brozek, P. Alonso-Coello, D. Rind, et al. GRADE guidelines. 6. Rating the quality of evidence—imprecision. J Clin Epidemiol, 64 (2011), pp. 1283-1293
22. G.H. Guyatt, A.D. Oxman, V. Montori, G. Vist, R. Kunz, J. Brozek, et al. GRADE guidelines. 5. Rating the quality of evidence—publication bias. J Clin Epidemiol, 64 (2011), pp. 1277-1282
23. H.J. Schünemann, A.D. Oxman, J. Brozek, P. Glasziou, R. Jaeschke, G.E. Vist, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies.
24. A. Huguet, J.A. Hayden, J. Stinson, P.J. McGrath, C.T. Chambers, M.E. Tougas, et al. Judging the quality of evidence in reviews of prognostic factor research: adapting the GRADE framework.
25. A. Movsisyan, G.J. Melendez-Torres, P. Montgomery. Users identified challenges in applying GRADE to complex interventions and suggested an extension to GRADE. J Clin Epidemiol, 70 (2016), pp. 191-199

Please cite this article as: Mendoza Pinto C, García Carrasco M. Sistema GRADE, evaluación sistemática y transparente. Reumatol Clin. 2018;14:65–67.

Copyright © 2017. Sociedad Española de Reumatología y Colegio Mexicano de Reumatología
Reumatología Clínica (English Edition)
