In recent years, the GRADE system (Grading of Recommendations Assessment, Development and Evaluation) has been used in published guidelines and recommendations, mostly concerning treatment with both drug and non-drug therapies, in different areas of medicine.1–3 Very recently, a number of clinical practice guidelines in rheumatology have employed this system, including guidelines on polymyalgia rheumatica, rheumatoid arthritis and fibromyalgia, among others.4–8
Evidence-based medicine facilitates headway in this uncertain terrain and aids clinical decision making. Evidence-based guidelines have undergone a profound transformation in recent years. There are a number of systems for evaluating scientific evidence and assigning grades of recommendation. However, these systems have certain drawbacks: they do not weigh risks against benefits, they do not take resource use or costs into account, and most of them were developed by consensus of expert opinion and have not been validated. Members of clinical practice guideline panels can hold strong opinions or academic biases concerning an area in which they have clinical experience, or other biases arising from their interactions with academic colleagues or coworkers in the industry.9 In an attempt to address these problems, GRADE requires systematic and pragmatic searches of the literature and a summary of the evidence, ideally based on pooled treatment effects and produced by panel members with no conflicts of interest or by independent methodologists. The latter helps to ensure an impartial and reproducible evaluation of the literature addressing a specific clinical question.
The GRADE working group is a multidisciplinary international collaboration of experts working to develop a common, transparent and sensible system for rating the quality of evidence and the strength of recommendations.10 The GRADE system has been adopted by more than 80 internationally recognized agencies and organizations, such as the World Health Organization (WHO), the Cochrane Collaboration, the United Kingdom National Institute for Health, and the National Institute for Health and Clinical Excellence (NICE), among others.
In the development of GRADE, the authors considered a wide range of clinical questions, including diagnosis, screening, prevention and treatment. To go from a generic clinical question to one formulated specifically enough to facilitate literature searches and the drafting of recommendations, the method referred to as Patients – Intervention – Comparison – Outcome (PICO) was preferred. When a clinical question is drawn up in PICO format, the issue is defined specifically and without ambiguity. Moreover, because each type of question corresponds to a study design suited to answering it, the format aids the literature search.11
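As a rough illustration of how a PICO question separates its four components, the sketch below stores them in a small Python structure and derives a one-sentence question and a naive search string from them. The class name, the method names, the AND-joined query format and the example question are all assumptions made for illustration; none of them comes from GRADE or from any bibliographic database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """Hypothetical container for the four PICO components."""
    patients: str
    intervention: str
    comparison: str
    outcome: str

    def as_sentence(self) -> str:
        # Spell the question out so that each component is explicit.
        return (
            f"In {self.patients}, does {self.intervention} compared with "
            f"{self.comparison} improve {self.outcome}?"
        )

    def search_query(self) -> str:
        # Naive query: every component quoted and joined with AND.
        # Real search strategies are far richer; this is only a sketch.
        terms = [self.patients, self.intervention, self.comparison, self.outcome]
        return " AND ".join(f'"{term}"' for term in terms)


# Illustrative question only, not taken from any guideline.
question = PicoQuestion(
    patients="adults with fibromyalgia",
    intervention="aerobic exercise",
    comparison="usual care",
    outcome="pain intensity",
)
print(question.as_sentence())
print(question.search_query())
```

Keeping the four components as separate fields makes it harder to leave one of them implicit, which is the ambiguity the PICO format is meant to remove.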
The advantages of the GRADE approach are that it: (a) carefully considers the relative importance of the outcome variables and selects those that are most significant; (b) provides detailed descriptions of the criteria for quality of evidence with respect to specific results or outcomes and uses explicit definitions and sequential judgements during the categorization process; (c) separates the quality of evidence from the strength of the recommendations; and (d) considers the balance between benefits and risks, patient values and resource use or costs. It also provides tables showing the so-called evidence profiles (EP) and summary of findings (SoF). Finally, software with associated help files has been developed that facilitates the development of EP and of SoF tables based on EP.12,13
The GRADE system comprises 8 criteria for evaluating the quality of evidence. Five of them can downgrade the quality of evidence, even in a randomized controlled trial (RCT): risk of bias, inconsistency of the results across studies, indirectness, imprecision and publication bias.14,15 For example, an RCT conducted with inadequate concealment of the allocation sequence and a high dropout rate should not be considered equivalent to a well-performed RCT. The other 3 criteria have the potential to increase confidence: a strong association without confounders, the existence of a dose–response gradient in studies free of bias or imprecision, and evidence that all plausible confounders or biases would have reduced the observed effect.16 Thus, an observational study such as a case–control study, whose design would in other circumstances suggest weak evidence, can under the GRADE approach provide evidence at the level of an RCT if it shows a strong association and a dose–response gradient. The GRADE method proposes 4 levels to express the quality of evidence: high, moderate, low and very low.14
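The way the 8 criteria move a body of evidence across the 4 levels can be sketched as simple arithmetic. The Python sketch below assumes the usual GRADE starting points (RCTs start at high, observational studies at low), which the text above does not state explicitly, and it treats each judgement as a number of levels to subtract or add. In practice these are structured judgements rather than a mechanical sum, and the function and criterion names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# The five criteria that can lower the rating and the three that can raise it.
DOWNGRADE_CRITERIA = {
    "risk_of_bias", "inconsistency", "indirectness",
    "imprecision", "publication_bias",
}
UPGRADE_CRITERIA = {
    "strong_association", "dose_response", "plausible_confounding",
}


def rate_quality(design, downgrades=None, upgrades=None):
    """Return a GRADE-style quality level (simplified sketch).

    design: "rct" or "observational".
    downgrades / upgrades: dicts mapping a criterion name to the number
    of levels (1 or 2) by which it lowers or raises the rating.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}

    unknown = (set(downgrades) - DOWNGRADE_CRITERIA) | (set(upgrades) - UPGRADE_CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    if design not in ("rct", "observational"):
        raise ValueError("design must be 'rct' or 'observational'")
    for levels in list(downgrades.values()) + list(upgrades.values()):
        if levels not in (1, 2):
            raise ValueError("each criterion moves the rating by 1 or 2 levels")

    # Assumed starting points: index 3 ("high") for RCTs, 1 ("low") otherwise.
    start = 3 if design == "rct" else 1
    score = start - sum(downgrades.values()) + sum(upgrades.values())

    # Clamp to the four available levels.
    score = max(0, min(len(LEVELS) - 1, score))
    return LEVELS[score]


# An RCT with serious risk of bias drops one level.
print(rate_quality("rct", downgrades={"risk_of_bias": 1}))

# A case-control study with a strong association and a dose-response gradient.
print(rate_quality("observational",
                   upgrades={"strong_association": 1, "dose_response": 1}))
```

Under these assumptions, the second call reproduces the case described above: an observational study raised by two levels ends at the same rating as a well-performed RCT.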
The first GRADE criterion, risk of bias or design limitations, is conceptually a matter of the internal validity of a scientific study. The degree of risk of bias can be determined by carefully reading the methods section of each original study and evaluating how well the authors planned and performed the study.17 There is evidence in medicine that methodological imperfections in an RCT can influence the estimated effect, which is usually exaggerated.18
The second criterion, inconsistency of the results across the studies included in a systematic review, signifies that the results deviate from one another, and this naturally leads to decreased confidence in the effect estimate. If the original studies are clinically homogeneous (responding to the question posed by the investigation) and the methodological quality is high, but the results are inconsistent, then statistical analyses will probably demonstrate that there is heterogeneity in the results.19
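One common way to quantify the statistical heterogeneity mentioned above is Cochran's Q together with the I² statistic. The text does not name a specific test, so the choice of these two statistics is an assumption, as are the function name and the example numbers in this minimal Python sketch.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled effect, Cochran's Q, I-squared in percent).

    Uses inverse-variance (fixed-effect) weights: w_i = 1 / se_i**2.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")

    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of total variation beyond what chance alone would give.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


# Made-up effect estimates and standard errors, for illustration only.
pooled, q, i2 = heterogeneity([0.10, 0.90], [0.10, 0.10])
print(f"pooled={pooled:.2f}  Q={q:.1f}  I2={i2:.1f}%")
```

A high I² on its own does not settle whether to downgrade for inconsistency; it only flags that the studies disagree by more than chance would explain.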
The third GRADE criterion, indirectness or absence of direct evidence, refers to any deviation of the studies included in the systematic review from the research question. When there are no direct comparisons between the interventions being considered, or when the available studies differ substantially from the population, interventions or outcomes put forward in the question of interest, we may find that we only have access to indirect information. Surrogate outcomes, for instance, may not be associated with the primary outcome. This may cause problems of applicability.20
The fourth GRADE criterion, imprecision, conceptually reflects the random variation in the estimate of the outcome and is distinct from internal validity. If the original studies in a systematic review are clinically homogeneous and all of them have a low risk of bias, it is appropriate to conduct a meta-analysis and obtain an overall estimate. The 95% confidence interval (CI) is frequently interpreted to signify that, with a certainty of 95%, the true value of a parameter can be found within the given range. The width of this interval can be used as the basis for clinical inference. For example, it makes it possible to conclude whether the CI lies beyond the minimal clinically important difference.21
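The comparison between a CI and the minimal clinically important difference (MCID) can be sketched in a few lines of Python. The sketch assumes a normally distributed estimate with a known standard error and an outcome scale on which larger values are better. The function names, the wording of the three verdicts and the example numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return estimate - z * std_error, estimate + z * std_error


def compare_with_mcid(estimate, std_error, mcid, level=0.95):
    """Describe where the CI lies relative to the MCID (larger is better)."""
    lower, upper = confidence_interval(estimate, std_error, level)
    if lower > mcid:
        return "CI entirely above the MCID"
    if upper < mcid:
        return "CI entirely below the MCID"
    return "CI crosses the MCID"


# Made-up pooled estimate and standard error, for illustration only.
lower, upper = confidence_interval(10.0, 1.0)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(compare_with_mcid(10.0, 1.0, mcid=5.0))
print(compare_with_mcid(10.0, 1.0, mcid=9.0))
```

An interval that crosses the MCID is compatible with both an important and an unimportant effect, which is the kind of situation in which imprecision may lower the rating.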
The fifth GRADE criterion, which can reduce the confidence in the results of a systematic review, is publication bias. The selective reporting of outcomes is a matter of the internal validity of a given study, and should be included in the criteria for “limitations for study quality”. When individual studies are not published, systematic reviews can be biased. Publication bias is therefore one of the potential sources of risk of bias in systematic reviews. The mandatory registration of clinical trials has enhanced the possibilities of identifying publication bias.22
We must recognize that the system has certain limitations. Firstly, the method was initially developed to respond to questions on alternative interventions, especially for treatment or prevention, not for risk or prognosis, and it has problems with respect to diagnostic tests, public health issues and health care systems. However, in recent years, adaptations of this method have been designed for diagnostic23 and prognostic studies,24 which are now being used in systematic reviews. Secondly, although the system employs highly systematic, transparent and reproducible judgements, it does not completely eliminate possible disagreements in the evaluation of evidence or in choosing between alternative courses of action, given that every judgement contains a subjective component. Finally, we should point out that a number of researchers analyzing complex systematic reviews have identified difficulties in applying the GRADE criteria for the evaluation of quality to complex interventions.25
Please cite this article as: Mendoza Pinto C, García Carrasco M. Sistema GRADE, evaluación sistemática y transparente. Reumatol Clin. 2018;14:65–67.