Standards of Evidence
Evidence: Something that furnishes proof or tends to furnish proof
Confusion exists around the term “evidence-based programs” because of multiple definitions across program registries and widespread misunderstanding of what is minimally required to demonstrate that a program is effective. Blueprints for Healthy Youth Development has created an Evidence Continuum graphic to clarify the level of evidence required for a program to be called evidence-based.
Evidence in support of the effectiveness of a program, practice, or policy falls on a continuum ranging from very low to very high levels of confidence. The more rigorous the research design of evaluations and the greater the number of positive evaluations, the greater confidence users can have that the intervention will reach its goal of helping youth.
Evidence with the lowest level of confidence is “opinion informed.” This includes information such as anecdotes, testimonials, and personal experiences obtained from a few individuals. A satisfaction survey is only a step above, as it still involves opinions about a program, even if based on a larger sample. This type of evidence, while useful in developing a program in its early stages, fails to examine targeted youth outcomes in a systematic way. It does not provide any real “proof” of effectiveness and ranks “very low” on the confidence continuum.
Research-informed studies rely on more than testimonials or professional insight by gathering data on youth outcomes from surveys, agency records, or other sources. They provide some evidence of effectiveness, but the level of confidence is “low.” The basic problem is that they do not isolate the impact of the program from other possible influences on targeted youth outcomes.
Correlational studies can reveal whether a relationship exists between a program and a desired outcome (i.e., a positive relationship, a negative relationship, or no relationship). However, demonstrating that a relationship exists does not prove that one variable caused the other. For example, a correlational study might show that being in a community-based treatment program was related to lower recidivism rates than being in a state correctional program, but this finding does not prove that the lower recidivism rate was due to being in the community program. The judge may be sentencing more serious offenders to the correctional program; those in the community program may have come from slightly better homes or schools or more positive peer groups, any of which could explain the difference in recidivism. Correlational studies cannot show that it was the program that actually caused the difference in observed outcomes.
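The sentencing example above can be made concrete with a small simulation. In this hypothetical sketch, an unmeasured trait (offense severity) drives both who the judge sends to the community program and who reoffends; the program itself has no effect at all, yet the raw comparison of recidivism rates makes the community program look effective. The variable names and probabilities are illustrative assumptions, not data from any real study.

```python
# Hypothetical sketch of confounding: offense severity drives both program
# assignment and recidivism, so a program with NO true effect still shows
# a large difference in raw recidivism rates.
import random

random.seed(0)

n = 10_000
community, correctional = [], []
for _ in range(n):
    severity = random.random()  # unmeasured offense severity, 0..1
    # Judges tend to send more serious offenders to the correctional program.
    in_community = random.random() > severity
    # Recidivism depends ONLY on severity -- the program has no effect.
    recidivates = random.random() < severity
    (community if in_community else correctional).append(recidivates)

print(f"community recidivism:    {sum(community) / len(community):.2f}")
print(f"correctional recidivism: {sum(correctional) / len(correctional):.2f}")
```

The community group ends up with a much lower recidivism rate solely because it received the less serious offenders, which is exactly the inference trap the paragraph describes.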
Other research-informed studies provide evidence of effectiveness by collecting survey data from program participants at posttest only or pretest and posttest. Because a control or comparison group is lacking, it is not clear that the program caused posttest outcomes or changes from pretest to posttest. Changes may well have occurred among similar subjects not going through the program. In other words, we cannot attribute the outcomes to the program, as the outcomes may have been produced by other factors.
Thus, research-informed studies lack an appropriate comparison group and evidence of a causal effect. These studies provide some preliminary support for a program that can help justify more rigorous experimental evaluation, but they rate low on the confidence continuum.
Experimental and Experimentally Proven (Evidence-Based Programs)
At the higher end of the continuum are “experimental” and “experimentally proven” studies. These comprise what are commonly referred to as “evidence-based programs (EBPs).” Virtually all web-based registries of EBPs require experimental evidence for certification as an EBP. All experimental studies use designs that involve comparison or control groups. If participants receiving the program have better outcomes than those in the comparison or control groups (that is, those not receiving the program), the program is likely having the intended effect (i.e., it is the cause of this effect). However, the levels of confidence and evidence of effectiveness attributed to experimental studies can vary from moderate to very high.
At the moderate range of confidence are a set of designs commonly called quasi-experimental designs (QEDs). The three identified on the graphic are the most frequently utilized designs of this type. These designs all lack the element of random assignment that characterizes randomized controlled trials (RCTs), and with it the assurance that the intervention and control groups are equivalent at the start of the study. Comparison groups may be matched on measured characteristics at the start of an evaluation study but may nonetheless differ on unmeasured characteristics.
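The advantage of random assignment can be illustrated by rerunning the earlier kind of confounded scenario with a coin flip deciding group membership. In this hypothetical sketch, the same unmeasured trait (offense severity) still drives recidivism, but because assignment is random it is balanced across the two groups, and a program with no true effect now correctly shows roughly equal outcomes. All names and probabilities are illustrative assumptions.

```python
# Hypothetical sketch of random assignment: the unmeasured "severity" trait
# is balanced across groups by the coin flip, so a null program shows no
# spurious effect.
import random

random.seed(1)

n = 10_000
treated, control = [], []
for _ in range(n):
    severity = random.random()                 # unmeasured characteristic
    in_program = random.random() < 0.5         # coin-flip (random) assignment
    recidivates = random.random() < severity   # program has no true effect
    (treated if in_program else control).append(recidivates)

print(f"program recidivism: {sum(treated) / len(treated):.2f}")
print(f"control recidivism: {sum(control) / len(control):.2f}")
```

With a large enough sample, the two rates converge; matching on measured characteristics, by contrast, could never guarantee this balance on a trait the evaluator did not measure.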