Promoting Alternative Thinking Strategies (PATHS)

A classroom-based social emotional learning program for elementary school students to reduce aggression and behavior problems.

Program Outcomes

Antisocial-aggressive Behavior
Conduct Problems
Externalizing

Program Type

Cognitive-Behavioral Training
School - Individual Strategies
Skills Training
Social Emotional Learning

Program Setting

School

Continuum of Intervention

Universal Prevention

Age

Late Childhood (5-11) - K/Elementary

Gender

Both

Race/Ethnicity

Endorsements

Blueprints: Promising
Crime Solutions: Effective
OJJDP Model Programs: Effective
SAMHSA: 2.6-3.2

Program Information Contact

For curriculum and materials:
PATHS® Program
Phone: 1-877-71PATHS or 1-877-717-2847
pathsprogram.com

For training:
PATHS® Training
pathstraining.com

Also see:

SEL Worldwide
selworldwide.org
Contact: Dorothy Morelli
dorothy@selw.org
dorothygm@hotmail.com
Phone: 615-364-6606

Program Developer/Owner

Mark Greenberg and Carol Kusché
Co-developers

Brief Description of the Program

The PATHS curriculum is a comprehensive program for promoting emotional and social competencies and reducing aggression and behavior problems in elementary school-aged children (grades K-6) while simultaneously enhancing the educational process in the classroom. The evaluation of the preschool version, called Head Start REDI, is treated separately by Blueprints.

The Grade Level PATHS Curriculum consists of separate volumes of lessons for each grade level (K - 6), all of which include developmentally appropriate pictures, photographs, posters, and additional materials. Five conceptual domains, integrated in a hierarchical manner, are included in PATHS lessons at each grade level: self-control, emotional understanding, positive self-esteem, relationships, and interpersonal problem-solving skills. Throughout the lessons, a critical focus of PATHS involves facilitating the dynamic relationship between cognitive-affective understanding and real-life situations. PATHS is designed to be taught two to three times per week (or more often if desired, but not less than twice weekly), with daily activities to promote generalization and support ongoing behavior. PATHS lessons follow lesson objectives and provide scripts to facilitate instruction, but teachers have flexibility in adapting these for their particular classroom needs. Although each unit of PATHS focuses on one or more skill domains (e.g., emotional recognition, friendship, self-control, problem solving), aspects of all five major areas are integrated into each unit. Moreover, each unit builds hierarchically upon and synthesizes the learning which preceded it.

The PATHS curriculum is designed to be used by educators and counselors in a multi-year, universal prevention model. To encourage parent involvement and support, parent letters, home activity assignments, and information are also provided.

PATHS is now available by grade level in the following grades: Kindergarten, Grade 1, Grade 2, Grade 3, Grade 4, and Grade 5/6. The original multi-year version is also available from the publisher. The grade level versions maintain all key elements of the original version and now organize them more discretely by grade levels. The preschool version of the program, called Head Start REDI, is treated separately by Blueprints.

PATHS targets five major conceptual domains: (1) self-control; (2) emotional understanding; (3) positive self-esteem; (4) relationships; and (5) interpersonal problem-solving skills. In addition, an optional 30-lesson supplementary unit reviews and extends PATHS concepts that are covered in other units.

The PATHS curriculum is designed for use by regular classroom teachers. Lessons are sequenced according to increasing developmental difficulty and designed for implementation in approximately 20-30 minutes 2 to 3 times per week. The curriculum provides detailed lesson plans, exact scripts, suggested guidelines, and general and specific objectives for each lesson. However, the curriculum has considerable flexibility so that it can also be integrated with an individual teacher's style. Lessons include such activities as dialoguing, role-playing, story-telling by teachers and peers, social and self-reinforcement, attribution training, and verbal mediation. Learning is promoted in a multi-method manner through the combined use of visual, verbal, and kinesthetic modalities.

Outcomes

Primary Evidence Base for Certification

Study 7 (Malti et al., 2012) found that by the beginning of grade 5 (just over two years after program commencement), the intervention condition, relative to a control group, showed:

Fewer externalizing behaviors (e.g., aggression)
A reduction in ADHD symptoms

Brief Evaluation Methodology

Primary Evidence Base for Certification

Of the 19 studies Blueprints has reviewed, one (Study 7) meets Blueprints evidentiary standards (specificity, evaluation quality, impact, dissemination readiness). Study 7 was conducted by independent evaluators.

Study 7

Malti et al. (2011, 2012) and Averdijk et al. (2016) used a cluster randomized controlled trial with 56 public primary schools in Zurich assigned to four conditions: PATHS (n = 442), Triple-P (n = 422), PATHS+Triple-P (n = 397), and control (n = 414). The program was implemented with grades K-2, and students were followed for eight years, from ages 7-8 through age 11 (grade 4; Malti et al., 2011, 2012) and then to ages 13 and 15 (Averdijk et al., 2016). Assessments measured externalizing behavior and social competence (Malti et al., 2011, 2012) and delinquency (Averdijk et al., 2016).

Blueprints Certified Studies

Study 7

Malti, T., Ribeaud, D., & Eisner, M. (2012). Effectiveness of a universal school-based social competence program: The role of child characteristics and economic factors. International Journal of Conflict and Violence, 6, 249-259.

Risk and Protective Factors

Risk Factors

Individual: Antisocial/aggressive behavior*, Early initiation of antisocial behavior, Favorable attitudes towards antisocial behavior, Hyperactivity*

School: Low school commitment and attachment, Repeated a grade

Protective Factors

Individual: Clear standards for behavior, Problem solving skills, Prosocial behavior, Skills for social interaction

Peer: Interaction with prosocial peers

School: Opportunities for prosocial involvement in education, Rewards for prosocial involvement in school

* Risk/Protective Factor was significantly impacted by the program

Subgroup Analysis Details

Subgroup differences in program effects by race, ethnicity, or gender (coded in binary terms as male/female) or program effects for a sample of a specific racial, ethnic, or gender group:

Study 7 (Malti et al., 2011, 2012) tested for subgroup differences in program effects and found equal benefits for boys and girls and for parents differing in family economic disadvantage (i.e., reported having or not having financial difficulties).

Sample demographics including race, ethnicity, and gender for Blueprints-certified studies:

The Study 7 sample was 47.5% female. About 16.5% of the parents reported having financial difficulties.

Training and Technical Assistance

PATHS program training is usually done on site at a school or school district. The initial training workshop consists of two separate days scheduled approximately 4-8 weeks apart. The first day provides teachers/trainees with theory, research background, lessons modeled by the trainer, practice to prepare teachers to use PATHS lessons, and implementation planning. During the 4-8 week period prior to the second day of training, teachers gain initial experience with the curriculum. This leads to a more interactive learning experience on the second workshop day since teachers have had some realistic experiences with lesson implementation. Trainer and teachers discuss advanced curriculum issues, trade ideas and engage in problem solving, and teachers model interactive lessons. Another option is to schedule training for two consecutive days.

For optimal implementation, sites should consider additional training/technical assistance activities each year. Ongoing consultation and booster visits are available and are often desired by comprehensive, long-term implementations. The trainer can provide a booster visit each year (one day in length) to meet with the staff and provide continued professional development. One day of fidelity visits is another option, in which the trainer visits schools, observes lessons, etc. The trainer can also provide ongoing consultation by means of regularly scheduled phone calls/conference calls and on-call email consultation with the school's or agency's PATHS coordinator.

In addition to training for teachers, when a multi-school site implementation is conducted, separate training workshops are also provided to school principals on issues in building-wide use and principal leadership. Additional trainings can be arranged for other school staff.

Training for PATHS coaches (a position often utilized by larger implementations to provide feedback, ideas, and encouragement to classroom teachers implementing the PATHS program) typically involves six on-site trainer visits per year for training, observation, and continued professional development in social-emotional learning. Between on-site training sessions, team conference calls typically take place every other week, with everyone checking in to engage in problem-solving and receive additional professional development.

Training and technical assistance is available from two sources:

PATHS™ Education Worldwide
Dorothy Morelli, CEO
615-364-6606
dorothygm@hotmail.com

Carol A. Kusché, Ph.D.
PATHS® Training LLC
927 10th Ave. East
Seattle, WA 98102
206-323-6688
ckusche@comcast.net

Training Certification Process

The PATHS Training Program is designed to develop highly experienced, high-quality trainers who are fully competent to provide training in the PATHS Curriculum to their local educational entity. Trainers can include staff (teachers, support staff, staff developers) from local school districts/boards, Local Education Agencies (LEAs), and non-profit agencies focused on the promotion of children's mental health and youth development. PATHS Training LLC trains these qualified "educators" to conduct school-based or regional workshops that prepare teachers and school support staff who plan to implement PATHS Curricula within these educational entities. Once certified, PATHS Trainers conduct workshops and provide follow-up technical assistance and coaching services for their district or regional personnel in accordance with the PATHS workshop training materials, agenda, and guidelines.

To be considered as an Affiliate Trainer (AT), a candidate must meet the following prerequisites:

High-quality performance for at least two years as a PATHS teacher or PATHS coach
Master's degree (or comparable credentials)
Classroom experience with students in a learner role (teaching, administration, and school counseling preferred)
Training experience with educators

After meeting the prerequisites above, candidates must complete the following four-step training/certification process to be certified as a trainer. The AT candidate(s) receive four days of coaching from a PATHS Senior Trainer in addition to participating in an Observation Workshop and two Shared Workshops. The first day of coaching follows the Observation Workshop. The second day precedes the first Shared Workshop. The third day follows the first Shared Workshop in preparation for the second Shared Workshop. The fourth day follows the second Shared Workshop in preparation for certification as a PATHS trainer. The primary purpose of the coaching days is to provide detailed and personalized instruction in how to conduct the PATHS workshop and to observe and provide feedback on candidates' training skills. Candidates who successfully complete the program are certified as Affiliate Trainers.

Benefits and Costs

Program Benefits (per individual): $10,772
Program Costs (per individual): $439
Net Present Value (Benefits minus Costs, per individual): $10,332
Measured Risk (odds of a positive Net Present Value): 63%

Source: Washington State Institute for Public Policy
All benefit-cost ratios are the most recent estimates published by the Washington State Institute for Public Policy (WSIPP) for Blueprints programs implemented in Washington State. These ratios are based on a) meta-analysis estimates of effect size and b) monetized benefits and calculated costs for programs as delivered in the State of Washington. Caution is recommended in applying these estimates of the benefit-cost ratio to any other state or local area. They are provided as an illustration of the benefit-cost ratio found in one specific state. When feasible, local costs and monetized benefits should be used to calculate expected local benefit-cost ratios. The formula for this calculation can be found on the WSIPP website.
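As a rough illustration of how the per-individual figures above relate to one another, the following Python sketch recomputes the net present value and a benefit-cost ratio from the published benefits and costs. The function names are hypothetical, and the ratio itself is not stated in this summary; it is derived here only from the rounded figures. The recomputed difference comes to $10,333, one dollar more than the published $10,332, which presumably reflects rounding in the published figures (an assumption, not something the source states). This is not the WSIPP formula, which is documented on the WSIPP website.

```python
# Minimal sketch: recompute net present value and benefit-cost ratio
# from the rounded per-individual figures reported above.
# Function names are illustrative, not part of any WSIPP tool.

def net_present_value(benefits: float, costs: float) -> float:
    """Benefits minus costs, per individual."""
    return benefits - costs


def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """Dollars of benefit per dollar of cost."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return benefits / costs


# Published per-individual figures (already rounded to whole dollars).
BENEFITS = 10_772
COSTS = 439

npv = net_present_value(BENEFITS, COSTS)
ratio = benefit_cost_ratio(BENEFITS, COSTS)

print(f"Net present value: ${npv:,.0f}")
print(f"Benefit-cost ratio: {ratio:.2f} to 1")
```

A site using local costs and monetized benefits, as recommended above, would substitute its own values for `BENEFITS` and `COSTS`.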

Start-Up Costs

Initial Training and Technical Assistance

$4,000 plus trainer travel costs for the initial two-day teacher training for up to 40 teachers. There is usually another day, at $2,000, for training setup and a meeting with the school administration.

Curriculum and Materials

$350 to $600 per classroom, depending on the grade level.

Materials Available in Other Languages: Parent and home materials have been translated into Spanish and are free of charge with the curriculum.

PATHS is available in the following languages:
German
British English (http://www.pathseducation.co.uk/what-is-paths-across-the-uk)
Croatian (some grades)
Chinese (some grades)
Swedish (preK and K)
Dutch
Welsh
Portuguese
French (under development as of 9/20/17)
For these translations, interested persons can contact Mark Greenberg or Channing-Bete.

Licensing

None.

Other Start-Up Costs

None.

Intervention Implementation Costs

Ongoing Curriculum and Materials

$100 per year per classroom for photocopying activity sheets, poster replacement and books.

Staffing

Qualifications: None required but typically delivered by classroom teachers or school counselors.

Ratios: None required. Program designed for classroom delivery with typical classroom ratios of 15 - 25 students per teacher, depending on grade level.

Time to Deliver Intervention: Curriculum is taught two to three times per week in lessons of approximately 20-30 minutes and ideally should be taught throughout the school year from kindergarten through grade five.

Other Implementation Costs

A local coach is recommended for at least the first year. Coaches are usually teachers with special PATHS training. Whether a full- or part-time coach is needed depends upon how many teachers need the support.

Implementation Support and Fidelity Monitoring Costs

Ongoing Training and Technical Assistance

Technical assistance by email and phone is available from PATHS Training, LLC. While coaches are used, funds should be budgeted for annual site visits by national trainers at a cost of $4,000 plus travel.

Fidelity Monitoring and Evaluation

The local coach takes the lead in fidelity monitoring. If a site does not have a coach, a local coordinator responsible for fidelity monitoring should be designated.

Ongoing License Fees

None.

Other Implementation Support and Fidelity Monitoring Costs

No information is available

Other Cost Considerations

The size of implementation is key to lowering costs. Training many teachers at one time is most cost effective.

Year One Cost Example

This example covers implementing PATHS in two elementary schools with 20 teachers and their classes of 25 students each. Schools can expect to incur the following costs:

Training, with planning day, on-site for 2 days for 20 teachers and coach	$6,500.00
Salary for Coach .5 FTE	$30,000.00
Training and Support for Coach	$8,000.00
Curriculum for 20 classrooms estimated @ $400	$8,000.00
Supplies @ $100 per classroom	$2,000.00
Booster Visit: one two-day visit @ $2,000/day plus travel	$5,000.00
Total One Year Cost	$59,500.00

With 500 students participating, the cost per student is $119.
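The arithmetic behind this example can be checked, and adapted to a different number of classrooms, with a short Python sketch like the one below. The function name and parameters are hypothetical. Treating training, coaching, and booster costs as fixed amounts that do not change with the number of teachers is a simplifying assumption made here for illustration; actual quotes would come from the trainers.

```python
# Minimal sketch of the Year One Cost Example arithmetic.
# Fixed costs are taken from the example above; holding them constant
# as the number of classrooms changes is an assumption of this sketch.

def year_one_cost(
    n_classrooms: int = 20,
    students_per_class: int = 25,
    curriculum_per_classroom: float = 400.0,
    supplies_per_classroom: float = 100.0,
    teacher_training: float = 6_500.0,   # two training days plus planning day
    coach_salary: float = 30_000.0,      # 0.5 FTE
    coach_training: float = 8_000.0,
    booster_visit: float = 5_000.0,      # two-day visit plus travel
) -> tuple[float, int, float]:
    """Return (total cost, number of students, cost per student)."""
    per_classroom = curriculum_per_classroom + supplies_per_classroom
    total = (
        teacher_training
        + coach_salary
        + coach_training
        + booster_visit
        + n_classrooms * per_classroom
    )
    students = n_classrooms * students_per_class
    return total, students, total / students


total, students, per_student = year_one_cost()
print(f"Total: ${total:,.2f} for {students} students "
      f"(${per_student:,.2f} per student)")
```

With the default values, the sketch reproduces the $59,500 total and $119 per student shown above.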

Funding Overview

It is relatively inexpensive to get PATHS started in schools, with districts only needing to identify funds for initial training and curriculum purchase. To be most effective, the ongoing implementation of PATHS requires a relatively significant commitment of classroom time in grades K-5. District and school administrators must view the development of social and emotional competence and reduction of disruptive behavior as a priority in order to commit the time.

Funding Strategies

Improving the Use of Existing Public Funds

Sustaining this program requires the ongoing allocation of existing classroom teaching time for the intervention to be delivered by teachers or counselors. To the extent that existing interventions in schools aimed at fostering the development of social and emotional competence and the reduction of disruptive behavior are not evidence-based, funding for these interventions can be considered for re-direction to PATHS.

Allocating State or Local General Funds

State and local funds, most typically from school budgets, are often allocated to purchase the initial training and curriculum. State departments of education or health may also allocate state funds toward prevention programs, and administer them to school districts competitively or through formula. Some states have put in place legislative set-asides requiring a certain portion of state agency budgets be dedicated to evidence-based programs and/or prevention programs.

Maximizing Federal Funds

Formula Funds:

Title I can potentially support curriculum purchase, training, and teacher salaries in schools that are operating schoolwide Title I programs (at least 40% of the student population is eligible for free or reduced-price lunch). In order for Title I funds to be allocated, PATHS would have to be viewed as contributing to overall academic achievement.
Office of Juvenile Justice and Delinquency Prevention (OJJDP) Formula Funds support a variety of improvements to delinquency prevention programs and juvenile justice programs in states. Evidence-based programs are an explicit priority for these funds, which are typically administered on a competitive basis from the administering state agency to community-based programs.
The Mental Health Services Block Grant (MHSBG) can fund a variety of mental health promotion and intervention activities and is a potential source of support for school-based mental health promotion programs, depending on the priorities of the administering state agency.

Discretionary Grants: There are relevant federal discretionary grants administered by SAMHSA (Department of Health and Human Services), OJJDP (Department of Justice), and the Department of Education that could support the PATHS program.

Foundation Grants and Public-Private Partnerships

Although inexpensive, the initial training and curriculum purchases may still be prohibitive for some districts interested in implementing the program. A public-private partnership, in which private foundations or local education funds pay for initial training and curriculum and schools agree to commit staff time to implementation, can be an effective approach for financing PATHS.

Generating New Revenue

New revenue streams are not typically created for this program, though the program is so low-cost that interested schools could potentially consider community fundraising through Parent Teacher Associations, student civic societies, or partnerships with local businesses and civic organizations as a means of raising dollars to support the initial training and curriculum purchases.

Data Sources

All information comes from the responses to a questionnaire submitted by the developer of the program, Mark Greenberg, to the Annie E. Casey Foundation.

Program Developer/Owner

Mark Greenberg and Carol Kusché
Co-developers
Prevention Research Center
Penn State University
University Park, PA 16802-6504
(814) 863-0112
mxg47@psu.edu
ckusche@comcast.net

Program Outcomes

Antisocial-aggressive Behavior
Conduct Problems
Externalizing

Program Specifics

Program Type

Cognitive-Behavioral Training
School - Individual Strategies
Skills Training
Social Emotional Learning

Program Setting

School

Continuum of Intervention

Universal Prevention

Program Goals

A classroom-based social emotional learning program for elementary school students to reduce aggression and behavior problems.

Population Demographics

PATHS is implemented with elementary school age youth (grades K-6). A modified version to be age-appropriate for preschool students (called Head Start REDI) is treated separately by Blueprints. PATHS has been shown to be effective for both males and females, different ethnic and socio-demographic populations, and a wide variety of populations, including students in regular education and special needs settings.

Target Population

Age

Late Childhood (5-11) - K/Elementary

Gender

Both

Race/Ethnicity

Other Risk and Protective Factors

Risk: poor self-control, lack of commitment to school, favorable attitudes toward problem behavior and early initiation, impulsiveness, and peer rejection.

Protective: prosocial orientation, positive peer relations, bonding to school.

Risk/Protective Factor Domain

Individual
School
Peer

Brief Description of the Program

Description of the Program

Theoretical Rationale

PATHS incorporates seven factors considered critical for effective, school-based SEL curricula. These include:

an integration of a variety of successful approaches and promising theories
a developmental model, including neuropsychological brain development
a multi-grade level paradigm
a strong focus on the role of emotions and emotional development
generalization of skills to everyday situations
ongoing training and support for implementation
multiple measures of both process and outcome for assessing program effectiveness

PATHS is based on five conceptual models. The first, the ABCD (Affective-Behavioral-Cognitive-Dynamic) Model of Development focuses on the promotion of optimal developmental growth for each individual. The ABCD model places primary importance on the developmental integration of affect (i.e., emotion, feeling, mood) and emotion language, behavior, and cognitive understanding to promote social and emotional competence. The second model incorporates an eco-behavioral systems orientation and emphasizes the manner in which the teacher uses the curriculum model and generalizes the skills to build a healthy classroom atmosphere (i.e., one that supports the children's use and internalization of the material they have been taught). The third model involves the domains of neurobiology and brain structuralization/organization. PATHS incorporates strategies to optimize the nature and quality of teacher-child and peer-peer interactions that are likely to impact brain development as well as learning. The fourth paradigm involves psychodynamic education (derived from Developmental Psychodynamic Theory) which aims to coordinate social, emotional, and cognitive growth. Finally, the fifth model includes psychological issues related to emotional awareness, or as it is more popularly labeled, emotional intelligence. As such, a central focus of PATHS is encouraging children to discuss feelings, experiences, opinions, and needs that are personally meaningful, and making them feel listened to, supported, and respected by both teachers and peers. As a result, the internalization of feeling valued, cared for, appreciated, and part of a social group is facilitated, which, in turn, motivates children to value, care for, and appreciate themselves, their environment, their social groups, other people, and their world.

Theoretical Orientation

Skill Oriented
Cognitive Behavioral
Biological - Neurobiological
Self Efficacy
Social Learning

Outcomes (Brief, over all studies)

Primary Evidence Base for Certification

Study 7

Malti et al. (2012) found that, relative to students in the control schools, students in the intervention schools had significantly fewer externalizing behaviors (e.g., aggression) and ADHD symptoms by the beginning of grade 5 (more than two years after program commencement).

Effect Size

In Study 7 (Malti et al., 2012), the effect sizes of significant results were small to moderate. Specifically, compared to a control group, PATHS significantly reduced aggressive behavior (effect size = 0.42) and ADHD symptoms (effect size = 0.46).

Generalizability

One study meets Blueprints standards for high-quality methods with strong evidence of program impact (i.e., "certified" by Blueprints): Study 7 (Malti et al., 2011, 2012; Averdijk et al., 2016). The study examined a sample of elementary schools in Zurich, Switzerland, in which the treatment group was compared to a business-as-usual control group.

Potential Limitations

Additional Studies (not certified by Blueprints)

Studies 1-2 (Greenberg et al., 1995; Riggs et al., 2006)

Sample attrition in the first year of implementation was high and reduced the sample size significantly, thus reducing the power to accurately detect differences. High levels of student mobility further limited comparisons between students receiving one or two years of the intervention. No analysis of differential attrition or mobility was conducted for the full sample (although this was done for the regular classroom subsample), which would further inform the interpretations of the analyses.

Greenberg, M. T., Kusché, C. A., Cook, E. T., & Quamma, J. P. (1995). Promoting emotional competence in school-aged children: The effects of the PATHS curriculum. Development and Psychopathology, 7, 117-136.

Riggs, N. R., Greenberg, M. T., Kusché, C. A., & Pentz, M. A. (2006). The mediational role of neurocognition in the behavioral outcomes of a social-emotional prevention program in elementary school students: Effects of the PATHS curriculum. Prevention Science, 7, 91-102.

Studies 3-6 (Kam et al., 2004; Greenberg & Kusché, 1998; Kam et al., 2003; Curtis & Norgate, 2007)

Small sample sizes within treatment groups make it difficult to generalize the outcomes to larger, more diverse populations. No analyses of differential attrition were performed.

Kam, C., Greenberg, M. T., & Kusché, C. A. (2004). Sustained effects of the PATHS curriculum on the social and psychological adjustment of children in special education. Journal of Emotional and Behavioral Disorders, 12, 66-78.

Greenberg, M. T., & Kusché, C. A. (1998). Preventive intervention for school-aged deaf children: The PATHS curriculum. Journal of Deaf Studies and Deaf Education, 3, 49-63.

Kam, C., Greenberg, M. T., & Walls, C. T. (2003). Examining the role of implementation quality in school-based prevention using PATHS curriculum. Prevention Science, 4, 55-63.

Curtis, C., & Norgate, R. (2007). An evaluation of the Promoting Alternative Thinking Strategies curriculum at key stage 1. Educational Psychology in Practice, 23, 33-44.

Study 8 (Seifert et al., 2004)

No pretest assessment, assessment of baseline equivalence, or information on attrition
The comparisons across cohorts may be confounded by time
Only outcomes based on interviewer ratings reached significance, not outcomes based on child self-reports
Interviewers rating children likely were not blinded to the condition
Reports of teacher dissatisfaction with the program suggest implementation problems

Seifert, R., Gouley, K., Miller, A. L., & Zabriski, A. (2004). Implementation of the PATHS curriculum in an urban elementary school. Early Education & Development, 15(4), 471-486.

Study 9 (Bierman et al., 2010)

A concurrent intervention for high-risk students meant that the sample excluded the worst behaving students and that the other ongoing intervention might have influenced the program outcomes
Baseline tests for equivalence compared schools but not children
Teachers who delivered the intervention also did ratings of classroom children, and results proved stronger for teacher ratings than child ratings
Attrition was high because the study was limited to children who had stayed in the same school for all three years, and differential attrition was apparent on several baseline measures
Contrary to intent-to-treat procedures, only students who participated in the program for all three years were followed and used in the analysis

Bierman, K. L., Coie, J. D., Dodge, K. A., Greenberg, M. T., Lochman, J. E., McMahon, R. J., & Pinderhughes, E. (2010). The effects of a multiyear universal social-emotional learning program: The role of student and school characteristics. Journal of Consulting and Clinical Psychology, 78(2), 156.

Study 10 (Crean & Johnson, 2013; SCDRC, 2010)

Teachers who delivered the program also provided some student measures
Models adjusted for clustering but may have too few schools to obtain reliable estimates
No effects on independent behavioral outcomes
Some evidence of iatrogenic effects on conduct problems in first two years of the program

Crean, H. F., & Johnson, D. B. (2013). Promoting Alternative THinking Strategies (PATHS) and elementary school aged children's aggression: Results from a cluster randomized trial. American Journal of Community Psychology, 52, 56-72.

Social and Character Development Research Consortium (SCDRC) (2010). Efficacy of schoolwide programs to promote social and character development and reduce problem behavior in elementary school children (NCER 2011-2001). Washington, DC: National Center for Education Research, Institute of Education Sciences, U.S. Department of Education.

Study 11 (Little et al., 2012; Berry et al., 2016)

Randomized 64 schools but eight dropped out before baseline
Most measures of children came from teachers who delivered the program
Measures validated by others but no study-specific figures
Incomplete tests for differential attrition
No significant effects on behavioral outcomes

Little, M., Berry, V., Morpeth, L., Blower, S., Axford, N., Taylor, R., . . . Tobin, K. (2012). The impact of three evidence-based programmes delivered in public systems in Birmingham, UK. International Journal of Conflict and Violence, 6(2), 260-272.

Berry, V., Axford, N., Blower, S., Taylor, R. S., Edwards, R. T., Tobin, K., . . . Bywater, T. (2016). The effectiveness and micro-costing analysis of a universal, school-based, social-emotional learning programme in the UK: A cluster-randomised controlled trial. School Mental Health, 8, 238-256.

Study 12 (Schonfeld et al., 2015)

Not an intent-to-treat study - excluded those not participating in all four program years
No tests for baseline equivalence of outcomes
No controls for baseline scores
Tests for differential attrition incomplete
Sample from one large, urban school district

Schonfeld, D. J., Adams, R. E., Fredstrom, B. K., Weissberg, R. P., Gilman, R., Voyce, C., ... & Speese-Linehan, D. (2015). Cluster-randomized trial demonstrating impact on academic achievement of elementary social-emotional learning. School Psychology Quarterly, 30(3), 406.

Study 13 (Fishbein et al., 2016)

No information on student-level attrition
No information on reliability/validity provided for 8 of the 14 outcome measures
Teachers who delivered the program also completed the assessments (with effects in favor of the treatment on 13 of the 13 teacher-rated measures)
There was an effect on 15 out of 23 measures, but only 2 of these effects were assessed using independent measures (and it still wasn't clear whether those collecting these data were blind to condition)
Small sample size (n = 4 schools, and schools were the unit of assignment)
Incorrect level of analysis with no adjustment for unit of randomization (schools)

Fishbein, D. H., Domitrovich, C., Williams, J., Gitukui, S., Guthrie, C., Shapiro, D., & Greenberg, M. (2016). Short-term intervention effects of the PATHS curriculum in young low-income children: Capitalizing on plasticity. Journal of Primary Prevention, 37, 493-511.

Study 14 (Goossens et al., 2012)

Non-random assignment of schools
Many student measures came from teachers who delivered the program
Adjusted for clustering, but the sample of 18 clusters may be too small for reliable estimates
Tests for baseline equivalence showed many differences
No reliable program effects

Goossens, F., Gooren, E., Orobio de Castro, B., Overveld, K., Buijs, G., Monshouwer, K., … Paulussen, T. (2012). Implementation of PATHS through Dutch municipal health services: A quasi-experiment. International Journal of Conflict and Violence, 6, 234-248.

Study 15 (David, 2014)

Non-random assignment of schools with only one control school
Some student measures came from teachers who delivered the program
Low reliabilities for some measures
Incorrect level of analysis
No tests for differential attrition
No significant program effects
Small sample of only three schools

David, M. D. (2014). The effect of Promoting Alternative THinking Strategies on social competence and reading achievement in elementary school children. Master's Thesis. Halifax, Nova Scotia: Mount Saint Vincent University.

Study 16 (Barlow et al., 2015; Hennessey & Humphrey, 2020; Humphrey et al., 2016; Humphrey, Barlow, & Lendrum, 2018; Humphrey, Hennessey et al., 2018; Panayiotou et al., 2020)

Some posttest child measures provided by teachers who delivered the program
Evidence of differential attrition
Few ITT effects at posttest, though stronger effects in QED complier analysis
No program effects at long-term follow-up

Barlow, A., Wigelsworth, M., Lendrum, A., Pert, K., Joyce, C., Stephens, E., . . . Humphrey, N. (2015). Promoting Alternative Thinking Strategies (PATHS): Evaluation report and executive summary. The Education Endowment Fund. Available online: https://files.eric.ed.gov/fulltext/ED581278.pdf.

Hennessey, A., & Humphrey, N. (2020). Can social and emotional learning improve children's academic progress? Findings from a randomised controlled trial of the Promoting Alternative Thinking Strategies (PATHS) curriculum. European Journal of Psychology of Education, 35(4), 751-774.

Humphrey, N., Barlow, A., & Lendrum, A. (2018). Quality matters: Implementation moderates student outcomes in the PATHS curriculum. Prevention Science, 19, 197-208.

Humphrey, N., Barlow, A., Wigelsworth, M., Lendrum, A., Pert, K., Joyce, C., . . . Turner, A. (2016). A cluster randomized controlled trial of the Promoting Alternative Thinking Strategies (PATHS) curriculum. Journal of School Psychology, 58, 73-89.

Humphrey, N., Hennessey, A., Lendrum, A., Wigelsworth, M., Turner, A., Panayiotou, M., . . . Calam, R. (2018). The PATHS curriculum for promoting social and emotional well-being among children aged 7-9 years: A cluster RCT. Public Health Research, 6(10), 1-116.

Panayiotou, M., Humphrey, N., & Hennessey, A. (2020). Implementation matters: Using complier average causal effect estimation to determine the impact of the Promoting Alternative Thinking Strategies (PATHS) curriculum on children's quality of life. Journal of Educational Psychology, 112(2), 236-253.

Study 17 (Novak et al., 2017)

Teachers who delivered the program provided all child measures
Adjusted for clustering within classrooms but not within schools, the unit of randomization
No significant baseline differences for outcomes but no tests for sociodemographic characteristics
No significant main effects at posttest, only effects for a low-risk subgroup

Novak, M., Mihic, J., Bašic, J., & Nix, R. L. (2017). PATHS in Croatia: A school-based randomised-controlled trial of a social and emotional learning curriculum. International Journal of Psychology, 52(2), 87-95.

Study 18 (Hindley & Reed, 1999)

Non-random assignment of schools/units (n = 7)
Some measures came from teachers who delivered the program
No reliability or validity information
Unclear if used intent-to-treat sample
Incorrect level of analysis
Incomplete tests for baseline equivalence
No tests for differential attrition

Hindley, P., & Reed, H. (1999). Promoting alternative thinking strategies: Mental health promotion with deaf children in school. In S. Decker, S. Kirby, A. Greenwood, & D. Moore (Eds.), Taking children seriously (pp. 113-130). London: Cassel Publications.

Study 19 (Ross, Sheard et al., 2011; Ross, Cheung et al., 2011)

Cluster RCT but one of 13 schools dropped out right after assignment and comparison schools adopted the program before the posttest
Teachers who delivered programs provided behavioral measures of children
Little information on reliability and validity of measures
Did not attempt to follow the oldest students after they left middle school
Incorrect level of analysis
Baseline controls not always used
Some baseline differences between conditions
Incomplete tests for differential attrition
No significant effects on independently measured behavioral outcomes

Ross, S. M., Cheung, A., Slavin, R., Sheard, M. K., & Elliott, L. (2011). Promoting primary pupils' social-emotional learning and pro-social behaviour: Longitudinal evaluation of the Together 4 All Programme in Northern Ireland. Effective Education, 3, 61-81.

Ross, S. M., Sheard, M. K., Slavin, R., Elliott, L., Cheung, A., Hanley, P., & Tracey, L. (2011). Evaluation of Together 4 All programme for schools. Institute for Effective Education, The University of York.

Notes

A preschool version of PATHS called Head Start REDI is treated as a separate program in Blueprints.

Peer Implementation Sites

Denine Goolsby
Executive Director Humanware
Cleveland Public Schools
1111 Superior Avenue
Cleveland, OH 44114
PH: 216-838-0107

Flavia Hernandez, Principal
McCormick Elementary School
Chicago Public Schools
2712 S. Sawyer Avenue
Chicago, IL 60623
PH: 773-535-7252

Carmen Navarro, Principal
Mariano Azuela Elementary School
Chicago Public Schools
3707 W. Marquette Road
Chicago, IL 60629
PH: 773-535-7395

Caroline Boxmeyer, Associate Professor University of Alabama
Hale County/Sawyerville Head Start Center
850 5th Avenue East
Box 870326
Tuscaloosa, Alabama
PH: 205-348-1325

References

Study 1

Study 2

Study 3

Study 4

Greenberg, M. T., & Kusche, C. A. (1998). Preventive intervention for school-aged deaf children: The PATHS curriculum. Journal of Deaf Studies and Deaf Education, 3, 49-63.

Study 5

Kam, C., Greenberg, M. T., & Walls, C. T. (2003). Examining the role of implementation quality in school-based prevention using PATHS curriculum. Prevention Science, 4, 55-63.

Study 6

Curtis, C., & Norgate, R. (2007). An evaluation of the Promoting Alternative Thinking Strategies curriculum at key stage 1. Educational Psychology in Practice, 23, 33-44.

Study 7

Averdijk, M., Zirk-Sadowski, J., Ribeaud, D., & Eisner, M. (2016). Long-term effects of two childhood psychosocial interventions on adolescent delinquency, substance use, and antisocial behavior: A cluster randomized controlled trial. Journal of Experimental Criminology, 12, 21-47.

Malti, T., Ribeaud, D., & Eisner, M. (2012). Effectiveness of a universal school-based social competence program: The role of child characteristics and economic factors. International Journal of Conflict and Violence, 6, 249-259.

Malti, T., Ribeaud, D., & Eisner, M. P. (2011). The effectiveness of two universal preventive interventions in reducing children's externalizing behavior: A cluster randomized controlled trial. Journal of Clinical Child & Adolescent Psychology, 40(5), 677-692.

Study 8

Seifert, R., Gouley, K., Miller, A. L., & Zabriski, A. (2004). Implementation of the PATHS curriculum in an urban elementary school. Early Education & Development, 15(4), 471-486.

Study 9

Study 10

Study 11

Study 12

Study 13

Study 14

Study 15

Study 16

Humphrey, N., Barlow, A., & Lendrum, A. (2018). Quality matters: Implementation moderates student outcomes in the PATHS curriculum. Prevention Science, 19, 197-208.

Study 17

Study 18

Study 19

Study 1

Summary

Greenberg et al. (1995) used a randomized controlled trial with two schools assigned to the intervention group (n = 130 students) and two schools to the control group (n = 156 students). A posttest after one year of the program measured children's self-reported emotional understanding and teacher-reported emotional problems.

Greenberg et al. (1995) found that, relative to the control group, the intervention group showed significantly greater improvements in

Affective vocabulary
Understanding of feelings in others
Comprehension of complex feeling states.

Evaluation Methodology

Design: Participants were selected in different ways for the regular and special needs subsamples. Regular education children were drawn from the second and third grades of four schools in the Seattle school district. These schools were representative of the district profile, with the exception of having a lower percentage of Asian-American students. The prevention model was initially described to principals and teachers at each school. After faculty discussion, building-based decisions were made regarding participation. Schools were aware that once they decided to participate, they had a 50% chance of being randomized as a control school. All four schools that were approached decided to participate and two were randomly assigned as intervention schools.

Classrooms for special needs children were drawn from the Seattle, Highline, and Shoreline school districts. A presentation was made to interested special needs teachers. Each teacher was free to participate or decline, knowing that participation ensured only a 50% chance of receiving the intervention. Fourteen teachers elected to participate and were then randomized to either a treatment or control condition. Informed consent was received from approximately 70% of eligible students. Although the study assessed 426 students at the spring pre-test, 96 subjects were lost to follow-up due to school moves between spring of pre- and spring of post-test.

Of the 286 participating children, 130 received the intervention (83 regular education, 47 special education) and 156 were in control classrooms (109 regular education, 47 special education). Children were initially tested in either the spring or fall prior to the intervention year; most children were tested in the spring in order not to delay the onset of the intervention during the first few weeks of class. The children were then interviewed during the following spring, approximately one month post-intervention.

Sample: The final study sample included 286 children (167 males, 119 females) who attended school in the metropolitan Seattle area and were available for both pre- and post-testing. The children were attending first and second grade at pre-test, and second and third grade at the time of post-test. Ages ranged from 6 years, 5 months to 10 years, 6 months at pre-test, with a mean age of 8 years, 0 months. The mean age at post-test was 8 years, 10 months, with a range from 7 years, 0 months to 11 years, 2 months. The sample consisted of 165 Caucasians, 91 African Americans, 11 Asian Americans, 7 Filipino Americans, 7 Native Americans, and 1 Hispanic. Four children were of unknown ethnic origin. Sixty-seven percent of the children (n = 192) were in a regular education program, and 33% (n = 94) were in self-contained special education classrooms. Within the special education sample, children were classified in the following categories according to school records: learning disabilities (n = 44), mild mental retardation (n = 23), severe behavior disorders (n = 22), or multihandicaps (n = 5).
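The subgroup counts reported in the Sample paragraph can be checked against the stated total of 286. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and are not taken from the study.

```python
# Counts as reported in the Study 1 Sample paragraph.
TOTAL = 286

sex = {"male": 167, "female": 119}
ethnicity = {
    "Caucasian": 165,
    "African American": 91,
    "Asian American": 11,
    "Filipino American": 7,
    "Native American": 7,
    "Hispanic": 1,
    "unknown": 4,
}
placement = {"regular education": 192, "special education": 94}
special_ed = {
    "learning disabilities": 44,
    "mild mental retardation": 23,
    "severe behavior disorders": 22,
    "multihandicaps": 5,
}


def sums_to(parts, expected):
    """Return True when the subgroup counts add up to the expected total."""
    return sum(parts.values()) == expected


checks = {
    "sex": sums_to(sex, TOTAL),
    "ethnicity": sums_to(ethnicity, TOTAL),
    "placement": sums_to(placement, TOTAL),
    "special education categories": sums_to(
        special_ed, placement["special education"]
    ),
}

# Percentages of the full sample, rounded to whole numbers.
placement_pct = {k: round(100 * v / TOTAL) for k, v in placement.items()}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
print(placement_pct)
```

Each breakdown adds up to its stated total, and the rounded placement shares match the 67% and 33% given in the text.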

Measures: Each child was individually assessed using the Kusche Affective Interview Revised (KAI-R). This interview was developed as an expansion of previous interviews to assess children's emotional understanding at both an experiential and a metacognitive level and to probe a wide range of affective states and situations. Five domains of emotional understanding were assessed: ability to discuss one's own emotional experiences, cues used to recognize emotions, issues regarding the simultaneity of emotions, display rules for emotions, and whether and how emotions can change.

1. Students' feelings vocabularies were measured by summary counts of total positive and negative feelings stated at the start of the interview and by accurate definitions of five complex feelings (proud, guilty, jealous, nervous/anxious, and lonely). Definitions were coded on a trilevel scale analogous to that of the Wechsler Intelligence Scale for Children-Revised (WISC-R). Two questions were used to assess children's general knowledge of feelings. First, they were asked "Are feelings OK to have?" with follow up questions based on either an affirmative or negative response. Children's abilities to discuss personal emotional experiences were assessed by asking them to provide examples of times when they had felt ten specific emotions (happy, sad, mad, scared, love, proud, guilty, jealous, nervous/anxious, and lonely).

2. Children's ability to identify three specific emotional states in themselves and other people was assessed. They were asked "How do you know when you are feeling ____?" (happy, mad, or jealous), with a follow-up question asking how they know when others are feeling happy, mad, or jealous. Responses were scored based on children's use of facial cues, situational cues, and internal feeling states.

3. For issues of simultaneity of emotions, three pairs of feelings were probed. Children were asked "Can someone feel _____ and _____ at the very same time?" (sad/mad, happy/sad, and love/anger). Children were asked to provide examples in support of their responses, which were scored based on the level at which the child was able to report simultaneous feelings directed toward the same target.

4. Assessment of whether and how emotions are hidden included children's understanding of their own ability to hide feelings as well as whether or not others can hide feelings. In addition, children were asked if feelings should sometimes be hidden. Children were first asked: "Can you hide your feelings?" Percentages of this response were used as a measure of children's understanding of hiding feelings. Children who responded affirmatively were additionally asked how they could hide them, and this was coded using a developmental stage level system based on responses. Parallel questions were asked regarding others hiding feelings from the child, and similar response categories were used. Finally, children were asked: "Do you think there are times when people should hide their feelings?" which was coded using a simple yes or no format.

5. Children were also asked a series of questions to probe their understanding of whether and how emotions can change using yes/no questions. To assess the efficiency of problem-solving, the WISC-R subtests of Coding and Block Design were used.

In addition, the 112-item CBCL-TRF checklist was used to measure behavioral and emotional problems commonly seen by teachers. Responses yield eight narrow-band and two broad-band scores: Internalizing and Externalizing. Separate norms are utilized for boys and girls aged 6-11. To assess individualized changes in behavior, the Teacher Goal-Oriented Rating Form (TGOR) was utilized.

Analysis: A series of three-way repeated measures ANOVAs was conducted to assess the general effects of the intervention. The two between-subjects factors were Intervention Status (intervention vs. control) and Educational Placement (regular vs. special needs); the within-subjects factor was Time. Because the special needs population was a heterogeneous grouping that included different types of students, further exploratory analyses were conducted to examine potential differential effects of type of special education classification. Using the same analytic model, a series of three-way repeated measures ANOVAs was conducted with two between-subjects factors, intervention status (intervention or control) and educational placement (learning disability, behavior disorder, mild mental retardation, and multiple disabilities); the within-subjects factor was time.

Outcomes

Post-test:

Feelings vocabulary: Children who received the PATHS curriculum significantly increased the number of feeling words they could generate between pre- and post-tests as compared to children in the control group. Both intervention and control children demonstrated a significant developmental change in terms of the number of positive and negative feeling words listed from pre- to post-test. Treatment group children in regular education classrooms showed a significant increase in knowledge of the five complex emotions relative to regular education children in the comparison group. No intervention effect was found for special needs students. A significant main effect of time indicated a developmental advancement across the year for all students.

General questions about feelings: There were no significant effects of the intervention on children's level of reasoning regarding why (or why not) all feelings were OK.

Discussion of own emotional experiences: Children in the intervention group significantly improved their ability to provide appropriate personal examples of the five basic feelings (happy, sad, mad, scared, and love), but not of the five complex feelings (proud, guilty, jealous, nervous/anxious, and lonely), from pre- to post-test as compared to children in the control group. A significant effect for time indicated a general developmental increase in children's ability to give more appropriate examples of complex feelings at post-test.

Cues to recognize emotions: No effects of the intervention were found on children's ability to describe cues used to recognize their own emotions. A main effect for time, however, indicated a general developmental advance in the level of children's reasoning about recognizing their own emotions between pre- and post-test. Children in the intervention group improved their level of reasoning with regard to knowing how others feel more than did children in the control group. In addition, a significant time effect indicated a general developmental advance in the level of children's reasoning about recognizing the feelings of others.

Understanding simultaneous feelings: There were no significant intervention effects. However, a significant time effect indicated developmental increases over the one year period.

Display rules for emotions: Children in the intervention group said "yes" significantly more often at post-test when asked if feelings could be hidden by themselves and by others when compared to children in the control group. No developmental change was noted for these responses. No intervention effects were found for children's level of reasoning about hiding their own feelings. Intervention children in special needs classrooms increased their level of understanding for other people's strategies for hiding feelings as compared to special needs children in the control group. A significant time effect indicated general developmental improvement in the level of reasoning about both themselves and others.

Changing feelings: Children in the intervention group were significantly more likely to respond positively to questions about changes in feeling states than were children in the control group. However, significant Time x Intervention Status x Educational Placement effects indicated that much of the Time x Intervention Status effect was the result of a large change in the special needs intervention group. Intervention children also showed a higher level of reasoning in their examples of how feelings can change when a picture cue was not provided compared to children in the control group. Special needs children in the intervention group improved significantly more than intervention children in regular education classrooms, as well as more than children in the control group.

Differential effects within special needs: There were fewer differential effects of intervention than would be expected by chance, indicating that improvements in special needs children were shown equally across the three identified groups.

Differential effects related to level of teacher-rated psychopathology: In order to examine the question of differential effects related to level of child behavior problems, two sets of analyses were conducted: one on the level of externalizing problems and the other on the level of internalizing problems. Among intervention students, those with low and moderate scores showed significant improvement for externalizing problems. Intervention children with high TRF scores showed the greatest relative improvement in the number of appropriate examples given for basic emotions, but comparison children with high TRF scores showed the greatest relative decline during the intervention period. Similarly, for both questions concerning efficacy regarding changing feelings, intervention children with moderate and high scores showed the greatest relative improvement, while control children with high TRF scores showed significant declines.

Intervention children with the lowest scores for internalizing problems demonstrated the greatest improvement on providing appropriate examples of advanced emotions, and comparison children with moderate and high TRF scores declined during the intervention period. These findings were repeated for both questions concerning efficacy regarding changing feelings. Similarly, intervention children with moderate or high TRF scores showed significant change in developmental level of understanding regarding how feelings change, and children in the control group with high internalizing scores showed declines during the same period. Regular education boys in the intervention group scored significantly higher on social competence scales as assessed by teacher, parent, and child ratings.

Study 2

Summary

Riggs et al. (2006) used a randomized controlled trial with two schools assigned to the intervention group (n = 153) and two schools to the control group (n = 165). Assessments at posttest (nine months after baseline) and one year after posttest measured executive function and teacher-reported behavior problems.

Riggs et al. (2006) found that, relative to students in the control group, students in the intervention group showed significantly greater improvements in

Teacher-reported externalizing and internalizing behavior.

Evaluation Methodology

Design: Four schools were randomly assigned to treatment or comparison conditions. The total recruited sample was 329 students enrolled in the second and third grade at the time of pretesting, and the final sample was 318 students. The sample sizes equaled 153 for the intervention group and 165 for the control group. A total of 68 classroom sessions were devoted to PATHS teaching.

Data were collected at pretest, nine months later (posttest), and one year follow-up.

Sample Characteristics. About 50% were girls, 55% were Caucasian, 33% were African American, and 22% were from other racial backgrounds.

Measures. Student surveys: 1) IQ was estimated using a two-subtest short form of the WISC-R, which includes Vocabulary and Block Design; 2) Inhibitory Control was measured using the Stroop Test, which in adults activates the anterior cingulate, a neural region that interacts with both limbic and prefrontal function; and 3) Verbal Fluency was measured using the Verbal Fluency subtest of the McCarthy Scales of Children's Abilities, which requires children to name as many items as they can in four common categories.

Teacher Surveys: Child behavior problems were assessed using the Teacher Report Form of the Child Behavior Checklist.

Analysis. First, hierarchical regression models were estimated to determine the effects of the PATHS Curriculum on teacher-reported behavior outcomes. Covariates for these models included pretest behavior scores, age, and IQ. Next, hierarchical regression models estimated the effects of the PATHS Curriculum on children's inhibitory control and verbal fluency 9 months later. Covariates included pretest neurocognitive scores, age, and IQ. Third, hierarchical linear models estimated the effects of both the intervention and the neurocognitive mediators on the behavioral outcomes, again including pretest covariates. Evidence of mediation requires that the mediators in this third model significantly influence the outcomes and attenuate the intervention effect found in the first model.

Missing data were replaced with sample means.
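As an illustration of what replacing missing data with sample means involves, the following Python sketch fills missing scores with the mean of the observed scores. The scores are hypothetical and the function name is an assumption made for this example; it is not the study's procedure or data. The sketch also shows a well-known side effect of this approach: the mean is preserved, but the variance shrinks.

```python
from statistics import mean, variance


def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical test scores; None marks a missing observation.
scores = [12.0, None, 15.0, 9.0, None, 18.0]

observed_scores = [v for v in scores if v is not None]
filled_scores = impute_with_mean(scores)

print("mean, observed only:", mean(observed_scores))
print("mean, after imputation:", mean(filled_scores))
print("variance, observed only:", variance(observed_scores))
print("variance, after imputation:", variance(filled_scores))
```

Because every imputed case sits exactly at the mean, the imputed sample looks less variable than the observed one, which can make standard errors too small.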

Outcomes

Implementation Fidelity. Treatment teachers attended a three-day training and received weekly consultation and observation from project staff. Fidelity assessments were conducted.

Baseline Equivalence and Attrition. Paired t-tests revealed group differences at pretest for Verbal Fluency and IQ, with the control group scoring higher. No analysis of differential attrition was reported, perhaps because missing data were replaced with sample means.

Posttest and Long-Term. The results based on both the posttest and one-year follow-up focused on each step of the mediation analysis.

(1) Regression analyses indicated that there was a significant prevention effect on both inhibitory control and verbal fluency.

(2) Posttest inhibitory control was negatively related to teacher-reported externalizing and internalizing behavior at 1-year follow-up. Posttest verbal fluency was negatively related to teacher ratings of internalizing behavior at 1-year follow-up. That is, children who had greater inhibitory control at posttest demonstrated fewer externalizing and internalizing behavior problems and children who were more verbally fluent demonstrated fewer internalizing behavior problems at 1-year follow-up. After taking these neurocognitive variables into account, the intervention condition continued to have a significant effect on externalizing behavior and internalizing behavior.

(3) Sobel tests for mediation demonstrated that inhibitory control at immediate posttest significantly mediated the relation between experimental condition and both teacher-reported externalizing and internalizing behavior at 1-year follow-up. These findings demonstrate that the direct effect of the PATHS program on inhibitory control significantly reduced the relation between PATHS and both outcomes. However, a Sobel test demonstrated that the mediating role of verbal fluency in the relation between the experimental condition and teacher-reported internalizing behavior only approached significance.
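For readers unfamiliar with the Sobel test, the Python sketch below computes the standard Sobel z statistic for an indirect effect, z = a*b / sqrt(b^2*se_a^2 + a^2*se_b^2), where a is the effect of the intervention on the mediator and b is the effect of the mediator on the outcome. The coefficients and standard errors are hypothetical values chosen for illustration, not estimates from Riggs et al. (2006).

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return the Sobel z statistic and its two-sided normal p-value.

    a, se_a: path from condition to mediator and its standard error.
    b, se_b: path from mediator to outcome and its standard error.
    """
    z = (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under N(0, 1)
    return z, p


# Hypothetical paths: the intervention raises inhibitory control (a > 0),
# and higher inhibitory control predicts fewer behavior problems (b < 0).
z_stat, p_value = sobel_test(a=0.40, se_a=0.10, b=-0.30, se_b=0.10)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

A significant z indicates that the indirect path through the mediator differs from zero, which is the sense in which inhibitory control is said to mediate the intervention effect.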

Study 3

Summary

Kam et al. (2004) used a randomized controlled trial with 18 teachers of special education classes and 133 special needs children assigned to intervention or control groups. Assessments in the spring over the next three years measured depression, problem behavior, and social competence.

Kam et al. (2004) found that, for a sample of special needs students, the intervention group showed significantly greater improvement than the control group in

Internalizing
Externalizing behavior
Depression
Self-control.

Evaluation Methodology

Design: This study examined the effects of the PATHS Curriculum on diverse outcomes at post-test, 1-year, 2-year, and 3-year follow-up. The sample consisted of 133 special needs children (in grades 1-3 at time of pretest) who had been previously assigned by their schools to special education classes. Eighteen teachers of special education classes elected to participate and were randomly assigned to either the intervention or control group. About 70% of the parents of children in the classes consented to testing. Children were initially tested in either the spring or fall prior to the intervention year; they were assessed again each spring for the next three years.

The rate of missing data varied by outcome, but large attrition generally occurred in the follow-up years. All participating children had baseline and posttest data, but from 6% to 48% were missing data on some measures for the follow-ups.

Sample: The sample included 52% White, 40% African American, and 8% children from other ethnic minority populations. All children had been previously assigned by their schools to special education classes. Of the 133 children, most had learning disabilities (53), but the sample also included children with mild mental retardation (23), physical disabilities (21), emotional and behavioral disorders (31), and multiple handicaps (5).

Measures: Measures are described in Study 1:

Feelings Vocabulary
Social Problem-Solving Skills
Child Self-Report of Depression
Teacher Ratings of Problem Behavior (Externalizing and Internalizing)
Teacher Ratings of Social Competence

Analyses: Students' outcome trajectories were modeled across the early elementary grades using individual growth-curve analyses and multilevel models. Trajectories for students in the intervention group were compared to those for students in the control group. Sustained intervention effects were indicated as positive changes in trajectories above and beyond those observed in the control group.

In estimating outcome trajectories in the multilevel models, the analysis used all available data for subjects, even if not complete for all assessments. The analysis thus meets the intent-to-treat criterion.
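The idea of comparing outcome trajectories while using whatever assessments each child has can be sketched in a much simpler form than the multilevel growth models the study used. The Python example below fits an ordinary least squares slope to each child's available waves and compares mean slopes by group. It is a simplified stand-in, not the study's model, and all scores are hypothetical.

```python
from statistics import mean


def ols_slope(points):
    """Least-squares slope for (time, score) pairs with two or more times."""
    times = [t for t, _ in points]
    scores = [y for _, y in points]
    t_bar, y_bar = mean(times), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in points)
    return sxy / sxx


def child_slope(waves):
    """Slope over the waves a child actually completed (None = missing)."""
    points = [(t, y) for t, y in enumerate(waves) if y is not None]
    return ols_slope(points)


# Hypothetical problem-behavior scores at baseline and three follow-ups.
children = {
    "intervention": [[10, 9, 8, 7], [12, 11, None, 9], [8, 8, None, None]],
    "control": [[10, 11, 12, 13], [9, 9, 10, None]],
}

mean_slopes = {
    group: mean(child_slope(waves) for waves in records)
    for group, records in children.items()
}
slope_difference = mean_slopes["intervention"] - mean_slopes["control"]

print(mean_slopes)
print("difference in mean slope:", round(slope_difference, 3))
```

Unlike this sketch, a multilevel model weights each child's contribution by how much data the child provides and estimates the group difference and its standard error in a single step.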

Outcomes

Baseline Equivalence and Differential Attrition. At baseline, the intervention and control groups were equivalent on all outcomes except internalizing behaviors (where the intervention group had higher scores). The study made no mention of tests for differences across the groups on sociodemographic characteristics or types of disability.

The study did not report on differential attrition. It noted that the multilevel models used the maximum amount of information that, under the assumption of data missing at random, gives unbiased and efficient estimates. However, the study did not compare the rate of attrition across groups or the baseline values of those having missing data with those having complete data.
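One common way to compare attrition rates across conditions is a two-proportion z-test. The Python sketch below shows that calculation with hypothetical counts; Kam et al. (2004) did not report such a test, so the numbers illustrate the method only.

```python
import math


def attrition_z_test(lost_a, n_a, lost_b, n_b):
    """Two-proportion z-test for a difference in attrition rates.

    Returns the z statistic and a two-sided normal p-value.
    """
    rate_a, rate_b = lost_a / n_a, lost_b / n_b
    pooled = (lost_a + lost_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical counts: 30 of 100 intervention children and 20 of 100
# control children lost to follow-up.
z_attr, p_attr = attrition_z_test(lost_a=30, n_a=100, lost_b=20, n_b=100)
print(f"z = {z_attr:.2f}, p = {p_attr:.3f}")
```

A complete check for differential attrition would also compare the baseline scores of children who dropped out with those of children who stayed, within each condition.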

Posttest and Long-Term. Separate analyses were not done for posttest and follow-ups. Rather, the results examined linear changes in outcomes over the full period from baseline to 3-year follow-up.

Trajectories of teacher-rated behaviors: Teachers' ratings of students' externalizing and internalizing behaviors can best be described as changing linearly from Time 1 (baseline) to Time 4 (3-year follow-up). A significant difference was found between the intervention and control groups in the estimated mean rate of growth in both types of behaviors: teacher ratings decreased over time in the intervention group, whereas they increased over time in the control group. No significant group differences were found for the trajectories of teacher-reported social competencies (frustration tolerance, assertive skills, task orientation, and peer sociability).

Trajectories of self-reported depression: A linear growth curve model fit the Child Depression Inventory (CDI) data relatively well. Depression scores reported by children in the intervention group declined at a significantly greater rate than did the scores reported by children in the control group.

Affective vocabulary: Linear trend models fit well with both positive and negative feelings vocabulary data. A significant difference between the two groups was found in the size of the negative feelings vocabulary at Time 4. There was no significant difference in the rates of change in the size of negative and positive feelings vocabulary.

Social problem-solving skills: No significant intervention group difference was found in the growth curve analyses of efficacy in social problem solving among children in the special needs classrooms. Children in the intervention displayed a marginally significant reduction in the percentage of aggressive solutions they generated and a significant increase in the percentage of solutions that were nonconfrontational and indicated self-control.

Study 4

Summary

Greenberg and Kusche (1998) combined a randomized controlled trial and a quasi-experimental design. The study randomly assigned six Seattle-area schools with 79 severely and profoundly hearing-impaired children to an intervention group or a waitlisted control group. The one-year posttest compared the two conditions on measures of academic achievement and behavioral difficulties, but the long-term analysis occurred after some control schools had joined the program.

Greenberg and Kusche (1998) found that, for a sample of deaf children, the intervention group relative to the control group showed significantly greater improvements in

Social problem-solving skills
Emotional recognition skills
Reading achievement
Non-verbal planning skills
Teacher and parent-rated social competence.

Evaluation Methodology

Design: The participants in this project consisted of 79 severely and profoundly hearing-impaired children who were enrolled in self-contained classrooms for deaf children (grades 1-6) in 6 local elementary schools in the Seattle area. The study design was quasi-experimental. Schools were randomly assigned to intervention vs. waitlist control group status. After the first year, the intervention was replicated on the wait-list control children. Teachers were trained in the intervention model and implemented PATHS lessons during most of one school year. The participants represented approximately 85% of all of the deaf children who were served in the area and who also met the following criteria: (1) basic education occurred using both sign language and speech (Total Communication), (2) unaided hearing loss was >60 decibels in the better ear averaged across the speech range, (3) deafness was diagnosed prior to 36 months of age, (4) non-verbal intelligence was greater than 75, and (5) no known significant additional handicaps were present. The intervention and comparison groups did not differ significantly on relevant pretest variables.

Sample: The children ranged in age from 67 to 146 months. The sample was primarily White (84%), and the average child had a profound unaided hearing loss.

Measures: Measures included a social problem-solving interview, tests of non-verbal cognitive abilities, achievement testing, and teacher and parent ratings of behavioral difficulties and competencies.

Analysis: Mediational analyses were conducted to test the theoretical model that changes in understanding of emotions, ability to take others' perspectives, and social problem-solving skills were related to changes in behavioral outcome. For a detailed overview of the analyses used in all of the PATHS evaluations, please see the description above for the pilot study.

Outcomes

Post-test: Results indicated that the intervention led to significant improvement in students' social problem-solving skills, emotional recognition skills, and teacher and parent-rated social competence. Teacher ratings of behavior indicated that there were significant improvements in social competence and in frustration tolerance. Results also indicated significant improvement in reading achievement and non-verbal planning skills in the intervention sample. There was no effect in this normative sample on teacher or parent-rated psychopathology.

Mediational analyses: Results indicated that (a) improvement in emotional understanding was related to lower parent-reported externalizing problems at home; (b) improvement in role-taking skills was related to higher teacher ratings of emotional adjustment and reductions in behavior problems at school and at home; and (c) improvement in problem-solving was related to higher teacher ratings of emotional adjustment and social competence and decreases in behavior problems at home and school.

Long-term: One- and two-year post-test results indicated maintenance of effects. Results with the wait-list control group indicated replication of effects in the second sample.

Study 5

Summary

Kam et al. (2003) used a quasi-experimental matched-group design with three high-risk schools assigned to the intervention group (n = 164 students) and three lower-risk schools assigned to the control group (n = 186 students). The intervention combined PATHS with Big Brothers/Big Sisters. The study followed first-grade students from fall to spring, with teachers rating the students on social competence, aggression, and attentional control.

Kam et al. (2003) found that, in a study of PATHS combined with Big Brothers/Big Sisters, the intervention group relative to the control group showed significantly greater reductions in

Aggression
Behavioral dysregulation.

Evaluation Methodology

Design: This evaluation used a quasi-experimental matched-group design. Random assignment to intervention groups was not carried out because the local funding source required that schools in neighborhoods with the highest-risk profile receive the intervention. The sample in the overall intervention consisted of 350 first graders in six elementary schools in Harrisburg, Pennsylvania. Three of the schools received the intervention and three other schools served as comparison schools. A total of 13 classrooms with 164 students received the intervention. All of the participating schools served neighborhoods with very high rates of poverty and crime. Due to high family mobility and school reorganizations, student mobility averaged approximately 35-40% during the 1999 school year. A psychological consultation agency was contracted to coordinate the program implementation in targeted schools in the Harrisburg school district. Teachers received training from the program developers in two one-day workshops held approximately six weeks apart. Additional support for implementing the PATHS Curriculum was provided by an on-site PATHS Coordinator. The support included weekly visits by the coordinator to PATHS classrooms, continuous consultation with the teachers, and logistics/materials support. The PATHS Coordinator met with the school building principals on a monthly basis; principals were also strongly encouraged to attend workshop training. The PATHS Coordinator received, as needed, ongoing consultation from the program developers. The first-year intervention was briefer than a "standard" implementation of the PATHS Curriculum. Because of the timing of funding, the curriculum was taught for only four months, from January to April, as compared to a full school year.

While the PATHS Curriculum was the major component of the intervention (see above studies for a detailed description of the Curriculum), the Dauphin County project also had a second component of intervention provided by the Big Brothers and Big Sisters in the area. The latter provided mentoring to 14 students in the intervention schools whom teachers identified as having special needs.

Sample: The sample was 47.14% male and 79.42% African American. Approximately 85% of the children in the schools sampled came from low-income families (as indexed by participation in the free lunch program). More than 65% of the students in the participating schools performed below the 30th national percentile in reading and mathematics.

Measures: Students' behaviors at school were assessed by teachers at both the pre- and post-testing periods using the 31-item Teacher Social Competence Rating Scale (TSCRS). Behaviors measured included aggression, dysregulated behaviors, attentional control, and social-emotional competence. The quality of PATHS program implementation in classrooms was measured by observations made by the local PATHS Coordinator. Two aspects of classroom environment and implementation quality were rated: (1) How well are PATHS concepts and skills taught by the teacher? and (2) How well is the teacher generalizing PATHS skills across the school day? Principal support for PATHS implementation was measured by PATHS Coordinator and PATHS Supervisor ratings. Two ratings were used: (1) quality of principal support for PATHS, and (2) quality of support for the PATHS technical assistance team (PATHS Coordinator and Supervisor).

Analysis: Analysis of covariance was used to analyze each outcome separately. A baseline measure of the outcome was entered into the regression, as were the two dummy variables representing principal support. The classroom implementation measure was included as a continuous variable, along with interaction terms between principal support and implementation. Planned comparisons were made on the predicted changes in classrooms that had a high or low degree of implementation but different levels of principal support. Due to a high rate of intercorrelation, observational ratings of PATHS program implementation were averaged over a four-month period.

Outcomes

Post-test: There was no significant main effect for implementation quality in predicting any of the four outcomes. Significant main effects were found, however, for principal support. In addition, significant interaction effects were found between the effects of principal support and implementation in the changes in all four domains (aggression, behavior dysregulation, social-emotional competence, and on-task behaviors). These results indicate that the effects of implementation work differently in schools with different degrees of principal support. When both the quality of implementation and principal support were high, students demonstrated significantly greater reductions in aggression and behavioral dysregulation, and significant increases in emotional competence when compared to students in the school with the lowest principal support. Similarly significant, but weaker differences on the same student outcomes were also shown when the school with the lowest principal support was compared to the average of the two schools with higher principal support.

Long-term: No long-term data were collected or analyzed in this evaluation.

Study 6

Summary

Curtis and Norgate (2007) used a quasi-experimental design with five intervention schools (n = 114 students) matched to three control schools (n = 173 students) in the United Kingdom. The sample students, ages 5-7, were assessed at the end of one academic year on emotional symptoms, conduct problems, hyperactivity, peer problems, and consideration.

Curtis and Norgate (2007) found that the intervention group relative to the control group showed significantly greater improvements on

Emotional symptoms
Conduct problems
Hyperactivity
Peer problems.

Evaluation Methodology

Design: This quasi-experimental project (labeled by the investigators as a pilot project) implemented PATHS in five schools, with three control schools. Random assignment was not conducted, though groups were matched on age range and catchment area. At least two members of staff from each school attended two days of initial training provided by educational psychologists, and these staff then conducted the training in their own schools.

Sample: PATHS was administered to the children in Key Stage 1 (the term for Year 1 and Year 2 in England and Wales, ages 5 to 7) in five treatment schools and three comparison schools. It is not noted how many students received the curriculum. Surveys were completed by 287 students (114 PATHS and 173 control), and a semi-structured interview was completed with 17 teachers in the PATHS schools.

Measures: The Strengths and Difficulties Questionnaire (SDQ, Goodman, 1997) was administered to the 287 students, and a semi-structured interview was completed with 17 teachers. The SDQ is composed of five constructs: (1) Emotional symptoms; (2) Conduct problems; (3) Hyperactivity; (4) Peer problems; and (5) Consideration.

Analyses: Student surveys: Independent t-tests were conducted to compare the pretest scores on the five constructs of the SDQ between the intervention and control group. Pretest scores were significantly different for the treatment and comparison groups. Mixed analyses of variance (ANOVAs) and paired t-tests were conducted to compare the mean scores before and after intervention for both groups on each of the five constructs. Analyses examining mediating variables were not conducted.
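
For readers unfamiliar with the paired comparison mentioned above, the following is a minimal sketch of a paired t statistic computed on pretest and posttest scores for the same children. The scores are hypothetical and are not taken from Curtis and Norgate (2007); the function name is an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t statistic: mean of the differences divided by its standard error."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of differences over sqrt(n)
    return mean(diffs) / standard_error, n - 1  # statistic and degrees of freedom

# Hypothetical SDQ conduct-problem scores for five children (lower is better).
pre_scores = [5, 6, 7, 4, 6]
post_scores = [3, 5, 5, 4, 4]

t_stat, dof = paired_t(pre_scores, post_scores)
```

A negative statistic here indicates that scores fell from pretest to posttest. Because the groups differed at pretest, a within-group test of this kind does not by itself establish a program effect.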

Teacher interviews: Interviews were recorded and transcribed, and then analyzed using a content analysis approach (Weber, 1990). Major categories were identified and changed until they were able to accommodate all of the data. The material from these categories was then organized into subcategories.

Outcomes

Posttest (School Surveys): The ANOVA indicated that the change over time in mean scores was statistically significant, as was the interaction between the two conditions, in all of the dimensions within the SDQ (Emotional symptoms, Conduct problems, Hyperactivity, Peer problems, and Consideration, all p values <.0001). The change in scores from pretest to posttest was significant for the intervention group but not for the comparison group.

Teacher Interviews: Results indicated that in all schools using PATHS, a whole-school emphasis had been adopted, PATHS materials were displayed in hallways and classrooms, staff showed flexibility in the use of sessions, and all staff were knowledgeable about the program. Teachers particularly felt that the introduction of the "pupil of the day" had a positive impact on the students. Teachers also noted their own ideas to keep the curriculum fresh and interesting (e.g., role-play, social and self-reinforcement, story-telling, and modeling). All schools using PATHS worked to get parents involved so that the ideas and skills in PATHS were recognized and reinforced at home. Teachers perceived that PATHS helped in the following areas, even though it was one of many programs being delivered across schools: building a vocabulary of feelings, developing the ability to describe one's own feelings, recognizing emotions in others, empathy, developing self-control/managing emotions, developing cooperation, and dealing with problems. Finally, teachers felt that PATHS was a good fit for their schools. Criticisms included that teachers wanted more ideas on how to keep the curriculum "fresh and exciting," as well as more practical ideas, because the curriculum involved much sitting, listening, and discussion, which was not appropriate for all children.

Study 7

Summary

Malti et al. (2012) found that by the beginning of grade 5 (just over two years after program commencement), the intervention condition, relative to a control group, showed

Fewer externalizing behaviors (e.g., aggression)
A reduction in ADHD symptoms.

Evaluation Methodology

Design:

Recruitment: The data for this study were obtained from the Zurich Project on the Social Development of Children (Z-Proso), an ongoing prospective longitudinal study of a cohort of children who entered elementary school in the city of Zurich, Switzerland, in 2004. The final sample consisted of 1,675 first graders from 56 elementary schools.

Assignment: Sampling was based on a cluster randomized approach involving all 90 public primary schools in Zurich. Schools were blocked by school size and socio-economic background of the catchment area, then a stratified sample of 56 schools (comprising 1,675 children) was drawn. All selected schools participated, and 14 "quadruplets" of similar size and socio-economic background were formed. Schools in each quadruplet were randomly allocated to four treatment conditions: PATHS (n = 442), Triple-P (n = 422), PATHS+Triple-P (n = 397), control (n = 414). The programs were delivered sequentially: Level 4 of Triple-P was implemented between waves 1 and 2 (i.e., year 1 of primary school) whereas PATHS was implemented between waves 2 and 3 (i.e., year 2). Malti et al. (2011) used student-, parent- and teacher-reported data at annual intervals between 2004/2005 and 2006/2007 (T1-T3); Wave 4 (T4) was conducted 2 years later in 2008/2009, so either at the end of grade 4 (two years after program commencement) or at the beginning of grade 5. Malti et al. (2012) used teacher-reported results at Wave 4 (T4). It is unclear whether the data collection for Malti et al. (2011, 2012) was conducted at the beginning or the end of the particular academic year. Averdijk et al. (2016) used data from the baseline assessment collected in 2004/2005 (wave 1, when students were 7 or 8) and follow-up data collected in 2011 (wave 5, when students were 13 years old) and 2013 (wave 6, when students were 15 years of age).
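
The allocation of schools within matched quadruplets can be sketched as a blocked randomization: each block of four similar schools receives the four conditions in a random order. This is a minimal illustration, not the investigators' actual procedure. The school identifiers, the seed, and the function name are hypothetical.

```python
import random

CONDITIONS = ["PATHS", "Triple-P", "PATHS+Triple-P", "control"]

# Child counts per condition as reported in the text above.
REPORTED_N = {"PATHS": 442, "Triple-P": 422, "PATHS+Triple-P": 397, "control": 414}

def assign_quadruplets(quadruplets, seed=0):
    """Give each school in a block of four a different, randomly ordered condition."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    assignment = {}
    for block in quadruplets:
        shuffled = CONDITIONS[:]
        rng.shuffle(shuffled)
        for school, condition in zip(block, shuffled):
            assignment[school] = condition
    return assignment

# 14 hypothetical quadruplets of matched schools (56 schools in total).
quadruplets = [[f"school_{4 * q + i}" for i in range(4)] for q in range(14)]
assignment = assign_quadruplets(quadruplets)

# Each condition ends up with exactly 14 schools, one per quadruplet.
schools_per_condition = {
    c: sum(1 for v in assignment.values() if v == c) for c in CONDITIONS
}
total_children = sum(REPORTED_N.values())  # matches the reported sample of 1,675
```

Blocking in this way guarantees that the four arms are balanced on the matching variables (school size and socio-economic background) by construction.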

The version of PATHS used in this study was the "Fast Track Project" version. This school-based 1-year program included 46 primary lessons addressing problem-solving skills, social relationships, self-regulation, rule understanding, emotion understanding, and positive self-esteem. The PATHS classes consumed about 67 min per week during the 1-year implementation phase with an average of 2.4 sessions per week. PATHS was implemented in year 2 (2005/2006) when students were in second grade. Trained teachers implemented the lessons, and five trained coaches were also available to visit each class four to six times during the implementation period to give teachers feedback. Implementation was monitored using teacher and child questionnaires developed by the program designers. Teachers were also observed by the coaches. Content, methods, and materials were culturally adapted to the Swiss school system.

Attrition: As reported in Malti et al. (2011), at T1 the response rates were 81% for the child interviews (n = 1,361), 74% for the parent interviews (n = 1,240), and 81% for the teacher assessments (n = 1,350). At T2 the respective response rates were 97%, 95%, and 96%; at T3, 96%, 95%, and 94%; and at T4, 83%, 86%, and 92%. (Note: because Malti et al., 2012, reported teacher outcomes at T4, the completion rate was 92%.) The computer-assisted, face-to-face interviews with parents were conducted at the parent's home. In the first three waves, computer-assisted child assessments were conducted at the school. In the fourth wave, classroom-based paper-and-pencil surveys were utilized. The child's teacher completed questionnaires on the child's social development and returned them by mail. The interviews were conducted by 44 trained interviewers. It is unclear whether interviewers were blind to group membership. The overall attrition rate reported in Averdijk et al. (2016) for the control group was 27% in wave 5 and 30% in wave 6. For the PATHS group, overall attrition was 29% (wave 5) and 30% (wave 6).
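
The reported T1 response rates can be reproduced from the respondent counts and the target sample of 1,675, and the wave 5 and wave 6 attrition figures can be expressed as differential attrition (the gap between conditions). The short sketch below does only this arithmetic. The variable names are illustrative, and no bias threshold is applied because none is stated in the text.

```python
TARGET_SAMPLE = 1675

# T1 respondent counts as reported by Malti et al. (2011).
t1_respondents = {"child": 1361, "parent": 1240, "teacher": 1350}
t1_rates = {
    source: round(100 * n / TARGET_SAMPLE) for source, n in t1_respondents.items()
}
# t1_rates reproduces the reported 81%, 74%, and 81%.

# Overall attrition (%) by condition as reported by Averdijk et al. (2016).
attrition = {
    "wave 5": {"control": 27, "PATHS": 29},
    "wave 6": {"control": 30, "PATHS": 30},
}

# Differential attrition: absolute gap between conditions, in percentage points.
differential = {
    wave: abs(rates["PATHS"] - rates["control"]) for wave, rates in attrition.items()
}
```

The gaps of 2 and 0 percentage points are consistent with the description of similar attrition rates by condition at both follow-up waves.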

Sample: At study commencement for wave 1, the sample consisted of first graders (mean age 7 years) with 48% girls and 52% boys. About 91% of the students were in regular classes, whereas the other 9% were in special-needs classes. About 78% lived with their biological parents, 20% with their biological mother only, and 2% with their biological father only or with foster parents. As for the socioeconomic background of the primary caregiver, 25% had little or no secondary education, 30% had vocational training, 29% had attended vocational school or had a baccalaureate degree or advanced vocational diploma, and 16% had a university degree. Socio-economic status was assessed through the International Socio-Economic Index of occupational status (ISEI) with an average score of 44.56 for households in the sample. In just over half of the households (55%), at least one parent was of Swiss nationality, demonstrating the cultural diversity of the sample.

Measures: Primary outcome measures reported in Malti et al. (2011, 2012) include:

Externalizing behavior: The teachers and parents evaluated the externalizing behavior of the children using Tremblay's Social Behavior Questionnaire (SBQ). Three subscales of the SBQ were employed, measuring aggressive behavior (alpha = .72-.93), impulsivity/attention deficit hyperactivity disorder (ADHD; alpha = .62-.91), and non-aggressive conduct disorders (NACD; alpha = .69-.78 for teacher reports). For youth self-reports, the children were shown drawings of specific behaviors of a child and asked whether they sometimes did what was shown in the pictures (answer options yes/no).
Social competence: Social competence of children was measured using the Prosocial Behavior subscale of the Social Behavior Questionnaire (7-10 items, alpha = .59-.93). Children were presented with hypothetical scenarios and their responses were rated as aggressive strategies and socially competent strategies (interrater agreement .80-.87).

Averdijk et al. (2016) included 13 measures, five of which reported reliability statistics for the sample. These measures, eight from the youth and five from the teachers, included:

Self-reported delinquency (15 items constructed into a "total variety scale," a type of scale that the authors report is preferred for measuring criminal offending because of its high reliability and validity).
Teacher-reported deviance (7 items constructed into a total variety scale).
Self-reported police contacts related to an offense.
Self-reported substance use (the sum of 4 items) and teacher-reported substance use (the sum of 3 items on smoking, drinking, and illegal drugs).
Teacher- and self-reported aggressive behavior, as measured by the Social Behavior Questionnaire (SBQ). The reliabilities were α=0.84 (wave 5) and α=0.83 (wave 6) for the youths and α=0.93 (wave 5) and α=0.92 (wave 6) for the teachers.
Self-reported peer aggression derived from Olweus (1993). The reliabilities were α=0.78 (wave 5) and α=0.75 (wave 6).
Teacher- and self-reported prosocial behavior, as measured by the Social Behavior Questionnaire (SBQ). The reliabilities were α=0.82 (wave 5) and α=0.80 (wave 6) for the youths and α=0.93 (wave 5) and α=0.90 (wave 6) for the teachers.
Self-reported conflict resolution, an eight-item scale adapted from Wetzels et al. (2001). Four items were used to create a measure for aggressive conflict resolution strategies (α=0.72 at wave 5, α=0.67 at wave 6) and 4 items comprised the competent conflict resolution strategies scale (α=0.71 at wave 5, α=0.71 at wave 6).
Teacher-reported non-aggressive conduct disorder, as measured by the Social Behavior Questionnaire (SBQ). The reliabilities were α=0.83 (wave 5) and α=0.85 (wave 6).

Analysis: Malti et al. (2011) employed longitudinal multilevel models to account for the hierarchical data structure (time was nested within children and children were nested within schools). Treatment assignment, measured at the school-level, was coded as two dummy variables to compare the PATHS and Triple-P conditions separately with the control condition. This design allowed for specifying different timings of the interventions as well as the inclusion of an interaction term between PATHS and Triple-P conditions. The models implicitly accounted for baseline scores.

Malti et al. (2012) used hierarchical linear modeling (HLM Version 6.08) to assess the intervention effects at the fourth time point (when children were in fifth grade). Treatment was coded as a dummy variable to compare the PATHS and Triple-P conditions separately with the control condition. Thus, a standard approach to coding a 2 x 2 design (two levels of factor A crossed with two levels of factor B) was used to analyze program effects. The model incorporated three levels: data-collection wave (level 1), child (level 2), and school (level 3). These levels were employed in conjunction with a two-way interaction between time and intervention to measure the treatment effects.
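
The dummy coding of a 2 x 2 design described above can be sketched as follows: one indicator for PATHS, one for Triple-P, and their product for the combined condition. The column names and the function are hypothetical, and the product term follows the interaction described for Malti et al. (2011) rather than a coding scheme stated for the 2012 paper.

```python
def code_condition(condition):
    """Map a condition label to 2 x 2 dummy codes (control is the reference cell)."""
    paths = 1 if "PATHS" in condition else 0
    triple_p = 1 if "Triple-P" in condition else 0
    return {
        "paths": paths,
        "triple_p": triple_p,
        # The product equals 1 only for schools that received both programs.
        "paths_x_triple_p": paths * triple_p,
    }

design = {
    label: code_condition(label)
    for label in ["control", "PATHS", "Triple-P", "PATHS+Triple-P"]
}
```

With this coding, the coefficient on each indicator compares that program with the control condition, and the product term captures any additional effect of receiving both programs together.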

Averdijk et al. (2016) conducted multilevel modeling with youths at level 1 and schools at level 2. Effects were estimated with the inclusion of several baseline sociodemographic covariates. Because the teen outcomes had little meaning at the age 7 pretest, the models used teacher and child measures of externalizing behavior as proxies for baseline outcomes.

Intention-to-treat: Malti et al. (2011, 2012) followed the intent-to-treat principle. For the Malti et al. (2011) analyses, multiple imputation was used to account for missing data for children and parents. However, because Little's MCAR test was not significant for the teacher's data, multiple imputation was not necessary (for both Malti et al., 2011, 2012). Averdijk et al. (2016) handled missing data with robust full-information maximum-likelihood (FIML). Two sets of analyses were performed. The first used the dataset with all available data points of the target sample (n = 1,580 of 1,675). The second used stricter inclusion criteria, including only participants who participated both at age 7 years and at either age 13 years or age 15 years (n = 1,275 of 1,675). Results below are reported for the first set of analyses.

Outcomes

Implementation fidelity: The five PATHS coaches visited each class four to six times during the implementation period, after which they discussed the lesson with the teacher. The checklists completed by the coaches indicated high implementation quality and quantity. On average, 27 of the 30 obligatory lessons, 30 of the recommended vignettes, and 25 small-group activities were completed in the classes. The quality of classroom leadership, child motivation, and teaching of PATHS concepts received marks of 88%, 82%, 74%, respectively.

Baseline Equivalence: Analyses reported by Malti et al. (2011) revealed no statistically significant baseline differences on any of the teacher, parent, or child outcome measures across treatment conditions. However, results for a baseline comparison of socio-demographic factors were not presented. Malti et al. (2012) appears to have reported the same figures as Malti et al. (2011). Averdijk et al. (2016) noted only that baseline measures of externalizing did not differ significantly across conditions.

Differential attrition: Malti et al. (2011) compared attrition rates by condition, reported the Little test, and used imputation when the Little test was significant. Malti et al. (2012) noted that the rates of attrition did not differ significantly across conditions for any of the four waves. Blueprints calculations also showed that, based on What Works Clearinghouse standards, the attrition rates did not indicate likely bias. Averdijk et al. (2016) reported similar rates of attrition by condition at wave 5 and wave 6, used FIML for missing data due to attrition, and conducted a sensitivity test to demonstrate that missing data did not change the findings.

Posttest and Long-Term:

As reported by Malti et al. (2011), the employed growth curve models do not allow for the separation between results for posttest and follow-up. As such, results were reported for time x group interactions.

Malti et al. (2011) reported a few program effects for children's externalizing behavior across time. For the intervention group, 2 of 3 tests were significant based on teacher reports, 1 of 2 was significant based on parent reports, and neither of the 2 tests was significant based on child reports. Compared to the control group, the intervention group significantly reduced aggressive behavior across time. This result was observed for both teacher (d=0.42) and parent reports (d=0.26, p<.05) but not for child-reported behavior. Since parents and teachers both delivered the intervention, these are considered non-independent measures. The intervention group also evidenced a significantly greater reduction in ADHD symptoms compared to the control group (d=0.46), but only based on measures that included teacher reports across time points 1-3 from teachers who delivered the program combined with a fourth time point from teachers who did not deliver the program. Because teachers who deliver the program have a stake in good outcomes, Blueprints would need additional types of independent measures (such as data collected by researchers who are blind to study condition and/or student reports) to certify the ADHD outcome. Finally, no significant change was observed for nonaggressive externalizing behavior for the intervention group compared to the control group.

Malti et al. (2011) found evidence for the moderating role of baseline behavior on program effectiveness. Three of the four significant effects (three-way interactions) suggested that children with high levels of baseline problem behavior benefitted more from either or both interventions than children with low levels of baseline problem behavior.

In Malti et al. (2012), compared to the control group, children in the intervention group were reported by their teachers as having a greater decrease in aggressive problem behaviors (effect size = 0.42) and ADHD related problems (effect size = 0.46). These outcomes were reported when children were in fourth grade by teachers who were not involved in delivering the program (in grades K-3). As such, they are considered independent outcome measures. Meanwhile the results suggest that overall, children in the intervention group did not differ from children in the control condition on prosocial behavior.

Treatment effects reported in Malti et al. (2012) were moderated by level of moral emotions at baseline, where children who exhibited higher levels of moral emotions and received the intervention showed larger reductions in aggression at follow-up than children who initially had low levels of moral emotions. Other moderator variables that predicted higher aggression at follow-up included baseline aggression, financial problems, single parent household, and non-Swiss nationality. Finally, SES and female gender predicted lower aggression at follow-up. On the measure of ADHD, there was a significant teacher-reported decrease in ADHD-related problems among children who received PATHS, compared to the children in the control condition. However, these treatment effects were moderated by the level of moral emotions at baseline and by initial level of competent problem-solving strategies, where intervention students who exhibited higher levels of moral emotions and competent problem-solving strategies showed larger reductions in teacher-reported ADHD. Additionally, aggressive problem-solving strategies, baseline ADHD, family financial problems and single-parent household predicted higher ADHD at follow-up.

In the study conducted by Averdijk et al. (2016), only 1 of 13 tests at age 13 (seven years after program commencement) emerged as significant. Results showed a reduction in adolescent delinquency (i.e., fewer self-reported police contacts) for the intervention group (effect size = -0.225). In addition, there were no significant effects at age 15 (nine years after program commencement).

Study 8

Summary

Seifert et al. (2004) used a quasi-experimental design with one elementary school in Rhode Island and two cohorts of children - one younger cohort receiving the one-year intervention in 2001 during first grade (n = 62), and one older cohort not receiving the intervention (n = 75). Both cohorts were assessed in second grade, but in different years, on depression, socio-emotional competence, and peer relationships.

Seifert et al. (2004) found that, relative to the control cohort, the intervention cohort showed significantly greater improvements on

Global social competence
Social-emotional competence.

Evaluation Methodology

Design . This QED examined one urban elementary school, a magnet school in inner-city Providence, Rhode Island. The design compared two cohorts of children - one younger cohort receiving the 1-year intervention in 2001 during first grade, the other older cohort not receiving the intervention. The intervention group was thus tested in 2002 in second grade, while the control group was tested in 2001 in second grade. Rather than randomization, the study assumed that the intervention second graders in 2002 were equivalent to control second graders in 2001 except for experiencing the intervention.

All available children from the three first-grade and three second-grade classrooms participated. The first cohort undergoing the 1-year intervention had 62 students. The second cohort or control group began with 75 students. However, a group of 13 students entered the school in 2002 and did not experience the intervention during the previous year. Although part of the intervention cohort and tested in second grade (2002), their results were combined with the control group to increase the control group n to 88. The list below contrasts the three cohorts:

Cohort 1 (Treatment n=62): 2001 (first grade) PATHS; 2002 (second grade) assessment
Cohort 2 (Control n=75): 2001 (second grade) assessment; 2002 (third grade) not participating
Cohort 3 (Control n=13): 2001 (not attending); 2002 (second grade) assessment

No pretest assessments were conducted, and no information on attrition is available.

Sample Characteristics . The students were 68% Hispanic and 14% black. About 25% were interviewed in Spanish. Across the full school, 94% qualified for subsidized lunch programs; 31% received bilingual programs; and less than 40% met state standards for reading, mathematics, and writing.

Measures . Ten measures came from the students and interviewers.

Sociometric Status. Each child was asked to nominate classmates on 17 positive and negative descriptors. Using the first seven nominations of each student, the measures summed the number of times a student was nominated. Principal components analysis of 15 of the items produced scales for positive peer nominations such as "want to sit next to" (alpha = .82), and negative peer nominations such as "starts fights" (alpha = .85).

Emotion Understanding. The first measure taps spontaneous emotion naming skills or the number of emotions identified when asked to name all the different feelings they could think of. The second measure taps accuracy of emotion recognition or the score on matching pictures to emotions.

Social Status Self-Reports. The first measure taps perceived meanness of treatment by other children (e.g., kids say mean things to me), and the second taps perceived rejection (e.g., kids blame me when things go wrong). The alphas equaled .75 for perceived meanness and .76 for perceived rejection. A third measure taps negative feelings toward school ("I feel alone at school") and positive feelings toward school ("school is fun for me"). The eight items in the scale have an alpha of .63.

Child Depression Symptoms. The Childhood Depression Inventory measures the frequency of different depression symptoms and has been used reliably with children as young as first grade (alpha = .78 for this sample).

Global Competence Scale. After the 30-minute child interview, interviewers used the Psychological Impairment Rating Scale to assign a global rating of social competence (alpha = .93). Given the different timing of assessment for the intervention and control cohorts, the interviewers likely knew the assignment of the children they rated.

Social-Emotional Competence Composite. The composite combined standardized scores on the nine previously listed measures (alpha = .64).

Analysis. The analysis performed t-tests (without controls) on the posttest scores of the two groups.

Outcomes

Implementation Fidelity . All teachers participated in two training sessions. At the beginning and end of the school year, interviews with teachers revealed only modest enthusiasm for the program. Complaints about dissatisfaction with the PATHS materials, lack of support, and time and effort needed to implement the program suggest poor fidelity.

Baseline Equivalence and Differential Attrition . Lacking pretest assessment, the study could not examine baseline equivalence. No information was provided on whether all children who started in the intervention and control groups completed the posttest.

Posttest . Of the ten outcome measures, two showed significant differences between the intervention and control cohorts: the global social competence rating done by the interviewers and the social-emotional competence composite. The results did not differ between students interviewed in Spanish and students interviewed in English.

Long-Term . None

Study 9

Summary

Bierman et al. (2010) used a cluster randomized controlled trial that assigned six schools to the intervention group and six schools to the control group. The sample of 2,937 students came from three cohorts and three different cities. A posttest assessment at the end of the three-year program included measures of social competence and peer relationships.

Bierman et al. (2010) found that the intervention group relative to the control group showed significantly greater reductions in

Teacher-rated scores on authority acceptance, cognitive concentration, and social competence
Peer ratings of aggression and hyperactivity (for boys only).

Evaluation Methodology

Design . The cluster randomized design studied schools and children over three years.

Schools. Participating schools came from Nashville, Seattle, and rural central Pennsylvania. Within each site, investigators invited about 12 elementary schools in high-risk areas (i.e., with high delinquency and juvenile arrests) to participate. Schools had to reach consensus among principals and teachers to participate. For those schools agreeing, three cohorts of students participated in the program, each beginning in first grade and participating for three years. The exact number of schools was not listed, but in each grade there were approximately 190 intervention classrooms and 180 comparison classrooms across the three cohorts. Note that schools in Durham, North Carolina, began the project but dropped out after the city and county schools merged and reassigned many children.

Students. Participating students in the classrooms needed to remain in the same school building from the beginning of grade 1 to the end of grade 3 and needed to have supplied information on the Social Health Profile and sociometric outcomes. These criteria produced a sample of 2,937 children across the three cities. However, this sample appears truncated. The study noted (p. 159) that "Children who were selected in kindergarten for additional intervention from the Fast Track project (and their high-risk control counterparts) were not included." The sample thus excluded (p. 166) "the worst behaving children."

Randomization. After being grouped on size, achievement levels, poverty, and ethnic/racial diversity, schools were randomly assigned to intervention and control groups. For the intervention group, this version of the program contained 57 lessons in grade 1, 46 in grade 2, and 48 in grade 3. The lessons were adapted to the needs of regular students in high-risk schools and lasted from September to May of each school year. The related version of the program for high-risk children, although not examined in this study, occurred simultaneously with the universal intervention.

Attrition. Limiting students to those who stayed in the same school over three years produced high attrition. In Nashville, only 30.9% of the original 1,560 children remained in the same school over three years. In Seattle, only 41.6% of the original sample of 1,825 children remained in the same school. In rural Pennsylvania, 75.0% of the 1,696 children remained in the same school. Further, the study may have violated the intent-to-treat principle by examining only students receiving the full intervention and failing to follow any student who did not remain in the same school over three years.

Assessment. Assessments occurred in the fall of the first year, and the spring of the first, second, and third year. The last assessment served as a posttest for the three-year program.

Sample Characteristics . The school characteristics differed across sites. The mean percentage of children receiving free or reduced lunch was 57% but ranged from 39% in rural Pennsylvania to 78% in Nashville. The mean percentage of ethnic minority students (primarily African American) was 36% but ranged from 1% in rural Pennsylvania to 55% in Nashville. The mean reading percentile was 45th but ranged from 32nd in Nashville to 57th in rural Pennsylvania. The study did not report on the characteristics of the sampled students.

Measures . Outcomes came from teacher ratings and peer sociometric nominations.

Teacher Ratings. Teachers were interviewed regarding the behavior of each individual child in their class at the four assessment points (pretest, after year 1, after year 2, and posttest). The interviews used two instruments, the Teacher Observation of Classroom Adaptation - Revised (TOCA-R) and the Social Health Profile (SHP), and produced three measures:

authority acceptance (alpha = .93) on oppositional and conduct problems from the TOCA-R,
cognitive concentration (alpha = .97) on attention and work completion from the TOCA-R, and
social competence (alpha = .87) from the SHP.

For all three measures, high scores indicated more problems.

Peer Nominations. Interviews with children asked them to nominate classroom peers who fit descriptions of aggressive, hyperactive-disruptive, and prosocial behaviors. Scores for each child came from the average ratings given by classmates.

Analysis. The investigators recoded some variables into ordered categories to reduce skewness (e.g., social competence) and truncated others to reduce the influence of large values (e.g., peer nominations). The analyses then estimated hierarchical models using three levels: time, child, and school. Time was centered so that the intervention main effect showed group differences at the end of the intervention. Time-by-intervention effects showed how the trend varied across groups.

The models controlled for baseline values of the teacher-rated measures but not for the peer nomination measures, which were not gathered at the start of the study.

Outcomes

Implementation Fidelity . More than 90% of the teachers attended a 2-day training workshop. Educational consultants spent an average of 1 to 1.5 hours in each classroom observing, demonstrating, and providing feedback. They also met individually with teachers.

On average, teachers completed 48.2 lessons in the first grade (85%), 39.6 in the second grade (86%), and 38.4 in the third grade (80%). Fidelity ratings from educational consultants ranged from 3.0 to 3.2 on a scale from 1 (low skilled performance) to 4 (highly skilled performance).
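As a quick arithmetic check, the completion percentages follow from dividing the mean number of lessons completed by the number of lessons available in each grade (57, 46, and 48, as given in the Randomization paragraph). The sketch below is illustrative only; the variable and function names are assumptions and do not come from the study.

```python
# Lessons available per grade (from the Randomization paragraph) and
# mean lessons completed per grade (from the fidelity report).
lessons_available = {1: 57, 2: 46, 3: 48}
lessons_completed = {1: 48.2, 2: 39.6, 3: 38.4}


def completion_rates(completed, available):
    """Return the percent of available lessons completed, keyed by grade."""
    return {grade: 100 * completed[grade] / available[grade] for grade in available}


rates = completion_rates(lessons_completed, lessons_available)
for grade, pct in rates.items():
    print(f"Grade {grade}: {pct:.1f}% of lessons completed")
```

Rounded to whole percentages, these values match the 85%, 86%, and 80% reported above.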

Baseline Equivalence and Differential Attrition . A series of analyses of variance indicated no significant differences between intervention and control schools on the percent of children who received free and reduced lunch, the percentage of ethnic minority children, or academic achievement scores. Tests for baseline equivalence did not compare the baseline characteristics of students in the intervention and control groups.

To assess differential attrition, analyses compared baseline scores for the sample students who remained in their school for all three years with other students who left the school between grades 1 and 3. The groups showed no significant differences on gender or pretest authority acceptance at any of the sites, but differed on other characteristics for at least one site. Dropouts were more likely to be African American, have lower pretest scores on social competence, and have lower pretest scores on cognitive concentration.

Since attrition was lower in rural Pennsylvania (25%) than in the other two sites (58% and 69%), the study replicated the models separately for the rural Pennsylvania sample. The results were similar, and perhaps even stronger, for the rural Pennsylvania sample. However, it is unclear whether the stronger results reflect less bias from attrition or general differences in the Pennsylvania sample.
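The site-level attrition figures are the complements of the retention percentages reported in the Attrition paragraph. A minimal sketch of that calculation follows; the names are hypothetical and chosen only for illustration.

```python
# Percent of the original sample that stayed in the same school for all
# three years, by site (from the Attrition paragraph).
retained_pct = {"Nashville": 30.9, "Seattle": 41.6, "Rural Pennsylvania": 75.0}


def attrition_pct(retained):
    """Attrition is the share of the original sample that did not remain."""
    return {site: 100.0 - pct for site, pct in retained.items()}


attrition = attrition_pct(retained_pct)
for site, pct in attrition.items():
    print(f"{site}: {pct:.0f}% attrition")
```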

Posttest . For all three teacher-rated outcome measures, the intervention group had better scores at the posttest and improved more over the three years than the control group. The intervention main effects were positive and significant for authority acceptance (effect size = .24), cognitive concentration (effect size = .12), and social competence (effect size = .34). The time-by-intervention effects were statistically significant as well, indicating that intervention children experienced less of an increase over time in problems.

In addition, tests for moderation revealed weaker intervention benefits in low-income schools for authority acceptance and social competence. They also indicated that, for the outcome of authority acceptance, the intervention had stronger benefits for children with higher baseline problems.

For the peer nominations, the intervention failed overall to affect outcomes of aggressive, prosocial, and hyperactive ratings, but it reduced aggressive and hyperactive outcomes for boys.

Study 10

The two articles came from the same project, but the Social and Character Development Research Consortium (SCDRC, 2010) examined only the first cohort of schools (n = 10), while Crean & Johnson (2013) examined two cohorts of schools (n = 14). Both studies evaluated PATHS when delivered starting in third grade rather than in kindergarten.

Summary

Crean and Johnson (2013) and the SCDRC (2010) used a randomized controlled trial with 14 schools assigned to intervention or control groups. The study followed third grade students (n = 779) for three years and included teacher-rated and child-reported measures of aggression, conduct problems, and delinquent behavior.

Crean & Johnson (2013) found that, relative to the control group, the intervention group showed significantly greater reductions in

Teacher-reported conduct problems
Student-reported aggressive social problem solving, hostile attribution bias, and aggressive interpersonal negotiation strategies.

Evaluation Methodology

Design:

Recruitment: This study, one part of the Social and Character Development Research Program that evaluated six other programs, recruited 14 public elementary schools, 10 in the first cohort and four in a second cohort. The schools represented one school district in Minnesota (two schools) and two school districts in New York (12 schools). The sample came from third-grade students in the schools. Of the 1,024 eligible students, 607 (59%) had consent and completed the baseline assessment. In addition, the sample added new students entering the schools during the study period.

Assignment: The 14 schools were randomly assigned within districts to treatment (n = 7) or control (n = 7) conditions after matching pairs of schools on the basis of nine school characteristics. The control schools continued their standard practices, some of which included using other social and character development programs. After randomization, 63.9% (n = 328) of the students in the intervention schools consented and 57.5% (n = 294) of students in the control schools consented. Among the new students entering the schools, 49% consented in the intervention group and 39% consented in the control group.

Assessments/Attrition: Assessments came in the fall of third grade (baseline), the spring of third grade (year 1), the fall, winter, and spring of fourth grade (year 2), and the fall, winter, and spring of fifth grade (year 3). SCDRC (2010) noted that baseline data collection came on average six weeks after the program start. Of the initial sample of 607 students with consent and baseline data, 38% (n = 231) left the schools or withdrew from the study. Helping to balance that loss, 172 (28%) new students with consent joined the study. Across all assessments and data sources (see Crean & Johnson, 2013, Table 2), the analysis sample sizes ranged from 429 to 630 students.
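The sample-flow figures in the Recruitment and Assessments/Attrition paragraphs can be reproduced with simple arithmetic. The sketch below uses only counts stated in the text; the variable names and the helper function are assumptions made for illustration.

```python
# Counts taken from the Recruitment and Assessments/Attrition paragraphs.
eligible_n = 1024      # eligible third-grade students
baseline_n = 607       # students with consent and a baseline assessment
left_n = 231           # students who left the schools or withdrew
new_entrants_n = 172   # new students with consent who joined later


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


consent_rate = pct(baseline_n, eligible_n)       # share of eligible students in the baseline sample
attrition_rate = pct(left_n, baseline_n)         # share of the baseline sample lost
entrant_rate = pct(new_entrants_n, baseline_n)   # new entrants relative to the baseline sample
ever_enrolled = baseline_n + new_entrants_n      # total students ever in the study

print(f"Consent and baseline: {consent_rate:.1f}%")
print(f"Attrition: {attrition_rate:.1f}%")
print(f"New entrants: {entrant_rate:.1f}%")
print(f"Students ever enrolled: {ever_enrolled}")
```

The total of baseline students plus new entrants equals the n = 779 cited in the Summary.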

Sample Characteristics: Data from Crean & Johnson (2013) showed more female (57%) than male (43%) participants. About half of the students were white (51%), more than one-third were African-American (38%), and the remaining were classified as other (10%). Additionally, 17% of students identified themselves as Hispanic. About one third (33%) of students came from single parent households. The distribution of family income indicated a large proportion of families were poor (39% of families reported earning less than $20,000/year and 39% of families reported earning between $20,000 and $39,000 per year). By contrast, 21% of families reported earning over $70,000 per year. Most households had a family member who had graduated high school (19%), had some college (38%), or had obtained a college degree (33%).

Measures: Measures were collected from teachers and students, but teachers both rated the children and delivered the program. The timing and number of data points varied depending on the instrument and source of data.

Teacher-reported measures came from the Teacher-Child Rating Scales (TCRS) and the Behavior Assessment Scale for Children-2 (BASC-2). The three measures included aggression, conduct problems, and acting out behaviors. Internal consistency on these measures ranged from .72 to .94.

Child self-reports provided seven measures of aggression, delinquent behavior, victimization at school, normative beliefs about aggression, aggressive social problem solving, hostile attribution bias, and aggressive interpersonal negotiation strategies. Internal reliability on these measures ranged from .68 to .93.

Analysis:

The main analyses in Crean & Johnson (2013) used three-level growth models for time, students, and schools. The authors defined significance at p < .10, but the tables also listed significance at p < .05. Because curvilinear change appeared for most outcomes, the growth models included linear and quadratic terms (except for outcomes with only three time points). The multilevel models used random intercepts and slopes to adjust for clustering, and a time variable controlled for change from baseline. However, the sample size of 14 schools was likely not large enough to accurately estimate the standard errors, and the result may be to overstate the significance of the tests.

Intent to Treat: The analyses used all available data, with exclusions coming only from those moving schools, withdrawing, or not completing specific measures. In addition, sensitivity tests used multiple imputation for missing data on baseline covariates.

Outcomes

Implementation Fidelity:

Teachers reported teaching an average of 34.8 lessons per year. Observers rated teachers on 1) quality of teaching program concepts; 2) modeling and generalization of concepts throughout the school day; 3) quality of student compliance during lessons; and 4) openness to consultation. Reliability on these four measures was high (.87-.90). Six of the seven schools averaged greater than three (out of four) on quality of teaching concepts, modeling and generalization of concepts, and openness to consultation. Five of the seven schools also attained a three or better average on quality of student compliance during the program lessons.

Baseline Equivalence:

At the school level, a significant imbalance existed in the percent of students with limited English proficiency. At the student level, two-level mixed models found no significant condition differences in baseline demographics, but there was a statistical difference in parent-rated inter-generational closure at p < .05, and several marginal differences at p < .10. Overall, of the 69 measures collected, only seven showed significant or marginal baseline differences. In all instances, baseline differences favored the control group.

Differential Attrition:

Two-level binomial models including the aggression outcome variables (teacher and self-report), conduct disorder, acting out behavior problems, and minor delinquency were used to predict attrition as well as new enterer status. There was a higher level of attrition among students in the urban schools, but individual-level aggression measures predicted neither attrition nor entrance. Further, the authors stated that "none of the aggression by condition interaction terms were significant."

Posttest:

For the three teacher-reported measures, one (conduct problems) differed significantly across conditions in the linear and quadratic changes over time (p < .05). Effect sizes listed in Table 3 show initial iatrogenic effects: The intervention group had more conduct problems than the control group in the first four follow-ups. However, the last assessment at the end of fifth grade, after three years of the program, indicated fewer conduct problems for the intervention group (d = -.15).

For the seven child-reported measures, three risk and protective factors showed significant condition differences in the linear changes over time (p < .05). Relative to the control group, the intervention group had lower aggressive problem solving (d = -.27), hostile attribution bias (d = -.27), and aggressive interpersonal negotiation strategies (d = -.28).

Sensitivity tests using baseline covariates and multiple imputation did not lead to substantive changes in the findings.

The SCDRC (2010) analysis of 10 schools found weaker effects. In 60 tests for mean differences between conditions across grades 3-5 (Table 6.18), there were no significant differences. Further, in 18 tests for growth rates across the three years, only one reached statistical significance. The intervention group did better on academic competence than the control group (d = .08).

Long-Term:

Not examined.

Study 11

This study was registered at www.controlled-trials.com: ISRCTN 32534848. It used the pre-school version of PATHS but included students in the early years of elementary school. Little et al. (2012) offered a shortened summary of the study, while Berry et al. (2016) presented more details.

Summary

Little et al. (2012) and Berry et al. (2016) used a randomized controlled trial with 64 schools in Birmingham, England, that were assigned to intervention or control groups. The 5,397 children in the schools were followed for two years. Teachers provided measures of strengths and difficulties.

Little et al. (2012) and Berry et al. (2016) found no significant effects on youth strengths and difficulties or on teacher and classroom behaviors.

Evaluation Methodology

Design:

All mainstream (i.e., not special schools) primary schools in Birmingham, England, were invited to take part in the study (n = 299), and 64 schools expressed an initial commitment. Berry et al. (2016, p. 19) stated that "Eight schools dropped out shortly after randomisation and collected no baseline children level data." The study therefore reported on the remaining 56 schools. Participants were boys and girls in reception (i.e., pre-kindergarten) and year one (i.e., kindergarten) at the participating schools in the 2009/2010 academic year who went on to year one and year two, respectively, in 2010/2011. The 56 schools included 5,397 children ages 4-7 and 196 classes (Little et al., 2012, p. 265).

Assignment: The 64 schools were stratified by percentage of free school meals and size of school and then randomly allocated to an intervention or control group. However, as shown in the CONSORT diagram (Berry et al., 2016), the loss of six control schools versus two intervention schools before the baseline assessment may have compromised the randomization.

For the remaining 56 schools (n = 5,397 students) used in the study, 29 intervention schools received two years of the pre-school version of PATHS. The 27 control schools continued as usual, which typically involved use of a nationally recommended social and emotional competence program called SEAL. In most cases, the intervention replaced the usual social and emotional competence program with PATHS.

Assessments/Attrition: Data were collected at three points: baseline (n = 183 classes and 5,074 children), mid-intervention at one year (n = 176 classes and 4,998 children), and posttest at two years (n = 178 classes and 4,994 children). Among the 5,397 children in the 56 schools, completion rates were 93-94% for each of the assessments.
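The reported completion rates can be checked by dividing the number of children assessed at each wave by the 5,397 children in the 56 schools. This is a minimal sketch with hypothetical variable names, using only the counts given above.

```python
# Children in the 56 participating schools and children assessed per wave
# (from the Assessments/Attrition paragraph).
total_children = 5397
assessed = {"baseline": 5074, "mid-intervention": 4998, "posttest": 4994}

# Completion rate per wave as a percentage of all enrolled children.
completion = {wave: 100 * n / total_children for wave, n in assessed.items()}

for wave, pct in completion.items():
    print(f"{wave}: {pct:.1f}% completed")
```

Rounded to whole percentages, the three waves fall within the 93-94% range stated in the text.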

Sample:

The students in the sample averaged five years of age and were 68% non-white. About 4-10% exceeded the clinical cutoffs for behavior problems, emotional problems, or total difficulties.

Measures:

Teachers provided all child measures. The children were rated by two different teachers over the two post-baseline assessments, but each teacher delivered the program before rating the children.

The primary outcomes came from the teacher-completed Strengths and Difficulties Questionnaire (SDQ), which included four subscales, one impact scale, and one total scale. The study cited other studies that had demonstrated good internal consistency for the measures.

The secondary outcomes came from the PATHS Teacher Rating Survey and included 11 subscales.

One other set of nine measures came from trained independent observers, who were masked to condition. They rated teacher and classroom behavior using the Teacher-Pupil Observation Tool (T-POT). However, only 19 of the 56 schools (10 intervention, 9 control) had data, the baseline assessment for this measure came 1-2 months after the intervention began, and the only follow-up assessment came at six months after baseline.

Analysis:

Most analyses used three-level linear hierarchical or mixed models to account for clustering at the classroom and school levels as well as to control for covariates. For the 56 schools used in the study, data were analyzed both with and without missing data imputed, and the imputation used both multiple imputation and the last observation carried forward. The authors noted that all results were similar, with the imputed findings somewhat weaker.

Intent to Treat: Although data were analyzed with missing data imputed and all participants with baseline data were included, the eight randomized schools that failed to collect baseline data could not be included.

Outcomes

Implementation Fidelity:

On average, teachers delivered 55% of the 47 lessons in the year 2 curriculum, and they reported that lessons were delivered as specified in the program materials. However, the fidelity ratings done by the coaches (expressed as a percentage of the total possible score) ranged from 21% to 100%, with a mean of 79%. Using an 80% threshold, about half could be said to have delivered the program with high fidelity.

Baseline Equivalence:

There were no statistically significant differences between the intervention and control groups at baseline on seven demographic and behavioral measures (Berry et al., 2016, Table 2).

Differential Attrition:

Although the text is unclear, it appears that the influence of attrition was assessed by comparing the complete-case results with missing data excluded to the imputed results with all data included. The similarity of the results suggested minimal bias from attrition.

Posttest:

At 12 months (mid-intervention) and 24 months (posttest), the primary measures of strengths and difficulties did not differ significantly across conditions. For the secondary measures of the PATHS teacher ratings of children, the intervention group did significantly better than the control group on six of 11 tests at mid-intervention. However, at the 24-month posttest, these effects had disappeared.

At six months, observer ratings of teacher and classroom behavior showed better scores for the intervention group than the control group on three of nine tests: total positive behaviors (d = .304), class behavior negative to teacher (d = .307), and class off-task behavior (d = .227). However, these measures were available for only about one-third of the schools and the first year of the two-year study.

Moderation tests (Table 6) showed four significant subgroup differences in the program effects. Most consistently, the intervention group did significantly better than the control group for students with emotional difficulties.

Long-Term:

Not examined.

Study 12

Summary

Schonfeld et al. (2015) used a randomized controlled trial with 24 schools assigned to an intervention group (n = 692 students) or control group (n = 702 students). The students were followed from third grade (baseline) to sixth grade (posttest) and assessed with statewide achievement tests.

Schonfeld et al. (2015) found that, relative to the control group, the intervention group showed significantly higher

Test score proficiency in reading, writing, and math.

Evaluation Methodology

Design:

Recruitment: All 24 schools in a large, high-risk, urban school district in the Northeast were included in the study.

Assignment: Schools in the study were divided into two clusters balanced for race/ethnicity, proportion of students qualifying for free or reduced price lunch, and school size. Using a block randomization procedure, the study then assigned one cluster to the treatment group and the other cluster to the control group. The 12 schools in the treatment group had 692 students, and the 12 schools in the control group had 702 students.

Attrition: The longitudinal study followed the same cohort of students from 3rd grade to 6th grade. Assessments occurred at the end of 4th, 5th, and 6th grade, with the 6th-grade assessment representing a posttest. The analysis included only students who remained in the same treatment group (although not necessarily the same school) for the entirety of the 4-year program. This excluded 49% of the originally assigned students.

Sample: The sample was approximately half male (51%), and more than half (68%) received free or reduced-price lunches. Black students made up 48% of the sample; Latino students made up 41%; White students made up 9%; and other races made up the remaining 2%. Although the study reports no other socioeconomic information, the school district is described as high-risk.

Measures: The State Mastery Test, a statewide achievement test administered annually in the spring in Grades 4 to 8, measured problem-solving skills for academic tasks separately for math, reading, and writing. The test showed good validity and reliability.

Analysis: The study used multilevel logistic regressions to account for students nested within schools (but not within the clusters used for assignment), with students at level 1 and schools at level 2. Because testing began in 4th grade, after the program start, the study did not have baseline test scores.

Intent-to-Treat: The study included only students who participated in the full program period and therefore does not conform to intent-to-treat.

Outcomes

Implementation Fidelity: The study measured fidelity and the effect of exposure to the intervention on outcome scores. Teachers presented about two-thirds of the available lessons, and exposure to more lessons predicted achievement.

Baseline Equivalence: At baseline, there were no significant differences between conditions in sociodemographic measures. There were no differences between control and treatment schools for achievement test results the year before the beginning of the program, but those scores did not include program participants.

Differential Attrition: The study found no differences between completers and dropouts for four sociodemographic measures but could not test for differences by baseline outcomes and did not analyze attrition by condition.

Posttest: Fourth grade students in the treatment group had significantly higher odds of attaining basic proficiency in reading and math, but not writing. Students in the treatment group in 5th and 6th grades had higher odds of attaining basic proficiency in writing, but not reading or math. None of the control measures moderated these intervention effects.

Long-Term: The study did not conduct any long-term follow-up.

Study 13

Summary

Fishbein et al. (2016) found that, relative to the control group, the intervention group showed significantly greater improvements in

Researcher-rated outcomes of inhibition task accuracy and impulsivity
Teacher-rated outcomes of aggression, internalizing, social competence, emotion regulation, prosocial behavior, impulsivity, inattention, closeness and conflict with teacher, peer relationship problems, and academic skills.

Evaluation Methodology

Design

Recruitment: Four public elementary schools from highly disadvantaged Baltimore City neighborhoods with poor school readiness participated in the evaluation. Consent was sought for all children in the schools' kindergarten classrooms. Of 464 eligible children, 327 had consent and entered the trial, though children without consent still received the intervention.

Assignment: Schools were randomly assigned to treatment (n=2) or control conditions (n=2). The preschool/kindergarten version of the PATHS curriculum was used as the primary intervention, which was taught by all kindergarten teachers in treatment schools, while kindergarten teachers in control schools provided instruction as usual. Since the treatment was administered grade-wise within school, randomization could not occur by classroom.

Assessment: Students were assessed using teacher-rated measures of behavior (e.g., attention, concentration, aggression) and peer-reported nominations (e.g., likability, aggression, acceptance) at baseline and posttest, in the spring of the academic year. Attrition was not described for the baseline sample.

Sample

The majority of students at participating schools were eligible for free lunches and nearly all students were African American. Household income in the areas served by the schools averaged about $40,000 a year, and areas had moderately high crime rates (~70/1000 residents).

Measures

All instruments were administered at the beginning of the fall semester and end of the spring semester, after the program had concluded.

Kindergarten teachers completed a series of measures (listed below) assessing child competencies.

Aggression was assessed with seven modified items from the Teacher Observation of Child Adaptation-Revised. Despite the modifications, no psychometric properties were described.
Internalizing was measured using five items from the Teacher Observation of Child Adaptation-Revised. Reliability was not reported for the subscale.
Social Competence was defined using 13 items from the Social Competence Scale. No measures of validity or reliability were reported.
Emotion Regulation was assessed with six items regarding the selected student's coping mechanisms and temper control (alpha = .88).
Prosocial Behavior was examined with a subscale of seven items that displayed good reliability (alpha = .96).
Child Impulsivity and Inattention were measured using the Diagnostic and Statistical Manual's ADHD Rating Scale (alpha = .92-.94).
Student-Teacher Closeness and Conflict were assessed with eight items from the Student-Teacher Relationship Scale (alpha = .92).
Positive Peer Relationships were assessed with the Peer Relations Questionnaire (alpha = .79).
Academic Skill was measured with four items drawn from the Academic Competence Evaluation Scales. Psychometric properties for the instrument were not reported.

Though not explicitly stated, it appears that child cognitive functioning outcomes (listed below) were administered by the research team. It was not clear whether these researchers were blind to condition.

Delayed Gratification was assessed across four dimensions using the Delay of Gratification tasks. Reliability was not reported for the measure.
Behavioral Inhibition was tested using the computerized Whack-A-Mole game, which yielded four total measures. No psychometric properties were described.
General Intelligence was measured using the KBIT-2 composite measure, which demonstrated high reliability (alpha = .89-.96).
Emotional Intelligence was assessed using the "FACES task" at posttest only. The validity of the measure was not described.
Motor-Skills Impulsivity was tested with the Peg-Tapping Task. No measures of validity or reliability were reported.

Analysis

Multilevel growth models were used to evaluate the intervention, with two observations nested in each student. The models control for student gender and inherently adjust for baseline outcomes. A secondary analysis applied similar methods to a propensity score matched subsample (114 of 327 cases) to adjust for baseline differences in several behavioral outcomes. There was no adjustment for clustering in classrooms (the unit of delivery) or schools (the unit of randomization) beyond the condition difference.
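To illustrate why the missing clustering adjustment matters, the following minimal sketch applies the standard design effect formula, 1 + (m - 1) x ICC, where m is the average cluster size. The sample size (327 students) and the number of schools (4) come from the study; the intraclass correlation of .05 is a purely hypothetical value chosen for illustration and was not reported by the authors.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


n_students = 327    # consented sample reported in the study
n_schools = 4       # unit of randomization reported in the study
assumed_icc = 0.05  # ASSUMPTION: illustrative value, not reported in the study

avg_size = n_students / n_schools
deff = design_effect(avg_size, assumed_icc)
effective_n = n_students / deff

print(f"Average students per school: {avg_size:.2f}")
print(f"Design effect under the assumed ICC: {deff:.2f}")
print(f"Effective sample size: {effective_n:.1f}")
```

Under this assumed ICC, the 327 students would carry roughly the information of 65 independent observations, which is why standard errors that ignore clustering are likely too small.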

Intent-to-Treat: Subject attrition was not discussed.

Outcomes

Implementation Fidelity: The teachers delivering the intervention all completed at least 80% of the lessons affiliated with the treatment. However, observation-based fidelity ratings made by the program coordinator suggested some deviation from content, with an average score of 3.8 (76%) across all classrooms on a 5-point scale.

Baseline Equivalence: The groups were not equivalent at baseline, with "preliminary analyses reveal[ing] significant baseline differences across treatment condition for multiple behavioral outcomes" (p. 503) and several apparent differences in school and neighborhood characteristics in Table 1 with no significance tests (p. 497). See, for example, household income (control mean=$31,053 vs. treatment mean=$50,592) and overall crime rates (control=73.69 per 1000 vs. treatment=45.17).

Differential Attrition: There was no discussion of student attrition, though all schools were retained.

Posttest: At posttest, the treatment group showed improvement on 13 of 13 teacher-rated outcomes compared to controls. Gains were observed for aggression, internalizing, social competence, emotion regulation, prosocial behavior, impulsivity, inattention, teacher-child closeness and conflict, peer relationship problems and academic skills, as rated by the teachers who delivered the intervention. All but two of these impacts (academic skills and teacher-student closeness) were maintained in the propensity score-matched subsample.

In the overall sample, treatment students significantly improved 2 of 10 direct cognitive functioning outcomes over the control group, including inhibition task accuracy and motor-skills impulsivity performance. However, these gains were not evident in the matched subsample.

Study 14

The version of PATHS used in this study was implemented by health-promotion professionals and consisted of 161 lessons spread over the eight years of elementary school. It updated a Dutch translation of the U.S. curriculum that had been used for several years in the Netherlands. In this two-year effectiveness study, all children in the program received about 40 PATHS lessons over two years, but children in the higher grades, who did not start the lessons from kindergarten, received extra lessons.

Summary

Goossens et al. (2012) used a quasi-experimental design that non-randomly assigned 18 Dutch schools and 1,331 students in kindergarten and grades 1, 3, and 5 to intervention and comparison groups (n = 9 schools each) and followed the students over two years. Teacher-rated and child-reported measures included problem behaviors, depression, and emotional regulation.

Goossens et al. (2012) found only one significant effect in 27 tests. The intervention group compared to the comparison group showed significantly higher

Emotional awareness.

Evaluation Methodology

Design:

Recruitment: Of 30 Dutch Municipal Health Services approached, three agreed to participate in the study. Health promotion professionals at each location then recruited all elementary schools in their region. The 18 schools that joined were located in rural areas and provincial towns in the Netherlands. In total, 1,331 children (ages 5-11) from four cohorts - kindergarten and grades 1, 3, and 5 - were eligible for the study.

Assignment: The quasi-experimental design began with randomizing the 18 elementary schools to an intervention condition or a waitlist comparison condition. However, four schools deviated from the randomization. The authors allowed two schools to start two years later for organizational reasons and agreed to the demands of two other schools to start the program immediately. The intervention group included nine schools with 695 eligible students, and the comparison group included nine schools with 636 eligible students.

Assessments/Attrition: Assessments occurred over the two years of program implementation: baseline (T0), end of the first year (T1), start of the second year (T2), and end of the second year (T3). Due to missing data, moving to other schools, and parent refusal, 1,294 (97% of those eligible) students completed the baseline assessment. By the last assessment at the end of two years, 1,223 students provided data (92% of the randomized sample, 95% of the baseline sample). Attrition came from moving to other schools or from repeating and skipping grades.
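As a quick check on the retention figures above, the following short sketch recomputes the percentages from the counts reported in the text. Only the three counts come from the study; the helper name `pct` is introduced here for illustration.

```python
eligible = 1331   # students eligible for the study
baseline = 1294   # students completing the baseline assessment (T0)
final = 1223      # students providing data at the last assessment (T3)


def pct(part: int, whole: int) -> int:
    """Return part/whole as a whole-number percentage."""
    return round(100 * part / whole)


baseline_of_eligible = pct(baseline, eligible)
final_of_eligible = pct(final, eligible)
final_of_baseline = pct(final, baseline)
attrition_from_baseline = 100 - final_of_baseline

print(baseline_of_eligible, final_of_eligible, final_of_baseline, attrition_from_baseline)
```

The recomputed values agree with the reported 97%, 92%, and 95%, and with the roughly 5% attrition from the baseline sample.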

Sample:

Nearly all students in the sample came from a Dutch background (92-99% across the cohorts and conditions). The distribution by gender was close to even (44-57% male), and the students varied in mean age from 5.4 years to 10.6 years across the four cohorts.

Measures:

Teachers who delivered the program also rated their students. Child questionnaires were completed in face-to-face interviews with the three youngest cohorts and by self-report with the oldest cohort. Trained graduate psychology students who did not know of the school assignment conducted the child assessments. The study used 27 outcome measures in total.

Problem Behaviors. Six teacher-rated scales from the Problem Behavior at School Interview measured four externalizing problems (attention deficit and hyperactivity, oppositional defiant disorder, conduct problems, and relational aggression), and two internalizing problems (anxiety and depression). Cronbach's alphas varied between .78 and .92.

Social Experiences. Three teacher-rated scales measured relational victimization, physical victimization, and prosocial behavior. Cronbach's alphas were .87 (relational victimization), .85 (physical victimization), and .75 (prosocial behavior).

Depression. Five child self-report scales, obtained only for the oldest (grade 5) cohort, came from the Dimensions of Depression Profile for Children instrument. The measures included one total score (alpha = .85) and four subscales: depressed mood (alpha = .69), self-blame (alpha = .59), low energy/interest (alpha = .75), and low global self-worth (alpha = .77).

Peer Nominations. One measure of peer social preference was obtained using peer nominations of the most liked and least liked.

Social Behavior. Three teacher-rated scales obtained for the youngest two cohorts (kindergarten and grade 1) came from the Preschool and Kindergarten Behavior Scale. The three subscales included social cooperation (following instructions, getting along with peers), social interaction (gaining and keeping friendships), and social independence. The internal consistency of the subscales ranged from .86 to .89.

Social and Emotional Skills. A single teacher-rated measure obtained for the two youngest cohorts (kindergarten and grade 1) came from the Head Start Competence Scale. It measured children's social and emotional skills in interpersonal relationships and emotion regulation (alpha = .95).

Emotional Awareness. A single teacher-rated measure came from the Levels of Emotional Awareness Scale for Children, which assessed the complexity of children's emotional awareness. Cronbach's alpha ranged from .89 to .92 over the assessments.

Emotional Regulation. Six child-rated scales obtained for the oldest cohort (grade 5) came from the Difficulties in Emotion Regulation Scale and measured non-acceptance of emotional responses (alpha = .73), difficulties engaging in goal-directed behavior (alpha = .82), impulse control difficulties (alpha = .80), lack of emotional awareness (alpha = .78), limited access to emotion regulation strategies (alpha = .74), and lack of emotional clarity (alpha = .61).

Empathy. One child-rated measure from Bryant's Empathy Index was used for the youngest and oldest cohorts only. Cronbach's alpha was .68.

Analysis:

Multilevel mixed models nested the four measurement waves within students, the students within classes, and classes within schools. Although the mixed models adjusted for clustering within schools, the unit of assignment, the sample size of 18 clusters was likely not large enough to accurately estimate the standard errors and may result in the overestimation of program effects. The authors stated that they calculated "three change scores (T0 - T1, T1 - T2, and T2 - T3) for each variable." To test for the program effect over the full period, these three change scores were then included in the mixed models for each outcome. Missing data were handled through Full Information Maximum Likelihood estimation (FIML). Because of multiple testing (27 outcomes), the level of statistical significance was set at p < .01 in all tests.
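To give a sense of what the stricter significance threshold buys with 27 outcomes, the sketch below computes the chance of at least one false positive across all tests. It assumes the 27 tests are independent, which is a simplification (the outcomes are surely correlated), and it shows a Bonferroni threshold only for comparison; the authors used p < .01, not a Bonferroni correction.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests = 27  # number of outcomes reported in the study

fwer_at_05 = familywise_error(0.05, n_tests)  # conventional threshold
fwer_at_01 = familywise_error(0.01, n_tests)  # threshold the authors used
bonferroni_alpha = 0.05 / n_tests             # shown for comparison only

print(f"Familywise error at p < .05: {fwer_at_05:.3f}")
print(f"Familywise error at p < .01: {fwer_at_01:.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.5f}")
```

Under the independence assumption, the p < .01 threshold lowers the chance of any false positive from about 75% to about 24%, so a single significant result in 27 tests should still be read cautiously.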

Intent-to-Treat: The FIML estimation used all participants with baseline data. Also, the authors minimized follow-up attrition by attempting to collect data via mail from children who moved schools or who repeated or skipped a grade.

Outcomes

Implementation Fidelity:

Mean completeness, which measured teacher reports of coverage of the lessons, was only 50% in the first year and 49% in the second year. A measure of "conceptual use" of the program principles ranged from 1 to 4. The mean was 3.05 in the first year and 3.07 in the second year.

Baseline Equivalence:

Age, gender, ethnicity, and verbal ability did not differ significantly between the intervention and comparison groups at baseline. However, significant baseline differences (p < .05) were present for the level of attention deficit and hyperactivity, oppositional defiant disorder, conduct problems, relational aggression, anxiety, relational victimization, physical victimization, prosocial behavior, low energy, social interaction, social independence, and social and emotional skills. Problem behaviors tended to be higher and social and emotional skills tended to be lower in the intervention condition.

Differential Attrition:

Attrition was low, only 5% of the baseline sample.

Posttest:

In 27 tests, the intervention group did significantly (p < .01) better than the comparison group only for emotional awareness. Additional moderation tests showed, with few exceptions, no differences in program effects between subgroups. The authors attributed the sparse effects to poor program implementation.

Long-Term:

Not examined.

Study 15

Summary

David (2014) non-randomly assigned three Canadian schools and 98 students to an intervention and comparison group and followed the students over 14 months. The measures include reading achievement and social competence.

David (2014) found no significant program effects on measures of social competence or reading achievement.

Evaluation Methodology

Design:

Recruitment: The study examined 98 students in three elementary schools, all part of the same school board in Eastern Canada. The students were enrolled in grades primary, one, and two and were required to have parental consent, but the study did not report consent rates.

Assignment: The school board administration determined assignment of schools. Two schools received the intervention (n = 57 students), and one school served as a waitlist comparison (n = 41 students). Having a single comparison school confounded the no-treatment condition with any unique characteristics of that school.

Assessments/Attrition: Assessments occurred at baseline and after 14 months. Nearly all students with baseline data had posttest reading data (98%). However, only 79% of data from teachers was complete (N = 77).

Sample:

The study lacked socio-demographic information on the students. It reported only that the genders were evenly split and that grade 2 had more students than the other grades.

Measures:

The two measures of reading achievement came from the Woodcock-Johnson III: Tests of Achievement Letter-Word Identification and the Woodcock-Johnson III: Tests of Achievement Word Attack. Trained research assistants conducted the student assessments, but the study did not say if they were aware of condition assignment. Three measures of social competence came from the teacher-rated PATHS Student Evaluation Questionnaire and included 1) aggressive/disruptive behaviors, 2) attention and concentration, and 3) social and emotional competence. The social competence alpha values were low, ranging from .51 to .73, and came from teachers who delivered the program.

Analysis:

The analyses used repeated-measures ANOVA models that controlled for the baseline outcome but no other covariates. It did not appear that the models accounted for the assignment of schools with adjustments for clustering.

Intent-to-Treat: The analyses used all available data.

Outcomes

Implementation Fidelity:

The study noted the reluctance of some teachers to implement the program but provided no quantitative measures of implementation fidelity.

Baseline Equivalence:

Tests found no significant baseline differences in age, aggression, attention, social-emotional competence, letter-word identification, or word attack.

Differential Attrition:

No tests presented.

Posttest:

The study found no effects of condition on any of the three social competence outcomes or the two reading outcomes.

Long-Term:

Not examined.

Study 16

This evaluation examined the impact of a two-year program that teachers delivered twice-weekly in 30-40 minute lessons on two different groups of students - years 3-5 and years 5-6.

Summary

The Manchester study (Barlow et al., 2015; Hennessey & Humphrey, 2020; Humphrey et al., 2016; Humphrey, Barlow, & Lendrum, 2018; Humphrey, Hennessey et al., 2018) was a cluster randomized controlled trial with a sample of 45 schools and 5,218 students in years 3-5 and 3,336 students in years 5-6. Measures of socio-emotional competence, mental health, and academic performance came at posttest (after two years of the program) and at 12- and 24-month follow-ups. An additional QED analysis (Panayiotou et al., 2020) examined program effects for students receiving most of the program lessons relative to those who did not.

Barlow et al. (2015), Hennessey & Humphrey (2020), Humphrey et al. (2016), Humphrey, Barlow, & Lendrum (2018), Humphrey, Hennessey et al. (2018), and Panayiotou et al. (2020) found only two significant program benefits in numerous tests for measures of socio-emotional competence, mental health, and English and math achievement tests. The intervention group relative to the control group had significantly higher

Teacher ratings of socio-emotional competence
Child self-reported psychological well-being.

Evaluation Methodology

Design:

Recruitment: Mainstream primary schools in the 10 Local Authorities that make up the Greater Manchester region were eligible to participate. Children in years 3, 4, and 5 (ages 7-9) on the school's full-time roll at the start of the 2012-13 academic year defined the initial target population for the study (another sample of children in years 5 and 6 is described below). From a pool of 58 schools, 45 met the eligibility criteria. Although the reported figures differed across studies, only 133 or 140 (2.5-2.7%) of 5,218 students did not receive parental consent to participate. The schools were generally representative of schools in England, although they had higher proportions of children eligible for free school meals and speaking English as an additional language than national averages (Hennessey & Humphrey, 2019, Tables 1-2).

Assignment: An independent team randomly allocated the 45 schools to an intervention group (n = 23) or control group (n = 22). The studies did not report the number of randomized students in each condition, only the total of 5,218. The randomization used a minimization algorithm to ensure balance in eligibility for free school meals and speaking English as an additional language. From 2012/13 to 2013/14, the intervention group received the program curriculum during health education classes, while the control group continued usual teaching practices.

Assessments/Attrition: The baseline assessment took place in May-July 2012. Humphrey et al. (2016, Figure 1) reported that 4,516 children (89%) and 4,498 teachers (89%) provided baseline data. The posttest took place in May-July 2014, after two years of the program, with 3,505 children (69%) and 3,317 teachers (65%) providing data. Humphrey, Hennessey et al. (2018) reported somewhat different figures: 4,400 children (84%) had baseline data and 3,888 children (75%) had posttest data. The higher posttest figure came from the addition of new students to the sample.

For the assessments at 12 and 24 months, Humphrey, Hennessey et al. (2018) examined a subsample of students who had transferred to secondary school following the posttest (n = 1,631). Attrition for this group was high: At 24 months, teacher and child measures were available for only 28% (n = 463) of the subgroup.

Sample:

For the baseline sample of students ages 7-9, about 30% were eligible for free school meals and about 22% spoke English as an additional language. About 70% were white, 12% Asian, and 8% black. The gender distribution was about equal.

Measures:

The study obtained teacher reports and child self-report surveys. Teachers who rated the children at posttest - but not at follow-up - also delivered the program.

Child-rated social-emotional competence or social skills was assessed using the 46-item social skills domain of the self-report version of the Social Skills Improvement System. The instrument contained seven subscales and a total scale. Alphas ranged from .67 (Assertion and Responsibility, baseline) to .83 (Self-Control, follow-up).

Teacher-rated mental health difficulties were assessed using the 25-item Strengths and Difficulties Questionnaire. The instrument contained five subscales and a total scale. Alphas ranged from .68 (Peer Problems, baseline) to .90 (Hyperactivity/Inattention, baseline) in Humphrey et al. (2016) and from .79 to .89 for subscales of internalizing, externalizing, and prosocial behavior in Humphrey, Hennessey et al. (2018).

Teacher-rated changes in children's social-emotional competence were assessed at posttest using the five-item Social and Emotional Competence Change Index, which was derived from the PATHS program evaluation tools. The alpha for the single scale was .93.

Child-reported psychological well-being was assessed using the Kidscreen-27. It contained subscales on psychological well-being (alpha = .77), peer and social support (alpha = .78), and school environment (alpha = .71).

Child school measures in Humphrey, Hennessey et al. (2018) came from the National Pupil Database and included school exclusions and attendance for all students and reading/writing and math attainment for the transition sample of students moving to secondary school.

Analysis:

Several analyses used three-level hierarchical linear modeling for school, child, and time. Controls included school- and child-level eligibility for free school meals and speaking English as an additional language, child sex, school use of other SEL practices, and child baseline risk status. Tests for program effects used condition-by-time interaction terms that included baseline scores. Other analyses used two-level hierarchical modeling for school and child with covariates for the baseline outcome and other baseline measures.

In a QED analysis, Panayiotou et al. (2020) used complier average causal effect estimation with randomization as an instrumental variable to compare those receiving high program dosage with others rather than to compare all intervention participants with all control participants.
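The logic of using randomization as an instrument can be sketched with the simple Wald (ratio) estimator: the intent-to-treat difference in outcomes divided by the difference in compliance rates between arms. This is only a minimal illustration, not the model-based estimation with covariates that the authors used, and every number in the example is hypothetical. It assumes that control students could not receive a high dosage and that assignment affects outcomes only through dosage.

```python
def wald_cace(mean_treat: float, mean_ctrl: float,
              comply_treat: float, comply_ctrl: float = 0.0) -> float:
    """Complier average causal effect as the ITT effect divided by the
    difference in compliance rates (simple instrumental-variable ratio)."""
    itt_effect = mean_treat - mean_ctrl
    first_stage = comply_treat - comply_ctrl
    if first_stage <= 0:
        raise ValueError("Assignment must increase the compliance rate.")
    return itt_effect / first_stage


# HYPOTHETICAL values for illustration only (not taken from the study):
# standardized outcome means by assigned arm, and the share of the
# intervention arm classified as high-dosage compliers.
cace = wald_cace(mean_treat=0.10, mean_ctrl=0.00, comply_treat=0.50)
print(f"ITT effect: 0.10, compliance: 0.50, CACE: {cace:.2f}")
```

With half of the intervention arm complying, the estimated effect among compliers is twice the intent-to-treat effect, which shows why such estimates rest heavily on the stated assumptions.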

Intent-to-Treat: The studies used FIML estimation or multiple imputation for all available data, including participants without complete data. The non-ITT complier analysis of Panayiotou et al. (2020) was an exception.

Outcomes

Implementation Fidelity:

On a scale of 1 to 10, measures were generally high for fidelity (8.20), quality (8.48), participant responsiveness (7.34) and reach (9.08). However, classes on average fell 20 lessons behind schedule. Analyses showed that higher levels of implementation quality and reach were associated with better academic outcomes but not better psychological outcomes. Humphrey, Hennessey et al. (2018) noted that despite its very modest impact, the program offered value given its low cost.

In a detailed study of the intervention group only, Humphrey, Barlow, & Lendrum (2018) found that implementation quality and participant responsiveness were associated with significantly lower ratings of students' externalizing problems. However, higher dosage was associated with lower prosocial behavior and social competence, and procedural fidelity had no association with student outcomes.

Baseline Equivalence:

Humphrey et al. (2016) reported that "Assessment of balance on key observables between the trial groups at baseline revealed negligible differences - with only the Pro-Social Behavior subscale of the SDQ exhibiting a difference of greater than d = 0.1 (see Table 1). After accounting for data clustering and multiple comparisons, there were no statistically significant differences between the two trial groups at baseline on any outcome measure." Humphrey, Hennessey et al. (2018) similarly found that standardized mean differences across conditions were no more than .16 for baseline sociodemographic and outcome measures.

Differential Attrition:

Humphrey et al. (2016) presented no tests for differential attrition, but retention differed substantially across conditions. Of those with baseline data, 86% of the intervention students and 69% of the control students had posttest data. Humphrey, Hennessey et al. (2018) noted that all intervention schools completed the posttest, but five of 22 control schools failed to complete the posttest. For students, they reported that 97% of the intervention group had posttest data compared to 79% of the control group.

Posttest:

Humphrey et al. (2016) found that the intervention teachers reported significantly greater changes in socio-emotional competence than the control teachers (d = .47). For teacher-reported mental health outcomes, the six tests found two significant group-by-time coefficients, but both favored the control group rather than the intervention group. For child-reported socio-emotional competence, the eight tests found no significant effects. Subgroup analyses of those at high baseline risk demonstrated mixed effects, with some favoring the intervention group and some favoring the control group.

Humphrey, Hennessey et al. (2018) examined 11 posttest outcomes in Tables 7A-7C, with only one significant effect. The intervention group self-reported better psychological well-being than the control group (d = .15).

Long-Term:

At the 12- and 24-month follow-ups of the transition sample of students who had moved to secondary school, Humphrey, Hennessey et al. (2018, Tables 9 and 11) found no significant effects in 10 tests. Additional mediation tests using structural equation models (Figure 4) did not include condition as a predictor.

Academic Performance among Year 5 and Year 6 Students (Barlow et al., 2015; Hennessey & Humphrey, 2019)

Although part of the same study and project, these two articles examined older students. Using the same randomized schools (23 intervention, 22 control), the sample consisted of 1,705 year 5 students and 1,631 year 6 students. The posttest assessment came in the summer term of 2014, at the end of the two-year program. For year 5 students, 37 schools (82%) and 1,117 students (66%) had data. For year 6 students, 45 schools (100%) and 1,582 students (97%) had data.

The outcomes measured reading and math attainment. For the year 5 cohort, posttest measures came from the Interactive Computerised Assessment System. The measures had reliabilities exceeding .92 and predicted future external assessment scores (r = .72). Assessments were administered by members of the research team, but an independent organization unaware of school or individual assignment graded the tests. For the year 6 cohort, both the pretest and posttest measures came from the Standardised Assessment Test English and Mathematics available from the National Pupil Database. Administration and scoring were done independently. Studies had shown that the test had strong internal consistency (alpha > .91) and classification accuracy (> 85%). Note that these test scores also served as the pretest measure for the year 5 cohort.

The analyses used either two-level hierarchical modeling for school and child or three-level hierarchical modeling for school, child, and time. Controls included school- and child-level eligibility for free school meals and speaking English as an additional language, child sex, school use of other socioemotional learning practices, and child baseline risk status. Barlow et al. (2015) lacked baseline outcome controls, while Hennessey & Humphrey (2019) used baseline outcome controls from the standardized tests given to students in year 2. To deal with missing data, the analyses used FIML estimation or multiple imputation, except for the year 6 cohort in Hennessey & Humphrey (2019), where attrition was only 3%.

Tests for baseline equivalence showed some school differences but few individual differences (Barlow et al., 2015, Table 1; Hennessey & Humphrey, 2019, Tables 1-2). The tables list effect sizes without significance tests. For six school characteristics, the three largest differences were for school size (.24), attainment (.43), and attendance (.48). The intervention schools had more students, lower attendance, and higher attainment. For five student characteristics, the largest effect size was .17, with the intervention students having higher scores on teacher-rated strengths and difficulties.

Differential attrition appeared in the substantial differences in condition completion rates for the year 5 students. According to Figure 1 in Barlow et al. (2015, p. 15), 96% of the intervention schools and 73% of the intervention students provided year 5 data, while only 68% of the control schools and 58% of the control students provided year 5 data. Hennessey & Humphrey (2019) reported that the comparison of schools lost to follow-up with those retained did not reveal significant differences on school characteristics. Children with complete versus incomplete data did not differ in terms of their gender, language group, or prior academic attainment, but those with incomplete data were significantly more likely to be eligible for free school meals.

For the four posttest academic attainment outcomes in Barlow et al. (2015, Table 3), only one of the tests proved significant, but the effect was iatrogenic. The multivariate coefficient for the English test in year 6 indicated that the control group improved more than the intervention group (Hedges' g = -.106). Four other tests for the subgroup of students eligible for free school meals were not significant. Hennessey & Humphrey (2019) found that the program had no significant impact on the attainment outcomes, regardless of year group (year 5, year 6), outcome measure, subject area (Math, English/Reading), or analysis sample (ITT, subgroup).

Complier and Non-Complier Analysis (Panayiotou et al., 2020)

As a supplement to the ITT analyses in other articles, Panayiotou et al. (2020) used a quasi-experimental design to examine the program effects for those who received a high dosage (i.e., compliers) relative to those who did not. Randomization served as an instrumental variable in estimating complier average causal effects that adjusted statistically for differences between the compliers and others.

The authors defined the high-dosage compliers in two ways. The first was defined as being in a class that delivered at least 67% of the scheduled lessons (50th percentile and higher), and the second was defined as being in a class that delivered at least 79% of the scheduled lessons (75th percentile or higher). Independent observers coded dosage for each class near the end of the first year of implementation. In examining models for both measures of high-dosage compliers, the analysis used robust maximum likelihood estimation with full information and included all available data. Two-level models nested students at level 1 within schools at level 2 (n = 45) and controlled for baseline scores, conduct problems, social-emotional competence, free school meal eligibility, sex, and three school characteristics (percent free school meal eligibility, percent with English as an additional language, and size).
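The logic of using randomization as an instrumental variable can be illustrated with the simplest complier average causal effect estimator, the Wald ratio, which divides the ITT difference by the difference in the share of compliers between arms. The sketch below is only an illustration: the study itself used two-level models with robust maximum likelihood estimation, and the function name and all input numbers here are hypothetical.

```python
def wald_cace(mean_intervention, mean_control,
              complier_share_intervention, complier_share_control=0.0):
    """Wald (ratio) estimate of the complier average causal effect.

    Assumes randomization affects outcomes only through receipt of a
    high dosage (the exclusion restriction). By default, control
    schools are assumed to have no access to the program.
    """
    itt_effect = mean_intervention - mean_control
    compliance_gap = complier_share_intervention - complier_share_control
    if compliance_gap <= 0:
        raise ValueError("assignment must increase the share of compliers")
    return itt_effect / compliance_gap


# Hypothetical inputs, not values from the study: an ITT difference of
# 0.10 standard deviations and half of the intervention students in
# high-dosage classes.
example_cace = wald_cace(mean_intervention=0.10, mean_control=0.0,
                         complier_share_intervention=0.50)
print(f"Illustrative CACE: {example_cace:.2f}")
```

Because the ITT difference is divided by a compliance share below one, the complier estimate is larger in magnitude than the ITT estimate whenever some intervention students fall below the dosage threshold.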

The ITT results confirmed previous findings that the program significantly improved psychological well-being for the intervention group relative to the control group (d = .17) but did not improve peer social support or school connectedness. The non-ITT complier average causal effect estimates produced stronger results. Compliers did significantly better than others on all three outcomes of psychological well-being (d = .43), peer social support (d = .63), and school connectedness (d = .80). The results confirmed findings of the benefits of high-quality program implementation in Humphrey, Barlow et al. (2018) but did so with strong QED methods.

Study 17

The study evaluated a Croatian version of PATHS that consisted of 63 lessons, about two per week, delivered in the last half of first grade and during most of second grade.

Summary

Novak et al. (2017) used a cluster randomized controlled trial that assigned 30 Croatian schools (60 classrooms) and 600 first-grade students to intervention and control conditions. The assessment at the end of second grade included teacher-reported measures of prosocial behavior, emotional regulation, hyperactivity, and aggression.

Novak et al. (2017) found no significant main effects of the intervention for the full sample but found significant benefits for several outcomes among those classified as being low risk at baseline.

Evaluation Methodology

Design:

Recruitment: The study recruited 30 schools and two first grade classrooms within each school. Although all children within each of the 60 classrooms participated in PATHS or usual practice, only 10 children from each classroom were randomly selected for assessment. The parents of all 600 children gave consent.

Assignment: After matching pairs of schools within region on neighborhood characteristics, family socioeconomic status, percentage of children receiving free lunches, school size, class size, and average achievement scores, one school within each pair was randomly selected to receive the intervention, and the other school was assigned to continue its usual practice.

However, one intervention school (with 20 children) and one control classroom (with 10 children) had to be excluded because teachers failed to complete the initial assessment of children. In addition, one selected child in each of two other classrooms in the control condition had to be excluded because the teacher failed to complete the initial assessment. That left 568 of 600 students (95%) to participate.

Assessments/Attrition: The posttest came near the end of second grade, about 1.5 years after the baseline assessment. A total of 546 children completed the posttest, which was 91% of all eligible children and 96% of those with baseline data.
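The sample flow described above can be checked with simple arithmetic. The sketch below uses only the counts reported in this section; the variable names are illustrative.

```python
# Counts reported for the Croatian trial (Novak et al., 2017).
randomized_children = 600
lost_intervention_school = 20      # one school, no initial assessment
lost_control_classroom = 10        # one classroom, no initial assessment
lost_individual_children = 2       # one child in each of two classrooms
completed_posttest = 546


def percent(part, whole):
    """Return a share as a whole-number percentage."""
    return round(100 * part / whole)


with_baseline = randomized_children - (
    lost_intervention_school + lost_control_classroom + lost_individual_children
)
missing_posttest = with_baseline - completed_posttest

print(with_baseline, percent(with_baseline, randomized_children))
print(percent(completed_posttest, randomized_children))
print(percent(completed_posttest, with_baseline))
print(missing_posttest, percent(missing_posttest, with_baseline))
```

The results reproduce the reported figures: 568 children with baseline data (95% of those randomized), a posttest completion rate of 91% of eligible children and 96% of those with baseline data, and 22 children (4%) missing the posttest.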

Sample:

The Croatian sample had an average age at baseline of seven years and consisted of 47% girls. The study provided no other details on sociodemographic background.

Measures:

Teachers who delivered the program provided all ratings of children. The nine measures came from commonly used instruments:

Prosocial behavior (alpha = .88)
Emotion regulation (alpha = .89)
Learning behavior (alpha = .92)
Inattention (alpha = .94)
Hyperactivity (alpha = .95)
Oppositional behavior (alpha = .91)
Physical aggression (alpha = .93)
Peer problems (alpha = .65)
Withdrawn/depressed behavior (alpha = .81).

Analysis:

The analyses used two-level hierarchical linear models, with children nested within classrooms but without nesting within schools, the unit of assignment. The study reported examining three-level models, but the intraclass correlation coefficient for eight of nine outcomes was not significant (ranging from .02 to .06). For the other outcome, accounting for the variance at the school level did not change the intervention effect. The study therefore reported results from the two-level models. The models included the relevant baseline outcome and child sex as covariates.
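One rough way to gauge how much school-level clustering of this size could matter is the standard design effect, 1 + (m - 1) x ICC, where m is the number of children per cluster. The sketch below is not part of the study's analysis. It assumes equal cluster sizes and about 20 assessed children per school (two classrooms of 10), and applies the reported ICC range of .02 to .06.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_children, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_children / design_effect(cluster_size, icc)


children_per_school = 20   # assumption: 2 classrooms x 10 assessed children
children_with_baseline = 568

for icc in (0.02, 0.06):
    deff = design_effect(children_per_school, icc)
    n_eff = effective_sample_size(children_with_baseline,
                                  children_per_school, icc)
    print(f"ICC = {icc:.2f}: design effect = {deff:.2f}, "
          f"effective n = {n_eff:.0f}")
```

Under these assumptions the design effect ranges from about 1.4 to about 2.1, which indicates that even small, non-significant school-level ICCs can noticeably reduce the effective sample size.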

Intent-to-Treat: The full-information maximum likelihood estimation included the 22 children with baseline assessments but missing posttest data (4% of the sample). It was not possible to include randomized schools and children lost before baseline.

Outcomes

Implementation Fidelity:

Local coaches completed checklists of program implementation, which indicated that teachers delivered 90-95% of the PATHS curriculum.

Baseline Equivalence:

There were no statistically significant baseline differences between the intervention and control groups for the nine outcomes (Table 1), but the study did not test for baseline equivalence on any sociodemographic variables.

Differential Attrition:

Among the sample with baseline data, attrition was only 4%.

Posttest:

For the full sample, the intervention group did not differ significantly (p < .05) from the control group on any of the nine outcome measures. A high-risk subgroup also showed no significant intervention effects, but a low-risk subgroup showed significantly better outcomes for the intervention group on seven of the nine outcomes (d ranged from .22 to .38). The study did not test for the significance of the subgroup differences in intervention effects, however.

Long-Term:

Not examined.

Study 18

Summary

Hindley and Reed (1999) used a quasi-experimental design that non-randomly assigned four schools and three hearing-impaired units in northeast England to intervention and waitlisted control groups. The sample included 64 deaf children, and assessments over the following year measured social and emotional adjustment, reading attainment, and self-control.

Hindley and Reed (1999) found that the intervention children improved significantly more than the control children on

Self-image
Emotional adjustment
Ability to recognize, label and understand emotions.

Evaluation Methodology

Design:

Recruitment: The study included four schools for the deaf and three hearing-impaired units that were located in northeast England. A total of 64 severely to profoundly deaf students in school years 4 and 5 were involved in the study.

Assignment: Two schools and one unit were non-randomly assigned to receive PATHS (n = 33 students), and two schools and two units continued their usual practices as a waitlisted control group (n = 31 students).

Assessments/Attrition: Assessments came at baseline and three time points over the following year. The study reported that one hearing-impaired unit in the intervention group dropped out, leaving 55 of 64 students (86%) available for posttest.

Sample:

Boys made up about 43% of the sample.

Measures:

The eight outcomes included measures of emotional vocabulary, social and emotional adjustment, self-image, reading attainment, impulsivity, and self-concept. Teachers provided some of the measures, and no information on reliability and validity was presented.

Analysis:

The analysis used repeated-measures models with baseline controls but no adjustment for clustering within schools/units.

Intent-to-Treat: One unit dropped out, and it was unclear if the study attempted to gather or use information from the unit.

Outcomes

Implementation Fidelity:

The study reported positive reactions from teachers, parents, and students but provided no quantitative measures of implementation fidelity.

Baseline Equivalence:

Tests for baseline differences for the eight outcomes showed one significant difference.

Differential Attrition:

Not examined.

Posttest:

The repeated-measures analyses found that the intervention group improved significantly more than the control group for three of the eight outcomes: children's ability to recognize, label and understand emotions, children's self-image, and the teacher's assessment of the children's emotional adjustment.

Long-Term:

The study included results for an additional follow-up, but after the waitlisted control group had received the program.

Study 19

Summary

Ross, Sheard et al. (2011) and Ross, Cheung et al. (2011) used a cluster randomized controlled trial with 12 schools in Northern Ireland, six in the intervention group (n = 650 students), and six in the comparison group (n = 780 students). Assessments of socio-emotional attitudes and prosocial behavior came yearly throughout the three years of the study, with the last assessment serving as a posttest.

Ross, Sheard et al. (2011) and Ross, Cheung et al. (2011) found that, relative to the control group, the intervention group did significantly better on

Teacher-rated measures of empathy, perseverance, negative affect, and aggression
Child-assessed measures of emotional understanding, responses to challenging scenarios, and mutual understanding.

Evaluation Methodology

Design:

Recruitment: The study recruited 13 primary schools in three communities of Northern Ireland. The schools served populations of mostly working-class pupils and were in communities that had experienced religious conflict. According to Table 3 in Ross, Sheard et al. (2011), the initial sample included 1,430 students in grade levels P1 (ages 4-5), P2 (ages 5-6), P5 (ages 8-9), and P6 (ages 9-10).

Assignment: Six schools were randomly selected to implement the program (n = 650 students) and the others served as a comparison group. One of the comparison schools dropped out shortly after the random assignment and before pretesting, leaving six schools (n = 780 students). The comparison schools initially implemented a less structured and intensive socio-emotional education program that focused on accepting others and getting along.

However, the comparison schools implemented the Together 4 All (T4A) program in the fall of year 3, before the posttest in the spring of year 3. Ross, Sheard et al. (2011, p. 70) referred to the comparison group receiving the intervention as Cohort 2, and stated: "although continuing to be treated in the analyses as a comparison group, the teachers, pupils, and principals were being fully exposed to the T4A curriculum, activities, and expectations."

Assessments/Attrition: The study followed the students for three school years. The baseline assessment in November/December 2008 was followed by assessments in May/June 2009, October/November 2009, May/June 2010, and May/June 2011. The last assessment (called Sweep 5) represented a posttest for the ongoing program. Students in grades P1, P2, and P5 at baseline were in grades P3, P4, and P7 at posttest, but the P6 students at baseline were not available for assessment at posttest. Table 3 shows that at the last follow-up, the sample size fell to 981 (69%). Not counting the P6 students, the sample sizes fell to 91% for P1 and P2 students and 93% for P5 students. Also, some measures had substantial additional missing data.

Sample:

The median percentage of students receiving free school meals was 29%, and the median percentage of students with special education needs was 19%. No other sociodemographic information on the students was provided.

Measures:

The study examined a large number of measures, but few related to child behavioral outcomes. The measures came from three sources: 1) teachers who delivered the program, 2) child assessments done by researchers, and 3) observations of teacher and child behaviors by researchers. The study did not report on reliability or validity, although some of the measures were created by the researchers. It also did not report on the awareness of condition by researchers.

First, the teacher assessments of the children used the Strengths and Difficulties Questionnaire. The measures focused on child behaviors but were not independently rated.

Second, the child assessments used the Assessment of Children's Emotions Scale, the Challenging Situations Task, and the Mutual Respect and Understanding Survey to measure pupil skills in social problem solving and recognizing emotions. These measures reflected student attitudes and responses to hypothetical situations rather than behavior.

Third, the observations of teachers and children used the Classroom Observation of Behaviour and Playground Observation of Behaviour instruments. Inter-rater reliabilities for the measures were adequate. The study made no mention of the observers being masked to condition.

Analysis:

Ross, Sheard et al. (2011) presented chi-square tests and differences-of-means tests. The latter tests provided the more direct contrast of the outcomes for the intervention and control groups. Page 42 states that the results controlled for baseline scores. The tests did not adjust for clustering within schools, the unit of assignment. Ross, Cheung et al. (2011) used chi-square and Mann-Whitney tests, but the chi-square tests appear to include a category for not available. They also examined hierarchical models but said that the power was too low with only 12 schools.

Intent-to-Treat: The study did not attempt to follow the P7 students after they left middle school but otherwise used all available data.

Outcomes

Implementation Fidelity:

The authors summarized a detailed implementation analysis by stating that, by the third year, implementation was fairly strong in all six schools and had significantly improved from the first year.

Baseline Equivalence:

Ross, Sheard et al. (2011, p. 3) stated that "None of the demographic variables was found to differ statistically significantly across treatment groups." In regard to the outcome measures, they found several differences: "At baseline, comparison pupils were rated significantly more positively than T4A pupils on 19% of the items and were directionally higher on 69%." For the baseline differences reported in Tables 14, 16, 18, 20, 25, and 28, a count shows 19 significant differences in 79 tests.

Differential Attrition:

For three of the grades included in the posttest assessment, attrition was modest, only 7-9%, but the study provided no analysis of differential attrition.

Posttest:

The two articles presented numerous tables and tests. Ross, Cheung et al. (2011) examined results only at the interim, year one assessment. The authors said the "programme effects on social-emotional learning were weak and inconsistent."

Ross, Sheard et al. (2011) analyzed results for a longer follow-up period (i.e., the posttest at Sweep 5). The results from their difference-in-means t-tests can be summarized as follows:

Teacher observations (Table 8): No significant differences in 11 tests.
Student observations (Table 10): Two significant differences in nine tests but both were iatrogenic.
Playground observations (Table 12): No significant differences in 11 tests.
Teacher ratings of children in grades P3-P4 (Table 15): Three significant differences in five tests for empathy, coping, and cooperation (d = .24), more actively helping others (d = .20), and being less socially withdrawn (d = .17).
Teacher ratings of children in grade P7 (Table 17): Four significant differences in four tests for empathy and cooperation (d = .58), reflectivity and perseverance (d = .43), negative affect (d = .53), and fighting and aggression (d = .41).
Child assessment of emotions in grades P3-P4 (Table 18): One significant difference in two tests for correct emotional responses (d = .17), but one significant iatrogenic difference for giving more incorrect anger responses (d = .17).
Child managing emotions in hypothetical challenging scenarios in grades P3-P4 (Table 20): No significant differences in eight tests.
Child coping strategies in hypothetical scenarios in grades P3-P4 (Table 22): Two significant differences in three tests, but both were iatrogenic.
Child attitudes of mutual respect and understanding in grades P3-P4 (Table 24): Two significant differences in seven tests for understanding differences between people (d = .24) and total score (d = .21).
Child managing emotions in hypothetical scenarios in grade P7 (Table 25): Four significant differences in 16 tests for hit by ball (d = .33), invitation turned down (d = .37), lunchroom rejection (d = .29), and greeting ignored (d = .34).
Child naming of feelings in grade P7 (Table 28): Two significant differences in three tests for total naming and positive naming.
Child coping strategies in grade P7 (Table 29): No significant differences in three tests.
Child attitudes of mutual respect and understanding in grade P7 (Table 31): Three significant differences in 10 tests for different opinions (d = .20), not putting others down (d = .29), and happily sharing (d = .28).

Overall, in 92 tests, there were 19 significant outcomes favoring the intervention group and five outcomes favoring the control group. The authors summarized the numerous findings by noting strong intervention effects for teacher ratings of prosocial behavior and for child responses in hypothetical situations (e.g., identifying emotions, demonstrating skills in hypothetical situations, and identifying feelings).
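The overall counts can be reproduced by tallying the table-by-table results listed above. The sketch below encodes each table as reported in this summary; the data structure and names are illustrative.

```python
# (table, number of tests, significant favoring intervention,
#  significant iatrogenic), as listed in the posttest summary.
posttest_results = [
    ("Table 8", 11, 0, 0),
    ("Table 10", 9, 0, 2),
    ("Table 12", 11, 0, 0),
    ("Table 15", 5, 3, 0),
    ("Table 17", 4, 4, 0),
    ("Table 18", 2, 1, 1),
    ("Table 20", 8, 0, 0),
    ("Table 22", 3, 0, 2),
    ("Table 24", 7, 2, 0),
    ("Table 25", 16, 4, 0),
    ("Table 28", 3, 2, 0),
    ("Table 29", 3, 0, 0),
    ("Table 31", 10, 3, 0),
]

total_tests = sum(tests for _, tests, _, _ in posttest_results)
total_beneficial = sum(good for _, _, good, _ in posttest_results)
total_iatrogenic = sum(bad for _, _, _, bad in posttest_results)

print(total_tests, total_beneficial, total_iatrogenic)
```

The tally matches the totals stated above: 92 tests, 19 significant differences favoring the intervention group, and five favoring the control group.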

Long-Term:

Not examined.