Success for All

A school-wide reform initiative in which instructional processes, curriculum enhancements, and improved support resources for families and staff work together to ensure that every student acquires adequate basic language skills in pre-K through 2nd grade and builds on these basic skills throughout the rest of elementary school.

Fact Sheet

Program Outcomes

  • Academic Performance
  • Preschool Communication/Language Development

Program Type

  • Academic Services
  • Mentoring - Tutoring
  • School - Environmental Strategies
  • School - Individual Strategies
  • Teacher Training

Program Setting

  • School

Continuum of Intervention

  • Universal Prevention

Age

  • Late Childhood (5-11) - K/Elementary

Gender

  • Both

Race/Ethnicity

  • All

Endorsements

Blueprints: Promising
Crime Solutions: Effective
OJJDP Model Programs: Effective
Social Programs that Work: Top Tier
What Works Clearinghouse: Meets Standards Without Reservations - Positive Effect

Program Information Contact

Success for All Foundation
200 W. Towsontown Blvd.
Baltimore, MD 21204
800-548-4998, ext. 2372
sfainfo@successforall.org
www.successforall.org

Program Developer/Owner

Bob Slavin and Nancy Madden
Success for All Foundation


Brief Description of the Program

Success for All (SFA) is primarily a literacy program, but it is also a schoolwide reform initiative in which specific instructional processes, curriculum enhancements, and improved support resources for families and staff come together to ensure that every student acquires adequate basic language skills in pre-K through 2nd grade and builds on these basic skills throughout the rest of elementary school. As a result, the need for remediation and grade retention should decline drastically. The program has two major components: (a) a student-level intervention, which includes instruction based on the SFA philosophy and curriculum; and (b) a school-level intervention, which involves establishing a schoolwide "solutions" team (i.e., a team that addresses classroom management issues, seeks to increase parents' participation, mobilizes integrated services to help families, and identifies particular problems such as homelessness), hiring a full-time program facilitator, and undertaking training and ongoing professional development for staff. Because of the comprehensive approach to reform, the significant and ongoing professional development across multiple years, and the focus on faculty support and buy-in from the outset, a vote of at least 80% of teachers in favor of program adoption is required.

Description of the Program

Success for All (SFA) is more than just an elementary school literacy program. It is a schoolwide reform initiative in which specific instructional processes, curriculum enhancements, and improved support resources come together to ensure that every student acquires adequate basic language skills in pre-K through 2nd grade and builds on these basic skills throughout the rest of elementary school. As a result, the need for remediation and grade retention should decline drastically. The SFA program has two primary levels of intervention: (a) student-level interventions and (b) school-level interventions. Note that even student-level instruction is implemented schoolwide.

Student-level interventions

  • Instructional processes: Instruction focuses on cooperative learning and teaches metacognitive strategies. The cycle of instruction includes direct instruction, guided peer practice, assessment, and feedback to students on their progress. Students are placed in skill-level reading groups, which may cut across grade levels.
  • Curriculum: The curriculum is research-based reading, writing, and language arts in all grades. The kindergarten curriculum is a full-day program in which children learn language and literacy, math, science, and social studies through sixteen two-week thematic units. The reading component in K-1 contains systematic phonemic awareness and phonics programs. Key to this curriculum is the use of mnemonic picture cards and embedded video clips that support phonics and vocabulary development. In grades 2-6, students use novels and basal readers but not workbooks. The curriculum emphasizes cooperative learning and partner reading activities; comprehension strategies, such as summarization and clarification, built around narrative and expository texts; writing; and direct instruction. Students are required to read books of their own choice for 20 minutes at home each evening.
  • Tutors: In grades 1-3, specially trained certified teachers and paraprofessionals work one-on-one with any students who are failing to keep up with classmates in reading. Tutoring takes place 20 minutes per day during times other than reading periods.
  • Quarterly assessment and regroupings: Students in grades 1-6 are assessed every quarter to determine whether they are making adequate progress in reading. Assessment information is also used to suggest alternate teaching strategies, changes in reading group placement, or provision of tutoring services.

School-level interventions

  • Solutions team: This team works in each school to help support staff and families in ensuring success of the children. For example, the Team addresses classroom management issues, seeks to increase parents' participation, organizes and integrates services to help families, and identifies particular problems such as homelessness. The team is composed of school staff, parent liaisons, social workers, counselors, and/or assistant principals.
  • Facilitators: An on-site SFA program facilitator (a) works with teachers and staff to implement the reading program; (b) manages the quarterly assessments; (c) assists the solutions team; (d) ensures adequate communication between staff members; and (e) makes certain each child is making adequate progress.
  • Training and professional development: The staff receives three days of intensive training at the beginning of the first year of implementation. During the first year, SFA program staff typically provides 16 more days of on-site support. After the first year, approximately 15 days of additional training by SFA program staff are provided each year.

Outcomes

Primary Evidence Base for Certification

Study 1

Borman et al. (2007, 2005) found that the intervention schools relative to the control schools had significantly

  • Higher scores on two of the four reading subdomains after two years.
  • Higher scores on three of the four reading subdomains after three years.

Brief Evaluation Methodology

Primary Evidence Base for Certification

Of the 11 studies Blueprints has reviewed, one (Study 1) meets Blueprints evidentiary standards (specificity, evaluation quality, impact, dissemination readiness). The study was done by the developer.

Study 1

Borman et al. (2007) conducted a clustered randomized trial with a sample of 35 high-poverty elementary schools (grades K-5) across 11 states with over 15,000 students. After randomly assigning schools to the intervention and control groups, assessments of literacy outcomes were completed at the beginning and end of the first school year and then at the end of the second and third years.

Blueprints Certified Studies

Study 1

Borman, G., Slavin, R., Cheung, A., Chamberlain, A., Madden, N., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44(3), 701-731.


Risk and Protective Factors

Risk Factors

Family: Neglectful parenting

School: Poor academic performance, Repeated a grade

Protective Factors

Family: Parental involvement in education

School: Instructional Practice


* Risk/Protective Factor was significantly impacted by the program

See also: Success for All Logic Model (PDF)

Subgroup Analysis Details

Sample demographics including race, ethnicity, and gender for Blueprints-certified studies:

The sample in Study 1 (Borman et al., 2007, 2005) was 56% African American and 10% Hispanic.

Training and Technical Assistance

Year 1 - Beginning Implementation:

Introductory Workshops

The principal, Success for All facilitator, and Solutions coordinator attend a five-day New Leaders Conference in Baltimore, Maryland. Participants gain an understanding of the schoolwide structures, including data-based goal-setting, progress monitoring tools, and instructional processes, that form the SFA approach.

The Success for All Point Coach conducts a Leading for Success planning kickoff meeting with the school leadership team. This meeting is at the school site, in preparation for the program introduction workshops for the full staff.

Program introduction workshops at schools involving all staff members will present the schoolwide structures and instructional processes with an emphasis on preparing teachers to use the Success for All instructional tools and classroom materials. After a one-day whole-school overview, teachers meet in break-out groups, each guided by an SFA coach, for two days of introductions to KinderCorner, Reading Roots, and Reading Wings, as appropriate to each teacher's role.

Staff responsible for increasing school attendance, enhancing parent involvement, managing student interventions, and creating community engagement are provided with three days of workshops over the course of the year to develop the planning and intervention teams in those areas.

Ongoing Coaching

Success for All coaches visit schools throughout the year to provide coaching on all aspects of SFA implementation. During onsite visits, usually about 16 days in the first year, coaches review progress against previously set goals and previously selected progress metrics; observe classrooms; hold discussions with teachers; review student progress data with teachers and school leaders; review implementation self-assessments; plan for achievement growth; and meet with school staff responsible for schoolwide prevention and intervention initiatives (such as attendance, parent involvement, and student referrals). Between visits, coaches are also available by telephone and e-mail to check on progress, answer questions, and solve problems. Coaches develop a strong relationship with the principal and facilitator, who guide the day-to-day implementation.

Year 2 and beyond:

After the initial year, all school staff participate in one to three days of workshops focused on whole school and classroom implementation of Success for All that are based on identified school needs at the beginning of each year. The Snapshot, an implementation quality assessment completed by the coach and school together, guides the selection of workshops and coaching support services. Onsite visits and telephone/email consultation continue, gradually decreasing as schools build capacity. In Year 2, schools average 12 days of coaching support. In Year 3, schools average 10 days. Schools in Year 4 and beyond usually receive between three and six days.

Training Certification Process

Training of SFA Coaches

Coaches who work with schools to help them implement Success for All receive extensive training and mentoring themselves. This starts with a week-long New Coaches Institute in Baltimore. Coaches are then assigned to area teams (groups of coaches serving a given region) and to mentors, who help them develop skills in initial training, ongoing coaching, telephone consultation, data management, and other essentials. During at least their first year, new coaches work only jointly with their mentor. Once they demonstrate a series of increasingly sophisticated skills, they are certified to work as fully qualified members of their regional teams.

After initial certification, SFA continues to provide professional development to help coaches develop specific component skills, and coaches are expected to keep building their skills over many years by participating in annual institutes for experienced coaches.

Benefits and Costs

Program Benefits (per individual): $8,863
Program Costs (per individual): $723
Net Present Value (Benefits minus Costs, per individual): $8,140
Measured Risk (odds of a positive Net Present Value): 66%

Source: Washington State Institute for Public Policy
All benefit-cost ratios are the most recent estimates published by The Washington State Institute for Public Policy for Blueprint programs implemented in Washington State. These ratios are based on a) meta-analysis estimates of effect size and b) monetized benefits and calculated costs for programs as delivered in the State of Washington. Caution is recommended in applying these estimates of the benefit-cost ratio to any other state or local area. They are provided as an illustration of the benefit-cost ratio found in one specific state. When feasible, local costs and monetized benefits should be used to calculate expected local benefit-cost ratios. The formula for this calculation can be found on the WSIPP website.
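The net present value figure above is simple arithmetic on the two per-individual estimates. A minimal sketch (variable names are illustrative, not from WSIPP):

```python
# Per-individual estimates from the WSIPP summary above (dollars)
benefits = 8863
costs = 723

net_present_value = benefits - costs   # benefits minus costs
benefit_cost_ratio = benefits / costs  # dollars returned per dollar spent

print(net_present_value)               # 8140
print(round(benefit_cost_ratio, 2))    # 12.26
```

The ratio itself is not quoted in the summary above; it is shown only to illustrate how a WSIPP-style benefit-cost ratio is derived from the same two figures.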

Program Costs

Start-Up Costs

Initial Training and Technical Assistance

Initial training and technical assistance for 20 teachers, plus administrators and support staff, is estimated at $24,750. This would support Success for All coaches on site for one day of planning with the leadership team, three additional days of workshops for the full staff with three trainers, and a one-day workshop with tutors. Registration for 4.5 days of offsite workshops is also included. Ongoing onsite coaching support (14 days) and offsite technical assistance during Year 1 is estimated at $29,400. Total professional development in Year 1 is $54,150.

Curriculum and Materials

$45,994 to cover classroom teacher and student materials, including teacher guides, lesson support videos, interactive whiteboard tools, student materials, books, and manipulatives, as well as shipping of materials.

Licensing

$700 for online data management tools supporting Success for All and for online professional development tutorials and resources.

Other Start-Up Costs

1) Coverage of travel expenses for principal and program facilitator to offsite conference.

2) If the school chooses to use trade books rather than basal texts, purchase of trade books will be necessary. Approximate cost is $30,000 for the purchase of about 5,500 books.

Intervention Implementation Costs

Ongoing Curriculum and Materials

Replacement of materials estimated at $10,000 per year, including replacement books for kindergarten and first grade students, as programs are encouraged to allow these students to keep reading materials.

Staffing

Qualifications: Program is generally delivered by certified classroom teachers. In addition to the teachers, a full-time Program Facilitator is required to coordinate and support effective implementation of the program. Schools usually fill this position with existing staff.

Ratios: The program does not indicate minimum ratios, but it is generally delivered in classrooms, where ratios range from 20 to 30 students per teacher.

Time Required to Deliver Interventions: In addition to teaching time, teachers must have designated time to work in teams focused on continuous improvement of instruction. Teacher teams meet biweekly.

Other Implementation Costs

Some of the student materials must be reproduced; schools can photocopy these materials, or SFAF can provide the materials for an additional charge.

Implementation Support and Fidelity Monitoring Costs

Ongoing Training and Technical Assistance

Ongoing training & technical assistance includes refresher training, coaching, support, and professional development conferences. Costs are estimated at:

  • $24,950 year 2
  • $24,950 year 3

Coaching costs include onsite coaching once per month in the first year and every 6-8 weeks in years two and three. These costs include registration for the principal and facilitator at an annual conference.

Fidelity Monitoring and Evaluation

Fidelity monitoring and evaluation of quality of implementation and student outcomes are conducted by the Success for All coaches when onsite for coaching support and on a daily basis by the school's program facilitator. There is no additional cost.

Ongoing License Fees

$700 per year includes continued access to online data management tools, and telephone support for teachers, IT personnel, and administrators. Online resources include tutorials and webinars on a variety of classroom support and data management topics.

Other Implementation Support and Fidelity Monitoring Costs

No information is available

Other Cost Considerations

The per student cost diminishes as school size increases, and declines over time after the initial intensive training and coaching period is complete.

Year One Cost Example

Success for All Foundation offers an implementation example with 20 teachers, 7 tutors, and 500 students in grades K-5. In its first year, a school can anticipate the following costs:

Training with travel, ongoing support (Year 1) $54,150.00
Classroom materials $44,994.00
Online Data management and resources $700.00
Staffing: Program Facilitator, teaching time-in kind
Total One Year Cost $99,844.00

With 500 students served in the first year, the cost per student would be $198. Per-student costs in years 2 and 3 are estimated to be $80 per student per year, averaging $120 per student per year over three years.
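The arithmetic behind this example can be checked directly. A minimal sketch using the line items above (the year 2 and year 3 per-student figures are the fact sheet's stated estimates):

```python
# Year-one line items from the cost example above (dollars)
training_and_support = 54150.00
classroom_materials = 44994.00
online_tools = 700.00

total_year_one = training_and_support + classroom_materials + online_tools
print(total_year_one)  # 99844.0

# Three-year per-student average using the fact sheet's stated per-year figures
per_student_by_year = [198, 80, 80]
average = sum(per_student_by_year) / len(per_student_by_year)
print(round(average, 2))  # 119.33, consistent with the ~$120 figure quoted above
```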

Funding Strategies

Funding Overview

Success for All is a whole school improvement approach with a strong focus on literacy. Thus, it is generally implemented in place of other curricula and school improvement approaches and can be supported with the full range of federal, state, and local funds that support core K-12 education.

Funding Strategies

Improving the Use of Existing Public Funds

Schools that implement Success for All will likely choose to shift funds spent on another curriculum or professional development program to this evidence-based program, as well as allocate teacher time to implement the program. Program facilitators are almost always reallocated from other Title I-supported roles. In addition, Success for All has been successful in reducing grade retention and special education assessments and placements, leading to cost savings that can be invested in ongoing support and expansion of the program.

Allocating State or Local General Funds

State education funds allocated to local school systems as well as locally-appropriated public school funding can support Success for All, particularly during regular reviews of curricula within the district. Professional development funds can also be used for teacher training.

Maximizing Federal Funds

Formula Funds: Title I is the funding stream most typically used to cover the costs of training and coaching support, classroom materials, program facilitators, and tutors.

Discretionary Grants: Federal discretionary grants from the U.S. Department of Education can be used to fund the initial training, ongoing coaching, technical assistance, and classroom materials. The Success for All Foundation was the recipient of a federal Investing in Innovation grant in 2010 and grant funds are enabling them to significantly reduce the initial start-up costs and build local coaching capacity in high need districts throughout the country (see http://www.successforall.org for more information).

Foundation Grants and Public-Private Partnerships

Foundations, especially those with a stated interest in improving educational achievement, can provide funding for initial training, coaching, technical assistance, classroom materials, and books.

Generating New Revenue

While purchase of classroom materials is usually viewed as a school system responsibility, fundraising can also be considered, especially when the school has many competing needs and priorities. Parent Teacher Associations and business and local civic associations can potentially serve as sponsors of fundraising campaigns.

Data Sources

All information comes from the responses to a questionnaire submitted by the purveyor, the Success for All Foundation, to the Annie E. Casey Foundation.

Evaluation Abstract

Program Developer/Owner

Bob Slavin and Nancy Madden
Success for All Foundation
200 W. Towsontown Blvd.
Baltimore, MD 21204
(410) 616-2310
(410) 324-4440
rslavin@SuccessForAll.org
www.successforall.org

Program Outcomes

  • Academic Performance
  • Preschool Communication/Language Development

Program Specifics

Program Type

  • Academic Services
  • Mentoring - Tutoring
  • School - Environmental Strategies
  • School - Individual Strategies
  • Teacher Training

Program Setting

  • School

Continuum of Intervention

  • Universal Prevention

Program Goals

A school-wide reform initiative in which instructional processes, curriculum enhancements, and improved support resources for families and staff work together to ensure that every student acquires adequate basic language skills in pre-K through 2nd grade and builds on these basic skills throughout the rest of elementary school.

Population Demographics

Elementary school children, K through 5.

Target Population

Age

  • Late Childhood (5-11) - K/Elementary

Gender

  • Both

Race/Ethnicity

  • All

Subgroup Analysis Details

Sample demographics including race, ethnicity, and gender for Blueprints-certified studies:

The sample in Study 1 (Borman et al., 2007, 2005) was 56% African American and 10% Hispanic.

Risk/Protective Factor Domain

  • School
  • Family

Risk/Protective Factors

Risk Factors

Family: Neglectful parenting

School: Poor academic performance, Repeated a grade

Protective Factors

Family: Parental involvement in education

School: Instructional Practice


*Risk/Protective Factor was significantly impacted by the program

Brief Description of the Program

Success for All (SFA) is primarily a literacy program, but it is also a schoolwide reform initiative in which specific instructional processes, curriculum enhancements, and improved support resources for families and staff come together to ensure that every student acquires adequate basic language skills in pre-K through 2nd grade and builds on these basic skills throughout the rest of elementary school. As a result, the need for remediation and grade retention should decline drastically. The program has two major components: (a) a student-level intervention, which includes instruction based on the SFA philosophy and curriculum; and (b) a school-level intervention, which involves establishing a schoolwide "solutions" team (i.e., a team that addresses classroom management issues, seeks to increase parents' participation, mobilizes integrated services to help families, and identifies particular problems such as homelessness), hiring a full-time program facilitator, and undertaking training and ongoing professional development for staff. Because of the comprehensive approach to reform, the significant and ongoing professional development across multiple years, and the focus on faculty support and buy-in from the outset, a vote of at least 80% of teachers in favor of program adoption is required.

Description of the Program

Success for All (SFA) is more than just an elementary school literacy program. It is a schoolwide reform initiative in which specific instructional processes, curriculum enhancements, and improved support resources come together to ensure that every student acquires adequate basic language skills in pre-K through 2nd grade and builds on these basic skills throughout the rest of elementary school. As a result, the need for remediation and grade retention should decline drastically. The SFA program has two primary levels of intervention: (a) student-level interventions and (b) school-level interventions. Note that even student-level instruction is implemented schoolwide.

Student-level interventions

  • Instructional processes: Instruction focuses on cooperative learning and teaches metacognitive strategies. The cycle of instruction includes direct instruction, guided peer practice, assessment, and feedback to students on their progress. Students are placed in skill-level reading groups, which may cut across grade levels.
  • Curriculum: The curriculum is research-based reading, writing, and language arts in all grades. The kindergarten curriculum is a full-day program in which children learn language and literacy, math, science, and social studies through sixteen two-week thematic units. The reading component in K-1 contains systematic phonemic awareness and phonics programs. Key to this curriculum is the use of mnemonic picture cards and embedded video clips that support phonics and vocabulary development. In grades 2-6, students use novels and basal readers but not workbooks. The curriculum emphasizes cooperative learning and partner reading activities; comprehension strategies, such as summarization and clarification, built around narrative and expository texts; writing; and direct instruction. Students are required to read books of their own choice for 20 minutes at home each evening.
  • Tutors: In grades 1-3, specially trained certified teachers and paraprofessionals work one-on-one with any students who are failing to keep up with classmates in reading. Tutoring takes place 20 minutes per day during times other than reading periods.
  • Quarterly assessment and regroupings: Students in grades 1-6 are assessed every quarter to determine whether they are making adequate progress in reading. Assessment information is also used to suggest alternate teaching strategies, changes in reading group placement, or provision of tutoring services.

School-level interventions

  • Solutions team: This team works in each school to help support staff and families in ensuring success of the children. For example, the Team addresses classroom management issues, seeks to increase parents' participation, organizes and integrates services to help families, and identifies particular problems such as homelessness. The team is composed of school staff, parent liaisons, social workers, counselors, and/or assistant principals.
  • Facilitators: An on-site SFA program facilitator (a) works with teachers and staff to implement the reading program; (b) manages the quarterly assessments; (c) assists the solutions team; (d) ensures adequate communication between staff members; and (e) makes certain each child is making adequate progress.
  • Training and professional development: The staff receives three days of intensive training at the beginning of the first year of implementation. During the first year, SFA program staff typically provides 16 more days of on-site support. After the first year, approximately 15 days of additional training by SFA program staff are provided each year.

Theoretical Rationale

The theoretical rationale for Success for All (SFA) exists on two levels -- theories of the importance of individual early literacy and theories of whole-school reform.

The SFA program has a core and fundamental focus on early student literacy. SFA's "defining characteristic" is the specific sequencing of literacy instruction across the grades. The K-1 curriculum emphasizes the development of language skills and launches students into reading phonetically regular storybooks. The theory is supported by empirical evidence suggesting that phonemic awareness is the best single predictor of future reading ability.

Some external school reform models have been criticized because their prescriptive designs may suppress teacher creativity and also require an inordinate amount of teacher prep time. However, if the reform model is clearly defined, developed with a mind toward greater fidelity, and has strong professional development and training components, these problems may be mitigated. Success for All has addressed each of these issues and is expected to have earlier and more sustained effects than models without such components.

Theoretical Orientation

  • Skill Oriented

Brief Evaluation Methodology

Primary Evidence Base for Certification

Of the 11 studies Blueprints has reviewed, one (Study 1) meets Blueprints evidentiary standards (specificity, evaluation quality, impact, dissemination readiness). The study was done by the developer.

Study 1

Borman et al. (2007) conducted a clustered randomized trial with a sample of 35 high-poverty elementary schools (grades K-5) across 11 states with over 15,000 students. After randomly assigning schools to the intervention and control groups, assessments of literacy outcomes were completed at the beginning and end of the first school year and then at the end of the second and third years.

Outcomes (Brief, over all studies)

Primary Evidence Base for Certification

Study 1

Borman et al. (2007, 2005) found that the intervention schools relative to the control schools had significantly higher scores on two of the four reading subdomains after two years and significantly higher literacy scores on three of four reading subdomains after three years.

Outcomes

Primary Evidence Base for Certification

Study 1

Borman et al. (2007, 2005) found that the intervention schools relative to the control schools had significantly

  • Higher scores on two of the four reading subdomains after two years.
  • Higher scores on three of the four reading subdomains after three years.

Effect Size

Study 1 (Borman et al., 2007, 2005) found weak to moderate effect sizes. For the longitudinal sample compared to the control sample, Cohen's d was .33 for Word Attack, .22 for Word Identification, and .21 for Passage Comprehension. The combined sample showed slightly higher effect sizes: Cohen's d was .36 for Word Attack, .24 for Word Identification, and .21 for Passage Comprehension.

Generalizability

One study meets Blueprints standards for high-quality methods with strong evidence of program impact (i.e., "certified" by Blueprints): Study 1 (Borman et al., 2007, 2005). The sample for the study included elementary school students.

Study 1 took place across 11 states and compared the treatment group schools to business-as-usual schools.

Potential Limitations

Additional Studies (not certified by Blueprints)

Study 2 (Correnti, 2009)

  • No tests for baseline equivalence
  • No tests for differential attrition
  • Missing details on sample characteristics and design

Correnti, R. (2009, March). Examining CSR program effects on student achievement: Causal explanation through examination of implementation rates and student mobility. Paper presented at the annual meetings of the Society for Research on Educational Effectiveness, Crystal City, VA.

Study 3 (Borman & Hewes, 2002; Madden et al., 1993)

  • Only students who were stable in their enrollment were studied and no analysis of differential attrition was provided.
  • Due to lack of randomization, the study results may be due to school differences rather than the program.
  • The authors do not provide enough data on the control schools to ensure baseline equivalence.
  • Control school retention and attendance data was not available to compare with the treatment schools.
  • With only five matched schools, it is difficult to ensure that all relevant school characteristics are the same.
  • The results on retention are not relevant because not retaining students is a component of the SFA program.
  • Schools self-selected into the program.
  • Matching occurred at both school and individual level, but the analysis was done only at the individual level.
  • The long-term follow-up had attrition rates approaching 50%.

Borman, G., & Hewes, G. (2002). The long-term effects and cost-effectiveness of Success for All. Educational Evaluation and Policy Analysis, 24(4), 243-266.

Madden, N., Slavin, R., Karweit, N., Dolan, L., & Wasik, B. (1993). Success for All: Longitudinal effects of a restructuring program for inner-city elementary schools. American Educational Research Journal, 30(1), 123-148.

Study 4 (Nunnery et al., 1997)

  • The results were not comprehensive, which suggests that some null or negative results may have been excluded.
  • Basic results of the effect of implementation on outcomes (without interactions) were not presented.
  • The lack of a Cohort 1 for the Spanish program due to late implementation may violate the intent-to-treat principle.
  • No pre-test data were available for Cohort 2 or the Spanish program, which suggests that results could be attributed to pre-existing differences in school achievement, especially given the study's lack of clarity around how the comparison schools were selected.
  • The schools self-selected into SFA and the comparison schools explicitly did not select SFA, which suggests the strong possibility of selection bias.
  • Although matched at the school level, the analysis was done at the individual level.

Nunnery, J., Slavin, R., Madden, N., Ross, S., Smith, L. J., Hunter, P., & Stubbs, J. (1997). Effects of full and partial implementation of Success for All on student reading achievement in English and Spanish. Paper presented at the meeting of the American Educational Research Association, Chicago, IL.

Study 5 (Munoz & Dossett, 2004)

  • The authors do not justify low survey response rates for teachers who presumably were encouraged to take the survey, or students, who presumably were required to take the survey.
  • The authors report no systematic analysis of non-response bias in the survey results, especially among parents and students.
  • For the first three research questions, the authors do not report significance levels.
  • The authors rely on fidelity of implementation to justify different outcomes by school, but do not measure fidelity in the study.
  • The SFA schools self-selected into the program, which may introduce selection bias.
  • Although matched at the school level, the analysis was done at the individual level.

Munoz, M. A., & Dossett, D. H. (2004). Educating students placed at risk: Evaluating the impact of Success for All in urban settings. Journal of Education for Students Placed at Risk, 9(3), 261-277.

Study 6 (Jones et al., 1997)

  • The program produced evidence of iatrogenic effects on math achievement.
  • Possibly because of the lack of SFA approval by the staff or because of Hurricane Hugo, fidelity was extremely weak, so it is difficult to determine whether the results (or lack thereof) are indicative of how a well-implemented SFA program might perform.
  • The matching of schools on demographics and history of performance may not be strong enough to allow researchers to conclude that differences in outcomes are due to SFA.
  • Although matched at the school level, the analysis was done at the individual level.

Jones, E., Gottfredson, G., & Gottfredson, D. (1997). Success for some: An evaluation of a Success for All program. Evaluation Review, 21(6), 643-670.

Study 7 (Miller et al., 2017)

  • Tests for baseline equivalence covered only five measures, one of which showed a difference
  • Differences in attrition across conditions and incomplete tests for differential attrition
  • No significant effects for full sample

Miller, S., Biggart, A., Sloan, S., & O'Hare, L. (2017). Success for All: Evaluation Report and Executive Summary. Millbank, UK: Education Endowment Foundation. (ERIC Document Reproduction Service No. ED581417)

Study 8 (Livingston & Flaherty, 1997; Slavin & Madden, 1998)

  • No tests of statistical significance were presented.
  • The sample sizes for the Spanish ESL students were so small that the results are extremely difficult to interpret.
  • The effectiveness of SFA declined over time and may have actually been non-significant by grade 3.
  • The authors did not report an analysis of differential attrition.

Livingston, M., & Flaherty, J. (1997). Effects of Success for All on reading achievement in California schools. San Francisco, CA: WestEd.

Slavin, R. E., & Madden, N. A. (1998). Success for All/Exito Para Todos: Effects on the reading achievement of students acquiring English. Report No. 19. Baltimore, MD: Center for Research on the Education of Students Placed at Risk.

Study 9 (Chambers et al., 2005)

  • Sample size is low (n=10).
  • This study does not address how multimedia may impact SFA students on important literacy outcomes other than phonetics.
  • No long-term follow-up.
  • No differential attrition analysis.

Chambers, B., Cheung, A., Madden, N., Slavin, R., & Gifford, R. (2005). Achievement effects of embedded multimedia in a Success for All reading program. (Technical Report). Center for Research and Reform in Education, Johns Hopkins University.

Study 10 (Quint et al., 2013, 2014, 2015)

  • Iatrogenic effects were observed for special education students on 3 of 4 outcomes

Quint, J. C., Balu, R., DeLaurentis, M., Rappaport, S., Smith, T. J., & Zhu, P. (2013). The Success For All model of school reform: Early findings from the Investing in Innovation (i3) scale-up. New York: MDRC.

Quint, J. C., Balu, R., DeLaurentis, M., Rappaport, S., Smith, T. J., & Zhu, P. (2014). The Success For All model of school reform: Interim findings from the Investing in Innovation (i3) scale-up. New York: MDRC.

Quint, J. C., Zhu, P., Balu, R., Rappaport, S., & DeLaurentis, M. (2015). Scaling up the Success for All model of school reform. New York: MDRC.

Study 11 (Tracey et al., 2014)

  • Used a matched QED design that may be biased by self-selection of schools into the intervention group.
  • Lack of information on attrition of baseline student sample.
  • Intent to treat unclear given the lack of information on students (though all available schools were used).
  • Groups differed at baseline on two measures.
  • Some evidence of differential attrition.

Tracey, L., Chambers, B., Slavin, R. E., Hanley, P., & Cheung, A. (2014). Success for All in England: Results from the third year of a national evaluation. SAGE Open, 4, 1-10.


References

Study 1

Borman, G. D., Slavin, R. E., Cheung, A. C., Chamberlain, A. M., Madden, N. A., & Chambers, B. (2005). The national randomized field trial of Success for All: Second-year outcomes. American Educational Research Journal, 42(4), 673-696.

Certified: Borman, G., Slavin, R., Cheung, A., Chamberlain, A., Madden, N., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44(3), 701-731.

Study 2

Correnti, R. (2009, March). Examining CSR program effects on student achievement: Causal explanation through examination of implementation rates and student mobility. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Crystal City, VA.

Study 3

Borman, G., & Hewes, G. (2002). The long-term effects and cost-effectiveness of Success for All. Educational Evaluation and Policy Analysis, 24(4), 243-266.

Madden, N., Slavin, R., Karweit, N., Dolan, L., & Wasik, B. (1993). Success for All: Longitudinal effects of a restructuring program for inner-city elementary schools. American Educational Research Journal, 30(1), 123-148.

Study 4

Nunnery, J., Slavin, R., Madden, N., Ross, S., Smith, L. J., Hunter, P., & Stubbs, J. (1997). Effects of full and partial implementation of Success for All on student reading achievement in English and Spanish. Paper presented at the meeting of the American Educational Research Association, Chicago, IL.

Study 5

Munoz, M. A., & Dossett, D. H. (2004). Educating students placed at risk: Evaluating the impact of Success for All in urban settings. Journal of Education for Students Placed at Risk, 9(3), 261-277.

Study 6

Jones, E., Gottfredson, G., & Gottfredson, D. (1997). Success for some: An evaluation of a Success for All program. Evaluation Review, 21(6), 643-670.

Study 7

Miller, S., Biggart, A., Sloan, S., & O'Hare, L. (2017). Success for All: Evaluation Report and Executive Summary. Millbank, UK: Education Endowment Foundation. (ERIC Document Reproduction Service No. ED581417)

Study 8

Livingston, M., & Flaherty, J. (1997). Effects of Success for All on reading achievement in California schools. San Francisco, CA: WestEd.

Slavin, R. E., & Madden, N. A. (1998). Success for All/Exito Para Todos: Effects on the reading achievement of students acquiring English. Report No. 19. Baltimore, MD: Center for Research on the Education of Students Placed at Risk.

Study 9

Chambers, B., Cheung, A., Madden, N., Slavin, R., & Gifford, R. (2005). Achievement effects of embedded multimedia in a Success for All reading program. (Technical Report). Center for Research and Reform in Education, Johns Hopkins University.

Study 10

Quint, J. C., Balu, R., DeLaurentis, M., Rappaport, S., Smith, T. J., & Zhu, P. (2013). The Success For All model of school reform: Early findings from the Investing in Innovation (i3) scale-up. New York: MDRC.

Quint, J. C., Balu, R., DeLaurentis, M., Rappaport, S., Smith, T. J., & Zhu, P. (2014). The Success For All model of school reform: Interim findings from the Investing in Innovation (i3) scale-up. New York: MDRC.

Quint, J. C., Zhu, P., Balu, R., Rappaport, S., & DeLaurentis, M. (2015). Scaling up the Success for All model of school reform. New York: MDRC.

Study 11

Tracey, L., Chambers, B., Slavin, R. E., Hanley, P., & Cheung, A. (2014). Success for All in England: Results from the third year of a national evaluation. SAGE Open, 4, 1-10.

Study 1

Evaluation Methodology

Design: This cluster randomized trial selected 41 elementary schools (grades K-5) across 11 states.

School recruitment took place in two phases. In Phase 1, all schools were offered a discount to purchase the SFA program. Ordinarily, schools would have to spend $75,000 the first year, $35,000 the second year, and $25,000 the third year. During the spring and summer of 2001, a one-time payment of $30,000 was offered to all schools in exchange for participating in the study. Only six schools were attracted by this incentive. Three schools were randomly assigned to SFA (Group 1) and three were allowed to spend the $30,000 on any innovation other than SFA (Group 2). The sample was not sufficient, so the following year (spring and summer of 2002), schools were offered SFA at no cost and 35 schools responded. Thus, the initial sample size was 41 schools.

The schools recruited in Phase 2 were randomly assigned to one of the two groups. Group 1 schools provided SFA in kindergarten through grade 2, and their students' outcomes were compared to those of corresponding Group 2 students, who received either a different intervention (Phase 1 schools) or their normal reading instruction (Phase 2 schools). Group 2 schools from Phase 1 recruitment did not receive any SFA treatment, while Group 2 schools from Phase 2 recruitment received SFA only for 3rd-5th grade students (note, however, that the effects of SFA on 3rd-5th graders were not studied because these students were not exposed to the program during the key foundational instruction period in grades K-2). Therefore, most schools contained both a treatment and a control group.

This method of having both treatment and control groups within each school had advantages and disadvantages. The primary advantage was that this design allowed for fewer schools to participate in the study and still provide valid counterfactuals.

One disadvantage was that contamination (i.e., instruction in the treatment grades might influence instruction in the control grades and vice versa) was a distinct possibility. However, during observations to check for treatment fidelity, researchers did not notice any significant contamination of this kind.

A second disadvantage of this design was that having both a treatment and a control in the same school could possibly reduce the measured effects of whole school reform because both treatment and control students and their families could have taken advantage of the school-wide reform-based services (e.g., family meetings). However, during observations to check for treatment fidelity few, if any, control students were observed benefiting directly from school-level SFA services such as parental support.

A third disadvantage was that during the third year of this 3-year study, the majority of baseline 1st grade students had moved to 3rd grade. Because the Group 2 teachers used SFA with their 3rd grade students, there was no control group to compare with the treatment group. Thus, the analysis is restricted to baseline kindergartners who progressed through 2nd grade.

Of the initial 41 participating schools, five closed due to insufficient enrollment and one withdrew from the study because of "local political problems." Of the remaining 35 schools, 18 were in Group 1 (the "treatment" group, SFA in grades K-2), and 17 were in Group 2 (the "control" group, SFA in grades 3-5 or no SFA at all). The final sample included 1,085 students in the 18 treatment schools and 1,023 students in the 17 control schools.

Borman et al. (2005) examined second-year outcomes, following students from the fall of 2001 to spring 2003 or from the fall of 2002 to spring 2004. Also, they focused on program effects for grades K-2 only. They used 38 randomized schools, with 18 in the intervention group and 17 in the control group. The treatment schools had 2,966 students in the pretest sample and 1,672 in the posttest sample (56% completion rate). The control schools had 2,770 students in the pretest sample and 1,618 in the posttest sample (58% completion rate). Including in-moving students who entered the schools after the start of the program raised the posttest sample by 890 students to 4,180.

Children in the kindergarten cohort were followed into any grade as long as they remained in the same school. They were also followed into special education.

Sample: The sample was concentrated in the urban Midwest (e.g., Chicago, Indianapolis) and in rural areas and small towns in the South. Approximately 72% of the students participated in the federal free lunch program, similar to the 80% participation rate for SFA participants nationally. The sample was 56% African American and 10% Hispanic, which is somewhat different from the SFA national figures of 40% and 35%, respectively. Overall, the researchers contend that the school sample was "reasonably well matched" with the SFA population.

The total enrollment in the SFA schools was 7,923 students (mean per school = 440) and total enrollment in the control schools was 7,400 students (mean per school = 435).

Measures: The measures used in this study were standard language arts assessments used in education research. The pre-test for the kindergarten cohort was the Peabody Picture Vocabulary Test. The Woodcock Reading Mastery Tests-Revised (WMTR) were used as the annual post-tests and the quarterly assessments. During Year 1 (kindergarten) and Year 2 (1st grade), four subtests of the WMTR were administered: Letter Identification, Word Identification, Word Attack (decoding non-words), and Passage Comprehension. In Year 3 (2nd grade), Letter Identification was dropped because it is typically not taught in 2nd grade. The WMTR is nationally normed and has internal reliability coefficients for the Word Identification, Word Attack, and Passage Comprehension subtests of .97, .87, and .92, respectively. Scores for the Peabody Picture Vocabulary Test pre-test and the WMTR post-tests were standardized to a mean of 0 and a standard deviation of 1.
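The standardization step described above is a generic z-score transform; a minimal sketch follows, with illustrative numbers rather than study data:

```python
import numpy as np

def standardize(scores):
    """Rescale raw scores to a mean of 0 and a standard deviation of 1 (z-scores)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Hypothetical raw subtest scores (illustrative only, not study data)
raw = [88, 95, 102, 110, 97, 93]
z = standardize(raw)
# After standardization the mean is 0 and the standard deviation is 1, so
# group differences on z can be read directly in standard-deviation units.
```

Standardizing the pre-test and post-tests this way is what allows the effect sizes reported later to be interpreted as group differences in standard units.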

The students were individually tested by trained testers who were unaware of whether the student was assigned to SFA or the control group. The testers were primarily graduate students who had undergone a 2-day training session, completed a written test, and participated in a practice session with children not in the study.

Analysis: All analyses were run using two different samples. The complete sample included all students, regardless of when they enrolled. The longitudinal sample included only those students who attended the sampled school for the entire three years. A multi-level framework was used, with students nested within schools. Hierarchical linear models, which allowed for student- and school-level variability, estimated school-level effects on post-test achievement, with a school-level sample size of 35. All tests were two-tailed, with alpha = .05, power of at least .80, and degrees of freedom = 32 (35 schools - 3). Total student sample size was 15,323. Pre-test and post-test scores were standardized so that effects show group differences in standard units.

Borman et al. (2005) used the same models with 38 schools and 3,290 students in the longitudinal analysis and 38 schools and 4,180 students in the analysis including new in-moving students.

Outcomes

Implementation fidelity: In addition to the extensive training and ongoing professional development provided by the SFA staff, trainers from SFA made quarterly implementation visits to each school to assess the extent to which SFA program components were in place. The trainers also identified other potential obstacles including staff turnover and student attendance. The trainers did find some implementation variability. Some schools immediately embraced and implemented the program while others struggled, even after the first year. Classroom instruction was "of reasonable quality" at almost all schools, but the tutoring and "solutions team" were rarely adequately implemented. Finally, most schools had a part-time rather than the recommended full-time facilitator.

Baseline equivalence: The authors report that the treatment and control schools were "reasonably well matched" with respect to demographics. Tests for statistically significant demographic differences between treatment and control schools were non-significant. However, when testing for significant differences, the researchers combined "percent African American" and "percent Hispanic" into "percent minority." They found no statistical difference between the SFA schools and the control schools on "percent minority," but the African American and Hispanic proportions nevertheless differ noticeably: the SFA sample was 49% African American and 13% Hispanic, while the control sample was 65% African American and 7% Hispanic. This difference may be due to the attrition of the five schools, because the original sample of 41 schools showed no statistical differences in demographics between the SFA and the control schools.

The SFA schools were not significantly different from the control schools with respect to school-level pretest scores.

Borman et al. (2005) reported statistical equivalence on eight school-level measures but did not report on student measures.

Differential attrition: The study lost five schools to attrition (four closed due to insufficient enrollment and one refused to participate due to "local political problems"). Some of the study students had missing post-test data but had, in fact, been consistently enrolled for three years at a study school. For these students, researchers imputed post-test data. However, for students who had missing post-test data and were not enrolled consistently over the three years, the researchers used listwise deletion. The listwise deletion did not cause differential attrition rates by program condition.

No statistically different pre-test scores were found between treatment students who were dropped and control students who were dropped (internal validity satisfied). The researchers also compared attriters with those who were retained in the study. Attriters were more likely than non-attriters to be mobile (i.e., to move into a school after the program had started) and had lower average pre-test scores. On one hand, since previous research has suggested that SFA is more effective for lower-achieving students, this study's results might be biased downward because it dropped a disproportionate number of lower-achieving students. On the other hand, movers and attriters may be less compliant, and their loss may exaggerate program effects.

Borman et al. (2005) found no statistical differences in attrition rates across the two conditions but found that low-achieving students were significantly more likely to have dropped out. They reported no other tests.

Posttest and Follow-Up: The primary outcome was the WMTR test (Word Attack, Word Identification, and Passage Comprehension) at the end of 2nd grade (Year 3). No analysis was completed using the 3rd through 5th grade SFA students (from the "control" schools) because data from them would not be representative of the effects of the SFA program and its emphasis on sequenced, foundational instruction in the early elementary years.

To answer the question of whether SFA positively impacted early-elementary literacy outcomes, the researchers ran the model on the sample of those who participated in all three years (the "longitudinal" sample). The school-level effect size of SFA (Cohen's d) from the multi-level model was .33 units (p<.01) for Word Attack scores, .22 units (p<.05) for Word Identification scores, and .21 units (p<.05) for Passage Comprehension scores. Thus, in all three literacy domains of the WMTR, the SFA schools scored significantly higher than control schools by the end of 2nd grade (Year 3).
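For reference, a standardized mean difference of this kind can be sketched as below. The inputs are made-up illustrations, not values from the study, which estimated effects within a multilevel model rather than with this simple two-group formula:

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical school-level means on a standardized outcome (mean 0, sd 1),
# with 18 treatment and 17 control schools as in the study design
d = cohens_d(mean_t=0.15, mean_c=-0.18, sd_t=1.0, sd_c=1.0, n_t=18, n_c=17)
print(round(d, 2))  # 0.33
```

Because the outcomes were standardized, an effect of .33 means the treatment schools scored about a third of a standard deviation higher than controls.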

In reporting second-year outcomes for the longitudinal sample, Borman et al. (2005) showed that two of the four WMTR sub-scores were significantly higher for the SFA group as compared to the control. Significant Cohen's d results (p < 0.01) were 0.18 for Letter Identification and 0.25 for Word Attack. The difference for Word Identification and Passage Comprehension failed to reach .05 statistical significance at the two-year mark.

To answer the question of whether the effects of SFA were larger for the longitudinal sample vs. the combined sample (includes students who enrolled after program implementation and were therefore not exposed to the program for the full three years), the researchers ran the model on both samples and compared the results. The school-level effect size (Cohen's d) of SFA was .36 units (p<.01) for Word Attack, .24 units (p<.05) for Word Identification, and .21 units (p<.05) for Passage Comprehension. Surprisingly, the effects for the longitudinal sample were not larger than the effects for the combined sample. The researchers concluded that the school-wide reform component is comprehensive enough to impact all SFA children, regardless of the number of years they were exposed to the SFA program.

To address whether the sequencing and length of the program had a broad effect on all literacy domains by the end of 2nd grade, the researchers looked at effect sizes by year. For the combined sample, Word Identification effect sizes (Cohen's d) increased from .09 units in kindergarten to .19 units in 1st grade and then to .24 units in 2nd grade. Word Attack effect sizes dipped slightly from kindergarten to 1st grade and then rose in 2nd grade (from .32 units to .29 units to .36 units). Passage Comprehension effect sizes grew from -.10 units in kindergarten to .12 units in 1st grade to .21 units in 2nd grade. This pattern was similar for the longitudinal sample. Thus, the researchers concluded that improving early literacy can be achieved by first building a strong phonemic foundation in kindergarten and 1st grade.

Study 2

Evaluation Methodology

Design: This research used secondary data from the Study of Instructional Improvement (SII). The longitudinal SII contains data collected from 2000-01 through the 2003-04 academic years. The sample included 115 elementary schools (90 treatment schools, roughly evenly spread across the three programs, and 25 control schools). The schools were selected based on geographic region (to control for costs), length of time schools had been affiliated with the programs, and measures of socio-economic disadvantage. The comparison schools were chosen from the same geographic regions and were selected based on similar socio-economic disadvantage measures. Schools in the highest quartile of community disadvantage were over-represented in the sample. The 115 schools provided a student sample size of 7,692.

Analysis: Treatment schools were matched with control schools using propensity scores based on school background characteristics (the author did not indicate the specific characteristics used). Literacy achievement indicators for two cohorts of children, K-2 and grades 3-5, were compiled, and reading outcomes for treatment schools were compared with reading outcomes for their propensity-score-matched comparison schools. The analysis was executed twice: once with all students and once with only students who were stable in their schools during the treatment period.

Outcomes

Two of the three CSR programs demonstrated a positive treatment effect on student literacy outcomes (Success for All and America's Choice), while the third program (Accelerated Schools Project) showed no significant impact. According to the author, Success for All successfully produced a pattern of "skill-based" reading instruction. Success for All was primarily effective in the early grades (K-2). Importantly, the author noted that the treatment effect was more pronounced for students who were stable in SFA schools. This result implies a dosage response effect and the author argues that this is evidence that Success for All has a causal effect on student achievement.

Study 3

Evaluation Methodology

Design: This study was quasi-experimental in that the five Success for All schools were matched with five other Baltimore schools that were similar in terms of percentage of students receiving free lunch, historical achievement level, and "other factors" that were not identified by the authors. Once the comparison schools were selected, the students were themselves matched based on previous scores on standardized tests.

Reading proficiency data were collected in the 1990-91 academic year from students in all 10 schools who had been stable attenders since program implementation in 1987-88. Therefore, all 3rd graders in this study had been exposed to the program for at least 3 years.

Attrition: No schools left the study during the three years of data collection. In terms of student-level attrition, the study only used data from youth who were enrolled consistently at each school. The lack of effort to follow up or study those not consistently enrolled in the study schools may violate the intent-to-treat principle.

Sample Characteristics: The five SFA schools had a total baseline enrollment of 2,598. The authors do not provide enrollment counts for the control schools. Of the five SFA schools, all had between 97-100% African American enrollment and between 83-98% free lunch eligible. No other data were provided for the five control schools.

Two of the schools were considered "high resource" in that they hired the suggested number of tutors (6 in one school, 9 in the other); offered full-day kindergarten; hired at least two staff members to be on the family support staff (now known as the solutions team); and hired full-time facilitators. The other three schools were considered "low resource" and did not achieve the full level of implementation. These schools hired only 2-3 tutors each, did not hire any additional staff members to be on the family support staff, and had only half-time program facilitators.

Measures: Assessments of reading proficiency were individually administered to students by trained students from local colleges who were unaware of the study hypotheses or the school's treatment status. Retention and attendance data were obtained from school records.

The study used two reading proficiency measures: the Letter-Word Identification (letter and word recognition) and Word Attack (phonetic synthesis) subtests of the Woodcock Language Proficiency Battery, and the Durrell Analysis of Reading Difficulty, which assesses oral reading and comprehension.

The study also collected data on retention and attendance, yet these data were available only from Success for All schools. The researchers do not address why they could not obtain retention and attendance data from the control schools.

Analyses: The reading proficiency analyses were conducted using MANOVAs with standardized pretest scores as covariates and raw scores on the three reading outcomes as dependent variables. The MANOVAs produced Wilks's lambda statistics, which were used to test for significance. Following multivariate analysis, ANCOVAs were computed for each dependent measure separately. All reading proficiency analyses were done by grade to test program effectiveness as children progress through the successive program components. Significance levels were evaluated at p-values of .10 and below.

The retention and attendance rates for each treatment school were computed for each year and compared over time.

Outcomes

Baseline Equivalence: The five Success for All schools were matched with five other Baltimore schools that were similar in terms of percentage of students receiving free lunch, historical achievement level, and "other factors" that are not identified by the authors. The researchers did not present baseline equivalence data at the student level or pre-test baseline equivalence data.

Differential Attrition: No analyses of differential attrition were presented.

Posttest: Compared to their matched control schools, each SFA school had significantly higher average reading proficiency scores on most outcomes. The average effect size was .51 for Grade 1, .60 for Grade 2 and .57 for Grade 3. The consistency of the effect sizes across grades does not reflect the true difference in average scores between the Success for All schools and the control schools because the standard deviation of scores increased over time. The raw difference in scores between the schools averaged approximately 3 months of grade-equivalency in grade 1, 5.5 months of grade-equivalency in grade 2, and 8 months of grade-equivalency in grade 3.
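Since effect size = raw difference / standard deviation, the figures above imply a growing spread of scores by grade. A quick back-of-the-envelope check, using only the numbers reported in the paragraph above:

```python
# Implied SD (in months of grade-equivalency) = raw gap / effect size.
# Raw gaps and effect sizes are the values reported for grades 1-3 above.
for grade, gap_months, es in [(1, 3.0, 0.51), (2, 5.5, 0.60), (3, 8.0, 0.57)]:
    implied_sd = gap_months / es
    print(f"Grade {grade}: implied SD of roughly {implied_sd:.1f} months")
```

The roughly constant effect sizes (.51 to .60) thus correspond to raw gaps that more than double between grades 1 and 3, because the spread of scores widens as students age.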

The researchers also ran the reading proficiency analysis using a sample of students who were in the bottom 25% in terms of reading achievement. The effect sizes were even stronger, but insignificant and unreliable because of extremely small n's (n's between 9 and 16 students).

Retention rates, defined as the percent of students required to repeat a grade, fell from an average of 8.4% before program implementation to an average of 0.8% in 1990-91. Note, however, that SFA is fundamentally opposed to retention: rather than holding back students whose performance is below grade level, the program advances them and continues its special services to bring them up to speed.

Absentee rates, defined as the percent of students absent, fell from an average of 11.7% to an average of 9.0%. The authors do not report whether this drop is statistically significant for each school or overall.

Long-term follow-up

Borman, G., & Hewes, G. (2002). The long-term effects and cost-effectiveness of Success for All. Educational Evaluation and Policy Analysis, 24(4), 243-266.

This study focused on the long-term effects of the original Success for All program that was implemented for first-graders in five elementary schools in Baltimore in 1988, 1989, and 1990. The student outcomes assessed in 1998-99 included 8th grade achievement in reading and math and a group of outcomes including years of special education, instances of grade retention, and age at grade 8. The pre- and posttest data for reading and math achievement were drawn from the Comprehensive Tests of Basic Skills (CTBS) in grade 1 and grade 8. The remaining data were drawn at grade 8 from school district records.

Of the original sample of about 2,500 students, 1,310 remained in the sample for achievement, and 1,730 remained in the sample for the other outcomes. The bulk of the attrition was due to three factors: (a) students remained in Baltimore schools but had missing data on one or more measures (50%); (b) students left the Baltimore school district (25%); and (c) students had not yet reached grade 8 (12%).

The rates of attrition among SFA students and control students were statistically equivalent, and the reasons for attrition were similar. A further attrition analysis revealed that the SFA attriters and control attriters were statistically equivalent on all background characteristics except pretest reading score. However, the magnitude of this difference was "essentially" the same as the difference in pretest reading score between the SFA non-attriters and the control non-attriters, so internal validity remains intact. Comparing attriters with non-attriters, all background characteristics were equivalent except that non-attriters had higher math and reading pretest scores than attriters in both the SFA and control samples. To the extent that the SFA program had stronger effects on the lowest achievers, the outcomes may have underestimated the program effect.

ANOVA and logistic regression analysis produced results for achievement outcomes (reading and math CTBS/4 scores) and transcript outcomes (years of special ed in elementary school, years of special ed in middle school, ever retained in elementary school, ever retained in middle school, and age at 8th grade).

The analysis for achievement included controls for pretests. For the full sample, SFA produced a statistically significant effect on reading achievement (E.S. = .29, equivalent to a 6 month advantage) and math achievement (E.S. = .11, equivalent to a 3 month advantage). For the sample of low-achievers, SFA produced a statistically significant effect on reading achievement (E.S. = .34), but not on math achievement.

The analysis for the other outcomes produced some significant results, but these do not show that students were improving academic performance to a point beyond special education or retention thresholds. Rather, a school following SFA's recommendations will, by definition, have fewer special education placements and fewer retentions than it otherwise would. These outcomes are nonetheless relevant in that cost savings may accrue from fewer special education placements and retained students, and those savings could be reallocated to SFA.

Study 4

Evaluation Methodology

Design: In this quasi-experimental design, SFA was offered to the highest poverty elementary schools in the Houston Independent School District. The schools were offered SFA with the reading component only, the reading component plus tutoring, or the full SFA program (reading, tutoring, support team and facilitator). Fifty schools volunteered. From the pool of elementary schools that did not volunteer, 23 were chosen to make up the "matched comparison" schools. The authors did not indicate how precisely the matching was made or why 23 schools were chosen.

Of the 50 SFA schools, 19 used the Spanish-bilingual version of SFA alongside English SFA, and one school used the Spanish-bilingual version exclusively. None of the SFA schools had fully implemented the program by mid-fall 1995, and the Spanish-bilingual programs were especially late in implementation.

In the English-dominant study, the cohorts were defined as follows: Cohort 1 began first grade in 1995 and Cohort 2 began first grade in 1996. Only Cohort 1 students were given a pretest (n=4,256). In 1996, posttests were given to ten Cohort 1 students from each school (by then in 2nd grade; n=595) and ten Cohort 2 students from each school (by then in 1st grade; n=682). Cohort 1 students with missing pretest data were dropped using listwise deletion, leaving 46 SFA schools and 18 comparison schools with complete data.

The pretest (Spanish Language Assessment Scale) was given in 1994-95 to Spanish-dominant students entering first grade (n=1,682), but because the Spanish-bilingual program was not completely implemented until late in the 1995-96 school year, there were no pretest data for Spanish-bilingual students. The final sample included 278 Spanish-dominant first grade students in 20 SFA and 10 comparison schools; the authors did not indicate how many of the 278 were SFA and how many were comparison. Also, because the Spanish-bilingual version of the program took so long to implement, the researchers did not draw a Cohort 1. This may violate the intent-to-treat principle by excluding data that might have been unfavorable because the program was difficult to implement.

Attrition: For Cohort 1, the analysis was performed on all students who had both pretest and posttest data. The researchers did not discuss attrition for Cohort 2 or for the Spanish-bilingual students. Also, two schools dropped out at some point, but the authors do not address this.

Sample characteristics: Only general characteristics of the schools were provided. The schools had an average of about 78% eligible for free lunch, between 47% and 57% Hispanic, and mobility rates between 30% and 53%.

Measures

Reading measures: The English-dominant reading pretest was the Language Assessment Scales - Oral (LAS). The battery of four reading posttests comprised the Word Attack, Word Identification, and Passage Comprehension subscales of the Woodcock Reading Mastery Tests and the Durrell Oral Reading Test.

School characteristics measures: Six measures were drawn from each school: average pretest LAS score, percentage of students eligible for free or reduced-price lunch, student mobility rate, percentage of teachers with advanced degrees, average years of teaching experience in the school, and teacher attendance rate. Factor analysis was used to generate two aggregate measures for each school - student background characteristics and teacher experience. The student background composite was converted into a dummy variable (low/high) split at the median.

Implementation measures: An implementation questionnaire was administered to principals or facilitators in all SFA programs. A 100% response rate was obtained after three mail and two telephone followups. The questionnaire collected program data such as number and type of tutors, facilitator status (none, part-time, full-time), and whether the school implemented a support team. An overall support score was computed by summing the standardized scores for the various measures. Schools were grouped into three implementation categories - low, medium, and high. In general, programs identified as high implementers had more certified tutors, were more likely to have full-time facilitators, had higher percentages of Hispanic students, and had lower percentages of African American students. Among Spanish-dominant programs, only two implementation categories were used to "retain adequate power and balance in the design".

Analyses: Multivariate analyses of variance (MANOVA) were performed to test for overall treatment differences. The reading outcomes were the dependent variables, while implementation level, ethnicity of student body (majority Hispanic or majority African American), and the student background aggregate variable were the independent variables. For Cohort 1, the pretest score was also used as a covariate, while Cohort 2 did not have pre-test scores available. Follow-up univariate analysis was conducted when the multivariate hypothesis tests suggested significant treatment effects. When univariate effects were significant, ANOVA was conducted on residual scores for each student. The authors write, "For Cohort 1, effect size estimates were computed as the difference between mean standardized residual scores of a given SFA implementation level and the comparison mean. For Cohort 2, effect size estimates were computed as the standardized difference between posttest means."
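The residual-based effect size the authors describe for Cohort 1 can be sketched as follows: regress posttest on pretest, standardize the residuals, and take the difference between the treatment and comparison means of those standardized residuals. The function below is an illustration under that reading of the method, not the authors' code, and uses hypothetical data:

```python
# Sketch of a residual-based effect size: the difference in mean
# standardized residuals (from posttest ~ pretest) between treatment
# and comparison students. Illustrative only.
import statistics

def residual_effect_size(pretest, posttest, treated):
    """Effect size as the treatment-comparison difference in mean
    standardized residuals from a simple OLS of posttest on pretest."""
    mx, my = statistics.mean(pretest), statistics.mean(posttest)
    sxx = sum((x - mx) ** 2 for x in pretest)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pretest, posttest))
    b = sxy / sxx            # OLS slope
    a = my - b * mx          # OLS intercept
    resid = [y - (a + b * x) for x, y in zip(pretest, posttest)]
    sd = statistics.stdev(resid)
    z = [r / sd for r in resid]  # standardized residuals
    t = statistics.mean(v for v, w in zip(z, treated) if w)
    c = statistics.mean(v for v, w in zip(z, treated) if not w)
    return t - c
```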

Outcomes

Differential attrition: Differential attrition was not assessed.

Baseline equivalency: SFA schools had a similar percentage of students eligible for free lunch (about 78%). SFA schools had lower percentages of Hispanic students than comparison schools (47% vs. 57%) and higher average mobility rates (53% vs. 30%). Comparison schools had slightly higher average pretest scores than SFA schools. The authors did not present an analysis of how these baseline differences might affect the results, nor did they provide any student-level baseline equivalency information.

Fidelity: Fidelity was explicitly measured as the "implementation" variable, which took the value low/middle/high in the English-dominant SFA programs and low/high in the Spanish-dominant programs. This variable was derived from survey data on the number and type of tutors, facilitator status (none, part-time, full-time), and whether the school implemented a support team.

Posttests: In the English-dominant program for Cohort 1, the authors did not present the main effect of implementation level on outcomes. Rather, the results presented represent interactions between implementation and racial composition. The analysis indicated that students in high-implementation, predominantly African American schools were the only ones who substantially exceeded control students when controlling for pretest scores (ES=.49 in Oral Reading, ES=.18 in Passage Comprehension, ES=.14 in Word Attack, and ES=.22 in Word Identification).

The analysis of Cohort 2 did not include controls for pretest, so the results should be interpreted with caution. SFA implementation had main effects on Oral Reading (p<.001), Passage Comprehension (p<.001), and Word Identification. Also, a significant multivariate interaction occurred between implementation level and socioeconomic strata (p=.04). High-implementation effect sizes for schools with low student background characteristics were .33 for Oral Reading, .34 for Passage Comprehension, .73 for Word Attack, and .55 for Word Identification. Again, without controlling for pretest scores, the results cannot be clearly interpreted.

Study 5

Evaluation Methodology

Design: This quasi-experimental design used data from three SFA schools and three matched comparison schools in an urban Kentucky school district. The three SFA schools had been participating in SFA for three years, from 1999-2000 to 2001-2002, and the analysis looked at changes from Grade 1 to Grade 3. Student achievement, attendance, and suspension data were taken from school records; schoolwide reform measures were taken from surveys of students, teachers, and parents. Thus, this study sought to examine the effects of SFA not only as an early literacy program, but as a whole-school reform initiative.

Matching took place on two levels - school and student. The treatment and control schools were matched on percent free/reduced-price lunch, race, percent with disabilities, percent from single-parent households, gender, and historical test scores. It is unclear whether the historical test scores came from the Stanford Diagnostic Reading Test, as stated in the text of the article, or the Comprehensive Test of Basic Skills (CTBS), as reported in the table. After school matching, the treatment and control students were matched on free/reduced-price lunch status, race, single-parent household status, and gender. The matching was checked using chi-squared analysis, and no significant differences were found between groups on these matching characteristics.

The baseline sample size was 1,074 (593 treatment students and 481 control students). The final analytical sample, however, excluded students who transferred out of their baseline schools or who lacked assessment data for the entirety of the study. The final N used for analysis was not reported.

Attrition: Only students who were enrolled continuously in their schools from fall 1998 through the 2001-02 school year, and who had complete demographic and testing data, were included in the analysis. The authors did not examine whether this attrition could have systematically affected the results. The lack of effort to follow up those not consistently enrolled in the study schools may violate the intent-to-treat principle.

Sample characteristics: The sample was entirely urban, about 55% female and 57% minority. About 85% of the sample received a free/reduced price lunch and slightly over 70% lived in single-parent homes.

Measures: The measures in this study included: (a) student test scores on the CTBS Reading component, normal curve equivalent (NCE), taken from computer files; (b) school-level records on attendance (mean daily attendance rate) and on behavior-based suspensions (per year); and (c) survey responses from parents, teachers, and students on Likert-scale type questions, including perceptions of school climate, educational quality, and job satisfaction (for teachers only).
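The normal curve equivalent (NCE) metric used for the CTBS scores is worth unpacking: unlike percentile ranks, NCEs are equal-interval, so mean gains can be meaningfully averaged and compared across schools. As a point of reference (not drawn from the study), an NCE is a national percentile rank mapped through the normal distribution:

```python
# Illustrative sketch: converting a national percentile rank to a
# normal curve equivalent (NCE). NCEs are scaled so that 1, 50, and 99
# coincide with the 1st, 50th, and 99th percentiles (NCE = 50 + 21.06z).
from statistics import NormalDist

def percentile_to_nce(percentile):
    """Map a national percentile rank (0-100, exclusive) to an NCE."""
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 21.06 * z
```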

The perception surveys were given each year. The student surveys were administered to students in schools. The teacher surveys were to be completed by teachers in private, with assurances of confidentiality. The parent surveys were taken home by students and returned to school.

The combined response rate for all years of the survey was 69% for teachers, 68% for students, and 42% for parents. A total of 115 teachers, 667 students, and 867 parents completed the instruments. The authors did not report response rates by treatment status or explain why the response rate for students was so low, given that the surveys were administered in school.

Analysis: Student-level data were analyzed using ANCOVA methods, with the treatment of SFA as the between-subject factor and the pretest scores as the covariate. Effect sizes reflect standardized differences between SFA and comparison students. The mean Likert-scores for each survey item were averaged by school and overall and were reported separately for each year.
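The ANCOVA described above is equivalent to a regression of the posttest on a treatment indicator plus the pretest covariate, with the treatment coefficient giving the pretest-adjusted effect. A minimal sketch of that logic (illustrative data only, not the study's):

```python
# Sketch of a pretest-adjusted treatment effect via OLS:
# posttest ~ intercept + treated + pretest. The coefficient on the
# treatment dummy is the adjusted effect. Illustrative only.
import numpy as np

def ancova_treatment_effect(posttest, pretest, treated):
    """Return the OLS coefficient on the treatment dummy, adjusting
    for the pretest covariate."""
    X = np.column_stack([np.ones(len(posttest)), treated, pretest])
    beta, *_ = np.linalg.lstsq(X, np.asarray(posttest, dtype=float),
                               rcond=None)
    return beta[1]
```

Dividing this coefficient by the pooled posttest standard deviation would yield a standardized effect size of the kind reported for the student-level outcomes.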

Outcomes

Baseline Equivalence: While equivalence was examined for both schools and students, only student equivalence was tested for significance. The authors report the factors used to match schools but, given the small number of schools, did not indicate whether the treatment and control schools differed significantly on these or other factors. Importantly, the authors do not report whether significant differences existed on pretest scores, even though pretest scores serve as the covariate in the ANCOVA. Chi-squared tests indicated that the baseline characteristics of the students themselves did not differ significantly by treatment status.

Differential Attrition: The authors did not present any differential attrition analysis.

Achievement on Standardized Reading Tests: The researchers calculated the improvement in the mean CTBS NCE scores from 1998-99 through 2001-02. The SFA treatment schools averaged a gain of 4.4 points, compared to the control schools' improvement of only 2.3 points. The authors do not report whether this is a significant difference. However, using the student level sample (n=295), ANCOVA tests revealed that, adjusting by pretest scores, the effect of the program was statistically significant, but with a very small effect size (ES=.11).

Attendance: The average attendance rate at SFA schools rose 1.2 points, from 93.5% to 94.7%. The average attendance rate at the control schools rose 0.7 points, from 94.4% to 95.1%. The researchers do not report whether there is a statistically significant difference in improvement between the control and the SFA schools.

Out-of-School Suspensions: Among the SFA schools, the mean number of annual suspensions decreased by 23 suspensions (from 49 in 1998-99 to 26 in 2001-02). Among control schools, the mean number of annual suspensions decreased by 11 suspensions (from 22 in 1998-99 to 11 in 2001-02). SFA schools experienced a decrease of 47% in suspensions, while the control schools experienced a decrease of 50%. As with the previous outcomes, the authors do not report whether this is a statistically significant difference.

Mediating Effects

Perceptions of school climate, educational quality, and teacher job satisfaction: Compared to teachers from control schools, teachers from SFA schools reported larger increases in ratings of school climate from 1998-99 to 2000-01 (SFA teacher ratings increased from 4.1 to 4.3, compared to no change (4.0) for control school teachers). Educational quality ratings also grew more for SFA teachers (from 3.9 to 4.3, compared to no change (3.9) for control school teachers). Job satisfaction ratings increased by .4 points (4.1 to 4.5) for SFA teachers and by .1 points (4.4 to 4.5) for comparison school teachers.

Students from SFA schools rated school climate as 4.1 in 1998-99 and 4.2 in 2000-01, while students from control schools remained steady in their rating of school climate (4.3). Students from SFA schools rated educational quality as 4.3 in 1998-99 and 4.5 in 2000-01, while students from control schools rated educational quality as 4.5 in 1998-99 and 4.5 in 2000-01.

Parents from SFA schools showed larger increases in ratings of school climate than parents from control schools (4.0 to 4.4 for SFA parents vs. 4.2 to 4.4 for control parents). Parents from SFA schools and control schools gave identical ratings of educational quality: 4.1 in 1998-99 and 4.4 in 2001-02.

Study 6

Evaluation Methodology

Design: This quasi-experimental study evaluated a single Success for All (SFA) program in Charleston, SC. The SFA school was matched with a comparison school based on "demographics" and "history of performance on district standardized tests." The SFA program was implemented in 1989-90, with pre-test data collected in fall 1989 for kindergarten and first-grade students (Cohort 2 and Cohort 1, respectively) and in fall 1990 for kindergarten students (Cohort 3). Cohorts 1 and 2 were re-tested in the 1990-91 and 1991-92 school years (one and two years from baseline). Cohort 3 was tested again in 1991-92 (two years from baseline).

The base sample sizes for Cohorts 1, 2, and 3 were 172 (113 SFA and 59 control), 157 (109 SFA and 48 control) and 169 (117 SFA and 52 control), respectively. The authors did not report why the SFA sample was almost twice the size of the control sample. Only students who were consistently enrolled in the same school through the course of the study were included in the analysis.

Attrition: Only students who had attended the schools consistently for the length of the study were eligible for the final analysis, which further excluded students with missing data, whether the data were missing due to attrition, absence, or some other reason. Consequently, the sample size for each outcome varied according to how many students took the assessment on the day it was offered.

Sample characteristics: Each study school was approximately 50% male and almost entirely (at least 99%) African American.

Measures: This study was somewhat unique in that it used the typical SFA measures of literacy achievement, but also used measures that are more typically required by school districts to assess school achievement.

Pretest

  • Cognitive Skills Assessment Battery

SFA outcome measures

  • Woodcock Reading Mastery Tests-Revised
  • Durrell Test of Reading Difficulty

District outcome measures

  • Merrill Language Screening Test
  • Test of Language Development
  • Basic Skills Assessment Program
  • Stanford Achievement Test
  • Teacher achievement ratings
  • Teacher behavior ratings

The SFA outcome measures were not collected in the third year of the study because, according to the authors, the developers had "lost interest" in the evaluation.

Analyses: Analyses were run for each cohort and for each year separately. Means were adjusted for pretest scores and calculated for the treatment and comparison schools using ANCOVA. The standardized regression coefficients were calculated from multiple regression models in which the test score was the dependent variable, and pre-test score and treatment status were the independent variables.

Outcomes

Baseline Equivalence: The comparison school was chosen based on its similarity to the treatment school in demographics (gender and race/ethnicity) and history of performance on district standardized tests. The student sample was roughly evenly split by gender (although Cohort 1 from the control school was 64% male), and the study schools were almost exclusively African American. The authors did not report statistical tests of baseline equivalence.

Fidelity: This implementation of SFA was severely compromised. One of the requirements of SFA is that faculty agree to the new program with an 80% majority in a secret ballot. The SFA school in this study was required to participate by the school district. Also, Hurricane Hugo had occurred just before the program was implemented, which caused a good deal of disruption in implementation. The researchers also noted that the SFA facilitator had a somewhat hostile relationship with some teaching staff and that the components of the program (e.g., assessing progress every eight weeks and making reading group adjustments) were not evenly implemented.

Differential Attrition: Neither school dropped out of the study. The analysis was conducted only on students who were enrolled continuously at their schools and present on the days of the assessments. No analysis of the effects of student mobility or absence on the outcomes was reported.

Posttest: The outcomes that follow are based on multiple regression betas. They include SFA developer measures (the Woodcock and Durrell assessments) and school district measures. The results indicated that the program appeared to successfully influence achievement in kindergarten, but that the effects did not persist into 1st and 2nd grade.

For Cohort 1 (1st grade in Year 1), none of the developer literacy outcomes or school district outcomes were significant for Years 1 or 2. In fact, the SFA program appeared to have a negative effect on math achievement in Year 1 (beta = -.28, p<.01). Year 3 SFA developer outcome data were not collected and Year 3 school district outcomes results were generally insignificant.

For Cohort 2 (kindergarten in Year 1), with only a few exceptions, the developer literacy outcomes and the school district outcomes were generally significant and positive for the SFA program in Year 1. However, with the exception of scores on the Woodcock Word Attack assessment, all the positive effects of SFA disappeared by the end of Year 2 (1st grade). Year 3 SFA developer outcome data were not collected, and none of the school district outcomes were significantly positive.

For Cohort 3 (kindergarten in Year 2), the developer literacy outcomes were strongly positively significant in Year 2, and the district outcomes were generally significant and positive as well. However, the effects of SFA on the school district measures disappeared in Year 3 (no SFA developer outcome data were collected in Year 3).

The authors conclude that buy-in, cooperation, and implementation are crucial for SFA to function properly and produce positive results. A school culture that approves an SFA implementation may be very different from one that would not vote to approve SFA. Thus, the authors argue that comparison schools should also have voted to adopt SFA, which would reduce the likelihood that differences in outcomes are due to factors other than SFA.

Study 7

Evaluation Methodology

Design:

Recruitment: The researchers aimed to recruit at least 50 schools with higher-than-average proportions of students receiving free lunch. Schools needed to be willing to (1) accept the results of school-level randomization, (2) implement the program over two school years, and (3) provide access for the delivery of relevant assessments. The schools came from the North and Midlands of England.

Assignment: Random assignment was conducted at the school level over two cohorts (one starting in 2013 and the other in 2014). The researchers randomized 39 schools in the first year and 14 schools in the second year. The result was 874 treatment condition students in 27 schools and 893 control group students also in 27 schools. The number of schools reported for the two cohorts (N = 53) did not match the number reported for the two conditions (N = 54). The treatment group received the program over two years, the reception year and the first year of primary. The control group continued with business as usual.

Attrition: Assessments occurred at pretest, midpoint (1 year into the 2-year program), and posttest (at the end of the 2-year program). Seven schools withdrew from the study after randomization, and all of these belonged to the treatment condition. Data from most of these schools, however, were included in analyses. Attrition varied by outcome but was 12% at the midpoint and ranged between 15% and 24% at posttest.

Sample:

Study schools tended to be smaller than the national average for England with roughly 22% qualifying for free lunch. The mean age of participating students was just over 4.5 years, and 49% were male.

Measures:

Measures were administered by field workers who were part of the project team but who were also blind to allocation. The study presented no figures on reliability or validity for the sample, but the standardized measures have been well validated.

The primary outcome of interest was literacy as measured by the Woodcock Reading Mastery Test III (WRMT III). At the end of the first year of implementation (the midpoint), the WRMT III was administered using the letter identification, word identification, and word attack subscales. At the end of the second year of implementation (posttest), the WRMT III was administered using the word identification, word attack, and passage comprehension subscales. The researchers then combined respective subscales to create overall literacy scores.

A secondary measure was the Phonics Check, a standardized national literacy assessment administered at posttest in June of the second year.

Because these outcome measures were not considered age-appropriate for children at baseline, the researchers administered the British Picture Vocabulary Scale (BPVS) at the beginning of the first school year to serve as a baseline literacy measure.

Analysis:

The researchers used multilevel models to account for randomization at the school level. Since no significant baseline differences in demographic composition were found between conditions, the only covariates included were the baseline BPVS literacy score and a school-level achievement measure that was used in the randomization process.
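Because randomization occurred at the school level, the analysis must respect the clustering of students within schools; the multilevel models do this directly. A simpler, conservative stand-in (shown purely as an illustration with hypothetical data, not the researchers' method) is to aggregate students to school means and compare conditions at the level of randomization:

```python
# Illustrative stand-in for clustered analysis: average student scores
# within schools, then compare treatment and control school means.
# Data structures here are hypothetical.
from statistics import mean

def school_level_means(scores_by_school, treated_schools):
    """Return (treatment mean, control mean) of school-mean scores."""
    school_means = {s: mean(v) for s, v in scores_by_school.items()}
    t = mean(m for s, m in school_means.items() if s in treated_schools)
    c = mean(m for s, m in school_means.items() if s not in treated_schools)
    return t, c
```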

The researchers also explored interaction effects to determine whether the program differentially affected students receiving free lunch or with low baseline literacy performance. Due to high attrition, they also conducted multiple imputation as a sensitivity test. In addition, since seven treatment schools opted out of the program, they included a treatment-on-treated analysis that excluded those schools.

Intent-to-Treat: Seven treatment schools opted out of program implementation and several others failed to fully implement the program. The main analyses, however, included data from all but 2 or 3 of the schools that were initially randomized. The researchers also used multiple imputation for missing data as a sensitivity test.

Outcomes

Implementation Fidelity:

Implementation data were collected through classroom observations, staff interviews, and a teacher survey. The researchers assessed implementation fidelity among 15 of the schools still delivering the program, rating them as mechanical, routine, or refined. By the end of the two years, only one school reached the refined level of fidelity. The researchers found that the school with refined implementation had a significantly higher score on the Phonics Check at posttest. Results for the WRMT III literacy measure were higher for the refined school but with only marginal statistical significance.

Baseline Equivalence:

The researchers found no significant baseline differences in gender composition, age, free lunch eligibility, or mean pretest scores on the British Picture Vocabulary Scale. Control schools, however, had significantly higher school-level academic attainment, so this variable was included as a covariate.

Differential Attrition:

There was a significantly greater proportion of missing data in the treatment condition (15%) than in the control condition (10%). Missingness at posttest was also significantly associated with poorer pretest scores. Table 13 compared the complete-case and imputed analyses; the results differed little.

Posttest:

The main analyses found no significant program effects on literacy outcomes at midpoint or posttest.

In light of a marginally significant (p = 0.08) interaction effect for free lunch eligibility, the researchers did a subgroup analysis looking at the program effect for free lunch eligible students alone, thereby limiting the analysis to a subsample of 386 students. Among this subgroup, the researchers found that treatment students scored significantly higher (p = 0.03) than the control group on the WRMT III literacy test at midpoint. They also found that the posttest phonics score was higher than the control for the subgroup but with only marginal significance (p = 0.06).

When the analysis was limited to schools that did not drop from the study (a treatment-on-treated analysis), treatment group participants scored significantly higher than the control group on the WRMT III literacy test at posttest.

Long-Term:

Not included

Study 8

Evaluation Methodology

Design: This quasi-experimental design compared reading outcomes for three cohorts of students from three SFA schools to three cohorts from three matched comparison schools. Pretesting took place in kindergarten in fall 1992 (1992 cohort), fall 1993 (1993 cohort) and fall 1994 (1994 cohort). Posttests were given in the spring of 1993, 1994, and 1995. Thus, the 1992 cohort had three years of data, the 1993 cohort had two years of data, and the 1994 cohort had one year of data.

The authors did not indicate how the study schools were selected. Comparison schools from the same cities as the treatment schools were chosen based on "student demographics and other selected factors."

The treatment and control schools were Fremont Elementary and Taft Elementary from Riverside, CA; Orville Wright Elementary and Garrison/Kelly Elementary from Modesto, CA; and El Vista Elementary and Tuolumne Elementary also from Modesto, CA. The students were pretested in kindergarten, and the baseline sample sizes were 118 for Fremont, 142 for Taft, 72 for Orville Wright, 135 for Tuolumne, 90 for El Vista and 90 for Garrison/Kelly.

In the treatment schools, the SFA program was modified to be more appropriate for English language learner (ELL) students.

Sample characteristics: The authors did not provide sample characteristics at the student level. Rather, the characteristics of the schools were presented as of Spring 1992. All six schools had reading scores below the 60th percentile and all had at least 50% minority enrollment.

Measures: All kindergarten students were pretested with the Peabody Picture Vocabulary Test. The assessors were current and former classroom teachers who had received training on proper administration of the test. The posttests were three scales from the Woodcock Language Proficiency Battery (Word Identification, Word Attack, and Passage Comprehension).

Analysis: To prepare for the analysis, the students were divided into four analytical groups, defined as follows:

  • English Speakers: Dominant language in kindergarten was English, the pretest language was English, the instruction was in English, and the posttest was in English.
  • Spanish Bilingual: Dominant language in kindergarten was Spanish, the pretest was in Spanish, the instruction was in Spanish, and the posttest was in Spanish.
  • Spanish ESL: Dominant language in kindergarten was Spanish, the pretest was in Spanish, the instruction was in sheltered English, and the posttest was in English.
  • Other ESL: Dominant language in kindergarten was not English or Spanish, the pretest was in English, the instruction was in English, and the posttest was in English.

Analyses of covariance (ANCOVAs) were conducted within each analytical group and cohort, with the PPVT pretest score as a covariate. Effect sizes were calculated.
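Effect sizes of the kind reported below are conventionally computed as standardized mean differences (Cohen's d): the difference between treatment and control means divided by the pooled standard deviation. A minimal sketch, using made-up scores (this is not the authors' code, and the numbers are purely illustrative):

```python
import numpy as np

def cohens_d(treatment, control):
    """Standardized mean difference: (M_t - M_c) / pooled SD."""
    nt, nc = len(treatment), len(control)
    pooled_var = (((nt - 1) * np.var(treatment, ddof=1)
                   + (nc - 1) * np.var(control, ddof=1))
                  / (nt + nc - 2))
    return (np.mean(treatment) - np.mean(control)) / np.sqrt(pooled_var)

# Hypothetical posttest scores, for illustration only
t = np.array([52.0, 48.0, 55.0, 50.0, 53.0])
c = np.array([47.0, 45.0, 50.0, 46.0, 49.0])
print(round(cohens_d(t, c), 2))
```

By common convention, effect sizes near .2 are read as small, .5 as moderate, and .8 as large, which is the scale implied by the cohort results that follow.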

Outcomes

Baseline Equivalence: The treatment schools were somewhat equivalent to their matched schools on the following characteristics: historical reading scores, percent AFDC, percent free lunch, percent minority, percent ELL, and percent Spanish speaking. The authors did not indicate whether the differences between treatment and comparison schools on these factors were statistically significant. Baseline equivalency at the student level was assessed with the PPVT pretest scores, and there were no differences between treatment and control students within each analysis group and cohort.

Differential Attrition: The authors did not address differential attrition. They also did not address student mobility in and out of the control and treatment schools.

Posttests: Importantly, the researchers did not conduct tests of statistical significance for any of the results.

For English speakers, the SFA program showed moderate positive effect sizes for the 1992 Cohort (effect sizes = .41, .42, and .23 for grades 1, 2, and 3, respectively) and for the 1993 Cohort (effect sizes = .87 and .34 for grades 1 and 2, respectively), and a weak positive effect for the 1994 Cohort (effect size = .27). In general, the effect size decreased over time within cohort.

For the Spanish Bilingual group, the SFA program showed extremely strong effects early, but the effects declined over time. Specifically, the effect sizes for the 1992 Cohort were 1.36, .19, and .09 for grades 1, 2, and 3, respectively. The effect sizes for the 1993 Cohort were 1.32 and .72 for grades 1 and 2, respectively. The effect size for the 1994 Cohort was 1.4 for grade 1.

For the Spanish ESL group, the SFA program effects were similar to those for the Spanish Bilingual group. Specifically, the effect sizes for the 1992 Cohort were .97, .45, and .03 for grades 1, 2, and 3, respectively. The effect sizes for the 1993 Cohort were .72 and .43 for grades 1 and 2, respectively. The effect size for the 1994 Cohort was 1.41 for grade 1. However, the samples of SFA students in this group were extremely small (n = 7 for the 1992 Cohort, n = 4 for the 1993 Cohort, and n = 4 for the 1994 Cohort).

For the Other ESL group, the SFA program effects were small to moderate. The effect sizes for the 1992 Cohort were .24, .25, and .05 for grades 1, 2, and 3, respectively. The effect sizes for the 1993 Cohort were .96 and .49 for grades 1 and 2, respectively. The effect sizes for the 1994 Cohort were nil. Again, the general trend was decreasing effect sizes over time.

To address the general trend toward smaller effect sizes over time within cohort, the authors provided grade equivalents for each cohort and analytical group. For the Spanish Bilingual and Other ESL groups (the Spanish ESL sample sizes are too small to be trusted), however, the treatment and control grade equivalents at Grade 3 appear quite similar. Without tests of statistical significance, a case for non-decreasing effects is difficult to make.

Study 9

Evaluation Methodology

Design: This study used a cluster randomized trial design to identify the effects of using embedded multimedia in SFA programs. Staff from ten SFA elementary schools in an inner-city Hartford, CT, school district agreed to implement the embedded multimedia component. Five of the ten schools were randomly chosen to implement the multimedia component of SFA, and the other five served as the control group for the first year, using SFA without multimedia. After the first year, the control group was given the embedded multimedia component.

The components of the embedded media treatment included:

  • Animal Alphabet: Animations that teach and reinforce sound/symbol relationships.
  • The Sound and the Furry: Videos in which SFA puppets model the word blending process, phonemic awareness, spelling, fluency, reading strategies, and cooperative routines.
  • Word Plays: Live action videos of skits dramatizing important vocabulary concepts from the Success for All beginning reading texts.
  • Between the Lions: Clips from the award-winning PBS program in which puppets and animations teach phonemic awareness, sound/symbol correspondence, and sound blending.

The subjects were SFA first grade students who were pretested in early October 2003 and posttested in early May 2004.

Sample characteristics: The SFA embedded media schools and the SFA control schools were very similar. Each group of schools enrolled just over 200 first-grade students; more than 95% of the students in each group received free lunch; and about 30% of the students in each group were classified as limited English proficient. The racial/ethnic distribution was also very similar, with both groups of schools enrolling about two-thirds Hispanic and one-third African American students. The authors did not provide characteristics of the actual sample of first-grade students.

Attrition: Of the 450 first graders enrolled in all ten schools in the fall of 2003, 394 completed pre- and posttests (n=189 in treatment schools, n=205 in control schools).

Measures: The pretests were the Peabody Picture Vocabulary Test (PPVT) and the Word Identification subtest from the Woodcock Reading Mastery Test. Each testing session took approximately 30 minutes per child.

The posttests were the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) and three scales from the Woodcock Reading Mastery Test: Word Identification, Word Attack, and Passage Comprehension. Testing sessions were about 42 minutes per child.

Analyses: The data were analyzed using hierarchical linear modeling with students nested within schools. The dependent variables were the DIBELS score and the three subscales of the Woodcock Reading Mastery Test. The independent variable was treatment condition, and the PPVT and Word Identification pretests were used as covariates. The analysis was conducted on the entire sample and on a subsample of Hispanic students.
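The nesting described here is typically handled with a random intercept for each school, so that students in the same school are allowed to be correlated. A hedged sketch of what such a two-level model might look like, on simulated data (this is not the evaluators' code; all variable names, sample sizes, and effect values are hypothetical):

```python
# Illustrative two-level (students-within-schools) model on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_schools, n_students = 10, 40
rows = []
for s in range(n_schools):
    treat = s % 2                      # half the schools get the program
    school_effect = rng.normal(0, 2)   # shared random intercept per school
    for _ in range(n_students):
        pretest = rng.normal(50, 10)
        post = (0.6 * pretest + 5 * treat        # built-in treatment effect
                + school_effect + rng.normal(0, 5))
        rows.append({"school": s, "treat": treat,
                     "pretest": pretest, "post": post})
df = pd.DataFrame(rows)

# Random intercept for school; condition and pretest as fixed effects
model = smf.mixedlm("post ~ treat + pretest", df, groups=df["school"])
result = model.fit()
print(result.params["treat"])  # estimated treatment effect
```

Because treatment is assigned at the school level, ignoring the school-level grouping would overstate the precision of the treatment estimate; the random intercept is what keeps the standard errors honest.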

Outcomes

Baseline Equivalence: The authors did not provide demographic baseline equivalency data on the first grade students. However, there was no significant difference in the pretests between SFA treatment and SFA control students. No significant difference existed between the embedded media SFA schools and the SFA control schools on mean PPVT and mean Word Identification score. However, at the individual level, the Word Identification scores for students from the control schools were higher (p<.01) than Word Identification scores for students from the embedded media SFA schools.

Fidelity: The researchers did not measure or report on fidelity.

Differential Attrition: The authors did not present an analysis of the 56 students who did not complete both pre- and posttests.

Posttest: Only one of the four outcome measures showed significant effects for the embedded media SFA program. Specifically, embedded multimedia SFA schools scored significantly higher than the control SFA schools on the Word Attack subtest (p<.05; individual-level effect size=.47), but did not score significantly better on Word Identification, Passage Comprehension, or the DIBELS assessment. This pattern of outcomes held for the Hispanic subset as well.

The authors expected Word Attack to show the strongest effects because three of the four multimedia segments dealt primarily with letter sounds and sound blending, which are key components of Word Attack. The fourth, Word Plays, focused on vocabulary. The other measures, especially Passage Comprehension and DIBELS, are more closely tied to reading of connected text, which was emphasized equally in both groups.

Study 10

Evaluation Methodology

Design: This study used a randomized controlled trial to estimate program impacts on K-2 reading over three years of a multi-year evaluation project. The study recruited five school districts in four states, for a total sample of 37 schools, and examined the effects of the intervention from the 2011-2012 school year through the 2013-2014 school year. Each school had to be willing to participate and to meet the following eligibility criteria: it had to serve students from kindergarten through fifth grade; at least 40% of students had to be eligible for the free and reduced-price lunch program; it had to identify a school staff member to serve as program facilitator; and at least 75% of teachers had to vote to adopt the program. The 37 schools were randomly assigned to condition, resulting in 19 intervention schools and 18 control schools.

The study followed the 2,956 kindergarten students enrolled in the 37 schools in the fall of the 2011-2012 school year who were not enrolled in separate special education classes. Pretests were given in the fall, and first-year posttests were administered in the spring. The analysis sample included 2,568 kindergarten students who were present in the study schools in the fall and spring of the school year and who had valid spring test scores. An additional sample used in supplemental models included any kindergarten student with a valid spring test score, regardless of whether the student was enrolled in the study school in the fall (N=2,897).

Follow-up data from the spring of students' first-grade year were collected in 2013. The analytic sample comprised 2,251 students (though the N was as low as 2,147 for one measure) who remained enrolled in a school of the same type (treatment or control) and completed spring assessments.

At the 3-year follow-up in 2014, up to 1,635 students (55%) had scores on the outcome measures.

Sensitivity analyses were also performed among all students completing measures in first and second grade regardless of whether students attended a program school in previous years (N ranged from 2,802 to 2,962 across measures).

Of those enrolled in a study school at baseline, 10.4% of program students and 9.8% of control students transferred to a non-study school. Some students transferred from one study school to another, and these students' treatment statuses were determined by the status of the fall school. Of the students in the program group at baseline, 0.9% transferred to a control group school; of those in control schools at baseline, 0.6% changed to a program group school. Of the total treatment sample, 63% were in the treatment group for all 3 years.

Sample Characteristics: Study schools were located in the West, South, and Northeast regions of the country, with most located in large or midsize cities. The average school enrollment was 547 students. Across the sample, the kindergarten students averaged 5.5 years old and were evenly divided by gender. Most students were Hispanic (64-65%), followed by black (20%), white (13-14%), other race/ethnicity (1-2%), and Asian (1-2%). Over 88% of the students came from families in poverty. Between 18% and 25% of the students were English language learners, and a small percentage (8%) were in special education.

Measures: At posttest, two measures came from the "Basic Reading" achievement cluster of the Woodcock-Johnson III Tests of Achievement, developed and validated by others. Students who were instructed primarily in Spanish were given Spanish and English versions of these assessments. The study used raw scores for these measures, since the standard scores would rely on the test's norming sample that was reported to be out of date (p. 126).

  • Letter-word identification test. This assessment measures a student's letter and word identification skills and tests reading decoding.
  • Word attack test. This test measures a student's ability to apply phonic/decoding skills to unfamiliar words.

At the first- and second-grade follow-ups, two additional measures assessed more advanced reading skills:

  • Test of Word Reading Efficiency. Assesses efficiency of sight word recognition and phonemic decoding in children.
  • Passage Comprehension. Students orally supply the missing word removed from a sentence or brief paragraph.

The study also administered the letter-word test at baseline. Additionally, the following measure was collected at baseline:

  • Vocabulary test score, using the Peabody Picture Vocabulary Test, developed and validated by others.

Analysis: The study conducted two-level hierarchical models that nested students within schools and treated the five districts as fixed effects. Models included school- and student-level covariates. It appears that student-level and school-average baseline scores on the letter-word and vocabulary tests were controlled (Appendix F, pp. 123-24 in Quint et al., 2013). Baseline word attack scores do not appear to have been included as covariates, but the test may not have been developmentally appropriate at pretest. To adjust for multiple tests, the year-1 analysis applied the Benjamini-Hochberg procedure, while the year-2 and year-3 analyses noted adjusted results in appendices and footnotes rather than applying the adjustment throughout. Additional analyses were performed for the full sample of students assessed in spring of first grade, regardless of kindergarten program exposure.
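The Benjamini-Hochberg step-up procedure referenced here controls the false discovery rate by comparing the i-th smallest of m p-values to (i/m)·α and rejecting all hypotheses up to the largest rank that clears its threshold. A minimal sketch of the procedure (the p-values are invented for illustration, not the study's actual values):

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])  # reject all hypotheses up to rank k

# Hypothetical p-values for four reading outcomes (illustration only)
pvals = [0.011, 0.02, 0.04, 0.60]
print(benjamini_hochberg(pvals))
```

Note that a p-value can fail its own threshold yet still be rejected if a larger p-value clears a later threshold, which is what makes this a step-up rather than a step-down procedure.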

Moderation analysis applied the same multilevel models to the following subgroups: Blacks, Whites, Hispanics, males, females, special education, not special education, English language learners, non-English language learners, poverty status, and not poverty status. Additionally, models determined whether program effects varied across initial achievement levels through terms interacting condition status with baseline vocabulary test score and baseline letter-word test score.

Intent-to-Treat: The study used all subjects with outcome data. Students with missing outcomes, due primarily to transfers to non-study schools and secondarily to missed assessments, were dropped from the analysis. Students new to the study schools, and thus not present for the full program, were included in separate analyses. Students missing covariates (but not outcomes) were retained, with indicators flagging the missing values.

Outcomes

Implementation Fidelity: Although teachers voiced some concerns, "by the end of the first year, all but one of the study schools were deemed to have met SFAF's standards for adequate first-year implementation, although there was also considerable room for improving the breadth and depth of that implementation" (p. ES-4, 2013). Further information on implementation fidelity is reported in Chapter 4 of the 2013 report and Chapter 3 of the 2015 report.

By the end of the second year (Quint et al., 2014, pp. 5-13), "program group schools improved their implementation of SFA… [putting] in place new practices that they had not previously implemented, and they increased the proportion of classrooms within a school where SFA-prescribed practices were in evidence" (p. 5). However, because standards for implementation become stricter as schools progress with the program, only "16 of the 19 program schools were judged to meet SFAF's standards for adequate implementation fidelity" (p. 8). Teachers implementing the program "reported feeling much more at ease with the SFA initiative in the second year than in the first year, although they continued to express some concerns about the program" (p. 11). Perhaps most notably, intervention group teachers were significantly less likely than controls to believe that their reading program helps adequately prepare students to do well on state achievement tests. During the third year (Quint et al., 2015, p. 27), 17 of 19 schools achieved adequate implementation fidelity.

Baseline Equivalence: Program and control schools did not differ on free and reduced-price lunch eligibility, race/ethnicity, sex, school enrollment, number of full-time teachers, or percentage of students at or above reading proficiency level. Program and control students did not differ on age, poverty status, race/ethnicity, sex, special education status, or vocabulary test score. Marginally significant differences (p<.10) across condition status were noted for English language learner status and letter-word identification test score. See Quint et al. (2013, p. 14).

Differential Attrition: Reports for all three years tested for differing rates of attrition by condition, and two reports examined differential attrition by testing for baseline equivalence in the analytic sample after excluding dropouts.

At the end of year 1 (Quint et al., 2013), there were no statistically significant differences across conditions for students transferring schools, including changes to another study school, to a non-study school, or to either a study or non-study school in spring of students' Kindergarten year. In addition, there was no significant relationship between condition status and the proportion of in-movers (students enrolled in a study school in the spring, but not the fall).

At the end of year 2 (Quint et al., 2014), tests for differential attrition among those retained in the spring of students' first grade year revealed no significant differences in response rates by condition, but one marginally significant difference (p= .058) on teacher surveys measuring implementation. Baseline sociodemographic or outcome measures were not tested for differential attrition.

At the end of year 3 (Quint et al., 2015, Table 2.5), the study reported no significant differences in attrition across conditions. Further, tests for baseline equivalence of the analysis sample (Table 2.4), which excluded those lost to attrition, revealed no significant differences across conditions. Appendix B indicates some differential attrition. Specifically, Table B.3 shows that out-movers differed significantly on several measures from those retained for the analysis sample, and Table B.4 shows that out-movers in the intervention group differed significantly on several measures from the control group out-movers. However, based on Table 2.4, the differential attrition was not strong enough to compromise the randomization.

Kindergarten Posttest: Adjusting for multiple hypothesis testing, the intervention group scored marginally significantly higher on the word attack test (p<.10), but not on the letter-word test. Without the adjustment, the program's impact on intervention group word attack scores was significant (effect size=.18). Results using a sample that also included students who were not enrolled in the study schools in the fall were the same, with word attack scores significantly improved in the treatment group (effect size=.18).

Moderation Analysis: Positive and significant program effects for the word attack test were observed for males, black students, students in poverty, non-English language learners, and students not in special education. Hispanic and female students showed marginally significant improvements on word attack, while whites, students in special education, English language learners, and students not in poverty did not differ. No significant differences on letter-word test for any subgroup were reported. Among students who primarily received reading instruction in Spanish, analysis revealed no significant differences across conditions on four measures (English and Spanish letter-word and word attack tests).

Additional models found that program effects did not vary by initial achievement. For both outcome measures, terms interacting condition status with baseline vocabulary test score and baseline letter-word test score were not significant when included separately or together.

First Grade Follow-up: By spring of the students' first-grade year, the treatment group had made significant small-to-moderate improvements in word attack (effect size = .35) and marginally significant improvements in word identification (p = .08, effect size = .09) compared to controls. No treatment effects were observed for higher-level reading skills such as reading efficiency or passage comprehension.

A supplementary analysis examining whether program effects persisted among a sample of all students who completed measures in spring (including those who did not attend program schools in Kindergarten) indicated that the treatment was still positively associated with improvements in word attack, but not word identification, relative to controls.

Moderation Analysis: Positive, significant impacts of the program were observed for letter-word identification among Hispanic and female students. Similarly, Black, Hispanic, female, male, and non-English language learner students receiving the intervention improved in word attack, relative to like controls. Treatment group Whites also improved in passage comprehension. However, special education students performed significantly worse than their control group counterparts on three of four measures (letter-word identification, word attack, and passage comprehension), an iatrogenic effect.

Second Grade Follow-up: The study reported significant improvement in the treatment group for the Woodcock-Johnson Word Attack subtest of phonics decoding skills (p=.022, d = .15), but not for the other three reading tests. The program also had no impact on school-level measures of special education or grade retention rates.

For a subset of the sample with Woodcock-Johnson Letter-Word Identification scores below the median of the primary sample, the intervention had some additional marginal effects. Among these "lower performing" students, the treatment group had better scores on the Woodcock-Johnson Letter-Word Identification (p=.074) and Word Attack (p=.014) tests and on the Test of Word Reading Efficiency (p=.099) at the second-grade follow-up. There were no moderation effects for the Peabody Picture Vocabulary Test. The study reported that results for sociodemographic groups were consistent with earlier results.

Study 11

Evaluation Methodology

Design

The study evaluated the effects of the Success for All program using a quasi-experimental design. Twenty schools from a range of regional contexts throughout England that were already using the program were recruited in spring of 2008 to participate in the evaluation. Once these treatment schools consented to participate, researchers recruited 20 control schools whose academic and student demographic characteristics matched those of the treatment schools. The matching used prior test scores, the percentage of free lunch eligible students, and the percentage of students with English as an additional language for the full school rather than for the kindergarten subjects. The study did not present the number of students assigned to each group at baseline.

Baseline measures were collected from students attending the 40 participating schools in fall of their reception (kindergarten) year (September 2008). Measures were also collected at the end of kindergarten (spring 2009) and at the end of grades 1 and 2, though only the grade 2 (posttest) results were presented. At posttest, 36 schools (90%) were retained, 18 in each condition. The number of students in the posttest analysis varied by outcome: Tables 2 and 3 show that the control group ranged from 381 to 471 students and the intervention group from 356 to 415.

Sample

Little information was given describing the kindergarten student sample, though aggregate measures suggest that about 40% of pupils were eligible for free school meals, about 35% were English language learners, 23% had special educational needs that were met by the school, and 13% had special educational needs that were met by outside specialists.

Measures

The study's outcome measures were collected at posttest, in the spring of students' 2nd grade year. As with the other studies, measures primarily come from the Woodcock-Johnson Tests of Achievement, which was normed in the U.S. Testers were blind to condition.

  • Letter-word identification test. This assessment measures a student's letter and word identification skills and tests reading decoding. Reported internal consistency was .97.
  • Word attack test. This assesses a student's ability to apply phonic/decoding skills to unfamiliar words. Reported internal consistency was .87 for the measure.

Additional measures of higher-order reading accuracy, reading rate, and comprehension came from the York Assessment of Reading Comprehension. Reliability for the three constructs was .87, .95, and .62 among the posttest sample.

Baseline reading ability was assessed using a more developmentally appropriate measure, the British Picture Vocabulary Scale - Second Edition, an English adaptation of the Peabody Picture Vocabulary Test. Cronbach's alpha for the measure in a national sample of English children was .93.

Analysis

The program's impact on reading outcomes at posttest was estimated using multilevel regression models, with students nested within schools. Analyses adjusted for baseline picture vocabulary scores at the school level, but not for demographic characteristics that differed between treatment groups.

The study used all schools that were willing to continue to provide data and all students who were present on testing days. No effort was made to follow students who moved out of the study schools or into another study school.

Outcomes

Implementation Fidelity: Schools were rated by program personnel on 19 items related to teacher and student behaviors. Each item was rated on a scale of 0 to 3, with 3 indicating the highest fidelity. Nearly all of the 18 intervention schools had medium or high implementation ratings: 10 schools received ratings of 3, 7 schools were rated 2, and only 1 school was rated 1.

Baseline Equivalence: Despite the matching strategy used to identify control sites, treatment schools had significantly more students eligible for free lunch and a significantly greater proportion of students learning English as a second language. Schools did not differ significantly on baseline reading measures.

Differential Attrition: Groups in the analytic sample used for the posttest results differed significantly on one baseline measure: the percentage of English language learners.

Posttest: Program schools significantly improved on 2 of 5 literacy outcomes relative to controls: word identification and word attack. Standardized effect sizes suggest that the treatment led to small improvements in basic reading skills (d = .20 and .25, respectively). Results also favored the treatment group on higher-level reading skills, but those differences were non-significant.