Reading Recovery

A one-on-one tutoring intervention designed to reduce the number of first-grade students who have extreme difficulty learning to read and write and to reduce the cost of these learners to educational systems.

Fact Sheet

Program Outcomes

Academic Performance

Program Type

Academic Services
Mentoring - Tutoring
School - Individual Strategies
Skills Training

Program Setting

School

Continuum of Intervention

Indicated Prevention

Age

Late Childhood (5-11) - K/Elementary

Gender

Both

Race/Ethnicity

Endorsements

Blueprints: Promising
What Works Clearinghouse: Meets Standards Without Reservations - Positive Effect

Program Information Contact

Reading Recovery Council of North America
500 West Wilson Bridge Road, Suite 250
Worthington, Ohio 43085-5218
Phone: 614-310-READ (7323)
Main Fax: 614-310-7345
Conference Dept. Fax: 614-310-7342
http://www.readingrecovery.org/

Program Developer/Owner

Dr. Marie M. Clay, Deceased

Brief Description of the Program

The program is an intensive one-to-one tutoring intervention for the poorest readers (lowest 20%) in first-grade classrooms. During daily 30-minute lessons, teachers who are specifically trained in Reading Recovery techniques individually tutor up to eight faltering readers to help them develop the kinds of strategies that good readers use. For the first 10 days, the teacher does not teach but rather explores reading and writing with the child to determine specific needs. During the following days, Reading Recovery lessons revolve around reading small story books (the teacher chooses from 500 books organized into 20 reading levels), manipulating letters and words, and composing and writing a story. Specific skills taught include problem-solving strategies based on self-monitoring, cross-checking, predicting, and confirming, as well as the use of multiple sources of information while reading and writing. Children typically leave the program within 12 to 20 weeks (60 sessions), depending on when they reach the average level of text reading for their class.

Reading Recovery originated in New Zealand and has been a nationwide program in that country since 1979. It has been successfully adapted and tested for four years in Ohio and is now being disseminated to many other locations throughout the United States, Canada, and Australia. Reading Recovery in the U.S. is a collaboration between universities and school districts, involving a one-year academic course for teachers. By the early 1990s, Reading Recovery was operating in 48 states.

The program is an intensive one-to-one tutoring intervention targeting the lowest-achieving 20% of first-grade readers. During daily 30-minute lessons, teachers who are specifically trained in Reading Recovery techniques individually tutor faltering readers to help them develop the kinds of strategies that good readers use. Lessons are tailored to a student's individual strengths and needs based on teacher observations. Teachers must be able to make highly skilled decisions at each moment during the lesson. Once fully trained, Reading Recovery teachers provide lessons to approximately eight first-grade students. These students are served during half of the teacher's work day. During the other half of the day, the Reading Recovery teacher performs additional duties that vary by individual, such as classroom instruction, small-group work, or instructional coaching.

Reading Recovery focuses on phonemic awareness, phonics, vocabulary, fluency, and comprehension. The program starts with an assessment of the child's strengths and weaknesses (letter identification, word test, concepts about print, writing, dictation test, text reading). For the first 10 days, the teacher does not teach but rather explores reading and writing with the child. During the following days, Reading Recovery lessons revolve around reading small story books (the teacher chooses from 500 books organized into 20 reading levels) and composing and writing a story. Specific skills taught include problem-solving strategies based on self-monitoring, cross-checking, predicting, and confirming, as well as the use of multiple sources of information while reading and writing. Once students are equipped with these strategies for independent processing, struggling readers can achieve at average levels and maintain proficiency in the regular classroom without special intervention.

Lessons are discontinued when students demonstrate the ability to consistently read at the average level for their grade, between weeks 12 and 20 of the program. Those who make progress but do not reach average classroom performance after 20 weeks are referred for further evaluation and a plan for future action.

Teacher training includes a one-year university-based training program through a network of partner universities and ongoing professional development by a Reading Recovery teacher leader.

Outcomes

Primary Evidence Base for Certification

Study 10

May et al. (2013) found that the intervention group relative to the control group scored significantly higher on composite reading, the reading words subscale, and the reading comprehension subscale.

Study 12

May et al. (2014) found that the intervention group relative to the control group scored significantly higher on total reading, the reading words subscale, and the reading comprehension subscale.

Brief Evaluation Methodology

Primary Evidence Base for Certification

Of the 13 studies Blueprints has reviewed, two (Studies 10 and 12) meet Blueprints evidentiary standards (specificity, evaluation quality, impact, dissemination readiness). The studies were conducted by independent evaluators.

Study 10

May et al. (2013) used a randomized controlled trial to examine 1,253 first-grade students with low reading scores who attended 158 schools. Students were randomly assigned to the intervention group or an instruction-as-usual control group. Assessments at the end of the intervention measured reading skills with standardized tests.

Study 12

May et al. (2014) used a randomized controlled trial to examine 2,092 first-grade students with low reading scores who attended 267 schools. Students were randomly assigned to the intervention group or an instruction-as-usual control group. Assessments at the end of the intervention period measured reading skills with standardized tests.

Blueprints Certified Studies

Study 10

May, H., Gray, A., Gillespie, J. N., Sirinides, P., Sam, C., Goldsworthy, H., . . . Tognatta, N. (2013). Evaluation of the i3 scale-up of Reading Recovery: Year one report, 2011-12. Philadelphia, PA: Consortium for Policy Research in Education.

Study 12

May, H., Goldsworthy, H., Armijo, M., Gray, A., Sirinides, P., Blalock, T. J., . . . Sam, C. (2014). Evaluation of the i3 scale-up of Reading Recovery: Year two report, 2012-13. Philadelphia, PA: Consortium for Policy Research in Education.

Risk and Protective Factors

Risk Factors

School: Poor academic performance*

Protective Factors

Individual: Problem solving skills

School: Instructional Practice

* Risk/Protective Factor was significantly impacted by the program

Subgroup Analysis Details

Subgroup differences in program effects by race, ethnicity, or gender (coded in binary terms as male/female) or program effects for a sample of a specific racial, ethnic, or gender group:

Study 10 (May et al., 2013) tested for within-subgroup program effects and found significant benefits for students in rural schools but not in comparison to students in other schools.
Study 12 (May et al., 2014) tested for within-subgroup program effects and found significant benefits for students in rural schools but not in comparison to students in other schools.

Sample demographics including race, ethnicity, and gender for Blueprints-certified studies:

In Study 10, White students comprised the largest percentage of the group (56-57%), followed by Hispanic students (20-22%), Black students (18-19%), and students of other races (3-5%).
In Study 12, students in the sample were 58-60% male, 55% white, 21% Hispanic, and 16% Black. About 21% were English-language learners.

Training and Technical Assistance

Each site must train a Reading Recovery teacher leader at a recognized Reading Recovery university training center (UTC) or have access to a trained teacher leader. Teacher leaders train Reading Recovery teachers and oversee implementation in the site. The salaries of teacher leaders vary according to location and experience, and training costs vary across UTCs. Reading Recovery teachers are trained for an academic year, with ongoing professional development in subsequent years, while providing Reading Recovery in a school as part of their teaching load. Each school will have one or more teachers assigned to this role, based on the number of students needing the intervention. Program guidelines limit the number of teachers supported and monitored by a teacher leader to 42. Training includes:

An Observation Survey of Early Literacy Achievement (Clay, 2002, 2006, 2013) is used for training assessment procedures. Assessment training lasts four days and includes monitored testing of children.
Teacher training includes a weekly university course for an academic year, including the observation and discussion of live lessons behind a one-way glass. Training materials include two books that detail teaching procedures and implementation issues (Literacy Lessons Designed for Individuals Part One and Part Two, Clay, 2005a and 2005b) and a nonconsumable set of books for children to read.
In addition to weekly classes, the teacher leader makes a minimum of four school visits to support teachers and is available for ongoing consultation.
Additional texts and articles are used to enhance training and ongoing implementation. The publication entitled Standards and Guidelines of Reading Recovery in the United States (RRCNA, 2015) details implementation standards (required) and guidelines (recommended) for Reading Recovery sites, including requirements for professional development for teachers and teacher leaders. Intensive year-long training of teacher leaders at university training centers is guided by a Teacher Leader Preparation Framework to ensure fidelity of the intervention.

After the initial training year, a registered Reading Recovery teacher leader provides annual professional development for Reading Recovery teachers. This includes a minimum of six sessions, some of which must include live lessons, and at least one school visit. Training and implementation materials continue to be used to deepen understandings of teaching and implementing within a school setting.

Teacher leaders and site coordinators protect the integrity of the standards for implementing Reading Recovery and communicate to various constituents. University trainers provide technical support to each site's teacher leader and site coordinator.

More specific information about training and support follows:

University Course (Initial Training)

Reading Recovery teacher training begins with a week of intensive assessment training for which university credit may or may not be granted. Then Reading Recovery teachers begin a weekly university course (with credits) taught by the Reading Recovery teacher leader for the site (schools within an established Reading Recovery teacher training site). The site pays university tuition to a Reading Recovery-affiliated university for each teacher-in-training. Costs vary among universities. The weekly classes last for an academic year and include the following:

Weekly sessions make extensive use of a one-way glass screen/mirror through which teachers observe colleagues working with a child and put their observations and analyses into words as they build new understandings to inform teaching decisions. All teachers are required to teach behind the one-way glass.
Teachers learn teaching procedures and study the theoretical rationales for selecting appropriate procedures to meet the current needs of each child.
Application of learning is gained by teaching four children individually on a daily basis (as each child completes the series of lessons, another child takes that place and begins individual lessons).
Attention is given to daily and weekly records of each child's reading and writing behaviors to analyze progress and solve problems.
Teachers learn procedures for submitting data to the International Data Evaluation Center.
The teacher leader makes on-site school visits to observe the teachers-in-training and provide ongoing support.

Reading Recovery teacher training is comprehensive, complex, and intensive because each teacher must learn to design and deliver individual daily lessons. No prescriptive manual or packaged set of materials can meet each child's individual needs. Reading Recovery teachers must learn to

assess each child's current understandings,
closely observe and record behaviors for evidence of progress,
use teaching procedures competently and appropriately,
design individual series of lessons daily,
critically evaluate themselves and their peers,
understand the theory behind their teaching,
use data to inform teaching, and
communicate about Reading Recovery in their schools.

The university course is generally taught at a facility constructed by the training site - a special facility with a one-way glass and appropriate space, furniture, and materials. This facility is often in a school or a central office facility within the site. Sessions are very interactive, involving live teaching sessions accompanied by verbal observations of the child's literacy behaviors and the teacher's decisions. Time is also dedicated to discussion of teaching procedures, designing lessons, gaining understanding of the theory driving the teaching, and using data to make teaching decisions. Classes also address processes for implementation within the schools.

School Visits

Teacher leaders make school visits to Reading Recovery teachers-in-training and to trained teachers. A typical visit may involve the observation of two lessons accompanied by conversations about the progress the children are making. The two professionals use records and observations as data to discuss decisions and future actions for each child. Most school visits last approximately 2 hours, but time is dependent on the purpose of the visit and specific needs of the teacher and/or children.

Additional Support

Teacher leaders provide continuing support to teachers via email, phone, and other means of communication as appropriate. There are no established time limitations for this support; however, realistic limitations are determined by the workload of the teacher leader.

Teacher leaders participate in ongoing professional development through university training centers and annual institutes organized by the North American Trainers Group (NATG), a network of academics from each of these university centers.

Benefits and Costs

Source: Washington State Institute for Public Policy
All benefit-cost ratios are the most recent estimates published by The Washington State Institute for Public Policy for Blueprint programs implemented in Washington State. These ratios are based on a) meta-analysis estimates of effect size and b) monetized benefits and calculated costs for programs as delivered in the State of Washington. Caution is recommended in applying these estimates of the benefit-cost ratio to any other state or local area. They are provided as an illustration of the benefit-cost ratio found in one specific state. When feasible, local costs and monetized benefits should be used to calculate expected local benefit-cost ratios. The formula for this calculation can be found on the WSIPP website.

Program Costs

Start-Up Costs

Initial Training and Technical Assistance

Each site (which can include multiple school districts) must train a Reading Recovery Teacher Leader at a recognized Reading Recovery University Training Center (UTC) or have access to a trained Teacher Leader. Teacher Leader training generally requires two semesters of full-time course work prior to serving in the role. Teacher Leaders train Reading Recovery teachers and oversee implementation in the site. Salaries of teacher leaders vary according to location and experience and training costs vary across UTCs.

In addition, the site pays university tuition to a Reading Recovery UTC for each teacher-in-training. The initial training generally requires two to three hours per week for the first year of implementation (participating teachers earn approximately 6 graduate credits, depending on the university). There are approximately 20 UTCs across the U.S. Costs vary among universities. Individuals interested in implementing should contact their nearest UTC for specific costs. Links to the UTCs are available at: http://www.readingrecovery.org/development/centers/index.asp

Curriculum and Materials

For each teacher, the site or the school buys training materials, testing materials, and a set of nonconsumable books to use with children. The cost is approximately $2,500 to $3,500 per teacher.

Licensing

Each year, when a Reading Recovery site has met the Standards and Guidelines of Reading Recovery in the United States (RRCNA, 2009), it is granted use of the Reading Recovery trademark. This trademark is royalty-free.

Other Start-Up Costs

If not already available, an initial cost for a Reading Recovery site is the construction of a specified training area with a one-way glass (small room for teaching and larger room for observing). This area is often located in a school or central office building. Only one is needed per site. A one-time cost for furniture and equipment for the training room should be considered.

A site coordinator (local administrator) is designated to support site implementation. Generally, Reading Recovery responsibilities are incorporated into an existing administrative position.

Intervention Implementation Costs

Ongoing Curriculum and Materials

Replacement costs and consumable items are estimated at $300 per teacher annually.

Staffing

Qualifications: Teacher Leaders and teachers delivering the intervention are typically certified teachers. Teachers deliver the intervention for at least 2.5 hours of the school day. Teacher Leaders are generally devoted to Reading Recovery full-time.

Ratios: Reading Recovery is a one-on-one intervention.

Time to Deliver Intervention: Reading Recovery requires one half hour of individual instruction per student per day for a period of 12-20 weeks.

Other Implementation Costs

A site coordinator (usually a local administrator) assists with allocation of part-time clerical support as needed for Reading Recovery Teacher Leaders; additional costs are related to communications, printing, shipping, equipment such as computers, etc.

Implementation Support and Fidelity Monitoring Costs

Ongoing Training and Technical Assistance

Sites fund ongoing professional development sessions for Reading Recovery teachers and school visits by a Teacher Leader (costs vary; some sites absorb cost in salary of Teacher Leader while others charge a fee per teacher for ongoing professional development and support).

Sites also budget for Teacher Leader(s) to participate in two required annual national/regional institutes/conferences for their continuing professional development.

Finally, sites pay annual technical support fees to their UTC. These fees vary across UTCs.

Fidelity Monitoring and Evaluation

Sites fund two visits from a UTC trainer during their field year (year after training of teacher leader) for implementation support and fidelity monitoring.

Each Reading Recovery site pays $350 per year for services of the International Data Evaluation Center (IDEC). Data costs also include $50 per teacher within the site. The IDEC collects, analyzes, and reports data for every child served by Reading Recovery.

Ongoing License Fees

None.

Other Implementation Support and Fidelity Monitoring Costs

No information is available

Other Cost Considerations

Reading Recovery teachers work in other roles during the remainder of their school day. Their specialized training is a value-added benefit for children, teachers, and schools. If a trained Teacher Leader is working with nearby districts, it may be possible for a new district to access training from that Teacher Leader and save costs on initial training. While the first year of training for Teacher Leaders and teachers is intense, the ongoing costs of implementing the program drop significantly after the first year.

Year One Cost Example

The following example is for a district implementing Reading Recovery with support for one Reading Recovery teacher leader and 28 Reading Recovery teachers. Tuition estimates for teacher leaders and teachers are based on estimates of the national average cost of graduate schools in education. Actual costs will vary based on the costs at the local University Training Center (UTC). The staffing cost of the full-time Teacher Leader is included in the estimate, but it is assumed that the district is reallocating the time of existing teachers to implement Reading Recovery.

Teacher Leader Tuition at UTC (18 credits x $386 per credit)	$6,948.00
Teacher Tuition for UTC credits for training (6 credits x $386 per credit x 28 teachers)	$64,848.00
Curricula and Materials ($3,000 x 28 teachers)	$84,000.00
Data Collection from IDEC ($350 + ($50 x 28 teachers))	$1,750.00
Teacher Leader Travel to National Conferences	$2,000.00
Site Visits from UTC Trainer	$500.00
Teacher Leader Staffing Costs (salary and fringe)	$100,000.00
Teacher Leader Travel to Site	$1,000.00
Administrative Overhead at 20% of staffing costs	$20,000.00
Total One Year Cost	$281,046.00

With 280 students served in the first year, the cost would be approximately $1,003 per student. Note that the cost of the intervention would decline significantly beyond the first year of implementation, after the initial training costs have been paid for the Teacher Leader and teachers. The per-student cost of Reading Recovery has been estimated at approximately $4,000 when the salaries of the Teacher Leader and those implementing the intervention are included for the time spent on Reading Recovery, and at $109 per student when one assumes reallocation of current teacher time and excludes staff salaries from the cost estimate.
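The arithmetic behind the Year One Cost Example can be reproduced with a short script. This is an illustrative sketch only: the variable names are hypothetical, the dollar figures are the ones listed in the example above, and actual costs will differ by UTC and district.

```python
# Year One Cost Example, rebuilt from the line items listed above.
# All names here are illustrative; real costs vary by UTC and district.

CREDIT_COST = 386        # assumed average cost per graduate credit ($), from the example
NUM_TEACHERS = 28        # Reading Recovery teachers in the example district
STUDENTS_SERVED = 280    # students served in the first year, from the example
STAFFING = 100_000       # Teacher Leader salary and fringe ($)

line_items = {
    "Teacher Leader tuition (18 credits)": 18 * CREDIT_COST,
    "Teacher tuition (6 credits each)": 6 * CREDIT_COST * NUM_TEACHERS,
    "Curricula and materials": 3_000 * NUM_TEACHERS,
    "IDEC data collection": 350 + 50 * NUM_TEACHERS,
    "Teacher Leader travel to national conferences": 2_000,
    "Site visits from UTC trainer": 500,
    "Teacher Leader staffing": STAFFING,
    "Teacher Leader travel to site": 1_000,
    # Administrative overhead is 20% of staffing costs.
    "Administrative overhead": STAFFING * 20 // 100,
}

total_cost = sum(line_items.values())
cost_per_student = total_cost / STUDENTS_SERVED

for name, amount in line_items.items():
    print(f"{name:<50} ${amount:>12,.2f}")
print(f"{'Total one-year cost':<50} ${total_cost:>12,.2f}")
print(f"Cost per student: ${cost_per_student:,.2f}")
```

Swapping in a local per-credit tuition rate, teacher count, and expected number of students served gives a rough first-year estimate for another site, under the same assumption that existing teacher time is reallocated rather than newly funded.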

Funding Strategies

Funding Overview

Many schools/systems use multiple sources to fund Reading Recovery. The available percentages given in the sections below were reported to the International Data Evaluation Center during the 2009-2010 school year. Funding sources may vary year-to-year based on federal, state, and local allocations.

Examples from Reading Recovery teacher training sites are shown below:
1. A Texas site is 100% funded by Texas State Compensatory Education Funds.
2. An Ohio site is funded 99% by Title I and 1% by Title II-A.
3. A Kentucky site reports several sources to fund positions: Read to Achieve state grant (43%); Every1Reads initiative, a partnership between the public system and a local business group (33%); Title I and general funds (24%).
4. An Ohio site reports that past funding has been Title I (85%) and local (15%); a new funding source, the School Improvement Grant, was added last year.
5. An 8-county North Carolina site uses state funds (90%), local funds (6%), and Title I funds (4%).

Allocating State or Local General Funds

State funding sources vary. (In Texas, for example, State Compensatory Education funds are available for supplementary programs to aid students at risk of dropping out of school.) Approximately 30% of Reading Recovery schools use state funding options.

Local funding sources also vary. Approximately 50% of Reading Recovery schools reported using some local funding to implement the intervention.

Maximizing Federal Funds

Title I Part A: Improving America's Schools provides funds for supplemental instructional services to increase student success. These funds are widely used to fund Reading Recovery teachers and their training.

Title II-A provides supplemental financial assistance to ensure that school professionals have access to high-quality professional development. This fund can be used to support the training of Reading Recovery teachers.

Title III provides supplemental funds for language instruction for students with limited English proficiency and recent immigrants. This fund can be used to support the training of teachers for Descubriendo la Lectura (Reading Recovery in Spanish).

IDEA funding may be used to support Reading Recovery training. If districts choose, 15% of these funds can be used to support response to intervention (RTI).

Foundation Grants and Public-Private Partnerships

Approximately 3% of Reading Recovery schools use some private funding from a variety of sources to implement the intervention.

More than $9 million in private funding was pledged to match a 5-year, $45.6 million Investing in Innovation (i3) scale-up grant awarded by the U.S. Department of Education in 2010. The highest level of scientific evidence was required to qualify for this scale-up grant awarded to The Ohio State University. OSU and 18 university partners trained 3,750 new Reading Recovery teachers and 46 teacher leaders and provided lessons to 62,000 Reading Recovery students in 38 states. Trained teachers also reached an additional 336,000 children in small group and classroom teaching during the rest of their school day.

Evaluation Abstract

Program Developer/Owner

Dr. Marie M. Clay, Deceased

Program Outcomes

Academic Performance

Program Specifics

Program Type

Academic Services
Mentoring - Tutoring
School - Individual Strategies
Skills Training

Program Setting

School

Continuum of Intervention

Indicated Prevention

Program Goals

Population Demographics

The program targets all first-grade students who score in the lowest 20% on reading skills, as determined by a Diagnostic Survey and the classroom teachers' judgment. The program is effective among schools in low, average, and high SES settings and has been implemented in various countries (e.g., US, Australia, England). The program has also been used successfully with Spanish speakers (under its Spanish-language name, Descubriendo la Lectura).

Target Population

Age

Late Childhood (5-11) - K/Elementary

Gender

Both

Race/Ethnicity

Risk/Protective Factor Domain

Individual
School

Brief Description of the Program

Description of the Program

Teacher training includes a one-year university-based training program through a network of partner universities and ongoing professional development by a Reading Recovery teacher leader.

Theoretical Rationale

Theory of learning: The program is based on a theory of learning which assumes that people learn by constructing meaning through social interactions. Learners engage in social activities that support their learning, and they gradually take over the process, becoming independent literacy learners.

Theory of instruction: Any theory of learning implies a theory of instruction. Adults help children to solve problems and in the process provide conditions that help the children find the patterns and regularities they will use to solve problems alone at future times. The complexity of whole tasks is maintained, yet each is tailored for the child to participate easily. The involvement of the more expert adult provides demonstrations that communicate information about the way people go about the task.

Reading Recovery provides opportunities for ongoing conversation while the student is engaged in authentic reading and writing tasks. The conversation between teacher and child operates to stimulate, encourage, challenge, and support reading work. This is based on the theoretical assumption that higher mental functions appear first on the social level between people (intercognitive), and later on the individual level, inside the child (intracognitive). This growth occurs in the zone of proximal development, that phase in the development of a cognitive skill where a child has only partially mastered the skill. By employing the skill with the assistance of an adult, the child internalizes it.

Theoretical Orientation

Skill Oriented
Cognitive Behavioral
Social Learning

Brief Evaluation Methodology

Primary Evidence Base for Certification

Study 10

Study 12

Outcomes (Brief, over all studies)

Primary Evidence Base for Certification

Study 10

May et al. (2013) found that the intervention group relative to the control group scored significantly higher on all three reading outcomes (composite reading, reading words subscale, reading comprehension subscale).

Study 12

May et al. (2014) found that the intervention group relative to the control group scored significantly higher on total reading, the reading words subscale, and the reading comprehension subscale.

Effect Size

Study 10 (May et al., 2013) reported values for Cohen's d ranging from .44 to .47. Study 12 (May et al., 2014) reported values for Glass's Δ ranging from .36 to .42.
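Both statistics are standardized mean differences. Cohen's d divides the difference between group means by a pooled standard deviation, while Glass's delta divides it by the control group's standard deviation alone. The sketch below uses these standard textbook definitions. The function names and the scores are hypothetical and are not drawn from either study, whose reported values come from the authors' own statistical models and may have been computed differently.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(
        ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    )
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def glass_delta(treatment, control):
    """Mean difference divided by the control group's standard deviation."""
    mean_diff = statistics.mean(treatment) - statistics.mean(control)
    return mean_diff / statistics.stdev(control)


# Hypothetical post-test reading scores, for illustration only.
treatment_scores = [12, 14, 15, 16, 18]
control_scores = [9, 12, 13, 14, 17]

print(f"Cohen's d:     {cohens_d(treatment_scores, control_scores):.2f}")
print(f"Glass's delta: {glass_delta(treatment_scores, control_scores):.2f}")
```

The two measures coincide when both groups have the same spread and diverge when the intervention changes the variability of scores, which is one reason the two studies' effect sizes are not strictly interchangeable.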

Generalizability

Two studies meet Blueprints standards for high-quality methods with strong evidence of program impact (i.e., "certified" by Blueprints): Study 10 (May et al., 2013) and Study 12 (May et al., 2014). The samples for these studies included first-grade students with reading problems.

Study 10 took place across a diverse set of locations and compared the treatment group to an instruction-as-usual control group.
Study 12 took place across a diverse set of locations and compared the treatment group to an instruction-as-usual control group.

Potential Limitations

Additional Studies (not certified by Blueprints)

Study 1 (Pinnell et al., 1988)

Random assignment was not made for all intervention students.
No clear description of the alternative intervention is provided.
No evidence of the validity of the measures used is provided.
No description of sample characteristics is given.
Attrition was larger than 10% for all groups at the one year and two year follow-up assessment.
Observers were not blind to the condition.

Pinnell, G. S., DeFord, D. E., & Lyons, C. A. (1988). Reading Recovery: Early intervention for at-risk first graders (Educational Research Service monograph). Arlington, VA: Educational Research Service.

Study 2 (Pinnell et al., 1994)

No information on attrition was provided and no analysis of differential attrition was performed.
The intent-to-treat principle was not followed since the authors intentionally dropped schools from their analysis for which the randomization procedure had allegedly not been implemented.
The follow-up period was short (8 months) and only two out of four tests were employed to measure long-term effects.
Observers were not blind to conditions: It is not clearly stated who conducted the tests (most likely the teachers that implemented the program also performed the tests).
No information was presented on selection of school districts and their representativeness.
Reading Recovery schools had already adopted the program and were self-selected rather than randomly assigned.

Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29(1), 8-39.

Study 3 (Burroughs-Lange & Douetil, 2007)

Intervention schools had already chosen to use the program, and therefore were self-selected.
No randomization was used: The choice of the matching schools seems to bias the estimates since schools were intentionally chosen that are characterized by a large number of low-achieving students.
The statistical analysis was conducted at a different level (individuals) than the matching procedure (schools) - the authors failed to use appropriate statistical models to account for this multi-level structure.
The study did not evaluate long-term effects of the program.
The study did not monitor the fidelity of program implementation.

Burroughs-Lange, S., & Douetil, J. (2007). Literacy progress of young children from poor urban settings: A Reading Recovery comparison study. Literacy Teaching and Learning, 12(1), 19-46.

Study 4 (Curry et al., 1995)

No randomization was employed; the sample was self-selected.
Different measures for pretest (MRT) and posttest (ITBS) were used; comparing percentiles of these different measures to evaluate effectiveness of the program is not ideal.
Implementation fidelity was not monitored or evaluated.
No analysis of differential attrition or baseline equivalence was conducted.
Poor methodology and poor reporting.
The study did not follow the intent-to-treat principle.
Wrong level of analysis: analysis was done at the individual level while group assignment was conducted at the school level.

Curry, J., Griffith, J., & Williams, H. (1995). Reading Recovery in AISD. Austin Independent School District: Department of Audit and Evaluation.

Study 5 (Hurry & Sylva, 2007)

No information on attrition was reported and no analysis of differential attrition was performed.
At the school-level no true random assignment was conducted and at the classroom-level only the phonological intervention but not Reading Recovery was randomly assigned.
A small within-schools control group was used (N=2).

Hurry, J., & Sylva, K. (2007). Long-term outcomes of early reading intervention. Journal of Research in Reading, 30(3), 227-248.

Study 6 (Center et al., 1995)

Selection bias: The study selects all schools that have Reading Recovery already implemented and thus might be in general more proactive and progressive in their approach to help low-achieving students.
Diminishing size of control group across evaluation points due to replacement of the intervention group poses problems to robust statistical tests.
The study does not account for clustering at the school level.
The study may not have followed the intent-to-treat principle.
No analysis of differential attrition was performed.
Poor reporting on sample characteristics.
It is not clear if the research assistants were blind to the treatment conditions.

Center, Y., Wheldall, K., Freeman, L., Outhred, L., & McNaught, M. (1995). An evaluation of Reading Recovery. Reading Research Quarterly, 30(2), 240-263.

Study 7 (Escamilla, 1994)

No long-term effects were assessed
Poor description of sample characteristics
Even though substantial differences in baseline scores were observed the study did not account for these differences in the statistical analysis
Attrition is not mentioned in the study and a test of differential attrition was not performed
Poor statistical methodology: The study did not account for clustering at the school level; it only compares means at pre- and post-test without taking baseline values into account
Observers were not blind to condition: Teachers who administered the intervention also did the testing
No random group assignment was performed
Implementation fidelity was not monitored

Escamilla, K. (1994). Descubriendo la Lectura: An early intervention literacy program in Spanish. Literacy Teaching and Learning, 1(1), 58-70.

Study 8 (Baenen et al., 1997)

The characteristics of the control group are not described.
The selection criteria for the comparison group and associated schools are not described.
The study does not adequately report sample sizes, sample characteristics, and attrition.
The methodology is poorly reported.
No use of baseline controls.
Implementation fidelity was not monitored.
No randomized control group was available for the 1991-92 and 1992-93 cohorts.
No comparison or control group was available for the 1992-93 cohort.
No effort was made to adjust for baseline non-equivalence between intervention and comparison group for the 1991-92 cohort.
It is not possible to judge adherence to the intent-to-treat principle due to poor reporting of the methodology.
No control for clustering at the school level.
Individuals who did the testing might not have been blind to students' group assignment.

Baenen, N., Bernhole, A., Dulaney, C., & Banks, K. (1997). Reading Recovery: Long-term progress after three cohorts. Journal of Education for Students Placed at Risk, 2(2), 161-181.

Study 9 (Schwartz, 2005)

Small group sizes.
Randomization of only two participants to two groups each time.
The teacher both delivered the intervention and administered the assessments.
Results of recommended completers and non-completers suggest groups were not equivalent.
Covariates such as race/ethnicity, gender, SES, age, and pretest data were not controlled for, in addition to the fact that clustering at the school level was not taken into account.
Unclear whether change scores were used in the analyses.
No information on differential attrition.
Missing data was not accounted for in the analysis, as the datasets of teachers missing any posttest data were excluded from any analyses.
Neither the content of standard lessons (i.e., for the control group) nor any additional services that participants may have received were described.
Equivalence across demographics for each group not demonstrated despite some apparently large differences by gender and race.
Implementation fidelity was not measured or discussed.
No assessment of long-term impact.
Possible selection bias as schools were already implementing the Reading Recovery program.
Group Ns change at each time point.
Pre-randomization selection procedure "varied across sites" - possible selection bias.
Self-selection of RR teachers (who volunteered to take part) - they may have been more motivated about or had greater belief in the intervention than non-volunteers.

Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the Reading Recovery early intervention. Journal of Educational Psychology, 97(2), 257-267.

Study 11 (D'Agostino & Murphy, 2004)

Most studies included did not use a randomized controlled design. No details were given on comparison groups.
No information on response rates, attrition, differential attrition, or intent-to-treat (but the analysis of only discontinued students likely violates intent-to-treat).
The strongest effects were observed for the six outcomes from Observation Survey Measures that were most closely tied to program content.
It is unknown if the analysis was conducted at the proper level, since the study did not report how condition statuses were determined.
Fewer studies had pretest scores than had posttest scores.
Many significant differences in pretest scores.
No sample characteristics were provided.

D'Agostino, J. V., & Murphy, J. A. (2004). A meta-analysis of Reading Recovery in United States schools. Educational Evaluation and Policy Analysis, 26(1), 23-38.

Study 13 (Sirinides et al., 2018)

Few tests of baseline equivalence were conducted, and those were done on the analysis sample.
Tests of differential attrition were incomplete (no tests for some outcome measures) and showed one difference by race.

Sirinides, P., Gray, A., & May, H. (2018). The impacts of reading recovery at scale: Results from the 4-year i3 external evaluation. Educational Evaluation and Policy Analysis, 0162373718764828

Notes

There are a number of studies examining the effectiveness of Reading Recovery, but most are not high quality. What Works Clearinghouse identified 202 studies, of which three met evidence standards: Pinnell et al. (1988), Pinnell et al. (1994), and Schwartz (2005).

Endorsements

Blueprints: Promising
What Works Clearinghouse: Meets Standards Without Reservations - Positive Effect

Peer Implementation Sites

If you would like to contact a site currently implementing this program, please contact:

Jady Johnson, Executive Director
Reading Recovery Council of North America
500 West Wilson Bridge Road, Suite 250
Worthington, Ohio 43085-5218
Phone: (614) 310-7323
Fax: (614) 310-7345
jjohnson@readingrecovery.org
www.readingrecovery.org

References

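Study 1

Pinnell, G. S., DeFord, D. E., & Lyons, C. A. (1988). Reading Recovery: Early intervention for at-risk first graders (Educational Research Service monograph). Arlington, VA: Educational Research Service.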

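Study 2

Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29(1), 8-39.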

Study 3

Burroughs-Lange, S., & Douetil, J. (2007). Literacy progress of young children from poor urban settings: A Reading Recovery comparison study. Literacy Teaching and Learning, 12(1), 19-46.

Study 4

Curry, J., Griffith, J., & Williams, H. (1995). Reading Recovery in AISD. Austin Independent School District: Department of Audit and Evaluation.

Study 5

Hurry, J., & Sylva, K. (2007). Long-term outcomes of early reading intervention. Journal of Research in Reading, 30(3), 227-248.

Study 6

Center, Y., Wheldall, K., Freeman, L., Outhred, L., & McNaught, M. (1995). An evaluation of Reading Recovery. Reading Research Quarterly, 30(2), 240-263.

Study 7

Escamilla, K. (1994). Descubriendo la Lectura: An early intervention literacy program in Spanish. Literacy Teaching and Learning, 1(1), 58-70.

Study 8

Baenen, N., Bernhole, A., Dulaney, C., & Banks, K. (1997). Reading Recovery: Long-term progress after three cohorts. Journal of Education for Students Placed at Risk, 2(2), 161-181.

Study 9

Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the Reading Recovery early intervention. Journal of Educational Psychology, 97(2), 257-267.

Study 10

Certified May, H., Gray, A., Gillespie, J. N., Sirinides, P., Sam, C., Goldsworthy, H., . . . Tognatta, N. (2013). Evaluation of the i3 scale-up of Reading Recovery: Year one report, 2011-12. Philadelphia, PA: Consortium for Policy Research in Education.

Study 11

D'Agostino, J. V., & Murphy, J. A. (2004). A meta-analysis of Reading Recovery in United States schools. Educational Evaluation and Policy Analysis, 26(1), 23-38.

Study 12

Certified May, H., Goldsworthy, H., Armijo, M., Gray, A., Sirinides, P., Blalock, T. J., . . . Sam, C. (2014). Evaluation of the i3 scale-up of Reading Recovery: Year two report, 2012-13. Philadelphia, PA: Consortium for Policy Research in Education.

Study 13

Sirinides, P., Gray, A., & May, H. (2018). The impacts of reading recovery at scale: Results from the 4-year i3 external evaluation. Educational Evaluation and Policy Analysis, 0162373718764828.

Study 1

Evaluation Methodology

Design:
This study was a randomized control trial, but the write-up targets primarily practitioners and therefore does not contain many technical details.

Recruitment:
The program was implemented in 12 schools in Columbus, OH in the year 1985-1986 (the criteria for selection were not discussed). Thirty-two trained teachers were involved in the project. The lowest 20% (determined by diagnostic survey and teachers' judgment) of children in the classrooms taught by Reading Recovery teachers were selected for the program. The lowest 20% of children in other classrooms in the same schools were also identified; half of these children were randomly assigned to receive Reading Recovery and half were randomly assigned to receive an alternative compensatory intervention. The alternative intervention was implemented in small groups (2-4 students); it is otherwise unclear how this program was structured.

Sample size/Attrition:
The study was conducted with 187 children (136 Reading Recovery intervention (RR) and 51 alternative intervention (AI)). Additionally, at each time measurement point, a random sample of students (excluding Reading Recovery and alternative intervention students) was drawn to provide a grade-level average (102 regular first-grade students, 68 regular second-grade students, 67 regular third-grade students). At the time of the first assessment (May 1986, end of first grade), 98% of the intervention group (3 children had moved from the district) and 100% of the alternative intervention students were tested. Attrition increased for the two follow-ups during May 1987 (completion rates: RR=85%; AI=84%) and May 1988 (completion rates: RR=77%; AI=82%).

Assessment:
The Reading Recovery intervention was implemented throughout the school year 1985/1986. A pretest was conducted during Fall 1985 while assessments of the program success were conducted in May 1986 (end of first grade), in May 1987 (end of second grade) and May 1988 (end of third grade).

Sample characteristics:
No description of sample characteristics is given.

Measures:
Children were assessed on eight dependent measures:

Text reading skills
Letter identification skills
Word test
Concepts about print
Writing vocabulary
Dictation test
Two subtests of the Comprehensive Tests of Basic Skills (Reading Vocabulary and Reading Comprehension)
Writing sample

Even though not explicitly mentioned, it appears that teachers who delivered the intervention also did the assessments.

Analysis:
No statistical tests were conducted. Only means are compared to evaluate the effectiveness of the program. In addition, normal curve equivalent (NCE) gain scores from baseline were computed for Reading Recovery and comparison groups.

Intention-to-treat: The study complied with the intent-to-treat principle.

Outcomes

Implementation fidelity: Teachers received training in Reading Recovery by the program developer Marie Clay. The program was pilot tested at Columbus Public Schools during 1984-1985.

Baseline Equivalence/Differential attrition: No analysis of baseline equivalence or differential attrition was performed.

Posttest: In May 1986, Reading Recovery children as a total group (73% had successfully discontinued) scored higher than children in the alternative intervention on all measures. For example, on the text reading test Reading Recovery children scored 9.95 after intervention while alternative intervention children scored only 6.96. Moreover, the scores of the total Reading Recovery children were similar to those of the reference group of regular first-grade students. Specifically, the Reading Recovery group scored slightly higher on letter identification (51.92 vs. 51.78), concepts about print (16.40 vs. 16.00), writing sample (2.94 vs. 2.92), and dictation (31.20 vs. 30.24), and slightly lower on writing vocabulary (34.68 vs. 38.12), text reading (9.95 vs. 11.13), and word test (13.62 vs. 13.91), compared to the reference group.

When students were given the Comprehensive Tests of Basic Skills, the Reading Recovery children as a group (both discontinued and not discontinued children) gained ground relative to the level of skills expected of them in the fall and again in May. For example, on the measure for reading comprehension Reading Recovery children gained 7.0 points (NCE Gain Score) while children in the comparison group showed a reduction in the gain score (-4.5), comparing pretest and posttest results.

The Reading Recovery program was extended and implemented across the state of Ohio (1985-86: 110 children served by 28 teacher leaders at 22 schools; 1986-87: 1,130 students served by 235 teachers in 167 school districts; 1987-88: 2,648 children served by 416 teachers in 228 school districts). Although long-term effects were not consistently measured and no control group (alternative intervention) was used, the results of this extended study are consistent with the effectiveness of Reading Recovery. The data show that high percentages of the Reading Recovery children, ranging from 68.5% to 94.8%, achieved scores on reading and writing skills that were similar to those of the reference group of first-graders without a reading disorder.

Long-term effects: The group of Reading Recovery children maintained the advantage that they had achieved at posttest (at the end of first grade) over children who had received the alternative intervention for up to 2 years after being released from the program. For example, at the end of the third grade, the mean text reading level score for successfully discontinued Reading Recovery children was 23.99 which slightly surpassed the score of the randomly sampled comparison group (23.50), while the mean score for students in the alternative intervention group was 16.71.

Study 2

Evaluation Methodology

Design:
Sample size/Attrition:
A total of 403 first-grade students (age 6 years) representing two rural, two suburban, and six urban school districts were identified to participate. In each of the 10 districts 4 schools were assigned a different treatment resulting in a sample of 40 at the school level. Seven schools were dropped from the analysis, six because the random assignment or the test administration was not accomplished correctly and one in which pretest data was "lost in the mail". Thus the overall experimental sample was reduced to 324 students (80%) in 33 sites (82%). In addition, some individual students were lost due to mobility after the experimental treatment, absenteeism, and failure to obtain a valid administration of a particular test. No information is provided on attrition.

Study type/Randomization/Intervention:
The design employed for the study was a mix of quasi-experimental and randomized control trial with a split-plots design replicated over a series of blocks (districts). The quasi-experimental part comes from selecting one school in each of the 10 districts that already had Reading Recovery (RR). This school was designated as the RR treatment site for the district. Three additional schools were also identified in each district and randomly assigned to one of the three alternative treatments: 1) Reading Success (RS), which utilized the Reading Recovery lesson framework and procedures in individual daily lessons for children, but teachers were trained in an alternative teacher education model (Theoretical Orientation to Reading Profile developed by DeFord [1985]); 2) Direct Instruction Skill Plan (DISP), which used a one-on-one treatment but varied in the activities and instructional emphasis; 3) Reading and Writing Group (RWG), which involved trained RR teachers applying their knowledge to work with groups of children.

Each school established a pool of 10 of the lowest-scoring students. Four students from within each pool were randomly assigned to the treatment at that school. The remaining students in the pool constituted a randomized control group. The control group was taught in small groups by teachers who had not received any special training; these teachers were instructed to help students build basic reading skills without specific directions of how to accomplish this goal. With an intervention and control group in each of the four schools, the design included eight groups.
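
The within-school assignment described above (a pool of 10 low-scoring students, four drawn at random for the treatment, the remaining six forming the control group) can be sketched as follows. This is an illustrative sketch only; the function name, the student labels, and the seed are assumptions and do not come from the study, which does not report its randomization mechanics.

```python
import random


def assign_pool(pool, n_treated=4, seed=None):
    """Randomly pick n_treated students for treatment; the rest become controls."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    treated = set(rng.sample(pool, n_treated))
    return {
        student: ("treatment" if student in treated else "control")
        for student in pool
    }


# Hypothetical pool of the 10 lowest-scoring students at one school.
pool = [f"student_{i}" for i in range(1, 11)]
assignment = assign_pool(pool, seed=1994)

for student, group in assignment.items():
    print(student, group)
```

Repeating this draw independently at each of the four schools in a district yields the eight groups (four treatment, four control) mentioned in the design.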

Assessment:
For all treatment and comparison groups, pretest data, consisting of the Mason Early Reading Test, Dictation Task 1, and text reading level assessment, were collected in October of Year I (1989). At the conclusion of the tutorial programs in February, the full battery of student measures, that is, Dictation Task 2, text reading level, Woodcock Reading Mastery, and Gates-MacGinitie, were collected in order to determine the immediate impact of the four alternative treatments. As a first follow-up at the end of the academic year in May, the Gates-MacGinitie was readministered in order to assess end-of-the-year progress. Finally, sustained impact (if any) of the four treatments was determined on Dictation Task 3 and text reading tasks 8 months after posttest in October of Year II (1990).

Sample characteristics:
The 403 students constituted 238 males and 165 females. Seventy-two children were in school districts whose policies forbade racial identification; the rest of the sample consisted of 244 whites, 86 blacks, and 1 Asian. One hundred thirty-one were in school districts whose policies prohibited the communication of information about free or reduced-price lunch. Of the remaining 272 subjects, 166 (60.8%) were receiving free lunch and 11 (4%) were receiving reduced-price lunch.

Measures:
Validity of measurements:
Measurement validity was established by correlating the test results with scores on a test of word reading with 100 children at age 6. In addition, test-retest reliability was estimated or reported based on published findings that used the same measure. Also Cronbach's alpha reliability was reported. However, it is not clear who collected the data (presumably the teachers who delivered the program).

Primary outcomes:
Students' reading and writing skills were evaluated using a number of different tests.

Dictation tests: three dictation tests were administered (test-retest reliability coefficients .73-.89; Cronbach's alpha=.96).
Text reading level: Clay's running record technique was utilized (alpha = .83; item separation reliability = .98).
Mason early reading test: combines spelling skills test, recognition of high-frequency words, decoding make-believe words, and a reading task (no reliability test was performed for this measure).
Revised version of Woodcock reading mastery test: constitutes a comprehensive battery of tests measuring aspects of reading ability such as visual/auditory learning, letter identification, word attack, word identification, word comprehension, and passage comprehension (internal consistency reliability coefficient =.99).
Gates-MacGinitie reading test: uses vocabulary and comprehension exercises (reliability coefficients for vocabulary .90-.95; for comprehension .88-.94).

Analysis:
The analysis used multilevel models (HLM) with a two-level structure, student-level and school-level. The statistical analysis controlled for baseline scores on two relevant pretest measures (Dictation task, Mason test).

Intention-to-treat: The study may violate the intent-to-treat principle. The authors dropped schools from the analysis after they were assigned to one of the four groups, based on the assumption that the random within-school assignment to intervention and control group was not conducted properly, or due to the unavailability of valid pretests (p. 20).

Outcomes

Implementation fidelity: All teachers received extensive training in Reading Recovery procedures or their respective instruction technique necessary for their intervention (e.g. direct instruction skills plan).

Baseline Equivalence: The study conducted a test for baseline equivalence between the intervention and control group for each school. In general, most pairs were well matched with four exceptions for which gross initial differences were observed. The authors assumed that randomization was not implemented at these schools and thus dropped these cases from the subsequent analysis.

Differential attrition: No information on attrition was reported, and no analysis of differential attrition was performed.

Posttest: The multilevel models showed that, compared to the control group in the same school, both the Reading Recovery (RR) and Reading Success (RS) interventions significantly improved reading and writing skills as measured by the dictation assessment (RR b=4.99, p<.01, d=.65; RS b=3.45, p<.05, d=.45) and the text reading level assessment (RR b=5.84, p<.001, d=1.50; RS b=1.75, p<.05, d=.45). However, only the Reading Recovery intervention showed a significant improvement in reading skills on the Woodcock Reading Mastery test (b=.32, p<.05, d=.49) and the Gates-MacGinitie test (b=5.19, p<.05, d=5.1). The Reading and Writing Group intervention showed marginally significant improvements on the text reading level assessment (b=1.60, p<.1, d=.41). No significant results were observed for the Direct Instruction Skills Plan intervention. In summary, Reading Recovery showed significant improvements on all four tests.

Long-term effects: Long-term effects were evaluated by two tests. As mentioned above, Reading Recovery showed significant improvements on the Gates-MacGinitie test administered as a posttest in February 1990. The same test was administered 3 months later, in May 1990, but no statistically significant results were observed at this point. The second test to evaluate long-term effects was a dictation assessment. Significantly higher scores on the dictation assessment were achieved in February for both the Reading Recovery intervention and the Reading Success intervention. However, 8 months later a sustained effect was detected only for the Reading Recovery intervention; none of the other three interventions differed significantly from the control group. After 8 months, Reading Recovery showed a sustained significant effect on text reading level (b=5.12, p<.01, d=.75) and a marginally significant effect on the dictation test (b=4.98, p<.1, d=.35).

Study 3

Evaluation Methodology

Design:
Sample size/Attrition:
The intervention took place across one school year (2005-2006) in 42 schools serving low-income urban areas in London. The sample chosen from 21 Reading Recovery schools contained 605 children of which 145 were characterized as low-achievers, while 588 children formed the collective sample of students in the 21 comparison schools of which 147 children were identified as low-achievers.

Study type/Randomization/Intervention:
The study used a quasi-experimental design. The study compared the literacy attainments in schools where some children received Reading Recovery interventions with attainments in schools where children received alternative interventions. The sample was matched on characteristics at three levels: boroughs (London's administrative divisions), schools, and children in classrooms. Five London boroughs had Reading Recovery provision in some of their schools (group 1). Five other London boroughs were selected to form the comparison group because they were similar in achievement levels in standardized national tests (group 2). Twenty-one elementary schools that had an established Reading Recovery program were chosen, while the 21 elementary schools forming the control group were "nominated by the borough education officers as of most concern for high numbers of children with poor performance in literacy" (p. 24). "In each of the 42 schools, the eight children considered lowest in literacy formed one sample for comparison, and children in their entire classroom in Year 1 formed the other sample for this evaluation" (pp. 24-25).

Assessment:
Children in Year 1 classrooms and the lowest-achieving eight children within those classrooms were assessed in each of the 42 schools in September 2005 and again in July 2006.

Sample characteristics:
The London boroughs selected for the Reading Recovery and comparison samples are among the lowest achieving in England. In both groups of boroughs, about 8% of 11-year-old children were achieving below the competency of a seven- to eight-year-old. In the 21 Reading Recovery schools, 40% of students received free school meals, while in the comparison group these children amounted to 44%. In the Reading Recovery schools, 49% of the students spoke English as their second language, while this percentage was 48% for the comparison group.

Measures:
Validity of measurements:
All measures have been used by prior studies. However, no additional validity tests are reported. The Observation Survey and the BAS test were administered by trained research assistants.

Primary outcomes:

Word recognition and phonic skills measure (WRAPS) (classrooms and low-achievers)
An Observation Survey of Early Literacy Achievement (low-achievers) measured the following:
- Concepts About Print
- Letter Identification
- Writing Vocabulary
- Hearing and Recording Sounds in Words
- Text Reading
- Book-level
Standard Reading Recovery diagnostic (low-achievers)
BAS Test to identify word reading age in months (low-achievers)
Change in attitudes to learning and self-confidence (CAPSD) - based on teachers' evaluation (low-achievers)

Analysis:
To assess program effects, the intervention and control groups were compared using ANOVA. In the case where significant baseline differences between treatment and control groups emerged, baseline scores were used as controls.
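
As a rough illustration of the group comparison described above, the sketch below computes a one-way ANOVA F statistic from scratch. It is a simplified sketch under stated assumptions: it omits the baseline covariates the authors added where groups differed at pretest, the function name is hypothetical, and the scores are invented for illustration rather than taken from the study.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the F statistic: between-group variance over within-group variance."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical posttest book levels, invented for illustration only.
reading_recovery = [15, 17, 14, 18, 16]
comparison = [6, 8, 5, 9, 7]

print(one_way_anova_f(reading_recovery, comparison))
```

A large F value indicates that the difference between group means is large relative to the spread of scores within each group; a p-value would then be read from the F distribution with (k - 1, n - k) degrees of freedom.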

Intention-to-treat: The study complied with the intent-to-treat principle. For example, if children had recently left the class, research assistants were sent to their new school to administer the tests.

Outcomes

Baseline Equivalence: No statistical differences were observed for characteristics at the school level (e.g., free school meals, percentage of children with English as second language) or at the individual level (gender, age) at baseline. Among the outcome measures, a significant difference on book-level was observed comparing the low-achiever groups in the Reading Recovery group to the comparison group. The authors controlled for this difference in the subsequent analysis.

Differential attrition: All children who had started in the studied classrooms but who had left schools or were absent when the final assessment took place were examined. Their scores were similarly distributed across groups; therefore, the authors concluded that attrition did not bias the analysis.

Posttest (July 2006):
Classroom comparison: The WRAPS test indicated that Reading Recovery schools showed stronger (p<.05) progress on both available measures for word reading and phonic skills, compared to the comparison schools.

Low-achiever comparison: Comparing low-achievers who received Reading Recovery to low-achievers in comparison schools revealed significant (p<.05) differences on all measures of reading and writing skills (book level, concepts about print, letter identification, sounds in words, written vocabulary, BAS age, WRAPS age) with mostly strong effect sizes. For example, in text reading on a gradient of difficulty, children who received Reading Recovery were on average more than 14 book levels higher on the posttest than on the pretest, while comparison group children on average gained only 4 book levels from an equivalent baseline score. Children who received Reading Recovery were at age-appropriate levels across all assessment measures at the end of the evaluation year. Comparison children were not. In addition, a subjective evaluation by classroom teachers indicated that children who had received Reading Recovery showed a significantly (p<.05) better attitude towards learning than the control group, as measured by oral communication, work habits, social interaction with adults and peers, and self-confidence. No gender effect in the impact of Reading Recovery was observed. Boys and girls attained similar age-appropriate reading levels at the end of the program.

Study 4

Evaluation Methodology

Design:
Sample size/Attrition:
A total of 268 Chapter 1 and Chapter 2 first-grade students at 20 schools were eligible to receive Reading Recovery. Only those students whose pretest score (MRT) was at or below the 30th percentile (N=154) comprised the group of students that were used to evaluate the Reading Recovery program in AISD. Reading Recovery students were compared to a control group that was composed of Chapter 1-eligible students who attended other Chapter 1 schools that did not offer Reading Recovery (N=285). In addition, a group of 23 students (that was excluded from the 154 English Reading Recovery students) received the Spanish version (Descubriendo la Lectura) of Reading Recovery. The study reports that out of the 154 program students, 9% withdrew (entered special education or withdrew for other reasons). However, the study failed to investigate whether the group of attritors differed systematically from the group of program completers.

Study type/Randomization/Intervention:
This study used a quasi-experimental design. No intentional assignment of treatment and control groups was conducted; rather, the study compared schools in which Reading Recovery was already established to schools in which Reading Recovery was not available to low-achieving students. This approach fails to control for structural differences that led schools to adopt or not adopt the program.

Assessment:
Different tests were used at the two assessments. The program's effectiveness was evaluated by comparing normal curve equivalent (NCE) scores, derived from percentile ranks, on the pretest and posttest at the beginning and end of the 1993/94 school year.

Sample characteristics:
The Reading Recovery group differed on numerous socio-demographic characteristics from the control group. In the Reading Recovery group 59% and in the control group 49% were male. In the Reading Recovery group the majority were Hispanics (59%) while in the control group the majority were African American (61%). Special Education was received by 13% of the students in the Reading Recovery intervention group while this percentage was as low as 4% in the control group. However, both groups showed a high percentage of children coming from low income families (93% and 92%).

Measures:
The study relied on two measures that were administered at different assessment points. The measures have been widely used and can be assumed to be valid.

Metropolitan Readiness Test (MRT) (fall, pretest)
Iowa Test of Basic Skills (ITBS) (spring, posttest)

The 50th percentile is the average score for both the MRT and ITBS. For Spanish Reading Recovery students, the MRT and La Prueba were used as pre- and posttest respectively.

Analysis:
The authors transformed percentile scores to normal curve equivalents (NCEs) for the pre- and posttest comparison. The NCE relates a student's percentile rank to the normal curve. The national mean NCE is 50 with a gain of 2.0 NCE points considered to be the average expected gain for a school year. No baseline controls were included.
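As a rough illustration of the percentile-to-NCE transformation described above, the following Python sketch uses the standard definition NCE = 50 + 21.06 × z, where z is the normal deviate corresponding to the percentile rank. The function name is hypothetical, and the sketch is not taken from the study, which does not report how its conversion was computed.

```python
from statistics import NormalDist

# Scaling constant in the standard NCE definition. It makes NCE values of
# 1, 50 and 99 coincide with percentile ranks 1, 50 and 99.
NCE_SD = 21.06


def percentile_to_nce(percentile: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    # z is the standard normal deviate below which `percentile` percent
    # of the norming population falls.
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + NCE_SD * z


if __name__ == "__main__":
    for p in (1, 30, 50, 99):
        print(p, round(percentile_to_nce(p), 1))
```

Under this definition, the 30th-percentile eligibility cutoff corresponds to an NCE of roughly 39. Unlike percentile ranks, NCEs form an equal-interval scale, which is why they can be averaged and compared as gains.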

Intention-to-treat: The study may not have complied with the intent-to-treat principle. Investigators "decided that only the students with a valid pre- and posttest would be studied" (p. 4).

Outcomes

Baseline Equivalence: Although substantial differences in socio-demographic characteristics exist between the intervention and control groups, the study did not control for these differences.

Differential attrition: No test for differential attrition was performed. In fact, the study only evaluated differences for students for whom complete pre- and posttest data were available.

Posttest: The results show that Reading Recovery improved reading skills of low-achieving first-graders. However, Reading Recovery was only effective for students who had been successfully discontinued over the school year. Successfully discontinued students scored higher on the posttest (mean NCE=39.2) than students in the control group (mean NCE=36.7). The grade equivalence for the successfully discontinued Reading Recovery students was 1.6, which is the expected gain for AISD Chapter 1 students. Reading Recovery was not effective for students who were not discontinued at the end of the school year (mean NCE=24.1) or who had received fewer than 60 lessons of instruction (mean NCE=24.2).

Spanish-speaking students who were instructed with Descubriendo la Lectura made the greatest gains of all students. The discontinued Spanish students scored a median percentile of 60.5 as a group on the La Prueba end-of-year test, which places them above the national average (the 50th percentile). This suggests that the program's effectiveness is not dependent on the cultural context or language used. In addition, the study found that Reading Recovery is unable to improve reading skills among higher-achieving students.

Long-term effects: A rank-order form was used to observe how grade-2 students, who were Reading Recovery students in 1992/93, ranked in reading in the year following Reading Recovery instruction. Those students who successfully discontinued Reading Recovery on average placed in the 53rd percentile in their second-grade classes. Thus, the program can be assumed to produce lasting improvements in reading skills.

Study 5

Evaluation Methodology

Design:
Study type/Sample size/Attrition:
This study used a mixture of a randomized controlled trial design and a quasi-experimental design. At the start of the study in 1992, all 24 English schools which had chosen to provide Reading Recovery were initially included in the evaluation. During the intervention year, two schools abandoned Reading Recovery (reason not stated) and were dropped by the researchers from the study. For each Reading Recovery school, the primary schools adviser identified two schools with similar characteristics, which were then randomly assigned to an alternative intervention (phonological training, N=23) or the control group (N=18), resulting in a final sample of 63 schools (QED). In each of these 63 schools, the six poorest Year 2 readers (age 6 years), approximately the bottom 20% of readers, were selected on the basis of their performance on a diagnostic survey. In the 22 Reading Recovery schools, the 4 poorest scorers among the selected children were offered the intervention, the remainder being assigned to a within-school control condition (QED). In each of the 23 phonological training schools, the pre-identified six poorest readers were randomly assigned to phonological training (n=4) or to a within-school control condition (n=2) (RCT). In the remaining 18 control schools, the pre-identified six poorest readers formed the control group.

Intervention:
The Reading Recovery intervention, which includes reading of graded texts, word-level phonics work and writing, was delivered in standard form by trained teachers (employed by the particular school). Children received on average 21 weeks of intervention, with an average of 77 sessions. Eighty-nine percent of the children made sufficient progress to be discontinued. The phonological training intervention involved sound awareness training and word building with plastic letters and was delivered by trained teachers that belonged to the research team (not affiliated with the particular school). Each child was given forty 10-minute individual sessions, spread over 7 months. The control group received the standard provision available in their school. As weak readers, they often received extra, specialized help with reading, on average 21 minutes weekly.

Assessment:
Children were pre-tested on a battery of reading tests in September/October 1992, before the start of intervention (pretest). Short-term gains were assessed in June/July 1993 after the interventions were completed (posttest). Medium-term gains were assessed 1 year later, in May/July 1994. Long-term effects were assessed 3 years later in September/December 1996, when children were in Year 6 (final year of primary school).

Sample characteristics:
Boys were overrepresented at 61% of the sample (class average = 52% boys). About 42% of the sample was receiving free school meals (class average 32%); 16% spoke English as a second language (class average 17%). The groups were well matched on these demographic factors with no significant differences.

Measures:
Validity of measurements:
All measures have been used, tested, and evaluated in prior studies. In addition, the researchers who administered the tests were blind to the group assignment of the children.

Primary outcomes:
Children were assessed on standardized reading tests, sensitive to the skills addressed by both interventions. A different battery of tests was applied at each measuring point:

Pretest and posttest:

British Ability Scale (BAS) Word Reading test
Neale Analysis of Reading test
Book Level
Clay's (1985) Diagnostic Survey
Oddities test (alpha=.83)

An overall measure of reading and spelling was calculated by summing z-scores for the Diagnostic Survey, Book Level, BAS Word Reading and the Neale Analysis of Reading, and transforming again into a z-score.

1-year follow-up:

BAS Word Reading test
Neale Analysis of Reading test
Oddities test
BAS Spelling test
Graded Non-word Reading test

An overall measure of reading and spelling was calculated by summing the z-scores for BAS Word Reading, the Neale and BAS Spelling and transforming again into a z-score.

3-year follow-up:

NFER-Nelson Group Reading Test
Parallel Spelling Test

An overall measure of reading and spelling was calculated by summing the z-scores for reading and spelling and transforming again into a z-score.
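The composite measures described above for each assessment point (sum the z-scores of the component tests, then standardize the sum again) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the sample scores are hypothetical, and the choice of the population standard deviation is an assumption, since the study does not specify which form it used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD; the study does not say
    if sd == 0:
        raise ValueError("cannot standardize scores with zero variance")
    return [(v - m) / sd for v in values]


def composite_z(*measures):
    """Sum per-child z-scores across measures, then re-standardize the sums."""
    if len({len(m) for m in measures}) != 1:
        raise ValueError("every measure needs one score per child")
    standardized = [z_scores(m) for m in measures]
    # zip(*...) regroups the scores so that each tuple holds one child's
    # z-scores on all measures.
    sums = [sum(child_scores) for child_scores in zip(*standardized)]
    return z_scores(sums)


if __name__ == "__main__":
    # Hypothetical raw scores for five children on two tests (not study data).
    reading = [12, 15, 9, 20, 14]
    spelling = [30, 41, 25, 44, 35]
    print([round(z, 2) for z in composite_z(reading, spelling)])
```

Standardizing each test first gives every component equal weight regardless of its raw score range. The second standardization puts the composite back on a scale with mean 0 and standard deviation 1.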

Analysis:
The study used regression analyses to estimate differential treatment effects on reading/spelling outcomes, controlling for baseline scores. The study did not use multilevel models but justified this choice by stating that preliminary analyses found "between-school variation to be very small" (p. 237). All children receiving Reading Recovery were included in the analyses, irrespective of their discontinued status.

Intention-to-treat: It is hard to determine definitively whether the study complied with the intent-to-treat principle. Although attrition is not reported, the study appears to use information on all children assigned to the treatment and control groups.

Outcomes

Implementation fidelity: Program fidelity was monitored by the senior research officer who observed each member of the team during the program implementation stage. The researchers recorded the content of every lesson, for every child, at this stage.

Baseline Equivalence: At pretest significant differences in the overall reading/spelling scores were observed comparing the intervention groups to the control group with the intervention groups doing worse than the control group (p. 234). To address this issue, the study controlled for baseline reading ability in the statistical models.

Differential attrition: Attrition is not discussed in the text and no test for differential attrition was reported.

Posttest:
Reading Recovery: At post-test, Reading Recovery children had made substantially more progress than both their within- and between-school controls on all measures of reading and spelling and on the overall measure, except for the Oddities Test (which is a test specifically measuring phonological abilities). For example, Reading Recovery children scored significantly higher on the BAS Word Reading test (b=1.2; p<.001; d=.81) compared to the within-school control group. For the within-school comparison, 5 out of 6 tests were significant, while for the between-school comparison all tests were significant.
Phonological Training: For the phonological training, the effects were more mixed and generally weaker. For the within-school comparison, only 1 out of 6 tests reached significance, while for the between-school comparison, 2 out of 6 tests were significant. For example, children in the intervention group scored significantly higher (b=.3; p<.01; d=.30) on the Diagnostic Survey than did children in the control schools. However, no significant results were observed for the overall (reading/spelling) measure.

Long-term effects:
1-year follow up
Reading Recovery: One year after children had graduated from Reading Recovery, they were still significantly ahead of their between-school controls on all measures (except for the Oddities test). Out of 6 tests, 5 were significant. However, the effect size had decreased substantially for all measures (e.g., BAS Word Reading test: d=.84 vs. d=.41). In addition, the within-school comparison did not produce any significant differences between intervention and control group.
Phonological training: The between-school comparison revealed that children who had received Phonological Training one year previously had now made significantly more progress overall, in reading and spelling, as well as phonological skills. All 6 tests were significant. However, there were no significant differences between the Phonological children and their within-school controls on any test, including the Oddities test, which directly assesses the phonological intervention focus.

3-year follow-up
Reading Recovery: Out of 3 significance tests (reading, spelling, overall), none was significant for either the within- or between-school comparisons.
Phonological training: In the between-school comparison, positive effects were sustained with a significant (but weak) effect of phonological training on spelling and the overall measure for reading/spelling. However, no significant within-school difference was observed.

Study 6

Evaluation Methodology

Design:
Sample selection and size:
In January 1991, the 10 schools in the New South Wales (NSW, Australia) metropolitan area, which routinely offered Reading Recovery (RR) to low-achieving first-grade students, agreed to participate in the evaluation. The NSW Department of School Education selected five additional schools (where RR was not in operation), matched as closely as possible to the experimental schools in terms of educational region, socioeconomic level, and size. As only four schools could be obtained from the two educational regions by this method, a fifth school, located in a different region but matched for size and socioeconomic level, was also included as a comparison school. In the 10 RR schools, teachers identified the 20 children at greatest risk of reading failure and used the Clay Diagnostic Survey to select the 12 lowest achieving students for participation in the study.

Study type/Randomization/Intervention:
The study used a randomized controlled trial set-up. Eight children in each of the 10 schools were randomly assigned to two groups, the experimental (n=40) and the control (n=40), while 8 children in each comparison school formed the third group (n=40). The remaining 4 children in each school from the initial pool of 12 children were randomly assigned to a holding group. These children progressively replaced the experimental group children in RR upon the latter's discontinuation. However, these children are not included in the analysis.

Children in the control group were able to take advantage of any support in reading typically available at each school until they entered the program.

Attrition:
There was substantial attrition of students in all groups resulting from students changing schools, illness factors, being withdrawn from the program prior to assessment, or ceasing to be controls by entering the experimental group. For the experimental group retention rates were 78%, 70%, 58%, 58% for pretest, posttest, first follow-up, and second follow-up, respectively. Retention rates for the control group (same order) were 98%, 85%, 78%, 40% and for the comparison group 98%, 90%, 88%, 80%.
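To make the retention figures above easier to compare, the following Python sketch converts them to approximate head counts and computes the gap between groups at each assessment. It assumes that every rate is relative to the 40 children originally placed in each group, which the text implies but does not state outright. The names are hypothetical and the counts are approximations, not figures reported by the study.

```python
# Assumption: every retention rate is relative to 40 children per group.
ASSIGNED_PER_GROUP = 40

ASSESSMENTS = ("pretest", "posttest", "first follow-up", "second follow-up")

# Retention percentages as reported in the text, in ASSESSMENTS order.
RETENTION_PCT = {
    "experimental": (78, 70, 58, 58),
    "control": (98, 85, 78, 40),
    "comparison": (98, 90, 88, 80),
}


def approximate_counts(percentages, assigned=ASSIGNED_PER_GROUP):
    """Approximate number of children retained at each assessment."""
    return [round(pct * assigned / 100) for pct in percentages]


def retention_gap(group_a, group_b):
    """Retention of group_a minus group_b, in percentage points."""
    return [a - b for a, b in zip(RETENTION_PCT[group_a], RETENTION_PCT[group_b])]


if __name__ == "__main__":
    for group, pcts in RETENTION_PCT.items():
        print(group, dict(zip(ASSESSMENTS, approximate_counts(pcts))))
    print("experimental - control:", retention_gap("experimental", "control"))
```

Under this assumption, the experimental group falls from about 31 to about 28 children between pretest and posttest, which agrees with the three students reported lost from the Reading Recovery group over that period. By the second follow-up, retention in the control group is 18 percentage points lower than in the experimental group, so the two groups lose children at quite different rates.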

Assessment:
Children's reading and writing skills were assessed at pretest (March 1991), posttest (June/July 1991), and at 3-month (October/November 1991) and 12-month (June 1992) follow-ups.

Sample characteristics:
Children were about 6 years of age at pretest. No additional information regarding gender, race, SES, etc. is provided.

Measures:
Children were tested by trained research assistants and not by the teachers. However, it is not clear if the research assistants were blind to the treatment conditions.

Primary outcomes:
The Burt Word Reading Tests and the Clay Diagnostic Survey, including the following tests, were administered:

Book level
Letter Identification
Concepts about Print
Word Tests
Writing Vocabulary
Dictation

A second set (Set 2 tests) of tests comprising the following six standardized and criterion-referenced tests, was also administered:

Neale Analysis of Reading Ability-Revised
Passage Reading Test
Waddington Diagnostic Spelling Test
Phonemic Awareness Test (Test-retest coefficient = .91)
Syntactic Awareness (cloze) Test
Word Attack Skills Test (Test-retest coefficient = .93)

Analysis:
A multivariate analysis of variance (MANOVA) over repeated measures was employed. Significant multivariate results (F-statistic; alpha = .05) were followed up by univariate pairwise multiple comparisons (alpha = .01). The study controlled implicitly for baseline scores because it tested a group-by-time interaction.

Intention-to-treat: The study may not have followed the intent-to-treat principle. Three students were lost from the Reading Recovery group between pretest and posttest because one student changed schools, one was ill, and one was withdrawn due to poor progress.

Outcomes

Implementation fidelity: To ensure implementation fidelity, each Reading Recovery teacher was systematically observed for one session with each of the 4 students in April 1991. To investigate whether teachers had altered their general theoretical approach to teaching over the implementation phase of Reading Recovery, the Theoretical Orientation to Reading Profile was administered (a test for spillover effects). The results showed that the teachers did not change their teaching style; thus, a spillover effect is unlikely to have biased the results.

Baseline Equivalence: Multiple comparisons indicated that there were no significant differences between the experimental and control group on any literacy measure at the pretest stage.

Differential attrition: No analysis of differential attrition was performed by the authors.

Posttest: Overall, Reading Recovery was effective in impacting children's reading skills at posttest as revealed by a significant group-by-time interaction (F=4.44; p<.001) in the MANOVA model. The discontinued Reading Recovery students outperformed control students and made significantly greater gains on Burt and Clay book level tests (p<.001) and on all the Set 2 tests (p<.001) apart from two (cloze test and Phonemic Awareness Test). Thus, 6 out of 8 tests were significant.

Long-term effects:
3-month follow-up
The MANOVA revealed a significant group-by-time interaction (F=4.44; p<.001) across outcome variables. Multiple comparisons showed that the experimental group was continuing to maintain its superiority on Burt and Clay book level tests and most of the Set 2 tests (p<.001). However, two tests of metalinguistic skills, the Cloze test and the Word Attack Skills Test, failed to reach significance. Thus, 6 out of 8 tests were significant. The effect sizes nevertheless indicate a diminution, compared to posttest, for all literacy tests.

12-month follow-up
A MANOVA performed on the Reading Recovery group and the control group revealed no overall significant group effect (F= 0.262, p = .0268). The univariate results indicated that only one out of eight outcome measures was marginally significant at the .01 level with the Reading Recovery group having a higher book-level score than the control group. However, the authors point out that this lack of significance might be an artifact of the small numbers remaining in the control group (40% of students). Those remaining in the control group were "probably the more skilled readers" (p.253).

Study 7

Evaluation Methodology

Design:
Sample size/Attrition:
Subjects eligible for study participation were all first-grade, Spanish-speaking students (N=180) from six elementary schools in a large urban Southern Arizona school district. All eligible students received their initial literacy instruction in Spanish. In October 1991, all 180 students were given the Spanish version of the Reading Recovery Observation Survey. Based on these data, students who were in the bottom 20% were identified. Four of the six schools had the Descubriendo la lectura (DLL) program. In these 4 schools, 50 students were identified as low-achievers, of whom 23 were selected to receive the program. In the 2 schools that did not offer a Descubriendo la lectura program, children were selected from among the lowest 20% to form a control group (N=23). All students in the six schools who were not identified as program or control group students (N=134) were assigned to the comparison group. No attrition is mentioned in the article.

Study type/Randomization/Intervention:
This study used a quasi-experimental design. No randomization procedure was used to assign students to any of the three comparison groups. Students in the program group received the Spanish version of Reading Recovery (Descubriendo la lectura, DLL). DLL has been pilot tested and closely follows the English version of Reading Recovery. Trained teachers provide at-risk children with daily 30-minute tutoring sessions in which the child reads skill-appropriate books and writes short essays. Lessons are designed to actively involve children in their own learning. Children are guided to think and solve problems while reading. Teachers provide support, but the children do the work and solve problems.

Assessment:
In October 1991, all 180 students were pretested. The posttest was administered to all 180 students at the end of the school year in May 1992. No assessment of long-term effects was conducted.

Sample characteristics:
All subjects were dominant Spanish speakers with only limited English proficiency. No additional information is provided on sex, race, or SES characteristics of the sample.

Measures:
Validity of measurements:
A number of published studies found the Spanish construction of the Observation Survey to be valid and reliable. However, a problem might be that the teachers who administered the intervention also did the testing.

Primary outcomes:
The Spanish version of the Reading Recovery Observation Survey was used as the main tool to assess progress in reading skills. The Observation Survey consists of the following tests:

Letter identification
Word test
Concepts about print
Writing vocabulary
Dictation
Text reading

In addition, two versions of the Aprenda Reading Achievement Test were administered at pretest (Nivel Preprimario - Subtests 2, 3, 4, and total reading) and posttest (Nivel Primer Nivel Primario - Subtests 2, 3, and total reading).

Analysis:
Statistical methods/baseline control:
Mean pre- and posttest scores were compared across groups using t-test statistics (this simple test does not allow for the inclusion of baseline scores). Because different forms of the Aprenda Achievement Test were used at pre- and posttest, students' raw scores were standardized. In the analysis, the program group comprises all students who completed at least 60 lessons, including successfully discontinued and not-discontinued students.

Intention-to-treat: The study appears to follow the intent-to-treat principle.

Outcomes

Baseline Equivalence: There were significant differences at baseline (Spring 1991), comparing the program group to the control group, on a number of test scores. The authors made no effort to control for these differences in their statistical comparison of posttest results.

Differential attrition: Attrition is not mentioned in the study.

Posttest:
Program vs. comparison: At the end of the intervention (May 1992) the program group had not only caught up to the comparison group (average students), but had surpassed them on many measures. At posttest, program students outperformed comparison students on four out of six observation tasks (differences were not significant for text reading and dictation).

Program vs. control: Posttest results also indicated that there were statistically significant differences between the program group and control group on all six observation tasks, with the program group significantly outperforming the control group (p<.05) on all measures. For example, on the written vocabulary test, program children scored almost twice as high (48.5 vs. 25.7; p<.001) compared to control group children.

The improvement is also reflected in standardized gain scores for the Aprenda Spanish Achievement Test. Relating pretest to posttest results showed that the program group went from the 28th percentile to the 41st percentile while the control group went from the 26th to the 28th percentile. The 50th percentile can be considered an indicator of the national average and thus, the program group was approaching this national average. At the individual level, 91% of the program students achieved end-of-year scores on all six observation tasks that either equaled or exceeded the average.

Control vs. comparison: Comparing the control group to the comparison group shows that control group children also made gains in reading/writing skills over the academic year. However, the control group did not catch up to the comparison group while the program group did.

Long-term effects: No long-term effects were investigated by this study.

Study 8

Evaluation Methodology

Design:
Recruitment:
The study appears to have used a quasi-experimental design, but provides no information on the selection process. The study was fielded in the Wake County Public School System (WCPSS) in Raleigh, North Carolina. It provides no clear description of sample size or attrition. However, from the reported figures in different tables it can be inferred that the study was conducted with students from 30 schools in which Reading Recovery was established.

Sample size/Attrition:
The study investigated the success of Reading Recovery across three cohorts: 1990-91, 1991-92, and 1992-93. For each cohort a different study setup was used (note that the sample sizes for the various groups reported in the following came from Table 5, p. 170). For 1990-91 the Reading Recovery group (N=72) was compared to a control group (N=75). In the schools that had an established Reading Recovery program, half of the students were randomly assigned to the intervention and half to a control group. For the 1991-92 cohort, no randomized control group was assigned; rather, the Reading Recovery children (N=135) were compared to a "comparison group" (N=86), which comprised the lowest readers in schools that did not offer Reading Recovery (no information about the selection or number of the comparison schools is provided). For the third cohort, 1992-93, neither a control group nor a comparison group was used, and thus only results for the Reading Recovery intervention group (N=244) were reported. As for attrition, it appears that more students received Reading Recovery than were used in the statistical analysis, which might suggest attrition at best or arbitrary selection at worst. If we use the "total students served" figures (Table 3, p.166), the following retention rates can be calculated for the Reading Recovery intervention group: 86% (1990-91 cohort); 92% (1991-92 cohort); 98% (1992-93 cohort).

Study type/Intervention:
Based on the information provided above, the study might be considered a randomized controlled trial (at least for the 1990-91 cohort). The usual Reading Recovery intervention was implemented, based on daily 30-minute one-on-one tutoring sessions by trained teachers. Children were discontinued if they performed within the average range for their first-grade peers. A full program is generally considered 60 lessons, although sometimes the number of lessons will vary depending on students' progress.

Assessment:
Reading and writing skills of children were assessed using a pretest (beginning of academic year) and a posttest (end of academic year). Selective measures at the end of the second and third academic year were used to investigate long-term program effects.

Sample characteristics:
Characteristics of the sample were not reported.

Measures:
Validity of measurements:
The study used the validated Clay Observation Survey. It is not clear who conducted the testing (potentially not blind to conditions).

Primary outcomes:
The Clay Observation Survey was used to evaluate program progress. The Clay Observation Survey uses the following measures:

Letter identification
Word test
Concepts about print
Written vocabulary
Dictation test
Text reading level

In addition, the North Carolina EOG test in reading was used to investigate long-term effects.

Also, student need for special education, Chapter 1 service, and grade retention were measured to evaluate long-term effects of Reading Recovery.

Analysis:
Statistical methods/baseline control:
The study used basic statistics such as chi-square tests or Fisher's exact test to compare results for the program vs. control group. Baseline controls were not used.

Intention-to-treat: Due to poor reporting of the methodology, it is not possible to judge the study's adherence to the intent-to-treat principle.

Outcomes

Baseline Equivalence: Baseline scores for the program and control groups were similar in 1990-91 but differed between the treatment and comparison group for the 1991-92 cohort. No effort was made to control for these differences.

Differential attrition: Attrition was not reported by the authors and no statistical analysis of differential attrition was performed.

Posttest: Only for the 1990-91 cohort was a comparison between intervention and control group possible. For this cohort, Reading Recovery students showed greater mean short-term gains than control students on three of the six measures of the Clay Observation Survey (writing vocabulary, dictation, text reading). This improvement is also reflected in the observation that a higher percentage of Reading Recovery students scored in the first-grade average band than the control group on the same three measures (80% vs. 45% for writing vocabulary, 61% vs. 35% for dictation, and 49% vs. 15% for text reading).

Similar results were obtained for the 1991-92 cohort. Reading Recovery students showed higher scores on all measures at posttest compared to the comparison group (recall that the comparison group comprised low-achievers from schools without access to Reading Recovery).

Long-term effects: The positive effects of Reading Recovery appear to become lost over time. To measure long-term effects the study did not use the Clay Observation Survey but rather relied on measures for the need of additional services (e.g. special education) after receipt of the full Reading Recovery intervention. A small program benefit was observed one year after posttest at which point Reading Recovery children were less likely to need Chapter 1 service compared to the control group. However, after two years, Reading Recovery students were as likely to be retained in grade, placed in special education, or to receive Chapter 1 services, as the control group children.

In addition, the North Carolina EOG Reading test was used to evaluate long-term effects of Reading Recovery. No statistically significant difference was observed between the program and control group two years after intervention.

Study 9

Evaluation Methodology

Design:
This randomized controlled trial of the Reading Recovery early intervention ran for the length of one academic year. The analysis sample comprised n=148 first-grade students from schools across 14 states in the U.S. Pupils fell into one of four groups -- two groups of at-risk children randomized to receive the program in either the first or second half of the school year (where those receiving RR in the second round served as a control) and two non-randomized comparison groups of high- and low-average students, respectively, selected by the intervention teachers to provide additional points of comparison and neither of which received RR.

The evaluation sought to identify whether first-round intervention students made greater gains in reading development over those second-round intervention students who were yet to receive the intervention (thus acting as a control group).

Forty-seven Reading Recovery teachers selected all children involved in the study, with n=2 students per teacher selected for the two intervention groups (randomized to treatment or control respectively, where treatment students receive the program in the first part of the academic year and control students receive it in the second part of the academic year) and n=2 students selected for the two non-randomized comparison groups (comprising 2 additional students from the same classroom, considered to be 'high-average' and 'low-average'). All students selected by a particular teacher were from the same classroom.

The selection procedure for the students in each of the randomized intervention and non-randomized comparison groups began initially with the normal selection procedure for Reading Recovery (reference given, pg. 261). After this, the RR teacher identified the lowest 20% to 30% of their students for assessment on six tasks from Clay's Observation Survey (to assess reading and writing ability). The lowest three students were allocated to the program (they were not part of randomization). The fourth and fifth lowest children in the class were selected for randomization to receive the program in either the first half of the year (program, or 'first round') or the second half of the year (control, or 'second round'). The rationale for this approach is that each RR teacher has four half-hour slots, so the three lowest-performing students get the first three slots, and the fourth (last) slot is decided by random allocation.

Two additional students from the same classroom were identified to participate in each of the three assessments. These students were selected on the basis of the classroom teacher's ranking and available assessment information as a high-average and low-average reader. The high-average child was from the middle of the teacher's rankings after the students expected to receive RR service were removed. The low-average child was the lowest student in the class who was not expected to receive RR service.

Measurements were taken at three separate time points: pre-intervention, mid-year (at the end of round one of intervention, also referred to as 'transition') and at post-second-round intervention (i.e. at the end of the first grade academic year - usually 2 weeks before the end of the school year). Teachers who had missing data for mid-year testing were excluded from the analysis, so only data from 37 of the original 47 teachers were included in the analysis. It is not stated but this presumably means that the total potential sample was n=188 (47 teachers x 4 pupils) whereas the analysis sample was n=148 (37 teachers x 4 pupils) - a loss of 21%.
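The inferred sample sizes above can be checked with simple arithmetic. The sketch below assumes, as the paragraph does, that each teacher contributed four study pupils; the function and variable names are illustrative only.

```python
# Assumption (inferred in the paragraph above, not stated by the authors):
# each teacher contributed 4 study pupils.
PUPILS_PER_TEACHER = 4

def sample_loss(teachers_recruited, teachers_analyzed):
    """Return (potential n, analysis n, proportion of pupils lost)."""
    potential_n = teachers_recruited * PUPILS_PER_TEACHER
    analysis_n = teachers_analyzed * PUPILS_PER_TEACHER
    lost = (potential_n - analysis_n) / potential_n
    return potential_n, analysis_n, lost

potential_n, analysis_n, lost = sample_loss(47, 37)
print(potential_n, analysis_n, f"{lost:.0%}")  # 188 148 21%
```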

Midyear ('transition point') measurements were taken either when the student was judged to have met the criteria to terminate the intervention (average level of literacy performance for his/her class, plus also demonstrating a particular set of strategies known to increase the chances of continued progression), or at the end of the 20th week of intervention (if adequate progression had not taken place). Generally, students ended their program participation after between 12 and 20 weeks of intervention sessions.

The Reading Recovery teachers administered most of the measures themselves apart from the Observation Survey (used to decide upon discontinuation), which the RR program specified must be carried out by another trained teacher.

The intervention was provided alongside standard classroom literacy instruction and any other additional literacy support provided by the school. The authors do not specify the content of the instruction/support that control participants had or any additional support that the intervention (round one) students may have had.

The authors do not state explicitly whether there were equal numbers of participants in each of the four groups (Ns differ for each of the data tables).

Sample:
The analysis sample totaled n=148 first-graders from schools across 14 different US states. The sample was 53% male and 47% female, with lunch subsidy data (only available for n=107) indicating that 43% received free school lunches, 8% received reduced-price lunches, and 49% received no lunch subsidy. The racial and ethnic breakdown of the sample was 46% White, 40% African American, 12% Hispanic-Latino, and 2% Asian. No demographics were provided for the teachers involved in the study.

Measures:
The evaluation sought to measure whether RR improved a variety of reading and writing knowledge and skills related to literacy learning. A number of measures were used to capture different aspects of literacy development.

Six measures taken from An Observation Survey of Early Literacy Achievement were used to assess reading and writing knowledge. These six measures were: (1) the text level task (book reading), (2) letter identification task, (3) concepts about print task, (4) Ohio Word Test, (5) writing vocabulary task, and (6) hearing and recording sounds in words task. Reliability statistics ranged from .62 to .98 (r and alpha) and intercorrelations for the tasks ranged from .554 to .894. All tasks had updated norms. Validity and discrimination data are provided in Clay (2002). All six measures were completed by teachers and were carried out at the beginning of the year (pretest), at the transition from the first to the second round of intervention service, and at the end of the school year (two weeks before the end).

Teachers submitted a data summary for each child at each test period. They did not submit item information on each task, so reliability estimates for the research sample could not be calculated.

The following additional measures of literacy were also used: The Phoneme Segmentation Test; The Deletion Task (10-item version of the Roser [1975] task); The Slosson Oral Reading Test-Revised; and The Degrees of Reading Power Test. However, no data were captured at pretest for these measures; data were available only at time points 2 and 3 (midyear and posttest). This means that no true pretest-posttest change scores were captured for the intervention (first-round) group vs. the control (second-round) group for these measures. The results are therefore not reported here.

Analysis:
For each of the Observation Survey measures, a 4 (group) x 3 (test period) repeated measures ANOVA was conducted to examine intervention effectiveness. A significant Group x Test Period interaction for the Observation Survey variables was followed by a simple effects analysis among groups at each test period. The key test compared the randomized treatment and control group at the transition period. Effect sizes (Cohen's d) were calculated only for significant simple comparisons between the two randomized groups at the transition period. These were calculated as the mean difference between groups divided by the pooled standard deviation.
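As an illustration of the effect size calculation described above, the following sketch computes Cohen's d as the mean difference divided by the pooled standard deviation. The authors do not spell out their pooling formula, so the conventional degrees-of-freedom-weighted version is assumed here, and the scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treatment, control):
    """Mean difference divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Assumed pooling rule: weight each group's sample variance
    # by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)

# Hypothetical transition-period scores, not taken from the study.
round_one = [12, 14, 16, 18, 20]  # intervention (first-round) group
round_two = [8, 10, 12, 14, 16]   # control (second-round) group
print(round(cohens_d(round_one, round_two), 2))  # 1.26
```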

Outcomes

Implementation Fidelity: Implementation fidelity was not discussed by the authors.

Baseline Equivalence and Differential Attrition: The simple comparisons between groups demonstrated baseline equivalence on all outcome variables between the round-one (intervention) group and round-two (control) group at pretest. However, the study failed to test for baseline differences by sociodemographic characteristics, despite some large differences. For example, the round-one (intervention) group consisted of 61% males and 38% whites, while the round-two (control) group consisted of 41% males and 47% whites.

Attrition rates were not provided by group, nor were tests done on differences in attrition by baseline characteristics.

Posttest: The analysis for each of the Observation Survey measures resulted in a significant Group x Test Period interaction. Simple comparisons were therefore carried out and showed significant differences for each of the variables, in favor of the intervention (round-one) group: Text Level, F(3, 129) = 22.77, p < .005; Letter ID, F(3, 129) = 7.54, p < .005; Ohio Word Test, F(3, 129) = 16.59, p < .005; Concepts About Print, F(3, 129) = 8.70, p < .005; Writing Vocabulary, F(3, 129) = 6.67, p < .005; and Hearing and Recording Sounds in Words (HRSW), F(3, 129) = 10.29, p < .005.

Effect sizes (d) were also provided for most of the variables, as follows: Text Level = 2.02; Ohio Word Test = 1.38; Concepts About Print = 1.10; Writing Vocabulary = .90; and HRSW = 1.06.

Overall results for the two groups show that 65% completed early, with 16% reported as "incomplete". Interestingly, all of the "incomplete" program students came from the round-two group, with all but one of the "early completers" coming from the round-one group. This raises the question whether or not the groups were truly equivalent, or if there existed some fundamental difference not captured at the pretest stage. This could account for the unusually large effect sizes.

Another point of note is that it is unclear whether the authors used comparisons of change scores in their initial calculations, as it looks as though a straight comparison between scores at midyear (one time point) has been used. This would mean that pretest scores were not factored in at all (although equivalence was demonstrated at pretest).

No results relating to comparisons with the two non-randomized groups are displayed here, due to the fact that they were not equivalent to the randomized groups at pretest.

Long-Term: No data were captured to assess any possible long-term effects of the program.

No dose-response or mediation analyses were conducted.

Study 10

Evaluation Methodology

Design: This study used a randomized controlled trial to estimate short-term program impacts on student achievement after the 12-20 week program was implemented in 2011-2012. Although this trial is one part of the study's long-term evaluation, only posttest findings are available so far. Of the 628 schools involved in the larger evaluation, 209 schools were randomly selected to participate in the trial, of which 158 implemented the random condition assignments. The study did not report any details on the recruitment, characteristics, or locations of the schools. The study noted that few of the noncompliant schools deliberately decided not to participate, as many had legitimate reasons beyond the school's control. At each school, the eight first-grade students with the lowest reading achievement were matched into pairs according to pretest scores and English language learner status, and within each pair, one student was randomly assigned to treatment and the other to control.
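The within-school pairing and assignment described above can be sketched as follows. The study does not report its exact matching algorithm, so the rule used here (sort by English language learner status, then pretest score, and pair neighbors) is an assumption, and the student records and field names are hypothetical.

```python
import random

def assign_matched_pairs(students, seed=None):
    """Pair students on ELL status and pretest score, then randomize within pairs.

    `students` is a list of dicts with hypothetical keys 'id', 'ell', 'pretest'.
    Returns a list of (treatment_id, control_id) tuples.
    """
    rng = random.Random(seed)
    # Assumed matching rule: sort by ELL status, then pretest, and pair neighbors.
    ordered = sorted(students, key=lambda s: (s["ell"], s["pretest"]))
    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # random order decides who receives treatment
        pairs.append((pair[0]["id"], pair[1]["id"]))
    return pairs

# Eight hypothetical lowest-scoring first graders in one school.
school = [
    {"id": f"s{i}", "ell": ell, "pretest": score}
    for i, (ell, score) in enumerate(
        [(False, 0), (False, 1), (False, 1), (False, 2),
         (True, 0), (True, 1), (True, 1), (True, 2)]
    )
]
for treated, control in assign_matched_pairs(school, seed=1):
    print("treatment:", treated, "control:", control)
```

Randomizing within pairs keeps the two conditions balanced on the matching variables within every school, which is why the later analysis treats the pair as a level in the model.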

In the 158 participating schools, 1,253 students were randomly assigned. The study administered pretests prior to randomization and posttests at the conclusion of the intervention (midway through the school year). Of the 1,253 randomly assigned students, 866 (69%) students in 147 schools had Reading Recovery data, outcome data, and a match with complete data. The study reported that missing data primarily resulted from student mobility or other factors that led to the inability or failure to administer the posttest assessments.

Sample Characteristics: The study analyzed 866 students identified as having low reading achievement. The majority of the sample (61%) was male and most students were not English language learners (81-83%). Whites comprised the largest percentage of the group (56-57%), followed by Hispanics (20-22%), blacks (18-19%), and students of other races (3-5%).

Measures: All outcome measures were taken from the Iowa Tests of Basic Skills, a well-regarded, group-administered, norm- and criterion-referenced, standardized assessment. Reliability coefficients for the test ranged from the mid-.80s to the low .90s. The study provided references for additional details on the Iowa Test. The study used the following measures:

Composite reading
Reading words subscale
Reading comprehension subscale

The study used the following pretest measure:

Reading performance, from the Text Reading Level subscale in the Observation Survey of Early Literacy Achievement. This one-to-one, teacher-administered, and standardized instrument has been validated by others and has shown moderate to high test-retest and internal consistency reliability.

Analysis: To determine program effects, three-level hierarchical linear models nested students within matched pairs and matched pairs within schools. Models controlled for pretest reading performance (but not the exact outcome measure) and allowed random school intercepts and random treatment effects across schools. Effect sizes were determined with Cohen's D and Glass' D, the former of which was calculated with the standard deviation for national norms and the latter of which was calculated with the standard deviation of the outcome for the control group.
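The two effect sizes described above differ only in the standard deviation used as the denominator. A minimal sketch, using invented numbers rather than values from the study:

```python
def glass_d(impact, control_sd):
    """Impact estimate scaled by the control group's standard deviation."""
    return impact / control_sd

def population_cohens_d(impact, norm_sd):
    """Impact estimate scaled by the national norming sample's standard deviation."""
    return impact / norm_sd

# Hypothetical values, not taken from the study.
impact = 4.0       # treatment-minus-control difference in scale-score points
control_sd = 8.0   # SD of the outcome among control students
norm_sd = 10.0     # SD of the outcome in the national norming sample
print(glass_d(impact, control_sd))           # 0.5
print(population_cohens_d(impact, norm_sd))  # 0.4
```

When the study sample is restricted to low achievers, its control-group standard deviation will often be smaller than the national one, in which case the same raw impact yields a larger Glass' D than population-based Cohen's D.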

All student pairs that had complete data were included, but the study did not attempt to follow students with missing test scores.

Outcomes

Implementation Fidelity: The study concluded that the Reading Recovery model is being implemented with high fidelity since teachers, teacher leaders, and site coordinators met 95%, 87%, and 88% of standards, respectively. However, the study noted that there was less fidelity to the requirements of formally documenting each lesson. Further details on fidelity to program standards and guidelines are available in Chapter 4. Chapter 7 provides information on school-level implementation.

Baseline Equivalence: The groups did not differ significantly on pretest reading performance, gender, English Language Learner status, or race, but the tests compared the analysis sample rather than the randomized sample.

Differential Attrition: The study dropped both subjects in a matched pair if one subject was missing data. Analyses for those students included and excluded from the analytic sample indicated no significant differences in pretest reading performance, gender, race, or English language learner status. Baseline comparisons across condition for the analysis sample also indicated no differential attrition.

Posttest: Treatment students scored significantly higher on all three reading outcomes (composite reading, reading words subscale, reading comprehension subscale). Cohen's D effect sizes ranged from .44 to .47.

Moderation: Results of analysis restricted to students in rural schools or to English language learners were similar to the overall results. Despite smaller sample sizes, the program had significant effects on composite reading for these subgroups.

Study 11

Evaluation Methodology

Design: This study conducted a meta-analysis of 36 U.S. studies of Reading Recovery. The studies were obtained through comprehensive searches of ERIC, PsycInfo, and Dissertation Abstracts databases and through the footnotes and reference lists of identified manuscripts. A total of 109 studies were collected for potential inclusion. Of these, 36 met the following eligibility criteria: (1) had evidence of treatment fidelity (students received only program instruction), (2) reported sample sizes in treatment and comparison groups, (3) had pretest or posttest scores, (4) did not duplicate data, (5) was conducted in U.S. schools, (6) had data to compute effect sizes, and (7) specified a reading skill outcome measure. An additional set of analyses used 11 studies that met the eligibility criteria and also reported pretest and posttest scores for treatment and comparison groups.

The authors did not indicate how many studies used a randomized controlled design or how studies determined condition status, but noted that a small fraction of the 11 studies used randomly assigned groups. Treatment students were categorized into discontinued (students who improved enough to leave the program), not-discontinued (students who never improved enough to leave the program), and all program students. Comparison students were classified either as "similar needy" (at or below the 20th percentile, like the intervention group) or as "regular" (above the 20th percentile).

For the 36 studies, data were collected in years between 1984 and 1996. Sample sizes ranged from 9 to 1334 students. The 36 studies were conducted in various U.S. locations. Several studies were located in Ohio, a few were in Texas, and the rest were in other locations such as Oregon or Michigan.

Sample Characteristics: No sample characteristics were provided.

Measures: All studies used a reading skill outcome measure. Many used the following measures from the Observation Survey of Early Literacy Achievement created by the developer of Reading Recovery:

writing vocabulary
hearing and recording sounds in words
text reading level
letter identification
word tests
print concepts

The study also used the following outcome measure:

standardized tests such as the California Test of Basic Skills

Analysis: For each outcome and treatment group type, the study computed separate average weighted effect sizes at each test time (pretest, posttest, and 2nd grade follow-up), although not all groups had enough cases for each test time or outcome. For the 36 studies, many of which did not have pretest standard deviations, the study calculated effect sizes with population comparison-group means and pooled standard deviations estimated with various methods. Analysis for the group of 11 studies computed effect sizes using the conventional standardized mean difference formula.

The study computed Z statistics for each effect size distribution to test the null hypothesis that each point estimate equaled zero. For the analysis of the 36 studies, the study did not control for pretest levels or conduct significance tests for changes in effect sizes from pretest to posttest. For the analysis of the 11 studies with more complete data, weighted meta-regression analysis predicted mean posttest scores controlling for mean pretest scores and condition status and using group degrees of freedom as weights.
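A Z test on a weighted mean effect size can be sketched as below. The write-up does not say which weights the meta-analysis used for its average effect sizes, so the conventional fixed-effect inverse-variance weighting is assumed here, and the effect sizes and variances are invented for illustration.

```python
from math import sqrt

def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean effect size, its standard error, and Z."""
    weights = [1.0 / v for v in variances]  # assumed weighting scheme
    total_w = sum(weights)
    mean_es = sum(w * d for w, d in zip(weights, effects)) / total_w
    se = sqrt(1.0 / total_w)
    # Z tests the null hypothesis that the mean effect size is zero.
    return mean_es, se, mean_es / se

# Hypothetical effect sizes and sampling variances for three studies.
effects = [0.2, 0.4, 0.6]
variances = [0.04, 0.04, 0.04]
mean_es, se, z = weighted_mean_effect(effects, variances)
print(round(mean_es, 2), round(se, 3), round(z, 2))  # 0.4 0.115 3.46
```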

It is unknown if the analysis was conducted at the proper level, since the study did not report how condition statuses were determined. It is unknown if the studies followed intent-to-treat since there was no information on attrition. However, the analysis also examined results for discontinued students only, a subset made up of those who successfully left the program and excluding those who did poorly enough to continue. These results likely violate the intent-to-treat principle.

Outcomes

Implementation Fidelity: All studies included in the meta-analysis showed evidence of treatment fidelity. The authors reported that "teachers must receive rather rigorous preparation to become [Reading Recovery] instructors, and the overall quality control in program delivery is relatively high" (D'Agostino & Murphy, 2004: 29). No other details were given.

Baseline Equivalence: Treatment and comparison groups were not equivalent across the studies. For the 36 studies, the study reported that "across all outcomes and groups pretest effect sizes were negative, indicating that [Reading Recovery] students scored lower than comparison-group students initially" (D'Agostino & Murphy, 2004: 30). For the group of 11 studies, treatment students scored higher on standardized achievement tests and on letter identification than other low-achieving students.

Differential Attrition: The study did not provide any information on attrition.

Posttest: Results generally indicated improvements for the intervention group.

Using all 36 studies, results indicated stronger findings comparing intervention students to other low-achieving rather than to regular students. Compared to similarly low-achieving students, the treatment group had significantly higher posttest scores on all seven reading skill outcome measures despite also having significantly lower pretest scores on all seven outcomes. Compared to regular students, the treatment group showed significantly higher posttest scores for three outcomes (writing vocabulary, hearing and recording sounds in words, and text reading level). The treatment group had significantly lower scores for the other four outcomes (standardized achievement tests, letter identification, word test, and concepts about print), but these scores were closer to the comparison group than pretest scores, though no significance test was conducted.

Additional analysis using the 11 studies with pretest data for both treatment and control groups showed that all seven reading skill posttest outcomes were significantly higher among treatment compared to other low-achieving students. Weighted regressions controlling for pretest scores indicated that intervention students had higher scores for six of seven outcomes (writing vocabulary, hearing and recording sounds in words, text reading level, letter identification, word test, and concepts about print) compared to similar low-achieving students. There was no significant treatment effect for standardized achievement test scores.

Moderation: Results showed generally higher pretest and posttest scores among students who were discontinued compared to those who were not discontinued, although there were no significance tests comparing scores of these groups. However, the discontinued students would appear to be a selective group of the most successful program participants.

1-year follow-up: In second grade, the treatment group scored significantly higher on standardized achievement tests than similarly low-achieving students. The treatment group had significantly lower scores on this outcome than regular students, but the scores were more similar at posttest than pretest, although no significance test confirmed this trend.

Study 12

Evaluation Methodology

Design:

Recruitment: Prior to the start of the 2012-2013 school year, 348 schools participating in the scale-up were randomly selected for this randomized controlled trial. At each selected school, low-performing students were identified using the Observation Survey of Early Literacy Achievement. The eight students with the lowest scores were included in the study. However, only 267 schools actually carried out the selection and assignment process, and the other 81 were dropped from the study. The 267 schools selected a total of 2,092 students to participate in the study. Page 43 notes that many IEP children were excluded despite low reading performance because they were seen as already receiving one-on-one reading support.

Assignment: Of the 2,092 participating students, 1,048 were randomly assigned to the intervention group and 1,044 to the control group. They were first matched into pairs within each school according to pretest scores and English Language Learner status. One student in the pair was randomly assigned to the Reading Recovery treatment group for the first half of the school year in addition to regular classroom literacy instruction. The other student was assigned to the control group, which received regular classroom literacy instruction. The control student was eligible to receive the treatment in the second half of the school year, after the treatment student's program had ended. The study noted (p. 18) that the vast majority of control group students received substantial support in addition to regular classroom instruction.

Attrition: Assessment occurred at the end of the 12- to 20-week intervention period (midyear posttest). Of the 2,092 students, a total of 1,893 had available pretest data (90.5%). At posttest, 1,697 (81.1%) had data. A total of 1,430 students with data at both points were able to be matched into pairs of treatment and control (715 matched pairs in 233 schools). This sample represents 68.4% of the students in schools that carried out the random assignment. The missing data at the student level primarily resulted from student mobility or other factors that prohibited administration of the posttest measures to both treatment and control students in a pair.
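The retention percentages in the paragraph above follow directly from the reported counts. The short check below uses only those counts; the labels and function name are illustrative.

```python
# Counts reported in the attrition paragraph above.
randomized = 2092
stages = {
    "pretest data": 1893,
    "posttest data": 1697,
    "matched-pair analysis sample": 1430,  # 715 pairs x 2 students
}

def retention(count, base=randomized):
    """Share of the randomized sample retained at a given stage."""
    return count / base

for label, count in stages.items():
    print(f"{label}: {retention(count):.1%}")
# pretest data: 90.5%
# posttest data: 81.1%
# matched-pair analysis sample: 68.4%
```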

Sample Characteristics: Students in the sample were 58-60% male, 55% white, 21% Hispanic, and 16% Black. About 21% were English-language learners.

Measures: The pretest measure, the Observation Survey of Early Literacy Achievement, is a one-to-one, teacher-administered, standardized assessment. It has six sub-scales: Letter Identification, Concepts about Print, Ohio Word Test, Writing Vocabulary, Hearing and Recording Sounds in Words, and Text Reading Level. The Text Reading Level subtest was used to block students during the random assignment process, and later as a pretest covariate in the statistical models of impacts (but not as an outcome).

The Iowa Test of Basic Skills served as the outcome measure. The measure is a standardized, group-administered assessment of cognitive readiness for the academic aspects of the curriculum and growth in fundamental areas of school achievement.

Analysis: The analysis used a three-level hierarchical linear model with students nested within matched pairs, and matched pairs nested within schools. Models controlled for pretest performance with a covariate for the Observation Survey text reading level scores, and included random effects for blocks (matched pairs), a random effect for overall school performance (random school intercepts), and a random effect for the impact of Reading Recovery (random treatment effects across schools).

Standardized effect sizes were calculated with Glass' D, which represents a standardized effect relative to the distribution of outcomes for control group participants. In addition, the study reported a population-based Cohen's D standardized effect size, which was calculated by dividing the raw impact estimate by the standard deviation of Iowa Test of Basic Skills for the national norming sample.

The study analyzed all student pairs with complete data, but it did not attempt to follow students with missing test scores. If based on student mobility and school absence, missing data are unlikely to be related to the condition.

Outcomes

Implementation fidelity: Overall, 85% of the indicators used to assess implementation fidelity showed adequate implementation, and all four categories of Implementation Fidelity Activities represented in the Implementation Fidelity Logic Model (Figure 1) were implemented with fidelity. However, some inconsistencies were found in the selection of students for participation in the program, particularly among students receiving special education services.

Baseline equivalence: The baseline balance tests examined three demographic variables and one reading variable for the final analytic sample of 1,430 students in 233 schools rather than the full randomized sample. No significant differences were found between treatment and control groups on gender, ELL status, race, or text reading level.

Differential attrition: The study dropped both subjects in a matched pair if one subject was missing data. Analyses of differences in student characteristics for those students included and excluded from the analytic sample indicated no significant differences in pretest text reading levels (p = .63), gender (p = .55), race (p = .94), or ELL status (p = .68). Baseline comparisons across condition for the analysis sample also indicated no differential attrition.

Posttest: The intervention students showed significantly better posttest scores than the control students on total reading (Glass' D = .42), the reading words subscale (Glass' D = .40), and the reading comprehension subscale (Glass' D = .36). Separate tests for populations of special interest found significant intervention effects for rural schools and for ELL students.

Study 13

Evaluation Methodology

Design:

Recruitment: As part of the i3-funded scale-up of the Reading Recovery program, a total of 1,490 schools were randomly selected from the population of schools participating in the scale-up, of which 1,254 (84%) schools agreed to participate in the evaluation. Reading Recovery teachers screened all students at each participating school for eligibility using the Observation Survey of Early Literacy Achievement (OS), which is consistent with standard procedures. The eight students with the lowest OS scores in a given school were selected to participate in the evaluation. The final sample included 9,784 children from 1,254 schools.

Assignment: An online random assignment tool was utilized to match students into pairs within schools. Students were matched according to English language learner (ELL) status and total OS score. A randomization algorithm was then used to assign one student in each matched pair to the intervention condition (n=4,892 children) and the other to the control condition (n=4,892 children).

Attrition: Assessments were conducted at baseline and post-intervention. For each pair of students, when the treatment student completed the intervention (12-20 weeks after baseline, depending on individual student progress), both students were assessed. Of the 9,784 participants allocated to conditions, 1,929 (20%) were missing pretest or posttest data. To minimize differential attrition, both the subject with missing data and the matched partner in the other condition were dropped, resulting in an overall attrition rate of 30% from randomization to follow-up. This yielded an analysis sample of 6,888 students (3,444 matched pairs) from 1,122 schools.
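The pair-level deletion rule described above can be sketched as follows. The data structure and field names are hypothetical and the toy records are invented; only the final two counts come from the paragraph above.

```python
def drop_incomplete_pairs(records):
    """Keep only matched pairs in which both students have pretest and posttest scores.

    `records` maps a hypothetical pair id to a (treatment, control) tuple of
    dicts with 'pre' and 'post' keys; None marks a missing score.
    """
    def complete(student):
        return student["pre"] is not None and student["post"] is not None

    return {
        pid: pair for pid, pair in records.items()
        if all(complete(s) for s in pair)
    }

# Toy data, invented for illustration.
records = {
    "p1": ({"pre": 2, "post": 9}, {"pre": 2, "post": 5}),
    "p2": ({"pre": 1, "post": None}, {"pre": 1, "post": 4}),  # treatment missing posttest
    "p3": ({"pre": 3, "post": 8}, {"pre": None, "post": 6}),  # control missing pretest
}
kept = drop_incomplete_pairs(records)
print(sorted(kept))  # ['p1']

# Reported counts: 9,784 students randomized, 3,444 complete pairs analyzed.
randomized, pairs_kept = 9784, 3444
print(f"{1 - 2 * pairs_kept / randomized:.1%}")  # 29.6%
```

Because pairs are dropped whole, each condition loses the same number of students, and the computed 29.6% rounds to the 30% overall attrition reported.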

Sample:

The sample was 55.1% White, 20.3% Hispanic, 14.6% Black, 5.1% Asian, 3.8% two or more races, 0.9% American Indian/Alaskan Native, and 0.2% Hawaiian Native/Pacific Islander. Regarding locale, 45.2% of students attended suburban schools, 28.3% attended urban schools, and 26.5% attended rural schools.

Measures:

Students' reading achievement was assessed at baseline and immediately following the intervention (12-20 weeks after baseline) using two well-validated standardized reading assessments: the Iowa Tests of Basic Skills (ITBS) and the Observation Survey of Early Literacy Achievement (OS). One primary outcome was computed for analysis: reading achievement, as assessed by the Total Reading standard score from the ITBS. Secondary outcomes include reading words and reading comprehension, assessed by subtests of the ITBS, and literacy achievement, assessed by Total OS scores. No reliability information was reported for the present sample. Reading Recovery teachers administered the tests to both treatment and control students, but someone other than the student's own Reading Recovery teacher administered the test to treatment students.

Analysis:

The researchers used three-level hierarchical linear models (HLM) to test for intervention effects. Students were nested within matched pairs, and matched pairs were nested within schools. Differences in posttest reading performance of intervention and control participants were estimated after controlling for pretest reading performance. Text Reading Level (TRL) scores, a subscale of the OS, were included as a covariate in all models, and the OS Total score was included as a covariate in exploratory models. All models included a binary indicator of condition (treatment vs. control), a four-category fixed effect for year, an interaction effect for condition by year, a random effect for matched pair, a random effect for overall school performance, and a school-level random effect for the impact of the intervention. An unstructured covariance matrix, which included a correlation between random effects for school-level intercept and slope, was used. A grouped residual variance was included to account for differences in dispersion of outcome scores within the treatment versus control groups. Models were estimated via Restricted Maximum Likelihood (REML), with model-based standard errors and degrees of freedom based on within- and between-cluster sample sizes.

Intent-to-Treat: Matched pairs in which both students had complete pretest and posttest data were included in the analysis, and pairs in which either student was missing pretest or posttest data were dropped from the study.

Outcomes

Implementation Fidelity:

Implementation fidelity was measured using a five-step process: 1) Operationalize each of the relevant program standards as measurable program indicators; 2) Construct a logic model that defines the core activities of Reading Recovery by grouping program indicators into four key components (i.e., staff background and selection, teacher leader and site capacity, Reading Recovery teacher training and ongoing professional development, and one-to-one Reading Recovery lessons); 3) Define minimum thresholds for adequate implementation (80%); 4) Collect data on each program indicator directly from implementers; and 5) Measure adherence to the program indicators and assess adequacy of implementation for each.

Baseline Equivalence:

Using the analysis sample rather than the randomized sample, no significant differences were detected in baseline demographics (sex, ELL status, race/ethnicity) or outcome variables (Text Reading Level, a subscale of the OS). However, tests for baseline equivalence on the primary outcome measure (reading achievement) and secondary outcome measures (literacy achievement, reading words, and reading comprehension) were not reported.

Differential Attrition:

Because each student with missing data was dropped together with his or her matched partner, attrition was equal in the intervention (15%) and control groups (15%). Dropouts were disproportionately non-White, but were not statistically different from completers on other variables of interest (sex, ELL status, and OS Text Reading Level). Tests for baseline equivalence in the analysis sample suggested little in the way of differential attrition. However, these tests did not include the primary outcome measure (reading achievement) or secondary outcome measures (literacy achievement, reading words, and reading comprehension).

Posttest:

At posttest, relative to control participants, participants in the intervention condition showed significantly greater improvements in four of four outcomes tested: reading achievement, literacy achievement, reading words, and reading comprehension.

Long-Term:

Not examined.