A New Review Finds Validity Problems Undermine Third-Grade Retention Report
East Lansing, Mich. – As part of its working paper series, the National Bureau of Economic Research (NBER) recently released a report examining the outcomes of Florida’s third-grade retention policy. The report concluded, contrary to the conventional wisdom on grade retention, that third-grade retention had positive effects on the following year’s test results, but that those effects faded over time and that retention had no effect on graduation. A new academic review, however, finds several shortcomings that severely limit the report’s usefulness.
The report, The Effects of Test-based Retention on Student Outcomes Over Time: Regression Discontinuity Evidence from Florida, was reviewed by Joseph P. Robinson-Cimpian for the Think Twice think tank review project with funding from the Great Lakes Center for Education Research and Practice. Robinson-Cimpian is an associate professor and College of Education Distinguished Scholar in the Department of Educational Psychology at the University of Illinois at Urbana-Champaign. His research focuses on the use and development of novel and rigorous methods to study equity and policy.
The report attempted to investigate the impact of a Florida policy that flags students to repeat third grade based on a state-specified cut-score on the Florida Comprehensive Assessment Test. The findings indicated that students just below the threshold (one-third of whom were retained) performed better on the next year’s tests than those just above the threshold (5% of whom were retained).
In his review, Robinson-Cimpian notes that the report relies on what is known as a regression discontinuity design (RDD), a technique used for making causal inferences from non-experimental data when a threshold determines or strongly predicts treatment assignment. In this case, comparisons of students immediately above and below the law’s cut-score lend themselves to making causal claims.
However, Robinson-Cimpian finds serious shortcomings. Most notably, because students above the cut-score do not receive the extra supports provided to students below the cut-score, the researchers cannot know if positive outcomes for those below the cut-score were due to the greater likelihood of retention or to the assurance of additional services.
Additionally, Robinson-Cimpian finds that the report inflates the outcome differences between those below and above the threshold by using an instrumental variable approach, which attributes the entire difference to just the one-third of students who are retained, effectively making the outcome difference appear more than three times as large. Importantly, Robinson-Cimpian finds that the very use of the instrumental variable approach is inappropriate, because the method assumes that falling below the threshold has no effect on outcomes other than through increasing the likelihood of retention.
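The "more than three times" figure can be illustrated with simple arithmetic. The sketch below is not taken from the report or the review; it assumes a basic Wald-style instrumental variable ratio, in which the outcome gap at the cut-score is divided by the gap in retention rates, and it uses the retention rates quoted above (one-third below the threshold, 5% above). The outcome gap of 1.0 is a hypothetical placeholder, and the report's actual specification may differ.

```python
def iv_scaling_factor(p_retained_below: float, p_retained_above: float) -> float:
    """Factor by which a Wald-style IV estimate scales the raw outcome gap.

    Assumes the simplest ratio form: IV estimate = outcome gap / retention-rate gap.
    """
    retention_gap = p_retained_below - p_retained_above
    if retention_gap <= 0:
        raise ValueError("retention rate below the cut-score must exceed the rate above it")
    return 1.0 / retention_gap


# Retention rates quoted in the release.
p_below = 1 / 3   # share retained just below the cut-score
p_above = 0.05    # share retained just above the cut-score

# Hypothetical raw outcome gap at the threshold (arbitrary units, not from the report).
raw_gap = 1.0

factor = iv_scaling_factor(p_below, p_above)
iv_estimate = raw_gap * factor

print(f"scaling factor: {factor:.2f}")
print(f"IV-style estimate for a raw gap of {raw_gap}: {iv_estimate:.2f}")
```

Under these assumptions the raw gap is multiplied by roughly 3.5. If students below the cut-score also benefit from extra services regardless of retention, this scaling credits retention with gains it may not have produced.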
Overall, Robinson-Cimpian points out, the methods used have extremely limited generalizability, which is restricted to students at or very near the threshold and directly affected by the policy. Even setting aside the problems generated by confounding retention effects with the effects of other interventions and supports, the findings are not easily generalizable to lower- or higher-achieving students, to other grades, or to other states with similar test-based retention policies.
Find the review on the Great Lakes Center website.
Find the original report by Guido Schwerdt and Martin R. West on the web.
Think Twice, a project of the National Education Policy Center, provides the public, policymakers and the press with timely, academically sound reviews of selected publications. The project is made possible by funding from the Great Lakes Center for Education Research and Practice.
The review can also be found on the NEPC website.