When a standardized test is given for the first time, the psychometricians who administer the test conduct a rather involved process to decide how to convert students’ raw test scores into achievement level designations like the 1 through 5 scales used for both Florida’s standardized testing program and Advanced Placement exams. The process generally involves two panels of experts in the subject covered by the test (called “content experts”). One panel sets the “achievement level descriptions” (ALD’s), which describe the levels of understanding of students at each of the achievement levels. A second panel then applies those ALD’s to decide how well students at each of the achievement levels perform on each question on the test.
The ALD’s are the most important decision in the whole process, since they set the rigor level. The second panel – generally called the “standard-setting panel” – is entirely constrained by the ALD’s in making its decisions on “cut scores”, the raw scores that correspond to the boundaries between achievement levels.

This past summer, I was a member of the standard-setting panel for the new Advanced Placement Physics 2 exam. The new AP Physics courses are intended to drive profound improvements in the way physics is taught in high schools, and the ALD’s are ambitious. The initial stage of the process of setting cut scores is based entirely on the questions on the test and the ALD’s, without any reference to how well students actually performed on the questions. Our panel had a remarkable level of agreement about what the cut scores should be in this first stage. Then we were fed the “impact data” – the distribution of students among the five achievement levels that would result from the cut scores we had reached during this first stage. We were horrified by the results, and ultimately the powers-that-be at the AP program softened the cut scores we came up with. The teachers and students who were involved with the new AP Physics exams in May might be surprised to learn that our initial recommendation for AP Physics 2 was even more brutal than the final distribution of achievement level results.
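For readers who want to see the mechanics, here is a minimal Python sketch of how a set of cut scores turns raw scores into achievement levels, and how “impact data” falls out once real student scores are run through those cut scores. Everything specific in it is an assumption made for illustration: the cut scores, the raw scores, and the function names are invented and do not come from any AP or Florida exam. It also assumes that a cut score is the minimum raw score needed to reach the higher level.

```python
from bisect import bisect_right
from collections import Counter


def achievement_level(raw_score, cut_scores):
    """Map a raw score to an achievement level (1 is the lowest).

    cut_scores is an ascending list of the minimum raw scores needed
    for levels 2, 3, 4, ...; four cut scores give five levels.
    Assumption: a score exactly equal to a cut score earns the higher level.
    """
    return bisect_right(cut_scores, raw_score) + 1


def impact_data(raw_scores, cut_scores):
    """Return the fraction of students who land in each achievement level."""
    counts = Counter(achievement_level(s, cut_scores) for s in raw_scores)
    total = len(raw_scores)
    levels = range(1, len(cut_scores) + 2)
    return {level: counts.get(level, 0) / total for level in levels}


# Hypothetical numbers, invented for illustration only.
cut_scores = [20, 35, 50, 65]
raw_scores = [12, 18, 22, 30, 34, 36, 41, 49, 52, 70]

for level, share in impact_data(raw_scores, cut_scores).items():
    print(f"Level {level}: {share:.0%} of students")
```

The point of the sketch is the order of operations: the cut scores are fixed first, from the questions and the ALD’s alone, and only afterward is the distribution of students computed. Raising any cut score moves students into a lower level, which is why the impact data can come as a shock.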
Which brings us to the situation with Florida’s new standardized tests. This weekend, Florida Board of Education Vice Chair John Padget argued in a Southwest Florida News-Press op-ed that the cut scores proposed by two panels for the FSA tests on English Language Arts and math are too soft, and that Commissioner Stewart – who will make the final decisions on cut scores in October – should adopt higher cut scores (and thus lower passing rates) to push Florida’s students to higher levels of achievement. As School Zone reported, Jeb Bush’s education foundation made the same argument last week.
But it is easy to predict what will happen next. Commissioner Stewart will say that the cut scores are a result of a research-based process that has been used for decades in Florida and elsewhere, and that she has no choice but to adopt cut scores that are close to those recommended by the panels.
Florida’s process for setting cut scores involves three panels. The first is a panel of Florida teachers that sets the all-important ALD’s. They met at the end of April. The second panel is also composed of teachers (and is called the “educator panel”), and according to the FLDOE they went through a standard-setting process at the beginning of this month that was quite similar to the process I worked on for AP Physics earlier this summer. They came up with a set of cut scores based on the ALD’s, as my AP Physics panel did. Finally, a group of “community leaders”, called a reactor panel (I was a member of the reactor panel for three Florida exams in 2012), reviews the work of the educator panel and makes its own set of recommendations based mostly on how they think the test results will play politically. (In 2012, we tweaked the Geometry cut scores because of a statistical problem that neither the educator panel nor the psychometricians had caught – but I’m sure that sort of thing doesn’t happen often.)
The passing rates proposed by the educator and reactor panels for the math exams (I’ll ignore ELA here) are shown below. Given Florida’s lack of success on national and international math tests like NAEP and PISA in the past, the proposed passing rates are shockingly high.
Hence the expressions of concern from John Padget and Jeb Bush’s foundation. Clearly something has gone wrong in the standard-setting process (except for Algebra 2 – that is probably a realistic assessment of where Florida’s students stand nationally and internationally).
But if something went wrong in the standard-setting process, what was it? The reactor panel basically gave the same answers as the educator panel. And the educator panel was entirely constrained by the ALD’s, which were written by the ALD panel.
But the ALD panel was composed of Florida teachers. And that is the problem. Because those teachers – and their friends and colleagues and students – are going to be graded based on those ALD’s. The ALD panelists knew it, and there was no way that the panelists’ judgments were not influenced by that knowledge. It’s a classic – if rather invisible – conflict of interest.
So how can this be fixed? The answer is pretty simple, actually. Scrap this year’s standard-setting process. Start over next year with an ALD panel that is composed of math (and ELA) education experts from around the nation – and not Florida. That panel will write ALD’s that are much more likely to be appropriately ambitious. Then rerun the remainder of the standard-setting process with the educator and reactor cut score panels.
John Padget and Jeb Bush’s foundation are right to be concerned that settling for mediocrity is bad for the future of Florida’s students. The state needs to get the standard-setting process right, and it hasn’t yet done so. We can only hope that policy-makers will have the guts to stand firm and make it right.