“Everyone knows who the best teachers are.”
It’s a common refrain. After all, we knew we wanted our kids in Ms. Brown’s second grade class, and tried to steer them clear of Mrs. Foster in the fourth grade.
But sorting teachers in a way that is consistent, credible, fair, and reliable — particularly when critical decisions like tenure, retention, or dismissal are on the line — turns out to be devilishly complex.
Apparently, the New Jersey Department of Education does not agree. On June 16 it announced a pilot teacher evaluation program that trivializes the complexity of the problem and puts participants on the equivalent of a forced march.
In brief, the department invites districts to submit applications by July 28 to try to solve instructional and evaluation issues that have eluded “fair” and “reliable” solutions for decades. Not only will the winning districts receive inadequate funding for the scale of the project, but they must also be ready to roll by September — and get everything done in one school year.
In terms of where we are with the science and art of evaluating classroom teachers, this is akin to President Teddy Roosevelt commending the Wright Brothers for their 200-foot flight at 10 feet and ordering a fighter plane for delivery next year.
A quick look at the terms of the grant makes it easy to see how it’s broken:
Winning districts must agree to use state assessment results for up to 45 percent of the performance evaluation of each teacher.
There is a long list of problems with this approach. Here are a few:
Fewer than 20 percent of instructors teach literacy and/or math in grades 4 through 8, the only subjects and grades for which the “value-added method” (VAM) can be used. VAM takes a student’s results from last year’s tests as the baseline for measuring the teacher’s contribution to that student’s progress in the current year.
The National Academy and just about every respected statistician, researcher, evaluator, and expert argue that standardized test results should not be used as a measure of teacher performance when making important personnel decisions. Here are a few of the reasons behind this consensus:
+ For the measure to be fair and reliable, students must be assigned to teachers randomly, something that does not happen in any school (remember lobbying for Ms. Brown’s second grade class?).
+ There is no proven way to calibrate a teacher’s contribution for students who change schools in December or February. In most low-performing schools, the mobility rate hovers between 20 percent and 40 percent.
+ Because the sample sizes are so small (25 or so in elementary classes), results can swing wildly from year to year. One large study found that over a three-year period, 40 percent of teachers who ranked in the top fifth the first year were no longer there the second year, and a third of them slipped to the bottom 40 percent by the third year.
+ Students do not stop learning during the summer, but there is no way to capture the gains of those who go to math or drama camp or are tutored. Yet the VAM system attributes a student’s progress to a single teacher.
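The core VAM idea, and the sample-size problem above, can both be seen in a toy sketch. This is a deliberately naive illustration, not any state’s actual model (real VAM implementations adjust for many covariates); the function names and numbers here are invented for the example.

```python
import random
import statistics

def value_added(prior_scores, current_scores):
    # Naive value-added sketch: the teacher's "contribution" is the
    # mean gain of her students over their prior-year baseline.
    return statistics.mean(c - p for p, c in zip(prior_scores, current_scores))

# A four-student class for readability: gains are 4, 2, 6, 2.
print(value_added([70, 75, 80, 85], [74, 77, 86, 87]))  # 3.5

# The sample-size problem: give 100 teachers the SAME true effect and
# classes of 25, and student-level noise alone reshuffles the rankings.
random.seed(1)

def measured_effect(true_effect=3.0, class_size=25, noise_sd=10.0):
    # Each student's measured gain = teacher effect + individual noise.
    return statistics.mean(true_effect + random.gauss(0, noise_sd)
                           for _ in range(class_size))

year1 = [measured_effect() for _ in range(100)]
year2 = [measured_effect() for _ in range(100)]
top1 = {i for i in range(100) if year1[i] >= sorted(year1)[80]}
top2 = {i for i in range(100) if year2[i] >= sorted(year2)[80]}
# Any overlap between the two "top fifths" is pure chance here,
# since every simulated teacher is equally effective.
print(len(top1 & top2))
```

In this setup most of the teachers in one year’s “top fifth” fall out of it the next year even though nothing about them has changed — the pattern the study cited above observed in real data.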
Standardized test results can help target a student’s strengths and weaknesses or highlight potential teaching inadequacies. It’s not sensible to use them to decide who gets laid off.
It’s also worth noting that the DOE expects nine New Jersey districts to do what the multibillion dollar educational research industry has been unable to do: come up with fair and reliable measures of performance for the 80 percent of teachers whose grade or subject is not tested.
Not only are districts expected to prepare, field test, revise, and retest their “protocols” during the year, but they are also expected to contract with one of four national consultants (or explain why not), consult with all affected teachers, seek the input of a broadly representative advisory committee, train a cadre of in-class evaluators, and prove that their observations are reliable.
Oh, and they have 27 days to prepare their applications, at a time when the targets of all this collaboration and evaluation, the teachers, aren’t around.
What’s the rush? Michelle Rhee — a former chancellor of the Washington, D.C., school district and the founder of StudentsFirst — invested three years in developing her teacher evaluation scheme. If this pioneering reformer is that careful and patient, surely Trenton can show a little respect for the complexity of the problem?