State Releases First Results from Pilot Trials of Teacher Evaluation Systems

John Mooney | November 22, 2013 | Education
Initial report correlates data collected from 25 districts participating in the pilot for at least a year

As New Jersey public schools this year move to new teacher evaluation systems, two dozen districts that tested the systems over the past two years are starting to provide information about lessons learned and challenges ahead.

The Christie administration this week released the first of two reports from the 25 pilot districts that were charged — and funded — to test the new systems that use uniform evaluation practices, as well as student performance measures, to gauge the effectiveness of teachers.

The systems are a central piece of New Jersey’s new tenure reform law, known as TEACHNJ, and its requirements for strengthening the process for how teachers and principals are judged, retained, and, in some cases, let go.

The sample size of the pilots — ranging from tiny Alexandria to the 2,000 teachers in Elizabeth — was small for a state with 100,000 teachers, especially when student test scores were used. Nonetheless, it provided the first hard data released on the early impact of the new evaluation requirement, which has roiled schools across the state.

For instance, the report of the Evaluation Pilot Advisory Committee (EPAC) broke down how ratings were distributed across the pilots using the new practices, and found that the pilot districts continued to generally give their teachers strong scores on classroom practice.

In one breakdown of the 10 districts in the pilot for two years, the report said that 73 percent of teachers received at least an “effective” rating on a four-point scale. For another 15 districts in the pilot for one year, 86 percent were at least “effective.”

The four ratings required for every educator are “ineffective,” “partially effective,” “effective,” and “highly effective.” Teachers need to maintain the two highest levels to retain their tenure under the new law.

The following was the breakdown for each group of pilot districts:

First cohort (10 districts, for two years):

  • 3 percent ineffective
  • 25 percent partially effective
  • 66 percent effective
  • 7 percent highly effective
Second cohort (15 districts, for one year):

  • 1 percent ineffective
  • 13 percent partially effective
  • 82 percent effective
  • 4 percent highly effective

The high proportion of favorable ratings was not very surprising, but state officials went out of their way to stress that differentiation in ratings increased the longer districts had to practice with the new evaluation tools.

Citing patterns nationwide in which virtually all teachers were rated “satisfactory,” assistant commissioner Peter Shulman told districts in a memo this week that he was encouraged such patterns were not as prevalent.

“With time, greater understanding of the observation framework, and more practice, observers increased their ability to identify nuances in teacher practice, and as a result, to differentiate ratings,” Shulman wrote to districts.

The pilots were also the first test case of the state’s use of student test scores as part of the ratings for teachers whose students take the state’s language arts and math tests, roughly a sixth of the total.

The state is to formulate what it terms a “student growth percentile” (SGP) for each teacher that looks at the progress of his or her students in a given year, compared with academically equivalent students statewide.
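For readers unfamiliar with the measure, the idea can be sketched in a few lines. This is an illustrative simplification, not the state’s actual model: it assumes each student already has a growth percentile (their rank, 1–99, among students statewide with a similar test-score history), and summarizes a teacher’s SGP as the median of his or her students’ percentiles. The teacher names and numbers are hypothetical.

```python
from statistics import median

# Hypothetical data: each student's growth percentile (1-99), i.e. how the
# student's year-over-year progress ranks against academically similar
# students statewide. Names and values are illustrative only.
student_sgps = {
    "teacher_a": [88, 72, 65, 90, 81],
    "teacher_b": [35, 42, 28, 51, 47],
}

# A teacher-level SGP is then summarized as the median of that teacher's
# students' growth percentiles.
teacher_sgp = {t: median(p) for t, p in student_sgps.items()}
print(teacher_sgp)  # {'teacher_a': 81, 'teacher_b': 42}
```

The median, rather than the mean, keeps a single unusually high- or low-growth student from dominating a teacher’s score.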

The sample in the pilots was even smaller in this case, but the EPAC report said there was a “positive correlation” between SGP and classroom observations.

In other words, teachers who appeared in the classroom to have mastered their craft also showed student performance gains. But Shulman acknowledged there were outliers, too, and implored districts to go back and look for reasons behind the divergence.

In one unnamed district, the bulk of teachers had close correlations, the report said. Still, a dozen teachers were found to have satisfactory ratings in classroom practice but were among the lowest third in student performance gains. Conversely, a half-dozen teachers had high SGPs but less than “effective” classroom ratings.

Elizabeth was the largest of the pilot districts and ultimately saw 2,000 of its teachers go through the new evaluation system, close to 300 of them with SGPs included.

Rachel Goldberg, the district’s director of staff development who oversaw the effort, said she was encouraged by the progress that her district made in the pilot.

She said there was a “strong distribution” of ratings across the spectrum for teachers, and a close correlation between classroom practice and student performance.

“We were incredibly heartened,” she said. “It means that our school leaders understood and recognized good instruction.”

Goldberg, who will serve on a new advisory panel created by the state to monitor the evaluation system, said there has been an overemphasis on the impact the system will have on teachers.

“This is really about the school leaders,” she said. “We have spent so much time focusing on the teachers, but what is important is what it says about the leaders. This is a whole shift in the game for them, from being a manager and building administrator to being an instructional leader.”