Early Implementation Findings From a Study of Teacher and Principal Performance Measurement and Feedback: Year 1 Report

David Manzeske

Many states and school districts see performance evaluation systems for teachers and principals as a way to improve educators’ instruction and student achievement. With funding from the U.S. Department of Education’s Institute of Education Sciences, AIR is examining the impact of a two-year intervention that shares performance feedback with teachers and principals. Informed by recent research, the intervention gave teachers feedback on classroom practice four times a year, as well as annual feedback on their contributions to student achievement. Principals received feedback on their leadership.

More than 100 elementary and middle schools in eight school districts across five states took part in the randomized controlled trial. About 1,000 fourth- through eighth-grade math and English teachers participated. The study was designed to track the intervention’s implementation and its effects on teacher classroom practice, principal leadership, and student achievement.

The current study report is the first of two. It focuses primarily on the implementation of the intervention in its first year. The authors found that:

Teachers in intervention schools received more than four times as many rounds of classroom performance feedback that included both ratings and a written narrative as teachers in schools without the intervention.
Even though observers completed specialized training and passed tests on how to rate instruction, they gave teachers performance ratings that were above the midpoint of the rating scale, leaving teachers little room to grow.

The authors also examined the reliability of the ratings given to teachers and principals—that is, the degree to which the ratings provided a consistent message from occasion to occasion or rater to rater. The authors found that:

The average scores from four classroom observations provided some reliable information about the quality of a teacher’s practice. A two-year average of a teacher’s value-added scores (a measure of the teacher’s contributions to students’ growth) was also reasonably reliable. But scores from a single observation had limited reliability, meaning that teachers received different messages about their overall performance from one observation to the next.
Teachers received a report with ratings on several dimensions of their practice each time they were observed. The classroom practice reports usually showed that teachers performed better on some dimensions and worse on others. However, the results varied from one observation to the next. Even after averaging the scores from four observations, the dimension ratings did not reliably identify an individual teacher’s weaknesses, so it was not clear which aspect of teaching each teacher needed the most help with.
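
The contrast between a single observation and a four-observation average can be illustrated with the Spearman-Brown formula, a standard psychometric projection of how reliability rises when several comparable measurements are averaged. This is only a sketch: the summary does not say which reliability method the authors used, and the single-observation reliability of 0.40 below is a hypothetical value chosen for illustration, not a figure from the report.

```python
def spearman_brown(single_reliability: float, n_measurements: int) -> float:
    """Projected reliability of the mean of n comparable measurements.

    Uses the Spearman-Brown formula: k * r / (1 + (k - 1) * r).
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must be between 0 and 1")
    if n_measurements < 1:
        raise ValueError("n_measurements must be at least 1")
    r = single_reliability
    k = n_measurements
    return k * r / (1 + (k - 1) * r)


# Hypothetical value for illustration only; not taken from the report.
HYPOTHETICAL_SINGLE_OBS_RELIABILITY = 0.40

if __name__ == "__main__":
    for k in (1, 2, 4):
        projected = spearman_brown(HYPOTHETICAL_SINGLE_OBS_RELIABILITY, k)
        print(f"{k} observation(s): projected reliability {projected:.2f}")
```

Under that assumed starting value, averaging four observations raises the projected reliability from 0.40 to roughly 0.73. The same logic suggests why dimension-level ratings can stay unreliable even after averaging: if a single dimension score starts from a much lower reliability, four observations may not be enough to lift it to a useful level.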

The study’s second report will examine the impact of the two-year intervention on teacher classroom practice, principal leadership, and student achievement.

Andrew Wayne

Michael S. Garet