The Uncertain Promise of Classroom Observations

This is the first of two blog posts about two new studies from AIR researchers and collaborators on the use of classroom observations for teacher evaluation.

Anyone who has spent time looking in on a classroom knows how much a visitor can learn about the teacher and the class. A visit provides an opportunity to see the teacher in action and to appreciate the skills needed to work with his or her particular students. Since formalized teacher evaluation systems need to capture exactly this kind of information, it makes sense to include classroom observations in them.

Summative teacher evaluations—the end-of-year measure of teacher performance sometimes used for high-stakes decisions such as who gets tenure or performance pay—are often based largely, if not completely, on classroom observation scores. While not all subjects and grade levels have relevant standardized test scores to create value-added scores for measuring teacher performance, all teachers can be evaluated through observation.

But how well do classroom observation scores help us understand how much a teacher has added to his or her students’ achievement? New research raises questions about the wisdom of basing high-stakes, summative teacher evaluations chiefly on classroom observations.

In a study just published online in Educational Evaluation and Policy Analysis, Matthew Steinberg and I used data from the Gates Foundation’s Measures of Effective Teaching (MET) study to investigate how well classroom observation scores capture teacher effectiveness. The MET study worked in six large, urban districts across the country over two consecutive school years. In the second year of the study, a sample of teachers was randomly assigned within their schools to classes in their fields. (For example, sixth grade English teachers to sixth grade English classes; eighth grade math teachers to eighth grade math classes.) Random assignment helped avoid concerns that the best teachers would be assigned to classes with the highest-performing students. This allowed us to better understand how each teacher contributed to student learning over the year. Teachers had lessons videotaped throughout the year, and MET study researchers used established classroom observation protocols to score teacher practices while watching the recordings remotely.

We looked at the relationships between the study teachers’ observation scores and their students’ performance in math and English. We found that by using only observation scores to measure teacher performance, we could not clearly capture teacher effectiveness in promoting student achievement. We also found that a teacher’s observation scores were not highly correlated over the two years of the study, suggesting that we need to better understand how observation scores capture teacher effectiveness over time.

Note that our research does not speak to the potential benefit of observations for instructional feedback, teacher coaching and professional development. Also, other research using the MET data found that teacher effectiveness can be captured by combining classroom observation scores with value-added scores and student surveys. This is promising, assuming these other measures are available.

Classroom observations may well make sense for discovering teachers’ instructional strengths and establishing areas for teacher growth, but for high-stakes evaluation and key personnel decisions, states and districts should be careful about relying too heavily on classroom observations.