Should We Stop Asking College Students to Evaluate Their Instructors?

At the end of every semester, at nearly every college in the country, millions of students fill out evaluations of their instructors. The forms ask sensible questions. Did the teacher communicate the material effectively? Was the teacher available to students?

Department chairs and deans take these evaluations seriously. At teaching-intensive institutions, they inform decisions about retention and promotion. A professor at a liberal arts college may be denied tenure because of lukewarm evaluations, and part-time faculty may be dismissed for the same reason. Even research faculty must contend with student evaluations of their teaching. The bar is lower, but it is there: do badly enough, and a professor may have a tough case for promotion.

However, new evidence suggests that the information gathered through student evaluations is not as trustworthy as once believed. Despite this new information, college officials continue to use questionable data to make their decisions.

One reason student evaluations of faculty are popular is that they have an intuitive justification: they measure consumer satisfaction. After people have paid for a service, such as a college class, it seems natural to ask: Did the teacher do a good job? Was the customer treated well?

If this sort of customer satisfaction survey works for your car insurance salesman, why wouldn’t it work for a teacher? For a long time, many academic researchers thought these evaluations were a good thing, and by the 1970s they were widespread in academia. Surely, the argument went, students could tell the difference between a punctual, prepared professor and a chaotic, disorganized one.

Supporting this basic intuition, a number of early studies seemed to show that evaluations were linked with actual learning. In some studies, researchers would look at multi-section courses like introductory math or science. Then, they would administer the same test to different sections and see if students learned more in sections where the teacher received higher student evaluations. In some of these early studies, such as P.A. Cohen’s “Student Ratings of Instruction and Student Achievement” in 1981 and John Centra’s Reflective Faculty Evaluation in 1993, the answer was “yes.”

Over time, however, this positive view of evaluations began to fall apart. Many researchers, such as MacNell et al. in 2015, found that student evaluations were associated with the gender of the instructor. Other research showed that teachers could raise their scores simply by smiling more or being more enthusiastic. In a famous 1973 study, “The Doctor Fox Lecture,” Naftulin et al. found that a professional actor delivering a lecture full of nonsense could still earn high evaluations from students. A 1982 meta-analysis by Abrami et al. confirmed that “instructor expressiveness” could drive student evaluations up without improving student achievement.

Eventually, researchers began to reassess the evidence and systematically test the theory that evaluations were valid evidence of learning. On re-reading the older studies, they found that many were based on small samples and yielded inconsistent results. To address this problem, researchers began to examine student evaluations in new ways. Some designed studies that looked at the performance of large numbers of students instead of a few small classes. With better study designs, the results were negative: a teacher’s evaluations were not linked to learning, as D.E. Clayson found in 2009.


By 2017, the evidence had mounted, and a team of educational researchers summarized the state of the field. More recent research showed no consistent relationship between evaluations and learning, and many studies showed that student evaluations were riddled with biases. As a measure of learning, the question appears to be settled: student evaluations are not a good way to measure learning, Uttl et al. argued in 2017. If one believes that evidence should guide policy, the verdict is clear: abolish student evaluations.

If student evaluations of teachers are so bad, why do colleges keep using them? One reason is simply habit. Like individuals, colleges and universities resist change, which is understandable. Another reason might be that college administrators do not appreciate the difference between student satisfaction and effective learning. Liking a teacher is not the same as having learned something. Student evaluations also offer college officials the illusion of rationality. Professors and administrators want to show the world that they evaluate employees in a logical way.

The lesson from research on student evaluations of teachers is that a simple and appealing solution to a problem is not always a good one. We cannot rely on students’ subjective reports as a proxy for knowledge gained. Why? Students sometimes need critical feedback before they can perform well. A 2010 study by Scott Carrell and James E. West of cadets at the U.S. Air Force Academy shows that the teachers who hand out lots of high grades are not always the ones who produce the best results. At the Academy, students are randomly assigned to instructors in basic courses such as calculus and physics. The researchers then looked at the link between the section a cadet took in an introductory course and that cadet’s grades in later courses in the same subject (e.g., Calculus I and II). The finding? Students in sections taught by older, stricter teachers did better in later courses. Learning isn’t always easy or fun. This is probably why, in some studies such as V.E. Johnson’s in 2003, evaluations correlate with the number of A’s given rather than with the learning achieved.

The fundamental issue is that colleges should probably not be judged by the logic of consumer satisfaction. Customer happiness makes sense as a metric for products meant to be immediately entertaining, like television. But education, especially higher education, is about learning, which by nature is often stressful and inconvenient. And the onus for success should fall at least as much on the student as on the teacher.