Student course evaluations aren’t worth much, and there are better ways

One of the first books about the state of higher education that I read after coming to the Pope Center was Generation X Goes to College by Peter Sacks. The book, published in 1996, recounts the author's first year of teaching at an unnamed college, one where most of the students had little intellectual interest or ability.

Sacks was teaching a course on journalism and assumed at the outset that his students would actually do the assigned work and put forth a serious effort. He was dismayed to find that few of them were even slightly interested in the course. Most just wanted to coast through with minimal effort, and they resented his criticism of their writing. Nevertheless, Sacks continued to teach the course the way he thought it should be taught—as a respectable, rigorous course.

When the semester was over, the students had their revenge.

That revenge took the form of blistering evaluations of his course. The students were unhappy that it hadn't met their expectations of an easy, ego-boosting A. When department administrators read the mostly negative evaluations, they called Sacks in for a conference in which they made it clear to him that he would not be rehired for a second year unless his evaluations improved greatly in the next semester.

Consequently, to save his job, Sacks undertook what he called his “Sandbox Experiment.” He watered down the course, made it more fun, and greatly reduced his criticism of student papers. His course evaluations for that semester were much higher, which pleased his superiors. When they asked if he had gotten this improvement by making it easy, he said that he hadn’t. That assertion was enough for the superiors.

Since Generation X Goes to College, many other professors have argued that the desire, and even the need, to avoid bad course evaluations plays a leading role in both grade inflation and curricular erosion. Abundant testimony to that effect from faculty members in various disciplines can be found on the website of the Society for a Return to Academic Standards, maintained by Professor Larry Crumbley of Louisiana State University.

Non-tenured faculty members desperate to keep their jobs try to appease their students in hopes of getting good evaluations. While evaluations may help identify faculty who are incompetent or unprepared, they exert a strong downward pull on academic standards.

A recent paper, “An Evaluation of Course Evaluations,” by two UC-Berkeley professors, gives a powerful indictment of the current practice of course evaluations and argues that there are better ways for colleges to find out which professors do a good job in the classroom and which do not.

The authors, professors Philip Stark and Richard Freishtat, write, “The common practice of relying on averages of student teaching evaluation scores as the primary measure of teaching effectiveness for promotion and tenure decisions should be abandoned for substantive and statistical reasons: There is strong evidence that student responses to questions of ‘effectiveness’ do not measure teaching effectiveness.”

Relying on standard student course evaluations turns college teaching into a “popularity contest” in which good professors often get bad ratings and bad professors get good ones. Furthermore, anxiety over the prospect of bad evaluations leads faculty to stick with “safe,” tried-and-true methods of teaching rather than trying anything innovative.

Stark and Freishtat put their statistical expertise to use (the former holds an appointment in Berkeley’s statistics department, the latter in the university’s Center for Teaching and Learning), noting first that course evaluations suffer from a response-bias problem. Anger, they point out, motivates people far more than satisfaction does; course evaluations are therefore apt to be weighted more heavily by the responses of disaffected students than by those of students who found the professor at least acceptable.

Also, many factors other than the professor’s competence come into play, such as the time the class met. For that reason, it makes no sense to compare an individual instructor’s average on evaluations with course or departmental averages—but that is often done.

That is not to say nothing valuable can be gleaned from student evaluations, the authors argue. Instead of fixating on the numbers students assign to an instructor’s course, administrators should pay attention to student comments, which at least might be insightful.

Stark and Freishtat write, “Students are ideally situated to comment about their experience of the course, including factors that influence teaching effectiveness, such as the instructor’s audibility, legibility, and perhaps the instructor’s availability outside class…. They might be able to judge clarity, but clarity may be confounded with the difficulty of the material.”

Thus, it would make sense for department heads to read student comments carefully, even though some of them may be written more in a spirit of revenge than in a helpful, objective one.

What would be better than the typical student evaluation? “If we want to assess and improve teaching, we have to pay attention to the teaching,” the authors write. Toward that end, Berkeley’s statistics department has adopted a more “holistic” method for evaluating its faculty members. That evaluation includes review of a portfolio containing a teaching statement, syllabi, notes, assignments, exams, and anything else the faculty member thinks pertinent.

Crucially, the department also has a senior member attend at least one class and give written comments on it. That takes quite a bit of time, but it is time very well spent. Senior attorneys spend time working with and mentoring young attorneys, and experienced surgeons oversee and instruct young surgeons. It is rather remarkable that such collaboration is so rare in the academic world.

Instead of true oversight of faculty teaching quality, most colleges and universities simply assume that faculty will do a good job because they have gone through the academic credentialing process (that is, they’ve earned their terminal degrees) and that student evaluations will suffice to indicate those who are wasting their students’ time.

It is not surprising that serious oversight should manifest itself in a department like statistics where there are right and wrong answers and where students must master concepts at lower levels before they can possibly continue to more advanced work. The quality of instruction really matters in statistics and other disciplines where there is a body of knowledge that students either learn or don’t learn.

For that reason, I don’t expect the UC-Berkeley Statistics Department’s strong system for ensuring that courses are taught well to spread into “soft” fields where there are no wrong answers and it hardly matters how much students have learned. But at least there is a good model available for any department that sees the importance of going beyond unreliable student course evaluations.