The Emptiness of College Rankings — The James G. Martin Center for Academic Renewal

Inputs are out, and outcomes are in. So goes the current thinking about the methodology used to create college rankings. The publications that tell us which colleges are best, next best, and on down the line often revamp their formulas, and the emphasis has moved in recent years to evaluating institutions by the status of their students after graduation rather than their status—and institutional conditions—when they enter.

The industry’s behemoth, U.S. News, responded to ongoing criticism by touting, about this year’s best-colleges list, “We want to ensure the educational resources we provide emphasize the outcomes for graduates … This, in conjunction with the other outcomes-focused measures we are adopting.” The now-favorite word was echoed in a recent Wall Street Journal headline about its own formula: “The WSJ/College Pulse College Rankings: Measuring Outcomes, Not Inputs.”

Forbes’s approach, in keeping with the magazine’s purpose, keys on financial outcomes such as alumni salaries, debt, and return on investment. Money magazine does likewise, while also including inputs. Washington Monthly divides its concentration among financial outcomes, research, and community and national service (e.g., Peace Corps participation and voter engagement).

Inputs are not entirely gone. They still have a presence in Money’s formula and carry considerable weight in U.S. News’s in spite of the image it wants to cultivate. But the logic behind outcomes is dominant—the idea that a ranking formula is a recipe, such that what should be rated is the final product rather than the particular ingredients. Thus, factors like SAT scores, faculty salaries, spending per student, and prestige have lost favor to calculations about graduation rates, alumni earnings, and debt, with new rankings often including a value-added calculation that gives credit to otherwise unheralded schools.

Amidst the rush toward outcomes, two key points about college rankings have been lost. One is that inputs are not all bad, even if the ones commonly used are. More about this shortly. The other is that the most important element in the college experience is largely ignored: the learning process that occurs over four years and is supposed to result in a broader and deeper content knowledge, along with the maturation of critical-thinking skills. Who would deny that this is the purpose of college and therefore the most important thing that rankings should address? Yet herein lies their greatest fault.

The reason for this void is an inability on the part of college rankers to come up with relevant data for measurement. What is needed is an assessment of the what and how of the learning experience. It is easy to imagine comparing the curricular structure of one school to that of another. This is an input variable that belongs in any reputable ranking system, and the pertinent information can be found by perusing college catalogs.

Evaluating the portion of the curriculum devoted to majors would be problematic, because of the variance among institutions in the majors they offer or emphasize. General education, however, is fertile territory to explore, since nearly all schools have it as a requirement. But what standard should be used to separate better programs from worse ones? Until roughly 60 years ago, general education consisted of a set of specific basic courses in the various disciplines, assembled to ensure a broad liberal-arts background that varied little among institutions. But the structure loosened over time to the point that, today, students at most schools acquire it by picking one course from each of several categories, often having been offered long lists of possibilities. Critics decry a lack of solid content in this arrangement, while defenders say that students can learn critical thinking regardless of the particular subject matter.

In an effort to assess content in general education, the American Council of Trustees and Alumni produces its annual “What Will They Learn?” ratings, giving each school a grade of “A” to “F” based on how many of seven basic subjects are required: at least one course each in composition, literature, U.S. government or history, economics, mathematics, and natural science, as well as intermediate level-1 competence in a foreign language. These are merely categories, with limitations on what courses count but no specific course requirements. Offering six of the categories results in an “A,” while four rates a “B.” In 2023, more than two-thirds of the 1,100 schools rated scored below a “B,” with many elite names earning a “D” or “F.” The standard ACTA has devised is not strict, but in spotlighting the number of schools that fail to meet it, especially name brands, it (or something in the same vein) stands little chance of being adopted by U.S. News and company.

Besides the strength of the curriculum, the other key component in assessing students’ learning experience is the quality of instruction. Previously used input measures meant to assess teaching (such as faculty salaries, per-student expenditures, and student-faculty ratio) are now in disfavor for reflecting institutional wealth rather than indicating what goes on in classrooms. While these measures indeed do not belong, one input that is relevant is class size. It is widely acknowledged that small classes produce better learning. The standard needs to be not how many small-enrollment courses an institution offers (as with U.S. News, before it dropped class size from its formula this year) but what percentage of each student’s learning experience consists of them. That would account for schools that list many small seminars even though their students get a significant amount of their learning from large lecture courses.
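The gap between those two standards can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the course list is invented, the 20-student cutoff for a "small" class is an arbitrary choice, and every enrolled seat is treated as an equal unit of learning experience (credit hours are ignored). None of the names or numbers come from any ranking publication.

```python
from dataclasses import dataclass

# Assumed cutoff for a "small" class; hypothetical, not taken from any ranker.
SMALL_CLASS_MAX = 20


@dataclass
class Course:
    name: str
    enrollment: int


def share_of_small_courses(courses, cutoff=SMALL_CLASS_MAX):
    """Catalog view: fraction of course offerings that are small."""
    small = sum(1 for c in courses if c.enrollment <= cutoff)
    return small / len(courses)


def share_of_seats_in_small_courses(courses, cutoff=SMALL_CLASS_MAX):
    """Student view: fraction of all enrolled seats that are in small courses."""
    total_seats = sum(c.enrollment for c in courses)
    small_seats = sum(c.enrollment for c in courses if c.enrollment <= cutoff)
    return small_seats / total_seats


# Invented example: eight 15-student seminars and two large lectures.
courses = [Course(f"Seminar {i}", 15) for i in range(1, 9)]
courses += [Course("Lecture A", 300), Course("Lecture B", 280)]

catalog_share = share_of_small_courses(courses)
student_share = share_of_seats_in_small_courses(courses)
print(f"Small courses as a share of offerings: {catalog_share:.0%}")
print(f"Share of student seats in small courses: {student_share:.0%}")
```

In this made-up case, eight of the ten offerings are small, yet only 120 of the 700 enrolled seats are in them, so the catalog-level figure greatly overstates how much of a typical student's coursework takes place in small classes.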

What about outputs to measure learning? One proposed answer is exit testing. While this may sound straightforward, the devil is in the details. What is to be tested? There are existing instruments, such as the GRE and the CLA (Collegiate Learning Assessment), that measure general cognitive skills but not the important element of content knowledge. These tests are administered voluntarily. If they were required, how would students respond? Would elite schools participate, knowing what they could lose? In general, how many colleges would abstain, realizing that a value-added calculation would require entrance testing, as well, to determine how much improvement there has been in four years of college? Finally, it would be difficult to overestimate the howls that would come from educators decrying a new boon to the testing industry.

Another outcome measurement, used currently by WSJ, surveys students about the quality of their overall learning experience. However, responses are voluntary and from only a portion of the student body. A related idea is to use the end-of-course evaluations that students fill out at most colleges. The problem here is that these ratings are highly suspect, with studies pointing to various flaws, including the tendency of students to reward professors who give high grades and punish the ones who maintain a strict standard.

Besides the dearth of relevant inputs to assess the learning experience, as well as drawbacks for the potential outcomes needed to do so, a common problem exists for both. They approach their objective from the edges. What is needed is for expert observers to go inside college classrooms to see what is really going on. If this were done using a standard evaluation tool, we might have a reasonably accurate picture of student learning. Several such tools are available, for instance from the Lumina Foundation and the Association of American Colleges and Universities, but practicality gets in the way of applying them on a mass scale. Finding experts to travel to all of the colleges being evaluated—and sit in on a large and representative sample of classes—would be prohibitive in terms of time. Multiple visits per course would be needed to ensure representative data, and multiple observers to offset bias. Simply put, it is not feasible to obtain the sort of information necessary for a comparative evaluation of the learning experience that colleges provide.

With the emphasis now on outcomes in determining college rankings, what has been accomplished? What has not? Wealth and status-related inputs are on the wane, as they should be, but the industry’s heavyweight still employs them, which ensures that the same cast of characters gets top billing every year. Helpful inputs such as class size and assessments of the curriculum are left out. With the formulas changing regularly and little consensus about just what factors should be included, what can be said about the popular lists of best-to-worst colleges? In the words of one expert analyst (a former college president and author of a book about rankings) who favors the recipe metaphor mentioned earlier, “In my experience, the efforts of U.S. News and its followers to produce best-college rankings have typically wound up with the equivalent of gruel.”

Yet, in spite of their critics, we know from three decades of experience that college rankings sell magazines. They do it by presenting a seductive array of statistics in a variety of categories. Whatever it is that all of that adds up to, it does not tell us where to go to get the best college education.

William Casement is a former philosophy professor and art dealer. He is the author of numerous articles across several disciplines and of books on the literary canon, reforming higher education, and art forgery.