Are College Exit Exams a Valid Measure of Learning? It’s Complicated

Given the enormous public and private investment in US higher education, of course we should evaluate its effectiveness. But how?

It is claimed that over 200 higher education institutions administer the one-size-fits-all Collegiate Learning Assessment (CLA). When administered pre-post—that is, near the beginning and again near the end of a student’s program—the difference in scores on equivalent forms of the same test (i.e., the “gain score”) is taken to represent how much students have learned in that program. Or does it?

Everyone knows that any one test cannot be valid in all contexts. Administering an advanced calculus exam to kindergarteners would not tell us much, for example, nor would administering it as a college exit exam for art majors. College students study a wide variety of topics.

According to the CLA’s owner, the Council for Aid to Education (CAE),

One of the unique features of CLA+ is that no prior knowledge of any specific content area is necessary in order to perform well on the assessment.

Given that much of a student’s time in college is devoted to accumulating knowledge of specific content, this seems problematic. And according to cognitive scientists, it is. “Higher-order” skills, such as lateral thinking and experimentation, depend on the accumulation of a critical mass of knowledge. Content-free or generic skills do not exist.

If not from cognitive scientists, then, where does the belief in generic skills come from? Ed schools. The cynic in me wants to classify this as another attempt by US educators to hide from meaningful measurement. One of them might respond, however, that factual content is readily available just a mouse click away on the internet. That is true, but only in isolated, disaggregated form.

Look beyond the college promotional froth about building better citizens, molding character, and teaching “higher-order skills,” such as “reasoning, critical reading and evaluation, and critique.” What remains is the more measurable and unfairly derogated benefit of “recall of factual knowledge,” which the CLA eschews.

The CLA cannot do what it purports to do: measure the effectiveness of a college education.

Yet, paradoxically, the CLA is predictive of students’ future competence in employment. How? Because it is largely a test of general cognitive ability—more like an aptitude test than an achievement test—with some measurement of writing, vocabulary, and argumentation thrown in. Voluminous research in personnel psychology has shown that standard tests of general cognitive ability, such as IQ tests, predict success in most jobs better than any other single available factor (e.g., interviews, recommendations, school grades).

General cognitive ability, however, doesn’t change much in college. Though politically incorrect even to mention the fact, some people have more of it to begin with through the luck of their genetic configuration. Some nurture it better than others by habit (e.g., engaging in “thinking” activities rather than watching television, keeping physically fit). Moreover, general cognitive ability can be affected as much by activity outside the classroom as inside it.

Employers may be better off ignoring CLA scores and instead administering a simple and inexpensive “off the shelf” cognitive ability test to their applicants.

Ohio University economist Richard Vedder proposes the development of a more complex examination that would add a potpourri of content to a CLA-like core.

The 3½-hour test would measure critical reasoning and writing skills by asking students to analyze information from a variety of sources and write a persuasive essay to build a case. That would be followed by a two-hour, 100-question multiple-choice test that could assess a student’s grasp of math (10 questions), natural and biological science (10 questions), statistics and data analysis (five questions), literature (10 questions), history and government (10 questions), economics (10 questions), a foreign language of the student’s choice (10 questions), psychology (five questions), and geography (five questions). The last 25 questions could come from a subject of the student’s choice—presumably their major.

The 25 questions in a student’s major bring us closer to a valid content-based exit exam; the same cannot be said for the array of five questions here and ten questions there. Five questions are far from enough to form a reliable or valid measure of any student’s mastery of psychology, or even of a single subfield of psychology. And for the many students who take no psychology courses in their college programs, those five questions are completely invalid measures of their college learning.
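To put rough numbers on that intuition, consider the Spearman-Brown prophecy formula, a standard psychometric estimate of how reliability falls as a test is shortened. The figures below are illustrative assumptions only (not actual CLA or Major Field Test statistics): suppose a full-length, 50-item psychology exam has a reliability of 0.90, and we cut it to five items, a length factor of k = 5/50 = 0.1.

\[
\rho' \;=\; \frac{k\,\rho}{1 + (k - 1)\,\rho}
\;=\; \frac{0.1 \times 0.90}{1 + (0.1 - 1) \times 0.90}
\;=\; \frac{0.09}{0.19}
\;\approx\; 0.47
\]

A reliability near 0.5 falls well short of the levels (roughly 0.8 and above) conventionally expected of scores used to make consequential judgments about individual students, let alone entire institutions.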

For those who would like to measure the effectiveness of a college education, there simply exists no valid, one-size-fits-all solution. Thorough and valid evaluation of college learning would require a large array of sufficiently long content-based exams—on the order of 50 questions or more for each subject area (and that is before even considering the arts).

Before giving up all hope, however, note that some content-based exams already exist. Several fields offer credentialing exams shortly after college, for example. In the United States, these are more common at the graduate level (e.g., law, medicine). Still, some undergraduate fields, such as nursing and accounting, administer entry-to-the-profession tests that could serve as valid exit measures of students’ college programs.

And the Educational Testing Service (ETS) offers over a dozen Major Field Tests in standard subject areas within the broad categories of business, social sciences, humanities, and STEM. Each of them is subject-specific, content-based, and sufficiently long.

There also exists a wide variety of subject-specific K-12 teacher licensure exams, some of which might be converted, with some effort, into college student exit exams—though, granted, each would require new, appropriate validation.

But even subject-specific content tests—presumably administered in a student’s major field—remain problematic as measures of college learning. Some students sample a wide variety of topics while in college and may accumulate only the minimum number of credits required for a major, whereas others accumulate as many credits in their major as they can. The latter type of student would likely perform better on a major field exam even though one might argue that they are less well educated (i.e., less broadly educated).

Moreover, colleges can offer a great deal of variety even within a major field. Indeed, traditionally, institutions were considered superior when they offered more variety; a single major field exam might encourage less. Consider a college that offers one of the few courses in Central Asian history available in the United States (and employs one of the country’s few expert historians teaching Central Asian history). Any universal history exit exam will likely cover only those topics generally available across all colleges. If the public and accreditors judge this college by its students’ scores on a single national history exit exam, how likely is the Central Asian history course to remain in the college’s course schedule? How likely is that unique expertise to remain available to the US public, military, businesses, and policymakers when they need it?

ETS’s Major Field Tests attempt to address this problem by telling colleges, “You can also add up to 50 locally authored questions to cover topic areas unique to your program.” Adding locally authored (and, presumably, locally scored) items to a national test, however, brings us back to square one—the exit exam would be neither standardized nor comparable across institutions.

Obviously, I remain skeptical of any reliance on college exit exams to judge or compare individual students or institutions. The statistical conundrum is not unusual: in many contexts, the variety of preferences and goals is so wide that no single measure, or single type of measure, can validly represent performance. That said, I do not oppose all college exit exams, provided their scores and any resultant rankings are issued with appropriately lengthy disclaimers.

Realistically, however, if institution-level exit exam scores are made public, they will surely end up as components in college ranking schemes. There, they might do some good by adding an outcomes balance to rankings now overly reliant on input measures.

I recommend accepting that college education has, and should have, a wide variety of purposes and goals. That variety can only be validly judged with a corresponding variety of outcome measures.

Multiple measures for judging college performance could include graduates’ starting salaries, time to employment, alignment of employment to major field, graduation rates, and more. The federal government has been pressuring colleges to collect and publish more outcome measures.

In this light, the magazine Washington Monthly should be applauded for its efforts to develop and publish alternative college performance metrics. Different citizens value different qualities in higher education institutions, and performance metrics should account for that variety. Washington Monthly ranks colleges by some unusual but intrinsically meaningful metrics. For example: “best bang for the buck” colleges (using a ratio of graduates’ starting salaries to college costs); “contribution to the public good” colleges (using a combination of measures of social mobility, research, and promoting public service); and colleges “doing the most to turn their students into citizens.”

With so many performance measures and so many college rankings, will every college find itself ranked high in something? No. Some will not rank highly on any measure, and those may be the colleges to avoid.

Richard P. Phelps is founder of the Nonpartisan Education Group, editor of the Nonpartisan Education Review, a Fulbright Scholar, and a fellow of the Psychophysics Laboratory. He has authored, edited, and co-authored books on standardized testing, learning, and psychology.