American Council of Learned Societies
Occasional Paper No. 33
The Professional Evaluation of Teaching
Wilbert J. McKeachie
Student Ratings of Teaching
Wilbert J. McKeachie
The Peer Collaboration
and Review of Teaching
How Evaluations of Teaching
Are Used in Personnel Decisions
The American Council of Learned Societies is committed to the professional model of
the teacher-scholar: the teacher who is devoted to constant exploration, the scholar who places
research in the context of challenging the next generation. How the one activity serves the other was explored by Francis Oakley in ACLS Occasional Paper No. 32,
Scholarship and Teaching: A Matter of Mutual Support.
Within the community of scholars we have developed widely accepted procedures, grounded
in peer review, for evaluating scholarship. We use these procedures to improve works while they
are still in manuscript, to make judgments about what to publish, and to provide a basis for
personnel decisions. There is less agreement on how we should evaluate teaching for purposes of
improvement or judgment. What are appropriate professional approaches to the evaluation of teaching?
This Occasional Paper on The Professional Evaluation of
Teaching originated in presentations at the 1996 Annual Meeting in Washington, D.C. Wilbert J. McKeachie, professor of psychology at the University of Michigan, and delegate to the ACLS from the American Psychological Association, summarizes a considerable body of research on student evaluations, including work of his own, showing that student evaluations have considerable validity and are not subject to a number of
biases of which they are frequently suspected.
Pat Hutchings, director of the Teaching Initiatives Group of the American Association
for Higher Education (AAHE), reports on a project she directs on peer review of teaching. She stresses a primary focus on establishing a culture that nurtures the improvement of teaching through peer collaboration. A dozen institutions are participating in the pilot project, as are several learned societies which belong to the ACLS.
James England, provost of Temple University and former provost of Swarthmore
College, addresses how evaluations of teaching are used in personnel decisions. He supports using
both student- and peer-evaluation approaches (Temple University is participating in the AAHE
Project); McKeachie and Hutchings also see student and peer evaluation as complementary, not
competing, approaches. In addition, England sketches a third approach, one that would focus on assessments of what students actually learn.
We hope this Occasional Paper will provide a useful perspective on what constitutes
appropriate professional evaluation of teaching for faculty, deans, department chairs, board members, and others.
Copyright © 1996, Wilbert J. McKeachie
Student Ratings of Teaching
Wilbert J. McKeachie
University of Michigan
In 1946, when I began teaching at the University of Michigan,
the faculty had already voted that student ratings of teaching should
be collected in all classes, and I used student ratings along with data
on achievement in my first research study on effective teaching in the
winter of 1947.
In 1949 Dean Hayward Keniston gave me a graduate assistant
to collect data and review research on student ratings. We found
that student ratings were collected at a number of universities and
colleges, including Harvard, the University of Washington, and Purdue.
In 1951 the program for evaluating teaching came up for review.
I remember the heated debate in the College of Literature, Science
and the Arts about the recommendation of the College Executive
Committee that the college continue to require collection of student ratings in
all courses. Some faculty members felt strongly about the impropriety
of students presuming to express opinions about a professor's
teaching. Encouraging students to think that they were qualified to
make judgments about teaching would destroy the proper respect
students should have for the faculty. Others asserted with great fervor
that teaching is an art; it is impossible to evaluate in terms of some form
The result of the debate was the adoption of an open-ended form
to be administered in all courses. There were five questions:
- What do you think are the objectives of the course?
- What is your judgment of the value of this course in your education?
Please point out both its contributions and its deficiencies.
- To what extent did you learn to think critically in the area
covered by this course?
- Assuming you have the time and opportunity either in college or
later, do you think you would be inclined to pursue interests in
this area? Explain.
- Keeping in mind that the returns from this questionnaire will be
used by the instructor in improving his teaching, please mention
any other aspects of the course or instructor (such as, for example,
clarity of presentation) not covered in previous questions which
you consider to be especially good or poor, and offer any
suggestions you have for the improvement of the course.
As I look at this almost half a century later, I'm impressed. I don't
think we do any better today.
During the period of student activism in the 1960s and 1970s,
there was a great increase in the use of student ratings. As Pat
Hutchings indicates, they are now used in most colleges and universities.
Nonetheless student ratings are still controversial. Most of us
are sensitive about being evaluated, and anytime the results are
negative it is natural to question the validity of the evaluation. And there are
some negative evaluations in almost every class. That's not
surprising; teaching that is effective for some students is not equally effective for others.
Some years ago I was a member of a committee administering
grants to senior faculty members who proposed to construct or modify
their courses to emphasize thinking. At an end-of-the-year dinner for
the participants, the discussion turned to student ratings, and the
usual criticisms were raised.
"Students don't really appreciate a good course until they are out of college."
"Students can't really judge how well they are learning."
"Students only give high ratings to courses with low standards."
It happened that Herb Marsh, a professor at the University of
Western Sydney, was visiting me at the time, and I had invited him to be my
guest at the dinner. He is probably the world's leading researcher on
student ratings of teaching, and as a guest he kept quiet as long as he
could. But finally he could stand it no longer and said, "You know, there's
a good deal of research evidence on the issues you've raised."
A prominent historian immediately retorted, "We don't care
about research evidence; we have our own experience."
So much for teaching critical thinking!
In any case, I have heard Kenneth Feldman (the preeminent
reviewer of research on student ratings) say that there have been over
2,000 articles published on student ratings--well over 1,000 of which
present research evidence. In fact, we probably have more good research
on student ratings than on any other aspect of higher education.
There are three major uses of student ratings:
- Student guidance in choice of courses.
- Improvement of teaching.
- Evaluating teaching for use in personnel decisions; e.g., tenure or
merit salary increases.
How well do student ratings achieve these purposes?
1. Student Guidance
This is first by right of seniority. Student ratings were first
collected, I believe, at Harvard University and published to provide guidance
for students in choosing courses. Faculty members usually presume
that students are thus likely to choose the easiest courses, but in a study
we ran several years ago, we found that, as compared with an
uninformed control group, students given student ratings of two alternative
introductory courses chose the more highly rated course, even though it
was rated as requiring more work (Coleman and McKeachie).
2. Improvement of Teaching
Harvard was not alone in using student ratings. In the
mid-1920s Herman Remmers of Purdue University began a program of research
on student ratings that made substantial contributions for over
four decades. His studies are still among the best that have ever been done.
Remmers and his students found:
- In multi-section courses, the teachers of those sections achieving
higher scores on classroom examinations are rated higher than those teachers whose students have not learned as much. Moreover, if a teacher aims a course at the top students, those students
give higher ratings than the rest of the class. Conversely, if a teacher is particularly effective with the poorer students, those students rate the teacher higher (Elliott).
- Ratings of teachers by alumni 10 years after graduation
correlate well with ratings of the same teachers at the end of a
course (Drucker and Remmers).
- Student characteristics such as age, sex, class standing, and
grade in the course have little effect on ratings of teaching (Remmers and Brandenburg).
The research of Remmers and those who have followed him also
strongly indicates that:
- Student ratings returned to faculty members result in
some improvement of teaching, but not very much.
- There is more improvement if behavioral items are used
rather than more abstract, general terms. For example, instead of
asking about clarity, ask "Uses concrete examples" or "Fails to define
new terms" (negative), or instead of an item on organization,
use "Reviews topics from previous lecture" or "Puts outline on
the blackboard" (Murray).
- There is substantial improvement when the ratings are
discussed with another teacher (McKeachie et al. 1980).
3. Personnel Decisions
If student ratings are part of the data used in personnel decisions,
one must have convincing evidence that they add valid evidence of
teaching effectiveness. I have already reviewed Remmers's extensive
validity studies. They have been replicated at other universities. In
general, better teachers (as measured by student learning) are rated higher
by students. In addition, there is evidence that students of faculty
members who are rated highly are more likely to be motivated to further
learning as indicated by election of advanced courses in the same field.
Highly rated teachers also produce more change in attitude
sophistication (McKeachie, Lin, and Mann). The instructor's own judgment
also correlates well with student ratings. Marsh found that if you
asked instructors which of two classes went better, their judgments
agreed well with the student ratings of the classes.
Finally, for this assembly, perhaps the most interesting evidence
of validity is that humanities teachers are rated as being more
effective than teachers of science, math, and engineering (Feldman). There
is also fairly persuasive evidence that humanities teachers are
actually better teachers. Humanities teachers:
- Are more expressive--move around more, use more gestures.
- Know students' names, encourage questions, ask questions.
- Show an interest in student ideas, show concern for
- Ask more questions requiring analysis and synthesis on
exams; science and technology teachers ask more rote memory
questions (Murray & Renaud).
These are characteristics that lead to longer-term retention and
greater gains in thinking and motivation.
But aren't there biases or contextual factors that can invalidate
student ratings? Probably the most common criticism by uninformed
faculty members is that you get good ratings by "dumbing down" your
course. Cutting down the amount of work will, they feel, inevitably result
in higher student ratings. The facts, however, indicate that this is
not generally true. Student ratings of teaching are higher for courses that
are rated as requiring more work or that are more difficult.
Undoubtedly, there is a limit. If a course is pitched above the students' heads, or if
the course requires more work than most students can do, so that
less learning results, student ratings will be lower than for courses that
result in better learning.
Generally, small classes are rated higher than large classes,
but research shows that small classes are more effective than large
classes in producing changes in thinking, motivation, and attitudes
(McKeachie). Similarly, there are often small differences between required
and elective classes and lower-level vs. higher-level classes.
But the great concern about bias is based on the idea that we
should be able to compare teachers to one another, that the number 3.1
should signify better teaching than 3.0, that we should be able to compare
two teachers teaching in different departments at different class levels
with different students. I argue that this is neither necessary nor
desirable. In fact, for promotion and salary decisions we do not need to make
such comparisons. For the decision about promotion we really only need
two categories--good enough to promote or not promotable. Even
for salary increases we need only to determine whether the teacher
is excellent, good, adequate, or in need of help. We can determine
these categories simply by looking at the distribution of the student
responses. What proportion of the students give favorable ratings?
We don't need to figure averages to a decimal point.
Comparing teachers with averages such as 4.3 and 4.1 is like comparing apples
to oranges. We can tell a good apple or a good orange, but
judging whether a good apple is better than a good orange is a much
more difficult task.
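As a rough illustration of reading the distribution of responses rather than the average, the sketch below computes the proportion of favorable ratings for two classes. Everything specific in it is an assumption made for the example and is not drawn from the research discussed here: the 1-to-5 scale, the cutoff of 4 for a "favorable" response, the function name, and the two made-up sets of ratings.

```python
from statistics import mean


def proportion_favorable(ratings, favorable_at=4):
    """Share of responses at or above the cutoff counted as favorable.

    The cutoff of 4 on a 1-to-5 scale is a hypothetical choice.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(r >= favorable_at for r in ratings) / len(ratings)


# Hypothetical ratings from two classes of ten students each.
class_a = [5, 5, 5, 5, 5, 4, 4, 4, 3, 3]
class_b = [5, 5, 5, 4, 4, 4, 4, 4, 3, 3]

for name, ratings in (("A", class_a), ("B", class_b)):
    # Print the class label, the mean rating, and the favorable share.
    print(name, round(mean(ratings), 1), proportion_favorable(ratings))
```

In this invented case the means differ (4.3 and 4.1), yet the same share of students, 8 in 10, gives a favorable rating in each class, so both teachers would fall in the same broad category.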
It is clear that student ratings have the potential to
contribute positively both to improvement of teaching and to the quality
of personnel decisions about teaching. The problem is not in the
ratings but in their use.
Students. Student time is used to fill out the ratings, but the
students get little benefit from the time they invest. They are not encouraged
to think about their own learning and their own responsibility for
learning. Answering the questions should be an educational experience, not
a mindless appraisal of the teacher.
Forms. The forms used in many colleges and universities are not
as useful as they could be. Often a college or department requires that
a set of standard items be used. Typically such items are not as
applicable to the specific course as would be the case if the teacher developed
or chose items specifically about aspects of the course. Moreover, the
very fact that the items are mandated is likely to lead to resentment.
Norms. In order to conclude that a teacher is reasonably good,
what percentage of his or her students would you expect to rate the
teacher as excellent? Ten percent? Twenty percent? Fifty percent? Certainly,
if at least half of your students think you are excellent, you can't be
too bad. At the University of Michigan, over 90 percent of the faculty
are rated as excellent by the majority of their students; yet when the
faculty members look at their results, almost half of those rated as excellent
by the majority of their students find that they are below average. This
is discouraging and is more likely to result in a loss of motivation than
in increased enthusiasm for teaching.
Evaluators. Whatever the source of data--student ratings,
peer evaluation, gossip--some committee or administrator has to make
an evaluative judgment. Students are not the evaluators; they
simply provide data to the evaluators. In most universities the initial
evaluation is made by peers--faculty members elected or appointed to a
committee that reviews the evidence for promotion or merit increase in salary.
A key element is the good sense of the evaluators.
Unfortunately, many evaluators have stereotypes about what constitutes good
teaching, despite the fact that there are many ways to be effective. Thus,
they may undervalue a teacher because the students' judgments of their
own learning may not fit with the pattern of ratings on such
characteristics as organization or enthusiasm, or other characteristics usually
associated with effective teaching.
Often the evaluators give less weight to the student ratings than
to less dependable evidence, such as peer observations of
teaching, testimonials, or general impressions of the teacher's personality.
In an effort to be objective, the evaluators may substitute
arbitrary criteria for reasoned judgment. Thus, they may set as a criterion
for promotion that the teacher must exceed a certain numerical
mean on student ratings, without consideration of what the teacher is
trying to accomplish, the circumstances under which the teacher has to
work, the kind of course being taught, the nature of the students, and the
many contextual factors that should temper their judgment.
What can we conclude? As Pogo (or one of the
"Pogo" characters) said, "The enemy is us."
Coleman, J., and W.J. McKeachie. "Effects of Instructor/Course
Evaluations on Student Course Selection." Journal of Educational
Psychology 73 (1981): 224-26.
Drucker, A.J., and H.H. Remmers. "Do Alumni and Students Differ
in Their Attitudes Toward Instructors?" Journal of Educational
Psychology 42 (1951): 129-43.
Elliott, D.N. "Characteristics and Relationships of Various Criteria
of Teaching." Diss. Purdue University, 1949.
Feldman, K.A. "Course Characteristics and College Students' Ratings
of Their Teachers: What We Know and What We Don't."
Research in Higher Education 9 (1978): 199-242.
Marsh, H.W. Students' Evaluations of University Teaching:
Research Findings, Methodological Issues, and Directions for Further
Research. Elmsford, NY: Pergamon, 1987.
McKeachie, W.J. Teaching Tips: Strategies, Research, and Theory
for College and University Teachers. 9th ed. Lexington, MA: D.C.
McKeachie, W.J., Y-G Lin, M. Daugherty, M.M. Moffett, C. Neigler,
J. Nork, M. Walz, and R. Baldwin. "Using Student Ratings and
Consultation to Improve Instruction." British Journal of Educational
Psychology 50 (1980): 168-74.
McKeachie, W.J., Y-G Lin, and W. Mann. "Student Ratings of
Teaching Effectiveness: Validity Studies."
American Educational Research Journal 8 (1971): 435-45.
Murray, H.G. "Low-Inference Classroom Teaching Behaviors
and Student Ratings of Teaching Effectiveness."
Journal of Educational Psychology 75 (1983): 138-49.
Murray, H.G., and R.D. Renaud. "Disciplinary Differences in
Classroom Teaching Behaviors." Disciplinary Differences in Teaching
and Learning: Implications for Practice. New Directions for Teaching
and Learning. No. 64. Ed. N. Hativa and M. Marincovich. San
Francisco: Jossey-Bass, 1995.
Remmers, H.H., and G.C. Brandenburg. "Experimental Data on
the Purdue Rating Scale for Instructors." Educational
Administration and Supervision 13 (1927): 519-27.
Copyright © 1996, Pat Hutchings
The Peer Collaboration and Review of Teaching
Pat Hutchings
American Association for Higher Education
Teaching Initiatives Group
My piece of this picture, as I understand it, is to talk about the
role of faculty in the evaluation of teaching--peer review, if you will. I
do so in the context of a national project I've been involved with for
the past several years, a project of 12 universities, working in
pilot departments, coordinated by the American Association for
Higher Education (AAHE), in partnership with Lee Shulman at
Stanford University, and funded by the Pew Charitable Trusts and the
William and Flora Hewlett Foundation--which pretty much captures all the
vital statistics in one sentence. The project, entitled "From Idea to
Prototype: The Peer Review of Teaching," was launched in January of 1994 at
the AAHE's National Conference on Faculty Roles and Rewards, largely
in response to emerging campus recommendations--first in the
1991 "Pister Report" at the University of California, but now widely
heard--that teaching, like research, should be peer-reviewed. The intent was
to help campuses move toward peer review together, and to
ensure faculty involvement, from the outset, in shaping strategies for
peer collaboration and review that would be intellectually rigorous,
appropriate to the disciplines, and of practical use in improving the
quality of teaching and learning.
Context and Rationale
During the 1970s and 1980s, clear progress was made in
the evaluation of teaching; student ratings of teacher effectiveness, once
the exception, became the rule, and some 86 percent of liberal
arts campuses now routinely require that student ratings be used in
the evaluation of teaching (Seldin). The next step, it would seem, the
next stage of evolution in our seriousness about teaching, is to
make teaching--like research--a subject for peer collaboration and review.
There are a number of arguments for doing so, of which I'll
mention only three (and briefly) here. First, student evaluations of
teaching, though essential, are not enough; there are substantive aspects
of teaching that only faculty can judge and assist each other with.
Currency in the field is an obvious example, the setting of appropriate
standards for student work, another. The aim of peer review, let me hasten to
say, is not to replace or supersede evidence provided by students, but
to augment and enrich the picture we get from that traditional
source. Indeed, many of the strategies being explored in the AAHE project
entail ways that faculty peers can help each other gather better and
more useful information from students and about learning, for
instance, through focus groups, interviews with one another's students, and
"co-assessment" of student work.
Second, peer review of teaching is important because
teaching entails learning from experience, which is difficult to do
without colleagues. It's difficult because to learn from experience, one
must have a clear view, and that's hard to get in the blooming,
buzzing confusion of the classroom. Faculty can help one another step back
and see more clearly, and therefore learn from, their own teaching
practice in a variety of ways--through direct classroom observation,
videotape, and collaborative case studies of teaching practice, to name a few.
Third, and perhaps most important, peer review puts faculty
in charge of the quality of their work as teachers. As things now stand
on many campuses, the evaluation of teaching at least feels like
something that happens to faculty: The evaluation forms get delivered to
class, filled out by students, and shipped off to the dean's office; or
the department chair parachutes into class one day, checklist in hand,
to conduct an observation. Indeed, even the method of
improvement--in the form of "faculty
development"--tends to treat faculty as
objects; as a wry faculty friend of mine put it recently (speaking from a
campus which shall remain nameless), "We're developing faculty to
death." Against this current reality, the argument for peer collaboration
and review is that it's a process, or perhaps a set of practices, through
which faculty can be active agents--rather than objects--in improving
and ensuring the quality of their work at teaching. That is the right
and professionally responsible thing to do; it's also a smart one, for if
faculty don't oversee the quality of teaching, outside, bureaucratic forms
of accountability, already very much in the air, will surely rule the day.
Lessons from the AAHE Project
There are now lots of interesting stories and examples from the
work of faculty in the AAHE peer review of teaching project
(Gabaccia; Ganschow and Inscoe; Quinlan and Bernstein), but since those
are available elsewhere, it seems useful to focus here, instead, on
some larger lessons from the project and on principles that can help
shape decisions about how to undertake the peer review of teaching in
ways that will actually improve the things we care about: the quality
and conditions of faculty work and the character of student learning.
A first such lesson or principle comes in the form of a gloss on
the phrase "The Peer Review of Teaching," which for most faculty
means classroom observation in the service of personnel
decision-making. What has been clear from the AAHE project, however, is that if what
we want is a higher level of teaching and learning, what's needed is a
whole menu of strategies that faculty can choose among and use to make
their work as teachers available to one another--be it to share
something they've learned about what helps students learn a hard concept, to
be part of an ongoing discussion about a pedagogical issue in the field,
to contribute to local decision making about pedagogical and
curricular stuff, or, yes, to put their work forward for formal review in the
hopes of receiving appropriate recognition and reward.
Thus the peer-review activities of the pilot departments have
been deliberately varied--a corrective, as I say, to the view that peer
review means exclusively classroom observation, and that its purposes
must, by definition, be those of high-stakes personnel
decision-making. Rather, faculty teams have undertaken peer-review projects matched
to their own local purposes, culture, and needs. In some, for instance,
the goal has been to start a conversation about teaching that simply did
not exist, and a successful strategy has therefore been the establishment
of "teaching circles" and discussion groups. In others, more formal
review has been the focus, with faculty collaborating on the design
and development of course portfolios that can provide scholarly
evidence of teaching for promotion and tenure decisions. A number of
departments have also focused on building greater attention to
teaching's quality into existing occasions and processes: for instance,
asking faculty job candidates, as part of the interview process, to deliver
a "pedagogical colloquium" about the teaching of their field. Virtually
all have stressed the need to assess student learning, not just teaching.
A second lesson, based on the variety of work undertaken in the
pilot departments, is that the relationship between "formative"
and "summative" evaluation might be usefully reexamined.
Part of the gospel of evaluation, in teaching and otherwise, is that
it's important to distinguish the processes and evidence employed
for improvement-oriented, formative purposes from those used for
decision-making, summative ones. Nearly all the strategies tried out by
pilot departments in the project were originally undertaken with
improvement in mind. But one thing a number of faculty have reported is
that the same strategies might well be useful for summative purposes.
I'm thinking, for instance, of the experience of a faculty
member--Peter--in legal studies at the University of Georgia, who, with
his departmental colleague--Jere--decided to try out a strategy for
interviewing each other's students, part of a larger set of peer-review
activities they piloted. The purpose of the interviewing was to try
to understand more deeply how students were experiencing their
respective courses and teaching, and to gather feedback about
possible improvements--which both Peter and Jere were able to make. But
in addition, Peter took the initiative of writing a memo to Jere, based
on the interviewing experience, summarizing what he thought he
had learned about Jere's effectiveness as a teacher. The idea, as he
wrote Jere, was that "this might just be of benefit to you" (Hutchings
41)--which in fact it was when Jere was nominated for a teaching award,
and he chose to include Peter's memo in his application materials. What
was originally private and formative turned out to be useful, as well, in
a public, summative context.
This insight is echoed in reports by others. A mathematician from
the University of Nebraska developed a course portfolio in order "to
know if I'm getting through to the students." He wanted, he said, "more
than impressions about this." But he also intends to use the portfolio for
an upcoming promotion decision: "I hope to have my portfolio
put together and ready to present for review: something that will
be comprehensive and data-based in a way that people haven't
often seen--something the review committee can sink its teeth into"
(Hutchings 57-58). Similarly, a faculty member in English tells of how
"teaching development" portfolios constructed by graduate teaching assistants
in the composition program at Northern Arizona University later
become tools for job-seeking--clearly a summative use.
The point of these examples is not that the
formative-summative distinction is one, as they say, without a difference. No doubt about
it: it's good to be clear about purposes when undertaking the kinds
of processes and practices that can serve to make teaching
"community property"; it's good to have ground rules at the outset about what
the information will be used for, by whom, and with what, if anything,
The point, rather, is that when faculty set about making
teaching "community property," they develop habits and practices that
can, potentially, serve both formative and summative purposes. And
though this runs counter to conventional thinking, it shouldn't perhaps be
so surprising, since a similar phenomenon is taken fully for granted
in research. As research scholars, faculty deliberately seek feedback
from the scholarly community; we put our work forward to colleagues
for their insights and contributions and critique. And we do this
knowing full well (perhaps even hoping) that some of those same colleagues
will judge that work in summative ways when it comes to
publication, grants, promotion and tenure. We cross the line between
formative improvement and summative evaluation and think little of it.
Indeed, doing so is part of what it means to be a member of a
scholarly community--be it as researchers or as teachers.
Which brings me to a third lesson, a sort of corollary to the
second: that with a little forethought and care, we might, in fact, craft
processes for the formal evaluation of teaching that
also promote improvement.
A case in point is the so-called "pedagogical colloquium" that
a number of the pilot departments--the history department at
Stanford, for instance--have been experimenting with as part of the process
of interviewing faculty job candidates. The colloquium is clearly
a summative/evaluative occasion--a very high-stakes one indeed for
the job candidates whose teaching abilities are being judged. But it
also serves improvement by bringing current faculty into conversation
about departmental expectations regarding teaching in ways that are new
and improvement-prompting. Indeed, this seemingly secondary
consequence of the pedagogical colloquium may be as important as
its primary purpose.
A similar dynamic pertains in the use of teaching and
course portfolios. Portfolios may be a route to more intellectually
credible, authentic evidence for the evaluation of teaching (and this is
their original appeal for many faculty), but along the way, the process of
their development gets faculty reflecting on their work in powerful
new ways--especially when they work in partnership with colleagues
who are also developing portfolios.
The punch line here is that though the methods traditionally used
to evaluate teaching have not always done much to improve it (and
may sometimes even work against improvement), that situation need not
be perennial, as Lee Shulman has argued to participants in the AAHE's
peer review of teaching project:
There's a principle that is increasingly employed
in discussions of evaluation and assessment today--a principle that we call
"consequential validity." The point of the principle is that in choosing some form
of assessment--of students, of faculty, of
whomever--it is not enough to demonstrate that the method is
accurate, that it's predictive, that it's fair--though all
of those are important criteria. You also must make
the argument that the use of a given method of assessment
or evaluation contributes to the improvement of
that which is being evaluated; that the evaluation approach
advances the quality of the very enterprise
being evaluated. The principle of consequential validity
may help us bridge the formative/summative distinction. (3)
In short, Shulman says, "we wish to ensure that whatever we do
[to evaluate teaching] contributes to an improvement in the quality of
the teaching" (3) and, as many of the faculty participating in the
AAHE project would want to add, an improvement in the quality of
student learning as well.
Finally, the AAHE project suggests the need to make
professional development and improvement part of what we mean
by--and evaluate and reward in--good teaching.
Too often, the kind of teaching that's institutionally valued
(though no one says this outright, of course) is teaching without visible
defects: students are satisfied, parents do not call the dean's office
with complaints, and, in general, instruction is "pulled off" without
apparent hitch or glitch. The extreme expression of this ethos is the feeling
among faculty on many campuses that seeking assistance with their
teaching (say, by visiting the Teaching Center or seeking help from a
colleague) is the proverbial kiss of death.
But the peer review of teaching becomes much more powerful if
we begin, instead, with a conception of excellent teaching that is
not "glitchless," good-enough performance but an ongoing,
reflective process aimed always at improvement. Excellent teachers would,
by this measure, be those who set out to inquire into their own
practice, identifying key issues they want to pursue, posing questions
for themselves, exploring alternatives and taking risks, and doing all of
this in the company of peers who can offer critique and support. These
are the habits of mind we expect, after all, in scholarly work, and we
should expect them in teaching as much as in research.
The corollary here is that if excellent teaching entails the
deliberate pursuit of improvement, then the deliberate pursuit of improvement
(I'll call it "reflective practice") should be an explicit institutional
expectation when it comes to summative evaluation. This is, admittedly, a
point that can be taken too far: we don't want an evaluation system
that rewards a bad teacher for getting a little better more than it
rewards achieved excellence. But I at least would argue that we do want
to encourage all teachers, not just the novices and the shaky ones, at
all stages of their careers, to behave as they do as scholars, seeking
new challenges and issues, identifying and solving problems, gathering
and using data to guide their practice, consulting with colleagues, and,
in general, contributing to the advancement of good teaching and
learning in their own classrooms, and beyond.
What might an evaluation system that values this kind of
teaching look like? One answer might be guidelines for portfolio
development that call for one entry focused on some problematic dimension
of teaching--by which I do not mean a "problem" or personal deficit,
but some aspect of teaching the field that is inherently and even
universally difficult (e.g., I'm told there's a point about seven weeks into
the semester in calculus where large numbers of students fall away)
and which therefore needs the attention and thought of teachers willing
to go public with their practice.
Even more radically, perhaps, one might imagine criteria
for promotion that recognize the possibility and the need for
ongoing development by faculty as teachers. At Alverno College, for
instance, expectations for teaching differ by rank, with full professors
being called upon not only to teach effectively in their own classrooms
but to "take leadership" in helping colleagues to teach more effectively
and to "influence the professional dialogue" about teaching and learning
in higher education--expectations exactly matched to the premise of
The bottom line here is that when it comes to teaching and
learning, higher education suffers from a too-low level of ambition. This, I
take it, is what Stanford professor William Massy means when he notes
the inclination in teaching to "satisfice," to make do, to be content with
a certain, not very lofty level of performance and to aim no higher.
I would argue that we might counter this low level of ambition
by explicitly calling for, evaluating, and rewarding ongoing improvement.
A Word About the Role of Scholarly Societies
From the inception of the AAHE project, the important role of
the disciplines has been clear. The activity of the project has been
centered, thus far, in a set of pilot departments--originally three on
each campus--identified cooperatively by the campuses themselves in
order to promote intercampus collaboration by field. The goal has been
for historians at Wisconsin to be able to work together with historians
at Northwestern, at Georgia, and so forth, a design decision reflecting
the fact that for some aspects of teaching, the most relevant peers
are scholars from one's own field--first, because teaching history is
not teaching chemistry is not teaching engineering, but also,
importantly, because the field, not the institution, is for many faculty the primary
source of identity and status--and these are exactly what teaching lacks.
With this in mind, we have tried to connect with the
relevant scholarly societies to explore ways that the quality and improvement
of teaching can be made the subjects of discussion and debate within
the community. A number of cooperative ventures have already
begun. Articles based on work in the project, authored by faculty
participants, have appeared in newsletters and journals from some of the
scholarly societies; a number of societies have included sessions on peer
review on their annual meeting program. Faculty in one field are thinking
about sponsoring a national video-conference on the peer review of
teaching. We are eager to help with such efforts (and even have some
funding to underwrite them).
The resources listed below will tell you more about how the
project has evolved in various disciplinary and campus contexts. For
further details, or to have your name added to the project mailing list,
please contact: Pam Bender, Program Coordinator, American Association
for Higher Education, One Dupont Circle, Suite 360, Washington,
DC 20036; e-mail: aaheti.aahe.org; telephone: (202) 293-6440 ext. 56. If
you would like to discuss ways in which your scholarly society might
be involved, please contact me directly, by e-mail: firstname.lastname@example.org, or
by telephone: (307) 766-4825.
[Visit the AAHE Peer Review of Teaching Project's website: http://www.aahepeer.iupui.edu/. -Ed.]
Gabaccia, Donna R. "Thinking Globally, Acting Locally: Peer Review
of History Teaching at UNC Charlotte." AHA
Perspectives March 1996: 21-22.
Ganschow, Tom, and John Inscoe. "Talking Teaching at the
University of Georgia." AHA
Perspectives April 1996: 29-30.
Hutchings, Pat. Making Teaching Community Property: A Menu for
Peer Collaboration and Peer Review. Washington, DC: American
Association for Higher Education, 1996.
Massy, William F., and Andrea K. Wilger. "Improving Productivity:
What Faculty Think About It--And Its Effect on Quality."
Quinlan, Kathleen, and Daniel J. Bernstein, eds.
Innovative Higher Education 20.4 (1996).
Seldin, Peter. "How Colleges Evaluate Professors: 1983 vs. 1993."
AAHE Bulletin Oct. 1993: 6-8, 12.
Shulman, Lee S. "The Peer Review of Teaching: A Framework
for Action: Three Distinctions." A Project
Workbook. Washington, DC: American Association for Higher Education, 1995.
University of California. Report of the University-wide Task Force
on Faculty Rewards. Oakland: University of California, 1991.
AAHE Peer Review of Teaching Project: Participating Campuses
Indiana University-Purdue University Indianapolis
Kent State University
University of California, Santa Cruz
University of Georgia
University of Michigan
University of Nebraska, Lincoln
University of North Carolina, Charlotte
University of Wisconsin, Madison
Additional campuses will be involved in the next phase of work.
Copyright © 1996, James England
How Evaluations of Teaching Are Used
in Personnel Decisions
I have been asked to address how two strategies of
evaluating teaching, peer review and student evaluation, can come together in
the practical setting of personnel decisions.
My experience in higher education has been formed in two
quite different circumstances: the small private liberal arts college and
the large public research university. In other words, I have had
experience at the tails of the distribution of types of institutions in higher
education. In most years, one might expect that having information about the
tails of a distribution would not be terribly useful in figuring out
behavior in the center of the distribution. I will, however, press on, secure in
the knowledge that given the character of this presidential election
year, your capacity to deduce useful information about the center of
a distribution from information about its tails is as acute as it ever will be.
Personnel decisions, more specifically tenure and promotion
decisions, are, of course, based on an integrated evaluation of the
holy trinity of higher education: teaching, research, and service. While
many imagine that the three can be evaluated in isolation from each other,
in fact most people recognize that it is some synthetic construction of
the three evaluations which eventually produces a complete human
being to whom one grants or does not grant tenure. It is also worth noting
that the effort put into combining the three in some useful manner
increases the probability of making correct personnel decisions. But since
the academy has a highly accepted mode of assessing faculty
scholarship (blind refereeing or peer review of publications), I will focus
my attention on how effective use of the teaching assessment tools of
peer review and student evaluation can contribute to a more accurate
and more integrated evaluation of a faculty member.
Let me start by expanding on the fairly obvious: that
personnel decisions have the institutional mission at their core. Only a
small fraction of us have the breadth and quality of performance that
make us ideal for any type of institution at any given time. Faculty
handbooks notwithstanding, why and how much an institution cares
about effective teaching and outstanding scholarship will greatly
influence how it goes about judging scholarship and teaching. This is
where institutional mission comes into play. Let me cast the issue of
institutional mission in terms of the tenure decision, the most
important personnel decision made in higher education. Institutional mission
can often be defined as follows: institutions where effective teaching is
a necessary, but not sufficient, condition for granting tenure;
and institutions where outstanding scholarship (or the prospect of the
same) is a necessary, but not sufficient, condition for granting tenure.
The other two possibilities are when either effective teaching or
outstanding scholarship is a sufficient condition for being awarded tenure. I
will leave out these possibilities from this discussion since they exist
only in a caricature of higher education or at institutions where quality
is given lip service and where politics rules the day.
At the first type of institution, where effective teaching is a
necessary condition for the granting of tenure, typically a high-quality liberal
arts college, the quality of (or sometimes the existence of) one's
scholarship is important because it is believed to contribute to one remaining
an effective teacher over the course of a career of 30 or more years.
There are a variety of reasons that guide this thinking:
- Learning complex material is difficult unless it is taught
by someone who is enthusiastically engaged with the material;
- Effective teaching is mentally exhausting and most of us gain
the energy needed from engaging with colleagues in our
- The best way to prevent the "Oracle in the Classroom"
syndrome is for faculty members to put their ideas before peers on a
regular basis in order to experience the humility brought by a
knowledgeable and frank discussion of their
While your own list of reasons for including scholarship as a part
of teaching evaluation will differ from mine, I think most of us would
agree that the effective teacher who is disengaged from the discipline--or
who stays current while not actively participating in his or her
discipline--is an oxymoron. At a college or university where teaching is central to
the institutional mission, scholarship is judged in order to determine
whether a faculty member will continue to be an effective teacher over the
course of his or her career.
At the second type of institution, where scholarship is a
necessary condition for the granting of tenure (typically a research
university), scholarship is judged in terms of its importance to the discipline,
since the production of knowledge is central to the institution's
research mission. Since research institutions of high quality subscribe to
the liberal arts college's view of the contribution of scholarship to
effective teaching, one would imagine that all research institutions
would demand both path-breaking research and outstanding teaching. This
is the ideal, but few institutions can afford it. Since effective teaching
and the production of significant or field-defining scholarship require
a substantial commitment of time, only a few (and, I suspect, a
decreasing number of) institutions have the resources to hire a faculty large
enough so that individual faculty members can meet both of these standards.
It is worth noting that, except at the very few research universities
with exceptionally large endowments, the standard for scholarship
varies widely by institution and often by department. Most of us, therefore,
are in need of some careful institutional soul-searching to align
our expectations with our rhetoric, with our financial reality, and with
the competing time demands placed on a faculty member's life.
At either type of institution, but keeping in mind that most
institutions fall somewhere in between those in my experience, we need
to evaluate the quality of teaching and scholarship because they both
play a central role in personnel decisions. Depending on the
institutional mission, it is a matter of priority or of balance. As I stated above, we
do a reasonably good job of evaluating scholarship. We do so because
we have a better understanding of why we are interested in
scholarship, and we have confidence in the established methods for
evaluating scholarship: blind refereeing combined with peer review.
It is the evaluation of teaching that has caused higher education
the greatest difficulty and that often results in a skewed profile of
a faculty member's teaching effectiveness. I think the reasons for this
difficulty can, in part, be explained by the fundamental differences
between teaching and research: research is, in the main, a "global" activity,
while teaching is a "local" activity.
While the standards for scholarship are derived from the
institutional mission, the judgment about whether scholarship has met the
standards is discipline-based and, therefore, made outside the university.
These external reviews provide us with powerful evidence of the quality
of a person's scholarship. At the time of the tenure decision,
problems associated with evaluating scholarship usually come from a lack
of attention to detail in carrying out the peer review or, more often,
from confusion about institutional mission as it relates to scholarship.
The former is a problem that we should be able to solve (although
I sometimes despair that a large organization may not be able to carry
out its procedures with care). The question of institutional mission is
one which, as I have said above, must be clear and consistent. That is,
we need to be clear in our statements of expectations as they appear
in internal memoranda and publications, such as faculty handbooks,
and we need to be certain that the public pronouncements of presidents
and provosts are consistent with these documents.
I have described teaching as a local activity, meaning that it
is evaluated internally. That is, whether one is an effective
teacher depends on the depth of knowledge in one's discipline, and also on
the expectations of one's departmental colleagues and on one's
students' preparation and expectations. Since effective teaching is dependent
on the local culture of the institution, any evaluation of teaching must
take all three of these factors into account. Although
teaching is, for the most part, a local activity, at too
many institutions it has become a private affair.
One result of the privatization of teaching is that it is difficult, if
not impossible, to evaluate it reliably. Too often we have cut out
colleagues and, in some instances, we have cut out students. Colleagues are
often eliminated out of a misguided notion of "academic freedom" or
because including colleagues presents us with a difficult and
time-consuming process. The elimination of students as a central partner in
teaching evaluation causes us to focus on teaching as performance and not
on student learning as the reason for teaching. I should also note
that involving students in the evaluation of teaching in a thoughtless
manner can lead to a perverse notion of effective teaching. We all know
of institutions where student evaluations are passed out at the end of
every course and in a manner which causes students and faculty alike
to conclude that the quality of teaching is directly related to the quality
of entertainment. If designed carefully, distributed appropriately,
and tabulated thoughtfully, student evaluations of teaching can
contribute to the accurate evaluation of an instructor's teaching and can be
used to improve the teaching abilities of the instructor being
evaluated. Involving peers in the course of reviewing one's teaching can lead
to improving its quality and can create a campus climate that
supports quality teaching. It can also contribute significant evidence of
teaching effectiveness to a personnel file. Involving both, over time, will
provide evidence of teaching effectiveness equal in power to the evidence
we collect about the quality of scholarship. My fellow contributors to
this volume have aptly demonstrated the truth of this statement.
Let me end my remarks by proposing another form of
teaching evaluation that is almost unheard of in higher education, but which
is becoming widely accepted and often demanded in secondary
and elementary education (even if quite controversially so). It is
"student outcomes assessment," as it is called in the current public
education debate. Only a few colleges and universities have caught on to the
value of this form of assessment and have begun to use it in their
teaching evaluation. In the case of the departmental major, student
outcomes assessment would require departments to:
- Define standards of performance for persons graduating with
- Create mechanisms of evaluation to determine whether
students have met the standards; and
- Design curricula and modes of instruction appropriate to
While student outcomes reflect the quality of teaching not by the
quality of a faculty member's input, but by the quality of student learning,
the two do not stand in opposition to each other. Student learning
results from multiple factors, only one of which is the quality of
direct instruction. I am intentionally refraining from supplying details
about how to implement student outcomes assessment because each
institution will have to develop its own formula for implementation that
is appropriate to its own unique circumstances. But I raise this issue
here because a critical element of the success of the peer review of
teaching project comes from the strong support of professional societies, such
as those represented at the ACLS Annual Meeting. At most institutions,
but most acutely at research universities, faculty members often take their cues from their professional societies as much as from their
home institution. Therefore, support from professional societies can do
much to encourage individual institutions to explore student outcomes
seriously as an additional measure of faculty and institutional success.
While I am uncertain about how we translate student
outcomes assessment into evaluations of individual faculty, I do know
that ignoring student learning as demonstrated at an appropriate point
in their collegiate careers is contributing to a significant amount of
public dismay and even cynicism about our enterprise. For the most part,
we have been able to ignore this form of evaluation because the public
has accepted the value of higher education with an almost religious
fervor. Until quite recently, higher education has not been available to,
or required by, a large segment of our society, and we have not had
to justify its existence in comparison to other social goods. Times
have changed. Now we are being held accountable for what we do by
state legislatures and by families who are trying to afford the education
we provide. Convincing the public that the education we give is of
value and that we are, in fact, educating our students requires us both
to describe what we do in clear language and to demonstrate that we
are doing it by assessing student outcomes.
Even though student outcomes may not be immediately
connected to peer review and student evaluations, I raise the issue here
because we need to put it on our collective agendas. It is especially
worth considering as we are now called upon to develop more accurate
and integrated ways of evaluating faculty teaching. Using student
outcomes goes beyond higher education's conventional understanding of
quality teaching and how it is evaluated. We in the academy need to
begin looking at the quality of an individual's teaching as it contributes to
the effectiveness of the department in which he or she teaches. In turn,
a department's effectiveness should then influence individual
tenure decisions. If we are to retain the powerful and much-needed
public support for, and recognition of, the value of higher education in
society, we must be able to demonstrate the value of education through