Copyright © 2002 by the American Academy of Arts and Sciences. All rights reserved.
This document, either in whole or in part, may NOT be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, except that you may download one copy of it on any single computer for your personal, non-commercial home use only, provided you keep intact this copyright notice.

NOTE: I have deleted some sections of this document in order to avoid overwhelming you with information.

Evaluation and the Academy: Are We Doing the Right Thing?
Grade Inflation and Letters of Recommendation

INTRODUCTION
It is a traditional and generally accepted role of teachers to evaluate their students. We usually accomplish this task by assigning grades and writing letters of recommendation. Informally, of course, we are constantly evaluating students in conversations, office hours, and the like. As representatives of a discipline and members of a larger academic community, we also evaluate peers as well as younger colleagues: it is a well-established professional obligation that commonly takes the form of letters of recommendation. Evaluation is generally considered to be a core function of our collegial life.
That all is not well in these domains is no secret: inside and outside colleges and universities there has been much discussion about grade inflation and the debasement of letters of recommendation (we prefer the term "letters of evaluation"). There is no unanimity about either the causes or consequences of changed standards of evaluation. Even the very existence of a problem is doubted by some observers. Nevertheless, there appears to be enough unease, lack of consensus, and "noise" to justify a closer examination.
To that end, an informal group of academics from different fields and backgrounds met over the past year at the American Academy of Arts and Sciences. We asked the same questions for both grades and letters of recommendation: what is the current situation, what are its consequences, and what remedies, if any, are needed and possible? This Occasional Paper represents the results of our discussions. On all these issues we reached a general consensus, although individual differences about some interpretations remain.
Our hope is to start a discussion among our colleagues in all different types of institutions across the country. Such discussions could clarify the situation in each college and university and lead to salutary changes. The quality of evaluation admits of no national solution. Each institution has to determine and be responsible for its own standards, and the best beginning is awareness of the issues.
Current conditions have to be seen in the context of recent history. Since World War II, colleges and universities -- along with nearly all American institutions -- have experienced major changes. A few examples will suffice. The number of faculty members and the number and percentage of students seeking higher education have dramatically increased since that time. The 1950 census indicates that there were 190,000 academics; a decade later there were 281,000, and by 1970 the number had swelled to 532,000.¹ In 1998, according to the latest figures from the U.S. Department of Education, there were 1,074,000 faculty members employed by institutions of higher learning. At the turn of the twentieth century only about 1 percent of high-school students attended college; that figure is closer to 70 percent today. Racial and gender diversity has also increased markedly over the past several decades. In 1975, there were 11 million students: 47 percent were women, 15 percent were minorities (Black, Hispanic, Asian, American Indian/Alaskan Native). By 1997, there were 12,298,000 students, the percentage of women had grown to 56 percent, and minorities represented 25 percent of the student population.
At the same time, the country's tertiary institutions have faced, and some are still facing, serious economic pressures and increased competition, and many are far less isolated from the outside world. All sectors of society clamor for access to knowledge and skills available in our laboratories and in other forms of faculty expertise.
These changes -- largely external in origin -- have had a variety of consequences for higher education. In what follows we begin by examining the implications of a specific and in our opinion undesirable practice that is part of these changes: grade inflation. At first glance, this practice may appear to be of little consequence, but we shall argue that its presence calls into question central values of academic life.

WHAT ARE THE FUNCTIONS OF GRADES?
Professors expect, and have received, a considerable measure of respect in our society. The privileges that flow from this status are related to the functions they perform and the values they bring to these performances. Consensus about these values has become diluted in recent years. For example, there is controversy in some institutions over the relative weight to be given to teaching and research, and over the role of political and ideological commitments in teaching and scholarship. The appropriateness of faculty unions is a matter of concern for other institutions. Nevertheless, whatever the balance of energies, commitments, and working arrangements, academics are only entitled to the respect they would like to command if they affirm some common standards. Among these, the least controversial -- perhaps the most elementary -- is the imperative for accuracy in evaluating their students' academic work. Yet, there is overwhelming evidence that standards regarding student grading have changed substantially over time.
Grades are intended to be an objective -- though not perfect -- index of the degree of academic mastery of a subject. As such, grades serve multiple purposes. They inform students about how well or how poorly they understand the content of their courses. They inform students of their strengths, weaknesses, and areas of talent. This may be helpful to students in making decisions about a career. They also provide information to external audiences: for example, to colleagues not only in one's own institution but to those in other institutions, to graduate schools, and to employers. We believe that this view of grades represents the consensus within the academy.
We recognize, of course, that a significant number of students who had low grades in school were spectacularly successful in later life.
That fact, however, does not weaken the rationale for grades. No one would claim that grades are a completely accurate index of the comprehension of subject matter, let alone a predictor of achievement in the world at large. Yet, they remain an efficient way to communicate valid information, but only if a meaningful range of grades exists.
Some professors hold the view that low grades discourage students and frustrate their progress. Some contend it is defensible to give a student a higher grade than he or she deserves in order to motivate those who are anxious or poorly prepared by their earlier secondary school experiences. Advocates of this opinion contend that students ought to be encouraged to learn and that grades can distort that process by motivating students to compete only for grades. A few institutions have acted on this premise by using only written comments; for example, Hampshire College, Goddard College, and Evergreen State College (all small liberal arts colleges) and until recently U.C. Santa Cruz.² A more radical view holds that it is inappropriate for a professor to perform the assessment function because it violates the relationship that should exist between a faculty member and students engaged in the collaborative process of inquiry. Some critics of grades argue that grading is a distorting, harsh, and punitive practice.
We doubt that these positions are espoused by large numbers in the academic community. Grades certainly are not harsh for those who do well, and empirical evidence for the hypothesis that lowering the anxiety over grades leads to better learning is weak. As for the inappropriateness of professors performing the assessment function, one must ask: who will perform this task? Relegating evaluation to professional or graduate schools and employers simply "passes the buck" and is unlikely to lead to more accurate and fair evaluations. Although the rejection of grading does not represent the academic mainstream, the criticisms are influential in some circles, and so we will return to them later in this paper.

DOES GRADE INFLATION EXIST: THE EVIDENCE
Grade inflation can be defined as an upward shift in the grade point average (GPA) of students over an extended period of time without a corresponding increase in student achievement.³ Unlike price inflation, where dollar values can -- at least in theory -- rise indefinitely, grade inflation has an upper boundary: grades cannot rise above an A or a 100. The consequence is grade "compression" at the upper end.
We will begin by reviewing grading trends as described in the literature, but will confine our sample to undergraduates. The situation in professional and graduate schools requires separate analysis. Relatively undifferentiated course grading has been a traditional practice in many graduate schools for a very long time. One justification for this may be the wide reliance on general examinations and theses.
Most investigators agree that grade inflation began in the 1960s⁴ and continued through, at least, the mid-1990s. Several studies have examined the phenomenon over time, as illustrated in the following table:

Patterns of grading show inflation to be more prevalent in selected disciplines. Grades tend to be higher in the humanities than in the natural sciences, where objective standards of measurement are enforced more easily.¹³ This was probably always true, but the differences by discipline appear to have increased over time. It is not surprising that the "softer" subjects exhibit the severest grade inflation.
Although higher grades appear in all types of institutions, grade inflation appears to have been especially noticeable in the Ivy League. In 1966, 22 percent of all grades given to Harvard undergraduates were in the A range. By 1996 that percentage had risen to 46 percent, and in that same year 82 percent of Harvard seniors graduated with academic honors.¹⁴ In 1973, 30.7 percent of all grades at Princeton were in the A range, and by 1997 that percentage had risen to 42.5 percent. In 1997, only 11.6 percent of all grades fell below the B range.¹⁵ Similarly, at Dartmouth, in 1994, 44 percent of all grades given were in the A range.
When considered alongside indexes of student achievement, these increases in grades do not appear to be warranted. During the time period in which grades increased dramatically, the average combined score on the Scholastic Aptitude Test (SAT) actually declined by 5 percent (1969-1993).¹⁶ Since the SAT's recentering in 1995 (when the mean was reset to a midpoint of 500 in a range of 200 to 800), scores have increased only slightly -- the average combined score in 1995 was 1,010 and in 2000 it was 1,019.
By one estimate, one third of all college and university students were forced to take remedial education courses, and the need for remediation has increased over time. One study found that between 1987 and 1997, 73 percent of all institutions reported an increase in the proportion of students requiring remedial education.¹⁷ Further, from 1990 to 1995, 39 percent of institutions indicated that their enrollments in remedial courses had increased.¹⁸ Currently, higher education devotes $2 billion a year to remedial offerings,¹⁹ and faculty have noticed a shift in student ability and preparation. In 1991, a survey conducted by the Higher Education Research Institute found that only 25 percent of faculty felt their students were "well-prepared academically."²⁰
Discussions that led to standards-based reform also show that systems' administrators, regents, and state boards of education felt a growing unease about the competence of their students. Eighteen states have currently implemented competency tests that all high-school graduates must pass. Similar testing programs are being considered in several states for institutions of higher learning. The University of Texas System, Utah's State Board of Regents, and the sixty-four-campus SUNY system are all considering implementing competency tests.²¹
Measures of average achievement are far from perfect, but the available evidence does support the proposition that grading has become more lenient since the 1960s. Higher average grades unaccompanied by proportionate increases in average levels of achievement define grade inflation.
We have already mentioned that increases in average grades appear to have been especially noticeable in the Ivy League. Because admission into these institutions has become increasingly competitive since the 1960s, it might be possible to argue that higher average grades merely reflected a more academically talented student body. There is some evidence for higher quality, but the magnitude of grade increases in Ivy League institutions seems to indicate inflationary pressures as well.²²

It is most important to stress that, once started, grade inflation has a self-sustaining character: it becomes systemic, and it is difficult for faculty to opt out of the system. When significant numbers of professors adjust their grades upwards so as to shelter students from the draft -- as certainly happened during the Vietnam era -- others are forced to follow suit. Otherwise, some students will be disadvantaged, and pressures from students, colleagues, and administrators will soon create conformity to emerging norms. (The analogy is not perfect, but when the economy experiences price inflation, the individual seller will adjust prices upwards, and in higher education there is no equivalent of government or the Federal Reserve that can arrest that process.)
We are describing an inflationary system in which the individual instructor has very little choice. Grade inflation is not the consequence of individual faculty failure, lowered standards, or lack of moral courage. It is the result of a system that is self-sustaining and that produces less than optimal results for all concerned. The issue is not to assign blame; rather, it is to understand the dynamics of grade inflation and its consequences.
Are there any adverse consequences? Quite a few can be deduced from what we have said. The present situation creates internal confusion, giving students and colleagues less accurate information; it leads to individual injustices because of compression at the top that prevents discrimination between a real and an inflated A; it may also engender confusion for graduate schools and employers. Not to address these issues represents a failure of responsibility on the part of university and college faculties acting collectively: we have the obligation to make educational improvements when needed and when possible. Simply to accept the status quo is not acceptable professional conduct. We need, if possible, to suggest ways for institutions to initiate reforms that will allow as clear gradations as possible to replace the present confusion.

EXTERNAL VERSUS INTERNAL CONSIDERATIONS
Do inflated grades really hamper the selection process as carried out by those who normally rely on undergraduate transcripts? It is very difficult to answer that question with a desirable degree of certainty. We have found no large body of writings in which, for example, employers or graduate schools complain about lack of information because of inflated grades. Informal conversations with some employers and graduate schools lead us to believe that the traditional users of grades have learned to work around present practices: they expect to find high and relatively undifferentiated grades, and therefore rely more heavily on other criteria.
Graduate schools use standardized tests (e.g., the GRE), recommendations, the ranking of particular schools, and interviews. Grade inflation invites admissions committees to place more emphasis on standardized test scores, which is not necessarily in our view a wise shift in emphasis. Corporations conduct their own evaluations -- interviewing candidates, checking references, and in some cases testing the analytic skills of candidates. Grades remain an important criterion but their influence may be waning. For example, one survey of the Human Resource Officers (HRO) from Fortune 500 companies in 1978, 1985, and 1995 found that the percentage of HROs who agreed that transcripts of college grades ought to be included with an applicant's resume fell from 37.5 percent to 20 percent.⁴⁵ Judith Eaton, president of the Council for Higher Education Accreditation, asserts that employers have become dissatisfied with grading information, arguing that now "government and business want to know more specifically what kind of competencies students have."⁴⁶
It is certain that a diminution in the use of grades increases the relative weight of informal evaluations, and thus being in the proper network may become more valuable than personal achievement. As a matter of fairness, society should have an interest in counteracting this trend.
Suppose, just for the sake of argument, that the net negative impact of working around grades is small, and in addition that grades are less important to those who -- in some manner -- choose our graduates. Should we then adopt the radical response of either giving no grades at all or -- and it amounts to the same thing -- awarding A's to all students? In other words, are there wholly internal justifications for formal evaluations of students that offer meaningful gradations? The answers have been given at the beginning of this essay. Grades, if they discriminate sufficiently, help and inform students in many different ways, and students are entitled to these evaluations.
For evaluations to accomplish their intended purpose we must question a currently popular assumption in psychology and education that virtually all students can excel academically across the board -- and in life as well. Accordingly, differences in performance are primarily attributed to levels of "self-confidence" or "self-esteem" because this is assumed to be the most important determinant of success; motivation and talent are relevant, though secondary. The enemy of high self-confidence is criticism, and that is how rigorous evaluation is perceived.
These sentiments may be powerful elements in grade inflation: praise motivates accomplishment. There may even be a grain of truth in this proposition, but it is far from the whole truth. Talent as well as motivation remain powerful explanatory factors in achieving success. In fact, most studies do not support the connection between academic success and self-esteem. In a recent comprehensive review article, Joseph Kahne quotes Mary Ann Scheirer and Robert E. Kraut as follows:
"The overwhelmingly negative evidence reviewed here for a causal connection between self-concept and academic achievement should create caution among both educators and theorists who have heretofore assumed that enhancing a person's feelings about himself would lead to academic achievement."⁴⁷

THE NEED FOR AND THE POSSIBILITIES OF CHANGE
Is there a way to change the status quo? There is neither an easy nor a single answer to that question. Since the term "inflation" originated in economics, we can refer to another concept from the same discipline in order to put the question in focus. Gresham's Law says that if two kinds of money have the same denomination but different intrinsic value -- for example, gold coins versus paper money -- the bad money (paper) will drive the good money (gold) out of circulation because the good money will be hoarded. The only solution is currency reform in which only a single standard prevails. In education, bad grading practices drive out good grading practices, creating their own version of Gresham's Law. Can we devise the equivalent of currency reform in higher education? The obstacles are obvious. Currencies are controlled by a single authority, and generally a state can enforce uniform standards. None of this exists in the American system of higher education, nor would we favor anything of the sort. Each institution has to make its own assessment and find its own solutions. The best we can hope for is a series of small steps and individual institutional initiatives whose cumulative effects could amount to the beginnings of reform. Recognizing the problem is a meaningful place to start.
What are the characteristics of a good grading system?
• It should be rigorous, accurate, and permit meaningful distinctions among students in applying a uniform standard of performance.
• It should be fair to students and candid to those who are entitled to information about students.
• It should be supportive of learning and helpful to students in achieving their educational goals.
Short of a fundamental systemic overhaul or return to an earlier day, neither of which is a realistic possibility, we review various suggestions that are contained in the literature.
Institutional Dialogue

FACTORS LEADING TO INFLATED LETTERS OF RECOMMENDATION
Thus far, we have dealt in some detail with the most common form of evaluation, namely, grades. The other major type of evaluation is letters of reference. Faculty members write letters on behalf of colleagues who are seeking promotion, tenure, and other positions, or who are competing for grants and fellowships.⁵⁶ They also provide references for students, which is an integral part of the graduate admissions and employment process.⁵⁷ This form of evaluation will receive less extensive treatment in this paper: the overlap with grade inflation is very large and problems related to letters are unfortunately much less well researched. What evidence is available -- empirical, anecdotal, and experiential -- leads us to conclude that letters of recommendation suffer from many of the same, or worse, weaknesses and problems as grades.
A commentary on letters written for promotion and tenure decisions summarized well the prevailing view: "Puffery is rampant. Evasion abounds. Deliberate obfuscation is the rule of the day."⁵⁸ Letters for students are similarly flawed. A member of Cornell's admissions committee observed ruefully: "I would search applications in vain for even subordinate clauses like 'While Susan did not participate often in discussions ….'"⁵⁹ As experienced academics, all of us sense the accuracy of these observations.

LETTERS OF REFERENCE: EVALUATION OR ACCLAMATION?
We believe that since the late 1960s, academics have been less willing to express negative opinions -- either about their students or their colleagues. Many reasons for this phenomenon are identical to the forces that have created grade inflation, such as a legacy of the 1960s,

CONSEQUENCES
The consequences of inflated letters of recommendation are much the same as for grade inflation: poorly differentiated and therefore less useful information.
• Inflated recommendations do not help external audiences distinguish between candidates. If too many candidates are described with superlatives, one might as well wonder about the use of recommendations at all.⁶⁷ Furthermore, inflation cheats those excellent candidates who deserve great praise⁶⁸ and gives less distinguished applicants an unfair and unearned advantage.⁶⁹ It may also cause the employer or educational institution to have unrealistic expectations of the candidate.⁷⁰
• Inflated letters create self-sustaining and systemic pressures that make this form of evaluation almost meaningless.⁷¹
• The evaluation process is driven into increasingly informal channels. In some fields, grade inflation has created an increasing reliance on letters of recommendation.⁷² However, if recommendations fail to provide useful information, people who need information about potential candidates will be forced to gather information in more informal ways (e.g., telephone calls to friends). This may result in a process where the real information is shared primarily in private channels and therefore is not open to outside scrutiny -- a strengthening of the "old boy and girl" network.

A FEW RECOMMENDATIONS
Can anything be done? A few partial remedies have been suggested. For example:
• Avoid writing "general" letters of recommendation: Whenever possible, evaluators ought to write recommendations regarding specific positions rather than writing a blanket "all purpose" letter. Research suggests that greater specificity results in less vague and lofty rhetoric.⁷³ Specificity also adds to the perceived credibility of a recommendation in the minds of employers,⁷⁴ and no doubt fellowship committees as well.
• Discuss what you will and will not write with the candidate: Before agreeing to write a letter, discuss with the candidate your assessment of him or her. He or she will then be in a better position to decide whether to have you write on his or her behalf.⁷⁵
• Be clear about your expectations regarding confidentiality: Confidentiality tends to produce more honest appraisals, and research suggests that confidential recommendations are less likely to be inflated.⁷⁶ Insisting on student waivers is desirable. Those in charge of admissions and job searches look more favorably on confidential letters.⁷⁷ Confidentiality can be breached in case of lawsuits, but those are rare events.
Faculty members who write letters of evaluation have a two-fold responsibility. First, the candidate deserves to have his or her unique qualities and qualifications accurately and carefully described. Second, evaluators also have a responsibility to the persons who are receiving the letter and using that information to make decisions. Those persons deserve a balanced account of all candidates. A rephrased Golden Rule is the best guide: Write to others the kind of letter of recommendation you would like to receive from them. To follow the rule is responsible professional conduct. Not to follow the rule perpetuates harmful practices in the academy.

CONCLUSION
The reluctance to engage in frank evaluation of students and colleagues has -- as we have shown -- many different sources. Individually, these are less important than the dynamics created by this reluctance. Once it starts, grade inflation and inflated letters are subject to self-sustaining pressures stemming from the desire not to disadvantage some students or colleagues without cause. This self-sustaining character eventually weakens the very meaning of evaluation: compression at the top before long will create a system of grades in which A's predominate and in which letters consist primarily of praise. Meaningful distinctions will have disappeared.
A system that fears candor is demoralizing. Much is lost in the current situation, primarily useful information for students, colleagues, graduate schools, and employers. Even if those who need accurate information have learned to "work around the system," the cost of what prevails today remains high. Instead of moving through formal and open channels, information is guided toward informal and more secretive byways.
We know of no quick or easy solutions; habits of thirty years' duration are not easily changed. But change has to begin by recognizing the many aspects of the problem, and that is why we urge discussion and education about professional conduct and responsibilities. Reform will have to occur institution by institution, and we hope that what we have presented in this paper will offer a good way to begin.

1. Metzger, "The Academic Profession in the United States," 1987. Note: These figures include part-time faculty.

2. U.C. Santa Cruz did not use grades until their traditional practice was changed in March of 2000. At the same time, the faculty decided to continue the use of written comments.

5. Ibid.
6. Juola, "Grade inflation in higher education-1979. Is it over?" 1980.
7. Ibid.
8. Levine and Cureton, When Hope and Fear Collide: A Portrait of Today's College Student, 1998.
9. Basinger, "Fighting grade inflation: A misguided effort?" 1997; Stone, "Inflated Grades, Inflated Enrollment, and Inflated Budgets: An Analysis and Call for Review at the State Level," 1996.
10. Kuh and Hu, "Unraveling the Complexity of the Increase in College Grades from the Mid-1980's to the Mid-1990's," 1999.
11. Weller, "Attitude Toward Grade Inflation: A Random Survey of American Colleges of Arts and Sciences and Colleges of Education," 1986; Reibstein, "Give me an A, or give me death," 1994; Landrum, "Student Expectations of Grade Inflation," 1999.
12. Farley, "A is for average: The grading crisis in today's colleges," 1995.
13. Wilson, "The Phenomenon of Grade Inflation in Higher Education," 1999.
14. Lambert, "Desperately Seeking Summa," 1993.
15. Report of the faculty committee on examinations and standings on grading patterns at Princeton, 5 February 1998.
16. The College Board; Levine and Cureton, When Hope and Fear Collide: A Portrait of Today's College Student, 1998; Schackner in Nagle, "A Proposal for Dealing with Grade Inflation: The Relative Performance Index," 1998.
17. Levine, "How the Academic Profession is Changing," 1997.
18. National Center for Education Statistics, "Remedial Education at Higher Education Institutions, Fall 1995-October 1996," NCES-97-584.
19. Schmidt, "Colleges are starting to become involved in high-school testing policies," 2000.
20. Dey, Astin, and Korn, "The American Freshman: Twenty-Five Year Trends, 1966-1990," 1991.
21. Schmidt, "Faculty outcry greets proposal of competency tests at U. of Texas," 2000.
22. This is verified by data provided by C. Anthony Broh, director of research for COFHE.
23. Lamont in Goldman, "The Betrayal of the Gatekeepers: Grade Inflation," 1985.
24. Twitchell, "Stop Me Before I Give Your Kid Another 'A,'" 1997.

44. Kolevzon, "Grade inflation in higher education: A comparative study," 1981.
45. Spinks and Wells, "Trends in the Employment Process: Resumes and Job Application Letters," 1999.
46. McMurtie, "Colleges are Urged to Devise Better Ways to Measure Learning," 2001.
47. Kahne, "The Politics of Self-Esteem," 1996.

56. Altshuler, "Dear admissions committee," 2000; Mitchell, "The college letter: College advisor as anthropologist in the field," 1996.
57. Ibid.
58. Schneider, "Why you can't trust letters of recommendation," 2000.
59. Altshuler, "Dear admissions committee," 2000.

67. Ryan and Martinson, "Perceived effects of exaggeration in recommendation letters," 2000.
68. Ibid.
69. Ibid.
70. Ibid.; Bok, Lying, 1999.
71. Ryan and Martinson, "Perceived effects of exaggeration in recommendation letters," 2000.
72. Kasambira, "Recommendation inflation," 1984.
73. Hauenstein in Ryan and Martinson, "Perceived effects of exaggeration in recommendation letters," 2000.
74. Knouse, "The letter of recommendation: Specificity and favorability of information," 1983.
75. Fox, personal communication, 1 August 2000.
76. Ceci and Peters, "Letters of Reference: A Naturalistic Study of the Effects of Confidentiality," 1984; Shaffer et al. in Ryan and Martinson, "Perceived effects of exaggeration in recommendation letters," 2000.
77. Shaffer et al. in Ryan and Martinson, "Perceived effects of exaggeration in recommendation letters," 2000.