From Sputnik to the World Wide Web:
A Retrospective View of Citation Indexing

by

Eugene Garfield
Chairman Emeritus, ISI®
Publisher, The Scientist®
3501 Market Street
Philadelphia, PA 19104
Tel. 215-243-2205
Fax 215-387-1266
email: garfield@codex.cis.upenn.edu
Home Page: www.eugenegarfield.org

at

ACRL Science & Technology Program Titled
Quantum Leaps by Decade: Future "Caching" the Past - Forty Years of Creating New Communities for Science Librarianship Through Collaboration
ALA Annual Meeting, San Francisco
June 18, 2001

In March of 1951, I joined the Johns Hopkins Welch Medical Library Indexing Project.  In the post-World War II period, librarians and documentalists were preoccupied with bibliographic control.  The first books I read at the Project were Ridenour, Shaw, and Hill's Bibliography in an Age of Science1 and Bibliographic Organization2, edited by Jesse Shera and Margaret Egan.  The Project's formation in 1948 reflected the conclusion that bibliography was in fact out of control.

There was enormous duplication between Chemical, Biological, and Physical Abstracts and other indexing services like the AMA's Quarterly Cumulative Index Medicus (QCIM) and the Army Medical Library's Current List of Medical Literature. By the end of the fifties, QCIM and Current List would merge to become the monthly Index Medicus. In spite of the duplication, there were also major gaps. Multi-disciplinary journals like Nature, Science, and PNAS were subject to the ill-defined selection criteria of discipline-oriented services. Equally frustrating were the time lags in indexing and abstracting. The "radical" solution to the timing problem was Current Contents®, first introduced in the life sciences about 1957.

Then in 1958, at the International Conference on Scientific Information, I recommended compiling unified indexes to the abstracting and indexing services3. Indeed, the original proposal in 1955 to create the Science Citation Index® was motivated not only by the need for new methods of indexing and retrieving information but also by the need for a multi-disciplinary, cover-to-cover index that would eliminate the uncertainties of the selective discipline-oriented services.4

By the time ISI had completed the experimental Genetics Citation Index in 1963, we had already compiled the basic data to establish what has been called Garfield's Law of Concentration5, a variant on Bradford's Law of Scattering.6 These laws guided us in selecting the most-cited journals for coverage and led to the invention of the journal impact factor and eventually the ISI Journal Citation Reports®. JCR® was published annually as the last volume of the annual cumulations of the SCI® and SSCI®. The journal impact factor is undoubtedly one of the most widely recognized by-products of citation indexing. Its appropriate use in journal selection and evaluation has been equally matched by its abuse in the hands of ill-informed evaluators seeking shortcuts to peer review. However, it has become a ubiquitous yardstick for new editors seeking to compare their journals with the established leaders. Hundreds of papers have been published on the pros and cons of the journal impact factor.
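For those who have encountered only the number itself, the calculation behind the journal impact factor is simple: it is the mean citation rate of a journal's recent items over a two-year window,

$$\mathrm{IF}_{2001} \;=\; \frac{\text{citations received in 2001 by items published in 1999-2000}}{\text{citable items published in 1999-2000}}.$$

So, to use invented round numbers, a journal that published 400 citable items in 1999-2000 and drew 1,200 citations to them in 2001 would have a 2001 impact factor of 3.0.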

The Roaring 1950s

In the 1950s, the launch of Sputnik and the establishment of the All-Union Institute for Scientific and Technical Information (VINITI) in Moscow gave a new stimulus to the need for better "documentation," that is, complete bibliographic control. My testimony in Congress in 1963 addressed some of these issues.7 Congressman Pucinski of Chicago tried in vain to establish a national intelligence and documentation center in his fair city, as Bob Hayne and I had recommended at the 1955 annual AAAS meeting.8

Watson and Crick published their landmark paper on the helical structure of DNA in 1953.9 The first issue of the Journal of Molecular Biology had appeared in 1958. The Genetics Citation Index project was started in 1960. Its advisory group recognized that the revolution in molecular biology was multi-disciplinary. Key papers on DNA appeared in the Reviews of Modern Physics as well as in crystallographic, chemical, and other non-genetics journals. It was concluded that information input had to be comprehensive but at the same time the output needed to be highly selective. So while the huge volumes of the Science Citation Index (SCI) were created for retrieval, Irv Sher and I developed the first commercial service for selective dissemination of information (SDI). As it turned out, SDI proved to be of greater interest to engineers and applied scientists, while Current Contents provided so-called current awareness and appealed to the more eclectic browsing needs of basic researchers.10 This is when I defined information retrieval as being of two worlds and cultures: information discovery and information recovery.11 However, more sophisticated literature users took advantage of CC as well as the citation-based searching available in the SDI service called ASCA® (Automatic Subject Citation Alert®).

While these developments were accelerating, more specialized approaches to retrieving chemical information, such as Index Chemicus®, which had started in 1960, came onto the scene. I have recently described the history of ISI's involvement in chemical information systems.12

By the end of the 1960s, academic information scientists had become preoccupied with the Cranfield concepts of relevance, recall, and precision pioneered by Cyril Cleverdon.13 Even though it never produced an operational system, the Cranfield work gave researchers in library and information science useful ways to talk about relevance, recall, and precision. These concepts were significant for word-based, traditional types of indexing systems like Medline. There never has been, and really cannot be, a comparable evaluation of citation indexing, because citation behavior defies any traditional measure of relevance. The closest measure of citation-based relevance would be through the use of bibliographic coupling, pioneered by Michael Kessler at MIT. Gerry Salton, who is probably the most-cited information scientist of the century, recognized this duality. The recognition that a viable system needs both types of complementary capabilities was reflected in the adoption of title and keyword searching in the Permuterm Subject Index of the SCI and Social Sciences Citation Index. This would be further expanded through KeyWords Plus14, a system of derivative indexing in which terms are extracted from the titles of earlier articles cited in the indexed paper.
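For the record, the Cranfield measures have simple set definitions. With $A$ the set of documents relevant to a query and $B$ the set actually retrieved,

$$\text{recall} = \frac{|A \cap B|}{|A|}, \qquad \text{precision} = \frac{|A \cap B|}{|B|}.$$

Bibliographic coupling, by contrast, requires no relevance judgments at all: the coupling strength of two papers $a$ and $b$ is simply $|R(a) \cap R(b)|$, the number of cited references their bibliographies share.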

While the Cranfield ideas have received the continuing attention of historians, the most successful operational system of information dissemination and retrieval of the 20th century, in my humble opinion, was Current Contents. Yet it is barely mentioned as significant in the history of information science. It was deceptively simple, so government bureaucrats failed to recognize it as significant and chose instead to support more complex traditional systems. Millions were poured into Chemical Abstracts. I reviewed these developments somewhat bitterly in a 1968 review published in the Bulletin of the Atomic Scientists.15

The decision to launch the Science Citation Index in 1964 was a calculated risk that almost failed. Unlike today, when the need for and value of the SCI is taken for granted in spite of its perceived high cost, the reception of the SCI in the sixties was conservative and cautious. My good friend Cyril Cleverdon reviewed the 1964 SCI for Nature16 and, as expected, was concerned with the relevance of citing papers, a theme that would linger for decades. The issue of relevance has never been resolved because traditional notions of relevance cannot be measured in citation-based retrieval. The mere fact of citation may be relevant to the cited author. The "main" themes of citing and cited papers can be either identical or dissimilar, as the case may be. A few critics like Julian Smith recognized the value of "systematic serendipity"17 in the SCI. This oxymoron beautifully captures the spirit of citation indexing.

A few perceptive librarians like Evan Farber at Earlham College also recognized its value in teaching undergraduates how to use the scientific literature. But almost from the outset there were libraries in which buying the SCI was forbidden lest faculty be judged by citation counts. The fear of the SCI was real. A single faculty member on the library committee could veto its purchase, and often did. These included chemists who were loyal to Chemical Abstracts and medical librarians loyal to Index Medicus. And hostility to private enterprise in MLA and other professional societies was palpable.

It was not until 1972, when the Social Sciences Citation Index® (SSCI) and then the SCI went on-line with Dialog, that the printed edition began to show a small profit. The decision to go on-line cost us many industrial subscribers who preferred the pay-as-you-go method of searching, but it also encouraged more academic users to try citation-based retrieval. Denying access to the SCI by not purchasing the printed volumes could not prevent enthusiasts from creating faculty citation rankings, since they could use the SCI on Dialog or DIMDI. My 1970 paper in Nature also had a powerful impact.18 It demonstrated the correlation between citation frequency and peer recognition: 12 of the 50 most-cited authors received the Nobel Prize. A decade later, our study of the 1,000 most-cited scientists19 would reinforce this recognition of the SCI as a powerful tool for research faculty evaluation, as did the publication of over 4,000 Citation Classics.

By the middle of the 70s, the SCI and SSCI were accepted to the point where we felt confident university librarians would support the introduction of the A&HCI®, which closed the main gap in our coverage of arts and sciences scholarship. This decision was made mindful of the fact that 80% of the references cited in science were journal items, whereas in the humanities 80% of the cites were to books; in the social sciences it was 50%. It is still not realized by many librarians and scholars that the A&HCI and SSCI are the most comprehensive book review indexes available. On the other hand, ISI did not take the necessary steps to make publishers aware of their value for that purpose.

With this introductory background, let me try to cover the more speculative aspect of the theme I was asked to comment upon.

While in 1955 I did mention the potential use of citation analysis to study the impact of research,4 and while social scientists quickly caught on to the value of the SCI for studying the social stratification of science (helped by the group at Columbia headed by Robert K. Merton's students, Jonathan and Stephen Cole)20, it was not until the fields of bibliometrics and scientometrics took off that the impact of the SCI would be felt in various arenas. V. V. Nalimov invented the term scientometrics. His classic 1969 book21 was followed a decade later by the founding of the journal Scientometrics in 1978. By 1980, the higher-impact journals covered in the SCI and Current Contents had become recognized as the place to publish. About that time, a new director of the Research Council in Italy declared war on the political control of research. He simply asserted that researchers (grantees) should have published at least one paper in SCI-covered peer-reviewed journals. Whether they were cited or not did not enter the equation for another ten years. However, Italian scientists were the first to publish a book on the impact factor.22

Fifteen years later, the Soros Foundation would use similar criteria in selecting the initial winners of research grants in post-Perestroika Russia! And recently it was reported that the new procedure for election to the Russian Academy will be, in part, citation based.

In the meantime, a variety of countries introduced the notion that the salaries of researchers should be tied to citation frequency, that is, citation impact. The impact factor of journals, having been widely adopted for journal selection in libraries, became a sine qua non of evaluation. Later this number came to be used as a convenient surrogate for estimating the citation impact of individual current papers that had not yet had time to be cited. This stirred up new controversy about citation studies and a great deal of resentment, since careful and informed observers knew that journal impact factors average the skewed citation frequency distributions of papers published in leading journals.23 Seglen, in particular, provided detailed data on this skewness. Nevertheless, this has not deterred the use of journal impact as a surrogate for author impact.24 The worldwide preoccupation with impact factors is reflected in the large literature on this topic. No fewer than 100 articles in the past year discuss the pros and cons of these data. And there is great pressure on ISI to modify its method of calculating impact to better reflect long-term vs. short-term impact.25 This is reflected in its new Essential Science Indicators®.
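Seglen's statistical point is easy to demonstrate. The short sketch below, using invented citation counts, shows how a handful of highly cited papers pulls a journal's mean citation rate (the quantity the impact factor reports) far above what its typical paper achieves:

```python
# Sketch with invented citation counts: why a journal's mean citation
# rate (the impact factor) can misrepresent its typical paper.
from statistics import mean, median

# Hypothetical two-year citation counts for 20 papers in one journal;
# a few highly cited papers dominate, as in the skewed distributions
# Seglen documented.
citations = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 15, 40, 92]

print(f"mean (what the impact factor reports): {mean(citations):.1f}")    # 9.5
print(f"median (the typical paper):            {median(citations):.1f}")  # 2.5
```

Half the papers in this invented journal earn at most two or three citations, yet the journal-level average credits each of them with 9.5.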

An enormous educational effort has been required over the past 35 years: first in teaching the use of the SCI for information retrieval, and then in teaching the judicious use of citation analysis for faculty or research evaluation.26 The wide gap between these two uses of the SCI was reflected in a recent review of the festschrift Web of Knowledge by John Ziman in Nature.27 He correctly points out that very little is said in that volume about the SCI's use for information retrieval. The sociometric and scientometric uses of the SCI dominate the work. Comparative studies of the SCI with Medline or CAS have more or less become irrelevant since the advantages of citation indexing have become so transparent. Yet such studies ought to continue in order to stress the value of using both methods as complementary tools.

A cottage industry of research evaluation has now grown up here and in Europe, but most uses of the SCI for evaluation will remain hidden, blended into other data. As Professor Robert Merton has often said, highly cited work will eventually be taken for granted and thereby be obliterated.28 Many authors do not even bother to cite the SCI or SSCI when they use its bibliometric data. Its easy accessibility on the web will increase that obliteration. And as full-text material is used in combination with the Web of Science®, the transparency and ease of use could obliterate the awareness that there is an artifact called the Science Citation Index.

However, hypertext linking from one cited reference to another is quite a different matter from an index display. To see the full range of an author's work and to observe the collective impact of that work will still require the display we obtain in an index or catalog. Individual isolated links will not provide the juxtaposition of the totality of numerous citing authors. And future citation index displays will provide the contextual environment of citations, as is demonstrated in autonomous citation indexes.29

I'll conclude now by merely mentioning how use of the SCI will soon blend with historical research, as was proposed back in 1964 when Irv Sher, Dick Torpie, and I published the report called "The Use of Citation Data in Writing the History of Science."30 I call this work algorithmic historiography. In the future, when you conduct a literature search on any given topic, the output of that process will be subjected to a new algorithm which generates a brief historical map of the topic in question. Thus, within minutes, having retrieved 1,000 papers on a given topic, you will be informed of the papers and books that are the core of that literature, presented in the form of chronological tables and maps identifying the works in question in their chronological context. Combined with citation indexing in textual context, we will have taken the first steps on the way to artificially intelligent, that is, automatic reviewing.31
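To make the notion of algorithmic historiography concrete, here is a minimal sketch under invented assumptions: the five papers, their citation links, and the simple "local citation count" ranking are all illustrative, not a description of any ISI product.

```python
# Minimal sketch of algorithmic historiography: given a retrieved set of
# papers and the citation links among them, rank the core works by how
# often they are cited *within* the set and lay them out chronologically.
from collections import Counter

# (id, year, title, ids of earlier papers in the set that this paper cites)
# -- all invented for illustration.
papers = [
    ("A", 1953, "Helical structure proposed", []),
    ("B", 1958, "Replication mechanism", ["A"]),
    ("C", 1961, "Genetic code cracked", ["A", "B"]),
    ("D", 1962, "Messenger hypothesis", ["B"]),
    ("E", 1965, "Code table completed", ["A", "C", "D"]),
]

# Count the citations each paper receives from others in the retrieved set.
local_cites = Counter(ref for _, _, _, refs in papers for ref in refs)

# Print a chronological table: the backbone of a historical map.
for pid, year, title, _ in sorted(papers, key=lambda p: p[1]):
    print(f"{year}  [{pid}]  cited {local_cites[pid]}x in set  {title}")
```

Sorting such a table by local citation count instead of year surfaces the core papers; drawing the links as arrows between the rows yields the chronological map itself.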

References

1.  Ridenour LN, Shaw RR, and Hill AG.  Bibliography in an Age of Science.  Urbana:  University of Illinois Press, 90 pgs. (1950).

2.  Shera JH and Egan M.  Bibliographic Organization: Papers Presented Before the Fifteenth Annual Conference of the Graduate Library School, July 24-29, 1950.  Chicago:  University of Chicago Press, 275 pgs. (1951).

3.  Garfield E.  "A Unified Index to Science," Proceedings of the International Conference on Scientific Information, November 16-21, 1958.  Washington, DC:  National Academy of Sciences - National Research Council, Volume 1, pgs. 461-474 (1959).  Reprinted in Current Contents No. 27, pgs. 5-20 (December 27, 1976).  Reprinted in Essays of an Information Scientist, Volume 2, pgs. 674-687.  Philadelphia:  ISI Press (1977).
http://garfield.library.upenn.edu/essays/v2p674y1974-76.pdf

4.  Garfield E.  "Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas," Science 122(3159):108-111 (July 1955).

5.  Garfield E.  "The mystery of the transposed journal lists -- wherein Bradford's law of scattering is generalized according to Garfield's law of concentration," Current Contents No. 7, pgs. 5-6 (August 4, 1971).  Reprinted in Essays of an Information Scientist, Volume 1.  Philadelphia:  ISI Press, pgs. 222-223 (1977).
http://garfield.library.upenn.edu/essays/V1p222y1962-73.pdf

6.  a.  Bradford SC.  "Sources of Information on Specific Subjects," Engineering 137:85-86 (1934).
    b.  Bradford SC.  Documentation.  Washington, DC:  Public Affairs Press, 156 pgs. (1950).

7.  Garfield E.  "Testimony Before the Ad Hoc Subcommittee of the Committee on Education and Labor [Roman Pucinski, Chair], House of Representatives, Eighty-Eighth Congress, First Session, 1963."  Hearings on a National Research Data Processing and Information Retrieval Center.  Washington, DC:  GPO, pgs. 227-252 (1963).

8.  Garfield E and Hayne R.  "Needed - A National Science Intelligence and Documentation Center," pgs. 1-7 (1955).  Presented at the "Symposium on Storage and Retrieval of Scientific Information," Annual Meeting of the American Association for the Advancement of Science (AAAS), Atlanta, December 28, 1955.

9.  a.  Watson JD, Crick FHC.  "Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid," Nature 171:737-738 (1953).
    b.  Watson JD, Crick FHC.  "Genetical Implications of the Structure of Deoxyribonucleic Acid," Nature 171:964-967 (1953).

10.  Garfield E.  "Why Is the Engineer So Different from the Scientist?" The Scientist 14(6):4 (March 20, 2000).

11.  Garfield E.  "'Hi-fi' lists in ASCA system reduce noise while measuring research activity," Current Contents No. 41, pgs. 5-6 (October 11, 1972).  Reprinted in Essays of an Information Scientist, Volume 1, pgs. 368-369.  Philadelphia:  ISI Press (1977).
http://garfield.library.upenn.edu/essays/V1p368y1962-73.pdf

12.  Garfield E.  "From laboratory to information explosions ... the evolution of chemical information services at ISI," Journal of Information Science 27(2):119-125 (2001).
http://garfield.library.upenn.edu/papers/chemheritage18(3)fall2000.html

13.  Cleverdon CW.  "Cranfield Tests on Index Language Devices," ASLIB Proceedings 19:173 (1967).

14.  Garfield E.  "The Permuterm Subject Index: an autobiographical review," Journal of the American Society for Information Science 27(5/6):288-291 (1976).  Reprinted in Essays of an Information Scientist, Volume 7.  Philadelphia:  ISI Press, pgs. 546-550 (1985).
http://garfield.library.upenn.edu/essays/v7p546y1984.pdf

15.  Garfield E.  "Chemical Abstracts Service Annual Report to NSF," Bulletin of the Atomic Scientists 24(6):43-44 (1968).

16.  Cleverdon CW.  "Citation Indexing," Nature 208(5012):717 (1965).

17.  Smith JF.  "Systematic Serendipity," Chemical & Engineering News 42(35):55-56 (1964).

18.  Garfield E.  "Citation Indexing for studying science," Nature 227(5259):669-671 (1970).  Reprinted in Current Contents No. 33 (November 18, 1970).  Reprinted in Essays of an Information Scientist, Volume 1.  Philadelphia:  ISI Press, pgs. 132-138 (1977).
http://garfield.library.upenn.edu/essays/V1p132y1962-73.pdf

19.  Garfield E.  "The 1,000 Contemporary Scientists Most-Cited 1965-1978. 1. The Basic List and Introduction," Current Contents No. 41, pgs. 5-14 (1981).  Reprinted in Essays of an Information Scientist, Volume 5.  Philadelphia:  ISI Press, pgs. 269-278 (1983).
http://garfield.library.upenn.edu/essays/v5p269y1981-82.pdf

20.  Cole JR, Cole S.  Social Stratification in Science.  Chicago:  University of Chicago Press (1973).

21.  Nalimov VV and Mul'chenko ZM.  Naukometriya. Izuchenie nauki kak informatsionnogo protsessa (Scientometrics: The Study of Science as an Information Process).  Moscow:  Nauka, 192 pgs. (1969).  (Available in English on microfilm:  Measurement of Science: Study of the Development of Science as an Information Process.  Washington, DC:  Foreign Technology Division, U.S. Air Force Systems Command, October 13, 1971, 196 pgs.)

22.  Spiridione G and Calza L.  Il Peso della Qualità Accademica (The Weight of Academic Quality).  Cooperativa Libraria Editrice Università di Padova, 123 pgs. (1995).

23.  a.  Seglen PO.  "Why the impact factor of journals should not be used for evaluating research," British Medical Journal 314(7079):498-502 (February 15, 1997).
     b.  Seglen PO.  "Evaluation of Scientists by Journal Impact," Representations of Science and Technology: Proceedings of the International Conference on Science and Technology Indicators, Bielefeld, 10-12 June 1990.  P. Weingart, R. Sehringer, and M. Winterhager (Eds).  Leiden:  DSWO Press, pgs. 240-252 (1992).

24.  Garfield E.  "Random Thoughts on Citationology, Its Theory and Practice," Scientometrics 43(1):69-71 (1998).
http://garfield.library.upenn.edu/papers/scientometricsv43(1)p69y1998.html

25.  a.  Garfield E.  "Long-Term vs. Short-Term Journal Impact: Does It Matter?" The Scientist 12(3):10-12 (February 2, 1998).
http://garfield.library.upenn.edu/commentaries/tsv12(03)p10y19980202.pdf
     b.  Garfield E.  "Long-Term vs. Short-Term Impact: Part II," The Scientist 12(14):12 (July 6, 1998).
http://garfield.library.upenn.edu/commentaries/tsv12(14)p12y19980706.pdf

26.  a.  Garfield E.  "How to use Citation Analysis for faculty evaluations, and when is it relevant? Part I," Current Contents No. 44, pgs. 5-13 (October 31, 1983).  Reprinted in Essays of an Information Scientist, Volume 6.  Philadelphia:  ISI Press, pgs. 354-362 (1984).
http://garfield.library.upenn.edu/essays/v6p354y1983.pdf
     b.  Garfield E.  "How to use Citation Analysis for faculty evaluations, and when is it relevant? Part II," Current Contents No. 45, pgs. 5-14 (November 7, 1983).  Reprinted in Essays of an Information Scientist, Volume 6.  Philadelphia:  ISI Press, pgs. 363-372 (1984).
http://garfield.library.upenn.edu/essays/v6p363y1983.pdf

27.  Ziman J.  "Citation gold standard," Nature 410:518-519 (March 29, 2001).

28.  a.  Merton RK.  Social Theory and Social Structure, pgs. 27-29, 35-38.  New York:  Free Press, 702 pgs. (1968).
     b.  Garfield E.  "The 'Obliteration Phenomenon' in Science -- and the Advantage of Being Obliterated!" Current Contents No. 51/52 (December 22, 1975).  Reprinted in Essays of an Information Scientist, Volume 2.  Philadelphia:  ISI Press, pgs. 396-398 (1977).

29.  Lawrence S.  "Digital Libraries and Autonomous Citation Indexing," Computer 32(6):67 (1999).
http://www.neci.nec.com/~lawrence/aci.html

30.  Garfield E, Sher IH, and Torpie RJ.  "The Use of Citation Data in Writing the History of Science."  Report of research for the Air Force Office of Scientific Research under contract F49(638)-1256.  Philadelphia:  The Institute for Scientific Information (December 1964).

31.  Lawrence S.  "Digital Libraries and Autonomous Citation Indexing," Computer 32(6):67 (1999).
http://www.neci.nec.com/~lawrence/aci.html