New Tools for Improving and Evaluating
the Effectiveness of Research

Irving H. Sher and Eugene Garfield
Institute for Scientific Information
Philadelphia, Pa.



Presented at the Second Conference on Research Program Effectiveness, Washington, D.C., July 27-29, 1965.



The use of indirect methods, such as citation analysis, for measuring human performance is not obvious, though there is already a sizable literature in which this has been done on a small scale (1-3).

Citation indexing has been extensively developed as a novel method of indexing research literature (4-6). As a result, tools for dissemination and retrieval of research information have been made available which provide unique and useful approaches to the literature. The Science Citation Index (SCI), printed quarterly and cumulated annually, is primarily used for retrospective searching. Automatic Subject Citation Alert (ASCA) is used for selective dissemination of information. ASCA is an individualized weekly alerting service which culls the current literature for answers to specific questions. In both the SCI and ASCA, any known published item, whether an article, book, report, patent, etc., which falls within a user's sphere of interest may serve as a question or starting point. Both systems tell the user where any one of a group of question citations has been cited in the footnotes or bibliographies of the more current scientific or technical literature. By the act of citing, the authors and editors of the current literature establish new conceptual relationships between the current work and any earlier item cited. Citation indexing enables one to utilize these newly established relationships in a novel fashion: by coming forward in time from any known published work to more recent related literature. Common publication practices usually also provide some identifying reference numbers within the text of the citing item that facilitate the location of the exact place wherein this conceptual relationship is described or implied. The unambiguous identification of starting question references is relatively simple and does not require the user to rephrase the concepts involved in his question into words, nor to translate those words into standardized indexing terms.
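The forward-in-time lookup described above amounts to inverting reference lists: each source item's bibliography is turned into entries filed under the items it cites. The following is a minimal sketch in Python, not a description of how the SCI or ASCA is actually produced. The function name build_citation_index and the sample items are hypothetical.

```python
from collections import defaultdict


def build_citation_index(bibliographies):
    """Invert {citing item: [cited items]} into {cited item: [citing items]}."""
    index = defaultdict(list)
    for citing, cited_items in bibliographies.items():
        for cited in cited_items:
            index[cited].append(citing)
    # Sort the citing items so that the output is stable and easy to read.
    return {cited: sorted(citing) for cited, citing in index.items()}


# Hypothetical source items and their reference lists (not real SCI data).
bibliographies = {
    "Paper C (1964)": ["Paper A (1955)", "Paper B (1960)"],
    "Paper D (1965)": ["Paper A (1955)"],
    "Paper B (1960)": ["Paper A (1955)"],
}

citation_index = build_citation_index(bibliographies)

# Starting from one known earlier item, list the later items that cite it.
later_work = citation_index["Paper A (1955)"]
print(later_work)
```

Starting from the known item "Paper A (1955)", the lookup returns the later items that cite it, with no need to phrase the question in subject terms.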

In addition to a general utility for individual scientists, documentalists, and librarians who work with the scientific literature, SCI and ASCA also provide, for administrators, interesting capabilities that can be used in studying, evaluating, and improving the effectiveness of research programs.

1. Reduction of Replication of Research Effort

Usually even as early as the inception of a research idea, the scientist is aware of earlier published material related to his idea, and upon which, in fact, his idea is often founded. Certainly by the time an idea has reached the state of a preliminary project proposal or a grant proposal, there should be appended a bibliography of citations to the pertinent previous literature. If items related to or describing this same “new” idea have already been published, there is a high probability that at least some of the known related literature will have been cited. Any one citation to known pertinent literature will alert the researcher and provide an opportunity to see if the idea has indeed already been developed or if modifications in the proposed research are called for.

Through citation analysis, numerous actual cases of unwitting repetition in the publication of research previously recorded by other groups have been established and examined. Even though the Institute for Scientific Information has amassed considerable citation data, finding such examples in the literature is not easy. For every such case of duplication in the literature there probably exist many more instances where the redundancy was discovered after conception of the research idea but in time to abort duplicate publication. Published replications by different groups, and published apologies, are merely the most extreme examples that highlight the inadequacies of the classical literature retrieval capabilities as applied by authors, their colleagues, review boards, journal editorial staff, and referees. With the availability of SCI and ASCA, simple checks can be made on the possibility of pre-existing literature at any time throughout the research study and without requiring an established nomenclature in what as yet may be an ill-defined field.

2. An Opportunity to Examine Comments, Critiques, Corrections, Applications, and Extensions of Particular Works of an Author as Noted in Subsequent Works by the Author's Peers

By means of the principle of citation indexing, those responsible for the administration and evaluation of research programs can uncover current items which cite published works of interest. The administrator can, by careful examination of these citing works, gain the insight and perspective provided by other researchers active in a field and citing the work under consideration. This new information should be considered judiciously in combination with the facts available from all other sources of information ordinarily considered when evaluating researchers or their publications. Until a great deal more research has been conducted into the many sociological ramifications of citation practices, and unless he is fully cognizant of the detailed characteristics of a particular citation index file, the administrator must eschew any naive and most likely erroneous interpretations of the bare statistical counts of the number of citations to particular works (3). The Institute is on record as having cautioned against the promiscuous use of quantitative citation data. However, this by no means implies that such evaluations are not possible.

A preliminary experiment was performed to determine whether or not even the most outstanding scientists differ appreciably from the average in the count of citations to their published works. The file studied was the cumulated 1961 Science Citation Index, a list arranged by reference (cited) author and with the characteristics given in Table I. The names of the persons awarded Nobel prizes in physics, chemistry, and medicine for the years 1962 and 1963 were checked in the 1961 SCI. Statistics on the number of reference citations were gathered and the counts were compared with the values for average authors in this same file as shown in Table II.

Nobel prize winners had a significantly higher number of their papers cited, in part reflecting their high rate of publication. However, there was also a higher number of references to each cited work, as compared with the average. The combination of both terms, that is, the number of items cited per author and the number of reference citations per item, gives the number of citations per cited author. This term, the impact factor, reveals the greatest difference between groups. There are 30 times as many citations per average prize winner as there are per average cited author in the entire file. Not only is this average for the prize winners higher, but all but one of the 13 individual prize winners had a value at least several-fold greater than that of the average author. The proportion of self-citations was not unusual for the prize winners. The occurrence of 20 or more citations to individual papers was higher for the prize winners as a group but did not occur in enough instances to make this a reliable statistic.
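The ratio quoted above can be checked against the per-author citation counts in Table II. The following is a minimal sketch in Python, assuming the Table II figures as transcribed; the surnames serve only as dictionary keys, and the variable names are illustrative.

```python
# Citations per cited author for the whole 1961 file (Table II, "1961 File").
FILE_AVERAGE = 5.51

# Citations per author for the 13 prize winners (Table II, "No. Citations Per Author").
citations_per_author = {
    "Landau": 177, "Kendrew": 95, "Perutz": 84, "Crick": 119, "Watson": 112,
    "Wigner": 103, "Mayer": 37, "Jensen": 4, "Ziegler": 166, "Natta": 380,
    "Huxley": 109, "Eccles": 422, "Hodgkin": 388,
}

# Mean citations per prize winner and its ratio to the file average.
group_mean = sum(citations_per_author.values()) / len(citations_per_author)
ratio_to_file = group_mean / FILE_AVERAGE

# Prize winners whose own count falls below the file average.
below_average = [name for name, n in citations_per_author.items() if n < FILE_AVERAGE]

print(f"mean citations per prize winner: {group_mean:.1f}")
print(f"ratio to file average: {ratio_to_file:.1f}")
print(f"below file average: {below_average}")
```

The group mean comes out close to the 169 citations per author given in Table II, the ratio to the file average is a little over 30, and only one prize winner falls below the file average.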

It should be noted that these calculations make no attempt to correct for possible homographs, that is, a multiplicity of authors that might have the same last name and initials. Nor does this analysis check the number of citations to works in which the individuals concerned appeared as other than first author. Assuming no special propensity for homographs among the names of the prize winners, there appears to be no special trend among these authors to put their names as secondary authors. If such a trend exists, it is unable to mask the unusually large number of citations to the works of these authors. There exists the possibility that, conversely, the prize winners are more likely to put their names first on their papers thus helping to accrue a larger than average number of citations under their names.

Eliminating the one prize winner who had only four citations to his name leaves 37 citations as the next smallest number among the prize winners. Since there were 4,792 reference authors (from among the 257,900 total cited reference authors in the file) who had 37 or more citations to their names, we may state that 12 of the 13 prize winners fall in the group of reference authors constituting the top 1.85% of the file. Similarly, if we eliminate the prize winner with 37 citations, we see that the next lowest value is 84 citations. There are 477 reference authors in the entire file with 84 or more citations to their names; therefore, 11 of the 13 prize winners are in the top 0.185% of the file. Thus, in spite of the assumptions and possible flaws in this analysis, it appears rather obvious that a group of scientists, acknowledged by receipt of the Nobel Prize as outstanding, are similarly outstanding when measured by citation analysis. There is no chance here that the receipt per se of the Nobel Prize by these people influenced the number of citations to their works, since we have examined the citations which appeared in 1961 for persons who later received the prize in 1962 and 1963.
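The two cut-offs used above can be reproduced from the author counts in Table I and the per-author citation counts in Table II. The sketch below is in Python and treats each threshold as "that many citations or more", which is what the argument requires; the function and variable names are illustrative.

```python
TOTAL_CITED_AUTHORS = 257_900  # unique reference authors cited (Table I)

# Number of cited authors at or above each citation threshold (Table I).
AUTHORS_AT_OR_ABOVE = {37: 4_792, 84: 477}

# Citations per author for the 13 prize winners (Table II).
PRIZE_WINNER_CITATIONS = [177, 95, 84, 119, 112, 103, 37, 4, 166, 380, 109, 422, 388]


def summarize(threshold):
    """Return (prize winners at or above threshold, percent of file at or above it)."""
    winners = sum(1 for n in PRIZE_WINNER_CITATIONS if n >= threshold)
    share = 100 * AUTHORS_AT_OR_ABOVE[threshold] / TOTAL_CITED_AUTHORS
    return winners, share


for threshold in (37, 84):
    winners, share = summarize(threshold)
    print(
        f"{winners} of {len(PRIZE_WINNER_CITATIONS)} prize winners have "
        f"{threshold} or more citations; such authors make up {share:.3f}% of the file"
    )
```

With these figures, 12 and 11 of the 13 prize winners clear the two thresholds, and the corresponding shares of the file come out near 1.86% and 0.185%.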

3. Convenient Way to Keep Track of the Appearance of Various Current Works Published by an Individual or Organization for Use by the Administration and Public Relations Departments

In January, 1965, an experimental corporate index was begun in the SCI and a little later made available in ASCA. All institutions, universities, hospitals, corporations, or other organizations acknowledged as the place of origin of a published journal item are processed as indexing terms. Since the SCI and ASCA processing of journals includes every article, review, technical note, letter to the editor, correction, proceedings of meetings, etc. appearing in the journal, that is, everything other than ads and the most ephemeral news notices, the corporate indexing feature provides a simple means of locating all the published writings, in a specified group of journals, attributed to an organization. The administrator can use this feature as an external source of information confirming the appearance in print of works written in his own organization, and can also use it to follow the papers published by other groups.

Similarly, the administrator can easily identify all of the current articles, letters to the editor, etc. authored by any individual. When he uses the source index that accompanies the SCI, or enters a source author question into ASCA, the system will identify all the appropriate works no matter where the individual's name appears in the authorship, whether as primary author or otherwise.

Similar data based on patent assignments can be obtained from the 1964 and 1965 SCI Source Indexes, which enable one to determine a man's current publications, including all U.S. patents, and all patents assigned to a specific corporate body.

4. An Opportunity to Study the Utilization of the Items Published in a Given Journal and the Amount of Cross-Disciplinary Interest in Those Items

For those administrators engaged in evaluating the effectiveness of specific publications, SCI and ASCA can provide the data required for such evaluations. The items published in a specific journal, for example, can be followed through citation indexing, and the number and variety of citing publications can be examined. When adjusted for the total number of items published by a journal in a given time period, these statistics can yield individual journal impact factors (7). Similarly, the amount of intra- and interdisciplinary utilization and cross-stimulation can be surveyed. Quantitative evidence is also provided of the utilization by journals published in other countries.
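The adjustment mentioned above can be illustrated with a short sketch in Python. It assumes the simplest possible adjustment, citations received divided by items published in the same period; the exact definition used in reference (7) may differ. The journal names and counts are hypothetical.

```python
def journal_impact_factor(citations_received, items_published):
    """Citations received per item published in the same period."""
    if items_published <= 0:
        raise ValueError("items_published must be positive")
    return citations_received / items_published


# Hypothetical journals: (citations received, items published).
journals = {
    "Journal X": (1500, 1000),
    "Journal Y": (1200, 400),
}

# Ranking by raw citation count versus ranking by citations per item.
by_raw_count = sorted(journals, key=lambda j: journals[j][0], reverse=True)
by_impact = sorted(
    journals, key=lambda j: journal_impact_factor(*journals[j]), reverse=True
)

print("by raw count:", by_raw_count)
print("by impact factor:", by_impact)
```

In this invented case the journal with the larger raw citation count ranks second once the number of items it published is taken into account, which is the reason for making the adjustment.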

5. An Opportunity to Use and Evaluate New Algorithmic Techniques in Writing the History of Science

One of the most elementary applications of citation data, and the derivative SCI and ASCA services, in writing the history of science is that of constructing the introductory section to a current manuscript. Earlier relevant works are easily traced back for many generations by examining the bibliographies of more-current known works. The forward-reaching dimension added by SCI and ASCA greatly facilitates the completion of this process by identifying the most recent descendants of the relevant papers uncovered in each generation level. However, this is only one of many reasons for using the citation index for retrieval of information. As will be seen, it is in fact difficult to separate completely the historical and bibliographical relationships between scientific events.

At the Institute for Scientific Information, we have been studying these relationships and the possible application of citation indexing and analysis to the history of science. Essentially what we are investigating is the possibility of developing algorithmic procedures that would match the intellectual results obtained by the historian of science. In a recent experiment (8) we tested the heuristic utility of citation indexes in writing the history of science. In this approach, the history of science is regarded as a chronological sequence of events in which each new discovery is dependent upon earlier discoveries. Models of history were constructed consisting of chronological maps or topological network diagrams. Two such models were used. The first is based on the events in the history of DNA as described by Dr. Isaac Asimov in The Genetic Code (9). The second is based on the bibliographic citation data contained in the documents which are the original published studies of events represented in the Asimov book. The interdependencies of linkages among 40 major events (nodes) included in both network diagrams were mapped and compared.

The study confirmed 65% (28 of 43) of the historical dependencies in the Asimov network by corresponding linkages established by citations. However, 31 citation connections were found which had not been noted in The Genetic Code. Thus, each model turned up data not found in the other. The analyses, supported by numerous statistical tables and specially constructed citation indexes, show that the original hypothesis is reasonable. The techniques evolved in this study appear useful in writing histories of science by helping to identify key events, their chronology, their interrelationships, and their relative importance.
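The comparison of the two network models can be pictured as a comparison of two sets of links between events. The following Python sketch is an illustration only and is not the procedure used in the study (8). The event labels and links are hypothetical, and each link is written as a pair (later event, earlier event on which it depends).

```python
def compare_networks(historical_links, citation_links):
    """Compare two sets of (later_event, earlier_event) dependency links."""
    historical = set(historical_links)
    citation = set(citation_links)
    confirmed = historical & citation
    return {
        "confirmed": confirmed,                       # in both models
        "historical_only": historical - citation,     # noted by the historian only
        "citation_only": citation - historical,       # found through citations only
        "confirmed_share": len(confirmed) / len(historical) if historical else 0.0,
    }


# Hypothetical events E1..E4 and the links drawn between them in each model.
historical_links = [("E3", "E1"), ("E3", "E2"), ("E4", "E3")]
citation_links = [("E3", "E1"), ("E4", "E3"), ("E4", "E1")]

result = compare_networks(historical_links, citation_links)
print(sorted(result["confirmed"]))
print(sorted(result["historical_only"]))
print(sorted(result["citation_only"]))

# The share reported in the text: 28 of 43 historical dependencies confirmed.
reported_share = 28 / 43
print(f"{reported_share:.0%}")
```

As in the study, each model in this toy case contains a link that the other lacks. The last lines simply recompute the reported share, 28 of 43, as a percentage.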

In conclusion, there are a number of different methods and occasions for the use of citation data in evaluating and improving the effectiveness of research. The more obvious methods involve the use of citation indexes to retrieve pertinent information on research projects while it is sufficiently current to prevent replication of effort. The less obvious methods involve the use of citation indexes to evaluate individual papers and the use of citation counts to determine impact factors. Finally, citation data can be used, algorithmically, in the reconstruction of the historical development of individual fields or branches of science.

TABLE I
Statistical Characteristics of 1961 Science Citation Index

Source journal titles processed                                                   613
Source journal issues processed                                                 5,467
Source items                                                                  101,944
Total references cited                                                      1,395,530
Anonymous reference items                                                      20,200
Unique reference authors cited                                                257,900
Different reference items cited                                               869,900
Cited reference authors with ≥37 citations to the sum of their cited works      4,792
Cited reference authors with ≥84 citations to the sum of their cited works        477

 
 
 
 
TABLE II
Citation Characteristics of Nobel Prize Winners

                              No. Items     No. Citations  % Self     No. Citations  No. Items With ≥20
First Author                  Cited/Author  Per Author     Citations  Per Item       Citations, Per Author

Average:
  1961 File                   3.37          5.51           7.8        1.57           0.0039
  Prize Winners               58.10         169.00         10.5       2.90           0.85

Nobel Prizes, 1962
  Physics    Landau, LD       113.00        177.00         -          1.56           -
  Chemistry  Kendrew, JC      26.00         95.00          1.0        3.50           1.00
             Perutz, MF       20.00         84.00          1.2        4.20           1.00
  Medicine   Crick, FH        48.00         119.00         0.8        2.50           -
             Watson, JD       19.00         112.00         -          5.90           3.00

Nobel Prizes, 1963
  Physics    Wigner, EP       67.00         103.00         2.0        1.54           -
             Mayer, M         35.00         37.00          8.0        1.05           -
             Jensen, JR       3.00          4.00           -          1.33           -
  Chemistry  Ziegler, C       112.00        166.00         -          1.48           -
             Natta, G         209.00        380.00         26.0       1.82           -
  Medicine   Huxley, AF       22.00         109.00         4.0        4.96           -
             Eccles, JC       119.00        422.00         29.0       3.55           1.00
             Hodgkin, AL      65.00         388.00         -          5.96           5.00

 
 
Bibliography
 
 
  1. J. H. Westbrook, “Identifying Significant Research,” Science 132, 1229-1234 (1960)
  2. M. H. Hodge, “Rate Your Company's Research Productivity,” Harvard Business Review 41(6), 109-122 (1963)
  3. E. Garfield, “Citation Indexes in Sociological and Historical Research,” American Documentation 14(4), 289-291 (1963)
  4. E. Garfield, “Citation Indexes for Science,” Science 122, 108-111 (1955)
  5. J. Martyn, “An Examination of Citation Indexes,” Aslib Proceedings 17(6), 184-196 (1965)
  6. E. Garfield, “Science Citation Index—A new Dimension in Indexing,” Science 144, 649-654 (1964). (This paper contains an extensive bibliography on citation indexing).
  7. E. Garfield and I. H. Sher, “New Factors in the Evaluation of Scientific Literature through Citation Indexing,” American Documentation 14(3), 195-201 (1963)
  8. E. Garfield, I. H. Sher and R. J. Torpie, “The Use of Citation Data in Writing the History of Science,” Institute for Scientific Information, Philadelphia, December, 1964, 75 pp.
  9. I. Asimov, The Genetic Code, New American Library, New York, 1963, 187 pp.