Revue Internationale de la Documentation, Vol. 32, 1965, No. 3, pp. 112-116


Science Citation Index - Answers to frequently asked questions

by Eugene Garfield
Director, Institute for Scientific Information, Philadelphia

Citation indexing has been discussed in several recent articles (1, 2, 3). However, in a recent lecture tour the Science Citation Index was subjected to detailed analysis by our European colleagues. At the suggestion of the Editor of the Revue Internationale de la Documentation, I have prepared the following recapitulation of answers to frequently asked questions concerning citation indexing.

Comprehensiveness of the SCI

A number of British documentalists challenge the claims which are made regarding the comprehensiveness of the Science Citation Index. For example, Cleverdon (4) criticized our coverage of physics. He claimed that SCI must improve its coverage of physics because we indexed only 5% of the journals covered by Physics Abstracts. Even documentalists often overlook the implications of the law of scattering so well enunciated by Samuel Bradford (5). Bradford's law has been confirmed in several recent studies (6, 7, 8). In general, this law states that a small percentage of journals accounts for a large percentage of the articles published in a specific field of science. This is especially true regarding coverage of significant articles. For example, of the 400 journals from which selections were actually taken by Physics Abstracts in 1961 (not the misleading figure of 800 "covered"), about 90% of the articles abstracted came from about 100 journals. The 1965 Science Citation Index covers 90% of these leading sources of physics articles comprehensively (every article, review, technical note, letter to the editor, proceedings of meetings, editorial, book review, etc.). In addition, of course, there are many more journals comprehensively covered in the SCI that are of great interest to physicists but are not covered by Physics Abstracts, e.g. mathematics journals. Our experience indicates that Bradford's law holds true for science in general - not merely special fields.
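The scattering argument can be made concrete with a short sketch. The Python below is only an illustration under assumed data: the function name `core_size` and the per-journal article counts are hypothetical and are not figures taken from Physics Abstracts. It ranks journals by output and reports how many of the top-ranked journals are needed to reach a target share of all articles.

```python
def core_size(article_counts, target_share):
    """Return the smallest number of top-ranked journals whose combined
    output reaches target_share (a fraction between 0 and 1) of all articles."""
    ranked = sorted(article_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0
    running = 0
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= target_share * total:
            return rank
    return len(ranked)


# Hypothetical, strongly skewed distribution (not real data):
# three large journals and a long tail of seventeen small ones.
counts = [500, 300, 200] + [10] * 17
needed = core_size(counts, 0.80)
print(f"{needed} of {len(counts)} journals supply 80% of the articles")
```

With a skewed distribution of this kind, a small core of journals reaches the target share, which is the situation the paragraph above describes for the 100 leading physics journals.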

Source vs. reference coverage

The discussion of comprehensiveness does not end here. In the SCI we not only mean that our coverage of each current source journal is complete (all items are processed) but in addition, all citations in the footnotes or bibliography of each article processed are indexed. Where cited references are habitually found in the text, as in the case of correspondence and proceedings of meetings, we even ferret these citations out. The production system contains many elaborate human and computer error-checking features to help insure both comprehensiveness and accuracy. What about the reference coverage? The 1961 SCI file of 1,400,000 total citations includes 870,000 unique cited works. The 1965 SCI file will contain an estimated total of 3,000,000 citations.

What about the number of publications cited? This is difficult to establish precisely until certain studies have been completed. However, there is evidence to indicate that 20,000 different publications are cited in the SCI. The frequency of citation to any journal is, in part, an approximate function of the size of the individual journal. Thus, in absolute terms the larger journals are cited more frequently than the smaller ones. This statement may seem obvious, but its import is forgotten in discussions of coverage by abstracting services. There are, however, some relatively small journals with high "impact factors" which are cited with exceptional frequency.

Coverage of basic science

Insofar as "basic" or "pure" science is concerned - physics, chemistry, biology, experimental medicine, mathematics, etc. - the SCI coverage of the significant journals is excellent. In the past, coverage of certain exotic language groups has been weak but that has been improved. However, this too is a relative statement. What other Western indexing or abstracting service can claim complete coverage of the leading Soviet journal of science, Doklady Akad. Nauk SSSR? The SCI can make this claim, and the Doklady ranks as the fifth largest journal in the world in terms of articles published each year! Similarly, we comprehensively cover Comptes Rendus, Nature, Science, Naturwissenschaften, and most other multi-disciplinary journals, many of which rank as the largest contributors to the journal literature. About one-third of the items published in these journals are not covered by any of the three leading abstracting services, that is, CA, BA, or PA (9).

What about SCI's coverage of technology? This calls for a definition. Libraries in technological institutes consider journals like Review of Scientific Instruments, J. Basic Engineering, Proc. IEEE, etc. to be technological. Others consider these journals basic science. By the former definition SCI covers many technological journals. Where SCI's coverage of "technology" might be considered weak is with regard to the many "trade" journals that exist which have been given a lower priority in our plans, for the time being.

Of those who interpret technology as being "applied science," I would ask whether journals which we cover, such as J. App. Chem., J. App. Microb., J. App. Math., would be included. We do not, at the present time, cover the Bakery and Confectioners Journal, the construction industry trade journals, etc. Nor do we yet cover the hundreds of minor clinical medical journals that would have to be included to make the SCI universal. To claim comprehensiveness is not to claim absolute universality.

We believe that a significant sector of the technological literature is in the patents. For this reason we began, with the 1964 SCI, comprehensive inclusion of all current US Patents. This claim cannot be made by any other indexing service in the world.

To support the claim that the SCI is the most comprehensive index published today, consider that the 1965 source coverage is estimated at 275,000 items (articles, patents, etc.) while the reference coverage is about 3,000,000 citations. Since there are also about two authors per source item, the 1965 SCI will be indexing the 275,000 current items with over 3,500,000 tags, or to an average depth of about 13 tags per source item.
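The depth figure follows directly from the estimates in the paragraph above. A minimal worked check, assuming (as the paragraph implies) that the tags consist of one entry per citation plus one entry per author; the variable names are illustrative only.

```python
source_items = 275_000      # estimated 1965 source items (articles, patents, etc.)
citations = 3_000_000       # estimated 1965 reference citations
authors_per_item = 2        # "about two authors per source item"

author_tags = source_items * authors_per_item   # author entries
total_tags = citations + author_tags            # citation entries plus author entries
depth = total_tags / source_items               # average tags per source item

print(f"{total_tags:,} tags, about {depth:.1f} per source item")
```

The total comes to a little over 3,500,000 tags, and the quotient rounds to the 13 tags per source item quoted in the text.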

The only abstracting service which exceeds our source coverage is the combined sections of the Referativny Zhurnal, but there does not exist, to my knowledge, a single index to this service or a complete set of separate indexes.

Journal selection criteria

While statistical criteria have played an important part in the selection of journals covered in the SCI, we have by no means omitted the evaluation of coverage by experts. Our distinguished Editorial Advisory Board is one of many sources of subjective judgment used in source journal selections. Since it is the subscribers who pay for the SCI, their recommendations are also always given careful consideration. The experience of the Institute for Scientific Information as publisher of Current Contents provides a steady source of additional information for use in the selection of significant journals. The expansion and improvement of journal coverage is a never-ending task.

Isn't SCI biased in coverage of English-language journals?

The 1965 SCI includes source journals from about 30 countries. The high proportion of English-language journals essentially reflects the superabundance of research conducted in the US and abroad which is published in the English language. Scientists are, to some extent, catholic in the choice of the works which they cite, and thus create a broad international network of bibliographic and Citation Index linkages. There has never been a study, to my knowledge, which could confirm the purely intuitive judgment of individual scientists in one country that scientists of another country have systematically or deliberately omitted citations to their work. The statistics and sociological aspects of trans-national citations can, incidentally, now be studied for the first time in a systematic way due to the availability of the SCI.

How reliable are authors in citing the literature?

In general, we have found that most authors cite pertinent work to the extent that they know it exists. The average journal item contains 12 references to earlier literature. Even though two scientists may unwittingly publish an almost identical work or closely related work and not cite each other, it is a low probability event for them not to share at least one reference citation in common. These coincident citations are easily found through the SCI. In my European lectures I cited the case of a completely duplicated research effort. The papers concerned shared four common references, although both had small bibliographies.

What is the subject in the SCI?

One of the most difficult adjustments librarians and documentalists must make in using the SCI is in defining the "subject." In the conventional language-oriented index one is accustomed to conducting a search by formulating the search question as key-words or subject headings. However, with the SCI, the question is formulated in terms of known works on the concept of interest. Who is making the search - a librarian or a scientist? The scientist is generally familiar with the literature of his field. When enlisting the assistance of the librarian he may not ordinarily proffer the exact extent of his knowledge. He may simply state the question. "Please get me a list of references on subject x." It is not unreasonable, however, for a librarian to inquire whether subject x is completely new to the scientist, or whether in fact he is familiar with one or more works in the field to be searched. Suppose that a life scientist wishes to study information theory as applied to biology. If indeed he knows nothing at all about information theory, he is in the same category as an undergraduate student seeking information on an elementary topic. The librarian would, through the card catalog or classical index, direct him to a book, an encyclopedia, or a review article on the "subject". On the other hand, if the scientist is knowledgeable in the field, the relevant works, or even authors that he knows of, serve as starting points in his search of the SCI. By use of the Source Index, which accompanies the Citation Index, he can easily eliminate papers, the main theme of which may not be pertinent. The full title of each citing paper is listed in the Source Index. However, in many cases the journal title alone would be sufficient for this decision.

How often is the average paper cited?

One commonly held, though erroneous, assumption is that papers are cited so often in the SCI as to make a search extremely tedious. This is a critical point which is related to the degree of specificity achieved by citation indexing. As a rule of thumb, the average paper will be cited about once each year. During a current year the average paper that is cited will be cited less than twice.

Obviously intuition misleads us. Our intuition apparently only considers the frequently cited papers and ignores those that are rarely or less frequently cited. However, statistics from the 1961 SCI show that only about one paper in 870, or 0.12% of all papers, is cited more than twenty times in a given year. And fewer than one in 30,000, or 0.0027%, are cited more than 100 times per year. Most of the latter papers are landmark papers which would not ordinarily be used as lone starting points in an SCI search. The average reader may not be interested in the group of 1,100 papers in 1964 which cited the Lowry method for protein analysis. However, this unique compilation of the many applications of the Lowry method can be the explicit answer to questions by some users. Furthermore, by use of bibliographic coupling even this large number can be easily reduced. The use of coupling or coincidence between a question-profile and the bibliography of an article is particularly facile in our ASCA system.

The statistical frequency of citations is quite pertinent to the degree of specificity achieved. Many documentalists, remembering the lack of specificity in the classical "subject" term, find it surprising that a manageable number of documents is retrieved in the usual SCI search. The frequency distribution of the number of source items found under reference headings in an annual cumulation of the SCI shows tremendously greater specificity than, for instance, the analogous numbers of items listed under subject headings in an annual issue of Index Medicus. This greater precision of the SCI is blatantly obvious even though the SCI indexed more items, and with several fold more tags per item, than Index Medicus.

When desired, one can easily expand the scope of an SCI search by various techniques. One of these is called cycling. If one begins a search with a particular reference and finds, in the SCI, the related citing papers, the search can be extended by re-entering the citation index with one or more new starting references. These may be the new sources found in the first part of the search, or other references cited in the original reference paper or in the new sources. Indeed, we have observed that this latter procedure is most productive. If two research workers are publishing on the same topic but are in ignorance of each other's work, they will not cite each other. However, it is likely that they will cite at least one other earlier paper in common.
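The cycling procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of the SCI production system: the function names `build_citation_index` and `cycle`, the `rounds` parameter, and the sample papers are all hypothetical. Each round looks up the current entry references in the citation index, collects the citing papers, and then re-enters the index with those new sources and with the references cited by the papers already in hand.

```python
from collections import defaultdict


def build_citation_index(bibliographies):
    """Invert {citing paper: [cited references]} into
    {cited reference: set of citing papers}."""
    index = defaultdict(set)
    for citing, refs in bibliographies.items():
        for ref in refs:
            index[ref].add(citing)
    return index


def cycle(bibliographies, start_ref, rounds=2):
    """Return the citing papers reached from start_ref in the given
    number of look-up rounds."""
    index = build_citation_index(bibliographies)
    found = set()
    seen_refs = set()
    entry_points = {start_ref}
    for _ in range(rounds):
        if not entry_points:
            break
        # Look up every current entry reference in the citation index.
        new_sources = set()
        for ref in entry_points:
            new_sources |= index.get(ref, set())
        new_sources -= found
        found |= new_sources
        seen_refs |= entry_points
        # Re-enter with the new sources and with the references cited
        # by the new sources and by the papers just used as entry points.
        next_entries = set(new_sources)
        for paper in new_sources | entry_points:
            next_entries.update(bibliographies.get(paper, ()))
        entry_points = next_entries - seen_refs
    return found


# Hypothetical bibliographies: papers B and C do not cite each other,
# but both cite the earlier paper X.
bibliographies = {
    "B 1963": ["A 1950", "X 1948"],
    "C 1964": ["X 1948"],
    "D 1965": ["B 1963"],
}
print(sorted(cycle(bibliographies, "A 1950", rounds=1)))
print(sorted(cycle(bibliographies, "A 1950", rounds=2)))
```

In this toy example the first round finds only paper B. The second round reaches paper C through the earlier reference X that B and C share, even though neither cites the other.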

Many British documentalists are under the impression that our advertising of the SCI implies that one can do without the conventional indexes. We regret this reaction to our attempt to call attention to the unusual nature and comprehensive scope of the SCI. Certainly, the SCI is not the only useful reference tool. The SCI does add new dimensions to information retrieval, whether used alone or in combination with classical library tools. Surely the decision as to which single reference work to purchase is usually economic and is a decision faced by the individual scientist as well as the librarian. However, there can be no question that a multi-disciplinary science library ought not be without the Science Citation Index. A broad-based index such as the SCI can serve a multi-disciplinary library on more occasions than might a discipline-oriented index to chemistry, biology or physics. However, most libraries that serve a multi-disciplinary clientele can usually afford several indexing services. This leads to another important question - the cost of the SCI.

Isn't the SCI expensive?

There can be no question that $1250 per year is, in absolute terms, a lot of money. Few individual scientists can afford such an outlay. Similarly, few individuals can afford to purchase Chemical Abstracts. The average university chemical library will continue the purchase of CA at a cost of $1200 per year, though the physics or biology department perhaps will not. Many university science libraries will also subscribe to the SCI for use by all departments in the main library. Furthermore, subscribing institutions which also require the SCI in departmental libraries are given a second copy discount. This means the SCI costs only $625 at the departmental level. The SCI, reflecting the scope of interdisciplinary research today, presents a problem in those institutions which have highly decentralized library facilities. We believe a good solution to this problem is the multiple-copy pricing policy we have formulated rather than a frustrating attempt to publish discipline-oriented, and thereby incomplete, citation indexes for chemistry, physics or biology. Such fragmented indexes suffer from the uncertainties of selective coverage. One chemical librarian has suggested that we merely change the name of the SCI to the Chemistry Citation Index so that he could justify its purchase. The coverage of chemistry in the SCI is excellent, but it is difficult to convince administrators and some faculty that a chemistry library needs a science index. We have avoided such semantic ploys so far, but they are not without their validity. The Institute publishes the bi-weekly Index Chemicus to which many libraries subscribe. However, other libraries prefer the Encyclopedia Chimica Internationalis, which is the cumulated version of the Index Chemicus. We sometimes find this difficult to comprehend since half the benefit of the IC to the chemist is its timeliness. However, there is no question that IC is needed in any chemical library which purports to be "complete" in its coverage of chemistry. The same is true for the SCI. This use of the term "complete" implies that the library makes a systematic attempt to purchase any reference tool which can significantly add to the retrieval capabilities of the existing reference collection. We believe this capability has been amply demonstrated for the SCI.

In considering the cost of an index, the librarian cannot disregard the reality of his budget. If a medical library has a budget of $500 per year, it is obvious that an index like the Index Medicus, which is subsidized by the US Government, would be the first choice. If such a medical library must choose between Chemical Abstracts and the Index Medicus, there is little doubt of the outcome. But if a medical library must choose between CA and the SCI then the choice is neither obvious nor simple. Fortunately, most medical library budgets are large enough to justify purchase of both, but there are some medical research institutions which may face a dilemma in this matter. Limited budgets create a competition between information services. These difficult choices are more frequently encountered in Europe where library budgets are often anachronistically small. One solution to this problem is developing a new awareness of the need for greater library budgets in a world where knowledge is increasing at a rapid rate. However, I find it somewhat distressing to hear a librarian bemoan that he cannot afford the SCI while having just decided to purchase a long run of a journal which will probably be used fewer than ten times per year. It is equally surprising that a library budget can sometimes include an expensive computer, but not basic bibliographic tools.

None of the comments above clarifies the question of the worth of the SCI at its present price. Obviously to produce this index requires an expenditure far in excess of a single subscription. The ultimate determination of value must be in terms of many intangibles. However, it is possible to estimate the time saved in a search that would otherwise require considerable time using conventional methods. How does one estimate the value of searches that can be accomplished by SCI but which are literally impossible using any other tools? Our seminars have demonstrated many such searches, including fundamental questions in synthetic and analytical chemistry.

The futility of a priori definitions

The physics librarian may argue that he is not interested in purchasing an index which is only 50% physical. But this raises another fundamental issue in evaluating any indexing service. The borderline between fields like physics and biology is increasingly hazy. Occasionally, physics articles cite bio-medical journals, and conversely. How can you find these "needles in the haystack" unless there is created a multi-disciplinary pool of journals covered comprehensively? The dilemma posed by Bradford's Law is that a great deal of literature must be processed in order to find the items on the "periphery" that may otherwise be lost. The inappropriateness of defining arbitrary boundaries between physics, chemistry and biology on an a priori basis is substantiated by studies which have shown the high degree of overlap among the material covered by the conventional services.

The literature of physics vs. the literature of interest to physicists

An important point which is frequently overlooked in such discussions is the difference between the "literature of physics" and the "literature of interest to physicists." Few will dispute the value of Physics Abstracts to a physics library. Yet the SCI includes a large literature in areas such as mathematics, chemistry, and technology that is of use to the physicist, but is not covered by PA.

Is the older literature cited?

In any one yearly SCI any reference year of the technical literature can be and is cited. The extent of this unusual chronological depth of coverage has been carefully studied and reveals an important and overlooked phenomenon. The frequency of citation to a particular year is partly a function of the amount of literature available to be cited. Since there was much less literature published in 1929 than in 1959, one might expect a lesser percentage of the citations in the 1964 literature will be to articles published in 1929. The peaks and troughs of distribution curves of citations-vs-cited-year are somewhat shallower when this factor is taken into account. Certainly the older literature is cited often enough to be of practical importance in information retrieval with the SCI.

Does the 1965 SCI obsolete the 1964 SCI?

Indeed, the citation of a particular paper in 1965 will not alter the permanent reference value of the 1964 SCI which discloses whether the same article has been cited in 1964. Both the 1964 and 1965 yearly indexes would become obsolete if we published a cumulative five-year or ten-year SCI. Our quarterly SCI's do become obsolete upon publication of the corresponding annual SCI. Each yearly SCI should be searched if the search is to be complete. The best search strategy, however, is to begin with the latest SCI available. If a pertinent 1965 paper is found, it is quite possible that it will anticipate, in its bibliography, 1964 papers that might also be found in the 1964 SCI.

Won't the SCI retrieve many irrelevant papers?

"Most of the references which an author cites are not directly relevant to the main theme of the paper, and the result is that a high proportion of the papers to which one is referred will be of no interest." (C. W. Cleverdon, Rev. Int. Doc. 31, 161 (1964)).

This statement is somewhat strange coming from the same review which states: "The most generally used method of finding relevant papers is by looking at the references quoted in a known paper." (Ibid.) This is perhaps the most frequently misunderstood aspect of citation indexing. It derives from a preoccupation, in traditional systems, with the "main theme" approach in indexing. It is this "main theme" approach which fails to retrieve any but the most obvious papers even though "lesser themes" in other papers may be completely relevant to a particular search. In a conventional index you do not retrieve a paper on amoeba protoplasm studies when looking for papers that have used Einstein's equation for measuring molecular dimensions. You do not for at least three reasons. First, you would have difficulty, to say the least, in finding an appropriate "subject heading" that conveyed the concept of "application of a particular equation." Second, you would not find an index which has the scope of coverage that includes such "diverse" material. And, third, you would not find the indexing of sufficient depth to properly tag the protoplasm paper. If, for a particular user, the application to amoebic protoplasm is completely uninteresting, a glance at the journal title, let alone the article title, will usually be sufficient for culling. Note, on the other hand, that the user of the SCI is presented with unusual opportunities for discovery by serendipity, as well as by propinquity (10).

On the other hand, who would start with the paper by Einstein if he were looking for papers whose "main themes" concerned amoeba protoplasm? On the contrary, the starting points for such a search would be other references reflecting the "subject" of amoeba protoplasm. In that case, you would not only retrieve papers whose "main themes" were related to amoeba protoplasm, but you might also retrieve papers whose "main theme" was not amoeba protoplasm but which most assuredly would be "relevant." Of course, what may be relevant may or may not be pertinent. Pertinence can only be assigned retrospectively by a specific user with a particular search concept in mind. Cleverdon's confusion stems from an improper and irrelevant comparison between the "main theme" and the individual papers cited, rather than between the subject matter of the starting point in the SCI search and the portion of the retrieved document which does indeed concern that subject; otherwise it would not have been cited in the first place.

Conventional "main theme" subject indexing cannot justify the expense of the indexing in depth achieved in citation indexing because conventional indexing requires a high degree of subject matter knowledge which is prohibitively expensive. Furthermore, the complex concepts which are so simply and consistently "codified" by citations can be extremely difficult to describe by conventional indexing terms. This effort in the SCI is performed by the author, and it is often impossible for an indexer to duplicate his indexing, as the conceptual connections between two papers may be quite difficult for anyone but an expert to deduce (11).

One can usefully discuss degrees of similarity between papers. Citation indexing is probably the best measure of similarity we have at present. Bibliographic coupling, which is derived from citation indexing, is reasonably successful in establishing a measure of similarity between papers. Indeed, if the bibliographies appearing in two papers are exactly alike, it is quite likely that the two papers are the same paper published in different journals.
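Bibliographic coupling as a similarity measure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the use of a plain count of shared references as the coupling strength, and the sample bibliographies are hypothetical, and the code does not describe the ASCA system. The second function ranks papers by how many references their bibliographies share with a question-profile.

```python
def coupling_strength(refs_a, refs_b):
    """Number of references that two bibliographies have in common."""
    return len(set(refs_a) & set(refs_b))


def rank_by_coupling(profile_refs, bibliographies):
    """Return the papers sharing at least one reference with the profile,
    strongest coupling first (ties broken alphabetically)."""
    scores = {
        paper: coupling_strength(profile_refs, refs)
        for paper, refs in bibliographies.items()
    }
    coupled = [paper for paper, score in scores.items() if score > 0]
    return sorted(coupled, key=lambda paper: (-scores[paper], paper))


# Hypothetical bibliographies keyed by paper label.
bibliographies = {
    "P1": ["r1", "r2", "r3"],
    "P2": ["r2", "r3", "r4"],
    "P3": ["r9"],
}
profile = ["r2", "r3", "r4"]
print(rank_by_coupling(profile, bibliographies))
```

A paper whose bibliography matches the profile exactly, as P2 does here, receives the highest possible strength, which corresponds to the limiting case of identical bibliographies mentioned above.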

References

1. Garfield, E. "Science Citation Index - A new dimension in indexing." Science, vol. 144 no. 3619, 1964, pp. 649-654.
2. Garfield, E. "Citation indexing. A natural science literature retrieval system for the social sciences." The American Behavioral Scientist, vol. 7 no. 10, 1964, pp. 58-61.
3. Garfield, E. "Reply to Randall review of the SCI." Sci-Tech News, vol. 18 no. 4, 1965, pp. 133, 142.
4. Cleverdon, C. W. "Science Citation Index (Review)." Nature, vol. 203 no. 4944, 1964, p. 446. See also Rev. Int. Doc., vol. 31 no. 4, 1964, p. 161.
5. Bradford, S. C. Complete documentation, in: Report of the Royal Society Empire Scientific Conference. London: the Society, 1946, p. 729.
6. Cole, P. F. "A new look at reference scattering." Journal of Documentation, vol. 18 no. 2, 1962, pp. 58-64.
7. Garfield, E. and G. Foeman. Statistical analyses of international chemical research by individual chemists, languages and countries. Paper presented at 148th Meeting of the American Chemical Society, Division of Chemical Literature; Chicago; August 30-September 4, 1964. Abstracted in "Abstracts of Papers" for that meeting, p. 12G.
8. Keenan, S. and P. Atherton. The journal literature of physics. American Institute of Physics Report AIP/DRP PA 1 (1964).
9. Garfield, E. and I. H. Sher. Article-by-article coverage of selected abstracting services, Final report to the National Science Foundation under Contract NSF C-332 (Philadelphia, Institute for Scientific Information, 1964).
10. Stonehill, H. I. Science Citation Index. "Information retrieval by propinquity (Review)." Chemistry and Industry, vol. 10, 1965, pp. 416-417.
11. Garfield, E. Can Citation Indexing be automated? Paper presented at National Bureau of Standards-American Documentation Institute Symposium on Statistical Association Methods for Mechanized Documentation; Washington, D.C., March 17-19, 1964. (Proceedings of the Symposium - In Press).