Where is the Information Explosion Taking Us?
Presentation at University of Houston, School of Library Science,
Houston, TX., March 4, 1974
As one who writes a weekly editorial essay and numerous other articles each year, I find that speaking to a technical audience presents a terrible dilemma. How can I be original without being redundant? After all, many of you read what I write.
Last year I adopted a technique for solving this problem. A prepared speaker is presumably someone with a prepared speech. As such he has an advantage over his audience because he knows what he is going to say while the audience does not. But it has become fashionable to question the vitality, if not the originality, of prepared speeches. So instead of reading a prepared speech I could read abstracts of six different lectures that I could give. You, the audience, could decide which was most intriguing. Some of the topics I might cover would be "How to Control or Expand the Information Implosion"; "How to Forecast Nobel Prize Winners"; "The Growth of the Synthetic Chemical Literature"; "Citation Analysis as a Tool in Evaluating Journals"; "Historiographs, Librarianship and the History of Science"; "Citation Networks in Science" or "Where the Action Is, Was, and Will Be"; and finally "The Future of the Scientific Journal?"
To this list might be added some of the following topics that might be of interest to this audience. "What is the Value of Scientific Information Services?" In other words, is it cheaper to duplicate research than it is to do a literature search? Your "Director" of Scientific and Technical Communication (Librarian?) must grapple with this problem in his dealings with top management.
He is not alone. The number of information systems managers has been increasing exponentially for a decade. Somebody out there must think that our profession is doing something right.
This reminds me of an editorial I published almost ten years ago in Current Contents entitled "Who are the Future Information Scientists." Not only is information science growing as a profession but each and every scientist is slowly and imperceptibly becoming more and more preoccupied with the fundamental questions which have plagued librarians, documentalists, and classificationists for decades. The correlation and retrieval of scientific data has been part of the scientific method for a long time. But only recently have scientists begun to realize their common ground with such information scientists as Melville Dewey, that brilliant egomaniac who invented the Dewey Decimal Classification System.
One of the many other topics I could cover is the growing number of machine-oriented information dissemination centers. There are so many that we need a trade association called ASIDIC. In addition to computer dissemination, the subject of on-line retrieval of information is a very hot one these days. It was the subject matter of a speech I presented recently at Dartmouth. On-line retrieval of information is already big business. Whether it survives without government subsidies remains to be seen. I suspect it will, and ISI’s databases will contribute to its success.
Scientists today are concerned about the gap between what is actually known and what they know or have access to. We live in an age when almost any statement is obsolete before it is printed or spoken. Therefore, we could also profitably discuss such topics as "Can the Scientist Keep Up Today" and "How Does He or She Absorb All That Is Available to Him or Her?" Will it help you if I say that this question was being asked in the 17th century? It has been repeated every fifty years or so since, and there are documented accounts of this.
Someone has expressed interest in the question "Will the Scientific Information Growth Curve Slacken?" Last Spring, Prof. George Anderla, of the Sorbonne, predicted a 14% per annum increase at least until 1985. While I might dispute his definitions, even a 7% per annum growth will have profound effects on the quality of knowledge.
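To give a feel for what these two rates imply, here is a minimal sketch that converts an annual growth rate into a doubling time. It assumes constant compound growth, which is a simplification made for this illustration and not part of the forecast quoted above; the function name is hypothetical.

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

# Compare the 14% forecast with the more conservative 7% figure.
for rate in (0.14, 0.07):
    years = doubling_time_years(rate)
    print(f"{rate:.0%} per annum -> doubles in about {years:.1f} years")
```

Under that assumption the literature doubles in a little over five years at 14% and in a little over ten years at 7%.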
The most satisfactory solution to the feeling of frustration (as distinct from solving the problem itself) is one that was given to me by Henry Evelyn Bliss when he was 84 years old. He was the inventor of the Bliss Classification System for Science. I told him of my frustration in not being able to cope with my interests in so many fields. He had read the basic textbooks in over 100 subject disciplines during his forty years as the librarian of City College of New York only to find that much of what he had learned was hopelessly obsolete. He concluded that the only solution was to perceive the existing and changing relations between these disciplines. That is why he became a classificationist, as distinct from a classifier or cataloger. Are you, in your own way, a scientist-cataloger rather than a scientific classificationist or classifier? If you do feel this frustration then you are probably a classificationist. The information explosion does not perturb catalogers.
I could tell you how other organizations are handling their scientific and technical programs. However, I suspect your own experts here could do a better job than I. I no longer can afford the luxury of studying individual company systems. I leave that to the information scientists at ISI, who deal with these organizations daily. In fact, a very substantial part of ISI’s income is derived from its business with technical information centers in large industrial and academic institutions. The list of organizations that use ISI's services is very large. The list of magnetic tape customers is quite respectable. Indeed, consider that such organizations as Imperial Chemical Industries, Unilever, Eli Lilly, G.D. Searle, Dow, and others do so. Consider further that in other countries like Canada, Israel, Spain, etc., governmental organizations use our tapes to satisfy their industrial and research needs for current information.
Some of you are vitally concerned about the copyright problem. The Information Industry Association is now awaiting the decision of Williams and Wilkins to appeal the recent decision by the Appeals Court. Considerable testimony has been given before Congressional committees on the problem of copyright. Millions of words continue to appear on this subject. The recent decision of the Soviet Union to sign the Universal Copyright Convention has heightened the interest. Consider that the USSR is using the identical arguments for avoiding copyright payments that are used by the educational lobby in the United States. They too regard non-profit enterprises as sacrosanct. That they have no for-profit enterprises at all is not mentioned. So who will pay the royalty except the central government itself? They had the gall to offer us a royalty of 1/5th of one percent on our Index Chemicus, which they admit they photocopy. ISI said it would boycott them unless they come up with a fair and reasonable offer. They are now acting more reasonably.
Another popular question is "Do we need more scientific journals?" As long as science is in a dynamic state, new journals will inevitably emerge. Some will be disguised as new sections of old but dynamic journals. Other old and bureaucracy-ridden journals will force new ones to emerge. If the Chemical Society of London had responded more readily to the needs of organic chemists, the journal Tetrahedron might never have appeared. But as you see, Tetrahedron and the organic section of the Journal of the Chemical Society both thrive now.
The ultimate demise of the scientific journal is forecast each year, but it is still prospering. New journals which respond to changing needs have a good effect on established journals with respect to timing, editorial content, and refereeing methods. Competition is a good thing in all walks of life.
There have been some absurd claims that journals would be replaced by audiocassettes. Perhaps 3M, as a leading tape manufacturer, smacks its chops at this prospect. Although I enjoy Bob Newhart records, I find it hard to listen to a speech on cassettes. To imagine my listening to a taped scientific paper, which I can probably read in 1/5th the time, with double the comprehension, is incredible. Undoubtedly, there is a great potential for audiocassettes in other applications. But the problem of a clinical practitioner with 15 spare minutes between the golf course and the hospital is quite different from that of the harried research scientist. He will never have enough time to read all that he can. Audio input is not the solution and neither is a speed-reading course. In spite of the complaints about not enough reading time, scientists perennially complain about poor access to journals.
If anyone tells you that the reprint is dead, consider that last year well over 10 million of them were exchanged as a consequence of CC alone. At a recent ACS meeting in Chicago, Harrison Shull, Vice President of Indiana University, pointed out that reprint exchange is an established way of life. In fact, the reprint has made it possible for publishers to recover investments through reprint sales in the face of photocopying and lower subscriptions.
I am sure that most scientists here have access to most of the articles they need. But if you receive 2,000 journals in your library, consider that ISI covers three times that many. Even if only 10% of your needs were from these other 3,000 journals, providing copies can present a serious logistics problem. If one wishes to avoid infringement of copyright then it seems to me that the alternative ISI provides its customers is important. Unlike most libraries, which oppose copyright protection, we have worked out copyright clearances with most of the leading publishers. We regularly pay them royalties on every tear sheet we sell.
Well, this lengthy introduction has provided me an excuse for not getting down to cases. That is not my nature. I generally find it hard to speak in generalities. Before getting down to cases there is an important message I can deliver. I have been preaching it for 20 years. I am not a Doomsday philosopher. I believe the so-called information explosion is a vastly exaggerated phenomenon. No human was ever able to cope with all of the human knowledge available even before the Gutenberg press. The mere growth of per capita consumption of printed paper confuses the issue. After all, that could be replaced by microforms of one kind or another. The Doomsday philosopher tells you that it would require a million years to read everything in the Library of Congress. So what? It took more than a lifetime to read everything in the Alexandrian library.
The more fundamental question is: what is the rate of growth of significant knowledge and wisdom? Philosophers and theologians will argue that wisdom has decreased. But wisdom and knowledge are entirely different realms of human comprehension. I am inclined to agree that our wisdom has changed little in spite of increased fundamental knowledge. What about the question of significance, e.g., in the world’s population of scientific journals? This has been variously and absurdly reported to be 25,000 to 100,000 in number. On this ISI’s Journal Citation Reports provide some remarkable insights. An incredibly small number of journals can properly be classified as significant. At most these number from 200 to 500 journals, which I have used in helping to design core science libraries in developing countries. And, of these, less than 10% contain most of the articles which very detailed analyses prove to be truly significant. Now how do I define significant? Consider that the average paper is cited about once a year. Consider further that only one out of 25,000 published papers is cited ten or more times within one year of its appearance. There are about one million papers published per year, about half of which ISI covers in its various tapes and published citation indexes. This means that about 500 papers per year provide the impetus for the remainder of new scientific knowledge. Let us even stretch this to 1,000, a number that corresponds to the elite membership of the National Academies of Sciences throughout the world. If I could provide them to you, would it be so hard for you to read ten to twenty significant papers per week? We know that most scientists spend much more time than that reading. And surely it is not difficult to imagine reading condensations of these papers.
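The weekly reading load follows directly from the yearly figures. As a small sketch, assuming the papers are spread evenly over a 52-week year (an assumption of this illustration, with a hypothetical function name):

```python
WEEKS_PER_YEAR = 52  # assumption: significant papers arrive evenly through the year

def papers_per_week(papers_per_year: int) -> float:
    """Average number of papers to read each week for a given yearly total."""
    return papers_per_year / WEEKS_PER_YEAR

# The two yearly totals mentioned in the talk: 500 and the stretched figure of 1,000.
for yearly in (500, 1000):
    print(f"{yearly} papers per year is about {papers_per_week(yearly):.1f} per week")
```

That gives roughly ten papers a week for 500 a year and roughly twenty a week for 1,000.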
Well, in the near future, ISI intends to do something about this. We already provide a Press Digest in Current Contents that covers an equal number of digests. We intend to augment these digests so that scientists can be aware of the terse conclusions reported in these periodic breakthrough papers.
To illustrate some of the recently published significant papers, the first slide shows the 25 most frequently cited 1972 papers. In the short time available to us it is difficult for me to discuss these in depth but you may want to pursue this in the question period. I do not wish to imply that these are the most important papers. I do assert that they rank very high amongst the top 500 to 1,000 that achieve such citation distinction. And yes, this number will allow for the considerable variation expected in small fields like mathematics.
The topic of significant papers is closely related to one of the topics I mentioned earlier as a candidate for our consideration. Scientists are always interested in the techniques I proposed for forecasting Nobel Prize winners. I published on this three years ago in Nature. Since that time about five of these men have gone on to win the Nobel Prize. Some of the other names I listed may not deserve one but they surely deserve consideration. My reason for publishing this list was to show that we could algorithmically generate a list of candidates that was just as respectable as any that would be determined by large-scale award committees.
None of this would have been possible without the Science Citation Index or the tapes that contain the millions of citations we process each year. But you did not invite me here merely to tell you how these tapes might be used to evaluate various research programs or to forecast where the action will be in scientific research, though that is certainly possible. I’m sure that someone here wants me to comment on the relative merits of different machine databases that are available. I am not going to give you a critique of Chemical Abstracts. They are doing a good job insofar as their job is defined. Indeed, a very large number of our tape customers use CA tapes as well. Most people who can afford one can afford the other because the cost of tapes is small in comparison to the other costs involved.
Even if the acknowledged superiority of ISI's tapes in timing, accuracy, and comprehensiveness did not matter, I would ask you to consider their use, even for the limited area of chemistry and its applications, because ISI tapes are unique in their indexing method. That method is called citation indexing and it has made possible retrieval and dissemination of kinds of information that are simply beyond the capabilities of CA to provide. If this were not so, why is it that over 75% of SCI subscribers have CA?
In the next slide you will see a profile which we use in the ASCA system to provide readers with a weekly report. The subject of this profile is Adhesive Science and Technology. I am not so naive as to imply that an organization such as 3M, with such a fundamental interest in this subject, can be adequately served by this service, which costs $95 per year. However, identifying papers which are related to adhesive science simply on the grounds that they refer to one or more specific books or articles or journals or authors that we have specified in this profile produces some rather interesting connections, if I may use this word to cover what has sometimes been called "systematic serendipity". An illustration of this point is the article by Panzer of Esso on "Components of Solid Surface Free-Energy from Wetting Measurements", which you will note does not contain in its title any of the obvious key words that one would ordinarily associate with the subject matter involved. Another example is the article by Floyd of Glidden on "Emulsion Polymers in Coatings."
Perhaps a good illustration of the difference between natural language and the more precise citation language is a recent paper from the J. Med. Genetics 10, 1962-6 (1973) concerning "Mobius Syndrome with Poland’s Anomaly." Some of you may recall the German mathematician and astronomer. However, this article does not concern mathematics, nor does it concern Poland, that country which has produced so many great mathematicians. Poland’s anomaly is not its relations to the USSR but rather a congenital condition first reported by a British physician in Guy’s Hospital Reports 6, 191-3 (1841). In 1888 a German physician, P. J. Mobius, reported in the Munich Medical Weekly on congenital bilateral facial paralysis, a rare disorder which is discussed every decade or so. The use of eponyms is quite common in medicine and science because most of us want to be immortalized in one way or another. This example well illustrates that a citation can be an extension of author indexing. But is this any different from the ability to cluster papers involving a particular chemical reaction or phenomenon described by such an eponym? The self-organizing characteristic of citation indexing is even more dramatic in certain fast moving fields where a standardized nomenclature or jargon may not exist. Consider that over 100 papers per year are appearing on such subjects as Poly-A or paramagnetic shift in NMR spectra. Yet, no subject heading list or dictionary contains these terms! Finally, consider the specific case of the Eschenmoser Hydrolysis.
This reaction was originally reported in the Swiss journal Helv. Chim. Acta, in a paper published in 1960 concerning "hydrolysis of hindered esters by lithium halides in pyridine." Synthetic chemists who are interested in following the study and applications of this method can do so simply by specifying the citation for the original paper: Helv Chim Acta 43, 113 (1960). As with the examples I have cited previously, titles of the relevant papers on this subject are of little help in selecting pertinent material, and yet we have customers who have used this simple question for almost ten years to follow the continuing literature on it. I think that any physical chemists or organic chemists interested in reaction mechanisms can appreciate the near impossibility of doing this kind of literature search by traditional methods.
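The idea of a citation profile can be sketched in a few lines. This is only an illustration of the principle, not a description of how the ASCA system is actually implemented: the `Paper` record, the `citing_papers` function, and the three sample papers with their placeholder titles and references are all hypothetical. Only the target citation is taken from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    paper_id: str
    title: str
    references: frozenset  # cited works, written as citation strings

def citing_papers(papers, profile_citations):
    """Return the papers whose reference lists contain any citation in the profile."""
    wanted = frozenset(profile_citations)
    return [p for p in papers if p.references & wanted]

TARGET = "Helv Chim Acta 43, 113 (1960)"

# Hypothetical current-literature records; titles deliberately carry no useful key words.
papers = [
    Paper("P1", "Placeholder title one", frozenset({TARGET, "Hypothetical Ref X"})),
    Paper("P2", "Placeholder title two", frozenset({"Hypothetical Ref Y"})),
    Paper("P3", "Placeholder title three", frozenset({TARGET})),
]

hits = citing_papers(papers, [TARGET])
print([p.paper_id for p in hits])  # prints ['P1', 'P3']
```

The match depends only on the reference lists, so a relevant paper is found even when its title gives no hint of the subject.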
Let me illustrate one of the diverse games you could play if ISI tapes were in your already sophisticated reservoir of IR systems. Just as ISI can now monitor the ebb and flow of science and technology fields in general, anyone else with prescribed interests can do the same in areas of special interest. The key is in knowing what to prescribe. In this next slide is the list of 25 1971 papers most cited in 1972. It is an incredible record of where the action is and will be. Observe that any 1971 paper which achieved the distinction of being cited 13 or more times on this list went on to be cited from 2 to 5 times as often in 1972. There is a strong chance that most of these will be heavily cited again in 1973. Whether they peak in 1973 remains to be seen. But the game of forecasting is in terms of probability, not certainty. The record is such that any paper cited 50 or more times in a single year is apt to be cited very frequently for many years to come. In some fields like molecular biology it may be replaced or augmented by a related paper which we can easily identify by co-citation analysis and clustering methods applied by our research people, especially Dr. Henry Small. These methods by Henry Small were recently described in the Journal of the American Society for Information Science. This article was reprinted in Current Contents No. 7, 5-10, February 13, 1974. Those of you who are familiar with bibliographic coupling will want to be sure to note the distinction between that conception and co-citation. In bibliographic coupling one clusters documents by comparing the references listed in two or more papers. In co-citation one examines the entries in a Citation Index and compares the list of citing papers.
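The distinction between the two measures can be made concrete with a toy example. The sketch below is a simplified illustration, not the clustering procedure used at ISI: the function names and the three citing papers A, B, C with cited works X, Y, Z are hypothetical.

```python
def bibliographic_coupling(refs, a, b):
    """Coupling strength of citing papers a and b: number of references they share."""
    return len(refs[a] & refs[b])

def citation_index(refs):
    """Invert reference lists: map each cited work to the set of papers citing it."""
    index = {}
    for citing, cited_set in refs.items():
        for cited in cited_set:
            index.setdefault(cited, set()).add(citing)
    return index

def co_citation(index, x, y):
    """Co-citation strength of cited works x and y: number of papers citing both."""
    return len(index.get(x, set()) & index.get(y, set()))

# Hypothetical reference lists: citing paper -> works it cites.
refs = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"Y", "Z"},
}
index = citation_index(refs)

print(bibliographic_coupling(refs, "A", "B"))  # 2: A and B share X and Y
print(co_citation(index, "X", "Y"))            # 2: X and Y are cited together by A and B
print(co_citation(index, "X", "Z"))            # 1: only B cites both X and Z
```

Coupling compares reference lists and is fixed once the two papers are published, whereas co-citation compares lists of citing papers and can keep changing as new papers appear.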
This last slide shows articles which were cited 10 or more times in any year from 1961 to 1972. For a complete list of the top 100 please refer to my editorial in Current Contents No. 2 for January 9, 1974 and my editorial in Current Contents No. 6 for February 6, 1974.