Information Retrieval: Where the Action Was, Is, and Will Be
Eugene Garfield, Ph.D.
E-mail: garfield@codex.cis.upenn.edu
Website: http://eugenegarfield.org
Presented at the Canadian Library Association Meeting
Winnipeg, Canada
June 22, 1974
The film we have just seen presents a humorous but withal rather accurate description of where the action was in scientific communication and information retrieval. From the point of view of the library community, that can best be described as the provision of bibliographic information: article and title retrieval for the patron. For most of the history of our profession, that bibliographic information has been provided by bibliographic tools. In the comparatively recent past, it has also been provided by the computer, specifically the batch-run computer, serving only one master at a time.
The film also introduces us to where the action is, especially as regards ISI. Obviously, the ISI story is not the complete story for the field of information retrieval. But since I first entered the field of library science in 1951, we have essentially come through the era of mechanization and computerization. During the 1950’s punched cards were the rage. In the 1960’s the computer came on strong. Microfilm was always in the background, coming on slowly but growing stronger.
We are now at the beginning of the on-line era in which the computer can in effect serve many masters simultaneously. This distinction between the computer era to date, the batch-run era, and the on-line era is a very important one. Selective dissemination of information, a typical new product of the batch-run computer era which had its commercial start with ISI ten years ago, has not in my opinion been adequately used by the scholarly community. It has been highly successful in Canada, but perhaps its greatest potential is still to be reached as we also begin to use personalized computer terminals, the symbol of the on-line era.
I emphasize that we are only at the beginning of the on-line era. It is perhaps too easy for some people to look at the accomplishments to date as a culmination rather than as a beginning. For over a decade the goal of on-line information retrieval has retreated down the road before us like a water mirage. A few projects here and there, like MIT's Project MAC or the ARPA network, were in existence, but they were basically experimental, and their direct impact was very limited. The knowledge and, for the most part, the computer capability necessary for on-line systems were there, but it is only in the last two years that reduced data communications costs made the IR application feasible. Economics is always the primary determinant lurking in the background. Perhaps we have been awaiting the day of on-line information retrieval so long that we have not adequately addressed the question of where the action will be now that it is finally here. It is for this reason that I have brought for you a paper I read at the International Congress of Medical Librarianship in Amsterdam about five years ago.
In a very real sense, on-line information retrieval, even as it exists now, is the culmination of much of the thought and much of the work of the last quarter century. From a very broad point of view I have spent my professional life in building the "World Brain". In two papers I specifically addressed this question. One I gave in Syracuse in 1968, entitled "World Brain or Memex"(1). The other I recently presented at the AAAS Meeting in San Francisco, entitled "The World Brain as Seen by an Information Entrepreneur"(2). Neither H. G. Wells nor Vannevar Bush had the on-line concept specifically in mind. But conceptually this is what they were talking about. It is of little consequence to the user whether the brain he is addressing or using is stored on film, cards, tape, or paper.
I don’t want to sound like a Cassandra but the development of on-line capabilities will not be entirely peaches and cream for the library community. Undoubtedly the younger generation of library students responds very positively to the challenge of Boolean algebra and search logic. There is a fascination in working at the computer terminal. I saw this recently when we demonstrated the SCISEARCH system at the Medical Library Association Meeting in San Antonio.
But here is my admonition. The exponential growth in the use of on-line searching will eventually require the use of personal terminals. Many scientists already have their own terminals for computational or other needs. While a single terminal may suffice now in a typical medical library using MEDLINE, this will be hopelessly inadequate later on. Students will demand more, as will patrons in large public libraries that eventually succumb to the host of on-line services that will become available. Now you see why I made the distinction between the computer era to date, the batch-run era, in which the computer was the direct tool of the librarian or information scientist, and the on-line era, in which the computer serves the ultimate user directly. The scholar’s desire for a personal terminal is not much different from his desire for a personal subscription to Current Contents or ASCA.
What will the role of the librarian be in the era when the personalized computer terminal is available? Presumably you won’t have to catalog books anymore because you will have access to centralized cataloging. But while traditional library functions may wane, the importance of information will grow. All sorts of professionals will eventually come to grips with their growing information needs and seek out information assistants. In medicine they are called paramedical personnel. This is already a reality in many industrial organizations and in a small number of academic institutions. The University of Tennessee in particular trained an important group of specialized librarians who perform such functions in Memphis and elsewhere.
Another aspect of the on-line era concerns the on-line display and manipulation of bibliographic data. I don’t have the time here to describe the concept of citation indexing and citation networks to those of you who are not familiar with it. There is a good deal of information in the article I distributed, and the Science Citation Index and Social Sciences Citation Index are well-known examples of the concept. There is a display here at the convention, and ISI will be glad to answer any requests for information.
The SCI is already on-line, as some of you are aware. The system is called SCISEARCH. Even though immensely powerful because of its ability to retrieve by citations, SCISEARCH is still only a more sophisticated member of the many systems available such as MEDLINE, Chemicon, etc. The forthcoming availability of the SSCI file on-line merely adds scope to the system and awakens the interest of librarians who have not been especially interested in science and technology.
On pages 196 and 198 of the reprint you received there are good illustrations of what I envision in the display of bibliographic citation networks. When conducting a literature search, the user, in addition to receiving a traditional bibliography, will also have displayed before him a picture or graph of the chronological and citation relationships among the items retrieved.
Citations display all the properties of word indexing terms because citations are, in fact, alternative and usually unambiguous symbols for concepts traditionally codified by subject headings. Citations can symbolize "simple" one-word concepts, like eugenics or euphenics, as well as "complex" terms, like conduction through thin-films. Furthermore, citations frequently overcome the syntactic difficulties involved in traditional and even modern Boolean systems, which often can distinguish neither between "dog bites man" and vice versa nor among the multiple meanings of homonyms such as plasma, aging, stress, etc.
But citations provide more than alternative descriptors for terminology. The basic source-reference pair which is retained in our data base makes it possible to display relationships between citing and cited documents. Thus, on-line manipulation of citation data provides an exciting added dimension to searching. The graphic displays can highlight relationships among papers in a field that are not apparent otherwise, even in a matrix representation. These displays give the user powerful clues as to the key and influential papers in a field. Thus they help the searcher know where to begin and what references to look at first. Thus, the process enters a domain rarely discussed in computer library applications: subjective selection of reading materials. After all, the average reader does not want to be told there are 200 papers and books on a given subject. He usually wants to know the important ones first, if that is not already known.
As on-line systems become more prevalent the impact of research on artificial intelligence will be felt. These involve basic linguistic techniques too detailed to mention here. But the whole field of automatic question-answer systems is ripe for basic research. The field of artificial intelligence is quite controversial, as some of you may have observed recently in the New York Review of Books.
As another technical detail we can expect greater use of color in VDU’s. Thus, in a citation network the closely linked members of the network could be shown in red while the less frequently cited or associated could be shown in yellow or blue. Color terminals have been commercially available for some time.
As terminal technology improves the typographical limitations of VDU’s will disappear. The library community has justifiably complained about the limited visual differentiation of computer generated displays. But this used to be true in the Science Citation Index and Index Medicus. The lack of different type fonts and sizes, and of heavy and light print, used to be a problem. Now computerized typesetting is routine. The situation will certainly change radically for VDU’s. Computer displayed information will be easier to scan when we can color code the various elements in a citation and use the varieties of typographical styles that we worked centuries to achieve.
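As a rough illustration of how retained source-reference pairs support this kind of display, the following Python sketch inverts a handful of pairs into a citation index, ranks papers by how often they are cited, and orders the links chronologically. It is only a sketch under stated assumptions: the paper identifiers, years, and function names are invented for the example and do not describe ISI's actual data base or SCISEARCH.

```python
from collections import defaultdict

# Hypothetical source-reference pairs: (citing paper, cited paper).
# Identifiers and years are invented for illustration only.
PAIRS = [
    ("C-1970", "A-1953"),
    ("C-1970", "B-1961"),
    ("B-1961", "A-1953"),
    ("D-1972", "A-1953"),
    ("D-1972", "C-1970"),
]
YEARS = {"A-1953": 1953, "B-1961": 1961, "C-1970": 1970, "D-1972": 1972}


def build_citation_index(pairs):
    """Invert source-reference pairs into a cited -> [citing, ...] index."""
    cited_by = defaultdict(list)
    for citing, cited in pairs:
        cited_by[cited].append(citing)
    return dict(cited_by)


def rank_by_citations(pairs, years):
    """Order papers by times cited (most first), breaking ties by earlier year."""
    index = build_citation_index(pairs)
    return sorted(years, key=lambda p: (-len(index.get(p, [])), years[p]))


def edges_chronological(pairs, years):
    """Sort citation links by the year of the cited paper, then the citing paper."""
    return sorted(pairs, key=lambda e: (years[e[1]], years[e[0]]))


if __name__ == "__main__":
    print(rank_by_citations(PAIRS, YEARS))
    for citing, cited in edges_chronological(PAIRS, YEARS):
        print(f"{cited} <- {citing}")
```

The ranking is a crude stand-in for "the important ones first"; a graphical display would draw the chronologically sorted links as a network rather than print them.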
At ISI we have a deep and abiding interest in historiography and in particular the history of science. In 1964 we first indicated how such graphical displays could be used in historiography(3). The network on page 196 is a graphical representation of the history of research on D.N.A. It was constructed by a non-subject specialist. For those of you interested in the history and development of scientific research and in the sociology of information flow, I would like to add that it is ISI’s hope that in the not too distant future, we can complete the citation record for the 20th century. This would be done in steps, probably 10-year increments from 1960 to 1900. And if that is well received, I would like to cover the 19th century as well. While I am forecasting I might also mention our hopes to process multi-authored books in the same way we have journals. Every professional librarian knows this is a major bibliographic gap.
From history of science it is really only one step to forecasting. The past is prologue.
One area of forecasting for libraries involves the use of our Journal Citation Reports. Next year, ISI will be including the Journal Citation Reports as a part of the Science Citation Index. The JCR, for those of you who may not be familiar with it, is an extensively cross-tabulated report of the frequency with which journals are cited. This is a tool that librarians can and should be using right now to evaluate their collections and to stay abreast of the scientific literature; in short, to keep your finger on where the action is in science. Perhaps we can discuss this during the various discussion periods. We might also discuss ISI’s aspirations to complete the bibliographical citation record for the entire 20th century.
Thus, the field of information retrieval is already merging and overlapping with the fields of sociology and history of science. ISI's database is already enabling us to more accurately perceive and forecast where the action is and will be in research in general. Through graphical and other techniques for presentation of citation relationships, scholars and library administrators will be able to see new areas of research growth, or stagnation in other areas. These perceptions can be used to make intelligent book and periodical purchasing decisions. They can also, of course, be used as input for national and international research policy decisions. Similarly, citation analysis can reveal how new research draws upon hitherto discrete branches of knowledge. Such information perceived in a timely fashion could be the basis for reorganizing library collections to promote interdisciplinary research. Thus, the previously invisible colleges will become quite transparent to the outside observer.
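To make the idea of a cross-tabulated journal citation count concrete, here is a minimal Python sketch. It assumes one (citing journal, cited journal) pair per reference; the journal names and function names are hypothetical, and the sketch does not reproduce how the Journal Citation Reports are actually compiled.

```python
from collections import Counter

# Hypothetical (citing journal, cited journal) pairs, one per reference.
REFERENCES = [
    ("Journal A", "Journal B"),
    ("Journal A", "Journal B"),
    ("Journal A", "Journal C"),
    ("Journal B", "Journal C"),
    ("Journal C", "Journal B"),
]


def times_cited(references):
    """Count how often each journal appears as the cited journal."""
    return Counter(cited for _, cited in references)


def cross_tabulate(references):
    """For each cited journal, count citations broken down by citing journal."""
    table = {}
    for citing, cited in references:
        table.setdefault(cited, Counter())[citing] += 1
    return table


if __name__ == "__main__":
    for journal, count in times_cited(REFERENCES).most_common():
        print(journal, count, dict(cross_tabulate(REFERENCES)[journal]))
```

A librarian evaluating a collection could read the first count as a rough indicator of how heavily a journal is used by the literature, and the breakdown as a hint of which fields are using it.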
At ISI we have a particular interest in chemical information systems. I don’t think this audience is interested in the details of such specialized interests although many of you are familiar with the work of Chemical Abstracts. ISI has been in the forefront of chemical information research. But it is interesting to point out the similarities in the problems faced in computer display of citation and chemical information.
Chemists, particularly organic chemists, tend to think in pictures, that is, in pictorial representations of chemical structures. Chemists, and other life scientists, are frequently interested in being able to retrieve in response to particular sub-structural elements. Traditional chemical nomenclature was not able to adequately cope with the structural relationships between chemicals. There are various methods for the representation of structures and for their manipulation. ISI has been a proponent of what has become the most widely adopted of such methods, the Wiswesser Line Notation, and has been a leader in the effort to develop interconversion techniques between different representations. For example, we can translate WLN into several different connectivity tables or fragment codes. Just as bibliographic information will be expanded and citation networks displayed on a screen, chemical formulas that have been retrieved from on-line computer memories will be manipulated so that the structures that the notations represent will themselves be displayed for the searcher.
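The relationship between a connectivity table and fragment codes can be sketched in a few lines of Python. This is an assumption-laden toy: the table layout, the bonded-pair fragment scheme, and the function names are invented for illustration, and it does not parse Wiswesser Line Notation or reflect ISI's conversion programs. The screen it performs is only a necessary condition for a substructure match, not a full structure comparison.

```python
# Hypothetical connection tables (hydrogens omitted): atoms are element
# symbols; bonds are (atom index, atom index, bond order).
ETHANOL = {"atoms": ["C", "C", "O"], "bonds": [(0, 1, 1), (1, 2, 1)]}
ACETIC_ACID = {
    "atoms": ["C", "C", "O", "O"],
    "bonds": [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
}
CARBONYL = {"atoms": ["C", "O"], "bonds": [(0, 1, 2)]}

BOND_SYMBOLS = "-=#"  # single, double, triple


def fragment_codes(molecule):
    """Derive a set of simple bonded-pair fragment codes, e.g. 'C=O'."""
    atoms = molecule["atoms"]
    codes = set()
    for i, j, order in molecule["bonds"]:
        a, b = sorted((atoms[i], atoms[j]))
        codes.add(f"{a}{BOND_SYMBOLS[order - 1]}{b}")
    return codes


def may_contain(molecule, query):
    """Screen: True if every fragment of the query occurs in the molecule."""
    return fragment_codes(query) <= fragment_codes(molecule)


if __name__ == "__main__":
    print(sorted(fragment_codes(ACETIC_ACID)))
    print(may_contain(ACETIC_ACID, CARBONYL), may_contain(ETHANOL, CARBONYL))
```

Compounds that pass such a screen would still need an atom-by-atom comparison of the connection tables before being reported as containing the substructure.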
It is not too visionary to describe systems in which the searcher will be able to draw structures on a screen or tablet and be able to query the database in response to the structural elements he has drawn. In fact, there are already very experimental systems in which a researcher can draw a target molecule on a CRT, and the computer then suggests appropriate repetitive pathways to achieve the synthesis of the molecule desired(4).
The field of chemical information handling leads naturally into another area where the action will increase: retrieving data rather than document references. One can forecast systems in which, for example, when the chemist inputs a substructure, not only will he get references to articles describing compounds containing that substructure, but he could be informed as to whether compounds that contain that substructure have been tested for mutagenicity and carcinogenicity, and if so, what the results were. This sort of data retrieval will, of course, not be limited to chemists. Census data for the social scientist is another obvious candidate. This is all obviously "information retrieval" even if it is not traditional "literature". We in the library and information science community must be prepared for these developments. If not, others will step into the vacuum and our field will pass us by. In short, the increased access to information will stimulate more sophisticated desires on the part of users. Also, the heightened social responsibility of professional scientists and scholars will increase demands for information service(5).
It is, of course, not novel to talk of libraries and computers in the same breath. Many of us have done that for some time. But a concept is beginning to emerge in computer circles that has for some time only been heretically whispered: the concept that computer services ought to be fully available just as library services are. The experience at Dartmouth College suggests very strongly that this is indeed economically feasible. I recommend to your attention a recent article in Science, 31 May 1974, by Luehrmann and Nevison, "Computer Use under a Free-Access Policy"(6), that I think all librarians should read. The analogy between computer use and library use is striking. Also striking is the awareness of the computerniks of the similarity. Librarians must be no less aware.
It is easy to foresee a future in which the library and the computer center are organized into one information service organization. Such a pattern is already beginning to emerge in industrial organizations, particularly in information-conscious areas like the pharmaceutical industry. One can also foresee a power struggle, sometimes open, sometimes disguised, as to who will manage such organizations. Whichever faction, if you will, has the clearest claim to having expertise in information service would seem to have the upper hand. If librarians want to be in that position then we must orient our activities and our training to information service, not just bibliographic service. In Stephen Franklin’s "Knowledge Park"(7), in which the information center of the world is created on the Ontario/Quebec line, notice how trivial is the role played by professional librarians. Is this futuristic book accurately predicting the future of our profession?
For librarians it is distressingly easy to forget that their activities are only a part of a long chain of activities in the total communication process of science. It is in the process of creating new information that the scholar relies upon us to deliver old information so that it can be constantly refined and processed. Thus, when I introduce my students at the University of Pennsylvania to the field of information retrieval, I tell them that the course will be devoted to solving the following problem. A scholar has prepared a manuscript for publication in a journal. During its trip to the editorial office the usual bibliography at the end of the manuscript has been lost. Most of my students are majors in Computer and Information Science so it is not hard for them to accept the following assumption. We have at our disposal a computer with unlimited storage space and completely automatic programming. Describe the process whereby the computer provides that missing bibliography. The student is also told that the manuscript is converted completely into machine language according to whatever system he is using.
If you will accept this problem as a prototype I think you will appreciate where some of the action may be in the future. What it takes my students an entire semester to find out is, among other things, the borderline between the human brain as it deals with bibliographical information during the creative process and the limits of the human brain, on the other side of the borderline, to program a priori all those processes which are peculiar to the human side. Notice that I have not made the contrast between the human and the computer but rather between the human in us and the computer-like or logical person in us. This schizoid personality of the information scientist, and the awareness of its existence, is what enables us to push on to find out exactly what are the limits of the artificially intelligent machine.
The film told you where the action was. I’ve brought you up-to-date on where it is, at least at ISI, and I’ve speculated on where it may be. I’d like to think it was in the Wellsian tradition: utopian yet plausible. I hope this sparks some interesting discussion.
References:
1. Garfield, E. "‘World Brain’ or ‘Memex?’ Mechanical and Intellectual Requirements for Universal Bibliographic Control," Reprinted from "The Foundation of Access to Knowledge," 1968, Syracuse University Press, Syracuse, New York, pp. 169-196. Reprinted in Current Contents/Life Sciences 14(15) M23-M41 (April 14, 1971).
2. Garfield, E. "The World Brain as Seen by an Information Entrepreneur," Presented at the AAAS Symposium on "Reorganizing Information Resources to Improve Decision-Making," San Francisco, February 27, 1974.
3. Garfield, E.; Sher, I. H.; and Torpie, R. J. "The Use of Citation Data in Writing the History of Science," Institute for Scientific Information, Philadelphia, Pa., 1964.
4. Wipke, W. T.; Dyott, T. M.; Gund, P.; and Still, C. "Stereochemical Considerations in Computer-Assisted Design of Organic Syntheses," Abstracts of Papers, American Chemical Society 164, 40 (August-September, 1972).
5. Garfield, E. "The Responsibility and Role of Chemical Information Scientists in Solving Today’s Crises," Current Contents No. 24, 5-7 (June 12, 1974).
6. Luehrmann, A. W. and Nevison, J. M. "Computer Use under a Free-Access Policy," Science 184(4140), 957-961 (May 31, 1974).
7. Franklin, S. Knowledge Park, McClelland & Stewart, Ltd., Toronto, Canada, 1972, 191 p.