The Mechanization of Indexing

Eugene Garfield
September 29, 1952

Welch Medical Library
1900 E. Monument Street
Baltimore 5, MD

The most notable achievements in machine documentation have so far been along the lines of storage and reproduction of literature, e.g., the wide use of microfilm and other micro storage methods, as well as numerous copying methods such as ozalid, xerography, copyfix, etc. These have been most important developments. It is difficult to estimate how much modern methods of reproduction have increased the use of recorded information. It must be added, paradoxically, that other modern methods of reproduction have also tended to prevent use of much recorded information by creating a problem of bulk. And it is precisely this problem of bulk that necessitates comprehensive indexing.

There are many systems presently available for sifting this bulk, including hand-sorted punched cards as well as Hollerith machines, photoelectric devices (some in combination with punched cards, others with microfilm) and electronic devices of various kinds. Since codes are usually required in connection with such devices, any number of coding systems have been developed for each of these systems. However, in every instance it is presumed that the source document has previously been "indexed" (coded, analyzed, described, abstracted, factored, etc.). It is only after a suitable storage has been prepared (punched card file, microfilm reel, magnetic tape or drum, etc.) that these devices can be used to "search" the source documents. This application of machines is still in its infancy. One can only say that searching the warehouse of human knowledge is a terribly important problem and, like the problem of storage and reproduction, needs more working on.

On the other hand, experience has shown that indexes, i.e. printed indexes, can be most useful in searching the literature if properly prepared and maintained. This would include library catalogs as well as directories, and indexes such as the subject and author indexes to various abstracting journals. Here, again, increased interest has been focussed on the application of technology to the production of these "conventional" printed indexes. Operational research has shown, especially in large scale indexing operations, that the preparation of indexes is a combination of repetitive operations conducive to mechanization. The use of machines in this connection has already been attempted with reasonable measures of success. At present the only shortcoming apparent is the resulting typography. Nevertheless, the actual preparation of these indexes has been shown to be amenable to mechanization. Again, the application of machines has been confined to indexing "operations" that come after the material has already been indexed.

From the above it can be seen that it is most essential that we always keep in mind the various meanings, implied or otherwise, of the term indexing.

Indexing, cataloguing, or what have you, has then become a complex of several activities. It has so far been essential to all indexing systems that the greatest amount of energy must be expended in the so-called creative or intellectual aspects of this task. Machine applications have so far been concerned with the smaller effort that follows. Experienced indexing editors have, in general, been quite skeptical about using machines. I believe these comments may provide them with additional substance for their persistent resistance to "mechanization". When over 80 per cent of an indexing budget is spent for actual editorial work, one cannot help but agree that the only real solution available to the problem of increasing bulk is added revenue. On the other hand, studies have shown that this figure can be considerably reduced by the application of technology. Many "editorial" functions can be shown, like many clerical activities, to be repetitive operations amenable to mechanization. The attempted use of machines requires a complete change in the psychology one uses. It is impossible to think in terms of the "old" operation alone. Consequently it is possible to envisage that this figure may be reduced to no less than 50 to 60 per cent, a considerable reduction, but by no means a complete victory for mechanization. On the other hand, one must keep in mind that the use of machines invariably results in the development of new services which would never have been contemplated before. Once the information is stored, the manipulations possible by machine are quite amazing. It is often difficult to say that the resultant is a mere rearrangement -- for after all this is what many "new" ideas are -- a rearrangement of existing ideas and data.

One is then ultimately and unavoidably provoked to ask the question -- what about this other 50 to 80 per cent? The problem has been raised before. Yet no one has dared to venture on this sacred ground of "intellectual" endeavor. The technologists have been scolded by wiser individuals about their anthropomorphic conception of machines, especially the newer electronic devices (the press is equally to blame for its sensationalization of developments in this field) -- and rightly so.

But we must not confuse the issue. To say a machine can think creatively is one thing. To say it can perform thinking operations is another. Not until we are absolutely certain what the processes of thinking really involve can we be excessively critical of the machine maniacs. Man is unfortunately quite egocentric -- not too many of us ever really think as creatively as we think we do -- consequently it is a blow to be told that most of our present thinking is, basically, as simple as the difference between 0 and 1. Not that the all or none principle is something new, but after all, a machine is something rather ungodly. However, those who fear the possible consequences of such developments can take consolation. At least if our Frankenstein destroys us -- we did create him.

On this point, though, there has been an interesting cycle of thought. One hundred years ago addition, subtraction, etc. were laborious tasks that many persevering clerks spent their lives at for lack of any easier way. Then came the adding machines. The present generation thinks of such tasks as strictly machine operations, whether it be adding up a grocery bill on a huge cash register or sending out tax bills by Hollerith machines. However, there are a few remarkable individuals who have the uncanny ability to add, subtract, etc. faster than machines. We then hear reports about a "young girl with a brain like a machine." Now, before the invention of the adding machine it would have been impossible to characterize such an individual as a machine. Genius would have been more appropriate, because genius is in certain respects the ability to do things faster than average. I am certain that the thought has occurred to some that this Indian girl is literally a walking cash register.

All of the above has been indicated to better prepare those persevering individuals who are the "clerks" of the documentation profession (and Taube has correctly stated that the librarians are themselves to blame for permitting the task of cataloguing to become a "menial" task) for the application of machines to their problems of addition and subtraction.

Can indexing, in all of its interpretations, be mechanized? And will the results of such mechanization be useful to the extent that it eliminates, in part or in whole, the present tedious tasks of indexing?

Richens suggested that it might be possible to index material on the basis of vocabulary analysis. Unfortunately he did not pursue this thought any further.

The author felt that this approach needed further study. A comparative study was felt to be necessary. Several documents were chosen at random that had been indexed by professional indexers on the staff of Chemical Abstracts as well as the Current List of Medical Literature and members of the Medical Indexing Project. The questions posed were the following:

Assuming that the quality of the indexing in each instance is perfect, how many of the subject headings used were to be found in 1) the title, 2) the abstract, 3) the article?
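
The tally posed by this question can be sketched as a short program. This is a minimal sketch under stated assumptions, not the procedure used in the study: the text does not say how a heading was judged to be "found", so the sketch assumes a case-insensitive, whole-word match of the heading as written. The function names `normalize` and `heading_coverage`, and the sample headings and texts, are hypothetical and serve only as illustration.

```python
import re


def normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def heading_coverage(headings, title, abstract, article):
    """Count how many subject headings occur in each part of a document.

    Assumption: a heading is "found" when its words appear verbatim,
    as whole words and in order, in the part being examined.
    """
    parts = {"title": title, "abstract": abstract, "article": article}
    # Pad with spaces so that only whole-word matches succeed.
    padded = {name: f" {normalize(text)} " for name, text in parts.items()}
    counts = {name: 0 for name in parts}
    for heading in headings:
        words = normalize(heading)
        if not words:
            continue  # skip headings with no alphanumeric content
        needle = f" {words} "
        for name, haystack in padded.items():
            if needle in haystack:
                counts[name] += 1
    return counts


# Hypothetical sample document and headings, invented for illustration only.
example = heading_coverage(
    headings=["Penicillin", "Drug resistance", "Staphylococcus"],
    title="Penicillin and drug resistance",
    abstract="Resistance of Staphylococcus to penicillin is reviewed.",
    article="Penicillin was given. Staphylococcus showed drug resistance.",
)
print(example)  # {'title': 2, 'abstract': 2, 'article': 3}
```

A verbatim match of this kind will understate the overlap wherever the indexer chose a synonym or an inverted form of a heading, which is presumably where the intellectual part of the indexer's work lies.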