Proposal for Research in Mechanical Indexing

April 1956

Indexing work can be divided into two categories: intellectual and mechanical. By intellectual is meant the work done in reading documents to determine the principal subject matter. This usually consists of assigning several subject headings or descriptors to the document in question. Once this intellectual activity is completed, most of the remaining work in preparing indexes is of a clerical or mechanical nature. It is not by coincidence that the use of machines has been limited primarily to the latter. Punched-card machines have been used to facilitate the analysis and compilation of subject heading lists (1) and to print indexes to medical literature automatically (2). The myriad uses of punched-card equipment in searching are always confined to the mechanical phase of the work. The importance of the mechanical problem should not be underestimated. The cost of such work may vary from 25% to 75% of the total cost of an indexing operation. However, the same figures apply to the intellectual problems involved, and more research is needed in this area.

As far back as 1933, IBM machines were used in compiling concordances (3). Busa, using more equipment, compiled a "word index to Aquinas" (4). However, no attempt was made in either case to perform intellectual analysis by machine.
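To make the distinction concrete, the following is a minimal modern sketch of what a concordance (word index) computes. It is an illustration only, not the punched-card procedure used in the work cited above; the function name `build_concordance` and the sample text are hypothetical. Every step is mechanical: no judgment is made about which words reflect the principal subject matter.

```python
import re
from collections import defaultdict


def build_concordance(lines):
    """Map each word to the list of 1-based line numbers on which it occurs."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Lowercase the line and keep runs of letters; set() records each
        # word at most once per line.
        for word in set(re.findall(r"[a-z]+", line.lower())):
            index[word].append(number)
    return dict(index)


# Hypothetical sample text, one entry per line.
text = [
    "Indexing work can be divided into two categories",
    "The mechanical work is clerical",
    "The intellectual work is reading",
]

concordance = build_concordance(text)
for word in sorted(concordance):
    print(word, concordance[word])
```

The output lists every word with equal weight, including words such as "the" and "is". Selecting from such a list the few terms that describe the subject of a document is the intellectual step that neither concordance project attempted.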

Quite obviously, the work done in recent years on mechanical translation (MT) has a direct bearing on the problem of mechanical indexing (MI). However, it is possible that problems of mechanized indexing may be solved long before MT has reached a degree of practicality. Although the problems of MI and MT are similar, they are not the same. MT is in some respects vastly more complex, since one is concerned with complete rendition of a foreign text into one's own language (or another foreign language) rather than enumeration of the principal subject matter discussed.

It is proposed that a small grant be made to study the potentialities for mechanical indexing of texts. The work shall be confined to scientific publications.

The procedure to be followed for the first portion of the project will be:

1. Review of known methods for reducing texts to their fundamentals, especially the techniques of structural linguistics (cf. Chomsky (5) and Harris (6)).

2. Consideration of how these methods can be improved through known machine techniques by combining the experience of workers in linguistics, indexing, and mechanical documentation.

The first phase will primarily serve to familiarize the linguistic expert with indexing and machine techniques and vice versa.

The second phase will determine a methodology for analysis of scientific publications leading to some sample mechanical analyses.

It will be assumed that under ideal circumstances, "reading" of texts will be done mechanically, either by some type of scanning device or by direct mechanical storage of the text. It is essentially the purpose of this project to formulate the "program" required to analyze a text for indexing purposes once it is stored.

A grant of $10,000 is requested to be used as follows:

1    Linguistic Consultant (part time)          $3,000
1    Machine Indexing Consultant (part time)     1,500
     Clerical assistance                         2,000
     Overhead                                    1,500

The principal investigator will be Professor Z. Harris; the first linguistic consultant will be Mr. C. Borkowski; and the first indexing consultant will be Mr. E. Garfield. Clerical help will probably be students at the University.

The machine budget will be used for work done at local service bureaus or for rentals as required. All work will be conducted at the University of Pennsylvania, Department of Linguistics.

1.    Garfield, E. J. Documentation 10:1-10 (1954).
2.    Garfield, E. American Documentation 6:68-76 (1955).
3.    Bachne, Punched-card Applications in Colleges and Universities (1935).
4.    Busa, R., Nachr. Dokumentation 3:14 (1952).
5.    Chomsky, Transformational Analysis, Ph.D. Dissertation (1955).
6.    Harris, Z. Methods of Structural Linguistics (1951).

