Bulletin of the Atomic Scientists 24(6) p.43-44, 1968
Chemical Abstracts Service Annual Report to NSF
by Eugene Garfield
In the Spring of 1965, the National Science Foundation (NSF), acting as coordinator for several government agencies, signed a contract with the American Chemical Society (ACS) to develop and test a registry file of chemical compounds and to conduct selected research and development on chemical data handling. The Chemical Society has now reported on the first year of what originally was to be a two-year program costing the taxpayers over $2 million. Already it appears that consideration is being given to escalating the project in duration and size at a cost of $20 million.
Whether or not the government continues to fund this project is of vital concern to the entire scientific community--not just to the relatively small number of industrial and government chemical organizations that will make the greatest use of this system when and if it comes into operation.
The project and the report should raise a broad range of questions--political, scientific, and economic--for serious consideration by the scientific community and the public at large.
When the federal government subsidizes a new scientific information activity, the scientific and lay communities should be alert because government subsidies are always accompanied by government controls. The degree of control which the government chooses to exercise at any particular moment may be unpredictable, but recent history is replete with instances of ill-advised and even silly restrictions placed on the free flow of scientific information in the name of national security.
If, however, the government claims that it does not intend to control, but merely to support, an information program, the citizen may rightly ask by what authority it grants a particular private organization a virtual monopoly on a vital source of information. This criticism was anticipated in intragovernmental discussions which preceded the award of this contract to the American Chemical Society. That the particular private organization happens to be a professional society with a tax-free status will not and should not pacify congressional critics, since the ACS, like other professional societies, is designed to further the self-interest of its members rather than that of the American taxpayer. Indeed, it may properly be asked why the government should subsidize the activities of a wealthy professional special-interest group when that group discriminates against public libraries and other "non-members" through different prices charged for its publications and services.
Why was this large contract awarded without conducting a simple and inexpensive market survey to determine who needs the "product" and what they would be willing to pay for it? Whenever chemical notation systems are discussed at scientific meetings, most chemists appear uninterested. Yet the same small clique of notation enthusiasts shows up -- representing industry, government, and documentalists -- with perhaps only a dozen or so persons seriously involved. This does not mean that a chemical registry and structure searching system will not ultimately benefit thousands of chemists throughout the world, in the same way that mankind may benefit from a hundred million dollar particle accelerator even though only a few may use it. It is rather a question of national priorities.
Criticism also comes from private entrepreneurs in the growing information industry. Such critics ask why government agencies completely by-passed established government procedures in the procurement of this multi-million dollar "research" program. The question is made even more pertinent by the fact that the program is more development than research, since the answers to many of the "questions" spelled out in such detail in this report were available years ago from studies of the files of the Chemical-Biological Coordination Center.
The project shows no respect for well-established statistical sampling techniques. Apparently the project managers feel that the only way to test their system is to build it first and try it later. Having planned on the encoding of 200,000 chemical compounds, the project has done closer to 800,000 and appears to be headed, in the already signed follow-up contract, toward at least doubling that figure. Where does one stop before deciding the system works? The elemental composition of chemical compounds has not changed much since the CBCC analyzed them, and the CBCC frequency distributions are particularly pertinent since Chemical Abstracts (C.A.) plans to go back and pick up several million old compounds.
A number of small developmental studies have been conducted which seem useful, such as comparing magnetic tape typewriters to key punch machines and typewriters for the input of information on chemical structures to a computer. But no one at C.A. seems to have asked why one should spend from one to two dollars per compound to get a structure into the file when it could be done at lower cost using the already tested Wiswesser notation system or a similar system. If one has to draw a structural diagram in order to "encode," then this will be the same in either system. But this requirement happens to be built into the C.A. system because the abstractors have converted diagrams reported by authors into nomenclature. If the encoding were done directly from the original documents or from similar sources containing the structural diagrams, it would be absurd to expect to use character readers in the near future. If one has the structure or a systematic name, then presumably algorithms exist or can be created which would convert systematic notations to connection tables or any other atom-by-atom connection one needs or desires. C.A. is almost obsessed with the objective of producing these connection tables as a byproduct of a structure drawing operation. All this seems irrational if Chemical Abstracts is to be the source of the information.
The fact is that structural diagrams or their equivalents must exist in the original documents that will be retrieved in the registry system. Therefore, much of the activity described in this report is in a vicious circle. Given the structural diagram, it appears to be economical, according to notation enthusiasts, to create a line notation. If the diagram is not given, then a name must be given; else where is the compound reported? Given the chemical name, one could also create algorithms for converting nomenclature to whatever canonical forms one desires. That the present report skirts these fundamental issues derives from the political nature of the controversy that has raged between the C.A. Dyson-International Union of Pure and Applied Chemistry enthusiasts and the Wiswesser enthusiasts. The situation would seem less political if the National Science Foundation or the Office of Science and Technology had simultaneously supported studies by both camps but, given the blessing of a National Academy of Sciences ad hoc committee that C.A. is the logical organization to "do" chemical notation things, accompanied by a government obsession to centralize all chemical information activities somewhere, the result was inevitable -- a unilateral approach which is subject to many serious scientific and political objections.
Finally, the free dissemination of scientific information is already in jeopardy through the structure of this C.A. project. An important appendix discusses the means by which C.A. intends to retain the confidential nature of data included in its files. In addition to data on compounds reported in Chemical Abstracts itself, data from such government agencies as the National Cancer Institute and, one presumes, also the chemical-biological warfare branch of the Army, among others, will be tied into this system, but the compounds will not be accessible to those who do not have a right to such information. In brief, the premises of C.A. will now contain information which will be rated as "confidential," and one wonders when security guards and clearance procedures will be instituted in the new $8 million C.A. edifice at Columbus, Ohio, for Chemical Society members and "non-members" alike.
Eugene Garfield is president of the Institute for Scientific Information, Philadelphia, Pennsylvania.