Chemical Information as a Commercial Marketplace


Dr. Eugene Garfield

President and Chairman
Institute for Scientific Information
Philadelphia, USA

Proceedings of the Montreux 1989 International Chemical Information Conference, Montreux, Switzerland, Infonortics Ltd. Calne, Collier, H.R. ed., p.1-11, 1989.
_________________________________

In economics, a marketplace is the collection of people, organizations, and technologies that enables commodities to be exchanged, products and services to be bought and sold. A simple model of the marketplace consists of three parts: producers, distributors, and consumers.

In the chemical information marketplace, university and industry researchers are both producers and consumers. Scientific and technical publishers are the primary means of distribution. Information services are secondary distributors. They improve market efficiency by alerting consumers to new chemical information or by retrieving relevant archival information.

Many experts say that chemistry has benefited earlier and more than other scientific disciplines from information products and services. One of the reasons is that chemistry is, and has been, one of the largest and most dominant fields. One hundred years ago, chemistry was indeed the largest research area.[1] In a study of the oldest journals covered in Current Contents, we identified 170 that have published continuously since the 1780s and 1880s.[2] Twenty of these were chemistry journals.

Another reason is that chemistry has been closely allied with industry. The corporate sector has provided substantial economic incentives to develop chemical information services that support its strategic business interests -- exploratory research, patent applications, field tests or clinical trials, government regulatory reviews, and so on. In a competitive industry, a high value is placed on information that can help companies avoid delays and bottlenecks in bringing new products to market. For example, in the pharmaceutical industry, a one-month delay in the development of a product can equate to a loss of as much as $3 million in sales.

Journal Growth

Of course, the general growth of scientific literature also was a key factor in the development of the chemical information marketplace. According to a recent study by D.F. Zaye and W.V. Metanowski of Chemical Abstracts Service, the number of published journals has grown from four in 1660 to more than 70,000 today - a 1.7 million percent increase over 300 years.[3] Price was one of the first to quantify the rising flood of scientific literature. He estimated that the number of journals has doubled every 15 years since the beginning of the 19th century, presumably reaching over 100,000 today.[4] In 1960, Ted Benfey and Laurence Strong stated that the amount of chemical information was growing exponentially, doubling every 13 years since 1900.[5]
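The figures quoted above can be checked with a little arithmetic. The following Python sketch is not part of the studies cited; it simply recomputes the percentage increase from four journals to 70,000 and the doubling time that steady exponential growth would imply over the 300-year span mentioned. The function names are illustrative assumptions.

```python
import math


def percent_increase(start: float, end: float) -> float:
    """Percentage increase from start to end."""
    return (end - start) / start * 100.0


def implied_doubling_time(start: float, end: float, years: float) -> float:
    """Doubling time, in years, assuming steady exponential growth."""
    doublings = math.log2(end / start)
    return years / doublings


# Figures quoted in the text: 4 journals in 1660, more than 70,000 today,
# over roughly 300 years.
increase = percent_increase(4, 70_000)
doubling = implied_doubling_time(4, 70_000, 300)

print(f"Increase: {increase:,.0f} percent")
print(f"Implied doubling time: {doubling:.1f} years")
```

Under these assumptions the increase comes to roughly 1.7 million percent, as stated, and the implied doubling time is a little over 21 years, somewhat slower than the 15-year doubling Price estimated for the period since the early 19th century.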

1987 SCI Source Item/Citation Distribution

However, these projections and estimates tend to be exaggerated because they are based on a rather broad definition of ‘journals’ that includes trade and other types of serials. My studies of the concentration of source items and citations in a relatively small set of journals repeatedly show that a Bradford-type distribution applies to all fields of science.[6] Also, a study of ISI's Current Abstracts of Chemistry and Index Chemicus showed that 30 journals accounted for 68% of the new compounds announced; that 40 accounted for 88%; and that only 43 journals accounted for 90% of the new compounds.[7] A 1966 study of Chemical Abstracts showed that only 8% of the journals it covers accounted for 75% of the items it considered important enough to abstract.[8]

Despite recent indications that the literature growth curve is levelling off, the post-war flood of literature has created high demands for both information discovery and information recovery services. By information discovery, I mean services designed to maintain awareness of current information. Information recovery involves services that allow users to access archival information.

Computer

But the chemical information marketplace owes much of its present strength and value to the computer. It provided the power to process vast amounts of information and the speed to maintain adequate currency of chemical information products. The computer is responsible for significant improvements in virtually every aspect of the chemical information industry - data gathering and storage, product design and production, product and service delivery, and communication. Continued innovations in computers and other technologies are driving the chemical information marketplace closer to its ultimate goal - easy, instant, and affordable personal access to any needed bit of information in the entire body of chemical knowledge.

ISI was founded in 1960, and its history spans the growth and diversification of the chemical information marketplace. ISI’s history also spans the emergence of computer, telecommunications, optical disk, and other technologies that have created new opportunities in this marketplace. These experiences are representative of the chemical information industry as a whole, and I will draw on them to describe current and future trends in the marketplace.

CC

ISI’s first chemical information product was introduced about 30 years ago, in 1958 -- the Chemical, Pharmaco-Medical & Life Science edition of Current Contents. CC® was ISI's answer to the information flood -- it allowed chemists and pharmacologists to quickly scan the contents pages of many key journals in their field and related disciplines. At that time, CC covered about 250 journals for a limited number of bulk subscribers in industry, mostly pharmaceutical companies.

I recently became aware of Naturae Novitates, a current-awareness journal that could be considered a precursor of CC. Published between 1879 and 1944, it listed new international literature in botany, chemistry, physics, and other natural sciences.[9] Naturae Novitates illustrates the point that scanning is an old and common habit of scientists, and this accounts for CC’s continued growth, popularity, and specialisation. There are now 11 editions of CC - six weeklies, one biweekly, three semi-monthlies, and a new monthly CC covering the Health Services Administration literature. These CC editions cover about 7,000 journals for about 40,000 worldwide subscribers and ten times as many pass-along readers.

IC

CC was followed in 1960 by Index Chemicus®, then a monthly alert to new chemical compounds and reactions. It provided new chemical information within 90 days of publication in the primary literature, compared to the two-year turnaround of Chemical Abstracts at the time. IC® enabled chemists, pharmacologists, and other investigators to browse and search articles by chemical compound. The IC graphic record of chemical structures also included the author’s abstract, which is why the name was changed in 1969 to Current Abstracts of Chemistry and Index Chemicus.

SCI

Three years later, in 1963, ISI published the first issue of the Science Citation Index - the first comprehensive and multidisciplinary index to the scientific literature. SCI® enabled chemists to trace the developments and application of new chemical concepts, methods, and compounds in many different fields through an article’s citation history.

The value of SCI, CAC&IC®, and CC in the chemical information marketplace depended on their being current, comprehensive, and affordable. ISI began using computers in the early 1960s primarily to speed up production and lower costs. At about the same time, in 1967, Fred Tate announced that Chemical Abstracts Service and all American Chemical Society publications were converting to computer-manipulable form.[10] He predicted CAS information handling would be computer-based within two years. But this prediction was optimistic by six years - it wasn’t until 1975 that CAS became fully computer-based.[11]

While computers improved production speed and efficiency, the growing computer files of chemical information created opportunities for new product development. Since the SCI was developed on IBM punched-card machines, it was a simple transition to publish it by computer. Shortly after, the magnetic tape files were used to design and market a new SDI service - the Automatic Subject Citation Alert (ASCA®) - in 1967. CAS used its tapes to launch Chemical Titles in 1961, and experimented with an SDI service similar to ASCA. ISI and CAS also made their magnetic tapes available to corporate and institutional subscribers for their internal search needs.

Computers also allowed information providers to fully exploit the advantage of structural formulae, the universal language of chemistry, for indexing and retrieval purposes. Chemists might not come up with the same name for a particular substance, but they usually can draw its structure accurately. These specific, unique, and unambiguous structural descriptors enable chemical information searches to achieve unusually high yield and relevance. [12]

Over 25 years ago, I described an algorithm for the automatic and direct translation of chemical names into chemical formulae.[12] It was based on my doctoral dissertation work, which involved the linguistic analysis of chemical nomenclature.[13] A group of computer scientists at the University of Hull, England, recently published a series of articles reviewing grammar-based techniques for automatically translating chemical nomenclature.[14,15,16]

Punched Cards

Early efforts to automate structure handling in the 1930s and 1940s predated computers as we know them today, and relied on comparatively simple mechanical devices that punched and sorted cards. In 1946 Malcolm Dyson developed the first well-defined linear notation to represent chemical nomenclature, but it never was widely used.

Wiswesser Line Notations

The most popular system was the Wiswesser Line Notation, first demonstrated in 1952 by William J. Wiswesser. WLNs were rapidly applied to substructure searching, compound registration, structure-property correlations, and displays. The notation became so widely used that the international Chemical Notation Association was created to coordinate and direct the development of WLN coding rules.

CSI

ISI used WLNs to build its structure files, and in 1968 began marketing tape files and search software to subscribers interested in substructure searching. In 1970, ISI published the Chemical Substructure Index®, a print product for substructure searching. CSI was an index of permuted WLNs that allowed chemists to locate new compounds containing a specific ring, functional group, or other substructure. CAS also started using WLNs in its Parent Compound Handbook, the modern version of the Ring Index.

Connectivity Table

Another method for structure handling is the connectivity table. If a molecular formula is analogous to a ‘parts list’, then a connectivity table can be likened to an assembly sheet. It gives a complete atom-by-atom, bond-by-bond representation of a structure. Most connectivity tables are based on systems for numbering all non-hydrogen atoms of a compound and a defined set of bond codes. Connectivity tables can be computer-generated from WLNs, nomenclature, structural formulae, etc., and they are the basis for today’s graphic storage and retrieval systems.
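To make the ‘parts list’ versus ‘assembly sheet’ distinction concrete, here is a minimal Python sketch of a connectivity table. It is an illustration only, not the record format of any particular system: the class name, the method names, and the bond codes (1 for a single bond, 2 for a double bond) are assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConnectivityTable:
    """Atom-by-atom, bond-by-bond record of the non-hydrogen skeleton."""
    # atom number -> element symbol
    atoms: dict[int, str] = field(default_factory=dict)
    # (atom number, atom number, bond code); codes here are hypothetical:
    # 1 = single bond, 2 = double bond
    bonds: list[tuple[int, int, int]] = field(default_factory=list)

    def add_atom(self, number: int, element: str) -> None:
        self.atoms[number] = element

    def add_bond(self, a: int, b: int, code: int = 1) -> None:
        if a not in self.atoms or b not in self.atoms:
            raise ValueError("both atoms must be numbered before bonding")
        self.bonds.append((a, b, code))

    def neighbors(self, number: int) -> list[tuple[int, int]]:
        """Return (neighbor atom number, bond code) pairs, sorted by number."""
        found = []
        for a, b, code in self.bonds:
            if a == number:
                found.append((b, code))
            elif b == number:
                found.append((a, code))
        return sorted(found)

    def parts_list(self) -> dict[str, int]:
        """Element counts for the non-hydrogen atoms (the 'parts list')."""
        return dict(Counter(self.atoms.values()))


# Acetic acid skeleton: C1-C2, C2=O3, C2-O4 (hydrogens omitted).
acetic_acid = ConnectivityTable()
for number, element in [(1, "C"), (2, "C"), (3, "O"), (4, "O")]:
    acetic_acid.add_atom(number, element)
acetic_acid.add_bond(1, 2, 1)
acetic_acid.add_bond(2, 3, 2)
acetic_acid.add_bond(2, 4, 1)

print(acetic_acid.parts_list())
print(acetic_acid.neighbors(2))
```

The parts list alone records only that the skeleton contains two carbons and two oxygens; the bond entries are what record how those atoms are assembled, which is the information a substructure search needs.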

Online

By the early 1970s, computers offered much larger memory capacities, faster operating speeds, and direct-access storage systems. These advances, combined with improvements in telecommunications technologies, enabled the establishment of online chemical information systems. In 1972 Dialog became a commercial online information distributor, and within a few years offered access to the SCI, CA Condensates, and other electronic files.

Online information sources offered chemists an alternative to manual searches of multi-volume printed reference tools. But the majority of chemists did not - and still do not - perform online searches themselves, because the intricacies of conducting successful searches took time, training, and practice to master. Also, the costs involved in online searching, including the end-user’s time as well as direct online charges, discouraged widespread use by bench scientists. Instead, they relied on information specialists to retrieve current and comprehensive information quickly and easily for them.

However, it was about ten years before online systems were able to offer users the option of conducting structure searches. In 1981 Télésystèmes became the first vendor to provide graphic substructure access to the CAS file, and CAS itself followed suit in 1982 with CAS Online.

The capability for graphic structure searching was soon followed by the availability of graphic chemical reaction searching. A pioneer in the field of chemical information retrieval, Jacques Valls of Roussel Uclaf, Paris, wrote in 1974 that "retrieving information on reactions is of major - I would say vital - importance and at least as necessary as retrieving information on chemical compounds."[17] Valls went on to work as a travelling UNESCO consultant in the Third World. He has since married a Thai medical librarian after establishing an Information Center at the Asian Institute of Technology in Bangkok.

CCR In-house Database

In 1981, Molecular Design Ltd., San Leandro, California, announced its Reaction Access System (REACCS), and this reaction retrieval system has since become popular worldwide. ISI chose REACCS as the software package for its Current Chemical Reactions® In-House Database, which was introduced in 1987. The CCR In-House Database is a graphic and textual index covering over 5,000 new synthetic methods per year and more than 30,000 individual reaction steps. The database and software are used on a company’s own mainframe computer, making it possible to create specialized databases as well as to combine in-house with ISI data for simultaneous searches.

Other major reaction retrieval software packages include ORAC (Organic Reactions Accessed by Computer, developed by Computer Aided Design, Leeds, UK) and SYNLIB (Synthesis Library, marketed by Smith Kline Beckman, Philadelphia, US).

PC

During the 1980s new technological innovations have opened up direct access to information for bench chemists. A major factor was the marketing of IBM and other personal computers. These desktop models gave individual scientists the power, speed, and memory previously available through huge, centralised time-sharing computers.

Breakdown of Software Categories

The PC ‘boom’ sparked the growth of new software products to meet chemical research and information needs. There are about 75 software packages designed to aid research chemists in five major areas: Structure Management; Structure Drawing; Molecular Modelling; Simulation; and Special Applications. There are also more than 100 text management software packages, including some that are specific to chemistry, such as MDL’s ChemText. These software packages enable chemists to perform searches directly, create their own databases, conduct modelling studies, and prepare manuscripts and reports far more easily than in the past.

CD-ROM

Optical disks are another computer-based technology that provides chemists with direct access to information. The storage capacity of CD-ROMs is phenomenal - one 4.75 inch diameter disk can hold the information equivalent of 1,000 books. Improved information compression techniques are rapidly increasing the storage capacity of CD-ROMs.

About 18 months ago, ISI introduced the SCI CD Edition, which contains all the information in the 18-volume print edition on two independently searchable disks. The SCI CD, updated quarterly, provides the same comprehensive and timely coverage as the print version. It also offers the ease and speed of an online search without the telecommunications and online charges. In addition, the SCI CD offers a variety of access and browse features not available online or in print.

For example, the SCI CD makes available for the first time the full power of citation indexing through bibliographic coupling. This retrieval strategy allows chemists not only to identify a particular article of interest but also to automatically locate and examine ‘related records’, other articles that cited one or more of the references included in the original article being searched. Usually, the first 20 records are ranked by the number of shared references. Bibliographic coupling is based on the idea that the number of cited references shared by two or more papers is a measure of their similarity in concepts, topics, or methodologies. Mike Kessler examined bibliographic coupling between physics papers over 25 years ago.[18]
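The ranking idea behind ‘related records’ can be sketched in a few lines of Python. This is a simplified illustration of bibliographic coupling as described above, not the SCI CD's actual implementation; the function name, the paper labels, and the reference identifiers are all hypothetical.

```python
def related_records(target, cited_refs, limit=20):
    """Rank other papers by the number of cited references shared with target."""
    target_refs = cited_refs[target]
    scored = []
    for paper, refs in cited_refs.items():
        if paper == target:
            continue
        shared = len(target_refs & refs)
        if shared > 0:
            scored.append((paper, shared))
    # Strongest coupling first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


# Hypothetical papers and reference identifiers, for illustration only.
cited_refs = {
    "paper_A": {"ref1", "ref2", "ref3", "ref4"},
    "paper_B": {"ref2", "ref3", "ref4", "ref9"},
    "paper_C": {"ref1", "ref7"},
    "paper_D": {"ref8", "ref9"},
}

print(related_records("paper_A", cited_refs))
```

In this toy data, paper_B shares three references with paper_A and paper_C shares one, so they are returned in that order, while paper_D shares none and is omitted. The default limit of 20 mirrors the number of ranked records mentioned above.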

Info Access

CD-ROM is another technological step on the way toward fully realizing the ultimate promise of the chemical information marketplace - putting the world’s knowledge at everyone’s fingertips. We are not there yet, but we are close - continued advances in computer, communications, and storage media technologies might make it possible to realize the dream of universal information access in the early 21st century.

This dream has inspired many future-oriented thinkers for a very long time. Joshua Lederberg, the Nobel laureate geneticist and president of Rockefeller University, observed that the efficient refinement and sharing of human knowledge was an idea that obsessed Gottfried Wilhelm von Leibniz, the renowned German philosopher and mathematician of the 17th century.[19] In 1938, H.G. Wells described his vision of universal information access, which he called the World Brain.[20] In 1945, Vannevar Bush presented Memex, his concept of the information workstation of the future.[21] And Manfred Kochen pursued this idea with many others who were interested in the World Encyclopedia.[22]

Workstation

What might the chemical information workstation look like in the Year 2000 and beyond? The heart of the workstation will be the personal computer, which will have the combined power and speed of today’s supercomputers and parallel processors. Optical disks which can be read, erased, and rewritten will be used to store and search large corporate files and archival databases as well as to create and update personal databases. More books and journals will also be produced on optical and floppy disks for electronic scanning and reading on the workstation.

Online searching via the workstation will be limited primarily to retrieving and downloading current information. Searches of external databases will be easier due to the emergence of improved gateways, and widespread use of artificial intelligence software which can automatically formulate precise and personalized searches. Internal databases will also be more easily accessible, and they will increasingly include licensed portions of frequently used commercial databases in order to preserve corporate confidentiality and avoid telecommunications charges.

Through fiber optic digital communication lines, the workstation will enable chemists to exchange textual, graphic, and verbal information simultaneously and virtually instantaneously. Advances in speech recognition and synthesis will allow chemists to talk to their workstations and verbally initiate operating commands or enter meeting minutes and correspondence. Of course, chemists will still use their workstation as they do PCs today - to automatically record, manipulate, and analyze experimental data; prepare manuscripts; and draft reports and correspondence.

The World Brain is becoming a practical and feasible concept as a result of improved technologies that increase our abilities to store, retrieve, manipulate, and transmit data. However, technology is no longer the rate-limiting factor in the full realization of this vision of universal information access.

Connector

Rather, what is now needed to attain this vision is total interconnectivity, which involves overcoming current problems in accessing and sharing information. Total interconnectivity is not solely a question of technology. It also involves organizational, economic, political, and legal issues which may be more difficult to overcome than technological barriers.

One of the key organizational issues is how to get the many different database producers and host systems to develop and adopt coherent, compatible, and consistent standards. Economic issues include how price, tax, and trade policies will impact domestic and foreign users in the international information marketplace. Political and legal concerns revolve around the questions of privacy and data protection, transnational data flow, national security, and patent and copyright protection.

As the concept of truly universal information access becomes more practical and feasible with improvements in technology, the strategic and competitive value of the information itself will determine if and when the World Brain becomes a reality. Information is an integral component of both research and business. It must be used effectively to avoid wasteful duplication of efforts, improve the quality of scientific output, accelerate corporate decision making, and increase the probability of success in the marketplace.

ISI's mission is to continue to meet the information needs of the worldwide community of knowledge seekers in business, academia, and government. We are committed to providing high quality, timely, and current information products; to adding value to information rather than merely reproducing what is available in the primary literature; to making it as easy, comfortable, and affordable as possible to retrieve information; and to remaining media-independent so we can deliver data in whatever format the end user finds most effective and desirable.

The chemical information marketplace, and the information industry as a whole, has changed profoundly over the past 25 years. ISI has played a leadership role in stimulating some of these changes, and we intend to continue doing so in the future. The next 25 years should be even more interesting and exciting than the last quarter century.

REFERENCES
1. Kevles DJ, Sturchio JL & Carroll PT. The sciences of America, circa 1880. Science 209:27-32, 1980.

2. Garfield E. The 170 surviving journals that CC would have covered 100 years ago. Current Contents (26):3-12, 29 June 1987.

3. Zaye DF & Metanowski WV. Scientific communication pathways: an overview and introduction to a symposium. J. Chem. Info. Comp. Sci. 26:43-44, 1986.

4. Price DJD. Science since Babylon. New Haven, CT: Yale University Press, 1975. 215 p.

5. Strong LE & Benfey OT. Is chemical information growing exponentially? J. Chem. Educ. 37:29-30, 1960.

6. Garfield E. The mystery of the transposed journal lists - wherein Bradford’s Law of Scattering is generalised according to Garfield’s Law of Concentration. Essays of an information scientist. vol. 1. Philadelphia, PA: ISI Press, 1977. P. 222-3.

7. Garfield E, Revesz GS & Batzig JR. The synthetic chemical literature from 1960-1969. Nature 242:307-9, 1973.

8. Wood JL. The parameters of document acquisition at Chemical Abstracts Service. Paper presented at the American University 8th Annual Institute of Information Storage and Retrieval Meeting, Washington, DC, February 14-17, 1966.

9. Schmid R. Naturae Novitates, 1879-1944: its publication and intercontinental transit times mirror European history. Taxon 33:636-54, 1984.

10. Tate FA. Progress toward a computer-based chemical information system. Chem. Eng. News 45:78-90, 1967.

11. Wigington RL. Evolution of information technology and its impacts on chemical information. J. Chem. Info. Comp. Sci. 27:51-55, 1987.

12. Garfield E. Chemico-linguistics: computer translation of chemical nomenclature. Nature 192:192, 1961.

13. Garfield E. An algorithm for translating chemical names to molecular formulas. Doctoral dissertation, University of Pennsylvania, 1961. Essays of an information scientist, vol. 7. Philadelphia, PA: ISI Press, 1985. P. 441-513.

14. Cooke-Fox DI, Kirby GH & Rayner JD. Computer translation of IUPAC systematic organic chemical nomenclature. 1. Introduction and background to a grammar-based approach. J. Chem. Info. Comp. Sci. 29:101-5, 1989.

15. Cooke-Fox DI, Kirby GH & Rayner JD. Computer translation of IUPAC systematic organic chemical nomenclature. 2. Development of a formal grammar. J. Chem. Info. Comp. Sci. 29:106-12, 1989.

16. Cooke-Fox DI, Kirby GH & Rayner JD. Computer translation of IUPAC systematic organic chemical nomenclature. 3. Syntax analysis and semantic processing. J. Chem. Info. Comp. Sci. 29:112-18, 1989.

17. Valls J. Reaction documentation. (Wipke WT, Heller SR, Feldmann RJ & Hyde E, eds.) Computer representation and manipulation of chemical information. New York: Wiley, 1974, p. 83-103.

18. Kessler MM. Bibliographic coupling between scientific papers. Amer. Doc. 14:10-25, 1963.

19. Lederberg J. Digital communications and the conduct of science: the new literacy. Proc. IEEE 66:1314-9, 1978.

20. Wells HG. World brain. Garden City, NY: Doubleday, 1938. 130 p.

21. Bush V. As we may think. Atlantic Monthly 176:101-8, 1945.

22. Garfield E. Manfred Kochen: in memory of an information scientist pioneer qua World Brain-ist. Current Contents (21):3-14, June 19, 1989.