Reviews in Computational Chemistry, Volume 15 - Hardcover

 
9780471361688: Reviews in Computational Chemistry, Volume 15

Synopsis

THIS VOLUME, WHICH IS DESIGNED FOR STAND-ALONE USE IN TEACHING AND RESEARCH, FOCUSES ON QUANTUM CHEMISTRY, AN AREA OF SCIENCE THAT MANY CONSIDER TO BE THE CENTRAL CORE OF COMPUTATIONAL CHEMISTRY. TUTORIALS AND REVIEWS COVER
* HOW TO OBTAIN SIMPLE CHEMICAL INSIGHT AND CONCEPTS FROM DENSITY FUNCTIONAL THEORY CALCULATIONS,
* HOW TO MODEL PHOTOCHEMICAL REACTIONS AND EXCITED STATES, AND
* HOW TO COMPUTE ENTHALPIES OF FORMATION OF MOLECULES.

A FOURTH CHAPTER TRACES CANADIAN RESEARCH IN THE EVOLUTION OF COMPUTATIONAL CHEMISTRY. ALSO INCLUDED WITH THIS VOLUME IS A SPECIAL TRIBUTE TO QCPE.

FROM REVIEWS OF THE SERIES

"Reviews in Computational Chemistry proves itself an invaluable resource to the computational chemist. This series has a place in every computational chemist's library."-Journal of the American Chemical Society

The synopsis may refer to a different edition of this title.

About the author

Kenny B. Lipkowitz, PhD, is a retired Professor of Chemistry from North Dakota State University.

Donald B. Boyd was appointed Research Professor of Chemistry at Indiana University - Purdue University Indianapolis in 1994. He has published over 100 refereed journal papers and book chapters.

Excerpt. © Reprinted by permission. All rights reserved.

Reviews in Computational Chemistry

By Donald B. Boyd

Wiley-VCH Verlag GmbH

Copyright © 2000 Donald B. Boyd
All rights reserved.

ISBN: 9780471361688

Chapter One

Clustering Methods and Their Uses in Computational Chemistry

Geoff M. Downs and John M. Barnard

Barnard Chemical Information Ltd., 46 Uppergate Road, Stannington, Sheffield S6 6BX, United Kingdom

INTRODUCTION

Clustering is a data analysis technique that, when applied to a set of heterogeneous items, identifies homogeneous subgroups as defined by a given model or measure of similarity. Of the many uses of clustering, a prime motivation for the increasing interest in clustering methods is their use in the selection and design of combinatorial libraries of chemical structures pertinent to pharmaceutical discovery.

One feature of clustering is that the process is unsupervised, that is, there is no predefined grouping that the clustering seeks to reproduce. In contrast to supervised learning, where the task is to establish relationships between given inputs and outputs to enable prediction of the output from new inputs, in unsupervised learning only the inputs are available and the task is to reveal aspects of the underlying distribution of the input data. Clustering is thus complemented by the related supervised process of classification, in which items are assigned labels applied to predefined groups: examples include recursive partitioning, naive Bayesian analysis, and K nearest-neighbor selection. Clustering is a technique for exploratory data analysis and is used increasingly in preliminary analyses of large data sets of medium and high dimensionality as a method of selection, diversity analysis, and data reduction. This chapter reviews the main clustering methods that are used for analyzing chemical data sets and gives examples of their application in pharmaceutical companies. Compared to the other costs of drug discovery, clustering can add significant value at minimal cost. First, we provide an outline of clustering as a discipline and define some of the terminology. Then, we give a brief tutorial on clustering algorithms, review progress in developing the methods, and offer some example applications.
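The contrast between supervised and unsupervised learning drawn above can be made concrete with a toy example. The following is only a minimal sketch: the one-dimensional property values, the "active"/"inactive" labels, and the helper names `nearest_neighbor_label` and `group_by_gap` are all hypothetical and are not taken from the chapter. The first function needs labeled examples to make a prediction (nearest-neighbor classification with K = 1); the second sees only the inputs and groups them wherever a large gap separates neighboring values.

```python
from typing import List, Tuple


def nearest_neighbor_label(x: float, labeled: List[Tuple[float, str]]) -> str:
    """Supervised: predict the label of x from labeled training pairs (K = 1)."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]


def group_by_gap(xs: List[float], max_gap: float) -> List[List[float]]:
    """Unsupervised: group sorted inputs, starting a new group at each large gap."""
    groups: List[List[float]] = []
    for x in sorted(xs):
        if groups and x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)   # close to the previous value: same group
        else:
            groups.append([x])     # large gap (or first value): new group
    return groups


# Hypothetical labeled data: (property value, label).
training = [(0.9, "inactive"), (1.1, "inactive"), (4.8, "active"), (5.2, "active")]

# Supervised: the labels drive the prediction for a new input.
predicted = nearest_neighbor_label(4.5, training)

# Unsupervised: only the inputs are used; no labels are involved.
groups = group_by_gap([0.9, 1.1, 4.8, 5.2, 4.5], max_gap=1.0)
```

Here `predicted` is "active", while `groups` contains two groups of values whose meaning, if any, has to be interpreted afterwards.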

Clustering methodology has been developed and used in a variety of areas including archaeology, astronomy, biology, computer science, electronics, engineering, information science, and medicine. Good, general introductory texts on the topic of clustering include those by Sneath and Sokal, Kaufman and Rousseeuw, Everitt, and Gordon. The main text that is devoted to clustering of chemical data sets is by Willett, with review articles by Bratchell, Barnard and Downs, and Downs and Willett. The present chapter is a complement and update to the latter article. In a previous volume of this series, Lewis, Pickett, and Clark reviewed the use of diversity analysis techniques in combinatorial library design.

As will be shown in the section on Chemical Applications, the current main uses of clustering for chemical data sets are to find representative subsets from high throughput screening (HTS) and combinatorial chemistry, and to increase the diversity of in-house data sets through selection of additional compounds from other data sets. Methods suitable for compound selection are the main focus of this chapter. The methods must be able to handle large data sets of high-dimensional data. For small, low-dimensional data sets, most clustering methods are applicable, and descriptions in the standard texts and implementations available in standard statistical software packages suffice. Implementations designed for use on chemical data sets are available from most of the specialist software vendors, the majority of which were reviewed by Warr.

The overall process of clustering involves the following steps:

1. Generate appropriate descriptors for each compound in the data set.

2. Select an appropriate similarity measure.

3. Use an appropriate clustering method to cluster the data set.

4. Analyze the results.

This chapter focuses on step 3. For step 1, descriptors may include property values, biological properties, topological indexes, and structural fragments. The performance of these descriptors and forms of representation has been analyzed by Brown and by Brown and Martin. Similarity searching for step 2 has been discussed by Downs and Willett; characteristics of various similarity measures have been discussed by Barnard, Downs, and Willett. For step 4, little has been published specifically about visualization and analysis of results for chemical data sets. However, most publications that focus on implementing systems that utilize clustering do provide details of how the results were displayed or analyzed.
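The four steps listed above can be sketched end to end on a toy data set. This is an illustration under stated assumptions, not the authors' procedure: the compound names and fragment sets are invented, the Tanimoto coefficient is assumed as the similarity measure, and a simple single-pass "leader" scheme stands in for the clustering method because it fits in a few lines.

```python
from typing import Dict, FrozenSet, List

# Step 1: descriptors. Hypothetical fragment sets, one per compound.
descriptors: Dict[str, FrozenSet[str]] = {
    "cpd_A": frozenset({"benzene", "hydroxyl", "methyl"}),
    "cpd_B": frozenset({"benzene", "hydroxyl"}),
    "cpd_C": frozenset({"pyridine", "amide", "chloro"}),
    "cpd_D": frozenset({"pyridine", "amide"}),
    "cpd_E": frozenset({"thiophene", "nitro"}),
}


# Step 2: similarity measure (Tanimoto coefficient, an assumption here).
def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Step 3: clustering method (single-pass leader clustering, chosen for brevity).
def leader_cluster(data: Dict[str, FrozenSet[str]], threshold: float) -> List[List[str]]:
    clusters: List[List[str]] = []  # the first member of each cluster is its leader
    for name, frags in data.items():
        for cluster in clusters:
            if tanimoto(frags, data[cluster[0]]) >= threshold:
                cluster.append(name)   # similar enough to an existing leader
                break
        else:
            clusters.append([name])    # no leader is similar enough: new cluster
    return clusters


# Step 4: analyze the results, here by taking one representative per cluster.
clusters = leader_cluster(descriptors, threshold=0.6)
representatives = [cluster[0] for cluster in clusters]
```

With these inputs the five compounds fall into three clusters, and `representatives` holds one compound from each, which is the kind of representative subset mentioned earlier. The result of a leader scheme depends on the input order and on the threshold, so both would need care in real use.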

The terminology associated with clustering is extensive, with many terms used to describe the same thing (reflecting the separate development of clustering methods within a multitude of disciplines). Clusters can be overlapping or nonoverlapping; if a compound occurs in more than one cluster, the clusters are overlapping. At one extreme, each compound is a member of all clusters to a certain degree. An example of this is fuzzy clustering, in which the degree of membership of an individual compound is in the range 0 to 1, and the total membership summed across all clusters is normally required to be 1. This scheme contrasts with crisp clustering, in which each compound's degree of membership in any cluster is either 0 or 1. At the other extreme is the situation wherein each compound is a member of exactly one cluster, in which case the clusters are said to be nonoverlapping. Intermediate situations sometimes occur, where compounds can be members of several, though not of all, clusters. The majority of clustering methods used on chemical data sets generate crisp, nonoverlapping clusters, because analysis of such clusters is relatively simple.
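These distinctions can be expressed as simple checks on a membership matrix whose rows are compounds and whose columns are clusters. The sketch below is illustrative only: the matrix layout, the function names, and the example values are assumptions made here, not notation from the chapter.

```python
from typing import List

Matrix = List[List[float]]  # rows = compounds, columns = clusters


def is_crisp(m: Matrix) -> bool:
    """Every degree of membership is exactly 0 or 1."""
    return all(v in (0, 1) for row in m for v in row)


def is_nonoverlapping(m: Matrix) -> bool:
    """Crisp, and each compound belongs to exactly one cluster."""
    return is_crisp(m) and all(sum(row) == 1 for row in m)


def is_fuzzy_normalized(m: Matrix, tol: float = 1e-9) -> bool:
    """Memberships lie in [0, 1] and sum to 1 across clusters for each compound."""
    in_range = all(0.0 <= v <= 1.0 for row in m for v in row)
    sums_to_one = all(abs(sum(row) - 1.0) <= tol for row in m)
    return in_range and sums_to_one


# Hypothetical memberships for four compounds and three clusters.
crisp_nonoverlapping = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
crisp_overlapping = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]  # first compound is in two clusters
fuzzy = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.25, 0.25, 0.5], [0.6, 0.3, 0.1]]
```

The second matrix is the intermediate case: it is crisp, but one compound appears in two clusters, so the clusters overlap.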

If a data set is analyzed in an iterative way, such that at each step a pair of clusters is merged or a single cluster is divided, the result is hierarchical, with a parent-child relationship being established between clusters at each successive level of the iteration. The successive levels can be visualized using a dendrogram, as shown in Figure 1. Each level of the hierarchy represents a partitioning of the data set into a set of clusters. In contrast, if the data set is analyzed to produce a single partition of the compounds resulting in a set of clusters, the result is then nonhierarchical. Note that the term partitioning in this context is different from the technique of partitioning (otherwise known as cell-based partitioning). The latter technique is a method of classification rather than of clustering, and a useful review of it, as applied to chemical data sets, is given by Mason and Pickett. A broad classification of the most common clustering methods is shown in Figure 2. Note that, with the wide range of clustering methods devised, some can be placed in more than one of the given categories.
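The statement that each level of a hierarchy is itself a partition of the data set can be illustrated with a small merge-based sketch. The assumptions are stated plainly: the data are invented one-dimensional values, the distance between two clusters is taken as the gap between their closest members (a single-linkage criterion chosen here for simplicity), and `merge_levels` is a hypothetical helper, not code from the chapter.

```python
from typing import List, Tuple

Partition = List[Tuple[float, ...]]


def merge_levels(values: List[float]) -> List[Partition]:
    """Return the partition at every level, from all singletons to one cluster."""
    clusters: Partition = [(v,) for v in sorted(values)]
    levels = [list(clusters)]
    while len(clusters) > 1:
        # For sorted 1-D data the closest pair of clusters is always adjacent,
        # so only the gaps between neighboring clusters need to be compared.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]  # merge the closest pair
        levels.append(list(clusters))
    return levels


levels = merge_levels([1.0, 1.2, 5.0, 5.5, 9.0])

# Each entry of `levels` is one partition; cutting the hierarchy at a chosen
# level gives a single set of clusters.
three_clusters = levels[2]
```

Five values give five levels, containing 5, 4, 3, 2, and 1 clusters respectively. The list of levels carries the same information that a dendrogram displays graphically, and `three_clusters` is the kind of single partition that a nonhierarchical method would produce directly.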

If a hierarchical method starts with all compounds as singletons (in clusters by themselves) and the latter are merged iteratively until all compounds are in a single cluster, the method is said to be agglomerative. With respect to the dendrogram in Figure 1, it is a bottom-up approach. If the hierarchical method starts with all compounds in a single cluster and iteratively splits one cluster into two until all compounds are singletons, the method is divisive, that is, it is a top-down approach. If, at each split, only one...

"About this title" may refer to a different edition of this title.