A hands-on guide to making valuable decisions from data using advanced data mining methods and techniques
This second installment in the Making Sense of Data series continues to explore a diverse range of commonly used approaches to making and communicating decisions from data. Delving into more technical topics, this book equips readers with advanced data mining methods that are needed to successfully translate raw data into smart decisions across various fields of research including business, engineering, finance, and the social sciences.
Following a comprehensive introduction that details how to define a problem, perform an analysis, and deploy the results, Making Sense of Data II addresses the following key techniques for advanced data analysis:
Data Visualization reviews principles and methods for understanding and communicating data through the use of visualization including single variables, the relationship between two or more variables, groupings in data, and dynamic approaches to interacting with data through graphical user interfaces.
Clustering outlines common approaches to clustering data sets and provides detailed explanations of methods for determining the distance between observations and procedures for clustering observations. Agglomerative hierarchical clustering, partition-based clustering, and fuzzy clustering are also discussed.
Predictive Analytics presents a discussion on how to build and assess models, along with a series of predictive analytics that can be used in a variety of situations including principal component analysis, multiple linear regression, discriminant analysis, logistic regression, and Naïve Bayes.
Applications demonstrates the current uses of data mining across a wide range of industries and features case studies that illustrate the related applications in real-world scenarios.
Each method is discussed within the context of a data mining process including defining the problem and deploying the results, and readers are provided with guidance on when and how each method should be used. The related Web site for the series (www.makingsenseofdata.com) provides a hands-on data analysis and data mining experience. Readers wishing to gain more practical experience will benefit from the tutorial section of the book in conjunction with the Traceis™ software, which is freely available online.
With its comprehensive collection of advanced data mining methods coupled with tutorials for applications in a range of fields, Making Sense of Data II is an indispensable book for courses on data analysis and data mining at the upper-undergraduate and graduate levels. It also serves as a valuable reference for researchers and professionals who are interested in learning how to accomplish effective decision making from data and understanding if data analysis and data mining methods could help their organization.
The synopsis may refer to a different edition of this title.
Glenn J. Myatt, PhD, is cofounder of Leadscope, Inc. and a Partner of Myatt & Johnson, Inc., a consulting company that focuses on business intelligence application development delivered through the Internet. Dr. Myatt is the author of Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining, also published by Wiley.
Wayne P. Johnson, MSc, is cofounder of Leadscope, Inc. and a Partner of Myatt & Johnson, Inc. Mr. Johnson has over two decades of experience in the design and development of large software systems, and his current professional interests include human–computer interaction, information visualization, and methodologies for contextual inquiry.
1.1 OVERVIEW
A growing number of fields, in particular business and science, are turning to data mining to make sense of large volumes of data. Financial institutions, manufacturing companies, and government agencies are just a few of the types of organizations using data mining. Data mining is also being used to address a wide range of problems, such as managing financial portfolios, optimizing marketing campaigns, and identifying insurance fraud. The adoption of data mining techniques is driven by a combination of competitive pressure, the availability of large amounts of data, and ever-increasing computing power. Organizations that apply data mining to critical operations can achieve significant returns. The use of a process helps ensure that the results from data mining projects translate into actionable and profitable business decisions. This chapter summarizes four steps necessary to complete a data mining project: (1) definition, (2) preparation, (3) analysis, and (4) deployment. The methods discussed in this book are reviewed within this context. This chapter concludes with an outline of the book's content and suggestions for further reading.
1.2 DEFINITION
The first step in any data mining process is to define and plan the project. The following summarizes issues to consider when defining a project:
Objectives: Articulating the overriding business or scientific objective of the data mining project is an important first step. Based on this objective, it is also important to specify the success criteria to be measured upon delivery. The project should be divided into a series of goals that can be achieved using available data or data acquired from other sources. These objectives and goals should be understood by everyone working on the project or having an interest in the project's results.
Deliverables: Specifying exactly what is going to be delivered sets the correct expectation for the project. Examples of deliverables include a report outlining the results of the analysis or a predictive model (a mathematical model that estimates critical data) integrated within an operational system. Deliverables also identify who will use the results of the analysis and how they will be delivered. Consider criteria such as the accuracy of the predictive model, the time required to compute, or whether the predictions must be explained.
Roles and Responsibilities: Most data mining projects involve a cross-disciplinary team that includes (1) experts in data analysis and data mining, (2) experts in the subject matter, (3) information technology professionals, and (4) representatives from the community who will make use of the analysis. Including interested parties will help overcome any potential difficulties associated with user acceptance or deployment.
Project Plan: An assessment should be made of the current situation, including the source and quality of the data, any other assumptions relating to the data (such as licensing restrictions or a need to protect the confidentiality of the data), any constraints connected to the project (such as software, hardware, or budget limitations), or any other issues that may be important to the final deliverables. A timetable of events should be implemented, including the different stages of the project, along with deliverables at each stage. The plan should allot time for cross-team education and progress reviews. Contingencies should be built into the plan in case unexpected events arise. The timetable can be used to generate a budget for the project. This budget, in conjunction with any anticipated financial benefits, can form the basis for a cost-benefit analysis.
1.3 PREPARATION
1.3.1 Overview
Preparing the data for a data mining exercise can be one of the most time-consuming activities; however, it is critical to the project's success. The quality of the data accumulated and prepared will be the single most influential factor in determining the quality of the analysis results. In addition, understanding the contents of the data set in detail will be invaluable when it comes to mining the data. The following section outlines issues to consider when accessing and preparing a data set. The format of different sources is reviewed and includes data tables and nontabular information (such as text documents). Methods to categorize and describe any variables are outlined, including a discussion regarding the scale the data is measured on. A variety of descriptive statistics are discussed for use in understanding the data. Approaches to handling inconsistent or problematic data values are reviewed. As part of the preparation of the data, methods to reduce the number of variables in the data set should be considered, along with methods for transforming the data so that it matches the problem more closely or suits the analysis methods to be used. These methods are reviewed. Finally, only a sample of the data set may be required for the analysis, and techniques for segmenting the data are outlined.
1.3.2 Accessing Tabular Data
Tabular information is often used directly in the data mining project. This data can be taken directly from an operational database system, such as an ERP (enterprise resource planning) system, a CRM (customer relationship management) system, an SCM (supply chain management) system, or databases containing various transactions. Other common sources of data include surveys, results from experiments, or data collected directly from devices. Where internal data is not sufficient for the objective of the data mining exercise, data from other sources may need to be acquired and carefully integrated with existing data. In all of these situations, the data would ideally be formatted as a table of observations with information on different variables of interest. If not, the data should be processed into a tabular format.
Preparing the data may include joining separate relational tables or concatenating data sources; for example, combining tables that cover different periods in time. In addition, each row in the table should relate to the entity of interest in the project, such as a customer. Where multiple rows relate to this entity of interest, generating a summary table may help in the data mining exercise. Generating this table may involve calculating summarized data from the original data, using computations such as sum, mode (most common value), average, or counts (number of observations). For example, a table may comprise individual customer transactions, yet the focus of the data mining exercise is the customer, as opposed to the individual transactions. Each row in the summary table should refer to a customer, and additional columns should be generated by summarizing the rows from the original table, such as total sales per product. This summary table then replaces the original table in the data mining exercise.
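The concatenation and summarization steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own method: the two transaction tables, the column names (customer, product, amount), and the helper summarize_by_customer are all hypothetical and were chosen only to mirror the customer-transaction example in the text.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical transaction tables covering two periods (illustrative data only).
transactions_q1 = [
    {"customer": "C1", "product": "A", "amount": 20.0},
    {"customer": "C2", "product": "B", "amount": 15.0},
    {"customer": "C1", "product": "A", "amount": 30.0},
]
transactions_q2 = [
    {"customer": "C1", "product": "B", "amount": 10.0},
    {"customer": "C2", "product": "B", "amount": 25.0},
]


def summarize_by_customer(rows):
    """Build one summary row per customer from many transaction rows."""
    # Group the transaction rows by the entity of interest (the customer).
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer"]].append(row)

    summary = {}
    for customer, items in grouped.items():
        amounts = [r["amount"] for r in items]
        # Total sales per product for this customer.
        sales_per_product = defaultdict(float)
        for r in items:
            sales_per_product[r["product"]] += r["amount"]
        summary[customer] = {
            "count": len(items),        # number of observations
            "total": sum(amounts),      # sum
            "average": mean(amounts),   # average
            # Mode: the product this customer bought most often.
            "most_common_product": Counter(
                r["product"] for r in items
            ).most_common(1)[0][0],
            "sales_per_product": dict(sales_per_product),
        }
    return summary


# Concatenate the two periods, then summarize to one row per customer.
all_transactions = transactions_q1 + transactions_q2
summary = summarize_by_customer(all_transactions)

for customer, stats in sorted(summary.items()):
    print(customer, stats)
```

In this sketch the summary dictionary plays the role of the summary table: each key is a customer, and each value holds the derived columns that would replace the raw transaction rows in later analysis.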
Many organizations have invested heavily in creating a high-quality, consolidated repository of information necessary for supporting decision-making. These repositories make use of data from operational systems or other sources. Data warehouses are an example of...
"About this title" may refer to a different edition of this title.
Seller: Antiquariat Bookfarm, Löbnitz, Germany
Softcover. 291 pp. Ex-library copy with shelf mark and stamp. GOOD condition, some traces of use. w15855 9780470222805 Language: English. Weight in grams: 350. Item no. 2433997
Quantity: 1 available
Seller: Revaluation Books, Exeter, United Kingdom
Paperback. Condition: Brand New. 1st edition. 291 pages. 9.10x6.10x0.60 inches. In Stock. Item no. x-0470222808
Quantity: 2 available
Seller: Kennys Bookstore, Olney, MD, USA
Condition: New. This book provides a general end-to-end discussion concerning the process of translating raw data to scientific and business decisions. The reader's ability to find patterns in data will be greatly enhanced due to the book's combination of statistical learning with powerful visualization techniques. Num Pages: 308 pages, Illustrations. BIC Classification: UNF. Category: (P) Professional & Vocational; (U) Tertiary Education (US: College). Dimension: 233 x 158 x 17. Weight in Grams: 418. 2009. 1st Edition. Paperback. Books ship from the US and Ireland. Item no. V9780470222805
Quantity: 2 available
Seller: moluna, Greven, Germany
Condition: New. This book provides a general end-to-end discussion concerning the process of translating raw data to scientific and business decisions. The reader's ability to find patterns in data will be greatly enhanced due to the book's combination of statistical learn. Item no. 594694120
Quantity: More than 20 available
Seller: AHA-BUCH GmbH, Einbeck, Germany
Paperback. Condition: New. New stock - A hands-on guide to making valuable decisions from data using advanced data mining methods and techniques. Item no. 9780470222805
Quantity: 2 available