
Data Governance Tools: Evaluation Criteria, Big Data Governance, and Alignment with Enterprise Data Management - Softcover

 
9781583478448: Data Governance Tools: Evaluation Criteria, Big Data Governance, and Alignment with Enterprise Data Management

Synopsis

Comprehensively covers evaluation criteria for and capabilities of the software tools available for implementing a data governance program
 
Data governance programs often start out using applications such as Microsoft Excel and Microsoft SharePoint to document and share data governance artifacts, but these tools often lack critical functionality. Meanwhile, vendors have matured their data governance offerings to the extent that today's organizations need to consider tools as a critical component of their data governance programs. In this book, data governance expert Sunil Soares reviews the Enterprise Data Management (EDM) reference architecture and discusses key data governance tasks that can be automated by tools for business glossaries, metadata management, data profiling, data quality management, master data management, reference data management, and information policy management. Subsequent sections describe the integration points between EDM tools and data governance and examine how governance tools interact with big data technologies, including Hadoop, NoSQL, stream computing, and text analytics. The final section of the book discusses evaluation criteria for data governance tools and provides an overview of key vendor platforms, including ASG, Collibra, Global IDs, IBM, Informatica, Orchestra Networks, SAP, and Talend.

The synopsis may refer to a different edition of this title.

About the Author

Sunil Soares is the director of data governance within IBM Software Group. He is a former director of Worldwide Channels and Alliances for InfoSphere, IBM Software Group, where he worked with a number of partners on their data governance practices. He lives in Harrington Park, New Jersey.

Excerpt. © Reprinted by permission. All rights reserved.

Data Governance Tools

By Sunil Soares

MC Press

Copyright © 2014 Sunil Soares
All rights reserved.
ISBN: 978-1-58347-844-8

Contents

About the Author
Forewords
Preface
PART I — Introduction
1: An Introduction to Data Governance
2: Enterprise Data Management Reference Architecture
PART II — Categories of Data Governance Tools
3: The Business Glossary
4: Metadata Management
5: Data Profiling
6: Data Quality Management
7: Master Data Management
8: Reference Data Management
9: Information Policy Management
PART III — The Integration Between Enterprise Data Management and Data Governance Tools
10: Data Modeling
11: Data Integration
12: Analytics and Reporting
13: Business Process Management
14: Data Security and Privacy
15: Information Lifecycle Management
PART IV — Big Data Governance Tools
16: Hadoop and NoSQL
17: Stream Computing
18: Text Analytics
PART V — Evaluation Criteria and the Vendor Landscape
19: The Evaluation Criteria for Data Governance Platforms
20: ASG
21: Collibra
22: Global IDs
23: IBM
24: Informatica
25: Orchestra Networks
26: SAP
27: Talend
28: Notable Vendors
Appendix A: List of Acronyms
Appendix B: Glossary
Appendix C: Potential Data Governance Tasks to Be Automated with Tools
Index


CHAPTER 1

An Introduction to Data Governance


We are in the middle of a major shift in the market. Scarcely a day goes by without a company restating earnings or a bank having to set aside more capital to deal with unforeseen losses. Many of these issues can be traced back to poor data governance. The rise of big data makes data governance even more important: companies need to trust their big data before they invest huge sums of money in analyzing it. The marketplace is gradually coalescing around data governance as a distinct discipline. A search on LinkedIn produces thousands of hits for "data governance," and vendors such as ASG, Collibra, Informatica, IBM, and Talend have released offerings to support data governance.


Definition

Data governance can be defined as follows:

Data governance is the formulation of policy to optimize, secure, and leverage information as an enterprise asset by aligning the objectives of multiple functions.


By decomposing this definition, we lay out the essential prerequisites of data governance:

Formulate policy — Policy includes the written or unwritten declarations of how people should behave in a given situation. For example, data governance might institute a "search before create" policy that requires customer service agents to avoid duplicates by searching for an existing customer record before creating a new one.

Optimize information — Consider how organizations might apply the principles of the physical world to their information. Companies have well-defined enterprise asset management programs to care for their machinery, aircraft, vehicles, and other physical assets. Over the past decade, however, the volume of their information has exploded, and with the onset of big data, it is nearly impossible for companies to know where all this information is located. Similar to cataloging physical assets, organizations need to build inventories of their existing information. We refer to this process as "data profiling" or "data discovery," and cover it later in this book. In addition, most companies have routine preventive maintenance programs for their physical assets. Companies need to institute similar maintenance programs around the information about their customers, vendors, products, and assets. We refer to this process as "data quality management," also covered later in this book.

Secure information — Organizations need to secure business-critical data within their enterprise applications from unauthorized access, since such access can affect the integrity of their financial reporting as well as the quality and reliability of daily business decisions. They must also protect sensitive customer information, such as credit card numbers, and intellectual property, such as customer lists, product designs, and proprietary algorithms, from both internal and external threats.

Leverage information — Organizations need to get the maximum value out of their information to support broader initiatives that grow revenues, reduce costs, and manage risk.

Treat information as an enterprise asset — Traditional accounting rules do not allow companies to treat information as a financial asset on their balance sheets unless it is purchased from external sources. Despite this conservative accounting treatment, organizations now recognize that they should treat information as an asset.

Align the objectives of multiple functions — Because multiple functions leverage the same information, their objectives need to be reconciled as part of a data governance program. For example, ownership of customer data is typically an issue when different departments use that information for different purposes. This can result in challenges such as inconsistent definitions for the term "customer."
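The "search before create" policy from the first prerequisite is one of the few policies that can be enforced directly in application code. The following is a minimal sketch, assuming a hypothetical in-memory `CustomerRegistry` and a deliberately simple match rule (same normalized email, or same normalized name plus postal code); real customer-matching logic in MDM tools is considerably more sophisticated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Customer:
    customer_id: int
    name: str
    email: str
    postal_code: str


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variations still match.
    return " ".join(text.lower().split())


@dataclass
class CustomerRegistry:
    """Hypothetical customer store that enforces 'search before create'."""
    customers: list = field(default_factory=list)
    next_id: int = 1

    def search(self, name: str, email: str, postal_code: str) -> Optional[Customer]:
        for c in self.customers:
            same_email = _normalize(c.email) == _normalize(email)
            same_name_and_zip = (
                _normalize(c.name) == _normalize(name)
                and c.postal_code.strip() == postal_code.strip()
            )
            if same_email or same_name_and_zip:
                return c
        return None

    def create(self, name: str, email: str, postal_code: str) -> Customer:
        # Policy: always search first and reuse an existing record instead of duplicating it.
        existing = self.search(name, email, postal_code)
        if existing is not None:
            return existing
        customer = Customer(self.next_id, name, email, postal_code)
        self.customers.append(customer)
        self.next_id += 1
        return customer
```

With this sketch, creating "Jane Doe" and then "JANE  DOE" at the same postal code returns the same record rather than a duplicate. Placing the search inside `create` means agents cannot bypass the policy, which is the point of automating it.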


Case Study

Let's review a situation that shows the impact of poor data governance on people's lives. Case Study 1.1 details the unfortunate events surrounding the Mars Climate Orbiter.


The Pillars of Data Governance

Most business initiatives rest on the three pillars of people, process, and technology. Data governance programs have traditionally focused on people and process. Because data governance programs have often started from scratch with little or no funding, technology has historically not been a key consideration. The remainder of this book focuses on the technology pillar of data governance programs.


Summary

In this chapter, we defined data governance as the formulation of policy to optimize, secure, and leverage information as an enterprise asset by aligning the objectives of multiple functions. While traditional data governance programs have focused on people and process, this book focuses on technology.

CHAPTER 2

Enterprise Data Management Reference Architecture


Enterprise Data Management (EDM) refers to the ability of an organization to precisely define, easily integrate, and effectively retrieve data for both internal applications and external communication.

Like data governance, EDM involves the three pillars of people, process, and technology. Also like data governance, there has been a historical emphasis in EDM on the people and process pillars. However, the technology pillar is at least as important as the other two because it makes data governance tangible in the eyes of business users. The EDM reference architecture includes 20 categories, as shown in Figure 2.1.


EDM Categories

EDM consists of a number of categories. Some of these categories are more closely tied to data governance than others, and the categories are interrelated in several important ways. A high-level description of the 20 categories of EDM follows; the rest of the book goes into more detail:

1. Data Sources — At the very bottom, we have the data sources that need to be governed. These data sources may be internal or external to the organization. Internal data sources include enterprise applications such as SAP, Oracle, and Salesforce. External data sources include social media, sensor data, and information purchased from data brokers.

2. Databases — Databases fall into a few different categories:

* In-Memory — In-memory database management systems rely on main memory for data storage. Compared to traditional database management systems that store data to disk, in-memory databases are optimized for speed. SAP HANA, Oracle TimesTen In-Memory Database, and IBM solidDB are all examples of in-memory databases.

* Relational — Relational database management systems (RDBMSs) are based on the relational model of data and are at the heart of most distributed computing platforms today. IBM DB2, Oracle Database 12c, and Microsoft SQL Server are all examples of RDBMS solutions.

* Legacy — Legacy database management systems such as IBM Information Management System (IMS) rely on non-relational approaches to database management.

3. Data Modeling — Data modeling is a critical exercise for developing an understanding of an organization's data artifacts. Data modeling tools include CA ERwin Data Modeler, SAP PowerDesigner, Embarcadero ER/Studio, and IBM InfoSphere Data Architect.

4. Data Integration — Data integration tools fall into a few different categories:

* Bulk Data Movement — Bulk data movement includes technologies such as Extract, Transform, and Load (ETL) to extract data from one or more data sources, transform the data, and load the data into a target database. Tools include IBM InfoSphere DataStage and Informatica PowerCenter.

* Data Replication — According to Information Management Magazine, data replication is the process of copying a portion of a database from one environment to another and keeping the subsequent copies of the data in sync with the original source. Changes made to the original source are propagated to copies of the data in other environments. Replication technologies such as change data capture (CDC) capture only the changed data and transfer it from publisher to subscriber systems. Replication tools include IBM InfoSphere Data Replication, Oracle GoldenGate, Informatica Fast Clone, and Informatica Data Replication.

* Data Virtualization — Data virtualization is also known as data federation. According to Information Management Magazine, data federation is the method of linking data from two or more physically different locations and making the access/linkage appear transparent, as if the data were co-located. This approach contrasts with the data warehouse method of housing data in one place and accessing it from that single location. Data virtualization allows an application to issue SQL queries against a virtual view of data in heterogeneous sources such as relational databases, XML documents, and mainframe systems. Offerings include IBM InfoSphere Federation Server, Informatica Data Services, and Denodo.
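The three ETL steps described under Bulk Data Movement can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, assuming hypothetical `crm_customers` source and `dim_customer` target tables held in in-memory databases; commercial ETL tools such as those named above add scheduling, restartability, and lineage capture on top of this basic pattern.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list:
    # Extract: read raw rows from the (hypothetical) source system.
    return source.execute("SELECT id, full_name, country FROM crm_customers").fetchall()


def transform(rows: list) -> list:
    # Transform: trim and title-case names, upper-case country codes, drop rows without a name.
    cleaned = []
    for cust_id, full_name, country in rows:
        if not full_name or not full_name.strip():
            continue
        cleaned.append((cust_id, full_name.strip().title(), (country or "").strip().upper()))
    return cleaned


def load(target: sqlite3.Connection, rows: list) -> int:
    # Load: write the transformed rows into the (hypothetical) warehouse dimension table.
    target.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    target.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE crm_customers (id INTEGER, full_name TEXT, country TEXT)")
source.executemany(
    "INSERT INTO crm_customers VALUES (?, ?, ?)",
    [(1, "  jane doe ", "us"), (2, "", "de"), (3, "max mustermann", " de")],
)
target = sqlite3.connect(":memory:")
loaded = load(target, transform(extract(source)))
```

Keeping extract, transform, and load as separate functions mirrors how ETL jobs are usually structured, so each stage can be tested or replaced independently.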

5. Data Profiling — Data profiling is the process of understanding the data in a system, where it is located, and how it relates to other systems. This includes developing a statistical analysis of the data, such as data type, null percentages, and uniqueness. While there might be some nuances that distinguish data profiling from data discovery, we will use the terms synonymously in this book.

In the absence of tools, data analysts have historically resorted to the use of SQL queries to discover and profile data. Offerings include IBM InfoSphere Information Analyzer, Informatica Data Quality, Oracle Enterprise Data Quality, SAP Information Steward, SAS Data Management, and Trillium Software TS Discovery. These tools support a variety of data sources, including Hadoop.
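The column statistics mentioned above (data type, null percentage, and uniqueness) are simple to compute for a single column. This is a minimal sketch, assuming the column arrives as a Python list in which `None` or an empty string counts as null, and defining uniqueness as distinct non-null values divided by non-null values; profiling tools compute many more statistics and run directly against the source systems.

```python
from collections import Counter


def infer_type(value: str) -> str:
    # Classify a non-null value as integer, decimal, or text.
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"


def profile_column(values: list) -> dict:
    total = len(values)
    non_null = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    type_counts = Counter(infer_type(v) for v in non_null)
    distinct = len(set(non_null))
    return {
        "rows": total,
        "null_pct": round(100.0 * (total - len(non_null)) / total, 1) if total else 0.0,
        "distinct": distinct,
        "unique_pct": round(100.0 * distinct / len(non_null), 1) if non_null else 0.0,
        # The most frequent inferred type is reported as the column's likely type.
        "inferred_type": type_counts.most_common(1)[0][0] if type_counts else "unknown",
    }
```

For example, profiling `["10", "20", None, "20", "", "abc"]` reports a 33.3% null rate, three distinct values, and a likely type of integer; the stray `"abc"` is exactly the kind of anomaly that profiling is meant to surface for data quality follow-up.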

6. Data Quality — Data quality management is a discipline that includes the methods to measure and improve the quality and integrity of an organization's data. While data profiling uncovers issues with the data, data quality actually remediates those issues. Offerings include IBM InfoSphere QualityStage, Informatica Data Quality, Oracle Enterprise Data Quality, SAP Information Steward, SAS Data Management, and Trillium Software TS Quality.
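Remediation usually takes the form of standardization and validation rules. This is a minimal sketch, assuming a hypothetical rule that standardizes U.S. state values to two-letter USPS codes using a deliberately incomplete mapping; values the rule cannot fix are routed to an exceptions list for a data steward instead of being guessed.

```python
# Deliberately small sample of USPS state abbreviations; a real rule would use the full list.
STATE_CODES = {"new jersey": "NJ", "new york": "NY", "california": "CA", "texas": "TX"}
VALID_CODES = set(STATE_CODES.values())


def standardize_state(raw):
    """Return (standardized_value, issue); issue is None when the value is clean."""
    if raw is None or not raw.strip():
        return None, "missing"
    value = raw.strip().rstrip(".")
    if value.upper() in VALID_CODES:
        return value.upper(), None
    code = STATE_CODES.get(value.lower())
    if code is not None:
        return code, None
    return raw, "unrecognized"  # keep the original value for a steward to review


def remediate(records):
    # Split records into cleaned rows and exceptions that need human attention.
    fixed, exceptions = [], []
    for rec in records:
        state, issue = standardize_state(rec.get("state"))
        if issue:
            exceptions.append({**rec, "issue": issue})
        else:
            fixed.append({**rec, "state": state})
    return fixed, exceptions
```

Separating fixed records from exceptions reflects the division of labor in most programs: tools apply the rules automatically, while stewards resolve the cases the rules cannot.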

7. Business Glossary — A business glossary is a repository of key terms that brings together common definitions across business and IT. Offerings include Adaptive Business Glossary Manager, ASG-metaGlossary, Collibra Business Glossary, Embarcadero CONNECT, IBM InfoSphere Business Glossary, Informatica Business Glossary, and SAS Data Management.
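At its core, a business glossary links each business term to a definition, an accountable owner, and the physical data assets that implement it. This is a minimal sketch of such a structure, assuming hypothetical field names (`steward`, `synonyms`, `linked_columns`) that are not taken from any particular product:

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    steward: str  # business person accountable for the definition
    synonyms: list = field(default_factory=list)
    linked_columns: list = field(default_factory=list)  # e.g. "crm.customers.cust_id"


class BusinessGlossary:
    def __init__(self):
        self._terms = {}

    def add(self, term: GlossaryTerm) -> None:
        # One agreed definition per term: reject duplicates regardless of case.
        key = term.name.lower()
        if key in self._terms:
            raise ValueError(f"Term already defined: {term.name}")
        self._terms[key] = term

    def lookup(self, name: str):
        # Resolve either the preferred term or one of its synonyms; None if unknown.
        key = name.lower()
        if key in self._terms:
            return self._terms[key]
        for term in self._terms.values():
            if key in (s.lower() for s in term.synonyms):
                return term
        return None

    def terms_for_column(self, column: str) -> list:
        # Reverse lookup: which business terms does a physical column implement?
        return [t.name for t in self._terms.values() if column in t.linked_columns]
```

Rejecting duplicate terms and resolving synonyms to a single preferred term is one way a glossary tool addresses the problem of inconsistent definitions for a term such as "customer."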

8. Metadata — Metadata is information that describes the characteristics of any data artifact, such as its name, location, perceived importance, quality, or value to the enterprise, and its relationships to other data artifacts that the enterprise deems worth managing. Offerings include Adaptive Metadata Manager, ASG-Rochade, Data Advantage Group MetaCenter, IBM InfoSphere Metadata Workbench, and Informatica Metadata Manager.
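One of the relationships that metadata tools capture is lineage: which upstream artifacts feed a given table or report. This is a minimal sketch, assuming lineage is stored as a hypothetical mapping from each artifact to the artifacts it is derived from; a breadth-first walk of that graph returns every upstream source.

```python
from collections import deque

# Hypothetical lineage metadata: artifact -> artifacts it is derived from.
LINEAGE = {
    "report.revenue_by_region": ["dw.fact_sales", "dw.dim_region"],
    "dw.fact_sales": ["staging.orders"],
    "dw.dim_region": ["ref.country_codes"],
    "staging.orders": ["erp.orders"],
}


def upstream(artifact: str, lineage: dict) -> set:
    """Return every artifact that directly or indirectly feeds `artifact`."""
    seen = set()
    queue = deque(lineage.get(artifact, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guards against cycles in badly captured lineage
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen
```

Running the same traversal over the reversed mapping (source to consumers) gives impact analysis, i.e., which reports are affected if a source such as `erp.orders` changes.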

9. Information Policy Management — Most business glossary vendors also support the management of information policies. In addition, Governance, Risk, and Compliance (GRC) platforms such as EMC RSA Archer GRC and IBM OpenPages GRC also offer capabilities to manage broader policies, including those relating to information.

10. Master Data Management — Master Data Management (MDM) refers to the discipline associated with establishing a single version of the truth for critical data domains such as customer, vendor, product, location, asset, employee, and chart of accounts. MDM vendors include IBM, Informatica, Oracle, Orchestra Networks, Riversand, SAP, SAS, Semarchy, Stibo Systems, and Talend.
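Establishing a single version of the truth typically involves matching records that describe the same entity and merging them into a "golden record" according to survivorship rules. This is a minimal sketch, assuming a hypothetical deterministic match key (normalized email) and a hypothetical survivorship rule of "the most recently updated non-empty value wins"; commercial MDM hubs generally use probabilistic matching and configurable rules per attribute.

```python
from collections import defaultdict


def match_key(record: dict) -> str:
    # Hypothetical match rule: records with the same normalized email describe the same customer.
    return record["email"].strip().lower()


def golden_records(records: list) -> list:
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    result = []
    for key, members in groups.items():
        # Survivorship: newest record first (ISO-8601 date strings sort chronologically),
        # so its non-empty values win.
        members.sort(key=lambda r: r["updated"], reverse=True)
        golden = {"email": key, "sources": sorted(r["source"] for r in members)}
        for attr in ("name", "phone", "city"):
            golden[attr] = next((r[attr] for r in members if r.get(attr)), None)
        result.append(golden)
    return result
```

Recording the contributing `sources` on each golden record is a small nod to traceability, so stewards can see which systems were consolidated.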

11. Reference Data Management — Reference data is relatively static and may be placed in lookup tables for reference by other applications. Reference data is sometimes referred to as code tables, code lists, code sets, or lists of values. Examples of reference data include country codes, state codes, and province codes. Tools include Collibra Data Governance Center, IBM InfoSphere Master Data Management Reference Data Management Hub, Oracle Data Relationship Management, and Orchestra Networks EBX.
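A common reference data task is mapping between code sets used by different systems. This is a minimal sketch, assuming one system stores ISO 3166-1 alpha-2 country codes and another expects alpha-3 codes; the crosswalk is a small excerpt of the standard, not the full list.

```python
# Excerpt of the ISO 3166-1 crosswalk (alpha-2 -> alpha-3); a reference data hub
# would manage the full, versioned code set.
ALPHA2_TO_ALPHA3 = {"US": "USA", "DE": "DEU", "GB": "GBR", "FR": "FRA", "IN": "IND"}


def to_alpha3(code: str) -> str:
    """Translate an alpha-2 code, rejecting anything outside the governed code set."""
    normalized = code.strip().upper()
    try:
        return ALPHA2_TO_ALPHA3[normalized]
    except KeyError:
        raise ValueError(f"{code!r} is not a governed country code") from None


def invalid_codes(values: list) -> list:
    # Values a steward should review before the data is loaded downstream.
    return [v for v in values if v.strip().upper() not in ALPHA2_TO_ALPHA3]
```

For instance, `invalid_codes(["US", "UK", "fr"])` flags `"UK"`, because the ISO 3166-1 alpha-2 code for the United Kingdom is `GB`; this is the kind of inconsistency that managing reference data centrally is meant to catch.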

12. Data Warehouses and Data Marts — Organizations have large investments in data warehouses and data marts that might be based on the following:

* Relational databases such as Oracle Database 12c and IBM DB2

* Columnar databases such as SAP Sybase IQ and HP Vertica, which are geared toward big data analytics

* Data warehousing appliances such as Oracle Exalytics In-Memory Machine, IBM PureData System for Analytics, SAP HANA, EMC Pivotal Greenplum, and Teradata Aster

13. Analytics and Reporting — At the end of the day, organizations need to analyze their data to make business decisions. A number of open source and proprietary tools can support big data analytics and reporting. These tools include SAS Business Intelligence, SAS Analytics, IBM Cognos, IBM SPSS, SAP BusinessObjects, Tableau, QlikView, R, and Pentaho.

14. Business Process Management — Business process management (BPM) is a holistic management approach to aligning an organization's business processes with the wants and needs of clients. BPM tools include IBM Business Process Manager and Pega Business Process Management. In addition, the Eclipse open-source framework includes a plug-in for Business Process Model and Notation (BPMN).

15. Data Security and Privacy — This category includes a number of subcategories, including data masking, data tokenization, database encryption, and database monitoring. Offerings include IBM InfoSphere Guardium, IBM InfoSphere Optim Data Privacy, Imperva, Informatica Data Masking, and Protegrity.
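Masking and tokenization, two of the subcategories listed above, differ in whether the original value can be recovered. This is a minimal sketch, assuming card numbers arrive as strings: masking irreversibly hides all but the last four digits, while the hypothetical `TokenVault` class swaps the value for a random token and keeps the mapping so that authorized processes can detokenize. Production tokenization products add key management, access control, and often format-preserving tokens.

```python
import secrets


def mask_card_number(pan: str) -> str:
    # Irreversible: keep only the last four digits, replace the rest with '*'.
    digits = [c for c in pan if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


class TokenVault:
    """Hypothetical in-memory token vault; a real vault is a hardened, audited service."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]  # same input always yields the same token
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing about the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Reversible only for callers with access to the vault.
        return self._token_to_value[token]
```

Returning a stable token for repeated values lets downstream systems still join on the tokenized column without ever seeing the real card number.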

16. Information Lifecycle Management — Information lifecycle management (ILM) is a process and methodology for managing information through its lifecycle, from creation through disposal, including compliance with legal, regulatory, and privacy requirements. ILM programs should enable the efficient disposition of information at the end of its usefulness to the business, and in accordance with legal and regulatory obligations. ILM includes a number of sub-categories, including information archiving, records and retention management, legal holds and evidence collection (eDiscovery), and test data management. Vendors include Symantec, IBM, Informatica, EMC, HP, and OpenText.
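Defensible disposition combines a retention schedule with any active legal holds. This is a minimal sketch, assuming a hypothetical retention schedule keyed by record class and a simple `legal_hold` flag; actual retention periods must come from the organization's legal and regulatory obligations, not from code.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule, in years, per record class.
RETENTION_YEARS = {"invoice": 7, "marketing_email": 2}


@dataclass
class Record:
    record_id: str
    record_class: str
    created: date
    legal_hold: bool = False


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def eligible_for_disposal(record: Record, today: date) -> bool:
    if record.legal_hold:
        return False  # a legal hold always overrides the retention schedule
    years = RETENTION_YEARS.get(record.record_class)
    if years is None:
        return False  # unclassified records are kept until someone classifies them
    return today >= add_years(record.created, years)
```

The conservative defaults (keep anything on hold or unclassified) reflect the principle that disposition must follow legal and regulatory obligations rather than simply age.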

17. Hadoop and NoSQL — Apache Hadoop is an open source software library that supports the distributed processing of large data sets across thousands of computers based on commodity hardware. The Apache Hadoop project grew out of pioneering work at Yahoo! and Google, where researchers worked with huge volumes of data across large clusters of computers. As with other open source software, Apache Hadoop does not come with product support for things like bug fixes. To address these shortcomings, a number of vendors have released their own distributions of Hadoop, which have undergone release testing. These vendors bundle product support and offer training for an additional fee. Most enterprises that have deployed Hadoop for commercial use have selected one of the Hadoop distributions. Offerings include Cloudera, MapR, Hortonworks, IBM InfoSphere BigInsights, Amazon Elastic MapReduce, and EMC Pivotal Greenplum HD.
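Hadoop's original processing layer is built on the MapReduce programming model: a map step emits key/value pairs, the framework groups (shuffles) them by key, and a reduce step aggregates each group. The sketch below simulates those three phases in a single Python process using the classic word-count example. It does not use any Hadoop API and ignores distribution across a cluster, fault tolerance, and HDFS storage.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input split.
    for word in document.lower().split():
        yield word.strip(".,;:!?"), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    # Shuffle: group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        if key:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    # Reduce: aggregate each key's values independently, which is what makes it parallelizable.
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: list) -> dict:
    all_pairs = (pair for split in splits for pair in map_phase(split))
    return reduce_phase(shuffle(all_pairs))
```

On a real cluster, each split would be mapped on the node that stores it and each key's group reduced on a separate worker; the single-process version only shows the shape of the computation.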


(Continues...)
Excerpted from Data Governance Tools by Sunil Soares. Copyright © 2014 Sunil Soares. Excerpted by permission of MC Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
