Big Data Analytics: Disruptive Technologies for Changing the Game - Softcover

Sathi, Arvind

 

Synopsis

Bringing a practitioner's view to big data analytics, this work examines the drivers behind big data, postulates a set of use cases, identifies sets of solution components, and recommends various implementation approaches. This work also addresses and thoroughly answers key questions on this emerging topic, including: What is big data, and how is it being used? How can strategic plans for big data analytics be generated? And how does big data change analytics architecture? The author, who has more than 20 years of experience in information management architecture and delivery, has drawn the material from a large breadth of workshops and interviews with business and information technology leaders, providing readers with the latest in evolutionary, revolutionary, and hybrid methodologies of moving forward to the brave new world of big data.

The synopsis may refer to a different edition of this title.

About the Author

Dr. Arvind Sathi is the worldwide communication sector lead architect for the information agenda team at IBM. He received a PhD in Business Administration from Carnegie Mellon University and is the author of Customer Experience Analytics. He lives in Englewood, Colorado.

Excerpt. © Reprinted by permission. All rights reserved.

Big Data Analytics

Disruptive Technologies for Changing the Game

By Arvind Sathi

MC Press

Copyright © 2012 IBM Corporation
All rights reserved.
ISBN: 978-1-58347-380-1

Contents

Foreword by Bob Keseley
Foreword by Jeff Jonas
Chapter 1: Introduction
Chapter 2: Drivers for Big Data
Chapter 3: Big Data Analytics Applications
Chapter 4: Architecture Components
Chapter 5: Advanced Analytics Platform
Chapter 6: Implementation of Big Data Analytics
Chapter 7: Closing Thoughts
Notes
Abbreviations


CHAPTER 1

Introduction

Big Data Analytics is a popular topic. While everyone has heard stories of new Silicon Valley valuation bubbles and critical shortages of data scientists, there are an equal number of concerns: Will it take away my current investment in Business Intelligence or replace my organization? How do I integrate my Data Warehouse and Business Intelligence with Big Data? How do I get started, so I can show some results? What are the skills required? What happens to data governance? How do we deal with data privacy?

Over the past 9 to 12 months, I have conducted many workshops with practitioners in this field. I am always fascinated by the two views that so often clash in the same room — the bright-eyed explorers ready to share their data and the worriers identifying ways this can lead to trouble. A similar divide exists among consumers. As in any new field, implementation of Big Data requires a delicate balance between the two views and a robust architecture that can accommodate divergent concerns.

Unlike many other Big Data Analytics blogs and books that cover the basics and technological underpinnings, this book takes a practitioner's viewpoint. It identifies the use cases for Big Data Analytics, its engineering components, and how Big Data is integrated with business processes and systems. In doing so, it respects the large investments in Data Warehouse and Business Intelligence and shows both evolutionary and revolutionary — as well as hybrid — ways of moving forward to the brave new world of Big Data. It deliberates on serious topics of data privacy and corporate governance and how we must take care in the implementation of Big Data programs to safeguard our data, our customers' privacy, and our products.

So, what is Big Data? There are two common sources of data grouped under the banner of Big Data. First, we have a fair amount of data within the corporation that, thanks to automation and access, is increasingly shared. This includes emails, mainframe logs, blogs, Adobe PDF documents, business process events, and any other structured, unstructured, or semi-structured data available inside the organization. Second, we are seeing a lot more data outside the organization — some available publicly free of cost, some based on paid subscription, and the rest available selectively for specific business partners or customers. This includes information available on social media sites, product literature freely distributed by competitors, corporate customers' organization hierarchies, helpful hints available from third parties, and customer complaints posted on regulatory sites.

Many organizations are trying to incentivize customers to create new data. For example, Foursquare (www.foursquare.com) encourages me to document my visits to a set of businesses advertised through Foursquare. It provides me with points for each visit and rewards me with the "Mayor" title if I am the most frequent visitor to a specific business location. For example, every time I visit Tokyo Joe's — my favorite nearby sushi place — I let Foursquare know about my visit and collect award points. Presumably, Foursquare, Tokyo Joe's, and all the competing sushi restaurants can use this information to attract my attention at the next meal opportunity.

Sunil Soares has identified five types of Big Data: web and social media, machine-to-machine (M2M), big transaction data, biometrics, and human generated. Here are some examples of Big Data that I will use in this book:

• Social media text

• Cell phone locations

• Channel click information from set-top box

• Web browsing and search

• Product manuals

• Communications network events

• Call detail records (CDRs)

• Radio Frequency Identification (RFID) tags

• Maps

• Traffic patterns

• Weather data

• Mainframe logs

Why is Big Data different from any other data that we have dealt with in the past? There are "four V's" that characterize this data: Volume, Velocity, Variety, and Veracity. Some analysts have added other V's to this list, but for the purpose of this book, I will focus on the four V's described here.


1.1 Volume

Most organizations were already struggling with the increasing size of their databases when the Big Data tsunami hit the data stores. According to Fortune magazine, we created 5 exabytes of digital data in all of recorded history through 2003. In 2011, the same amount of data was created in just two days. By 2013, that time period is expected to shrink to just 10 minutes.

A decade ago, organizations typically counted their data storage for analytics infrastructure in terabytes. They have now graduated to applications requiring storage in petabytes. This data is straining the analytics infrastructure in a number of industries. For a communications service provider (CSP) with 100 million customers, the daily location data could amount to about 50 terabytes, which, if stored for 100 days, would occupy about 5 petabytes. In my discussions with one cable company, I learned that they discard most of their network data at the end of the day because they lack the capacity to store it. However, regulators have asked most CSPs and cable operators to store call detail records and associated usage data. For a 100-million-subscriber CSP, the CDRs could easily exceed 5 billion records a day. As of 2010, AT&T had 193 trillion CDRs in its database.
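The storage figures above can be sanity-checked with a short back-of-envelope sketch. The numbers (50 terabytes per day, 100 days of retention) come from the text; the decimal unit definitions are standard but assumed here:

```python
# Back-of-envelope check of the CSP location-data figures quoted above.
# Uses decimal (SI) units: 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.

TB = 10**12
PB = 10**15

daily_location_data = 50 * TB   # per day, for a 100-million-customer CSP
retention_days = 100

total_bytes = daily_location_data * retention_days
print(total_bytes / PB)          # → 5.0 (petabytes), matching the text
```

The same arithmetic explains why a 5-billion-CDR daily load quickly becomes untenable for conventional warehouse storage.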


1.2 Velocity

There are two aspects to velocity, one representing the throughput of data and the other representing latency. Let us start with throughput, which represents the data moving in the pipes. The amount of global mobile data is growing at a 78 percent compounded growth rate and is expected to reach 10.8 exabytes per month in 2016 as consumers share more pictures and videos. To analyze this data, the corporate analytics infrastructure is seeking bigger pipes and massively parallel processing.
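The quoted growth figures imply roughly a 17-fold increase over five years. A minimal sketch, using the 78 percent compounded rate and the 10.8 exabytes/month 2016 figure from the text (the 2011 baseline below is back-calculated, not a figure from the text):

```python
# Compound-growth view of the mobile-data figures quoted above.
cagr = 0.78                # 78% compounded annual growth rate
target_2016 = 10.8         # exabytes per month, per the text
years = 5                  # 2011 -> 2016

# Back-calculate the implied 2011 baseline (an illustration, not a source figure).
baseline_2011 = target_2016 / (1 + cagr) ** years
print(round(baseline_2011, 2))   # → 0.6 exabytes/month implied for 2011
```

A 1.78x multiplier per year compounds to about 17.9x over five years, which is why throughput-oriented infrastructure (bigger pipes, massively parallel processing) rather than incremental scaling is the response the text describes.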

Latency is the other measure of velocity. Analytics used to be a "store and report" environment, where reporting typically contained data as of yesterday — popularly represented as "D-1." Now, analytics is increasingly being embedded in business processes using data-in-motion with reduced latency. For example, Turn (www.turn.com) conducts its analytics in 10 milliseconds to place advertisements in online advertising platforms.


1.3 Variety

In the 1990s, as Data Warehouse technology was rapidly introduced, the initial push was to create meta-models to represent all the data in one standard format. The data was compiled from a variety of sources and transformed using ETL (Extract, Transform, Load) or ELT (Extract the data and Load it in the warehouse, then Transform it inside the warehouse). The basic premise was narrow variety and structured content. Big Data has significantly expanded our horizons,...

"About this title" may refer to a different edition of this title.

Other popular editions of the same title

9781583476208: Big Data Analytics: Disruptive Technologies for Changing the Game

Featured edition

ISBN 10: 1583476202  ISBN 13: 9781583476208
Softcover