Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig
About This Book
- Quickly understand how to use Pig to design end-to-end Big Data systems
- Implement a hands-on programming approach using design patterns to solve commonly occurring enterprise Big Data challenges
- Enhance your ability to use Pig and create your own design patterns wherever applicable
Who This Book Is For
This book is for experienced developers who are already familiar with Pig and are looking for a use-case perspective on the problems of data ingestion, profiling, cleansing, transformation, and egression encountered in enterprises. Knowledge of Hadoop and Pig is necessary to grasp the intricacies of Pig design patterns.
What You Will Learn
- Understand Pig's relevance in an enterprise context
- Use Pig in design patterns that enable data movement across platforms during and after analytical processing
- See how Pig can co-exist with other components of the Hadoop ecosystem to create Big Data solutions using design patterns
- Simplify the process of creating complex data pipelines using transformations, aggregations, enrichment, cleansing, filtering, reformatting, lookups, and data type conversions
- Apply knowledge of Pig in design patterns that deal with integration of Hadoop with other systems to enable multi-platform analytics
- Comprehend design patterns and use Pig in cases related to complex analysis of purely structured data
In Detail
Pig Design Patterns is a comprehensive guide that will enable readers to readily use design patterns that simplify the creation of complex data pipelines in various stages of data management. This book focuses on using Pig in an enterprise context, bridging the gap between theoretical understanding and practical implementation. Each chapter contains a set of design patterns that pose and then solve technical challenges that are relevant to the enterprise use cases.
The book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, in the form of a report or a predictive model. By the end of the book, readers will appreciate Pig's real power in addressing each and every problem encountered when creating an analytics-based data product. Each design pattern comes with a suggested solution, an analysis of the trade-offs of implementing the solution in a different way, an explanation of how the code works, and the results.
Pradeep Pasupuleti has over 16 years of experience in architecting and developing distributed and real-time data-driven systems. Currently, his focus is on developing robust data platforms and data products that are fuelled by scalable machine-learning algorithms, and delivering value to customers by addressing business problems by juxtaposing his deep technical insights into Big Data technologies with future data management and analytical needs. He is extremely passionate about Big Data and believes that it will be the cradle of many innovations that will save humans their time, money, and lives. He has built solid data product teams with experience spanning every aspect of data science, thus successfully helping clients to build an end-to-end strategy around how their current data architecture can evolve into a hybrid pattern that is capable of supporting analytics in both batch and real time; all of this is done using the lambda architecture. He has created Centers of Excellence (CoEs) to provide quick wins with data products that analyze high-dimensional multistructured data using scalable natural language processing and deep learning techniques. He has performed roles in technology consulting advising Fortune 500 companies on their Big Data strategy, product management, systems architecture, social network analysis, negotiations, conflict resolution, chaos and nonlinear dynamics, international policy, high-performance computing, advanced statistical techniques, risk management, marketing, visualization of high-dimensional data, human-computer interaction, machine learning, information retrieval, and data mining. He has strong experience of working amid ambiguity to solve complex problems through innovation by bringing smart people together. His other interests include writing and reading poetry, enjoying the expressive delights of ghazals, spending time with kids discussing impossible inventions, and searching for archeological sites.
You can reach him at http://www.linkedin.com/in/pradeeppasupuleti and pasupuleti.pradeepkumar@gmail.com.