Book: Hadoop Operations
Over the past few years, there has been a fundamental shift in data storage, management,
and processing. Companies are storing more data from more sources in more
formats than ever before. This isn’t just about being a “data packrat” but rather building
products, features, and intelligence predicated on knowing more about the world
(where the world can be users, searches, machine logs, or whatever is relevant to an
organization). Organizations are finding new ways to better serve their constituents by using data that was previously believed to be of little value, or far too expensive to retain.
Sourcing and storing data is one half of the equation. Processing that data to
produce information is fundamental to the daily operations of every modern business.
If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.
- Get a high-level overview of HDFS and MapReduce: why they exist and how they work
- Plan a Hadoop deployment, from hardware and OS selection to network requirements
- Learn setup and configuration details with a list of critical properties
- Manage resources by sharing a cluster across multiple groups
- Get a runbook of the most common cluster maintenance tasks
- Monitor Hadoop clusters—and learn troubleshooting with the help of real-world war stories
- Use basic tools and techniques to handle backup and catastrophic failure