Book: Programming Hive
Programming Hive introduces Hive, an essential tool in the Hadoop ecosystem that
provides an SQL (Structured Query Language) dialect for querying data stored in the
Hadoop Distributed Filesystem (HDFS), other filesystems that integrate with Hadoop,
such as MapR-FS and Amazon’s S3, and databases like HBase (the Hadoop database).
Most data warehouse applications are implemented using relational databases that use
SQL as the query language. Hive lowers the barrier for moving these applications to
Hadoop. People who know SQL can learn Hive easily. Without Hive, these users must
learn new languages and tools to become productive again. Similarly, Hive makes it
easier for developers to port SQL-based applications to Hadoop than other tools do.
Without Hive, developers would face a daunting challenge when porting
their SQL applications to Hadoop.
Still, some aspects of Hive differ from other SQL-based environments.
Documentation for Hive users and Hadoop developers has been sparse. We decided
to write this book to fill that gap. We provide a pragmatic, comprehensive introduction
to Hive that is suitable for SQL experts, such as database designers and business analysts.
We also cover the in-depth technical details that Hadoop developers require for
tuning and customizing Hive.