Book: Data Clustering in C++: An Object-Oriented Approach (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
Publisher: CRC Press
Data clustering is a highly interdisciplinary field whose goal is to divide a
set of objects into homogeneous groups such that objects in the same group
are similar and objects in different groups are quite distinct. Thousands of
papers and a number of books on data clustering have been published over
the past 50 years. However, almost all papers and books focus on the theory
of data clustering. There are few books that teach people how to implement
data clustering algorithms.
This book was written for anyone who wants to implement data clustering
algorithms and for those who want to implement new data clustering algorithms
in a better way. Using object-oriented design and programming techniques,
I have exploited the commonalities of all data clustering algorithms
to create a flexible set of reusable classes that simplifies the implementation
of any data clustering algorithm. Readers can follow me through the development
of the base data clustering classes and several popular data clustering algorithms.
This book focuses on how to implement data clustering algorithms in an
object-oriented way. Other clustering topics, such as data pre-processing,
data visualization, cluster visualization, and cluster interpretation, are touched
on but not covered in detail. In this book, I use a direct and simple way to implement
data clustering algorithms so that readers can understand the methodology
easily. I also present the material in this book in a straightforward way. When
I introduce a class, I present and explain the class method by method rather
than presenting the whole implementation of the class at once.
Complete listings of classes, examples, unit test cases, and GNU configuration
files are included in the appendices of this book as well as on the
CD-ROM of the book. I have tested the code under Unix-like platforms (e.g.,
Ubuntu and Cygwin) and Microsoft Windows XP. The only requirements to
compile the code are a modern C++ compiler and the Boost C++ libraries.
This book is divided into three parts: Data Clustering and C++ Preliminaries,
A C++ Data Clustering Framework, and Data Clustering Algorithms.
The first part reviews some basic concepts of data clustering, the Unified
Modeling Language (UML), object-oriented programming in C++, and design patterns.
The second part develops the data clustering base classes. The third
part implements several popular data clustering algorithms. The content of
each chapter is described briefly below.