Book: Advances in K-means Clustering: A Data Mining Thinking (Springer Theses: Recognizing Outstanding Ph.D. Research)
The series “Springer Theses” brings together a selection of the very best Ph.D.
theses from around the world and across the physical sciences. Nominated and
endorsed by two recognized specialists, each published volume has been selected
for its scientific excellence and the high impact of its contents for the pertinent
field of research. For greater accessibility to non-specialists, the published versions
include an extended introduction, as well as a foreword by the student’s supervisor
explaining the special relevance of the work for the field. As a whole, the series
will provide a valuable resource both for newcomers to the research fields
described, and for other scientists seeking detailed background information on
special questions. Finally, it provides an accredited documentation of the valuable
contributions made by today’s younger generation of scientists.
Nearly everyone in data mining and business intelligence knows the K-means algorithm. But ever-emerging data with extremely complicated characteristics bring new challenges to this "old" algorithm. This book addresses these challenges and makes novel contributions: establishing theoretical frameworks for K-means distances and K-means-based consensus clustering, identifying the "dangerous" uniform effect and zero-value dilemma of K-means, adapting the right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. The book not only enriches clustering and optimization theory but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based won the "2010 National Excellent Doctoral Dissertation Award", the highest honor for Ph.D. theses in China, granted to no more than 100 theses per year.