Bisecting k-means

The bisecting k-means clustering algorithm combines k-means clustering with divisive hierarchy clustering.

The bisecting k-means clustering algorithm combines k-means clustering with divisive hierarchy clustering. With bisecting k-means, you get not only the clusters but also the hierarchical structure of the clusters of data points.

This hierarchy is more informative than the unstructured set of flat clusters returned by K-means. The hierarchy shows how the clustering results would look at every step of the process of bisecting clusters to find new clusters. The hierarchy of clusters makes it easier to decide the number of clusters in the data set.

Given a hierarchy of k clusters produced by bisecting k-means, you can easily calculate any prediction of the form: Assume the data contain only k' clusters, where k' is a number that is smaller than or equal to the k used to train the model.

For a complete example of how to use bisecting k-means to analyze a table in Vertica, see Clustering data hierarchically using bisecting k-means.