Text at top reads "k-means clustering: assign each observation to one of k clusters based on the nearest cluster centroid." The illustration below introduces the characters: "observations", shown as small, smiling gray blobs, and "cluster centroids", shown as larger green, pink and purple circles with a spiral pattern.

Header text: "(1) Specify the number of clusters (in this example, k = 3). Then imagine k cluster centroids are created." Illustrated smiling cluster centroids say "hi!", "hello." and "howdy y'all."

Header text: "(2) Those k centroids get randomly placed in your space." The illustration below shows the pink, green and purple centroids getting hurled into a space containing many small gray blobs representing observations. Text pointing to one observation reads "observations, not currently assigned to any cluster."

Header text: "(3) Each observation gets temporarily assigned to its closest centroid, e.g. by Euclidean distance." Each of the originally gray observations now has its color updated to match the centroid it is closest to. Additional text reads "Note: observations stay put, but their color updates to nearest centroid."
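
Step (3) could be sketched in plain Python roughly as follows. The function name `assign_to_nearest` and the sample points are hypothetical, made up only to illustrate the idea; they do not come from the illustration.

```python
import math

def assign_to_nearest(observations, centroids):
    """Return, for each observation, the index of its closest centroid."""
    labels = []
    for obs in observations:
        # Euclidean distance from this observation to every centroid.
        distances = [math.dist(obs, c) for c in centroids]
        # The observation stays put; only its label (its "color") changes.
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical 2-D observations and two centroids.
observations = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_to_nearest(observations, centroids)
print(labels)  # [0, 0, 1, 1]
```

Ties are broken here by taking the first closest centroid, which is one possible choice among several.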

Header text: "(4) Then the centroid of each cluster is calculated based on all observations assigned to that cluster..." The green, purple and pink centroids are moving to new locations at the center of their assigned observations. Additional text reads "Centroids move from original location to recalculated centroid."
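
A minimal sketch of step (4), assuming the recalculated centroid is the coordinate-wise mean of the assigned observations. The name `recalculate_centroids` is hypothetical, and leaving a centroid in place when no observations are assigned to it is an assumption of this sketch, not something the illustration specifies.

```python
def recalculate_centroids(observations, labels, old_centroids):
    """Move each centroid to the mean of the observations assigned to it."""
    new_centroids = []
    for index, old in enumerate(old_centroids):
        members = [obs for obs, label in zip(observations, labels) if label == index]
        if not members:
            # Assumption: a centroid with no observations stays where it is.
            new_centroids.append(old)
            continue
        dims = len(members[0])
        mean = tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        new_centroids.append(mean)
    return new_centroids

# Hypothetical observations with their current cluster labels.
observations = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
moved = recalculate_centroids(observations, labels, [(0.0, 0.0), (10.0, 10.0)])
print(moved)  # [(1.5, 2.0), (9.0, 9.0)]
```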

Header text: "...UH OH. Now that the cluster centroids have moved, some of the observations are now closer to a different centroid." Highlighted are some green observations, now looking very anxious, that are now closer to the purple centroid. Text points to these observations: "These 4 that were originally assigned to green are now closer to the purple centroid!"

Header text: "(5) No problem! Observations get reassigned to a different cluster based on the recalculated centroid." The four observations that were green, but closer to purple, are now reassigned to the purple cluster. They look happy again. Additional text reads "These 4 now assigned to the purple cluster!"

Header text: "(6) But now that observations have been reassigned, the centroids need to move again [recalculate centroids from updated clusters]." The green, purple, and pink centroids move slightly as they are recalculated. Stylized text reads "beep boop beep RECALCULATING CENTROIDS" in robotic looking letters.

Header text: "(7) Again, now observations are reassigned as needed to the closest centroid." Several observations' colors change as they are reassigned to the closest cluster centroid.

Header text: "...Then the centroid for each cluster is recalculated...which means observations will be reassigned." The green, purple and pink centroids move slightly as they are recalculated.

Text: "That iterative process of: Recalculate cluster centroids, then Reassign observations to nearest centroid, then Recalculate cluster centroids, then Reassign observations to the nearest centroid, etc...Continues until nothing is moving or being reassigned anymore!"
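
The whole loop could be sketched in plain Python like this. It is an illustrative sketch, not a reference implementation: the function name `k_means`, the `seed` and `max_iterations` parameters, the choice to place the starting centroids uniformly at random inside the bounding box of the data, and the sample points are all assumptions added here.

```python
import math
import random

def k_means(observations, k, seed=0, max_iterations=100):
    """Alternate between reassigning observations and recalculating centroids."""
    rng = random.Random(seed)
    dims = len(observations[0])
    # Assumption: random placement means uniform within the data's bounding box.
    lows = [min(obs[d] for obs in observations) for d in range(dims)]
    highs = [max(obs[d] for obs in observations) for d in range(dims)]
    centroids = [
        tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        for _ in range(k)
    ]
    labels = None
    for _ in range(max_iterations):
        # Reassign each observation to its nearest centroid.
        new_labels = []
        for obs in observations:
            distances = [math.dist(obs, c) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        # Stop once nothing is being reassigned anymore.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each centroid as the mean of its assigned observations.
        for i in range(k):
            members = [obs for obs, label in zip(observations, labels) if label == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dims)
                )
    return labels, centroids

# Hypothetical observations forming two loose groups.
observations = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(observations, k=2)
```

Because the starting positions are random, the final clusters can differ from one seed to another, so it can be worth running the sketch with several seeds and comparing the results.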

Header text: "Which means the iteration is done and each observation is assigned to its final cluster." Each smiling little observation blob now has a set color depending on the final cluster it was assigned to.
