
Enhanced K-means++: The Intelligent Evolution of K-means


Introduction to K-means Clustering

The K-means clustering algorithm is widely recognized among machine learning enthusiasts and is often one of the first algorithms introduced in academic courses.

In brief, K-means clustering is a vector quantization technique originating from signal processing. Its goal is to divide n observations into k clusters, assigning each observation to the cluster with the closest mean, which acts as that cluster's representative.
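For a quick sense of how this looks in practice, here is a minimal sketch using scikit-learn (the library call and the synthetic matrix X are illustrative assumptions, not part of this post's own implementation below):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic feature matrix: 400 two-dimensional observations around four centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 7.0], [7.0, -5.0], [2.0, -7.0]])
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])

# Partition the observations into k = 4 clusters
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the 4 cluster means (each cluster's representative)
print(km.labels_[:10])       # cluster index assigned to the first 10 observations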

Challenges with the Standard K-means Algorithm

However, this beloved algorithm has its drawbacks. K-means is particularly sensitive to how the initial centroids, or mean points, are selected. If a centroid is initialized far from the data, it may end up with no points assigned to it; conversely, several natural clusters can end up lumped under a single centroid.
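To see this sensitivity concretely, one can rerun plain K-means with a single random initialization under different seeds and compare the final within-cluster sum of squares (inertia); a poor initialization tends to show up as a noticeably larger inertia. This sketch reuses the X and the KMeans import from the example above and again assumes scikit-learn:

# Each run uses one random initialization; the resulting inertia can differ run to run
for seed in range(5):
    km = KMeans(n_clusters=4, init='random', n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.2f}")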

Enter K-means++

This is where K-means++ comes into play! It retains the core principles of K-means but incorporates a more intelligent method for initializing centroids, thereby enhancing the quality of the clustering process.
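Concretely, K-means++ keeps the first centroid random and then chooses each subsequent centroid with probability proportional to its squared distance from the nearest centroid already picked. A minimal NumPy sketch of that seeding rule (written for illustration here, not taken from a library) might look like this:

import numpy as np

def kmeanspp_seed(data, k, seed=0):
    rng = np.random.default_rng(seed)
    # First centroid: a uniformly random data point
    centroids = [data[rng.integers(data.shape[0])]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen centroid
        diffs = data[:, None, :] - np.array(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)
        # Sample the next centroid with probability proportional to d2
        probs = d2 / d2.sum()
        centroids.append(data[rng.choice(data.shape[0], p=probs)])
    return np.array(centroids)

The implementation below uses a simpler greedy variant that always picks the farthest point rather than sampling; it is easier to visualize step by step, though it differs slightly from the canonical probabilistic rule.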

Implementation of K-means++ Algorithm

Here’s a practical implementation of the K-means++ initialization step:

# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import sys

# Generating synthetic data: four Gaussian blobs of 100 points each
mean_01 = np.array([0.0, 0.0])
cov_01 = np.array([[1, 0.3], [0.3, 1]])
dist_01 = np.random.multivariate_normal(mean_01, cov_01, 100)

mean_02 = np.array([6.0, 7.0])
cov_02 = np.array([[1.5, 0.3], [0.3, 1]])
dist_02 = np.random.multivariate_normal(mean_02, cov_02, 100)

mean_03 = np.array([7.0, -5.0])
cov_03 = np.array([[1.2, 0.5], [0.5, 1.3]])
dist_03 = np.random.multivariate_normal(mean_03, cov_03, 100)

mean_04 = np.array([2.0, -7.0])
cov_04 = np.array([[1.2, 0.5], [0.5, 1.3]])
dist_04 = np.random.multivariate_normal(mean_04, cov_04, 100)

# Combining data
data = np.vstack((dist_01, dist_02, dist_03, dist_04))
np.random.shuffle(data)

# Function to visualize selected centroids
def plot(data, centroids):
    plt.scatter(data[:, 0], data[:, 1], marker='.', color='gray', label='Data Points')
    plt.scatter(centroids[:-1, 0], centroids[:-1, 1], color='black', label='Previously Selected Centroids')
    plt.scatter(centroids[-1, 0], centroids[-1, 1], color='red', label='Next Centroid')
    plt.title('Select %d-th Centroid' % (centroids.shape[0]))
    plt.legend()
    plt.xlim(-5, 12)
    plt.ylim(-10, 15)
    plt.show()

# Function to calculate squared Euclidean distance
def distance(p1, p2):
    return np.sum((p1 - p2) ** 2)

# K-means++ initialization: pick the first centroid at random, then repeatedly
# pick the point farthest from the centroids chosen so far (greedy variant)
def initialize(data, k):
    centroids = []
    centroids.append(data[np.random.randint(data.shape[0]), :])
    plot(data, np.array(centroids))

    for c_id in range(k - 1):
        # For every point, compute the distance to its nearest existing centroid
        dist = []
        for i in range(data.shape[0]):
            point = data[i, :]
            d = sys.maxsize
            for j in range(len(centroids)):
                temp_dist = distance(point, centroids[j])
                d = min(d, temp_dist)
            dist.append(d)
        dist = np.array(dist)

        # Choose the point with the maximum distance as the next centroid
        next_centroid = data[np.argmax(dist), :]
        centroids.append(next_centroid)
        plot(data, np.array(centroids))
    return centroids

centroids = initialize(data, k=4)

In this video titled "Implementing k-Means Clustering from Scratch," you will see a comprehensive guide on how to implement the K-means algorithm step-by-step, including the nuances of its mechanics.
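To carry the snippet above through to a full clustering, the remaining Lloyd iterations (assign each point to its nearest centroid, then recompute each centroid as the mean of its points) can be sketched as follows, reusing data, centroids, and the NumPy import from the code above. This continuation is an illustrative sketch, not taken from the video:

def kmeans(data, centroids, n_iters=100):
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every point
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (empty clusters are not handled in this sketch)
        new_centroids = np.array([data[labels == j].mean(axis=0) for j in range(len(centroids))])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

final_centroids, labels = kmeans(data, centroids)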

Exploring K-means Clustering Theory

To further understand the theory behind K-means and its enhanced version, K-means++, let's watch this insightful video.

In this second video, "k-means clustering theory algorithm implementation scaling," the discussion revolves around the theoretical foundations of the K-means algorithm, delving into its scalability and practical applications.
