K-Means clustering implementation in Python

Overview

This repository contains a Python implementation of the K-Means clustering algorithm. K-Means is a widely used unsupervised machine learning algorithm that partitions a dataset into a specified number (k) of clusters, assigning each data point to the cluster whose centroid (mean) is nearest. This implementation uses Python 3 and NumPy for the numerical operations.
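For reference, the standard formulation (Lloyd's algorithm) with squared Euclidean distance is written out here. This is an assumption: the README does not state which distance measure or variant kmeans.py uses. Given data points $x_1, \dots, x_n$ and centroids $\mu_1, \dots, \mu_k$, K-Means seeks to minimize

$$
J(\mu_1, \dots, \mu_k) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2
$$

Each iteration alternates an assignment step and an update step:

$$
c_i \leftarrow \arg\min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j \leftarrow \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i,
\qquad
C_j = \{\, i : c_i = j \,\}
$$

Neither step can increase $J$, so repeating them drives the result toward a (possibly local) minimum.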

Files and structure

  1. kmeans.py

This file contains the main implementation of the KMeansModel class, which encapsulates the K-Means clustering logic.

  2. utils.py

This file provides the helper functions used by the K-Means implementation, including normalization, resizing, and centroid generation.

  3. main.ipynb

A Jupyter Notebook demonstrating how to use the K-Means implementation. It serves as a visual guide to the clustering process.
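The README does not document the helpers in utils.py, so the following is only a guess at what normalization and centroid generation commonly look like. The names normalize and generate_centroids are hypothetical, min-max scaling and uniform sampling inside the data's bounding box are assumed choices, and resizing is not covered.

```python
import numpy as np


def normalize(X):
    """Hypothetical helper: min-max scale each column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns would divide by zero; use 1 so they map to 0 instead
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


def generate_centroids(X, k, rng=None):
    """Hypothetical helper: draw k random centroids inside the data's bounding box."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    low, high = X.min(axis=0), X.max(axis=0)
    # One uniform draw per centroid and per feature, shape (k, n_features)
    return rng.uniform(low, high, size=(k, X.shape[1]))


X = normalize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(X)
print(generate_centroids(X, k=2, rng=np.random.default_rng(0)))
```

Normalizing first matters for K-Means because distances are computed across all features at once, so a feature with a large numeric range would otherwise dominate the assignments.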

Example usage

from kmeans import KMeansModel
import pandas as pd

# Load your data into a pandas DataFrame (replace this with your own data)
data = pd.read_csv("your_data.csv")

# Instantiate the KMeansModel
model = KMeansModel()

# Cluster the data
centroids, clusters = model.cluster(data, n_clusters=4, n_iter=10)
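To experiment with the same call shape without the repository, here is a self-contained, hypothetical stand-in named KMeansSketch. It mirrors the cluster(data, n_clusters=..., n_iter=...) call and the (centroids, clusters) return value from the example, assuming clusters is one integer label per row. Initialization from distinct random data rows and squared Euclidean distance are assumptions; this is not the code in kmeans.py.

```python
import numpy as np


class KMeansSketch:
    """Hypothetical stand-in for KMeansModel (not the repository's code)."""

    def __init__(self, seed=0):
        # Seeded generator so repeated runs pick the same initial centroids
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _assign(X, centroids):
        # Squared distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Label each point with the index of its nearest centroid
        return dists.argmin(axis=1)

    def cluster(self, data, n_clusters, n_iter):
        # Accept a DataFrame or any 2-D array-like of numeric features
        X = np.asarray(data, dtype=float)
        # Initialize centroids as n_clusters distinct rows picked at random
        # (fancy indexing returns a copy, so X itself is never modified)
        idx = self.rng.choice(len(X), size=n_clusters, replace=False)
        centroids = X[idx]
        for _ in range(n_iter):
            clusters = self._assign(X, centroids)
            # Update step: move each centroid to the mean of its members
            for j in range(n_clusters):
                members = X[clusters == j]
                if len(members) > 0:  # keep the old centroid if a cluster is empty
                    centroids[j] = members.mean(axis=0)
        # Final assignment so the labels match the returned centroids
        return centroids, self._assign(X, centroids)


# Two well-separated groups of points
data = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                 [10.0, 10.0], [10.1, 10.1], [10.2, 10.0]])
centroids, clusters = KMeansSketch().cluster(data, n_clusters=2, n_iter=10)
print(centroids)
print(clusters)
```

Because np.asarray also accepts a pandas DataFrame, the sketch can be called with the data loaded via pd.read_csv as well, provided every column is numeric.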

Model parameters

  • n_clusters: The number of clusters the algorithm partitions the dataset into.
  • n_iter: The number of iterations to run, where each iteration updates the cluster assignments and then the centroids.

n_clusters controls the granularity of the clustering, and n_iter controls how far the algorithm progresses toward convergence: too few iterations can stop before the centroids settle, while extra iterations cost run time. Trying several values is a practical way to balance clustering quality against efficiency.
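One common way to compare results across different n_clusters values is inertia, the within-cluster sum of squared distances. The helper below is a hypothetical sketch, not part of this repository, and assumes clusters holds one integer label per row that indexes into centroids.

```python
import numpy as np


def inertia(X, centroids, clusters):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    clusters = np.asarray(clusters, dtype=int)
    # centroids[clusters] pairs every point with the centroid it is assigned to
    return float(((X - centroids[clusters]) ** 2).sum())


X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(inertia(X, [[0.0, 1.0], [10.0, 1.0]], [0, 0, 1, 1]))  # 4.0
print(inertia(X, [[5.0, 1.0]], [0, 0, 0, 0]))  # 104.0: one cluster fits far worse
```

Inertia always shrinks as n_clusters grows, so the lowest value is not automatically the best. A usual heuristic is to compute it for n_clusters = 1, 2, 3, ... and pick the value where the curve stops dropping sharply (the "elbow").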