Added "Iris Dataset classification" under supervised learning/KNN/ #103

Merged 8 commits on Oct 5, 2024
# Iris Dataset Classification Using K-Nearest Neighbors (KNN)

This project is designed to help beginners understand how the K-Nearest Neighbors (KNN) algorithm works by applying it to the famous Iris dataset. The Iris dataset is often used for learning machine learning algorithms because of its simplicity and well-defined structure.

## Table of Contents
- [Overview](#overview)
- [Dataset](#dataset)
- [Algorithm](#algorithm)
- [Installation](#installation)
- [Usage](#usage)

---

## Overview
In this project, we classify samples from the Iris dataset into one of three species: **Setosa**, **Versicolor**, and **Virginica**, using K-Nearest Neighbors (KNN), a simple and effective machine learning algorithm. We explore the dataset, preprocess the data, and evaluate the model's performance.

---

## Dataset
The Iris dataset consists of 150 samples, each having four features:
- **Sepal Length**
- **Sepal Width**
- **Petal Length**
- **Petal Width**

Each sample belongs to one of three classes:
1. Setosa
2. Versicolor
3. Virginica

You can load the dataset directly from the scikit-learn library.
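As a quick sketch, loading the dataset takes only a couple of lines (assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 features each

print(iris.feature_names)  # sepal/petal length and width (cm)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(X.shape)             # (150, 4)
```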

---

## Algorithm
We use the **K-Nearest Neighbors (KNN)** algorithm, which classifies a sample based on the majority class among its K nearest neighbors.

### Steps:
1. **Data Loading**: Load the Iris dataset and inspect the structure.
2. **Data Preprocessing**: Split the data into training and testing sets.
3. **Model Training**: Apply KNN to the training data.
4. **Model Evaluation**: Use accuracy, precision, recall, and F1-score to evaluate the model.
5. **Hyperparameter Tuning**: Optimize the value of K to improve classification accuracy.
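The steps above can be sketched end-to-end as follows. This is a minimal illustration, not the notebook's exact code; the 80/20 split and `random_state=42` are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1-2. Load the data and split it into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3. Train KNN on the training data
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# 4. Evaluate with accuracy, precision, recall, and F1-score
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```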

---

## Installation
To get started with the project, follow these steps:

1. **Clone the repository** (this project lives inside the `ML-Nexus` repository, so clone the repository root and change into the project directory):
```bash
git clone https://github.com/UppuluriKalyani/ML-Nexus.git
cd "ML-Nexus/Supervised Learning/K Nearest Neighbors/Iris Dataset Classification"
```
2. **Install dependencies**:
Ensure you have Python 3 installed. Then, install the necessary Python libraries using pip:
```bash
pip install -r requirements.txt
```

## Usage
After installing the required dependencies, open and run the notebook `irisClassifier.ipynb` to see the results.

- Change the value of K in the code to see how it affects the classification accuracy:

```python
knn = KNeighborsClassifier(n_neighbors=3)  # Change 3 to any value of K you want to try
```


# Results for Iris Dataset Classification Using K-Nearest Neighbors (KNN)

In this section, we present the results of applying the K-Nearest Neighbors (KNN) algorithm on the Iris dataset. The following visualizations help us understand how well the model performs and how the data is distributed.

---

## 1. Scatter Plot of All Data Points

The first plot shows all the data points in the Iris dataset with the following features:
- **Sepal Length**
- **Sepal Width**


![All Points Scatter Plot](assets/images/1.png)

---

## 2. Category-wise Scatter Plots

Each point is color-coded according to its species (Setosa, Versicolor, Virginica).


![CategoryWise Scatter Plot](assets/images/2.png)

---

## 3. Decision Boundary Plot

The following plot shows the decision boundaries created by the KNN algorithm. This helps us visualize the regions where the algorithm classifies a new data point into a specific class.

![Decision Boundary Plot](assets/images/3.png)
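A plot like this can be produced by predicting the class of every point on a dense grid over the two sepal features and shading the regions. This is a generic matplotlib sketch, not necessarily the notebook's exact code; the grid step and output filename are arbitrary:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, renders straight to file
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Train on sepal length and sepal width only, so the boundary is 2-D
X, y = load_iris(return_X_y=True)
X2 = X[:, :2]
knn = KNeighborsClassifier(n_neighbors=5).fit(X2, y)

# Predict the class for every point on a dense grid over the feature space
xx, yy = np.meshgrid(
    np.arange(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 0.02),
    np.arange(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 0.02))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)                   # shaded class regions
plt.scatter(X2[:, 0], X2[:, 1], c=y, edgecolor="k")  # the training points
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Sepal Width (cm)")
plt.savefig("decision_boundary.png")
```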

---

## Summary of Results
![Summary Plot](assets/images/4.png)

- The **scatter plots** show a clear separation between the species, especially between Setosa and the other two classes.
- The **decision boundary plot** shows how the KNN algorithm divides the feature space into different regions based on the nearest neighbors.

By tuning the value of **K**, we can adjust the smoothness of the decision boundaries. In this example, we used `K=5`, which provides a good balance between underfitting and overfitting.
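One common way to choose K is to compare cross-validated accuracy across several values. This is a sketch using scikit-learn's `cross_val_score`, not the notebook's exact tuning code; the range of K values and the 5-fold setting are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for a range of odd K values
for k in range(1, 16, 2):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k:2d}  mean accuracy={scores.mean():.3f}")
```

Odd values of K avoid ties in the majority vote when there are two classes among the neighbors.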
