Parameter Optimization of SVM Using Dry Bean Dataset

This repository contains code for tuning the parameters of a Support Vector Machine (SVM) classifier on the Dry Bean Dataset. The task is multi-class classification with 16 features and 7 classes. The goal of this project is to find the SVM parameters that give the highest classification accuracy.

Dataset

The Dry Bean Dataset consists of 13611 instances with 17 attributes: 16 numeric feature attributes and 1 class label. The dataset is publicly available from the UCI Machine Learning Repository. The features describe the geometric shape, form, and structure of the beans, properties that matter when selecting high-quality beans. The dataset is preprocessed and ready to be used for classification tasks.
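
A quick way to confirm these counts after loading the data is sketched below. The helper name `summarize_dataset` and the toy frame are illustrative assumptions, not code from the notebook; only the `Class` column name comes from the attribute list.

```python
import pandas as pd


def summarize_dataset(df, label_column="Class"):
    """Return instance count, feature count, and per-class counts."""
    features = df.drop(columns=[label_column])
    return {
        "n_instances": len(df),
        "n_features": features.shape[1],
        "class_counts": df[label_column].value_counts().to_dict(),
    }


# Toy stand-in with two of the feature columns; the values are made up.
toy = pd.DataFrame(
    {
        "Area": [28395, 28734, 70000, 45000],
        "Perimeter": [610.3, 638.0, 1000.5, 800.2],
        "Class": ["SEKER", "SEKER", "BOMBAY", "CALI"],
    }
)
summary = summarize_dataset(toy)
print(summary)
```

Applied to the full dataset, the same call should report 13611 instances, 16 features, and 7 classes if the file matches the description above.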

1. Number of Instances: 13611

2. Number of Attributes: 17 (16 features and 1 class label)

3. Attribute Information:

1.) Area (A): The area of a bean zone, measured as the number of pixels within its boundaries.
2.) Perimeter (P): The bean circumference, defined as the length of its border.
3.) Major axis length (L): The distance between the ends of the longest line that can be drawn across a bean.
4.) Minor axis length (l): The longest line that can be drawn across the bean perpendicular to the major axis.
5.) Aspect ratio (K): Defines the relationship between L and l.
6.) Eccentricity (Ec): Eccentricity of the ellipse having the same moments as the region.
7.) Convex area (C): Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
8.) Equivalent diameter (Ed): The diameter of a circle having the same area as a bean seed area.
9.) Extent (Ex): The ratio of the pixels in the bounding box to the bean area.
10.) Solidity (S): Also known as convexity. The ratio of the pixels in the convex shell to those found in the bean.
11.) Roundness (R): Calculated with the following formula: (4πA)/(P^2)
12.) Compactness (CO): Measures the roundness of an object: Ed/L
13.) ShapeFactor1 (SF1)
14.) ShapeFactor2 (SF2)
15.) ShapeFactor3 (SF3)
16.) ShapeFactor4 (SF4)
17.) Class (Seker, Barbunya, Bombay, Cali, Dermason, Horoz and Sira)
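
Several of the attributes above are derived from the basic measurements, which the following minimal sketch illustrates. It assumes the aspect ratio is L/l and takes Ed from the equal-area circle definition; the function name `shape_descriptors` is hypothetical and not part of the notebook.

```python
import math


def shape_descriptors(area, perimeter, major_axis, minor_axis):
    """Derive a few of the listed shape features from A, P, L, and l."""
    # Ed: diameter of the circle whose area equals the bean area.
    equivalent_diameter = math.sqrt(4.0 * area / math.pi)
    return {
        "aspect_ratio": major_axis / minor_axis,  # K = L / l (assumed)
        "equivalent_diameter": equivalent_diameter,
        "roundness": 4.0 * math.pi * area / perimeter**2,  # R = 4*pi*A / P^2
        "compactness": equivalent_diameter / major_axis,  # CO = Ed / L
    }


# Sanity check on a perfect circle of radius 10, where the aspect ratio,
# roundness, and compactness should all come out as 1 (up to rounding).
r = 10.0
circle = shape_descriptors(
    area=math.pi * r**2,
    perimeter=2.0 * math.pi * r,
    major_axis=2.0 * r,
    minor_axis=2.0 * r,
)
print(circle)
```

Elongated beans move away from the circle case: roundness and compactness fall below 1 while the aspect ratio rises above 1.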

Requirements

To run the code in this repository, you will need to have the following libraries installed:

scikit-learn
pandas
numpy
matplotlib
seaborn

Usage

The repository contains a Jupyter notebook (svm_drybean.ipynb) that walks through parameter optimization for SVM using the Dry Bean Dataset. The notebook includes detailed explanations and comments for each step of the process.
To run the notebook, open it in Jupyter and run the cells in order. You can also run it in Google Colab or a similar environment.

Results

The results of the parameter optimization process are summarized in the notebook. The best parameters for the SVM model were found using GridSearchCV and resulted in an accuracy of 97.8%.
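
For readers unfamiliar with GridSearchCV, a minimal sketch of this kind of search follows. It is not the notebook's code: the parameter grid, the scaling step, the split sizes, and the synthetic stand-in data (16 features, 7 classes) are all assumptions chosen so that the example runs without the dataset file.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the task (16 features, 7 classes).
X, y = make_classification(
    n_samples=700,
    n_features=16,
    n_informative=10,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own
# training portion only.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; the values searched in the notebook may differ.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The accuracy printed here reflects the synthetic data only and is not comparable to the figure reported for the Dry Bean Dataset.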

Scatter Plot

Histogram

Box Plot

Accuracy Plot
