Explore my diverse collection of projects showcasing machine learning, data analysis, and more. The repository is organized by project, and each directory contains code, datasets, documentation, and resources.
Welcome to my Data Science and Machine Learning Projects Repository! This repository contains a collection of my data science projects, showcasing my skills and expertise in the field. Each project demonstrates different aspects of data analysis, machine learning, and visualization.
Breast Cancer Diagnosis Prediction
- Description: The project predicts the diagnosis of breast tumors (M = malignant, B = benign).
- Technologies Used: The notebooks use Decision Tree Classification and Logistic Regression.
- Results: Logistic regression achieved 97% accuracy and the decision tree achieved 93.5% accuracy.
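A minimal sketch of the two-model comparison, assuming scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data stands in for the project's dataset (the notebook's actual CSV, split, and hyperparameters may differ, so the printed accuracies will not necessarily match the figures above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled breast cancer data (target 0 = malignant,
# 1 = benign) is used in place of the project's own file.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Standardizing the features helps the logistic regression solver converge.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Stratifying the split keeps the malignant/benign ratio the same in the training and test sets, which makes the two accuracies directly comparable.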
E-Commerce Product Delivery Prediction
- Description: The aim of this project is to predict whether products from an international e-commerce company will reach customers on time or not. Additionally, the project analyzes various factors influencing product delivery and studies customer behavior. The company primarily sells electronic products.
- Technologies Used: The notebooks use Exploratory Data Analysis, Decision Tree Classifier, Random Forest Classifier, K-Nearest Neighbors, and Logistic Regression.
- Results: The decision tree classifier has the highest accuracy of all the models, at 69%. The random forest classifier and logistic regression reached 68% and 67% accuracy, respectively. K-Nearest Neighbors had the lowest accuracy, at 65%.
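The four-model comparison above can be sketched as follows. This is illustrative only: the shipping dataset is not reproduced here, so `make_classification` generates synthetic features, and the target encoding (1 = on time, 0 = late) and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the e-commerce delivery data (hypothetical).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Distance-based KNN and logistic regression get standardized inputs.
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

# fit() returns the fitted estimator; score() gives mean accuracy for classifiers.
results = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}

# Print models from most to least accurate.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Training every model on the same split is what makes a ranking like the one in the results meaningful; the ranking on synthetic data will not reproduce the project's.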
Diamond Price Prediction
- Description: The aim of this analysis is to predict the price of diamonds based on their characteristics. The dataset used for this analysis is the Diamonds dataset from Kaggle. The dataset contains 53940 observations and 10 variables.
- Technologies Used: The notebooks use Exploratory Data Analysis, Decision Tree Regressor, and Random Forest Regressor.
- Results: Both models achieve almost the same accuracy, although the Random Forest Regressor performs slightly better than the Decision Tree Regressor. The data also contains a notable anomaly: diamonds with J color and I1 clarity are priced higher than diamonds with D color and IF clarity, which the models could not explain. This may be due to other factors that affect a diamond's price.
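A minimal sketch of one common way to prepare this kind of data: encoding cut, color, and clarity by their quality rank (worst to best, as graded in the Diamonds dataset) before fitting both regressors. The rows and the price formula are synthetic and invented purely for illustration; the sketch reports R² (what a scikit-learn regressor's `score()` returns), on the assumption that this is what "accuracy" means for the regression models above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Quality grades ordered worst -> best, as in the Diamonds dataset.
CUT = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Synthetic rows with the same categorical columns (hypothetical data).
rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "carat": rng.uniform(0.2, 3.0, n),
    "cut": rng.choice(CUT, n),
    "color": rng.choice(COLOR, n),
    "clarity": rng.choice(CLARITY, n),
})

# Ordinal encoding: replace each grade with its rank so splits respect the order.
for col, order in [("cut", CUT), ("color", COLOR), ("clarity", CLARITY)]:
    df[col] = df[col].map({grade: rank for rank, grade in enumerate(order)})

# Invented price formula: carat dominates, better grades add a premium, plus noise.
df["price"] = (
    4000 * df["carat"] ** 1.5
    * (1 + 0.05 * df["color"] + 0.08 * df["clarity"] + 0.03 * df["cut"])
    + rng.normal(0, 200, n)
)

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

r2 = {}
for name, model in [
    ("Decision Tree Regressor", DecisionTreeRegressor(random_state=0)),
    ("Random Forest Regressor", RandomForestRegressor(n_estimators=200, random_state=0)),
]:
    # score() returns the coefficient of determination (R^2) for regressors.
    r2[name] = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: R^2 = {r2[name]:.3f}")
```

Because carat drives price so strongly in this synthetic formula, a larger low-grade stone can outprice a small high-grade one, which is one way a "worse" color and clarity can carry a higher price.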
Stroke Prediction
- Description: The project aims to predict the likelihood of a patient having a stroke based on input parameters such as gender, age, presence of diseases, and smoking status. The dataset provides relevant information about each patient, enabling the development of a predictive model.
- Technologies Used: The notebooks use Exploratory Data Analysis, Logistic Regression, Support Vector Machine (SVM), Decision Tree Classifier, and K-Nearest Neighbors (KNN).
- Results: Logistic Regression, SVM, and KNN reach nearly identical accuracy of about 93.8%, while the Decision Tree Classifier reaches 91.8%. Any of these models can therefore be used to predict stroke.
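A sketch of how mixed categorical and numeric patient fields might be preprocessed and fed to the four classifiers. The column names (`gender`, `age`, `hypertension`, `heart_disease`, `smoking_status`), the category values, and the rule that generates the `stroke` label are all hypothetical; the rows are synthetic, not the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical patient table with synthetic values.
rng = np.random.default_rng(1)
n = 1500
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], n),
    "age": rng.uniform(18, 90, n),
    "hypertension": rng.integers(0, 2, n),
    "heart_disease": rng.integers(0, 2, n),
    "smoking_status": rng.choice(["never smoked", "formerly smoked", "smokes"], n),
})
# Invented risk rule that turns the features into a binary label (1 = stroke).
risk = (0.05 * df["age"] + 1.5 * df["hypertension"] + 1.5 * df["heart_disease"]
        + (df["smoking_status"] == "smokes").astype(int))
df["stroke"] = (risk + rng.normal(0, 0.5, n) > 5).astype(int)

# One-hot encode categorical columns; standardize numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "smoking_status"]),
    ("num", StandardScaler(), ["age", "hypertension", "heart_disease"]),
])

X, y = df.drop(columns="stroke"), df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "KNN": KNeighborsClassifier(),
}

accuracy = {}
for name, clf in classifiers.items():
    # clone() gives each pipeline its own unfitted copy of the preprocessor.
    pipe = Pipeline([("prep", clone(preprocess)), ("clf", clf)])
    accuracy[name] = pipe.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: {accuracy[name]:.3f}")
```

Keeping the preprocessing inside each `Pipeline` means the encoder and scaler are fitted on the training split only, so no information from the test patients leaks into the models.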