Skip to content

Commit

Permalink
Merge pull request #868 from Varunshiyam/fixes-864
Browse files Browse the repository at this point in the history
Playtime Classification with Logistic Regression
  • Loading branch information
UppuluriKalyani authored Nov 10, 2024
2 parents af97f29 + aeae294 commit 9c5be4c
Show file tree
Hide file tree
Showing 2 changed files with 299 additions and 0 deletions.
31 changes: 31 additions & 0 deletions Prediction Models/Playtime_prediction/Readme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# Playtime Classification with Logistic Regression

## Project Overview

This project employs a logistic regression model to classify playtime data into predefined categories. By analyzing features such as session length, frequency of play, and user demographics, the model helps predict user engagement levels. This insight can be valuable for game developers and analysts to enhance user retention and engagement strategies.

## Problem Statement

Understanding player engagement patterns is crucial in the gaming industry to personalize experiences and optimize retention. This project addresses the need for a predictive model that can classify users based on playtime data, supporting data-driven decisions in game development and marketing.

## Features

The dataset contains several key attributes:
- **Session Length**: Duration of each gaming session.
- **Frequency of Play**: How often users play.
- **Demographic Details**: Information such as age and region.

## Methodology

1. **Data Preprocessing**: Clean and preprocess data, including handling missing values and feature scaling.
2. **Feature Engineering**: Selecting and engineering relevant features for the model.
3. **Model Training**: Train a logistic regression model on the dataset.
4. **Evaluation**: Evaluate model performance using accuracy, precision, recall, and F1-score.

## Requirements

- Python 3.x
- pandas
- numpy
- scikit-learn

Original file line number Diff line number Diff line change
@@ -0,0 +1,268 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"_execution_state": "idle",
"_uuid": "051d70d956493feee0c6d64651c6a088724dca2a"
},
"outputs": [],
"source": [
"library(tidyverse) # metapackage of all tidyverse packages\n",
"library(ggcorrplot)\n",
"library(grid) # package for arranging plots into a grid\n",
"library(gridExtra)\n",
"library(caret) # package for confusion matrix\n",
"library(boot) # package for K-FoldCV and boostrap"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset = read_csv(\"../input/performance-prediction/summary.csv\")\n",
"head(dataset,10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"summary(dataset)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"str(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# EAD\n",
"\n",
"Let's find the most representative variables through EAD first and then construct a statistical model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Wrangling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I will be removing the FreeThrowPercent, 3PointPercent, FieldGoalPercent since they are just combinations of previous columns of the dataset, and also the Name column. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset$Target = factor(ifelse(dataset$Target==1,\"Above5Years\",\"Less5Years\")) # factor the Target column\n",
"dataset = dataset %>% select(-FreeThrowPercent,-`3PointPercent`,-FieldGoalPercent,-Name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boxplot\n",
"\n",
"Since these variables are in different scales I will be divinding in various subplots insted of appling a log transormation on the y-ax."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(grid)\n",
"options(repr.plot.width = 20, repr.plot.height = 20)\n",
"plot_boxplot = function(columns=vector()){ #simple function to produce plots\n",
" dataset %>% select(names(dataset[,columns]),Target) %>% \n",
" gather(key = k1,value = VariableValue,-\"Target\") %>%\n",
" ggplot(aes(y = VariableValue,x = Target,fill = Target)) +\n",
" stat_boxplot(aes(fill = Target)) + facet_grid(.~k1) +\n",
" theme(axis.text.x = element_blank(),\n",
" axis.ticks.x = element_blank(),\n",
" axis.title.x = element_blank(),\n",
" strip.text.x = element_text(size = 10, colour = \"black\", angle = 0)) +\n",
" scale_fill_manual(values = c('#e7298a','#66a61e'))\n",
"}\n",
"\n",
"box1 = plot_boxplot(columns = c(11:17))\n",
"box2 = plot_boxplot(columns = c(2,3))\n",
"box3 = plot_boxplot(columns = c(4,5))\n",
"box4 = plot_boxplot(columns = c(6:10))\n",
"\n",
"grid.arrange(arrangeGrob(box1,box2,box3,box4,ncol=2,nrow=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As common sense would point out older players apparently do peforme better than younger ones on average."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Correlogram"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"options(repr.plot.width = 10, repr.plot.height = 10)\n",
"cnr = cor(dataset%>% select(-Target))\n",
"p.values = cor_pmat(dataset%>% select(-Target)) # p-values matrix\n",
"ggcorrplot(cnr, hc.order = TRUE, type = \"lower\",\n",
" outline.col = \"black\",\n",
" ggtheme = ggplot2::theme_gray,\n",
" colors = c(\"#6D9EC1\", \"white\", \"#E46726\"),p.mat=p.values,lab = TRUE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see some variables are not statistically coorelated with others given their p-values are too high, as highlighted by the X mark. With that in mind I'm going to remove the 3PointMade and 3PointAttempt also because they are not highly coorelated with any other variables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset = dataset %>% select(-`3PointMade`,-`3PointAttempt`)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Logit Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"shuffel.rows = sample(nrow(dataset)*0.9) # row shuffeling\n",
"dataset_train = dataset[shuffel.rows,]\n",
"dataset_test = dataset[-shuffel.rows,]\n",
"\n",
"logit.fit = glm(Target~., data=dataset_train,family='binomial')\n",
"summary(logit.fit)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"logit.probs = predict(logit.fit,newdata = dataset_test,type = 'response')\n",
"class.pred = factor(ifelse(logit.probs>0.5,\"Above5Years\",\"Less5Years\"))\n",
"confusionMatrix(class.pred,dataset_test$Target)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So the model is not that good, but could use some improvement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## K-fold Cross-Validation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train.control = trainControl(method = \"cv\", number = 15)\n",
"logit.fitKfold = train(Target ~., data = dataset_train, method = \"glm\",\n",
" trControl = train.control)\n",
"logit.fitKfold"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"probs.Kfold = predict(logit.fitKfold, newdata = dataset_test, type = \"raw\")\n",
"confusionMatrix(probs.Kfold,dataset_test$Target)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It appers that the model have improved on the test set.\n",
"If there are any errors please do let me know."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

0 comments on commit 9c5be4c

Please sign in to comment.