
Machine and Deep Learning Models Development and Predictions for a Chemical Engineering Project

Project Topic: Prediction of Moisture ratio of Harvested Yam using Machine Learning


This repository contains the Python and MATLAB code used for my undergraduate project. The aim was to predict the moisture ratio of harvested yam. Dried yam is used to make products such as flakes, flour and chips from yam slices, so this project helps the local farmer obtain the moisture ratio value needed for further processing. Simulating the drying process reduces the number of experiments required to determine the moisture ratio, cutting both laboratory time and the shortcomings of repeated experimentation.

Dataset

The dataset used comes from an experiment conducted on yam dehydration, and can be accessed here: https://drive.google.com/file/d/1EUijQdymBHPggFsWUmpaVcR6Abla4EtK/view?usp=sharing.

The independent variables are:

  • Temperature of drying operation
  • Size of the yam material
  • Time taken at specific intervals during the drying operation

The dependent variable is the moisture ratio.

Exploratory Data Analysis

The dataset consists of drying temperatures in the range 65 °C to 95 °C, yam material sizes in the range 1.5 mm to 4.5 mm, drying times from 0 to 320 minutes (in varying step sizes) and the corresponding moisture ratios.

A preview of the dataset is given below:

[Image: dataset preview]

On exploring the dataset, the features were found to fall into three categories:

  • Numerical Features
  • Categorical Features
  • Time-Series Features

The models used, chosen according to the feature types identified above, were:

1. Multi-Layer Feedforward Neural Network (MLFNN)

[Image: MLFNN architecture]

2. Extra Trees Regressor Model

3. Long Short-Term Memory Neural Network (LSTM NN)

[Image: LSTM architecture]

Models Training

About 83% of the dataset was used for training the models and the remaining 17% for testing. The test subset comprised the runs at 95 °C, 4.5 mm yam material size and 0–320 minutes drying time.
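The hold-out split above can be sketched in plain Python: one experimental condition (95 °C, 4.5 mm) is reserved for testing and everything else is used for training. The dict keys `temp` and `size` are illustrative, not the actual dataset column names.

```python
def split_by_condition(rows, temp=95.0, size=4.5):
    # Reserve every row matching the held-out drying condition for testing.
    test = [r for r in rows if r["temp"] == temp and r["size"] == size]
    train = [r for r in rows if not (r["temp"] == temp and r["size"] == size)]
    return train, test
```

Holding out a whole experimental condition (rather than a random sample) tests whether the model generalises to drying conditions it has never seen.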

An optimization procedure was implemented to find suitable parameters for the models listed above: a for loop iterated over candidate parameter values, and the resulting models were compared using performance metrics such as RMSE and R².
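The loop-based search can be sketched as an exhaustive grid search over hidden-layer and neuron counts, keeping the combination with the lowest test RMSE. Here `train_fn` is a stand-in for fitting the actual network and returning its test-set predictions; it is not part of the original code.

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error between measured and predicted moisture ratios.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def grid_search(layer_options, neuron_options, actual, train_fn):
    # Try every (hidden layers, neurons) pair and keep the lowest RMSE.
    best = None
    for layers in layer_options:
        for neurons in neuron_options:
            score = rmse(actual, train_fn(layers, neurons))
            if best is None or score < best[0]:
                best = (score, layers, neurons)
    return best
```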

1. The MLFNN was developed using MATLAB. The model parameters searched were the number of hidden layers and the number of neurons in each hidden layer. The optimization code can be found in this folder. The table below gives summary statistics from the for loop created to iterate over hidden-layer and neuron counts for optimum parameter determination.

Table 4.1: Optimum neurons and corresponding RMSEs per number of hidden layers

| Hidden Layers   | 1      | 2      | 3      | 4      | 5      |
|-----------------|--------|--------|--------|--------|--------|
| Optimum RMSE    | 0.0031 | 0.0013 | 0.0009 | 0.0008 | 0.0009 |
| Optimum Neurons | 42     | 43     | 27     | 36     | 41     |

The optimum parameters were chosen as 4 hidden layers with 36 neurons per layer.

Model

[Image: MLFNN model]

Training Plots

[Image: training plots]

Predicted vs Actual Plot

[Image: predicted vs actual plot]

2. The Extra Trees Regressor was developed using Python with an automated machine learning library called pyCaret. This library trains several types of regression models and picks the best one according to various performance metrics. As mentioned earlier, the features were treated as categorical here, and shuffling was enabled to reduce overfitting on the training set.
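Under the hood, pyCaret's Extra Trees model wraps scikit-learn's `ExtraTreesRegressor`. A minimal direct sketch is shown below; the rows, moisture ratios and column order `[temperature (°C), size (mm), time (min)]` are made up for illustration and are not the actual dataset.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Toy stand-in data: columns [temperature (°C), size (mm), time (min)].
X = np.array([
    [65.0, 1.5, 0.0],
    [65.0, 1.5, 60.0],
    [80.0, 3.0, 120.0],
    [95.0, 4.5, 320.0],
])
y = np.array([1.0, 0.7, 0.4, 0.1])  # illustrative moisture ratios

model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
prediction = model.predict(np.array([[95.0, 4.5, 320.0]]))
```

Extra Trees builds an ensemble of randomized decision trees and averages their outputs, which handles the mix of categorical-like and numerical features well.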

Variable Importance Plot showing categorical features formed

[Image: variable importance plot]

Predicted versus Actual Plot

[Image: predicted vs actual plot]

3. The LSTM NN was developed using Python. Parts of the code were adapted from online resources such as https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/ and https://apmonitor.com/do/index.php/Main/LSTMNetwork. The model parameters searched were the number of hidden layers, the number of neurons in each hidden layer and the number of time steps required for a sequence/pattern to be learned.

The iterative search considered:

  1. Number of neurons from 10 to 90 with a step size of 10 while time step from 2 to 10 with a step size of 2
  2. Number of neurons from 2 to 10 with a step size of 2 while time step from 10 to 100 with a step size of 10

Using the R² values on the test dataset as the performance metric, the following results were extracted for analysis and insights.
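The "time steps" hyperparameter determines how the drying series is windowed into LSTM inputs: each sample is a sliding window of the previous `time_steps` moisture-ratio readings, and the target is the next reading. A minimal sketch of that windowing:

```python
def make_sequences(series, time_steps):
    # Turn a 1-D series into (window, next-value) training pairs.
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i:i + time_steps])
        y.append(series[i + time_steps])
    return X, y
```

Larger time steps give the network more history per sample but leave fewer training pairs, which is why the search traded the two ranges off against each other.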

1. Effect of time steps on the test R² with an increasing number of hidden layers

[Images: R² versus time steps for increasing numbers of hidden layers]

2. Effect of neurons on the test R² with increasing time steps

[Images: R² versus number of neurons for increasing time steps]

After many iterations, and drawing insights from the figures above by sorting for the highest R², the best model had the following parameters: 2 time steps, 70 neurons and 3 hidden layers.

Model

[Image: LSTM model]

Predicted vs Actual Plot

[Image: predicted vs actual plot]

Models Evaluation

The LSTM, MLFNN and Extra Trees models all proved suitable predictive models, producing values close to the experimental moisture ratios and good performance metrics, with R² values of 0.988, 0.979 and 0.999 respectively.
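For reference, the R² metric used to compare the three models is the coefficient of determination, computed as one minus the ratio of residual to total sum of squares. A minimal sketch:

```python
def r_squared(actual, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A value of 1.0 means the predictions match the measured moisture ratios exactly; values near the 0.98–0.999 range reported above indicate a very close fit.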