This folder contains the implementation code, written in Python and MATLAB, for my undergraduate project.
The aim was to predict the moisture ratio of harvested yam during drying.
Dried yam slices are useful for making products such as flakes, flour and chips.
This project helps local farmers obtain the moisture ratio values needed for further processing.
The simulation reduces the number of laboratory experiments required to determine the moisture ratio, saving time and reducing experimental inadequacies.
First, the dataset used was from an experiment on yam dehydration, which can be accessed here: https://drive.google.com/file/d/1EUijQdymBHPggFsWUmpaVcR6Abla4EtK/view?usp=sharing.
The independent variables comprise:
- Temperature of drying operation
- Size of the yam material
- Time taken at specific intervals during the drying operation
The dependent variable is the moisture ratio.
The dataset consists of drying temperatures ranging from 65 °C to 95 °C, yam slice sizes from 1.5 mm to 4.5 mm, drying times from 0 to 320 minutes (at varying step sizes), and their corresponding moisture ratios.
A preview of the dataset is given below:
On exploration, the dataset was found to contain three kinds of features:
- Numerical Features
- Categorical Features
- Time-Series Features
The models used, chosen according to the feature types identified above, were the MLFNN, the Extra Trees Regressor and the LSTM neural network.
About 83% of the data was used to train the models and the remaining 17% for testing. The test subset was the drying run at 95 °C with a 4.5 mm yam slice size over 0 to 320 minutes.
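The hold-out split above keeps one complete drying run for testing. A minimal pandas sketch of that condition-based split, using hypothetical column names and a few dummy rows for illustration:

```python
import pandas as pd

# Dummy rows with hypothetical column names; the real dataset has its own headers.
df = pd.DataFrame({
    "Temperature": [65, 65, 80, 95, 95],
    "Size":        [1.5, 3.0, 3.0, 4.5, 4.5],
    "Time":        [0, 40, 80, 0, 320],
    "MR":          [1.0, 0.82, 0.55, 1.0, 0.07],
})

# Hold out the 95 °C / 4.5 mm drying run as the test set,
# and train on every other temperature/size combination.
test_mask = (df["Temperature"] == 95) & (df["Size"] == 4.5)
test_df = df[test_mask]
train_df = df[~test_mask]
```

Splitting by experimental condition (rather than randomly) tests whether the model generalizes to an unseen drying run.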
An optimization routine was implemented to find suitable parameters for each of the models listed above: a for loop iterated over candidate parameter values, and the resulting models were compared using performance metrics such as RMSE and R².
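The parameter sweep described above can be sketched as a nested loop that keeps the lowest-RMSE configuration. Here `train_and_score` is a hypothetical stand-in for fitting a model with the given settings and returning its test RMSE:

```python
# Illustrative sketch of the parameter sweep; train_and_score is a dummy
# placeholder for training a model and returning its test RMSE.
def train_and_score(n_layers, n_neurons):
    # Dummy scoring function for illustration only.
    return abs(n_layers - 4) * 0.001 + abs(n_neurons - 36) * 0.0001

best = (None, None, float("inf"))
for n_layers in range(1, 6):          # candidate hidden-layer counts
    for n_neurons in range(10, 51):   # candidate neurons per layer
        rmse = train_and_score(n_layers, n_neurons)
        if rmse < best[2]:            # keep the lowest-RMSE configuration
            best = (n_layers, n_neurons, rmse)
```

The same loop structure works for any metric; for R² the comparison flips to keep the highest value.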
1. The MLFNN was developed in MATLAB. The parameters tuned were the number of hidden layers and the number of neurons in each hidden layer. The optimization code can be found in this folder. The table below summarizes the loop over candidate hidden-layer and neuron counts for optimum parameter determination.
Table 4.1 Optimum neurons and corresponding RMSEs for hidden layers

| Hidden Layers   | 1      | 2      | 3      | 4      | 5      |
|-----------------|--------|--------|--------|--------|--------|
| Optimum RMSE    | 0.0031 | 0.0013 | 0.0009 | 0.0008 | 0.0009 |
| Optimum Neurons | 42     | 43     | 27     | 36     | 41     |
The optimum parameters were chosen as 4 hidden layers with 36 neurons per layer.
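The actual MLFNN was built in MATLAB, but an equivalent network with the optimum found above (4 hidden layers of 36 neurons) can be sketched in scikit-learn on dummy data; the exponential-decay target below is only an illustrative stand-in for a drying curve:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))      # dummy (temperature, size, time) features
y = np.exp(-2 * X[:, 2])            # dummy exponential-decay moisture ratio

# Four hidden layers of 36 neurons each, mirroring the optimum found in MATLAB.
model = MLPRegressor(hidden_layer_sizes=(36, 36, 36, 36),
                     max_iter=2000, random_state=0)
model.fit(X, y)
```

This is a sketch of the architecture only; MATLAB's training algorithm and defaults differ from scikit-learn's.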
2. The Extra Trees Regressor was developed in Python using PyCaret, an automated machine learning library. PyCaret trains several types of regression models and picks the best one according to various performance metrics. As mentioned earlier, the features were treated as categorical here, and shuffling was set to True to reduce overfitting on the training set.
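Since PyCaret ultimately selected an Extra Trees model, the same idea can be sketched directly in scikit-learn, with dummy data and a shuffled 83/17 split standing in for PyCaret's setup:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 3))      # dummy (temperature, size, time) features
y = np.exp(-1.5 * X[:, 2]) + rng.normal(scale=0.01, size=300)  # dummy moisture ratio

# Shuffled 83/17 split, analogous to the shuffled split used in the project.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.83, shuffle=True, random_state=0)

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)        # R² on the held-out portion
```

Extra Trees averages many fully randomized decision trees, which makes it robust on small tabular datasets like this one.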
3. The LSTM neural network was developed in Python. Parts of the code were adapted from online references: https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/ and https://apmonitor.com/do/index.php/Main/LSTMNetwork. The parameters tuned were the number of hidden layers, the number of neurons in each hidden layer, and the number of effective time steps required for the network to learn a sequence/pattern.
Two iterative conditions were considered:
- number of neurons from 10 to 90 in steps of 10, with time steps from 2 to 10 in steps of 2
- number of neurons from 2 to 10 in steps of 2, with time steps from 10 to 100 in steps of 10

Using the R² values on the test dataset as the performance metric, the following results were extracted for analysis and insights.
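The "time steps" parameter controls how a drying curve is sliced into input windows before it reaches the LSTM. A minimal NumPy sketch of that windowing step, using a dummy moisture-ratio series:

```python
import numpy as np

def make_windows(series, time_steps):
    """Slice a 1-D series into overlapping input windows and next-step
    targets, the standard reshaping of a time series for an LSTM."""
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i:i + time_steps])   # the last `time_steps` readings
        y.append(series[i + time_steps])     # the reading to predict
    return np.array(X), np.array(y)

mr = np.array([1.0, 0.9, 0.78, 0.66, 0.55, 0.45])  # dummy moisture-ratio curve
X, y = make_windows(mr, time_steps=2)
# X.shape == (4, 2), y.shape == (4,)
```

Larger time steps give the network more history per prediction but yield fewer training samples, which is the trade-off the iteration above explores.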
1. Effect of time steps on the performance metric R² with increasing hidden layers
After many iterations, drawing insights from the figures above and sorting for the highest R², the best model had the following parameters: 2 time steps, 70 neurons and 3 hidden layers.
The LSTM, MLFNN and Extra Trees models were all found to be suitable predictive models, producing values close to the experimental moisture ratios and good performance metrics, with R² values of 0.988, 0.979 and 0.999 respectively.