This folder contains the implementation code, written in Python and MATLAB, for my undergraduate project.
The aim was to predict the moisture ratio of harvested yam during drying.
Dried yam slices are useful for making products such as flakes, flour and chips.
This project helps local farmers obtain the moisture ratio values needed for further processing.
The simulation reduces the number of laboratory experiments required to determine the moisture ratio, saving time and reducing experimental inadequacies.
First, the dataset used was from an experiment on yam dehydration, which can be accessed here: https://drive.google.com/file/d/1EUijQdymBHPggFsWUmpaVcR6Abla4EtK/view?usp=sharing.
The independent variables comprise:
- Temperature of drying operation
- Size of the yam material
- Time taken at specific intervals during the drying operation
The dependent variable is the moisture ratio.
The dataset consists of drying temperatures ranging from 65 °C to 95 °C, yam slice sizes from 1.5 mm to 4.5 mm, drying times from 0 to 320 minutes (at varying step sizes), and their corresponding moisture ratios.
A preview of the dataset is given below:
On exploration, the dataset was found to contain three kinds of features:
- Numerical Features
- Categorical Features
- Time-Series Features
The models used, chosen according to the feature types identified above, were the MLFNN, the Extra Trees Regressor and the LSTM neural network.
About 83% of the data was used to train the models and the remaining 17% for testing. The test subset was the drying run at 95 °C with a 4.5 mm yam slice size over 0 to 320 minutes.
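The hold-out split above keeps one complete drying run for testing. A minimal pandas sketch of that condition-based split, using hypothetical column names and a few dummy rows for illustration:

```python
import pandas as pd

# Dummy rows with hypothetical column names; the real dataset has its own headers.
df = pd.DataFrame({
    "Temperature": [65, 65, 80, 95, 95],
    "Size":        [1.5, 3.0, 3.0, 4.5, 4.5],
    "Time":        [0, 40, 80, 0, 320],
    "MR":          [1.0, 0.82, 0.55, 1.0, 0.07],
})

# Hold out the 95 °C / 4.5 mm drying run as the test set,
# and train on every other temperature/size combination.
test_mask = (df["Temperature"] == 95) & (df["Size"] == 4.5)
test_df = df[test_mask]
train_df = df[~test_mask]
```

Splitting by experimental condition (rather than randomly) tests whether the model generalizes to an unseen drying run.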
An optimization routine was implemented to find suitable parameters for each of the models listed above: a for loop iterated over candidate parameter values, and the resulting models were compared using performance metrics such as RMSE and R².
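The parameter sweep described above can be sketched as a nested loop that keeps the lowest-RMSE configuration. Here `train_and_score` is a hypothetical stand-in for fitting a model with the given settings and returning its test RMSE:

```python
# Illustrative sketch of the parameter sweep; train_and_score is a dummy
# placeholder for training a model and returning its test RMSE.
def train_and_score(n_layers, n_neurons):
    # Dummy scoring function for illustration only.
    return abs(n_layers - 4) * 0.001 + abs(n_neurons - 36) * 0.0001

best = (None, None, float("inf"))
for n_layers in range(1, 6):          # candidate hidden-layer counts
    for n_neurons in range(10, 51):   # candidate neurons per layer
        rmse = train_and_score(n_layers, n_neurons)
        if rmse < best[2]:            # keep the lowest-RMSE configuration
            best = (n_layers, n_neurons, rmse)
```

The same loop structure works for any metric; for R² the comparison flips to keep the highest value.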
1. The MLFNN was developed in MATLAB. The parameters tuned were the number of hidden layers and the number of neurons in each hidden layer. The optimization code can be found in this folder. The table below summarizes the loop over candidate hidden-layer and neuron counts for optimum parameter determination.
Table 4.1 Optimum neurons and corresponding RMSEs for hidden layers

| Hidden Layers   | 1      | 2      | 3      | 4      | 5      |
|-----------------|--------|--------|--------|--------|--------|
| Optimum RMSE    | 0.0031 | 0.0013 | 0.0009 | 0.0008 | 0.0009 |
| Optimum Neurons | 42     | 43     | 27     | 36     | 41     |
The optimum parameters were chosen as 4 hidden layers with 36 neurons per layer.
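The actual MLFNN was built in MATLAB, but an equivalent network with the optimum found above (4 hidden layers of 36 neurons) can be sketched in scikit-learn on dummy data; the exponential-decay target below is only an illustrative stand-in for a drying curve:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))      # dummy (temperature, size, time) features
y = np.exp(-2 * X[:, 2])            # dummy exponential-decay moisture ratio

# Four hidden layers of 36 neurons each, mirroring the optimum found in MATLAB.
model = MLPRegressor(hidden_layer_sizes=(36, 36, 36, 36),
                     max_iter=2000, random_state=0)
model.fit(X, y)
```

This is a sketch of the architecture only; MATLAB's training algorithm and defaults differ from scikit-learn's.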
2. The Extra Trees Regressor was developed in Python using PyCaret, an automated machine learning library. PyCaret trains several types of regression models and picks the best one according to various performance metrics. As mentioned earlier, the features were treated as categorical here, and shuffling was set to True to reduce overfitting on the training set.
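Since PyCaret ultimately selected an Extra Trees model, the same idea can be sketched directly in scikit-learn, with dummy data and a shuffled 83/17 split standing in for PyCaret's setup:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 3))      # dummy (temperature, size, time) features
y = np.exp(-1.5 * X[:, 2]) + rng.normal(scale=0.01, size=300)  # dummy moisture ratio

# Shuffled 83/17 split, analogous to the shuffled split used in the project.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.83, shuffle=True, random_state=0)

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)        # R² on the held-out portion
```

Extra Trees averages many fully randomized decision trees, which makes it robust on small tabular datasets like this one.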
3. The LSTM neural network was developed in Python. Parts of the code were adapted from online references: https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/ and https://apmonitor.com/do/index.php/Main/LSTMNetwork. The parameters tuned were the number of hidden layers, the number of neurons in each hidden layer, and the number of effective time steps required for the network to learn a sequence/pattern.
Two iterative conditions were considered:
- number of neurons from 10 to 90 in steps of 10, with time steps from 2 to 10 in steps of 2
- number of neurons from 2 to 10 in steps of 2, with time steps from 10 to 100 in steps of 10

Using the R² values on the test dataset as the performance metric, the following results were extracted for analysis and insights.
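The "time steps" parameter controls how a drying curve is sliced into input windows before it reaches the LSTM. A minimal NumPy sketch of that windowing step, using a dummy moisture-ratio series:

```python
import numpy as np

def make_windows(series, time_steps):
    """Slice a 1-D series into overlapping input windows and next-step
    targets, the standard reshaping of a time series for an LSTM."""
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i:i + time_steps])   # the last `time_steps` readings
        y.append(series[i + time_steps])     # the reading to predict
    return np.array(X), np.array(y)

mr = np.array([1.0, 0.9, 0.78, 0.66, 0.55, 0.45])  # dummy moisture-ratio curve
X, y = make_windows(mr, time_steps=2)
# X.shape == (4, 2), y.shape == (4,)
```

Larger time steps give the network more history per prediction but yield fewer training samples, which is the trade-off the iteration above explores.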
1. Effect of time steps on the performance metric R² with increasing hidden layers
After many iterations, drawing insights from the figures above and sorting for the highest R², the best model had the following parameters: 2 time steps, 70 neurons and 3 hidden layers.
The LSTM, MLFNN and Extra Trees models were all found to be suitable predictive models, producing values close to the experimental moisture ratios and good performance metrics, with R² values of 0.988, 0.979 and 0.999 respectively.