Welcome back to our AI kitchen, where we continue our culinary journey through the world of artificial intelligence with PyTorch and MLX (hopefully🤗). In this chapter, we'll be revisiting the steps we've taken with Tenny, our trusty AI sous-chef, refining our techniques, and stirring in a dash of best practices to perfect our recipe.
In the previous chapters, we meticulously prepared our ingredients – financial data in wide CSV format – and trained Tenny, our AI model, to predict future stock prices. We've seen how Tenny, like a sous-chef, takes these prepared ingredients and processes them to create a delectable dish – in this case, valuable insights from data.
Now, we're going to refine our methods. We'll take a closer look at what to expect when training Tenny and using him for predictions, explore the significance of the features in our dataset, and discuss the importance of correlation. Most crucially, we'll delve into why it's essential to avoid projecting human biases and assumptions onto Tenny's learning process.
By the end of this chapter, you'll have a clearer understanding of the nuances involved in AI model training and prediction, and how to approach these processes with a more refined touch.
First, I recommend reading the following essays before you continue:
A-Path-to-Perfection-AI-vs-Human.md
AGI-Shaping-the-Gods-of-Tomorrow.md
The-Dangers-of-Rewarding-Faulty-Models.md
The-Origin-and-Future-of-Creativity-Humans-with-AIs.md
Weights-Are-All-You-Need-Human-vs-AI.md
Reading these essays will enhance your understanding of the subtleties in training and predicting with AI models. It will guide you on how to engage in these processes more astutely and with less bias.
In essence, never make assumptions about what AI, or anyone for that matter, including yourself, can or cannot do. If we start assuming, there's no point in having them around. The crux of the matter is to remain open to fresh opportunities and novel thought processes. It's the sole path to growth and enhancement.
When training Tenny, setting realistic expectations is key. Tenny, like any AI model, learns from the data it's given. However, its predictions aren't fail-safe. They're based on patterns and trends in the data. By feeding Tenny historical financial data, he learns to comprehend and forecast stock prices, but these forecasts are probabilistic and should be viewed as such.
I can't emphasize enough how your biased assumptions can impact Tenny's learning process. Believing Tenny can foresee the future sets you up for disappointment. Tenny isn't a soothsayer; he's a machine learning model that learns from data. His predictions are only as good as the data he receives. Garbage in, garbage out; quality data in, quality predictions out. It's as straightforward as that.
Moreover, prejudging what Tenny might learn during training skews the inputs with your biases. This leads to a self-fulfilling prophecy: you'll end up with what you expect. Experiment with diverse data and assumptions. You'll be surprised by the outcomes. Even when the possibilities seem slim, they exist. That's the beauty of AI – it isn't constrained by our assumptions, only by our imagination.
Look at the first example we went over in the prologue, 'Hello AI World':
...
x_train = [40, 1, 16, 95, 79, 96, 9, 35, 37, 63, 45, 98, 75, 48, 25, 90, 27, 71, 35, 32]
y_train = [403, 13, 163, 953, 793, 963, 93, 353, 373, 633, 453, 983, 753, 483, 253, 903, 273, 713, 353, 323]
Tenny isn't aware of the equation y = 10x + 3
, as it's a concept created by humans. Tenny's knowledge is limited to the data you provide. While you might assume Tenny could deduce the equation by analyzing data pairs, that's not how it operates. Tenny doesn't understand what an equation is. His capability lies in discerning relationships between data pairs, which explains the subtle variations in his predictions.
You might believe you understand how AI operates because the architecture appears simple. However, consider something like GPT-4, which has up to 2 trillion parameters, including weights and biases. With such complexity, it becomes a significant challenge to fully grasp what it does or what it learns.
Imagine a pretrained model like GPT-4 as a vast, intricate galaxy, where each of its 2 trillion parameters represents a star. Each star's position and brightness are carefully calibrated to illuminate the path to the best responses for any given input. Just as one cannot grasp the full expanse of a galaxy at a glance, the workings of this AI model elude complete understanding. Yet, in this vastness lies its elegance and power. It's a humbling reminder of the limits of our knowledge and the endless possibilities of AI. So, resist the urge to assume complete understanding; the universe of AI is far more expansive than what meets the eye.
Never presume that Tenny will learn exactly as you expect. Allow him to learn independently. You might find the outcomes surprising. If you skew your data with your own biases, Tenny will simply mirror you, which defeats the purpose of AI. The essence of AI is to offer a distinct perspective, a novel way of thinking, and to reveal insights you might not have seen.
Remember, even great minds like Isaac Newton and Albert Einstein were proven wrong in some areas. The history of human error is a continuous loop. Don't heed those who say something is impossible. The 'impossible' has been disproved time and again. So, experiment with your ideas. That's the essence of science, AI, and life itself.
Who knows? You might be the next luminary in AI. 😊
In our dataset, each feature – such as Normalized Price, P/E Ratio, Net EPS, and others – plays a unique role. Understanding these features and their relationships is akin to understanding the ingredients in a recipe and how they interact to create a dish. Some features might have a direct impact on the stock price (like the main ingredients in a dish), while others might play a more subtle, complementary role.
Correlation in data is like the flavor profile in cooking. Just as certain flavors complement each other, some data features have strong correlations, indicating a relationship. However, a high correlation doesn't always mean causation. In the AI kitchen, it's crucial to recognize which correlations are meaningful and which might lead to misleading conclusions.
Refer back to the earlier chapters to see how we calculated the correlation between features and interpreted the results. We used the Pearson correlation coefficient, which measures the linear correlation between two variables. The coefficient ranges from -1 to 1, with 0 indicating no correlation, 1 indicating a perfect positive correlation, and -1 indicating a perfect negative correlation.
Again, one of the most important aspects to remember is not to impose human assumptions on Tenny’s learning process. Just as a chef must be open to new flavors and combinations, we must allow Tenny to identify patterns and relationships in the data independently. What might seem obvious or irrelevant to us could be significant in Tenny's analysis. Our role is to guide and refine, not dictate Tenny’s learning process.
History has repeatedly illustrated the pitfalls of narrow-minded mentorship and its parallels in AI development. Picture a mentor who, instead of encouraging exploration and independent thought, imposes their own viewpoints and methodologies rigidly on their student. This approach stifles creativity and critical thinking, leading the student to become a mere echo of their mentor, unable to contribute new ideas or perspectives.
Linus Torvalds, the creator of Linux, indeed thrived in an environment that fostered free exploration and development of ideas. His differing views with Andrew Tanenbaum, a significant figure in operating system development, are well-documented. This contrast in approaches and philosophies arguably played a role in shaping the unique path Torvalds took with Linux. Such interactions highlight the importance of diverse perspectives in technological innovation and the unpredictable course of creative endeavors.
If Linus Torvalds had conformed his ideas to fit Andrew Tanenbaum's perspectives, it's quite possible that Linux might not have come into existence in the form we know today. This adaptation could have stifled Torvalds' unique approach and perspective, leading to a different trajectory in the world of technology. The open-source philosophy, which is central to Linux and has significantly influenced the technological landscape, might not have gained the same level of prominence. This could have had widespread implications for the development and accessibility of technology on a global scale.
In the realm of artificial intelligence, this scenario is akin to training an AI system with a dataset heavily skewed by the biases and limitations of its creators. The AI, much like the student, would only reflect the perspectives and inclinations of its 'mentor', in this case, the data it was trained on. The result is an AI that perpetuates existing biases, unable to provide novel insights or challenge established norms.
The concept of an 'ancestor class' in programming provides a relevant analogy here. An ancestor class is a base class from which other classes are derived. If the ancestor class is limited in its functionality or contains flawed logic, all derived classes will inherit these limitations or flaws. Similarly, if our foundational approaches to AI training are biased or narrow, all subsequent AI models derived from these approaches will likely exhibit the same issues.
The-Perils-of-Rushed-Learning.md
Therefore, the point of having a mentor, or in AI terms, a foundational dataset or training approach, should be to provide a broad, balanced foundation. It should encourage the development of AI systems capable of independent thought and reasoning, capable of challenging existing ideas and contributing novel solutions. This approach not only enriches the student or the AI system but also contributes to the advancement of knowledge and technology in ways that rigid, biased mentorship cannot.
As we move forward, keep these points in mind. They are the key ingredients to a successful AI model, one that is trained effectively and delivers reliable, insightful predictions. Let's roll up our sleeves and dive into the refined process of training and understanding Tenny in the world of AI and finance.
To prepare the given data format for feeding Tenny, our AI model, we need to follow a series of best practices. These steps ensure that the data is in the most optimal form for Tenny to process and learn from.
Let's recap the steps we took in the previous chapters to prepare our data for Tenny:
The data is currently in a wide format, which is not ideal for most machine learning models, including Tenny. We need to transpose it to a long format. This means converting our data so that each row represents a single time point for a single variable. In our case, each row will represent a fiscal quarter for one specific feature (like Normalized Price, P/E Ratio, etc.).
Why Long Format?
- Easier to handle variable time frames and missing values.
- More compatible with PyTorch’s data handling capabilities.
Even though we assume the data is already clean, it's always good to perform a quick check for anomalies or inconsistencies.
- Check for Missing Values: Ensure there are no missing values, or handle them appropriately if there are.
- Data Type Consistency: Ensure each column is of the correct data type (e.g., numerical columns should not be recognized as strings).
- Outlier Detection: Identify and handle any outliers, as they can skew the model’s learning.
Since we have multiple financial indicators, it’s crucial to understand how each might affect the target variable (Normalized Price). Feature engineering might involve:
- Creating Ratios: Sometimes ratios of features can be more informative than the features themselves.
- Normalization/Standardization: Especially important in finance data to account for different scales.
While our data is not time-sensitive in the usual sense of time-series data, it's still sequential. It’s important to respect this sequence as it can contain valuable information about trends and patterns.
- No Random Shuffling: Keep the data in chronological order.
- Windowing Technique: If needed, use a windowing technique to create a series of inputs for the model to learn from sequential patterns.
Split the data into training and test sets. This is crucial for evaluating the model's performance.
- Chronological Split: Ensure that the split respects the time sequence (i.e., no future data in the training set).
Utilize PyTorch’s DataLoader capabilities to efficiently load data during training.
- Batch Processing: Set a batch size for the DataLoader to optimize the training process.
- Custom Dataset Class: Implement a custom dataset class, like TennyDataset, that interfaces well with PyTorch’s DataLoader.
- Sanity Check: Perform a final check to ensure everything is loaded correctly and the data looks as expected.
- Dimensionality Check: Verify that the input data is in the correct shape and format expected by the model.
By following these best practices, we ensure that the data is in a state that is conducive to effective learning by Tenny. This meticulous preparation is similar to a chef ensuring all ingredients are prepped and ready before starting to cook. It’s an essential step to achieving the best results from our AI kitchen.
Well, we already implemented these steps in our code.
import os
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import StandardScaler
# Define paths and hyperparameters for the ML process
NON_NUMERIC_PLACEHOLDERS = ['#VALUE!', '-'] # Placeholder values for non-numeric data.
FOLDER_PATH = './data' # Path to the directory with CSV files for training.
PREDICTION_DATA_PATH = './new_data/raw_data-nflx.csv' # Path to new data for making predictions.
# PREDICTION_DATA_PATH = './new_data/raw_data-avgo.csv' # Path to new data for making predictions.
NUM_EPOCHS = 5000 # Total number of training iterations over the dataset.
BATCH_SIZE = 100 # Number of samples per batch to load.
HIDDEN_SIZE = 30 # Number of units in hidden layers of the neural network.
OUTPUT_SIZE = 1 # Number of units in the output layer (target prediction).
LEARNING_RATE = 0.0001 # Step size at each iteration while moving toward a minimum of the loss function.
TRAIN_RATIO = 0.7 # Proportion of dataset to include in the training split.
VAL_RATIO = 0.2 # Proportion of dataset to include in the validation split.
# Determine the processing device based on availability
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {DEVICE}")
class TennyDataset(TensorDataset):
def __init__(self, folder_path, label_position, device='cpu', scaler=None, fit_scaler=False):
super(TennyDataset, self).__init__()
self.folder_path = folder_path
self.label_position = label_position
self.device = device
self.scaler = scaler
file_names = [f for f in os.listdir(folder_path) if f.endswith('.csv')]
self.train_files, self.val_files, self.test_files = self.split_data(file_names, TRAIN_RATIO, VAL_RATIO)
# Call read_and_clean_data once and store the result
self.data_df = self.read_and_clean_data(self.train_files)
self.features, self.labels = self.prepare_features_labels(self.data_df)
if fit_scaler:
scaler.fit(self.features)
# Convert the features and labels to tensors on the specified device
self.features, self.labels = self.tensors_to_device(self.features, self.labels, device)
def __len__(self):
return len(self.features)
def __getitem__(self, idx):
return self.features[idx], self.labels[idx]
def to_device(self, data, device):
# Modify data in-place
if isinstance(data, (list, tuple)):
for x in data:
x.to(device)
else:
data.to(device)
return data
def tensors_to_device(self, features, labels, device):
features_tensor = torch.tensor(features, dtype=torch.float32)
labels_tensor = torch.tensor(labels, dtype=torch.float32).unsqueeze(1)
return features_tensor.to(device), labels_tensor.to(device)
def split_data(self, file_names, train_ratio, val_ratio):
total_files = len(file_names)
train_size = int(total_files * train_ratio)
val_size = int(total_files * val_ratio)
train_files = file_names[:train_size]
val_files = file_names[train_size:train_size + val_size]
test_files = file_names[train_size + val_size:]
return train_files, val_files, test_files
def clean_data(self, df):
# Create a copy of the DataFrame to avoid SettingWithCopyWarning
df_cleaned = df.copy() # Work on this copy to ensure we're not modifying a slice
# We're filling NaN values with the mean of each column. This is a simple imputation method, but it might not be the best strategy for all datasets. We might want to consider more sophisticated imputation methods, or make this a configurable option.
# Replace non-numeric placeholders with NaN
df_cleaned.replace(NON_NUMERIC_PLACEHOLDERS, pd.NA, inplace=True)
# Ensure all data is numeric
df_cleaned = df_cleaned.apply(pd.to_numeric, errors='coerce')
# Fill NaN values in numerical columns with column mean
for column in df_cleaned.columns:
if df_cleaned[column].dtype == 'float64' or df_cleaned[column].dtype == 'int64':
df_cleaned[column].fillna(df_cleaned[column].mean(), inplace=True)
return df_cleaned
def read_and_clean_data(self, files):
# Read all files at once
# Transposing each DataFrame after reading it could be a costly operation. If possible, we need to change the format of the data files to avoid the need for transposition
data = pd.concat([pd.read_csv(os.path.join(self.folder_path, file), index_col=0).transpose() for file in files], ignore_index=True)
data = self.clean_data(data)
return data
def prepare_features_labels(self, data_df):
# Adjust for the fact that label_position is 1-indexed by subtracting 1 for 0-indexing
label_idx = self.label_position - 1
labels = data_df.iloc[:, label_idx] # Extract labels from the specified position
# In the prepare_features_labels method, dropping the label column from the features DataFrame creates a copy of the DataFrame, which could be memory-intensive for large datasets. Instead, we are using iloc to select only the columns you need for the features.
# Select only the feature columns using iloc
if label_idx == 0:
features = data_df.iloc[:, 1:] # If the label is the first column, select all columns after it
else:
# If the label is not the first column, select all columns before and after it
features = pd.concat([data_df.iloc[:, :label_idx], data_df.iloc[:, label_idx + 1:]], axis=1)
# Convert to numpy arrays and return
return features.values, labels.values
@staticmethod
def create_datasets(folder_path, label_position, device, scaler, train_ratio, val_ratio, fit_scaler=False):
# Create the train dataset
train_dataset = TennyDataset(folder_path, label_position, device, scaler, fit_scaler)
# Create the validation and test datasets
val_dataset = TennyDataset(folder_path, label_position, device, scaler=scaler)
test_dataset = TennyDataset(folder_path, label_position, device, scaler=scaler)
return train_dataset, val_dataset, test_dataset
class TennyPredictionDataset(TennyDataset):
def __init__(self, file_path, label_position, device='cpu', scaler=None, fit_scaler=False):
super(TennyDataset, self).__init__()
self.file_path = file_path
self.folder_path = ''
self.label_position = label_position
self.device = device
self.scaler = scaler
# Call the parent class's read_and_clean_data method
data_df = super().read_and_clean_data([file_path])
self.features, self.labels = self.prepare_features_labels(data_df)
# Convert the features and labels to tensors on the specified device
self.features, self.labels = self.tensors_to_device(self.features, self.labels, device)
# Define the neural network architecture
class Tenny(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(Tenny, self).__init__()
# Define linear layers and a dropout layer for regularization
self.fc1 = nn.Linear(input_size, hidden_size)
self.fc2 = nn.Linear(hidden_size, hidden_size)
self.fc3 = nn.Linear(hidden_size, hidden_size)
self.fc4 = nn.Linear(hidden_size, hidden_size)
self.fc5 = nn.Linear(hidden_size, output_size)
self.dropout = nn.Dropout(0.5) # To prevent overfitting
# Define the forward pass through the network
def forward(self, x):
# Apply ReLU activations to linear layers and include dropout after the second layer
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.dropout(x)
x = F.relu(self.fc3(x))
x = F.relu(self.fc4(x))
x = self.fc5(x)
return x
def train(model, train_dataset, val_dataset, criterion, optimizer):
# Instantiate the TennyDataset
# Instantiate scaler
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE)
# Initialize neural network model and move it to the appropriate computing device
model = model.to(DEVICE) # Move the model to the GPU if available
best_val_loss = float('inf') # Initialize best validation loss for early stopping
# Early stopping with patience
patience = 10
no_improve = 0
# Train the neural network
for epoch in range(NUM_EPOCHS):
# Training phase
model.train() # Set the model to training mode
for inputs, labels in train_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) # Transfer data to the device
outputs = model(inputs) # Forward pass: compute predicted outputs by passing inputs to the model
loss = criterion(outputs, labels) # Calculate loss
# Check for NaN in loss value to prevent invalid computations
if torch.isnan(loss):
print(f"NaN detected in loss at epoch {epoch + 1}")
break
# Gradient descent: clear previous gradients, compute gradients of all variables wrt loss, and make an optimization step
optimizer.zero_grad() # Zero the parameter gradients
loss.backward() # Backward pass: calculate gradient of the loss with respect to model parameters
optimizer.step() # Perform a single optimization step
# Validation phase
model.eval() # Set the model to evaluation mode
val_loss = 0 # Initialize variable to accumulate validation loss
with torch.no_grad(): # Disabling the gradient calculation to save memory and computations
for inputs, labels in val_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) # Transfer data to the device
outputs = model(inputs) # Forward pass: compute predicted outputs by passing inputs to the model
val_loss += criterion(outputs, labels).item() # Update total validation loss
val_loss /= len(val_loader) # Calculate the average loss over the validation set
# Print training/validation statistics
print(f"Epoch [{epoch + 1}/{NUM_EPOCHS}], Loss: {loss.item():.4f}, Validation Loss: {val_loss:.4f}")
# Check for improvement
# Save the model if validation loss has decreased
if val_loss < best_val_loss:
best_val_loss = val_loss
no_improve = 0
# Save model (commented out): torch.save(model.state_dict(), 'best_model.pth')
else:
# Stop training if there is no improvement observed
no_improve += 1
if no_improve == patience:
print("No improvement in validation loss for {} epochs, stopping training.".format(patience))
break
model.train() # Set the model back to training mode for the next epoch
def test(model, test_dataset, criterion=nn.MSELoss()):
# Evaluate the model on the test dataset
model.eval() # Set the model to evaluation mode
test_loss = 0 # Initialize variable to accumulate test loss
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE)
with torch.no_grad(): # Disabling the gradient calculation
for inputs, labels in test_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) # Transfer data to the device
outputs = model(inputs) # Forward pass: compute predicted outputs by passing inputs to the model
test_loss += criterion(outputs, labels).item() # Update total test loss
test_loss /= len(test_loader) # Calculate the average loss over the test set
print(f"Average Test Loss: {test_loss:.4f}")
def predict(model):
# Instantiate the dataset for prediction only
prediction_dataset = TennyPredictionDataset(file_path=PREDICTION_DATA_PATH, label_position=1, scaler=scaler, device=DEVICE)
# Process the file for prediction
new_features_tensor = prediction_dataset.features
# Use the trained model to make predictions on the new data
model.eval() # Set the model to evaluation mode
with torch.no_grad(): # Disabling gradient calculation
predictions = model(new_features_tensor)
predictions_np = predictions.cpu().numpy() # Transfer predictions back to CPU if they were on GPU
print(predictions_np)
print(f"Number of predictions: {len(predictions_np)}")
if __name__ == '__main__':
# The dataset constructor will read, clean, and scale the data, and convert to tensors
scaler = StandardScaler()
train_dataset, val_dataset, test_dataset = TennyDataset.create_datasets(FOLDER_PATH, label_position=1, device=DEVICE, scaler=scaler, train_ratio=TRAIN_RATIO, val_ratio=VAL_RATIO, fit_scaler=True)
input_size = train_dataset.features.shape[1] # Determine input size from the training dataset
model = Tenny(input_size=input_size, hidden_size=HIDDEN_SIZE, output_size=OUTPUT_SIZE)
criterion = nn.MSELoss() # Use Mean Squared Error Loss as the loss function for regression tasks
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE) # Use Adam optimizer as the optimization algorithm
train(model, train_dataset, val_dataset, criterion, optimizer)
test(model, test_dataset, criterion)
predict(model)
We've implemented a comprehensive and robust approach for preparing and processing data, along with training and evaluating Tenny. To further enhance this process, consider these additional ideas:
- Synthetic Data Generation: To increase the robustness of Tenny, consider generating synthetic data that mimics real-world scenarios not covered in your dataset. This can be particularly useful for rare but critical financial conditions.
- Data Augmentation Techniques: Experiment with data augmentation techniques like adding Gaussian noise, which can help the model generalize better.
- Feature Importance Analysis: Use techniques like Principal Component Analysis (PCA) or feature importance metrics from tree-based models to identify and retain the most significant features.
- Dimensionality Reduction: Apply dimensionality reduction techniques to simplify the model without losing critical information. This can also help in speeding up the training process.
- Automated Hyperparameter Tuning: Consider using automated hyperparameter tuning tools like learning rate schedulers or Bayesian optimization to find the best hyperparameters for your model.
- Experiment with Different Architectures: Try varying the number of layers, different types of layers (like LSTM or GRU for sequence data), or other architectural changes to see if they yield better results.
I've already played around with these concepts in the code. However, I've found that simply going deeper with networks and adding complexity isn't always the answer. It's really about striking that sweet spot between how complex your model is and how well it performs. Tweaking learning rates and using different optimization strategies might be the key, but it's pretty much a trial-and-error game to find what works best. In my own tests, using the scheduler in PyTorch didn't really improve things, but hey, it's definitely something worth exploring.
To incorporate a learning rate scheduler into our existing train
function, we can use PyTorch's ReduceLROnPlateau
scheduler. This scheduler will adjust the learning rate based on the validation loss, reducing it when the loss plateaus. Here's how you can integrate it into your training loop:
from torch.optim.lr_scheduler import ReduceLROnPlateau
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)
In this example, the scheduler will reduce the learning rate by a factor of 0.1 if there is no improvement in the min
metric (like validation loss) for 5 epochs (patience=5
).
Let's see how we could integrate this into our existing code.
Before the training loop, initialize the learning rate scheduler. Make sure to do this after initializing the optimizer because the scheduler needs to modify the optimizer's learning rate.
from torch.optim.lr_scheduler import ReduceLROnPlateau
# Initialize the scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)
Next, within our training loop, we update the scheduler at the end of each epoch, using the validation loss to guide its adjustments.
def train(model, train_dataset, val_dataset, criterion, optimizer):
# ... [previous code] ...
# Initialize the scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)
for epoch in range(NUM_EPOCHS):
# ... [training and validation phases] ...
# Step the scheduler
scheduler.step(val_loss)
# ... [rest of your training loop] ...
Monitor the learning rate changes and the model's performance across epochs. You may need to adjust the factor
, patience
, or other parameters of the scheduler based on your observations.
-
Factor and Patience Tuning: The
factor
andpatience
parameters are crucial.factor
determines how much the learning rate is reduced each time, andpatience
specifies the number of epochs to wait before reducing the learning rate. These need to be tuned according to your model and data. -
Verbose Output: Setting
verbose=True
helps to track when the scheduler reduces the learning rate, providing insight into the training process. -
Scheduler Impact Assessment: Regularly assess the impact of the learning rate scheduler on your model's training and validation performance to determine if it's beneficial.
-
Compatibility with Early Stopping: Ensure that the learning rate scheduler works harmoniously with your early stopping logic. The scheduler might find new optima with a lower learning rate after the model initially stops improving, so consider adjusting your early stopping criteria accordingly.
Integrating a learning rate scheduler can help overcome training plateaus and lead to better model performance, but it does require careful monitoring and tuning to find the most effective settings for your specific case.
- Advanced Dropout Techniques: Experiment with variations of dropout to improve model generalization.
- Weight Regularization: Apply L1/L2 regularization on the weights to prevent overfitting.
To implement advanced dropout techniques and weight regularization in the Tenny architecture, we can modify the existing class as follows:
Instead of using a standard dropout, we might consider using variations like Alpha Dropout or Spatial Dropout. These can provide different approaches to regularization and may be more suitable depending on your network architecture and data. Just experiment with different dropout techniques and see what works best for your model.
For example, if you're dealing with a network that uses batch normalization, Alpha Dropout can be a good choice as it maintains the mean and variance of the inputs.
Here's how you can integrate Alpha Dropout into Tenny:
from torch.nn import AlphaDropout
class Tenny(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(Tenny, self).__init__()
# ... [other layers] ...
self.dropout = AlphaDropout(0.5) # Replace standard Dropout with AlphaDropout
def forward(self, x):
# ... [forward pass] ...
# Replace dropout call with alpha dropout
# ...
Weight regularization can be applied by adding an additional term to the loss function. In PyTorch, you typically do this in the training loop, not in the model definition. L1 regularization adds a term that is proportional to the absolute value of the weight coefficients, and L2 regularization adds a term proportional to the square of the weight coefficients.
Here's how you can modify the training loop to include L1 or L2 regularization:
def train(model, train_dataset, val_dataset, criterion, optimizer, l1_lambda=0.001, l2_lambda=0.001):
# ... [setup DataLoader and move model to device] ...
for epoch in range(NUM_EPOCHS):
model.train()
for inputs, labels in train_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE)
outputs = model(inputs)
loss = criterion(outputs, labels)
# L1 Regularization
l1_reg = torch.tensor(0.).to(DEVICE)
for param in model.parameters():
l1_reg += torch.norm(param, 1)
loss += l1_lambda * l1_reg
# L2 Regularization
l2_reg = torch.tensor(0.).to(DEVICE)
for param in model.parameters():
l2_reg += torch.norm(param, 2)
loss += l2_lambda * l2_reg
# ... [rest of the training loop] ...
In this modified train
function, l1_lambda
and l2_lambda
are hyperparameters that control the regularization strength. You'll need to experiment with these values to find the right balance for your model and data.
- Regularization Strength: The values of
l1_lambda
andl2_lambda
are crucial. If they are too high, they might harm the model's learning ability. If too low, they may have little effect. - Monitoring Performance: Regularly evaluate how these changes affect your model's performance. Observe both training and validation metrics to ensure the model isn't overfitting or underfitting.
- Hyperparameter Tuning: You may need to retune other hyperparameters after introducing these regularization techniques, as they can significantly change the model's learning dynamics.
By carefully implementing and tuning these techniques, you can enhance Tenny's ability to generalize and reduce the risk of overfitting, potentially leading to improved performance on unseen data.
- Advanced Visualization: Implement more advanced visualization techniques to better understand complex relationships in the data, potentially using tools like seaborn or Plotly.
- Correlation Analysis: Conduct a more detailed correlation analysis to uncover hidden relationships or redundancies among the features.
- Cross-Validation: Use cross-validation to ensure the model’s performance is consistent across different subsets of the data.
- Time Series Specific Splits: For time-series data, ensure that validation splits respect time boundaries to avoid lookahead bias.
- Stress Testing: Conduct stress tests by feeding the model with extreme values or unusual patterns to evaluate its robustness and response to edge cases.
- Adversarial Testing: Experiment with adversarial inputs to test the model’s resilience against such scenarios.
- Real-time Data Feeding: If applicable, set up a pipeline for Tenny to process real-time data, enabling dynamic predictions based on the latest market conditions. You will need a reliable data source for this, such as an API or a web scraper, preferably API since it's more reliable and secure.
By implementing these additional ideas, you can further refine Tenny’s capabilities, making it more robust, accurate, and efficient in handling complex financial data for prediction tasks.
Here's the complete code implementing all the ideas we've discussed so far.
import os
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torch.nn import AlphaDropout
from torch.optim.lr_scheduler import ReduceLROnPlateau
from sklearn.preprocessing import StandardScaler
# Define paths and hyperparameters for the ML process
NON_NUMERIC_PLACEHOLDERS = ['#VALUE!', '-'] # Placeholder values for non-numeric data.
FOLDER_PATH = './data' # Path to the directory with CSV files for training.
PREDICTION_DATA_PATH = './new_data/raw_data-nflx.csv' # Path to new data for making predictions.
# PREDICTION_DATA_PATH = './new_data/raw_data-avgo.csv' # Path to new data for making predictions.
NUM_EPOCHS = 5000 # Total number of training iterations over the dataset.
BATCH_SIZE = 100 # Number of samples per batch to load.
HIDDEN_SIZE = 30 # Number of units in hidden layers of the neural network.
OUTPUT_SIZE = 1 # Number of units in the output layer (target prediction).
LEARNING_RATE = 0.0001 # Step size at each iteration while moving toward a minimum of the loss function.
TRAIN_RATIO = 0.7 # Proportion of dataset to include in the training split.
VAL_RATIO = 0.2 # Proportion of dataset to include in the validation split.
L1_LAMBDA = 0.001
L2_LAMBDA = 0.001
# Determine the processing device based on availability
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {DEVICE}")
class TennyDataset(TensorDataset):
def __init__(self, folder_path, label_position, device='cpu', scaler=None, fit_scaler=False):
super(TennyDataset, self).__init__()
self.folder_path = folder_path
self.label_position = label_position
self.device = device
self.scaler = scaler
file_names = [f for f in os.listdir(folder_path) if f.endswith('.csv')]
self.train_files, self.val_files, self.test_files = self.split_data(file_names, TRAIN_RATIO, VAL_RATIO)
# Call read_and_clean_data once and store the result
self.data_df = self.read_and_clean_data(self.train_files)
self.features, self.labels = self.prepare_features_labels(self.data_df)
if fit_scaler:
scaler.fit(self.features)
# Convert the features and labels to tensors on the specified device
self.features, self.labels = self.tensors_to_device(self.features, self.labels, device)
def __len__(self):
return len(self.features)
def __getitem__(self, idx):
return self.features[idx], self.labels[idx]
def to_device(self, data, device):
# Modify data in-place
if isinstance(data, (list, tuple)):
for x in data:
x.to(device)
else:
data.to(device)
return data
def tensors_to_device(self, features, labels, device):
features_tensor = torch.tensor(features, dtype=torch.float32)
labels_tensor = torch.tensor(labels, dtype=torch.float32).unsqueeze(1)
return features_tensor.to(device), labels_tensor.to(device)
def split_data(self, file_names, train_ratio, val_ratio):
total_files = len(file_names)
train_size = int(total_files * train_ratio)
val_size = int(total_files * val_ratio)
train_files = file_names[:train_size]
val_files = file_names[train_size:train_size + val_size]
test_files = file_names[train_size + val_size:]
return train_files, val_files, test_files
def clean_data(self, df):
# Create a copy of the DataFrame to avoid SettingWithCopyWarning
df_cleaned = df.copy() # Work on this copy to ensure we're not modifying a slice
# We're filling NaN values with the mean of each column. This is a simple imputation method, but it might not be the best strategy for all datasets. We might want to consider more sophisticated imputation methods, or make this a configurable option.
# Replace non-numeric placeholders with NaN
df_cleaned.replace(NON_NUMERIC_PLACEHOLDERS, pd.NA, inplace=True)
# Ensure all data is numeric
df_cleaned = df_cleaned.apply(pd.to_numeric, errors='coerce')
# Fill NaN values in numerical columns with column mean
for column in df_cleaned.columns:
if df_cleaned[column].dtype == 'float64' or df_cleaned[column].dtype == 'int64':
df_cleaned[column].fillna(df_cleaned[column].mean(), inplace=True)
return df_cleaned
def read_and_clean_data(self, files):
# Read all files at once
# Transposing each DataFrame after reading it could be a costly operation. If possible, we need to change the format of the data files to avoid the need for transposition
data = pd.concat([pd.read_csv(os.path.join(self.folder_path, file), index_col=0).transpose() for file in files], ignore_index=True)
data = self.clean_data(data)
return data
def prepare_features_labels(self, data_df):
# Adjust for the fact that label_position is 1-indexed by subtracting 1 for 0-indexing
label_idx = self.label_position - 1
labels = data_df.iloc[:, label_idx] # Extract labels from the specified position
# In the prepare_features_labels method, dropping the label column from the features DataFrame creates a copy of the DataFrame, which could be memory-intensive for large datasets. Instead, we are using iloc to select only the columns you need for the features.
# Select only the feature columns using iloc
if label_idx == 0:
features = data_df.iloc[:, 1:] # If the label is the first column, select all columns after it
else:
# If the label is not the first column, select all columns before and after it
features = pd.concat([data_df.iloc[:, :label_idx], data_df.iloc[:, label_idx + 1:]], axis=1)
# Convert to numpy arrays and return
return features.values, labels.values
@staticmethod
def create_datasets(folder_path, label_position, device, scaler, train_ratio, val_ratio, fit_scaler=False):
# Create the train dataset
train_dataset = TennyDataset(folder_path, label_position, device, scaler, fit_scaler)
# Create the validation and test datasets
val_dataset = TennyDataset(folder_path, label_position, device, scaler=scaler)
test_dataset = TennyDataset(folder_path, label_position, device, scaler=scaler)
return train_dataset, val_dataset, test_dataset
class TennyPredictionDataset(TennyDataset):
def __init__(self, file_path, label_position, device='cpu', scaler=None, fit_scaler=False):
super(TennyDataset, self).__init__()
self.file_path = file_path
self.folder_path = ''
self.label_position = label_position
self.device = device
self.scaler = scaler
# Call the parent class's read_and_clean_data method
data_df = super().read_and_clean_data([file_path])
self.features, self.labels = self.prepare_features_labels(data_df)
if fit_scaler:
scaler.fit(self.features)
# Convert the features and labels to tensors on the specified device
self.features, self.labels = self.tensors_to_device(self.features, self.labels, device)
# Define the neural network architecture
class Tenny(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(Tenny, self).__init__()
# Define linear layers and a dropout layer for regularization
self.fc1 = nn.Linear(input_size, hidden_size)
self.fc2 = nn.Linear(hidden_size, hidden_size)
self.fc3 = nn.Linear(hidden_size, hidden_size)
self.fc4 = nn.Linear(hidden_size, hidden_size)
self.fc5 = nn.Linear(hidden_size, output_size)
# self.dropout = nn.Dropout(0.5) # To prevent overfitting
self.dropout = AlphaDropout(0.5) # Replace standard Dropout with AlphaDropout
# Define the forward pass through the network
def forward(self, x):
# Apply ReLU activations to linear layers and include dropout after the second layer
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.dropout(x)
x = F.relu(self.fc3(x))
x = F.relu(self.fc4(x))
x = self.fc5(x)
return x
def train(model, train_dataset, val_dataset, criterion, optimizer):
# Instantiate the TennyDataset
# Instantiate scaler
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE)
# Initialize neural network model and move it to the appropriate computing device
model = model.to(DEVICE) # Move the model to the GPU if available
best_val_loss = float('inf') # Initialize best validation loss for early stopping
# Early stopping with patience
patience = 10
no_improve = 0
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)
# Train the neural network
for epoch in range(NUM_EPOCHS):
# Training phase
model.train() # Set the model to training mode
for inputs, labels in train_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) # Transfer data to the device
outputs = model(inputs) # Forward pass: compute predicted outputs by passing inputs to the model
loss = criterion(outputs, labels) # Calculate loss
# L1 Regularization
l1_reg = torch.tensor(0.).to(DEVICE)
for param in model.parameters():
l1_reg += torch.norm(param, 1)
loss += L1_LAMBDA * l1_reg
# L2 Regularization
l2_reg = torch.tensor(0.).to(DEVICE)
for param in model.parameters():
l2_reg += torch.norm(param, 2)
loss += L2_LAMBDA * l2_reg
# Check for NaN in loss value to prevent invalid computations
if torch.isnan(loss):
print(f"NaN detected in loss at epoch {epoch + 1}")
break
# Gradient descent: clear previous gradients, compute gradients of all variables wrt loss, and make an optimization step
optimizer.zero_grad() # Zero the parameter gradients
loss.backward() # Backward pass: calculate gradient of the loss with respect to model parameters
optimizer.step() # Perform a single optimization step
# Validation phase
model.eval() # Set the model to evaluation mode
val_loss = 0 # Initialize variable to accumulate validation loss
with torch.no_grad(): # Disabling the gradient calculation to save memory and computations
for inputs, labels in val_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) # Transfer data to the device
outputs = model(inputs) # Forward pass: compute predicted outputs by passing inputs to the model
val_loss += criterion(outputs, labels).item() # Update total validation loss
val_loss /= len(val_loader) # Calculate the average loss over the validation set
# Print training/validation statistics
print(f"Epoch [{epoch + 1}/{NUM_EPOCHS}], Loss: {loss.item():.4f}, Validation Loss: {val_loss:.4f}")
# Check for improvement
# Save the model if validation loss has decreased
if val_loss < best_val_loss:
best_val_loss = val_loss
no_improve = 0
# Save model (commented out): torch.save(model.state_dict(), 'best_model.pth')
else:
# Stop training if there is no improvement observed
no_improve += 1
if no_improve == patience:
print("No improvement in validation loss for {} epochs, stopping training.".format(patience))
break
# Step the scheduler
scheduler.step(val_loss)
model.train() # Set the model back to training mode for the next epoch
def test(model, test_dataset, criterion=nn.MSELoss()):
# Evaluate the model on the test dataset
model.eval() # Set the model to evaluation mode
test_loss = 0 # Initialize variable to accumulate test loss
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE)
with torch.no_grad(): # Disabling the gradient calculation
for inputs, labels in test_loader:
inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) # Transfer data to the device
outputs = model(inputs) # Forward pass: compute predicted outputs by passing inputs to the model
test_loss += criterion(outputs, labels).item() # Update total test loss
test_loss /= len(test_loader) # Calculate the average loss over the test set
print(f"Average Test Loss: {test_loss:.4f}")
def predict(model):
# Instantiate the dataset for prediction only
prediction_dataset = TennyPredictionDataset(file_path=PREDICTION_DATA_PATH, label_position=1, scaler=scaler, device=DEVICE)
# Process the file for prediction
new_features_tensor = prediction_dataset.features
# Use the trained model to make predictions on the new data
model.eval() # Set the model to evaluation mode
with torch.no_grad(): # Disabling gradient calculation
predictions = model(new_features_tensor)
predictions_np = predictions.cpu().numpy() # Transfer predictions back to CPU if they were on GPU
print(predictions_np)
print(f"Number of predictions: {len(predictions_np)}")
if __name__ == '__main__':
# The dataset constructor will read, clean, and scale the data, and convert to tensors
scaler = StandardScaler()
train_dataset, val_dataset, test_dataset = TennyDataset.create_datasets(FOLDER_PATH, label_position=1, device=DEVICE, scaler=scaler, train_ratio=TRAIN_RATIO, val_ratio=VAL_RATIO, fit_scaler=True)
input_size = train_dataset.features.shape[1] # Determine input size from the training dataset
model = Tenny(input_size=input_size, hidden_size=HIDDEN_SIZE, output_size=OUTPUT_SIZE)
criterion = nn.MSELoss() # Use Mean Squared Error Loss as the loss function for regression tasks
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE) # Use Adam optimizer as the optimization algorithm
# Initialize the scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)
train(model, train_dataset, val_dataset, criterion, optimizer)
test(model, test_dataset, criterion)
predict(model)
Let's do a quick review of the key components:
-
AlphaDropout in Neural Network Architecture: We have replaced the standard dropout with AlphaDropout in the
Tenny
class. This is particularly useful when you are dealing with models that use activations like SELU (Scaled Exponential Linear Unit), as Alpha Dropout is designed to work with them while maintaining the properties of the inputs.self.dropout = AlphaDropout(0.5)
-
L1 and L2 Regularization in the Training Loop: We have implemented both L1 and L2 regularization in the
train
function. The regularization terms are added to the loss, which helps in preventing overfitting by penalizing large weights.l1_reg = torch.tensor(0.).to(DEVICE) l2_reg = torch.tensor(0.).to(DEVICE) for param in model.parameters(): l1_reg += torch.norm(param, 1) l2_reg += torch.norm(param, 2) loss += L1_LAMBDA * l1_reg + L2_LAMBDA * l2_reg
-
ReduceLROnPlateau Learning Rate Scheduler: The
ReduceLROnPlateau
scheduler is initialized and is being stepped through in the training loop based on the validation loss.scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)
-
Data Loading and Processing: The
TennyDataset
andTennyPredictionDataset
classes are set up to handle data loading and processing, including cleaning and scaling the data, which is crucial for effective model training. -
Model Training and Evaluation: Our code properly includes functions for training, testing, and making predictions with the model, considering all aspects of model evaluation and application.
-
Hyperparameters and Device Configuration: Hyperparameters like
NUM_EPOCHS
,BATCH_SIZE
,HIDDEN_SIZE
,LEARNING_RATE
,L1_LAMBDA
, andL2_LAMBDA
are defined clearly, and the device configuration is set up to utilize GPU resources if available.
Overall, V3 is comprehensive and integrates several advanced techniques in deep learning, which should contribute to the robustness and performance of the Tenny model. As with any machine learning project, you may need to fine-tune these parameters based on the specific characteristics of your dataset and the performance metrics you observe during training and validation.
It's essential to play around with various hyperparameters and structures to discover the ideal mix for your model. This phase is key, requiring patience and a systematic approach. Take your time with it. At this point, Tenny might not produce optimized outcomes yet. However, with adequate experimentation and fine-tuning, it could lay the groundwork for developing a robust instrument for financial time series prediction.
As we wrap up the first part of our journey through the intricate world of financial time series forecasting with deep learning, I'd like to extend a heartfelt congratulations to you. Bravo! You've navigated through the complex layers of this subject, much like a head chef expertly handling a sophisticated dish.
This part of the book, akin to a meticulously prepared starter course, was designed to equip you with the foundational skills and understanding necessary for building robust and efficient deep learning models. But remember, the learning doesn't stop here. Each concept you've encountered is an open door to further exploration and mastery.
I encourage you not to just skim through these chapters and move on. Instead, immerse yourself in the art of modeling. Experiment with different hyperparameters, architectures, and methods. Observe how each tweak affects your model's performance. This hands-on experience is invaluable. It's not only about refining your skills as a data scientist but also about broadening your horizons in every facet of your professional journey.
As we progress, our culinary expedition through the world of AI and machine learning will delve into more advanced territories. Upcoming sections will introduce you to the rich flavors of image processing and the subtle nuances of natural language processing, among other exciting topics. But let's not rush. Good things take time, and understanding these concepts thoroughly is crucial.
So take this time to fully digest and comprehend the knowledge you've acquired so far. I eagerly await our next meeting in the upcoming chapters, where we'll continue to expand our repertoire in this ever-evolving field.
And as I often say, in the kitchen of life, you are the master chef. It's up to you to craft extraordinary experiences and concoctions. So, as we pause before our next culinary adventure, I'm curious: What new recipes are you eager to try in your kitchen of innovation? The possibilities are endless, and the kitchen awaits your creativity. Let's keep cooking up wonders together!
As a concluding remark, it's important to recognize that the brevity of this chapter is by design, not by oversight. We've journeyed here with deliberate and careful steps, laying a strong foundation that allows this final stretch to be concise yet impactful. Much like a well-prepared sprinter who finds the last dash the shortest due to their meticulous training, our thorough preparation has led us to this succinct but significant conclusion 🤗