
Chapter 10. Statistics Part III - The Art of Learning from Data

crystal-ball-of-numbers.png

As we embark on the enchanting journey of this chapter, we find ourselves at the heart of a whimsical paradox: teaching humans how machines learn. This chapter promises to be a merry dance through the fields of data, where numbers twirl and leap in the ballet of statistical learning.

Our tale begins with the magical process of predicting values. Imagine, if you will, a crystal ball not of glass but of numbers, through which the future can be glimpsed—not in vague images, but in precise predictions. This crystal ball is our regression model. It whispers of relationships between variables, hinting at how one can influence another. Just as a seasoned gardener can predict the blossoming of flowers by observing the seasons, our regression model predicts outcomes based on input data.

But what happens when the crystal ball's predictions are not quite right? Here enters the concept of losses, a measure of the distance between what was predicted and what actually occurred. Think of it as the magical feedback from our attempts at spellcasting. When a spell doesn't quite turn out as expected, a good wizard reflects, learns, and adjusts. Similarly, in our data-driven enchantment, understanding losses helps us refine our predictive spells to be more accurate next time.

This leads us to the grand ball—optimization. The optimization process is akin to a grand dance of adjustments, where each step is measured and refined to achieve the perfect choreography. It's where the magic of learning truly happens, adjusting our model with each misstep and triumph until it dances flawlessly, predicting and learning from data with elegance and precision.

And thus, we see the entire feedback loop come to life: from predicting values to evaluating the accuracy of these predictions through losses, and then optimizing our approach based on this feedback. It's a cycle as eternal as the seasons, as intricate as the most complex spell.

The beauty of this is that the regression model is but the first spell in the wizard's book of AI. Once mastered, it opens the door to understanding more complex magics—neural networks, decision trees, and beyond. Yet, at their core, they all dance to the same rhythm of predicting, losing, learning, and optimizing.

So, let us hold our crystal balls of numbers close, and step confidently into the art of learning from data. For in this chapter, we're not just learning about machines; we're learning about the essence of learning itself, a concept as beautifully complex as the universe we inhabit. Together, we'll unravel these mysteries, turning daunting data into delightful insights. Onwards, to a chapter filled with enchantment, learning, and a touch of mathematical magic!

Intuitive Way of Understanding a Regression Model

Picture yourself at a shooting range, each shot you take aimed at the heart of a target. The first bullet strikes the bullseye, a perfect hit. Yet, the subsequent shots, while aimed with the same intention, land with varying degrees of precision—some near the center, others further away. Despite the differences, a pattern emerges from where these bullets find their resting places. This scenario is not just an exercise in marksmanship but a metaphor for understanding a regression model.

shooting-range.png

In this metaphorical shooting range, every shot fired and its landing point offer valuable data points. If we were to plot these points on a Cartesian plane, a visual pattern would start to materialize. From observing this pattern, we make adjustments—aiming a bit to the left or the right, accounting for unseen forces like the wind or perhaps a slight tremble in our hands. With each adjustment and shot, our aim improves, drawing us ever closer to consistently hitting the bullseye.

🧐 The Cartesian plane, a term honoring the French mathematician René Descartes, is a two-dimensional space defined by the crossing of two perpendicular lines, known as axes, which segment the plane into four quadrants. The horizontal line is termed the x-axis, while the vertical counterpart is the y-axis, intersecting at a central point called the origin, marked as (0, 0).

Much like the way the 'normal distribution' pays homage to Carl Friedrich Gauss by sometimes being referred to as the 'Gaussian distribution', the Cartesian plane is Descartes' legacy in the realm of mathematics. In this framework, any location can be pinpointed by a pair of numbers (x, y), its coordinates. The x-coordinate, or abscissa, gives the point's horizontal distance from the origin, and the y-coordinate, or ordinate, its vertical distance. This elegant system enables the meticulous mapping and examination of shapes, functions, and various mathematical entities within a plane. In essence, it reflects the mathematicians' habit of naming things after their discoverers, turning the Cartesian plane into a tribute to Descartes' groundbreaking contribution to geometry.

This process of adjustment and improvement can be likened to drawing a straight line through the scattered data points on our Cartesian plane. This line is not drawn at random; it's carefully calculated to be the best fit, minimizing the distance between itself and all the points it seeks to represent. This line, elegant in its simplicity, is the essence of our regression model.

Why a straight line, you might wonder? The data points we're working with are samples from a broader population, a snapshot of what's possible, not an exhaustive record of every outcome. In aiming for a straight line, we seek a model that generalizes well, that can predict outcomes not just for the data we have but for new, unseen data as well.

However, caution is advised against drawing our line with too eager an adherence to the minutiae of our data points. If our line zigzags to touch each point exactly, we risk overfitting—a model so finely tuned to our specific data set that it loses its ability to generalize, to predict outcomes beyond its immediate experience. It's akin to cramming for a test with last year's answer sheet and finding oneself adrift when faced with a new set of questions. Our goal is a model that balances precision with the ability to adapt and predict in a broader range of scenarios.

Thus, understanding a regression model is akin to learning the art of adjustment and prediction. It's a journey of aiming closer to the truth with each new piece of data, guided by the principles of balance, precision, and the wisdom to know when our line strikes the right chord between fitting our data and forecasting the future.

To illustrate the concept of regression with a straight line and the phenomenon of overfitting, let's set up two conceptual examples in Python. We'll create a simple dataset for each scenario and use Seaborn and Matplotlib to visualize a well-fitted regression line and an overfitted one.

Example 1: Regression with a Straight Line

regression.png

First, we'll simulate a dataset where the relationship between two variables can be effectively modeled with a straight line. This example will demonstrate a good fit, where the regression model generalizes well across the data.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate synthetic data for a simple linear relationship
np.random.seed(42) # For reproducibility
x = np.arange(0, 100, 5)
y = 2 * x + np.random.normal(0, 10, len(x)) # Linear relationship with some noise

# Plotting the data and a linear regression model fit
sns.set_theme(style="darkgrid")
plt.figure(figsize=(10, 6))
sns.regplot(x=x, y=y)
plt.title('Linear Regression: Good Fit')
plt.xlabel('X Value')
plt.ylabel('Y Value')
plt.show()

This code generates a set of points (x, y) that follow a linear relationship with some added noise to simulate real-world data variance. Seaborn's regplot is used to plot these points and fit a linear regression line that minimizes the distance between the line and the points, illustrating a scenario of a good fit.

Example 2: Overfitting

overfitting.png

Next, let's simulate a scenario that demonstrates overfitting. In this example, we'll fit a high-degree polynomial regression model to our data; it will follow the training data points so closely that it loses sight of the underlying linear trend.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
import numpy as np
import matplotlib.pyplot as plt

# Generate the same synthetic data (seeding again so the values match Example 1)
np.random.seed(42)
x = np.arange(0, 100, 5)
y = 2 * x + np.random.normal(0, 10, len(x))

# Transforming the data to fit a polynomial regression model
x_reshape = x[:, np.newaxis] # Reshaping for compatibility with model
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
model.fit(x_reshape, y)
y_pred = model.predict(x_reshape)

# Plotting the original data and the overfitted model predictions
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='blue', label='Data Points')
plt.plot(x, y_pred, color='red', label='Overfitted Model')
plt.title('Polynomial Regression: Overfitting')
plt.xlabel('X Value')
plt.ylabel('Y Value')
plt.legend()
plt.show()

In this example, by fitting a 10th-degree polynomial model to our data, we create a scenario where the model overly complicates itself to capture all the data points perfectly. This complexity leads to overfitting, where the model is now too closely aligned with the training data, potentially losing its ability to generalize well to new, unseen data.

Both examples use Python libraries such as NumPy for data manipulation, Matplotlib and Seaborn for visualization, and Scikit-learn for creating a polynomial regression model to demonstrate overfitting.

The Dance of Losses: Finding the Shortest Path to the Truth

In our quest to perfect the alignment between our regression line and the scattered data points that dot the landscape of our analysis, we encounter the concept of losses. These losses serve as the whispers of guidance, nudging our model towards ever-increasing accuracy. They illuminate the discrepancies between our predictions and reality, showing us where adjustments are needed.

perpendicular-lines.png

Imagine for a moment that each data point sends out a line towards the regression line in a quest to connect. The length of this line represents the loss: the shorter the line, the closer we are to precision. Our mission is to minimize these losses, ensuring that our regression line is a mirror reflecting the true essence of the data as closely as possible. The line representing the error for each data point is therefore drawn vertically from the point to the regression line, and it is this vertical gap between prediction and reality that we seek to minimize.

🧐 When we discuss the relationship between data points and the regression line in the context of least squares regression, it is essential to understand the nature of the 'perpendicular' distances involved. These distances, often referred to as 'errors' or 'residuals', are not geometrically perpendicular to the regression line. Instead, they are vertical distances from each actual data point directly down (or up) to the regression line. This means that the 'perpendicular' term refers to the orientation of these lines relative to the x-axis, not the regression line. The least squares criterion focuses solely on minimizing these vertical discrepancies, which are squared and then summed up to determine how well the regression line fits the data.

This vertical measurement is a deliberate choice in regression analysis. It ensures that the differences between the observed values and the model's predictions are treated consistently across all levels of the independent variable. If, instead, we were to minimize the shortest Euclidean distance from each point to the regression line — that is, if we were to draw lines that are geometrically perpendicular to the regression line — we would be using a different methodology altogether, known as orthogonal regression. Orthogonal regression is used in specific circumstances where errors in both variables (independent and dependent) need to be considered. However, in standard OLS (ordinary least squares) regression, the focus remains on the vertical distances, which align with the errors in the predictions of the dependent variable, thus providing a clear path for minimizing prediction error and refining the model.
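To make the distinction concrete, here is a minimal sketch with made-up data points: it fits an ordinary least squares line using NumPy and compares the vertical residuals that OLS minimizes with the geometrically perpendicular (orthogonal) distances that orthogonal regression would minimize instead.

import numpy as np

# Hypothetical data points for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.8, 8.2, 9.9])

# Fit an ordinary least squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Vertical residuals: what OLS squares and sums
vertical_residuals = y - y_hat

# Orthogonal distances: what orthogonal regression would minimize instead
orthogonal_distances = np.abs(slope * x - y + intercept) / np.sqrt(slope**2 + 1)

print("Sum of squared vertical residuals:  ", np.sum(vertical_residuals**2))
print("Sum of squared orthogonal distances:", np.sum(orthogonal_distances**2))

Here np.polyfit is simply a convenient way to obtain the OLS slope and intercept; the point of the sketch is the contrast between the two kinds of distances.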

Here's where our mathematical toolkit expands: we can either capture the essence of these distances by taking absolute values or by squaring them. The choice of absolute values leads us down the path of minimizing the sum of direct distances, a straightforward approach. On the other hand, squaring the distances launches us into the realm of the least squares method, a technique that not only magnifies larger errors but also cleverly eradicates any concern of negative distances clouding our summation.

The act of squaring distances in the least squares method does more than just remove negatives; it disproportionately emphasizes larger errors, ensuring that our model pays heed to significant deviations. This method becomes our compass, steering us through the stormy seas of data analysis.

The decision to use absolute values or squares pivots on the unique landscape of the problem at hand. In scenarios like financial modeling, where the scales and stakes of over- and under-predictions may differ vastly, absolute values hold their ground. However, the least squares method, with its robustness and adaptability, often takes center stage, especially when we seek a model that not only learns but also adapts with grace.

This dance of losses, with its steps of adjustment and refinement, leads us on the shortest path to uncovering the truth hidden within our data. It is a dance that demands both precision and intuition, guiding our model to a harmony that resonates with the underlying reality it seeks to capture.

To demonstrate how the two loss functions behave with absolute values and with squaring, let's create simple Python examples for both approaches. These examples illustrate the computation of total loss over a dataset when using the absolute value method (the L1 loss, whose average over observations is the mean absolute error) and the squaring method (the L2 loss, whose average is the mean squared error).

Example 1: Minimizing Loss Using Absolute Values (L1 Loss)

The L1 loss function computes the total absolute difference between the actual values and the predicted values. It's defined as:

loss-abs.png

where y_i are the actual values, y_i_hat are the predicted values, and n is the number of observations.
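In LaTeX notation, the formula pictured above (assuming the standard sum-of-absolute-errors form) reads:

L_1 = \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|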

import numpy as np

# Example data: actual and predicted values
y_actual = np.array([3, -0.5, 2, 7])
y_predicted = np.array([2.5, 0.0, 2, 8])

# Compute L1 loss
l1_loss = np.sum(np.abs(y_actual - y_predicted))

print("L1 Loss:", l1_loss)

Example 2: Minimizing Loss Using Squaring (L2 Loss)

The L2 loss function computes the total squared difference between the actual values and the predicted values. It's defined as:

loss-mse.png

This approach gives more weight to larger errors and is defined as the mean squared error when averaged over all observations.
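In LaTeX notation, the formula pictured above (assuming the standard sum-of-squared-errors form, with the MSE as its average) reads:

L_2 = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2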

import numpy as np

# Example data: actual and predicted values
y_actual = np.array([3, -0.5, 2, 7])
y_predicted = np.array([2.5, 0.0, 2, 8])

# Compute L2 loss
l2_loss = np.sum((y_actual - y_predicted) ** 2)

print("L2 Loss:", l2_loss)

These examples demonstrate the two fundamental approaches to calculating loss in regression models. The choice between L1 and L2 loss can depend on the specifics of the problem at hand, such as the presence of outliers (L1 is more robust to outliers) or the importance of penalizing large errors more heavily (L2 does this by squaring the errors).
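As a small illustration of that trade-off, the sketch below (using made-up values) shows how a single large miss inflates the L2 loss far more than the L1 loss:

import numpy as np

# Hypothetical actual values and two sets of predictions
y_actual = np.array([3.0, -0.5, 2.0, 7.0])
y_close = np.array([2.5, 0.0, 2.0, 8.0])     # all predictions reasonably close
y_outlier = np.array([2.5, 0.0, 2.0, 17.0])  # same predictions, but one large miss

for label, y_pred in [("close predictions", y_close), ("one large miss", y_outlier)]:
    l1 = np.sum(np.abs(y_actual - y_pred))
    l2 = np.sum((y_actual - y_pred) ** 2)
    print(f"{label}: L1 = {l1:.2f}, L2 = {l2:.2f}")

With these made-up numbers, the single large miss grows the L1 loss by roughly a factor of five but the L2 loss by more than a factor of sixty, which is why squared losses react so strongly to outliers.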

With the groundwork we've laid on understanding the Mean Squared Error (MSE) as a loss function, the purpose and utility of MSE in computational models will become clearer, especially when encountered in code snippets like the one below. MSE is a cornerstone in the realm of machine learning for quantifying the difference between the predicted values by a model and the actual values from the dataset. It's instrumental in steering the model adjustments during the training phase to enhance prediction accuracy.

import torch
import torch.nn as nn

# Define a simple dataset
X = torch.tensor([[1.0], [2.0], [3.0]]) # Input features
y = torch.tensor([[2.0], [4.0], [6.0]]) # Actual outputs

# Initialize a linear regression model
model = nn.Linear(in_features=1, out_features=1)

# Specify the Mean Squared Error Loss function
loss_fn = nn.MSELoss()

# Perform a forward pass to get the model's predictions
predictions = model(X)

# Calculate the MSE loss between the model's predictions and the actual values
mse_loss = loss_fn(predictions, y)

print(f"MSE Loss: {mse_loss.item()}")

This code exemplifies a fundamental pattern in training machine learning models, highlighting why MSE is a preferred metric for evaluating performance. By squaring the differences between predicted and actual values, MSE not only penalizes larger errors more severely but also provides a smooth gradient for optimization, making it a powerful tool for guiding models towards better accuracy. In essence, the use of MSE in loss calculation is a testament to its effectiveness in refining models to closely mirror the underlying patterns of the data they aim to represent.
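Building on the snippet above, here is a minimal sketch of the complete feedback loop this chapter describes: predict, measure the MSE loss, and let an optimizer nudge the model's weight and bias. The learning rate and the number of steps are illustrative assumptions rather than tuned values.

import torch
import torch.nn as nn

# Same toy dataset as in the snippet above
X = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(1000):
    predictions = model(X)          # predict
    loss = loss_fn(predictions, y)  # measure how far off we are
    optimizer.zero_grad()           # clear gradients from the previous step
    loss.backward()                 # compute gradients of the loss
    optimizer.step()                # adjust weight and bias downhill

# For this data the weight should settle near 2 and the bias near 0
print(f"Final MSE Loss: {loss.item():.6f}")
print(f"Learned weight: {model.weight.item():.3f}, bias: {model.bias.item():.3f}")

Each pass through the loop is one small step in the dance of adjustment described earlier: the gradient of the MSE points uphill, so the optimizer moves the parameters in the opposite direction.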

The Perils of Applauding Erroneous Strategies

In the realm of artificial intelligence, reinforcing positive behaviors and penalizing mistakes is a cornerstone principle for fostering accurate and efficient learning in AI models. This methodology mirrors the ideal human cognitive process, where rational and wise decision-making is rewarded, and errors are used as learning opportunities.

However, a conundrum arises when humans, in their own practices, inadvertently celebrate their missteps. This behavior starkly contrasts with the aspirational benchmarks AI seeks to emulate—the epitome of human rationality and wisdom.

Consider, for instance, the folly of professing a 200% confidence level—a concept that defies statistical reasoning and reflects a misalignment with the principles of rational judgment. Similarly, employing leverage in scenarios characterized by a negative long-term expected return exemplifies a fundamental misunderstanding of risk, akin to compounding one's own losses:

-1 x 2 = -2
-1 x 10 = -10

Engaging in such practices is analogous to repeatedly multiplying negatives, only to achieve increasingly adverse outcomes. This cycle of reinforcing detrimental habits ensures that neither AI nor human practitioners learn from their errors, setting the stage for a repetitive sequence of failures, much like AI models trapped in a flawed regression loop.

Consider the evaluation of a model that consistently reinforces incorrect decisions:

Training accuracy: 0.0002
Validation accuracy: 0.00001
Test accuracy: 0.00000001

The metrics speak volumes. Persisting with a model yielding such dismal performance metrics is akin to endorsing inefficacy. The logical conclusion is unequivocal—a fresh start is imperative. Persisting with a fundamentally flawed approach, be it in AI modeling or human decision-making, is a venture into futility. The path forward demands a recalibration of strategies, discarding those that reward errors in favor of those that align with the principles of sound, rational, and effective learning.

A Path to Perfection: AI and the Human Quest

agi-the-art-of-raising-intelligence.jpeg

Mistakes are universal. From artificial intelligences (AIs) to artificial general intelligences (AGIs), and even within theological discussions, the concept of imperfection permeates. Some philosophies and faiths propose that imperfection itself is a divine gift, offering a canvas for growth and learning. This perspective raises intriguing questions about the nature of perfection in both divine and artificial constructs.

Perfection, in its absolute sense, leaves no room for evolution or discovery, akin to a story with its ending known or a game fully mastered. The allure of such experiences wanes because their outcomes offer no surprise. In this light, true AGI or divine entities, if claiming perfection, would contradict the very essence of growth and exploration.

In the divine schema, perfection might symbolize the ability to foster a world rich in dynamism and potential, a concept perhaps too vast for human understanding. Yet, in striving for this elusive perfection, humanity finds its rhythm through trial and error, much like AI models refining their algorithms.

We engage in a perpetual dance with truth, making predictions, assessing their accuracy against reality, and adjusting based on the gap between our expectations and outcomes. In these moments of alignment or discrepancy, we find our lessons, aiming for a balance where errors are minimized and understanding deepened.

AI models exemplify this journey towards refinement, employing methods like gradient descent to navigate the complexities of data. This mathematical technique, akin to descending a mountain in search of its lowest point, symbolizes the meticulous process of optimization—a quest for the point where predictions align seamlessly with reality.
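For readers who want to see that mountain descent in miniature, here is a sketch of gradient descent on the MSE of a one-parameter model y = w * x, with made-up data and an arbitrary learning rate:

import numpy as np

# Hypothetical data that roughly follows y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

w = 0.0              # start somewhere on the mountain
learning_rate = 0.01

for step in range(200):
    predictions = w * x
    errors = predictions - y
    gradient = 2 * np.mean(errors * x)  # derivative of the MSE with respect to w
    w -= learning_rate * gradient       # take a small step downhill

print("Learned slope w:", round(w, 3))  # settles near 2, the slope underlying the data

The learning rate plays the role of stride length on the descent, which leads naturally to the next point about the speed of learning.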

Yet, the speed of learning is not the sole measure of success. Just as hasty decisions can lead to misunderstandings in human interactions, AI models benefit from a paced and thoughtful learning process. Overfitting and underfitting serve as cautionary tales, reminding us of the dangers of too narrowly interpreting data or failing to capture its underlying patterns.

This narrative extends beyond AI, reflecting a broader human condition. Our journey towards knowledge and understanding is punctuated by successes and setbacks, each offering valuable insights. Like AI, we thrive on diverse experiences and the richness of exploration, using our mistakes as stepping stones towards greater wisdom.

As we continue to explore AI and its parallels with human learning, we're reminded of the intricate dance between knowledge and discovery. The next chapter will delve into the realm of Calculus, further unraveling the mathematical tapestries that underpin our quest for understanding.

Baldur's Gate Infinite - A Simulated Universe of AGIs

balders-gate-infinite-1.jpeg

balders-gate-infinite-2.jpeg

balders-gate-infinite-3.jpeg

balders-gate-infinite-4.jpeg

Let's entertain a concept, one that was inspired by my time playing Baldur's Gate 3.

Imagine if we could construct a game of such extraordinary magnitude and complexity that it surpasses the boundaries of our current comprehension. A game where we play god, so to speak. Let's call this hypothetical masterpiece "Baldur's Gate Infinite."

In this hypothetical game, let's make the following assumptions:

  • Every entity, whether animate or inanimate, possesses an intelligence that surpasses human comprehension.
  • Each entity, animate or inanimate, rolls a die to determine the next phase of its existence.
  • Every interaction between entities triggers an infinite number of related die rolls, each influencing the others.
  • The outcome of each die roll can only be altered by adding extra chances and bonuses accumulated by the entity rolling the die.

In Baldur's Gate 3, we engage with the roll of a 20-sided die. However, in the hypothetical realm of Baldur's Gate Infinite, we would be dealing with the roll of an infinite-sided die. Quite fitting, isn't it?

Doesn't this concept loosely echo our understanding of the universe, which is grounded in quantum mechanics and the uncertainty principle?

Now, imagine if every entity in 'Baldur's Gate Infinite' was endowed with artificial general intelligence, allowing it to carve its own path and shape its destiny. What kind of game would that be?

In current games, NPCs and other inanimate objects are programmed to perform specific actions. Even though we sometimes refer to them as 'AI', they are not truly autonomous. This is akin to how we used to refer to Alexa and Siri as 'AI assistants'.

In 'Baldur's Gate Infinite', each playthrough should be a unique experience because every NPC evolves differently based on your interactions. There are too many variables to replicate the exact same outcome and characters in a subsequent playthrough, no matter how meticulously you try to recreate your previous actions.

It's important to note that in Baldur's Gate Infinite, there is no 'save' feature. Every interaction and roll of the die is final and irreversible.

So, the experience of playing 'Baldur's Gate Infinite', a game with infinite possibilities, should be thrillingly unpredictable. Even if you play it a second time, retaining all the memories from your first playthrough, you can't guarantee any specific outcome. Your chances of success or failure won't be better or worse. It's always a fresh start, not a new game plus, as your achievements from the first playthrough won't carry over.

It's always a new game, with unknown chances of winning or losing.

How would you approach it?

Personally, I would choose to forget my first playthrough entirely. Since it won't provide any advantage, and could potentially reduce the fun and increase the risk, starting afresh seems like the best option.

The idea might seem amusing at first, but consider this: we are on a trajectory towards creating such games. Given infinite time, data, and compute, we could endow every entity with artificial general intelligence. Allow this concept some time to settle in, and you might find yourself in agreement.

This signifies the completion of our trilogy on Statistics, a journey that has taken us through the foundational aspects of this vast domain. While we've covered significant ground, it's important to remember that the journey into data science demands a deeper dive beyond what we've explored here. Nonetheless, for those seeking to grasp the fundamentals of statistics, its real-world applications, and the basics of coding AI models, this series offers a solid foundation. We've aimed to demystify the essentials, providing a springboard for further exploration and learning in the fascinating world of data and artificial intelligence.

As we prepare to turn the digital page, our adventure beckons us into the intricate world of Calculus. Here, we will continue to weave through the mathematical tapestries that are foundational to our quest for understanding, exploring the nuanced interplay of change and motion that Calculus so elegantly captures. Our exploration is far from over; in fact, a new chapter of discovery awaits, promising deeper insights and further enlightenment in the magical landscape of mathematics.