Commit
Showing 20 changed files with 14 additions and 0 deletions.
Binary file added (BIN, +73.8 KB): ...ze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-22-3.png
Binary file added (BIN, +75.5 KB): ...ze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-24-1.png
Binary file added (BIN, +37.8 KB): ...ze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-32-1.png
_freeze/posts/machine-learning-part1/05-metrics/index/execute-results/html.json (14 changes: 14 additions & 0 deletions)
@@ -0,0 +1,14 @@
{
  "hash": "924d0a3cf1e163dd82c56852b67cd6a2",
  "result": {
    "markdown": "---\ntitle: \"Defining Success\"\nauthor: \"Francois de Ryckel\"\ndate: \"2024-04-16\"\ncategories: [sklearn, tidymodel]\neditor: source\ndate-modified: '2024-04-20'\nexecute:\n  cache: true\n---\n\n\nWhen evaluating models for a given ML algorithm, we need to define in advance what our metric for measuring success will be. How would we decide that this model is better than that model? Or even which hyper-parameters fine-tune a model better?\n\nThis post is about defining what is *'best'* or *'better'* when comparing different **supervised models**. We'll have 2 main parts: measures of success for regression models and measures of success for classification models.\n\n# Regression models\n\nWhen modeling for regression, we somehow **measure the distance between our prediction and the actual observed value**. When comparing models, we usually want to keep the model which gives the smallest sum of distances.\n\n## RMSE\n\nThis is probably the most well-known measure when comparing regression models. Because we are squaring the distance between the predicted and the observed values, it penalizes predicted values that are far off the real values. Hence this measure is used when we want to avoid 'outlier' predictions (predictions that are far off).\n\n$$RMSE = \\sqrt{\\frac{\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2}{n}}$$\n\n## MAE\n\nWith **Mean Absolute Error**, we just take the average of the absolute errors. Useful when we don't really care whether predictions are far off from the observed data.\n\n$$MAE = \\frac{\\sum_{i=1}^{n} \\lvert y_i - \\hat{y}_i \\rvert}{n}$$\n\n## Huber Loss\n\nHuber loss is a mixture of RMSE and MAE: kind of the best of both worlds, basically. It is quadratic for small errors (below a threshold $\\delta$) and linear for large ones.\n\n$$L_{\\delta}(y_i, \\hat{y}_i) = \\begin{cases} \\frac{1}{2}(y_i - \\hat{y}_i)^2 & \\text{if } \\lvert y_i - \\hat{y}_i \\rvert \\le \\delta \\\\ \\delta \\lvert y_i - \\hat{y}_i \\rvert - \\frac{1}{2}\\delta^2 & \\text{otherwise} \\end{cases}$$\n\n\n# Classification models\n\n## Accuracy\n\nShortcomings:\n\n* for an imbalanced dataset, we can get good accuracy by just predicting the most frequent class for every observation. For instance, in the case of a rare disease or a big financial meltdown, we can just predict the majority class (no disease, no meltdown) and still score a high accuracy.\n\n## Precision\n\nIf you call it true, is it indeed true? In other words, the proportion of predicted positives that are actually positive.\n\n## Recall\n\nIf there is a positive, did the model predict a positive? In other words, the proportion of actual positives that the model caught.\n\n\n## F1 score\n\nIt is the **harmonic mean** of precision and recall. The harmonic mean penalizes models that have very low precision or recall, which wouldn't be the case with the arithmetic mean.\n\n$$F_1 = \\frac{2 \\cdot Precision \\cdot Recall}{Precision + Recall}$$\n\n## AUC & ROC Curve\n\nWe need to get the predictions as probabilities rather than hard class labels.\n\n::: {.cell hash='index_cache/html/unnamed-chunk-1_f8febe300bdf5210f7934e8f7c002c44'}\n\n```{.r .cell-code}\nlibrary(yardstick)\n```\n:::\n",
    "supporting": [],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {},
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}
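The frozen post above defines RMSE, MAE, and Huber loss but stops short of computing them. As a minimal sketch (not part of this commit), here is how these regression metrics could be computed with the yardstick package the post already loads; the observed values and predictions below are made-up toy numbers for illustration only:

```r
library(yardstick)

# Toy data: made-up observed values and model predictions, purely illustrative
obs  <- c(3.2, 4.1, 5.0, 6.3, 7.1)
pred <- c(3.0, 4.6, 4.7, 6.0, 7.5)

rmse_vec(truth = obs, estimate = pred)        # RMSE: squares errors, so large misses dominate
mae_vec(truth = obs, estimate = pred)         # MAE: plain average of absolute errors
huber_loss_vec(truth = obs, estimate = pred)  # Huber: quadratic for small errors, linear past delta (default delta = 1)
```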
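Likewise for the classification metrics, a sketch using yardstick's vector helpers on a toy two-class example; the factor levels and probabilities are assumptions for illustration, and yardstick treats the first factor level as the event by default:

```r
library(yardstick)

# Toy two-class data; "pos" is listed first, so it is the event class
truth    <- factor(c("pos", "pos", "pos", "neg", "neg", "neg"), levels = c("pos", "neg"))
estimate <- factor(c("pos", "neg", "pos", "neg", "neg", "pos"), levels = c("pos", "neg"))

accuracy_vec(truth, estimate)   # share of correct predictions
precision_vec(truth, estimate)  # of the predicted positives, how many are truly positive?
recall_vec(truth, estimate)     # of the actual positives, how many did the model catch?
f_meas_vec(truth, estimate)     # harmonic mean of precision and recall (F1 when beta = 1)

# ROC AUC needs class probabilities rather than hard labels
probs <- c(0.9, 0.4, 0.8, 0.2, 0.3, 0.6)  # assumed P(pos) for each observation
roc_auc_vec(truth, probs)
```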
Binary file added (BIN, +20.5 KB): ...sts/machine-learning-part1/01-knn/index_files/figure-html/py_scatterplot1-1.png
Binary file modified (BIN, -2.85 KB, 99%): ...osts/machine-learning-part1/01-knn/index_files/figure-html/scatterplot3_r-5.png
Binary file added (BIN, +75.5 KB): ...ts/machine-learning-part1/01-knn/index_files/figure-html/unnamed-chunk-24-1.png
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Empty file.
Binary file added (BIN, +125 Bytes): ...arning-part1/01-knn/index_cache/html/py_scatterplot2_615dc31a592cc2155de0ee60f7526871.rdx
Binary file added (BIN, +2.96 KB): ...rt1/01-knn/index_cache/html/r_baseMoel_prediction1_136062dc13140f9fe797e69bbf2739ab.RData
File renamed without changes.
File renamed without changes.
Binary file added (BIN, +3.47 KB): ...rning-part1/01-knn/index_cache/html/r_scatterplot2_4824b59dcc9eba957dfc7e37abec471c.RData
Binary file added (BIN, +2.83 KB): ...ing-part1/01-knn/index_cache/html/r_tuningHyperpa3_bd3e1003495f48f17bf1edc920be55a4.RData