diff --git a/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-22-3.png b/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-22-3.png new file mode 100644 index 0000000..4ea29c3 Binary files /dev/null and b/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-22-3.png differ diff --git a/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-24-1.png b/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-24-1.png new file mode 100644 index 0000000..84526bb Binary files /dev/null and b/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-24-1.png differ diff --git a/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-32-1.png b/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-32-1.png new file mode 100644 index 0000000..87394f9 Binary files /dev/null and b/_freeze/posts/machine-learning-part1/01-knn/index/figure-html/unnamed-chunk-32-1.png differ diff --git a/_freeze/posts/machine-learning-part1/05-metrics/index/execute-results/html.json b/_freeze/posts/machine-learning-part1/05-metrics/index/execute-results/html.json new file mode 100644 index 0000000..26422fa --- /dev/null +++ b/_freeze/posts/machine-learning-part1/05-metrics/index/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "924d0a3cf1e163dd82c56852b67cd6a2", + "result": { + "markdown": "---\ntitle: \"Defining Success\"\nauthor: \"Francois de Ryckel\"\ndate: \"2024-04-16\"\ncategories: [sklearn, tidymodel]\neditor: source\ndate-modified: '2024-04-20'\nexecute:\n cache: true\n---\n\n\nWhen evaluating models for a given ML algorithm, we need to define in advance which metric will measure success. How do we decide whether one model is better than another? Or which hyper-parameters tune a model best? 
\n\nThis post is about defining what is *'best'* or *'better'* when comparing different **supervised models**. It has two main parts: measures of success for regression models and measures of success for classification models. \n\n# Regression models \n\nWhen modeling for regression, we **measure the distance between our prediction and the actual observed value**. When comparing models, we usually want to keep the model that gives the smallest aggregate distance. \n\n## RMSE\n\n**Root Mean Squared Error** is probably the most well-known measure when comparing regression models. Because we square the distance between the predicted and the observed values, it penalizes predictions that are far off the real values more heavily. Hence this measure is used when we want to avoid 'outlier' predictions (predictions that are far off).\n\n$$RMSE = \\sqrt{\\frac{\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2}{n}}$$\n\n## MAE\n\nWith **Mean Absolute Error**, we just take the average of the absolute errors. Useful when we don't need to penalize predictions that are far off from the observed data more than proportionally. \n\n$$MAE = \\frac {\\sum_{i=1}^{n} \\lvert y_i - \\hat{y}_i \\rvert}{n}$$\n\n## Huber Loss\n\nHuber loss is a mixture of the squared error behind RMSE and the absolute error behind MAE: it is quadratic for errors smaller than a threshold $\\delta$ and linear beyond it. Kind of the best of both worlds. \n\n$$L_\\delta(y_i, \\hat{y}_i) = \\begin{cases} \\frac{1}{2}(y_i - \\hat{y}_i)^2 & \\text{if } \\lvert y_i - \\hat{y}_i \\rvert \\le \\delta \\\\ \\delta \\left( \\lvert y_i - \\hat{y}_i \\rvert - \\frac{1}{2}\\delta \\right) & \\text{otherwise} \\end{cases}$$\n\n\n# Classification models \n\n## Accuracy \n\nShortcomings: \n\n* For an imbalanced dataset, we can get good accuracy by just predicting the most frequent class for most observations. For instance, in the case of a rare disease or a big financial meltdown, we can just predict that it never happens and still get a high accuracy. \n\n## Precision \n\nIf you call it true, is it indeed true? In other words, the proportion of predicted positives that are actually positive. \n\n## Recall \n\nIf there is a positive, did the model predict a positive? In other words, the proportion of actual positives that the model predicted as positive. \n\n\n## F1 score \n\nIt is the **harmonic mean** of precision and recall. The harmonic mean penalizes models that have very low precision or recall, which wouldn't be the case with the arithmetic mean. 
\n\n$$F1 = \\frac{2 \\cdot Precision \\cdot Recall}{Precision + Recall}$$\n\n## AUC & ROC Curve\n\nTo compute the ROC curve and its AUC, we need the predictions as class probabilities rather than as hard class labels. \n\n::: {.cell hash='index_cache/html/unnamed-chunk-1_f8febe300bdf5210f7934e8f7c002c44'}\n\n```{.r .cell-code}\nlibrary(yardstick)\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/py_scatterplot1-1.png b/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/py_scatterplot1-1.png new file mode 100644 index 0000000..209c18e Binary files /dev/null and b/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/py_scatterplot1-1.png differ diff --git a/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/scatterplot3_r-5.png b/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/scatterplot3_r-5.png index 97c1b71..d3b252e 100644 Binary files a/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/scatterplot3_r-5.png and b/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/scatterplot3_r-5.png differ diff --git a/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/unnamed-chunk-24-1.png b/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/unnamed-chunk-24-1.png new file mode 100644 index 0000000..84526bb Binary files /dev/null and b/docs/posts/machine-learning-part1/01-knn/index_files/figure-html/unnamed-chunk-24-1.png differ diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_99c48e654f7984d0c8bd33ea3bc26fe9.rdb b/posts/machine-learning-part1/01-knn/index_cache/html/py-loadFinanData_7987215e3e9f572d65c22c105975fd99.rdb similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_99c48e654f7984d0c8bd33ea3bc26fe9.rdb rename to 
posts/machine-learning-part1/01-knn/index_cache/html/py-loadFinanData_7987215e3e9f572d65c22c105975fd99.rdb diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_99c48e654f7984d0c8bd33ea3bc26fe9.rdx b/posts/machine-learning-part1/01-knn/index_cache/html/py-loadFinanData_7987215e3e9f572d65c22c105975fd99.rdx similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_99c48e654f7984d0c8bd33ea3bc26fe9.rdx rename to posts/machine-learning-part1/01-knn/index_cache/html/py-loadFinanData_7987215e3e9f572d65c22c105975fd99.rdx diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot2_7abfe7bb22d8aae052fc2b56f0519406.rdb b/posts/machine-learning-part1/01-knn/index_cache/html/py_baseModel_prediction_f3f124f9627240ae17cace4747a8208c.rdb similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot2_7abfe7bb22d8aae052fc2b56f0519406.rdb rename to posts/machine-learning-part1/01-knn/index_cache/html/py_baseModel_prediction_f3f124f9627240ae17cace4747a8208c.rdb diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot2_7abfe7bb22d8aae052fc2b56f0519406.rdx b/posts/machine-learning-part1/01-knn/index_cache/html/py_baseModel_prediction_f3f124f9627240ae17cace4747a8208c.rdx similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot2_7abfe7bb22d8aae052fc2b56f0519406.rdx rename to posts/machine-learning-part1/01-knn/index_cache/html/py_baseModel_prediction_f3f124f9627240ae17cace4747a8208c.rdx diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot3_4b3619dd6837ef96842ac46c0733b534.rdb b/posts/machine-learning-part1/01-knn/index_cache/html/py_baseMoel_prediction1_590eeab07a772e8d9714b20477fa9692.rdb similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot3_4b3619dd6837ef96842ac46c0733b534.rdb rename to 
posts/machine-learning-part1/01-knn/index_cache/html/py_baseMoel_prediction1_590eeab07a772e8d9714b20477fa9692.rdb diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot3_4b3619dd6837ef96842ac46c0733b534.rdx b/posts/machine-learning-part1/01-knn/index_cache/html/py_baseMoel_prediction1_590eeab07a772e8d9714b20477fa9692.rdx similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot3_4b3619dd6837ef96842ac46c0733b534.rdx rename to posts/machine-learning-part1/01-knn/index_cache/html/py_baseMoel_prediction1_590eeab07a772e8d9714b20477fa9692.rdx diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/py_baseMoel_prediction2_e70ce6e086f62f7a1f9239c7b38cdd3c.rdb b/posts/machine-learning-part1/01-knn/index_cache/html/py_baseMoel_prediction2_e70ce6e086f62f7a1f9239c7b38cdd3c.rdb new file mode 100644 index 0000000..e69de29 diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/py_scatterplot2_615dc31a592cc2155de0ee60f7526871.rdx b/posts/machine-learning-part1/01-knn/index_cache/html/py_scatterplot2_615dc31a592cc2155de0ee60f7526871.rdx new file mode 100644 index 0000000..b7002d3 Binary files /dev/null and b/posts/machine-learning-part1/01-knn/index_cache/html/py_scatterplot2_615dc31a592cc2155de0ee60f7526871.rdx differ diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/r_baseMoel_prediction1_136062dc13140f9fe797e69bbf2739ab.RData b/posts/machine-learning-part1/01-knn/index_cache/html/r_baseMoel_prediction1_136062dc13140f9fe797e69bbf2739ab.RData new file mode 100644 index 0000000..e20a951 Binary files /dev/null and b/posts/machine-learning-part1/01-knn/index_cache/html/r_baseMoel_prediction1_136062dc13140f9fe797e69bbf2739ab.RData differ diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_r_1fed42550fd363f7d21e8e0bdc879949.rdb b/posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot1_560455c1405724dd9e685cf21c791e40.rdb similarity index 100% 
rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_r_1fed42550fd363f7d21e8e0bdc879949.rdb rename to posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot1_560455c1405724dd9e685cf21c791e40.rdb diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_r_1fed42550fd363f7d21e8e0bdc879949.rdx b/posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot1_560455c1405724dd9e685cf21c791e40.rdx similarity index 100% rename from posts/machine-learning-part1/01-knn/index_cache/html/scatterplot1_r_1fed42550fd363f7d21e8e0bdc879949.rdx rename to posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot1_560455c1405724dd9e685cf21c791e40.rdx diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot2_4824b59dcc9eba957dfc7e37abec471c.RData b/posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot2_4824b59dcc9eba957dfc7e37abec471c.RData new file mode 100644 index 0000000..7a32ae2 Binary files /dev/null and b/posts/machine-learning-part1/01-knn/index_cache/html/r_scatterplot2_4824b59dcc9eba957dfc7e37abec471c.RData differ diff --git a/posts/machine-learning-part1/01-knn/index_cache/html/r_tuningHyperpa3_bd3e1003495f48f17bf1edc920be55a4.RData b/posts/machine-learning-part1/01-knn/index_cache/html/r_tuningHyperpa3_bd3e1003495f48f17bf1edc920be55a4.RData new file mode 100644 index 0000000..4fb187e Binary files /dev/null and b/posts/machine-learning-part1/01-knn/index_cache/html/r_tuningHyperpa3_bd3e1003495f48f17bf1edc920be55a4.RData differ