From b98333587d1346ac5daff3d8d0a504c9ef2f30c4 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Tue, 19 Mar 2024 14:59:20 +0000 Subject: [PATCH] build based on 68f69b4 --- dev/.documenter-siteinfo.json | 2 +- dev/Api.html | 8 +- dev/Api_v2_developer.html | 4 +- dev/Api_v2_user.html | 4 +- dev/Benchmarks.html | 6 + dev/Clustering.html | 6 +- dev/Examples.html | 2 +- dev/GMM.html | 14 +- dev/Imputation.html | 12 +- dev/MLJ_interface.html | 50 +- dev/Nn.html | 36 +- dev/Perceptron.html | 8 +- dev/StyleGuide_templates.html | 4 +- dev/Trees.html | 6 +- dev/Utils.html | 43 +- dev/index.html | 4 +- dev/objects.inv | Bin 5698 -> 5750 bytes dev/search_index.js | 2 +- .../Betaml_tutorial_getting_started.html | 4 +- ...tutorial_classification_cars-5d3675ed.svg} | 46 +- ...tutorial_classification_cars-cc4fd638.svg} | 46 +- .../betaml_tutorial_classification_cars.html | 42 +- ...betaml_tutorial_cluster_iris-00862f78.svg} | 72 +- .../betaml_tutorial_cluster_iris.html | 10 +- ...taml_tutorial_multibranch_nn-915b89af.svg} | 52 +- ...taml_tutorial_multibranch_nn-ee2a7d64.svg} | 254 ++-- .../betaml_tutorial_multibranch_nn.html | 6 +- ...rial_regression_sharingBikes-04274d94.svg} | 64 +- ...rial_regression_sharingBikes-1840719c.svg} | 1142 ++++++++-------- ...rial_regression_sharingBikes-3917c772.svg} | 420 +++--- ...rial_regression_sharingBikes-3dcbf167.svg} | 428 +++--- ...rial_regression_sharingBikes-445fdc7c.svg} | 64 +- ...rial_regression_sharingBikes-687c029d.svg} | 424 +++--- ...rial_regression_sharingBikes-7ca0b26e.svg} | 1154 ++++++++--------- ...rial_regression_sharingBikes-832944fc.svg} | 1142 ++++++++-------- ...rial_regression_sharingBikes-9d9c07ea.svg} | 64 +- ...rial_regression_sharingBikes-9f2283a5.svg} | 64 +- ...rial_regression_sharingBikes-a018a4dd.svg} | 64 +- ...rial_regression_sharingBikes-ad52845b.svg} | 64 +- ...rial_regression_sharingBikes-ba588daa.svg} | 52 +- ...rial_regression_sharingBikes-c13c3d88.svg} | 64 +- 
...rial_regression_sharingBikes-c4671433.svg} | 424 +++--- ...rial_regression_sharingBikes-e6db0fad.svg} | 64 +- ...rial_regression_sharingBikes-eb7964a6.svg} | 1142 ++++++++-------- ...rial_regression_sharingBikes-f11408b5.svg} | 64 +- ...rial_regression_sharingBikes-fff0b19f.svg} | 66 +- ...taml_tutorial_regression_sharingBikes.html | 40 +- 47 files changed, 3880 insertions(+), 3873 deletions(-) create mode 100644 dev/Benchmarks.html rename dev/tutorials/Classification - cars/{betaml_tutorial_classification_cars-08cd8a42.svg => betaml_tutorial_classification_cars-5d3675ed.svg} (90%) rename dev/tutorials/Classification - cars/{betaml_tutorial_classification_cars-bf76b088.svg => betaml_tutorial_classification_cars-cc4fd638.svg} (90%) rename dev/tutorials/Clustering - Iris/{betaml_tutorial_cluster_iris-c117f6b5.svg => betaml_tutorial_cluster_iris-00862f78.svg} (86%) rename dev/tutorials/Multi-branch neural network/{betaml_tutorial_multibranch_nn-f226ab71.svg => betaml_tutorial_multibranch_nn-915b89af.svg} (87%) rename dev/tutorials/Multi-branch neural network/{betaml_tutorial_multibranch_nn-0d345d12.svg => betaml_tutorial_multibranch_nn-ee2a7d64.svg} (80%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-57973fed.svg => betaml_tutorial_regression_sharingBikes-04274d94.svg} (87%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-fd386bbe.svg => betaml_tutorial_regression_sharingBikes-1840719c.svg} (72%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-fb217b8a.svg => betaml_tutorial_regression_sharingBikes-3917c772.svg} (79%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-6ffd2a5c.svg => betaml_tutorial_regression_sharingBikes-3dcbf167.svg} (79%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-40891983.svg => betaml_tutorial_regression_sharingBikes-445fdc7c.svg} (89%) 
rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-a876a07a.svg => betaml_tutorial_regression_sharingBikes-687c029d.svg} (79%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-5aa275e5.svg => betaml_tutorial_regression_sharingBikes-7ca0b26e.svg} (73%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-a13e0a0d.svg => betaml_tutorial_regression_sharingBikes-832944fc.svg} (72%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-660a6d67.svg => betaml_tutorial_regression_sharingBikes-9d9c07ea.svg} (90%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-5f85076a.svg => betaml_tutorial_regression_sharingBikes-9f2283a5.svg} (90%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-34a5da8e.svg => betaml_tutorial_regression_sharingBikes-a018a4dd.svg} (90%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-65342eec.svg => betaml_tutorial_regression_sharingBikes-ad52845b.svg} (89%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-996a1433.svg => betaml_tutorial_regression_sharingBikes-ba588daa.svg} (89%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-14a15b5b.svg => betaml_tutorial_regression_sharingBikes-c13c3d88.svg} (86%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-ce6f6faf.svg => betaml_tutorial_regression_sharingBikes-c4671433.svg} (79%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-d7735aab.svg => betaml_tutorial_regression_sharingBikes-e6db0fad.svg} (87%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-abadc39b.svg => betaml_tutorial_regression_sharingBikes-eb7964a6.svg} (72%) rename dev/tutorials/Regression 
- bike sharing/{betaml_tutorial_regression_sharingBikes-d20630c7.svg => betaml_tutorial_regression_sharingBikes-f11408b5.svg} (86%) rename dev/tutorials/Regression - bike sharing/{betaml_tutorial_regression_sharingBikes-848cd3de.svg => betaml_tutorial_regression_sharingBikes-fff0b19f.svg} (87%) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 2847d28..019bfb9 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2024-03-18T12:33:20","documenter_version":"1.3.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2024-03-19T14:59:10","documenter_version":"1.3.0"}} \ No newline at end of file diff --git a/dev/Api.html b/dev/Api.html index cce8494..7228d7d 100644 --- a/dev/Api.html +++ b/dev/Api.html @@ -3,9 +3,9 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Api Module

BetaML.ApiModule
Api

The Api Module (currently v2)

This module includes the shared API used across the various BetaML submodules, i.e. names used by more than one submodule.

Modules are free to use other functions, but these are defined here to avoid name conflicts and to allow Multiple Dispatch to handle them instead. For an overall, user-perspective description of the BetaML API see the page API V2: Introduction for users, while for the implementation of the API see the page API V2: For developers.

source

Module Index

Detailed API

BetaML.Api.FIXEDRNGConstant

Fixed RNG to allow reproducible results

Use it with:

  • myAlgorithm(;rng=FIXEDRNG) # always produce the same sequence of results on each run of the script ("pulling" from the same rng object on different calls)
  • myAlgorithm(;rng=copy(FIXEDRNG)) # always produce the same result (new rng object on each function call)
source
BetaML.Api.FIXEDSEEDConstant
const FIXEDSEED

Fixed seed to allow reproducible results. This is the seed used to obtain the same results under unit tests.

Use it with:

  • myAlgorithm(;rng=MyChoosenRNG(FIXEDSEED)) # always produce the same sequence of results on each run of the script ("pulling" from the same rng object on different calls)
  • myAlgorithm(;rng=copy(MyChoosenRNG(FIXEDSEED))) # always produce the same result (new rng object on each call)
source
BetaML.Api.BML_optionsType
mutable struct BML_options <: BetaMLOptionsSet

A struct defining the options used by default by the algorithms that do not override it with their own option sets.

Fields:

  • cache::Bool: Cache the results of the fitting stage, as to allow predict(mod) [default: true]. Set it to false to save memory for large data.

  • descr::String: An optional title and/or description for this model

  • autotune::Bool: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control autotuning through the option tunemethod (see the model hyper-parameters)

  • verbosity::Verbosity: The verbosity level to be used in training or prediction: NONE, LOW, STD [default], HIGH or FULL

  • rng::Random.AbstractRNG: Random Number Generator (see ?FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Even if a model doesn't override BML_options, it may not use all of its options. For example, deterministic models make no use of the rng parameter; passing such parameters in these cases simply has no effect.

Example:

julia> options = BML_options(cache=false,descr="My model")
source
BetaML.Api.VerbosityType
primitive type Verbosity <: Enum{Int32} 32

Many models and functions accept a verbosity parameter.

Choose between: NONE, LOW, STD [default], HIGH and FULL.

source
BetaML.Api.fit!Method
fit!(m::BetaMLModel,X,[y])

Fit ("train") a BetaMLModel (i.e. learn the algorithm's parameters) based on data, either only features or features and labels.

Each specific model implements its own version of fit!(m,X,[Y]), but the usage is consistent across models.

Notes:

  • For online algorithms, i.e. models that support updating the learned parameters with new data, fit! can be repeated as new data arrive, although not all algorithms guarantee that training on one record at a time is equivalent to training on all the records at once.
  • If the model has been trained with the cache option set to true (the default), fit! returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function.
  • In Python and other languages that don't allow the exclamation mark within the function name, use fit_ex(⋅) instead of fit!(⋅)
source
BetaML.Api.hyperparametersMethod
hyperparameters(m::BetaMLModel)

Returns the hyperparameters of a BetaML model. See also ?options for the parameters that do not directly affect learning.

Warning

The returned object is a reference, so if it is modified, the corresponding object in the model will change too.

source
BetaML.Api.infoMethod
info(m::BetaMLModel) -> Any
-

Return a string-keyed dictionary of "additional" information stored during model fitting.

source
BetaML.Api.inverse_predictMethod
inverse_predict(m::BetaMLModel,X)

Given a model m that, fitted on x, produces xnew, this function takes xnew and returns x (or possibly an approximation of it).

For example, when OneHotEncoder is fitted with a subset of the possible categories and the handle_unknown option is set to infrequent, inverse_predict will aggregate all the other categories as specified in other_categories_name.

Notes:

  • Implemented only in a few models.
source
BetaML.Api.model_loadFunction
model_load(filename::AbstractString)
+

The BetaML.Api Module

BetaML.ApiModule
Api

The Api Module (currently v2)

This module includes the shared API used across the various BetaML submodules, i.e. names used by more than one submodule.

Modules are free to use other functions, but these are defined here to avoid name conflicts and to allow Multiple Dispatch to handle them instead. For an overall, user-perspective description of the BetaML API see the page API V2: Introduction for users, while for the implementation of the API see the page API V2: For developers.

source

Module Index

Detailed API

BetaML.Api.FIXEDRNGConstant

Fixed RNG to allow reproducible results

Use it with:

  • myAlgorithm(;rng=FIXEDRNG) # always produce the same sequence of results on each run of the script ("pulling" from the same rng object on different calls)
  • myAlgorithm(;rng=copy(FIXEDRNG)) # always produce the same result (new rng object on each function call)
source
BetaML.Api.FIXEDSEEDConstant
const FIXEDSEED

Fixed seed to allow reproducible results. This is the seed used to obtain the same results under unit tests.

Use it with:

  • myAlgorithm(;rng=MyChoosenRNG(FIXEDSEED)) # always produce the same sequence of results on each run of the script ("pulling" from the same rng object on different calls)
  • myAlgorithm(;rng=copy(MyChoosenRNG(FIXEDSEED))) # always produce the same result (new rng object on each call)
source
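The difference between passing an rng object and passing a copy of it can be sketched with the Random standard library alone. This is a minimal illustration, not BetaML code: `SEED`, `base_rng` and `draw` are hypothetical stand-ins for FIXEDSEED, FIXEDRNG and a generic algorithm accepting an `rng` keyword.

```julia
using Random

const SEED = 123                    # hypothetical seed, standing in for FIXEDSEED
base_rng = MersenneTwister(SEED)    # hypothetical stand-in for FIXEDRNG

# A toy "algorithm" that draws n numbers from the rng it receives
draw(; rng=Random.GLOBAL_RNG, n=3) = rand(rng, n)

# Same rng object on every call: each call continues the same stream,
# so the script is reproducible but the two calls give different numbers
shared = MersenneTwister(SEED)
a1 = draw(rng=shared)
a2 = draw(rng=shared)

# A fresh copy on every call: each call restarts from the same state,
# so every call gives the same numbers
b1 = draw(rng=copy(base_rng))
b2 = draw(rng=copy(base_rng))

println(a1 == a2)   # the stream has advanced between the two calls
println(b1 == b2)   # both calls start from an identical state
```

Copying leaves `base_rng` itself untouched, which is why repeated calls stay identical.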
BetaML.Api.BML_optionsType
mutable struct BML_options <: BetaMLOptionsSet

A struct defining the options used by default by the algorithms that do not override it with their own option sets.

Fields:

  • cache::Bool: Cache the results of the fitting stage, as to allow predict(mod) [default: true]. Set it to false to save memory for large data.

  • descr::String: An optional title and/or description for this model

  • autotune::Bool: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control autotuning through the option tunemethod (see the model hyper-parameters)

  • verbosity::Verbosity: The verbosity level to be used in training or prediction: NONE, LOW, STD [default], HIGH or FULL

  • rng::Random.AbstractRNG: Random Number Generator (see ?FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Even if a model doesn't override BML_options, it may not use all of its options. For example, deterministic models make no use of the rng parameter; passing such parameters in these cases simply has no effect.

Example:

julia> options = BML_options(cache=false,descr="My model")
source
BetaML.Api.VerbosityType
primitive type Verbosity <: Enum{Int32} 32

Many models and functions accept a verbosity parameter.

Choose between: NONE, LOW, STD [default], HIGH and FULL.

source
BetaML.Api.fit!Method
fit!(m::BetaMLModel,X,[y])

Fit ("train") a BetaMLModel (i.e. learn the algorithm's parameters) based on data, either only features or features and labels.

Each specific model implements its own version of fit!(m,X,[Y]), but the usage is consistent across models.

Notes:

  • For online algorithms, i.e. models that support updating the learned parameters with new data, fit! can be repeated as new data arrive, although not all algorithms guarantee that training on one record at a time is equivalent to training on all the records at once.
  • If the model has been trained with the cache option set to true (the default), fit! returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function.
  • In Python and other languages that don't allow the exclamation mark within the function name, use fit_ex(⋅) instead of fit!(⋅)
source
BetaML.Api.hyperparametersMethod
hyperparameters(m::BetaMLModel)

Returns the hyperparameters of a BetaML model. See also ?options for the parameters that do not directly affect learning.

Warning

The returned object is a reference, so if it is modified, the corresponding object in the model will change too.

source
BetaML.Api.infoMethod
info(m::BetaMLModel) -> Any
+

Return a string-keyed dictionary of "additional" information stored during model fitting.

source
BetaML.Api.inverse_predictMethod
inverse_predict(m::BetaMLModel,X)

Given a model m that, fitted on x, produces xnew, this function takes xnew and returns x (or possibly an approximation of it).

For example, when OneHotEncoder is fitted with a subset of the possible categories and the handle_unknown option is set to infrequent, inverse_predict will aggregate all the other categories as specified in other_categories_name.

Notes:

  • Implemented only in a few models.
source
BetaML.Api.model_loadFunction
model_load(filename::AbstractString)
 model_load(filename::AbstractString,args::AbstractString...)

Load from file one or more BetaML models (whether fitted or not).

Notes:

  • If no model names to retrieve are specified, it returns a dictionary keyed by the model names
  • If multiple models are requested, a tuple is returned
  • For further options see the documentation of the function load of the JLD2 package

Examples:

julia> models = model_load("fittedModels.jl")
 julia> mod1 = model_load("fittedModels.jl","mod1")
-julia> (mod1,mod2) = model_load("fittedModels.jl","mod1", "mod2")
source
BetaML.Api.model_saveFunction
model_save(filename::AbstractString,overwrite_file::Bool=false;kwargs...)

Save one or more BetaML models (whether fitted or not), optionally specifying a name for each of them.

Parameters:

  • filename: Name of the destination file
  • overwrite_file: Whether to overwrite the file if it already exists, or to preserve it (i.e. keep the objects other than those that are going to be saved) [def: false, i.e. preserve the file]
  • kwargs: model objects to be saved, optionally associated with a different name to save them with (e.g. mod1Name=mod1,mod2)

Notes:

  • If an object with the given name already exists in the destination JLD2 file, it will be overwritten.
  • If the file exists but is not a JLD2 file, and the option overwrite_file is set to false, an error will be raised.
  • Use the semicolon ; to separate the filename from the model(s) to save
  • For further options see the documentation of the JLD2 package

Examples

julia> model_save("fittedModels.jl"; mod1Name=mod1,mod2)
source
BetaML.Api.optionsMethod
options(m::BetaMLModel)

Returns the non-learning related options of a BetaML model. See also ?hyperparameters for the parameters that directly affect learning.

Warning

The returned object is a reference, so if it is modified, the corresponding object in the model will change too.

source
BetaML.Api.parametersMethod
parameters(m::BetaMLModel)

Returns the learned parameters of a BetaML model.

Warning

The returned object is a reference, so if it is modified, the corresponding object in the model will change too.

source
BetaML.Api.predictMethod
predict(m::BetaMLModel,[X])

Predict new information (including transformation) based on a fitted BetaMLModel, optionally applied to new features when the algorithm generalises to new data.

Notes:

  • As a convenience, if the model has been trained with the cache option set to true (the default), the predictions associated with the last training of the model are retained in the model object and can be retrieved simply with predict(m).
source
BetaML.Api.reset!Method
reset!(m::BetaMLModel)

Reset the parameters of a trained model.

Notes:

  • In Python and other languages that don't allow the exclamation mark within the function name, use reset_ex(⋅) instead of reset!(⋅)
source
BetaML.Api.sethp!Method
sethp!(m::BetaMLModel, hp::Dict)
-

Set the hyperparameters of model m as specified in the hp dictionary.

source
+julia> (mod1,mod2) = model_load("fittedModels.jl","mod1", "mod2")
source
BetaML.Api.model_saveFunction
model_save(filename::AbstractString,overwrite_file::Bool=false;kwargs...)

Save one or more BetaML models (whether fitted or not), optionally specifying a name for each of them.

Parameters:

  • filename: Name of the destination file
  • overwrite_file: Whether to overwrite the file if it already exists, or to preserve it (i.e. keep the objects other than those that are going to be saved) [def: false, i.e. preserve the file]
  • kwargs: model objects to be saved, optionally associated with a different name to save them with (e.g. mod1Name=mod1,mod2)

Notes:

  • If an object with the given name already exists in the destination JLD2 file, it will be overwritten.
  • If the file exists but is not a JLD2 file, and the option overwrite_file is set to false, an error will be raised.
  • Use the semicolon ; to separate the filename from the model(s) to save
  • For further options see the documentation of the JLD2 package

Examples

julia> model_save("fittedModels.jl"; mod1Name=mod1,mod2)
source
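A save-and-reload round trip could look like the sketch below. It assumes BetaML is installed and uses only the signatures documented here; the data, the file name and the name `km` are hypothetical, and `KMeansClusterer` is just one possible model to store.

```julia
using BetaML

# Hypothetical toy data: two well-separated groups of points
x = [1.0 2.0; 1.2 1.8; 8.0 9.0; 8.3 9.1]

km = KMeansClusterer(n_classes=2, rng=copy(FIXEDRNG))
fit!(km, x)

# Save the fitted model under the name "km" (note the semicolon before the keyword)
path = joinpath(mktempdir(), "fittedModels.jld2")
model_save(path; km=km)

# Load it back by name, or load everything as a dictionary keyed by model name
km2        = model_load(path, "km")
all_models = model_load(path)

println(haskey(all_models, "km"))
println(predict(km2, x) == predict(km, x))
```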
BetaML.Api.optionsMethod
options(m::BetaMLModel)

Returns the non-learning related options of a BetaML model. See also ?hyperparameters for the parameters that directly affect learning.

Warning

The returned object is a reference, so if it is modified, the corresponding object in the model will change too.

source
BetaML.Api.parametersMethod
parameters(m::BetaMLModel)

Returns the learned parameters of a BetaML model.

Warning

The returned object is a reference, so if it is modified, the corresponding object in the model will change too.

source
BetaML.Api.predictMethod
predict(m::BetaMLModel,[X])

Predict new information (including transformation) based on a fitted BetaMLModel, optionally applied to new features when the algorithm generalises to new data.

Notes:

  • As a convenience, if the model has been trained with the cache option set to true (the default), the predictions associated with the last training of the model are retained in the model object and can be retrieved simply with predict(m).
source
BetaML.Api.reset!Method
reset!(m::BetaMLModel)

Reset the parameters of a trained model.

Notes:

  • In Python and other languages that don't allow the exclamation mark within the function name, use reset_ex(⋅) instead of reset!(⋅)
source
BetaML.Api.sethp!Method
sethp!(m::BetaMLModel, hp::Dict)
+

Set the hyperparameters of model m as specified in the hp dictionary.

source
diff --git a/dev/Api_v2_developer.html b/dev/Api_v2_developer.html index d28e0d7..2e1f9c1 100644 --- a/dev/Api_v2_developer.html +++ b/dev/Api_v2_developer.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

Api v2 - developer documentation (API implementation)

Each model is a child of either BetaMLSupervisedModel or BetaMLUnsupervisedModel, both in turn children of BetaMLModel:

BetaMLSupervisedModel   <: BetaMLModel
+

Api v2 - developer documentation (API implementation)

Each model is a child of either BetaMLSupervisedModel or BetaMLUnsupervisedModel, both in turn children of BetaMLModel:

BetaMLSupervisedModel   <: BetaMLModel
 BetaMLUnsupervisedModel <: BetaMLModel
 RandomForestEstimator                 <: BetaMLSupervisedModel

The model struct is composed of the following elements:

mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel
     hpar::DecisionTreeE_hp   # Hyper-parameters
@@ -38,4 +38,4 @@
         found || error("Keyword \"$kw\" is not part of this model.")
     end
     return m
-end

So, in order to implement a new model we need to:

  • implement its struct and constructor
  • implement the corresponding ModelHyperParametersSet, ModelLearnedParametersSet and, optionally, ModelOptionsSet.
  • define fit!(model, X, [y]), predict(model,X) and, optionally, inverse_predict(model,X).
+end

So, in order to implement a new model we need to:

  • implement its struct and constructor
  • implement the corresponding ModelHyperParametersSet, ModelLearnedParametersSet and, optionally, ModelOptionsSet.
  • define fit!(model, X, [y]), predict(model,X) and, optionally, inverse_predict(model,X).
diff --git a/dev/Api_v2_user.html b/dev/Api_v2_user.html index 0888ee3..0779996 100644 --- a/dev/Api_v2_user.html +++ b/dev/Api_v2_user.html @@ -3,6 +3,6 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

BetaML Api v2

Note

The API described below is the default one starting from BetaML v0.8.

The following API is designed to further simplify the usage of the various ML models provided by BetaML by introducing a common workflow. This is the user documentation. Refer to the developer documentation to learn how the API is implemented.

Supervised, unsupervised and transformer models

Supervised refers to models designed to learn a relation between some features (often denoted by X) and some labels (often denoted by Y) in order to predict the label of new data given the observed features alone. Perceptrons, decision trees or neural networks are common examples. Unsupervised and transformer models are models that learn a "structure" from the data itself (without any label attached from which to learn) and either report some new information using this learned structure (e.g. a cluster class) or directly apply a transformation to the data itself, like PCAEncoder or the missing-value imputers. BetaML makes no distinction between these kinds of models, except that the fitting (aka training) function for the former takes both the features and the labels. In particular there isn't a separate transform function as in other frameworks: any information we need to obtain from the model, whether a label or some transformation of the original data, is provided by the predict function.

Model constructor

The first step is to construct the model by passing (using keyword arguments) the algorithm hyperparameters and various options (cache results flag, debug levels, random number generators, ...):

mod = ModelName(par1=X,par2=Y,...)

Sometimes a parameter is itself another model, in which case we would have:

mod = ModelName(par1=OtherModel(a_par_of_OtherModel=X,...),par2=Y,...)

Training of the model

The second step is to fit (aka train) the model:

fit!(m,X,[Y])

where Y is present only for supervised models.

For online algorithms, i.e. models that support updating the learned parameters with new data, fit! can be repeated as new data arrive, although not all algorithms guarantee that training on one record at a time is equivalent to training on all the records at once. In some algorithms the "old training" may be used only as initial conditions, regardless of whether it was obtained with hundreds or millions of records, so that the new training data become much more important than the old ones in determining the learned parameters.

Prediction

Fitted models can be used to predict y (whether the label, some desired new information or a transformation) given new X:

ŷ = predict(mod,X)

As a convenience, if the model has been trained with the cache option set to true (the default), the ŷ of the last training is retained in the model object and can be retrieved simply with predict(mod). In that case the fit! function also returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function. The 3 expressions below are hence equivalent:

ŷ  = fit!(mod,xtrain)    # only with `cache=true` in the model constructor (default)
+

BetaML Api v2

Note

The API described below is the default one starting from BetaML v0.8.

The following API is designed to further simplify the usage of the various ML models provided by BetaML by introducing a common workflow. This is the user documentation. Refer to the developer documentation to learn how the API is implemented.

Supervised, unsupervised and transformer models

Supervised refers to models designed to learn a relation between some features (often denoted by X) and some labels (often denoted by Y) in order to predict the label of new data given the observed features alone. Perceptrons, decision trees or neural networks are common examples. Unsupervised and transformer models are models that learn a "structure" from the data itself (without any label attached from which to learn) and either report some new information using this learned structure (e.g. a cluster class) or directly apply a transformation to the data itself, like PCAEncoder or the missing-value imputers. BetaML makes no distinction between these kinds of models, except that the fitting (aka training) function for the former takes both the features and the labels. In particular there isn't a separate transform function as in other frameworks: any information we need to obtain from the model, whether a label or some transformation of the original data, is provided by the predict function.

Model constructor

The first step is to construct the model by passing (using keyword arguments) the algorithm hyperparameters and various options (cache results flag, debug levels, random number generators, ...):

mod = ModelName(par1=X,par2=Y,...)

Sometimes a parameter is itself another model, in which case we would have:

mod = ModelName(par1=OtherModel(a_par_of_OtherModel=X,...),par2=Y,...)

Training of the model

The second step is to fit (aka train) the model:

fit!(m,X,[Y])

where Y is present only for supervised models.

For online algorithms, i.e. models that support updating the learned parameters with new data, fit! can be repeated as new data arrive, although not all algorithms guarantee that training on one record at a time is equivalent to training on all the records at once. In some algorithms the "old training" may be used only as initial conditions, regardless of whether it was obtained with hundreds or millions of records, so that the new training data become much more important than the old ones in determining the learned parameters.

Prediction

Fitted models can be used to predict y (whether the label, some desired new information or a transformation) given new X:

ŷ = predict(mod,X)

As a convenience, if the model has been trained with the cache option set to true (the default), the ŷ of the last training is retained in the model object and can be retrieved simply with predict(mod). In that case the fit! function also returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function. The 3 expressions below are hence equivalent:

ŷ  = fit!(mod,xtrain)    # only with `cache=true` in the model constructor (default)
 ŷ1 = predict(mod)        # only with `cache=true` in the model constructor (default)
-ŷ2 = predict(mod,xtrain) 

Other functions

Models can be reset to lose the learned information with reset!(mod), and training information (other than the algorithm's learned parameters, see below) can be retrieved with info(mod).

Hyperparameters, options and learned parameters can be retrieved with the functions hyperparameters, options and parameters respectively. Note that they can also be used to set new values in the model, as they return a reference to the requested objects.

Note

What is the difference between the output of info, parameters and the predict function? The predict function (and, when cache is used, fit! too) returns the main information required from the model: the predicted label for supervised models, the class assignment for clustering models or the reprojected data for PCA. info returns complementary information, like the number of dimensions of the data or the number of records employed for training. It doesn't include information that is necessary for the training itself, like the centroids in cluster analysis. These can be retrieved instead using parameters, which includes all and only the information required to compute predict.

Some models allow an inverse transformation that, using the parameters learned at training time (e.g. the scale factors), performs an inverse transformation of new data to the space of the training data (e.g. the unscaled space). Use inverse_predict(mod,xnew).

+ŷ2 = predict(mod,xtrain)
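The equivalence of the three expressions can be checked on a small transformer model. The following is a sketch that assumes BetaML is installed and uses `Scaler` with its default settings; the data matrix is hypothetical and the model is named `m` rather than `mod` to avoid shadowing `Base.mod`.

```julia
using BetaML

# Hypothetical toy data: 4 records, 2 features on very different scales
xtrain = [1.0 10.0; 2.0 20.0; 3.0 30.0; 4.0 40.0]

m = Scaler()                  # cache=true is the default

ŷ  = fit!(m, xtrain)          # fit and return the transformed training data
ŷ1 = predict(m)               # retrieve the cached result of the last fit
ŷ2 = predict(m, xtrain)       # recompute the transformation explicitly

println(ŷ ≈ ŷ1)
println(ŷ ≈ ŷ2)
```

With `cache=false` in the constructor only the last form would be available.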

Other functions

Models can be reset to lose the learned information with reset!(mod), and training information (other than the algorithm's learned parameters, see below) can be retrieved with info(mod).

Hyperparameters, options and learned parameters can be retrieved with the functions hyperparameters, options and parameters respectively. Note that they can also be used to set new values in the model, as they return a reference to the requested objects.
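As a sketch of how these accessors might be used together (assuming BetaML is installed; the data and the description string are hypothetical, and `KMeansClusterer` is assumed here to use the default `BML_options` set):

```julia
using BetaML

# Hypothetical toy data: two well-separated groups of points
x = [1.0 2.0; 1.2 1.8; 8.0 9.0; 8.3 9.1]

m = KMeansClusterer(n_classes=2, rng=copy(FIXEDRNG))
fit!(m, x)

hp  = hyperparameters(m)      # settings that affect learning
opt = options(m)              # settings that do not affect learning
par = parameters(m)           # what the model has learned
inf = info(m)                 # complementary information stored during fitting

# The returned objects are references: mutating them changes the model itself
opt.descr = "Two-cluster toy model"
println(options(m).descr)
println(hp.n_classes)

reset!(m)                     # discard the learned information
```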

Note

Which is the difference between the output of info, parameters and the predict function ? The predict function (and, when cache is used, the fit! one too) returns the main information required from the model.. the prediceted label for supervised models, the class assignment for clusters or the reprojected data for PCA.... info returns complementary information like the number of dimensions of the data or the number of data emploied for training. It doesn't include information that is necessary for the training itself, like the centroids in cluser analysis. These can be retrieved instead using parameters that include all and only the information required to compute predict.

Some models allow an inverse transformation, that using the parameters learned at trainign time (e.g. the scale factors) perform an inverse tranformation of new data to the space of the training data (e.g. the unscaled space). Use inverse_predict(mod,xnew).
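A minimal sketch of these accessor functions, assuming the Scaler model (not discussed in this section) with its default options, and assuming that fit! returns the scaled data when caching is on. The data matrix is made up for illustration:

```julia
using BetaML

# Illustrative data, not taken from the documentation
x = [1.0 10.0; 2.0 20.0; 3.0 30.0]

m      = Scaler()                 # assumed: a model that supports inverse_predict
xs     = fit!(m, x)               # with cache=true (default) fit! also returns the predictions
x_back = inverse_predict(m, xs)   # back to the unscaled space of the training data

hp = hyperparameters(m)           # hyperparameters set by the user
op = options(m)                   # algorithm options
lp = parameters(m)                # learned parameters (here, the scaling factors)
ti = info(m)                      # complementary training information

reset!(m)                         # discard what was learned
```

Since the three accessors return references, editing a field of `hp` or `op` before calling fit! again should be reflected in the model.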

diff --git a/dev/Benchmarks.html b/dev/Benchmarks.html new file mode 100644 index 0000000..e058ec7 --- /dev/null +++ b/dev/Benchmarks.html @@ -0,0 +1,6 @@ + +Benchmarks · BetaML.jl Documentation

BetaML Benchmarks

This benchmark page allows quick checks for performance regressions across versions. As it is run and compiled using GitHub Actions, which may be powered by different computational resources, timing results are normalized using SystemBenchmark.

This page also provides a basic comparison with other leading Julia libraries for the same algorithms, USING DEFAULT VALUES. This file is intended just for benchmarking, not as a tutorial, and it doesn't employ a full ML workflow, just the minimum preprocessing needed for the algorithms to work.

Benchmark setup

Threads.nthreads()
4
avg_factor_to_ref
1.7456763872216032

Regression

A simple regression over 500 points with y = x₁²-x₂+x₃²
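As a rough sketch of what such a dataset could look like: the benchmark's actual data generation is not shown here, so the seed, the uniform feature distribution and the `mre` definition below are all assumptions made for illustration.

```julia
using Random

rng = MersenneTwister(123)     # hypothetical seed
N   = 500
x   = rand(rng, N, 3)          # assumed: three features drawn uniformly in [0,1)
y   = x[:, 1] .^ 2 .- x[:, 2] .+ x[:, 3] .^ 2   # y = x₁² - x₂ + x₃²

# One plain definition of a mean relative error; the metric behind the
# mre_train / mre_test columns may be computed differently.
mre(ŷ, y) = sum(abs.(ŷ .- y)) / sum(abs.(y))
```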

df_regr

6 rows × 8 columns

name  time  memory  allocs  mre_train  std_train  mre_test  std_test
String  Float64  Int64  Int64  Float64  Float64  Float64  Float64
1DT8.38164e68302672556060.0007237252.76557e-50.00522130.000643395
2RF6.70826e71157979209977150.001231453.35691e-50.003094810.000272821
3NN1.0668e91048887408112270280.001173295.52884e-50.001195240.000245931
4DT (DecisionTrees.jl)1.11167e5733444990.006483750.0002144950.008783510.00135441
5RF (DecisionTrees.jl)3.60889e5100748834180.004917040.0001564990.00611280.00136698
6NN (Flux.jl)1.2503e821532168011780010.003345919.90508e-50.00336549.90508e-5

Classification

A binary (dichotomous) diagnostic breast cancer classification based on the Wisconsin Breast Cancer Database.

df_class

9 rows × 8 columns

name  time  memory  allocs  acc_train  std_train  acc_test  std_test
String  Float64  Int64  Int64  Float64  Float64  Float64  Float64
1DT3.96147e821075297613006230.9994140.001318260.9121550.0247153
2RF6.56556e869466608041204930.999610.001232860.9613410.0215558
3NN2.69502e910846554912150407440.9994140.0009434490.971930.0205928
4Perc7.85226e73522944966209100.9900410.005782150.9578950.0236824
5KPerc1.06994e91447524096119951321.00.00.9403510.0332871
6Peg1.63525e860667641613780090.9408320.01850630.9385960.0312195
7DT (DT.jl)3.57919e6569282421.00.00.9280080.0424737
8RF (DT.jl)1.21079e6178912018040.9908220.003912270.9613410.0215558
9NN (Flux.jl)2.57438e857292808014088011.00.00.9649120.021881

Clustering

TODO :-)

Missing imputation

TODO :-)

diff --git a/dev/Clustering.html b/dev/Clustering.html index a5a7bc0..56ab80b 100644 --- a/dev/Clustering.html +++ b/dev/Clustering.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Clustering Module

BetaML.ClusteringModule
Clustering module (WIP)

(Hard) Clustering algorithms

Provides hard clustering methods using K-means and K-medoids. See also the GMM module for GMM-based soft clustering (i.e. where each record is assigned a probability distribution over the various classes instead of a single class) and for missing values imputation / collaborative filtering / recommendation systems using clustering methods as backend.

The module provides the following models. Use ?[model] to access their documentation:

Some metrics of the clustered output are available (e.g. silhouette).

source

Module Index

Detailed API

BetaML.Clustering.KMeansC_hpType
mutable struct KMeansC_hp <: BetaMLHyperParametersSet

Hyperparameters for the KMeansClusterer model

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space [default]
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points
    • "given": using a provided set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]
source
BetaML.Clustering.KMeansClustererType
mutable struct KMeansClusterer <: BetaMLUnsupervisedModel

The classical "K-Means" clustering algorithm (unsupervised).

Learn to partition the data and assign each record to one of the n_classes classes according to a distance metric (default Euclidean).

For the parameters see ?KMeansC_hp and ?BML_options.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported by using the "old" representatives as init ones

Example :

julia> using BetaML
+

The BetaML.Clustering Module

BetaML.ClusteringModule
Clustering module (WIP)

(Hard) Clustering algorithms

Provides hard clustering methods using K-means and K-medoids. See also the GMM module for GMM-based soft clustering (i.e. where each record is assigned a probability distribution over the various classes instead of a single class) and for missing values imputation / collaborative filtering / recommendation systems using clustering methods as backend.

The module provides the following models. Use ?[model] to access their documentation:

Some metrics of the clustered output are available (e.g. silhouette).
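To illustrate what the silhouette measures, here is a self-contained sketch written from the textbook definition. It is not the module's own implementation, and the function name, the `l2` helper and the class vector are hypothetical. The matrix `X` is the one used in the examples of this page.

```julia
# Euclidean distance between two vectors
l2(x, y) = sqrt(sum((x .- y) .^ 2))

# Silhouette score of each record: (b - a) / max(a, b), where a is the mean
# distance to the other records of the same cluster and b is the smallest
# mean distance to the records of any other cluster.
# Assumes at least two clusters are present.
function silhouette_scores(X::AbstractMatrix, classes::AbstractVector, dist=l2)
    n = size(X, 1)
    D = [dist(X[i, :], X[j, :]) for i in 1:n, j in 1:n]   # pairwise distances
    labels = unique(classes)
    s = zeros(n)
    for i in 1:n
        own = [j for j in 1:n if classes[j] == classes[i] && j != i]
        isempty(own) && continue          # singleton cluster: score left at 0
        a = sum(D[i, j] for j in own) / length(own)
        others = [sum(D[i, classes .== c]) / count(==(c), classes)
                  for c in labels if c != classes[i]]
        b = minimum(others)
        s[i] = (b - a) / max(a, b)
    end
    return s
end

X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]
classes = [1, 1, 2, 2, 1]      # assumed cluster assignment, for illustration
s = silhouette_scores(X, classes)
```

Scores close to 1 indicate records that sit well inside their own cluster; values near 0 or negative suggest overlapping or wrong assignments.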

source

Module Index

Detailed API

BetaML.Clustering.KMeansC_hpType
mutable struct KMeansC_hp <: BetaMLHyperParametersSet

Hyperparameters for the KMeansClusterer model

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space [default]
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points
    • "given": using a provided set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]
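The dist contract above (two vectors in, one scalar out) can be sketched in plain Julia. The function names below are hypothetical stand-ins, not the predefined distances of the package:

```julia
# Named functions accepting two vectors and returning a scalar
my_l1(x, y) = sum(abs.(x .- y))              # Manhattan distance
my_l2(x, y) = sqrt(sum((x .- y) .^ 2))       # Euclidean distance

# An anonymous function with the same characteristics (Chebyshev distance)
my_chebyshev = (x, y) -> maximum(abs.(x .- y))

# Assumed usage, with hyperparameters passed as keywords to the model constructor:
#   KMeansClusterer(n_classes=2, dist=my_chebyshev)
```

As noted above, K-Means is only guaranteed to converge with the Euclidean distance, so a custom distance is a better fit for K-Medoids.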
source
BetaML.Clustering.KMeansClustererType
mutable struct KMeansClusterer <: BetaMLUnsupervisedModel

The classical "K-Means" clustering algorithm (unsupervised).

Learn to partition the data and assign each record to one of the n_classes classes according to a distance metric (default Euclidean).

For the parameters see ?KMeansC_hp and ?BML_options.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported by using the "old" representatives as init ones

Example :

julia> using BetaML
 
 julia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]
 5×2 Matrix{Float64}:
@@ -36,7 +36,7 @@
 
 julia> parameters(mod)
 BetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)
-- representatives: [1.13366 9.7209; 11.0 0.9]
source
BetaML.Clustering.KMedoidsC_hpType
mutable struct KMedoidsC_hp <: BetaMLHyperParametersSet

Hyperparameters for the KMedoidsClusterer model

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using a provided set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]
source
BetaML.Clustering.KMedoidsClustererType
mutable struct KMedoidsClusterer <: BetaMLUnsupervisedModel

The classical "K-Medoids" clustering algorithm (unsupervised).

Similar to K-Means, learns to partition the data and assign each record to one of the n_classes classes according to a distance metric, but the "representatives" (the centroids) are guaranteed to be one of the training points. The algorithm works with any arbitrary distance measure (default Euclidean).

For the parameters see ?KMedoidsC_hp and ?BML_options.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported by using the "old" representatives as init ones
  • with an initialisation_strategy different from shuffle (the default initialisation for K-Medoids) the representatives may not be one of the training points when the algorithm doesn't perform enough iterations. This can happen for example when the number of classes is close to the number of records to cluster.

Example:

julia> using BetaML
+- representatives: [1.13366 9.7209; 11.0 0.9]
source
BetaML.Clustering.KMedoidsC_hpType
mutable struct KMedoidsC_hp <: BetaMLHyperParametersSet

Hyperparameters for the KMedoidsClusterer model

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using a provided set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]
source
BetaML.Clustering.KMedoidsClustererType
mutable struct KMedoidsClusterer <: BetaMLUnsupervisedModel

The classical "K-Medoids" clustering algorithm (unsupervised).

Similar to K-Means, learns to partition the data and assign each record to one of the n_classes classes according to a distance metric, but the "representatives" (the centroids) are guaranteed to be one of the training points. The algorithm works with any arbitrary distance measure (default Euclidean).

For the parameters see ?KMedoidsC_hp and ?BML_options.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported by using the "old" representatives as init ones
  • with an initialisation_strategy different from shuffle (the default initialisation for K-Medoids) the representatives may not be one of the training points when the algorithm doesn't perform enough iterations. This can happen for example when the number of classes is close to the number of records to cluster.
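The property that a representative is one of the training points comes from the medoid update step. The following is a generic sketch of that step, not the package's code; the `medoid` function and the `l2` helper are hypothetical names:

```julia
# Euclidean distance between two vectors
l2(x, y) = sqrt(sum((x .- y) .^ 2))

# The medoid of a cluster is the member record with the smallest total
# distance to all the other members, so it is always one of the rows.
function medoid(points::AbstractMatrix, dist=l2)
    n = size(points, 1)
    costs = [sum(dist(points[i, :], points[j, :]) for j in 1:n) for i in 1:n]
    return points[argmin(costs), :]
end

# Three nearby records taken from the example matrix used on this page
cluster = [1.1 10.1; 0.9 9.8; 0.8 9.8]
rep = medoid(cluster)
```

By contrast, a K-Means centroid is the mean of the members and generally does not coincide with any record.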

Example:

julia> using BetaML
 
 julia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]
 5×2 Matrix{Float64}:
@@ -69,4 +69,4 @@
 
 julia> parameters(mod)
 BetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)
-- representatives: [0.9 9.8; 11.0 0.9]
source
+- representatives: [0.9 9.8; 11.0 0.9]
source
diff --git a/dev/Examples.html b/dev/Examples.html index a91a59e..9c91958 100644 --- a/dev/Examples.html +++ b/dev/Examples.html @@ -3,4 +3,4 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

Examples

Supervised learning

Regression

Estimating the bike sharing demand

The task is to estimate the influence of several variables (like the weather, the season, the day of the week...) on the demand for shared bicycles, so that the authority in charge of the service can organise it in the best way.

Data origin:

  • original full dataset (by hour, not used here): https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
  • simplified dataset (by day, with some simple scaling): https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/aec
    • description: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/exam2019ace.pdf
    • data: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/bikesharing_day.csv.zip

Note that even if we are estimating a time series, we are not using a recurrent neural network here, as we assume the temporal dependence to be negligible (i.e. $Y_t = f(X_t)$ alone).

Classification

Unsupervised learning

Notebooks

The following notebooks provide runnable examples of the package functionality:

Note: the live, runnable computational environment is a temporary new copy made at each connection. The first time after a commit is done on this repository a new environment has to be set (instead of just being copied), and the server may take several minutes.

This is only if you are the unlucky user triggering the rebuild of the environment after the commit.

+

Examples

Supervised learning

Regression

Estimating the bike sharing demand

The task is to estimate the influence of several variables (like the weather, the season, the day of the week...) on the demand for shared bicycles, so that the authority in charge of the service can organise it in the best way.

Data origin:

  • original full dataset (by hour, not used here): https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
  • simplified dataset (by day, with some simple scaling): https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/aec
    • description: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/exam2019ace.pdf
    • data: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/bikesharing_day.csv.zip

Note that even if we are estimating a time series, we are not using a recurrent neural network here, as we assume the temporal dependence to be negligible (i.e. $Y_t = f(X_t)$ alone).

Classification

Unsupervised learning

Notebooks

The following notebooks provide runnable examples of the package functionality:

Note: the live, runnable computational environment is a temporary new copy made at each connection. The first time after a commit is done on this repository a new environment has to be set (instead of just being copied), and the server may take several minutes.

This is only if you are the unlucky user triggering the rebuild of the environment after the commit.

diff --git a/dev/GMM.html b/dev/GMM.html index af8aa00..0f369ec 100644 --- a/dev/GMM.html +++ b/dev/GMM.html @@ -3,19 +3,19 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.GMM Module

BetaML.GMMModule
GMM module

Generative (Gaussian) Mixture Model learners (supervised/unsupervised)

Provides clustering and regressors using (Generative) Gaussian Mixture Models (probabilistic).

Collaborative filtering / missing values imputation / recommendation systems based on GMM are available in the Imputation module.

The module provides the following models. Use ?[model] to access their documentation:

All the algorithms work with arbitrary mixture distributions, although only {Spherical|Diagonal|Full} Gaussian mixtures have been implemented. User-defined mixtures can be used by defining a struct as a subtype of AbstractMixture and implementing for that mixture the following functions:

  • init_mixtures!(mixtures, X; minimum_variance, minimum_covariance, initialisation_strategy)
  • lpdf(m,x,mask) (for the e-step)
  • update_parameters!(mixtures, X, pₙₖ; minimum_variance, minimum_covariance) (the m-step)
  • npar(mixtures::Array{T,1}) (for the BIC/AIC computation)

All the GMM-based algorithms work only with numerical data, but also accept Missing values.

The GaussianMixtureClusterer algorithm reports the BIC and the AIC in its info(model), but some metrics of the clustered output are also available, for example the silhouette score.

source

Module Index

Detailed API

BetaML.GMM.DiagonalGaussianMethod
DiagonalGaussian(
+

The BetaML.GMM Module

BetaML.GMMModule
GMM module

Generative (Gaussian) Mixture Model learners (supervised/unsupervised)

Provides clustering and regressors using (Generative) Gaussian Mixture Models (probabilistic).

Collaborative filtering / missing values imputation / recommendation systems based on GMM are available in the Imputation module.

The module provides the following models. Use ?[model] to access their documentation:

All the algorithms work with arbitrary mixture distributions, although only {Spherical|Diagonal|Full} Gaussian mixtures have been implemented. User-defined mixtures can be used by defining a struct as a subtype of AbstractMixture and implementing for that mixture the following functions:

  • init_mixtures!(mixtures, X; minimum_variance, minimum_covariance, initialisation_strategy)
  • lpdf(m,x,mask) (for the e-step)
  • update_parameters!(mixtures, X, pₙₖ; minimum_variance, minimum_covariance) (the m-step)
  • npar(mixtures::Array{T,1}) (for the BIC/AIC computation)

All the GMM-based algorithms work only with numerical data, but also accept Missing values.

The GaussianMixtureClusterer algorithm reports the BIC and the AIC in its info(model), but some metrics of the clustered output are also available, for example the silhouette score.
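For reference, the two criteria follow the standard definitions AIC = 2k - 2lL and BIC = k ln(n) - 2lL, with lL the log-likelihood, k the number of free parameters and n the number of records. The sketch below checks them against the "lL", "fitted_records", "AIC" and "BIC" values reported in one of the info outputs shown on this page. The value k = 9 is inferred from those numbers and is not stated in the documentation.

```julia
# Standard information criteria from a log-likelihood lL,
# a number of free parameters k and a number of records n
aic(lL, k)    = 2 * k - 2 * lL
bic(lL, k, n) = k * log(n) - 2 * lL

lL = -7.38023   # "lL" entry of the info output
n  = 5          # "fitted_records" entry
k  = 9          # inferred (assumption): 2 mixtures × (2 means + 2 variances) + 1 mixing probability

aic_value = aic(lL, k)       # close to the reported "AIC" => 32.7605
bic_value = bic(lL, k, n)    # close to the reported "BIC" => 29.2454
```

Lower values are better for both criteria; BIC penalises extra parameters more strongly as n grows.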

source

Module Index

Detailed API

BetaML.GMM.DiagonalGaussianMethod
DiagonalGaussian(
     μ::Union{Nothing, Array{T, 1}}
 ) -> DiagonalGaussian
 DiagonalGaussian(
     μ::Union{Nothing, Array{T, 1}},
     σ²::Union{Nothing, Array{T, 1}}
 ) -> DiagonalGaussian
-

DiagonalGaussian(μ,σ²) - Gaussian mixture with mean μ and variances σ² (and fixed zero covariances)

source
BetaML.GMM.FullGaussianMethod
FullGaussian(μ::Union{Nothing, Array{T, 1}}) -> FullGaussian
+

DiagonalGaussian(μ,σ²) - Gaussian mixture with mean μ and variances σ² (and fixed zero covariances)

source
BetaML.GMM.FullGaussianMethod
FullGaussian(μ::Union{Nothing, Array{T, 1}}) -> FullGaussian
 FullGaussian(
     μ::Union{Nothing, Array{T, 1}},
     σ²::Union{Nothing, Array{T, 2}}
 ) -> FullGaussian
-

FullGaussian(μ,σ²) - Gaussian mixture with mean μ and variance/covariance matrix σ²

source
BetaML.GMM.GaussianMixtureClustererType
mutable struct GaussianMixtureClusterer <: BetaMLUnsupervisedModel

Assign class probabilities to records (i.e. soft clustering) assuming a probabilistic generative model of observed data using mixtures.

For the parameters see ?GaussianMixture_hp and ?BML_options.

Notes:

  • Data must be numerical
  • Mixtures can be user defined: see the ?GMM module documentation for a discussion on provided vs custom mixtures.
  • Online fitting (re-fitting with new data) is supported by setting the old learned mixtures as the starting values
  • The model is fitted using an Expectation-Maximisation (EM) algorithm that supports Missing data and is implemented in the log-domain for better numerical accuracy with many dimensions

Example:

julia> using BetaML
+

FullGaussian(μ,σ²) - Gaussian mixture with mean μ and variance/covariance matrix σ²

source
BetaML.GMM.GaussianMixtureClustererType
mutable struct GaussianMixtureClusterer <: BetaMLUnsupervisedModel

Assign class probabilities to records (i.e. soft clustering) assuming a probabilistic generative model of observed data using mixtures.

For the parameters see ?GaussianMixture_hp and ?BML_options.

Notes:

  • Data must be numerical
  • Mixtures can be user defined: see the ?GMM module documentation for a discussion on provided vs custom mixtures.
  • Online fitting (re-fitting with new data) is supported by setting the old learned mixtures as the starting values
  • The model is fitted using an Expectation-Maximisation (EM) algorithm that supports Missing data and is implemented in the log-domain for better numerical accuracy with many dimensions
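To see why working in the log-domain helps, consider the e-step for a single record: with many dimensions the likelihoods can underflow to zero, while their logarithms remain representable. The sketch below uses the common log-sum-exp trick. It is a generic illustration, not the package's code, and both function names are hypothetical.

```julia
# Numerically stable log(sum(exp.(v)))
function logsumexp(v)
    m = maximum(v)
    isfinite(m) || return m
    return m + log(sum(exp.(v .- m)))
end

# E-step for one record: posterior probability of each mixture given the
# log mixing weights and the log-likelihood of the record under each mixture
function responsibilities(logw, loglik)
    lj = logw .+ loglik              # log joint of (mixture, record)
    return exp.(lj .- logsumexp(lj)) # normalise in the log-domain
end

# exp(-1000) underflows to 0.0, yet the posterior is still well defined
p = responsibilities(log.([0.5, 0.5]), [-1000.0, -1001.0])
```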

Example:

julia> using BetaML
 
 julia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];
 
@@ -48,7 +48,7 @@
 julia> parameters(mod)
 BetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)
 - mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9], [0.05, 0.05]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001], [0.05, 0.05])]
-- initial_probmixtures: [0.0, 1.0]
source
BetaML.GMM.GaussianMixtureRegressorType
mutable struct GaussianMixtureRegressor <: BetaMLUnsupervisedModel

A multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model.

The training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions for new data are obtained by fitting the new data to the mixtures.

For hyperparameters see GaussianMixture_hp and BML_options.

This strategy (GaussianMixtureRegressor) works by training the EM algorithm on a combined (hcat) matrix of X and Y. At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm (and using missing values for the dimensions belonging to Y) to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions to obtain the predicted value(s) for each record.

Example:

julia> using BetaML
+- initial_probmixtures: [0.0, 1.0]
source
BetaML.GMM.GaussianMixtureRegressorType
mutable struct GaussianMixtureRegressor <: BetaMLUnsupervisedModel

A multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model.

The training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions for new data are obtained by fitting the new data to the mixtures.

For hyperparameters see GaussianMixture_hp and BML_options.

This strategy (GaussianMixtureRegressor) works by training the EM algorithm on a combined (hcat) matrix of X and Y. At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm (and using missing values for the dimensions belonging to Y) to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions to obtain the predicted value(s) for each record.

Example:

julia> using BetaML
 
 julia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];
 
@@ -88,7 +88,7 @@
 julia> parameters(mod)
 BetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)
 - mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9, -8.033333333333333], [1.1024999999999996, 0.05, 5.0625]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001, 21.15], [1.1024999999999996, 0.05, 5.0625])]
-- initial_probmixtures: [0.6, 0.4]
source
BetaML.GMM.GaussianMixtureRegressor2Type
mutable struct GaussianMixtureRegressor2 <: BetaMLUnsupervisedModel

A multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model (strategy "1").

The training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions for new data are obtained by fitting the new data to the mixtures.

For hyperparameters see GaussianMixture_hp and BML_options.

This strategy (GaussianMixtureRegressor2) works by fitting the EM algorithm on the feature matrix X. Once the data has been probabilistically assigned to the various classes, a mean value of the fitting values Y is computed for each cluster (using the probabilities as weights). At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions learned at training time to obtain the predicted value(s) for each record.

Notes:

  • Predicted values are always a matrix, even when a single variable is predicted (use dropdims(ŷ,dims=2) to get a single vector).

Example:

julia> using BetaML
+- initial_probmixtures: [0.6, 0.4]
source
BetaML.GMM.GaussianMixtureRegressor2Type
mutable struct GaussianMixtureRegressor2 <: BetaMLUnsupervisedModel

A multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model (strategy "1").

The training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions for new data are obtained by fitting the new data to the mixtures.

For hyperparameters see GaussianMixture_hp and BML_options.

This strategy (GaussianMixtureRegressor2) works by fitting the EM algorithm on the feature matrix X. Once the data has been probabilistically assigned to the various classes, a mean value of the fitting values Y is computed for each cluster (using the probabilities as weights). At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions learned at training time to obtain the predicted value(s) for each record.
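The two matrix operations described above can be sketched as follows, taking the assignment probabilities as given (in the model they come from the EM algorithm). The function names and the toy numbers are hypothetical:

```julia
# p: (N × K) assignment probabilities of N records to K mixtures
# Y: (N × D) target values
# Returns the (K × D) probability-weighted mean of Y for each mixture
cluster_means(p, Y) = (p' * Y) ./ sum(p, dims=1)'

# pnew: (M × K) assignment probabilities of the new records (from the e-step)
# Returns the (M × D) predictions
predict_y(pnew, ȳ) = pnew * ȳ

# Toy example: the first two records belong to mixture 1, the third to mixture 2
p = [1.0 0.0; 1.0 0.0; 0.0 1.0]
Y = reshape([2.0, 4.0, 10.0], :, 1)

ȳ = cluster_means(p, Y)            # per-mixture means of Y
ŷ = predict_y([0.5 0.5], ȳ)        # a new record split evenly between the mixtures
ŷ_vec = dropdims(ŷ, dims=2)        # matrix to vector when a single variable is predicted
```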

Notes:

  • Predicted values are always a matrix, even when a single variable is predicted (use dropdims(ŷ,dims=2) to get a single vector).

Example:

julia> using BetaML
 
 julia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];
 
@@ -123,11 +123,11 @@
   "AIC"            => 32.7605
   "fitted_records" => 5
   "lL"             => -7.38023
-  "BIC"            => 29.2454
source
BetaML.GMM.GaussianMixture_hpType
mutable struct GaussianMixture_hp <: BetaMLHyperParametersSet

Hyperparameters for GMM clusters and other GMM-based algorithms

Parameters:

  • n_classes: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply as a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing different mixture types is not currently supported and that the currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]

  • tol: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance.

  • initialisation_strategy: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations: Maximum number of iterations [def: 5000]

  • tunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method (suitable for the GMM-based regressors). To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.GMM.GaussianMixture_hpType
mutable struct GaussianMixture_hp <: BetaMLHyperParametersSet

Hyperparameters for GMM clusters and other GMM-based algorithms

Parameters:

  • n_classes: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply as a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing different mixture types is not currently supported and that the currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]

  • tol: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance.

  • initialisation_strategy: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations: Maximum number of iterations [def: 5000]

  • tunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method (suitable for the GMM-based regressors). To implement automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.GMM.SphericalGaussianMethod
SphericalGaussian(
     μ::Union{Nothing, Array{T, 1}}
 ) -> SphericalGaussian
 SphericalGaussian(
     μ::Union{Nothing, Array{T, 1}},
     σ²::Union{Nothing, T} where T
 ) -> SphericalGaussian
-

SphericalGaussian(μ,σ²) - Spherical Gaussian mixture with mean μ and (single) variance σ²

source
BetaML.GMM.init_mixtures!Method
init_mixtures!(mixtures::Array{T,1}, X; minimum_variance=0.25, minimum_covariance=0.0, initialisation_strategy="grid",rng=Random.GLOBAL_RNG)

The parameter initialisation_strategy can be grid, kmeans or given:

  • grid: Uniformly cover the space observed by the data
  • kmeans: Use the kmeans algorithm. If the data contains missing values, a first run of predictMissing is done under init=grid to impute the missing values, just to allow the kmeans algorithm to run. Then the EM algorithm is used with the output of kmeans as initial values.
  • given: Leave the provided set of initial mixtures
source
BetaML.GMM.lpdfMethod

lpdf(m::DiagonalGaussian,x,mask) - Log PDF of the mixture given the observation x

source
BetaML.GMM.lpdfMethod

lpdf(m::FullGaussian,x,mask) - Log PDF of the mixture given the observation x

source
BetaML.GMM.lpdfMethod

lpdf(m::SphericalGaussian,x,mask) - Log PDF of the mixture given the observation x

source
+

SphericalGaussian(μ,σ²) - Spherical Gaussian mixture with mean μ and (single) variance σ²

source
BetaML.GMM.init_mixtures!Method
init_mixtures!(mixtures::Array{T,1}, X; minimum_variance=0.25, minimum_covariance=0.0, initialisation_strategy="grid",rng=Random.GLOBAL_RNG)

The parameter initialisation_strategy can be grid, kmeans or given:

  • grid: Uniformly cover the space observed by the data
  • kmeans: Use the kmeans algorithm. If the data contains missing values, a first run of predictMissing is done under init=grid to impute the missing values, just to allow the kmeans algorithm to run. Then the EM algorithm is used with the output of kmeans as initial values.
  • given: Leave the provided set of initial mixtures
source
BetaML.GMM.lpdfMethod

lpdf(m::DiagonalGaussian,x,mask) - Log PDF of the mixture given the observation x

source
BetaML.GMM.lpdfMethod

lpdf(m::FullGaussian,x,mask) - Log PDF of the mixture given the observation x

source
BetaML.GMM.lpdfMethod

lpdf(m::SphericalGaussian,x,mask) - Log PDF of the mixture given the observation x

source
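
The lpdf methods above take a mask alongside the observation x. As a rough illustration of what a masked log-density could look like for the spherical case, here is a minimal sketch. It assumes that mask flags the observed (non-missing) dimensions of x and that only those dimensions enter the density; the function name masked_spherical_lpdf and the all-missing convention are hypothetical and are not taken from BetaML.

```julia
# Hypothetical helper (NOT BetaML's implementation): log-density of an
# isotropic Gaussian N(μ, σ²·I), evaluated only on the observed dimensions.
# Assumption: mask[i] == true means x[i] is observed.
function masked_spherical_lpdf(μ::AbstractVector, σ²::Real,
                               x::AbstractVector, mask::AbstractVector{Bool})
    d = count(mask)                          # number of observed dimensions
    d == 0 && return 0.0                     # convention chosen here: nothing observed
    sqdist = sum(abs2, x[mask] .- μ[mask])   # squared distance on observed dims only
    return -(d / 2) * log(2π * σ²) - sqdist / (2σ²)
end

x    = [1.0, missing, 3.0]
mask = .!ismissing.(x)                       # observed-entries mask
lp   = masked_spherical_lpdf([1.0, 2.0, 3.0], 1.0, x, mask)
```

With x equal to the mean on both observed dimensions and unit variance, the result reduces to -log(2π).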
diff --git a/dev/Imputation.html b/dev/Imputation.html index 0ce795e..1630c7a 100644 --- a/dev/Imputation.html +++ b/dev/Imputation.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Imputation Module

BetaML.ImputationModule
Imputation module

Provides various imputation methods for missing data. Note that the interpretation of "missing" can be very wide. For example, recommendation systems / collaborative filtering (e.g. suggesting a film to watch) can well be represented as a missing-data imputation problem, often with better results than traditional algorithms such as k-nearest neighbors (KNN).

Provided imputers:

  • SimpleImputer: Impute data using the feature (column) mean, optionally normalised by l-norms of the records (rows) (fastest)
  • GaussianMixtureImputer: Impute data using a Generative (Gaussian) Mixture Model (good trade off)
  • RandomForestImputer: Impute missing data using Random Forests, with optional replicable multiple imputations (most accurate).
  • GeneralImputer: Impute missing data using a vector (one per column) of arbitrary learning models (classifiers/regressors) that implement m = Model([options]), fit!(m,X,Y) and predict(m,X) (not necessarily from BetaML).

Imputations for all these models can be obtained by running mod = ImputatorModel([options]), fit!(mod,X). The data with the missing values imputed can then be obtained with predict(mod). Use info(m::Imputer) to retrieve further information concerning the imputation. Trained models can also be used to impute missing values in new data with predict(mod,xNew). Note that if multiple imputations are run (for the imputers that support them), predict() will return a vector of predictions rather than a single one.

Example

julia> using Statistics, BetaML
+

The BetaML.Imputation Module

BetaML.ImputationModule
Imputation module

Provides various imputation methods for missing data. Note that the interpretation of "missing" can be very wide. For example, recommendation systems / collaborative filtering (e.g. suggesting a film to watch) can well be represented as a missing-data imputation problem, often with better results than traditional algorithms such as k-nearest neighbors (KNN).

Provided imputers:

  • SimpleImputer: Impute data using the feature (column) mean, optionally normalised by l-norms of the records (rows) (fastest)
  • GaussianMixtureImputer: Impute data using a Generative (Gaussian) Mixture Model (good trade off)
  • RandomForestImputer: Impute missing data using Random Forests, with optional replicable multiple imputations (most accurate).
  • GeneralImputer: Impute missing data using a vector (one per column) of arbitrary learning models (classifiers/regressors) that implement m = Model([options]), fit!(m,X,Y) and predict(m,X) (not necessarily from BetaML).

Imputations for all these models can be obtained by running mod = ImputatorModel([options]), fit!(mod,X). The data with the missing values imputed can then be obtained with predict(mod). Use info(m::Imputer) to retrieve further information concerning the imputation. Trained models can also be used to impute missing values in new data with predict(mod,xNew). Note that if multiple imputations are run (for the imputers that support them), predict() will return a vector of predictions rather than a single one.

Example

julia> using Statistics, BetaML
 
 julia> X            = [2 missing 10; 2000 4000 1000; 2000 4000 10000; 3 5 12 ; 4 8 20; 1 2 5]
 6×3 Matrix{Union{Missing, Int64}}:
@@ -46,7 +46,7 @@
 julia> infos        = info(mod);
 
 julia> infos["n_imputed_values"]
-1
source

Module Index

Detailed API

BetaML.Imputation.GaussianMixtureImputerType
mutable struct GaussianMixtureImputer <: Imputer

Missing data imputer that uses a Generative (Gaussian) Mixture Model.

For the parameters (n_classes,mixtures,..) see GaussianMixture_hp.

Limitations:

  • data must be numerical
  • the resulting matrix is a Matrix{Float64}
  • currently the Mixtures available do not support random initialisation for missing imputation, and the rest of the algorithm (Expectation-Maximisation) is deterministic, so there is no random component involved (i.e. no multiple imputations)

Example:

julia> using BetaML
+1
source

Module Index

Detailed API

BetaML.Imputation.GaussianMixtureImputerType
mutable struct GaussianMixtureImputer <: Imputer

Missing data imputer that uses a Generative (Gaussian) Mixture Model.

For the parameters (n_classes,mixtures,..) see GaussianMixture_hp.

Limitations:

  • data must be numerical
  • the resulting matrix is a Matrix{Float64}
  • currently the Mixtures available do not support random initialisation for missing imputation, and the rest of the algorithm (Expectation-Maximisation) is deterministic, so there is no random component involved (i.e. no multiple imputations)

Example:

julia> using BetaML
 
 julia> X = [1 2.5; missing 20.5; 0.8 18; 12 22.8; 0.4 missing; 1.6 3.7];
 
@@ -77,7 +77,7 @@
 BetaML.Imputation.GaussianMixtureImputer_lp (a BetaMLLearnableParametersSet struct)
 - mixtures: AbstractMixture[SphericalGaussian{Float64}([1.0179819950570768, 3.0999990977255845], 0.2865287884295908), SphericalGaussian{Float64}([6.149053737674149, 20.43331198167713], 15.18664378248651)]
 - initial_probmixtures: [0.48544987084082347, 0.5145501291591764]
-- probRecords: [0.9999996039918224 3.9600817749531375e-7; 2.3866922376272767e-229 1.0; … ; 0.9127030246369684 0.08729697536303167; 0.9999965964161501 3.403583849794472e-6]
source
BetaML.Imputation.GeneralI_hpType
mutable struct GeneralI_hp <: BetaMLHyperParametersSet

Hyperparameters for GeneralImputer

Parameters:

  • cols_to_impute: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords "auto" (default) or "all". With "auto" the model automatically detects the columns with missing data and imputes only those. You may manually specify the columns, or use "all", if you want to create an imputation model for those columns during training even when all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.

  • estimator: An estimator model (regressor or classifier), optionally with its options (hyper-parameters), to be used to impute the various columns of the matrix. It can also be a cols_to_impute-length vector of different estimators, to use a different estimator for each column (dimension) to impute, for example when some columns are categorical (and hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. use BetaML random forests, handling classification and regression jobs automatically].

  • missing_supported: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows as those where imputation is needed are dropped, and then only non-missing rows in the remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator, or a single boolean value to apply to all the estimators [default: false]

  • fit_function: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]

  • predict_function: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]

  • recursive_passages: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].

  • multiple_imputations: Determine the number of independent imputations of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).

source
BetaML.Imputation.GeneralImputerType
mutable struct GeneralImputer <: Imputer

Impute missing values using arbitrary learning models.

Impute missing values using any arbitrary learning model (classifier or regressor, not necessarily from BetaML) that implements an interface m = Model([options]), fit!(m,X,Y) and predict(m,X). For non-BetaML supervised models the actual training and predict functions must be specified in the fit_function and predict_function parameters respectively. If needed (for example when some columns with missing data are categorical and some numerical) different models can be specified for each column. Multiple imputations and multiple "passages" through the various columns for a single imputation are supported.

See GeneralI_hp for all the hyper-parameters.

Examples:

  • Using BetaML models:
julia> using BetaML
+- probRecords: [0.9999996039918224 3.9600817749531375e-7; 2.3866922376272767e-229 1.0; … ; 0.9127030246369684 0.08729697536303167; 0.9999965964161501 3.403583849794472e-6]
source
BetaML.Imputation.GeneralI_hpType
mutable struct GeneralI_hp <: BetaMLHyperParametersSet

Hyperparameters for GeneralImputer

Parameters:

  • cols_to_impute: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords "auto" (default) or "all". With "auto" the model automatically detects the columns with missing data and imputes only those. You may manually specify the columns, or use "all", if you want to create an imputation model for those columns during training even when all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.

  • estimator: An estimator model (regressor or classifier), optionally with its options (hyper-parameters), to be used to impute the various columns of the matrix. It can also be a cols_to_impute-length vector of different estimators, to use a different estimator for each column (dimension) to impute, for example when some columns are categorical (and hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. use BetaML random forests, handling classification and regression jobs automatically].

  • missing_supported: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows as those where imputation is needed are dropped, and then only non-missing rows in the remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator, or a single boolean value to apply to all the estimators [default: false]

  • fit_function: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]

  • predict_function: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]

  • recursive_passages: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].

  • multiple_imputations: Determine the number of independent imputations of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).

source
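
The fit_function and predict_function parameters above describe a calling convention: the model first, then the feature matrix, then (for fitting) the label vector. The sketch below shows a toy estimator that follows that convention. MeanEstimator, my_fit! and my_predict are hypothetical names invented for this illustration; they are not part of BetaML or of any other package.

```julia
using Statistics

# Hypothetical toy regressor: predicts the training-label mean for every record.
mutable struct MeanEstimator
    μ::Float64
end
MeanEstimator() = MeanEstimator(NaN)   # unfitted model

# Fit function: model first, then the feature matrix, then the label vector.
function my_fit!(m::MeanEstimator, X::AbstractMatrix, y::AbstractVector)
    m.μ = mean(y)
    return m
end

# Predict function: model first, then the feature matrix.
my_predict(m::MeanEstimator, X::AbstractMatrix) = fill(m.μ, size(X, 1))

m = MeanEstimator()
my_fit!(m, [1.0 2.0; 3.0 4.0; 5.0 6.0], [10.0, 20.0, 30.0])
ŷ = my_predict(m, [7.0 8.0; 9.0 10.0])   # one prediction per row of the new matrix
```

Going by the parameter list, such an estimator would presumably be handed to the imputer through the estimator, fit_function and predict_function keywords. That call is not shown or run here.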
BetaML.Imputation.GeneralImputerType
mutable struct GeneralImputer <: Imputer

Impute missing values using arbitrary learning models.

Impute missing values using any arbitrary learning model (classifier or regressor, not necessarily from BetaML) that implements an interface m = Model([options]), fit!(m,X,Y) and predict(m,X). For non-BetaML supervised models the actual training and predict functions must be specified in the fit_function and predict_function parameters respectively. If needed (for example when some columns with missing data are categorical and some numerical) different models can be specified for each column. Multiple imputations and multiple "passages" through the various columns for a single imputation are supported.

See GeneralI_hp for all the hyper-parameters.

Examples:

  • Using BetaML models:
julia> using BetaML
 julia> X = [1.4 2.5 "a"; missing 20.5 "b"; 0.6 18 missing; 0.7 22.8 "b"; 0.4 missing "b"; 1.6 3.7 "a"]
 6×3 Matrix{Any}:
  1.4        2.5       "a"
@@ -135,7 +135,7 @@
  0.6   18    "b"
  0.7   22.8  "b"
  0.4   13.5  "b"
- 1.6    3.7  "a"
source
BetaML.Imputation.RandomForestI_hpType
mutable struct RandomForestI_hp <: BetaMLHyperParametersSet

Hyperparameters for RandomForestImputer

Parameters:

  • rfhpar::Any: The parameters of the underlying random forest algorithm (n_trees, max_depth, min_gain, min_records, max_features, splitting_criterion, β, initialisation_strategy, oob and rng); see RandomForestE_hp for details

  • forced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]

  • recursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].

  • multiple_imputations::Int64: Determine the number of independent imputations of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).

  • cols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords "auto" (default) or "all". With "auto" the model automatically detects the columns with missing data and imputes only those. You may manually specify the columns, or use "all", if you want to create an imputation model for those columns during training even when all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.

Example:

julia> mod = RandomForestImputer(n_trees=20,max_depth=10,recursive_passages=3)
source
BetaML.Imputation.RandomForestImputerType
mutable struct RandomForestImputer <: Imputer

Impute missing data using Random Forests, with optional replicable multiple imputations.

See RandomForestI_hp, RandomForestE_hp and BML_options for the parameters.

Notes:

  • Given a certain RNG and its status (e.g. RandomForestImputer(...,rng=StableRNG(FIXEDSEED))), the algorithm is completely deterministic, i.e. replicable.
  • The algorithm accepts virtually any kind of data, sortable or not

Example:

julia> using BetaML
+ 1.6    3.7  "a"
source
BetaML.Imputation.RandomForestI_hpType
mutable struct RandomForestI_hp <: BetaMLHyperParametersSet

Hyperparameters for RandomForestImputer

Parameters:

  • rfhpar::Any: The parameters of the underlying random forest algorithm (n_trees, max_depth, min_gain, min_records, max_features, splitting_criterion, β, initialisation_strategy, oob and rng); see RandomForestE_hp for details

  • forced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]

  • recursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].

  • multiple_imputations::Int64: Determine the number of independent imputations of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).

  • cols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords "auto" (default) or "all". With "auto" the model automatically detects the columns with missing data and imputes only those. You may manually specify the columns, or use "all", if you want to create an imputation model for those columns during training even when all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.

Example:

julia> mod = RandomForestImputer(n_trees=20,max_depth=10,recursive_passages=3)
source
BetaML.Imputation.RandomForestImputerType
mutable struct RandomForestImputer <: Imputer

Impute missing data using Random Forests, with optional replicable multiple imputations.

See RandomForestI_hp, RandomForestE_hp and BML_options for the parameters.

Notes:

  • Given a certain RNG and its status (e.g. RandomForestImputer(...,rng=StableRNG(FIXEDSEED))), the algorithm is completely deterministic, i.e. replicable.
  • The algorithm accepts virtually any kind of data, sortable or not

Example:

julia> using BetaML
 
 julia> X = [1.4 2.5 "a"; missing 20.5 "b"; 0.6 18 missing; 0.7 22.8 "b"; 0.4 missing "b"; 1.6 3.7 "a"]
 6×3 Matrix{Any}:
@@ -157,7 +157,7 @@
  0.6       18       "b"
  0.7       22.8     "b"
  0.4       20.0837  "b"
- 1.6        3.7     "a"
source
BetaML.Imputation.SimpleI_hpType
mutable struct SimpleI_hp <: BetaMLHyperParametersSet

Hyperparameters for the SimpleImputer model

Parameters:

  • statistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]

  • norm::Union{Nothing, Int64}: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. quantity exports of different countries).

source
BetaML.Imputation.SimpleImputerType
mutable struct SimpleImputer <: Imputer

Simple imputer using the missing data's feature (column) statistic (def: mean), optionally normalised by l-norms of the records (rows)

Parameters:

  • statistic: The descriptive statistic of the column (feature) to use as imputed value [def: mean]
  • norm: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. quantity exports of different countries).

Limitations:

  • data must be numerical

Example:

julia> using BetaML
+ 1.6        3.7     "a"
source
BetaML.Imputation.SimpleI_hpType
mutable struct SimpleI_hp <: BetaMLHyperParametersSet

Hyperparameters for the SimpleImputer model

Parameters:

  • statistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]

  • norm::Union{Nothing, Int64}: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. quantity exports of different countries).

source
BetaML.Imputation.SimpleImputerType
mutable struct SimpleImputer <: Imputer

Simple imputer using the missing data's feature (column) statistic (def: mean), optionally normalised by l-norms of the records (rows)

Parameters:

  • statistic: The descriptive statistic of the column (feature) to use as imputed value [def: mean]
  • norm: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. quantity exports of different countries).

Limitations:

  • data must be numerical

Example:

julia> using BetaML
 
 julia> X = [2.0 missing 10; 20 40 100]
 2×3 Matrix{Union{Missing, Float64}}:
@@ -179,4 +179,4 @@
 julia> parameters(mod)
 BetaML.Imputation.SimpleImputer_lp (a BetaMLLearnableParametersSet struct)
 - cStats: [11.0, 40.0, 55.0]
-- norms: [6.0, 53.333333333333336]
source
+- norms: [6.0, 53.333333333333336]
source
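
The basic idea behind SimpleImputer (replace each missing value with a descriptive statistic of its column) can be sketched in plain Julia as follows. This is only an illustration under simplifying assumptions: impute_by_column is a hypothetical helper, not BetaML code, and it ignores the optional norm adjustment entirely.

```julia
using Statistics

# Hypothetical helper (NOT BetaML's code): fill each missing entry with the
# statistic (default: mean) of the non-missing values of the same column.
function impute_by_column(X::AbstractMatrix; statistic = mean)
    col_stats = [statistic(skipmissing(col)) for col in eachcol(X)]
    X_imputed = [ismissing(X[r, c]) ? col_stats[c] : X[r, c]
                 for r in axes(X, 1), c in axes(X, 2)]
    return X_imputed, col_stats
end

X = [2.0 missing 10; 20 40 100]
X_imputed, col_stats = impute_by_column(X)
# col_stats == [11.0, 40.0, 55.0]
# X_imputed == [2.0 40.0 10.0; 20.0 40.0 100.0]
```

The column statistics computed here agree with the cStats values printed by parameters(mod) in the example above. The values that SimpleImputer actually imputes when norm is set may differ, since they are further adjusted by the record norms.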
diff --git a/dev/MLJ_interface.html b/dev/MLJ_interface.html index e183adf..72c3406 100644 --- a/dev/MLJ_interface.html +++ b/dev/MLJ_interface.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The MLJ interface to BetaML Models

BetaML.BmljModule

MLJ interface for BetaML models

In this module we define the interface of several BetaML models. They can be used through the MLJ framework.

Note that MLJ models (whose name could be the same as the underlying BetaML model) are not exported. You can access them with BetaML.Bmlj.ModelXYZ.

source

Models available through MLJ

Detailed models documentation

BetaML.Bmlj.AutoEncoderType
mutable struct AutoEncoder <: MLJModelInterface.Unsupervised

A ready-to-use AutoEncoder, from the Beta Machine Learning Toolkit (BetaML), for encoding and decoding of data using neural networks

Parameters:

  • encoded_size: The number of neurons (i.e. dimensions) of the encoded data. If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: 0.33]

  • layers_size: Inner layer dimension (i.e. number of neurons). If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: nothing, i.e. a specific heuristic is applied]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.

  • e_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. See subtypes(BetaML.AbstractLayer) for supported layers

  • d_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as (n x d) matrices.

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passages through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 8]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

  • descr: An optional title and/or description for this model

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • use transform to obtain the encoded data, and inverse_transform to decode to the original data

Example:

julia> using MLJ
+

The MLJ interface to BetaML Models

BetaML.BmljModule

MLJ interface for BetaML models

In this module we define the interface of several BetaML models. They can be used through the MLJ framework.

Note that MLJ models (whose name could be the same as the underlying BetaML model) are not exported. You can access them with BetaML.Bmlj.ModelXYZ.

source

Models available through MLJ

Detailed models documentation

BetaML.Bmlj.AutoEncoderType
mutable struct AutoEncoder <: MLJModelInterface.Unsupervised

A ready-to-use AutoEncoder, from the Beta Machine Learning Toolkit (BetaML), for encoding and decoding of data using neural networks

Parameters:

  • encoded_size: The number of neurons (i.e. dimensions) of the encoded data. If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: 0.33]

  • layers_size: Inner layer dimension (i.e. number of neurons). If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: nothing, i.e. a specific heuristic is applied]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.

  • e_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. See subtypes(BetaML.AbstractLayer) for supported layers

  • d_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as (n x d) matrices.

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passages through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 8]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

  • descr: An optional title and/or description for this model

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • use transform to obtain the encoded data, and inverse_transform to decode to the original data

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -63,7 +63,7 @@
 julia> BetaML.relative_mean_error(MLJ.matrix(X),X_recovered)
 0.03387721261716176
 
-
source
BetaML.Bmlj.DecisionTreeClassifierType
mutable struct DecisionTreeClassifier <: MLJModelInterface.Probabilistic

A simple Decision Tree model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]

  • splitting_criterion::Function: The function used to compute the information gain of a specific partition. The gain is measured as the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by their respective number of items. [def: gini]. Either gini, entropy or a custom function. It can also be an anonymous function.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+
source
BetaML.Bmlj.DecisionTreeClassifierType
mutable struct DecisionTreeClassifier <: MLJModelInterface.Probabilistic

A simple Decision Tree model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]

  • splitting_criterion::Function: The function used to compute the information gain of a specific partition. The gain is measured as the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by their respective number of items. [def: gini]. Either gini, entropy or a custom function. It can also be an anonymous function.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]
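
The information gain described for splitting_criterion can be sketched in plain Julia. This is a minimal illustration, not the BetaML implementation: gini_impurity and information_gain are hypothetical names, and the assumption that a custom criterion receives a vector of labels and returns a scalar impurity is not confirmed by this page.

```julia
# Gini impurity of a vector of labels: 1 - Σ_k p_k^2
function gini_impurity(labels)
    n = length(labels)
    n == 0 && return 0.0
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return 1.0 - sum((c / n)^2 for c in values(counts))
end

# Information gain of a split: impurity of the parent minus the
# size-weighted impurity of the two children.
function information_gain(parent, left, right; impurity=gini_impurity)
    n = length(parent)
    weighted = (length(left) / n) * impurity(left) +
               (length(right) / n) * impurity(right)
    return impurity(parent) - weighted
end

parent       = ["a", "a", "b", "b"]
gain_perfect = information_gain(parent, ["a", "a"], ["b", "b"])  # both children are pure
gain_useless = information_gain(parent, ["a", "b"], ["a", "b"])  # children mirror the parent
```

A split that separates the two classes completely recovers the whole parent impurity as gain, while a split whose children have the same class mix as the parent gains nothing; min_gain acts as a threshold on this quantity.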

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -91,7 +91,7 @@
  ⋮
  UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)
  UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)
- UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)
source
BetaML.Bmlj.DecisionTreeRegressorType
mutable struct DecisionTreeRegressor <: MLJModelInterface.Deterministic

A simple Decision Tree model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]

  • splitting_criterion::Function: The function used to compute the information gain of a specific partition. The gain is measured as the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by their respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)
source
BetaML.Bmlj.DecisionTreeRegressorType
mutable struct DecisionTreeRegressor <: MLJModelInterface.Deterministic

A simple Decision Tree model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]

  • splitting_criterion::Function: The function used to compute the information gain of a specific partition. The gain is measured as the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by their respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X, y        = @load_boston;
 
@@ -122,7 +122,7 @@
   ⋮    
  23.9  23.75
  22.0  22.2
- 11.9  13.2
source
BetaML.Bmlj.GaussianMixtureClustererType
mutable struct GaussianMixtureClusterer <: MLJModelInterface.Unsupervised

An Expectation-Maximisation clustering algorithm with customisable mixtures, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::AbstractVector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Example:


+ 11.9  13.2
source
BetaML.Bmlj.GaussianMixtureClustererType
mutable struct GaussianMixtureClusterer <: MLJModelInterface.Unsupervised

An Expectation-Maximisation clustering algorithm with customisable mixtures, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::AbstractVector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Example:


 julia> using MLJ
 
 julia> X, y        = @load_iris;
@@ -157,7 +157,7 @@
  ⋮
  UnivariateFinite{Multiclass{3}}(1=>5.39e-25, 2=>0.0167, 3=>0.983)
  UnivariateFinite{Multiclass{3}}(1=>7.5e-29, 2=>0.000106, 3=>1.0)
- UnivariateFinite{Multiclass{3}}(1=>1.6e-20, 2=>0.594, 3=>0.406)
source
BetaML.Bmlj.GaussianMixtureImputerType
mutable struct GaussianMixtureImputer <: MLJModelInterface.Unsupervised

Impute missing values using a probabilistic approach (Gaussian Mixture Models) fitted using the Expectation-Maximisation algorithm, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module in BetaML). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported and that the currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance.

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(1=>1.6e-20, 2=>0.594, 3=>0.406)
source
BetaML.Bmlj.GaussianMixtureImputerType
mutable struct GaussianMixtureImputer <: MLJModelInterface.Unsupervised

Impute missing values using a probabilistic approach (Gaussian Mixture Models) fitted using the Expectation-Maximisation algorithm, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module in BetaML). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported and that the currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance.

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;
 
@@ -191,7 +191,7 @@
  2.51842  15.1747
  3.3      38.0
  2.47412  -2.3
- 5.2      -2.4
source
BetaML.Bmlj.GaussianMixtureRegressorType
mutable struct GaussianMixtureRegressor <: MLJModelInterface.Deterministic

A non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.

This is the single-target version of the model. If you want to predict several labels (y) at once, use the MLJ model MultitargetGaussianMixtureRegressor.

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ 5.2      -2.4
source
BetaML.Bmlj.GaussianMixtureRegressorType
mutable struct GaussianMixtureRegressor <: MLJModelInterface.Deterministic

A non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.

This is the single-target version of the model. If you want to predict several labels (y) at once, use the MLJ model MultitargetGaussianMixtureRegressor.

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X, y      = @load_boston;
 
@@ -222,7 +222,7 @@
  24.70344283512716
   ⋮
  17.172486989759676
- 17.172486989759644
source
BetaML.Bmlj.GeneralImputerType
mutable struct GeneralImputer <: MLJModelInterface.Unsupervised

Impute missing values using arbitrary learning models, from the Beta Machine Learning Toolkit (BetaML).

Impute missing values using a vector (one per column) of arbitrary learning models (classifiers/regressors, not necessarily from BetaML) that implement the interface m = Model([options]), train!(m,X,Y) and predict(m,X).

Hyperparameters:

  • cols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords "auto" (default) or "all". With "auto" the model automatically detects the columns with missing data and imputes only them. You may manually specify the columns, or use "all", if you want to create an imputation model for those columns during training even if all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.

  • estimator::Any: An estimator model (regressor or classifier), optionally with its options (hyper-parameters), to be used to impute the various columns of the matrix. It can also be a cols_to_impute-length vector of different estimators, to use a different estimator for each column (dimension) to impute, for example when some columns are categorical (and will hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. use BetaML random forests, handling classification and regression jobs automatically].

  • missing_supported::Union{Bool, Vector{Bool}}: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows as those where imputation is needed are dropped, and then only non-missing rows in the remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator, or a single boolean value to apply to all the estimators [default: false]

  • fit_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]

  • predict_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]

  • recursive_passages::Int64: Defines the number of times to go through the various columns to impute their data. Useful when there are data to impute in multiple columns. The order of the first pass is given by the decreasing number of missing values per column; the other passes are random [default: 1].

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]. Note that this influences only the GeneralImputer code itself; the individual estimators may have their own rng (or similar) parameter.

Examples:

  • Using BetaML models:
julia> using MLJ;
+ 17.172486989759644
source
BetaML.Bmlj.GeneralImputerType
mutable struct GeneralImputer <: MLJModelInterface.Unsupervised

Impute missing values using arbitrary learning models, from the Beta Machine Learning Toolkit (BetaML).

Impute missing values using a vector (one per column) of arbitrary learning models (classifiers/regressors, not necessarily from BetaML) that implement the interface m = Model([options]), train!(m,X,Y) and predict(m,X).

Hyperparameters:

  • cols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords "auto" (default) or "all". With "auto" the model automatically detects the columns with missing data and imputes only them. You may manually specify the columns, or use "all", if you want to create an imputation model for those columns during training even if all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.

  • estimator::Any: An estimator model (regressor or classifier), optionally with its options (hyper-parameters), to be used to impute the various columns of the matrix. It can also be a cols_to_impute-length vector of different estimators, to use a different estimator for each column (dimension) to impute, for example when some columns are categorical (and will hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. use BetaML random forests, handling classification and regression jobs automatically].

  • missing_supported::Union{Bool, Vector{Bool}}: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows as those where imputation is needed are dropped, and then only non-missing rows in the remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator, or a single boolean value to apply to all the estimators [default: false]

  • fit_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]

  • predict_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]

  • recursive_passages::Int64: Defines the number of times to go through the various columns to impute their data. Useful when there are data to impute in multiple columns. The order of the first pass is given by the decreasing number of missing values per column; the other passes are random [default: 1].

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]. Note that this influences only the GeneralImputer code itself; the individual estimators may have their own rng (or similar) parameter.
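
The argument order required of fit_function and predict_function can be shown with a toy estimator. Everything in this sketch is hypothetical (MeanEstimator, mean_fit! and mean_predict do not exist in BetaML or MLJ); it only mirrors the signatures described above: model first, then the feature matrix, then (for fitting) the label vector.

```julia
using Statistics: mean

# Hypothetical estimator: always predicts the mean of the training labels.
mutable struct MeanEstimator
    μ::Float64
end
MeanEstimator() = MeanEstimator(NaN)   # unfitted model

# Fit function: (model, feature matrix, label vector)
function mean_fit!(m::MeanEstimator, X::AbstractMatrix, y::AbstractVector)
    m.μ = mean(y)
    return m
end

# Predict function: (model, feature matrix) -> one prediction per row of X
mean_predict(m::MeanEstimator, X::AbstractMatrix) = fill(m.μ, size(X, 1))

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [10.0, 20.0, 30.0]
m = mean_fit!(MeanEstimator(), X, y)
ŷ = mean_predict(m, X)
```

Under these assumptions, such an estimator would be passed as estimator=MeanEstimator() together with fit_function=mean_fit! and predict_function=mean_predict.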

Examples:

  • Using BetaML models:
julia> using MLJ;
 julia> import BetaML # The library from which to get the individual estimators to be used for each column imputation
 julia> X = ["a"         8.2;
             "a"     missing;
@@ -280,7 +280,7 @@
  "b"  20
  "c"  -1.8
  "c"  -2.3
- "c"  -2.4
source
BetaML.Bmlj.KMeansClustererType
mutable struct KMeansClusterer <: MLJModelInterface.Unsupervised

The classical KMeansClusterer clustering algorithm, from the Beta Machine Learning Toolkit (BetaML).

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that, contrary to KMedoidsClusterer, the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using the set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported

Example:

julia> using MLJ
+ "c"  -2.4
source
BetaML.Bmlj.KMeansClustererType
mutable struct KMeansClusterer <: MLJModelInterface.Unsupervised

The classical KMeansClusterer clustering algorithm, from the Beta Machine Learning Toolkit (BetaML).

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that, contrary to KMedoidsClusterer, the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using the set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported
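
A user-defined dist only needs to accept two vectors and return a scalar. As a minimal sketch, the Chebyshev distance below is a hypothetical example (chebyshev_distance and chebyshev_anon are not BetaML names), written once as a named function and once as an anonymous one.

```julia
# Hypothetical user-defined distance: Chebyshev (L∞) distance,
# i.e. the largest absolute difference across the coordinates.
chebyshev_distance(x, y) = maximum(abs.(x .- y))

# The same distance as an anonymous function.
chebyshev_anon = (x, y) -> maximum(abs.(x .- y))

a = [1.0, 5.0, 2.0]
b = [4.0, 1.0, 2.0]
d = chebyshev_distance(a, b)   # coordinate-wise differences are 3, 4 and 0
```

Either form could then be supplied through the dist parameter, keeping in mind the convergence caveat stated above for non-Euclidean distances.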

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -310,7 +310,7 @@
  ⋮            
  "virginica"  3
  "virginica"  3
- "virginica"  1
source
BetaML.Bmlj.KMedoidsClustererType
mutable struct KMedoidsClusterer <: MLJModelInterface.Unsupervised

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using the set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

The K-medoids clustering algorithm with customisable distance function, from the Beta Machine Learning Toolkit (BetaML).

Similar to K-Means, but the "representatives" (the centroids) are guaranteed to be one of the training points. The algorithm works with any arbitrary distance measure.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported

Example:

julia> using MLJ
+ "virginica"  1
source
BetaML.Bmlj.KMedoidsClustererType
mutable struct KMedoidsClusterer <: MLJModelInterface.Unsupervised

Parameters:

  • n_classes::Int64: Number of classes to discriminate the data [def: 3]

  • dist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics.

  • initialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using the set of initial representatives provided in the initial_representatives parameter
  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy="given") [default: nothing]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

The K-medoids clustering algorithm with customisable distance function, from the Beta Machine Learning Toolkit (BetaML).

Similar to K-Means, but the "representatives" (the centroids) are guaranteed to be one of the training points. The algorithm works with any arbitrary distance measure.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported
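
As a rough illustration of the description above, the following is a minimal K-medoids sketch in plain Julia. It is not BetaML's implementation: the names kmedoids_sketch, l2 and nearest and the maxiter keyword are hypothetical, and the random start only loosely mimics the documented "shuffle" strategy. It alternates an assignment step and a medoid-update step, so every representative is always a row of the training data, and it accepts any function of two vectors as the distance.

```julia
using Random

# Euclidean distance between two vectors (hypothetical helper, not BetaML's l2_distance).
l2(a, b) = sqrt(sum((a .- b) .^ 2))

# Index (1..k) of the closest medoid for every record (row) of X.
nearest(X, medoids, dist) =
    [argmin([dist(X[i, :], X[m, :]) for m in medoids]) for i in 1:size(X, 1)]

function kmedoids_sketch(X::AbstractMatrix, k::Int;
                         dist=l2, maxiter=100, rng=Random.default_rng())
    n = size(X, 1)
    medoids = randperm(rng, n)[1:k]       # start from k distinct training points
    for _ in 1:maxiter
        assign = nearest(X, medoids, dist)
        new_medoids = copy(medoids)
        for c in 1:k
            members = findall(==(c), assign)
            isempty(members) && continue  # keep the old medoid for an empty cluster
            # The new medoid is the member with the smallest total distance to its cluster.
            costs = [sum(dist(X[i, :], X[j, :]) for j in members) for i in members]
            new_medoids[c] = members[argmin(costs)]
        end
        new_medoids == medoids && break   # converged: no medoid changed
        medoids = new_medoids
    end
    # Return the medoid row indices and the final cluster assignment.
    return medoids, nearest(X, medoids, dist)
end
```

Under these assumptions, a call such as kmedoids_sketch(X, 3; dist=(x, y) -> maximum(abs.(x .- y))) would cluster a numerical matrix X (one record per row) with a Chebyshev distance supplied as an anonymous function.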

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -340,7 +340,7 @@
  ⋮            
  "virginica"  1
  "virginica"  1
- "virginica"  2
source
BetaML.Bmlj.KernelPerceptronClassifierType
mutable struct KernelPerceptronClassifier <: MLJModelInterface.Probabilistic

The kernel perceptron algorithm using one-vs-one for multiclass, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • kernel::Function: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel (once the BetaML package is loaded) for details, or check ?BetaML.Utils to see whether other kernels are defined (you can always define your own kernel) [def: radial_kernel]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 100]

  • initial_errors::Union{Nothing, Vector{Vector{Int64}}}: Initial distribution of the number of errors [def: nothing, i.e. zeros]. If provided, this should be an nModels-length vector of vectors of nRecords integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ "virginica"  2
source
BetaML.Bmlj.KernelPerceptronClassifierType
mutable struct KernelPerceptronClassifier <: MLJModelInterface.Probabilistic

The kernel perceptron algorithm using one-vs-one for multiclass, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • kernel::Function: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel (once the BetaML package is loaded) for details, or check ?BetaML.Utils to see whether other kernels are defined (you can always define your own kernel) [def: radial_kernel]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 100]

  • initial_errors::Union{Nothing, Vector{Vector{Int64}}}: Initial distribution of the number of errors [def: nothing, i.e. zeros]. If provided, this should be an nModels-length vector of vectors of nRecords integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]
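
To make the shape of initial_errors concrete, the sketch below counts the one-vs-one models and builds an all-zeros value with the documented Vector{Vector{Int64}} layout. The helper names (n_ovo_models, class_pairs, zero_initial_errors) are hypothetical and not part of BetaML, and the ordering of the class pairs is an assumption made only for illustration.

```julia
# Number of one-vs-one binary models for n_classes classes.
n_ovo_models(n_classes::Integer) = n_classes * (n_classes - 1) ÷ 2

# One (i, j) class pair per binary model; this ordering is an assumption.
class_pairs(n_classes::Integer) =
    [(i, j) for i in 1:n_classes for j in (i + 1):n_classes]

# nModels-length vector of nRecords-length integer vectors, all zeros.
zero_initial_errors(n_classes::Integer, n_records::Integer) =
    [zeros(Int64, n_records) for _ in 1:n_ovo_models(n_classes)]
```

With the three iris classes and 150 records, n_ovo_models(3) is 3 and zero_initial_errors(3, 150) holds three vectors of 150 zeros each.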

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -367,7 +367,7 @@
  UnivariateFinite{Multiclass{3}}(setosa=>0.665, versicolor=>0.245, virginica=>0.09)
  ⋮
  UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.245, virginica=>0.665)
- UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.665, virginica=>0.245)
source
BetaML.Bmlj.MultitargetGaussianMixtureRegressorType
mutable struct MultitargetGaussianMixtureRegressor <: MLJModelInterface.Deterministic

A non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.

This is the multi-target version of the model. If you want to predict a single label (y), use the MLJ model GaussianMixtureRegressor.

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.665, virginica=>0.245)
source
BetaML.Bmlj.MultitargetGaussianMixtureRegressorType
mutable struct MultitargetGaussianMixtureRegressor <: MLJModelInterface.Deterministic

A non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.

This is the multi-target version of the model. If you want to predict a single label (y), use the MLJ model GaussianMixtureRegressor.

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{var"#s1270"} where var"#s1270"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).

  • initialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that currently "random" and "shuffle" initialisations are not supported in gmm-based algorithms.

  • maximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X, y        = @load_boston;
 
@@ -400,7 +400,7 @@
  23.3358  51.6717
   ⋮       
  16.6843  38.3686
- 16.6843  38.3686
source
BetaML.Bmlj.MultitargetNeuralNetworkRegressorType
mutable struct MultitargetNeuralNetworkRegressor <: MLJModelInterface.Deterministic

A simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of multiple dimensional targets.

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as matrices.

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 300]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • descr: An optional title and/or description for this model

  • cb: A callback function to provide information during training [def: BetaML.fitting_info]

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • the label should be an n-records by n-dimensions matrix

Example:

julia> using MLJ
+ 16.6843  38.3686
source
BetaML.Bmlj.MultitargetNeuralNetworkRegressorType
mutable struct MultitargetNeuralNetworkRegressor <: MLJModelInterface.Deterministic

A simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of multiple dimensional targets.

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as matrices.

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 300]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • descr: An optional title and/or description for this model

  • cb: A callback function to provide information during training [def: BetaML.fitting_info]

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • the label should be an n-records by n-dimensions matrix

Example:

julia> using MLJ
 
 julia> X, y        = @load_boston;
 
@@ -439,7 +439,7 @@
   ⋮                   
  23.9  52.8  23.3573  50.654
  22.0  49.0  22.1141  48.5926
- 11.9  28.8  19.9639  45.5823
source
BetaML.Bmlj.NeuralNetworkClassifierType
mutable struct NeuralNetworkClassifier <: MLJModelInterface.Probabilistic

A simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for classification problems.

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers. The last "softmax" layer is automatically added.

  • loss: Loss (cost) function [def: BetaML.crossentropy]. Should always assume y and ŷ as matrices.

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dcrossentropy, i.e. the derivative of the cross-entropy]. Use nothing for autodiff.

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • descr: An optional title and/or description for this model

  • cb: A callback function to provide information during training [def: BetaML.fitting_info]

  • categories: The categories to represent as columns. [def: nothing, i.e. unique training values].

  • handle_unknown: How to handle categories not seen in training or not present in the provided categories array? "error" (default) raises an error, "infrequent" adds a specific column for these categories.

  • other_categories_name: Which value during prediction to assign to this "other" category (i.e. categories not seen in training or not present in the provided categories array)? [def: nothing, i.e. typemax(Int64) for integer vectors and "other" for other types]. This setting is active only if handle_unknown="infrequent", and in that case it MUST be specified if Y is neither integers nor strings

  • rng: Random Number Generator [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • the label should be an n-records by n-dimensions matrix (e.g. one-hot-encoded data for classification), where the output columns should be interpreted as the probabilities for each category.

Example:

julia> using MLJ
+ 11.9  28.8  19.9639  45.5823
source
BetaML.Bmlj.NeuralNetworkClassifierType
mutable struct NeuralNetworkClassifier <: MLJModelInterface.Probabilistic

A simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for classification problems.

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers. The last "softmax" layer is automatically added.

  • loss: Loss (cost) function [def: BetaML.crossentropy]. Should always assume y and ŷ as matrices.

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dcrossentropy, i.e. the derivative of the cross-entropy]. Use nothing for autodiff.

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • descr: An optional title and/or description for this model

  • cb: A callback function to provide information during training [def: BetaML.fitting_info]

  • categories: The categories to represent as columns. [def: nothing, i.e. unique training values].

  • handle_unknown: How to handle categories not seen in training or not present in the provided categories array? "error" (default) raises an error, "infrequent" adds a specific column for these categories.

  • other_categories_name: Which value during prediction to assign to this "other" category (i.e. categories not seen in training or not present in the provided categories array)? [def: nothing, i.e. typemax(Int64) for integer vectors and "other" for other types]. This setting is active only if handle_unknown="infrequent", and in that case it MUST be specified if Y is neither integers nor strings

  • rng: Random Number Generator [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • the label should be an n-records by n-dimensions matrix (e.g. one-hot-encoded data for classification), where the output columns should be interpreted as the probabilities for each category.

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -474,7 +474,7 @@
  UnivariateFinite{Multiclass{3}}(setosa=>0.573, versicolor=>0.213, virginica=>0.213)
  ⋮
  UnivariateFinite{Multiclass{3}}(setosa=>0.236, versicolor=>0.236, virginica=>0.529)
- UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.254, virginica=>0.492)
source
BetaML.Bmlj.NeuralNetworkRegressorType
mutable struct NeuralNetworkRegressor <: MLJModelInterface.Deterministic

A simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of a single dimensional target.

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as matrices, even if the regression task is 1-D

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • descr: An optional title and/or description for this model

  • cb: A callback function to provide information during training [def: fitting_info]

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • the label should be an n-records vector.

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.254, virginica=>0.492)
source
BetaML.Bmlj.NeuralNetworkRegressorType
mutable struct NeuralNetworkRegressor <: MLJModelInterface.Deterministic

A simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of a single dimensional target.

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as matrices, even if the regression task is 1-D

    Warning

    If you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.

  • dloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • descr: An optional title and/or description for this model

  • cb: A callback function to provide information during training [def: fitting_info]

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • the label should be an n-records vector.

Example:

julia> using MLJ
 
 julia> X, y        = @load_boston;
 
@@ -510,7 +510,7 @@
   ⋮    
  23.9  30.9032
  22.0  29.49
- 11.9  27.2438
source
BetaML.Bmlj.PegasosClassifierType
mutable struct PegasosClassifier <: MLJModelInterface.Probabilistic

The gradient-based linear "pegasos" classifier using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • initial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]

  • initial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]

  • learning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]

  • learning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ 11.9  27.2438
source
BetaML.Bmlj.PegasosClassifierType
mutable struct PegasosClassifier <: MLJModelInterface.Probabilistic

The gradient-based linear "pegasos" classifier using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • initial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]

  • initial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]

  • learning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]

  • learning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -539,7 +539,7 @@
  UnivariateFinite{Multiclass{3}}(setosa=>0.791, versicolor=>0.177, virginica=>0.0318)
  ⋮
  UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.5, virginica=>0.246)
- UnivariateFinite{Multiclass{3}}(setosa=>0.283, versicolor=>0.51, virginica=>0.207)
source
BetaML.Bmlj.PerceptronClassifierType
mutable struct PerceptronClassifier <: MLJModelInterface.Probabilistic

The classical perceptron algorithm using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • initial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]

  • initial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(setosa=>0.283, versicolor=>0.51, virginica=>0.207)
source
BetaML.Bmlj.PerceptronClassifierType
mutable struct PerceptronClassifier <: MLJModelInterface.Probabilistic

The classical perceptron algorithm using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • initial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]

  • initial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -569,7 +569,7 @@
  UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>1.27e-18, virginica=>1.86e-310)
  ⋮
  UnivariateFinite{Multiclass{3}}(setosa=>2.77e-57, versicolor=>1.1099999999999999e-82, virginica=>1.0)
- UnivariateFinite{Multiclass{3}}(setosa=>3.09e-22, versicolor=>4.03e-25, virginica=>1.0)
source
BetaML.Bmlj.RandomForestClassifierType
mutable struct RandomForestClassifier <: MLJModelInterface.Probabilistic

A simple Random Forest model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimensions]

  • splitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: gini]. Either gini, entropy or a custom function. It can also be an anonymous function.

  • β::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which the trees have not been trained. Higher values favour "better" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(setosa=>3.09e-22, versicolor=>4.03e-25, virginica=>1.0)
source
BetaML.Bmlj.RandomForestClassifierType
mutable struct RandomForestClassifier <: MLJModelInterface.Probabilistic

A simple Random Forest model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimensions]

  • splitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: gini]. Either gini, entropy or a custom function. It can also be an anonymous function.

  • β::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which the trees have not been trained. Higher values favour "better" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]
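
The information gain described for splitting_criterion can be sketched as follows. This is an illustrative reading of that description, not BetaML's code: gini_impurity and information_gain are hypothetical names, and the child impurities are weighted by each child's share of the parent's records.

```julia
# Gini impurity of a collection of labels: 1 - sum of squared class frequencies.
function gini_impurity(labels)
    n = length(labels)
    n == 0 && return 0.0
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return 1.0 - sum((c / n)^2 for c in values(counts))
end

# Parent impurity minus the size-weighted impurity of the two children.
function information_gain(parent, left, right; impurity=gini_impurity)
    n = length(parent)
    w_left, w_right = length(left) / n, length(right) / n
    return impurity(parent) - (w_left * impurity(left) + w_right * impurity(right))
end
```

For a parent node with labels ["a", "a", "b", "b"], a split into ["a", "a"] and ["b", "b"] gives a gain of 0.5, while a split into ["a", "b"] and ["a", "b"] gives 0.0.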

Example:

julia> using MLJ
 
 julia> X, y        = @load_iris;
 
@@ -598,7 +598,7 @@
  UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)
  ⋮
  UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)
- UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0667, virginica=>0.933)
source
BetaML.Bmlj.RandomForestImputerType
mutable struct RandomForestImputer <: MLJModelInterface.Unsupervised

Impute missing values using Random Forests, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]

  • max_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. square root of the data dimension]

  • forced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]

  • splitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.

  • recursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0667, virginica=>0.933)
source
BetaML.Bmlj.RandomForestImputerType
mutable struct RandomForestImputer <: MLJModelInterface.Unsupervised

Impute missing values using Random Forests, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]

  • max_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. square root of the data dimension]

  • forced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]

  • splitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.

  • recursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute in multiple columns. The order of the first passage is given by the decreasing number of missing values per column; the other passages are random [default: 1].

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]
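The information gain described for splitting_criterion can be illustrated with a minimal, standalone sketch. It does not call BetaML: the function names gini_impurity, variance_impurity and information_gain are hypothetical, and the use of the population variance is an assumption made for illustration only.

```julia
# Gini impurity of a vector of categorical labels: 1 - Σ pᵢ²
function gini_impurity(labels)
    n = length(labels)
    n == 0 && return 0.0
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return 1.0 - sum((c / n)^2 for c in values(counts))
end

# Population variance of numerical labels (an assumed impurity for regression tasks)
function variance_impurity(labels)
    n = length(labels)
    n == 0 && return 0.0
    μ = sum(labels) / n
    return sum((l - μ)^2 for l in labels) / n
end

# Information gain: parent impurity minus the size-weighted impurity of the two children
function information_gain(parent, left, right; impurity = gini_impurity)
    n = length(parent)
    weighted = (length(left) / n) * impurity(left) + (length(right) / n) * impurity(right)
    return impurity(parent) - weighted
end

parent = ["a", "a", "b", "b"]
gain_perfect = information_gain(parent, ["a", "a"], ["b", "b"])  # split that separates the classes
gain_useless = information_gain(parent, ["a", "b"], ["a", "b"])  # split that separates nothing
```

A split is accepted only if its gain exceeds min_gain, so the second split above would never be preferred over the first.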

Example:

julia> using MLJ
 
 julia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;
 
@@ -632,7 +632,7 @@
  2.88375   8.66125
  3.3      38.0
  3.98125  -2.3
- 5.2      -2.4
source
BetaML.Bmlj.RandomForestRegressorType
mutable struct RandomForestRegressor <: MLJModelInterface.Deterministic

A simple Random Forest model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimension]

  • splitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.

  • β::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which the trees have not been trained. Higher values favour "better" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
+ 5.2      -2.4
source
BetaML.Bmlj.RandomForestRegressorType
mutable struct RandomForestRegressor <: MLJModelInterface.Deterministic

A simple Random Forest model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]

  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimension]

  • splitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.

  • β::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which the trees have not been trained. Higher values favour "better" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]

Example:

julia> using MLJ
 
 julia> X, y        = @load_boston;
 
@@ -666,7 +666,7 @@
   ⋮    
  23.9  24.42
  22.0  22.4433
- 11.9  15.5833
source
BetaML.Bmlj.SimpleImputerType
mutable struct SimpleImputer <: MLJModelInterface.Unsupervised

Impute missing values using the feature (column) mean, with optional record normalisation (using l-norms), from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • statistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]

  • norm::Union{Nothing, Int64}: Normalise the feature mean by the l-norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. quantity exports of different countries).

Example:

julia> using MLJ
+ 11.9  15.5833
source
BetaML.Bmlj.SimpleImputerType
mutable struct SimpleImputer <: MLJModelInterface.Unsupervised

Impute missing values using the feature (column) mean, with optional record normalisation (using l-norms), from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • statistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]

  • norm::Union{Nothing, Int64}: Normalise the feature mean by the l-norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. quantity exports of different countries).
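The basic idea of column-statistic imputation (with norm left at nothing) can be sketched in plain Julia as follows. This is only an illustration under stated assumptions, not BetaML's implementation: impute_columns is a hypothetical helper, the record normalisation option is not covered, and a column made up entirely of missing values would raise an error here.

```julia
using Statistics  # standard library, provides `mean`

# Replace each missing value with the chosen statistic of the non-missing values of its column
function impute_columns(X::AbstractMatrix; statistic = mean)
    out = Matrix{Float64}(undef, size(X)...)
    for j in axes(X, 2)
        col = X[:, j]
        fill_value = statistic(collect(skipmissing(col)))
        for i in axes(X, 1)
            out[i, j] = ismissing(col[i]) ? fill_value : col[i]
        end
    end
    return out
end

X = [1.0 10.5; 1.5 missing; missing 8.0]
X_full = impute_columns(X)
```

Passing statistic = median (also from Statistics) would switch the imputed value to the column median.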

Example:

julia> using MLJ
 
 julia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;
 
@@ -693,22 +693,22 @@
  0.280952    1.69524
  3.3        38.0
  0.0750839  -2.3
- 5.2        -2.4
source
MLJModelInterface.fitMethod
fit(
     m::BetaML.Bmlj.MultitargetNeuralNetworkRegressor,
     verbosity,
     X,
     y
 ) -> Tuple{NeuralNetworkEstimator, Nothing, Nothing}
-

For the verbosity parameter, see Verbosity.

source
MLJModelInterface.fitMethod
fit(
     m::BetaML.Bmlj.NeuralNetworkRegressor,
     verbosity,
     X,
     y
 ) -> Tuple{NeuralNetworkEstimator, Nothing, Nothing}
-

For the verbosity parameter, see Verbosity.

source
MLJModelInterface.predictMethod

predict(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, predict the class of the observation

source
MLJModelInterface.transformMethod
transform(m, fitResults, X)

Given a trained imputer model, fill the missing data of some new observations. Note that with multiple recursive imputations and inner estimators that don't support missing data, this function works only for the X on which the model has been trained, i.e. this function cannot be applied to new matrices with empty values using a model trained on other matrices.

source
MLJModelInterface.transformMethod

transform(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, return the distances to each centroid

source
+

For the verbosity parameter, see Verbosity.

source
MLJModelInterface.predictMethod

predict(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, predict the class of the observation

source
MLJModelInterface.transformMethod
transform(m, fitResults, X)

Given a trained imputer model, fill the missing data of some new observations. Note that with multiple recursive imputations and inner estimators that don't support missing data, this function works only for the X on which the model has been trained, i.e. this function cannot be applied to new matrices with empty values using a model trained on other matrices.

source
MLJModelInterface.transformMethod

transform(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, return the distances to each centroid

source
diff --git a/dev/Nn.html b/dev/Nn.html index aaca27b..aa27583 100644 --- a/dev/Nn.html +++ b/dev/Nn.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Nn Module

BetaML.NnModule
BetaML.Nn module

Implement the functionality required to define an artificial Neural Network, train it with data, forecast data and assess its performances.

Common types of layers and optimisation algorithms are already provided, but you can define your own by subtyping the AbstractLayer and OptimisationAlgorithm abstract types, respectively.

The module provides the following types and functions. Use ?[type or function] to access their full signature and detailed documentation:

Model definition:

  • DenseLayer: Classical feed-forward layer with user-defined activation function
  • DenseNoBiasLayer: Classical layer without the bias parameter
  • VectorFunctionLayer: Layer whose activation function runs over the ensemble of its nodes rather than on each one individually. No learnable weights on input, optional learnable weights as parameters of the activation function.
  • ScalarFunctionLayer: Layer whose activation function runs over each node individually, like a classic DenseLayer, but with no learnable weights on input and optional learnable weights as parameters of the activation function.
  • ReplicatorLayer: Alias for a ScalarFunctionLayer with no learnable parameters and identity as activation function
  • ReshaperLayer: Reshape the output of a layer (or the input data) to the shape needed for the next one
  • PoolingLayer: Midway between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel. Weightless.
  • ConvLayer: A generic N+1 (channels) dimensional convolutional layer
  • GroupedLayer: To stack several layers into a single layer, e.g. for multi-branches networks
  • NeuralNetworkEstimator: Build the chained network and define a cost function

Each layer can use a default activation function, one of the functions provided in the Utils module (relu, tanh, softmax,...) or one provided by you. BetaML will try to recognise whether it is a "known" function for which it sets the exact derivative; otherwise you can provide the derivative to the layer yourself. If the derivative of the activation function is not provided (either manually or automatically), AD will be used and training may be slower, although this difference tends to vanish with bigger datasets.

Alternatively, you can implement your own layer by defining a new type as a subtype of the abstract type AbstractLayer. Each user-implemented layer must define the following methods:

  • A suitable constructor
  • forward(layer,x)
  • backward(layer,x,next_gradient)
  • get_params(layer)
  • get_gradient(layer,x,next_gradient)
  • set_params!(layer,w)
  • size(layer)

Model fitting:

  • fit!(nn,X,Y): fitting function
  • fitting_info(nn): Default callback function during fitting
  • SGD: The classical optimisation algorithm
  • ADAM: A faster moment-based optimisation algorithm

To define your own optimisation algorithm, define a subtype of OptimisationAlgorithm and implement the function single_update!(θ,▽;opt_alg) and, optionally, init_optalg!(⋅) specific to it.

Model predictions and assessment:

  • predict(nn) or predict(nn,X): Return the output given the data

While high-level functions operating on the dataset expect it to be in the standard format (nrecords × ndimensions matrices) it is customary to represent the chain of a neural network as a flow of column vectors, so all low-level operations (operating on a single datapoint) expect both the input and the output as a column vector.

source

Module Index

Detailed API

BetaML.Nn.ADAMType
ADAM(;η, λ, β₁, β₂, ϵ)

The ADAM algorithm, an adaptive moment estimation optimiser.

Fields:

  • η: Learning rate (stepsize, α in the paper), as a function of the current epoch [def: t -> 0.001 (i.e. fixed)]
  • λ: Multiplicative constant to the learning rate [def: 1]
  • β₁: Exponential decay rate for the first moment estimate [range: ∈ [0,1], def: 0.9]
  • β₂: Exponential decay rate for the second moment estimate [range: ∈ [0,1], def: 0.999]
  • ϵ: Epsilon value to avoid division by zero [def: 10^-8]
source
BetaML.Nn.ConvLayerType
struct ConvLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

A generic N+1 (channels) dimensional convolutional layer

EXPERIMENTAL: Still too slow for practical applications

This convolutional layer has two constructors, one with the form ConvLayer(input_size,kernel_size,nchannels_in,nchannels_out), and an alternative one as ConvLayer(input_size_with_channel,kernel_size,nchannels_out). If the input is a vector, use a ReshaperLayer in front.

Fields:

  • input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)

  • output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)

  • weight::Array{WET, NDPLUS2} where {NDPLUS2, WET<:Number}: Weight tensor (aka "filter" or "kernel") with respect to the input from previous layer or data (kernel_size array augmented by the nchannels_in and nchannels_out dimensions)

  • usebias::Bool: Whether to use (and learn) a bias weight [def: true]

  • bias::Vector{WET} where WET<:Number: Bias (nchannels_out array)

  • padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)

  • padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)

  • stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride

  • ndims::Int64: Number of dimensions (excluding input and output channels)

  • f::Function: Activation function

  • df::Union{Nothing, Function}: Derivative of the activation function

  • x_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: x ids of the convolution (computed in preprocessing, itself at the beginning of train)

  • y_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: y ids of the convolution (computed in preprocessing, itself at the beginning of train)

  • w_ids::Array{StaticArraysCore.SVector{NDPLUS2, Int64}, 1} where NDPLUS2: w ids of the convolution (computed in preprocessing, itself at the beginning of train)

  • y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y

  • y_to_w_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS2}}, 1}, NDPLUS1} where {NDPLUS1, NDPLUS2}: A y-dims array of vectors of corresponding w(s) contributing to the given y

source
BetaML.Nn.ConvLayerMethod
ConvLayer(
+

The BetaML.Nn Module

BetaML.NnModule
BetaML.Nn module

Implement the functionality required to define an artificial Neural Network, train it with data, forecast data and assess its performances.

Common types of layers and optimisation algorithms are already provided, but you can define your own by subtyping the AbstractLayer and OptimisationAlgorithm abstract types, respectively.

The module provides the following types and functions. Use ?[type or function] to access their full signature and detailed documentation:

Model definition:

  • DenseLayer: Classical feed-forward layer with user-defined activation function
  • DenseNoBiasLayer: Classical layer without the bias parameter
  • VectorFunctionLayer: Layer whose activation function runs over the ensemble of its nodes rather than on each one individually. No learnable weights on input, optional learnable weights as parameters of the activation function.
  • ScalarFunctionLayer: Layer whose activation function runs over each node individually, like a classic DenseLayer, but with no learnable weights on input and optional learnable weights as parameters of the activation function.
  • ReplicatorLayer: Alias for a ScalarFunctionLayer with no learnable parameters and identity as activation function
  • ReshaperLayer: Reshape the output of a layer (or the input data) to the shape needed for the next one
  • PoolingLayer: Midway between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel. Weightless.
  • ConvLayer: A generic N+1 (channels) dimensional convolutional layer
  • GroupedLayer: To stack several layers into a single layer, e.g. for multi-branches networks
  • NeuralNetworkEstimator: Build the chained network and define a cost function

Each layer can use a default activation function, one of the functions provided in the Utils module (relu, tanh, softmax,...) or one provided by you. BetaML will try to recognise whether it is a "known" function for which it sets the exact derivative; otherwise you can provide the derivative to the layer yourself. If the derivative of the activation function is not provided (either manually or automatically), AD will be used and training may be slower, although this difference tends to vanish with bigger datasets.

Alternatively, you can implement your own layer by defining a new type as a subtype of the abstract type AbstractLayer. Each user-implemented layer must define the following methods:

  • A suitable constructor
  • forward(layer,x)
  • backward(layer,x,next_gradient)
  • get_params(layer)
  • get_gradient(layer,x,next_gradient)
  • set_params!(layer,w)
  • size(layer)

Model fitting:

  • fit!(nn,X,Y): fitting function
  • fitting_info(nn): Default callback function during fitting
  • SGD: The classical optimisation algorithm
  • ADAM: A faster moment-based optimisation algorithm

To define your own optimisation algorithm, define a subtype of OptimisationAlgorithm and implement the function single_update!(θ,▽;opt_alg) and, optionally, init_optalg!(⋅) specific to it.

Model predictions and assessment:

  • predict(nn) or predict(nn,X): Return the output given the data

While high-level functions operating on the dataset expect it to be in the standard format (nrecords × ndimensions matrices) it is customary to represent the chain of a neural network as a flow of column vectors, so all low-level operations (operating on a single datapoint) expect both the input and the output as a column vector.

source

Module Index

Detailed API

BetaML.Nn.ADAMType
ADAM(;η, λ, β₁, β₂, ϵ)

The ADAM algorithm, an adaptive moment estimation optimiser.

Fields:

  • η: Learning rate (stepsize, α in the paper), as a function of the current epoch [def: t -> 0.001 (i.e. fixed)]
  • λ: Multiplicative constant to the learning rate [def: 1]
  • β₁: Exponential decay rate for the first moment estimate [range: ∈ [0,1], def: 0.9]
  • β₂: Exponential decay rate for the second moment estimate [range: ∈ [0,1], def: 0.999]
  • ϵ: Epsilon value to avoid division by zero [def: 10^-8]
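To make the role of the fields above concrete, here is a minimal sketch of a single ADAM update written in plain Julia. It does not use BetaML's internals: adam_step! is a hypothetical helper, and η is taken as a fixed scalar for simplicity, whereas the field above is a function of the epoch.

```julia
# One ADAM step on a parameter vector θ given its gradient g.
# m and v hold the running first and second moment estimates; t is the step counter (from 1).
function adam_step!(θ, g, m, v, t; η = 0.001, λ = 1.0, β₁ = 0.9, β₂ = 0.999, ϵ = 1e-8)
    @. m = β₁ * m + (1 - β₁) * g       # biased first moment estimate
    @. v = β₂ * v + (1 - β₂) * g^2     # biased second moment estimate
    m_hat = m ./ (1 - β₁^t)            # bias-corrected first moment
    v_hat = v ./ (1 - β₂^t)            # bias-corrected second moment
    @. θ -= λ * η * m_hat / (sqrt(v_hat) + ϵ)
    return θ
end

# Minimise f(θ) = Σ θᵢ², whose gradient is 2θ
θ = [1.0, -2.0]
m = zeros(2)
v = zeros(2)
for t in 1:5000
    adam_step!(θ, 2 .* θ, m, v, t; η = 0.01)
end
```

The bias correction matters mostly in the first steps, when m and v are still close to their zero initial values.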
source
BetaML.Nn.ConvLayerType
struct ConvLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

A generic N+1 (channels) dimensional convolutional layer

EXPERIMENTAL: Still too slow for practical applications

This convolutional layer has two constructors, one with the form ConvLayer(input_size,kernel_size,nchannels_in,nchannels_out), and an alternative one as ConvLayer(input_size_with_channel,kernel_size,nchannels_out). If the input is a vector, use a ReshaperLayer in front.

Fields:

  • input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)

  • output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)

  • weight::Array{WET, NDPLUS2} where {NDPLUS2, WET<:Number}: Weight tensor (aka "filter" or "kernel") with respect to the input from previous layer or data (kernel_size array augmented by the nchannels_in and nchannels_out dimensions)

  • usebias::Bool: Whether to use (and learn) a bias weight [def: true]

  • bias::Vector{WET} where WET<:Number: Bias (nchannels_out array)

  • padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)

  • padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)

  • stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride

  • ndims::Int64: Number of dimensions (excluding input and output channels)

  • f::Function: Activation function

  • df::Union{Nothing, Function}: Derivative of the activation function

  • x_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: x ids of the convolution (computed in preprocessing, itself at the beginning of train)

  • y_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: y ids of the convolution (computed in preprocessing, itself at the beginning of train)

  • w_ids::Array{StaticArraysCore.SVector{NDPLUS2, Int64}, 1} where NDPLUS2: w ids of the convolution (computed in preprocessing, itself at the beginning of train)

  • y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y

  • y_to_w_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS2}}, 1}, NDPLUS1} where {NDPLUS1, NDPLUS2}: A y-dims array of vectors of corresponding w(s) contributing to the given y

source
BetaML.Nn.ConvLayerMethod
ConvLayer(
     input_size,
     kernel_size,
     nchannels_in,
@@ -18,7 +18,7 @@
     f,
     df
 ) -> ConvLayer{_A, _B, _C, typeof(identity), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}
-

Instantiate a new nD-dimensional, possibly multichannel ConvolutionalLayer

The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding, but always has nchannels_out as its last dimension.

Positional arguments:

  • input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
  • kernel_size: Size of the kernel (aka filter or learnable weights) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
  • nchannels_in: Number of channels in input
  • nchannels_out: Number of channels in output

Keyword arguments:

  • stride: "Steps" to move the convolution with across the various tensor dimensions [def: ones]
  • padding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep the same dimensions in output (with stride==1)]
  • f: Activation function [def: relu]
  • df: Derivative of the activation function [default: try to match a known function, AD otherwise. Use nothing to force AD]
  • kernel_eltype: Kernel eltype [def: Float64]
  • kernel_init: Initial weights with respect to the input [default: Xavier initialisation]. If explicitly provided, it should be a multidimensional array of kernel_size augmented by nchannels_in and nchannels_out dimensions
  • bias_init: Initial weights with respect to the bias [default: Xavier initialisation]. If given, it should be a nchannels_out vector of scalars.
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization is sampled from a Uniform distribution between ± sqrt(6/(prod(input_size)*nchannels_in))
  • to retrieve the output size of the layer, use size(layer)[2], where layer is the ConvLayer instance. The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
  • with strides higher than 1, the automatic padding is set to keep output size = input size / stride
source
BetaML.Nn.ConvLayerMethod
ConvLayer(
+

Instantiate a new nD-dimensional, possibly multichannel ConvolutionalLayer

The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding, but always has nchannels_out as its last dimension.

Positional arguments:

  • input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
  • kernel_size: Size of the kernel (aka filter or learnable weights) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
  • nchannels_in: Number of channels in input
  • nchannels_out: Number of channels in output

Keyword arguments:

  • stride: "Steps" to move the convolution with across the various tensor dimensions [def: ones]
  • padding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep the same dimensions in output (with stride==1)]
  • f: Activation function [def: relu]
  • df: Derivative of the activation function [default: try to match a known function, AD otherwise. Use nothing to force AD]
  • kernel_eltype: Kernel eltype [def: Float64]
  • kernel_init: Initial weights with respect to the input [default: Xavier initialisation]. If explicitly provided, it should be a multidimensional array of kernel_size augmented by nchannels_in and nchannels_out dimensions
  • bias_init: Initial weights with respect to the bias [default: Xavier initialisation]. If given, it should be a nchannels_out vector of scalars.
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization is sampled from a Uniform distribution between ± sqrt(6/(prod(input_size)*nchannels_in))
  • to retrieve the output size of the layer, use size(layer)[2], where layer is the ConvLayer instance. The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
  • with strides higher than 1, the automatic padding is set to keep output size = input size / stride
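The output-size formula in the notes above can be checked with a small standalone sketch. The helper conv_output_size is hypothetical and not part of BetaML; it assumes the same padding at the start and at the end of each dimension, while the layer itself stores padding_start and padding_end separately.

```julia
# Output size along each spatial dimension, following the formula stated in the notes:
# output_size[d] = ceil(1 + (input_size[d] + 2*padding[d] - kernel_size[d]) / stride[d])
# Works both for integers (1D convolution) and for tuples (nD convolution).
function conv_output_size(input_size, kernel_size;
                          padding = zero.(input_size), stride = one.(input_size))
    return ceil.(Int, 1 .+ (input_size .+ 2 .* padding .- kernel_size) ./ stride)
end

same_size = conv_output_size((28, 28), (3, 3); padding = (1, 1))  # 3×3 kernel, padding 1, stride 1
strided   = conv_output_size(10, 4; stride = 2)                   # 1D input, no padding, stride 2
```

The channel dimension is not covered by this helper: as stated above, the last dimension of the output is always nchannels_out.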
source
BetaML.Nn.ConvLayerMethod
ConvLayer(
     input_size_with_channel,
     kernel_size,
     nchannels_out;
@@ -32,7 +32,7 @@
     f,
     df
 ) -> ConvLayer{_A, _B, _C, typeof(identity), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}
-

Alternative constructor for a ConvLayer where the number of input channels is specified as a further dimension in the input size instead of as a separate parameter, so that size(previous_layer)[2] can be used directly if one wishes.

For arguments and default values see the documentation of the main constructor.

source
BetaML.Nn.DenseLayerType
struct DenseLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a layer in the network

Fields:

  • w: Weights matrix with respect to the input from previous layer or data (n × n of the previous layer)
  • wb: Biases (n)
  • f: Activation function
  • df: Derivative of the activation function
source
BetaML.Nn.DenseLayerMethod
DenseLayer(
+

Alternative constructor for a ConvLayer where the number of input channels is specified as a further dimension in the input size instead of as a separate parameter, so that size(previous_layer)[2] can be used directly if one wishes.

For arguments and default values see the documentation of the main constructor.

source
BetaML.Nn.DenseLayerType
struct DenseLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a layer in the network

Fields:

  • w: Weights matrix with respect to the input from previous layer or data (n × n of the previous layer)
  • wb: Biases (n)
  • f: Activation function
  • df: Derivative of the activation function
source
BetaML.Nn.DenseLayerMethod
DenseLayer(
     nₗ,
     n;
     rng,
@@ -42,7 +42,7 @@
     f,
     df
 ) -> DenseLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}
-

Instantiate a new DenseLayer

Positional arguments:

  • nₗ: Number of nodes of the previous layer
  • n: Number of nodes

Keyword arguments:

  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (n,nₗ)]
  • wb: Initial weights with respect to bias [default: Xavier initialisation, dims = (n)]
  • f: Activation function [def: identity]
  • df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
  • Specify df=nothing to explicitly use AD
source
BetaML.Nn.DenseNoBiasLayerType
struct DenseNoBiasLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a layer without bias in the network

Fields:

  • w: Weights matrix with respect to the input from previous layer or data (n × n of the previous layer)
  • f: Activation function
  • df: Derivative of the activation function
source
BetaML.Nn.DenseNoBiasLayerMethod
DenseNoBiasLayer(
+

Instantiate a new DenseLayer

Positional arguments:

  • nₗ: Number of nodes of the previous layer
  • n: Number of nodes

Keyword arguments:

  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (n,nₗ)]
  • wb: Initial weights with respect to bias [default: Xavier initialisation, dims = (n)]
  • f: Activation function [def: identity]
  • df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
  • Specify df=nothing to explicitly use AD
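The Xavier initialisation in the notes above can be sketched without any external package. The helper xavier_uniform below is hypothetical and uses only the Random standard library; it samples with rand and rescales, rather than calling Uniform from a distributions package as the note does.

```julia
using Random  # standard library

# Sample an (n × nₗ) weight matrix uniformly within ±b, with b = sqrt(6)/sqrt(nₗ + n)
function xavier_uniform(rng::AbstractRNG, nₗ::Integer, n::Integer)
    b = sqrt(6) / sqrt(nₗ + n)
    return (2 .* rand(rng, n, nₗ) .- 1) .* b   # rescale [0,1) samples to [-b, b)
end

# 4 nodes in the previous layer, 3 nodes in this layer
w = xavier_uniform(MersenneTwister(123), 4, 3)
```

Passing an explicitly seeded generator, as done here, keeps the initial weights reproducible across runs, which is the purpose of the rng keyword documented above.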
source
BetaML.Nn.DenseNoBiasLayerType
struct DenseNoBiasLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a layer without bias in the network

Fields:

  • w: Weights matrix with respect to the input from the previous layer or data (n x number of nodes of the previous layer)
  • f: Activation function
  • df: Derivative of the activation function
source
BetaML.Nn.DenseNoBiasLayerMethod
DenseNoBiasLayer(
     nₗ,
     n;
     rng,
@@ -51,8 +51,8 @@
     f,
     df
 ) -> DenseNoBiasLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}
-

Instantiate a new DenseNoBiasLayer

Positional arguments:

  • nₗ: Number of nodes of the previous layer
  • n: Number of nodes

Keyword arguments:

  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: identity]
  • df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
source
BetaML.Nn.GroupedLayerType
struct GroupedLayer <: AbstractLayer

Representation of a "group" of layers, each of which operates on different inputs (features) and acting as a single layer in the network.

Fields:

  • layers: The individual layers that compose this grouped layer
source
BetaML.Nn.GroupedLayerMethod
GroupedLayer(layers) -> GroupedLayer
-

Instantiate a new GroupedLayer, a layer made up of several other layers stacked together in order to cover all the data dimensions, but without connecting all the inputs to all the outputs as a single DenseLayer would do.

Positional arguments:

  • layers: The individual layers that compose this grouped layer

Notes:

  • can be used to create composable neural networks with multiple branches
  • tested only with 1 dimensional layers. For convolutional networks use ReshaperLayers before and/or after.
source
BetaML.Nn.LearnableType

Learnable(data)

Structure representing the learnable parameters of a layer or its gradient.

The learnable parameters of a layer are given in the form of an N-tuple of Array{Float64,N2}, where N2 can change (e.g. we can have a layer with the first parameter being a matrix, and the second one being a scalar). We wrap the tuple in its own structure partly for some efficiency gain, but above all to define standard mathematical operations on the gradients without doing "type piracy" with respect to Base tuples.

source
BetaML.Nn.NeuralNetworkE_hpType

mutable struct NeuralNetworkE_hp <: BetaMLHyperParametersSet

Hyperparameters for the Feedforward neural network model

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.

  • dloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To run automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and, if needed, change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

To list the available layers, type subtypes(AbstractLayer), then type ?LayerName for information on how to use each layer.

source
BetaML.Nn.NeuralNetworkE_optionsType

NeuralNetworkE_options

A struct defining the options used by the Feedforward neural network model

Parameters:

  • cache: Cache the results of the fitting stage, as to allow predict(mod) [default: true]. Set it to false to save memory for large data.

  • descr: An optional title and/or description for this model

  • verbosity: The verbosity level to be used in training or prediction (see Verbosity) [default: STD]

  • cb: A callback function to provide information during training [def: fitting_info]

  • autotune: Option for hyper-parameter autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control autotuning through the option tunemethod (see the model hyper-parameters)

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

source
BetaML.Nn.NeuralNetworkEstimatorType

NeuralNetworkEstimator

A "feedforward" (but also multi-branch) neural network (supervised).

For the parameters see NeuralNetworkE_hp and for the training options NeuralNetworkE_options (we have a few more options for this specific estimator).

Notes:

  • data must be numerical
  • the label can be a n-records vector or a n-records by n-dimensions matrix, but the result is always a matrix.
    • For one-dimension regressions drop the unnecessary dimension with dropdims(ŷ,dims=2)
    • For classification tasks the columns should normally be interpreted as the probabilities for each category

Examples:

  • Classification...
julia> using BetaML
+

Instantiate a new DenseNoBiasLayer

Positional arguments:

  • nₗ: Number of nodes of the previous layer
  • n: Number of nodes

Keyword arguments:

  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: identity]
  • df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
source
BetaML.Nn.GroupedLayerType
struct GroupedLayer <: AbstractLayer

Representation of a "group" of layers, each of which operates on different inputs (features) and acting as a single layer in the network.

Fields:

  • layers: The individual layers that compose this grouped layer
source
BetaML.Nn.GroupedLayerMethod
GroupedLayer(layers) -> GroupedLayer
+

Instantiate a new GroupedLayer, a layer made up of several other layers stacked together in order to cover all the data dimensions, but without connecting all the inputs to all the outputs as a single DenseLayer would do.

Positional arguments:

  • layers: The individual layers that compose this grouped layer

Notes:

  • can be used to create composable neural networks with multiple branches
  • tested only with 1 dimensional layers. For convolutional networks use ReshaperLayers before and/or after.
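
The idea of branches that each see only part of the input can be sketched with plain matrices. This is a conceptual illustration only, not BetaML code: `W1`, `W2` and the input split are made-up values, and biases and activations are left out.

```julia
# Two hypothetical branches acting on different slices of a 5-feature input
W1 = [1.0 2.0]                     # branch 1: features 1-2 -> 1 output
W2 = [1.0 0.0 1.0; 0.0 1.0 1.0]    # branch 2: features 3-5 -> 2 outputs
x  = [1.0, 2.0, 3.0, 4.0, 5.0]

# Grouped forward pass: each branch sees only its own features
y = vcat(W1 * x[1:2], W2 * x[3:5])

# The same map written as one dense matrix is block-diagonal: the zero
# blocks are the input/output connections that the grouped layer omits
W_block = [W1 zeros(1, 3); zeros(2, 2) W2]
```

Here `W_block * x` gives the same result as `y`, which is why a grouped layer behaves as a single layer while using fewer parameters than a fully connected one.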
source
BetaML.Nn.LearnableType

Learnable(data)

Structure representing the learnable parameters of a layer or its gradient.

The learnable parameters of a layer are given in the form of an N-tuple of Array{Float64,N2}, where N2 can change (e.g. we can have a layer with the first parameter being a matrix, and the second one being a scalar). We wrap the tuple in its own structure partly for some efficiency gain, but above all to define standard mathematical operations on the gradients without doing "type piracy" with respect to Base tuples.
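
A minimal sketch of the wrapping idea, assuming a hypothetical `ParamBundle` type (this is not BetaML's actual `Learnable` definition): arithmetic is defined on the wrapper rather than on `Tuple`, so no methods are added to types owned by Base.

```julia
import Base: +, *

# Hypothetical stand-in for the wrapper idea
struct ParamBundle{T<:Tuple}
    data::T
end

# Operations are defined on our own type, which avoids type piracy on Tuple
+(a::ParamBundle, b::ParamBundle) = ParamBundle(map(+, a.data, b.data))
*(s::Number, a::ParamBundle) = ParamBundle(map(x -> s .* x, a.data))

g1 = ParamBundle(([1.0 2.0; 3.0 4.0], [0.5]))   # e.g. gradients of (weights, bias)
g2 = ParamBundle(([1.0 1.0; 1.0 1.0], [0.5]))
update = 0.1 * (g1 + g2)                        # scaled sum of two gradients
```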

source
BetaML.Nn.NeuralNetworkE_hpType

mutable struct NeuralNetworkE_hp <: BetaMLHyperParametersSet

Hyperparameters for the Feedforward neural network model

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.

  • dloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To run automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and, if needed, change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

To list the available layers, type subtypes(AbstractLayer), then type ?LayerName for information on how to use each layer.

source
BetaML.Nn.NeuralNetworkE_optionsType

NeuralNetworkE_options

A struct defining the options used by the Feedforward neural network model

Parameters:

  • cache: Cache the results of the fitting stage, as to allow predict(mod) [default: true]. Set it to false to save memory for large data.

  • descr: An optional title and/or description for this model

  • verbosity: The verbosity level to be used in training or prediction (see Verbosity) [default: STD]

  • cb: A callback function to provide information during training [def: fitting_info]

  • autotune: Option for hyper-parameter autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control autotuning through the option tunemethod (see the model hyper-parameters)

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

source
BetaML.Nn.NeuralNetworkEstimatorType

NeuralNetworkEstimator

A "feedforward" (but also multi-branch) neural network (supervised).

For the parameters see NeuralNetworkE_hp and for the training options NeuralNetworkE_options (we have a few more options for this specific estimator).

Notes:

  • data must be numerical
  • the label can be a n-records vector or a n-records by n-dimensions matrix, but the result is always a matrix.
    • For one-dimension regressions drop the unnecessary dimension with dropdims(ŷ,dims=2)
    • For classification tasks the columns should normally be interpreted as the probabilities for each category

Examples:

  • Classification...
julia> using BetaML
 
 julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
 
@@ -119,7 +119,7 @@
  -13.8  -13.8381
  -18.4  -18.3876
  -27.2  -27.1667
-   2.7    2.70542
source
BetaML.Nn.PoolingLayerType
struct PoolingLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a pooling layer in the network (weightless)

EXPERIMENTAL: Still too slow for practical applications

In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel.

Fields:

  • input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)

  • output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)

  • kernel_size::StaticArraysCore.SVector{NDPLUS2, Int64} where NDPLUS2: kernel_size augmented by the nchannels_in and nchannels_out dimensions

  • padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)

  • padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)

  • stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride

  • ndims::Int64: Number of dimensions (excluding input and output channels)

  • f::Function: Activation function

  • df::Union{Nothing, Function}: Derivative of the activation function

  • y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y

source
BetaML.Nn.PoolingLayerType
struct PoolingLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a pooling layer in the network (weightless)

EXPERIMENTAL: Still too slow for practical applications

In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel.

Fields:

  • input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)

  • output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)

  • kernel_size::StaticArraysCore.SVector{NDPLUS2, Int64} where NDPLUS2: kernel_size augmented by the nchannels_in and nchannels_out dimensions

  • padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)

  • padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)

  • stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride

  • ndims::Int64: Number of dimensions (excluding input and output channels)

  • f::Function: Activation function

  • df::Union{Nothing, Function}: Derivative of the activation function

  • y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y

source
BetaML.Nn.PoolingLayerMethod
PoolingLayer(
     input_size,
     kernel_size,
     nchannels_in;
@@ -129,7 +129,7 @@
     f,
     df
 ) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}
-

Instantiate a new nD-dimensional, possibly multichannel PoolingLayer

The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding, but always has nchannels_out as its last dimension.

Positional arguments:

  • input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
  • kernel_eltype: Kernel eltype [def: Float64]
  • kernel_size: Size of the kernel (aka filter) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
  • nchannels_in: Number of channels in input
  • nchannels_out: Number of channels in output

Keyword arguments:

  • stride: "Steps" to move the convolution with across the various tensor dimensions [def: kernel_size, i.e. each X contributes to a single y]
  • padding: Integer or 2-elements tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep outside = inside / stride ]
  • f: Activation function. It should have a vector as input and produce a scalar as output [def: maximum]
  • df: Derivative (gradient) of the activation function for the various inputs. [default: nothing (i.e. use AD)]

Notes:

  • to retrieve the output size of the layer, use size(PoolLayer)[2]. The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
  • differently from a ConvLayer, pooling always applies at the single-channel level, so the output always has the same number of channels as the input. If you want to reduce the number of channels, either use a ConvLayer with the desired number of output channels, or use a ReshaperLayer to add a further 1-element dimension that will be treated as the "channel" and choose the desired stride for the last pooling dimension (the one that was originally the channel dimension)
source
BetaML.Nn.PoolingLayerMethod
PoolingLayer(
+

Instantiate a new nD-dimensional, possibly multichannel PoolingLayer

The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding, but always has nchannels_out as its last dimension.

Positional arguments:

  • input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
  • kernel_eltype: Kernel eltype [def: Float64]
  • kernel_size: Size of the kernel (aka filter) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
  • nchannels_in: Number of channels in input
  • nchannels_out: Number of channels in output

Keyword arguments:

  • stride: "Steps" to move the convolution with across the various tensor dimensions [def: kernel_size, i.e. each X contributes to a single y]
  • padding: Integer or 2-elements tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep outside = inside / stride ]
  • f: Activation function. It should have a vector as input and produce a scalar as output [def: maximum]
  • df: Derivative (gradient) of the activation function for the various inputs. [default: nothing (i.e. use AD)]

Notes:

  • to retrieve the output size of the layer, use size(PoolLayer)[2]. The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
  • differently from a ConvLayer, pooling always applies at the single-channel level, so the output always has the same number of channels as the input. If you want to reduce the number of channels, either use a ConvLayer with the desired number of output channels, or use a ReshaperLayer to add a further 1-element dimension that will be treated as the "channel" and choose the desired stride for the last pooling dimension (the one that was originally the channel dimension)
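
The output-size formula in the notes can be checked with a small helper. This is a sketch that simply transcribes the documented formula: `pool_output_size` is a hypothetical name, and it assumes the same padding on both sides of the dimension.

```julia
# Hypothetical helper: output size along one dimension, following the
# documented formula (rounded up), with symmetric padding assumed
pool_output_size(input, kernel, stride, padding) =
    ceil(Int, 1 + (input + 2 * padding - kernel) / stride)

pool_output_size(28, 2, 2, 0)   # 1 + 26/2 = 14
pool_output_size(7, 2, 2, 0)    # 1 + 5/2 = 3.5, rounded up to 4
```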
source
BetaML.Nn.PoolingLayerMethod
PoolingLayer(
     input_size_with_channel,
     kernel_size;
     stride,
@@ -138,14 +138,14 @@
     kernel_eltype,
     df
 ) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}
-

Alternative constructor for a PoolingLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so as to use size(previous_layer)[2] if one wishes.

For arguments and default values see the documentation of the main constructor.

source
BetaML.Nn.ReshaperLayerType
struct ReshaperLayer{NDIN, NDOUT} <: AbstractLayer

Representation of a "reshaper" (weightless) layer in the network

Reshape the output of a layer (or the input data) to the shape needed for the next one.

Fields:

  • input_size::StaticArraysCore.SVector{NDIN, Int64} where NDIN: Input size

  • output_size::StaticArraysCore.SVector{NDOUT, Int64} where NDOUT: Output size

source
BetaML.Nn.ReshaperLayerType
ReshaperLayer(
+

Alternative constructor for a PoolingLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so as to use size(previous_layer)[2] if one wishes.

For arguments and default values see the documentation of the main constructor.

source
BetaML.Nn.ReshaperLayerType
struct ReshaperLayer{NDIN, NDOUT} <: AbstractLayer

Representation of a "reshaper" (weightless) layer in the network

Reshape the output of a layer (or the input data) to the shape needed for the next one.

Fields:

  • input_size::StaticArraysCore.SVector{NDIN, Int64} where NDIN: Input size

  • output_size::StaticArraysCore.SVector{NDOUT, Int64} where NDOUT: Output size

source
BetaML.Nn.ReshaperLayerType
ReshaperLayer(
     input_size
 ) -> ReshaperLayer{_A, _B} where {_A, _B}
 ReshaperLayer(
     input_size,
     output_size
 ) -> ReshaperLayer{_A, _B} where {_A, _B}
-

Instantiate a new ReshaperLayer

Positional arguments:

  • input_size: Shape of the input layer (tuple).
  • output_size: Shape of the output layer (tuple) [def: prod([input_size...]), i.e. reshape to a vector of appropriate length].
source
BetaML.Nn.SGDType
SGD(;η=t -> 1/(1+t), λ=2)

Stochastic Gradient Descent algorithm (default)

Fields:

  • η: Learning rate, as a function of the current epoch [def: t -> 1/(1+t)]
  • λ: Multiplicative constant to the learning rate [def: 2]
source
BetaML.Nn.ScalarFunctionLayerType
struct ScalarFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a ScalarFunction layer in the network. ScalarFunctionLayer applies the activation function directly to the output of the previous layer (i.e., without passing through a weight matrix), but using an optional learnable parameter (an array) as second argument, similarly to VectorFunctionLayer. Differently from VectorFunctionLayer, the function is applied scalarwise to each node.

The number of nodes in input must be set to the same as in the previous layer

Fields:

  • w: Weights (parameter) array passed as second argument to the activation function (if not empty)
  • n: Number of nodes in output (≡ number of nodes in input )
  • f: Activation function (vector)
  • dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
  • dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)

Notes:

  • The output size of this layer is the same as that of the previous layer.
source
BetaML.Nn.ScalarFunctionLayerMethod
ScalarFunctionLayer(
+

Instantiate a new ReshaperLayer

Positional arguments:

  • input_size: Shape of the input layer (tuple).
  • output_size: Shape of the output layer (tuple) [def: prod([input_size...]), i.e. reshape to a vector of appropriate length].
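
The default output size amounts to flattening the input. A small sketch in plain Julia (the sizes are made up for illustration and no BetaML call is involved):

```julia
input_size = (4, 3, 2)                        # hypothetical input shape
default_output_size = prod([input_size...])   # 4 * 3 * 2 = 24

x = reshape(collect(1.0:24.0), input_size)    # some data with the input shape
flat = reshape(x, default_output_size)        # a vector with 24 elements
```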
source
BetaML.Nn.SGDType
SGD(;η=t -> 1/(1+t), λ=2)

Stochastic Gradient Descent algorithm (default)

Fields:

  • η: Learning rate, as a function of the current epoch [def: t -> 1/(1+t)]
  • λ: Multiplicative constant to the learning rate [def: 2]
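
With the defaults above, the step size shrinks as the epochs advance. The sketch below assumes, as the description of λ suggests, that the effective rate at epoch t is λ multiplied by η(t); `effective_rate` is a hypothetical name used only here.

```julia
η(t) = 1 / (1 + t)             # default learning-rate schedule
λ = 2                          # default multiplicative constant

# Assumption: the effective step size at epoch t is λ * η(t)
effective_rate(t) = λ * η(t)

rates = [effective_rate(t) for t in 1:4]   # decreasing sequence, starting at 1.0
```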
source
BetaML.Nn.ScalarFunctionLayerType
struct ScalarFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a ScalarFunction layer in the network. ScalarFunctionLayer applies the activation function directly to the output of the previous layer (i.e., without passing through a weight matrix), but using an optional learnable parameter (an array) as second argument, similarly to VectorFunctionLayer. Differently from VectorFunctionLayer, the function is applied scalarwise to each node.

The number of nodes in input must be set to the same as in the previous layer

Fields:

  • w: Weights (parameter) array passed as second argument to the activation function (if not empty)
  • n: Number of nodes in output (≡ number of nodes in input )
  • f: Activation function (vector)
  • dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
  • dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)

Notes:

  • The output size of this layer is the same as that of the previous layer.
source
BetaML.Nn.ScalarFunctionLayerMethod
ScalarFunctionLayer(
     nₗ;
     rng,
     wsize,
@@ -155,7 +155,7 @@
     dfx,
     dfw
 ) -> ScalarFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}
-

Instantiate a new ScalarFunctionLayer

Positional arguments:

  • nₗ: Number of nodes (must be same as in the previous layer)

Keyword arguments:

  • wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: softmax]
  • dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
source
BetaML.Nn.VectorFunctionLayerType
struct VectorFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a VectorFunction layer in the network. Vector function layer expects a vector activation function, i.e. a function taking the whole output of the previous layer as input rather than working on a single node as "normal" activation functions would do. Useful for example with the SoftMax function in classification or with the pool1D function to implement a "pool" layer in 1 dimension. By default it is weightless, i.e. it doesn't apply any transformation to the output coming from the previous layer except the activation function. However, by passing the parameter wsize (a tuple or array - tested only 1D) you can pass the learnable parameter to the activation function too. It is your responsibility to be sure the activation function accepts only X or also this learnable array (as second argument). The number of nodes in input must be set to the same as in the previous layer (and if you are using this for classification, to the number of classes, i.e. the previous layer must be set equal to the number of classes in the predictions).

Fields:

  • w: Weights (parameter) array passed as second argument to the activation function (if not empty)
  • nₗ: Number of nodes in input (i.e. length of previous layer)
  • n: Number of nodes in output (automatically inferred in the constructor)
  • f: Activation function (vector)
  • dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
  • dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)

Notes:

  • The output size of this layer is given by the output size of the activation function, which is not necessarily the same as that of the previous layer.

source
BetaML.Nn.VectorFunctionLayerMethod
VectorFunctionLayer(
+

Instantiate a new ScalarFunctionLayer

Positional arguments:

  • nₗ: Number of nodes (must be same as in the previous layer)

Keyword arguments:

  • wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: softmax]
  • dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
source
BetaML.Nn.VectorFunctionLayerType
struct VectorFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a VectorFunction layer in the network. Vector function layer expects a vector activation function, i.e. a function taking the whole output of the previous layer as input rather than working on a single node as "normal" activation functions would do. Useful for example with the SoftMax function in classification or with the pool1D function to implement a "pool" layer in 1 dimension. By default it is weightless, i.e. it doesn't apply any transformation to the output coming from the previous layer except the activation function. However, by passing the parameter wsize (a tuple or array - tested only 1D) you can pass the learnable parameter to the activation function too. It is your responsibility to be sure the activation function accepts only X or also this learnable array (as second argument). The number of nodes in input must be set to the same as in the previous layer (and if you are using this for classification, to the number of classes, i.e. the previous layer must be set equal to the number of classes in the predictions).

Fields:

  • w: Weights (parameter) array passed as second argument to the activation function (if not empty)
  • nₗ: Number of nodes in input (i.e. length of previous layer)
  • n: Number of nodes in output (automatically inferred in the constructor)
  • f: Activation function (vector)
  • dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
  • dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)

Notes:

  • The output size of this layer is given by the output size of the activation function, which is not necessarily the same as that of the previous layer.

source
BetaML.Nn.VectorFunctionLayerMethod
VectorFunctionLayer(
     nₗ;
     rng,
     wsize,
@@ -166,20 +166,20 @@
     dfw,
     dummyDataToTestOutputSize
 ) -> VectorFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}
-

Instantiate a new VectorFunctionLayer

Positional arguments:

  • nₗ: Number of nodes (must be same as in the previous layer)

Keyword arguments:

  • wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: softmax]
  • dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
  • dummyDataToTestOutputSize: Dummy data to test the output size [def: ones(nₗ)]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
  • To avoid recomputing the activation function just to determine its output size, we compute the output size once here in the layer constructor by calling the activation function with dummyDataToTestOutputSize. Feel free to change it if it doesn't match with the activation function you are setting
  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
source
Base.sizeMethod
size(layer)

Get the size of the layers in terms of (size in input, size in output) - both as tuples

Notes:

  • You need to use import Base.size before defining this function for your layer
source
Base.sizeMethod
size(layer::ConvLayer) -> Tuple{Tuple, Tuple}
-

Get the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension

source
Base.sizeMethod
size(
+

Instantiate a new VectorFunctionLayer

Positional arguments:

  • nₗ: Number of nodes (must be same as in the previous layer)

Keyword arguments:

  • wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: softmax]
  • dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
  • dummyDataToTestOutputSize: Dummy data to test the output size [def: ones(nₗ)]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
  • To avoid recomputing the activation function just to determine its output size, we compute the output size once here in the layer constructor by calling the activation function with dummyDataToTestOutputSize. Feel free to change it if it doesn't match with the activation function you are setting
  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
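
The Xavier formula in the note above can be sketched in plain Julia. This is only an illustration, not BetaML's code: the helper name `xavier_init_sketch` is hypothetical, `sum(wsize...)` is read here as the sum of the weight dimensions, and a rescaled `rand` stands in for `rand(Uniform(-b, b))` to avoid a dependency on Distributions.jl.

```julia
using Random

# Hypothetical helper (not part of BetaML): draw weights uniformly in [-b, b)
# with b = sqrt(6)/sqrt(sum(wsize)), following the formula in the note.
function xavier_init_sketch(rng::AbstractRNG, wsize::Tuple{Vararg{Int}}; T=Float64)
    bound = sqrt(6) / sqrt(sum(wsize))
    # rand(rng, dims...) is uniform in [0, 1); rescale it to [-bound, bound)
    return T.((rand(rng, wsize...) .* 2 .- 1) .* bound)
end

w = xavier_init_sketch(MersenneTwister(123), (3, 4))   # 3×4 matrix of weights
```

Passing an explicit `rng` mirrors the `rng` keyword documented above and keeps the draw reproducible.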
source
Base.sizeMethod
size(layer)

Get the size of the layers in terms of (size in input, size in output) - both as tuples

Notes:

  • You need to use import Base.size before defining this function for your layer
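
As a minimal sketch of the note above, the snippet below adds a `size` method for a custom layer. The type `MyDenseLayerSketch` and its field are hypothetical and only serve to show why the `import Base.size` line is needed; a real BetaML layer would subtype `AbstractLayer`.

```julia
import Base.size   # extend Base.size instead of shadowing it with a new function

# Hypothetical layer type, used only for illustration
struct MyDenseLayerSketch
    w::Matrix{Float64}   # weights, dims (n_out, n_in)
end

# Return (size in input, size in output), both as tuples
size(layer::MyDenseLayerSketch) = ((size(layer.w, 2),), (size(layer.w, 1),))

layer = MyDenseLayerSketch(zeros(5, 3))
layer_size = size(layer)   # ((3,), (5,))
```

Without the import, the definition would create a new local `size` function and the array method used inside it would no longer be reachable.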
source
Base.sizeMethod
size(layer::ConvLayer) -> Tuple{Tuple, Tuple}
+

Get the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension

source
Base.sizeMethod
size(
     layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}
 ) -> Tuple{Tuple, Tuple}
-

Get the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension

source
BetaML.Nn.ReplicatorLayerMethod
ReplicatorLayer(
+

Get the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension

source
BetaML.Nn.ReplicatorLayerMethod
ReplicatorLayer(
     n
 ) -> ScalarFunctionLayer{_A, typeof(identity), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}
-

Create a weightless layer whose output is equal to the input.

Fields:

  • n: Number of nodes in output (≡ number of nodes in input)

Notes:

  • The output size of this layer is the same as that of the previous layer.
  • This is just an alias for a ScalarFunctionLayer with no weights and the identity function.
source
BetaML.Nn.backwardMethod
backward(layer,x,next_gradient)

Compute backpropagation for this layer with respect to its inputs

Parameters:

  • layer: Worker layer
  • x: Input to the layer
  • next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)

Return:

  • The evaluated gradient of the loss with respect to this layer inputs
source
BetaML.Nn.fitting_infoMethod

fitting_info(nn,xbatch,ybatch,x,y;n,batch_size,epochs,epochs_ran,verbosity,n_epoch,n_batch)

Default callback function to display information during training, depending on the verbosity level

Parameters:

  • nn: Worker network
  • xbatch: Batch input to the network (batch_size,din)
  • ybatch: Batch label input (batch_size,dout)
  • x: Full input to the network (n_records,din)
  • y: Full label input (n_records,dout)
  • n: Size of the full training set
  • n_batches: Number of batches per epoch
  • epochs: Number of epochs defined for the training
  • epochs_ran: Number of epochs already run in previous training sessions
  • verbosity: Verbosity level defined for the training (NONE,LOW,STD,HIGH,FULL)
  • n_epoch: Counter of the current epoch
  • n_batch: Counter of the current batch

Notes:

  • Reporting of the error (loss of the network) is expensive. Use verbosity=NONE for better performance
source
BetaML.Nn.forwardMethod
forward(layer,x)

Predict the output of the layer given the input

Parameters:

  • layer: Worker layer
  • x: Input to the layer

Return:

  • An Array{T,1} of the prediction (even for a scalar)
source
BetaML.Nn.forwardMethod
forward(
+

Create a weightless layer whose output is equal to the input.

Fields:

  • n: Number of nodes in output (≡ number of nodes in input)

Notes:

  • The output size of this layer is the same as that of the previous layer.
  • This is just an alias for a ScalarFunctionLayer with no weights and the identity function.
source
BetaML.Nn.backwardMethod
backward(layer,x,next_gradient)

Compute backpropagation for this layer with respect to its inputs

Parameters:

  • layer: Worker layer
  • x: Input to the layer
  • next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)

Return:

  • The evaluated gradient of the loss with respect to this layer inputs
source
BetaML.Nn.fitting_infoMethod

fitting_info(nn,xbatch,ybatch,x,y;n,batch_size,epochs,epochs_ran,verbosity,n_epoch,n_batch)

Default callback function to display information during training, depending on the verbosity level

Parameters:

  • nn: Worker network
  • xbatch: Batch input to the network (batch_size,din)
  • ybatch: Batch label input (batch_size,dout)
  • x: Full input to the network (n_records,din)
  • y: Full label input (n_records,dout)
  • n: Size of the full training set
  • n_batches: Number of batches per epoch
  • epochs: Number of epochs defined for the training
  • epochs_ran: Number of epochs already run in previous training sessions
  • verbosity: Verbosity level defined for the training (NONE,LOW,STD,HIGH,FULL)
  • n_epoch: Counter of the current epoch
  • n_batch: Counter of the current batch

Notes:

  • Reporting of the error (loss of the network) is expensive. Use verbosity=NONE for better performance
source
BetaML.Nn.forwardMethod
forward(layer,x)

Predict the output of the layer given the input

Parameters:

  • layer: Worker layer
  • x: Input to the layer

Return:

  • An Array{T,1} of the prediction (even for a scalar)
source
BetaML.Nn.forwardMethod
forward(
     layer::ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},
     x
 ) -> Any
-

Compute forward pass of a ConvLayer

source
BetaML.Nn.forwardMethod
forward(
     layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},
     x
 ) -> Any
-

Compute forward pass of a PoolingLayer

source
BetaML.Nn.get_gradientMethod
get_gradient(layer,x,next_gradient)

Compute backpropagation for this layer with respect to the layer weights

Parameters:

  • layer: Worker layer
  • x: Input to the layer
  • next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)

Return:

  • The evaluated gradient of the loss with respect to this layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_params() and set_params!() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
source
BetaML.Nn.get_gradientMethod

get_gradient(nn,x,y)

Low-level function that retrieves the current gradient of the weights (i.e. the derivative of the cost with respect to the weights). Unexported in BetaML >= v0.9

Parameters:

  • nn: Worker network
  • x: Input to the network (d,1)
  • y: Label input (d,1)

Notes:

  • The output is a vector of tuples of each layer's input weights and bias weights
source
BetaML.Nn.get_paramsMethod
get_params(layer)

Get the current value of the layer's trainable parameters

Parameters:

  • layer: Worker layer

Return:

  • The current value of the layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_gradient() and set_params!() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
source
BetaML.Nn.get_paramsMethod

get_params(nn)

Retrieve the current weights

Parameters:

  • nn: Worker network

Notes:

  • The output is a vector of tuples of each layer's input weights and bias weights
source
BetaML.Nn.init_optalg!Method
init_optalg!(opt_alg::ADAM;θ,batch_size,x,y,rng)

Initialize the ADAM algorithm with the parameters m and v as zeros and check parameter bounds

source
BetaML.Nn.init_optalg!Method

init_optalg!(opt_alg;θ,batch_size,x,y)

Initialize the optimisation algorithm

Parameters:

  • opt_alg: The Optimisation algorithm to use
  • θ: Current parameters
  • batch_size: The size of the batch
  • x: The training (input) data
  • y: The training "labels" to match
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Only a few optimizers need this function and consequently override it. By default it does nothing, so if you want to write your own optimizer and don't need to initialise it, you don't have to override this method
source
BetaML.Nn.preprocess!Method
preprocess!(layer::AbstractLayer)
-

Preprocess the layer with information known at layer creation (i.e. no data info used)

This function is used for some layers to cache some computation that doesn't require the data and it is called at the beginning of fit!. For example, it is used in ConvLayer to store the ids of the convolution.

Notes:

  • as it doesn't depend on data, it is not reset by reset!
source
BetaML.Nn.set_params!Method
set_params!(layer,w)

Set the trainable parameters of the layer with the given values

Parameters:

  • layer: Worker layer
  • w: The new parameters to set (Learnable)

Notes:

  • The format of the tuple wrapped by Learnable must be consistent with those of the get_params() and get_gradient() functions.
source
BetaML.Nn.set_params!Method

set_params!(nn,w)

Update the weights of the network

Parameters:

  • nn: Worker network
  • w: The new weights to set
source
BetaML.Nn.single_update!Method

single_update!(θ,▽;n_epoch,n_batch,batch_size,xbatch,ybatch,opt_alg)

Perform the parameters update based on the average batch gradient.

Parameters:

  • θ: Current parameters
  • ▽: Average gradient of the batch
  • n_epoch: Count of current epoch
  • n_batch: Count of current batch
  • n_batches: Number of batches per epoch
  • xbatch: Data associated to the current batch
  • ybatch: Labels associated to the current batch
  • opt_alg: The Optimisation algorithm to use for the update

Notes:

  • This function is overridden so that each optimisation algorithm implements its own version
  • Most parameters are not used by any given optimisation algorithm. They are provided to support the largest possible class of optimisation algorithms
  • Some optimisation algorithms may change their internal structure in this function
source
+

Compute forward pass of a PoolingLayer

source
BetaML.Nn.get_gradientMethod
get_gradient(layer,x,next_gradient)

Compute backpropagation for this layer with respect to the layer weights

Parameters:

  • layer: Worker layer
  • x: Input to the layer
  • next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)

Return:

  • The evaluated gradient of the loss with respect to this layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_params() and set_params!() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
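
To make the roles of forward, backward and get_gradient concrete, here is a sketch for a bias-free linear layer y = w*x written in plain Julia. All names ending in `_sketch` are hypothetical and do not belong to BetaML; in particular the gradient is returned as a bare tuple, whereas the note above says a real layer must wrap it in Learnable.

```julia
# Hypothetical linear layer y = w*x (no bias, identity activation)
struct LinearLayerSketch
    w::Matrix{Float64}
end

# Forward pass: prediction of the layer given its input
forward_sketch(l::LinearLayerSketch, x) = l.w * x

# Gradient of the loss w.r.t. the layer input: dL/dx = wᵀ * dL/dy
backward_sketch(l::LinearLayerSketch, x, next_gradient) = l.w' * next_gradient

# Gradient of the loss w.r.t. the weights: dL/dw = dL/dy * xᵀ, as a one-element tuple
get_gradient_sketch(l::LinearLayerSketch, x, next_gradient) = (next_gradient * x',)

l  = LinearLayerSketch([1.0 2.0; 3.0 4.0])
x  = [1.0, 1.0]
y  = forward_sketch(l, x)                # [3.0, 7.0]
dy = [1.0, 0.0]                          # gradient arriving from the next layer
dx = backward_sketch(l, x, dy)           # [1.0, 2.0]
dw = get_gradient_sketch(l, x, dy)[1]    # [1.0 1.0; 0.0 0.0]
```

The tuple layout chosen in `get_gradient_sketch` is what a matching get_params and set_params! pair would have to follow.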
source
BetaML.Nn.get_gradientMethod

get_gradient(nn,x,y)

Low-level function that retrieves the current gradient of the weights (i.e. the derivative of the cost with respect to the weights). Unexported in BetaML >= v0.9

Parameters:

  • nn: Worker network
  • x: Input to the network (d,1)
  • y: Label input (d,1)

Notes:

  • The output is a vector of tuples of each layer's input weights and bias weights
source
BetaML.Nn.get_paramsMethod
get_params(layer)

Get the current value of the layer's trainable parameters

Parameters:

  • layer: Worker layer

Return:

  • The current value of the layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_gradient() and set_params!() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
source
BetaML.Nn.get_paramsMethod

get_params(nn)

Retrieve the current weights

Parameters:

  • nn: Worker network

Notes:

  • The output is a vector of tuples of each layer's input weights and bias weights
source
BetaML.Nn.init_optalg!Method
init_optalg!(opt_alg::ADAM;θ,batch_size,x,y,rng)

Initialize the ADAM algorithm with the parameters m and v as zeros and check parameter bounds

source
BetaML.Nn.init_optalg!Method

init_optalg!(opt_alg;θ,batch_size,x,y)

Initialize the optimisation algorithm

Parameters:

  • opt_alg: The Optimisation algorithm to use
  • θ: Current parameters
  • batch_size: The size of the batch
  • x: The training (input) data
  • y: The training "labels" to match
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Only a few optimizers need this function and consequently override it. By default it does nothing, so if you want to write your own optimizer and don't need to initialise it, you don't have to override this method
source
BetaML.Nn.preprocess!Method
preprocess!(layer::AbstractLayer)
+

Preprocess the layer with information known at layer creation (i.e. no data info used)

This function is used for some layers to cache some computation that doesn't require the data and it is called at the beginning of fit!. For example, it is used in ConvLayer to store the ids of the convolution.

Notes:

  • as it doesn't depend on data, it is not reset by reset!
source
BetaML.Nn.set_params!Method
set_params!(layer,w)

Set the trainable parameters of the layer with the given values

Parameters:

  • layer: Worker layer
  • w: The new parameters to set (Learnable)

Notes:

  • The format of the tuple wrapped by Learnable must be consistent with those of the get_params() and get_gradient() functions.
source
BetaML.Nn.set_params!Method

set_params!(nn,w)

Update the weights of the network

Parameters:

  • nn: Worker network
  • w: The new weights to set
source
BetaML.Nn.single_update!Method

single_update!(θ,▽;n_epoch,n_batch,batch_size,xbatch,ybatch,opt_alg)

Perform the parameters update based on the average batch gradient.

Parameters:

  • θ: Current parameters
  • ▽: Average gradient of the batch
  • n_epoch: Count of current epoch
  • n_batch: Count of current batch
  • n_batches: Number of batches per epoch
  • xbatch: Data associated to the current batch
  • ybatch: Labels associated to the current batch
  • opt_alg: The Optimisation algorithm to use for the update

Notes:

  • This function is overridden so that each optimisation algorithm implements its own version
  • Most parameters are not used by any given optimisation algorithm. They are provided to support the largest possible class of optimisation algorithms
  • Some optimisation algorithms may change their internal structure in this function
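
As a rough illustration of what one such update does, the sketch below applies a plain stochastic-gradient-descent step to a vector of per-layer parameter arrays. The function name, its signature and the learning-rate schedule are assumptions made for this example; BetaML's own optimisers dispatch on `opt_alg` and operate on Learnable objects.

```julia
# Hypothetical plain-SGD update (illustrative only, not BetaML's implementation).
# θ and ∇ are vectors holding one parameter array and one gradient array per layer.
function sgd_update_sketch!(θ::Vector{<:AbstractArray}, ∇::Vector{<:AbstractArray};
                            n_epoch::Int=1, η=epoch -> 0.1 / sqrt(epoch))
    for (p, g) in zip(θ, ∇)
        p .-= η(n_epoch) .* g   # in-place step against the average batch gradient
    end
    return θ
end

θ = [[1.0, 2.0], [0.5]]
∇ = [[10.0, 10.0], [1.0]]
sgd_update_sketch!(θ, ∇; n_epoch=1)   # step size is 0.1 at the first epoch
```

Only `n_epoch` is used here, which matches the note that a given algorithm typically needs just a subset of the available arguments.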
source
diff --git a/dev/Perceptron.html b/dev/Perceptron.html index 9c44e6a..fede5a8 100644 --- a/dev/Perceptron.html +++ b/dev/Perceptron.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Perceptron Module

BetaML.PerceptronModule
Perceptron module

Provide linear and kernel classifiers.

Provide the following supervised models:

All algorithms are multiclass, with PerceptronClassifier and PegasosClassifier employing a one-vs-all strategy, while KernelPerceptronClassifier employs a one-vs-one approach, and return a "probability" for each class in terms of a dictionary for each record. Use mode(ŷ) to return a single class prediction per record.

These models are available in the MLJ framework as PerceptronClassifier, KernelPerceptronClassifier and PegasosClassifier respectively.

source

Module Index

Detailed API

BetaML.Perceptron.KernelPerceptronC_hpType
mutable struct KernelPerceptronC_hp <: BetaMLHyperParametersSet

Hyperparameters for the KernelPerceptronClassifier model

Parameters:

  • kernel: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel for details or check ?BetaML.Utils to verify if other kernels are defined (you can always define your own kernel) [def: radial_kernel]

  • initial_errors: Initial distribution of the number of errors [def: nothing, i.e. zeros]. If provided, this should be a nModels-length vector of nRecords-length vectors of integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2

  • epochs: Maximum number of epochs, i.e. passes through the whole training sample [def: 100]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To enable automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Perceptron.KernelPerceptronClassifierType
mutable struct KernelPerceptronClassifier <: BetaMLSupervisedModel

A "kernel" version of the Perceptron model (supervised) with user configurable kernel function.

For the parameters see ?KernelPerceptronC_hp and ?BML_options

Limitations:

  • data must be numerical
  • online training (retraining) is not supported

Example:

julia> using BetaML
+

The BetaML.Perceptron Module

BetaML.PerceptronModule
Perceptron module

Provide linear and kernel classifiers.

Provide the following supervised models:

All algorithms are multiclass, with PerceptronClassifier and PegasosClassifier employing a one-vs-all strategy, while KernelPerceptronClassifier employs a one-vs-one approach, and return a "probability" for each class in terms of a dictionary for each record. Use mode(ŷ) to return a single class prediction per record.
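
The per-record dictionary output and its reduction to a single label can be sketched in plain Julia as follows. The data and the helper `most_likely_sketch` are hypothetical stand-ins for a real model output and for `mode`; they only show the shape of the result.

```julia
# Hypothetical model output: one Dict(class => "probability") per record
ŷ = [Dict("a" => 0.8, "b" => 0.2), Dict("a" => 0.3, "b" => 0.7)]

# Hand-rolled stand-in for mode: the key with the largest value in one dictionary
function most_likely_sketch(d::Dict)
    ks = collect(keys(d))
    vs = collect(values(d))   # keys and values iterate in the same order
    return ks[argmax(vs)]
end

labels = most_likely_sketch.(ŷ)   # one class label per record
```

Keeping the dictionaries until the last step preserves the class scores, which is useful when a loss function needs them.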

These models are available in the MLJ framework as PerceptronClassifier, KernelPerceptronClassifier and PegasosClassifier respectively.

source

Module Index

Detailed API

BetaML.Perceptron.KernelPerceptronC_hpType
mutable struct KernelPerceptronC_hp <: BetaMLHyperParametersSet

Hyperparameters for the KernelPerceptronClassifier model

Parameters:

  • kernel: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel for details or check ?BetaML.Utils to verify if other kernels are defined (you can always define your own kernel) [def: radial_kernel]

  • initial_errors: Initial distribution of the number of errors [def: nothing, i.e. zeros]. If provided, this should be a nModels-length vector of nRecords-length vectors of integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2

  • epochs: Maximum number of epochs, i.e. passes through the whole training sample [def: 100]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To enable automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).
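
The shape expected for initial_errors follows from the one-vs-one scheme and can be sketched as below. The class count, record count and the ordering of the class pairs are assumptions chosen for illustration; the ordering used internally by BetaML is not stated in this text.

```julia
n_classes, n_records = 3, 6

# One binary model per unordered pair of classes
n_models = div(n_classes * (n_classes - 1), 2)   # 3 models for 3 classes

# One integer error counter per record, for each pairwise model
initial_errors = [zeros(Int, n_records) for _ in 1:n_models]

# Illustrative enumeration of the class pairs (ordering is an assumption)
class_pairs = [(i, j) for i in 1:n_classes for j in i+1:n_classes]
```

With three classes this gives three vectors of six counters each, matching the nModels-by-nRecords layout described for the hyperparameter.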

source
BetaML.Perceptron.KernelPerceptronClassifierType
mutable struct KernelPerceptronClassifier <: BetaMLSupervisedModel

A "kernel" version of the Perceptron model (supervised) with user configurable kernel function.

For the parameters see ?KernelPerceptronC_hp and ?BML_options

Limitations:

  • data must be numerical
  • online training (retraining) is not supported

Example:

julia> using BetaML
 
 julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
 
@@ -28,7 +28,7 @@
  "b"
  "b"
  "b"
- "b"
source
BetaML.Perceptron.PegasosC_hpType
mutable struct PegasosC_hp <: BetaMLHyperParametersSet

Hyperparameters for the PegasosClassifier model.

Parameters:

  • learning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]

  • learning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]

  • initial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • tunemethod::AutoTuneMethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To enable automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Perceptron.PegasosClassifierType
mutable struct PegasosClassifier <: BetaMLSupervisedModel

The PegasosClassifier model, a linear, gradient-based classifier. Multiclass is supported using a one-vs-all approach.

See ?PegasosC_hp and ?BML_options for applicable hyperparameters and options.

Example:

julia> using BetaML
+ "b"
source
BetaML.Perceptron.PegasosC_hpType
mutable struct PegasosC_hp <: BetaMLHyperParametersSet

Hyperparameters for the PegasosClassifier model.

Parameters:

  • learning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]

  • learning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]

  • initial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • tunemethod::AutoTuneMethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To enable automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Perceptron.PegasosClassifierType
mutable struct PegasosClassifier <: BetaMLSupervisedModel

The PegasosClassifier model, a linear, gradient-based classifier. Multiclass is supported using a one-vs-all approach.

See ?PegasosC_hp and ?BML_options for applicable hyperparameters and options.

Example:

julia> using BetaML
 
 julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
 
@@ -48,7 +48,7 @@
  "b"
  "b"
  "b"
- "a"
source
BetaML.Perceptron.PerceptronC_hpType
mutable struct PerceptronC_hp <: BetaMLHyperParametersSet

Hyperparameters for the PerceptronClassifier model

Parameters:

  • initial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • tunemethod::AutoTuneMethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To enable automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Perceptron.PerceptronC_hpType
mutable struct PerceptronC_hp <: BetaMLHyperParametersSet

Hyperparameters for the PerceptronClassifier model

Parameters:

  • initial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]

  • epochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]

  • shuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • force_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]

  • return_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]

  • tunemethod::AutoTuneMethod: The method (and its parameters) to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To enable automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).
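
The layout described for initial_parameters (one row per class, constant term in the first column) can be illustrated with a small one-vs-all scoring sketch. The numbers are made up and the scoring rule shown is the textbook linear one; it is an assumption that it mirrors what the model does internally.

```julia
n_classes, n_features = 2, 2

# Hypothetical parameters: row k holds [constant term, coefficients...] for class k
θ = [ 0.5  1.0 -0.1;
     -0.5 -1.0  0.1]          # dims (n_classes, n_features + 1)

x = [1.8, 2.5]                # one record
scores = θ * vcat(1.0, x)     # prepend 1.0 so the first column acts as the constant term
best_class = argmax(scores)   # index of the class with the highest linear score
```

A matrix of zeros with the same dimensions corresponds to the documented default of `nothing`.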

source
BetaML.Perceptron.PerceptronClassifierType
mutable struct PerceptronClassifier <: BetaMLSupervisedModel

The classical "perceptron" linear classifier (supervised).

For the parameters see ?PerceptronC_hp and ?BML_options.

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is not supported

Example:

julia> using BetaML
 
 julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
 
@@ -70,4 +70,4 @@
  "b"
  "b"
  "b"
- "a"
source
+ "a"
source
diff --git a/dev/StyleGuide_templates.html b/dev/StyleGuide_templates.html index fcacbc7..999d33b 100644 --- a/dev/StyleGuide_templates.html +++ b/dev/StyleGuide_templates.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

Style guide and template for BetaML developers

Master Style guide

The code in BetaML should follow the official Julia Style Guide.

Names style

  • Each file name should start with a capital letter, no spaces allowed (and each file content should start with: "Part of [BetaML](https://github.com/sylvaticus/BetaML.jl). Licence is MIT.")
  • Type names use the so-called "CamelCase" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words optionally separated (but only when really needed for readability) by an _;
  • In the code and documentation we refer with N the number of observations/records, D the number of dimensions and K the number of classes/categories;
  • Error/accuracy/loss functions want first y and then ŷ
  • In API exposed to users, strings are preferred to symbols

Docstrings

Please apply the following templates when writing a docstring for BetaML:

  • Functions (add @docs if the function is not on the root module level, like for inner constructors, i.e. @docs """ foo()x ...."""):
"""
+

Style guide and template for BetaML developers

Master Style guide

The code in BetaML should follow the official Julia Style Guide.

Names style

  • Each file name should start with a capital letter, no spaces allowed (and each file content should start with: "Part of [BetaML](https://github.com/sylvaticus/BetaML.jl). Licence is MIT.")
  • Type names use the so-called "CamelCase" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words optionally separated (but only when really needed for readability) by an _;
  • In the code and documentation we refer with N the number of observations/records, D the number of dimensions and K the number of classes/categories;
  • Error/accuracy/loss functions want first y and then ŷ
  • In API exposed to users, strings are preferred to symbols

Docstrings

Please apply the following templates when writing a docstring for BetaML:

  • Functions (add @docs if the function is not on the root module level, like for inner constructors, i.e. @docs """ foo()x ...."""):
"""
 $(TYPEDSIGNATURES)
 
 One line description
@@ -70,4 +70,4 @@
 
 Detailed description on the module objectives, content and organisation
 
-"""

To refer to a documented object: [`NAME`](@ref) or [`NAME`](@ref manual_id). In particular for internal links use [`?NAME`](@ref ?NAME)

To create an id manually: [Title](@id manual_id)

Data organisation

  • While some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns.
  • While some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a DataFrame, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies).
+"""

To refer to a documented object: [`NAME`](@ref) or [`NAME`](@ref manual_id). In particular for internal links use [`?NAME`](@ref ?NAME)

To create an id manually: [Title](@id manual_id)

Data organisation

  • While some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns.
  • While some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a DataFrame, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies).
diff --git a/dev/Trees.html b/dev/Trees.html index a576df3..20083c8 100644 --- a/dev/Trees.html +++ b/dev/Trees.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Trees Module

BetaML.TreesModule
BetaML.Trees module

Implement the DecisionTreeEstimator and RandomForestEstimator models (Decision Trees and Random Forests).

Both Decision Trees and Random Forests can be used for regression or classification problems, based on the type of the labels (numerical or not). The automatic selection can be overridden with the parameter force_classification=true, typically if labels are integers representing some categories rather than numbers. For classification problems the output of predict is a dictionary with the keys being the labels with non-zero probability and the corresponding values their probabilities; for regression it is a numerical value.

Please be aware that, differently from most other implementations, the Random Forest algorithm collects and averages the probabilities from the trees, rather than just reporting the mode, i.e. no information is lost and the output of the forest classifier is still a PMF.
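
The averaging described above can be sketched in plain Julia for a single record. The per-tree outputs are made-up values and the helper `average_pmf_sketch` is hypothetical; in particular, equal weighting of the trees is an assumption of this example.

```julia
# Hypothetical per-tree outputs for one record: Dict(label => probability)
tree_preds = [Dict("a" => 1.0),
              Dict("a" => 0.5, "b" => 0.5),
              Dict("b" => 1.0),
              Dict("a" => 1.0)]

# Average the per-tree probabilities label by label (equal weights assumed)
function average_pmf_sketch(preds)
    avg = Dict{String,Float64}()
    for d in preds, (label, p) in d
        avg[label] = get(avg, label, 0.0) + p / length(preds)
    end
    return avg
end

pmf = average_pmf_sketch(tree_preds)   # "a" => 0.625, "b" => 0.375
```

Taking only the mode of each tree would report "a" with no indication that a third of the probability mass sits on "b".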

To retrieve the prediction with the highest probability use mode over the prediciton returned by the model. Most error/accuracy measures in the Utils BetaML module works diretly with this format.

Missing data and trully unordered types are supported on the features, both on training and on prediction.

The module provide the following functions. Use ?[type or function] to access their full signature and detailed documentation:

Features are expected to be in the standard format (nRecords × nDimensions matrices) and the labels (either categorical or numerical) as a nRecords column vector.

Acknowledgments: originally based on Josh Gordon's code

source

Module Index

Detailed API

BetaML.Trees.DecisionNodeType

DecisionNode(question,trueBranch,falseBranch, depth)

A tree's non-terminal node.

Constructor's arguments and struct members:

  • question: The question asked in this node
  • trueBranch: A reference to the "true" branch of the tree
  • falseBranch: A reference to the "false" branch of the tree
  • depth: The node's depth in the tree
source
BetaML.Trees.DecisionTreeE_hpType
mutable struct DecisionTreeE_hp <: BetaMLHyperParametersSet

Hyperparameters for DecisionTreeEstimator (Decision Tree).

Parameters:

  • max_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. look at all features]

  • force_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]

  • splitting_criterion::Union{Nothing, Function}: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. Either gini, entropy, variance or a custom function. It can also be an anonymous function.

  • fast_algorithm::Bool: Use an experimental faster algorithm for looking up the best split in ordered fields (columns). Currently it brings down the fitting time by an order of magnitude, but predictions are noticeably affected. If used, control the meaning of integer fields with integer_encoded_cols.

  • integer_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of column positions to specify which integer columns should be treated as encodings of categorical variables instead of ordered classes/values. [def: nothing, integer columns with fewer than 20 unique values are considered categorical]. Useful in conjunction with fast_algorithm, little difference otherwise.

  • tunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Trees.DecisionTreeEstimatorType
mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel

A Decision Tree classifier and regressor (supervised).

A Decision Tree works by finding the "best" question to split the fitting data (according to the metric specified by the parameter splitting_criterion on the associated labels) until either the whole dataset is separated or a terminal condition is reached.

For the parameters see ?DecisionTreeE_hp and ?BML_options.

Notes:

  • Online fitting (re-fitting with new data) is not supported
  • Missing data (in the feature dataset) is supported.

Examples:

  • Classification...
julia> using BetaML
+

The BetaML.Trees Module

BetaML.TreesModule
BetaML.Trees module

Implement the DecisionTreeEstimator and RandomForestEstimator models (Decision Trees and Random Forests).

Both Decision Trees and Random Forests can be used for regression or classification problems, based on the type of the labels (numerical or not). The automatic selection can be overridden with the parameter force_classification=true, typically if labels are integers representing some categories rather than numbers. For classification problems the output of predict is a dictionary whose keys are the labels with non-zero probability and whose values are the corresponding probabilities; for regression it is a numerical value.

Please be aware that, differently from most other implementations, the Random Forest algorithm collects and averages the probabilities from the trees, rather than just reporting the mode, i.e. no information is lost and the output of the forest classifier is still a PMF.

To retrieve the prediction with the highest probability use mode over the prediction returned by the model. Most error/accuracy measures in the Utils BetaML module work directly with this format.
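
The averaging of per-tree probabilities described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not BetaML's internal code: the per-tree PMFs and the helper average_pmfs are hypothetical, and uniform tree weights are assumed.

```julia
# Hypothetical per-tree outputs for ONE record: each tree returns a PMF
# as a Dict mapping label => probability (illustrative values only).
tree_pmfs = [
    Dict("a" => 0.8, "b" => 0.2),
    Dict("a" => 0.4, "b" => 0.6),
    Dict("b" => 1.0),
]

# Average the probabilities label by label (uniform tree weights assumed).
function average_pmfs(pmfs)
    avg = Dict{String,Float64}()
    for pmf in pmfs, (label, p) in pmf
        avg[label] = get(avg, label, 0.0) + p / length(pmfs)
    end
    return avg
end

forest_pmf = average_pmfs(tree_pmfs)          # still a PMF: the values sum to 1
best_prob, best_label = findmax(forest_pmf)   # label with the highest probability
```

A plain majority vote over the trees would keep only the winning label, while the averaged dictionary also retains how much probability the forest assigns to the other labels.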

Missing data and truly unordered types are supported on the features, both on training and on prediction.

The module provides the following functions. Use ?[type or function] to access their full signature and detailed documentation:

Features are expected to be in the standard format (nRecords × nDimensions matrices) and the labels (either categorical or numerical) as a nRecords column vector.

Acknowledgments: originally based on Josh Gordon's code

source

Module Index

Detailed API

BetaML.Trees.DecisionNodeType

DecisionNode(question,trueBranch,falseBranch, depth)

A tree's non-terminal node.

Constructor's arguments and struct members:

  • question: The question asked in this node
  • trueBranch: A reference to the "true" branch of the tree
  • falseBranch: A reference to the "false" branch of the tree
  • depth: The node's depth in the tree
source
BetaML.Trees.DecisionTreeE_hpType
mutable struct DecisionTreeE_hp <: BetaMLHyperParametersSet

Hyperparameters for DecisionTreeEstimator (Decision Tree).

Parameters:

  • max_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. look at all features]

  • force_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]

  • splitting_criterion::Union{Nothing, Function}: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. Either gini, entropy, variance or a custom function. It can also be an anonymous function.

  • fast_algorithm::Bool: Use an experimental faster algorithm for looking up the best split in ordered fields (columns). Currently it brings down the fitting time by an order of magnitude, but predictions are noticeably affected. If used, control the meaning of integer fields with integer_encoded_cols.

  • integer_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of column positions to specify which integer columns should be treated as encodings of categorical variables instead of ordered classes/values. [def: nothing, integer columns with fewer than 20 unique values are considered categorical]. Useful in conjunction with fast_algorithm, little difference otherwise.

  • tunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).
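
As a rough illustration of the splitting_criterion description above (parent impurity minus the children's impurities, weighted by their number of items), the following plain-Julia sketch computes the information gain of one candidate split using the textbook Gini impurity. The helper names gini_impurity and information_gain are hypothetical and are not BetaML's own functions; the signature that BetaML expects for a custom criterion is not shown here and should be checked in the package documentation.

```julia
# Textbook Gini impurity of a vector of categorical labels: 1 - sum_k p_k^2.
function gini_impurity(labels)
    n = length(labels)
    n == 0 && return 0.0
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return 1.0 - sum((c / n)^2 for c in values(counts))
end

# Information gain of a split: parent impurity minus the children's
# impurities, each weighted by its share of the parent's records.
function information_gain(parent, left, right; impurity = gini_impurity)
    n = length(parent)
    w_left, w_right = length(left) / n, length(right) / n
    return impurity(parent) - w_left * impurity(left) - w_right * impurity(right)
end

y    = ["a", "b", "b", "b", "b", "a"]
gain = information_gain(y, ["a", "a"], ["b", "b", "b", "b"])  # a perfect split
```

Here both children are pure, so the gain equals the parent's impurity; a split with a gain not above min_gain would not be worth making.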

source
BetaML.Trees.DecisionTreeEstimatorType
mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel

A Decision Tree classifier and regressor (supervised).

A Decision Tree works by finding the "best" question to split the fitting data (according to the metric specified by the parameter splitting_criterion on the associated labels) until either the whole dataset is separated or a terminal condition is reached.

For the parameters see ?DecisionTreeE_hp and ?BML_options.

Notes:

  • Online fitting (re-fitting with new data) is not supported
  • Missing data (in the feature dataset) is supported.

Examples:

  • Classification...
julia> using BetaML
 
 julia> X   = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
 
@@ -90,7 +90,7 @@
 │     └─ -13.8
 │        
 └─ 3.3999999999999995
-julia> plot(wrapped_tree)    

DT plot

source
BetaML.Trees.InfoNodeType

These types are introduced so that additional information currently not present in a DecisionTree-structure – namely the feature names – can be used for visualization.

source
BetaML.Trees.LeafType

Leaf(y,depth)

A tree's leaf (terminal) node.

Constructor's arguments:

  • y: The labels associated to each record (either numerical or categorical)
  • depth: The node's depth in the tree

Struct members:

  • predictions: Either the relative label's count (i.e. a PMF) or the mean
  • depth: The node's depth in the tree
source
BetaML.Trees.RandomForestE_hpType
mutable struct RandomForestE_hp <: BetaMLHyperParametersSet

Hyperparameters for RandomForestEstimator (Random Forest).

Parameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]

  • max_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Union{Nothing, Int64}: The maximum number of (random) features to consider when choosing the optimal partition of the dataset [def: nothing, i.e. square root of the dimensions of the training data]

  • force_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]

  • splitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.

  • fast_algorithm::Bool: Use an experimental faster algorithm for looking up the best split in ordered fields (columns). Currently it brings down the fitting time by an order of magnitude, but predictions are noticeably affected. If used, control the meaning of integer fields with integer_encoded_cols.

  • integer_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of column positions to specify which integer columns should be treated as encodings of categorical variables instead of ordered classes/values. [def: nothing, integer columns with fewer than 20 unique values are considered categorical]. Useful in conjunction with fast_algorithm, little difference otherwise.

  • beta::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which the trees have not been trained. Higher values favour "better" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]

  • oob::Bool: Whether to compute the Out-Of-Bag error, an estimate of the validation error (the mismatching error for classification and the relative mean error for regression jobs).

  • tunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Trees.RandomForestEstimatorType
mutable struct RandomForestEstimator <: BetaMLSupervisedModel

A Random Forest classifier and regressor (supervised).

Random forests are ensembles of Decision Tree models (see ?DecisionTreeEstimator).

For the parameters see ?RandomForestE_hp and ?BML_options.

Notes:

  • Each individual decision tree is built using bootstrap over the data, i.e. "sampling N records with replacement" (hence, some records appear multiple times and some records do not appear in the specific tree training). The max_features parameter injects further variability and reduces the correlation between the forest trees.
  • The predictions of the "forest" (using the function predict()) are then the aggregated predictions of the individual trees (from which the name "bagging": bootstrap aggregating).
  • The performance of each individual tree, as measured using the records it has not been trained with, can then be (optionally) used as weights in the predict function. The parameter beta ≥ 0 regulates the distribution of these weights: the larger β is, the greater the importance (hence the weights) attached to the best-performing trees compared to the low-performing ones. Using these weights can significantly improve the forest performance (especially with small forests), however the correct value of beta depends on the problem under examination (and the chosen characteristics of the random forest estimator) and should be cross-validated to avoid over-fitting.
  • Note that training RandomForestEstimator uses multiple threads if these are available. You can check the number of threads available with Threads.nthreads(). To set the number of threads in Julia either set the environment variable JULIA_NUM_THREADS (before starting Julia) or start Julia with the command line option --threads (most integrated development editors for Julia already set the number of threads to 4).
  • Online fitting (re-fitting with new data) is not supported
  • Missing data (in the feature dataset) is supported.

Examples:

  • Classification...
julia> using BetaML
+julia> plot(wrapped_tree)    

DT plot

source
BetaML.Trees.InfoNodeType

These types are introduced so that additional information currently not present in a DecisionTree-structure – namely the feature names – can be used for visualization.

source
BetaML.Trees.LeafType

Leaf(y,depth)

A tree's leaf (terminal) node.

Constructor's arguments:

  • y: The labels associated to each record (either numerical or categorical)
  • depth: The node's depth in the tree

Struct members:

  • predictions: Either the relative label's count (i.e. a PMF) or the mean
  • depth: The node's depth in the tree
source
BetaML.Trees.RandomForestE_hpType
mutable struct RandomForestE_hp <: BetaMLHyperParametersSet

Hyperparameters for RandomForestEstimator (Random Forest).

Parameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]

  • max_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]

  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]

  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]

  • max_features::Union{Nothing, Int64}: The maximum number of (random) features to consider when choosing the optimal partition of the dataset [def: nothing, i.e. square root of the dimensions of the training data]

  • force_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]

  • splitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.

  • fast_algorithm::Bool: Use an experimental faster algorithm for looking up the best split in ordered fields (columns). Currently it brings down the fitting time by an order of magnitude, but predictions are noticeably affected. If used, control the meaning of integer fields with integer_encoded_cols.

  • integer_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of column positions to specify which integer columns should be treated as encodings of categorical variables instead of ordered classes/values. [def: nothing, integer columns with fewer than 20 unique values are considered categorical]. Useful in conjunction with fast_algorithm, little difference otherwise.

  • beta::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which the trees have not been trained. Higher values favour "better" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]

  • oob::Bool: Whether to compute the Out-Of-Bag error, an estimate of the validation error (the mismatching error for classification and the relative mean error for regression jobs).

  • tunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Trees.RandomForestEstimatorType
mutable struct RandomForestEstimator <: BetaMLSupervisedModel

A Random Forest classifier and regressor (supervised).

Random forests are ensembles of Decision Tree models (see ?DecisionTreeEstimator).

For the parameters see ?RandomForestE_hp and ?BML_options.

Notes:

  • Each individual decision tree is built using bootstrap over the data, i.e. "sampling N records with replacement" (hence, some records appear multiple times and some records do not appear in the specific tree training). The max_features parameter injects further variability and reduces the correlation between the forest trees.
  • The predictions of the "forest" (using the function predict()) are then the aggregated predictions of the individual trees (from which the name "bagging": bootstrap aggregating).
  • The performance of each individual tree, as measured using the records it has not been trained with, can then be (optionally) used as weights in the predict function. The parameter beta ≥ 0 regulates the distribution of these weights: the larger β is, the greater the importance (hence the weights) attached to the best-performing trees compared to the low-performing ones. Using these weights can significantly improve the forest performance (especially with small forests), however the correct value of beta depends on the problem under examination (and the chosen characteristics of the random forest estimator) and should be cross-validated to avoid over-fitting.
  • Note that training RandomForestEstimator uses multiple threads if these are available. You can check the number of threads available with Threads.nthreads(). To set the number of threads in Julia either set the environment variable JULIA_NUM_THREADS (before starting Julia) or start Julia with the command line option --threads (most integrated development editors for Julia already set the number of threads to 4).
  • Online fitting (re-fitting with new data) is not supported
  • Missing data (in the feature dataset) is supported.
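
The bootstrap step mentioned in the notes above ("sampling N records with replacement") and the resulting out-of-bag records can be sketched with the Julia standard library alone. This is a minimal illustration, not the package's implementation; the seed and the number of records are arbitrary assumptions.

```julia
using Random

rng = MersenneTwister(123)   # fixed seed, only to make the sketch reproducible
N   = 6                      # number of records in the (hypothetical) training set

# Bootstrap: draw N record indices with replacement for one tree.
boot_idx = rand(rng, 1:N, N)

# Out-of-bag records: those never drawn for this tree. They are the ones on
# which the tree's error can be measured without having been trained on them.
oob_idx = setdiff(1:N, boot_idx)

# Number of threads available to this Julia session.
n_threads = Threads.nthreads()
```

Each tree gets its own draw, so the out-of-bag set differs from tree to tree.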

Examples:

  • Classification...
julia> using BetaML
 
 julia> X   = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
 
@@ -132,4 +132,4 @@
 
 julia> println(mod)
 RandomForestEstimator - A 5 trees Random Forest regressor (fitted on 6 records)
-Dict{String, Any}("job_is_regression" => 1, "fitted_records" => 6, "avg_avg_depth" => 2.8833333333333333, "oob_errors" => Inf, "avg_max_reached_depth" => 3.4, "xndims" => 2)
source
+Dict{String, Any}("job_is_regression" => 1, "fitted_records" => 6, "avg_avg_depth" => 2.8833333333333333, "oob_errors" => Inf, "avg_max_reached_depth" => 3.4, "xndims" => 2)
source
diff --git a/dev/Utils.html b/dev/Utils.html index 9425099..e4352b5 100644 --- a/dev/Utils.html +++ b/dev/Utils.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

The BetaML.Utils Module

BetaML.UtilsModule
Utils module

Provide shared utility functions and/or models for various machine learning algorithms.

For the complete list of functions provided see below. The main ones are:

Helper functions for logging

  • Most BetaML functions accept a parameter verbosity (choose between NONE, LOW, STD, HIGH or FULL)
  • Writing complex code and need to find where something is executed? Use the macro @codelocation

Stochasticity management

  • Utils provides [FIXEDSEED], [FIXEDRNG] and generate_parallel_rngs. All stochastic functions and models accept a rng parameter. See the "Getting started" section in the tutorial for details.

Data processing

Samplers

  • Utilities to sample from data (e.g. for neural network training or for cross-validation)
  • Includes the "generic" type SamplerWithData, together with the sampler implementation KFold and the function batch

Transformers

Measures

source

Module Index

Detailed API

BetaML.Utils.AutoE_hpType
mutable struct AutoE_hp <: BetaMLHyperParametersSet

Hyperparameters for the AutoEncoder transformer

Parameters

  • encoded_size: The desired size of the encoded data, that is the number of dimensions in output or the size of the latent space. This is the number of neurons of the layer sitting between the encoding and decoding layers. If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: 0.33]

  • layers_size: Inner layers dimension (i.e. number of neurons). If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: nothing, which applies a specific heuristic]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.

  • e_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]

  • d_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]

  • loss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.

  • dloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passages through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 8]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Utils.AutoEncoderType
mutable struct AutoEncoder <: BetaMLUnsupervisedModel

Perform a (possibly non-linear) transformation ("encoding") of the data into a different space, e.g. for dimensionality reduction, using a neural network trained to replicate the input data.

A neural network is trained to first transform the data (often "compress") to a subspace (the output of an inner layer) and then retransform (subsequent layers) to the original data.

predict(mod::AutoEncoder,x) returns the encoded data, inverse_predict(mod::AutoEncoder,xtransformed) performs the decoding.

For the parameters see AutoE_hp and BML_options

Notes:

  • AutoEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it.
  • Missing data are not supported. Impute them first, see the Imputation module.
  • Decoding layers can be optionally chosen (parameter d_layers) in order to suit the kind of data, e.g. a relu activation function for non-negative data

Example:

julia> using BetaML
+

The BetaML.Utils Module

BetaML.UtilsModule
Utils module

Provide shared utility functions and/or models for various machine learning algorithms.

For the complete list of functions provided see below. The main ones are:

Helper functions for logging

  • Most BetaML functions accept a parameter verbosity (choose between NONE, LOW, STD, HIGH or FULL)
  • Writing complex code and need to find where something is executed? Use the macro @codelocation

Stochasticity management

  • Utils provides [FIXEDSEED], [FIXEDRNG] and generate_parallel_rngs. All stochastic functions and models accept a rng parameter. See the "Getting started" section in the tutorial for details.

Data processing

Samplers

  • Utilities to sample from data (e.g. for neural network training or for cross-validation)
  • Includes the "generic" type SamplerWithData, together with the sampler implementation KFold and the function batch

Transformers

Measures

source

Module Index

Detailed API

BetaML.Utils.AutoE_hpType
mutable struct AutoE_hp <: BetaMLHyperParametersSet

Hyperparameters for the AutoEncoder transformer

Parameters

  • encoded_size: The desired size of the encoded data, that is the number of dimensions in output or the size of the latent space. This is the number of neurons of the layer sitting between the encoding and decoding layers. If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: 0.33]

  • layers_size: Inner layers dimension (i.e. number of neurons). If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: nothing, which applies a specific heuristic]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.

  • e_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]

  • d_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]

  • loss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.

  • dloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passages through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 8]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

source
BetaML.Utils.AutoEncoderType
mutable struct AutoEncoder <: BetaMLUnsupervisedModel

Perform a (possibly non-linear) transformation ("encoding") of the data into a different space, e.g. for dimensionality reduction, using a neural network trained to replicate the input data.

A neural network is trained to first transform the data (often "compress") to a subspace (the output of an inner layer) and then retransform (subsequent layers) to the original data.

predict(mod::AutoEncoder,x) returns the encoded data, inverse_predict(mod::AutoEncoder,xtransformed) performs the decoding.

For the parameters see AutoE_hp and BML_options

Notes:

  • AutoEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it.
  • Missing data are not supported. Impute them first, see the Imputation module.
  • Decoding layers can be optionally chosen (parameter d_layers) in order to suit the kind of data, e.g. a relu activation function for non-negative data

Example:

julia> using BetaML
 
 julia> x = [0.12 0.31 0.29 3.21 0.21;
             0.22 0.61 0.58 6.43 0.42;
@@ -45,7 +45,7 @@
  0.22  0.61  0.58   6.43  0.42  0.205628   0.470884  0.558655   6.51042  0.487416
  0.51  1.47  1.46  16.12  0.99  0.529785   1.56431   1.45762   16.067    0.971123
  0.35  0.93  0.91  10.04  0.71  0.3264     0.878264  0.893584  10.0709   0.667632
- 0.44  1.21  1.18  13.54  0.85  0.443453   1.2731    1.2182    13.5218   0.842298
source
BetaML.Utils.ConfusionMatrixType
mutable struct ConfusionMatrix <: BetaMLUnsupervisedModel

Compute a confusion matrix detailing the mismatch between observations and predictions of a categorical variable

For the parameters see ConfusionMatrix_hp and BML_options.

The "predicted" values are either the scores or the normalised scores (depending on the parameter normalise_scores [def: true]).

Notes:

  • The Confusion matrix report can be printed (i.e. print(cm_model)). If you plan to print the Confusion Matrix report, be sure that the type of the data in y and ŷ can be converted to String.

  • Information in a structured way is available through the info(cm) function that returns the following dictionary:

    • accuracy: Overall accuracy rate
    • misclassification: Overall misclassification rate
    • actual_count: Array of counts per label in the actual data
    • predicted_count: Array of counts per label in the predicted data
    • scores: Matrix actual (rows) vs predicted (columns)
    • normalised_scores: Normalised scores
    • tp: True positive (by class)
    • tn: True negative (by class)
    • fp: False positive (by class)
    • fn: False negative (by class)
    • precision: True class i over predicted class i (by class)
    • recall: Predicted class i over true class i (by class)
    • specificity: Predicted not class i over true not class i (by class)
    • f1score: Harmonic mean of precision and recall
    • mean_precision: Mean by class, respectively unweighted and weighted by actual_count
    • mean_recall: Mean by class, respectively unweighted and weighted by actual_count
    • mean_specificity: Mean by class, respectively unweighted and weighted by actual_count
    • mean_f1score: Mean by class, respectively unweighted and weighted by actual_count
    • categories: The categories considered
    • fitted_records: Number of records considered
    • n_categories: Number of categories considered

Example:

The confusion matrix can also be plotted, e.g.:

julia> using Plots, BetaML
+ 0.44  1.21  1.18  13.54  0.85  0.443453   1.2731    1.2182    13.5218   0.842298
source
BetaML.Utils.ConfusionMatrixType
mutable struct ConfusionMatrix <: BetaMLUnsupervisedModel

Compute a confusion matrix detailing the mismatch between observations and predictions of a categorical variable

For the parameters see ConfusionMatrix_hp and BML_options.

The "predicted" values are either the scores or the normalised scores (depending on the parameter normalise_scores [def: true]).

Notes:

  • The confusion matrix report can be printed (i.e. print(cm_model)). If you plan to print the report, make sure that the type of the data in y and ŷ can be converted to String.

  • Structured information is available through the info(cm) function, which returns the following dictionary:

    • accuracy: Overall accuracy rate
    • misclassification: Overall misclassification rate
    • actual_count: Array of counts per label in the actual data
    • predicted_count: Array of counts per label in the predicted data
    • scores: Matrix actual (rows) vs predicted (columns)
    • normalised_scores: Normalised scores
    • tp: True positive (by class)
    • tn: True negative (by class)
    • fp: False positive (by class)
    • fn: False negative (by class)
    • precision: True class i over predicted class i (by class)
    • recall: Predicted class i over true class i (by class)
    • specificity: Predicted not class i over true not class i (by class)
    • f1score: Harmonic mean of precision and recall
    • mean_precision: Mean by class, respectively unweighted and weighted by actual_count
    • mean_recall: Mean by class, respectively unweighted and weighted by actual_count
    • mean_specificity: Mean by class, respectively unweighted and weighted by actual_count
    • mean_f1score: Mean by class, respectively unweighted and weighted by actual_count
    • categories: The categories considered
    • fitted_records: Number of records considered
    • n_categories: Number of categories considered
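
As a rough illustration of how the per-class entries above relate to the scores matrix, here is a minimal plain-Julia sketch. It is not BetaML's implementation: the `scores` values are made up, and all variable names are chosen for this example only.

```julia
# Plain-Julia sketch (not BetaML's implementation): per-class metrics from a
# matrix of counts with actual classes on rows and predicted classes on columns.
scores = [3 1 0;
          0 2 1;
          1 0 4]          # made-up counts for three classes

n  = sum(scores)
K  = size(scores, 1)
tp = [scores[i, i] for i in 1:K]                      # predicted i and actually i
fp = [sum(scores[:, i]) - scores[i, i] for i in 1:K]  # predicted i, actually another class
fn = [sum(scores[i, :]) - scores[i, i] for i in 1:K]  # actually i, predicted another class
tn = n .- tp .- fp .- fn                              # everything else

precision_by_class   = tp ./ (tp .+ fp)
recall_by_class      = tp ./ (tp .+ fn)
specificity_by_class = tn ./ (tn .+ fp)
f1_by_class = 2 .* precision_by_class .* recall_by_class ./
              (precision_by_class .+ recall_by_class)
accuracy_rate = sum(tp) / n

# Unweighted mean and mean weighted by the actual count of each class
actual_count   = vec(sum(scores, dims=2))
mean_precision = (sum(precision_by_class) / K,
                  sum(precision_by_class .* actual_count) / n)
```

A class that is never predicted would give a 0/0 precision here; a real implementation has to decide how to report that case.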

Example:

The confusion matrix can also be plotted, e.g.:

julia> using Plots, BetaML
 
 julia> y  = ["apple","mandarin","clementine","clementine","mandarin","apple","clementine","clementine","apple","mandarin","clementine"];
 
@@ -124,7 +124,7 @@
 
 julia> res = info(cm);
 
-julia> heatmap(string.(res["categories"]),string.(res["categories"]),res["normalised_scores"],seriescolor=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual", title="Confusion Matrix (normalised scores)")

CM plot

source
BetaML.Utils.ConfusionMatrix_hpType
mutable struct ConfusionMatrix_hp <: BetaMLHyperParametersSet

Hyperparameters for ConfusionMatrix

Parameters:

  • categories: The categories (aka "levels") to represent. [def: nothing, i.e. the unique ground truth values].

  • handle_unknown: How to handle categories not seen in the ground truth values or not present in the provided categories array. "error" (default) raises an error, "infrequent" adds a specific category for these values.

  • handle_missing: How to handle missing values in either the ground truth or the predicted values. "error" (default) raises an error, "drop" drops the record.

  • other_categories_name: Which value to assign to the "other" category (i.e. categories not seen in the ground truth or not present in the provided categories array). [def: nothing, i.e. typemax(Int64) for integer vectors and "other" for other types]. This setting is active only if handle_unknown="infrequent", and in that case it MUST be specified if the vector to one-hot encode is made of neither integers nor strings.

  • categories_names: A dictionary to map categories to some custom names. Useful for example if categories are integers, or you want to use shorter names [def: Dict(), i.e. not used]. This option isn't currently compatible with missing values or when some record has a value not in this provided dictionary.

  • normalise_scores: Whether predict should return the normalised scores. Note that both unnormalised and normalised scores remain available using info. [def: true]

source
BetaML.Utils.GridSearchType
mutable struct GridSearch <: AutoTuneMethod

Simple grid method for hyper-parameters validation of supervised models.

All parameters are tested using cross-validation and then the "best" combination is used.

Notes:

  • the default loss is suitable for 1-dimensional output supervised models

Parameters:

  • loss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, the data (a vector of arrays, even if we work only with X) and (using the rng keyword) an RNG, and returns a scalar loss.

  • res_share::Float64: Share of the (data) resources to use for the autotuning [def: 0.1]. With res_share=1 the whole dataset is used for autotuning, which can be very time consuming!

  • hpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.

  • multithreads::Bool: Use multithreads in the search for the best hyperparameters [def: false]

source
BetaML.Utils.KFoldType

KFold(nsplits=5,nrepeats=1,shuffle=true,rng=Random.GLOBAL_RNG)

Iterator for k-fold cross_validation strategy.

source
BetaML.Utils.MinMaxScalerType
mutable struct MinMaxScaler <: BetaML.Utils.AbstractScaler

Scale the data to a given (def: unit) hypercube

Parameters:

  • inputRange: The range of the input. [def: (minimum,maximum)]. Both ranges are functions of the data. You can consider other relative or absolute ranges using e.g. inputRange=(x->minimum(x)*0.8,x->100)

  • outputRange: The range of the scaled output [def: (0,1)]

Example:

julia> using BetaML
+julia> heatmap(string.(res["categories"]),string.(res["categories"]),res["normalised_scores"],seriescolor=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual", title="Confusion Matrix (normalised scores)")

CM plot

source
BetaML.Utils.ConfusionMatrix_hpType
mutable struct ConfusionMatrix_hp <: BetaMLHyperParametersSet

Hyperparameters for ConfusionMatrix

Parameters:

  • categories: The categories (aka "levels") to represent. [def: nothing, i.e. the unique ground truth values].

  • handle_unknown: How to handle categories not seen in the ground truth values or not present in the provided categories array. "error" (default) raises an error, "infrequent" adds a specific category for these values.

  • handle_missing: How to handle missing values in either the ground truth or the predicted values. "error" (default) raises an error, "drop" drops the record.

  • other_categories_name: Which value to assign to the "other" category (i.e. categories not seen in the ground truth or not present in the provided categories array). [def: nothing, i.e. typemax(Int64) for integer vectors and "other" for other types]. This setting is active only if handle_unknown="infrequent", and in that case it MUST be specified if the vector to one-hot encode is made of neither integers nor strings.

  • categories_names: A dictionary to map categories to some custom names. Useful for example if categories are integers, or you want to use shorter names [def: Dict(), i.e. not used]. This option isn't currently compatible with missing values or when some record has a value not in this provided dictionary.

  • normalise_scores: Whether predict should return the normalised scores. Note that both unnormalised and normalised scores remain available using info. [def: true]

source
BetaML.Utils.GridSearchType
mutable struct GridSearch <: AutoTuneMethod

Simple grid method for hyper-parameters validation of supervised models.

All parameters are tested using cross-validation and then the "best" combination is used.

Notes:

  • the default loss is suitable for 1-dimensional output supervised models

Parameters:

  • loss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, the data (a vector of arrays, even if we work only with X) and (using the rng keyword) an RNG, and returns a scalar loss.

  • res_share::Float64: Share of the (data) resources to use for the autotuning [def: 0.1]. With res_share=1 the whole dataset is used for autotuning, which can be very time consuming!

  • hpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.

  • multithreads::Bool: Use multithreads in the search for the best hyperparameters [def: false]
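
The idea of testing every combination in hpranges can be sketched in plain Julia as follows. This is only an illustration, not BetaML's implementation: the hyperparameter names and `toy_loss` are hypothetical stand-ins for a real model and its cross-validated loss.

```julia
# Plain-Julia sketch of an exhaustive grid search (not BetaML's implementation).
# The parameter names are hypothetical; `toy_loss` stands in for a
# cross-validated loss computed on a share of the data.
hpranges = Dict("max_depth" => [2, 4, 8], "n_trees" => [10, 50])

toy_loss(hp) = abs(hp["max_depth"] - 4) + abs(hp["n_trees"] - 50) / 100

hpnames = collect(keys(hpranges))
# Cartesian product of all candidate values, one tuple per combination
combos     = vec(collect(Iterators.product((hpranges[k] for k in hpnames)...)))
candidates = [Dict(zip(hpnames, c)) for c in combos]

losses = [toy_loss(hp) for hp in candidates]
best   = candidates[argmin(losses)]   # the combination with the lowest loss
```

The number of combinations is the product of the lengths of the value vectors, which is why the grid grows quickly with the number of parameters.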

source
BetaML.Utils.KFoldType

KFold(nsplits=5,nrepeats=1,shuffle=true,rng=Random.GLOBAL_RNG)

Iterator for k-fold cross_validation strategy.
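
For readers unfamiliar with the strategy, the following plain-Julia sketch shows one way to produce k-fold train/validation index splits. It is not BetaML's implementation; `kfold_indices` and its `do_shuffle` keyword are names invented for this example, and it assumes nsplits >= 2.

```julia
using Random

# Plain-Julia sketch of k-fold index splitting (not BetaML's implementation).
function kfold_indices(n, nsplits; do_shuffle=true, rng=Random.default_rng())
    ids = do_shuffle ? randperm(rng, n) : collect(1:n)
    # Fold f takes every nsplits-th position starting at f, so fold sizes differ by at most 1
    folds = [ids[f:nsplits:n] for f in 1:nsplits]
    # Each split pairs the ids of the other folds (training) with fold f (validation)
    return [(reduce(vcat, [folds[g] for g in 1:nsplits if g != f]), folds[f])
            for f in 1:nsplits]
end

splits = kfold_indices(10, 5; rng=MersenneTwister(123))
for (train_ids, val_ids) in splits
    # every record is used exactly once, either for training or for validation
    @assert sort(vcat(train_ids, val_ids)) == collect(1:10)
end
```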

source
BetaML.Utils.MinMaxScalerType
mutable struct MinMaxScaler <: BetaML.Utils.AbstractScaler

Scale the data to a given (def: unit) hypercube

Parameters:

  • inputRange: The range of the input. [def: (minimum,maximum)]. Both ranges are functions of the data. You can consider other relative or absolute ranges using e.g. inputRange=(x->minimum(x)*0.8,x->100)

  • outputRange: The range of the scaled output [def: (0,1)]
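
The mapping described by the two parameters above can be sketched in plain Julia as below. This is not BetaML's code: `minmax_scale` and its keyword names are invented for the illustration, and a constant column (zero input range) is not handled.

```julia
# Plain-Julia sketch of column-wise min-max scaling (not BetaML's implementation).
function minmax_scale(x::AbstractMatrix; input_range=(minimum, maximum), output_range=(0, 1))
    out  = float.(x)
    a, b = output_range
    for j in axes(x, 2)
        # both ends of the input range are functions of the column data
        lo, hi = input_range[1](x[:, j]), input_range[2](x[:, j])
        out[:, j] = a .+ (x[:, j] .- lo) ./ (hi - lo) .* (b - a)
    end
    return out
end

x        = [4000 400; 1000 100; 2000 200; 3000 300]
scaled   = minmax_scale(x)                                  # each column mapped to [0,1]
anchored = minmax_scale(x; input_range=(c -> 0, maximum))   # fix the lower end at 0
```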

Example:

julia> using BetaML
 
 julia> x       = [[4000,1000,2000,3000] ["a", "categorical", "variable", "not to scale"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]
 4×4 Matrix{Any}:
@@ -148,7 +148,7 @@
  4000.0  "a"             4.0  0.4
  1000.0  "categorical"   1.0  0.1
  2000.0  "variable"      2.0  0.2
- 3000.0  "not to scale"  3.0  0.3
source
BetaML.Utils.OneHotE_hpType
mutable struct OneHotE_hp <: BetaMLHyperParametersSet

Hyperparameters for both OneHotEncoder and OrdinalEncoder

Parameters:

  • categories: The categories to represent as columns. [def: nothing, i.e. unique training values or range for integers]. Do not include missing in this list.

  • handle_unknown: How to handle categories not seen in training or not present in the provided categories array. "error" (default) raises an error, "missing" labels the whole output with missing values, "infrequent" adds a specific column for these categories in one-hot encoding or a single new category in ordinal encoding.

  • other_categories_name: Which value to assign during the inverse transformation to the "other" category (i.e. categories not seen in training or not present in the provided categories array). [def: nothing, i.e. typemax(Int64) for integer vectors and "other" for other types]. This setting is active only if handle_unknown="infrequent", and in that case it MUST be specified if the vector to one-hot encode is made of neither integers nor strings.

source
BetaML.Utils.OneHotEncoderType
mutable struct OneHotEncoder <: BetaMLUnsupervisedModel

Encode a vector of categorical values as one-hot columns.

The algorithm distinguishes between missing values, for which it returns a one-hot encoded row of missing values, and other categories not in the provided list or not seen during training that are handled according to the handle_unknown parameter.

For the parameters see OneHotE_hp and BML_options. This model supports inverse_predict.

Example:

julia> using BetaML
+ 3000.0  "not to scale"  3.0  0.3
source
BetaML.Utils.OneHotE_hpType
mutable struct OneHotE_hp <: BetaMLHyperParametersSet

Hyperparameters for both OneHotEncoder and OrdinalEncoder

Parameters:

  • categories: The categories to represent as columns. [def: nothing, i.e. unique training values or range for integers]. Do not include missing in this list.

  • handle_unknown: How to handle categories not seen in training or not present in the provided categories array. "error" (default) raises an error, "missing" labels the whole output with missing values, "infrequent" adds a specific column for these categories in one-hot encoding or a single new category in ordinal encoding.

  • other_categories_name: Which value to assign during the inverse transformation to the "other" category (i.e. categories not seen in training or not present in the provided categories array). [def: nothing, i.e. typemax(Int64) for integer vectors and "other" for other types]. This setting is active only if handle_unknown="infrequent", and in that case it MUST be specified if the vector to one-hot encode is made of neither integers nor strings.

source
BetaML.Utils.OneHotEncoderType
mutable struct OneHotEncoder <: BetaMLUnsupervisedModel

Encode a vector of categorical values as one-hot columns.

The algorithm distinguishes between missing values, for which it returns a one-hot encoded row of missing values, and other categories not in the provided list or not seen during training that are handled according to the handle_unknown parameter.

For the parameters see OneHotE_hp and BML_options. This model supports inverse_predict.
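
To make the "infrequent" handling concrete, here is a plain-Julia sketch of one-hot encoding with an extra column for unseen values, plus the matching decoding step. It is not BetaML's implementation: `onehot` and its `other` keyword are names invented for this example, and missing values are not handled.

```julia
# Plain-Julia sketch of one-hot encoding with an "infrequent" bucket
# (not BetaML's implementation; missing values are not handled here).
function onehot(x, categories; other=nothing)
    cats = other === nothing ? collect(categories) : vcat(categories, other)
    out  = falses(length(x), length(cats))
    for (i, v) in enumerate(x)
        j = findfirst(==(v), cats)
        if j === nothing
            other === nothing && error("unknown category: $v")
            j = length(cats)          # route unseen values to the extra column
        end
        out[i, j] = true
    end
    return out
end

x    = ["a", "d", "e", "c", "d"]
enc  = onehot(x, ["a", "c", "d"]; other="zz")   # 5×4 matrix, "e" falls in the "zz" column

# Decoding: the position of the single `true` in each row selects the category
cats = ["a", "c", "d", "zz"]
dec  = [cats[findfirst(row)] for row in eachrow(enc)]
```

Note that decoding cannot recover the original "e": once a value is routed to the extra column, only the name of the "other" category is available.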

Example:

julia> using BetaML
 
 julia> x       = ["a","d","e","c","d"];
 
@@ -175,7 +175,7 @@
 3-element Vector{String}:
  "a"
  "zz"
- "c"
source
BetaML.Utils.OrdinalEncoderType
mutable struct OrdinalEncoder <: BetaMLUnsupervisedModel

Encode a vector of categorical values as integers.

The algorithm distinguishes between missing values, for which it propagates the missing value, and other categories not in the provided list or not seen during training, which are handled according to the handle_unknown parameter.

For the parameters see OneHotE_hp and BML_options. This model supports inverse_predict.

Example:

julia> using BetaML
+ "c"
source
BetaML.Utils.OrdinalEncoderType
mutable struct OrdinalEncoder <: BetaMLUnsupervisedModel

Encode a vector of categorical values as integers.

The algorithm distinguishes between missing values, for which it propagates the missing value, and other categories not in the provided list or not seen during training, which are handled according to the handle_unknown parameter.

For the parameters see OneHotE_hp and BML_options. This model supports inverse_predict.

Example:

julia> using BetaML
 
 julia> x       = ["a","d","e","c","d"];
 
@@ -204,7 +204,7 @@
  "a"
  "zz"
  "c"
- "zz"
source
BetaML.Utils.PCAE_hpType
mutable struct PCAE_hp <: BetaMLHyperParametersSet

Hyperparameters for the PCAEncoder transformer

Parameters

  • encoded_size: The size, that is the number of dimensions, to maintain (with encoded_size <= size(X,2) ) [def: nothing, i.e. the number of output dimensions is determined from the parameter max_unexplained_var]

  • max_unexplained_var: The maximum proportion of unexplained variance that we are willing to accept when reducing the number of dimensions in our data [def: 0.05]. It doesn't have any effect when the output number of dimensions is explicitly chosen with the parameter encoded_size

source
BetaML.Utils.PCAEncoderType
mutable struct PCAEncoder <: BetaMLUnsupervisedModel

Perform a Principal Component Analysis, a dimensionality reduction technique employing a linear transformation of the original matrix by the eigenvectors of the covariance matrix.

PCAEncoder returns the matrix reprojected onto the dimensions of maximum variance.

For the parameters see PCAE_hp and BML_options

Notes:

  • PCAEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it.
  • Missing data are not supported. Impute them first, see the Imputation module.
  • If you don't know a priori the maximum unexplained variance you are willing to accept, nor the desired number of dimensions, you can run the model with all the dimensions in output (i.e. with encoded_size=size(X,2)), analyse the cumulative proportion of explained variance by dimension in info(mod)["explained_var_by_dim"], choose the number of dimensions K according to your needs and finally pick from the reprojected matrix only the dimensions required, i.e. out.X[:,1:K].

Example:

julia> using BetaML
+ "zz"
source
BetaML.Utils.PCAE_hpType
mutable struct PCAE_hp <: BetaMLHyperParametersSet

Hyperparameters for the PCAEncoder transformer

Parameters

  • encoded_size: The size, that is the number of dimensions, to maintain (with encoded_size <= size(X,2) ) [def: nothing, i.e. the number of output dimensions is determined from the parameter max_unexplained_var]

  • max_unexplained_var: The maximum proportion of unexplained variance that we are willing to accept when reducing the number of dimensions in our data [def: 0.05]. It doesn't have any effect when the output number of dimensions is explicitly chosen with the parameter encoded_size
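
How a threshold on unexplained variance can translate into a number of dimensions is sketched below in plain Julia. This is an assumption about the general technique, not BetaML's code; `n_components` is a name invented for this example.

```julia
using LinearAlgebra, Statistics

# Plain-Julia sketch (not BetaML's implementation): pick the smallest number of
# principal components whose unexplained variance stays within a threshold.
function n_components(X; max_unexplained_var=0.05)
    λ = sort(eigvals(Symmetric(cov(X))), rev=true)   # variances along the principal axes
    explained = cumsum(λ) ./ sum(λ)                  # cumulative share of explained variance
    return findfirst(>=(1 - max_unexplained_var), explained)
end

X = [1 10 100; 1.1 15 120; 0.95 23 90; 0.99 17 120; 1.05 8 90; 1.1 12 95]
k = n_components(X)   # number of dimensions to keep for the default threshold
```

With a threshold of exactly zero, floating-point rounding may leave the last cumulative share slightly below one, so a real implementation needs a fallback to "keep all dimensions".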

source
BetaML.Utils.PCAEncoderType
mutable struct PCAEncoder <: BetaMLUnsupervisedModel

Perform a Principal Component Analysis, a dimensionality reduction technique employing a linear transformation of the original matrix by the eigenvectors of the covariance matrix.

PCAEncoder returns the matrix reprojected onto the dimensions of maximum variance.

For the parameters see PCAE_hp and BML_options

Notes:

  • PCAEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it.
  • Missing data are not supported. Impute them first, see the Imputation module.
  • If you don't know a priori the maximum unexplained variance you are willing to accept, nor the desired number of dimensions, you can run the model with all the dimensions in output (i.e. with encoded_size=size(X,2)), analyse the cumulative proportion of explained variance by dimension in info(mod)["explained_var_by_dim"], choose the number of dimensions K according to your needs and finally pick from the reprojected matrix only the dimensions required, i.e. out.X[:,1:K].

Example:

julia> using BetaML
 
 julia> xtrain        = [1 10 100; 1.1 15 120; 0.95 23 90; 0.99 17 120; 1.05 8 90; 1.1 12 95];
 
@@ -232,7 +232,7 @@
 
 julia> xtest_reproj  = predict(mod,xtest)
 1×2 Matrix{Float64}:
- 200.898  6.3566
source
BetaML.Utils.ScalerType
mutable struct Scaler <: BetaMLUnsupervisedModel

Scale the data according to the specific chosen method (def: StandardScaler)

For the parameters see Scaler_hp and BML_options

Examples:

  • Standard scaler (default)...
julia> using BetaML, Statistics
+ 200.898  6.3566
source
BetaML.Utils.ScalerType
mutable struct Scaler <: BetaMLUnsupervisedModel

Scale the data according to the specific chosen method (def: StandardScaler)

For the parameters see Scaler_hp and BML_options

Examples:

  • Standard scaler (default)...
julia> using BetaML, Statistics
 
 julia> x         = [[4000,1000,2000,3000] [400,100,200,300] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]
 4×4 Matrix{Float64}:
@@ -288,7 +288,7 @@
  4000.0  "a"             4.0  0.4
  1000.0  "categorical"   1.0  0.1
  2000.0  "variable"      2.0  0.2
- 3000.0  "not to scale"  3.0  0.3
source
BetaML.Utils.Scaler_hpType
mutable struct Scaler_hp <: BetaMLHyperParametersSet

Hyperparameters for the Scaler transformer

Parameters

  • method: The specific scaler method to employ with its own parameters. See StandardScaler [def] or MinMaxScaler.

  • skip: The positional ids of the columns to skip scaling (eg. categorical columns, dummies,...) [def: []]

source
BetaML.Utils.StandardScalerType
mutable struct StandardScaler <: BetaML.Utils.AbstractScaler

Standardise the input to zero mean and unit standard deviation, aka "Z-score". Note that missing values are skipped.

Parameters:

  • scale: Scale to unit variance [def: true]

  • center: Center to zero mean [def: true]

Example:

julia> using BetaML, Statistics
+ 3000.0  "not to scale"  3.0  0.3
source
BetaML.Utils.Scaler_hpType
mutable struct Scaler_hp <: BetaMLHyperParametersSet

Hyperparameters for the Scaler transformer

Parameters

  • method: The specific scaler method to employ with its own parameters. See StandardScaler [def] or MinMaxScaler.

  • skip: The positional ids of the columns to skip scaling (eg. categorical columns, dummies,...) [def: []]

source
BetaML.Utils.StandardScalerType
mutable struct StandardScaler <: BetaML.Utils.AbstractScaler

Standardise the input to zero mean and unit standard deviation, aka "Z-score". Note that missing values are skipped.

Parameters:

  • scale: Scale to unit variance [def: true]

  • center: Center to zero mean [def: true]
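
The effect of the scale and center switches can be sketched in plain Julia as follows. This is not BetaML's implementation: `zscore` is a name invented for this example, and the use of the sample standard deviation (n-1 denominator) is an assumption.

```julia
using Statistics

# Plain-Julia sketch of column-wise standardisation (not BetaML's implementation).
# Assumption: the sample standard deviation (n-1 denominator) is used.
function zscore(x::AbstractMatrix; center=true, scale=true)
    out = float.(x)
    for j in axes(x, 2)
        col = collect(skipmissing(x[:, j]))   # statistics ignore missing values
        μ = center ? mean(col) : 0.0
        σ = scale  ? std(col)  : 1.0
        out[:, j] = (x[:, j] .- μ) ./ σ
    end
    return out
end

x = [4000 400; 1000 100; 2000 200; 3000 300]
z = zscore(x)                      # zero mean and unit standard deviation per column
c = zscore(x; scale=false)         # only centred
```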

Example:

julia> using BetaML, Statistics
 
 julia> x         = [[4000,1000,2000,3000] [400,100,200,300] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]
 4×4 Matrix{Float64}:
@@ -320,8 +320,8 @@
  4000.0  400.0  4.0  0.4
  1000.0  100.0  1.0  0.1
  2000.0  200.0  2.0  0.2
- 3000.0  300.0  3.0  0.3
source
BetaML.Utils.SuccessiveHalvingSearchType
mutable struct SuccessiveHalvingSearch <: AutoTuneMethod

Hyper-parameters validation of supervised models that searches the parameter space through successive halving.

All parameters are tested on a small sub-sample, then the "best" combinations are kept for a second round that uses more samples, and so on until only one hyperparameter combination is left.

Notes:

  • the default loss is suitable for 1-dimensional output supervised models, and itself applies cross-validation. Any function that accepts a model and some data and returns a scalar loss can be used
  • the rate at which the candidate combinations of hyperparameters shrink is controlled by the number of data shares defined in res_shares (i.e. the epochs): the more epochs are chosen, the lower the "shrink" coefficient

Parameters:

  • loss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, the data (a vector of arrays, even if we work only with X) and (using the rng keyword) an RNG, and returns a scalar loss.

  • res_shares::Vector{Float64}: Shares of the (data) resources to use for the autotuning in the successive iterations [def: [0.05, 0.2, 0.3]]. With a share of 1 the whole dataset is used for autotuning, which can be very time consuming! The number of models is reduced by the same share at each iteration so as to end up with a single model. Increase the number of res_shares to increase the number of models kept at each iteration.

  • hpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.

  • multithreads::Bool: Use multiple threads in the search for the best hyperparameters [def: false]

source
Base.errorMethod

error(y,ŷ;ignorelabels=false) - Categorical error (T vs T)

source
Base.errorMethod

error(y,ŷ) - Categorical error with probabilistic prediction of a single datapoint (Int vs PMF).

source
Base.errorMethod

error(y,ŷ) - Categorical error with probabilistic predictions of a dataset (Int vs PMF).

source
Base.errorMethod

error(y,ŷ) - Categorical error with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (T vs Dict{T,Float64}).

source
Base.reshapeMethod

reshape(myNumber, dims...) - Reshape a number as an n-dimensional Array

source
BetaML.Utils.accuracyMethod

accuracy(y,ŷ;tol,ignorelabels)

Categorical accuracy with probabilistic predictions of a dataset (PMF vs Int).

Parameters:

  • y: The N array with the correct category for each point $n$.
  • ŷ: An (N,K) matrix of probabilities, where row $\hat y_n$ (with $n \in 1,...,N$) gives the probability of record $n$ being of category $k$, with $k \in 1,...,K$.
  • tol: The tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol most probable values [def: 1].
  • ignorelabels: Whether to ignore the specific label order in y. Useful for unsupervised learning algorithms where the specific label order doesn't make sense [def: false]
source
BetaML.Utils.accuracyMethod

accuracy(y,ŷ;tol)

Categorical accuracy with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).

Parameters:

  • ŷ: An array where each item is the estimated probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)
  • y: The N array with the correct category for each point $n$.
  • tol: The tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol most probable values [def: 1].
source
BetaML.Utils.accuracyMethod
accuracy(y,ŷ;tol)

Categorical accuracy with probabilistic prediction of a single datapoint (PMF vs Int).

Use the parameter tol [def: 1] to set the tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol most probable values.

source
BetaML.Utils.accuracyMethod
accuracy(y,ŷ;tol)

Categorical accuracy with probabilistic prediction of a single datapoint given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).

Parameters:

  • ŷ: The returned probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)
  • tol: The tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol most probable values [def: 1].
source
BetaML.Utils.autojacobianMethod

autojacobian(f,x;nY)

Evaluate the Jacobian using AD in the form of a (nY,nX) matrix of first derivatives

Parameters:

  • f: The function whose Jacobian is computed
  • x: The input to the function where the jacobian has to be computed
  • nY: The number of outputs of the function f [def: length(f(x))]

Return values:

  • An Array{Float64,2} of the locally evaluated Jacobian

Notes:

  • The nY parameter is optional. If provided it avoids having to compute f(x)
source
BetaML.Utils.batchMethod

batch(n,bsize;sequential=false,rng)

Return a vector of vectors, each holding bsize indices from 1 to n. The indices are assigned randomly unless the optional parameter sequential is set to true.

Example:

julia> Utils.batch(6,2,sequential=true)
3-element Array{Array{Int64,1},1}:
 [1, 2]
 [3, 4]
 [5, 6]

source
BetaML.Utils.class_countsMethod

class_counts(x;classes=nothing)

Return an (unsorted) vector with the counts of each unique item (elements or rows) in a dataset.

If the order is important or not all classes are present in the data, a preset vector of classes can be given in the parameter classes

source
BetaML.Utils.consistent_shuffleMethod
consistent_shuffle(data;dims,rng)

Shuffle a vector of n-dimensional arrays across dimension dims keeping the same order between the arrays

Parameters

  • data: The vector of arrays to shuffle
  • dims: The dimension over which to apply the shuffle [def: 1]
  • rng: An AbstractRNG to apply for the shuffle

Notes

  • All the arrays must have the same size for the dimension to shuffle

Example

julia> a = [1 2 30; 10 20 30]; b = [100 200 300];

julia> (aShuffled, bShuffled) = consistent_shuffle([a,b],dims=2)
2-element Vector{Matrix{Int64}}:
 [1 30 2; 10 30 20]
 [100 300 200]
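
The key point of a consistent shuffle is that a single permutation is drawn and applied to every array. A plain-Julia sketch of that idea, under the assumption that this is all the function needs to do, could look like this (`shuffle_together` is a name invented for the example, not BetaML's code):

```julia
using Random

# Plain-Julia sketch (not BetaML's implementation): draw one permutation and
# apply it to every array along the same dimension.
function shuffle_together(data; dims=1, rng=Random.default_rng())
    n    = size(data[1], dims)       # all arrays must share this size
    perm = randperm(rng, n)
    return [collect(selectdim(a, dims, perm)) for a in data]
end

a = [1 2 3; 10 20 30]
b = [100 200 300]
(a_shuffled, b_shuffled) = shuffle_together([a, b]; dims=2)

# Columns stay aligned: the first row of `a` times 100 still matches `b`
@assert a_shuffled[1, :] .* 100 == b_shuffled[1, :]
```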

source
BetaML.Utils.SuccessiveHalvingSearchType
mutable struct SuccessiveHalvingSearch <: AutoTuneMethod

Hyper-parameters validation of supervised models that searches the parameter space through successive halving.

All parameters are tested on a small sub-sample, then the "best" combinations are kept for a second round that uses more samples, and so on until only one hyperparameter combination is left.

Notes:

  • the default loss is suitable for 1-dimensional output supervised models, and itself applies cross-validation. Any function that accepts a model and some data and returns a scalar loss can be used
  • the rate at which the candidate combinations of hyperparameters shrink is controlled by the number of data shares defined in res_shares (i.e. the epochs): the more epochs are chosen, the lower the "shrink" coefficient

Parameters:

  • loss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, the data (a vector of arrays, even if we work only with X) and (using the rng keyword) an RNG, and returns a scalar loss.

  • res_shares::Vector{Float64}: Shares of the (data) resources to use for the autotuning in the successive iterations [def: [0.05, 0.2, 0.3]]. With a share of 1 the whole dataset is used for autotuning, which can be very time consuming! The number of models is reduced by the same share at each iteration so as to end up with a single model. Increase the number of res_shares to increase the number of models kept at each iteration.

  • hpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.

  • multithreads::Bool: Use multiple threads in the search for the best hyperparameters [def: false]
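
The round-by-round pruning can be sketched in plain Julia as below. This is only an illustration of the general successive halving idea, not BetaML's implementation: the geometric shrink rule is an assumption, and `toy_loss` is a hypothetical stand-in for a loss evaluated on a share of the data.

```julia
# Plain-Julia sketch of successive halving (not BetaML's implementation).
# Assumption: the pool shrinks geometrically so that exactly one candidate
# survives the last round. `loss(candidate, share)` is a hypothetical stand-in
# for a loss evaluated on a `share` of the data.
function successive_halving(candidates, res_shares, loss)
    pool     = collect(candidates)
    n_start  = length(pool)
    n_rounds = length(res_shares)
    for (r, share) in enumerate(res_shares)
        losses = [loss(c, share) for c in pool]
        n_keep = max(1, round(Int, n_start^(1 - r / n_rounds)))
        n_keep = min(n_keep, length(pool))
        pool   = pool[sortperm(losses)[1:n_keep]]   # keep the best candidates only
    end
    return first(pool)
end

# Toy loss: the true optimum is 0.3, with a disturbance that fades as the share grows
toy_loss(c, share) = abs(c - 0.3) + 0.01 * (1 - share) * sin(100c)
best = successive_halving(0:0.05:1, [0.05, 0.2, 0.3], toy_loss)
```

With 21 candidates and three shares, this rule keeps 8, then 3, then 1 candidate; adding more shares makes each cut gentler, in line with the note above about the "shrink" coefficient.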

source
Base.errorMethod

error(y,ŷ;ignorelabels=false) - Categorical error (T vs T)

source
Base.errorMethod

error(y,ŷ) - Categorical error with probabilistic prediction of a single datapoint (Int vs PMF).

source
Base.errorMethod

error(y,ŷ) - Categorical error with probabilistic predictions of a dataset (Int vs PMF).

source
Base.errorMethod

error(y,ŷ) - Categorical error with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (T vs Dict{T,Float64}).

source
Base.reshapeMethod

reshape(myNumber, dims...) - Reshape a number as an n-dimensional Array

source
BetaML.Utils.accuracyMethod

accuracy(y,ŷ;tol,ignorelabels)

Categorical accuracy with probabilistic predictions of a dataset (PMF vs Int).

Parameters:

  • y: The N array with the correct category for each point $n$.
  • ŷ: An (N,K) matrix of probabilities, where row $\hat y_n$ (with $n \in 1,...,N$) gives the probability of record $n$ being of category $k$, with $k \in 1,...,K$.
  • tol: The tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol most probable values [def: 1].
  • ignorelabels: Whether to ignore the specific label order in y. Useful for unsupervised learning algorithms where the specific label order doesn't make sense [def: false]
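
The role of tol can be illustrated with a small plain-Julia sketch of "top-tol" accuracy over an (N,K) probability matrix. It is not BetaML's implementation: `topk_accuracy` is a name invented for this example, the data are made up, and the ignorelabels option is not covered.

```julia
# Plain-Julia sketch of top-`tol` accuracy (not BetaML's implementation).
# y holds integer class ids; ŷ is an (N,K) matrix of class probabilities.
function topk_accuracy(y, ŷ; tol=1)
    hits = 0
    for n in eachindex(y)
        # ids of the `tol` most likely classes for record n
        top = partialsortperm(ŷ[n, :], 1:tol, rev=true)
        hits += y[n] in top
    end
    return hits / length(y)
end

y = [1, 3, 2]
ŷ = [0.7 0.2 0.1;
     0.1 0.5 0.4;
     0.3 0.4 0.3]
strict  = topk_accuracy(y, ŷ)          # only the most likely class counts
relaxed = topk_accuracy(y, ŷ; tol=2)   # the two most likely classes count
```

In this made-up data the second record is a miss under tol=1 (class 2 is the most likely, the truth is 3) but a hit under tol=2.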
source
BetaML.Utils.accuracyMethod

accuracy(y,ŷ;tol)

Categorical accuracy with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).

Parameters:

  • ŷ: An array where each item is the estimated probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)
  • y: The N array with the correct category for each point $n$.
  • tol: The tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol most probable values [def: 1].
source
BetaML.Utils.accuracyMethod
accuracy(y,ŷ;tol)

Categorical accuracy with probabilistic prediction of a single datapoint (PMF vs Int).

Use the parameter tol [def: 1] to determine the tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol highest-probability values.

source
BetaML.Utils.accuracyMethod
accuracy(y,ŷ;tol)

Categorical accuracy with probabilistic prediction of a single datapoint given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).

Parameters:

  • ŷ: The returned probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)
  • tol: The tolerance of the prediction, i.e. whether to consider "correct" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol highest-probability values [def: 1].
source
BetaML.Utils.autojacobianMethod

autojacobian(f,x;nY)

Evaluate the Jacobian using AD in the form of a (nY,nX) matrix of first derivatives

Parameters:

  • f: The function whose Jacobian is computed
  • x: The input point at which the Jacobian is evaluated
  • nY: The number of outputs of the function f [def: length(f(x))]

Return values:

  • An Array{Float64,2} of the locally evaluated Jacobian

Notes:

  • The nY parameter is optional. If provided it avoids having to compute f(x)
source
BetaML.Utils.batchMethod

batch(n,bsize;sequential=false,rng)

Return a vector of vectors of indices from 1 to n, each of length bsize. The indices are assigned randomly unless the optional parameter sequential is set to true.

Example:

julia> Utils.batch(6,2,sequential=true)
3-element Array{Array{Int64,1},1}:
 [1, 2]
 [3, 4]
 [5, 6]
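
How such batches can be built is sketched below in plain Julia. The function batch_sketch is a hypothetical stand-in, not BetaML's code; in particular it assumes that indices left over when n is not a multiple of bsize are simply dropped.

```julia
using Random

# Hypothetical sketch of index batching (not BetaML's implementation).
function batch_sketch(n::Integer, bsize::Integer;
                      sequential::Bool=false,
                      rng::AbstractRNG=Random.default_rng())
    idx = sequential ? collect(1:n) : shuffle(rng, 1:n)   # ordered or shuffled indices
    nbatches = n ÷ bsize                                  # assumption: leftovers are dropped
    return [idx[(i-1)*bsize+1:i*bsize] for i in 1:nbatches]
end

seq_batches  = batch_sketch(6, 2, sequential=true)
rand_batches = batch_sketch(6, 2, rng=MersenneTwister(1))
```

In the sequential case the sketch reproduces the three batches of the example above; in the random case every index from 1 to 6 still appears exactly once across the batches.
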

source
BetaML.Utils.class_countsMethod

class_counts(x;classes=nothing)

Return an (unsorted) vector with the counts of each unique item (element or rows) in a dataset.

If order is important or not all classes are present in the data, a preset vector of classes can be given in the parameter classes

source
BetaML.Utils.consistent_shuffleMethod
consistent_shuffle(data;dims,rng)

Shuffle a vector of n-dimensional arrays across dimension dims keeping the same order between the arrays

Parameters

  • data: The vector of arrays to shuffle
  • dims: The dimension over which to apply the shuffle [def: 1]
  • rng: An AbstractRNG to apply for the shuffle

Notes

  • All the arrays must have the same size for the dimension to shuffle

Example

julia> a = [1 2 30; 10 20 30]; b = [100 200 300];
julia> (aShuffled, bShuffled) = consistent_shuffle([a,b],dims=2)
2-element Vector{Matrix{Int64}}:
 [1 30 2; 10 30 20]
 [100 300 200]
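
The idea of a "consistent" shuffle is that a single permutation is drawn once and applied to every array. A minimal sketch in plain Julia follows; consistent_shuffle_sketch is a hypothetical name and the code is not BetaML's implementation.

```julia
using Random

# Hypothetical sketch: one permutation, applied to all arrays along `dims`.
function consistent_shuffle_sketch(data::AbstractVector; dims::Integer=1,
                                   rng::AbstractRNG=Random.default_rng())
    n    = size(data[1], dims)      # all arrays must share this size along `dims`
    perm = shuffle(rng, 1:n)        # the single shared permutation
    return [collect(selectdim(a, dims, perm)) for a in data]
end

a = [1 2 3; 10 20 30]
b = [100 200 300]
(aShuffled, bShuffled) = consistent_shuffle_sketch([a, b], dims=2, rng=MersenneTwister(1))
```

Whatever permutation is drawn, column j of aShuffled and column j of bShuffled come from the same original column, so the correspondence between the arrays is preserved.
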

source
BetaML.Utils.cross_validationFunction
cross_validation(
     f,
     data
 ) -> Union{Tuple{Any, Any}, Vector{Any}}
@@ -337,14 +337,15 @@
 julia> Y = [1:9;];
 julia> sampler = KFold(nsplits=3);
 julia> (μ,σ) = cross_validation([X,Y],sampler) do trainData,valData,rng
-                 (xtrain,ytrain) = trainData; (xval,yval) = valData
-                 trainedModel    = buildForest(xtrain,ytrain,30)
-                 ŷval            = predict(trainedModel,xval)
-                 ϵ               = relative_mean_error(yval,ŷval,normrec=false)
-                 return ϵ
-               end
-(0.3202242202242202, 0.04307662219315022)
source
BetaML.Utils.crossentropyMethod

crossentropy(y,ŷ; weight)

Compute the (weighted) cross-entropy between the predicted and the sampled probability distributions.

To be used in classification problems.

source
BetaML.Utils.dpluMethod

dplu(x;α=0.1,c=1)

Piecewise Linear Unit derivative

https://arxiv.org/pdf/1809.09534.pdf

source
BetaML.Utils.dsoftmaxMethod

dsoftmax(x; β=1)

Derivative of the softmax function

https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/

source
BetaML.Utils.entropyMethod

entropy(x)

Calculate the entropy for a list of items (or rows).

See: https://en.wikipedia.org/wiki/Decision_tree_learning#Information_gain

source
BetaML.Utils.generate_parallel_rngsMethod
generate_parallel_rngs(rng::AbstractRNG, n::Integer;reSeed=false)

For multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.

Note that each RNG is a copy of the original RNG. This means that code that uses these RNGs will not change the original RNG state.

Use it with rngs = generate_parallel_rngs(rng,Threads.nthreads()) to have a separate rng per thread. By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads). If you prefer, you can instead re-seed the RNG here (using the parameter reSeed=true), such that each thread has a different seed. Be aware however that the stream of numbers generated will then depend on the number of threads at run time.

source
BetaML.Utils.getpermutationsMethod
getpermutations(v::AbstractArray{T,1};keepStructure=false)

Return a vector of either (a) all possible permutations (uncollected) or (b) just those based on the unique values of the vector

Useful to measure accuracy where you don't care about the actual name of the labels, like in unsupervised classifications (e.g. clustering)

source
BetaML.Utils.giniMethod

gini(x)

Calculate the Gini Impurity for a list of items (or rows).

See: https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity

source
BetaML.Utils.mean_dictsMethod

mean_dicts(dicts)

Compute the mean of the values of an array of dictionaries.

Given dicts, an array of dictionaries, mean_dicts first computes the union of the keys and then averages the values. If the original values are probabilities (non-negative items summing to 1), the result is also a probability distribution.

source
BetaML.Utils.modeMethod

mode(elements,rng)

Given a vector of dictionaries whose values are numerical (e.g. probabilities), a vector of vectors or a matrix, it returns the mode of each element (dictionary, vector or row) in terms of the key or the position.

Use it to return a unique value from a multiclass classifier returning probabilities.

Note:

  • If multiple classes have the highest mode, one is returned at random (use the parameter rng to fix the stochasticity)
source
BetaML.Utils.modeMethod

mode(v::AbstractVector{T};rng)

Return the position with the highest value in an array, interpreted as mode (using rand in case of multimodal values)

source
BetaML.Utils.modeMethod

mode(dict::Dict{T,Float64};rng)

Return the key with highest mode (using rand in case of multimodal values)

source
BetaML.Utils.mseMethod
mse(y,ŷ)

Compute the mean squared error (MSE) (aka mean squared deviation - MSD) between two vectors y and ŷ. Note that while the deviation is averaged by the length of y, it is not scaled to give it a relative meaning.

source
BetaML.Utils.pairwiseMethod
pairwise(x::AbstractArray; distance, dims) -> Any
-

Compute pairwise distance matrix between elements of an array identified across dimension dims.

Parameters:

  • x: the data array
  • distance: a distance measure [def: l2_distance]
  • dims: the dimension of the observations [def: 1, i.e. records on rows]

Returns:

  • an nrecords by nrecords symmetric matrix of the pairwise distances

Notes:

  • if performance matters, you can use something like Distances.pairwise(Distances.euclidean,x,dims=1) from the Distances package.
source
BetaML.Utils.partitionMethod
partition(data,parts;shuffle,dims,rng)

Partition (by rows) one or more matrices according to the shares in parts.

Parameters

  • data: A matrix/vector or a vector of matrices/vectors
  • parts: A vector of the required shares (must sum to 1)
  • shuffle: Whether to randomly shuffle the matrices (preserving the relative order between matrices)
  • dims: The dimension for which to partition [def: 1]
  • copy: Whether to copy the actual data or only create a reference [def: true]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • The sum of parts must be equal to 1
  • The number of elements in the specified dimension must be the same for all the arrays in data

Example:

julia> x = [1:10 11:20]
julia> y = collect(31:40)
julia> ((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.7,0.3])

source
BetaML.Utils.polynomial_kernelMethod

Polynomial kernel parametrised with constant=0 and degree=2 (i.e. a quadratic kernel). For other cᵢ and dᵢ use K = (x,y) -> polynomial_kernel(x,y,c=cᵢ,d=dᵢ) as kernel function in the supporting algorithms

source
BetaML.Utils.pool1dFunction
pool1d(x,poolsize=2;f=mean)

Apply function f to a rolling window of poolsize contiguous (in 1d) neurons.

Applicable to VectorFunctionLayer, e.g. layer2 = VectorFunctionLayer(nₗ,f=(x->pool1d(x,4,f=mean))). Attention: to apply this function as activation function in a neural network you will need Julia version >= 1.6, otherwise you may experience a segmentation fault (see this bug report)

source
BetaML.Utils.radial_kernelMethod

Radial Kernel (aka RBF kernel) parametrised with γ=1/2. For other gammas γᵢ use K = (x,y) -> radial_kernel(x,y,γ=γᵢ) as kernel function in the supporting algorithms

source
BetaML.Utils.relative_mean_errorMethod

relative_mean_error(y, ŷ;normdim=false,normrec=false,p=1)

Compute the relative mean error (l-1 based by default) between y and ŷ.

There are many ways to compute a relative mean error. In particular, if normrec (normdim) is set to true, the records (dimensions) are normalised, in the sense that it doesn't matter if a record (dimension) is bigger or smaller than the others, the relative error is first computed for each record (dimension) and then it is averaged. With both normdim and normrec set to false (default) the function returns the relative mean error; with both set to true it returns the mean relative error (i.e. with p=1 the "mean absolute percentage error (MAPE)"). The parameter p [def: 1] controls the p-norm used to define the error.

The mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more.

For example, given y = [1,44,3] and ŷ = [2,45,2], the mean relative error relative_mean_error(y,ŷ,normrec=true) is 0.452, while the relative mean error relative_mean_error(y,ŷ, normrec=false) is "only" 0.0625.

source
BetaML.Utils.reluMethod

relu(x)

Rectified Linear Unit

https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf

source
BetaML.Utils.silhouetteMethod
silhouette(distances, classes) -> Any
+    (xtrain,ytrain) = trainData; (xval,yval) = valData
+    model           = RandomForestEstimator(n_trees=30,rng=rng)            
+    fit!(model,xtrain,ytrain)
+    ŷval            = predict(model,xval)
+    ϵ               = relative_mean_error(yval,ŷval)
+    return ϵ
+  end
+(0.3202242202242202, 0.04307662219315022)
source
BetaML.Utils.crossentropyMethod

crossentropy(y,ŷ; weight)

Compute the (weighted) cross-entropy between the predicted and the sampled probability distributions.

To be used in classification problems.

source
BetaML.Utils.dpluMethod

dplu(x;α=0.1,c=1)

Piecewise Linear Unit derivative

https://arxiv.org/pdf/1809.09534.pdf

source
BetaML.Utils.dsoftmaxMethod

dsoftmax(x; β=1)

Derivative of the softmax function

https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
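
For reference, the derivative can be written out explicitly. The block below assumes the usual softmax definition with an inverse-temperature parameter β, which is an assumption about how dsoftmax is parametrised rather than something stated in the docstring:

```latex
s_i(x) = \frac{e^{\beta x_i}}{\sum_{k} e^{\beta x_k}},
\qquad
\frac{\partial s_i}{\partial x_j} = \beta \, s_i \left( \delta_{ij} - s_j \right)
```

Here $\delta_{ij}$ is 1 when $i = j$ and 0 otherwise, so the result is a square matrix with $\beta s_i (1 - s_i)$ on the diagonal and $-\beta s_i s_j$ elsewhere.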

source
BetaML.Utils.entropyMethod

entropy(x)

Calculate the entropy for a list of items (or rows).

See: https://en.wikipedia.org/wiki/Decision_tree_learning#Information_gain

source
BetaML.Utils.generate_parallel_rngsMethod
generate_parallel_rngs(rng::AbstractRNG, n::Integer;reSeed=false)

For multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.

Note that each RNG is a copy of the original RNG. This means that code that uses these RNGs will not change the original RNG state.

Use it with rngs = generate_parallel_rngs(rng,Threads.nthreads()) to have a separate rng per thread. By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads). If you prefer, you can instead re-seed the RNG here (using the parameter reSeed=true), such that each thread has a different seed. Be aware however that the stream of numbers generated will then depend on the number of threads at run time.
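
The two points made above (copies leave the original RNG untouched, and re-seeding by loop index gives thread-count-independent results) can be illustrated with the Random standard library alone. The function parallel_rngs_sketch and the seed offset 1000 are hypothetical; this is a sketch of the idea, not BetaML's code.

```julia
using Random

# Hypothetical sketch: n independent copies of `rng`, all starting from the same state.
parallel_rngs_sketch(rng::AbstractRNG, n::Integer) = [deepcopy(rng) for _ in 1:n]

rng  = MersenneTwister(123)
rngs = parallel_rngs_sketch(rng, 2)

a = rand(rngs[1])   # advances only the first copy
b = rand(rngs[2])   # same value as `a`: both copies started from the same state
c = rand(rng)       # the original RNG was not advanced by the two draws above

# Loop-index-based seeding: each iteration derives its seed from `i`,
# so the results do not depend on how iterations are spread over threads.
results = zeros(4)
for i in 1:4                                 # could equally be a Threads.@threads loop
    local_rng  = MersenneTwister(1000 + i)   # arbitrary, illustrative seed offset
    results[i] = rand(local_rng)
end
```
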

source
BetaML.Utils.getpermutationsMethod
getpermutations(v::AbstractArray{T,1};keepStructure=false)

Return a vector of either (a) all possible permutations (uncollected) or (b) just those based on the unique values of the vector

Useful to measure accuracy where you don't care about the actual name of the labels, like in unsupervised classifications (e.g. clustering)

source
BetaML.Utils.giniMethod

gini(x)

Calculate the Gini Impurity for a list of items (or rows).

See: https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
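
Both impurity measures (gini here and entropy above) depend only on the share of each class in the list. The sketch below uses the textbook definitions in plain Julia; the names class_shares, gini_sketch and entropy_sketch are hypothetical, and the base-2 logarithm is an assumption, not something the docstrings state.

```julia
# Hypothetical helpers using the textbook definitions (not BetaML's code).

# Share of each distinct item in `x`.
function class_shares(x)
    counts = Dict{eltype(x),Int}()
    for item in x
        counts[item] = get(counts, item, 0) + 1
    end
    return [c / length(x) for c in values(counts)]
end

gini_sketch(x)    = 1 - sum(p^2 for p in class_shares(x))          # 1 - Σ pₖ²
entropy_sketch(x) = -sum(p * log2(p) for p in class_shares(x))     # -Σ pₖ log₂(pₖ), base 2 assumed

mixed = ["a", "a", "b", "b"]
pure  = ["a", "a", "a"]

g_mixed, e_mixed = gini_sketch(mixed), entropy_sketch(mixed)   # maximally mixed, two classes
g_pure,  e_pure  = gini_sketch(pure),  entropy_sketch(pure)    # single class: no impurity
```

A list with a single class has zero impurity under both measures, while an even split between two classes gives a Gini impurity of 0.5 and an entropy of 1 bit.
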

source
BetaML.Utils.mean_dictsMethod

mean_dicts(dicts)

Compute the mean of the values of an array of dictionaries.

Given dicts, an array of dictionaries, mean_dicts first computes the union of the keys and then averages the values. If the original values are probabilities (non-negative items summing to 1), the result is also a probability distribution.
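
A minimal sketch of this averaging in plain Julia is given below. The name mean_dicts_sketch is hypothetical, and the code assumes that a key missing from a dictionary contributes 0 to the average, which is what is needed for probabilities to keep summing to 1.

```julia
# Hypothetical sketch (not BetaML's implementation).
function mean_dicts_sketch(dicts)
    allkeys = Set(k for d in dicts for k in keys(d))     # union of the keys
    n = length(dicts)
    # Assumption: a key absent from a dictionary counts as 0.0
    return Dict(k => sum(get(d, k, 0.0) for d in dicts) / n for k in allkeys)
end

d1 = Dict("a" => 0.2, "b" => 0.8)
d2 = Dict("a" => 0.4, "c" => 0.6)
avg = mean_dicts_sketch([d1, d2])
```

Here "a" appears in both dictionaries and averages to 0.3, while "b" and "c" each appear once and are halved; the averaged values still sum to 1.
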

source
BetaML.Utils.modeMethod

mode(elements,rng)

Given a vector of dictionaries whose values are numerical (e.g. probabilities), a vector of vectors or a matrix, it returns the mode of each element (dictionary, vector or row) in terms of the key or the position.

Use it to return a unique value from a multiclass classifier returning probabilities.

Note:

  • If multiple classes have the highest mode, one is returned at random (use the parameter rng to fix the stochasticity)
source
BetaML.Utils.modeMethod

mode(v::AbstractVector{T};rng)

Return the position with the highest value in an array, interpreted as mode (using rand in case of multimodal values)

source
BetaML.Utils.modeMethod

mode(dict::Dict{T,Float64};rng)

Return the key with highest mode (using rand in case of multimodal values)

source
BetaML.Utils.mseMethod
mse(y,ŷ)

Compute the mean squared error (MSE) (aka mean squared deviation - MSD) between two vectors y and ŷ. Note that while the deviation is averaged by the length of y, it is not scaled to give it a relative meaning.

source
BetaML.Utils.pairwiseMethod
pairwise(x::AbstractArray; distance, dims) -> Any
+

Compute pairwise distance matrix between elements of an array identified across dimension dims.

Parameters:

  • x: the data array
  • distance: a distance measure [def: l2_distance]
  • dims: the dimension of the observations [def: 1, i.e. records on rows]

Returns:

  • an nrecords by nrecords symmetric matrix of the pairwise distances

Notes:

  • if performance matters, you can use something like Distances.pairwise(Distances.euclidean,x,dims=1) from the Distances package.
source
BetaML.Utils.partitionMethod
partition(data,parts;shuffle,dims,rng)

Partition (by rows) one or more matrices according to the shares in parts.

Parameters

  • data: A matrix/vector or a vector of matrices/vectors
  • parts: A vector of the required shares (must sum to 1)
  • shuffle: Whether to randomly shuffle the matrices (preserving the relative order between matrices)
  • dims: The dimension for which to partition [def: 1]
  • copy: Whether to copy the actual data or only create a reference [def: true]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • The sum of parts must be equal to 1
  • The number of elements in the specified dimension must be the same for all the arrays in data

Example:

julia> x = [1:10 11:20]
julia> y = collect(31:40)
julia> ((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.7,0.3])

source
BetaML.Utils.polynomial_kernelMethod

Polynomial kernel parametrised with constant=0 and degree=2 (i.e. a quadratic kernel). For other cᵢ and dᵢ use K = (x,y) -> polynomial_kernel(x,y,c=cᵢ,d=dᵢ) as kernel function in the supporting algorithms

source
BetaML.Utils.pool1dFunction
pool1d(x,poolsize=2;f=mean)

Apply function f to a rolling window of poolsize contiguous (in 1d) neurons.

Applicable to VectorFunctionLayer, e.g. layer2 = VectorFunctionLayer(nₗ,f=(x->pool1d(x,4,f=mean))). Attention: to apply this function as activation function in a neural network you will need Julia version >= 1.6, otherwise you may experience a segmentation fault (see this bug report)

source
BetaML.Utils.radial_kernelMethod

Radial Kernel (aka RBF kernel) parametrised with γ=1/2. For other gammas γᵢ use K = (x,y) -> radial_kernel(x,y,γ=γᵢ) as kernel function in the supporting algorithms
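
For orientation, the two kernels described here and above can be sketched with their standard textbook definitions. The names poly_sketch and rbf_sketch and their keyword names are hypothetical; they are not BetaML's functions and are only assumed to match the defaults quoted in the docstrings (constant 0, degree 2, γ = 1/2).

```julia
using LinearAlgebra

# Hypothetical textbook definitions (not BetaML's code).
poly_sketch(x, y; c=0, d=2) = (dot(x, y) + c)^d               # (x⋅y + c)^d
rbf_sketch(x, y; γ=1/2)     = exp(-γ * sum(abs2, x .- y))     # exp(-γ‖x - y‖²)

x = [1.0, 2.0]
y = [3.0, 4.0]

k_poly = poly_sketch(x, y)                      # (1*3 + 2*4 + 0)^2
k_self = rbf_sketch(x, x)                       # identical points: maximum similarity
k_rbf  = rbf_sketch([0.0, 0.0], [1.0, 1.0])     # squared distance 2, γ = 1/2

# Closure with custom parameters, in the style suggested by the docstrings
K = (a, b) -> poly_sketch(a, b, c=1, d=3)
k_custom = K(x, y)                              # (11 + 1)^3
```
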

source
BetaML.Utils.relative_mean_errorMethod

relative_mean_error(y, ŷ;normdim=false,normrec=false,p=1)

Compute the relative mean error (l-1 based by default) between y and ŷ.

There are many ways to compute a relative mean error. In particular, if normrec (normdim) is set to true, the records (dimensions) are normalised, in the sense that it doesn't matter if a record (dimension) is bigger or smaller than the others, the relative error is first computed for each record (dimension) and then it is averaged. With both normdim and normrec set to false (default) the function returns the relative mean error; with both set to true it returns the mean relative error (i.e. with p=1 the "mean absolute percentage error (MAPE)"). The parameter p [def: 1] controls the p-norm used to define the error.

The mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more.

For example, given y = [1,44,3] and ŷ = [2,45,2], the mean relative error relative_mean_error(y,ŷ,normrec=true) is 0.452, while the relative mean error relative_mean_error(y,ŷ, normrec=false) is "only" 0.0625.
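
The two figures in this example can be reproduced by hand with plain Julia arithmetic (p = 1, a single dimension). The variable names are illustrative only and no BetaML function is called.

```julia
y = [1, 44, 3]
ŷ = [2, 45, 2]

abs_err = abs.(y .- ŷ)                      # every record is off by exactly 1

# Mean relative error: relative error of each record first, then the average
mean_relative = sum(abs_err ./ abs.(y)) / length(y)    # (1/1 + 1/44 + 1/3) / 3

# Relative mean error: total absolute error over total magnitude
relative_mean = sum(abs_err) / sum(abs.(y))            # 3 / 48
```

The first record, with a 100% relative error, dominates the mean relative error, whereas in the relative mean error it is diluted by the much larger second observation.
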

source
BetaML.Utils.reluMethod

relu(x)

Rectified Linear Unit

https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf

source
BetaML.Utils.silhouetteMethod
silhouette(distances, classes) -> Any
 

Provide Silhouette scoring for cluster outputs

Parameters:

  • distances: the nrecords by nrecords pairwise distance matrix
  • classes: the vector of assigned classes to each record

Notes:

  • the matrix of pairwise distances can be obtained with the function pairwise
  • this function doesn't sample; if needed, sample the records beforehand
  • to get the score for the cluster simply compute the mean
  • see also the Wikipedia article
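
The standard silhouette definition can be sketched in plain Julia as follows. The functions pairwise_sketch and silhouette_sketch are hypothetical stand-ins, not BetaML's implementation; they assume Euclidean distances between rows, a score of 0 for records that are alone in their cluster, and at least two distinct classes.

```julia
# Hypothetical sketch of the standard silhouette score (not BetaML's code).

l2(a, b) = sqrt(sum(abs2, a .- b))

# nrecords by nrecords matrix of Euclidean distances between the rows of `x`
pairwise_sketch(x) = [l2(x[i, :], x[j, :]) for i in axes(x, 1), j in axes(x, 1)]

function silhouette_sketch(distances, classes)
    n = length(classes)
    scores = zeros(n)
    for i in 1:n
        own = [j for j in 1:n if classes[j] == classes[i] && j != i]
        isempty(own) && continue                       # singleton cluster: score stays 0
        a = sum(distances[i, own]) / length(own)       # mean distance within own cluster
        b = minimum(                                   # mean distance to the nearest other cluster
            sum(distances[i, classes .== c]) / count(==(c), classes)
            for c in unique(classes) if c != classes[i]
        )
        scores[i] = (b - a) / max(a, b)
    end
    return scores
end

x = [1 2 3 3; 1.2 3 3.1 3.2; 2 4 6 6.2; 2.1 3.5 5.9 6.3]
s_scores   = silhouette_sketch(pairwise_sketch(x), [1, 2, 2, 2])
mean_score = sum(s_scores) / length(s_scores)          # overall score for the clustering
```

On the data of the example in this docstring, this definition gives a score of 0 for the first record (alone in its cluster) and a negative score for the second, which lies closer to the first record than to the other members of its own cluster.
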

Example:

julia> x  = [1 2 3 3; 1.2 3 3.1 3.2; 2 4 6 6.2; 2.1 3.5 5.9 6.3];
 
 julia> s_scores = silhouette(pairwise(x),[1,2,2,2])
@@ -352,7 +353,7 @@
   0.0
  -0.7590778795827623
   0.5030093571833065
-  0.4936350560759424
source
BetaML.Utils.squared_costMethod

squared_cost(y,ŷ)

Compute the squared costs between a vector of observations and one of predictions as (1/2)*norm(y - ŷ)^2.

Aside from the 1/2 term, it corresponds to the squared l-2 norm distance and, when it is averaged over multiple datapoints, to the Mean Squared Error (MSE). It is mostly used for regression problems.

source
BetaML.Utils.squared_costMethod

squared_cost(y,ŷ)

Compute the squared costs between a vector of observations and one of predictions as (1/2)*norm(y - ŷ)^2.

Aside from the 1/2 term, it corresponds to the squared l-2 norm distance and, when it is averaged over multiple datapoints, to the Mean Squared Error (MSE). It is mostly used for regression problems.
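
The relation between the squared cost and the MSE can be checked numerically with the LinearAlgebra standard library. The names squared_cost_sketch and mse_sketch are hypothetical re-implementations of the formulas quoted in the docstrings, written only for illustration.

```julia
using LinearAlgebra

# Hypothetical re-implementations of the documented formulas.
squared_cost_sketch(y, ŷ) = 0.5 * norm(y - ŷ)^2           # (1/2)‖y - ŷ‖²
mse_sketch(y, ŷ)          = sum(abs2, y .- ŷ) / length(y)  # mean of the squared deviations

y = [1.0, 2.0, 3.0]
ŷ = [1.5, 2.0, 2.0]

sc = squared_cost_sketch(y, ŷ)   # 0.5 * (0.25 + 0 + 1)
m  = mse_sketch(y, ŷ)            # (0.25 + 0 + 1) / 3
```

Treating each element as a datapoint, the MSE equals twice the squared cost divided by the number of datapoints.
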

source
BetaML.Utils.xavier_initFunction
xavier_init(previous_npar, this_npar) -> Matrix{Float64}
 xavier_init(
     previous_npar,
     this_npar,
@@ -360,11 +361,11 @@
     rng,
     eltype
 ) -> Any
-

Perform a Xavier initialisation of the weights

Parameters:

  • previous_npar: number of parameters of the previous layer
  • this_npar: number of parameters of this layer
  • outsize: tuple with the size of the weights [def: (this_npar,previous_npar)]
  • rng : random number generator [def: Random.GLOBAL_RNG]
  • eltype: eltype of the weight array [def: Float64]
source
BetaML.Utils.@threadsifMacro

Conditionally apply multi-threading to for loops. This is a variation on Base.Threads.@threads that adds a run-time boolean flag to enable or disable threading.

Example:

function optimize(objectives; use_threads=true)
+

Perform a Xavier initialisation of the weights

Parameters:

  • previous_npar: number of parameters of the previous layer
  • this_npar: number of parameters of this layer
  • outsize: tuple with the size of the weights [def: (this_npar,previous_npar)]
  • rng : random number generator [def: Random.GLOBAL_RNG]
  • eltype: eltype of the weight array [def: Float64]
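
As an illustration of what a Xavier (Glorot) initialisation does, the sketch below draws weights uniformly in ±sqrt(6/(previous_npar + this_npar)). This is the common uniform variant and is assumed here; the docstring does not state which variant xavier_init uses. The name xavier_sketch and the keyword T (standing in for eltype) are hypothetical.

```julia
using Random

# Hypothetical sketch of a uniform Xavier/Glorot initialisation (assumed variant).
function xavier_sketch(previous_npar::Integer, this_npar::Integer;
                       outsize=(this_npar, previous_npar),
                       rng::AbstractRNG=Random.default_rng(),
                       T::Type{<:AbstractFloat}=Float64)
    limit = T(sqrt(6 / (previous_npar + this_npar)))   # bound of the uniform distribution
    return (rand(rng, T, outsize) .* 2 .- 1) .* limit  # uniform in [-limit, limit)
end

W = xavier_sketch(2, 3; rng=MersenneTwister(1))        # weights for a 2 -> 3 dense layer
```

The bound shrinks as the layers get wider, which keeps the variance of the activations roughly constant from one layer to the next.
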
source
BetaML.Utils.@threadsifMacro

Conditionally apply multi-threading to for loops. This is a variation on Base.Threads.@threads that adds a run-time boolean flag to enable or disable threading.

Example:

function optimize(objectives; use_threads=true)
     @threadsif use_threads for k = 1:length(objectives)
     # ...
     end
 end
 
 # Notes:
-- Borrowed from https://github.com/JuliaQuantumControl/QuantumControlBase.jl/blob/master/src/conditionalthreads.jl
source
+- Borrowed from https://github.com/JuliaQuantumControl/QuantumControlBase.jl/blob/master/src/conditionalthreads.jl
source
diff --git a/dev/index.html b/dev/index.html index 8326fc8..19fca6b 100644 --- a/dev/index.html +++ b/dev/index.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

BLogos BetaML.jl Documentation

Welcome to the documentation of the Beta Machine Learning toolkit.

About

The BetaML toolkit provides machine learning algorithms written in the Julia programming language.

Aside from the algorithms themselves, BetaML provides many "utility" functions. Because algorithms are all self-contained in the library itself (you are invited to explore their source code by typing @edit functionOfInterest(par1,par2,...)), the utility functions have APIs that are coordinated with the algorithms, facilitating the "preparation" of the data for the analysis, the choice of the hyper-parameters or the evaluation of the models. Most models have an interface for the MLJ framework.

Aside from Julia, BetaML can be accessed in R or Python using respectively JuliaCall and PyJulia. See the tutorial for details.

Warning: Version 0.11 brings homogenization in the models' names and puts some order in other areas, but at the cost of severe breaking changes. Follow the updated documentation.

Installation

The BetaML package is included in the standard Julia registry; install it with:

  • ] add BetaML

Available modules

While BetaML is split in several (sub)modules, all of them are re-exported at the root module level. This means that you can access their functionality by simply typing using BetaML:

using BetaML
+

BLogos BetaML.jl Documentation

Welcome to the documentation of the Beta Machine Learning toolkit.

About

The BetaML toolkit provides machine learning algorithms written in the Julia programming language.

Aside from the algorithms themselves, BetaML provides many "utility" functions. Because algorithms are all self-contained in the library itself (you are invited to explore their source code by typing @edit functionOfInterest(par1,par2,...)), the utility functions have APIs that are coordinated with the algorithms, facilitating the "preparation" of the data for the analysis, the choice of the hyper-parameters or the evaluation of the models. Most models have an interface for the MLJ framework.

Aside from Julia, BetaML can be accessed in R or Python using respectively JuliaCall and PyJulia. See the tutorial for details.

Warning: Version 0.11 brings homogenization in the models' names and puts some order in other areas, but at the cost of severe breaking changes. Follow the updated documentation.

Installation

The BetaML package is included in the standard Julia registry; install it with:

  • ] add BetaML

Available modules

While BetaML is split in several (sub)modules, all of them are re-exported at the root module level. This means that you can access their functionality by simply typing using BetaML:

using BetaML
 myLayer = DenseLayer(2,3) # DenseLayer is defined in the Nn submodule
 res     = KernelPerceptronClassifier() # KernelPerceptronClassifier is defined in the Perceptron module
 @edit DenseLayer(2,3)     # Open a text editor at the relevant source code

Each module is documented on the links below (you can also use the inline Julia help system: just press the question mark ? and then, on the special help prompt help?>, type the function name):

  • BetaML.Perceptron: The Perceptron, Kernel Perceptron and Pegasos classification algorithms;
  • BetaML.Trees: The Decision Trees and Random Forests algorithms for classification or regression (with missing values supported);
  • BetaML.Nn: Implementation of Artificial Neural Networks;
  • BetaML.Clustering: (hard) Clustering algorithms (K-Means, K-Medoids)
  • BetaML.GMM: Various algorithms (Clustering, regressor, missing imputation / collaborative filtering / recommendation systems) that use a Generative (Gaussian) mixture models (probabilistic) fitter, fitted using an EM algorithm;
  • BetaML.Imputation: Imputation algorithms;
  • BetaML.Utils: Various utility functions (scale, one-hot, distances, kernels, pca, accuracy/error measures..).

Available models

Currently BetaML provides the following models:

| BetaML name | MLJ Interface | Category* |
| --- | --- | --- |
| PerceptronClassifier | PerceptronClassifier | Supervised classifier |
| KernelPerceptronClassifier | KernelPerceptronClassifier | Supervised classifier |
| PegasosClassifier | PegasosClassifier | Supervised classifier |
| DecisionTreeEstimator | DecisionTreeClassifier, DecisionTreeRegressor | Supervised regressor and classifier |
| RandomForestEstimator | RandomForestClassifier, RandomForestRegressor | Supervised regressor and classifier |
| NeuralNetworkEstimator | NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier | Supervised regressor and classifier |
| GaussianMixtureRegressor | GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor | Supervised regressor |
| GaussianMixtureRegressor2 | | Supervised regressor |
| KMeansClusterer | KMeansClusterer | Unsupervised hard clusterer |
| KMedoidsClusterer | KMedoidsClusterer | Unsupervised hard clusterer |
| GaussianMixtureClusterer | GaussianMixtureClusterer | Unsupervised soft clusterer |
| SimpleImputer | SimpleImputer | Unsupervised missing data imputer |
| GaussianMixtureImputer | GaussianMixtureImputer | Unsupervised missing data imputer |
| RandomForestImputer | RandomForestImputer | Unsupervised missing data imputer |
| GeneralImputer | GeneralImputer | Unsupervised missing data imputer |
| MinMaxScaler | | Data transformer |
| StandardScaler | | Data transformer |
| Scaler | | Data transformer |
| PCAEncoder | | Unsupervised dimensionality reduction |
| AutoEncoder | AutoEncoder | Unsupervised non-linear dimensionality reduction |
| OneHotEncoder | | Data transformer |
| OrdinalEncoder | | Data transformer |
| ConfusionMatrix | | Predictions assessment |

* There is no formal distinction in BetaML between a transformer, or a model to assess predictions, and an unsupervised model. They are all treated as unsupervised models that, given some data, learn how to return some useful information, whether a class grouping, a specific transformation or a quality evaluation.

Usage

New to BetaML or even to Julia / Machine Learning altogether? Start from the tutorial!

All models support the (a) model construction (where hyperparameters and options are chosen), (b) fitting and (c) prediction paradigm. A few models support inverse_transform, for example to go back from the one-hot encoded columns to the original categorical variable (factor).

This paradigm is described in detail in the API V2 page.

Quick examples

(see the tutorial for a more step-by-step guide to the examples below and to other examples)

  • Using an Artificial Neural Network for multinomial categorisation

In this example we see how to train a neural network model to predict the species name (5th column) given the floral sepal and petal measurements (first 4 columns) in the famous iris flower dataset.

# Load Modules
@@ -103,4 +103,4 @@
 # Prediction assessment
 relative_mean_error_train = relative_mean_error(ytrain,ŷtrain) # 0.039
 relative_mean_error_test  = relative_mean_error(ytest,ŷtest)   # 0.076
-scatter(ytest,ŷtest,xlabel="Actual",ylabel="Estimated",label=nothing,title="Est vs. obs MPG (test set)")

results

  • Further examples

Finally, you may want to take a look at the "test" folder. While the primary objective of the scripts under the "test" folder is to provide automatic testing of the BetaML toolkit, they can also be used to see how functions should be called, as virtually all functions provided by BetaML are tested there.

Acknowledgements

The development of this package at the Bureau d'Economie Théorique et Appliquée (BETA, Nancy) was supported by the French National Research Agency through the Laboratory of Excellence ARBRE, a part of the “Investissements d'Avenir” Program (ANR 11 – LABX-0002-01).

BLogos

+scatter(ytest,ŷtest,xlabel="Actual",ylabel="Estimated",label=nothing,title="Est vs. obs MPG (test set)")

results

  • Further examples

Finally, you may want to take a look at the "test" folder. While the primary objective of the scripts under the "test" folder is to provide automatic testing of the BetaML toolkit, they can also be used to see how functions should be called, as virtually all functions provided by BetaML are tested there.

Acknowledgements

The development of this package at the Bureau d'Economie Théorique et Appliquée (BETA, Nancy) was supported by the French National Research Agency through the Laboratory of Excellence ARBRE, a part of the “Investissements d'Avenir” Program (ANR 11 – LABX-0002-01).

BLogos

diff --git a/dev/objects.inv b/dev/objects.inv index e3d38dca6523337098e9677fc89f19251e06d16a..c7c25b36d8166646ff6b18d8b5b555b73ed74115 100644 GIT binary patch delta 5646 zcmV+p7V+uAEcPsrn|~dvSp6C01nO$I>}@51=UUbDlQ@v9-)Ljcz)AlDLSJcgQjhYD#2;}-D}d| zEZxcn+Z}0|B;Rf823HM~>oYFh~5Q&Eh%{M8o>*P9#aDS5&ISNsPtNOdE0n$ht zwWVY1-00{IRxd*M&le@<^%&_tp$6Bt=<^N9QHrlfS|eN$bek8SDZ4RgNR=fyon3Q7 zP*k*eQ{AKD8UdYMlG9I&-(7MCDA*YKHM+qSjsQ5Rn4b=_$@?dun?9_Vk5Q*{OHLO&9d+jf$`t1fs!I|ju&mcrk)i9fs0dC` zMsVH23ZQQ&9AhxXfQU5TDpOpyX_JIk70#m@l#{js$bXzPpNs0&U^1g)`Os~i+o!AI zgUND??1K+ZIrW&3a=;DxoHREmQ`yB;bdxkBYG9hdahYuIR@=5FmCBj&Sn8CfD&m$e z;dN0V`jdo?$sI|H(u$({O)&ZX3?*5al8odHJBPtVbDxrTZ4#4hL*OUe=%Q}>og-tv zpdI~*vVSt5&c|HBF4EsJ;65e6ds0QDgt=%N(Ld!t{rBLqB04+lfxt~a1{)34MxfZ_ zfpQL%0sT`xu>T&sTd&oT@KvbU^*T7q%9byfI;4LpwyFCbd~A|bW1KzU!`R1Qeg5}t zk^}8)91(S-_5+82N2q%gye1$>8Ln;tPT&ZIMYY3uQumW!dUte-|K|^g zf+7`^BTEY$!-HPWJM50Ko}{6xfq8`7#UM?A|afXkXqCAdDtC> z+JF9vkTaTc06T>oFY56I$5%z$=-7ML0MLcOrfA4jQQQuAc>4v@b6NKT1=EAg?f?ec z4Syhl8)e@=)-SKfY06$=pG{69V2-=zs5*tXh__hG0U2Q?T?^?Vg^@Hir zjTRJ{m0Y&;IrLyx0}> z|9%6KO&UJ^_4zi1huiO8eEI&x^D^IkD}OfX`KL5IDWbMh&`x1GSREbw@S5X3_~GPW z^=FR%V0CJxaIpG|lg2k^87$%}#Ib#gx^YMw7rZHQI$`)&hsAZcVa5*rFk=^jk|(Ki zJ@@G;!6GZHP^W^kf&?pwvjY5PpdpgF$U|z0!xL~hC@#wJ232Kvc=sxz-2@;bO^!RX#4HHpKN;3_wi z|4`U^03-o|bLz=g_snacu|~KfIT?3n_7E=sGlDR4D=1;q6AqMxkXW8*jUNz$Zg58G zux1{Uw@fo2#DENSO+{_wpgyu<7Sw}XmTh_bqB9}`1OTOBjX{u7Mk?XpV_vJrrt&#ADh%pihVFn=aUqM$!tcTl+`OJLQ&xu=X>N-6X_Un{ zYv3cfm>c*N5|Brm=sU#0kN`shOj`;}gMg5^$BO&?{vJj;Rf?xita8W5pjP#g!SCc` zf?w(R;SK`b`v{bEdH<Tjx1dt$r1c6{6p;L-%FQWfn$}`qY z0gveCq`3#lELplHxk^&_{=k(!AvmQBGNq&uVy0t2tj`XF5Dm2Dbkr!GAT~U}Wn$HK zGNq$yJDS@O+HOs~t}p`0Y^pYbDH-ruPK(A#6apy_LFdDToQ2xg-+!ZH2jT8s&YK?d zinB$RsI847WSOA=rK@ze3eI4nI8DdHI@QZdIyX;ATS3@DU6psZ1u@6@I{89ViS?*M zyEF(;8R7D$F~a1}!bq=ua`Kr};oqrRcmq=h9Yj}f-ztbvK^+yusdReiqq$du1coejDf( z@QGY1-FgOVOl2oja&`WGB8SzI&RSOdVWJcUS6TYWx|e1;R_v*kltoP;Py3pOpE9mc zQ>9>O8sNg3h266ul+lHnFxAP+AWqgQ9+R7mRu&2LfN0V)*ndyFwWvZP>mT|&VF1zdk66;I`1t5Jo 
zO3g^j={TbPTT0>`Q*dKWLKv)*nmOW$Ucnjhhy(=cA!unRyuvhK5(=*v)({>|g?6M- zPdfOSCc?w(Hh*Wjub%Wn*m%^958n6D{1j*yT;MyYzoaf}DxBo78siyXPod)YnSzqa zO5m57ywjxV5{&$SJgTA)v7m~2s=*~)9=aeR z32a#t%T7utnZe~#JEImX2QvUM#xRjLW{)KtVsODf+J7Xv4T)7BXeW{lL~>xz_X~K$ zhl_LBkJ*->%)qAG1_ftbw<$#{HHxZ3+yz3-O;xnJn~-`N(|!pDM;$m4;B>T_E&v)!yEJAUvV{*+rqsLD8 z^7GtN3x9eGV-=l2dx0)GN=JegfY&4?qaN{|?Gv9=M>Y5r)`O}h>Q~V;7}ZZA^Ro91 z6ts|_#SS&61>sHuN1&6uFm~5etp`cM3qjN)te^`8pM$u5##J1C%;EWOmn|r@-+q02 z`SlR-&zC2s{pbGU=H&hP$BWhA`;qd!`v^zi?tdKLlL{UlA~8^YSiz`M`0oV%$H^Y& z_g}x#gjYq74+Q#@zT!zt)Catd`5pY}HRB2X0_ee?PT<_be#vKm(|l$=5_~wGA6cFo zkGeJCKeOK?xbkHspiaf#6Y)W19@dL~re9yMRfE3|!P<3h&`QAbL2~4Sbm8c#H}@q$ zT7MZKR0WGWAt0I8MY!Rj@B`o8sVy@hVM0Hs)9op}hOl?YB~vX#<a74{C*)c<98}tQnr~2!Ho$+6Oe35d@cA>rhW7U=@OMzI9Lemzg-@pQ-APmn?Jt%+UtVukX~xsE14q)X5|o4?DQ+& zT0uifvvvv%5x*RW055Css>YkHT(m$0o?o!c)0w_~O_O0v=_9&0acFOO91b^Q` z&VjOr*1gr9_~BtstwZ_DbdoON;Yo-yIA0r(P-U~U?8M-hL?ms+F*fv>>j3mHz?fBw z@f?$A4zOujmk5I&|D04+QE5`s%LIfve|)LQjoj}aUv1j#ik}nf3c7T)9cQBUMed&V{kqdb??};2Erv)$(&Tf6mF@iso$|!9Z0^v#Hs+*;d= zDSxCvPFmD>iLQu8l?8Bn@$SUM{**EhT7VtaubLHj6s6E<0>`|ewlmAG=(y0+gG=HNH*F!!-iDjz2vgl2xQ#~+RBYLv_cZ)&8Zlj`Z7SS>h zOcUh?$VaS&AuEdXD5isDs(%6?T}ig$q;7B?En7zy8s8aCPLDg2(zo&QH2KqWg0Cls z(o_8lsjiBe=GptvO3RBj76R;NHhcXGXohOX*KQ-t!Dd>c0KS9Ok$>BAbG+(eZghRW zv8?R-H^*KIkv|m$$9sp-g9!Hcs%~h@lM&I*i(_>{qoz-UwJKp(u!NLO1dB?4NH>F5bRnjs@#!qSXYu5D8_u#IqT5Z1@mRl-Dd{&V2+QpYHhNc4LLP{6Jr5>W9lNbAh-Y&o;%fGZrZx!QQDN1Ao6|?wq2L%UK$XUcf<$aTR-- z3fT8|Rne9t9y|2rwDmZQt3ho(eA}hG9HPOGdWiG&L0d44<$t8E9N~xW#3)Z%%3I$! 
zlZCDp_q$k-0XI*vB)Wrh^gst`2yaz=)4SldReN4uHzBX}%J418nE8>*C5(8NLz7N} zr=vc=N0(2-*f|hN3_LI=s#b8BU9*^CLfJD4mU4za5ht+I$g@DJ1TkI|q4j{ST&hUo zFYtq2ug|wd$$#6D#$9Qcc*&aAY25#8ASh$8p#@y?DO42N)XXY{mj%`r5W?FeYqQx4 zufaJtwp!p6R?|-5gnvIco-(;5A1_IAEp+;gW3A`CbdF}`zd&N%Lxc=Y+TeV8l~qm9 zG5^&ls?8}#n90@wuU-rffQQ^#**5*$B}v;Xsc+^AaeohE5n>*q)p%+>$;(nw z8!yArqcHQj9L<;+SuQS2?HWBthv?|PD?f~_v34c){b)kCXj(2U<7=@g7bOg#*N&x- zxf~-~8%~c0_%Sf~XG!dtI4yD6=e>Fc_nn$^5P!7HV@&MYruc-T;)=E=EzR%h2*ysD z()zBZZKoRo7P=Aflpo>f;4@XRrG;-+G-Pe%bL!fuKE9D&Act7Y2|z~Tr9A~r?&wKL zIL|d0@G~VV&t|fNGjxdfUr(c|2%BzVI`msn|J;dAQ&+_P*Q-LQk+mL z+kde8BQ{y)cH)v&DSM~WBG-I~7gs}y7SWbw;8Lk!8or)~$m=MEQx`AMAAIrP$+(dYf=%@G?sq43R#DQmZ} z1bt?_cWI6?-A>b)mc(4@Mj>RhKN~P2z*8As+5A(6U$Ma5SmW8|NI_``cvZkjzJJ4M z5_k6~CR=#udmvQTNwYJ)zN!+f+54v-&ug6|J=1?351T|2_ef;@SSf9#uiG=Cw)=F- z^@oYF`p88I&ix&(61|w|XK{2mM`ts;F}JPp9T|wRqX@L?qGe0G?W1p0;Gt?T<>aT+ zB8@%UqN1Wsa`Ld+uNB91XWUg1O@BMwE}9*%m$yB3181gq+3N7#&bubyJ6spTE}KcY zE^$(QPWo<8Rjqs81<){m8&IPFZLELz4=8eL)Zpr#pkE!jDN9nZ8peQ zI5!7r+)sDVSX$6$R$lGk6KULY`hHjT3Iv}3Ly4re^fvjHUn74=L}ubK`r%=jZ*WQ6 zO}CDY;(1q1G%7wFw_g3m6E2JAH5!(^en{4O(OnuPMZK5Qv`85(Y{{AI$dL^s3##hz zXj$EL=4Ync($zz+56t$C%6}rg&x?${7~+<>EQ<&`uuPM;rzY*5B+Z-{C59)spX@9e z+EP0X29@LVd%2uERf7~w!|e4!6@5b_`)5Nux`ydXS!hxUr@O8h?c{hpWt}Z9M07544O5n}42?ZCl2;nYeRc zsllqiURMt2o4eKhalDx03P-o2PQMp>Iu^I95UAMQ{vprge{TR23ugdhqha0Df$f>FU5)BDNb`(>Jvkr0cHM z@Q$%5@%|B&6WHw@EPuH89#7wVPgf~TO&>Y8jpFKd9B+jA~<&HEZsWQ>DS3gq0$4X_t)gDghU6sU> zH`=>&jXK%6Er4Aa=k&){Qh~6ae|$CY5aq958K8_Mm6GsupMT*~|$ zGp9T@m9y|qdg(0{Gw$DhjT}^OyhCCaLF<{@r*4i6-CXn_J0yPtt33_TA-yS?5=z9c^RtR z^Ce#vg+8OF)Rp;GQvG{kaZ0EBArcQ2n(tFwRmpV{;eR?QaulKnm(_PyJ*1H~ zYE8%3xzW+>EnbK4pD#+z>k-m_LN%^#(bpT2qZD6}v_iNf=r%9DQg&lflQK(kI=lLY zpr~l_x_m&zH3B-DB&VMkznkO^P_QxdYjlH49071r)k#z-OY(_qN&*b9<+w_2394W$ z5HRLYwj;v{0L2Gi!uOYUK*GMwvs z-&Aw~;I|N+!9rLziH%q%@S|(PB@QvGQ9Ru)x73{z*rqtIQMDye0?T?`78$xui<00J zWdv6ZtN{9k!Z8M842Vemtun=Blh#RiRpLClK{;tkfPc(M{k14>4JI=>mJeO$xqZ4S zKA0@W$UgYslv9rh+4i_WUz7RgP 
z<$-bzlmXpSKCt^9oGzE@Ncbw$>~a~LW!r`?m^!3;Dz>Tp9(=BoRAZby;KSI*V0r%c zb&><^Ya9`Er1k@cfJdl(6dc2e2#a!q^Q7u{Vt;CMaL4}V4~T*y6_g`O3mn4|grG@6 zIILSxq9)#5+5^doAdG-8QV4UY44G^(rsPCI5dM%F$K`q0%68>He??{)O*w!aDvlR* zc!Q&>qN#Q4ouvnKVX!J{a#a+!Jsv*XV|vW0Zs6yNNXC;q{N!3%AHxs$0GnhNjNzxC zsekAovHDb_t_x70FUGj!iCfIl=tX6`J92|j=$o8ek=NY^G5(+;`sH>`O&c!pN>e5LQmH*=jP{XNoj-O(LR z;1^Jzz0FS?yZ@}#Lm_V6<^T>>Rnu8cunwUj&=sH4N!r}M$VBKlV-uki1O4Q`)S1^0 zc@FZw>^vom zI>Ldn5E9D^t??6LycNz!6;{lJ@s?=@gcy*auBoUE9Mor4%z}Eb$+9(%4|GPP0}h}x zj1V@Hdg;r4G!dp$Ey_~{p4-8*p4lE@q!JE3=aqVFN}qG1!k`wW?+!>67k{$IC;Tq_ z#LepoGi6nnk@}{HUq)FBvj#qri@Ba(ApvA$Nxhi*z3~E%*7yM36Cis<}AMPN~y^layZy&xiX@bKoj&8x*dm<;4fN%r^0VD_@ zK_KW!=$IngiRk~A@_1EUz<(q9IjJ8&GPf)pl3XPze1GCf9}}EX2ANXQ2r<*qBNnuN zP{@^5syDR%ItMCc3htsS}~>melDABM{8GY$BME0k5TuXqZGHkOI+*L3eUs z^gvV`4m!!EGR}a7M*&LbdAA78fK{BP!*LmP)0AZHNZ76TiNd2(GJil2p-`{n6gPki z=gZ`tX633u-*v9PuQI|#O=E0(d6VSDZ{^04e%Zo3DgZ;>s@ObMg_H05U0|r zoX_U2_O>R^t@6FfE#ke(rF1tmC1wTe@@6_LPp}DaDsXyU(<*s-Hk6)ECU;Cx$sB5| z2kvIZ^xHr;b}!^T>3@bYSYj$Wp^_)_cL_PHmcP{!%uf@gFu2OnFV?*@Gn-;hwfrWk z3#k{DJkpdQgqo%Vqf!GG)(q&5ZJrDy)MTYjUdC%O?f;zIth5S4pvOL=PQY$Lt69Yk zVR=~gAjDAwu&e;Ib5$#ZhY>;@p`fNG$10dzRY6xW{D2E3V}C2$&Oavo;;l4YV+L*8 z#vaz0^lB;kbd>UonA1^2UA2_N8>Zlfe3_6INWv>j!y2Khihk<-*|c9v8g-Zp;F2y6T@aDj zwXBI{Cnc23%JQk5QB#$JzK6J>pPU;qyb=yExZod665WQxdhoTQ#|9!fFz9OR9x>nS z{Pc5{@|79blvuAwt1B_3uB(PdRfyZbtiCCWW^)r#w|`>X0pZ}N1xEs$j#g8k`(dM| zc34>w#IL$8%FJ=D4T)0F?JeGWAyjgz`$z@g1!PDn@wKSBA}JYkeRpil_@r8@!LP6$R5eitil(-xeiGS~oo}F^ zg#;~jsDC*vh-+#%02%;WJ1#QfC0OI-;mvQ(xhv&atHlWmg`}M=+ z<38e_FON^U&)vt>@yGMe7mMEaL*;w>5stv!Ies7|JlscOp!~3aQ77=eM#T)#3bq>TgUwN{`8LV1b+c^?|)CnaPDEhvEzgZd z?V9kP*>4hD`LYmDC*tq1_@FWm>qS4)udmsv!QZD~1+CI$_YZnmkWbR#qpMy&Yzfjz z1EDHd+zA26ye`5O7lj}A_Kt0t2?-PWL7i?#=^ezkeJ+`5Au5m7BhunfR1xKjYciYq zQh(?73!!Pe6XC$J%~S}kOcCN+^)pQYe=CrzMGtB(C1C zZ?+0orY&!Qd1{9$3r!UqgMabc= zQku0>Xo&b_PXu_Wf>$cuw(-2r7c~MM_A$vTvMSysxbir+;s8~}b)Dh+$p=q^ z)6O0h`YQR4G%T$eaYQrva6RNRlvt+PD~sNAI@LX+HJ~S}e>WI3>^dr{dKN7c!8lQV 
zfPBPS7_y>B4`Vu576wKD(tm|yD^98!=h3`%bfNK`;pp_ZGbw!=FHe&{Jtz2jbSOR5 zzmW2(sA!(O8?E%QXk#J3er2=Qzkp^aH+=0@(j2VDH45O{TO7JAH^-|k=1SN1E6d7$ ze1GJn5cyM4S$uF9J&0hBuBw{0F&PooIwa~8< z6CiXGPtN}N;rQai2j*C?MkQKJunUn;mUBFtA;*Rf0Y!PuGUnVz@bT%6j*pf;W&wkh zt(%;@nx!Qqv?}XP^#D9*@mx=qEiA6|?2~1L#s9HcW1IVTH_ zKP`)9OX8tJZ%$i}!?+sM=EJvL%F90L{iugHk5Afyek><-v78ay5K z9zME!8phUvP-5VoIZ?HO%j}xP6cfsxQLvOV{E0Y$okpGoT3Zm~SrJ+X=*p#vB>oIP z=-u*sU2J(1)37TI6W_9ibQ<@+>IuqFY-j=3djHcP6TsY2YrScI6z?do!htH9cF+M|1IB}5_p z&BO(4tLSfrtlk!vHEZngJ7t&32@{XZp!&zRa>!9x;(rIfdWO!&wDWdpoVE6+p_rSW zIGxvi3N6`W;C)RM3=Pa88;ZtO1;jr`Y7s6S)+txd-~6L0o&WFuX&kj2`QitZ}&^wMX}pPX<7Flf|sedk=gh?NpI^&1yWAp5$dIsg;-E z=unt>U4M>d$c)Sv7p8WNo}+zq_}`Tu#@0}~68nBQB3v{rmu};0u_+f@7(%ZSOCfVP zMz%Jb4iE5SVDitB*fVij;-lDWCLetAMwAQMngkE&l>|4IYs`t-_pZMK%Z8p{uLk?F*#6DfSgjCP%B%r z`y)14=62$eRw+BD(<0Y=h!5+J&Ol)&iX4^`^ve0;IIL ztdy{@iwG|lmR4omE#*ke6V8x4I(~;>kbe_gJSvA?S~L2*`@A}2gGUs;IWcAJR+ga8 zjQ1|hQKs8zI@6MvOWi1hjP_>(Mg({&!wZ{#%J3@|xE*UeTOBGW4FN9;ILUW7P2%<* z#b^r;ec^-ZI%#&s*H=}dHGBW`^LeeKq$m2X!(o$X;vR^spDU%U^mThi)OMdvxqtpJ zQdS?hD8aeE!)2luGyN=%4(I4>W;f>66}}@qF?JMzHf^+QiMM_9jSAdX4W^v@bW)_T zXIoTMR7p;rR{OQ$81Ia`N}_Rx+eNbl_VTueZs1H5FIyen+j-Xne1of^-(@o@*IS&F zUz4sIR8{MqcL6jErfJY!8krrn)PFAhuIj1qM(MJ*S-lQgpGv9GMeDAO0?a3}P3a3H zigIH8+6rjV4t5PTagR4io=nwS9gND=oURvp3zy#fqLm?R1GM7vOLWHAxi%YQES#Hz zH15YcXe=%06DzNF@R2m`IeljhWM% z-F4b6I8)c3>GNZ%<9yJxQ85F-iGyIud8`I0nugiyg?~!=DoFNkb9i(O)0eW)q!dm!Z8O^O(Q?c$!!~cuqH2?9op4a~UT&PA6O&rQO?j%Ih9z85K4?CF^Dz<9g)Ifu#nk z0()IKpl{<=_s8*Kjw>A94m$l_?CDtCuD_MjF3P4aO!JGfE#lq%_i zp8v5l$IQDeW1jJj+JAwjIL5o|0kR}sf7$lbR0acOkt%IcFC&oPT?Km(X|ofoI`W0uv22 z1m|mdVQzBh-G_9VK~|t6^u^-ahj~>11Hrf%q*u5-J#3No$8{uA1fu0&Vt<2zfmg0r z+k&%)LD)gJnjel*?3ePU^e36kn$wf;_HNC=RD2gW_aZgjRdK#ZMZR`trf1{Ro^HGi zYSsl5t(^a)fPcrtG%@HAbyuS-=GmO+s)$)$^)c_=C+=6Dbnyd-+RfJb?g*NI9FIR3 z{NTGHcp}qwtqeSzJF;dPkNbslmeGl4Xj!Q!YM!;1*l8uH8w5HDue8{{C(+6JO9W<L8*E12YhNi9E`N7YEvgF3G&VY!eZ{Ldlgw6z!Qb#vBS107gvW@6yKJNfRMbzR7R zx3_(!yCb9i=5*AGpJMUbic}QhvZSiImf#898&(9)v;J4=ucZb5XBY%A-!zs93S@@6 
zC@nb-dr~e{jje_UTV6Ld<(9PTI^Oh~*3?;eT7AcYVa!gAccpo|PxOrTyJTOW?xUdb num9Nj{2i8mY$QEz45{zlo!$1VBJpf|nBDR)*6i?q_1gk*XRhxW diff --git a/dev/search_index.js b/dev/search_index.js index 49577f3..99da67a 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"Api.html#api_module","page":"The Api module","title":"The BetaML.Api Module","text":"","category":"section"},{"location":"Api.html","page":"The Api module","title":"The Api module","text":"Api","category":"page"},{"location":"Api.html#BetaML.Api","page":"The Api module","title":"BetaML.Api","text":"Api\n\nThe Api Module (currently v2)\n\nThis module includes the shared api trough the various BetaML submodules, i.e. names used by more than one submodule.\n\nModules are free to use other functions but these are defined here to avoid name conflicts and allows instead Multiple Dispatch to handle them. For a user-prospective overall description of the BetaML API see the page API V2 → Introduction for users, while for the implementation of the API see the page API V2 → For developers\n\n\n\n\n\n","category":"module"},{"location":"Api.html#Module-Index","page":"The Api module","title":"Module Index","text":"","category":"section"},{"location":"Api.html","page":"The Api module","title":"The Api module","text":"Modules = [Api]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Api.html#Detailed-API","page":"The Api module","title":"Detailed API","text":"","category":"section"},{"location":"Api.html","page":"The Api module","title":"The Api module","text":"Modules = [Api]\nPrivate = false","category":"page"},{"location":"Api.html#BetaML.Api.FIXEDRNG","page":"The Api module","title":"BetaML.Api.FIXEDRNG","text":"Fixed ring to allow reproducible results\n\nUse it with:\n\nmyAlgorithm(;rng=FIXEDRNG) # always produce the same sequence of results on each run of the script (\"pulling\" from the same rng object on different 
calls)\nmyAlgorithm(;rng=copy(FIXEDRNG)) # always produce the same result (new rng object on each function call)\n\n\n\n\n\n","category":"constant"},{"location":"Api.html#BetaML.Api.FIXEDSEED","page":"The Api module","title":"BetaML.Api.FIXEDSEED","text":"const FIXEDSEED\n\nFixed seed to allow reproducible results. This is the seed used to obtain the same results under unit tests.\n\nUse it with:\n\nmyAlgorithm(;rng=MyChoosenRNG(FIXEDSEED)) # always produce the same sequence of results on each run of the script (\"pulling\" from the same rng object on different calls)\nmyAlgorithm(;rng=copy(MyChoosenRNG(FIXEDSEED))) # always produce the same result (new rng object on each call)\n\n\n\n\n\n","category":"constant"},{"location":"Api.html#BetaML.Api.BML_options","page":"The Api module","title":"BetaML.Api.BML_options","text":"mutable struct BML_options <: BetaMLOptionsSet\n\nA struct defining the options used by default by the algorithms that do not override it with their own option sets.\n\nFields:\n\ncache::Bool: Cache the results of the fitting stage, so as to allow predict(mod) [default: true]. Set it to false to save memory for large data.\ndescr::String: An optional title and/or description for this model\nautotune::Bool: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control auto-tuning through the option tunemethod (see the model hyper-parameters)\nverbosity::Verbosity: The verbosity level to be used in training or prediction: NONE, LOW, STD [default], HIGH or FULL\nrng::Random.AbstractRNG: Random Number Generator (see ?FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\neven if a model doesn't override BML_options, it may not use all its options; for example, deterministic models would not make use of the rng parameter. 
Passing such parameters in these cases would simply have no influence.\n\nExample:\n\njulia> options = BML_options(cache=false,descr=\"My model\")\n\n\n\n\n\n","category":"type"},{"location":"Api.html#BetaML.Api.Verbosity","page":"The Api module","title":"BetaML.Api.Verbosity","text":"primitive type Verbosity <: Enum{Int32} 32\n\nMany models and functions accept a verbosity parameter.\n\nChoose between: NONE, LOW, STD [default], HIGH and FULL.\n\n\n\n\n\n","category":"type"},{"location":"Api.html#BetaML.Api.fit!-Tuple{BetaMLModel, Vararg{Any, N} where N}","page":"The Api module","title":"BetaML.Api.fit!","text":"fit!(m::BetaMLModel,X,[y])\n\nFit (\"train\") a BetaMLModel (i.e. learn the algorithm's parameters) based on data, either only features or features and labels.\n\nEach specific model implements its own version of fit!(m,X,[Y]), but the usage is consistent across models.\n\nNotes:\n\nFor online algorithms, i.e. models that support updating of the learned parameters with new data, fit! can be repeated as new data arrive, although not all algorithms guarantee that training one record at a time is equivalent to training on all the records at once.\nIf the model has been trained while having the cache option set to true (the default), fit! returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function.\nIn Python and other languages that don't allow the exclamation mark within the function name, use fit_ex(⋅) instead of fit!(⋅)\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.hyperparameters-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.hyperparameters","text":"hyperparameters(m::BetaMLModel)\n\nReturns the hyperparameters of a BetaML model. 
See also ?options for the parameters that do not directly affect learning.\n\nwarning: Warning\nThe returned object is a reference, so if it is modified, the corresponding object in the model will change too.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.info-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.info","text":"info(m::BetaMLModel) -> Any\n\n\nReturn a string-keyed dictionary of \"additional\" information stored during model fitting.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.inverse_predict-Tuple{BetaMLModel, Any}","page":"The Api module","title":"BetaML.Api.inverse_predict","text":"inverse_predict(m::BetaMLModel,X)\n\nGiven a model m that, fitted on x, produces xnew, it takes xnew and returns (possibly an approximation of) x.\n\nFor example, when OneHotEncoder is fitted with a subset of the possible categories and the handle_unknown option is set to infrequent, inverse_transform will aggregate all the other categories as specified in other_categories_name.\n\nNotes:\n\nImplemented only in a few models.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.model_load","page":"The Api module","title":"BetaML.Api.model_load","text":"model_load(filename::AbstractString)\nmodel_load(filename::AbstractString,args::AbstractString...)\n\nLoad from file one or more BetaML models (whether fitted or not).\n\nNotes:\n\nIf no model names to retrieve are specified it returns a dictionary keyed with the model names\nIf multiple models are requested, a tuple is returned\nFor further options see the documentation of the function load of the JLD2 package\n\nExamples:\n\njulia> models = model_load(\"fittedModels.jl\"; mod1Name=mod1,mod2)\njulia> mod1 = model_load(\"fittedModels.jl\",mod1)\njulia> (mod1,mod2) = model_load(\"fittedModels.jl\",\"mod1\", \"mod2\")\n\n\n\n\n\n","category":"function"},{"location":"Api.html#BetaML.Api.model_save","page":"The Api 
module","title":"BetaML.Api.model_save","text":"model_save(filename::AbstractString,overwrite_file::Bool=false;kwargs...)\n\nSave one or more BetaML models (whether fitted or not), optionally specifying a name for each of them.\n\nParameters:\n\nfilename: Name of the destination file\noverwrite_file: Whether to overwrite the file if it already exists or preserve it (for the objects other than the ones that are going to be saved) [def: false, i.e. preserve the file]\nkwargs: model objects to be saved, optionally associated with a different name to save them with (e.g. mod1Name=mod1,mod2) \n\nNotes:\n\nIf an object with the given name already exists in the destination JLD2 file it will be overwritten.\nIf the file exists, but it is not a JLD2 file and the option overwrite_file is set to false, an error will be raised.\nUse the semicolon ; to separate the filename from the model(s) to save\nFor further options see the documentation of the JLD2 package\n\nExamples\n\njulia> model_save(\"fittedModels.jl\"; mod1Name=mod1,mod2)\n\n\n\n\n\n","category":"function"},{"location":"Api.html#BetaML.Api.options-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.options","text":"options(m::BetaMLModel)\n\nReturns the non-learning related options of a BetaML model. 
See also ?hyperparameters for the parameters that directly affect learning.\n\nwarning: Warning\nThe returned object is a reference, so if it is modified, the corresponding object in the model will change too.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.parameters-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.parameters","text":"parameters(m::BetaMLModel)\n\nReturns the learned parameters of a BetaML model.\n\nwarning: Warning\nThe returned object is a reference, so if it is modified, the corresponding object in the model will change too.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.predict-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.predict","text":"predict(m::BetaMLModel,[X])\n\nPredict new information (including transformation) based on a fitted BetaMLModel, optionally applied to new features when the algorithm generalises to new data.\n\nNotes:\n\nAs a convenience, if the model has been trained while having the cache option set to true (the default), the predictions associated with the last training of the model are retained in the model object and can be retrieved simply with predict(m).\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.reset!-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.reset!","text":"reset!(m::BetaMLModel)\n\nReset the parameters of a trained model.\n\nNotes:\n\nIn Python and other languages that don't allow the exclamation mark within the function name, use reset_ex(⋅) instead of reset!(⋅)\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.sethp!-Tuple{BetaMLModel, Dict}","page":"The Api module","title":"BetaML.Api.sethp!","text":"sethp!(m::BetaMLModel, hp::Dict)\n\n\nSet the hyperparameters of model m as specified in the hp dictionary.\n\n\n\n\n\n","category":"method"},{"location":"Imputation.html#imputation_module","page":"Imputation","title":"The BetaML.Imputation 
Module","text":"","category":"section"},{"location":"Imputation.html","page":"Imputation","title":"Imputation","text":"Imputation","category":"page"},{"location":"Imputation.html#BetaML.Imputation","page":"Imputation","title":"BetaML.Imputation","text":"Imputation module\n\nProvide various imputation methods for missing data. Note that the interpretation of \"missing\" can be very wide. For example, recommendation systems / collaborative filtering (e.g. suggestion of the film to watch) can well be represented as a missing-data imputation problem, often with better results than traditional algorithms such as k-nearest neighbors (KNN)\n\nProvided imputers:\n\nSimpleImputer: Impute data using the feature (column) mean, optionally normalised by l-norms of the records (rows) (fastest)\nGaussianMixtureImputer: Impute data using a Generative (Gaussian) Mixture Model (good trade off)\nRandomForestImputer: Impute missing data using Random Forests, with optional replicable multiple imputations (most accurate).\nGeneralImputer: Impute missing data using a vector (one per column) of arbitrary learning models (classifiers/regressors) that implement m = Model([options]), fit!(m,X,Y) and predict(m,X) (not necessarily from BetaML).\n\nImputations for all these models can be obtained by running mod = ImputatorModel([options]), fit!(mod,X). The data with the missing values imputed can then be obtained with predict(mod). Use info(m::Imputer) to retrieve further information concerning the imputation. Trained models can also be used to impute missing values in new data with predict(mod,xNew). 
Note that if multiple imputations are run (for the supporting imputers) predict() will return a vector of predictions rather than a single one.\n\nExample\n\njulia> using Statistics, BetaML\n\njulia> X = [2 missing 10; 2000 4000 1000; 2000 4000 10000; 3 5 12 ; 4 8 20; 1 2 5]\n6×3 Matrix{Union{Missing, Int64}}:\n 2 missing 10\n 2000 4000 1000\n 2000 4000 10000\n 3 5 12\n 4 8 20\n 1 2 5\n\njulia> mod = RandomForestImputer(multiple_imputations=10, rng=copy(FIXEDRNG));\n\njulia> fit!(mod,X);\n\njulia> vals = predict(mod)\n10-element Vector{Matrix{Union{Missing, Int64}}}:\n [2 3 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 136 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 137 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 137 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 137 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n\njulia> nR,nC = size(vals[1])\n(6, 3)\n\njulia> medianValues = [median([v[r,c] for v in vals]) for r in 1:nR, c in 1:nC]\n6×3 Matrix{Float64}:\n 2.0 4.0 10.0\n 2000.0 4000.0 1000.0\n 2000.0 4000.0 10000.0\n 3.0 5.0 12.0\n 4.0 8.0 20.0\n 1.0 2.0 5.0\n\njulia> infos = info(mod);\n\njulia> infos[\"n_imputed_values\"]\n1\n\n\n\n\n\n","category":"module"},{"location":"Imputation.html#Module-Index","page":"Imputation","title":"Module Index","text":"","category":"section"},{"location":"Imputation.html","page":"Imputation","title":"Imputation","text":"Modules = [Imputation]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Imputation.html#Detailed-API","page":"Imputation","title":"Detailed API","text":"","category":"section"},{"location":"Imputation.html","page":"Imputation","title":"Imputation","text":"Modules = [Imputation]\nPrivate = 
false","category":"page"},{"location":"Imputation.html#BetaML.Imputation.GaussianMixtureImputer","page":"Imputation","title":"BetaML.Imputation.GaussianMixtureImputer","text":"mutable struct GaussianMixtureImputer <: Imputer\n\nMissing data imputer that uses a Generative (Gaussian) Mixture Model.\n\nFor the parameters (n_classes,mixtures,..) see GaussianMixture_hp.\n\nLimitations:\n\ndata must be numerical\nthe resulting matrix is a Matrix{Float64}\ncurrently the Mixtures available do not support random initialisation for missing imputation, and the rest of the algorithm (Expectation-Maximisation) is deterministic, so there is no random component involved (i.e. no multiple imputations)\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1 2.5; missing 20.5; 0.8 18; 12 22.8; 0.4 missing; 1.6 3.7];\n\njulia> mod = GaussianMixtureImputer(mixtures=[SphericalGaussian() for i in 1:2])\nGaussianMixtureImputer - A Gaussian Mixture Model based imputer (unfitted)\n\njulia> X_full = fit!(mod,X)\nIter. 1: Var. 
of the post 2.373498171519511 Log-likelihood -29.111866299189792\n6×2 Matrix{Float64}:\n 1.0 2.5\n 6.14905 20.5\n 0.8 18.0\n 12.0 22.8\n 0.4 4.61314\n 1.6 3.7\n\njulia> info(mod)\nDict{String, Any} with 7 entries:\n \"xndims\" => 2\n \"error\" => [2.3735, 0.17527, 0.0283747, 0.0053147, 0.000981885]\n \"AIC\" => 57.798\n \"fitted_records\" => 6\n \"lL\" => -21.899\n \"n_imputed_values\" => 2\n \"BIC\" => 56.3403\n\njulia> parameters(mod)\nBetaML.Imputation.GaussianMixtureImputer_lp (a BetaMLLearnableParametersSet struct)\n- mixtures: AbstractMixture[SphericalGaussian{Float64}([1.0179819950570768, 3.0999990977255845], 0.2865287884295908), SphericalGaussian{Float64}([6.149053737674149, 20.43331198167713], 15.18664378248651)]\n- initial_probmixtures: [0.48544987084082347, 0.5145501291591764]\n- probRecords: [0.9999996039918224 3.9600817749531375e-7; 2.3866922376272767e-229 1.0; … ; 0.9127030246369684 0.08729697536303167; 0.9999965964161501 3.403583849794472e-6]\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.GeneralI_hp","page":"Imputation","title":"BetaML.Imputation.GeneralI_hp","text":"mutable struct GeneralI_hp <: BetaMLHyperParametersSet\n\nHyperparameters for GeneralImputer\n\nParameters:\n\ncols_to_impute: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of columns IDs (positions), or the keywords \"auto\" (default) or \"all\". With \"auto\" the model automatically detects the columns with missing data and imputes only them. You may manually specify the columns or use \"all\" if you want to create an imputation model for those columns during training, even if all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.\nestimator: An estimator model (regressor or classifier), optionally with its options (hyper-parameters), to be used to impute the various columns of the matrix. 
It can also be a cols_to_impute-length vector of different estimators to consider a different estimator for each column (dimension) to impute, for example when some columns are categorical (and will hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. use BetaML random forests, handling classification and regression jobs automatically].\nmissing_supported: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows of those where imputation is needed are dropped and then only non-missing rows in the other remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator or a single boolean value to apply to all the estimators [default: false]\nfit_function: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]\npredict_function: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]\nrecursive_passages: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. 
The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nmultiple_imputations: Determine the number of independent imputations of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.GeneralImputer","page":"Imputation","title":"BetaML.Imputation.GeneralImputer","text":"mutable struct GeneralImputer <: Imputer\n\nImpute missing values using arbitrary learning models.\n\nImpute missing values using any arbitrary learning model (classifier or regressor, not necessarily from BetaML) that implements an interface m = Model([options]), fit!(m,X,Y) and predict(m,X). For non-BetaML supervised models the actual training and predict functions must be specified in the fit_function and predict_function parameters respectively. If needed (for example when some columns with missing data are categorical and some numerical) different models can be specified for each column. Multiple imputations and multiple \"passages\" through the various columns for a single imputation are supported. 
\n\nSee GeneralI_hp for all the hyper-parameters.\n\nExamples:\n\nUsing BetaML models:\n\njulia> using BetaML\njulia> X = [1.4 2.5 \"a\"; missing 20.5 \"b\"; 0.6 18 missing; 0.7 22.8 \"b\"; 0.4 missing \"b\"; 1.6 3.7 \"a\"]\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n missing 20.5 \"b\"\n 0.6 18 missing\n 0.7 22.8 \"b\"\n 0.4 missing \"b\"\n 1.6 3.7 \"a\"\n\n julia> mod = GeneralImputer(recursive_passages=2,multiple_imputations=2)\n GeneralImputer - A imputer based on an arbitrary regressor/classifier(unfitted)\n\n julia> mX_full = fit!(mod,X);\n ** Processing imputation 1\n ** Processing imputation 2\n\n julia> mX_full[1]\n 6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.546722 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 19.8061 \"b\"\n 1.6 3.7 \"a\"\n\n julia> mX_full[2]\n 6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.554167 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 20.7551 \"b\"\n 1.6 3.7 \"a\"\n \n julia> info(mod)\n Dict{String, Any} with 1 entry:\n \"n_imputed_values\" => 3\n \n\nUsing third party packages (in this example DecisionTree):\n\njulia> using BetaML\njulia> import DecisionTree\njulia> X = [1.4 2.5 \"a\"; missing 20.5 \"b\"; 0.6 18 missing; 0.7 22.8 \"b\"; 0.4 missing \"b\"; 1.6 3.7 \"a\"]\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n missing 20.5 \"b\"\n 0.6 18 missing\n 0.7 22.8 \"b\"\n 0.4 missing \"b\"\n 1.6 3.7 \"a\"\njulia> mod = GeneralImputer(estimator=[DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeClassifier()], fit_function = DecisionTree.fit!, predict_function=DecisionTree.predict, recursive_passages=2)\nGeneralImputer - A imputer based on an arbitrary regressor/classifier(unfitted)\njulia> X_full = fit!(mod,X)\n** Processing imputation 1\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.94 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 13.5 \"b\"\n 1.6 3.7 
\"a\"\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.RandomForestI_hp","page":"Imputation","title":"BetaML.Imputation.RandomForestI_hp","text":"mutable struct RandomForestI_hp <: BetaMLHyperParametersSet\n\nHyperparameters for RandomForestImputer\n\nParameters:\n\nrfhpar::Any: For the underlying random forest algorithm parameters (n_trees,max_depth,min_gain,min_records,max_features,splitting_criterion,β,initialisation_strategy, oob and rng) see RandomForestE_hp\nforced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]\nrecursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nmultiple_imputations::Int64: Determine the number of independent imputations of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).\ncols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of columns IDs (positions), or the keywords \"auto\" (default) or \"all\". With \"auto\" the model automatically detects the columns with missing data and imputes only them. 
You may manually specify the columns or use \"all\" if you want to create an imputation model for those columns during training, even if all training data are non-missing, so that the trained model can then be applied to further data with possibly missing values.\n\nExample:\n\njulia> mod = RandomForestImputer(n_trees=20,max_depth=10,recursive_passages=3)\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.RandomForestImputer","page":"Imputation","title":"BetaML.Imputation.RandomForestImputer","text":"mutable struct RandomForestImputer <: Imputer\n\nImpute missing data using Random Forests, with optional replicable multiple imputations. \n\nSee RandomForestI_hp, RandomForestE_hp and BML_options for the parameters.\n\nNotes:\n\nGiven a certain RNG and its state (e.g. RandomForestImputer(...,rng=StableRNG(FIXEDSEED))), the algorithm is completely deterministic, i.e. replicable. \nThe algorithm accepts virtually any kind of data, sortable or not\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.4 2.5 \"a\"; missing 20.5 \"b\"; 0.6 18 missing; 0.7 22.8 \"b\"; 0.4 missing \"b\"; 1.6 3.7 \"a\"]\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n missing 20.5 \"b\"\n 0.6 18 missing\n 0.7 22.8 \"b\"\n 0.4 missing \"b\"\n 1.6 3.7 \"a\"\n\njulia> mod = RandomForestImputer(n_trees=20,max_depth=10,recursive_passages=2)\nRandomForestImputer - A Random-Forests based imputer (unfitted)\n\njulia> X_full = fit!(mod,X)\n** Processing imputation 1\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.504167 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 20.0837 \"b\"\n 1.6 3.7 \"a\"\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.SimpleI_hp","page":"Imputation","title":"BetaML.Imputation.SimpleI_hp","text":"mutable struct SimpleI_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the SimpleImputer model\n\nParameters:\n\nstatistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]\nnorm::Union{Nothing, Int64}: Normalise 
the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneus (e.g. quantity exports of different countries).\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.SimpleImputer","page":"Imputation","title":"BetaML.Imputation.SimpleImputer","text":"mutable struct SimpleImputer <: Imputer\n\nSimple imputer using the missing data's feature (column) statistic (def: mean), optionally normalised by l-norms of the records (rows)\n\nParameters:\n\nstatistics: The descriptive statistic of the column (feature) to use as imputed value [def: mean]\nnorm: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneus (e.g. quantity exports of different countries). \n\nLimitations:\n\ndata must be numerical\n\nExample:\n\njulia> using BetaML\n\njulia> X = [2.0 missing 10; 20 40 100]\n2×3 Matrix{Union{Missing, Float64}}:\n 2.0 missing 10.0\n 20.0 40.0 100.0\n\njulia> mod = SimpleImputer(norm=1)\nSimpleImputer - A simple feature-stat based imputer (unfitted)\n\njulia> X_full = fit!(mod,X)\n2×3 Matrix{Float64}:\n 2.0 4.04494 10.0\n 20.0 40.0 100.0\n\njulia> info(mod)\nDict{String, Any} with 1 entry:\n \"n_imputed_values\" => 1\n\njulia> parameters(mod)\nBetaML.Imputation.SimpleImputer_lp (a BetaMLLearnableParametersSet struct)\n- cStats: [11.0, 40.0, 55.0]\n- norms: [6.0, 53.333333333333336]\n\n\n\n\n\n","category":"type"},{"location":"Examples.html#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"Examples.html#Supervised-learning","page":"Examples","title":"Supervised learning","text":"","category":"section"},{"location":"Examples.html#Regression","page":"Examples","title":"Regression","text":"","category":"section"},{"location":"Examples.html#Estimating-the-bike-sharing-demand","page":"Examples","title":"Estimating the bike sharing 
demand","text":"","category":"section"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"The task is to estimate the influence of several variables (like the weather, the season, the day of the week, ...) on the demand for shared bicycles, so that the authority in charge of the service can organise it in the best way.","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Data origin:","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"original full dataset (by hour, not used here): https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset\nsimplified dataset (by day, with some simple scaling): https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/aec\ndescription: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/exam2019ace.pdf\ndata: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/bikesharing_day.csv.zip","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Note that even if we are estimating a time series, we are not using a recurrent neural network here, as we assume the temporal dependence to be negligible (i.e.
Y_t = f(X_t) alone).","category":"page"},{"location":"Examples.html#Classification","page":"Examples","title":"Classification","text":"","category":"section"},{"location":"Examples.html#Unsupervised-lerarning","page":"Examples","title":"Unsupervised learning","text":"","category":"section"},{"location":"Examples.html#Notebooks","page":"Examples","title":"Notebooks","text":"","category":"section"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"The following notebooks provide runnable examples of the package functionality:","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Pegasus classifiers: [Static notebook] - [myBinder]\nDecision Trees and Random Forest regression on Bike sharing demand forecast (daily data): [Static notebook] - [myBinder]\nNeural Networks: [Static notebook] - [myBinder]\nBike sharing demand forecast (daily data): [Static notebook] - [myBinder]\nClustering: [Static notebook] - [myBinder]","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Note: the live, runnable computational environment is a temporary new copy made at each connection.
The first time after a commit to this repository, a new environment has to be set up (instead of just being copied), and the server may take several minutes.","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"This happens only if you are the unlucky user triggering the rebuild of the environment after the commit.","category":"page"},{"location":"Nn.html#nn_module","page":"Nn","title":"The BetaML.Nn Module","text":"","category":"section"},{"location":"Nn.html","page":"Nn","title":"Nn","text":"Nn","category":"page"},{"location":"Nn.html#BetaML.Nn","page":"Nn","title":"BetaML.Nn","text":"BetaML.Nn module\n\nImplements the functionality required to define an artificial Neural Network, train it with data, forecast data and assess its performance.\n\nCommon types of layers and optimisation algorithms are already provided, but you can define your own by subtyping, respectively, the AbstractLayer and OptimisationAlgorithm abstract types.\n\nThe module provides the following types or functions. Use ?[type or function] to access their full signature and detailed documentation:\n\nModel definition:\n\nDenseLayer: Classical feed-forward layer with user-defined activation function\nDenseNoBiasLayer: Classical layer without the bias parameter\nVectorFunctionLayer: Layer whose activation function runs over the ensemble of its nodes rather than on each one individually.
No learnable weights on input, optional learnable weights as parameters of the activation function.\nScalarFunctionLayer: Layer whose activation function runs over each node individually, like a classic DenseLayer, but with no learnable weights on input and optional learnable weights as parameters of the activation function.\nReplicatorLayer: Alias for a ScalarFunctionLayer with no learnable parameters and identity as activation function\nReshaperLayer: Reshape the output of a layer (or the input data) to the shape needed for the next one\nPoolingLayer: In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel. Weightless.\nConvLayer: A generic N+1 (channels) dimensional convolutional layer \nGroupedLayer: To stack several layers into a single layer, e.g. for multi-branch networks\nNeuralNetworkEstimator: Build the chained network and define a cost function\n\nEach layer can use a default activation function, one of the functions provided in the Utils module (relu, tanh, softmax,...) or one provided by you. BetaML will try to recognise if it is a \"known\" function for which it sets the exact derivatives, otherwise you can provide the derivative to the layer yourself. If the derivative of the activation function is not provided (either manually or automatically), AD will be used and training may be slower, although this difference tends to vanish with bigger datasets.\n\nYou can alternatively implement your own layer by defining a new type as a subtype of the abstract type AbstractLayer.
Each user-implemented layer must define the following methods:\n\nA suitable constructor\nforward(layer,x)\nbackward(layer,x,next_gradient)\nget_params(layer)\nget_gradient(layer,x,next_gradient)\nset_params!(layer,w)\nsize(layer)\n\nModel fitting:\n\nfit!(nn,X,Y): fitting function\nfitting_info(nn): Default callback function during fitting\nSGD: The classical optimisation algorithm\nADAM: A faster moment-based optimisation algorithm \n\nTo define your own optimisation algorithm define a subtype of OptimisationAlgorithm and implement the function single_update!(θ,▽;opt_alg) and eventually init_optalg!(⋅) specific for it.\n\nModel predictions and assessment:\n\npredict(nn) or predict(nn,X): Return the output given the data\n\nWhile high-level functions operating on the dataset expect it to be in the standard format (nrecords × ndimensions matrices) it is customary to represent the chain of a neural network as a flow of column vectors, so all low-level operations (operating on a single datapoint) expect both the input and the output as a column vector.\n\n\n\n\n\n","category":"module"},{"location":"Nn.html#Module-Index","page":"Nn","title":"Module Index","text":"","category":"section"},{"location":"Nn.html","page":"Nn","title":"Nn","text":"Modules = [Nn]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Nn.html#Detailed-API","page":"Nn","title":"Detailed API","text":"","category":"section"},{"location":"Nn.html","page":"Nn","title":"Nn","text":"Modules = [Nn]\nPrivate = false","category":"page"},{"location":"Nn.html#BetaML.Nn.ADAM","page":"Nn","title":"BetaML.Nn.ADAM","text":"ADAM(;η, λ, β₁, β₂, ϵ)\n\nThe ADAM algorithm, an adaptive moment estimation optimiser.\n\nFields:\n\nη: Learning rate (stepsize, α in the paper), as a function of the current epoch [def: t -> 0.001 (i.e. 
fixed)]\nλ: Multiplicative constant to the learning rate [def: 1]\nβ₁: Exponential decay rate for the first moment estimate [range: ∈ [0,1], def: 0.9]\nβ₂: Exponential decay rate for the second moment estimate [range: ∈ [0,1], def: 0.999]\nϵ: Epsilon value to avoid division by zero [def: 10^-8]\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ConvLayer","page":"Nn","title":"BetaML.Nn.ConvLayer","text":"struct ConvLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nA generic N+1 (channels) dimensional convolutional layer\n\nEXPERIMENTAL: Still too slow for practical applications\n\nThis convolutional layer has two constructors, one with the form ConvLayer(input_size,kernel_size,nchannels_in,nchannels_out), and an alternative one as ConvLayer(input_size_with_channel,kernel_size,nchannels_out). If the input is a vector, use a ReshaperLayer in front.\n\nFields:\n\ninput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)\noutput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)\nweight::Array{WET, NDPLUS2} where {NDPLUS2, WET<:Number}: Weight tensor (aka \"filter\" or \"kernel\") with respect to the input from previous layer or data (kernel_size array augmented by the nchannels_in and nchannels_out dimensions)\nusebias::Bool: Whether to use (and learn) a bias weight [def: true]\nbias::Vector{WET} where WET<:Number: Bias (nchannels_out array)\npadding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)\npadding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)\nstride::StaticArraysCore.SVector{ND, Int64} where ND: Stride\nndims::Int64: Number of dimensions (excluding input and output channels)\nf::Function: Activation function\ndf::Union{Nothing, Function}: Derivative of the activation function\nx_ids::Array{StaticArraysCore.SVector{NDPLUS1,
Int64}, 1} where NDPLUS1: x ids of the convolution (computed in preprocessing, itself run at the beginning of train)\ny_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: y ids of the convolution (computed in preprocessing, itself run at the beginning of train)\nw_ids::Array{StaticArraysCore.SVector{NDPLUS2, Int64}, 1} where NDPLUS2: w ids of the convolution (computed in preprocessing, itself run at the beginning of train)\ny_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y\ny_to_w_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS2}}, 1}, NDPLUS1} where {NDPLUS1, NDPLUS2}: A y-dims array of vectors of corresponding w(s) contributing to the given y\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ConvLayer-NTuple{4, Any}","page":"Nn","title":"BetaML.Nn.ConvLayer","text":"ConvLayer(\n input_size,\n kernel_size,\n nchannels_in,\n nchannels_out;\n stride,\n rng,\n padding,\n kernel_eltype,\n kernel_init,\n usebias,\n bias_init,\n f,\n df\n) -> ConvLayer{_A, _B, _C, typeof(identity), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}\n\n\nInstantiate a new nD-dimensional, possibly multichannel ConvLayer\n\nThe input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding, but always has nchannels_out as its last dimension. \n\nPositional arguments:\n\ninput_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.\nkernel_size: Size of the kernel (aka filter or learnable weights) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels).
Do not consider the channels number here.\nnchannels_in: Number of channels in input\nnchannels_out: Number of channels in output\n\nKeyword arguments:\n\nstride: \"Steps\" with which to move the convolution across the various tensor dimensions [def: ones]\npadding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep the same dimensions in output (with stride==1)]\nf: Activation function [def: relu]\ndf: Derivative of the activation function [default: try to match a known function, AD otherwise. Use nothing to force AD]\nkernel_eltype: Kernel eltype [def: Float64]\nkernel_init: Initial weights with respect to the input [default: Xavier initialisation]. If explicitly provided, it should be a multidimensional array of kernel_size augmented by nchannels_in and nchannels_out dimensions\nbias_init: Initial weights with respect to the bias [default: Xavier initialisation]. If given it should be a nchannels_out vector of scalars.\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nXavier initialization is sampled from a Uniform distribution between ± sqrt(6/(prod(input_size)*nchannels_in))\nto retrieve the output size of the layer, use size(ConvLayer)[2].
The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]\nwith strides higher than 1, the automatic padding is set to keep outsize = insize/stride\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.ConvLayer-Tuple{Any, Any, Any}","page":"Nn","title":"BetaML.Nn.ConvLayer","text":"ConvLayer(\n input_size_with_channel,\n kernel_size,\n nchannels_out;\n stride,\n rng,\n padding,\n kernel_eltype,\n kernel_init,\n usebias,\n bias_init,\n f,\n df\n) -> ConvLayer{_A, _B, _C, typeof(identity), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}\n\n\nAlternative constructor for a ConvLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that size(previous_layer)[2] can be used if one wishes.\n\nFor arguments and default values see the documentation of the main constructor.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.DenseLayer","page":"Nn","title":"BetaML.Nn.DenseLayer","text":"struct DenseLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a layer in the network\n\nFields:\n\nw: Weights matrix with respect to the input from previous layer or data (n x n pr.
layer)\nwb: Biases (n)\nf: Activation function\ndf: Derivative of the activation function\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.DenseLayer-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.DenseLayer","text":"DenseLayer(\n nₗ,\n n;\n rng,\n w_eltype,\n w,\n wb,\n f,\n df\n) -> DenseLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}\n\n\nInstantiate a new DenseLayer\n\nPositional arguments:\n\nnₗ: Number of nodes of the previous layer\nn: Number of nodes\n\nKeyword arguments:\n\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (n,nₗ)]\nwb: Initial weights with respect to bias [default: Xavier initialisation, dims = (n)]\nf: Activation function [def: identity]\ndf: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))\nSpecify df=nothing to explicitly use AD\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.DenseNoBiasLayer","page":"Nn","title":"BetaML.Nn.DenseNoBiasLayer","text":"struct DenseNoBiasLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a layer without bias in the network\n\nFields:\n\nw: Weights matrix with respect to the input from previous layer or data (n x n pr.
layer)\nf: Activation function\ndf: Derivative of the activation function\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.DenseNoBiasLayer-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.DenseNoBiasLayer","text":"DenseNoBiasLayer(\n nₗ,\n n;\n rng,\n w_eltype,\n w,\n f,\n df\n) -> DenseNoBiasLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}\n\n\nInstantiate a new DenseNoBiasLayer\n\nPositional arguments:\n\nnₗ: Number of nodes of the previous layer\nn: Number of nodes\n\nKeyword arguments:\n\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]\nf: Activation function [def: identity]\ndf: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.GroupedLayer","page":"Nn","title":"BetaML.Nn.GroupedLayer","text":"struct GroupedLayer <: AbstractLayer\n\nRepresentation of a \"group\" of layers, each of which operates on different inputs (features), acting together as a single layer in the network.\n\nFields:\n\nlayers: The individual layers that compose this grouped layer\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.GroupedLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.GroupedLayer","text":"GroupedLayer(layers) -> GroupedLayer\n\n\nInstantiate a new GroupedLayer, a layer made up of several other layers stacked together in order to cover all the data dimensions but without connecting all the inputs to all the outputs like a single DenseLayer would do.\n\nPositional arguments:\n\nlayers: The individual layers that compose this grouped layer\n\nNotes:\n\ncan be used to create composable neural networks with multiple branches\ntested only with 1 dimensional
layers. For convolutional networks use ReshaperLayers before and/or after.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.Learnable","page":"Nn","title":"BetaML.Nn.Learnable","text":"Learnable(data)\n\nStructure representing the learnable parameters of a layer or its gradient.\n\nThe learnable parameters of a layer are given in the form of an N-tuple of Array{Float64,N2} where N2 can change (e.g. we can have a layer with the first parameter being a matrix, and the second one being a scalar). We wrap the tuple in its own structure partly for some efficiency gain, but above all to define standard mathematical operations on the gradients without doing \"type piracy\" with respect to Base tuples.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.NeuralNetworkE_hp","page":"Nn","title":"BetaML.Nn.NeuralNetworkE_hp","text":"mutable struct NeuralNetworkE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the Feedforward neural network model\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, using dropdims inside if needed.\n\ndloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit!
call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\nTo list the available layers, type subtypes(AbstractLayer) and then type ?LayerName for information on how to use each layer.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.NeuralNetworkE_options","page":"Nn","title":"BetaML.Nn.NeuralNetworkE_options","text":"NeuralNetworkE_options\n\nA struct defining the options used by the Feedforward neural network model\n\nParameters:\n\ncache: Cache the results of the fitting stage, so as to allow predict(mod) [default: true]. Set it to false to save memory for large data.\ndescr: An optional title and/or description for this model\nverbosity: The verbosity level to be used in training or prediction (see Verbosity) [default: STD]\ncb: A callback function to provide information during training [def: fitting_info]\nautotune: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call.
Control auto-tuning through the option tunemethod (see the model hyper-parameters)\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.NeuralNetworkEstimator","page":"Nn","title":"BetaML.Nn.NeuralNetworkEstimator","text":"NeuralNetworkEstimator\n\nA \"feedforward\" (but also multi-branch) neural network (supervised).\n\nFor the parameters see NeuralNetworkE_hp and for the training options NeuralNetworkE_options (we have a few more options for this specific estimator).\n\nNotes:\n\ndata must be numerical\nthe label can be a n-records vector or a n-records by n-dimensions matrix, but the result is always a matrix.\nFor one-dimension regressions drop the unnecessary dimension with dropdims(ŷ,dims=2)\nFor classification tasks the columns should normally be interpreted as the probabilities for each category\n\nExamples:\n\nClassification...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> ohmod = OneHotEncoder()\nA OneHotEncoder BetaMLModel (unfitted)\n\njulia> y_oh = fit!(ohmod,y)\n6×2 Matrix{Bool}:\n 1 0\n 0 1\n 0 1\n 0 1\n 0 1\n 1 0\n\njulia> layers = [DenseLayer(2,6),DenseLayer(6,2),VectorFunctionLayer(2,f=softmax)];\n\njulia> m = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=300,verbosity=LOW)\nNeuralNetworkEstimator - A Feed-forward neural network (unfitted)\n\njulia> ŷ_prob = fit!(m,X,y_oh)\n***\n*** Training for 300 epochs with algorithm ADAM.\nTraining.. avg ϵ on (Epoch 1 Batch 1): 0.4116936481380642\nTraining of 300 epoch completed.
Final epoch error: 0.44308719831108734.\n6×2 Matrix{Float64}:\n 0.853198 0.146802\n 0.0513715 0.948629\n 0.0894273 0.910573\n 0.0367079 0.963292\n 0.00548038 0.99452\n 0.808334 0.191666\n\njulia> ŷ = inverse_predict(ohmod,ŷ_prob)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\nRegression...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = 2 .* X[:,1] .- X[:,2] .+ 3;\n\njulia> layers = [DenseLayer(2,6),DenseLayer(6,6),DenseLayer(6,1)];\n\njulia> m = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=3000,verbosity=LOW)\nNeuralNetworkEstimator - A Feed-forward neural network (unfitted)\n\njulia> ŷ = fit!(m,X,y);\n***\n*** Training for 3000 epochs with algorithm ADAM.\nTraining.. avg ϵ on (Epoch 1 Batch 1): 33.30063874270561\nTraining of 3000 epoch completed. Final epoch error: 34.61265465430473.\n\njulia> hcat(y,ŷ)\n6×2 Matrix{Float64}:\n 4.1 4.11015\n -16.5 -16.5329\n -13.8 -13.8381\n -18.4 -18.3876\n -27.2 -27.1667\n 2.7 2.70542\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.PoolingLayer","page":"Nn","title":"BetaML.Nn.PoolingLayer","text":"struct PoolingLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a pooling layer in the network (weightless)\n\nEXPERIMENTAL: Still too slow for practical applications\n\nIn the middle between VectorFunctionLayer and ScalarFunctionLayer, it applyes a function to the set of nodes defined in a sliding kernel.\n\nFields:\n\ninput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)\noutput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)\nkernel_size::StaticArraysCore.SVector{NDPLUS2, Int64} where NDPLUS2: kernelsize augmented by the nchannelsin and nchannels_out dimensions\npadding_start::StaticArraysCore.SVector{ND, 
Int64} where ND: Padding (initial)\npadding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)\nstride::StaticArraysCore.SVector{ND, Int64} where ND: Stride\nndims::Int64: Number of dimensions (excluding input and output channels)\nf::Function: Activation function\ndf::Union{Nothing, Function}: Derivative of the activation function\ny_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.PoolingLayer-Tuple{Any, Any, Any}","page":"Nn","title":"BetaML.Nn.PoolingLayer","text":"PoolingLayer(\n input_size,\n kernel_size,\n nchannels_in;\n stride,\n kernel_eltype,\n padding,\n f,\n df\n) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}\n\n\nInstantiate a new nD-dimensional, possibly multichannel PoolingLayer\n\nThe input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding, but always has nchannels_out as its last dimension. \n\nPositional arguments:\n\ninput_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.\nkernel_eltype: Kernel eltype [def: Float64]\nkernel_size: Size of the kernel (aka filter) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.\nnchannels_in: Number of channels in input\nnchannels_out: Number of channels in output\n\nKeyword arguments:\n\nstride: \"Steps\" with which to move the convolution across the various tensor dimensions [def: kernel_size, i.e. each X contributes to a single y]\npadding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e.
set the padding required to keep outsize = insize / stride]\nf: Activation function. It should have a vector as input and produce a scalar as output [def: maximum]\ndf: Derivative (gradient) of the activation function for the various inputs. [default: nothing (i.e. use AD)]\n\nNotes:\n\nto retrieve the output size of the layer, use size(PoolingLayer)[2]. The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]\ndifferently from a ConvLayer, the pooling always applies at the level of a single channel, so that the output always has the same number of channels as the input. If you want to reduce the channels number either use a ConvLayer with the desired number of channels in output or use a ReshaperLayer to add a 1-element further dimension that will be treated as \"channel\" and choose the desired stride for the last pooling dimension (the one that was originally the channel dimension) \n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.PoolingLayer-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.PoolingLayer","text":"PoolingLayer(\n input_size_with_channel,\n kernel_size;\n stride,\n padding,\n f,\n kernel_eltype,\n df\n) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}\n\n\nAlternative constructor for a PoolingLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that size(previous_layer)[2] can be used if one wishes.\n\nFor arguments and default values see the documentation of the main constructor.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.ReshaperLayer","page":"Nn","title":"BetaML.Nn.ReshaperLayer","text":"struct ReshaperLayer{NDIN, NDOUT} <: AbstractLayer\n\nRepresentation of a \"reshaper\" (weightless) layer in the network\n\nReshape the output of a
layer (or the input data) to the shape needed for the next one.\n\nFields:\n\ninput_size::StaticArraysCore.SVector{NDIN, Int64} where NDIN: Input size\noutput_size::StaticArraysCore.SVector{NDOUT, Int64} where NDOUT: Output size\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ReshaperLayer-2","page":"Nn","title":"BetaML.Nn.ReshaperLayer","text":"ReshaperLayer(\n input_size\n) -> ReshaperLayer{_A, _B} where {_A, _B}\nReshaperLayer(\n input_size,\n output_size\n) -> ReshaperLayer{_A, _B} where {_A, _B}\n\n\nInstantiate a new ReshaperLayer\n\nPositional arguments:\n\ninput_size: Shape of the input layer (tuple).\noutput_size: Shape of the output layer (tuple) [def: prod([input_size...]), i.e. reshape to a vector of appropriate length].\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.SGD","page":"Nn","title":"BetaML.Nn.SGD","text":"SGD(;η=t -> 1/(1+t), λ=2)\n\nStochastic Gradient Descent algorithm (default)\n\nFields:\n\nη: Learning rate, as a function of the current epoch [def: t -> 1/(1+t)]\nλ: Multiplicative constant to the learning rate [def: 2]\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ScalarFunctionLayer","page":"Nn","title":"BetaML.Nn.ScalarFunctionLayer","text":"struct ScalarFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a ScalarFunction layer in the network. ScalarFunctionLayer applies the activation function directly to the output of the previous layer (i.e., without passing through a weight matrix), but using an optional learnable parameter (an array) used as second argument, similarly to VectorFunctionLayer. Differently from VectorFunctionLayer, the function is applied scalarwise to each node.
\n\nThe number of nodes in input must be set to the same as in the previous layer\n\nFields:\n\nw: Weights (parameter) array passed as second argument to the activation function (if not empty)\nn: Number of nodes in output (≡ number of nodes in input)\nf: Activation function (vector)\ndfx: Derivative of the (vector) activation function with respect to the layer inputs (x)\ndfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w) \n\nNotes:\n\nThe output size of this layer is the same as that of the previous layer.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ScalarFunctionLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.ScalarFunctionLayer","text":"ScalarFunctionLayer(\n nₗ;\n rng,\n wsize,\n w_eltype,\n w,\n f,\n dfx,\n dfw\n) -> ScalarFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}\n\n\nInstantiate a new ScalarFunctionLayer\n\nPositional arguments:\n\nnₗ: Number of nodes (must be same as in the previous layer)\n\nKeyword arguments:\n\nwsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]\nf: Activation function [def: softmax]\ndfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]\ndfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nIf the derivative is provided, it should return the gradient as a (n,n) matrix (i.e.
the Jacobian)\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.VectorFunctionLayer","page":"Nn","title":"BetaML.Nn.VectorFunctionLayer","text":"struct VectorFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a VectorFunction layer in the network. Vector function layer expects a vector activation function, i.e. a function taking the whole output of the previous layer as input rather than working on a single node as \"normal\" activation functions would do. Useful for example with the SoftMax function in classification or with the pool1D function to implement a \"pool\" layer in 1 dimension. By default it is weightless, i.e. it doesn't apply any transformation to the output coming from the previous layer except the activation function. However, by passing the parameter wsize (a tuple or array - tested only 1D) you can pass the learnable parameter to the activation function too. It is your responsibility to be sure the activation function accepts only X or also this learnable array (as second argument). The number of nodes in input must be set to the same as in the previous layer (and if you are using this for classification, to the number of classes, i.e. the previous layer must be set equal to the number of classes in the predictions).\n\nFields:\n\nw: Weights (parameter) array passed as second argument to the activation function (if not empty)\nnₗ: Number of nodes in input (i.e. 
length of previous layer)\nn: Number of nodes in output (automatically inferred in the constructor)\nf: Activation function (vector)\ndfx: Derivative of the (vector) activation function with respect to the layer inputs (x)\ndfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w) \n\nNotes:\n\nThe output size of this layer is given by the size of the output function, which is not necessarily the same as that of the previous layer.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.VectorFunctionLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.VectorFunctionLayer","text":"VectorFunctionLayer(\n nₗ;\n rng,\n wsize,\n w_eltype,\n w,\n f,\n dfx,\n dfw,\n dummyDataToTestOutputSize\n) -> VectorFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}\n\n\nInstantiate a new VectorFunctionLayer\n\nPositional arguments:\n\nnₗ: Number of nodes (must be same as in the previous layer)\n\nKeyword arguments:\n\nwsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]\nf: Activation function [def: softmax]\ndfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]\ndfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]\ndummyDataToTestOutputSize: Dummy data to test the output size [def: ones(nₗ)]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nIf the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. 
the Jacobian)\nTo avoid recomputing the activation function just to determine its output size, we compute the output size once here in the layer constructor by calling the activation function with dummyDataToTestOutputSize. Feel free to change it if it doesn't match the activation function you are setting\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#Base.size-Tuple{AbstractLayer}","page":"Nn","title":"Base.size","text":"size(layer)\n\nGet the size of the layers in terms of (size in input, size in output) - both as tuples\n\nNotes:\n\nYou need to use import Base.size before defining this function for your layer\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#Base.size-Tuple{ConvLayer}","page":"Nn","title":"Base.size","text":"size(layer::ConvLayer) -> Tuple{Tuple, Tuple}\n\n\nGet the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#Base.size-Union{Tuple{PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}}, Tuple{NDPLUS2}, Tuple{NDPLUS1}, Tuple{ND}} where {ND, NDPLUS1, NDPLUS2}","page":"Nn","title":"Base.size","text":"size(\n layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}\n) -> Tuple{Tuple, Tuple}\n\n\nGet the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.ReplicatorLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.ReplicatorLayer","text":"ReplicatorLayer(\n n\n) -> ScalarFunctionLayer{_A, typeof(identity), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}\n\n\nCreate a weightless layer whose output is equal to the input. 
\n\nFields:\n\nn: Number of nodes in output (≡ number of nodes in input) \n\nNotes:\n\nThe output size of this layer is the same as that of the previous layer.\nThis is just an alias for a ScalarFunctionLayer with no weights and identity function.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.backward-Tuple{AbstractLayer, Any, Any}","page":"Nn","title":"BetaML.Nn.backward","text":"backward(layer,x,next_gradient)\n\nCompute backpropagation for this layer with respect to its inputs\n\nParameters:\n\nlayer: Worker layer\nx: Input to the layer\nnext_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)\n\nReturn:\n\nThe evaluated gradient of the loss with respect to this layer inputs\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.fitting_info-NTuple{5, Any}","page":"Nn","title":"BetaML.Nn.fitting_info","text":"fitting_info(nn,xbatch,ybatch,x,y;n,batch_size,epochs,epochs_ran,verbosity,n_epoch,n_batch)\n\nDefault callback function to display information during training, depending on the verbosity level\n\nParameters:\n\nnn: Worker network\nxbatch: Batch input to the network (batch_size,din)\nybatch: Batch label input (batch_size,dout)\nx: Full input to the network (n_records,din)\ny: Full label input (n_records,dout)\nn: Size of the full training set\nn_batches: Number of batches per epoch\nepochs: Number of epochs defined for the training\nepochs_ran: Number of epochs already ran in previous training sessions\nverbosity: Verbosity level defined for the training (NONE,LOW,STD,HIGH,FULL)\nn_epoch: Counter of the current epoch\nn_batch: Counter of the current batch\n\nNotes:\n\nReporting of the error (loss of the network) is expensive. 
Use verbosity=NONE for better performance\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.forward-Tuple{AbstractLayer, Any}","page":"Nn","title":"BetaML.Nn.forward","text":"forward(layer,x)\n\nPredict the output of the layer given the input\n\nParameters:\n\nlayer: Worker layer\nx: Input to the layer\n\nReturn:\n\nAn Array{T,1} of the prediction (even for a scalar)\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.forward-Union{Tuple{WET}, Tuple{TDF}, Tuple{TF}, Tuple{NDPLUS2}, Tuple{NDPLUS1}, Tuple{ND}, Tuple{ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET}, Any}} where {ND, NDPLUS1, NDPLUS2, TF, TDF, WET}","page":"Nn","title":"BetaML.Nn.forward","text":"forward(\n layer::ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},\n x\n) -> Any\n\n\nCompute forward pass of a ConvLayer\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.forward-Union{Tuple{WET}, Tuple{TDF}, Tuple{TF}, Tuple{NDPLUS2}, Tuple{NDPLUS1}, Tuple{ND}, Tuple{PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET}, Any}} where {ND, NDPLUS1, NDPLUS2, TF, TDF, WET}","page":"Nn","title":"BetaML.Nn.forward","text":"forward(\n layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},\n x\n) -> Any\n\n\nCompute forward pass of a PoolingLayer\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_gradient-Tuple{AbstractLayer, Any, Any}","page":"Nn","title":"BetaML.Nn.get_gradient","text":"get_gradient(layer,x,next_gradient)\n\nCompute backpropagation for this layer with respect to the layer weights\n\nParameters:\n\nlayer: Worker layer\nx: Input to the layer\nnext_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)\n\nReturn:\n\nThe evaluated gradient of the loss with respect to this layer's trainable parameters as tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_params() and set_params() functions. 
Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_gradient-Union{Tuple{N2}, Tuple{N1}, Tuple{T2}, Tuple{T}, Tuple{BetaML.Nn.NN, Union{AbstractArray{T, N1}, T}, Union{AbstractArray{T2, N2}, T2}}} where {T<:Number, T2<:Number, N1, N2}","page":"Nn","title":"BetaML.Nn.get_gradient","text":"get_gradient(nn,x,y)\n\nLow level function that retrieves the current gradient of the weights (i.e. derivative of the cost with respect to the weights). Unexported in BetaML >= v0.9\n\nParameters:\n\nnn: Worker network\nx: Input to the network (d,1)\ny: Label input (d,1)\n\nNotes:\n\nThe output is a vector of tuples of each layer's input weights and bias weights\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_params-Tuple{AbstractLayer}","page":"Nn","title":"BetaML.Nn.get_params","text":"get_params(layer)\n\nGet the current value of the layer's trainable parameters\n\nParameters:\n\nlayer: Worker layer\n\nReturn:\n\nThe current value of the layer's trainable parameters as tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_gradient() and set_params() functions. 
Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_params-Tuple{BetaML.Nn.NN}","page":"Nn","title":"BetaML.Nn.get_params","text":"get_params(nn)\n\nRetrieve current weights\n\nParameters:\n\nnn: Worker network\n\nNotes:\n\nThe output is a vector of tuples of each layer's input weights and bias weights\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.init_optalg!-Tuple{ADAM}","page":"Nn","title":"BetaML.Nn.init_optalg!","text":"init_optalg!(opt_alg::ADAM;θ,batch_size,x,y,rng)\n\nInitialize the ADAM algorithm with the parameters m and v as zeros and check parameter bounds\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.init_optalg!-Tuple{BetaML.Nn.OptimisationAlgorithm}","page":"Nn","title":"BetaML.Nn.init_optalg!","text":"init_optalg!(opt_alg;θ,batch_size,x,y)\n\nInitialize the optimisation algorithm\n\nParameters:\n\nopt_alg: The Optimisation algorithm to use\nθ: Current parameters\nbatch_size: The size of the batch\nx: The training (input) data\ny: The training \"labels\" to match\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nOnly a few optimizers need this function and consequently override it. By default it does nothing, so if you want to write your own optimizer and don't need to initialise it, you don't have to override this method\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.preprocess!-Tuple{AbstractLayer}","page":"Nn","title":"BetaML.Nn.preprocess!","text":"preprocess!(layer::AbstractLayer)\n\n\nPreprocess the layer with information known at layer creation (i.e. no data info used)\n\nThis function is used for some layers to cache some computation that doesn't require the data and it is called at the beginning of fit!. 
For example, it is used in ConvLayer to store the ids of the convolution.\n\nNotes:\n\nas it doesn't depend on data, it is not reset by reset!\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.set_params!-Tuple{AbstractLayer, Any}","page":"Nn","title":"BetaML.Nn.set_params!","text":"set_params!(layer,w)\n\nSet the trainable parameters of the layer with the given values\n\nParameters:\n\nlayer: Worker layer\nw: The new parameters to set (Learnable)\n\nNotes:\n\nThe format of the tuple wrapped by Learnable must be consistent with those of the get_params() and get_gradient() functions.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.set_params!-Tuple{BetaML.Nn.NN, Any}","page":"Nn","title":"BetaML.Nn.set_params!","text":"set_params!(nn,w)\n\nUpdate weights of the network\n\nParameters:\n\nnn: Worker network\nw: The new weights to set\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.single_update!-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.single_update!","text":"single_update!(θ,▽;n_epoch,n_batch,batch_size,xbatch,ybatch,opt_alg)\n\nPerform the parameters update based on the average batch gradient.\n\nParameters:\n\nθ: Current parameters\n▽: Average gradient of the batch\nn_epoch: Count of current epoch\nn_batch: Count of current batch\nn_batches: Number of batches per epoch\nxbatch: Data associated to the current batch\nybatch: Labels associated to the current batch\nopt_alg: The Optimisation algorithm to use for the update\n\nNotes:\n\nThis function is overridden so that each optimisation algorithm implements its own version\n\nMost parameters are not used by any optimisation algorithm. 
They are provided to support the largest possible class of optimisation algorithms\n\nSome optimisation algorithms may change their internal structure in this function\n\n\n\n\n\n","category":"method"},{"location":"Trees.html#trees_module","page":"Trees","title":"The BetaML.Trees Module","text":"","category":"section"},{"location":"Trees.html","page":"Trees","title":"Trees","text":"Trees","category":"page"},{"location":"Trees.html#BetaML.Trees","page":"Trees","title":"BetaML.Trees","text":"BetaML.Trees module\n\nImplements the DecisionTreeEstimator and RandomForestEstimator models (Decision Trees and Random Forests).\n\nBoth Decision Trees and Random Forests can be used for regression or classification problems, based on the type of the labels (numerical or not). The automatic selection can be overridden with the parameter force_classification=true, typically if labels are integers representing some categories rather than numbers. For classification problems the output of predict is a dictionary with the keys being the labels with non-zero probability and the corresponding values their probabilities; for regression it is a numerical value.\n\nPlease be aware that, unlike most other implementations, the Random Forest algorithm collects and averages the probabilities from the trees, rather than just reporting the mode, i.e. no information is lost and the output of the forest classifier is still a PMF.\n\nTo retrieve the prediction with the highest probability use mode over the prediction returned by the model. Most error/accuracy measures in the Utils BetaML module work directly with this format.\n\nMissing data and truly unordered types are supported on the features, both on training and on prediction.\n\nThe module provides the following functions. 
Use ?[type or function] to access their full signature and detailed documentation:\n\nFeatures are expected to be in the standard format (nRecords × nDimensions matrices) and the labels (either categorical or numerical) as a nRecords column vector.\n\nAcknowledgments: originally based on Josh Gordon's code\n\n\n\n\n\n","category":"module"},{"location":"Trees.html#Module-Index","page":"Trees","title":"Module Index","text":"","category":"section"},{"location":"Trees.html","page":"Trees","title":"Trees","text":"Modules = [Trees]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Trees.html#Detailed-API","page":"Trees","title":"Detailed API","text":"","category":"section"},{"location":"Trees.html","page":"Trees","title":"Trees","text":"Modules = [Trees]\nPrivate = false","category":"page"},{"location":"Trees.html#BetaML.Trees.DecisionNode","page":"Trees","title":"BetaML.Trees.DecisionNode","text":"DecisionNode(question,trueBranch,falseBranch, depth)\n\nA tree's non-terminal node.\n\nConstructor's arguments and struct members:\n\nquestion: The question asked in this node\ntrueBranch: A reference to the \"true\" branch of the tree\nfalseBranch: A reference to the \"false\" branch of the tree\ndepth: The node's depth in the tree\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.DecisionTreeE_hp","page":"Trees","title":"BetaML.Trees.DecisionTreeE_hp","text":"mutable struct DecisionTreeE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for DecisionTreeEstimator (Decision Tree).\n\nParameters:\n\nmax_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]\nmax_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. look at all features]\nforce_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]\nsplitting_criterion::Union{Nothing, Function}: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. Either gini, entropy, variance or a custom function. It can also be an anonymous function.\nfast_algorithm::Bool: Use an experimental faster algorithm for looking up the best split in ordered fields (columns). Currently it brings down the fitting time by an order of magnitude, but predictions are noticeably affected. If used, control the meaning of integer fields with integer_encoded_cols.\ninteger_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of columns positions to specify which integer columns should be treated as encoding of categorical variables instead of ordered classes/values. [def: nothing, integer columns with fewer than 20 unique values are considered categorical]. Useful in conjunction with fast_algorithm, little difference otherwise.\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. 
To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.DecisionTreeEstimator","page":"Trees","title":"BetaML.Trees.DecisionTreeEstimator","text":"mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel\n\nA Decision Tree classifier and regressor (supervised).\n\nDecision Tree works by finding the \"best\" question to split the fitting data (according to the metric specified by the parameter splitting_criterion on the associated labels) until either the whole dataset is separated or a terminal condition is reached. \n\nFor the parameters see ?DecisionTreeE_hp and ?BML_options.\n\nNotes:\n\nOnline fitting (re-fitting with new data) is not supported\nMissing data (in the feature dataset) is supported.\n\nExamples:\n\nClassification...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = DecisionTreeEstimator(max_depth=5)\nDecisionTreeEstimator - A Decision Tree model (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\njulia> println(mod)\nDecisionTreeEstimator - A Decision Tree classifier (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 0, \"fitted_records\" => 6, \"max_reached_depth\" => 2, \"avg_depth\" => 2.0, \"xndims\" => 2)\n*** Printing Decision Tree: ***\n\n1. 
Is col 2 >= 18.0 ?\n--> True : Dict(\"b\" => 1.0)\n--> False: Dict(\"a\" => 1.0)\n\nRegression...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = 2 .* X[:,1] .- X[:,2] .+ 3;\n\njulia> mod = DecisionTreeEstimator(max_depth=10)\nDecisionTreeEstimator - A Decision Tree model (unfitted)\n\njulia> ŷ = fit!(mod,X,y);\n\njulia> hcat(y,ŷ)\n6×2 Matrix{Float64}:\n 4.1 3.4\n -16.5 -17.45\n -13.8 -13.8\n -18.4 -17.45\n -27.2 -27.2\n 2.7 3.4\n\njulia> println(mod)\nDecisionTreeEstimator - A Decision Tree regressor (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 1, \"fitted_records\" => 6, \"max_reached_depth\" => 4, \"avg_depth\" => 3.25, \"xndims\" => 2)\n*** Printing Decision Tree: ***\n\n1. Is col 2 >= 18.0 ?\n--> True :\n 1.2. Is col 2 >= 31.0 ?\n --> True : -27.2\n --> False:\n 1.2.3. Is col 2 >= 20.5 ?\n --> True : -17.450000000000003\n --> False: -13.8\n--> False: 3.3999999999999995\n\nVisualisation...\n\nYou can either text-print or plot a decision tree using the AbstractTree and TreeRecipe package..\n\njulia> println(mod)\nDecisionTreeEstimator - A Decision Tree regressor (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 1, \"fitted_records\" => 6, \"max_reached_depth\" => 4, \"avg_depth\" => 3.25, \"xndims\" => 2)\n*** Printing Decision Tree: ***\n\n1. Is col 2 >= 18.0 ?\n--> True :\n 1.2. Is col 2 >= 31.0 ?\n --> True : -27.2\n --> False:\n 1.2.3. 
Is col 2 >= 20.5 ?\n --> True : -17.450000000000003\n --> False: -13.8\n--> False: 3.3999999999999995\n\njulia> using Plots, TreeRecipe, AbstractTrees\njulia> featurenames = [\"Something\", \"Som else\"];\njulia> wrapped_tree = wrapdn(dtree, featurenames = featurenames); # featurenames is optional\njulia> print_tree(wrapped_tree)\nSom else >= 18.0?\n├─ Som else >= 31.0?\n│ ├─ -27.2\n│ │ \n│ └─ Som else >= 20.5?\n│ ├─ -17.450000000000003\n│ │ \n│ └─ -13.8\n│ \n└─ 3.3999999999999995\njulia> plot(wrapped_tree) \n\n(Image: DT plot) \n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.InfoNode","page":"Trees","title":"BetaML.Trees.InfoNode","text":"These types are introduced so that additional information currently not present in a DecisionTree-structure – namely the feature names – can be used for visualization.\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.Leaf","page":"Trees","title":"BetaML.Trees.Leaf","text":"Leaf(y,depth)\n\nA tree's leaf (terminal) node.\n\nConstructor's arguments:\n\ny: The labels associated to each record (either numerical or categorical)\ndepth: The node's depth in the tree\n\nStruct members:\n\npredictions: Either the relative label's count (i.e. a PMF) or the mean\ndepth: The node's depth in the tree\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.RandomForestE_hp","page":"Trees","title":"BetaML.Trees.RandomForestE_hp","text":"mutable struct RandomForestE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for RandomForestEstimator (Random Forest).\n\nParameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]\nmax_features::Union{Nothing, Int64}: The maximum number of (random) features to consider when choosing the optimal partition of the dataset [def: nothing, i.e. square root of the dimensions of the training data]\nforce_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]\nsplitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.\nfast_algorithm::Bool: Use an experimental faster algorithm for looking up the best split in ordered fields (columns). Currently it brings down the fitting time by an order of magnitude, but predictions are noticeably affected. If used, control the meaning of integer fields with integer_encoded_cols.\ninteger_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of columns positions to specify which integer columns should be treated as encoding of categorical variables instead of ordered classes/values. [def: nothing, integer columns with fewer than 20 unique values are considered categorical]. 
Useful in conjunction with fast_algorithm, little difference otherwise.\nbeta::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction based on the error of the individual trees computed on the records on which trees have not been trained. Higher values favour \"better\" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]\noob::Bool: Whether to compute the Out-Of-Bag error, an estimation of the validation error (the mismatching error for classification and the relative mean error for regression jobs).\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.RandomForestEstimator","page":"Trees","title":"BetaML.Trees.RandomForestEstimator","text":"mutable struct RandomForestEstimator <: BetaMLSupervisedModel\n\nA Random Forest classifier and regressor (supervised).\n\nRandom forests are ensembles of Decision Tree models (see ?DecisionTreeEstimator).\n\nFor the parameters see ?RandomForestE_hp and ?BML_options.\n\nNotes :\n\nEach individual decision tree is built using bootstrap over the data, i.e. \"sampling N records with replacement\" (hence, some records appear multiple times and some records do not appear in the specific tree training). 
The max_features parameter injects further variability and reduces the correlation between the forest trees.\nThe predictions of the \"forest\" (using the function predict()) are then the aggregated predictions of the individual trees (from which the name \"bagging\": bootstrap aggregating).\nThe performance of each individual tree, as measured using the records it has not been trained with, can then be (optionally) used as weights in the predict function. The parameter beta ≥ 0 regulates the distribution of these weights: the larger β is, the greater the importance (hence the weights) attached to the best-performing trees compared to the low-performing ones. Using these weights can significantly improve the forest performance (especially using small forests), however the correct value of beta depends on the problem under examination (and the chosen characteristics of the random forest estimator) and should be cross-validated to avoid over-fitting.\nNote that training RandomForestEstimator uses multiple threads if these are available. You can check the number of threads available with Threads.nthreads(). 
To set the number of threads in Julia either set the environmental variable JULIA_NUM_THREADS (before starting Julia) or start Julia with the command line option --threads (most integrated development editors for Julia already set the number of threads to 4).\nOnline fitting (re-fitting with new data) is not supported\nMissing data (in the feature dataset) is supported.\n\nExamples:\n\nClassification...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = RandomForestEstimator(n_trees=5)\nRandomForestEstimator - A 5 trees Random Forest model (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\njulia> println(mod)\nRandomForestEstimator - A 5 trees Random Forest classifier (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 0, \"avg_avg_depth\" => 1.8, \"fitted_records\" => 6, \"avg_mmax_reached_depth\" => 1.8, \"oob_errors\" => Inf, \"xndims\" => 2)\n\nRegression...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = 2 .* X[:,1] .- X[:,2] .+ 3;\n\njulia> mod = RandomForestEstimator(n_trees=5)\nRandomForestEstimator - A 5 trees Random Forest model (unfitted)\n\njulia> ŷ = fit!(mod,X,y);\n\njulia> hcat(y,ŷ)\n6×2 Matrix{Float64}:\n 4.1 2.98\n -16.5 -18.37\n -13.8 -14.61\n -18.4 -17.37\n -27.2 -20.78\n 2.7 2.98\n\njulia> println(mod)\nRandomForestEstimator - A 5 trees Random Forest regressor (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 1, \"fitted_records\" => 6, \"avg_avg_depth\" => 2.8833333333333333, \"oob_errors\" => Inf, \"avg_max_reached_depth\" => 3.4, \"xndims\" => 2)\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#utils_module","page":"Utils","title":"The BetaML.Utils 
Module","text":"","category":"section"},{"location":"Utils.html","page":"Utils","title":"Utils","text":"Utils\n","category":"page"},{"location":"Utils.html#BetaML.Utils","page":"Utils","title":"BetaML.Utils","text":"Utils module\n\nProvide shared utility functions and/or models for various machine learning algorithms.\n\nFor the complete list of functions provided see below. The main ones are:\n\nHelper functions for logging\n\nMost BetaML functions accept a parameter verbosity (choose between NONE, LOW, STD, HIGH or FULL)\nWriting complex code and need to find where something is executed? Use the macro @codelocation\n\nStochasticity management\n\nUtils provides FIXEDSEED, FIXEDRNG and generate_parallel_rngs. All stochastic functions and models accept a rng parameter. See the \"Getting started\" section in the tutorial for details.\n\nData processing\n\nVarious small and large utilities to help process the data, especially before running a ML algorithm\nIncludes getpermutations, OneHotEncoder, OrdinalEncoder, partition, Scaler, PCAEncoder, AutoEncoder, cross_validation.\nAuto-tuning of hyperparameters is implemented in the supported models by specifying autotune=true and optionally overriding the tunemethod parameters (e.g. for different hyperparameters ranges or different resources available for the tuning). Autotuning is then implemented in the (first) fit! call. Provided autotuning methods: SuccessiveHalvingSearch (default), GridSearch\n\nSamplers\n\nUtilities to sample from data (e.g. 
for neural network training or for cross-validation)\nIncludes the \"generic\" type SamplerWithData, together with the sampler implementation KFold and the function batch\n\nTransformers\n\nFunctions that \"transform\" a single input (which can also be a vector or a matrix)\nIncludes various NN \"activation\" functions (relu, celu, sigmoid, softmax, pool1d) and their derivatives (d[FunctionName]), but also gini, entropy, variance, BIC, AIC\n\nMeasures\n\nSeveral functions of a pair of parameters (often y and ŷ) to measure the goodness of ŷ, the distance between the two elements of the pair, ...\nIncludes \"classical\" distance functions (l1_distance, l2_distance, l2squared_distance, cosine_distance), \"cost\" functions for continuous variables (squared_cost, relative_mean_error) and comparison functions for multi-class variables (crossentropy, accuracy, ConfusionMatrix, silhouette)\nDistances can be used to compute a pairwise distance matrix using the function pairwise\n\n\n\n\n\n","category":"module"},{"location":"Utils.html#Module-Index","page":"Utils","title":"Module Index","text":"","category":"section"},{"location":"Utils.html","page":"Utils","title":"Utils","text":"Modules = [Utils]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Utils.html#Detailed-API","page":"Utils","title":"Detailed API","text":"","category":"section"},{"location":"Utils.html","page":"Utils","title":"Utils","text":"Modules = [Utils]\nPrivate = false","category":"page"},{"location":"Utils.html#BetaML.Utils.AutoE_hp","page":"Utils","title":"BetaML.Utils.AutoE_hp","text":"mutable struct AutoE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the AutoEncoder transformer\n\nParameters\n\nencoded_size: The desired size of the encoded data, that is the number of dimensions in output or the size of the latent space. This is the number of neurons of the layer sitting between the encoding and decoding layers. 
If the value is a float it is treated as a fraction of the dimensionality of the data (the result is rounded) [def: 0.33]\nlayers_size: Inner layers dimension (i.e. number of neurons). If the value is a float it is treated as a fraction of the dimensionality of the data (the result is rounded) [def: nothing, i.e. a specific heuristic is applied]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.\ne_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]\nd_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]\nloss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.\n\ndloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 8]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameter autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! 
call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.AutoEncoder","page":"Utils","title":"BetaML.Utils.AutoEncoder","text":"mutable struct AutoEncoder <: BetaMLUnsupervisedModel\n\nPerform a (possibly-non linear) transformation (\"encoding\") of the data into a different space, e.g. for dimensionality reduction using neural network trained to replicate the input data.\n\nA neural network is trained to first transform the data (ofter \"compress\") to a subspace (the output of an inner layer) and then retransform (subsequent layers) to the original data.\n\npredict(mod::AutoEncoder,x) returns the encoded data, inverse_predict(mod::AutoEncoder,xtransformed) performs the decoding.\n\nFor the parameters see AutoE_hp and BML_options \n\nNotes:\n\nAutoEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it. \nMissing data are not supported. Impute them first, see the Imputation module.\nDecoding layers can be optinally choosen (parameter d_layers) in order to suit the kind of data, e.g. a relu activation function for nonegative data\n\nExample:\n\njulia> using BetaML\n\njulia> x = [0.12 0.31 0.29 3.21 0.21;\n 0.22 0.61 0.58 6.43 0.42;\n 0.51 1.47 1.46 16.12 0.99;\n 0.35 0.93 0.91 10.04 0.71;\n 0.44 1.21 1.18 13.54 0.85];\n\njulia> m = AutoEncoder(encoded_size=1,epochs=400)\nA AutoEncoder BetaMLModel (unfitted)\n\njulia> x_reduced = fit!(m,x)\n***\n*** Training for 400 epochs with algorithm ADAM.\nTraining.. avg loss on epoch 1 (1): 60.27802763757111\nTraining.. avg loss on epoch 200 (200): 0.08970099870421573\nTraining.. avg loss on epoch 400 (400): 0.013138484118673664\nTraining of 400 epoch completed. 
Final epoch error: 0.013138484118673664.\n5×1 Matrix{Float64}:\n -3.5483740608901186\n -6.90396890458868\n -17.06296512222304\n -10.688936344498398\n -14.35734756603212\n\njulia> x̂ = inverse_predict(m,x_reduced)\n5×5 Matrix{Float64}:\n 0.0982406 0.110294 0.264047 3.35501 0.327228\n 0.205628 0.470884 0.558655 6.51042 0.487416\n 0.529785 1.56431 1.45762 16.067 0.971123\n 0.3264 0.878264 0.893584 10.0709 0.667632\n 0.443453 1.2731 1.2182 13.5218 0.842298\n\njulia> info(m)[\"rme\"]\n0.020858783340281222\n\njulia> hcat(x,x̂)\n5×10 Matrix{Float64}:\n 0.12 0.31 0.29 3.21 0.21 0.0982406 0.110294 0.264047 3.35501 0.327228\n 0.22 0.61 0.58 6.43 0.42 0.205628 0.470884 0.558655 6.51042 0.487416\n 0.51 1.47 1.46 16.12 0.99 0.529785 1.56431 1.45762 16.067 0.971123\n 0.35 0.93 0.91 10.04 0.71 0.3264 0.878264 0.893584 10.0709 0.667632\n 0.44 1.21 1.18 13.54 0.85 0.443453 1.2731 1.2182 13.5218 0.842298\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.ConfusionMatrix","page":"Utils","title":"BetaML.Utils.ConfusionMatrix","text":"mutable struct ConfusionMatrix <: BetaMLUnsupervisedModel\n\nCompute a confusion matrix detailing the mismatch between observations and predictions of a categorical variable\n\nFor the parameters see ConfusionMatrix_hp and BML_options.\n\nThe \"predicted\" values are either the scores or the normalised scores (depending on the parameter normalise_scores [def: true]).\n\nNotes:\n\nThe Confusion matrix report can be printed (i.e. print(cm_model). 
If you plan to print the Confusion Matrix report, be sure that the type of the data in y and ŷ can be converted to String.\nInformation in a structured way is available trought the info(cm) function that returns the following dictionary:\naccuracy: Oveall accuracy rate\nmisclassification: Overall misclassification rate\nactual_count: Array of counts per lebel in the actual data\npredicted_count: Array of counts per label in the predicted data\nscores: Matrix actual (rows) vs predicted (columns)\nnormalised_scores: Normalised scores\ntp: True positive (by class)\ntn: True negative (by class)\nfp: False positive (by class)\nfn: False negative (by class)\nprecision: True class i over predicted class i (by class)\nrecall: Predicted class i over true class i (by class)\nspecificity: Predicted not class i over true not class i (by class)\nf1score: Harmonic mean of precision and recall\nmean_precision: Mean by class, respectively unweighted and weighted by actual_count\nmean_recall: Mean by class, respectively unweighted and weighted by actual_count\nmean_specificity: Mean by class, respectively unweighted and weighted by actual_count\nmean_f1score: Mean by class, respectively unweighted and weighted by actual_count\ncategories: The categories considered\nfitted_records: Number of records considered\nn_categories: Number of categories considered\n\nExample:\n\nThe confusion matrix can also be plotted, e.g.:\n\njulia> using Plots, BetaML\n\njulia> y = [\"apple\",\"mandarin\",\"clementine\",\"clementine\",\"mandarin\",\"apple\",\"clementine\",\"clementine\",\"apple\",\"mandarin\",\"clementine\"];\n\njulia> ŷ = [\"apple\",\"mandarin\",\"clementine\",\"mandarin\",\"mandarin\",\"apple\",\"clementine\",\"clementine\",missing,\"clementine\",\"clementine\"];\n\njulia> cm = ConfusionMatrix(handle_missing=\"drop\")\nA ConfusionMatrix BetaMLModel (unfitted)\n\njulia> normalised_scores = fit!(cm,y,ŷ)\n3×3 Matrix{Float64}:\n 1.0 0.0 0.0\n 0.0 0.666667 0.333333\n 0.0 0.2 0.8\n\njulia> 
println(cm)\nA ConfusionMatrix BetaMLModel (fitted)\n\n-----------------------------------------------------------------\n\n*** CONFUSION MATRIX ***\n\nScores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"apple\" \"mandarin\" \"clementine\"\n \"apple\" 2 0 0\n \"mandarin\" 0 2 1\n \"clementine\" 0 1 4\nNormalised scores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"apple\" \"mandarin\" \"clementine\"\n \"apple\" 1.0 0.0 0.0\n \"mandarin\" 0.0 0.666667 0.333333\n \"clementine\" 0.0 0.2 0.8\n\n *** CONFUSION REPORT ***\n\n- Accuracy: 0.8\n- Misclassification rate: 0.19999999999999996\n- Number of classes: 3\n\n N Class precision recall specificity f1score actual_count predicted_count\n TPR TNR support \n\n 1 apple 1.000 1.000 1.000 1.000 2 2\n 2 mandarin 0.667 0.667 0.857 0.667 3 3\n 3 clementine 0.800 0.800 0.800 0.800 5 5\n\n- Simple avg. 0.822 0.822 0.886 0.822\n- Weigthed avg. 0.800 0.800 0.857 0.800\n\n-----------------------------------------------------------------\nOutput of `info(cm)`:\n- mean_precision: (0.8222222222222223, 0.8)\n- fitted_records: 10\n- specificity: [1.0, 0.8571428571428571, 0.8]\n- precision: [1.0, 0.6666666666666666, 0.8]\n- misclassification: 0.19999999999999996\n- mean_recall: (0.8222222222222223, 0.8)\n- n_categories: 3\n- normalised_scores: [1.0 0.0 0.0; 0.0 0.6666666666666666 0.3333333333333333; 0.0 0.2 0.8]\n- tn: [8, 6, 4]\n- mean_f1score: (0.8222222222222223, 0.8)\n- actual_count: [2, 3, 5]\n- accuracy: 0.8\n- recall: [1.0, 0.6666666666666666, 0.8]\n- f1score: [1.0, 0.6666666666666666, 0.8]\n- mean_specificity: (0.8857142857142858, 0.8571428571428571)\n- predicted_count: [2, 3, 5]\n- scores: [2 0 0; 0 2 1; 0 1 4]\n- tp: [2, 2, 4]\n- fn: [0, 1, 1]\n- categories: [\"apple\", \"mandarin\", \"clementine\"]\n- fp: [0, 1, 1]\n\njulia> res = info(cm);\n\njulia> 
heatmap(string.(res[\"categories\"]),string.(res[\"categories\"]),res[\"normalised_scores\"],seriescolor=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix (normalised scores)\")\n\n(Image: CM plot) \n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.ConfusionMatrix_hp","page":"Utils","title":"BetaML.Utils.ConfusionMatrix_hp","text":"mutable struct ConfusionMatrix_hp <: BetaMLHyperParametersSet\n\nHyperparameters for ConfusionMatrix\n\nParameters:\n\ncategories: The categories (aka \"levels\") to represent. [def: nothing, i.e. unique ground true values].\nhandle_unknown: How to handle categories not seen in the ground true values or not present in the provided categories array? \"error\" (default) rises an error, \"infrequent\" adds a specific category for these values.\nhandle_missing: How to handle missing values in either ground true or predicted values ? \"error\" [default] will rise an error, \"drop\" will drop the record\nother_categories_name: Which value to assign to the \"other\" category (i.e. categories not seen in the gound truth or not present in the provided categories array? [def: nothing, i.e. typemax(Int64) for integer vectors and \"other\" for other types]. This setting is active only if handle_unknown=\"infrequent\" and in that case it MUST be specified if the vector to one-hot encode is neither integer or strings\ncategories_names: A dictionary to map categories to some custom names. Useful for example if categories are integers, or you want to use shorter names [def: Dict(), i.e. not used]. This option isn't currently compatible with missing values or when some record has a value not in this provided dictionary.\nnormalise_scores: Wether predict should return the normalised scores. Note that both unnormalised and normalised scores remain available using info. 
[def: true]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.GridSearch","page":"Utils","title":"BetaML.Utils.GridSearch","text":"mutable struct GridSearch <: AutoTuneMethod\n\nSimple grid method for hyper-parameter validation of supervised models.\n\nAll parameters are tested using cross-validation and then the \"best\" combination is used. \n\nNotes:\n\nthe default loss is suitable for 1-dimensional output supervised models\n\nParameters:\n\nloss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, data (a vector of arrays, even if we work only with X) and (using the rng keyword) a RNG and returns a scalar loss.\nres_share::Float64: Share of the (data) resources to use for the autotuning [def: 0.1]. With res_share=1 the whole dataset is used for autotuning, which can be very time consuming!\nhpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.\nmultithreads::Bool: Use multiple threads in the search for the best hyperparameters [def: false]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.KFold","page":"Utils","title":"BetaML.Utils.KFold","text":"KFold(nsplits=5,nrepeats=1,shuffle=true,rng=Random.GLOBAL_RNG)\n\nIterator for k-fold cross_validation strategy.\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.MinMaxScaler","page":"Utils","title":"BetaML.Utils.MinMaxScaler","text":"mutable struct MinMaxScaler <: BetaML.Utils.AbstractScaler\n\nScale the data to a given (def: unit) hypercube\n\nParameters:\n\ninputRange: The range of the input. [def: (minimum,maximum)]. Both ranges are functions of the data. 
You can consider other relative of absolute ranges using e.g. inputRange=(x->minimum(x)*0.8,x->100)\noutputRange: The range of the scaled output [def: (0,1)]\n\nExample:\n\njulia> using BetaML\n\njulia> x = [[4000,1000,2000,3000] [\"a\", \"categorical\", \"variable\", \"not to scale\"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 Matrix{Any}:\n 4000 \"a\" 4 0.4\n 1000 \"categorical\" 1 0.1\n 2000 \"variable\" 2 0.2\n 3000 \"not to scale\" 3 0.3\n\njulia> mod = Scaler(MinMaxScaler(outputRange=(0,10)), skip=[2])\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Any}:\n 10.0 \"a\" 10.0 10.0\n 0.0 \"categorical\" 0.0 0.0\n 3.33333 \"variable\" 3.33333 3.33333\n 6.66667 \"not to scale\" 6.66667 6.66667\n\njulia> xback = inverse_predict(mod, xscaled)\n4×4 Matrix{Any}:\n 4000.0 \"a\" 4.0 0.4\n 1000.0 \"categorical\" 1.0 0.1\n 2000.0 \"variable\" 2.0 0.2\n 3000.0 \"not to scale\" 3.0 0.3\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.OneHotE_hp","page":"Utils","title":"BetaML.Utils.OneHotE_hp","text":"mutable struct OneHotE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for both OneHotEncoder and OrdinalEncoder\n\nParameters:\n\ncategories: The categories to represent as columns. [def: nothing, i.e. unique training values or range for integers]. Do not include missing in this list.\nhandle_unknown: How to handle categories not seen in training or not present in the provided categories array? \"error\" (default) rises an error, \"missing\" labels the whole output with missing values, \"infrequent\" adds a specific column for these categories in one-hot encoding or a single new category for ordinal one.\nother_categories_name: Which value during inverse transformation to assign to the \"other\" category (i.e. categories not seen on training or not present in the provided categories array? [def: nothing, i.e. typemax(Int64) for integer vectors and \"other\" for other types]. 
This setting is active only if handle_unknown=\"infrequent\" and in that case it MUST be specified if the vector to one-hot encode is neither integer or strings\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.OneHotEncoder","page":"Utils","title":"BetaML.Utils.OneHotEncoder","text":"mutable struct OneHotEncoder <: BetaMLUnsupervisedModel\n\nEncode a vector of categorical values as one-hot columns.\n\nThe algorithm distinguishes between missing values, for which it returns a one-hot encoded row of missing values, and other categories not in the provided list or not seen during training that are handled according to the handle_unknown parameter. \n\nFor the parameters see OneHotE_hp and BML_options. This model supports inverse_predict.\n\nExample:\n\njulia> using BetaML\n\njulia> x = [\"a\",\"d\",\"e\",\"c\",\"d\"];\n\njulia> mod = OneHotEncoder(handle_unknown=\"infrequent\",other_categories_name=\"zz\")\nA OneHotEncoder BetaMLModel (unfitted)\n\njulia> x_oh = fit!(mod,x) # last col is for the \"infrequent\" category\n5×5 Matrix{Bool}:\n 1 0 0 0 0\n 0 1 0 0 0\n 0 0 1 0 0\n 0 0 0 1 0\n 0 1 0 0 0\n\njulia> x2 = [\"a\",\"b\",\"c\"];\n\njulia> x2_oh = predict(mod,x2)\n3×5 Matrix{Bool}:\n 1 0 0 0 0\n 0 0 0 0 1\n 0 0 0 1 0\n\njulia> x2_back = inverse_predict(mod,x2_oh)\n3-element Vector{String}:\n \"a\"\n \"zz\"\n \"c\"\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.OrdinalEncoder","page":"Utils","title":"BetaML.Utils.OrdinalEncoder","text":"mutable struct OrdinalEncoder <: BetaMLUnsupervisedModel\n\nEncode a vector of categorical values as integers.\n\nThe algorithm distinguishes between missing values, for which it propagate the missing, and other categories not in the provided list or not seen during training that are handled according to the handle_unknown parameter. \n\nFor the parameters see OneHotE_hp and BML_options. 
This model supports inverse_predict.\n\nExample:\n\njulia> using BetaML\n\njulia> x = [\"a\",\"d\",\"e\",\"c\",\"d\"];\n\njulia> mod = OrdinalEncoder(handle_unknown=\"infrequent\",other_categories_name=\"zz\")\nA OrdinalEncoder BetaMLModel (unfitted)\n\njulia> x_int = fit!(mod,x)\n5-element Vector{Int64}:\n 1\n 2\n 3\n 4\n 2\n\njulia> x2 = [\"a\",\"b\",\"c\",\"g\"];\n\njulia> x2_int = predict(mod,x2) # 5 is for the \"infrequent\" category\n4-element Vector{Int64}:\n 1\n 5\n 4\n 5\n\njulia> x2_back = inverse_predict(mod,x2_int)\n4-element Vector{String}:\n \"a\"\n \"zz\"\n \"c\"\n \"zz\"\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.PCAE_hp","page":"Utils","title":"BetaML.Utils.PCAE_hp","text":"mutable struct PCAE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the PCAEncoder transformer\n\nParameters\n\nencoded_size: The size, that is the number of dimensions, to maintain (with encoded_size <= size(X,2) ) [def: nothing, i.e. the number of output dimensions is determined from the parameter max_unexplained_var]\nmax_unexplained_var: The maximum proportion of unexplained variance that we are willing to accept when reducing the number of dimensions in our data [def: 0.05]. It doesn't have any effect when the output number of dimensions is explicitly chosen with the parameter encoded_size\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.PCAEncoder","page":"Utils","title":"BetaML.Utils.PCAEncoder","text":"mutable struct PCAEncoder <: BetaMLUnsupervisedModel\n\nPerform a Principal Component Analysis, a dimensionality reduction technique employing a linear transformation of the original matrix by the eigenvectors of the covariance matrix.\n\nPCAEncoder returns the matrix reprojected along the dimensions of maximum variance.\n\nFor the parameters see PCAE_hp and BML_options \n\nNotes:\n\nPCAEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it. \nMissing data are not supported. 
Impute them first, see the Imputation module.\nIf one doesn't know a priori the maximum unexplained variance that he is willling to accept, nor the wished number of dimensions, he can run the model with all the dimensions in output (i.e. with encoded_size=size(X,2)), analise the proportions of explained cumulative variance by dimensions in info(mod,\"\"explained_var_by_dim\"), choose the number of dimensions K according to his needs and finally pick from the reprojected matrix only the number of dimensions required, i.e. out.X[:,1:K].\n\nExample:\n\njulia> using BetaML\n\njulia> xtrain = [1 10 100; 1.1 15 120; 0.95 23 90; 0.99 17 120; 1.05 8 90; 1.1 12 95];\n\njulia> mod = PCAEncoder(max_unexplained_var=0.05)\nA PCAEncoder BetaMLModel (unfitted)\n\njulia> xtrain_reproj = fit!(mod,xtrain)\n6×2 Matrix{Float64}:\n 100.449 3.1783\n 120.743 6.80764\n 91.3551 16.8275\n 120.878 8.80372\n 90.3363 1.86179\n 95.5965 5.51254\n\njulia> info(mod)\nDict{String, Any} with 5 entries:\n \"explained_var_by_dim\" => [0.873992, 0.999989, 1.0]\n \"fitted_records\" => 6\n \"prop_explained_var\" => 0.999989\n \"retained_dims\" => 2\n \"xndims\" => 3\n\njulia> xtest = [2 20 200];\n\njulia> xtest_reproj = predict(mod,xtest)\n1×2 Matrix{Float64}:\n 200.898 6.3566\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.SamplerWithData","page":"Utils","title":"BetaML.Utils.SamplerWithData","text":"SamplerWithData{Tsampler}\n\nAssociate an instance of an AbstractDataSampler with the actual data to sample.\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.Scaler","page":"Utils","title":"BetaML.Utils.Scaler","text":"mutable struct Scaler <: BetaMLUnsupervisedModel\n\nScale the data according to the specific chosen method (def: StandardScaler) \n\nFor the parameters see Scaler_hp and BML_options \n\nExamples:\n\nStandard scaler (default)...\n\njulia> using BetaML, Statistics\n\njulia> x = [[4000,1000,2000,3000] [400,100,200,300] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 
Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\njulia> mod = Scaler() # equiv to `Scaler(StandardScaler(scale=true, center=true))`\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Float64}:\n 1.34164 1.34164 1.34164 1.34164\n -1.34164 -1.34164 -1.34164 -1.34164\n -0.447214 -0.447214 -0.447214 -0.447214\n 0.447214 0.447214 0.447214 0.447214\n\njulia> col_means = mean(xscaled, dims=1)\n1×4 Matrix{Float64}:\n 0.0 0.0 0.0 5.55112e-17\n\njulia> col_var = var(xscaled, dims=1, corrected=false)\n1×4 Matrix{Float64}:\n 1.0 1.0 1.0 1.0\n\njulia> xback = inverse_predict(mod, xscaled)\n4×4 Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\nMin-max scaler...\n\njulia> using BetaML\n\njulia> x = [[4000,1000,2000,3000] [\"a\", \"categorical\", \"variable\", \"not to scale\"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 Matrix{Any}:\n 4000 \"a\" 4 0.4\n 1000 \"categorical\" 1 0.1\n 2000 \"variable\" 2 0.2\n 3000 \"not to scale\" 3 0.3\n\njulia> mod = Scaler(MinMaxScaler(outputRange=(0,10)),skip=[2])\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Any}:\n 10.0 \"a\" 10.0 10.0\n 0.0 \"categorical\" 0.0 0.0\n 3.33333 \"variable\" 3.33333 3.33333\n 6.66667 \"not to scale\" 6.66667 6.66667\n\njulia> xback = inverse_predict(mod,xscaled)\n4×4 Matrix{Any}:\n 4000.0 \"a\" 4.0 0.4\n 1000.0 \"categorical\" 1.0 0.1\n 2000.0 \"variable\" 2.0 0.2\n 3000.0 \"not to scale\" 3.0 0.3\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.Scaler_hp","page":"Utils","title":"BetaML.Utils.Scaler_hp","text":"mutable struct Scaler_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the Scaler transformer\n\nParameters\n\nmethod: The specific scaler method to employ with its own parameters. See StandardScaler [def] or MinMaxScaler.\nskip: The positional ids of the columns to skip scaling (eg. 
categorical columns, dummies,...) [def: []]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.StandardScaler","page":"Utils","title":"BetaML.Utils.StandardScaler","text":"mutable struct StandardScaler <: BetaML.Utils.AbstractScaler\n\nStandardise the input to zero mean and unit standard deviation, aka \"Z-score\". Note that missing values are skipped.\n\nParameters:\n\nscale: Scale to unit variance [def: true]\ncenter: Center to zero mean [def: true]\n\nExample:\n\njulia> using BetaML, Statistics\n\njulia> x = [[4000,1000,2000,3000] [400,100,200,300] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\njulia> mod = Scaler() # equiv to `Scaler(StandardScaler(scale=true, center=true))`\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Float64}:\n 1.34164 1.34164 1.34164 1.34164\n -1.34164 -1.34164 -1.34164 -1.34164\n -0.447214 -0.447214 -0.447214 -0.447214\n 0.447214 0.447214 0.447214 0.447214\n\njulia> col_means = mean(xscaled, dims=1)\n1×4 Matrix{Float64}:\n 0.0 0.0 0.0 5.55112e-17\n\njulia> col_var = var(xscaled, dims=1, corrected=false)\n1×4 Matrix{Float64}:\n 1.0 1.0 1.0 1.0\n\njulia> xback = inverse_predict(mod, xscaled)\n4×4 Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.SuccessiveHalvingSearch","page":"Utils","title":"BetaML.Utils.SuccessiveHalvingSearch","text":"mutable struct SuccessiveHalvingSearch <: AutoTuneMethod\n\nHyper-parameters validation of supervised models that search the parameters space trouth successive halving\n\nAll parameters are tested on a small sub-sample, then the \"best\" combinations are kept for a second round that use more samples and so on untill only one hyperparameter combination is left.\n\nNotes:\n\nthe default loss is suitable for 1-dimensional output 
supervised models, and itself applies cross-validation. Any function that accepts a model and some data and returns a scalar loss can be used\nthe rate at which the potential candidate combinations of hyperparameters shrink is controlled by the number of data shares defined in res_shares (i.e. the epochs): the more epochs are chosen, the lower the \"shrink\" coefficient\n\nParameters:\n\nloss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, data (a vector of arrays, even if we work only with X) and (using the rng keyword) a RNG and returns a scalar loss.\nres_shares::Vector{Float64}: Shares of the (data) resources to use for the autotuning in the successive iterations [def: [0.05, 0.2, 0.3]]. With res_share=1 the whole dataset is used for autotuning, which can be very time consuming! The number of models is reduced by the same share so as to end up with a single model. Increase the number of res_shares in order to increase the number of models kept at each iteration.\n\nhpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.\nmultithreads::Bool: Use multiple threads in the search for the best hyperparameters [def: false]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractVector{T}}} where T","page":"Utils","title":"Base.error","text":"error(y,ŷ;ignorelabels=false) - Categorical error (T vs T)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{Int64, Vector{T}}} where T<:Number","page":"Utils","title":"Base.error","text":"error(y,ŷ) - Categorical error with probabilistic prediction of a single datapoint (Int vs PMF). 
\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{Vector{Int64}, Matrix{T}}} where T<:Number","page":"Utils","title":"Base.error","text":"error(y,ŷ) - Categorical error with probabilistic predictions of a dataset (Int vs PMF). \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{Vector{T}, Array{Dict{T, Float64}, 1}}} where T","page":"Utils","title":"Base.error","text":"error(y,ŷ) - Categorical error with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (T vs Dict{T,Float64}). \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.reshape-Union{Tuple{T}, Tuple{T, Vararg{Any, N} where N}} where T<:Number","page":"Utils","title":"Base.reshape","text":"reshape(myNumber, dims..) - Reshape a number as an n-dimensional Array \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{AbstractVector{Int64}, AbstractMatrix{T}}} where T<:Number","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol,ignorelabels)\n\nCategorical accuracy with probabilistic predictions of a dataset (PMF vs Int).\n\nParameters:\n\ny: The N array with the correct category for each point n.\nŷ: An (N,K) matrix of the probabilities that each record ŷ_n, with n in 1,...,N, is of category k, with k in 1,...,K.\ntol: The tolerance of the prediction, i.e. whether to consider \"correct\" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol highest-probability values [def: 1].\nignorelabels: Whether to ignore the specific label order in y. 
Useful for unsupervised learning algorithms where the specific label order doesn't make sense [def: false]\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractArray{Dict{T, Float64}, 1}}} where T","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol)\n\nCategorical accuracy with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).\n\nParameters:\n\nŷ: An array where each item is the estimated probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)\ny: The N array with the correct category for each point n.\ntol: The tolerance of the prediction, i.e. whether to consider \"correct\" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol highest-probability values [def: 1].\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractVector{T}}} where T","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(ŷ,y;ignorelabels=false) - Categorical accuracy between two vectors (T vs T). \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{Int64, AbstractVector{T}}} where T<:Number","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol)\n\nCategorical accuracy with probabilistic prediction of a single datapoint (PMF vs Int).\n\nUse the parameter tol [def: 1] to determine the tolerance of the prediction, i.e. 
whether to consider \"correct\" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol highest-probability values.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{T, AbstractDict{T, Float64}}} where T","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol)\n\nCategorical accuracy with probabilistic prediction of a single datapoint given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).\n\nParameters:\n\nŷ: The returned probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)\ntol: The tolerance of the prediction, i.e. whether to consider \"correct\" only a prediction where the value with the highest probability is the true value (tol = 1), or to consider instead the set of the tol highest-probability values [def: 1].\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.aic-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.aic","text":"aic(lL,k) - Akaike information criterion (lower is better)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.autojacobian-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.autojacobian","text":"autojacobian(f,x;nY)\n\nEvaluate the Jacobian using automatic differentiation (AD) in the form of a (nY,nX) matrix of first derivatives\n\nParameters:\n\nf: The function whose Jacobian is to be computed\nx: The input to the function where the Jacobian has to be computed\nnY: The number of outputs of the function f [def: length(f(x))]\n\nReturn values:\n\nAn Array{Float64,2} of the locally evaluated Jacobian\n\nNotes:\n\nThe nY parameter is optional. 
If provided it avoids having to compute f(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.autotune!-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.autotune!","text":"autotune!(m, data) -> Any\n\n\nHyperparameter autotuning.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.batch-Tuple{Integer, Integer}","page":"Utils","title":"BetaML.Utils.batch","text":"batch(n,bsize;sequential=false,rng)\n\nReturn a vector of vectors, each holding bsize indices from 1 to n. The indices are drawn randomly unless the optional parameter sequential is used.\n\nExample:\n\njulia> Utils.batch(6,2,sequential=true) 3-element Array{Array{Int64,1},1}: [1, 2] [3, 4] [5, 6]\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.bic-Tuple{Any, Any, Any}","page":"Utils","title":"BetaML.Utils.bic","text":"bic(lL,k,n) - Bayesian information criterion (lower is better)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.celu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.celu","text":"celu(x; α=1) \n\nhttps://arxiv.org/pdf/1704.07483.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.class_counts-Tuple{Any}","page":"Utils","title":"BetaML.Utils.class_counts","text":"class_counts(x;classes=nothing)\n\nReturn an (unsorted) vector with the counts of each unique item (element or rows) in a dataset.\n\nIf order is important or not all classes are present in the data, a preset vector of classes can be given in the parameter classes\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.class_counts_with_labels-Tuple{Any}","page":"Utils","title":"BetaML.Utils.class_counts_with_labels","text":"class_counts_with_labels(x)\n\nReturn a dictionary that counts the number of each unique item (rows) in a dataset.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.cols_with_missing-Tuple{Any}","page":"Utils","title":"BetaML.Utils.cols_with_missing","text":"cols_with_missing(x)\n\nReturn an 
array with the ids of the columns where there is at least one missing value.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.consistent_shuffle-Union{Tuple{AbstractVector{T}}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.consistent_shuffle","text":"consistent_shuffle(data;dims,rng)\n\nShuffle a vector of n-dimensional arrays across dimension dims keeping the same order between the arrays\n\nParameters\n\ndata: The vector of arrays to shuffle\ndims: The dimension over which to apply the shuffle [def: 1]\nrng: An AbstractRNG to apply for the shuffle\n\nNotes\n\nAll the arrays must have the same size for the dimension to shuffle\n\nExample\n\njulia> a = [1 2 30; 10 20 30]; b = [100 200 300]; julia> (aShuffled, bShuffled) = consistent_shuffle([a,b],dims=2) 2-element Vector{Matrix{Int64}}: [1 30 2; 10 30 20] [100 300 200]\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.cosine_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.cosine_distance","text":"Cosine distance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.cross_validation","page":"Utils","title":"BetaML.Utils.cross_validation","text":"cross_validation(\n f,\n data\n) -> Union{Tuple{Any, Any}, Vector{Any}}\ncross_validation(\n f,\n data,\n sampler;\n dims,\n verbosity,\n return_statistics\n) -> Union{Tuple{Any, Any}, Vector{Any}}\n\n\nPerform cross_validation according to sampler rule by calling the function f and collecting its output\n\nParameters\n\nf: The user-defined function that consumes the specific train and validation data and returns something (often the associated validation error). See later\ndata: A single n-dimensional array or a vector of them (e.g. X,Y), depending on the tasks required by f.\nsampler: An instance of an AbstractDataSampler, defining the \"rules\" for sampling at each iteration. [def: KFold(nsplits=5,nrepeats=1,shuffle=true,rng=Random.GLOBAL_RNG) ]. 
Note that the RNG passed to the f function is the RNG passed to the sampler\ndims: The dimension over which to perform the cross-validation, i.e. the dimension containing the observations [def: 1]\nverbosity: The verbosity to print information during each iteration (this can also be printed in the f function) [def: STD]\nreturn_statistics: Whether cross_validation should return the statistics of the output of f (mean and standard deviation) or the whole outputs [def: true].\n\nNotes\n\ncross_validation works by calling the function f, defined by the user, passing to it the tuple trainData, valData and rng and collecting the result of the function f. The specific method by which trainData and valData are selected at each iteration depends on the specific sampler, with a single 5-fold rule being the default.\n\nThis approach is very flexible because the specific model to employ or the metric to use is left within the user-provided function. The only thing that cross_validation does is provide the model defined in the function f with the appropriate data (and the random number generator).\n\nInput of the user-provided function: trainData and valData are both themselves tuples. In supervised models, the data passed to cross_validation should be a tuple of (X,Y) and trainData and valData will be equivalent to (xtrain, ytrain) and (xval, yval). In unsupervised models data is a single array, but the training and validation data still need to be accessed as trainData[1] and valData[1]. Output of the user-provided function: the user-defined function can return anything. However, if return_statistics is left on its default true value, the user-defined function must return a single scalar (e.g. some error measure) so that the mean and the standard deviation are returned.\n\nNote that cross_validation can conveniently be employed using the do syntax, as Julia automatically rewrites cross_validation(data,...) do trainData,valData,rng ...user defined body... 
end as cross_validation(f(trainData,valData,rng ), data,...)\n\nExample\n\njulia> X = [11:19 21:29 31:39 41:49 51:59 61:69];\njulia> Y = [1:9;];\njulia> sampler = KFold(nsplits=3);\njulia> (μ,σ) = cross_validation([X,Y],sampler) do trainData,valData,rng\n (xtrain,ytrain) = trainData; (xval,yval) = valData\n trainedModel = buildForest(xtrain,ytrain,30)\n ŷval = predict(trainedModel,xval)\n ϵ = relative_mean_error(yval,ŷval,normrec=false)\n return ϵ\n end\n(0.3202242202242202, 0.04307662219315022)\n\n\n\n\n\n","category":"function"},{"location":"Utils.html#BetaML.Utils.crossentropy-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.crossentropy","text":"crossentropy(y,ŷ; weight)\n\nCompute the (weighted) cross-entropy between the predicted and the sampled probability distributions.\n\nTo be used in classification problems.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dcelu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dcelu","text":"dcelu(x; α=1) \n\nhttps://arxiv.org/pdf/1704.07483.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.delu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.delu","text":"delu(x; α=1) with α > 0 \n\nhttps://arxiv.org/pdf/1511.07289.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dmaximum-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dmaximum","text":"dmaximum(x) \n\nMultidimensional verison of the derivative of maximum\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dmish-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dmish","text":"dmish(x) \n\nhttps://arxiv.org/pdf/1908.08681v1.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dplu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dplu","text":"dplu(x;α=0.1,c=1) \n\nPiecewise Linear Unit derivative 
\n\nhttps://arxiv.org/pdf/1809.09534.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.drelu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.drelu","text":"drelu(x) \n\nRectified Linear Unit \n\nhttps://www.cs.toronto.edu/~hinton/absps/reluICML.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dsigmoid-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dsigmoid","text":"dsigmoid(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dsoftmax-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dsoftmax","text":"dsoftmax(x; β=1) \n\nDerivative of the softmax function \n\nhttps://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dsoftplus-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dsoftplus","text":"dsoftplus(x) \n\nhttps://en.wikipedia.org/wiki/Rectifier(neuralnetworks)#Softplus\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dtanh-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dtanh","text":"dtanh(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.elu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.elu","text":"elu(x; α=1) with α > 0 \n\nhttps://arxiv.org/pdf/1511.07289.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.entropy-Tuple{Any}","page":"Utils","title":"BetaML.Utils.entropy","text":"entropy(x)\n\nCalculate the entropy for a list of items (or rows).\n\nSee: https://en.wikipedia.org/wiki/Decisiontreelearning#Gini_impurity\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.generate_parallel_rngs-Tuple{Random.AbstractRNG, Integer}","page":"Utils","title":"BetaML.Utils.generate_parallel_rngs","text":"generate_parallel_rngs(rng::AbstractRNG, n::Integer;reSeed=false)\n\nFor multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.\n\nNote that each 
RNG is a copy of the original RNG. This means that code that uses these RNGs will not change the original RNG state.\n\nUse it with rngs = generate_parallel_rngs(rng,Threads.nthreads()) to have a separate rng per thread. By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads). If you prefer, you can instead re-seed the RNG here (using the parameter reSeed=true), such that each thread has a different seed. Be aware however that the stream of numbers generated will depend on the number of threads at run time.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.getpermutations-Union{Tuple{AbstractVector{T}}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.getpermutations","text":"getpermutations(v::AbstractArray{T,1};keepStructure=false)\n\nReturn a vector of either (a) all possible permutations (uncollected) or (b) just those based on the unique values of the vector\n\nUseful to measure accuracy where you don't care about the actual name of the labels, like in unsupervised classifications (e.g. clustering)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.gini-Tuple{Any}","page":"Utils","title":"BetaML.Utils.gini","text":"gini(x)\n\nCalculate the Gini Impurity for a list of items (or rows).\n\nSee: https://en.wikipedia.org/wiki/Decision_tree_learning#Information_gain\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.issortable-Union{Tuple{AbstractArray{T, N}}, Tuple{N}, Tuple{T}} where {T, N}","page":"Utils","title":"BetaML.Utils.issortable","text":"Return whether an array is sortable, i.e. 
has the method issort defined\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l1_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l1_distance","text":"L1 norm distance (aka Manhattan Distance)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l2_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l2_distance","text":"Euclidean (L2) distance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l2loss_by_cv-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l2loss_by_cv","text":"Compute the loss of a given model over a given (x,y) dataset running cross-validation\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l2squared_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l2squared_distance","text":"Squared Euclidean (L2) distance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.lse-Tuple{Any}","page":"Utils","title":"BetaML.Utils.lse","text":"LogSumExp for efficiently computing log(sum(exp.(x))) \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.makematrix-Tuple{AbstractVector{T} where T}","page":"Utils","title":"BetaML.Utils.makematrix","text":"Transform an Array{T,1} into an Array{T,2} and leave an Array{T,2} unchanged.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mean_dicts-Tuple{Any}","page":"Utils","title":"BetaML.Utils.mean_dicts","text":"mean_dicts(dicts)\n\nCompute the mean of the values of an array of dictionaries.\n\nGiven dicts an array of dictionaries, mean_dicts first computes the union of the keys and then averages the values. 
If the original values are probabilities (non-negative items summing to 1), the result is also a probability distribution.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mish-Tuple{Any}","page":"Utils","title":"BetaML.Utils.mish","text":"mish(x) \n\nhttps://arxiv.org/pdf/1908.08681v1.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mode-Union{Tuple{AbstractArray{Dict{T, Float64}, N} where N}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.mode","text":"mode(elements,rng)\n\nGiven a vector of dictionaries whose values are numerical (e.g. probabilities), a vector of vectors or a matrix, it returns the mode of each element (dictionary, vector or row) in terms of the key or the position.\n\nUse it to return a unique value from a multiclass classifier returning probabilities.\n\nNote:\n\nIf multiple classes have the highest mode, one is returned at random (use the parameter rng to fix the stochasticity)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mode-Union{Tuple{AbstractVector{T}}, Tuple{T}} where T<:Number","page":"Utils","title":"BetaML.Utils.mode","text":"mode(v::AbstractVector{T};rng)\n\nReturn the position with the highest value in an array, interpreted as mode (using rand in case of multimodal values)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mode-Union{Tuple{Dict{T, Float64}}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.mode","text":"mode(dict::Dict{T,Float64};rng)\n\nReturn the key with the highest mode (using rand in case of multimodal values)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mse-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.mse","text":"mse(y,ŷ)\n\nCompute the mean squared error (MSE) (aka mean squared deviation - MSD) between two vectors y and ŷ. 
Note that while the deviation is averaged by the length of y it is not scaled to give it a relative meaning.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.pairwise-Tuple{AbstractArray}","page":"Utils","title":"BetaML.Utils.pairwise","text":"pairwise(x::AbstractArray; distance, dims) -> Any\n\n\nCompute pairwise distance matrix between elements of an array identified across dimension dims.\n\nParameters:\n\nx: the data array \ndistance: a distance measure [def: l2_distance]\ndims: the dimension of the observations [def: 1, i.e. records on rows]\n\nReturns:\n\na nrecords by nrecords symmetric matrix of the pairwise distances\n\nNotes:\n\nif performance matters, you can use something like Distances.pairwise(Distances.euclidean,x,dims=1) from the Distances package.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.partition-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractVector{Float64}}} where T<:AbstractArray","page":"Utils","title":"BetaML.Utils.partition","text":"partition(data,parts;shuffle,dims,rng)\n\nPartition (by rows) one or more matrices according to the shares in parts.\n\nParameters\n\ndata: A matrix/vector or a vector of matrices/vectors\nparts: A vector of the required shares (must sum to 1)\nshuffle: Whether to randomly shuffle the matrices (preserving the relative order between matrices)\ndims: The dimension for which to partition [def: 1]\ncopy: Whether to copy the actual data or only create a reference [def: true]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nThe sum of parts must be equal to 1\nThe number of elements in the specified dimension must be the same for all the arrays in data\n\nExample:\n\njulia> x = [1:10 11:20] julia> y = collect(31:40) julia> ((xtrain,xtest),(ytrain,ytest)) = 
partition([x,y],[0.7,0.3])\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.plu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.plu","text":"plu(x;α=0.1,c=1) \n\nPiecewise Linear Unit \n\nhttps://arxiv.org/pdf/1809.09534.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.polynomial_kernel-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.polynomial_kernel","text":"Polynomial kernel parametrised with constant=0 and degree=2 (i.e. a quadratic kernel). For other cᵢ and dᵢ use K = (x,y) -> polynomial_kernel(x,y,c=cᵢ,d=dᵢ) as kernel function in the supporting algorithms\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.pool1d","page":"Utils","title":"BetaML.Utils.pool1d","text":"pool1d(x,poolsize=2;f=mean)\n\nApply function f to a rolling window of poolsize contiguous (in 1d) neurons.\n\nApplicable to VectorFunctionLayer, e.g. layer2 = VectorFunctionLayer(nₗ,f=(x->pool1d(x,4,f=mean))). Attention: to apply this function as activation function in a neural network you will need Julia version >= 1.6, otherwise you may experience a segmentation fault (see this bug report)\n\n\n\n\n\n","category":"function"},{"location":"Utils.html#BetaML.Utils.radial_kernel-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.radial_kernel","text":"Radial Kernel (aka RBF kernel) parametrised with γ=1/2. For other gammas γᵢ use K = (x,y) -> radial_kernel(x,y,γ=γᵢ) as kernel function in the supporting algorithms\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.relative_mean_error-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.relative_mean_error","text":"relative_mean_error(y, ŷ;normdim=false,normrec=false,p=1)\n\nCompute the relative mean error (l-1 based by default) between y and ŷ.\n\nThere are many ways to compute a relative mean error. 
In particular, if normrec (normdim) is set to true, the records (dimensions) are normalised, in the sense that it doesn't matter if a record (dimension) is bigger or smaller than the others: the relative error is first computed for each record (dimension) and then it is averaged. With both normdim and normrec set to false (default) the function returns the relative mean error; with both set to true it returns the mean relative error (i.e. with p=1 the \"mean absolute percentage error (MAPE)\"). The parameter p [def: 1] controls the p-norm used to define the error.\n\nThe mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more.\n\nFor example, given y = [1,44,3] and ŷ = [2,45,2], the mean relative error mean_relative_error(y,ŷ,normrec=true) is 0.452, while the relative mean error relative_mean_error(y,ŷ, normrec=false) is \"only\" 0.0625.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.relu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.relu","text":"relu(x) \n\nRectified Linear Unit \n\nhttps://www.cs.toronto.edu/~hinton/absps/reluICML.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.sigmoid-Tuple{Any}","page":"Utils","title":"BetaML.Utils.sigmoid","text":"sigmoid(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.silhouette-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.silhouette","text":"silhouette(distances, classes) -> Any\n\n\nProvide Silhouette scoring for cluster outputs\n\nParameters:\n\ndistances: the nrecords by nrecords pairwise distance matrix\nclasses: the vector of assigned classes to each record\n\nNotes:\n\nthe matrix of pairwise distances can be obtained with the function pairwise\nthis function doesn't sample. 
Eventually sample before\nto get the score for the cluster simply compute the mean\nsee also the Wikipedia article\n\nExample:\n\njulia> x = [1 2 3 3; 1.2 3 3.1 3.2; 2 4 6 6.2; 2.1 3.5 5.9 6.3];\n\njulia> s_scores = silhouette(pairwise(x),[1,2,2,2])\n4-element Vector{Float64}:\n 0.0\n -0.7590778795827623\n 0.5030093571833065\n 0.4936350560759424\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.softmax-Tuple{Any}","page":"Utils","title":"BetaML.Utils.softmax","text":"softmax (x; β=1) \n\nThe input x is a vector. Return a PMF\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.softplus-Tuple{Any}","page":"Utils","title":"BetaML.Utils.softplus","text":"softplus(x) \n\nhttps://en.wikipedia.org/wiki/Rectifier(neuralnetworks)#Softplus\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.squared_cost-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.squared_cost","text":"squared_cost(y,ŷ)\n\nCompute the squared costs between a vector of observations and one of prediction as (1/2)*norm(y - ŷ)^2.\n\nAside the 1/2 term, it correspond to the squared l-2 norm distance and when it is averaged on multiple datapoints corresponds to the Mean Squared Error (MSE). 
It is mostly used for regression problems.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.sterling-Tuple{BigInt, BigInt}","page":"Utils","title":"BetaML.Utils.sterling","text":"Stirling number: number of partitions of a set of n elements into k sets \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.variance-Tuple{Any}","page":"Utils","title":"BetaML.Utils.variance","text":"variance(x) - population variance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.xavier_init","page":"Utils","title":"BetaML.Utils.xavier_init","text":"xavier_init(previous_npar, this_npar) -> Matrix{Float64}\nxavier_init(\n previous_npar,\n this_npar,\n outsize;\n rng,\n eltype\n) -> Any\n\n\nPerform a Xavier initialisation of the weights\n\nParameters:\n\nprevious_npar: number of parameters of the previous layer\nthis_npar: number of parameters of this layer\noutsize: tuple with the size of the weights [def: (this_npar,previous_npar)]\nrng : random number generator [def: Random.GLOBAL_RNG]\neltype: eltype of the weight array [def: Float64]\n\n\n\n\n\n","category":"function"},{"location":"Utils.html#BetaML.Utils.@codelocation-Tuple{}","page":"Utils","title":"BetaML.Utils.@codelocation","text":"@codelocation()\n\nHelper macro to print at runtime an info message about the position of the code being executed\n\n\n\n\n\n","category":"macro"},{"location":"Utils.html#BetaML.Utils.@threadsif-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.@threadsif","text":"Conditionally apply multi-threading to for loops. This is a variation on Base.Threads.@threads that adds a run-time boolean flag to enable or disable threading. 
\n\nExample:\n\nfunction optimize(objectives; use_threads=true)\n @threadsif use_threads for k = 1:length(objectives)\n # ...\n end\nend\n\n# Notes:\n- Borrowed from https://github.com/JuliaQuantumControl/QuantumControlBase.jl/blob/master/src/conditionalthreads.jl\n\n\n\n\n\n","category":"macro"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"EditURL = \"betaml_tutorial_cluster_iris.jl\"","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#clustering_tutorial","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The task is to estimate the species of a plant given some floral measurements. It uses the classical \"Iris\" dataset. Note that in this example we are using clustering approaches, so we try to understand the \"structure\" of our data, without relying on actually knowing the true labels (\"classes\" or \"factors\"). 
However we have chosen a dataset for which the true labels are actually known, so we can compare the accuracy of the algorithms we use, but these labels will not be used during the algorithms training.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Data origin:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"dataset description: https://en.wikipedia.org/wiki/Irisflowerdata_set\ndata source we use here: https://github.com/JuliaStats/RDatasets.jl","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Library-and-data-loading","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Library and data loading","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Activating the local environment specific to BetaML documentation","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"using 
Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We load the Beta Machine Learning Toolkit as well as some other packages that we use in this tutorial","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"using BetaML\nusing Random, Statistics, Logging, BenchmarkTools, StableRNGs, RDatasets, Plots, DataFrames","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We are also going to compare our results with two other leading packages in Julia for clustering analysis, Clustering.jl that provides (inter alia) kmeans and kmedoids algorithms and GaussianMixtures.jl that provides, as the name says, Gaussian Mixture Models. 
So we import them (we \"import\" them, rather than \"using\" them, so as not to bring their exported names into the namespace, as some would collide with BetaML).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"import Clustering, GaussianMixtures","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Here we are explicit and we use our own fixed RNG:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We do a few tweaks for the Clustering and GaussianMixtures packages. 
Note that in BetaML we can also control both the random seed and the verbosity in the algorithm call, not only globally","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Random.seed!(seed)\n#logger = Logging.SimpleLogger(stdout, Logging.Error); global_logger(logger); ## For suppressing GaussianMixtures output\nnothing #hide","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Differently from the regression tutorial, we load the data here from [RDatasets](https://github.com/JuliaStats/RDatasets.jl), a package providing standard datasets.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"iris = dataset(\"datasets\", \"iris\")\ndescribe(iris)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The iris dataset provides floral measures in columns 1 to 4 and the assigned species name in column 5. 
There are no missing values","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Data-preparation","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Data preparation","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The first step is to prepare the data for the analysis. We collect the first 4 columns as our feature x matrix and the last one as our y label vector. As we are using clustering algorithms, we are not actually using the labels to train the algorithms, we'll behave like we do not know them, we'll just let the algorithm \"learn\" from the structure of the data itself. We'll however use it to judge the accuracy that the various algorithms reach.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"x = Matrix{Float64}(iris[:,1:4]);\nyLabels = unique(iris[:,5])","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"As the labels are expressed as strings, the first thing we do is encode them as integers for our analysis using the OrdinalEncoder model (data isn't really needed to be actually ordered):","category":"page"},{"location":"tutorials/Clustering - 
Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"y = fit!(OrdinalEncoder(categories=yLabels),iris[:,5])","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The dataset from RDatasets is ordered by species, so we need to shuffle it to avoid biases. Shuffling happens by default in crossvalidation, but we are keeping here a copy of the shuffled version for later. Note that the version of [`consistentshuffle`](@ref) that is included in BetaML accepts several n-dimensional arrays and shuffle them (by default on rows, by we can specify the dimension) keeping the association between the various arrays in the shuffled output.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"(xs,ys) = consistent_shuffle([x,y], rng=copy(AFIXEDRNG));\nnothing #hide","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Main-analysis","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Main analysis","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris 
dataset)","text":"We will try 3 BetaML models (KMeansClusterer, KMedoidsClusterer and GaussianMixtureClusterer) and we compare them with kmeans from Clusterings.jl and GMM from GaussianMixtures.jl","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"KMeansClusterer and KMedoidsClusterer works by first initialising the centers of the k-clusters (step a ). These centers, also known as the \"representatives\", must be selected within the data for kmedoids, while for kmeans they are the geometrical centers.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Then ( step b ) the algorithms iterates toward each point to assign the point to the cluster of the closest representative (according with a user defined distance metric, default to Euclidean), and ( step c ) moves each representative at the center of its newly acquired cluster (where \"center\" depends again from the metric).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Steps b and c are reiterated until the algorithm converge, i.e. the tentative k representative points (and their relative clusters) don't move any more. 
The result (output of the algorithm) is that each point is assigned to one of the clusters (classes).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The algorithm in GaussianMixtureClusterer is similar in that it employs an iterative approach (the Expectation-Maximisation algorithm, \"em\"), but here we make the hypothesis that the data points are the observed outcomes of some mixture probabilistic model, where we first have a k-categorical variable whose outcomes are the (unobservable) parameters of a probabilistic distribution from which the data is finally drawn. Because the parameters of each of the k possible distributions are unobservable, this is also called a model with latent variables.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Most gmm models use the Gaussian distribution as the family of the mixture components, so we can think of the gmm acronym as indicating Gaussian Mixture Model. 
In BetaML we have currently implemented only Gaussian components, but any distribution could be used by just subclassing AbstractMixture and implementing a couple of methods (you are invited to contribute or just ask for a distribution family you are interested in), so I prefer to think of \"gmm\" as an acronym for Generative Mixture Model.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The algorithm tries to find the mixture that maximises the likelihood that the data has indeed been generated from such a mixture, where the \"E\" step refers to computing the probability that each point belongs to each of the k components (somewhat similar to step b in the kmeans/kmedoids algorithms), and the \"M\" step estimates, given the association probabilities from step \"E\", the parameters of the mixture and of the individual components (similar to step c).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The result here is that each point has a categorical distribution (PMF) representing the probabilities that it belongs to any of the k components (our classes or clusters). This is interesting, as gmm can be used for many other things than clustering. It forms the backbone of the GaussianMixtureImputer model to impute missing values (on some or all dimensions) based on how close the record seems to its peers. 
For the same reasons, GaussianMixtureImputer can also be used to predict users' behaviours (or users' appreciation) according to the behaviour/ranking made by peers (\"collaborative filtering\").","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"While the result of GaussianMixtureClusterer is a vector of PMFs (one for each record), error measures and reports with the true values (if known) can be directly applied, as in BetaML they internally call mode() to retrieve the class with the highest probability for each record.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"As we are here, we also try different versions of the BetaML models, even if the default \"versions\" should be fine. 
For KMeansClusterer and KMedoidsClusterer we will try different initialisation strategies (\"gird\", the default one, \"random\" and \"shuffle\"), while for the GaussianMixtureClusterer model we'll choose different distributions of the Gaussain family (SphericalGaussian - where the variance is a scalar, DiagonalGaussian - with a vector variance, and FullGaussian, where the covariance is a matrix).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"As the result would depend on stochasticity both in the data selected and in the random initialisation, we use a cross-validation approach to run our models several times (with different data) and then we average their results. Cross-Validation in BetaML is very flexible and it is done using the cross_validation function. It is used by default for hyperparameters autotuning of the BetaML supervised models. cross_validation works by calling the function f, defined by the user, passing to it the tuple trainData, valData and rng and collecting the result of the function f. The specific method for which trainData, and valData are selected at each iteration depends on the specific sampler.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We start by selectign a k-fold sampler that split our data in 5 different parts, it uses 4 for training and 1 part (not used here) for validation. 
We run the simulations twice and, to be sure to have replicable results, we fix the random seed (at the whole crossValidaiton level, not on each iteration).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"sampler = KFold(nsplits=5,nrepeats=3,shuffle=true, rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We can now run the cross-validation with our models. Note that instead of defining the function f and then calling cross_validation[f(trainData,testData,rng),[x,y],...) we use the Julia do block syntax and we write directly the content of the f function in the do block. Also, by default crossvalidation already returns the mean and the standard deviation of the output of the user-provided f function (or the do block). However this requires that the f function returns a single scalar. Here we are returning a vector of the accuracies of the different models (so we can run the cross-validation only once), and hence we indicate with `returnstatistics=false` to cross_validation not to attempt to generate statistics but rather report the whole output. 
We'll compute the statistics ex-post.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Inside the do block we do 4 things:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"we recover from trainData (a tuple, as we passed a tuple to cross_validation too) the xtrain features and ytrain labels;\nwe run the various clustering algorithms\nwe use the real labels to compute the model accuracy. Note that the clustering algorithm know nothing about the specific label name or even their order. 
This is why accuracy has the parameter ignorelabels to compute the accuracy oven any possible permutation of the classes found.\nwe return the various models' accuracies","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"cOut = cross_validation([x,y],sampler,return_statistics=false) do trainData,testData,rng\n # For unsupervised learning we use only the train data.\n # Also, we use the associated labels only to measure the performances\n (xtrain,ytrain) = trainData;\n # We run the clustering algorithm and then and we compute the accuracy using the real labels:\n estcl = fit!(KMeansClusterer(n_classes=3,initialisation_strategy=\"grid\",rng=rng),xtrain)\n kMeansGAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMeansClusterer(n_classes=3,initialisation_strategy=\"random\",rng=rng),xtrain)\n kMeansRAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMeansClusterer(n_classes=3,initialisation_strategy=\"shuffle\",rng=rng),xtrain)\n kMeansSAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMedoidsClusterer(n_classes=3,initialisation_strategy=\"grid\",rng=rng),xtrain)\n kMedoidsGAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMedoidsClusterer(n_classes=3,initialisation_strategy=\"random\",rng=rng),xtrain)\n kMedoidsRAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMedoidsClusterer(n_classes=3,initialisation_strategy=\"shuffle\",rng=rng),xtrain)\n kMedoidsSAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(GaussianMixtureClusterer(n_classes=3,mixtures=SphericalGaussian,rng=rng,verbosity=NONE),xtrain)\n gmmSpherAccuracy = accuracy(ytrain,estcl,ignorelabels=true, rng=rng)\n estcl = 
fit!(GaussianMixtureClusterer(n_classes=3,mixtures=DiagonalGaussian,rng=rng,verbosity=NONE),xtrain)\n gmmDiagAccuracy = accuracy(ytrain,estcl,ignorelabels=true, rng=rng)\n estcl = fit!(GaussianMixtureClusterer(n_classes=3,mixtures=FullGaussian,rng=rng,verbosity=NONE),xtrain)\n gmmFullAccuracy = accuracy(ytrain,estcl,ignorelabels=true, rng=rng)\n # For comparision with Clustering.jl\n clusteringOut = Clustering.kmeans(xtrain', 3)\n kMeans2Accuracy = accuracy(ytrain,clusteringOut.assignments,ignorelabels=true)\n # For comparision with GaussianMistures.jl - sometimes GaussianMistures.jl em! fails with a PosDefException\n dGMM = GaussianMixtures.GMM(3, xtrain; method=:kmeans, kind=:diag)\n GaussianMixtures.em!(dGMM, xtrain)\n gmmDiag2Accuracy = accuracy(ytrain,GaussianMixtures.gmmposterior(dGMM, xtrain)[1],ignorelabels=true)\n fGMM = GaussianMixtures.GMM(3, xtrain; method=:kmeans, kind=:full)\n GaussianMixtures.em!(fGMM, xtrain)\n gmmFull2Accuracy = accuracy(ytrain,GaussianMixtures.gmmposterior(fGMM, xtrain)[1],ignorelabels=true)\n # Returning the accuracies\n return kMeansGAccuracy,kMeansRAccuracy,kMeansSAccuracy,kMedoidsGAccuracy,kMedoidsRAccuracy,kMedoidsSAccuracy,gmmSpherAccuracy,gmmDiagAccuracy,gmmFullAccuracy,kMeans2Accuracy,gmmDiag2Accuracy,gmmFull2Accuracy\n end\n\n# We transform the output in matrix for easier analysis\naccuracies = fill(0.0,(length(cOut),length(cOut[1])))\n[accuracies[r,c] = cOut[r][c] for r in 1:length(cOut),c in 1:length(cOut[1])]\nμs = mean(accuracies,dims=1)\nσs = std(accuracies,dims=1)\n\n\nmodelLabels=[\"kMeansG\",\"kMeansR\",\"kMeansS\",\"kMedoidsG\",\"kMedoidsR\",\"kMedoidsS\",\"gmmSpher\",\"gmmDiag\",\"gmmFull\",\"kMeans (Clustering.jl)\",\"gmmDiag (GaussianMixtures.jl)\",\"gmmFull (GaussianMixtures.jl)\"]\n\nreport = DataFrame(mName = modelLabels, avgAccuracy = dropdims(round.(μs',digits=3),dims=2), stdAccuracy = dropdims(round.(σs',digits=3),dims=2))","category":"page"},{"location":"tutorials/Clustering - 
Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Accuracies (mean and its standard dev.) running this scripts with different random seeds (123, 1000 and 10000):","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"model μ 1 σ² 1 μ 2 σ² 2 μ 3 σ² 3\n│ kMeansG 0.891 0.017 0.892 0.012 0.893 0.017\n│ kMeansR 0.866 0.083 0.831 0.127 0.836 0.114\n│ kMeansS 0.764 0.174 0.822 0.145 0.779 0.170\n│ kMedoidsG 0.894 0.015 0.896 0.012 0.894 0.017\n│ kMedoidsR 0.804 0.144 0.841 0.123 0.825 0.134\n│ kMedoidsS 0.893 0.018 0.834 0.130 0.877 0.085\n│ gmmSpher 0.893 0.016 0.891 0.016 0.895 0.017\n│ gmmDiag 0.917 0.022 0.912 0.016 0.916 0.014\n│ gmmFull 0.970 0.035 0.982 0.013 0.981 0.009\n│ kMeans (Clustering.jl) 0.856 0.112 0.873 0.083 0.873 0.089\n│ gmmDiag (GaussianMixtures.jl) 0.865 0.127 0.872 0.090 0.833 0.152\n│ gmmFull (GaussianMixtures.jl) 0.907 0.133 0.914 0.160 0.917 0.141","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We can see that running the script multiple times with different random seed confirm the estimated standard deviations collected with the cross_validation, with the BetaML GMM-based models and grid based ones being the most stable ones.","category":"page"},{"location":"tutorials/Clustering - 
Iris/betaml_tutorial_cluster_iris.html#BetaML-model-accuracies","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"BetaML model accuracies","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"From the output We see that the gmm models perform for this dataset generally better than kmeans or kmedoids algorithms, and they further have very low variances. In detail, it is the (default) grid initialisation that leads to the better results for kmeans and kmedoids, while for the gmm models it is the FullGaussian to perform better.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Comparisions-with-Clustering.jl-and-GaussianMixtures.jl","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Comparisions with Clustering.jl and GaussianMixtures.jl","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"For this specific case, both Clustering.jl and GaussianMixtures.jl report substantially worst accuracies, and with very high variances. But we maintain the ranking that Full Gaussian gmm > Diagonal Gaussian > Kmeans accuracy. I suspect the reason that BetaML gmm works so well is in relation to the usage of kmeans algorithm for initialisation of the mixtures, itself initialized with a \"grid\" arpproach. 
The grid initialisation \"guarantees\" indeed that the initial means of the mixture components are well spread across the multidimensional space defined by the data, and it helps prevent the EM algorithm from converging to a bad local optimum.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Working-without-the-labels","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Working without the labels","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Up to now we used the real labels to compare the model accuracies. But in real clustering examples we don't have the true classes, or we wouldn't need to do clustering in the first place, so we don't know the number of classes to use. There are several methods to judge the goodness of clustering algorithms. For likelihood-based algorithms such as GaussianMixtureClusterer we can use an information criterion that trades off the goodness of the likelihood against the number of parameters used to do the fit. BetaML provides by default in the gmm clustering outputs both the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), where for both a lower value is better.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We can then run the model with different numbers of classes and see which one leads to the lowest BIC or AIC. We hence run cross_validation again with the FullGaussian gmm model. 
Note that we use the BIC/AIC criteria here for establishing the \"best\" number of classes but we could have used it also to select the kind of Gaussain distribution to use. This is one example of hyper-parameter tuning that we developed more in detail using autotuning in the regression tutorial.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Let's try up to 4 possible classes:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"K = 4\nsampler = KFold(nsplits=5,nrepeats=2,shuffle=true, rng=copy(AFIXEDRNG))\ncOut = cross_validation([x,y],sampler,return_statistics=false) do trainData,testData,rng\n (xtrain,ytrain) = trainData;\n BICS = []\n AICS = []\n for k in 1:K\n m = GaussianMixtureClusterer(n_classes=k,mixtures=FullGaussian,rng=rng,verbosity=NONE)\n fit!(m,xtrain)\n push!(BICS,info(m)[\"BIC\"])\n push!(AICS,info(m)[\"AIC\"])\n end\n return (BICS,AICS)\nend\n\n# Transforming the output in matrices for easier analysis\nNit = length(cOut)\n\nBICS = fill(0.0,(Nit,K))\nAICS = fill(0.0,(Nit,K))\n[BICS[r,c] = cOut[r][1][c] for r in 1:Nit,c in 1:K]\n[AICS[r,c] = cOut[r][2][c] for r in 1:Nit,c in 1:K]\n\nμsBICS = mean(BICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"σsBICS = 
std(BICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"μsAICS = mean(AICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"σsAICS = std(AICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"plot(1:K,[μsBICS' μsAICS'], labels=[\"BIC\" \"AIC\"], title=\"Information criteria by number of classes\", xlabel=\"number of classes\", ylabel=\"lower is better\")","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We see that following the \"lowest AIC\" rule we would indeed choose three classes, while following the \"lowest BIC\" criteria we would have choosen only two classes. This means that there is two classes that, concerning the floreal measures used in the database, are very similar, and our models are unsure about them. 
Perhaps the biologists will end up one day with the conclusion that it is indeed only one species :-).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We could study this issue in more detail by analysing the ConfusionMatrix, but the one used in BetaML does not account for the ignorelabels option (yet).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Analysing-the-silhouette-of-the-cluster","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Analysing the silhouette of the cluster","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"A further metric to analyse cluster output is the so-called Silhouette method","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Silhouette is a distance-based metric and requires as its first argument a matrix of pairwise distances. This can be computed with the pairwise function, which defaults to using l2_distance (i.e. Euclidean). 
Many other distance functions are available in the Clustering sub-module or one can use the efficiently implemented distances from the Distances package, as in this example.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We'll use here the silhouette function over a simple loop:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"x,y = consistent_shuffle([x,y],dims=1)\nimport Distances\npd = pairwise(x,distance=Distances.euclidean) # we compute the pairwise distances\nnclasses = 2:6\nmodels = [KMeansClusterer, KMedoidsClusterer, GaussianMixtureClusterer]\nprintln(\"Silhouette score by model type and class number:\")\nfor ncl in nclasses, mtype in models\n m = mtype(n_classes=ncl, verbosity=NONE)\n ŷ = fit!(m,x)\n if mtype == GaussianMixtureClusterer\n ŷ = mode(ŷ)\n end\n s = mean(silhouette(pd,ŷ))\n println(\"$mtype \\t ($ncl classes): $s\")\nend","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Highest levels are better. 
We see again that 2 classes have better scores !","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Conclusions","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Conclusions","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We have shown in this tutorial how we can easily run clustering algorithms in BetaML with just one line of code fit!(ChoosenClusterer(),x), but also how can we use cross-validation in order to help the model or parameter selection, with or whithout knowing the real classes. We retrieve here what we observed with supervised models. Globally the accuracy of BetaML models are comparable to those of leading specialised packages (in this case they are even better), but there is a significant gap in computational efficiency that restricts the pratical usage of BetaML to datasets that fits in the pc memory. 
However we trade this relative inefficiency with very flexible model definition and utility functions (for example GaussianMixtureClusterer works with missing data, allowing it to be used as the backbone of the GaussianMixtureImputer missing imputation function, or for collaborative reccomendation systems).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"View this file on Github.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"This page was generated using Literate.jl.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"EditURL = \"betaml_tutorial_regression_sharingBikes.jl\"","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#regression_tutorial","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"","category":"section"},{"location":"tutorials/Regression - bike 
sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The task is to estimate the influence of several variables (like the weather, the season, the day of the week...) on the demand for shared bicycles, so that the authority in charge of the service can organise it in the best way.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Data origin:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"original full dataset (by hour, not used here): https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset\nsimplified dataset (by day, with some simple scaling): https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/aec\ndescription: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/exam2019ace.pdf\ndata: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/bikesharing_day.csv.zip","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that even if we are estimating a time series, we are not using a recurrent neural network here, as we assume the temporal dependence to be negligible (i.e. 
Y_t = f(X_t) alone).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Library-and-data-loading","page":"A regression task: the prediction of bike sharing demand","title":"Library and data loading","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Activating the local environment specific to","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"using Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We first load all the packages we are going to use","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"using LinearAlgebra, Random, Statistics, StableRNGs, DataFrames, CSV, Plots, Pipe, BenchmarkTools, BetaML\nimport Distributions: Uniform, DiscreteUniform\nimport DecisionTree, Flux ## For comparisions","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Here we are explicit and we use our own fixed 
RNG:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Here we load the data from a CSV file provided by the BetaML package","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"basedir = joinpath(dirname(pathof(BetaML)),\"..\",\"docs\",\"src\",\"tutorials\",\"Regression - bike sharing\")\ndata = CSV.File(joinpath(basedir,\"data\",\"bike_sharing_day.csv\"),delim=',') |> DataFrame\ndescribe(data)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The variable we want to learn to predict is cnt, the total demand of bikes for a given day. 
Even if it is indeed an integer, we treat it as a continuous variable, so each single prediction will be a scalar Y ∈ ℝ.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"plot(data.cnt, title=\"Daily bike sharing rents (2Y)\", label=nothing)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Decision-Trees","page":"A regression task: the prediction of bike sharing demand","title":"Decision Trees","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We start our regression task with Decision Trees.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Decision tree training consists in choosing the set of questions (in a hierarchical way, so as to form a \"decision tree\") that \"best\" split the dataset given for training, in the sense that each split generates the sub-samples (always 2 subsamples in the BetaML implementation) that are, for the characteristic we want to predict, as homogeneous as possible. 
Decision trees are one of the few ML algorithms that have an intuitive interpretation and can be used for both regression and classification tasks.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Data-preparation","page":"A regression task: the prediction of bike sharing demand","title":"Data preparation","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The first step is to prepare the data for the analysis. This already depends on the model we want to employ, as some models \"accept\" almost everything as input, no matter if the data is numerical or categorical, if it has missing values or not... while other models are instead much more demanding, and require more work to \"clean up\" our dataset.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The tutorial starts using Decision Tree and Random Forest models that definitely belong to the first group, so the only thing we have to do is to select the input variables (the \"feature matrix\", that we will indicate with \"X\") and the variable representing our output (the information we want to learn to predict, which we call \"y\"):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"x = Matrix{Float64}(data[:,[:instant,:season,:yr,:mnth,:holiday,:weekday,:workingday,:weathersit,:temp,:atemp,:hum,:windspeed]])\ny = 
data[:,16];\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We finally set up a dataframe to store the relative mean errors of the various models we'll use.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"results = DataFrame(model=String[],train_rme=Float64[],test_rme=Float64[])","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Model-selection","page":"A regression task: the prediction of bike sharing demand","title":"Model selection","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now split the dataset between the data that we will use for training the algorithm and selecting the hyperparameters (xtrain/ytrain) and those for testing the quality of the algorithm with the optimal hyperparameters (xtest/ytest). We use the partition function specifying the share we want to use for these two different subsets, here 75% and 25% respectively. 
As our data is indeed a time series, we want our model to be able to predict future bike sharing demand from past observed rentals, so we do not shuffle the datasets as would happen by default.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.75,1-0.75],shuffle=false)\n(ntrain, ntest) = size.([ytrain,ytest],1)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Then we define the model we want to use, DecisionTreeEstimator in this case, and we create an instance of the model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"m = DecisionTreeEstimator(autotune=true, rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Passing a fixed Random Number Generator (RNG) to the rng parameter guarantees that every time we use the model with the same data (from the model creation down to value prediction) we obtain the same results. In particular BetaML provides FIXEDRNG, an instance of StableRNG that guarantees reproducibility even across different Julia versions. See the section \"Dealing with stochasticity\" for details. Note the autotune parameter. 
BetaML has what is perhaps the easiest method for automatically tuning the model hyperparameters (which in this way become learned parameters). Indeed, in most cases it is enough to pass the attribute autotune=true to the model constructor and the hyperparameter search will be automatically performed on the first fit! call. If needed we can customise hyperparameter tuning, choosing the tuning method with the tunemethod parameter. The single line above is equivalent to:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"tuning_method = SuccessiveHalvingSearch(\n hpranges = Dict(\"max_depth\" =>[5,10,nothing], \"min_gain\"=>[0.0, 0.1, 0.5], \"min_records\"=>[2,3,5],\"max_features\"=>[nothing,5,10,30]),\n loss = l2loss_by_cv,\n res_shares = [0.05, 0.2, 0.3],\n multithreads = true\n )\nm_dt = DecisionTreeEstimator(autotune=true, rng=copy(AFIXEDRNG), tunemethod=tuning_method)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that the defaults change according to the specific model; for example, RandomForestEstimator autotuning defaults to not being multithreaded, as the individual model is already multithreaded.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"tip: Tip\nRefer to the versions of this tutorial for BetaML <= 0.6 for a good exercise on how to perform model selection using the cross_validation function, or even by custom grid 
search.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now fit the model, that is learn the model parameters that lead to the best predictions from the data. By default (unless we use cache=false in the model constructor) the model stores also the training predictions, so we can just use fit!() instead of fit!() followed by predict(model,xtrain)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrain = fit!(m_dt,xtrain,ytrain)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The above code produces a fitted DecisionTreeEstimator object that can be used to make predictions given some new features, i.e. given a new X matrix of (number of observations x dimensions), predict the corresponding Y vector of scalars in R.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtest = predict(m_dt, xtest)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now compute the mean relative error for the training and the test set. 
The relative_mean_error is a very flexible error function. Without additional parameters, it computes, as the name says, the relative mean error between an estimated and a true vector. However it can also compute the mean relative error, also known as the \"mean absolute percentage error\" (MAPE), or use a p-norm higher than 1. The mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more. In this tutorial we use the latter, as our data clearly has some outlier days with very small rents, and we care more about avoiding our customers finding empty bike racks than about having unrented bikes on the rack. Targeting a low mean relative error would push all our predictions down to try to accommodate the low-level predictions (to avoid a large relative error), and that's not what we want.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can then compute the relative mean error for the decision tree","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"rme_train = relative_mean_error(ytrain,ŷtrain) # 0.1367\nrme_test = relative_mean_error(ytest,ŷtest) # 0.1547","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"And we save the relative mean errors in the results 
dataframe:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"push!(results,[\"DT\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can plot the true labels vs the estimated one for the three subsets...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrain,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in training period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtest,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Or we can visualise the true vs estimated bike shared on a temporal base. 
First on the full period (2 years) ...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"...and then focusing on the testing period","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = ntrain\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=[\"obs\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The predictions aren't so bad in this case; however, decision trees are highly unstable, and the output could have depended just on the specific initial random seed.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Random-Forests","page":"A regression task: the prediction of bike 
sharing demand","title":"Random Forests","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Rather than trying to solve this problem using a single Decision Tree model, let's now try to use a Random Forest model. Random forests average the results of many different decision trees and provide a more \"stable\" result. Being made of many decision trees, random forests are however more computationally expensive to train.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"m_rf = RandomForestEstimator(autotune=true, oob=true, rng=copy(AFIXEDRNG))\nŷtrain = fit!(m_rf,xtrain,ytrain);\nŷtest = predict(m_rf,xtest);\nrme_train = relative_mean_error(ytrain,ŷtrain) # 0.056\nrme_test = relative_mean_error(ytest,ŷtest) # 0.161\npush!(results,[\"RF\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"While slower than individual decision trees, random forests remain relatively fast. 
We should also consider that they are by default efficiently parallelised, so their speed increases with the number of available cores (in building this documentation page, GitHub CI servers allow for a single core, so all the benchmarks you see in this tutorial are run with a single core available).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Random forests support the so-called \"out-of-bag\" (OOB) error, an estimate of the error that we would have when the model is applied to a testing sample. However, in this case the OOB error reported is much smaller than the testing error we will actually find. This is due to the fact that the division between training/validation and testing in this exercise is not random, but has a temporal basis. It seems that in this example the data in validation/testing follow a different pattern/variance than those in training (in probabilistic terms, the daily observations are not i.i.d.).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"info(m_rf)\noob_error, rme_test = info(m_rf)[\"oob_errors\"],relative_mean_error(ytest,ŷtest)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"In this case we found an error very similar to the one employing a single decision tree. 
Let's print the observed data vs the estimated one using the random forest and then along the temporal axis:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrain,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in training period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtest,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Full period plot (2 years):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing 
demand","text":"Focus on the testing period:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = 620\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtrainfull[stc:endc] ŷtestfull[stc:endc]], label=[\"obs\" \"val\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Comparison-with-DecisionTree.jl-random-forest","page":"A regression task: the prediction of bike sharing demand","title":"Comparison with DecisionTree.jl random forest","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now compare our results with those obtained employing the same model in the DecisionTree package, using the hyperparameters of the optimal BetaML random forest model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"best_rf_hp = hyperparameters(m_rf)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Hyperparameters of the DecisionTree.jl random forest model","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A 
regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"n_subfeatures=isnothing(best_rf_hp.max_features) ? -1 : best_rf_hp.max_features; n_trees=best_rf_hp.n_trees; partial_sampling=0.7; max_depth=isnothing(best_rf_hp.max_depth) ? typemax(Int64) : best_rf_hp.max_depth;\nmin_samples_leaf=best_rf_hp.min_records; min_samples_split=best_rf_hp.min_records; min_purity_increase=best_rf_hp.min_gain;\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We train the model..","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"model = DecisionTree.build_forest(ytrain, convert(Matrix,xtrain),\n n_subfeatures,\n n_trees,\n partial_sampling,\n max_depth,\n min_samples_leaf,\n min_samples_split,\n min_purity_increase;\n rng = seed)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"And we generate predictions and measure their error","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(ŷtrain,ŷtest) = DecisionTree.apply_forest.([model],[xtrain,xtest]);\n\n\n(rme_train, rme_test) = relative_mean_error.([ytrain,ytest],[ŷtrain,ŷtest]) # 0.022 and 0.304\npush!(results,[\"RF 
(DecisionTree.jl)\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"While the train error is very small, the error on the test set remains relatively high. The very low error level on the training set is a sign that the model overspecialised on the training set, and we would have done better to run a dedicated hyper-parameter tuning function for the DecisionTree.jl model (we did try using the default DecisionTree.jl parameters, but we obtained roughly the same results).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Finally we plot the DecisionTree.jl predictions alongside the observed values:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (DT.jl RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Again, focusing on the testing data:","category":"page"},{"location":"tutorials/Regression - bike 
sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = ntrain\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=[\"obs\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (DT.jl RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Conclusions-of-Decision-Trees-/-Random-Forests-methods","page":"A regression task: the prediction of bike sharing demand","title":"Conclusions of Decision Trees / Random Forests methods","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The error obtained employing DecisionTree.jl is significantly larger than that obtained using a BetaML random forest model, although, to be fair, we didn't tune the DecisionTree.jl hyper-parameters. Also, the DecisionTree.jl random forest model is much faster. This is partially due to the fact that, internally, DecisionTree.jl models optimise the algorithm by sorting the observations. BetaML trees/forests don't employ this optimisation and hence they can work with true categorical data for which ordering is not defined. Another explanation of this difference in speed is that BetaML Random Forest models accept missing values within the feature matrix.
To sum up, BetaML random forests are ideal algorithms when we want to obtain good predictions in the simplest way, even without manually tuning the hyper-parameters, and without spending time cleaning (\"munging\") the feature matrix, as they accept almost \"any kind\" of data as it is.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Neural-Networks","page":"A regression task: the prediction of bike sharing demand","title":"Neural Networks","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"BetaML provides only deep feed-forward neural networks, i.e. artificial neural networks where the individual \"nodes\" are arranged in layers, from the input layer, where each unit holds the input coordinate, through various hidden layer transformations, until the actual output of the model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(Image: Neural Networks)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"In this layerwise computation, each unit in a particular layer takes input from all the units of the preceding layer and has its own parameters that are adjusted to perform the overall computation. The training of the network consists in finding the coefficients that minimise a loss function between the output of the model and the known data.
In particular, a deep (feedforward) neural network refers to a neural network that contains not only the input and output layers, but also (a variable number of) hidden layers in between.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Neural networks accept only numerical inputs. We hence need to convert all categorical data into numerical units. A common approach is to use the so-called \"one-hot-encoding\" where the categorical values are converted into indicator variables (0/1), one for each possible value. This can be done in BetaML using the OneHotEncoder model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"seasonDummies = fit!(OneHotEncoder(),data.season)\nweatherDummies = fit!(OneHotEncoder(),data.weathersit)\nwdayDummies = fit!(OneHotEncoder(),data.weekday .+ 1)\n\n\n# We compose the feature matrix with the new dimensions obtained from the onehotencoder functions\nx = hcat(Matrix{Float64}(data[:,[:instant,:yr,:mnth,:holiday,:workingday,:temp,:atemp,:hum,:windspeed]]),\n seasonDummies,\n weatherDummies,\n wdayDummies)\ny = data[:,16];\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"As we did for decision trees/random forests, we split the data into training and testing sets","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the
prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.75,1-0.75],shuffle=false)\n(ntrain, ntest) = size.([ytrain,ytest],1)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Another common operation with neural networks is to scale the feature vectors (X) and the labels (Y). The BetaML Scaler model, by default, scales the data such that each dimension has mean 0 and variance 1.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that we can provide the Scaler model with different scale factors or specify the columns that shouldn't be scaled (e.g. those resulting from the one-hot encoding).
Finally we can reverse the scaling (this is useful to retrieve the unscaled features from a model trained with scaled ones).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"cols_nottoscale = [2;4;5;10:23]\nxsm = Scaler(skip=cols_nottoscale)\nxtrain_scaled = fit!(xsm,xtrain)\nxtest_scaled = predict(xsm,xtest)\nytrain_scaled = ytrain ./ 1000 # We just divide Y by 1000, as using full scaling of Y we may get negative demand.\nytest_scaled = ytest ./ 1000\nD = size(xtrain,2)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now build our feed-forward neural network. We create three layers: the first layer always has an input size equal to the dimensions of our data (the number of columns), and the output layer, for a simple regression where the predictions are scalars, always has size one. We will tune the size of the middle layer.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"There are already several kinds of layers available (and you can build your own kind by defining a new struct and implementing a few functions. See the Nn module documentation for details).
Here we use only dense layers, those found in typical feed-forward neural networks.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"For each layer, on top of its size (in \"neurons\") we can specify an activation function. Here we use the relu for the terminal layer (this will guarantee that our predictions are never negative) and identity for the hidden layer. Again, consult the Nn module documentation for other activation functions already defined, or use any function of your choice.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Initial weight parameters can also be specified if needed.
By default DenseLayer uses the so-called Xavier initialisation.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Let's hence build our candidate neural network structures, choosing between 5 and 10 nodes in the hidden layers:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"candidate_structures = [\n [DenseLayer(D,k,f=relu,df=drelu,rng=copy(AFIXEDRNG)), # Activation function is ReLU, its derivative is drelu\n DenseLayer(k,k,f=identity,df=didentity,rng=copy(AFIXEDRNG)), # This is the hidden layer for which we want to test various sizes\n DenseLayer(k,1,f=relu,df=drelu,rng=copy(AFIXEDRNG))] for k in 5:2:10]","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that specifying the derivatives of the activation functions (and of the loss function that we'll see in a moment) is totally optional, as without them BetaML will use [Zygote.jl](https://github.com/FluxML/Zygote.jl) for automatic differentiation.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We also set a few other parameters as \"tunable\": the number of \"epochs\" to train the model (the number of iterations through the whole dataset), the sample size at each batch and the optimisation algorithm to
use. Several optimisation algorithms are indeed available, and each accepts different parameters, like the learning rate for the Stochastic Gradient Descent algorithm (SGD, used by default) or the exponential decay rates for the moments estimates for the ADAM algorithm (that we use here, with the default parameters).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The hyperparameter ranges will then look as follows:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"hpranges = Dict(\"layers\" => candidate_structures,\n \"epochs\" => rand(copy(AFIXEDRNG),DiscreteUniform(50,100),3), # 3 values sampled at random between 50 and 100\n \"batch_size\" => [4,8,16],\n \"opt_alg\" => [SGD(λ=2),SGD(λ=1),SGD(λ=3),ADAM(λ=0.5),ADAM(λ=1),ADAM(λ=0.25)])","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Finally we can build the neural network as a NeuralNetworkEstimator model, where we \"chain\" the layers together and assign a final loss function (again, you can provide your own loss function, if those available in BetaML don't suit your needs):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"nnm = NeuralNetworkEstimator(loss=squared_cost, descr=\"Bike sharing regression
model\", tunemethod=SuccessiveHalvingSearch(hpranges = hpranges), autotune=true,rng=copy(AFIXEDRNG)) # Build the NN model and use the squared cost (aka MSE) as error function by default","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now fit and autotune the model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrain_scaled = fit!(nnm,xtrain_scaled,ytrain_scaled)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The model training is one order of magnitude slower than random forests, although the memory requirement is approximately the same.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"To obtain the neural network predictions we apply the function predict to the feature matrix X for which we want to generate predictions, and then we rescale y.
Normally we would apply here the inverse_predict function, but as we simply divided by 1000, we multiply ŷ by the same amount:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrain = ŷtrain_scaled .* 1000\nŷtest = predict(nnm,xtest_scaled) .* 1000","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(rme_train, rme_test) = relative_mean_error.([ŷtrain,ŷtest],[ytrain,ytest])\npush!(results,[\"NN\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The error is much lower. Let's plot our predictions:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Again, we can start by plotting the estimated vs the observed value:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrain,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs.
obs in training period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtest,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now plot across the time dimension, first plotting the whole period (2 years):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"...and then focusing on the testing data","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = 620\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], 
label=[\"obs\" \"val\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Comparison-with-Flux.jl","page":"A regression task: the prediction of bike sharing demand","title":"Comparison with Flux.jl","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now apply the same Neural Network model using the Flux framework, a dedicated neural network library, reusing the optimal parameters that we did learn from tuning NeuralNetworkEstimator:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"hp_opt = hyperparameters(nnm)\nopt_size = size(hp_opt.layers[1])[2][1]\nopt_batch_size = hp_opt.batch_size\nopt_epochs = hp_opt.epochs","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We fix the default random number generator so that the Flux example gives a reproducible output","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Random.seed!(seed)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of 
bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We define the Flux neural network model and load it with data...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"l1 = Flux.Dense(D,opt_size,Flux.relu)\nl2 = Flux.Dense(opt_size,opt_size,identity)\nl3 = Flux.Dense(opt_size,1,Flux.relu)\nFlux_nn = Flux.Chain(l1,l2,l3)\nfluxloss(x, y) = Flux.mse(Flux_nn(x), y)\nps = Flux.params(Flux_nn)\nnndata = Flux.Data.DataLoader((xtrain_scaled', ytrain_scaled'), batchsize=opt_batch_size,shuffle=true)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We do the training of the Flux model...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"[Flux.train!(fluxloss, ps, nndata, Flux.ADAM(0.001, (0.9, 0.8))) for i in 1:opt_epochs]","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We obtain the predictions...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainf = @pipe Flux_nn(xtrain_scaled')' .* 1000;\nŷtestf = @pipe
Flux_nn(xtest_scaled')' .* 1000;\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"..and we compute the mean relative errors..","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(rme_train, rme_test) = relative_mean_error.([ŷtrainf,ŷtestf],[ytrain,ytest])\npush!(results,[\"NN (Flux.jl)\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":".. finding an error not significantly different than the one obtained from BetaML.Nn.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Plots:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrainf,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. 
obs in training period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtestf,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfullf = vcat(ŷtrainf,fill(missing,ntest))\nŷtestfullf = vcat(fill(missing,ntrain), ŷtestf)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfullf ŷtestfullf], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = 620\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfullf[stc:endc]], label=[\"obs\" \"val\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Conclusions-of-Neural-Network-models","page":"A regression task: the prediction of bike sharing demand","title":"Conclusions of Neural Network models","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the 
prediction of bike sharing demand","text":"If we strive for the most accurate predictions, deep neural networks are usually the best choice. However they are computationally expensive, so with limited resources we may get better results by fine tuning and running many repetitions of \"simpler\" decision trees or even random forest models than by training a large neural network with insufficient hyper-parameter tuning. Also, we should consider that decision trees/random forests are much simpler to work with.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"That said, specialised neural network libraries, like Flux, allow the use of GPUs and specialised hardware, letting neural networks scale to very large datasets.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Still, for small and medium datasets, BetaML provides simpler yet customisable solutions that are accurate and fast.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#GMM-based-regressors","page":"A regression task: the prediction of bike sharing demand","title":"GMM-based regressors","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"BetaML 0.8 introduces new regression algorithms based on Gaussian Mixture Models.
Specifically, there are two variants available, GaussianMixtureRegressor2 and GaussianMixtureRegressor, and this example uses GaussianMixtureRegressor. As for neural networks, they work on numerical data only, so we reuse the datasets we prepared for the neural networks.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"As usual we first define the model.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"m = GaussianMixtureRegressor(rng=copy(AFIXEDRNG),verbosity=NONE)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"info: Info\nWe disabled autotune here, as this code is run by GitHub continuous_integration servers on each code update, and GitHub servers seem to have some strange problem with it, taking almost 4 hours instead of a few seconds on my machine.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We then fit the model to the training data..","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainGMM_unscaled =
fit!(m,xtrain_scaled,ytrain_scaled)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"And we predict...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainGMM = ŷtrainGMM_unscaled .* 1000;\nŷtestGMM = predict(m,xtest_scaled) .* 1000;\n\n(rme_train, rme_test) = relative_mean_error.([ŷtrainGMM,ŷtestGMM],[ytrain,ytest])\npush!(results,[\"GMM\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Summary","page":"A regression task: the prediction of bike sharing demand","title":"Summary","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"This is the summary of the results (train and test relative mean error) we had trying to predict the daily bike sharing demand, given weather and calendar information:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"println(results)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"You may ask how stable are 
these results? How much do they depend on the specific RNG seed? We re-ran the whole script a couple of times with different random seeds (1000 and 10000):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Model Train rme1 Test rme1 Train rme2 Test rme2 Train rme3 Test rme3\nDT 0.1366960 0.154720 0.0233044 0.249329 0.0621571 0.161657\nRF 0.0421267 0.180186 0.0535776 0.136920 0.0386144 0.141606\nRF (DecisionTree.jl) 0.0230439 0.235823 0.0801040 0.243822 0.0168764 0.219011\nNN 0.1604000 0.169952 0.1091330 0.121496 0.1481440 0.150458\nNN (Flux.jl) 0.0931161 0.166228 0.0920796 0.167047 0.0907810 0.122469\nGaussianMixtureRegressor* 0.1432800 0.293891 0.1380340 0.295470 0.1477570 0.284567","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"GMM is a deterministic model; the variations are due to the different random sampling in choosing the best hyperparameters.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Neural networks can be more precise than random forest models, but are more computationally expensive (and tricky to set up). When we compared BetaML with the algorithm-specific leading packages, we found similar results in terms of accuracy, but often the leading packages are better optimised and run more efficiently (but sometimes at the cost of being less versatile). 
GMM-based regressors are computationally very cheap and a good compromise if accuracy can be traded off for performance.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"View this file on GitHub.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Clustering.html#clustering_module","page":"Clustering","title":"The BetaML.Clustering Module","text":"","category":"section"},{"location":"Clustering.html","page":"Clustering","title":"Clustering","text":"Clustering","category":"page"},{"location":"Clustering.html#BetaML.Clustering","page":"Clustering","title":"BetaML.Clustering","text":"Clustering module (WIP)\n\n(Hard) Clustering algorithms \n\nProvide hard clustering methods using K-means and K-medoids. Please see also the GMM module for GMM-based soft clustering (i.e. where a probability distribution to be part of the various classes is assigned to each record instead of a single class), missing values imputation / collaborative filtering / recommendation systems using clustering methods as backend.\n\nThe module provides the following models. 
Use ?[model] to access their documentation:\n\nKMeansClusterer: Classical K-means algorithm\nKMedoidsClusterer: K-medoids algorithm with configurable distance metric\n\nSome metrics of the clustered output are available (e.g. silhouette).\n\n\n\n\n\n","category":"module"},{"location":"Clustering.html#Module-Index","page":"Clustering","title":"Module Index","text":"","category":"section"},{"location":"Clustering.html","page":"Clustering","title":"Clustering","text":"Modules = [Clustering]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Clustering.html#Detailed-API","page":"Clustering","title":"Detailed API","text":"","category":"section"},{"location":"Clustering.html","page":"Clustering","title":"Clustering","text":"Modules = [Clustering]\nPrivate = false","category":"page"},{"location":"Clustering.html#BetaML.Clustering.KMeansC_hp","page":"Clustering","title":"BetaML.Clustering.KMeansC_hp","text":"mutable struct KMeansC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the KMeansClusterer model\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. 
One of the following:\n\"random\": randomly in the X space [default]\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points\n\"given\": using a provided set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\n\n\n\n\n\n","category":"type"},{"location":"Clustering.html#BetaML.Clustering.KMeansClusterer","page":"Clustering","title":"BetaML.Clustering.KMeansClusterer","text":"mutable struct KMeansClusterer <: BetaMLUnsupervisedModel\n\nThe classical \"K-Means\" clustering algorithm (unsupervised).\n\nLearn to partition the data and assign each record to one of the n_classes classes according to a distance metric (default Euclidean).\n\nFor the parameters see ?KMeansC_hp and ?BML_options.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported by using the \"old\" representatives as init ones\n\nExample :\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]\n5×2 Matrix{Float64}:\n 1.1 10.1\n 0.9 9.8\n 10.0 1.1\n 12.1 0.8\n 0.8 9.8\n\njulia> mod = KMeansClusterer(n_classes=2)\nKMeansClusterer - A K-Means Model (unfitted)\n\njulia> classes = fit!(mod,X)\n5-element Vector{Int64}:\n 1\n 1\n 2\n 2\n 1\n\njulia> newclasses = fit!(mod,[11 0.9])\n1-element Vector{Int64}:\n 2\n\njulia> info(mod)\nDict{String, Any} with 2 entries:\n \"fitted_records\" => 6\n \"av_distance_last_fit\" => 0.0\n \"xndims\" => 2\n\njulia> parameters(mod)\nBetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)\n- representatives: [1.13366 9.7209; 11.0 0.9]\n\n\n\n\n\n","category":"type"},{"location":"Clustering.html#BetaML.Clustering.KMedoidsC_hp","page":"Clustering","title":"BetaML.Clustering.KMedoidsC_hp","text":"mutable struct KMedoidsC_hp <: 
BetaMLHyperParametersSet\n\nHyperparameters for the KMedoidsClusterer model\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:\n\"random\": randomly in the X space\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points [default]\n\"given\": using a provided set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\n\n\n\n\n\n","category":"type"},{"location":"Clustering.html#BetaML.Clustering.KMedoidsClusterer","page":"Clustering","title":"BetaML.Clustering.KMedoidsClusterer","text":"mutable struct KMedoidsClusterer <: BetaMLUnsupervisedModel\n\nThe classical \"K-Medoids\" clustering algorithm (unsupervised).\n\nSimilar to K-Means, learn to partition the data and assign each record to one of the n_classes classes according to a distance metric, but the \"representatives\" (the centroids) are guaranteed to be one of the training points. 
The algorithm works with any arbitrary distance measure (default Euclidean).\n\nFor the parameters see ?KMedoidsC_hp and ?BML_options.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported by using the \"old\" representatives as init ones\nwith an initialisation_strategy other than shuffle (the default initialisation for K-Medoids) the representatives may not be one of the training points when the algorithm doesn't perform enough iterations. This can happen for example when the number of classes is close to the number of records to cluster.\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]\n5×2 Matrix{Float64}:\n 1.1 10.1\n 0.9 9.8\n 10.0 1.1\n 12.1 0.8\n 0.8 9.8\n\njulia> mod = KMedoidsClusterer(n_classes=2)\nKMedoidsClusterer - A K-Medoids Model (unfitted)\n\njulia> classes = fit!(mod,X)\n5-element Vector{Int64}:\n 1\n 1\n 2\n 2\n 1\n\njulia> newclasses = fit!(mod,[11 0.9])\n1-element Vector{Int64}:\n 2\n\njulia> info(mod)\nDict{String, Any} with 2 entries:\n\"fitted_records\" => 6\n\"av_distance_last_fit\" => 0.0\n\"xndims\" => 2\n\njulia> parameters(mod)\nBetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)\n- representatives: [0.9 9.8; 11.0 0.9]\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#perceptron_module","page":"Perceptron","title":"The BetaML.Perceptron Module","text":"","category":"section"},{"location":"Perceptron.html","page":"Perceptron","title":"Perceptron","text":"Perceptron","category":"page"},{"location":"Perceptron.html#BetaML.Perceptron","page":"Perceptron","title":"BetaML.Perceptron","text":"Perceptron module\n\nProvide linear and kernel classifiers.\n\nProvide the following supervised models:\n\nPerceptronClassifier: Train data using the classical perceptron\nKernelPerceptronClassifier: Train data using the kernel perceptron\nPegasosClassifier: Train data using the pegasos algorithm\n\nAll algorithms are multiclass, with 
PerceptronClassifier and PegasosClassifier employing a one-vs-all strategy, while KernelPerceptronClassifier employs a one-vs-one approach, and return a \"probability\" for each class in terms of a dictionary for each record. Use mode(ŷ) to return a single class prediction per record.\n\nThese models are available in the MLJ framework as PerceptronClassifier, KernelPerceptronClassifier and PegasosClassifier respectively.\n\n\n\n\n\n","category":"module"},{"location":"Perceptron.html#Module-Index","page":"Perceptron","title":"Module Index","text":"","category":"section"},{"location":"Perceptron.html","page":"Perceptron","title":"Perceptron","text":"Modules = [Perceptron]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Perceptron.html#Detailed-API","page":"Perceptron","title":"Detailed API","text":"","category":"section"},{"location":"Perceptron.html","page":"Perceptron","title":"Perceptron","text":"Modules = [Perceptron]\nPrivate = false","category":"page"},{"location":"Perceptron.html#BetaML.Perceptron.KernelPerceptronC_hp","page":"Perceptron","title":"BetaML.Perceptron.KernelPerceptronC_hp","text":"mutable struct KernelPerceptronC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the KernelPerceptronClassifier model\n\nParameters:\n\nkernel: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel for details or check ?BetaML.Utils to verify if other kernels are defined (you can always define your own kernel) [def: radial_kernel]\ninitial_errors: Initial distribution of the number of errors [def: nothing, i.e. zeros]. If provided, this should be a nModels-length vector of vectors of nRecords integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2\nepochs: Maximum number of epochs, i.e. 
passes through the whole training sample [def: 100]\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.KernelPerceptronClassifier","page":"Perceptron","title":"BetaML.Perceptron.KernelPerceptronClassifier","text":"mutable struct KernelPerceptronClassifier <: BetaMLSupervisedModel\n\nA \"kernel\" version of the Perceptron model (supervised) with user configurable kernel function.\n\nFor the parameters see ?KernelPerceptronC_hp and ?BML_options\n\nLimitations:\n\ndata must be numerical\nonline training (retraining) is not supported\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> quadratic_kernel(x,y) = polynomial_kernel(x,y;degree=2)\nquadratic_kernel (generic function with 1 method)\n\njulia> mod = KernelPerceptronClassifier(epochs=100, kernel= quadratic_kernel)\nKernelPerceptronClassifier - A \"kernelised\" version of the perceptron classifier (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\nRunning function BetaML.Perceptron.#KernelPerceptronClassifierBinary#17 at /home/lobianco/.julia/dev/BetaML/src/Perceptron/Perceptron_kernel.jl:133\nType `]dev BetaML` to modify the source code (this would change its location on disk)\n***\n*** Training kernel perceptron for maximum 100 iterations. Random shuffle: true\nAvg. error after iteration 1 : 0.5\nAvg. error after iteration 10 : 0.16666666666666666\n*** Avg. 
error after epoch 13 : 0.0 (all elements of the set has been correctly classified)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PegasosC_hp","page":"Perceptron","title":"BetaML.Perceptron.PegasosC_hp","text":"mutable struct PegasosC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the PegasosClassifier model.\n\nParameters:\n\nlearning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]\nlearning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]\ninitial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]\nepochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PegasosClassifier","page":"Perceptron","title":"BetaML.Perceptron.PegasosClassifier","text":"mutable struct PegasosClassifier <: BetaMLSupervisedModel\n\nThe PegasosClassifier model, a linear, gradient-based classifier. 
Multiclass is supported using a one-vs-all approach.\n\nSee ?PegasosC_hp and ?BML_options for applicable hyperparameters and options. \n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = PegasosClassifier(epochs=100,learning_rate = (epoch -> 0.05) )\nPegasosClassifier - a loss-based linear classifier without regularisation term (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\n***\n*** Training pegasos for maximum 100 iterations. Random shuffle: true\nAvg. error after iteration 1 : 0.5\n*** Avg. error after epoch 3 : 0.0 (all elements of the set has been correctly classified)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PerceptronC_hp","page":"Perceptron","title":"BetaML.Perceptron.PerceptronC_hp","text":"mutable struct PerceptronC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the PerceptronClassifier model\n\nParameters:\n\ninitial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]\nepochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! 
call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PerceptronClassifier","page":"Perceptron","title":"BetaML.Perceptron.PerceptronClassifier","text":"mutable struct PerceptronClassifier <: BetaMLSupervisedModel\n\nThe classical \"perceptron\" linear classifier (supervised).\n\nFor the parameters see ?PerceptronC_hp and ?BML_options.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is not supported\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = PerceptronClassifier(epochs=100,return_mean_hyperplane=false)\nPerceptronClassifier - The classic linear perceptron classifier (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\nRunning function BetaML.Perceptron.#perceptronBinary#84 at /home/lobianco/.julia/dev/BetaML/src/Perceptron/Perceptron_classic.jl:150\nType `]dev BetaML` to modify the source code (this would change its location on disk)\n***\n*** Training perceptron for maximum 100 iterations. Random shuffle: true\nAvg. error after iteration 1 : 0.5\n*** Avg. 
error after epoch 5 : 0.0 (all elements of the set has been correctly classified)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\n\n\n\n\n","category":"type"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"EditURL = \"betaml_tutorial_classification_cars.jl\"","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#classification_tutorial","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"In this exercise we are provided with several technical characteristics (mpg, horsepower, weight, model year...) for several car models, together with the country of origin of such models, and we would like to create a machine learning model such that the country of origin can be accurately predicted given the technical characteristics. As the information to predict is a multi-class one, this is a [classification](https://en.wikipedia.org/wiki/Statistical_classification) task. 
It is a challenging exercise due to the simultaneous presence of three factors: (1) presence of missing data; (2) unbalanced data - 254 out of 406 cars are US made; (3) small dataset.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Data origin:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"dataset description: https://archive.ics.uci.edu/ml/datasets/auto+mpg\ndata source we use here: https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Field description:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"mpg: continuous\ncylinders: multi-valued discrete\ndisplacement: continuous\nhorsepower: continuous\nweight: 
continuous\nacceleration: continuous\nmodel year: multi-valued discrete\norigin: multi-valued discrete\ncar name: string (unique for each instance)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"The car name is not used in this tutorial, so that the country is inferred only from technical data. As this field also includes the car maker, and there are several car models from the same car maker, a more sophisticated machine learning model could exploit this information, e.g. using a bag-of-words encoding.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Library-loading-and-initialisation","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Library loading and initialisation","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Activating the local environment specific to BetaML documentation","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"using 
Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We load a bunch of packages that we'll use during this tutorial.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"using Random, HTTP, Plots, CSV, DataFrames, BenchmarkTools, StableRNGs, BetaML\nimport DecisionTree, Flux\nimport Pipe: @pipe","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Machine Learning workflows include stochastic components in several steps: in the data sampling, in the model initialisation and often in the models' own algorithms (and sometimes also in the prediction step). BetaML provides a random number generator (RNG) to simplify reproducibility (FIXEDRNG; this is nothing other than an instance of StableRNG(123) defined in the BetaML.Utils sub-module, but you can of course choose your own \"fixed\" RNG). 
See the Dealing with stochasticity section in the Getting started tutorial for details.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Here we are explicit and we use our own fixed RNG:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Data-loading-and-preparation","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Data loading and preparation","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"To load the data from the internet our workflow is (1) Retrieve the data –> (2) Clean it –> (3) Load it –> (4) Output it as a DataFrame.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - 
determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"For step (1) we use HTTP.get(), for step (2) we use replace!, for steps (3) and (4) we use the CSV package, and we use the \"pipe\" |> operator to chain these operations, so that no file is ever saved on disk:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"urlDataOriginal = \"https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original\"\ndata = @pipe HTTP.get(urlDataOriginal).body |>\n replace!(_, UInt8('\\t') => UInt8(' ')) |> # the original dataset has mixed field delimiters !\n CSV.File(_, delim=' ', missingstring=\"NA\", ignorerepeated=true, header=false) |>\n DataFrame;\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"This results in a table where the rows are the observations (the various car models) and the columns are the fields. 
All BetaML models expect this layout.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"As the dataset is ordered, we randomly shuffle the data.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"idx = randperm(copy(AFIXEDRNG),size(data,1))\ndata = data[idx, :] # assign the result, otherwise the shuffle has no effect\ndescribe(data)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Columns 1 to 7 contain characteristics of the car, while column 8 encodes the country of origin (\"1\" -> US, \"2\" -> EU, \"3\" -> Japan). That's the variable we want to be able to predict.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Column 9 contains the car name, but we are not going to use this information in this tutorial. 
Note also that some fields have missing data.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Our first step is hence to divide the dataset into the features (the x) and the labels (the y) we want to predict. The x is then a Julia standard Matrix of 406 rows by 7 columns and the y is a vector of the 406 observations:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"x = Matrix{Union{Missing,Float64}}(data[:,1:7]);\ny = Vector{Int64}(data[:,8]);\nx = fit!(Scaler(),x)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Some algorithms that we will use today don't accept missing data, so we need to impute them. BetaML provides several imputation models in the Imputation module. Note that many of these imputation models can be used for Collaborative Filtering / Recommendation Systems. 
Models such as GaussianMixtureImputer have the advantage over traditional algorithms such as k-nearest neighbors (KNN) that GMM can \"detect\" the hidden structure of the observed data, where some observations can be similar to a certain pool of other observations for a certain characteristic, but similar to another pool of observations for other characteristics. Here we use RandomForestImputer. While the model allows for reproducible multiple imputations (with the parameter multiple_imputation=an_integer) and multiple passages through the various columns (fields) containing missing data (with the option recursive_passages=an_integer), we use here just a single imputation and a single passage. Like all BetaML models, RandomForestImputer follows the pattern m=ModelConstruction(pars); fit!(m,x,[y]); est = predict(m,x) where est can be an estimation of some labels or be some characteristics of x itself (the imputed version, as in this case, or a reprojected version as in PCAEncoder), depending on whether the model is supervised or not. See the API user documentation for more details. 
For imputers, the output of predict is the matrix with the imputed values replacing the missing ones, and we write here the model in a single line using a convenience feature: when the default cache parameter is used in the model constructor, the fit! function itself returns the prediction over the training data:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"x = fit!(RandomForestImputer(rng=copy(AFIXEDRNG)),x) # Same as `m = RandomForestImputer(rng=copy(AFIXEDRNG)); fit!(m,x); x= predict(m,x)`","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Further, some models don't work with categorical data either, so we need to represent our y as a matrix with a separate column for each possible categorical value (the so-called \"one-hot\" representation). For example, within a three-class field, the individual value 2 (or \"Europe\" for that matter) would be represented as the vector [0 1 0], while 3 (or \"Japan\") would become the vector [0 0 1]. 
To encode as one-hot we use the OneHotEncoder in BetaML.Utils, using the same shortcut as for the imputer we used earlier:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"y_oh = fit!(OneHotEncoder(),y)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"In supervised machine learning it is good practice to partition the available data in a training, validation, and test subsets, where the first one is used to train the ML algorithm, the second one to train any eventual \"hyper-parameters\" of the algorithm and the test subset is finally used to evaluate the quality of the algorithm. Here, for brevity, we use only the train and the test subsets, implicitly assuming we already know the best hyper-parameters. 
Please refer to the regression tutorial for examples of the auto-tune feature of BetaML models to \"automatically\" train the hyper-parameters (hint: in most cases just add the parameter autotune=true in the model constructor), or the clustering tutorial for an example of using the cross_validation function to do it manually.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We use then the partition function in BetaML.Utils, where we can specify the different data to partition (each matrix or vector to partition must have the same number of observations) and the shares of observation that we want in each subset. Here we keep 80% of observations for training (xtrain, and ytrain) and we use 20% of them for testing (xtest, and ytest):","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,1-0.8],rng=copy(AFIXEDRNG));\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We finally set up a dataframe to store the accuracies of the 
various models we'll use.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"results = DataFrame(model=String[],train_acc=Float64[],test_acc=Float64[])","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Random-Forests","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Random Forests","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We are now ready to use our first model, the RandomForestEstimator. 
Random Forests build a \"forest\" of decision tree models and then average their predictions in order to make an overall prediction, whether a regression or a classification.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While here the missing data has been imputed and the dataset is comprised of only numerical values, one attractive feature of BetaML RandomForestEstimator is that it can work directly with missing and categorical data without any prior processing required.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"However, as the labels are encoded using integers, we also need to specify the parameter force_classification=true, otherwise the model would perform a regression task instead.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"rfm = RandomForestEstimator(force_classification=true, rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of 
cars given the cars characteristics","text":"Unlike the RandomForestImputer and OneHotEncoder models used earlier, to train a RandomForestEstimator model we need to provide it with both the training feature matrix and the associated \"true\" training labels. We use the same shortcut to get the training predictions directly from the fit! function. In this case the predictions correspond to the labels:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain = fit!(rfm,xtrain,ytrain)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"You can notice that for each record the result is reported in terms of a dictionary with the possible categories and their associated probabilities.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"warning: Warning\nOnly categories with non-zero probabilities are reported for each record, and being a dictionary, the order of the categories is not 
defined","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"For example ŷtrain[1] is a Dict(2 => 0.0333333, 3 => 0.933333, 1 => 0.0333333), indicating an overwhelming probability that this car model originates from Japan. To retrieve the predictions with the highest probabilities use mode(ŷ):","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain_top = mode(ŷtrain,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Why does mode (optionally) take an RNG? 
I leave the answer to you :-)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"To obtain the predicted labels for the test set we simply run the predict function over the features of the test set:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtest = predict(rfm,xtest)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Finally we can measure the accuracy of our predictions with the accuracy function. 
We don't need to explicitly use mode, as accuracy does it itself when it is passed predictions expressed as a dictionary:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"trainAccuracy,testAccuracy = accuracy.([ytrain,ytest],[ŷtrain,ŷtest],rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We are now ready to store our first model accuracies in the results dataframe:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"push!(results,[\"RF\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"The predictions are quite good: for the training set the algorithm predicted almost all cars' origins correctly, while for the testing set (i.e. 
those records that have not been used to train the algorithm), the correct prediction level is still quite high, at around 80% (the exact figure depends on the random seed).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While accuracy can sometimes suffice, we may often want to better understand which categories our model has trouble predicting correctly. We can investigate the output of a multi-class classifier in more depth with a ConfusionMatrix where the true values (y) are given in rows and the predicted ones (ŷ) in columns, together with some per-class metrics like the precision (records correctly predicted as class i over all records predicted as class i), the recall (records correctly predicted as class i over all records truly in class i) and others.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We first build the ConfusionMatrix model, we train it with ŷ and y and then we print it (we do it here for the test subset):","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"cfm = ConfusionMatrix(categories_names=Dict(1=>\"US\",2=>\"EU\",3=>\"Japan\"),rng=copy(AFIXEDRNG))\nfit!(cfm,ytest,ŷtest) # the 
output is by default the confusion matrix in relative terms\nprint(cfm)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"From the report we can see that Japanese cars have more trouble in being correctly classified, and in particular many Japanese cars are classified as US ones. This is likely a result of the class imbalance of the data set, and could be addressed by balancing the dataset with various sampling techniques before training the model.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"If you prefer a more graphical approach, we can also plot the confusion matrix. In order to do so, we pick up information from the info(cfm) function. Indeed most BetaML models can be queried with info(model) to retrieve additional information, in terms of a dictionary, that is not necessary for the prediction, but could still be relevant. 
Other functions that you can use with BetaML models are parameters(m) and hyperparameters(m).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"res = info(cfm)\nheatmap(string.(res[\"categories\"]),string.(res[\"categories\"]),res[\"normalised_scores\"],seriescolor=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix (normalised scores)\")","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Comparision-with-DecisionTree.jl","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Comparision with DecisionTree.jl","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We now compare BetaML RandomForestEstimator with the random forest estimator of the package DecisionTree.jl. The two random forests are similar in usage: we first \"build\" (train) the forest and we then make predictions out of the trained model.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars 
characteristics","text":"# We train the model...\nmodel = DecisionTree.build_forest(ytrain, xtrain,rng=seed)\n# ..and we generate predictions and measure their error\n(ŷtrain,ŷtest) = DecisionTree.apply_forest.([model],[xtrain,xtest]);\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"RF (DecisionTrees.jl)\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While the accuracy on the training set is exactly the same as for BetaML random forests, DecisionTree.jl random forests are slightly less accurate in the testing sample. Where DecisionTree.jl excels, however, is in efficiency: its random forests are extremely fast and memory thrifty, even if we should also consider the resources needed to impute the missing values, as they don't work with missing data.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Also, one of the reasons DecisionTree.jl is so efficient is that internally the data is sorted to avoid repeated comparisons, but in this way it works only with features that are sortable, while BetaML random forests accept virtually any kind of input without the need to process it.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Neural-network","page":"A classification task when labels are known - 
determining the country of origin of cars given the cars characteristics","title":"Neural network","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Neural networks (NN) can be very powerful, but have two \"inconveniences\" compared with random forests: first, they are a bit \"picky\". We need to do a bit of work to provide data in a specific format. Note that this is not feature engineering. One of the advantages of neural networks is that, for the most part, feature engineering is not needed. However we still need to \"clean\" the data. One issue is that NN don't like missing data. So we need to provide them with the feature matrix \"clean\" of missing data. Secondly, they work only with numerical data. So we need to use the one-hot encoding we saw earlier. Further, they work best if the features are scaled such that each feature has mean zero and standard deviation 1. This is why we scaled the data back at the beginning of this tutorial.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We first measure the dimensions of our data in input (i.e. the columns of the feature matrix) and the dimensions of our output, i.e. 
the number of categories or columns in our one-hot encoded y.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"D = size(xtrain,2)\nclasses = unique(y)\nnCl = length(classes)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"The second \"inconvenience\" of NN is that, while not requiring feature engineering, they still need a bit of practice on the way the structure of the network is built. It's not as simple as fit!(Model(),x,y) (although BetaML provides a \"default\" neural network structure that can be used, it is often not adapted to the specific task). We need instead to specify how we want our layers, chain the layers together and then decide on an overall loss function. Only once we have done these steps is the model ready for training. 
Here we define 2 DenseLayer where, for each of them, we specify the number of neurons in input (for the first layer, equal to the dimensions of the data), the number of neurons in output (for a classification task, the output size of the last layer being equal to the number of classes) and an activation function for each layer (default the identity function).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ls = 50 # number of neurons in the inner layer\nl1 = DenseLayer(D,ls,f=relu,rng=copy(AFIXEDRNG))\nl2 = DenseLayer(ls,nCl,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"For a classification task, the last layer is a VectorFunctionLayer that has no learnable parameters but whose activation function is applied to the ensemble of the neurons, rather than individually on each neuron. In particular, for classification we pass the softmax function whose output has the same size as the input (i.e. 
the number of classes to predict), but we can use the VectorFunctionLayer with any function, including the pool1d function to create a \"pooling\" layer (using maximum, mean or whatever other sub-function we pass to pool1d)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"l3 = VectorFunctionLayer(nCl,f=softmax) ## Add a (parameterless) layer whose activation function (softmax in this case) is defined to all its nodes at once","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Finally we chain the layers and assign a loss function and the number of epochs we want to train the model to the constructor of NeuralNetworkEstimator:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"nn = NeuralNetworkEstimator(layers=[l1,l2,l3],loss=crossentropy,rng=copy(AFIXEDRNG),epochs=500)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A 
classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Aside from the layer structure and size and the number of epochs, other hyper-parameters you may want to try are the batch_size and the optimisation algorithm to employ (opt_alg).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Now we can train our network:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain = fit!(nn, xtrain, ytrain_oh)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Predictions are in the form of an n_records by n_classes matrix of the probabilities of each record being in that class. 
To retrieve the classes with the highest probabilities we can again use the mode function:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain_top = mode(ŷtrain)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Once trained, we can predict the test labels. As the training was based on the scaled feature matrix, the predictions must use the scaled features as well","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtest = predict(nn,xtest)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"And finally we can measure the accuracies and store them in the results dataframe:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when 
labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"trainAccuracy, testAccuracy = accuracy.([ytrain,ytest],[ŷtrain,ŷtest],rng=copy(AFIXEDRNG))\npush!(results,[\"NN\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"cfm = ConfusionMatrix(categories_names=Dict(1=>\"US\",2=>\"EU\",3=>\"Japan\"),rng=copy(AFIXEDRNG))\nfit!(cfm,ytest,ŷtest)\nprint(cfm)\nres = info(cfm)\nheatmap(string.(res[\"categories\"]),string.(res[\"categories\"]),res[\"normalised_scores\"],seriescolor=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix (normalised scores)\")","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While accuracies are a bit lower, the distribution of misclassification is similar, with many Japanese cars misclassified as US ones (here we also have some EU cars misclassified as Japanese ones).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Comparisons-with-Flux","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Comparisons with 
Flux","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"As we did for Random Forests, we compare BetaML neural networks with the leading package for deep learning in Julia, Flux.jl.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"In Flux the input must be in the form (fields, observations), so we transpose our original matrices","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"xtrainT, ytrain_ohT = transpose.([xtrain, ytrain_oh])\nxtestT, ytest_ohT = transpose.([xtest, ytest_oh])","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We define the Flux neural network model in a similar way than BetaML and load it with data, we train it, predict and measure the accuracies on the training and 
the test sets:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We fix the random seed for Flux, although you may still get different results depending on the number of threads used. This is a problem we solve in BetaML with generate_parallel_rngs.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Random.seed!(seed)\n\nl1 = Flux.Dense(D,ls,Flux.relu)\nl2 = Flux.Dense(ls,nCl,Flux.relu)\nFlux_nn = Flux.Chain(l1,l2)\nfluxloss(x, y) = Flux.logitcrossentropy(Flux_nn(x), y)\nps = Flux.params(Flux_nn)\nnndata = Flux.Data.DataLoader((xtrainT, ytrain_ohT),shuffle=true)\nbegin for i in 1:500 Flux.train!(fluxloss, ps, nndata, Flux.ADAM()) end end\nŷtrain = Flux.onecold(Flux_nn(xtrainT),1:3)\nŷtest = Flux.onecold(Flux_nn(xtestT),1:3)\ntrainAccuracy, testAccuracy = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"push!(results,[\"NN (Flux.jl)\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - 
cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While the train accuracy is a little bit higher than BetaML, the test accuracy remains comparable","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Perceptron-like-classifiers.","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Perceptron-like classifiers.","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We finally test 3 \"perceptron-like\" classifiers, the \"classical\" Perceptron (PerceptronClassifier), one of the first ML algorithms (a linear classifier), a \"kernellised\" version of it (KernelPerceptronClassifier, which defaults to the radial kernel) and \"PegasosClassifier\" (PegasosClassifier) another linear algorithm that starts considering a gradient-based optimisation, although without the regularisation term as in the Support Vector Machines (SVM).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"As for the previous classifiers we construct the model object, we 
train and predict and we compute the train and test accuracies:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"pm = PerceptronClassifier(rng=copy(AFIXEDRNG))\nŷtrain = fit!(pm, xtrain, ytrain)\nŷtest = predict(pm, xtest)\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"Perceptron\",trainAccuracy,testAccuracy]);\n\nkpm = KernelPerceptronClassifier(rng=copy(AFIXEDRNG))\nŷtrain = fit!(kpm, xtrain, ytrain)\nŷtest = predict(kpm, xtest)\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"KernelPerceptronClassifier\",trainAccuracy,testAccuracy]);\n\n\npegm = PegasosClassifier(rng=copy(AFIXEDRNG))\nŷtrain = fit!(pegm, xtrain, ytrain)\nŷtest = predict(pegm, xtest)\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"Pegasos\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Summary","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Summary","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"This is the summary of the results we obtained trying to predict the country of origin of the cars, based on their technical 
characteristics:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"println(results)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"If you clone BetaML repository","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Model accuracies on my machine with seeds 123, 1000 and 10000 respectively","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"model train 1 test 1 train 2 test 2 train 3 test 3\nRF 0.996923 0.765432 1.000000 0.802469 1.000000 0.888889\nRF (DecisionTrees.jl) 0.975385 0.765432 0.984615 0.777778 0.975385 0.864198\nNN 0.886154 0.728395 0.916923 0.827160 0.895385 0.876543\nNN (Flux.jl) 0.793846 0.654321 0.938462 0.790123 0.935385 0.851852\nPerceptron 0.778462 0.703704 0.720000 0.753086 
0.670769 0.654321\nKernelPerceptronClassifier 0.987692 0.703704 0.978462 0.777778 0.944615 0.827160\nPegasos 0.732308 0.703704 0.633846 0.753086 0.575385 0.654321","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We warn that this table just provides a rough idea of the various algorithms' performances. Indeed there is a large amount of stochasticity both in the sampling of the data used for training/testing and in the initial settings of the parameters of the algorithm. For a statistically significant comparison we would have to repeat the analysis with multiple samplings (e.g. by cross-validation, see the clustering tutorial for an example) and initial random parameters.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Nevertheless the table above shows that, when we compare BetaML with the algorithm-specific leading packages, we find similar results in terms of accuracy, but often the leading packages are better optimised and run more efficiently (but sometimes at the cost of being less versatile). 
Also, for this dataset, Random Forests seem to remain marginally more accurate than Neural Networks, although of course this depends on the hyper-parameters and, with a single run of the models, we don't know if this difference is significant.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"View this file on GitHub.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"This page was generated using Literate.jl.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#getting_started","page":"Getting started","title":"Getting started","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Introduction","page":"Getting started","title":"Introduction","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"This \"tutorial\" part of the documentation presents a step-by-step guide to the main algorithms and utility 
functions provided by BetaML and comparisons with the leading packages in each field. Aside from this page, the tutorial is divided into the following sections:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Classification tutorial - Topics: Decision trees and random forests, neural networks (softmax), dealing with stochasticity, loading data from internet\nRegression tutorial - Topics: Decision trees, Random forests, neural networks, hyper-parameters autotuning, one-hot encoding, continuous error measures\nClustering tutorial - Topics: k-means, kmedoids, generative (gaussian) mixture models (gmm), cross-validation, ordinal encoding","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Detailed usage instructions on each algorithm can be found on each model struct (listed here), while theoretical notes describing most of them can be found at the companion repository https://github.com/sylvaticus/MITx_6.86x.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"The overall \"philosophy\" of BetaML is to support simple machine learning tasks easily and make complex tasks possible. At the most basic level, the majority of algorithms have default parameters suitable for a basic analysis. A great level of flexibility can already be achieved by just employing the full set of model parameters, for example changing the distance function in KMedoidsClusterer to l1_distance (aka \"Manhattan distance\"). 
Finally, the greatest flexibility can be obtained by customising BetaML and writing, for example, your own neural network layer type (by subclassing AbstractLayer), your own sampler (by subclassing AbstractDataSampler) or your own mixture component (by subclassing AbstractMixture). In such cases, while not required by any means, please consider giving it back to the community by opening a pull request to integrate your work in BetaML.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If you are looking for an introductory book on Julia, you could consider \"Julia Quick Syntax Reference\" (Apress, 2019) or the online course \"Introduction to Scientific Programming and Machine Learning with Julia\".","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"A few conventions applied across the library:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Type names use the so-called \"CamelCase\" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words optionally separated (but only when really needed for readability) by an _;\nWhile some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns. 
Almost everywhere in the code and documentation we denote by N the number of observations/records, D the number of dimensions and K the number of classes/categories;\nWhile some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a dataframe, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies)\nThe accuracy/error/loss measures expect the ground truth y and then the estimated ŷ (in this order)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#using_betaml_from_other_languages","page":"Getting started","title":"Using BetaML from other programming languages","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"In this section we provide two examples of using BetaML directly in Python or R (with automatic object conversion). Click Details for a more extended explanation of these examples. While I have no experience with them, the same approach can be used to access BetaML from any language with a binding to Julia, like Matlab or Javascript. 
","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Use-BetaML-in-Python","page":"Getting started","title":"Use BetaML in Python","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user juliacall","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> from juliacall import Main as jl\n>>> import numpy as np\n>>> from sklearn import datasets\n>>> jl.seval('using Pkg; Pkg.add(\"BetaML\")') # Only once \n>>> jl.seval(\"using BetaML\")\n>>> bml = jl.BetaML\n>>> iris = datasets.load_iris()\n>>> X = iris.data[:, :4]\n>>> y = iris.target + 1 # Julia arrays start from 1 not 0\n>>> (Xs,ys) = bml.consistent_shuffle([X,y])\n>>> m = bml.KMeansClusterer(n_classes=3)\n>>> yhat = bml.fit_ex(m,Xs) # Python doesn't allow exclamation marks in function names, so we use `fit_ex(⋅)` instead of `fit!(⋅)` (the original function name)\n>>> m._jl_display() # force a \"Julian\" way of displaying of Julia objects\n>>> acc = bml.accuracy(ys,yhat,ignorelabels=True)\n>>> acc\n 0.8933333333333333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
Details","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We show for Python two separate \"Julia from Python\" interfaces, PyJulia and JuliaCall, with the second one being the more recent one.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#With-the-classical-pyjulia-package","page":"Getting started","title":"With the classical pyjulia package","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"PyJulia is a relatively old method to use Julia code and libraries in Python. It works great but it requires that you already have a working Julia installation on your PC, so we first need to download and install the Julia binaries for our operating system from JuliaLang.org. Be sure that Julia is working by opening the Julia terminal and e.g. typing println(\"hello world\")","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Install PyJulia with: ","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user julia # the name of the package in `pip` is `julia`, not `PyJulia`","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For the sake of this tutorial, let's also install in Python a package that contains the dataset that we will use:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user scikit-learn # only for retrieving the dataset in the python way (the package is imported as `sklearn`)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting 
started","title":"Getting started","text":"We can now open a Python terminal and, to obtain an interface to Julia, just run:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> import julia\n>>> julia.install() # Only once to set-up in julia the julia packages required by PyJulia\n>>> jl = julia.Julia(compiled_modules=False)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If we have multiple Julia versions, we can specify the one to use in Python passing julia=\"/path/to/julia/binary/executable\" (e.g. julia = \"/home/myUser/lib/julia-1.8.0/bin/julia\") to the install() function.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"The compiled_modules=False in the Julia constructor is a workaround to the common situation when the Python interpreter is statically linked to libpython, but it will slow down the interactive experience, as it will disable Julia packages pre-compilation, and every time we use a module for the first time, it will need to be compiled first. Other, more efficient but also more complicated, workarounds are given in the package documentation, under the Troubleshooting section (https://pyjulia.readthedocs.io/en/stable/troubleshooting.html).","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Let's now add to Julia the BetaML package. 
We can surely do it from within Julia, but we can also do it while remaining in Python:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.eval('using Pkg; Pkg.add(\"BetaML\")') # Only once to install BetaML","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"While jl.eval('some Julia code') evaluates any arbitrary Julia code (see below), most of the time we can use Julia in a more direct way. Let's start by importing the BetaML Julia package as a submodule of the Python Julia module:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> from julia import BetaML\n>>> jl.eval('using BetaML')","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As you can see, it is no different than importing any other Python module.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For the data, let's load it \"Python side\":","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> from sklearn import datasets\n>>> iris = datasets.load_iris()\n>>> X = iris.data[:, :4]\n>>> y = iris.target + 1 # Julia arrays start from 1 not 0","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note that X and y are Numpy arrays.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now call BetaML functions as we would do for any other Python library functions. 
In particular, we can pass to the functions (and retrieve) complex data types without worrying too much about the conversion between Python and Julia types, as these are converted automatically:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> (Xs,ys) = BetaML.consistent_shuffle([X,y]) # X and y are first converted to julia arrays and then the returned julia arrays are converted back to python Numpy arrays\n>>> m = BetaML.KMeansClusterer(n_classes=3)\n>>> yhat = BetaML.fit_ex(m,Xs) # Python doesn't allow exclamation marks in function names, so we use `fit_ex(⋅)` instead of `fit!(⋅)`\n>>> acc = BetaML.accuracy(ys,yhat,ignorelabels=True)\n>>> acc\n 0.8933333333333333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note: If we are using the jl.eval() interface, the objects we use must be already known to julia. 
To pass objects from Python to Julia, import the julia Main module (the root module in julia) and assign the needed variables, e.g.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> X_python = [1,2,3,2,4]\n>>> from julia import Main\n>>> Main.X_julia = X_python\n>>> jl.eval('BetaML.gini(X_julia)')\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Another alternative is to \"eval\" only the function name and pass the (python) objects in the function call:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.eval('BetaML.gini')(X_python)\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#With-the-newer-JuliaCall-python-package","page":"Getting started","title":"With the newer JuliaCall python package","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"JuliaCall is a newer way to use Julia in Python that doesn't require a separate installation of Julia.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Install it in Python using pip as well:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user juliacall","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now open a Python terminal and, to obtain an interface to Julia, just run:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting 
started","title":"Getting started","text":">>> from juliacall import Main as jl","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If you have julia on PATH, it will use that version, otherwise it will automatically download and install a private version for JuliaCall","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If we have multiple Julia versions, we can specify the one to use in Python passing julia=\"/path/to/julia/binary/executable\" (e.g. julia = \"/home/myUser/lib/julia-1.8.0/bin/julia\") to the install() function.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"To add BetaML to the JuliaCall private version we evaluate the julia package manager add function:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.seval('using Pkg; Pkg.add(\"BetaML\")')# Only once to install BetaML","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As with PyJulia we can evaluate arbitrary Julia code either using jl.seval('some Julia code') and by direct call, but let's first import BetaML:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.seval(\"using BetaML\")\n>>> bml = jl.BetaML","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For the data, we reuse the X and y Numpy arrays we loaded earlier.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now 
call BetaML functions as we would do for any other Python library functions. In particular, we can pass to the functions (and retrieve) complex data types without worrying too much about the conversion between Python and Julia types, as these are converted automatically:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> (Xs,ys) = bml.consistent_shuffle([X,y])\n>>> m = bml.KMeansClusterer(n_classes=3)\n>>> yhat = bml.fit_ex(m,Xs)\n>>> m._jl_display() # force a \"Julian\" way of displaying of Julia objects\n>>> acc = bml.accuracy(ys,yhat,ignorelabels=True)\n>>> acc\n 0.8933333333333333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note: If we are using the jl.eval() interface, the objects we use must be already known to julia. To pass objects from Python to Julia, we can write a small Julia macro:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> X_python = [1,2,3,2,4]\n>>> jlstore = jl.seval(\"(k, v) -> (@eval $(Symbol(k)) = $v; return)\")\n>>> jlstore(\"X_julia\",X_python)\n>>> jl.seval(\"BetaML.gini(X_julia)\")\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Another alternative is to \"eval\" only the function name and pass the (python) objects in the function call:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> X_python = [1,2,3,2,4]\n>>> jl.seval('BetaML.gini')(X_python)\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Conclusions-about-using-BetaML-in-Python","page":"Getting started","title":"Conclusions about using BetaML in 
Python","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Using either the direct call or the eval function, whether in PyJulia or JuliaCall, we should be able to use all the BetaML functionalities directly from Python. If you run into problems using BetaML from Python, open an issue specifying your set-up.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Use-BetaML-in-R","page":"Getting started","title":"Use BetaML in R","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> install.packages(\"JuliaCall\") # only once\n> library(JuliaCall)\n> library(datasets)\n> julia_setup(installJulia = TRUE) # use installJulia = TRUE to let R download and install a private copy of julia, FALSE to use an existing Julia local installation\n> julia_eval('using Pkg; Pkg.add(\"BetaML\")') # only once\n> julia_eval(\"using BetaML\")\n> X <- as.matrix(sapply(iris[,1:4], as.numeric))\n> y <- sapply(iris[,5], as.integer)\n> xsize <- dim(X)\n> shuffled <- julia_call(\"consistent_shuffle\",list(X,y))\n> Xs <- matrix(sapply(shuffled[1],as.numeric), nrow=xsize[1])\n> ys <- as.vector(sapply(shuffled[2], as.integer))\n> m <- julia_eval('KMeansClusterer(n_classes=3)')\n> yhat <- julia_call(\"fit_ex\",m,Xs)\n> acc <- julia_call(\"accuracy\",yhat,ys,ignorelabels=TRUE)\n> acc\n[1] 0.8933333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
Details","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For R, we show how to access BetaML functionalities using the JuliaCall R package (no relation to the Python package of the same name).","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Let's start by installing JuliaCall in R:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> install.packages(\"JuliaCall\")\n> library(JuliaCall)\n> julia_setup(installJulia = TRUE) # use installJulia = TRUE to let R download and install a private copy of julia, FALSE to use an existing Julia local installation","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note that, unlike with PyJulia, the \"setup\" function needs to be called every time we start a new R session, not just when we install the JuliaCall package. If we don't have julia in the path of our system, or if we have multiple versions and we want to specify the one to work with, we can pass the JULIA_HOME = \"/path/to/julia/binary/executable/directory\" (e.g. JULIA_HOME = \"/home/myUser/lib/julia-1.1.0/bin\") parameter to the julia_setup call. Or just let JuliaCall automatically download and install a private copy of julia.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"JuliaCall relies on the Julia RCall package for some things (like object conversion between Julia and R). 
If we don't already have it installed in Julia, it will try to install it automatically.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As in Python, let's start from the data loaded from R and do some work with them in Julia:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> library(datasets)\n> X <- as.matrix(sapply(iris[,1:4], as.numeric))\n> y <- sapply(iris[,5], as.integer)\n> xsize <- dim(X)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Let's install BetaML. As we did in Python, we can install a Julia package from Julia itself or from within R:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> julia_eval('using Pkg; Pkg.add(\"BetaML\")')","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now \"import\" the BetaML julia package (in julia a \"Package\" is basically a module plus some metadata that facilitates its discovery and integration with other packages, like the required set) and call its functions with the julia_call(\"juliaFunction\",args) R function:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> julia_eval(\"using BetaML\")\n> shuffled <- julia_call(\"consistent_shuffle\",list(X,y))\n> Xs <- matrix(sapply(shuffled[1],as.numeric), nrow=xsize[1])\n> ys <- as.vector(sapply(shuffled[2], as.integer))\n> m <- julia_eval('KMeansClusterer(n_classes=3)')\n> yhat <- julia_call(\"fit_ex\",m,Xs)\n> acc <- julia_call(\"accuracy\",yhat,ys,ignorelabels=TRUE)\n> acc\n[1] 
0.8933333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As an alternative, we can embed Julia code directly in R using the julia_eval() function:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"kMeansR <- julia_eval('\n function accFromKmeans(x,k,y)\n m = KMeansClusterer(n_classes=Int(k))\n yhat = fit!(m,x)\n acc = accuracy(yhat,y,ignorelabels=true)\n return acc\n end\n')","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can then call the above function in R in one of the following three ways:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"kMeansR(Xs,3,ys)\njulia_assign(\"Xs_julia\", Xs); julia_assign(\"ys_julia\", ys); julia_eval(\"accFromKmeans(Xs_julia,3,ys_julia)\")\njulia_call(\"accFromKmeans\",Xs,3,ys)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"While other \"convenience\" functions are provided by the package, using julia_call, or julia_assign followed by julia_eval, should suffice to use BetaML from R. If you run into problems using BetaML from R, open an issue specifying your set-up.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#stochasticity_reproducibility","page":"Getting started","title":"Dealing with stochasticity and reproducibility","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Machine Learning workflows include stochastic components in several steps: in the data sampling, in the model initialisation and often in the model's own algorithms (and sometimes also in the prediction step). All BetaML models with a stochastic component support an rng parameter, standing for Random Number Generator. An RNG is a \"machine\" that streams a flow of random numbers. The flow itself, however, is fully determined by the \"seed\" (an integer number) that the RNG has been told to use. Normally this seed changes at each run of the script/model, so that stochastic models are indeed stochastic and their output differs at each run.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If we want to obtain reproducible results we can fix the seed at the very beginning of our model with Random.seed!([AnInteger]). Now our model or script will pick up a specific flow of random numbers, but this flow will always be the same, so that its results will always be the same.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"However, the default Julia RNG guarantees to provide the same flow of random numbers, conditional on the seed, only within minor versions of Julia. If we want to \"guarantee\" reproducibility of the results with different versions of Julia, or \"fix\" only some parts of our script, we can call the individual functions passing FIXEDRNG, an instance of StableRNG(FIXEDSEED) provided by BetaML, to the rng parameter. 
Use it with:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"MyModel(;rng=FIXEDRNG) : always produce the same sequence of results on each run of the script (\"pulling\" from the same rng object on different calls)\nMyModel(;rng=StableRNG(SOMEINTEGER)) : always produce the same result (new identical rng object on each call)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"This is very convenient, especially during model development, as a model that uses (...,rng=StableRNG(an_integer)) will provide stochastic results that are isolated (i.e. they don't depend on the consumption of the random stream by other parts of the model).","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"In particular, use rng=StableRNG(FIXEDSEED) or rng=copy(FIXEDRNG) with FIXEDSEED to retrieve the exact output as in the documentation or in the unit tests.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Most of the stochasticity appears in training a model. However, in a few cases (e.g. decision trees with missing values) some stochasticity also appears in predicting new data using a trained model. 
In such cases the model doesn't restrict the random seed, so that you can choose at predict time to use a fixed or a variable random seed.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Finally, if you plan to use multiple threads and want to provide the same stochastic output independent of the number of threads used, have a look at generate_parallel_rngs.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"\"Reproducible stochasticity\" is only one of the elements needed for a reproducible output. The other two are (a) the inputs the workflow uses and (b) the code that is evaluated. Concerning the second point, Julia has a very modern package system that guarantees reproducible code evaluation (with a few exceptions linked to using external libraries, but BetaML models are all implemented in Julia itself). Without going into detail, you can use a pattern like this at the beginning of your machine learning workflows:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"using Pkg \ncd(@__DIR__) \nPkg.activate(\".\") # Activate a \"local\" environment, specific to this folder\nPkg.instantiate() # Download and install the required packages if not already available ","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"This will tell Julia to load the exact version of dependent packages, and recursively of their dependencies, from a Manifest.toml file that is automatically created in the script's folder, and automatically updated, when you add or update a package in your workflow. 
Note that these local \"environments\" are very \"cheap\" (packages are not actually copied to each environment on your system, only referenced) and the environment doesn't need to be in the same script folder as in this example: it can be any folder you want to \"activate\".","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Saving-and-loading-trained-models","page":"Getting started","title":"Saving and loading trained models","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Trained models can be saved on disk using the model_save function, and retrieved with model_load. The advantage over the serialization functionality in Julia core is that the two functions are actually wrappers around equivalent JLD2 package functions, and should maintain compatibility across different Julia versions. ","category":"page"},{"location":"GMM.html#gmm_module","page":"GMM","title":"The BetaML.GMM Module","text":"","category":"section"},{"location":"GMM.html","page":"GMM","title":"GMM","text":"GMM","category":"page"},{"location":"GMM.html#BetaML.GMM","page":"GMM","title":"BetaML.GMM","text":"GMM module\n\nGenerative (Gaussian) Mixture Model learners (supervised/unsupervised)\n\nProvides clustering and regressors using (Generative) Gaussian Mixture Model (probabilistic).\n\nCollaborative filtering / missing values imputation / recommendation systems based on GMM are available in the Imputation module.\n\nThe module provides the following models. Use ?[model] to access their documentation:\n\nGaussianMixtureClusterer: soft-clustering using GMM\nGaussianMixtureRegressor2: regressor using GMM as back-end (first algorithm)\nGaussianMixtureRegressor: regressor using GMM as back-end (second algorithm)\n\nAll the algorithms work with arbitrary mixture distributions, although only {Spherical|Diagonal|Full} Gaussian mixtures have been implemented. 
User-defined mixtures can be used by defining a struct as a subtype of AbstractMixture and implementing for that mixture the following functions:\n\ninit_mixtures!(mixtures, X; minimum_variance, minimum_covariance, initialisation_strategy)\nlpdf(m,x,mask) (for the e-step)\nupdate_parameters!(mixtures, X, pₙₖ; minimum_variance, minimum_covariance) (the m-step)\nnpar(mixtures::Array{T,1}) (for the BIC/AIC computation)\n\nAll the GMM-based algorithms work only with numerical data, but also accept Missing values.\n\nThe GaussianMixtureClusterer algorithm reports the BIC and the AIC in its info(model), but some metrics of the clustered output are also available, for example the silhouette score.\n\n\n\n\n\n","category":"module"},{"location":"GMM.html#Module-Index","page":"GMM","title":"Module Index","text":"","category":"section"},{"location":"GMM.html","page":"GMM","title":"GMM","text":"Modules = [GMM]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"GMM.html#Detailed-API","page":"GMM","title":"Detailed API","text":"","category":"section"},{"location":"GMM.html","page":"GMM","title":"GMM","text":"Modules = [GMM]\nPrivate = false","category":"page"},{"location":"GMM.html#BetaML.GMM.DiagonalGaussian-Union{Tuple{Union{Nothing, Vector{T}}}, Tuple{T}, Tuple{Union{Nothing, Vector{T}}, Union{Nothing, Vector{T}}}} where T","page":"GMM","title":"BetaML.GMM.DiagonalGaussian","text":"DiagonalGaussian(\n μ::Union{Nothing, Array{T, 1}}\n) -> DiagonalGaussian\nDiagonalGaussian(\n μ::Union{Nothing, Array{T, 1}},\n σ²::Union{Nothing, Array{T, 1}}\n) -> DiagonalGaussian\n\n\nDiagonalGaussian(μ,σ²) - Gaussian mixture with mean μ and variances σ² (and fixed zero covariances)\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.FullGaussian-Union{Tuple{Union{Nothing, Vector{T}}}, Tuple{T}, Tuple{Union{Nothing, Vector{T}}, Union{Nothing, Matrix{T}}}} where T","page":"GMM","title":"BetaML.GMM.FullGaussian","text":"FullGaussian(μ::Union{Nothing, 
Array{T, 1}}) -> FullGaussian\nFullGaussian(\n μ::Union{Nothing, Array{T, 1}},\n σ²::Union{Nothing, Array{T, 2}}\n) -> FullGaussian\n\n\nFullGaussian(μ,σ²) - Gaussian mixture with mean μ and variance/covariance matrix σ²\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.GaussianMixtureClusterer","page":"GMM","title":"BetaML.GMM.GaussianMixtureClusterer","text":"mutable struct GaussianMixtureClusterer <: BetaMLUnsupervisedModel\n\nAssign class probabilities to records (i.e. soft clustering) assuming a probabilistic generative model of observed data using mixtures.\n\nFor the parameters see ?GaussianMixture_hp and ?BML_options.\n\nNotes:\n\nData must be numerical\nMixtures can be user defined: see the ?GMM module documentation for a discussion on provided vs custom mixtures.\nOnline fitting (re-fitting with new data) is supported by setting the old learned mixtures as the starting values\nThe model is fitted using an Expectation-Maximisation (EM) algorithm that supports Missing data and is implemented in the log-domain for better numerical accuracy with many dimensions\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];\n\njulia> mod = GaussianMixtureClusterer(n_classes=2)\nGaussianMixtureClusterer - A Generative Mixture Model (unfitted)\n\njulia> prob_belong_classes = fit!(mod,X)\nIter. 1: Var. of the post 2.15612140465882 Log-likelihood -29.06452054772657\n5×2 Matrix{Float64}:\n 1.0 0.0\n 1.0 0.0\n 0.0 1.0\n 0.0 1.0\n 1.0 0.0\n\njulia> new_probs = fit!(mod,[11 0.9])\nIter. 1: Var. 
of the post 1.0 Log-likelihood -1.3312256125240092\n1×2 Matrix{Float64}:\n 0.0 1.0\n\njulia> info(mod)\nDict{String, Any} with 6 entries:\n \"xndims\" => 2\n \"error\" => [1.0, 0.0, 0.0]\n \"AIC\" => 15.7843\n \"fitted_records\" => 6\n \"lL\" => 1.10786\n \"BIC\" => -2.21571\n\njulia> parameters(mod)\nBetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)\n- mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9], [0.05, 0.05]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001], [0.05, 0.05])]\n- initial_probmixtures: [0.0, 1.0]\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.GaussianMixtureRegressor","page":"GMM","title":"BetaML.GMM.GaussianMixtureRegressor","text":"mutable struct GaussianMixtureRegressor <: BetaMLUnsupervisedModel\n\nA multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model.\n\nThe training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions of new data are obtained by fitting the new data to the mixtures.\n\nFor hyperparameters see GaussianMixture_hp and BML_options.\n\nThis strategy (GaussianMixtureRegressor) works by training the EM algorithm on a combined (hcat) matrix of X and Y. At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm (and using missing values for the dimensions belonging to Y) to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions to obtain the predicted value(s) for each record. 
\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];\n\njulia> Y = X[:,1] .* 2 - X[:,2]\n5-element Vector{Float64}:\n -7.8999999999999995\n -8.0\n 18.9\n 23.4\n -8.200000000000001\n\njulia> mod = GaussianMixtureRegressor(n_classes=2)\nGaussianMixtureRegressor - A regressor based on Generative Mixture Model (unfitted)\n\njulia> ŷ = fit!(mod,X,Y)\nIter. 1: Var. of the post 2.2191120060614065 Log-likelihood -47.70971887023561\n5×1 Matrix{Float64}:\n -8.033333333333333\n -8.033333333333333\n 21.15\n 21.15\n -8.033333333333333\n\njulia> new_probs = predict(mod,[11 0.9])\n1×1 Matrix{Float64}:\n 21.15\n\njulia> info(mod)\nDict{String, Any} with 6 entries:\n \"xndims\" => 3\n \"error\" => [2.21911, 0.0260833, 3.19141e-39, 0.0]\n \"AIC\" => 60.0684\n \"fitted_records\" => 5\n \"lL\" => -17.0342\n \"BIC\" => 54.9911\n\njulia> parameters(mod)\nBetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)\n- mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9, -8.033333333333333], [1.1024999999999996, 0.05, 5.0625]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001, 21.15], [1.1024999999999996, 0.05, 5.0625])]\n- initial_probmixtures: [0.6, 0.4]\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.GaussianMixtureRegressor2","page":"GMM","title":"BetaML.GMM.GaussianMixtureRegressor2","text":"mutable struct GaussianMixtureRegressor2 <: BetaMLUnsupervisedModel\n\nA multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model (strategy \"1\").\n\nThe training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions of new data is obtained by fitting the new data to the mixtures.\n\nFor hyperparameters see GaussianMixture_hp and BML_options.\n\nThis strategy (GaussianMixtureRegressor2) works by fitting the EM algorithm on the feature 
matrix X. Once the data has been probabilistically assigned to the various classes, a mean value of fitting values Y is computed for each cluster (using the probabilities as weights). At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions learned at training time to obtain the predicted value(s) for each record. \n\nNotes:\n\nPredicted values are always a matrix, even when a single variable is predicted (use dropdims(ŷ,dims=2) to get a single vector).\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];\n\njulia> Y = X[:,1] .* 2 - X[:,2]\n5-element Vector{Float64}:\n -7.8999999999999995\n -8.0\n 18.9\n 23.4\n -8.200000000000001\n\njulia> mod = GaussianMixtureRegressor2(n_classes=2)\nGaussianMixtureRegressor2 - A regressor based on Generative Mixture Model (unfitted)\n\njulia> ŷ = fit!(mod,X,Y)\nIter. 1: Var. 
of the post 2.15612140465882 Log-likelihood -29.06452054772657\n5×1 Matrix{Float64}:\n -8.033333333333333\n -8.033333333333333\n 21.15\n 21.15\n -8.033333333333333\n\njulia> new_probs = predict(mod,[11 0.9])\n1×1 Matrix{Float64}:\n 21.15\n\njulia> info(mod)\nDict{String, Any} with 6 entries:\n \"xndims\" => 2\n \"error\" => [2.15612, 0.118848, 4.19495e-7, 0.0, 0.0]\n \"AIC\" => 32.7605\n \"fitted_records\" => 5\n \"lL\" => -7.38023\n \"BIC\" => 29.2454\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.GaussianMixture_hp","page":"GMM","title":"BetaML.GMM.GaussianMixture_hp","text":"mutable struct GaussianMixture_hp <: BetaMLHyperParametersSet\n\nHyperparameters for GMM clusters and other GMM-based algorithms\n\nParameters:\n\nn_classes: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported and that currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]\ntol: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set different than minimum_variance.\ninitialisation_strategy: The computation method of the vector of the initial mixtures. 
One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nmaximum_iterations: Maximum number of iterations [def: 5000]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method (suitable for the GMM-based regressors) To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.SphericalGaussian-Union{Tuple{Union{Nothing, Vector{T}}}, Tuple{T}, Tuple{Union{Nothing, Vector{T}}, Union{Nothing, T}}} where T","page":"GMM","title":"BetaML.GMM.SphericalGaussian","text":"SphericalGaussian(\n μ::Union{Nothing, Array{T, 1}}\n) -> SphericalGaussian\nSphericalGaussian(\n μ::Union{Nothing, Array{T, 1}},\n σ²::Union{Nothing, T} where T\n) -> SphericalGaussian\n\n\nSphericalGaussian(μ,σ²) - Spherical Gaussian mixture with mean μ and (single) variance σ²\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.init_mixtures!-Union{Tuple{T}, Tuple{Vector{T}, Any}} where T<:BetaML.GMM.AbstractGaussian","page":"GMM","title":"BetaML.GMM.init_mixtures!","text":"init_mixtures!(mixtures::Array{T,1}, X; minimum_variance=0.25, minimum_covariance=0.0, initialisation_strategy=\"grid\",rng=Random.GLOBAL_RNG)\n\nThe parameter initialisation_strategy can be grid, kmeans or given:\n\ngrid: Uniformly cover the space observed by the data\nkmeans: Use the kmeans algorithm. 
If the data contains missing values, a first run of predictMissing is done under init=grid to impute the missing values just to allow the kmeans algorithm. Then the em algorithm is used with the output of kmean as init values.\ngiven: Leave the provided set of initial mixtures\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.lpdf-Tuple{DiagonalGaussian, Any, Any}","page":"GMM","title":"BetaML.GMM.lpdf","text":"lpdf(m::DiagonalGaussian,x,mask) - Log PDF of the mixture given the observation x\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.lpdf-Tuple{FullGaussian, Any, Any}","page":"GMM","title":"BetaML.GMM.lpdf","text":"lpdf(m::FullGaussian,x,mask) - Log PDF of the mixture given the observation x\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.lpdf-Tuple{SphericalGaussian, Any, Any}","page":"GMM","title":"BetaML.GMM.lpdf","text":"lpdf(m::SphericalGaussian,x,mask) - Log PDF of the mixture given the observation x\n\n\n\n\n\n","category":"method"},{"location":"StyleGuide_templates.html#Style-guide-and-template-for-BetaML-developers","page":"Style guide","title":"Style guide and template for BetaML developers","text":"","category":"section"},{"location":"StyleGuide_templates.html#Master-Style-guide","page":"Style guide","title":"Master Style guide","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"The code in BetaML should follow the official Julia Style Guide.","category":"page"},{"location":"StyleGuide_templates.html#Names-style","page":"Style guide","title":"Names style","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Each file name should start with a capital letter, no spaces allowed (and each file content should start with: \"Part of [BetaML](https://github.com/sylvaticus/BetaML.jl). 
Licence is MIT.\")\nType names use the so-called \"CamelCase\" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words optionally separated (but only when really needed for readability) by an _;\nIn the code and documentation we denote by N the number of observations/records, D the number of dimensions and K the number of classes/categories;\nError/accuracy/loss functions take first y and then ŷ\nIn APIs exposed to users, strings are preferred to symbols","category":"page"},{"location":"StyleGuide_templates.html#Docstrings","page":"Style guide","title":"Docstrings","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Please apply the following templates when writing a docstring for BetaML:","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Functions (add @docs if the function is not on the root module level, like for inner constructors, i.e. 
@docs \"\"\" foo()x ....\"\"\"):","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n$(TYPEDSIGNATURES)\n\nOne line description\n\n[Further description]\n\n# Parameters:\n\n\n\n# Returns:\n- Elements the function returns\n\n# Notes:\n- notes\n\n# Example:\n` ` `julia\njulia> [code]\n[output]\n` ` `\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Structs","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n$(TYPEDEF)\n\nOne line description\n\n[Further description]\n\n# Fields: (if relevant)\n$(TYPEDFIELDS)\n\n# Notes:\n\n# Example:\n` ` `julia\njulia> [code]\n[output]\n` ` `\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Enums:","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n$(TYPEDEF)\n\nOne line description\n\n[Further description]\n\n\n# Notes:\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Constants","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n[4 spaces] [Constant name]\n\nOne line description\n\n[Further description]\n\n\n# Notes:\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Modules","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n[4 spaces] [Module name]\n\nOne line description\n\nDetailed description on the module objectives, content and organisation\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html#Internal-links","page":"Style guide","title":"Internal 
links","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"To refer to a documented object: [`NAME`](@ref) or [`NAME`](@ref manual_id). In particular for internal links use [`?NAME`](@ref ?NAME)","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"To create an id manually: [Title](@id manual_id)","category":"page"},{"location":"StyleGuide_templates.html#Data-organisation","page":"Style guide","title":"Data organisation","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"While some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns.\nWhile some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a dataframe, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies).","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"EditURL = \"betaml_tutorial_multibranch_nn.jl\"","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#multibranch_nn_tutorial","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Often we can \"divide\" our feature sets into different groups, where for each group we have many, many
variables whose importance in prediction we don't know, but for which using a fully dense layer would be too computationally expensive. For example, we want to predict the growth of forest trees based on soil characteristics, climate characteristics and a bunch of other data (species, age, density...).","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"A soil (or climate) database may have hundreds of variables, how can we reduce them to a few that encode all the \"soil\" information? Sure, we could do a PCA or a clustering analysis, but a better way is to let our model itself find a way to encode the soil information into a vector in a way that is optimal for our prediction goal, i.e. we target the encoding task at our prediction goal.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"So we run a multi-branch neural network where one branch is given by the soil variables - it starts from all the hundreds of variables and ends in a few neuron outputs, another branch in a similar way is for the climate variables, we merge them in a branch to take into account the soil-weather interrelation (for example, it is well known that the water retention capacity of a sandy soil is quite different from that of a clay soil) and finally we merge this branch with the other variable branch to arrive at a single predicted output. In this example we focus on building, training and predicting a multi-branch neural network. 
See the other examples for cross-validation, hyperparameter tuning, scaling, overfitting, encoding, etc.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Data origin:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"while we hope to apply this example soon on actual real world data, for now we work on synthetic random data just to assess the validity of the network configuration.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Library-and-data-generation","page":"A deep neural network with multi-branch architecture","title":"Library and data generation","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Activating the local environment specific to the tutorials","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"using Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"We first load all the packages we are going to use","category":"page"},{"location":"tutorials/Multi-branch neural 
network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"using StableRNGs, BetaML, Plots","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Here we are explicit and we use our own fixed RNG:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"seed = 123\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Here we generate the random data..","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"N = 100 # records\nsoilD = 20 # dimensions of the soil database\nclimateD = 30 # dimensions of the climate database\nothervarD = 10 # dimensions of the other variables database\n\nsoilX = rand(StableRNG(seed),N,soilD)\nclimateX = rand(StableRNG(seed+10),N,climateD)\nothervarX = rand(StableRNG(seed+20),N,othervarD)\nX = hcat(soilX,climateX,othervarX)\nY = rand(StableRNG(seed+30),N)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Model-definition","page":"A deep neural network with multi-branch architecture","title":"Model definition","text":"","category":"section"},{"location":"tutorials/Multi-branch neural 
network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"(Image: Neural Network model)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"In the figure above, each circle represents a multi-neuron layer, with the number of neurons (output dimensions) written inside. Dotted circles are ReplicatorLayers, which simply \"pass through\" the information to the next layer. Red layers represent the layers responsible for the final step in encoding the information for a given branch. Subsequent layers will use this encoded information (i.e. decode it) to finally provide the prediction for the branch. We create a first branch for the soil variables, a second for the climate variables and finally a third for the other variables. We merge the soil and climate branches in layer 4 and the resulting branch and the other variables branch in layer 6.
Finally, the single neuron layer 8 provides the prediction.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"The weights along the whole chain can be learned using the traditional backpropagation algorithm.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"The whole model can be implemented with the following code:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 1:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l1_soil = DenseLayer(20,30,f=relu,rng=copy(AFIXEDRNG))\nl1_climate = ReplicatorLayer(30)\nl1_oth = ReplicatorLayer(10)\nl1 = GroupedLayer([l1_soil,l1_climate,l1_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 2:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l2_soil = DenseLayer(30,30,f=relu,rng=copy(AFIXEDRNG))\nl2_climate = DenseLayer(30,40,f=relu,rng=copy(AFIXEDRNG))\nl2_oth = ReplicatorLayer(10)\nl2 = 
GroupedLayer([l2_soil,l2_climate,l2_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 3:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l3_soil = DenseLayer(30,4,f=relu,rng=copy(AFIXEDRNG)) # encoding of soil properties\nl3_climate = DenseLayer(40,4,f=relu,rng=copy(AFIXEDRNG)) # encoding of climate properties\nl3_oth = DenseLayer(10,15,f=relu,rng=copy(AFIXEDRNG))\nl3 = GroupedLayer([l3_soil,l3_climate,l3_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 4:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l4_soilclim = DenseLayer(8,15,f=relu,rng=copy(AFIXEDRNG))\nl4_oth = DenseLayer(15,15,f=relu,rng=copy(AFIXEDRNG))\nl4 = GroupedLayer([l4_soilclim,l4_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 5:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l5_soilclim = 
DenseLayer(15,6,f=relu,rng=copy(AFIXEDRNG)) # encoding of soil and climate properties together\nl5_oth = DenseLayer(15,6,f=relu,rng=copy(AFIXEDRNG)) # encoding of other vars\nl5 = GroupedLayer([l5_soilclim,l5_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 6:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l6 = DenseLayer(12,15,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 7:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l7 = DenseLayer(15,15,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 8:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l8 = DenseLayer(15,1,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch 
architecture","title":"A deep neural network with multi-branch architecture","text":"Finally we put the layers together and we create our NeuralNetworkEstimator model:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layers = [l1,l2,l3,l4,l5,l6,l7,l8]\nm = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=100,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Fitting-the-model","page":"A deep neural network with multi-branch architecture","title":"Fitting the model","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"We are now ready to fit the model to the data. 
By default BetaML models directly return the predictions for the training data as the output of the fitting call, so there is no need to call predict(m,X) separately.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Ŷ = fit!(m,X,Y)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Model-quality-assessment","page":"A deep neural network with multi-branch architecture","title":"Model quality assessment","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"We can compute the relative mean error between the \"true\" Y and the Y estimated by the model.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"rme = relative_mean_error(Y,Ŷ)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Of course we know there is no actual relation here between the X and the Y, as both are randomly generated. The result above just tells us that the network has been able to find a path between the X and Y that have been used for training, but we hope that in a real application this learned path represents a true, general relation between the inputs and the outputs.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A
deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Finally we can also plot Y against Ŷ and visualize how the average loss decreased during training:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"scatter(Y,Ŷ,xlabel=\"vol observed\",ylabel=\"vol estimated\",label=nothing,title=\"Est vs. obs volumes\")","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"loss_per_epoch = info(m)[\"loss_per_epoch\"]\n\nplot(loss_per_epoch, xlabel=\"epoch\", ylabel=\"loss per epoch\", label=nothing, title=\"Loss per epoch\")","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"View this file on Github.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Api_v2_user.html#api_usage","page":"Introduction for user","title":"BetaML Api v2","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for
user","text":"note: Note\nThe API described below is the default one starting from BetaML v0.8.","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"The following API is designed to further simplify the usage of the various ML models provided by BetaML by introducing a common workflow. This is the user documentation. Refer to the developer documentation to learn how the API is implemented. ","category":"page"},{"location":"Api_v2_user.html#Supervised-,-unsupervised-and-transformed-models","page":"Introduction for user","title":"Supervised , unsupervised and transformed models","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Supervised refers to models designed to learn a relation between some features (often denoted by X) and some labels (often denoted by Y) in order to predict the label of new data given the observed features alone. Perceptron, decision trees or neural networks are common examples. Unsupervised and transformer models relate to models that learn a \"structure\" from the data itself (without any label attached from which to learn) and report either some new information using this learned structure (e.g. a cluster class) or directly process a transformation of the data itself, like PCAEncoder or missing imputers. BetaML makes no distinction between these kinds of models, except that the fitting (aka training) function for the former takes both the features and the labels. In particular there isn't a separate transform function as in other frameworks, but any information we need to learn using the model, whether a label or some transformation of the original data, is provided by the predict function.
","category":"page"},{"location":"Api_v2_user.html#Model-constructor","page":"Introduction for user","title":"Model constructor","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"The first step is to build the model constructor by passing (using keyword arguments) the agorithm hyperparameters and various options (cache results flag, debug levels, random number generators, ...):","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"mod = ModelName(par1=X,par2=Y,...)","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Sometimes a parameter is itself another model, in such case we would have:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"mod = ModelName(par1=OtherModel(a_par_of_OtherModel=X,...),par2=Y,...)","category":"page"},{"location":"Api_v2_user.html#Training-of-the-model","page":"Introduction for user","title":"Training of the model","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"The second step is to fit (aka train) the model:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"fit!(m,X,[Y])","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"where Y is present only for supervised models.","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"For online algorithms, i.e. models that support updating of the learned parameters with new data, fit! 
can be repeated as new data arrive, although not all algorithms guarantee that training on one record at a time is equivalent to training on all the records at once. In some algorithms the \"old training\" could be used as initial conditions, regardless of whether it was obtained with hundreds or millions of records, so the new data used for training becomes much more important than the old data in determining the learned parameters.","category":"page"},{"location":"Api_v2_user.html#Prediction","page":"Introduction for user","title":"Prediction","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Fitted models can be used to predict y (whether the label, some desired new information or a transformation) given new X:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"ŷ = predict(mod,X)","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"As a convenience, if the model has been trained with the cache option set to true (the default), the ŷ of the last training is retained in the model object and it can be retrieved simply with predict(mod). Also in such case the fit! function returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function.
The 3 expressions below are hence equivalent:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"ŷ = fit!(mod,xtrain) # only with `cache=true` in the model constructor (default)\nŷ1 = predict(mod) # only with `cache=true` in the model constructor (default)\nŷ2 = predict(mod,xtrain) ","category":"page"},{"location":"Api_v2_user.html#Other-functions","page":"Introduction for user","title":"Other functions","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Models can be reset to lose the learned information with reset!(mod) and training information (other than the algorithm learned parameters, see below) can be retrieved with info(mod).","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Hyperparameters, learned parameters and options can be retrieved with the functions hyperparameters, parameters and options respectively. Note that they can also be used to set new values in the model, as they return a reference to the required objects.","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"note: Note\nWhat is the difference between the output of info, parameters and the predict function? The predict function (and, when cache is used, the fit! one too) returns the main information required from the model: the predicted label for supervised models, the class assignment for clusters or the reprojected data for PCA. info returns complementary information like the number of dimensions of the data or the number of records employed for training. It doesn't include information that is necessary for the training itself, like the centroids in cluster analysis. These can be retrieved instead using parameters, which includes all and only the information required to compute predict.
","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Some models allow an inverse transformation, that using the parameters learned at trainign time (e.g. the scale factors) perform an inverse tranformation of new data to the space of the training data (e.g. the unscaled space). Use inverse_predict(mod,xnew).","category":"page"},{"location":"MLJ_interface.html#bmlj_module","page":"MLJ interface","title":"The MLJ interface to BetaML Models","text":"","category":"section"},{"location":"MLJ_interface.html","page":"MLJ interface","title":"MLJ interface","text":"Bmlj\n","category":"page"},{"location":"MLJ_interface.html#BetaML.Bmlj","page":"MLJ interface","title":"BetaML.Bmlj","text":"MLJ interface for BetaML models\n\nIn this module we define the interface of several BetaML models. They can be used using the MLJ framework.\n\nNote that MLJ models (whose name could be the same as the underlying BetaML model) are not exported. 
You can access them with BetaML.Bmlj.ModelXYZ.\n\n\n\n\n\n","category":"module"},{"location":"MLJ_interface.html#Models-available-through-MLJ","page":"MLJ interface","title":"Models available through MLJ","text":"","category":"section"},{"location":"MLJ_interface.html","page":"MLJ interface","title":"MLJ interface","text":"Modules = [Bmlj]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"MLJ_interface.html#Detailed-models-documentation","page":"MLJ interface","title":"Detailed models documentation","text":"","category":"section"},{"location":"MLJ_interface.html","page":"MLJ interface","title":"MLJ interface","text":"Modules = [Bmlj]\nPrivate = true","category":"page"},{"location":"MLJ_interface.html#BetaML.Bmlj.AutoEncoder","page":"MLJ interface","title":"BetaML.Bmlj.AutoEncoder","text":"mutable struct AutoEncoder <: MLJModelInterface.Unsupervised\n\nA ready-to-use AutoEncoder, from the Beta Machine Learning Toolkit (BetaML) for encoding and decoding of data using neural networks\n\nParameters:\n\nencoded_size: The number of neurons (i.e. dimensions) of the encoded data. If the value is a float it is considered a fraction (to be rounded) of the dimensionality of the data [def: 0.33]\nlayers_size: Inner layer dimension (i.e. number of neurons). If the value is a float it is considered a fraction (to be rounded) of the dimensionality of the data [def: nothing, which applies a specific heuristic]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.\ne_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size].
See subtypes(BetaML.AbstractLayer) for supported layers\nd_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as (n x d) matrices.\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative in the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]\nepochs: Number of epochs, i.e. passes through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 8]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()] See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit!
call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\ndescr: An optional title and/or description for this model\nrng: Random Number Generator (see FIXEDSEED) [deafult: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nuse transform to obtain the encoded data, and inverse_trasnform to decode to the original data\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load AutoEncoder pkg = \"BetaML\" verbosity=0;\n\njulia> model = modelType(encoded_size=2,layers_size=10);\n\njulia> mach = machine(model, X)\nuntrained Machine; caches model-specific representations of data\n model: AutoEncoder(e_layers = nothing, …)\n args: \n 1:\tSource @334 ⏎ Table{AbstractVector{Continuous}}\n\njulia> fit!(mach,verbosity=2)\n[ Info: Training machine(AutoEncoder(e_layers = nothing, …), …).\n***\n*** Training for 200 epochs with algorithm BetaML.Nn.ADAM.\nTraining.. \t avg loss on epoch 1 (1): \t 35.48243542158747\nTraining.. \t avg loss on epoch 20 (20): \t 0.07528042222678126\nTraining.. \t avg loss on epoch 40 (40): \t 0.06293071729378613\nTraining.. \t avg loss on epoch 60 (60): \t 0.057035588828991145\nTraining.. \t avg loss on epoch 80 (80): \t 0.056313167754822875\nTraining.. \t avg loss on epoch 100 (100): \t 0.055521461091809436\nTraining the Neural Network... 52%|██████████████████████████████████████ | ETA: 0:00:01Training.. \t avg loss on epoch 120 (120): \t 0.06015206472927942\nTraining.. \t avg loss on epoch 140 (140): \t 0.05536835903285201\nTraining.. \t avg loss on epoch 160 (160): \t 0.05877560142428245\nTraining.. \t avg loss on epoch 180 (180): \t 0.05476302769966953\nTraining.. \t avg loss on epoch 200 (200): \t 0.049240864053557445\nTraining the Neural Network... 100%|█████████████████████████████████████████████████████████████████████████| Time: 0:00:01\nTraining of 200 epoch completed. 
Final epoch error: 0.049240864053557445.\ntrained Machine; caches model-specific representations of data\n model: AutoEncoder(e_layers = nothing, …)\n args: \n 1:\tSource @334 ⏎ Table{AbstractVector{Continuous}}\n\n\njulia> X_latent = transform(mach, X)\n150×2 Matrix{Float64}:\n 7.01701 -2.77285\n 6.50615 -2.9279\n 6.5233 -2.60754\n ⋮ \n 6.70196 -10.6059\n 6.46369 -11.1117\n 6.20212 -10.1323\n\njulia> X_recovered = inverse_transform(mach,X_latent)\n150×4 Matrix{Float64}:\n 5.04973 3.55838 1.43251 0.242215\n 4.73689 3.19985 1.44085 0.295257\n 4.65128 3.25308 1.30187 0.244354\n ⋮ \n 6.50077 2.93602 5.3303 1.87647\n 6.38639 2.83864 5.54395 2.04117\n 6.01595 2.67659 5.03669 1.83234\n\njulia> BetaML.relative_mean_error(MLJ.matrix(X),X_recovered)\n0.03387721261716176\n\n\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.DecisionTreeClassifier","page":"MLJ interface","title":"BetaML.Bmlj.DecisionTreeClassifier","text":"mutable struct DecisionTreeClassifier <: MLJModelInterface.Probabilistic\n\nA simple Decision Tree model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must holds to consider for a partition of it [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference betwwen the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: gini]. 
Either gini, entropy or a custom function. It can also be an anonymous function.\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [deafult: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load DecisionTreeClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Trees.DecisionTreeClassifier\n\njulia> model = modelType()\nDecisionTreeClassifier(\n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.gini, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(DecisionTreeClassifier(max_depth = 0, …), …).\n\njulia> cat_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt32, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.DecisionTreeRegressor","page":"MLJ interface","title":"BetaML.Bmlj.DecisionTreeRegressor","text":"mutable struct DecisionTreeRegressor <: MLJModelInterface.Deterministic\n\nA simple Decision Tree model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must holds to consider for a partition of it [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference betwwen the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [deafult: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load DecisionTreeRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Trees.DecisionTreeRegressor\n\njulia> model = modelType()\nDecisionTreeRegressor(\n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.variance, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(DecisionTreeRegressor(max_depth = 0, …), …).\n\njulia> ŷ = predict(mach, X);\n\njulia> hcat(y,ŷ)\n506×2 Matrix{Float64}:\n 24.0 26.35\n 21.6 21.6\n 34.7 34.8\n ⋮ \n 23.9 23.75\n 22.0 22.2\n 11.9 13.2\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GaussianMixtureClusterer","page":"MLJ interface","title":"BetaML.Bmlj.GaussianMixtureClusterer","text":"mutable struct GaussianMixtureClusterer <: MLJModelInterface.Unsupervised\n\nA Expectation-Maximisation clustering algorithm with customisable mixtures, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent 
classes) to consider [def: 3]\ninitial_probmixtures::AbstractVector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set different than minimum_variance (see notes).\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": first use kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\nmaximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. 
∞]\nrng::Random.AbstractRNG: Random Number Generator [deafult: Random.GLOBAL_RNG]\n\nExample:\n\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load GaussianMixtureClusterer pkg = \"BetaML\" verbosity=0\nBetaML.GMM.GaussianMixtureClusterer\n\njulia> model = modelType()\nGaussianMixtureClusterer(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"kmeans\", \n maximum_iterations = 9223372036854775807, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(GaussianMixtureClusterer(n_classes = 3, …), …).\nIter. 1: Var. of the post 10.800150114964184 Log-likelihood -650.0186451891216\n\njulia> classes_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, Int64, UInt32, Float64}:\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>4.17e-15, 3=>2.1900000000000003e-31)\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>1.25e-13, 3=>5.87e-31)\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>4.5e-15, 3=>1.55e-32)\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>6.93e-14, 3=>3.37e-31)\n ⋮\n UnivariateFinite{Multiclass{3}}(1=>5.39e-25, 2=>0.0167, 3=>0.983)\n UnivariateFinite{Multiclass{3}}(1=>7.5e-29, 2=>0.000106, 3=>1.0)\n UnivariateFinite{Multiclass{3}}(1=>1.6e-20, 2=>0.594, 3=>0.406)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GaussianMixtureImputer","page":"MLJ interface","title":"BetaML.Bmlj.GaussianMixtureImputer","text":"mutable struct GaussianMixtureImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using a probabilistic approach (Gaussian Mixture Models) fitted using the 
Expectation-Maximisation algorithm, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module in BetaML). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported and that currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set different than minimum_variance.\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. 
One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [deafult: Random.GLOBAL_RNG]\n\nExample :\n\njulia> using MLJ\n\njulia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;\n\njulia> modelType = @load GaussianMixtureImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.GaussianMixtureImputer\n\njulia> model = modelType(initialisation_strategy=\"grid\")\nGaussianMixtureImputer(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"grid\", \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(GaussianMixtureImputer(n_classes = 3, …), …).\nIter. 1: Var. 
of the post 2.0225921341714286 Log-likelihood -42.96100103213314\n\njulia> X_full = transform(mach) |> MLJ.matrix\n9×2 Matrix{Float64}:\n 1.0 10.5\n 1.5 14.7366\n 1.8 8.0\n 1.7 15.0\n 3.2 40.0\n 2.51842 15.1747\n 3.3 38.0\n 2.47412 -2.3\n 5.2 -2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GaussianMixtureRegressor","page":"MLJ interface","title":"BetaML.Bmlj.GaussianMixtureRegressor","text":"mutable struct GaussianMixtureRegressor <: MLJModelInterface.Deterministic\n\nA non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.\n\nThis is the single-target version of the model. If you want to predict several labels (y) at once, use the MLJ model MultitargetGaussianMixtureRegressor.\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the [?GMM](@ref GMM) module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if theinitialisationstrategyparameter is set to \"gived\" This parameter can also be given symply in term of a _type. In this case it is automatically extended to a vector of n_classesmixtures of the specified type. Note that mixing of different mixture types is not currently supported. 
[def:[DiagonalGaussian() for i in 1:n_classes]`]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set different than minimum_variance (see notes).\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nmaximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]\nrng::Random.AbstractRNG: Random Number Generator [deafult: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load GaussianMixtureRegressor pkg = \"BetaML\" verbosity=0\nBetaML.GMM.GaussianMixtureRegressor\n\njulia> model = modelType()\nGaussianMixtureRegressor(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"kmeans\", \n maximum_iterations = 9223372036854775807, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(GaussianMixtureRegressor(n_classes = 3, …), …).\nIter. 1: Var. 
of the post 21.74887448784976 Log-likelihood -21687.09917379566\n\njulia> ŷ = predict(mach, X)\n506-element Vector{Float64}:\n 24.703442835305577\n 24.70344283512716\n ⋮\n 17.172486989759676\n 17.172486989759644\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GeneralImputer","page":"MLJ interface","title":"BetaML.Bmlj.GeneralImputer","text":"mutable struct GeneralImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using arbitrary learning models, from the Beta Machine Learning Toolkit (BetaML).\n\nImpute missing values using a vector (one per column) of arbitrary learning models (classifiers/regressors, not necessarily from BetaML) that implement the interface m = Model([options]), train!(m,X,Y) and predict(m,X).\n\nHyperparameters:\n\ncols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of columns IDs (positions), or the keywords \"auto\" (default) or \"all\". With \"auto\" the model automatically detects the columns with missing data and impute only them. You may manually specify the columns or use \"all\" if you want to create a imputation model for that columns during training even if all training data are non-missing to apply then the training model to further data with possibly missing values.\nestimator::Any: An entimator model (regressor or classifier), with eventually its options (hyper-parameters), to be used to impute the various columns of the matrix. It can also be a cols_to_impute-length vector of different estimators to consider a different estimator for each column (dimension) to impute, for example when some columns are categorical (and will hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. 
use BetaML random forests, handling classification and regression jobs automatically].\nmissing_supported::Union{Bool, Vector{Bool}}: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows as those where imputation is needed are dropped and then only non-missing rows in the other remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator or a single boolean value to apply to all the estimators [default: false]\nfit_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]\npredict_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]\nrecursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]. 
Note that this influence only the specific GeneralImputer code, the individual estimators may have their own rng (or similar) parameter.\n\nExamples :\n\nUsing BetaML models:\n\njulia> using MLJ;\njulia> import BetaML # The library from which to get the individual estimators to be used for each column imputation\njulia> X = [\"a\" 8.2;\n \"a\" missing;\n \"a\" 7.8;\n \"b\" 21;\n \"b\" 18;\n \"c\" -0.9;\n missing 20;\n \"c\" -1.8;\n missing -2.3;\n \"c\" -2.4] |> table ;\njulia> modelType = @load GeneralImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.GeneralImputer\njulia> model = modelType(estimator=BetaML.DecisionTreeEstimator(),recursive_passages=2);\njulia> mach = machine(model, X);\njulia> fit!(mach);\n[ Info: Training machine(GeneralImputer(cols_to_impute = auto, …), …).\njulia> X_full = transform(mach) |> MLJ.matrix\n10×2 Matrix{Any}:\n \"a\" 8.2\n \"a\" 8.0\n \"a\" 7.8\n \"b\" 21\n \"b\" 18\n \"c\" -0.9\n \"b\" 20\n \"c\" -1.8\n \"c\" -2.3\n \"c\" -2.4\n\nUsing third party packages (in this example DecisionTree):\n\njulia> using MLJ;\njulia> import DecisionTree # An example of external estimators to be used for each column imputation\njulia> X = [\"a\" 8.2;\n \"a\" missing;\n \"a\" 7.8;\n \"b\" 21;\n \"b\" 18;\n \"c\" -0.9;\n missing 20;\n \"c\" -1.8;\n missing -2.3;\n \"c\" -2.4] |> table ;\njulia> modelType = @load GeneralImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.GeneralImputer\njulia> model = modelType(estimator=[DecisionTree.DecisionTreeClassifier(),DecisionTree.DecisionTreeRegressor()], fit_function=DecisionTree.fit!,predict_function=DecisionTree.predict,recursive_passages=2);\njulia> mach = machine(model, X);\njulia> fit!(mach);\n[ Info: Training machine(GeneralImputer(cols_to_impute = auto, …), …).\njulia> X_full = transform(mach) |> MLJ.matrix\n10×2 Matrix{Any}:\n \"a\" 8.2\n \"a\" 7.51111\n \"a\" 7.8\n \"b\" 21\n \"b\" 18\n \"c\" -0.9\n \"b\" 20\n \"c\" -1.8\n \"c\" -2.3\n \"c\" 
-2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.KMeansClusterer","page":"MLJ interface","title":"BetaML.Bmlj.KMeansClusterer","text":"mutable struct KMeansClusterer <: MLJModelInterface.Unsupervised\n\nThe classical KMeansClusterer clustering algorithm, from the Beta Machine Learning Toolkit (BetaML).\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Default to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance), cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Attention that, contrary to KMedoidsClusterer, the KMeansClusterer algorithm is not guaranteed to converge with other distances than the Euclidean one.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:\n\"random\": randomly in the X space\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points [default]\n\"given\": using a provided set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\nrng::Random.AbstractRNG: Random Number Generator [deafult: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load KMeansClusterer pkg = \"BetaML\" verbosity=0\nBetaML.Clustering.KMeansClusterer\n\njulia> model = modelType()\nKMeansClusterer(\n n_classes = 3, \n dist = BetaML.Clustering.var\"#34#36\"(), \n initialisation_strategy = \"shuffle\", \n initial_representatives = nothing, \n rng = 
Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(KMeansClusterer(n_classes = 3, …), …).\n\njulia> classes_est = predict(mach, X);\n\njulia> hcat(y,classes_est)\n150×2 CategoricalArrays.CategoricalArray{Union{Int64, String},2,UInt32}:\n \"setosa\" 2\n \"setosa\" 2\n \"setosa\" 2\n ⋮ \n \"virginica\" 3\n \"virginica\" 3\n \"virginica\" 1\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.KMedoidsClusterer","page":"MLJ interface","title":"BetaML.Bmlj.KMedoidsClusterer","text":"mutable struct KMedoidsClusterer <: MLJModelInterface.Unsupervised\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Default to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance), cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:\n\"random\": randomly in the X space\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points [default]\n\"given\": using a provided set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\nrng::Random.AbstractRNG: Random Number Generator [deafult: Random.GLOBAL_RNG]\n\nThe K-medoids clustering algorithm with customisable distance function, from the Beta Machine Learning Toolkit (BetaML).\n\nSimilar to K-Means, but the \"representatives\" (the cetroids) are guaranteed to be one of the training points. 
The algorithm work with any arbitrary distance measure.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load KMedoidsClusterer pkg = \"BetaML\" verbosity=0\nBetaML.Clustering.KMedoidsClusterer\n\njulia> model = modelType()\nKMedoidsClusterer(\n n_classes = 3, \n dist = BetaML.Clustering.var\"#39#41\"(), \n initialisation_strategy = \"shuffle\", \n initial_representatives = nothing, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(KMedoidsClusterer(n_classes = 3, …), …).\n\njulia> classes_est = predict(mach, X);\n\njulia> hcat(y,classes_est)\n150×2 CategoricalArrays.CategoricalArray{Union{Int64, String},2,UInt32}:\n \"setosa\" 3\n \"setosa\" 3\n \"setosa\" 3\n ⋮ \n \"virginica\" 1\n \"virginica\" 1\n \"virginica\" 2\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.KernelPerceptronClassifier","page":"MLJ interface","title":"BetaML.Bmlj.KernelPerceptronClassifier","text":"mutable struct KernelPerceptronClassifier <: MLJModelInterface.Probabilistic\n\nThe kernel perceptron algorithm using one-vs-one for multiclass, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nkernel::Function: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel (once loaded the BetaML package) for details or check ?BetaML.Utils to verify if other kernels are defined (you can alsways define your own kernel) [def: radial_kernel]\nepochs::Int64: Maximum number of epochs, i.e. passages trough the whole training sample [def: 100]\ninitial_errors::Union{Nothing, Vector{Vector{Int64}}}: Initial distribution of the number of errors errors [def: nothing, i.e. zeros]. 
If provided, this should be a nModels-lenght vector of nRecords integer values vectors , where nModels is computed as (n_classes * (n_classes - 1)) / 2\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [deafult: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load KernelPerceptronClassifier pkg = \"BetaML\"\n[ Info: For silent loading, specify `verbosity=0`. \nimport BetaML ✔\nBetaML.Perceptron.KernelPerceptronClassifier\n\njulia> model = modelType()\nKernelPerceptronClassifier(\n kernel = BetaML.Utils.radial_kernel, \n epochs = 100, \n initial_errors = nothing, \n shuffle = true, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> est_classes = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>0.665, versicolor=>0.245, virginica=>0.09)\n UnivariateFinite{Multiclass{3}}(setosa=>0.665, versicolor=>0.245, virginica=>0.09)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.245, virginica=>0.665)\n UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.665, virginica=>0.245)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.MultitargetGaussianMixtureRegressor","page":"MLJ interface","title":"BetaML.Bmlj.MultitargetGaussianMixtureRegressor","text":"mutable struct MultitargetGaussianMixtureRegressor <: MLJModelInterface.Deterministic\n\nA non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.\n\nThis is the multi-target version of the model. 
If you want to predict a single label (y), use the MLJ model GaussianMixtureRegressor.\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set different than minimum_variance (see notes).\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": first use kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nmaximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. 
∞]\nrng::Random.AbstractRNG: Random Number Generator [deafult: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> ydouble = hcat(y, y .*2 .+5);\n\njulia> modelType = @load MultitargetGaussianMixtureRegressor pkg = \"BetaML\" verbosity=0\nBetaML.GMM.MultitargetGaussianMixtureRegressor\n\njulia> model = modelType()\nMultitargetGaussianMixtureRegressor(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"kmeans\", \n maximum_iterations = 9223372036854775807, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, ydouble);\n\njulia> fit!(mach);\n[ Info: Training machine(MultitargetGaussianMixtureRegressor(n_classes = 3, …), …).\nIter. 1: Var. of the post 20.46947926187522 Log-likelihood -23662.72770575145\n\njulia> ŷdouble = predict(mach, X)\n506×2 Matrix{Float64}:\n 23.3358 51.6717\n 23.3358 51.6717\n ⋮ \n 16.6843 38.3686\n 16.6843 38.3686\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.MultitargetNeuralNetworkRegressor","page":"MLJ interface","title":"BetaML.Bmlj.MultitargetNeuralNetworkRegressor","text":"mutable struct MultitargetNeuralNetworkRegressor <: MLJModelInterface.Deterministic\n\nA simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of multiple dimensional targets.\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: BetaML.squared_cost]. 
Should always assume y and ŷ as matrices.\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 300]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ndescr: An optional title and/or description for this model\ncb: A callback function to provide information during training [def: BetaML.fitting_info]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nthe label should be an n-records by n-dimensions matrix \n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> ydouble = hcat(y, y .*2 .+5);\n\njulia> modelType = @load MultitargetNeuralNetworkRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Nn.MultitargetNeuralNetworkRegressor\n\njulia> layers = [BetaML.DenseLayer(12,50,f=BetaML.relu),BetaML.DenseLayer(50,50,f=BetaML.relu),BetaML.DenseLayer(50,50,f=BetaML.relu),BetaML.DenseLayer(50,2,f=BetaML.relu)];\n\njulia> model = modelType(layers=layers,opt_alg=BetaML.ADAM(),epochs=500)\nMultitargetNeuralNetworkRegressor(\n layers = BetaML.Nn.AbstractLayer[BetaML.Nn.DenseLayer([-0.2591582523441157 -0.027962845131416225 … 0.16044535560124418 -0.12838827994676857; -0.30381834909561184 0.2405495243851402 … -0.2588144861880588 0.09538577909777807; … ; -0.017320292924711156 -0.14042266424603767 … 0.06366999105841187 -0.13419651752478906; 0.07393079961409338 0.24521350531110264 … 0.04256867886217541 -0.0895506802948175], [0.14249427336553644, 
0.24719379413682485, -0.25595911822556566, 0.10034088778965933, -0.017086404878505712, 0.21932184025609347, -0.031413516834861266, -0.12569076082247596, -0.18080140982481183, 0.14551901873323253 … -0.13321995621967364, 0.2436582233332092, 0.0552222336976439, 0.07000814133633904, 0.2280064379660025, -0.28885681475734193, -0.07414214246290696, -0.06783184733650621, -0.055318068046308455, -0.2573488383282579], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.0395424111703751 -0.22531232360829911 … -0.04341228943744482 0.024336206858365517; -0.16481887432946268 0.17798073384748508 … -0.18594039305095766 0.051159225856547474; … ; -0.011639475293705043 -0.02347011206244673 … 0.20508869536159186 -0.1158382446274592; -0.19078069527757857 -0.007487540070740484 … -0.21341165344291158 -0.24158671316310726], [-0.04283623889330032, 0.14924461547060602, -0.17039563392959683, 0.00907774027816255, 0.21738885963113852, -0.06308040225941691, -0.14683286822101105, 0.21726892197970937, 0.19784321784707126, -0.0344988665714947 … -0.23643089430602846, -0.013560425201427584, 0.05323948910726356, -0.04644175812567475, -0.2350400292671211, 0.09628312383424742, 0.07016420995205697, -0.23266392927140334, -0.18823664451487, 0.2304486691429084], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.11504184627266828 0.08601794194664503 … 0.03843129724045469 -0.18417305624127284; 0.10181551438831654 0.13459759904443674 … 0.11094951365942118 -0.1549466590355218; … ; 0.15279817525427697 0.0846661196058916 … -0.07993619892911122 0.07145402617285884; -0.1614160186346092 -0.13032002335149 … -0.12310552194729624 -0.15915773071049827], [-0.03435885900946367, -0.1198543931290306, 0.008454985905194445, -0.17980887188986966, -0.03557204910359624, 0.19125847393334877, -0.10949700778538696, -0.09343206702591, -0.12229583511781811, -0.09123969069220564 … 0.22119233518322862, 0.2053873143308657, 0.12756489387198222, 0.11567243705173319, -0.20982445664020496, 0.1595157838386987, 
-0.02087331046544119, -0.20556423263489765, -0.1622837764237961, -0.019220998739847395], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.25796717031347993 0.17579536633402948 … -0.09992960168785256 -0.09426177454620635; -0.026436330246675632 0.18070899284865127 … -0.19310119102392206 -0.06904005900252091], [0.16133004882307822, -0.3061228721091248], BetaML.Utils.relu, BetaML.Utils.drelu)], \n loss = BetaML.Utils.squared_cost, \n dloss = BetaML.Utils.dsquared_cost, \n epochs = 500, \n batch_size = 32, \n opt_alg = BetaML.Nn.ADAM(BetaML.Nn.var\"#90#93\"(), 1.0, 0.9, 0.999, 1.0e-8, BetaML.Nn.Learnable[], BetaML.Nn.Learnable[]), \n shuffle = true, \n descr = \"\", \n cb = BetaML.Nn.fitting_info, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, ydouble);\n\njulia> fit!(mach);\n\njulia> ŷdouble = predict(mach, X);\n\njulia> hcat(ydouble,ŷdouble)\n506×4 Matrix{Float64}:\n 24.0 53.0 28.4624 62.8607\n 21.6 48.2 22.665 49.7401\n 34.7 74.4 31.5602 67.9433\n 33.4 71.8 33.0869 72.4337\n ⋮ \n 23.9 52.8 23.3573 50.654\n 22.0 49.0 22.1141 48.5926\n 11.9 28.8 19.9639 45.5823\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.NeuralNetworkClassifier","page":"MLJ interface","title":"BetaML.Bmlj.NeuralNetworkClassifier","text":"mutable struct NeuralNetworkClassifier <: MLJModelInterface.Probabilistic\n\nA simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for classification problems.\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers. The last \"softmax\" layer is automatically added.\nloss: Loss (cost) function [def: BetaML.crossentropy]. 
Should always assume y and ŷ as matrices.\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dcrossentropy, i.e. the derivative of the cross-entropy]. Use nothing for autodiff.\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ndescr: An optional title and/or description for this model\ncb: A callback function to provide information during training [def: BetaML.fitting_info]\ncategories: The categories to represent as columns. [def: nothing, i.e. unique training values].\nhandle_unknown: How to handle categories not seen in training or not present in the provided categories array? \"error\" (default) raises an error, \"infrequent\" adds a specific column for these categories.\nother_categories_name: Which value during prediction to assign to this \"other\" category (i.e. categories not seen in training or not present in the provided categories array)? [def: nothing, i.e. typemax(Int64) for integer vectors and \"other\" for other types]. This setting is active only if handle_unknown=\"infrequent\" and in that case it MUST be specified if Y is neither integers nor strings\nrng: Random Number Generator [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nthe label should be an n-records by n-dimensions matrix (e.g. 
a one-hot-encoded data for classification), where the output columns should be interpreted as the probabilities for each categories.\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load NeuralNetworkClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Nn.NeuralNetworkClassifier\n\njulia> layers = [BetaML.DenseLayer(4,8,f=BetaML.relu),BetaML.DenseLayer(8,8,f=BetaML.relu),BetaML.DenseLayer(8,3,f=BetaML.relu),BetaML.VectorFunctionLayer(3,f=BetaML.softmax)];\n\njulia> model = modelType(layers=layers,opt_alg=BetaML.ADAM())\nNeuralNetworkClassifier(\n layers = BetaML.Nn.AbstractLayer[BetaML.Nn.DenseLayer([-0.376173352338049 0.7029289511758696 -0.5589563304592478 -0.21043274001651874; 0.044758889527899415 0.6687689636685921 0.4584331114653877 0.6820506583840453; … ; -0.26546358457167507 -0.28469736227283804 -0.164225549922154 -0.516785639164486; -0.5146043550684141 -0.0699113265130964 0.14959906603941908 -0.053706860039406834], [0.7003943613125758, -0.23990840466587576, -0.23823126271387746, 0.4018101580410387, 0.2274483050356888, -0.564975060667734, 0.1732063297031089, 0.11880299829896945], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.029467850439546583 0.4074661266592745 … 0.36775675246760053 -0.595524555448422; 0.42455597698371306 -0.2458082732997091 … -0.3324220683462514 0.44439454998610595; … ; -0.2890883863364267 -0.10109249362508033 … -0.0602680568207582 0.18177278845097555; -0.03432587226449335 -0.4301192922760063 … 0.5646018168286626 0.47269177680892693], [0.13777442835428688, 0.5473306726675433, 0.3781939472904011, 0.24021813428130567, -0.0714779477402877, -0.020386373530818958, 0.5465466618404464, -0.40339790713616525], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([0.6565120540082393 0.7139211611842745 … 0.07809812467915389 -0.49346311403373844; -0.4544472987041656 0.6502667641568863 … 0.43634608676548214 0.7213049952968921; 0.41212264783075303 -0.21993289366360613 … 0.25365007887755064 
-0.5664469566269569], [-0.6911986792747682, -0.2149343209329364, -0.6347727539063817], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.VectorFunctionLayer{0}(fill(NaN), 3, 3, BetaML.Utils.softmax, BetaML.Utils.dsoftmax, nothing)], \n loss = BetaML.Utils.crossentropy, \n dloss = BetaML.Utils.dcrossentropy, \n epochs = 100, \n batch_size = 32, \n opt_alg = BetaML.Nn.ADAM(BetaML.Nn.var\"#90#93\"(), 1.0, 0.9, 0.999, 1.0e-8, BetaML.Nn.Learnable[], BetaML.Nn.Learnable[]), \n shuffle = true, \n descr = \"\", \n cb = BetaML.Nn.fitting_info, \n categories = nothing, \n handle_unknown = \"error\", \n other_categories_name = nothing, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> classes_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>0.575, versicolor=>0.213, virginica=>0.213)\n UnivariateFinite{Multiclass{3}}(setosa=>0.573, versicolor=>0.213, virginica=>0.213)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.236, versicolor=>0.236, virginica=>0.529)\n UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.254, virginica=>0.492)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.NeuralNetworkRegressor","page":"MLJ interface","title":"BetaML.Bmlj.NeuralNetworkRegressor","text":"mutable struct NeuralNetworkRegressor <: MLJModelInterface.Deterministic\n\nA simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of a single dimensional target.\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: BetaML.squared_cost]. 
Should always assume y and ŷ as matrices, even if the regression task is 1-D\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative on the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ndescr: An optional title and/or description for this model\ncb: A callback function to provide information during training [def: fitting_info]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nthe label should be an n-records vector.\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load NeuralNetworkRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Nn.NeuralNetworkRegressor\n\njulia> layers = [BetaML.DenseLayer(12,20,f=BetaML.relu),BetaML.DenseLayer(20,20,f=BetaML.relu),BetaML.DenseLayer(20,1,f=BetaML.relu)];\n\njulia> model = modelType(layers=layers,opt_alg=BetaML.ADAM())\nNeuralNetworkRegressor(\n layers = BetaML.Nn.AbstractLayer[BetaML.Nn.DenseLayer([-0.23249759178069676 -0.4125090172711131 … 0.41401934928739 -0.33017881111237535; -0.27912169279319965 0.270551221249931 … 0.19258414323473344 0.1703002982374256; … ; 0.31186742456482447 0.14776438287394805 … 0.3624993442655036 0.1438885872964824; 0.24363744610286758 -0.3221033024934767 … 0.14886090419299408 0.038411663101909355], [-0.42360286004241765, -0.34355377040029594, 0.11510963232946697, 0.29078650404397893, -0.04940236502546075, 0.05142849152316714, 
-0.177685375947775, 0.3857630523957018, -0.25454667127064756, -0.1726731848206195, 0.29832456225553444, -0.21138505291162835, -0.15763643112604903, -0.08477044513587562, -0.38436681165349196, 0.20538016429104916, -0.25008157754468335, 0.268681800562054, 0.10600581996650865, 0.4262194464325672], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.08534180387478185 0.19659398307677617 … -0.3413633217504578 -0.0484925247381256; 0.0024419192794883915 -0.14614102508129 … -0.21912059923003044 0.2680725396694708; … ; 0.25151545823147886 -0.27532269951606037 … 0.20739970895058063 0.2891938885916349; -0.1699020711688904 -0.1350423717084296 … 0.16947589410758873 0.3629006047373296], [0.2158116357688406, -0.3255582642532289, -0.057314442103850394, 0.29029696770539953, 0.24994080694366455, 0.3624239027782297, -0.30674318230919984, -0.3854738338935017, 0.10809721838554087, 0.16073511121016176, -0.005923262068960489, 0.3157147976348795, -0.10938918304264739, -0.24521229198853187, -0.307167732178712, 0.0808907777008302, -0.014577497150872254, -0.0011287181458157214, 0.07522282588658086, 0.043366500526073104], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.021367697115938555 -0.28326652172347155 … 0.05346175368370165 -0.26037328415871647], [-0.2313659199724562], BetaML.Utils.relu, BetaML.Utils.drelu)], \n loss = BetaML.Utils.squared_cost, \n dloss = BetaML.Utils.dsquared_cost, \n epochs = 100, \n batch_size = 32, \n opt_alg = BetaML.Nn.ADAM(BetaML.Nn.var\"#90#93\"(), 1.0, 0.9, 0.999, 1.0e-8, BetaML.Nn.Learnable[], BetaML.Nn.Learnable[]), \n shuffle = true, \n descr = \"\", \n cb = BetaML.Nn.fitting_info, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> ŷ = predict(mach, X);\n\njulia> hcat(y,ŷ)\n506×2 Matrix{Float64}:\n 24.0 30.7726\n 21.6 28.0811\n 34.7 31.3194\n ⋮ \n 23.9 30.9032\n 22.0 29.49\n 11.9 
27.2438\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.PegasosClassifier","page":"MLJ interface","title":"BetaML.Bmlj.PegasosClassifier","text":"mutable struct PegasosClassifier <: MLJModelInterface.Probabilistic\n\nThe gradient-based linear \"pegasos\" classifier using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\ninitial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]\ninitial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]\nlearning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]\nlearning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]\nepochs::Int64: Maximum number of epochs, i.e. passages through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load PegasosClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Perceptron.PegasosClassifier\n\njulia> model = modelType()\nPegasosClassifier(\n initial_coefficients = nothing, \n initial_constant = nothing, \n learning_rate = BetaML.Perceptron.var\"#71#73\"(), \n learning_rate_multiplicative = 0.5, \n epochs = 1000, \n shuffle = true, \n force_origin = false, \n return_mean_hyperplane = false, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> est_classes = predict(mach, 
X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>0.817, versicolor=>0.153, virginica=>0.0301)\n UnivariateFinite{Multiclass{3}}(setosa=>0.791, versicolor=>0.177, virginica=>0.0318)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.5, virginica=>0.246)\n UnivariateFinite{Multiclass{3}}(setosa=>0.283, versicolor=>0.51, virginica=>0.207)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.PerceptronClassifier","page":"MLJ interface","title":"BetaML.Bmlj.PerceptronClassifier","text":"mutable struct PerceptronClassifier <: MLJModelInterface.Probabilistic\n\nThe classical perceptron algorithm using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\ninitial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]\ninitial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]\nepochs::Int64: Maximum number of epochs, i.e. passages through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load PerceptronClassifier pkg = \"BetaML\"\n[ Info: For silent loading, specify `verbosity=0`. 
\nimport BetaML ✔\nBetaML.Perceptron.PerceptronClassifier\n\njulia> model = modelType()\nPerceptronClassifier(\n initial_coefficients = nothing, \n initial_constant = nothing, \n epochs = 1000, \n shuffle = true, \n force_origin = false, \n return_mean_hyperplane = false, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(PerceptronClassifier(initial_coefficients = nothing, …), …).\n*** Avg. error after epoch 2 : 0.0 (all elements of the set has been correctly classified)\n\njulia> est_classes = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>2.53e-34, virginica=>0.0)\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>1.27e-18, virginica=>1.86e-310)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>2.77e-57, versicolor=>1.1099999999999999e-82, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>3.09e-22, versicolor=>4.03e-25, virginica=>1.0)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.RandomForestClassifier","page":"MLJ interface","title":"BetaML.Bmlj.RandomForestClassifier","text":"mutable struct RandomForestClassifier <: MLJModelInterface.Probabilistic\n\nA simple Random Forest model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for a partition [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. 
square root of the data dimensions]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: gini]. Either gini, entropy or a custom function. It can also be an anonymous function.\nβ::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction based on the error of the individual trees computed on the records on which trees have not been trained. Higher values favour \"better\" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load RandomForestClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Trees.RandomForestClassifier\n\njulia> model = modelType()\nRandomForestClassifier(\n n_trees = 30, \n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.gini, \n β = 0.0, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(RandomForestClassifier(n_trees = 30, …), …).\n\njulia> cat_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt32, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0667, 
virginica=>0.933)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.RandomForestImputer","page":"MLJ interface","title":"BetaML.Bmlj.RandomForestImputer","text":"mutable struct RandomForestImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using Random Forests, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for a partition [def: 2]\nmax_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. square root of the data dimension]\nforced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]\nsplitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.\nrecursive_passages::Int64: Define the times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. 
The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;\n\njulia> modelType = @load RandomForestImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.RandomForestImputer\n\njulia> model = modelType(n_trees=40)\nRandomForestImputer(\n n_trees = 40, \n max_depth = nothing, \n min_gain = 0.0, \n min_records = 2, \n max_features = nothing, \n forced_categorical_cols = Int64[], \n splitting_criterion = nothing, \n recursive_passages = 1, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(RandomForestImputer(n_trees = 40, …), …).\n\njulia> X_full = transform(mach) |> MLJ.matrix\n9×2 Matrix{Float64}:\n 1.0 10.5\n 1.5 10.3909\n 1.8 8.0\n 1.7 15.0\n 3.2 40.0\n 2.88375 8.66125\n 3.3 38.0\n 3.98125 -2.3\n 5.2 -2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.RandomForestRegressor","page":"MLJ interface","title":"BetaML.Bmlj.RandomForestRegressor","text":"mutable struct RandomForestRegressor <: MLJModelInterface.Deterministic\n\nA simple Random Forest model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for a partition [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimension]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.\nβ::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction based on the error of the individual trees computed on the records on which trees have not been trained. Higher values favour \"better\" trees, but too high values will cause overfitting [def: 0, i.e. 
uniform weights]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load RandomForestRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Trees.RandomForestRegressor\n\njulia> model = modelType()\nRandomForestRegressor(\n n_trees = 30, \n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.variance, \n β = 0.0, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(RandomForestRegressor(n_trees = 30, …), …).\n\njulia> ŷ = predict(mach, X);\n\njulia> hcat(y,ŷ)\n506×2 Matrix{Float64}:\n 24.0 25.8433\n 21.6 22.4317\n 34.7 35.5742\n 33.4 33.9233\n ⋮ \n 23.9 24.42\n 22.0 22.4433\n 11.9 15.5833\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.SimpleImputer","page":"MLJ interface","title":"BetaML.Bmlj.SimpleImputer","text":"mutable struct SimpleImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using feature (column) mean, with optional record normalisation (using l-norm norms), from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nstatistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]\nnorm::Union{Nothing, Int64}: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. 
quantity exports of different countries).\n\nExample:\n\njulia> using MLJ\n\njulia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;\n\njulia> modelType = @load SimpleImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.SimpleImputer\n\njulia> model = modelType(norm=1)\nSimpleImputer(\n statistic = Statistics.mean, \n norm = 1)\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(SimpleImputer(statistic = mean, …), …).\n\njulia> X_full = transform(mach) |> MLJ.matrix\n9×2 Matrix{Float64}:\n 1.0 10.5\n 1.5 0.295466\n 1.8 8.0\n 1.7 15.0\n 3.2 40.0\n 0.280952 1.69524\n 3.3 38.0\n 0.0750839 -2.3\n 5.2 -2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.mljverbosity_to_betaml_verbosity-Tuple{Integer}","page":"MLJ interface","title":"BetaML.Bmlj.mljverbosity_to_betaml_verbosity","text":"mljverbosity_to_betaml_verbosity(i::Integer) -> Verbosity\n\n\nConvert any integer (short scale) to one of the defined betaml verbosity levels. Currently \"steps\" are 0, 1, 2 and 3\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.AutoEncoder, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.fit","text":"fit(\n m::BetaML.Bmlj.AutoEncoder,\n verbosity,\n X\n) -> Tuple{AutoEncoder, Nothing, Nothing}\n\n\nFor the verbosity parameter see Verbosity.\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.MultitargetNeuralNetworkRegressor, Any, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.fit","text":"fit(\n m::BetaML.Bmlj.MultitargetNeuralNetworkRegressor,\n verbosity,\n X,\n y\n) -> Tuple{NeuralNetworkEstimator, Nothing, Nothing}\n\n\nFor the verbosity parameter see Verbosity.\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.NeuralNetworkClassifier, Any, Any, Any}","page":"MLJ 
interface","title":"MLJModelInterface.fit","text":"MMI.fit(model::NeuralNetworkClassifier, verbosity, X, y)\n\nFor the verbosity parameter see Verbosity.\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.NeuralNetworkRegressor, Any, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.fit","text":"fit(\n m::BetaML.Bmlj.NeuralNetworkRegressor,\n verbosity,\n X,\n y\n) -> Tuple{NeuralNetworkEstimator, Nothing, Nothing}\n\n\nFor the verbosity parameter see Verbosity.\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.predict-Tuple{Union{BetaML.Bmlj.KMeansClusterer, BetaML.Bmlj.KMedoidsClusterer}, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.predict","text":"predict(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, predict the class of the observation\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.transform-Tuple{BetaML.Bmlj.GeneralImputer, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.transform","text":"transform(m, fitResults, X)\n\nGiven a trained imputer model, fill the missing data of some new observations. Note that with multiple recursive imputations and inner estimators that don't support missing data, this function works only for the X on which the model has been trained, i.e. 
this function cannot be applied to new matrices with empty values using a model trained on other matrices.\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.transform-Tuple{Union{BetaML.Bmlj.GaussianMixtureImputer, BetaML.Bmlj.RandomForestImputer, BetaML.Bmlj.SimpleImputer}, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.transform","text":"transform(m, fitResults, X) - Given a trained imputer model, fill the missing data of some new observations\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.transform-Tuple{Union{BetaML.Bmlj.KMeansClusterer, BetaML.Bmlj.KMedoidsClusterer}, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.transform","text":"transform(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, return the distances to each centroid \n\n\n\n\n\n","category":"method"},{"location":"Api_v2_developer.html#api_implementation","page":"API implementation","title":"Api v2 - developer documentation (API implementation)","text":"","category":"section"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Each model is a child of either BetaMLSuperVisedModel or BetaMLUnsupervisedModel, both in turn children of BetaMLModel:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"BetaMLSuperVisedModel <: BetaMLModel\nBetaMLUnsupervisedModel <: BetaMLModel\nRandomForestEstimator <: BetaMLSuperVisedModel","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"The model struct is composed of the following elements:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel\n hpar::DecisionTreeE_hp # Hyper-parameters\n opt::BML_options 
# Option sets, 
default or a specific one for the model\n par::DT_lp # Model learnable parameters (needed for predictions)\n cres::T # Cached results\n trained::Bool # Trained flag\n info # Complementary information, but not needed to make predictions\nend","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Each specific model hyperparameter set and learnable parameter set are childs of BetaMLHyperParametersSet and BetaMLLearnedParametersSet and, if a specific model option set is used, this would be child of BetaMLOptionsSet.","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"While hyperparameters are elements that control the learning process, i.e. would influence the model training and prediction, the options have a more general meaning and do not directly affect the training (they can do indirectly, like the rng). The default option set is implemented as:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Base.@kwdef mutable struct BML_options\n \"Cache the results of the fitting stage, as to allow predict(mod) [default: `true`]. 
Set it to `false` to save memory for large data.\"\n cache::Bool = true\n \"An optional title and/or description for this model\"\n descr::String = \"\" \n \"The verbosity level to be used in training or prediction (see [`Verbosity`](@ref)) [default: `STD`]\n \"\n verbosity::Verbosity = STD\n \"Random Number Generator (see [`FIXEDSEED`](@ref)) [default: `Random.GLOBAL_RNG`]\n \"\n rng::AbstractRNG = Random.GLOBAL_RNG\nend","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Note that the user doesn't generally need to distinguish between a hyperparameter and an option, as both are provided as keyword arguments to the model constructor thanks to a model constructor like the following one:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"function KMedoidsClusterer(;kwargs...)\n m = KMedoidsClusterer(KMeansMedoidsHyperParametersSet(),BML_options(),KMeansMedoids_lp(),nothing,false,Dict{Symbol,Any}())\n thisobjfields = fieldnames(nonmissingtype(typeof(m)))\n for (kw,kwv) in kwargs\n found = false\n for f in thisobjfields\n fobj = getproperty(m,f)\n if kw in fieldnames(typeof(fobj))\n setproperty!(fobj,kw,kwv)\n found = true\n end\n end\n found || error(\"Keyword \\\"$kw\\\" is not part of this model.\")\n end\n return m\nend","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"So, in order to implement a new model we need to:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"implement its struct and constructor\nimplement the relative ModelHyperParametersSet, ModelLearnedParametersSet and optionally ModelOptionsSet.\ndefine fit!(model, X, [y]), predict(model,X) and optionally 
inverse_predict(model,X).","category":"page"},{"location":"index.html#![BLogos](assets/BetaML_logo_30x30.png)-BetaML.jl-Documentation","page":"Index","title":"(Image: BLogos) BetaML.jl Documentation","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"Welcome to the documentation of the Beta Machine Learning toolkit.","category":"page"},{"location":"index.html#About","page":"Index","title":"About","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"The BetaML toolkit provides machine learning algorithms written in the Julia programming language.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Aside the algorithms themselves, BetaML provides many \"utility\" functions. Because algorithms are all self-contained in the library itself (you are invited to explore their source code by typing @edit functionOfInterest(par1,par2,...)), the utility functions have APIs that are coordinated with the algorithms, facilitating the \"preparation\" of the data for the analysis, the choice of the hyper-parameters or the evaluation of the models. Most models have an interface for the MLJ framework.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Aside Julia, BetaML can be accessed in R or Python using respectively JuliaCall and PyJulia. See the tutorial for details.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"!!! Warning Version 0.11 brings homogenization in the models' names and put some order on other stuff, but at the cost of severe breaking changes. Follow the updated documentation. 
","category":"page"},{"location":"index.html#Installation","page":"Index","title":"Installation","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"The BetaML package is included in the standard Julia register, install it with:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"] add BetaML","category":"page"},{"location":"index.html#Available-modules","page":"Index","title":"Available modules","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"While BetaML is split in several (sub)modules, all of them are re-exported at the root module level. This means that you can access their functionality by simply typing using BetaML:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"using BetaML\nmyLayer = DenseLayer(2,3) # DenseLayer is defined in the Nn submodule\nres = KernelPerceptronClassifier() # KernelPerceptronClassifier is defined in the Perceptron module\n@edit DenseLayer(2,3) # Open a text editor with to the relevant source code","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Each module is documented on the links below (you can also use the inline Julia help system: just press the question mark ? 
and then, on the special help prompt help?>, type the function name):","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"BetaML.Perceptron: The Perceptron, Kernel Perceptron and Pegasos classification algorithms;\nBetaML.Trees: The Decision Trees and Random Forests algorithms for classification or regression (with missing values supported);\nBetaML.Nn: Implementation of Artificial Neural Networks;\nBetaML.Clustering: (hard) Clustering algorithms (K-Means, K-Medoids);\nBetaML.GMM: Various algorithms (Clustering, regressor, missing imputation / collaborative filtering / recommendation systems) that use a Generative (Gaussian) mixture model (probabilistic) fitter, fitted using an EM algorithm;\nBetaML.Imputation: Imputation algorithms;\nBetaML.Utils: Various utility functions (scale, one-hot, distances, kernels, pca, accuracy/error measures...).","category":"page"},{"location":"index.html#models_list","page":"Index","title":"Available models","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"Currently BetaML provides the following models:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"BetaML name MLJ Interface Category*\nPerceptronClassifier PerceptronClassifier Supervised classifier\nKernelPerceptronClassifier KernelPerceptronClassifier Supervised classifier\nPegasosClassifier PegasosClassifier Supervised classifier\nDecisionTreeEstimator DecisionTreeClassifier, DecisionTreeRegressor Supervised regressor and classifier\nRandomForestEstimator RandomForestClassifier, RandomForestRegressor Supervised regressor and classifier\nNeuralNetworkEstimator NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier Supervised regressor and classifier\nGaussianMixtureRegressor GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor Supervised regressor\nGaussianMixtureRegressor2 Supervised regressor\nKMeansClusterer KMeansClusterer 
Unsupervised hard clusterer\nKMedoidsClusterer KMedoidsClusterer Unsupervised hard clusterer\nGaussianMixtureClusterer GaussianMixtureClusterer Unsupervised soft clusterer\nSimpleImputer SimpleImputer Unsupervised missing data imputer\nGaussianMixtureImputer GaussianMixtureImputer Unsupervised missing data imputer\nRandomForestImputer RandomForestImputer Unsupervised missing data imputer\nGeneralImputer GeneralImputer Unsupervised missing data imputer\nMinMaxScaler Data transformer\nStandardScaler Data transformer\nScaler Data transformer\nPCAEncoder Unsupervised dimensionality reduction\nAutoEncoder AutoEncoder Unsupervised non-linear dimensionality reduction\nOneHotEncoder Data transformer\nOrdinalEncoder Data transformer\nConfusionMatrix Predictions assessment","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"* There is no formal distinction in BetaML between a transformer (or a model that assesses predictions) and an unsupervised model. They are all treated as unsupervised models that, given some data, learn how to return some useful information, whether a class grouping, a specific transformation or a quality evaluation.","category":"page"},{"location":"index.html#Usage","page":"Index","title":"Usage","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"New to BetaML or even to Julia / Machine Learning altogether? Start from the tutorial!","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"All models support the (a) model construction (where hyperparameters and options are chosen), (b) fitting and (c) prediction paradigm. A few models support inverse_predict, for example to go back from the one-hot encoded columns to the original categorical variable (factor). 
","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"This paradigm is described in detail in the API V2 page.","category":"page"},{"location":"index.html#Quick-examples","page":"Index","title":"Quick examples","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"(see the tutorial for a more step-by-step guide to the examples below and to other examples)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Using an Artificial Neural Network for multinomial categorisation","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"In this example we see how to train a neural networks model to predict the specie's name (5th column) given floral sepals and petals measures (first 4 columns) in the famous iris flower dataset.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"# Load Modules\nusing DelimitedFiles, Random\nusing Pipe, Plots, BetaML # Load BetaML and other auxiliary modules\nRandom.seed!(123); # Fix the random seed (to obtain reproducible results).\n\n# Load the data\niris = readdlm(joinpath(dirname(Base.find_package(\"BetaML\")),\"..\",\"test\",\"data\",\"iris.csv\"),',',skipstart=1)\nx = convert(Array{Float64,2}, iris[:,1:4])\ny = convert(Array{String,1}, iris[:,5])\n# Encode the categories (levels) of y using a separate column per each category (aka \"one-hot\" encoding) \nohmod = OneHotEncoder()\ny_oh = fit!(ohmod,y) \n# Split the data in training/testing sets\n((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,0.2])\n(ntrain, ntest) = size.([xtrain,xtest],1)\n\n# Define the Artificial Neural Network model\nl1 = DenseLayer(4,10,f=relu) # The activation function is `ReLU`\nl2 = DenseLayer(10,3) # The activation function is `identity` by default\nl3 = VectorFunctionLayer(3,f=softmax) # Add a (parameterless include(\"Imputation_tests.jl\")) layer whose activation 
function (`softmax` in this case) is defined to all its nodes at once\nmynn = NeuralNetworkEstimator(layers=[l1,l2,l3],loss=crossentropy,descr=\"Multinomial logistic regression Model Sepal\", batch_size=2, epochs=200) # Build the NN and use the cross-entropy as error function.\n# Alternatively, swith to hyperparameters auto-tuning with `autotune=true` instead of specify `batch_size` and `epoch` manually\n\n# Train the model (using the ADAM optimizer by default)\nres = fit!(mynn,fit!(Scaler(),xtrain),ytrain_oh) # Fit the model to the (scaled) data\n\n# Obtain predictions and test them against the ground true observations\nŷtrain = @pipe predict(mynn,fit!(Scaler(),xtrain)) |> inverse_predict(ohmod,_) # Note the scaling and reverse one-hot encoding functions\nŷtest = @pipe predict(mynn,fit!(Scaler(),xtest)) |> inverse_predict(ohmod,_) \ntrain_accuracy = accuracy(ŷtrain,ytrain) # 0.975\ntest_accuracy = accuracy(ŷtest,ytest) # 0.96\n\n# Analyse model performances\ncm = ConfusionMatrix()\nfit!(cm,ytest,ŷtest)\nprint(cm)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"A ConfusionMatrix BetaMLModel (fitted)\n\n-----------------------------------------------------------------\n\n*** CONFUSION MATRIX ***\n\nScores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"virginica\" \"versicolor\" \"setosa\"\n \"virginica\" 8 1 0\n \"versicolor\" 0 14 0\n \"setosa\" 0 0 7\nNormalised scores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"virginica\" \"versicolor\" \"setosa\"\n \"virginica\" 0.888889 0.111111 0.0\n \"versicolor\" 0.0 1.0 0.0\n \"setosa\" 0.0 0.0 1.0\n\n *** CONFUSION REPORT ***\n\n- Accuracy: 0.9666666666666667\n- Misclassification rate: 0.033333333333333326\n- Number of classes: 3\n\n N Class precision recall specificity f1score actual_count predicted_count\n TPR TNR support \n\n 1 virginica 1.000 0.889 1.000 0.941 9 8\n 2 versicolor 0.933 1.000 0.938 0.966 14 15\n 3 setosa 1.000 
1.000 1.000 1.000 7 7\n\n- Simple avg. 0.978 0.963 0.979 0.969\n- Weigthed avg. 0.969 0.967 0.971 0.966","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"ϵ = info(mynn)[\"loss_per_epoch\"]\nplot(1:length(ϵ),ϵ, xlabel=\"epochs\",ylabel=\"error\",legend=nothing,title=\"Avg. error per epoch on the Sepal dataset\")\nheatmap(info(cm)[\"categories\"],info(cm)[\"categories\"],info(cm)[\"normalised_scores\"],c=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix\")","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(Image: results) (Image: results)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Using Random forests for regression","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"In this example we predict, using another classical ML dataset, the miles per gallon of various car models.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Note in particular:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(a) how easy it is in Julia to import remote data, even cleaning them without ever saving a local file on disk;\n(b) how Random Forest models can directly work on data with missing values, categorical data and non-numerical data in general without any preprocessing ","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"# Load modules\nusing Random, HTTP, CSV, DataFrames, BetaML, Plots\nimport Pipe: @pipe\nRandom.seed!(123)\n\n# Load data\nurlData = \"https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data\"\ndata = @pipe HTTP.get(urlData).body |>\n replace!(_, UInt8('\\t') => UInt8(' ')) |>\n CSV.File(_, delim=' ', missingstring=\"?\", ignorerepeated=true, header=false) |>\n DataFrame;\n\n# Preprocess data\nX = Matrix(data[:,2:8]) # cylinders, displacement, horsepower, weight, 
acceleration, model year, origin\ny = data[:,1] # miles per gallon\n(xtrain,xtest),(ytrain,ytest) = partition([X,y],[0.8,0.2])\n\n# Model definition, hyper-parameters auto-tuning, training and prediction\nm = RandomForestEstimator(autotune=true)\nŷtrain = fit!(m,xtrain,ytrain) # shortcut for `fit!(m,xtrain,ytrain); ŷtrain = predict(m,xtrain)`\nŷtest = predict(m,xtest)\n\n# Prediction assessment\nrelative_mean_error_train = relative_mean_error(ytrain,ŷtrain) # 0.039\nrelative_mean_error_test = relative_mean_error(ytest,ŷtest) # 0.076\nscatter(ytest,ŷtest,xlabel=\"Actual\",ylabel=\"Estimated\",label=nothing,title=\"Est vs. obs MPG (test set)\")","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(Image: results)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Further examples","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Finally, you may want to take a look at the \"test\" folder. 
While the primary objective of the scripts under the \"test\" folder is to provide automatic testing of the BetaML toolkit, they can also be used to see how functions should be called, as virtually all functions provided by BetaML are tested there.","category":"page"},{"location":"index.html#Acknowledgements","page":"Index","title":"Acknowledgements","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"The development of this package at the Bureau d'Economie Théorique et Appliquée (BETA, Nancy) was supported by the French National Research Agency through the Laboratory of Excellence ARBRE, a part of the “Investissements d'Avenir” Program (ANR 11 – LABX-0002-01).","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(Image: BLogos)","category":"page"}] +[{"location":"Api.html#api_module","page":"The Api module","title":"The BetaML.Api Module","text":"","category":"section"},{"location":"Api.html","page":"The Api module","title":"The Api module","text":"Api","category":"page"},{"location":"Api.html#BetaML.Api","page":"The Api module","title":"BetaML.Api","text":"Api\n\nThe Api Module (currently v2)\n\nThis module includes the API shared through the various BetaML submodules, i.e. names used by more than one submodule.\n\nModules are free to use other functions, but these are defined here to avoid name conflicts and to allow Multiple Dispatch to handle them instead. 
For an overall description of the BetaML API from a user perspective see the page API V2 → Introduction for users, while for the implementation of the API see the page API V2 → For developers\n\n\n\n\n\n","category":"module"},{"location":"Api.html#Module-Index","page":"The Api module","title":"Module Index","text":"","category":"section"},{"location":"Api.html","page":"The Api module","title":"The Api module","text":"Modules = [Api]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Api.html#Detailed-API","page":"The Api module","title":"Detailed API","text":"","category":"section"},{"location":"Api.html","page":"The Api module","title":"The Api module","text":"Modules = [Api]\nPrivate = false","category":"page"},{"location":"Api.html#BetaML.Api.FIXEDRNG","page":"The Api module","title":"BetaML.Api.FIXEDRNG","text":"Fixed RNG to allow reproducible results\n\nUse it with:\n\nmyAlgorithm(;rng=FIXEDRNG) # always produces the same sequence of results on each run of the script (\"pulling\" from the same rng object on different calls)\nmyAlgorithm(;rng=copy(FIXEDRNG)) # always produces the same result (new rng object on each function call)\n\n\n\n\n\n","category":"constant"},{"location":"Api.html#BetaML.Api.FIXEDSEED","page":"The Api module","title":"BetaML.Api.FIXEDSEED","text":"const FIXEDSEED\n\nFixed seed to allow reproducible results. 
This is the seed used to obtain the same results under unit tests.\n\nUse it with:\n\nmyAlgorithm(;rng=MyChoosenRNG(FIXEDSEED)) # always produces the same sequence of results on each run of the script (\"pulling\" from the same rng object on different calls)\nmyAlgorithm(;rng=copy(MyChoosenRNG(FIXEDSEED))) # always produces the same result (new rng object on each call)\n\n\n\n\n\n","category":"constant"},{"location":"Api.html#BetaML.Api.BML_options","page":"The Api module","title":"BetaML.Api.BML_options","text":"mutable struct BML_options <: BetaMLOptionsSet\n\nA struct defining the options used by default by the algorithms that do not override it with their own option sets.\n\nFields:\n\ncache::Bool: Cache the results of the fitting stage, as to allow predict(mod) [default: true]. Set it to false to save memory for large data.\ndescr::String: An optional title and/or description for this model\nautotune::Bool: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control auto-tuning through the option tunemethod (see the model hyper-parameters)\nverbosity::Verbosity: The verbosity level to be used in training or prediction: NONE, LOW, STD [default], HIGH or FULL\nrng::Random.AbstractRNG: Random Number Generator (see ?FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\neven if a model doesn't override BML_options, it may not use all its options, for example deterministic models would not make use of the rng parameter. 
Passing such parameters in these cases would simply have no influence.\n\nExample:\n\njulia> options = BML_options(cache=false,descr=\"My model\")\n\n\n\n\n\n","category":"type"},{"location":"Api.html#BetaML.Api.Verbosity","page":"The Api module","title":"BetaML.Api.Verbosity","text":"primitive type Verbosity <: Enum{Int32} 32\n\nMany models and functions accept a verbosity parameter.\n\nChoose between: NONE, LOW, STD [default], HIGH and FULL.\n\n\n\n\n\n","category":"type"},{"location":"Api.html#BetaML.Api.fit!-Tuple{BetaMLModel, Vararg{Any, N} where N}","page":"The Api module","title":"BetaML.Api.fit!","text":"fit!(m::BetaMLModel,X,[y])\n\nFit (\"train\") a BetaMLModel (i.e. learn the algorithm's parameters) based on data, either only features or features and labels.\n\nEach specific model implements its own version of fit!(m,X,[Y]), but the usage is consistent across models.\n\nNotes:\n\nFor online algorithms, i.e. models that support updating of the learned parameters with new data, fit! can be repeated as new data arrive, although not all algorithms guarantee that training one record at a time is equivalent to training on all the records at once.\nIf the model has been trained with the cache option set to true (the default), fit! returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function.\nIn Python and other languages that don't allow the exclamation mark within the function name, use fit_ex(⋅) instead of fit!(⋅)\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.hyperparameters-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.hyperparameters","text":"hyperparameters(m::BetaMLModel)\n\nReturns the hyperparameters of a BetaML model. 
See also ?options for the parameters that do not directly affect learning.\n\nwarning: Warning\nThe returned object is a reference, so if it is modified, the relative object in the model will change too.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.info-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.info","text":"info(m::BetaMLModel) -> Any\n\n\nReturn a string-keyed dictionary of \"additional\" information stored during model fitting.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.inverse_predict-Tuple{BetaMLModel, Any}","page":"The Api module","title":"BetaML.Api.inverse_predict","text":"inverse_predict(m::BetaMLModel,X)\n\nGiven a model m that, fitted on x, produces xnew, it takes xnew and returns (possibly an approximation of) x.\n\nFor example, when OneHotEncoder is fitted with a subset of the possible categories and the handle_unknown option is set on infrequent, inverse_predict will aggregate all the other categories as specified in other_categories_name.\n\nNotes:\n\nImplemented only in a few models.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.model_load","page":"The Api module","title":"BetaML.Api.model_load","text":"model_load(filename::AbstractString)\nmodel_load(filename::AbstractString,args::AbstractString...)\n\nLoad from file one or more BetaML models (whether fitted or not).\n\nNotes:\n\nIf no model names to retrieve are specified it returns a dictionary keyed with the model names\nIf multiple models are requested, a tuple is returned\nFor further options see the documentation of the function load of the JLD2 package\n\nExamples:\n\njulia> models = model_load(\"fittedModels.jl\")\njulia> mod1 = model_load(\"fittedModels.jl\",\"mod1\")\njulia> (mod1,mod2) = model_load(\"fittedModels.jl\",\"mod1\", \"mod2\")\n\n\n\n\n\n","category":"function"},{"location":"Api.html#BetaML.Api.model_save","page":"The Api 
module","title":"BetaML.Api.model_save","text":"model_save(filename::AbstractString,overwrite_file::Bool=false;kwargs...)\n\nSave one or more BetaML models (whether fitted or not), optionally specifying a name for each of them.\n\nParameters:\n\nfilename: Name of the destination file\noverwrite_file: Whether to overwrite the file if it already exists or preserve it (for the objects other than the ones that are going to be saved) [def: false, i.e. preserve the file]\nkwargs: model objects to be saved, optionally associated with a different name to save them with (e.g. mod1Name=mod1,mod2) \n\nNotes:\n\nIf an object with the given name already exists in the destination JLD2 file it will be overwritten.\nIf the file exists, but it is not a JLD2 file and the option overwrite_file is set to false, an error will be raised.\nUse the semicolon ; to separate the filename from the model(s) to save\nFor further options see the documentation of the JLD2 package\n\nExamples\n\njulia> model_save(\"fittedModels.jl\"; mod1Name=mod1,mod2)\n\n\n\n\n\n","category":"function"},{"location":"Api.html#BetaML.Api.options-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.options","text":"options(m::BetaMLModel)\n\nReturns the non-learning related options of a BetaML model. 
See also ?hyperparameters for the parameters that directly affect learning.\n\nwarning: Warning\nThe returned object is a reference, so if it is modified, the relative object in the model will change too.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.parameters-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.parameters","text":"parameters(m::BetaMLModel)\n\nReturns the learned parameters of a BetaML model.\n\nwarning: Warning\nThe returned object is a reference, so if it is modified, the relative object in the model will change too.\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.predict-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.predict","text":"predict(m::BetaMLModel,[X])\n\nPredict new information (including transformation) based on a fitted BetaMLModel, eventually applied to new features when the algorithm generalises to new data.\n\nNotes:\n\nAs a convenience, if the model has been trained while having the cache option set on true (by default) the predictions associated with the last training of the model is retained in the model object and can be retrieved simply with predict(m).\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.reset!-Tuple{BetaMLModel}","page":"The Api module","title":"BetaML.Api.reset!","text":"reset!(m::BetaMLModel)\n\nReset the parameters of a trained model.\n\nNotes:\n\nIn Python and other languages that don't allow the exclamation mark within the function name, use reset_ex(⋅) instead of reset!(⋅)\n\n\n\n\n\n","category":"method"},{"location":"Api.html#BetaML.Api.sethp!-Tuple{BetaMLModel, Dict}","page":"The Api module","title":"BetaML.Api.sethp!","text":"sethp!(m::BetaMLModel, hp::Dict)\n\n\nSet the hyperparameters of model m as specified in the hp dictionary.\n\n\n\n\n\n","category":"method"},{"location":"Imputation.html#imputation_module","page":"Imputation","title":"The BetaML.Imputation 
Module","text":"","category":"section"},{"location":"Imputation.html","page":"Imputation","title":"Imputation","text":"Imputation","category":"page"},{"location":"Imputation.html#BetaML.Imputation","page":"Imputation","title":"BetaML.Imputation","text":"Imputation module\n\nProvide various imputation methods for missing data. Note that the interpretation of \"missing\" can be very wide. For example, recommendation systems / collaborative filtering (e.g. suggestion of the film to watch) can well be represented as a missing-data imputation problem, often with better results than traditional algorithms such as k-nearest neighbors (KNN).\n\nProvided imputers:\n\nSimpleImputer: Impute data using the feature (column) mean, optionally normalised by l-norms of the records (rows) (fastest)\nGaussianMixtureImputer: Impute data using a Generative (Gaussian) Mixture Model (good trade off)\nRandomForestImputer: Impute missing data using Random Forests, with optional replicable multiple imputations (most accurate).\nGeneralImputer: Impute missing data using a vector (one per column) of arbitrary learning models (classifiers/regressors) that implement m = Model([options]), fit!(m,X,Y) and predict(m,X) (not necessarily from BetaML).\n\nImputations for all these models can be obtained by running mod = ImputatorModel([options]), fit!(mod,X). The data with the missing values imputed can then be obtained with predict(mod). Use info(m::Imputer) to retrieve further information concerning the imputation. Trained models can also be used to impute missing values in new data with predict(mod,xNew). 
Note that if multiple imputations are run (for the supporting imputators) predict() will return a vector of predictions rather than a single one`.\n\nExample\n\njulia> using Statistics, BetaML\n\njulia> X = [2 missing 10; 2000 4000 1000; 2000 4000 10000; 3 5 12 ; 4 8 20; 1 2 5]\n6×3 Matrix{Union{Missing, Int64}}:\n 2 missing 10\n 2000 4000 1000\n 2000 4000 10000\n 3 5 12\n 4 8 20\n 1 2 5\n\njulia> mod = RandomForestImputer(multiple_imputations=10, rng=copy(FIXEDRNG));\n\njulia> fit!(mod,X);\n\njulia> vals = predict(mod)\n10-element Vector{Matrix{Union{Missing, Int64}}}:\n [2 3 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 136 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 137 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 4 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 137 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n [2 137 10; 2000 4000 1000; … ; 4 8 20; 1 2 5]\n\njulia> nR,nC = size(vals[1])\n(6, 3)\n\njulia> medianValues = [median([v[r,c] for v in vals]) for r in 1:nR, c in 1:nC]\n6×3 Matrix{Float64}:\n 2.0 4.0 10.0\n 2000.0 4000.0 1000.0\n 2000.0 4000.0 10000.0\n 3.0 5.0 12.0\n 4.0 8.0 20.0\n 1.0 2.0 5.0\n\njulia> infos = info(mod);\n\njulia> infos[\"n_imputed_values\"]\n1\n\n\n\n\n\n","category":"module"},{"location":"Imputation.html#Module-Index","page":"Imputation","title":"Module Index","text":"","category":"section"},{"location":"Imputation.html","page":"Imputation","title":"Imputation","text":"Modules = [Imputation]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Imputation.html#Detailed-API","page":"Imputation","title":"Detailed API","text":"","category":"section"},{"location":"Imputation.html","page":"Imputation","title":"Imputation","text":"Modules = [Imputation]\nPrivate = 
false","category":"page"},{"location":"Imputation.html#BetaML.Imputation.GaussianMixtureImputer","page":"Imputation","title":"BetaML.Imputation.GaussianMixtureImputer","text":"mutable struct GaussianMixtureImputer <: Imputer\n\nMissing data imputer that uses a Generative (Gaussian) Mixture Model.\n\nFor the parameters (n_classes,mixtures,..) see GaussianMixture_hp.\n\nLimitations:\n\ndata must be numerical\nthe resulted matrix is a Matrix{Float64}\ncurrently the Mixtures available do not support random initialisation for missing imputation, and the rest of the algorithm (Expectation-Maximisation) is deterministic, so there is no random component involved (i.e. no multiple imputations)\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1 2.5; missing 20.5; 0.8 18; 12 22.8; 0.4 missing; 1.6 3.7];\n\njulia> mod = GaussianMixtureImputer(mixtures=[SphericalGaussian() for i in 1:2])\nGaussianMixtureImputer - A Gaussian Mixture Model based imputer (unfitted)\n\njulia> X_full = fit!(mod,X)\nIter. 1: Var. 
of the post 2.373498171519511 Log-likelihood -29.111866299189792\n6×2 Matrix{Float64}:\n 1.0 2.5\n 6.14905 20.5\n 0.8 18.0\n 12.0 22.8\n 0.4 4.61314\n 1.6 3.7\n\njulia> info(mod)\nDict{String, Any} with 7 entries:\n \"xndims\" => 2\n \"error\" => [2.3735, 0.17527, 0.0283747, 0.0053147, 0.000981885]\n \"AIC\" => 57.798\n \"fitted_records\" => 6\n \"lL\" => -21.899\n \"n_imputed_values\" => 2\n \"BIC\" => 56.3403\n\njulia> parameters(mod)\nBetaML.Imputation.GaussianMixtureImputer_lp (a BetaMLLearnableParametersSet struct)\n- mixtures: AbstractMixture[SphericalGaussian{Float64}([1.0179819950570768, 3.0999990977255845], 0.2865287884295908), SphericalGaussian{Float64}([6.149053737674149, 20.43331198167713], 15.18664378248651)]\n- initial_probmixtures: [0.48544987084082347, 0.5145501291591764]\n- probRecords: [0.9999996039918224 3.9600817749531375e-7; 2.3866922376272767e-229 1.0; … ; 0.9127030246369684 0.08729697536303167; 0.9999965964161501 3.403583849794472e-6]\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.GeneralI_hp","page":"Imputation","title":"BetaML.Imputation.GeneralI_hp","text":"mutable struct GeneralI_hp <: BetaMLHyperParametersSet\n\nHyperparameters for GeneralImputer\n\nParameters:\n\ncols_to_impute: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of columns IDs (positions), or the keywords \"auto\" (default) or \"all\". With \"auto\" the model automatically detects the columns with missing data and impute only them. You may manually specify the columns or use \"all\" if you want to create a imputation model for that columns during training even if all training data are non-missing to apply then the training model to further data with possibly missing values.\nestimator: An entimator model (regressor or classifier), with eventually its options (hyper-parameters), to be used to impute the various columns of the matrix. 
It can also be a cols_to_impute-length vector of different estimators to consider a different estimator for each column (dimension) to impute, for example when some columns are categorical (and will hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. use BetaML random forests, handling classification and regression jobs automatically].\nmissing_supported: Wheter the estimator(s) used to predict the missing data support itself missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows of those where imputation is needed are dropped and then only non-missing rows in the other remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator or a single booleann value to apply to all the estimators [default: false]\nfit_function: The function used by the estimator(s) to fit the model. It should take as fist argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]\npredict_function: The function used by the estimator(s) to predict the labels. It should take as fist argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]\nrecursive_passages: Define the number of times to go trough the various columns to impute their data. Useful when there are data to impute on multiple columns. 
The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nmultiple_imputations: Determine the number of independent imputation of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.GeneralImputer","page":"Imputation","title":"BetaML.Imputation.GeneralImputer","text":"mutable struct GeneralImputer <: Imputer\n\nImpute missing values using arbitrary learning models.\n\nImpute missing values using any arbitrary learning model (classifier or regressor, not necessarily from BetaML) that implement an interface m = Model([options]), train!(m,X,Y) and predict(m,X). For non-BetaML supervised models the actual training and predict functions must be specified in the fit_function and predict_function parameters respectively. If needed (for example when some columns with missing data are categorical and some numerical) different models can be specified for each column. Multiple imputations and multiple \"passages\" trought the various colums for a single imputation are supported. 
\n\nSee GeneralI_hp for all the hyper-parameters.\n\nExamples:\n\nUsing BetaML models:\n\njulia> using BetaML\njulia> X = [1.4 2.5 \"a\"; missing 20.5 \"b\"; 0.6 18 missing; 0.7 22.8 \"b\"; 0.4 missing \"b\"; 1.6 3.7 \"a\"]\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n missing 20.5 \"b\"\n 0.6 18 missing\n 0.7 22.8 \"b\"\n 0.4 missing \"b\"\n 1.6 3.7 \"a\"\n\n julia> mod = GeneralImputer(recursive_passages=2,multiple_imputations=2)\n GeneralImputer - A imputer based on an arbitrary regressor/classifier(unfitted)\n\n julia> mX_full = fit!(mod,X);\n ** Processing imputation 1\n ** Processing imputation 2\n\n julia> mX_full[1]\n 6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.546722 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 19.8061 \"b\"\n 1.6 3.7 \"a\"\n\n julia> mX_full[2]\n 6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.554167 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 20.7551 \"b\"\n 1.6 3.7 \"a\"\n \n julia> info(mod)\n Dict{String, Any} with 1 entry:\n \"n_imputed_values\" => 3\n \n\nUsing third party packages (in this example DecisionTree):\n\njulia> using BetaML\njulia> import DecisionTree\njulia> X = [1.4 2.5 \"a\"; missing 20.5 \"b\"; 0.6 18 missing; 0.7 22.8 \"b\"; 0.4 missing \"b\"; 1.6 3.7 \"a\"]\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n missing 20.5 \"b\"\n 0.6 18 missing\n 0.7 22.8 \"b\"\n 0.4 missing \"b\"\n 1.6 3.7 \"a\"\njulia> mod = GeneralImputer(estimator=[DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeClassifier()], fit_function = DecisionTree.fit!, predict_function=DecisionTree.predict, recursive_passages=2)\nGeneralImputer - A imputer based on an arbitrary regressor/classifier(unfitted)\njulia> X_full = fit!(mod,X)\n** Processing imputation 1\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.94 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 13.5 \"b\"\n 1.6 3.7 
\"a\"\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.RandomForestI_hp","page":"Imputation","title":"BetaML.Imputation.RandomForestI_hp","text":"mutable struct RandomForestI_hp <: BetaMLHyperParametersSet\n\nHyperparameters for RandomForestImputer\n\nParameters:\n\nrfhpar::Any: For the underlying random forest algorithm parameters (n_trees,max_depth,min_gain,min_records,max_features:,splitting_criterion,β,initialisation_strategy, oob and rng) see RandomForestE_hp for the specific RF algorithm parameters\nforced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]\nrecursive_passages::Int64: Define the times to go trough the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nmultiple_imputations::Int64: Determine the number of independent imputation of the whole dataset to make. Note that while independent, the imputations share the same random number generator (RNG).\ncols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of columns IDs (positions), or the keywords \"auto\" (default) or \"all\". With \"auto\" the model automatically detects the columns with missing data and impute only them. 
You may manually specify the columns or use \"auto\" if you want to create a imputation model for that columns during training even if all training data are non-missing to apply then the training model to further data with possibly missing values.\n\nExample:\n\njulia>mod = RandomForestImputer(n_trees=20,max_depth=10,recursive_passages=3)\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.RandomForestImputer","page":"Imputation","title":"BetaML.Imputation.RandomForestImputer","text":"mutable struct RandomForestImputer <: Imputer\n\nImpute missing data using Random Forests, with optional replicable multiple imputations. \n\nSee RandomForestI_hp, RandomForestE_hp and BML_options for the parameters.\n\nNotes:\n\nGiven a certain RNG and its status (e.g. RandomForestImputer(...,rng=StableRNG(FIXEDSEED))), the algorithm is completely deterministic, i.e. replicable. \nThe algorithm accepts virtually any kind of data, sortable or not\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.4 2.5 \"a\"; missing 20.5 \"b\"; 0.6 18 missing; 0.7 22.8 \"b\"; 0.4 missing \"b\"; 1.6 3.7 \"a\"]\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n missing 20.5 \"b\"\n 0.6 18 missing\n 0.7 22.8 \"b\"\n 0.4 missing \"b\"\n 1.6 3.7 \"a\"\n\njulia> mod = RandomForestImputer(n_trees=20,max_depth=10,recursive_passages=2)\nRandomForestImputer - A Random-Forests based imputer (unfitted)\n\njulia> X_full = fit!(mod,X)\n** Processing imputation 1\n6×3 Matrix{Any}:\n 1.4 2.5 \"a\"\n 0.504167 20.5 \"b\"\n 0.6 18 \"b\"\n 0.7 22.8 \"b\"\n 0.4 20.0837 \"b\"\n 1.6 3.7 \"a\"\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.SimpleI_hp","page":"Imputation","title":"BetaML.Imputation.SimpleI_hp","text":"mutable struct SimpleI_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the SimpleImputer model\n\nParameters:\n\nstatistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]\nnorm::Union{Nothing, Int64}: Normalise 
the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneus (e.g. quantity exports of different countries).\n\n\n\n\n\n","category":"type"},{"location":"Imputation.html#BetaML.Imputation.SimpleImputer","page":"Imputation","title":"BetaML.Imputation.SimpleImputer","text":"mutable struct SimpleImputer <: Imputer\n\nSimple imputer using the missing data's feature (column) statistic (def: mean), optionally normalised by l-norms of the records (rows)\n\nParameters:\n\nstatistics: The descriptive statistic of the column (feature) to use as imputed value [def: mean]\nnorm: Normalise the feature mean by l-norm norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneus (e.g. quantity exports of different countries). \n\nLimitations:\n\ndata must be numerical\n\nExample:\n\njulia> using BetaML\n\njulia> X = [2.0 missing 10; 20 40 100]\n2×3 Matrix{Union{Missing, Float64}}:\n 2.0 missing 10.0\n 20.0 40.0 100.0\n\njulia> mod = SimpleImputer(norm=1)\nSimpleImputer - A simple feature-stat based imputer (unfitted)\n\njulia> X_full = fit!(mod,X)\n2×3 Matrix{Float64}:\n 2.0 4.04494 10.0\n 20.0 40.0 100.0\n\njulia> info(mod)\nDict{String, Any} with 1 entry:\n \"n_imputed_values\" => 1\n\njulia> parameters(mod)\nBetaML.Imputation.SimpleImputer_lp (a BetaMLLearnableParametersSet struct)\n- cStats: [11.0, 40.0, 55.0]\n- norms: [6.0, 53.333333333333336]\n\n\n\n\n\n","category":"type"},{"location":"Benchmarks.html#BetaML-Benchmarks","page":"Benchmarks","title":"BetaML Benchmarks","text":"","category":"section"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"This benchmark allows to quickly check for regressions across versions. 
As it is run and compiled using GitHub actions, and these may be powered by different computational resources, timing results are normalized using SystemBenchmark.","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"This page also provides a basic comparison with other leading Julia libraries for the same algorithm, USING DEFAULT VALUES. This file is intended just for benchmarking, not much as a tutorial, and it doesn't employ a full ML workflow, just the minimum preprocessing such that the algorithms work.","category":"page"},{"location":"Benchmarks.html#Benchmark-setup","page":"Benchmarks","title":"Benchmark setup","text":"","category":"section"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"using Pkg\nPkg.activate(joinpath(@__DIR__,\"..\"))\nusing Test, Statistics, Random, DelimitedFiles\nusing DataStructures, DataFrames, BenchmarkTools, StableRNGs, SystemBenchmark\nimport DecisionTree, Flux\nusing BetaML\n\nTESTRNG = StableRNG(123)","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"Threads.nthreads()","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"println(\"*** Computing System benchmarking for normalization of the outputs..\")\nres = runbenchmark()\ncomp = comparetoref(res)\ntests = [\"FloatMul\", \"FusedMulAdd\", \"FloatSin\", \"VecMulBroad\", \"CPUMatMul\", \n \"MatMulBroad\", \"3DMulBroad\", \"FFMPEGH264Write\"] \navg_factor_to_ref = mean(comp[in.(comp.testname, Ref(tests)), \"factor\"])","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"avg_factor_to_ref ","category":"page"},{"location":"Benchmarks.html#Regression","page":"Benchmarks","title":"Regression","text":"","category":"section"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"A simple regression over 500 points with y = 
x₁²-x₂+x₃²","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"println(\"*** Benchmarking regression task..\")\n\ndf_regr = DataFrame(name= String[],time=Float64[],memory=Int64[],allocs=Int64[],mre_train=Float64[],std_train=Float64[],mre_test=Float64[],std_test=Float64[])\nn = 500\nseeds = rand(copy(TESTRNG),n)\nx = vcat([[s*2 (s-3)^2 s/2 0.2-s] for s in seeds]...)\ny = [r[1]*2-r[2]+r[3]^2 for r in eachrow(x)]\n\nbml_models = OrderedDict(\"DT\"=>DecisionTreeEstimator(rng=copy(TESTRNG),verbosity=NONE),\n \"RF\"=>RandomForestEstimator(rng=copy(TESTRNG),verbosity=NONE),\n \"NN\"=>NeuralNetworkEstimator(rng=copy(TESTRNG),verbosity=NONE),\n);\n\nfor (mname,m) in bml_models\n #mname = \"DT\"\n #m = NeuralNetworkEstimator(rng=copy(TESTRNG),verbosity=NONE)\n # speed measure \n println(\"Processing model $mname ... \")\n bres = @benchmark fit!(m2,$x,$y) setup=(m2 = deepcopy($m))\n m_time = median(bres.times)\n m_memory = bres.memory\n m_allocs = bres.allocs\n sampler = KFold(nsplits=10,rng=copy(TESTRNG));\n cv_out = cross_validation([x,y],sampler,return_statistics=false) do trainData,valData,rng\n (xtrain,ytrain) = trainData; (xval,yval) = valData\n m2 = deepcopy(m)\n fit!(m2,xtrain,ytrain)\n ŷtrain = predict(m2,xtrain)\n ŷval = predict(m2,xval)\n rme_train = relative_mean_error(ytrain,ŷtrain)\n rme_val = relative_mean_error(yval,ŷval)\n return (rme_train, rme_val)\n end\n\n mre_train = mean([r[1] for r in cv_out])\n std_train = std([r[1] for r in cv_out])\n mre_test = mean([r[2] for r in cv_out])\n std_test = std([r[2] for r in cv_out])\n push!(df_regr,[mname, m_time, m_memory, m_allocs, mre_train, std_train, mre_test, std_test])\n @test mre_test <= 0.05\nend\n\n### DecisionTree\nRandom.seed!(123)\ndt_models = OrderedDict(\"DT (DecisionTrees.jl)\"=>DecisionTree.DecisionTreeRegressor(),\n \"RF (DecisionTrees.jl)\"=>DecisionTree.RandomForestRegressor(),\n);\n\nfor (mname,m) in dt_models\n #mname = \"DT\"\n #m = 
NeuralNetworkEstimator(rng=copy(TESTRNG),verbosity=NONE)\n # speed measure \n bres = @benchmark DecisionTree.fit!(m2,$x,$y) setup=(m2 = deepcopy($m))\n m_time = median(bres.times)\n m_memory = bres.memory\n m_allocs = bres.allocs\n sampler = KFold(nsplits=10,rng=copy(TESTRNG));\n cv_out = cross_validation([x,y],sampler,return_statistics=false) do trainData,valData,rng\n (xtrain,ytrain) = trainData; (xval,yval) = valData\n m2 = deepcopy(m)\n DecisionTree.fit!(m2,xtrain,ytrain)\n ŷtrain = DecisionTree.predict(m2,xtrain)\n ŷval = DecisionTree.predict(m2,xval)\n rme_train = relative_mean_error(ytrain,ŷtrain)\n rme_val = relative_mean_error(yval,ŷval)\n return (rme_train, rme_val)\n end\n\n mre_train = mean([r[1] for r in cv_out])\n std_train = std([r[1] for r in cv_out])\n mre_test = mean([r[2] for r in cv_out])\n std_test = std([r[2] for r in cv_out])\n push!(df_regr,[mname, m_time, m_memory, m_allocs, mre_train, std_train, mre_test, std_test])\n @test mre_test <= 0.05\nend\n\n### Flux\nRandom.seed!(123)\nl1 = Flux.Dense(4,8,Flux.relu)\nl2 = Flux.Dense(8,8,Flux.relu)\nl3 = Flux.Dense(8,1,Flux.identity)\nFlux_nn = Flux.Chain(l1,l2,l3)\nfluxloss(x, y) = Flux.mse(Flux_nn(x), y)\nps = Flux.params(Flux_nn)\nnndata = Flux.Data.DataLoader((Float32.(x)', Float32.(y)'), batchsize=16,shuffle=true)\n\nbres = @benchmark [Flux.train!(fluxloss, ps2, $nndata, Flux.ADAM()) for i in 1:200] setup=(ps2 = deepcopy($ps))\nm_time = median(bres.times)\nm_memory = bres.memory\nm_allocs = bres.allocs\n\nsampler = KFold(nsplits=10,rng=copy(TESTRNG));\ncv_out = cross_validation([x,y],sampler,return_statistics=false) do trainData,valData,rng\n (xtrain,ytrain) = trainData; (xval,yval) = valData\n m2 = deepcopy(Flux_nn)\n ps2 = Flux.params(m2)\n fluxloss2(x, y) = Flux.mse(m2(x), y)\n nndata = Flux.Data.DataLoader((Float32.(xtrain)', Float32.(ytrain)'), batchsize=16,shuffle=true)\n [Flux.train!(fluxloss2, ps2, nndata, Flux.ADAM()) for i in 1:200] \n ŷtrain = m2(xtrain')'\n ŷval = m2(xval')'\n 
rme_train = relative_mean_error(ytrain,ŷtrain)\n rme_val = relative_mean_error(yval,ŷval)\n return (rme_train, rme_val)\nend\nmre_train = mean([r[1] for r in cv_out])\nstd_train = std([r[1] for r in cv_out])\nmre_test = mean([r[2] for r in cv_out])\nstd_test = std([r[2] for r in cv_out])\npush!(df_regr,[\"NN (Flux.jl)\", m_time, m_memory, m_allocs, mre_train, std_train, mre_test, std_train])\n@test mre_test <= 0.05\n\ndf_regr.time .= df_regr.time ./ avg_factor_to_ref","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"df_regr","category":"page"},{"location":"Benchmarks.html#Classification","page":"Benchmarks","title":"Classification","text":"","category":"section"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"A dicotomic diagnostic breast cancer classification based on the Wisconsin Breast Cancer Database.","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"println(\"*** Benchmarking classification task..\")\n\nbcancer_file = joinpath(@__DIR__,\"..\",\"..\",\"test\",\"data\",\"breast_wisconsin\",\"wdbc.data\")\nbcancer = readdlm(bcancer_file,',')\nx = fit!(Scaler(),convert(Matrix{Float64},bcancer[:,3:end]))\ny = convert(Vector{String},bcancer[:,2])\nohm = OneHotEncoder()\nyoh = fit!(ohm,y)\ndf_class = DataFrame(name= String[],time=Float64[],memory=Int64[],allocs=Int64[],acc_train=Float64[],std_train=Float64[],acc_test=Float64[],std_test=Float64[])\n\nbml_models = OrderedDict(\"DT\"=>DecisionTreeEstimator(rng=copy(TESTRNG),verbosity=NONE),\n \"RF\"=>RandomForestEstimator(rng=copy(TESTRNG),verbosity=NONE),\n \"NN\"=>NeuralNetworkEstimator(rng=copy(TESTRNG),verbosity=NONE),\n \"Perc\"=>PerceptronClassifier(rng=copy(TESTRNG),verbosity=NONE),\n \"KPerc\"=>KernelPerceptronClassifier(rng=copy(TESTRNG),verbosity=NONE),\n \"Peg\"=>PegasosClassifier(rng=copy(TESTRNG),verbosity=NONE),\n);\n\nfor (mname,m) in bml_models\n #mname = \"NN\"\n #m = 
NeuralNetworkEstimator(rng=copy(TESTRNG),verbosity=NONE)\n # speed measure \n println(\"Processing model $mname ... \")\n if mname == \"NN\"\n bres = @benchmark fit!(m2,$x,$yoh) setup=(m2 = deepcopy($m))\n else\n bres = @benchmark fit!(m2,$x,$y) setup=(m2 = deepcopy($m))\n end\n m_time = median(bres.times)\n m_memory = bres.memory\n m_allocs = bres.allocs\n sampler = KFold(nsplits=10,rng=copy(TESTRNG));\n cv_out = cross_validation([x,y,yoh],sampler,return_statistics=false) do trainData,valData,rng\n (xtrain,ytrain,yohtrain) = trainData; (xval,yval,yohval) = valData\n m2 = deepcopy(m)\n if mname == \"NN\"\n fit!(m2,xtrain,yohtrain)\n else\n fit!(m2,xtrain,ytrain)\n end\n ŷtrain = predict(m2,xtrain)\n ŷval = predict(m2,xval)\n if mname == \"NN\"\n acc_train = accuracy(BetaML.mode(yohtrain),BetaML.mode(ŷtrain))\n acc_val = accuracy(BetaML.mode(yohval),BetaML.mode(ŷval))\n else\n acc_train = accuracy(ytrain,ŷtrain)\n acc_val = accuracy(yval,ŷval)\n end\n return (acc_train, acc_val)\n end\n\n acc_train = mean([r[1] for r in cv_out])\n std_train = std([r[1] for r in cv_out])\n acc_test = mean([r[2] for r in cv_out])\n std_test = std([r[2] for r in cv_out])\n push!(df_class,[mname, m_time, m_memory, m_allocs, acc_train, std_train, acc_test, std_test])\n @test acc_test >= 0.6\nend\n\n\nRandom.seed!(123)\ndt_models = OrderedDict(\"DT (DT.jl)\"=>DecisionTree.DecisionTreeClassifier(),\n \"RF (DT.jl)\"=>DecisionTree.RandomForestClassifier(),\n);\n\n\nfor (mname,m) in dt_models\n #mname = \"DT\"\n #m = NeuralNetworkEstimator(rng=copy(TESTRNG),verbosity=NONE)\n # speed measure \n bres = @benchmark DecisionTree.fit!(m2,$x,$y) setup=(m2 = deepcopy($m))\n m_time = median(bres.times)\n m_memory = bres.memory\n m_allocs = bres.allocs\n sampler = KFold(nsplits=10,rng=copy(TESTRNG));\n cv_out = cross_validation([x,y],sampler,return_statistics=false) do trainData,valData,rng\n (xtrain,ytrain) = trainData; (xval,yval) = valData\n m2 = deepcopy(m)\n DecisionTree.fit!(m2,xtrain,ytrain)\n 
ŷtrain = DecisionTree.predict(m2,xtrain)\n ŷval = DecisionTree.predict(m2,xval)\n acc_train = accuracy(ytrain,ŷtrain)\n acc_val = accuracy(yval,ŷval)\n return (acc_train, acc_val)\n end\n\n acc_train = mean([r[1] for r in cv_out])\n std_train = std([r[1] for r in cv_out])\n acc_test = mean([r[2] for r in cv_out])\n std_test = std([r[2] for r in cv_out])\n push!(df_class,[mname, m_time, m_memory, m_allocs, acc_train, std_train, acc_test, std_test])\n @test acc_test >= 0.8\nend\n\n### Flux\nRandom.seed!(123)\nohm = OneHotEncoder()\nyoh = fit!(ohm,y)\nl1 = Flux.Dense(30,45,Flux.relu)\nl2 = Flux.Dense(45,45,Flux.relu)\nl3 = Flux.Dense(45,2,Flux.identity)\nFlux_nn = Flux.Chain(l1,l2,l3)\nfluxloss(lx, ly) = Flux.logitcrossentropy(Flux_nn(lx), ly)\nps = Flux.params(Flux_nn)\nnndata = Flux.Data.DataLoader((Float32.(x)', Float32.(yoh)'), batchsize=15,shuffle=true)\nbres = @benchmark [Flux.train!(fluxloss, ps2, $nndata, Flux.ADAM()) for i in 1:200] setup=(ps2 = deepcopy($ps))\nm_time = median(bres.times)\nm_memory = bres.memory\nm_allocs = bres.allocs\n\nsampler = KFold(nsplits=10,rng=copy(TESTRNG));\ncv_out = cross_validation([x,y,yoh],sampler,return_statistics=false) do trainData,valData,rng\n (xtrain,ytrain,yohtrain) = trainData; (xval,yval,yohval) = valData\n m2 = deepcopy(Flux_nn)\n ps2 = Flux.params(m2)\n fluxloss2(lx, ly) = Flux.logitcrossentropy(m2(lx), ly)\n nndata = Flux.Data.DataLoader((Float32.(xtrain)', Float32.(yohtrain)'), batchsize=16,shuffle=true)\n [Flux.train!(fluxloss2, ps2, nndata, Flux.ADAM()) for i in 1:200] \n ŷtrain = inverse_predict(ohm,fit!(OneHotEncoder(),mode(m2(xtrain')')))\n ŷval = inverse_predict(ohm,fit!(OneHotEncoder(),mode(m2(xval')')))\n acc_train = accuracy(ytrain,ŷtrain)\n acc_val = accuracy(yval,ŷval)\n return (acc_train, acc_val)\nend\nacc_train = mean([r[1] for r in cv_out])\nstd_train = std([r[1] for r in cv_out])\nacc_test = mean([r[2] for r in cv_out])\nstd_test = std([r[2] for r in cv_out])\npush!(df_class,[\"NN (Flux.jl)\", 
m_time, m_memory, m_allocs, acc_train, std_train, acc_test, std_test])\n@test acc_test >= 0.8\n\ndf_class.time .= df_class.time ./ avg_factor_to_ref","category":"page"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"df_class","category":"page"},{"location":"Benchmarks.html#Clustering","page":"Benchmarks","title":"Clustering","text":"","category":"section"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"TODO :-)","category":"page"},{"location":"Benchmarks.html#Missing-imputation","page":"Benchmarks","title":"Missing imputation","text":"","category":"section"},{"location":"Benchmarks.html","page":"Benchmarks","title":"Benchmarks","text":"TODO :-)","category":"page"},{"location":"Examples.html#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"Examples.html#Supervised-learning","page":"Examples","title":"Supervised learning","text":"","category":"section"},{"location":"Examples.html#Regression","page":"Examples","title":"Regression","text":"","category":"section"},{"location":"Examples.html#Estimating-the-bike-sharing-demand","page":"Examples","title":"Estimating the bike sharing demand","text":"","category":"section"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"The task is to estimate the influence of several variables (like the weather, the season, the day of the week..) 
on the demand of shared bicycles, so that the authority in charge of the service can organise the service in the best way.","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Data origin:","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"original full dataset (by hour, not used here): https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset\nsimplified dataset (by day, with some simple scaling): https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/aec\ndescription: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/exam2019ace.pdf\ndata: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/bikesharing_day.csv.zip","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Note that even if we are estimating a time series, we are not using a recurrent neural network here as we assume the temporal dependence to be negligible (i.e. Y_t = f(X_t) alone).","category":"page"},{"location":"Examples.html#Classification","page":"Examples","title":"Classification","text":"","category":"section"},{"location":"Examples.html#Unsupervised-lerarning","page":"Examples","title":"Unsupervised learning","text":"","category":"section"},{"location":"Examples.html#Notebooks","page":"Examples","title":"Notebooks","text":"","category":"section"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"The following notebooks provide runnable examples of the package functionality:","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Pegasos classifiers: [Static notebook] - [myBinder]\nDecision Trees and Random Forest regression on Bike sharing demand forecast (daily data): [Static notebook] - [myBinder]\nNeural Networks: [Static notebook] - [myBinder]\nBike sharing demand forecast (daily data): [Static notebook] - [myBinder]\nClustering: [Static notebook] - 
[myBinder]","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"Note: the live, runnable computational environment is a temporary new copy made at each connection. The first time after a commit is done on this repository a new environment has to be set (instead of just being copied), and the server may take several minutes.","category":"page"},{"location":"Examples.html","page":"Examples","title":"Examples","text":"This is only if you are the unlucky user triggering the rebuild of the environment after the commit.","category":"page"},{"location":"Nn.html#nn_module","page":"Nn","title":"The BetaML.Nn Module","text":"","category":"section"},{"location":"Nn.html","page":"Nn","title":"Nn","text":"Nn","category":"page"},{"location":"Nn.html#BetaML.Nn","page":"Nn","title":"BetaML.Nn","text":"BetaML.Nn module\n\nImplement the functionality required to define an artificial Neural Network, train it with data, forecast data and assess its performances.\n\nCommon type of layers and optimisation algorithms are already provided, but you can define your own ones subclassing respectively the AbstractLayer and OptimisationAlgorithm abstract types.\n\nThe module provide the following types or functions. Use ?[type or function] to access their full signature and detailed documentation:\n\nModel definition:\n\nDenseLayer: Classical feed-forward layer with user-defined activation function\nDenseNoBiasLayer: Classical layer without the bias parameter\nVectorFunctionLayer: Layer whose activation function run over the ensable of its nodes rather than on each one individually. 
No learnable weights on input, optional learnable weights as parameters of the activation function.\nScalarFunctionLayer: Layer whose activation function runs over each node individually, like a classic DenseLayer, but with no learnable weights on input and optional learnable weights as parameters of the activation function.\nReplicatorLayer: Alias for a ScalarFunctionLayer with no learnable parameters and identity as activation function\nReshaperLayer: Reshape the output of a layer (or the input data) to the shape needed for the next one\nPoolingLayer: In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel. Weightless.\nConvLayer: A generic N+1 (channels) dimensional convolutional layer \nGroupedLayer: To stack several layers into a single layer, e.g. for multi-branch networks\nNeuralNetworkEstimator: Build the chained network and define a cost function\n\nEach layer can use a default activation function, one of the functions provided in the Utils module (relu, tanh, softmax,...) or one provided by you. BetaML will try to recognise if it is a \"known\" function for which it sets the exact derivatives, otherwise you can normally provide the layer with it. If the derivative of the activation function is not provided (either manually or automatically), AD will be used and training may be slower, although this difference tends to vanish with bigger datasets.\n\nYou can alternatively implement your own layer defining a new type as subtype of the abstract type AbstractLayer. 
Each user-implemented layer must define the following methods:\n\nA suitable constructor\nforward(layer,x)\nbackward(layer,x,next_gradient)\nget_params(layer)\nget_gradient(layer,x,next_gradient)\nset_params!(layer,w)\nsize(layer)\n\nModel fitting:\n\nfit!(nn,X,Y): fitting function\nfitting_info(nn): Default callback function during fitting\nSGD: The classical optimisation algorithm\nADAM: A faster moment-based optimisation algorithm \n\nTo define your own optimisation algorithm define a subtype of OptimisationAlgorithm and implement the function single_update!(θ,▽;opt_alg) and eventually init_optalg!(⋅) specific for it.\n\nModel predictions and assessment:\n\npredict(nn) or predict(nn,X): Return the output given the data\n\nWhile high-level functions operating on the dataset expect it to be in the standard format (nrecords × ndimensions matrices) it is customary to represent the chain of a neural network as a flow of column vectors, so all low-level operations (operating on a single datapoint) expect both the input and the output as a column vector.\n\n\n\n\n\n","category":"module"},{"location":"Nn.html#Module-Index","page":"Nn","title":"Module Index","text":"","category":"section"},{"location":"Nn.html","page":"Nn","title":"Nn","text":"Modules = [Nn]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Nn.html#Detailed-API","page":"Nn","title":"Detailed API","text":"","category":"section"},{"location":"Nn.html","page":"Nn","title":"Nn","text":"Modules = [Nn]\nPrivate = false","category":"page"},{"location":"Nn.html#BetaML.Nn.ADAM","page":"Nn","title":"BetaML.Nn.ADAM","text":"ADAM(;η, λ, β₁, β₂, ϵ)\n\nThe ADAM algorithm, an adaptive moment estimation optimiser.\n\nFields:\n\nη: Learning rate (stepsize, α in the paper), as a function of the current epoch [def: t -> 0.001 (i.e. 
fixed)]\nλ: Multiplicative constant to the learning rate [def: 1]\nβ₁: Exponential decay rate for the first moment estimate [range: ∈ [0,1], def: 0.9]\nβ₂: Exponential decay rate for the second moment estimate [range: ∈ [0,1], def: 0.999]\nϵ: Epsilon value to avoid division by zero [def: 10^-8]\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ConvLayer","page":"Nn","title":"BetaML.Nn.ConvLayer","text":"struct ConvLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nA generic N+1 (channels) dimensional convolutional layer\n\nEXPERIMENTAL: Still too slow for practical applications\n\nThis convolutional layer has two constructors, one with the form ConvLayer(input_size,kernel_size,nchannels_in,nchannels_out), and an alternative one as ConvLayer(input_size_with_channel,kernel_size,nchannels_out). If the input is a vector, use a ReshaperLayer in front.\n\nFields:\n\ninput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)\noutput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)\nweight::Array{WET, NDPLUS2} where {NDPLUS2, WET<:Number}: Weight tensor (aka \"filter\" or \"kernel\") with respect to the input from previous layer or data (kernel_size array augmented by the nchannels_in and nchannels_out dimensions)\nusebias::Bool: Whether to use (and learn) a bias weight [def: true]\nbias::Vector{WET} where WET<:Number: Bias (nchannels_out array)\npadding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)\npadding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)\nstride::StaticArraysCore.SVector{ND, Int64} where ND: Stride\nndims::Int64: Number of dimensions (excluding input and output channels)\nf::Function: Activation function\ndf::Union{Nothing, Function}: Derivative of the activation function\nx_ids::Array{StaticArraysCore.SVector{NDPLUS1, 
Int64}, 1} where NDPLUS1: x ids of the convolution (computed in preprocessing, itself at the beginning of train)\ny_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: y ids of the convolution (computed in preprocessing, itself at the beginning of train)\nw_ids::Array{StaticArraysCore.SVector{NDPLUS2, Int64}, 1} where NDPLUS2: w ids of the convolution (computed in preprocessing, itself at the beginning of train)\ny_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y\ny_to_w_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS2}}, 1}, NDPLUS1} where {NDPLUS1, NDPLUS2}: A y-dims array of vectors of corresponding w(s) contributing to the given y\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ConvLayer-NTuple{4, Any}","page":"Nn","title":"BetaML.Nn.ConvLayer","text":"ConvLayer(\n input_size,\n kernel_size,\n nchannels_in,\n nchannels_out;\n stride,\n rng,\n padding,\n kernel_eltype,\n kernel_init,\n usebias,\n bias_init,\n f,\n df\n) -> ConvLayer{_A, _B, _C, typeof(identity), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}\n\n\nInstantiate a new nD-dimensional, possibly multichannel ConvolutionalLayer\n\nThe input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding but always has nchannels_out as its last dimension. \n\nPositional arguments:\n\ninput_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.\nkernel_size: Size of the kernel (aka filter or learnable weights) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). 
Do not consider the channels number here.\nnchannels_in: Number of channels in input\nnchannels_out: Number of channels in output\n\nKeyword arguments:\n\nstride: \"Steps\" to move the convolution with across the various tensor dimensions [def: ones]\npadding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep the same dimensions in output (with stride==1)]\nf: Activation function [def: relu]\ndf: Derivative of the activation function [default: try to match a known function, AD otherwise. Use nothing to force AD]\nkernel_eltype: Kernel eltype [def: Float64]\nkernel_init: Initial weights with respect to the input [default: Xavier initialisation]. If explicitly provided, it should be a multidimensional array of kernel_size augmented by nchannels_in and nchannels_out dimensions\nbias_init: Initial weights with respect to the bias [default: Xavier initialisation]. If given it should be a nchannels_out vector of scalars.\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nXavier initialization is sampled from a Uniform distribution between ± sqrt(6/(prod(input_size)*nchannels_in))\nto retrieve the output size of the layer, use size(ConvLayer[2]). 
The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]\nwith strides higher than 1, the automatic padding is set to keep output size = input size / stride\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.ConvLayer-Tuple{Any, Any, Any}","page":"Nn","title":"BetaML.Nn.ConvLayer","text":"ConvLayer(\n input_size_with_channel,\n kernel_size,\n nchannels_out;\n stride,\n rng,\n padding,\n kernel_eltype,\n kernel_init,\n usebias,\n bias_init,\n f,\n df\n) -> ConvLayer{_A, _B, _C, typeof(identity), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}\n\n\nAlternative constructor for a ConvLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that size(previous_layer)[2] can be used directly if desired.\n\nFor arguments and default values see the documentation of the main constructor.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.DenseLayer","page":"Nn","title":"BetaML.Nn.DenseLayer","text":"struct DenseLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a layer in the network\n\nFields:\n\nw: Weights matrix with respect to the input from previous layer or data (n x n pr. 
layer)\nwb: Biases (n)\nf: Activation function\ndf: Derivative of the activation function\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.DenseLayer-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.DenseLayer","text":"DenseLayer(\n nₗ,\n n;\n rng,\n w_eltype,\n w,\n wb,\n f,\n df\n) -> DenseLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}\n\n\nInstantiate a new DenseLayer\n\nPositional arguments:\n\nnₗ: Number of nodes of the previous layer\nn: Number of nodes\n\nKeyword arguments:\n\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (n,nₗ)]\nwb: Initial weights with respect to bias [default: Xavier initialisation, dims = (n)]\nf: Activation function [def: identity]\ndf: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))\nSpecify df=nothing to explicitly use AD\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.DenseNoBiasLayer","page":"Nn","title":"BetaML.Nn.DenseNoBiasLayer","text":"struct DenseNoBiasLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a layer without bias in the network\n\nFields:\n\nw: Weights matrix with respect to the input from previous layer or data (n x n pr. 
layer)\nf: Activation function\ndf: Derivative of the activation function\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.DenseNoBiasLayer-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.DenseNoBiasLayer","text":"DenseNoBiasLayer(\n nₗ,\n n;\n rng,\n w_eltype,\n w,\n f,\n df\n) -> DenseNoBiasLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}\n\n\nInstantiate a new DenseNoBiasLayer\n\nPositional arguments:\n\nnₗ: Number of nodes of the previous layer\nn: Number of nodes\n\nKeyword arguments:\n\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]\nf: Activation function [def: identity]\ndf: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.GroupedLayer","page":"Nn","title":"BetaML.Nn.GroupedLayer","text":"struct GroupedLayer <: AbstractLayer\n\nRepresentation of a \"group\" of layers, each of which operates on different inputs (features) and acting as a single layer in the network.\n\nFields:\n\nlayers: The individual layers that compose this grouped layer\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.GroupedLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.GroupedLayer","text":"GroupedLayer(layers) -> GroupedLayer\n\n\nInstantiate a new GroupedLayer, a layer made up of several other layers stacked together in order to cover all the data dimensions but without connecting all the inputs to all the outputs like a single DenseLayer would do.\n\nPositional arguments:\n\nlayers: The individual layers that compose this grouped layer\n\nNotes:\n\ncan be used to create composable neural networks with multiple branches\ntested only with 1 dimensional 
layers. For convolutional networks use ReshaperLayers before and/or after.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.Learnable","page":"Nn","title":"BetaML.Nn.Learnable","text":"Learnable(data)\n\nStructure representing the learnable parameters of a layer or its gradient.\n\nThe learnable parameters of a layer are given in the form of an N-tuple of Array{Float64,N2} where N2 can change (e.g. we can have a layer with the first parameter being a matrix, and the second one being a scalar). We wrap the tuple in its own structure partly for some efficiency gain, but above all to define standard mathematical operations on the gradients without doing \"type piracy\" with respect to Base tuples.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.NeuralNetworkE_hp","page":"Nn","title":"BetaML.Nn.NeuralNetworkE_hp","text":"mutable struct NeuralNetworkE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the Feedforward neural network model\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.\n\ndloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]\nepochs: Number of epochs, i.e. passes through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! 
call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\nTo know the available layers type subtypes(AbstractLayer) and then type ?LayerName for information on how to use each layer.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.NeuralNetworkE_options","page":"Nn","title":"BetaML.Nn.NeuralNetworkE_options","text":"NeuralNetworkE_options\n\nA struct defining the options used by the Feedforward neural network model\n\nParameters:\n\ncache: Cache the results of the fitting stage, so as to allow predict(mod) [default: true]. Set it to false to save memory for large data.\ndescr: An optional title and/or description for this model\nverbosity: The verbosity level to be used in training or prediction (see Verbosity) [default: STD]\ncb: A callback function to provide information during training [def: fitting_info]\nautotune: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. 
Control autotuning through the option tunemethod (see the model hyper-parameters)\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.NeuralNetworkEstimator","page":"Nn","title":"BetaML.Nn.NeuralNetworkEstimator","text":"NeuralNetworkEstimator\n\nA \"feedforward\" (but also multi-branch) neural network (supervised).\n\nFor the parameters see NeuralNetworkE_hp and for the training options NeuralNetworkE_options (we have a few more options for this specific estimator).\n\nNotes:\n\ndata must be numerical\nthe label can be an n-records vector or an n-records by n-dimensions matrix, but the result is always a matrix.\nFor one-dimension regressions drop the unnecessary dimension with dropdims(ŷ,dims=2)\nFor classification tasks the columns should normally be interpreted as the probabilities for each category\n\nExamples:\n\nClassification...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> ohmod = OneHotEncoder()\nA OneHotEncoder BetaMLModel (unfitted)\n\njulia> y_oh = fit!(ohmod,y)\n6×2 Matrix{Bool}:\n 1 0\n 0 1\n 0 1\n 0 1\n 0 1\n 1 0\n\njulia> layers = [DenseLayer(2,6),DenseLayer(6,2),VectorFunctionLayer(2,f=softmax)];\n\njulia> m = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=300,verbosity=LOW)\nNeuralNetworkEstimator - A Feed-forward neural network (unfitted)\n\njulia> ŷ_prob = fit!(m,X,y_oh)\n***\n*** Training for 300 epochs with algorithm ADAM.\nTraining.. avg ϵ on (Epoch 1 Batch 1): 0.4116936481380642\nTraining of 300 epoch completed. 
Final epoch error: 0.44308719831108734.\n6×2 Matrix{Float64}:\n 0.853198 0.146802\n 0.0513715 0.948629\n 0.0894273 0.910573\n 0.0367079 0.963292\n 0.00548038 0.99452\n 0.808334 0.191666\n\njulia> ŷ = inverse_predict(ohmod,ŷ_prob)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\nRegression...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = 2 .* X[:,1] .- X[:,2] .+ 3;\n\njulia> layers = [DenseLayer(2,6),DenseLayer(6,6),DenseLayer(6,1)];\n\njulia> m = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=3000,verbosity=LOW)\nNeuralNetworkEstimator - A Feed-forward neural network (unfitted)\n\njulia> ŷ = fit!(m,X,y);\n***\n*** Training for 3000 epochs with algorithm ADAM.\nTraining.. avg ϵ on (Epoch 1 Batch 1): 33.30063874270561\nTraining of 3000 epoch completed. Final epoch error: 34.61265465430473.\n\njulia> hcat(y,ŷ)\n6×2 Matrix{Float64}:\n 4.1 4.11015\n -16.5 -16.5329\n -13.8 -13.8381\n -18.4 -18.3876\n -27.2 -27.1667\n 2.7 2.70542\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.PoolingLayer","page":"Nn","title":"BetaML.Nn.PoolingLayer","text":"struct PoolingLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a pooling layer in the network (weightless)\n\nEXPERIMENTAL: Still too slow for practical applications\n\nIn the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel.\n\nFields:\n\ninput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)\noutput_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)\nkernel_size::StaticArraysCore.SVector{NDPLUS2, Int64} where NDPLUS2: kernel_size augmented by the nchannels_in and nchannels_out dimensions\npadding_start::StaticArraysCore.SVector{ND, 
Int64} where ND: Padding (initial)\npadding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)\nstride::StaticArraysCore.SVector{ND, Int64} where ND: Stride\nndims::Int64: Number of dimensions (excluding input and output channels)\nf::Function: Activation function\ndf::Union{Nothing, Function}: Derivative of the activation function\ny_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of x(s) contributing to the given y\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.PoolingLayer-Tuple{Any, Any, Any}","page":"Nn","title":"BetaML.Nn.PoolingLayer","text":"PoolingLayer(\n input_size,\n kernel_size,\n nchannels_in;\n stride,\n kernel_eltype,\n padding,\n f,\n df\n) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}\n\n\nInstantiate a new nD-dimensional, possibly multichannel PoolingLayer\n\nThe input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on the input_size, kernel_size, padding and striding but always has nchannels_out as its last dimension. \n\nPositional arguments:\n\ninput_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.\nkernel_eltype: Kernel eltype [def: Float64]\nkernel_size: Size of the kernel (aka filter) (integer for 1D or hypercube kernels or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.\nnchannels_in: Number of channels in input\nnchannels_out: Number of channels in output\n\nKeyword arguments:\n\nstride: \"Steps\" to move the convolution with across the various tensor dimensions [def: kernel_size, i.e. each X contributes to a single y]\npadding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. 
set the padding required to keep output size = input size / stride]\nf: Activation function. It should have a vector as input and produce a scalar as output [def: maximum]\ndf: Derivative (gradient) of the activation function for the various inputs. [default: nothing (i.e. use AD)]\n\nNotes:\n\nto retrieve the output size of the layer, use size(PoolLayer[2]). The output size on each dimension d (except the last one that is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]\nunlike a ConvLayer, pooling always applies at the single-channel level, so the output always has the same number of channels as the input. If you want to reduce the number of channels, either use a ConvLayer with the desired number of channels in output or use a ReshaperLayer to add a further 1-element dimension that will be treated as \"channel\" and choose the desired stride for the last pooling dimension (the one that was originally the channel dimension) \n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.PoolingLayer-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.PoolingLayer","text":"PoolingLayer(\n input_size_with_channel,\n kernel_size;\n stride,\n padding,\n f,\n kernel_eltype,\n df\n) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}\n\n\nAlternative constructor for a PoolingLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that size(previous_layer)[2] can be used directly if desired.\n\nFor arguments and default values see the documentation of the main constructor.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.ReshaperLayer","page":"Nn","title":"BetaML.Nn.ReshaperLayer","text":"struct ReshaperLayer{NDIN, NDOUT} <: AbstractLayer\n\nRepresentation of a \"reshaper\" (weightless) layer in the network\n\nReshape the output of a 
layer (or the input data) to the shape needed for the next one.\n\nFields:\n\ninput_size::StaticArraysCore.SVector{NDIN, Int64} where NDIN: Input size\noutput_size::StaticArraysCore.SVector{NDOUT, Int64} where NDOUT: Output size\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ReshaperLayer-2","page":"Nn","title":"BetaML.Nn.ReshaperLayer","text":"ReshaperLayer(\n input_size\n) -> ReshaperLayer{_A, _B} where {_A, _B}\nReshaperLayer(\n input_size,\n output_size\n) -> ReshaperLayer{_A, _B} where {_A, _B}\n\n\nInstantiate a new ReshaperLayer\n\nPositional arguments:\n\ninput_size: Shape of the input layer (tuple).\noutput_size: Shape of the output layer (tuple) [def: prod([input_size...]), i.e. reshape to a vector of appropriate length].\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.SGD","page":"Nn","title":"BetaML.Nn.SGD","text":"SGD(;η=t -> 1/(1+t), λ=2)\n\nStochastic Gradient Descent algorithm (default)\n\nFields:\n\nη: Learning rate, as a function of the current epoch [def: t -> 1/(1+t)]\nλ: Multiplicative constant to the learning rate [def: 2]\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ScalarFunctionLayer","page":"Nn","title":"BetaML.Nn.ScalarFunctionLayer","text":"struct ScalarFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a ScalarFunction layer in the network. ScalarFunctionLayer applies the activation function directly to the output of the previous layer (i.e., without passing through a weight matrix), but using an optional learnable parameter (an array) used as second argument, similarly to VectorFunctionLayer. Unlike VectorFunctionLayer, the function is applied scalarwise to each node. 
\n\nThe number of nodes in input must be set to the same as in the previous layer\n\nFields:\n\nw: Weights (parameter) array passed as second argument to the activation function (if not empty)\nn: Number of nodes in output (≡ number of nodes in input )\nf: Activation function (vector)\ndfx: Derivative of the (vector) activation function with respect to the layer inputs (x)\ndfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w) \n\nNotes:\n\nThe output size of this layer is the same as that of the previous layer.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.ScalarFunctionLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.ScalarFunctionLayer","text":"ScalarFunctionLayer(\n nₗ;\n rng,\n wsize,\n w_eltype,\n w,\n f,\n dfx,\n dfw\n) -> ScalarFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}\n\n\nInstantiate a new ScalarFunctionLayer\n\nPositional arguments:\n\nnₗ: Number of nodes (must be same as in the previous layer)\n\nKeyword arguments:\n\nwsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]\nf: Activation function [def: softmax]\ndfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]\ndfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nIf the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. 
the Jacobian)\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.VectorFunctionLayer","page":"Nn","title":"BetaML.Nn.VectorFunctionLayer","text":"struct VectorFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer\n\nRepresentation of a VectorFunction layer in the network. Vector function layer expects a vector activation function, i.e. a function taking the whole output of the previous layer as input rather than working on a single node as \"normal\" activation functions would do. Useful for example with the SoftMax function in classification or with the pool1D function to implement a \"pool\" layer in 1 dimension. By default it is weightless, i.e. it doesn't apply any transformation to the output coming from the previous layer except the activation function. However, by passing the parameter wsize (a tuple or array - tested only 1D) you can pass the learnable parameter to the activation function too. It is your responsibility to be sure the activation function accepts only X or also this learnable array (as second argument). The number of nodes in input must be set to the same as in the previous layer (and if you are using this for classification, to the number of classes, i.e. the previous layer must be set equal to the number of classes in the predictions).\n\nFields:\n\nw: Weights (parameter) array passed as second argument to the activation function (if not empty)\nnₗ: Number of nodes in input (i.e. 
length of previous layer)\nn: Number of nodes in output (automatically inferred in the constructor)\nf: Activation function (vector)\ndfx: Derivative of the (vector) activation function with respect to the layer inputs (x)\ndfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w) \n\nNotes:\n\nThe output size of this layer is given by the size of the output function, which is not necessarily the same as that of the previous layer.\n\n\n\n\n\n","category":"type"},{"location":"Nn.html#BetaML.Nn.VectorFunctionLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.VectorFunctionLayer","text":"VectorFunctionLayer(\n nₗ;\n rng,\n wsize,\n w_eltype,\n w,\n f,\n dfx,\n dfw,\n dummyDataToTestOutputSize\n) -> VectorFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}\n\n\nInstantiate a new VectorFunctionLayer\n\nPositional arguments:\n\nnₗ: Number of nodes (must be same as in the previous layer)\n\nKeyword arguments:\n\nwsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]\nw_eltype: Eltype of the weights [def: Float64]\nw: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]\nf: Activation function [def: softmax]\ndfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]\ndfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]\ndummyDataToTestOutputSize: Dummy data to test the output size [def: ones(nₗ)]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nIf the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. 
the Jacobian)\nTo avoid recomputing the activation function just to determine its output size, we compute the output size once here in the layer constructor by calling the activation function with dummyDataToTestOutputSize. Feel free to change it if it doesn't match with the activation function you are setting\nXavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#Base.size-Tuple{AbstractLayer}","page":"Nn","title":"Base.size","text":"size(layer)\n\nGet the size of the layer in terms of (size in input, size in output) - both as tuples\n\nNotes:\n\nYou need to use import Base.size before defining this function for your layer\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#Base.size-Tuple{ConvLayer}","page":"Nn","title":"Base.size","text":"size(layer::ConvLayer) -> Tuple{Tuple, Tuple}\n\n\nGet the dimensions of the layer in terms of (dimensions in input, dimensions in output) including channels as last dimension\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#Base.size-Union{Tuple{PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}}, Tuple{NDPLUS2}, Tuple{NDPLUS1}, Tuple{ND}} where {ND, NDPLUS1, NDPLUS2}","page":"Nn","title":"Base.size","text":"size(\n layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}\n) -> Tuple{Tuple, Tuple}\n\n\nGet the dimensions of the layer in terms of (dimensions in input, dimensions in output) including channels as last dimension\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.ReplicatorLayer-Tuple{Any}","page":"Nn","title":"BetaML.Nn.ReplicatorLayer","text":"ReplicatorLayer(\n n\n) -> ScalarFunctionLayer{_A, typeof(identity), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}\n\n\nCreate a weightless layer whose output is equal to the input. 
\n\nFields:\n\nn: Number of nodes in output (≡ number of nodes in input ) \n\nNotes:\n\nThe output size of this layer is the same as that of the previous layer.\nThis is just an alias for a ScalarFunctionLayer with no weights and identity function.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.backward-Tuple{AbstractLayer, Any, Any}","page":"Nn","title":"BetaML.Nn.backward","text":"backward(layer,x,next_gradient)\n\nCompute backpropagation for this layer with respect to its inputs\n\nParameters:\n\nlayer: Worker layer\nx: Input to the layer\nnext_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)\n\nReturn:\n\nThe evaluated gradient of the loss with respect to this layer inputs\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.fitting_info-NTuple{5, Any}","page":"Nn","title":"BetaML.Nn.fitting_info","text":"fitting_info(nn,xbatch,ybatch,x,y;n,batch_size,epochs,epochs_ran,verbosity,n_epoch,n_batch)\n\nDefault callback function to display information during training, depending on the verbosity level\n\nParameters:\n\nnn: Worker network\nxbatch: Batch input to the network (batch_size,din)\nybatch: Batch label input (batch_size,dout)\nx: Full input to the network (n_records,din)\ny: Full label input (n_records,dout)\nn: Size of the full training set\nn_batches : Number of batches per epoch\nepochs: Number of epochs defined for the training\nepochs_ran: Number of epochs already run in previous training sessions\nverbosity: Verbosity level defined for the training (NONE,LOW,STD,HIGH,FULL)\nn_epoch: Counter of the current epoch\nn_batch: Counter of the current batch\n\nNotes:\n\nReporting of the error (loss of the network) is expensive. 
Use verbosity=NONE for better performance\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.forward-Tuple{AbstractLayer, Any}","page":"Nn","title":"BetaML.Nn.forward","text":"forward(layer,x)\n\nPredict the output of the layer given the input\n\nParameters:\n\nlayer: Worker layer\nx: Input to the layer\n\nReturn:\n\nAn Array{T,1} of the prediction (even for a scalar)\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.forward-Union{Tuple{WET}, Tuple{TDF}, Tuple{TF}, Tuple{NDPLUS2}, Tuple{NDPLUS1}, Tuple{ND}, Tuple{ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET}, Any}} where {ND, NDPLUS1, NDPLUS2, TF, TDF, WET}","page":"Nn","title":"BetaML.Nn.forward","text":"forward(\n layer::ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},\n x\n) -> Any\n\n\nCompute forward pass of a ConvLayer\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.forward-Union{Tuple{WET}, Tuple{TDF}, Tuple{TF}, Tuple{NDPLUS2}, Tuple{NDPLUS1}, Tuple{ND}, Tuple{PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET}, Any}} where {ND, NDPLUS1, NDPLUS2, TF, TDF, WET}","page":"Nn","title":"BetaML.Nn.forward","text":"forward(\n layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},\n x\n) -> Any\n\n\nCompute forward pass of a PoolingLayer\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_gradient-Tuple{AbstractLayer, Any, Any}","page":"Nn","title":"BetaML.Nn.get_gradient","text":"get_gradient(layer,x,next_gradient)\n\nCompute backpropagation for this layer with respect to the layer weights\n\nParameters:\n\nlayer: Worker layer\nx: Input to the layer\nnext_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)\n\nReturn:\n\nThe evaluated gradient of the loss with respect to this layer's trainable parameters as tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_params() and set_params() functions. 
Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_gradient-Union{Tuple{N2}, Tuple{N1}, Tuple{T2}, Tuple{T}, Tuple{BetaML.Nn.NN, Union{AbstractArray{T, N1}, T}, Union{AbstractArray{T2, N2}, T2}}} where {T<:Number, T2<:Number, N1, N2}","page":"Nn","title":"BetaML.Nn.get_gradient","text":"get_gradient(nn,x,y)\n\nLow-level function that retrieves the current gradient of the weights (i.e. derivative of the cost with respect to the weights). Unexported in BetaML >= v0.9\n\nParameters:\n\nnn: Worker network\nx: Input to the network (d,1)\ny: Label input (d,1)\n\n#Notes:\n\nThe output is a vector of tuples of each layer's input weights and bias weights\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_params-Tuple{AbstractLayer}","page":"Nn","title":"BetaML.Nn.get_params","text":"get_params(layer)\n\nGet the current value of the layer's trainable parameters\n\nParameters:\n\nlayer: Worker layer\n\nReturn:\n\nThe current value of the layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_gradient() and set_params() functions. 
Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.get_params-Tuple{BetaML.Nn.NN}","page":"Nn","title":"BetaML.Nn.get_params","text":"get_params(nn)\n\nRetrieve current weights\n\nParameters:\n\nnn: Worker network\n\nNotes:\n\nThe output is a vector of tuples of each layer's input weights and bias weights\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.init_optalg!-Tuple{ADAM}","page":"Nn","title":"BetaML.Nn.init_optalg!","text":"init_optalg!(opt_alg::ADAM;θ,batch_size,x,y,rng)\n\nInitialize the ADAM algorithm with the parameters m and v as zeros and check parameter bounds\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.init_optalg!-Tuple{BetaML.Nn.OptimisationAlgorithm}","page":"Nn","title":"BetaML.Nn.init_optalg!","text":"initoptalg!(optalg;θ,batch_size,x,y)\n\nInitialize the optimisation algorithm\n\nParameters:\n\nopt_alg: The Optimisation algorithm to use\nθ: Current parameters\nbatch_size: The size of the batch\nx: The training (input) data\ny: The training \"labels\" to match\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nOnly a few optimizers need this function and consequently override it. By default it does nothing, so if you want to write your own optimizer and don't need to initialise it, you don't have to override this method\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.preprocess!-Tuple{AbstractLayer}","page":"Nn","title":"BetaML.Nn.preprocess!","text":"preprocess!(layer::AbstractLayer)\n\n\nPreprocess the layer with information known at layer creation (i.e. no data info used)\n\nThis function is used for some layers to cache some computation that doesn't require the data and it is called at the beginning of fit!. 
For example, it is used in ConvLayer to store the ids of the convolution.\n\nNotes:\n\nas it doesn't depend on data, it is not reset by reset!\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.set_params!-Tuple{AbstractLayer, Any}","page":"Nn","title":"BetaML.Nn.set_params!","text":"set_params!(layer,w)\n\nSet the trainable parameters of the layer with the given values\n\nParameters:\n\nlayer: Worker layer\nw: The new parameters to set (Learnable)\n\nNotes:\n\nThe format of the tuple wrapped by Learnable must be consistent with those of the get_params() and get_gradient() functions.\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.set_params!-Tuple{BetaML.Nn.NN, Any}","page":"Nn","title":"BetaML.Nn.set_params!","text":"set_params!(nn,w)\n\nUpdate weights of the network\n\nParameters:\n\nnn: Worker network\nw: The new weights to set\n\n\n\n\n\n","category":"method"},{"location":"Nn.html#BetaML.Nn.single_update!-Tuple{Any, Any}","page":"Nn","title":"BetaML.Nn.single_update!","text":"singleupdate!(θ,▽;nepoch,nbatch,batchsize,xbatch,ybatch,opt_alg)\n\nPerform the parameter update based on the average batch gradient.\n\nParameters:\n\nθ: Current parameters\n▽: Average gradient of the batch\nn_epoch: Count of current epoch\nn_batch: Count of current batch\nn_batches: Number of batches per epoch\nxbatch: Data associated with the current batch\nybatch: Labels associated with the current batch\nopt_alg: The Optimisation algorithm to use for the update\n\nNotes:\n\nThis function is overridden so that each optimisation algorithm implements its\n\nown version\n\nMost parameters are not used by any optimisation algorithm. 
They are provided\n\nto support the largest possible class of optimisation algorithms\n\nSome optimisation algorithms may change their internal structure in this function\n\n\n\n\n\n","category":"method"},{"location":"Trees.html#trees_module","page":"Trees","title":"The BetaML.Trees Module","text":"","category":"section"},{"location":"Trees.html","page":"Trees","title":"Trees","text":"Trees","category":"page"},{"location":"Trees.html#BetaML.Trees","page":"Trees","title":"BetaML.Trees","text":"BetaML.Trees module\n\nImplement the DecisionTreeEstimator and RandomForestEstimator models (Decision Trees and Random Forests).\n\nBoth Decision Trees and Random Forests can be used for regression or classification problems, based on the type of the labels (numerical or not). The automatic selection can be overridden with the parameter force_classification=true, typically if labels are integers representing some categories rather than numbers. For classification problems the output of predict is a dictionary with the key being the labels with non-zero probability and the corresponding value its probability; for regression it is a numerical value.\n\nPlease be aware that, unlike most other implementations, the Random Forest algorithm collects and averages the probabilities from the trees, rather than just reporting the mode, i.e. no information is lost and the output of the forest classifier is still a PMF.\n\nTo retrieve the prediction with the highest probability use mode over the prediction returned by the model. Most error/accuracy measures in the Utils BetaML module work directly with this format.\n\nMissing data and truly unordered types are supported on the features, both on training and on prediction.\n\nThe module provides the following functions. 
Use ?[type or function] to access their full signature and detailed documentation:\n\nFeatures are expected to be in the standard format (nRecords × nDimensions matrices) and the labels (either categorical or numerical) as a nRecords column vector.\n\nAcknowlegdments: originally based on the Josh Gordon's code\n\n\n\n\n\n","category":"module"},{"location":"Trees.html#Module-Index","page":"Trees","title":"Module Index","text":"","category":"section"},{"location":"Trees.html","page":"Trees","title":"Trees","text":"Modules = [Trees]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Trees.html#Detailed-API","page":"Trees","title":"Detailed API","text":"","category":"section"},{"location":"Trees.html","page":"Trees","title":"Trees","text":"Modules = [Trees]\nPrivate = false","category":"page"},{"location":"Trees.html#BetaML.Trees.DecisionNode","page":"Trees","title":"BetaML.Trees.DecisionNode","text":"DecisionNode(question,trueBranch,falseBranch, depth)\n\nA tree's non-terminal node.\n\nConstructor's arguments and struct members:\n\nquestion: The question asked in this node\ntrueBranch: A reference to the \"true\" branch of the trees\nfalseBranch: A reference to the \"false\" branch of the trees\ndepth: The nodes's depth in the tree\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.DecisionTreeE_hp","page":"Trees","title":"BetaML.Trees.DecisionTreeE_hp","text":"mutable struct DecisionTreeE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for DecisionTreeEstimator (Decision Tree).\n\nParameters:\n\nmax_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must holds to consider for a partition of it [def: 2]\nmax_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. look at all features]\nforce_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]\nsplitting_criterion::Union{Nothing, Function}: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference betwwen the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels(regression task)]. Either gini, entropy, variance or a custom function. It can also be an anonymous function.\nfast_algorithm::Bool: Use an experimental faster algoritm for looking up the best split in ordered fields (colums). Currently it brings down the fitting time of an order of magnitude, but predictions are sensibly affected. If used, control the meaning of integer fields with integer_encoded_cols.\ninteger_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of columns positions to specify which integer columns should be treated as encoding of categorical variables insteads of ordered classes/values. [def: nothing, integer columns with less than 20 unique values are considered categorical]. Useful in conjunction with fast_algorithm, little difference otherwise.\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. 
To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.DecisionTreeEstimator","page":"Trees","title":"BetaML.Trees.DecisionTreeEstimator","text":"mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel\n\nA Decision Tree classifier and regressor (supervised).\n\nDecision Tree works by finding the \"best\" question to split the fitting data (according to the metric specified by the parameter splitting_criterion on the associated labels) untill either all the dataset is separated or a terminal condition is reached. \n\nFor the parameters see ?DecisionTreeE_hp and ?BML_options.\n\nNotes:\n\nOnline fitting (re-fitting with new data) is not supported\nMissing data (in the feature dataset) is supported.\n\nExamples:\n\nClassification...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = DecisionTreeEstimator(max_depth=5)\nDecisionTreeEstimator - A Decision Tree model (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\njulia> println(mod)\nDecisionTreeEstimator - A Decision Tree classifier (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 0, \"fitted_records\" => 6, \"max_reached_depth\" => 2, \"avg_depth\" => 2.0, \"xndims\" => 2)\n*** Printing Decision Tree: ***\n\n1. 
Is col 2 >= 18.0 ?\n--> True : Dict(\"b\" => 1.0)\n--> False: Dict(\"a\" => 1.0)\n\nRegression...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = 2 .* X[:,1] .- X[:,2] .+ 3;\n\njulia> mod = DecisionTreeEstimator(max_depth=10)\nDecisionTreeEstimator - A Decision Tree model (unfitted)\n\njulia> ŷ = fit!(mod,X,y);\n\njulia> hcat(y,ŷ)\n6×2 Matrix{Float64}:\n 4.1 3.4\n -16.5 -17.45\n -13.8 -13.8\n -18.4 -17.45\n -27.2 -27.2\n 2.7 3.4\n\njulia> println(mod)\nDecisionTreeEstimator - A Decision Tree regressor (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 1, \"fitted_records\" => 6, \"max_reached_depth\" => 4, \"avg_depth\" => 3.25, \"xndims\" => 2)\n*** Printing Decision Tree: ***\n\n1. Is col 2 >= 18.0 ?\n--> True :\n 1.2. Is col 2 >= 31.0 ?\n --> True : -27.2\n --> False:\n 1.2.3. Is col 2 >= 20.5 ?\n --> True : -17.450000000000003\n --> False: -13.8\n--> False: 3.3999999999999995\n\nVisualisation...\n\nYou can either text-print or plot a decision tree using the AbstractTree and TreeRecipe package..\n\njulia> println(mod)\nDecisionTreeEstimator - A Decision Tree regressor (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 1, \"fitted_records\" => 6, \"max_reached_depth\" => 4, \"avg_depth\" => 3.25, \"xndims\" => 2)\n*** Printing Decision Tree: ***\n\n1. Is col 2 >= 18.0 ?\n--> True :\n 1.2. Is col 2 >= 31.0 ?\n --> True : -27.2\n --> False:\n 1.2.3. 
Is col 2 >= 20.5 ?\n --> True : -17.450000000000003\n --> False: -13.8\n--> False: 3.3999999999999995\n\njulia> using Plots, TreeRecipe, AbstractTrees\njulia> featurenames = [\"Something\", \"Som else\"];\njulia> wrapped_tree = wrapdn(dtree, featurenames = featurenames); # featurenames is otional\njulia> print_tree(wrapped_tree)\nSom else >= 18.0?\n├─ Som else >= 31.0?\n│ ├─ -27.2\n│ │ \n│ └─ Som else >= 20.5?\n│ ├─ -17.450000000000003\n│ │ \n│ └─ -13.8\n│ \n└─ 3.3999999999999995\njulia> plot(wrapped_tree) \n\n(Image: DT plot) \n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.InfoNode","page":"Trees","title":"BetaML.Trees.InfoNode","text":"These types are introduced so that additional information currently not present in a DecisionTree-structure – namely the feature names – can be used for visualization.\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.Leaf","page":"Trees","title":"BetaML.Trees.Leaf","text":"Leaf(y,depth)\n\nA tree's leaf (terminal) node.\n\nConstructor's arguments:\n\ny: The labels assorciated to each record (either numerical or categorical)\ndepth: The nodes's depth in the tree\n\nStruct members:\n\npredictions: Either the relative label's count (i.e. a PMF) or the mean\ndepth: The nodes's depth in the tree\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.RandomForestE_hp","page":"Trees","title":"BetaML.Trees.RandomForestE_hp","text":"mutable struct RandomForestE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for RandomForestEstimator (Random Forest).\n\nParameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must holds to consider for a partition of it [def: 2]\nmax_features::Union{Nothing, Int64}: The maximum number of (random) features to consider when choosing the optimal partition of the dataset [def: nothing, i.e. square root of the dimensions of the training data`]\nforce_classification::Bool: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]\nsplitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference betwwen the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels(regression task)]. It can be an anonymous function.\nfast_algorithm::Bool: Use an experimental faster algoritm for looking up the best split in ordered fields (colums). Currently it brings down the fitting time of an order of magnitude, but predictions are sensibly affected. If used, control the meaning of integer fields with integer_encoded_cols.\ninteger_encoded_cols::Union{Nothing, Vector{Int64}}: A vector of columns positions to specify which integer columns should be treated as encoding of categorical variables insteads of ordered classes/values. [def: nothing, integer columns with less than 20 unique values are considered categorical]. 
Useful in conjunction with fast_algorithm, little difference otherwise.\nbeta::Float64: Parameter that regulate the weights of the scoring of each tree, to be (optionally) used in prediction based on the error of the individual trees computed on the records on which trees have not been trained. Higher values favour \"better\" trees, but too high values will cause overfitting [def: 0, i.e. uniform weigths]\noob::Bool: Wheter to compute the Out-Of-Bag error, an estimation of the validation error (the mismatching error for classification and the relative mean error for regression jobs).\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Trees.html#BetaML.Trees.RandomForestEstimator","page":"Trees","title":"BetaML.Trees.RandomForestEstimator","text":"mutable struct RandomForestEstimator <: BetaMLSupervisedModel\n\nA Random Forest classifier and regressor (supervised).\n\nRandom forests are ensemble of Decision Trees models (see ?DecisionTreeEstimator).\n\nFor the parameters see ?RandomForestE_hp and ?BML_options.\n\nNotes :\n\nEach individual decision tree is built using bootstrap over the data, i.e. \"sampling N records with replacement\" (hence, some records appear multiple times and some records do not appear in the specific tree training). 
The maxx_feature injects further variability and reduces the correlation between the forest trees.\nThe predictions of the \"forest\" (using the function predict()) are then the aggregated predictions of the individual trees (from which the name \"bagging\": boostrap aggregating).\nThe performances of each individual trees, as measured using the records they have not being trained with, can then be (optionally) used as weights in the predict function. The parameter beta ≥ 0 regulate the distribution of these weights: larger is β, the greater the importance (hence the weights) attached to the best-performing trees compared to the low-performing ones. Using these weights can significantly improve the forest performances (especially using small forests), however the correct value of beta depends on the problem under exam (and the chosen caratteristics of the random forest estimator) and should be cross-validated to avoid over-fitting.\nNote that training RandomForestEstimator uses multiple threads if these are available. You can check the number of threads available with Threads.nthreads(). 
To set the number of threads in Julia either set the environmental variable JULIA_NUM_THREADS (before starting Julia) or start Julia with the command line option --threads (most integrated development editors for Julia already set the number of threads to 4).\nOnline fitting (re-fitting with new data) is not supported\nMissing data (in the feature dataset) is supported.\n\nExamples:\n\nClassification...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = RandomForestEstimator(n_trees=5)\nRandomForestEstimator - A 5 trees Random Forest model (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\njulia> println(mod)\nRandomForestEstimator - A 5 trees Random Forest classifier (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 0, \"avg_avg_depth\" => 1.8, \"fitted_records\" => 6, \"avg_mmax_reached_depth\" => 1.8, \"oob_errors\" => Inf, \"xndims\" => 2)\n\nRegression...\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = 2 .* X[:,1] .- X[:,2] .+ 3;\n\njulia> mod = RandomForestEstimator(n_trees=5)\nRandomForestEstimator - A 5 trees Random Forest model (unfitted)\n\njulia> ŷ = fit!(mod,X,y);\n\njulia> hcat(y,ŷ)\n6×2 Matrix{Float64}:\n 4.1 2.98\n -16.5 -18.37\n -13.8 -14.61\n -18.4 -17.37\n -27.2 -20.78\n 2.7 2.98\n\njulia> println(mod)\nRandomForestEstimator - A 5 trees Random Forest regressor (fitted on 6 records)\nDict{String, Any}(\"job_is_regression\" => 1, \"fitted_records\" => 6, \"avg_avg_depth\" => 2.8833333333333333, \"oob_errors\" => Inf, \"avg_max_reached_depth\" => 3.4, \"xndims\" => 2)\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#utils_module","page":"Utils","title":"The BetaML.Utils 
Module","text":"","category":"section"},{"location":"Utils.html","page":"Utils","title":"Utils","text":"Utils\n","category":"page"},{"location":"Utils.html#BetaML.Utils","page":"Utils","title":"BetaML.Utils","text":"Utils module\n\nProvide shared utility functions and/or models for various machine learning algorithms.\n\nFor the complete list of functions provided see below. The main ones are:\n\nHelper functions for logging\n\nMost BetaML functions accept a parameter verbosity (choose between NONE, LOW, STD, HIGH or FULL)\nWriting complex code and need to find where something is executed? Use the macro @codelocation\n\nStochasticity management\n\nUtils provides [FIXEDSEED], [FIXEDRNG] and generate_parallel_rngs. All stochastic functions and models accept an rng parameter. See the \"Getting started\" section in the tutorial for details.\n\nData processing\n\nVarious small and large utilities for helping to process the data, especially before running an ML algorithm\nIncludes getpermutations, OneHotEncoder, OrdinalEncoder, partition, Scaler, PCAEncoder, AutoEncoder, cross_validation.\nAuto-tuning of hyperparameters is implemented in the supported models by specifying autotune=true and optionally overriding the tunemethod parameters (e.g. for different hyperparameter ranges or different resources available for the tuning). Autotuning is then performed in the (first) fit! call. Provided autotuning methods: SuccessiveHalvingSearch (default), GridSearch\n\nSamplers\n\nUtilities to sample from data (e.g. 
for neural network training or for cross-validation)\nIncludes the \"generic\" type SamplerWithData, together with the sampler implementation KFold and the function batch\n\nTransformers\n\nFunctions that \"transform\" a single input (that can also be a vector or a matrix)\nIncludes various NN \"activation\" functions (relu, celu, sigmoid, softmax, pool1d) and their derivatives (d[FunctionName]), but also gini, entropy, variance, BIC, AIC\n\nMeasures\n\nSeveral functions of a pair of parameters (often y and ŷ) to measure the goodness of ŷ, the distance between the two elements of the pair, ...\nIncludes \"classical\" distance functions (l1_distance, l2_distance, l2squared_distance, cosine_distance), \"cost\" functions for continuous variables (squared_cost, relative_mean_error) and comparison functions for multi-class variables (crossentropy, accuracy, ConfusionMatrix, silhouette)\nDistances can be used to compute a pairwise distance matrix using the function pairwise\n\n\n\n\n\n","category":"module"},{"location":"Utils.html#Module-Index","page":"Utils","title":"Module Index","text":"","category":"section"},{"location":"Utils.html","page":"Utils","title":"Utils","text":"Modules = [Utils]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Utils.html#Detailed-API","page":"Utils","title":"Detailed API","text":"","category":"section"},{"location":"Utils.html","page":"Utils","title":"Utils","text":"Modules = [Utils]\nPrivate = false","category":"page"},{"location":"Utils.html#BetaML.Utils.AutoE_hp","page":"Utils","title":"BetaML.Utils.AutoE_hp","text":"mutable struct AutoE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the AutoEncoder transformer\n\nParameters\n\nencoded_size: The desired size of the encoded data, that is the number of dimensions in output or the size of the latent space. This is the number of neurons of the layer sitting between the encoding and decoding layers. 
If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: 0.33]\nlayers_size: Inner layers dimension (i.e. number of neurons). If the value is a float it is considered a percentage (to be rounded) of the dimensionality of the data [def: nothing, i.e. a specific heuristic is applied]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.\ne_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]\nd_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]\nloss: Loss (cost) function [def: squared_cost] It must always assume y and ŷ as (n x d) matrices, possibly using dropdims inside.\n\ndloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 8]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! 
call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.AutoEncoder","page":"Utils","title":"BetaML.Utils.AutoEncoder","text":"mutable struct AutoEncoder <: BetaMLUnsupervisedModel\n\nPerform a (possibly-non linear) transformation (\"encoding\") of the data into a different space, e.g. for dimensionality reduction using neural network trained to replicate the input data.\n\nA neural network is trained to first transform the data (ofter \"compress\") to a subspace (the output of an inner layer) and then retransform (subsequent layers) to the original data.\n\npredict(mod::AutoEncoder,x) returns the encoded data, inverse_predict(mod::AutoEncoder,xtransformed) performs the decoding.\n\nFor the parameters see AutoE_hp and BML_options \n\nNotes:\n\nAutoEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it. \nMissing data are not supported. Impute them first, see the Imputation module.\nDecoding layers can be optinally choosen (parameter d_layers) in order to suit the kind of data, e.g. a relu activation function for nonegative data\n\nExample:\n\njulia> using BetaML\n\njulia> x = [0.12 0.31 0.29 3.21 0.21;\n 0.22 0.61 0.58 6.43 0.42;\n 0.51 1.47 1.46 16.12 0.99;\n 0.35 0.93 0.91 10.04 0.71;\n 0.44 1.21 1.18 13.54 0.85];\n\njulia> m = AutoEncoder(encoded_size=1,epochs=400)\nA AutoEncoder BetaMLModel (unfitted)\n\njulia> x_reduced = fit!(m,x)\n***\n*** Training for 400 epochs with algorithm ADAM.\nTraining.. avg loss on epoch 1 (1): 60.27802763757111\nTraining.. avg loss on epoch 200 (200): 0.08970099870421573\nTraining.. avg loss on epoch 400 (400): 0.013138484118673664\nTraining of 400 epoch completed. 
Final epoch error: 0.013138484118673664.\n5×1 Matrix{Float64}:\n -3.5483740608901186\n -6.90396890458868\n -17.06296512222304\n -10.688936344498398\n -14.35734756603212\n\njulia> x̂ = inverse_predict(m,x_reduced)\n5×5 Matrix{Float64}:\n 0.0982406 0.110294 0.264047 3.35501 0.327228\n 0.205628 0.470884 0.558655 6.51042 0.487416\n 0.529785 1.56431 1.45762 16.067 0.971123\n 0.3264 0.878264 0.893584 10.0709 0.667632\n 0.443453 1.2731 1.2182 13.5218 0.842298\n\njulia> info(m)[\"rme\"]\n0.020858783340281222\n\njulia> hcat(x,x̂)\n5×10 Matrix{Float64}:\n 0.12 0.31 0.29 3.21 0.21 0.0982406 0.110294 0.264047 3.35501 0.327228\n 0.22 0.61 0.58 6.43 0.42 0.205628 0.470884 0.558655 6.51042 0.487416\n 0.51 1.47 1.46 16.12 0.99 0.529785 1.56431 1.45762 16.067 0.971123\n 0.35 0.93 0.91 10.04 0.71 0.3264 0.878264 0.893584 10.0709 0.667632\n 0.44 1.21 1.18 13.54 0.85 0.443453 1.2731 1.2182 13.5218 0.842298\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.ConfusionMatrix","page":"Utils","title":"BetaML.Utils.ConfusionMatrix","text":"mutable struct ConfusionMatrix <: BetaMLUnsupervisedModel\n\nCompute a confusion matrix detailing the mismatch between observations and predictions of a categorical variable\n\nFor the parameters see ConfusionMatrix_hp and BML_options.\n\nThe \"predicted\" values are either the scores or the normalised scores (depending on the parameter normalise_scores [def: true]).\n\nNotes:\n\nThe Confusion matrix report can be printed (i.e. print(cm_model). 
If you plan to print the Confusion Matrix report, be sure that the type of the data in y and ŷ can be converted to String.\nInformation in a structured way is available trought the info(cm) function that returns the following dictionary:\naccuracy: Oveall accuracy rate\nmisclassification: Overall misclassification rate\nactual_count: Array of counts per lebel in the actual data\npredicted_count: Array of counts per label in the predicted data\nscores: Matrix actual (rows) vs predicted (columns)\nnormalised_scores: Normalised scores\ntp: True positive (by class)\ntn: True negative (by class)\nfp: False positive (by class)\nfn: False negative (by class)\nprecision: True class i over predicted class i (by class)\nrecall: Predicted class i over true class i (by class)\nspecificity: Predicted not class i over true not class i (by class)\nf1score: Harmonic mean of precision and recall\nmean_precision: Mean by class, respectively unweighted and weighted by actual_count\nmean_recall: Mean by class, respectively unweighted and weighted by actual_count\nmean_specificity: Mean by class, respectively unweighted and weighted by actual_count\nmean_f1score: Mean by class, respectively unweighted and weighted by actual_count\ncategories: The categories considered\nfitted_records: Number of records considered\nn_categories: Number of categories considered\n\nExample:\n\nThe confusion matrix can also be plotted, e.g.:\n\njulia> using Plots, BetaML\n\njulia> y = [\"apple\",\"mandarin\",\"clementine\",\"clementine\",\"mandarin\",\"apple\",\"clementine\",\"clementine\",\"apple\",\"mandarin\",\"clementine\"];\n\njulia> ŷ = [\"apple\",\"mandarin\",\"clementine\",\"mandarin\",\"mandarin\",\"apple\",\"clementine\",\"clementine\",missing,\"clementine\",\"clementine\"];\n\njulia> cm = ConfusionMatrix(handle_missing=\"drop\")\nA ConfusionMatrix BetaMLModel (unfitted)\n\njulia> normalised_scores = fit!(cm,y,ŷ)\n3×3 Matrix{Float64}:\n 1.0 0.0 0.0\n 0.0 0.666667 0.333333\n 0.0 0.2 0.8\n\njulia> 
println(cm)\nA ConfusionMatrix BetaMLModel (fitted)\n\n-----------------------------------------------------------------\n\n*** CONFUSION MATRIX ***\n\nScores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"apple\" \"mandarin\" \"clementine\"\n \"apple\" 2 0 0\n \"mandarin\" 0 2 1\n \"clementine\" 0 1 4\nNormalised scores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"apple\" \"mandarin\" \"clementine\"\n \"apple\" 1.0 0.0 0.0\n \"mandarin\" 0.0 0.666667 0.333333\n \"clementine\" 0.0 0.2 0.8\n\n *** CONFUSION REPORT ***\n\n- Accuracy: 0.8\n- Misclassification rate: 0.19999999999999996\n- Number of classes: 3\n\n N Class precision recall specificity f1score actual_count predicted_count\n TPR TNR support \n\n 1 apple 1.000 1.000 1.000 1.000 2 2\n 2 mandarin 0.667 0.667 0.857 0.667 3 3\n 3 clementine 0.800 0.800 0.800 0.800 5 5\n\n- Simple avg. 0.822 0.822 0.886 0.822\n- Weigthed avg. 0.800 0.800 0.857 0.800\n\n-----------------------------------------------------------------\nOutput of `info(cm)`:\n- mean_precision: (0.8222222222222223, 0.8)\n- fitted_records: 10\n- specificity: [1.0, 0.8571428571428571, 0.8]\n- precision: [1.0, 0.6666666666666666, 0.8]\n- misclassification: 0.19999999999999996\n- mean_recall: (0.8222222222222223, 0.8)\n- n_categories: 3\n- normalised_scores: [1.0 0.0 0.0; 0.0 0.6666666666666666 0.3333333333333333; 0.0 0.2 0.8]\n- tn: [8, 6, 4]\n- mean_f1score: (0.8222222222222223, 0.8)\n- actual_count: [2, 3, 5]\n- accuracy: 0.8\n- recall: [1.0, 0.6666666666666666, 0.8]\n- f1score: [1.0, 0.6666666666666666, 0.8]\n- mean_specificity: (0.8857142857142858, 0.8571428571428571)\n- predicted_count: [2, 3, 5]\n- scores: [2 0 0; 0 2 1; 0 1 4]\n- tp: [2, 2, 4]\n- fn: [0, 1, 1]\n- categories: [\"apple\", \"mandarin\", \"clementine\"]\n- fp: [0, 1, 1]\n\njulia> res = info(cm);\n\njulia> 
heatmap(string.(res[\"categories\"]),string.(res[\"categories\"]),res[\"normalised_scores\"],seriescolor=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix (normalised scores)\")\n\n(Image: CM plot) \n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.ConfusionMatrix_hp","page":"Utils","title":"BetaML.Utils.ConfusionMatrix_hp","text":"mutable struct ConfusionMatrix_hp <: BetaMLHyperParametersSet\n\nHyperparameters for ConfusionMatrix\n\nParameters:\n\ncategories: The categories (aka \"levels\") to represent. [def: nothing, i.e. unique ground truth values].\nhandle_unknown: How to handle categories not seen in the ground truth values or not present in the provided categories array? \"error\" (default) raises an error, \"infrequent\" adds a specific category for these values.\nhandle_missing: How to handle missing values in either ground truth or predicted values? \"error\" [default] will raise an error, \"drop\" will drop the record\nother_categories_name: Which value to assign to the \"other\" category (i.e. categories not seen in the ground truth or not present in the provided categories array)? [def: nothing, i.e. typemax(Int64) for integer vectors and \"other\" for other types]. This setting is active only if handle_unknown=\"infrequent\" and in that case it MUST be specified if the vector to one-hot encode is neither integers nor strings\ncategories_names: A dictionary to map categories to some custom names. Useful for example if categories are integers, or you want to use shorter names [def: Dict(), i.e. not used]. This option isn't currently compatible with missing values or when some record has a value not in this provided dictionary.\nnormalise_scores: Whether predict should return the normalised scores. Note that both unnormalised and normalised scores remain available using info. 
[def: true]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.GridSearch","page":"Utils","title":"BetaML.Utils.GridSearch","text":"mutable struct GridSearch <: AutoTuneMethod\n\nSimple grid method for hyper-parameter validation of supervised models.\n\nAll parameters are tested using cross-validation and then the \"best\" combination is used. \n\nNotes:\n\nthe default loss is suitable for 1-dimensional output supervised models\n\nParameters:\n\nloss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, data (a vector of arrays, even if we work only with X) and (using the rng keyword) an RNG, and returns a scalar loss.\nres_share::Float64: Share of the (data) resources to use for the autotuning [def: 0.1]. With res_share=1 the whole dataset is used for autotuning, which can be very time consuming!\nhpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.\nmultithreads::Bool: Use multiple threads in the search for the best hyperparameters [def: false]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.KFold","page":"Utils","title":"BetaML.Utils.KFold","text":"KFold(nsplits=5,nrepeats=1,shuffle=true,rng=Random.GLOBAL_RNG)\n\nIterator for k-fold cross_validation strategy.\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.MinMaxScaler","page":"Utils","title":"BetaML.Utils.MinMaxScaler","text":"mutable struct MinMaxScaler <: BetaML.Utils.AbstractScaler\n\nScale the data to a given (def: unit) hypercube\n\nParameters:\n\ninputRange: The range of the input. [def: (minimum,maximum)]. Both ranges are functions of the data. 
You can consider other relative or absolute ranges using e.g. inputRange=(x->minimum(x)*0.8,x->100)\noutputRange: The range of the scaled output [def: (0,1)]\n\nExample:\n\njulia> using BetaML\n\njulia> x = [[4000,1000,2000,3000] [\"a\", \"categorical\", \"variable\", \"not to scale\"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 Matrix{Any}:\n 4000 \"a\" 4 0.4\n 1000 \"categorical\" 1 0.1\n 2000 \"variable\" 2 0.2\n 3000 \"not to scale\" 3 0.3\n\njulia> mod = Scaler(MinMaxScaler(outputRange=(0,10)), skip=[2])\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Any}:\n 10.0 \"a\" 10.0 10.0\n 0.0 \"categorical\" 0.0 0.0\n 3.33333 \"variable\" 3.33333 3.33333\n 6.66667 \"not to scale\" 6.66667 6.66667\n\njulia> xback = inverse_predict(mod, xscaled)\n4×4 Matrix{Any}:\n 4000.0 \"a\" 4.0 0.4\n 1000.0 \"categorical\" 1.0 0.1\n 2000.0 \"variable\" 2.0 0.2\n 3000.0 \"not to scale\" 3.0 0.3\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.OneHotE_hp","page":"Utils","title":"BetaML.Utils.OneHotE_hp","text":"mutable struct OneHotE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for both OneHotEncoder and OrdinalEncoder\n\nParameters:\n\ncategories: The categories to represent as columns. [def: nothing, i.e. unique training values or range for integers]. Do not include missing in this list.\nhandle_unknown: How to handle categories not seen in training or not present in the provided categories array? \"error\" (default) raises an error, \"missing\" labels the whole output with missing values, \"infrequent\" adds a specific column for these categories in one-hot encoding or a single new category in ordinal encoding.\nother_categories_name: Which value during inverse transformation to assign to the \"other\" category (i.e. categories not seen in training or not present in the provided categories array)? [def: nothing, i.e. typemax(Int64) for integer vectors and \"other\" for other types]. 
This setting is active only if handle_unknown=\"infrequent\" and in that case it MUST be specified if the vector to one-hot encode is neither integers nor strings\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.OneHotEncoder","page":"Utils","title":"BetaML.Utils.OneHotEncoder","text":"mutable struct OneHotEncoder <: BetaMLUnsupervisedModel\n\nEncode a vector of categorical values as one-hot columns.\n\nThe algorithm distinguishes between missing values, for which it returns a one-hot encoded row of missing values, and other categories not in the provided list or not seen during training that are handled according to the handle_unknown parameter. \n\nFor the parameters see OneHotE_hp and BML_options. This model supports inverse_predict.\n\nExample:\n\njulia> using BetaML\n\njulia> x = [\"a\",\"d\",\"e\",\"c\",\"d\"];\n\njulia> mod = OneHotEncoder(handle_unknown=\"infrequent\",other_categories_name=\"zz\")\nA OneHotEncoder BetaMLModel (unfitted)\n\njulia> x_oh = fit!(mod,x) # last col is for the \"infrequent\" category\n5×5 Matrix{Bool}:\n 1 0 0 0 0\n 0 1 0 0 0\n 0 0 1 0 0\n 0 0 0 1 0\n 0 1 0 0 0\n\njulia> x2 = [\"a\",\"b\",\"c\"];\n\njulia> x2_oh = predict(mod,x2)\n3×5 Matrix{Bool}:\n 1 0 0 0 0\n 0 0 0 0 1\n 0 0 0 1 0\n\njulia> x2_back = inverse_predict(mod,x2_oh)\n3-element Vector{String}:\n \"a\"\n \"zz\"\n \"c\"\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.OrdinalEncoder","page":"Utils","title":"BetaML.Utils.OrdinalEncoder","text":"mutable struct OrdinalEncoder <: BetaMLUnsupervisedModel\n\nEncode a vector of categorical values as integers.\n\nThe algorithm distinguishes between missing values, for which it propagates the missing value, and other categories not in the provided list or not seen during training that are handled according to the handle_unknown parameter. \n\nFor the parameters see OneHotE_hp and BML_options. 
This model supports inverse_predict.\n\nExample:\n\njulia> using BetaML\n\njulia> x = [\"a\",\"d\",\"e\",\"c\",\"d\"];\n\njulia> mod = OrdinalEncoder(handle_unknown=\"infrequent\",other_categories_name=\"zz\")\nA OrdinalEncoder BetaMLModel (unfitted)\n\njulia> x_int = fit!(mod,x)\n5-element Vector{Int64}:\n 1\n 2\n 3\n 4\n 2\n\njulia> x2 = [\"a\",\"b\",\"c\",\"g\"];\n\njulia> x2_int = predict(mod,x2) # 5 is for the \"infrequent\" category\n4-element Vector{Int64}:\n 1\n 5\n 4\n 5\n\njulia> x2_back = inverse_predict(mod,x2_int)\n4-element Vector{String}:\n \"a\"\n \"zz\"\n \"c\"\n \"zz\"\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.PCAE_hp","page":"Utils","title":"BetaML.Utils.PCAE_hp","text":"mutable struct PCAE_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the PCAEncoder transformer\n\nParameters\n\nencoded_size: The size, that is the number of dimensions, to maintain (with encoded_size <= size(X,2)) [def: nothing, i.e. the number of output dimensions is determined from the parameter max_unexplained_var]\nmax_unexplained_var: The maximum proportion of unexplained variance that we are willing to accept when reducing the number of dimensions in our data [def: 0.05]. It doesn't have any effect when the output number of dimensions is explicitly chosen with the parameter encoded_size\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.PCAEncoder","page":"Utils","title":"BetaML.Utils.PCAEncoder","text":"mutable struct PCAEncoder <: BetaMLUnsupervisedModel\n\nPerform a Principal Component Analysis, a dimensionality reduction technique employing a linear transformation of the original matrix by the eigenvectors of the covariance matrix.\n\nPCAEncoder returns the matrix reprojected along the dimensions of maximum variance.\n\nFor the parameters see PCAE_hp and BML_options \n\nNotes:\n\nPCAEncoder doesn't automatically scale the data. It is suggested to apply the Scaler model before running it. \nMissing data are not supported. 
Impute them first, see the Imputation module.\nIf you don't know a priori the maximum unexplained variance you are willing to accept, nor the desired number of dimensions, you can run the model with all the dimensions in output (i.e. with encoded_size=size(X,2)), analyse the cumulative proportion of explained variance by dimension in info(mod)[\"explained_var_by_dim\"], choose the number of dimensions K according to your needs and finally pick from the reprojected matrix only the number of dimensions required, i.e. out.X[:,1:K].\n\nExample:\n\njulia> using BetaML\n\njulia> xtrain = [1 10 100; 1.1 15 120; 0.95 23 90; 0.99 17 120; 1.05 8 90; 1.1 12 95];\n\njulia> mod = PCAEncoder(max_unexplained_var=0.05)\nA PCAEncoder BetaMLModel (unfitted)\n\njulia> xtrain_reproj = fit!(mod,xtrain)\n6×2 Matrix{Float64}:\n 100.449 3.1783\n 120.743 6.80764\n 91.3551 16.8275\n 120.878 8.80372\n 90.3363 1.86179\n 95.5965 5.51254\n\njulia> info(mod)\nDict{String, Any} with 5 entries:\n \"explained_var_by_dim\" => [0.873992, 0.999989, 1.0]\n \"fitted_records\" => 6\n \"prop_explained_var\" => 0.999989\n \"retained_dims\" => 2\n \"xndims\" => 3\n\njulia> xtest = [2 20 200];\n\njulia> xtest_reproj = predict(mod,xtest)\n1×2 Matrix{Float64}:\n 200.898 6.3566\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.SamplerWithData","page":"Utils","title":"BetaML.Utils.SamplerWithData","text":"SamplerWithData{Tsampler}\n\nAssociate an instance of an AbstractDataSampler with the actual data to sample.\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.Scaler","page":"Utils","title":"BetaML.Utils.Scaler","text":"mutable struct Scaler <: BetaMLUnsupervisedModel\n\nScale the data according to the specific chosen method (def: StandardScaler) \n\nFor the parameters see Scaler_hp and BML_options \n\nExamples:\n\nStandard scaler (default)...\n\njulia> using BetaML, Statistics\n\njulia> x = [[4000,1000,2000,3000] [400,100,200,300] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 
Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\njulia> mod = Scaler() # equiv to `Scaler(StandardScaler(scale=true, center=true))`\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Float64}:\n 1.34164 1.34164 1.34164 1.34164\n -1.34164 -1.34164 -1.34164 -1.34164\n -0.447214 -0.447214 -0.447214 -0.447214\n 0.447214 0.447214 0.447214 0.447214\n\njulia> col_means = mean(xscaled, dims=1)\n1×4 Matrix{Float64}:\n 0.0 0.0 0.0 5.55112e-17\n\njulia> col_var = var(xscaled, dims=1, corrected=false)\n1×4 Matrix{Float64}:\n 1.0 1.0 1.0 1.0\n\njulia> xback = inverse_predict(mod, xscaled)\n4×4 Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\nMin-max scaler...\n\njulia> using BetaML\n\njulia> x = [[4000,1000,2000,3000] [\"a\", \"categorical\", \"variable\", \"not to scale\"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 Matrix{Any}:\n 4000 \"a\" 4 0.4\n 1000 \"categorical\" 1 0.1\n 2000 \"variable\" 2 0.2\n 3000 \"not to scale\" 3 0.3\n\njulia> mod = Scaler(MinMaxScaler(outputRange=(0,10)),skip=[2])\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Any}:\n 10.0 \"a\" 10.0 10.0\n 0.0 \"categorical\" 0.0 0.0\n 3.33333 \"variable\" 3.33333 3.33333\n 6.66667 \"not to scale\" 6.66667 6.66667\n\njulia> xback = inverse_predict(mod,xscaled)\n4×4 Matrix{Any}:\n 4000.0 \"a\" 4.0 0.4\n 1000.0 \"categorical\" 1.0 0.1\n 2000.0 \"variable\" 2.0 0.2\n 3000.0 \"not to scale\" 3.0 0.3\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.Scaler_hp","page":"Utils","title":"BetaML.Utils.Scaler_hp","text":"mutable struct Scaler_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the Scaler transformer\n\nParameters\n\nmethod: The specific scaler method to employ with its own parameters. See StandardScaler [def] or MinMaxScaler.\nskip: The positional ids of the columns to skip scaling (eg. 
categorical columns, dummies,...) [def: []]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.StandardScaler","page":"Utils","title":"BetaML.Utils.StandardScaler","text":"mutable struct StandardScaler <: BetaML.Utils.AbstractScaler\n\nStandardise the input to zero mean and unit standard deviation, aka \"Z-score\". Note that missing values are skipped.\n\nParameters:\n\nscale: Scale to unit variance [def: true]\ncenter: Center to zero mean [def: true]\n\nExample:\n\njulia> using BetaML, Statistics\n\njulia> x = [[4000,1000,2000,3000] [400,100,200,300] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]\n4×4 Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\njulia> mod = Scaler() # equiv to `Scaler(StandardScaler(scale=true, center=true))`\nA Scaler BetaMLModel (unfitted)\n\njulia> xscaled = fit!(mod,x)\n4×4 Matrix{Float64}:\n 1.34164 1.34164 1.34164 1.34164\n -1.34164 -1.34164 -1.34164 -1.34164\n -0.447214 -0.447214 -0.447214 -0.447214\n 0.447214 0.447214 0.447214 0.447214\n\njulia> col_means = mean(xscaled, dims=1)\n1×4 Matrix{Float64}:\n 0.0 0.0 0.0 5.55112e-17\n\njulia> col_var = var(xscaled, dims=1, corrected=false)\n1×4 Matrix{Float64}:\n 1.0 1.0 1.0 1.0\n\njulia> xback = inverse_predict(mod, xscaled)\n4×4 Matrix{Float64}:\n 4000.0 400.0 4.0 0.4\n 1000.0 100.0 1.0 0.1\n 2000.0 200.0 2.0 0.2\n 3000.0 300.0 3.0 0.3\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#BetaML.Utils.SuccessiveHalvingSearch","page":"Utils","title":"BetaML.Utils.SuccessiveHalvingSearch","text":"mutable struct SuccessiveHalvingSearch <: AutoTuneMethod\n\nHyper-parameter validation of supervised models that searches the parameter space through successive halving\n\nAll parameters are tested on a small sub-sample, then the \"best\" combinations are kept for a second round that uses more samples, and so on until only one hyperparameter combination is left.\n\nNotes:\n\nthe default loss is suitable for 1-dimensional output 
supervised models, and itself applies cross-validation. Any function that accepts a model and some data and returns a scalar loss can be used\nthe rate at which the potential candidate combinations of hyperparameters shrink is controlled by the number of data shares defined in res_shares (i.e. the epochs): the more epochs are chosen, the lower the \"shrink\" coefficient\n\nParameters:\n\nloss::Function: Loss function to use. [def: l2loss_by_cv]. Any function that takes a model, data (a vector of arrays, even if we work only with X) and (using the rng keyword) an RNG, and returns a scalar loss.\nres_shares::Vector{Float64}: Shares of the (data) resources to use for the autotuning in the successive iterations [def: [0.05, 0.2, 0.3]]. With res_share=1 the whole dataset is used for autotuning, which can be very time consuming! The number of models is reduced by the same share in order to arrive at a single model. Increase the number of res_shares in order to increase the number of models kept at each iteration.\n\nhpranges::Dict{String, Any}: Dictionary of parameter names (String) and associated vector of values to test. Note that you can easily sample these values from a distribution with rand(distrobject,nvalues). The number of points you provide for a given parameter can be interpreted as proportional to the prior you have on the importance of that parameter for the algorithm quality.\nmultithreads::Bool: Use multiple threads in the search for the best hyperparameters [def: false]\n\n\n\n\n\n","category":"type"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractVector{T}}} where T","page":"Utils","title":"Base.error","text":"error(y,ŷ;ignorelabels=false) - Categorical error (T vs T)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{Int64, Vector{T}}} where T<:Number","page":"Utils","title":"Base.error","text":"error(y,ŷ) - Categorical error with probabilistic prediction of a single datapoint (Int vs PMF). 
\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{Vector{Int64}, Matrix{T}}} where T<:Number","page":"Utils","title":"Base.error","text":"error(y,ŷ) - Categorical error with probabilistic predictions of a dataset (Int vs PMF). \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.error-Union{Tuple{T}, Tuple{Vector{T}, Array{Dict{T, Float64}, 1}}} where T","page":"Utils","title":"Base.error","text":"error(y,ŷ) - Categorical error with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (T vs Dict{T,Float64}). \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#Base.reshape-Union{Tuple{T}, Tuple{T, Vararg{Any, N} where N}} where T<:Number","page":"Utils","title":"Base.reshape","text":"reshape(myNumber, dims..) - Reshape a number as an n-dimensional Array \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{AbstractVector{Int64}, AbstractMatrix{T}}} where T<:Number","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol,ignorelabels)\n\nCategorical accuracy with probabilistic predictions of a dataset (PMF vs Int).\n\nParameters:\n\ny: The N array with the correct category for each point n.\nŷ: An (N,K) matrix of probabilities that each record ŷ_n, with n in 1,...,N, is of category k, with k in 1,...,K.\ntol: The tolerance of the prediction, i.e. if considering \"correct\" only a prediction where the value with highest probability is the true value (tol = 1), or consider instead the set of tol maximum values [def: 1].\nignorelabels: Whether to ignore the specific label order in y. 
Useful for unsupervised learning algorithms where the specific label order doesn't make sense [def: false]\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractArray{Dict{T, Float64}, 1}}} where T","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol)\n\nCategorical accuracy with probabilistic predictions of a dataset given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).\n\nParameters:\n\nŷ: An array where each item is the estimated probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)\ny: The N array with the correct category for each point n.\ntol: The tolerance of the prediction, i.e. if considering \"correct\" only a prediction where the value with highest probability is the true value (tol = 1), or consider instead the set of tol maximum values [def: 1].\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractVector{T}}} where T","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(ŷ,y;ignorelabels=false) - Categorical accuracy between two vectors (T vs T). \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{Int64, AbstractVector{T}}} where T<:Number","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol)\n\nCategorical accuracy with probabilistic prediction of a single datapoint (PMF vs Int).\n\nUse the parameter tol [def: 1] to determine the tolerance of the prediction, i.e. 
if considering \"correct\" only a prediction where the value with highest probability is the true value (tol = 1), or consider instead the set of tol maximum values.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.accuracy-Union{Tuple{T}, Tuple{T, AbstractDict{T, Float64}}} where T","page":"Utils","title":"BetaML.Utils.accuracy","text":"accuracy(y,ŷ;tol)\n\nCategorical accuracy with probabilistic prediction of a single datapoint given in terms of a dictionary of probabilities (Dict{T,Float64} vs T).\n\nParameters:\n\nŷ: The returned probability mass function in terms of a Dictionary(Item1 => Prob1, Item2 => Prob2, ...)\ntol: The tolerance of the prediction, i.e. if considering \"correct\" only a prediction where the value with highest probability is the true value (tol = 1), or consider instead the set of tol maximum values [def: 1].\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.aic-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.aic","text":"aic(lL,k) - Akaike information criterion (lower is better)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.autojacobian-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.autojacobian","text":"autojacobian(f,x;nY)\n\nEvaluate the Jacobian using AD in the form of a (nY,nX) matrix of first derivatives\n\nParameters:\n\nf: The function whose Jacobian is to be computed\nx: The input to the function where the Jacobian has to be computed\nnY: The number of outputs of the function f [def: length(f(x))]\n\nReturn values:\n\nAn Array{Float64,2} of the locally evaluated Jacobian\n\nNotes:\n\nThe nY parameter is optional. 
If provided it avoids having to compute f(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.autotune!-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.autotune!","text":"autotune!(m, data) -> Any\n\n\nHyperparameter autotuning.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.batch-Tuple{Integer, Integer}","page":"Utils","title":"BetaML.Utils.batch","text":"batch(n,bsize;sequential=false,rng)\n\nReturn a vector of vectors, each holding bsize indices, covering the indices from 1 to n. The indices are assigned randomly unless the optional parameter sequential is used.\n\nExample:\n\njulia> Utils.batch(6,2,sequential=true)\n3-element Array{Array{Int64,1},1}:\n [1, 2]\n [3, 4]\n [5, 6]\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.bic-Tuple{Any, Any, Any}","page":"Utils","title":"BetaML.Utils.bic","text":"bic(lL,k,n) - Bayesian information criterion (lower is better)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.celu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.celu","text":"celu(x; α=1) \n\nhttps://arxiv.org/pdf/1704.07483.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.class_counts-Tuple{Any}","page":"Utils","title":"BetaML.Utils.class_counts","text":"class_counts(x;classes=nothing)\n\nReturn an (unsorted) vector with the counts of each unique item (element or rows) in a dataset.\n\nIf order is important or not all classes are present in the data, a preset vector of classes can be given in the parameter classes\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.class_counts_with_labels-Tuple{Any}","page":"Utils","title":"BetaML.Utils.class_counts_with_labels","text":"class_counts_with_labels(x)\n\nReturn a dictionary that counts the number of each unique item (rows) in a dataset.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.cols_with_missing-Tuple{Any}","page":"Utils","title":"BetaML.Utils.cols_with_missing","text":"cols_with_missing(x)\n\nReturn an 
array with the ids of the columns where there is at least one missing value.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.consistent_shuffle-Union{Tuple{AbstractVector{T}}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.consistent_shuffle","text":"consistent_shuffle(data;dims,rng)\n\nShuffle a vector of n-dimensional arrays across dimension dims keeping the same order between the arrays\n\nParameters\n\ndata: The vector of arrays to shuffle\ndims: The dimension over which to apply the shuffle [def: 1]\nrng: An AbstractRNG to apply for the shuffle\n\nNotes\n\nAll the arrays must have the same size for the dimension to shuffle\n\nExample\n\njulia> a = [1 2 30; 10 20 30]; b = [100 200 300];\njulia> (aShuffled, bShuffled) = consistent_shuffle([a,b],dims=2)\n2-element Vector{Matrix{Int64}}:\n [1 30 2; 10 30 20]\n [100 300 200]\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.cosine_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.cosine_distance","text":"Cosine distance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.cross_validation","page":"Utils","title":"BetaML.Utils.cross_validation","text":"cross_validation(\n f,\n data\n) -> Union{Tuple{Any, Any}, Vector{Any}}\ncross_validation(\n f,\n data,\n sampler;\n dims,\n verbosity,\n return_statistics\n) -> Union{Tuple{Any, Any}, Vector{Any}}\n\n\nPerform cross_validation according to the sampler rule by calling the function f and collecting its output\n\nParameters\n\nf: The user-defined function that consumes the specific train and validation data and returns something (often the associated validation error). See later\ndata: A single n-dimensional array or a vector of them (e.g. X,Y), depending on the tasks required by f.\nsampler: An instance of an AbstractDataSampler, defining the \"rules\" for sampling at each iteration. [def: KFold(nsplits=5,nrepeats=1,shuffle=true,rng=Random.GLOBAL_RNG) ]. 
Note that the RNG passed to the f function is the RNG passed to the sampler\ndims: The dimension over which to perform the cross_validation, i.e. the dimension containing the observations [def: 1]\nverbosity: The verbosity to print information during each iteration (this can also be printed in the f function) [def: STD]\nreturn_statistics: Whether cross_validation should return the statistics of the output of f (mean and standard deviation) or the whole outputs [def: true].\n\nNotes\n\ncross_validation works by calling the function f, defined by the user, passing to it the tuple trainData, valData and rng and collecting the result of the function f. The specific method by which trainData and valData are selected at each iteration depends on the specific sampler, with a single 5-fold rule being the default.\n\nThis approach is very flexible because the specific model to employ or the metric to use is left within the user-provided function. The only thing that cross_validation does is provide the model defined in the function f with the appropriate data (and the random number generator).\n\nInput of the user-provided function: trainData and valData are both themselves tuples. In supervised models, the data argument of cross_validation should be a tuple of (X,Y) and trainData and valData will be equivalent to (xtrain, ytrain) and (xval, yval). In unsupervised models data is a single array, but the training and validation data still need to be accessed as trainData[1] and valData[1].\n\nOutput of the user-provided function: The user-defined function can return anything. However, if return_statistics is left on its default true value the user-defined function must return a single scalar (e.g. some error measure) so that the mean and the standard deviation are returned.\n\nNote that cross_validation can conveniently be employed using the do syntax, as Julia automatically rewrites cross_validation(data,...) do trainData,valData,rng ...user defined body... 
end as cross_validation(f(trainData,valData,rng ), data,...)\n\nExample\n\njulia> X = [11:19 21:29 31:39 41:49 51:59 61:69];\njulia> Y = [1:9;];\njulia> sampler = KFold(nsplits=3);\njulia> (μ,σ) = cross_validation([X,Y],sampler) do trainData,valData,rng\n (xtrain,ytrain) = trainData; (xval,yval) = valData\n model = RandomForestEstimator(n_trees=30,rng=rng) \n fit!(model,xtrain,ytrain)\n ŷval = predict(model,xval)\n ϵ = relative_mean_error(yval,ŷval)\n return ϵ\n end\n(0.3202242202242202, 0.04307662219315022)\n\n\n\n\n\n","category":"function"},{"location":"Utils.html#BetaML.Utils.crossentropy-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.crossentropy","text":"crossentropy(y,ŷ; weight)\n\nCompute the (weighted) cross-entropy between the predicted and the sampled probability distributions.\n\nTo be used in classification problems.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dcelu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dcelu","text":"dcelu(x; α=1) \n\nhttps://arxiv.org/pdf/1704.07483.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.delu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.delu","text":"delu(x; α=1) with α > 0 \n\nhttps://arxiv.org/pdf/1511.07289.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dmaximum-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dmaximum","text":"dmaximum(x) \n\nMultidimensional version of the derivative of maximum\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dmish-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dmish","text":"dmish(x) \n\nhttps://arxiv.org/pdf/1908.08681v1.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dplu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dplu","text":"dplu(x;α=0.1,c=1) \n\nPiecewise Linear Unit derivative 
\n\nhttps://arxiv.org/pdf/1809.09534.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.drelu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.drelu","text":"drelu(x) \n\nRectified Linear Unit \n\nhttps://www.cs.toronto.edu/~hinton/absps/reluICML.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dsigmoid-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dsigmoid","text":"dsigmoid(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dsoftmax-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dsoftmax","text":"dsoftmax(x; β=1) \n\nDerivative of the softmax function \n\nhttps://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dsoftplus-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dsoftplus","text":"dsoftplus(x) \n\nhttps://en.wikipedia.org/wiki/Rectifier(neuralnetworks)#Softplus\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.dtanh-Tuple{Any}","page":"Utils","title":"BetaML.Utils.dtanh","text":"dtanh(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.elu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.elu","text":"elu(x; α=1) with α > 0 \n\nhttps://arxiv.org/pdf/1511.07289.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.entropy-Tuple{Any}","page":"Utils","title":"BetaML.Utils.entropy","text":"entropy(x)\n\nCalculate the entropy for a list of items (or rows).\n\nSee: https://en.wikipedia.org/wiki/Decisiontreelearning#Gini_impurity\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.generate_parallel_rngs-Tuple{Random.AbstractRNG, Integer}","page":"Utils","title":"BetaML.Utils.generate_parallel_rngs","text":"generate_parallel_rngs(rng::AbstractRNG, n::Integer;reSeed=false)\n\nFor multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.\n\nNote that each 
RNG is a copy of the original RNG. This means that code that uses these RNGs will not change the original RNG state.\n\nUse it with rngs = generate_parallel_rngs(rng,Threads.nthreads()) to have a separate rng per thread. By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads). If you prefer, you can instead re-seed the RNG here (using the parameter reSeed=true), such that each thread has a different seed. Be aware however that the stream of numbers generated will depend on the number of threads at run time.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.getpermutations-Union{Tuple{AbstractVector{T}}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.getpermutations","text":"getpermutations(v::AbstractArray{T,1};keepStructure=false)\n\nReturn a vector of either (a) all possible permutations (uncollected) or (b) just those based on the unique values of the vector\n\nUseful to measure accuracy where you don't care about the actual name of the labels, like in unsupervised classifications (e.g. clustering)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.gini-Tuple{Any}","page":"Utils","title":"BetaML.Utils.gini","text":"gini(x)\n\nCalculate the Gini Impurity for a list of items (or rows).\n\nSee: https://en.wikipedia.org/wiki/Decisiontreelearning#Information_gain\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.issortable-Union{Tuple{AbstractArray{T, N}}, Tuple{N}, Tuple{T}} where {T, N}","page":"Utils","title":"BetaML.Utils.issortable","text":"Return whether an array is sortable, i.e. 
has method issort defined\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l1_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l1_distance","text":"L1 norm distance (aka Manhattan Distance)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l2_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l2_distance","text":"Euclidean (L2) distance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l2loss_by_cv-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l2loss_by_cv","text":"Compute the loss of a given model over a given (x,y) dataset running cross-validation\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.l2squared_distance-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.l2squared_distance","text":"Squared Euclidean (L2) distance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.lse-Tuple{Any}","page":"Utils","title":"BetaML.Utils.lse","text":"LogSumExp for efficiently computing log(sum(exp.(x))) \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.makematrix-Tuple{AbstractVector{T} where T}","page":"Utils","title":"BetaML.Utils.makematrix","text":"Transform an Array{T,1} into an Array{T,2}, leaving an Array{T,2} unchanged.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mean_dicts-Tuple{Any}","page":"Utils","title":"BetaML.Utils.mean_dicts","text":"mean_dicts(dicts)\n\nCompute the mean of the values of an array of dictionaries.\n\nGiven dicts, an array of dictionaries, mean_dicts first computes the union of the keys and then averages the values. 
If the original values are probabilities (non-negative items summing to 1), the result is also a probability distribution.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mish-Tuple{Any}","page":"Utils","title":"BetaML.Utils.mish","text":"mish(x) \n\nhttps://arxiv.org/pdf/1908.08681v1.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mode-Union{Tuple{AbstractArray{Dict{T, Float64}, N} where N}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.mode","text":"mode(elements,rng)\n\nGiven a vector of dictionaries whose values are numerical (e.g. probabilities), a vector of vectors or a matrix, it returns the mode of each element (dictionary, vector or row) in terms of the key or the position.\n\nUse it to return a unique value from a multiclass classifier returning probabilities.\n\nNote:\n\nIf multiple classes have the highest mode, one is returned at random (use the parameter rng to fix the stochasticity)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mode-Union{Tuple{AbstractVector{T}}, Tuple{T}} where T<:Number","page":"Utils","title":"BetaML.Utils.mode","text":"mode(v::AbstractVector{T};rng)\n\nReturn the position with the highest value in an array, interpreted as mode (using rand in case of multimodal values)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mode-Union{Tuple{Dict{T, Float64}}, Tuple{T}} where T","page":"Utils","title":"BetaML.Utils.mode","text":"mode(dict::Dict{T,Float64};rng)\n\nReturn the key with highest mode (using rand in case of multimodal values)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.mse-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.mse","text":"mse(y,ŷ)\n\nCompute the mean squared error (MSE) (aka mean squared deviation - MSD) between two vectors y and ŷ. 
Note that while the deviation is averaged by the length of y, it is not scaled to give it a relative meaning.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.pairwise-Tuple{AbstractArray}","page":"Utils","title":"BetaML.Utils.pairwise","text":"pairwise(x::AbstractArray; distance, dims) -> Any\n\n\nCompute pairwise distance matrix between elements of an array identified across dimension dims.\n\nParameters:\n\nx: the data array \ndistance: a distance measure [def: l2_distance]\ndims: the dimension of the observations [def: 1, i.e. records on rows]\n\nReturns:\n\na nrecords by nrecords symmetric matrix of the pairwise distances\n\nNotes:\n\nif performance matters, you can use something like Distances.pairwise(Distances.euclidean,x,dims=1) from the Distances package.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.partition-Union{Tuple{T}, Tuple{AbstractVector{T}, AbstractVector{Float64}}} where T<:AbstractArray","page":"Utils","title":"BetaML.Utils.partition","text":"partition(data,parts;shuffle,dims,rng)\n\nPartition (by rows) one or more matrices according to the shares in parts.\n\nParameters\n\ndata: A matrix/vector or a vector of matrices/vectors\nparts: A vector of the required shares (must sum to 1)\nshuffle: Whether to randomly shuffle the matrices (preserving the relative order between matrices)\ndims: The dimension for which to partition [def: 1]\ncopy: Whether to copy the actual data or only create a reference [def: true]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\nThe sum of parts must be equal to 1\nThe number of elements in the specified dimension must be the same for all the arrays in data\n\nExample:\n\njulia julia> x = [1:10 11:20] julia> y = collect(31:40) julia> ((xtrain,xtest),(ytrain,ytest)) = 
partition([x,y],[0.7,0.3])\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.plu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.plu","text":"plu(x;α=0.1,c=1) \n\nPiecewise Linear Unit \n\nhttps://arxiv.org/pdf/1809.09534.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.polynomial_kernel-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.polynomial_kernel","text":"Polynomial kernel parametrised with constant=0 and degree=2 (i.e. a quadratic kernel). For other cᵢ and dᵢ use K = (x,y) -> polynomial_kernel(x,y,c=cᵢ,d=dᵢ) as kernel function in the supporting algorithms\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.pool1d","page":"Utils","title":"BetaML.Utils.pool1d","text":"pool1d(x,poolsize=2;f=mean)\n\nApply function f to a rolling window of poolsize contiguous (in 1d) neurons.\n\nApplicable to VectorFunctionLayer, e.g. layer2 = VectorFunctionLayer(nₗ,f=(x->pool1d(x,4,f=mean))) Attention: to apply this function as activation function in a neural network you will need Julia version >= 1.6, otherwise you may experience a segmentation fault (see this bug report)\n\n\n\n\n\n","category":"function"},{"location":"Utils.html#BetaML.Utils.radial_kernel-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.radial_kernel","text":"Radial Kernel (aka RBF kernel) parametrised with γ=1/2. For other gammas γᵢ use K = (x,y) -> radial_kernel(x,y,γ=γᵢ) as kernel function in the supporting algorithms\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.relative_mean_error-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.relative_mean_error","text":"relative_mean_error(y, ŷ;normdim=false,normrec=false,p=1)\n\nCompute the relative mean error (l-1 based by default) between y and ŷ.\n\nThere are many ways to compute a relative mean error. 
In particular, if normrec (normdim) is set to true, the records (dimensions) are normalised, in the sense that it doesn't matter if a record (dimension) is bigger or smaller than the others: the relative error is first computed for each record (dimension) and then it is averaged. With both normdim and normrec set to false (default) the function returns the relative mean error; with both set to true it returns the mean relative error (i.e. with p=1 the \"mean absolute percentage error (MAPE)\"). The parameter p [def: 1] controls the p-norm used to define the error.\n\nThe mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more.\n\nFor example, given y = [1,44,3] and ŷ = [2,45,2], the mean relative error mean_relative_error(y,ŷ,normrec=true) is 0.452, while the relative mean error relative_mean_error(y,ŷ, normrec=false) is \"only\" 0.0625.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.relu-Tuple{Any}","page":"Utils","title":"BetaML.Utils.relu","text":"relu(x) \n\nRectified Linear Unit \n\nhttps://www.cs.toronto.edu/~hinton/absps/reluICML.pdf\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.sigmoid-Tuple{Any}","page":"Utils","title":"BetaML.Utils.sigmoid","text":"sigmoid(x)\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.silhouette-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.silhouette","text":"silhouette(distances, classes) -> Any\n\n\nProvide Silhouette scoring for cluster outputs\n\nParameters:\n\ndistances: the nrecords by nrecords pairwise distance matrix\nclasses: the vector of assigned classes to each record\n\nNotes:\n\nthe matrix of pairwise distances can be obtained with the function pairwise\nthis function doesn't sample. 
If needed, sample the data beforehand\nto get the score for the cluster simply compute the mean\nsee also the Wikipedia article\n\nExample:\n\njulia> x = [1 2 3 3; 1.2 3 3.1 3.2; 2 4 6 6.2; 2.1 3.5 5.9 6.3];\n\njulia> s_scores = silhouette(pairwise(x),[1,2,2,2])\n4-element Vector{Float64}:\n 0.0\n -0.7590778795827623\n 0.5030093571833065\n 0.4936350560759424\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.softmax-Tuple{Any}","page":"Utils","title":"BetaML.Utils.softmax","text":"softmax (x; β=1) \n\nThe input x is a vector. Return a PMF\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.softplus-Tuple{Any}","page":"Utils","title":"BetaML.Utils.softplus","text":"softplus(x) \n\nhttps://en.wikipedia.org/wiki/Rectifier(neuralnetworks)#Softplus\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.squared_cost-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.squared_cost","text":"squared_cost(y,ŷ)\n\nCompute the squared costs between a vector of observations and one of predictions as (1/2)*norm(y - ŷ)^2.\n\nAside from the 1/2 term, it corresponds to the squared l-2 norm distance and, when it is averaged over multiple datapoints, to the Mean Squared Error (MSE). 
It is mostly used for regression problems.\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.sterling-Tuple{BigInt, BigInt}","page":"Utils","title":"BetaML.Utils.sterling","text":"Stirling number of the second kind: number of partitions of a set of n elements into k sets \n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.variance-Tuple{Any}","page":"Utils","title":"BetaML.Utils.variance","text":"variance(x) - population variance\n\n\n\n\n\n","category":"method"},{"location":"Utils.html#BetaML.Utils.xavier_init","page":"Utils","title":"BetaML.Utils.xavier_init","text":"xavier_init(previous_npar, this_npar) -> Matrix{Float64}\nxavier_init(\n previous_npar,\n this_npar,\n outsize;\n rng,\n eltype\n) -> Any\n\n\nPerform a Xavier initialisation of the weights\n\nParameters:\n\nprevious_npar: number of parameters of the previous layer\nthis_npar: number of parameters of this layer\noutsize: tuple with the size of the weights [def: (this_npar,previous_npar)]\nrng : random number generator [def: Random.GLOBAL_RNG]\neltype: eltype of the weight array [def: Float64]\n\n\n\n\n\n","category":"function"},{"location":"Utils.html#BetaML.Utils.@codelocation-Tuple{}","page":"Utils","title":"BetaML.Utils.@codelocation","text":"@codelocation()\n\nHelper macro to print at runtime an info message concerning the position of the code being executed\n\n\n\n\n\n","category":"macro"},{"location":"Utils.html#BetaML.Utils.@threadsif-Tuple{Any, Any}","page":"Utils","title":"BetaML.Utils.@threadsif","text":"Conditionally apply multi-threading to for loops. This is a variation on Base.Threads.@threads that adds a run-time boolean flag to enable or disable threading. 
\n\nExample:\n\nfunction optimize(objectives; use_threads=true)\n @threadsif use_threads for k = 1:length(objectives)\n # ...\n end\nend\n\n# Notes:\n- Borrowed from https://github.com/JuliaQuantumControl/QuantumControlBase.jl/blob/master/src/conditionalthreads.jl\n\n\n\n\n\n","category":"macro"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"EditURL = \"betaml_tutorial_cluster_iris.jl\"","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#clustering_tutorial","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The task is to estimate the species of a plant given some floreal measurements. It use the classical \"Iris\" dataset. Note that in this example we are using clustering approaches, so we try to understand the \"structure\" of our data, without relying to actually knowing the true labels (\"classes\" or \"factors\"). 
However we have chosen a dataset for which the true labels are actually known, so we can compare the accuracy of the algorithms we use, but these labels will not be used during the algorithms training.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Data origin:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"dataset description: https://en.wikipedia.org/wiki/Irisflowerdata_set\ndata source we use here: https://github.com/JuliaStats/RDatasets.jl","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Library-and-data-loading","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Library and data loading","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Activating the local environment specific to BetaML documentation","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"using 
Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We load the Beta Machine Learning Toolkit as well as some other packages that we use in this tutorial","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"using BetaML\nusing Random, Statistics, Logging, BenchmarkTools, StableRNGs, RDatasets, Plots, DataFrames","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We are also going to compare our results with two other leading packages in Julia for clustering analysis, Clustering.jl that provides (inter alia) kmeans and kmedoids algorithms and GaussianMixtures.jl that provides, as the name says, Gaussian Mixture Models. 
So we import them (we \"import\" them, rather than \"use\" them, so as not to bring all their exported names into the namespace, as some would collide with BetaML).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"import Clustering, GaussianMixtures","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Here we are explicit and we use our own fixed RNG:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We do a few tweaks for the Clustering and GaussianMixtures packages. 
Note that in BetaML we can also control both the random seed and the verbosity in the algorithm call, not only globally","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Random.seed!(seed)\n#logger = Logging.SimpleLogger(stdout, Logging.Error); global_logger(logger); ## For suppressing GaussianMixtures output\nnothing #hide","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Differently from the regression tutorial, we load the data here from [RDatasets](https://github.com/JuliaStats/RDatasets.jl), a package providing standard datasets.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"iris = dataset(\"datasets\", \"iris\")\ndescribe(iris)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The iris dataset provides floral measures in columns 1 to 4 and the assigned species name in column 5. 
There are no missing values","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Data-preparation","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Data preparation","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The first step is to prepare the data for the analysis. We collect the first 4 columns as our feature x matrix and the last one as our y label vector. As we are using clustering algorithms, we are not actually using the labels to train the algorithms, we'll behave like we do not know them, we'll just let the algorithm \"learn\" from the structure of the data itself. We'll however use it to judge the accuracy that the various algorithms reach.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"x = Matrix{Float64}(iris[:,1:4]);\nyLabels = unique(iris[:,5])","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"As the labels are expressed as strings, the first thing we do is encode them as integers for our analysis using the OrdinalEncoder model (data isn't really needed to be actually ordered):","category":"page"},{"location":"tutorials/Clustering - 
Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"y = fit!(OrdinalEncoder(categories=yLabels),iris[:,5])","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The dataset from RDatasets is ordered by species, so we need to shuffle it to avoid biases. Shuffling happens by default in cross_validation, but we are keeping here a copy of the shuffled version for later. Note that the version of consistent_shuffle that is included in BetaML accepts several n-dimensional arrays and shuffles them (by default on rows, but we can specify the dimension) keeping the association between the various arrays in the shuffled output.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"(xs,ys) = consistent_shuffle([x,y], rng=copy(AFIXEDRNG));\nnothing #hide","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Main-analysis","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Main analysis","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris 
dataset)","text":"We will try 3 BetaML models (KMeansClusterer, KMedoidsClusterer and GaussianMixtureClusterer) and we compare them with kmeans from Clustering.jl and GMM from GaussianMixtures.jl","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"KMeansClusterer and KMedoidsClusterer work by first initialising the centers of the k-clusters (step a ). These centers, also known as the \"representatives\", must be selected within the data for kmedoids, while for kmeans they are the geometrical centers.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Then ( step b ) the algorithms iterate over each point to assign the point to the cluster of the closest representative (according to a user defined distance metric, defaulting to Euclidean), and ( step c ) move each representative to the center of its newly acquired cluster (where \"center\" depends again on the metric).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Steps b and c are reiterated until the algorithm converges, i.e. the tentative k representative points (and their relative clusters) don't move any more. 
The result (output of the algorithm) is that each point is assigned to one of the clusters (classes).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The algorithm in GaussianMixtureClusterer is similar in that it employs an iterative approach (the Expectation-Maximisation algorithm, \"em\") but here we make the hypothesis that the data points are the observed outcomes of some mixture probabilistic model where we have first a k-categorical variable whose outcomes are the (unobservable) parameters of a probabilistic distribution from which the data is finally drawn. Because the parameters of each of the k-possible distributions are unobservable this is also called a model with latent variables.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Most gmm models use the Gaussian distribution as the family of the mixture components, so we can think of the gmm acronym as indicating Gaussian Mixture Model. 
In BetaML we have currently implemented only Gaussian components, but any distribution could be used by just subclassing AbstractMixture and implementing a couple of methods (you are invited to contribute or just ask for a distribution family you are interested in), so I prefer to think of \"gmm\" as an acronym for Generative Mixture Model.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The algorithm tries to find the mixture that maximises the likelihood that the data has indeed been generated from such a mixture, where the \"E\" step refers to computing the probability that each point belongs to each of the k components (somewhat similar to step b in the kmeans/kmedoids algorithms), and the \"M\" step estimates, given the association probabilities in step \"E\", the parameters of the mixture and of the individual components (similar to step c).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"The result here is that each point has a categorical distribution (PMF) representing the probabilities that it belongs to any of the k-components (our classes or clusters). This is interesting, as gmm can be used for many other things than clustering. It forms the backbone of the GaussianMixtureImputer model to impute missing values (on some or all dimensions) based on how close the record seems to its peers. 
For the same reasons, GaussianMixtureImputer can also be used to predict users' behaviours (or users' appreciation) according to the behaviour/ranking made by peers (\"collaborative filtering\").","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"While the result of GaussianMixtureClusterer is a vector of PMFs (one for each record), error measures and reports with the true values (if known) can be directly applied, as in BetaML they internally call mode() to retrieve the class with the highest probability for each record.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"As we are here, we also try different versions of the BetaML models, even if the default \"versions\" should be fine.
For KMeansClusterer and KMedoidsClusterer we will try different initialisation strategies (\"grid\", the default one, \"random\" and \"shuffle\"), while for the GaussianMixtureClusterer model we'll choose different distributions of the Gaussian family (SphericalGaussian - where the variance is a scalar, DiagonalGaussian - with a vector variance, and FullGaussian, where the covariance is a matrix).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"As the result would depend on stochasticity both in the data selected and in the random initialisation, we use a cross-validation approach to run our models several times (with different data) and then we average their results. Cross-validation in BetaML is very flexible and it is done using the cross_validation function. It is used by default for hyperparameter autotuning of the BetaML supervised models. cross_validation works by calling the function f, defined by the user, passing to it the tuple trainData, valData and rng and collecting the result of the function f. The specific method by which trainData and valData are selected at each iteration depends on the specific sampler.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We start by selecting a k-fold sampler that splits our data into 5 different parts; it uses 4 for training and 1 part (not used here) for validation.
We run the simulations three times (nrepeats=3) and, to be sure to have replicable results, we fix the random seed (at the whole cross-validation level, not on each iteration).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"sampler = KFold(nsplits=5,nrepeats=3,shuffle=true, rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We can now run the cross-validation with our models. Note that instead of defining the function f and then calling cross_validation(f(trainData,testData,rng),[x,y],...) we use the Julia do block syntax and we write directly the content of the f function in the do block. Also, by default cross_validation already returns the mean and the standard deviation of the output of the user-provided f function (or the do block). However, this requires that the f function returns a single scalar. Here we are returning a vector of the accuracies of the different models (so we can run the cross-validation only once), and hence we indicate with `return_statistics=false` to cross_validation not to attempt to generate statistics but rather to report the whole output.
We'll compute the statistics ex-post.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Inside the do block we do 4 things:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"we recover from trainData (a tuple, as we passed a tuple to cross_validation too) the xtrain features and ytrain labels;\nwe run the various clustering algorithms;\nwe use the real labels to compute the model accuracy. Note that the clustering algorithms know nothing about the specific label names or even their order.
This is why accuracy has the parameter ignorelabels to compute the accuracy over any possible permutation of the classes found.\nwe return the various models' accuracies","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"cOut = cross_validation([x,y],sampler,return_statistics=false) do trainData,testData,rng\n # For unsupervised learning we use only the train data.\n # Also, we use the associated labels only to measure the performances\n (xtrain,ytrain) = trainData;\n # We run the clustering algorithm and then we compute the accuracy using the real labels:\n estcl = fit!(KMeansClusterer(n_classes=3,initialisation_strategy=\"grid\",rng=rng),xtrain)\n kMeansGAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMeansClusterer(n_classes=3,initialisation_strategy=\"random\",rng=rng),xtrain)\n kMeansRAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMeansClusterer(n_classes=3,initialisation_strategy=\"shuffle\",rng=rng),xtrain)\n kMeansSAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMedoidsClusterer(n_classes=3,initialisation_strategy=\"grid\",rng=rng),xtrain)\n kMedoidsGAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMedoidsClusterer(n_classes=3,initialisation_strategy=\"random\",rng=rng),xtrain)\n kMedoidsRAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(KMedoidsClusterer(n_classes=3,initialisation_strategy=\"shuffle\",rng=rng),xtrain)\n kMedoidsSAccuracy = accuracy(ytrain,estcl,ignorelabels=true)\n estcl = fit!(GaussianMixtureClusterer(n_classes=3,mixtures=SphericalGaussian,rng=rng,verbosity=NONE),xtrain)\n gmmSpherAccuracy = accuracy(ytrain,estcl,ignorelabels=true, rng=rng)\n estcl =
fit!(GaussianMixtureClusterer(n_classes=3,mixtures=DiagonalGaussian,rng=rng,verbosity=NONE),xtrain)\n gmmDiagAccuracy = accuracy(ytrain,estcl,ignorelabels=true, rng=rng)\n estcl = fit!(GaussianMixtureClusterer(n_classes=3,mixtures=FullGaussian,rng=rng,verbosity=NONE),xtrain)\n gmmFullAccuracy = accuracy(ytrain,estcl,ignorelabels=true, rng=rng)\n # For comparison with Clustering.jl\n clusteringOut = Clustering.kmeans(xtrain', 3)\n kMeans2Accuracy = accuracy(ytrain,clusteringOut.assignments,ignorelabels=true)\n # For comparison with GaussianMixtures.jl - sometimes GaussianMixtures.jl em! fails with a PosDefException\n dGMM = GaussianMixtures.GMM(3, xtrain; method=:kmeans, kind=:diag)\n GaussianMixtures.em!(dGMM, xtrain)\n gmmDiag2Accuracy = accuracy(ytrain,GaussianMixtures.gmmposterior(dGMM, xtrain)[1],ignorelabels=true)\n fGMM = GaussianMixtures.GMM(3, xtrain; method=:kmeans, kind=:full)\n GaussianMixtures.em!(fGMM, xtrain)\n gmmFull2Accuracy = accuracy(ytrain,GaussianMixtures.gmmposterior(fGMM, xtrain)[1],ignorelabels=true)\n # Returning the accuracies\n return kMeansGAccuracy,kMeansRAccuracy,kMeansSAccuracy,kMedoidsGAccuracy,kMedoidsRAccuracy,kMedoidsSAccuracy,gmmSpherAccuracy,gmmDiagAccuracy,gmmFullAccuracy,kMeans2Accuracy,gmmDiag2Accuracy,gmmFull2Accuracy\n end\n\n# We transform the output into a matrix for easier analysis\naccuracies = fill(0.0,(length(cOut),length(cOut[1])))\n[accuracies[r,c] = cOut[r][c] for r in 1:length(cOut),c in 1:length(cOut[1])]\nμs = mean(accuracies,dims=1)\nσs = std(accuracies,dims=1)\n\n\nmodelLabels=[\"kMeansG\",\"kMeansR\",\"kMeansS\",\"kMedoidsG\",\"kMedoidsR\",\"kMedoidsS\",\"gmmSpher\",\"gmmDiag\",\"gmmFull\",\"kMeans (Clustering.jl)\",\"gmmDiag (GaussianMixtures.jl)\",\"gmmFull (GaussianMixtures.jl)\"]\n\nreport = DataFrame(mName = modelLabels, avgAccuracy = dropdims(round.(μs',digits=3),dims=2), stdAccuracy = dropdims(round.(σs',digits=3),dims=2))","category":"page"},{"location":"tutorials/Clustering -
Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Accuracies (mean and its standard dev.) running this script with different random seeds (123, 1000 and 10000):","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"model μ 1 σ 1 μ 2 σ 2 μ 3 σ 3\n│ kMeansG 0.891 0.017 0.892 0.012 0.893 0.017\n│ kMeansR 0.866 0.083 0.831 0.127 0.836 0.114\n│ kMeansS 0.764 0.174 0.822 0.145 0.779 0.170\n│ kMedoidsG 0.894 0.015 0.896 0.012 0.894 0.017\n│ kMedoidsR 0.804 0.144 0.841 0.123 0.825 0.134\n│ kMedoidsS 0.893 0.018 0.834 0.130 0.877 0.085\n│ gmmSpher 0.893 0.016 0.891 0.016 0.895 0.017\n│ gmmDiag 0.917 0.022 0.912 0.016 0.916 0.014\n│ gmmFull 0.970 0.035 0.982 0.013 0.981 0.009\n│ kMeans (Clustering.jl) 0.856 0.112 0.873 0.083 0.873 0.089\n│ gmmDiag (GaussianMixtures.jl) 0.865 0.127 0.872 0.090 0.833 0.152\n│ gmmFull (GaussianMixtures.jl) 0.907 0.133 0.914 0.160 0.917 0.141","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We can see that running the script multiple times with different random seeds confirms the estimated standard deviations collected with cross_validation, with the BetaML GMM-based models and the grid-based ones being the most stable.","category":"page"},{"location":"tutorials/Clustering -
Iris/betaml_tutorial_cluster_iris.html#BetaML-model-accuracies","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"BetaML model accuracies","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"From the output we see that, for this dataset, the gmm models generally perform better than the kmeans or kmedoids algorithms, and they also have very low variances. In detail, it is the (default) grid initialisation that leads to the best results for kmeans and kmedoids, while for the gmm models it is the FullGaussian that performs best.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Comparisions-with-Clustering.jl-and-GaussianMixtures.jl","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Comparisions with Clustering.jl and GaussianMixtures.jl","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"For this specific case, both Clustering.jl and GaussianMixtures.jl report substantially worse accuracies, and with very high variances. But we maintain the ranking that Full Gaussian gmm > Diagonal Gaussian > Kmeans accuracy. I suspect the reason that BetaML gmm works so well is related to the use of the kmeans algorithm for the initialisation of the mixtures, itself initialised with a \"grid\" approach.
The grid initialisation \"guarantees\" indeed that the initial means of the mixture components are well spread across the multidimensional space defined by the data, and it helps prevent the EM algorithm from converging to a bad local optimum.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Working-without-the-labels","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Working without the labels","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Up to now we used the real labels to compare the model accuracies. But in real clustering examples we don't have the true classes, or we wouldn't need to do clustering in the first place, so we don't know the number of classes to use. There are several methods to judge the goodness of clustering algorithms. For likelihood-based algorithms such as GaussianMixtureClusterer we can use an information criterion that trades off the goodness of the likelihood against the number of parameters used to do the fit. BetaML provides by default in the gmm clustering outputs both the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), where for both a lower value is better.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We can then run the model with different numbers of classes and see which one leads to the lowest BIC or AIC. We hence run cross_validation again with the FullGaussian gmm model.
Note that we use the BIC/AIC criteria here for establishing the \"best\" number of classes but we could have used them also to select the kind of Gaussian distribution to use. This is one example of hyper-parameter tuning that we develop in more detail using autotuning in the regression tutorial.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Let's try up to 4 possible classes:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"K = 4\nsampler = KFold(nsplits=5,nrepeats=2,shuffle=true, rng=copy(AFIXEDRNG))\ncOut = cross_validation([x,y],sampler,return_statistics=false) do trainData,testData,rng\n (xtrain,ytrain) = trainData;\n BICS = []\n AICS = []\n for k in 1:K\n m = GaussianMixtureClusterer(n_classes=k,mixtures=FullGaussian,rng=rng,verbosity=NONE)\n fit!(m,xtrain)\n push!(BICS,info(m)[\"BIC\"])\n push!(AICS,info(m)[\"AIC\"])\n end\n return (BICS,AICS)\nend\n\n# Transforming the output in matrices for easier analysis\nNit = length(cOut)\n\nBICS = fill(0.0,(Nit,K))\nAICS = fill(0.0,(Nit,K))\n[BICS[r,c] = cOut[r][1][c] for r in 1:Nit,c in 1:K]\n[AICS[r,c] = cOut[r][2][c] for r in 1:Nit,c in 1:K]\n\nμsBICS = mean(BICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"σsBICS =
std(BICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"μsAICS = mean(AICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"σsAICS = std(AICS,dims=1)","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"plot(1:K,[μsBICS' μsAICS'], labels=[\"BIC\" \"AIC\"], title=\"Information criteria by number of classes\", xlabel=\"number of classes\", ylabel=\"lower is better\")","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We see that following the \"lowest AIC\" rule we would indeed choose three classes, while following the \"lowest BIC\" criterion we would have chosen only two classes. This means that there are two classes that, concerning the floral measures used in the database, are very similar, and our models are unsure about them.
Perhaps the biologists will end up one day with the conclusion that it is indeed only one species :-).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We could study this issue in more detail by analysing the ConfusionMatrix, but the one used in BetaML does not account for the ignorelabels option (yet).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Analysing-the-silhouette-of-the-cluster","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Analysing the silhouette of the cluster","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"A further metric to analyse cluster output is the so-called Silhouette method","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Silhouette is a distance-based metric and requires as first argument a matrix of pairwise distances. This can be computed with the pairwise function, which defaults to using l2_distance (i.e. Euclidean).
Many other distance functions are available in the Clustering sub-module or one can use the efficiently implemented distances from the Distances package, as in this example.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We'll use here the silhouette function over a simple loop:","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"x,y = consistent_shuffle([x,y],dims=1)\nimport Distances\npd = pairwise(x,distance=Distances.euclidean) # we compute the pairwise distances\nnclasses = 2:6\nmodels = [KMeansClusterer, KMedoidsClusterer, GaussianMixtureClusterer]\nprintln(\"Silhouette score by model type and class number:\")\nfor ncl in nclasses, mtype in models\n m = mtype(n_classes=ncl, verbosity=NONE)\n ŷ = fit!(m,x)\n if mtype == GaussianMixtureClusterer\n ŷ = mode(ŷ)\n end\n s = mean(silhouette(pd,ŷ))\n println(\"$mtype \\t ($ncl classes): $s\")\nend","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"Higher values are better.
We see again that 2 classes have better scores!","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html#Conclusions","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"Conclusions","text":"","category":"section"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"We have shown in this tutorial how we can easily run clustering algorithms in BetaML with just one line of code fit!(ChosenClusterer(),x), but also how we can use cross-validation to help the model or parameter selection, with or without knowing the real classes. We retrieve here what we observed with supervised models. Globally the accuracy of BetaML models is comparable to that of leading specialised packages (in this case it is even better), but there is a significant gap in computational efficiency that restricts the practical usage of BetaML to datasets that fit in the pc memory.
However, we trade this relative inefficiency for very flexible model definitions and utility functions (for example GaussianMixtureClusterer works with missing data, allowing it to be used as the backbone of the GaussianMixtureImputer missing imputation function, or for collaborative recommendation systems).","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"View this file on GitHub.","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"","category":"page"},{"location":"tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html","page":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","title":"A clustering task: the prediction of plant species from floreal measures (the iris dataset)","text":"This page was generated using Literate.jl.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"EditURL = \"betaml_tutorial_regression_sharingBikes.jl\"","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#regression_tutorial","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"","category":"section"},{"location":"tutorials/Regression - bike
sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The task is to estimate the influence of several variables (like the weather, the season, the day of the week...) on the demand of shared bicycles, so that the authority in charge of the service can organise the service in the best way.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Data origin:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"original full dataset (by hour, not used here): https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset\nsimplified dataset (by day, with some simple scaling): https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/aec\ndescription: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/exam2019ace.pdf\ndata: https://www.hds.utc.fr/~tdenoeux/dokuwiki/media/en/bikesharing_day.csv.zip","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that even if we are estimating a time series, we are not using here a recurrent neural network as we assume the temporal dependence to be negligible (i.e.
Y_t = f(X_t) alone).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Library-and-data-loading","page":"A regression task: the prediction of bike sharing demand","title":"Library and data loading","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Activating the local environment specific to","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"using Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We first load all the packages we are going to use","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"using LinearAlgebra, Random, Statistics, StableRNGs, DataFrames, CSV, Plots, Pipe, BenchmarkTools, BetaML\nimport Distributions: Uniform, DiscreteUniform\nimport DecisionTree, Flux ## For comparisons","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Here we are explicit and we use our own fixed
RNG:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Here we load the data from a csv provided by the BetaML package","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"basedir = joinpath(dirname(pathof(BetaML)),\"..\",\"docs\",\"src\",\"tutorials\",\"Regression - bike sharing\")\ndata = CSV.File(joinpath(basedir,\"data\",\"bike_sharing_day.csv\"),delim=',') |> DataFrame\ndescribe(data)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The variable we want to learn to predict is cnt, the total demand of bikes for a given day.
Even if it is indeed an integer, we treat it as a continuous variable, so each single prediction will be a scalar Y ∈ ℝ.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"plot(data.cnt, title=\"Daily bike sharing rents (2Y)\", label=nothing)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Decision-Trees","page":"A regression task: the prediction of bike sharing demand","title":"Decision Trees","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We start our regression task with Decision Trees.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Decision tree training consists in choosing the set of questions (in a hierarchical way, so as to form indeed a \"decision tree\") that \"best\" split the dataset given for training, in the sense that the split generates the sub-samples (always 2 sub-samples in the BetaML implementation) that are, for the characteristic we want to predict, as homogeneous as possible.
Decision trees are one of the few ML algorithms that have an intuitive interpretation and can be used for both regression and classification tasks.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Data-preparation","page":"A regression task: the prediction of bike sharing demand","title":"Data preparation","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The first step is to prepare the data for the analysis. This already depends on the model we want to employ, as some models \"accept\" almost everything as input, no matter if the data is numerical or categorical, if it has missing values or not... while other models are instead much more demanding, and require more work to \"clean up\" our dataset.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The tutorial starts using Decision Tree and Random Forest models that definitely belong to the first group, so the only thing we have to do is to select the input variables (the \"feature matrix\", that we will indicate with \"X\") and the variable representing our output (the information we want to learn to predict, we call it \"y\"):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"x = Matrix{Float64}(data[:,[:instant,:season,:yr,:mnth,:holiday,:weekday,:workingday,:weathersit,:temp,:atemp,:hum,:windspeed]])\ny = 
data[:,16];\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We finally set up a dataframe to store the relative mean errors of the various models we'll use.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"results = DataFrame(model=String[],train_rme=Float64[],test_rme=Float64[])","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Model-selection","page":"A regression task: the prediction of bike sharing demand","title":"Model selection","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now split the dataset between the data that we will use for training the algorithm and selecting the hyperparameters (xtrain/ytrain) and those for testing the quality of the algorithm with the optimal hyperparameters (xtest/ytest). We use the partition function specifying the share we want to use for these two different subsets, here 75% and 25% respectively. 
As our data is a time series, and we want our model to be able to predict future bike sharing demand from past observed rentals, we do not shuffle the dataset as would be done by default.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.75,1-0.75],shuffle=false)\n(ntrain, ntest) = size.([ytrain,ytest],1)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Then we define the model we want to use, DecisionTreeEstimator in this case, and we create an instance of the model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"m = DecisionTreeEstimator(autotune=true, rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Passing a fixed Random Number Generator (RNG) to the rng parameter guarantees that every time we use the model with the same data (from the model creation down to value prediction) we obtain the same results. In particular BetaML provides FIXEDRNG, an instance of StableRNG that guarantees reproducibility even across different Julia versions. See the section \"Dealing with stochasticity\" for details. Note the autotune parameter. 
BetaML offers what is perhaps the easiest method for automatically tuning the model hyperparameters (which in this way become learned parameters). Indeed, in most cases it is enough to pass autotune=true to the model constructor, and the hyperparameter search will be performed automatically on the first fit! call. If needed, we can customise hyperparameter tuning by choosing the tuning method with the tunemethod parameter. The single line above is equivalent to:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"tuning_method = SuccessiveHalvingSearch(\n hpranges = Dict(\"max_depth\" =>[5,10,nothing], \"min_gain\"=>[0.0, 0.1, 0.5], \"min_records\"=>[2,3,5],\"max_features\"=>[nothing,5,10,30]),\n loss = l2loss_by_cv,\n res_shares = [0.05, 0.2, 0.3],\n multithreads = true\n )\nm_dt = DecisionTreeEstimator(autotune=true, rng=copy(AFIXEDRNG), tunemethod=tuning_method)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that the defaults change according to the specific model, for example RandomForestEstimator autotuning defaults to not being multithreaded, as the individual model is already multithreaded.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"tip: Tip\nRefer to the versions of this tutorial for BetaML <= 0.6 for a good exercise on how to perform model selection using the cross_validation function, or even by custom grid 
search.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now fit the model, that is learn the model parameters that lead to the best predictions from the data. By default (unless we use cache=false in the model constructor) the model stores also the training predictions, so we can just use fit!() instead of fit!() followed by predict(model,xtrain)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrain = fit!(m_dt,xtrain,ytrain)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The above code produces a fitted DecisionTreeEstimator object that can be used to make predictions given some new features, i.e. given a new X matrix of (number of observations x dimensions), predict the corresponding Y vector of scalars in R.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtest = predict(m_dt, xtest)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now compute the mean relative error for the training and the test set. 
The relative_mean_error is a very flexible error function. Without additional parameters, it computes, as the name says, the relative mean error between an estimated and a true vector. However it can also compute the mean relative error, also known as the \"mean absolute percentage error\" (MAPE), or use a p-norm higher than 1. The mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more. In this tutorial we use the latter, as our data clearly has some outlier days with very few rentals, and we care more about avoiding our customers finding empty bike racks than about having unrented bikes on the rack. Targeting a low mean relative error would push all our predictions down to try to accommodate the low-demand days (to avoid a large relative error), and that's not what we want.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can then compute the relative mean error for the decision tree","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"rme_train = relative_mean_error(ytrain,ŷtrain) # 0.1367\nrme_test = relative_mean_error(ytest,ŷtest) # 0.1547","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"And we save the relative mean errors in the results 
dataframe:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"push!(results,[\"DT\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can plot the true labels vs the estimated ones for the two subsets...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrain,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in training period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtest,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Or we can visualise the true vs estimated bike demand over time. 
First on the full period (2 years) ...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"...and then focusing on the testing period","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = ntrain\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=[\"obs\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (DT)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The predictions aren't so bad in this case; however, decision trees are highly unstable, and the output may have depended just on the specific initial random seed.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Random-Forests","page":"A regression task: the prediction of bike 
sharing demand","title":"Random Forests","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Rather than trying to solve this problem using a single Decision Tree model, let's now try to use a Random Forest model. Random forests average the results of many different decision trees and provide a more \"stable\" result. Being made of many decision trees, random forests are however more computationally expensive to train.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"m_rf = RandomForestEstimator(autotune=true, oob=true, rng=copy(AFIXEDRNG))\nŷtrain = fit!(m_rf,xtrain,ytrain);\nŷtest = predict(m_rf,xtest);\nrme_train = relative_mean_error(ytrain,ŷtrain) # 0.056\nrme_test = relative_mean_error(ytest,ŷtest) # 0.161\npush!(results,[\"RF\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"While slower than individual decision trees, random forests remain relatively fast. 
We should also consider that they are by default efficiently parallelised, so their speed increases with the number of available cores (in building this documentation page, GitHub CI servers allow for a single core, so all the benchmarks you see in this tutorial are run with a single core available).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Random forests support the so-called \"out-of-bag\" error, an estimation of the error that we would have when the model is applied to a testing sample. However in this case the reported oob error is much smaller than the testing error we will actually find. This is due to the fact that the division between training/validation and testing in this exercise is not random, but has a temporal basis. It seems that in this example the data in validation/testing follow a different pattern/variance than those in training (in probabilistic terms, the daily observations are not i.i.d.).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"info(m_rf)\noob_error, rme_test = info(m_rf)[\"oob_errors\"],relative_mean_error(ytest,ŷtest)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"In this case we found an error very similar to the one employing a single decision tree. 
Let's print the observed data vs the estimated one using the random forest and then along the temporal axis:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrain,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in training period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtest,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Full period plot (2 years):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing 
demand","text":"Focus on the testing period:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = 620\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtrainfull[stc:endc] ŷtestfull[stc:endc]], label=[\"obs\" \"val\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Comparison-with-DecisionTree.jl-random-forest","page":"A regression task: the prediction of bike sharing demand","title":"Comparison with DecisionTree.jl random forest","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now compare our results with those obtained employing the same model in the DecisionTree package, using the hyperparameters of the optimal BetaML Random forest model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"best_rf_hp = hyperparameters(m_rf)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Hyperparameters of the DecisionTree.jl random forest model","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A 
regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"n_subfeatures=isnothing(best_rf_hp.max_features) ? -1 : best_rf_hp.max_features; n_trees=best_rf_hp.n_trees; partial_sampling=0.7; max_depth=isnothing(best_rf_hp.max_depth) ? typemax(Int64) : best_rf_hp.max_depth;\nmin_samples_leaf=best_rf_hp.min_records; min_samples_split=best_rf_hp.min_records; min_purity_increase=best_rf_hp.min_gain;\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We train the model..","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"model = DecisionTree.build_forest(ytrain, convert(Matrix,xtrain),\n n_subfeatures,\n n_trees,\n partial_sampling,\n max_depth,\n min_samples_leaf,\n min_samples_split,\n min_purity_increase;\n rng = seed)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"And we generate predictions and measure their error","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(ŷtrain,ŷtest) = DecisionTree.apply_forest.([model],[xtrain,xtest]);\n\n\n(rme_train, rme_test) = relative_mean_error.([ytrain,ytest],[ŷtrain,ŷtest]) # 0.022 and 0.304\npush!(results,[\"RF 
(DecisionTree.jl)\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"While the train error is very small, the error on the test set remains relatively high. The very low error level on the training set is a sign that the model overspecialised on the training set, and we would have done better to run a dedicated hyper-parameter tuning function for the DecisionTree.jl model (we did try using the default DecisionTree.jl parameters, but we obtained roughly the same results).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Finally we plot the DecisionTree.jl predictions alongside the observed values:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (DT.jl RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Again, focusing on the testing data:","category":"page"},{"location":"tutorials/Regression - bike 
sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = ntrain\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=[\"obs\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (DT.jl RF)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Conclusions-of-Decision-Trees-/-Random-Forests-methods","page":"A regression task: the prediction of bike sharing demand","title":"Conclusions of Decision Trees / Random Forests methods","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The error obtained employing DecisionTree.jl is significantly larger than that obtained using a BetaML random forest model, although, to be fair to DecisionTree.jl, we didn't tune its hyper-parameters. Also, the DecisionTree.jl random forest model is much faster. This is partially due to the fact that, internally, DecisionTree.jl models optimise the algorithm by sorting the observations. BetaML trees/forests don't employ this optimisation and hence they can work with true categorical data for which ordering is not defined. Another explanation of this difference in speed is that BetaML Random Forest models accept missing values within the feature matrix. 
To sum up, BetaML random forests are ideal algorithms when we want to obtain good predictions in the simplest way, even without manually tuning the hyper-parameters, and without spending time cleaning (\"munging\") the feature matrix, as they accept almost \"any kind\" of data as it is.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Neural-Networks","page":"A regression task: the prediction of bike sharing demand","title":"Neural Networks","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"BetaML provides only deep feed-forward neural networks, artificial neural networks where the individual units (\"nodes\") are arranged in layers, from the input layer, where each unit holds the input coordinate, through various hidden layer transformations, until the actual output of the model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(Image: Neural Networks)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"In this layerwise computation, each unit in a particular layer takes input from all the units of the preceding layer and has its own parameters that are adjusted to perform the overall computation. The training of the network consists in retrieving the coefficients that minimise a loss function between the output of the model and the known data. 
In particular, a deep (feedforward) neural network refers to a neural network that contains not only the input and output layers, but also (a variable number of) hidden layers in between.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Neural networks accept only numerical inputs. We hence need to convert all categorical data into numerical values. A common approach is to use the so-called \"one-hot encoding\", where the categorical values are converted into indicator variables (0/1), one for each possible value. This can be done in BetaML using the OneHotEncoder model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"seasonDummies = fit!(OneHotEncoder(),data.season)\nweatherDummies = fit!(OneHotEncoder(),data.weathersit)\nwdayDummies = fit!(OneHotEncoder(),data.weekday .+ 1)\n\n\n# We compose the feature matrix with the new dimensions obtained from the onehotencoder functions\nx = hcat(Matrix{Float64}(data[:,[:instant,:yr,:mnth,:holiday,:workingday,:temp,:atemp,:hum,:windspeed]]),\n seasonDummies,\n weatherDummies,\n wdayDummies)\ny = data[:,16];\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"As we did for decision trees / random forests, we split the data into training and testing sets","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the 
prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.75,1-0.75],shuffle=false)\n(ntrain, ntest) = size.([ytrain,ytest],1)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Another common operation with neural networks is to scale the feature vectors (X) and the labels (Y). The BetaML Scaler model, by default, scales the data such that each dimension has mean 0 and variance 1.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that we can provide the Scaler model with different scale factors or specify the columns that shouldn't be scaled (e.g. those resulting from the one-hot encoding). 
Finally we can reverse the scaling (this is useful to retrieve the unscaled features from a model trained with scaled ones).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"cols_nottoscale = [2;4;5;10:23]\nxsm = Scaler(skip=cols_nottoscale)\nxtrain_scaled = fit!(xsm,xtrain)\nxtest_scaled = predict(xsm,xtest)\nytrain_scaled = ytrain ./ 1000 # We just divide Y by 1000, as using full scaling of Y we may get negative demand.\nytest_scaled = ytest ./ 1000\nD = size(xtrain,2)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now build our feed-forward neural network. We create three layers. The first layer always has an input size equal to the dimensions of our data (the number of columns), and the output layer, for a simple regression where the predictions are scalars, always has size one. We will tune the size of the middle layer.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"There are already several kinds of layers available (and you can build your own kind by defining a new struct and implementing a few functions. See the Nn module documentation for details). 
Here we use only dense layers, those found in typical feed-forward neural networks.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"For each layer, on top of its size (in \"neurons\") we can specify an activation function. Here we use the relu for the terminal layer (this will guarantee that our predictions are never negative) and identity for the hidden layer. Again, consult the Nn module documentation for other activation functions already defined, or use any function of your choice.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Initial weight parameters can also be specified if needed. 
By default DenseLayer uses the so-called Xavier initialisation.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Let's hence build our candidate neural network structures, choosing between 5 and 10 nodes in the hidden layers:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"candidate_structures = [\n [DenseLayer(D,k,f=relu,df=drelu,rng=copy(AFIXEDRNG)), # Activation function is ReLU, its derivative is drelu\n DenseLayer(k,k,f=identity,df=didentity,rng=copy(AFIXEDRNG)), # This is the hidden layer for which we want to test various sizes\n DenseLayer(k,1,f=relu,df=drelu,rng=copy(AFIXEDRNG))] for k in 5:2:10]","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Note that specifying the derivatives of the activation functions (and of the loss function that we'll see in a moment) is totally optional, as without them BetaML will use [Zygote.jl](https://github.com/FluxML/Zygote.jl) for automatic differentiation.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We also set a few other parameters as \"tunable\": the number of \"epochs\" to train the model (the number of iterations through the whole dataset), the sample size at each batch and the optimisation algorithm to 
use. Several optimisation algorithms are indeed available, and each accepts different parameters, like the learning rate for the Stochastic Gradient Descent algorithm (SGD, used by default) or the exponential decay rates for the moment estimates for the ADAM algorithm (that we use here, with the default parameters).","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The hyperparameter ranges will then look as follows:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"hpranges = Dict(\"layers\" => candidate_structures,\n \"epochs\" => rand(copy(AFIXEDRNG),DiscreteUniform(50,100),3), # 3 values sampled at random between 50 and 100\n \"batch_size\" => [4,8,16],\n \"opt_alg\" => [SGD(λ=2),SGD(λ=1),SGD(λ=3),ADAM(λ=0.5),ADAM(λ=1),ADAM(λ=0.25)])","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Finally we can build the NeuralNetworkEstimator model, where we \"chain\" the layers together and assign a final loss function (again, you can provide your own loss function, if those available in BetaML don't suit your needs):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"nnm = NeuralNetworkEstimator(loss=squared_cost, descr=\"Bike sharing regression 
model\", tunemethod=SuccessiveHalvingSearch(hpranges = hpranges), autotune=true,rng=copy(AFIXEDRNG)) # Build the NN model and use the squared cost (aka MSE) as error function by default","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We can now fit and autotune the model:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrain_scaled = fit!(nnm,xtrain_scaled,ytrain_scaled)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The model training is one order of magnitude slower than random forests, although the memory requirement is approximately the same.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"To obtain the neural network predictions we apply the function predict to the feature matrix X for which we want to generate predictions, and then we rescale y. 
Normally we would apply the inverse_predict function here, but as we simply divided by 1000, we multiply ŷ by the same amount:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrain = ŷtrain_scaled .* 1000\nŷtest = predict(nnm,xtest_scaled) .* 1000","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(rme_train, rme_test) = relative_mean_error.([ŷtrain,ŷtest],[ytrain,ytest])\npush!(results,[\"NN\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"The error is much lower. Let's plot our predictions:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Again, we can start by plotting the estimated vs the observed value:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrain,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. 
obs in training period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtest,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now plot across the time dimension, first plotting the whole period (2 years):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))\nŷtestfull = vcat(fill(missing,ntrain), ŷtest)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"...and then focusing on the testing data","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = 620\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], 
label=[\"obs\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Comparison-with-Flux.jl","page":"A regression task: the prediction of bike sharing demand","title":"Comparison with Flux.jl","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We now apply the same Neural Network model using the Flux framework, a dedicated neural network library, reusing the optimal parameters that we learned from tuning NeuralNetworkEstimator:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"hp_opt = hyperparameters(nnm)\nopt_size = size(hp_opt.layers[1])[2][1]\nopt_batch_size = hp_opt.batch_size\nopt_epochs = hp_opt.epochs","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We fix the default random number generator so that the Flux example gives a reproducible output","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Random.seed!(seed)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of 
bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We define the Flux neural network model and load it with data...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"l1 = Flux.Dense(D,opt_size,Flux.relu)\nl2 = Flux.Dense(opt_size,opt_size,identity)\nl3 = Flux.Dense(opt_size,1,Flux.relu)\nFlux_nn = Flux.Chain(l1,l2,l3)\nfluxloss(x, y) = Flux.mse(Flux_nn(x), y)\nps = Flux.params(Flux_nn)\nnndata = Flux.Data.DataLoader((xtrain_scaled', ytrain_scaled'), batchsize=opt_batch_size,shuffle=true)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We train the Flux model...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"[Flux.train!(fluxloss, ps, nndata, Flux.ADAM(0.001, (0.9, 0.8))) for i in 1:opt_epochs]","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We obtain the predictions...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainf = @pipe Flux_nn(xtrain_scaled')' .* 1000;\nŷtestf = @pipe 
Flux_nn(xtest_scaled')' .* 1000;\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"..and we compute the mean relative errors..","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"(rme_train, rme_test) = relative_mean_error.([ŷtrainf,ŷtestf],[ytrain,ytest])\npush!(results,[\"NN (Flux.jl)\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":".. finding an error not significantly different than the one obtained from BetaML.Nn.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Plots:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytrain,ŷtrainf,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. 
obs in training period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"scatter(ytest,ŷtestf,xlabel=\"daily rides\",ylabel=\"est. daily rides\",label=nothing,title=\"Est vs. obs in testing period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainfullf = vcat(ŷtrainf,fill(missing,ntest))\nŷtestfullf = vcat(fill(missing,ntrain), ŷtestf)\nplot(data[:,:dteday],[data[:,:cnt] ŷtrainfullf ŷtestfullf], label=[\"obs\" \"train\" \"test\"], legend=:topleft, ylabel=\"daily rides\", title=\"Daily bike sharing demand observed/estimated across the\\n whole 2-years period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"stc = 620\nendc = size(x,1)\nplot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfullf[stc:endc]], label=[\"obs\" \"test\"], legend=:bottomleft, ylabel=\"Daily rides\", title=\"Focus on the testing period (Flux.NN)\")","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Conclusions-of-Neural-Network-models","page":"A regression task: the prediction of bike sharing demand","title":"Conclusions of Neural Network models","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the 
prediction of bike sharing demand","text":"If we strive for the most accurate predictions, deep neural networks are usually the best choice. However they are computationally expensive, so with limited resources we may get better results by fine tuning and running many repetitions of \"simpler\" decision trees or even random forest models than from a large neural network with insufficient hyper-parameter tuning. Also, we should consider that decision trees/random forests are much simpler to work with.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"That said, specialised neural network libraries, like Flux, allow the use of GPUs and specialised hardware, letting neural networks scale to very large datasets.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Still, for small and medium datasets, BetaML provides simpler yet customisable solutions that are accurate and fast.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#GMM-based-regressors","page":"A regression task: the prediction of bike sharing demand","title":"GMM-based regressors","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"BetaML 0.8 introduces new regression algorithms based on Gaussian Mixture Models. 
Specifically, there are two variants available, GaussianMixtureRegressor2 and GaussianMixtureRegressor, and this example uses GaussianMixtureRegressor. As for neural networks, they work on numerical data only, so we reuse the datasets we prepared for the neural networks.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"As usual we first define the model.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"m = GaussianMixtureRegressor(rng=copy(AFIXEDRNG),verbosity=NONE)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"info: Info\nWe disabled autotune here, as this code is run by GitHub continuous integration servers on each code update, and GitHub servers seem to have some strange problem with it, taking almost 4 hours instead of a few seconds on my machine.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"We then fit the model to the training data..","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainGMM_unscaled = 
fit!(m,xtrain_scaled,ytrain_scaled)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"And we predict...","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"ŷtrainGMM = ŷtrainGMM_unscaled .* 1000;\nŷtestGMM = predict(m,xtest_scaled) .* 1000;\n\n(rme_train, rme_test) = relative_mean_error.([ŷtrainGMM,ŷtestGMM],[ytrain,ytest])\npush!(results,[\"GMM\",rme_train,rme_test]);\nnothing #hide","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html#Summary","page":"A regression task: the prediction of bike sharing demand","title":"Summary","text":"","category":"section"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"This is the summary of the results (train and test relative mean error) we had trying to predict the daily bike sharing demand, given weather and calendar information:","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"println(results)","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"You may ask how stable are 
these results? How much do they depend on the specific RNG seed? We re-ran the whole script a couple of times, changing the random seed (to 1000 and 10000):","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Model Train rme1 Test rme1 Train rme2 Test rme2 Train rme3 Test rme3\nDT 0.1366960 0.154720 0.0233044 0.249329 0.0621571 0.161657\nRF 0.0421267 0.180186 0.0535776 0.136920 0.0386144 0.141606\nRF (DecisionTree.jl) 0.0230439 0.235823 0.0801040 0.243822 0.0168764 0.219011\nNN 0.1604000 0.169952 0.1091330 0.121496 0.1481440 0.150458\nNN (Flux.jl) 0.0931161 0.166228 0.0920796 0.167047 0.0907810 0.122469\nGaussianMixtureRegressor* 0.1432800 0.293891 0.1380340 0.295470 0.1477570 0.284567","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"* GMM is a deterministic model; the variations are due to the different random sampling in choosing the best hyperparameters","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"Neural networks can be more precise than random forest models, but are more computationally expensive (and tricky to set up). When we compared BetaML with the algorithm-specific leading packages, we found similar results in terms of accuracy, but often the leading packages are better optimised and run more efficiently (but sometimes at the cost of being less versatile). 
GMM-based regressors are very computationally cheap and a good compromise if accuracy can be traded off for performance.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"View this file on GitHub.","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"","category":"page"},{"location":"tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html","page":"A regression task: the prediction of bike sharing demand","title":"A regression task: the prediction of bike sharing demand","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Clustering.html#clustering_module","page":"Clustering","title":"The BetaML.Clustering Module","text":"","category":"section"},{"location":"Clustering.html","page":"Clustering","title":"Clustering","text":"Clustering","category":"page"},{"location":"Clustering.html#BetaML.Clustering","page":"Clustering","title":"BetaML.Clustering","text":"Clustering module (WIP)\n\n(Hard) Clustering algorithms \n\nProvide hard clustering methods using K-means and K-medoids. Please see also the GMM module for GMM-based soft clustering (i.e. where a probability distribution to be part of the various classes is assigned to each record instead of a single class), missing values imputation / collaborative filtering / recommendation systems using clustering methods as backend.\n\nThe module provides the following models. 
Use ?[model] to access their documentation:\n\nKMeansClusterer: Classical K-means algorithm\nKMedoidsClusterer: K-medoids algorithm with configurable distance metric\n\nSome metrics of the clustered output are available (e.g. silhouette).\n\n\n\n\n\n","category":"module"},{"location":"Clustering.html#Module-Index","page":"Clustering","title":"Module Index","text":"","category":"section"},{"location":"Clustering.html","page":"Clustering","title":"Clustering","text":"Modules = [Clustering]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Clustering.html#Detailed-API","page":"Clustering","title":"Detailed API","text":"","category":"section"},{"location":"Clustering.html","page":"Clustering","title":"Clustering","text":"Modules = [Clustering]\nPrivate = false","category":"page"},{"location":"Clustering.html#BetaML.Clustering.KMeansC_hp","page":"Clustering","title":"BetaML.Clustering.KMeansC_hp","text":"mutable struct KMeansC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the KMeansClusterer model\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. 
One of the following:\n\"random\": randomly in the X space [default]\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points\n\"given\": using a provided set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\n\n\n\n\n\n","category":"type"},{"location":"Clustering.html#BetaML.Clustering.KMeansClusterer","page":"Clustering","title":"BetaML.Clustering.KMeansClusterer","text":"mutable struct KMeansClusterer <: BetaMLUnsupervisedModel\n\nThe classical \"K-Means\" clustering algorithm (unsupervised).\n\nLearn to partition the data and assign each record to one of the n_classes classes according to a distance metric (default Euclidean).\n\nFor the parameters see ?KMeansC_hp and ?BML_options.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported by using the \"old\" representatives as init ones\n\nExample :\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]\n5×2 Matrix{Float64}:\n 1.1 10.1\n 0.9 9.8\n 10.0 1.1\n 12.1 0.8\n 0.8 9.8\n\njulia> mod = KMeansClusterer(n_classes=2)\nKMeansClusterer - A K-Means Model (unfitted)\n\njulia> classes = fit!(mod,X)\n5-element Vector{Int64}:\n 1\n 1\n 2\n 2\n 1\n\njulia> newclasses = fit!(mod,[11 0.9])\n1-element Vector{Int64}:\n 2\n\njulia> info(mod)\nDict{String, Any} with 2 entries:\n \"fitted_records\" => 6\n \"av_distance_last_fit\" => 0.0\n \"xndims\" => 2\n\njulia> parameters(mod)\nBetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)\n- representatives: [1.13366 9.7209; 11.0 0.9]\n\n\n\n\n\n","category":"type"},{"location":"Clustering.html#BetaML.Clustering.KMedoidsC_hp","page":"Clustering","title":"BetaML.Clustering.KMedoidsC_hp","text":"mutable struct KMedoidsC_hp <: 
BetaMLHyperParametersSet\n\nHyperparameters for the KMedoidsClusterer model\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Defaults to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Note that the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:\n\"random\": randomly in the X space\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points [default]\n\"given\": using a set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\n\n\n\n\n\n","category":"type"},{"location":"Clustering.html#BetaML.Clustering.KMedoidsClusterer","page":"Clustering","title":"BetaML.Clustering.KMedoidsClusterer","text":"mutable struct KMedoidsClusterer <: BetaMLUnsupervisedModel\n\nThe classical \"K-Medoids\" clustering algorithm (unsupervised).\n\nSimilar to K-Means, it learns to partition the data and assign each record to one of the n_classes classes according to a distance metric, but the \"representatives\" (the centroids) are guaranteed to be one of the training points. 
The algorithm works with any arbitrary distance measure (default Euclidean).\n\nFor the parameters see ?KMedoidsC_hp and ?BML_options.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported by using the \"old\" representatives as init ones\nwith initialisation_strategy different from shuffle (the default initialisation for K-Medoids) the representatives may not be one of the training points when the algorithm doesn't perform enough iterations. This can happen for example when the number of classes is close to the number of records to cluster.\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8]\n5×2 Matrix{Float64}:\n 1.1 10.1\n 0.9 9.8\n 10.0 1.1\n 12.1 0.8\n 0.8 9.8\n\njulia> mod = KMedoidsClusterer(n_classes=2)\nKMedoidsClusterer - A K-Medoids Model (unfitted)\n\njulia> classes = fit!(mod,X)\n5-element Vector{Int64}:\n 1\n 1\n 2\n 2\n 1\n\njulia> newclasses = fit!(mod,[11 0.9])\n1-element Vector{Int64}:\n 2\n\njulia> info(mod)\nDict{String, Any} with 2 entries:\n\"fitted_records\" => 6\n\"av_distance_last_fit\" => 0.0\n\"xndims\" => 2\n\njulia> parameters(mod)\nBetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)\n- representatives: [0.9 9.8; 11.0 0.9]\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#perceptron_module","page":"Perceptron","title":"The BetaML.Perceptron Module","text":"","category":"section"},{"location":"Perceptron.html","page":"Perceptron","title":"Perceptron","text":"Perceptron","category":"page"},{"location":"Perceptron.html#BetaML.Perceptron","page":"Perceptron","title":"BetaML.Perceptron","text":"Perceptron module\n\nProvide linear and kernel classifiers.\n\nProvide the following supervised models:\n\nPerceptronClassifier: Train data using the classical perceptron\nKernelPerceptronClassifier: Train data using the kernel perceptron\nPegasosClassifier: Train data using the pegasos algorithm\n\nAll algorithms are multiclass, with 
PerceptronClassifier and PegasosClassifier employing a one-vs-all strategy, while KernelPerceptronClassifier employs a one-vs-one approach, and return a \"probability\" for each class in terms of a dictionary for each record. Use mode(ŷ) to return a single class prediction per record.\n\nThese models are available in the MLJ framework as PerceptronClassifier, KernelPerceptronClassifier and PegasosClassifier respectively.\n\n\n\n\n\n","category":"module"},{"location":"Perceptron.html#Module-Index","page":"Perceptron","title":"Module Index","text":"","category":"section"},{"location":"Perceptron.html","page":"Perceptron","title":"Perceptron","text":"Modules = [Perceptron]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"Perceptron.html#Detailed-API","page":"Perceptron","title":"Detailed API","text":"","category":"section"},{"location":"Perceptron.html","page":"Perceptron","title":"Perceptron","text":"Modules = [Perceptron]\nPrivate = false","category":"page"},{"location":"Perceptron.html#BetaML.Perceptron.KernelPerceptronC_hp","page":"Perceptron","title":"BetaML.Perceptron.KernelPerceptronC_hp","text":"mutable struct KernelPerceptronC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the KernelPerceptronClassifier model\n\nParameters:\n\nkernel: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel for details or check ?BetaML.Utils to verify if other kernels are defined (you can always define your own kernel) [def: radial_kernel]\ninitial_errors: Initial distribution of the number of errors [def: nothing, i.e. zeros]. If provided, this should be a vector of length nModels whose elements are vectors of nRecords integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2\nepochs: Maximum number of epochs, i.e. 
passages through the whole training sample [def: 100]\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.KernelPerceptronClassifier","page":"Perceptron","title":"BetaML.Perceptron.KernelPerceptronClassifier","text":"mutable struct KernelPerceptronClassifier <: BetaMLSupervisedModel\n\nA \"kernel\" version of the Perceptron model (supervised) with user configurable kernel function.\n\nFor the parameters see ?KernelPerceptronC_hp and ?BML_options\n\nLimitations:\n\ndata must be numerical\nonline training (retraining) is not supported\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> quadratic_kernel(x,y) = polynomial_kernel(x,y;degree=2)\nquadratic_kernel (generic function with 1 method)\n\njulia> mod = KernelPerceptronClassifier(epochs=100, kernel= quadratic_kernel)\nKernelPerceptronClassifier - A \"kernelised\" version of the perceptron classifier (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\nRunning function BetaML.Perceptron.#KernelPerceptronClassifierBinary#17 at /home/lobianco/.julia/dev/BetaML/src/Perceptron/Perceptron_kernel.jl:133\nType `]dev BetaML` to modify the source code (this would change its location on disk)\n***\n*** Training kernel perceptron for maximum 100 iterations. Random shuffle: true\nAvg. error after iteration 1 : 0.5\nAvg. error after iteration 10 : 0.16666666666666666\n*** Avg. 
error after epoch 13 : 0.0 (all elements of the set has been correctly classified)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PegasosC_hp","page":"Perceptron","title":"BetaML.Perceptron.PegasosC_hp","text":"mutable struct PegasosC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the PegasosClassifier model.\n\nParameters:\n\nlearning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]\nlearning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]\ninitial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]\nepochs::Int64: Maximum number of epochs, i.e. passages through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PegasosClassifier","page":"Perceptron","title":"BetaML.Perceptron.PegasosClassifier","text":"mutable struct PegasosClassifier <: BetaMLSupervisedModel\n\nThe PegasosClassifier model, a linear, gradient-based classifier. 
Multiclass is supported using a one-vs-all approach.\n\nSee ?PegasosC_hp and ?BML_options for applicable hyperparameters and options. \n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = PegasosClassifier(epochs=100,learning_rate = (epoch -> 0.05) )\nPegasosClassifier - a loss-based linear classifier without regularisation term (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\n***\n*** Training pegasos for maximum 100 iterations. Random shuffle: true\nAvg. error after iteration 1 : 0.5\n*** Avg. error after epoch 3 : 0.0 (all elements of the set has been correctly classified)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PerceptronC_hp","page":"Perceptron","title":"BetaML.Perceptron.PerceptronC_hp","text":"mutable struct PerceptronC_hp <: BetaMLHyperParametersSet\n\nHyperparameters for the PerceptronClassifier model\n\nParameters:\n\ninitial_parameters::Union{Nothing, Matrix{Float64}}: Initial parameters. If given, should be a matrix of n-classes by feature dimension + 1 (to include the constant term as the first element) [def: nothing, i.e. zeros]\nepochs::Int64: Maximum number of epochs, i.e. passages through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\ntunemethod::AutoTuneMethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! 
call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"Perceptron.html#BetaML.Perceptron.PerceptronClassifier","page":"Perceptron","title":"BetaML.Perceptron.PerceptronClassifier","text":"mutable struct PerceptronClassifier <: BetaMLSupervisedModel\n\nThe classical \"perceptron\" linear classifier (supervised).\n\nFor the parameters see ?PerceptronC_hp and ?BML_options.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is not supported\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];\n\njulia> y = [\"a\",\"b\",\"b\",\"b\",\"b\",\"a\"];\n\njulia> mod = PerceptronClassifier(epochs=100,return_mean_hyperplane=false)\nPerceptronClassifier - The classic linear perceptron classifier (unfitted)\n\njulia> ŷ = fit!(mod,X,y) |> mode\nRunning function BetaML.Perceptron.#perceptronBinary#84 at /home/lobianco/.julia/dev/BetaML/src/Perceptron/Perceptron_classic.jl:150\nType `]dev BetaML` to modify the source code (this would change its location on disk)\n***\n*** Training perceptron for maximum 100 iterations. Random shuffle: true\nAvg. error after iteration 1 : 0.5\n*** Avg. 
error after epoch 5 : 0.0 (all elements of the set has been correctly classified)\n6-element Vector{String}:\n \"a\"\n \"b\"\n \"b\"\n \"b\"\n \"b\"\n \"a\"\n\n\n\n\n\n","category":"type"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"EditURL = \"betaml_tutorial_classification_cars.jl\"","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#classification_tutorial","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"In this exercise we are provided with several technical characteristics (mpg, horsepower, weight, model year...) for several car models, together with the country of origin of such models, and we would like to create a machine learning model such that the country of origin can be accurately predicted given the technical characteristics. As the information to predict is a multi-class one, this is a [classification](https://en.wikipedia.org/wiki/Statistical_classification) task. 
It is a challenging exercise due to the simultaneous presence of three factors: (1) presence of missing data; (2) unbalanced data - 254 out of 406 cars are US made; (3) small dataset.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Data origin:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"dataset description: https://archive.ics.uci.edu/ml/datasets/auto+mpg\ndata source we use here: https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Field description:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"mpg: continuous\ncylinders: multi-valued discrete\ndisplacement: continuous\nhorsepower: continuous\nweight: 
continuous\nacceleration: continuous\nmodel year: multi-valued discrete\norigin: multi-valued discrete\ncar name: string (unique for each instance)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"The car name is not used in this tutorial, so that the country is inferred only from technical data. As this field also includes the car maker, and there are several car models from the same car maker, a more sophisticated machine learning model could exploit this information e.g. using a bag-of-words encoding.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Library-loading-and-initialisation","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Library loading and initialisation","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Activating the local environment specific to BetaML documentation","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"using 
Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We load a bunch of packages that we'll use during this tutorial.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"using Random, HTTP, Plots, CSV, DataFrames, BenchmarkTools, StableRNGs, BetaML\nimport DecisionTree, Flux\nimport Pipe: @pipe","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Machine Learning workflows include stochastic components in several steps: in the data sampling, in the model initialisation and often in the models' own algorithms (and sometimes also in the prediction step). BetaML provides a random number generator (RNG) in order to simplify reproducibility (FIXEDRNG. This is nothing else than an instance of StableRNG(123) defined in the BetaML.Utils sub-module, but you can of course choose your own \"fixed\" RNG). 
See the Dealing with stochasticity section in the Getting started tutorial for details.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Here we are explicit and we use our own fixed RNG:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Data-loading-and-preparation","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Data loading and preparation","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"To load the data from the internet our workflow is (1) Retrieve the data –> (2) Clean it –> (3) Load it –> (4) Output it as a DataFrame.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - 
determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"For step (1) we use HTTP.get(), for step (2) we use replace!, for steps (3) and (4) we use the CSV package, and we use the \"pipe\" |> operator to chain these operations, so that no file is ever saved on disk:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"urlDataOriginal = \"https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original\"\ndata = @pipe HTTP.get(urlDataOriginal).body |>\n replace!(_, UInt8('\\t') => UInt8(' ')) |> # the original dataset has mixed field delimiters !\n CSV.File(_, delim=' ', missingstring=\"NA\", ignorerepeated=true, header=false) |>\n DataFrame;\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"This results in a table where the rows are the observations (the various car models) and the columns the fields. 
All BetaML models expect this layout.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"As the dataset is ordered, we randomly shuffle the data.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"idx = randperm(copy(AFIXEDRNG),size(data,1))\ndata = data[idx, :] # assign the result, otherwise the shuffled copy is discarded\ndescribe(data)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Columns 1 to 7 contain characteristics of the car, while column 8 encodes the country of origin (\"1\" -> US, \"2\" -> EU, \"3\" -> Japan). That's the variable we want to be able to predict.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Column 9 contains the car name, but we are not going to use this information in this tutorial. 
Note also that some fields have missing data.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Our first step is hence to divide the dataset into the features (the x) and the labels (the y) we want to predict. The x is then a Julia standard Matrix of 406 rows by 7 columns and the y is a vector of the 406 observations:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"x = Matrix{Union{Missing,Float64}}(data[:,1:7]);\ny = Vector{Int64}(data[:,8]);\nx = fit!(Scaler(),x)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Some algorithms that we will use today don't accept missing data, so we need to impute them. BetaML provides several imputation models in the Imputation module. Note that many of these imputation models can be used for Collaborative Filtering / Recommendation Systems. 
Models such as GaussianMixtureImputer have the advantage over traditional algorithms such as k-nearest neighbors (KNN) that GMM can \"detect\" the hidden structure of the observed data, where some observations can be similar to a certain pool of other observations for a certain characteristic, but similar to another pool of observations for other characteristics. Here we use RandomForestImputer. While the model allows for reproducible multiple imputations (with the parameter multiple_imputation=an_integer) and multiple passages through the various columns (fields) containing missing data (with the option recursive_passages=an_integer), we use here just a single imputation and a single passage. As all BetaML models, RandomForestImputer follows the pattern m=ModelConstruction(pars); fit!(m,x,[y]); est = predict(m,x) where est can be an estimation of some labels or be some characteristics of x itself (the imputed version, as in this case, or a reprojected version as in PCAEncoder), depending on whether the model is supervised or not. See the API user documentation for more details. 
For imputers, the output of predict is the matrix with the imputed values replacing the missing ones, and we write here the model in a single line using a convenience feature: when the default cache parameter is used in the model constructor, the fit! function itself returns the prediction over the training data:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"x = fit!(RandomForestImputer(rng=copy(AFIXEDRNG)),x) # Same as `m = RandomForestImputer(rng=copy(AFIXEDRNG)); fit!(m,x); x= predict(m,x)`","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Further, some models don't work with categorical data either, so we need to represent our y as a matrix with a separate column for each possible categorical value (the so-called \"one-hot\" representation). For example, within a three classes field, the individual value 2 (or \"Europe\" for that matter) would be represented as the vector [0 1 0], while 3 (or \"Japan\") would become the vector [0 0 1]. 
To encode as one-hot we use the OneHotEncoder in BetaML.Utils, using the same shortcut as for the imputer we used earlier:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"y_oh = fit!(OneHotEncoder(),y)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"In supervised machine learning it is good practice to partition the available data into training, validation, and test subsets, where the first one is used to train the ML algorithm, the second one to tune any \"hyper-parameters\" of the algorithm and the test subset is finally used to evaluate the quality of the algorithm. Here, for brevity, we use only the train and the test subsets, implicitly assuming we already know the best hyper-parameters. 
Please refer to the regression tutorial for examples of the auto-tune feature of BetaML models to \"automatically\" train the hyper-parameters (hint: in most cases just add the parameter autotune=true in the model constructor), or the clustering tutorial for an example of using the cross_validation function to do it manually.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We use then the partition function in BetaML.Utils, where we can specify the different data to partition (each matrix or vector to partition must have the same number of observations) and the shares of observation that we want in each subset. Here we keep 80% of observations for training (xtrain, and ytrain) and we use 20% of them for testing (xtest, and ytest):","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,1-0.8],rng=copy(AFIXEDRNG));\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We finally set up a dataframe to store the accuracies of the 
various models we'll use.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"results = DataFrame(model=String[],train_acc=Float64[],test_acc=Float64[])","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Random-Forests","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Random Forests","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We are now ready to use our first model, the RandomForestEstimator. 
Random Forests build a \"forest\" of decision tree models and then average their predictions in order to make an overall prediction, whether for a regression or a classification.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While here the missing data has been imputed and the dataset comprises only numerical values, one attractive feature of BetaML's RandomForestEstimator is that it can work directly with missing and categorical data without any prior processing.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"However, as the labels are encoded using integers, we also need to specify the parameter force_classification=true; otherwise the model would perform a regression instead.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"rfm = RandomForestEstimator(force_classification=true, rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of 
cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Unlike the RandomForestImputer and OneHotEncoder models used earlier, to train a RandomForestEstimator model we need to provide it with both the training feature matrix and the associated \"true\" training labels. We use the same shortcut to get the training predictions directly from the fit! function. In this case the predictions correspond to the labels:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain = fit!(rfm,xtrain,ytrain)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Note that for each record the result is reported as a dictionary with the possible categories and their associated probabilities.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"warning: Warning\nOnly categories with non-zero probabilities are reported for each record, and being a dictionary, the order of the categories is not 
defined","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"For example ŷtrain[1] is a Dict(2 => 0.0333333, 3 => 0.933333, 1 => 0.0333333), indicating an overwhelming probability that this car model originates from Japan. To retrieve the predictions with the highest probabilities use mode(ŷ):","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain_top = mode(ŷtrain,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Why does mode (optionally) take an RNG? 
I leave the answer to you :-)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"To obtain the predicted labels for the test set we simply run the predict function over the features of the test set:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtest = predict(rfm,xtest)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Finally we can measure the accuracy of our predictions with the accuracy function. 
We don't need to explicitly use mode, as accuracy does it itself when it is passed predictions expressed as dictionaries:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"trainAccuracy,testAccuracy = accuracy.([ytrain,ytest],[ŷtrain,ŷtest],rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We are now ready to store our first model accuracies in the results dataframe:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"push!(results,[\"RF\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"The predictions are quite good: for the training set the algorithm predicted almost all cars' origins correctly, while for the testing set (i.e. 
those records that have not been used to train the algorithm), the correct prediction level is still quite high, at around 80% (depending on the random seed).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While accuracy can sometimes suffice, we may often want to better understand which categories our model has trouble predicting correctly. We can investigate the output of a multi-class classifier in more depth with a ConfusionMatrix where the true values (y) are given in rows and the predicted ones (ŷ) in columns, together with some per-class metrics like the precision (records correctly predicted as class i over all records predicted as class i), the recall (records correctly predicted as class i over all records truly in class i) and others.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We first build the ConfusionMatrix model, we train it with ŷ and y and then we print it (we do it here for the test subset):","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"cfm = ConfusionMatrix(categories_names=Dict(1=>\"US\",2=>\"EU\",3=>\"Japan\"),rng=copy(AFIXEDRNG))\nfit!(cfm,ytest,ŷtest) # the 
output is by default the confusion matrix in relative terms\nprint(cfm)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"From the report we can see that Japanese cars are harder to classify correctly, and in particular many Japanese cars are classified as US ones. This is likely a result of the class imbalance of the dataset, and could be mitigated by balancing the dataset with various sampling techniques before training the model.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"If you prefer a more graphical approach, we can also plot the confusion matrix. In order to do so, we pick up information from the info(cfm) function. Indeed most BetaML models can be queried with info(model) to retrieve additional information, in the form of a dictionary, that is not necessary for the prediction, but could still be relevant. 
Other functions that you can use with BetaML models are parameters(m) and hyperparameters(m).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"res = info(cfm)\nheatmap(string.(res[\"categories\"]),string.(res[\"categories\"]),res[\"normalised_scores\"],seriescolor=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix (normalised scores)\")","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Comparision-with-DecisionTree.jl","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Comparision with DecisionTree.jl","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We now compare BetaML RandomForestEstimator with the random forest estimator of the package DecisionTree.jl. DecisionTree.jl random forests are similar in usage: we first \"build\" (train) the forest and we then make predictions out of the trained model.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars 
characteristics","text":"# We train the model...\nmodel = DecisionTree.build_forest(ytrain, xtrain,rng=seed)\n# ..and we generate predictions and measure their accuracy\n(ŷtrain,ŷtest) = DecisionTree.apply_forest.([model],[xtrain,xtest]);\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"RF (DecisionTrees.jl)\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While the accuracy on the training set is exactly the same as for BetaML random forests, DecisionTree.jl random forests are slightly less accurate in the testing sample. Where DecisionTree.jl excels, however, is in efficiency: its random forests are extremely fast and memory thrifty, even if we should also consider the resources needed to impute the missing values, as they don't work with missing data.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Also, one of the reasons DecisionTree.jl is so efficient is that internally the data is sorted to avoid repeated comparisons, but in this way it works only with features that are sortable, while BetaML random forests accept virtually any kind of input without the need to process it.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Neural-network","page":"A classification task when labels are known - 
determining the country of origin of cars given the cars characteristics","title":"Neural network","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Neural networks (NN) can be very powerful, but have two \"inconveniences\" compared with random forests. First, they are a bit \"picky\": we need to do a bit of work to provide data in a specific format. Note that this is not feature engineering; one of the advantages of neural networks is that, for the most part, feature engineering is not needed. However, we still need to \"clean\" the data. One issue is that NN don't like missing data, so we need to provide them with a feature matrix \"clean\" of missing data. Secondly, they work only with numerical data, so we need to use the one-hot encoding we saw earlier. Further, they work best if the features are scaled such that each feature has mean zero and standard deviation 1. This is why we scaled the data back at the beginning of this tutorial.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We first measure the dimensions of our input data (i.e. the number of columns of the feature matrix) and the dimensions of our output, i.e. 
the number of categories or columns in our one-hot encoded y.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"D = size(xtrain,2)\nclasses = unique(y)\nnCl = length(classes)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"The second \"inconvenience\" of NN is that, while not requiring feature engineering, they still need a bit of practice in how the structure of the network is built. It's not as simple as fit!(Model(),x,y) (although BetaML provides a \"default\" neural network structure that can be used, it is often not well adapted to the specific task). We need instead to specify how we want our layers, chain the layers together and then choose an overall loss function. Only when we have done these steps do we have the model ready for training. 
Here we define 2 DenseLayer where, for each of them, we specify the number of neurons in input (for the first layer, equal to the dimensions of the data), the number of neurons in output (for a classification task, the last layer's output size being equal to the number of classes) and an activation function for each layer (by default the identity function).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ls = 50 # number of neurons in the inner layer\nl1 = DenseLayer(D,ls,f=relu,rng=copy(AFIXEDRNG))\nl2 = DenseLayer(ls,nCl,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"For a classification task, the last layer is a VectorFunctionLayer that has no learnable parameters but whose activation function is applied to the ensemble of the neurons, rather than individually on each neuron. In particular, for classification we pass the softmax function whose output has the same size as the input (i.e. 
the number of classes to predict), but we can use the VectorFunctionLayer with any function, including the pool1d function to create a \"pooling\" layer (using maximum, mean or whatever other sub-function we pass to pool1d)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"l3 = VectorFunctionLayer(nCl,f=softmax) ## Add a (parameterless) layer whose activation function (softmax in this case) is defined to all its nodes at once","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Finally we chain the layers and assign a loss function and the number of epochs we want to train the model to the constructor of NeuralNetworkEstimator:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"nn = NeuralNetworkEstimator(layers=[l1,l2,l3],loss=crossentropy,rng=copy(AFIXEDRNG),epochs=500)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A 
classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Aside from the layer structure and size and the number of epochs, other hyper-parameters you may want to try are the batch_size and the optimisation algorithm to employ (opt_alg).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Now we can train our network:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain = fit!(nn, xtrain, ytrain_oh)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Predictions are in the form of an n_records by n_classes matrix of the probabilities of each record being in that class. 
To retrieve the classes with the highest probabilities we can again use the mode function:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtrain_top = mode(ŷtrain)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Once trained, we can predict the test labels. As the training was based on the scaled feature matrix, the predictions must be based on scaled features too:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"ŷtest = predict(nn,xtest)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"And finally we can measure the accuracies and store them in the results dataframe:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when 
labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"trainAccuracy, testAccuracy = accuracy.([ytrain,ytest],[ŷtrain,ŷtest],rng=copy(AFIXEDRNG))\npush!(results,[\"NN\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"cfm = ConfusionMatrix(categories_names=Dict(1=>\"US\",2=>\"EU\",3=>\"Japan\"),rng=copy(AFIXEDRNG))\nfit!(cfm,ytest,ŷtest)\nprint(cfm)\nres = info(cfm)\nheatmap(string.(res[\"categories\"]),string.(res[\"categories\"]),res[\"normalised_scores\"],seriescolor=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix (normalised scores)\")","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While accuracies are a bit lower, the distribution of misclassifications is similar, with many Japanese cars misclassified as US ones (here we also have some EU cars misclassified as Japanese ones).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Comparisons-with-Flux","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Comparisons with 
Flux","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"As we did for Random Forests, we compare BetaML neural networks with the leading package for deep learning in Julia, Flux.jl.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"In Flux the input must be in the form (fields, observations), so we transpose our original matrices","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"xtrainT, ytrain_ohT = transpose.([xtrain, ytrain_oh])\nxtestT, ytest_ohT = transpose.([xtest, ytest_oh])","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We define the Flux neural network model in a similar way to BetaML and load it with data, then we train it, predict and measure the accuracies on the training and 
the test sets:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We fix the random seed for Flux, although you may still get different results depending on the number of threads used. This is a problem we solve in BetaML with generate_parallel_rngs.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Random.seed!(seed)\n\nl1 = Flux.Dense(D,ls,Flux.relu)\nl2 = Flux.Dense(ls,nCl,Flux.relu)\nFlux_nn = Flux.Chain(l1,l2)\nfluxloss(x, y) = Flux.logitcrossentropy(Flux_nn(x), y)\nps = Flux.params(Flux_nn)\nnndata = Flux.Data.DataLoader((xtrainT, ytrain_ohT),shuffle=true)\nbegin for i in 1:500 Flux.train!(fluxloss, ps, nndata, Flux.ADAM()) end end\nŷtrain = Flux.onecold(Flux_nn(xtrainT),1:3)\nŷtest = Flux.onecold(Flux_nn(xtestT),1:3)\ntrainAccuracy, testAccuracy = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"push!(results,[\"NN (Flux.jl)\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - 
cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"While the train accuracy is a little bit higher than BetaML, the test accuracy remains comparable","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Perceptron-like-classifiers.","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Perceptron-like classifiers.","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We finally test 3 \"perceptron-like\" classifiers: the \"classical\" Perceptron (PerceptronClassifier), one of the first ML algorithms (a linear classifier), a \"kernellised\" version of it (KernelPerceptronClassifier, defaulting to the radial kernel) and \"PegasosClassifier\" (PegasosClassifier), another linear algorithm that starts considering a gradient-based optimisation, although without the regularisation term as in the Support Vector Machines (SVM).","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"As for the previous classifiers we construct the model object, we 
train and predict and we compute the train and test accuracies:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"pm = PerceptronClassifier(rng=copy(AFIXEDRNG))\nŷtrain = fit!(pm, xtrain, ytrain)\nŷtest = predict(pm, xtest)\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"Perceptron\",trainAccuracy,testAccuracy]);\n\nkpm = KernelPerceptronClassifier(rng=copy(AFIXEDRNG))\nŷtrain = fit!(kpm, xtrain, ytrain)\nŷtest = predict(kpm, xtest)\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"KernelPerceptronClassifier\",trainAccuracy,testAccuracy]);\n\n\npegm = PegasosClassifier(rng=copy(AFIXEDRNG))\nŷtrain = fit!(pegm, xtrain, ytrain)\nŷtest = predict(pegm, xtest) # predict with the Pegasos model, not the Perceptron one\n(trainAccuracy,testAccuracy) = accuracy.([ytrain,ytest],[ŷtrain,ŷtest])\npush!(results,[\"Pegasaus\",trainAccuracy,testAccuracy]);\nnothing #hide","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html#Summary","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"Summary","text":"","category":"section"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"This is the summary of the results we obtained when trying to predict the country of origin of the cars, based on their technical 
characteristics:","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"println(results)","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"If you clone BetaML repository","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Model accuracies on my machine with seeds 123, 1000 and 10000 respectively","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"model train 1 test 1 train 2 test 2 train 3 test 3\nRF 0.996923 0.765432 1.000000 0.802469 1.000000 0.888889\nRF (DecisionTrees.jl) 0.975385 0.765432 0.984615 0.777778 0.975385 0.864198\nNN 0.886154 0.728395 0.916923 0.827160 0.895385 0.876543\nNN (Flux.jl) 0.793846 0.654321 0.938462 0.790123 0.935385 0.851852\nPerceptron 0.778462 0.703704 0.720000 0.753086 
0.670769 0.654321\nKernelPerceptronClassifier 0.987692 0.703704 0.978462 0.777778 0.944615 0.827160\nPegasaus 0.732308 0.703704 0.633846 0.753086 0.575385 0.654321","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"We warn that this table just provides a rough idea of the various algorithms' performances. Indeed there is a large amount of stochasticity both in the sampling of the data used for training/testing and in the initial settings of the parameters of the algorithm. For a statistically significant comparison we would have to repeat the analysis with multiple samplings (e.g. by cross-validation, see the clustering tutorial for an example) and initial random parameters.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"Nevertheless the table above shows that, when we compare BetaML with the algorithm-specific leading packages, we find similar results in terms of accuracy, but often the leading packages are better optimised and run more efficiently (but sometimes at the cost of being less versatile). 
Also, for this dataset, Random Forests seem to remain marginally more accurate than Neural Networks, although of course this depends on the hyper-parameters and, with a single run of the models, we don't know if this difference is significant.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"View this file on GitHub.","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"","category":"page"},{"location":"tutorials/Classification - cars/betaml_tutorial_classification_cars.html","page":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","title":"A classification task when labels are known - determining the country of origin of cars given the cars characteristics","text":"This page was generated using Literate.jl.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#getting_started","page":"Getting started","title":"Getting started","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Introduction","page":"Getting started","title":"Introduction","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"This \"tutorial\" part of the documentation presents a step-by-step guide to the main algorithms and utility 
functions provided by BetaML and comparisons with the leading packages in each field. Aside from this page, the tutorial is divided into the following sections:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Classification tutorial - Topics: Decision trees and random forests, neural networks (softmax), dealing with stochasticity, loading data from internet\nRegression tutorial - Topics: Decision trees, Random forests, neural networks, hyper-parameters autotuning, one-hot encoding, continuous error measures\nClustering tutorial - Topics: k-means, kmedoids, generative (Gaussian) mixture models (gmm), cross-validation, ordinal encoding","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Detailed usage instructions on each algorithm can be found on each model struct (listed here), while theoretical notes describing most of them can be found at the companion repository https://github.com/sylvaticus/MITx_6.86x.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"The overall \"philosophy\" of BetaML is to support simple machine learning tasks easily and make complex tasks possible. At the most basic level, the majority of algorithms have default parameters suitable for a basic analysis. A great level of flexibility can already be achieved by just employing the full set of model parameters, for example changing the distance function in KMedoidsClusterer to l1_distance (aka \"Manhattan distance\"). 
Finally, the greatest flexibility can be obtained by customising BetaML and writing, for example, your own neural network layer type (by subtyping AbstractLayer), your own sampler (by subtyping AbstractDataSampler) or your own mixture component (by subtyping AbstractMixture). In such cases, while not required by any means, please consider giving it back to the community by opening a pull request to integrate your work in BetaML.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If you are looking for an introductory book on Julia, you could consider \"Julia Quick Syntax Reference\" (Apress, 2019) or the online course \"Introduction to Scientific Programming and Machine Learning with Julia\".","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"A few conventions applied across the library:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Type names use the so-called \"CamelCase\" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words possibly separated (but only when really needed for readability) by an _;\nWhile some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns. 
Almost everywhere in the code and documentation we denote by N the number of observations/records, D the number of dimensions and K the number of classes/categories;\nWhile some algorithms accept DataFrames as input, the use of standard arrays is encouraged (if the data is passed to the function as a dataframe, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies);\nThe accuracy/error/loss measures expect the ground truth y and then the estimated ŷ (in this order)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#using_betaml_from_other_languages","page":"Getting started","title":"Using BetaML from other programming languages","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"In this section we provide two examples of using BetaML directly in Python or R (with automatic object conversion). Click Details for a more extended explanation of these examples. While I have no experience with them, the same approach can be used to access BetaML from any language with a binding to Julia, like Matlab or JavaScript. 
","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Use-BetaML-in-Python","page":"Getting started","title":"Use BetaML in Python","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user juliacall","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> from juliacall import Main as jl\n>>> import numpy as np\n>>> from sklearn import datasets\n>>> jl.seval('using Pkg; Pkg.add(\"BetaML\")') # Only once \n>>> jl.seval(\"using BetaML\")\n>>> bml = jl.BetaML\n>>> iris = datasets.load_iris()\n>>> X = iris.data[:, :4]\n>>> y = iris.target + 1 # Julia arrays start from 1 not 0\n>>> (Xs,ys) = bml.consistent_shuffle([X,y])\n>>> m = bml.KMeansClusterer(n_classes=3)\n>>> yhat = bml.fit_ex(m,Xs) # Python doesn't allow exclamation marks in function names, so we use `fit_ex(⋅)` instead of `fit!(⋅)` (the original function name)\n>>> m._jl_display() # force a \"Julian\" way of displaying of Julia objects\n>>> acc = bml.accuracy(ys,yhat,ignorelabels=True)\n>>> acc\n 0.8933333333333333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
Details","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We show for Python two separate \"Julia from Python\" interfaces, PyJulia and JuliaCall, with the second being the more recent.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#With-the-classical-pyjulia-package","page":"Getting started","title":"With the classical pyjulia package","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"PyJulia is a relatively old method to use Julia code and libraries in Python. It works great but it requires that you already have a working Julia installation on your PC, so we first need to download and install the Julia binaries for our operating system from JuliaLang.org. Be sure that Julia is working by opening the Julia terminal and e.g. typing println(\"hello world\")","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Install PyJulia with: ","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user julia # the name of the package in `pip` is `julia`, not `PyJulia`","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For the sake of this tutorial, let's also install in Python a package that contains the dataset that we will use:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user scikit-learn # only for retrieving the dataset in the python way (the `pip` package is `scikit-learn`, imported as `sklearn`)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting 
started","title":"Getting started","text":"We can now open a Python terminal and, to obtain an interface to Julia, just run:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> import julia\n>>> julia.install() # Only once to set-up in julia the julia packages required by PyJulia\n>>> jl = julia.Julia(compiled_modules=False)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If we have multiple Julia versions, we can specify the one to use in Python passing julia=\"/path/to/julia/binary/executable\" (e.g. julia = \"/home/myUser/lib/julia-1.8.0/bin/julia\") to the install() function.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"The compiled_modules=False in the Julia constructor is a workaround to the common situation when the Python interpreter is statically linked to libpython, but it will slow down the interactive experience, as it disables pre-compilation of Julia packages, so every time we use a module for the first time it needs to be compiled first. Other workarounds, more efficient but also more complicated, are given in the package documentation, under the Troubleshooting section (https://pyjulia.readthedocs.io/en/stable/troubleshooting.html).","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Let's now add to Julia the BetaML package. 
We can surely do it from within Julia, but we can also do it while remaining in Python:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.eval('using Pkg; Pkg.add(\"BetaML\")') # Only once to install BetaML","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"While jl.eval('some Julia code') evaluates any arbitrary Julia code (see below), most of the time we can use Julia in a more direct way. Let's start by importing the BetaML Julia package as a submodule of the Python Julia module:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> from julia import BetaML\n>>> jl.eval('using BetaML')","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As you can see, it is no different than importing any other Python module.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For the data, let's load it \"Python side\":","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> from sklearn import datasets\n>>> iris = datasets.load_iris()\n>>> X = iris.data[:, :4]\n>>> y = iris.target + 1 # Julia arrays start from 1 not 0","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note that X and y are Numpy arrays.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now call BetaML functions as we would do for any other Python library functions. 
In particular, we can pass to the functions (and retrieve) complex data types without worrying too much about the conversion between Python and Julia types, as these are converted automatically:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> (Xs,ys) = BetaML.consistent_shuffle([X,y]) # X and y are first converted to julia arrays and then the returned julia arrays are converted back to python Numpy arrays\n>>> m = BetaML.KMeansClusterer(n_classes=3)\n>>> yhat = BetaML.fit_ex(m,Xs) # Python doesn't allow exclamation marks in function names, so we use `fit_ex(⋅)` instead of `fit!(⋅)`\n>>> acc = BetaML.accuracy(ys,yhat,ignorelabels=True)\n>>> acc\n 0.8933333333333333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note: If we are using the jl.eval() interface, the objects we use must be already known to julia. 
To pass objects from Python to Julia, import the julia Main module (the root module in julia) and assign the needed variables, e.g.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> X_python = [1,2,3,2,4]\n>>> from julia import Main\n>>> Main.X_julia = X_python\n>>> jl.eval('BetaML.gini(X_julia)')\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Another alternative is to \"eval\" only the function name and pass the (python) objects in the function call:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.eval('BetaML.gini')(X_python)\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#With-the-newer-JuliaCall-python-package","page":"Getting started","title":"With the newer JuliaCall python package","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"JuliaCall is a newer way to use Julia in Python that doesn't require a separate installation of Julia.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Install it in Python using pip as well:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"$ python3 -m pip install --user juliacall","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now open a Python terminal and, to obtain an interface to Julia, just run:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting 
started","title":"Getting started","text":">>> from juliacall import Main as jl","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If you have julia on PATH, it will use that version, otherwise it will automatically download and install a private version for JuliaCall","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If we have multiple Julia versions, we can specify the one to use in Python passing julia=\"/path/to/julia/binary/executable\" (e.g. julia = \"/home/myUser/lib/julia-1.8.0/bin/julia\") to the install() function.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"To add BetaML to the JuliaCall private version we evaluate the julia package manager add function:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.seval('using Pkg; Pkg.add(\"BetaML\")') # Only once to install BetaML","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As with PyJulia we can evaluate arbitrary Julia code either using jl.seval('some Julia code') or by direct call, but let's first import BetaML:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> jl.seval(\"using BetaML\")\n>>> bml = jl.BetaML","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For the data, we reuse the X and y Numpy arrays we loaded earlier.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now 
call BetaML functions as we would do for any other Python library functions. In particular, we can pass to the functions (and retrieve) complex data types without worrying too much about the conversion between Python and Julia types, as these are converted automatically:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> (Xs,ys) = bml.consistent_shuffle([X,y])\n>>> m = bml.KMeansClusterer(n_classes=3)\n>>> yhat = bml.fit_ex(m,Xs)\n>>> m._jl_display() # force a \"Julian\" way of displaying Julia objects\n>>> acc = bml.accuracy(ys,yhat,ignorelabels=True)\n>>> acc\n 0.8933333333333333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note: If we are using the jl.seval() interface, the objects we use must be already known to julia. To pass objects from Python to Julia, we can write a small Julia helper function (an anonymous function that uses the @eval macro):","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> X_python = [1,2,3,2,4]\n>>> jlstore = jl.seval(\"(k, v) -> (@eval $(Symbol(k)) = $v; return)\")\n>>> jlstore(\"X_julia\",X_python)\n>>> jl.seval(\"BetaML.gini(X_julia)\")\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Another alternative is to \"eval\" only the function name and pass the (python) objects in the function call:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":">>> X_python = [1,2,3,2,4]\n>>> jl.seval('BetaML.gini')(X_python)\n0.7199999999999999","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Conclusions-about-using-BetaML-in-Python","page":"Getting started","title":"Conclusions about using BetaML in 
Python","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Using either the direct call or the eval function, whether in PyJulia or JuliaCall, we should be able to use all the BetaML functionalities directly from Python. If you run into problems using BetaML from Python, open an issue specifying your set-up.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Use-BetaML-in-R","page":"Getting started","title":"Use BetaML in R","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> install.packages(\"JuliaCall\") # only once\n> library(JuliaCall)\n> library(datasets)\n> julia_setup(installJulia = TRUE) # use installJulia = TRUE to let R download and install a private copy of julia, FALSE to use an existing Julia local installation\n> julia_eval('using Pkg; Pkg.add(\"BetaML\")') # only once\n> julia_eval(\"using BetaML\")\n> X <- as.matrix(sapply(iris[,1:4], as.numeric))\n> y <- sapply(iris[,5], as.integer)\n> xsize <- dim(X)\n> shuffled <- julia_call(\"consistent_shuffle\",list(X,y))\n> Xs <- matrix(sapply(shuffled[1],as.numeric), nrow=xsize[1])\n> ys <- as.vector(sapply(shuffled[2], as.integer))\n> m <- julia_eval('KMeansClusterer(n_classes=3)')\n> yhat <- julia_call(\"fit_ex\",m,Xs)\n> acc <- julia_call(\"accuracy\",yhat,ys,ignorelabels=TRUE)\n> acc\n[1] 0.8933333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
Details","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For R, we show how to access BetaML functionalities using the JuliaCall R package (unrelated to the Python package of the same name).","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Let's start by installing JuliaCall in R:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> install.packages(\"JuliaCall\")\n> library(JuliaCall)\n> julia_setup(installJulia = TRUE) # use installJulia = TRUE to let R download and install a private copy of julia, FALSE to use an existing Julia local installation","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Note that, unlike PyJulia, the \"setup\" function needs to be called every time we start a new R session, not just when we install the JuliaCall package. If we don't have julia in the path of our system, or if we have multiple versions and we want to specify the one to work with, we can pass the JULIA_HOME = \"/path/to/julia/binary/executable/directory\" (e.g. JULIA_HOME = \"/home/myUser/lib/julia-1.1.0/bin\") parameter to the julia_setup call. Or just let JuliaCall automatically download and install a private copy of julia.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"For some things (like object conversion between Julia and R), JuliaCall depends on the Julia RCall package.
If we don't already have it installed in Julia, it will try to install it automatically.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As in Python, let's start from the data loaded in R and do some work with it in Julia:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> library(datasets)\n> X <- as.matrix(sapply(iris[,1:4], as.numeric))\n> y <- sapply(iris[,5], as.integer)\n> xsize <- dim(X)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Let's install BetaML. As we did in Python, we can install a Julia package from Julia itself or from within R:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> julia_eval('using Pkg; Pkg.add(\"BetaML\")')","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can now \"import\" the BetaML julia package (in julia a \"Package\" is basically a module plus some metadata that facilitate its discovery and integration with other packages, such as the set of packages it requires) and call its functions with the julia_call(\"juliaFunction\",args) R function:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"> julia_eval(\"using BetaML\")\n> shuffled <- julia_call(\"consistent_shuffle\",list(X,y))\n> Xs <- matrix(sapply(shuffled[1],as.numeric), nrow=xsize[1])\n> ys <- as.vector(sapply(shuffled[2], as.integer))\n> m <- julia_eval('KMeansClusterer(n_classes=3)')\n> yhat <- julia_call(\"fit_ex\",m,Xs)\n> acc <- julia_call(\"accuracy\",yhat,ys,ignorelabels=TRUE)\n> acc\n[1]
0.8933333","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"As an alternative, we can embed Julia code directly in R using the julia_eval() function:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"kMeansR <- julia_eval('\n function accFromKmeans(x,k,y)\n m = KMeansClusterer(n_classes=Int(k))\n yhat = fit!(m,x)\n acc = accuracy(yhat,y,ignorelabels=true)\n return acc\n end\n')","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"We can then call the above function in R in one of the following three ways:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"kMeansR(Xs,3,ys)\njulia_assign(\"Xs_julia\", Xs); julia_assign(\"ys_julia\", ys); julia_eval(\"accFromKmeans(Xs_julia,3,ys_julia)\")\njulia_call(\"accFromKmeans\",Xs,3,ys)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"While other \"convenience\" functions are provided by the package, using julia_call, or julia_assign followed by julia_eval, should suffice to use BetaML from R. If you run into problems using BetaML from R, open an issue specifying your set-up.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"
","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#stochasticity_reproducibility","page":"Getting started","title":"Dealing with stochasticity and reproducibility","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Machine Learning workflows include stochastic components in several steps: in the data sampling, in the model initialisation and often in the models' own algorithms (and sometimes also in the prediction step). All BetaML models with stochastic components support a rng parameter, standing for Random Number Generator. An RNG is a \"machine\" that streams a flow of random numbers. The flow itself, however, is fully determined by the \"seed\" (an integer number) that the RNG has been told to use. Normally this seed changes at each run of the script/model, so that stochastic models are indeed stochastic and their output differs at each run.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"If we want to obtain reproducible results we can fix the seed at the very beginning of our model with Random.seed!([AnInteger]). Now our model or script will pick up a specific flow of random numbers, but this flow will always be the same, so that its results will always be the same.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"However, the default Julia RNG guarantees to provide the same flow of random numbers, conditional on the seed, only within minor versions of Julia. If we want to \"guarantee\" reproducibility of the results across different versions of Julia, or \"fix\" only some parts of our script, we can call the individual functions passing FIXEDRNG, an instance of StableRNG(FIXEDSEED) provided by BetaML, to the rng parameter.
Use it with:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"MyModel(;rng=FIXEDRNG) : always produces the same sequence of results on each run of the script (\"pulling\" from the same rng object on different calls)\nMyModel(;rng=StableRNG(SOMEINTEGER)) : always produces the same result (new identical rng object on each call)","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"This is very convenient, especially during model development, as a model that uses (...,rng=StableRNG(an_integer)) will provide stochastic results that are isolated (i.e. they don't depend on the consumption of the random stream by other parts of the model).","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"In particular, use rng=StableRNG(FIXEDSEED) or rng=copy(FIXEDRNG) with FIXEDSEED to retrieve the exact output as in the documentation or in the unit tests.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Most of the stochasticity appears in training a model. However, in a few cases (e.g. decision trees with missing values) some stochasticity also appears in predicting new data using a trained model.
In such cases the model doesn't restrict the random seed, so that you can choose at predict time to use a fixed or a variable random seed.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Finally, if you plan to use multiple threads and want to provide the same stochastic output independent of the number of threads used, have a look at generate_parallel_rngs.","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"\"Reproducible stochasticity\" is only one of the elements needed for a reproducible output. The other two are (a) the inputs the workflow uses and (b) the code that is evaluated. Concerning the second point, Julia has a very modern package system that guarantees reproducible code evaluation (with a few exceptions linked to using external libraries, but BetaML models are all implemented in Julia itself). Without going into detail, you can use a pattern like this at the beginning of your machine learning workflows:","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"using Pkg \ncd(@__DIR__) \nPkg.activate(\".\") # Activate a \"local\" environment, specific to this folder\nPkg.instantiate() # Download and install the required packages if not already available ","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"This will tell Julia to load the exact version of dependent packages, and recursively of their dependencies, from a Manifest.toml file that is automatically created in the script's folder, and automatically updated, when you add or update a package in your workflow.
Note that these local \"environments\" are very \"cheap\" (packages are not actually copied to each environment on your system, only referenced) and the environment doesn't need to be in the same script folder as in this example: it can be any folder you want to \"activate\".","category":"page"},{"location":"tutorials/Betaml_tutorial_getting_started.html#Saving-and-loading-trained-models","page":"Getting started","title":"Saving and loading trained models","text":"","category":"section"},{"location":"tutorials/Betaml_tutorial_getting_started.html","page":"Getting started","title":"Getting started","text":"Trained models can be saved on disk using the model_save function, and retrieved with model_load. The advantage over the serialization functionality in Julia core is that the two functions are actually wrappers around equivalent JLD2 package functions, and should maintain compatibility across different Julia versions. ","category":"page"},{"location":"GMM.html#gmm_module","page":"GMM","title":"The BetaML.GMM Module","text":"","category":"section"},{"location":"GMM.html","page":"GMM","title":"GMM","text":"GMM","category":"page"},{"location":"GMM.html#BetaML.GMM","page":"GMM","title":"BetaML.GMM","text":"GMM module\n\nGenerative (Gaussian) Mixture Model learners (supervised/unsupervised)\n\nProvides clustering and regressors using (Generative) Gaussian Mixture Model (probabilistic).\n\nCollaborative filtering / missing values imputation / recommendation systems based on GMM are available in the Imputation module.\n\nThe module provides the following models. Use ?[model] to access their documentation:\n\nGaussianMixtureClusterer: soft-clustering using GMM\nGaussianMixtureRegressor2: regressor using GMM as back-end (first algorithm)\nGaussianMixtureRegressor: regressor using GMM as back-end (second algorithm)\n\nAll the algorithms work with arbitrary mixture distributions, although only {Spherical|Diagonal|Full} Gaussian mixtures have been implemented.
User-defined mixtures can be used by defining a struct as a subtype of AbstractMixture and implementing for that mixture the following functions:\n\ninit_mixtures!(mixtures, X; minimum_variance, minimum_covariance, initialisation_strategy)\nlpdf(m,x,mask) (for the e-step)\nupdate_parameters!(mixtures, X, pₙₖ; minimum_variance, minimum_covariance) (the m-step)\nnpar(mixtures::Array{T,1}) (for the BIC/AIC computation)\n\nAll the GMM-based algorithms work only with numerical data, but also accept Missing values.\n\nThe GaussianMixtureClusterer algorithm reports the BIC and the AIC in its info(model), but some metrics of the clustered output are also available, for example the silhouette score.\n\n\n\n\n\n","category":"module"},{"location":"GMM.html#Module-Index","page":"GMM","title":"Module Index","text":"","category":"section"},{"location":"GMM.html","page":"GMM","title":"GMM","text":"Modules = [GMM]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"GMM.html#Detailed-API","page":"GMM","title":"Detailed API","text":"","category":"section"},{"location":"GMM.html","page":"GMM","title":"GMM","text":"Modules = [GMM]\nPrivate = false","category":"page"},{"location":"GMM.html#BetaML.GMM.DiagonalGaussian-Union{Tuple{Union{Nothing, Vector{T}}}, Tuple{T}, Tuple{Union{Nothing, Vector{T}}, Union{Nothing, Vector{T}}}} where T","page":"GMM","title":"BetaML.GMM.DiagonalGaussian","text":"DiagonalGaussian(\n μ::Union{Nothing, Array{T, 1}}\n) -> DiagonalGaussian\nDiagonalGaussian(\n μ::Union{Nothing, Array{T, 1}},\n σ²::Union{Nothing, Array{T, 1}}\n) -> DiagonalGaussian\n\n\nDiagonalGaussian(μ,σ²) - Gaussian mixture with mean μ and variances σ² (and fixed zero covariances)\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.FullGaussian-Union{Tuple{Union{Nothing, Vector{T}}}, Tuple{T}, Tuple{Union{Nothing, Vector{T}}, Union{Nothing, Matrix{T}}}} where T","page":"GMM","title":"BetaML.GMM.FullGaussian","text":"FullGaussian(μ::Union{Nothing,
Array{T, 1}}) -> FullGaussian\nFullGaussian(\n μ::Union{Nothing, Array{T, 1}},\n σ²::Union{Nothing, Array{T, 2}}\n) -> FullGaussian\n\n\nFullGaussian(μ,σ²) - Gaussian mixture with mean μ and variance/covariance matrix σ²\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.GaussianMixtureClusterer","page":"GMM","title":"BetaML.GMM.GaussianMixtureClusterer","text":"mutable struct GaussianMixtureClusterer <: BetaMLUnsupervisedModel\n\nAssign class probabilities to records (i.e. soft clustering) assuming a probabilistic generative model of observed data using mixtures.\n\nFor the parameters see ?GaussianMixture_hp and ?BML_options.\n\nNotes:\n\nData must be numerical\nMixtures can be user defined: see the ?GMM module documentation for a discussion on provided vs custom mixtures.\nOnline fitting (re-fitting with new data) is supported by setting the old learned mixtures as the starting values\nThe model is fitted using an Expectation-Maximisation (EM) algorithm that supports Missing data and is implemented in the log-domain for better numerical accuracy with many dimensions\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];\n\njulia> mod = GaussianMixtureClusterer(n_classes=2)\nGaussianMixtureClusterer - A Generative Mixture Model (unfitted)\n\njulia> prob_belong_classes = fit!(mod,X)\nIter. 1: Var. of the post 2.15612140465882 Log-likelihood -29.06452054772657\n5×2 Matrix{Float64}:\n 1.0 0.0\n 1.0 0.0\n 0.0 1.0\n 0.0 1.0\n 1.0 0.0\n\njulia> new_probs = fit!(mod,[11 0.9])\nIter. 1: Var.
of the post 1.0 Log-likelihood -1.3312256125240092\n1×2 Matrix{Float64}:\n 0.0 1.0\n\njulia> info(mod)\nDict{String, Any} with 6 entries:\n \"xndims\" => 2\n \"error\" => [1.0, 0.0, 0.0]\n \"AIC\" => 15.7843\n \"fitted_records\" => 6\n \"lL\" => 1.10786\n \"BIC\" => -2.21571\n\njulia> parameters(mod)\nBetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)\n- mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9], [0.05, 0.05]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001], [0.05, 0.05])]\n- initial_probmixtures: [0.0, 1.0]\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.GaussianMixtureRegressor","page":"GMM","title":"BetaML.GMM.GaussianMixtureRegressor","text":"mutable struct GaussianMixtureRegressor <: BetaMLUnsupervisedModel\n\nA multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model.\n\nThe training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions for new data are obtained by fitting the new data to the mixtures.\n\nFor hyperparameters see GaussianMixture_hp and BML_options.\n\nThis strategy (GaussianMixtureRegressor) works by training the EM algorithm on a combined (hcat) matrix of X and Y. At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm (and using missing values for the dimensions belonging to Y) to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions to obtain the predicted value(s) for each record.
\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];\n\njulia> Y = X[:,1] .* 2 - X[:,2]\n5-element Vector{Float64}:\n -7.8999999999999995\n -8.0\n 18.9\n 23.4\n -8.200000000000001\n\njulia> mod = GaussianMixtureRegressor(n_classes=2)\nGaussianMixtureRegressor - A regressor based on Generative Mixture Model (unfitted)\n\njulia> ŷ = fit!(mod,X,Y)\nIter. 1: Var. of the post 2.2191120060614065 Log-likelihood -47.70971887023561\n5×1 Matrix{Float64}:\n -8.033333333333333\n -8.033333333333333\n 21.15\n 21.15\n -8.033333333333333\n\njulia> new_probs = predict(mod,[11 0.9])\n1×1 Matrix{Float64}:\n 21.15\n\njulia> info(mod)\nDict{String, Any} with 6 entries:\n \"xndims\" => 3\n \"error\" => [2.21911, 0.0260833, 3.19141e-39, 0.0]\n \"AIC\" => 60.0684\n \"fitted_records\" => 5\n \"lL\" => -17.0342\n \"BIC\" => 54.9911\n\njulia> parameters(mod)\nBetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)\n- mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9, -8.033333333333333], [1.1024999999999996, 0.05, 5.0625]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001, 21.15], [1.1024999999999996, 0.05, 5.0625])]\n- initial_probmixtures: [0.6, 0.4]\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.GaussianMixtureRegressor2","page":"GMM","title":"BetaML.GMM.GaussianMixtureRegressor2","text":"mutable struct GaussianMixtureRegressor2 <: BetaMLUnsupervisedModel\n\nA multi-dimensional, missing data friendly non-linear regressor based on Generative (Gaussian) Mixture Model (strategy \"1\").\n\nThe training data is used to fit a probabilistic model with latent mixtures (Gaussian distributions with different covariances are already implemented) and then predictions of new data is obtained by fitting the new data to the mixtures.\n\nFor hyperparameters see GaussianMixture_hp and BML_options.\n\nThis strategy (GaussianMixtureRegressor2) works by fitting the EM algorithm on the feature 
matrix X. Once the data has been probabilistically assigned to the various classes, a mean value of fitting values Y is computed for each cluster (using the probabilities as weights). At predict time, the new data is first fitted to the learned mixtures using the e-step part of the EM algorithm to obtain the probabilistic assignment of each record to the various mixtures. Then these probabilities are multiplied by the mixture averages for the Y dimensions learned at training time to obtain the predicted value(s) for each record. \n\nNotes:\n\nPredicted values are always a matrix, even when a single variable is predicted (use dropdims(ŷ,dims=2) to get a single vector).\n\nExample:\n\njulia> using BetaML\n\njulia> X = [1.1 10.1; 0.9 9.8; 10.0 1.1; 12.1 0.8; 0.8 9.8];\n\njulia> Y = X[:,1] .* 2 - X[:,2]\n5-element Vector{Float64}:\n -7.8999999999999995\n -8.0\n 18.9\n 23.4\n -8.200000000000001\n\njulia> mod = GaussianMixtureRegressor2(n_classes=2)\nGaussianMixtureRegressor2 - A regressor based on Generative Mixture Model (unfitted)\n\njulia> ŷ = fit!(mod,X,Y)\nIter. 1: Var.
of the post 2.15612140465882 Log-likelihood -29.06452054772657\n5×1 Matrix{Float64}:\n -8.033333333333333\n -8.033333333333333\n 21.15\n 21.15\n -8.033333333333333\n\njulia> new_probs = predict(mod,[11 0.9])\n1×1 Matrix{Float64}:\n 21.15\n\njulia> info(mod)\nDict{String, Any} with 6 entries:\n \"xndims\" => 2\n \"error\" => [2.15612, 0.118848, 4.19495e-7, 0.0, 0.0]\n \"AIC\" => 32.7605\n \"fitted_records\" => 5\n \"lL\" => -7.38023\n \"BIC\" => 29.2454\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.GaussianMixture_hp","page":"GMM","title":"BetaML.GMM.GaussianMixture_hp","text":"mutable struct GaussianMixture_hp <: BetaMLHyperParametersSet\n\nHyperparameters for GMM clusters and other GMM-based algorithms\n\nParameters:\n\nn_classes: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported and that currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]\ntol: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set different than minimum_variance.\ninitialisation_strategy: The computation method of the vector of the initial mixtures.
One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nmaximum_iterations: Maximum number of iterations [def: 5000]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method (suitable for the GMM-based regressors) To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\n\n\n\n\n","category":"type"},{"location":"GMM.html#BetaML.GMM.SphericalGaussian-Union{Tuple{Union{Nothing, Vector{T}}}, Tuple{T}, Tuple{Union{Nothing, Vector{T}}, Union{Nothing, T}}} where T","page":"GMM","title":"BetaML.GMM.SphericalGaussian","text":"SphericalGaussian(\n μ::Union{Nothing, Array{T, 1}}\n) -> SphericalGaussian\nSphericalGaussian(\n μ::Union{Nothing, Array{T, 1}},\n σ²::Union{Nothing, T} where T\n) -> SphericalGaussian\n\n\nSphericalGaussian(μ,σ²) - Spherical Gaussian mixture with mean μ and (single) variance σ²\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.init_mixtures!-Union{Tuple{T}, Tuple{Vector{T}, Any}} where T<:BetaML.GMM.AbstractGaussian","page":"GMM","title":"BetaML.GMM.init_mixtures!","text":"init_mixtures!(mixtures::Array{T,1}, X; minimum_variance=0.25, minimum_covariance=0.0, initialisation_strategy=\"grid\",rng=Random.GLOBAL_RNG)\n\nThe parameter initialisation_strategy can be grid, kmeans or given:\n\ngrid: Uniformly cover the space observed by the data\nkmeans: Use the kmeans algorithm. 
If the data contains missing values, a first run of predictMissing is done under init=grid to impute the missing values just to allow the kmeans algorithm. Then the em algorithm is used with the output of kmean as init values.\ngiven: Leave the provided set of initial mixtures\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.lpdf-Tuple{DiagonalGaussian, Any, Any}","page":"GMM","title":"BetaML.GMM.lpdf","text":"lpdf(m::DiagonalGaussian,x,mask) - Log PDF of the mixture given the observation x\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.lpdf-Tuple{FullGaussian, Any, Any}","page":"GMM","title":"BetaML.GMM.lpdf","text":"lpdf(m::FullGaussian,x,mask) - Log PDF of the mixture given the observation x\n\n\n\n\n\n","category":"method"},{"location":"GMM.html#BetaML.GMM.lpdf-Tuple{SphericalGaussian, Any, Any}","page":"GMM","title":"BetaML.GMM.lpdf","text":"lpdf(m::SphericalGaussian,x,mask) - Log PDF of the mixture given the observation x\n\n\n\n\n\n","category":"method"},{"location":"StyleGuide_templates.html#Style-guide-and-template-for-BetaML-developers","page":"Style guide","title":"Style guide and template for BetaML developers","text":"","category":"section"},{"location":"StyleGuide_templates.html#Master-Style-guide","page":"Style guide","title":"Master Style guide","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"The code in BetaML should follow the official Julia Style Guide.","category":"page"},{"location":"StyleGuide_templates.html#Names-style","page":"Style guide","title":"Names style","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Each file name should start with a capital letter, no spaces allowed (and each file content should start with: \"Part of [BetaML](https://github.com/sylvaticus/BetaML.jl). 
Licence is MIT.\")\nType names use the so-called \"CamelCase\" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words optionally separated (but only when really needed for readability) by an _;\nIn the code and documentation we use N for the number of observations/records, D for the number of dimensions and K for the number of classes/categories;\nError/accuracy/loss functions take y first and then ŷ\nIn the API exposed to users, strings are preferred to symbols","category":"page"},{"location":"StyleGuide_templates.html#Docstrings","page":"Style guide","title":"Docstrings","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Please apply the following templates when writing a docstring for BetaML:","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Functions (add @docs if the function is not on the root module level, like for inner constructors, i.e.
@docs \"\"\" foo()x ....\"\"\"):","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n$(TYPEDSIGNATURES)\n\nOne line description\n\n[Further description]\n\n# Parameters:\n\n\n\n# Returns:\n- Elements the function needs\n\n# Notes:\n- notes\n\n# Example:\n` ` `julia\njulia> [code]\n[output]\n` ` `\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Structs","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n$(TYPEDEF)\n\nOne line description\n\n[Further description]\n\n# Fields: (if relevant)\n$(TYPEDFIELDS)\n\n# Notes:\n\n# Example:\n` ` `julia\njulia> [code]\n[output]\n` ` `\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Enums:","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n$(TYPEDEF)\n\nOne line description\n\n[Further description]\n\n\n# Notes:\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Constants","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n[4 spaces] [Constant name]\n\nOne line description\n\n[Further description]\n\n\n# Notes:\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"Modules","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"\"\"\"\n[4 spaces] [Module name]\n\nOne line description\n\nDetailed description on the module objectives, content and organisation\n\n\"\"\"","category":"page"},{"location":"StyleGuide_templates.html#Internal-links","page":"Style guide","title":"Internal
links","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"To refer to a documented object: [`NAME`](@ref) or [`NAME`](@ref manual_id). In particular for internal links use [`?NAME`](@ref ?NAME)","category":"page"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"To create an id manually: [Title](@id manual_id)","category":"page"},{"location":"StyleGuide_templates.html#Data-organisation","page":"Style guide","title":"Data organisation","text":"","category":"section"},{"location":"StyleGuide_templates.html","page":"Style guide","title":"Style guide","text":"While some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns.\nWhile some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a dataframe, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies).","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"EditURL = \"betaml_tutorial_multibranch_nn.jl\"","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#multibranch_nn_tutorial","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Often we can \"divide\" our feature sets into different groups, where for each group we have many, many
variables whose importance in prediction we don't know, but for which using a fully dense layer would be too computationally expensive. For example, we want to predict the growth of forest trees based on soil characteristics, climate characteristics and a bunch of other data (species, age, density...).","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"A soil (or climate) database may have hundreds of variables, how can we reduce them to a few that encode all the \"soil\" information? Sure, we could do a PCA or a clustering analysis, but a better way is to let our model itself find a way to encode the soil information into a vector in a way that is optimal for our prediction goal, i.e. we target the encoding task at our prediction goal.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"So we run a multi-branch neural network where one branch is given by the soil variables - it starts from all the hundreds of variables and ends in a few neuron outputs, another branch in a similar way is for the climate variables, we merge them in a branch to take into account the soil-weather interrelation (for example, it is well known that the water retention capacity of a sandy soil is quite different from that of a clay soil) and finally we merge this branch with the other variable branch to arrive at a single predicted output. In this example we focus on building, training and predicting a multi-branch neural network. 
See the other examples for cross-validation, hyperparameter tuning, scaling, overfitting, encoding, etc.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Data origin:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"while we hope to apply this example soon on actual real world data, for now we work on synthetic random data just to assess the validity of the network configuration.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Library-and-data-generation","page":"A deep neural network with multi-branch architecture","title":"Library and data generation","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Activating the local environment specific to the tutorials","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"using Pkg\nPkg.activate(joinpath(@__DIR__,\"..\",\"..\",\"..\"))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"We first load all the packages we are going to use","category":"page"},{"location":"tutorials/Multi-branch neural 
network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"using StableRNGs, BetaML, Plots","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Here we are explicit and we use our own fixed RNG:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"seed = 123\nAFIXEDRNG = StableRNG(seed)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Here we generate the random data..","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"N = 100 # records\nsoilD = 20 # dimensions of the soil database\nclimateD = 30 # dimensions of the climate database\nothervarD = 10 # dimensions of the other variables database\n\nsoilX = rand(StableRNG(seed),N,soilD)\nclimateX = rand(StableRNG(seed+10),N,climateD)\nothervarX = rand(StableRNG(seed+20),N,othervarD)\nX = hcat(soilX,climateX,othervarX)\nY = rand(StableRNG(seed+30),N)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Model-definition","page":"A deep neural network with multi-branch architecture","title":"Model definition","text":"","category":"section"},{"location":"tutorials/Multi-branch neural 
network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"(Image: Neural Network model)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"In the figure above, each circle represents a multi-neuron layer, with the number of neurons (output dimensions) written inside. Dotted circles are ReplicatorLayers, which simply \"pass through\" the information to the next layer. Red layers represent the layers responsible for the final step in encoding the information for a given branch. Subsequent layers will use this encoded information (i.e. decode it) to finally provide the prediction for the branch. We create a first branch for the soil variables, a second for the climate variables and finally a third for the other variables. We merge the soil and climate branches in layer 4 and the resulting branch and the other variables branch in layer 6. 
Finally, the single neuron layer 8 provides the prediction.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"The weights along the whole chain can be learned using the traditional backpropagation algorithm.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"The whole model can be implemented with the following code:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 1:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l1_soil = DenseLayer(20,30,f=relu,rng=copy(AFIXEDRNG))\nl1_climate = ReplicatorLayer(30)\nl1_oth = ReplicatorLayer(10)\nl1 = GroupedLayer([l1_soil,l1_climate,l1_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 2:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l2_soil = DenseLayer(30,30,f=relu,rng=copy(AFIXEDRNG))\nl2_climate = DenseLayer(30,40,f=relu,rng=copy(AFIXEDRNG))\nl2_oth = ReplicatorLayer(10)\nl2 = 
GroupedLayer([l2_soil,l2_climate,l2_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 3:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l3_soil = DenseLayer(30,4,f=relu,rng=copy(AFIXEDRNG)) # encoding of soil properties\nl3_climate = DenseLayer(40,4,f=relu,rng=copy(AFIXEDRNG)) # encoding of climate properties\nl3_oth = DenseLayer(10,15,f=relu,rng=copy(AFIXEDRNG))\nl3 = GroupedLayer([l3_soil,l3_climate,l3_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 4:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l4_soilclim = DenseLayer(8,15,f=relu,rng=copy(AFIXEDRNG))\nl4_oth = DenseLayer(15,15,f=relu,rng=copy(AFIXEDRNG))\nl4 = GroupedLayer([l4_soilclim,l4_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 5:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l5_soilclim = 
DenseLayer(15,6,f=relu,rng=copy(AFIXEDRNG)) # encoding of soil and climate properties together\nl5_oth = DenseLayer(15,6,f=relu,rng=copy(AFIXEDRNG)) # encoding of other vars\nl5 = GroupedLayer([l5_soilclim,l5_oth])","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 6:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l6 = DenseLayer(12,15,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 7:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l7 = DenseLayer(15,15,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layer 8:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"l8 = DenseLayer(15,1,f=relu,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch 
architecture","title":"A deep neural network with multi-branch architecture","text":"Finally we put the layers together and we create our NeuralNetworkEstimator model:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"layers = [l1,l2,l3,l4,l5,l6,l7,l8]\nm = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=100,rng=copy(AFIXEDRNG))","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Fitting-the-model","page":"A deep neural network with multi-branch architecture","title":"Fitting the model","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"We are now ready to fit the model to the data. 
By default BetaML models directly return the predictions for the training data as the output of the fitting call, so there is no need for a separate call to predict(m,X).","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Ŷ = fit!(m,X,Y)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html#Model-quality-assessment","page":"A deep neural network with multi-branch architecture","title":"Model quality assessment","text":"","category":"section"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"We can compute the relative mean error between the \"true\" Y and the Y estimated by the model.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"rme = relative_mean_error(Y,Ŷ)","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Of course we know there is no actual relation here between the X and the Y, as both are randomly generated. The result above just tells us that the network has been able to find a path between the X and Y that have been used for training, but we hope that in a real application this learned path represents a true, general relation between the inputs and the outputs.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A 
deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"Finally we can also plot Y against Ŷ and visualize how the average loss decreased during training:","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"scatter(Y,Ŷ,xlabel=\"vol observed\",ylabel=\"vol estimated\",label=nothing,title=\"Est vs. obs volumes\")","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"loss_per_epoch = info(m)[\"loss_per_epoch\"]\n\nplot(loss_per_epoch, xlabel=\"epoch\", ylabel=\"loss per epoch\", label=nothing, title=\"Loss per epoch\")","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"View this file on Github.","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"","category":"page"},{"location":"tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html","page":"A deep neural network with multi-branch architecture","title":"A deep neural network with multi-branch architecture","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Api_v2_user.html#api_usage","page":"Introduction for user","title":"BetaML Api v2","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for 
user","text":"note: Note\nThe API described below is the default one starting from BetaML v0.8.","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"The following API is designed to further simplify the usage of the various ML models provided by BetaML by introducing a common workflow. This is the user documentation. Refer to the developer documentation to learn how the API is implemented. ","category":"page"},{"location":"Api_v2_user.html#Supervised-,-unsupervised-and-transformed-models","page":"Introduction for user","title":"Supervised , unsupervised and transformed models","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Supervised refers to models designed to learn a relation between some features (often denoted by X) and some labels (often denoted by Y) in order to predict the label of new data given the observed features alone. Perceptron, decision trees or neural networks are common examples. Unsupervised and transformer models relate to models that learn a \"structure\" from the data itself (without any label attached from which to learn) and report either some new information using this learned structure (e.g. a cluster class) or directly process a transformation of the data itself, like PCAEncoder or missing imputers. There is no difference in BetaML between these kinds of models, aside from the fact that the fitting (aka training) function for the former takes both the features and the labels. In particular there isn't a separate transform function as in other frameworks, but any information we need to learn using the model, whether a label or some transformation of the original data, is provided by the predict function. 
","category":"page"},{"location":"Api_v2_user.html#Model-constructor","page":"Introduction for user","title":"Model constructor","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"The first step is to build the model constructor by passing (using keyword arguments) the algorithm hyperparameters and various options (cache results flag, debug levels, random number generators, ...):","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"mod = ModelName(par1=X,par2=Y,...)","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Sometimes a parameter is itself another model, in which case we would have:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"mod = ModelName(par1=OtherModel(a_par_of_OtherModel=X,...),par2=Y,...)","category":"page"},{"location":"Api_v2_user.html#Training-of-the-model","page":"Introduction for user","title":"Training of the model","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"The second step is to fit (aka train) the model:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"fit!(mod,X,[Y])","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"where Y is present only for supervised models.","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"For online algorithms, i.e. models that support updating of the learned parameters with new data, fit! 
can be repeated as new data arrive, although not all algorithms guarantee that training one record at a time is equivalent to training on all the records at once. In some algorithms the \"old training\" could be used as initial conditions, without considering whether these have been achieved with hundreds or millions of records, and the new data we use for training become much more important than the old ones for the determination of the learned parameters.","category":"page"},{"location":"Api_v2_user.html#Prediction","page":"Introduction for user","title":"Prediction","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Fitted models can be used to predict y (whether the label, some desired new information or a transformation) given new X:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"ŷ = predict(mod,X)","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"As a convenience, if the model has been trained with the cache option set to true (the default), the ŷ of the last training is retained in the model object and it can be retrieved simply with predict(mod). Also, in such a case the fit! function returns ŷ instead of nothing, effectively making it behave like a fit-and-transform function. 
The 3 expressions below are hence equivalent:","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"ŷ = fit!(mod,xtrain) # only with `cache=true` in the model constructor (default)\nŷ1 = predict(mod) # only with `cache=true` in the model constructor (default)\nŷ2 = predict(mod,xtrain) ","category":"page"},{"location":"Api_v2_user.html#Other-functions","page":"Introduction for user","title":"Other functions","text":"","category":"section"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Models can be reset to lose the learned information with reset!(mod) and training information (other than the algorithm learned parameters, see below) can be retrieved with info(mod).","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Hyperparameters, learned parameters and options can be retrieved with the functions hyperparameters, parameters and options respectively. Note that they can also be used to set new values in the model, as they return a reference to the required objects.","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"note: Note\nWhat is the difference between the output of info, parameters and the predict function? The predict function (and, when cache is used, the fit! one too) returns the main information required from the model: the predicted label for supervised models, the class assignment for clusters or the reprojected data for PCA. info returns complementary information like the number of dimensions of the data or the number of records employed for training. It doesn't include information that is necessary for the training itself, like the centroids in cluster analysis. These can be retrieved instead using parameters that include all and only the information required to compute predict. 
","category":"page"},{"location":"Api_v2_user.html","page":"Introduction for user","title":"Introduction for user","text":"Some models allow an inverse transformation that, using the parameters learned at training time (e.g. the scale factors), performs an inverse transformation of new data to the space of the training data (e.g. the unscaled space). Use inverse_predict(mod,xnew).","category":"page"},{"location":"MLJ_interface.html#bmlj_module","page":"MLJ interface","title":"The MLJ interface to BetaML Models","text":"","category":"section"},{"location":"MLJ_interface.html","page":"MLJ interface","title":"MLJ interface","text":"Bmlj\n","category":"page"},{"location":"MLJ_interface.html#BetaML.Bmlj","page":"MLJ interface","title":"BetaML.Bmlj","text":"MLJ interface for BetaML models\n\nIn this module we define the interface of several BetaML models. They can be used through the MLJ framework.\n\nNote that MLJ models (whose name could be the same as the underlying BetaML model) are not exported. 
You can access them with BetaML.Bmlj.ModelXYZ.\n\n\n\n\n\n","category":"module"},{"location":"MLJ_interface.html#Models-available-through-MLJ","page":"MLJ interface","title":"Models available through MLJ","text":"","category":"section"},{"location":"MLJ_interface.html","page":"MLJ interface","title":"MLJ interface","text":"Modules = [Bmlj]\nOrder = [:function, :constant, :type, :macro]","category":"page"},{"location":"MLJ_interface.html#Detailed-models-documentation","page":"MLJ interface","title":"Detailed models documentation","text":"","category":"section"},{"location":"MLJ_interface.html","page":"MLJ interface","title":"MLJ interface","text":"Modules = [Bmlj]\nPrivate = true","category":"page"},{"location":"MLJ_interface.html#BetaML.Bmlj.AutoEncoder","page":"MLJ interface","title":"BetaML.Bmlj.AutoEncoder","text":"mutable struct AutoEncoder <: MLJModelInterface.Unsupervised\n\nA ready-to-use AutoEncoder, from the Beta Machine Learning Toolkit (BetaML) for encoding and decoding of data using neural networks\n\nParameters:\n\nencoded_size: The number of neurons (i.e. dimensions) of the encoded data. If the value is a float it is considered a fraction (to be rounded) of the dimensionality of the data [def: 0.33]\nlayers_size: Inner layer dimension (i.e. number of neurons). If the value is a float it is considered a fraction (to be rounded) of the dimensionality of the data [def: nothing, which applies a specific heuristic]. Consider that the underlying neural network is trying to predict multiple values at the same time. Normally this requires many more neurons than a scalar prediction. If e_layers or d_layers are specified, this parameter is ignored for the respective part.\ne_layers: The layers (vector of AbstractLayers) responsible for the encoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. 
See subtypes(BetaML.AbstractLayer) for supported layers\nd_layers: The layers (vector of AbstractLayers) responsible for the decoding of the data [def: nothing, i.e. two dense layers with the inner one of layers_size]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: BetaML.squared_cost]. Should always assume y and ŷ as (n x d) matrices.\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative in the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]\nepochs: Number of epochs, i.e. passes through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 8]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()] See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ntunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! 
call simply set autotune=true and eventually change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).\n\ndescr: An optional title and/or description for this model\nrng: Random Number Generator (see FIXEDSEED) [deafult: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nuse transform to obtain the encoded data, and inverse_trasnform to decode to the original data\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load AutoEncoder pkg = \"BetaML\" verbosity=0;\n\njulia> model = modelType(encoded_size=2,layers_size=10);\n\njulia> mach = machine(model, X)\nuntrained Machine; caches model-specific representations of data\n model: AutoEncoder(e_layers = nothing, …)\n args: \n 1:\tSource @334 ⏎ Table{AbstractVector{Continuous}}\n\njulia> fit!(mach,verbosity=2)\n[ Info: Training machine(AutoEncoder(e_layers = nothing, …), …).\n***\n*** Training for 200 epochs with algorithm BetaML.Nn.ADAM.\nTraining.. \t avg loss on epoch 1 (1): \t 35.48243542158747\nTraining.. \t avg loss on epoch 20 (20): \t 0.07528042222678126\nTraining.. \t avg loss on epoch 40 (40): \t 0.06293071729378613\nTraining.. \t avg loss on epoch 60 (60): \t 0.057035588828991145\nTraining.. \t avg loss on epoch 80 (80): \t 0.056313167754822875\nTraining.. \t avg loss on epoch 100 (100): \t 0.055521461091809436\nTraining the Neural Network... 52%|██████████████████████████████████████ | ETA: 0:00:01Training.. \t avg loss on epoch 120 (120): \t 0.06015206472927942\nTraining.. \t avg loss on epoch 140 (140): \t 0.05536835903285201\nTraining.. \t avg loss on epoch 160 (160): \t 0.05877560142428245\nTraining.. \t avg loss on epoch 180 (180): \t 0.05476302769966953\nTraining.. \t avg loss on epoch 200 (200): \t 0.049240864053557445\nTraining the Neural Network... 100%|█████████████████████████████████████████████████████████████████████████| Time: 0:00:01\nTraining of 200 epoch completed. 
Final epoch error: 0.049240864053557445.\ntrained Machine; caches model-specific representations of data\n model: AutoEncoder(e_layers = nothing, …)\n args: \n 1:\tSource @334 ⏎ Table{AbstractVector{Continuous}}\n\n\njulia> X_latent = transform(mach, X)\n150×2 Matrix{Float64}:\n 7.01701 -2.77285\n 6.50615 -2.9279\n 6.5233 -2.60754\n ⋮ \n 6.70196 -10.6059\n 6.46369 -11.1117\n 6.20212 -10.1323\n\njulia> X_recovered = inverse_transform(mach,X_latent)\n150×4 Matrix{Float64}:\n 5.04973 3.55838 1.43251 0.242215\n 4.73689 3.19985 1.44085 0.295257\n 4.65128 3.25308 1.30187 0.244354\n ⋮ \n 6.50077 2.93602 5.3303 1.87647\n 6.38639 2.83864 5.54395 2.04117\n 6.01595 2.67659 5.03669 1.83234\n\njulia> BetaML.relative_mean_error(MLJ.matrix(X),X_recovered)\n0.03387721261716176\n\n\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.DecisionTreeClassifier","page":"MLJ interface","title":"BetaML.Bmlj.DecisionTreeClassifier","text":"mutable struct DecisionTreeClassifier <: MLJModelInterface.Probabilistic\n\nA simple Decision Tree model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must holds to consider for a partition of it [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference betwwen the \"impurity\" of the labels of the parent node with those of the two child nodes, weighted by the respective number of items. [def: gini]. 
Either gini, entropy or a custom function. It can also be an anonymous function.\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load DecisionTreeClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Trees.DecisionTreeClassifier\n\njulia> model = modelType()\nDecisionTreeClassifier(\n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.gini, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(DecisionTreeClassifier(max_depth = 0, …), …).\n\njulia> cat_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt32, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.DecisionTreeRegressor","page":"MLJ interface","title":"BetaML.Bmlj.DecisionTreeRegressor","text":"mutable struct DecisionTreeRegressor <: MLJModelInterface.Deterministic\n\nA simple Decision Tree model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. look at all features]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node and those of the two child nodes, weighted by the respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load DecisionTreeRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Trees.DecisionTreeRegressor\n\njulia> model = modelType()\nDecisionTreeRegressor(\n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.variance, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(DecisionTreeRegressor(max_depth = 0, …), …).\n\njulia> ŷ = predict(mach, X);\n\njulia> hcat(y,ŷ)\n506×2 Matrix{Float64}:\n 24.0 26.35\n 21.6 21.6\n 34.7 34.8\n ⋮ \n 23.9 23.75\n 22.0 22.2\n 11.9 13.2\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GaussianMixtureClusterer","page":"MLJ interface","title":"BetaML.Bmlj.GaussianMixtureClusterer","text":"mutable struct GaussianMixtureClusterer <: MLJModelInterface.Unsupervised\n\nAn Expectation-Maximisation clustering algorithm with customisable mixtures, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent 
classes) to consider [def: 3]\ninitial_probmixtures::AbstractVector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\nmaximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. 
∞]\nrng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]\n\nExample:\n\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load GaussianMixtureClusterer pkg = \"BetaML\" verbosity=0\nBetaML.GMM.GaussianMixtureClusterer\n\njulia> model = modelType()\nGaussianMixtureClusterer(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"kmeans\", \n maximum_iterations = 9223372036854775807, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(GaussianMixtureClusterer(n_classes = 3, …), …).\nIter. 1: Var. of the post 10.800150114964184 Log-likelihood -650.0186451891216\n\njulia> classes_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, Int64, UInt32, Float64}:\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>4.17e-15, 3=>2.1900000000000003e-31)\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>1.25e-13, 3=>5.87e-31)\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>4.5e-15, 3=>1.55e-32)\n UnivariateFinite{Multiclass{3}}(1=>1.0, 2=>6.93e-14, 3=>3.37e-31)\n ⋮\n UnivariateFinite{Multiclass{3}}(1=>5.39e-25, 2=>0.0167, 3=>0.983)\n UnivariateFinite{Multiclass{3}}(1=>7.5e-29, 2=>0.000106, 3=>1.0)\n UnivariateFinite{Multiclass{3}}(1=>1.6e-20, 2=>0.594, 3=>0.406)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GaussianMixtureImputer","page":"MLJ interface","title":"BetaML.Bmlj.GaussianMixtureImputer","text":"mutable struct GaussianMixtureImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using a probabilistic approach (Gaussian Mixture Models) fitted using the 
Expectation-Maximisation algorithm, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module in BetaML). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported and that currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [def: DiagonalGaussian]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance.\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. 
One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;\n\njulia> modelType = @load GaussianMixtureImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.GaussianMixtureImputer\n\njulia> model = modelType(initialisation_strategy=\"grid\")\nGaussianMixtureImputer(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"grid\", \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(GaussianMixtureImputer(n_classes = 3, …), …).\nIter. 1: Var. 
of the post 2.0225921341714286 Log-likelihood -42.96100103213314\n\njulia> X_full = transform(mach) |> MLJ.matrix\n9×2 Matrix{Float64}:\n 1.0 10.5\n 1.5 14.7366\n 1.8 8.0\n 1.7 15.0\n 3.2 40.0\n 2.51842 15.1747\n 3.3 38.0\n 2.47412 -2.3\n 5.2 -2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GaussianMixtureRegressor","page":"MLJ interface","title":"BetaML.Bmlj.GaussianMixtureRegressor","text":"mutable struct GaussianMixtureRegressor <: MLJModelInterface.Deterministic\n\nA non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.\n\nThis is the single-target version of the model. If you want to predict several labels (y) at once, use the MLJ model MultitargetGaussianMixtureRegressor.\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. 
[def: [DiagonalGaussian() for i in 1:n_classes]]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nmaximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. ∞]\nrng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load GaussianMixtureRegressor pkg = \"BetaML\" verbosity=0\nBetaML.GMM.GaussianMixtureRegressor\n\njulia> model = modelType()\nGaussianMixtureRegressor(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"kmeans\", \n maximum_iterations = 9223372036854775807, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(GaussianMixtureRegressor(n_classes = 3, …), …).\nIter. 1: Var. 
of the post 21.74887448784976 Log-likelihood -21687.09917379566\n\njulia> ŷ = predict(mach, X)\n506-element Vector{Float64}:\n 24.703442835305577\n 24.70344283512716\n ⋮\n 17.172486989759676\n 17.172486989759644\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.GeneralImputer","page":"MLJ interface","title":"BetaML.Bmlj.GeneralImputer","text":"mutable struct GeneralImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using arbitrary learning models, from the Beta Machine Learning Toolkit (BetaML).\n\nImpute missing values using a vector (one per column) of arbitrary learning models (classifiers/regressors, not necessarily from BetaML) that implement the interface m = Model([options]), train!(m,X,Y) and predict(m,X).\n\nHyperparameters:\n\ncols_to_impute::Union{String, Vector{Int64}}: Columns in the matrix for which to create an imputation model, i.e. to impute. It can be a vector of column IDs (positions), or the keywords \"auto\" (default) or \"all\". With \"auto\" the model automatically detects the columns with missing data and imputes only them. You may manually specify the columns, or use \"all\", if you want to create an imputation model for those columns during training even if all training data are non-missing, in order to then apply the trained model to further data with possibly missing values.\nestimator::Any: An estimator model (regressor or classifier), optionally with its options (hyper-parameters), to be used to impute the various columns of the matrix. It can also be a cols_to_impute-length vector of different estimators to consider a different estimator for each column (dimension) to impute, for example when some columns are categorical (and will hence require a classifier) and some others are numerical (hence requiring a regressor). [default: nothing, i.e. 
use BetaML random forests, handling classification and regression jobs automatically].\nmissing_supported::Union{Bool, Vector{Bool}}: Whether the estimator(s) used to predict the missing data themselves support missing data in the training features (X). If not, when the model for a certain dimension is fitted, dimensions with missing data in the same rows of those where imputation is needed are dropped and then only non-missing rows in the other remaining dimensions are considered. It can be a vector of boolean values to specify this property for each individual estimator or a single boolean value to apply to all the estimators [default: false]\nfit_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to fit the model. It should take as first argument the model itself, as second argument a matrix representing the features, and as third argument a vector representing the labels. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.fit!]\npredict_function::Union{Function, Vector{Function}}: The function used by the estimator(s) to predict the labels. It should take as first argument the model itself and as second argument a matrix representing the features. This parameter is mandatory for non-BetaML estimators and can be a single value or a vector (one per estimator) in case of different estimator packages used. [default: BetaML.predict]\nrecursive_passages::Int64: Define the number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. The order of the first passage is given by the decreasing number of missing values per column, the other passages are random [default: 1].\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]. 
Note that this influences only the GeneralImputer code itself; the individual estimators may have their own rng (or similar) parameter.\n\nExamples:\n\nUsing BetaML models:\n\njulia> using MLJ;\njulia> import BetaML # The library from which to get the individual estimators to be used for each column imputation\njulia> X = [\"a\" 8.2;\n \"a\" missing;\n \"a\" 7.8;\n \"b\" 21;\n \"b\" 18;\n \"c\" -0.9;\n missing 20;\n \"c\" -1.8;\n missing -2.3;\n \"c\" -2.4] |> table ;\njulia> modelType = @load GeneralImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.GeneralImputer\njulia> model = modelType(estimator=BetaML.DecisionTreeEstimator(),recursive_passages=2);\njulia> mach = machine(model, X);\njulia> fit!(mach);\n[ Info: Training machine(GeneralImputer(cols_to_impute = auto, …), …).\njulia> X_full = transform(mach) |> MLJ.matrix\n10×2 Matrix{Any}:\n \"a\" 8.2\n \"a\" 8.0\n \"a\" 7.8\n \"b\" 21\n \"b\" 18\n \"c\" -0.9\n \"b\" 20\n \"c\" -1.8\n \"c\" -2.3\n \"c\" -2.4\n\nUsing third party packages (in this example DecisionTree):\n\njulia> using MLJ;\njulia> import DecisionTree # An example of external estimators to be used for each column imputation\njulia> X = [\"a\" 8.2;\n \"a\" missing;\n \"a\" 7.8;\n \"b\" 21;\n \"b\" 18;\n \"c\" -0.9;\n missing 20;\n \"c\" -1.8;\n missing -2.3;\n \"c\" -2.4] |> table ;\njulia> modelType = @load GeneralImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.GeneralImputer\njulia> model = modelType(estimator=[DecisionTree.DecisionTreeClassifier(),DecisionTree.DecisionTreeRegressor()], fit_function=DecisionTree.fit!,predict_function=DecisionTree.predict,recursive_passages=2);\njulia> mach = machine(model, X);\njulia> fit!(mach);\n[ Info: Training machine(GeneralImputer(cols_to_impute = auto, …), …).\njulia> X_full = transform(mach) |> MLJ.matrix\n10×2 Matrix{Any}:\n \"a\" 8.2\n \"a\" 7.51111\n \"a\" 7.8\n \"b\" 21\n \"b\" 18\n \"c\" -0.9\n \"b\" 20\n \"c\" -1.8\n \"c\" -2.3\n \"c\" 
-2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.KMeansClusterer","page":"MLJ interface","title":"BetaML.Bmlj.KMeansClusterer","text":"mutable struct KMeansClusterer <: MLJModelInterface.Unsupervised\n\nThe classical K-Means clustering algorithm, from the Beta Machine Learning Toolkit (BetaML).\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Default to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Note that, contrary to KMedoidsClusterer, the KMeansClusterer algorithm is not guaranteed to converge with distances other than the Euclidean one.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:\n\"random\": randomly in the X space\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points [default]\n\"given\": using a set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\nrng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load KMeansClusterer pkg = \"BetaML\" verbosity=0\nBetaML.Clustering.KMeansClusterer\n\njulia> model = modelType()\nKMeansClusterer(\n n_classes = 3, \n dist = BetaML.Clustering.var\"#34#36\"(), \n initialisation_strategy = \"shuffle\", \n initial_representatives = nothing, \n rng = 
Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(KMeansClusterer(n_classes = 3, …), …).\n\njulia> classes_est = predict(mach, X);\n\njulia> hcat(y,classes_est)\n150×2 CategoricalArrays.CategoricalArray{Union{Int64, String},2,UInt32}:\n \"setosa\" 2\n \"setosa\" 2\n \"setosa\" 2\n ⋮ \n \"virginica\" 3\n \"virginica\" 3\n \"virginica\" 1\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.KMedoidsClusterer","page":"MLJ interface","title":"BetaML.Bmlj.KMedoidsClusterer","text":"mutable struct KMedoidsClusterer <: MLJModelInterface.Unsupervised\n\nParameters:\n\nn_classes::Int64: Number of classes to discriminate the data [def: 3]\ndist::Function: Function to employ as distance. Default to the Euclidean distance. Can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics.\ninitialisation_strategy::String: The computation method of the vector of the initial representatives. One of the following:\n\"random\": randomly in the X space\n\"grid\": using a grid approach\n\"shuffle\": selecting randomly within the available points [default]\n\"given\": using a set of initial representatives provided in the initial_representatives parameter\n\ninitial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives (useful only with initialisation_strategy=\"given\") [default: nothing]\nrng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]\n\nThe K-medoids clustering algorithm with customisable distance function, from the Beta Machine Learning Toolkit (BetaML).\n\nSimilar to K-Means, but the \"representatives\" (the centroids) are guaranteed to be one of the training points. 
The algorithm works with any arbitrary distance measure.\n\nNotes:\n\ndata must be numerical\nonline fitting (re-fitting with new data) is supported\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load KMedoidsClusterer pkg = \"BetaML\" verbosity=0\nBetaML.Clustering.KMedoidsClusterer\n\njulia> model = modelType()\nKMedoidsClusterer(\n n_classes = 3, \n dist = BetaML.Clustering.var\"#39#41\"(), \n initialisation_strategy = \"shuffle\", \n initial_representatives = nothing, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(KMedoidsClusterer(n_classes = 3, …), …).\n\njulia> classes_est = predict(mach, X);\n\njulia> hcat(y,classes_est)\n150×2 CategoricalArrays.CategoricalArray{Union{Int64, String},2,UInt32}:\n \"setosa\" 3\n \"setosa\" 3\n \"setosa\" 3\n ⋮ \n \"virginica\" 1\n \"virginica\" 1\n \"virginica\" 2\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.KernelPerceptronClassifier","page":"MLJ interface","title":"BetaML.Bmlj.KernelPerceptronClassifier","text":"mutable struct KernelPerceptronClassifier <: MLJModelInterface.Probabilistic\n\nThe kernel perceptron algorithm using one-vs-one for multiclass, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nkernel::Function: Kernel function to employ. See ?radial_kernel or ?polynomial_kernel (once the BetaML package is loaded) for details or check ?BetaML.Utils to verify if other kernels are defined (you can always define your own kernel) [def: radial_kernel]\nepochs::Int64: Maximum number of epochs, i.e. passages through the whole training sample [def: 100]\ninitial_errors::Union{Nothing, Vector{Vector{Int64}}}: Initial distribution of the number of errors [def: nothing, i.e. zeros]. 
If provided, this should be an nModels-length vector of vectors of nRecords integer values, where nModels is computed as (n_classes * (n_classes - 1)) / 2\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load KernelPerceptronClassifier pkg = \"BetaML\"\n[ Info: For silent loading, specify `verbosity=0`. \nimport BetaML ✔\nBetaML.Perceptron.KernelPerceptronClassifier\n\njulia> model = modelType()\nKernelPerceptronClassifier(\n kernel = BetaML.Utils.radial_kernel, \n epochs = 100, \n initial_errors = nothing, \n shuffle = true, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> est_classes = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>0.665, versicolor=>0.245, virginica=>0.09)\n UnivariateFinite{Multiclass{3}}(setosa=>0.665, versicolor=>0.245, virginica=>0.09)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.245, virginica=>0.665)\n UnivariateFinite{Multiclass{3}}(setosa=>0.09, versicolor=>0.665, virginica=>0.245)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.MultitargetGaussianMixtureRegressor","page":"MLJ interface","title":"BetaML.Bmlj.MultitargetGaussianMixtureRegressor","text":"mutable struct MultitargetGaussianMixtureRegressor <: MLJModelInterface.Deterministic\n\nA non-linear regressor derived from fitting the data on a probabilistic model (Gaussian Mixture Model). Relatively fast but generally not very precise, except for data with a structure matching the chosen underlying mixture.\n\nThis is the multi-target version of the model. 
If you want to predict a single label (y), use the MLJ model GaussianMixtureRegressor.\n\nHyperparameters:\n\nn_classes::Int64: Number of mixtures (latent classes) to consider [def: 3]\ninitial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]\nmixtures::Union{Type, Vector{var\"#s1270\"} where var\"#s1270\"<:AbstractMixture}: An array (of length n_classes) of the mixtures to employ (see the ?GMM module). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to \"given\". This parameter can also be given simply in terms of a type. In this case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing of different mixture types is not currently supported. [def: [DiagonalGaussian() for i in 1:n_classes]]\ntol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]\nminimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]\nminimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a value different from minimum_variance (see notes).\ninitialisation_strategy::String: The computation method of the vector of the initial mixtures. One of the following:\n\"grid\": using a grid approach\n\"given\": using the mixture provided in the fully qualified mixtures parameter\n\"kmeans\": use first kmeans (itself initialised with a \"grid\" strategy) to set the initial mixture centers [default]\nNote that currently \"random\" and \"shuffle\" initialisations are not supported in gmm-based algorithms.\n\nmaximum_iterations::Int64: Maximum number of iterations [def: typemax(Int64), i.e. 
∞]\nrng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> ydouble = hcat(y, y .*2 .+5);\n\njulia> modelType = @load MultitargetGaussianMixtureRegressor pkg = \"BetaML\" verbosity=0\nBetaML.GMM.MultitargetGaussianMixtureRegressor\n\njulia> model = modelType()\nMultitargetGaussianMixtureRegressor(\n n_classes = 3, \n initial_probmixtures = Float64[], \n mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], \n tol = 1.0e-6, \n minimum_variance = 0.05, \n minimum_covariance = 0.0, \n initialisation_strategy = \"kmeans\", \n maximum_iterations = 9223372036854775807, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, ydouble);\n\njulia> fit!(mach);\n[ Info: Training machine(MultitargetGaussianMixtureRegressor(n_classes = 3, …), …).\nIter. 1: Var. of the post 20.46947926187522 Log-likelihood -23662.72770575145\n\njulia> ŷdouble = predict(mach, X)\n506×2 Matrix{Float64}:\n 23.3358 51.6717\n 23.3358 51.6717\n ⋮ \n 16.6843 38.3686\n 16.6843 38.3686\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.MultitargetNeuralNetworkRegressor","page":"MLJ interface","title":"BetaML.Bmlj.MultitargetNeuralNetworkRegressor","text":"mutable struct MultitargetNeuralNetworkRegressor <: MLJModelInterface.Deterministic\n\nA simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of multiple dimensional targets.\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: BetaML.squared_cost]. 
Should always assume y and ŷ as matrices.\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative in the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.\nepochs: Number of epochs, i.e. passages through the whole training sample [def: 300]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ndescr: An optional title and/or description for this model\ncb: A call back function to provide information during training [def: BetaML.fitting_info]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nthe label should be an n-records by n-dimensions matrix \n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> ydouble = hcat(y, y .*2 .+5);\n\njulia> modelType = @load MultitargetNeuralNetworkRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Nn.MultitargetNeuralNetworkRegressor\n\njulia> layers = [BetaML.DenseLayer(12,50,f=BetaML.relu),BetaML.DenseLayer(50,50,f=BetaML.relu),BetaML.DenseLayer(50,50,f=BetaML.relu),BetaML.DenseLayer(50,2,f=BetaML.relu)];\n\njulia> model = modelType(layers=layers,opt_alg=BetaML.ADAM(),epochs=500)\nMultitargetNeuralNetworkRegressor(\n layers = BetaML.Nn.AbstractLayer[BetaML.Nn.DenseLayer([-0.2591582523441157 -0.027962845131416225 … 0.16044535560124418 -0.12838827994676857; -0.30381834909561184 0.2405495243851402 … -0.2588144861880588 0.09538577909777807; … ; -0.017320292924711156 -0.14042266424603767 … 0.06366999105841187 -0.13419651752478906; 0.07393079961409338 0.24521350531110264 … 0.04256867886217541 -0.0895506802948175], [0.14249427336553644, 
0.24719379413682485, -0.25595911822556566, 0.10034088778965933, -0.017086404878505712, 0.21932184025609347, -0.031413516834861266, -0.12569076082247596, -0.18080140982481183, 0.14551901873323253 … -0.13321995621967364, 0.2436582233332092, 0.0552222336976439, 0.07000814133633904, 0.2280064379660025, -0.28885681475734193, -0.07414214246290696, -0.06783184733650621, -0.055318068046308455, -0.2573488383282579], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.0395424111703751 -0.22531232360829911 … -0.04341228943744482 0.024336206858365517; -0.16481887432946268 0.17798073384748508 … -0.18594039305095766 0.051159225856547474; … ; -0.011639475293705043 -0.02347011206244673 … 0.20508869536159186 -0.1158382446274592; -0.19078069527757857 -0.007487540070740484 … -0.21341165344291158 -0.24158671316310726], [-0.04283623889330032, 0.14924461547060602, -0.17039563392959683, 0.00907774027816255, 0.21738885963113852, -0.06308040225941691, -0.14683286822101105, 0.21726892197970937, 0.19784321784707126, -0.0344988665714947 … -0.23643089430602846, -0.013560425201427584, 0.05323948910726356, -0.04644175812567475, -0.2350400292671211, 0.09628312383424742, 0.07016420995205697, -0.23266392927140334, -0.18823664451487, 0.2304486691429084], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.11504184627266828 0.08601794194664503 … 0.03843129724045469 -0.18417305624127284; 0.10181551438831654 0.13459759904443674 … 0.11094951365942118 -0.1549466590355218; … ; 0.15279817525427697 0.0846661196058916 … -0.07993619892911122 0.07145402617285884; -0.1614160186346092 -0.13032002335149 … -0.12310552194729624 -0.15915773071049827], [-0.03435885900946367, -0.1198543931290306, 0.008454985905194445, -0.17980887188986966, -0.03557204910359624, 0.19125847393334877, -0.10949700778538696, -0.09343206702591, -0.12229583511781811, -0.09123969069220564 … 0.22119233518322862, 0.2053873143308657, 0.12756489387198222, 0.11567243705173319, -0.20982445664020496, 0.1595157838386987, 
-0.02087331046544119, -0.20556423263489765, -0.1622837764237961, -0.019220998739847395], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.25796717031347993 0.17579536633402948 … -0.09992960168785256 -0.09426177454620635; -0.026436330246675632 0.18070899284865127 … -0.19310119102392206 -0.06904005900252091], [0.16133004882307822, -0.3061228721091248], BetaML.Utils.relu, BetaML.Utils.drelu)], \n loss = BetaML.Utils.squared_cost, \n dloss = BetaML.Utils.dsquared_cost, \n epochs = 500, \n batch_size = 32, \n opt_alg = BetaML.Nn.ADAM(BetaML.Nn.var\"#90#93\"(), 1.0, 0.9, 0.999, 1.0e-8, BetaML.Nn.Learnable[], BetaML.Nn.Learnable[]), \n shuffle = true, \n descr = \"\", \n cb = BetaML.Nn.fitting_info, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, ydouble);\n\njulia> fit!(mach);\n\njulia> ŷdouble = predict(mach, X);\n\njulia> hcat(ydouble,ŷdouble)\n506×4 Matrix{Float64}:\n 24.0 53.0 28.4624 62.8607\n 21.6 48.2 22.665 49.7401\n 34.7 74.4 31.5602 67.9433\n 33.4 71.8 33.0869 72.4337\n ⋮ \n 23.9 52.8 23.3573 50.654\n 22.0 49.0 22.1141 48.5926\n 11.9 28.8 19.9639 45.5823\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.NeuralNetworkClassifier","page":"MLJ interface","title":"BetaML.Bmlj.NeuralNetworkClassifier","text":"mutable struct NeuralNetworkClassifier <: MLJModelInterface.Probabilistic\n\nA simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for classification problems.\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers. The last \"softmax\" layer is automatically added.\nloss: Loss (cost) function [def: BetaML.crossentropy]. 
Should always assume y and ŷ as matrices.\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative via the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dcrossentropy, i.e. the derivative of the cross-entropy]. Use nothing for autodiff.\nepochs: Number of epochs, i.e. passes through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ndescr: An optional title and/or description for this model\ncb: A callback function to provide information during training [def: BetaML.fitting_info]\ncategories: The categories to represent as columns. [def: nothing, i.e. unique training values].\nhandle_unknown: How to handle categories not seen in training or not present in the provided categories array? \"error\" (default) raises an error, \"infrequent\" adds a specific column for these categories.\nother_categories_name: Which value during prediction to assign to this \"other\" category (i.e. categories not seen in training or not present in the provided categories array) [def: nothing, i.e. typemax(Int64) for integer vectors and \"other\" for other types]. This setting is active only if handle_unknown=\"infrequent\" and in that case it MUST be specified if Y is neither integers nor strings\nrng: Random Number Generator [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nthe label should be an n-records by n-dimensions matrix (e.g. 
one-hot-encoded data for classification), where the output columns should be interpreted as the probabilities for each category.\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load NeuralNetworkClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Nn.NeuralNetworkClassifier\n\njulia> layers = [BetaML.DenseLayer(4,8,f=BetaML.relu),BetaML.DenseLayer(8,8,f=BetaML.relu),BetaML.DenseLayer(8,3,f=BetaML.relu),BetaML.VectorFunctionLayer(3,f=BetaML.softmax)];\n\njulia> model = modelType(layers=layers,opt_alg=BetaML.ADAM())\nNeuralNetworkClassifier(\n layers = BetaML.Nn.AbstractLayer[BetaML.Nn.DenseLayer([-0.376173352338049 0.7029289511758696 -0.5589563304592478 -0.21043274001651874; 0.044758889527899415 0.6687689636685921 0.4584331114653877 0.6820506583840453; … ; -0.26546358457167507 -0.28469736227283804 -0.164225549922154 -0.516785639164486; -0.5146043550684141 -0.0699113265130964 0.14959906603941908 -0.053706860039406834], [0.7003943613125758, -0.23990840466587576, -0.23823126271387746, 0.4018101580410387, 0.2274483050356888, -0.564975060667734, 0.1732063297031089, 0.11880299829896945], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.029467850439546583 0.4074661266592745 … 0.36775675246760053 -0.595524555448422; 0.42455597698371306 -0.2458082732997091 … -0.3324220683462514 0.44439454998610595; … ; -0.2890883863364267 -0.10109249362508033 … -0.0602680568207582 0.18177278845097555; -0.03432587226449335 -0.4301192922760063 … 0.5646018168286626 0.47269177680892693], [0.13777442835428688, 0.5473306726675433, 0.3781939472904011, 0.24021813428130567, -0.0714779477402877, -0.020386373530818958, 0.5465466618404464, -0.40339790713616525], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([0.6565120540082393 0.7139211611842745 … 0.07809812467915389 -0.49346311403373844; -0.4544472987041656 0.6502667641568863 … 0.43634608676548214 0.7213049952968921; 0.41212264783075303 -0.21993289366360613 … 0.25365007887755064 
-0.5664469566269569], [-0.6911986792747682, -0.2149343209329364, -0.6347727539063817], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.VectorFunctionLayer{0}(fill(NaN), 3, 3, BetaML.Utils.softmax, BetaML.Utils.dsoftmax, nothing)], \n loss = BetaML.Utils.crossentropy, \n dloss = BetaML.Utils.dcrossentropy, \n epochs = 100, \n batch_size = 32, \n opt_alg = BetaML.Nn.ADAM(BetaML.Nn.var\"#90#93\"(), 1.0, 0.9, 0.999, 1.0e-8, BetaML.Nn.Learnable[], BetaML.Nn.Learnable[]), \n shuffle = true, \n descr = \"\", \n cb = BetaML.Nn.fitting_info, \n categories = nothing, \n handle_unknown = \"error\", \n other_categories_name = nothing, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> classes_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>0.575, versicolor=>0.213, virginica=>0.213)\n UnivariateFinite{Multiclass{3}}(setosa=>0.573, versicolor=>0.213, virginica=>0.213)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.236, versicolor=>0.236, virginica=>0.529)\n UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.254, virginica=>0.492)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.NeuralNetworkRegressor","page":"MLJ interface","title":"BetaML.Bmlj.NeuralNetworkRegressor","text":"mutable struct NeuralNetworkRegressor <: MLJModelInterface.Deterministic\n\nA simple but flexible Feedforward Neural Network, from the Beta Machine Learning Toolkit (BetaML) for regression of a single dimensional target.\n\nParameters:\n\nlayers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers\nloss: Loss (cost) function [def: BetaML.squared_cost]. 
Should always assume y and ŷ as matrices, even if the regression task is 1-D\nwarning: Warning\nIf you change the parameter loss, you need to either provide its derivative via the parameter dloss or use autodiff with dloss=nothing.\n\ndloss: Derivative of the loss function [def: BetaML.dsquared_cost, i.e. use the derivative of the squared cost]. Use nothing for autodiff.\nepochs: Number of epochs, i.e. passes through the whole training sample [def: 200]\nbatch_size: Size of each individual batch [def: 16]\nopt_alg: The optimisation algorithm to update the gradient at each batch [def: BetaML.ADAM()]. See subtypes(BetaML.OptimisationAlgorithm) for supported optimizers\nshuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\ndescr: An optional title and/or description for this model\ncb: A callback function to provide information during training [def: fitting_info]\nrng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]\n\nNotes:\n\ndata must be numerical\nthe label should be an n-records vector.\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load NeuralNetworkRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Nn.NeuralNetworkRegressor\n\njulia> layers = [BetaML.DenseLayer(12,20,f=BetaML.relu),BetaML.DenseLayer(20,20,f=BetaML.relu),BetaML.DenseLayer(20,1,f=BetaML.relu)];\n\njulia> model = modelType(layers=layers,opt_alg=BetaML.ADAM())\nNeuralNetworkRegressor(\n layers = BetaML.Nn.AbstractLayer[BetaML.Nn.DenseLayer([-0.23249759178069676 -0.4125090172711131 … 0.41401934928739 -0.33017881111237535; -0.27912169279319965 0.270551221249931 … 0.19258414323473344 0.1703002982374256; … ; 0.31186742456482447 0.14776438287394805 … 0.3624993442655036 0.1438885872964824; 0.24363744610286758 -0.3221033024934767 … 0.14886090419299408 0.038411663101909355], [-0.42360286004241765, -0.34355377040029594, 0.11510963232946697, 0.29078650404397893, -0.04940236502546075, 0.05142849152316714, 
-0.177685375947775, 0.3857630523957018, -0.25454667127064756, -0.1726731848206195, 0.29832456225553444, -0.21138505291162835, -0.15763643112604903, -0.08477044513587562, -0.38436681165349196, 0.20538016429104916, -0.25008157754468335, 0.268681800562054, 0.10600581996650865, 0.4262194464325672], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.08534180387478185 0.19659398307677617 … -0.3413633217504578 -0.0484925247381256; 0.0024419192794883915 -0.14614102508129 … -0.21912059923003044 0.2680725396694708; … ; 0.25151545823147886 -0.27532269951606037 … 0.20739970895058063 0.2891938885916349; -0.1699020711688904 -0.1350423717084296 … 0.16947589410758873 0.3629006047373296], [0.2158116357688406, -0.3255582642532289, -0.057314442103850394, 0.29029696770539953, 0.24994080694366455, 0.3624239027782297, -0.30674318230919984, -0.3854738338935017, 0.10809721838554087, 0.16073511121016176, -0.005923262068960489, 0.3157147976348795, -0.10938918304264739, -0.24521229198853187, -0.307167732178712, 0.0808907777008302, -0.014577497150872254, -0.0011287181458157214, 0.07522282588658086, 0.043366500526073104], BetaML.Utils.relu, BetaML.Utils.drelu), BetaML.Nn.DenseLayer([-0.021367697115938555 -0.28326652172347155 … 0.05346175368370165 -0.26037328415871647], [-0.2313659199724562], BetaML.Utils.relu, BetaML.Utils.drelu)], \n loss = BetaML.Utils.squared_cost, \n dloss = BetaML.Utils.dsquared_cost, \n epochs = 100, \n batch_size = 32, \n opt_alg = BetaML.Nn.ADAM(BetaML.Nn.var\"#90#93\"(), 1.0, 0.9, 0.999, 1.0e-8, BetaML.Nn.Learnable[], BetaML.Nn.Learnable[]), \n shuffle = true, \n descr = \"\", \n cb = BetaML.Nn.fitting_info, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> ŷ = predict(mach, X);\n\njulia> hcat(y,ŷ)\n506×2 Matrix{Float64}:\n 24.0 30.7726\n 21.6 28.0811\n 34.7 31.3194\n ⋮ \n 23.9 30.9032\n 22.0 29.49\n 11.9 
27.2438\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.PegasosClassifier","page":"MLJ interface","title":"BetaML.Bmlj.PegasosClassifier","text":"mutable struct PegasosClassifier <: MLJModelInterface.Probabilistic\n\nThe gradient-based linear \"pegasos\" classifier using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\ninitial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]\ninitial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]\nlearning_rate::Function: Learning rate [def: (epoch -> 1/sqrt(epoch))]\nlearning_rate_multiplicative::Float64: Multiplicative term of the learning rate [def: 0.5]\nepochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load PegasosClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Perceptron.PegasosClassifier\n\njulia> model = modelType()\nPegasosClassifier(\n initial_coefficients = nothing, \n initial_constant = nothing, \n learning_rate = BetaML.Perceptron.var\"#71#73\"(), \n learning_rate_multiplicative = 0.5, \n epochs = 1000, \n shuffle = true, \n force_origin = false, \n return_mean_hyperplane = false, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n\njulia> est_classes = predict(mach, 
X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>0.817, versicolor=>0.153, virginica=>0.0301)\n UnivariateFinite{Multiclass{3}}(setosa=>0.791, versicolor=>0.177, virginica=>0.0318)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.254, versicolor=>0.5, virginica=>0.246)\n UnivariateFinite{Multiclass{3}}(setosa=>0.283, versicolor=>0.51, virginica=>0.207)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.PerceptronClassifier","page":"MLJ interface","title":"BetaML.Bmlj.PerceptronClassifier","text":"mutable struct PerceptronClassifier <: MLJModelInterface.Probabilistic\n\nThe classical perceptron algorithm using one-vs-all for multiclass, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\ninitial_coefficients::Union{Nothing, Matrix{Float64}}: N-classes by D-dimensions matrix of initial linear coefficients [def: nothing, i.e. zeros]\ninitial_constant::Union{Nothing, Vector{Float64}}: N-classes vector of initial constant terms [def: nothing, i.e. zeros]\nepochs::Int64: Maximum number of epochs, i.e. passes through the whole training sample [def: 1000]\nshuffle::Bool: Whether to randomly shuffle the data at each iteration (epoch) [def: true]\nforce_origin::Bool: Whether to force the parameter associated with the constant term to remain zero [def: false]\nreturn_mean_hyperplane::Bool: Whether to return the average hyperplane coefficients instead of the final ones [def: false]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load PerceptronClassifier pkg = \"BetaML\"\n[ Info: For silent loading, specify `verbosity=0`. 
\nimport BetaML ✔\nBetaML.Perceptron.PerceptronClassifier\n\njulia> model = modelType()\nPerceptronClassifier(\n initial_coefficients = nothing, \n initial_constant = nothing, \n epochs = 1000, \n shuffle = true, \n force_origin = false, \n return_mean_hyperplane = false, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(PerceptronClassifier(initial_coefficients = nothing, …), …).\n*** Avg. error after epoch 2 : 0.0 (all elements of the set has been correctly classified)\njulia> est_classes = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt8, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>2.53e-34, virginica=>0.0)\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>1.27e-18, virginica=>1.86e-310)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>2.77e-57, versicolor=>1.1099999999999999e-82, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>3.09e-22, versicolor=>4.03e-25, virginica=>1.0)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.RandomForestClassifier","page":"MLJ interface","title":"BetaML.Bmlj.RandomForestClassifier","text":"mutable struct RandomForestClassifier <: MLJModelInterface.Probabilistic\n\nA simple Random Forest model for classification with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. 
square root of the data dimensions]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: gini]. Either gini, entropy or a custom function. It can also be an anonymous function.\nβ::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction based on the error of the individual trees computed on the records on which trees have not been trained. Higher values favour \"better\" trees, but too high values will cause overfitting [def: 0, i.e. uniform weights]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_iris;\n\njulia> modelType = @load RandomForestClassifier pkg = \"BetaML\" verbosity=0\nBetaML.Trees.RandomForestClassifier\n\njulia> model = modelType()\nRandomForestClassifier(\n n_trees = 30, \n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.gini, \n β = 0.0, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(RandomForestClassifier(n_trees = 30, …), …).\n\njulia> cat_est = predict(mach, X)\n150-element CategoricalDistributions.UnivariateFiniteVector{Multiclass{3}, String, UInt32, Float64}:\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n UnivariateFinite{Multiclass{3}}(setosa=>1.0, versicolor=>0.0, virginica=>0.0)\n ⋮\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0, virginica=>1.0)\n UnivariateFinite{Multiclass{3}}(setosa=>0.0, versicolor=>0.0667, 
virginica=>0.933)\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.RandomForestImputer","page":"MLJ interface","title":"BetaML.Bmlj.RandomForestImputer","text":"mutable struct RandomForestImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using Random Forests, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Union{Nothing, Int64}: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: nothing, i.e. no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]\nmax_features::Union{Nothing, Int64}: The maximum number of (random) features to consider at each partitioning [def: nothing, i.e. square root of the data dimension]\nforced_categorical_cols::Vector{Int64}: Specify the positions of the integer columns to treat as categorical instead of cardinal. [Default: empty vector (all numerical cols are treated as cardinal by default and the others as categorical)]\nsplitting_criterion::Union{Nothing, Function}: Either gini, entropy or variance. This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: nothing, i.e. gini for categorical labels (classification task) and variance for numerical labels (regression task)]. It can be an anonymous function.\nrecursive_passages::Int64: Number of times to go through the various columns to impute their data. Useful when there are data to impute on multiple columns. 
The order of the first passage is given by the decreasing number of missing values per column; the other passages are random [default: 1].\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;\n\njulia> modelType = @load RandomForestImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.RandomForestImputer\n\njulia> model = modelType(n_trees=40)\nRandomForestImputer(\n n_trees = 40, \n max_depth = nothing, \n min_gain = 0.0, \n min_records = 2, \n max_features = nothing, \n forced_categorical_cols = Int64[], \n splitting_criterion = nothing, \n recursive_passages = 1, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(RandomForestImputer(n_trees = 40, …), …).\n\njulia> X_full = transform(mach) |> MLJ.matrix\n9×2 Matrix{Float64}:\n 1.0 10.5\n 1.5 10.3909\n 1.8 8.0\n 1.7 15.0\n 3.2 40.0\n 2.88375 8.66125\n 3.3 38.0\n 3.98125 -2.3\n 5.2 -2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.RandomForestRegressor","page":"MLJ interface","title":"BetaML.Bmlj.RandomForestRegressor","text":"mutable struct RandomForestRegressor <: MLJModelInterface.Deterministic\n\nA simple Random Forest model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nn_trees::Int64: Number of (decision) trees in the forest [def: 30]\nmax_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. 
no limits]\nmin_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]\nmin_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]\nmax_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimension]\nsplitting_criterion::Function: This is the name of the function to be used to compute the information gain of a specific partition. This is done by measuring the difference between the \"impurity\" of the labels of the parent node and that of the two child nodes, weighted by the respective number of items. [def: variance]. Either variance or a custom function. It can also be an anonymous function.\nβ::Float64: Parameter that regulates the weights of the scoring of each tree, to be (optionally) used in prediction based on the error of the individual trees computed on the records on which trees have not been trained. Higher values favour \"better\" trees, but too high values will cause overfitting [def: 0, i.e. 
uniform weights]\nrng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]\n\nExample:\n\njulia> using MLJ\n\njulia> X, y = @load_boston;\n\njulia> modelType = @load RandomForestRegressor pkg = \"BetaML\" verbosity=0\nBetaML.Trees.RandomForestRegressor\n\njulia> model = modelType()\nRandomForestRegressor(\n n_trees = 30, \n max_depth = 0, \n min_gain = 0.0, \n min_records = 2, \n max_features = 0, \n splitting_criterion = BetaML.Utils.variance, \n β = 0.0, \n rng = Random._GLOBAL_RNG())\n\njulia> mach = machine(model, X, y);\n\njulia> fit!(mach);\n[ Info: Training machine(RandomForestRegressor(n_trees = 30, …), …).\n\njulia> ŷ = predict(mach, X);\n\njulia> hcat(y,ŷ)\n506×2 Matrix{Float64}:\n 24.0 25.8433\n 21.6 22.4317\n 34.7 35.5742\n 33.4 33.9233\n ⋮ \n 23.9 24.42\n 22.0 22.4433\n 11.9 15.5833\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.SimpleImputer","page":"MLJ interface","title":"BetaML.Bmlj.SimpleImputer","text":"mutable struct SimpleImputer <: MLJModelInterface.Unsupervised\n\nImpute missing values using feature (column) mean, with optional record normalisation (using l-norms), from the Beta Machine Learning Toolkit (BetaML).\n\nHyperparameters:\n\nstatistic::Function: The descriptive statistic of the column (feature) to use as imputed value [def: mean]\nnorm::Union{Nothing, Int64}: Normalise the feature mean by the l-norm of the records [default: nothing]. Use it (e.g. norm=1 to use the l-1 norm) if the records are highly heterogeneous (e.g. 
quantity exports of different countries).\n\nExample:\n\njulia> using MLJ\n\njulia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;\n\njulia> modelType = @load SimpleImputer pkg = \"BetaML\" verbosity=0\nBetaML.Imputation.SimpleImputer\n\njulia> model = modelType(norm=1)\nSimpleImputer(\n statistic = Statistics.mean, \n norm = 1)\n\njulia> mach = machine(model, X);\n\njulia> fit!(mach);\n[ Info: Training machine(SimpleImputer(statistic = mean, …), …).\n\njulia> X_full = transform(mach) |> MLJ.matrix\n9×2 Matrix{Float64}:\n 1.0 10.5\n 1.5 0.295466\n 1.8 8.0\n 1.7 15.0\n 3.2 40.0\n 0.280952 1.69524\n 3.3 38.0\n 0.0750839 -2.3\n 5.2 -2.4\n\n\n\n\n\n","category":"type"},{"location":"MLJ_interface.html#BetaML.Bmlj.mljverbosity_to_betaml_verbosity-Tuple{Integer}","page":"MLJ interface","title":"BetaML.Bmlj.mljverbosity_to_betaml_verbosity","text":"mljverbosity_to_betaml_verbosity(i::Integer) -> Verbosity\n\n\nConvert any integer (short scale) to one of the defined BetaML verbosity levels. Currently \"steps\" are 0, 1, 2 and 3\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.AutoEncoder, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.fit","text":"fit(\n m::BetaML.Bmlj.AutoEncoder,\n verbosity,\n X\n) -> Tuple{AutoEncoder, Nothing, Nothing}\n\n\nFor the verbosity parameter see Verbosity\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.MultitargetNeuralNetworkRegressor, Any, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.fit","text":"fit(\n m::BetaML.Bmlj.MultitargetNeuralNetworkRegressor,\n verbosity,\n X,\n y\n) -> Tuple{NeuralNetworkEstimator, Nothing, Nothing}\n\n\nFor the verbosity parameter see Verbosity\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.NeuralNetworkClassifier, Any, Any, Any}","page":"MLJ 
interface","title":"MLJModelInterface.fit","text":"MMI.fit(model::NeuralNetworkClassifier, verbosity, X, y)\n\nFor the verbosity parameter see Verbosity\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.fit-Tuple{BetaML.Bmlj.NeuralNetworkRegressor, Any, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.fit","text":"fit(\n m::BetaML.Bmlj.NeuralNetworkRegressor,\n verbosity,\n X,\n y\n) -> Tuple{NeuralNetworkEstimator, Nothing, Nothing}\n\n\nFor the verbosity parameter see Verbosity\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.predict-Tuple{Union{BetaML.Bmlj.KMeansClusterer, BetaML.Bmlj.KMedoidsClusterer}, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.predict","text":"predict(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, predict the class of each observation\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.transform-Tuple{BetaML.Bmlj.GeneralImputer, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.transform","text":"transform(m, fitResults, X)\n\nGiven a trained imputer model, fill the missing data of some new observations. Note that with multiple recursive imputations and inner estimators that don't support missing data, this function works only for the X on which the model has been trained, i.e. 
this function cannot be applied to new matrices with missing values using a model trained on other matrices.\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.transform-Tuple{Union{BetaML.Bmlj.GaussianMixtureImputer, BetaML.Bmlj.RandomForestImputer, BetaML.Bmlj.SimpleImputer}, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.transform","text":"transform(m, fitResults, X) - Given a trained imputer model, fill the missing data of some new observations\n\n\n\n\n\n","category":"method"},{"location":"MLJ_interface.html#MLJModelInterface.transform-Tuple{Union{BetaML.Bmlj.KMeansClusterer, BetaML.Bmlj.KMedoidsClusterer}, Any, Any}","page":"MLJ interface","title":"MLJModelInterface.transform","text":"transform(m::KMeansClusterer, fitResults, X) - Given a fitted clustering model and some observations, return the distances to each centroid \n\n\n\n\n\n","category":"method"},{"location":"Api_v2_developer.html#api_implementation","page":"API implementation","title":"Api v2 - developer documentation (API implementation)","text":"","category":"section"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Each model is a child of either BetaMLSupervisedModel or BetaMLUnsupervisedModel, both in turn children of BetaMLModel:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"BetaMLSupervisedModel <: BetaMLModel\nBetaMLUnsupervisedModel <: BetaMLModel\nRandomForestEstimator <: BetaMLSupervisedModel","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"The model struct is composed of the following elements:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel\n hpar::DecisionTreeE_hp # Hyper-parameters\n opt::BML_options # Option sets, 
default or a specific one for the model\n par::DT_lp # Model learnable parameters (needed for predictions)\n cres::T # Cached results\n trained::Bool # Trained flag\n info # Complementary information, but not needed to make predictions\nend","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Each specific model hyperparameter set and learnable parameter set are childs of BetaMLHyperParametersSet and BetaMLLearnedParametersSet and, if a specific model option set is used, this would be child of BetaMLOptionsSet.","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"While hyperparameters are elements that control the learning process, i.e. would influence the model training and prediction, the options have a more general meaning and do not directly affect the training (they can do indirectly, like the rng). The default option set is implemented as:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Base.@kwdef mutable struct BML_options\n \"Cache the results of the fitting stage, as to allow predict(mod) [default: `true`]. 
Set it to `false` to save memory for large data.\"\n cache::Bool = true\n \"An optional title and/or description for this model\"\n descr::String = \"\" \n \"The verbosity level to be used in training or prediction (see [`Verbosity`](@ref)) [deafult: `STD`]\n \"\n verbosity::Verbosity = STD\n \"Random Number Generator (see [`FIXEDSEED`](@ref)) [deafult: `Random.GLOBAL_RNG`]\n \"\n rng::AbstractRNG = Random.GLOBAL_RNG\nend","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"Note that the user doesn't generally need to make a difference between an hyperparameter and an option, as both are provided as keyword arguments to the model constructor thanks to a model constructor like the following one:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"function KMedoidsClusterer(;kwargs...)\n m = KMedoidsClusterer(KMeansMedoidsHyperParametersSet(),BML_options(),KMeansMedoids_lp(),nothing,false,Dict{Symbol,Any}())\n thisobjfields = fieldnames(nonmissingtype(typeof(m)))\n for (kw,kwv) in kwargs\n found = false\n for f in thisobjfields\n fobj = getproperty(m,f)\n if kw in fieldnames(typeof(fobj))\n setproperty!(fobj,kw,kwv)\n found = true\n end\n end\n found || error(\"Keyword \\\"$kw\\\" is not part of this model.\")\n end\n return m\nend","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"So, in order to implement a new model we need to:","category":"page"},{"location":"Api_v2_developer.html","page":"API implementation","title":"API implementation","text":"implement its struct and constructor\nimplement the relative ModelHyperParametersSet, ModelLearnedParametersSet and eventually ModelOptionsSet.\ndefine fit!(model, X, [y]), predict(model,X) and eventually 
inverse_predict(model,X).","category":"page"},{"location":"index.html#![BLogos](assets/BetaML_logo_30x30.png)-BetaML.jl-Documentation","page":"Index","title":"(Image: BLogos) BetaML.jl Documentation","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"Welcome to the documentation of the Beta Machine Learning toolkit.","category":"page"},{"location":"index.html#About","page":"Index","title":"About","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"The BetaML toolkit provides machine learning algorithms written in the Julia programming language.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Aside the algorithms themselves, BetaML provides many \"utility\" functions. Because algorithms are all self-contained in the library itself (you are invited to explore their source code by typing @edit functionOfInterest(par1,par2,...)), the utility functions have APIs that are coordinated with the algorithms, facilitating the \"preparation\" of the data for the analysis, the choice of the hyper-parameters or the evaluation of the models. Most models have an interface for the MLJ framework.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Aside Julia, BetaML can be accessed in R or Python using respectively JuliaCall and PyJulia. See the tutorial for details.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"!!! Warning Version 0.11 brings homogenization in the models' names and put some order on other stuff, but at the cost of severe breaking changes. Follow the updated documentation. 
","category":"page"},{"location":"index.html#Installation","page":"Index","title":"Installation","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"The BetaML package is included in the standard Julia register, install it with:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"] add BetaML","category":"page"},{"location":"index.html#Available-modules","page":"Index","title":"Available modules","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"While BetaML is split in several (sub)modules, all of them are re-exported at the root module level. This means that you can access their functionality by simply typing using BetaML:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"using BetaML\nmyLayer = DenseLayer(2,3) # DenseLayer is defined in the Nn submodule\nres = KernelPerceptronClassifier() # KernelPerceptronClassifier is defined in the Perceptron module\n@edit DenseLayer(2,3) # Open a text editor with to the relevant source code","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Each module is documented on the links below (you can also use the inline Julia help system: just press the question mark ? 
and then, on the special help prompt help?>, type the function name):","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"BetaML.Perceptron: The Perceptron, Kernel Perceptron and Pegasos classification algorithms;\nBetaML.Trees: The Decision Trees and Random Forests algorithms for classification or regression (with missing values supported);\nBetaML.Nn: Implementation of Artificial Neural Networks;\nBetaML.Clustering: (hard) Clustering algorithms (K-Means, K-Mdedoids)\nBetaML.GMM: Various algorithms (Clustering, regressor, missing imputation / collaborative filtering / recommandation systems) that use a Generative (Gaussian) mixture models (probabilistic) fitter, fitted using a EM algorithm;\nBetaML.Imputation: Imputation algorithms;\nBetaML.Utils: Various utility functions (scale, one-hot, distances, kernels, pca, accuracy/error measures..).","category":"page"},{"location":"index.html#models_list","page":"Index","title":"Available models","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"Currently BetaML provides the following models:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"BetaML name MLJ Interface Category*\nPerceptronClassifier PerceptronClassifier Supervised classifier\nKernelPerceptronClassifier KernelPerceptronClassifier Supervised classifier\nPegasosClassifier PegasosClassifier Supervised classifier\nDecisionTreeEstimator DecisionTreeClassifier, DecisionTreeRegressor Supervised regressor and classifier\nRandomForestEstimator RandomForestClassifier, RandomForestRegressor Supervised regressor and classifier\nNeuralNetworkEstimator NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier Supervised regressor and classifier\nGaussianMixtureRegressor GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor Supervised regressor\nGaussianMixtureRegressor2 Supervised regressor\nKMeansClusterer KMeansClusterer 
Unsupervised hard clusterer\nKMedoidsClusterer KMedoidsClusterer Unsupervised hard clusterer\nGaussianMixtureClusterer GaussianMixtureClusterer Unsupervised soft clusterer\nSimpleImputer SimpleImputer Unsupervised missing data imputer\nGaussianMixtureImputer GaussianMixtureImputer Unsupervised missing data imputer\nRandomForestImputer RandomForestImputer Unsupervised missing data imputer\nGeneralImputer GeneralImputer Unsupervised missing data imputer\nMinMaxScaler Data transformer\nStandardScaler Data transformer\nScaler Data transformer\nPCAEncoder Unsupervised dimensionality reduction\nAutoEncoder AutoEncoder Unsupervised non-linear dimensionality reduction\nOneHotEncoder Data transformer\nOrdinalEncoder Data transformer\nConfusionMatrix Predictions assessment","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"* There is no formal distinction in BetaML between a transformer, or a model to assess predictions, and an unsupervised model. They are all treated as unsupervised models that, given some data, learn how to return some useful information, whether a class grouping, a specific transformation or a quality evaluation.","category":"page"},{"location":"index.html#Usage","page":"Index","title":"Usage","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"New to BetaML or even to Julia / Machine Learning altogether? Start from the tutorial!","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"All models support the (a) model construction (where hyperparameters and options are chosen), (b) fitting and (c) prediction paradigm. A few models support inverse_predict, for example to go back from the one-hot encoded columns to the original categorical variable (factor). 
","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"This paradigm is described in detail in the API V2 page.","category":"page"},{"location":"index.html#Quick-examples","page":"Index","title":"Quick examples","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"(see the tutorial for a more step-by-step guide to the examples below and to other examples)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Using an Artificial Neural Network for multinomial categorisation","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"In this example we see how to train a neural networks model to predict the specie's name (5th column) given floral sepals and petals measures (first 4 columns) in the famous iris flower dataset.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"# Load Modules\nusing DelimitedFiles, Random\nusing Pipe, Plots, BetaML # Load BetaML and other auxiliary modules\nRandom.seed!(123); # Fix the random seed (to obtain reproducible results).\n\n# Load the data\niris = readdlm(joinpath(dirname(Base.find_package(\"BetaML\")),\"..\",\"test\",\"data\",\"iris.csv\"),',',skipstart=1)\nx = convert(Array{Float64,2}, iris[:,1:4])\ny = convert(Array{String,1}, iris[:,5])\n# Encode the categories (levels) of y using a separate column per each category (aka \"one-hot\" encoding) \nohmod = OneHotEncoder()\ny_oh = fit!(ohmod,y) \n# Split the data in training/testing sets\n((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,0.2])\n(ntrain, ntest) = size.([xtrain,xtest],1)\n\n# Define the Artificial Neural Network model\nl1 = DenseLayer(4,10,f=relu) # The activation function is `ReLU`\nl2 = DenseLayer(10,3) # The activation function is `identity` by default\nl3 = VectorFunctionLayer(3,f=softmax) # Add a (parameterless include(\"Imputation_tests.jl\")) layer whose activation 
function (`softmax` in this case) is defined to all its nodes at once\nmynn = NeuralNetworkEstimator(layers=[l1,l2,l3],loss=crossentropy,descr=\"Multinomial logistic regression Model Sepal\", batch_size=2, epochs=200) # Build the NN and use the cross-entropy as error function.\n# Alternatively, swith to hyperparameters auto-tuning with `autotune=true` instead of specify `batch_size` and `epoch` manually\n\n# Train the model (using the ADAM optimizer by default)\nres = fit!(mynn,fit!(Scaler(),xtrain),ytrain_oh) # Fit the model to the (scaled) data\n\n# Obtain predictions and test them against the ground true observations\nŷtrain = @pipe predict(mynn,fit!(Scaler(),xtrain)) |> inverse_predict(ohmod,_) # Note the scaling and reverse one-hot encoding functions\nŷtest = @pipe predict(mynn,fit!(Scaler(),xtest)) |> inverse_predict(ohmod,_) \ntrain_accuracy = accuracy(ŷtrain,ytrain) # 0.975\ntest_accuracy = accuracy(ŷtest,ytest) # 0.96\n\n# Analyse model performances\ncm = ConfusionMatrix()\nfit!(cm,ytest,ŷtest)\nprint(cm)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"A ConfusionMatrix BetaMLModel (fitted)\n\n-----------------------------------------------------------------\n\n*** CONFUSION MATRIX ***\n\nScores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"virginica\" \"versicolor\" \"setosa\"\n \"virginica\" 8 1 0\n \"versicolor\" 0 14 0\n \"setosa\" 0 0 7\nNormalised scores actual (rows) vs predicted (columns):\n\n4×4 Matrix{Any}:\n \"Labels\" \"virginica\" \"versicolor\" \"setosa\"\n \"virginica\" 0.888889 0.111111 0.0\n \"versicolor\" 0.0 1.0 0.0\n \"setosa\" 0.0 0.0 1.0\n\n *** CONFUSION REPORT ***\n\n- Accuracy: 0.9666666666666667\n- Misclassification rate: 0.033333333333333326\n- Number of classes: 3\n\n N Class precision recall specificity f1score actual_count predicted_count\n TPR TNR support \n\n 1 virginica 1.000 0.889 1.000 0.941 9 8\n 2 versicolor 0.933 1.000 0.938 0.966 14 15\n 3 setosa 1.000 
1.000 1.000 1.000 7 7\n\n- Simple avg. 0.978 0.963 0.979 0.969\n- Weigthed avg. 0.969 0.967 0.971 0.966","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"ϵ = info(mynn)[\"loss_per_epoch\"]\nplot(1:length(ϵ),ϵ, xlabel=\"epochs\",ylabel=\"error\",legend=nothing,title=\"Avg. error per epoch on the Sepal dataset\")\nheatmap(info(cm)[\"categories\"],info(cm)[\"categories\"],info(cm)[\"normalised_scores\"],c=cgrad([:white,:blue]),xlabel=\"Predicted\",ylabel=\"Actual\", title=\"Confusion Matrix\")","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(Image: results) (Image: results)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Using Random forests for regression","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"In this example we predict, using another classical ML dataset, the miles per gallon of various car models.","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Note in particular:","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(a) how easy it is in Julia to import remote data, even cleaning them without ever saving a local file on disk;\n(b) how Random Forest models can directly work on data with missing values, categorical values and non-numerical values in general, without any preprocessing ","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"# Load modules\nusing Random, HTTP, CSV, DataFrames, BetaML, Plots\nimport Pipe: @pipe\nRandom.seed!(123)\n\n# Load data\nurlData = \"https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data\"\ndata = @pipe HTTP.get(urlData).body |>\n replace!(_, UInt8('\\t') => UInt8(' ')) |>\n CSV.File(_, delim=' ', missingstring=\"?\", ignorerepeated=true, header=false) |>\n DataFrame;\n\n# Preprocess data\nX = Matrix(data[:,2:8]) # cylinders, displacement, horsepower, weight, 
acceleration, model year, origin\ny = data[:,1] # miles per gallon\n(xtrain,xtest),(ytrain,ytest) = partition([X,y],[0.8,0.2])\n\n# Model definition, hyper-parameters auto-tuning, training and prediction\nm = RandomForestEstimator(autotune=true)\nŷtrain = fit!(m,xtrain,ytrain) # shortcut for `fit!(m,xtrain,ytrain); ŷtrain = predict(m,xtrain)`\nŷtest = predict(m,xtest)\n\n# Prediction assessment\nrelative_mean_error_train = relative_mean_error(ytrain,ŷtrain) # 0.039\nrelative_mean_error_test = relative_mean_error(ytest,ŷtest) # 0.076\nscatter(ytest,ŷtest,xlabel=\"Actual\",ylabel=\"Estimated\",label=nothing,title=\"Est vs. obs MPG (test set)\")","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(Image: results)","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Further examples","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"Finally, you may want to have a look at the \"test\" folder. 
While the primary objective of the scripts under the \"test\" folder is to provide automatic testing of the BetaML toolkit, they can also be used to see how functions should be called, as virtually all functions provided by BetaML are tested there.","category":"page"},{"location":"index.html#Acknowledgements","page":"Index","title":"Acknowledgements","text":"","category":"section"},{"location":"index.html","page":"Index","title":"Index","text":"The development of this package at the Bureau d'Economie Théorique et Appliquée (BETA, Nancy) was supported by the French National Research Agency through the Laboratory of Excellence ARBRE, a part of the “Investissements d'Avenir” Program (ANR 11 – LABX-0002-01).","category":"page"},{"location":"index.html","page":"Index","title":"Index","text":"(Image: BLogos)","category":"page"}] } diff --git a/dev/tutorials/Betaml_tutorial_getting_started.html b/dev/tutorials/Betaml_tutorial_getting_started.html index 0b2c086..eee5d93 100644 --- a/dev/tutorials/Betaml_tutorial_getting_started.html +++ b/dev/tutorials/Betaml_tutorial_getting_started.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

Getting started

Introduction

This "tutorial" part of the documentation presents a step-by-step guide to the main algorithms and utility functions provided by BetaML and comparisons with the leading packages in each field. Aside from this page, the tutorial is divided into the following sections:

  • Classification tutorial - Topics: Decision trees and random forests, neural networks (softmax), dealing with stochasticity, loading data from internet
  • Regression tutorial - Topics: Decision trees, Random forests, neural networks, hyper-parameters autotuning, one-hot encoding, continuous error measures
  • Clustering tutorial - Topics: k-means, kmedoids, generative (gaussian) mixture models (gmm), cross-validation, ordinal encoding

Detailed usage instructions on each algorithm can be found on each model struct (listed here), while theoretical notes describing most of them can be found at the companion repository https://github.com/sylvaticus/MITx_6.86x.

The overall "philosophy" of BetaML is to support simple machine learning tasks easily and make complex tasks possible. At the most basic level, the majority of algorithms have default parameters suitable for a basic analysis. A great level of flexibility can already be achieved by just employing the full set of model parameters, for example changing the distance function in KMedoidsClusterer to l1_distance (aka "Manhattan distance"). Finally, the greatest flexibility can be obtained by customising BetaML and writing, for example, your own neural network layer type (by subclassing AbstractLayer), your own sampler (by subclassing AbstractDataSampler) or your own mixture component (by subclassing AbstractMixture). In such cases, while not required by any means, please consider giving it back to the community and opening a pull request to integrate your work in BetaML.

If you are looking for an introductory book on Julia, you could consider "Julia Quick Syntax Reference" (Apress, 2019) or the online course "Introduction to Scientific Programming and Machine Learning with Julia".

A few conventions applied across the library:

  • Type names use the so-called "CamelCase" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words possibly separated (but only when really needed for readability) by an _;
  • While some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns. Almost everywhere in the code and documentation we denote by N the number of observations/records, by D the number of dimensions and by K the number of classes/categories;
  • While some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a dataframe, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies);
  • The accuracy/error/loss measures expect the ground truth y first and then the estimated ŷ (in this order).

Using BetaML from other programming languages

In this section we provide two examples of using BetaML directly in Python or R (with automatic object conversion). Click Details for a more extended explanation of these examples. While I have no experience with them, the same approach can be used to access BetaML from any language with a binding to Julia, like Matlab or Javascript.

Use BetaML in Python

$ python3 -m pip install --user juliacall
>>> from juliacall import Main as jl
+

Getting started

Introduction

This "tutorial" part of the documentation presents a step-by-step guide to the main algorithms and utility functions provided by BetaML and comparisons with the leading packages in each field. Aside from this page, the tutorial is divided into the following sections:

  • Classification tutorial - Topics: Decision trees and random forests, neural networks (softmax), dealing with stochasticity, loading data from internet
  • Regression tutorial - Topics: Decision trees, Random forests, neural networks, hyper-parameters autotuning, one-hot encoding, continuous error measures
  • Clustering tutorial - Topics: k-means, kmedoids, generative (gaussian) mixture models (gmm), cross-validation, ordinal encoding

Detailed usage instructions on each algorithm can be found on each model struct (listed here), while theoretical notes describing most of them can be found at the companion repository https://github.com/sylvaticus/MITx_6.86x.

The overall "philosophy" of BetaML is to support simple machine learning tasks easily and make complex tasks possible. At the most basic level, the majority of algorithms have default parameters suitable for a basic analysis. A great level of flexibility can already be achieved by just employing the full set of model parameters, for example changing the distance function in KMedoidsClusterer to l1_distance (aka "Manhattan distance"). Finally, the greatest flexibility can be obtained by customising BetaML and writing, for example, your own neural network layer type (by subclassing AbstractLayer), your own sampler (by subclassing AbstractDataSampler) or your own mixture component (by subclassing AbstractMixture). In such cases, while not required by any means, please consider giving it back to the community and opening a pull request to integrate your work in BetaML.

If you are looking for an introductory book on Julia, you could consider "Julia Quick Syntax Reference" (Apress, 2019) or the online course "Introduction to Scientific Programming and Machine Learning with Julia".

A few conventions applied across the library:

  • Type names use the so-called "CamelCase" convention, where the words are separated by a capital letter rather than _, while function names use lowercase letters only, with words possibly separated (but only when really needed for readability) by an _;
  • While some functions provide a dims parameter, most BetaML algorithms expect the input data layout with observations organised by rows and fields/features by columns. Almost everywhere in the code and documentation we denote by N the number of observations/records, by D the number of dimensions and by K the number of classes/categories;
  • While some algorithms accept DataFrames as input, the usage of standard arrays is encouraged (if the data is passed to the function as a dataframe, it may be converted to standard arrays somewhere inside inner loops, leading to great inefficiencies);
  • The accuracy/error/loss measures expect the ground truth y first and then the estimated ŷ (in this order).
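
The layout convention above can be illustrated with a minimal sketch that uses only base Julia. The toy matrix and the variable names (X, first_record, second_feature) are made up for illustration and are not part of BetaML:

```julia
# Hypothetical toy data: N = 4 observations (rows), D = 3 features (columns)
X = [ 1.0  2.0  3.0;
      4.0  5.0  6.0;
      7.0  8.0  9.0;
     10.0 11.0 12.0]

N, D = size(X)             # number of observations, number of dimensions

first_record   = X[1, :]   # all the features of the first observation
second_feature = X[:, 2]   # the second feature across all observations
```

When the data starts as a DataFrame, converting it once up front (for example with `Matrix(df)`, as the random forest example on the index page does) keeps the conversion out of any inner loop.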

Using BetaML from other programming languages

In this section we provide two examples of using BetaML directly in Python or R (with automatic object conversion). Click Details for a more extended explanation of these examples. While I have no experience with them, the same approach can be used to access BetaML from any language with a binding to Julia, like Matlab or Javascript.

Use BetaML in Python

$ python3 -m pip install --user juliacall
>>> from juliacall import Main as jl
 >>> import numpy as np
 >>> from sklearn import datasets
 >>> jl.seval('using Pkg; Pkg.add("BetaML")') # Only once 
@@ -86,4 +86,4 @@
 ')

We can then call the above function in R in one of the following three ways:

  1. kMeansR(Xs,3,ys)
  2. julia_assign("Xs_julia", Xs); julia_assign("ys_julia", ys); julia_eval("accFromKmeans(Xs_julia,3,ys_julia)")
  3. julia_call("accFromKmeans",Xs,3,ys)

While other "convenience" functions are provided by the package, using julia_call, or julia_assign followed by julia_eval, should suffice to use BetaML from R. If you run into problems using BetaML from R, open an issue specifying your set-up.

Dealing with stochasticity and reproducibility

Machine Learning workflows include stochastic components in several steps: in the data sampling, in the model initialisation and often in the model's own algorithms (and sometimes also in the prediction step). All BetaML models with stochastic components support a rng parameter, standing for Random Number Generator. A RNG is a "machine" that streams a flow of random numbers. The flow itself, however, is deterministically determined for each "seed" (an integer number) that the RNG has been told to use. Normally this seed changes at each run of the script/model, so that stochastic models are indeed stochastic and their output differs at each run.

If we want to obtain reproducible results we can fix the seed at the very beginning of our model with Random.seed!([AnInteger]). Now our model or script will pick up a specific flow of random numbers, but this flow will always be the same, so that its results will always be the same.

However, the default Julia RNG guarantees to provide the same flow of random numbers, conditional on the seed, only within minor versions of Julia. If we want to "guarantee" reproducibility of the results with different versions of Julia, or "fix" only some parts of our script, we can call the individual functions passing FIXEDRNG, an instance of StableRNG(FIXEDSEED) provided by BetaML, to the rng parameter. Use it with:

  • MyModel(;rng=FIXEDRNG) : always produces the same sequence of results on each run of the script ("pulling" from the same rng object on different calls)
  • MyModel(;rng=StableRNG(SOMEINTEGER)) : always produces the same result (new identical rng object on each call)

This is very convenient, especially during model development, as a model that uses (...,rng=StableRNG(an_integer)) will provide stochastic results that are isolated (i.e. they don't depend on the consumption of the random stream by other parts of the model).

In particular, use rng=StableRNG(FIXEDSEED) or rng=copy(FIXEDRNG) with FIXEDSEED to retrieve the exact output as in the documentation or in the unit tests.
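
The difference between sharing one rng object across calls and creating a new, identically seeded one on each call can be sketched with Julia's standard library alone. Note the assumption: MersenneTwister is used here only as a stand-in that needs no extra package. BetaML's FIXEDRNG is a StableRNG, which plays the same role but keeps its stream stable across Julia versions.

```julia
using Random

# One shared RNG object: each call consumes part of the stream, so successive
# draws differ, but the whole sequence repeats on every run of the script
shared_rng = MersenneTwister(123)
a = rand(shared_rng, 3)
b = rand(shared_rng, 3)

# A new RNG with the same seed on each call: every call sees the same stream
c = rand(MersenneTwister(123), 3)
d = rand(MersenneTwister(123), 3)

# copy() duplicates the current state without consuming the original
e = rand(copy(shared_rng), 3)
f = rand(copy(shared_rng), 3)

println("a == b: ", a == b)
println("c == d: ", c == d)
println("e == f: ", e == f)
```

Passing a fresh or copied rng to each model keeps its results independent of how much of the random stream other parts of the script have already consumed.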

Most of the stochasticity appears in training a model. However, in a few cases (e.g. decision trees with missing values) some stochasticity also appears when predicting new data using a trained model. In such cases the model doesn't restrict the random seed, so that you can choose at predict time to use a fixed or a variable random seed.

Finally, if you plan to use multiple threads and want to provide the same stochastic output independent of the number of threads used, have a look at generate_parallel_rngs.

"Reproducible stochasticity" is only one of the elements needed for a reproducible output. The other two are (a) the inputs the workflow uses and (b) the code that is evaluated. Concerning the second point, Julia has a very modern package system that guarantees reproducible code evaluation (with a few exceptions linked to using external libraries, but BetaML models are all implemented in Julia itself). Without going into detail, you can use a pattern like this at the beginning of your machine learning workflows:

using Pkg  
 cd(@__DIR__)            
 Pkg.activate(".")  # Activate a "local" environment, specific to this folder
-Pkg.instantiate()  # Download and install the required packages if not already available 

This will tell Julia to load the exact version of dependent packages, and recursively of their dependencies, from a Manifest.toml file that is automatically created in the script's folder, and automatically updated, when you add or update a package in your workflow. Note that these local "environments" are very "cheap" (packages are not actually copied to each environment on your system, only referenced) and the environment doesn't need to be in the same folder as the script, as in this example: it can be any folder you want to "activate".

Saving and loading trained models

Trained models can be saved on disk using the model_save function, and retrieved with model_load. The advantage over the serialization functionality in Julia core is that the two functions are actually wrappers around equivalent JLD2 package functions, and should maintain compatibility across different Julia versions.

+Pkg.instantiate() # Download and install the required packages if not already available

This will tell Julia to load the exact version of dependent packages, and recursively of their dependencies, from a Manifest.toml file that is automatically created in the script's folder, and automatically updated, when you add or update a package in your workflow. Note that these local "environments" are very "cheap" (packages are not actually copied to each environment on your system, only referenced) and the environment doesn't need to be in the same folder as the script, as in this example: it can be any folder you want to "activate".

Saving and loading trained models

Trained models can be saved on disk using the model_save function, and retrieved with model_load. The advantage over the serialization functionality in Julia core is that the two functions are actually wrappers around equivalent JLD2 package functions, and should maintain compatibility across different Julia versions.

diff --git a/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-08cd8a42.svg b/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-5d3675ed.svg similarity index 90% rename from dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-08cd8a42.svg rename to dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-5d3675ed.svg index 6940ee8..5a28787 100644 --- a/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-08cd8a42.svg +++ b/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-5d3675ed.svg @@ -1,37 +1,37 @@ - + - + - + - + - + - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + - + - + - + diff --git a/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-bf76b088.svg b/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-cc4fd638.svg similarity index 90% rename from dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-bf76b088.svg rename to dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-cc4fd638.svg index f56112b..95d05ea 100644 --- a/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-bf76b088.svg +++ b/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars-cc4fd638.svg @@ -1,37 +1,37 @@ - + - + - + - + - + - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + - + - + - + diff --git a/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars.html b/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars.html index 9fbdeb7..7ea2aca 100644 --- a/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars.html +++ b/dev/tutorials/Classification - cars/betaml_tutorial_classification_cars.html @@ -3,7 +3,7 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

A classification task when labels are known - determining the country of origin of cars given the cars' characteristics

In this exercise we are given several technical characteristics (mpg, horsepower, weight, model year...) for a number of car models, together with the country of origin of each model, and we would like to build a machine learning model that accurately predicts the country of origin from the technical characteristics. As the variable to predict is a multi-class one, this is a [classification](https://en.wikipedia.org/wiki/Statistical_classification) task. It is a challenging exercise because three factors occur together: (1) missing data; (2) unbalanced data - 254 out of 406 cars are US made; (3) a small dataset.

Data origin:

Field description:

  1. mpg: continuous
  2. cylinders: multi-valued discrete
  3. displacement: continuous
  4. horsepower: continuous
  5. weight: continuous
  6. acceleration: continuous
  7. model year: multi-valued discrete
  8. origin: multi-valued discrete
  9. car name: string (unique for each instance)

The car name is not used in this tutorial, so that the country is inferred only from technical data. As this field also includes the car maker, and there are several car models from the same maker, a more sophisticated machine learning model could exploit this information, e.g. using a bag-of-words encoding.

Library loading and initialisation

Activating the local environment specific to BetaML documentation

using Pkg
+

A classification task when labels are known - determining the country of origin of cars given the cars' characteristics

In this exercise we are given several technical characteristics (mpg, horsepower, weight, model year...) for a number of car models, together with the country of origin of each model, and we would like to build a machine learning model that accurately predicts the country of origin from the technical characteristics. As the variable to predict is a multi-class one, this is a [classification](https://en.wikipedia.org/wiki/Statistical_classification) task. It is a challenging exercise because three factors occur together: (1) missing data; (2) unbalanced data - 254 out of 406 cars are US made; (3) a small dataset.

Data origin:

Field description:

  1. mpg: continuous
  2. cylinders: multi-valued discrete
  3. displacement: continuous
  4. horsepower: continuous
  5. weight: continuous
  6. acceleration: continuous
  7. model year: multi-valued discrete
  8. origin: multi-valued discrete
  9. car name: string (unique for each instance)

The car name is not used in this tutorial, so that the country is inferred only from technical data. As this field also includes the car maker, and there are several car models from the same maker, a more sophisticated machine learning model could exploit this information, e.g. using a bag-of-words encoding.

Library loading and initialisation

Activating the local environment specific to BetaML documentation

using Pkg
 Pkg.activate(joinpath(@__DIR__,"..","..",".."))
  Activating environment at `~/work/BetaML.jl/BetaML.jl/docs/Project.toml`

We load a bunch of packages that we'll use during this tutorial.

using Random, HTTP, Plots, CSV, DataFrames, BenchmarkTools, StableRNGs, BetaML
 import DecisionTree, Flux
 import Pipe: @pipe

Machine Learning workflows include stochastic components in several steps: in the data sampling, in the model initialisation and often in the model's own algorithms (and sometimes also in the prediction step). To simplify reproducibility, BetaML provides a random number generator (RNG), FIXEDRNG. This is nothing more than an instance of StableRNG(123) defined in the BetaML.Utils sub-module, but you can of course choose your own "fixed" RNG. See the Dealing with stochasticity section in the Getting started tutorial for details.
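
As a minimal sketch of why the tutorial passes copy(AFIXEDRNG) rather than the RNG itself, the snippet below uses only the standard library. The name fixed_rng and the use of MersenneTwister are assumptions made for illustration; the tutorial itself relies on StableRNG from StableRNGs.jl.

```julia
using Random

# Hypothetical stand-in for the tutorial's fixed RNG (the tutorial uses StableRNG(123)).
fixed_rng = MersenneTwister(123)

# Passing a copy leaves `fixed_rng` untouched, so each call starts from the same state.
a = rand(copy(fixed_rng), 3)
b = rand(copy(fixed_rng), 3)

# Passing the RNG itself advances its state, so consecutive draws differ.
c = rand(fixed_rng, 3)
d = rand(fixed_rng, 3)

println("copies give identical draws: ", a == b)
println("shared RNG gives identical draws: ", c == d)
```

With copies, each step of a workflow can be rerun in isolation and still see the same random stream, which is what makes the individual tutorial cells reproducible.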

Here we are explicit and we use our own fixed RNG:

seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000
@@ -13,7 +13,7 @@
              CSV.File(_, delim=' ', missingstring="NA", ignorerepeated=true, header=false) |>
              DataFrame;

This results in a table where the rows are the observations (the various car models) and the columns are the fields. All BetaML models expect this layout.

As the dataset is ordered, we randomly shuffle the data.

idx = randperm(copy(AFIXEDRNG),size(data,1))
 data[idx, :]
-describe(data)
9×7 DataFrame
 Row │ variable  mean     min                      median  max               nmissing  eltype
     │ Symbol    Union…   Any                      Union…  Any               Int64     Type
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Column1   23.5146  9.0                      23.0    46.6                     8  Union{Missing, Float64}
   2 │ Column2   5.47537  3.0                      4.0     8.0                      0  Float64
   3 │ Column3   194.78   68.0                     151.0   455.0                    0  Float64
   4 │ Column4   105.082  46.0                     95.0    230.0                    6  Union{Missing, Float64}
   5 │ Column5   2979.41  1613.0                   2822.5  5140.0                   0  Float64
   6 │ Column6   15.5197  8.0                      15.5    24.8                     0  Float64
   7 │ Column7   75.9212  70.0                     76.0    82.0                     0  Float64
   8 │ Column8   1.56897  1.0                      1.0     3.0                      0  Float64
   9 │ Column9            amc ambassador brougham          vw rabbit custom         0  String

Columns 1 to 7 contain characteristics of the car, while column 8 encodes the country of origin ("1" -> US, "2" -> EU, "3" -> Japan). That's the variable we want to be able to predict.

Column 9 contains the car name, but we are not going to use this information in this tutorial. Note also that some fields have missing data.

Our first step is hence to divide the dataset into the features (the x) and the labels (the y) we want to predict. The x is then a standard Julia Matrix of 406 rows by 7 columns and the y is a vector of 406 observations:

x     = Matrix{Union{Missing,Float64}}(data[:,1:7]);
+describe(data)

9 rows × 7 columns

     │ variable  mean     min                      median  max               nmissing  eltype
     │ Symbol    Union…   Any                      Union…  Any               Int64     Type
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Column1   23.5146  9.0                      23.0    46.6                     8  Union{Missing, Float64}
   2 │ Column2   5.47537  3.0                      4.0     8.0                      0  Float64
   3 │ Column3   194.78   68.0                     151.0   455.0                    0  Float64
   4 │ Column4   105.082  46.0                     95.0    230.0                    6  Union{Missing, Float64}
   5 │ Column5   2979.41  1613.0                   2822.5  5140.0                   0  Float64
   6 │ Column6   15.5197  8.0                      15.5    24.8                     0  Float64
   7 │ Column7   75.9212  70.0                     76.0    82.0                     0  Float64
   8 │ Column8   1.56897  1.0                      1.0     3.0                      0  Float64
   9 │ Column9            amc ambassador brougham          vw rabbit custom         0  String

Columns 1 to 7 contain characteristics of the car, while column 8 encodes the country of origin ("1" -> US, "2" -> EU, "3" -> Japan). That's the variable we want to be able to predict.

Column 9 contains the car name, but we are not going to use this information in this tutorial. Note also that some fields have missing data.

Our first step is hence to divide the dataset into the features (the x) and the labels (the y) we want to predict. The x is then a standard Julia Matrix of 406 rows by 7 columns and the y is a vector of 406 observations:

x     = Matrix{Union{Missing,Float64}}(data[:,1:7]);
 y     = Vector{Int64}(data[:,8]);
 x     = fit!(Scaler(),x)
406×7 Matrix{Union{Missing, Float64}}:
  -0.706439   1.47635    1.07088    0.643526   0.620107   -1.25708   -1.58146
@@ -75,7 +75,7 @@
  0  1  0
  1  0  0
  1  0  0
- 1  0  0

In supervised machine learning it is good practice to partition the available data into training, validation, and test subsets, where the first one is used to train the ML algorithm, the second one to tune any "hyper-parameters" of the algorithm, and the test subset is finally used to evaluate the quality of the algorithm. Here, for brevity, we use only the train and the test subsets, implicitly assuming we already know the best hyper-parameters. Please refer to the regression tutorial for examples of the auto-tune feature of BetaML models to "automatically" tune the hyper-parameters (hint: in most cases just add the parameter autotune=true in the model constructor), or the clustering tutorial for an example of using the cross_validation function to do it manually.

We then use the partition function in BetaML.Utils, where we can specify the different data to partition (each matrix or vector to partition must have the same number of observations) and the shares of observations that we want in each subset. Here we keep 80% of the observations for training (xtrain and ytrain) and we use the remaining 20% for testing (xtest and ytest):

((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,1-0.8],rng=copy(AFIXEDRNG));

We finally set up a dataframe to store the accuracies of the various models we'll use.

results = DataFrame(model=String[],train_acc=Float64[],test_acc=Float64[])
0×3 DataFrame
 Row │ model   train_acc  test_acc
     │ String  Float64    Float64

Random Forests

We are now ready to use our first model, the RandomForestEstimator. Random Forests build a "forest" of decision tree models and then average their predictions in order to make an overall prediction, whether for a regression or a classification.

While here the missing data has been imputed and the dataset is comprised of only numerical values, one attractive feature of the BetaML RandomForestEstimator is that it can work directly with missing and categorical data without any prior processing.

However, as the labels are encoded using integers, we also need to specify the parameter force_classification=true, otherwise the model would perform a regression instead.

rfm      = RandomForestEstimator(force_classification=true, rng=copy(AFIXEDRNG))
RandomForestEstimator - A 30 trees Random Forest model (unfitted)

Unlike the RandomForestImputer and OneHotEncoder models used earlier, to train a RandomForestEstimator model we need to provide it with both the training feature matrix and the associated "true" training labels. We use the same shortcut to get the training predictions directly from the fit! function. In this case the predictions correspond to the labels:

ŷtrain   = fit!(rfm,xtrain,ytrain)
325-element Vector{Dict{Int64, Float64}}:
+ 1  0  0

In supervised machine learning it is good practice to partition the available data into training, validation, and test subsets, where the first one is used to train the ML algorithm, the second one to tune any "hyper-parameters" of the algorithm, and the test subset is finally used to evaluate the quality of the algorithm. Here, for brevity, we use only the train and the test subsets, implicitly assuming we already know the best hyper-parameters. Please refer to the regression tutorial for examples of the auto-tune feature of BetaML models to "automatically" tune the hyper-parameters (hint: in most cases just add the parameter autotune=true in the model constructor), or the clustering tutorial for an example of using the cross_validation function to do it manually.

We then use the partition function in BetaML.Utils, where we can specify the different data to partition (each matrix or vector to partition must have the same number of observations) and the shares of observations that we want in each subset. Here we keep 80% of the observations for training (xtrain and ytrain) and we use the remaining 20% for testing (xtest and ytest):

((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,1-0.8],rng=copy(AFIXEDRNG));

We finally set up a dataframe to store the accuracies of the various models we'll use.

results = DataFrame(model=String[],train_acc=Float64[],test_acc=Float64[])

0 rows × 3 columns

     │ model   train_acc  test_acc
     │ String  Float64    Float64

Random Forests

We are now ready to use our first model, the RandomForestEstimator. Random Forests build a "forest" of decision tree models and then average their predictions in order to make an overall prediction, whether for a regression or a classification.

While here the missing data has been imputed and the dataset is comprised of only numerical values, one attractive feature of the BetaML RandomForestEstimator is that it can work directly with missing and categorical data without any prior processing.

However, as the labels are encoded using integers, we also need to specify the parameter force_classification=true, otherwise the model would perform a regression instead.

rfm      = RandomForestEstimator(force_classification=true, rng=copy(AFIXEDRNG))
RandomForestEstimator - A 30 trees Random Forest model (unfitted)

Unlike the RandomForestImputer and OneHotEncoder models used earlier, to train a RandomForestEstimator model we need to provide it with both the training feature matrix and the associated "true" training labels. We use the same shortcut to get the training predictions directly from the fit! function. In this case the predictions correspond to the labels:

ŷtrain   = fit!(rfm,xtrain,ytrain)
325-element Vector{Dict{Int64, Float64}}:
  Dict(2 => 0.06666666666666667, 3 => 0.8666666666666666, 1 => 0.06666666666666667)
  Dict(1 => 0.9999999999999999)
  Dict(2 => 0.9999999999999999)
@@ -117,12 +117,12 @@
  3
  3

Why does mode (optionally) take an RNG? I leave the answer to you :-)

To obtain the predicted labels for the test set we simply run the predict function over the features of the test set:

ŷtest   = predict(rfm,xtest)
81-element Vector{Dict{Int64, Float64}}:
  Dict(2 => 0.6, 3 => 0.13333333333333333, 1 => 0.26666666666666666)
- Dict(2 => 0.6, 3 => 0.03333333333333333, 1 => 0.36666666666666664)
- Dict(2 => 0.6833333333333332, 3 => 0.1, 1 => 0.21666666666666665)
+ Dict(2 => 0.6333333333333333, 3 => 0.03333333333333333, 1 => 0.3333333333333333)
+ Dict(2 => 0.6499999999999999, 3 => 0.1, 1 => 0.24999999999999997)
  Dict(1 => 0.9999999999999999)
  Dict(2 => 0.1, 3 => 0.3333333333333333, 1 => 0.5666666666666667)
  Dict(3 => 0.03333333333333333, 1 => 0.9666666666666666)
- Dict(2 => 0.24999999999999997, 3 => 0.7499999999999999)
+ Dict(2 => 0.2833333333333333, 3 => 0.7166666666666666)
  Dict(2 => 0.47222222222222215, 3 => 0.13333333333333333, 1 => 0.3944444444444444)
  Dict(2 => 0.03333333333333333, 3 => 0.03333333333333333, 1 => 0.9333333333333332)
  Dict(1 => 0.9999999999999999)
@@ -132,9 +132,9 @@
  Dict(2 => 0.21666666666666665, 3 => 0.7166666666666666, 1 => 0.06666666666666667)
  Dict(1 => 0.9999999999999999)
  Dict(1 => 0.9999999999999999)
- Dict(2 => 0.18333333333333332, 3 => 0.49999999999999994, 1 => 0.31666666666666665)
- Dict(2 => 0.08333333333333334, 3 => 0.8999999999999999, 1 => 0.016666666666666666)
- Dict(2 => 0.18333333333333332, 3 => 0.5666666666666667, 1 => 0.24999999999999997)
+ Dict(2 => 0.18333333333333332, 3 => 0.5333333333333333, 1 => 0.2833333333333333)
+ Dict(2 => 0.05, 3 => 0.9333333333333332, 1 => 0.016666666666666666)
+ Dict(2 => 0.21666666666666665, 3 => 0.5333333333333333, 1 => 0.24999999999999997)
  Dict(1 => 0.9999999999999999)

Finally we can measure the accuracy of our predictions with the accuracy function. We don't need to explicitly use mode, as accuracy applies it itself when the predictions are passed as dictionaries:

trainAccuracy,testAccuracy  = accuracy.([ytrain,ytest],[ŷtrain,ŷtest],rng=copy(AFIXEDRNG))
2-element Vector{Float64}:
  1.0
  0.7283950617283951
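
To make the step from probabilistic predictions to an accuracy figure concrete, here is a minimal plain-Julia sketch. The helpers most_probable and simple_accuracy are hypothetical names introduced for illustration and are not BetaML functions; ties are resolved arbitrarily here, whereas the tutorial's mode accepts an RNG.

```julia
# Hypothetical helper: return the label with the highest probability in one
# prediction expressed as a Dict of label => probability.
function most_probable(p::Dict)
    best_label, best_prob = first(p)
    for (label, prob) in p
        if prob > best_prob
            best_label, best_prob = label, prob
        end
    end
    return best_label
end

# Hypothetical helper: share of observations whose most probable label equals the true one.
simple_accuracy(y, yhat) = count(most_probable.(yhat) .== y) / length(y)

y    = [1, 2, 3, 1]
yhat = [Dict(1 => 0.9, 2 => 0.1),
        Dict(2 => 0.6, 3 => 0.4),
        Dict(1 => 0.7, 3 => 0.3),   # wrong: the true label is 3
        Dict(1 => 1.0)]

acc = simple_accuracy(y, yhat)
println(acc)
```

Three of the four toy observations are predicted correctly, so the sketch reports an accuracy of 0.75.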

We are now ready to store our first model accuracies in the results dataframe:

push!(results,["RF",trainAccuracy,testAccuracy]);

The predictions are quite good: for the training set the algorithm predicted all the cars' origins correctly in this run, while for the testing set (i.e. the records that have not been used to train the algorithm) the share of correct predictions is still fairly high, about 73% in the run shown above (the exact value depends on the random seed).

While accuracy can sometimes suffice, we often want to better understand which categories our model has trouble predicting correctly. We can investigate the output of a multi-class classifier in more depth with a ConfusionMatrix, where the true values (y) are given in rows and the predicted ones (ŷ) in columns, together with some per-class metrics like the precision (correctly predicted class i over all predicted as class i), the recall (correctly predicted class i over all true class i) and others.
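
The per-class metrics just described can be sketched in plain Julia as follows. The function confusion_counts and the toy vectors are assumptions made for illustration; this is not how BetaML's ConfusionMatrix is implemented.

```julia
# Hypothetical helper: count matrix with true classes in rows and predicted classes in columns.
function confusion_counts(y, yhat, classes)
    idx = Dict(c => i for (i, c) in enumerate(classes))
    cm  = zeros(Int, length(classes), length(classes))
    for (t, p) in zip(y, yhat)
        cm[idx[t], idx[p]] += 1
    end
    return cm
end

y    = [1, 1, 1, 2, 2, 3, 3, 3]   # toy true labels
yhat = [1, 1, 2, 2, 2, 1, 3, 3]   # toy predicted labels
cm   = confusion_counts(y, yhat, [1, 2, 3])

tp = [cm[i, i] for i in 1:3]                      # correctly predicted, per class
per_class_precision = tp ./ vec(sum(cm, dims=1))  # divide by column totals (predicted as i)
per_class_recall    = tp ./ vec(sum(cm, dims=2))  # divide by row totals (truly i)

println(cm)
println(per_class_precision)
println(per_class_recall)
```

Reading the matrix by column answers "when the model says class i, how often is it right?" (precision), while reading it by row answers "how many of the true class i did the model find?" (recall).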

We first build the ConfusionMatrix model, we train it with ŷ and y and then we print it (we do it here for the test subset):

cfm = ConfusionMatrix(categories_names=Dict(1=>"US",2=>"EU",3=>"Japan"),rng=copy(AFIXEDRNG))
@@ -199,7 +199,7 @@
 - fn:	[8, 5, 9]
 - categories:	["EU", "US", "Japan"]
 - fp:	[4, 12, 6]

From the report we can see that Japanese cars are the hardest to classify correctly, and in particular many Japanese cars are classified as US ones. This is likely a result of the class imbalance of the dataset, and could be mitigated by balancing the dataset with various sampling techniques before training the model.

If you prefer a more graphical approach, we can also plot the confusion matrix. To do so, we pick up information from the info(cfm) function. Indeed most BetaML models can be queried with info(model) to retrieve additional information, in the form of a dictionary, that is not necessary for the prediction but could still be relevant. Other functions that you can use with BetaML models are parameters(m) and hyperparameters(m).

res = info(cfm)
-heatmap(string.(res["categories"]),string.(res["categories"]),res["normalised_scores"],seriescolor=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual", title="Confusion Matrix (normalised scores)")
Example block output

Comparison with DecisionTree.jl

We now compare BetaML RandomForestEstimator with the random forest estimator of the package DecisionTree.jl. DecisionTree.jl random forests are similar in usage: we first "build" (train) the forest and we then make predictions out of the trained model.

# We train the model...
+heatmap(string.(res["categories"]),string.(res["categories"]),res["normalised_scores"],seriescolor=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual", title="Confusion Matrix (normalised scores)")
Example block output

Comparison with DecisionTree.jl

We now compare BetaML RandomForestEstimator with the random forest estimator of the package DecisionTree.jl. DecisionTree.jl random forests are similar in usage: we first "build" (train) the forest and we then make predictions out of the trained model.

# We train the model...
 model = DecisionTree.build_forest(ytrain, xtrain,rng=seed)
 # ..and we generate predictions and measure their error
 (ŷtrain,ŷtest) = DecisionTree.apply_forest.([model],[xtrain,xtest]);
@@ -273,7 +273,7 @@
 fit!(cfm,ytest,ŷtest)
 print(cfm)
 res = info(cfm)
-heatmap(string.(res["categories"]),string.(res["categories"]),res["normalised_scores"],seriescolor=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual", title="Confusion Matrix (normalised scores)")
Example block output

While accuracies are a bit lower, the distribution of misclassifications is similar, with many Japanese cars misclassified as US ones (here we also have some EU cars misclassified as Japanese ones).

Comparisons with Flux

As we did for Random Forests, we compare BetaML neural networks with the leading package for deep learning in Julia, Flux.jl.

In Flux the input must be in the form (fields, observations), so we transpose our original matrices

xtrainT, ytrain_ohT = transpose.([xtrain, ytrain_oh])
+heatmap(string.(res["categories"]),string.(res["categories"]),res["normalised_scores"],seriescolor=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual", title="Confusion Matrix (normalised scores)")
Example block output

While accuracies are a bit lower, the distribution of misclassifications is similar, with many Japanese cars misclassified as US ones (here we also have some EU cars misclassified as Japanese ones).

Comparisons with Flux

As we did for Random Forests, we compare BetaML neural networks with the leading package for deep learning in Julia, Flux.jl.

In Flux the input must be in the form (fields, observations), so we transpose our original matrices

xtrainT, ytrain_ohT = transpose.([xtrain, ytrain_oh])
 xtestT, ytest_ohT   = transpose.([xtest, ytest_oh])
2-element Vector{LinearAlgebra.Transpose{Float64, Matrix{Float64}}}:
  [-0.9370258544446618 0.8308089913068687 … 1.6506744270177232 -0.5783347263211628; 0.30679255888470214 -0.8627640505724731 … -0.27798574584388547 0.30679255888470214; … ; 0.10010898192256414 2.2430393858184523 … 1.564444757918087 -0.007037538272230526; 0.5552223059557987 1.0893935292213297 … 1.0893935292213297 -1.31437697547356]
  [0.0 0.0 … 0.0 1.0; 1.0 1.0 … 1.0 0.0; 0.0 0.0 … 0.0 0.0]
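
As a small illustration of the layout change, the sketch below transposes a toy (observations, fields) matrix into the (fields, observations) form. The matrix values are made up for the example; transpose and permutedims are standard Julia functions.

```julia
# Four observations (rows) with two fields (columns): the layout BetaML expects.
x = [1.0 10.0;
     2.0 20.0;
     3.0 30.0;
     4.0 40.0]

xT_lazy = transpose(x)     # lazy wrapper that shares memory with x
xT_copy = permutedims(x)   # materialised 2×4 Matrix, independent of x

println(size(x), " -> ", size(xT_lazy))
println(typeof(xT_copy))
```

The lazy form matches the Transpose type shown in the output above; permutedims is an alternative when a plain Matrix is preferred.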

We define the Flux neural network model in a similar way to BetaML and load it with data, then we train it, predict, and measure the accuracies on the training and the test sets.

We fix the random seed for Flux, although you may still get different results depending on the number of threads used. This is a problem we solve in BetaML with generate_parallel_rngs.

Random.seed!(seed)
@@ -356,17 +356,17 @@
 ***
 *** Training kernel perceptron for maximum 100 iterations. Random shuffle: true
 Avg. error after iteration 1 : 0.15671641791044777
-
Training Kernel Perceptron...   6%|█▍                    |  ETA: 0:00:09Avg. error after iteration 10 : 0.055970149253731345
-
Training Kernel Perceptron...  12%|██▋                   |  ETA: 0:00:08
Training Kernel Perceptron...  18%|████                  |  ETA: 0:00:08Avg. error after iteration 20 : 0.05970149253731343
+
Training Kernel Perceptron...   6%|█▍                    |  ETA: 0:00:08Avg. error after iteration 10 : 0.055970149253731345
+
Training Kernel Perceptron...  12%|██▋                   |  ETA: 0:00:08
Training Kernel Perceptron...  18%|████                  |  ETA: 0:00:07Avg. error after iteration 20 : 0.05970149253731343
 
Training Kernel Perceptron...  24%|█████▎                |  ETA: 0:00:07Avg. error after iteration 30 : 0.03731343283582089
 
Training Kernel Perceptron...  30%|██████▋               |  ETA: 0:00:06
Training Kernel Perceptron...  36%|███████▉              |  ETA: 0:00:06Avg. error after iteration 40 : 0.05970149253731343
 
Training Kernel Perceptron...  42%|█████████▎            |  ETA: 0:00:05
Training Kernel Perceptron...  48%|██████████▌           |  ETA: 0:00:05Avg. error after iteration 50 : 0.041044776119402986
 
Training Kernel Perceptron...  54%|███████████▉          |  ETA: 0:00:04Avg. error after iteration 60 : 0.022388059701492536
 
Training Kernel Perceptron...  60%|█████████████▎        |  ETA: 0:00:04
Training Kernel Perceptron...  66%|██████████████▌       |  ETA: 0:00:03Avg. error after iteration 70 : 0.033582089552238806
-
Training Kernel Perceptron...  72%|███████████████▉      |  ETA: 0:00:03
Training Kernel Perceptron...  78%|█████████████████▏    |  ETA: 0:00:02Avg. error after iteration 80 : 0.026119402985074626
+
Training Kernel Perceptron...  72%|███████████████▉      |  ETA: 0:00:02
Training Kernel Perceptron...  78%|█████████████████▏    |  ETA: 0:00:02Avg. error after iteration 80 : 0.026119402985074626
 
Training Kernel Perceptron...  84%|██████████████████▌   |  ETA: 0:00:01Avg. error after iteration 90 : 0.033582089552238806
 
Training Kernel Perceptron...  90%|███████████████████▊  |  ETA: 0:00:01
Training Kernel Perceptron...  96%|█████████████████████▏|  ETA: 0:00:00Avg. error after iteration 100 : 0.026119402985074626
-
Training Kernel Perceptron... 100%|██████████████████████| Time: 0:00:09
+
Training Kernel Perceptron... 100%|██████████████████████| Time: 0:00:08
 Running function BetaML.Perceptron.#kernel_perceptron_classifier_binary#17 at /home/runner/work/BetaML.jl/BetaML.jl/src/Perceptron/Perceptron_kernel.jl:133
 Type `]dev BetaML` to modify the source code (this would change its location on disk)
 ***
@@ -374,7 +374,7 @@
 Avg. error after iteration 1 : 0.4166666666666667
 Avg. error after iteration 10 : 0.13333333333333333
 Avg. error after iteration 20 : 0.1
-
Training Kernel Perceptron...  28%|██████▏               |  ETA: 0:00:01Avg. error after iteration 30 : 0.09166666666666666
+
Training Kernel Perceptron...  29%|██████▍               |  ETA: 0:00:01Avg. error after iteration 30 : 0.09166666666666666
 Avg. error after iteration 40 : 0.08333333333333333
 *** Avg. error after epoch 49 : 0.0 (all elements of the set has been correctly classified)
 
Training Kernel Perceptron... 100%|██████████████████████| Time: 0:00:00
@@ -383,10 +383,10 @@
 ***
 *** Training kernel perceptron for maximum 100 iterations. Random shuffle: true
 Avg. error after iteration 1 : 0.16793893129770993
-
Training Kernel Perceptron...   6%|█▍                    |  ETA: 0:00:08Avg. error after iteration 10 : 0.06870229007633588
-
Training Kernel Perceptron...  12%|██▋                   |  ETA: 0:00:08
Training Kernel Perceptron...  18%|████                  |  ETA: 0:00:07Avg. error after iteration 20 : 0.04198473282442748
-
Training Kernel Perceptron...  24%|█████▎                |  ETA: 0:00:07Avg. error after iteration 30 : 0.03816793893129771
-
Training Kernel Perceptron...  30%|██████▋               |  ETA: 0:00:06
Training Kernel Perceptron...  36%|███████▉              |  ETA: 0:00:06*** Avg. error after epoch 40 : 0.0 (all elements of the set has been correctly classified)
+
Training Kernel Perceptron...   7%|█▌                    |  ETA: 0:00:08Avg. error after iteration 10 : 0.06870229007633588
+
Training Kernel Perceptron...  14%|███▏                  |  ETA: 0:00:07Avg. error after iteration 20 : 0.04198473282442748
+
Training Kernel Perceptron...  21%|████▋                 |  ETA: 0:00:07
Training Kernel Perceptron...  28%|██████▏               |  ETA: 0:00:06Avg. error after iteration 30 : 0.03816793893129771
+
Training Kernel Perceptron...  34%|███████▌              |  ETA: 0:00:06*** Avg. error after epoch 40 : 0.0 (all elements of the set has been correctly classified)
 
Training Kernel Perceptron... 100%|██████████████████████| Time: 0:00:03
 ***
 *** Training pegasos for maximum 1000 iterations. Random shuffle: true
@@ -436,4 +436,4 @@
    4 │ NN (Flux.jl)                 0.978462  0.765432
    5 │ Perceptron                   0.735385  0.691358
    6 │ KernelPerceptronClassifier   0.978462  0.703704
-   7 │ Pegasaus                     0.670769  0.691358

If you clone BetaML repository

Model accuracies on my machine with seeds 123, 1000 and 10000 respectively

model                        train 1   test 1    train 2   test 2    train 3   test 3
RF                           0.996923  0.765432  1.000000  0.802469  1.000000  0.888889
RF (DecisionTrees.jl)        0.975385  0.765432  0.984615  0.777778  0.975385  0.864198
NN                           0.886154  0.728395  0.916923  0.827160  0.895385  0.876543
NN (Flux.jl)                 0.793846  0.654321  0.938462  0.790123  0.935385  0.851852
Perceptron                   0.778462  0.703704  0.720000  0.753086  0.670769  0.654321
KernelPerceptronClassifier   0.987692  0.703704  0.978462  0.777778  0.944615  0.827160
Pegasaus                     0.732308  0.703704  0.633846  0.753086  0.575385  0.654321

We warn that this table provides only a rough idea of the performance of the various algorithms. Indeed there is a large amount of stochasticity both in the sampling of the data used for training/testing and in the initial settings of the parameters of the algorithms. For a statistically significant comparison we would have to repeat the analysis with multiple samplings (e.g. by cross-validation, see the clustering tutorial for an example) and initial random parameters.

Nevertheless the table above shows that, when we compare BetaML with the algorithm-specific leading packages, we find similar results in terms of accuracy, but the leading packages are often better optimised and run more efficiently (sometimes at the cost of being less versatile). Also, for this dataset, Random Forests seem to remain marginally more accurate than Neural Networks, although of course this depends on the hyper-parameters and, with a single run of the models, we don't know if this difference is significant.

View this file on Github.


This page was generated using Literate.jl.

+ 7 │ Pegasaus 0.670769 0.691358

If you clone BetaML repository

Model accuracies on my machine with seeds 123, 1000 and 10000 respectively

model                        train 1   test 1    train 2   test 2    train 3   test 3
RF                           0.996923  0.765432  1.000000  0.802469  1.000000  0.888889
RF (DecisionTrees.jl)        0.975385  0.765432  0.984615  0.777778  0.975385  0.864198
NN                           0.886154  0.728395  0.916923  0.827160  0.895385  0.876543
NN (Flux.jl)                 0.793846  0.654321  0.938462  0.790123  0.935385  0.851852
Perceptron                   0.778462  0.703704  0.720000  0.753086  0.670769  0.654321
KernelPerceptronClassifier   0.987692  0.703704  0.978462  0.777778  0.944615  0.827160
Pegasaus                     0.732308  0.703704  0.633846  0.753086  0.575385  0.654321

We warn that this table provides only a rough idea of the performance of the various algorithms. Indeed there is a large amount of stochasticity both in the sampling of the data used for training/testing and in the initial settings of the parameters of the algorithms. For a statistically significant comparison we would have to repeat the analysis with multiple samplings (e.g. by cross-validation, see the clustering tutorial for an example) and initial random parameters.

Nevertheless the table above shows that, when we compare BetaML with the algorithm-specific leading packages, we find similar results in terms of accuracy, but the leading packages are often better optimised and run more efficiently (sometimes at the cost of being less versatile). Also, for this dataset, Random Forests seem to remain marginally more accurate than Neural Networks, although of course this depends on the hyper-parameters and, with a single run of the models, we don't know if this difference is significant.

View this file on Github.


This page was generated using Literate.jl.

diff --git a/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris-c117f6b5.svg b/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris-00862f78.svg similarity index 86% rename from dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris-c117f6b5.svg rename to dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris-00862f78.svg index 1179387..78e44e6 100644 --- a/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris-c117f6b5.svg +++ b/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris-00862f78.svg @@ -1,50 +1,50 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html b/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html index afcb91f..9f664ce 100644 --- a/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html +++ b/dev/tutorials/Clustering - Iris/betaml_tutorial_cluster_iris.html @@ -3,12 +3,12 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

A clustering task: the prediction of plant species from floral measurements (the iris dataset)

The task is to estimate the species of a plant given some floral measurements. It uses the classical "Iris" dataset. Note that in this example we are using clustering approaches, so we try to understand the "structure" of our data without relying on knowing the true labels ("classes" or "factors"). However we have chosen a dataset for which the true labels are actually known, so we can compare the accuracy of the algorithms we use, but these labels will not be used during the training of the algorithms.

Data origin:

Library and data loading

Activating the local environment specific to BetaML documentation

using Pkg
+

A clustering task: the prediction of plant species from floral measurements (the iris dataset)

The task is to estimate the species of a plant given some floral measurements. It uses the classical "Iris" dataset. Note that in this example we are using clustering approaches, so we try to understand the "structure" of our data without relying on knowing the true labels ("classes" or "factors"). However we have chosen a dataset for which the true labels are actually known, so we can compare the accuracy of the algorithms we use, but these labels will not be used during the training of the algorithms.

Data origin:

Library and data loading

Activating the local environment specific to BetaML documentation

using Pkg
 Pkg.activate(joinpath(@__DIR__,"..","..",".."))
  Activating environment at `~/work/BetaML.jl/BetaML.jl/docs/Project.toml`

We load the Beta Machine Learning Toolkit as well as some other packages that we use in this tutorial

using BetaML
 using Random, Statistics, Logging, BenchmarkTools, StableRNGs, RDatasets, Plots, DataFrames

We are also going to compare our results with two other leading packages in Julia for clustering analysis: Clustering.jl, which provides (inter alia) the kmeans and kmedoids algorithms, and GaussianMixtures.jl, which provides, as the name says, Gaussian Mixture Models. So we import them (we "import" them, rather than "use" them, so as not to bring their full names into the namespace, as some would collide with BetaML).

import Clustering, GaussianMixtures

Here we are explicit and we use our own fixed RNG:

seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000
 AFIXEDRNG = StableRNG(seed)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)

We do a few tweaks for the Clustering and GaussianMixtures packages. Note that in BetaML we can also control both the random seed and the verbosity in the algorithm call, not only globally.

Random.seed!(seed)
 #logger  = Logging.SimpleLogger(stdout, Logging.Error); global_logger(logger); ## For suppressing GaussianMixtures output

Unlike in the regression tutorial, we load the data here from [RDatasets](https://github.com/JuliaStats/RDatasets.jl), a package providing standard datasets.

iris = dataset("datasets", "iris")
-describe(iris)
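5×7 DataFrame
 Row │ variable     mean     min     median  max        nmissing  eltype
     │ Symbol       Union…   Any     Union…  Any        Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────────────
   1 │ SepalLength  5.84333  4.3     5.8     7.9        0         Float64
   2 │ SepalWidth   3.05733  2.0     3.0     4.4        0         Float64
   3 │ PetalLength  3.758    1.0     4.35    6.9        0         Float64
   4 │ PetalWidth   1.19933  0.1     1.3     2.5        0         Float64
   5 │ Species               setosa          virginica  0         CategoricalValue{String, UInt8}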

The iris dataset provides floral measurements in columns 1 to 4 and the assigned species name in column 5. There are no missing values.

Data preparation

The first step is to prepare the data for the analysis. We collect the first 4 columns as our feature x matrix and the last one as our y label vector. As we are using clustering algorithms, we do not use the labels to train the algorithms: we behave as if we did not know them and let the algorithm "learn" from the structure of the data itself. We will, however, use the labels to judge the accuracy that the various algorithms reach.

x       = Matrix{Float64}(iris[:,1:4]);
+describe(iris)

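5 rows × 7 columns

     │ variable     mean     min     median  max        nmissing  eltype
     │ Symbol       Union…   Any     Union…  Any        Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────────────
   1 │ SepalLength  5.84333  4.3     5.8     7.9        0         Float64
   2 │ SepalWidth   3.05733  2.0     3.0     4.4        0         Float64
   3 │ PetalLength  3.758    1.0     4.35    6.9        0         Float64
   4 │ PetalWidth   1.19933  0.1     1.3     2.5        0         Float64
   5 │ Species               setosa          virginica  0         CategoricalValue{String, UInt8}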

The iris dataset provides floral measurements in columns 1 to 4 and the assigned species name in column 5. There are no missing values.

Data preparation

The first step is to prepare the data for the analysis. We collect the first 4 columns as our feature x matrix and the last one as our y label vector. As we are using clustering algorithms, we do not use the labels to train the algorithms: we behave as if we did not know them and let the algorithm "learn" from the structure of the data itself. We will, however, use the labels to judge the accuracy that the various algorithms reach.

x       = Matrix{Float64}(iris[:,1:4]);
 yLabels = unique(iris[:,5])
3-element Vector{String}:
  "setosa"
  "versicolor"
@@ -78,7 +78,7 @@
 
 modelLabels=["kMeansG","kMeansR","kMeansS","kMedoidsG","kMedoidsR","kMedoidsS","gmmSpher","gmmDiag","gmmFull","kMeans (Clustering.jl)","gmmDiag (GaussianMixtures.jl)","gmmFull (GaussianMixtures.jl)"]
 
-report = DataFrame(mName = modelLabels, avgAccuracy = dropdims(round.(μs',digits=3),dims=2), stdAccuracy = dropdims(round.(σs',digits=3),dims=2))
12×3 DataFrame
 Row │ mName                          avgAccuracy  stdAccuracy
     │ String                         Float64      Float64
─────┼─────────────────────────────────────────────────────────
   1 │ kMeansG                        0.892        0.015
   2 │ kMeansR                        0.835        0.098
   3 │ kMeansS                        0.825        0.134
   4 │ kMedoidsG                      0.897        0.014
   5 │ kMedoidsR                      0.817        0.136
   6 │ kMedoidsS                      0.851        0.124
   7 │ gmmSpher                       0.894        0.015
   8 │ gmmDiag                        0.918        0.019
   9 │ gmmFull                        0.974        0.027
  10 │ kMeans (Clustering.jl)         0.871        0.09
  11 │ gmmDiag (GaussianMixtures.jl)  0.881        0.1
  12 │ gmmFull (GaussianMixtures.jl)  0.909        0.138

Accuracies (mean and standard deviation) obtained by running this script with different random seeds (123, 1000 and 10000):

│ model                         │ μ 1   │ σ² 1  │ μ 2   │ σ² 2  │ μ 3   │ σ² 3  │
│ kMeansG                       │ 0.891 │ 0.017 │ 0.892 │ 0.012 │ 0.893 │ 0.017 │
│ kMeansR                       │ 0.866 │ 0.083 │ 0.831 │ 0.127 │ 0.836 │ 0.114 │
│ kMeansS                       │ 0.764 │ 0.174 │ 0.822 │ 0.145 │ 0.779 │ 0.170 │
│ kMedoidsG                     │ 0.894 │ 0.015 │ 0.896 │ 0.012 │ 0.894 │ 0.017 │
│ kMedoidsR                     │ 0.804 │ 0.144 │ 0.841 │ 0.123 │ 0.825 │ 0.134 │
│ kMedoidsS                     │ 0.893 │ 0.018 │ 0.834 │ 0.130 │ 0.877 │ 0.085 │
│ gmmSpher                      │ 0.893 │ 0.016 │ 0.891 │ 0.016 │ 0.895 │ 0.017 │
│ gmmDiag                       │ 0.917 │ 0.022 │ 0.912 │ 0.016 │ 0.916 │ 0.014 │
│ gmmFull                       │ 0.970 │ 0.035 │ 0.982 │ 0.013 │ 0.981 │ 0.009 │
│ kMeans (Clustering.jl)        │ 0.856 │ 0.112 │ 0.873 │ 0.083 │ 0.873 │ 0.089 │
│ gmmDiag (GaussianMixtures.jl) │ 0.865 │ 0.127 │ 0.872 │ 0.090 │ 0.833 │ 0.152 │
│ gmmFull (GaussianMixtures.jl) │ 0.907 │ 0.133 │ 0.914 │ 0.160 │ 0.917 │ 0.141 │

We can see that running the script multiple times with different random seeds confirms the standard deviations estimated with cross_validation, with the BetaML GMM-based models and the grid-based ones being the most stable.

BetaML model accuracies

From the output we see that, for this dataset, the gmm models generally perform better than the kmeans or kmedoids algorithms, and they also have very low variances. In detail, it is the (default) grid initialisation that leads to the best results for kmeans and kmedoids, while among the gmm models it is the FullGaussian that performs best.

Comparisons with Clustering.jl and GaussianMixtures.jl

For this specific case, both Clustering.jl and GaussianMixtures.jl report substantially worse accuracies, with very high variances. But the ranking Full Gaussian gmm > Diagonal Gaussian > Kmeans accuracy is maintained. I suspect that BetaML gmm works so well because it uses the kmeans algorithm to initialise the mixtures, itself initialised with a "grid" approach. The grid initialisation indeed "guarantees" that the initial means of the mixture components are well spread across the multidimensional space defined by the data, which helps prevent the EM algorithm from converging to a bad local optimum.

Working without the labels

Up to now we have used the real labels to compare the model accuracies. But in real clustering problems we don't have the true classes (otherwise we wouldn't need clustering in the first place), so we don't know the number of classes to use. There are several methods to judge the quality of a clustering. For likelihood-based algorithms such as GaussianMixtureClusterer we can use an information criterion that trades off the goodness of the likelihood against the number of parameters used in the fit. By default, BetaML provides in the gmm clustering outputs both the Bayesian information criterion (BIC) and the Akaike information criterion (AIC); for both, a lower value is better.

We can then run the model with different numbers of classes and see which one leads to the lowest BIC or AIC. Hence we run cross_validation again with the FullGaussian gmm model. Note that we use the BIC/AIC criteria here to establish the "best" number of classes, but we could also have used them to select the kind of Gaussian distribution to use. This is one example of hyper-parameter tuning, which we developed in more detail using autotuning in the regression tutorial.

Let's try up to 4 possible classes:

K = 4
+report = DataFrame(mName = modelLabels, avgAccuracy = dropdims(round.(μs',digits=3),dims=2), stdAccuracy = dropdims(round.(σs',digits=3),dims=2))

12 rows × 3 columns

     │ mName                          avgAccuracy  stdAccuracy
     │ String                         Float64      Float64
─────┼─────────────────────────────────────────────────────────
   1 │ kMeansG                        0.892        0.015
   2 │ kMeansR                        0.835        0.098
   3 │ kMeansS                        0.825        0.134
   4 │ kMedoidsG                      0.897        0.014
   5 │ kMedoidsR                      0.817        0.136
   6 │ kMedoidsS                      0.851        0.124
   7 │ gmmSpher                       0.894        0.015
   8 │ gmmDiag                        0.918        0.019
   9 │ gmmFull                        0.974        0.027
  10 │ kMeans (Clustering.jl)         0.871        0.09
  11 │ gmmDiag (GaussianMixtures.jl)  0.881        0.1
  12 │ gmmFull (GaussianMixtures.jl)  0.909        0.138

Accuracies (mean and standard deviation) obtained by running this script with different random seeds (123, 1000 and 10000):

│ model                         │ μ 1   │ σ² 1  │ μ 2   │ σ² 2  │ μ 3   │ σ² 3  │
│ kMeansG                       │ 0.891 │ 0.017 │ 0.892 │ 0.012 │ 0.893 │ 0.017 │
│ kMeansR                       │ 0.866 │ 0.083 │ 0.831 │ 0.127 │ 0.836 │ 0.114 │
│ kMeansS                       │ 0.764 │ 0.174 │ 0.822 │ 0.145 │ 0.779 │ 0.170 │
│ kMedoidsG                     │ 0.894 │ 0.015 │ 0.896 │ 0.012 │ 0.894 │ 0.017 │
│ kMedoidsR                     │ 0.804 │ 0.144 │ 0.841 │ 0.123 │ 0.825 │ 0.134 │
│ kMedoidsS                     │ 0.893 │ 0.018 │ 0.834 │ 0.130 │ 0.877 │ 0.085 │
│ gmmSpher                      │ 0.893 │ 0.016 │ 0.891 │ 0.016 │ 0.895 │ 0.017 │
│ gmmDiag                       │ 0.917 │ 0.022 │ 0.912 │ 0.016 │ 0.916 │ 0.014 │
│ gmmFull                       │ 0.970 │ 0.035 │ 0.982 │ 0.013 │ 0.981 │ 0.009 │
│ kMeans (Clustering.jl)        │ 0.856 │ 0.112 │ 0.873 │ 0.083 │ 0.873 │ 0.089 │
│ gmmDiag (GaussianMixtures.jl) │ 0.865 │ 0.127 │ 0.872 │ 0.090 │ 0.833 │ 0.152 │
│ gmmFull (GaussianMixtures.jl) │ 0.907 │ 0.133 │ 0.914 │ 0.160 │ 0.917 │ 0.141 │

We can see that running the script multiple times with different random seeds confirms the standard deviations estimated with cross_validation, with the BetaML GMM-based models and the grid-based ones being the most stable.

BetaML model accuracies

From the output we see that, for this dataset, the gmm models generally perform better than the kmeans or kmedoids algorithms, and they also have very low variances. In detail, it is the (default) grid initialisation that leads to the best results for kmeans and kmedoids, while among the gmm models it is the FullGaussian that performs best.

Comparisons with Clustering.jl and GaussianMixtures.jl

For this specific case, both Clustering.jl and GaussianMixtures.jl report substantially worse accuracies, with very high variances. But the ranking Full Gaussian gmm > Diagonal Gaussian > Kmeans accuracy is maintained. I suspect that BetaML gmm works so well because it uses the kmeans algorithm to initialise the mixtures, itself initialised with a "grid" approach. The grid initialisation indeed "guarantees" that the initial means of the mixture components are well spread across the multidimensional space defined by the data, which helps prevent the EM algorithm from converging to a bad local optimum.

Working without the labels

Up to now we have used the real labels to compare the model accuracies. But in real clustering problems we don't have the true classes (otherwise we wouldn't need clustering in the first place), so we don't know the number of classes to use. There are several methods to judge the quality of a clustering. For likelihood-based algorithms such as GaussianMixtureClusterer we can use an information criterion that trades off the goodness of the likelihood against the number of parameters used in the fit. By default, BetaML provides in the gmm clustering outputs both the Bayesian information criterion (BIC) and the Akaike information criterion (AIC); for both, a lower value is better.

We can then run the model with different numbers of classes and see which one leads to the lowest BIC or AIC. Hence we run cross_validation again with the FullGaussian gmm model. Note that we use the BIC/AIC criteria here to establish the "best" number of classes, but we could also have used them to select the kind of Gaussian distribution to use. This is one example of hyper-parameter tuning, which we developed in more detail using autotuning in the regression tutorial.
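
As a rough illustration of how the two criteria penalise model size, the sketch below computes AIC and BIC from a log-likelihood using their textbook definitions. The helper names (aic, bic, nparams_fullgmm) and the log-likelihood value are hypothetical: they are not part of BetaML, whose internal parameter count may differ.

```julia
# Textbook definitions (illustrative helpers, not BetaML functions):
#   AIC = 2k - 2lnL        BIC = k*ln(n) - 2lnL
# k: number of free parameters, n: number of records, lnL: maximised log-likelihood
aic(lnL, k)    = 2 * k - 2 * lnL
bic(lnL, k, n) = k * log(n) - 2 * lnL

# Free parameters of a K-component, D-dimensional mixture with full covariances:
# (K-1) mixing weights + K*D means + K*D*(D+1)/2 covariance entries
nparams_fullgmm(K, D) = (K - 1) + K * D + K * D * (D + 1) ÷ 2

k3  = nparams_fullgmm(3, 4)   # iris has D = 4 features
lnL = -180.0                  # made-up log-likelihood, for illustration only
println((k = k3, aic = aic(lnL, k3), bic = bic(lnL, k3, 150)))
```

Under these definitions, ln(n) exceeds 2 as soon as n ≥ 8, so BIC charges more than AIC for each extra parameter and tends to favour fewer classes.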

Let's try up to 4 possible classes:

K = 4
 sampler = KFold(nsplits=5,nrepeats=2,shuffle=true, rng=copy(AFIXEDRNG))
 cOut = cross_validation([x,y],sampler,return_statistics=false) do trainData,testData,rng
     (xtrain,ytrain)  = trainData;
@@ -105,7 +105,7 @@
  762.112  516.031  539.392  593.272
σsBICS = std(BICS,dims=1)
1×4 Matrix{Float64}:
  12.2912  15.8085  17.7181  24.6026
μsAICS = mean(AICS,dims=1)
1×4 Matrix{Float64}:
  723.087  435.194  416.743  428.81
σsAICS = std(AICS,dims=1)
1×4 Matrix{Float64}:
- 12.2912  15.8085  17.7181  24.6026
plot(1:K,[μsBICS' μsAICS'], labels=["BIC" "AIC"], title="Information criteria by number of classes", xlabel="number of classes", ylabel="lower is better")
Example block output

We see that following the "lowest AIC" rule we would indeed choose three classes, while following the "lowest BIC" rule we would have chosen only two classes. This means that there are two classes that, with respect to the floral measurements used in the database, are very similar, and our models are unsure about them. Perhaps the biologists will one day conclude that they are indeed a single species :-).

We could study this issue in more detail by analysing the ConfusionMatrix, but the one used in BetaML does not account for the ignorelabels option (yet).

Analysing the silhouette of the cluster

A further metric to analyse cluster output is the so-called Silhouette method.

Silhouette is a distance-based metric and requires as its first argument a matrix of pairwise distances. This can be computed with the pairwise function, which defaults to using l2_distance (i.e. Euclidean). Many other distance functions are available in the Clustering sub-module, or one can use the efficiently implemented distances from the Distances package, as in this example.

We'll use here the silhouette function over a simple loop:

x,y = consistent_shuffle([x,y],dims=1)
+ 12.2912  15.8085  17.7181  24.6026
plot(1:K,[μsBICS' μsAICS'], labels=["BIC" "AIC"], title="Information criteria by number of classes", xlabel="number of classes", ylabel="lower is better")
Example block output

We see that following the "lowest AIC" rule we would indeed choose three classes, while following the "lowest BIC" rule we would have chosen only two classes. This means that there are two classes that, with respect to the floral measurements used in the database, are very similar, and our models are unsure about them. Perhaps the biologists will one day conclude that they are indeed a single species :-).

We could study this issue in more detail by analysing the ConfusionMatrix, but the one used in BetaML does not account for the ignorelabels option (yet).

Analysing the silhouette of the cluster

A further metric to analyse cluster output is the so-called Silhouette method.

Silhouette is a distance-based metric and requires as its first argument a matrix of pairwise distances. This can be computed with the pairwise function, which defaults to using l2_distance (i.e. Euclidean). Many other distance functions are available in the Clustering sub-module, or one can use the efficiently implemented distances from the Distances package, as in this example.
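
To make the metric concrete, here is a hand-written sketch of the standard silhouette definition working directly on a pairwise distance matrix. The function name silhouette_scores and the toy data are hypothetical; this is not BetaML's implementation, and it assumes at least two clusters are present.

```julia
using Statistics

# pd: n×n matrix of pairwise distances; labels: cluster index of each record.
# For record i: a = mean distance to the other members of its own cluster,
# b = smallest mean distance to the members of any other cluster,
# silhouette = (b - a) / max(a, b), which lies in [-1, 1].
function silhouette_scores(pd::AbstractMatrix, labels::AbstractVector{<:Integer})
    n        = length(labels)
    clusters = unique(labels)
    s        = zeros(n)
    for i in 1:n
        own = [j for j in 1:n if labels[j] == labels[i] && j != i]
        isempty(own) && continue   # singleton cluster: score left at 0 by convention
        a = mean(pd[i, own])
        b = minimum(mean(pd[i, labels .== c]) for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    end
    return s
end

# Toy data: two well separated groups on a line
pts    = [0.0, 1.0, 10.0, 11.0]
pd     = abs.(pts .- pts')         # pairwise absolute distances
labels = [1, 1, 2, 2]
s      = silhouette_scores(pd, labels)
println(mean(s))                   # average silhouette of the partition
```

Averaging the per-record scores gives a single number per partition, which is what is compared across different numbers of classes.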

We'll use here the silhouette function over a simple loop:

x,y = consistent_shuffle([x,y],dims=1)
 import Distances
 pd = pairwise(x,distance=Distances.euclidean) # we compute the pairwise distances
 nclasses = 2:6
@@ -134,4 +134,4 @@
 GaussianMixtureClusterer 	 (5 classes): 0.4863408030950679
 KMeansClusterer 	 (6 classes): 0.3674845748098317
 KMedoidsClusterer 	 (6 classes): 0.34916011367198635
-GaussianMixtureClusterer 	 (6 classes): 0.3543173617053886

Higher values are better. We see again that 2 classes have better scores!

Conclusions

We have shown in this tutorial how we can easily run clustering algorithms in BetaML with just one line of code, fit!(ChoosenClusterer(),x), but also how we can use cross-validation to help model or parameter selection, with or without knowing the real classes. We find here what we observed with supervised models. Overall, the accuracy of BetaML models is comparable to that of leading specialised packages (in this case it is even better), but there is a significant gap in computational efficiency that restricts the practical usage of BetaML to datasets that fit in the computer's memory. However, we trade this relative inefficiency for very flexible model definitions and utility functions (for example, GaussianMixtureClusterer works with missing data, allowing it to be used as the backbone of the GaussianMixtureImputer missing-value imputation function, or for collaborative recommendation systems).

View this file on Github.


This page was generated using Literate.jl.

+GaussianMixtureClusterer (6 classes): 0.3543173617053886

Higher values are better. We see again that 2 classes have better scores!

Conclusions

We have shown in this tutorial how we can easily run clustering algorithms in BetaML with just one line of code, fit!(ChoosenClusterer(),x), but also how we can use cross-validation to help model or parameter selection, with or without knowing the real classes. We find here what we observed with supervised models. Overall, the accuracy of BetaML models is comparable to that of leading specialised packages (in this case it is even better), but there is a significant gap in computational efficiency that restricts the practical usage of BetaML to datasets that fit in the computer's memory. However, we trade this relative inefficiency for very flexible model definitions and utility functions (for example, GaussianMixtureClusterer works with missing data, allowing it to be used as the backbone of the GaussianMixtureImputer missing-value imputation function, or for collaborative recommendation systems).

View this file on Github.


This page was generated using Literate.jl.

diff --git a/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-f226ab71.svg b/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-915b89af.svg
similarity index 87%
rename from dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-f226ab71.svg
rename to dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-915b89af.svg
index 200d4d6..5b18559 100644
--- a/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-f226ab71.svg
+++ b/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-915b89af.svg
@@ -1,41 +1,41 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-0d345d12.svg b/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-ee2a7d64.svg
similarity index 80%
rename from dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-0d345d12.svg
rename to dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-ee2a7d64.svg
index afc1adc..e782d04 100644
--- a/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-0d345d12.svg
+++ b/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn-ee2a7d64.svg
@@ -1,142 +1,142 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html b/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html
index 4a84f97..7d5a063 100644
--- a/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html
+++ b/dev/tutorials/Multi-branch neural network/betaml_tutorial_multibranch_nn.html
@@ -3,7 +3,7 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash});
-

A deep neural network with multi-branch architecture

Often we can "divide" our feature sets into different groups, where for each group we have many, many variables whose importance in prediction we don't know, but for which using a fully dense layer would be too computationally expensive. For example, we want to predict the growth of forest trees based on soil characteristics, climate characteristics and a bunch of other data (species, age, density...).

A soil (or climate) database may have hundreds of variables, how can we reduce them to a few that encode all the "soil" information? Sure, we could do a PCA or a clustering analysis, but a better way is to let our model itself find a way to encode the soil information into a vector in a way that is optimal for our prediction goal, i.e. we target the encoding task at our prediction goal.

So we run a multi-branch neural network where one branch is given by the soil variables - it starts from all the hundreds of variables and ends in a few neuron outputs, another branch in a similar way is for the climate variables, we merge them in a branch to take into account the soil-weather interrelation (for example, it is well known that the water retention capacity of a sandy soil is quite different from that of a clay soil) and finally we merge this branch with the other variable branch to arrive at a single predicted output. In this example we focus on building, training and predicting a multi-branch neural network. See the other examples for cross-validation, hyperparameter tuning, scaling, overfitting, encoding, etc.

Data origin:

  • while we hope to apply this example soon on actual real world data, for now we work on synthetic random data just to assess the validity of the network configuration.

Library and data generation

Activating the local environment specific to the tutorials

using Pkg
+

A deep neural network with multi-branch architecture

Often we can "divide" our feature sets into different groups, where for each group we have many, many variables whose importance in prediction we don't know, but for which using a fully dense layer would be too computationally expensive. For example, we want to predict the growth of forest trees based on soil characteristics, climate characteristics and a bunch of other data (species, age, density...).

A soil (or climate) database may have hundreds of variables, how can we reduce them to a few that encode all the "soil" information? Sure, we could do a PCA or a clustering analysis, but a better way is to let our model itself find a way to encode the soil information into a vector in a way that is optimal for our prediction goal, i.e. we target the encoding task at our prediction goal.

So we run a multi-branch neural network where one branch is given by the soil variables - it starts from all the hundreds of variables and ends in a few neuron outputs, another branch in a similar way is for the climate variables, we merge them in a branch to take into account the soil-weather interrelation (for example, it is well known that the water retention capacity of a sandy soil is quite different from that of a clay soil) and finally we merge this branch with the other variable branch to arrive at a single predicted output. In this example we focus on building, training and predicting a multi-branch neural network. See the other examples for cross-validation, hyperparameter tuning, scaling, overfitting, encoding, etc.

Data origin:

  • while we hope to apply this example soon on actual real world data, for now we work on synthetic random data just to assess the validity of the network configuration.

Library and data generation

Activating the local environment specific to the tutorials

using Pkg
 Pkg.activate(joinpath(@__DIR__,"..","..",".."))
  Activating environment at `~/work/BetaML.jl/BetaML.jl/docs/Project.toml`

We first load all the packages we are going to use

using  StableRNGs, BetaML, Plots

Here we are explicit and we use our own fixed RNG:

seed      = 123
 AFIXEDRNG = StableRNG(seed)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)

Here we generate the random data.

N         = 100 # records
 soilD     = 20   # dimensions of the soil database
@@ -68,6 +68,6 @@
  0.0
  0.6066950430614358
  0.12725625856856926
- 0.8345485301545944

Model quality assessment

We can compute the relative mean error between the "true" Y and the Y estimated by the model.

rme    = relative_mean_error(Y,Ŷ)
0.01344636290425473

Of course we know there is no actual relation here between the X and the Y, as both are randomly generated. The result above just tells us that the network has been able to find a path between the X and the Y used for training, but we hope that in a real application this learned path represents a true, general relation between the inputs and the outputs.

Finally we can also plot Y against Ŷ and visualize how the average loss decreased during training:

scatter(Y,Ŷ,xlabel="vol observed",ylabel="vol estimated",label=nothing,title="Est vs. obs volumes")
Example block output
loss_per_epoch = info(m)["loss_per_epoch"]
+ 0.8345485301545944

Model quality assessment

We can compute the relative mean error between the "true" Y and the Y estimated by the model.

rme    = relative_mean_error(Y,Ŷ)
0.01344636290425473

Of course we know there is no actual relation here between the X and the Y, as both are randomly generated. The result above just tells us that the network has been able to find a path between the X and the Y used for training, but we hope that in a real application this learned path represents a true, general relation between the inputs and the outputs.
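
For readers who want to see what such a relative error measures, the sketch below uses one common l-1 based definition: the total absolute error divided by the total absolute value of the observations. The function name rel_mean_error and the toy vectors are hypothetical; BetaML's relative_mean_error offers normalisation options and may compute the value differently.

```julia
# One common definition (an assumption, not necessarily BetaML's default):
# sum of absolute errors divided by the sum of absolute observed values
rel_mean_error(y, ŷ) = sum(abs.(ŷ .- y)) / sum(abs.(y))

y = [1.0, 2.0, 4.0]    # toy "observed" values
ŷ = [1.1, 1.9, 4.2]    # toy "estimated" values
println(rel_mean_error(y, ŷ))
```

With this definition, a value of about 0.013 would mean that the absolute errors add up to roughly 1.3% of the total observed magnitude.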

Finally we can also plot Y against Ŷ and visualize how the average loss decreased during training:

scatter(Y,Ŷ,xlabel="vol observed",ylabel="vol estimated",label=nothing,title="Est vs. obs volumes")
Example block output
loss_per_epoch = info(m)["loss_per_epoch"]
 
-plot(loss_per_epoch, xlabel="epoch", ylabel="loss per epoch", label=nothing, title="Loss per epoch")
Example block output

View this file on Github.


This page was generated using Literate.jl.

+plot(loss_per_epoch, xlabel="epoch", ylabel="loss per epoch", label=nothing, title="Loss per epoch")
Example block output

View this file on Github.


This page was generated using Literate.jl.

diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-57973fed.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-04274d94.svg
similarity index 87%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-57973fed.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-04274d94.svg
index 5ec52ef..bb1aed3 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-57973fed.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-04274d94.svg
@@ -1,46 +1,46 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fd386bbe.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-1840719c.svg
similarity index 72%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fd386bbe.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-1840719c.svg
index fd663f5..5fb19d5 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fd386bbe.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-1840719c.svg
@@ -1,586 +1,586 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fb217b8a.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-3917c772.svg
similarity index 79%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fb217b8a.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-3917c772.svg
index ec1df77..73bc75f 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fb217b8a.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-3917c772.svg
@@ -1,225 +1,225 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-6ffd2a5c.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-3dcbf167.svg
similarity index 79%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-6ffd2a5c.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-3dcbf167.svg
index b6e8e05..68cf2b2 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-6ffd2a5c.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-3dcbf167.svg
@@ -1,229 +1,229 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-40891983.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-445fdc7c.svg
similarity index 89%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-40891983.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-445fdc7c.svg
index b35cc3a..37072cf 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-40891983.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-445fdc7c.svg
@@ -1,46 +1,46 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a876a07a.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-687c029d.svg
similarity index 79%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a876a07a.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-687c029d.svg
index d10e971..5ab95c6 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a876a07a.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-687c029d.svg
@@ -1,227 +1,227 @@
[SVG plot markup omitted]
diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-5aa275e5.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-7ca0b26e.svg
similarity index 73%
rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-5aa275e5.svg
rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-7ca0b26e.svg
index 607a121..5ddeede 100644
--- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-5aa275e5.svg
+++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-7ca0b26e.svg
@@ -1,592 +1,592 @@
[SVG plot markup omitted]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a13e0a0d.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-832944fc.svg similarity index 72% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a13e0a0d.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-832944fc.svg index 5192dfb..b7aca97 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a13e0a0d.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-832944fc.svg @@ -1,586 +1,586 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-660a6d67.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-9d9c07ea.svg similarity index 90% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-660a6d67.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-9d9c07ea.svg index 1edb13d..9561878 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-660a6d67.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-9d9c07ea.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike 
sharing/betaml_tutorial_regression_sharingBikes-5f85076a.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-9f2283a5.svg similarity index 90% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-5f85076a.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-9f2283a5.svg index 5b8315e..a3098b7 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-5f85076a.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-9f2283a5.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-34a5da8e.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a018a4dd.svg similarity index 90% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-34a5da8e.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a018a4dd.svg index 70ee8c1..d2efe0d 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-34a5da8e.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-a018a4dd.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-65342eec.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ad52845b.svg similarity index 89% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-65342eec.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ad52845b.svg index 
cebbd19..33d2b38 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-65342eec.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ad52845b.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-996a1433.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ba588daa.svg similarity index 89% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-996a1433.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ba588daa.svg index 7bbb795..1bd31fa 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-996a1433.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ba588daa.svg @@ -1,41 +1,41 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-14a15b5b.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-c13c3d88.svg similarity index 86% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-14a15b5b.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-c13c3d88.svg index 91863de..e4f0ca8 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-14a15b5b.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-c13c3d88.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike 
sharing/betaml_tutorial_regression_sharingBikes-ce6f6faf.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-c4671433.svg similarity index 79% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ce6f6faf.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-c4671433.svg index 5545d29..8e7560d 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-ce6f6faf.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-c4671433.svg @@ -1,227 +1,227 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-d7735aab.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-e6db0fad.svg similarity index 87% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-d7735aab.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-e6db0fad.svg index 0951a9e..cefc098 100644 --- a/dev/tutorials/Regression - bike 
sharing/betaml_tutorial_regression_sharingBikes-d7735aab.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-e6db0fad.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-abadc39b.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-eb7964a6.svg similarity index 72% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-abadc39b.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-eb7964a6.svg index e20002c..91456df 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-abadc39b.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-eb7964a6.svg @@ -1,586 +1,586 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-d20630c7.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-f11408b5.svg similarity index 86% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-d20630c7.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-f11408b5.svg index 8123b62..eb16f8f 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-d20630c7.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-f11408b5.svg @@ -1,46 +1,46 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - 
- - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-848cd3de.svg b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fff0b19f.svg similarity index 87% rename from dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-848cd3de.svg rename to dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fff0b19f.svg index 12fc390..a7d22da 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-848cd3de.svg +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes-fff0b19f.svg @@ -1,47 +1,47 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html index 6f187e2..6ff0d68 100644 --- a/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html +++ b/dev/tutorials/Regression - bike sharing/betaml_tutorial_regression_sharingBikes.html @@ -3,14 +3,14 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-JYKX8QY5JW', {'page_path': location.pathname + location.search + location.hash}); -

A regression task: the prediction of bike sharing demand

The task is to estimate the influence of several variables (such as the weather, the season, the day of the week, ...) on the demand for shared bicycles, so that the authority in charge of the service can organise it in the best way.

Data origin:

Note that even though we are estimating a time series, we are not using a recurrent neural network here, as we assume the temporal dependence to be negligible (i.e. $Y_t = f(X_t)$ alone).

Library and data loading

Activating the local environment specific to

using Pkg
+

A regression task: the prediction of bike sharing demand

The task is to estimate the influence of several variables (such as the weather, the season, the day of the week, ...) on the demand for shared bicycles, so that the authority in charge of the service can organise it in the best way.

Data origin:

Note that even though we are estimating a time series, we are not using a recurrent neural network here, as we assume the temporal dependence to be negligible (i.e. $Y_t = f(X_t)$ alone).

Library and data loading

Activating the local environment specific to

using Pkg
 Pkg.activate(joinpath(@__DIR__,"..","..",".."))
  Activating environment at `~/work/BetaML.jl/BetaML.jl/docs/Project.toml`

We first load all the packages we are going to use

using  LinearAlgebra, Random, Statistics, StableRNGs, DataFrames, CSV, Plots, Pipe, BenchmarkTools, BetaML
 import Distributions: Uniform, DiscreteUniform
 import DecisionTree, Flux ## For comparisons

Here we are explicit and we use our own fixed RNG:

seed = 123 # The table at the end of this tutorial has been obtained with seeds 123, 1000 and 10000
 AFIXEDRNG = StableRNG(seed)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)
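
A minimal sketch of why a fixed, seeded RNG makes results repeatable, and why a copy of it is handed to each model further on. It assumes only Julia's standard `Random` library and uses `MersenneTwister` as a stand-in for the tutorial's `StableRNG`; the variable names are illustrative.

```julia
using Random

# Stand-in for the tutorial's StableRNG (assumption: any seeded RNG behaves alike here)
rng = MersenneTwister(123)

# Drawing from a copy leaves the original generator untouched...
a = rand(copy(rng), 3)
b = rand(copy(rng), 3)

# ...whereas drawing from the generator itself advances its state
c = rand(rng, 3)
d = rand(rng, 3)

println(a == b)   # both copies start from the same state
println(c == d)   # consecutive draws from the same generator differ
```

Passing a fresh copy to every model therefore means each model starts from the same random state, regardless of how many models were created before it.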

Here we load the data from a CSV file provided by the BetaML package

basedir = joinpath(dirname(pathof(BetaML)),"..","docs","src","tutorials","Regression - bike sharing")
 data    = CSV.File(joinpath(basedir,"data","bike_sharing_day.csv"),delim=',') |> DataFrame
-describe(data)
16×7 DataFrame
Row  variable    mean       min         median    max         nmissing  eltype
     Symbol      Union…     Any         Union…    Any         Int64     DataType
1    instant     366.0      1           366.0     731         0         Int64
2    dteday                 2011-01-01            2012-12-31  0         Date
3    season      2.49658    1           3.0       4           0         Int64
4    yr          0.500684   0           1.0       1           0         Int64
5    mnth        6.51984    1           7.0       12          0         Int64
6    holiday     0.0287278  0           0.0       1           0         Int64
7    weekday     2.99726    0           3.0       6           0         Int64
8    workingday  0.683995   0           1.0       1           0         Int64
9    weathersit  1.39535    1           1.0       3           0         Int64
10   temp        0.495385   0.0591304   0.498333  0.861667    0         Float64
11   atemp       0.474354   0.0790696   0.486733  0.840896    0         Float64
12   hum         0.627894   0.0         0.626667  0.9725      0         Float64
13   windspeed   0.190486   0.0223917   0.180975  0.507463    0         Float64
14   casual      848.176    2           713.0     3410        0         Int64
15   registered  3656.17    20          3662.0    6946        0         Int64
16   cnt         4504.35    22          4548.0    8714        0         Int64

The variable we want to learn to predict is cnt, the total demand of bikes for a given day. Even if it is indeed an integer, we treat it as a continuous variable, so each single prediction will be a scalar $Y \in \mathbb{R}$.

plot(data.cnt, title="Daily bike sharing rents (2Y)", label=nothing)
Example block output

Decision Trees

We start our regression task with Decision Trees.

Training a decision tree consists of choosing the set of questions (in a hierarchical way, so as to form a "decision tree") that "best" split the training dataset, in the sense that each split generates the sub-samples (always 2 in the BetaML implementation) that are as homogeneous as possible with respect to the characteristic we want to predict. Decision trees are one of the few ML algorithms that have an intuitive interpretation and can be used for both regression and classification tasks.

Data preparation

The first step is to prepare the data for the analysis. This already depends on the model we want to employ, as some models "accept" almost everything as input, no matter whether the data is numerical or categorical, or whether it has missing values, while other models are much more demanding and require more work to "clean up" our dataset.

The tutorial starts with Decision Tree and Random Forest models, which definitely belong to the first group, so the only thing we have to do is to select the input variables (the "feature matrix", which we will indicate with "X") and the variable representing our output (the information we want to learn to predict, which we call "y"):

x    = Matrix{Float64}(data[:,[:instant,:season,:yr,:mnth,:holiday,:weekday,:workingday,:weathersit,:temp,:atemp,:hum,:windspeed]])
-y    = data[:,16];

We finally set up a dataframe to store the relative mean errors of the various models we'll use.

results = DataFrame(model=String[],train_rme=Float64[],test_rme=Float64[])
0×3 DataFrame
Row  model   train_rme  test_rme
     String  Float64    Float64

Model selection

We can now split the dataset between the data that we will use for training the algorithm and selecting the hyperparameters (xtrain/ytrain) and the data for testing the quality of the algorithm with the optimal hyperparameters (xtest/ytest). We use the partition function, specifying the share we want to use for each of these two subsets, here 75% and 25% respectively. As our data represents a time series, we want our model to be able to predict future bike sharing demand from past observed rentals, so we do not shuffle the dataset, as partition would do by default.

((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.75,1-0.75],shuffle=false)
+describe(data)

16 rows × 7 columns

     variable    mean       min         median    max         nmissing  eltype
     Symbol      Union…     Any         Union…    Any         Int64     DataType
1    instant     366.0      1           366.0     731         0         Int64
2    dteday                 2011-01-01            2012-12-31  0         Date
3    season      2.49658    1           3.0       4           0         Int64
4    yr          0.500684   0           1.0       1           0         Int64
5    mnth        6.51984    1           7.0       12          0         Int64
6    holiday     0.0287278  0           0.0       1           0         Int64
7    weekday     2.99726    0           3.0       6           0         Int64
8    workingday  0.683995   0           1.0       1           0         Int64
9    weathersit  1.39535    1           1.0       3           0         Int64
10   temp        0.495385   0.0591304   0.498333  0.861667    0         Float64
11   atemp       0.474354   0.0790696   0.486733  0.840896    0         Float64
12   hum         0.627894   0.0         0.626667  0.9725      0         Float64
13   windspeed   0.190486   0.0223917   0.180975  0.507463    0         Float64
14   casual      848.176    2           713.0     3410        0         Int64
15   registered  3656.17    20          3662.0    6946        0         Int64
16   cnt         4504.35    22          4548.0    8714        0         Int64

The variable we want to learn to predict is cnt, the total demand of bikes for a given day. Even if it is indeed an integer, we treat it as a continuous variable, so each single prediction will be a scalar $Y \in \mathbb{R}$.

plot(data.cnt, title="Daily bike sharing rents (2Y)", label=nothing)
Example block output

Decision Trees

We start our regression task with Decision Trees.

Training a decision tree consists of choosing the set of questions (in a hierarchical way, so as to form a "decision tree") that "best" split the training dataset, in the sense that each split generates the sub-samples (always 2 in the BetaML implementation) that are as homogeneous as possible with respect to the characteristic we want to predict. Decision trees are one of the few ML algorithms that have an intuitive interpretation and can be used for both regression and classification tasks.
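
The idea of choosing the question that yields the most homogeneous sub-samples can be sketched for a single numerical feature. This is only an illustration under stated assumptions: the helper names `sse` and `best_threshold` are hypothetical, homogeneity is measured here by the sum of squared deviations from the group mean, and BetaML's actual splitting criterion and search may differ.

```julia
using Statistics

# Sum of squared deviations from the mean: 0 for a perfectly homogeneous group
sse(v) = isempty(v) ? 0.0 : sum(abs2, v .- mean(v))

# Hypothetical helper: try every observed value of `x` as the question "x <= t ?"
# and keep the threshold whose two sub-samples of `y` are jointly most homogeneous
function best_threshold(x::AbstractVector, y::AbstractVector)
    best_t, best_score = first(x), Inf
    for t in unique(x)
        left  = y[x .<= t]
        right = y[x .>  t]
        (isempty(left) || isempty(right)) && continue   # not a real split
        score = sse(left) + sse(right)
        if score < best_score
            best_t, best_score = t, score
        end
    end
    return best_t, best_score
end

# Toy data: the target jumps when the feature passes from 3 to 10
xtoy = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ytoy = [5.0, 6.0, 5.5, 50.0, 52.0, 51.0]
t, score = best_threshold(xtoy, ytoy)
println("split at x <= ", t, " with total squared deviation ", score)
```

A full tree repeats this search recursively on each of the two sub-samples, over all features, until a stopping rule is met.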

Data preparation

The first step is to prepare the data for the analysis. This already depends on the model we want to employ, as some models "accept" almost everything as input, no matter whether the data is numerical or categorical, or whether it has missing values, while other models are much more demanding and require more work to "clean up" our dataset.

The tutorial starts with Decision Tree and Random Forest models, which definitely belong to the first group, so the only thing we have to do is to select the input variables (the "feature matrix", which we will indicate with "X") and the variable representing our output (the information we want to learn to predict, which we call "y"):

x    = Matrix{Float64}(data[:,[:instant,:season,:yr,:mnth,:holiday,:weekday,:workingday,:weathersit,:temp,:atemp,:hum,:windspeed]])
+y    = data[:,16];

We finally set up a dataframe to store the relative mean errors of the various models we'll use.

results = DataFrame(model=String[],train_rme=Float64[],test_rme=Float64[])

0 rows × 3 columns

model   train_rme  test_rme
String  Float64    Float64

Model selection

We can now split the dataset between the data that we will use for training the algorithm and selecting the hyperparameters (xtrain/ytrain) and the data for testing the quality of the algorithm with the optimal hyperparameters (xtest/ytest). We use the partition function, specifying the share we want to use for each of these two subsets, here 75% and 25% respectively. As our data represents a time series, we want our model to be able to predict future bike sharing demand from past observed rentals, so we do not shuffle the dataset, as partition would do by default.

((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.75,1-0.75],shuffle=false)
 (ntrain, ntest) = size.([ytrain,ytest],1)
2-element Vector{Int64}:
  548
  183
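
What an unshuffled split amounts to can be sketched in plain Julia. The helper `chrono_split` is hypothetical and is not BetaML's `partition`; in particular, rounding the training size to the nearest integer is an assumption, chosen because it reproduces the 548/183 sizes reported above for 731 daily records.

```julia
# Hypothetical helper, not BetaML's `partition`: keeps rows in their original order
function chrono_split(x::AbstractMatrix, y::AbstractVector, train_share::Real)
    n      = size(x, 1)
    ntrain = round(Int, n * train_share)   # rounding rule is an assumption
    train  = 1:ntrain                      # earliest records
    test   = (ntrain + 1):n                # most recent records
    return (x[train, :], x[test, :]), (y[train], y[test])
end

xtoy = reshape(collect(1.0:(731 * 2)), 731, 2)   # toy feature matrix with 731 "days"
ytoy = collect(1.0:731)                          # toy target, ordered in time
(xtr, xte), (ytr, yte) = chrono_split(xtoy, ytoy, 0.75)
println(size(xtr, 1), " training rows, ", size(xte, 1), " testing rows")
```

Because no row of the test set precedes any row of the training set, the evaluation mimics forecasting future demand from past observations.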

Then we define the model we want to use, DecisionTreeEstimator in this case, and we create an instance of the model:

m = DecisionTreeEstimator(autotune=true, rng=copy(AFIXEDRNG))
DecisionTreeEstimator - A Decision Tree model (unfitted)

Passing a fixed Random Number Generator (RNG) to the rng parameter guarantees that every time we use the model with the same data (from model creation down to value prediction) we obtain the same results. In particular, BetaML provides FIXEDRNG, an instance of StableRNG that guarantees reproducibility even across different Julia versions. See the section "Dealing with stochasticity" for details. Note the autotune parameter. BetaML has what is perhaps the easiest method for automatically tuning the model hyperparameters (which thus become learned parameters). Indeed, in most cases it is enough to pass the attribute autotune=true to the model constructor, and the hyperparameter search will be performed automatically on the first fit! call. If needed, we can customise hyperparameter tuning by choosing the tuning method with the tunemethod parameter. The single line above is equivalent to:

tuning_method = SuccessiveHalvingSearch(
@@ -60,11 +60,11 @@
  4169.0
  4896.333333333333
  5459.0

We now compute the relative mean error for the training and the test set. The relative_mean_error function is a very flexible error function. Without additional parameters it computes, as the name says, the relative mean error between an estimated and a true vector. However, it can also compute the mean relative error, also known as the "mean absolute percentage error" (MAPE), or use a p-norm higher than 1. The mean relative error emphasises the relativeness of the error, i.e. all observations and dimensions weigh the same, whether large or small. Conversely, in the relative mean error the same relative error on larger observations (or dimensions) weighs more. In this tutorial we use the latter, as our data clearly has some outlier days with very few rentals, and we care more about sparing our customers empty bike racks than about having unrented bikes on the rack. Targeting a low mean relative error would push all our predictions down to try to accommodate the low-demand days (to avoid a large relative error), and that's not what we want.
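
The difference between the two error measures can be made concrete with a small sketch. The two functions below are illustrative definitions under the 1-norm, written for this example; they are not BetaML's `relative_mean_error` implementation, whose options and normalisation details are documented in the package.

```julia
using Statistics

# Relative mean error: total absolute error relative to the total observed magnitude
rel_mean_err(y, ŷ) = sum(abs.(ŷ .- y)) / sum(abs.(y))

# Mean relative error (MAPE): average of the per-observation relative errors
mean_rel_err(y, ŷ) = mean(abs.(ŷ .- y) ./ abs.(y))

# One very quiet day and one busy day; only the quiet day is mispredicted
ytoy = [10.0, 1000.0]
ŷtoy = [20.0, 1000.0]

println(rel_mean_err(ytoy, ŷtoy))   # small: 10 rides off out of 1010 in total
println(mean_rel_err(ytoy, ŷtoy))   # large: the quiet day alone is off by 100%
```

The same 10-ride miss is almost invisible to the first measure and dominates the second, which is why a model tuned on the second would be pulled towards the low-demand days.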

We can then compute the relative mean error for the decision tree

rme_train = relative_mean_error(ytrain,ŷtrain) # 0.1367
-rme_test  = relative_mean_error(ytest,ŷtest) # 0.1547
0.1728808325123301

And we save the relative mean errors in the results dataframe:

push!(results,["DT",rme_train,rme_test]);

We can plot the true labels vs the estimated ones for the two subsets...

scatter(ytrain,ŷtrain,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (DT)")
Example block output
scatter(ytest,ŷtest,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (DT)")
Example block output

Or we can visualise the true vs estimated bike rentals over time. First on the full period (2 years) ...

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
+rme_test  = relative_mean_error(ytest,ŷtest) # 0.1547
0.1728808325123301

And we save the relative mean errors in the results dataframe:

push!(results,["DT",rme_train,rme_test]);

We can plot the true labels vs the estimated ones for the two subsets...

scatter(ytrain,ŷtrain,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (DT)")
Example block output
scatter(ytest,ŷtest,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (DT)")
Example block output

Or we can visualise the true vs estimated bike rentals over time. First on the full period (2 years) ...

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
 ŷtestfull  = vcat(fill(missing,ntrain), ŷtest)
-plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (DT)")
Example block output

..and then focusing on the testing period

stc = ntrain
+plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (DT)")
Example block output

..and then focusing on the testing period

stc = ntrain
 endc = size(x,1)
-plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=["obs" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (DT)")
Example block output

The predictions aren't so bad in this case; however, decision trees are highly unstable, and the output could depend just on the specific initial random seed.

Random Forests

Rather than trying to solve this problem using a single Decision Tree model, let's now try to use a Random Forest model. Random forests average the results of many different decision trees and provide a more "stable" result. Being made of many decision trees, random forests are however more computationally expensive to train.

m_rf      = RandomForestEstimator(autotune=true, oob=true, rng=copy(AFIXEDRNG))
+plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=["obs" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (DT)")
Example block output

The predictions aren't so bad in this case; however, decision trees are highly unstable, and the output could depend just on the specific initial random seed.

Random Forests

Rather than trying to solve this problem using a single Decision Tree model, let's now try to use a Random Forest model. Random forests average the results of many different decision trees and provide a more "stable" result. Being made of many decision trees, random forests are however more computationally expensive to train.
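
The stabilising effect of averaging can be illustrated with a small simulation. This is only a sketch under a strong simplifying assumption: each "tree" is modelled as the true value plus independent Gaussian noise, whereas real trees in a forest are correlated, so the actual gain is smaller. All numbers (true value, noise level, number of trees and trials) are arbitrary.

```julia
using Random, Statistics

rng     = MersenneTwister(123)   # arbitrary seed, only for reproducibility of the sketch
truth   = 100.0                  # the value every "tree" tries to predict
ntrees  = 30
ntrials = 2_000

# One noisy "tree" per trial
single = [truth + 10 * randn(rng) for _ in 1:ntrials]
# Average of ntrees independent noisy "trees" per trial
forest = [mean(truth .+ 10 .* randn(rng, ntrees)) for _ in 1:ntrials]

println("spread of a single tree: ", std(single))
println("spread of the average:   ", std(forest))
```

With independent noise the spread of the average shrinks roughly with the square root of the number of trees, at the price of fitting that many trees.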

m_rf      = RandomForestEstimator(autotune=true, oob=true, rng=copy(AFIXEDRNG))
 ŷtrain    = fit!(m_rf,xtrain,ytrain);
 ŷtest     = predict(m_rf,xtest);
 rme_train = relative_mean_error(ytrain,ŷtrain) # 0.056
@@ -77,11 +77,11 @@
 (e 5 / 7) N data / n candidates / n candidates to retain : 109.60000000000001 	 22 8
 (e 6 / 7) N data / n candidates / n candidates to retain : 164.4 	 8 3
 (e 7 / 7) N data / n candidates / n candidates to retain : 219.20000000000002 	 3 1

While slower than individual decision trees, random forests remain relatively fast. We should also consider that they are by default efficiently parallelised, so their speed increases with the number of available cores (in building this documentation page, GitHub CI servers allow for a single core, so all the benchmarks you see in this tutorial are run with a single core available).

Random forests support the so-called "out-of-bag" error, an estimation of the error that we would have when the model is applied on a testing sample. However in this case the oob reported is much smaller than the testing error we will actually find. This is due to the fact that the division between training/validation and testing in this exercise is not random, but has a temporal basis. It seems that in this example the data in validation/testing follows a different pattern/variance than those in training (in probabilistic terms, the daily observations are not i.i.d.).
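
As a reminder of where the out-of-bag records come from, here is a minimal sketch of the bootstrap sampling behind a single tree. It is not the BetaML implementation; the seed and the number of records are arbitrary.

```julia
using Random

rng  = MersenneTwister(123)   # arbitrary seed, for reproducibility of the sketch only
n    = 20                     # number of training records (illustrative)
boot = rand(rng, 1:n, n)      # bootstrap sample: n draws with replacement
oob  = setdiff(1:n, boot)     # records never drawn are "out-of-bag" for this tree

println("in-bag (unique): ", length(unique(boot)), ", out-of-bag: ", length(oob))
```

Each tree is evaluated on its own out-of-bag records. Since these records are still drawn from the training period, the resulting estimate says little about a later period that follows a different pattern.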

info(m_rf)
-oob_error, rme_test  = info(m_rf)["oob_errors"],relative_mean_error(ytest,ŷtest)

In this case we found an error very similar to the one obtained with a single decision tree. Let's plot the observed data vs the random forest estimates, first as a scatter plot and then along the temporal axis:

scatter(ytrain,ŷtrain,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (RF)")
Example block output
scatter(ytest,ŷtest,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (RF)")
Example block output

Full period plot (2 years):

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
+oob_error, rme_test  = info(m_rf)["oob_errors"],relative_mean_error(ytest,ŷtest)

In this case we found an error very similar to the one obtained with a single decision tree. Let's plot the observed data vs the random forest estimates, first as a scatter plot and then along the temporal axis:

scatter(ytrain,ŷtrain,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (RF)")
Example block output
scatter(ytest,ŷtest,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (RF)")
Example block output

Full period plot (2 years):

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
 ŷtestfull  = vcat(fill(missing,ntrain), ŷtest)
-plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (RF)")
Example block output

Focus on the testing period:

stc = 620
+plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (RF)")
Example block output

Focus on the testing period:

stc = 620
 endc = size(x,1)
-plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtrainfull[stc:endc] ŷtestfull[stc:endc]], label=["obs" "val" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (RF)")
Example block output

Comparison with DecisionTree.jl random forest

We now compare our results with those obtained employing the same model in the DecisionTree package, using the hyperparameters of the optimal BetaML random forest model:

best_rf_hp = hyperparameters(m_rf)
RandomForestE_hp (a BetaMLHyperParametersSet struct)
+plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtrainfull[stc:endc] ŷtestfull[stc:endc]], label=["obs" "val" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (RF)")
Example block output

Comparison with DecisionTree.jl random forest

We now compare our results with those obtained employing the same model in the DecisionTree package, using the hyperparameters of the optimal BetaML random forest model:

best_rf_hp = hyperparameters(m_rf)
RandomForestE_hp (a BetaMLHyperParametersSet struct)
 - n_trees: 30
 - max_depth: nothing
 - min_gain: 0.0
@@ -112,9 +112,9 @@
 (rme_train, rme_test) = relative_mean_error.([ytrain,ytest],[ŷtrain,ŷtest]) # 0.022 and 0.304
 push!(results,["RF (DecisionTree.jl)",rme_train,rme_test]);

While the train error is very small, the error on the test set remains relatively high. The very low error on the training set is a sign that the model overspecialised on the training set, and we would have done better to run a dedicated hyper-parameter tuning function for the DecisionTree.jl model (we did try using the default DecisionTree.jl parameters, but we obtained roughly the same results).

Finally we plot the DecisionTree.jl predictions alongside the observed value:

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
 ŷtestfull  = vcat(fill(missing,ntrain), ŷtest)
-plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (DT.jl RF)")
Example block output

Again, focusing on the testing data:

stc  = ntrain
+plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (DT.jl RF)")
Example block output

Again, focusing on the testing data:

stc  = ntrain
 endc = size(x,1)
-plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=["obs" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (DT.jl RF)")
Example block output

Conclusions of Decision Trees / Random Forests methods

The error obtained employing DecisionTree.jl is significantly larger than the one obtained using a BetaML random forest model, although to be fair we didn't tune the DecisionTree.jl hyper-parameters. Also, the DecisionTree.jl random forest model is much faster. This is partially due to the fact that, internally, DecisionTree.jl models optimise the algorithm by sorting the observations. BetaML trees/forests don't employ this optimisation and hence they can work with true categorical data for which ordering is not defined. Another explanation of this difference in speed is that BetaML Random Forest models accept missing values within the feature matrix. To sum up, BetaML random forests are ideal algorithms when we want to obtain good predictions in the simplest way, even without manually tuning the hyper-parameters, and without spending time cleaning ("munging") the feature matrix, as they accept almost "any kind" of data as it is.

Neural Networks

BetaML provides only deep feedforward neural networks, artificial neural networks where the individual "nodes" are arranged in layers, from the input layer, where each unit holds an input coordinate, through various hidden layer transformations, until the actual output of the model:

Neural Networks

In this layerwise computation, each unit in a particular layer takes input from all the preceding layer units and it has its own parameters that are adjusted to perform the overall computation. The training of the network consists in retrieving the coefficients that minimise a loss function between the output of the model and the known data. In particular, a deep (feedforward) neural network refers to a neural network that contains not only the input and output layers, but also (a variable number of) hidden layers in between.

Neural networks accept only numerical inputs. We hence need to convert all categorical data into numerical values. A common approach is to use the so-called "one-hot encoding", where the categorical values are converted into indicator variables (0/1), one for each possible value. This can be done in BetaML using the OneHotEncoder function:

seasonDummies  = fit!(OneHotEncoder(),data.season)
+plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=["obs" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (DT.jl RF)")
Example block output

Conclusions of Decision Trees / Random Forests methods

The error obtained employing DecisionTree.jl is significantly larger than the one obtained using a BetaML random forest model, although to be fair we didn't tune the DecisionTree.jl hyper-parameters. Also, the DecisionTree.jl random forest model is much faster. This is partially due to the fact that, internally, DecisionTree.jl models optimise the algorithm by sorting the observations. BetaML trees/forests don't employ this optimisation and hence they can work with true categorical data for which ordering is not defined. Another explanation of this difference in speed is that BetaML Random Forest models accept missing values within the feature matrix. To sum up, BetaML random forests are ideal algorithms when we want to obtain good predictions in the simplest way, even without manually tuning the hyper-parameters, and without spending time cleaning ("munging") the feature matrix, as they accept almost "any kind" of data as it is.

Neural Networks

BetaML provides only deep feedforward neural networks, artificial neural networks where the individual "nodes" are arranged in layers, from the input layer, where each unit holds an input coordinate, through various hidden layer transformations, until the actual output of the model:

Neural Networks

In this layerwise computation, each unit in a particular layer takes input from all the preceding layer units and it has its own parameters that are adjusted to perform the overall computation. The training of the network consists in retrieving the coefficients that minimise a loss function between the output of the model and the known data. In particular, a deep (feedforward) neural network refers to a neural network that contains not only the input and output layers, but also (a variable number of) hidden layers in between.
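
The layerwise computation can be sketched in a few lines of plain Julia. This is a hand-written illustration, not BetaML code: the weights, biases and the choice of a relu hidden layer followed by a linear output are all assumptions made for the example.

```julia
relu(z) = max(zero(z), z)

# Hypothetical parameters, chosen by hand for illustration.
W1 = [1.0 -1.0; 0.5 0.5]   # hidden layer: 2 inputs -> 2 units
b1 = [0.0, -1.0]
W2 = [2.0 1.0]             # output layer: 2 units -> 1 output
b2 = [0.5]

x = [3.0, 1.0]             # one input record
h = relu.(W1 * x .+ b1)    # each hidden unit sees all the inputs
ŷ = W2 * h .+ b2           # the output unit sees all the hidden units
println("hidden = ", h, ", output = ", ŷ)
```

Training consists in adjusting W1, b1, W2 and b2 so that the output gets closer to the known targets according to the chosen loss function.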

Neural networks accept only numerical inputs. We hence need to convert all categorical data into numerical values. A common approach is to use the so-called "one-hot encoding", where the categorical values are converted into indicator variables (0/1), one for each possible value. This can be done in BetaML using the OneHotEncoder function:

seasonDummies  = fit!(OneHotEncoder(),data.season)
 weatherDummies = fit!(OneHotEncoder(),data.weathersit)
 wdayDummies    = fit!(OneHotEncoder(),data.weekday .+ 1)
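
For readers unfamiliar with one-hot encoding, the transformation can be sketched by hand as follows. The helper onehot_sketch is hypothetical and only illustrates the idea; it is not how OneHotEncoder is implemented, and the sample vector is made up.

```julia
# Hypothetical helper, not the BetaML implementation.
function onehot_sketch(v)
    levels = sort(unique(v))
    # one row per record, one 0/1 column per distinct level
    return [x == l ? 1 : 0 for x in v, l in levels]
end

season  = [1, 3, 2, 3]            # made-up categorical codes
dummies = onehot_sketch(season)   # 4×3 indicator matrix
println(dummies)
```

Each row contains exactly one 1, in the column of the level observed for that record.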
 
@@ -188,11 +188,11 @@
  3614.919209226012
  3534.3393968342434
  3449.2829592663797
(rme_train, rme_test) = relative_mean_error.([ŷtrain,ŷtest],[ytrain,ytest])
-push!(results,["NN",rme_train,rme_test]);

The error is much lower. Let's plot our predictions:

Again, we can start by plotting the estimated vs the observed value:

scatter(ytrain,ŷtrain,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (NN)")
Example block output
scatter(ytest,ŷtest,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (NN)")
Example block output

We now plot across the time dimension, first plotting the whole period (2 years):

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
+push!(results,["NN",rme_train,rme_test]);

The error is much lower. Let's plot our predictions:

Again, we can start by plotting the estimated vs the observed value:

scatter(ytrain,ŷtrain,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (NN)")
Example block output
scatter(ytest,ŷtest,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (NN)")
Example block output

We now plot across the time dimension, first plotting the whole period (2 years):

ŷtrainfull = vcat(ŷtrain,fill(missing,ntest))
 ŷtestfull  = vcat(fill(missing,ntrain), ŷtest)
-plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period  (NN)")
Example block output

...and then focusing on the testing data

stc  = 620
+plot(data[:,:dteday],[data[:,:cnt] ŷtrainfull ŷtestfull], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period  (NN)")
Example block output

...and then focusing on the testing data

stc  = 620
 endc = size(x,1)
-plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=["obs" "val" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (NN)")
Example block output

Comparison with Flux.jl

We now apply the same Neural Network model using the Flux framework, a dedicated neural network library, reusing the optimal parameters that we learned from tuning NeuralNetworkEstimator:

hp_opt         = hyperparameters(nnm)
+plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfull[stc:endc]], label=["obs" "val" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (NN)")
Example block output

Comparison with Flux.jl

We now apply the same Neural Network model using the Flux framework, a dedicated neural network library, reusing the optimal parameters that we learned from tuning NeuralNetworkEstimator:

hp_opt         = hyperparameters(nnm)
 opt_size       = size(hp_opt.layers[1])[2][1]
 opt_batch_size = hp_opt.batch_size
 opt_epochs     = hp_opt.epochs
93

We fix the default random number generator so that the Flux example gives a reproducible output

Random.seed!(seed)
MersenneTwister(123)

We define the Flux neural network model and load it with data...

l1         = Flux.Dense(D,opt_size,Flux.relu)
@@ -228,17 +228,17 @@
 │   The input will be converted, but any earlier layers may be very slow.
 │   layer = Dense(23 => 9, relu)  # 216 parameters
 │   summary(x) = "23×548 adjoint(::Matrix{Float64}) with eltype Float64"
-└ @ Flux ~/.julia/packages/Flux/n3cOc/src/layers/stateless.jl:60
+└ @ Flux ~/.julia/packages/Flux/uCLgc/src/layers/stateless.jl:50
 ┌ Warning: Layer with Float32 parameters got Float64 input.
 │   The input will be converted, but any earlier layers may be very slow.
 │   layer = Dense(23 => 9, relu)  # 216 parameters
 │   summary(x) = "23×183 adjoint(::Matrix{Float64}) with eltype Float64"
-└ @ Flux ~/.julia/packages/Flux/n3cOc/src/layers/stateless.jl:60

..and we compute the mean relative errors..

(rme_train, rme_test) = relative_mean_error.([ŷtrainf,ŷtestf],[ytrain,ytest])
-push!(results,["NN (Flux.jl)",rme_train,rme_test]);

.. finding an error not significantly different than the one obtained from BetaML.Nn.

Plots:

scatter(ytrain,ŷtrainf,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (Flux.NN)")
Example block output
scatter(ytest,ŷtestf,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (Flux.NN)")
Example block output
ŷtrainfullf = vcat(ŷtrainf,fill(missing,ntest))
+└ @ Flux ~/.julia/packages/Flux/uCLgc/src/layers/stateless.jl:50

..and we compute the mean relative errors..

(rme_train, rme_test) = relative_mean_error.([ŷtrainf,ŷtestf],[ytrain,ytest])
+push!(results,["NN (Flux.jl)",rme_train,rme_test]);

.. finding an error not significantly different than the one obtained from BetaML.Nn.

Plots:

scatter(ytrain,ŷtrainf,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in training period (Flux.NN)")
Example block output
scatter(ytest,ŷtestf,xlabel="daily rides",ylabel="est. daily rides",label=nothing,title="Est vs. obs in testing period (Flux.NN)")
Example block output
ŷtrainfullf = vcat(ŷtrainf,fill(missing,ntest))
 ŷtestfullf  = vcat(fill(missing,ntrain), ŷtestf)
-plot(data[:,:dteday],[data[:,:cnt] ŷtrainfullf ŷtestfullf], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (Flux.NN)")
Example block output
stc = 620
+plot(data[:,:dteday],[data[:,:cnt] ŷtrainfullf ŷtestfullf], label=["obs" "train" "test"], legend=:topleft, ylabel="daily rides", title="Daily bike sharing demand observed/estimated across the\n whole 2-years period (Flux.NN)")
Example block output
stc = 620
 endc = size(x,1)
-plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfullf[stc:endc]], label=["obs" "val" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (Flux.NN)")
Example block output

Conclusions of Neural Network models

If we strive for the most accurate predictions, deep neural networks are usually the best choice. However, they are computationally expensive, so with limited resources we may get better results by fine-tuning and running many repetitions of "simpler" decision trees or even random forest models than with a large neural network with insufficient hyper-parameter tuning. Also, we should consider that decision trees/random forests are much simpler to work with.

That said, specialised neural network libraries, like Flux, allow the use of GPUs and specialised hardware, letting neural networks scale to very large datasets.

Still, for small and medium datasets, BetaML provides simpler yet customisable solutions that are accurate and fast.

GMM-based regressors

BetaML 0.8 introduces new regression algorithms based on Gaussian Mixture Models. Specifically, there are two variants available, GaussianMixtureRegressor2 and GaussianMixtureRegressor, and this example uses GaussianMixtureRegressor. As for neural networks, they work on numerical data only, so we reuse the datasets we prepared for the neural networks.

As usual we first define the model.

m = GaussianMixtureRegressor(rng=copy(AFIXEDRNG),verbosity=NONE)
GaussianMixtureRegressor - A regressor based on Generative Mixture Model (unfitted)
Info

We disabled autotune here, as this code is run by GitHub continuous integration servers on each code update, and GitHub servers seem to have some strange problem with it, taking almost 4 hours instead of a few seconds on my machine.

We then fit the model to the training data..

ŷtrainGMM_unscaled = fit!(m,xtrain_scaled,ytrain_scaled)
548×1 Matrix{Float64}:
+plot(data[stc:endc,:dteday],[data[stc:endc,:cnt] ŷtestfullf[stc:endc]], label=["obs" "val" "test"], legend=:bottomleft, ylabel="Daily rides", title="Focus on the testing period (Flux.NN)")
Example block output

Conclusions of Neural Network models

If we strive for the most accurate predictions, deep neural networks are usually the best choice. However, they are computationally expensive, so with limited resources we may get better results by fine-tuning and running many repetitions of "simpler" decision trees or even random forest models than with a large neural network with insufficient hyper-parameter tuning. Also, we should consider that decision trees/random forests are much simpler to work with.

That said, specialised neural network libraries, like Flux, allow the use of GPUs and specialised hardware, letting neural networks scale to very large datasets.

Still, for small and medium datasets, BetaML provides simpler yet customisable solutions that are accurate and fast.

GMM-based regressors

BetaML 0.8 introduces new regression algorithms based on Gaussian Mixture Models. Specifically, there are two variants available, GaussianMixtureRegressor2 and GaussianMixtureRegressor, and this example uses GaussianMixtureRegressor. As for neural networks, they work on numerical data only, so we reuse the datasets we prepared for the neural networks.

As usual we first define the model.

m = GaussianMixtureRegressor(rng=copy(AFIXEDRNG),verbosity=NONE)
GaussianMixtureRegressor - A regressor based on Generative Mixture Model (unfitted)
Info

We disabled autotune here, as this code is run by GitHub continuous integration servers on each code update, and GitHub servers seem to have some strange problem with it, taking almost 4 hours instead of a few seconds on my machine.

We then fit the model to the training data..

ŷtrainGMM_unscaled = fit!(m,xtrain_scaled,ytrain_scaled)
548×1 Matrix{Float64}:
  2.05456959514988
  2.0545695951498795
  2.0545695951498795
@@ -271,4 +271,4 @@
    3 │ RF (DecisionTree.jl)  0.0987308  0.286927
    4 │ NN                    0.149393   0.166221
    5 │ NN (Flux.jl)          0.08599    0.172753
-   6 │ GMM                   0.216144   0.26681

You may ask how stable these results are, and how much they depend on the specific RNG seed. We re-evaluated the whole script a couple of times, changing the random seed (to 1000 and 10000):

Model                       Train rme1  Test rme1  Train rme2  Test rme2  Train rme3  Test rme3
DT                          0.1366960   0.154720   0.0233044   0.249329   0.0621571   0.161657
RF                          0.0421267   0.180186   0.0535776   0.136920   0.0386144   0.141606
RF (DecisionTree.jl)        0.0230439   0.235823   0.0801040   0.243822   0.0168764   0.219011
NN                          0.1604000   0.169952   0.1091330   0.121496   0.1481440   0.150458
NN (Flux.jl)                0.0931161   0.166228   0.0920796   0.167047   0.0907810   0.122469
GaussianMixtureRegressor*   0.1432800   0.293891   0.1380340   0.295470   0.1477570   0.284567
  * GMM is a deterministic model, the variations are due to the different random sampling in choosing the best hyperparameters

Neural networks can be more precise than random forest models, but are more computationally expensive (and tricky to set up). When we compare BetaML with the algorithm-specific leading packages, we find similar results in terms of accuracy, but often the leading packages are better optimised and run more efficiently (but sometimes at the cost of being less versatile). GMM-based regressors are computationally very cheap and a good compromise if accuracy can be traded off for performance.

View this file on Github.


This page was generated using Literate.jl.

+   6 │ GMM                   0.216144   0.26681

You may ask how stable these results are, and how much they depend on the specific RNG seed. We re-evaluated the whole script a couple of times, changing the random seed (to 1000 and 10000):

Model                       Train rme1  Test rme1  Train rme2  Test rme2  Train rme3  Test rme3
DT                          0.1366960   0.154720   0.0233044   0.249329   0.0621571   0.161657
RF                          0.0421267   0.180186   0.0535776   0.136920   0.0386144   0.141606
RF (DecisionTree.jl)        0.0230439   0.235823   0.0801040   0.243822   0.0168764   0.219011
NN                          0.1604000   0.169952   0.1091330   0.121496   0.1481440   0.150458
NN (Flux.jl)                0.0931161   0.166228   0.0920796   0.167047   0.0907810   0.122469
GaussianMixtureRegressor*   0.1432800   0.293891   0.1380340   0.295470   0.1477570   0.284567
  * GMM is a deterministic model, the variations are due to the different random sampling in choosing the best hyperparameters

Neural networks can be more precise than random forest models, but are more computationally expensive (and tricky to set up). When we compare BetaML with the algorithm-specific leading packages, we find similar results in terms of accuracy, but often the leading packages are better optimised and run more efficiently (but sometimes at the cost of being less versatile). GMM-based regressors are computationally very cheap and a good compromise if accuracy can be traded off for performance.

View this file on Github.


This page was generated using Literate.jl.