Large rewrite of ModelXHyperParametersSet and ModelXLearnableParameters to shorter versions
sylvaticus committed Jan 25, 2024
1 parent af509b3 commit a96b5df
Showing 23 changed files with 214 additions and 208 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -47,7 +47,7 @@ jobs:
name: Documentation
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v1
with:
version: '1.6'
@@ -71,4 +71,4 @@ jobs:
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
GKSwstype: "100" # https://discourse.julialang.org/t/generation-of-documentation-fails-qt-qpa-xcb-could-not-connect-to-display/60988
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
- uses: codecov/codecov-action@v3
4 changes: 3 additions & 1 deletion README.md
@@ -165,7 +165,7 @@ ML toolkits/pipelines | [ScikitLearn.jl](https://github.com/cstjean/ScikitLearn.
Neural Networks | [Flux.jl](https://fluxml.ai/), [Knet](https://github.com/denizyuret/Knet.jl)
Decision Trees | [DecisionTree.jl](https://github.com/bensadeghi/DecisionTree.jl)
Clustering | [Clustering.jl](https://github.com/JuliaStats/Clustering.jl), [GaussianMixtures.jl](https://github.com/davidavdav/GaussianMixtures.jl)
Missing imputation | [Impute.jl](https://github.com/invenia/Impute.jl)
Missing imputation | [Impute.jl](https://github.com/invenia/Impute.jl), [Mice.jl](https://github.com/tom-metherell/Mice.jl)



@@ -181,6 +181,8 @@ Missing imputation | [Impute.jl](https://github.com/invenia/Impute.jl)

- Add RNN support and improve convolutional layers speed
- Reinforcement learning (Markov decision processes)
- Standardize data sampling in training
- Convert to GPU

## Contribute

10 changes: 5 additions & 5 deletions docs/src/Api_v2_developer.md
@@ -12,9 +12,9 @@ The model struct is composed of the following elements:

```
mutable struct DecisionTreeEstimator <: BetaMLSupervisedModel
hpar::DTHyperParametersSet # Hyper-pharameters
opt::BetaMLDefaultOptionsSet # Option sets, default or a specific one for the model
par::DTLearnableParameters # Model learnable parameters (needed for predictions)
hpar::DecisionTreeE_hp # Hyper-pharameters
opt::BML_options # Option sets, default or a specific one for the model
par::DT_lp # Model learnable parameters (needed for predictions)
cres::T # Cached results
trained::Bool # Trained flag
info # Complementary information, but not needed to make predictions
@@ -26,7 +26,7 @@ Each specific model hyperparameter set and learnable parameter set are childs of
While hyperparameters are elements that control the learning process, i.e. would influence the model training and prediction, the options have a more general meaning and do not directly affect the training (they can do indirectly, like the rng). The default option set is implemented as:

```
Base.@kwdef mutable struct BetaMLDefaultOptionsSet
Base.@kwdef mutable struct BML_options
"Cache the results of the fitting stage, as to allow predict(mod) [default: `true`]. Set it to `false` to save memory for large data."
cache::Bool = true
"An optional title and/or description for this model"
@@ -44,7 +44,7 @@ Note that the user doesn't generally need to make a difference between an hyperp

```
function KMedoidsClusterer(;kwargs...)
m = KMedoidsClusterer(KMeansMedoidsHyperParametersSet(),BetaMLDefaultOptionsSet(),KMeansMedoidsLearnableParameters(),nothing,false,Dict{Symbol,Any}())
m = KMedoidsClusterer(KMeansMedoidsHyperParametersSet(),BML_options(),KMeansMedoids_lp(),nothing,false,Dict{Symbol,Any}())
thisobjfields = fieldnames(nonmissingtype(typeof(m)))
for (kw,kwv) in kwargs
found = false
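The keyword constructor in the hunk above is cut off after `found = false`, so the routing logic is not visible here. As a rough, self-contained sketch of the general pattern (each keyword is sent to whichever sub-struct owns a field of that name, and unknown keywords raise an error), something like the following could be used. `Demo_hp`, `Demo_options`, `DemoModel` and the `max_iter` field are hypothetical stand-ins for illustration, not BetaML types, and the loop body is an assumption rather than the package's actual code.

```julia
# Hypothetical stand-ins for a hyperparameter set and an option set (not BetaML types).
Base.@kwdef mutable struct Demo_hp
    n_classes::Int64 = 3
    max_iter::Int64 = 100
end

Base.@kwdef mutable struct Demo_options
    cache::Bool = true
    descr::String = ""
end

mutable struct DemoModel
    hpar::Demo_hp
    opt::Demo_options
    fitted::Bool
end

# Keyword constructor: route each keyword to the sub-struct that has a field with that name.
function DemoModel(; kwargs...)
    m = DemoModel(Demo_hp(), Demo_options(), false)
    for (kw, kwv) in kwargs
        found = false
        for part in (m.hpar, m.opt)
            if kw in fieldnames(typeof(part))
                setproperty!(part, kw, kwv)
                found = true
                break
            end
        end
        # Same error message as the one visible in the diff.
        found || error("Keyword \"$kw\" is not part of this model.")
    end
    return m
end

# The caller does not need to know which set a keyword belongs to.
m = DemoModel(n_classes=2, cache=false)
```

With this layout a user can mix hyperparameters and options in a single call, which appears to be the intent of the note in the developer documentation.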
3 changes: 3 additions & 0 deletions docs/src/index.md
@@ -11,6 +11,9 @@ Most models have an interface for the [`MLJ`](https://github.com/alan-turing-ins

Aside Julia, BetaML can be accessed in R or Python using respectively [JuliaCall](https://github.com/Non-Contradiction/JuliaCall) and [PyJulia](https://github.com/JuliaPy/pyjulia). See [the tutorial](@ref using_betaml_from_other_languages) for details.

!!! Warning
Version 0.11 brings homogenization in the models' names and put some order on other stuff, but at the cost of severe breaking changes. Follow the updated documentation.

## Installation

The BetaML package is included in the standard Julia register, install it with:
8 changes: 4 additions & 4 deletions src/Api.jl
@@ -25,7 +25,7 @@ import JLD2
export Verbosity, NONE, LOW, STD, HIGH, FULL,
FIXEDSEED, FIXEDRNG,
BetaMLModel, BetaMLSupervisedModel, BetaMLUnsupervisedModel,
BetaMLOptionsSet, BetaMLDefaultOptionsSet, BetaMLHyperParametersSet, BetaMLLearnableParametersSet,
BetaMLOptionsSet, BML_options, BetaMLHyperParametersSet, BetaMLLearnableParametersSet,
AutoTuneMethod,
predict, inverse_predict, fit!, fit_ex, info, reset!, reset_ex, parameters,hyperparameters, options, sethp!,
model_save, model_load
@@ -84,14 +84,14 @@ A struct defining the options used by default by the algorithms that do not over
$(TYPEDFIELDS)
# Notes:
- even if a model doesn't override `BetaMLDefaultOptionsSet`, may not use all its options, for example deterministic models would not make use of the `rng` parameter. Passing such parameters in these cases would simply have no influence.
- even if a model doesn't override `BML_options`, may not use all its options, for example deterministic models would not make use of the `rng` parameter. Passing such parameters in these cases would simply have no influence.
# Example:
```
julia> options = BetaMLDefaultOptionsSet(cache=false,descr="My model")
julia> options = BML_options(cache=false,descr="My model")
```
"""
Base.@kwdef mutable struct BetaMLDefaultOptionsSet <: BetaMLOptionsSet
Base.@kwdef mutable struct BML_options <: BetaMLOptionsSet
"Cache the results of the fitting stage, as to allow predict(mod) [default: `true`]. Set it to `false` to save memory for large data."
cache::Bool = true
"An optional title and/or description for this model"
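Because `BetaMLDefaultOptionsSet` is replaced by `BML_options` in the export list, downstream code that still uses the old name would stop working. One possible bridge, shown here only as a sketch, is a `const` alias defined on the user side. The struct below is a trimmed stand-in so the example runs without BetaML installed; the `String` type of `descr` is an assumption, and the alias itself is not part of this commit.

```julia
# Trimmed stand-in for the renamed options struct; real code would get it from `using BetaML`.
# The type of `descr` is assumed here.
Base.@kwdef mutable struct BML_options
    cache::Bool = true
    descr::String = ""
end

# Hypothetical user-side alias: keeps the pre-rename name usable (not provided by the commit).
const BetaMLDefaultOptionsSet = BML_options

# Old-style call, taken from the docstring example, now resolves to the new type.
opts = BetaMLDefaultOptionsSet(cache=false, descr="My model")
```

The alias binds the old name to the same type object, so `opts isa BML_options` holds and method dispatch is unaffected.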
2 changes: 1 addition & 1 deletion src/Clustering/Clustering.jl
@@ -29,7 +29,7 @@ import Base.print
import Base.show

# export kmeans, kmedoids
export KMeansHyperParametersSet, KMedoidsHyperParametersSet, KMeansClusterer, KMedoidsClusterer
export KMeansC_hp, KMedoidsC_hp, KMeansClusterer, KMedoidsClusterer

include("Clustering_hard.jl") # K-means and k-medoids

34 changes: 17 additions & 17 deletions src/Clustering/Clustering_hard.jl
@@ -247,7 +247,7 @@ Hyperparameters for the [`KMeansClusterer`](@ref) model
# Parameters:
$(TYPEDFIELDS)
"""
Base.@kwdef mutable struct KMeansHyperParametersSet <: BetaMLHyperParametersSet
Base.@kwdef mutable struct KMeansC_hp <: BetaMLHyperParametersSet
"Number of classes to discriminate the data [def: 3]"
n_classes::Int64 = 3
"Function to employ as distance. Default to the Euclidean distance. Can be one of the predefined distances (`l1_distance`, `l2_distance`, `l2squared_distance`), `cosine_distance`), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Attention that the `KMeansClusterer` algorithm is not guaranteed to converge with other distances than the Euclidean one."
@@ -273,7 +273,7 @@ Hyperparameters for the and [`KMedoidsClusterer`](@ref) models
# Parameters:
$(TYPEDFIELDS)
"""
Base.@kwdef mutable struct KMedoidsHyperParametersSet <: BetaMLHyperParametersSet
Base.@kwdef mutable struct KMedoidsC_hp <: BetaMLHyperParametersSet
"Number of classes to discriminate the data [def: 3]"
n_classes::Int64 = 3
"Function to employ as distance. Default to the Euclidean distance. Can be one of the predefined distances (`l1_distance`, `l2_distance`, `l2squared_distance`), `cosine_distance`), any user defined function accepting two vectors and returning a scalar or an anonymous function with the same characteristics. Attention that the `KMeansClusterer` algorithm is not guaranteed to converge with other distances than the Euclidean one."
@@ -291,7 +291,7 @@ Base.@kwdef mutable struct KMedoidsHyperParametersSet <: BetaMLHyperParametersSe
initial_representatives::Union{Nothing,Matrix{Float64}} = nothing
end

Base.@kwdef mutable struct KMeansMedoidsLearnableParameters <: BetaMLLearnableParametersSet
Base.@kwdef mutable struct KMeansMedoids_lp <: BetaMLLearnableParametersSet
representatives::Union{Nothing,Matrix{Float64}} = nothing
end

@@ -302,7 +302,7 @@ The classical "K-Means" clustering algorithm (unsupervised).
Learn to partition the data and assign each record to one of the `n_classes` classes according to a distance metric (default Euclidean).
For the parameters see [`?KMeansHyperParametersSet`](@ref KMeansHyperParametersSet) and [`?BetaMLDefaultOptionsSet`](@ref BetaMLDefaultOptionsSet).
For the parameters see [`?KMeansC_hp`](@ref KMeansC_hp) and [`?BML_options`](@ref BML_options).
# Notes:
- data must be numerical
@@ -343,15 +343,15 @@ Dict{String, Any} with 2 entries:
"xndims" => 2
julia> parameters(mod)
BetaML.Clustering.KMeansMedoidsLearnableParameters (a BetaMLLearnableParametersSet struct)
BetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)
- representatives: [1.13366 9.7209; 11.0 0.9]
```
"""
mutable struct KMeansClusterer <: BetaMLUnsupervisedModel
hpar::KMeansHyperParametersSet
opt::BetaMLDefaultOptionsSet
par::Union{Nothing,KMeansMedoidsLearnableParameters}
hpar::KMeansC_hp
opt::BML_options
par::Union{Nothing,KMeansMedoids_lp}
cres::Union{Nothing,Vector{Int64}}
fitted::Bool
info::Dict{String,Any}
@@ -364,7 +364,7 @@ The classical "K-Medoids" clustering algorithm (unsupervised).
Similar to K-Means, learn to partition the data and assign each record to one of the `n_classes` classes according to a distance metric, but the "representatives" (the cetroids) are guaranteed to be one of the training points. The algorithm work with any arbitrary distance measure (default Euclidean).
For the parameters see [`?KMedoidsHyperParametersSet`](@ref KMedoidsHyperParametersSet) and [`?BetaMLDefaultOptionsSet`](@ref BetaMLDefaultOptionsSet).
For the parameters see [`?KMedoidsC_hp`](@ref KMedoidsC_hp) and [`?BML_options`](@ref BML_options).
# Notes:
- data must be numerical
@@ -405,22 +405,22 @@ Dict{String, Any} with 2 entries:
"xndims" => 2
julia> parameters(mod)
BetaML.Clustering.KMeansMedoidsLearnableParameters (a BetaMLLearnableParametersSet struct)
BetaML.Clustering.KMeansMedoids_lp (a BetaMLLearnableParametersSet struct)
- representatives: [0.9 9.8; 11.0 0.9]
```
"""
mutable struct KMedoidsClusterer <: BetaMLUnsupervisedModel
hpar::KMedoidsHyperParametersSet
opt::BetaMLDefaultOptionsSet
par::Union{Nothing,KMeansMedoidsLearnableParameters}
hpar::KMedoidsC_hp
opt::BML_options
par::Union{Nothing,KMeansMedoids_lp}
cres::Union{Nothing,Vector{Int64}}
fitted::Bool
info::Dict{String,Any}
end


function KMeansClusterer(;kwargs...)
m = KMeansClusterer(KMeansHyperParametersSet(),BetaMLDefaultOptionsSet(),KMeansMedoidsLearnableParameters(),nothing,false,Dict{Symbol,Any}())
m = KMeansClusterer(KMeansC_hp(),BML_options(),KMeansMedoids_lp(),nothing,false,Dict{Symbol,Any}())
thisobjfields = fieldnames(nonmissingtype(typeof(m)))
for (kw,kwv) in kwargs
found = false
@@ -437,7 +437,7 @@ function KMeansClusterer(;kwargs...)
end

function KMedoidsClusterer(;kwargs...)
m = KMedoidsClusterer(KMedoidsHyperParametersSet(),BetaMLDefaultOptionsSet(),KMeansMedoidsLearnableParameters(),nothing,false,Dict{Symbol,Any}())
m = KMedoidsClusterer(KMedoidsC_hp(),BML_options(),KMeansMedoids_lp(),nothing,false,Dict{Symbol,Any}())
thisobjfields = fieldnames(nonmissingtype(typeof(m)))
for (kw,kwv) in kwargs
found = false
@@ -479,7 +479,7 @@ function fit!(m::KMeansClusterer,x)
else
(clIdx,Z) = kmeans(x,K,dist=dist,initialisation_strategy=initialisation_strategy,initial_representatives=initial_representatives,verbosity=verbosity,rng=rng)
end
m.par = KMeansMedoidsLearnableParameters(representatives=Z)
m.par = KMeansMedoids_lp(representatives=Z)
m.cres = cache ? clIdx : nothing
m.info["fitted_records"] = get(m.info,"fitted_records",0) + size(x,1)
m.info["xndims"] = size(x,2)
@@ -514,7 +514,7 @@ function fit!(m::KMedoidsClusterer,x)
else
(clIdx,Z) = kmedoids(x,K,dist=dist,initialisation_strategy=initialisation_strategy,initial_representatives=initial_representatives,verbosity=verbosity,rng=rng)
end
m.par = KMeansMedoidsLearnableParameters(representatives=Z)
m.par = KMeansMedoids_lp(representatives=Z)
m.cres = cache ? clIdx : nothing
m.info["fitted_records"] = get(m.info,"fitted_records",0) + size(x,1)
m.info["xndims"] = size(x,2)
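The two `fit!` hunks above share the same bookkeeping: wrap the fitted representatives in the learnable-parameters struct, keep the class assignments only when caching is enabled, and accumulate the record count in `info`. The sketch below isolates that bookkeeping under stated assumptions. `Demo_lp`, `DemoClusterer` and `demo_fit!` are hypothetical names, the "training" step is a placeholder (column means, every record assigned to class 1) rather than K-Means or K-Medoids, and setting `fitted = true` is assumed since that line is not visible in the diff.

```julia
# Hypothetical learnable-parameters struct, mirroring the shape of KMeansMedoids_lp.
Base.@kwdef mutable struct Demo_lp
    representatives::Union{Nothing,Matrix{Float64}} = nothing
end

# Hypothetical model; `cache` is stored directly instead of in an options struct.
mutable struct DemoClusterer
    cache::Bool
    par::Union{Nothing,Demo_lp}
    cres::Union{Nothing,Vector{Int64}}
    fitted::Bool
    info::Dict{String,Any}
end

DemoClusterer(; cache=true) = DemoClusterer(cache, nothing, nothing, false, Dict{String,Any}())

function demo_fit!(m::DemoClusterer, x::Matrix{Float64})
    # Placeholder "training": a single representative (the column means), all records in class 1.
    Z = sum(x, dims=1) ./ size(x, 1)
    clIdx = ones(Int64, size(x, 1))

    # Bookkeeping as in the diff: parameters, optional cache, running record count.
    m.par = Demo_lp(representatives=Z)
    m.cres = m.cache ? clIdx : nothing
    m.info["fitted_records"] = get(m.info, "fitted_records", 0) + size(x, 1)
    m.info["xndims"] = size(x, 2)
    m.fitted = true   # assumed; not shown in the hunk
    return m.cres
end

x = [1.0 2.0; 3.0 4.0]
m = DemoClusterer(cache=false)
demo_fit!(m, x)
demo_fit!(m, x)   # a second call adds to the record count instead of resetting it
```

With `cache=false` the assignments are dropped after fitting, which matches the memory-saving purpose described in the `cache` option's docstring.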
2 changes: 1 addition & 1 deletion src/GMM/GMM.jl
@@ -43,7 +43,7 @@ import Base.show
export AbstractMixture,
GaussianMixtureClusterer,
GaussianMixtureRegressor2, GaussianMixtureRegressor,
GMMHyperParametersSet
GaussianMixture_hp

abstract type AbstractMixture end

22 changes: 11 additions & 11 deletions src/GMM/GMM_clustering.jl
@@ -173,7 +173,7 @@ Hyperparameters for GMM clusters and other GMM-based algorithms
$(FIELDS)
"""
mutable struct GMMHyperParametersSet <: BetaMLHyperParametersSet
mutable struct GaussianMixture_hp <: BetaMLHyperParametersSet
"Number of mixtures (latent classes) to consider [def: 3]"
n_classes::Int64
"Initial probabilities of the categorical distribution (n_classes x 1) [default: `[]`]"
@@ -208,7 +208,7 @@ mutable struct GMMHyperParametersSet <: BetaMLHyperParametersSet
"""
tunemethod::AutoTuneMethod

function GMMHyperParametersSet(;
function GaussianMixture_hp(;
n_classes::Union{Nothing,Int64} = nothing, # def: 3
initial_probmixtures::Vector{Float64} = Float64[],
mixtures::Union{Type,Vector{<: AbstractMixture},Nothing} = nothing, # DiagonalGaussian
@@ -246,7 +246,7 @@ mutable struct GMMHyperParametersSet <: BetaMLHyperParametersSet
end


Base.@kwdef mutable struct GMMClusterLearnableParameters <: BetaMLLearnableParametersSet
Base.@kwdef mutable struct GMMCluster_lp <: BetaMLLearnableParametersSet
mixtures::Union{Type,Vector{<: AbstractMixture}} = AbstractMixture[] # attention that this is set up at model construction, as it has the same name as the hyperparameter
initial_probmixtures::Vector{Float64} = []
#probRecords::Union{Nothing,Matrix{Float64}} = nothing
@@ -257,7 +257,7 @@ $(TYPEDEF)
Assign class probabilities to records (i.e. _soft_ clustering) assuming a probabilistic generative model of observed data using mixtures.
For the parameters see [`?GMMHyperParametersSet`](@ref GMMHyperParametersSet) and [`?BetaMLDefaultOptionsSet`](@ref BetaMLDefaultOptionsSet).
For the parameters see [`?GaussianMixture_hp`](@ref GaussianMixture_hp) and [`?BML_options`](@ref BML_options).
# Notes:
- Data must be numerical
@@ -298,22 +298,22 @@ Dict{String, Any} with 6 entries:
"BIC" => -2.21571
julia> parameters(mod)
BetaML.GMM.GMMClusterLearnableParameters (a BetaMLLearnableParametersSet struct)
BetaML.GMM.GMMCluster_lp (a BetaMLLearnableParametersSet struct)
- mixtures: DiagonalGaussian{Float64}[DiagonalGaussian{Float64}([0.9333333333333332, 9.9], [0.05, 0.05]), DiagonalGaussian{Float64}([11.05, 0.9500000000000001], [0.05, 0.05])]
- initial_probmixtures: [0.0, 1.0]
```
"""
mutable struct GaussianMixtureClusterer <: BetaMLUnsupervisedModel
hpar::GMMHyperParametersSet
opt::BetaMLDefaultOptionsSet
par::Union{Nothing,GMMClusterLearnableParameters}
hpar::GaussianMixture_hp
opt::BML_options
par::Union{Nothing,GMMCluster_lp}
cres::Union{Nothing,Matrix{Float64}}
fitted::Bool
info::Dict{String,Any}
end

function GaussianMixtureClusterer(;kwargs...)
m = GaussianMixtureClusterer(GMMHyperParametersSet(),BetaMLDefaultOptionsSet(),GMMClusterLearnableParameters(),nothing,false,Dict{Symbol,Any}())
m = GaussianMixtureClusterer(GaussianMixture_hp(),BML_options(),GMMCluster_lp(),nothing,false,Dict{Symbol,Any}())
thisobjfields = fieldnames(nonmissingtype(typeof(m)))
for (kw,kwv) in kwargs
found = false
@@ -327,7 +327,7 @@ function GaussianMixtureClusterer(;kwargs...)
found || error("Keyword \"$kw\" is not part of this model.")
end

# Special correction for GMMHyperParametersSet
# Special correction for GaussianMixture_hp
kwkeys = keys(kwargs) #in(2,[1,2,3])
if !in(:mixtures,kwkeys) && !in(:n_classes,kwkeys)
m.hpar.n_classes = 3
@@ -381,7 +381,7 @@ function fit!(m::GaussianMixtureClusterer,x)
gmmOut = gmm(x,K;initial_probmixtures=initial_probmixtures,mixtures=mixtures,tol=tol,verbosity=verbosity,minimum_variance=minimum_variance,minimum_covariance=minimum_covariance,initialisation_strategy=initialisation_strategy,maximum_iterations=maximum_iterations,rng = rng)
end
probRecords = gmmOut.pₙₖ
m.par = GMMClusterLearnableParameters(mixtures = gmmOut.mixtures, initial_probmixtures=makecolvector(gmmOut.pₖ))
m.par = GMMCluster_lp(mixtures = gmmOut.mixtures, initial_probmixtures=makecolvector(gmmOut.pₖ))

m.cres = cache ? probRecords : nothing
m.info["error"] = gmmOut.ϵ