Adding loggers into TunedModels #193
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff            @@
##              dev     #193     +/-   ##
==========================================
+ Coverage   87.53%   87.55%   +0.01%
==========================================
  Files          13       13
  Lines         666      667       +1
==========================================
+ Hits          583      584       +1
  Misses         83       83
```

☔ View full report in Codecov by Sentry.
Looking good, thanks! Does it all look good on the MLflow service when fitting a `TunedModel`?
Looking good locally. I've just uploaded the …
Played around with this some more. Very cool, thanks! However, there is a problem running in multithreaded mode. It seems only one thread is logging:

```julia
using MLJ
using .Threads
using MLFlowClient

nthreads()
# 5

logger = MLFlowLogger("http://127.0.0.1:5000", experiment_name="horse")
X, y = make_moons()
model = (@load RandomForestClassifier pkg=DecisionTree)()
r = range(model, :sampling_fraction, lower=0.4, upper=1.0)
tmodel = TunedModel(
    model;
    range=r,
    logger,
    acceleration=CPUThreads(),
    n=100,
)
mach = machine(tmodel, X, y) |> fit!;

nruns = length(report(mach).history)
# 100

service = MLJFlow.service(logger)
experiment = MLFlowClient.getexperiment(service, "horse")
id = experiment.experiment_id
runs = MLFlowClient.searchruns(service, id);
length(runs)
# 20

@assert length(runs) == nruns
# ERROR: AssertionError: length(runs) == nruns
# Stacktrace:
#  [1] top-level scope
#    @ REPL[166]:1
```
The problem is we are missing …
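If the missing piece is synchronization around the logging calls, one conventional fix is to serialize them behind a lock. A minimal sketch, not MLJFlow's actual code; `log_to_service` is a hypothetical stand-in for the per-evaluation logging call:

```julia
# Sketch: serialize logging across threads with a lock so that
# concurrent evaluations cannot interleave their MLflow requests.
using Base.Threads

const LOGGING_LOCK = ReentrantLock()

log_to_service(i) = println("logging evaluation $i")  # placeholder

@threads for i in 1:10
    lock(LOGGING_LOCK) do
        log_to_service(i)  # only one thread logs at a time
    end
end
```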
I think …
Thanks for the addition. Sadly, this is still not working for me. I'm getting three experiments on the server with different ids and the same name, "horse" (I'm only expecting one). One contains 20 evaluations, the other two contain only 1 each, and this complaint is thrown several times. Do you have any idea what is happening?

```
ERROR: TaskFailedException
…{MLJDecisionTreeInterface.RandomForestClassifier, Vector{LogLoss{Float64}}, Vector{Float64}, Vector{typeof(predict)}, Vector{Vector{Float64}}, Vector{Vector{Vector{Float64}}}, Vector{NamedTuple{(:forest,), Tuple{DecisionTree.Ensemble{Float64, UInt32}}}}, Vector{NamedTuple{(:features,), Tuple{Vector{Symbol}}}}, Holdout})
```
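Duplicate experiments with the same name are the usual symptom of a non-atomic "check, then create" sequence running concurrently. This is only a guess at the cause; the sketch below reproduces the pattern in plain Julia (no MLflow involved), with all names invented for illustration:

```julia
# Sketch of the suspected failure mode: several threads check whether
# the "experiment" exists, all miss, and each one creates its own copy.
using Base.Threads

created = Atomic{Int}(0)      # how many times "create experiment" ran
exists = Atomic{Bool}(false)

@threads for i in 1:5
    if !exists[]                 # check ...
        sleep(0.01)              # widen the race window for the demo
        atomic_add!(created, 1)  # ... then create (should happen once)
        exists[] = true
    end
end

created[]  # typically > 1: every thread that raced past the check "created"
```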
Interestingly, I'm getting the same kind of error with `CPUProcesses()`:

```julia
using Distributed
addprocs(2)
nprocs()
# 3

using MLJ
using MLFlowClient

logger = MLFlowLogger("http://127.0.0.1:5000", experiment_name="rock")
X, y = make_moons()
model = (@iload RandomForestClassifier pkg=DecisionTree)()
r = range(model, :sampling_fraction, lower=0.4, upper=1.0)
tmodel = TunedModel(
    model;
    range=r,
    logger,
    acceleration=CPUProcesses(),
    n=100,
)
mach = machine(tmodel, X, y) |> fit!;
```
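For reference, the same consistency check as in the threaded example, reusing the calls from above and pointed at the "rock" experiment, would presumably fail in the same way here:

```julia
# Compare the number of tuning evaluations against the number of runs
# recorded on the MLflow server, exactly as in the threaded case.
nruns = length(report(mach).history)

service = MLJFlow.service(logger)
experiment = MLFlowClient.getexperiment(service, "rock")
runs = MLFlowClient.searchruns(service, experiment.experiment_id);
@assert length(runs) == nruns
```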
Okay, see here for an MWE: JuliaAI/MLFlowClient.jl#40
Revisiting this issue after a few months. It looks like the multithreading issue is not likely to be addressed soon. Perhaps we can proceed with this PR after strictly ruling out logging for the parallel modes. For example, if … @pebeto What do you think?
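One possible shape for "strictly ruling out" parallel logging is a constructor-time guard. The following is a hypothetical sketch; the function name and error text are invented, not the PR's code:

```julia
using MLJ  # for the CPU1, CPUThreads, CPUProcesses resource types

# Hypothetical guard: refuse to combine a logger with parallel
# acceleration until the concurrency issue is fixed.
function check_logger_compatibility(logger, acceleration)
    if logger !== nothing && !(acceleration isa CPU1)
        error("Logging is currently supported only with acceleration=CPU1(); " *
              "got $(typeof(acceleration)).")
    end
    return nothing
end

check_logger_compatibility(nothing, CPUThreads())  # fine, no logger
# check_logger_compatibility(logger, CPUThreads()) # would throw
```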
The solution to this issue is not part of the …

Details in JuliaAI/MLJ.jl#1029: … for loggers (detailed implementation in MLJBase.jl).
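For context, a sketch of how a logger could plug into that interface, assuming the extension point is `MLJBase.log_evaluation` (the hook MLJFlow overloads; the toy logger below is purely illustrative, so check MLJBase for the authoritative signature):

```julia
import MLJBase

struct PrintLogger end  # a toy logger, for illustration only

# Assumed hook: called by MLJ machinery with each performance evaluation.
function MLJBase.log_evaluation(::PrintLogger, performance_evaluation)
    println("measure:     ", performance_evaluation.measure)
    println("measurement: ", performance_evaluation.measurement)
    return nothing
end
```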