[BUG] MLJ pipelines do not work with outlier detectors #31

hpaldan · 2022-09-01T15:35:07Z

Describe the bug

I have a problem using UnsupervisedDetector models in a pipeline. I tried two simple linear pipelines, one with a Standardizer and LOFDetector and one with a Standardizer and IForestDetector. It seems that fit! does not work properly
on the detector models when they are in a pipeline: no training seems to take place, and when I try to transform new data
with the machine I get this error:
"ERROR: MethodError: objects of type OutlierDetectionPython.IForestDetector are not callable"

To Reproduce

Hopefully the code example isn't too long.

using Pkg

Pkg.add("MLJ")
Pkg.add("OutlierDetection")
Pkg.add("DataFrames")
using MLJ
using OutlierDetection
using DataFrames

fake_dataframe = DataFrame(A=rand(100) .- 10 .* 10, B=rand(100) .+ 10 .* 10)

#Load models
LOF = @iload LOFDetector() pkg= OutlierDetectionNeighbors 
IForest = @iload IForestDetector() pkg = OutlierDetectionPython

#Instantiate models
model_standardizer = Standardizer();
model_IForest = IForest();
model_LOF = LOF();

#Create pipelines
pipe_standardized_LOF = model_standardizer |> model_LOF
pipe_standardized_Iforest = model_standardizer |> model_IForest

#Create machines
mach_standardizer_LOF = machine(pipe_standardized_LOF,fake_dataframe)
mach_standardizer_Iforest = machine(pipe_standardized_Iforest,fake_dataframe)
mach_LOF = machine(model_LOF,fake_dataframe)

#fit machines
fit!(mach_standardizer_LOF);
fit!(mach_standardizer_Iforest);
fit!(mach_LOF);

#=
Here the transformation gives an error for pipelines but not for a single machine.
=#
fake_dataframe_1 = MLJ.transform(mach_standardizer_LOF,fake_dataframe)
fake_dataframe_2 = MLJ.transform(mach_standardizer_Iforest,fake_dataframe)
fake_dataframe_3 = MLJ.transform(mach_LOF,fake_dataframe)

# Try another unsupervised model to rule out that
# no unsupervised model works in a pipeline:

KMeans = @iload KMeans pkg=ParallelKMeans
model_KMeans = KMeans();
pipe_standardized_KMeans = model_standardizer |> model_KMeans
mach_standardizer_KMeans = machine(pipe_standardized_KMeans,fake_dataframe);
fit!(mach_standardizer_KMeans);

fake_dataframe_4 = MLJ.transform(mach_standardizer_KMeans,fake_dataframe)

Expected behavior

I expect the transform function to output anomaly scores from a machine that first standardizes the data and then applies some kind of
detector model to it.

Additional context

I have tried the same thing (as in the code above) with other unsupervised models and it works fine with them, so the problem
is probably isolated to the OutlierDetection package.
I've also tried a PCA model instead of a standardizer, combined with an outlier detection model in a pipeline, and hit the same problem.

Versions

Please run the following code snippet and paste the output here:

from sktime import show_versions; show_versions() <--- I couldn't get this to work at all, so I'm posting the output of versioninfo() instead.

From versioninfo:
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =

hpaldan · 2022-09-01T15:37:40Z

I totally didn't understand that the arrows were for comments. Rookie mistake.

hpaldan · 2022-09-01T15:41:06Z

I still had to make some minor fixes to my bad description.

davnn · 2022-09-01T20:23:39Z

Hey! The reason might be that pipelines only support

const SUPPORTED_TYPES_FOR_PIPELINES = [
    :Deterministic,
    :Probabilistic,
    :Interval,
    :Unsupervised,
    :Static]

models, but outlier detection algorithms are currently modeled as a separate entity (Annotator <: Model) in MLJ.

I'm not sure that's really the cause of the error you mention, though.
I'm also not sure it would make sense to add support for annotators to pipelines, because then we would have to add support in a lot of other places scattered all over MLJ. I would prefer to subtype Detector directly from Unsupervised or Supervised, but that too would require some major changes.
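
To see which abstract model type a given model falls under, one option is to walk its supertype chain. The helper below is a minimal sketch in plain Julia; `type_chain` is a hypothetical name used only for illustration and is not part of MLJ or OutlierDetection.

```julia
# Hypothetical helper (not part of MLJ): collect the chain of supertypes
# of a value's type, starting at its concrete type and ending at Any.
function type_chain(x)
    chain = Type[]
    T = typeof(x)
    while true
        push!(chain, T)
        T === Any && break   # Any is its own supertype, so stop here
        T = supertype(T)
    end
    return chain
end

# Plain-Julia check of the helper itself:
type_chain(1.0)  # [Float64, AbstractFloat, Real, Number, Any]
```

With MLJ and the detector loaded, calling `type_chain(LOF())` should show whether one of the pipeline-supported abstract types (for example `Unsupervised`) appears in the chain, or whether the model sits under `Annotator` instead.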

In the meantime, however, you could directly use the learning networks API to achieve your desired pipeline:

using MLJ
using OutlierDetection
using DataFrames

fake_dataframe = DataFrame(A=rand(100) .- 10 .* 10, B=rand(100) .+ 10 .* 10)

#Load models
LOF = @iload LOFDetector() pkg = OutlierDetectionNeighbors
IForest = @iload IForestDetector() pkg = OutlierDetectionPython

#Learning networks
Xs = source(fake_dataframe)
Xstd = MLJ.transform(machine(Standardizer(), Xs), Xs)
lof_mach = MLJ.transform(machine(LOF(), Xstd), Xstd)
forest_mach = MLJ.transform(machine(IForest(), Xstd), Xstd)

fit!(lof_mach)
lof_mach(fake_dataframe)

fit!(forest_mach)
forest_mach(fake_dataframe)

hpaldan · 2022-09-02T08:36:00Z

All right, too bad that the fix would require that much work.
Thank you for the fast reply and good guidance!

hpaldan added the bug Something isn't working label Sep 1, 2022

davnn changed the title ~~[BUG]~~ [BUG] MLJ pipelines do not work with outlier detectors Sep 1, 2022

rolling-robot mentioned this issue Mar 18, 2023

[DOC] simple examples in documentation are broken #37

Closed
