
[BUG] MLJ pipelines do not work with outlier detectors #31

Open
hpaldan opened this issue Sep 1, 2022 · 4 comments
Labels
bug Something isn't working

Comments

hpaldan commented Sep 1, 2022

Describe the bug

I have a problem using the UnsupervisedDetector models in a pipeline. I have tried two different simple linear pipelines, one with a Standardizer and a LOFDetector and one with a Standardizer and an IForestDetector. It seems like the fit! function doesn't work properly on the detector models when they are in a pipeline: no training seems to take place, and when I try to transform new data with the machine it gives an error message:
"ERROR: MethodError: objects of type OutlierDetectionPython.IForestDetector are not callable"

To Reproduce

Hopefully the code example isn't too long.

using Pkg

Pkg.add("MLJ")
Pkg.add("OutlierDetection")
Pkg.add("DataFrames")
using MLJ
using OutlierDetection
using DataFrames

fake_dataframe = DataFrame(A=rand(100) .- 10 .* 10, B=rand(100) .+ 10 .* 10)

#Load models
LOF = @iload LOFDetector() pkg=OutlierDetectionNeighbors
IForest = @iload IForestDetector() pkg=OutlierDetectionPython

#Instantiate models
model_standardizer = Standardizer();
model_IForest = IForest();
model_LOF = LOF();

#Create pipelines
pipe_standardized_LOF = model_standardizer |> model_LOF
pipe_standardized_Iforest = model_standardizer |> model_IForest

#Create machines
mach_standardizer_LOF = machine(pipe_standardized_LOF,fake_dataframe)
mach_standardizer_Iforest = machine(pipe_standardized_Iforest,fake_dataframe)
mach_LOF = machine(model_LOF,fake_dataframe)

#fit machines
fit!(mach_standardizer_LOF);
fit!(mach_standardizer_Iforest);
fit!(mach_LOF);

#=
Here the transformation gives an error for pipelines but not for a single machine.
=#
fake_dataframe_1 = MLJ.transform(mach_standardizer_LOF,fake_dataframe)
fake_dataframe_2 = MLJ.transform(mach_standardizer_Iforest,fake_dataframe)
fake_dataframe_3 = MLJ.transform(mach_LOF,fake_dataframe)

# Trying another unsupervised model to check whether
# the problem affects all unsupervised models:

KMeans = @iload KMeans pkg=ParallelKMeans
model_KMeans = KMeans();
pipe_standardized_KMeans = model_standardizer |> model_KMeans
mach_standardizer_KMeans = machine(pipe_standardized_KMeans,fake_dataframe);
fit!(mach_standardizer_KMeans);

fake_dataframe_4 = MLJ.transform(mach_standardizer_KMeans,fake_dataframe)

Expected behavior

I expect the transform function to output an anomaly score from a machine that first standardizes the data and then applies some kind of detector model to it.

Additional context

I have tried the same thing (as in the code above) with other unsupervised models and it seems to work fine for them, so the problem is probably isolated to the OutlierDetection package.
I've also tried a PCA model instead of a standardizer together with an outlier detection model in a pipeline, with the same problem.

Versions

Please run the following code snippet and paste the output here:

from sktime import show_versions; show_versions() <--- I didn't get this one to work at all, so I will send the information from versioninfo() instead.

From versioninfo():
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =

hpaldan added the bug (Something isn't working) label on Sep 1, 2022

hpaldan commented Sep 1, 2022

I totally didn't understand that the arrows were for comments. Rookie mistake.

hpaldan commented Sep 1, 2022

I still had to make some minor fixes to my description.

davnn commented Sep 1, 2022

Hey! The reason might be that pipelines only support

const SUPPORTED_TYPES_FOR_PIPELINES = [
    :Deterministic,
    :Probabilistic,
    :Interval,
    :Unsupervised,
    :Static]

models, but outlier detection algorithms are currently modeled as a separate entity (Annotator <: Model) in MLJ.

  1. I'm not sure if that's really the reason for the mentioned error.
  2. I'm not sure if it would make sense to add support for annotators to pipelines, because then we would also have to add support in a lot of other places scattered all over MLJ. I would prefer to subtype Detector directly from Unsupervised or Supervised, but that, too, would require some major changes.
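
As a quick check, the type hierarchy can be inspected from the REPL (a minimal sketch; it assumes that MLJ re-exports the abstract model types Unsupervised and Annotator from MLJModelInterface):

using MLJ

LOF = @iload LOFDetector() pkg=OutlierDetectionNeighbors
detector = LOF()

# The |> pipeline syntax only composes the model types listed above, and a detector
# is not a subtype of Unsupervised:
detector isa MLJ.Unsupervised   # expected to be false
detector isa MLJ.Annotator      # expected to be true

# Walk up the type hierarchy to see where the detector actually sits:
let T = typeof(detector)
    while T !== Any
        println(T)
        T = supertype(T)
    end
end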

In the meantime, however, you could directly use the learning networks API to achieve your desired pipeline:

using MLJ
using OutlierDetection
using DataFrames

fake_dataframe = DataFrame(A=rand(100) .- 10 .* 10, B=rand(100) .+ 10 .* 10)

#Load models
LOF = @iload LOFDetector() pkg = OutlierDetectionNeighbors
IForest = @iload IForestDetector() pkg = OutlierDetectionPython

#Learning networks: wrap the data in a source node and chain the steps manually
Xs = source(fake_dataframe)
Xstd = MLJ.transform(machine(Standardizer(), Xs), Xs)
# Note: lof_mach and forest_mach are network nodes, not machines; calling fit! on a
# node also fits every machine it depends on (here the Standardizer and the detector).
lof_mach = MLJ.transform(machine(LOF(), Xstd), Xstd)
forest_mach = MLJ.transform(machine(IForest(), Xstd), Xstd)

fit!(lof_mach)
lof_mach(fake_dataframe)

fit!(forest_mach)
forest_mach(fake_dataframe)
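
Calling the fitted node on a table pushes that table through the whole network: the data is standardized by the already-fitted Standardizer and then scored by the already-fitted detector. Scoring new observations would look roughly like this (a sketch; new_dataframe is a hypothetical table with the same columns A and B):

new_dataframe = DataFrame(A=rand(20) .- 10 .* 10, B=rand(20) .+ 10 .* 10)
lof_mach(new_dataframe)      # standardize, then compute LOF scores for the new rows
forest_mach(new_dataframe)   # standardize, then compute isolation forest scores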

davnn changed the title from [BUG] to [BUG] MLJ pipelines do not work with outlier detectors on Sep 1, 2022

hpaldan commented Sep 2, 2022

All right, too bad that the fix would require that much work.
Thank you for the fast reply and good guidance!
