How can I print threshold beyond which MAD classifies a metric as outliers? #260

Open
ganesh-srinivas opened this issue Mar 20, 2018 · 1 comment


ganesh-srinivas commented Mar 20, 2018

Is there a way to print the threshold learned by MacroBase when running the MAD classifier? I want to know which values of a metric are considered outliers.

ganesh-srinivas changed the title from "How can I print threshold beyond which MAD/MCD classifies metric(s) as outliers?" to "How can I print threshold beyond which MAD classifies a metric as outliers?" on May 2, 2018
@ganesh-srinivas
Author

Is it correct to use the following formulas to calculate the upper and lower thresholds learned by the MAD classifier for a metric?

upper_threshold_for_metric_value = median + score_percentile_cutoff*MAD

lower_threshold_for_metric_value = median - score_percentile_cutoff*MAD

How I derived it

I derived this formula from the score function in legacy/src/main/java/macrobase/analysis/stats/MAD.java:

    public double score(Datum datum) {
        double point = datum.metrics().getEntry(0);
        return Math.abs(point - median) / (MAD);
    }
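The step from the score function to the two thresholds can be written out explicitly. This is a sketch under two assumptions: that MAD > 0, and that a point is flagged when its score exceeds the percentile cutoff (whether the comparison is strict is not confirmed here). The symbols m (median), c (score_percentile_cutoff) and x (metric value) are shorthand introduced for this derivation.

```latex
\begin{aligned}
\mathrm{score}(x) = \frac{|x - m|}{\mathrm{MAD}} &> c \\
\iff\quad |x - m| &> c \cdot \mathrm{MAD} \\
\iff\quad x > m + c \cdot \mathrm{MAD} \quad &\text{or} \quad x < m - c \cdot \mathrm{MAD}
\end{aligned}
```

The two bounds on the last line are the upper and lower thresholds given above.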

Verification

I printed the values of median, MAD and score_percentile_cutoff by setting the logging level to TRACE and running a batch query on sensor_data_demo_db_version.txt, with two extra rows added that carry infinitesimally small values of power_drain.

The calculations give upper_threshold_for_metric_value = 0.8634457399999994 and lower_threshold_for_metric_value = -0.2608399236802713. These are very close to the observed thresholds: 1010 of the 1012 outliers have power_drain > 0.864, and the other 2 have power_drain <= -0.260865.
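As a cross-check, the arithmetic can be reproduced from the median, MAD and percentile cutoff printed in the TRACE and DEBUG log lines of the report. The sketch below is a standalone illustration: the class name `MadThresholds` and its methods are hypothetical and not part of MacroBase.

```java
// Hypothetical helper, not part of MacroBase: inverts score = |x - median| / MAD.
class MadThresholds {
    /** Metric values above this bound have a score greater than the cutoff. */
    public static double upper(double median, double mad, double cutoff) {
        return median + cutoff * mad;
    }

    /** Metric values below this bound have a score greater than the cutoff. */
    public static double lower(double median, double mad, double cutoff) {
        return median - cutoff * mad;
    }

    public static void main(String[] args) {
        // Values copied from the TRACE/DEBUG log lines in the report.
        double median = 0.301569;
        double mad = 0.05076799999999998;
        double cutoff = 11.078469902300666; // 0.99 percentile cutoff on the score

        System.out.printf("upper = %.6f%n", upper(median, mad, cutoff));
        System.out.printf("lower = %.6f%n", lower(median, mad, cutoff));
    }
}
```

With the logged values, the results should land near 0.864 and -0.261, in line with the thresholds observed in the outlier set.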

Report: 2may2018_sensor_data_demo_power_drain_mad_two_infitesimals.txt

INFO  [2018-05-02 13:17:19,301] macrobase.runtime.command.MacroBasePipelineCommand: Result: [outliers: 1012.000000
inliers: 100243.000000
load time 997ms
execution time: 471ms
summarization time: 417ms

-----

support: 0.998024
records: 1010.000000
ratio: 33698.157581

Columns:
        device_id: 2040
        model: M606
        firmware_version: 0.3.2
        state: MA

-----

]
INFO  [2018-05-02 13:17:18,170] macrobase.conf.MacroBaseConf: Using MAD transform.
TRACE [2018-05-02 13:17:18,295] macrobase.analysis.stats.MAD: trained! median is 0.301569, MAD is 0.05076799999999998
DEBUG [2018-05-02 13:17:18,565] macrobase.analysis.classify.BatchingPercentileClassifier: 0.99 Percentile Cutoff: 11.078469902300666
DEBUG [2018-05-02 13:17:18,569] macrobase.analysis.classify.BatchingPercentileClassifier: Median: 1.0
DEBUG [2018-05-02 13:17:18,573] macrobase.analysis.classify.BatchingPercentileClassifier: Max: 229.5061653009771
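The logged quantities fit together as follows: the median and MAD are learned from the metric, each point is scored as |x - median| / MAD, and the percentile cutoff is taken over those scores. The median score of 1.0 in the log is what that definition implies, since MAD is itself the median of the absolute deviations. A minimal end-to-end sketch of that chain is below. It is not MacroBase's implementation: the class `MadPercentileSketch`, the nearest-rank percentile rule, and the lack of special handling for MAD = 0 are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch, not MacroBase code: median -> MAD -> scores -> cutoff -> thresholds.
class MadPercentileSketch {
    /** Median of the values; averages the two middle values for even lengths. */
    public static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Median absolute deviation of the values around the given median. */
    public static double mad(double[] values, double median) {
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - median);
        }
        return median(deviations);
    }

    /** Nearest-rank percentile (assumed rule; the real classifier may differ). */
    public static double percentile(double[] scores, double p) {
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Returns {lower, upper} metric thresholds for percentile p; assumes MAD > 0. */
    public static double[] thresholds(double[] values, double p) {
        double med = median(values);
        double mad = mad(values, med);
        double[] scores = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scores[i] = Math.abs(values[i] - med) / mad;
        }
        double cutoff = percentile(scores, p);
        return new double[] {med - cutoff * mad, med + cutoff * mad};
    }
}
```

Calling `MadPercentileSketch.thresholds(values, 0.99)` on an array of power_drain values would give bounds analogous to the ones computed by hand above, up to differences in how the percentile is defined.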
