Reweighted-FastLTS

Reweighted-FastLTS is a robust regression algorithm that lets you detect anomalous observations. An existing Python implementation of FastLTS (by Michele Cappellari) is designed around datasets with 3 predictors (p). Inspired by Cappellari's work and by the research of Prof. Peter Rousseeuw, I implemented a Python version of Reweighted-FastLTS that handles (i) p predictors with p < n, where n is the number of observations, and (ii) n < 600.
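To illustrate the idea behind FastLTS, here is a minimal sketch of the least trimmed squares criterion and the concentration step (C-step) that FastLTS iterates. It is not the code in this repository: the function names `lts_objective`, `c_step` and `lts_sketch`, the number of random starts and the fixed number of C-steps are all assumptions made for illustration, and the sketch stops at the raw fit (no reweighting, no scale correction).

```python
import numpy as np

def lts_objective(residuals, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    return np.sort(residuals ** 2)[:h].sum()

def c_step(X, y, subset, h):
    """Fit OLS on `subset`, then keep the h observations with smallest residuals."""
    A = np.column_stack([np.ones(len(X)), X])  # intercept column first
    beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
    residuals = y - A @ beta
    new_subset = np.argsort(residuals ** 2)[:h]
    return beta, new_subset, lts_objective(residuals, h)

def lts_sketch(X, y, h, n_starts=50, n_steps=20, seed=0):
    """Raw LTS fit from random elemental starts (illustrative, not FastLTS itself)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Elemental start: p + 1 points determine intercept plus p slopes.
        subset = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_steps):
            beta, subset, obj = c_step(X, y, subset, h)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj
```

The returned vector holds the intercept first and the slopes after it. The full FastLTS procedure adds refinements such as keeping only the most promising starts and iterating C-steps to convergence, which this sketch leaves out.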

The attributes of the Reweighted-FastLTS Python class match those returned by ltsReg in R. One open question is the FastMCD step: I used MinCovDet from the sklearn library, and the location and covariance matrix it returns differ from those obtained in R, so the robust distances differ as well.
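For reference, robust distances can be computed from `MinCovDet` roughly as follows. This is a sketch and not necessarily how the class does it: the helper name `robust_distances` and the 0.975 chi-square quantile used as cutoff are assumptions (the quantile is the conventional choice in the robust statistics literature).

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def robust_distances(X, random_state=0):
    """Robust distances of the rows of X from the MCD location and scatter."""
    mcd = MinCovDet(random_state=random_state).fit(X)
    # mahalanobis() returns squared distances, so take the square root.
    rd = np.sqrt(mcd.mahalanobis(X))
    # Conventional cutoff: square root of the 97.5% chi-square quantile, p d.o.f.
    cutoff = np.sqrt(chi2.ppf(0.975, df=X.shape[1]))
    return rd, cutoff, mcd.location_, mcd.covariance_
```

Comparing `location_` and `covariance_` (and the raw versions `raw_location_` and `raw_covariance_`) against the R output is one way to narrow down where the two implementations diverge.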

- Reference

- Some examples

Below I report the results of some tests. In each table, the left column shows the results obtained with Reweighted-FastLTS and the right column shows the results obtained with ltsReg from R's robustbase package. The datasets used are the Hawkins-Bradu-Kass (HBK) data and the Stackloss data.

-- Hawkins-Bradu-Kass

| | Reweighted-FastLTS | ltsReg |
| --- | --- | --- |
| alpha | 0.5 | 0.5 |
| quan | 40 | 40 |
| raw_coefficents | [0.27835867, 0.04327558, -0.10558377] | [0.27835868, 0.04327561, -0.10558381] |
| raw_intercept | -0.62325114 | -0.6232511 |
| raw_scale | 0.8535975675079938 | 0.8543587 |
| raw_correction_factor | 1.2752919 | 1.275292 |
| coefficents | [0.08137871, 0.03990183, -0.05166559] | [0.08137871, 0.03990181, -0.05166558] |
| intercept | -0.18046165 | -0.18046163 |
| scale | 0.744041162494403 | 0.7440412 |
| chn_factor | 1.34586238 | 1.345862 |
| correction_factor | 1.01626593 | 1.016266 |
| good leverage points | [11, 12, 13, 14] | [11, 12, 13, 14] |
| bad leverage points | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| vertical outliers | | |
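The last three rows of the table sort flagged observations by combining the standardized residual with the robust distance. A minimal sketch of that classification is shown below. The function name `classify_points`, the residual threshold of 2.5 and the 1-based indexing are assumptions chosen to match the usual convention and the indices in the table, not a description of the class internals.

```python
import numpy as np

def classify_points(std_residuals, robust_dist, rd_cutoff, res_cutoff=2.5):
    """Split flagged observations into the three diagnostic groups (1-based indices)."""
    std_residuals = np.asarray(std_residuals, dtype=float)
    robust_dist = np.asarray(robust_dist, dtype=float)
    big_res = np.abs(std_residuals) > res_cutoff  # poorly fitted by the regression
    big_rd = robust_dist > rd_cutoff              # far from the bulk in predictor space

    def to_index(mask):
        return (np.flatnonzero(mask) + 1).tolist()

    return {
        "good leverage points": to_index(big_rd & ~big_res),
        "bad leverage points": to_index(big_rd & big_res),
        "vertical outliers": to_index(~big_rd & big_res),
    }
```

Observations with a small residual and a small robust distance are regular points and appear in none of the three lists.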

-- Stackloss data

| | Reweighted-FastLTS | ltsReg |
| --- | --- | --- |
| alpha | 0.5 | 0.5 |
| quan | 13 | 13 |
| raw_coefficents | [0.7409212, 0.39152664, 0.01113465] | [0.74092106, 0.39152672, 0.01113454] |
| raw_intercept | -37.32334 | -37.32332647 |
| raw_scale | 1.863142881084838 | 1.863146 |
| raw_correction_factor | 1.88416645 | 1.884166 |
| coefficents | [0.7976856, 0.5773405, -0.06706011] | [0.79768556, 0.57734046, -0.06706018] |
| intercept | -37.652466 | -37.65245890 |
| scale | 1.9218770928830033 | 1.921877 |
| chn_factor | 1.48689415 | 1.486894 |
| correction_factor | 1.14467424 | 1.144674 |
| good leverage points | [2, 15, 16, 17, 18, 19] | [2, 15, 16, 17, 18, 19] |
| bad leverage points | [1, 3, 21] | [1, 3, 21] |
| vertical outliers | [4] | [4] |
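The `quan` values in both tables can be reproduced from alpha, n and p. The sketch below mirrors the rule that robustbase's `h.alpha.n` helper is understood to apply. Treat the formula and the name `lts_quan` as assumptions rather than as the code used in this repository. Here p counts the intercept, so p = 4 for both datasets (3 predictors plus intercept), with n = 75 for HBK and n = 21 for Stackloss.

```python
import math

def lts_quan(n, p, alpha=0.5):
    """Subset size h used by LTS; p includes the intercept (assumed convention)."""
    n2 = (n + p + 1) // 2
    return math.floor(2 * n2 - n + 2 * (n - n2) * alpha)

hbk_quan = lts_quan(75, 4)        # HBK: 75 observations
stackloss_quan = lts_quan(21, 4)  # Stackloss: 21 observations
```

With alpha = 0.5 this gives 40 for HBK and 13 for Stackloss, matching the tables. With alpha = 1 it returns n, which corresponds to using every observation.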