diff --git a/publications/iaai21/dora_outlier_detection_iaai.tex b/publications/iaai21/dora_outlier_detection_iaai.tex
index df6fcfa..ef78f5b 100644
--- a/publications/iaai21/dora_outlier_detection_iaai.tex
+++ b/publications/iaai21/dora_outlier_detection_iaai.tex
@@ -285,7 +285,7 @@ \section{Methods}
 is associated with a location defined by a geographic coordinate reference
 system (e.g., latitude/longitude in degrees). Most satellite data are
 distributed as rasters; common formats include GeoTIFF, NetCDF, and
-\todo{HDF}. A data loader for each data type locates the data by the path(s)
+HDF. A data loader for each data type locates the data by the path(s)
 defined in the configuration file and loads samples into a dictionary of
 numpy arrays indexed by the sample id. This \texttt{data\_dict} is then passed
 to each of the ranking algorithms specified in the configuration file.
@@ -345,10 +345,15 @@ \section{Methods}
 the background distribution.
 
 \subparagraph{Sparsity.}
-\todo{Describe isolation forest: Umaa.}
+Sparsity-based methods score outliers based on how isolated or sparse they
+are in the feature space. The isolation forest is a common sparsity-based
+method that constructs many random binary trees over the
+dataset~\cite{liu2008isolation}. The outlier score for a sample is derived
+from the average path length from the root to the sample's leaf across the
+trees. Shorter paths indicate outliers because fewer random splits are
+needed to isolate the sample.
 
 \subparagraph{Likelihood.}
-\todo{Describe PAE: Hannah or Bryce}
 The negative sampling algorithm is implemented by converting the
 unsupervised outlier ranking problem into a semi-supervised
 problem~\citep{sipple:neg-sampling20}. Negative (anomalous)
@@ -357,7 +362,7 @@ \section{Methods}
 negative and positive examples are then used to train a random forest
 classifier. We use the posterior probabilities of the random forest classifier
 as outlier scores, which means that the observations with higher posterior
-probabilities are more likely to be outliers.
+probabilities are more likely to be outliers. \todo{Describe PAE: Hannah or Bryce}
 
 \paragraph{Results organization.}
 Each of the outlier ranking algorithms returns an array containing the sample
diff --git a/publications/iaai21/dora_references.bib b/publications/iaai21/dora_references.bib
index f0ca71e..ba1cadb 100644
--- a/publications/iaai21/dora_references.bib
+++ b/publications/iaai21/dora_references.bib
@@ -1,3 +1,12 @@
+@inproceedings{liu2008isolation,
+  title={Isolation forest},
+  author={Liu, Fei Tony and Ting, Kai Ming and Zhou, Zhi-Hua},
+  booktitle={2008 Eighth IEEE International Conference on Data Mining},
+  pages={413--422},
+  year={2008},
+  organization={IEEE}
+}
+
 @article{molero2013analysis,
   title={Analysis and optimizations of global and local versions of the RX algorithm for anomaly detection in hyperspectral data},
   author={Molero, Jos{\'e} Manuel and Garzon, Ester M and Garcia, Inmaculada and Plaza, Antonio},
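
For a concrete sense of the sparsity-based ranking step added in the patch above, the sketch below scores a data_dict of numpy feature vectors with an isolation forest and returns sample ids ranked from most to least outlier-like. It is a minimal illustration under stated assumptions, not DORA's actual implementation: the use of scikit-learn's IsolationForest and the rank_by_sparsity name are assumptions for the example.

# Minimal sketch of the sparsity-based ranking step described in the patch
# above: samples from a data_dict of numpy feature vectors are scored with an
# isolation forest and returned in ranked order. Illustrative only; the use of
# scikit-learn's IsolationForest and the rank_by_sparsity name are assumptions,
# not DORA's actual implementation.
import numpy as np
from sklearn.ensemble import IsolationForest


def rank_by_sparsity(data_dict, n_trees=100, seed=0):
    """Rank samples from most to least outlier-like with an isolation forest.

    data_dict maps each sample id to a 1-D numpy feature vector, mirroring
    the data_dict produced by the data loaders described in the paper.
    """
    sample_ids = list(data_dict.keys())
    features = np.stack([data_dict[sid] for sid in sample_ids])

    forest = IsolationForest(n_estimators=n_trees, random_state=seed)
    forest.fit(features)

    # score_samples returns larger values for inliers; negate so that larger
    # scores correspond to shorter average path lengths, i.e., stronger outliers.
    scores = -forest.score_samples(features)

    order = np.argsort(scores)[::-1]  # most anomalous first
    return [(sample_ids[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = {f"sample_{i}": rng.normal(size=8) for i in range(100)}
    data["sample_outlier"] = np.full(8, 10.0)  # clear outlier in every feature
    print(rank_by_sparsity(data)[:3])

In this sketch the ranked (sample id, score) pairs play the role of the per-algorithm score array that the paper's results-organization step collects from each ranking method.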