Update pytorch example (#185)
* fix bug in scoring failure

* update power example to support bit-depth search

* update result directories

* revert changes to example/power

* add bit depth example

* revert directory changes for power example

* fixed directory for mnist in power example

* fixed directory for mnist in power example

* use the latest torch torchvision torchaudio

* update workflow to push on PR

* removed double stage folders in log folder

* + more epochs for cifar100

* changed intervals from uniform to log uniform, made learning rate range larger

* strip whitespace, convert to numeric in compile script

* update git ignores

* support nb_epoch as defence choice

* remove adv_success from requirements

* add "NaN" to nones

* update afr script

* update mnist .dvc cache

* update cifar10 plots

* uncomment paretoset in plotting

* fix default defence bug and relative pathing in compile script

* moved plots to subfolder

* better configuration support

* fix compile script bug

* update compile and plots yaml for power example

* fix compile bug

* update plots

* include plot files in dvc

* update afr to read from conf file

* linting

* linting

* update pytorch example

* update pytorch afr.yaml (not working)

* split cleaning from plotting, but only working for examples/pytorch/mnist

* working cleaning script

* fix pytorch examples with new clean script

* remove debug check from parse_results

* make deckard a dependency of the parsing script

* made models.sh easier to read

* update afr for pytorch example

* update power example

* update dvc.lock for pytorch example

* update pytorch/cifar100

* update power/plots (not working)

* add docstrings to plots.py

* update power example with merge script

* add power data

* update configs

* add combined plots

* update afr models

* added support for dummy variables in afr

* ++combined_plots.py and fix afr bug

* add cifar100 L4 power data, commenting out everything else

* add varepsilon to attack params

* add dummy variables

* fix rounding bug

* update to newest plots

* newest plots for power example

* linting

* removed old afr file

* linting

* Merge branch 'fix-compile-script' of github.com:simplymathematics/deckard into fix-compile-script

* update conf

* fixed kepler script bug

* linting

* linting

* linting

* linting

* linting

* linting

* linting

* linting

* linting

* fixed cifar100 pytorch example script

* more resilient wait and cleaning scripts

* +GZIP example

* bug fixes

* fix latex nan bug

* bug fixes

* add index to compilation csv

* better defence merging

* fixed bug where x,y scale are None

* update cifar100 confs

* fixed cleaning bug

* fixed afr plot rendering bug

* add check for negative predict time

* update configs

* fixed failure rate bug and updated confs

* change plot default from eps to pdf

* fix bug in calculating failure rate when attack size != train size

* update mnist confs

* specify attack size at the command line

* linting

* update all plot configs

* Fix compile script (#172)

* added dummy vars, fixed plots

* fix afr.py bugs

* bad merge?

* merge

* fix bugs

* linting

* linting

* linting

* linting

* linting

* linting

* update linter

* update linter

* update linter

* update linter

* linting

* update setup, .gitignore

* fix failure rate bug (again)

* most up-to-date plots

* update failure rate from h to f in pytorch examples

* remove intercept and scale parameters from afr plots

* remove rows where the score is an error

* update plot confs for pytorch example

* allow setting filename from command line of AFR script

* plot legend tweaks

* linting

* linting

* Update Dockerfile

* update dockerfile

* linting

* update gzip configs

* better logging

* add url validation for data pipeline

* git rm

* update truthseeker yaml

* add gzip .gitignore

* gzip dvc changes

* add sampling during training

* more resilient find_best script

* fixed bug with finding min/max when data is non-numeric

* add support for url/local datasets

* add column dropping for data parsing

* find best for multi-objective search

* better cleaning for experiments without attacks

* better filetype support when plotting

* load distance matrix from disk (optionally)

* update confs for gzip

* update default params

* update .gitignore, add some models to torch_example

* refactor confs

* update pytorch confs

* minor bug fixes

* fix small bugs

* add resnet examples

* update pytorch experiment confs

* config changes

* increase timeline resolution

* config changes

* removed dvc cruft

* update cost normalization calculation

* update afr plotting

* remove partial effects from pytorch config, add support for aalen additive models

* update dvc file for pytorch plots

* better error handling

* update survival plots

* ++plots to newest overleaf

* dummy config changes

* config change

* re-run plot dvc.yaml

* config changes

* update .gitignore

* fixed file bug

* fix keyword bug

* change Coefficient plots to sym log scale

* add latex to result parsing

* added support for forloop stage parsing

* streamline some code

* config changes + predicting the metric with a model config chosen by key

* add plots yaml again

* stop tracking cifar100.yaml

* fix uncaught exception

* make dataset formatting more robust

* add a type check

* update default configs for each dataset to use env vars instead of hard-coded values for the number of jobs

* reconfigure the dvc pipeline for re-running and changing the number of jobs + adversarial success

* config changes

* better pytorch out of memory handling

* add normalization to trash metric

* better convergence error handling

* config changes

* linting

* linting

* stop tracking cifar100.yaml

* use pretrained models as initial weights

* better error handling

* remove cruft

* delete old configs

* rename parameter for clarity

* moved from plots to conf folder

* update dvc to work with last commit

* config changes for pytorch example

* linting

* update torch example to use nb_epochs instead of nb_epoch

* linting

* config updates

* fixed bad merge

* linting

* update .gitignore

* stop tracking params file

* removed overly verbose logging

* broke up attack scripts for better dvc tracking

* update pytorch confs

* add hashable object, better art type checking

* created hashable object for inheritance

* changed AFR to AFT

* add arbitrary set() dictionary to catplot

* add numeric casting to afr

* fix logging bug

* linting

* better art typing

* hashable object

* linting

* fixed hashing bug

* fix bug

* linting

---------

Co-authored-by: Mohammad Reza Saleh <[email protected]>
Co-authored-by: salehsedghpour <[email protected]>
3 people authored Aug 13, 2024
1 parent 6fa174c commit 1f6ca37
Showing 32 changed files with 217 additions and 2,847 deletions.
5 changes: 0 additions & 5 deletions deckard/base/attack/attack.py
@@ -134,7 +134,6 @@ def __init__(
self.attack_size = attack_size
self.init = AttackInitializer(model, name, **init)
self.kwargs = kwargs
logger.info("Instantiating Attack with id: {}".format(self.__hash__()))

def __hash__(self):
return int(my_hash(self), 16)
@@ -300,7 +299,6 @@ def __init__(
self.attack_size = attack_size
self.init = AttackInitializer(model, name, **init)
self.kwargs = kwargs
logger.info("Instantiating Attack with id: {}".format(self.__hash__()))

def __hash__(self):
return int(my_hash(self), 16)
@@ -493,7 +491,6 @@ def __init__(
self.attack_size = attack_size
self.init = AttackInitializer(model, name, **init)
self.kwargs = kwargs
logger.info("Instantiating Attack with id: {}".format(self.__hash__()))

def __hash__(self):
return int(my_hash(self), 16)
@@ -618,7 +615,6 @@ def __init__(
f"kwargs must be of type DictConfig or dict. Got {type(kwargs)}",
)
self.kwargs = kwargs
logger.info("Instantiating Attack with id: {}".format(self.__hash__()))

def __hash__(self):
return int(my_hash(self), 16)
@@ -813,7 +809,6 @@ def __init__(
kwargs.update(**kwargs.pop("kwargs"))
self.kwargs = kwargs
self.name = name if name is not None else my_hash(self)
logger.info("Instantiating Attack with id: {}".format(self.name))

def __call__(
self,
1 change: 0 additions & 1 deletion deckard/base/data/data.py
@@ -148,7 +148,6 @@ def save(self, data, filename):
:param filename: str
"""
if filename is not None:
logger.info(f"Saving data to {filename}")
suffix = Path(filename).suffix
Path(filename).parent.mkdir(parents=True, exist_ok=True)
if isinstance(data, dict):
9 changes: 0 additions & 9 deletions deckard/base/data/generator.py
@@ -51,9 +51,6 @@ class SklearnDataGenerator:
kwargs: dict = field(default_factory=dict)

def __init__(self, name, **kwargs):
logger.info(
f"Instantiating {self.__class__.__name__} with name={name} and kwargs={kwargs}",
)
self.name = name
self.kwargs = {k: v for k, v in kwargs.items() if v is not None}

@@ -91,9 +88,6 @@ class TorchDataGenerator:
kwargs: dict = field(default_factory=dict)

def __init__(self, name, path=None, **kwargs):
logger.info(
f"Instantiating {self.__class__.__name__} with name={name} and kwargs={kwargs}",
)
self.name = name
self.path = path
self.kwargs = {k: v for k, v in kwargs.items() if v is not None}
@@ -179,9 +173,6 @@ class KerasDataGenerator:
kwargs: dict = field(default_factory=dict)

def __init__(self, name, **kwargs):
logger.info(
f"Instantiating {self.__class__.__name__} with name={name} and kwargs={kwargs}",
)
self.name = name
self.kwargs = {k: v for k, v in kwargs.items() if v is not None}

2 changes: 0 additions & 2 deletions deckard/base/model/model.py
@@ -70,7 +70,6 @@ def __init__(self, **kwargs):
self.kwargs = kwargs

def __call__(self, data: list, model: object, library=None):
logger.info(f"Training model {model} with fit params: {self.kwargs}")
device = str(model.device) if hasattr(model, "device") else "cpu"
trainer = self.kwargs
if library in sklearn_dict.keys():
@@ -91,7 +90,6 @@ def __call__(self, data: list, model: object, library=None):
try:
start = process_time_ns()
start_timestamp = time()
logger.info(f"Fitting type(model): {type(model)} with kwargs {trainer}")
model.fit(data[0], data[2], **trainer)
end = process_time_ns()
end_timestamp = time()
30 changes: 13 additions & 17 deletions deckard/base/model/sklearn_pipeline.py
@@ -57,16 +57,10 @@ class SklearnModelPipelineStage:
kwargs: dict = field(default_factory=dict)

def __init__(self, name, stage_name, **kwargs):
logger.debug(
f"Instantiating {self.__class__.__name__} with name={name} and kwargs={kwargs}",
)
self.name = name
self.kwargs = kwargs
self.stage_name = stage_name

def __hash__(self):
return int(my_hash(self), 16)

def __call__(self, model):
logger.debug(
f"Calling SklearnModelPipelineStage with name={self.name} and kwargs={self.kwargs}",
@@ -76,7 +70,7 @@ def __call__(self, model):
stage_name = self.stage_name if self.stage_name is not None else name
while "kwargs" in kwargs:
kwargs.update(**kwargs.pop("kwargs"))
if "art." in str(type(model)):
if str(type(model)).startswith("art."):
assert isinstance(
model.model,
BaseEstimator,
@@ -102,7 +96,6 @@ class SklearnModelPipeline:
pipeline: Dict[str, SklearnModelPipelineStage] = field(default_factory=dict)

def __init__(self, **kwargs):
logger.debug(f"Instantiating {self.__class__.__name__} with kwargs={kwargs}")
pipe = {}
while "kwargs" in kwargs:
pipe.update(**kwargs.pop("kwargs"))
@@ -145,12 +138,12 @@ def __len__(self):
else:
return 0

def __hash__(self):
return int(my_hash(self), 16)

def __iter__(self):
return iter(self.pipeline)

def __hash__(self):
return int(my_hash(self), 16)

def __call__(self, model):
params = deepcopy(asdict(self))
pipeline = params.pop("pipeline")
@@ -172,7 +165,7 @@ def __call__(self, model):
elif isinstance(stage, SklearnModelPipelineStage):
model = stage(model=model)
elif hasattr(stage, "fit"):
if "art." in str(type(model)):
if str(type(model)).startswith("art."):
assert isinstance(
model.model,
BaseEstimator,
@@ -184,12 +177,15 @@ def __call__(self, model):
), f"model must be a sklearn estimator. Got {type(model)}"
if not isinstance(model, Pipeline) and "art." not in str(type(model)):
model = Pipeline([("model", model)])
elif "art." in str(type(model)) and not isinstance(
elif str(type(model)).startswith("art.") and not isinstance(
model.model,
Pipeline,
):
model.model = Pipeline([("model", model.model)])
elif "art." in str(type(model)) and isinstance(model.model, Pipeline):
elif str(type(model)).startswith("art.") and isinstance(
model.model,
Pipeline,
):
model.model.steps.insert(-2, [stage, model.model])
else:
model.steps.insert(-2, [stage, model])
@@ -213,6 +209,9 @@ class SklearnModelInitializer:
pipeline: SklearnModelPipeline = field(default_factory=None)
kwargs: Union[dict, None] = field(default_factory=dict)

def __hash__(self):
return int(my_hash(self), 16)

def __init__(self, data, model=None, library="sklearn", pipeline={}, **kwargs):
self.data = data
self.model = model
@@ -267,6 +266,3 @@ def __call__(self):
"fit",
), f"model must have a fit method. Got type {type(model)}"
return model

def __hash__(self):
return int(my_hash(self), 16)
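
The hunks above also tighten how the pipeline detects ART-wrapped models, moving from a substring test ("art." in str(type(model))) to a prefix test. A minimal, self-contained sketch of a prefix-based check (the is_art_wrapped helper is hypothetical, not deckard's API; note that str(type(obj)) renders as "<class 'art....'>", so a robust prefix test targets the class's __module__):

from sklearn.linear_model import LogisticRegression

def is_art_wrapped(model) -> bool:
    # True only for classes defined under the art.* package; a bare
    # substring test could also match unrelated paths such as
    # "<class 'mypart.models.Net'>".
    return type(model).__module__.startswith("art.")

print(is_art_wrapped(LogisticRegression()))  # False: plain sklearn estimator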
8 changes: 7 additions & 1 deletion deckard/base/utils/hashing.py
@@ -1,7 +1,7 @@
from hashlib import md5
from collections import OrderedDict
from typing import NamedTuple, Union
from dataclasses import asdict, is_dataclass
from dataclasses import asdict, is_dataclass, dataclass
from omegaconf import DictConfig, OmegaConf, SCMode, ListConfig
from copy import deepcopy
import logging
@@ -71,3 +71,9 @@ def to_dict(obj: Union[dict, OrderedDict, NamedTuple]) -> dict:

def my_hash(obj: Union[dict, OrderedDict, NamedTuple]) -> str:
return md5(str(to_dict(obj)).encode("utf-8")).hexdigest()


@dataclass
class Hashable:
def __hash__(self):
return int(my_hash(self), 16)
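
The new Hashable base class gives any dataclass a deterministic, content-based hash: my_hash serializes the instance to a dict and takes an md5 digest, so equal configurations hash identically across runs. A minimal usage sketch (ExperimentConfig is a made-up example, not a repo class):

from dataclasses import dataclass

from deckard.base.utils.hashing import Hashable, my_hash

@dataclass(eq=False)  # eq=False keeps the inherited __hash__; eq=True would reset it to None
class ExperimentConfig(Hashable):
    name: str = "mnist"
    epochs: int = 10

cfg = ExperimentConfig()
assert cfg.__hash__() == int(my_hash(cfg), 16)  # stable, content-derived value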
10 changes: 6 additions & 4 deletions deckard/layers/afr.py
@@ -75,7 +75,11 @@ def ccl(p):
ax = plt.gca()
T = model.duration_col
E = model.event_col

# Cast df to numeric DataFrame
for col in df.columns:
df[col] = pd.to_numeric(df[col], errors="raise")
# Drop NaNs
df = df.dropna()
predictions_at_t0 = np.clip(
1 - model.predict_survival_function(df, times=[t0]).T.squeeze(),
1e-10,
@@ -347,8 +351,6 @@ def plot_aft(
ax.set_xlabel(xlabel)
ax.set_ylabel(ylabel)
ax.set_title(title)
# symlog-scale the x-axis
# ax.set_xscale("linear")
ax.get_figure().tight_layout()
ax.get_figure().savefig(file)
plt.gcf().clear()
@@ -624,7 +626,7 @@ def make_afr_table(
pretty_dataset = dataset.upper()
aft_data = aft_data.round(2)
aft_data.to_csv(folder / "aft_comparison.csv")
logger.info(f"Saved AFR comparison to {folder / 'aft_comparison.csv'}")
logger.info(f"Saved AFT comparison to {folder / 'aft_comparison.csv'}")
aft_data = aft_data.round(2)
aft_data.fillna("--", inplace=True)
aft_data.to_latex(
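The ccl() hunk above coerces every column to numeric and drops missing rows before fitting the calibration curve. A standalone sketch of that coercion step on toy data (column names are illustrative only):

import pandas as pd

df = pd.DataFrame(
    {"adv_failure_rate": ["0.1", "0.5", None], "epochs": ["10", "20", "30"]},
)
for col in df.columns:
    # errors="raise" fails fast on genuinely non-numeric strings instead of
    # silently producing NaN the way errors="coerce" would
    df[col] = pd.to_numeric(df[col], errors="raise")
df = df.dropna()  # the None above became NaN; drop that row before fitting
print(df.dtypes)  # adv_failure_rate: float64, epochs: int64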
3 changes: 1 addition & 2 deletions deckard/layers/clean_data.py
@@ -81,7 +81,7 @@ def drop_rows_without_results(
logger.info(f"Shape of data before data before dropping na: {data.shape}")
data.dropna(axis=0, subset=[col], inplace=True)
after = data.shape[0]
logger.info(f"Shape of data before data after dropping na: {data.shape}")
logger.info(f"Shape of data after data after dropping na: {data.shape}")
percent_change = (before - after) / before * 100
if percent_change > 5:
# input(f"{percent_change:.2f}% of data dropped for {col}. Press any key to continue.")
@@ -593,7 +593,6 @@ def clean_data_for_plotting(
data = fill_na(data, fillna)
data = replace_strings_in_data(data, replace_dict)
data = replace_strings_in_columns(data, col_replace_dict)

if len(pareto_dict) > 0:
data = find_pareto_set(data, pareto_dict)
return data
3 changes: 3 additions & 0 deletions deckard/layers/plots.py
@@ -103,6 +103,7 @@ def cat_plot(
file = Path(file).with_suffix(filetype)
logger.info(f"Rendering graph {file}")
data = digitize_cols(data, digitize)
set_ = kwargs.pop("set", {})
if hue is not None:
data = data.sort_values(by=[hue, x, y])
logger.debug(
@@ -162,6 +163,8 @@
graph.set(xlim=x_lim)
if y_lim is not None:
graph.set(ylim=y_lim)
if len(set_) > 0:
graph.set(**set_)
graph.tight_layout()
graph.savefig(folder / file)
plt.gcf().clear()
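The cat_plot() hunks above pop an arbitrary set dictionary out of the plot kwargs and forward it to the seaborn grid, so plot configs can adjust any axes property without a dedicated argument. A sketch of the same pattern outside deckard (toy dataset; not cat_plot's real signature):

import seaborn as sns

kwargs = {"set": {"yscale": "log", "ylabel": "Total bill (log scale)"}}

data = sns.load_dataset("tips")
set_ = kwargs.pop("set", {})  # same pattern as the diff above
graph = sns.catplot(data=data, x="day", y="total_bill", kind="box")
if len(set_) > 0:
    graph.set(**set_)  # forwards arbitrary properties to each Axes
graph.savefig("tips_catplot.pdf")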
16 changes: 8 additions & 8 deletions examples/power/conf/afr.yaml
@@ -15,7 +15,7 @@ fillna:
weibull:
plot:
file : weibull_aft.pdf
title : Weibull AFR Model
title : Weibull AFT Model
labels:
"Intercept: rho_": "$\\rho$"
"Intercept: lambda_": "$\\lambda$"
@@ -36,7 +36,7 @@ weibull:
- "file": "weibull_epochs_partial_effect.pdf"
"covariate_array": "model.trainer.np_epochs"
"values_array": [1,10,25,50]
"title": "$S(t)$ for Weibull AFR"
"title": "$S(t)$ for Weibull AFT"
"ylabel": "$\\mathbb{P}~(T>t)$"
"xlabel": "Time $t$ (seconds)"
"legend_kwargs": {
@@ -46,7 +46,7 @@ cox:
cox:
plot:
file : cox_aft.pdf
title : Cox AFR Model
title : Cox AFT Model
labels:
"data.sample.random_state": "Random State"
"atk_value": "Attack Strength"
@@ -65,7 +65,7 @@ cox:
- "file": "cox_epochs_partial_effect.pdf"
"covariate_array": "model.trainer.np_epochs"
"values_array": [1,10,25,50]
"title": "$S(t)$ for Cox AFR"
"title": "$S(t)$ for Cox AFT"
"ylabel": "$\\mathbb{P}~(T>t)$"
"xlabel": "Time $t$ (seconds)"
"legend_kwargs": {
@@ -75,7 +75,7 @@ log_logistic:
log_logistic:
plot:
file : log_logistic_aft.pdf
title : Log logistic AFR Model
title : Log logistic AFT Model
labels:
"Intercept: beta_": "$\\beta$"
"Intercept: alpha_": "$\\alpha$"
@@ -96,7 +96,7 @@ log_logistic:
- "file": "log_logistic_epochs_partial_effect.pdf"
"covariate_array": "model.trainer.np_epochs"
"values_array": [1,10,25,50]
"title": "$S(t)$ for Log-Logistic AFR"
"title": "$S(t)$ for Log-Logistic AFT"
"ylabel": "$\\mathbb{P}~(T>t)$"
"xlabel": "Time $t$ (seconds)"
"legend_kwargs": {
@@ -106,7 +106,7 @@ log_normal:
log_normal:
plot:
file : log_normal_aft.pdf
title : Log Normal AFR Model
title : Log Normal AFT Model
labels:
"Intercept: sigma_": "$\\sigma$"
"Intercept: mu_": "$\\mu$"
@@ -127,7 +127,7 @@ log_normal:
- "file": "log_normal_epochs_partial_effect.pdf"
"covariate_array": "model.trainer.np_epochs"
"values_array": [1,10,25,50]
"title": "$S(t)$ for Log-Normal AFR"
"title": "$S(t)$ for Log-Normal AFT"
"ylabel": "$\\mathbb{P}~(T>t)$"
"xlabel": "Time $t$ (seconds)"
"legend_kwargs": {