Notebook branch #173

Merged: 331 commits, Mar 20, 2024
Commits
14e32a8
linting
simplymathematics Jan 15, 2024
f25d72b
linting
simplymathematics Jan 15, 2024
358e759
fixed cifar100 pytorch example script
simplymathematics Jan 15, 2024
4b0ff7d
more resilient wait and cleaning scripts
simplymathematics Jan 15, 2024
4345985
+GZIP example
simplymathematics Jan 22, 2024
c49cd72
bug fixes
simplymathematics Jan 22, 2024
44d6145
fix latex nan bug
simplymathematics Jan 23, 2024
ef5612d
bug fixes
simplymathematics Jan 23, 2024
dc7c478
add index to compilation csv
simplymathematics Jan 23, 2024
9b57058
better defence merging
simplymathematics Jan 23, 2024
ff9d924
fixed bug where x,y scale are None
simplymathematics Jan 23, 2024
9dcc132
update cifar100 confs
simplymathematics Jan 23, 2024
ce45cde
fixed cleaning bug
simplymathematics Jan 23, 2024
285fcea
fixed afr plot rendering bug
simplymathematics Jan 24, 2024
90e41f1
add check for negative predict time
simplymathematics Jan 24, 2024
0254a92
update configs
simplymathematics Jan 24, 2024
f82e1d8
fixed failure rate bug and updated confs
simplymathematics Jan 24, 2024
0971505
Merge branch 'cifar100' of github.com:simplymathematics/deckard into …
simplymathematics Jan 24, 2024
b09a423
change plot default from eps to pdf
simplymathematics Jan 24, 2024
80998a8
fix bug in calculating failure rate when attack size != train size
simplymathematics Jan 24, 2024
c76394e
update mnist confs
simplymathematics Jan 25, 2024
5491358
specify attack size at the command line
simplymathematics Jan 25, 2024
74fbe66
linting
simplymathematics Jan 26, 2024
ab6459f
update all plot configs
simplymathematics Jan 26, 2024
351f597
Fix compile script (#172)
simplymathematics Jan 24, 2024
8f08882
added dummy vars, fixed plots
simplymathematics Jan 26, 2024
66515ff
fix afr.py bugs
simplymathematics Jan 26, 2024
33eeaf4
bad merge?
simplymathematics Jan 26, 2024
66f8319
merge
simplymathematics Jan 26, 2024
6625ae5
fix bugs
simplymathematics Jan 26, 2024
ae10c9f
Merge branch 'main' of github.com:simplymathematics/deckard into note…
simplymathematics Jan 26, 2024
cf265db
linting
simplymathematics Jan 26, 2024
1254a67
linting
simplymathematics Jan 26, 2024
3060e4c
linting
simplymathematics Jan 26, 2024
1da196c
linting
simplymathematics Jan 26, 2024
6609a53
linting
simplymathematics Jan 26, 2024
098af1c
linting
simplymathematics Jan 26, 2024
0f360f4
update linter
simplymathematics Jan 26, 2024
3150f9e
update linter
simplymathematics Jan 26, 2024
4455ef7
update linter
simplymathematics Jan 26, 2024
d17765e
update linter
simplymathematics Jan 26, 2024
6b6caef
linting
simplymathematics Jan 27, 2024
5978b44
update setup, .gitignore
simplymathematics Jan 30, 2024
e8257ed
fix failure rate bug (again)
simplymathematics Jan 30, 2024
aaea279
most up-to-date plots
simplymathematics Jan 31, 2024
bf5eaf8
update failure rate from h to f in pytorch examples
simplymathematics Feb 1, 2024
609bc8d
remove intercept and scale parameters from afr plots
simplymathematics Feb 1, 2024
56b3aa3
remove rows where the score is an error
simplymathematics Feb 1, 2024
ca6c7cc
update plot confs for pytorch example
simplymathematics Feb 1, 2024
36ea908
allow setting filename from command line of AFR script
simplymathematics Feb 1, 2024
aa50ddc
plot legend tweaks
simplymathematics Feb 1, 2024
8ebdaa1
linting
simplymathematics Feb 4, 2024
2794006
Merge branch 'main' of github.com:simplymathematics/deckard into note…
simplymathematics Feb 4, 2024
5a3226b
linting
simplymathematics Feb 4, 2024
24b1b36
merge with main
simplymathematics Feb 4, 2024
de5ad1b
Update Dockerfile
simplymathematics Feb 4, 2024
f5b0287
Merge branch 'simplymathematics-workflow-diskspace-patch' of https://…
simplymathematics Feb 4, 2024
d9bd067
Merge branch 'notebook-branch' of https://github.com/simplymathematic…
simplymathematics Feb 4, 2024
b331acd
update dockerfile
simplymathematics Feb 4, 2024
5a8f88b
linting
simplymathematics Feb 4, 2024
2b3cf6e
update gzip configs
simplymathematics Feb 4, 2024
f3123d9
better logging
simplymathematics Feb 4, 2024
d300763
add url validation for data pipeline
simplymathematics Feb 4, 2024
5e90ee8
git rm
simplymathematics Feb 4, 2024
80d6eaa
update truthseeker yaml
simplymathematics Feb 4, 2024
af7066d
add gzip .gitignore
simplymathematics Feb 4, 2024
db34da2
gzip dvc changes
simplymathematics Feb 4, 2024
a0de7bf
add sampling during training
simplymathematics Feb 5, 2024
533ad3e
more resilient find_best script
simplymathematics Feb 5, 2024
670d81a
fixed bug with finding min/max when data is non-numeric
simplymathematics Feb 5, 2024
32b13e7
add support for url/local datasets
simplymathematics Feb 5, 2024
bd2a7a5
add column dropping for data parsing
simplymathematics Feb 6, 2024
ac07dff
find best for multi-objective search
simplymathematics Feb 6, 2024
a773639
better cleaning for experiments without attacks
simplymathematics Feb 6, 2024
0874768
better filetype support when plotting
simplymathematics Feb 6, 2024
266f006
load distance matrix from disk (optionally)
simplymathematics Feb 6, 2024
7b4a517
update confs for gzip
simplymathematics Feb 6, 2024
9c178c4
Merge branches 'notebook-branch' and 'notebook-branch' of github.com:…
simplymathematics Feb 6, 2024
53d8bb6
update sklearn gitignore
simplymathematics Feb 6, 2024
192c014
fix data double initialization
simplymathematics Feb 6, 2024
2b17cbd
better logging for gzip classifier
simplymathematics Feb 6, 2024
85f8987
update confs
simplymathematics Feb 6, 2024
89dfc13
minor refactor to prepare for optimization (#175)
ansuz Feb 7, 2024
1d2f85e
update gzip confs
simplymathematics Feb 7, 2024
14b369f
update .gitignores
simplymathematics Feb 7, 2024
10bfde1
revert changes to find_best script
simplymathematics Feb 7, 2024
81ac63f
update setup file
simplymathematics Feb 7, 2024
ad55914
fix absolute path bug in save_params_file function
simplymathematics Feb 7, 2024
c2dbfe7
Merge branch 'notebook-branch' of github.com:simplymathematics/deckar…
simplymathematics Feb 7, 2024
f1c8c00
minimum working example
simplymathematics Feb 7, 2024
83eeb83
add shebang to make gzip-classifier a standalone executable
simplymathematics Feb 7, 2024
70b8fa1
add docstrings
simplymathematics Feb 7, 2024
e7d4d2e
cleanup imports
simplymathematics Feb 7, 2024
1f01fc1
write "main" function
simplymathematics Feb 7, 2024
154a554
remove unused variable
simplymathematics Feb 7, 2024
377d57a
fix indent
simplymathematics Feb 7, 2024
28006ed
make main function more readable
simplymathematics Feb 7, 2024
a14df2d
update shebang to python3
simplymathematics Feb 7, 2024
fe04a45
fix training time bug
simplymathematics Feb 7, 2024
be04e84
update docstring
simplymathematics Feb 7, 2024
dca608f
add support for overrides in experiment script
simplymathematics Feb 7, 2024
e420c59
fixed time display bug
simplymathematics Feb 7, 2024
8a09781
add default params for gzip
simplymathematics Feb 7, 2024
02a3d34
cleaned up find_best_script
simplymathematics Feb 9, 2024
5dd4235
-- cruft
simplymathematics Feb 9, 2024
347cc0c
add override support for experiment script
simplymathematics Feb 9, 2024
9c1b515
update configs and run
simplymathematics Feb 9, 2024
150f9e3
fix distance matrix bugs
simplymathematics Feb 10, 2024
b82b8c6
merge without editing the params file
simplymathematics Feb 10, 2024
f32145c
revert changes to find distance matrix
simplymathematics Feb 10, 2024
7d844b0
cleanup cruft, add docstring
simplymathematics Feb 11, 2024
cbedfb6
better file merging
simplymathematics Feb 11, 2024
c80a1d5
better param merging
simplymathematics Feb 11, 2024
b95b13d
reduce default truthseeker sample size
simplymathematics Feb 11, 2024
617ffe3
better binary classifier handling for arbitrary label matrices
simplymathematics Feb 11, 2024
78b2e92
fix drop bug
simplymathematics Feb 11, 2024
652c3bd
add shebang to main
simplymathematics Feb 11, 2024
7ac47d2
remove old conf
simplymathematics Feb 11, 2024
675a425
update default conf for gzip
simplymathematics Feb 11, 2024
2a60005
add stage for testing each training optimization method
simplymathematics Feb 11, 2024
47aeec2
add stage for preparing distance matrices
simplymathematics Feb 11, 2024
a5e7124
added grid search to method optimization
simplymathematics Feb 11, 2024
8f97ef4
add matrix to find_best_m
simplymathematics Feb 11, 2024
58144e9
and matrix to model_optimise_m
simplymathematics Feb 11, 2024
28a28c3
update default gzip_classifier.yaml
simplymathematics Feb 11, 2024
371f5f4
fix broken test
simplymathematics Feb 11, 2024
202b71f
fix drop bug
simplymathematics Feb 11, 2024
c5f6e8b
++dvc.yaml
simplymathematics Feb 11, 2024
5d30c11
insert CLI stop for identifying malformed files
simplymathematics Feb 12, 2024
3e092af
make experiment.py params a dict instead of omegaconf object
simplymathematics Feb 12, 2024
5de4792
update .gitignore
simplymathematics Feb 12, 2024
cf6d26a
refactor gzip classifier, update configs
simplymathematics Feb 12, 2024
149ba71
add cli args to gzip_classifier
simplymathematics Feb 13, 2024
37529f3
add numeric dataset for testing
simplymathematics Feb 13, 2024
b2d9961
refactor compression
simplymathematics Feb 13, 2024
d9cd6d2
add support for string distance metrics
simplymathematics Feb 13, 2024
b6ff6dd
test_each_method stage
simplymathematics Feb 13, 2024
972c123
fix bug when X_train isn't np.ndarray
simplymathematics Feb 13, 2024
ebe2c9e
move function
simplymathematics Feb 14, 2024
f64b82f
added more init params
simplymathematics Feb 14, 2024
bd59781
more distances
simplymathematics Feb 14, 2024
9dfd2c4
trying on sklearn knn classifier
simplymathematics Feb 14, 2024
17cbd51
better distance support, logging, argparse
simplymathematics Feb 14, 2024
7296fd0
update confs
simplymathematics Feb 14, 2024
a8ee82f
removed line search
simplymathematics Feb 14, 2024
f3fdcbe
refactor pareto set, add string replacement
simplymathematics Feb 21, 2024
b49445f
fix params.yaml bug
simplymathematics Feb 21, 2024
b2c1bc4
more flexible plot configuration
simplymathematics Feb 21, 2024
3bca0ce
add data preparation script
simplymathematics Feb 21, 2024
cfe87ae
fix params.yaml bug
simplymathematics Feb 21, 2024
a5e6591
update default params for gzip example
simplymathematics Feb 21, 2024
a85ccec
update gzip confs
simplymathematics Feb 21, 2024
9da794d
add label encoding to fit
simplymathematics Feb 21, 2024
ef97d2b
more datasets in gzip main()
simplymathematics Feb 21, 2024
977483e
finish experiment run, update dvc.lock file
simplymathematics Feb 22, 2024
a577459
update gzip confs
simplymathematics Feb 22, 2024
197eef2
better logging
simplymathematics Feb 22, 2024
cf5b5ef
cleanup cruft
simplymathematics Feb 22, 2024
927dff1
better logging for data prep script in gzip example
simplymathematics Feb 22, 2024
3b3e55c
update confs
simplymathematics Feb 25, 2024
b1b1314
better scoring support
simplymathematics Feb 25, 2024
9eaed36
add support for precompressing strings in gzip classifier, other mode…
simplymathematics Feb 25, 2024
7daa9e5
update dvc (running)
simplymathematics Feb 25, 2024
26a786b
collapse compressor intro metric
simplymathematics Feb 28, 2024
5d558b2
update model confs to remove compressors
simplymathematics Feb 28, 2024
bf3ee7a
optimizer support, svc per class
simplymathematics Feb 28, 2024
27aed8d
add support for arbitrary kwargs
simplymathematics Feb 29, 2024
dfcf6c2
update main args, add support for precompressing strings in dataset
simplymathematics Feb 29, 2024
83d45a3
partial revert + bug fixes
simplymathematics Feb 29, 2024
83c3a14
stop tracking gzip params
simplymathematics Feb 29, 2024
eb5ce5d
added params yaml back
simplymathematics Feb 29, 2024
e0a48d8
stop tracking params.yaml
simplymathematics Feb 29, 2024
3671aff
better support for finding best trials
simplymathematics Mar 5, 2024
a3e515c
batch support
simplymathematics Mar 5, 2024
e61b105
update gitignore
simplymathematics Mar 5, 2024
3a7ac50
update configs
simplymathematics Mar 5, 2024
a0a06c9
fix attack kwarg bug
simplymathematics Mar 5, 2024
7d667de
add optuna callback for dumping the study to file
simplymathematics Mar 5, 2024
051d31b
exception handling for plots.py
simplymathematics Mar 5, 2024
5b7d11c
draft script for calculating SHAPr score
simplymathematics Mar 5, 2024
9abd9d5
more resilient data object
simplymathematics Mar 5, 2024
0addd2a
fixed experiment FileConfig bug when only defaults were set
simplymathematics Mar 5, 2024
52ca7a5
update gitignore
simplymathematics Mar 5, 2024
fd49c02
++torch loadeR
simplymathematics Mar 5, 2024
d2f1e23
refactor experiment.py into data, model, and experiment.py scripts
simplymathematics Mar 5, 2024
467b7d1
better file list parsing
simplymathematics Mar 5, 2024
ca5cf26
more resilient parsing
simplymathematics Mar 5, 2024
244667c
more resilient parsing
simplymathematics Mar 5, 2024
c3f429d
cleaned up cleaning script
simplymathematics Mar 5, 2024
7f48f47
compile script will now prompt the user to fix the file at run-time
simplymathematics Mar 5, 2024
08c1340
model.py layer
simplymathematics Mar 5, 2024
5ae9af6
add support for model training during attack, if necessary
simplymathematics Mar 7, 2024
3212bec
clean up data/model pipelines a bit
simplymathematics Mar 7, 2024
fea493e
revert sampler default, remove noisy logs
simplymathematics Mar 7, 2024
f5b73c3
removed some noisy logging
simplymathematics Mar 7, 2024
4f50d45
pass all files to each object's call function
simplymathematics Mar 7, 2024
aee586a
removed unneeded code after pipeline refactor
simplymathematics Mar 7, 2024
4b19c05
update gzip confs
simplymathematics Mar 7, 2024
8a880ab
add random undersampling to the balanced dataset
simplymathematics Mar 7, 2024
8c182f9
refactor optuna callback
simplymathematics Mar 7, 2024
55dc8b4
add merge.py script
simplymathematics Mar 7, 2024
c51b794
add csv merging script
simplymathematics Mar 7, 2024
4b22df7
allow saving/loading of sampled csvs
simplymathematics Mar 7, 2024
2663958
add attack layer, refactor others for better submodule support
simplymathematics Mar 7, 2024
5ebf0b8
run the start command during call
simplymathematics Mar 7, 2024
564cd13
refactor for submodule support
simplymathematics Mar 7, 2024
66ae00c
add upper triangular matrix, refactor as submodule, more condensers
simplymathematics Mar 7, 2024
dd488ef
add dvc pre/post commits
simplymathematics Mar 7, 2024
36e2e7e
update gzip gitignore
simplymathematics Mar 7, 2024
9fdfe7b
update dvc files
simplymathematics Mar 7, 2024
b10c5f4
prepare clean_data to be proper submodule
simplymathematics Mar 13, 2024
f5f9e1d
re-add index to compile script saving
simplymathematics Mar 13, 2024
c9a5dd3
refactor for submodule, support for catplot when hue is not set
simplymathematics Mar 13, 2024
f00c2f0
minor refactor for condensing
simplymathematics Mar 13, 2024
388dd17
numpy handling (instead of native dataframes)
simplymathematics Mar 13, 2024
ad9b885
conf updates
simplymathematics Mar 13, 2024
c4ef997
playing with batched condensing (WIP)
simplymathematics Mar 13, 2024
dedd8ad
prepare plots script for submodule
simplymathematics Mar 13, 2024
499d45f
README.md
simplymathematics Mar 13, 2024
9aa90d0
linting
simplymathematics Mar 13, 2024
10ce107
precommit order of operations
simplymathematics Mar 13, 2024
644ecb5
linting
simplymathematics Mar 13, 2024
a7c0266
linting
simplymathematics Mar 13, 2024
2bc52e9
linting
simplymathematics Mar 13, 2024
d286c3e
update dvc.lock
simplymathematics Mar 13, 2024
7b7d758
more git ignoring
simplymathematics Mar 13, 2024
ddfaf28
more git ignoring
simplymathematics Mar 13, 2024
61efcf0
linting
simplymathematics Mar 13, 2024
3bb8020
linting
simplymathematics Mar 13, 2024
03c9c65
remove dvc from pre commit config
simplymathematics Mar 13, 2024
c97fa86
linting
simplymathematics Mar 13, 2024
a489b02
allow renaming data files
simplymathematics Mar 19, 2024
ac1def5
linting
simplymathematics Mar 19, 2024
569f3f6
linting
simplymathematics Mar 19, 2024
693cd5e
linting
simplymathematics Mar 19, 2024
82b5709
linting
simplymathematics Mar 19, 2024
ab376d2
linting
simplymathematics Mar 19, 2024
2f41b15
linting
simplymathematics Mar 19, 2024
d72a55d
linting
simplymathematics Mar 19, 2024
b5704c5
linting
simplymathematics Mar 19, 2024
c317ab2
linting
simplymathematics Mar 19, 2024
3e7aca6
update plots
simplymathematics Mar 19, 2024
c234f31
add targets to base/data objects
simplymathematics Mar 19, 2024
461b1b0
art pipeline bug fixes
simplymathematics Mar 19, 2024
8b7ac14
linting
simplymathematics Mar 19, 2024
01e5efc
linting
simplymathematics Mar 19, 2024
2fc37e4
remove _target_ from base/data
simplymathematics Mar 20, 2024
0d995e9
remove _target_ from base/data
simplymathematics Mar 20, 2024
4149976
remove _target_ from base/data
simplymathematics Mar 20, 2024
1f7a758
remove _target_ from base/data
simplymathematics Mar 20, 2024
2 changes: 2 additions & 0 deletions .github/workflows/black.yml
@@ -7,4 +7,6 @@ jobs:
- uses: actions/checkout@v2
- uses: psf/black@stable
with:
options: "--check --verbose"
src: "deckard/"
jupyter: true
16 changes: 16 additions & 0 deletions .gitignore
@@ -126,3 +126,19 @@ deckard/deckard.egg-info/*

*log.txt
*.hydra


# envs
env/


# random pdfs
*.pdf
# random pngs
*.png

# screenlog
screenlog.*

# tmp.py
tmp.py
62 changes: 31 additions & 31 deletions .pre-commit-config.yaml
@@ -1,36 +1,36 @@
repos:
- repo: https://github.com/asottile/add-trailing-comma
rev: v2.2.3
hooks:
- id: add-trailing-comma
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0 # Use the ref you want to point at
hooks:
- id: check-builtin-literals
- id: check-case-conflict
- id: check-symlinks
- id: check-toml
- id: detect-private-key
- id: end-of-file-fixer
- id: check-yaml
args : ['--unsafe']
- repo: https://github.com/hadialqattan/pycln
rev: v2.1.1 # Possible releases: https://github.com/hadialqattan/pycln/releases
hooks:
- id: pycln
args: [deckard/]
- repo: https://github.com/pycqa/flake8
rev: '5.0.4' # pick a git hash / tag to point to
hooks:
- id: flake8
exclude: __init__.py
args: [--ignore=E501 W503]
- repo: https://github.com/psf/black
rev: 22.8.0
hooks:
- id: black
- repo: https://github.com/asottile/add-trailing-comma
rev: v3.1.0
hooks:
- id: add-trailing-comma
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0 # Use the ref you want to point at
hooks:
- id: check-builtin-literals
- id: check-case-conflict
- id: check-symlinks
- id: check-toml
- id: detect-private-key
- id: end-of-file-fixer
- id: check-yaml
args: [--unsafe]
- repo: https://github.com/hadialqattan/pycln
rev: v2.4.0 # Possible releases: https://github.com/hadialqattan/pycln/releases
hooks:
- id: pycln
args: [deckard/]
- repo: https://github.com/psf/black
rev: 24.2.0
hooks:
- id: black
# It is recommended to specify the latest version of Python
# supported by your project here, or alternatively use
# pre-commit's default_language_version, see
# https://pre-commit.com/#top_level-default_language_version
language_version: python3
language_version: python3
- repo: https://github.com/pycqa/flake8
rev: 7.0.0 # pick a git hash / tag to point to
hooks:
- id: flake8
exclude: __init__.py
args: [--ignore=E501 W503]
2 changes: 0 additions & 2 deletions Dockerfile
@@ -6,8 +6,6 @@ RUN python3 -m pip install nvidia-pyindex nvidia-cuda-runtime-cu11
RUN git clone https://github.com/simplymathematics/deckard.git
WORKDIR /deckard
RUN python3 -m pip install --editable .
RUN python3 -m pip install pytest torch torchvision tensorflow
RUN git clone https://github.com/Trusted-AI/adversarial-robustness-toolbox.git
RUN cd adversarial-robustness-toolbox && python3 -m pip install .
RUN apt install python-is-python3
RUN pytest test
2 changes: 1 addition & 1 deletion deckard/__init__.py
@@ -46,7 +46,7 @@
},
},
"loggers": {
"deckard": {"handlers": ["default"]},
"deckard": {"handlers": ["default"], "level": "INFO", "propagate": True},
"tests": {"handlers": ["test"], "level": "DEBUG", "propagate": True},
},
}
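Review note: the `deckard` logger entry above gains an explicit `level` and `propagate`. A minimal sketch of what those two keys do, using the standard library's `logging.config.dictConfig`. The `demo` logger name, the `plain` formatter, and the in-memory stream are assumptions made for illustration, not the project's actual configuration.

```python
import io
import logging
import logging.config

# Hypothetical in-memory stream so the example can inspect what was logged.
stream = io.StringIO()

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"plain": {"format": "%(name)s:%(levelname)s:%(message)s"}},
    "handlers": {
        "default": {
            "class": "logging.StreamHandler",
            "formatter": "plain",
            "stream": stream,
        },
    },
    "loggers": {
        # Same shape as the updated entry in the diff, under a demo name.
        "demo": {"handlers": ["default"], "level": "INFO", "propagate": True},
    },
}

logging.config.dictConfig(config)
log = logging.getLogger("demo")
log.debug("hidden: below the INFO threshold")
log.info("shown: at the INFO threshold")
logged = stream.getvalue()
```

Without a `level` key, a logger is left at `NOTSET` and inherits the effective level of its parent (for the root logger, `WARNING` by default), so `INFO` records would likely have been dropped before this change unless the root level was lowered elsewhere.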
11 changes: 6 additions & 5 deletions deckard/__main__.py
@@ -1,5 +1,4 @@
""""Runs a submodule passed as an arg."""

#!/usr/bin/env python3
import argparse
import subprocess
import logging
@@ -44,9 +43,11 @@ def parse_and_repro(args, default_config="default.yaml", config_dir="conf"):
if len(args) == 0:
assert (
save_params_file(
config_dir=Path(Path(), config_dir)
if not Path(config_dir).is_absolute()
else Path(config_dir),
config_dir=(
Path(Path(), config_dir)
if not Path(config_dir).is_absolute()
else Path(config_dir)
),
config_file=default_config,
)
is None
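Review note: the hunk above only wraps the conditional expression in parentheses, which appears to be a formatting change from the newer Black release rather than a behavior change. A minimal sketch of the path logic it expresses, with a hypothetical helper name `resolve_config_dir` that does not exist in the project:

```python
from pathlib import Path


def resolve_config_dir(config_dir: str) -> Path:
    """Return config_dir unchanged if absolute, else relative to the working directory."""
    path = Path(config_dir)
    # Path() is the current directory ("."), so this mirrors Path(Path(), config_dir).
    return path if path.is_absolute() else Path(Path(), path)


relative = resolve_config_dir("conf")
absolute = resolve_config_dir(str(Path.cwd() / "conf"))
```

Note that the relative branch stays relative (`Path("conf")`); it is only resolved against the working directory when the path is later opened.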
40 changes: 37 additions & 3 deletions deckard/base/attack/attack.py
@@ -8,6 +8,9 @@
from omegaconf import DictConfig, OmegaConf
from hydra.utils import instantiate
from art.utils import to_categorical, compute_success
from sklearn.utils.validation import check_is_fitted
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from random import randint
from ..data import Data
from ..model import Model
@@ -117,7 +120,13 @@ class EvasionAttack:
kwargs: Union[dict, None] = field(default_factory=dict)

def __init__(
self, name: str, data: Data, model: Model, init: dict, attack_size=-1, **kwargs
self,
name: str,
data: Data,
model: Model,
init: dict,
attack_size=-1,
**kwargs,
):
self.name = name
self.data = data
@@ -148,6 +157,10 @@ def __call__(
if attack_file is not None and Path(attack_file).exists():
samples = self.data.load(attack_file)
else:
print(f"Type of self.init: {type(self.init)}")
print(f"Type of self.init.model: {type(self.init.model)}")
print(f"Type of model: {type(model)}")

atk = self.init(model=model, attack_size=self.attack_size)

if targeted is True:
@@ -466,7 +479,13 @@ class InferenceAttack:
kwargs: Union[dict, None] = field(default_factory=dict)

def __init__(
self, name: str, data: Data, model: Model, init: dict, attack_size=-1, **kwargs
self,
name: str,
data: Data,
model: Model,
init: dict,
attack_size=-1,
**kwargs,
):
self.name = name
self.data = data
@@ -577,7 +596,13 @@ class ExtractionAttack:
kwargs: Union[dict, None] = field(default_factory=dict)

def __init__(
self, name: str, data: Data, model: Model, init: dict, attack_size=-1, **kwargs
self,
name: str,
data: Data,
model: Model,
init: dict,
attack_size=-1,
**kwargs,
):
self.name = name
self.data = data
@@ -798,12 +823,21 @@ def __call__(
adv_predictions_file=None,
adv_probabilities_file=None,
adv_losses_file=None,
**kwargs,
):
name = self.init.name
kwargs = deepcopy(self.kwargs)
kwargs.update({"init": self.init.kwargs})
data = self.data()
data, model = self.model.initialize(data)
if isinstance(model, BaseEstimator):
try:
check_is_fitted(model), "Model must be fitted before calling attack."
except NotFittedError as e:
logger.warning(
f"Model not fitted. Fitting model before attack. Error: {e}",
)
model, _ = self.model.fit(data=data, model=model)
if "art" not in str(type(model)):
model = self.model.art(model=model, data=data)
if self.method == "evasion":
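Review note: the hunk above fits a scikit-learn model on demand when an attack is requested on an unfitted estimator. A minimal standalone sketch of that pattern follows. The helper name `ensure_fitted`, the toy data, and the choice of `LogisticRegression` are assumptions for illustration; the project itself fits through `self.model.fit(data=..., model=...)`.

```python
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def ensure_fitted(model, X, y):
    """Fit scikit-learn estimators that are not fitted yet; return (model, fitted_here)."""
    if isinstance(model, BaseEstimator):
        try:
            check_is_fitted(model)
        except NotFittedError:
            # Only reached for estimators without fitted attributes.
            model.fit(X, y)
            return model, True
    return model, False


# Toy data, purely illustrative.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model, fitted_now = ensure_fitted(LogisticRegression(), X, y)
_, fitted_again = ensure_fitted(model, X, y)
```

One thing worth a second look in the diff: `check_is_fitted(model), "Model must be fitted before calling attack."` builds a tuple, so the message string is discarded. If a custom message is wanted, `check_is_fitted` accepts it through its `msg` keyword argument.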
61 changes: 42 additions & 19 deletions deckard/base/data/data.py
@@ -4,10 +4,10 @@
from dataclasses import dataclass, field
from pathlib import Path
from typing import Union

import numpy as np
from pandas import DataFrame, read_csv, Series

from omegaconf import OmegaConf
from validators import url
from ..utils import my_hash
from .generator import DataGenerator
from .sampler import SklearnDataSampler
@@ -28,6 +28,7 @@ class Data:
)
target: Union[str, None] = None
name: Union[str, None] = None
drop: list = field(default_factory=list)

def __init__(
self,
@@ -36,6 +37,8 @@ def __init__(
sample: SklearnDataSampler = None,
sklearn_pipeline: SklearnDataPipeline = None,
target: str = None,
drop: list = [],
**kwargs,
):
"""Initialize the data object. If the data is generated, then generate the data and sample it. If the data is loaded, then load the data and sample it.

@@ -46,9 +49,6 @@ def __init__(
sklearn_pipeline (SklearnDataPipeline, optional): The sklearn pipeline. Defaults to None.
target (str, optional): The target column. Defaults to None.
"""
logger.info(
f"Instantiating {self.__class__.__name__} with name={name} and generate={generate} and sample={sample} and sklearn_pipeline={sklearn_pipeline} and target={target}",
)
if generate is not None:
self.generate = (
generate
@@ -66,16 +66,19 @@ def __init__(
else:
self.sample = SklearnDataSampler()
if sklearn_pipeline is not None:
sklearn_pipeline = OmegaConf.to_container(
OmegaConf.create(sklearn_pipeline),
)
self.sklearn_pipeline = (
sklearn_pipeline
if isinstance(sklearn_pipeline, (SklearnDataPipeline, type(None)))
if isinstance(sklearn_pipeline, (SklearnDataPipeline))
else SklearnDataPipeline(**sklearn_pipeline)
)
else:
self.sklearn_pipeline = None
self.drop = drop
self.target = target
self.name = name if name is not None else my_hash(self)
logger.debug(f"Instantiating Data with id: {self.get_name()}")

def get_name(self):
"""Get the name of the data object."""
@@ -91,7 +94,6 @@ def initialize(self, filename=None):
"""
if filename is not None and Path(filename).exists():
result = self.load(filename)
assert len(result) == 4, f"Data is not generated: {self.name}"
elif self.generate is not None:
result = self.generate()
else:
@@ -100,14 +102,23 @@ def initialize(self, filename=None):
assert self.target is not None, "Target is not specified"
y = result[self.target]
X = result.drop(self.target, axis=1)
X = np.array(X)
y = np.array(y)
if self.drop != []:
X = X.drop(self.drop, axis=1)
X = X.to_numpy()
y = y.to_numpy()
result = [X, y]
else:
if self.drop != []:
raise ValueError(
f"Drop is not supported for non-DataFrame data. Data is type {type(result)}",
)
if len(result) == 2:
result = self.sample(*result)
assert (
len(result) == 4
), f"Data is not generated: {self.name} {result}. Length: {len(result)},"
if self.sklearn_pipeline is not None:
result = self.sklearn_pipeline(*result)
return result
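Review note: `Data.__init__` now takes `drop: list = []`, and `initialize` reads `self.drop` above. A mutable default is shared across calls, which is harmless while the list is only read but surprising if anything ever appends to it. A minimal sketch of the pitfall and the usual `None` default follows; `DataSketch` and `shared_default` are hypothetical names, not project code.

```python
from typing import List, Optional


class DataSketch:
    """Hypothetical stand-in for a config object that takes columns to drop."""

    def __init__(self, target: Optional[str] = None, drop: Optional[List[str]] = None):
        self.target = target
        # Copy into a fresh list so instances never share state.
        self.drop = list(drop) if drop is not None else []


def shared_default(drop: list = []):
    """Demonstrates the pitfall: the same list object is reused across calls."""
    drop.append("col")
    return drop


first = shared_default()
second = shared_default()
a, b = DataSketch(), DataSketch()
a.drop.append("id")
```

With the `None` default, mutating `a.drop` leaves `b.drop` untouched, whereas both calls to `shared_default` return the very same list.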

def load(self, filename) -> DataFrame:
@@ -125,6 +136,8 @@ def load(self, filename) -> DataFrame:
elif suffix in [".pkl", ".pickle"]:
with open(filename, "rb") as f:
data = pickle.load(f)
elif suffix in [".npz"]:
data = np.load(filename)
else: # pragma: no cover
raise ValueError(f"Unknown file type {suffix}")
return data
@@ -138,6 +151,10 @@ def save(self, data, filename):
logger.info(f"Saving data to {filename}")
suffix = Path(filename).suffix
Path(filename).parent.mkdir(parents=True, exist_ok=True)
if isinstance(data, dict):
for k, v in data.items():
v = str(v)
data[k] = v
if suffix in [".json"]:
if isinstance(data, (Series, DataFrame)):
data = data.to_dict()
@@ -155,16 +172,20 @@ def save(self, data, filename):
else: # pragma: no cover
raise ValueError(f"Unknown data type {type(data)} for {filename}.")
with open(filename, "w") as f:
json.dump(data, f)
json.dump(data, f, indent=4, sort_keys=True)
elif suffix in [".csv"]:
assert isinstance(
data,
(Series, DataFrame, dict, np.ndarray),
), f"Data must be a Series, DataFrame, or dict, not {type(data)} to save to {filename}"
DataFrame(data).to_csv(filename, index=False)
if isinstance(data, (np.ndarray)):
data = DataFrame(data)
data.to_csv(filename, index=False)
elif suffix in [".pkl", ".pickle"]:
with open(filename, "wb") as f:
pickle.dump(data, f)
elif suffix in [".npz"]:
np.savez(filename, data)
else: # pragma: no cover
raise ValueError(f"Unknown file type {type(suffix)} for {suffix}")
assert Path(filename).exists()
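Review note: `save` now stringifies dict values and writes JSON with `indent=4, sort_keys=True`. A minimal sketch of that JSON branch using only the standard library follows. The helper name `save_json` and the sample `scores` dict are assumptions; unlike the diff, this version stringifies a copy so the caller's dict keeps its original value types.

```python
import json
import tempfile
from pathlib import Path


def save_json(data: dict, filename: str) -> str:
    """Write a dict as stable, human-readable JSON, stringifying values on a copy."""
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    # Build a new dict rather than mutating the caller's object.
    serializable = {k: str(v) for k, v in data.items()}
    with open(filename, "w") as f:
        json.dump(serializable, f, indent=4, sort_keys=True)
    return str(filename)


# Hypothetical payload mixing a float and a Path, which json cannot encode directly.
scores = {"b_time": 1.5, "a_path": Path("out") / "model.pkl"}
with tempfile.TemporaryDirectory() as tmp:
    target = save_json(scores, str(Path(tmp) / "nested" / "scores.json"))
    text = Path(target).read_text()
    loaded = json.loads(text)
```

Sorted keys and fixed indentation keep the output stable across runs, which tends to make diffs of tracked result files easier to read. The trade-off of stringifying is that numbers come back as strings on load.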
@@ -174,19 +195,19 @@ def __call__(
data_file=None,
train_labels_file=None,
test_labels_file=None,
**kwargs,
) -> list:
"""Loads data from file if it exists, otherwise generates data and saves it to file. Returns X_train, X_test, y_train, y_test as a list of arrays, typed according to the framework.
:param filename: str
:return: list
"""
result_dict = {}
if data_file is not None and Path(data_file).exists():
data = self.load(data_file)
assert len(data) == 4, f"Some data is missing: {self.name}"
if Path(self.name).is_file() or url(self.name):
new_data_file = data_file
data_file = self.name
else:
data = self.initialize(filename=data_file)
assert len(data) == 4, f"Some data is missing: {self.name}"
data_file = self.save(data, data_file)
new_data_file = data_file
result_dict = {}
data = self.initialize(data_file)
result_dict["data"] = data
if train_labels_file is not None:
self.save(data[2], train_labels_file)
@@ -198,4 +219,6 @@ def __call__(
assert Path(
test_labels_file,
).exists(), f"Error saving test labels to {test_labels_file}"
if new_data_file is not None:
self.save(data, new_data_file)
return data