O2vae integration rebase #75

Open
wants to merge 152 commits into master
Changes from all commits (152 commits)
1a44dca
Attempt at cli
ctr26 Jan 18, 2024
88d1004
local changes to run
afoix Jan 20, 2024
de5ee1f
command line arguments
afoix Jan 20, 2024
f5b3f3f
enable testing + uncomment dataset
afoix Jan 20, 2024
d20436d
added a slurm python script
afoix Jan 20, 2024
2dff311
fix cli type
afoix Jan 20, 2024
a541309
choose memory allocation base on latent space size
afoix Jan 20, 2024
e4aea5e
dynamically chose n gpus based on latent space size + fix mem allocat…
afoix Jan 20, 2024
e9a7b6c
fix gpu allocation typo
afoix Jan 20, 2024
00a4ec8
added a --clear-checkpoints clarg
afoix Jan 23, 2024
132786c
run individual jobs in own folder to work around checkpoints
afoix Jan 23, 2024
d41eeee
modification for slurm
afoix Feb 22, 2024
61edfaf
changes in the shape embed script
afoix Feb 22, 2024
0dcb45f
fix merge commit + add command line args for dataset (name and path) …
afoix Feb 22, 2024
01b0ccc
duplicated slurm script + specify dataset
afoix Feb 22, 2024
af46bcc
Fix wandb logger
afoix Feb 22, 2024
1ccd155
Add helakyoto dataset to the slurm script
afoix Feb 22, 2024
5765e90
Added allen dataset
afoix Mar 1, 2024
a85cc07
Limit time per job increased to 24h
afoix Mar 1, 2024
e4be5ab
change back to use dataset name from clarg + change default wandb job…
afoix Mar 3, 2024
6222e79
added back dataset subseting
afoix Mar 3, 2024
775035a
Added a tiny dataset for quick debugging (commented out in the slurm …
afoix Mar 3, 2024
55b52b1
use specific gpu resource
afoix Mar 4, 2024
259ff42
put back frobenius norm false
afoix Mar 6, 2024
861f678
all changes
afoix Mar 27, 2024
3f8a9a0
first structure
afoix Mar 27, 2024
cf40209
Properly overwrite default params from clargs
afoix Apr 1, 2024
e20e029
Use DatasetFolder to load .npy and turn the dist matrix into a 3 chan…
afoix Apr 1, 2024
d2193d0
Disable checkpoints in training by default (maybe re-enable at some f…
afoix Apr 1, 2024
e1a6217
Enable gpu accelleration by default
afoix Apr 1, 2024
53901d5
more informative verbose print
afoix Apr 1, 2024
cc31e6f
bring argparse to the masks2distmatrices script
afoix Apr 1, 2024
996b109
training and test model
afoix Apr 2, 2024
d151578
Roll indices + normalisation + sanity_check + dataset name for latent…
afoix Apr 2, 2024
bc6289c
Added wandb logging
afoix Apr 2, 2024
1638aae
Added the extraction of original/reconstructed matrices + clarg for o…
afoix Apr 2, 2024
b7b2ead
created a script that renders dist matrices .npy as .png images
afoix Apr 2, 2024
b9fdeba
new changes: sparisity, periodicity and also add a script to draw co…
afoix Apr 9, 2024
b056f5f
masks2distmat: turn find_contour into find_longest_contour
afoix Apr 9, 2024
156ef7e
masks2distmat: enable periodic splprep for closed contours
afoix Apr 9, 2024
b28c471
masks2distmat: updated default sparsity to 4
afoix Apr 9, 2024
2412e73
distmat2contour: removed spurious return statement in vprint
afoix Apr 9, 2024
45605c1
drawContourFromDM: removed spurious return statement in vprint
afoix Apr 9, 2024
4152f1b
set correct aspect ratio for distmat2contour scripts
afoix Apr 9, 2024
ad5d8fd
add different normalisations in dataset initial transformations
afoix Apr 9, 2024
9a52703
Add notion of class label to ditmat2emb script output
afoix Apr 9, 2024
c306767
Updated default model path in distmat2emb script
afoix Apr 9, 2024
d614315
Added umap and kmeans + original filenames list
afoix Apr 15, 2024
e8326ad
dist2emb: random seed for np and pl
afoix Apr 17, 2024
58a1936
dist2emb: test different initial transformations
afoix Apr 17, 2024
56ca8f4
dist2emb: remove "TODO" from prints
afoix Apr 17, 2024
420770f
MaskEmbed: log losses in loss_function method
afoix Apr 17, 2024
ac3addf
removed new line
afoix Apr 24, 2024
5da2a74
Normalise contour coord in mask2distmat script
afoix Apr 24, 2024
0ec1bba
Use bokeh for interactive umap plot (save as html file)
afoix Apr 24, 2024
e4e3548
save latent_space with extra info as pickle again and have a separate…
afoix Apr 25, 2024
42aa895
updated the render umap script with a _hardcoded_ trick to extract in…
afoix Apr 26, 2024
ff41213
minor config + comments
afoix Apr 29, 2024
0c5a8b9
added a beta vae model
afoix May 8, 2024
c5a00a4
added extra parameters in the wandb jobname
afoix May 8, 2024
8a97d06
finer grained clargs around latent space related parameters
afoix May 10, 2024
8580671
log different losses for vq or beta models
afoix May 10, 2024
77f9956
code to do classification using the features of the latent space
afoix May 13, 2024
aad3919
new latent space size
afoix Jun 5, 2024
5374669
Added imports that will be needed for next commits
afoix Jun 5, 2024
96f4187
Added checkpoint mechanism
afoix Jun 5, 2024
4a8b138
Added regionprops + fourrier decomposition trials (! hardcoded path t…
afoix Jun 5, 2024
80664a6
Adding a n compression parameter for the latent space size
afoix Jun 16, 2024
8344062
improve scoring function and use StratifiedKFold instead of KFold for…
afoix Jun 16, 2024
7eb65a1
hardcoded commited setup now points to quick test setup
afoix Jun 16, 2024
944fe1e
initial refactor commit, script with split up functionnalities, missi…
afoix Jun 16, 2024
5a23664
Added predictions + kmeans of input data
afoix Jun 17, 2024
6f8d2dc
factored out evaluation functionality + added regionprops, efd and sc…
afoix Jun 17, 2024
ae26709
cleaner logging + score shapeembed itself
afoix Jun 18, 2024
20c855b
reshaped shapeembed reported dataframe
afoix Jun 18, 2024
2115de7
renamed label to class
afoix Jun 18, 2024
c863fca
updated scoring function + collate and save results
afoix Jun 18, 2024
3f9d885
Added clargs to control matrix normalization and roll
afoix Jun 18, 2024
0737749
Added umap_plot
afoix Jun 18, 2024
a211f14
fix dataset clarg
afoix Jun 24, 2024
d212403
fix model name clarg
afoix Jun 24, 2024
394b673
fix model_name clarg again
afoix Jun 24, 2024
ea17864
Added early stop clarg (default no early stop)
afoix Jun 25, 2024
57b1e54
added confusion matrices to scoring function
afoix Jun 26, 2024
8d439bb
use integer division for compression factor clarg
afoix Jun 26, 2024
1ed802a
explicitly binarise image when running regionprops
afoix Jun 26, 2024
e6c7840
keep 'class' as a column rather than index + keeps column names as st…
afoix Jun 26, 2024
8e4ed4c
change len for shape[0]
afoix Jun 26, 2024
2fa4a5f
drop not needed return value from run_predictions
afoix Jun 26, 2024
bc0e588
added combined shapeembed + efd + regionprops scoring and comment out…
afoix Jun 26, 2024
eaef9b4
save combined score
afoix Jun 27, 2024
c4ea863
save confusion matrices
afoix Jun 27, 2024
511fe9f
First attempt at a result gathering script
afoix Jul 5, 2024
3818d23
added barplots
afoix Jul 5, 2024
74d48af
Added a separate regionprops script
afoix Jul 18, 2024
25d31f9
added a separate efd script
afoix Jul 18, 2024
dfe23d6
refactor efd and regionprops out of evaluation helpers
afoix Jul 18, 2024
11954a6
less debug info by default + create outdir if not there
afoix Jul 18, 2024
037e527
removed regionprops/efd from main shapeembed script + filename saniti…
afoix Jul 18, 2024
af66d48
unify file names across efd/regionprops/shapeembed
afoix Jul 18, 2024
d374ce0
Added a readme
afoix Jul 18, 2024
da8e483
track params in reporting
afoix Jul 18, 2024
73aa420
also add model specific params as tag columns
afoix Jul 18, 2024
9aed31c
added a slurm script to sweap shapeembed parameters
afoix Jul 18, 2024
dfdb4fe
added resnet50_beta_vae to the factory
afoix Jul 18, 2024
078d15f
added resnet50_beta_vae to the shapeembed script
afoix Jul 18, 2024
541e74b
handle per model params in slurm script + chose some param values to …
afoix Jul 18, 2024
7c3cac5
better slurm jobname
afoix Jul 18, 2024
1f6886e
removed compression factor 20
afoix Jul 19, 2024
c500e3c
bumped up memory allocation to 250G
afoix Jul 19, 2024
85ed923
added --no-early-stop flag
afoix Jul 19, 2024
276d1b9
added an oom_retry function
afoix Jul 20, 2024
83a2995
refined min / max epochs clargs
afoix Jul 20, 2024
df2f7c8
slurm script refactor args + force 150 epochs
afoix Jul 20, 2024
bd3d9e5
bring triangular + compression computation in named function (to shar…
afoix Jul 20, 2024
2149ecf
fix in model_str function test of model_args
afoix Jul 20, 2024
52a7ed0
refactor slurm script to detect already completed jobs
afoix Jul 20, 2024
7d1f5d7
factored out some common helpers
afoix Jul 20, 2024
23c8f4b
Add a comment/uncomment block for quick ad-hoc single config run
afoix Jul 21, 2024
584e7e6
added a function to find currently submitted slurm jobs
afoix Jul 21, 2024
5d1aa18
added clargs for job filtering enabling/disabling (enabled by default)
afoix Jul 21, 2024
edf6ad4
typo fix: sweap -> sweep
afoix Jul 21, 2024
7acab6f
parse dataset as a SimpleNamespace from job string
afoix Jul 21, 2024
ae5e5b6
updated data gathering script to newer changes (still TODO for figures)
afoix Jul 21, 2024
a205f76
removed stale script string
afoix Jul 21, 2024
8eb041b
Split model name in two columns if there are model args
afoix Jul 21, 2024
1a129f8
remove stale import
afoix Jul 21, 2024
3f8d3da
experiment with plots
afoix Jul 21, 2024
42f9f29
keep exploring potential plots
afoix Jul 22, 2024
352c7b4
more graphs
afoix Jul 22, 2024
21eb6f6
fix model name in shapeembed output csv
afoix Jul 22, 2024
6e2d175
Added loss / mse to shapeembed's generated csv
afoix Jul 22, 2024
6fe7511
updated slurm script with regex filtering of squeue output
afoix Jul 25, 2024
a4cf1ae
added a simple latex table to the gather_run_results script
afoix Jul 25, 2024
6ca9917
minor refactor in efd
afoix Jul 27, 2024
1ed13fe
minor refactor in regionprops
afoix Jul 27, 2024
56e9d91
generated plots and more tables in gather_run_results
afoix Jul 27, 2024
63e2c57
added regionprops and efd to gather results script
afoix Jul 27, 2024
2f1e450
Updated graphs titles
afoix Jul 29, 2024
0d57940
fake beta column if necessary and filter out regionprops and efd for …
afoix Jul 29, 2024
f70841b
updated datasets + only find jobs and scores if corresponding filter …
afoix Jul 29, 2024
397d06e
bugfix overwriting loop dataframe
afoix Jul 29, 2024
a8a0cf9
dded a clarg to control region prop properties
afoix Jul 30, 2024
dc86f5c
Added random order to efd and regionprops
afoix Aug 8, 2024
a0701b9
force different markers for scatter plot F1vMSE
afoix Sep 7, 2024
2943c82
updated scatterplot
afoix Sep 9, 2024
4370a52
add standard deviation to the report for regions props and efd
afoix Sep 27, 2024
a3c6c91
modification slurm script
afoix Sep 27, 2024
a7465e9
changes to test o2vae integration XXX relies on an adapted o2vae repo…
afoix Sep 29, 2024
905644f
off-by-one in square recrop
afoix Sep 29, 2024
c421f6f
specialized slurm script
afoix Sep 30, 2024
738a5c2
added o2vae repo patch
afoix Sep 30, 2024
33 changes: 33 additions & 0 deletions bioimage_embed/augmentations.py
@@ -32,6 +32,39 @@
DEFAULT_AUGMENTATION = A.Compose(DEFAULT_AUGMENTATION_LIST)
DEFAULT_ALBUMENTATION = A.Compose(DEFAULT_AUGMENTATION_LIST)

DEFAULT_AUGMENTATION_LIST = [
# Flip the images horizontally or vertically with a 50% chance
A.OneOf(
[
A.HorizontalFlip(p=0.5),
A.VerticalFlip(p=0.5),
],
p=0.5,
),
# Rotate the images by a random angle within a specified range
A.Rotate(limit=45, p=0.5),
# Randomly scale the image intensity to adjust brightness and contrast
A.RandomGamma(gamma_limit=(80, 120), p=0.5),
# Apply random elastic transformations to the images
A.ElasticTransform(
alpha=1,
sigma=50,
alpha_affine=50,
p=0.5,
),
# Randomly shuffle the order of the image channels
A.ChannelShuffle(p=0.5),
# Add a small amount of noise to the images
A.GaussNoise(var_limit=(10.0, 50.0), p=0.5),
# Crop a random part of the image and resize it back to the original size
A.RandomResizedCrop(
height=512, width=512, scale=(0.9, 1.0), ratio=(0.9, 1.1), p=0.5
),
# Randomly adjust image brightness and contrast within the given limits
A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
]

DEFAULT_AUGMENTATION = A.Compose(DEFAULT_AUGMENTATION_LIST)

class VisionWrapper:
def __init__(self, transform_dict, *args, **kwargs):
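For reference, a minimal usage sketch (not part of this diff, and assuming an albumentations version that accepts the parameters above): albumentations pipelines are called with keyword arguments and return a dict, so the new default pipeline would be applied to an HWC numpy image roughly as follows.

import numpy as np
from bioimage_embed.augmentations import DEFAULT_AUGMENTATION

# hypothetical example image; albumentations expects HWC numpy arrays
image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
augmented = DEFAULT_AUGMENTATION(image=image)["image"]
# RandomResizedCrop resizes back to 512x512, so the shape is preserved
print(augmented.shape)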
2 changes: 2 additions & 0 deletions bioimage_embed/lightning/torch.py
@@ -58,6 +58,8 @@ def __init__(self, model, args=SimpleNamespace()):
# TODO update all models to use this for export to onxx
# self.example_input_array = torch.randn(1, *self.model.input_dim)
# self.model.train()
# keep a handle on metrics logged by the model
self.metrics = {}

def forward(self, x: torch.Tensor) -> ModelOutput:
"""
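As a rough sketch of what this handle is for (hypothetical usage, not shown in the diff), a step method on the wrapper could stash whatever the wrapped model reports and forward it to Lightning's logger:

# hypothetical step method; `output` is the ModelOutput returned by forward()
def training_step(self, batch, batch_idx):
    output = self.forward(batch)
    self.metrics = {"loss": output.loss, "recon_loss": output.recon_loss}
    self.log_dict(self.metrics, prog_bar=True)
    return output.loss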
14 changes: 13 additions & 1 deletion bioimage_embed/models/factory.py
@@ -18,7 +18,6 @@
from . import bolts
from functools import partial


class ModelFactory:
def __init__(
self, input_dim, latent_dim, pretrained=False, progress=True, **kwargs
@@ -144,6 +143,19 @@ def resnet18_beta_vae(self):
bolts.ResNet18VAEDecoder,
)

def resnet50_vae(self):
return self.create_model(
partial(
pythae.models.VAEConfig,
use_default_encoder=False,
use_default_decoder=False,
**self.kwargs
),
pythae.models.VAE,
bolts.ResNet50VAEEncoder,
bolts.ResNet50VAEDecoder,
)

def resnet50_vqvae(self):
return self.create_model(
partial(
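For orientation, a minimal sketch of how the new entry could be used, assuming the ModelFactory constructor shown at the top of this file (the input_dim and latent_dim values below are placeholders):

from bioimage_embed.models.factory import ModelFactory

# hypothetical instantiation; sizes are illustrative only
factory = ModelFactory(input_dim=(3, 224, 224), latent_dim=64, pretrained=False)
model = factory.resnet50_vae()  # pythae VAE with the ResNet50 encoder/decoder bolts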
97 changes: 97 additions & 0 deletions bioimage_embed/models/o2vae_shapeembed_integration.diff
@@ -0,0 +1,97 @@
diff --git a/models/align_reconstructions.py b/models/align_reconstructions.py
index d07d1ab..c52b40d 100644
--- a/models/align_reconstructions.py
+++ b/models/align_reconstructions.py
@@ -6,7 +6,7 @@ import torch
import torchgeometry as tgm
import torchvision.transforms.functional as T_f

-from registration import registration
+from ..registration import registration


def loss_reconstruction_fourier_batch(x, y, recon_loss_type="bce", mask=None):
diff --git a/models/decoders/cnn_decoder.py b/models/decoders/cnn_decoder.py
index ba3a1cc..1740945 100644
--- a/models/decoders/cnn_decoder.py
+++ b/models/decoders/cnn_decoder.py
@@ -58,7 +58,7 @@ class CnnDecoder(nn.Module):

self.dec_conv = nn.Sequential(*layers)

- def forward(self, x):
+ def forward(self, x, epoch = None):
bs = x.size(0)
x = self.fc(x)
dim = x.size(1)
diff --git a/models/encoders_o2/e2scnn.py b/models/encoders_o2/e2scnn.py
index 9c4f47f..e292b1e 100644
--- a/models/encoders_o2/e2scnn.py
+++ b/models/encoders_o2/e2scnn.py
@@ -219,14 +219,20 @@ class E2SFCNN(torch.nn.Module):
repr += f"\t{i: <3} - {name: <70} | {params: <8} |\n"
return repr

- def forward(self, input: torch.tensor):
+ def forward(self, input: torch.tensor, epoch = None):
+ #print(f"DEBUG: e2scnn forward: input.shape: {input.shape}")
x = GeometricTensor(input, self.in_repr)
+ #print(f"DEBUG: e2scnn forward: pre layers x.shape: {x.shape}")

for layer in self.eq_layers:
x = layer(x)

+ #print(f"DEBUG: e2scnn forward: pre fully_net x.shape: {x.shape}")
+
x = self.fully_net(x.tensor.reshape(x.tensor.shape[0], -1))

+ #print(f"DEBUG: e2scnn forward: pre final x.shape: {x.shape}")
+
return x

def build_layer_regular(
diff --git a/models/vae.py b/models/vae.py
index 3af262b..af1a2dc 100644
--- a/models/vae.py
+++ b/models/vae.py
@@ -3,8 +3,9 @@ import importlib
import numpy as np
import torch
import torchvision
+from pythae.models.base.base_utils import ModelOutput

-from models import align_reconstructions
+from . import align_reconstructions

from . import model_utils as mut

@@ -273,10 +274,11 @@ class VAE(torch.nn.Module):

return y

- def forward(self, x):
+ def forward(self, x, epoch = None):
+ x = x["data"]
in_shape = x.shape
bs = in_shape[0]
- assert x.ndim == 4
+ assert len(in_shape) == 4

# inference and sample
z = self.q_net(x)
@@ -290,8 +292,12 @@ class VAE(torch.nn.Module):
y = torch.sigmoid(y)
# check the spatial dimensions are good (if doing multiclass prediction per pixel, the `c` dim may be different)
assert in_shape[-2:] == y.shape[-2:], (
- "output image different dimension to "
- "input image ... probably change the number of layers (cnn_dims) in the decoder"
+ f"output image different dimension {y.shape[-2:]} to "
+ f"input image {in_shape[-2:]} ... probably change the number of layers (cnn_dims) in the decoder"
)

- return x, y, mu, logvar
+ # gather losses
+ losses = self.loss(x, y, mu, logvar)
+
+ return ModelOutput(recon_x=y, z=z_sample, loss=losses['loss'], recon_loss=losses['loss_recon'])
+ #return ModelOutput(recon_x=y, z=z_sample)
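A rough sketch of the call convention after this patch (hypothetical call site, reflecting only what the patch shows): the patched forward expects a dict-like batch with a "data" key and returns a pythae-style ModelOutput instead of a tuple.

# hypothetical call site for the patched o2vae VAE
batch = {"data": images}            # images: (B, C, H, W) tensor
out = o2vae_model(batch)
reconstruction, latent = out.recon_x, out.z
total_loss, recon_loss = out.loss, out.recon_loss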
2 changes: 1 addition & 1 deletion bioimage_embed/shapes/lightning.py
@@ -41,8 +41,8 @@ def eval_step(self, batch, batch_idx):
[
loss_ops.diagonal_loss(),
loss_ops.symmetry_loss(),
# loss_ops.triangle_inequality(),
loss_ops.non_negative_loss(),
# loss_ops.triangle_inequality(),
# loss_ops.clockwise_order_loss(),
]
)
5 changes: 3 additions & 2 deletions bioimage_embed/shapes/mds.py
@@ -9,11 +9,12 @@ def mds(d):
:return: A matrix of x, y coordinates.
"""
n = d.size(0)
I = torch.eye(n)
I = torch.eye(n, dtype=torch.float64)
H = I - torch.ones((n, n)) / n

S = -0.5 * H @ d @ H
eigvals, eigvecs = S.symeig(eigenvectors=True)
#eigvals, eigvecs = S.symeig(eigenvectors=True)
eigvals, eigvecs = torch.linalg.eigh(S)

# Sort the eigenvalues and eigenvectors in decreasing order
idx = eigvals.argsort(descending=True)
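For intuition, a standalone sketch of the same classical MDS computation with torch.linalg.eigh, which replaces the deprecated symeig call (values below are illustrative only):

import torch

# squared pairwise distances of three collinear points at 0, 1, 2
d = torch.tensor([[0., 1., 4.],
                  [1., 0., 1.],
                  [4., 1., 0.]], dtype=torch.float64)
n = d.size(0)
H = torch.eye(n, dtype=torch.float64) - torch.ones((n, n), dtype=torch.float64) / n
S = -0.5 * H @ d @ H                     # double-centred Gram matrix
eigvals, eigvecs = torch.linalg.eigh(S)  # eigenvalues in ascending order
idx = eigvals.argsort(descending=True)
coords = eigvecs[:, idx] * eigvals[idx].clamp(min=0).sqrt()
print(coords[:, 0])                      # recovers the 1-D arrangement of the three points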
2 changes: 2 additions & 0 deletions scripts/shapeembed/__init__.py
@@ -0,0 +1,2 @@
from .dataset_transformations import mask2distmatrix
from .evaluation import *
42 changes: 42 additions & 0 deletions scripts/shapeembed/common_helpers.py
@@ -0,0 +1,42 @@
import re
import os
import glob
import types
import logging

def compressed_n_features(dist_mat_size, comp_fact):
return dist_mat_size*(dist_mat_size-1)//(2**comp_fact)

def model_str(params):
s = f'{params.model_name}'
if hasattr(params, 'model_args'):
s += f"-{'_'.join([f'{k}{v}' for k, v in vars(params.model_args).items()])}"
return s

def job_str(params):
return f"{params.dataset.name}-{model_str(params)}-{params.compression_factor}-{params.latent_dim}-{params.batch_size}"

def job_str_re():
return re.compile("(.*)-(.*)-(\d+)-(\d+)-(\d+)")

def params_from_job_str(jobstr):
raw = jobstr.split('-')
ps = types.SimpleNamespace()
ps.batch_size = int(raw.pop())
ps.latent_dim = int(raw.pop())
ps.compression_factor = int(raw.pop())
if len(raw) == 3:
ps.model_args = types.SimpleNamespace()
for p in raw.pop().split('-'):
if p[0:4] == 'beta': ps.model_args.beta = float(p[4:])
ps.model_name = raw.pop()
ps.dataset = types.SimpleNamespace(name=raw.pop())
return ps

def find_existing_run_scores(dirname, logger=logging.getLogger(__name__)):
ps = []
for f in glob.glob(f'{dirname}/*-shapeembed-score_df.csv'):
p = params_from_job_str(os.path.basename(f)[:-24])
p.csv_file = f
ps.append(p)
return ps
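By way of illustration (hypothetical values, and assuming the scripts/shapeembed directory is importable as shown), job_str and params_from_job_str are meant to round-trip the naming convention used for the per-run output files:

import types
from common_helpers import job_str, params_from_job_str

params = types.SimpleNamespace(
    dataset=types.SimpleNamespace(name="mefs"),
    model_name="resnet50_vqvae",
    compression_factor=2,
    latent_dim=128,
    batch_size=32,
)
s = job_str(params)         # "mefs-resnet50_vqvae-2-128-32"
p = params_from_job_str(s)  # recovers model_name, dataset.name and the numeric fields
assert (p.latent_dim, p.batch_size) == (128, 32)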