Added data and some fixed ordering of human acquisitions. #119

Open: wants to merge 166 commits into base: master

166 commits
b0aeb7b
Finally, human acquisition is actually correct.
miretchin Aug 13, 2020
1ac0b65
Added new plotting notebook.
miretchin Aug 16, 2020
7990740
Added compound indexing to Thompson Sampling.
miretchin Sep 28, 2020
8dd7b82
Updated moonshot data.
miretchin Oct 25, 2020
e42ac26
Add HTS.
miretchin Oct 25, 2020
bb88e10
Updated belief active plot.
miretchin Oct 25, 2020
5401841
Added % inhibition column.
miretchin Oct 25, 2020
6b4941c
Added hts dataset script.
Oct 25, 2020
fd3a2f1
Updated data saving and loading to use DGL serializing. Also added tr…
miretchin Oct 31, 2020
2d7e754
Merge branch 'master' into active-human
miretchin Oct 31, 2020
88da736
Removing apparently unnecessary dependency on tqdm.
miretchin Oct 31, 2020
57e35ab
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Oct 31, 2020
97c3048
Updated training script for HTS.
miretchin Oct 31, 2020
0c81cf7
Removed the saved serialized HTS file.
miretchin Oct 31, 2020
bcd0a22
Updated bash script for HTS supervised training.
miretchin Oct 31, 2020
f062240
Python 3 in the HTS script.
miretchin Oct 31, 2020
2915f78
Fixed some bugs with the HTS scripts.
miretchin Oct 31, 2020
9e18fb3
Updated HTS script.
miretchin Oct 31, 2020
af5a91c
Fixed label split parameter so that HTS script hopefully runs correct…
miretchin Oct 31, 2020
c83e0df
Updated the script so it wouldn't overwrite the logs.
miretchin Nov 1, 2020
1054600
For now, going to just use at most 20% of the dataset.
miretchin Nov 1, 2020
7a95869
Trying to fix the HTS script.
miretchin Nov 1, 2020
94f9b0d
Hopefully fixing a file error.
miretchin Nov 1, 2020
bcf411d
Fix.
miretchin Nov 1, 2020
5ba0664
For some reason, try/except wasn't working. Very odd.
miretchin Nov 1, 2020
e828338
Hopefully fixed the graph loading error.
miretchin Nov 1, 2020
4fc6c40
Fix graph loading error.
miretchin Nov 1, 2020
90dbd1f
Make bin files unique.
miretchin Nov 1, 2020
1feb1c9
Forgot to make it change the name on write, not just read.
miretchin Nov 1, 2020
e56175f
Syntactical clean-up.
miretchin Nov 1, 2020
236546c
Change file nomenclature.
miretchin Nov 1, 2020
44260e9
Updated the bash script.
miretchin Nov 1, 2020
adf0ea5
Updated data structures and metrics to handle batched datasets.
miretchin Nov 1, 2020
a44f813
Removed the superfluous line at the top of the bash script.
miretchin Nov 1, 2020
a71bbb4
Further refactoring to speed up processing during testing.
miretchin Nov 1, 2020
77e8e6b
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Nov 1, 2020
eb94181
Updating fraction of data.
miretchin Nov 1, 2020
39f4a9e
Change the folder for storing the files.
miretchin Nov 2, 2020
27bdd11
Updated variational GP to use the appropriate scaling factor for batc…
miretchin Nov 2, 2020
0e10817
Change belief active plot.
miretchin Nov 2, 2020
88ece6a
Added ability to change annealing and number of inducing points.
miretchin Nov 2, 2020
6020b5a
Fixed the train and test function in the hts_supervised script.
miretchin Nov 2, 2020
ac4df2d
Updated logging.
miretchin Nov 4, 2020
8142cd2
Added new scripts for supervised learning and logging.
miretchin Nov 5, 2020
2899c41
Fixed script so that the filename is smaller.
miretchin Nov 11, 2020
f818313
Added test script.
miretchin Nov 11, 2020
df9c84d
Fixed script for linux.
miretchin Nov 11, 2020
66cf199
Fix syntax and make net permissive of random kwargs.
miretchin Nov 19, 2020
e59d67f
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Nov 19, 2020
15c31d4
Fixed training loop resulting from rewriting syntax to return self.
miretchin Nov 22, 2020
31c93a4
Removed bin file.
miretchin Nov 22, 2020
2938266
Fixed the zero-in-degree error by removing single-atom compounds. Add…
miretchin Nov 22, 2020
7824826
Added new bayes opt script for working with mpro_hts data.
miretchin Nov 22, 2020
183a597
Added test/train normalization.
miretchin Nov 22, 2020
9f5df4f
Add normalization flag.
miretchin Nov 22, 2020
e3ae834
Added flag in log file.
miretchin Nov 23, 2020
2a6de6b
Fixed an infuriating bug in argparse.
miretchin Nov 23, 2020
3d1a757
Fixed it again.
miretchin Nov 23, 2020
b895415
Fixed argparse bug with a different approach.
miretchin Nov 23, 2020
7b8522e
Made NN allow kwargs.
miretchin Nov 25, 2020
8c09cfe
Added a shuffle function to split.
miretchin Nov 29, 2020
04408e0
Added sample frac to the active plot and verified that it works. Stil…
miretchin Nov 29, 2020
fe2310c
Typo in datasets.py
miretchin Nov 29, 2020
ce0e8ef
Added annealing and n_inducing_points flag.
miretchin Nov 29, 2020
ce8e956
Changed nomenclature on parser args.
miretchin Nov 29, 2020
6ac2559
Fixed architecture.
miretchin Nov 29, 2020
545e377
Change early stopping parameter.
miretchin Nov 29, 2020
31c9a0f
Added purely functional train/test loop into pinot.app.
miretchin Dec 6, 2020
968b149
Cosmetics in the hts script.
miretchin Dec 6, 2020
badea89
Fixed bugs in the purely functional code.
miretchin Dec 6, 2020
452e416
visualization notebooks
yuanqing-wang Dec 10, 2020
381038d
vae update
yuanqing-wang Dec 10, 2020
c0d17af
Merge branch 'visulization' into active-human
yuanqing-wang Dec 10, 2020
d7f2e40
Added initial time limit on training and made new debugging script.
miretchin Dec 25, 2020
3e16f16
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Dec 25, 2020
a02e432
Removed redundant seed descriptor in log.
miretchin Dec 25, 2020
50328bd
Added save-file.
miretchin Dec 25, 2020
fe9ed82
Added shell script.
miretchin Dec 25, 2020
3e885b0
Squashed a bug with imports.
miretchin Dec 25, 2020
a3fc38a
Fixed a bug with arg namespace.
miretchin Dec 25, 2020
6e2c3b8
Fixed file not found error.
miretchin Dec 25, 2020
7868164
Added seed in logs name.
miretchin Dec 25, 2020
030cf8a
Fixed a bug in the file parsing.
miretchin Dec 25, 2020
18a4697
Changed time allotted for script.
miretchin Dec 25, 2020
382b61f
Fixed typo.
miretchin Dec 25, 2020
990c29f
Remove unneeded dependencies.
miretchin Dec 25, 2020
71e214e
Remove tqdm.
miretchin Dec 25, 2020
6be8dad
Added negative log likelihood to the metrics in the debug script.
miretchin Dec 25, 2020
c7e5f3a
Made datasets allow properly set seeds.
miretchin Dec 25, 2020
6a55516
Removed an unnecessary batch call.
miretchin Dec 25, 2020
d74a1f6
Fixed the shuffling of the datasets.
miretchin Dec 25, 2020
bff4c7b
Added filter for outliers (optional).
miretchin Dec 27, 2020
070cb71
Added time limit flag.
miretchin Dec 27, 2020
2942e28
Fixed parser for time limit.
miretchin Dec 27, 2020
04a838f
Verified time limit. ready to go.
miretchin Dec 27, 2020
3dcb9c1
Trying to solve weird parse error.
miretchin Dec 27, 2020
763e30f
Fixed the filtering mechanism.
miretchin Dec 27, 2020
f52e479
Allowing seed to vary.
miretchin Jan 4, 2021
e56a63f
Removed a dumb error in the filepath for the pickled output.
miretchin Jan 4, 2021
6845015
Added sampling to dataset.
miretchin Jan 10, 2021
8f79853
Allowed data to not take a fraction of sample.
miretchin Jan 10, 2021
f7180ad
Removed an underscore that should not exist.
miretchin Jan 10, 2021
600946f
Removed faulty parameter.
miretchin Jan 10, 2021
2ce80b6
Fixed output file not found error.
miretchin Jan 23, 2021
efb54f8
K, that didn't fix the bug. But this did.
miretchin Jan 23, 2021
a4e7ede
Issue with optional arguments in function definition.
miretchin Jan 23, 2021
3063ccd
Made the function fix in both relevant functions in the experiment.py…
miretchin Jan 23, 2021
be4a0c4
Fixed a typo.
miretchin Jan 23, 2021
3c2b77c
Added shallow training and periodic representation training / acquisi…
miretchin Feb 15, 2021
7bad1fc
Buggy code but the future.
miretchin Feb 19, 2021
3a23a96
Got BO fully working. Now the belief function is broken.
miretchin Feb 20, 2021
dd99bab
Removing belief function because it is broken.
miretchin Feb 20, 2021
b61002d
Added update_representation_interval in belief_active_plot.py.
miretchin Feb 20, 2021
1f42774
Made thompson sampling scale.
miretchin Feb 20, 2021
06d0e99
Fixed a very annoying string thing.
miretchin Feb 20, 2021
b5ed681
Added IC50 data from Nir.
miretchin Feb 23, 2021
75d8a70
Fixed error with y/output-list.
miretchin Feb 24, 2021
8a928c6
Trying to fix the shape of the outputs.
miretchin Feb 24, 2021
88bf312
Fixed output dimensions again.
miretchin Feb 24, 2021
64ae7a3
Added ic50 script.
miretchin Feb 26, 2021
823b9aa
Added pretrain loading of representation.
miretchin Feb 26, 2021
1cc2df4
Made it possible to not use the pretrain.
miretchin Feb 26, 2021
6ae27ef
Streamlined and fixed ic50 code.
miretchin Feb 27, 2021
845f298
Fixing loading parameters.
miretchin Feb 27, 2021
2403e20
Dictionary apparently expected to be simpler.
miretchin Feb 27, 2021
9e6c852
Typo.
miretchin Feb 27, 2021
6818966
fixed ic50 script
miretchin Feb 27, 2021
ddf3809
Debugging RMSE.
miretchin Feb 28, 2021
72b5d84
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Feb 28, 2021
908a58f
Debugging RMSE.
miretchin Feb 28, 2021
50d5ed2
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Feb 28, 2021
a33c496
Uptodate.
miretchin Mar 1, 2021
a22a2c4
Removed unnecessary print.
miretchin Mar 1, 2021
ce39d4a
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Mar 1, 2021
8fa17b1
Fixed variable error.
miretchin Mar 2, 2021
67b5dfe
Merge branch 'active-human' of https://github.com/choderalab/pinot in…
miretchin Mar 2, 2021
e64e230
Adding y_hat as a metric.
miretchin Mar 19, 2021
1eaa06f
Adding metrics to IC50 script.
miretchin Mar 19, 2021
01e4ec7
Fixing a bug with the new y_hat and y code.
miretchin Mar 21, 2021
9178ecd
Fixed BO logic to prevent mismatch between net and data.
miretchin Mar 21, 2021
e5c2e9f
testing not resetting the network each round.
miretchin Mar 28, 2021
9fbabb8
Changed grid boundary.
miretchin Mar 28, 2021
770ee36
Made grid boundary depend on normalization.
miretchin Mar 28, 2021
caedbed
Fix file name.
miretchin Mar 28, 2021
b43a9a4
Exposed VGP hyperparameters.
miretchin Mar 29, 2021
abccfd6
Fixing typo on args.
miretchin Apr 4, 2021
2c4634b
Fixed std_value kwargs name.
miretchin Apr 4, 2021
9c83fb1
removed psutil.
miretchin Apr 4, 2021
50b962c
fixing threshold arg.
miretchin Apr 4, 2021
bc2dffd
Added variational parameters to HTS.
miretchin Apr 11, 2021
68189f4
Added k-means initialization for VGP.
miretchin May 2, 2021
8719ef3
Added utils to run k-means.
miretchin May 2, 2021
94b1492
Need to put the net back on cuda.
miretchin May 2, 2021
56ee011
Running some long-needed experiments on what happens when you filter …
miretchin Jan 30, 2022
6039493
Preparing negative-filtered data experiment.
miretchin Jan 31, 2022
1627b55
Changing filename.
miretchin Jan 31, 2022
1d76a95
Updating high-throughput screen script.
miretchin Jan 31, 2022
5303a84
Fixing a small bug when training representation every round.
miretchin Jan 31, 2022
5a46753
Adding an experiment for regressing to ic50 back in March.
miretchin Jan 31, 2022
9d1a046
Fixed the EOF error.
miretchin Jan 31, 2022
17f46ab
Fixing syntax in HTS script.
miretchin Jan 31, 2022
28be2eb
Removing spaces in the HTS script.
miretchin Jan 31, 2022
31d72ea
Fixed a bug with the filter negatives flag.
miretchin Jan 31, 2022
9b0bb67
fixing the for loop with ints in the bash command for HTS script.
miretchin Jan 31, 2022
8945d82
Debugging kmeans initialization.
miretchin Jan 31, 2022
ed805d6
Need to add initializing k-means in the filenmame.
miretchin Feb 5, 2022
71 changes: 52 additions & 19 deletions pinot/active/acquisition.py
@@ -1,21 +1,53 @@
# =============================================================================
# IMPORTS
# =============================================================================
import dgl
import torch
from pinot.metrics import _independent

# =============================================================================
# UTILITIES
# =============================================================================
-def _get_utility(net, gs, acq_func, y_best=0.0):
+
+
+import torch
+
+
+def _independent_batch(qsar_model, candidate_data, batch_size=512):
+    """ Infer distribution in batched fashion
+    """
+    candidate_data_batch = candidate_data.batch(
+        batch_size,
+        partial_batch=True
+    )
+
+    locs, scales = [], []
+    for d in candidate_data_batch:
+        inputs, _ = d
+        distribution = qsar_model.condition(inputs)
+        loc_batch = distribution.mean.flatten().detach()
+        scale_batch = distribution.variance.pow(0.5).flatten().detach()
+        locs.append(loc_batch)
+        scales.append(scale_batch)
+
+    distribution = torch.distributions.normal.Normal(
+        loc=torch.cat(locs),
+        scale=torch.cat(scales)
+    )
+
+    return distribution
+
+
+def _get_utility(net, candidate_data, acq_func, max_batch_size=512, y_best=0.0):
     """ Obtain distribution and utility from acquisition func.
     """
     # obtain predictive posterior
-    distribution = _independent(net.condition(gs))
+    with torch.no_grad():
+        distribution = _independent_batch(net, candidate_data)

     # obtain utility from vanilla acquisition func
     utility = acq_func(distribution, y_best=y_best)

     return utility

def _greedy(utility, q=1):
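
The batching pattern that `_independent_batch` introduces can be illustrated without torch or DGL. The following is a minimal, dependency-free sketch and not the PR's code: `chunk`, `predict_batched`, and `toy_model` are hypothetical names, the model is a stand-in for `qsar_model.condition(inputs)`, and it assumes `partial_batch=True` means the final, shorter batch is kept rather than dropped.

```python
import math


def chunk(items, batch_size, partial_batch=True):
    """Yield consecutive slices of `items` with `batch_size` elements each.

    Assumption: with partial_batch=True the trailing, shorter slice is kept;
    with partial_batch=False it is dropped.
    """
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if len(batch) == batch_size or partial_batch:
            yield batch


def predict_batched(model, candidates, batch_size=512):
    """Run `model` one batch at a time and concatenate per-candidate results.

    `model` is a hypothetical callable returning (means, variances) for a
    batch of candidates.
    """
    locs, scales = [], []
    for batch in chunk(candidates, batch_size):
        means, variances = model(batch)
        locs.extend(means)
        # Standard deviation is the square root of the variance,
        # mirroring `distribution.variance.pow(0.5)` in the diff.
        scales.extend(math.sqrt(v) for v in variances)
    return locs, scales


def toy_model(batch):
    """Toy stand-in: the mean is the input itself, the variance is 4.0."""
    return [float(x) for x in batch], [4.0 for _ in batch]


locs, scales = predict_batched(toy_model, list(range(10)), batch_size=4)
print(len(locs), scales[0])  # prints: 10 2.0
```

Only one batch is held by the model at a time, which is presumably the motivation for the change on large candidate pools. In the diff as shown, `_get_utility` accepts `max_batch_size` but calls `_independent_batch(net, candidate_data)` without it, so the default `batch_size=512` appears to apply regardless of the argument.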
@@ -69,7 +101,7 @@ def _random(distribution, y_best=0.0, seed=2666):
 # =============================================================================
 # MODULE FUNCTIONS
 # =============================================================================
-def thompson_sampling(net, gs, y_best=0.0, q=1, unique=True):
+def thompson_sampling(net, inputs, y_best=0.0, q=1, unique=True):
     """ Generates m Thompson samples and maximizes them.

     Parameters
@@ -95,7 +127,8 @@ def thompson_sampling(net, gs, y_best=0.0, q=1, unique=True):
         The indices corresponding to pending points.
     """
     # obtain predictive posterior
-    distribution = _independent(net.condition(gs))
+    with torch.no_grad():
+        distribution = _independent_batch(net, inputs)

     # obtain samples from posterior
     thetas = distribution.sample((q,))
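
The sample-then-maximize step of Thompson sampling can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: `thompson_pick` and its `seed` parameter are hypothetical, each candidate is assumed to have an independent normal posterior, and because the collapsed part of the diff does not show how `unique` is enforced, the fallback to the next-best index within a draw is a guess.

```python
import random


def thompson_pick(locs, scales, q=1, unique=True, seed=None):
    """Draw q posterior samples and return the argmax index of each draw.

    Assumptions: candidate i has an independent Normal(locs[i], scales[i])
    posterior; with unique=True an already-picked index is skipped in
    favour of the next-best index of the same draw.
    """
    rng = random.Random(seed)
    picked = []
    for _ in range(q):
        # One joint sample: a single value per candidate.
        draw = [rng.gauss(m, s) for m, s in zip(locs, scales)]
        # Candidate indices ordered from highest to lowest sampled value.
        ranked = sorted(range(len(draw)), key=draw.__getitem__, reverse=True)
        for idx in ranked:
            if not unique or idx not in picked:
                picked.append(idx)
                break
    return picked


# With near-zero uncertainty the picks follow the ranking of the means.
print(thompson_pick([0.0, 10.0, 0.1], [1e-6] * 3, q=2))  # prints: [1, 2]
```

With `unique=True` and `q` larger than the number of candidates, this sketch returns fewer than `q` indices, because later draws have nothing left to pick.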
@@ -118,7 +151,7 @@ def thompson_sampling(net, gs, y_best=0.0, q=1, unique=True):
     return pending_pts


-def temporal(net, gs, y_best=0.0, q=1):
+def temporal(net, inputs, y_best=0.0, q=1):
     r"""Picks the first in sequence.
     Designed to be used with temporal datasets to compare with baselines.

@@ -147,7 +180,7 @@ def temporal(net, gs, y_best=0.0, q=1):
     """
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _temporal,
         y_best=y_best
     )
@@ -161,7 +194,7 @@ def temporal(net, gs, y_best=0.0, q=1):
     return pending_pts


-def probability_of_improvement(net, gs, y_best=0.0, q=1):
+def probability_of_improvement(net, inputs, y_best=0.0, q=1):
     r""" Probability of Improvement (PI).

     Parameters
@@ -187,7 +220,7 @@ def probability_of_improvement(net, gs, y_best=0.0, q=1):
     """
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _pi,
         y_best=y_best
     )
@@ -201,7 +234,7 @@ def probability_of_improvement(net, gs, y_best=0.0, q=1):
     return pending_pts


-def uncertainty(net, gs, y_best=0.0, q=1):
+def uncertainty(net, inputs, y_best=0.0, q=1):
     r""" Uncertainty.

     Parameters
@@ -227,7 +260,7 @@ def uncertainty(net, gs, y_best=0.0, q=1):
     """
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _uncertainty,
         y_best=y_best
     )
@@ -241,7 +274,7 @@ def uncertainty(net, gs, y_best=0.0, q=1):
     return pending_pts


-def expected_improvement_analytical(net, gs, y_best=0.0, q=1):
+def expected_improvement_analytical(net, inputs, y_best=0.0, q=1):
     r""" Analytical Expected Improvement (EI).

     Closed-form derivation assumes predictive posterior is a multivariate normal distribution.
@@ -278,7 +311,7 @@ def expected_improvement_analytical(net, gs, y_best=0.0, q=1):
     """
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _ei_analytical,
         y_best=y_best
     )
@@ -292,7 +325,7 @@ def expected_improvement_analytical(net, gs, y_best=0.0, q=1):
     return pending_pts


-def expected_improvement_monte_carlo(net, gs, y_best=0.0, q=1, n_samples=1000):
+def expected_improvement_monte_carlo(net, inputs, y_best=0.0, q=1, n_samples=1000):
     r""" Monte Carlo Expected Improvement (EI).

     Parameters
@@ -320,7 +353,7 @@ def expected_improvement_monte_carlo(net, gs, y_best=0.0, q=1, n_samples=1000):
     """
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _ei_monte_carlo,
         y_best=y_best
     )
@@ -334,7 +367,7 @@ def expected_improvement_monte_carlo(net, gs, y_best=0.0, q=1, n_samples=1000):
     return pending_pts


-def upper_confidence_bound(net, gs, y_best=0.0, q=1, kappa=0.95):
+def upper_confidence_bound(net, inputs, y_best=0.0, q=1, kappa=0.95):
     r""" Upper Confidence Bound (UCB).

     Parameters
@@ -359,7 +392,7 @@ def upper_confidence_bound(net, gs, y_best=0.0, q=1, kappa=0.95):
     """
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _ucb,
         y_best=y_best
     )
@@ -373,7 +406,7 @@ def upper_confidence_bound(net, gs, y_best=0.0, q=1, kappa=0.95):
     return pending_pts


-def random(net, gs, y_best=0.0, q=1, seed=2666):
+def random(net, inputs, y_best=0.0, q=1, seed=2666):
     """ Random assignment of scores under normal distribution.

     Parameters
@@ -405,7 +438,7 @@ def random(net, gs, y_best=0.0, q=1, seed=2666):
     # torch.manual_seed(seed)
     utility = _get_utility(
         net,
-        gs,
+        inputs,
         _random,
         y_best=y_best
     )