Releases: FunctionLab/selene
0.5.3
0.5.2
Fixes a NumPy/Cython type error causing build issues with Python 3.9+
0.5.0
Version 0.5.0
New functionality
- sampler.MultiSampler: MultiSampler accepts any Selene sampler for each of the train, validation, and test partitions, where previously MultiFileSampler only accepted FileSamplers. We will deprecate MultiFileSampler in our next major release.
- DataLoader: Parallel data loading based on PyTorch's DataLoader class, which can be used with Selene's MultiSampler and MultiFileSampler classes (see: sampler.SamplerDataLoader, sampler.H5DataLoader).
  - To support parallelism via multiprocessing, the sampler that SamplerDataLoader uses needs to be picklable. To enable this, file-opening operations are delayed until a method that needs the file is called. There is no change to the API, and setting init_unpicklable=True in __init__ for Genome and all OnlineSampler classes will fully reproduce the functionality in selene_sdk<=0.4.8.
- sampler.RandomPositionsSampler: Added support for center_bin_to_predict taking a list/tuple of two integers to specify the region from which to query the targets. That is, center_bin_to_predict by default (center_bin_to_predict=<int>) queries targets based on the center bin size, but it can be specified as start and end integers that are not at the center if desired.
- EvaluateModel: Accepts a list of metrics (by default computing ROC AUC and average precision) with which to evaluate the test dataset.
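The picklability note above (delaying file opens until a method needs the file) can be illustrated with a minimal, generic sketch. This is not Selene's implementation: the class LazyFileReader and its methods are hypothetical, and only the init_unpicklable flag name is borrowed from the release note to show the eager, pre-0.5.0-style behavior.

```python
import os
import pickle
import tempfile


class LazyFileReader:
    """Hypothetical reader that opens its file on first use, not in __init__."""

    def __init__(self, path, init_unpicklable=False):
        self.path = path
        self._handle = None  # no open file yet, so the object can be pickled
        if init_unpicklable:
            # Eager mode: open right away, which makes the object unpicklable.
            self._open()

    def _open(self):
        if self._handle is None:
            self._handle = open(self.path, "r")

    def read_line(self):
        self._open()  # the file is opened only when a method needs it
        return self._handle.readline().rstrip("\n")

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def demo():
    """Pickle a lazy reader, then show that an eager one cannot be pickled."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("ACGT\nTTGA\n")
    try:
        lazy = LazyFileReader(tmp.name)
        clone = pickle.loads(pickle.dumps(lazy))  # works: no handle is held
        first = clone.read_line()
        clone.close()

        eager = LazyFileReader(tmp.name, init_unpicklable=True)
        try:
            pickle.dumps(eager)
            eager_picklable = True
        except TypeError:
            # Open file objects cannot be pickled.
            eager_picklable = False
        finally:
            eager.close()
        return first, eager_picklable
    finally:
        os.remove(tmp.name)
```

Worker processes created by a multiprocessing-based loader receive a pickled copy of the sampler, which is why holding an open file handle at construction time gets in the way.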
Usage
- Command-line interface (CLI): You can now run the CLI directly with python -m selene_sdk (if you have cloned the repository, make sure you have locally installed selene_sdk via python setup.py install, or selene_sdk is in the same directory as your script / added to PYTHONPATH). Developers can make a copy of the selene_sdk/cli.py script and use it the same way that selene_cli.py was used in earlier versions of Selene (python -u cli.py <config-yml> [--lr]).
Bug fixes
- EvaluateModel: use_features_ord allows you to evaluate a trained model on only a subset of chromatin features (targets) predicted by the model. If you are using a FileSampler for your test dataset, you now have the option to pass in a subsetted matrix; however, this matrix must be ordered the same way as features (the original targets prediction ordering) and not in the same ordering as use_features_ord. The final model predictions and targets (test_predictions.npz and test_targets.npz), by contrast, will be output according to the use_features_ord list and ordering.
- MatFileSampler: Previously the MatFileSampler reset the pointer to the start of the matrix too early (going back to the first sample before we had finished sampling the whole matrix).
- CLI learning rate: Edge cases (e.g. not specifying the learning rate via CLI or config) previously were not handled correctly and did not throw an informative error.
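The two orderings in the EvaluateModel note are easy to mix up, so here is a small NumPy sketch of the distinction. It assumes nothing about Selene's internals: the feature names and both helper functions are hypothetical and only illustrate "subset in features order" versus "output in use_features_ord order".

```python
import numpy as np

# Hypothetical feature names; `features` is the model's original target order.
features = ["CTCF", "DNase", "H3K4me3", "H3K27ac"]
use_features_ord = ["H3K27ac", "CTCF"]


def subset_in_features_order(matrix, features, use_features_ord):
    """Keep only the wanted columns, preserving the original `features` order."""
    wanted = set(use_features_ord)
    keep = [i for i, name in enumerate(features) if name in wanted]
    return matrix[:, keep], [features[i] for i in keep]


def reorder_to_use_features_ord(sub_matrix, sub_names, use_features_ord):
    """Rearrange the subsetted columns into the `use_features_ord` order."""
    column_of = {name: i for i, name in enumerate(sub_names)}
    return sub_matrix[:, [column_of[name] for name in use_features_ord]]


full_targets = np.arange(8).reshape(2, 4)  # 2 samples x 4 features

# What a subsetted input matrix would look like: columns follow `features`.
sub, sub_names = subset_in_features_order(full_targets, features, use_features_ord)

# What the written outputs would look like: columns follow `use_features_ord`.
out = reorder_to_use_features_ord(sub, sub_names, use_features_ord)
```

Here sub keeps the columns as (CTCF, H3K27ac) because that is their order in features, while out lists them as (H3K27ac, CTCF) to match use_features_ord.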
0.4.8
Enhancements
- PyTorch now supports flexible state dict loading, which makes it easier for users to load models that were trained with older/newer versions of PyTorch. Selene has been updated to use this parameter.
- Added HeartENN model architecture ahead of publication.
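The release note does not name the PyTorch parameter involved. As an assumption, the sketch below uses the strict argument of torch.nn.Module.load_state_dict, which is PyTorch's standard mechanism for non-strict loading. The two toy models are illustrative only and are unrelated to Selene's architectures.

```python
import torch
from torch import nn

# Stand-in for a checkpoint saved from an earlier model definition.
old_model = nn.Sequential(nn.Linear(4, 2))
checkpoint = old_model.state_dict()

# Newer definition with an extra layer the checkpoint knows nothing about.
new_model = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))

# strict=False loads the matching keys and reports the rest instead of raising.
result = new_model.load_state_dict(checkpoint, strict=False)

missing = sorted(result.missing_keys)        # keys absent from the checkpoint
unexpected = sorted(result.unexpected_keys)  # checkpoint keys the model lacks
weights_match = torch.equal(new_model[0].weight, old_model[0].weight)
```

With the default strict=True, the same call would raise a RuntimeError because the checkpoint has no entries for the final layer.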
0.4.7
Bugfixes:
- Use self.use_cuda in get_predict for raw sequence input in the AnalyzeSequences class.
0.4.6
Updates
- Allow users to pass in individual sequences to get_predictions in the AnalyzeSequences class and get the model prediction directly (as opposed to having it be written to an output file).
0.4.5
Updates
- Specify upper & lower bounds for Selene's torch dependency
- Add '.' as a valid delimiter for VCF multiallelic parsing
- Allow users to evaluate on subsets of features in EvaluateModel
Bugfixes:
- BASES_ARR type consistency (specify as a list only) and resetting for lua-trained model vs. Selene-trained model.
0.4.4
Updates
- Refactored variant effect prediction to simplify the code
- Removed contains_unk column from output of get_predictions_from_fasta in AnalyzeSequences class
Bugfixes
- Fixed variant effect prediction handling for odd-length sequences
0.4.3
Updates:
- Add a column contains_unk to BED/VCF predictions. This boolean column indicates whether a sequence contains any unknown bases.
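As a rough illustration of what a contains_unk flag encodes, the sketch below marks sequences that include an unknown base. It assumes the unknown-base character is "N" and that matching is case-insensitive. Both are assumptions made for this example, and the helper is hypothetical rather than Selene's code.

```python
UNK_BASE = "N"  # assumption: the character used for an unknown base


def contains_unk(sequence, unk_base=UNK_BASE):
    """Return True if the sequence has at least one unknown base."""
    return unk_base in sequence.upper()


# One (sequence, contains_unk) row per input sequence.
rows = [(seq, contains_unk(seq)) for seq in ["ACGT", "ACNT", "acgn"]]
```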
Bugfixes:
- MultiModelWrapper can be used with CUDA.
0.4.2
Updates:
- MultiModelWrapper for model evaluation
Bugfixes: