Merge pull request #58 from flatironinstitute/multitask_multigene
Multi-gene tasks & Velocity (v0.6.0)
asistradition authored Sep 14, 2022
2 parents 7b6baea + 4760377 commit a6048f8
Showing 63 changed files with 6,277 additions and 3,402 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/python-package.yml
@@ -30,5 +30,6 @@ jobs:
- name: Test with pytest & coverage
run: |
python -m coverage run -m pytest
python -m coverage xml
- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v1
uses: codecov/codecov-action@v2
2 changes: 1 addition & 1 deletion Tutorial.md
@@ -15,7 +15,7 @@ Install required python libraries:
```
python -m pip install -r requirements.txt
```
Install required libraries for parallelization (running on a single machine requires only `python -m pip install pathos`):
Install required libraries for parallelization (running on a single machine requires only `python -m pip install joblib`):
```
python -m pip install -r requirements-multiprocessing.txt
```
99 changes: 64 additions & 35 deletions devnotes.md
@@ -41,25 +41,34 @@ and click "fork"

## Install the necessary packages (once)

In order to work with the forked repository you will need `git` and a number of
other tools.
In order to work with the forked repository you will need `git`.

```
sudo apt-get install python-dev
sudo apt-get install python-pip
sudo apt-get install python-nose
sudo apt-get install git
python -m pip install numpy
python -m pip install pandas
```

Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html).
Create a new conda environment for the inferelator.

```
conda create --name inferelator python=3.10
```

Switch to the inferelator environment and install the required dependencies:

```
conda activate inferelator
python -m pip install numpy scipy scikit-learn pandas joblib anndata matplotlib
```
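
A quick way to confirm that the dependencies are importable (an optional sanity check; these are the standard import names for the packages above):

```
python -c "import numpy, scipy, sklearn, pandas, joblib, anndata, matplotlib"
```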


## Configuring your `git` command line interface

You might want to follow these instructions in case your name and email are not set properly (you should only need to do this once):

```
git config --global user.name "Your Name"
git config --global user.email [email protected]
git config --global user.name "Your Name"
git config --global user.email [email protected]
```
This allows git to associate your name and email with any changes you make.

@@ -74,6 +83,12 @@ To clone the fork onto your workstation type in the terminal:
git clone $URL
```
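
For reference, the clone URL for a fork follows the usual GitHub pattern (the username below is a placeholder to replace with your own):

```
git clone https://github.com/<your-username>/inferelator.git
```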

Create package links in the inferelator environment to finish installation:

```
python setup.py develop
```
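
An editable pip install is a commonly used alternative that links the package in the same way (a hedged equivalent, not the project's documented command):

```
python -m pip install -e .
```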

## Set up the "remote" repository (once)

You will want to periodically
@@ -109,36 +124,50 @@ Unit tests attempt to check that everything is working properly.
It is a good idea to run unit tests frequently, especially before making
changes and after making changes but before committing them.

Run unit tests from the shell with the [nosetests](http://pythontesting.net/framework/nose/nose-introduction/) command
Run unit tests from the shell with the [pytest](https://docs.pytest.org/) command
(this runs the unit tests):

```bash
python -m nose
python -m pytest
```

Output should look like this:

```
...........................................SS.........................
......................................................................
......................................................................
........................
----------------------------------------------------------------------
Ran 241 tests in 14.257s
OK (SKIP=2)
```

Each dot stands for a unit test that ran, "S" stands for "Skipped". If there are
=================================== test session starts ====================================
platform linux -- Python 3.8.3, pytest-7.1.2, pluggy-0.13.1
rootdir: /home/chris/PycharmProjects/inferelator
collected 347 items
inferelator/tests/test_amusr.py .......... [ 2%]
inferelator/tests/test_base_regression.py .... [ 4%]
inferelator/tests/test_bayes_stats.py .................... [ 9%]
inferelator/tests/test_bbsr.py ........... [ 12%]
inferelator/tests/test_crossvalidation_wrapper.py ....................... [ 19%]
inferelator/tests/test_data_loader.py ....... [ 21%]
inferelator/tests/test_data_wrapper.py ...................................... [ 32%]
inferelator/tests/test_design_response.py ............ss.. [ 37%]
inferelator/tests/test_elasticnet_python.py .... [ 38%]
inferelator/tests/test_mi.py ............... [ 42%]
inferelator/tests/test_mpcontrol.py ................. [ 47%]
inferelator/tests/test_noising_data.py .......... [ 50%]
inferelator/tests/test_priors.py .............. [ 54%]
inferelator/tests/test_regression.py .................... [ 60%]
inferelator/tests/test_results_processor.py ........................................ [ 71%]
.ss.... [ 73%]
inferelator/tests/test_single_cell.py ......... [ 76%]
inferelator/tests/test_tfa.py ......... [ 78%]
inferelator/tests/test_utils.py ............. [ 82%]
inferelator/tests/test_workflow_base.py ..................................... [ 93%]
inferelator/tests/test_workflow_multitask.py ............... [ 97%]
inferelator/tests/test_workflow_tfa.py ........ [100%]
====================== 343 passed, 4 skipped, 224 warnings in 35.34s =======================
```

Each dot stands for a unit test that ran, "s" stands for "Skipped". If there are
failures the output will be more extensive, describing which tests failed and how.

For debugging purposes it is sometimes useful to use `print` statements and invoke
nosetests with the `--nocapture` option in order to see the output.

```bash
python -m nose --nocapture
```
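
For debugging test failures, pytest itself provides several generally useful options (standard pytest flags, not specific to this project):

```bash
# stop at the first failure and show verbose test names
python -m pytest -x -v

# disable output capturing so print statements are visible
python -m pytest -s

# run only tests whose names match an expression
python -m pytest -k "mpcontrol"

# re-run only the tests that failed in the last run
python -m pytest --lf
```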

# Making a contribution to the project

Before you make a change you want to contribute to the project it
@@ -155,28 +184,28 @@ To test the process you can go try the following:
cd inferelator/
```

Change one of the files (for example the `utils.py` file), by adding a blank line or something.
Change one of the files (for example the `workflow.py` file), by adding a blank line or something.

## Run Unit Tests again to make sure everything still works

run nosetests (this runs the unit tests):
run pytest (this runs the unit tests):

```
python -m nose
python -m pytest
```

## Push your changes to your Github directory

For each file you altered, run the following command:

```
git add utils.py
git add workflow.py
```

After you've done this for every file you've changed (in this case it's just 1 file), commit the changes to your fork by running:

```
git commit -m "fixed utils.py"
git commit -m "fixed workflow.py"
```
It is a good idea to commit the files you intended to change one at a time
to make sure you don't add unintended changes to the commit.
@@ -196,7 +225,7 @@ Someone with write access to the master repository will look over your changes.
approve, or close your request.

An approver may ask for changes before approving your pull request. You can add changes by pushing
more commits (to the same branch of your forked repository, in this case the `master` branch).
more commits (to the same branch of your forked repository, in this case the `release` branch).
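
Pushing additional commits to an open pull request uses the same add/commit/push cycle; a minimal sketch (the `release` branch name comes from the paragraph above, and the file and message are placeholders):

```
git add workflow.py
git commit -m "address review feedback"
git push origin release
```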

## More sophisticated work flows

22 changes: 22 additions & 0 deletions docs/changelog.rst
@@ -1,6 +1,28 @@
Change Log
==========

Inferelator v0.6.0 `September 14, 2022`
----------------------------------------

New Functionality:

- Support for grouping arbitrary genes from multiple tasks into learning groups
- Workflow to learn homology groups together
- Workflow to explicitly incorporate velocity and decay into learning
- Added support for batching parallelization calls to reduce overhead when data is relatively small

Code Refactoring:

- Refactored multi-task learning to parameterize tfs and genes for each task
- Refactored parallelization around joblib & dask
- Removed pathos and replaced with joblib
- Optimized StARS-LASSO by replacing standalone LASSO with lasso_path

Bug Fixes:

- Fixed several messages to be more informative
- use_no_prior is appropriately applied in multitask learning

Inferelator v0.5.8 `February 23, 2022`
---------------------------------------

Expand Down
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -19,11 +19,11 @@
# -- Project information -----------------------------------------------------

project = 'inferelator'
copyright = '2019, Flatiron Institute'
copyright = '2022, Flatiron Institute'
author = 'Chris Jackson'

# The full version, including alpha/beta/rc tags
release = 'v0.5.8'
release = 'v0.6.0'


# -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion docs/references.rst
@@ -25,4 +25,4 @@ References

* `C. A. Jackson, D. M. Castro, G.-A. Saldi, R. Bonneau, and D. Gresham, “Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments,” eLife, vol. 9, p. e51254, Jan. 2020. <https://doi.org/10.7554/eLife.51254>`_

* `C. Skok Gibbs, C. A. Jackson, G.-A. Saldi et al. Single-cell gene regulatory network inference at scale: The Inferelator 3.0,” bioRxiv, p. 2021.05.03.442499, May 2021. <https://www.biorxiv.org/content/10.1101/2021.05.03.442499v2>`_
* `C. S. Gibbs, C. A. Jackson, G.-A. Saldi, A. Tjärnberg, A. Shah, et al. 2022. “High Performance Single-Cell Gene Regulatory Network Inference at Scale: The Inferelator 3.0.” Bioinformatics, February. <https://doi.org/10.1093/bioinformatics/btac117>`_
1 change: 1 addition & 0 deletions inferelator/__init__.py
@@ -11,3 +11,4 @@
from inferelator.utils import inferelator_verbose_level
from inferelator.distributed.inferelator_mp import MPControl

from inferelator.workflows import amusr_workflow, single_cell_workflow, tfa_workflow, velocity_workflow
2 changes: 1 addition & 1 deletion inferelator/benchmarking/celloracle.py
@@ -2,7 +2,7 @@
import copy

from inferelator import utils
from inferelator.single_cell_workflow import SingleCellWorkflow
from inferelator.workflows.single_cell_workflow import SingleCellWorkflow
from inferelator.regression.base_regression import _RegressionWorkflowMixin

import numpy as np
4 changes: 2 additions & 2 deletions inferelator/benchmarking/scenic.py
@@ -1,5 +1,5 @@
from inferelator import utils
from inferelator.single_cell_workflow import SingleCellWorkflow
from inferelator.workflows.single_cell_workflow import SingleCellWorkflow
from inferelator.regression.base_regression import _RegressionWorkflowMixin
from inferelator.distributed.inferelator_mp import MPControl

@@ -145,7 +145,7 @@ def run_regression(self):
# Get adjacencies
adj_method = ADJ_METHODS[self.adjacency_method]

if MPControl.is_dask:
if MPControl.is_dask():
client_or_address = MPControl.client.client
MPControl.client.check_cluster_state()
else:
43 changes: 0 additions & 43 deletions inferelator/default.py

This file was deleted.

31 changes: 27 additions & 4 deletions inferelator/distributed/__init__.py
@@ -14,6 +14,14 @@ class AbstractController:
_controller_name = None
_controller_dask = False

# Does this method require setup
# Or can it be done on the fly
_require_initialization=False

# Does this method require a clean shutdown
# Or can we just abandon it to the GC
_require_shutdown=False

@classmethod
def name(cls):
"""
@@ -23,10 +31,6 @@ def name(cls):
raise NameError("Controller name has not been defined")
return cls._controller_name

@classmethod
def is_dask(cls):
return cls._controller_dask

@classmethod
@abstractmethod
def connect(cls, *args, **kwargs):
@@ -60,3 +64,22 @@ def shutdown(cls):
Clean shutdown of the multiprocessing state
"""
raise NotImplementedError

@classmethod
def set_param(
cls,
param_name,
value
):
"""
Set a parameter in this object
If the value is not None
:param param_name: Parameter name
:type param_name: str
:param value: Value
:type value: any
"""

if value is not None:
setattr(cls, param_name, value)
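
A minimal sketch of how `set_param` could be used on a concrete controller (the subclass and parameter names below are hypothetical and chosen only to illustrate the None-guard shown above; it assumes an installed inferelator at or above v0.6.0):

```python
from inferelator.distributed import AbstractController

# Hypothetical controller subclass for illustration only
class LocalController(AbstractController):
    _controller_name = "local"
    processes = 1

# A provided value is written onto the class...
LocalController.set_param("processes", 4)
assert LocalController.processes == 4

# ...while None leaves the existing value untouched
LocalController.set_param("processes", None)
assert LocalController.processes == 4
```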