update readme
rcannood committed Sep 11, 2024
1 parent 0233b09 commit 0431995
Showing 1 changed file with 332 additions and 23 deletions.
355 changes: 332 additions & 23 deletions README.md
@@ -1,37 +1,346 @@
# Task Template
# Template

This repo is a template for creating a new task for OpenProblems v2. It contains several example files and components that can be used once they are updated with the task info.

> [!WARNING]
> This README will be overwritten when running the `create_task_readme` script.
<!--
This file is automatically generated from the task's api/*.yaml files.
Do not edit this file directly.
-->

## Create a repository from this template
A one-sentence summary of purpose and methodology. Used for creating
overview tables.

> [!IMPORTANT]
> Before creating a new repository, make sure you are part of the OpenProblems task team. You are added to the team once you have created an issue for the task and received the go-ahead to create it.
> For more information on how to create a new task, check out the [Create a new task](https://openproblems.bio/documentation/create_task/) documentation.
Repository: [rcannood/test](https://github.com/rcannood/test)

The instructions below will guide you through creating a new repository from this template ([creating-a-repository-from-a-template](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template#creating-a-repository-from-a-template)).
## Description

Provide a clear and concise description of your task, detailing the
specific problem it aims to solve. Outline the input data types, the
expected output, and any assumptions or constraints. Be sure to explain
any terminology or concepts that are essential for understanding the
task.

* Click the "Use this template" button on the top right of the repository.
* Use the Owner dropdown menu to select the `openproblems-bio` account.
* Type a name for your repository (task_...), and a description.
* Set the repository visibility to public.
* Click "Create repository from template".
Explain the motivation behind your proposed task. Describe the
biological or computational problem you aim to address and why it’s
important. Discuss the current state of research in this area and any
gaps or challenges that your task could help address. This section
should convince readers of the significance and relevance of your task.

## Clone the repository
## Authors & contributors

To clone the repository with the submodule files, you can use the following command:
| name | roles |
|:---------|:-------------------|
| John Doe | author, maintainer |

```bash
git clone --recursive git@github.com:openproblems-bio/<repo_name>.git
## API

``` mermaid
flowchart LR
file_common_dataset("Common Dataset")
comp_data_processor[/"Data processor"/]
file_solution("Solution")
file_test_h5ad("Test data")
file_train_h5ad("Training data")
comp_control_method[/"Control Method"/]
comp_metric[/"Metric"/]
comp_method[/"Method"/]
file_prediction("Predicted data")
file_score("Score")
file_common_dataset---comp_data_processor
comp_data_processor-->file_solution
comp_data_processor-->file_test_h5ad
comp_data_processor-->file_train_h5ad
file_solution---comp_control_method
file_solution---comp_metric
file_test_h5ad---comp_control_method
file_test_h5ad---comp_method
file_train_h5ad---comp_control_method
file_train_h5ad---comp_method
comp_control_method-->file_prediction
comp_metric-->file_score
comp_method-->file_prediction
file_prediction---comp_metric
```
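
The flowchart above can also be read as a list of directed edges. The following is a minimal sketch, not part of the generated README, that transcribes those edges into Python so the inputs and outputs of each component can be looked up. The names `EDGES`, `inputs_of` and `outputs_of` are hypothetical helpers introduced only for illustration.

```python
# Edges transcribed from the flowchart: (source, target).
# "---" links go from a file to the component that reads it;
# "-->" links go from a component to the file it writes.
EDGES = [
    ("file_common_dataset", "comp_data_processor"),
    ("comp_data_processor", "file_solution"),
    ("comp_data_processor", "file_test_h5ad"),
    ("comp_data_processor", "file_train_h5ad"),
    ("file_solution", "comp_control_method"),
    ("file_solution", "comp_metric"),
    ("file_test_h5ad", "comp_control_method"),
    ("file_test_h5ad", "comp_method"),
    ("file_train_h5ad", "comp_control_method"),
    ("file_train_h5ad", "comp_method"),
    ("comp_control_method", "file_prediction"),
    ("comp_metric", "file_score"),
    ("comp_method", "file_prediction"),
    ("file_prediction", "comp_metric"),
]


def inputs_of(node):
    """Return the sorted nodes that have an edge pointing at `node`."""
    return sorted(src for src, dst in EDGES if dst == node)


def outputs_of(node):
    """Return the sorted nodes that `node` has an edge pointing to."""
    return sorted(dst for src, dst in EDGES if src == node)


if __name__ == "__main__":
    print(inputs_of("comp_method"))
    print(outputs_of("comp_data_processor"))
```

Read this way, a method only ever sees the training and test files, while control methods and metrics also receive the solution.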
> [!NOTE]
> If no files are visible in the submodule after cloning with the above command, check the instructions [here](common/README.md).

## What to do next
## File format: Common Dataset

A subset of the common dataset.

Example file: `resources_test/common/pancreas/dataset.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type', 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Cell type information. |
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized expression values. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>
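
As an illustration of how the slot table above might be checked, here is a minimal sketch that compares the keys present in a dataset against the non-optional slots. It works on plain dictionaries so that it has no dependencies; with an AnnData object `adata`, the `present` mapping could presumably be filled from `adata.obs.columns`, `adata.var.columns`, `adata.obsm.keys()`, `adata.layers.keys()` and `adata.uns.keys()`. The names `REQUIRED_SLOTS` and `missing_slots` are hypothetical.

```python
# Non-optional slots of the Common Dataset, copied from the table above.
REQUIRED_SLOTS = {
    "obs": {"cell_type", "batch"},
    "var": {"hvg", "hvg_score"},
    "obsm": {"X_pca"},
    "layers": {"counts", "normalized"},
    "uns": {
        "dataset_id",
        "dataset_name",
        "dataset_summary",
        "dataset_description",
        "normalization_id",
    },
}


def missing_slots(present, required=REQUIRED_SLOTS):
    """Return a sorted list of required slots that are absent from `present`.

    `present` maps a slot group ("obs", "var", ...) to an iterable of keys.
    """
    missing = []
    for group, keys in required.items():
        have = set(present.get(group, ()))
        missing.extend(f'{group}["{key}"]' for key in keys - have)
    return sorted(missing)


if __name__ == "__main__":
    example = {"obs": ["cell_type", "batch"], "var": ["hvg", "hvg_score"]}
    print(missing_slots(example))
```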

## Component type: Data processor

A data processor.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input` | `file` | A subset of the common dataset. |
| `--output_train` | `file` | (*Output*) The training data in h5ad format. |
| `--output_test` | `file` | (*Output*) The subset of molecules used for the test dataset. |
| `--output_solution` | `file` | (*Output*) The solution for the test data. |

</div>
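
The argument table above could be mirrored by a command-line interface along the following lines. This is only a sketch using Python's standard `argparse` module; it is not how the task's components are actually generated, and treating every argument as required is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with the four data processor arguments listed above."""
    parser = argparse.ArgumentParser(
        description="Data processor (illustrative sketch)"
    )
    parser.add_argument("--input", required=True,
                        help="A subset of the common dataset.")
    parser.add_argument("--output_train", required=True,
                        help="The training data in h5ad format.")
    parser.add_argument("--output_test", required=True,
                        help="The test dataset.")
    parser.add_argument("--output_solution", required=True,
                        help="The solution for the test data.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output_train, args.output_test, args.output_solution)
```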

## File format: Solution

The solution for the test data.

Example file: `resources_test/task_template/pancreas/solution.h5ad`

Format:

<div class="small">

AnnData object
obs: 'label', 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["label"]` | `string` | Ground truth cell type labels. |
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>

## File format: Test data

The subset of molecules used for the test dataset.

Example file: `resources_test/task_template/pancreas/test.h5ad`

Format:

<div class="small">

AnnData object
obs: 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>

## File format: Training data

The training data in h5ad format.

Example file: `resources_test/task_template/pancreas/train.h5ad`

Format:

<div class="small">

AnnData object
obs: 'label', 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["label"]` | `string` | Ground truth cell type labels. |
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>
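
Comparing the Solution, Test data and Training data formats above suggests how a data processor might divide the common dataset: training cells keep their label, test cells lose it, and the solution retains the labels of the test cells. The sketch below illustrates that split on plain Python dicts. The record layout, the `split_dataset` name and the `test_ids` parameter are hypothetical; a real component would operate on AnnData objects instead.

```python
def split_dataset(cells, test_ids):
    """Split cell records into (train, test, solution) lists.

    cells:    list of dicts with "id", "batch" and "cell_type" keys
              (a hypothetical stand-in for the rows of `obs`).
    test_ids: set of cell ids assigned to the test split.
    """
    train, test, solution = [], [], []
    for cell in cells:
        # In the task-specific files, `cell_type` appears as `label`.
        labelled = {
            "id": cell["id"],
            "batch": cell["batch"],
            "label": cell["cell_type"],
        }
        if cell["id"] in test_ids:
            solution.append(labelled)
            # The test file carries no label, only the batch.
            test.append({"id": cell["id"], "batch": cell["batch"]})
        else:
            train.append(labelled)
    return train, test, solution


if __name__ == "__main__":
    cells = [
        {"id": "c1", "batch": "b1", "cell_type": "alpha"},
        {"id": "c2", "batch": "b1", "cell_type": "beta"},
    ]
    print(split_dataset(cells, {"c2"}))
```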

## Component type: Control Method

Quality control methods for verifying the pipeline.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The training data in h5ad format. |
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_solution` | `file` | The solution for the test data. |
| `--output` | `file` | (*Output*) A predicted dataset as output by a method. |

</div>
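
Because a control method also receives the solution, it can produce predictions with a known expected quality. The following sketch shows two possible controls on plain label lists: one that returns the ground truth and one that guesses at random. Both function names are hypothetical, and which controls a task actually ships is not specified in this README.

```python
import random


def true_labels_control(solution_labels):
    """Positive control: return the ground-truth labels as the prediction."""
    return list(solution_labels)


def random_labels_control(train_labels, n_test, seed=0):
    """Negative control: draw labels at random from those seen in training."""
    rng = random.Random(seed)
    choices = sorted(set(train_labels))
    return [rng.choice(choices) for _ in range(n_test)]


if __name__ == "__main__":
    print(true_labels_control(["alpha", "beta"]))
    print(random_labels_control(["alpha", "beta", "alpha"], n_test=4))
```

A metric that behaves sensibly should score the first control at or near its best value and the second noticeably worse, which is one way such controls can verify the pipeline.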

## Component type: Metric

A task template metric.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_solution` | `file` | The solution for the test data. |
| `--input_prediction` | `file` | A predicted dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |

</div>
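
As a rough illustration of what a metric component does with these arguments, the sketch below compares ground-truth labels with predicted labels and packs the result into a dictionary shaped like the `uns` fields of the Score format documented in this README. Using accuracy as the metric is an assumption made for the example, and `accuracy_score` and `build_score` are hypothetical names.

```python
def accuracy_score(solution_labels, predicted_labels):
    """Return the fraction of cells whose predicted label matches the truth."""
    if len(solution_labels) != len(predicted_labels):
        raise ValueError("solution and prediction must cover the same cells")
    if not solution_labels:
        raise ValueError("no cells to score")
    correct = sum(t == p for t, p in zip(solution_labels, predicted_labels))
    return correct / len(solution_labels)


def build_score(dataset_id, normalization_id, method_id,
                solution_labels, predicted_labels):
    """Assemble the score fields for a single (hypothetical) accuracy metric."""
    return {
        "dataset_id": dataset_id,
        "normalization_id": normalization_id,
        "method_id": method_id,
        "metric_ids": ["accuracy"],
        "metric_values": [accuracy_score(solution_labels, predicted_labels)],
    }


if __name__ == "__main__":
    truth = ["alpha", "beta", "alpha", "delta"]
    pred = ["alpha", "beta", "delta", "delta"]
    print(build_score("pancreas", "log_cp10k", "my_method", truth, pred))
```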

## Component type: Method

A method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The training data in h5ad format. |
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--output` | `file` | (*Output*) A predicted dataset as output by a method. |

</div>
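
A method only sees the training and test data. As a minimal sketch of that contract, the baseline below predicts the most frequent training label for every test cell. It is a hypothetical example and not a method defined by this template; `majority_vote_predict` is an illustrative name.

```python
from collections import Counter


def majority_vote_predict(train_labels, n_test):
    """Predict the most frequent training label for each of `n_test` cells."""
    if not train_labels:
        raise ValueError("training labels must not be empty")
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_test


if __name__ == "__main__":
    print(majority_vote_predict(["alpha", "beta", "alpha"], n_test=2))
```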

## File format: Predicted data

A predicted dataset as output by a method.

Example file: `resources_test/task_template/pancreas/prediction.h5ad`

Format:

<div class="small">

AnnData object
obs: 'label_pred'
uns: 'dataset_id', 'normalization_id', 'method_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:--------------------------|:---------|:-------------------------------------|
| `obs["label_pred"]` | `string` | Predicted labels for the test cells. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |

</div>

## File format: Score

File indicating the score of a metric.

Example file: `resources/score.h5ad`

Format:

<div class="small">

AnnData object
uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |

Check out the [instructions](https://github.com/openproblems-bio/common_resources/blob/main/INSTRUCTIONS.md) for more information on how to update the example files and components. These instructions also contain information on how to build out the task and basic commands.
</div>
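
The Score table requires `metric_values` to have the same length as `metric_ids`. A small check along these lines could enforce that constraint before a score file is written. This is a sketch on a plain dictionary, and `check_score` is a hypothetical helper.

```python
SCORE_FIELDS = [
    "dataset_id",
    "normalization_id",
    "method_id",
    "metric_ids",
    "metric_values",
]


def check_score(uns):
    """Validate score fields and return a {metric_id: value} mapping.

    Raises ValueError if a field is missing or the lengths differ.
    """
    absent = [key for key in SCORE_FIELDS if key not in uns]
    if absent:
        raise ValueError(f"missing fields: {absent}")
    ids, values = uns["metric_ids"], uns["metric_values"]
    # A single metric may be stored as a scalar; normalise both to lists.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]
    if len(ids) != len(values):
        raise ValueError("metric_values must have the same length as metric_ids")
    return dict(zip(ids, values))


if __name__ == "__main__":
    print(check_score({
        "dataset_id": "pancreas",
        "normalization_id": "log_cp10k",
        "method_id": "my_method",
        "metric_ids": ["accuracy"],
        "metric_values": [0.75],
    }))
```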

For more information on OpenProblems v2, check out the [documentation](https://openproblems.bio/documentation/).
