Link up usage guide
bsweger committed Oct 10, 2024
1 parent 7ecff57 commit df86eaf
Showing 3 changed files with 86 additions and 19 deletions.
58 changes: 54 additions & 4 deletions docs/index.rst
@@ -2,16 +2,66 @@
Cladetime
===============

Cladetime is a lightweight Python library for manipulating SARS-CoV-2 sequence and clade data provided by
Cladetime is a Python library for manipulating SARS-CoV-2 sequence and clade data provided by
`nextstrain.org <https://nextstrain.org/>`_.

Contents
--------

.. toctree::
   :titlesonly:
   :hidden:

   Home <self>
   user-guide
   reference/index

Installation
------------

Cladetime can be installed with `pip <https://pip.pypa.io/>`_:

.. code-block:: bash

    pip install git+https://github.com/reichlab/cladetime.git

Usage
-----

The :class:`CladeTime` class provides a lightweight wrapper around historical and current
SARS-CoV-2 GenBank sequence and sequence metadata created by `nextstrain.org's <https://nextstrain.org/>`_
daily workflow pipeline.

.. code-block:: python

    import polars as pl
    from cladetime import CladeTime

    ct = CladeTime()
    filtered_sequence_metadata = (
        ct.sequence_metadata.select(["country", "division", "date", "host", "clade_nextstrain"])
        .filter(
            pl.col("country") == "USA",
            pl.col("date").is_not_null(),
            pl.col("host") == "Homo sapiens",
        )
        .cast({"date": pl.Date}, strict=False)
    )

    filtered_sequence_metadata.head(5).collect()

    # shape: (5, 5)
    # ┌─────────┬──────────┬────────────┬──────────────┬──────────────────┐
    # │ country ┆ division ┆ date       ┆ host         ┆ clade_nextstrain │
    # │ ---     ┆ ---      ┆ ---        ┆ ---          ┆ ---              │
    # │ str     ┆ str      ┆ date       ┆ str          ┆ str              │
    # ╞═════════╪══════════╪════════════╪══════════════╪══════════════════╡
    # │ USA     ┆ Alabama  ┆ 2022-07-07 ┆ Homo sapiens ┆ 22A              │
    # │ USA     ┆ Arizona  ┆ 2022-07-02 ┆ Homo sapiens ┆ 22B              │
    # │ USA     ┆ Arizona  ┆ 2022-07-19 ┆ Homo sapiens ┆ 22B              │
    # │ USA     ┆ Arizona  ┆ 2022-07-15 ┆ Homo sapiens ┆ 22B              │
    # │ USA     ┆ Arizona  ┆ 2022-07-20 ┆ Homo sapiens ┆ 22B              │
    # └─────────┴──────────┴────────────┴──────────────┴──────────────────┘

See the :doc:`user-guide` for more details about working with Cladetime.

The :doc:`reference/index` documentation provides API-level documentation.

1 change: 0 additions & 1 deletion docs/reference/index.rst
@@ -2,6 +2,5 @@ API Reference
=============

.. toctree::
:hidden:

cladetime
46 changes: 32 additions & 14 deletions docs/user-guide.rst
@@ -2,15 +2,6 @@ Cladetime
===============


Installing
------------

cladetime can be installed with `pip <https://pip.pypa.io/>`_:

.. code-block:: bash
pip install git+https://github.com/reichlab/cladetime.git

Finding Nextstrain SARS-CoV-2 sequences and sequence metadata
--------------------------------------------------------------
@@ -61,14 +52,41 @@ and the reference tree varies over time.
ct = CladeTime()
# ct contains a Polars LazyFrame that references the sequence metadata .tsv file on AWS S3
lz = ct.sequence_metadata
lz
lf = ct.sequence_metadata
lf
<LazyFrame at 0x105341190>
# TODO: some polars examples
Time Traveling
--------------
Getting historical SARS-CoV-2 sequence metadata
------------------------------------------------

A CladeTime instance created without parameters will reference the most
recent data available from Nextstrain.

To access sequence metadata at a specific point in time, pass a date string
in the format 'YYYY-MM-DD' to the CladeTime constructor. Alternatively, you can pass
a Python datetime object. Both will be treated as UTC dates/times.

.. code-block:: python

    import polars as pl
    from cladetime import CladeTime

    ct = CladeTime(sequence_as_of="2024-08-02")

    # ct operations now reference the version of the sequence metadata
    # that was available at midnight UTC on August 2, 2024.
    ct.sequence_metadata \
        .cast({"date": pl.Date}, strict=False) \
        .select(pl.max("date")).collect()

    # shape: (1, 1)
    # ┌────────────┐
    # │ date       │
    # │ ---        │
    # │ date       │
    # ╞════════════╡
    # │ 2024-07-23 │
    # └────────────┘
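
The UTC handling described above can be illustrated without Cladetime itself. The sketch below is not the library's implementation: ``normalize_as_of`` is a hypothetical helper, and it assumes that a bare date string means midnight UTC and that a naive datetime is already expressed in UTC. Under those assumptions, a 'YYYY-MM-DD' string and the equivalent datetime object resolve to the same instant.

```python
from datetime import datetime, timezone


def normalize_as_of(value):
    """Hypothetical helper (not part of Cladetime): coerce a 'YYYY-MM-DD'
    string or a datetime into a timezone-aware UTC datetime."""
    if isinstance(value, str):
        # Assumption: a bare date string means midnight UTC on that date.
        return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    if value.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        return value.replace(tzinfo=timezone.utc)
    # Aware datetimes are converted to UTC.
    return value.astimezone(timezone.utc)


from_string = normalize_as_of("2024-08-02")
from_datetime = normalize_as_of(datetime(2024, 8, 2))

print(from_string.isoformat())       # 2024-08-02T00:00:00+00:00
print(from_string == from_datetime)  # True
```

Passing timezone-aware datetimes avoids relying on the naive-datetime assumption at all.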