Argo #2

Merged
merged 30 commits into from
Dec 15, 2023
2 changes: 1 addition & 1 deletion .github/workflows/pr-ci.yml
@@ -16,7 +16,7 @@ jobs:
  test:
    strategy:
      matrix:
        os: [windows-latest, ubuntu-latest, macos-latest]
        os: [ubuntu-latest, macos-latest]
        py3version: ["9", "11"]
      fail-fast: false
    uses: arup-group/actions-city-modelling-lab/.github/workflows/python-install-lint-test.yml@main
6 changes: 5 additions & 1 deletion .gitignore
@@ -37,4 +37,8 @@ reports/
mike-*.yml

# Jupyter notebooks
.ipynb_checkpoints
.ipynb_checkpoints

sandbox.py
tests/test_data/outputs/
tests/test_data/outputs/*log
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -26,7 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Removed

## [v0.1.0] - 2023-11-28
## [v0.1.0] - 2023-12-13

Initial release.

6 changes: 5 additions & 1 deletion README.md
@@ -2,11 +2,15 @@
<!--- --8<-- [start:docs] -->
![gtfs_skims](resources/logos/title.png)

# gtfs-skims (gtfs_skims)
# Argo (gtfs_skims)

[![Daily CI Build](https://github.com/arup-group/gtfs_skims/actions/workflows/daily-scheduled-ci.yml/badge.svg)](https://github.com/arup-group/gtfs_skims/actions/workflows/daily-scheduled-ci.yml)
[![Documentation](https://github.com/arup-group/gtfs_skims/actions/workflows/pages/pages-build-deployment/badge.svg?branch=gh-pages)](https://arup-group.github.io/gtfs_skims)

Argo is a library aimed at the fast calculation of generalised time matrices from GTFS files.
By applying appropriate simplifications on the GTFS dataset, the library is able to calculate such matrices at scale.
For example, it was possible to calculate an MSOA-to-MSOA matrix for England and Wales in ~1 hour (on a relatively large machine).

<!--- --8<-- [end:docs] -->

## Documentation
2 changes: 2 additions & 0 deletions docs/installation.md
@@ -1,6 +1,8 @@

# Installation

Note: this library only supports Unix-based systems (i.e. Ubuntu/macOS). If you wish to use it on Windows, please use the Windows Subsystem for Linux.

## Setting up a user environment

As a `gtfs_skims` user, it is easiest to install using the [mamba](https://mamba.readthedocs.io/en/latest/index.html) package manager, as follows:
37 changes: 37 additions & 0 deletions docs/methodology.md
@@ -0,0 +1,37 @@
# Methodology

Argo calculates generalised time matrices between a set of origin and destination points.

Generalised time is defined as follows:

$$
gc = ivt + \beta_{wait} \cdot wait\_time + \beta_{walk} \cdot walk\_time + \beta_{interchange\_penalty} \cdot n\_transfers
$$

Some example values for the leg component weights are:

$$
\beta_{wait} = \beta_{walk} = 2 \text{ to } 3
$$

and

$$
\beta_{\text{interchange\_penalty}} = 5 \text{ to } 10 \text{ minutes}
$$
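
As a minimal sketch of the formula above, the function below combines the four components into a single generalised time in seconds. The function name and its defaults are illustrative assumptions (the defaults mirror the `weight_walk`, `weight_wait` and `penalty_interchange` values of the example config), not the library's own API.

```python
def generalised_time(
    ivt: float,
    wait_time: float,
    walk_time: float,
    n_transfers: int,
    weight_wait: float = 2.0,            # assumed default, as in the example config
    weight_walk: float = 2.0,            # assumed default, as in the example config
    penalty_interchange: float = 300.0,  # seconds per interchange (5 minutes)
) -> float:
    """Return generalised time in seconds (hypothetical helper, for illustration)."""
    return (
        ivt
        + weight_wait * wait_time
        + weight_walk * walk_time
        + penalty_interchange * n_transfers
    )


# 20 min in vehicle, 5 min waiting, 10 min walking, one interchange
gt = generalised_time(ivt=1200, wait_time=300, walk_time=600, n_transfers=1)
print(gt)  # 1200 + 2*300 + 2*600 + 300 = 3300.0
```

With these weights, the 35 minutes of actual travel time count as 55 minutes of generalised time.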

Walk distance is calculated as the crow-fly (straight-line) distance between two points, multiplied by a factor specified in the config file (typically ~1.3).
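
The walk calculation can be sketched as follows, assuming projected coordinates in metres (as required by `epsg_centroids`) and a walking speed given in kph. The function name is hypothetical and the defaults are taken from the example config.

```python
import math


def walk_time_s(origin, destination, crows_fly_factor=1.3, walk_speed_kph=4.5):
    """Estimate walk time in seconds between two (x, y) points in metres.

    Hypothetical helper: the straight-line distance is scaled by the
    crow-fly factor to approximate the routed distance.
    """
    straight_line_m = math.hypot(destination[0] - origin[0], destination[1] - origin[1])
    routed_m = straight_line_m * crows_fly_factor
    speed_m_per_s = walk_speed_kph * 1000 / 3600
    return routed_m / speed_m_per_s


# 500 m apart in a straight line -> ~650 m routed -> ~520 s at 4.5 kph
print(round(walk_time_s((0, 0), (300, 400))))
```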

The library creates a graph representation of the GTFS dataset, where the edges represent vehicle movements or connections (access/egress/transfer legs). It then applies a shortest-paths algorithm, using generalised time as edge weights.
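
To illustrate the idea on a toy network, the sketch below runs a plain Dijkstra search over a small dictionary-based graph whose edge weights are generalised times in seconds. This is a standard-library stand-in chosen for readability, not the graph-tool implementation the library uses; the node names and edge values are made up.

```python
import heapq


def shortest_generalised_times(edges, source):
    """Dijkstra over {node: [(neighbour, weight), ...]}; returns the best cost per node."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, weight in edges.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


# Toy graph: walk legs are weighted by 2, the vehicle leg by 1.
edges = {
    "origin": [("stop_a", 2 * 240), ("destination", 2 * 1500)],  # access walk, or walk all the way
    "stop_a": [("stop_b", 600)],                                  # in-vehicle leg
    "stop_b": [("destination", 2 * 300)],                         # egress walk
}
times = shortest_generalised_times(edges, "origin")
print(times["destination"])  # 480 + 600 + 600 = 1680.0, cheaper than walking (3000)
```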

To achieve high performance, the user can limit the search space by:
* selecting a time scope and maximum travel time
* selecting a specific day
* selecting a maximum walk, wait and transfer time for legs
* applying a spatial bounding box
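
A rough sketch of how such limits shrink the problem is shown below: stop events outside the time window or the bounding box are dropped before any graph is built. The record fields (`departure_s`, `x`, `y`) and the `(xmin, ymin, xmax, ymax)` box format are assumptions made for illustration, not the library's data model.

```python
def in_scope(stop_event, start_s, end_s, bounding_box=None):
    """Return True if a stop event lies inside the time window and (optional) box."""
    if not (start_s <= stop_event["departure_s"] <= end_s):
        return False
    if bounding_box is not None:
        xmin, ymin, xmax, ymax = bounding_box  # assumed box format
        if not (xmin <= stop_event["x"] <= xmax and ymin <= stop_event["y"] <= ymax):
            return False
    return True


stop_events = [
    {"stop_id": "a", "departure_s": 30000, "x": 100.0, "y": 100.0},  # before the window
    {"stop_id": "b", "departure_s": 33000, "x": 100.0, "y": 100.0},  # kept
    {"stop_id": "c", "departure_s": 40000, "x": 900.0, "y": 100.0},  # outside the box
]
kept = [e["stop_id"] for e in stop_events if in_scope(e, 32400, 41400, (0, 0, 500, 500))]
print(kept)  # ['b']
```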

We further improve performance by:
* using K-dimensional trees to organise spatial data
* using the efficient graph-tool library to calculate shortest distances
* parallelising the shortest distances calculation, and vectorising data transformation tasks
* saving files to a compressed parquet format
48 changes: 48 additions & 0 deletions docs/run.md
@@ -0,0 +1,48 @@
# Running Argo

To run Argo, type the following on the command line:
```
argo run <CONFIG_PATH>
```
where `<CONFIG_PATH>` is the path to the config YAML file.

An example config file is shown below:
```
paths:
  path_gtfs: ./tests/test_data/iow-bus-gtfs.zip
  path_outputs: ./tests/test_data/outputs
  path_origins: ./tests/test_data/centroids.csv # path to the origin points
  path_destinations: ./tests/test_data/centroids.csv # path to the destination points

settings:
  calendar_date : 20190515 # yyyymmdd | Date for filtering the GTFS file.
  start_s : 32400 # sec | Start time of the journey.
  end_s : 41400 # sec | Max end time of a journey.
  walk_distance_threshold : 2000 # m | Max walk distance in a leg
  walk_speed : 4.5 # kph | Walking speed
  crows_fly_factor : 1.3 # Conversion factor from euclidean to routed distances
  max_transfer_time : 1800 # sec | Max combined time of walking and waiting of a transfer
  max_wait : 1800 # sec | Max wait time at a stop / leg
  bounding_box : null
  epsg_centroids: 27700 # coordinate system of the centroids file. Needs to be Cartesian and in meters.
  weight_walk: 2 # value of walk time, ratio to in-vehicle time
  weight_wait: 2 # value of wait time, ratio to in-vehicle time
  penalty_interchange: 300 # seconds added to generalised cost for each interchange

steps:
  - preprocessing
  - connectors
  - graph
```
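
The `start_s` and `end_s` settings are given in seconds. Assuming they count from midnight of `calendar_date` (an assumption; the config comments do not say so explicitly), a small hypothetical helper makes the values easier to check:

```python
def seconds_to_clock(seconds: int) -> str:
    """Format seconds since midnight as HH:MM:SS (illustrative helper)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(seconds_to_clock(32400))  # 09:00:00
print(seconds_to_clock(41400))  # 11:30:00
```

Under that assumption, the example config covers journeys starting at 09:00 and ending by 11:30.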

To run the example provided by the repo, use:
```
argo run ./tests/test_data/config_demo.yaml
```

The time matrices will be saved to the `skims.parquet.gzip` file in the `path_outputs` directory defined in the config file. An easy way to read the file is with pandas:
```
import pandas as pd
df = pd.read_parquet('<OUTPUT_PATH>/skims.parquet.gzip')
df
```
3 changes: 3 additions & 0 deletions gtfs_skims/__init__.py
@@ -1,5 +1,8 @@
"""Top-level module for gtfs_skims."""
import pyproj

__author__ = """Theodore-Chatziioannou"""
__email__ = "[email protected]"
__version__ = "0.1.0"

pyproj.network.set_network_enabled(False)
37 changes: 31 additions & 6 deletions gtfs_skims/cli.py
@@ -1,14 +1,39 @@
"""Console script for gtfs_skims."""

from typing import Optional

import click

from gtfs_skims.connectors import main as main_connectors
from gtfs_skims.graph import main as main_graph
from gtfs_skims.preprocessing import main as main_preprocessing
from gtfs_skims.utils import Config


@click.version_option(package_name="gtfs_skims")
@click.command()
@click.group
def cli(args=None):
    """Console script for gtfs_skims."""
    click.echo(
        "Replace this message by putting your code into gtfs_skims.cli.cli"
    )
    click.echo("See click documentation at https://click.palletsprojects.com/")
    """Console script for Argo (gtfs_skims)."""
    return 0


@cli.command()
@click.argument("config_path")
@click.option("--output_directory_override", default=None, help="override output directory")
def run(config_path: str, output_directory_override: Optional[str] = None):
    config = Config.from_yaml(config_path)
    if output_directory_override is not None:
        config.path_outputs = output_directory_override
    steps = config.steps

    gtfs_data = None
    connectors_data = None

    if "preprocessing" in steps:
        gtfs_data = main_preprocessing(config=config)

    if "connectors" in steps:
        connectors_data = main_connectors(config=config, data=gtfs_data)

    if "graph" in steps:
        main_graph(config=config, gtfs_data=gtfs_data, connectors_data=connectors_data)