Merge pull request #9 from mraspaud/add-dhus
Add a watcher for DHuS instances
mraspaud authored May 8, 2024
2 parents b42fd4f + 3f7d5e3 commit f00e529
Showing 18 changed files with 679 additions and 182 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -31,7 +31,7 @@ jobs:
python -m pip install --upgrade pip
python -m pip install ruff pytest pytest-cov freezegun responses
python -m pip install git+https://github.com/gorakhargosh/watchdog
- python -m pip install -e .[local,minio,publishing,ssh,dataspace]
+ python -m pip install -e .[local,minio,publishing,ssh,dataspace,datastore,dhus]
- name: Lint with ruff
run: |
ruff check .
27 changes: 27 additions & 0 deletions docs/source/backends.rst
@@ -0,0 +1,27 @@
Available backends
==================

Local watcher
-------------
.. automodule:: pytroll_watchers.local_watcher
   :members:

Minio bucket notification watcher
---------------------------------
.. automodule:: pytroll_watchers.minio_notification_watcher
   :members:

Copernicus dataspace watcher
----------------------------
.. automodule:: pytroll_watchers.dataspace_watcher
   :members:

EUMETSAT datastore watcher
--------------------------
.. automodule:: pytroll_watchers.datastore_watcher
   :members:

DHuS watcher
------------
.. automodule:: pytroll_watchers.dhus_watcher
   :members:
21 changes: 21 additions & 0 deletions docs/source/cli.rst
@@ -0,0 +1,21 @@
CLI
***

The command-line tool can be used by invoking `pytroll-watcher <config-file>`. An example config-file can be::

  backend: minio
  fs_config:
    endpoint_url: my_endpoint.pytroll.org
    bucket_name: satellite-data-viirs
    storage_options:
      profile: profile_for_credentials
  publisher_config:
    name: viirs_watcher
  message_config:
    subject: /segment/viirs/l1b/
    atype: file
    data:
      sensor: viirs
    aliases:
      platform_name:
        npp: Suomi-NPP
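
To make the `aliases` item more concrete, here is a minimal Python sketch of how such a mapping could be applied to message metadata. The `apply_aliases` helper and the sample metadata are illustrative assumptions, not necessarily how the library implements it; the sketch only assumes that each key under `aliases` maps raw values to the values that should be published instead.

```python
def apply_aliases(aliases, metadata):
    """Return a copy of `metadata` with aliased values replaced.

    Hypothetical helper for illustration: `aliases` maps a metadata key to a
    dictionary of {raw_value: replacement}.
    """
    result = dict(metadata)
    for key, replacements in aliases.items():
        if key in result:
            # Values without a registered alias are left untouched.
            result[key] = replacements.get(result[key], result[key])
    return result


# Same structure as the `aliases` item of the example config.
aliases = {"platform_name": {"npp": "Suomi-NPP"}}
# Assumed metadata, as it might be extracted from a file name.
metadata = {"sensor": "viirs", "platform_name": "npp"}

print(apply_aliases(aliases, metadata))
```

Returning a copy rather than mutating the input keeps the raw metadata available, which can help when debugging unexpected alias results.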
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -17,7 +17,7 @@

extensions = ["sphinx.ext.napoleon", "sphinx.ext.autodoc"]
autodoc_mock_imports = ["watchdog", "minio", "posttroll", "pytest", "trollsift", "universal_path",
- "freezegun", "responses", "oauthlib", "requests_oauthlib"]
+ "freezegun", "responses", "oauthlib", "requests_oauthlib", "defusedxml"]

templates_path = ["_templates"]
exclude_patterns = []
123 changes: 5 additions & 118 deletions docs/source/index.rst
@@ -10,126 +10,13 @@ Welcome to pytroll-watchers's documentation!
   :maxdepth: 2
   :caption: Contents:

Pytroll-watcher is a library and command-line tool to detect changes on a local or remote file system.

At the moment we support local filesystems and Minio S3 buckets through bucket notifications.

CLI
***

The command-line tool can be used by invoking `pytroll-watcher <config-file>`. An example config-file can be::

  backend: minio
  fs_config:
    endpoint_url: my_endpoint.pytroll.org
    bucket_name: satellite-data-viirs
    storage_options:
      profile: profile_for_credentials
  publisher_config:
    name: viirs_watcher
  message_config:
    subject: /segment/viirs/l1b/
    atype: file
    data:
      sensor: viirs
    aliases:
      platform_name:
        npp: Suomi-NPP

Published messages
******************

The published messages contain information on how to access the advertised resource. The following parameters are
present in the message.

uid
---

This is the unique identifier for the resource. In general, it is the basename for the file/objects, since we assume
that two files with the same name will have the same content. In some cases it can include the containing directory.

Examples of uids:

- `SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `S3B_OL_1_EFR____20240415T074029_20240415T074329_20240415T094236_0179_092_035_1620_PS2_O_NR_003.SEN3/Oa02_radiances.nc`

uri
---

This is the URI that can be used to access the resource. For more complex cases, URIs can be chained, as fsspec allows.

Examples of uris:

- `s3://viirs-data/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `zip://sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5::s3://viirs-data/viirs_sdr_npp_d20240408_t1006227_e1007469_b64498.zip`
- `https://someplace.com/files/S3B_OL_1_EFR____20240415T074029_20240415T074329_20240415T094236_0179_092_035_1620_PS2_O_NR_003.SEN3/Oa02_radiances.nc`

filesystem
----------

Sometimes the URI is not enough to gain access to the resource, for example when the hosting service requires
authentication. This is why pytroll-watchers will also provide the filesystem and the path items. The filesystem
parameter is the fsspec JSON representation of the filesystem. This can be used on the recipient side using e.g.::

  fsspec.AbstractFileSystem.from_json(json.dumps(fs_info))
   cli
   published
   backends
   other_api

where `fs_info` is the content of the filesystem parameter.

To pass authentication parameters to the filesystem, use the `storage_options` configuration item.


Example of filesystem:

- `{"cls": "s3fs.core.S3FileSystem", "protocol": "s3", "args": [], "profile": "someprofile"}`

.. warning::

   Pytroll-watchers tries to prevent publishing of sensitive information such as passwords and secret keys, and will
   raise an error in most cases when this is done. However, always double-check your pytroll-watchers configuration so
   that secrets are not passed to the library to start with.
   Solutions include ssh-agent for ssh-based filesystems and storing credentials in .aws config files for s3
   filesystems. For http-based filesystems implemented in pytroll-watchers, the username and password are used to
   generate a token prior to publishing, and will thus not be published.

path
----

This parameter is the companion to `filesystem` and gives the path to the resource within the filesystem.

Examples of paths:

- `/viirs-data/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `/files/S3B_OL_1_EFR____20240415T074029_20240415T074329_20240415T094236_0179_092_035_1620_PS2_O_NR_003.SEN3/Oa02_radiances.nc`


API
***

Main interface
--------------
.. automodule:: pytroll_watchers
   :members:

Local watcher
-------------
.. automodule:: pytroll_watchers.local_watcher
   :members:

Minio bucket notification watcher
---------------------------------
.. automodule:: pytroll_watchers.minio_notification_watcher
   :members:

Copernicus dataspace watcher
---------------------------------
.. automodule:: pytroll_watchers.dataspace_watcher
   :members:
Pytroll-watcher is a library and command-line tool to detect changes on a local or remote file system.

Testing utilities
-----------------
.. automodule:: pytroll_watchers.testing
   :members:

Indices and tables
==================
12 changes: 12 additions & 0 deletions docs/source/other_api.rst
@@ -0,0 +1,12 @@
Common API
**********

Main interface
--------------
.. automodule:: pytroll_watchers
   :members:

Testing utilities
-----------------
.. automodule:: pytroll_watchers.testing
   :members:
83 changes: 83 additions & 0 deletions docs/source/published.rst
@@ -0,0 +1,83 @@
Published messages
******************

The published messages contain information on how to access the advertised resource. The following parameters are
present in the message.

Resource location information
=============================

uid
---

This is the unique identifier for the resource. In general, it is the basename for the file/objects, since we assume
that two files with the same name will have the same content. In some cases it can include the containing directory.

Examples of uids:

- `SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `S3B_OL_1_EFR____20240415T074029_20240415T074329_20240415T094236_0179_092_035_1620_PS2_O_NR_003.SEN3/Oa02_radiances.nc`

uri
---

This is the URI that can be used to access the resource. For more complex cases, URIs can be chained, as fsspec allows.

Examples of uris:

- `s3://viirs-data/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `zip://sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5::s3://viirs-data/viirs_sdr_npp_d20240408_t1006227_e1007469_b64498.zip`
- `https://someplace.com/files/S3B_OL_1_EFR____20240415T074029_20240415T074329_20240415T094236_0179_092_035_1620_PS2_O_NR_003.SEN3/Oa02_radiances.nc`
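
To illustrate how a chained URI such as the `zip://...::s3://...` example decomposes, the following sketch splits it into its layers using only the standard library. It is a simplified illustration, not fsspec's own parser; the `split_chained_uri` helper and the shortened file names are hypothetical.

```python
def split_chained_uri(uri):
    """Split a chained URI into (protocol, path) pairs, outermost layer first."""
    layers = []
    for part in uri.split("::"):
        protocol, sep, path = part.partition("://")
        if not sep:
            # No explicit protocol: treat the part as a plain local path.
            protocol, path = "file", part
        layers.append((protocol, path))
    return layers


# Shortened, hypothetical file names in the style of the examples above.
uri = "zip://sdr/SVM13_npp.h5::s3://viirs-data/viirs_sdr_npp.zip"
for protocol, path in split_chained_uri(uri):
    print(protocol, path)
```

Reading the layers from right to left gives the access order: the zip archive is fetched from the S3 bucket first, then the member `sdr/SVM13_npp.h5` is read from inside it.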


filesystem
----------

Sometimes the URI is not enough to gain access to the resource, for example when the hosting service requires
authentication. This is why pytroll-watchers will also provide the filesystem and the path items. The filesystem
parameter is the fsspec JSON representation of the filesystem. This can be used on the recipient side using e.g.::

  fsspec.AbstractFileSystem.from_json(json.dumps(fs_info))

where `fs_info` is the content of the filesystem parameter.

To pass authentication parameters to the filesystem, use the `storage_options` configuration item.


Example of filesystem:

- `{"cls": "s3fs.core.S3FileSystem", "protocol": "s3", "args": [], "profile": "someprofile"}`

.. warning::

   Pytroll-watchers tries to prevent publishing of sensitive information such as passwords and secret keys, and will
   raise an error in most cases when this is done. However, always double-check your pytroll-watchers configuration so
   that secrets are not passed to the library to start with.
   Solutions include ssh-agent for ssh-based filesystems and storing credentials in .aws config files for s3
   filesystems. For http-based filesystems implemented in pytroll-watchers, the username and password are used to
   generate a token prior to publishing, and will thus not be published.

path
----

This parameter is the companion to `filesystem` and gives the path to the resource within the filesystem.

Examples of paths:

- `/viirs-data/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5`
- `/files/S3B_OL_1_EFR____20240415T074029_20240415T074329_20240415T094236_0179_092_035_1620_PS2_O_NR_003.SEN3/Oa02_radiances.nc`
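
As a rough sketch of how a recipient might combine these items, the snippet below picks the `filesystem` and `path` pair when both are present and falls back to the `uri` otherwise. The `access_info` helper and the layout of `message_data` are assumptions made for illustration; only the parameter names come from this page.

```python
import json


def access_info(message_data):
    """Decide how to open the resource described by a message payload."""
    if "filesystem" in message_data and "path" in message_data:
        return {
            "how": "filesystem",
            # JSON string in the form expected by fsspec.AbstractFileSystem.from_json
            "fs_json": json.dumps(message_data["filesystem"]),
            "path": message_data["path"],
        }
    # Without filesystem information, the URI is all there is to go on.
    return {"how": "uri", "uri": message_data["uri"]}


# Payload assembled from the example values on this page.
message_data = {
    "uid": "SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5",
    "uri": "s3://viirs-data/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5",
    "filesystem": {"cls": "s3fs.core.S3FileSystem", "protocol": "s3", "args": [], "profile": "someprofile"},
    "path": "/viirs-data/sdr/SVM13_npp_d20240408_t1006227_e1007469_b64498_c20240408102334392250_cspp_dev.h5",
}

info = access_info(message_data)
print(info["how"], info["path"])
```

With fsspec installed, `info["fs_json"]` can be passed to `fsspec.AbstractFileSystem.from_json` and the resulting filesystem used to open `info["path"]`.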

Other metadata
==============

Other metadata items are provided when possible:

* boundary: the geojson boundary of the data
* platform_name
* sensor
* orbit_number
* start_time
* end_time
* product_type
* checksum
11 changes: 10 additions & 1 deletion pyproject.toml
@@ -5,14 +5,21 @@ description = "Utility functions and scripts to watch for new files on local or
authors = [
{ name = "Martin Raspaud", email = "[email protected]" }
]
- dependencies = ["universal-pathlib", "trollsift", "pyyaml"]
+ dependencies = ["universal-pathlib", "trollsift", "pyyaml", "geojson"]
readme = "README.md"
requires-python = ">= 3.10"
license = {file = "LICENSE.txt"}

[project.scripts]
pytroll-watcher = "pytroll_watchers.main_interface:cli"

+ [project.entry-points."pytroll_watchers.backends"]
+ local = "pytroll_watchers.local_watcher"
+ minio = "pytroll_watchers.minio_notification_watcher"
+ dataspace = "pytroll_watchers.dataspace_watcher"
+ datastore = "pytroll_watchers.datastore_watcher"
+ dhus = "pytroll_watchers.dhus_watcher"
+
[project.urls]
"Documentation" = "https://pytroll-watchers.readthedocs.io/en/latest/"

@@ -22,6 +29,8 @@ local = ["watchdog"]
publishing = ["posttroll"]
ssh = ["paramiko"]
dataspace = ["oauthlib", "requests_oauthlib", "s3fs"]
+ datastore = ["oauthlib", "requests_oauthlib"]
+ dhus = ["defusedxml"]

[build-system]
requires = ["hatchling", "hatch-vcs"]
32 changes: 32 additions & 0 deletions src/pytroll_watchers/common.py
@@ -0,0 +1,32 @@
"""Collection of functions needed by multiple watchers."""

import datetime
import time


def run_every(interval):
    """Generator that ticks every `interval`.

    Args:
        interval: the timedelta object giving the amount of time to wait between ticks. An interval of 0 makes the
            generator tick once, then return (so busy loops aren't possible).

    Yields:
        The time of the next tick.
    """
    while True:
        next_check = datetime.datetime.now(datetime.timezone.utc) + interval
        yield next_check
        to_wait = max(next_check.timestamp() - time.time(), 0)
        time.sleep(to_wait)
        if not interval:  # interval is 0
            break


def fromisoformat(datestring):
    """Wrapper around datetime's fromisoformat that also works on python 3.10."""
    try:
        return datetime.datetime.fromisoformat(datestring)
    except ValueError:
        # for python 3.10
        return datetime.datetime.strptime(datestring, "%Y-%m-%dT%H:%M:%S.%f%z")
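
A short usage sketch may help to see how these two helpers behave. Both functions are copied from the hunk above so that the snippet runs on its own; the 10 ms interval and the timestamp string are assumptions chosen for illustration.

```python
import datetime
import time


def run_every(interval):
    """Copy of the generator from common.py: tick every `interval`."""
    while True:
        next_check = datetime.datetime.now(datetime.timezone.utc) + interval
        yield next_check
        to_wait = max(next_check.timestamp() - time.time(), 0)
        time.sleep(to_wait)
        if not interval:  # a zero interval ticks once, then stops
            break


def fromisoformat(datestring):
    """Copy of the wrapper from common.py."""
    try:
        return datetime.datetime.fromisoformat(datestring)
    except ValueError:
        # Python 3.10 rejects a trailing "Z" and short fractional seconds
        return datetime.datetime.strptime(datestring, "%Y-%m-%dT%H:%M:%S.%f%z")


# A zero interval yields exactly one tick, which suits one-shot runs and tests.
ticks = list(run_every(datetime.timedelta(0)))
print(len(ticks))

# Three ticks of a hypothetical polling loop; the sleep happens when the generator is resumed.
for count, next_check in enumerate(run_every(datetime.timedelta(milliseconds=10))):
    print("next check at", next_check.isoformat())
    if count == 2:
        break

# Hypothetical timestamp with a "Z" suffix and millisecond precision.
parsed = fromisoformat("2024-05-08T12:00:00.000Z")
print(parsed.isoformat())
```

Note that the first tick is delivered immediately and the wait happens only when the consumer asks for the next one, so the work done in the loop body counts against the interval rather than being added to it.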