
Automate registering search parameter descriptions (#311)
* Automate registering search parameter descriptions

* Move API check out of client and only check if update is needed

* Don't hit DKIST search API during tests

* Use previously saved search api values when running tests

* Add changelog

* Rename changelog

* Move api response location definition to function for easier mocking

* Minor tweaks

* Bring back the 'is test' environment variable

I know it's clunkier than just mocking that function with pytest, but I can't get that to work because hypothesis imports dkist.net before pytest has finished configuring.

* Tweaks to improve test coverage

* Refactor attr value fetching

* Add prepare-release action

* Add update to attrs json to release workflow

* Some more cleanup

* Bring back support for auto-describing range parameters

* Update dkist/net/attrs_values.py

Co-authored-by: Stuart Mumford <[email protected]>

* Fix typo

* Add special case for time

* Update dkist/net/attrs_values.py

---------

Co-authored-by: Stuart Mumford <[email protected]>
SolarDrew and Cadair authored Feb 19, 2024
1 parent a5896dc commit c8091f7
Showing 9 changed files with 515 additions and 15 deletions.
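In practice, the net effect of this change is that `DKISTClient.register_values` merges its static entries with search parameter values fetched from the DKIST search API (or a locally cached copy of them). A minimal sketch of inspecting the result, assuming a normal install:

    from dkist.net import DKISTClient

    # Merged mapping of attr types to (value, description) pairs; the
    # API-derived entries are appended by get_search_attrs_values()
    values = DKISTClient.register_values()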
139 changes: 139 additions & 0 deletions .github/workflows/prepare-release.yml
@@ -0,0 +1,139 @@
name: Prepare release

permissions:
  contents: write

on:
  workflow_dispatch:
    inputs:
      version:
        description: "Release version (without leading 'v')"
        required: true
        type: string
      name:
        description: "Release name"
        required: false
        type: string

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: false

jobs:
  update_attrs:
    name: Update attr values json
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          lfs: true

      - name: Setup Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dkist
        run: |
          python -m pip install .

      - name: Update attrs
        run: |
          python -c "from dkist.net.attrs_values import attempt_local_update; attempt_local_update(user_file='./dkist/data/api_search_values.json', silence_errors=False)"

      - name: Configure Git
        run: |
          git config --global user.email "${{ github.actor }}@users.noreply.github.com"
          git config --global user.name "${{ github.actor }}"

      - name: Commit Attrs
        run: |
          git commit -m "Update attrs values before release"

      - name: Push
        run: git push

  render_changelog:
    name: Update changelog
    needs: [update_attrs]
    runs-on: ubuntu-latest
    outputs:
      markdown-changelog: ${{ steps.markdown-changelog.outputs.content }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          lfs: true

      - name: Setup Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install towncrier
        run: python -m pip install --upgrade towncrier

      - name: Run towncrier in draft to capture the output
        run: towncrier build --draft --version ${{ inputs.version }} --yes > release-changelog.rst

      - name: Debug changelog.rst
        run: cat release-changelog.rst

      - name: Convert to markdown with pandoc
        uses: docker://pandoc/core:2.9
        with:
          args: >-  # allows the argument string to span multiple lines
            --wrap=none
            -t markdown_strict
            --output=release-changelog.md
            release-changelog.rst

      - name: Capture Markdown Changelog
        id: markdown-changelog
        run: |
          {
            echo 'content<<EOF'
            cat release-changelog.md
            echo EOF
          } >> "$GITHUB_OUTPUT"

      - name: Debug md changelog
        run: |
          echo "${{ steps.markdown-changelog.outputs.content }}"

      - name: Run towncrier
        run: |
          towncrier build --version ${{ inputs.version }} --yes

      - name: Configure Git
        run: |
          git config --global user.email "${{ github.actor }}@users.noreply.github.com"
          git config --global user.name "${{ github.actor }}"

      - name: Commit Changelog
        run: |
          git commit -m "Render changelog for v${{ inputs.version }}"

      - name: Push
        run: git push

  make_release:
    name: Make Github Release
    runs-on: ubuntu-latest
    environment: release
    needs: [render_changelog]
    steps:
      - name: Create GitHub Release
        uses: actions/github-script@v7
        id: create-release
        with:
          script: |
            let release_name = (("${{ inputs.name }}") ? "v${{ inputs.version }} - ${{ inputs.name }}" : "v${{ inputs.version }}");
            return await github.rest.repos.createRelease({
              owner: context.repo.owner,
              repo: context.repo.repo,
              tag_name: "v${{ inputs.version }}",
              name: release_name,
              body: `${{ needs.render_changelog.outputs.markdown-changelog }}`
            });
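For readability, the `python -c` one-liner in the "Update attrs" step above is equivalent to this expanded form:

    from dkist.net.attrs_values import attempt_local_update

    attempt_local_update(
        user_file="./dkist/data/api_search_values.json",  # write into the package tree
        silence_errors=False,  # fail the workflow if the values can't be fetched
    )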
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
     - id: mixed-line-ending
       files: ".*.py"
     - id: end-of-file-fixer
-      exclude: ".*(.fits|.asdf)"
+      exclude: ".*(.fits|.asdf|.json)"
 - repo: https://github.com/pycqa/flake8
   rev: 7.0.0
   hooks:
1 change: 1 addition & 0 deletions changelog/311.feature.rst
@@ -0,0 +1 @@
Call the DKIST search API to automatically determine valid data search parameters and register those with the Fido client.
1 change: 1 addition & 0 deletions dkist/data/api_search_values.json
@@ -0,0 +1 @@
{"parameterValues":[{"parameterName":"createDateMin","values":{"minValue":"2022-12-08T19:07:55.038280","maxValue":"2024-01-23T03:21:27.034961"}},{"parameterName":"createDateMax","values":{"minValue":"2022-12-08T19:07:55.038280","maxValue":"2024-01-23T03:21:27.034961"}},{"parameterName":"endTimeMin","values":{"minValue":"2022-02-23T20:48:55.393500","maxValue":"2023-11-01T20:51:20.287000"}},{"parameterName":"endTimeMax","values":{"minValue":"2022-02-23T20:48:55.393500","maxValue":"2023-11-01T20:51:20.287000"}},{"parameterName":"exposureTimeMin","values":{"minValue":0.037,"maxValue":1380.2332394366197}},{"parameterName":"exposureTimeMax","values":{"minValue":0.037,"maxValue":1380.2332394366197}},{"parameterName":"instrumentNames","values":{"categoricalValues":["VBI","VISP"]}},{"parameterName":"qualityAverageFriedParameterMin","values":{"minValue":0.027724481746640606,"maxValue":2.6520787500175156e+30}},{"parameterName":"qualityAverageFriedParameterMax","values":{"minValue":0.027724481746640606,"maxValue":2.6520787500175156e+30}},{"parameterName":"qualityAveragePolarimetricAccuracyMin","values":{"minValue":0.7556396371714269,"maxValue":0.9845845208228297}},{"parameterName":"qualityAveragePolarimetricAccuracyMax","values":{"minValue":0.7556396371714269,"maxValue":0.9845845208228297}},{"parameterName":"startTimeMin","values":{"minValue":"2022-02-23T19:05:32.338002","maxValue":"2023-11-01T19:53:02.868500"}},{"parameterName":"startTimeMax","values":{"minValue":"2022-02-23T19:05:32.338002","maxValue":"2023-11-01T19:53:02.868500"}},{"parameterName":"targetTypes","values":{"categoricalValues":["quietsun","unknown","sunspot"]}},{"parameterName":"averageDatasetSpectralSamplingMin","values":{"minValue":0.000540156130946172,"maxValue":0.001631075310766238}},{"parameterName":"averageDatasetSpectralSamplingMax","values":{"minValue":0.000540156130946172,"maxValue":0.001631075310766238}},{"parameterName":"averageDatasetSpatialSamplingMin","values":{"minValue":0.0,"maxValue":12388.04306084}},{"parameterName":"averageDatasetSpatialSamplingMax","values":{"minValue":0.0,"maxValue":12388.04306084}},{"parameterName":"averageDatasetTemporalSamplingMin","values":{"minValue":9.139999999997528,"maxValue":5263.145059399399}},{"parameterName":"averageDatasetTemporalSamplingMax","values":{"minValue":9.139999999997528,"maxValue":5263.145059399399}},{"parameterName":"highLevelSoftwareVersion","values":{"categoricalValues":["Pono_2.1.0","Pono_1.0.0","Alakai_5-1","Pono_3.1.0","Alakai_3-0","Alakai_4-0","Alakai_11.1.0","Alakai_6-0","Alakai_8-0","Alakai_10-0","Alakai_7-0"]}},{"parameterName":"workflowName","values":{"categoricalValues":["l0_to_l1_vbi_summit-calibrated","l0_to_l1_visp"]}},{"parameterName":"workflowVersion","values":{"categoricalValues":["1.4.11","2.10.1","2.0.2","2.7.3","1.4.1","1.1.5","1.2.0","2.10.2","2.7.4","2.6.1","1.2.1","2.7.5","1.1.7","2.0.1","0.16.0","1.4.8","2.9.0","2.3.1","2.3.0","2.10.0","1.1.10","2.7.2","1.0.0","2.7.0"]}},{"parameterName":"headerDataUnitCreationDateMin","values":{"minValue":"2022-12-08T17:25:51.965000","maxValue":"2024-01-23T03:17:38.126000"}},{"parameterName":"headerDataUnitCreationDateMax","values":{"minValue":"2022-12-08T17:25:51.965000","maxValue":"2024-01-23T03:17:38.126000"}},{"parameterName":"headerVersion","values":{"categoricalValues":["3.6.0","4.0.0","3.3.0","3.0.0","3.4.0","3.9.0","3.5.0","3.7.1","3.8.1"]}}]}
10 changes: 7 additions & 3 deletions dkist/net/__init__.py
@@ -3,9 +3,6 @@
 """
 import dkist.config as _config

-from .client import DKISTClient
-from .helpers import transfer_complete_datasets
-
 __all__ = ["DKISTClient", "conf", "transfer_complete_datasets"]


@@ -27,5 +24,12 @@ class Conf(_config.ConfigNamespace):
     dataset_path = _config.ConfigItem("/{bucket}/{primaryProposalId}/{datasetId}",
                                       "The path template to a dataset on the main endpoint.")

+    attr_max_age = _config.ConfigItem(7,
+                                      "The number of days beyond which to refresh search attr values from the Data Center")
+

 conf = Conf()
+
+# Put imports after conf so that conf is initialized before import
+from .client import DKISTClient  # noqa
+from .helpers import transfer_complete_datasets  # noqa
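A quick illustration of the new configuration item (the default is 7 days; it is adjustable through the usual dkist config mechanism):

    import dkist.net

    # Maximum age, in days, of the cached attr values before a refresh is attempted
    print(dkist.net.conf.attr_max_age)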
176 changes: 176 additions & 0 deletions dkist/net/attrs_values.py
@@ -0,0 +1,176 @@
"Functions for working with the net submodule"
import json
import urllib
import datetime as dt
import importlib.resources
from pathlib import Path

import platformdirs

from sunpy.net import attrs as sattrs

import dkist.data
from dkist import log
from dkist.net import attrs as dattrs
from dkist.net import conf as net_conf

__all__ = ["attempt_local_update", "get_search_attrs_values"]

# Map keys in dataset inventory to Fido attrs
INVENTORY_ATTR_MAP = {
# Only categorical data are supported currently
"categorical": {
"instrumentNames": sattrs.Instrument,
"targetTypes": dattrs.TargetType,
"workflowName": dattrs.WorkflowName,
"workflowVersion": dattrs.WorkflowVersion,
"headerVersion": dattrs.HeaderVersion,
"highLevelSoftwareVersion": dattrs.SummitSoftwareVersion,
},
"range": {
"averageDatasetSpatialSampling": dattrs.SpatialSampling,
"averageDatasetSpectralSampling": dattrs.SpectralSampling,
"averageDatasetTemporalSampling": dattrs.TemporalSampling,
"exposureTime": dattrs.ExposureTime,
"qualityAverageFriedParameter": dattrs.FriedParameter,
}
}


def _get_file_age(path: Path) -> dt.timedelta:
    last_modified = dt.datetime.fromtimestamp(path.stat().st_mtime)
    now = dt.datetime.now()
    return now - last_modified


def _get_cached_json() -> tuple[Path, bool]:
    """
    Return the path to a local copy of the JSON file, and whether the file should be updated.

    If a user-local copy has been downloaded, that will always be used.
    """
    package_file = importlib.resources.files(dkist.data) / "api_search_values.json"
    user_file = platformdirs.user_data_path("dkist") / "api_search_values.json"

    return_file = package_file
    if user_file_exists := user_file.exists():
        return_file = user_file

    update_needed = False
    if not user_file_exists:
        update_needed = True
    if user_file_exists and _get_file_age(return_file) > dt.timedelta(days=net_conf.attr_max_age):
        update_needed = True

    return return_file, update_needed


def _fetch_values_to_file(filepath: Path, *, timeout: int = 1):
    data = urllib.request.urlopen(
        net_conf.dataset_endpoint + net_conf.dataset_search_values_path, timeout=timeout
    )
    with open(filepath, "wb") as f:
        f.write(data.read())


def attempt_local_update(*, timeout: int = 1, user_file: Path = None, silence_errors: bool = True) -> bool:
    """
    Attempt to update the local data copy of the values.

    Parameters
    ----------
    timeout
        The number of seconds to wait before timing out an update request.
        This is set low by default because this code is run at import of
        ``dkist.net``.
    user_file
        The file to save the updated attrs JSON to. If `None`, platformdirs
        will be used to get the user data path.
    silence_errors
        If `True`, catch all errors in this function.

    Returns
    -------
    success
        `True` if the update succeeded, `False` otherwise.
    """
    if user_file is None:
        user_file = platformdirs.user_data_path("dkist") / "api_search_values.json"
    user_file = Path(user_file)
    user_file.parent.mkdir(exist_ok=True, parents=True)

    log.info(f"Fetching updated search values for the DKIST client to {user_file}")

    success = False
    try:
        _fetch_values_to_file(user_file, timeout=timeout)
        success = True
    except Exception as err:
        log.error("Failed to download new attrs values.")
        log.debug(str(err))
        # If an error has occurred then remove the local file so it isn't
        # corrupted or invalid.
        user_file.unlink(missing_ok=True)
        if not silence_errors:
            raise

        return success

    # Test that the file we just saved can be parsed as JSON
    try:
        with open(user_file, "r") as f:
            json.load(f)
    except Exception:
        log.error("Downloaded file is not valid JSON.")
        user_file.unlink(missing_ok=True)
        if not silence_errors:
            raise
        success = False

    return success
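A usage sketch of the default, error-silencing behaviour described in the docstring:

    from dkist.net.attrs_values import attempt_local_update

    # With silence_errors=True (the default) failures are reported via the
    # return value, and any previously cached values remain in use
    ok = attempt_local_update()
    if not ok:
        print("Update failed; falling back to the cached values.")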


def get_search_attrs_values(*, allow_update: bool = True, timeout: int = 1) -> dict:
    """
    Return the search values, updating if needed.

    Parameters
    ----------
    allow_update
        Allow fetching updated values from the DKIST data center if they
        haven't been updated in the configured amount of time (7 days by
        default).
    timeout
        The number of seconds to wait before timing out an update request.
        This is set low by default because this code is run at import of
        ``dkist.net``.

    Returns
    -------
    attr_values
        A transformed version of the attr values loaded from the DKIST
        data center.
    """
    local_path, update_needed = _get_cached_json()
    if allow_update and update_needed:
        attempt_local_update(timeout=timeout)

    if not update_needed:
        log.debug("No update to attr values needed.")
    log.debug("Using attr values from %s", local_path)

    with open(local_path, "r") as f:
        search_values = json.load(f)

    search_values = {param["parameterName"]: param["values"] for param in search_values["parameterValues"]}

    attr_values = {}
    for key, attr in INVENTORY_ATTR_MAP["categorical"].items():
        attr_values[attr] = [(name, "") for name in search_values[key]["categoricalValues"]]

    for key, attr in INVENTORY_ATTR_MAP["range"].items():
        attr_values[attr] = [("all", f"Value between {search_values[key + 'Min']['minValue']:.5f} and {search_values[key + 'Max']['maxValue']:.5f}")]

    # Time - the Time attr allows times in the full range, but start and end times are given separately by the DKIST API
    attr_values[sattrs.Time] = [("time", f"Min: {search_values['startTimeMin']['minValue']} - Max: {search_values['endTimeMax']['maxValue']}.")]

    return attr_values
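A sketch of inspecting the transformed mapping, assuming the shipped JSON (so no network access is needed):

    from dkist.net.attrs_values import get_search_attrs_values

    values = get_search_attrs_values(allow_update=False)
    for attr_type, pairs in values.items():
        print(attr_type.__name__, pairs[:2])
    # e.g. Instrument [('VBI', ''), ('VISP', '')]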
14 changes: 5 additions & 9 deletions dkist/net/client.py
@@ -20,6 +20,7 @@
                               QueryResponseTable, convert_row_to_table)
 from sunpy.util.net import parse_header

+from dkist.net.attrs_values import get_search_attrs_values
 from dkist.utils.inventory import INVENTORY_KEY_MAP

 from . import attrs as dattrs
@@ -266,23 +267,18 @@ def register_values(cls):
         """
         Known search values for DKIST data, currently manually specified.
         """
-        return {
+        return_values = {
             sattrs.Provider: [("DKIST", "Data provided by the DKIST Data Center")],
             # instrumentNames
             sattrs.Instrument: [("VBI", "Visible Broadband Imager"),
                                 ("VISP", "Visible Spectro-Polarimeter"),
                                 ("VTF", "Visible Tunable Filter"),
                                 ("Cryo-NIRSP", "Cryogenic Near Infrared SpectroPolarimiter"),
                                 ("DL-NIRSP", "Diffraction-Limited Near-InfraRed Spectro-Polarimeter")],
-
             # hasAllStokes
             sattrs.Physobs: [("stokes_parameters", "Stokes I, Q, U and V are provided in the dataset"),
                              ("intensity", "Only Stokes I is provided in the dataset.")],
             # isEmbargoed
             dattrs.Embargoed: [("True", "Data is subject to access restrictions."),
                                ("False", "Data is not subject to access restrictions.")],
-            # targetTypes
-            #dattrs.TargetType: [], # This should be a controlled list.
-
             # Completeness
             sattrs.Level: [("1", "DKIST data calibrated to level 1.")],
         }
+
+        return {**return_values, **get_search_attrs_values()}
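With these values registered, the standard sunpy attr machinery exposes them; a usage sketch, with the instrument and target values taken from the JSON above:

    from sunpy.net import Fido, attrs as a

    import dkist.net  # registers the DKIST client and its attr values

    results = Fido.search(a.Instrument("VBI") & a.dkist.TargetType("sunspot"))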