Allow typed metadata on upload (#3566)
* Allow typed metadata on upload

PBENCH-1283

Normally metadata keys are set through `PUT /datasets/<id>/metadata` using an
`application/json` request body, allowing the full expressiveness of JSON,
including sub-objects, integers, floating point numbers, and boolean
literals.

I added the `?metadata` query parameter on `PUT /upload/<name>` to allow a
client to specify metadata during the upload instead of waiting; however the
nature of the HTTP interface constrains these to text strings, which can be
limiting.

This PR generalizes the typed "metadata expression" syntax developed to
support generalized metadata `?filter` expressions on `GET /datasets` to
allow both quoting value strings and typing them. For example, one could now
set a JSON value using

```
?metadata=global.legacy.pbench:'{"host":"agent","migrate":true}':json
```

We support `str`, `int`, `float`, `bool`, and `json` types; e.g.,
`?metadata=global.tester.run_id:100:int,global.tester.good:true:bool`
dbutenhof authored Oct 27, 2023
1 parent db449be commit 805ca7c
Showing 11 changed files with 465 additions and 209 deletions.
2 changes: 1 addition & 1 deletion contrib/server/operations/pbench-upload-results.py
@@ -56,7 +56,7 @@ def upload(
uploaded = datetime.datetime.fromtimestamp(
tarball.stat().st_mtime, tz=datetime.timezone.utc
)
-meta = [f"global.server.legacy.migrated:{uploaded:%Y-%m-%dT%H:%M}"]
+meta = [f"global.server.legacy.migrated:'{uploaded:%Y-%m-%dT%H:%M}'"]
if "::" in tarball.parent.name:
satellite, _ = tarball.parent.name.split("::", 1)
meta.append(f"server.origin:{satellite}")
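
The one-line change in this hunk quotes the timestamp because its `%H:%M` part contains a `:`, the same character that separates key, value, and type in a metadata expression. The following is only an illustrative sketch: the fixed date is made up, and the naive `split(":")` stands in for whatever parsing the server actually does.

```python
import datetime

# Hypothetical fixed timestamp standing in for the tarball's mtime.
uploaded = datetime.datetime(2023, 10, 27, 15, 43, tzinfo=datetime.timezone.utc)

# The old and new forms of the expression from the diff above.
unquoted = f"global.server.legacy.migrated:{uploaded:%Y-%m-%dT%H:%M}"
quoted = f"global.server.legacy.migrated:'{uploaded:%Y-%m-%dT%H:%M}'"

# A naive split on ":" (an assumption, not the server's parser) shows the
# ambiguity: the minutes land where a type name would be expected.
fields = unquoted.split(":")
print(fields)
print(quoted)
```

With the value quoted, the embedded `:` is unambiguously part of the value rather than a separator.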
124 changes: 124 additions & 0 deletions docs/Server/API/V1/metadata.md
@@ -0,0 +1,124 @@
# `GET|PUT /api/v1/datasets/<dataset>/metadata`

This API sets or retrieves metadata for the identified dataset. For `GET`, list
the metadata keys to retrieve with the `?metadata` query parameter; for `PUT`,
supply the keys and values to set in an `application/json` request body.

## URI parameters

`<dataset>` string \
The resource ID of a dataset on the Pbench Server.

## Query parameters

`metadata` (`GET` only) \
A list of metadata keys to retrieve. For example, `?metadata=dataset,global,server,user`
will retrieve all metadata values, each namespace as a nested JSON object. (This can
be a lot of data, and is generally not recommended.)

The metadata query string `?metadata=dataset.name,dataset.access,server` will return
an `application/json` response something like this:

```json
{
"dataset.access": "public",
"dataset.name": "uperf__2023.08.21T15.09.46",
"server": {
"benchmark": "uperf",
"deletion": "2025-08-21",
"tarball-path": "<internal path>"
}
}
```
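
As a rough client-side illustration, the query string above could be assembled as follows. This is a minimal sketch: the server address and dataset ID are placeholders, and `metadata_query_url` is a hypothetical helper, not part of any Pbench client.

```python
from urllib.parse import urlencode


def metadata_query_url(server: str, dataset_id: str, keys: list[str]) -> str:
    """Build a GET URL for the metadata API (hypothetical helper)."""
    # Keep "," unescaped so the key list reads as in the documentation.
    query = urlencode({"metadata": ",".join(keys)}, safe=",")
    return f"{server}/api/v1/datasets/{dataset_id}/metadata?{query}"


# Placeholder server and resource ID, for illustration only.
url = metadata_query_url(
    "https://pbench.example.com",
    "abc123",
    ["dataset.name", "dataset.access", "server"],
)
print(url)
```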
## Request body

For `PUT`, specify the keys and values in an `application/json` request body
under the `"metadata"` field, like this:

```json
{
"metadata": {
"dataset.name": "I shall call you squishie",
"server.deletion": "2024-12-13",
"global.pbench": {
"tag": "ABC",
"version": 1.0
}
}
}
```
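
A client might prepare such a request along these lines. This sketch only constructs the request object and does not send it; the server address, dataset ID, and token are placeholders, and `build_metadata_put` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_metadata_put(server: str, dataset_id: str, token: str, metadata: dict) -> Request:
    """Prepare (but do not send) a PUT request for the metadata API."""
    # The keys and values go under the "metadata" field of the JSON body.
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return Request(
        f"{server}/api/v1/datasets/{dataset_id}/metadata",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values, for illustration only.
req = build_metadata_put(
    "https://pbench.example.com",
    "abc123",
    "TOKEN",
    {
        "dataset.name": "I shall call you squishie",
        "global.pbench": {"tag": "ABC", "version": 1.0},
    },
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would issue the call; that step is omitted here because it needs a live server.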

## Request headers

`authorization: bearer` token \
*Bearer* schema authorization is required to update a dataset.
E.g., `authorization: bearer <token>`

## Response headers

`content-type: application/json` \
The return is a serialized JSON object with the retrieved metadata key and
value pairs.

## Resource access

* `GET` requires `READ` access to the `<dataset>` resource, while `PUT` requires
`UPDATE` access to the `<dataset>` resource.

See [Access model](../access_model.md)

## Response status

`200` **OK** \
Successful request.

`401` **UNAUTHORIZED** \
The client is not authenticated for a `PUT` call.

`403` **FORBIDDEN** \
The authenticated client does not have `READ` access (for `GET`) or `UPDATE`
access (for `PUT`) to the specified dataset.

`404` **NOT FOUND** \
The `<dataset>` resource ID does not exist.

`503` **SERVICE UNAVAILABLE** \
The server has been disabled using the `server-state` server configuration
setting in the [server configuration](./server_config.md) API. The response
body is an `application/json` document describing the current server state,
a message, and optional JSON data provided by the system administrator.

## Response body

The `application/json` response shows the referenced metadata key values.

For `GET`, these are the keys you specified with the `?metadata`
query parameter.

For `PUT`, the actual metadata values you set are returned, along with a
possible map of errors. In general these are exactly what you set, but
some, like `server.archiveonly` and `server.deletion`, may be normalized
during validation. For example, consider this request:

```
PUT /api/v1/datasets/<resource_id>/metadata
{
"metadata": {
"server.archiveonly": "true",
"server.deletion": "2023-12-25T15:43"
}
}
```

The response might be:

```json
{
"errors": {},
"metadata": {
"server.archiveonly": true,
"server.deletion": "2023-12-26"
}
}
```
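
Because values can be normalized, a client may want to compare what it sent with what the server reports. The sketch below does that for the example above; `normalized_values` is a hypothetical helper, and the response text is copied from the example rather than fetched from a server.

```python
import json


def normalized_values(requested: dict, response_text: str) -> dict:
    """Map each key whose returned value differs to a (sent, applied) pair."""
    response = json.loads(response_text)
    applied = response.get("metadata", {})
    return {
        key: (sent, applied[key])
        for key, sent in requested.items()
        if key in applied and applied[key] != sent
    }


requested = {
    "server.archiveonly": "true",
    "server.deletion": "2023-12-25T15:43",
}
response_text = (
    '{"errors": {}, "metadata": {"server.archiveonly": true, '
    '"server.deletion": "2023-12-26"}}'
)
changes = normalized_values(requested, response_text)
print(changes)
```

Here both keys are reported as changed: the string `"true"` came back as a JSON boolean, and the timestamp came back as a date.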
21 changes: 21 additions & 0 deletions docs/Server/API/V1/upload.md
@@ -37,6 +37,27 @@ In particular the client can set any of:

For example, `?metadata=server.archiveonly:true,global.project:oidc`

__Typed metadata__

When you set metadata dynamically using `PUT /datasets/<id>/metadata`, you
specify a JSON value for each key, so defining typed metadata is implicit in
the JSON interpretation.

When using the `?metadata` query parameter, you're limited to writing strings,
so typing is not quite so straightforward. To compensate, the string can carry
type information. Each `?metadata` string is a comma-separated list of
"metadata expressions" of the form `<key>:<value>[:<type>]`. If the `:<type>`
is omitted, the type is assumed to be `str`. For example, you can specify
integer metadata values using `:int` (`global.mine.count:1:int`).

| Type | Description |
| ---- | ----------- |
| `str` | (default) The value is a string. If you want to include "," or ":" characters, you can quote the value using matched (and potentially nested) single and double quote characters. For example `<key>:'2023-10-01:10:23':str` will set the specified key to the value `2023-10-01:10:23`. |
| `bool` | The value is a (JSON format) boolean, `true` or `false`. For example `<key>:true:bool` will set the specified key to the boolean value `true`. |
| `int` | The value is an integer. For example `<key>:1:int` will set the specified key to the integer value 1. |
| `float` | The value is a floating point number. For example `<key>:1.0:float` will set the specified key to the floating point value 1.0. |
| `json` | The value is a quoted serialized JSON object representation. For example, `<key>:'{"str": "string", "int": 1, "bool": true}':json` will set the specified key to the JSON object `{"str": "string", "int": 1, "bool": true}`. |
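
One way a client could generate these expressions from Python values is sketched below. This is an illustration under stated assumptions, not the server's or any client's implementation: `metadata_expr` and `metadata_param` are hypothetical helpers, the type is inferred from the Python type, values are always wrapped in single quotes when quoting is needed, and values that themselves contain a single quote are rejected rather than handled with nested quoting.

```python
import json
from urllib.parse import quote


def metadata_expr(key: str, value) -> str:
    """Render one "<key>:<value>[:<type>]" expression (hypothetical helper)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return f"{key}:{str(value).lower()}:bool"
    if isinstance(value, int):
        return f"{key}:{value}:int"
    if isinstance(value, float):
        return f"{key}:{value}:float"
    is_json = isinstance(value, dict)
    text = json.dumps(value, separators=(",", ":")) if is_json else str(value)
    if "'" in text:
        # Nested quoting is not attempted in this sketch.
        raise ValueError("values containing a single quote are not handled")
    if is_json:
        return f"{key}:'{text}':json"
    if "," in text or ":" in text:
        text = f"'{text}'"  # quote so separators are read as part of the value
    return f"{key}:{text}"  # type omitted, so it defaults to str


def metadata_param(values: dict) -> str:
    """Join expressions into a URL-encoded "metadata=..." query parameter."""
    exprs = ",".join(metadata_expr(k, v) for k, v in values.items())
    return "metadata=" + quote(exprs, safe=",:")


query = metadata_param(
    {
        "global.tester.run_id": 100,
        "global.tester.good": True,
        "global.legacy.pbench": {"host": "agent", "migrate": True},
    }
)
print(query)
```

Percent-encoding the quotes and braces keeps the expression intact in transit; the server sees the decoded form shown in the table.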

## Request headers

`authorization: bearer` token \