Refactor pg-backup-api code so it is easier to introduce new operations #83

barthisrael · 2023-08-15T12:44:32Z

This PR is an attempt to make it easier to extend the operations in the pg-backup-api.

Take note that this PR breaks compatibility because of these changes:

- /servers/server_name/operations endpoint used to return a list of operation IDs. From now on it will return a list of dictionaries, each containing two keys: id (operation ID) and type (operation type);
- /servers/server_name/operations/operation_id endpoint will return operation_id and status instead of recovery_id and status;
- create-operation is removed from the argument parser of server_operation.py. That command was already broken before this PR, and it is not really exposed to users: you can only reach that argument parser by invoking the module directly with python /path/to/server_operation.py, which is not something pg-backup-api does in normal use.
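
To make the first two breaking changes concrete, here is a rough sketch of the response shapes before and after this PR, together with a client-side helper that accepts either list format. The `operations`, `id`, `type`, `recovery_id`, `operation_id` and `status` keys come from the description above; the operation IDs, the `"recovery"` type value, the `"DONE"` status string and the `normalize_operations` helper are illustrative assumptions, not taken from the code base.

```python
# Illustrative payloads only: the IDs, the "recovery" type value and the
# "DONE" status string are made-up placeholders.
OLD_LIST_RESPONSE = {"operations": ["20230815T124432", "20230816T090000"]}
NEW_LIST_RESPONSE = {
    "operations": [
        {"id": "20230815T124432", "type": "recovery"},
        {"id": "20230816T090000", "type": "recovery"},
    ]
}

OLD_STATUS_RESPONSE = {"recovery_id": "20230815T124432", "status": "DONE"}
NEW_STATUS_RESPONSE = {"operation_id": "20230815T124432", "status": "DONE"}


def normalize_operations(payload):
    """Hypothetical client helper: return a list of {"id", "type"} dicts.

    Accepts both the old format (list of IDs) and the new one (list of
    dictionaries). The old format carries no type, so it is set to None.
    """
    normalized = []
    for item in payload["operations"]:
        if isinstance(item, dict):
            normalized.append({"id": item["id"], "type": item.get("type")})
        else:
            normalized.append({"id": item, "type": None})
    return normalized
```

A client that only reads `status` from the second endpoint is unaffected by the key rename, while a client that iterates over `operations` expecting plain IDs needs something like the helper above.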

You can find below the full list of refactoring changes that have been performed:

Changes to server_operation module:
- Renamed Metadata class as OperationServer, and extended its functionalities so it is able to handle the management of operations for a given Barman Server, like:
  - Create required directories for storing job and output files;
  - Create the required files for job and output, respecting the expected minimum set of keys;
  - Read the contents of job and output files;
  - Get status of operations;
  - Get list of operations in this server;
- Create a superclass named Operation, which has logic for dealing with a single operation and "talking" with the OperationServer. Each subclass of this class defines an operation exposed by the pg-backup-api, and the logic to actually run the operation. The Operation class hides all the details from the subclass on how to deal with the OperationServer;
- Create the class RecoveryOperation, which is a subclass of Operation, and defines the logic for running a recovery operation. It takes care of dealing with validation of required arguments instead of relying on functions spread over the pg-backup-api code. It also takes care of building and running the barman recover command, so the other modules can simply call run instead of doing the job which should be performed by this class. This class was created based on logic found spread over modules of pg-backup-api, and based on ServerOperation class, which is now extinct;
- Define a set of custom exceptions to be raised instead of raising a general Exception or relying on implicit exception raising of things like KeyError:
  - OperationServerConfigError: error in the configuration of OperationServer;
  - MalformedContent: content of job or output file is not as expected;
  - OperationNotExists: trying to query information about a non-existing operation;
- Removed the command-line option create-operation, which was broken before this PR;
Changes to run module:
- Changed run module so it no longer does work that should be internal to the new RecoveryOperation class, like building the list of arguments for barman recover and running the command itself. The recovery logic should be a black box for the run module;
Changes to utils module:
- Changed utils module so it doesn't define the supported arguments for a recovery operation. Again, that job should be performed by the class implementing a recovery operation (RecoveryOperation);
Changes to the REST API:
- Changed /servers/server_name/operations/operation_id endpoint so it returns operation_id and status instead of recovery_id and status. If we are to support more operations, this should be generic and not tied to recovery operations;
- Changed /servers/server_name/operations endpoint so it is able to handle different JSON bodies depending on the operation that is being performed. Also, it now relies on an explicit MalformedContent being triggered by the object function instead of relying on an implicit KeyError to detect missing required arguments;
- Changed /servers/server_name/operations endpoint so it returns a list of dictionaries, containing the operation ID and type, instead of a simple list of operation IDs;
  - type: used to filter operation types being returned by the request. If omitted, return operations of any type, i.e., no filter is applied.

References: BAR-94.
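
As a rough illustration of the class layout described above, the following is a minimal sketch and not the actual pg-backup-api code. The class and exception names and the `run`, `read_job_file`, `get_operations_list` and `_run_logic` method names appear in this PR; everything else is an assumption made for the example: the constructor signatures, the `jobs` directory layout, the `operation_type` key, the `write_job_file` methods, the `TYPE` and `REQUIRED_ARGUMENTS` attributes, the required argument names, and the `demo` function. The sketch also only builds the `barman recover` command instead of running it.

```python
import json
import os
import tempfile


class OperationServerConfigError(ValueError):
    """Error in the configuration of OperationServer."""


class MalformedContent(ValueError):
    """Content of a job or output file is not as expected."""


class OperationNotExists(LookupError):
    """The requested operation does not exist."""


class OperationServer:
    """Manages the job files of all operations of one Barman server."""

    def __init__(self, name, base_dir):
        # Assumption: the real class derives its paths from the Barman
        # configuration; this sketch takes a base directory directly.
        if not os.path.isdir(base_dir):
            raise OperationServerConfigError(f"not a directory: {base_dir}")
        self.name = name
        self.jobs_dir = os.path.join(base_dir, "jobs")
        os.makedirs(self.jobs_dir, exist_ok=True)

    def _job_path(self, op_id):
        return os.path.join(self.jobs_dir, f"{op_id}.json")

    def write_job_file(self, op_id, content):
        if "operation_type" not in content:
            raise MalformedContent("missing required key: operation_type")
        with open(self._job_path(op_id), "w") as job_file:
            json.dump(content, job_file)

    def read_job_file(self, op_id):
        if not os.path.exists(self._job_path(op_id)):
            raise OperationNotExists(op_id)
        with open(self._job_path(op_id)) as job_file:
            return json.load(job_file)

    def get_operations_list(self, op_type=None):
        operations = []
        for file_name in sorted(os.listdir(self.jobs_dir)):
            op_id = os.path.splitext(file_name)[0]
            job_type = self.read_job_file(op_id)["operation_type"]
            if op_type is None or job_type == op_type:
                operations.append({"id": op_id, "type": job_type})
        return operations


class Operation:
    """Base class: one operation, hiding the OperationServer details."""

    TYPE = None
    REQUIRED_ARGUMENTS = ()

    def __init__(self, server, op_id):
        self.server = server
        self.id = op_id

    def write_job_file(self, content):
        missing = [key for key in self.REQUIRED_ARGUMENTS if key not in content]
        if missing:
            raise MalformedContent(f"missing required arguments: {missing}")
        job = dict(content, operation_type=self.TYPE)
        self.server.write_job_file(self.id, job)

    def run(self):
        return self._run_logic(self.server.read_job_file(self.id))

    def _run_logic(self, job):
        raise NotImplementedError


class RecoveryOperation(Operation):
    TYPE = "recovery"
    REQUIRED_ARGUMENTS = ("backup_id", "destination_directory")

    def _run_logic(self, job):
        # The sketch only builds the command; the real class also runs it.
        return ["barman", "recover", self.server.name,
                job["backup_id"], job["destination_directory"]]


def demo():
    """Register one recovery job, then list the operations and 'run' it."""
    with tempfile.TemporaryDirectory() as base_dir:
        server = OperationServer("main", base_dir)
        operation = RecoveryOperation(server, "op1")
        operation.write_job_file(
            {"backup_id": "latest", "destination_directory": "/tmp/restore"}
        )
        return server.get_operations_list(), operation.run()
```

With this split, adding a new operation would mostly mean writing another `Operation` subclass with its own required arguments and `_run_logic`, without touching the file handling in `OperationServer`.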

This commit is an attempt to make it easier to extend the operations in the pg-backup-api. The following changes have been performed:

* Changes to `server_operation` module:
  * Renamed `Metadata` class as `OperationServer`, and extended its functionalities so it is able to handle the management of operations for a given Barman Server, like:
    * Create required directories for storing job and output files;
    * Create the required files for job and output, respecting the expected minimum set of keys;
    * Read the contents of job and output files;
    * Get status of operations;
    * Get list of operations in this server;
  * Create a superclass named `Operation`, which has logic for dealing with a single operation and "talking" with the `OperationServer`. Each subclass of this class defines an operation exposed by the pg-backup-api, and the logic to actually run the operation. The `Operation` class hides all the details from the subclass on how to deal with the `OperationServer`;
  * Create the class `RecoveryOperation`, which is a subclass of `Operation`, and defines the logic for running a recovery operation. It takes care of dealing with validation of required arguments instead of relying on functions spread over the pg-backup-api code. It also takes care of building and running the `barman recover` command, so the other modules can simply call `run` instead of doing the job which should be performed by this class. This class was created based on logic found spread over modules of `pg-backup-api`, and based on `ServerOperation` class, which is now extinct;
  * Define a set of custom exceptions to be raised instead of raising a general `Exception` or relying on implicit exception raising of things like `KeyError`:
    * `OperationServerConfigError`: error in the configuration of `OperationServer`;
    * `MalformedContent`: content of job or output file is not as expected;
    * `OperationNotExists`: trying to query information about a non-existing operation;
    * `OperationAlreadyRun`: triggered if trying to run the same job twice or more;
  * Removed the command-line option `create-operation`, which was broken before this commit;
* Changes to `run` module:
  * Changed `run` module so it no longer does work that should be internal to the new `RecoveryOperation` class, like building the list of arguments for `barman recover` and running the command itself. The recovery logic should be a black box for the `run` module;
* Changes to `utils` module:
  * Changed `utils` module so it doesn't define the supported arguments for a recovery operation. Again, that job should be performed by the class implementing a recovery operation (`RecoveryOperation`);
* Changes to the REST API:
  * Changed `/servers/server_name/operations/operation_id` endpoint so it returns `operation_id` and `status` instead of `recovery_id` and `status`. If we are to support more operations, this should be generic and not tied to recovery operations;
  * Changed `/servers/server_name/operations` endpoint so it is able to handle different JSON bodies depending on the operation that is being performed. Also, it now relies on an explicit `MalformedContent` being triggered by the object function instead of relying on an implicit `KeyError` to detect missing required arguments;
  * Changed `/servers/server_name/operations` endpoint so it returns not only a list of operation IDs, but also their type. Each item in the list is now a dictionary with two keys (`type` and `id`) instead of a simple string containing the operation ID;

References: BAR-94.

Signed-off-by: Israel Barth Rubio <[email protected]>

Fix the following bugs:

* Not handling the case when `op_type` is `None` in `get_operations_list`
* Use `read_job_file` instead of `_read_file` in `get_operations_list`
* `id` argument of `Operation` could not be `None`
* `datetime` was being wrongly referenced
* `_run_subprocess` was not a static method
* `_run_subprocess` was returning the `stdout.decode` function instead of its output

Besides that, made a couple of changes:

* Added a note that `run` should only be called once, instead of trying to automatically check that a job was already executed;
* Removed the check for "invalid arguments" in `_validate_job_content`. If one passes invalid arguments they will not be considered by `_run_logic`.

References: BAR-94.

Signed-off-by: Israel Barth Rubio <[email protected]>
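
The last two `_run_subprocess` bugs in that list are easy to reintroduce, so here is a small sketch of the corrected shape. It is an illustration under assumptions: the real method's arguments and return value are not shown in this PR, so the `cmd` parameter and the `(output, return code)` tuple are hypothetical, and the command being run is a harmless Python one-liner rather than `barman recover`.

```python
import subprocess
import sys


class Operation:
    @staticmethod
    def _run_subprocess(cmd):
        """Run *cmd* and return a tuple of (decoded output, return code)."""
        process = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        stdout, _ = process.communicate()
        # The bug was equivalent to returning `stdout.decode` (no call),
        # which hands back the bound method instead of the decoded text.
        return stdout.decode(), process.returncode


# Being a static method, it can be called without an Operation instance.
output, return_code = Operation._run_subprocess(
    [sys.executable, "-c", "print('recovered')"]
)
```

Without the parentheses the caller receives a callable, and the mistake typically only surfaces later, for example when the value is serialized into the output file.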

gonzalemario · 2023-08-15T15:59:24Z

Could you please squash all in one commit? Or if you want to split, I'd have at most 2.

- Refactor pg-backup-api code so it is easier to introduce new operations: f96cba5 and 056179c
- Add unit tests for Operation and RecoveryOperation: all the remaining commits

barthisrael · 2023-08-15T20:23:22Z

> Could you please squash all in one commit? Or if you want to split, I'd have at most 2.
>
> - Refactor pg-backup-api code so it is easier to introduce new operations: f96cba5 and 056179c
> - Add unit tests for Operation and RecoveryOperation: all the remaining commits

Do you want to squash them before merging?
An alternative would be to use the "squash and merge" option later. We would keep the history in the PR, but squash them into a single commit when merging into main.
What do you think about that?
The result would be the same, it is just that the squashing would be done at merge time instead of now.

mikewallace1979 · 2023-08-23T16:08:29Z

pg_backup_api/pg_backup_api/logic/utility_controller.py

-        response = {"recovery_id": operation_id, "status": status}
+        op_server = OperationServer(server_name)
+        status = op_server.get_operation_status(operation_id)
+        response = {"operation_id": operation_id, "status": status}


@gonzalemario Does this change actually affect repmgr? As far as I can tell repmgr only uses the POST response to this endpoint which is already returning an operation_id.

No, it doesn't. I thought the same when I first read it. That specific code does change the response repmgr receives, but when we parse it in the callback[1], we only look for the status field, not for recovery_id (which is now called operation_id). So we almost broke repmgr's standby creation through the API, but because the client ignores other fields, we're safe.

pg-backup-api has 2 endpoints that receive POST data. This one checks whether the previous recovery operation has completed. The other endpoint creates the recovery task, but it is not used by the servers_operation_id_get() method.

[1] size_t receive_operation_status(void *content, size_t size, size_t nmemb, char *buffer);

Right!

I guess the only thing that may break with this PR is size_t receive_operations_cb(void *content, size_t size, size_t nmemb, char *buffer), because we are now returning a list of dictionaries instead of a list of integers.

That function is not used at the moment anywhere in the repmgr code, but if one ever attempts to use it after this PR is merged, repmgr will likely face some problem.

mikewallace1979 · 2023-08-23T19:36:13Z

pg_backup_api/pg_backup_api/logic/utility_controller.py


    :return: the returned response varies:

        * If a successful ``GET`` request, then return a JSON response with
-          ``operations`` key containing a list of IDs of recovery operations
-          for Barman server *server_name*;
+          ``operations`` key containing a list of operations for Barman server


@gonzalemario This is the change which will break repmgr isn't it? This code here is currently expecting a list of IDs and it's now going to get a list of dicts.

Right! That's what I meant here.
At least I expect it to break 😆

It would break repmgr if it did use it. That code is part of receive_operations_cb, which is the callback invoked when get_operations_on_server is triggered. The purpose of that function is to check the different tasks the server has received for a specific node. It was added and included in v5.4.0 because I knew it was going to be useful. Let me explain:

In repmgr-action-standby.c we added run_pg_backupapi to the pool of functions for creating a Postgres standby. That function only uses 1) the creation of a new recovery and 2) the check of that previous task. That's why we're not breaking repmgr's current standby creation mode, but it will break things if we expand features in the future.

I also agree on having a 2.0 version, and that separating the general refactoring from the API changes would be great.

Ok great - so:

We should patch receive_operations_cb in repmgr so that it is not left with code which will not work against pg-backup-api.

We do not actually need to make a new repmgr release in order to be compatible with pg-backup-api 2.0.0 - the existing repmgr will work just fine with 1.1.1 and the proposed 2.0.0.

In that case I am -1 on the suggestion I made yesterday to make the change to the output format of the endpoint optional - we should just update the format rather than add additional complexity, given it is not needed in order to maintain repmgr compatibility.

mikewallace1979

This is a good improvement and I am a big fan of the tests.

The fact that it changes the API is going to make releasing it tricky though - we would need to version it 2.0.0 and any existing consumers of the operations API (currently repmgr) will need to be updated.

Is it possible to separate the general refactoring from the API changes?

Or alternatively could we make the API change which really would break repmgr (which is the change from returning a list of IDs for GETs to /server/<name>/operations) an opt-in change? So the default behaviour would be to return the list of IDs but if you include a verbose=true parameter in your GET request then you get the list of dicts?

barthisrael · 2023-08-23T19:49:24Z

> This is a good improvement and I am a big fan of the tests.

Great, glad you liked it :)

> The fact that it changes the API is going to make releasing it tricky though - we would need to version it 2.0.0 and any existing consumers of the operations API (currently repmgr) will need to be updated.

Indeed! That's one of my concerns too!

> Is it possible to separate the general refactoring from the API changes?
>
> Or alternatively could we make the API change which really would break repmgr (which is the change from returning a list of IDs for GETs to /server/<name>/operations) an opt-in change? So the default behaviour would be to return the list of IDs but if you include a verbose=true parameter in your GET request then you get the list of dicts?

Sure, I'll take a look at making that optional then.

…ons`

This commit introduces a couple of query params to `/servers/<server_name>/operations` `GET` requests:

* `verbose`: if `true`, return a list of dictionaries, containing the operation ID and type. If `false` or omitted, keep the original behavior, i.e., return a list of operation IDs. This is done so the changes are backward compatible;
* `type`: used to filter operation types being returned by the request. If omitted, return operations of any type, i.e., no filter is applied.

References: BAR-94.

Signed-off-by: Israel Barth Rubio <[email protected]>
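
The behaviour of the two query params described in this commit message can be sketched as a pair of pure functions. This is only an illustration of the described semantics, not the code from the commit: `format_operations` and `parse_verbose` are hypothetical names, and the sketch deliberately leaves out the web framework wiring.

```python
def parse_verbose(raw_value):
    """Interpret the raw query-string value; omitted (None) means false."""
    return raw_value is not None and raw_value.lower() == "true"


def format_operations(operations, verbose=False, op_type=None):
    """Hypothetical helper mirroring the behaviour described above.

    *operations* is a list of {"id": ..., "type": ...} dictionaries.
    """
    # `type` filter: when omitted, operations of any type are returned.
    if op_type is not None:
        operations = [op for op in operations if op["type"] == op_type]
    # `verbose`: dictionaries when true, plain IDs (old format) otherwise.
    if verbose:
        return [{"id": op["id"], "type": op["type"]} for op in operations]
    return [op["id"] for op in operations]
```

For example, with two stored operations of different types, a request without params yields both IDs, while `type` narrows the list and `verbose=true` switches each item to a dictionary.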

barthisrael · 2023-08-23T20:39:55Z

@mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly.

mikewallace1979 · 2023-08-24T12:22:49Z

> @mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly.

The patch looks fine but I'm not convinced we need it, in which case I prefer the simplicity of just having the one format - so I think we can revert 5e123a5 and merge as is.

Sorry for wasting your time there - I didn't quite understand the repmgr impact was inconsequential.

…/operations`" This reverts commit 5e123a5.

barthisrael · 2023-08-24T13:06:13Z

> > @mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly.
>
> The patch looks fine but I'm not convinced we need it, in which case I prefer the simplicity of just having the one format - so I think we can revert 5e123a5 and merge as is.
>
> Sorry for wasting your time there - I didn't quite understand the repmgr impact was inconsequential.

Not a problem!

Just reverted the commit, and I'm updating the PR description too.

barthisrael · 2023-08-24T13:10:09Z

@mikewallace1979 @gonzalemario so IIUC there is nothing pending in this PR, right?
I'll wait for your green flag before squashing and merging.

mikewallace1979 · 2023-08-24T14:40:31Z

Thanks @barthisrael - I think we should wait for the repmgr patch before merging.

barthisrael · 2023-08-24T18:06:22Z

> Thanks @barthisrael - I think we should wait for the repmgr patch before merging.

Makes sense, yes!

@gonzalemario would you be able, by chance, to update the code in repmgr? I ask because you might be more used to that code as I guess it was introduced by you 😆 (I didn't check the commit log yet to be sure, but I guess that was introduced by you :) ).

martinmarques · 2023-08-28T12:08:37Z

I just wanted to leave a message on the squashing and merging. I wouldn't use the squash option that GitHub has available. It's better to squash from the command line, as it lets you pick which commits to squash and edit the new commit message (sometimes it's the same message, sometimes it's not).

mikewallace1979

Approving for squashing and merging, since the repmgr patch is a little while away and it's not actually broken by these changes.

barthisrael · 2023-09-13T12:02:04Z

> Approving for squashing and merging, since the repmgr patch is a little while away and it's not actually broken by these changes.

Great, thanks!

I had a short conversation with Martin and we agreed on using the "Squash and merge" from GitHub for this PR as we only need a single commit with the PR description -- i.e. we don't need 2 or more commits when merging.

barthisrael added 5 commits August 14, 2023 11:21

Add unit tests for OperationServer class

f174917

Also fixes a couple bugs found based on unit tests execution. Signed-off-by: Israel Barth Rubio <[email protected]>

Add unit tests for Operation class

727fe6e

Signed-off-by: Israel Barth Rubio <[email protected]>

Add unit tests for RecoveryOperation class

e082103

Also fixes a couple bugs found based on unit tests execution. Signed-off-by: Israel Barth Rubio <[email protected]>

barthisrael requested review from gonzalemario and mikewallace1979 August 15, 2023 12:44

barthisrael added 2 commits August 15, 2023 09:46

Remove unit tests that use the old code

f7f933c

Signed-off-by: Israel Barth Rubio <[email protected]>

Fix issues reported by static type checker

86db57e

Signed-off-by: Israel Barth Rubio <[email protected]>

gonzalemario approved these changes Aug 16, 2023

mikewallace1979 reviewed Aug 23, 2023

barthisrael requested a review from mikewallace1979 August 23, 2023 20:44

Revert "Add query params to GET requests to `/servers/<server_name>…

2ddd69f

…/operations`" This reverts commit 5e123a5.

mikewallace1979 approved these changes Sep 13, 2023

barthisrael merged commit 50f8877 into main Sep 13, 2023
2 checks passed

barthisrael deleted the dev/BAR-94-refactor-code branch September 13, 2023 12:04
