Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor pg-backup-api code so it is easier to introduce new operations #83

Merged
merged 9 commits into from
Sep 13, 2023

Conversation

barthisrael
Copy link
Contributor

@barthisrael barthisrael commented Aug 15, 2023

This PR is an attempt of making it easier to extend the operations in the pg-backup-api.

Take note that this PR breaks compatibility because of these changes:

  • /servers/server_name/operations endpoint used to return a list of operation IDs. From now on it will return a list of dictionaries, each of them containing a couple keys: id (operation ID) and type (operation type).
  • /servers/server_name/operations/operation_id endpoint will return operation_id and status instead of recovery_id and status;
  • create-operation is removed from the argument parser of server_operation.py, command which was broken before this PR. However, that is not really exposed to users. You can only run that argument parser if invoking the module with python /path/to/server_operation.py, which is not done by pg-backup-api use run.

You can find bellow the full list of refactoring changes that have been performed:

  • Changes to server_operation module:
    • Renamed Metadata class as OperationServer, and extended its functionalities so it is able to handle the management of operations for a given Barman Server, like:
      • Create required directories for storing job and output files;
      • Create the required files for job and output, respecting the expected minimum set of keys;
      • Read the contents of job and output files;
      • Get status of operations;
      • Get list of operations in this server;
    • Create a superclass named Operation, which has logic for dealing with a single operation and "talking" with the OperationServer. Each subclass of this class defines an operation exposed by the pg-backup-api, and the logic to actually run the operation. The Operation class hides all the details from the subclass on how to deal with the OperationServer;
    • Create the class RecoveryOperation, which is a subclass of Operation, and defines the logic for running a recovery operation. It takes care of dealing with validation of required arguments instead of relying on functions spread over the pg-backup-api code. It also takes care of building and running the barman recover command, so the other modules can simply call run instead of doing the job which should be performed by this class. This class was created based on logic found spread over modules of pg-backup-api, and based on ServerOperation class, which is now extinct;
    • Define a set of custom exceptions to be raised instead of raising a general Exception or relying on implicit exception raising of things like KeyError:
      • OperationServerConfigError: error in the configuration of OperationServer;
      • MalformedContent: content of job or output file is not as expected;
      • OperationNotExists: trying to query information about a non-existing operation;
    • Removed the command-line option create-operation, which was broken before this PR;
  • Changes to run module:
    • Changed run module so it doesn't do job that should be internal to the new RecoveryOperation class, like running building the list of arguments for barman recover and running the command itself. The recovery logic should be a blackbox for the run module;
  • Changes to utils module:
    • Changed utils module so it doesn't define the supported arguments for a recovery operation. Again, that job should be performed by the class implementing a recovery operation (RecoveryOperation);
  • Changes to the REST API:
    • Changed /servers/server_name/operations/operation_id endpoint so it returns operation_id and status instead of recovery_id and status. If we are to support more operations, this should be generic and not tied to recovery operations;
    • Changed /servers/server_name/operations endpoint so it is able to handle different JSON bodies depending on the operation that is being performed. Also, it now relies on an explicit MalformedContent being triggered by the object function instead of relying on an implicit KeyError to detect missing required arguments;
    • Changed /servers/server_name/operations endpoint so it returns a list of dictionaries, containing the operation ID and type, instead of a simple list of operation IDs;
      • type: used to filter operation types being returned by the request. If omitted, return operations of any type, i.e., no filter is applied.

References: BAR-94.

This commit is an attempt of making it easier to extend the operations in the
pg-backup-api. The following changes have been performed:

* Changes to `server_operation` module:
    * Renamed `Metadata` class as `OperationServer`, and extended its functionalities
      so it is able to handle the management of operations for a given Barman
      Server, like:
        * Create required directories for storing job and output files;
        * Create the required files for job and output, respecting the expected
          minimum set of keys;
        * Read the contents of job and output files;
        * Get status of operations;
        * Get list of operations in this server;
    * Create a superclass named `Operation`, which has logic for dealing with a
      single operation and "talking" with the `OperationServer`. Each subclass of
      this class defines an operation exposed by the pg-backup-api, and the logic
      to actually run the operation. The `Operation` class hides all the details
      from the subclass on how to deal with the `OperationServer`;
    * Create the class `RecoveryOperation`, which is a subclass of `Operation`, and
      defines the logic for running a recovery operation. It takes care of dealing
      with validation of required arguments instead of relying on functions spread
      over the pg-backup-api code. It also takes care of building and running the
      `barman recover` command, so the other modules can simply call `run` instead
      of doing the job which should be performed by this class. This class was
      created based on logic found spread over modules of `pg-backup-api`, and based
      on `ServerOperation` class, which is now extinct;
    * Define a set of custom exceptions to be raised instead of raising a general
      `Exception` or relying on implicit exception raising of things like `KeyError`:
        * `OperationServerConfigError`: error in the configuration of `OperationServer`;
        * `MalformedContent`: content of job or output file is not as expected;
        * `OperationNotExists`: trying to query information about a non-existing operation;
        * `OperationAlreadyRun`: triggered if trying to run the same job twice or more;
    * Removed the command-line option `create-operation`, which was broken before
      this commit;
* Changes to `run` module:
    * Changed `run` module so it doesn't do job that should be internal to the
      new `RecoveryOperation` class, like running building the list of arguments
      for `barman recover` and running the command itself. The recovery logic
      should be a blackbox for the `run` module;
* Changes to `utils` module:
    * Changed `utils` module so it doesn't define the supported arguments for a
      recovery operation. Again, that job should be performed by the class
      implementing a recovery operation (`RecoveryOperation`);
* Changes to the REST API:
    * Changed `/servers/server_name/operations/operation_id` endpoint so it
      returns `operation_id` and `status` instead of `recovery_id` and `status`.
      If we are to support more operations, this should be generic and not tied
      to recovery operations;
    * Changed `/servers/server_name/operations` endpoint so it is able to handle
      different JSON bodies depending on the operation that is being performed.
      Also, it now relies on an explicit `MalformedContent` being triggered by
      the object function instead of relying on an implicit `KeyError` to detect
      missing required arguments;
    * Changed `/servers/server_name/operations` endpoint so it returns not only
      a list of operation IDs, but also their type. Each item in the list is now
      a dictionary with two keys (`type` and `id`) instead of a simple string
      containing the operation ID;

References: BAT-94.

Signed-off-by: Israel Barth Rubio <[email protected]>
Fix the following bugs:

* Not handling the case when `op_type` is `None` in `get_operations_list`
* Use `read_job_file` instead of `_read_file` in `get_operations_list`
* `id` argument of `Operation` could not be `None`
* `datetime` was being wrongly referenced
* `_run_subprocess` was not a static method
* `_run_subprocess` was returning `stout.decode` function instead of its output

Besides that, made a couple changes:

* Added a note that `run` should only be called once, instead of trying to
  automatically check that a job was already executed;
* Removed the check for "invalid arguments" in `_validate_job_content`. If one
  passes invalid arguments they will not be considered by `_run_logic`.

References: BAR-94.

Signed-off-by: Israel Barth Rubio <[email protected]>
Also fixes a couple bugs found based on unit tests execution.

Signed-off-by: Israel Barth Rubio <[email protected]>
Signed-off-by: Israel Barth Rubio <[email protected]>
Also fixes a couple bugs found based on unit tests execution.

Signed-off-by: Israel Barth Rubio <[email protected]>
@gonzalemario
Copy link
Contributor

Could you please squash all in one commit? Or if you want to split, I'd have at most 2.

  1. Refactor pg-backup-api code so it is easier to introduce new operations: f96cba5 and 056179c
  2. Add unit tests for Operation and RecoveryOperation: with the rest of all of commits

@barthisrael
Copy link
Contributor Author

barthisrael commented Aug 15, 2023

Could you please squash all in one commit? Or if you want to split, I'd have at most 2.

  1. Refactor pg-backup-api code so it is easier to introduce new operations: f96cba5 and 056179c
  2. Add unit tests for Operation and RecoveryOperation: with the rest of all of commits

Do you want to squash them before merging?
An alternative would be to use the "squash and merge" option later. We would keep the history in the PR, but squash them into a single commit when merging into main.
What do you think about that?
The result would be the same, it is just that the squashing would be done at merge time instead of now.

response = {"recovery_id": operation_id, "status": status}
op_server = OperationServer(server_name)
status = op_server.get_operation_status(operation_id)
response = {"operation_id": operation_id, "status": status}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gonzalemario Does this change actually affect repmgr? As far as I can tell repmgr only uses the POST response to this endpoint which is already returning an operation_id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't. I thought the same when I first read it. That specific code is indeed changing the response repmgr receives but when we parse it in the callback[1], we just look for the status field, not for recovery_id (which it's called operation_id now). So we almost broke repmgr's standby creation though the API but because the client ignores other fields, we're safe.

In pg-backup-api's we've got 2 endpoints that receive POST data, this one checks if the previous recovery operation was completed or not. The other endpoint creates the recovery task but it's not used by servers_operation_id_get() method.

[1] size_t receive_operation_status(void *content, size_t size, size_t nmemb, char *buffer);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!

I guess the only thing that may break with this PR is size_t receive_operations_cb(void *content, size_t size, size_t nmemb, char *buffer), because we are now returning a list of dictionaries instead of a list of integers.

That function is not used at the moment anywhere in the repmgr code, but if one ever attempts to use it after this PR is merged, repmgr will likely face some problem.


:return: the returned response varies:

* If a successful ``GET`` request, then return a JSON response with
``operations`` key containing a list of IDs of recovery operations
for Barman server *server_name*;
``operations`` key containing a list of operations for Barman server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gonzalemario This is the change which will break repmgr isn't it? This code here is currently expecting a list of IDs and it's now going to get a list of dicts.

Copy link
Contributor Author

@barthisrael barthisrael Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! That's what I meant here.
At least I expect it to break 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would break repmgr if it did use it. That code is part of receive_operations_cb which is a callback when get_operations_on_server is triggered. The purpose of that function is to check the different tasks the server has received for a specific node. That was added and included in v5.4.0 because I knew it was going to be useful. Let me explain:

In repmgr-action-standby.c we added run_pg_backupapi to the pool of functions to create different postgres standby. That function is only using 1) the creation of a new recovery and 2) the check of that previous task. That's why we're not breaking current repmgr's standby creation mode but it will break things if we expand features in the future.

I agree also on having a 2.0 version, and I also agree that to separate the general refactoring from the API changes would be great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great - so:

  1. We should patch receive_operations_cb in repmgr so that it is not left with code which will not work against pg-backup-api.
  2. We do not actually need to make a new repmgr release in order to be compatible with pg-backup-api 2.0.0 - the existing repmgr will work just fine with 1.1.1 and the proposed 2.0.0.

In that case I am -1 on the suggestion I made yesterday to make the change to the output format of the endpoint optional - we should just update the format rather than add additional complexity, given it is not needed in order to maintain repmgr compatibility.

Copy link
Contributor

@mikewallace1979 mikewallace1979 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good improvement and I am a big fan of the tests.

The fact that it changes the API is going to make releasing it tricky though - we would need to version it 2.0.0 and any existing consumers of the operations API (currently repmgr) will need to be updated.

Is it possible to separate the general refactoring from the API changes?

Or alternatively could we make the API change which really would break repmgr (which is the change from returning a list of IDs for GETs to /server/<name>/operations) an opt-in change? So the default behaviour would be to return the list of IDs but if you include a verbose=true parameter in your GET request then you get the list of dicts?

@barthisrael
Copy link
Contributor Author

This is a good improvement and I am a big fan of the tests.

Great, glad you liked it :)

The fact that it changes the API is going to make releasing it tricky though - we would need to version it 2.0.0 and any existing consumers of the operations API (currently repmgr) will need to be updated.

Indeed! That's one of me concerns too!

Is it possible to separate the general refactoring from the API changes?

Or alternatively could we make the API change which really would break repmgr (which is the change from returning a list of IDs for GETs to /server/<name>/operations) an opt-in change? So the default behaviour would be to return the list of IDs but if you include a verbose=true parameter in your GET request then you get the list of dicts?

Sure, I'll take a look at making that optional then.

…ons`

This commit introduces a couple query params to `/servers/<server_name>/operations`
`GET` requests:

* `verbose`: if `true`, return a list of dictionaries, containing the
operation ID and type. If `false` or omitted, keep the original beavior,
i.e., return a list of operation IDs. This is done so the changes are
backward compatible;
* `type`: used to filter operation types being returned by the request.
If omitted, return operations of any type, i.e., no filter is applied.

References: BAR-94.

Signed-off-by: Israel Barth Rubio <[email protected]>
@barthisrael
Copy link
Contributor Author

@mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly.

@mikewallace1979
Copy link
Contributor

mikewallace1979 commented Aug 24, 2023

@mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly.

The patch looks fine but I'm not convinced we need it, in which case I prefer the simplicity of just having the one format - so I think we can revert 5e123a5 and merge as is.

Sorry for wasting your time there - I didn't quite understand the repmgr impact was inconsequential.

@barthisrael
Copy link
Contributor Author

@mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly.

The patch looks fine but I'm not convinced we need it, in which case I prefer the simplicity of just having the one format - so I think we can revert 5e123a5 and merge as is.

Sorry for wasting your time there - I didn't quite understand the repmgr impact was inconsequential.

Not a problem!

Just reverted the commit, and I'm updating the PR description too.

@barthisrael
Copy link
Contributor Author

@mikewallace1979 @gonzalemario so IIUC there is nothing pending in this PR, right?
I'll wait for your green flag before squashing and merging.

@mikewallace1979
Copy link
Contributor

Thanks @barthisrael - I think we should wait for the repmgr patch before merging.

@barthisrael
Copy link
Contributor Author

Thanks @barthisrael - I think we should wait for the repmgr patch before merging.

Makes sense, yes!

@gonzalemario would you be able, by chance, to update the code in repmgr? I ask because you might be more used to that code as I guess it was introduced by you 😆 (I didn't check the commit log yet to be sure, but I guess that was introduced by you :) ).

@martinmarques
Copy link
Collaborator

martinmarques commented Aug 28, 2023

I just wanted to leave a message on the squashing and merging. I wouldn't use the option to squash that GitHub has available. It's better to squash from the command line as it lets to pick which commits to squash and edit a new commit message (sometimes it's the same message, sometimes it's not)

Copy link
Contributor

@mikewallace1979 mikewallace1979 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for squashing and merging, since the repmgr patch is a little while away and it's not actually broken by these changes.

@barthisrael
Copy link
Contributor Author

Approving for squashing and merging, since the repmgr patch is a little while away and it's not actually broken by these changes.

Great, thanks!

I had a short conversation with Martin and we agreed on using the "Squash and merge" from GitHub for this PR as we only need a single commit with the PR description -- i.e. we don't need 2 or more commits when merging.

@barthisrael barthisrael merged commit 50f8877 into main Sep 13, 2023
2 checks passed
@barthisrael barthisrael deleted the dev/BAR-94-refactor-code branch September 13, 2023 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants