-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor pg-backup-api code so it is easier to introduce new operations #83
Conversation
This commit is an attempt of making it easier to extend the operations in the pg-backup-api. The following changes have been performed: * Changes to `server_operation` module: * Renamed `Metadata` class as `OperationServer`, and extended its functionalities so it is able to handle the management of operations for a given Barman Server, like: * Create required directories for storing job and output files; * Create the required files for job and output, respecting the expected minimum set of keys; * Read the contents of job and output files; * Get status of operations; * Get list of operations in this server; * Create a superclass named `Operation`, which has logic for dealing with a single operation and "talking" with the `OperationServer`. Each subclass of this class defines an operation exposed by the pg-backup-api, and the logic to actually run the operation. The `Operation` class hides all the details from the subclass on how to deal with the `OperationServer`; * Create the class `RecoveryOperation`, which is a subclass of `Operation`, and defines the logic for running a recovery operation. It takes care of dealing with validation of required arguments instead of relying on functions spread over the pg-backup-api code. It also takes care of building and running the `barman recover` command, so the other modules can simply call `run` instead of doing the job which should be performed by this class. This class was created based on logic found spread over modules of `pg-backup-api`, and based on `ServerOperation` class, which is now extinct; * Define a set of custom exceptions to be raised instead of raising a general `Exception` or relying on implicit exception raising of things like `KeyError`: * `OperationServerConfigError`: error in the configuration of `OperationServer`; * `MalformedContent`: content of job or output file is not as expected; * `OperationNotExists`: trying to query information about a non-existing operation; * `OperationAlreadyRun`: triggered if trying to run the same job twice or more; * Removed the command-line option `create-operation`, which was broken before this commit; * Changes to `run` module: * Changed `run` module so it doesn't do job that should be internal to the new `RecoveryOperation` class, like running building the list of arguments for `barman recover` and running the command itself. The recovery logic should be a blackbox for the `run` module; * Changes to `utils` module: * Changed `utils` module so it doesn't define the supported arguments for a recovery operation. Again, that job should be performed by the class implementing a recovery operation (`RecoveryOperation`); * Changes to the REST API: * Changed `/servers/server_name/operations/operation_id` endpoint so it returns `operation_id` and `status` instead of `recovery_id` and `status`. If we are to support more operations, this should be generic and not tied to recovery operations; * Changed `/servers/server_name/operations` endpoint so it is able to handle different JSON bodies depending on the operation that is being performed. Also, it now relies on an explicit `MalformedContent` being triggered by the object function instead of relying on an implicit `KeyError` to detect missing required arguments; * Changed `/servers/server_name/operations` endpoint so it returns not only a list of operation IDs, but also their type. Each item in the list is now a dictionary with two keys (`type` and `id`) instead of a simple string containing the operation ID; References: BAT-94. Signed-off-by: Israel Barth Rubio <[email protected]>
Fix the following bugs: * Not handling the case when `op_type` is `None` in `get_operations_list` * Use `read_job_file` instead of `_read_file` in `get_operations_list` * `id` argument of `Operation` could not be `None` * `datetime` was being wrongly referenced * `_run_subprocess` was not a static method * `_run_subprocess` was returning `stout.decode` function instead of its output Besides that, made a couple changes: * Added a note that `run` should only be called once, instead of trying to automatically check that a job was already executed; * Removed the check for "invalid arguments" in `_validate_job_content`. If one passes invalid arguments they will not be considered by `_run_logic`. References: BAR-94. Signed-off-by: Israel Barth Rubio <[email protected]>
Also fixes a couple bugs found based on unit tests execution. Signed-off-by: Israel Barth Rubio <[email protected]>
Signed-off-by: Israel Barth Rubio <[email protected]>
Also fixes a couple bugs found based on unit tests execution. Signed-off-by: Israel Barth Rubio <[email protected]>
Signed-off-by: Israel Barth Rubio <[email protected]>
Signed-off-by: Israel Barth Rubio <[email protected]>
Do you want to squash them before merging? |
response = {"recovery_id": operation_id, "status": status} | ||
op_server = OperationServer(server_name) | ||
status = op_server.get_operation_status(operation_id) | ||
response = {"operation_id": operation_id, "status": status} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gonzalemario Does this change actually affect repmgr? As far as I can tell repmgr only uses the POST response to this endpoint which is already returning an operation_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't. I thought the same when I first read it. That specific code is indeed changing the response repmgr receives but when we parse it in the callback[1], we just look for the status
field, not for recovery_id
(which it's called operation_id
now). So we almost broke repmgr's standby creation though the API but because the client ignores other fields, we're safe.
In pg-backup-api's we've got 2 endpoints that receive POST data, this one checks if the previous recovery operation was completed or not. The other endpoint creates the recovery task but it's not used by servers_operation_id_get()
method.
[1] size_t receive_operation_status(void *content, size_t size, size_t nmemb, char *buffer);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right!
I guess the only thing that may break with this PR is size_t receive_operations_cb(void *content, size_t size, size_t nmemb, char *buffer)
, because we are now returning a list of dictionaries instead of a list of integers.
That function is not used at the moment anywhere in the repmgr code, but if one ever attempts to use it after this PR is merged, repmgr will likely face some problem.
|
||
:return: the returned response varies: | ||
|
||
* If a successful ``GET`` request, then return a JSON response with | ||
``operations`` key containing a list of IDs of recovery operations | ||
for Barman server *server_name*; | ||
``operations`` key containing a list of operations for Barman server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gonzalemario This is the change which will break repmgr isn't it? This code here is currently expecting a list of IDs and it's now going to get a list of dicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! That's what I meant here.
At least I expect it to break 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would break repmgr if it did use it. That code is part of receive_operations_cb
which is a callback when get_operations_on_server
is triggered. The purpose of that function is to check the different tasks the server has received for a specific node. That was added and included in v5.4.0 because I knew it was going to be useful. Let me explain:
In repmgr-action-standby.c we added run_pg_backupapi
to the pool of functions to create different postgres standby. That function is only using 1) the creation of a new recovery and 2) the check of that previous task. That's why we're not breaking current repmgr's standby creation mode but it will break things if we expand features in the future.
I agree also on having a 2.0 version, and I also agree that to separate the general refactoring from the API changes would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok great - so:
- We should patch
receive_operations_cb
in repmgr so that it is not left with code which will not work against pg-backup-api. - We do not actually need to make a new repmgr release in order to be compatible with pg-backup-api 2.0.0 - the existing repmgr will work just fine with 1.1.1 and the proposed 2.0.0.
In that case I am -1 on the suggestion I made yesterday to make the change to the output format of the endpoint optional - we should just update the format rather than add additional complexity, given it is not needed in order to maintain repmgr compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good improvement and I am a big fan of the tests.
The fact that it changes the API is going to make releasing it tricky though - we would need to version it 2.0.0
and any existing consumers of the operations API (currently repmgr) will need to be updated.
Is it possible to separate the general refactoring from the API changes?
Or alternatively could we make the API change which really would break repmgr
(which is the change from returning a list of IDs for GETs to /server/<name>/operations
) an opt-in change? So the default behaviour would be to return the list of IDs but if you include a verbose=true
parameter in your GET request then you get the list of dicts?
Great, glad you liked it :)
Indeed! That's one of me concerns too!
Sure, I'll take a look at making that optional then. |
…ons` This commit introduces a couple query params to `/servers/<server_name>/operations` `GET` requests: * `verbose`: if `true`, return a list of dictionaries, containing the operation ID and type. If `false` or omitted, keep the original beavior, i.e., return a list of operation IDs. This is done so the changes are backward compatible; * `type`: used to filter operation types being returned by the request. If omitted, return operations of any type, i.e., no filter is applied. References: BAR-94. Signed-off-by: Israel Barth Rubio <[email protected]>
@mikewallace1979 I just added a commit with your suggestion. I'm changing the PR description accordingly. |
The patch looks fine but I'm not convinced we need it, in which case I prefer the simplicity of just having the one format - so I think we can revert 5e123a5 and merge as is. Sorry for wasting your time there - I didn't quite understand the repmgr impact was inconsequential. |
…/operations`" This reverts commit 5e123a5.
Not a problem! Just reverted the commit, and I'm updating the PR description too. |
@mikewallace1979 @gonzalemario so IIUC there is nothing pending in this PR, right? |
Thanks @barthisrael - I think we should wait for the repmgr patch before merging. |
Makes sense, yes! @gonzalemario would you be able, by chance, to update the code in repmgr? I ask because you might be more used to that code as I guess it was introduced by you 😆 (I didn't check the commit log yet to be sure, but I guess that was introduced by you :) ). |
I just wanted to leave a message on the squashing and merging. I wouldn't use the option to squash that GitHub has available. It's better to squash from the command line as it lets to pick which commits to squash and edit a new commit message (sometimes it's the same message, sometimes it's not) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving for squashing and merging, since the repmgr patch is a little while away and it's not actually broken by these changes.
Great, thanks! I had a short conversation with Martin and we agreed on using the "Squash and merge" from GitHub for this PR as we only need a single commit with the PR description -- i.e. we don't need 2 or more commits when merging. |
This PR is an attempt of making it easier to extend the operations in the pg-backup-api.
Take note that this PR breaks compatibility because of these changes:
/servers/server_name/operations
endpoint used to return a list of operation IDs. From now on it will return a list of dictionaries, each of them containing a couple keys:id
(operation ID) andtype
(operation type)./servers/server_name/operations/operation_id
endpoint will returnoperation_id
andstatus
instead ofrecovery_id
andstatus
;create-operation
is removed from the argument parser ofserver_operation.py
, command which was broken before this PR. However, that is not really exposed to users. You can only run that argument parser if invoking the module withpython /path/to/server_operation.py
, which is not done bypg-backup-api
use run.You can find bellow the full list of refactoring changes that have been performed:
server_operation
module:Metadata
class asOperationServer
, and extended its functionalities so it is able to handle the management of operations for a given Barman Server, like:Operation
, which has logic for dealing with a single operation and "talking" with theOperationServer
. Each subclass of this class defines an operation exposed by the pg-backup-api, and the logic to actually run the operation. TheOperation
class hides all the details from the subclass on how to deal with theOperationServer
;RecoveryOperation
, which is a subclass ofOperation
, and defines the logic for running a recovery operation. It takes care of dealing with validation of required arguments instead of relying on functions spread over the pg-backup-api code. It also takes care of building and running thebarman recover
command, so the other modules can simply callrun
instead of doing the job which should be performed by this class. This class was created based on logic found spread over modules ofpg-backup-api
, and based onServerOperation
class, which is now extinct;Exception
or relying on implicit exception raising of things likeKeyError
:OperationServerConfigError
: error in the configuration ofOperationServer
;MalformedContent
: content of job or output file is not as expected;OperationNotExists
: trying to query information about a non-existing operation;create-operation
, which was broken before this PR;run
module:run
module so it doesn't do job that should be internal to the newRecoveryOperation
class, like running building the list of arguments forbarman recover
and running the command itself. The recovery logic should be a blackbox for therun
module;utils
module:utils
module so it doesn't define the supported arguments for a recovery operation. Again, that job should be performed by the class implementing a recovery operation (RecoveryOperation
);/servers/server_name/operations/operation_id
endpoint so it returnsoperation_id
andstatus
instead ofrecovery_id
andstatus
. If we are to support more operations, this should be generic and not tied to recovery operations;/servers/server_name/operations
endpoint so it is able to handle different JSON bodies depending on the operation that is being performed. Also, it now relies on an explicitMalformedContent
being triggered by the object function instead of relying on an implicitKeyError
to detect missing required arguments;/servers/server_name/operations
endpoint so it returns a list of dictionaries, containing the operation ID and type, instead of a simple list of operation IDs;type
: used to filter operation types being returned by the request. If omitted, return operations of any type, i.e., no filter is applied.References: BAR-94.