Rewrite disaster recovery based on recreate and status check #1890

AnnaSjerling · 2024-10-21T12:00:17Z

This PR rewrites the disaster recovery docs based on the new recreate and status check procedures.

Regarding the functional changes, there are three major things to note:

1. The recovery of the system database is the same as before, even though the way we check whether the system db is write available has changed.
2. The step which removes lost servers now also includes recreating some databases. It was decided that this mixing of recovering servers and databases is necessary to get recreate into the guide in the best way we could see.
3. Previously it was not clear how to handle quarantined databases, even though the guide mentioned them in the intro. Now, it is more explicitly discussed how they are supposed to be handled during disaster recovery.

In the past, we have also gotten feedback that the disaster recovery docs are hard to follow. Therefore, this PR also includes refactoring which introduces a new guide structure. This new structure provides more explanations of why we are asking the user to do a certain thing.

tselmegbaasan

Just a few comments from testing this out locally by hand.

modules/ROOT/pages/clustering/disaster-recovery.adoc

tselmegbaasan

Looks good. I just have a few questions and suggestions.

modules/ROOT/pages/clustering/disaster-recovery.adoc

NataliaIvakina

@AnnaSjerling hey! I went through your guide. Wow! Tremendous work! I left some editorial comments. My idea was to tie up the beginning with the rest of the guide.

modules/ROOT/pages/clustering/disaster-recovery.adoc

AnnaSjerling · 2024-12-19T08:40:35Z

I had used the wrong words when describing output from e.g. SHOW DATABASES, so I fixed the words and their capitalisation to match the actual output.

neo-technology-commit-status-publisher · 2024-12-20T10:49:40Z

Thanks for the documentation updates.

The preview documentation has now been torn down - reopening this PR will republish it.

AnnaSjerling · 2024-12-20T11:08:22Z

modules/ROOT/pages/clustering/disaster-recovery.adoc

 ====

 The `system` database contains the view of the cluster.
 This includes which servers and databases are present, where they live and how they are configured.
 During a disaster, the view of the cluster might need to change to reflect a new reality, such as removing lost servers.
 Databases might also need to be recreated to regain write availability.
-Because both of these steps are executed by modifying the `system` database, making the `system` database write available is a vital first step during disaster recovery.
+Because both of these steps are executed by modifying the `system` database, making the `system` database write-enabled is a vital first step during disaster recovery.


This is in your hands, but I don't think it is nice to change from a technically correct wording, which we defined at the beginning of the document, to multiple adjacent ones that are not defined, such as write-enabled, able to accept write operations, and write-capable.

AnnaSjerling · 2024-12-20T11:09:37Z

modules/ROOT/pages/clustering/disaster-recovery.adoc

-This is done in two different ways:
+Use the following steps to remove lost servers and add new ones to the cluster.
+To remove lost servers, any allocations they were hosting must be moved to available servers in the cluster.
+This can be done in two different ways:


This is not correct: it is not an either-or. Both will be needed in most cases.
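
To illustrate why both are usually needed, here is a minimal sketch in Python. It is only a model of the reasoning, not anything the guide or Neo4j provides: the `Database` type, the `plan_lost_server_removal` function, and the server and database names are all hypothetical. It assumes that a database which is still write available can have its allocations moved off the lost servers, while a database that has lost write availability has to be recreated first.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Database:
    # All fields are hypothetical inputs for this sketch.
    name: str
    write_available: bool      # assumed to come from a status check
    servers: FrozenSet[str]    # servers hosting an allocation of this database


def plan_lost_server_removal(databases: Iterable[Database], lost_servers: Iterable[str]):
    """Group databases with allocations on lost servers by the action they need."""
    lost = set(lost_servers)
    plan = {"move_allocations": [], "recreate": []}
    for db in databases:
        if not (db.servers & lost):
            continue  # no allocation on a lost server, nothing to do
        if db.write_available:
            plan["move_allocations"].append(db.name)
        else:
            plan["recreate"].append(db.name)
    return plan


databases = [
    Database("orders", True, frozenset({"server-1", "server-2", "server-3"})),
    Database("events", False, frozenset({"server-3", "server-4", "server-5"})),
    Database("audit", True, frozenset({"server-1", "server-2"})),
]
plan = plan_lost_server_removal(databases, lost_servers={"server-3", "server-4"})
print(plan)
```

In this made-up cluster one database ends up in each group, so a single lost-server event needs both ways before the servers can be removed.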

AnnaSjerling added clustering NOT_READY_FOR_MERGE labels Oct 21, 2024

AnnaSjerling assigned NataliaIvakina Oct 21, 2024

AnnaSjerling changed the title ~~Rewrite disaster recovery based on recreate 2~~ Rewrite disaster recovery based on recreate and status check Oct 21, 2024

NataliaIvakina added the WIP label Nov 11, 2024

tselmegbaasan reviewed Nov 25, 2024

AnnaSjerling force-pushed the rewrite-disaster-recovery-based-on-recreate-2 branch from b74c7f3 to 1d27305 Compare November 28, 2024 14:34

AnnaSjerling added 6 commits November 28, 2024 15:42

Rewrite disaster recovery to use the new recreate and status check procedures.

a727ace

Move to new structure.

657b957

WIP

b88cdad

WIP

29bd007

Making notes about potential problems found during testing.

526fe08

Rewrite based on problems found during testing.

3140906

AnnaSjerling force-pushed the rewrite-disaster-recovery-based-on-recreate-2 branch from 1d27305 to 3140906 Compare November 28, 2024 14:42

AnnaSjerling assigned tselmegbaasan and unassigned tselmegbaasan Nov 28, 2024

tselmegbaasan approved these changes Dec 6, 2024

Review comments.

43d60b6

NataliaIvakina added cherry-pick-this-to-5.x and removed WIP NOT_READY_FOR_MERGE labels Dec 10, 2024

NataliaIvakina reviewed Dec 17, 2024

NataliaIvakina reviewed Dec 17, 2024

AnnaSjerling and others added 2 commits December 20, 2024 09:44

Review comments and make the output examples from e.g. SHOW SERVERS the same as the actual output.

dd49ab3

Update disaster-recovery.adoc

3930931

NataliaIvakina merged commit a2f28f2 into neo4j:dev Dec 20, 2024
8 checks passed
