Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Case Deletion #33831

Open
wants to merge 48 commits into
base: master
Choose a base branch
from
Open

Add Support for Case Deletion #33831

wants to merge 48 commits into from

Conversation

minhaminha
Copy link
Contributor

@minhaminha minhaminha commented Dec 1, 2023

Product Description

Previously, the only supported way for a user to delete a case was to either delete the user who owned that case or to archive the form that created it. This PR adds an option to directly soft delete a case and all its related subcases and submission forms, which will then be hard deleted by a periodic task 90 days later.

This also extends the relatively new direct form deletion action. If a form a user is trying to delete was responsible for creating a case (or multiple), they'll be notified that additional cases will be deleted as well.

Example case deletion workflow:

  1. Dedicated button for case deletion. (May move to some other location to not crowd the bottom bar).
Screen Shot 2023-12-01 at 5 25 23 PM
  1. Once you click Delete Case, you are taken to a page that details what cases (and their forms) will be deleted, what cases will be reopened due to one of those forms being archived/deleted and what cases will be affected but not deleted due to the form that updated that case being archived/deleted.
Screen Shot 2023-12-01 at 5 27 09 PM
  1. At the bottom of the page, you will confirm this is your intention.
Screen Shot 2023-12-01 at 5 30 04 PM
  1. After clicking the button, you'll see the final confirmation modal that asks the user to type the name of case into the input field to once again confirm that case and its related cases/forms will be deleted.
Screen Shot 2023-12-01 at 5 31 15 PM

To avoid attempting to delete too many cases (as some cases may have 100s of related ones from it being created by a bulk import), the amount of cases that can deleted at once will be capped at 100, though this could be changed.

For more comprehensive screenshots and for an example of the form driven case deletion, there is a linked doc you can find in the ticket below (latest comment).

Technical Summary

Jira Ticket

Case deletion works in 3 parts:

  • from the original case targeted for deletion, getting all forms associated with that case and checking if those forms have any other cases associated with it as well. This walks through all possible case relations and returns a list to reference for deletion as well as lists used to format the case confirmation page seen above.
  • Once the user submits the correct case name into the final confirmation modal, it first attempts to archive every form associated with the case/its subcases. Step 1 makes sure that the form that creates a case is the last of the forms of that case to be archived, otherwise it won't archive it. Once archived without error, it'll soft delete all forms and cases referenced.
  • 90 days later those forms and cases will be picked up by the automated deletion task.

Form driven case deletion works similarly:

  • If a form was responsible for creating a case (checks its case updates), the delete button on the form view page will inform the user that additional cases and their related forms will be deleted alongside the main form.
  • The user will be redirected to a modified version of the case deletion page which uses the same case relations walkthrough but organizes the display data to highlight which cases were directly affected by the form being deleted and which are not.
  • The user will still be prompted to type the name of a case directly created by the form before deletion.

Safety Assurance

Safety story

Tested locally and staging. QA testing forthcoming.

Automated test coverage

Wrote some tests to ensure the core functions of the feature works as intended.

QA Plan

TBA

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@minhaminha minhaminha added the product/all-users-all-environments Change impacts all users on all environments label Dec 1, 2023
@dimagimon dimagimon added the Risk: High Change affects files that have been flagged as high risk. label Dec 4, 2023
@minhaminha minhaminha requested a review from biyeun December 4, 2023 19:04
Copy link
Contributor

@millerdev millerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the description it sounds like it will be necessary to click three different delete buttons and enter a case name to delete a case. Could that be reduced to two delete clicks by moving the case name input to the same page that shows what forms and cases will be deleted? That seems like a better user experience to me. Has that option been considered?

This review only covers the get_cases_and_forms_for_deletion function. I plan to review other parts of this PR in later review sessions.

corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved
@minhaminha minhaminha marked this pull request as ready for review December 4, 2023 21:53
cases[case.case_id] = {}
case_names[case.case_id] = case.name
if len(case_names) == 1: # only add primary label to first/main case
case_names[case.case_id] += ' <span class="label label-default">primary case</span>'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a boolean rather than raw HTML?

Similar to my comment above - perhaps a data structure would be better than a string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second this! raw HTML in python should always be avoided

delete_forms = [] # list of forms to be soft deleted

# For formatting the list of cases/submission forms
cases = {} # structured like: {case: {form: {case touched by form: actions taken by form}}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a dataclass or similar data structure could be used to give this better structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with using dataclasses but can they be unwrapped like a dict in an html template? The sole use of this dictionary is to display the case/form information in the exact order it's stored in so I'm not sure if converting it to a dataclass would really help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minhaminha I think something like a namedtuple would be helpful, here is an example:

Results = namedtuple('Results', ['restored', 'not_found', 'not_deleted'])

you can then call . _asdict() on your namedtuple

Copy link
Contributor

@millerdev millerdev Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with using dataclasses but can they be unwrapped like a dict in an html template?

dataclass and attrs objects (a richer form of the same) can both be converted to dicts, although it should not be necessary. Just pass the object(s) out to the template and reference properties normally.

Copy link
Member

@biyeun biyeun Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@millerdev I always found namedtuple nice due to the shorter notation for a quick dataclass-esq object for simple use cases like this. curious where your preference for an attrs object comes in vs a namedtuple? Usually I would use the former if I wanted optional fields and methods, but namedtuple for very quick named attributes that were always required

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find dataclasses and attrs to support more natural class syntax than namedtuple. There is tons more flexibility such as field defaults like factory=list, it's trivial to add methods, etc. Also, something is sticking in my mind that namedtuple has a performance overhead penalty, but I could be wrong about that. SO also points out that

When you compare named tuples, the names are ignored: two named tuples are equal if they contain the same values in the same order, even if they have different class names or field names

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing is namedtuple doesn't support default values which is often useful.

Also, I find most IDEs quite bad at backlinking to namedtuple definitions since they are just a variable and have no static class definition.

corehq/apps/reports/standard/cases/case_data.py Outdated Show resolved Hide resolved

@method_decorator(require_case_view_permission)
def dispatch(self, request, *args, **kwargs):
self.delete_dict, redirect = get_cases_and_forms_for_deletion(request, self.domain, self.case_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does this take to run? Wondering it would make sense to offload it to celery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my testing, It doesn’t take long at all. There’s also a limit on how many cases it can look at/return so the lag shouldn’t really be noticeable.

corehq/form_processor/models/cases.py Outdated Show resolved Hide resolved
@minhaminha
Copy link
Contributor Author

From reading the description it sounds like it will be necessary to click three different delete buttons and enter a case name to delete a case. Could that be reduced to two delete clicks by moving the case name input to the same page that shows what forms and cases will be deleted? That seems like a better user experience to me. Has that option been considered?

The final modal where you enter in the name of the case to confirm your intent to delete was meant to copy the way we currently do mobile worker deletion and I think if we were to move the input box and all the other warnings/instructions in the modal to the same page it might get a little crowded. Though I do agree it feels like a like of clicks I feel like that also emphasizes the how “dangerous” this action could be, which is something we also want. @biyeun do you have any more thoughts on reducing the amount of clicks?

@biyeun
Copy link
Member

biyeun commented Dec 6, 2023

@millerdev The idea behind the confirmation modal is to make it very clear to the user that this is a destructive action and they should be very sure about what they are doing. It's borrowing cues from how github handles repository deletion, or how we handle deleting a mobile worker. I think it's appropriate for this level of permanent data deletion.

The following Cases will be reopened but <strong>not</strong> deleted.
{% endblocktrans %}
<ol>
{% for case_link, form_link in reopened_cases.items %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the case_link html is being generated in python above. can these instead just be URLs and the link HTML is specified in the template? It's always better to separate design from function.

prepared_forms = {}
for form in affected_cases[case]:
prepared_forms[get_form_link(form)] = affected_cases[case][form]
prepared_affected_cases[get_case_link(case)] = prepared_forms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to tie in mentions of a data structure. perhaps something like

AffectedCase = namedtuple('AffectedCase', ['name', 'url', 'affected_forms'])
AffectedForm = namedtuple('AffectedForm', ['name, 'url', 'actions'])

and then in the template, you can call on the properties by iterating through a list of structured items rather than key, value pairs from affected_cases.items

Copy link
Contributor

@millerdev millerdev Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except use attrs rather than namedtuple

from attrs import define, field

@define
class AffectedCase:
    name = field()
    url = field()
    affected_forms = field()

...

html template. The latter will be further modified in format_case_data_for_display.
"""
@define
class DeleteCase:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could these dataclasses and related utility functions live in a separate utils file outside of this report so that the dataclasses (which are looking great!) and the other related functions don't have to be nested under a larger wrapper function? It's still a bit difficult to digest get_case_and_display_data from top to bottom due to all the nesting. It would be great if walk_case_relations was further broken down too...

Copy link
Member

@biyeun biyeun Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps the location for this proposed file can be corehq.apps.hqcase.case_deletion_utils?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there was also a comment earlier about adding unit tests for these extracted functions...seconding this as well! hqcase seems like a more appropriate place for these tests rather that reports, so having the utils file there will also make the organization much more digestible for future understanding

delete_forms.insert(0, form_id)
form_object = XFormInstance.objects.get_form(form_id, domain)
if form_id not in form_names:
form_names[form_id] = escape(xmlns_to_name(domain, form_object.xmlns, form_object.app_id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an example guidance to further break down this function...
could all the places here where form_names[form_id] is being set be converted into a utility function that looks like...

def get_form_name(domain, form_object):
    form_name = escape(xmlns_to_name(domain, form_object.xmlns, form_object.app_id))
    if form_name == form_object.xmlns:
        inferred_name = [
            get_app(domain, form_object.app_id).name or "[Unknown App]",
            "[Unknown Module]",
             form_object.name or "[Unknown Form]"
         ]
         form_name = escape(' > '.join(inferred_name))
    return form_name

then right before you set current_form, you can do...

form_names[form_id] = get_form_name(domain, form_object)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up puling this and the process that gets the case's xforms into their own functions but no matter what I tried, I couldn't break down walk_case_relations further, partly because everything I tried ended up being very redundant and required passing around a lot of paramaters/context when it'd be easier to just access them in the main function. Maybe I'm thinking about this in the wrong way?

I think even just pulling out all the dataclass definitions and those 2 functions make get_case_and_display_data as a whole, way more digestable. Do you have any more thoughts on this?

corehq/apps/hqcase/case_deletion_utils.py Outdated Show resolved Hide resolved
corehq/apps/hqcase/case_deletion_utils.py Outdated Show resolved Hide resolved
CaseBlock(cases[i + 1], create=True).as_text(),
CaseBlock(case_id=cases[i], update={}).as_text(),
CaseBlock(case_id=cases[i + 1], create=True,
index={'parent': (parent_type + '{}'.format(i), cases[i])}).as_text()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
index={'parent': (parent_type + '{}'.format(i), cases[i])}).as_text()
index={'parent': (parent_type + str(i), cases[i])}).as_text()

], self.domain)
self.addCleanup(_delete_all_cases_and_forms, self.domain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this cleanup causing a problem?

form_instance = XFormInstance.objects.get_form(form, domain)
form_instance.soft_delete()
else:
raise FormArchiveError(form.form_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an error hear means that some forms in form_list may get deleted while others (all those not processed at the time the error is raised) may not, which would leave the system in a state where (to the user) the operation appeared to fail, but some things got deleted. This does not seem like an intuitive outcome. I think the operation should either succeed with all forms deleted or fail with no forms deleted. Is that possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops yea that wouldn't be good. I made sure all forms are archived properly first before moving onto the soft deletion (and unarchives previously archives forms if it fails in the middle of that).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Would it be hard to add a test for that functionality?

Comment on lines 496 to 498
cases, xforms = self.make_complex_case()
_, error = soft_delete_cases_and_forms(request, self.domain,
list(cases.values()), list(xforms.values())[:-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What causes the error here? And how does this test ensure that the error is not encountered first, before any of the other forms are archived?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in making the multiple cases/forms in make_complex_case, the last form updates a particular child case while the second form creates it. Since on the first error it'll sort it in reverse received_on order, it'll at least archive the the third form before throwing an error saying it can't archive the second form, since the fourth form wasn't archived/part of the delete_form_list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to that test to explain how it works.

@minhaminha
Copy link
Contributor Author

@millerdev just a warning: there's a few more commits to come to make the delete function a background task + possibly implementing the progress bar

@minhaminha
Copy link
Contributor Author

sorry for the force push - messed up the first Merge branch 'master' commit. This also reminds me this will need a decent of commit history cleaning once its fully ready to go.

@Charl1996
Copy link
Contributor

@minhaminha Is this PR still on the radar?

@minhaminha
Copy link
Contributor Author

@Charl1996 Graham is going to be picking this back up soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product/all-users-all-environments Change impacts all users on all environments Risk: High Change affects files that have been flagged as high risk.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants