Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use scheduled action for resetting large record and meta tables #1543

Merged
merged 15 commits into from
Aug 7, 2024

Conversation

tharsheblows
Copy link
Contributor

@tharsheblows tharsheblows commented Aug 5, 2024

Fixes #1503

Development notes

Changes to default database reset functionality:

  • if the site is a single site installation or multisite install with the plugin network activated, the Stream and Stream meta tables are truncated and then a cleanup function is run on the meta table. (This is for the case where a record is added in between truncation the {$wpdb->streammeta} table and the {$wpdb->stream} table.)
  • if the site is a multisite which is not network activated and
    • the site has <= 1000000 records in the {$wpdb->stream} table: the previous deletion functionality will run
    • the site has > 1000000 records in the {$wpdb->stream} table:
      • an action will be scheduled async to delete 250k records at a time using the previous deletion functionality. The limit is handled by a range on the {$wpdb->stream}.ID column
      • the action is recursive and will run until there are no records left with an ID lower than the initial ID. This is to avoid the situation where records keep getting added as the actions run so they never finish.

The "Reset Stream Database" link in Stream --> Settings --> Advanced will not show while the records are being deleted. The warning given will depend on whether or not it's going to be an asynchronous process.

In my local environment, each pass of deleting 250k records takes 10-15 seconds.

Added filters

  • wp_stream_is_large_records_table: filter the number of records to be processed when deciding what is a large records table. Default is 1M, ie anything over 1M records to be processed is considered a large records table.
  • wp_stream_batch_size: how many records in the {$wpdb->stream} table will be used when doing the deletion.

Incidental additions:

  • $this->plugin->get_site_type() returns what sort of site it is:
  • Admin::WP_STREAM_MULTI_NETWORK is a multisite with the plugin network activated
  • Admin::::WP_STREAM_MULTI_NOT_NETWORK is a multisite where the plugin is not network enabled
  • Admin::WP_STREAM_SINGLE_SITE is a single site
  • Action Scheduler library to use instead of WP cron functions

Testing instructions:

You can use the npm run large-records-generate to generate a bunch of records. You can see how many you've generated by running npm run large-records-show

Checklist

  • Project documentation has been updated to reflect the changes in this pull request, if applicable.
  • I have tested the changes in the local development environment (see contributing.md).
  • I have added phpunit tests.

Release Changelog

  • Fix: Use async process for resetting the Stream database on large multisites where the plugin is not network enabled
  • Fix: Truncate tables when resetting the Stream database on single sites and multisites where the plugin is network enabled

Copy link
Contributor

@delawski delawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me 👍

classes/class-admin.php Outdated Show resolved Hide resolved
classes/class-admin.php Outdated Show resolved Hide resolved
Comment on lines 21 to 34
/**
* Used to check if it's a single site, not multisite.
*/
const WP_STREAM_SINGLE_SITE = 'single';

/**
* Used to check if it's a multisite with the plugin network enabled.
*/
const WP_STREAM_MULTI_NETWORK = 'multisite-network';

/**
* Used to check if it's a multisite with the plugin not network enabled.
*/
const WP_STREAM_MULTI_NOT_NETWORK = 'multisite-not-network';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 thought: Two things, nitpicks really:

  1. I feel like this belongs to the Plugin class and not Admin (alternatively, this could be part of some kind of a Utility class). What's your take?
  2. I guess it's a matter of preference but I wonder what do you think about having a set of methods for checking the installation type instead of those enum-like string constants? Something like Admin::is_single_site(), Admin::is_multisite_network_activated() and Admin::is_multisite_not_network_activated().
    It would follow the WordPress convention like is_network_admin() and make the conditions easier to read (potentially). But then they couldn't be used in a switch statement later on and a regular if ... else would have to be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @delawski, but also have one comment about the constants naming:

I'm not sure what's the naming convention in this plugin, but addition of the WP_STREAM_ prefix seems to not be necessary here (these constants are scoped in Admin class).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this belongs to the Plugin class and not Admin (alternatively, this could be part of some kind of a Utility class). What's your take?
Yes! I agree. I'll put it in the Plugin class for now and if it feels out of place there, move to a Utility class.

I guess it's a matter of preference but I wonder what do you think about having a set of methods for checking the installation type instead of those enum-like string constants? Something like Admin::is_single_site(), Admin::is_multisite_network_activated() and Admin::is_multisite_not_network_activated().

Yes, I like this also, it makes sense to me and would make development easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartoszgadomski Yes! You're right, that prefix isn't necessary and also not in line with the naming condition! I'll fix those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit : a6d2d36

If I missed doing something in that, it's accidental so please tell me

Comment on lines 474 to 517
/**
* Retrieves the deletion warning message based on the site type
* and whether or not there is currently a process running to delete the tables.
*
* @return string The deletion warning message.
*/
public function get_deletion_warning(): string {
$site_type = $this->plugin->get_site_type();

switch ( $site_type ) {
case Admin::WP_STREAM_MULTI_NETWORK:
$warning = __( 'Warning: This will delete all activity records from the database for all sites.', 'stream' );
break;
case Admin::WP_STREAM_MULTI_NOT_NETWORK:
$warning = $this->plugin->is_large_records_table( Admin::get_blog_record_table_size() ) ?
$this->get_async_deletion_warning() :
__( 'Warning: This will delete all activity records from the database for this site.', 'stream' );
break;
default:
$warning = __( 'Warning: This will delete all activity records from the database.', 'stream' );
break;
}

return $warning;
}

/**
* Retrieves the warning message for asynchronous deletion.
*
* This method checks if there is an action scheduler event running already deleting things
* and returns an appropriate message.
*
* @return string The warning message.
*/
private function get_async_deletion_warning() {
// Check if there is an action scheduler event running already deleting things.
$has_scheduled_action = Admin::is_running_async_deletion();

return $has_scheduled_action ?
__( 'Currently deleting records. Please be patient, this can take a while.', 'stream' )
:
__( 'Warning: This will delete all activity records from the database for this site. If any are added while this is running, they will not be deleted.', 'stream' );
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥜 nitpick: A very minor thing yet I'd like to share with you. I find this code snippet relatively hard to read because of this ternaries and indirection between them (one ternary calls a method which has yet another ternary inside). Do you think it'd be clearer if the get_async_deletion_warning() logic was moved over to the switch statement and the regular if .. else statements used instead of ternaries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@delawski I've updated it here to use if ... else statements and checked for the AS action first as I think that's the most robust for future development (in case anything else uses the async deletion, we don't have to worry about the conditional here) plus it's a cheap check. Is that ok?

Also I added more empty lines than usual as I have a bit of trouble seeing if / else statements when there are no empty lines for some reason.

Commit 12ca190

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you!

Also I added more empty lines than usual as I have a bit of trouble seeing if / else statements when there are no empty lines for some reason.

All good! 😄 I wonder if it would make it easier to read for you if there were only if statements (without any elseif) and they simply returned the warning message directly (there's no other logic besides that if .. else in the function). Anyway, let's leave it as is, it's super clean and readable to me now 👍

classes/class-admin.php Outdated Show resolved Hide resolved
composer.json Show resolved Hide resolved
classes/class-settings.php Outdated Show resolved Hide resolved
Comment on lines 21 to 34
/**
* Used to check if it's a single site, not multisite.
*/
const WP_STREAM_SINGLE_SITE = 'single';

/**
* Used to check if it's a multisite with the plugin network enabled.
*/
const WP_STREAM_MULTI_NETWORK = 'multisite-network';

/**
* Used to check if it's a multisite with the plugin not network enabled.
*/
const WP_STREAM_MULTI_NOT_NETWORK = 'multisite-not-network';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @delawski, but also have one comment about the constants naming:

I'm not sure what's the naming convention in this plugin, but addition of the WP_STREAM_ prefix seems to not be necessary here (these constants are scoped in Admin class).

classes/class-admin.php Outdated Show resolved Hide resolved
FROM {$wpdb->stream} AS `stream`
LEFT JOIN {$wpdb->streammeta} AS `meta`
ON `meta`.`record_id` = `stream`.`ID`
WHERE 1=1 AND `blog_id`=%d;",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tharsheblows The WHERE 1=1 is typically added when a SQL query is dynamically generated. While it will be ignored by the database, there's no need to have it in code in this case, as all the "where" clauses are static:

Suggested change
WHERE 1=1 AND `blog_id`=%d;",
`blog_id`=%d;",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit e931533

Thank you! :)

Comment on lines +116 to +117
// Load Action Scheduler.
require_once $this->locations['dir'] . '/vendor/woocommerce/action-scheduler/action-scheduler.php';
Copy link
Member

@bartoszgadomski bartoszgadomski Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tharsheblows There's a risk of conflict if this package was loaded by another plugin. Should this be preceded with function_exists call (or something similar) to only load this dependency if it has not been loaded by another plugin earlier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartoszgadomski Action Scheduler works when included multiple times! It's a clever library – it loads the most recent version requested so all of the functions expected are available. Eg latest version does it like this

* @return bool Whether or not this should be considered large.
*/
public function is_large_records_table( int $record_number ): bool {
return apply_filters( 'wp_stream_is_large_records_table', $record_number > 1000000, $record_number );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tharsheblows Suggestion: adding a docblock comment to all custom hooks would be beneficial for developers to understand how this plugin can be adjusted :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartoszgadomski ACK yes, I totally forgot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 6111295

Copy link
Contributor

@delawski delawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the feedback. The PR looks fabulous!

@delawski delawski added this to the 4.1.0 milestone Aug 7, 2024
@tharsheblows tharsheblows merged commit 0318c8a into develop Aug 7, 2024
2 checks passed
@tharsheblows tharsheblows deleted the fix/1503-reset-large-tables branch August 7, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large tables not able to be reset
3 participants