Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test AMP compatibility of entire site #1183

Merged
merged 91 commits into from
Aug 29, 2018
Merged

Test AMP compatibility of entire site #1183

merged 91 commits into from
Aug 29, 2018

Conversation

kienstra
Copy link
Contributor

@kienstra kienstra commented May 30, 2018

Pull Request For #1007

This crawls the site, and test for AMP validity by template/content type.

  • Validate the homepage, as the existing query won't include it if it's not a static page.
  • Should this also validate the URLs of categories and tags?
  • Consider whether reusing AMP_Validation_Manager::validate_queued_posts_on_frontend() is the best way way to validate these URLs. It will probably need to be changed to batch the requests to the front-end, and I'm not sure if that will interfere with the existing functionality.
  • Consider whether this really needs its own class. At the moment, it's probably not big enough to need it.

Fixes #1007.

Define get_post_permalinks().
This gets permalinks for public posts,
other than attachments.
Another helper function can call
AMP_Validation_Manager::validate_url()
for each of these permalinks.
@kienstra kienstra changed the title [WIP] Ability to test AMP compatibility of entire site [WIP] Test AMP compatibility of entire site May 30, 2018
kienstra added 3 commits May 29, 2018 23:49
validate_queued_posts_on_frontend() uses IDs,
which are stored in
$posts_pending_frontend_validation.
So simply return IDs from the helper function.
Before, $post_ids wasn't assigned as an array().
This addresses a failed Travis build.
Querying for ID was simply to make testing easier.
So remove that, and fall back to the default
'orderby' => 'date'.
foreach ( $site_post_ids as $id ) {
AMP_Validation_Manager::$posts_pending_frontend_validation[ $id ] = true;
}
return AMP_Validation_Manager::validate_queued_posts_on_frontend();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggested, this validate_queued_posts_on_frontend method is probably not relevant. The main thing will be to:

  1. Obtain the list of URLs on the site, including homepage, posts, pages, categories, archives, etc. A good sitemap XML generator plugin should have the logic you need for this.
  2. If in paired mode, append ?amp to each of the URLs discovered. This will be used instead of amp_get_permalink() because the endpoint will be eliminated when AMP theme support is present. See Discontinue use of amp endpoint in favor of query var when amp theme support is present #1148. Some consideration in paired mode may be needed for whether AMP is available or not for a given post.
  3. For each AMP URL, call \AMP_Validation_Manager::validate_url() and pass the results into \AMP_Invalid_URL_Post_Type::store_validation_errors().
  4. Chunk the results to prevent timing out. This is particularly key in WP-Cron and Ajax. In WP-CLI this isn't relevant, and a progress meter should be shown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your fast reply, @westonruter! These details really help.

kienstra added 2 commits May 30, 2018 13:23
This class is not going to use this method anymore:
validate_queued_posts_on_frontend().
So this will need the permalinks, not IDs.
For example, http://example.org/?cat=2
This includes categories and tags,
and any more that are registered.
But it excludes post_format links.
* @param int $number_links The maximum amount of links to get (optional).
* @return string[] $links The term links in an array.
*/
public static function get_taxonomy_links( $number_links = 200 ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be pagination support here. The WP_Term_Query supports an $offset arg. I suggest that this method only return links for one specific taxonomy. The site crawler will probably need to make multiple calls to such a method to iterate over all of the taxonomy terms in batches. One way to do it would be to get the terms sorted by ID in ascending order. That ensures that when you paginate through the terms you won't miss terms that get added while you're processing through them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good ideas. These commits apply your suggestions, as I understand them. This now has $taxonomy and $offset parameters.

'public' => true,
) );

// It doesn't seem necessary to get links for post format terms, like asides, galleries, or quotes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it is necessary. There are archive links for post formats: https://codex.wordpress.org/Function_Reference/get_post_format_link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, commit db54a removes the exemption of post_format terms.

@westonruter
Copy link
Member

I think the “Check Site” button would make sense on the “Invalid AMP Pages” screen. Clicking it should show a spinner and some progress indicator probably.

And as noted on #1184 (comment):

I've been considering that we should add a new AMP admin screen to manage settings regarding theme support, including:

  • Forcibly enable amp theme support for the current theme, even if it lacks add_theme_support('amp'). This will be particularly useful as the plugin will support all the core themes in AMP by default, and it would allow
  • Decide whether to have paired mode.
  • Preemptively opt-in to tree shaking, since it is going to be needed in the vast majority of cases anyway. Options here could be “Ask me” (block AMP without accepting), “As needed” (sometimes), and “Always”.

Having a “Check Site Compatibility” button would make a lot of sense on such a Theme Support admin screen. As part of this, we should allow for the amp_validate query parameter to force the plugin to behave as if amp theme support is present. In that way, a site could be checked for AMP compatibility before even enabling amp theme support via an opt-in toggle on that admin screen.

kienstra added 13 commits May 30, 2018 17:25
Before, I had unset() these from the array.
But as Weston pointed out,
post_format terms can have archive links.
Add a $taxonomy parameter,
to only get the terms for a given taxonomy.
As Weston mentioned,
this helps to get the links in batches.
Also, simplify the test.
Props 10up Engineering Best Practices.
The prevents the need to use
wp_list_pluck() to get the ids.
A similar change to the one Weston proposed
for get_taxonomy_links().
Get permalinks by post types,
and allow paging through them.
This will allow iteration, in a way similar to
that in get_taxonomy_links().
Iterate over all of the terms and posts
from the existing helper methods.
@todo: consider self::BATCH_SIZE,
and whether the validation requests
should be throttled,
or somehow prevented from timing out WP.
In PHP 5.3, there was:
Fatal error: Cannot access self:: when no class scope is active.
https://travis-ci.org/Automattic/amp-wp/jobs/386045157.

This looks to be because self isn't
available inside the function in 5.3.
Another option might be to add $self to use().
But this seems simpler, if it works.
This will enable crawling the site in Native AMP.
This seems simpler than registering Paired Mode
support, and adding template_dir.
But it's still an open discussion as to
whether it should only be Native AMP.
If this already finds theme support,
it doesn't change it.
In the validate_url() helper function,
find if it's in Paired Mode.
If so, set the $url to the amp endpoint.
This mainly calls the existing helper function.
But it also passes it $wp_cli_progress.
This advances the progress bar,
after each post type or taxonomy validation completes.
Success: 196 URLs were crawled, and 196 have AMP validation issue(s).
Query for the validation error posts.
And output the number found.
This drives to the page:
Invalid AMP Pages (URLs)

Also, output a message at the beginning:
Crawling the entire site to test for AMP validity.
This might take a while...
@kienstra
Copy link
Contributor Author

kienstra commented Jun 1, 2018

[WIP] WP-CLI Output
When you run wp eval-file bin/validate-site.php

wp-cli-output

And when it finishes:
p-cli-after

@westonruter
Copy link
Member

That UI for WP-CLI looks perfect 👌

} else {
echo "Please run this script with WP-CLI via: wp eval-file bin/validate-site.php\n";
exit( 1 );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding this as a bin script it should get registered as an actual command with WP-CLI. For example, wp amp validate-site. The command needs to be included with the plugin when it is released, and the bin directory is just for development.

}

if ( $wp_cli_progress ) {
$wp_cli_progress->tick();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could get de-coupled from WP-CLI by passing a $progress_callback function instead which is called, and then you could call the method like so:

\AMP_Site_Validation::validate_entire_site_urls( function() use ( $wp_cli_progress ) {
    $wp_cli_progress->tick()
} ) 

*/
function crawl_site() {
\WP_CLI::log( 'Crawling the entire site to test for AMP validity. This might take a while...' );
$count_post_types_and_taxonomies = count( get_post_types( array( 'public' => true ), 'names' ) ) + count( get_taxonomies( array( 'public' => true ) ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should count the number of posts and terms, not just the number of types.

$offset = 0;

while ( ! empty( $permalinks ) ) {
self::validate_urls( $permalinks );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, the tick should be done with each URL that is processed, not after each type is processed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Commit 3d9a01 calls tick() for every URL validated.

@kienstra
Copy link
Contributor Author

kienstra commented Jun 7, 2018

Hi @westonruter,
Thanks for reviewing this. You make good points, like registering an actual WP-CLI command.

If it's alright, I'm going to leave this work for the next phase.

kienstra added 3 commits July 11, 2018 23:49
3 assertions failed,
so fix these both in the tested class
and the PHPUnit class.
On Weston's suggestion.
Before, running the validation required:
wp eval-file bin/validate-site.php.
This has a much better display of the activity.
On Weston's suggestion.
Create a new method to count all of the URLs
to be validated.
This should only be used for WP-CLI,
as it uses 'posts_per_page' => -1
@westonruter
Copy link
Member

Also, this doesn't register the command at all in 'Classic' mode, as the class is bootstrapped in AMP_Validation_Manager. Should that change?

@kienstra Also, this is important for AC2 to be satisfied:

AC2: A validation test should be able to evaluate an entire site even if AMP theme support is not declared yet. (i.e. Be able to force the equivalent of AMP theme support when crawling.)

If I'm in Classic mode, currently the result is:

$ wp amp validate-site --limit=2 --force
Crawling the site for AMP validity.
Validating 13 URLs...  100% [==========================================================================] 0:00 / 0:00
Success: 0 crawled URLs have unaccepted issue(s) out of 0 total which AMP validation issue(s); 0 URLs were crawled.

For more details, please see:
https://src.wordpress-develop.test/wp-admin/edit.php?post_type=amp_invalid_url

+--------------------------+-----------+---------------+
| Template or content type | URL Count | Validity Rate |
+--------------------------+-----------+---------------+
+--------------------------+-----------+---------------+

So that needs to be improved so that site validity can be checked for AMP theme support prior to switching it on. To achieve that, I think the following needs to be done:

  • When invoking the command, check if the theme support is classic, and if so, return an error if --force is not supplied and explain why.
  • When --force is enabled, the force query param should cause not only AMP to be available for a URL but also for AMP theme support to be enabled.
  • The command will need to just-in-time call \AMP_Validation_Error_Taxonomy::register() and \AMP_Invalid_URL_Post_Type::register()

@kienstra
Copy link
Contributor Author

Hi @westonruter,
That's a good point, and thanks for the steps to validate in 'Classic' mode. I'll work on that today if that's alright.

On Weston's suggestion, as this would not be
able to validate any URL without the --force flag.
As Weston suggested,
if in Classic mode and --force is passed,
force theme support in the crawled URLs.
This is now done in read_theme_support(),
thought there might be a better way.
@kienstra
Copy link
Contributor Author

Hi @westonruter, the 2 commits above apply your suggestions for forcing theme support when in Classic mode. I hope they capture what you had in mind.

Here's the error that appears when in 'Classic' mode and --force isn't present:

wp-cli-error

self::$force_crawl_urls = true;
}

if ( ! current_theme_supports( 'amp' ) ) {
Copy link
Contributor Author

@kienstra kienstra Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional will capture the Template Mode being Classic, though maybe you had in mind a more narrow check, like:

! current_theme_supports( 'amp' ) && 'disabled' === AMP_Options_Manager::get_option( 'theme_support' ) 

* Only show admin URL to review results if theme support initially present.
* Skip showing table if no results obtained.
… and taxonomy

* Hide the admin menu links for invalid URLs and validation errors by default in classic mode.
* Invoking `wp amp validate-site` will cause the post type and taxonomy to be populated.
* When in classic mode, show links to invalid pages and validation errors with template mode as opposed to in admin menu.
* When viewing the admin screens for the post type and taxonomy, show the admin menu links for both.
@westonruter
Copy link
Member

The last piece here seemed to me to be providing a way for the user to access the validation results for their wp amp validate-site call even while AMP theme support is turned off. This is implemented in edcb23a, where the post type and taxonomy are unconditionally registered but they are only shown in the admin menu if theme support is present. Otherwise, links to the screens are shown if there are invalid URL posts that exist:

image

if ( current_theme_supports( 'amp' ) ) {
return true;
}
if ( 'edit.php' === $pagenow && ( isset( $_GET['post_type'] ) && self::POST_TYPE_SLUG === $_GET['post_type'] ) ) { // WPCS: CSRF OK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return the result of the conditional instead of explicitedly retuning true or false?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Done in a83a569.

if ( current_theme_supports( 'amp' ) ) {
return true;
}
if ( 'edit-tags.php' === $pagenow && ( isset( $_GET['taxonomy'] ) && self::TAXONOMY_SLUG === $_GET['taxonomy'] ) ) { // WPCS: CSRF OK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here....return the result of the conditional expression instead of explicit trueor false. It’s less code and more clear.

WP_CLI::success(
sprintf(
/* translators: $1%d is the number of URls crawled, $2%d is the number of validation issues, $3%d is the number of unaccepted issues, $4%s is the list of validation by type, $5%s is the link for more details */
__( '%3$d crawled URLs have unaccepted issue(s) out of %2$d total which AMP validation issue(s); %1$d URLs were crawled.', 'amp' ),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

total which AMP validation issue(s)

should be:

total with AMP validation issue(s)

@kienstra
Copy link
Contributor Author

Commands Work As Expected

Hi @westonruter,
The WP-CLI commands here work as expected. Including:

wp amp validate-site

wp amp validate-site --include=is_author

wp amp validate-site --force (when in Classic mode)

wp amp validate-site --limit=2

limit-2

Also, great idea to add links for the invalid URLs and errors in Classic mode:

classic-validation

@westonruter westonruter merged commit c539a18 into develop Aug 29, 2018
@westonruter westonruter deleted the valid-site branch August 29, 2018 01:41
@westonruter westonruter added this to the v1.0 milestone Aug 29, 2018
@postphotos
Copy link
Contributor

Hi @westonruter - really like the progress here and thanks for tagging this in #1359.

Also wanted to call out #1273 here - the idea I spelled out there could try to surface errors next to where toggles exist (so that a user is more likely to select modes when they might be afraid to do this). I'm not entirely sure this is necessary or how straightforward implementing it might be, but it's captured in the backlog regardless.

Both that feature (and #1273) are trying to take an approach to accomplish the inverse goals of #1366: Instead of surfacing settings info in the error listings, surface the error listings in the settings.

The key decisions I still think are up for discussion (past v1.0-stable) will reside around how much information a user should see and in what context that's necessary, but based on the work presented in #1006 I'm happy with the current efforts. Thanks for your help here!

cc: @jwold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants