Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only update posts that are marked as modified #167

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dirkhh
Copy link

@dirkhh dirkhh commented Feb 28, 2017

Github tells us which files were modified. If we find that information in the
webhook callback, let's not blindly update all posts but instead only update
those posts that we were told were modified.

This adds a new function to the payload class to get the actual head commit
included in the payload.

json_decode(json_encode(), true) is used to convert the object to an array -
I'm not sure if there is a different way that you would like to use for this
(I'm not really a PHP developer - but this works).

Signed-off-by: Dirk Hohndel [email protected]

Github tells us which files were modified. If we find that information in the
webhook callback, let's not blindly update all posts but instead only update
those posts that we were told were modified.

This adds a new function to the payload class to get the actual head commit
included in the payload.

json_decode(json_encode(), true) is used to convert the object to an array -
I'm not sure if there is a different way that you would like to use for this
(I'm not really a PHP developer - but this works).

Signed-off-by: Dirk Hohndel <[email protected]>
@mAAdhaTTah
Copy link
Owner

I haven't forgotten about this. Will take a look this weekend. Do you have this code in place and running on your site? Wondering how it works in the real world.

@dirkhh
Copy link
Author

dirkhh commented Mar 3, 2017

This is NOT ready to pull.
I run this (actually, a marginally newer version) on my server.
It works perfectly and is extremely fast if you only ever merge or push ONE commit at a time. But it actually gets things wrong if you send more than one commit. I have played with a fix for that but that needs more testing (and crashes in some circumstances with my current code). But I'm convinced that this is the right thing to do and that we have all the pieces to know exactly which files changed and to not waste time on the ones that don't. I should have an update in a day or three.

@mAAdhaTTah
Copy link
Owner

Sounds good. I await your updates.

Instead of just looking at the head commit we need to loop over all of
the commits that are in the change set that was pushed. This way we will
track all of the modified files.

Signed-off-by: Dirk Hohndel <[email protected]>
@dirkhh
Copy link
Author

dirkhh commented Mar 5, 2017

I think this is worth actually taking a look at.
Grab the commits array (you already had an access for this), loop over all of the commits in there, convert them to arrays and grab the modified files from them.
I did a medium amount of testing and this seems to work. Unsure if there are situations with adding or deleting files where this would get it wrong.

Copy link
Owner

@mAAdhaTTah mAAdhaTTah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not that familiar with PHP & PHPUnit, don't worry about the tests; I'm not sure all the failures are related to your code anyway. But if you want to take a shot at updating those, that would be awesome.

@@ -76,7 +77,7 @@ public function payload( WordPress_GitHub_Sync_Payload $payload ) {
* @return string|WP_Error
*/
public function master() {
return $this->commit( $this->app->api()->fetch()->master() );
return $this->commit( $this->app->api()->fetch()->master(), null );
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing in null here, we should not pass in anything and update the default value for this method.

@@ -86,7 +87,20 @@ public function master() {
*
* @return string|WP_Error
*/
protected function commit( $commit ) {
protected function commit( $commit , $commits_array) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...in which case $commits_array = array() should go here.

$mod_files = array();
$update_all = true;
foreach ( $commits_array as $commit_obj ) {
if ( $commit_obj != null ) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we could remove this null check, unless there's a possibility of getting null from the API.

@@ -103,10 +117,20 @@ protected function commit( $commit ) {
continue;
}

$posts[] = $post = $this->blob_to_post( $blob );
$post = $this->blob_to_post( $blob );
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we're still iterating over the tree (see $commit->tree()->blobs() above), is the savings here primarily from making fewer DB calls?

$update_all = true;
foreach ( $commits_array as $commit_obj ) {
if ( $commit_obj != null ) {
$next_commit = json_decode(json_encode($commit_obj), true);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can cast the object to an array with $next_commit = (array) $commit_obj; instead of the whole JSON thing.

@litefeel
Copy link
Contributor

litefeel commented Mar 7, 2017

My english is so pool.

how to get commits
commits is not complate in webhook, it's just only 20
https://developer.github.com/v3/activity/events/types/#pushevent

if commits.length >= 20 then
     // https://developer.github.com/v3/repos/commits/#compare-two-commits
     // get commits with compare commits api, this api can list max is 250, i think is enough.
     GET /repos/:owner/:repo/compare/:commit...master
end

merge commit to find file of update and new.

cache something

Got a packet bigger than 'max_allowed_packet'

max_allowed_packet is 1M by default. so dont overtake it.

Make the cache smaller.
Commit only cache github response, no other.
Tree only cache sha and path, dont cache commit in tree.

// in tree
function getPostBySha($sha) {
    return $cache->getSha($sha); 
}

function getPostByPath($path) {
    $sha = $this->getSha($path);
    $return $this->getPostBySha($sha);
}

@dirkhh
Copy link
Author

dirkhh commented Mar 7, 2017

Based on @lite3 's comments I am in way over my head.
I think I might be able to implement the changes that @mAAdhaTTah requested (actually, sure, I could have done that), but I definitely don't feel comfortable adding calls to a different API.
Let's just forget about this - I have it in my version and it does what I need - no one's going to submit more than 20 commits as a pull request to my website.

@dirkhh dirkhh closed this Mar 7, 2017
@mAAdhaTTah mAAdhaTTah reopened this Mar 7, 2017
@mAAdhaTTah
Copy link
Owner

@dirkhh If you're willing to make the changes suggested, we could resolve the "20 commits issue" by just falling back to the old behavior (get the entire tree), rather than spiking the work you've done already.

@dirkhh
Copy link
Author

dirkhh commented Mar 7, 2017 via email

@litefeel
Copy link
Contributor

litefeel commented Mar 8, 2017

Now I am doing a new version, do not cache, do not tree. do not too many commits.

@mAAdhaTTah
Copy link
Owner

@dirkhh @litefeel are either of you guys working on this?

@dirkhh
Copy link
Author

dirkhh commented Jul 2, 2017

I have been completely side tracked. I have a hacked version that does what I need, but I'd need to start from scratch to do this right. So if anyone else wants to look, please don't wait for me.

@litefeel
Copy link
Contributor

litefeel commented Jul 3, 2017

@mAAdhaTTah I have been complete it. https://github.com/litefeel/writing-on-github/
don't convert markdown to html.

@litefeel
Copy link
Contributor

litefeel commented Jul 3, 2017

@mAAdhaTTah Replace the tree api with the content api

@mAAdhaTTah mAAdhaTTah mentioned this pull request Feb 1, 2018
3 tasks
@stebrech stebrech mentioned this pull request Feb 15, 2018
@IeuanWalker
Copy link

@dirkhh @mAAdhaTTah
I want to implement a similar thing in C#. And I can't really read PHP that well.
Could you list the Github API that are in use, and how they are being used

I'm only interested in importing post from Github (not exporting)

@mAAdhaTTah
Copy link
Owner

@IeuanWalker We used the Git Data API and build up the new tree based on the current tree, then add a new commit pointing to the new tree. This helps us avoid adding multiple commits for things like the initial site export and batch deletion (which is why we don't use the Content API @litefeel).

@IeuanWalker
Copy link

@mAAdhaTTah Thanks but that's mostly for exporting isn't it?, do you something different to import the post from GitHub?

All I want to be able to do is add/ update a new post to a repository (not from the web application) and for the web application to pick up the change and add/ update the database. Repository won't need to be updated from the web application.

@mAAdhaTTah
Copy link
Owner

No, we use it to read the tree as well. You can also use the Content API, which we used initially.

@IeuanWalker
Copy link

@mAAdhaTTah Oh ok. So are you using the push webhook to know when something has been added then the git data api to update/ add the new posts?

@mAAdhaTTah
Copy link
Owner

Yeah, basically.

@IeuanWalker
Copy link

@mAAdhaTTah cool makes sense thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants