-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rsync option to package command and improve performance #303
base: master
Are you sure you want to change the base?
Changes from all commits
3bf8f4e
ecdde95
2b38622
9115c50
b975570
30d5997
1139d5c
33519bb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -51,6 +51,35 @@ directory. Defaults to the docroot. | |
packages directory. This can be overridden by calling grunt package with the | ||
`--name` parameter. | ||
|
||
**archive**: Used to enable the `compress` option via the configuration instead | ||
of the command line. Defaults to false. | ||
|
||
**rsync**: If set to true, rsync will be used instead of file copying for | ||
improved performance. This automatically enables the `compress` option and will | ||
write the archive to the package destination. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why enable compress by default? The main use case I've had for this requires not doing it, so curious what the thought is. Don't recall if that can be turned off, we should note that in docs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's just a default. You can still set "archive" to false if you don't want it. Just trying to support the best defaults for new projects. I want "rsync=true" to represent the default behavior for "high performance" that enables archive, vm_data, etc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suppose a project leaves it creating an archive, and turns on rsync. What is the use case we're aiming for? Operational code backups? I'm open to calling this a best default, I'm just trying to understand what we're talking about empowering. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By creating a single archive and using rsync you get the highest performance. There is only a single file (the archive) the needs to be moved out of the VM disk and into the NFS mounted system which is hugely faster than moving the non-archived directory. What they do with the archive is project specific. I expect we'll eventually add a "deploy" command that does all of the git commit/push work from inside grunt, but for now the archived package can be moved to a hosting site, or uncompressed locally and committed to a production repo. Basically, we are just trying to minimize the total data flow to NFS along with the individual number of files (since each file has NFS overhead, so lots of small files are really bad performance). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha, writing the tarball being less I/O, offset by the time to create the tarball. Have you noted the time to create the tarball is less than syncing the files individually? I expect so. |
||
|
||
**vmData**: This path is used as the intermediate generation directory for the | ||
package when using the `rsync` option. By selecting a volume mounted in your | ||
VM `/data` area, this can significantly improve performance. The archive is | ||
still moved to the normal package destination when complete. Defaults to | ||
`build/vm_data`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. GDT is not targeted at virtual/containerized environments. This piece needs a bit more contextual documentation and to remove Seems strange to place it in build/vm_data, why not build/package/tmp or even get a formal /tmp directory generated via os.tmpdir()? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the super important part of this PR. Without the virtual intermediate directory there was no way to get anything performant and people are bypassing gdt on their projects because of this. I know GDT is not "targetting" at containers, which is why this option isn't enabled by default. The idea here is that in your docker container you mount a volume that gdt can use for various purposes to improve performance. Only tackling the packaging right now, but I can imagine other commands using this also. So don't want it to be within "build/package". The /data is just an example that pertains to our own container, but that isn't anywhere in the GDT except the docs as an example. Since our docs already reference many P2 practices, I don't see this as a problem right now (see the CI docs for example). Still, specific updates to this doc file are welcome if you want to wordsmith it a bit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I want to make sure all config options and documentation are not overly bound to containers or virtual environments specifically, even if the changes we have facilitate that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To make this less specific to Docker and the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, I think calling it a tempDir option would be fine. I'll make that change to the code and docs. Thanks for the suggestion. |
||
|
||
## Performance | ||
|
||
When running `grunt package` within a docker container, the default file | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we focus more on NFS mounts and less on Docker? I'm fine with Docker in the docs, but it should be almost more like a sidebar detail. It's great that we're looking into accelerating this for use with NFS mounts, it would be great if we could identify similar tricks for Drush Make and Composer (build in an intermediate directory, transfer to the volume mounted space where speedup can better affect the complete codebase. The major issue that originally drove me to look into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I covered this above, but the whole point of this PR was performance. Switching from copy to rsync gives a small performance improvement. The majority of the improvement comes from using the vm_data. Yes, it's really about NFS and not docker, so the docs can be wordsmithed a bit. And yes, this vm_data is definitely something we want to incorporate into other areas of gdt. The package command was just the big target because it was taking 40 minutes on a project. |
||
operations will be using NFS to update the files on your local disk, which will | ||
be very slow. Using the `rsync` option along with creating the `build/vm_data` | ||
intermediate mount point in your docker container can improve performance by | ||
a factor of 20 or more. However, the downside is only the compressed archive of | ||
your package will be written to your local disk. The full package folder will | ||
only be accessible within your docker container. | ||
|
||
A sample docker-composer.yml entry to mount the vm_data looks like this: | ||
``` | ||
volumes: | ||
- /data/PROJECTNAME/package:/var/www/build/vm_data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've picked up this branch and am trying to pull it along. As part of this I'm reviewing and revising the docs to clarify my game plan. I'm confused why we would volume mount the intermediary directory. Wouldn't that slow down the rsync operation compared to leaving the tempDir somewhere in the container but not in a volume mount where NFS slowdowns can catch it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The /data volume is just in the VM, not on the local Mac NFS system. This simply makes the stuff in /var/www/build/vm_data persistent, similar to what we do with the MySQL database. Volumes are only slow if they map the local Mac folders |
||
``` | ||
|
||
## Packaging for Acquia | ||
|
||
The `package` command has the flexibility to support many different use cases, | ||
|
@@ -65,6 +94,7 @@ Acquia repository with support for custom hooks and scripts. | |
"packages": { | ||
"srcFiles": ["!sites/*/files/**", "!xmlrpc.php", "!modules/php/*"], | ||
"projFiles": ["README*", "bin/**", "hooks/**"], | ||
"rsync": true, | ||
"dest": { | ||
"docroot": "docroot" | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,27 +61,99 @@ module.exports = function(grunt) { | |
|
||
grunt.registerTask('package', 'Package the operational codebase for deployment. Use package:compress to create an archive.', function() { | ||
grunt.loadNpmTasks('grunt-contrib-copy'); | ||
grunt.loadNpmTasks('grunt-shell'); | ||
|
||
var config = grunt.config.get('config.packages'); | ||
var srcFiles = ['**', '!**/.gitkeep'].concat((config && config.srcFiles && config.srcFiles.length) ? config.srcFiles : '**'); | ||
var srcFiles = (config && config.srcFiles && config.srcFiles.length) ? config.srcFiles : []; | ||
var projFiles = (config && config.projFiles && config.projFiles.length) ? config.projFiles : []; | ||
|
||
// Look for a package target spec, build destination path. | ||
var packageName = grunt.option('name') || config.name || 'package'; | ||
var destPath = grunt.config.get('config.buildPaths.packages') + '/' + packageName; | ||
|
||
// Determine if we are using rsync and vmPath and tarball for performance. | ||
var useRsync = grunt.config('config.packages.rsync') || false; | ||
// Determine if we are creating a tarball archive. | ||
var archive = grunt.config('config.packages.archive') || useRsync; | ||
|
||
// When using rsync, generate to a path that can be mounted in the VM. | ||
var vmPath = grunt.config.get('config.packages.vmData') || 'build/vm_data'; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per earlier comments, I'd like a more contextual, less tech-specific name for this -- intermediary? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Specific suggestions are welcome. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per other thread, we're calling it tempDir. Just closing the loop here :) Certainly shorter than |
||
var destPath = vmPath + '/' + packageName; | ||
var finalPath = grunt.config.get('config.buildPaths.packages') + '/' + packageName; | ||
if (!useRsync) { | ||
// If not using rsync, then directly generate to final path. | ||
destPath = finalPath; | ||
} | ||
|
||
var tasks = []; | ||
var cleanPaths = [destPath]; | ||
if (destPath !== finalPath) { | ||
cleanPaths.push(finalPath); | ||
} | ||
grunt.option('package-dest', destPath); | ||
grunt.config.set('clean.packages', cleanPaths); | ||
tasks.push('clean:packages'); | ||
|
||
var excludePaths = ['bower_components', 'node_modules', '.gitkeep']; | ||
|
||
if (useRsync) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As previously explored with https://github.com/phase2/grunt-drupal-tasks/blob/master/tasks/make.js#L71, we need a fallback for Windows which so far has been simply "Got Windows? Okay, we'll still use copy!" Because the configuration for copy and rsync are not fully compatible, I've been thinking of rsync not as a mode, but rsync to copy fallback as a breaking change so we can revise configuration requirements. Open to discussion on this point. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code I added makes them compatible. You still use srcFiles, projFiles, and the new excludePaths. The code builds the proper arguments for either copy or rsync. So yes, Windows cannot use the "rsync=true" option. If there is a simple way to detect if rsync is installed I'd be happy to use that to set the default and have it fallback. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have a canRsync utility function used by the code I linked to. I'd like that to be checked as part of the condition whether we are using Rsync. We can make that function smarter to detect if rsync is installed in a follow-up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice, I didn't see the canRsync. I'll add that. |
||
// Pull negate conditions from srcFiles into excludePaths. | ||
for (var srcIndex = 0; srcIndex < srcFiles.length; srcIndex++) { | ||
var item = srcFiles[srcIndex]; | ||
if (item.substr(0, 1) === '!') { | ||
item = item.slice(1); | ||
item = item.replace('**', '*'); | ||
excludePaths.push(item); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is really smart, however, the filesystem matching by the Node minimatch library has more breadth of syntax than is typically supported by Bash/Rsync/etc beyond just negation, such as fancy globbing and simple expressions. That's the incompatibility I alluded to earlier. I think we need to confirm that failure is reasonably graceful as a minimum. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but I think we can address that in the documentation. IMHO none of our config options should be tied to a specific implementation (node, rsync, etc). So we should specify how wildcards are handled and then convert to the needed format for either Copy or Rsync. Right now the only issue is ** vs * and rsync seems to take the ** and still function, so I didn't need to change that. If people want fancy expressions then we should use some form of RegEx that would be less implementation-specific. But that's work for later. For now if somebody wants to use a fancy Node minimatch expression then they just can't use the new rsync option. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
} | ||
} | ||
|
||
// Setup default rsync command and options. | ||
var rsync = 'rsync -ahWL --no-l --stats --chmod=Du+rwx '; | ||
for (var pathIndex = 0; pathIndex < excludePaths.length; pathIndex++) { | ||
rsync = rsync + "--exclude " + excludePaths[pathIndex] + ' '; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like only the final excluded file would be included here. Shouldn't we remove the loop and replace with an array join on whitespace? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is adding "--exclude path" to rsync for each item in the array. So maybe a join with "--exclude " would work? Otherwise this loop does actually work fine. |
||
} | ||
|
||
// Ensure destination exists and sent the rsync. | ||
var srcPath = grunt.config('config.buildPaths.html'); | ||
var destSrc = path.resolve(destPath, grunt.config.get('config.packages.dest.docroot') || ''); | ||
var rsyncCommand = rsync + srcPath + '/ ' + destSrc + '/'; | ||
|
||
grunt.config.set('mkdir.package', { | ||
options: { | ||
create: [destSrc] | ||
} | ||
}); | ||
|
||
grunt.config('shell.rsync', { | ||
command: rsyncCommand | ||
}); | ||
} else { | ||
// Use slower copy if rsync is disabled in options. | ||
srcFiles.unshift('**'); | ||
// Add any additional excludePaths to srcFiles. | ||
for (var i = 0; i < excludePaths.length; i++) { | ||
srcFiles.push('!**/' + excludePaths[i]); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Array merge seems more clean here. Also, something to dedup might be worthwhile to make sure any code that wants to remove exclusions only has to find a single instance of a mask. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally looped through excludePaths to add the '!**/' and then merged the array with srcFiles but this seemed cleaner. So I guess I'd welcome a code alternative to review. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something like:
Union both merges and unique-ifies. Looks like we should check on updating lodash. |
||
} | ||
grunt.config('copy.source', { | ||
files: [ | ||
{ | ||
expand: true, | ||
cwd: '<%= config.buildPaths.html %>', | ||
src: srcFiles, | ||
dest: path.resolve(destPath, grunt.config.get('config.packages.dest.docroot') || ''), | ||
dot: true, | ||
follow: true | ||
} | ||
], | ||
options: { | ||
gruntLogHeader: false, | ||
mode: true | ||
} | ||
}); | ||
} | ||
|
||
// Always copy any files specified in projFiles. | ||
grunt.config('copy.package', { | ||
files: [ | ||
{ | ||
expand: true, | ||
cwd: '<%= config.buildPaths.html %>', | ||
src: srcFiles, | ||
dest: path.resolve(destPath, grunt.config.get('config.packages.dest.docroot') || ''), | ||
dot: true, | ||
follow: true | ||
}, | ||
{ | ||
expand: true, | ||
src: projFiles, | ||
|
@@ -96,9 +168,12 @@ module.exports = function(grunt) { | |
} | ||
}); | ||
|
||
grunt.config.set('clean.packages', [destPath]); | ||
|
||
tasks.push('clean:packages'); | ||
if (useRsync) { | ||
tasks.push('mkdir:package'); | ||
tasks.push('shell:rsync'); | ||
} else { | ||
tasks.push('copy:source'); | ||
} | ||
tasks.push('copy:package'); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. copy:package looks like it's been stripped down to some final copy operations of random project files. Maybe this should be renamed for clarity? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I even thought of "copy:projFiles" to be really specific. If we call it "package-extras" then having the config be "projFiles" might be confusing too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That seems fine. This is a stepping stone to #255, otherwise I'd suggest we drop projFiles and rename the config for clarity. |
||
|
||
// If the `composer.json` file is being packaged, rebuild composer dependencies without dev. | ||
|
@@ -117,25 +192,31 @@ module.exports = function(grunt) { | |
tasks.push('composer:drupal-scaffold'); | ||
} | ||
|
||
if (this.args[0] && this.args[0] === 'compress') { | ||
if (archive || (this.args[0] && this.args[0] === 'compress')) { | ||
grunt.loadNpmTasks('grunt-contrib-compress'); | ||
grunt.config('compress.package', { | ||
options: { | ||
archive: destPath + '.tgz', | ||
mode: 'tgz', | ||
gruntLogHeader: false | ||
mode: 'tgz' | ||
}, | ||
files: [ | ||
{ | ||
expand: true, | ||
dot: true, | ||
cwd: grunt.config.get('config.buildPaths.packages') + '/' + packageName, | ||
cwd: destPath, | ||
src: ['**'] | ||
} | ||
] | ||
}); | ||
|
||
tasks.push('compress:package'); | ||
|
||
if (destPath !== finalPath) { | ||
grunt.config('shell.mvArchive', { | ||
command: 'mv ' + destPath + '.tgz ' + finalPath + '.tgz' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Of note, mv will not work across NFS volume mounts. We will want to document that limitation a bit to help inform how projects are configured. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So that's what I thought too. But it seems to work across docker volumes. Maybe we should change it from a "mv" to a normal grunt copy. Then delete the tarball from the destPath. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm, maybe that restriction is only outside the container. mv is an order of magnitude faster than copy, so if it works I think we should keep it until we have reason to switch to copy. A bigger issue is windows compat. For portability we should just switch to an fs.rename[Sync]. |
||
}); | ||
tasks.push('shell:mvArchive'); | ||
} | ||
} | ||
|
||
grunt.task.run(tasks); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove quiet? Seems like an unrelated change. We might make the test less verbose by setting the environment variable
GDT_QUIET
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the --quiet because it was super hard to debug the failing travis tests. Travis already spits out a ton of stuff and seeing what the grunt tasks actually do and if they get errors is important I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was --quiet suppressing errors too? In our case it's used to suppress desktop notifications which are irrelevant from inside travis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The package command had a bug that was not completing all of its tasks, so without seeing the task output it was difficult to debug. Also hard to see the actual tasks being run through the cloud of all the other output generated in the travis log. I don't consider those desktop notifications irrelevant for travis as they aided me in debugging. And it doesn't hurt to have them, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the
--quiet
option andGDT_QUIET
env var are supposed to only affect desktop notifications (i.e. growl). If the option affects the error reporting from Grunt, we should change the option to avoid conflicts.