Test sharding / parallelization #439

Open
vojtajina opened this issue Mar 29, 2013 · 43 comments

@vojtajina
Contributor

Allow splitting a single test suite into multiple chunks, so that it can be executed in multiple browsers and therefore on multiple machines, or at least using multiple cores.

@shteou

shteou commented Jul 10, 2013

Hey, I saw Issue #412 and it's exactly the sort of functionality I am looking for.

Has there been any movement on this?
If there aren't any plans, do you have any high level thoughts on how you would like this to be implemented?

Cheers.

@vojtajina
Contributor Author

I definitely want to do this; it's hard to say when...

We need a way to group files.
It should be dynamic (not something developers have to do by hand), so that it's possible to scale (easily change the number of file groups - the number of browsers we execute in parallel).
When using a dependency management system (e.g. Closure's goog.require/goog.provide; see karma-closure), this will be much easier (because we can figure out the dependency graph and therefore only load the files that are really needed).
Probably "label" test files and then split them (assuming each test file has the same number of tests; later we can add some sort of cost/labeling).

The web server has to understand these groups.
When serving context.html, the web server has to know which files belong to each group and which browser (group) is requesting.
Probably an additional argument on the "execute" message to the client (browser); the client, when refreshing the iframe, then uses this "groupId" as a query param, e.g. context.html?group=2.
This grouping should probably be done in fileList (or whenever the file list changes; similar to what karma-closure does now; we should make the "file_list_modified" event async).
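
To make the query-param idea concrete, here is a minimal sketch of what a context.html middleware could do; "groupedFiles" (a map of groupId to the files included for that group) and "renderContext" are hypothetical names, not existing Karma APIs:

var url = require('url');

// Hypothetical middleware: pick the included files for the requesting
// browser's group, based on the ?group= query param.
function createContextMiddleware(groupedFiles, renderContext) {
  return function(request, response, next) {
    var parsed = url.parse(request.url, true);
    if (parsed.pathname !== '/context.html') {
      return next();
    }
    // Fall back to group 0 when no group is requested.
    var groupId = parseInt(parsed.query.group, 10) || 0;
    var included = groupedFiles[groupId] || [];
    response.end(renderContext(included));
  };
}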

Currently the resolved files object looks something like:

{
  served: [
    // File objects (metadata, where the file is stored, timestamps, etc.)
  ],
  included: [
    // File objects
  ]
}

After this change, it would be:

{
  served: [
    // File objects (I'm thinking we might make this a map rather than an array)
  ],
  included: {
    0: [
      // files included in the group=0 browser
    ],
    1: [
      // files included in the group=1 browser
    ]
  }
}

@vojtajina
Contributor Author

@shteou Would you be interested in working on this? I would definitely help you...

@vojtajina
Contributor Author

Also, does it make sense?

I mentioned karma-closure a couple of times; that is this plugin: https://github.com/karma-runner/karma-closure
It is very "hacky" (the way it hooks into Karma; if you had multiple plugins like this, it would end up really badly ;-)), but it does interesting things - it basically analyzes the dependencies and can therefore generate the list of included files based on those dependencies. So before this "resolving" we would group the test files and karma-closure would resolve each group separately...

@EtaiG

EtaiG commented Dec 27, 2014

I'd just like to note that we (Wix) tried this out. danyshaanan created a test case for this.
He enabled karma to split up the loaded tests and run them in several child processes.

It didn't work out too well, since many of our tests require loading a large part of our code, and therefore the setup time and loading of all the packages for each child process was too costly. Depending on the number of child processes, it usually ended up running slower, though it did come close to running in the same amount of time.

We tried it out when we were at around 2,000 tests. We now have over 3,500 tests in this project, so it might be worth revisiting this.

If anyone else is working on this or has another angle for this, we are also more than happy to help.

@jbnicolai

@EtaiG I have not started working on this, but as my project now exceeds 3,000 tests, it's becoming something I want to invest some time into as well.

@danyshaanan

A note about dynamic creation of groups (@vojtajina) - we should be aware of how this affects tests that happen to have side effects. Imagine tests B, C, and D, and a naive alphabetical division of tests into two groups - {B,C} and {D}. Let's say C has a devastating side effect and I'm adding test A, hence changing the grouping to {A,B} and {C,D}. Now D will fail, just because the grouping changed.

Of course, tests shouldn't have side effects, but this case is bound to happen, and might be very confusing to users.

@EtaiG

EtaiG commented Jan 11, 2015

I think we can ignore this case and let people who encounter it deal with their own problems.
We can also expose an API that allows the consumer to decide the grouping.

@scriby

scriby commented Mar 30, 2015

+1

We have a large number of tests at work, and sharding would be very beneficial. As you said, there shouldn't be side effects between tests, and for anyone who doesn't want to remove the side effects, I'd say they just don't get to run their tests in parallel :)

As long as the sharding is opt-in I think the confusion should be manageable.

@LFDM

LFDM commented Apr 20, 2015

Hey @EtaiG and @danyshaanan, very interested in this experiment you mentioned. Is this code accessible somewhere? I'd very much like to experiment with this a bit - maybe your work could give me a headstart!

@danyshaanan

@LFDM We have nothing to share at the moment, but I'm just about to rewrite a smarter version of it in the coming couple of weeks. I'll try to do so in a way that I'll be able to share. Feel free to ping me about this in a week or so if I haven't posted anything by then.

@AlanFoster

👍 Sounds like a great idea to me; would love to see any progress updates on this @danyshaanan!

@park9140

👍 this would be great.

@ghost

ghost commented Sep 8, 2015

@danyshaanan! any news?

@danyshaanan

@aaichlmayr:
Yeah, bad ones - it didn't seem to work out that well. This feature's spec is non-trivial, so the implementation was not as clean as I would hope, and the benefits were not convincing enough to go ahead with it, so we scrapped the plan.

@ghost

ghost commented Sep 9, 2015

Thanks for the info

@booleanbetrayal

As long as the sharding is opt-in I think the confusion should be manageable.

Totally agree. Think it's fine to launch as an experimental feature with this requirement. Would love to see this land and would be happy to help bug-hunt, etc.

@FezVrasta

Hi, has the feature been shipped already?

@navels

navels commented Apr 11, 2016

👍 anyone making headway on this?

@presidenten

So is this a feature yet?

@presidenten

I don't get it? Did I do something wrong? What did I miss? Why the thumbs down?

@FezVrasta

If you reply to an issue, all the subscribed people get an email and a notification. If you just want to add a +1 to the issue, do so by adding a thumbs-up reaction to the first post (or to the one with the most upvotes); this way you don't flood the whole list of subscribers.

@Florian-R

@dignifiedquire Could you lock this one like #1320 with a help:wanted label? Thanks!

@pauldraper

pauldraper commented Dec 13, 2016

When using a dependency management system this will be much easier and simplified

True, but it's a hack to get these systems to work in Karma in the first place, right? I'm tempted to put that consideration aside for now.

I agree with the suggestions made by @vojtajina #439 (comment) (even if they are three and a half years old :)

I'm thinking:

module.exports = function(config) {
  config.set({
    files: [
      'lib/lib1.js',
      'lib/lib2.js',
      {pattern: 'other/**/*.js', included: false},
      {pattern: 'app/**/*.js', grouped: true},
      {pattern: 'more-app/**/*.js', grouped: true},
    ],
  });
};

And then the resolved object would be

{
  served: [
    'other/file1.js',
    'other/files/file2.js'
  ],
  included: {
    common: [
      'lib/lib1.js',
      'lib/lib2.js'
    ],
    groups: {
      0: [
        'app/file1.js',
        'app/file2.js'
      ],
      1: [
        'app/file3.js',
        'more-app/file.js'
      ]
    }
  }
}

We could reuse concurrency, though a default of Infinity is bad -- most commonly we want to run as many tests as we have cores.

We'd probably want a groups config. I could divide my code into 10 groups, and run with concurrency 3 until they are all done. As @EtaiG pointed out, there is a balance between fine-grained scheduling for better utilization, and overhead of loading common files.
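
For illustration, a config shape along those lines (a sketch only; neither "groups" nor "grouped" exists in Karma today):

module.exports = function(config) {
  config.set({
    // Hypothetical option: split the grouped files into 10 shards...
    groups: 10,
    // ...and run at most 3 browsers at a time until all shards are done.
    concurrency: 3,
  });
};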

@habermeier

I'd hate to have to group tests by hand. What if the system used the regular configuration as a starting point, and built up and refined an optimal parallel test plan over time? Along the way it might be able to discover any dependency chains (which shouldn't be there, but might be). It could flag those as "todo" items for developers, but could work around them should it discover them. Whatever it does, it'd be good for it to handle changes in the test code gracefully, so it would not have to recompute the whole plan when a single test is added (or removed).

I'm sure the computational complexity would be enormous for getting at the very best configuration, but maybe some rough heuristics would get us reasonably close.
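
One concrete heuristic in that direction: record per-file timings from previous runs and greedily assign the costliest files first to the least-loaded shard (classic longest-processing-time scheduling). A sketch under those assumptions; nothing here exists in Karma:

// fileCosts: map of file path -> measured duration from a previous run.
function planShards(fileCosts, numShards) {
  var shards = [];
  for (var i = 0; i < numShards; i++) {
    shards.push({files: [], cost: 0});
  }
  Object.keys(fileCosts)
    .sort(function(a, b) { return fileCosts[b] - fileCosts[a]; })
    .forEach(function(file) {
      // Put each file on the shard with the smallest total cost so far.
      var target = shards.reduce(function(min, shard) {
        return shard.cost < min.cost ? shard : min;
      });
      target.files.push(file);
      target.cost += fileCosts[file];
    });
  return shards;
}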

@FezVrasta

Really, just copy what Jest does. It's fine.

@pauldraper

pauldraper commented Jan 2, 2017

I'd hate to have to group tests by hand.

Not sure if I understand this right, but I wasn't suggesting that.

{pattern: 'app/**/*.js', grouped: true},
{pattern: 'more-app/**/*.js', grouped: true}, 

grouped just means "these are the files that are eligible for sharding", as opposed to the common library files that are included in all the test groups. Karma then generates a number of arbitrary groups automatically from those locations. In the example, the generated groups were app/file1.js, app/file2.js and app/file3.js, more-app/file.js.

I suggest a groups config option for the number of groups. It can be tuned to weigh scheduling efficiency against startup overhead.
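
A sketch of what that automatic generation could look like; the helper is hypothetical, and round-robin is just the simplest possible splitting strategy:

// Distribute the files matched by `grouped: true` patterns into N groups.
function generateGroups(groupableFiles, numGroups) {
  var groups = [];
  for (var i = 0; i < numGroups; i++) {
    groups.push([]);
  }
  groupableFiles.forEach(function(file, index) {
    groups[index % numGroups].push(file);
  });
  return groups;
}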

@brandonros

Is this dead in the water?

@vikerman

vikerman commented Jun 6, 2017

Hello - Here is my proposal for running tests in parallel.

This is a very simple sharding strategy, but it should provide a speedup just by using multiple processors on the machine. This is meant mostly for local development and not so much for CI runs (where remote CI setup costs far outweigh the speed gains of parallelization).

karma.js changes:

  • The root Karma URL can take in 3 extra URL parameters - shardId, shardIndex, totalShards
  • These are passed on to the context iframe

Jasmine (or Mocha, etc.) adapter changes:

  • In the context page, the Jasmine (or Mocha) adapter can process the shardIndex and totalShards parameters
  • The adapter passes the shardId, shardIndex and totalShards in the "info" object it uses while connecting back to the Karma server (see the section below on how it's used)
  • The adapter walks the suite/spec tree and collects all leaf specs in an array
  • The adapter uses a very simple sharding strategy - it runs the subset of tests
    [(shardIndex/totalShards * totalTests) -> ((shardIndex + 1)/totalShards * totalTests)]
    (a sketch of this slice follows below)
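
As referenced above, a minimal sketch of that slice computation ("allSpecs" being the flattened leaf-spec array the adapter collected; the function name is illustrative):

// Each shard takes a contiguous slice of the flattened spec list. Every
// shard must walk the suite tree in the same order for this to be stable.
function specsForShard(allSpecs, shardIndex, totalShards) {
  var start = Math.floor(shardIndex / totalShards * allSpecs.length);
  var end = Math.floor((shardIndex + 1) / totalShards * allSpecs.length);
  return allSpecs.slice(start, end);
}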

Karma server changes:

  • The server now has logic to wait for all shards to connect before starting a test run
  • This is so that test execution doesn't start immediately when the browser instance corresponding to the first shard connects, and then run once more when the rest of the shards connect
  • The server uses "shardId" and "totalShards" to determine whether enough sharded browsers have connected with the same "shardId" (a sketch of this follows below)
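
A sketch of that waiting logic (hypothetical names and bookkeeping; Karma's real browser-registration flow differs):

var connectedShards = {}; // shardId -> number of connected browsers

function onBrowserRegister(info, startRun) {
  if (!info.shardId) {
    return startRun(); // unsharded browser: keep the old behavior
  }
  connectedShards[info.shardId] = (connectedShards[info.shardId] || 0) + 1;
  if (connectedShards[info.shardId] === info.totalShards) {
    startRun(); // all shards with this shardId are now connected
  }
}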

Chrome (or other launcher) changes:

  • "ChromeSharded" (and "ChromeHeadlessSharded", etc.) is a new type of launcher that launches Chrome with N different tabs - each with the same shardId and totalShards and the appropriate shardIndex
  • Default number of shards = number of processors on the machine
  • Overridden by a launcher flag / environment variable

Reporter changes: (I need to flesh this out more. Any ideas welcome here)

  • No changes in the initial version - for local runs the reporter output doesn't matter much? Print errors from any of the shards on the console
  • Ideally the reporter can collate all results from all the shards into a single report

@brandonros

I'm going to try to get this done this weekend. I don't know anything about this project or the codebase, but I think a ton of people would be saved a ton of time if I can figure something out. Maybe different tabs running on different ports? Could get hairy...

@littleninja

@brandonros rock on! I'm in a similar boat--haven't contributed to the project before but would also be interested in helping with a sharding feature. Would you be willing to create a fork to get the ball rolling? I don't know--do people usually create a "WIP - xyz feature" pull request to help rally effort?

@brandonros

brandonros commented Aug 3, 2017

So, here's what happens at a high level:

  1. your package calls Karma with a configuration describing a framework like Jasmine

  2. A browser is opened and pointed to the local Karma server, which serves up some HTML/JS listing your files to test, and an iframe, then hands it off to the framework

I actually think it would make more sense to inject into Jasmine, because I was able to boil down exactly where the tests are executed. However, I ran into an issue where it didn't like that I was trying to execute different suites at the same time.

So, I came up with multiple tabs:

http://localhost:9876/?shardIndex=0&numShards=4
http://localhost:9876/?shardIndex=1&numShards=4
http://localhost:9876/?shardIndex=2&numShards=4
http://localhost:9876/?shardIndex=3&numShards=4

at https://github.com/brandonros/karma/commit/40cf8eb79be7af9892448a16da5b6578cd3dd983

It is still very early. All it does is allow you to chunk the test files (I hardcoded that they have to contain Spec) across multiple tabs. I'll try to update if I make any worthwhile progress on a more complete package.

Edit: I just tested it and it doesn't really work. Karma kind of falls apart as soon as you start serving different tabs different tests. I'm not sure the architecture of Karma really supports parallel/concurrent testing. Even if I were able to work through the bugs and make the multi-tab approach work, I'd need an event and logic to go with it to wait until all browsers are idle.

Edit 2: Something already existed for gulp, but I am still stuck on Grunt, so I came up with this. Hopefully it will help somebody as a rough draft for their Gruntfile.js. The improvement really isn't that great because loading all of the files in 2, 3, 4, 8 tabs isn't the best. I am going to try WebWorkers next.

// Note: assumes this lives in a Gruntfile where grunt, gruntConfig,
// filterKarmaConfig, debugPrep and noBrowsers are defined elsewhere.
var glob = require('glob');

// Split an array into chunks of `size` elements.
// (The original relied on a non-standard Array.prototype.chunk helper.)
function chunk(array, size) {
  var chunks = [];
  for (var i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

function setupConcurrentTestTasks() {
  var shards = Array.apply(null, {length: 4});
  var specFiles = glob.sync('source/js/**/*Spec.js');
  var chunkSize = Math.ceil(specFiles.length / shards.length);
  var chunkedSpecFiles = chunk(specFiles, chunkSize);

  shards.forEach(function(shard, index) {
    // Generate a per-shard Karma config for running against source files.
    gruntConfig.copy['karmaSource-' + index] = {
      src: 'karma.conf.tpl.js',
      dest: 'target/karma-' + index + '.conf.source.js',
      options: {
        process: function(content) {
          var sourceFiles = [
            'source/js/**/*.module.js',
            'source/js/app.js',
            '<%= ngtemplates.utils.dest %>',
            '<%= ngtemplates.components.dest %>',
            '<%= ngtemplates.framework.dest %>',
            'source/js/**/!(*Spec).js'
          ].concat(chunkedSpecFiles[index]);

          return filterKarmaConfig(sourceFiles, content);
        }
      }
    };

    // Generate a per-shard Karma config for running against the dist bundle.
    gruntConfig.copy['karmaDist-' + index] = {
      src: 'karma.conf.tpl.js',
      dest: 'target/karma-' + index + '.conf.dist.js',
      options: {
        process: function(content) {
          var distFile = [
            'target/dist/js/<%= pkg.name %>-<%= pkg.version %>.min.js'
          ].concat(chunkedSpecFiles[index]);

          return filterKarmaConfig(distFile, content);
        }
      }
    };

    gruntConfig.karma['debug-' + index] = {
      configFile: 'target/karma-' + index + '.conf.source.js',
      options: {
        preprocessors: debugPrep,
        singleRun: grunt.option('singleRun'),
        browsers: noBrowsers ? [] : ['<%= karmaBrowser %>']
      }
    };

    gruntConfig.karma['dist-' + index] = {
      configFile: 'target/karma-' + index + '.conf.dist.js',
      preprocessors: {},
      reporters: ['progress', 'junit', 'threshold'],
      options: {
        browsers: noBrowsers ? [] : ['<%= karmaBrowser %>']
      }
    };

    // Register the per-shard tasks and add them to the concurrent groups.
    gruntConfig.concurrent.source.tasks.push('karma-source-' + index);
    gruntConfig.concurrent.dist.tasks.push('karma-dist-' + index);

    grunt.registerTask('karma-source-' + index, [
      'copy:karmaSource-' + index,
      'wiredep:test',
      'karma:debug-' + index
    ]);

    grunt.registerTask('karma-dist-' + index, [
      'copy:karmaDist-' + index,
      'wiredep:target',
      'wiredep:test',
      'karma:dist-' + index
    ]);
  });
}

grunt.registerTask('testSourceConcurrently', [
  'clean',
  'writeConfig',
  'copy:test',
  'copy:stage',
  'ngtemplates',
  'concurrent:source'
]);

grunt.registerTask('testDistConcurrently', [
  'compile',
  'concurrent:dist'
]);

@rschuft

rschuft commented Oct 12, 2017

I created a plugin that automatically shards tests across the listed browsers (e.g. if you want two sets, you list two browsers: browsers: ['ChromeHeadless', 'ChromeHeadless']). It doesn't address one of the concerns listed in this thread: running tests at the same time; it forces concurrency: 1. It does, however, fix the memory problems of having too many specs loading in a single browser, and it works correctly with coverage reporting.

karma-sharding

UPDATE: Version 4.0.0 of karma-sharding supports parallel browser execution and no longer forces concurrency to 1.

@joeljeske

I have also created a plugin, karma-parallel, similar to the one that @rschuft made, but a bit different. It supports running specs in different browser instances by splitting up the commonly used describe blocks, as opposed to splitting up the spec files themselves. This allows you to run tests in parallel even after using a bundler such as karma-webpack or karma-browserify. It also supports testing in multiple types of browsers.

karma-parallel
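
To illustrate the describe-splitting technique (a simplified sketch, not karma-parallel's actual implementation; the round-robin assignment and hardcoded shard values are assumptions):

(function(global) {
  var originalDescribe = global.describe;
  var shardIndex = 0;  // in reality, injected per browser instance
  var totalShards = 4; // in reality, injected per browser instance
  var suiteCount = 0;
  var depth = 0;

  global.describe = function(name, fn) {
    // Nested suites always run; only top-level suites are sharded.
    if (depth > 0) {
      return originalDescribe(name, fn);
    }
    if (suiteCount++ % totalShards !== shardIndex) {
      return; // this top-level suite belongs to another shard
    }
    depth++;
    try {
      // Jasmine/Mocha invoke the suite body synchronously, so depth
      // tracking correctly distinguishes nested describes.
      return originalDescribe(name, fn);
    } finally {
      depth--;
    }
  };
})(window);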

@guilhermejcgois

@joeljeske have you tested this with an Angular project using ng-cli? We have one, and your plugin seems interesting to us; we'll give it a try.

@joeljeske

I do not use an ng-cli project on a regular basis, but I have done basic testing on one and it works just fine. Please log any issues if you run into something.

One note: it is not yet tested with coverage reporting. It would likely be best to disable coverage reporting when using karma-parallel. Coverage reporting is a future feature of the plugin.

@rschuft

rschuft commented Jan 24, 2018

The karma-sharding plugin doesn't play well with the ng-cli because of the way webpack bundles the specs together before the middleware engages with it. Hopefully @joeljeske can bypass that limitation with his approach.

@joeljeske

@guilhermejcgois, the latest release of karma-parallel is tested and compliant with ng-cli projects. Code coverage support was just introduced. Would love to hear feedback on your experience with it.

@intellix

intellix commented May 31, 2018

My results from karma-parallel on MBP 15" Early 2013 (8 cores): joeljeske/karma-parallel#1 (comment)

1x --------------
real 1m25.659s
user 1m42.960s
sys 0m10.470s

2x --------------
real 1m15.996s
user 2m1.312s
sys 0m13.601s

3x --------------
real 1m11.746s
user 2m23.417s
sys 0m18.218s

4x --------------
real 1m13.970s
user 2m36.800s
sys 0m19.698s

7x just timed out

I was expecting at least a 2x perf improvement, but it doesn't seem to really make a difference. Is anyone doing sharding/parallelism and actually seeing positive results? Would be nice to see.

@wbhob

wbhob commented May 10, 2019

I'm having an issue where, when one shard disconnects, it simply stops that group of tests and moves on to the next – then, it returns exit code 0. Any thoughts on what may be causing this?

Karma ^3.1.4
Jasmine ^2.99
karma-parallel ^0.3.1

@rschuft

rschuft commented May 10, 2019 via email

@wbhob

wbhob commented May 10, 2019

I’ve tried Chrome and ChromeHeadless

@juanmendes

juanmendes commented Aug 12, 2019

@guilhermejcgois, the latest release of karma-parallel is tested and compliant with ng-cli projects. Code coverage support was just introduced. Would love to hear feedback on your experience with it.

@joeljeske We use karma-parallel and we're getting false positives because one of the shards isn't running all the tests. There are test problems, but karma-parallel hides them because it doesn't fail the tests. After a browser disconnects, typically a shard is restarted from the beginning. However, sometimes they restart and run just one of the tests. It definitely has to do with the size of the project. We're running 3,500 tests, where over 200 of them are ng component DOM tests. I've created an issue here: joeljeske/karma-parallel#42
