This guide complements the developer guide and gives additional information on the project, how to modify DISCO and how to contribute to the codebase.
Disco has grown a lot since its early days, and like any sizeable code base, getting started is both difficult and intimidating: there are a lot of files, it's not clear what's important at first, and even where to start is a bit of a puzzle. This document aims at giving you an efficient process to get familiar with DISCO.
The two main technologies behind DISCO are TypeScript and distributed machine learning. In the following sections I will assume that you are familiar with both to a certain extent. If not, the following references might be useful:
Important
Disco is a big project and some information is probably outdated. Let us know by opening an issue.
DISCO is a complex project composed of the Disco.js library (`discojs`, `discojs-node` and `discojs-web`), a front-end (`webapp`), a `server` and a `cli` (e.g., for benchmarking). Depending on your goal, you might only use a subset of them: for example, you won't need an in-depth understanding of the webapp and Vue.js to add a new decentralized learning feature. Instead, you will probably rely on the CLI.
- If you are going to work on, contribute to and improve the project, I first recommend you get a good understanding of what DISCO does: play around with the website, train a model from the pre-defined tasks, or even create your own custom task. Feedback is always appreciated, feel free to let us know via GitHub issues or in person if you notice any issues or think of an improvement.
- Then, get a high-level understanding of the different parts of the project in the developer guide, even if you're planning on working on a subset of the project. If you want to know more about a specific part of the project, refer to the table of contents at the end of the DEV guide.
- Follow the installation instructions from the developer guide to launch a DISCO instance running in your browser.
Tip
The most common issues with running DISCO are due to old Node.js versions and to setting up the appropriate environment on M1 Macs; see our FAQ for more troubleshooting. Note that DISCO has not been tested on Windows (only on Linux and macOS).
There are many ways to use Disco.js: from a browser, from a CLI, by importing `discojs-node` in your own Node.js scripts and applications, or from your own UI implementation. Whatever your setting, using Disco.js always requires a `server` instance. Some entry points, like the CLI, start a server instance automatically; others, like the webapp, don't and require either an existing instance or a local server instance that you launch yourself. As described in the `server` README file, the server is in charge of connecting peers to the ML tasks. To connect to and take part in a distributed training session, you first need to find the session and learn how to join it, so the server exposes an API to that end.
As a contributor, you will certainly end up having to run TypeScript scripts. A practical way to do so is to use `ts-node`:
npm i -g ts-node # globally to run scripts from anywhere
ts-node your_script.ts
You can start a `server` instance locally with:
npm -w server start
Running the server relies on `nodemon`, which watches the module for changes and enables hot-reloading. Therefore, any (saved) code change is automatically taken into account and doesn't require a build. However, note that changes to `discojs` are not picked up automatically and require a build. You may have to restart the server manually after rebuilding `discojs`. The section Building `discojs` discusses this in more detail.
You can test the server with:
npm -w server test
Make sure you are not running a server at the same time, as the test suite will launch its own instance. We use mocha, chai and supertest for testing; they are libraries for unit tests, assertions and HTTP testing, respectively.
Server tests live in the `server/tests/` folder. All files in this folder ending with the `.spec.ts` extension will be run as tests. Simply write a new `your_own_test.spec.ts` file to include it in the testing pipeline.
If you are planning to contribute to the `webapp`, have a look at VUEJS.md to read more on how Vue.js is used in this project.
The `webapp` requires a running server instance. You can start a local one as described in the previous section with:
npm -w server start # from the root folder
The webapp can now be started with:
npm -w webapp start # from the root folder
npm start # from the webapp folder
The Vue development mode supports hot-reloading via `vite`, and the client will automatically restart whenever a change in `webapp` is detected. Starting the web client should print something similar to
VITE v5.2.7 ready in 1312 ms
➜ Local: http://localhost:8081/
➜ Network: use --host to expose
➜ press h + enter to show help
You can access the client at the Local address from the machine running the webapp. Any device on the same network can access the app at the Network address once it is exposed with `--host`, as the sample output indicates.
As said previously, changes to `discojs` are not picked up automatically and require a build.
You can test the `webapp` with:
npm -w webapp test
The webapp tests rely on `cypress` and the test suite is located in the `webapp/cypress` folder.
Note that you can also run the tests interactively in the browser of your choice. To do so, run
VITE_SERVER_URL=http://server npx -w webapp start-server-and-test start http://localhost:8081 'cypress open --e2e'
which should open the Cypress UI and let you choose the browser you want to use and which tests to run. More information is available in the Cypress docs.
It is possible to record the Cypress tests run in the GitHub Actions CI and visualize them in the Cypress Cloud. Recording is currently used only when needed, because the free plan has a limited number of recordings. The Cypress documentation describes how to set up the recordings.
- A Disco project has been created in the Cypress Cloud and you need to be added to the project to be able to visualize the recordings.
- In case a new Cypress project is now being used, make sure that the settings are correct:
  - In `webapp/cypress.config.ts`, make sure the correct project ID has been set. It currently is `projectId: "aps8et"`.
  - The GitHub workflow `.github/workflows/record-cypress.yml` relies on `CYPRESS_RECORD_KEY`, which is a GitHub repository secret.
- Finally, you can trigger the `record-cypress` workflow manually from GitHub as described in the documentation.
If you need to modify the `discojs` folder, have a look at DISCOJS.md, which explains some of the concepts internal to the library.
Because TypeScript needs to be transpiled to JavaScript, you need to rebuild the `discojs` folder for changes to take effect:
npm -w discojs run build
The previous command invokes the TypeScript compiler (`tsc`), which successively compiles `discojs`, `discojs-node` and `discojs-web`, creating equivalent JavaScript files in each module's `dist/` directory.
To automate the building phase, you can use the `watch` command to rebuild a module whenever changes are detected. The `watch` command currently only works at the level of `discojs`, `discojs-node` or `discojs-web` (i.e., running `watch` over the whole `discojs` folder doesn't work and would only watch `discojs`):
npm -w ./discojs run watch build
npm -w ./discojs-node run watch build # another terminal
npm -w ./discojs-web run watch build # one more terminal
Building is not necessary for other modules like the `server`, the `webapp` or the `cli` as long as no changes have been made to `discojs`. However, you may need to restart the `server` or the `webapp` after rebuilding `discojs`.
To test `discojs`, first make sure a server instance is running:
npm -w server start
Then start the `discojs` test suite:
npm -w discojs test
As with the server, any file ending with `.spec.ts` will be run in the test suite. As a convention, a test file reuses the name of the TypeScript file it tests. For example, `async_informant.spec.ts` tests features implemented in `async_informant.ts` and is located in the same folder.
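The naming convention can be sketched in a few lines. `isSpecFile` and `specFileFor` are hypothetical helpers written for this guide, not functions from the DISCO codebase; they only mirror the rule described above.

```typescript
// Hypothetical helpers, not part of DISCO: they only mirror the naming convention.

/** Returns true when a file would be picked up as a test, i.e. it ends with `.spec.ts`. */
function isSpecFile(fileName: string): boolean {
  return fileName.endsWith(".spec.ts");
}

/** Maps a source file to the spec file expected next to it. */
function specFileFor(sourceFile: string): string {
  if (!sourceFile.endsWith(".ts") || isSpecFile(sourceFile)) {
    throw new Error(`expected a non-test TypeScript file, got ${sourceFile}`);
  }
  // Replace the trailing ".ts" with ".spec.ts".
  return sourceFile.slice(0, -".ts".length) + ".spec.ts";
}

console.log(specFileFor("async_informant.ts")); // async_informant.spec.ts
```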
`discojs` contains the core, platform-agnostic code of Disco.js, used by both `discojs-web` and `discojs-node`. As such, contributions to `discojs` must only contain code independent of both Node.js and the browser. As the names suggest, `discojs-node` and `discojs-web` implement features specific to Node.js and browsers respectively, mostly related to memory and data handling, as browsers don't allow access to the file system.
Currently, the `discojs-node` project is available as the `@epfml/discojs-node` NPM package, which can be installed with
npm i @epfml/discojs-node
and `discojs-web` is available as the `@epfml/discojs-web` package.
Tip
If your code changes don't seem to take effect, close everything, rebuild everything and restart. For example, changes in `discojs/src/default_tasks` require rebuilding `discojs` and restarting the `server` to take effect.
In Disco, we rely on the widely used `debug` library for logging. To use it, first import `debug` and create a debug function for your namespace:
import createDebug from "debug";
const debug = createDebug("discojs:models:gpt:model"); // use nested namespaces
const logs = { loss: 0.01, accuracy: 0.56}
debug("Here are the GPT logs: %o", logs)
To see the logs on the command line, set the `DEBUG` environment variable to the namespaces whose debug statements you want to see. For example:
DEBUG='discojs:models:gpt*' npm -w cli run benchmark_gpt
will print the debug statement from above, as will setting `DEBUG='*'`, which enables every namespace.
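To build intuition for which namespaces a given `DEBUG` value selects, here is a simplified sketch of comma-separated wildcard matching. It is an approximation written for this guide, not the `debug` library's actual implementation, and `isNamespaceEnabled` is a hypothetical name.

```typescript
// Simplified approximation of DEBUG namespace matching, for illustration only.
// This is NOT the debug library's code; isNamespaceEnabled is a hypothetical helper.
function isNamespaceEnabled(namespace: string, debugValue: string): boolean {
  // Patterns are separated by commas or whitespace.
  const patterns = debugValue.split(/[\s,]+/).filter((p) => p.length > 0);
  let enabled = false;
  for (const pattern of patterns) {
    // A leading "-" excludes the namespaces matching the pattern.
    const negated = pattern.startsWith("-");
    const body = negated ? pattern.slice(1) : pattern;
    // Escape regex metacharacters, then turn "*" into "match anything".
    const escaped = body
      .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
      .replace(/\*/g, ".*");
    if (new RegExp(`^${escaped}$`).test(namespace)) {
      if (negated) return false;
      enabled = true;
    }
  }
  return enabled;
}

console.log(isNamespaceEnabled("discojs:models:gpt:model", "discojs:models:gpt*")); // true
console.log(isNamespaceEnabled("server:router", "discojs*")); // false
```

Under these assumptions, a value such as `server*,discojs*` enables both families of namespaces, and `*` enables all of them.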
The server debug statements are visualized the same way, for example:
DEBUG='server*,discojs*' npm -w server start
shows the debug statements from anywhere in the server and in discojs.
To visualize debug statements in the browser, open the console (Inspect element > Console) and set `localStorage.debug` to the namespaces of your choice, for example `localStorage.debug='webapp*,discojs*'` to see the debug statements from anywhere in the webapp and in discojs. Note that you may need to refresh the page for changes to `localStorage` to take effect.
To get debug statements in the Cypress tests, modify `webapp/cypress/support/e2e.ts` and add:
beforeEach(() => { localStorage.debug = "discojs*,webapp*" });
We need to set `localStorage.debug` before each test because `localStorage` is reset between tests.
The procedure for working on a feature is the following:
1. Create a new branch to work in
2. Write code, along with comments and docstrings, to implement the feature
3. Write tests for the feature
4. Create a draft pull request (PR)
5. Run the test suites and clean your code
6. Request a review and jump back to 2. if needed
7. Merge the PR
Once you start working on a feature, create a new branch from the `develop` branch and name it following the convention `IssueNumber-Key-Word-YourName`. For example, if I am working on issue #202, which is related to fixing a train bug, I would call the branch `202-train-bug-nacho`.
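As a rough illustration of the naming convention, the sketch below builds and checks branch names. `buildBranchName` and `isValidBranchName` are hypothetical helpers written for this guide, not part of the DISCO codebase. The lowercase output and the regular expression are assumptions based on the `202-train-bug-nacho` example.

```typescript
// Hypothetical helpers illustrating the IssueNumber-Key-Word-YourName convention.
// They are not part of DISCO.

/** Joins the issue number, keywords and author name with hyphens, in lowercase. */
function buildBranchName(issueNumber: number, keywords: string[], author: string): string {
  return [String(issueNumber), ...keywords, author]
    .map((part) => part.trim().toLowerCase())
    .join("-");
}

/** Assumed check: an issue number, then at least one keyword, then a name. */
function isValidBranchName(branch: string): boolean {
  return /^\d+(-[a-z0-9]+){2,}$/.test(branch);
}

console.log(buildBranchName(202, ["train", "bug"], "nacho")); // 202-train-bug-nacho
```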
From your local repository:
# currently in branch `develop`
git checkout -b 202-train-bug-nacho
Once you've committed some changes, push the new branch to the remote (`origin` here):
git push -u origin 202-train-bug-nacho
`-u`, short for `--set-upstream`, makes your local branch (`202-train-bug-nacho`) track the remote branch (`origin/202-train-bug-nacho`).
- TypeScript file names should be written in snake_case (lowercase words separated by underscores), e.g. `event_connection.ts`
- Vue.js file names should be written in PascalCase (capitalized words, including the first), e.g. `DatasetInput.vue`
- Classes, interfaces and types should also be written in PascalCase, for example class `MeanAggregator` and interface `EventConnection`
- Function and variable names should be written in camelCase, starting with a lowercase letter, for example function `isWithinRoundCutoff` and variable `roundCutoff`
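Put together, the conventions above could look like the following sketch. Only the identifier names come from this guide; the file name `round_cutoff.ts`, the bodies and `noopConnection` are invented for illustration and do not reflect the actual DISCO implementations.

```typescript
// Hypothetical contents of a file named round_cutoff.ts (snake_case file name).
// Only the identifier names come from this guide; the bodies are invented.

// Interfaces and types: PascalCase.
interface EventConnection {
  disconnect(): void;
}

// Classes: PascalCase. Methods: camelCase.
class MeanAggregator {
  aggregate(values: number[]): number {
    if (values.length === 0) return 0;
    return values.reduce((sum, value) => sum + value, 0) / values.length;
  }
}

// Variables: camelCase.
const roundCutoff = 2;
const noopConnection: EventConnection = { disconnect: () => {} };

// Functions: camelCase.
function isWithinRoundCutoff(round: number, currentRound: number): boolean {
  return currentRound - round <= roundCutoff;
}
```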
Write docstrings in the JSDoc style. For reference, see the list of JSDoc tags supported in TypeScript.
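For instance, a JSDoc-style docstring might look like the sketch below. The `mean` function is a hypothetical example written for this guide, not code taken from DISCO.

```typescript
/**
 * Computes the arithmetic mean of a list of numbers.
 *
 * Hypothetical example for this guide, not taken from the DISCO codebase.
 *
 * @param values - The numbers to average.
 * @returns The arithmetic mean of `values`.
 * @throws {RangeError} If `values` is empty.
 */
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new RangeError("cannot compute the mean of an empty list");
  }
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

console.log(mean([2, 4, 6])); // 4
```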
Test the newly implemented features locally by following instructions in the Contributing in practice section.
Once you have added a minimal amount of content to your branch, you can create a draft PR. Create a pull request to merge your branch (e.g., `202-train-bug-nacho`) into the `develop` branch. `develop` should always be functional and up to date with new working features. In DISCO, it plays the role that the `main` or `master` branch plays in other projects.
It is important to give a good description to your PR as this makes it easier for other people to go through it.
Tip
This PR is a good example.
Once you have finished your work on your draft PR, make sure to do the following before marking it as ready for review.
- Run the adequate test suites (server, webapp, discojs).
- Make sure you remove debugging comments / console outputs.
- Merge `develop` into your feature branch (or rebase onto it if you can do so properly):
git checkout develop
git pull
git checkout 202-train-bug-nacho
git merge develop
# Solve potential merge conflicts
git push
- Ask for your PR to be reviewed and merge it once it is approved. Delete your feature branch afterwards.
Depending on what you will be working on, you may be interested in different documentation. Have a look at the markdown guides in `docs` and the table of contents in DEV.md. Notably:
- Understanding Disco.js inner workings is key if you are planning to add a new machine learning feature or work in `discojs`.
- The Vue.js architecture guide explains how the browser client is implemented with Vue.js.
- Regarding cryptography and privacy, this document explains the measures DISCO takes to ensure privacy and confidentiality.