Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFD 177: Upgrade JS package manager #43404

Merged
merged 4 commits into from
Jul 4, 2024
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
194 changes: 194 additions & 0 deletions rfd/0177-upgrade-js-package-manager.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you look at how pnpm interacts with dependabot? That's something we need to keep in mind now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see, we don't need to change anything, pnpm uses the same value for package-ecosystem: npm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependabot seems to support only pnpm v7 and v8, at least according to the docs. There are some reports of v9 working though. dependabot/dependabot-core#9522

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot to send the comment.

People report that pnpm v9 works when "packageManager": "[email protected]" is set in package.json. We will set this anyway, so we shouldn't run into this problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Renovate? We may switch from Dependabot to Renovate in the future..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that worries me is compatibility with different platforms that we build our software on. As in, pnpm not working on some old CentOS or whatever.

Before setting out to fix all the problems in the code to make it compatible with pnpm, what do you think about making a tag build with pnpm installing some simple package.json in a separate folder?

IIRC, we use corepack on many platforms, so hopefully installing pnpm will be straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, installing pnpm should be straightforward. We install yarn by calling corepack enable yarn, it needs to be changed to corepack enable to enable Corepack itself, allowing it to install other package mangers.

As for system compatibility, if the buildbox can run modern Node.js, it shouldn't have problems running pnpm (fyi we no longer run Node.js on CentOS). But I see your point, I will run a tag build to test it.

Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
---
author: Grzegorz Zdunek ([email protected])
state: draft
---

# RFD 177 - Upgrade JS package manager

## Required Approvers

- Engineering: @zmb3 && (@ravicious || @ryanclark)

## What
Currently (June 2024), we use Yarn 1.x to manage JS dependencies. We should
upgrade it to Yarn >= 2.x or switch to a different manager.

## Way
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Way
## Why

Yarn 1.x entered maintenance mode in January 2020 and will eventually reach
end-of-life in terms of support.
It also has many unfixed issues, with a one that affects us the most:
[incorrect dependency hoisting when using workspaces](https://github.com/Yarnpkg/Yarn/issues/7572).

## Details
### What is the problem with dependency hoisting in Yarn 1.x?
Dependency hoisting is a feature designed to optimize the layout of the
`node_modules` directory by moving dependencies as high in the project directory
tree as possible. This approach reduces duplication and simplifies the
dependency tree, leading to less disk space usage.
However, this does not work correctly in Yarn 1.x and hoists too deep
dependencies.
For example, if `packages/build/package.json` depends on `@babel/core ^7.23.2`
and `@storybook/builder-webpack5 ^6.5.16` depends on `@babel/core ^7.12.10`,
Yarn will hoist the `@babel/core` version resolved from that storybook
dependency, rather than the direct `@babel/core` dep in package.json.
This makes managing dependencies difficult: we don't really know what version of
a package is being used.

We experienced this issue many times, and the workaround was to move such
dependency to the root `package.json`.
The Yarn team confirmed that it won't be fixed in the 1.x branch.

Theoretically, upgrading the package manager to Yarn >= 2.0.0 (with
`nodeLinker: node-modules` option that hoists dependencies in a similar way to
Yarn 1.x) would solve this particular problem, but we should first take a closer
look at the dependency hoisting problem.

### What’s the problem with dependency hoisting in general?
Traditionally, packages mangers (npm, Yarn 1.x) maintain a flat dependency list
in `node_modules` by hoisting all dependencies there.
For example, when the developer installs package `fuu`, which depends on
`bar` and `baz`, the package manager creates three directories,
`fuu`, `bar` and `baz` in `node_modules`.
The developer can then import all of these packages in code, although only one
of them is listed in the `package.json` file.
It is possible because of how the Node.js module resolution algorithm works.
The `package.json` file is not needed at all; when Node.js resolves an import,
it traverses `node_modules` folders all the way to the user's root directory,
looking for that dependency.

The main problem with using undeclared dependencies is that we have no control
over them. The next (even a patch) version of the `fuu` package may update `bar`
to a version that makes our code relying on `bar` stop working properly (read
more: [pnpm's strictness helps to avoid silly bugs](https://www.kochan.io/nodejs/pnpms-strictness-helps-to-avoid-silly-bugs.html)).
We can see such dependencies in our codebase: we import `prop-types` in a few
components, but we don't specify in any workspace’s `package.json`
(it's taken from `react-select`).

To address that problem, the new package manager should use non-flat (or
sometimes called isolated) `node_modules` structure, which contains only
direct dependencies, declared in `package.json`.
Let's go through installation options of the most popular package managers:
* npm has `install-strategy`:
* hoisted (default): install non-duplicated in top-level, and duplicated as
necessary within directory structure.
* nested: install in place, no hoisting. Note: this does not seem to work
correctly https://github.com/npm/cli/issues/6537.
* shallow: only install direct deps at top-level. Note: it creates
`node_modules` directory only in the project's root, so it does not change
much when it comes to workspaces, the dependencies are still hoisted.
* linked: (experimental) install in node_modules/.store, link in place,
unhoisted.
Note: workes like pnpm. See an [RFD](https://github.com/npm/rfcs/blob/main/accepted/0042-isolated-mode.md).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note: workes like pnpm. See an [RFD](https://github.com/npm/rfcs/blob/main/accepted/0042-isolated-mode.md).
Note: works like pnpm. See an [RFD](https://github.com/npm/rfcs/blob/main/accepted/0042-isolated-mode.md).

* Yarn has `nodeLinker`:
* pnp: no node_modules, unhoisted. Note: does not work with Electron.
* pnpm: install in `node_modules/.store`, link in place, unhoisted.
Note: It should work with the latest alpha electron-builder and Electron
versions.
* node_modules: flat, hoisted. Same as the node_modules created by npm or
Yarn Classic.
* pnpm has `node-linker`:
* isolated: dependencies are symlinked from a virtual store at
`node_modules/.pnpm`. Note: I was able to build Teleport Connect with the
latest electron-builder and Electron versions.
* hoisted: a flat node_modules without symlinks is created.
Same as the node_modules created by npm or Yarn Classic.
* pnp: no node_modules. Same as pnp in Yarn.

The installation option that is most strict and works with Electron is
npm's `linked`, Yarn's `pnpm` and pnpm's `isolated`.

### So, what package manager should we migrate to?
Modern package managers offer similar features, so I think the strictness and
performance of the manager are the factors that differentiate them the most.
Here pnpm appears to be the winner. It supports installing dependencies as
linked by default, which makes it the most battle-tested implementation.
The speed is also great, it's consistently one of the fastest tools in
[benchmarks](https://p.datadoghq.eu/sb/d2wdprp9uki7gfks-c562c42f4dfd0ade4885690fa719c818?fromUser=false&refresh_mode=sliding&tpl_var_npm%5B0%5D=%2A&tpl_var_pnpm%5B0%5D=%2A&tpl_var_yarn-classic%5B0%5D=%2A&tpl_var_yarn-modern%5B0%5D=%2A&tpl_var_yarn-nm%5B0%5D=%2A&tpl_var_yarn-pnpm%5B0%5D=no&from_ts=1711447153431&to_ts=1719223153431&live=true).

The npm support for `linked` is really interesting, but as long as it is
experimental, we should not use it in a production environment.

When it comes to Yarn, the main advantage is that we already use it, so we could
still type `yarn start-teleport`.
However, if we are going to use pnpm installation method, why not switch to pnpm
entirely?
Yarn also shows a warning `The pnpm linker doesn't support providing different
versions to workspaces' peer dependencies` during the installation, but I'm not
sure what significance this has.
Comment on lines +115 to +117
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means a situation where one package has something like react@^17.0.0 in its peer deps and another package has react@^18.0.0. Does this mean that pnpm works in a similar manner to cargo, bundler and other sensible package managers? I didn't spend a lot of time looking for this but I couldn't find how pnpm resolves versions of deps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this article answers the question https://pnpm.io/how-peers-are-resolved

Does this mean that pnpm works in a similar manner to cargo, bundler and other sensible package managers?

I don't have much experience with other package managers, but I'm guessing you mean resolving exact dependency versions for packages?
So, if I'm correct, Yarn in pnpm mode is not able to resolve different react versions from your example, but pnpm manager can do so. That's another point for pnpm :)

gzdunek marked this conversation as resolved.
Show resolved Hide resolved

To sum up: I recommend migrating to pnpm.

## Migration process
### Establishing correct relationships between workspaces
After switching the package manager to pnpm, the Web UI/Connect apps won’t run,
due to TS/JS code in the workspaces not being able to resolve their imports
(like `react`).
Why? This is a continuation of the hoisting problem — our workspaces rely
on dependencies of other workspaces, without declaring them in their
`package.json`.
It worked in Yarn 1.x, since adding a dependency to a one workspace made it
available in any other workspace (because of hoisting to the root
`node_modules`).
We can see this in `packages/build`: the `@opentelemetry/*` dependencies are
declared there but used only in `packages/teleport`.

This contradicts the idea of workspaces: they are designed to encapsulate
project dependencies and code. Sharing dependencies between unrelated workspaces
breaks this encapsulation, leading to potential conflicts and contamination
where changes in one workspace inadvertently affect another.

To allow the projects to compile again, the dependencies need to be moved to the
correct workspaces.
I see three scenarios for how they should be moved over:
1. Dependencies that are used only in a single workspace should be kept there.
Example: `electron-builder` that builds Connect - it’s declared in
`packages/build`, but it should be in `packages/teleterm`.
2. Project-wide dependencies imported everywhere, should be kept in the root
`package.json`.
Example: `react` - declared in `build/package.json`, but imported in every other
workspace.
3. Project-wide tools used in the root `package.json` should be kept in
`build/package.json`, if possible.
Example: `jest` or `storybook` can be installed in `build/package.json`
which will expose them through scripts to the root `package.json`.
Comment on lines +150 to +153
Copy link
Member

@ravicious ravicious Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose behind a separate build package then?

I believe the package structure that we have today comes from Gravity and Teleport which had to share common deps. Gravity got discontinued, but the package structure remains, because we now have Connect which also needs the same deps as Teleport.

The other question is, do we need workspaces at all?

Where are workspaces useful?

From my experience, they're useful when inside a single repo you work on multiple individual packages that are consumed separately, and those packages depend on a bunch of smaller packages that are in the repo. "Consumed separately" might mean they're published as separate npm packages. Or there's multiple apps and separate CI jobs want to pull in only deps required by a certain app.

When working on those separate packages, you don't want to be limited by the regular npm package limitations. I.e., if you want to update <Button>, it's best if you could just update one file and both teleterm and teleport picked up the changes immediately. It'd be bad if you had to release [email protected] first and then update it in both teleterm and teleport.

Perhaps also those separate packages need separate deps, or separate versions of deps.

Quoting from Yarn docs:

Workspaces shine in many situations, with the main ones being:

  • When a project has a core package surrounded with various add-ons; this is for example the case for Babel.

  • When you need to publish multiple packages and want to avoid your contributors having to open PRs on many separate repositories whenever they want to make a change. This is for example the case for Jest.

  • Or when projects want to keep strict boundaries within their code and avoid becoming an entangled monolith. This is for example the case for Yarn itself, or many enterprise codebases.

The last point might sound as if it applied to us.

How do we use workspaces at the moment?

Currently, we do none of that. We always install all deps, we always clone the whole repo. There's pretty much no dep separation. Almost all dep management is done by Vite which just pulls relevant deps into the final bundle.

On the contrary – we want both projects (Teleport and Connect) to use the same versions of deps so that it's easy to reuse code between them. So the way pnpm resolves deps is beneficial here – we want the same deps always, everywhere.

The way we use workspaces feels similar to how you'd use regular CommonJS modules to share code. <Button> is used in multiple places, so you place it in button/index.js and use it in multiple places. vite is used by both projects, so you place it in build/package.json and use it in multiple places.

But as you mentioned, that's not how packages work in. The closest we are to using packages the "correct" way is shared and design maybe, where I don't really care how <Button> is implemented or what deps it needs, I just want to use it in my component. So I import Button from design and yarn handles design deps during installation.

Could we live without workspaces?

Connect needs a separate package.json so that electron-builder can pick correct deps. This is the one place where dep separation actually matters. Otherwise each .dmg with Connect would ship with JS deps it doesn't use at all.

Could we build Connect without workspaces though? What if it just continued to be a separate folder? Could we survive not using workspaces or are there are benefits to them that I failed to notice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that if Connect is supposed to stay consumed as a separate entity by electron-builder, it needs to behave as a real package. As in, it needs to have its own package.json and the resolved packages need to be accounted for in some lockfile.

So maybe the structure proposed by Grzegorz is fine? A separate build package certainly helps with encapsulating which deps are needed for dev purposes only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we build Connect without workspaces though? What if it just continued to be a separate folder?

When it comes to electron-builder, we could probably use the two package.json structure https://www.electron.build/tutorials/two-package-structure.html?
So in the teleterm/package.json we would still keep the dependencies that should be bundled.
The drawback of it is that we would have to add postinstall script to trigger installing them:

To ensure your dependencies are always updated based on both files, simply add "postinstall": "electron-builder install-app-deps" to your development package.json. This will basically automatically trigger an npm install within your app directory so you don’t have to do this work every time you install/update your dependencies.

But tbh, I'm not a huge fun of this structure, it feels more complicated than the workspace option.

When it comes to workspaces in general, I think the biggest benefit for us is this one:

Or when projects want to keep strict boundaries within their code and avoid becoming an entangled monolith. This is for example the case for Yarn itself, or many enterprise codebases.

Currently we break these boundaries by importing from teleport to shared or to teleterm.
However, while working on the RFD, I realized that we could easily enforce strict rules between the workspaces by getting rid of TS aliases and importing the code directly from the packages, for example @gravitational/shared/AccessRequests instead of shared/AccessRequests. Then we would have to set hoist-workspace-packages=false in the config to prevent linking all workspaces to the root node_modules.

After that, a given workspace would be able to import code only from the workspaces that it declares in its package.json.

Maybe it's possible to achieve a similar result with a linter rule? But for sure enforcing this at the package level provides better DX than during lining.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose behind a separate build package then?

I wasn't 100% sure if we should keep it, after we fully migrate to Vite, we will only have jest and vite configs there, plus the dependencies. Mainly, I wanted to not make too many changes in this migration.

However, maybe it will be better to keep the dev dependencies in the root package.json (some of them will need to be moved there anyway and packages/build will shrink even more).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much of it applies to pnpm, but when I was reading about Plug'n'Play, I found this:

Yarn prevents ghost dependencies in the packages your project depends on, but also in your own code itself - this is to decrease the chances that a package would work on your development machine but break once published.

It however has a side effect when it comes to bins. If you have typescript listed at the root of your project, the tsc binary will be available in the root package but only in the root project. In other words, any workspace using the tsc binary in its scripts will need to declare it in its dependencies.

A good recommendation to avoid this kind of issue is to have a "tooling" workspace to contain your infrastructure tools and scripts, and have all other workspaces depend on it.

https://yarnpkg.com/features/pnp#shared-binaries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this scenario in pnpm and adding typescript to the root package.json makes tsc available in a child workspace, so we wouldn't have this problem.

But I think I will still try to keep the tooling in packages/build and see how it works.


There is also an edge case for teleport.e. Its dependencies are declared in
`packages/e-imports`, allowing them to be installed when only the OSS teleport
repository is cloned. This makes the lockfile always look the same, regardless
of teleport.e being cloned or not.
After our workspaces become “isolated”, code in teleport.e will no longer be
able to import these dependencies.
This problem could be solved by re-exporting them from `packages/e-imports`.
For example, `import Highlight from ‘react-highlight’` would become
`import Highlight from '@gravitational/e-imports/react-highlight'`.
However, the ergonomics of this solution is not ideal, as importing packages
from `@gravitational/e-imports` would be too cumbersome for developers.
Instead, we will move these dependencies to the root `package.json`, making
them explicitly available everywhere. We don't have too many packages used
exclusively by teleport.e, so this should be acceptable.
The `e-imports` workspace will be removed.

This part should be done before the actual package manager upgrade.

### Upgrading the manager
In the actual PR, we will need to:
1. Set `"packageManager": "[email protected]”` in the root package.json (the exact
version is required).
- It will install the correct package manager automatically if `corepack` is
enabled.
- Calling `yarn run ...` will return a clear error that the project uses a
different package manager.
2. Convert the lockfile to a new format (`pnpm import`).
3. Update CI, Makefile and package.json scripts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also over communicate the change to ensure no one is bit by this


The dev teams will be notified of the transition via Slack, along with examples
of the updated commands.
For example:
>`yarn start-teleport` is now `pnpm run start-teleport`

### Backports
For the best developer experience, we should change the manager in all release
branches. However, this may be really time-consuming, since all branches have
different dependencies.
Alternatively, we can add `"packageManager": "[email protected]”`to other release
branches to indicate these still use Yarn 1.x.
Loading