🐛 Fix bug in JupyterLab and VSCode where jupyter notebook metadata-only changes show in git status" #6368

Open
parminder-thindal-moj opened this issue Dec 18, 2024 · 4 comments
Labels
bug (Something isn't working), feature-request

@parminder-thindal-moj
Contributor

parminder-thindal-moj commented Dec 18, 2024

Describe the feature request.

When working with Jupyter Notebooks in JupyterLab or VS Code on the Analysis Platform (AP), I’m running into a lot of unnecessary metadata changes.

Every cell in the notebook shows a change to the id field of its metadata, even when no real edits were made, as seen in the attached images.

Related to Issue #1028

What’s Happening

In VS Code (v2.7.0):

When I open a notebook in VS Code (with the Jupyter extension), it automatically replaces each cell’s id in the metadata with a new randomly generated UUID. For example:

Before: "id": "0"

After: "id": "b8c316e1-962a-4ec0-914b-84b73353ee62"

Image

This isn’t just limited to notebooks I’ve edited—even freshly pulled files from the remote repo get updated metadata IDs:

Image

In JupyterLab (v3.1.13, Python 3.9):

The same thing happens when I open notebooks in JupyterLab.

Image

This happens consistently for all cells, including cells I haven’t edited and files that I have only pulled from the remote and never opened.
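
To confirm which field is actually being rewritten, the ids can be listed straight from the notebook JSON. The following is a minimal sketch, assuming the notebook uses the nbformat 4 layout with a top-level "cells" list. It reads both the cell-level id (introduced in nbformat 4.5) and metadata.id, because the report does not confirm which of the two the editors are changing. The function name list_cell_ids is hypothetical.

```python
import json


def list_cell_ids(notebook_json):
    """Return (index, cell-level id, metadata id) for every cell.

    Assumes an nbformat 4 style document with a top-level "cells" list.
    Either id is None when the field is absent.
    """
    nb = json.loads(notebook_json)
    rows = []
    for index, cell in enumerate(nb.get("cells", [])):
        cell_level_id = cell.get("id")
        metadata_id = cell.get("metadata", {}).get("id")
        rows.append((index, cell_level_id, metadata_id))
    return rows


# Possible usage (the file name is a placeholder):
#     with open("notebook.ipynb", encoding="utf-8") as handle:
#         for row in list_cell_ids(handle.read()):
#             print(row)
```

Running this on the committed version and on the working copy of the same notebook should show whether only the ids differ, and in which location.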

Why This is a Problem

1. It Breaks Git Workflows

Because every cell gets marked as changed, Git commands become really hard to use:

git stash or discarding changes: Stashing often gets stuck in a loop because all cells are flagged as modified.

git rebase, git pull/push, git switch/checkout: These commands frequently fail due to metadata conflicts that need to be resolved manually.

2. Version Control is a Mess

Meaningful changes in the notebook are buried under all the metadata changes. The git diff is almost unusable because it’s full of irrelevant metadata updates.
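
One way to separate real edits from id noise is to compare two versions of a notebook with the ids removed. This is a sketch under assumptions, not an existing tool: the helper names are hypothetical, and it drops both the cell-level id and metadata.id because it is unclear which one is being rewritten.

```python
import copy


def without_cell_ids(notebook):
    """Return a deep copy of a notebook dict with cell ids removed.

    Removes both the cell-level "id" and "metadata.id"; which of the two
    the editors rewrite is an assumption, so both are covered.
    """
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        cell.pop("id", None)
        cell.get("metadata", {}).pop("id", None)
    return stripped


def same_ignoring_ids(before, after):
    """True when two notebook dicts are identical once ids are ignored."""
    return without_cell_ids(before) == without_cell_ids(after)


# Illustration using the ids quoted in this report.
before = {"cells": [{"cell_type": "code", "metadata": {"id": "0"},
                     "source": ["x = 1"]}]}
after = {"cells": [{"cell_type": "code",
                    "metadata": {"id": "b8c316e1-962a-4ec0-914b-84b73353ee62"},
                    "source": ["x = 1"]}]}
print(same_ignoring_ids(before, after))  # True
```

A check like this could help during quality assurance to tell whether a notebook flagged as modified contains any change other than ids.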

3. Cross-Environment Issues

The problem seems worse when:

A notebook is opened in JupyterLab after being created or edited in VS Code (could be due to virtual environment differences).

Pulling changes from the remote repo also triggers these updates.

Workarounds I’ve Tried

To deal with this, I’ve had to resort to:

  • Force-checking out branches.

  • Re-cloning the repo.

  • Deleting and re-adding files.

  • Using nbstripout to remove metadata from notebooks before committing.

nbstripout helps reduce conflicts, but it also leads to blank diffs and commits, which isn’t ideal, and recommitting empty changes is cumbersome for the user.

What I’m Asking For

Can we look into how notebook metadata (especially id fields) is handled in these tools? Specifically:

Is there a way to prevent unnecessary metadata changes when switching between JupyterLab and VS Code?

Can the id fields be standardized so they don’t change every time a notebook is opened?

Are there better tools or settings we can use to avoid this issue altogether?
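
As one possible direction for standardizing the ids, they could be rewritten to stable values before Git sees the file. The sketch below is an assumption-laden illustration, not a confirmed fix: the function name normalise_cell_ids and the "cell-N" scheme are hypothetical, it assumes the ids carry no meaning beyond uniqueness within a notebook, and it only rewrites id fields that already exist.

```python
import json


def normalise_cell_ids(notebook_json):
    """Rewrite every existing cell id to a stable, position-based value.

    Assumption: "cell-0", "cell-1", ... is acceptable because the ids only
    need to be unique within the notebook. Cells without an id are left
    without one.
    """
    nb = json.loads(notebook_json)
    for index, cell in enumerate(nb.get("cells", [])):
        stable_id = f"cell-{index}"
        if "id" in cell:
            cell["id"] = stable_id
        metadata = cell.get("metadata")
        if isinstance(metadata, dict) and "id" in metadata:
            metadata["id"] = stable_id
    # indent=1 plus a trailing newline is an assumption about how the
    # notebooks are saved on disk; adjust to match the repository.
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


# Sketch of use as a Git clean filter (reads stdin, writes stdout):
#     import sys
#     sys.stdout.write(normalise_cell_ids(sys.stdin.read()))
```

Git clean filters, set up through .gitattributes and a filter.<name>.clean entry in the Git config, run a command over file content when it is staged, which is the same mechanism nbstripout can install. The trade-off of position-based ids is that inserting a cell renumbers every cell after it, so this would need testing against real workflows before adoption.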

Value / Purpose

  • Reduce frustration from users.
  • Streamline collaboration and enable cross-environment workflows between JupyterLab and VS Code.
  • Improve git usability.
  • Reduce version control noise.
@joeprinold

This is an issue that we come across too. It requires quite a lot of unpicking and probably results in some material changes being missed during quality assurance.

jacobwoffenden changed the title on Dec 18, 2024

Before: :insect: Fix bug in JupyterLab and VSCode where jupyter notebook metadata-only changes show in git status"

After: 🐛 Fix bug in JupyterLab and VSCode where jupyter notebook metadata-only changes show in git status"

@parminder-thindal-moj
Contributor Author

parminder-thindal-moj commented Dec 23, 2024

Image

This may or may not be relevant to this issue, but I’m adding this terminal screenshot from JupyterLab to highlight what happens when trying to rebase from main on a feature branch, with a notebook that has a blank cell id in the metadata.

After running git reset --hard to restore the branch to the latest HEAD, changes are still picked up in test files even though none were actively made.

Image

@laura-williams2

Hello, this is causing quite a headache for my team at the moment - do you know when we might have a solution? Thanks

@vmohanram

Hi AP Team, has there been any progress on this, please? I am in the same team as @parminder-thindal-moj and @laura-williams2, and this issue is now impacting all of us. Thanks.

Projects
Status: 👀 TODO
Development

No branches or pull requests

4 participants