Cell dependency graph #1175

Open
nvdv opened this issue Mar 5, 2016 · 29 comments

@nvdv
Contributor

nvdv commented Mar 5, 2016

At present all Notebook cells are executed linearly:

Cell 1
   |
Cell 2
   |
Cell 3

but sometimes there's no need to calculate Cell 2 in order to get the result from Cell 3, and calculating Cell 2 might be time-consuming.
Being able to define a cell dependency graph would resolve this issue.

@takluyver
Member

Have a look at ipycache if you have long-running cells that you don't always want to re-run. I don't think we want to get into defining a DAG of cells.
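
To illustrate the caching idea in plain Python, here is a minimal sketch. It is not ipycache's code or API; the helper `cached`, the function `slow_step`, and the file name `cell2.pkl` are all made up for this example. The result of an expensive step is pickled on the first run and loaded from disk afterwards.

```python
import os
import pickle
import tempfile


def cached(path, compute):
    """Return the pickled result stored at `path`, computing and saving it on a miss."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_step():
    """Stand-in for a long-running cell (hypothetical)."""
    calls.append(1)  # record each real execution
    return sum(range(1000))


cache_file = os.path.join(tempfile.mkdtemp(), "cell2.pkl")
first = cached(cache_file, slow_step)   # runs slow_step and writes the cache
second = cached(cache_file, slow_step)  # loads the pickle; slow_step is not called again
```

The obvious trade-off is staleness: nothing here invalidates the cache when the inputs of the step change.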

Carreau added this to the no action milestone Mar 7, 2016
@Carreau
Member

Carreau commented Mar 7, 2016

There is a long thread we had a few years[*] ago about that on the mailing list.

[*] OMG I'm old now.

@JamiesHQ
Member

@nvdv: We're doing a little housekeeping on our issue log and noticed this thread from 2016. Has this issue been resolved to your satisfaction, and can it be closed? Thanks!

@nvdv
Contributor Author

nvdv commented Apr 27, 2017 via email

@adam-m-jcbs

The long thread discussing this, linked above by @Carreau , is unreachable for me. So apologies if I'm rehashing things discussed there.

I certainly agree that managing a DAG of cells is not desirable. But it would be cool if there were a built-in cell magic for declaring which cells should be run automatically before the current cell. Naively, this doesn't seem too burdensome a feature to implement, but I'm mostly a Jupyter notebook user, not a developer, so I could be wrong. Does any such cell magic exist, or is there a cell magic that could be used for this purpose?

@mxxun

mxxun commented Oct 17, 2018

For future reference: the long thread was moved.

@nickurak

Conversely, while a dependency graph might tell you you don't need to evaluate/re-evaluate cell B just because A changed, it might also tell you that you're going to have a bad time trying to evaluate C if C depends on A.

In accordance with https://jupyter-notebook.readthedocs.io/en/stable/security.html , if someone tried to execute a cell that depended on another, I wonder if it would make sense to do so automatically?

At a minimum, it might be helpful to have some visual feedback to indicate that the cell isn't runnable until some particular cell above satisfies its dependencies.
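
A minimal sketch of both checks, assuming the dependencies are already known as a plain mapping (the cell labels and the helpers `downstream` and `missing_prerequisites` are hypothetical, not part of Jupyter): one function finds the cells that become stale when a cell changes, the other reports which prerequisites still block a cell.

```python
from collections import deque

# Hypothetical dependency map: each cell lists the cells it depends on.
depends_on = {
    "A": [],
    "B": [],
    "C": ["A"],
    "D": ["C", "B"],
}


def downstream(changed, depends_on):
    """Return every cell that directly or transitively depends on `changed`."""
    # Invert the map so we can walk from a cell to its dependents.
    dependents = {cell: [] for cell in depends_on}
    for cell, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(cell)

    stale, queue = set(), deque([changed])
    while queue:
        for child in dependents[queue.popleft()]:
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


def missing_prerequisites(cell, depends_on, executed):
    """Return the direct dependencies of `cell` that have not been executed yet."""
    return [dep for dep in depends_on[cell] if dep not in executed]


stale_after_a = downstream("A", depends_on)  # B is untouched by a change to A
blocked_by = missing_prerequisites("C", depends_on, executed={"B"})
```

A frontend could use the second result to grey out a cell until the listed cells have run.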

@pedrovgp

@takluyver, is there any reason for a DAG of cells to be out of the question? Visualising cells in a graph would certainly make cell dependencies clearer and would also improve storytelling capabilities, since non-linear (branching) stories are hard to tell within today's notebooks.

For a simple concrete example: imagine a notebook to evaluate three real estate expansion plans for a given city. The first node of cells loads the current real estate data and describes the current state of affairs. From there, you get three branches, each following similar logic but different scenario premises and arriving at comparable (but different) end results.

Today, this analysis could be done using a chapter for each scenario, but that still requires scrolling up and down to compare, possibly unclear instructions about which cells to run before scenario A, maybe (accidentally) re-running scenario A before B (run all is sooo easy to click on), etc.

@jasongrout
Member

I think using a magic (or cell metadata) to explicitly define dependencies for a DAG of cells is a very interesting idea. I think automatically coming up with the DAG on the front end is probably prohibitively hard, given that we have a number of kernels of different languages. There was some work from a CalPoly group of students on a kernel that would keep track of a DAG, IIRC, somewhat like ObservableHQ.
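
As a rough sketch of the explicit-metadata variant: assume each cell carries a name and a list of the names it depends on in its metadata. The `depends_on` key is invented for this example and is not part of the notebook format. With that information, a valid execution order follows from a topological sort, here using the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Cells shaped like notebook cells. "depends_on" is a hypothetical metadata key.
cells = [
    {"source": "plot(df)", "metadata": {"name": "plot", "depends_on": ["load"]}},
    {"source": "df = load()", "metadata": {"name": "load", "depends_on": ["utility"]}},
    {"source": "def load(): ...", "metadata": {"name": "utility"}},
]


def execution_order(cells):
    """Return cell names ordered so that every cell comes after its dependencies."""
    graph = {
        cell["metadata"]["name"]: set(cell["metadata"].get("depends_on", []))
        for cell in cells
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(cells)

# A cycle cannot be ordered; graphlib reports it as CycleError.
cyclic = [
    {"source": "", "metadata": {"name": "a", "depends_on": ["b"]}},
    {"source": "", "metadata": {"name": "b", "depends_on": ["a"]}},
]
try:
    execution_order(cyclic)
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Because the dependencies are declared rather than inferred from the code, this works the same way for any kernel language.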

@nickurak

Because it's been a year, and this idea has been bouncing around my head a little -- here's a sketch of a thought in this area:

I'd be really interested in a world where cells run in actual scopes and are more explicit about what they pull in from each other. This might be reasonably easy in Python, but maybe tricky in other languages.

# Cell 1 (code)
label_cell("utility")
def func_that_makes_a_df():
    <code>

<Some markdown explaining that function>

# Cell 2 (code)
label_cell("get_pf")
from cell("utility") import func_that_makes_a_df
df = func_that_makes_a_df()

<Some markdown that talks about a dataframe>

# Cell 3 (code)
from cell("get_pf") import df as plottable_df
import plotly

plotly.plot_something(plottable_df)

Making the only things that are shared between cells super-explicit might help:

  • reduce all kinds of unexpected behavior and unexpected side-effects of scope mixing
  • allow Jupyter to reason about the dependencies
  • give good errors when the dependencies are missing
  • automatically execute cells as they're needed.

I haven't really thought at all about what this might look like outside of the Python world.

@nickurak

In that world, attempting to refer to func_that_makes_a_df in a cell that isn't explicitly importing it from another cell would, for example, fail, with a NameError: name 'func_that_makes_a_df' is not defined exception.
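
The scoping behaviour can be approximated in ordinary Python today. The sketch below is a toy model only: `ScopedNotebook` and its `run` method are hypothetical names, and the `imports` argument stands in for the proposed `from cell(...) import ...` syntax. Each cell is executed with `exec` in its own namespace, and only explicitly imported names are copied in.

```python
class ScopedNotebook:
    """Toy runner in which each labelled cell gets its own namespace."""

    def __init__(self):
        self.namespaces = {}  # label -> namespace dict left behind by that cell

    def run(self, label, source, imports=None):
        # `imports` maps another cell's label to the names pulled from it.
        namespace = {}
        for other, names in (imports or {}).items():
            if other not in self.namespaces:
                raise RuntimeError(f"cell {label!r} needs cell {other!r}, which has not run")
            for name in names:
                namespace[name] = self.namespaces[other][name]
        exec(source, namespace)
        self.namespaces[label] = namespace
        return namespace


nb = ScopedNotebook()
nb.run("utility", "def make_numbers():\n    return [1, 2, 3]")
ns = nb.run(
    "get_numbers",
    "numbers = make_numbers()",
    imports={"utility": ["make_numbers"]},
)

# Without an explicit import, the name is simply not visible.
try:
    nb.run("bad", "numbers = make_numbers()")
    error = None
except NameError as exc:
    error = str(exc)
```

Since `imports` names the source cell, a runner like this would also have enough information to execute missing cells on demand instead of raising.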

@pedrovgp

@nickurak, I can see other use cases for that, but the use case you've described could be solved by establishing cell dependencies and splitting code into different cells accordingly. That would be a more generic approach as well, since it could apply to other languages.

Your example would be something like:

  • Label cell 1 as "utility"
  • Label cell 2 as "get_pf"
  • Add "depends on 'utility'" to cell "get_pf"
  • Add "depends on 'get_pf'" to cell 3 (which plots something)

If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.
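
A small sketch of how such labels could drive execution, assuming the "depends on" declarations are available as a mapping (the `slow_report` cell and the helper `cells_to_run` are hypothetical additions for illustration): to run a target cell, collect only its transitive dependencies, dependencies first, and skip everything else.

```python
# Labels and "depends on" declarations from the list above, plus one
# hypothetical expensive cell that the plot does not need.
depends_on = {
    "utility": [],
    "slow_report": ["utility"],
    "get_pf": ["utility"],
    "plot": ["get_pf"],
}


def cells_to_run(target, depends_on):
    """Return `target` and its transitive dependencies, dependencies first."""
    order, seen = [], set()

    def visit(cell, path=()):
        if cell in path:
            raise ValueError(f"dependency cycle through {cell!r}")
        if cell in seen:
            return
        for dep in depends_on[cell]:
            visit(dep, path + (cell,))
        seen.add(cell)
        order.append(cell)

    visit(target)
    return order


plan = cells_to_run("plot", depends_on)  # slow_report is never scheduled
```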

@pedrovgp

I have worked on a (quick and dirty) visual proposal for how to use cell dependencies to facilitate storytelling and organize notebook flows. It probably makes more sense in the JupyterLab project, but anyway, this is what I envision: https://docs.google.com/presentation/d/1nWAjvuCZb4MEu9SiTy-QWfMWBThpDpZFnuKNp1S_fHs/edit?usp=sharing

Any comments are appreciated.

@toobaz
Contributor

toobaz commented Nov 25, 2019

If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.

A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?

@pedrovgp

Seems like it is going to be a part of JupyterLab Core [https://github.com/jupyterlab/jupyterlab-celltags]

@jasongrout
Member

A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?

Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata

@toobaz
Contributor

toobaz commented Nov 26, 2019

Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata

Cool! And is this already exposed somewhere?

@jasongrout
Member

Cool! And is this already exposed somewhere?

It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key. Jupyter notebook and JupyterLab, for example, expose an interface for writing to the cell metadata.

@jasongrout
Member

(To be clear, as with any metadata, it is optional and up to the writer to set this value. It is not set by default in JupyterLab, though it may be set in the notebook by default to some sort of UUID.)
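
For illustration, a minimal sketch of writing and reading that key with nothing but the standard library. The notebook is a hand-built, nbformat-4-style dictionary, and the helpers `set_cell_name` and `find_cell` are hypothetical, not functions from any Jupyter package.

```python
import json

# A minimal notebook as plain data, in the shape of the nbformat 4 JSON layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "id": "cell-1", "metadata": {}, "source": "x = 1",
         "execution_count": None, "outputs": []},
        {"cell_type": "code", "id": "cell-2", "metadata": {}, "source": "y = x + 1",
         "execution_count": None, "outputs": []},
    ],
}


def set_cell_name(nb, index, name):
    """Store `name` in the cell's metadata, refusing duplicates."""
    if any(cell["metadata"].get("name") == name for cell in nb["cells"]):
        raise ValueError(f"a cell named {name!r} already exists")
    nb["cells"][index]["metadata"]["name"] = name


def find_cell(nb, name):
    """Return the first cell whose metadata name matches, or None."""
    return next((c for c in nb["cells"] if c["metadata"].get("name") == name), None)


set_cell_name(notebook, 0, "utility")
set_cell_name(notebook, 1, "get_pf")

# The name survives a save/load round trip because it is ordinary JSON metadata.
reloaded = json.loads(json.dumps(notebook))
found = find_cell(reloaded, "get_pf")
```

Uniqueness is enforced here by the helper only; as noted above, it is up to whoever writes the metadata to keep names unique.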

@toobaz
Contributor

toobaz commented Nov 26, 2019

It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key.

Yes, sorry, my question was misleading. I should have asked: is there already some UI for allowing the user to see/change this?

@jasongrout
Member

Yes (though it's just a JSON editor). In JupyterLab, it's the wrench icon in the left sidebar. In classic notebook, it's View > Cell Toolbar > Edit Metadata.

@Carreau
Member

Carreau commented Nov 26, 2019

In case that has not been posted already, please see also https://github.com/dataflownb and https://github.com/stitchfix/nodebook

@Carreau
Member

Carreau commented Nov 26, 2019

Both of those got talks at JupyterCon in 2018, so the recordings should be somewhere on YouTube.

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/dag-based-notebooks/11173/2

@stefaneidelloth

stefaneidelloth commented Oct 13, 2021

https://observablehq.com/ uses a DAG and I would love to see a JupyterLab extension providing similar features:

https://observablehq.com/@observablehq/how-observable-runs

Edit

Moved overview of projects to jupyterlab:
https://discourse.jupyter.org/t/dag-based-notebooks/11173/4

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-to-get-output-model-for-a-given-cell-in-a-jupyterlab-extension/11342/1

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/dag-based-notebooks/11173/4

@jondo

jondo commented Oct 14, 2024

Also see https://marimo.io/ .

@krassowski
Member

It's surprising that no one mentioned https://github.com/ipyflow/ipyflow.
