Roadmap

This file serves only as a backlog and a draft for ideas.

Current version: 0.9.0

Next version 0.10.0

  • Admin endpoint for projects, space used, registry, and so forth
  • Fix: rebuilding a runtime in the current mode doesn't update the registry value in the db
  • Client unification between DiskClient and NBClient
  • Remove project information from labfile?
  • More login options, such as allowing/disallowing refresh tokens
  • Tests: >= 65%
  • Fix broken tests
  • Pull docker image before container execution (useful for current images)

MAY:

  • Plugin system for the CLI?
  • Allow users to custom-tag current images
  • Import or change a project in an existing lab functions repo
  • Registration endpoint?
  • Custom tagging of current builds?

Version 0.9.0

MUST:

  • Rename the workflows.yaml file to labfile.yaml

  • GPU scaling

  • Migration from rq to libq

  • Refresh Token: rotates with each refresh request (idea: track the refresh token together with the access_token..)

  • Cluster client and Cluster API for creation, destruction and listing of agents and machines

  • Fix smart_open dependency for the client

  • Doc: release process

  • Task execution

  • Support Python 3.10

  • scopes implementation for cluster endpoints

  • Fix lab manager db upgrade command

  • [-] Manage agent credentials state on temporal disk
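The refresh token item in the list above (rotate on each refresh request, tracking the refresh token together with its access_token) could look roughly like the sketch below. It is only an illustration under stated assumptions: `TokenStore`, `issue` and `rotate` are hypothetical names, the store is an in-memory dict, and the tokens are random strings rather than the JWTs the server actually issues.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TokenStore:
    """Hypothetical in-memory stand-in for wherever the server persists tokens."""

    # Maps each live refresh token to the access token it was issued with.
    active: Dict[str, str] = field(default_factory=dict)

    def issue(self) -> Tuple[str, str]:
        """Create a new (access_token, refresh_token) pair and remember the pairing."""
        access = secrets.token_urlsafe(16)
        refresh = secrets.token_urlsafe(32)
        self.active[refresh] = access
        return access, refresh

    def rotate(self, access: str, refresh: str) -> Tuple[str, str]:
        """Exchange a pair for a fresh one; the old refresh token stops working."""
        # pop() makes the refresh token single-use: it is removed even when
        # the access token does not match, which revokes a suspicious token.
        paired = self.active.pop(refresh, None)
        if paired is None or not secrets.compare_digest(paired, access):
            raise PermissionError("invalid or already used refresh token")
        return self.issue()
```

With this shape, replaying an old refresh token fails because it was removed from the store on first use.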

MAY:

  • Project delete task
  • Watch events logs on demand
  • Better info context for cli and lib
  • [-] Timeouts default for server, tasks and clients
  • Refactor of cluster package
  • Packer images for agent (default, gpu/nvidia)
  • Agent client should keep credentials in an internal cache
  • The agent should first pull the docker image for the notebook execution (maybe only if the tag is current)

Details

The goal of this release is to keep improving the stability of the system.

The focus will be on observability and security.

A second goal, though maybe for a following release, is adding other storage options tied to Google Cloud and AWS.

Version 0.8.0

MUST

  • NBTask args refactor
  • Does runtimes struct need a name to be associated with?
  • RQWorker: Overwrite worker self ip discovery logic
  • RQWorker: Activity time
  • RQWorker: Agent token generation for workers
  • Workflows: Allow workflow execution by alias (the same for history)
  • Review settings strategy for server/worker
  • A machine types registry by cloud vendor (gcloud / digitalocean)
  • Autoscaling workers
  • Doc: how to install client nb-workflows
  • Tests: >= 55%
  • Security: direct implementation of a JWT System
  • A default runtime available for a first time project creation
  • GStorage plugin implementation to be used as fileserver (notebooks result, project uploads and so on)
  • Executors: Review function logic and resilience for errors.
  • General client config added

MAY

  • Refactor: clean up deprecated packages such as core, agent and agent_client
  • Review ExecutionTaskResult
  • CI/CD: constraint merges to main branch
  • Doc: about architecture
  • Doc: branch and release strategy adopted
  • Doc: User guide
  • Doc: how to install client nb-workflows
  • [-] Notifications: slack and discord (from previous release)

Details

The goal of this release is to add support for autoscaling worker machines.

Version 0.7.0

MUST:

  • Executions: Namespacing Execution ID for tracking loads.
  • Executors: Execution ID injected from Web
  • Notifications: slack and discord (work on this continues in the next release)
  • Tests: >= 40%
  • History: Execution result of a Workflow
  • Web: API versioning
  • Models: Alembic migrations implemented
  • Copy outputs executions locally
  • Project: During first-time project creation, review the feedback given when the project name already exists

MAY:

  • Add project as a mixin to the History Model, which allows seeing full execution details per project
  • Log execution streaming
  • Custom Errors
  • Example project using nb-workflows
  • Split NBClient into UserClient and AgentClient
  • Track dockerfile versions.
  • Types: NBTask as pydantic model.
  • Types: ScheduleData as pydantic model.
  • Projects: File spec change where each Workflow Name will be a dict (like services in docker-compose.yml)
  • Clients refactoring: one client for the command line (with filesystem side effects), another one as agent.

Details

The goal of this release is a functional system that delivers on the promise of remote execution of notebooks for production loads. With that in mind, the focus will be on stabilizing workflow executions, adding test cases, execution feedback and cli enhancements.

Backlog

Independent of any scheduled release

  • Sequential and multiple executions: option to share folders between workflows.
  • Option to convert a collab into a workflow in a project
  • Option to create a project from a notebook
  • Agent that communicates with the server only by long-polling (see GitHub Actions Runner)
  • Review private_key strategy, evaluate sealed boxes
  • Allow the control plane to spawn machines
  • If a job dies by timeout or by a runtime error, the spawned docker container will still be running; review this case.
  • Default project for each user? This would allow uploading and executing notebooks from any place without worrying about dependencies.
  • Separation between client and server, settings flag ? base settings shared?
  • [-] Jupyter on demand instance (locally)
  • Executors: Docker volumes specification (cancelled)
  • Optional [Any] Dockerfile RUN command definition from settings (cancelled)
  • Executors: SeqPipe implementation (cancelled)
  • Security: constraint access by scopes(permissions) and claims(by project)
  • In WorkflowState, NBTask could be a List instead of a single value; this could allow sequential executions of notebooks.
  • Evaluate generating dockerfiles only on the server side
  • Prometheus metrics
  • E2E: complete testing of project creation, workflows push and execution
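The long-polling agent idea from the backlog above could be sketched as the loop below. This is a rough illustration only: `long_poll`, `fetch_from_queue` and the `pending` queue are hypothetical names, and the queue stands in for an HTTP request that the server would hold open until work is available or a timeout expires.

```python
import queue
import threading
from typing import Callable, Optional


def long_poll(
    fetch: Callable[[float], Optional[str]],
    handle: Callable[[str], None],
    stop: threading.Event,
    wait: float = 30.0,
) -> None:
    """Keep asking for work; each fetch blocks for up to `wait` seconds."""
    while not stop.is_set():
        # fetch returns None when nothing arrived before the timeout,
        # in which case the agent simply asks again.
        task = fetch(wait)
        if task is not None:
            handle(task)


# Hypothetical stand-in for the server side: a queue of pending task names.
pending: "queue.Queue[str]" = queue.Queue()


def fetch_from_queue(wait: float) -> Optional[str]:
    """Block until a task is available or `wait` seconds pass."""
    try:
        return pending.get(timeout=wait)
    except queue.Empty:
        return None
```

Because the agent always initiates the connection, it would not need an open inbound port, which is the main appeal of this design.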