Breaking Changes
- The default sqlite and dagster-postgres implementations have been altered to extract the event step_key field as a column, to enable faster per-step queries. You will need to run dagster instance migrate to update the schema. You may optionally migrate your historical event log data to extract the step_key using the migrate_event_log_data function. This will ensure that your historical event log data will be captured in future step-key based views. This event_log data migration can be invoked as follows:

      from dagster.core.storage.event_log.migration import migrate_event_log_data
      from dagster import DagsterInstance

      migrate_event_log_data(instance=DagsterInstance.get())

- We have made pipeline metadata serializable and persist that along with run information. While there are no user-facing features to leverage this yet, it does require an instance migration: run dagster instance migrate. If you have already run the migration for the event_log changes above, you do not need to run it again. Any unforeseen errors related to the new snapshot_id in the runs table or the new snapshots table are related to this migration.
- dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and ColumnDTypeInSetConstraint.
New
- You can now specify that dagstermill output notebooks be yielded as an output from dagstermill solids, in addition to being materialized.
- You may now set the extension on files created using the FileManager machinery.
- dagster-pandas typed PandasColumn constructors now support pandas 1.0 dtypes.
- The Dagit Playground has been restructured to make the relationship between Preset, Partition Sets, Modes, and subsets more clear. All of these buttons have been reconciled and moved to the left side of the Playground.
- Config sections that are required but not filled out in the Dagit playground are now detected and labeled in orange.
- dagster-celery config now supports using env: to load from environment variables.
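As a rough illustration of the env: convention for loading config values from environment variables, the sketch below resolves {"env": NAME} entries in a nested config dictionary against the process environment. It is a standalone model and not dagster-celery's implementation; resolve_env, the config keys, and the BROKER_URL variable are hypothetical names chosen for the example.

```python
import os


def resolve_env(config):
    """Recursively replace {'env': NAME} mappings with os.environ[NAME]."""
    if isinstance(config, dict):
        if set(config) == {"env"}:
            name = config["env"]
            if name not in os.environ:
                # Fail loudly rather than silently passing None downstream.
                raise KeyError("environment variable %s is not set" % name)
            return os.environ[name]
        return {key: resolve_env(value) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve_env(item) for item in config]
    return config


# Hypothetical variable and config, for illustration only.
os.environ["BROKER_URL"] = "pyamqp://guest@localhost//"
config = {"broker": {"env": "BROKER_URL"}, "include": ["my_module"]}
resolved = resolve_env(config)
```

Values that are not env mappings pass through unchanged, so literal and environment-backed settings can be mixed in one config.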
Bugfix
- Fixed a bug where selecting a preset in dagit would not populate tags specified on the pipeline definition.
- Fixed a bug where metadata attached to a raised Failure was not displayed in the error modal in dagit.
- Fixed an issue where reimporting dagstermill and calling dagstermill.get_context() outside of the parameters cell of a dagstermill notebook could lead to unexpected behavior.
- Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using the Postgres-backed storages.
Experimental
- Added a longitudinal view of runs on the Schedule tab for scheduled, partitioned pipelines. Includes views of run status, execution time, and materializations across partitions. The UI is in flux and is currently optimized for daily schedules, but feedback is welcome.
Breaking Changes
- default_value in Field no longer accepts native instances of python enums. Instead the underlying string representation in the config system must be used.
- default_value in Field no longer accepts callables.
- The dagster_aws imports have been reorganized; you should now import resources from dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch modules.
- The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can now just use any boto3 S3 API directly on a S3 resource, e.g. context.resources.s3.list_objects_v2. (#2292)
New
- New Playground view in dagit showing an interactive config map
- Improved storage and UI for showing schedule attempts
- Added the ability to set default values in InputDefinition
- Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
- Added ability to specify pipeline run tags using the CLI
- Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid as context.pdb
- Added PresetDefinition.with_additional_config to allow for config overrides
- Added resource name to log messages generated during resource initialization
- Added grouping tags for runs that have been retried / reexecuted.
Bugfix
- Fixed a bug where date range partitions with a specified end date were clipping the last day
- Fixed an issue where some schedule attempts that failed to start would be marked running forever.
- Fixed the @weekly partitioned schedule decorator
- Fixed timezone inconsistencies between the runs view and the schedules view
- Integers are now accepted as valid values for Float config fields
- Fixed an issue when executing dagstermill solids with config that contained quote characters.
dagstermill
- The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel flag.
dagster-dbt
- dbt_solid now has a Nothing input to allow for sequencing
dagster-k8s
- Added get_celery_engine_config to select celery engine, leveraging Celery infrastructure
Documentation
- Improvements to the airline and bay bikes demos
- Improvements to our dask deployment docs (Thanks jswaney!!)
New
- Added the IntSource type, which lets integers be set from environment variables in config.
- You may now set tags on pipeline definitions. These will resolve in the following cases:
  - Loading in the playground view in Dagit will pre-populate the tag container.
  - Loading partition sets from the preset/config picker will pre-populate the tag container with the union of pipeline tags and partition tags, with partition tags taking precedence.
  - Executing from the CLI will generate runs with the pipeline tags.
  - Executing programmatically using the execute_pipeline API will create a run with the union of pipeline tags and RunConfig tags, with RunConfig tags taking precedence.
  - Scheduled runs (both launched and executed) will have the union of pipeline tags and the schedule tags function, with the schedule tags taking precedence.
- Output materialization configs may now yield multiple Materializations, and the tutorial has been updated to reflect this.
- We now export the SolidExecutionContext in the public API so that users can correctly type hint solid compute functions.
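The tag resolution rules above all follow the same pattern: take the union of two tag sets and let the more specific one win on conflict. A minimal sketch of that rule, assuming tags are plain string-to-string dictionaries (merge_tags and the sample tag values are hypothetical, not part of the Dagster API):

```python
def merge_tags(pipeline_tags, override_tags):
    """Union two tag dicts; keys in override_tags win on conflict."""
    merged = dict(pipeline_tags)
    merged.update(override_tags)
    return merged


# Hypothetical tags: the partition overrides "owner" and adds "partition".
pipeline_tags = {"team": "data", "owner": "alice"}
partition_tags = {"owner": "bob", "partition": "2020-03-01"}
run_tags = merge_tags(pipeline_tags, partition_tags)
```

The same function models the RunConfig and schedule cases by passing those tags as the second argument.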
Dagit
- Pipeline run tags are now preserved when resuming/retrying from Dagit.
- Scheduled run stats are now grouped by partition.
- A "preparing" section has been added to the execution viewer. This shows steps that are in progress of starting execution.
- Markers emitted by the underlying execution engines are now visualized in the Dagit execution timeline.
Bugfix
- Resume/retry now works as expected in the presence of solids that yield optional outputs.
- Fixed an issue where dagster-celery workers were failing to start in the presence of config values that were None.
- Fixed an issue with attempting to set threads_per_worker on Dask distributed clusters.
dagster-postgres
- All postgres config may now be set using environment variables in config.
dagster-aws
- The s3_resource now exposes a list_objects_v2 method corresponding to the underlying boto3 API. (Thanks, @basilvetas!)
- Added the redshift_resource to access Redshift databases.
dagster-k8s
- The K8sRunLauncher config now includes the load_kubeconfig and kubeconfig_file options.
Documentation
- Fixes and improvements.
Dependencies
- dagster-airflow no longer pins its werkzeug dependency.
Community
- We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform development priorities. Telemetry data will motivate projects such as adding features in frequently-used parts of the CLI and adding more examples in the docs in areas where users encounter more errors.

  We will not see or store solid definitions (including generated context) or pipeline definitions (including modes and resources). We will not see or store any data that is processed within solids and pipelines.

  If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml:

      telemetry:
        enabled: true

- Thanks to @basilvetas and @hspak for their contributions!
New
- It is now possible to use Postgres to back schedule storage by configuring dagster_postgres.PostgresScheduleStorage on the instance.
- Added the execute_pipeline_with_mode API to allow executing a pipeline in test with a specific mode without having to specify RunConfig.
- Experimental support for retries in the Celery executor.
- It is now possible to set run-level priorities for backfills run using the Celery executor by passing --celery-base-priority to dagster pipeline backfill.
- Added the @weekly schedule decorator.
Deprecations
- The dagster-ge library has been removed from this release due to drift from the underlying Great Expectations implementation.
dagster-pandas
- PandasColumn now includes an is_optional flag, replacing the previous ColumnExistsConstraint.
- You can now pass the ignore_missing_values flag to PandasColumn in order to apply column constraints only to the non-missing rows in a column.
dagster-k8s
- The Helm chart now includes provision for an Ingress and for multiple Celery queues.
Documentation
- Improvements and fixes.
New
- It is now possible to configure a dagit instance to disable executing pipeline runs in a local subprocess.
- Resource initialization, teardown, and associated failure states now emit structured events visible in Dagit. Structured events for pipeline errors and multiprocess execution have been consolidated and rationalized.
- Support Redis queue provider in dagster-k8s Helm chart.
- Support external postgresql in dagster-k8s Helm chart.
Bugfix
- Fixed an issue with inaccurate timings on some resource initializations.
- Fixed an issue that could cause the multiprocess engine to spin forever.
- Fixed an issue with default value resolution when a config value was set using SourceString.
- Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
- Fixed an issue where the CLI command dagster schedule up would fail in certain scenarios with the SystemCronScheduler.
Pandas
- Column constraints can now be configured to permit NaN values.
Dagstermill
- Removed a spurious dependency on sklearn.
Docs
- Improvements and fixes to docs.
- Restored dagster.readthedocs.io.
Experimental
- An initial implementation of solid retries, throwing a RetryRequested exception, was added. This API is experimental and likely to change.
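To make the retry-by-exception idea concrete, here is a small self-contained model of the control flow. It does not import Dagster: the RetryRequested class below is a local stand-in, and run_with_retries and its max_retries parameter are hypothetical, so treat this as a sketch of the pattern rather than of the experimental API itself.

```python
class RetryRequested(Exception):
    """Local stand-in for the exception a solid raises to ask for a retry."""


def run_with_retries(compute_fn, max_retries=2):
    """Call compute_fn, re-running it each time it raises RetryRequested.

    Returns (result, attempts). Re-raises once max_retries is exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return compute_fn(), attempts
        except RetryRequested:
            if attempts > max_retries:
                raise


calls = {"count": 0}


def flaky():
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryRequested()
    return "ok"


result, attempts = run_with_retries(flaky)
```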
Other
- Renamed property runtime_type to dagster_type in definitions. The following are deprecated and will be removed in a future version.
  - InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
  - OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
  - CompositeSolidDefinition.all_runtime_types is deprecated. Use CompositeSolidDefinition.all_dagster_types instead.
  - SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types instead.
  - PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type instead.
  - PipelineDefinition.runtime_type_named is deprecated. Use PipelineDefinition.dagster_type_named instead.
  - PipelineDefinition.all_runtime_types is deprecated. Use PipelineDefinition.all_dagster_types instead.
Docs
- New docs site at docs.dagster.io.
- dagster.readthedocs.io is currently stale due to availability issues.
New
- Improvements to S3 Resource. (Thanks @dwallace0723!)
- Better error messages in Dagit.
- Better font/styling support in Dagit.
- Changed OutputDefinition to take is_required rather than is_optional argument. This is to remain consistent with changes to Field in 0.7.1 and to avoid confusion with python's typing and dagster's definition of Optional, which indicates None-ability, rather than existence. is_optional is deprecated and will be removed in a future version.
- Added support for Flower in dagster-k8s.
- Added support for environment variable config in dagster-snowflake.
Bugfixes
- Improved performance in Dagit waterfall view.
- Fixed bug when executing solids downstream of a skipped solid.
- Improved navigation experience for pipelines in Dagit.
- Fixes for the dagster-aws CLI tool.
- Fixed issue starting Dagit without DAGSTER_HOME set on Windows.
- Fixed pipeline subset execution in partition-based schedules.
Dagit
- Dagit now looks up an available port on which to run when the default port is not available. (Thanks @rparrapy!)
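One common way to implement this kind of fallback is to try binding the preferred port and, if that fails, ask the operating system for any free port by binding to port 0. The sketch below shows that technique with the standard library only; it is not taken from Dagit's source, and find_free_port is a hypothetical helper name.

```python
import socket


def find_free_port(preferred=3000, host="127.0.0.1"):
    """Return preferred if it can be bound, else an OS-assigned free port."""
    for candidate in (preferred, 0):  # port 0 lets the OS pick a free port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue  # preferred port is taken; fall through to port 0
            return sock.getsockname()[1]
    raise RuntimeError("no free port available")


port = find_free_port()
```

Note that the socket is closed before the port is returned, so another process could still claim the port in between; a server would normally bind and keep the socket open.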
dagster_pandas
- Hydration and materialization are now configurable on dagster_pandas dataframes.
dagster_aws
- The s3_resource no longer uses an unsigned session by default.
Bugfixes
- Type check messages are now displayed in Dagit.
- Failure metadata is now surfaced in Dagit.
- Dagit now correctly displays the execution time of steps that error.
- Error messages now appear correctly in console logging.
- GCS storage is now more robust to transient failures.
- Fixed an issue where some event logs could be duplicated in Dagit.
- Fixed an issue when reading config from an environment variable that wasn't set.
- Fixed an issue when loading a repository or pipeline from a file target on Windows.
- Fixed an issue where deleted runs could cause the scheduler page to crash in Dagit.
Documentation
- Expanded and improved docs and error messages.
Breaking Changes
There are a substantial number of breaking changes in the 0.7.0 release.
Please see 070_MIGRATION.md for instructions regarding migrating old code.
Scheduler
- The scheduler configuration has been moved from the @schedules decorator to DagsterInstance. Existing schedules that have been running are no longer compatible with current storage. To migrate, remove the scheduler argument on all @schedules decorators:

  instead of:

      @schedules(scheduler=SystemCronScheduler)
      def define_schedules():
          ...

  Remove the scheduler argument:

      @schedules
      def define_schedules():
          ...

  Next, configure the scheduler on your instance by adding the following to $DAGSTER_HOME/dagster.yaml:

      scheduler:
        module: dagster_cron.cron_scheduler
        class: SystemCronScheduler

  Finally, if you had any existing schedules running, delete the existing $DAGSTER_HOME/schedules directory and run dagster schedule wipe && dagster schedule up to re-instantiate schedules in a valid state.

- The should_execute and environment_dict_fn arguments to ScheduleDefinition now have a required first argument context, representing the ScheduleExecutionContext.
Config System Changes
- In the config system, Dict has been renamed to Shape; List to Array; Optional to Noneable; and PermissiveDict to Permissive. The motivation here is to clearly delineate config use cases versus cases where you are using types as the inputs and outputs of solids as well as python typing types (for mypy and friends). We believe this will be clearer to users in addition to simplifying our own implementation and internal abstractions.

  Our recommended fix is not to use Shape and Array, but instead to use our new condensed config specification API. This allows one to use bare dictionaries instead of Shape, lists with one member instead of Array, bare types instead of Field with a single argument, and python primitive types (int, bool etc) instead of the dagster equivalents. These result in dramatically less verbose config specs in most cases.

  So instead of

      from dagster import Shape, Field, Int, Array, String
      # ... code
      config=Shape({ # Dict prior to change
          'some_int': Field(Int),
          'some_list': Field(Array[String]) # List prior to change
      })

  one can instead write:

      config={'some_int': int, 'some_list': [str]}

  No imports and much simpler, cleaner syntax.

- config_field is no longer a valid argument on solid, SolidDefinition, ExecutorDefinition, executor, LoggerDefinition, logger, ResourceDefinition, resource, system_storage, and SystemStorageDefinition. Use config instead.

- For composite solids, the config_fn no longer takes a ConfigMappingContext, and the context has been deleted. To upgrade, remove the first argument to config_fn.

  So instead of

      @composite_solid(config={}, config_fn=lambda context, config: {})

  one must instead write:

      @composite_solid(config={}, config_fn=lambda config: {})

- Field takes an is_required rather than an is_optional argument. This is to avoid confusion with python's typing and dagster's definition of Optional, which indicates None-ability, rather than existence. is_optional is deprecated and will be removed in a future version.
Required Resources
- All solids, types, and config functions that use a resource must explicitly list that resource using the argument required_resource_keys. This is to enable efficient resource management during pipeline execution, especially in a multiprocessing or remote execution environment.
- The @system_storage decorator now requires argument required_resource_keys, which was previously optional.
Dagster Type System Changes
- dagster.Set and dagster.Tuple can no longer be used within the config system.
- Dagster types are now instances of DagsterType, rather than a class that inherits from RuntimeType. Instead of dynamically generating a class to create a custom runtime type, just create an instance of a DagsterType. The type checking function is now an argument to the DagsterType, rather than an abstract method that has to be implemented in a subclass.
- RuntimeType has been renamed to DagsterType, which is now an encouraged API for type creation.
- Core type check function of DagsterType can now return a naked bool in addition to a TypeCheck object.
- type_check_fn on DagsterType (formerly type_check and RuntimeType, respectively) now takes a first argument context of type TypeCheckContext in addition to the second argument of value.
- define_python_dagster_type has been eliminated in favor of PythonObjectDagsterType.
- dagster_type has been renamed to usable_as_dagster_type.
- as_dagster_type has been removed and similar capabilities added as make_python_type_usable_as_dagster_type.
- PythonObjectDagsterType and usable_as_dagster_type no longer take a type_check argument. If a custom type_check is needed, use DagsterType.
- As a consequence of these changes, if you were previously using dagster_pyspark or dagster_pandas and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type annotations to functions decorated with @solid to indicate that they are input or output types for a solid, you will need to call make_python_type_usable_as_dagster_type from your code in order to map the Python types to the Dagster types, or just use the Dagster types (dagster_pandas.DataFrame instead of pandas.DataFrame) directly.
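The new shape of a type check function, a callable taking (context, value) and returning a plain bool, can be sketched without importing Dagster. In the example below, positive_int_check is a hypothetical name and the context is passed as None purely as a placeholder; in real use the function would be supplied as the type_check_fn of a DagsterType instance and receive a TypeCheckContext.

```python
def positive_int_check(_context, value):
    """Type check with the (context, value) signature, returning a bool.

    bool is excluded explicitly because it is a subclass of int in Python.
    """
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Called directly with a placeholder context, for illustration only.
checks = [positive_int_check(None, candidate) for candidate in (3, -1, "3")]
```

Returning a bool covers the ordinary pass/fail case; a TypeCheck object is only needed when the check should carry extra information.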
Other
- We no longer publish base Docker images. Please see the updated deployment docs for an example Dockerfile off of which you can work.
- step_metadata_fn has been removed from SolidDefinition & @solid.
- SolidDefinition & @solid now takes tags and enforces that values are strings or are safely encoded as JSON. metadata is deprecated and will be removed in a future version.
- resource_mapper_fn has been removed from SolidInvocation.
New
- Dagit now includes a much richer execution view, with a Gantt-style visualization of step execution and a live timeline.
- Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries are now tested against 3.8. Note that several of our upstream dependencies have yet to publish wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some dependencies from source.
- dagster/priority tags can now be used to prioritize the order of execution for the built-in in-process and multiprocess engines.
- dagster-postgres storages can now be configured with separate arguments and environment variables, such as:

      run_storage:
        module: dagster_postgres.run_storage
        class: PostgresRunStorage
        config:
          postgres_db:
            username: test
            password:
              env: ENV_VAR_FOR_PG_PASSWORD
            hostname: localhost
            db_name: test

- Support for RunLaunchers on DagsterInstance allows for execution to be "launched" outside of the Dagit/Dagster process. As one example, this is used by dagster-k8s to submit pipeline execution as a Kubernetes Job.
- Added support for adding tags to runs initiated from the Playground view in dagit.
- Added @monthly_schedule decorator.
- Added Enum.from_python_enum helper to wrap Python enums for config. (Thanks @kdungs!)
- [dagster-bash] The Dagster bash solid factory now passes along kwargs to the underlying solid construction, and now has a single Nothing input by default to make it easier to create a sequencing dependency. Also, logs are now buffered by default to make execution less noisy.
- [dagster-aws] We've improved our EMR support substantially in this release. The dagster_aws.emr library now provides an EmrJobRunner with various utilities for creating EMR clusters, submitting jobs, and waiting for jobs/logs. We also now provide an emr_pyspark_resource, which together with the new @pyspark_solid decorator makes moving pyspark execution from your laptop to EMR as simple as changing modes.
- [dagster-pandas] Added create_dagster_pandas_dataframe_type, PandasColumn, and Constraint APIs in order for users to create custom types which perform column validation, dataframe validation, summary statistics emission, and dataframe serialization/deserialization.
- [dagster-gcp] GCS is now supported for system storage, as well as being supported with the Dask executor. (Thanks @habibutsu!) Bigquery solids have also been updated to support the new API.
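For the dagster/priority tags mentioned above, the following sketch models how ready steps could be ordered by such a tag. It assumes that higher values run first and that a missing tag counts as 0; both are assumptions made for the example, and order_steps is a hypothetical helper rather than engine code.

```python
import heapq

PRIORITY_TAG = "dagster/priority"


def order_steps(steps):
    """Return step names ordered by priority tag, highest first.

    steps is a list of (name, tags) pairs. Tag values are strings, so they
    are converted with int(). Ties keep their original relative order.
    """
    heap = []
    for index, (name, tags) in enumerate(steps):
        priority = int(tags.get(PRIORITY_TAG, 0))
        # Negate so that the min-heap pops the highest priority first.
        heapq.heappush(heap, (-priority, index, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical steps: "b" is boosted, "c" is deprioritized, "a" is default.
steps = [
    ("a", {}),
    ("b", {PRIORITY_TAG: "3"}),
    ("c", {PRIORITY_TAG: "-1"}),
]
ordered = order_steps(steps)
```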
Bugfix
- Ensured that all implementations of RunStorage clean up pipeline run tags when a run is deleted. Requires a storage migration, using dagster instance migrate.
- The multiprocess and Celery engines now handle solid subsets correctly.
- The multiprocess and Celery engines will now correctly emit skip events for steps downstream of failures and other skips.
- The @solid and @lambda_solid decorators now correctly wrap their decorated functions, in the sense of functools.wraps.
- Performance improvements in Dagit when working with runs with large configurations.
- The Helm chart in dagster_k8s has been hardened against various failure modes and is now compatible with Helm 2.
- SQLite run and event log storages are more robust to concurrent use.
- Improvements to error messages and to handling of user code errors in input hydration and output materialization logic.
- Fixed an issue where the Airflow scheduler could hang when attempting to load dagster-airflow pipelines.
- We now handle our SQLAlchemy connections in a more canonical way (thanks @zzztimbo!).
- Fixed an issue using S3 system storage with certain custom serialization strategies.
- Fixed an issue leaking orphan processes from compute logging.
- Fixed an issue leaking semaphores from Dagit.
- Setting the raise_error flag in execute_pipeline now actually raises user exceptions instead of a wrapper type.
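For readers unfamiliar with what wrapping "in the sense of functools.wraps" buys you, the sketch below shows a generic decorator that preserves the decorated function's name and docstring. The traced decorator and add_one function are hypothetical and unrelated to Dagster's own decorators.

```python
import functools


def traced(fn):
    """Hypothetical decorator that counts calls while preserving metadata."""

    @functools.wraps(fn)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@traced
def add_one(x):
    """Return x + 1."""
    return x + 1


value = add_one(1)
```

Without functools.wraps, add_one.__name__ would be "wrapper" and the docstring would be lost, which confuses introspection tools and documentation generators.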
Documentation
- Our docs have been reorganized and expanded (thanks @habibutsu, @vatervonacht, @zzztimbo). We'd love feedback and contributions!
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @habibutsu, @kdungs, @vatervonacht, @zzztimbo.
Bugfix
- Improved SQLite concurrency issues, uncovered while using concurrent nodes in Airflow
- Fixed sqlalchemy warnings (thanks @zzztimbo!)
- Fixed Airflow integration issue where a Dagster child process triggered a signal handler of a parent Airflow process via a process fork
- Fixed GCS and AWS intermediate store implementations to be compatible with read/write mode serialization strategies
- Improved test stability
Documentation
- Improved descriptions for setting up the cron scheduler (thanks @zzztimbo!)
New
- Added the dagster-github library, a community contribution from @Ramshackle-Jamathon and @k-mahoney!
dagster-celery
- Simplified and improved config handling.
- An engine event is now emitted when the engine fails to connect to a broker.
Bugfix
- Fixes a file descriptor leak when running many concurrent dagster-graphql queries (e.g., for backfill).
- The @pyspark_solid decorator now handles inputs correctly.
- The handling of solid compute functions that accept kwargs but which are decorated with explicit input definitions has been rationalized.
- Fixed race conditions in concurrent execution using SQLite event log storage, uncovered by upstream improvements in the Python inotify library we use.
Documentation
- Improved error messages when using system storages that don't fulfill executor requirements.
New
- We are now more permissive when specifying configuration schema in order to make constructing configuration schema more concise.
- When specifying the value of scalar inputs in config, one can now specify that value directly as the key of the input, rather than having to embed it within a value key.
Breaking
- The implementation of SQL-based event log storages has been consolidated, which has entailed a schema change. If you have event logs stored in a Postgres- or SQLite-backed event log storage, and you would like to maintain access to these logs, you should run dagster instance migrate. To check what event log storages you are using, run dagster instance info.
- Type matches on both sides of an InputMapping or OutputMapping are now enforced.
New
- Dagster is now tested on Python 3.8
- Added the dagster-celery library, which implements a Celery-based engine for parallel pipeline execution.
- Added the dagster-k8s library, which includes a Helm chart for a simple Dagit installation on a Kubernetes cluster.
Dagit
- The Explore UI now allows you to render a subset of a large DAG via a new solid query bar that accepts terms like solid_name+* and +solid_name+. When viewing very large DAGs, nothing is displayed by default and * produces the original behavior.
- Performance improvements in the Explore UI and config editor for large pipelines.
- The Explore UI now includes a zoom slider that makes it easier to navigate large DAGs.
- Dagit pages now render more gracefully in the presence of inconsistent run storage and event logs.
- Improved handling of GraphQL errors and backend programming errors.
- Minor display improvements.
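To give a feel for how solid query terms such as +solid_name+ and solid_name+* might select part of a DAG, here is a small standalone model. It assumes that each + means one hop in that direction and that * means an unlimited number of hops; this reading is an assumption, and DEPS, parse_query, walk, and select are hypothetical names rather than Dagit internals.

```python
from collections import deque

# Hypothetical pipeline: each solid maps to the solids it depends on.
DEPS = {"load": [], "clean": ["load"], "join": ["clean"], "report": ["join"]}


def parse_query(query):
    """Split a query into (upstream_depth, solid_name, downstream_depth).

    Assumed reading: each '+' is one hop and '*' means unlimited (None).
    """
    name = query.strip("+*")
    start = query.index(name)
    prefix, suffix = query[:start], query[start + len(name):]

    def depth(marks):
        return None if "*" in marks else marks.count("+")

    return depth(prefix), name, depth(suffix)


def walk(start, edges, max_depth):
    """Breadth-first walk from start, following edges up to max_depth hops."""
    seen = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if max_depth is not None and hops >= max_depth:
            continue
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return seen


def select(query, deps):
    """Return the set of solid names matched by query."""
    if query == "*":
        return set(deps)
    up, name, down = parse_query(query)
    # Invert the dependency map to find downstream dependents.
    dependents = {}
    for solid, parents in deps.items():
        for parent in parents:
            dependents.setdefault(parent, []).append(solid)
    return {name} | walk(name, deps, up) | walk(name, dependents, down)


one_hop = select("+clean+", DEPS)
all_downstream = select("clean+*", DEPS)
```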
dagster-aws
- A default prefix is now configurable on APIs that use S3.
- S3 APIs now parametrize region_name and endpoint_url.
dagster-gcp
- A default prefix is now configurable on APIs that use GCS.
dagster-postgres
- Performance improvements for Postgres-backed storages.
dagster-pyspark
- Pyspark sessions may now be configured to be held open after pipeline execution completes, to enable extended test cases.
dagster-spark
- spark_outputs must now be specified when initializing a SparkSolidDefinition, rather than in config.
- Added new create_spark_solid helper and new spark_resource.
- Improved EMR implementation.
Bugfix
- Fixed an issue retrieving output values using SolidExecutionResult (e.g., in test) for dagster-pyspark solids.
- Fixes an issue when expanding composite solids in Dagit.
- Better errors when solid names collide.
- Config mapping in composite solids now works as expected when the composite solid has no top level config.
- Compute log filenames are now guaranteed not to exceed the POSIX limit of 255 chars.
- Fixes an issue when copying and pasting solid names from Dagit.
- Termination now works as expected in the multiprocessing executor.
- The multiprocessing executor now executes parallel steps in the expected order.
- The multiprocessing executor now correctly handles solid subsets.
- Fixed a bad error condition in dagster_ssh.sftp_solid.
- Fixed a bad error message giving incorrect log level suggestions.
Documentation
- Minor fixes and improvements.
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @cclauss, @deem0n, @irabinovitch, @pseudoPixels, @Ramshackle-Jamathon, @rparrapy, @yamrzou.
Breaking
- The selector argument to PipelineDefinition has been removed. This API made it possible to construct a PipelineDefinition in an invalid state. Use PipelineDefinition.build_sub_pipeline instead.
New
- Added the dagster_prometheus library, which exposes a basic Prometheus resource.
- Dagster Airflow DAGs may now use GCS instead of S3 for storage.
- Expanded interface for schedule management in Dagit.
Dagit
- Performance improvements when loading, displaying, and editing config for large pipelines.
- Smooth scrolling zoom in the explore tab replaces the previous two-step zoom.
- No longer depends on internet fonts to run, allowing fully offline dev.
- Typeahead behavior in search has improved.
- Invocations of composite solids remain visible in the sidebar when the solid is expanded.
- The config schema panel now appears when the config editor is first opened.
- Interface now includes hints for autocompletion in the config editor.
- Improved display of solid inputs and output in the explore tab.
- Provides visual feedback while filter results are loading.
- Better handling of pipelines that aren't present in the currently loaded repo.
Bugfix
- Dagster Airflow DAGs previously could crash while handling Python errors in DAG logic.
- Step failures when running Dagster Airflow DAGs were previously not being surfaced as task failures in Airflow.
- Dagit could previously get into an invalid state when switching pipelines in the context of a solid subselection.
- frozenlist and frozendict now pass Dagster's parameter type checks for list and dict.
- The GraphQL playground in Dagit is now working again.
Nits
- Dagit now prints its pid when it loads.
- Third-party dependencies have been relaxed to reduce the risk of version conflicts.
- Improvements to docs and example code.
Breaking
- The interface for type checks has changed. Previously the type_check_fn on a custom type was required to return None (=passed) or else raise Failure (=failed). Now, a type_check_fn may return True/False to indicate success/failure in the ordinary case, or else return a TypeCheck. The new success field on TypeCheck now indicates success/failure. This obviates the need for the typecheck_metadata_fn, which has been removed.
- Executions of individual composite solids (e.g. in test) now produce a CompositeSolidExecutionResult rather than a SolidExecutionResult.
- dagster.core.storage.sqlite_run_storage.SqliteRunStorage has moved to dagster.core.storage.runs.SqliteRunStorage. Any persisted dagster.yaml files should be updated with the new classpath.
- is_secret has been removed from Field. It was not being used to any effect.
- The environmentType and configTypes fields have been removed from the dagster-graphql Pipeline type. The configDefinition field on SolidDefinition has been renamed to configField.
Bugfix
- PresetDefinition.from_files is now guaranteed to give identical results across all Python minor versions.
- Nested composite solids with no config, but with config mapping functions, now behave as expected.
- The dagster-airflow DagsterKubernetesPodOperator has been fixed.
- Dagit is more robust to changes in repositories.
- Improvements to Dagit interface.
New
- dagster_pyspark now supports remote execution on EMR with the @pyspark_solid decorator.
Nits
- Documentation has been improved.
- The top level config field features in the dagster.yaml will no longer have any effect.
- Third-party dependencies have been relaxed to reduce the risk of version conflicts.
- Scheduler errors are now visible in dagit
- Run termination button no longer persists past execution completion
- Fixes run termination for multiprocess execution
- Fixes run termination on Windows
- dagit no longer prematurely returns control to terminal on Windows
- raise_on_error is now available on the execute_solid test utility
- check_dagster_type added as a utility to help test type checks on custom types
- Improved support in the type system for Set and Tuple types
- Allow composite solids with config mapping to expose an empty config schema
- Simplified graphql API arguments to single-step re-execution to use retryRunId, stepKeys execution parameters instead of a reexecutionConfig input object
- Fixes missing step-level stdout/stderr from dagster CLI
- Adds a type_check parameter to PythonObjectType, as_dagster_type, and @as_dagster_type to enable custom type checks in place of default isinstance checks. See documentation here: https://dagster.readthedocs.io/en/latest/sections/learn/tutorial/types.html#custom-type-checks
- Improved the type inference experience by automatically wrapping bare python types as dagster types.
- Reworked our tutorial (now with more compelling/scary breakfast cereal examples) and public API documentation. See the new tutorial here: https://dagster.readthedocs.io/en/latest/sections/learn/tutorial/index.html
- New solids explorer in Dagit allows you to browse and search for solids used across the repository.
- Enabled solid dependency selection in the Dagit search filter.
  - To select a solid and its upstream dependencies, search +{solid_name}.
  - To select a solid and its downstream dependents, search {solid_name}+.
  - For both, search +{solid_name}+.

  For example, in the Airline demo, searching +join_q2_data selects join_q2_data and its upstream dependencies.
- Added a terminate button in Dagit to terminate an active run.
- Added an --output flag to the dagster-graphql CLI.
- Added confirmation step for dagster run wipe and dagster schedule wipe commands (Thanks @shahvineet98).
- Fixed a wrong title in the dagster-snowflake library README (Thanks @Step2Web).
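The +{solid_name} selection syntax for the Dagit search filter described above can be sketched as a graph traversal. This is an illustration only, not Dagit's implementation. It assumes the selection is transitive (all ancestors or descendants), which the notes do not state, and every solid name in the graph other than join_q2_data is hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: each solid maps to the solids it depends on.
UPSTREAM = {
    "load_q2": [],
    "clean_q2": ["load_q2"],
    "join_q2_data": ["clean_q2"],
    "report": ["join_q2_data"],
}


def _reachable(start, edges):
    """Return every node reachable from start by following edges (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def select_solids(query, upstream):
    """Resolve 'name', '+name', 'name+', or '+name+' to a set of solid names."""
    want_upstream = query.startswith("+")
    want_downstream = query.endswith("+")
    name = query.strip("+")
    if name not in upstream:
        raise KeyError("unknown solid: {!r}".format(name))

    # Invert the upstream edges to obtain the downstream (dependent) edges.
    downstream = {}
    for node, deps in upstream.items():
        for dep in deps:
            downstream.setdefault(dep, []).append(node)

    selected = {name}
    if want_upstream:
        selected |= _reachable(name, upstream)
    if want_downstream:
        selected |= _reachable(name, downstream)
    return selected
```

With this graph, select_solids("+join_q2_data", UPSTREAM) returns join_q2_data together with clean_q2 and load_q2, and a trailing + adds report.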
- Changed composition functions @pipeline and @composite_solid to automatically give solids aliases with an incrementing integer suffix when there are conflicts. This removes the need to manually alias solid definitions that are used multiple times.
- Add dagster schedule wipe command to delete all schedules and remove all schedule cron jobs
- execute_solid test util now works on composite solids.
- Docs and example improvements: https://dagster.readthedocs.io/
- Added --remote flag to dagster-graphql for querying remote dagit servers.
- Fixed issue with duplicate run tag autocomplete suggestions in dagit (#1839)
- Fixed Windows 10 / py3.6+ bug causing pipeline execution failures
- Fixed an issue where Dagster public images tagged latest on Docker Hub were erroneously published with an older version of Dagster (#1814)
- Fixed an issue where the most recent scheduled run was not displayed in dagit (#1815)
- Fixed a bug with the dagster schedule start --start-all command (#1812)
- Added a new scheduler command to restart a schedule: dagster schedule restart. Also added a flag to restart all running schedules: dagster schedule restart --restart-all-running.
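The automatic aliasing of repeated solids in @pipeline and @composite_solid mentioned above can be pictured with a small naming routine. This is a sketch under assumptions. The function assign_aliases is hypothetical, and the exact suffix format (name_2, name_3, and so on, with the first use keeping the bare name) is assumed rather than taken from these notes.

```python
def assign_aliases(solid_names):
    """Return one unique alias per invocation, suffixing repeats with an integer."""
    counts = {}   # how many times each solid name has been seen so far
    used = set()  # aliases already handed out
    aliases = []
    for name in solid_names:
        n = counts.get(name, 0) + 1
        # The first use keeps the bare name; later uses get _2, _3, ...
        alias = name if n == 1 else "{}_{}".format(name, n)
        # Skip ahead if the candidate collides with an alias already in use.
        while alias in used:
            n += 1
            alias = "{}_{}".format(name, n)
        counts[name] = n
        used.add(alias)
        aliases.append(alias)
    return aliases


# Hypothetical solid names, in the order they are invoked in a pipeline body.
aliases = assign_aliases(["add_one", "multiply", "add_one", "add_one"])
```

The collision check covers the case where a solid is already named like a generated alias, for example a solid literally called add_one_2.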
New
This major release includes features for scheduling, operating, and executing pipelines that elevate dagit and dagster from a local development tool to a deployable service.
- DagsterInstance introduced as centralized system to control run, event, compute log, and local intermediates storage.
- A Scheduler abstraction has been introduced alongside an initial implementation of SystemCronScheduler in dagster-cron.
- dagster-aws has been extended with a CLI for deploying dagster to AWS. This can spin up a Dagit node and all the supporting infrastructure (security group, RDS PostgreSQL instance, etc.) without having to touch the AWS console, and can deploy your code to that instance.
- Dagit
  - Runs: a completely overhauled Runs history page. Includes the ability to Retry, Cancel, and Delete pipeline runs from the new runs page.
  - Scheduler: a page for viewing and interacting with schedules.
  - Compute Logs: stdout and stderr are now viewable on a per execution step basis in each run. This is available in real time for currently executing runs and for historical runs.
  - A Reload button in the top right in dagit restarts the web-server process and updates the UI to reflect repo changes, including DAG structure, solid names, type names, etc. This replaces the previous file system watching behavior.
Breaking Changes
- --log and --log-dir are no longer supported as CLI args. Existing runs and events stored via these flags are no longer compatible with current storage.
- raise_on_error moved from in-process executor config to an argument of python API methods such as execute_pipeline
- Fixes an issue using custom types for fan-in dependencies with intermediate storage.
- Fixes an issue running some Dagstermill notebooks on Windows.
- Fixes a transitive dependency issue with Airflow.
- Bugfixes, performance improvements, and better documentation.
- Fixed an issue with specifying composite output mappings (#1674)
- Added support for specifying Dask worker resources (#1679)
- Fixed an issue with launching Dagit on Windows
- Execution details are now configurable. The new top-level ExecutorDefinition and @executor APIs are used to define in-process, multiprocess, and Dask executors, and may be used by users to define new executors. Like loggers and storage, executors may be added to a ModeDefinition and may be selected and configured through the execution field in the environment dict or YAML, including through Dagit. Executors may no longer be configured through the RunConfig.
- The API of dagster-dask has changed. Pipelines are now executed on Dask using the ordinary execute_pipeline API, and the Dask executor is configured through the environment. (See the dagster-dask README for details.)
- Added the PresetDefinition.from_files API for constructing a preset from a list of environment files (replacing the old usage of this class). PresetDefinition may now be directly instantiated with an environment dict.
- Added a prototype integration with dbt.
- Added a prototype integration with Great Expectations.
- Added a prototype integration with Papertrail.
- Added the dagster-bash library.
- Added the dagster-ssh library.
- Added the dagster-sftp library.
- Loosened the PyYAML compatibility requirement.
- The dagster CLI no longer takes a --raise-on-error or --no-raise-on-error flag. Set this option in executor config.
- Added a MarkdownMetadataEntryData class, so events yielded from client code may now render markdown in their metadata.
- Bug fixes, documentation improvements, and improvements to error display.
- Dagit now accepts parameters via environment variables prefixed with DAGIT_, e.g. DAGIT_PORT.
- Fixes an issue with reexecuting Dagstermill notebooks from Dagit.
- Bug fixes and display improvements in Dagit.
- Reworked the display of structured log information and system events in Dagit, including support for structured rendering of client-provided event metadata.
- Dagster now generates events when intermediates are written to filesystem and S3 storage, and these events are displayed in Dagit and exposed in the GraphQL API.
- Whitespace display styling in Dagit can now be toggled on and off.
- Bug fixes, display nits and improvements, and improvements to JS build process, including better display for some classes of errors in Dagit and improvements to the config editor in Dagit.
- Pinned RxPY to 1.6.1 to avoid breaking changes in 3.0.0 (py3-only).
- Most definition objects are now read-only, with getters corresponding to the previous properties.
- The valueRepr field has been removed from ExecutionStepInputEvent and ExecutionStepOutputEvent.
- Bug fixes and dagit UX improvements, including SQL highlighting and error handling.
- Added top-level define_python_dagster_type function.
- Renamed metadata_fn to typecheck_metadata_fn in all runtime type creation APIs.
- Renamed result_value and result_values to output_value and output_values on SolidExecutionResult
- Dagstermill: Reworked public API now contains only define_dagstermill_solid, get_context, yield_event, yield_result, DagstermillExecutionContext, DagstermillError, and DagstermillExecutionError. Please see the new guide for details.
- Bug fixes, including failures for some dagster CLI invocations and incorrect handling of Airflow timestamps.