16.0.0 (2023-01-12)

Full Changelog

Breaking changes:

  • Remove unused ExecutionPlan::relies_input_order (has been replaced with required_input_ordering) #4856 (alamb)
  • Add DataFrame::into_view instead of implementing TableProvider (#2659) #4778 (tustvold)

Implemented enhancements:

  • Support custom window frame with AVG aggregate function #4845
  • add sqllogicaltest for tpch and remove some duplicated test. #4801
  • Catalog Snapshot Isolation #4697
  • Support select .. FROM 'parquet.file' in datafusion-cli #4580

Fixed bugs:

  • Regression: write_csv result has incorrect formatting #4876
  • Incorrect results for join condition against current master branch #4844
  • Match Postgres for stddev and variance on less than 3 values #4843
  • JOIN ... USING (columns) works incorrectly with multiple columns (joined-over columns are missing in the output) #4674
  • ROW_NUMBER window function inconsistent across partitions in multi-threaded runtime #4673
  • SELECT ... FROM (tbl1 UNION tbl2) wrongly works like SELECT DISTINCT ... FROM (tbl1 UNION tbl2) #4667
  • DataFrame TableProvider Circular Reference #2659

Documentation updates:

Closed issues:

  • Remove tests from sql_integration that were ported to sqllogictest #4498
  • How to register a http url to the object_store #4491
  • optimizer: support unsigned <-> decimal for unwrap_cast_in_comparison rule #4287
  • Add SQL support for NATURAL JOIN #117
  • [Datafusion] Datafusion queries involving a column name that begins with a number produces unexpected results #108

Merged pull requests:

  • docs: improve Column::normalize_with_schemas docs #4871 (crepererum)
  • Skip EliminateCrossJoin rule when meet non-empty join filter #4869 (ygf11)
  • Support for SQL Natural Join #4863 [sql] (Jefffrey)
  • Minor: Move test data into datafusion/core/tests/data #4855 (alamb)
  • Covariance single row input & null skipping #4852 (korowa)
  • Document ability to select directly from files in datafusion-cli #4851 (alamb)
  • Fix push_down_projection through a distinct #4849 (Jefffrey)
  • Support using var/var_pop/stddev/stddev_pop in window expressions with custom frames #4848 (jonmmease)
  • Update variance/stddev to work with single values #4847 (jonmmease)
  • Implement retract_batch for AvgAccumulator #4846 (jonmmease)
  • Support wildcard select on multiple column using joins #4840 [sql] (Jefffrey)
  • Orthogonalize distribution and sort enforcement rules into EnforceDistribution and EnforceSorting #4839 (mustafasrepo)
  • support select .. FROM 'parquet.file' in datafusion-cli #4838 (unconsolable)
  • Remove tests from sql_integration that were ported to sqllogictest #4836 (matthewwillian)
  • add tpch sqllogicaltest and remove some duplicated test #4802 (jackwener)

16.0.0-rc1 (2023-01-07)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Move the ExtractEquijoinPredicate behind the SubqueryFilterToJoin #4759
  • Remove the config datafusion.execution.coalesce_target_batch_size #4756
  • SimplifyExpressions will fail when rebuild equijoin with alias #4754
  • Provide a constructor for the ConfigOptions with HashMap<String, String> #4752
  • Non-deprecated support for planning SQL without DDL #4720
  • Add regression tests for planning TPC-DS queries #4718
  • Move the extracting join keys logic to optimizer #4710
  • Support compression in IPCWriter #4708
  • Support prepared statement parameter type inference #4700
  • PruningPredicate Use Physical not Logical Predicate #4695
  • Support for executing infinite files #4692
  • Add a sort rule to remove unnecessary SortExecs from physical plan #4686
  • Install protoc automatically when building datafusion/proto crate #4684
  • Make DfSchema wrap SchemaRef #4680
  • Reorder the physical plan optimizer rules #4678
  • Inconsistent behavior with PostgreSQL to decide Window Expressions ordering #4641
  • Returns error too late when parsing invalid file compression type. #4636
  • Make OptimizerConfig a Trait #4631
  • Move Optimize onto DataFrame #4626
  • Make LogicalPlanBuilder Consuming #4622
  • Make DataFrame Consuming #4621
  • rules don't need to recursion inside themselves #4613
  • [window function] support min max with self define sliding window. #4603
  • Add try_optimize for all_rules #4598
  • Refine the physical plan serialization and deserialization #4597
  • Normalize datafusion configuration names #4595
  • Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange #4585
  • Bump Datafusion sql-parser dependency to 0.28 #4573
  • tpch test exist duplicated #4563
  • user-defined aggregate function as window function #4552
  • Convert a Prepare Logical Plan into a Logical Plan with all parameters replaced with values #4550
  • FileStream requires fake ObjectStore when ParquetFileReaderFactory is used #4533
  • Avoid reading the entire file in ChunkedStore #4524
  • Enrich filter statistics predictions with estimated column boundaries #4518
  • Show window frame info in physical plan #4509
  • Add sqllogictest auto labeler #4507
  • Optimize is_distinct_from / is_not_distinct_from #4482
  • Add window func related logic plan to proto ability. #4480
  • Make window function related struct public. #4479
  • Improve partition file explain plan display to show groupings #4466
  • Add support for non-column key for equijoin when eliminating cross join to inner join #4442
  • Remove the schema checking from CrossJoinExec::try_new #4431
  • Initial support for prepared statement #4426
  • Add support for NTILE built-in Window Function #4403
  • Add Support for MIN, MAX Aggregate Functions when run with custom window frames #4402
  • Support INSERT INTO statement #4397
  • Enhancement: split the SQL planner into smaller modules #4392
  • Proposal: Improve the join keys of logical plan #4389
  • Add MergeSubqueryAlias rule #4383
  • Optimizer rule support subqueryAlias #4381
  • Rewrite simple regex expressions #4370
  • Revisit get_statistics_with_limit() method in datasource mod #4323
  • Support for type coercion for a (Timestamp, Utf8) pair #4311
  • replace the operation about decimal to the arrow-rs kernel #4289
  • change date_part return types to f64 #3997
  • Better api for setting ConfigOptions from SessionContext #3908
  • Make ConfigOptions easier to work with #3886
  • An asynchronous version of CatalogList/CatalogProvider/SchemaProvider #3777
  • Allow configs to be set with string values #3500
  • support scientific notation for SQL literals #3448
  • Adopt physical plan serde from arrow-ballista #3257
  • Improve codebase readability and error messages by and consistently handle downcasting #3152
  • Re-enable where_clauses_object_safety #3081
  • optimize/simplify the literal data type and remove unnecessary cast, try_cast #3031
  • Move datafusion-substrait crate into arrow-datafusion repo #2646
  • [enhancement] rules don't need to recursion inside themselves #2620
  • Add support for GROUPING SETS syntax in SQL planner #2469
  • Optimize EXISTS subquery expressions by rewriting as semi-join #2351
  • Add Delta Lake TableProvider #525
  • Support window functions with window frame #361

Fixed bugs:

  • PushdownFilter rule exist bug will cause filter change wrong #4822
  • Unlimited memory consumption in RepartitionExec #4816
  • Physical Optimizer Config Mutation Doesn't Take Effect #4806
  • cargo test failed error: linking with cc failed: exit status: 1 #4790
  • Parquet files generated by DataFusion cannot be read by Apache Spark #4782
  • datafusion-physical-expr doesn't compile when blake3/traits-preview is enabled #4781
  • Multiple ways to express like / ilike / not like / not ilike #4765
  • SessionState::optimize and SessionState::create_physical_plan Don't Update Query Start Time #4747
  • Page Filtering Incorrectly Handles Pages with Different Row Counts #4744
  • cargo test failing on master due to tpcds_logical_q41 stackoverflow #4728
  • PruningPredicate Different Evaluation Context from Query #4693
  • Skipping optimizer rule due to create_name not supporting wildcard #4681
  • Create physical plan bug: got Arrow schema with 1 and DataFusion schema with 0 #4677
  • Timestamp <-> Date32 compare doesn't work #4672
  • Wrongly use the function clamp #4654
  • Fix the clippy errors #4653
  • Filter Null Keys Update Not Taking Effect #4638
  • Should not generate duplicate sort keys from Window expr's partition by keys #4635
  • common_sub_expression_eliminate exists bug #4575
  • Confusing "Bare" in doesn't exist messages #4571
  • having shouldn't include alias in projection #4556
  • wrong comment about having #4554
  • drop view t1, t2, ... and drop table t1, t2, ... silently ignores arguments past the first #4531
  • Extract from timestamp doesn't support nanosecond #4528
  • prepare_select_exprs don't need outer_query_schema #4526
  • Table names with periods are not handled correctly #4513
  • Push_down_projection push redundant column. #4486
  • Planner don't generate SubqueryAlias #4483
  • Planner generate replicated Projection | SubqueryAlias #4481
  • apply_table_alias will ignore alias_name when columns is empty. #4454
  • Fix output_ordering of WindowAggExec #4438
  • Incorrect error for plus/minus operations over timestamps and dates #4420
  • Optimization rule filter_push_down causes FieldNotFound error #4401
  • Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions #4363
  • MemoryConsumer::try_grow Underflow #4328
  • Potential MemoryManager Deadlock #4325
  • create external table should fail to parse if syntax is incorrect #4262
  • Nullif func states support for Boolean type, but fails if this is attempted #4205
  • ProjectionPushDown rule don't consider the alias in projection. #4174
  • Stack overflow planning complex query #4065
  • Can not use extract <part> on the value of now() #3980
  • Bug with intervals and logical and/or #3944
  • CoalesceBatches doesn't provide correct elapsed_compute info in metrics #3894
  • Panicked at to_timestamp_micros function when the timestamp is too large. #3832
  • Optimizer casts decimals to different values on different platforms #3791
  • CSV inference reads in the whole file to memory, regardless of row limit #3658
  • after type coercion CommonSubexprEliminate will produce invalid projection #3635
  • panic at attempt to multiply with overflow when doing math on Decimal128 columns #3437
  • Precedence bug with date comparison to date plus interval #3408
  • Median aggregation using DataFrame panics: "AggregateState is not a scalar aggregate" #3105
  • date_part doesn't work for now() #3096
  • hash_join panics when join keys have different data types #2877
  • Memory manager triggers unnecessary spills #2829
  • Address performance/execution plan of TPCH query 9 #77

Documentation updates:

  • Add a new open source project that uses DataFusion as its query engine #4768 (francis-du)

Closed issues:

  • move the tests in planner #4798
  • Make it easier to update sqltestlogic test expected output ("test script completion mode") #4570
  • Make ConfigOption names into an Enum #4517
  • Implement null / empty string handling for sqllogictest #4500
  • Write a blog about parquet predicate pushdown #3464
  • Ensure column names are equivalent with or without optimization #1123

Merged pull requests:

  • Bump tokio from 1.23.0 to 1.23.1 in /datafusion-cli #4835 (dependabot[bot])
  • Fix a few links in roadmap.md #4833 (romanz)
  • DataFusion 16.0.0 release prep: Update version + add changelog #4831 [sql] (andygrove)
  • feat: use arrow row format for hash-group-by #4830 (crepererum)
  • refactor: split relation of planner into one part. #4829 [sql] (jackwener)
  • bugfix: remove cnf_rewrite in push_down_filter #4825 (jackwener)
  • minor: add some comments to row group pruning tests #4823 (alamb)
  • Handle trailing tbl column in TPCH benchmarks #4821 (tustvold)
  • fix: account for memory in RepartitionExec #4820 (crepererum)
  • Fix clippy #4817 (tustvold)
  • Add test cases: row group filter with missing statistics for decimal data type #4810 (liukun4515)
  • Move default catalog and schema onto ConfigOptions (#3887) #4805 (tustvold)
  • remove duplicated test #4800 (jackwener)
  • Update sqlparser requirement from 0.29 to 0.30 #4799 [sql] (dependabot[bot])
  • rewrite the function ensure_any_column_reference_is_unambiguous #4797 [sql] (HaoYang670)
  • Uncomment nanoseconds tests after sql parser upgrade #4789 (comphead)
  • fix: ListingSchemaProvider directory paths (related: #4204) #4788 (cfraz89)
  • Minimize stack space required to plan deeply nested binary expressions #4787 [sql] (alamb)
  • Minor: Refactor some sql planning code into functions #4785 [sql] (alamb)
  • Make datafusion-physical-expr compatible with blake3/traits-preview feature. #4784 (BoredPerson)
  • refactor: split expression of planner into one part. #4783 [sql] (jackwener)
  • Fix Stack overflow in sql planning in debug builds #4779 [sql] (alamb)
  • Pipeline-friendly Bounded Memory Window Executor #4777 (mustafasrepo)
  • Implement OptimizerConfig for SessionState #4775 (tustvold)
  • refactor: extract parse_value #4774 [sql] (jackwener)
  • Structify ConfigOptions (#4517) #4771 (tustvold)
  • Update sqlparser to 29.0.0 #4770 [sql] (alamb)
  • Refactor extract_join_keys and move the ExtractEquijoinPredicate rule #4760 (ygf11)
  • Remove the config datafusion.execution.coalesce_target_batch_size and use datafusion.execution.batch_size instead #4757 (yahoNanJing)
  • Add alias check for equijoin in from_plan #4755 (ygf11)
  • Take the top level schema into account when creating UnionExec #4753 (HaoYang670)
  • Set query_execution_start_time on snapshot from SessionContext (#4747) #4750 (tustvold)
  • minor: Improve docstrings #4748 [sql] (alamb)
  • Append generated column to the schema instead of prepending for WindowAggExec #4746 (mustafasrepo)
  • Minor: comments about coercion in physical planner #4745 (alamb)
  • Simplify parquet filter predicate test, fix Page Filtering Incorrectly Handles Pages with Different Row Counts #4743 (tustvold)
  • support byte array for decimal in parquet page and row group filters #4742 (liukun4515)
  • revert some code for #4726 / remove unnecessary coercion in physical plans #4741 (liukun4515)
  • Cleanup InformationSchema plumbing #4740 (tustvold)
  • Minor: use a common method to check the validate of equijoin predicate #4739 (ygf11)
  • minor: Support more data type for null_counts in the PruningStatistics #4738 (liukun4515)
  • Extended datatypes & signatures support for NULLIF function #4737 (korowa)
  • minor: improve debug logging for pruning predicates #4736 (alamb)
  • refactor: parallelize parquet_exec test case single_file #4735 (waynexia)
  • fix: add one more projection to recover output schema #4733 (waynexia)
  • remove SubqueryFilterToJoin #4731 (jackwener)
  • Create writer with arrow::ipc::IPCWriteOptions #4730 (askoa)
  • Implement cast between Date and Timestamp #4726 (comphead)
  • Dynamic information_schema configuration and port more tests #4722 (alamb)
  • Add TPC-DS query planning regression tests #4719 (andygrove)
  • Minor: refactor streaming CSV inference code #4717 (alamb)
  • Reorder the physical plan optimizer rules, extract GlobalSortSelection, make Repartition optional #4714 (yahoNanJing)
  • Eagerly construct PagePruningPredicate #4713 (tustvold)
  • Move the extract_join_keys to optimizer #4711 [sql] (ygf11)
  • Avoid to bypass try_new/new() to build plan directly and cleanup filter #4702 (jackwener)
  • MINOR: Remove where_clause_object_safety clippy ignore (#3081) #4696 (tustvold)
  • Support for executing infinite files and boundedness-aware join reordering rule #4694 (metesynnada)
  • Unnecessary SortExec removal rule from Physical Plan #4691 (mustafasrepo)
  • minor: rename the github actions #4689 (jackwener)
  • FOLLOWUP: remove more recursion in optimizer rules. #4687 (jackwener)
  • Add line that prevents display_name from being called on Wildcard #4682 (andre-cc-natzka)
  • Deprecate SessionContext::create_logical_plan (#4617) #4679 (tustvold)
  • Support NTILE window function #4676 (berkaycpp)
  • Support min max aggregates in window functions with sliding windows #4675 (berkaycpp)
  • Refactor Expr::AggregateFunction and Expr::WindowFunction to use struct #4671 [sql] (Jefffrey)
  • Support type coercion for equijoin #4666 (ygf11)
  • Add --complete auto completion mode to sqllogictests #4665 (alamb)
  • Fix CoalesceBatches elapsed_compute metric #4664 (Jefffrey)
  • Refactor Expr::Sort to use struct #4663 [sql] (Jefffrey)
  • More descriptive error for plus/minus between timestamps/dates #4662 (Jefffrey)
  • Stream CSV file during schema inference #4661 (Jefffrey)
  • Refine the logical and physical plan serialization and deserialization #4659 (yahoNanJing)
  • Use thiserror in sqllogictest error #4657 (xudong963)
  • fix cargo clippy warning #4652 [sql] (jackwener)
  • Improve group by hash performance: avoid group-key/-state clones for hash-groupby #4651 (crepererum)
  • remove recursion in optimizer rules #4650 (jackwener)
  • replace the arithmetic op for decimal array op decimal array using arrow kernel #4648 (liukun4515)
  • simplify regex expressions #4646 (crepererum)
  • Avoid generate duplicate sort Keys from Window Expressions, fix bug when decide Window Expressions ordering #4643 [sql] (mingmwang)
  • Refactor Expr::TryCast to use a struct #4642 [sql] (ygf11)
  • add ILIKE support #4639 (crepererum)
  • Detect invalid (unsupported) compression types when parsing #4637 [sql] (HaoYang670)
  • unwrap_cast_in_comparison.rs: support uint <-> decimal #4634 (liukun4515)
  • MINOR: Fix incorrect config definitions #4623 (andygrove)
  • FOLLOWUP: remove optimize() #4619 (jackwener)
  • Optimizer: avoid every rule must recursive children in optimizer #4618 (jackwener)
  • fix: run logical optimizer rules for TableScan expressions #4614 (crepererum)
  • refactor: relax the signature of register_* in SessionContext #4612 (waynexia)
  • Remove the function consume_token from the parser #4609 [sql] (HaoYang670)
  • Make SchemaProvider::table async #4607 (tustvold)
  • Lazy system tables #4606 (tustvold)
  • Refactor: Change equijoin keys from column to expression in logical join #4602 [sql] (ygf11)
  • refactor: extract assert_optimized_plan_eq from UT. #4600 (jackwener)
  • add try_optimize() for all rules. #4599 (jackwener)
  • Normalize datafusion configuration names #4596 (yahoNanJing)
  • Fix the bugs in parsing COMPRESSION TYPE #4590 [sql] (HaoYang670)
  • Minor: Remove datafusion-core dev dependency from datafusion-sql #4589 [sql] (alamb)
  • Improve error handling for array downcasting #4588 (retikulum)
  • Update to arrow v29 #4587 [sql] (tustvold)
  • Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange #4586 (yahoNanJing)
  • Move subset of select tests to sqllogic #4583 (ajayaa)
  • bugfix: just allow having use expr in groupby or aggr #4579 [sql] (jackwener)
  • Output sqllogictests with arrow display rather than CSV writer #4578 (alamb)
  • Minor: Add test case for reduce cross join #4577 (ygf11)
  • refactor: remove redundant outer_query_schema #4576 [sql] (jackwener)
  • Preserve the TryCast expression in columnize_expr #4574 [sql] (byteink)
  • Remove Confusing "Bare" in does not exist messages #4572 [sql] (alamb)
  • Minor: Add tests for date interval predicate handling #4569 (alamb)
  • Update sqlparser requirement from 0.27 to 0.28 #4568 [sql] (alamb)
  • Avoid materializing local variables when creating sortMergeJoinExec #4566 (HaoYang670)
  • Minor: Fix logical conflict #4565 (alamb)
  • feat: support nested loop join with the initial version #4562 [sql] (liukun4515)
  • feat: prepare logical plan to logical plan without params/placeholders #4561 [sql] (NGA-TRAN)
  • Write faster kernel for is_distinct #4560 (comphead)
  • refactor code about query -> plan for subqueries #4559 [sql] (jackwener)
  • fix: remove wrong comment about having #4555 [sql] (jackwener)
  • feat: user-defined aggregate function(UDAF) as window function #4553 [sql] (MichaelScofield)
  • Fix date_part/extract functions to support now() #4548 (comphead)
  • bump sqllogictest to 0.9.0 #4547 (xxchan)
  • minor: Remove more clones from the planner #4546 [sql] (alamb)
  • Add tests for coercion of timestamps to strings #4545 (alamb)
  • MINOR: move sqllogictest to dev-dependencies #4544 (alamb)
  • MINOR: add some comments about intended use of ChunkedStore #4541 (alamb)
  • fix: remove TODOs linked to arrow#3147 #4540 (crepererum)
  • refactor: remove redundant build_join_schema() #4538 (jackwener)
  • Move some create/drop tests to ddl.slt #4535 (alamb)
  • Minor: Avoid cloning as many Ident during SQL planning #4534 [sql] (alamb)
  • shouldn't add outer_query_schema in sql_select_to_rex #4527 [sql] (jackwener)
  • Avoid reading the entire file in ChunkedStore #4525 (metesynnada)
  • Simplify MemoryManager #4522 (tustvold)
  • Fix limited statistic collection across files with no stats #4521 (isidentical)
  • refactor: make Ctes a struct to also store data types provided by prepare stmt #4520 [sql] (NGA-TRAN)
  • Enrich filter statistics with known column boundaries #4519 (isidentical)
  • Remove Option from window frame #4516 [sql] (mustafasrepo)
  • Make nightly clippy happy #4515 [sql] (xudong963)
  • Remove interior mutability of MemTable #4514 (xudong963)
  • Make window function related struct public for ballista. #4511 (Ted-Jiang)
  • minor: rename push_down_limit #4510 (jackwener)
  • Add get_window_frame in window_expr, show frame info in window_agg_exec #4508 (Ted-Jiang)
  • Add sqllogictest auto labeler #4506 (mvanschellebeeck)
  • Add some more aggregate sqllogictests and remove rust tests #4505 (mvanschellebeeck)
  • Remove sqllogictests CI run #4504 (mvanschellebeeck)
  • Refactor code for insert in sqllogictest #4503 (xudong963)
  • Add empty string normalization to sqllogictests #4501 (alamb)
  • sqllogictest: A logging and command line filter #4497 (alamb)
  • Support insert into statement in sqllogictest #4496 (xudong963)
  • Improve error handling for array downcasting #4493 (retikulum)
  • Unify most of SessionConfig settings into ConfigOptions #4492 (alamb)
  • feat: support prepare statement #4490 [sql] (NGA-TRAN)
  • Minor: Update docstrings and comments to aggregate code #4489 (alamb)
  • Fix panic in median "AggregateState is not a scalar aggregate" #4488 (alamb)
  • fix push_down_projection push redundant columns. #4487 (jackwener)
  • Add window func related logic plan to proto ability. #4485 (Ted-Jiang)
  • fix Planner don't generate SubqueryAlias and generate duplicated SubqueryAlias #4484 [sql] (jackwener)
  • Improve parquet partition_file output display #4467 (alamb)
  • minor: remove redundant unwrap() #4463 (jackwener)
  • Fix Cte in from clause with duplicated cte name #4461 [sql] (xudong963)
  • Replace &Option<T> with Option<&T> part 2 #4458 (askoa)
  • Fix output_partitioning(), output_ordering(), equivalence_properties() in WindowAggExec, shift the Column indexes #4455 (mingmwang)
  • fix push_down_filter for pushing filters on grouping columns rather than aggregate columns #4447 (jackwener)
  • Add support for non-column key for equijoin when eliminating cross join to inner join #4443 [sql] (ygf11)
  • Remove the schema checking when creating CrossJoinExec #4432 (HaoYang670)
  • date_part support fractions of second #4385 (comphead)
  • Minor: use upstream RowSelection code from arrow intersect_row_selection #4340 (alamb)
  • Support type coercion for timestamp and utf8 #4312 (andre-cc-natzka)