Releases: spotify/scio

v0.7.3

12 Mar 20:09

"Vulpes Vulpes"

Bug Fixes & Improvements

  • Fix FileStorage.avroFile (#1727)
  • Fix perf regression in Coder (#1729)
  • Reduce the size of the captured stacktrace in WrappedBCoder (#1745)
  • Fix #1734: Limit job graph size by not wrapping native beam coders (#1741)
  • Explicitly reset position on SeekableInput (#1747)
  • Support scalatest NotWord (#1743)
  • Make BigQuery priority sysprop case-insensitive (#1736)
  • Use getSchema and avoid reflection when creating AvroCoder (#1724)
  • Clarify error message when a job uses an input multiple times (#1720)
  • Fix tiny typos in Coders.md (#1732)
  • Fix incorrect generic type in ScalaDoc (#1725)
  • Use BenchmarkResult as entity (#1712)

v0.7.2

04 Mar 17:57

"Ursus t. Ussuricus"

Features

  • Update Beam to 2.10 (#1674, #1676)
  • Clearer Coder exceptions (#1672)
  • Use new HadoopFormatIO (#1675)
  • Add spanner MutationGroup coder (#1704)
  • Optimize CombineFns (speeds up aggregate-, reduce-, and combine-based operations!) (#1699)
  • Use list side input on cross product (#1691)
  • Fix DistinctBy serialization for Scala Classes (#1710, #1715)
  • Remove deprecation warning on tfRecordExampleFileWithSchema (#1714)
  • Cleanup around scio context (#1679)
  • Version bumps: cassandra-all -> 2.2.14 (#1677), 3.11.4 (#1678); Sparkey -> 3.0.0 (#1690); ES5 -> 5.6.15, ES6 -> 6.6.1 (#1700); tensorflow -> 1.13.1 (#1707); scalatest -> 3.0.6 (#1709); featran-* -> 0.3.0 (#1713)

Bug fixes

  • Fix Magnolia generated tree annotations removal to ensure Derived coders are serializable (#1673)

v0.7.1

08 Feb 18:45

"Taxidea Taxus"

Features

  • New HashCode-based partitioning method for keyed SCollections (#1654)
  • New Coder for java.util.ArrayList (#1649), and more space-efficient coders for small ADTs like Either and Try (#1652)
  • New BinaryIO output (#1663)
  • Simpler, clearer toString method for Coders (#1671)
  • Custom Assertions for unit testing Coders added to scio-test package (#1642)
  • New SideMap and SideSet SideInput types, usable in hashFullOuterJoin, hashIntersectByKey, and hashFilter methods
  • Library version bumps: mysql-connector-java -> 8.0.15 (#1653), mysql-socket-factory -> 1.0.12 (#1627), protobuf-java -> 3.6.1 (#1633), hadoop-client -> 2.7.7 (#1634), jackson-module-scala -> 2.9.8 (#1632), parquet-avro -> 1.10.1 (#1648), kantan.csv -> 0.5.0 (#1647)

Bug fixes & Improvements

  • Optimized Bloom filter aggregations in sparse joins (#1644)
  • Spanner-specific Coders repackaged from scio-core to scio-spanner (#1630)
  • Fallback coder always uses Kryo (#1668) and RichCoderRegistry is removed (#1670)

v0.7.0

18 Jan 19:48

"Suricata suricatta"

Breaking changes

  • See v0.7.0 Migration Guide for detailed instructions
  • New Magnolia-based Coders derivation replaces ClassTag and Kryo
  • New ScioIO replaces TestIO[T] to simplify IO implementation and stubbing in JobTest
  • Update dynamic file destination API #1305
  • Remove deprecated TensorFlow graph prediction method #1370
  • Object file IO is no longer backwards compatible due to coder changes
  • Refactor bigquery client (#1439)

Features

Bug fixes & Improvements

  • Make PTransform names unique #1355 #1387
  • Fail for unknown args in ContextAndArgs.typed[T] (#1413)
  • Fix verifyNondeterministic exception in coders (#1418)
  • Fix BigQueryType on refined types (#1424)
  • Fix mergeAccumulators crash (#1428)
  • Set timestamp attribute in JobTest for PubSubIO (#1417)
  • Rework Coder's implicit not found message (again) (#1469)
  • Fix KryoRegistrar scope widening (#1462)
  • Make compression options in ExtractOps typed (#1449) (#1457)
  • Add back BigQuery schema caching, regression of #1439 (#1458)
  • Register default file systems in Scio test context (fix #1455) (#1463)
  • Use coherent defaults across IO (#1478)
  • Fix scio-repl to use refactored BigQuery client (#1459)
  • Fix typed argument parsing when name contains camelCase (#1460)
  • Pubsub topic name was not being set for Messages (#1568)
  • Fix macro generated class directory (#1558)
  • Fix stack overflow when maxByKey is used with explicit ordering (#1560)
  • Fix id and timestamp attributes not being passed in saveAsPubsub (#1559)
  • Fix flatten type inference changing the coder context bound to an implicit parameter (#1551)
  • Fix: use CoderMaterializer in SideOutputCollections (#1548)
  • Default to disabled warning on coders (#1588)
  • Use alternative to deprecated write method (#1592)
  • Simplify BigQueryType query method arg type parsing (#1585)
  • Add rules for TextIO, AvroIO, PubsubIO and BigQueryIO (#1577)
  • #1587: Fix side output potentially missing coder (#1598)
  • Add region to DataflowResult (#1479)
  • Remove unused autovalue dependency (#1575)

v0.7.0-beta3

08 Jan 09:22
Pre-release

Bug fixes & Improvements

  • Default to disabled warning on coders (#1588)
  • Use alternative to deprecated write method (#1592)
  • Simplify BigQueryType query method arg type parsing (#1585)
  • Add rules for TextIO, AvroIO, PubsubIO and BigQueryIO (#1577)
  • #1587: Fix side output potentially missing coder (#1598)
  • Update beam-runners-direct-java, ... to 2.9.0 (#1580)
  • Update annoy4s to 0.8.0 (#1579)
  • Update zoltar-api, zoltar-tensorflow to 0.5.1 (#1578)
  • Update circe-core, circe-generic, ... to 0.11.0 (#1586)
  • Update guava to 25.1-jre (#1589)
  • Remove unused autovalue dependency (#1575)

Features

  • Add elasticsearch 6 (#1572)
  • Add Numeric type support in scio-bigquery (#1599)

v0.7.0-beta2

06 Dec 19:03
Pre-release

Bug Fixes

  • Pubsub topic name was not being set for Messages (#1568)
  • Fix macro generated class directory (#1558)
  • Fix stack overflow when maxByKey is used with explicit ordering (#1560)
  • Fix id and timestamp attributes not being passed in saveAsPubsub (#1559)
  • Fix flatten type inference changing the coder context bound to an implicit parameter (#1551)
  • Fix: use CoderMaterializer in SideOutputCollections (#1548)

Features

v0.7.0-beta1

06 Dec 20:52
Pre-release

Bug fixes

  • Rework Coder's implicit not found message (again) (#1469)
  • Fix KryoRegistrar scope widening (#1462)
  • Make compression options in ExtractOps typed (#1449) (#1457)
  • Add back BigQuery schema caching, regression of #1439 (#1458)
  • Register default file systems in Scio test context (fix #1455) (#1463)
  • Use coherent defaults across IO (#1478)
  • Fix scio-repl to use refactored BigQuery client (#1459)
  • Fix typed argument parsing when name contains camelCase (#1460)

Features

  • Add Google Spanner package (#1491)
  • Add BigQuery TimePartitioning support, fix #1419 (#1466)
  • Add subscription function to PubSubAdmin (#1483)
  • Bump Beam to 2.8.0 (#1493)
  • Update dependencies (#1489)
  • Add scalafix rules (#1435, #1464, #1474, #1468, #1470)
  • Add BigQueryType typesafe args (#1476)
  • Add region to DataflowResult (#1479)
  • Expose transform function (#1492, #1487)
  • Allow creating DataflowResult from Dataflow Job (#1481)
  • Remove Future.failed in IOs (#1482)
  • Add better error messages when missing sys.props (#1488, #1461)

v0.7.0-alpha2

17 Oct 16:55
Pre-release

Bug Fixes:

  • Fail for unknown args in ContextAndArgs.typed[T] (#1413)
  • Fix verifyNondeterministic exception in coders (#1418)
  • Fix BigQueryType on refined types (#1424)
  • Fix mergeAccumulators crash (#1428)
  • Set timestamp attribute in JobTest for PubSubIO (#1417)

Features:

Breaking Changes:

  • Refactor bigquery client (#1439)
  • Move all coders to scio-core and rename scio-coders-macros to scio-macros (#1438)

v0.7.0-alpha1

20 Sep 06:11
Pre-release

Breaking changes

  • See v0.7.0 Migration Guide for detailed instructions
  • New Magnolia-based Coders derivation replaces ClassTag and Kryo
  • New ScioIO replaces TestIO[T] to simplify IO implementation and stubbing in JobTest
  • Update dynamic file destination API #1305
  • Remove deprecated TensorFlow graph prediction method #1370
  • Object file IO is no longer backwards compatible due to coder changes

Features

  • Magnolia- and macro-based coders for orders-of-magnitude faster (de)serialization
  • Redesigned unified ScioIO[T] for all IO modules
  • Add SCollection#{readAll,readAllBytes} (splittable DoFn support) #796 #1363
  • Add sparse left and right outer joins #1386
  • Check for and warn about chained joins #1362
  • Support Parquet compression #1189 #1318
  • Port Parquet IO to Parquet 1.10 #1340 #1345
  • Configurable fetch and batch size for JDBC IO #1314

Bug fixes

v0.6.1

12 Sep 15:27

"Rhyncholestes raphanurus"

Features

Bug fixes

  • Make PCollection names unique #1356
  • Register joda-time serializers #1341 #1347
  • Fix duplicate jars in classpath #1334 #1348
  • Use location-aware Dataflow job endpoints #1337
  • Change DoFnWithResource logging level to DEBUG #1351
  • Cache schema in AvroType macro #1025 #1359