Releases: spotify/scio

v0.5.0-alpha2

29 Jan 23:48
Pre-release

Breaking changes

  • BigQueryIO in JobTest#output now requires a type parameter. Explicit .map(T.toTableRow) of test data is no longer needed.
  • Typed AvroIO now accepts case classes instead of Avro records in JobTest. Explicit .map(T.toGenericRecord) of test data is no longer needed. See this change for more.
  • Package com.spotify.scio.extra.transforms is moved from scio-extra to scio-core, under com.spotify.scio.transforms.

See this section for more details.
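
For the package move, a minimal migration sketch, assuming a project that pulls in the old package by wildcard import and already has scio-core on the classpath. The wildcard form is only an illustration; these notes do not prescribe it.

```scala
// Before 0.5.0, when the package lived in scio-extra:
// import com.spotify.scio.extra.transforms._

// From 0.5.0 on, the same package lives in scio-core:
import com.spotify.scio.transforms._
```

Any fully qualified references to com.spotify.scio.extra.transforms would need the same rename.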

Features

  • Remove toGenericRecord requirement when testing typed AvroIO #1022 #1036
  • Bump sparkey to 2.2.1, protobuf-generic to 0.2.4 #1028

Bug fixes

v0.5.0-alpha1

17 Jan 20:28
Pre-release

Breaking changes

  • BigQueryIO in JobTest#output now requires a type parameter. Explicit .map(T.toTableRow) of test data is no longer needed.
  • Package com.spotify.scio.extra.transforms is moved from scio-extra to scio-core, under com.spotify.scio.transforms.

See this section for more details.

Features

  • Support reading BigQuery as Avro #964, #992
  • Add TFRecordSpec support for Featran #1002
  • Add AsyncLookupDoFn #1012

Bug fixes

  • Fix SCollectionMatchers serialization #1001
  • Check runner version #1008 #1009

v0.4.7

04 Jan 20:24

"Hydrochoerus hydrochaeris"

Features

  • Add support for TFRecordSpec #990
  • Add convenience methods to randomSplit #987
  • Add BigtableDoFn #922 #931
  • Add optional arguments validation #979
  • Performance improvement in Avro and BigQuery macro converters #989
  • Update new bigquery datetime and timestamp specification #982

Bug fixes

  • Fix join performance regression introduced in 0.4.4 #976 #983
  • Fine-tune dynamic sink API and integration tests #993 #995

v0.4.6

19 Dec 17:56

"Galago gallarum"

Features

  • Upgrade Beam to 2.2.0 #797 #958
  • Support dynamic file IO destinations #919 #965
  • Support custom Kryo options via PipelineOptions #896 #955
  • Propagate input to TensorFlow predict output fn
  • Annotate more examples with Socco
  • Support compression in TextIO #972
  • Use Compression in TFRecordIO #977
  • Add TSV examples #974

Bug fixes

  • Use window as side input cache key #959 #960
  • Use canonical path in macro type providers #975
  • Fix deduplication in SCollection#subtract #973
  • Fix empty RHS for hashJoin and hashLeftJoin #953
  • Fix ClassNotFound issue with ClosureCleaner
  • Lift projection in ParquetAvroFile#flatMap
  • Add Dataflow runner to scio-examples #963 #968
  • Remove deprecated Pubsub ClientAuthInterceptor #957 #962

v0.4.5

28 Nov 23:51

"Felis ferus"

Features

  • Add PubSub admin helper methods #929
  • Add Elasticsearch ensureIndex helper methods #912 #925
  • Support drop row range in Bigtable admin #926
  • Examples site with Socco annotations

Bug fixes

  • Fix dependency issue with DataflowRunner #934 #935
  • Use SingletonSideInput instead of MultiMapSideInput for hash joins #916 #917
  • Use thread safe version of SparkeyReader #941 #944
  • Make SCollection#take lazy, fix #938 #940
  • Require extra map in Parquet Avro, fix #928 #930
  • Tag and exclude slow tests #909 #927
  • Fix build for Windows #918 #920 #921

v0.4.4

01 Nov 21:58

"Erinaceus europaeus"

Breaking change

The Dataflow runner dependency is removed from scio-core. You now need to add all runner dependencies explicitly. Dataflow-specific logic is also removed from ScioResult. See this page for more details.
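
As a rough sketch of what adding the runner explicitly could look like in an sbt build: the artifact below is Beam's Dataflow runner, and the beamVersion value is an assumption that should be set to whichever Beam release your Scio version is built against.

```scala
// build.sbt
// Assumption: 2.1.0 is a placeholder; use the Beam version that matches your Scio release.
val beamVersion = "2.1.0"

// Dataflow runner, no longer pulled in transitively by scio-core.
libraryDependencies += "org.apache.beam" % "beam-runners-google-cloud-dataflow-java" % beamVersion
```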

Features

  • Add bigquery dynamic destinations #876 #73
  • Add time condition matchers #883 #458
  • Support default value for singleton side input #894
  • Allow wrapping internal views as SideInput #897
  • Decouple Dataflow runner #779 #882 #850
  • Bump Scala 2.12 to 2.12.4 #893
  • Bump sbt to 1.0.3 #904

Bug fixes

  • Use chunked in JTraversableSerializer, fix #887 #888
  • Performance improvement for 2-way joins #901 #891
  • Fix Option support in flatten, fixes #886 #889
  • Honor caching configuration in standalone BigQueryClient #899
  • Fix contribution guideline and code of conduct #906 #907
  • Scaladoc fixes #900

v0.4.3

12 Oct 06:18

"Dendrohyrax dorsalis"

Features

  • Expose internal map of Args #861
  • Work around asymmetry between distCache and distCache mocking #869

Bug fixes

  • Fix kryo performance #825
  • Update SideInput cache on context change #865 #866
  • Handle bigquery client failure #828 #864
  • Fix BigQuery scio-idea-plugin integration #874 #877
  • Handle stream EOF in TFRecordCodec #815 #878
  • Revert scio-repl to Scala 2.11 #867

v0.4.2

27 Sep 06:32

"Castor canadensis"

Breaking change

The Beam direct runner is no longer a dependency of scio-core. Add the following dependency if you want to run a pipeline locally. The current Beam version is 2.1.0.

"org.apache.beam" % "beam-runners-direct-java" % beamVersion

Features

  • Add flatten and flattenValues to SCollection #842
  • Add Annoy side input #783 #812
  • Support saving TF Example together with feature spec #816
  • Support metrics in JobTest #846 #851
  • Use Scala 2.12 for scio-repl #834 #835
  • Support custom body in BigQueryType #808
  • Add Scio Benchmark #506 #830
  • Remove direct runner dependency #777 #852
  • Remove DataflowPipelineOptions from ScioILoop #779
  • Bump Bigtable dependency #841
  • Update Algebird to 0.13.2

Bug fixes

  • Fix bug in JIterable Serializer #836
  • Fix for KryoAtomicClass serialization issue #855 #856
  • Fork in scio-examples/run #847 #848
  • Test AsyncDoFn with ScalaCheck commands #839
  • Fix GuavaAsyncDoFn flakiness #858
  • Add key to Datastore example #782
  • Patch all AvroIO files #857

v0.4.1

07 Sep 18:15

"Blarina brevicauda"

Features

  • Bump Beam to 2.1.0 #633
  • Add Parquet Avro read support #794 #801
  • Add custom parallelism for Cassandra sink #792 #795
  • Make runWithLocalOutput public to allow more generic testing #826
  • Add option getters to BigQuery TableRow #810
  • Add tableExists to BigQueryClient #824
  • Add enhanced version of TableReference with asTableSpec #822
  • Make ArraySemigroup a Monoid #800
  • Use Kryo 4.0.1 #818

Bug fixes

  • Fix KryoAtomicCoder regression #829
  • Generate nested case class with stable names for AvroType #802
  • Fail on BigQueryType defined in package #811
  • Hide com.spotify.scio.io in REPL wildcard import #821
  • Remove some runner specific logic #804 #805 #807

v0.4.0

22 Aug 20:53

"Atelerix albiventris"

Features

  • Add Parquet Avro read support #771
  • Add option to set BigQuery priority #759
  • Add missing helpers for Date,Time,DateTime #761
  • Add example of DoFn usage #766
  • Add safeFlatMap #776
  • Support custom number of shards in TFRecord output #787
  • Support fetching Bigtable cluster size #773
  • Support custom Circe Printer for JSON IO #769, #770
  • Fail on duplicate IO within the pipeline in JobTest #786
  • Coder performance improvements #767
  • Move TFRecordIO to tensorflow sub-module

Bug fixes

  • Enforce numeric for BigQuery $LATEST partition #780 #781
  • Lazy batch Elasticsearch requests #784 #785
  • Check isCacheEnabled in BigQueryClient #758
  • Support shouldNot with SCollectionMatchers, fix #760 #763
  • Fix AvroType schema namespace #774
  • Monoid reduceOption should default to zero
  • Fix null string in TopWikipediaSessions