Releases: spotify/scio
v0.9.6
"Specialis Revelio"
🐛 Bug Fix release!
- Upgrade Sparkey to work around hash collision bug. (#3532)
From sparkey-java#3.2.1:
Fixed bug where creating hash files would erroneously lose some keys. The bug only applies to cases where: construction mode was SORTING and there existed hash collisions (so typically only when hash mode is 32 bits and the number of keys is more than 100000). The bug was introduced along with SORTING in 2.3.0.
Scio users may have been impacted by this bug if:
- writing any Sparkey files from their pipelines (.asSparkey, .asSparkeySideInput, etc.)
- writing between 1M and 8,388,608 elements per individual Sparkey file or shard
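The 100,000-key threshold quoted above follows from the birthday bound. If n keys are hashed uniformly into a b-bit space, the expected number of key pairs sharing a hash value is n(n-1)/2 divided by 2^b. The minimal Scala sketch below works through that arithmetic. It assumes uniformly distributed hashes, and the object and method names are illustrative only. They are not part of Scio or sparkey-java.

```scala
object SparkeyCollisionEstimate {
  /** Expected number of key pairs sharing a hash value, assuming `n` keys
    * hashed uniformly into 2^hashBits possible values (birthday bound). */
  def expectedCollidingPairs(n: Long, hashBits: Int): Double = {
    val pairs = n.toDouble * (n - 1) / 2.0
    pairs / math.pow(2.0, hashBits)
  }

  def main(args: Array[String]): Unit =
    for (n <- Seq(10000L, 100000L, 1000000L)) {
      val e32 = expectedCollidingPairs(n, 32)
      val e64 = expectedCollidingPairs(n, 64)
      println(f"n=$n%,d  32-bit: $e32%.4f  64-bit: $e64%.2e")
    }
}
```

With 32-bit hashes, 100,000 keys give about 1.16 expected colliding pairs and 1,000,000 keys give about 116. With 64-bit hashes, the expected count stays far below one at these sizes. This is consistent with the note that the bug mostly affects 32-bit hash mode.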
v0.10.0-beta2
🚀 Enhancements
- improve smb auto-splittability (#3485) @clairemcginty
- Add RedisDoFn (#3461) @regadas
- Add byte size limit option to ElasticsearchIO writer (#1829) (#3437) @alexclare
- Add RedisType implicit error message (#3456) @regadas
- Override Parquet block size and GCS connector FADVISE (#3452) @nevillelyh
- Add multiple key range read for BigTableIO (#3442) @syodage
- Distributed HyperLogLog++ support with ZetaSketch (#3440) @syodage
🐛 Bug Fixes
- Fix scio version warning in tests (#3489) @regadas
- Fix getRecord should not throw on a missing field (#3490) @sfines-clgx
- Fix logging in ApproxFilter (#3475) @martinbomio
- Update gRPC to 1.32.1 (#3467) @regadas
- Revert read/write test param removal (#3451) @regadas
- Add projection & predicate in test ID for BigQueryStorage (#3466) @moscowart
- fixup! fix doc error in Tap.scala (#3487) @regadas
📗 Documentation
- Fix README Scio logo path (#3468) @viktorjonsson
- Clarify smb auto-parallelism behavior in GH site (#3449) @clairemcginty
- Added documentation for typedBigQuery (#3448) @jasmineytchen
🏗️ Build Improvements
🔧 Refactorings
- Rework com.spotify.scio.transforms package (#3454) @regadas
- Add RedisDoFn (#3461) @regadas
- Refactor sketching package (#3443) @syodage
🌱 Dependency Updates
- Update mysql-socket-factory to 1.2.0 (#3491) @scala-steward
- Update protobuf-java to 3.14.0 (#3476) @scala-steward
- Update elasticsearch to 7.10.0 (#3472) @regadas
- Update gRPC to 1.32.1 (#3467) @regadas
- Update scalatest to 3.2.3 (#3463) @scala-steward
- Update scalacheck to 1.15.1 (#3460) @scala-steward
- Update cassandra-all to 3.11.9 (#3453) @scala-steward
- Update scalacheck to 1.15.0 (#3444) @scala-steward
Contributors to this release
@clairemcginty, @alexclare, @dependabot, @dependabot[bot], @jasmineytchen, @martinbomio, @moscowart, @nevillelyh, @regadas, @scala-steward, @sfines-clgx, @syodage and @viktorjonsson
v0.10.0-beta1
See v0.10.0 Migration Guide for detailed instructions.
Improvements
- Add scio-redis (#3386)
- Add RedisMutation support for RedisIO (#3331) (#3391)
- Add ApproxDistinctCount trait and two HLL++ beam extension implementations. (#3361)
- Add Predicate for SMB source, fix #3398 (#3402)
- Allow SortedBucketPreKeyedSink key validation with null keys (#3439)
- Add missing sortMergeGroupByKey API with custom parallelism (#3430)
- Unblock MergeAndWriteBucketsSource while in progress (#3407)
- Add SCollection withResource functions (#3389)
- Make BQ inherit labels from Beam/Dataflow job (#3375)
- Simplify Bigquery Format typeclass and bubble up Avro coder (#3401)
- Support $LATEST replace for BQ Table type (#3376)
- Integrate docs in CI (#3392)
- Add 0.10.0 migration guide (#3436)
- Add scalafix rules for 0.10 (#3432)
- Clarify extra.csv.CsvIO docs (#3435)
- Update scio.g8 steps
- Add install instructions for sbt. (#3433)
Bug Fixes
- Avoid Iterable backed by List when running locally (#3408)
- Remove Coder context bound from read (#3384)
- Avoid fallback Coder on GenericRecord read (#3385)
- Add missing pubsub coders to avoid fallback (#3382)
- Remove prompt before interp is created (#3404)
- Fix repl windows character escape (#3405)
- Sanitize staged file path string (#3406)
- Remove unused fields in AvroIO (#3383)
- Fix JavaAsyncLookupDoFn doc link (#3381)
Dependency Updates
- Update Beam to 2.25.0 (#3438)
- Update sbt-avro to 3.2.0 (#3434)
- Update magnolify-avro, magnolify-bigtable, ... to 0.3.0 (#3431)
- Update elasticsearch-rest-client, ... to 7.9.3 (#3422)
- Update sbt-explicit-dependencies to 0.2.15 (#3417)
- Update joda-time to 2.10.8 (#3424)
- Update sbt-mima-plugin to 0.8.1 (#3414)
- Update sbt to 1.4.1 (#3415)
- Update scalafmt-core to 2.7.5 (#3410)
- Update mysql-connector-java to 8.0.22 (#3412)
- Update sbt-mdoc to 2.2.10 (#3413)
- Update junit to 4.13.1 (#3394)
- Update caffeine to 2.8.6 (#3395)
- Bump Featran to 0.7.0, re-enable failing TF tests (#3380)
v0.10.0-alpha1
Breaking Changes
- Clean up coder implicits propagation #3056 (#3170)
- Move Google Cloud IOs to scio-google-cloud-platform (#3340)
- Deprecate ScioContext pubsubIO methods (#3345)
- Remove EOLed elasticsearch5 support (#3217)
- Remove readAll* API deprecated since 0.8 (#3216)
- Remove async DoFn aliases deprecated since 0.8
- Remove key-value transform code deprecated since 0.8 (#3213)
- Remove scio-extra code deprecated since 0.8 (#3212)
Improvements
- Add internal composite transform (#3159)
Bug fixes
- Remove BeamCoders unchecked warning (#3343)
v0.9.5
"Colovaria"
There are no breaking changes in this release, but some were introduced with v0.9.0:
See v0.9.0 Migration Guide for detailed instructions.
Improvements
- Add custom GenericJson pretty print (#3367)
- Support dynamic destinations for windowed SCollections in scio-parquet (#3356)
- Support $LATEST replacement for Query (#3357)
- Mutable ScalableBloomFilter (#3339)
- Add specialized TupleCoders (#3350)
- Add nullCoder on Record and Disjunction coders (#3349)
Bug Fixes
- Support null-key records in smb writes (#3359)
- Fix serialization struggles in SMB transform API (#3342)
- Grammar / spelling fixes in migration guides (#3358)
- Remove unused macro import (#3353)
- Remove unused BaseSeqLikeCoder implicit (#3344)
- Filter out potentially included env directories (#3322)
- Simplify LowPriorityCoders (#3320)
- Remove unused and not useful Coder implicit trait (#3319)
- Make javaBeanCoder lower prio (#3318)
Dependency Updates
- Update Beam to 2.24.0 (#3325)
- Update scalafmt-core to 2.7.3 (#3364)
- Update elasticsearch-rest-client, ... to 7.9.2 (#3347)
- Update hadoop libs to 2.8.5 (#3337)
- Update sbt-scalafix to 0.9.21 (#3335)
- Update sbt-mdoc to 2.2.9 (#3327)
- Update sbt-avro to 3.1.0 (#3323)
- Update mysql-socket-factory to 1.1.0 (#3321)
- Update scala-collection-compat to 2.2.0 (#3312)
- Update sbt-mdoc to 2.2.8 (#3313)
v0.9.4
"Deletrius"
There are no breaking changes in this release, but some were introduced with v0.9.0:
See v0.9.0 Migration Guide for detailed instructions.
Improvements
- Add SCollection#filterNot (#3291)
- Improve filterValues doc (#3290)
- Add support for JDBC sharding by UUID encoded as string (#3307)
- Add optimised coder derivation for AnyVal (#3296)
- Support BigQuery Avro Format (#3221)
- Support sparkey compression, fix #3210 (#3295)
- Warn if sparkey is bigger than memory, #3280
- (fix #3278) Warn on chained .groupByKey.join (#3297)
- [SMB] delete file early in NativeFileSorter (#3274)
- Change default SMB codec to Deflate to match Scio (#3247)
- Add java.time LocalDate, LocalDateTime, LocalTime, Period, Duration coders (#3238)
Bug Fixes
- Remove duplicate ShardedSparkeyReader
- Use andThen for future side effect ops (#3275)
- Fix Bigquery IT test
- Add state to exception message when pipeline is cancelled (#3270)
- Avoid scala.jdk.CollectionConverters implicit import in Avro macro (#3250)
- Avoid scala.jdk.CollectionConverters implicit import in Bigquery macro (#3240)
- fix(avro-bq): added EnumSymbol case for matching avro types to BQ TableRow (#3226) (#3232)
- Remove unneeded LowPriority implicit (#3239)
- Remove coders deprecation warns (#3242)
- Use ## and support consistent Array hashCode (#3246)
- Improve SMB error handling (#3253)
- Workaround for sorter memory limit #3260 (#3269)
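For context on the `##` / Array hashCode fix above: Scala arrays are Java arrays, so `==`, `hashCode` and `##` on them all use object identity rather than contents. In addition, `##` differs from `hashCode` in that it gives numerically equal values of different primitive types the same hash. The sketch below illustrates both points with standard-library calls only. It is not Scio's implementation of #3246, and `contentHash` is a hypothetical helper.

```scala
import java.util.Arrays

object ConsistentHashing {
  /** Hypothetical helper: hash an Int array by its contents, not its identity. */
  def contentHash(a: Array[Int]): Int = Arrays.hashCode(a)

  def main(args: Array[String]): Unit = {
    val a = Array(1, 2, 3)
    val b = Array(1, 2, 3)
    // Identity-based: typically different even though the contents are equal.
    println(s"a.## = ${a.##}, b.## = ${b.##}")
    // Content-based: always equal for equal contents.
    println(s"contentHash(a) = ${contentHash(a)}, contentHash(b) = ${contentHash(b)}")
    // `##` agrees with Scala's cooperative numeric equality (1 == 1L == 1.0).
    val (i, l, d) = (1, 1L, 1.0)
    println(s"i.## = ${i.##}, l.## = ${l.##}, d.## = ${d.##}")
  }
}
```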
Dependency Updates
- Bump sparkey to 3.2.0
- Remove unused imports (#3243)
- Update case-app, case-app-annotations, ... to 2.0.4 (#3256)
- Update cassandra-all to 3.11.8 (#3281)
- Update cassandra-driver-core to 3.10.2 (#3276)
- Update commons-io to 2.8.0 (#3310)
- Update elasticsearch, ... to 7.9.1 (#3301)
- Update elasticsearch, ... to 6.8.12 (#3264)
- Update flink runner to 1.10.1 (#3249)
- Update magnolia to 0.17.0 (#3262)
- Update magnolify-avro, magnolify-bigtable, ... to 0.2.3 (#3263)
- Update parquet-avro, parquet-column, ... to 1.11.1 (#3251)
- Update protobuf-generic to 0.2.9 (#3227)
- Update protobuf-java to 3.13.0 (#3257)
- Update sbt-avro to 3.0.0 (#3252)
- Update sbt-bloop to 1.4.4 (#3287)
- Update sbt-buildinfo to 0.10.0 (#3245)
- Update sbt-java-formatter to 0.6.0 (#3259)
- Update sbt-jmh to 0.4.0 (#3288)
- Update sbt-mdoc to 2.2.7 (#3311)
- Update sbt-mima-plugin to 0.8.0 (#3305)
- Update sbt-scalafix to 0.9.20 (#3298)
- Update scalactic to 3.2.2 (#3271)
- Update scalafmt-core to 2.7.0
- Update scalatest to 3.2.2 (#3272)
- Update transport to 6.8.12 (#3265)
- Use beam-runners-flink-1.10 when using BEAM_RUNNERS (#3303)
v0.9.3
"Petrificus Totalus"
There are no breaking changes in this release, but some were introduced with v0.9.0:
See v0.9.0 Migration Guide for detailed instructions.
Improvements
- Allow user-supplied filename prefix for smb writes/reads (#3215)
- Refactor SortedBucketTransform into a BoundedSource + reuse merge logic (#3097)
- Add keyGroupFilter optimization to scio-smb (#3160)
- Add error message to BaseAsyncLookupDoFn preconditions check (#3176)
- Add Elasticsearch 5,6,7 add/update alias on multiple indices ops (#3134)
- Add initial update alias op to ES7 (#2920)
- Add ScioContext#applyTransform (#3146)
- Allow SCollection#transform name override (#3142)
- Allow setting default name through SCollection#applyTransform (#3144)
- Update 0.9 migration doc and add BigQuery Type read schema documentation (#3148)
Bug Fixes
- AvroBucketMetadata should validate keyPath (fix #3038) (#3140)
- Allow union types in non leaf field for key (#3187)
- Fix issue with union type as non-leaf field of smb key (#3193)
- Fix ContextAndArgs#typed overloading issue (#3199)
- Fix error propagation on Scala Future onSuccess callback (#3178)
- Fix ByteBuffer should be readOnly (#3220)
- Fix compiler warnings (#3183)
- Fix JdbcShardedReadOptions.fetchSize description (#3209)
- Fix FAQ typo (#3194)
- Fix scalafix error in SortMergeBucketScioContextSyntax (#3158)
- Add scalafix ExplicitReturnType and ProcedureSyntax rules (#3179)
- Cleanup a few more unused and unchecked params (#3223)
- Use GcpOptions#getWorkerZone instead of deprecated GcpOptions#getZone (#3224)
- Use raw coder in SCollection#applyKvTransform (#3171)
- Add raw beam coder materializer (#3164)
- Avoid circular dep between SCollection and PCollectionWrapper (#3163)
- Remove unused param of internal partitionFn (#3166)
- Remove unused CoderRegistry (#3165)
- Remove defunct scio-bench (#3150)
- Reuse applyTransform (#3162)
- Make multijoin.py Python 3 compatible
- Use TextIO#withCompression (#3145)
Dependency Updates
- Update Beam SDK to 2.23.0 (#3197)
- Update dependencies to be in line with 2.23.0 (#3225)
- Update to scala 2.12.12 (#3157)
- Update auto-value to 1.7.4 (#3147)
- Update breeze to 1.1 (#3211)
- Update cassandra-all to 3.11.7 (#3186)
- Update cassandra-driver-core to 3.10.0 (#3195)
- Update commons-lang3 to 3.11 (#3161)
- Update commons-text to 1.9 (#3185)
- Update contributing guidelines with current tools (#3149)
- Update elasticsearch-rest-client, ... to 7.8.1 (#3192)
- Update elasticsearch, ... to 6.8.11 (#3188)
- Update jackson-module-scala to 2.10.5 (#3169)
- Update jna to 5.6.0 (#3156)
- Update magnolify to 0.2.2 (#3154)
- Update mysql-connector-java to 8.0.21 (#3153)
- Update pprint to 0.6.0 (#3203)
- Update protobuf version to 3.11.4 (#3200)
- Update sbt-scalafix to 0.9.18 (#3138)
- Update sbt-sonatype to 3.9.4 (#3136)
- Update scalafmt-core to 2.6.2 (#3139)
- Update scalafmt-core to 2.6.3 (#3152)
- Update scalafmt-core to 2.6.4 (#3167)
- Update sparkey to 3.1.0 (#3204)
- Fix conflicting gcsio dependency (#3180)
v0.9.2
"Alohomora"
There are no breaking changes in this release, but some were introduced with v0.9.0:
See v0.9.0 Migration Guide for detailed instructions.
Improvements
- Add more info to RateLimiterDoFn javadoc (#3054)
- Add OnTimeBehaviour window option (#3063)
- add small SMB sink optimizations (#3079)
- Add support for PipelineOptions in ContextAndArgs (#3121)
- Add support for read-only ensureTable (#3114)
- Add support for sharded JDBC reads from tables (#3020)
- Add support for String/ByteString TF Bucket key types (#3098)
- Ensures the future returned by addCallback will complete (#3099)
- Update BigQueryType and AvroType compileTimeOnly msg (#3107)
- Cleanup scio-repl classloader (#3118)
- Initialize ScioContext on :reset (#3109)
- Remove unnecessary cache of client interceptors (#3091)
- Refactor scio-repl underlying ILoop (#3108)
- Make SMB source a splittable BoundedSource (#3005)
- More Coder doc (#3025)
- Use a sampled subset of files to compute estimated SMB source size (#3037)
- Use IndexRecord upper bound (#3028)
- Use MonoidAggregator when available (#3051)
- Wrap skewedJoin internals in transform (#3104)
- Update deprecated .toIterator with .iterator (#3053)
- Update to use testId instead of toString (#3115)
Bug Fixes
- Fix additional exception wrappings for Flink (#3035)
- Fix classpath issue when running from IntelliJ or Bloop (#3087)
- Fix containValue serialization (#3128)
- Fix doc site edit page link (#3062)
- Fix javafmt from #3054
- Fix repl-generated classfiles output directory (#3106)
- Fix scio-repl scala specific repl settings (#3089)
- fix site homepage formatting
- fix SMB doc formatting (#3078)
- Fix tuplecoders script
- Fix: Add extra schema encoding step if coder already set (#3027)
- Fix: propagate PipelineOptions in job test (#3026)
- Fix: support mutable zeroValue in aggregate/combine/fold (#3033)
- Add missing Dataflow runner to scio-repl (#3081)
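For context on the mutable zeroValue fix above: a mutable zero value is risky in aggregate/combine/fold because the same instance can be reused as the starting accumulator more than once. The plain-Scala sketch below shows how state leaks when one mutable zero is shared across partitions, and how starting each fold from a copy avoids the leak. It does not use Scio's internals, and `foldShared` and `foldCopied` are hypothetical helpers.

```scala
import scala.collection.mutable.ArrayBuffer

object MutableZeroValue {
  /** Folds each partition starting from the SAME zero instance:
    * elements appended for one partition leak into the next. */
  def foldShared(parts: Seq[Seq[Int]], zero: ArrayBuffer[Int]): Seq[List[Int]] =
    parts.map(p => p.foldLeft(zero)(_ += _).toList)

  /** Folds each partition starting from a fresh copy of the zero value. */
  def foldCopied(parts: Seq[Seq[Int]], zero: ArrayBuffer[Int]): Seq[List[Int]] =
    parts.map(p => p.foldLeft(zero.clone())(_ += _).toList)

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2), Seq(3))
    println(foldShared(parts, ArrayBuffer.empty[Int])) // second partition sees 1 and 2
    println(foldCopied(parts, ArrayBuffer.empty[Int])) // partitions stay independent
  }
}
```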
Dependency Updates
- Update auto-value, auto-value-annotations to 1.7.3 (#3042)
- Update Beam to 2.22.0 (#3039)
- Update caffeine to 2.8.5 (#3112)
- Update case-app, case-app-annotations, ... to 2.0.3 (#3094)
- Update elasticsearch-rest-client, ... to 7.8.0 (#3067)
- Update magnolify-avro, magnolify-bigtable, ... to 0.2.1 (#3127)
- Update scala to 2.13.3 (#3130)
- Update scalactic to 3.2.0 (#3072)
- Update scalatest to 3.2.0 (#3073)
- Update transport to 6.8.10 (#3047)
v0.9.1
"Aberto"
There are no breaking changes in this release, but some were introduced with v0.9.0:
See v0.9.0 Migration Guide for detailed instructions.
Improvements
- Add option to disable colors when pretty printing (#2897)
- Add SCollection reify methods (#2987)
- Add scio-extra BigQuery saveAvroAsBigQuery IT test (#3004)
- Unify sharded & non sharded sparkey write (#3012)
- Write empty sparkey shards when no data (#3008)
- Add SCollection#groupMap and SCollection#groupMapReduce operations (#2996)
- Add BigQuery section to 0.9.0 migration guide (#2991)
- Update default SMB sorter memory to 1GB (#2993)
- SCollection.withWindowFn with upper bound type (#2975)
- Implement SMB "least bucket replication" for SortedBucketSource (#2953)
- Add Grpc-specific Kryo serializers (fix #2963) (#2964)
- Document bigquery Query type (#2950)
- Remove unused Coder and ClassTag (#2934)
- Remove unused declared vals
- New method mapKeys for pair SCollections (#2922)
- Use scala 2.13 as default (#2927)
- New inLatePane matcher for SCollection (#2921)
- Add java.sql.Timestamp coder (#2907)
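Scala 2.13 added `groupMap` and `groupMapReduce` to the standard collections, and the new SCollection operations appear to be named after them. The sketch below shows only the standard-collection semantics on a plain `List`, not Scio. It assumes that the SCollection variants behave analogously per key.

```scala
object GroupMapSemantics {
  val fruits = List("apple", "avocado", "banana", "blueberry", "cherry")

  // groupMap(key)(f): group by first letter, keep each fruit's length.
  val lengthsByLetter: Map[Char, List[Int]] = fruits.groupMap(_.head)(_.length)

  // groupMapReduce(key)(f)(reduce): same grouping and mapping, then sum per key.
  val totalLengthByLetter: Map[Char, Int] = fruits.groupMapReduce(_.head)(_.length)(_ + _)

  def main(args: Array[String]): Unit = {
    println(lengthsByLetter)     // lengths grouped by first letter
    println(totalLengthByLetter) // summed lengths per first letter
  }
}
```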
Bug Fixes
- Use scalaVersion semantic selector (#3022)
- Add -release 8 scalac option (#3006)
- Fix scio-extra BigQuery IT test
- Use base64 encoding (#3003)
- Use BigQuery client project consistently (#2995)
- Propagate targetParallelism from SortedBucketIO (#2997)
- Remove mercator override (#2992)
- Fixup Grpc Kryo serializers (#2967)
- Revert "Fix: generate tree eagerly before checking for private constructors (#2846)" (#2943)
- Fix tcnative and gcsio dep conflicts (#2948)
- Rename BigQuery Beam Schema based op (#2932)
- Fix creation of local sharded Sparkey files (#2930)
- Don't require a List for Bigtable admin operations. (#2931)
- Don't write SMB metadata if prior steps fail (fix #2895) (#2899)
- Fix dependency conflicts (#2924)
- Fix docs public datasets
- Fix circe deprecation warn (#2910)
- Fix: Parser with custom Formatter not always in scope (#2909)
- Fix ScalaMatcher serialization in isEqualTo (#2904)
- Fix: temp table location changed back to US (#2905)
- Fix deprecation warns (#2966)
Dependency Updates
- Update kantan.csv to 0.6.1 (#3016)
- Update caffeine to 2.8.4 (#2999)
- Update magnolify-avro, magnolify-bigtable, ... to 0.2.0 (#2986)
- Update caffeine to 2.8.3 (#2983)
- Update elasticsearch 6/7 (#2974) (#2970)
- Update auto-value, auto-value-annotations to 1.7.2 (#2973)
- Update auto-service to 1.0-rc7 (#2972)
- Update scala to 2.13.2 (#2951)
- Update scalactic to 3.1.2 (#2959)
- Update scalatest to 3.1.2 (#2961)
- Update algebird-core, algebird-test to 0.13.7 (#2954)
- Update auto-value, auto-value-annotations to 1.7.1 (#2940)
- Update jackson-annotations, jackson-core, ... to 2.10.4 (#2938) (#2939)
- Update magnolia to 0.16.0 (#2923)
- Update cassandra-driver-core to 3.9.0 (#2925)
- Update mysql-socket-factory to 1.0.16 (#2911)
- Update mysql-connector-java to 8.0.20 (#2918)
- Update joda-time to 2.10.6 (#2912)
- Update featran-core, featran-scio, ... to 0.6.0 (#2896)
- Update scala-collection-compat to 2.1.6 (#2894)
v0.9.0
"Furnunculus"
Breaking changes
- See v0.9.0 Migration Guide for detailed instructions
- Remove deprecated elasticsearch2 (#2800)
- Remove deprecated cassandra2 (#2801)
- Remove deprecated tensorflow saveAsTfExampleFile (#2798)
- Remove toEither from ScioUtil (#2799)
- Remove ReflectiveRecordIO (#2856)
- Remove context close in favor of run (#2858)
- Remove deprecated ScioContext Future references (#2859)
- Rework implicits/syntax for scio-extra bigquery package (#2844)
- Remove implicit Coder requirement for .saveAsSortedBucket (#2839)
- Drop scala 2.11 support (#2619)
- Re-vamp Bloom filter and sparse-transforms (#2651)
- Remove deprecated bigQuery, typedBigQuery and saveAsBigQuery (#2806)
Improvements
- Add scala 2.13 support (#2619)
- Add queryAsSource to BigQueryType (#2804)
- Deprecate BigQueryType query in favor of queryRaw (#2857)
- Make OptionCoder extends from AtomicCoder (#2882)
- Make iterable and traversable coders buffered (#2881)
- Better support for alternative runners in tests (#2877)
- Use UUID in SMB temp directory (#2849)
- Reuse ApproxFilter (#2817)
- Support metadata in AvroFileOperations (fix #2832) (#2834)
- Add --help command line support for custom PipelineOptions (#2840) (#2843)
- Add covary method to lift SCollection to the specified type (#2808)
- Customize equality in unit tests and better failure message (#2733)
- Add more convenience methods that support default transform names (#2805)
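`covary` matters because a container that is invariant in its element type is not automatically a container of a supertype. As a result, two collections of different subtypes cannot be combined directly. The sketch below models this with a hypothetical invariant `Container` type. The `covary[U >: T]` signature is an assumption based on the description "lift SCollection to the specified type", not a copy of Scio's code.

```scala
object CovaryDemo {
  sealed trait Shape { def area: Double }
  final case class Square(side: Double) extends Shape { def area: Double = side * side }
  final case class Circle(radius: Double) extends Shape { def area: Double = math.Pi * radius * radius }

  /** Hypothetical invariant container standing in for an invariant SCollection[T]. */
  final class Container[T](val items: List[T]) {
    // Widen the element type to a supertype U. This is safe here because
    // List is covariant, so a List[T] is already a List[U].
    def covary[U >: T]: Container[U] = new Container[U](items)

    // Requires both sides to share exactly one element type.
    def union(that: Container[T]): Container[T] = new Container(items ++ that.items)
  }

  val squares: Container[Square] = new Container(List(Square(1.0), Square(2.0)))
  val circles: Container[Circle] = new Container(List(Circle(1.0)))

  // squares.union(circles) does not compile because Container is invariant.
  val shapes: Container[Shape] = squares.covary[Shape].union(circles.covary[Shape])
}
```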
Bug Fixes
- Fix: create scio-spanner IT clients lazily (#2889)
- Fix generate tree eagerly before checking for private constructors (#2846)
- Fix missing-bucket case when Sink collection is empty (#2869)
- Remove unneeded caffeine dep in scio-bigquery (#2861)
- Fix Sharded Sparkey string hashing behaviour for strings longer than one character. (#2826)
- Add magnolify BigtableType usage examples to scio-examples #2789 (#2816)
- Check jobReference.location for query location (#2845)
- Fix NPE in BaseAsyncLookupDoFn.Try#hashCode() (#2841)
- Fix: cancel job on waitUntilFinish timeout (#2823)
- Fix: full camelCase typed args support (#2777)
Dependency Updates
- Update magnolify to 0.1.7
- Update magnolia to 0.14.5 (#2886)
- Update beam-runners-core-construction-java, ... to 2.20.0 (#2887)
- Update scala-collection-compat to 2.1.5 (#2885)
- Update gcs-connector to hadoop2-2.1.2 (#2842)
- Update algebra to 2.0.1 (#2821)
- Update es6 transport to 6.8.8 (#2830)
- Update es7 elasticsearch-rest-client, ... to 7.6.2 (#2829)
- Update cats-kernel to 2.1.1 (#2822)
- Update PPrint to 0.5.9 (#2793)