v3.0.0

@cmgrote cmgrote released this 17 Nov 21:16

🎉 New features

  • Adds off-heap caching to reduce memory footprint by @cmgrote in #984
  • Parses precision, scale, max length and raw definition from SQL dataType by @cmgrote in #1006
  • Adds new connector types by @Aryamanz29 in #1005

🧪 Experimental

⛑️ Breaking changes

  • Previously, the SDK and custom package toolkits used 2-3 separate approaches to caching, all of which relied purely on on-heap memory. In this release the caches have been revamped:
    • to be more consistent (same operations across caches)
    • to move them off-heap (with disk spilling)
  • We have tried to minimize the impact of this change, but if you are using these caches directly (or related custom package runtime toolkit functionality) you may be impacted. See the end of the release notes for further details.
  • The custom package testing toolkit's interface has also been changed slightly:
    • The PackageTest class now requires a "tag" parameter, which is used in all generated names for that test to make them more immediately identifiable.
    • Rather than calling the setup() method with your custom package's configuration and then directly invoking your custom package's code (typically via its main() method), you should now use the runCustomPackage() method, which takes both the configuration and your custom package's execution method as parameters. This ensures your custom package is executed with appropriate isolation from any others that may be tested in parallel.

🐞 Bug fixes

  • Fixes an issue in asset batching related to case-insensitivity handling by @ErnestoLoma in #990

📦 Packages

  • Fixes a dependency issue on Google Cloud by @ErnestoLoma in #974
  • Fixes container name used on Azure by @ErnestoLoma in #1013
  • Passes through failOnErrors in relational assets builder, to allow continuation beyond failures by @cmgrote in #994
  • Asset import improvements by @cmgrote in #1014
  • Extends off-heap caching to all custom package asset caches by @cmgrote in #988
  • Fixes generalization-specialization relationship attribute names for data model assets by @Aryamanz29 in #975

🥗 QOL improvements

  • Extends off-heap caching to asset batch processing tracking by @cmgrote in #988

Full Changelog: v2.3.2...v3.0.0

Caching changes

As mentioned in the breaking changes, this release introduces a complete revamp of the way caching is done across both the SDK and custom package toolkits.

Moved off-heap

Previously the caching was done entirely in-memory, on-heap. This naturally limited its scale, to the point where we had begun to observe OutOfMemory errors in some cases. With this release, all caches are moved off-heap and managed by an embedded, persistent key-value store that still offers high performance. (Note that this does mean some temporary files will be created locally to manage the caches. These are created using Files.createTempDirectory(), so the specific location will depend on how the JVM manages this on your system.)
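
To check where such temporary files are likely to land on a given system, a minimal sketch along the following lines may help. It uses only the standard JDK: Files.createTempDirectory() with no explicit parent creates the directory under the default temporary-file directory, which the JVM takes from the java.io.tmpdir system property. The class name and the "cache-sketch-" prefix are made up for illustration; the prefix the SDK actually uses is not stated in these notes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempCacheLocation {

    /** Creates a temporary directory under the JVM's default temp location. */
    public static Path createCacheDir(String prefix) throws IOException {
        return Files.createTempDirectory(prefix);
    }

    public static void main(String[] args) throws IOException {
        // The default parent directory comes from this system property.
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));

        // Hypothetical prefix, used only for this illustration.
        Path dir = createCacheDir("cache-sketch-");
        System.out.println("created: " + dir);

        // Clean up the (empty) directory created above.
        Files.delete(dir);
    }
}
```

If the default location is unsuitable (for example, a small /tmp volume), the JVM-wide default can be redirected by starting Java with -Djava.io.tmpdir=/some/larger/volume.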

Consistency changes

We've consolidated 2-3 different interfaces for caching that had evolved independently during the lifecycle of the SDK and package toolkit to date, resolving that tech debt so that all can directly benefit from the move to off-heap.

  • Each cache now manages the following for each object in it:
    • an ID (always a UUID)
    • a human-readable name
    • an optional secondary ID (such as the hashed-string internal representation for tags and custom metadata)
    • the actual object that was cached
  • Retrieval now uses these options almost exclusively, for consistency. For example, instead of .getCustomMetadataDef() on the custom metadata cache, you would now just use .getByName().
  • Prior operations like lookupAssetByIdentity and lookupAssetByGuid in some cache variations have been made consistent, to lookupByIdentity and lookupById.
  • Wherever possible, the caches are now implemented using Java generics, so they should return strong types without needing casts (rather than broad supertypes like Asset).
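
The shape described in the list above (a UUID, a name, an optional secondary ID and a strongly typed object) can be pictured with the following rough sketch. This is not the SDK's cache class: CacheShapeSketch, cache(), getById() and getBySecondaryId() are hypothetical names chosen for illustration (only getByName() is named in these notes), and the sketch stores everything in ordinary on-heap maps purely to show the lookup pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical illustration of the cache shape; not an SDK class. */
public class CacheShapeSketch<T> {
    private final Map<String, T> byId = new HashMap<>();
    private final Map<String, String> nameToId = new HashMap<>();
    private final Map<String, String> secondaryToId = new HashMap<>();

    /** Stores an object under a generated UUID, its name and an optional secondary ID. */
    public String cache(String name, String secondaryId, T object) {
        String id = UUID.randomUUID().toString();
        byId.put(id, object);
        nameToId.put(name, id);
        if (secondaryId != null) {
            secondaryToId.put(secondaryId, id);
        }
        return id;
    }

    public Optional<T> getById(String id) {
        return Optional.ofNullable(byId.get(id));
    }

    public Optional<T> getByName(String name) {
        return Optional.ofNullable(nameToId.get(name)).map(byId::get);
    }

    public Optional<T> getBySecondaryId(String secondaryId) {
        return Optional.ofNullable(secondaryToId.get(secondaryId)).map(byId::get);
    }

    public static void main(String[] args) {
        CacheShapeSketch<String> defs = new CacheShapeSketch<>();
        String id = defs.cache("PII", "abc123", "pii-definition");

        // Thanks to the type parameter, no cast is needed on retrieval.
        String byName = defs.getByName("PII").orElse(null);
        String byHash = defs.getBySecondaryId("abc123").orElse(null);
        System.out.println(id + " -> " + byName + " / " + byHash);
    }
}
```

The point of the type parameter is that a cache declared for one kind of object hands back that kind of object directly, which is the "strong types without casting" behaviour the list describes.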

Necessary interface changes

Various places that used to take collections (Collection<Asset>, List<Asset>, etc.) now use either the new OffHeapAssetCache or Stream<Asset> instead, so that processing no longer requires everything to be held in memory on the heap. All caches are now generally Closeable, too, so they can clean up any temporary files they use for persistence. The changes affect:

  • Utils.updateConnectionCache() in the custom package runtime toolkit (now uses OffHeapAssetCache)
  • PersistentConnectionCache.deleteAssets() and .addAssets() in the custom package runtime toolkit (now process Stream<Asset>)
  • AssetGenerator.cacheCreated() interface (now processes Stream<Asset>)
  • ImportResults.Details internal tracking of created, updated, restored and skipped assets (now use OffHeapAssetCache)
  • AssetRemover.deleteAssets() in the custom package runtime toolkit (now uses OffHeapAssetCache)
  • DeltaProcessor.run() in the custom package runtime toolkit (now uses OffHeapAssetCache)
  • AtlanClient itself is now Closeable, so it can clean up its own managed caches
  • AssetBatch and ParallelBatch no longer offer fluent-builder creation, only overloaded constructors. Both now track their processing of created, updated, restored and skipped assets in OffHeapAssetCaches, and both are now Closeable so they can clean up their own managed caches.
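
In practice, the Closeable and Stream-based changes listed above suggest wrapping these objects in try-with-resources and feeding them streams rather than fully materialized collections. The sketch below illustrates that pattern with a stand-in class only: CloseableCacheSketch, addAll(), size() and directory() are hypothetical and are not SDK APIs, and plain strings stand in for assets. The same try-with-resources shape should apply to anything the notes describe as Closeable, such as AtlanClient or AssetBatch, though their constructors and methods are not shown here.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.stream.Stream;

/** Hypothetical stand-in for a cache that owns temporary files; not an SDK class. */
public class CloseableCacheSketch implements Closeable {
    private final Path dir;
    private final Path data;
    private long count = 0;

    public CloseableCacheSketch() throws IOException {
        dir = Files.createTempDirectory("closeable-sketch-");
        data = Files.createFile(dir.resolve("entries.txt"));
    }

    /** Consumes the stream one element at a time, appending each entry to disk. */
    public void addAll(Stream<String> entries) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(data, StandardOpenOption.APPEND)) {
            Iterator<String> it = entries.iterator();
            while (it.hasNext()) {
                writer.write(it.next());
                writer.newLine();
                count++;
            }
        }
    }

    public long size() {
        return count;
    }

    public Path directory() {
        return dir;
    }

    /** Removes the temporary file and directory this object created. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(data);
        Files.deleteIfExists(dir);
    }

    /** Uses the cache inside try-with-resources and returns where its files lived. */
    public static Path demo() throws IOException {
        try (CloseableCacheSketch cache = new CloseableCacheSketch()) {
            cache.addAll(Stream.of("asset-1", "asset-2", "asset-3"));
            System.out.println("cached " + cache.size() + " entries in " + cache.directory());
            return cache.directory();
        } // close() runs here, even if an exception was thrown above
    }

    public static void main(String[] args) throws IOException {
        Path dir = demo();
        System.out.println("directory exists after close: " + Files.exists(dir));
    }
}
```

Because cleanup is tied to close(), forgetting to close such an object would leave its temporary files behind until something else removes them, which is why try-with-resources is the safer default.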