
Releases: typelevel/cats-effect

v3.3.2

30 Dec 18:31
2888202

This is the seventeenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This patch release focuses primarily on performance improvements in two major areas: blocking/interruptible and suspended fiber tracking.

In the former area, the Cats Effect fiber runtime has long had support for the scala.concurrent.blocking construct within any code which is scheduled on its worker threads. When such a block is hit, the runtime takes it as a signal that it is about to lose a functioning worker thread and thus spawns a new one, seamlessly putting it into rotation to ensure the pool is not starved by the current worker thread being blocked. This trick works very well, but wasn't particularly recommended in user code because the performance was worse than the native IO.blocking operation.

In this release, Vasil has changed the behavior of the pool to seamlessly shift worker state when a blocking section is hit, effectively morphing another thread into the exact state as the now-blocked thread. Additionally, spare threads constructed when blocking operations are hit are now cached for one minute before being cleaned up if still idle, ensuring that they're still around if a subsequent blocking operation is hit in short order.

These improvements, taken together, mean that scala.concurrent.blocking inside of delay is actually faster than the IO.blocking operation by a significant margin, meaning that we can reap immediate performance benefits by converting IO.blocking and IO.interruptible to use this native mechanism rather than an ancillary thread pool.
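As a rough illustration of the two styles discussed above, the sketch below wraps the same blocking call once with IO.blocking and once with scala.concurrent.blocking inside IO.delay. It assumes Cats Effect 3.3.2 or later on the JVM; Thread.sleep stands in for any genuinely blocking call, and the object and value names are hypothetical.

```scala
import cats.effect.{IO, IOApp}

object BlockingStyles extends IOApp.Simple {
  // Native Cats Effect operation for blocking side effects.
  val viaIOBlocking: IO[Int] =
    IO.blocking {
      Thread.sleep(10) // stand-in for a real blocking call
      1
    }

  // Standard library marker inside delay: the runtime takes this block as a
  // signal that the current worker thread is about to block.
  val viaScalaBlocking: IO[Int] =
    IO.delay {
      scala.concurrent.blocking {
        Thread.sleep(10)
        2
      }
    }

  // Run both concurrently and print their results.
  val run: IO[Unit] =
    viaIOBlocking.both(viaScalaBlocking).flatMap { case (a, b) =>
      IO.println(s"results: $a, $b")
    }
}
```

Since this release reimplements IO.blocking on top of the same native mechanism, both forms should now take the same path through the worker pool; IO.blocking simply states the intent more explicitly in the type of the effect.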

[Figure: blocking benchmark results, plotted on a log scale]

Please note that the above is plotted on a log scale to make it easier to see the relative differences in each scenario. For reference, the improvements in the "fine grained" benchmark represent the test running 141x faster! (not a percent sign) Blocking is still bad for throughput, but it's a lot less bad now. You can find all of these benchmarks in the repository.

As if that weren't enough, we've reimplemented the tracking mechanism for suspended fibers which underlies the new fiber dump feature introduced in 3.3.0. This feature was and is implemented using a thread-local, set-like data structure which maintains weak references to any suspended fiber. The weak references are necessary for two reasons. First, they ensure that a fiber which is suspended and whose callback is then "lost" can still be garbage collected normally. Second, they allow us to avoid the extra memory barriers associated with backtracking to the suspending thread when the fiber is resumed, making the whole mechanism significantly faster.

Unfortunately, this comes with a cost: these weak references must be examined and ultimately cleaned by the garbage collector, which means that we're effectively taking synchronous work out of the main code path and moving it asynchronously into the garbage collector. This in turn can mean that certain types of workflows which already put significant pressure on the GC may have seen diminished performance with the update to 3.3.0.

This release significantly reduces the GC overhead by simplifying and specializing the data structure to reduce the number of weak references and allocations involved in the tracking itself. The results should be unnoticeable in most optimized workloads, but for applications which are creating a significant amount of short-lived objects within their hot path, these changes should produce a substantial speed-up relative to 3.3.1.
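To make the weak-reference idea described above concrete, here is a deliberately simplified, hypothetical sketch using only the Java standard library. It is not the data structure Cats Effect uses (which, per the notes above, was specialized in this release to cut down on weak references and allocations); it only shows the shape of the technique: a per-thread set that holds fibers weakly, is written on the suspending thread, and is never touched on resume. Every WeakHashMap entry is backed by a weak reference, which is exactly the kind of object the garbage collector must process. The object name and the use of AnyRef in place of a real fiber type are assumptions.

```scala
import java.util.{Collections, WeakHashMap}
import java.util.{Set => JSet}

// Hypothetical, simplified sketch: NOT Cats Effect's actual implementation.
object SuspendedFiberTracker {
  // One set per thread. Entries are held weakly, so a fiber that becomes
  // unreachable (e.g. its callback was lost) can still be garbage collected.
  private val perThread: ThreadLocal[JSet[AnyRef]] =
    new ThreadLocal[JSet[AnyRef]] {
      override def initialValue(): JSet[AnyRef] =
        Collections.newSetFromMap(new WeakHashMap[AnyRef, java.lang.Boolean]())
    }

  // Called on the suspending thread only; resuming never writes to this set,
  // so no cross-thread coordination is needed on the resume path.
  def monitor(fiber: AnyRef): Unit = {
    perThread.get().add(fiber)
    ()
  }

  // Snapshot of the fibers still tracked by the current thread. This sketch
  // does not try to distinguish resumed fibers from suspended ones.
  def snapshot(): List[AnyRef] = perThread.get().toArray.toList
}
```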

User-Facing Pull Requests

Thank you so much!

v3.3.1

20 Dec 17:41
4c281f0

This is the sixteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release contains bug fixes and performance enhancements to tracing and some other areas. Both enhanced exceptions and fiber dumps should now behave better on the latest versions of the JVM (which use a different top-level package identifier). Tracing on Scala.js can now be fully disabled if its overhead is too high, whereas previously some bookkeeping was retained even when tracing was configured to off.

Most significantly, Resource.uncancelable previously contained a significant bug in which error states were rewritten within the block. This ultimately stemmed from a limitation in the original Resource API with respect to full representation of outcomes, and it indirectly impacted all use of Async[Resource]. Thanks to the efforts of @TimWSpence, these bugs have now been completely squashed with the addition of a new Resource interpreter: allocatedCase (originally proposed by @kubukoz). In the interest of maintaining forward-compatibility, this function is currently marked as package-private, but will be marked as public in Cats Effect 3.4.0.

User-Facing Pull Requests

Heartfelt thanks; you are all amazing!

v3.3.0

28 Nov 05:32
9c8af7d

This is the fifteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas (detailed below). Scalafixes are available and should be automatically applied by Scala Steward if relevant.

The theme of this release has been improving observability, testability, and debuggability of all applications using Cats Effect 3. This has resulted in a massive set of new functionality, tweaks, and improvements which make 3.3.0 the most significant Cats Effect release ever apart from 3.0.0 itself. The developer experience has been significantly improved, particularly in tricky areas such as diagnosing deadlocks and deterministically testing functionality involving timers and clocks. Additionally, new functionality has been brought to Scala.js, including full support for tracing!

Finally, we took the opportunity to continue to build on IO's best-in-class performance with significant improvements to the fiber runtime (including dynamic workload fairness self-tuning) and continued dramatic slimming of the fiber memory footprint. Embarrassingly, we even discovered that the build was using an unnecessary Scala compiler flag which inhibited performance in some scenarios (particularly involving large numbers of fibers) by up to 15%! All of these improvements should add up into a very noticeable leap forward for application metrics in nearly all real-world scenarios.

Notable Changes

Fiber Dumps

One of the most annoying and difficult problems to resolve in any asynchronous application is the asynchronous deadlock. This scenario happens when you have a classic deadlock of some variety, where one fiber is waiting for a second fiber which is in turn waiting for the first (in the simplest case). Due to the asynchronous nature of the runtime, whenever a fiber blocks, all references to it are removed from the internal runtime, meaning that a deadlock generally leaves absolutely no residue whatsoever, and the only recourse as a developer is to just start sprinkling IO.println expressions around the code to see if you can figure out where it's getting stuck.

This is very much in contrast to a conventional deadlock in a synchronous runtime, where we have JVM-level tools such as thread dumps to suss out where things are stuck. In particular, thread dumps are a commonly applied low-level tool offered by the JVM which inform users of which threads are active at that point in time and what each of their call stacks looks like. This tool is generally quite useful, but it becomes even more useful when the threads are deadlocked: the call stacks show exactly where each thread is blocked, making it relatively simple to reconstruct what they are blocked on and thus how to untie the knot.

Fiber dumps are a similar construct for Cats Effect applications. Even better, you don't need to change anything in order to take advantage of this functionality. As a simple example, here is an application which trivially deadlocks:

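import cats.effect.{IO, IOApp}

object Deadlock extends IOApp.Simple {
  val run =
    for {
      latch <- IO.deferred[Unit]

      body = latch.get          // the child fiber waits for the latch...
      fiber <- body.start
      _ <- fiber.join           // ...while the main fiber waits for the child...

      _ <- latch.complete(())   // ...but the latch is only released here, never reached
    } yield ()
}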

The main fiber is waiting on fiber.join, which will only be completed once latch is released, which in turn will only happen on the main fiber after the child fiber completes. Thus, both fibers are deadlocked on each other. Prior to fiber dumps, this situation would be entirely invisible. Manually traversing the IO internals via a heap dump would be the only mechanism for gathering clues as to the problem, which is far from user-friendly and also generally fruitless.

As of Cats Effect 3.3.0, users can now simply trigger a fiber dump to get the following diagnostic output printed to standard error:

cats.effect.IOFiber@56824a14 WAITING
 ├ flatMap @ Deadlock$.$anonfun$run$2(Deadlock.scala:26)
 ├ flatMap @ Deadlock$.$anonfun$run$1(Deadlock.scala:25)
 ├ deferred @ Deadlock$.<clinit>(Deadlock.scala:22)
 ├ flatMap @ Deadlock$.<clinit>(Deadlock.scala:22)
 ╰ run$ @ Deadlock$.run(Deadlock.scala:19)
 
cats.effect.IOFiber@6194c61c WAITING
 ├ get @ Deadlock$.$anonfun$run$1(Deadlock.scala:24)
 ╰ get @ Deadlock$.$anonfun$run$1(Deadlock.scala:24)
 
Thread[io-compute-14,5,run-main-group-6] (#14): 0 enqueued
Thread[io-compute-12,5,run-main-group-6] (#12): 0 enqueued
Thread[io-compute-6,5,run-main-group-6] (#6): 0 enqueued
Thread[io-compute-5,5,run-main-group-6] (#5): 0 enqueued
Thread[io-compute-8,5,run-main-group-6] (#8): 0 enqueued
Thread[io-compute-9,5,run-main-group-6] (#9): 0 enqueued
Thread[io-compute-11,5,run-main-group-6] (#11): 0 enqueued
Thread[io-compute-7,5,run-main-group-6] (#7): 0 enqueued
Thread[io-compute-10,5,run-main-group-6] (#10): 0 enqueued
Thread[io-compute-4,5,run-main-group-6] (#4): 0 enqueued
Thread[io-compute-13,5,run-main-group-6] (#13): 0 enqueued
Thread[io-compute-0,5,run-main-group-6] (#0): 0 enqueued
Thread[io-compute-2,5,run-main-group-6] (#2): 0 enqueued
Thread[io-compute-3,5,run-main-group-6] (#3): 0 enqueued
Thread[io-compute-1,5,run-main-group-6] (#1): 0 enqueued
Thread[io-compute-15,5,run-main-group-6] (#15): 0 enqueued
 
Global: enqueued 0, foreign 0, waiting 2

A fiber dump prints every fiber known to the runtime, regardless of whether it is suspended, blocked, yielding, active on some foreign runtime (via evalOn), or actively running on a worker thread. You can see an example of a larger dump in this gist. Each fiber is given a stable unique hexadecimal ID and paired with its status as well as its current trace, making it extremely easy to identify problems such as our earlier deadlock: the first fiber is suspended at line 26 (fiber.join) while the second fiber is suspended at line 24 (latch.get). This gives us a very good idea of what's happening and how to fix it.
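Reading the dump, one possible fix for the Deadlock example is simply to release the latch before joining the child fiber. The sketch below assumes that this reordering matches the intended behavior (the child only needs the latch released before it can finish); the object name NoDeadlock is hypothetical.

```scala
import cats.effect.{IO, IOApp}

object NoDeadlock extends IOApp.Simple {
  // Returns true if the child fiber completed successfully.
  val program: IO[Boolean] =
    for {
      latch   <- IO.deferred[Unit]
      fiber   <- latch.get.start     // child waits for the latch
      _       <- latch.complete(())  // release the latch first...
      outcome <- fiber.join          // ...so the child can finish and join returns
    } yield outcome.isSuccess

  val run: IO[Unit] =
    program.flatMap(ok => IO.println(s"child completed successfully: $ok"))
}
```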

Note that most production applications have a lot of fibers at any point in time (millions and even tens of millions are possible even on consumer hardware), so the dump may be quite large. It's also worth noting that this is a statistical snapshot mechanism. The data it is aggregating is spread across multiple threads which may or may not have all published into main memory at a given point in time. Thus, it isn't necessarily an instantaneously consistent view of the runtime. Under some circumstances, trace information for a given fiber may be behind its actual position, or a fiber may be reported as being in one state (e.g. YIELDING) when in fact it is in a different one (e.g. WAITING). Under rare circumstances, newly-spawned fibers may be missed. These circumstances are considerably more common on ARM architectures than they are under x86 due to store order semantics.

Summary statistics for the global fiber runtime are printed following the fiber traces. In the above example, these statistics are relatively trivial, but in a real-world application this can give you an idea of where your fibers are being scheduled.

Triggering the above fiber dump is a matter of sending a POSIX signal to the process using the kill command. The exact signal depends on the JVM (and its version) and the operating system under which your application is running. Rather than attempting to hard-code all possible compatible signal configurations, Cats Effect simply attempts to register both INFO and USR1 (for JVM applications) or USR2 (for Node.js applications). In practice, INFO will most commonly be used on macOS and BSD, while USR1 is more common on Linux. Thus, run kill -INFO <pid> on macOS and kill -USR1 <pid> on Linux (or kill -USR2 <pid> for Node.js applications). POSIX signals do not exist on Windows (except under WSL, which behaves exactly like a normal Linux system), and thus the mechanism is disabled there.
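To try this out, you need the PID of a running Cats Effect application. The hypothetical sketch below prints its own PID and then parks forever, giving you a live process to signal from another terminal. It assumes Cats Effect 3.3.0 or later on the JVM and Java 9 or later (for ProcessHandle).

```scala
import cats.effect.{IO, IOApp}

object DumpTarget extends IOApp.Simple {
  // Look up this JVM's process ID (ProcessHandle requires Java 9+).
  val pid: IO[Long] = IO(ProcessHandle.current().pid())

  // Print the PID, then never terminate so there is a process to signal.
  val run: IO[Unit] =
    pid.flatMap(p => IO.println(s"send a signal to PID $p")) >> IO.never
}
```

With the application running, sending kill -USR1 <pid> on Linux or kill -INFO <pid> on macOS should print the dump to standard error.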

Since INFO is the signal used on macOS and BSD, this combined with a quirk of Apple's TTY implementation means that anyone running a Cats Effect application on macOS can simply hit Ctrl-T within the active application to trigger a fiber dump, similar to how you can use Ctrl-\ to trigger a thread dump. Note that this trick only works on macOS, since that is the only platform which maps a particular keybind to either the INFO or USR1 signals.

In the event that you're either running on a platform which doesn't support POSIX signals, or the signal registration failed for whatever reason, Cats Effect on the JVM will also automatically register an MBean under cats.effect.unsafe.metrics.LiveFiberSnapshotTriggerMBean which can produce a string representation of the fiber dump when its only method is invoked.
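You would normally invoke this MBean from a JMX client such as JConsole or VisualVM. For illustration, the hypothetical sketch below does it programmatically from inside the same JVM using only javax.management. It locates the MBean by the interface name given above (read from the standard "interfaceClassName" descriptor field) and, as an assumption, treats the MBean's only operation as taking no arguments. Because the return type is not stated in these notes, it handles both a String and an Array[String] result. The object and method names are hypothetical.

```scala
import java.lang.management.ManagementFactory
import javax.management.{MBeanServer, ObjectName}
import scala.util.Try

object FiberDumpViaJmx {
  // Interface name as given in the release notes.
  private val TriggerInterface =
    "cats.effect.unsafe.metrics.LiveFiberSnapshotTriggerMBean"

  // True if the MBean's descriptor reports the trigger interface.
  private def isTrigger(server: MBeanServer, name: ObjectName): Boolean =
    Try(server.getMBeanInfo(name).getDescriptor.getFieldValue("interfaceClassName"))
      .toOption
      .contains(TriggerInterface)

  // Returns the fiber dump from the first matching MBean in this JVM, if any.
  def dump(): Option[String] = {
    val server = ManagementFactory.getPlatformMBeanServer
    val names  = server.queryNames(null, null).toArray(Array.empty[ObjectName]).toList
    for {
      name <- names.find(n => isTrigger(server, n))
      // Assumption: the MBean exposes exactly one no-argument operation.
      op   <- server.getMBeanInfo(name).getOperations.headOption
    } yield {
      server.invoke(name, op.getName, Array.empty[AnyRef], Array.empty[String]) match {
        case lines: Array[String] => lines.mkString("\n")
        case other                => String.valueOf(other)
      }
    }
  }
}
```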

This entire mechanism has no performance impact (well, it probably would if you kept printing the dump in a loop, but don't do that). It is controlled by the same configuration as tracing.

And in case you were wondering, yes, it does work on Node.js applications!

cats.effect.IOFiber@d WAITING
 
cats.effect.IOFiber@9 WAITING
 ╰ deferred @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:22)
 
cats.effect.IOFiber@a WAITING
 ├ flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:26)
 ├ flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:25)
 ├ deferred @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala...

v3.2.9

18 Sep 05:16
4b42b0f

This is the fourteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.

The only change in this release (from 3.2.8) is a minor bugfix which affects Semaphore. In particular, under certain circumstances, fibers awaiting permits could end up indefinitely stuck awaiting wake-up. This could happen whenever multiple fibers were awaiting and the releasing fiber was canceled while notifying the awaiters.

User-Facing Pull Requests

Thank you so much!

v2.5.4

17 Sep 23:51
6900aee

This is the seventeenth major release in the Cats Effect 2.x lineage. It is fully binary compatible with all 2.x.y releases.

The primary change in this release is a new feature: tracing support for delay and defer! This is something that Cats Effect 3 has supported for some time now, but it was never supported in CE2 for various reasons. In implementing this feature, we also backported the thunk acquisition fix from Cats Effect 3, which works around some of the changes in Scala 3's encoding of by-name parameters.

User-Facing Pull Requests

Very special thanks to all of you!

v3.2.8

07 Sep 22:00
7dc4aaa

This is the thirteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.

This release reverts the changes to the priority queue implementation in cats.effect.std.PQueue. In line with other standard libraries, PQueue therefore no longer guarantees FIFO semantics when elements are tied in terms of priority.
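A small sketch of what this means in practice, assuming Cats Effect 3.2.8 or later: when the Order only looks at a priority field, elements with equal priority are ties, and their relative dequeue order is unspecified. The Job type and object names are hypothetical.

```scala
import cats.Order
import cats.effect.IO
import cats.effect.std.PQueue
import cats.syntax.all._

// Hypothetical job type; only `priority` participates in ordering.
final case class Job(priority: Int, name: String)

object Job {
  implicit val order: Order[Job] = Order.by[Job, Int](_.priority)
}

object PQueueTies {
  // Offer four jobs (two tied at priority 2), then take them all back.
  val drained: IO[List[Job]] =
    for {
      q   <- PQueue.bounded[IO, Job](16)
      _   <- List(Job(2, "b1"), Job(1, "a"), Job(2, "b2"), Job(3, "c")).traverse_(q.offer)
      out <- q.take.replicateA(4) // least element first, per the Order
    } yield out
}
```

The priorities come back in ascending order, but whether "b1" precedes "b2" is not guaranteed, so code should not rely on insertion order among ties.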

Furthermore, this release brings several bug fixes for corner cases, along with performance improvements to the Cats Effect runtime's support for detecting and guarding against scala.concurrent.blocking actions (such as calling Await.result on a scala.concurrent.Future, or calling unsafeRunSync() on the compute runtime).

User-Facing Pull Requests

We hope you enjoy this release. Thank you.

v3.2.7

04 Sep 02:45
5a79be7

This is the twelfth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.

User-Facing Pull Requests

  • #2298 PQueue respects FIFO for elements at the same priority (@SystemFw)

Thank you so much!

v3.2.6

17 Sep 23:21
3b73eb8

This release was made in error. It is identical to 3.2.5.

v3.2.5

28 Aug 20:09
670a184

This is the eleventh major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.

This release fixes a regression in the work-stealing fiber scheduler. In addition, it reverts a change to the Async#fromPromise and IO.fromPromise signatures which was source-incompatible in numerous common scenarios due to limitations in Scala's type inference.

User-Facing Pull Requests

Thank you!

v3.2.4

28 Aug 00:27

Update: This release introduced a regression in the work-stealing thread pool. This regression is fixed in 3.2.5, and it is strongly recommended that users not depend on 3.2.4.

This is the tenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.2.x release.

User-Facing Pull Requests

Thank you so very much, all of you!