
Reduce memory footprint of evaluated values in LazyList #10937

Status: Open. Wants to merge 9 commits into base: 2.13.x

Conversation

@lrytz (Member) commented Nov 25, 2024

The LazyList implementation uses two objects per evaluated value,
a LazyList and one State instance. So the overhead for evaluated
values is 2x compared to Stream.

This PR reduces the overhead to one LazyList instance with two
fields (head and tail) per evaluated value.

Thread safety is implemented by checking the volatile head
field and synchronizing on this for initialization. This is
equivalent to the previous implementation, which uses a lazy val.

Breaking changes:

Serialization is not compatible. Non-evaluated (tails of) streams are
serialized using plain object serialization. The internals are
different and serialization is incompatible in both directions.

toString of cyclic streams displays one more element: <cycle>
is only shown starting from the second element. For example,
in val x: LL[Int] = 1 #:: 2 #:: x, the x.tail.tail instance
is different from x; only the next tail is a shared instance.
Before, the cycle could be detected earlier because the State
instance of x and x.tail.tail was the same, even when the
LazyList instances were different.

Fixes scala/bug#13062
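A minimal sketch of this encoding in Java (mirroring the lazy val desugaring quoted below in the thread; LazyCell, Pair, and UNDEFINED are invented names for illustration, not the PR's actual internals):

```java
import java.util.function.Supplier;

// Sketch of the single-object encoding: one cell stores head and tail
// directly; the volatile head field doubles as the "initialized" flag,
// giving the same double-checked locking a Scala lazy val desugars to.
// All names here are invented for illustration.
final class LazyCell {
    private static final Object UNDEFINED = new Object();

    static final class Pair {
        final Object head;
        final LazyCell tail;
        Pair(Object head, LazyCell tail) { this.head = head; this.tail = tail; }
    }

    private volatile Object head = UNDEFINED;
    private LazyCell tail;        // plain field, published by the volatile head write
    private Supplier<Pair> thunk; // nulled out once evaluated

    LazyCell(Supplier<Pair> thunk) { this.thunk = thunk; }

    private void force() {
        if (head == UNDEFINED) {
            synchronized (this) {
                if (head == UNDEFINED) {
                    Pair p = thunk.get();
                    tail = p.tail;  // plain writes first...
                    thunk = null;   // ...and let the thunk be collected
                    head = p.head;  // volatile write last acts as the release fence
                }
            }
        }
    }

    Object head() { force(); return head; }
    LazyCell tail() { force(); return tail; }
}
```

A reader that observes head != UNDEFINED is guaranteed by the happens-before rule for volatile fields to also see the tail written just before it.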

@lrytz (Member, Author) commented Nov 25, 2024

JOL output before this change:

scala.collection.immutable.LazyList@2d2e5f00d footprint:
     COUNT       AVG       SUM   DESCRIPTION
       100        16      1600   java.lang.Object
       101        24      2424   scala.collection.immutable.LazyList
       100        24      2400   scala.collection.immutable.LazyList$State$Cons
         1        16        16   scala.collection.immutable.LazyList$State$Empty$
       302                6440   (total)

After:

scala.collection.immutable.LazyList@2ea6137d footprint:
     COUNT       AVG       SUM   DESCRIPTION
       100        16      1600   java.lang.Object
       101        24      2424   scala.collection.immutable.LazyList
       201                4024   (total)

@sjrd (Member) commented Nov 25, 2024

That looks like a reasonable encoding. However it's going to be tricky to make thread-safe while staying portable for the non-JVM platforms. The straightforward approach would be to use CAS'es on the fields, but that's not supported on JS/Native other than through ju.concurrent.AtomicX classes. Using the latter would of course defeat the purpose of the whole PR.

@He-Pin (Contributor) commented Nov 25, 2024

Do other platforms support AtomicXFieldUpdater? That would help save more bytes.

@sjrd (Member) commented Nov 25, 2024

No, they don't. These classes are also reflection-based.

@Ichoran (Contributor) commented Nov 25, 2024

Is space or runtime performance more important for LazyList? If space isn't clearly more important, we'd better have a benchmark before/after also. synchronized is an obvious way to fix the thread safety; it's just slower under some scenarios.

@SethTisue SethTisue requested a review from NthPortal November 25, 2024 21:22
@NthPortal (Contributor) commented

I'll try to have a look at this. I've had another idea for a while: boxing all the evaluation into its own class, and then just swapping the val from the box to the result after the evaluation is done, which I believe sidesteps the concurrency issues.
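One way the boxing idea could look, as a rough Java sketch (BoxedLazy and Box are invented names; it assumes the computed result is never itself a Box):

```java
import java.util.function.Supplier;

// Hypothetical sketch of the "box the evaluation" idea: a single field
// starts out holding the evaluator box and is swapped to the finished
// result exactly once, so a reader sees either the box or a fully
// constructed value, never partially initialized state.
final class BoxedLazy {
    private static final class Box {
        final Supplier<Object> compute;
        Box(Supplier<Object> compute) { this.compute = compute; }
    }

    private volatile Object contents; // Box before evaluation, result after

    BoxedLazy(Supplier<Object> compute) { this.contents = new Box(compute); }

    Object get() {
        Object c = contents;
        if (!(c instanceof Box)) return c;  // fast path: already evaluated
        synchronized (this) {
            Object c2 = contents;
            if (!(c2 instanceof Box)) return c2;
            Object v = ((Box) c2).compute.get();
            contents = v;                   // single swap: box -> result
            return v;
        }
    }
}
```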

@lrytz (Member, Author) commented Nov 25, 2024

👍 I also think this can probably be cleaned up further. def state now creates a new State.Cons on each invocation, which is unnecessary. I just did the minimal change to see what can be done.

@lrytz lrytz modified the milestones: 2.13.16, 2.13.17 Nov 26, 2024
@SethTisue SethTisue added the library:collections and performance labels Nov 27, 2024
@NthPortal (Contributor) commented

@lrytz I haven't fixed this enough for it to compile, but thoughts on this design? https://github.com/scala/scala/compare/2.13.x...NthPortal:scala:ll-memory/R1?expand=1

@lrytz (Member, Author) commented Nov 28, 2024

@NthPortal I see, the idea is using a lazy val in the Evaluator to implement thread safety, and make sure that multiple threads assigning to _head / _tail is idempotent.

I'm not sure which one is easier, your design or implementing thread safety manually.

Inspired by the desugaring for class C { lazy val x = 42 }:

public class C {
    private int x;
    private volatile boolean bitmap$0;

    private int x$lzycompute() {
        C c = this;
        synchronized (c) {
            if (!this.bitmap$0) {
                this.x = 42;
                this.bitmap$0 = true;
            }
        }
        return this.x;
    }

    public int x() {
        if (!this.bitmap$0) {
            return this.x$lzycompute();
        }
        return this.x;
    }
}

We should be able to do it as:

class LazyList {
  @volatile var _head = Undefined
  var _state = lazyState

  def state =
    if (_head ne Undefined) sCons(_head, _state)
    else {
      synchronized {
        if (_head eq Undefined) {
          // evaluate lazy state
          _state = ...
          _head = ...
        }
      }
      state
    }
}

I pushed that, wdyt? I would need to brush up my java memory model knowledge to be sure. If a thread observes _head ne Undefined, is it guaranteed to also see the changed _state? It seems so from the lazy val desugaring, but I don't have the knowledge to explain it.
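For what it's worth, the guarantee in question can be illustrated (not proven) with a small Java sketch of the same publication pattern; Publication, state, and head below are illustrative stand-ins for the PR's _state and _head fields:

```java
// JMM illustration: a plain write made before a volatile write is visible
// to any thread whose volatile read observes that write (happens-before).
// This is exactly the pattern of writing _state before _head.
final class Publication {
    static int state = 0;                 // plain field (stands in for _state)
    static volatile boolean head = false; // volatile field (stands in for _head)

    static int demo() {
        Thread writer = new Thread(() -> {
            state = 42;  // plain write first...
            head = true; // ...then the volatile write publishes it
        });
        writer.start();
        while (!head) { /* spin on the volatile read */ }
        // This thread's volatile read saw the volatile write, so the plain
        // write to state that preceded it must be visible here as well.
        try { writer.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return state; // guaranteed 42 under the Java memory model
    }
}
```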

@Ichoran (Contributor) commented Nov 28, 2024

Aren't you creating a new object on every access now? Are we sure this is a win?

@NthPortal (Contributor) commented

@lrytz I'm pretty sure you're correct that setting @volatile var _head second acts as a fence for _state. I think my biggest issue is that it no longer detects self-reference, which means you get a stack overflow instead, which is bad UX. I do like the idea of less boxing, though. On the other hand, this makes it more difficult (maybe impossible?) to share the state when prepending without double-evaluating.

@lrytz lrytz force-pushed the t13062 branch 2 times, most recently from 5fc4205 to 799ff04 Compare November 29, 2024 12:03
@lrytz (Member, Author) commented Nov 29, 2024

Aren't you creating a new object on every access now? Are we sure this is a win?

@Ichoran I changed this now (see commit "remove State"). WDYT?


it no longer detects self-reference, which means that you get a stack overflow instead, which is bad UX

@NthPortal could you explain this in more detail? I don't know what you mean. The selfReferentialFailure JUnit test passes.

I adjusted the tests for toString of cyclic lazy lists. The underlying reason:

scala> lazy val ll: LazyList[Int] = 1 #:: 2 #:: ll
lazy val ll: LazyList[Int] // unevaluated

scala> ll.tail.tail eq ll
val res0: Boolean = false

scala> ll.tail.tail.tail eq ll.tail
val res1: Boolean = true

Up to here it's the same before and after my change. But ll.toString is now different: before, LazyList(1, 2, <cycle>); after, LazyList(1, 2, 1, <cycle>).

Before the change, ll.tail.tail.state eq ll.state is true:

import scala.util.chaining._
def getMethodAccessible[A: reflect.ClassTag](name: String): java.lang.reflect.Method =
  implicitly[reflect.ClassTag[A]]
    .runtimeClass.getDeclaredMethods
    .find(_.getName == name).get.tap(_.setAccessible(true))

val state = getMethodAccessible[LazyList[_]]("scala$collection$immutable$LazyList$$state")

scala> state.invoke(ll.tail.tail) eq state.invoke(ll)
val res6: Boolean = true

But we no longer have the state objects, so cycles are detected one element later.
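The one-element delay can be reproduced with a standalone Java sketch (CycleDemo and Cell are invented names, not the library's actual addStringNoForce): detect the cycle by reference identity of the cells, with the third cell distinct from the first, as in the forced ll above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustration of cycle detection by reference identity: because the cell
// holding the third element is a distinct instance (only the *next* tail
// is shared), the cycle-start element is printed once more before the
// revisited cell is recognized.
final class CycleDemo {
    static final class Cell {
        final int head;
        Cell tail;
        Cell(int head) { this.head = head; }
    }

    static String render(Cell xs) {
        Set<Cell> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        StringBuilder sb = new StringBuilder("LazyList(");
        Cell cur = xs;
        boolean first = true;
        while (cur != null && seen.add(cur)) { // add() is false on a revisit
            if (!first) sb.append(", ");
            sb.append(cur.head);
            first = false;
            cur = cur.tail;
        }
        if (cur != null) sb.append(", <cycle>");
        return sb.append(")").toString();
    }
}
```

Building the same shape as ll (cells 1 -> 2 -> 1, with the third cell's tail pointing back at the second) renders "LazyList(1, 2, 1, <cycle>)".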


Serialization is actually not compatible, because un-evaluated lazy lists are serialized using plain object serialization. I don't think we can work around this. But we don't guarantee serialization compatibility between releases, so 🤷‍♂️.

@Ichoran (Contributor) commented Nov 29, 2024

@lrytz - I guess this is about the best that can be hoped for given the binary compatibility requirements. There is the one extra creation of a LazyList that you destructure and throw away, but reworking everything to use a mutating visitor instead of a regular lambda is probably (1) hard to get right and (2) hard to make binary compatible (because you inherently need visibility into structure that used to not exist).

So I think the performance overhead, if it's an issue, would be the only thing remaining (not counting correctness, of course; I can't eyeball that, but there are a fair number of unit tests, plus some collection-laws).

I'm pretty sure if two different threads are building up erroneously mutually referential lists, you can get a deadlock instead of an error condition here. Maybe we don't care.

If thread 1 is trying to get the next item in xs, and to do so it starts traversing ys; but meanwhile, thread 2 is trying to extend ys and to do that it's looking into xs; you can have thread 1 enter the synchronized block of the next uninitialized xs, then run down ys and wait for ys's synch lock to be free. But thread 2 has it, runs down xs, hits the sync block of xs...and...no more progress can be made.

Now, this is an error condition. If ys(i) depends on xs(j) and xs(j) depends on ys(i), there's no way to perform the initialization. But you won't get the friendly "hey, re-entrancy means you messed up" exception; you won't even get a stack overflow.

I don't know if robustness to this kind of mess-up is big enough to worry about. To do it, you'd need to drop synchronization while executing the lambda, then try to pick it up again. This would allow one (and eventually both) threads to crash out with exceptions. But under contention, this might increase the cost considerably. Maybe with the volatile var guard, it wouldn't actually matter much. But I wouldn't bet on it without a test, and it's really hard to microbenchmark multi-threaded code.

@lrytz (Member, Author) commented Dec 2, 2024

@Ichoran you're right that a lot of short lived objects are allocated while evaluating lazy lists.

For example

  • LazyList.from(1).take(N).toList allocates N*4 LazyList instances during evaluation
  • LazyList.from(1).take(N+1).take(N).toList allocates N*6
  • (1 #:: 2 ... #:: N #:: LazyList.empty).toList allocates N*3

There are the Function0 allocations on top of that (though not one for every LazyList allocation).

I compared the state of this PR with the current LazyList; it's exactly the same (when adding up LazyList and State instances in the current implementation).

Maybe that situation can also be improved..? It doesn't seem to hold up this PR though.

Comparing allocation to Stream:

  • Stream.from(1).take(N).toList allocates N*2 Stream instances
  • Stream.from(1).take(N+1).take(N).toList allocates N*3
  • (1 #:: 2 ... #:: N #:: Stream.empty).toList allocates N

For the deadlock scenario, I think it's the same with the current LazyList implementation, which uses val state: State[A] = { ... lazyState() ... }, no?

@Ichoran (Contributor) commented Dec 2, 2024

@lrytz - I think the deadlock situation is different because lazy val doesn't lock, does it? It uses a CAS-like mechanism, right? (Multiple attempts are cool; whoever finishes first wins and gets to set the cached value.) If that's correct, I think the lambda goes through, tries to initialize the lazy val a second time, and is caught by the re-entrant detection code.

Regarding the temporary object overhead--yeah, I wasn't thinking it was particularly worse. Just that it's a shame, in general. But no reason to fix it with this PR; the memory savings are the key part.

@lrytz (Member, Author) commented Dec 2, 2024

@Ichoran lazy vals are evaluated at most once; I posted the desugaring in a comment above: #10937 (comment)

@Ichoran (Contributor) commented Dec 2, 2024

@lrytz - Oh, oops. Then I agree: the deadlock issue is the same.

@lrytz (Member, Author) commented Dec 2, 2024

lazy vals are evaluated at most once

modulo exceptions, that is...

scala> lazy val x = { println("x"); throw new RuntimeException(); 1 }

scala> x
x
java.lang.RuntimeException

scala> x
x
java.lang.RuntimeException

@Ichoran (Contributor) commented Dec 2, 2024

@lrytz - Yes, that was my mistake: I was incorrectly inferring the general behavior from the behavior with exceptions.

@NthPortal (Contributor) commented Dec 4, 2024

could you explain this in more detail, I don't know what you mean. The selfReferentialFailure JUnit test passes.

@lrytz my comment was about your code snippet that didn't have MidEvaluation, so it's moot. I was concerned about the following:

lazy val ll: LazyList[Int] = 1 #:: ll.drop(1)
ll.tail.head // oops

@lrytz lrytz force-pushed the t13062 branch 3 times, most recently from 250431f to 770b377 Compare December 4, 2024 15:39
@NthPortal (Contributor) left a review:

I love this—thank you for all the work you've put into this Lukas! 💜

I just have a few nitpicks, some bikeshedding, and a single stack safety question that I'm very unsure about, but I'd be in no way unhappy to see it merged as is

Review threads (outdated, resolved) on:

  • src/library/scala/collection/immutable/SeqMap.scala
  • src/library/scala/collection/immutable/LazyList.scala (2 threads)
  • test/junit/scala/collection/immutable/LazyListTest.scala (3 threads)
@NthPortal (Contributor) commented

hey Lukas, have you signed the CLA? the bot isn't sure 😜

Commit: "Also simplify a couple of operation implementations and align the helper method names with the current design."
@lrytz (Member, Author) commented Dec 5, 2024

Thanks a lot for the sharp-eyed (and -brained) review @NthPortal!

@NthPortal (Contributor) left a review:

this popped into my mind late last night (and thankfully I remembered it in the morning), and it might be a blocker =/

@lrytz lrytz force-pushed the t13062 branch 3 times, most recently from cc7fe7a to 58e886e Compare December 10, 2024 20:31
@@ -349,13 +396,13 @@ class LazyListTest {

// Check forced cyclic and non-cyclic streams
assertEquals("LazyList(-2, -1)", pre(2).force.toString)
assertEquals("LazyList(0, 1, <cycle>)", cyc(2).force.toString)
assertEquals("LazyList(-2, -1, 0, 1, <cycle>)", precyc(2,2).force.toString)
assertEquals("LazyList(0, 1, 0, <cycle>)", cyc(2).force.toString)
A contributor commented on this diff:

it occurred to me that we could almost certainly prevent the duplicate element by buffering the last element in a var and appending it "late" (and skipping that append if we detect a cycle)
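A rough Java sketch of that suggestion (BufferedRender and Cell are invented names, not the actual addStringNoForce; it assumes, as in this PR, that the revisited instance is reached one element after the cycle's first element, so the buffered element is exactly the duplicate to skip):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Sketch of "buffer the last element and append it late": each element is
// held back one step, and the held element is dropped when the next cell
// closes a cycle, so the cycle-start element is not printed twice.
final class BufferedRender {
    static final class Cell {
        final int head;
        Cell tail;
        Cell(int head) { this.head = head; }
    }

    static String render(Cell xs) {
        Set<Cell> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        StringBuilder sb = new StringBuilder("LazyList(");
        Cell cur = xs;
        Integer buffered = null; // last element, appended "late"
        boolean cycle = false;
        while (cur != null) {
            if (!seen.add(cur)) { cycle = true; break; }
            if (buffered != null) sb.append(buffered).append(", ");
            buffered = cur.head;
            cur = cur.tail;
        }
        if (cycle) sb.append("<cycle>");         // skip the buffered duplicate
        else if (buffered != null) sb.append(buffered);
        return sb.append(")").toString();
    }
}
```

On the cyclic shape from earlier in the thread (cells 1 -> 2 -> 1, third cell's tail pointing back at the second) this yields "LazyList(1, 2, <cycle>)" instead of "LazyList(1, 2, 1, <cycle>)", while non-cyclic lists render unchanged.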

@lrytz (Member, Author) replied:

Right, thank you for the idea. I cleaned up addStringNoForce and implemented your suggestion.

Successfully merging this pull request may close these issues.

LazyList uses 2x memory for evaluated values compared to Stream
7 participants