Reduce memory footprint of evaluated values in LazyList #10937
base: 2.13.x
Conversation
JOL output before this change:
After:
That looks like a reasonable encoding. However it's going to be tricky to make thread-safe while staying portable for the non-JVM platforms. The straightforward approach would be to use CAS'es on the fields, but that's not supported on JS/Native other than through
Do other platforms support `AtomicXFieldUpdater`? That would help save more bytes.
No, they don't. These classes are also reflection-based.
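For reference, on the JVM this field-updater approach avoids allocating a wrapper object per node. A minimal sketch (the `Node` class and `initOnce` method are hypothetical, not from this PR):

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

public class Node {
    // One static updater shared by the whole class, instead of an
    // AtomicReference object per node: that's where the space saving comes from.
    private static final AtomicReferenceFieldUpdater<Node, Object> HEAD =
        AtomicReferenceFieldUpdater.newUpdater(Node.class, Object.class, "head");

    volatile Object head; // must be volatile for the updater to accept it

    Object initOnce(Object value) {
        // CAS from null to value; only the first caller's value sticks.
        HEAD.compareAndSet(this, null, value);
        return head;
    }

    public static void main(String[] args) {
        Node n = new Node();
        System.out.println(n.initOnce("first"));   // first
        System.out.println(n.initOnce("second"));  // first (already set)
    }
}
```

As noted above, these updater classes are reflection-based and unavailable on Scala.js and Scala Native, which is what rules this approach out for a portable collection.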
Is space or runtime performance more important for
I'll try to have a look at this. I've had another idea for a while: boxing all the evaluation into its own class, and then just swapping the val from the box to the result after the evaluation is done, which sidesteps the concurrency issues, I believe.
👍 I also think this can probably be cleaned up further.
@lrytz I haven't fixed this enough for it to compile, but thoughts on this design? https://github.com/scala/scala/compare/2.13.x...NthPortal:scala:ll-memory/R1?expand=1
@NthPortal I see, the idea is using a lazy val in the Evaluator to implement thread safety, and make sure that multiple threads assigning to … I'm not sure which one is easier, your design or implementing thread safety manually. Inspired by the desugaring for a `lazy val`:

```java
public class C {
    private int x;
    private volatile boolean bitmap$0;

    private int x$lzycompute() {
        C c = this;
        synchronized (c) {
            if (!this.bitmap$0) {
                this.x = 42;
                this.bitmap$0 = true;
            }
        }
        return this.x;
    }

    public int x() {
        if (!this.bitmap$0) {
            return this.x$lzycompute();
        }
        return this.x;
    }
}
```

We should be able to do it as:

```scala
class LazyList {
  @volatile var _head = Undefined
  var _state = lazyState

  def state =
    if (_head ne Undefined) sCons(_head, _state)
    else {
      synchronized {
        if (_head eq Undefined) {
          // evaluate lazy state
          _state = ...
          _head = ...
        }
      }
      state
    }
}
```

I pushed that, wdyt? I would need to brush up my Java memory model knowledge to be sure. If a thread observes
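A runnable Java analogue of that sketch may make the memory-model argument concrete (the `Cell` class and `Supplier`-based thunk are hypothetical stand-ins, not the PR's code): the volatile guard field is checked outside the lock, initialization happens in a `synchronized` block, and the plain `tail` write is published by the subsequent volatile `head` write.

```java
import java.util.function.Supplier;

public class Cell {
    private static final Object UNDEFINED = new Object();

    private volatile Object head = UNDEFINED; // guard field, written last
    private Object tail;                      // plain field, written first
    private Supplier<Object[]> state;         // computes {head, tail} lazily

    Cell(Supplier<Object[]> state) { this.state = state; }

    Object head() {
        Object h = head;
        if (h != UNDEFINED) return h;          // fast path, no locking
        synchronized (this) {
            if (head == UNDEFINED) {           // re-check under the lock
                Object[] ht = state.get();
                tail = ht[1];                  // plain write...
                head = ht[0];                  // ...published by this volatile write
                state = null;                  // drop the thunk once evaluated
            }
        }
        return head;
    }

    public static void main(String[] args) {
        int[] evaluations = {0};
        Cell c = new Cell(() -> { evaluations[0]++; return new Object[]{42, null}; });
        System.out.println(c.head());       // 42
        System.out.println(c.head());       // 42 (cached)
        System.out.println(evaluations[0]); // 1: evaluated exactly once
    }
}
```

A reader that sees `head != UNDEFINED` via the volatile read is guaranteed (by happens-before) to also see the earlier plain write to `tail`.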
Aren't you creating a new object on every access now? Are we sure this is a win?
@lrytz I'm pretty sure you're correct that setting
@Ichoran I changed this now (see commit "remove State"). WDYT?
@NthPortal could you explain this in more detail, I don't know what you mean. I adjusted the tests accordingly:

```scala
scala> lazy val ll: LazyList[Int] = 1 #:: 2 #:: ll
lazy val ll: LazyList[Int] // unevaluated

scala> ll.tail.tail eq ll
val res0: Boolean = false

scala> ll.tail.tail.tail eq ll.tail
val res1: Boolean = true
```

Until here it's the same before and after my change. Before the change, the shared `state` could be observed through reflection:

```scala
import scala.util.chaining._

def getMethodAccessible[A: reflect.ClassTag](name: String): java.lang.reflect.Method =
  implicitly[reflect.ClassTag[A]]
    .runtimeClass.getDeclaredMethods
    .find(_.getName == name).get.tap(_.setAccessible(true))

val state = getMethodAccessible[LazyList[_]]("scala$collection$immutable$LazyList$$state")

scala> state.invoke(ll.tail.tail) eq state.invoke(ll)
val res6: Boolean = true
```

But we no longer have the `state` method.

Serialization is actually not compatible, because un-evaluated lazy lists are serialized using plain object serialization. I don't think we can work around this. But we don't guarantee serialization compatibility between releases, so 🤷♂️.
@lrytz - I guess this is about the best that can be hoped for given the binary compatibility requirements. There is the one extra creation of a new object on each access. So I think the performance overhead, if it's an issue, would be the only thing remaining (not counting correctness, of course--I can't eyeball that, but there are a fair number of unit tests, plus some collection-laws).

I'm pretty sure if two different threads are building up erroneously mutually referential lists, you can get a deadlock instead of an error condition here. Maybe we don't care.

If thread 1 is trying to get the next item in xs, and to do so it starts traversing ys; but meanwhile, thread 2 is trying to extend ys and to do that it's looking into xs; you can have thread 1 enter the synchronized block of the next uninitialized xs, then run down ys and wait for ys's synch lock to be free. But thread 2 has it, runs down xs, hits the sync block of xs... and... no more progress can be made.

Now, this is an error condition. If ys(i) depends on xs(j) and xs(j) depends on ys(i), there's no way to perform the initialization. But you won't get the friendly "hey, re-entrancy means you messed up" exception; you won't even get a stack overflow.

I don't know if robustness to this kind of messup is big enough to worry about. To do it, you'd need to drop synchronization while executing the lambda, then try to pick it up again. This would allow one (and eventually both) threads to crash out with exceptions. But under contention, this might increase the cost considerably. Maybe with the volatile var guard, it wouldn't actually matter much. But I wouldn't bet on it without a test, and it's really hard to microbenchmark multi-threaded code.
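The lock-ordering scenario described above can be reproduced in miniature with two monitors acquired in opposite orders (a generic Java sketch of the deadlock shape, not LazyList code; names like `DeadlockDemo` are made up):

```java
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object xs = new Object(), ys = new Object();
        // Both threads hold their first lock before either tries the second,
        // guaranteeing the circular wait.
        CountDownLatch firstLockHeld = new CountDownLatch(2);

        Runnable lockXsThenYs = () -> {
            synchronized (xs) {
                firstLockHeld.countDown();
                try { firstLockHeld.await(); } catch (InterruptedException e) { return; }
                synchronized (ys) { }  // blocks forever: ys is held by the other thread
            }
        };
        Runnable lockYsThenXs = () -> {
            synchronized (ys) {
                firstLockHeld.countDown();
                try { firstLockHeld.await(); } catch (InterruptedException e) { return; }
                synchronized (xs) { }  // blocks forever: xs is held by the other thread
            }
        };

        Thread t1 = new Thread(lockXsThenYs), t2 = new Thread(lockYsThenXs);
        t1.setDaemon(true); t2.setDaemon(true);  // let the JVM exit despite the deadlock
        t1.start(); t2.start();
        t1.join(500);  // give both threads time to get stuck
        // No exception, no stack overflow: both threads are simply blocked.
        System.out.println("deadlocked: " + (t1.isAlive() && t2.isAlive()));
    }
}
```

As the comment above says: no friendly re-entrancy exception fires, the two threads just stop making progress.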
@Ichoran you're right that a lot of short-lived objects are allocated while evaluating lazy lists. For example:

There's the … I compared the state of this PR with the current LazyList: it's exactly the same (when adding up LazyList and State instances in the current implementation). Maybe that situation can also be improved..? It doesn't seem to hold up this PR though. Comparing allocation to Stream:
For the deadlock scenario, I think it's the same with the current implementation.
@lrytz - I think the deadlock situation is different because lazy val doesn't lock, does it? It uses a CAS-like mechanism, right? (Multiple attempts are cool; whoever finishes first wins and gets to set the cached value.) If that's correct, I think the lambda goes through, tries to initialize the lazy val a second time, and is caught by the re-entrant detection code.

Regarding the temporary object overhead--yeah, I wasn't thinking it was particularly worse. Just that it's a shame, in general. But no reason to fix it with this PR; the memory savings are the key part.
@Ichoran lazy vals are evaluated at most once, I posted the desugaring in a comment above: #10937 (comment)
@lrytz - Oh, oops. Then I agree: the deadlock issue is the same. |
modulo exceptions, that is...
@lrytz - Yes, that was my mistake--I was incorrectly inferring the general behavior from the behavior with exceptions. |
@lrytz my comment was about your code snippet, which didn't have …

```scala
lazy val ll: LazyList[Int] = 1 #:: ll.drop(1)
ll.tail.head // oops
```
I love this—thank you for all the work you've put into this Lukas! 💜
I just have a few nitpicks, some bikeshedding, and a single stack safety question that I'm very unsure about, but I'd be in no way unhappy to see it merged as is
hey Lukas, have you signed the CLA? the bot isn't sure 😜
Also simplify a couple of operation implementations and align the helper method names with the current design.
Thanks a lot for the sharp-eyed (and -brained) review @NthPortal!
this popped into my mind late last night (and thankfully I remembered it in the morning), and it might be a blocker =/
```diff
@@ -349,13 +396,13 @@ class LazyListTest {

   // Check forced cyclic and non-cyclic streams
   assertEquals("LazyList(-2, -1)", pre(2).force.toString)
-  assertEquals("LazyList(0, 1, <cycle>)", cyc(2).force.toString)
   assertEquals("LazyList(-2, -1, 0, 1, <cycle>)", precyc(2,2).force.toString)
+  assertEquals("LazyList(0, 1, 0, <cycle>)", cyc(2).force.toString)
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it occurred to me that we could almost certainly prevent the duplicate element by buffering the last element in a var and appending it "late" (and skipping that append if we detect a cycle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thank you for the idea. I cleaned up `addStringNoForce` and implemented your suggestion.
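The buffering idea can be sketched as a standalone model (a hypothetical Java `Node`/`render` pair, not the actual `addStringNoForce`; it assumes, as in the new encoding, that the element buffered when the cycle is detected duplicates the cycle's first element):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class CycleRender {
    static final class Node {
        final int head; Node tail;
        Node(int head) { this.head = head; }
    }

    // Buffer each element in `pending` and append it one step late; when a
    // cycle is detected, the buffered element is the duplicate of the cycle's
    // first element, so the append is skipped and replaced by "<cycle>".
    static String render(Node start) {
        StringBuilder sb = new StringBuilder("LazyList(");
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        String pending = null;
        boolean cyclic = false;
        for (Node cur = start; cur != null; cur = cur.tail) {
            if (!seen.add(cur)) { cyclic = true; break; }
            if (pending != null) sb.append(pending).append(", ");
            pending = Integer.toString(cur.head);
        }
        if (cyclic) sb.append("<cycle>");
        else if (pending != null) sb.append(pending);
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // ll = 1 #:: 2 #:: ll under the new encoding: a third node repeats the
        // first element's value and its tail points back into the cycle.
        Node a = new Node(1), b = new Node(2), c = new Node(1);
        a.tail = b; b.tail = c; c.tail = b;
        System.out.println(render(a));  // LazyList(1, 2, <cycle>)

        Node p = new Node(-2), q = new Node(-1);
        p.tail = q;                     // a plain non-cyclic list
        System.out.println(render(p));  // LazyList(-2, -1)
    }
}
```

With the late append, the duplicated element never reaches the output, so the rendering matches the pre-change behavior.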
The LazyList implementation uses two objects per evaluated value, a LazyList and one State instance. So the overhead for evaluated values is 2x compared to Stream.

This PR reduces the overhead to one LazyList instance with two fields (head and tail) per evaluated value.

Thread safety is implemented through checking the volatile `head` field and synchronizing on `this` for initialization. This is equivalent to the previous implementation, which uses a `lazy val`.

Breaking changes:

- Serialization is not compatible. Non-evaluated (tails of) streams are serialized using plain object serialization. The internals are different and serialization is incompatible in both directions.
- `toString` of cyclic streams displays one more element; `<cycle>` is only shown starting from the second element. For example, in `val x: LL[Int] = 1 #:: 2 #:: x`, the `x.tail.tail` instance is different from `x`, only the next `tail` is a shared instance. Before, the cycle could be detected earlier because the `state` instance of `x` and `x.tail.tail` was the same, even if the `LazyList` instances were different.

Fixes scala/bug#13062