ISMM 2020 Notes

Trying something new. As ISMM 2020 is a virtual conference this year, I'm listening in to all the talks and making notes as I go. YouTube Live Stream. Proceedings.

Verified Sequential Malloc/Free

Separation logic based proofs, described in a DSL for a proof tool I am unfamiliar with. The basic strategy is to do a source level proof, with proof annotations stored separately, and rely on CompCert (a verified C compiler) to produce a verified binary. The library verified is a malloc library I'm unfamiliar with; it's unclear how "real" this code is. The verification work was done manually by the author. The first couple of slides do nicely describe the strength of separation logic for the domain and some of the key intuitions.

Alligator Collector: A Latency-Optimized Garbage Collector for Functional Programming Languages

Partially concurrent collector for GHC. The presentation is somewhat weak for an audience familiar with garbage collection fundamentals. The collector sounds fairly basic by modern Java standards, but it makes for an interesting experience report. The design used isn't a bad starting point for languages without a mature collector.

The question at the end about the implications of precompiled binaries is interesting; in particular, the acknowledged advantage of a load barrier and partially incremental collection. The answer mentioned the "customer was strict" about that requirement, which provides some context on why the design evolved the way it did.

Understanding and Optimizing Persistent Memory Allocation

Focus is on writing crash atomic (i.e. exception safe) allocation code for persistent memory. The approach taken is to persist only the heap metadata sufficient to implement a conservative (ambiguous) GC. The previous approaches referenced appear to be analogous to standard persistence techniques for databases (i.e. commit logs). The positioning is described most clearly in the conclusion slide. The questions at the end are the clearest part of the talk.

The contributions of this paper appear fairly minor conceptually. There's a passing mention of the first lock free allocator, but the only part actually discussed is persisting only heap metadata and reconstructing the rest on recovery. Previous work had already used a conservative GC as a fallback. The choice to use a handlized heap is understandable, but has unfortunate performance implications.

An interesting idea to explore in this area would be what a relocating collector looks like in this space. One way to avoid the need for handlization is to support pointer fixup. You could track previously mapped addresses, and as long as those don't overlap, you can interpret all previous pointers with a load barrier. Technically, you can do fixup without marking or relocating, but if you're paying the semantic costs to support fixup you might as well get the fragmentation benefits.
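
To make the fixup idea concrete, here's a minimal sketch (all names are mine, not from the paper) of a load barrier that rebases pointers expressed against previously mapped ranges into the current mapping:

    #include <stddef.h>
    #include <stdint.h>

    /* One entry per prior mapping of the persistent heap (hypothetical layout). */
    typedef struct {
        uintptr_t old_base;   /* base address in a previous run */
        size_t    length;     /* length of that mapping         */
    } old_mapping;

    #define MAX_OLD_MAPPINGS 16
    static old_mapping old_maps[MAX_OLD_MAPPINGS];
    static size_t      num_old_maps;
    static uintptr_t   current_base;   /* where the heap is mapped this run */

    /* Load barrier: every pointer read out of the persistent heap goes through
     * here.  A pointer still expressed against an earlier mapping is rebased
     * into the current one.  Assumes old ranges never overlap the current one. */
    static void *fixup_load(void *p) {
        uintptr_t addr = (uintptr_t)p;
        for (size_t i = 0; i < num_old_maps; i++) {
            if (addr - old_maps[i].old_base < old_maps[i].length)
                return (void *)(current_base + (addr - old_maps[i].old_base));
        }
        return p;   /* already a current-mapping pointer */
    }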

Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

Starts with a nice overview of the problem domain. Super helpful as I know little about this area!

The approach appears to basically use profiling of the first access to classify memory as "likely read heavy" or "likely write heavy". There's also a bit about "near and far", but that went over my head. There's a profiling mechanism in the page fault handler (presumably in hardware) which classifies instructions. On first access to a new page, this instruction profile is used to classify the new page.
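
My rough reading of that mechanism, as a hedged sketch with made-up names and a placeholder threshold (quite possibly not how the paper actually does it):

    #include <stdint.h>

    /* Per-instruction access counters, keyed by the faulting PC elsewhere
     * (hypothetical structure; the paper's mechanism is in hardware). */
    typedef struct {
        uint64_t reads;
        uint64_t writes;
    } insn_profile;

    typedef enum { PLACE_READ_TIER, PLACE_WRITE_TIER } placement;

    /* On the first fault of a page, classify it from the profile of the
     * faulting instruction: instructions with a write-heavy history place
     * the page in the write-friendly tier.  The 50% threshold is mine. */
    static placement classify_new_page(const insn_profile *p) {
        uint64_t total = p->reads + p->writes;
        if (total == 0)
            return PLACE_READ_TIER;            /* no history yet */
        return (2 * p->writes > total) ? PLACE_WRITE_TIER : PLACE_READ_TIER;
    }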

Key expectation seems to be that individual instructions are either read heavy or write heavy. This seems reasonable, but it's not clear to me why you need a dynamic profile for this. The instruction encoding seems to tell you this. Maybe the dynamic part is needed for near and far? I didn't follow that part.

This is far enough out of my area that I'm not following the details. If you're interested in the topic, I highly recommend listening to the talk directly.

Prefetching in Functional Languages

Nicely explained problem domain and problem statement.

I am very skeptical of the value of prefetching on modern hardware; the talk tries to justify the need on OOO hardware, and while not wrong, I think it's overstating the problem. Particularly with a GC which arranges objects in approximate traversal order (singly linked lists are trivial to get right) I don't see a strong value here. (Ah, the later performance comparison uses ARM chips and Knights Landing; I just showed my mainline Intel bias...)
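
For context, the generic technique under discussion is software prefetch issued during traversal; a trivial singly linked list example using the GCC/Clang builtin (not code from the paper):

    #include <stddef.h>

    struct node {
        struct node *next;
        long         payload;
    };

    /* Walk a list, issuing a software prefetch for the next node while the
     * current one is being processed.  A distance of one is usually too short
     * to hide much latency, which is part of why heap layout matters so much. */
    long sum_list(struct node *head) {
        long sum = 0;
        for (struct node *n = head; n != NULL; n = n->next) {
            __builtin_prefetch(n->next, 0 /* read */, 3 /* high locality */);
            sum += n->payload;
        }
        return sum;
    }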

After listening to the talk, it's really not clear what the contribution was. Most of the discussion is generic prefetch discussion. Did they add a prefetch instruction to OCaml? That's pretty basic. Was this a characterization study? Was there some form of compiler support?

It is a nice presentation on the basics of prefetching and the performance tradeoffs thereof.

Garbage Collection Using a Finite Liveness Domain

The basic notion is to use a heap analysis to describe the objects reachable from each reference. Standard GC is reachability based, which simply means we use the conservative trivial analysis (i.e. everything is live if there's an edge to it). The general problem with this approach is scalability; this paper tries to address that with a restricted set of analysis results. The basic framing appears to be: given a tree, which subtree is live? The set of subtrees is restricted to first level, left recursive, right recursive, and all.
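
A hedged sketch of how I understood that domain being consumed by the collector (types and names are mine, not the paper's):

    #include <stddef.h>

    /* Cons-cell style heap object with left/right children. */
    struct obj {
        struct obj *left;
        struct obj *right;
    };

    /* The finite liveness domain, as I understood the framing in the talk. */
    typedef enum {
        LIVE_FIRST_LEVEL,     /* only the node itself                  */
        LIVE_LEFT_RECURSIVE,  /* the node plus everything down .left   */
        LIVE_RIGHT_RECURSIVE, /* the node plus everything down .right  */
        LIVE_ALL              /* ordinary reachability                 */
    } liveness;

    /* Trace only the subtree the analysis says is live for this reference. */
    static void trace(struct obj *o, liveness l, void (*mark)(struct obj *)) {
        if (o == NULL)
            return;
        mark(o);
        if (l == LIVE_LEFT_RECURSIVE || l == LIVE_ALL)
            trace(o->left, l, mark);
        if (l == LIVE_RIGHT_RECURSIVE || l == LIVE_ALL)
            trace(o->right, l, mark);
    }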

A thought: has anyone considered reversing the liveness result? If, when scanning the stack, you used the liveness analysis to break references to dead objects, this must be semantics preserving. At that point, a standard reachability GC can produce the refined results. Actually, given that, isn't the result equivalent to a complicated DSE which inserts code that only runs at the GC point? Maybe a compiler could support this with a DSE and a gc closure which runs at stack scan time? (Clarification: the inversion of the analysis result works for single threaded programs or unescaped objects in concurrent languages, but not for objects accessible by multiple threads.) (Ian asked the same question :) and the author provided a nice answer about AA I'd missed.)
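
A sketch of that inversion idea, reusing the types from the sketch above and assuming single threaded code or unescaped objects (again, names are mine):

    /* Hypothetical per-frame root descriptor.  Run at stack-scan time: clear
     * the references the analysis proved dead, then let a plain reachability
     * GC reclaim the now-unreachable subtrees. */
    struct frame_roots {
        struct obj **slots;   /* root slots in this frame */
        liveness    *live;    /* analysis result per slot */
        size_t       count;
    };

    static void prune_dead_roots(struct frame_roots *f) {
        for (size_t i = 0; i < f->count; i++) {
            struct obj *o = f->slots[i];
            if (o == NULL)
                continue;
            if (f->live[i] == LIVE_LEFT_RECURSIVE || f->live[i] == LIVE_FIRST_LEVEL)
                o->right = NULL;
            if (f->live[i] == LIVE_RIGHT_RECURSIVE || f->live[i] == LIVE_FIRST_LEVEL)
                o->left = NULL;
        }
    }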

ThinGC: Complete Isolation With Marginal Overhead

Objects are grouped into hot and cold pages. All access is to hot pages; pages are moved to the hot region if needed to satisfy a memory access. (This seems an odd choice and has obvious parallels to compressed heap ideas which have been explored in previous work.) The implementation builds on ZGC, but that appears to be irrelevant to the main focus of the work.
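
My mental model of the access path, as a hedged sketch (this is not ZGC/ThinGC code; the real thing relocates whole pages and is far more involved):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical object layout; not ThinGC's real representation. */
    struct object {
        bool  cold;        /* currently resident on a cold page? */
        long  payload;
    };

    /* Stand-in for the page-level move; the real collector moves the object
     * (or its page) back into the hot region -- "reheating". */
    static struct object *reheat(struct object *o) {
        o->cold = false;
        return o;
    }

    /* Load barrier: the mutator only ever touches hot objects, so anything
     * found on a cold page is reheated before the access proceeds. */
    static struct object *load_barrier(struct object *o) {
        if (o != NULL && o->cold)
            o = reheat(o);
        return o;
    }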

Question: Why not have cold heap be subset of old gen? Allows reuse of card mark instead of a separate remembered set. A: ZGC is single generational.

The application study looks very promising. This is probably the key contribution of the paper. The reheating percentage is low for most applications, but high for a few. This is a problem for the approach. I was unhappy that this wasn't further discussed. I asked a question about this at the end, and didn't get the sense this had really been explored.

As a result of the variation in reheating, they also observed very wide variations in performance and pattern. There was also a verbal mention of run to run variation due to memory arrangement, but this wasn't justified or expanded upon. (I'm mildly suspicious of this without evidence.)

My questions:

* What was the motivation for requiring reheating rather than directly accessing the cold objects? Did you evaluate the implications of this choice? A: plan is to use very slow memory for cold memory; current work does not.
* Did you explore the patterns which tended to cause reheating? Is there any idiomatic pattern which might influence the design? A: no information

My takeaway: no strong conclusions; they haven't either proved or disproved the idea of hot/cold separation, as it's too tied to the decision around reheating. Maybe the paper has more useful details?

Improving Phase Change Memory Performance with Data Content Aware Access

Skipped this session.

Keynote: Richard Jones

Schedule conflict, will watch later.