Include a fingerprint of the specific arguments in mangled names. #4771

zygoloid · 2025-01-08T01:09:29Z

Instead of including the raw index of the specific, which is unstable across files and across unrelated changes, use a fingerprint of the constant values of the specific arguments. This is a placeholder until we decide on how we want to mangle specific functions.
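
To illustrate why a content-derived fingerprint is more stable than an index, here is a minimal sketch. It is not the toolchain's mangler: `ConstantValue`, `FingerprintArgs`, `MangleSpecific`, the FNV-1a mixing, and the `name.hex` output format are all hypothetical choices made for this example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the constant value of one specific argument.
using ConstantValue = std::uint64_t;

// Mixes every byte of every argument value with 64-bit FNV-1a. The result
// depends only on the values, not on where the specific sits in any table.
auto FingerprintArgs(const std::vector<ConstantValue>& args) -> std::uint64_t {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a offset basis.
  for (ConstantValue arg : args) {
    for (int i = 0; i < 8; ++i) {
      hash ^= (arg >> (i * 8)) & 0xff;
      hash *= 1099511628211ULL;  // FNV-1a prime.
    }
  }
  return hash;
}

// Appends the fingerprint as 16 hex digits (assumed format, not Carbon's).
auto MangleSpecific(const std::string& base_name,
                    const std::vector<ConstantValue>& args) -> std::string {
  char suffix[17];
  std::snprintf(suffix, sizeof(suffix), "%016llx",
                static_cast<unsigned long long>(FingerprintArgs(args)));
  return base_name + "." + suffix;
}
```

Two files that instantiate the same function with the same argument values get the same suffix, regardless of how many other specifics each file happens to contain.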

danakj · 2025-01-08T14:40:25Z

toolchain/sem_ir/inst_fingerprinter.cpp

@@ -17,7 +17,7 @@ namespace Carbon::SemIR {
 namespace {
 struct Worklist {
  // The file containing the instruction we're currently processing.
-  const File* sem_ir;
+  const File* sem_ir = nullptr;


Great :) I was wondering on the last PR whether this would be a good idea.

danakj · 2025-01-08T14:43:53Z

toolchain/sem_ir/inst_fingerprinter.cpp

@@ -337,11 +344,25 @@ struct Worklist {

 auto InstFingerprinter::GetOrCompute(const File* file, InstId inst_id)
    -> uint64_t {
-  Worklist worklist = {.sem_ir = nullptr,
-                       .todo = {{file, inst_id}},
+  Worklist worklist = {.todo = {{file, inst_id}},
                       .fingerprints = &fingerprints_};
  worklist.Run();


Sort of an aside, but the new GetOrCompute avoids doing a second lookup, and it seems like Run() could also keep the last fingerprint around and return it. That would be the same value as the lookup here, and would save us the lookup?

Good point, done. (In this PR for convenience, let me know if you'd prefer that I split it out.)
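
The shape of that change might look roughly like the sketch below. It is a simplified assumption, not the code in this PR: `InstId` is reduced to an integer, `Compute` is a placeholder hash, and the worklist has no dependency handling.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for SemIR::InstId.
using InstId = std::int32_t;
using FingerprintMap = std::unordered_map<InstId, std::uint64_t>;

struct Worklist {
  std::vector<InstId> todo;
  FingerprintMap* fingerprints = nullptr;

  // Placeholder for the real per-instruction hashing.
  static auto Compute(InstId id) -> std::uint64_t {
    return static_cast<std::uint64_t>(id) * 0x9e3779b97f4a7c15ULL;
  }

  // Processes all pending entries and returns the fingerprint of the last
  // entry processed, so the caller does not need to look it up again.
  auto Run() -> std::uint64_t {
    std::uint64_t last = 0;
    while (!todo.empty()) {
      InstId id = todo.back();
      todo.pop_back();
      auto it = fingerprints->find(id);
      if (it == fingerprints->end()) {
        it = fingerprints->emplace(id, Compute(id)).first;
      }
      last = it->second;
    }
    return last;
  }
};

auto GetOrCompute(FingerprintMap& cache, InstId id) -> std::uint64_t {
  Worklist worklist;
  worklist.todo.push_back(id);
  worklist.fingerprints = &cache;
  return worklist.Run();  // No second lookup in `cache` needed.
}
```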

danakj · 2025-01-08T14:45:23Z

toolchain/sem_ir/inst_fingerprinter.cpp

+  worklist.Prepare(file);
+  worklist.Add(inst_block_id);
+  if (!worklist.todo.empty()) {
+    worklist.Run();
+    worklist.Prepare(file);
+    worklist.Add(inst_block_id);
+  }


I don't love how this is breaking the rather nice Worklist/Run abstraction. /me goes to stare at Worklist and see if this can be done internally

I see, it's because InstBlockId is not an InstId so it can't go in todo. This is maybe a terrible idea but just in case - did you consider constructing some artificial typed inst here that has an InstBlockId inside it, and give that as the todo?

Or a possibly heavier or different shaped hammer: todo could become a list of variant<InstId, InstBlockId> and Run could deal with that internally?
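
A rough sketch of the variant idea, under the assumption that both ids are small wrapper structs. The types, the `Handler` visitor, and the string log are hypothetical and only show how a single `Run` loop could dispatch on either kind of entry.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the two SemIR id types.
struct InstId { std::int32_t index; };
struct InstBlockId { std::int32_t index; };

using TodoEntry = std::variant<InstId, InstBlockId>;

// One overload per entry kind; std::visit picks the matching one.
struct Handler {
  auto operator()(InstId id) const -> std::string {
    return "inst " + std::to_string(id.index);
  }
  auto operator()(InstBlockId id) const -> std::string {
    return "block " + std::to_string(id.index);
  }
};

// Pops entries from the back (stack order) and handles each kind internally,
// so callers never need to special-case blocks.
auto Run(std::vector<TodoEntry> todo) -> std::vector<std::string> {
  std::vector<std::string> log;
  while (!todo.empty()) {
    TodoEntry entry = todo.back();
    todo.pop_back();
    log.push_back(std::visit(Handler{}, entry));
  }
  return log;
}
```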

Yeah, I find this a little unsatisfying too. We can't construct a real InstId for a new artificial instruction without modifying the File, which seems out-of-contract (and we're using this from lowering which really ought to not mutate the File).

I think switching to storing a variant in the todo list could be a bit heavyweight. On the other hand, if we see the same InstBlockIds repeatedly, caching their fingerprints might save us some work. I think we should probably make this call based on performance data, but I don't think we have representative examples we can use to measure this yet, so I've added a TODO to try this out later.

One other option I considered was storing an Inst rather than an InstId in the todo list. That'd allow us to use an artificial typed inst here without modifying the IR. Again it's a little heavy -- an Inst is four times the size of an InstId -- and I'm not sure whether it's worth it.
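
To make the size comparison concrete, here is a small sketch. The field names and layout of `SketchInst` are assumptions chosen only to match the "four times the size" figure above; they are not taken from the real `Inst` definition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layouts: one 32-bit index versus four 32-bit fields.
struct SketchInstId {
  std::int32_t index;
};
struct SketchInst {
  std::int32_t kind;
  std::int32_t type_id;
  std::int32_t arg0;
  std::int32_t arg1;
};

static_assert(sizeof(SketchInst) == 4 * sizeof(SketchInstId),
              "the sketch assumes four 32-bit fields with no padding");

// Bytes needed for a todo list holding `entries` elements of `entry_size`.
constexpr auto TodoBytes(std::size_t entries, std::size_t entry_size)
    -> std::size_t {
  return entries * entry_size;
}
```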

Ok thanks. Yeah if we're concerned with performance here, having some kind of benchmark that we can look at would be great! I would be a little surprised if a variant of two ids is noticeably worse than one id, but we can't tell without measuring.

For the performance impact, it's mostly the extra levels in the Merkle tree that I'm concerned with, rather than the variant itself. I'd expect that it's a win to add separate levels for InstBlocks if we often end up reusing the work, and not otherwise. [And I don't know what would constitute "often" either :)]
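
The trade-off can be sketched as follows, assuming a block's fingerprint is a hash over the fingerprints of its instructions. `BlockFingerprinter`, its cache keyed by a plain integer block id, and the mixing function are all hypothetical; the `computations` counter exists only to show when the cache saves work.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical cache that adds one Merkle-tree level per instruction block.
struct BlockFingerprinter {
  std::unordered_map<std::int32_t, std::uint64_t> cache;
  int computations = 0;  // How many times a block was actually hashed.

  auto GetOrCompute(std::int32_t block_id,
                    const std::vector<std::uint64_t>& inst_fingerprints)
      -> std::uint64_t {
    auto it = cache.find(block_id);
    if (it != cache.end()) {
      return it->second;  // Reused block: the extra level pays off.
    }
    ++computations;
    std::uint64_t hash = 14695981039346656037ULL;
    for (std::uint64_t fingerprint : inst_fingerprints) {
      hash = (hash ^ fingerprint) * 1099511628211ULL;
    }
    cache.emplace(block_id, hash);
    return hash;
  }
};
```

If each block is fingerprinted only once, every call takes the miss path and the cache is pure overhead, which is why the decision is deferred until there is data to measure.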

Ah, I was not imagining that changing actually, I think. Maybe I will send a lil followup to this and you can see if you like what I am picturing or not.

#4776 is what I was thinking, see what you think. No hard feelings if it's not what you'd like.


github-actions bot added the toolchain label Jan 8, 2025

github-actions bot requested a review from jonmeow January 8, 2025 01:09

danakj reviewed Jan 8, 2025


Preserve the most recent hash to avoid a lookup. Add TODO to try caching inst block fingerprints.

5b45ee6

zygoloid requested a review from danakj January 8, 2025 19:28

Merge branch 'trunk' into toolchain-fingerprint-specifics

bb2c68d

danakj approved these changes Jan 8, 2025


zygoloid enabled auto-merge January 8, 2025 19:54

zygoloid added this pull request to the merge queue Jan 8, 2025

Merged via the queue into carbon-language:trunk with commit 9a5f2d7 Jan 8, 2025
8 checks passed

zygoloid deleted the toolchain-fingerprint-specifics branch January 8, 2025 21:17
