Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate analysis results #2

Open
martin-luecke opened this issue Nov 14, 2023 · 32 comments
Open

Investigate analysis results #2

martin-luecke opened this issue Nov 14, 2023 · 32 comments
Assignees

Comments

@martin-luecke
Copy link
Collaborator

martin-luecke commented Nov 14, 2023

Investigate why we have so many entries in the entry "Fails statically but passes dynamically".
Suspicion is that this relates to Exceptions thrown in the Analysis when the rewrite is illegal from a PDL IR perspective.

e.g. bottom left in this table

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |                98      |               1.01031 |      98 |
| Fails Static Check  |                82.6667 |               1.01224 |     248 |
| Total               |                 4      |             490       |     346 |
@martin-luecke martin-luecke self-assigned this Nov 14, 2023
@math-fehr
Copy link
Contributor

I added the missing PDL verifiers, and I now only generate verifying PDL.
The results are like this now:

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |                1.35366 |               3.82759 |     111 |
| Fails Static Check  |                9.025   |               1.12461 |     722 |
| Total               |              162       |            1284       |     833 |

Though I'm not sure what these number means?

@martin-luecke
Copy link
Collaborator Author

Oh, must have been tired while writing this. It should be correct now.
Here is a table I generated off 10000 rewrites:

Analysis failed, MLIR execution failed:  6740
Analysis failed, MLIR execution succeeded:  45
Analysis succeeded, MLIR execution failed:  1340
Analysis succeeded, MLIR execution succeeded:  11
PDL Analysis raised an exception:  1864

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |             0.00814212 |              0.991858 |    1351 |
| Fails Static Check  |             0.00663228 |              0.993368 |    6785 |
| Total               |            56          |           8080        |    8136 |

The bottom right 99% number is encouraging, though it's still interesting that we deem 45 rewrites as invalid that still succeed dynamically.
I will start investigating why we have so many rewrites that pass our analysis and still fail (top right 99%).

@martin-luecke
Copy link
Collaborator Author

Okay, I have discovered some weird behavior. On my setup the mlir-opt binary does not have the pass --test-pdl-bytecode-pass and therefore should always fail. However, here it is reported to succeed 56 times.

As far as I can see this is because this code here. It assumes MLIR ran successfully when neither of the Exceptions occur. However, in some cases it seems to be possible that the generator created by put_operations_in_region(dag, region, ops) does not return a single entry, thus MLIR never runs and this returns None, which seems to indicate success in this code.

We probably should have a sanity check in the beginning with a very simple input IR, rewrite and expected IR to make sure that MLIR actually runs correctly

@math-fehr I think this part was written by you, do you know why this might return no region?

The specific rewrite this failed with is:

pdl.pattern : benefit(1) {
  %0 = pdl.type : i32
  %1 = pdl.operand : %0
  %2 = pdl.operation "pdltest.matchop" (%1 : !pdl.value)
  %3 = pdl.type : i32
  %4 = pdl.operand : %3
  %5 = pdl.type : i32
  %6 = pdl.operand : %5
  %7 = pdl.type : i32
  %8 = pdl.operation "pdltest.matchop" (%4, %6 : !pdl.value, !pdl.value) -> (%7 : !pdl.type)
  %9 = pdl.result 0 of %8
  %10 = pdl.type : i32
  %11 = pdl.operand : %10
  %12 = pdl.type : i32
  %13 = pdl.operation "pdltest.matchop" (%11, %9 : !pdl.value, !pdl.value) -> (%12 : !pdl.type)
  %14 = pdl.result 0 of %13
  %15 = pdl.type : i32
  %16 = pdl.operand : %15
  %17 = pdl.type : i32
  %18 = pdl.operand : %17
  %19 = pdl.type : i32
  %20 = pdl.type : i32
  %21 = pdl.operation "pdltest.matchop" (%16, %18 : !pdl.value, !pdl.value) -> (%19, %20 : !pdl.type, !pdl.type)
  %22 = pdl.result 0 of %21
  %23 = pdl.result 1 of %21
  pdl.rewrite %21 {
    pdl.replace %2 with %8
    %24 = pdl.operation "pdltest.rewriteop"
  }
}

Which is deemed incorrect by the analysis (the matching dag is not connected) and should also fail MLIR execution

@math-fehr
Copy link
Contributor

I'm on it now!

@martin-luecke
Copy link
Collaborator Author

Here is the newest table with a proper MLIR build (that I spent 3h generating 😅 ):

Analysis failed, MLIR execution failed:  5490
Analysis failed, MLIR execution succeeded:  190
Analysis succeeded, MLIR execution failed:  1636
Analysis succeeded, MLIR execution succeeded:  849
PDL Analysis raised an exception:  1835

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |              0.34165   |              0.65835  |    2485 |
| Fails Static Check  |              0.0334507 |              0.966549 |    5680 |
| Total               |           1039         |           7126        |    8165 |

The 65% in the top right indicate to me that our static analysis is still too weak to catch many cases that make MLIR fail dynamically. I did not investigate this yet.

the 3% in the bottom left where MLIR does not fail, yet our analysis thinks the rewrite is invalid looks to me as if we do not generate complex enough inputs to the MLIR rewrite. From investigating the results a little this behavior shows when e.g. an operation is replaced by itself, which is unsafe as the replacement degenerates to an erasure. But the inputs the fuzzer generated for this rewrite did not use the matched op in any of the cases.
This looks like we have to make the fuzzer slightly more involved.
We could also make this driven by our analysis result, i.e. spend more time generating more complex inputs if we already know the analysis result is False

What do you think @math-fehr

@math-fehr
Copy link
Contributor

Yeah that makes a lot of sense!
I've been passing some time trying to reproduce your results yesterday, and indeed the 1000/7000 is what I get locally as well.
It's taking a long time to generate that table, if you want what I can do directly is paralellize this quickly to make this N_CPU times faster!

I'll look into the fuzzer!

@math-fehr
Copy link
Contributor

I just added multithreading, so if you have a good machine this should speed up things greatly.
Also, I'll try to make this more reproducible, so we can more quickly see and share programs that are failing.
I also added an -n flag to run less examples, because 10k is a bit too long for my laptop!

@math-fehr
Copy link
Contributor

I also improved the printer in the table so we actually print the seeds that are failing/succeeding, and I added the --seed option so that we can more easily reproduce these things in the other scripts.

@martin-luecke
Copy link
Collaborator Author

That's great! It generates much faster now. 👍
I'll look into it this afternoon

@martin-luecke
Copy link
Collaborator Author

martin-luecke commented Jan 10, 2024

I looked into the category "Analysis failed, MLIR execution succeeded" and these are the results:

1 - replacement with itself: [53710184, 986607411, 453082198, 857706059, 343558720, 245997989, 726033229, 491648746, 408902191, 202563114, 951616026, 569519230, 333821845, 451135926, 719811309, 260615611, 835064486, 597551532, 245649464, 813873333, 658467859, 416864332, 446221126, 85600382, 1010392821, 200165247, 578770519, 524409999, 924892835, 756603494, 377184186, 512831000, 265404788]
2 - replacement with (one of) its own results: [770902343, 454833168, 532373262, 634914704, 966517452, 410290553, 413571616, 248939737, 193559947]
3 - scoping analysis too limited: [801914424, 658467859, 976459372, 935389095, (741148029 this is actually similar to 214129241), 578770519, 924892835, 512831000, 466583095] - (at least 976459372, 935389095 are not actually save, depending on where the use of the replaced op is!)
4 - erased op still in use: [354430857]
5 - number of results different in replacement, should actually fail MLIR? [976459372, 466583095 and a few others as well]

regarding 1: MLIR does not complain about the replacement with itself and that only breaks when there are actually uses outside of the matched IR.
regarding 2: Same thing
@math-fehr How far off are we on the option of generating uses of operations that are replaced / erased?

regarding 3:

  • The analysis has issues when we have e.g. this pattern:
    %3 = pdl.operation "pdltest.matchop" (%1 : !pdl.value) -> (%2 : !pdl.type)
    %4 = pdl.result 0 of %3
    pdl.rewrite %3 {
        ...
        %7 = pdl.operation "pdltest.rewriteop" (%4, %4 : !pdl.value, !pdl.value) -> (%5, %6 : !pdl.type, !pdl.type)

because the scope of %4 is not picked up correctly.
I will have a look at this probably tomorrow.

  • 976459372, 935389095 are not actually save. I think if we generate more options with various uses of operations that are replaced we will find counter examples to rewrites like these. The issue with them is that it matches
    A
    / \
    B C
    i.e. both C and C are after A, but we don't have a guarantee on the order of B and C. Then they replace B with C.

regarding 4: I am not quite sure why this is the error here, have to look deeper into this one

regarding 5: I am not sure why these pass MLIR, the do a replacement with the wrong number of values (number of results of the replaced op have to match!). I know MLIR has an assertion for this, but it does not trigger here.

@math-fehr We should think about a sanity check whether our generated IR actually provides a match for the corresponding rewrite. I am worried we generate a bunch of inputs, run them through MLIR, find no matches, i.e. successful return code and assume correctness. Or do we have something like this already?
How we could do this: (We could also enable this as a debug option instead of on default)
Add the option "--debug-only=pdl-bytecode" to the call to mlir and check "RecordMatch" in res.stderr
This option enables the detailed debugging of the PDL interpreter, which prints "Executing RecordMatch" after satisfying all requirements for a match.
What do you think?

@math-fehr
Copy link
Contributor

Nice!
For 1 and 2, I can realistically do that today!

For 3, I think I have a good idea on how to fix that, which should also speedup our checks.
Currently, I generate a bunch of tests with many different configurations. Realistically, because of how the patterns behave, I think that if I generate "the worst case" (one operation per block, and no dominance between unrelated blocks), then it should always catch the error. That would also mean that we don't need to fuzz these matches, we just need to generate the worst one. I can try to implement that today/tomorrow, but most likely tomorrow.

@math-fehr math-fehr reopened this Jan 10, 2024
@math-fehr
Copy link
Contributor

math-fehr commented Jan 10, 2024

I'll also try to implement the sanity check tomorrow with your debug-only solution!

@math-fehr
Copy link
Contributor

1-2 is implemented with the last commit now

@math-fehr
Copy link
Contributor

I realized I was really wrong with problem 3, there is no "worst case" scenario.
I'm still working on something to make it better.
Instead of generating randomly a block DAG, and then adding operations to it, I'll just get an operation ordering, and then generating all the possible interleavings, and then generating a corresponding worst DAG for the given interleaving (one operation per block).
I'm on it now

@math-fehr
Copy link
Contributor

math-fehr commented Jan 11, 2024

I wrote a new fuzzing algorithm, it seems to work!
I unveiled a new kind of bug, that is on our end however (489407815):

  builtin.module attributes  {"sym_name" = "patterns"} {
    pdl.pattern : benefit(1) {
      %0 = pdl.type : i32
      %1 = pdl.operand : %0
      %2 = pdl.type : i32
      %3 = pdl.operand : %2
      %4 = pdl.type : i32
      %5 = pdl.operation "pdltest.matchop" (%1, %3 : !pdl.value, !pdl.value) -> (%4 : !pdl.type)
      %6 = pdl.result 0 of %5
      %7 = pdl.type : i32
      %8 = pdl.operand : %7
      %9 = pdl.type : i32
      %10 = pdl.type : i32
      %11 = pdl.operation "pdltest.matchop" (%8, %6 : !pdl.value, !pdl.value) -> (%9, %10 : !pdl.type, !pdl.type)
      %12 = pdl.result 0 of %11
      %13 = pdl.result 1 of %11
      pdl.rewrite %11 {
        %14 = pdl.operation "pdltest.rewriteop" (%13 : !pdl.value)
        pdl.erase %14
      }
    }
  }

This will actually succeed, because the operation that is being added will be removed directly, so even if it accessed non-dominated operands, it will still pass the verifier at the end.

@math-fehr
Copy link
Contributor

Another one that should be accepted (696391871):

  pdl.pattern : benefit(1) {
    %0 = pdl.operation "pdltest.matchop"
    pdl.rewrite %0 {
      pdl.replace %0 with %0
      ^^^^^^^^^^^--------------
      | replacement_with_itself
      -------------------------
    }
  }
}

Despite being replaced with itself, it has no results, so this cannot go wrong

@math-fehr
Copy link
Contributor

Also, I added a check, and indeed all generated programs are matched by pdl

@math-fehr
Copy link
Contributor

And finally, this kind of rewrite is accepted by our analysis, but not by MLIR,
because we are replacing one operation with no results with multiple results.
@martin-luecke, what should we do here? Should we just not generate these rewrites,
or should we add that in our analysis?

builtin.module {
  builtin.module attributes  {"sym_name" = "patterns"} {
    pdl.pattern : benefit(1) {
      %0 = pdl.type : i32
      %1 = pdl.operand : %0
      %2 = pdl.type : i32
      %3 = pdl.operation "pdltest.matchop" (%1 : !pdl.value) -> (%2 : !pdl.type)
      %4 = pdl.result 0 of %3
      %5 = pdl.type : i32
      %6 = pdl.operand : %5
      %7 = pdl.operation "pdltest.matchop" (%4, %6 : !pdl.value, !pdl.value)
      pdl.rewrite %7 {
        pdl.replace %7 with (%4 : !pdl.value)
      }
    }
  }
  builtin.module attributes  {"sym_name" = "ir"} {
    func.func @test(%8 : i32) {
      %9 = "pdltest.matchop"(%8) : (i32) -> i32
      "test.use_op"(%9) : (i32) -> ()
      "pdltest.matchop"(%9, %8) : (i32, i32) -> ()
      "test.use_op"() : () -> ()
    }
  }
}

@martin-luecke
Copy link
Collaborator Author

Hey Mathieu,
Thanks for your effort! The new fuzzing approach looks very promising!
I changed a few things:

  • slightly extended the analysis and got rid of a few cases of self replacements.
  • restructured how the fuzzing results are reported to properly differ between the ways the analysis fails.
  • For MLIR we also print the seeds that we did not manage to generate matches for.

Can it be that the seeds changed with the new approach to fuzzing? At least I could not regenerate your examples from your seeds, but managed to find similar ones.
As far as I tested now I fixed the issues you mentioned in your seeds 489407815 and 696391871 .

what should we do here? Should we just not generate these rewrites,
or should we add that in our analysis?

Yes, I will add that soon as well, I think we do not yet handle replacement with values quite well. In general we should look to only make the analysis stronger, never remove certain inputs.

I will soon start looking into cases where the analysis returns success and the rewrite fails in MLIR, as the analysis should ideally be strictly stronger, i.e. conservative than the dynamic execution.

@math-fehr
Copy link
Contributor

Really nice!
Oh yes, the seeds might have changed when I changed how to generate the matching sadly!

@martin-luecke
Copy link
Collaborator Author

martin-luecke commented Jan 16, 2024

Cheers guys,
I iterated some more on the analysis and added some missing checks.
The current table looks like this:

Total: s fail d fail, s succ d succ, s fail d succ, s succ d fail, failed analyses
categories: 9395, 253, 317,35,0

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |               0.878472 |              0.121528 |     288 |
| Fails Static Check  |               0.03264  |              0.96736  |    9712 |
| Total               |             570        |           9430        |   10000 |

The biggest issues we have are:

  1. unsafe erasures and replacements of the root op, would be fixed if we would generate a use for the root op: 597409992, 333889688, 491876988, 675765611, 245997989,726033229, 491648746, 1042419699, 408902191, 202563114,951616026, 569519230, 338758748, 333821845, 7614311,923177764, 260615611, 323906379, 835064486,805965556, 598993846, 813873333, 730872099, 416864332, 912978988, 85600382, 1010392821, 667330844, 200165247, 891082067, 756603494, 377184186
    e.g.
builtin.module {
  pdl.pattern : benefit(1) {
    %0 = pdl.type : i32
    %1 = pdl.operand : %0
    %2 = pdl.type : i32
    %3 = pdl.operand : %2
    %4 = pdl.operation "pdltest.matchop" (%1, %3 : !pdl.value, !pdl.value)
    pdl.rewrite %4 {
      pdl.erase %4
      ^^^^^^^^^-------
      | unsafe erasure
      ----------------
    }
  }
}
  1. A B C ordering, what I mentioned above already
    A
    /
    B C
    rewrite replaces B with C.
    For that we don't generate the counter example.976459372, 935389095, 62163759, 512831000
    e.g.
builtin.module {
  pdl.pattern : benefit(1) {
    %0 = pdl.type : i32
    %1 = pdl.type : i32
    %A = pdl.operation "pdltest.matchop" -> (%0, %1 : !pdl.type, !pdl.type)
    %3 = pdl.result 0 of %A
    %4 = pdl.result 1 of %A
    %5 = pdl.type : i32
    %6 = pdl.operand : %5
    %7 = pdl.type : i32
    %B = pdl.operation "pdltest.matchop" (%6, %3 : !pdl.value, !pdl.value) -> (%7 : !pdl.type)
    %9 = pdl.result 0 of %8
    %C = pdl.operation "pdltest.matchop" (%4 : !pdl.value)
    pdl.rewrite %C {
      pdl.replace %C with %B
      ^^^^^^^^^^^---------------------------------------------------------
      | pdl.Operation used for replacement is not part of the matched DAG!
      --------------------------------------------------------------------
    }
  }
}
  1. A weird one: This one should obviously fail MLIR with wrong number of replacement values (or so) but does not.
    It looks to me like we only generate exactly one MLIR input that does not fail the rewrite.
    390260112:
builtin.module {
  pdl.pattern : benefit(1) {
    %0 = pdl.type : i32
    %1 = pdl.operation "pdltest.matchop" -> (%0 : !pdl.type)
    %2 = pdl.result 0 of %1
    %3 = pdl.operation "pdltest.matchop" (%2 : !pdl.value)
    pdl.rewrite %3 {
      pdl.replace %3 with %1
      ^^^^^^^^^^^-------------------------
      | wrong number of replacement values
      ------------------------------------
    }
  }
}

@math-fehr What are your thoughts? Do you think we can generate counter examples for these?

@math-fehr
Copy link
Contributor

I'll take a look today and see what I can generate! I thought I generated a use for the root op, we'll see if I made a mistake or if it's something else!

@math-fehr
Copy link
Contributor

So just after looking at it, for problem 1, I think it's a bug in the analysis part.
All these examples have roots that have no results, so it's actually valid?

@math-fehr
Copy link
Contributor

For problem 3, my feeling is that this is actually MLIR that has a bug somewhere.
It generates one example, that gets rewritten correctly. My feeling is that if the number of results that are replacing the operation is greater, then MLIR just "forgets" about the last ones.
In the example you sent for instance, we replace an operation with no results with an operation with a single result. So it just deletes the operation.

@math-fehr
Copy link
Contributor

And for problem 2, I think I see the problem.
The issue is that I always thought that PDL patterns were rooted DAGS, but it seems that this is not the case?
At least on your example, there is a root that is matched, and then one other unrelated operation is used for the replacement.
I'll change how I generate examples tomorrow, because the problem is here that we need to use basic blocks again to generate the correct counter example!

@martin-luecke
Copy link
Collaborator Author

martin-luecke commented Jan 17, 2024

So just after looking at it, for problem 1, I think it's a bug in the analysis part. All these examples have roots that have no results, so it's actually valid?

Indeed, I missed that 👍 . I fixed this with my latest commit
This means errors of categories 1 and 2 above should be fixed

@math-fehr
Copy link
Contributor

Wait, categories 2 as well? My feeling is that this is an MLIR bug right? Or is it a misunderstanding in my part that these PDL rewrites should be allowed?

@martin-luecke
Copy link
Collaborator Author

Sorry, too many numbers flying around. I meant category 3.

@math-fehr
Copy link
Contributor

To be fair, I feel that we should keep the analysis you had in category 3.
While MLIR accepts it, I believe this should be an error.

@martin-luecke
Copy link
Collaborator Author

Yes, I agree. I should put in a variable to disable it.
My intention is to collect all bugs we find and submit patches for them eventually.

@math-fehr
Copy link
Contributor

It took longer than expected, but now I have an algorithm for category 2 that should generate all possible matches.
It's actually quite a funny algorithm, but it took longer than expected finding it.

However, in the 4 examples you sent, actually none of them are fixed by it.
This is because the first 2 examples are actually replacement of 0 results, and the other 2 examples seems to be unrelated problems?

Here is an example of program that we can now generate a counter-example correctly:

builtin.module {
  pdl.pattern : benefit(1) {
    %0 = pdl.type : i32
    %1 = pdl.operand : %0
    %2 = pdl.type : i32
    %3 = pdl.operand : %2
    %4 = pdl.type : i32
    %5 = pdl.type : i32
    %6 = pdl.operation "pdltest.matchop" (%1, %3 : !pdl.value, !pdl.value) -> (%4, %5 : !pdl.type, !pdl.type)
    %7 = pdl.result 0 of %6
    %8 = pdl.result 1 of %6
    %9 = pdl.type : i32
    %10 = pdl.operand : %9
    %11 = pdl.type : i32
    %12 = pdl.type : i32
    %13 = pdl.operation "pdltest.matchop" (%7, %10 : !pdl.value, !pdl.value) -> (%11, %12 : !pdl.type, !pdl.type)
    %14 = pdl.result 0 of %13
    %15 = pdl.result 1 of %13
    %16 = pdl.operation "pdltest.matchop" (%8 : !pdl.value) -> (%11 : !pdl.type)
    pdl.rewrite %16 {
      pdl.replace %16 with %13
    }
  }
}

@math-fehr
Copy link
Contributor

62163759 -> seems that your error is that we erase an operation not part of the matching DAG. I believe that an MLIR verifier is missing here?
512831000 -> Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants