GHC internal error reported in pate container #414

Closed
thebendavis opened this issue Jun 21, 2024 · 5 comments

@thebendavis

thebendavis commented Jun 21, 2024

I'm seeing what looks like PATE triggering a GHC 9.6.2 bug in our current CI-built Docker image, running on my M1 Mac.

I pulled the Docker image that CI built from the current master and ran it on the scada.exe example:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-heads-master -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet 

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
.........
0: Function Entry "parse_packet" (segment1+0x554) (User Request).......
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)...assertion failed [node != nullptr]: There is a hole in the VmTracker tree at address 0xfffffffffffffffa
(VMAllocationTracker.cpp:137 vm_allocation_info_for_address)
 /home/src/pate.sh: line 26:    35 Trace/breakpoint trap   cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

If I re-run the above, sometimes I get other errors, e.g.

ghc-9.6.2: internal error: allocGroup: free list corrupted
    (GHC version 9.6.2 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
thebendavis changed the title from "Encountering GHC 9.6.2 bug" to "Encountering GHC bug" on Jun 21, 2024
@thebendavis

thebendavis commented Jun 21, 2024

In #413 I tried building the Docker container in CI with GHC 9.6.5, hoping that bug fixes in the newer release would address the issue described above. Unfortunately, I still hit errors with the GHC 9.6.5-built PATE:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-pull-413-merge -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
........
0: Function Entry "parse_packet" (segment1+0x554) (User Request)......
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)......
2: segment1+0x5ac [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
3: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)......
4: segment1+0x5cc [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)......
5: segment1+0x5e0 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)......
6: segment1+0x600 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
7: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)......
8: segment1+0x624 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..ghc-9.6.5: internal error: evacuate: strange closure type -795295279
    (GHC version 9.6.5 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
/home/src/pate.sh: line 26:    35 Aborted                 cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

If I try a few times I'll see slightly different errors, e.g.

ghc-9.6.5: internal error: scavenge: unimplemented/strange closure type 240533401 @ 0x428b4aa928
    (GHC version 9.6.5 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

but I've also seen it segfault with a message like:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-pull-413-merge -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
......../home/src/pate.sh: line 26:    35 Segmentation fault      cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

thebendavis changed the title from "Encountering GHC bug" to "Encountering GHC bug in pate CI-built container" on Jun 21, 2024
@thebendavis

I tried building and running pate locally on my host (not using the Docker container), and ./pate.sh processed this target just fine. So I'm wondering whether something is going wrong specifically with our Docker container.

@thebendavis

thebendavis commented Jun 21, 2024

On a separate Linux VM running on x86_64 hardware, I have

  1. built my own Docker container and observed pate working on this target
  2. pulled the same master CI-built container and observed pate working on this target

So I now suspect the issue has something to do with running an amd64 container on my M1 (arm64) Mac. On the M1 Mac, I tried disabling the "Use Rosetta for x86_64/amd64 emulation on Apple Silicon" option and rerunning. With that option off, pate runs much slower (expected) and is eventually Killed before it can process much of the target:

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
..........................................................................................................................................................................................................................................
0: Function Entry "parse_packet" (segment1+0x554) (User Request)..........................................................................................................................................................................................................................................
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains).................................................................................................................................................
/home/src/pate.sh: line 26:    59 Killed                  cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}
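
For anyone checking whether they are in the same situation (an amd64 image on an arm64 host), here is a minimal sketch of the comparison. The function names `normalize_arch` and `is_emulated` are hypothetical helpers, not part of pate; the only external facts assumed are that `uname -m` reports the host machine type and that `docker image inspect` exposes an `Architecture` field.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether an image would run under emulation.

# Map the different spellings of the same architecture to one name.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    arm64|aarch64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# Succeeds (exit 0) when host and image architectures differ,
# i.e. when the container cannot run natively on this host.
is_emulated() {
  [ "$(normalize_arch "$1")" != "$(normalize_arch "$2")" ]
}
```

Usage might look like `is_emulated "$(uname -m)" "$(docker image inspect --format '{{.Architecture}}' IMAGE)" && echo "running under emulation"`, where `IMAGE` is whichever tag was pulled. On the x86_64 Linux VM above this would report a native run; on the M1 Mac it would report emulation.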

@thebendavis

Another test: on my M1 Mac, I disabled "Use Virtualization framework" and all related options ("VirtioFS" and "Use Rosetta") in Docker Desktop for Mac, then re-ran the CI-built master Docker container from above.

It is eventually also killed, but makes it much further first:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-heads-master -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet

Up to date
..
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)

?>
?>
?>wait

0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
..........................................................................................................................................................................................................................................
0: Function Entry "parse_packet" (segment1+0x554) (User Request)...........................................................................................................................................................................................................................................
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)........................................................................................................................................................................
2: segment1+0x5ac [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains).....................
3: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)..........................................................................................................................................................
4: segment1+0x5cc [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)............................................................................................................................
5: segment1+0x5e0 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..................................................................................................................................
6: segment1+0x600 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)................
7: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)..............................................................................................................................
8: segment1+0x624 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..........................................................................................
Handle observable difference:
0: Emit warning and continue 
1: Assert difference is infeasible (defer proof) 
2: Assert difference is infeasible (prove immediately) 
3: Assume difference is infeasible 
4: Avoid difference with equivalence condition 
?>4

0: Function Entry "parse_packet" (segment1+0x554) (User Request)
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
2: segment1+0x5ac [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
3: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)
4: segment1+0x5cc [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
5: segment1+0x5e0 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
6: segment1+0x600 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
7: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)
8: segment1+0x624 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..
9: segment1+0x644 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains).......................
/home/src/pate.sh: line 26:    59 Killed                  cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}
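
The runs in this thread end in four different ways: "Trace/breakpoint trap", "Aborted", "Segmentation fault", and "Killed". When scripting repeated runs, it can help to turn the exit status back into a signal name. The sketch below uses a hypothetical helper, `signal_name`, and relies on the shell convention that a process terminated by signal N is reported with status 128+N. Whether `pate.sh` and `docker run` propagate that status unchanged is an assumption this thread does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: translate an exit status into the terminating signal.
signal_name() {
  status=$1
  if [ "$status" -gt 128 ]; then
    # kill -l with a signal number prints the name without the SIG prefix.
    kill -l "$((status - 128))"
  else
    # Statuses up to 128 are ordinary exits, not signal deaths.
    echo "exit $status"
  fi
}
```

For example, after a run, `signal_name $?` would print `SEGV` for a segmentation fault or `KILL` for a process that was killed. A `KILL` alone does not say who sent the signal; running out of memory inside the Docker VM is one possible cause, but the logs above do not establish it.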

@thebendavis

OK, at this point this looks like a Docker Desktop + arm64 Mac virtualization issue rather than anything wrong with our container(s).

There are other recent segfault issues posted in the docker/for-mac issue tracker. A response posted today suggests trying an internal build of Docker Desktop for Mac, a pre-release of Docker Desktop 4.32.0. With that version, our container works again. So this appears to be a problem in the current stable release of Docker Desktop, and the fix should arrive in an upcoming release.

thebendavis changed the title from "Encountering GHC bug in pate CI-built container" to "Encountering GHC bug in pate container" on Jun 21, 2024
thebendavis changed the title from "Encountering GHC bug in pate container" to "GHC internal error reported in pate container" on Jun 21, 2024