Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

22 Crash Recovery Bugs Found in PMDK Ports #1

Open
iangneal opened this issue Jan 31, 2022 · 1 comment
Open

22 Crash Recovery Bugs Found in PMDK Ports #1

iangneal opened this issue Jan 31, 2022 · 1 comment

Comments

@iangneal
Copy link

Summary

We found 22 crash-consistency bugs introduced in WITCHER's PMDK adaptation layer added during the process of porting data structures to us a PM allocator. These bugs are all related to the initialization phase of the data structures (i.e., during PMDK pool creation and allocation of the root/parent structures of the key-value data structures).

Since these bugs are specific to WITCHER's port, I'm filing these bugs here rather than with the maintainers of the ported applications/data structures.

NULL dereferences

Most of these bugs are related to the misuse of the PMDK API. In the common function init_nvmm_pool: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L5-L30

If a crash occurs during pmemobj_create: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L15

Then calls to pmemobj_open in the post-crash execution of the application will return NULL. This case is handled within init_nvmm_pool by returning NULL: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L21-L24

However, all of the benchmarks that use init_nvmm_pool do not check for the case when init_nvmm_pool returns NULL. Ergo, these benchmarks will immediately segfault when trying to . For example, in WOART: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L785-L789

If root_obj is NULL, a segfault will occur on access to root_obj->woat_ptr: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788

Non-atomic initialization/lack of NULL checks

Many of the benchmarks that use init_nvmm_pool do not check for the crash when a crash occurs between pool allocation and the initial allocation of the object. Ergo, they will immediately crash upon their first read of the data structure, as the root pointer is not allocated.

For example, during WOART initialization: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L780-L797

If there is a crash after init_nvmm_pool and root_obj->woart_ptr = tree;, then root_obj will not be NULL but root_obj->woart_ptr will still be NULL, and init_woart will return NULL on restart: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788

This will cause NULL to be passed to the art_{search,insert} functions, i.e.: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/WORT/woart/main/main.c#L34

Since the art_* functions assume they will not be passed a NULL pointer: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L293-L295

This will ultimately result in a segfault on recovery.

Specific Bugs by Location/Benchmark

Here's a list of all the bugs by their source code locations. All of these bugs fall into the previously-outlined groups, so the per-bug explanations are left fairly terse.

WOART

Bug 1:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L785-L788

  • No NULL check on init_nvmm_pool return value.

Bug 2:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L794

WORT

Bug 3:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/wort/wort.c#L103-L107

  • No NULL check on init_nvmm_pool return value.

Bug 4:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/wort/wort.c#L113

  • (Similar to Bug 2) Non-atomic allocation of art_tree can result in init_wort returning NULL.

FAST & FAIR

These bugs are both in the single port and the concurrent port.

Bug 5:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/single/src/btree.h#L1039-L1043

  • No NULL check on init_nvmm_pool return value.

Bug 6:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/single/src/btree.h#L1044-L1046

  • init_FastFair can return NULL, leading to subsequent segfault.

Bug 7:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/concurrent/src/btree.h#L1174-L1180

  • No NULL check on init_nvmm_pool return value.

Bug 8:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/concurrent/src/btree.h#L1182-L1183

  • init_FastFair can return NULL, leading to subsequent segfault.

Level Hashing

Bug 9:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L66-L69

  • No NULL check on init_nvmm_pool return value.

Bug 10:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L69
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L121

  • level_init can return NULL, leading to subsequent segfault.

CCEH

Bug 11:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/CCEH/src/CCEH_MSB.cpp#L525-L528

  • No NULL check on init_nvmm_pool return value.

Bug 12:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/CCEH/src/CCEH_MSB.cpp#L528-L529

  • No NULL check on root_obj->cceh_ptr, leading to segfault on ret->Recovery().

P-ART

Bug 13:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-ART/Tree.cpp#L70-L73

  • No NULL check on init_nvmm_pool return value.

Bug 14:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-ART/Tree.cpp#L72-L74

  • No NULL check on root_obj->p_art_ptr, leading to segfault on tree->recover().

P-BwTree

Bug 15:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-BwTree/main/main.cpp#L71-L74

  • No NULL check on init_nvmm_pool return value.

Bug 16:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-BwTree/main/main.cpp#L74-L75

  • No NULL check on root_obj->p_bwtree_ptr, leading to segfault on tree->re_init().

P-CLHT

Bug 17:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-CLHT/src/clht_lb_res.c#L1060-L1063

  • No NULL check on init_nvmm_pool return value.

Bug 18:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-CLHT/src/clht_lb_res.c#L1061-L1065

  • No NULL check on root_obj->clht_ptr, leading to segfault on clht_lock_initialization(hashtable).

P-HOT

Bug 19:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-HOT/main/main.cpp#L133-L136

  • No NULL check on init_nvmm_pool return value.

Bug 20:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-HOT/main/main.cpp#L134-L138

  • No NULL check on root_obj->p_hot_ptr, leading to segfault on tree->recovery().

P-Masstree

Bug 21:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-Masstree/masstree.h#L2358-L2361

  • No NULL check on init_nvmm_pool return value.

Bug 22:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-Masstree/masstree.h#L2359-L2364

  • No NULL check on root_obj->p_mt_ptr, leading to segfault on tree->recovery().
@iangneal
Copy link
Author

FYI for anyone who stumbles upon this repo: I confirmed with the authors via email that they are aware of these bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant