This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

merge presentation 2
connortsui20 committed May 2, 2024
2 parents ed7d8ae + 1840c21 commit 13bc2dc
Showing 6 changed files with 367 additions and 1 deletion.
1 change: 0 additions & 1 deletion async-bpm
Submodule async-bpm deleted from 5d3afa
Binary file added proposal/images/bpm_design.png
Binary file added proposal/images/linux_io.png
Binary file added proposal/images/traditional_bpm.png
188 changes: 188 additions & 0 deletions proposal/presentation2.html

Large diffs are not rendered by default.

179 changes: 179 additions & 0 deletions proposal/presentation2.md
@@ -0,0 +1,179 @@
---
marp: true
theme: default
paginate: true
---

# Eggstrain and Beyond

## **Authors: Connor, Kyle, Sarvesh**

---

# Eggstrain - Demo on TPC-H Query 1

Our proof of concept is a demo on TPC-H Query 1 using the `eggstrain` execution engine.

The main contribution is our asynchronous framework utilizing tokio and rayon threads.

---

# The End of Eggstrain

With this working demo, we have almost completed our initial goal of creating an execution engine with the operators we have previously discussed.

- More work is needed to make it robust and feature-rich
- It is a foundation for a powerful and efficient async execution engine
- It needs a new asynchronous buffer pool manager (BPM) to support the engine

---

# Recap: Buffer Pool Manager

A buffer pool manager synchronizes data between volatile memory and persistent storage.

- In charge of bringing data from storage into memory in the form of pages
- In charge of synchronizing reads and writes to the memory-local page data
- In charge of writing data back out to disk so it is synchronized

---

# Traditional Buffer Pool Manager

![bg right:50% 100%](images/traditional_bpm.png)

Traditional BPMs will use a global hash table that maps page IDs to memory frames.

- Source: _LeanStore: In-Memory Data Management Beyond Main Memory (2018)_
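A minimal sketch of such a page table, assuming a single `Mutex` around a `HashMap`. The names `PageTable`, `PageId`, and `FrameId` are hypothetical and only illustrate the idea; they are not taken from any particular system.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical identifiers used only for this sketch.
pub type PageId = u64;
pub type FrameId = usize;

/// A traditional page table: one global map guarded by one lock.
pub struct PageTable {
    map: Mutex<HashMap<PageId, FrameId>>,
}

impl PageTable {
    pub fn new() -> Self {
        Self { map: Mutex::new(HashMap::new()) }
    }

    /// Returns the frame holding `page`, if the page is resident in memory.
    pub fn lookup(&self, page: PageId) -> Option<FrameId> {
        self.map.lock().unwrap().get(&page).copied()
    }

    /// Records that `page` now lives in `frame`; returns the previous frame, if any.
    pub fn install(&self, page: PageId, frame: FrameId) -> Option<FrameId> {
        self.map.lock().unwrap().insert(page, frame)
    }

    /// Removes the mapping when a page is evicted from memory.
    pub fn evict(&self, page: PageId) -> Option<FrameId> {
        self.map.lock().unwrap().remove(&page)
    }
}
```

Every lookup, install, and eviction takes the same lock, so with many worker threads this map becomes a point of global contention.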

---

# Recap: Blocking I/O

Additionally, traditional buffer pool managers will use blocking reads and writes to send data between memory and persistent storage.

Blocking I/O is heavily reliant on the Operating System.

> The DBMS can almost always manage memory better than the OS
- Source: 15-445 Lecture 6 on Buffer Pools

---

# Recap: I/O System Calls

What happens when we issue a `pread()` or `pwrite()` call?

- We stop what we're doing
- We transfer control to the kernel
- _We are blocked waiting for the kernel to finish and transfer control back_
- _A read from disk is *probably* scheduled somewhere_
- _Something gets copied into the kernel_
- _The kernel copies that something into userspace_
- We come back and resume execution
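The steps above can be seen in a small Rust sketch. It assumes a Unix platform, where the standard library's `FileExt` positional reads and writes are implemented with `pread()` and `pwrite()`. `PAGE_SIZE`, `read_page`, and `write_page` are hypothetical names chosen for illustration.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Assumed page size for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Reads page `page_id` from `file` with a blocking positional read.
pub fn read_page(file: &File, page_id: u64) -> io::Result<[u8; PAGE_SIZE]> {
    let mut buf = [0u8; PAGE_SIZE];
    // The calling thread is blocked here until the kernel has copied
    // a full page into `buf`.
    file.read_exact_at(&mut buf, page_id * PAGE_SIZE as u64)?;
    Ok(buf)
}

/// Writes `data` as page `page_id` with a blocking positional write.
pub fn write_page(file: &File, page_id: u64, data: &[u8; PAGE_SIZE]) -> io::Result<()> {
    // The thread is blocked again until the kernel accepts the whole buffer.
    file.write_all_at(data, page_id * PAGE_SIZE as u64)
}
```

While either call is in flight, the thread that issued it cannot do any other work.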

---

# Blocking I/O for Buffer Pool Managers

Blocking I/O is fine for most situations, but might be a bottleneck for a DBMS's Buffer Pool Manager.

- Typically, optimizations are implemented to offset the cost of blocking:
    - Pre-fetching
    - Scan-sharing
    - Background writing
    - `O_DIRECT`

---

# Non-blocking I/O

What if we could do I/O _without_ blocking? There exist a few ways to do this:

- `libaio`
- `io_uring`
- SPDK
- All of these allow for _asynchronous I/O_

---

# `io_uring`

![bg right:50% 90%](images/linux_io.png)

This Buffer Pool Manager is going to be built with asynchronous I/O using `io_uring`.

- Source: _What Modern NVMe Storage Can Do, And How To Exploit It... (2023)_

---

# Asynchronous I/O

Asynchronous I/O really only works when the programs running on top of it implement _cooperative multitasking_.

- Normally, the kernel decides which thread gets to run (preemptive multitasking)
- Cooperative multitasking allows the program to decide who gets to run
- Context switching between tasks is a lightweight maneuver
- If one task is waiting for I/O, we can cheaply switch to a different task!
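To make the idea concrete, here is a toy single-threaded scheduler written with only the Rust standard library. It is a sketch under stated assumptions, not how `tokio` works internally: `YieldOnce`, `run_all`, and `demo` are hypothetical names, and the waker does nothing because the loop simply re-polls every pending task. A real runtime would wake tasks when their I/O completes.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that yields to the scheduler exactly once,
/// standing in for "this task is waiting on I/O".
pub struct YieldOnce(pub bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy scheduler re-polls every pending task.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

pub type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Round-robin scheduler: polls each task in turn until all have finished.
pub fn run_all(tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            // The task gave up control voluntarily; move on to the next one.
            queue.push_back(task);
        }
    }
}

/// Runs two tasks that each "wait for I/O" once and returns the event order.
pub fn demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let make = |start: &'static str, end: &'static str| -> Task {
        let log = Rc::clone(&log);
        Box::pin(async move {
            log.borrow_mut().push(start);
            YieldOnce(false).await; // pretend an I/O request is in flight
            log.borrow_mut().push(end);
        })
    };
    run_all(vec![make("A: start", "A: done"), make("B: start", "B: done")]);
    let events = log.borrow().clone();
    events
}
```

Calling `demo()` shows task B starting before task A finishes: switching tasks is just another `poll` call on the same thread, with no kernel involvement.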

---

# Eggstrain

The key thing here is that our Execution Engine `eggstrain` fully embraces asynchronous execution.

- Rust has first-class support for asynchronous programs
- Using `async` libraries is almost as simple as plug-and-play
- The `tokio` crate is an easy runtime to get set up
- We can easily create a buffer pool manager in the form of a Rust library crate

---

# Goals

The goal of this system is to _fully exploit parallelism_.

- NVMe drives have gotten really, really fast
- Blocking I/O simply cannot match the full throughput of an NVMe drive
- NVMe drives are _completely_ bottlenecked by today's software
- If we can fully exploit parallelism in software _and_ hardware, we can get close to matching the speed of in-memory systems, while using persistent storage

---

# Proposed Design

The next slide has a proposed design for a fully asynchronous buffer pool manager. The full (somewhat incomplete) writeup can be found [here](https://github.com/Connortsui20/async-bpm).

- Heavily inspired by LeanStore
- Eliminates the global page table and uses tagged pointers to data
- Even more inspired by this paper:
    - _What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines (2023)_
    - Gabriel Haas and Viktor Leis
- The goal is to _eliminate as many sources of global contention as possible_
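As an illustration of the tagged-pointer idea, here is a minimal sketch of a LeanStore-style "swip": a single 64-bit word that refers either to a page on storage or to a frame in memory. The encoding (lowest bit as the tag) and the names `Swip` and `Target` are assumptions made for this sketch, not the layout used in the proposed design.

```rust
/// A 64-bit reference that is either a page ID on storage
/// or the index of an in-memory frame.
/// Assumed encoding: the lowest bit is the tag, the upper 63 bits the value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Swip(u64);

/// What a `Swip` currently refers to.
#[derive(Debug, PartialEq, Eq)]
pub enum Target {
    /// Index of the frame that holds the page in memory.
    InMemory(usize),
    /// ID of the page on persistent storage.
    OnDisk(u64),
}

const TAG_IN_MEMORY: u64 = 1;

impl Swip {
    /// Creates a reference to a page that is only on storage.
    pub fn on_disk(page_id: u64) -> Self {
        debug_assert!(page_id < (1u64 << 63));
        Swip(page_id << 1)
    }

    /// Creates a reference to a page that is resident in `frame`.
    pub fn in_memory(frame: usize) -> Self {
        Swip(((frame as u64) << 1) | TAG_IN_MEMORY)
    }

    /// Decodes the tag bit and returns what this reference points to.
    pub fn target(self) -> Target {
        if self.0 & TAG_IN_MEMORY == TAG_IN_MEMORY {
            Target::InMemory((self.0 >> 1) as usize)
        } else {
            Target::OnDisk(self.0 >> 1)
        }
    }
}
```

Because the reference itself says where the page lives, a reader can check residency by decoding one word instead of consulting a shared hash table.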

---

![bg 100%](images/bpm_design.png)

---

# Some Issues

- There is a scheduler-per-thread, but no scheduler assigning tasks to specific workers (so it does not work with a multithreaded asynchronous runtime)
- The proposed design does not have a backend stage to fully synchronize I/O
- Eviction is done naively by a single worker thread
    - Deadlocks!!!
    - Will probably switch to polling a list of free frames that gets populated by foreground tasks

---

# Future Work

- This will definitely not be done by the end of this semester
- Which means our execution engine is also not going to be "complete" in that we will not support spill-to-disk
- Our contribution is the beginning of an implementation of an asynchronous Buffer Pool Manager in Rust
    - It can theoretically be plugged into any asynchronous execution engine, such as `eggstrain` or even DataFusion

---

# **Thank you!**
