Commit

Merge pull request #557 from rapidsai/branch-24.12
Forward-merge branch-24.12 into branch-25.02
GPUtester authored Nov 20, 2024
2 parents db606fe + 5ff93bc commit 8d19a29
Showing 18 changed files with 414 additions and 104 deletions.
11 changes: 8 additions & 3 deletions cpp/doxygen/main_page.md
@@ -76,14 +76,19 @@ Then run the example:
## Runtime Settings

#### Compatibility Mode (KVIKIO_COMPAT_MODE)
When KvikIO is running in compatibility mode, it doesn't load `libcufile.so`. Instead, reads and writes are done using POSIX. Notice, this is not the same as the compatibility mode in cuFile. That is cuFile can run in compatibility mode while KvikIO is not.
When KvikIO is running in compatibility mode, it doesn't load `libcufile.so`. Instead, reads and writes are done using POSIX. Note that this is not the same as the compatibility mode in cuFile. It is possible that KvikIO performs I/O in the non-compatibility mode by using the cuFile library, but the cuFile library itself is configured to operate in its own compatibility mode. For more details, refer to [cuFile compatibility mode](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html#cufile-compatibility-mode) and [cuFile environment variables](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#environment-variables).

Set the environment variable `KVIKIO_COMPAT_MODE` to enable/disable compatibility mode. By default, compatibility mode is enabled:
The environment variable `KVIKIO_COMPAT_MODE` has three options (case-insensitive):
- `ON` (aliases: `TRUE`, `YES`, `1`): Enable the compatibility mode.
- `OFF` (aliases: `FALSE`, `NO`, `0`): Disable the compatibility mode, and enforce cuFile I/O. GDS will be activated if the system requirements for cuFile are met and cuFile is properly configured. However, if the system is not suited for cuFile, I/O operations under the `OFF` option may error out, crash or hang.
- `AUTO`: Try cuFile I/O first, and fall back to POSIX I/O if the system requirements for cuFile are not met.

Under `AUTO`, KvikIO falls back to the compatibility mode:
- when `libcufile.so` cannot be found.
- when running in Windows Subsystem for Linux (WSL).
- when `/run/udev` isn't readable, which typically happens when running inside a docker image not launched with `--volume /run/udev:/run/udev:ro`.

This setting can also be controlled by `defaults::compat_mode()` and `defaults::compat_mode_reset()`.
This setting can also be programmatically controlled by `defaults::set_compat_mode()` and `defaults::compat_mode_reset()`.
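
The three `AUTO` fallback conditions listed above combine as a logical OR: any one of them is enough to select POSIX I/O. A minimal sketch of that decision, assuming the three checks have already been performed elsewhere. `SystemProbe` and `auto_falls_back_to_posix` are hypothetical names used for illustration only and are not part of KvikIO:

```cpp
#include <cassert>

// Hypothetical container for the results of the three system checks.
// How each check is performed (e.g. loading the library, inspecting the
// kernel, testing directory permissions) is deliberately left out.
struct SystemProbe {
  bool cufile_library_found;  // could `libcufile.so` be located?
  bool running_in_wsl;        // is the process running under WSL?
  bool udev_readable;         // is `/run/udev` readable?
};

// Returns true when the AUTO setting should fall back to POSIX I/O.
bool auto_falls_back_to_posix(const SystemProbe& probe)
{
  return !probe.cufile_library_found || probe.running_in_wsl || !probe.udev_readable;
}

// A few illustrative cases.
void check_examples()
{
  assert(!auto_falls_back_to_posix({true, false, true}));  // all requirements met
  assert(auto_falls_back_to_posix({false, false, true}));  // library missing
  assert(auto_falls_back_to_posix({true, true, true}));    // WSL
  assert(auto_falls_back_to_posix({true, false, false}));  // /run/udev unreadable
}
```

Keeping the decision separate from the probing makes it easy to test without a GPU or a particular container setup.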


#### Thread Pool (KVIKIO_NTHREADS)
4 changes: 2 additions & 2 deletions cpp/examples/basic_io.cpp
@@ -65,7 +65,7 @@ int main()
check(cudaSetDevice(0) == cudaSuccess);

cout << "KvikIO defaults: " << endl;
if (kvikio::defaults::compat_mode()) {
if (kvikio::defaults::is_compat_mode_preferred()) {
cout << " Compatibility mode: enabled" << endl;
} else {
kvikio::DriverInitializer manual_init_driver;
@@ -181,7 +181,7 @@ int main()
cout << "Parallel POSIX read (" << kvikio::defaults::thread_pool_nthreads()
<< " threads): " << read << endl;
}
if (kvikio::is_batch_and_stream_available() && !kvikio::defaults::compat_mode()) {
if (kvikio::is_batch_and_stream_available() && !kvikio::defaults::is_compat_mode_preferred()) {
std::cout << std::endl;
Timer timer;
// Here we use the batch API to read "/tmp/test-file" into `b_dev` by
2 changes: 1 addition & 1 deletion cpp/examples/basic_no_cuda.cpp
@@ -41,7 +41,7 @@ constexpr int LARGE_SIZE = 8 * SIZE; // LARGE SIZE to test partial s
int main()
{
cout << "KvikIO defaults: " << endl;
if (kvikio::defaults::compat_mode()) {
if (kvikio::defaults::is_compat_mode_preferred()) {
cout << " Compatibility mode: enabled" << endl;
} else {
kvikio::DriverInitializer manual_init_driver;
4 changes: 2 additions & 2 deletions cpp/include/kvikio/batch.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
* Copyright (c) 2023-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -118,7 +118,7 @@ class BatchHandle {
std::vector<CUfileIOParams_t> io_batch_params;
io_batch_params.reserve(operations.size());
for (const auto& op : operations) {
if (op.file_handle.is_compat_mode_on()) {
if (op.file_handle.is_compat_mode_preferred()) {
throw CUfileException("Cannot submit a FileHandle opened in compatibility mode");
}

4 changes: 2 additions & 2 deletions cpp/include/kvikio/buffer.hpp
@@ -49,7 +49,7 @@ inline void buffer_register(const void* devPtr_base,
int flags = 0,
const std::vector<int>& errors_to_ignore = std::vector<int>())
{
if (defaults::compat_mode()) { return; }
if (defaults::is_compat_mode_preferred()) { return; }
CUfileError_t status = cuFileAPI::instance().BufRegister(devPtr_base, size, flags);
if (status.err != CU_FILE_SUCCESS) {
// Check if `status.err` is in `errors_to_ignore`
@@ -67,7 +67,7 @@ inline void buffer_register(const void* devPtr_base,
*/
inline void buffer_deregister(const void* devPtr_base)
{
if (defaults::compat_mode()) { return; }
if (defaults::is_compat_mode_preferred()) { return; }
CUFILE_TRY(cuFileAPI::instance().BufDeregister(devPtr_base));
}

138 changes: 122 additions & 16 deletions cpp/include/kvikio/defaults.hpp
@@ -13,6 +13,11 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/**
* @file
*/

#pragma once

#include <algorithm>
@@ -27,7 +32,48 @@
#include <kvikio/shim/cufile.hpp>

namespace kvikio {
/**
* @brief I/O compatibility mode.
*/
enum class CompatMode : uint8_t {
OFF, ///< Enforce cuFile I/O. GDS will be activated if the system requirements for cuFile are met
///< and cuFile is properly configured. However, if the system is not suited for cuFile, I/O
///< operations under the OFF option may error out, crash or hang.
ON, ///< Enforce POSIX I/O.
AUTO, ///< Try cuFile I/O first, and fall back to POSIX I/O if the system requirements for cuFile
///< are not met.
};

namespace detail {
/**
* @brief Parse a string into a CompatMode enum.
*
* @param compat_mode_str Compatibility mode in string format (case-insensitive). Valid values
* include:
* - `ON` (alias: `TRUE`, `YES`, `1`)
* - `OFF` (alias: `FALSE`, `NO`, `0`)
* - `AUTO`
* @return A CompatMode enum.
*/
inline CompatMode parse_compat_mode_str(std::string_view compat_mode_str)
{
// Convert to lowercase
std::string tmp{compat_mode_str};
std::transform(
tmp.begin(), tmp.end(), tmp.begin(), [](unsigned char c) { return std::tolower(c); });

CompatMode res{};
if (tmp == "on" || tmp == "true" || tmp == "yes" || tmp == "1") {
res = CompatMode::ON;
} else if (tmp == "off" || tmp == "false" || tmp == "no" || tmp == "0") {
res = CompatMode::OFF;
} else if (tmp == "auto") {
res = CompatMode::AUTO;
} else {
throw std::invalid_argument("Unknown compatibility mode: " + std::string{tmp});
}
return res;
}

template <typename T>
T getenv_or(std::string_view env_var_name, T default_val)
@@ -77,16 +123,24 @@ inline bool getenv_or(std::string_view env_var_name, bool default_val)
std::string{env_val});
}

template <>
inline CompatMode getenv_or(std::string_view env_var_name, CompatMode default_val)
{
auto* env_val = std::getenv(env_var_name.data());
if (env_val == nullptr) { return default_val; }
return parse_compat_mode_str(env_val);
}

} // namespace detail
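
The parsing and environment lookup added above can be tried outside KvikIO. The following is a standalone sketch that mirrors the logic in this diff without including any KvikIO header. `Mode`, `parse_mode`, `mode_from_env` and the variable name `DEMO_COMPAT_MODE` are stand-ins chosen for this illustration, not KvikIO identifiers:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <string_view>

// Stand-in for kvikio::CompatMode.
enum class Mode { OFF, ON, AUTO };

// Case-insensitive parse with the same aliases as in the diff.
Mode parse_mode(std::string_view text)
{
  std::string lowered{text};
  std::transform(lowered.begin(), lowered.end(), lowered.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  if (lowered == "on" || lowered == "true" || lowered == "yes" || lowered == "1") {
    return Mode::ON;
  }
  if (lowered == "off" || lowered == "false" || lowered == "no" || lowered == "0") {
    return Mode::OFF;
  }
  if (lowered == "auto") { return Mode::AUTO; }
  throw std::invalid_argument("Unknown compatibility mode: " + lowered);
}

// Return `fallback` when the variable is unset; otherwise parse its value.
// `name` must be a null-terminated string because std::getenv requires one.
Mode mode_from_env(const char* name, Mode fallback)
{
  const char* value = std::getenv(name);
  return value == nullptr ? fallback : parse_mode(value);
}

// A few illustrative cases for the parser.
void check_examples()
{
  assert(parse_mode("YES") == Mode::ON);
  assert(parse_mode("false") == Mode::OFF);
  assert(parse_mode("Auto") == Mode::AUTO);
}
```

An unset variable yields the fallback, while a set but unrecognized value throws instead of silently choosing a mode.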

/**
* @brief Singleton class of default values used thoughtout KvikIO.
* @brief Singleton class of default values used throughout KvikIO.
*
*/
class defaults {
private:
BS::thread_pool _thread_pool{get_num_threads_from_env()};
bool _compat_mode;
CompatMode _compat_mode;
std::size_t _task_size;
std::size_t _gds_threshold;
std::size_t _bounce_buffer_size;
@@ -104,13 +158,7 @@ class defaults {
{
// Determine the default value of `compat_mode`
{
if (std::getenv("KVIKIO_COMPAT_MODE") != nullptr) {
// Setting `KVIKIO_COMPAT_MODE` take precedence
_compat_mode = detail::getenv_or("KVIKIO_COMPAT_MODE", false);
} else {
// If `KVIKIO_COMPAT_MODE` isn't set, we infer based on runtime environment
_compat_mode = !is_cufile_available();
}
_compat_mode = detail::getenv_or("KVIKIO_COMPAT_MODE", CompatMode::AUTO);
}
// Determine the default value of `task_size`
{
@@ -163,19 +211,77 @@ class defaults {
* - when `/run/udev` isn't readable, which typically happens when running inside a docker
* image not launched with `--volume /run/udev:/run/udev:ro`
*
* @return The boolean answer
* @return Compatibility mode.
*/
[[nodiscard]] static CompatMode compat_mode() { return instance()->_compat_mode; }

/**
* @brief Reset the value of `kvikio::defaults::compat_mode()`.
*
* Changing the compatibility mode affects all the new FileHandles whose `compat_mode` argument is
* not explicitly set, but it never affects existing FileHandles.
*
* @param compat_mode Compatibility mode.
*/
static void compat_mode_reset(CompatMode compat_mode) { instance()->_compat_mode = compat_mode; }

/**
* @brief Infer the `AUTO` compatibility mode from the system runtime.
*
* If the requested compatibility mode is `AUTO`, set the expected compatibility mode to
* `ON` or `OFF` by performing a system config check; otherwise, do nothing. Effectively, this
* function reduces the requested compatibility mode from three possible states
* (`ON`/`OFF`/`AUTO`) to two (`ON`/`OFF`) so as to determine the actual I/O path. This function
* is lightweight as the inferred result is cached.
*/
static CompatMode infer_compat_mode_if_auto(CompatMode compat_mode)
{
if (compat_mode == CompatMode::AUTO) {
static auto inferred_compat_mode_for_auto = []() -> CompatMode {
return is_cufile_available() ? CompatMode::OFF : CompatMode::ON;
}();
return inferred_compat_mode_for_auto;
}
return compat_mode;
}

/**
* @brief Given a requested compatibility mode, whether it is expected to reduce to `ON`.
*
* This function returns true if either of the two conditions is satisfied:
* - The compatibility mode is `ON`.
* - It is `AUTO` but inferred to be `ON`.
*
* Conceptually, the opposite of this function is whether the requested compatibility mode is
* expected to be `OFF`, which would occur if either of the two conditions is satisfied:
* - The compatibility mode is `OFF`.
* - It is `AUTO` but inferred to be `OFF`.
*
* @param compat_mode Compatibility mode.
* @return Boolean answer.
*/
[[nodiscard]] static bool compat_mode() { return instance()->_compat_mode; }
static bool is_compat_mode_preferred(CompatMode compat_mode)
{
return compat_mode == CompatMode::ON ||
(compat_mode == CompatMode::AUTO &&
defaults::infer_compat_mode_if_auto(compat_mode) == CompatMode::ON);
}

/**
* @brief Reset the value of `kvikio::defaults::compat_mode()`
* @brief Whether the global compatibility mode from class defaults is expected to be `ON`.
*
* This function returns true if either of the two conditions is satisfied:
* - The compatibility mode is `ON`.
* - It is `AUTO` but inferred to be `ON`.
*
* Changing compatibility mode, effects all new FileHandles that doesn't sets the
* `compat_mode` argument explicitly but it never effect existing FileHandles.
* Conceptually, the opposite of this function is whether the global compatibility mode is
* expected to be `OFF`, which would occur if either of the two conditions is satisfied:
* - The compatibility mode is `OFF`.
* - It is `AUTO` but inferred to be `OFF`.
*
* @param enable Whether to enable compatibility mode or not.
* @return Boolean answer.
*/
static void compat_mode_reset(bool enable) { instance()->_compat_mode = enable; }
static bool is_compat_mode_preferred() { return is_compat_mode_preferred(compat_mode()); }
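
The reduction from three requested states to two effective ones, with the inferred result cached, can be sketched in isolation. This is a simplified illustration rather than the KvikIO implementation: `Mode`, `infer_if_auto` and `is_posix_preferred` are stand-in names, and `cufile_available_stub` is a hypothetical replacement for the real system check that always reports cuFile as unavailable:

```cpp
#include <cassert>

// Stand-in for kvikio::CompatMode.
enum class Mode { OFF, ON, AUTO };

// Counts how often the stubbed system check runs.
int probe_calls = 0;

// Hypothetical stub for the real availability check; it pretends that
// cuFile cannot be used on this machine.
bool cufile_available_stub()
{
  ++probe_calls;
  return false;
}

// Reduce ON/OFF/AUTO to ON/OFF. The function-local static is initialized
// on the first AUTO request only, so the check runs at most once.
Mode infer_if_auto(Mode requested)
{
  if (requested == Mode::AUTO) {
    static const Mode inferred = cufile_available_stub() ? Mode::OFF : Mode::ON;
    return inferred;
  }
  return requested;
}

// True when the effective mode is ON, i.e. POSIX I/O is expected.
bool is_posix_preferred(Mode requested) { return infer_if_auto(requested) == Mode::ON; }

// A few illustrative cases.
void check_examples()
{
  assert(is_posix_preferred(Mode::ON));
  assert(!is_posix_preferred(Mode::OFF));
  assert(is_posix_preferred(Mode::AUTO));  // the stub reports cuFile as unavailable
  assert(is_posix_preferred(Mode::AUTO));  // answered from the cached value
  assert(probe_calls == 1);
}
```

Because `ON` and `OFF` pass through `infer_if_auto` unchanged, comparing its result with `ON` gives the same answer as the two-clause condition in the diff.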

/**
* @brief Get the default thread pool.
4 changes: 2 additions & 2 deletions cpp/include/kvikio/error.hpp
@@ -45,8 +45,8 @@ struct CUfileException : public std::runtime_error {
if (error != CUDA_SUCCESS) { \
const char* err_name = nullptr; \
const char* err_str = nullptr; \
CUresult err_name_status = cudaAPI::instance().GetErrorName(error, &err_name); \
CUresult err_str_status = cudaAPI::instance().GetErrorString(error, &err_str); \
CUresult err_name_status = kvikio::cudaAPI::instance().GetErrorName(error, &err_name); \
CUresult err_str_status = kvikio::cudaAPI::instance().GetErrorString(error, &err_str); \
if (err_name_status == CUDA_ERROR_INVALID_VALUE) { err_name = "unknown"; } \
if (err_str_status == CUDA_ERROR_INVALID_VALUE) { err_str = "unknown"; } \
throw(_exception_type){std::string{"CUDA error at: "} + __FILE__ + ":" + \
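
The commit does not state why `cudaAPI` is now written as `kvikio::cudaAPI` inside the macro. A plausible reason is that macro bodies are expanded at the call site, so unqualified names are looked up in the caller's namespace. The sketch below illustrates that general C++ behavior; `lib`, `api`, `other` and `LIB_ANSWER` are hypothetical names unrelated to KvikIO:

```cpp
#include <cassert>

namespace lib {
struct api {
  static int answer() { return 42; }
};
}  // namespace lib

// Fully qualified: the name resolves wherever the macro is expanded.
#define LIB_ANSWER() (lib::api::answer())

// An unqualified body such as `(api::answer())` would compile only where
// `api` is visible, for example inside namespace `lib`.

namespace other {
// Expanding the macro outside `lib` works because the name is qualified.
int use_macro() { return LIB_ANSWER(); }
}  // namespace other

void check_examples() { assert(other::use_macro() == 42); }
```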