v3.0
- Added Python bindings for the SCR library
- Supports Python 2 and 3
- Implemented in the `scr.py` module (`import scr`)
- Uses the C Foreign Function Interface (CFFI) to wrap calls to libscr
- To use the Python bindings, first install SCR, then follow the steps in python/README.md
- Improved support for large datasets and shared access to files. Applications can now configure SCR to bypass the cache and access datasets on the global file system:
- For datasets that are too large to fit in cache or for systems that have no cache available, SCR can use the global file system. This improves portability so that applications can use SCR on any cluster.
- Since bypass mode is more general, it is enabled by default. To use cache, one must disable bypass mode by setting `SCR_CACHE_BYPASS=0`.
- For applications that write shared files, SCR can use bypass mode during the SCR Checkpoint/Output API.
- For applications that write datasets as a file-per-process but require shared access to files during restart, one can write to cache but set `SCR_GLOBAL_RESTART=1`. This rebuilds and flushes cached datasets during `SCR_Init`. It also enables bypass mode for restart so that the application can read its dataset from the global file system using the SCR Restart API.
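
The three cases above can be summarized as a small lookup. The following is only a sketch: `scr_settings` is a hypothetical helper written for illustration, not part of SCR or its Python bindings, and it does nothing more than encode the parameter values named in the bullets above.

```python
def scr_settings(use_cache, shared_restart=False):
    """Return the SCR parameters for the cases described above.

    Hypothetical helper for illustration only; not part of SCR.
    """
    settings = {}
    if use_cache:
        # Bypass mode is enabled by default, so using cache means turning it off.
        settings["SCR_CACHE_BYPASS"] = "0"
        if shared_restart:
            # File-per-process writes go to cache, but restart reads
            # come from the global file system.
            settings["SCR_GLOBAL_RESTART"] = "1"
    # With bypass left at its default, no extra parameters are needed.
    return settings


print(scr_settings(use_cache=True, shared_restart=True))
```

The returned values could then be placed in a configuration file such as `.scrconf` before the job starts.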
- Applications can now instruct SCR to load a specific checkpoint by naming it in the `SCR_CURRENT` parameter before calling `SCR_Init`.
- Restart loop:
- SCR now supports a loop around `SCR_Have_restart`, `SCR_Start_restart`, and `SCR_Complete_restart`. If an application detects a problem during its restart, it can pass `valid=0` to `SCR_Complete_restart`. SCR will then load the next most recent checkpoint, which the application can query with another call to `SCR_Have_restart`. This process can be continued until either a checkpoint is read successfully or all checkpoints have been exhausted.
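
The control flow of the restart loop can be modeled in a few lines. This sketch does not call SCR: `FakeSCR` and its methods are hypothetical stand-ins that only mimic the fallback behavior described above, and `read_checkpoint` is an assumed application callback that returns whether the read succeeded.

```python
class FakeSCR:
    """Hypothetical stand-in for SCR's restart calls; not the real bindings."""

    def __init__(self, checkpoints):
        # Checkpoint names ordered from most recent to oldest.
        self._remaining = list(checkpoints)

    def have_restart(self):
        """Return the currently loaded checkpoint name, or None if none is left."""
        return self._remaining[0] if self._remaining else None

    def start_restart(self):
        return self._remaining[0]

    def complete_restart(self, valid):
        """With valid=False, fall back to the next most recent checkpoint."""
        if not valid:
            self._remaining.pop(0)


def restart(scr, read_checkpoint):
    """Try checkpoints from newest to oldest until one is read successfully."""
    name = scr.have_restart()
    while name is not None:
        scr.start_restart()
        ok = read_checkpoint(name)
        scr.complete_restart(valid=ok)
        if ok:
            return name
        name = scr.have_restart()
    return None  # all checkpoints have been exhausted


scr = FakeSCR(["ckpt.3", "ckpt.2", "ckpt.1"])
# Pretend the newest checkpoint is corrupt and the second one is readable.
print(restart(scr, lambda name: name == "ckpt.2"))
```

In a real application the same loop shape would wrap the actual SCR restart calls, with the application's own read and validation logic in place of the callback.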
- `SCR_Need_checkpoint` now returns false unless one has set one of `SCR_CHECKPOINT_INTERVAL/SECONDS/OVERHEAD`
- Restored watchdog support on SLURM systems
- New build options:
- Added support for static-only builds with `-DBUILD_SHARED_LIBS=OFF`
- Added CMake options to disable portions of the build including `-DENABLE_EXAMPLES=[ON/OFF]` and `-DENABLE_TESTS=[ON/OFF]`
- Added support to specify the number of trailing underscores for Fortran bindings with `-DENABLE_FORTRAN_TRAILING_UNDERSCORES=[AUTO/ON/OFF]`
- New API calls:
- `SCR_Config(const char* config)` to set and query SCR configuration parameters before `SCR_Init()`, and query parameters after `SCR_Init()`.
- `SCR_Configf(const char* config, ...)`, a version of `SCR_Config` that supports printf-style formatting.
- `SCR_Current(const char* name)` enables an application that reads its checkpoint without using the SCR Restart API to inform SCR about which checkpoint it loaded so that SCR can still track the proper ordering of checkpoints.
- `SCR_Delete(const char* name)` to ask SCR to delete a dataset.
- `SCR_Drop(const char* name)` to ask SCR to drop a dataset from the index without deleting the underlying data files.
- Improved flush methods
- Added IBM BB API (https://github.com/IBM/CAST), e.g., `SCR_FLUSH_TYPE=BBAPI`
- Added pthreads, e.g., `SCR_FLUSH_TYPE=PTHREAD`
- Added support for multiple outstanding asynchronous flushes
- Initial support for `scr_poststage` of BBAPI transfers after completion of allocation (beta)
- New redundancy scheme:
- Reed-Solomon encoding (`SCR_COPY_TYPE=RS`) allows a configurable number of failures per group, from 1 to N-1 where N is the set size. Use `SCR_SET_SIZE` to specify the group size and `SCR_SET_FAILURES` to specify the number of failures per group.
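
The constraint on the failure count (from 1 to N-1 for a set of size N) can be checked before a job is launched. The helper below is a hypothetical sketch, not part of SCR; it only validates the range stated above and returns the three parameter values named in this entry.

```python
def rs_settings(set_size, set_failures):
    """Validate a Reed-Solomon configuration and return its SCR parameters.

    Hypothetical helper for illustration only; not part of SCR.
    """
    if set_size < 2:
        raise ValueError("set size must be at least 2")
    # The number of tolerated failures per group ranges from 1 to N-1.
    if not 1 <= set_failures <= set_size - 1:
        raise ValueError(
            "failures per group must be between 1 and %d" % (set_size - 1)
        )
    return {
        "SCR_COPY_TYPE": "RS",
        "SCR_SET_SIZE": str(set_size),
        "SCR_SET_FAILURES": str(set_failures),
    }


print(rs_settings(set_size=8, set_failures=2))
```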
- SCR configuration parameters now support interpolation of environment variables in configuration files, e.g.,

      >>: cat .scrconf
      SCR_CACHE_BASE=$BBPATH
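
To illustrate what interpolation means here, the sketch below expands `$VAR` references in `KEY=VALUE` lines using Python's `string.Template`. It is not SCR's parser: the function name `parse_scrconf`, the treatment of `#` lines as comments, the one-pair-per-line format, and the choice to leave undefined variables untouched are all assumptions made for the example.

```python
import os
from string import Template


def parse_scrconf(text, env=None):
    """Parse KEY=VALUE lines and expand $VAR references from the environment.

    Illustrative only; SCR's own configuration parser may behave differently.
    """
    env = os.environ if env is None else env
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # safe_substitute leaves undefined variables as written.
        params[key.strip()] = Template(value.strip()).safe_substitute(env)
    return params


print(parse_scrconf("SCR_CACHE_BASE=$BBPATH", env={"BBPATH": "/mnt/bb"}))
```

The `/mnt/bb` path is a made-up value standing in for whatever the burst buffer path is on a given system.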
- Default path for SCR system configuration file moved from `/etc/scr.conf` to `<install>/etc/scr.conf`
- SCR now preserves file metadata including atime, mtime, uid, gid, and mode bits
- New logging options:
- text file - written to the SCR prefix directory (`SCR_LOG_TXT_ENABLE=1`)
- syslog - one can configure the syslog prefix, facility, and level to be used (`SCR_LOG_SYSLOG_ENABLE=1`)
- Apps can now configure SCR to maintain a sliding window of checkpoints on the parallel file system with an `SCR_PREFIX_SIZE` parameter. After flushing a new checkpoint, SCR will delete older checkpoints
- Default cache and control directories have been moved from `/tmp` to `/dev/shm` on Linux systems
- Assists for application developers when integrating the SCR API
- A new `SCR_CACHE_PURGE` parameter configures SCR to delete datasets from cache in new runs
- A new `SCR_PREFIX_PURGE` parameter similarly deletes datasets from the prefix directory in new runs
- Added internal checks to warn developers about incorrect API usage
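
The sliding window controlled by `SCR_PREFIX_SIZE` (described a few entries above) can be pictured with a short model. This is a sketch of the idea, not SCR's implementation: `flush_checkpoint` is a hypothetical function, and treating a non-positive window size as "keep everything" is an assumption.

```python
def flush_checkpoint(kept, new_name, prefix_size):
    """Record a newly flushed checkpoint and return the names to delete.

    `kept` is the list of checkpoints on the parallel file system, oldest first.
    Illustrative model only; not SCR's implementation.
    """
    kept.append(new_name)
    if prefix_size <= 0:
        return []  # assumption: no window configured, keep every checkpoint
    excess = len(kept) - prefix_size
    deleted = kept[:excess] if excess > 0 else []
    # Drop the oldest entries so that only the window remains.
    del kept[:len(deleted)]
    return deleted


kept = []
for step in range(1, 5):
    removed = flush_checkpoint(kept, "ckpt.%d" % step, prefix_size=2)
    print(step, kept, removed)
```

With a window of 2, the model keeps only the two most recent checkpoints after each flush.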
- Refactored code base to use ECP-VeloC components https://github.com/ecp-veloc/
- Improves code modularity and reuse
- Improved testing
- New release tarball packages source for SCR and many of its components to simplify direct builds, e.g.,

      wget https://github.com/LLNL/scr/releases/download/v3.0/scr-v3.0.tgz
      tar -xzf scr-v3.0.tgz
      cd scr-v3.0
      mkdir build
      cd build
      cmake -DCMAKE_INSTALL_PREFIX=../install -DSCR_RESOURCE_MANAGER=SLURM ../
      make -j install