Variable | Description |
---|---|
`QSV_DOTENV_PATH` | The full pathname of the dotenv file to load, OVERRIDING existing environment variables. This takes precedence over any other dotenv files in the filesystem. |
`QSV_DEFAULT_DELIMITER` | single ASCII character to use as delimiter. Overrides the `--delimiter` option. When not set, defaults to "," (comma) for CSV files & "\t" (tab) for TSV files. Note that this also sets the delimiter for qsv's output to stdout. However, when the `--output` option is used, the delimiter of the generated file is determined by its file extension regardless of this environment variable - i.e. comma for `.csv`; tab for `.tsv` & `.tab`; and semicolon for `.ssv` files. |
`QSV_SNIFF_DELIMITER` | if set, the delimiter is automatically detected. Overrides `QSV_DEFAULT_DELIMITER` & the `--delimiter` option. Note that this does not work with stdin. |
`QSV_NO_HEADERS` | if set, the first row will NOT be interpreted as headers. Supersedes `QSV_TOGGLE_HEADERS`. |
`QSV_TOGGLE_HEADERS` | if set to `1`, toggles the header setting - i.e. inverts qsv's header behavior, so that no headers is the default and setting `--no-headers` actually means the first row is treated as headers. |
`QSV_ANTIMODES_LEN` | set to the maximum number of characters when listing "antimodes" in `stats`. Otherwise, the default is 100 (max: 5192). |
`QSV_AUTOINDEX_SIZE` | if set, specifies the minimum file size (in bytes) of a CSV file before an index is automatically created. Note that stale indices are automatically updated regardless of this setting. |
`QSV_CACHE_DIR` | The directory to use for caching downloaded `lookup_table` resources using the `luau` `qsv_register_lookup()` helper function. |
`QSV_CKAN_API` | The CKAN Action API endpoint to use with the `luau` `qsv_register_lookup()` helper function when using the `ckan://` scheme. |
`QSV_CKAN_TOKEN` | The CKAN token to use with the `luau` `qsv_register_lookup()` helper function when using the `ckan://` scheme. Only required to access private resources. |
`QSV_COMMENT_CHAR` | set to an ASCII character. If set, any lines (including the header) that start with this character are ignored. |
`QSV_MAX_JOBS` | number of jobs to use for multithreaded commands (currently `apply`, `applydp`, `dedup`, `diff`, `extsort`, `frequency`, `joinp`, `schema`, `snappy`, `sort`, `split`, `stats`, `to`, `tojsonl` & `validate`). If not set, max_jobs is set to the detected number of logical processors. See Multithreading for more info. |
`QSV_NO_UPDATE` | if set, disables the self-update version check for the latest qsv release published on GitHub. |
`QSV_LLM_APIKEY` | The API key of the supported LLM service to use with the `describegpt` command. |
`QSV_OUTPUT_BOM` | if set, the output will have a Byte Order Mark (BOM) at the beginning. This is used to generate Excel-friendly CSVs on Windows. |
`QSV_PREFER_DMY` | if set, date parsing will use DMY format. Otherwise, MDY format is used (applies to the `datefmt`, `schema`, `sniff` & `stats` commands). |
`QSV_REGEX_UNICODE` | if set, makes the `search`, `searchset` & `replace` commands unicode-aware. For increased performance, these commands are not unicode-aware by default & will ignore unicode values when matching & will abort when unicode characters are used in the regex. Note that the `regex_replace` operation of `apply operations` is always unicode-aware. |
`QSV_RDR_BUFFER_CAPACITY` | reader buffer size, in bytes (default: 131072, i.e. 128k). |
`QSV_SKIP_FORMAT_CHECK` | if set, skips mime-type checking of input files. Since a format check involves scanning the input file to infer its mime-type/format, set this when optimizing for performance or when encountering false positives. |
`QSV_WTR_BUFFER_CAPACITY` | writer buffer size, in bytes (default: 524288, i.e. 512k). |
`QSV_FREEMEMORY_HEADROOM_PCT` | the percentage of free available memory required when running qsv in "non-streaming" mode (i.e. the entire file needs to be loaded into memory). If the incoming file is greater than the available memory after the headroom is subtracted, qsv will not proceed. Set to 0 to skip the memory check. See Memory Management for more info. (default: 20 percent) |
`QSV_MEMORY_CHECK` | if set, check if input file size < AVAILABLE memory - HEADROOM (CONSERVATIVE mode) when running in "non-streaming" mode. Otherwise, qsv will only check if input file size < TOTAL memory - HEADROOM (NORMAL mode). This is done to prevent Out-of-Memory errors. See Memory Management for more info. |
`QSV_LOG_LEVEL` | desired log level (default: off; valid levels: `error`, `warn`, `info`, `debug`, `trace`). |
`QSV_LOG_DIR` | when logging is enabled, the directory where the log files will be stored. If the specified directory does not exist, qsv will attempt to create it. If not set, the log files are created in the directory where qsv was started. See Logging for more info. |
`QSV_LOG_UNBUFFERED` | if set, log messages are written directly to disk, without buffering. Otherwise, log messages are buffered before being written to the log file (8k buffer, flushing every second). See `flexi_logger` for details. |
`QSV_PROGRESSBAR` | if set, enables the `--progressbar` option on the `apply`, `fetch`, `fetchpost`, `foreach`, `luau`, `py`, `replace`, `search`, `searchset`, `sortcheck` & `validate` commands. |
`QSV_DISKCACHE_TTL_SECONDS` | set time-to-live of diskcache cached values (default: 2419200 seconds, i.e. 28 days). |
`QSV_DISKCACHE_TTL_REFRESH` | if set, enables cache hits to refresh TTL of diskcache cached values. |
`QSV_REDIS_CONNSTR` | the `fetch` command can use Redis to cache responses. Set this to connect to the desired Redis instance (default: `redis:127.0.0.1:6379/1`). For more info on valid Redis connection string formats, click here. |
`QSV_FP_REDIS_CONNSTR` | the `fetchpost` command can also use Redis to cache responses (default: `redis:127.0.0.1:6379/2`). Note that `fetchpost` connects to database 2, as opposed to `fetch` which connects to database 1. |
`QSV_REDIS_MAX_POOL_SIZE` | the maximum Redis connection pool size (default: 20). |
`QSV_REDIS_TTL_SECONDS` | set time-to-live of Redis cached values (default: 2419200 seconds, i.e. 28 days). |
`QSV_REDIS_TTL_REFRESH` | if set, enables cache hits to refresh TTL of Redis cached values. |
`QSV_TIMEOUT` | for commands with a `--timeout` option (`fetch`, `fetchpost`, `luau`, `sniff` and `validate`), the number of seconds before a web request times out (default: 30). |
`QSV_USER_AGENT` | the user-agent to use for web requests. When specifying a custom user agent, the following variables are supported: `$QSV_VERSION`, `$QSV_TARGET`, `$QSV_BIN_NAME` and `$QSV_KIND`. Try to conform to the IETF RFC 7231 standard. See here for examples. (default: `$QSV_BIN_NAME/$QSV_VERSION ($QSV_TARGET; $QSV_KIND; https://github.com/dathere/qsv)` - e.g. `qsv/0.105.0 (x86_64-unknown-linux; prebuilt; https://github.com/dathere/qsv)`). |
Several dependencies also have environment variables that influence qsv's performance & behavior:

- Memory Allocator: when incorporating qsv into a data pipeline that runs in batch mode, particularly with very large CSV files using qsv commands that load entire CSV files into memory, you can fine-tune qsv's memory allocator run-time behavior using the environment variables of the allocator you're using (e.g. mimalloc or jemalloc).
- Network Access (reqwest): qsv uses reqwest and will honor proxy settings set through the `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY` & `NO_PROXY` environment variables.
- Polars: qsv uses polars for several commands - currently `count`, `joinp` and `sqlp`. Polars has its own set of environment variables that can be set to influence its behavior (see here). The most relevant ones are:
  - `POLARS_VERBOSE` - if set to 1, polars will output logging messages to stderr.
  - `POLARS_PANIC_ON_ERR` - if set to 1, panics on polars-related errors, instead of returning an error.
  - `POLARS_BACKTRACE_IN_ERR` - if set to 1, includes a backtrace in polars-related error messages.
ℹ️ NOTE: To get a list of all active qsv-relevant environment variables, run `qsv --envlist`. Relevant env vars are defined as anything that starts with `QSV_`, `MIMALLOC_`, `JEMALLOC_` or `MALLOC_CONF`, plus the proxy variables listed above.
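To approximate what `qsv --envlist` reports, for example from a script that wraps qsv, the filtering rule in the note can be sketched in Python. `relevant_env_vars` is a hypothetical helper written for illustration; qsv's own output format and matching details (e.g. whether lower-case proxy variable names are also listed) may differ.

```python
import os
from typing import Dict, Mapping

# Prefixes taken from the definition of "relevant" env vars in the note.
RELEVANT_PREFIXES = ("QSV_", "MIMALLOC_", "JEMALLOC_", "MALLOC_CONF")
# ASSUMPTION: only the upper-case proxy variable names are matched here.
PROXY_VARS = frozenset({"HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY"})


def relevant_env_vars(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the qsv-relevant subset of env, sorted by name (sketch)."""
    return {
        name: value
        for name, value in sorted(env.items())
        if name.startswith(RELEVANT_PREFIXES) or name in PROXY_VARS
    }


if __name__ == "__main__":
    for name, value in relevant_env_vars(os.environ).items():
        print(f"{name}={value}")
```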
qsv supports the use of `.env` files to set environment variables. The `.env` file is a simple text file that contains key-value pairs, one per line.

It processes `.env` files as follows:
- Upon invocation, qsv will check if the `QSV_DOTENV_PATH` environment variable is set. If it is, it will look for the file specified by the variable. If the file is found, it will be processed.
- If the `QSV_DOTENV_PATH` environment variable is not set, qsv will look for a file named `.env` in the current working directory. If one is found, it will be processed.
- If no `.env` file is found in the current working directory, qsv will next look for an `.env` file with the same filestem as the binary in the directory where the binary is located (e.g. if `qsv`/`qsvlite`/`qsvdp` is in `/usr/local/bin`, it will look for `/usr/local/bin/qsv.env`, `/usr/local/bin/qsvlite.env` or `/usr/local/bin/qsvdp.env` respectively).
- If no `.env` files are found, qsv will proceed with its default settings and the current environment variables, which may include `QSV_` variables.
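The lookup order above can be sketched as a Python function. This is a hypothetical illustration (`find_dotenv` is not part of qsv); the `exists` parameter is injected so the logic can be exercised without touching the filesystem, and the behavior when `QSV_DOTENV_PATH` points to a missing file is an assumption, since the text does not specify it.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional


def find_dotenv(
    env: Mapping[str, str],
    cwd: Path,
    binary_path: Path,
    exists: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the .env file qsv would load, or None (hypothetical sketch)."""
    # 1. An explicit QSV_DOTENV_PATH wins.
    explicit = env.get("QSV_DOTENV_PATH")
    if explicit:
        candidate = Path(explicit)
        # ASSUMPTION: if QSV_DOTENV_PATH points to a missing file,
        # no other location is searched.
        return candidate if exists(candidate) else None
    # 2. A file named .env in the current working directory.
    local = cwd / ".env"
    if exists(local):
        return local
    # 3. <binary filestem>.env next to the binary, e.g. /usr/local/bin/qsvlite.env.
    beside_binary = binary_path.parent / f"{binary_path.stem}.env"
    if exists(beside_binary):
        return beside_binary
    # 4. Nothing found: proceed with defaults and the current environment.
    return None
```

In a real script, `cwd` would typically be `Path.cwd()` and `exists` left at its default of `Path.is_file`.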
When processing `.env` files, qsv will:
- overwrite any existing environment variables with the same name
- where multiple declarations of the same variable exist, use the last one
- ignore any lines that start with `#` (comments)
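The three processing rules above can be illustrated with a deliberately simplified Python parser. It is a sketch, not qsv's implementation: `parse_dotenv` and `apply_dotenv` are hypothetical names, and real dotenv parsers also handle quoting, escapes and other syntax that this sketch ignores.

```python
from typing import Dict, Iterable, MutableMapping


def parse_dotenv(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines following the rules above (simplified sketch)."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Blank lines and lines starting with '#' (comments) are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ASSUMPTION: malformed lines without '=' are skipped
        # A later declaration of the same key replaces an earlier one.
        values[key.strip()] = value.strip()
    return values


def apply_dotenv(values: Dict[str, str], env: MutableMapping[str, str]) -> None:
    """Overwrite existing environment variables with the same name."""
    env.update(values)
```

Passing `os.environ` as `env` would apply the parsed values to the current process.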
To facilitate the use of `.env` files, a `dotenv.template` file is included in the qsv distribution. This file contains all the environment variables that qsv recognizes, along with their default values. Copy the template to a file named `.env` and modify it to suit your needs.