diff --git a/docs/_index.md b/docs/_index.md index b3bf7f12e3..3986352f27 100644 --- a/docs/_index.md +++ b/docs/_index.md @@ -5,17 +5,17 @@ heading: SuperDB --- SuperDB offers a new approach that makes it easier to manipulate and manage -your data. With its [super-structured data model](formats/_index.md#2-a-super-structured-pattern), +your data. With its [super-structured data model](formats#2-a-super-structured-pattern), messy JSON data can easily be given the fully-typed precision of relational tables without giving up JSON's uncanny ability to represent eclectic data. ## Getting Started -Trying out SuperDB is easy: just [install](install.md) the command-line tool -[`super`](commands/super.md) and run through the [tutorial](tutorials/zq.md). +Trying out SuperDB is easy: just [install](install) the command-line tool +[`super`](commands/super) and run through the [tutorial](tutorials/zq). Compared to putting JSON data in a relational column, the -[super-structured data model](formats/zed.md) makes it really easy to +[super-structured data model](formats/zed) makes it really easy to mash up JSON with your relational tables. The `super` command is a little like [DuckDB](https://duckdb.org/) and a little like [`jq`](https://stedolan.github.io/jq/) but super-structured data ties the @@ -25,44 +25,44 @@ For a non-technical user, SuperDB is as easy to use as web search while for a technical user, SuperDB exposes its technical underpinnings in a gradual slope, providing as much detail as desired, packaged up in the easy-to-understand -[Super JSON data format](formats/jsup.md) and -[SuperPipe language](language/_index.md). +[Super JSON data format](formats/jsup) and +[SuperPipe language](language). While `super` and its accompanying data formats are production quality, the project's -[SuperDB data lake](commands/super-db.md) is a bit earlier in development. +[SuperDB data lake](commands/super-db) is a bit earlier in development. ## Terminology "Super" is an umbrella term that describes a number of different elements of the system: -* The [super data model](formats/zed.md) is the abstract definition of the data types and semantics +* The [super data model](formats/zed) is the abstract definition of the data types and semantics that underlie the super-structured data formats. -* The [super data formats](formats/_index.md) are a family of -[human-readable (Super JSON, JSUP)](formats/jsup.md), -[sequential (Super Binary, BSUP)](formats/bsup.md), and -[columnar (Super Columnar, CSUP)](formats/csup.md) formats that all adhere to the +* The [super data formats](formats) are a family of +[human-readable (Super JSON, JSUP)](formats/jsup), +[sequential (Super Binary, BSUP)](formats/bsup), and +[columnar (Super Columnar, CSUP)](formats/csup) formats that all adhere to the same abstract super data model. -* The [SuperPipe language](language/_index.md) is the system's pipeline language for performing +* The [SuperPipe language](language) is the system's pipeline language for performing queries, searches, analytics, transformations, or any of the above combined together. -* A [SuperPipe query](language/overview.md) is a script that performs +* A [SuperPipe query](language/overview) is a script that performs search and/or analytics. 
-* A [SuperPipe shaper](language/shaping.md) is a script that performs +* A [SuperPipe shaper](language/shaping) is a script that performs data transformation to _shape_ the input data into the desired set of organizing super-structured data types called "shapes", which are traditionally called _schemas_ in relational systems but are much more flexible in SuperDB. -* A [SuperDB data lake](commands/super-db.md) is a collection of super-structured data stored -across one or more [data pools](commands/super-db.md#data-pools) with ACID commit semantics and +* A [SuperDB data lake](commands/super-db) is a collection of super-structured data stored +across one or more [data pools](commands/super-db#data-pools) with ACID commit semantics and accessed via a [Git](https://git-scm.com/)-like API. ## Digging Deeper -The [SuperPipe language documentation](language/_index.md) +The [SuperPipe language documentation](language) is the best way to learn about `super` in depth. All of its examples use `super` commands run on the command line. Run `super -h` for a list of command options and online help. -The [`super db` documentation](commands/super-db.md) +The [`super db` documentation](commands/super-db) is the best way to learn about the SuperDB data lake. All of its examples use `super db` commands run on the command line. Run `super db -h` or `-h` with any subcommand for a list of command options @@ -92,7 +92,7 @@ or other third-party services to interpret the lake data. Once copied, a new service can be instantiated by pointing a `super db serve` at the copy of the lake. -Functionality like [data compaction](commands/super-db.md#manage) and retention are all API-driven. +Functionality like [data compaction](commands/super-db#manage) and retention are all API-driven. Bite-sized components are unified by the super-structured data, usually in the SUPZ format: * All lake meta-data is available via meta-queries. diff --git a/docs/commands/_index.md b/docs/commands/_index.md index f618316376..02ad79c4ac 100644 --- a/docs/commands/_index.md +++ b/docs/commands/_index.md @@ -3,15 +3,15 @@ title: Commands weight: 2 --- -The [`super` command](super.md) is used to execute command-line queries on -inputs from files, HTTP URLs, or [S3](../integrations/amazon-s3.md). +The [`super` command](super) is used to execute command-line queries on +inputs from files, HTTP URLs, or [S3](../integrations/amazon-s3). -The [`super db` sub-commands](super-db.md) are for creating, configuring, ingesting +The [`super db` sub-commands](super-db) are for creating, configuring, ingesting into, querying, and orchestrating SuperDB data lakes. These sub-commands are organized into further subcommands like the familiar command patterns of `docker` or `kubectl`. -All operations with these commands utilize the [super data model](../formats/_index.md) -and provide querying via [SuperSQL](../language/_index.md). +All operations with these commands utilize the [super data model](../formats) +and provide querying via [SuperSQL](../language). Built-in help for `super` and all sub-commands is always accessible with the `-h` flag. diff --git a/docs/commands/super-db.md b/docs/commands/super-db.md index febae54da4..77c8699b08 100644 --- a/docs/commands/super-db.md +++ b/docs/commands/super-db.md @@ -5,7 +5,7 @@ title: super db > **TL;DR** `super db` is a sub-command of `super` to manage and query SuperDB data lakes. 
> You can import data from a variety of formats and it will automatically -> be committed in [super-structured](../formats/_index.md) +> be committed in [super-structured](../formats) > format, providing full fidelity of the original format and the ability > to reconstruct the original data without loss of information. > @@ -16,13 +16,13 @@ title: super db

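As a quick sketch of that flow (the pool name `logs`, the pool key `ts`, and the input file `sample.jsup` are all hypothetical; see the [commands](#super-db-commands) below for details):
```
super db create -orderby ts:desc logs   # create a pool sorted by ts, descending
super db load -use logs sample.jsup     # commit the file to the pool's main branch
super db query 'from logs | count()'    # query the loaded data
```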
:::tip Status -While [`super`](super.md) and its accompanying [formats](../formats/_index.md) +While [`super`](super) and its accompanying [formats](../formats) are production quality, the SuperDB data lake is still fairly early in development and alpha quality. That said, SuperDB data lakes can be utilized quite effectively at small scale, or at larger scales when scripted automation is deployed to manage the lake's data layout via the -[lake API](../lake/api.md). +[lake API](../lake/api). Enhanced scalability with self-tuning configuration is under development. ::: @@ -32,7 +32,7 @@ Enhanced scalability with self-tuning configuration is under development. A SuperDB data lake is a cloud-native arrangement of data, optimized for search, analytics, ETL, data discovery, and data preparation at scale based on data represented in accordance -with the [super data model](../formats/zed.md). +with the [super data model](../formats/zed). A lake is organized into a collection of data pools forming a single administrative domain. The current implementation supports @@ -70,7 +70,7 @@ bite-sized chunks for learning how the system works and how the different components come together. While the CLI-first approach provides these benefits, -all of the functionality is also exposed through [an API](../lake/api.md) to +all of the functionality is also exposed through [an API](../lake/api) to a lake service. Many use cases involve an application like [SuperDB Desktop](https://zui.brimdata.io/) or a programming environment like Python/Pandas interacting @@ -115,8 +115,8 @@ replication easy to support and deploy. The cloud objects that comprise a lake, e.g., data objects, commit history, transaction journals, partial aggregations, etc., -are stored as super-structured data, i.e., either as [row-based Super Binary](../formats/bsup.md) -or [Super Columnar](../formats/csup.md). +are stored as super-structured data, i.e., either as [row-based Super Binary](../formats/bsup) +or [Super Columnar](../formats/csup). This makes introspection of the lake structure straightforward as many key lake data structures can be queried with metadata queries and presented to a client for further processing by downstream tooling. @@ -261,11 +261,11 @@ which is the sort key for all data stored in the lake. Different data pools can have different pool keys but all of the data in a pool must have the same pool key. -As pool data is often comprised of [records](../formats/zed.md#21-record) (analogous to JSON objects), +As pool data is often comprised of [records](../formats/zed#21-record) (analogous to JSON objects), the pool key is typically a field of the stored records. When pool data is not structured as records/objects (e.g., scalar or arrays or other non-record types), then the pool key would typically be configured -as the [special value `this`](../language/pipeline-model.md#the-special-value-this). +as the [special value `this`](../language/pipeline-model#the-special-value-this). Data can be efficiently scanned if a query has a filter operating on the pool key. For example, on a pool with pool key `ts`, the query `ts == 100` @@ -294,7 +294,7 @@ optimize scans over such data is impaired. Because commits are transactional and immutable, a query sees its entire data scan as a fixed "snapshot" with respect to the -commit history. In fact, the [`from` operator](../language/operators/from.md) +commit history. 
In fact, the [`from` operator](../language/operators/from) allows a commit object to be specified with the `@` suffix to a pool reference, e.g., ``` @@ -331,7 +331,7 @@ Time travel using timestamps is a forthcoming feature. ## `super db` Commands -While `super db` is itself a sub-command of [`super`](super.md), it invokes +While `super db` is itself a sub-command of [`super`](super), it invokes a large number of interrelated sub-commands, similar to the [`docker`](https://docs.docker.com/engine/reference/commandline/cli/) or [`kubectl`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands) @@ -347,16 +347,16 @@ for that sub-command. sub-command and so forth. By default, commands that display lake metadata (e.g., [`log`](#log) or -[`ls`](#ls)) use the human-readable [lake metadata output](super.md#superdb-data-lake-metadata-output) +[`ls`](#ls)) use the human-readable [lake metadata output](super#superdb-data-lake-metadata-output) format. However, the `-f` option can be used to specify any supported -[output format](super.md#output-formats). +[output format](super#output-formats). ### Auth ``` super db auth login|logout|method|verify ``` Access to a lake can be secured with [Auth0 authentication](https://auth0.com/). -A [guide](../integrations/zed-lake-auth.md) is available with example configurations. +A [guide](../integrations/zed-lake-auth) is available with example configurations. Please reach out to us on our [community Slack](https://www.brimdata.io/join-slack/) if you have feedback on your experience or need additional help. @@ -403,7 +403,7 @@ The `-orderby` option indicates the [pool key](#pool-key) that is used to sort the data in lake, which may be in ascending or descending order. If a pool key is not specified, then it defaults to -the [special value `this`](../language/pipeline-model.md#the-special-value-this). +the [special value `this`](../language/pipeline-model#the-special-value-this). A newly created pool is initialized with a branch called `main`. @@ -468,12 +468,12 @@ a "table" as all super-structured data is _self describing_ and can be queried i schema-agnostic fashion. Data of any _shape_ can be stored in any pool and arbitrary data _shapes_ can coexist side by side. -As with [`super`](super.md), -the [input arguments](super.md#usage) can be in -any [supported format](super.md#input-formats) and +As with [`super`](super), +the [input arguments](super#usage) can be in +any [supported format](super#input-formats) and the input format is auto-detected if `-i` is not provided. Likewise, the inputs may be URLs, in which case, the `load` command streams -the data from a Web server or [S3](../integrations/amazon-s3.md) and into the lake. +the data from a Web server or [S3](../integrations/amazon-s3) and into the lake. When data is loaded, it is broken up into objects of a target size determined by the pool's `threshold` parameter (which defaults to 500MiB but can be configured @@ -538,7 +538,7 @@ The `date` field here is used by the lake system to do [time travel](#time-trave through the branch and pool history, allowing you to see the state of branches at any time in their commit history. -Arbitrary metadata expressed as any [Super JSON value](../formats/jsup.md) +Arbitrary metadata expressed as any [Super JSON value](../formats/jsup) may be attached to a commit via the `-meta` flag. This allows an application or user to transactionally commit metadata alongside committed data for any purpose. 
This approach allows external applications to implement arbitrary @@ -547,7 +547,7 @@ commit history. Since commit objects are stored as super-structured data, the metadata can easily be queried by running the `log -f bsup` to retrieve the log in Super Binary format, -for example, and using [`super`](super.md) to pull the metadata out +for example, and using [`super`](super) to pull the metadata out as in: ``` super db log -f bsup | super -c 'has(meta) | yield {id,meta}' - @@ -580,7 +580,7 @@ A commit object includes an optional author and message, along with a required timestamp, that is stored in the commit journal for reference. These values may be specified as options to the [`load`](#load) command, and are also available in the -[lake API](../lake/api.md) for automation. +[lake API](../lake/api) for automation. :::tip note The branchlog meta-query source is not yet implemented. @@ -612,7 +612,7 @@ If the `-monitor` option is specified and the lake is [located](#locating-the-la via network connection, `super db manage` will run continuously and perform updates as needed. By default a check is performed once per minute to determine if updates are necessary. The `-interval` option may be used to specify an -alternate check frequency in [duration format](../formats/jsup.md#23-primitive-values). +alternate check frequency in [duration format](../formats/jsup#23-primitive-values). If `-monitor` is not specified, a single maintenance pass is performed on the lake. @@ -657,13 +657,13 @@ according to configured policies and logic. ``` super db query [options] ``` -The `query` command runs a [SuperSQL](../language/_index.md) query with data from a lake as input. -A query typically begins with a [`from` operator](../language/operators/from.md) +The `query` command runs a [SuperSQL](../language) query with data from a lake as input. +A query typically begins with a [`from` operator](../language/operators/from) indicating the pool and branch to use as input. The pool/branch names are specified with `from` in the query. -As with [`super`](super.md), the default output format is Super JSON for +As with [`super`](super), the default output format is Super JSON for terminals and Super Binary otherwise, though this can be overridden with `-f` to specify one of the various supported output formats. @@ -685,7 +685,7 @@ Filters on pool keys are efficiently implemented as the data is laid out according to the pool key and seek indexes keyed by the pool key are computed for each data object. -When querying data to the [Super Binary](../formats/bsup.md) output format, +When querying data to the [Super Binary](../formats/bsup) output format, output from a pool can be easily piped to other commands like `super`, e.g., ``` super db query -f bsup 'from logs' | super -f table -c 'count() by field' - @@ -699,7 +699,7 @@ By default, the `query` command scans pool data in pool-key order though the query optimizer may, in general, reorder the scan to optimize searches, aggregations, and joins. An order hint can be supplied to the `query` command to indicate to -the optimizer the desired processing order, but in general, [`sort` operators](../language/operators/sort.md) +the optimizer the desired processing order, but in general, [`sort` operators](../language/operators/sort) should be used to guarantee any particular sort order. 
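To make this concrete, here is a sketch of a complete query (the pool name `logs` and the fields `ts` and `type` are assumed) that filters on the pool key, aggregates, and then pins the output order explicitly:
```
super db query -f jsup 'from logs | ts >= 2018-03-24T17:00:00Z | count() by type | sort count'
```
Because the filter operates on the pool key, the scan can take advantage of the seek indexes described above, while the trailing `sort` guarantees the result order regardless of how the optimizer schedules the scan.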
Arbitrarily complex queries can be executed over the lake in this fashion @@ -773,7 +773,7 @@ super db serve [options] ``` The `serve` command implements the [server personality](#command-personalities) to service requests from instances of the client personality. -It listens for [lake API](../lake/api.md) requests on the interface and port +It listens for [lake API](../lake/api) requests on the interface and port specified by the `-l` option, executes the requests, and returns results. The `-log.level` option controls log verbosity. Available levels, ordered diff --git a/docs/commands/super.md b/docs/commands/super.md index 2feb835b8c..99eb9a46de 100644 --- a/docs/commands/super.md +++ b/docs/commands/super.md @@ -3,10 +3,10 @@ weight: 1 title: super --- -> **TL;DR** `super` is a command-line tool that uses [SuperSQL](../language/_index.md) -> to query a variety of data formats in files, over HTTP, or in [S3](../integrations/amazon-s3.md) +> **TL;DR** `super` is a command-line tool that uses [SuperSQL](../language) +> to query a variety of data formats in files, over HTTP, or in [S3](../integrations/amazon-s3) > storage. Best performance is achieved when operating on data in binary formats such as -> [Super Binary](../formats/bsup.md), [Super Columnar](../formats/csup.md), +> [Super Binary](../formats/bsup), [Super Columnar](../formats/csup), > [Parquet](https://github.com/apache/parquet-format), or > [Arrow](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format). @@ -18,7 +18,7 @@ super [ options ] [ -c query ] input [ input ... ] `super` is a command-line tool for processing data in diverse input formats, providing data wrangling, search, analytics, and extensive transformations -using the [SuperSQL](../language/_index.md) dialect of SQL. Any SQL query expression +using the [SuperSQL](../language) dialect of SQL. Any SQL query expression may be extended with [pipe syntax](https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/) to filter, transform, and/or analyze input data. Super's SQL pipes dialect is extensive, so much so that it can resemble @@ -26,12 +26,12 @@ a log-search experience despite its SQL foundation. The `super` command works with data from ephemeral sources like files and URLs. If you want to persist your data into a data lake for persistent storage, -check out the [`super db`](super-db.md) set of commands. +check out the [`super db`](super-db) set of commands. -By invoking the `-c` option, a query expressed in the [SuperSQL language](../language/_index.md) +By invoking the `-c` option, a query expressed in the [SuperSQL language](../language) may be specified and applied to the input stream. -The [super data model](../formats/zed.md) is based on [super-structured data](../formats/_index.md#2-a-super-structured-pattern), meaning that all data +The [super data model](../formats/zed) is based on [super-structured data](../formats#2-a-super-structured-pattern), meaning that all data is both strongly _and_ dynamically typed and need not conform to a homogeneous schema. The type structure is self-describing so it's easy to daisy-chain queries and inspect data at any point in a complex query or data pipeline. @@ -96,7 +96,7 @@ is equivalent to SELECT VALUE 1+1 ``` To learn more about shortcuts, refer to the SuperSQL -[documentation on shortcuts](../language/pipeline-model.md#implied-operators). +[documentation on shortcuts](../language/pipeline-model#implied-operators). 
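For instance, a minimal sketch chaining two shortcuts (the sample values are invented):
```
echo '{x:1}{x:2}' | super -z -c 'x > 1 | y:=x+1' -
```
is shorthand for the spelled-out form `where x > 1 | put y:=x+1`.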
For built-in command help and a listing of all available options, simply run `super` with no arguments. @@ -104,9 +104,9 @@ simply run `super` with no arguments. ## Data Formats `super` supports a number of [input](#input-formats) and [output](#output-formats) formats, but the super formats -([Super Binary](../formats/bsup.md), -[Super Columnar](../formats/csup.md), -and [Super JSON](../formats/jsup.md)) tend to be the most versatile and +([Super Binary](../formats/bsup), +[Super Columnar](../formats/csup), +and [Super JSON](../formats/jsup)) tend to be the most versatile and easy to work with. `super` typically operates on binary-encoded data and when you want to inspect @@ -126,12 +126,12 @@ in the order appearing on the command line forming the input stream. | Option | Auto | Specification | |-----------|------|------------------------------------------| | `arrows` | yes | [Arrow IPC Stream Format](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format) | -| `bsup` | yes | [Super Binary](../formats/bsup.md) | -| `csup` | yes | [Super Columnar](../formats/csup.md) | +| `bsup` | yes | [Super Binary](../formats/bsup) | +| `csup` | yes | [Super Columnar](../formats/csup) | | `csv` | yes | [Comma-Separated Values (RFC 4180)](https://www.rfc-editor.org/rfc/rfc4180.html) | | `json` | yes | [JSON (RFC 8259)](https://www.rfc-editor.org/rfc/rfc8259.html) | -| `jsup` | yes | [Super JSON](../formats/jsup.md) | -| `zjson` | yes | [Super JSON over JSON](../formats/zjson.md) | +| `jsup` | yes | [Super JSON](../formats/jsup) | +| `zjson` | yes | [Super JSON over JSON](../formats/zjson) | | `line` | no | One string value per input line | | `parquet` | yes | [Apache Parquet](https://github.com/apache/parquet-format) | | `tsv` | yes | [Tab-Separated Values](https://en.wikipedia.org/wiki/Tab-separated_values) | @@ -177,7 +177,7 @@ would produce this output in the default Super JSON format #### JSON Auto-detection: Super vs. Plain -Since [Super JSON](../formats/jsup.md) is a superset of plain JSON, `super` must be careful how it distinguishes the two cases when performing auto-inference. +Since [Super JSON](../formats/jsup) is a superset of plain JSON, `super` must be careful how it distinguishes the two cases when performing auto-inference. While you can always clarify your intent via `-i jsup` or `-i json`, `super` attempts to "just do the right thing" when you run it with Super JSON vs. plain JSON. @@ -188,8 +188,8 @@ not desirable because (1) the Super JSON parser is not particularly performant a JSON any number that appears without a decimal point as an integer type. :::tip note -The reason `super` is not particularly performant for Super JSON is that the [Super Binary](../formats/bsup.md) or -[Super Columnar](../formats/csup.md) formats are semantically equivalent to Super JSON but much more efficient and +The reason `super` is not particularly performant for Super JSON is that the [Super Binary](../formats/bsup) or +[Super Columnar](../formats/csup) formats are semantically equivalent to Super JSON but much more efficient and the design intent is that these efficient binary formats should be used in use cases where performance matters. Super JSON is typically used only when data needs to be human-readable in interactive settings or in automated tests. @@ -210,12 +210,12 @@ typically omit quotes around field names. 
| Option | Specification | |-----------|------------------------------------------| | `arrows` | [Arrow IPC Stream Format](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format) | -| `bsup` | [Super Binary](../formats/bsup.md) | -| `csup` | [Super Columnar](../formats/csup.md) | +| `bsup` | [Super Binary](../formats/bsup) | +| `csup` | [Super Columnar](../formats/csup) | | `csv` | [Comma-Separated Values (RFC 4180)](https://www.rfc-editor.org/rfc/rfc4180.html) | | `json` | [JSON (RFC 8259)](https://www.rfc-editor.org/rfc/rfc8259.html) | -| `jsup` | [Super JSON](../formats/jsup.md) | -| `zjson` | [Super JSON over JSON](../formats/zjson.md) | +| `jsup` | [Super JSON](../formats/jsup) | +| `zjson` | [Super JSON over JSON](../formats/zjson) | | `lake` | [SuperDB Data Lake Metadata Output](#superdb-data-lake-metadata-output) | | `parquet` | [Apache Parquet](https://github.com/apache/parquet-format) | | `table` | (described [below](#simplified-text-outputs)) | @@ -355,7 +355,7 @@ parquetio: encountered multiple types (consider 'fuse'): {x:int64} and {s:string ##### Fusing Schemas -As suggested by the error above, the [`fuse` operator](../language/operators/fuse.md) can merge different record +As suggested by the error above, the [`fuse` operator](../language/operators/fuse) can merge different record types into a blended type, e.g., here we create the file and read it back: ```mdtest-command echo '{x:1}{s:"hello"}' | super -o out.parquet -f parquet -c fuse - @@ -404,8 +404,8 @@ input formats. They may be a good fit for use with other text-based shell tools, but due to their limitations should be used with care. In `text` output, minimal formatting is applied, e.g., strings are shown -without quotes and brackets are dropped from [arrays](../formats/zed.md#22-array) -and [sets](../formats/zed.md#23-set). [Records](../formats/zed.md#21-record) +without quotes and brackets are dropped from [arrays](../formats/zed#22-array) +and [sets](../formats/zed#23-set). [Records](../formats/zed#21-record) are printed as tab-separated field values without their corresponding field names. For example: @@ -448,7 +448,7 @@ word style hello greeting ``` -If this is undesirable, the [`fuse` operator](../language/operators/fuse.md) +If this is undesirable, the [`fuse` operator](../language/operators/fuse) may prove useful to unify the input stream under a single record type that can be described with a single header line. Doing this to our last example, we find @@ -466,12 +466,12 @@ hello - greeting #### SuperDB Data Lake Metadata Output The `lake` format is used to pretty-print lake metadata, such as in -[`super db` sub-command](super-db.md) outputs. Because it's `super db`'s default output format, +[`super db` sub-command](super-db) outputs. Because it's `super db`'s default output format, it's rare to request it explicitly via `-f`. However, since it's possible for -`super db` to [generate output in any supported format](super-db.md#super-db-commands), +`super db` to [generate output in any supported format](super-db#super-db-commands), the `lake` format is useful to reverse this. -For example, imagine you'd executed a [meta-query](super-db.md#meta-queries) via +For example, imagine you'd executed a [meta-query](super-db#meta-queries) via `super db query -Z "from :pools"` and saved the output in this file `pools.jsup`. 
```mdtest-input pools.jsup @@ -509,13 +509,13 @@ If you are ever stumped about how the `super` compiler is parsing your query, you can always run `super -C` to compile and display your query in canonical form without running it. This can be especially handy when you are learning the language and -[its shortcuts](../language/pipeline-model.md#implied-operators). +[its shortcuts](../language/pipeline-model#implied-operators). For example, this query ```mdtest-command super -C -c 'has(foo)' ``` -is an implied [`where` operator](../language/operators/where.md), which matches values +is an implied [`where` operator](../language/operators/where), which matches values that have a field `foo`, i.e., ```mdtest-output where has(foo) @@ -524,7 +524,7 @@ while this query ```mdtest-command super -C -c 'a:=x+1' ``` -is an implied [`put` operator](../language/operators/put.md), which creates a new field `a` +is an implied [`put` operator](../language/operators/put), which creates a new field `a` with the value `x+1`, i.e., ```mdtest-output put a:=x+1 @@ -538,10 +538,10 @@ as soon as they happen and cause the `super` process to exit. On the other hand, runtime errors resulting from the query itself do not halt execution. Instead, these error conditions produce -[first-class errors](../language/data-types.md#first-class-errors) +[first-class errors](../language/data-types#first-class-errors) in the data output stream interleaved with any valid results. Such errors are easily queried with the -[`is_error` function](../language/functions/is_error.md). +[`is_error` function](../language/functions/is_error). This approach provides a robust technique for debugging complex queries, where errors can be wrapped in one another providing stack-trace-like debugging @@ -571,15 +571,15 @@ error("divide by zero") ## Examples -As you may have noticed, many examples of the [SuperSQL language](../language/_index.md) +As you may have noticed, many examples of the [SuperSQL language](../language) are illustrated using this pattern ``` echo | super -c - ``` -which is used throughout the [language documentation](../language/_index.md) -and [operator reference](../language/operators/_index.md). +which is used throughout the [language documentation](../language) +and [operator reference](../language/operators). -The language documentation and [tutorials directory](../tutorials/_index.md) +The language documentation and [tutorials directory](../tutorials) have many examples, but here are a few more simple `super` use cases. 
_Hello, world_ @@ -591,7 +591,7 @@ produces this Super JSON output "hello, world" ``` -_Some values of available [data types](../language/data-types.md)_ +_Some values of available [data types](../language/data-types)_ ```mdtest-command echo '1 1.5 [1,"foo"] |["apple","banana"]|' | super -z - ``` @@ -613,7 +613,7 @@ produces <[(int64,string)]> <|[string]|> ``` -_A simple [aggregation](../language/aggregates/_index.md)_ +_A simple [aggregation](../language/aggregates)_ ```mdtest-command echo '{key:"foo",val:1}{key:"bar",val:2}{key:"foo",val:3}' | super -z -c 'sum(val) by key | sort key' - @@ -623,7 +623,7 @@ produces {key:"bar",sum:2} {key:"foo",sum:4} ``` -_Read CSV input and [cast](../language/functions/cast.md) a to an integer from default float_ +_Read CSV input and [cast](../language/functions/cast) a to an integer from default float_ ```mdtest-command printf "a,b\n1,foo\n2,bar\n" | super -z -c 'a:=int64(a)' - ``` @@ -669,7 +669,7 @@ measurements among SuperDB, We'll use the Parquet format to compare apples to apples and also report results for the custom columnar database format of DuckDB, the [new beta JSON type](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse) of ClickHouse, -and the [Super Binary](../formats/bsup.md) format used by `super`. +and the [Super Binary](../formats/bsup) format used by `super`. The detailed steps shown [below](#appendix-2-running-the-tests) can be reproduced via [automated scripts](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf). @@ -725,7 +725,7 @@ clickhouse-client --query " INSERT INTO gha SELECT * FROM file('gharchive_gz/*.json.gz', JSONAsObject);" ``` To create a super-structed file for the `super` command, there is no need to -[`fuse`](../language/operators/fuse.md) the data into a single schema (though `super` can still work with the fused +[`fuse`](../language/operators/fuse) the data into a single schema (though `super` can still work with the fused schema in the Parquet file), and we simply ran this command to create a Super Binary file: ``` @@ -781,7 +781,7 @@ FROM 'gha' WHERE v.payload.pull_request.body LIKE '%in case you have any feedback 😊%' ``` SuperSQL supports `LIKE` and could run the plain SQL query, but it also has a -similar function called [`grep`](../language/functions/grep.md) that can operate over specified fields or +similar function called [`grep`](../language/functions/grep) that can operate over specified fields or default to all the string fields in any value. The SuperSQL query that uses `grep` is ```sql @@ -949,7 +949,7 @@ Dynamic: In scope SELECT tupleElement(arrayJoin(v.payload.pull_request.assignees SuperSQL's data model does not require these kinds of gymnastics as everything does not have to be jammed into a table. Instead, we can use the -`UNNEST` pipe operator combined with the [spread operator](../language/expressions.md#array-expressions) applied to the array of +`UNNEST` pipe operator combined with the [spread operator](../language/expressions#array-expressions) applied to the array of string fields to easily produce a stream of string values representing the assignees. 
Then we simply aggregate the assignee stream: ```sql @@ -1092,7 +1092,7 @@ WHERE v.payload.pull_request.body LIKE '%in case you have any feedback 😊%' Benchmark 1: clickhouse-client --queries-file /mnt/tmpdir/tmp.oymd2K7311 2 Time (abs ≡): 0.904 s [User: 0.038 s, System: 0.030 s] - + About to execute ================ clickhouse --queries-file /mnt/tmpdir/tmp.K3EjBntwdo @@ -1107,7 +1107,7 @@ WHERE payload.pull_request.body LIKE '%in case you have any feedback 😊%' Benchmark 1: clickhouse --queries-file /mnt/tmpdir/tmp.K3EjBntwdo 2 Time (abs ≡): 70.647 s [User: 70.320 s, System: 3.447 s] - + About to execute ================ datafusion-cli --file /mnt/tmpdir/tmp.zSkYYYeSG6 @@ -1126,11 +1126,11 @@ DataFusion CLI v43.0.0 +---------+ | 2 | +---------+ -1 row(s) fetched. +1 row(s) fetched. Elapsed 10.764 seconds. Time (abs ≡): 10.990 s [User: 66.344 s, System: 10.974 s] - + About to execute ================ duckdb /mnt/gha.db < /mnt/tmpdir/tmp.31z1ThfK6B @@ -1150,7 +1150,7 @@ Benchmark 1: duckdb /mnt/gha.db < /mnt/tmpdir/tmp.31z1ThfK6B │ 2 │ └──────────────┘ Time (abs ≡): 12.985 s [User: 78.328 s, System: 9.270 s] - + About to execute ================ duckdb < /mnt/tmpdir/tmp.x2HfLY0RBU @@ -1170,7 +1170,7 @@ Benchmark 1: duckdb < /mnt/tmpdir/tmp.x2HfLY0RBU │ 2 │ └──────────────┘ Time (abs ≡): 13.356 s [User: 89.551 s, System: 6.785 s] - + About to execute ================ super -z -I /mnt/tmpdir/tmp.KmM8c3l1gb @@ -1208,7 +1208,7 @@ WHERE Benchmark 1: clickhouse-client --queries-file /mnt/tmpdir/tmp.tgIZkIc6XA 3 Time (abs ≡): 13.244 s [User: 0.058 s, System: 0.022 s] - + About to execute ================ clickhouse --queries-file /mnt/tmpdir/tmp.0ENj1f6lI8 @@ -1227,7 +1227,7 @@ WHERE Benchmark 1: clickhouse --queries-file /mnt/tmpdir/tmp.0ENj1f6lI8 3 Time (abs ≡): 870.218 s [User: 950.089 s, System: 18.760 s] - + About to execute ================ datafusion-cli --file /mnt/tmpdir/tmp.veTUjcdQto @@ -1250,11 +1250,11 @@ DataFusion CLI v43.0.0 +---------+ | 3 | +---------+ -1 row(s) fetched. +1 row(s) fetched. Elapsed 21.422 seconds. Time (abs ≡): 21.661 s [User: 129.457 s, System: 19.646 s] - + About to execute ================ duckdb /mnt/gha.db < /mnt/tmpdir/tmp.CcmsLBMCmv @@ -1278,7 +1278,7 @@ Benchmark 1: duckdb /mnt/gha.db < /mnt/tmpdir/tmp.CcmsLBMCmv │ 3 │ └──────────────┘ Time (abs ≡): 20.043 s [User: 137.850 s, System: 10.587 s] - + About to execute ================ duckdb < /mnt/tmpdir/tmp.BI1AC3TnV2 @@ -1302,7 +1302,7 @@ Benchmark 1: duckdb < /mnt/tmpdir/tmp.BI1AC3TnV2 │ 3 │ └──────────────┘ Time (abs ≡): 21.352 s [User: 144.078 s, System: 9.044 s] - + About to execute ================ super -z -I /mnt/tmpdir/tmp.v0WfEuBi8J @@ -1336,7 +1336,7 @@ WHERE v.actor.login='johnbieren' Benchmark 1: clickhouse-client --queries-file /mnt/tmpdir/tmp.CFT0wwiAbD 879 Time (abs ≡): 0.080 s [User: 0.025 s, System: 0.018 s] - + About to execute ================ clickhouse --queries-file /mnt/tmpdir/tmp.XFTW0X911r @@ -1351,7 +1351,7 @@ WHERE actor.login='johnbieren' Benchmark 1: clickhouse --queries-file /mnt/tmpdir/tmp.XFTW0X911r 879 Time (abs ≡): 0.954 s [User: 0.809 s, System: 0.164 s] - + About to execute ================ datafusion-cli --file /mnt/tmpdir/tmp.QLU5fBDx7L @@ -1370,11 +1370,11 @@ DataFusion CLI v43.0.0 +---------+ | 879 | +---------+ -1 row(s) fetched. +1 row(s) fetched. Elapsed 0.340 seconds. 
Time (abs ≡): 0.388 s [User: 1.601 s, System: 0.417 s] - + About to execute ================ duckdb /mnt/gha.db < /mnt/tmpdir/tmp.WVteXNRqfp @@ -1394,7 +1394,7 @@ Benchmark 1: duckdb /mnt/gha.db < /mnt/tmpdir/tmp.WVteXNRqfp │ 879 │ └──────────────┘ Time (abs ≡): 0.177 s [User: 1.011 s, System: 0.137 s] - + About to execute ================ duckdb < /mnt/tmpdir/tmp.b5T64pDmwq @@ -1414,7 +1414,7 @@ Benchmark 1: duckdb < /mnt/tmpdir/tmp.b5T64pDmwq │ 879 │ └──────────────┘ Time (abs ≡): 0.416 s [User: 2.235 s, System: 0.187 s] - + About to execute ================ super -z -I /mnt/tmpdir/tmp.s5e3Ueg2zU @@ -1429,7 +1429,7 @@ WHERE actor.login='johnbieren' Benchmark 1: super -z -I /mnt/tmpdir/tmp.s5e3Ueg2zU {count:879(uint64)} Time (abs ≡): 5.830 s [User: 17.284 s, System: 1.737 s] - + About to execute ================ SUPER_VAM=1 super -z -I /mnt/tmpdir/tmp.2f1t2J9pWR @@ -1472,7 +1472,7 @@ Benchmark 1: clickhouse-client --queries-file /mnt/tmpdir/tmp.hFAMHegng8 30 IssueCommentEvent 35 PullRequestEvent Time (abs ≡): 0.132 s [User: 0.034 s, System: 0.018 s] - + About to execute ================ clickhouse --queries-file /mnt/tmpdir/tmp.MiXEgFCu5o @@ -1495,7 +1495,7 @@ Benchmark 1: clickhouse --queries-file /mnt/tmpdir/tmp.MiXEgFCu5o 3 ForkEvent 35 PullRequestEvent Time (abs ≡): 0.864 s [User: 0.747 s, System: 0.180 s] - + About to execute ================ datafusion-cli --file /mnt/tmpdir/tmp.uI0r2dLw8f @@ -1522,11 +1522,11 @@ DataFusion CLI v43.0.0 | 9 | IssuesEvent | | 29 | WatchEvent | +---------+-------------------------------+ -8 row(s) fetched. +8 row(s) fetched. Elapsed 0.315 seconds. Time (abs ≡): 0.358 s [User: 1.385 s, System: 0.404 s] - + About to execute ================ duckdb /mnt/gha.db < /mnt/tmpdir/tmp.Nqj23A926J @@ -1554,7 +1554,7 @@ Benchmark 1: duckdb /mnt/gha.db < /mnt/tmpdir/tmp.Nqj23A926J │ 35 │ PullRequestEvent │ └──────────────┴───────────────────────────────┘ Time (abs ≡): 0.143 s [User: 0.722 s, System: 0.162 s] - + About to execute ================ duckdb < /mnt/tmpdir/tmp.LepFhAA9Y3 @@ -1582,7 +1582,7 @@ Benchmark 1: duckdb < /mnt/tmpdir/tmp.LepFhAA9Y3 │ 29 │ WatchEvent │ └──────────────┴───────────────────────────────┘ Time (abs ≡): 0.318 s [User: 1.547 s, System: 0.159 s] - + About to execute ================ super -z -I /mnt/tmpdir/tmp.oWK2c4UwIp @@ -1605,7 +1605,7 @@ Benchmark 1: super -z -I /mnt/tmpdir/tmp.oWK2c4UwIp {type:"PullRequestEvent",count:35(uint64)} {type:"PushEvent",count:15(uint64)} Time (abs ≡): 5.692 s [User: 15.531 s, System: 1.644 s] - + About to execute ================ SUPER_VAM=1 super -z -I /mnt/tmpdir/tmp.S1AYE55Oyi @@ -1661,7 +1661,7 @@ tmtmtmtm 356 AMatutat 260 danwinship 208 Time (abs ≡): 72.059 s [User: 142.588 s, System: 6.638 s] - + About to execute ================ datafusion-cli --file /mnt/tmpdir/tmp.bWB9scRPum @@ -1696,11 +1696,11 @@ DataFusion CLI v43.0.0 | AMatutat | 260 | | danwinship | 208 | +-----------------+-------+ -5 row(s) fetched. +5 row(s) fetched. Elapsed 24.234 seconds. 
Time (abs ≡): 24.575 s [User: 163.931 s, System: 24.758 s] - + About to execute ================ duckdb /mnt/gha.db < /mnt/tmpdir/tmp.3724dO4AgT @@ -1734,7 +1734,7 @@ Benchmark 1: duckdb /mnt/gha.db < /mnt/tmpdir/tmp.3724dO4AgT │ danwinship │ 208 │ └─────────────────┴───────┘ Time (abs ≡): 520.980 s [User: 4062.107 s, System: 15.406 s] - + About to execute ================ duckdb < /mnt/tmpdir/tmp.WcA1AOl9UB @@ -1768,7 +1768,7 @@ Benchmark 1: duckdb < /mnt/tmpdir/tmp.WcA1AOl9UB │ danwinship │ 208 │ └─────────────────┴───────┘ Time (abs ≡): 503.567 s [User: 3747.792 s, System: 10.013 s] - + About to execute ================ super -z -I /mnt/tmpdir/tmp.iTtaFeoj74 diff --git a/docs/formats/_index.md b/docs/formats/_index.md index e7fa09c7bf..c7c999dea3 100644 --- a/docs/formats/_index.md +++ b/docs/formats/_index.md @@ -6,12 +6,12 @@ weight: 5 > **TL;DR** The super data model defines a new and easy way to manage, store, > and process data utilizing an emerging concept called [super-structured data](#2-a-super-structured-pattern). -> The [data model specification](zed.md) defines the high-level model that is realized +> The [data model specification](zed) defines the high-level model that is realized > in a [family of interoperable serialization formats](#3-the-data-model-and-formats), > providing a unified approach to row, columnar, and human-readable formats. Together these > represent a superset of both the dataframe/table model of relational systems and the > semi-structured model that is used ubiquitously in development as JSON and by NoSQL -> data stores. The Super JSON spec has [a few examples](jsup.md#3-examples). +> data stores. The Super JSON spec has [a few examples](jsup#3-examples). ## 1. Background @@ -123,7 +123,7 @@ then such a collection of records looks precisely like a relational table. Here, the record type of such a collection corresponds to a well-defined schema consisting of field names (i.e, column names) where each field has a specific type. -[Named types](../language/data-types.md#named-types) are also available, so by simply naming a particular record type +[Named types](../language/data-types#named-types) are also available, so by simply naming a particular record type (i.e., a schema), a relational table can be projected from a pool of data with a simple query for that named type. @@ -224,7 +224,7 @@ object is just as performant as a traditional schema-based columnar format like ### 2.4 First-class Types -With [first-class types](../language/data-types.md#first-class-types), any type can also be a value, which means that in +With [first-class types](../language/data-types#first-class-types), any type can also be a value, which means that in a properly designed query and analytics system based on the super data model, a type can appear anywhere that a value can appear. In particular, types can be aggregation keys. @@ -256,7 +256,7 @@ In SQL based systems, errors typically result in cryptic messages or null values offering little insight as to the actual cause of the error. -By comparison, SuperDB includes [first-class errors](../language/data-types.md#first-class-errors). When combined with the super +By comparison, SuperDB includes [first-class errors](../language/data-types#first-class-errors). When combined with the super data model, error values may appear anywhere in the output and operators can propagate or easily wrap errors so complicated analytics pipelines can be debugged by observing the location of errors in the output results. 
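As a small illustration of the idea (a sketch following the `echo`-based pattern used in the `super` command docs), a runtime failure becomes a queryable value rather than a halted pipeline:
```
echo '{a:2}{a:0}' | super -z -c 'yield 10/a' -
```
Here the second input value would surface inline as something like `error("divide by zero")` alongside the valid result, where it can be located and filtered with the `is_error` function.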
@@ -264,24 +264,24 @@ can be debugged by observing the location of errors in the output results. ## 3. The Data Model and Formats The concept of super-structured data and first-class types and errors -is solidified in the [data model specification](zed.md), +is solidified in the [data model specification](zed), which defines the model but not the serialization formats. A set of companion documents define a family of tightly integrated serialization formats that all adhere to the same super data model, providing a unified approach to row, columnar, and human-readable formats: -* [Super JSON](jsup.md) is a human-readable format for super-structured data. All JSON +* [Super JSON](jsup) is a human-readable format for super-structured data. All JSON documents are Super JSON values as the Super JSON format is a strict superset of the JSON syntax. -* [Super Binary](bsup.md) is a row-based, binary representation somewhat like +* [Super Binary](bsup) is a row-based, binary representation somewhat like Avro but leveraging the super data model to represent a sequence of arbitrarily-typed values. -* [Super Columnar](csup.md) is columnar like Parquet or ORC but also +* [Super Columnar](csup) is columnar like Parquet or ORC but also embodies the super data model for heterogeneous and self-describing schemas. -* [Super JSON over JSON](zjson.md) defines a format for encapsulating Super JSON +* [Super JSON over JSON](zjson) defines a format for encapsulating Super JSON inside plain JSON for easy decoding by JSON-based clients, e.g., the [JavaScript library used by SuperDB Desktop](https://github.com/brimdata/zui/tree/main/packages/zed-js) -and the [SuperDB Python library](../libraries/python.md). +and the [SuperDB Python library](../libraries/python). Because all of the formats conform to the same super data model, conversions between a human-readable form, a row-based binary form, and a row-based columnar form can diff --git a/docs/formats/bsup.md b/docs/formats/bsup.md index 333f245d24..5f68790385 100644 --- a/docs/formats/bsup.md +++ b/docs/formats/bsup.md @@ -7,7 +7,7 @@ heading: Super Binary Specification ## 1. Introduction Super Binary is an efficient, sequence-oriented serialization format for any data -conforming to the [super data model](zed.md). +conforming to the [super data model](zed). Super Binary is "row oriented" and analogous to [Apache Avro](https://avro.apache.org) but does not @@ -128,7 +128,7 @@ but is useful to an implementation to deterministically size decompression buffers in advance of decoding. Values for the `format` byte are defined in the -[Super Binary compression format specification](./compression.md). +[Super Binary compression format specification](./compression). :::tip note This arrangement of frames separating types and values allows @@ -212,7 +212,7 @@ of the length of the string followed by that many bytes of UTF-8 encoded string data. :::tip note -As defined by [Super JSON](jsup.md), a field name can be any valid UTF-8 string much like JSON +As defined by [Super JSON](jsup), a field name can be any valid UTF-8 string much like JSON objects can be indexed with arbitrary string keys (via index operator) even if the field names available to the dot operator are restricted by language syntax for identifiers. @@ -309,7 +309,7 @@ existing type ID ``. `` is encoded as a `uvarint` and ` is encoded as a `uvarint` representing the length of the name in bytes, followed by that many bytes of UTF-8 string. 
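As a worked example of this name encoding (hex shown for illustration), a type name such as `port` would be serialized as a `uvarint` length of 4 followed by its UTF-8 bytes:
```
04 70 6f 72 74   <- uvarint length 4, then the UTF-8 bytes of "port"
```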
-As indicated in the [data model](zed.md), +As indicated in the [data model](zed), it is an error to define a type name that has the same name as a primitive type, and it is permissible to redefine a previously defined type name with a type that differs from the previous definition. diff --git a/docs/formats/compression.md b/docs/formats/compression.md index 4b3ed3e40a..b9cecabf39 100644 --- a/docs/formats/compression.md +++ b/docs/formats/compression.md @@ -5,7 +5,7 @@ heading: ZNG Compression Types --- This document specifies values for the `` byte of a -[Super Binary compressed value message block](bsup.md#2-the-super-binary-format) +[Super Binary compressed value message block](bsup#2-the-super-binary-format) and the corresponding algorithms for the `` byte sequence. As new compression algorithms are specified, they will be documented diff --git a/docs/formats/csup.md b/docs/formats/csup.md index 133a6e7322..7c3bd28ed2 100644 --- a/docs/formats/csup.md +++ b/docs/formats/csup.md @@ -5,9 +5,9 @@ heading: Super Columnar Specification --- Super Columnar is a file format based on -the [super data model](zed.md) where data is stacked to form columns. +the [super data model](zed) where data is stacked to form columns. Its purpose is to provide for efficient analytics and search over -bounded-length sequences of [super-structured data](./_index.md#2-a-super-structured-pattern) that is stored in columnar form. +bounded-length sequences of [super-structured data](.#2-a-super-structured-pattern) that is stored in columnar form. Like [Parquet](https://github.com/apache/parquet-format), Super Columnar provides an efficient representation for semi-structured data, @@ -76,9 +76,9 @@ merging them together (or even leaving the Super Columnar entity as separate fil The data section contains raw data values organized into _segments_, where a segment is a seek offset and byte length relative to the data section. Each segment contains a sequence of -[primitive-type values](zed.md#1-primitive-types), +[primitive-type values](zed#1-primitive-types), encoded as counted-length byte sequences where the counted-length is -variable-length encoded as in the [Super Binary specification](bsup.md). +variable-length encoded as in the [Super Binary specification](bsup). Segments may be compressed. There is no information in the data section for how segments relate @@ -388,7 +388,7 @@ using the same tag within the union value. ### Hello, world -Start with this [Super JSON](jsup.md) file `hello.jsup`: +Start with this [Super JSON](jsup) file `hello.jsup`: ``` {a:"hello",b:"world"} {a:"goodnight",b:"gracie"} diff --git a/docs/formats/jsup.md b/docs/formats/jsup.md index 44d24e8257..f1f1816dfc 100644 --- a/docs/formats/jsup.md +++ b/docs/formats/jsup.md @@ -7,7 +7,7 @@ heading: Super JSON Specification ## 1. Introduction Super JSON is the human-readable, text-based serialization format of -the [super data model](zed.md). +the [super data model](zed). Super JSON builds upon the elegant simplicity of JSON with "type decorators". 
Where the type of a value is not implied by its syntax, a parenthesized @@ -90,7 +90,7 @@ same name, from the case that an existing named type is merely decorating the va ### 2.3 Primitive Values The type names and format for -[primitive values](zed.md#1-primitive-types) is as follows: +[primitive values](zed#1-primitive-types) is as follows: | Type | Value Format | |------------|---------------------------------------------------------------| @@ -225,13 +225,13 @@ record types as well as enum symbols. Complex values are built from primitive values and/or other complex values and conform to the super data model's complex types: -[record](zed.md#21-record), -[array](zed.md#22-array), -[set](zed.md#23-set), -[map](zed.md#24-map), -[union](zed.md#25-union), -[enum](zed.md#26-enum), and -[error](zed.md#27-error). +[record](zed#21-record), +[array](zed#22-array), +[set](zed#23-set), +[map](zed#24-map), +[union](zed#25-union), +[enum](zed#26-enum), and +[error](zed#27-error). Complex values have an implied type when their constituent values all have implied types. diff --git a/docs/formats/zjson.md b/docs/formats/zjson.md index f4e8d506fa..3e9ebd0cb1 100644 --- a/docs/formats/zjson.md +++ b/docs/formats/zjson.md @@ -6,14 +6,14 @@ heading: ZJSON Specification ## 1. Introduction -The [super data model](zed.md) +The [super data model](zed) is based on richly typed records with a deterministic field order, -as is implemented by the [Super JSON](jsup.md), [Super Binary](bsup.md), and [Super Columnar](csup.md) formats. +as is implemented by the [Super JSON](jsup), [Super Binary](bsup), and [Super Columnar](csup) formats. Given the ubiquity of JSON, it is desirable to also be able to serialize super data into the JSON format. However, encoding super data values directly as JSON values would not work without loss of information. -For example, consider this [Super JSON](jsup.md) data: +For example, consider this [Super JSON](jsup) data: ``` { ts: 2018-03-24T17:15:21.926018012Z, @@ -60,7 +60,7 @@ Also, it is at the whim of a JSON implementation whether or not the order of object keys is preserved. While JSON is well suited for data exchange of generic information, it is not -sufficient for the [super-structured data model](./_index.md#2-a-super-structured-pattern). +sufficient for the [super-structured data model](.#2-a-super-structured-pattern). That said, JSON can be used as an encoding format for super data with another layer of encoding on top of a JSON-based protocol. This allows clients like web apps or Electron apps to receive and understand Super JSON and, with the help of client @@ -92,7 +92,7 @@ The type and value fields are encoded as defined below. ### 2.1 Type Encoding -The type encoding for a primitive type is simply its [type name](zed.md#1-primitive-types) +The type encoding for a primitive type is simply its [type name](zed#1-primitive-types) e.g., "int32" or "string". Complex types are encoded with small-integer identifiers. @@ -256,7 +256,7 @@ as described recursively herein, * a type value is encoded [as above](#21-type-encoding), * each primitive that is not a type value is encoded as a string conforming to its Super JSON representation, as described in the -[corresponding section of the Super JSON specification](jsup.md#23-primitive-values). +[corresponding section of the Super JSON specification](jsup#23-primitive-values). 
For example, a record with three fields --- a string, an array of integers, and an array of union of string, and float64 --- might have a value that looks like this: @@ -268,7 +268,7 @@ and an array of union of string, and float64 --- might have a value that looks l A ZJSON file is composed of ZJSON objects formatted as [newline delimited JSON (NDJSON)](https://en.wikipedia.org/wiki/JSON_streaming#NDJSON). -e.g., the [super](../commands/super.md) CLI command +e.g., the [super](../commands/super) CLI command writes its ZJSON output as lines of NDJSON. ## 4. Example diff --git a/docs/install.md b/docs/install.md index fe1e8db596..afed1efd0e 100644 --- a/docs/install.md +++ b/docs/install.md @@ -9,7 +9,7 @@ Several options for installing `super` are available: * [Build from source](#building-from-source). To install the SuperDB Python client, see the -[Python library documentation](libraries/python.md). +[Python library documentation](libraries/python). ## Homebrew diff --git a/docs/integrations/fluentd.md b/docs/integrations/fluentd.md index 9688930bfe..9a916d7bba 100644 --- a/docs/integrations/fluentd.md +++ b/docs/integrations/fluentd.md @@ -4,7 +4,7 @@ title: Fluentd --- The [Fluentd](https://www.fluentd.org/) open source data collector can be used -to push log data to a [SuperDB data lake](../commands/super-db.md) in a continuous manner. +to push log data to a [SuperDB data lake](../commands/super-db) in a continuous manner. This allows for querying near-"live" event data to enable use cases such as dashboarding and alerting in addition to creating a long-running historical record for archiving and analytics. @@ -12,7 +12,7 @@ record for archiving and analytics. This guide walks through two simple configurations of Fluentd with a Zed lake that can be used as reference for starting your own production configuration. As it's a data source important to many in the Zed community, log data from -[Zeek](./zeek/_index.md) is used in this guide. The approach shown can be +[Zeek](./zeek) is used in this guide. The approach shown can be easily adapted to any log data source. ## Software @@ -59,7 +59,7 @@ After making these changes, Zeek was started by running A binary [release package](https://github.com/brimdata/super/releases) of Zed executables compatible with our instance was downloaded and unpacked to a -directory in our `$PATH`, then the [lake service](../commands/super-db.md#serve) +directory in our `$PATH`, then the [lake service](../commands/super-db#serve) was started with a specified storage path. ``` @@ -77,7 +77,7 @@ zed create zeek ``` The default settings when running `zed create` set the -[pool key](../commands/super-db.md#pool-key) to the `ts` +[pool key](../commands/super-db#pool-key) to the `ts` field and sort the stored data in descending order by that key. This configuration is ideal for Zeek log data. @@ -86,7 +86,7 @@ The [Zui](https://zui.brimdata.io/) desktop application automatically starts a Zed lake service when it launches. Therefore if you are using Zui you can skip the first set of commands shown above. The pool can be created from Zui by clicking **+**, selecting **New Pool**, then entering `ts` for the -[pool key](../commands/super-db.md#pool-key). +[pool key](../commands/super-db#pool-key). 
::: ### Fluentd @@ -105,7 +105,7 @@ sudo gem install fluentd --no-doc The following simple `fluentd.conf` was used to watch the streamed Zeek logs for newly added lines and load each set of them to the pool in the Zed lake as -a separate [commit](../commands/super-db.md#commit-objects). +a separate [commit](../commands/super-db#commit-objects). ``` @@ -188,19 +188,19 @@ produced the following response: ## Shaping Example The query result just shown reflects the minimal data typing available in JSON -format. Meanwhile, the [Zed data model](../formats/zed.md) provides much +format. Meanwhile, the [Zed data model](../formats/zed) provides much richer data typing options, including some types well-suited to Zeek data such as `ip`, `time`, and `duration`. In Zed, the task of cleaning up data to -improve its typing is known as [shaping](../language/shaping.md). +improve its typing is known as [shaping](../language/shaping). -For Zeek data specifically, a [reference shaper](zeek/shaping-zeek-json.md#reference-shaper-contents) +For Zeek data specifically, a [reference shaper](zeek/shaping-zeek-json#reference-shaper-contents) is available that reflects the field and type information in the logs generated by a recent Zeek release. To improve the quality of our data, we next created an expanded configuration that applies the shaper before loading the data into our pool. First we saved the contents of the shaper from -[here](zeek/shaping-zeek-json.md#reference-shaper-contents) to a file +[here](zeek/shaping-zeek-json#reference-shaper-contents) to a file `shaper.zed`. Then in the same directory we created the following `fluentd-shaped.conf`: @@ -305,7 +305,7 @@ Example output: Notice quotes are no longer present around the values that contain IP addresses and times, since they are no longer stored as strings. With the data in this -shaped form, we could now invoke [Zed language](../language/_index.md) +shaped form, we could now invoke [Zed language](../language) functionality that leverages the richer data typing such as filtering `ip` values by CIDR block, e.g., @@ -343,7 +343,7 @@ which in our test environment produced ## Zed Lake Maintenance -The lake stores the data for each [`load`](../commands/super-db.md#load) +The lake stores the data for each [`load`](../commands/super-db#load) operation in a separate commit. If you observe the output of `zed log -use zeek-shaped` after several minutes, you will see many such commits have accumulated, which is a reflection of Fluentd frequently @@ -355,14 +355,14 @@ degrade as many small commits accumulate. However, the `-manage 5m` option that was included when starting our Zed lake service mitigates this effect by compacting the data in the lake's pools every five minutes. This results in storing the pool data across a smaller number of larger -[data objects](../lake/format.md#data-objects), allowing for better query performance +[data objects](../lake/format#data-objects), allowing for better query performance as data volumes increase. By default, even after compaction is performed, the granular commit history is -still maintained to allow for [time travel](../commands/super-db.md#time-travel) +still maintained to allow for [time travel](../commands/super-db#time-travel) use cases. However, if time travel is not functionality you're likely to leverage, you can reduce the lake's storage footprint by periodically running -[`zed vacuum`](../commands/super-db.md#vacuum). This will delete files from lake +[`zed vacuum`](../commands/super-db#vacuum). 
storage that contain the granular commits that have already been rolled into
larger objects by compaction.

@@ -391,10 +391,10 @@ options. Varying these may impact how quickly events appear in the pool and
the size of the commit objects in which they're initially stored.

2. **ZNG format** - In the [shaping example](#shaping-example) shown above, we
-used the [Super JSON format](../formats/jsup.md) format for the shaped data output from
-[`super`](../commands/super.md). This text format is typically used in contexts
+used the [Super JSON](../formats/jsup) format for the shaped data output from
+[`super`](../commands/super). This text format is typically used in contexts
where human readability is required. Due to its compact nature,
-[Super Binary](../formats/bsup.md) format would have been preferred, but in our research
+the [Super Binary](../formats/bsup) format would have been preferred, but in our research
we found Fluentd consistently steered us toward using only text formats.
However, someone more proficient with Fluentd may be able to employ ZNG
instead.

diff --git a/docs/integrations/grafana.md b/docs/integrations/grafana.md
index b1bad65e6a..d713fd60ec 100644
--- a/docs/integrations/grafana.md
+++ b/docs/integrations/grafana.md
@@ -6,6 +6,6 @@ heading: Grafana Data Source Plugin

A [data source plugin](https://grafana.com/grafana/plugins/?type=datasource)
for [Grafana](https://grafana.com/) is available that enables plotting of
-time-series data that's stored in [SuperDB data lakes](../commands/super-db.md). See the
+time-series data that's stored in [SuperDB data lakes](../commands/super-db). See the
README in the [grafana-zed-datasource repository](https://github.com/brimdata/grafana-zed-datasource)
for details.

diff --git a/docs/integrations/zed-lake-auth.md b/docs/integrations/zed-lake-auth.md
index 79efc5f92b..bafb88c881 100644
--- a/docs/integrations/zed-lake-auth.md
+++ b/docs/integrations/zed-lake-auth.md
@@ -4,11 +4,11 @@ title: Authentication Configuration
heading: Configuring Authentication for a Zed Lake Service
---

-A [SuperDB data lake service](../commands/super-db.md#serve) may be configured to require
+A [SuperDB data lake service](../commands/super-db#serve) may be configured to require
user authentication for access from clients such as the
[Zui](https://zui.brimdata.io/) application, the
-[`super db`](../commands/super.md) CLI commands, or the
-[SuperDB Python client](../libraries/python.md). This document describes a simple
+[`super db`](../commands/super) CLI commands, or the
+[SuperDB Python client](../libraries/python). This document describes a simple
[Auth0](https://auth0.com) configuration with accompanying `super db serve` flags
that can be used as a starting point for creating similar configurations in
your own environment.

@@ -96,7 +96,7 @@ checkbox to enable the **Device Code** grant type.

## Zed Lake Service Configuration

-1. Login to our Linux VM and [install](../install.md#building-from-source)
+1. Log in to our Linux VM and [install](../install#building-from-source)
the most recent Zed tools from source.

```

diff --git a/docs/integrations/zeek/_index.md b/docs/integrations/zeek/_index.md
index 1298a50099..2f14aee281 100644
--- a/docs/integrations/zeek/_index.md
+++ b/docs/integrations/zeek/_index.md
@@ -8,6 +8,6 @@ with logs from the [Zeek](https://zeek.org/) open source network security
monitoring tool. Depending on how you use Zeek, one or more of the following
docs may be of interest to you.
-* [Reading Zeek Log Formats](reading-zeek-log-formats.md) -* [Zed/Zeek Data Type Compatibility](data-type-compatibility.md) -* [Shaping Zeek JSON](shaping-zeek-json.md) +* [Reading Zeek Log Formats](reading-zeek-log-formats) +* [Zed/Zeek Data Type Compatibility](data-type-compatibility) +* [Shaping Zeek JSON](shaping-zeek-json) diff --git a/docs/integrations/zeek/data-type-compatibility.md b/docs/integrations/zeek/data-type-compatibility.md index 7de64aac74..70d3dcb612 100644 --- a/docs/integrations/zeek/data-type-compatibility.md +++ b/docs/integrations/zeek/data-type-compatibility.md @@ -3,13 +3,13 @@ weight: 2 title: Zed/Zeek Data Type Compatibility --- -As the [super data model](../../formats/zed.md) was in many ways inspired by the +As the [super data model](../../formats/zed) was in many ways inspired by the [Zeek TSV log format](https://docs.zeek.org/en/master/log-formats.html#zeek-tsv-format-logs), -SuperDB's rich storage formats ([Super JSON](../../formats/jsup.md), -[Super Binary](../../formats/bsup.md), etc.) maintain comprehensive interoperability +SuperDB's rich storage formats ([Super JSON](../../formats/jsup), +[Super Binary](../../formats/bsup), etc.) maintain comprehensive interoperability with Zeek. When Zeek is configured to output its logs in JSON format, much of the rich type information is lost in translation, but -this can be restored by following the guidance for [shaping Zeek JSON](shaping-zeek-json.md). +this can be restored by following the guidance for [shaping Zeek JSON](shaping-zeek-json). On the other hand, Zeek TSV can be converted to Zed storage formats and back to Zeek TSV without any loss of information. @@ -19,8 +19,8 @@ the types that may appear in Zeek logs. Zed tools maintain an internal Zed-typed representation of any Zeek data that is read or imported. Therefore, knowing the equivalent types will prove useful when performing operations in the -[Zed language](../../language/_index.md) such as -[type casting](../../language/shaping.md#cast) or looking at the data +[Zed language](../../language) such as +[type casting](../../language/shaping#cast) or looking at the data when output as Super JSON. ## Equivalent Types @@ -34,20 +34,20 @@ applicable to handling certain types. 
| Zeek Type | Zed Type | Additional Detail | |------------|------------|-------------------| -| [`bool`](https://docs.zeek.org/en/current/script-reference/types.html#type-bool) | [`bool`](../../formats/zed.md#1-primitive-types) | | -| [`count`](https://docs.zeek.org/en/current/script-reference/types.html#type-count) | [`uint64`](../../formats/zed.md#1-primitive-types) | | -| [`int`](https://docs.zeek.org/en/current/script-reference/types.html#type-int) | [`int64`](../../formats/zed.md#1-primitive-types) | | -| [`double`](https://docs.zeek.org/en/current/script-reference/types.html#type-double) | [`float64`](../../formats/zed.md#1-primitive-types) | See [`double` details](#double) | -| [`time`](https://docs.zeek.org/en/current/script-reference/types.html#type-time) | [`time`](../../formats/zed.md#1-primitive-types) | | -| [`interval`](https://docs.zeek.org/en/current/script-reference/types.html#type-interval) | [`duration`](../../formats/zed.md#1-primitive-types) | | -| [`string`](https://docs.zeek.org/en/current/script-reference/types.html#type-string) | [`string`](../../formats/zed.md#1-primitive-types) | See [`string` details about escaping](#string) | -| [`port`](https://docs.zeek.org/en/current/script-reference/types.html#type-port) | [`uint16`](../../formats/zed.md#1-primitive-types) | See [`port` details](#port) | -| [`addr`](https://docs.zeek.org/en/current/script-reference/types.html#type-addr) | [`ip`](../../formats/zed.md#1-primitive-types) | | -| [`subnet`](https://docs.zeek.org/en/current/script-reference/types.html#type-subnet) | [`net`](../../formats/zed.md#1-primitive-types) | | -| [`enum`](https://docs.zeek.org/en/current/script-reference/types.html#type-enum) | [`string`](../../formats/zed.md#1-primitive-types) | See [`enum` details](#enum) | -| [`set`](https://docs.zeek.org/en/current/script-reference/types.html#type-set) | [`set`](../../formats/zed.md#23-set) | See [`set` details](#set) | +| [`bool`](https://docs.zeek.org/en/current/script-reference/types.html#type-bool) | [`bool`](../../formats/zed#1-primitive-types) | | +| [`count`](https://docs.zeek.org/en/current/script-reference/types.html#type-count) | [`uint64`](../../formats/zed#1-primitive-types) | | +| [`int`](https://docs.zeek.org/en/current/script-reference/types.html#type-int) | [`int64`](../../formats/zed#1-primitive-types) | | +| [`double`](https://docs.zeek.org/en/current/script-reference/types.html#type-double) | [`float64`](../../formats/zed#1-primitive-types) | See [`double` details](#double) | +| [`time`](https://docs.zeek.org/en/current/script-reference/types.html#type-time) | [`time`](../../formats/zed#1-primitive-types) | | +| [`interval`](https://docs.zeek.org/en/current/script-reference/types.html#type-interval) | [`duration`](../../formats/zed#1-primitive-types) | | +| [`string`](https://docs.zeek.org/en/current/script-reference/types.html#type-string) | [`string`](../../formats/zed#1-primitive-types) | See [`string` details about escaping](#string) | +| [`port`](https://docs.zeek.org/en/current/script-reference/types.html#type-port) | [`uint16`](../../formats/zed#1-primitive-types) | See [`port` details](#port) | +| [`addr`](https://docs.zeek.org/en/current/script-reference/types.html#type-addr) | [`ip`](../../formats/zed#1-primitive-types) | | +| [`subnet`](https://docs.zeek.org/en/current/script-reference/types.html#type-subnet) | [`net`](../../formats/zed#1-primitive-types) | | +| [`enum`](https://docs.zeek.org/en/current/script-reference/types.html#type-enum) | 
[`string`](../../formats/zed#1-primitive-types) | See [`enum` details](#enum) |
+| [`set`](https://docs.zeek.org/en/current/script-reference/types.html#type-set) | [`set`](../../formats/zed#23-set) | See [`set` details](#set) |
-| [`vector`](https://docs.zeek.org/en/current/script-reference/types.html#type-vector) | [`array`](../../formats/zed.md#22-array | |
+| [`vector`](https://docs.zeek.org/en/current/script-reference/types.html#type-vector) | [`array`](../../formats/zed#22-array) | |
-| [`record`](https://docs.zeek.org/en/current/script-reference/types.html#type-record) | [`record`](../../formats/zed.md#21-record | See [`record` details](#record) |
+| [`record`](https://docs.zeek.org/en/current/script-reference/types.html#type-record) | [`record`](../../formats/zed#21-record) | See [`record` details](#record) |

:::tip Note
The [Zeek data types](https://docs.zeek.org/en/current/script-reference/types.html)
@@ -62,7 +62,7 @@ there is no authoritative specification of the Zeek TSV log format.

## Example

The following example shows a TSV log that includes each Zeek data type, how
-it's output as Super JSON by [`super`](../../commands/super.md), and then how it's written back out again as a Zeek
+it's output as Super JSON by [`super`](../../commands/super), and then how it's written back out again as a Zeek
log. You may find it helpful to refer to this example when reading the
[type-specific details](#type-specific-details).

@@ -150,8 +150,8 @@ out again in the Zeek TSV log format.

Other implementations of the Zed storage formats (should they exist) may handle these differently.

Multiple Zeek types discussed below are represented via a
-[type definition](../../formats/jsup.md#22-type-decorators) to one of Zed's
-[primitive types](../../formats/zed.md#1-primitive-types). The Zed type
+[type definition](../../formats/jsup#22-type-decorators) to one of Zed's
+[primitive types](../../formats/zed#1-primitive-types). The Zed type
definitions maintain the history of the field's original Zeek type name such
that `zq` may restore it if the field is later output in Zeek TSV format.
Knowledge of its original Zeek type may also enable special
@@ -204,7 +204,7 @@ _not_ intended to be read or presented as such. Meanwhile, another Zeek

UTF-8. These details are currently captured only within the Zeek source code
itself, which defines how these values are generated.

-Zed includes a [primitive type](../../formats/zed.md#1-primitive-types)
+Zed includes a [primitive type](../../formats/zed#1-primitive-types)
called `bytes` that's suited to storing the former "always binary" case and a
`string` type for the latter "always printable" case. However, Zeek logs do
not currently communicate details that would allow an implementation to know
@@ -248,7 +248,7 @@ Zed that refer to the record at a higher level but affect all values lower

down in the record hierarchy.

For instance, revisiting the data from our example, we can output all fields within
-`my_record` using Zed's [`cut` operator](../../language/operators/cut.md).
+`my_record` using Zed's [`cut` operator](../../language/operators/cut).

#### Command:

diff --git a/docs/integrations/zeek/reading-zeek-log-formats.md b/docs/integrations/zeek/reading-zeek-log-formats.md
index 8d78a9304f..0a587fc92f 100644
--- a/docs/integrations/zeek/reading-zeek-log-formats.md
+++ b/docs/integrations/zeek/reading-zeek-log-formats.md
@@ -5,17 +5,17 @@ title: Reading Zeek Log Formats

Zed is capable of reading both common Zeek log formats. This document
provides guidance for what to expect when reading logs of these formats using
-the Zed [command line tools](../../commands/_index.md).
+the Zed [command line tools](../../commands).

## Zeek TSV

[Zeek TSV](https://docs.zeek.org/en/master/log-formats.html#zeek-tsv-format-logs)
is Zeek's default output format for logs. This format can be read automatically
(i.e., no `-i` command line flag is necessary to indicate the input format)
-with the Zed tools such as [`super`](../../commands/super.md).
+with the Zed tools such as [`super`](../../commands/super).

The following example shows a TSV
[`conn.log`](https://docs.zeek.org/en/master/logs/conn.html) being read via `zq` and
-output as [Super JSON](../../formats/jsup.md).
+output as [Super JSON](../../formats/jsup).

#### conn.log

@@ -70,9 +70,9 @@ super -Z -c 'head 1' conn.log

Aside from Zed, Zeek provides one of the richest data typing systems
available, and therefore such records typically need no adjustment to their
data types once they've been read in. The
-[Zed/Zeek Data Type Compatibility](data-type-compatibility.md) document
+[Zed/Zeek Data Type Compatibility](data-type-compatibility) document
provides further detail on how the rich data types in Zeek TSV map to the
-equivalent [rich types in Zed](../../formats/zed.md#1-primitive-types).
+equivalent [rich types in Zed](../../formats/zed#1-primitive-types).

## Zeek JSON

@@ -144,10 +144,10 @@ that Zeek chose to output these values as it did. Furthermore, if you were

just seeking to do quick searches on the string values or simple math on the
numbers, these limitations may be acceptable. However, if you intended to
perform operations like
-[aggregations with time-based grouping](../../language/functions/bucket.md)
-or [CIDR matches](../../language/functions/network_of.md)
+[aggregations with time-based grouping](../../language/functions/bucket)
+or [CIDR matches](../../language/functions/network_of)
on IP addresses, you would likely want to restore the rich Zed data types as
-the records are being read. The document on [shaping Zeek JSON](shaping-zeek-json.md)
+the records are being read. The document on [shaping Zeek JSON](shaping-zeek-json)
provides details on how this can be done.

## The Role of `_path`

Zeek's `_path` field plays an important role in differentiating between its
different [log types](https://docs.zeek.org/en/master/script-reference/log-files.html)
(`conn`, `dns`, etc.). For instance,
-[shaping Zeek JSON](shaping-zeek-json.md) relies on the value of
+[shaping Zeek JSON](shaping-zeek-json) relies on the value of
the `_path` field to know which Zed type to apply to an input JSON record.

diff --git a/docs/integrations/zeek/shaping-zeek-json.md b/docs/integrations/zeek/shaping-zeek-json.md
index b574028ae7..55e2d1891f 100644
--- a/docs/integrations/zeek/shaping-zeek-json.md
+++ b/docs/integrations/zeek/shaping-zeek-json.md
@@ -3,10 +3,10 @@ weight: 3
title: Shaping Zeek JSON
---

-When [reading Zeek JSON format logs](reading-zeek-log-formats.md#zeek-json),
+When [reading Zeek JSON format logs](reading-zeek-log-formats#zeek-json),
much of the rich data typing that was originally present inside Zeek is at
risk of being lost. This detail can be restored using a Zed
-[shaper](../../language/shaping.md), such as the
+[shaper](../../language/shaping), such as the
[reference shaper described below](#reference-shaper-contents).

## Zeek Version/Configuration

@@ -153,12 +153,12 @@ such a field would be maintained and assigned an inferred type.
* `_error_if_cropped` (default: `true`) - If such a field is cropped, the original input record will be -[wrapped inside a Zed `error` value](../../language/shaping.md#error-handling) +[wrapped inside a Zed `error` value](../../language/shaping#error-handling) along with the shaped and cropped variations. At these default settings, the shaper is well-suited for an iterative workflow with a goal of establishing full coverage of the JSON data with rich Zed -types. For instance, the [`has_error` function](../../language/functions/has_error.md) +types. For instance, the [`has_error` function](../../language/functions/has_error) can be applied on the shaped output and any error values surfaced will point to fields that can be added to the type definitions in the shaper. @@ -172,7 +172,7 @@ type port=uint16; type zenum=string; type conn_id={orig_h:ip,orig_p:port,resp_h:ip,resp_p:port}; ``` -The `port` and `zenum` types are described further in the [Zed/Zeek Data Type Compatibility](data-type-compatibility.md) +The `port` and `zenum` types are described further in the [Zed/Zeek Data Type Compatibility](data-type-compatibility) doc. The `conn_id` type will just save us from having to repeat these fields individually in the many Zeek record types that contain an embedded `id` record. @@ -183,7 +183,7 @@ The bulk of this Zed shaper consists of detailed per-field data type definitions for each record in the default set of JSON logs output by Zeek. These type definitions reference the types we defined above, such as `port` and `conn_id`. The syntax for defining primitive and complex types follows the -relevant sections of the [Super JSON Format](../../formats/jsup.md#2-the-super-json-format) +relevant sections of the [Super JSON Format](../../formats/jsup#2-the-super-json-format) specification. ``` @@ -194,7 +194,7 @@ specification. ``` :::tip note -See [the role of `_path`](reading-zeek-log-formats.md#the-role-of-_path) +See [the role of `_path`](reading-zeek-log-formats#the-role-of-_path) for important details if you're using Zeek's built-in [ASCII logger](https://docs.zeek.org/en/current/scripts/base/frameworks/logging/writers/ascii.zeek.html) rather than the [JSON Streaming Logs](https://github.com/corelight/json-streaming-logs) package. ::: @@ -234,7 +234,7 @@ yield nest_dotted(this) Picking this apart, it transforms each record as it's being read in several steps. -1. The [`nest_dotted` function](../../language/functions/nest_dotted.md) +1. The [`nest_dotted` function](../../language/functions/nest_dotted) reverses the Zeek JSON logger's "flattening" of nested records, e.g., how it populates a field named `id.orig_h` rather than creating a field `id` with sub-field `orig_h` inside it. Restoring the original nesting now gives us @@ -242,19 +242,19 @@ steps. and access the entire 4-tuple of values, but still access the individual values using the same dotted syntax like `id.orig_h` when needed. -2. The [`switch` operator](../../language/operators/switch.md) is used to flag +2. The [`switch` operator](../../language/operators/switch) is used to flag any problems encountered when applying the shaper logic, e.g., * An incoming Zeek JSON record has a `_path` value for which the shaper lacks a type definition. * A field in an incoming Zeek JSON record is located in our type - definitions but cannot be successfully [cast](../../language/functions/cast.md) + definitions but cannot be successfully [cast](../../language/functions/cast) to the target type defined in the shaper. 
* An incoming Zeek JSON record has additional field(s) beyond those in the target type definition and the [configurable options](#configurable-options) are set such that this should be treated as an error. -3. Each [`shape` function](../../language/functions/shape.md) call applies an +3. Each [`shape` function](../../language/functions/shape) call applies an appropriate type definition based on the nature of the incoming Zeek JSON record. The logic of `shape` includes: @@ -317,7 +317,7 @@ produces If working in a directory containing many JSON logs, the reference shaper can be applied to all the records they contain and -output them all in a single [Super Binary](../../formats/bsup.md) file as +output them all in a single [Super Binary](../../formats/bsup) file as follows: ``` diff --git a/docs/lake/api.md b/docs/lake/api.md index ab8448c07a..f10d37003b 100644 --- a/docs/lake/api.md +++ b/docs/lake/api.md @@ -262,10 +262,10 @@ On success, HTTP 204 is returned with no response payload. Create a commit that reflects the deletion of some data in the branch. The data to delete can be specified via a list of object IDs or -as a filter expression (see [limitations](../commands/super-db.md#delete)). +as a filter expression (see [limitations](../commands/super-db#delete)). This simply removes the data from the branch without actually removing the -underlying data objects thereby allowing [time travel](../commands/super-db.md#time-travel) to work in the face +underlying data objects thereby allowing [time travel](../commands/super-db#time-travel) to work in the face of deletes. Permanent removal of underlying data objects is handled by a separate [vacuum](#vacuum-pool) operation. @@ -280,7 +280,7 @@ POST /pool/{pool}/branch/{branch}/delete | pool | string | path | **Required.** ID of the pool. | | branch | string | path | **Required.** Name of branch. | | object_ids | [string] | body | Object IDs to be deleted. | -| where | string | body | Filter expression (see [limitations](../commands/super-db.md#delete)). | +| where | string | body | Filter expression (see [limitations](../commands/super-db#delete)). | | Content-Type | string | header | [MIME type](#mime-types) of the request payload. | | Accept | string | header | Preferred [MIME type](#mime-types) of the response. | @@ -525,7 +525,7 @@ service will expect ZSON as the payload format. An exception to this is when [loading data](#load-data) and Content-Type is not specified. In this case the service will attempt to introspect the data and may determine the type automatically. The -[input formats](../commands/super.md#input-formats) table describes which +[input formats](../commands/super#input-formats) table describes which formats may be successfully auto-detected. ### Response Payloads @@ -534,7 +534,7 @@ To receive successful (2xx) responses in a preferred format, include the MIME type of the format in the request's Accept HTTP header. If the Accept header is not specified, the service will return ZSON as the default response format. A different default response format can be specified by invoking the -`-defaultfmt` option when running [`super db serve`](../commands/super-db.md#serve). +`-defaultfmt` option when running [`super db serve`](../commands/super-db#serve). For non-2xx responses, the content type of the response will be `application/json` or `text/plain`. 
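As a concrete illustration of these payload conventions, a client that prefers JSON
responses could set the headers explicitly when calling the delete endpoint described
above. This is a hypothetical sketch only: the listen address, pool and branch names,
and object ID are all assumed for illustration.

```
# Assumed local service address, pool "logs", branch "main", and a
# made-up object ID; adjust all of these to match your own lake.
curl -X POST http://localhost:9867/pool/logs/branch/main/delete \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"object_ids": ["27RliO3EF5ACX4KyFDWOqSWPUCs"]}'
```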
diff --git a/docs/lake/format.md b/docs/lake/format.md
index f223ac2700..549842ede7 100644
--- a/docs/lake/format.md
+++ b/docs/lake/format.md
@@ -12,8 +12,8 @@ as we add new capabilities to the system.

## Introduction

-To support the client-facing [SuperDB data lake semantics](../commands/super-db.md#the-lake-model)
-implemented by the [`super db` command](../commands/super-db.md), we are developing
+To support the client-facing [SuperDB data lake semantics](../commands/super-db#the-lake-model)
+implemented by the [`super db` command](../commands/super-db), we are developing
an open specification for the Zed lake storage format described in this document.
As we make progress on the Zed lake model, we will update this document
accordingly.

@@ -31,7 +31,7 @@ to provide a universal data representation for all of these different approaches

Also, while we are not currently focused on building a SQL engine for the Zed
lake, it is most certainly possible to do so, as a Zed record type
-[is analagous to](../formats/_index.md#2-a-super-structured-pattern)
+[is analogous to](../formats#2-a-super-structured-pattern)
a SQL table definition. SQL tables can essentially be dynamically projected
via a table virtualization layer built on top of the Zed lake model.

@@ -198,7 +198,7 @@ the HEAD of the journal is accessed.

> a file for exclusive access and checking that it has zero length after
> a successful open.

-Second, strong read/write ordering semantics (as exists in [Amazon S3](../integrations/amazon-s3.md))
+Second, strong read/write ordering semantics (as exists in [Amazon S3](../integrations/amazon-s3))
can be used to implement transactional journal updates as follows:
* _TBD: this is worked out but needs to be written up_

diff --git a/docs/language/_index.md b/docs/language/_index.md
index 303b399ad1..de5fd3a63c 100644
--- a/docs/language/_index.md
+++ b/docs/language/_index.md
@@ -5,12 +5,12 @@ heading: The Zed Language
---

The language documents:
-* provide an [overview](overview.md) of the Zed language,
-* describe Zed's [pipeline model](pipeline-model.md),
-* explain Zed's [data types](data-types.md),
-* show the syntax of [statements](statements.md) that define constants, functions, operators, and named types,
-* describe the syntax of [expressions](expressions.md) and [search expressions](search-expressions.md),
-* explain [lateral subqueries](lateral-subqueries.md),
-* describe [shaping and type fusion](shaping.md), and
-* enumerate the [operators](operators/_index.md), [functions](functions/_index.md),
-and [aggregate functions](aggregates/_index.md) in reference format.
+* provide an [overview](overview) of the Zed language,
+* describe Zed's [pipeline model](pipeline-model),
+* explain Zed's [data types](data-types),
+* show the syntax of [statements](statements) that define constants, functions, operators, and named types,
+* describe the syntax of [expressions](expressions) and [search expressions](search-expressions),
+* explain [lateral subqueries](lateral-subqueries),
+* describe [shaping and type fusion](shaping), and
+* enumerate the [operators](operators), [functions](functions),
+and [aggregate functions](aggregates) in reference format.
diff --git a/docs/language/aggregates/_index.md b/docs/language/aggregates/_index.md
index 2f28163798..dcff0b7cb4 100644
--- a/docs/language/aggregates/_index.md
+++ b/docs/language/aggregates/_index.md
@@ -3,20 +3,20 @@ title: Aggregates
heading: Aggregate Functions
---

-Aggregate functions appear in either [summarization](../operators/summarize.md)
-or [expression](../expressions.md#aggregate-function-calls) context and produce an aggregate
+Aggregate functions appear in either [summarization](../operators/summarize)
+or [expression](../expressions#aggregate-function-calls) context and produce an aggregate
value for a sequence of input values.

-- [and](and.md) - logical AND of input values
-- [any](any.md) - select an arbitrary value from its input
-- [avg](avg.md) - average value
-- [collect](collect.md) - aggregate values into array
-- [collect_map](collect_map.md) - aggregate map values into a single map
-- [count](count.md) - count input values
-- [dcount](dcount.md) - count distinct input values
-- [fuse](fuse.md) - compute a fused type of input values
-- [max](max.md) - maximum value of input values
-- [min](min.md) - minimum value of input values
-- [or](or.md) - logical OR of input values
-- [sum](sum.md) - sum of input values
-- [union](union.md) - set union of input values
+- [and](and) - logical AND of input values
+- [any](any) - select an arbitrary value from its input
+- [avg](avg) - average value
+- [collect](collect) - aggregate values into array
+- [collect_map](collect_map) - aggregate map values into a single map
+- [count](count) - count input values
+- [dcount](dcount) - count distinct input values
+- [fuse](fuse) - compute a fused type of input values
+- [max](max) - maximum value of input values
+- [min](min) - minimum value of input values
+- [or](or) - logical OR of input values
+- [sum](sum) - sum of input values
+- [union](union) - set union of input values

diff --git a/docs/language/aggregates/count.md b/docs/language/aggregates/count.md
index bc2a7a9b21..efca168984 100644
--- a/docs/language/aggregates/count.md
+++ b/docs/language/aggregates/count.md
@@ -71,7 +71,7 @@ echo '1 "foo" 10.0.0.1' | super -z -c 'count() where grep("bar")' -

0(uint64)
```

-Note that the number of input values are counted, unlike the [`len` function](../functions/len.md) which counts the number of elements in a given value:
+Note that the number of input values is counted, unlike the [`len` function](../functions/len), which counts the number of elements in a given value:
```mdtest-command
echo '[1,2,3]' | super -z -c 'count()' -
```

diff --git a/docs/language/aggregates/fuse.md b/docs/language/aggregates/fuse.md
index 1d010d7a13..8156b428a1 100644
--- a/docs/language/aggregates/fuse.md
+++ b/docs/language/aggregates/fuse.md
@@ -9,7 +9,7 @@ fuse(any) -> type

### Description

-The _fuse_ aggregate function applies [type fusion](../shaping.md#type-fusion)
+The _fuse_ aggregate function applies [type fusion](../shaping#type-fusion)
to its input and returns the fused type.
This aggregation is useful with group-by for data exploration and discovery

diff --git a/docs/language/conventions.md b/docs/language/conventions.md
index 5e9823a520..0775dcd056 100644
--- a/docs/language/conventions.md
+++ b/docs/language/conventions.md
@@ -4,15 +4,15 @@ title: Conventions
heading: Type Conventions
---

-[Function](functions/_index.md) arguments and [operator](operators/_index.md) input values are all dynamically typed,
-yet certain functions expect certain specific [data types](data-types.md)
+[Function](functions) arguments and [operator](operators) input values are all dynamically typed,
+yet certain functions expect specific [data types](data-types)
or classes of data types. To this end, the function and operator prototypes
in the Zed documentation include several type classes as follows:
* _any_ - any Zed data type
* _float_ - any floating point Zed type
* _int_ - any signed or unsigned Zed integer type
* _number_ - either float or int
-* _record_ - any [record](../formats/jsup.md#251-record-type) type
+* _record_ - any [record](../formats/jsup#251-record-type) type

Note that there is no "any" type in Zed as all super-structured data is
comprehensively typed; "any" here simply refers to a value that is allowed

diff --git a/docs/language/data-types.md b/docs/language/data-types.md
index ec789b127a..9b906f8984 100644
--- a/docs/language/data-types.md
+++ b/docs/language/data-types.md
@@ -4,20 +4,20 @@ title: Data Types
---

The SuperPipe language includes most data types of a typical programming language
-as defined in the [super data model](../formats/zed.md).
+as defined in the [super data model](../formats/zed).

The syntax of individual literal values generally follows
-the [Super JSON syntax](../formats/jsup.md) with the exception that
-[type decorators](../formats/jsup.md#22-type-decorators)
+the [Super JSON syntax](../formats/jsup) with the exception that
+[type decorators](../formats/jsup#22-type-decorators)
are not included in the language. Instead, a
-[type cast](expressions.md#casts) may be used in any expression for explicit
+[type cast](expressions#casts) may be used in any expression for explicit
type conversion.

In particular, the syntax of primitive types follows the
-[primitive-value definitions](../formats/jsup.md#23-primitive-values) in Super JSON
-as well as the various [complex value definitions](../formats/jsup.md#24-complex-values)
+[primitive-value definitions](../formats/jsup#23-primitive-values) in Super JSON
+as well as the various [complex value definitions](../formats/jsup#24-complex-values)
like records, arrays, sets, and so forth. However, complex values are not limited to
-constant values like Super JSON and can be composed from [literal expressions](expressions.md#literals).
+constant values like Super JSON and can be composed from [literal expressions](expressions#literals).

## First-class Types

As in the super data model, the SuperPipe language has first-class types:
any type may be used as a value.

The primitive types are listed in the
-[data model specification](../formats/zed.md#1-primitive-types)
+[data model specification](../formats/zed#1-primitive-types)
and have the same syntax in SuperPipe. Complex types also follow the Super JSON
syntax. Note that the type of a type value is simply `type`.

@@ -44,11 +44,11 @@ a few examples:

Complex types may be composed, as in `[({s:string},{x:int64})]`, which is
an array whose elements are a union of two record types.
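A value of such a composed type arises naturally from a mixed-type array literal, and
asking for the value's type shows the composition. This is a small sketch; the exact
type rendering is an assumption based on the union syntax shown above:

```
echo '[{s:"a"},{x:1}]' | super -z -c 'yield typeof(this)' -
```
which would be expected to produce something like
```
<[({s:string},{x:int64})]>
```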
-The [`typeof` function](functions/typeof.md) returns a value's type as
+The [`typeof` function](functions/typeof) returns a value's type as
a value, e.g., `typeof(1)` is `<int64>` and `typeof(<int64>)` is `<type>`.

First-class types are quite powerful because types can
-serve as group-by keys or be used in ["data shaping"](shaping.md) logic.
+serve as group-by keys or be used in ["data shaping"](shaping) logic.
A common workflow for data introspection is to first perform a search of
exploratory data and then count the shapes of each type of data as follows:
```

@@ -86,8 +86,8 @@ As in any modern programming language, types can be named and the type names

persist into the data model and thus into the serialized input and output.
Named types may be defined in four ways:
-* with a [`type` statement](statements.md#type-statements),
-* with the [`cast` function](functions/cast.md),
+* with a [`type` statement](statements#type-statements),
+* with the [`cast` function](functions/cast),
* with a definition inside of another type, or
* by the input data itself.

@@ -161,7 +161,7 @@ the scope of the Zed data model and language. That said, Zed provides
flexible building blocks so systems can define their own schema versioning and
schema management policies on top of these Zed primitives.

-The [super-structured data model](../formats/_index.md#2-a-super-structured-pattern)
+The [super-structured data model](../formats#2-a-super-structured-pattern)
is a superset of relational tables, and SuperPipe's type system can easily make this connection.
As an example, consider this type definition for "employee":

@@ -183,7 +183,7 @@ from anywhere

| sort salary
| head 5
```
-and since type comparisons are so useful and common, the [`is` function](functions/is.md)
+and since type comparisons are so useful and common, the [`is` function](functions/is)
can be used to perform the type match:
```
from anywhere
@@ -202,7 +202,7 @@ to work.

## First-class Errors

As with types, errors in SuperPipe are first-class: any value can be transformed
-into an error by wrapping it in an [`error` type](../formats/zed.md#27-error).
+into an error by wrapping it in an [`error` type](../formats/zed#27-error).

In general, expressions and functions that result in errors simply return
a value of type `error` as a result. This encourages a powerful flow-style
@@ -253,14 +253,14 @@ For example, suppose a bad value shows up:
```
{kind:"bad", stuff:{foo:1,bar:2}}
```
-A [shaper](shaping.md) could catch the bad value (e.g., as a default
-case in a [`switch`](operators/switch.md) topology) and propagate it as
+A [shaper](shaping) could catch the bad value (e.g., as a default
+case in a [`switch`](operators/switch) topology) and propagate it as
an error using the expression:
```
yield error({message:"unrecognized input",input:this})
```
then such errors could be detected and searched for downstream with the
-[`is_error` function](functions/is_error.md).
+[`is_error` function](functions/is_error).
For example,
```
is_error(this)
```
@@ -333,7 +333,7 @@ produces

error("missing")
```
Sometimes you want missing errors to show up and sometimes you don't.
-The [`quiet` function](functions/quiet.md) transforms missing errors into
+The [`quiet` function](functions/quiet) transforms missing errors into
"quiet errors". A quiet error is the value `error("quiet")` and
is ignored by most operators, in particular `yield`. For example,
```mdtest-command
@@ -345,7 +345,7 @@ produces
```
And what if you want a default value instead of a missing error?
The
-[`coalesce` function](functions/coalesce.md) returns the first value that is not
+[`coalesce` function](functions/coalesce) returns the first value that is not
null, `error("missing")`, or `error("quiet")`. For example,
```mdtest-command
echo "{x:1} {y:2}" | super -z -c "yield coalesce(x, 0)" -

diff --git a/docs/language/expressions.md b/docs/language/expressions.md
index 47d423de70..a3293b830a 100644
--- a/docs/language/expressions.md
+++ b/docs/language/expressions.md
@@ -6,7 +6,7 @@ title: Expressions

Zed expressions follow the typical patterns in programming languages.
Expressions are typically used within pipeline operators
to perform computations on input values and are typically evaluated once per
-input value [`this`](pipeline-model.md#the-special-value-this).
+input value [`this`](pipeline-model#the-special-value-this).

For example, `yield`, `where`, `cut`, `put`, `sort` and so forth all take
various expressions as part of their operation.

@@ -107,7 +107,7 @@ where `<id>` is an identifier representing the field name referenced. If a field
name is not representable as an identifier, then [indexing](#indexing)
may be used with a quoted string to represent any valid field name. Such
field names can be accessed using
-[`this`](pipeline-model.md#the-special-value-this) and an array-style reference, e.g.,
+[`this`](pipeline-model#the-special-value-this) and an array-style reference, e.g.,
`this["field with spaces"]`.

If the dot operator is applied to a value that is not a record

@@ -209,7 +209,7 @@ produces
```

Note that if the expression has side effects,
-as with [aggregate function calls](expressions.md#aggregate-function-calls), only the selected expression
+as with [aggregate function calls](expressions#aggregate-function-calls), only the selected expression
will be evaluated.

For example,

@@ -240,15 +240,15 @@ produces
```

-Zed includes many [built-in functions](functions/_index.md), some of which take
+Zed includes many [built-in functions](functions), some of which take
a variable number of arguments.

-Zed also allows you to create [user-defined functions](statements.md#func-statements).
+Zed also allows you to create [user-defined functions](statements#func-statements).

## Aggregate Function Calls

-[Aggregate functions](aggregates/_index.md) may be called within an expression.
-Unlike the aggregation context provided by a [summarizing group-by](operators/summarize.md), such calls
+[Aggregate functions](aggregates) may be called within an expression.
+Unlike the aggregation context provided by a [summarizing group-by](operators/summarize), such calls
in expression context yield an output value for each input value.

Note that because aggregate functions carry state which is typically
```mdtest-command
echo '"foo" "bar" "baz"' | super -z -c 'yield count(), union(this)' -
```
produces
```
{id:1(uint64),value:"foo"}
{id:2(uint64),value:"bar"}
{id:3(uint64),value:"baz"}
```
-In contrast, calling aggregate functions within the [`summarize` operator](operators/summarize.md)
+In contrast, calling aggregate functions within the [`summarize` operator](operators/summarize)
```mdtest-command
echo '"foo" "bar" "baz"' | super -z -c 'summarize count(),union(this)' -
```
produces just one output value

## Literals

-Any of the [data types](data-types.md) may be used in expressions
+Any of the [data types](data-types) may be used in expressions
as long as it is compatible with the semantics of the expression.
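For instance, complex literals such as arrays and sets may be written directly in
expression context. This is a small sketch; the output is shown in its expected
ZSON rendering:

```
echo 'null' | super -z -c 'yield [1,2,3], |["a","b"]|' -
```
which would be expected to produce
```
[1,2,3]
|["a","b"]|
```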
String literals are enclosed in either single quotes or double quotes and

@@ -351,7 +351,7 @@ where a `<spec>` has one of three forms:
```
The first form is a customary colon-separated field and value similar to
JavaScript, where `<field>` may be an identifier or quoted string.
-The second form is an [implied field reference](pipeline-model.md#implied-field-references)
+The second form is an [implied field reference](pipeline-model#implied-field-references)
`<field>`, which is shorthand for `<field>:<field>`. The third form is the `...`
spread operator which expects a record value as the result of `<expr>` and
inserts all of the fields from the resulting record.

@@ -479,7 +479,7 @@ produces

### Union Values

-A union value can be created with a [cast](expressions.md#casts). For example, a union of types `int64`
+A union value can be created with a [cast](expressions#casts). For example, a union of types `int64`
and `string` is expressed as `(int64,string)` and any value that has a type
that appears in the union type may be cast to that union type.
Since 1 is an `int64` and "foo" is a `string`, they both can be
@@ -493,7 +493,7 @@ produces
"foo"((int64,string))
```
The value underlying a union-tagged value is accessed with the
-[`under` function](functions/under.md):
+[`under` function](functions/under):
```mdtest-command
echo '1((int64,string))' | super -z -c 'yield under(this)' -
```
@@ -516,13 +516,13 @@ produces

## Casts

-Type conversion is performed with casts and the built-in [`cast` function](functions/cast.md).
+Type conversion is performed with casts and the built-in [`cast` function](functions/cast).

Casts for primitive types have a function-style syntax of the form
```
<type> ( <expr> )
```
where `<type>` is a [Zed type](data-types#first-class-types) and `<expr>` is any Zed expression.
In the case of primitive types, the type-value angle brackets
may be omitted, e.g., `<string>(1)` is equivalent to `string(1)`.
If the result of `<expr>` cannot be converted
@@ -553,7 +553,7 @@ produces

1970-10-07T00:00:00Z
```

-Casts of complex or [named types](data-types.md#named-types) may be performed using type values
+Casts of complex or [named types](data-types#named-types) may be performed using type values
either in functional form or with `cast`:
```
<type> ( <expr> )
```

diff --git a/docs/language/functions/_index.md b/docs/language/functions/_index.md
index 27d2c54652..2e0fd9bb85 100644
--- a/docs/language/functions/_index.md
+++ b/docs/language/functions/_index.md
@@ -2,69 +2,69 @@
title: Functions
---

-Functions appear in [expression](../expressions.md) context and
+Functions appear in [expression](../expressions) context and
take Zed values as arguments and produce a value as a result. In addition
to the built-in functions listed below, Zed also allows for the creation of
-[user-defined functions](../statements.md#func-statements).
+[user-defined functions](../statements#func-statements).

A function-style syntax is also available for converting values to each of
-Zed's [primitive types](../../formats/zed.md#1-primitive-types), e.g.,
+Zed's [primitive types](../../formats/zed#1-primitive-types), e.g.,
`uint8()`, `time()`, etc. For details and examples, read about the
-[`cast` function](cast.md) and how it is [used in expressions](../expressions.md#casts).
+[`cast` function](cast) and how it is [used in expressions](../expressions#casts).
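For instance, a minimal sketch of the function-style conversion just described,
with the output shown in its expected ZSON rendering:

```
echo '"80"' | super -z -c 'yield uint16(this)' -
```
which would be expected to produce
```
80(uint16)
```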
-* [abs](abs.md) - absolute value of a number -* [base64](base64.md) - encode/decode base64 strings -* [bucket](bucket.md) - quantize a time or duration value into buckets of equal widths -* [cast](cast.md) - coerce a value to a different type -* [ceil](ceil.md) - ceiling of a number -* [cidr_match](cidr_match.md) - test if IP is in a network -* [compare](compare.md) - return an int comparing two values -* [coalesce](coalesce.md) - return first value that is not null, a "missing" error, or a "quiet" error -* [crop](crop.md) - remove fields from a value that are missing in a specified type -* [error](error.md) - wrap a value as an error -* [every](every.md) - bucket `ts` using a duration -* [fields](fields.md) - return the flattened path names of a record -* [fill](fill.md) - add null values for missing record fields -* [flatten](flatten.md) - transform a record into a flattened map -* [floor](floor.md) - floor of a number -* [grep](grep.md) - search strings inside of values -* [grok](grok.md) - parse a string into a structured record -* [has](has.md) - test existence of values -* [hex](hex.md) - encode/decode hexadecimal strings -* [has_error](has_error.md) - test if a value has an error -* [is](is.md) - test a value's type -* [is_error](is_error.md) - test if a value is an error -* [join](join.md) - concatenate array of strings with a separator -* [kind](kind.md) - return a value's type category -* [ksuid](ksuid.md) - encode/decode KSUID-style unique identifiers -* [len](len.md) - the type-dependent length of a value -* [levenshtein](levenshtein.md) Levenshtein distance -* [log](log.md) - natural logarithm -* [lower](lower.md) - convert a string to lower case -* [map](map.md) - apply a function to each element of an array or set -* [missing](missing.md) - test for the "missing" error -* [nameof](nameof.md) - the name of a named type -* [nest_dotted](nest_dotted.md) - transform fields in a record with dotted names to nested records -* [network_of](network_of.md) - the network of an IP -* [now](now.md) - the current time -* [order](order.md) - reorder record fields -* [parse_uri](parse_uri.md) - parse a string URI into a structured record -* [parse_zson](parse_zson.md) - parse ZSON text into a Zed value -* [pow](pow.md) - exponential function of any base -* [quiet](quiet.md) - quiet "missing" errors -* [regexp](regexp.md) - perform a regular expression search on a string -* [regexp_replace](regexp_replace.md) - replace regular expression matches in a string -* [replace](replace.md) - replace one string for another -* [round](round.md) - round a number -* [rune_len](rune_len.md) - length of a string in Unicode code points -* [shape](shape.md) - apply cast, fill, and order -* [split](split.md) - slice a string into an array of strings -* [sqrt](sqrt.md) - square root of a number -* [strftime](strftime.md) - format time values -* [trim](trim.md) - strip leading and trailing whitespace -* [typename](typename.md) - look up and return a named type -* [typeof](typeof.md) - the type of a value -* [typeunder](typeunder.md) - the underlying type of a value -* [under](under.md) - the underlying value -* [unflatten](unflatten.md) - transform a record with dotted names to a nested record -* [upper](upper.md) - convert a string to upper case +* [abs](abs) - absolute value of a number +* [base64](base64) - encode/decode base64 strings +* [bucket](bucket) - quantize a time or duration value into buckets of equal widths +* [cast](cast) - coerce a value to a different type +* [ceil](ceil) - ceiling of a 
number
+* [cidr_match](cidr_match) - test if IP is in a network
+* [compare](compare) - return an int comparing two values
+* [coalesce](coalesce) - return first value that is not null, a "missing" error, or a "quiet" error
+* [crop](crop) - remove fields from a value that are missing in a specified type
+* [error](error) - wrap a value as an error
+* [every](every) - bucket `ts` using a duration
+* [fields](fields) - return the flattened path names of a record
+* [fill](fill) - add null values for missing record fields
+* [flatten](flatten) - transform a record into a flattened map
+* [floor](floor) - floor of a number
+* [grep](grep) - search strings inside of values
+* [grok](grok) - parse a string into a structured record
+* [has](has) - test existence of values
+* [hex](hex) - encode/decode hexadecimal strings
+* [has_error](has_error) - test if a value has an error
+* [is](is) - test a value's type
+* [is_error](is_error) - test if a value is an error
+* [join](join) - concatenate array of strings with a separator
+* [kind](kind) - return a value's type category
+* [ksuid](ksuid) - encode/decode KSUID-style unique identifiers
+* [len](len) - the type-dependent length of a value
+* [levenshtein](levenshtein) - Levenshtein distance
+* [log](log) - natural logarithm
+* [lower](lower) - convert a string to lower case
+* [map](map) - apply a function to each element of an array or set
+* [missing](missing) - test for the "missing" error
+* [nameof](nameof) - the name of a named type
+* [nest_dotted](nest_dotted) - transform fields in a record with dotted names to nested records
+* [network_of](network_of) - the network of an IP
+* [now](now) - the current time
+* [order](order) - reorder record fields
+* [parse_uri](parse_uri) - parse a string URI into a structured record
+* [parse_zson](parse_zson) - parse ZSON text into a Zed value
+* [pow](pow) - exponential function of any base
+* [quiet](quiet) - quiet "missing" errors
+* [regexp](regexp) - perform a regular expression search on a string
+* [regexp_replace](regexp_replace) - replace regular expression matches in a string
+* [replace](replace) - replace one string for another
+* [round](round) - round a number
+* [rune_len](rune_len) - length of a string in Unicode code points
+* [shape](shape) - apply cast, fill, and order
+* [split](split) - slice a string into an array of strings
+* [sqrt](sqrt) - square root of a number
+* [strftime](strftime) - format time values
+* [trim](trim) - strip leading and trailing whitespace
+* [typename](typename) - look up and return a named type
+* [typeof](typeof) - the type of a value
+* [typeunder](typeunder) - the underlying type of a value
+* [under](under) - the underlying value
+* [unflatten](unflatten) - transform a record with dotted names to a nested record
+* [upper](upper) - convert a string to upper case

diff --git a/docs/language/functions/cast.md b/docs/language/functions/cast.md
index 63819d1b66..2b349bbdf4 100644
--- a/docs/language/functions/cast.md
+++ b/docs/language/functions/cast.md
@@ -11,15 +11,15 @@ cast(val: any, name: string) -> any

### Description

-The _cast_ function performs type casts but handles both [primitive types](../../formats/zed.md#1-primitive-types) and
-[complex types](../../formats/zed.md#2-complex-types). If the input type `t` is a primitive type, then the result
+The _cast_ function performs type casts but handles both [primitive types](../../formats/zed#1-primitive-types) and
+[complex types](../../formats/zed#2-complex-types).
If the input type `t` is a primitive type, then the result
is equivalent to
```
t(val)
```
e.g., the result of `cast(1, <string>)` is the same as `string(1)`, which is `"1"`.
In the second form, where the `name` argument is a string, cast creates
-a new [named type](../data-types.md#named-types) where the name for the type is given by `name` and its
+a new [named type](../data-types#named-types) where the name for the type is given by `name` and its
type is given by `typeof(val)`. This provides a convenient mechanism
to create new named types from the input data itself without having to
hard-code the type in the SuperPipe query.

@@ -43,8 +43,8 @@ and the input value is returned when casting to complex types.

:::tip
Many users seeking to `cast` record values prefer to use the
-[`shape` function](./shape.md) which applies the `cast`, [`fill`](./fill.md),
-and [`order`](./order.md) functions simultaneously.
+[`shape` function](./shape) which applies the `cast`, [`fill`](./fill),
+and [`order`](./order) functions simultaneously.
:::

### Examples

diff --git a/docs/language/functions/fill.md b/docs/language/functions/fill.md
index a5dd47897e..588ce8daaa 100644
--- a/docs/language/functions/fill.md
+++ b/docs/language/functions/fill.md
@@ -22,8 +22,8 @@ If `val` is not a record, it is returned unmodified.

:::tip
Many users seeking the functionality of `fill` prefer to use the
-[`shape` function](./shape.md) which applies the `fill`, [`cast`](./cast.md),
-and [`order`](./order.md) functions simultaneously on a record.
+[`shape` function](./shape) which applies the `fill`, [`cast`](./cast),
+and [`order`](./order) functions simultaneously on a record.
:::

### Examples

diff --git a/docs/language/functions/grep.md b/docs/language/functions/grep.md
index 1cf0ccce91..5338cb0e25 100644
--- a/docs/language/functions/grep.md
+++ b/docs/language/functions/grep.md
@@ -13,8 +13,8 @@ grep(<pattern> [, e: any]) -> bool

The _grep_ function searches all of the strings in its input value `e`
(or `this` if `e` is not given) using the `<pattern>` argument, which can be a
-[regular expression](../search-expressions.md#regular-expressions),
-[glob pattern](../search-expressions.md#globs), or string.
+[regular expression](../search-expressions#regular-expressions),
+[glob pattern](../search-expressions#globs), or string.
If the pattern matches for any string, then the result is `true`. Otherwise, it is `false`.

> Note that string matches are case insensitive while regular expression

diff --git a/docs/language/functions/grok.md b/docs/language/functions/grok.md
index 56cf02e474..71e5e24f0b 100644
--- a/docs/language/functions/grok.md
+++ b/docs/language/functions/grok.md
@@ -57,7 +57,7 @@ issue describing your use case.

to store `num` as an integer type instead of as a string. SuperPipe
currently accepts this trailing `:type` syntax but effectively ignores
it and stores all parsed values as strings. Downstream, the
-[`cast` function](cast.md) can be used instead for data type conversion.
+[`cast` function](cast) can be used instead for data type conversion.
([super/4928](https://github.com/brimdata/super/issues/4928))

2. Some Logstash Grok examples use an optional square bracket syntax for
@@ -67,7 +67,7 @@ issue describing your use case.
```
to store a value into `{"nested": {"field": ... }}`.
In SuperPipe the more common dot-separated field naming convention `nested.field` can be combined - with the downstream use of the [`nest_dotted` function](nest_dotted.md) to + with the downstream use of the [`nest_dotted` function](nest_dotted) to store values in nested fields. ([super/4929](https://github.com/brimdata/super/issues/4929)) @@ -164,7 +164,7 @@ echo '"2020-09-16T04:20:42.45+01:00 DEBUG This is a sample debug log message"' | } ``` -As with any [string literal](../expressions.md#literals), the +As with any [string literal](../expressions#literals), the leading backslash in escape sequences in string arguments must be doubled, such as changing the `\d` to `\\d` if we repurpose the [included pattern](#included-patterns) for `NUMTZ` as a `definitions` argument: diff --git a/docs/language/functions/has.md b/docs/language/functions/has.md index 8d296fe37a..be88691d02 100644 --- a/docs/language/functions/has.md +++ b/docs/language/functions/has.md @@ -12,7 +12,7 @@ has(val: any [, ... val: any]) -> bool The _has_ function returns false if any of its arguments are `error("missing")` and otherwise returns true. -`has(e)` is a shortcut for [`!missing(e)`](missing.md). +`has(e)` is a shortcut for [`!missing(e)`](missing). This function is most often used to test the existence of certain fields in an expected record, e.g., `has(a,b)` is true when `this` is a record and has diff --git a/docs/language/functions/map.md b/docs/language/functions/map.md index d73ce659ad..562f3be485 100644 --- a/docs/language/functions/map.md +++ b/docs/language/functions/map.md @@ -12,7 +12,7 @@ map(v: array|set, f: function) -> array|set The _map_ function applies function `f` to every element in array or set `v` and returns an array or set of the results. Function `f` must be a function that takes -only one argument. `f` may be a [user-defined function](../statements.md#func-statements). +only one argument. `f` may be a [user-defined function](../statements#func-statements). ### Examples diff --git a/docs/language/functions/order.md b/docs/language/functions/order.md index 7d0e645c5f..9211c2f0ae 100644 --- a/docs/language/functions/order.md +++ b/docs/language/functions/order.md @@ -28,13 +28,13 @@ order(val, <{}>) ``` :::tip Many users seeking the functionality of `order` prefer to use the -[`shape` function](./shape.md) which applies the `order`, [`cast`](./cast.md), -and [`fill`](./fill.md) functions simultaneously on a record. +[`shape` function](./shape) which applies the `order`, [`cast`](./cast), +and [`fill`](./fill) functions simultaneously on a record. ::: :::tip Note -[Record expressions](../expressions.md#record-expressions) can also be used to -reorder fields without specifying types ([example](../shaping.md#order)). +[Record expressions](../expressions#record-expressions) can also be used to +reorder fields without specifying types ([example](../shaping#order)). 
:::

### Examples

diff --git a/docs/language/functions/regexp.md b/docs/language/functions/regexp.md
index b736dc5db2..f79d401129 100644
--- a/docs/language/functions/regexp.md
+++ b/docs/language/functions/regexp.md
@@ -11,7 +11,7 @@ regexp(re: string|regexp, s: string) -> any

### Description

The _regexp_ function returns an array of strings holding the text
of the leftmost match of the regular expression `re`, which can be either
-a string value or a [regular expression](../search-expressions.md#regular-expressions),
+a string value or a [regular expression](../search-expressions#regular-expressions),
and the matches of each parenthesized subexpression (also known as capturing
groups) if there are any. A null value indicates no match.

diff --git a/docs/language/functions/regexp_replace.md b/docs/language/functions/regexp_replace.md
index 8bfc3fc43d..37baf10cbd 100644
--- a/docs/language/functions/regexp_replace.md
+++ b/docs/language/functions/regexp_replace.md
@@ -11,7 +11,7 @@ regexp_replace(s: string, re: string|regexp, new: string) -> string

### Description

The _regexp_replace_ function substitutes all characters matching the
-[regular expression](../search-expressions.md#regular-expressions) `re` in string `s` with
+[regular expression](../search-expressions#regular-expressions) `re` in string `s` with
the string `new`. Variables in `new` are replaced with corresponding matches drawn from `s`.

diff --git a/docs/language/functions/shape.md b/docs/language/functions/shape.md
index dea7cb2f59..eed2b49b50 100644
--- a/docs/language/functions/shape.md
+++ b/docs/language/functions/shape.md
@@ -11,12 +11,12 @@ shape(val: any, t: type) -> any

### Description

The _shape_ function applies the
-[`cast`](cast.md),
-[`fill`](fill.md), and
-[`order`](order.md) functions to its input to provide an
-overall [data shaping](../shaping.md) operation.
+[`cast`](cast),
+[`fill`](fill), and
+[`order`](order) functions to its input to provide an
+overall [data shaping](../shaping) operation.

-Note that `shape` does not perform a [`crop` function](./crop.md) so
+Note that `shape` does not perform a [`crop` function](./crop), so
extra fields in the input are propagated to the output.

### Examples

diff --git a/docs/language/functions/typename.md b/docs/language/functions/typename.md
index 21c043405c..26d006ed3d 100644
--- a/docs/language/functions/typename.md
+++ b/docs/language/functions/typename.md
@@ -10,8 +10,8 @@ typename(name: string) -> type

### Description

-The _typename_ function returns the [type](../../formats/jsup.md#25-types) of the
-[named type](../../formats/jsup.md#258-named-type) given by `name` if it exists. Otherwise, `error("missing")` is returned.
+The _typename_ function returns the [type](../../formats/jsup#25-types) of the
+[named type](../../formats/jsup#258-named-type) given by `name` if it exists. Otherwise, `error("missing")` is returned.

### Examples

diff --git a/docs/language/functions/typeof.md b/docs/language/functions/typeof.md
index c8924e8c56..c6a740bcca 100644
--- a/docs/language/functions/typeof.md
+++ b/docs/language/functions/typeof.md
@@ -10,7 +10,7 @@ typeof(val: any) -> type

### Description

-The _typeof_ function returns the [type](../../formats/jsup.md#25-types) of
+The _typeof_ function returns the [type](../../formats/jsup#25-types) of
its argument `val`. Types are first class, so the returned type is also
a value. The type of a type is type `type`.
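To illustrate that last point, a minimal sketch in the style of the CLI examples
above, with the output shown in its expected ZSON rendering:

```
echo '1' | super -z -c 'yield typeof(typeof(this))' -
```
which would be expected to produce
```
<type>
```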
diff --git a/docs/language/functions/typeunder.md b/docs/language/functions/typeunder.md
index 89f59f0b71..97f970e3d9 100644
--- a/docs/language/functions/typeunder.md
+++ b/docs/language/functions/typeunder.md
@@ -11,7 +11,7 @@ typeunder(val: any) -> type

### Description

The _typeunder_ function returns the type of its argument `val`. If this type is a
-[named type](../../formats/zed.md#3-named-type), then the referenced type is
+[named type](../../formats/zed#3-named-type), then the referenced type is
returned instead of the named type.

### Examples
diff --git a/docs/language/functions/upper.md b/docs/language/functions/upper.md
index f3c3f44223..bcafe276be 100644
--- a/docs/language/functions/upper.md
+++ b/docs/language/functions/upper.md
@@ -23,7 +23,7 @@ echo '"Super JSON"' | super -z -c 'yield upper(this)' -
"SUPER JSON"
```

-[Slices](../expressions.md#slices) can be used to uppercase a subset of a string as well.
+[Slices](../expressions#slices) can be used to uppercase a subset of a string as well.

```mdtest-command
echo '"super JSON"' |
diff --git a/docs/language/lateral-subqueries.md b/docs/language/lateral-subqueries.md
index 86294eafb9..8b5bcdb309 100644
--- a/docs/language/lateral-subqueries.md
+++ b/docs/language/lateral-subqueries.md
@@ -6,7 +6,7 @@ title: Lateral Subqueries

Lateral subqueries provide a powerful means to apply a Zed query
to each subsequence of values generated from an outer sequence of values.
The inner query may be _any_ pipeline operator sequence (excluding
-[`from` operators](operators/from.md)) and may refer to values from
+[`from` operators](operators/from)) and may refer to values from
the outer sequence.

:::tip Note
join", which runs a subquery for each row of the outer query's results.
:::

Lateral subqueries are created using the scoped form of the
-[`over` operator](operators/over.md). They may be nested to arbitrary depth
+[`over` operator](operators/over). They may be nested to arbitrary depth
and access to variables in parent lateral query bodies follows lexical
scoping.

@@ -69,7 +69,7 @@ produces

## Lateral Scope

A lateral scope has the form `=> ( )` and currently appears
-only the context of an [`over` operator](operators/over.md),
+only in the context of an [`over` operator](operators/over),
as illustrated above, and has the form:
```
over ... with [, ...] => ( )
```
@@ -89,7 +89,7 @@ In the field reference form, a single identifier `` refers to a field
in the parent scope and makes that field's value available in the lateral
scope via the same name.

-Note that any such variable definitions override [implied field references](pipeline-model.md#implied-field-references) of
+Note that any such variable definitions override [implied field references](pipeline-model#implied-field-references) of
`this`. If both a field named `x` and a variable named `x` need to be
referenced in the lateral scope, the field reference should be qualified as
`this.x` while the variable is referenced simply as `x`.

@@ -101,8 +101,8 @@ This query runs to completion for each inner sequence and emits each
subquery result as each inner sequence traversal completes.

This structure is powerful because _any_ pipeline operator sequence (excluding
-[`from` operators](operators/from.md)) can appear in the body of
-the lateral scope. In contrast to the [`yield`](operators/yield.md) example above, a [`sort`](operators/sort.md) could be
+[`from` operators](operators/from)) can appear in the body of
+the lateral scope.
In contrast to the [`yield`](operators/yield) example above, a [`sort`](operators/sort) could be
applied to each subsequence in the subquery, where `sort` reads all values of
the subsequence, sorts them, emits them, then repeats the process for the next
subsequence. For example,
@@ -126,7 +126,7 @@ parenthesized form:
```

:::tip
-The parentheses disambiguate a lateral expression from a [lateral pipeline operator](operators/over.md).
+The parentheses disambiguate a lateral expression from a [lateral pipeline operator](operators/over).
:::

This form must always include a [lateral scope](#lateral-scope) as indicated by ``.

@@ -186,6 +186,6 @@ produces
{s:[4,5]}
```
Similarly, a primitive value may be consistently produced by concluding the
-lateral scope with an operator such as [`head`](operators/head.md) or
-[`tail`](operators/tail.md), or by applying certain [aggregate functions](aggregates/_index.md)
-such as done with [`sum`](aggregates/sum.md) above.
+lateral scope with an operator such as [`head`](operators/head) or
+[`tail`](operators/tail), or by applying certain [aggregate functions](aggregates)
+as done with [`sum`](aggregates/sum) above.
diff --git a/docs/language/operators/_index.md b/docs/language/operators/_index.md
index e6192a512a..4a5cd989e5 100644
--- a/docs/language/operators/_index.md
+++ b/docs/language/operators/_index.md
@@ -3,34 +3,34 @@ title: Operators
---

Operators process a sequence of input values to create an output sequence
-and appear as the components of a [pipeline](../pipeline-model.md). In addition to the built-in
+and appear as the components of a [pipeline](../pipeline-model). In addition to the built-in
operators listed below, Zed also allows for the creation of
-[user-defined operators](../statements.md#operator-statements).
+[user-defined operators](../statements#operator-statements).
-* [assert](assert.md) - evaluate an assertion -* [combine](combine.md) - combine parallel pipeline branches into a single output -* [cut](cut.md) - extract subsets of record fields into new records -* [drop](drop.md) - drop fields from record values -* [file](from.md) - source data from a file -* [fork](fork.md) - copy values to parallel pipeline branches -* [from](from.md) - source data from pools, files, or URIs -* [fuse](fuse.md) - coerce all input values into a merged type -* [get](from.md) - source data from a URI -* [head](head.md) - copy leading values of input sequence -* [join](join.md) - combine data from two inputs using a join predicate -* [load](load.md) - add and commit data to a pool -* [merge](merge.md) - combine parallel pipeline branches into a single, ordered output -* [over](over.md) - traverse nested values as a lateral query -* [pass](pass.md) - copy input values to output -* [put](put.md) - add or modify fields of records -* [rename](rename.md) - change the name of record fields -* [sample](sample.md) - select one value of each shape -* [search](search.md) - select values based on a search expression -* [sort](sort.md) - sort values -* [summarize](summarize.md) - perform aggregations -* [switch](switch.md) - route values based on cases -* [tail](tail.md) - copy trailing values of input sequence -* [top](top.md) - get top N sorted values of input sequence -* [uniq](uniq.md) - deduplicate adjacent values -* [where](where.md) - select values based on a Boolean expression -* [yield](yield.md) - emit values from expressions +* [assert](assert) - evaluate an assertion +* [combine](combine) - combine parallel pipeline branches into a single output +* [cut](cut) - extract subsets of record fields into new records +* [drop](drop) - drop fields from record values +* [file](from) - source data from a file +* [fork](fork) - copy values to parallel pipeline branches +* [from](from) - source data from pools, files, or URIs +* [fuse](fuse) - coerce all input values into a merged type +* [get](from) - source data from a URI +* [head](head) - copy leading values of input sequence +* [join](join) - combine data from two inputs using a join predicate +* [load](load) - add and commit data to a pool +* [merge](merge) - combine parallel pipeline branches into a single, ordered output +* [over](over) - traverse nested values as a lateral query +* [pass](pass) - copy input values to output +* [put](put) - add or modify fields of records +* [rename](rename) - change the name of record fields +* [sample](sample) - select one value of each shape +* [search](search) - select values based on a search expression +* [sort](sort) - sort values +* [summarize](summarize) - perform aggregations +* [switch](switch) - route values based on cases +* [tail](tail) - copy trailing values of input sequence +* [top](top) - get top N sorted values of input sequence +* [uniq](uniq) - deduplicate adjacent values +* [where](where) - select values based on a Boolean expression +* [yield](yield) - emit values from expressions diff --git a/docs/language/operators/cut.md b/docs/language/operators/cut.md index 6c05d82b77..823e56aba1 100644 --- a/docs/language/operators/cut.md +++ b/docs/language/operators/cut.md @@ -10,7 +10,7 @@ cut [:=] [, [:=] ...] 
### Description

The `cut` operator extracts values from each input record in the
-form of one or more [field assignments](../pipeline-model.md#field-assignments),
+form of one or more [field assignments](../pipeline-model#field-assignments),
creating one field for each expression. Unlike the `put` operator,
which adds or modifies the fields of a record, `cut` retains only the
fields enumerated, much like a SQL projection.
@@ -34,7 +34,7 @@ resulting in `error("missing")` for expressions that reference fields of
`this`.

Note that when the field references are all top level,
`cut` is a special case of a `yield` with a
-[record literal](../expressions.md#record-expressions) having the form:
+[record literal](../expressions#record-expressions) having the form:
```
yield {: [, :...]}
```
@@ -79,7 +79,7 @@ _Invoke a function while cutting to set a default value for a field_

:::tip
This can be helpful to transform data into a uniform record type, such as when
the output will be exported in formats like `csv` or `parquet` (see also:
-[`fuse`](fuse.md)).
+[`fuse`](fuse)).
:::

```mdtest-command
diff --git a/docs/language/operators/file.md b/docs/language/operators/file.md
index b3fff30404..dba58d8030 100644
--- a/docs/language/operators/file.md
+++ b/docs/language/operators/file.md
@@ -4,9 +4,9 @@

### Synopsis

-`file` is a shorthand notation for `from`. See the [from operator](from.md) documentation for details.
+`file` is a shorthand notation for `from`. See the [from operator](from) documentation for details.

:::tip Note
The `file` shorthand is exclusively for working with inputs to
-[`super`](../../commands/super.md) and is not available for use with [SuperDB data lakes](../../commands/super-db.md).
+[`super`](../../commands/super) and is not available for use with [SuperDB data lakes](../../commands/super-db).
:::
diff --git a/docs/language/operators/fork.md b/docs/language/operators/fork.md
index d57c6a1fe7..0e928b0296 100644
--- a/docs/language/operators/fork.md
+++ b/docs/language/operators/fork.md
@@ -18,7 +18,7 @@ the pipeline.

The output of a fork consists of multiple branches that must be merged.
If the downstream operator expects a single input, then the output branches are
-merged with an automatically inserted [combine operator](combine.md).
+merged with an automatically inserted [combine operator](combine).

### Examples

diff --git a/docs/language/operators/from.md b/docs/language/operators/from.md
index 961d20e865..3a9786764e 100644
--- a/docs/language/operators/from.md
+++ b/docs/language/operators/from.md
@@ -22,20 +22,20 @@ from (

The `from` operator identifies one or more data sources and transmits
their data to its output. A data source can be

-* the name of a data pool in a SuperDB lake, with optional [commitish](../../commands/super-db.md#commitish);
-* the names of multiple data pools, expressed as a [regular expression](../search-expressions.md#regular-expressions) or [glob](../search-expressions.md#globs) pattern;
+* the name of a data pool in a SuperDB lake, with optional [commitish](../../commands/super-db#commitish);
+* the names of multiple data pools, expressed as a [regular expression](../search-expressions#regular-expressions) or [glob](../search-expressions#globs) pattern;
* a path to a file;
* an HTTP, HTTPS, or S3 URI; or
-* the [`pass` operator](pass.md), to treat the upstream pipeline branch as a source.
+* the [`pass` operator](pass), to treat the upstream pipeline branch as a source.
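To make the file case concrete, here is a minimal sketch using the `file` shorthand described above (the file name `data.json` and its contents are hypothetical):

```
super -z -c 'file data.json |> head 2'
```

This reads `data.json` as the pipeline's source and emits its first two values.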
:::tip Note -File paths and URIs may be followed by an optional [format](../../commands/super.md#input-formats) specifier. +File paths and URIs may be followed by an optional [format](../../commands/super#input-formats) specifier. ::: Sourcing data from pools is only possible when querying a lake, such as -via the [`super db` command](../../commands/super-db.md) or -[SuperDB lake API](../../lake/api.md). Sourcing data from files is only possible -with the [`super` command](../../commands/super.md). +via the [`super db` command](../../commands/super-db) or +[SuperDB lake API](../../lake/api). Sourcing data from files is only possible +with the [`super` command](../../commands/super). When a single pool name is specified without `@`-referencing a commit or ID, or when using a pool pattern, the tip of the `main` branch of each pool is @@ -43,9 +43,9 @@ accessed. In the first four forms, a single source is connected to a single output. In the fifth form, multiple sources are accessed in parallel and may be -[joined](join.md), [combined](combine.md), or [merged](merge.md). +[joined](join), [combined](combine), or [merged](merge). -A pipeline can be split with the [`fork` operator](fork.md) as in +A pipeline can be split with the [`fork` operator](fork) as in ``` from PoolOne |> fork ( => op1 |> op2 | ... @@ -62,7 +62,7 @@ from ( ``` Similarly, data can be routed to different pipeline branches with replication -using the [`switch` operator](switch.md): +using the [`switch` operator](switch): ``` from ... |> switch color ( case "red" => op1 |> op2 | ... diff --git a/docs/language/operators/get.md b/docs/language/operators/get.md index ca9bc6987b..fc8d1d83cf 100644 --- a/docs/language/operators/get.md +++ b/docs/language/operators/get.md @@ -4,4 +4,4 @@ ### Synopsis -`get` is a shorthand notation for `from`. See the [from operator](from.md) documentation for details. +`get` is a shorthand notation for `from`. See the [from operator](from) documentation for details. diff --git a/docs/language/operators/join.md b/docs/language/operators/join.md index 83c7bf6461..5008a51008 100644 --- a/docs/language/operators/join.md +++ b/docs/language/operators/join.md @@ -18,7 +18,7 @@ The first `join` syntax shown above was more recently introduced and is in some ways similar to other languages such as SQL. The second was the original `join` syntax in SuperPipe. Most joins can be expressed using either syntax. See the -[join tutorial](../../tutorials/join.md) +[join tutorial](../../tutorials/join) for details. ::: @@ -44,4 +44,4 @@ For anti join, the `` is undefined and thus cannot be specified. ### Examples -The [join tutorial](../../tutorials/join.md) includes several examples. +The [join tutorial](../../tutorials/join) includes several examples. diff --git a/docs/language/operators/load.md b/docs/language/operators/load.md index 4bf2a6c4d5..2848a51d2d 100644 --- a/docs/language/operators/load.md +++ b/docs/language/operators/load.md @@ -10,18 +10,18 @@ load [@] [author ] [message ] [meta ] :::tip Note The `load` operator is exclusively for working with pools in a -[SuperDB data lake](../../commands/super-db.md) and is not available for use in -[`super`](../../commands/super.md). +[SuperDB data lake](../../commands/super-db) and is not available for use in +[`super`](../../commands/super). ::: ### Description The `load` operator populates the specified `` with the values it -receives as input. Much like how [`super db load`](../../commands/super-db.md#load) +receives as input. 
Much like how [`super db load`](../../commands/super-db#load)
is used at the command line to populate a pool with data from files, streams,
and URIs, the `load` operator is used to save the results of your SuperPipe
query to a pool in the same SuperDB data lake. `` is a string indicating the
-[name or ID](../../commands/super-db.md#data-pools) of the destination pool.
+[name or ID](../../commands/super-db#data-pools) of the destination pool.
If the optional `@` string is included then the data will be committed to
an existing branch of that name, otherwise the `main` branch is assumed.
The `author`, `message`, and `meta` strings may also be provided to further
diff --git a/docs/language/operators/over.md b/docs/language/operators/over.md
index f16781d8cb..243f48daa6 100644
--- a/docs/language/operators/over.md
+++ b/docs/language/operators/over.md
@@ -12,8 +12,8 @@ The `over` operator traverses complex values to create a new sequence
of derived values (e.g., the elements of an array) and either
(in the first form) sends the new values directly to its output or
(in the second form) sends the values to a scoped computation as indicated
-by ``, which may represent any SuperPipe [subquery](../lateral-subqueries.md) operating on the
-derived sequence of values as [`this`](../pipeline-model.md#the-special-value-this).
+by ``, which may represent any SuperPipe [subquery](../lateral-subqueries) operating on the
+derived sequence of values as [`this`](../pipeline-model#the-special-value-this).

Each expression `` is evaluated in left-to-right order and derived sequences are
@@ -22,11 +22,11 @@ generated from each such result depending on its type:
entry in the map, and
* all other values generate a single value equal to itself.

-Records can be converted to maps with the [`flatten` function](../functions/flatten.md)
+Records can be converted to maps with the [`flatten` function](../functions/flatten)
resulting in a map that can be traversed, e.g., if `this` is a record, it
can be traversed with `over flatten(this)`.

-The nested subquery depicted as `` is called a [lateral subquery](../lateral-subqueries.md).
+The nested subquery depicted as `` is called a [lateral subquery](../lateral-subqueries).

### Examples
diff --git a/docs/language/operators/pass.md b/docs/language/operators/pass.md
index 38f0ef582a..5e8ecf735a 100644
--- a/docs/language/operators/pass.md
+++ b/docs/language/operators/pass.md
@@ -11,7 +11,7 @@ pass

The `pass` operator outputs a copy of each input value. It is typically used
with operators that handle multiple branches of the pipeline such as
-[`fork`](fork.md) and [`join`](join.md).
+[`fork`](fork) and [`join`](join).

### Examples
diff --git a/docs/language/operators/put.md b/docs/language/operators/put.md
index 2426e3387c..597c0bee65 100644
--- a/docs/language/operators/put.md
+++ b/docs/language/operators/put.md
@@ -9,7 +9,7 @@

### Description

The `put` operator modifies its input with
-one or more [field assignments](../pipeline-model.md#field-assignments).
+one or more [field assignments](../pipeline-model#field-assignments).
Each expression is evaluated based on the input record and the result is
either assigned to a new field of the input record if it does not exist, or
the existing field is modified in its original location with the result.
@@ -23,7 +23,7 @@ a computed value cannot be referenced in another expression. If you need
to re-use a computed result, this can be done by chaining multiple `put` operators.
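As a minimal sketch of such chaining (the input record is hypothetical):

```
echo '{a:1}' | super -z -c 'put b:=a+1 |> put c:=b*2' -
```

produces

```
{a:1,b:2,c:4}
```

where the second `put` can reference `b` because the first `put` has already produced it.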
The `put` keyword is optional since it is an
-[implied operator](../pipeline-model.md#implied-operators).
+[implied operator](../pipeline-model#implied-operators).

Each `` expression must be a field reference expressed as a dotted path or one or more
constant index operations on `this`, e.g., `a.b`, `this["a"]["b"]`,
@@ -35,7 +35,7 @@ For any input value that is not a record, an error is emitted.

Note that when the field references are all top level,
`put` is a special case of a `yield` with a
-[record literal](../expressions.md#record-expressions)
+[record literal](../expressions#record-expressions)
using a spread operator of the form:
```
yield {...this, : [, :...]}
```
diff --git a/docs/language/operators/search.md b/docs/language/operators/search.md
index 69aeef6b66..acd4edc451 100644
--- a/docs/language/operators/search.md
+++ b/docs/language/operators/search.md
@@ -8,12 +8,12 @@
```

### Description

-The `search` operator filters its input by applying a [search expression](../search-expressions.md) ``
+The `search` operator filters its input by applying a [search expression](../search-expressions) ``
to each input value and dropping each value for which the expression evaluates
to `false` or to an error.

The `search` keyword is optional since it is an
-[implied operator](../pipeline-model.md#implied-operators).
+[implied operator](../pipeline-model#implied-operators).

When Zed queries are run interactively, it is convenient to be able to omit
the "search" keyword, but when search filters appear in Zed source files,
@@ -48,7 +48,7 @@ echo '1 2 3' | super -z -c '? 2 or 3' -
2
3
```
-_A search with [Boolean logic](../search-expressions.md#boolean-logic)_
+_A search with [Boolean logic](../search-expressions#boolean-logic)_
```mdtest-command
echo '1 2 3' | super -z -c 'search this >= 2 AND this <= 2' -
```
diff --git a/docs/language/operators/sort.md b/docs/language/operators/sort.md
index 0e98672818..77c8ba4e2f 100644
--- a/docs/language/operators/sort.md
+++ b/docs/language/operators/sort.md
@@ -66,7 +66,7 @@ echo '2 null 1 3' | super -z -c 'sort this' -
3
null
```
-_With no sort expression, sort will sort by [`this`](../pipeline-model.md#the-special-value-this) for non-records_
+_With no sort expression, sort will sort by [`this`](../pipeline-model#the-special-value-this) for non-records_
```mdtest-command
echo '2 null 1 3' | super -z -c sort -
```
diff --git a/docs/language/operators/summarize.md b/docs/language/operators/summarize.md
index 3edcc78b2a..81e77124cb 100644
--- a/docs/language/operators/summarize.md
+++ b/docs/language/operators/summarize.md
@@ -14,7 +14,7 @@

### Description

In the first four forms, the `summarize` operator consumes all of its input,
-applies an [aggregate function](../aggregates/_index.md) to each input value
+applies an [aggregate function](../aggregates) to each input value
optionally filtered by a `where` clause and/or organized with the group-by
keys specified after the `by` keyword, and at the end of input produces one
or more aggregations for each unique set of group-by key values.
@@ -24,16 +24,16 @@ unique combination of values of the group-by keys specified after the
`by` keyword.

The `summarize` keyword is optional since it is an
-[implied operator](../pipeline-model.md#implied-operators).
+[implied operator](../pipeline-model#implied-operators).

Each aggregate function may be optionally followed by a `where` clause, which
applies a Boolean expression that indicates, for each input value,
whether to deliver it to that aggregate.
(`where` clauses are analogous
-to the [`where` operator](where.md).)
+to the [`where` operator](where).)

The output field names for each aggregate and each key are optional. If omitted,
a field name is inferred from each right-hand side, e.g., the output field for the
-[`count` aggregate function](../aggregates/count.md) is simply `count`.
+[`count` aggregate function](../aggregates/count) is simply `count`.

A key may be either an expression or a field. If the key field is omitted,
it is inferred from the expression, e.g., the field name for `by lower(s)`
diff --git a/docs/language/operators/switch.md b/docs/language/operators/switch.md
index 8e65955fee..3010405911 100644
--- a/docs/language/operators/switch.md
+++ b/docs/language/operators/switch.md
@@ -42,7 +42,7 @@ where it appears does not influence the result.

The output of a switch consists of multiple branches that must be merged.
If the downstream operator expects a single input, then the output branches are
-merged with an automatically inserted [combine operator](combine.md).
+merged with an automatically inserted [combine operator](combine).

### Examples

diff --git a/docs/language/operators/top.md b/docs/language/operators/top.md
index 559fc964cb..0f5453224d 100644
--- a/docs/language/operators/top.md
+++ b/docs/language/operators/top.md
@@ -13,7 +13,7 @@ The `top` operator returns the top N values from a sequence sorted in descending
order by one or more expressions. N is given by ``, a compile-time
constant expression that evaluates to a positive integer.

-`top` is functionally similar to [`sort`](sort.md) but is less resource
+`top` is functionally similar to [`sort`](sort) but is less resource
intensive because only the top N values are stored in memory (i.e., values
less than the minimum are discarded).
diff --git a/docs/language/operators/where.md b/docs/language/operators/where.md
index 2f26062486..c52c4ec0cc 100644
--- a/docs/language/operators/where.md
+++ b/docs/language/operators/where.md
@@ -13,11 +13,11 @@ to each input value and dropping each value for which the
expression evaluates to `false` or to an error.

The `where` keyword is optional since it is an
-[implied operator](../pipeline-model.md#implied-operators).
+[implied operator](../pipeline-model#implied-operators).

The "where" keyword requires a Boolean-valued expression and does not support
-[search expressions](../search-expressions.md). Use the
-[search operator](search.md) if you want search syntax.
+[search expressions](../search-expressions). Use the
+[search operator](search) if you want search syntax.

When SuperPipe queries are run interactively, it is highly convenient to be able
to omit the "where" keyword, but when where filters appear in query source files,
diff --git a/docs/language/operators/yield.md b/docs/language/operators/yield.md
index fa08062018..fd35c262e6 100644
--- a/docs/language/operators/yield.md
+++ b/docs/language/operators/yield.md
@@ -12,10 +12,10 @@ The `yield` operator produces output values by evaluating one or more
expressions on each input value and sending each result to the output
in left-to-right order. Each `` may be any valid
-[expression](../expressions.md).
+[expression](../expressions).

The `yield` keyword is optional since it is an
-[implied operator](../pipeline-model.md#implied-operators).
+[implied operator](../pipeline-model#implied-operators).
### Examples
diff --git a/docs/language/overview.md b/docs/language/overview.md
index ff5e1d62ca..e11ef81dd6 100644
--- a/docs/language/overview.md
+++ b/docs/language/overview.md
@@ -14,10 +14,10 @@ by a number of commands:
command |> command | command | ...
```
However, in Zed, the entities that transform data are called
-"[operators](operators/_index.md)" instead of "commands" and unlike Unix pipelines,
+"[operators](operators)" instead of "commands" and, unlike Unix pipelines,
the streams of data in a Zed query are typed data sequences that adhere to the
-[Zed data model](../formats/zed.md).
+[Zed data model](../formats/zed).
Moreover, Zed sequences can be forked and joined:
```
operator
@@ -48,7 +48,7 @@ much as a modern SQL engine optimizes a declarative SQL query.

## Search and Analytics

Zed is also intended to provide a seamless transition from a simple search experience
-(e.g., typed into a search bar or as the query argument of the [`super`](../commands/super.md) command-line
+(e.g., typed into a search bar or as the query argument of the [`super`](../commands/super) command-line
tool) to a more complex analytics experience composed of complex joins and aggregations
where the Zed language source text would typically be authored in an editor
and managed under source-code control.
@@ -66,7 +66,7 @@ is a search for values with both the strings "example.com" and "urgent"
present.

Unlike typical log search systems, the Zed language operators are uniform:
you can specify an operator including keyword search terms, Boolean predicates,
-etc. using the same [search expression](search-expressions.md) syntax at any point
+etc. using the same [search expression](search-expressions) syntax at any point
in the pipeline.

For example,
@@ -111,12 +111,12 @@ search "example.com" AND "urgent"

The following sections continue describing the Zed language.

-* [The Pipeline Model](pipeline-model.md)
-* [Data Types](data-types.md)
-* [Const, Func, Operator, and Type Statements](statements.md)
-* [Expressions](expressions.md)
-* [Search Expressions](search-expressions.md)
-* [Lateral Subqueries](lateral-subqueries.md)
-* [Shaping and Type Fusion](shaping.md)
+* [The Pipeline Model](pipeline-model)
+* [Data Types](data-types)
+* [Const, Func, Operator, and Type Statements](statements)
+* [Expressions](expressions)
+* [Search Expressions](search-expressions)
+* [Lateral Subqueries](lateral-subqueries)
+* [Shaping and Type Fusion](shaping)

-You may also be interested in the detailed reference materials on [operators](operators/_index.md), [functions](functions/_index.md), and [aggregate functions](aggregates/_index.md), as well as the [conventions](conventions.md) for how they're described.
+You may also be interested in the detailed reference materials on [operators](operators), [functions](functions), and [aggregate functions](aggregates), as well as the [conventions](conventions) for how they're described.
diff --git a/docs/language/pipeline-model.md b/docs/language/pipeline-model.md
index a8e1421558..75830224e0 100644
--- a/docs/language/pipeline-model.md
+++ b/docs/language/pipeline-model.md
@@ -6,19 +6,19 @@ title: Pipeline Model

In SuperPipe, each operator takes its input from the output of its upstream
operator beginning either with a data source or with an implied source.

-All available operators are listed on the [reference page](operators/_index.md).
+All available operators are listed on the [reference page](operators).
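A tiny end-to-end illustration of that flow (the input values here are hypothetical):

```
echo '3 1 2' | super -z -c 'sort this |> head 2' -
```

produces

```
1
2
```

where `sort` consumes the whole input from the implied source and `head` takes its input from `sort`'s output.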
## Pipeline Sources

In addition to the data sources specified as files on the `zq` command line,
-a source may also be specified with the [`from` operator](operators/from.md).
+a source may also be specified with the [`from` operator](operators/from).

When running on the command line, `from` may refer to a file, an HTTP
-endpoint, or an [S3](../integrations/amazon-s3.md) URI. When running in a [SuperDB data lake](../commands/super-db.md), `from` typically
+endpoint, or an [S3](../integrations/amazon-s3) URI. When running in a [SuperDB data lake](../commands/super-db), `from` typically
refers to a collection of data called a "data pool" and is referenced using
the pool's name much as SQL references database tables by their name.

-For more detail, see the reference page of the [`from` operator](operators/from.md),
+For more detail, see the reference page of the [`from` operator](operators/from),
but as an example, you might use the `get` form of `from` to fetch data from an
HTTP endpoint and process it with `super`, in this case, to extract the description
and license of a GitHub repository:
@@ -35,7 +35,7 @@ echo '"hello, world"' | super -

The examples throughout the language documentation use this "echo pattern"
to feed standard input to `super -` to illustrate language semantics.
Note that in these examples, the input values are expressed as a sequence of values serialized
-in the [Super JSON format](../formats/jsup.md)
+in the [Super JSON format](../formats/jsup)
and the query text given as the `-c` argument of the `super` command
is expressed in the syntax of the SuperPipe language described here.

@@ -45,7 +45,7 @@ Each operator is identified by name and performs a specific operation
on a stream of records.

Some operators, like
-[`summarize`](operators/summarize.md) or [`sort`](operators/sort.md),
+[`summarize`](operators/summarize) or [`sort`](operators/sort),
read all of their input before producing output, though `summarize`
can produce incremental results when the group-by key is aligned with
the order of the input.
@@ -58,24 +58,24 @@ on values as they are produced. For example, a long running query that
produces incremental output will stream results as they are produced, i.e.,
running `zq` to standard output will display results incrementally.

-The [`search`](operators/search.md) and [`where`](operators/where.md)
+The [`search`](operators/search) and [`where`](operators/where)
operators "find" values in their input and drop the ones that do not match
what is being looked for.

-The [`yield` operator](operators/yield.md) emits one or more output values
-for each input value based on arbitrary [expressions](expressions.md),
+The [`yield` operator](operators/yield) emits one or more output values
+for each input value based on arbitrary [expressions](expressions),
providing a convenient means to derive arbitrary output values as a function
of each input value, much like the map concept in the MapReduce framework.

-The [`fork` operator](operators/fork.md) copies its input to parallel
+The [`fork` operator](operators/fork) copies its input to parallel
branches of a pipeline. The output of these parallel branches can be combined
in a number of ways:

-* merged in sorted order using the [`merge` operator](operators/merge.md),
-* joined using the [`join` operator](operators/join.md), or
-* combined in an undefined order using the implied [`combine` operator](operators/combine.md).
+* merged in sorted order using the [`merge` operator](operators/merge),
+* joined using the [`join` operator](operators/join), or
+* combined in an undefined order using the implied [`combine` operator](operators/combine).

A pipeline can also be split into multiple branches using the
-[`switch` operator](operators/switch.md), in which data is routed to only one
+[`switch` operator](operators/switch), in which data is routed to only one
corresponding branch (or dropped) based on the switch clauses.

Switch operators typically
@@ -101,10 +101,10 @@ produces
```
Note that the output order of the switch branches is undefined (indeed they run
in parallel on multiple threads). To establish a consistent sequence order,
-a [`merge` operator](operators/merge.md)
+a [`merge` operator](operators/merge)
may be applied at the output of the switch specifying a sort key upon which
to order the upstream data. Often such order does not matter (e.g., when the output
-of the switch hits an [aggregator](aggregates/_index.md)), in which case it is typically more performant
+of the switch hits an [aggregator](aggregates)), in which case it is typically more performant
to omit the merge (though the SuperDB runtime will often delete such unnecessary
operations automatically as part of optimizing queries when they are compiled).

@@ -115,7 +115,7 @@ forwarded from the switch to the downstream operator in an undefined order.

## The Special Value `this`

In SuperPipe, there are no looping constructs and variables are limited to binding
-values between [lateral scopes](lateral-subqueries.md#lateral-scope).
+values between [lateral scopes](lateral-subqueries#lateral-scope).
Instead, the input sequence to an operator is produced continuously
and any output values are derived from input values.

@@ -134,7 +134,7 @@ produces this case-sensitive output:
"bar"
"foo"
```
-But we can make the sort case-insensitive by applying a [function](functions/_index.md) to the
+But we can make the sort case-insensitive by applying a [function](functions) to the
input values with the expression `lower(this)`, which converts each value
to lower-case for use in the sort without actually modifying the input value,
e.g.,
@@ -153,7 +153,7 @@ produces

A common SuperPipe use case is to process sequences of record-oriented data
(e.g., arising from formats like JSON or Avro) in the form of events or
structured logs. In this case, the input values to the operators
-are [records](../formats/zed.md#21-record) and the fields of a record are referenced with the dot operator.
+are [records](../formats/zed#21-record) and the fields of a record are referenced with the dot operator.

For example, if the input above were a sequence of records instead of strings
and perhaps contained a second field, e.g.,
@@ -180,8 +180,8 @@ is shorthand for
`sort this.s`

## Field Assignments

A typical operation in records involves
-adding or changing the fields of a record using the [`put` operator](operators/put.md)
-or extracting a subset of fields using the [`cut` operator](operators/cut.md).
+adding or changing the fields of a record using the [`put` operator](operators/put)
+or extracting a subset of fields using the [`cut` operator](operators/cut).
Also, when aggregating data using group-by keys, the group-by assignments
create new named record fields.

@@ -213,7 +213,7 @@ experience, SuperPipe has a canonical, long form that can be abbreviated
using syntax that supports an agile, interactive query workflow.
To this end, SuperPipe allows certain operator names to be optionally omitted when
they can be inferred from context. For example, the expression following
-the [`summarize` operator](operators/summarize.md)
+the [`summarize` operator](operators/summarize)
```
summarize count() by id
```
is abbreviated
@@ -231,7 +231,7 @@ is abbreviated
foo bar or x > 100
```
Furthermore, if an operator-free expression is not valid syntax for
-a search expression but is a valid [expression](expressions.md),
+a search expression but is a valid [expression](expressions),
then the abbreviation is treated as having an implied `yield` operator, e.g.,
```
{s:lower(s)}
```
@@ -249,7 +249,7 @@ the implied record field named `foo`.

Another common query pattern involves adding or mutating fields of records
where the input is presumed to be a sequence of records.
-The [`put` operator](operators/put.md) provides this mechanism and the `put`
+The [`put` operator](operators/put) provides this mechanism and the `put`
keyword is implied by the [field assignment](#field-assignments) syntax `:=`.
For example, the operation
diff --git a/docs/language/search-expressions.md b/docs/language/search-expressions.md
index 527d8f7e57..98b5c27727 100644
--- a/docs/language/search-expressions.md
+++ b/docs/language/search-expressions.md
@@ -7,13 +7,13 @@ Search expressions provide a hybrid syntax between keyword search
and Boolean expressions. In this way, a search is a shorthand for a "lean forward"
style activity where one is interactively exploring data with ad hoc searches.
All shorthand searches have a corresponding
-long form built from the [expression syntax](expressions.md) in combination with the
-[search term syntax](search-expressions.md#search-terms) described below.
+long form built from the [expression syntax](expressions) in combination with the
+[search term syntax](search-expressions#search-terms) described below.

## Search Patterns

Several styles of string search can be performed with a search expression
-(as well as the [`grep` function](functions/grep.md)) using "patterns",
+(as well as the [`grep` function](functions/grep)) using "patterns",
where a pattern is a regular expression, glob, or simple string.

### Regular Expressions
@@ -36,8 +36,8 @@ produces
{s:"bar"}
{foo:1}
```
-Regular expressions may also appear in the [`grep`](functions/grep.md),
-[`regexp`](functions/regexp.md), and [`regexp_replace`](functions/regexp_replace.md) functions:
+Regular expressions may also appear in the [`grep`](functions/grep),
+[`regexp`](functions/regexp), and [`regexp_replace`](functions/regexp_replace) functions:

```mdtest-command
echo '"foo" {s:"bar"} {s:"baz"} {foo:1}' | super -z -c 'yield {ba_start:grep(/^ba.*/, s),last_s_char:regexp(/(.)$/,s)[1]}' -
@@ -97,7 +97,7 @@ produces
{a:1}
```

-Globs may also appear in the [`grep` function](functions/grep.md)):
+Globs may also appear in the [`grep` function](functions/grep):
```mdtest-command
echo '"foo" {s:"bar"} {s:"baz"} {foo:1}' | super -z -c 'yield grep(ba*, s)' -
```
@@ -126,11 +126,11 @@ The search patterns described above can be combined with other "search terms"
using Boolean logic to form search expressions.

:::tip note
-When processing [Super Binary](../formats/bsup.md) data, the SuperDB runtime performs a multi-threaded
+When processing [Super Binary](../formats/bsup) data, the SuperDB runtime performs a multi-threaded
Boyer-Moore scan over decompressed data buffers before parsing any data.
This allows large buffers of data to be efficiently discarded and skipped when -searching for rarely occurring values. For a [SuperDB data lake](../lake/format.md), -a planned feature will use [Super Columnar](../formats/csup.md) files to further accelerate searches. +searching for rarely occurring values. For a [SuperDB data lake](../lake/format), +a planned feature will use [Super Columnar](../formats/csup) files to further accelerate searches. ::: ### Search Terms @@ -276,8 +276,8 @@ the "in" operator, e.g., #### Predicate Search Term -Any Boolean-valued [function](functions/_index.md) like `is`, `has`, -`grep`, etc. and any [comparison expression](expressions.md#comparisons) +Any Boolean-valued [function](functions) like `is`, `has`, +`grep`, etc. and any [comparison expression](expressions#comparisons) may be used as a search term and mixed into a search expression. For example, diff --git a/docs/language/shaping.md b/docs/language/shaping.md index ed8a2fa033..a705112862 100644 --- a/docs/language/shaping.md +++ b/docs/language/shaping.md @@ -12,7 +12,7 @@ a well-defined set of schemas, which combines the data into a unified store like a data warehouse. In Zed, this cleansing process is called "shaping" the data, and Zed leverages -its rich, [super-structured](../formats/_index.md#2-a-super-structured-pattern) +its rich, [super-structured](../formats#2-a-super-structured-pattern) type system to perform core aspects of data transformation. In a data model with nesting and multiple scalar types (such as Zed or JSON), shaping includes converting the type of leaf fields, adding or removing fields @@ -21,21 +21,21 @@ to "fit" a given shape, and reordering fields. While shaping remains an active area of development, the core functions in Zed that currently perform shaping are: -* [`cast`](functions/cast.md) - coerce a value to a different type -* [`crop`](functions/crop.md) - remove fields from a value that are missing in a specified type -* [`fill`](functions/fill.md) - add null values for missing fields -* [`order`](functions/order.md) - reorder record fields -* [`shape`](functions/shape.md) - apply `cast`, `fill`, and `order` +* [`cast`](functions/cast) - coerce a value to a different type +* [`crop`](functions/crop) - remove fields from a value that are missing in a specified type +* [`fill`](functions/fill) - add null values for missing fields +* [`order`](functions/order) - reorder record fields +* [`shape`](functions/shape) - apply `cast`, `fill`, and `order` They all have the same signature, taking two parameters: the value to be -transformed and a [type value](data-types.md) for the target type. +transformed and a [type value](data-types) for the target type. > Another type of transformation that's needed for shaping is renaming fields, -> which is supported by the [`rename` operator](operators/rename.md). -> Also, the [`yield` operator](operators/yield.md) +> which is supported by the [`rename` operator](operators/rename). +> Also, the [`yield` operator](operators/yield) > is handy for simply emitting new, arbitrary record literals based on > input values and mixing in these shaping functions in an embedded record literal. -> The [`fuse` aggregate function](aggregates/fuse.md) is also useful for fusing +> The [`fuse` aggregate function](aggregates/fuse) is also useful for fusing > values into a common schema, though a type is returned rather than values. 
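Before the fuller `connection` examples below, here is a one-line sketch of `shape` at work (the input record and inline type are hypothetical):

```
echo '{addr:"10.0.0.1",port:"80"}' | super -z -c 'yield shape(this, <{addr:ip,port:uint16}>)' -
```

produces

```
{addr:10.0.0.1,port:80(uint16)}
```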
In the examples below, we will use the following named type `connection` @@ -74,7 +74,7 @@ field path in the specified type, e.g., super -Z -I connection.zed -c 'cast(this, )' sample.json ``` casts the address fields to type `ip`, the port fields to type `port` -(which is a [named type](data-types.md#named-types) for type `uint16`) and the address port pairs to +(which is a [named type](data-types#named-types) for type `uint16`) and the address port pairs to type `socket` without modifying the `uid` field or changing the order of the `server` and `client` fields: ```mdtest-output @@ -162,7 +162,7 @@ about the `uid` field as it is not in the `connection` type: ``` As an alternative to the `order` function, -[record expressions](expressions.md#record-expressions) can be used to reorder +[record expressions](expressions#record-expressions) can be used to reorder fields without specifying types. For example: ```mdtest-command @@ -236,7 +236,7 @@ drops the `uid` field after shaping: ## Error Handling -A failure during shaping produces an [error value](data-types.md#first-class-errors) +A failure during shaping produces an [error value](data-types#first-class-errors) in the problematic leaf field. For example, consider this alternate input data file `malformed.json`. @@ -285,10 +285,10 @@ we see two errors: ``` Since these error values are nested inside an otherwise healthy record, adding -[`has_error(this)`](functions/has_error.md) downstream in our Zed pipeline +[`has_error(this)`](functions/has_error) downstream in our Zed pipeline could help find or exclude such records. If the failure to shape _any_ single field is considered severe enough to render the entire input record unhealthy, -[a conditional expression](expressions.md#conditional) +[a conditional expression](expressions#conditional) could be applied to wrap the input record as an error while including detail to debug the problem, e.g., @@ -428,8 +428,8 @@ fused into the single-type sequence: To perform fusion, Zed currently includes two key mechanisms (though this is an active area of development): -* the [`fuse` operator](operators/fuse.md), and -* the [`fuse` aggregate function](aggregates/fuse.md). +* the [`fuse` operator](operators/fuse), and +* the [`fuse` aggregate function](aggregates/fuse). ### Fuse Operator @@ -474,7 +474,7 @@ Since the `fuse` here is an aggregate function, it can also be used with group-by keys. Supposing we want to divide records into categories and fuse the records in each category, we can use a group-by. 
In this simple example, we will fuse records based on their number of fields using the
-[`len` function:](functions/len.md)
+[`len` function](functions/len):

```mdtest-command
echo '{x:1} {x:"foo",y:"foo"} {x:2,y:"bar"}' | super -z -c 'fuse(this) by len(this) |> sort len' -
diff --git a/docs/language/statements.md b/docs/language/statements.md
index 4852647ef7..cd53d6cffb 100644
--- a/docs/language/statements.md
+++ b/docs/language/statements.md
@@ -10,7 +10,7 @@ Constants may be defined and assigned to a symbolic name with the syntax
```
const = 
```
-where `` is an identifier and `` is a constant [expression](expressions.md)
+where `` is an identifier and `` is a constant [expression](expressions)
that must evaluate to a constant at compile time and not reference any runtime
state such as `this`, e.g.,
```mdtest-command
@@ -26,7 +26,7 @@ One or more `const` statements may appear only at the beginning of a scope
(i.e., the main scope at the start of a query,
the start of the body of a
[user-defined operator](#operator-statements), or a
[lateral scope](lateral-subqueries.md/#lateral-scope)
-defined by an [`over` operator](operators/over.md))
+defined by an [`over` operator](operators/over))
and bind the identifier to the value in the scope in which they appear
in addition to any contained scopes.

User-defined functions may be created with the syntax
func ( [ [, ...]] ) : ( )
```
where `` and `` are identifiers and `` is an
-[expression](expressions.md) that may refer to parameters but not to runtime
+[expression](expressions) that may refer to parameters but not to runtime
state such as `this`. For example,

@@ -61,7 +61,7 @@ One or more `func` statements may appear at the beginning of a scope
(i.e., the main scope at the start of a query,
the start of the body of a
[user-defined operator](#operator-statements), or a
[lateral scope](lateral-subqueries.md/#lateral-scope)
-defined by an [`over` operator](operators/over.md))
+defined by an [`over` operator](operators/over))
and bind the identifier to the expression in the scope in which they appear
in addition to any contained scopes.

@@ -88,14 +88,14 @@ A user-defined operator can then be called using the familiar call syntax
( [ [, ...]] )
```
where `` is the identifier of the user-defined operator and `` is a list
-of [expressions](expressions.md) matching the number of ``s defined in
+of [expressions](expressions) matching the number of ``s defined in
the operator's signature.

One or more `op` statements may appear only at the beginning of a scope
(i.e., the main scope at the start of a query,
the start of the body of a
[user-defined operator](#operator-statements), or a
[lateral scope](lateral-subqueries.md/#lateral-scope)
-defined by an [`over` operator](operators/over.md))
+defined by an [`over` operator](operators/over))
and bind the identifier to the value in the scope in which they appear
in addition to any contained scopes.

@@ -123,13 +123,13 @@ produces
```

### Arguments

The arguments to a user-defined operator must be either constant values (e.g.,
-a [literal](expressions.md#literals) or reference to a
+a [literal](expressions#literals) or reference to a
[defined constant](#const-statements)), or a reference to a path in the data
-stream (e.g., a [field reference](expressions.md#field-dereference)). Any
+stream (e.g., a [field reference](expressions#field-dereference)). Any
other expression will result in a compile-time error.
Because both constant values and path references evaluate in
-[expression](expressions.md) contexts, a `` may often be used inside of
+[expression](expressions) contexts, a `` may often be used inside
a user-defined operator without regard to the argument's origin. For instance,
with the program `params.spq`
```mdtest-input params.spq
@@ -167,7 +167,7 @@ illegal left-hand side of assignment in params.spq at line 2, column 3:
```

A constant value must be used to pass a parameter that will be referenced as
-the data source of a [`from` operator](operators/from.md). For example, we
+the data source of a [`from` operator](operators/from). For example, we
quote the pool name in our program `count-pool.spq`
```mdtest-input count-pool.spq
op CountPool(pool_name): (
@@ -226,7 +226,7 @@ Named types may be created with the syntax
```
type = 
```
-where `` is an identifier and `` is a [type](data-types.md#first-class-types).
+where `` is an identifier and `` is a [type](data-types#first-class-types).
This creates a new type with the given name in the type system, e.g.,
```mdtest-command
echo 80 | super -z -c 'type port=uint16 cast(this, )' -
@@ -240,7 +240,7 @@ One or more `type` statements may appear at the beginning of a scope
(i.e., the main scope at the start of a query,
the start of the body of a
[user-defined operator](#operator-statements), or a
[lateral scope](lateral-subqueries.md/#lateral-scope)
-defined by an [`over` operator](operators/over.md))
+defined by an [`over` operator](operators/over))
and bind the identifier to the type in the scope in which they appear
in addition to any contained scopes.
diff --git a/docs/libraries/python.md b/docs/libraries/python.md
index 7c8feb0e75..0315cc87d7 100644
--- a/docs/libraries/python.md
+++ b/docs/libraries/python.md
@@ -6,12 +6,12 @@ title: Python

Zed includes preliminary support for Python-based interaction with a Zed lake.
The Zed Python package supports loading data into a Zed lake as well as
-querying and retrieving results in the [ZJSON format](../formats/zjson.md).
+querying and retrieving results in the [ZJSON format](../formats/zjson).
The Python client interacts with the Zed lake via the REST API served by
-[`super db serve`](../commands/super-db.md#serve).
+[`super db serve`](../commands/super-db#serve).

This approach works adequately when high data throughput is not required.
-We plan to introduce native [Super Binary](../formats/bsup.md) support for
+We plan to introduce native [Super Binary](../formats/bsup) support for
Python that should increase performance substantially for more data-intensive
workloads.
diff --git a/docs/tutorials/join.md b/docs/tutorials/join.md
index 7badba8cad..3594002799 100644
--- a/docs/tutorials/join.md
+++ b/docs/tutorials/join.md
@@ -3,7 +3,7 @@ weight: 3
title: Join Overview
---

-This is a brief primer on the SuperPipe [`join` operator](../language/operators/join.md).
+This is a brief primer on the SuperPipe [`join` operator](../language/operators/join).

Currently, `join` is limited in that only equi-join (i.e., a join predicate
containing `=`) is supported.
@@ -159,10 +159,10 @@ produces

## Inputs from Pools

In the examples above, we used the
-[`file` operator](../language/operators/file.md) to read our respective inputs
+[`file` operator](../language/operators/file) to read our respective inputs
from named file sources. However, if the inputs are stored in pools in a
SuperDB data lake, we would instead specify the sources as data pools using the
-[`from` operator](../language/operators/from.md).
+[`from` operator](../language/operators/from).

Here we'll load our input data into pools in a temporary data lake, then
execute our inner join using `super db query`.

@@ -201,9 +201,9 @@ produces

In addition to the syntax shown so far, `join` supports an alternate syntax in
which left and right inputs are specified by the two branches of a preceding
-[`fork` operator](../language/operators/fork.md),
-[`from` operator](../language/operators/from.md), or
-[`switch` operator](../language/operators/switch.md).
+[`fork` operator](../language/operators/fork),
+[`from` operator](../language/operators/from), or
+[`switch` operator](../language/operators/switch).

Here we'll use the alternate syntax to perform the same inner join shown
earlier in the [Inner Join section](#inner-join).

@@ -287,7 +287,7 @@ records.

In the query `multi-value-join.spq`, we create the keys as embedded records
inside each input record, using the same field names and data types in each.
We'll leave the created `fruitkey` records intact to show what they look
like, but since they represent redundant data, in practice we'd
-typically [`drop`](../language/operators/drop.md) it after the `join` in our pipeline.
+typically [`drop`](../language/operators/drop) it after the `join` in our pipeline.

```mdtest-input multi-value-join.spq
file fruit.json |> put fruitkey:={name,color}
@@ -391,7 +391,7 @@ produces

If embedding the opposite record is undesirable, the left and right records
can easily be merged with the
-[spread operator](../language/expressions.md#record-expressions). Additional
+[spread operator](../language/expressions#record-expressions). Additional
processing may be necessary to handle conflicting field names, such as in the
example just shown where the `name` field is used differently in the left and
right inputs. We'll demonstrate this by augmenting `embed-opposite.spq`
diff --git a/docs/tutorials/zed.md b/docs/tutorials/zed.md
index a38ab961a2..bd99a78aef 100644
--- a/docs/tutorials/zed.md
+++ b/docs/tutorials/zed.md
@@ -9,7 +9,7 @@ analytics? This is where the `zed` command comes in. `zed` builds on the type
system and language found in `zq` and adds a high-performance data lake on top.

> Note: `zed` is currently in alpha form. Check out its current status in the
-> [`super db` command](../commands/super-db.md) documentation..
+> [`super db` command](../commands/super-db) documentation.

## Creating a Lake

@@ -84,7 +84,7 @@ Our data has been committed. The `-use prs` argument in `zed load` tells

With our data now loaded, let's run a quick `count()` query to verify that we
have the expected data. To do this we'll use the `zed query` command. To those
-familiar with [`super`](../commands/super.md), `zed query` operates similarly except
+familiar with [`super`](../commands/super), `zed query` operates similarly except
it doesn't accept file input arguments since it queries pools.

```bash
@@ -316,9 +316,9 @@ $ zed query -Z 'min(created_at), max(created_at)'

Obviously this is only the tip of the iceberg in terms of things that can be
done with the `zed` command. Some suggested next steps:

-1. Dig deeper into SuperDB data lakes by having a look at the [`super db` command](../commands/super-db.md) documentation.
+1. Dig deeper into SuperDB data lakes by having a look at the [`super db` command](../commands/super-db) documentation.
2. Get a better idea of ways you can query your data by looking at the
-[Zed language documentation](../language/_index.md).
+[Zed language documentation](../language).
If you have any questions or run into any snags, join the friendly Zed community at the [Brim Data Slack workspace](https://www.brimdata.io/join-slack/). diff --git a/docs/tutorials/zq.md b/docs/tutorials/zq.md index 6561d09b77..765f675226 100644 --- a/docs/tutorials/zq.md +++ b/docs/tutorials/zq.md @@ -5,10 +5,10 @@ heading: super Tutorial --- This tour provides new users of `super` an overview of the tool and -the [SuperPipe language](../language/_index.md) +the [SuperPipe language](../language) by walking through a number of examples on the command-line. This should get you started without having to read through all the gory details -of the [SuperPipe language](../language/_index.md) or [`super` command-line usage](../commands/super.md). +of the [SuperPipe language](../language) or [`super` command-line usage](../commands/super). We'll start with some simple one-liners on the command line where we feed some data to `super` with `echo` and specify `-` for `super` input to indicate @@ -20,12 +20,12 @@ Then, toward the end of the tour, we'll experiment with some real-world GitHub d pulled from the GitHub API. If you want to follow along on the command line, -just make sure the `super` command [is installed](../install.md) +just make sure the `super` command [is installed](../install) as well as [`jq`](https://stedolan.github.io/jq/). ## But JSON -While `super` is based on a new type of [data model](../formats/zed.md), +While `super` is based on a new type of [data model](../formats/zed), Zed just so happens to be a superset of JSON. So if all you ever use `zq` for is manipulating JSON data, @@ -36,7 +36,7 @@ doing interesting things on that input, and emitting results, of course, as JSON `jq` is awesome and powerful, but its syntax and computational model can sometimes be daunting and difficult. We tried to make `zq` really easy and intuitive, -and it is usually faster, sometimes [much faster](../commands/super.md#performance), +and it is usually faster, sometimes [much faster](../commands/super#performance), than `jq`. To this end, if you want full JSON compatibility without having to delve into the @@ -45,7 +45,7 @@ expect JSON values as input and produce JSON values as output, much like `jq`. :::tip If your downstream JSON tooling expects only a single JSON value, we can use -`-j` along with [`collect()`](../language/aggregates/collect.md) to aggregate +`-j` along with [`collect()`](../language/aggregates/collect) to aggregate multiple input values into an array. A `collect()` example is shown [later in this tutorial](#running-analytics). ::: @@ -64,7 +64,7 @@ and you get ``` With `zq`, the mysterious `jq` value `.` is instead called the almost-as-mysterious value -[`this`](../language/pipeline-model.md#the-special-value-this) and you say: +[`this`](../language/pipeline-model#the-special-value-this) and you say: ```mdtest-command echo '1 2 3' | super -z -c 'this+1' - ``` @@ -75,7 +75,7 @@ which also gives 4 ``` > Note that we are using the `-z` option with `zq` in all of the examples, -> which causes `zq` to format the output as [ZSON](../formats/jsup.md). +> which causes `zq` to format the output as [ZSON](../formats/jsup). > When running `zq` on the terminal, you do not need `-z` as it is the default, > but we include it here for clarity and because all of these examples are > run through automated testing, which is not attached to a terminal. 
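As an illustrative aside (not part of the diff), swapping `-z` for `-j` emits the same value as plain JSON:

```
echo '{s:"hello"}' | super -j -c 'yield this' -
```

produces

```
{"s":"hello"}
```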
@@ -101,7 +101,7 @@ expression `2` is evaluated for each input value, and the value `2` is produced
 each time, so three copies of `2` are emitted.
 
 In `zq` however, `2` by itself is interpreted as a search and is
-[shorthand for](../language/pipeline-model.md#implied-operators) `search 2` so the command
+[shorthand for](../language/pipeline-model#implied-operators) `search 2` so the command
 ```mdtest-command
 echo '1 2 3' | super -z -c '? 2' -
 ```
@@ -135,7 +135,7 @@ produces
 Doing searches like this in `jq` would be hard.
 
 That said, we can emulate the `jq` transformation stance by explicitly
-indicating that we want to [yield](../language/operators/yield.md)
+indicating that we want to [yield](../language/operators/yield)
 the result of the expression evaluated for each input value, e.g.,
 ```mdtest-command
 echo '1 2 3' | super -z -c 'yield 2' -
 ```
@@ -157,15 +157,15 @@ trying to do high-precision stuff with data.
 When using `zq`, it's handy to operate in the domain of Zed data and
 only output to JSON when needed.
 
-The human-readable format of Zed is called [ZSON](../formats/jsup.md)
+The human-readable format of Zed is called [ZSON](../formats/jsup)
 (and yes, that's a play on the acronym JSON).
 ZSON is nice because it has a comprehensive type system and you can
-go from ZSON to an efficient binary row format ([Super Binary](../formats/bsup.md))
-and columnar ([Super Columnar](../formats/csup.md)) --- and vice versa ---
+go from ZSON to an efficient binary row format ([Super Binary](../formats/bsup))
+and columnar ([Super Columnar](../formats/csup)) --- and vice versa ---
 with complete fidelity and no loss of information.
 
 In this tour, we'll stick to ZSON (though for large data sets,
-[Super Binary is much faster](../commands/super.md#performance)).
+[Super Binary is much faster](../commands/super#performance)).
 
 The first thing you'll notice about ZSON is that you don't need quotations
 around field names. We can see this by taking some JSON
@@ -208,7 +208,7 @@ produces
 
 ## Comprehensive Types
 
-ZSON also has a [comprehensive type system](../formats/zed.md).
+ZSON also has a [comprehensive type system](../formats/zed).
 For example, here is a ZSON "record" with a taste of different types
 of values as record fields:
@@ -248,7 +248,7 @@ Here, `v1` is a 64-bit IEEE floating-point value just like JSON.
 Unlike JSON, `v2` is a 64-bit integer.
 And there are other integer types as with `v3`,
-which utilizes a [ZSON type decorator](../formats/jsup.md#22-type-decorators),
+which utilizes a [ZSON type decorator](../formats/jsup#22-type-decorators),
 in this case, to clarify its specific type of integer as unsigned 8 bits.
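As a quick sketch of a decorator in use: because `yield this` just passes each value through, feeding a decorated value in and watching it round-trip shows that the `(uint8)` annotation is part of the value itself (the inline input record here is made up for illustration):

```bash
# Sketch: the uint8 decorator survives the identity pipeline.
echo '{v2:123,v3:123(uint8)}' | super -z -c 'yield this' -
```

which should echo the input back, decorator and all: `{v2:123,v3:123(uint8)}`.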
@@ -276,7 +276,7 @@ As is often the case with semi-structured systems,
 you deal with nested values all the time: in JSON, data is nested with
 objects and arrays, while in Zed, data is nested with "records" and
 arrays (as well as other complex types).
-[Record expressions](../language/expressions.md#record-expressions)
+[Record expressions](../language/expressions#record-expressions)
 are rather flexible with `zq` and look a bit like JavaScript
 or `jq` syntax, e.g.,
 ```mdtest-command
@@ -322,7 +322,7 @@ produces
 Sometimes you just want to extract or mutate certain fields of records.
 
-Similar to the Unix `cut` command, the Zed [cut operator](../language/operators/cut.md)
+Similar to the Unix `cut` command, the Zed [cut operator](../language/operators/cut)
 extracts fields, e.g.,
 ```mdtest-command
 echo '{s:"foo", val:1}{s:"bar"}' | super -z -c 'cut s' -
@@ -332,7 +332,7 @@ produces
 {s:"foo"}
 {s:"bar"}
 ```
-while the [put operator](../language/operators/put.md) mutates existing fields
+while the [put operator](../language/operators/put) mutates existing fields
 or adds new fields, e.g.,
 ```mdtest-command
 echo '{s:"foo", val:1}{s:"bar"}' | super -z -c 'put val:=123,pi:=3.14' -
 ```
@@ -366,7 +366,7 @@ produces
 ```
 Sometimes you expect missing errors to occur sporadically and just want
 to ignore them, which you can easily do with the
-[quiet function](../language/functions/quiet.md), e.g.,
+[quiet function](../language/functions/quiet), e.g.,
 ```mdtest-command
 echo '{s:"foo", val:1}{s:"bar"}' | super -z -c 'cut quiet(val)' -
 ```
@@ -378,7 +378,7 @@ produces
 
 ## Union Types
 
 One of the tricks `zq` uses to represent JSON data in its structured type system
-is [union types](../language/expressions.md#union-values).
+is [union types](../language/expressions#union-values).
 Most of the time, you don't need to worry about unions
 but they show up from time to time.
 Even when they show up, Zed just tries to "do the right thing" so you usually
@@ -410,7 +410,7 @@ preparation, union types are really quite powerful.
 They allow records with fields of different types or
 mixed-type arrays to be easily expressed while also having
 a very precise type definition. This is the essence of Zed's new
-[super-structured data model](../formats/_index.md#2-a-super-structured-pattern).
+[super-structured data model](../formats#2-a-super-structured-pattern).
 
 ## First-class Types
@@ -420,7 +420,7 @@ In other words, Zed has
 [first-class](https://en.wikipedia.org/wiki/First-class_citizen) types.
 
 The type of any value in `zq` can be accessed via the
-[typeof function](../language/functions/typeof.md), e.g.,
+[typeof function](../language/functions/typeof), e.g.,
 ```mdtest-command
 echo '1 "foo" 10.0.0.1' | super -z -c 'yield typeof(this)' -
 ```
@@ -434,7 +434,7 @@ What's the big deal here? We can print out the type of something. Yawn.
 
 Au contraire, this is really quite powerful because we can use types as
 values to functions, e.g., as a dynamic argument to
-the [cast function](../language/functions/cast.md):
+the [cast function](../language/functions/cast):
 ```mdtest-command
 echo '{a:0,b:"2"}{a:0,b:"3"}' | super -z -c 'yield cast(b, typeof(a))' -
 ```
@@ -496,7 +496,7 @@ false
 
 ## Sample
 
 Sometimes you'd like to see a sample value of each shape, not its type.
-This is easy to do with the [any aggregate function](../language/aggregates/any.md),
+This is easy to do with the [any aggregate function](../language/aggregates/any),
 e.g.,
 ```mdtest-command
 echo '{x:1,y:2}{s:"foo"}{x:3,y:4}' |
@@ -507,7 +507,7 @@ produces
 {s:"foo"}
 {x:1,y:2}
 ```
-We like this pattern so much there is a shortcut [sample operator](../language/operators/sample.md), e.g.,
+We like this pattern so much there is a shortcut [sample operator](../language/operators/sample), e.g.,
 ```mdtest-command
 echo '{x:1,y:2}{s:"foo"}{x:3,y:4}' | super -z -c 'sample this |> sort this' -
 ```
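To count how many input values take each shape, rather than sampling one value per shape, we can combine two tools already introduced, `typeof` and `count() by`. This is a sketch; the exact ZSON rendering of the type values in the output is assumed, not quoted from the docs:

```bash
# Sketch: tally inputs by their type (i.e., their shape).
echo '{x:1,y:2}{s:"foo"}{x:3,y:4}' | super -z -c 'count() by typeof(this) |> sort this' -
```

which should report a count of 2 for the `{x,y}` shape and 1 for the `{s}` shape.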
@@ -537,7 +537,7 @@ This is where you might have to spend a little bit of time coding up the right
 `zq` logic to disentangle a JSON mess. But once the data is cleaned up,
 you can leave it in a Zed format and not worry again.
 
-To do so, the [fuse operator](../language/operators/fuse.md) comes in handy.
+To do so, the [fuse operator](../language/operators/fuse) comes in handy.
 Let's say you have this sequence of data:
 ```
 {a:1,b:null}
@@ -615,7 +615,7 @@ produces
 1
 ```
 Hmm, there's just one value. It's probably a big JSON array but let's check with
-the [kind function](../language/functions/kind.md), and as expected:
+the [kind function](../language/functions/kind), and as expected:
 ```mdtest-command dir=docs/tutorials
 super -z -c 'kind(this)' prs.json
 ```
@@ -636,7 +636,7 @@ pull-request objects as elements of one array representing a single JSON value.
 Let's see what sorts of things are in this array. Here, we need to
 enumerate the items from the array and do something with them.
 So how about we use
-the [over operator](../language/operators/over.md)
+the [over operator](../language/operators/over)
 to traverse the array and count the array items by their "kind",
 ```mdtest-command dir=docs/tutorials
 super -z -c 'over this |> count() by kind(this)' prs.json
 ```
@@ -670,7 +670,7 @@ produces
 All that data across the samples and only three shapes. They must each be
 really big. Let's check that out.
 
-We can use the [len function](../language/functions/len.md) on the records to
+We can use the [len function](../language/functions/len) on the records to
 see the size of each of the four records:
 ```mdtest-command dir=docs/tutorials
 super -z -c 'over this |> sample |> len(this) |> sort this' prs.json
 ```
@@ -774,7 +774,7 @@ super -Z -c 'over this |> sample _links' prs.json
 ```
 While these fields have some useful information, we'll decide to drop them
 here and focus on other top-level fields. To do this, we can use the
-[drop operator](../language/operators/drop.md) to whittle down the data:
+[drop operator](../language/operators/drop) to whittle down the data:
 ```
 super -Z -c 'over this |> fuse |> drop head,base,_links |> sample' prs.json
 ```
@@ -874,7 +874,7 @@ and you will get strings that are all ISO dates:
 ...
 ```
 To fix those strings, we simply transform the fields in place using the
-(implied) [put operator](../language/operators/put.md) and redirect the final
+(implied) [put operator](../language/operators/put) and redirect the final
 output to the ZNG file `prs.bsup`:
 ```
 super -c '
@@ -903,7 +903,7 @@ which now gives:
 ```
 and we can see that the date fields are correctly typed as type `time`!
 
-> Note that we sorted the output values here using the [sort operator](../language/operators/sort.md)
+> Note that we sorted the output values here using the [sort operator](../language/operators/sort)
 > to produce a consistent output order since aggregations can be run in parallel
 > to achieve scale and do not guarantee their output order.
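The date-string fix above transforms fields in place with the implied put; one hedged way to sketch the underlying cast is with the `cast(..., typeof(...))` trick from the First-class Types section, deriving the target type from a reference value instead of naming it. The `t` and `ref` fields here are hypothetical, and the unquoted ISO timestamp as a ZSON `time` literal is an assumption about the format:

```bash
# Sketch: cast a string date to time, borrowing the type from a reference field.
echo '{t:"2019-11-12T16:49:07Z",ref:2025-01-01T00:00:00Z}' |
  super -z -c 'put t:=cast(t, typeof(ref))' -
```

which should yield `t` as a `time` value rather than a string.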
@@ -955,11 +955,11 @@ DATE                 NUMBER TITLE
 2019-11-12T16:49:07Z PR #6  a few clarifications to the zson spec
 ...
 ```
-Note that we used a [formatted string literal](../language/expressions.md#formatted-string-literals)
+Note that we used a [formatted string literal](../language/expressions#formatted-string-literals)
 to convert the field `number` into a string and format it with surrounding text.
 
 Instead of old PRs, we can get the latest list of PRs using the
-[tail operator](../language/operators/tail.md) since we know the data is sorted
+[tail operator](../language/operators/tail) since we know the data is sorted
 chronologically.
 This command retrieves the last five PRs in the dataset:
 ```mdtest-command dir=docs/tutorials
 super -f table -c '
@@ -997,13 +997,13 @@ the login field from each record:
 super -z -c 'over requested_reviewers |> collect(login)' prs.bsup
 ```
 Oops, this gives us an array of the reviewer logins
-with repetitions since [collect](../language/aggregates/collect.md)
+with repetitions since [collect](../language/aggregates/collect)
 collects each item that it encounters into an array:
 ```mdtest-output
 ["mccanne","nwt","henridf","mccanne","nwt","mccanne","mattnibs","henridf","mccanne","mattnibs","henridf","mccanne","mattnibs","henridf","mccanne","nwt","aswan","henridf","mccanne","nwt","aswan","philrz","mccanne","mccanne","aswan","henridf","aswan","mccanne","nwt","aswan","mikesbrown","henridf","aswan","mattnibs","henridf","mccanne","aswan","nwt","henridf","mattnibs","aswan","aswan","mattnibs","aswan","henridf","aswan","henridf","mccanne","aswan","aswan","mccanne","nwt","aswan","henridf","aswan"]
 ```
 What we'd prefer is a set of reviewers where each reviewer appears only once. This
-is easily done with the [union](../language/aggregates/union.md) aggregate function
+is easily done with the [union](../language/aggregates/union) aggregate function
 (not to be confused with union types) which computes the set-wise union of
 its input and produces a Zed `set` type as its output.
 In this case, the output is a set of strings, written `|[string]|`
@@ -1024,7 +1024,7 @@ in the graph and each set of reviewers is another node.
 
 So as a first step, let's figure out how to create each edge, where an
 edge is a relation between the requesting user and the set of reviewers. We can
-create this in Zed with a ["lateral subquery"](../language/lateral-subqueries.md).
+create this in Zed with a ["lateral subquery"](../language/lateral-subqueries).
 Instead of computing a set-union over all the reviewers across all PRs,
 we instead want to compute the set-union over the reviewers in each PR.
 We can do this as follows:
@@ -1040,7 +1040,7 @@ which produces an output like this:
 {reviewers:|["henridf","mccanne","mattnibs"]|}
 ...
 ```
-Note that the syntax `=> ( ... )` defines a [lateral scope](../language/lateral-subqueries.md#lateral-scope) where any Zed subquery can
+Note that the syntax `=> ( ... )` defines a [lateral scope](../language/lateral-subqueries#lateral-scope) where any Zed subquery can
 run in isolation over the input values created from the sequence of
 values traversed by the outer `over`.
@@ -1049,7 +1049,7 @@ To do this, we need to reference the `user.login` from the top-level scope within
 lateral scope. This can be done by bringing that value into the scope
 using a `with` clause appended to the `over` expression and yielding a
-[record literal](../language/expressions.md#record-expressions) with the desired value:
+[record literal](../language/expressions#record-expressions) with the desired value:
 ```mdtest-command dir=docs/tutorials
 super -z -c '
 over requested_reviewers with user=user.login => (
@@ -1253,6 +1253,6 @@ of tricks to:
 clean data for analysis by `zq` or even export into other systems or for testing.
 
 If you'd like to learn more, feel free to read through the
-[language docs](../language/_index.md) in depth
-or see how you can organize [data into a lake](../commands/super-db.md)
+[language docs](../language) in depth
+or see how you can organize [data into a lake](../commands/super-db)
 using a git-like commit model.
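As a parting recap, the tour's core moves (traversing the outer array, fusing shapes, sampling, and sorting for deterministic output) compose into a single pipeline. This sketch just restrings operators already demonstrated above on the same `prs.json` input; output omitted:

```bash
# Sketch: enumerate the array, unify shapes, keep one value per shape, sort.
super -z -c 'over this |> fuse |> sample |> sort this' prs.json
```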