From f0c92ca0b33a8ff4e915430221b985d0041571d8 Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Tue, 24 Oct 2023 13:50:01 -0400 Subject: [PATCH 1/7] updated database diagram --- README.md | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index e11b8b8..6a936bc 100644 --- a/README.md +++ b/README.md @@ -43,6 +43,7 @@ Options: These can all be set when starting the sink. See [cli structure](#cli-structure). **.env** + ```bash # Authentication PUBLIC_KEY=... # Ed25519 Public-key provided by https://github.com/pinax-network/substreams-sink-webhook @@ -72,11 +73,13 @@ The `USER_DIMENSION` is generated by the user provided schema and is augmented b ```mermaid erDiagram - USER_DIMENSION }|--|{ block : " " - USER_DIMENSION }|--|{ manifest : " " + USER_DIMENSION }|--|{ blocks : " " + USER_DIMENSION }|--|{ module_hashes : " " + + blocks }|--|{ unparsed_json : " " + module_hashes }|--|{ unparsed_json : " " - block }|--|{ unparsed_json : " " - manifest }|--|{ unparsed_json : " " + blocks }|--|{ final_blocks : " " USER_DIMENSION { user_data unknown @@ -95,19 +98,22 @@ erDiagram chain LowCardinality(String) } - block { + blocks { block_id FixedString(64) - block_number UInt32() + block_number UInt32 chain LowCardinality(String) timestamp DateTime64(3_UTC) - final_block Bool } - manifest { + module_hashes { module_hash FixedString(40) - module_name String() + module_name String chain LowCardinality(String) - type String() + type String + } + + final_blocks { + block_id FixedString(64) } ``` From cc1e9db3dfdca5e8a67442a5151eac7dd705c0d1 Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Tue, 24 Oct 2023 14:13:49 -0400 Subject: [PATCH 2/7] updated schema creation docs --- README.md | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 6a936bc..5cfa687 100644 --- a/README.md +++ b/README.md @@ -135,30 +135,28 @@ 
substreams-sink-clickhouse --create-db --name ### Schema initialization -_This step can be skipped. If so, the data will be stored as-is in the `unparsed_json` table. It should then be parsed by the user with ClickHouse's tools (eg: `MaterializedView`)_ +_This step can be skipped. If so, the data will be stored as-is in the `unparsed_json` table. It should then be parsed by the user with ClickHouse's tools. See this [article](https://clickhouse.com/docs/en/integrations/data-formats/json#using-materialized-views)._ -Initializes the database according to a SQL file. See [example file](#example-sql-file). - -**CLI** - -``` -substreams-sink-clickhouse --schema-url -``` +Initializes the database according to a SQL or a GraphQL file. See [example schema files](#schema-examples). **Web UI** -Upload a `.sql` file on [http://localhost:3000](http://localhost:3000). (POST request `/schema`, Content-Type: `application/octet-stream`) +Upload a schema file on [http://localhost:3000](http://localhost:3000). + +_Use PUT `/schema/sql` or PUT `/schema/graphql` with `Content-Type: application/octet-stream`._ **Curl** ```bash -curl --location --request POST 'http://localhost:3000/schema' --header 'Authorization: Bearer ' --header 'Content-Type: application/json' --data-raw '' +> curl --location --request PUT 'http://localhost:3000/schema/sql' --header 'Authorization: Bearer ' --header 'Content-Type: application/json' --data-raw '' + +> curl --location --request PUT 'http://localhost:3000/schema/graphql' --header 'Authorization: Bearer ' --header 'Content-Type: application/json' --data-raw '' ``` -#### Example SQL file +### Schema examples
-Click to expand +Example SQL file ```sql CREATE TABLE IF NOT EXISTS contracts ( @@ -173,6 +171,21 @@ ORDER BY (address)
+
+Example GraphQL file + +```graphql +type Contracts @entity { + id: ID! + address: String! + name: String + symbol: String + decimals: BigInt +} +``` + +
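A schema file such as the ones shown here is sent to the sink with a `PUT` request on `/schema/sql` or `/schema/graphql`. The following is a minimal sketch of how such a request could be assembled. The helper `buildSchemaRequest` and the `SchemaRequest` shape are hypothetical names introduced for illustration. The sketch also assumes that the bearer token is the configured auth key and that the body is sent as `application/octet-stream`, as the Web UI note states.

```typescript
// Hypothetical helper: not part of substreams-sink-clickhouse.
type SchemaKind = "sql" | "graphql";

interface SchemaRequest {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

// Builds the request description for uploading a schema file.
// Assumption: the sink listens on `baseUrl` (http://localhost:3000 in the README).
function buildSchemaRequest(
  baseUrl: string,
  kind: SchemaKind,
  schema: string,
  authKey: string,
): SchemaRequest {
  return {
    // Trim trailing slashes so the path is not doubled.
    url: `${baseUrl.replace(/\/+$/, "")}/schema/${kind}`,
    method: "PUT",
    headers: {
      Authorization: `Bearer ${authKey}`, // assumption: token is the auth key
      "Content-Type": "application/octet-stream",
    },
    body: schema,
  };
}

const exampleRequest = buildSchemaRequest(
  "http://localhost:3000",
  "graphql",
  "type Contracts @entity { id: ID! address: String! }",
  "my-auth-key", // placeholder value
);
// The request could then be sent with any HTTP client, for example:
// await fetch(exampleRequest.url, { method: exampleRequest.method,
//   headers: exampleRequest.headers, body: exampleRequest.body });
```

Keeping the request construction separate from the network call makes it easy to check the URL and headers without a running sink.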
+ ### Sink Serves an endpoint to receive Substreams data from [substreams-sink-webhook](https://github.com/pinax-network/substreams-sink-webhook). From e991456fc665d91c4a7e2bf389de38911f95ae53 Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Tue, 24 Oct 2023 14:19:34 -0400 Subject: [PATCH 3/7] added features in readme --- README.md | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5cfa687..fe1cea6 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,57 @@ ## Features -- TO-DO: See [issues](https://github.com/pinax-network/substreams-sink-clickhouse/issues?q=is%3Aissue+is%3Aclosed) +
+Serverless data sinking + +Used together with [substreams-sink-webhook](https://github.com/pinax-network/substreams-sink-webhook), this sink makes data from any Substreams available in ClickHouse. + +
+ +
+Automatic block information + +Data for each block is stored alongside every record. The fields and their structure can be found in the [database structure](#database-structure). + +
+ +
+SQL schemas + +A SQL schema can be passed in to define the end table for Substreams data. It will be extended as described in the [database structure](#database-structure). + +Schemas are set by following the steps in [database initialization](#database-initialization). + +
+ +
+GraphQL schemas + +[TheGraph's GraphQL entity](https://thegraph.com/docs/en/developing/creating-a-subgraph/#defining-entities) schemas can be passed in to define the end table for substreams data. See [database initialization](#database-initialization). + +They are converted to SQL following these rules before being executed. The available types are defined [here](https://thegraph.com/docs/en/developing/creating-a-subgraph/#graphql-supported-scalars). + +| GraphQL data type | ClickHouse equivalent | +| ----------------- | --------------------- | +| `Bytes` | `String` | +| `String` | `String` | +| `Boolean` | `boolean` | +| `Int` | `Int32` | +| `BigInt` | `String` | +| `BigDecimal` | `String` | +| `Float` | `Float64` | +| `ID` | `String` | + +
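The conversion table can be read as a simple lookup. Below is a minimal sketch of that lookup, not the sink's actual implementation. The names `GRAPHQL_TO_CLICKHOUSE` and `toClickHouseType` are hypothetical. How the sink treats nullable fields or list types is not described here, so the sketch only strips the non-null marker `!` and maps the scalar name.

```typescript
// Scalar mapping copied from the table; names below are hypothetical.
const GRAPHQL_TO_CLICKHOUSE = new Map<string, string>([
  ["Bytes", "String"],
  ["String", "String"],
  ["Boolean", "boolean"],
  ["Int", "Int32"],
  ["BigInt", "String"],
  ["BigDecimal", "String"],
  ["Float", "Float64"],
  ["ID", "String"],
]);

// Maps a GraphQL scalar such as "BigInt" or "String!" to its ClickHouse type.
// Assumption: the trailing "!" is ignored; nullability handling is out of scope.
function toClickHouseType(graphqlType: string): string {
  const base = graphqlType.endsWith("!") ? graphqlType.slice(0, -1) : graphqlType;
  const mapped = GRAPHQL_TO_CLICKHOUSE.get(base);
  if (mapped === undefined) {
    throw new Error(`Unsupported GraphQL scalar: ${base}`);
  }
  return mapped;
}
```

Under this sketch, the `decimals: BigInt` field of a GraphQL entity would become a `String` column, and `id: ID!` would also become `String`.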
+ +
+No schema + +No schema is required to store data in ClickHouse. Everything can be stored in `unparsed_json` (see [database structure](#database-structure)). + +The user **must** build custom [views](https://clickhouse.com/docs/en/guides/developer/cascading-materialized-views) to transform the data according to their needs. Further details are available in [ClickHouse's documentation](https://clickhouse.com/docs/en/integrations/data-formats/json#using-materialized-views). + +
## Usage @@ -30,7 +80,7 @@ Options: --database The database to use inside ClickHouse (default: "default", env: DATABASE) --username Database user (default: "default", env: USERNAME) --password Password associated with the specified username (default: "", env: PASSWORD) - --create-database If the specified database does not exist, automatically create it (default: "false", env: CREATE_DATABASE) --async-insert https://clickhouse.com/docs/en/operations/settings/settings#async-insert (choices: "0", "1", default: 1, env: ASYNC_INSERT) --wait-for-insert https://clickhouse.com/docs/en/operations/settings/settings#wait-for-async-insert (choices: "0", "1", default: 0, env: WAIT_FOR_INSERT) --queue-limit Insert delay to each response when the pqueue exceeds this value (default: 10, env: QUEUE_LIMIT) From b23c44ddf0b1d9affa1d96778a96b68095c3c1b1 Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Tue, 24 Oct 2023 14:31:53 -0400 Subject: [PATCH 4/7] removed unused config options --- .env.example | 2 -- README.md | 2 -- src/config.ts | 3 --- src/schemas.spec.ts | 4 ---- src/schemas.ts | 2 -- 5 files changed, 13 deletions(-) diff --git a/.env.example b/.env.example index 3e3d10d..f14d187 100644 --- a/.env.example +++ b/.env.example @@ -11,10 +11,8 @@ HOST=http://127.0.0.1:8123 DATABASE=default USERNAME=default PASSWORD= -CREATE_DB=false # Sink QUEUE_LIMIT=10 QUEUE_CONCURRENCY=10 -SCHEMA_URL=... 
# generate SQL schema by providing file (ex: ./schema.sql) or URL path (ex: https://example.com/schema.sql) VERBOSE=true \ No newline at end of file diff --git a/README.md b/README.md index fe1cea6..990d699 100644 --- a/README.md +++ b/README.md @@ -73,14 +73,12 @@ Options: -V, --version output the version number -p, --port HTTP port on which to attach the sink (default: "3000", env: PORT) -v, --verbose Enable verbose logging (choices: "true", "false", default: "pretty", env: VERBOSE) - -s, --schema-url Execute SQL instructions before starting the sink (env: SCHEMA_URL) --public-key Public key to validate messages (env: PUBLIC_KEY) --auth-key Auth key to validate requests (env: AUTH_KEY) --host Database HTTP hostname (default: "http://localhost:8123", env: HOST) --database The database to use inside ClickHouse (default: "default", env: DATABASE) --username Database user (default: "default", env: USERNAME) --password Password associated with the specified username (default: "", env: PASSWORD) - --create-database If the specified database does not exist, automatically create it (default: "false", env: CREATE_DATABASE) --async-insert https://clickhouse.com/docs/en/operations/settings/settings#async-insert (choices: "0", "1", default: 1, env: ASYNC_INSERT) --wait-for-insert https://clickhouse.com/docs/en/operations/settings/settings#wait-for-async-insert (choices: "0", "1", default: 0, env: WAIT_FOR_INSERT) --queue-limit Insert delay to each response when the pqueue exceeds this value (default: 10, env: QUEUE_LIMIT) diff --git a/src/config.ts b/src/config.ts index 2fd4f3a..b09139a 100644 --- a/src/config.ts +++ b/src/config.ts @@ -13,7 +13,6 @@ export const DEFAULT_HOST = "http://localhost:8123"; export const DEFAULT_DATABASE = "default"; export const DEFAULT_USERNAME = "default"; export const DEFAULT_PASSWORD = ""; -export const DEFAULT_CREATE_DATABASE = "false"; export const DEFAULT_ASYNC_INSERT = 1; export const DEFAULT_WAIT_FOR_ASYNC_INSERT = 0; export const 
DEFAULT_QUEUE_LIMIT = 10; @@ -29,14 +28,12 @@ export const opts = program .addOption(new Option("-p, --port ", "HTTP port on which to attach the sink").env("PORT").default(DEFAULT_PORT)) .addOption(new Option("-v, --verbose ", "Enable verbose logging").choices(["true", "false"]).env("VERBOSE").default(DEFAULT_VERBOSE)) .addOption(new Option("--hostname ", "Server listen on HTTP hostname").env("HOSTNAME").default(DEFAULT_HOSTNAME)) - .addOption(new Option("-s, --schema-url ", "Execute SQL instructions before starting the sink").env("SCHEMA_URL").preset(DEFAULT_SCHEMA_URL)) .addOption(new Option("--public-key ", "Public key to validate messages").env("PUBLIC_KEY")) .addOption(new Option("--auth-key ", "Auth key to validate requests").env("AUTH_KEY")) .addOption(new Option("--host ", "Database HTTP hostname").env("HOST").default(DEFAULT_HOST)) .addOption(new Option("--username ", "Database user").env("USERNAME").default(DEFAULT_USERNAME)) .addOption(new Option("--password ", "Password associated with the specified username").env("PASSWORD").default(DEFAULT_PASSWORD)) .addOption(new Option("--database ", "The database to use inside ClickHouse").env("DATABASE").default(DEFAULT_DATABASE)) - .addOption(new Option("--create-database ", "If the specified database does not exist, automatically create it").env("CREATE_DATABASE").default(DEFAULT_CREATE_DATABASE)) .addOption(new Option("--async-insert ", "https://clickhouse.com/docs/en/operations/settings/settings#async-insert").choices(["0", "1"]).env("ASYNC_INSERT").default(DEFAULT_ASYNC_INSERT)) .addOption(new Option("--wait-for-async-insert ", "https://clickhouse.com/docs/en/operations/settings/settings#wait-for-async-insert").choices(["0", "1"]).env("WAIT_FOR_INSERT").default(DEFAULT_WAIT_FOR_ASYNC_INSERT)) .addOption(new Option("--queue-limit ","Insert delay to each response when the pqueue exceeds this value").env("QUEUE_LIMIT").default(DEFAULT_QUEUE_LIMIT)) diff --git a/src/schemas.spec.ts b/src/schemas.spec.ts index 
9a08604..1b9bf7c 100644 --- a/src/schemas.spec.ts +++ b/src/schemas.spec.ts @@ -12,11 +12,9 @@ const config = ConfigSchema.parse({ hostname: "0.0.0.0", publicKey: "a3cb7366ee8ca77225b4d41772e270e4e831d171d1de71d91707c42e7ba82cc9", host: "http://127.0.0.1:8123", - schemaUrl: "./schema.sql", database: "default", username: "default", password: "", - createDb: "false", queueLimit: "10", queueConcurrency: "10", verbose: "true", @@ -30,11 +28,9 @@ describe("ConfigSchema", () => { test("port", () => expect(config.port).toBe(3000)); test("queueLimit", () => expect(config.queueLimit).toBe(10)); test("verbose", () => expect(config.verbose).toBe(true)); - test("schemaUrl", () => expect(config.schemaUrl).toBe("./schema.sql")); test("database", () => expect(config.database).toBe("default")); test("username", () => expect(config.username).toBe("default")); test("publicKey", () => expect(config.publicKey).toBe("a3cb7366ee8ca77225b4d41772e270e4e831d171d1de71d91707c42e7ba82cc9")); test("waitForAsyncInsert", () => expect(config.waitForAsyncInsert).toBe(0)); test("asyncInsert", () => expect(config.asyncInsert).toBe(1)); - test("createDatabase", () => expect(config.createDatabase).toBe(false)); }); diff --git a/src/schemas.ts b/src/schemas.ts index 174aa06..3da789f 100644 --- a/src/schemas.ts +++ b/src/schemas.ts @@ -18,12 +18,10 @@ export const ConfigSchema = z.object({ database: z.string(), username: z.string(), password: z.string(), - createDatabase: boolean, asyncInsert: oneOrZero, waitForAsyncInsert: oneOrZero, queueLimit: positiveNumber, queueConcurrency: positiveNumber, - schemaUrl: z.optional(z.string()), }); export type ConfigSchema = z.infer; From df1ce4fdd0aef810652a10a8acde82abde6396b3 Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Wed, 25 Oct 2023 11:35:12 -0400 Subject: [PATCH 5/7] updated link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 990d699..7498d70 100644 --- a/README.md +++ b/README.md @@ -88,7 
+88,7 @@ Options: ### Environment variables -These can all be set when starting the sink. See [cli structure](#cli-structure). +These can all be set when starting the sink. See [usage](#usage). **.env** From e55982ac76b1e2ff8b54f1ee22305206f2e0e032 Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Wed, 25 Oct 2023 11:39:28 -0400 Subject: [PATCH 6/7] updated table indexes in docs --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 7498d70..db1792f 100644 --- a/README.md +++ b/README.md @@ -167,11 +167,13 @@ erDiagram **Indexes** -| Table | Fields | -| -------------- | ------------------------------------------ | -| USER_DIMENSION | `(chain, module_hash)` `(chain, block_id)` | -| block | `(block_id, block_number, timestamp)` | -| manifest | `module_hash` | +| Table | Fields | +| -------------- | -------------------------------------------- | +| USER_DIMENSION | `(chain, module_hash)` `(chain, block_id)` | +| module_hashes | `module_hash` | +| blocks | `(block_id, block_number, chain, timestamp)` | +| unparsed_json | `(source, chain, module_hash, block_id)` | +| final_blocks | `block_id` | ### Database initialization From d44b215fd5315ffd0562c260eaff803e129b72da Mon Sep 17 00:00:00 2001 From: Julien Rousseau Date: Thu, 26 Oct 2023 10:31:20 -0400 Subject: [PATCH 7/7] updated db init process --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index db1792f..c0e3dd0 100644 --- a/README.md +++ b/README.md @@ -106,12 +106,10 @@ HOST=http://127.0.0.1:8123 DATABASE=default USERNAME=default PASSWORD= -CREATE_DB=false # Sink QUEUE_LIMIT=10 QUEUE_CONCURRENCY=10 -SCHEMA_URL=... # generate SQL schema by providing file (ex: ./schema.sql) or URL path (ex: https://example.com/schema.sql) VERBOSE=true ``` @@ -177,10 +175,12 @@ erDiagram ### Database initialization -Create a database in ClickHouse. 
(Optionally, skip this step and use the `default` database.) +Create a database in ClickHouse and set up the dimension tables. + +Use `PUT /init` on [http://localhost:3000](http://localhost:3000). ```bash -substreams-sink-clickhouse --create-db --name +> curl --location --request PUT 'http://localhost:3000/init' --header 'Authorization: Bearer ``` ### Schema initialization