diff --git a/docs/identity-columns/GenerateIdentityValues.md b/docs/identity-columns/GenerateIdentityValues.md
new file mode 100644
index 000000000..f712a1434
--- /dev/null
+++ b/docs/identity-columns/GenerateIdentityValues.md
@@ -0,0 +1,102 @@
+---
+title: GenerateIdentityValues
+---
+
+# GenerateIdentityValues Non-Deterministic Leaf Expression
+
+`GenerateIdentityValues` is a non-deterministic `LeafExpression` ([Spark SQL]({{ book.spark_sql }}/expressions/Expression/#LeafExpression)).
+
+`GenerateIdentityValues` uses [PartitionIdentityValueGenerator](PartitionIdentityValueGenerator.md) to [generate the next IDENTITY value](PartitionIdentityValueGenerator.md#next).
+
+## Creating Instance
+
+`GenerateIdentityValues` takes the following to be created:
+
+* [PartitionIdentityValueGenerator](PartitionIdentityValueGenerator.md)
+
+`GenerateIdentityValues` is created using the [apply](#apply) utility.
+
+## Create GenerateIdentityValues { #apply }
+
+```scala
+apply(
+  start: Long,
+  step: Long,
+  highWaterMarkOpt: Option[Long]): GenerateIdentityValues
+```
+
+`apply` creates a [GenerateIdentityValues](#creating-instance) with a new [PartitionIdentityValueGenerator](PartitionIdentityValueGenerator.md) (with the given input arguments).
+
+---
+
+`apply` is used when:
+
+* `IdentityColumn` is requested to [createIdentityColumnGenerationExpr](IdentityColumn.md#createIdentityColumnGenerationExpr)
+
+## initializeInternal { #initializeInternal }
+
+??? note "Nondeterministic"
+
+    ```scala
+    initializeInternal(
+      partitionIndex: Int): Unit
+    ```
+
+    `initializeInternal` is part of the `Nondeterministic` ([Spark SQL]({{ book.spark_sql }}/expressions/Nondeterministic/#initializeInternal)) abstraction.
+
+`initializeInternal` requests this [PartitionIdentityValueGenerator](#generator) to [initialize](PartitionIdentityValueGenerator.md#initialize) with the given `partitionIndex`.
+
+## evalInternal { #evalInternal }
+
+??? note "Nondeterministic"
+
+    ```scala
+    evalInternal(
+      input: InternalRow): Long
+    ```
+
+    `evalInternal` is part of the `Nondeterministic` ([Spark SQL]({{ book.spark_sql }}/expressions/Nondeterministic/#evalInternal)) abstraction.
+
+`evalInternal` requests this [PartitionIdentityValueGenerator](#generator) for the [next IDENTITY value](PartitionIdentityValueGenerator.md#next).
+
+## Nullable { #nullable }
+
+??? note "Expression"
+
+    ```scala
+    nullable: Boolean
+    ```
+
+    `nullable` is part of the `Expression` ([Spark SQL]({{ book.spark_sql }}/expressions/Expression/#nullable)) abstraction.
+
+`nullable` is always `false`.
+
+## Generating Java Source Code for Code-Generated Expression Evaluation { #doGenCode }
+
+??? note "Expression"
+
+    ```scala
+    doGenCode(
+      ctx: CodegenContext,
+      ev: ExprCode): ExprCode
+    ```
+
+    `doGenCode` is part of the `Expression` ([Spark SQL]({{ book.spark_sql }}/expressions/Expression/#doGenCode)) abstraction.
+
+`doGenCode` generates Java source code that [initializes](PartitionIdentityValueGenerator.md#initialize) this [PartitionIdentityValueGenerator](#generator) (for the current `partitionIndex`) and then requests the [next IDENTITY value](PartitionIdentityValueGenerator.md#next).
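+
+The following demo creates a `GenerateIdentityValues` expression (using the [apply](#apply) utility) and prints out the Java source code generated to evaluate it.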
+
+```scala
+import org.apache.spark.sql.delta.GenerateIdentityValues
+
+val expr = GenerateIdentityValues(start = 0, step = 1, highWaterMarkOpt = None)
+
+import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
+val ctx = new CodegenContext
+
+val code = expr.genCode(ctx).code
+println(code)
+```
+
+```text
+final long value_0 = ((org.apache.spark.sql.delta.PartitionIdentityValueGenerator) references[0] /* generator */).next();
+```
diff --git a/docs/identity-columns/IdentityColumn.md b/docs/identity-columns/IdentityColumn.md
index c385270b4..f6919cb31 100644
--- a/docs/identity-columns/IdentityColumn.md
+++ b/docs/identity-columns/IdentityColumn.md
@@ -35,6 +35,38 @@ copySchemaWithMergedHighWaterMarks(
 * `CloneTableBase` is requested to [prepareSourceMetadata](../commands/clone/CloneTableBase.md#prepareSourceMetadata) (for [CreateDeltaTableCommand](../commands/create-table/CreateDeltaTableCommand.md))
 * `RestoreTableCommand` is [executed](../commands/restore/RestoreTableCommand.md#run)
 
+## Create Expression to Generate IDENTITY Values { #createIdentityColumnGenerationExpr }
+
+```scala
+createIdentityColumnGenerationExpr(
+  field: StructField): Expression
+```
+
+`createIdentityColumnGenerationExpr` creates a [GenerateIdentityValues](GenerateIdentityValues.md#apply) expression for the [IdentityInfo](#getIdentityInfo) of the given `StructField`.
+
+---
+
+`createIdentityColumnGenerationExpr` is used when:
+
+* `IdentityColumn` is requested to [createIdentityColumnGenerationExprAsColumn](#createIdentityColumnGenerationExprAsColumn)
+* `PreprocessTableMerge` is requested to [resolveImplicitColumns](../PreprocessTableMerge.md#resolveImplicitColumns)
+
+## Create Column to Generate IDENTITY Values { #createIdentityColumnGenerationExprAsColumn }
+
+```scala
+createIdentityColumnGenerationExprAsColumn(
+  field: StructField): Column
+```
+
+`createIdentityColumnGenerationExprAsColumn` creates a `Column` ([Spark SQL]({{ book.spark_sql }}/Column)) with a [GenerateIdentityValues](#createIdentityColumnGenerationExpr) expression for the given `StructField`.
+
+---
+
+`createIdentityColumnGenerationExprAsColumn` is used when:
+
+* `ColumnWithDefaultExprUtils` is requested to [addDefaultExprsOrReturnConstraints](../ColumnWithDefaultExprUtils.md#addDefaultExprsOrReturnConstraints)
+* `WriteIntoDelta` is requested to [writeAndReturnCommitData](../commands/WriteIntoDelta.md#writeAndReturnCommitData)
+
 ## getIdentityColumns { #getIdentityColumns }
 
 ```scala
diff --git a/docs/identity-columns/PartitionIdentityValueGenerator.md b/docs/identity-columns/PartitionIdentityValueGenerator.md
new file mode 100644
index 000000000..98d2bd699
--- /dev/null
+++ b/docs/identity-columns/PartitionIdentityValueGenerator.md
@@ -0,0 +1,3 @@
+# PartitionIdentityValueGenerator
+
+`PartitionIdentityValueGenerator` is...FIXME
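+
+The following is a minimal sketch (assuming a Spark and Delta Lake classpath) of where `PartitionIdentityValueGenerator` fits in: [GenerateIdentityValues](GenerateIdentityValues.md) creates one (with the given `start`, `step` and `highWaterMarkOpt` arguments) and drives it through `initialize` and `next` while the expression itself is initialized and evaluated. The exact values printed depend on the generator's internals (not covered on this page).
+
+```scala
+import org.apache.spark.sql.delta.GenerateIdentityValues
+import org.apache.spark.sql.catalyst.InternalRow
+
+// Creating the parent expression creates a new PartitionIdentityValueGenerator
+val expr = GenerateIdentityValues(start = 1, step = 2, highWaterMarkOpt = None)
+
+// Nondeterministic.initialize ends up in PartitionIdentityValueGenerator.initialize
+// (for the given partition index)
+expr.initialize(0)
+
+// Every evaluation ends up in PartitionIdentityValueGenerator.next
+(1 to 3).foreach(_ => println(expr.eval(InternalRow.empty)))
+```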