diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 19f5920..921bab8 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-21T13:47:01","documenter_version":"1.2.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.2","generation_timestamp":"2024-04-20T02:31:13","documenter_version":"1.4.0"}} \ No newline at end of file diff --git a/dev/api/index.html b/dev/api/index.html index 760a081..9d75b33 100644 --- a/dev/api/index.html +++ b/dev/api/index.html @@ -1,19 +1,19 @@ -API Reference · NeuralAttentionlib.jl

API Reference

Functional

NeuralAttentionlib.alibi_position_embeddingFunction
alibi_position_embedding(mask::Union{AbstractMask, Nothing}, score, args...)

Add the non-trainable ALiBi position embedding to the attention score. The ALiBi embedding varies per head, which assumes the attention is a multi-head variant. The first batch dimension of the attention score is treated as the head dimension. mask can be either an attention mask or nothing; it is usually needed when there are gaps or prefix paddings in the samples.

source
NeuralAttentionlib.biased_scoreFunction
biased_score(bias, score, args...)

Add a precomputed bias to the attention score. bias should have shape (key length, query length, ...), and size(bias, 1) == size(s, 1) && size(bias, 2) == size(s, 2) && ndims(bias) <= ndims(s), where s = score(args...), must hold.

source
NeuralAttentionlib.layer_normFunction
layer_norm([epsilon = 1e-5,] alpha, beta, x)

Perform layer normalization on x. alpha and beta can be a Vector, a Number, or Nothing.

$layer_norm(α, β, x) = α\frac{(x - μ)}{σ} + β$

If both alpha and beta are Nothing, this is just standardization applied along the first dimension.

source
NeuralAttentionlib.masked_scoreFunction
masked_score(mask) = masked_score $ mask
+API Reference · NeuralAttentionlib.jl

API Reference

Functional

NeuralAttentionlib.alibi_position_embeddingFunction
alibi_position_embedding(mask::Union{AbstractMask, Nothing}, score, args...)

Add the non-trainable ALiBi position embedding to the attention score. The ALiBi embedding varies per head, which assumes the attention is a multi-head variant. The first batch dimension of the attention score is treated as the head dimension. mask can be either an attention mask or nothing; it is usually needed when there are gaps or prefix paddings in the samples.

source
NeuralAttentionlib.biased_scoreFunction
biased_score(bias, score, args...)

Add a precomputed bias to the attention score. bias should have shape (key length, query length, ...), and size(bias, 1) == size(s, 1) && size(bias, 2) == size(s, 2) && ndims(bias) <= ndims(s), where s = score(args...), must hold.

source
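The shape rule above can be illustrated with plain arrays and broadcasting (a minimal sketch of the constraint, not the package API; the array sizes are made up for illustration):

```julia
# Sketch of the biased_score shape rule with plain arrays.
score = randn(6, 5, 4)   # (key length, query length, batch)
bias  = randn(6, 5)      # (key length, query length); ndims(bias) <= ndims(score)
@assert size(bias, 1) == size(score, 1) && size(bias, 2) == size(score, 2)
biased = score .+ bias   # the bias broadcasts across the batch dimension
```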
NeuralAttentionlib.layer_normFunction
layer_norm([epsilon = 1e-5,] alpha, beta, x)

Perform layer normalization on x. alpha and beta can be a Vector, a Number, or Nothing.

$layer_norm(α, β, x) = α\frac{(x - μ)}{σ} + β$

If both alpha and beta are Nothing, this is just standardization applied along the first dimension.

source
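The formula can be checked against a hand-rolled version over the first dimension (a minimal sketch; the actual implementation also adds the epsilon term for numerical stability, which is omitted here):

```julia
using Statistics

# Hand-rolled layer normalization over the first dimension,
# matching α(x - μ)/σ + β up to the epsilon term.
x = randn(4, 3)
α, β = 2.0, 0.5
μ = mean(x; dims = 1)
σ = std(x; dims = 1, corrected = false)
y = α .* (x .- μ) ./ σ .+ β
```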
NeuralAttentionlib.move_head_dim_in_permFunction
move_head_dim_in_perm(x::AbstractArray{T, N}, nobatch=false)
 move_head_dim_in_perm(N::Int, nobatch=false)

Dimension order for permutedims to move the head dimension (created by split_head) from the batch dimensions to the feature dimensions (for merge_head). Returns an integer tuple of length N. nobatch specifies whether x is a batch of data.

Example

julia> Functional.move_head_dim_in_perm(5, false)
 (1, 4, 2, 3, 5)
 
 julia> Functional.move_head_dim_in_perm(5, true)
 (1, 5, 2, 3, 4)
-

See also: merge_head, move_head_dim_in

source
NeuralAttentionlib.move_head_dim_out_permFunction
move_head_dim_out_perm(x::AbstractArray{T, N}, nobatch=false)
 move_head_dim_out_perm(N::Int, nobatch=false)

Dimension order for permutedims to move the head dimension (created by split_head) to the batch dimensions. Returns an integer tuple of length N. nobatch specifies whether x is a batch of data.

Example

julia> Functional.move_head_dim_out_perm(5, false)
 (1, 3, 4, 2, 5)
 
 julia> Functional.move_head_dim_out_perm(5, true)
 (1, 3, 4, 5, 2)
-

See also: split_head, move_head_dim_out

source
NeuralAttentionlib.naive_qkv_attentionFunction
naive_qkv_attention(q, k, v, mask=nothing)

The scaled dot-product attention of a regular transformer layer.

$Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$

It's equivalent to generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ masked_score(GenericMaskOp(), mask) $ scaled_dot_product_score, q, k, v).

Example

julia> fdim, ldim, bdim = 32, 10, 4;
+

See also: split_head, move_head_dim_out

source
NeuralAttentionlib.naive_qkv_attentionFunction
naive_qkv_attention(q, k, v, mask=nothing)

The scaled dot-product attention of a regular transformer layer.

$Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$

It's equivalent to generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ masked_score(GenericMaskOp(), mask) $ scaled_dot_product_score, q, k, v).

Example

julia> fdim, ldim, bdim = 32, 10, 4;
 
 julia> x = randn(fdim, ldim, bdim);
 
@@ -24,30 +24,30 @@
 
 julia> y ≈ z
 true
-

See also: generic_qkv_attention

source
NeuralAttentionlib.normalized_scoreFunction
normalized_score(norm) = normalized_score $ norm
-normalized_score(norm, score, args...)

Normalized attention score API. norm is the normalization function (like softmax) and score is the function that computes the attention score from args....

See also: naive_qkv_attention

source
NeuralAttentionlib.rms_layer_normFunction
rms_layer_norm([epsilon = 1e-5,] alpha, x)

Perform root-mean-square layer normalization on x. alpha can be a Vector, a Number, or Nothing.

$rms_layer_norm(α, x) = α\frac{x}{\sqrt{\sum_{i=1}^{N} x^2 / N}}$

If alpha is Nothing, this is just normalization by the root mean square applied along the first dimension.

source
NeuralAttentionlib.scalar_relative_position_embeddingFunction
scalar_relative_position_embedding(relative_position_id_func, embedding_table, score, args...)

A relative position embedding that produces a trainable scalar bias for each value in the attention score. relative_position_id_func is a function that takes the attention score and returns a relative_position_id matrix with the same size as the attention score without the batch dimensions (normally (key length, query length)). This relative_position_id is used to index (or gather) the embedding_table. embedding_table is a multi-dimensional array whose first dimension is the number of possible "id"s, with the remaining dimensions giving a different value to each head. By default, the last dimension of the attention score is treated as the batch dimension, and the dimensions between the last dimension and the "length" dimensions are treated as the head dimensions.

source
NeuralAttentionlib.scaled_dot_product_scoreFunction
 scaled_dot_product_score(q, k, s = sqrt(inv(size(k, 1))))

The scaled dot-product attention score function of a regular transformer layer.

$Score(Q, K) = \frac{QK^T}{\sqrt{d_k}}$

scaled_dot_product_score(f, q, k)

Apply a transform function f on q/k before dot-product.

See also: naive_qkv_attention

source
NeuralAttentionlib.split_headFunction
split_head(head::Int, x)

Split the first dimension into head pieces of smaller vectors. Equivalent to reshape(x, :, head, tail(size(x))...).

source
NeuralAttentionlib.with_rotary_position_embeddingFunction
with_rotary_position_embedding([size,] x)

Apply rotary position embedding to x. Can take a size argument, in which case the rotary position embedding is only applied to x[1:size, :, ...]. Should be used with scaled_dot_product_score/dot_product_score.

source
NeuralAttentionlib.normalized_scoreFunction
normalized_score(norm) = normalized_score $ norm
+normalized_score(norm, score, args...)

Normalized attention score API. norm is the normalization function (like softmax) and score is the function that computes the attention score from args....

See also: naive_qkv_attention

source
NeuralAttentionlib.rms_layer_normFunction
rms_layer_norm([epsilon = 1e-5,] alpha, x)

Perform root-mean-square layer normalization on x. alpha can be a Vector, a Number, or Nothing.

$rms_layer_norm(α, x) = α\frac{x}{\sqrt{\sum_{i=1}^{N} x^2 / N}}$

If alpha is Nothing, this is just normalization by the root mean square applied along the first dimension.

source
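The formula can likewise be checked by hand (a sketch; as with layer_norm, the implementation's epsilon term is omitted):

```julia
# Hand-rolled RMS layer normalization over the first dimension,
# matching αx / sqrt(Σᵢ xᵢ² / N) up to the epsilon term.
x = randn(4, 3)
α = 1.0
rms = sqrt.(sum(abs2, x; dims = 1) ./ size(x, 1))
y = α .* x ./ rms
```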
NeuralAttentionlib.scalar_relative_position_embeddingFunction
scalar_relative_position_embedding(relative_position_id_func, embedding_table, score, args...)

A relative position embedding that produces a trainable scalar bias for each value in the attention score. relative_position_id_func is a function that takes the attention score and returns a relative_position_id matrix with the same size as the attention score without the batch dimensions (normally (key length, query length)). This relative_position_id is used to index (or gather) the embedding_table. embedding_table is a multi-dimensional array whose first dimension is the number of possible "id"s, with the remaining dimensions giving a different value to each head. By default, the last dimension of the attention score is treated as the batch dimension, and the dimensions between the last dimension and the "length" dimensions are treated as the head dimensions.

source
NeuralAttentionlib.scaled_dot_product_scoreFunction
 scaled_dot_product_score(q, k, s = sqrt(inv(size(k, 1))))

The scaled dot-product attention score function of a regular transformer layer.

$Score(Q, K) = \frac{QK^T}{\sqrt{d_k}}$

scaled_dot_product_score(f, q, k)

Apply a transform function f on q/k before dot-product.

See also: naive_qkv_attention

source
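The score function can be sketched for a single sample with plain matrices in the feature-first layout used throughout this package (an illustration of the formula, not the internal implementation):

```julia
# Score(Q, K) = QKᵀ / √dₖ in feature-first layout: score has
# shape (key length, query length).
q = randn(8, 5)   # (feature, query length)
k = randn(8, 6)   # (feature, key length)
s = sqrt(inv(size(k, 1)))
score = (k' * q) .* s
```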
NeuralAttentionlib.split_headFunction
split_head(head::Int, x)

Split the first dimension into head pieces of smaller vectors. Equivalent to reshape(x, :, head, tail(size(x))...).

source
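The reshape in the docstring can be checked directly (a sketch with made-up sizes; tail is Base.tail):

```julia
# split_head(4, x) on a (feature, length, batch) array: the feature
# dimension is split into (feature ÷ head, head).
head = 4
x = randn(32, 10, 2)
y = reshape(x, :, head, Base.tail(size(x))...)
@assert size(y) == (8, 4, 10, 2)
```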
NeuralAttentionlib.with_rotary_position_embeddingFunction
with_rotary_position_embedding([size,] x)

Apply rotary position embedding to x. Can take a size argument, in which case the rotary position embedding is only applied to x[1:size, :, ...]. Should be used with scaled_dot_product_score/dot_product_score.

source
NeuralAttentionlib.PrefixedFunctionType
PrefixedFunction(f, args::NTuple{N}) <: Function

A type representing a partially-applied version of the function f, with the first N arguments fixed to the values args. In other words, PrefixedFunction(f, args) behaves similarly to (xs...)->f(args..., xs...).

See also NeuralAttentionlib.:$.

source

Mask

NeuralAttentionlib.apply_maskMethod
apply_mask(op::GenericMaskOp, mask::AbstractMask, score)

Equivalent to op.apply(score, op.scale .* (op.flip ? .! mask : mask)).

Example

julia> x = randn(10, 10);
+end

Structure holding the parameters of multihead_qkv_attention.

(op::MultiheadQKVAttenOp)(q, k, v, mask = nothing)

Perform multihead attention.

source
NeuralAttentionlib.PrefixedFunctionType
PrefixedFunction(f, args::NTuple{N}) <: Function

A type representing a partially-applied version of the function f, with the first N arguments fixed to the values args. In other words, PrefixedFunction(f, args) behaves similarly to (xs...)->f(args..., xs...).

See also NeuralAttentionlib.:$.

source
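The described behaviour can be reproduced with a minimal stand-in type (a sketch for illustration; the name MyPrefixedFunction is hypothetical and this is not the package's actual definition):

```julia
# Minimal partial-application type: fixes the first arguments of f.
struct MyPrefixedFunction{F, A <: Tuple} <: Function
    f::F
    args::A
end
(p::MyPrefixedFunction)(xs...) = p.f(p.args..., xs...)

add12 = MyPrefixedFunction(+, (1, 2))
add12(3, 4)   # behaves like (xs...) -> +(1, 2, xs...)
```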

Mask

NeuralAttentionlib.apply_maskMethod
apply_mask(op::GenericMaskOp, mask::AbstractMask, score)

Equivalent to op.apply(score, op.scale .* (op.flip ? .! mask : mask)).

Example

julia> x = randn(10, 10);
 
 julia> m = CausalMask()
 CausalMask()
 
 julia> apply_mask(GenericMaskOp(.+, true, -1e9), m, x) ==  @. x + (!m * -1e9)
 true
-
source
NeuralAttentionlib.BatchedMaskType
BatchedMask(mask::AbstractMask) <: AbstractWrapperMask

Attention mask wrapper over an array mask, applying the same mask to every sample within the same batch.

Example

julia> m = SymLengthMask([2,3])
+
source
NeuralAttentionlib.BatchedMaskType
BatchedMask(mask::AbstractMask) <: AbstractWrapperMask

Attention mask wrapper over an array mask, applying the same mask to every sample within the same batch.

Example

julia> m = SymLengthMask([2,3])
 SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
 
 julia> trues(3,3, 2) .* m
@@ -88,7 +88,7 @@
  1  1  1
  1  1  1
  1  1  1
-
source
NeuralAttentionlib.BiLengthMaskType
BiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractArrayMask

Attention mask specified by two arrays of integers that indicate the length dimension sizes.

Example

julia> bm = BiLengthMask([2,3], [3, 5])
+
source
NeuralAttentionlib.BiLengthMaskType
BiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractArrayMask

Attention mask specified by two arrays of integers that indicate the length dimension sizes.

Example

julia> bm = BiLengthMask([2,3], [3, 5])
 BiLengthMask{1, Vector{Int32}}(Int32[2, 3], Int32[3, 5])
 
 julia> trues(5,5, 2) .* bm
@@ -106,7 +106,7 @@
  1  1  1  0  0
  1  1  1  0  0
  1  1  1  0  0
-

See also: SymLengthMask, BatchedMask, RepeatMask

source
NeuralAttentionlib.CausalMaskType
CausalMask() <: AbstractDatalessMask

Attention mask that blocks future values.

Similar to applying LinearAlgebra.triu! on the score matrix.

source
NeuralAttentionlib.CausalMaskType
CausalMask() <: AbstractDatalessMask

Attention mask that blocks future values.

Similar to applying LinearAlgebra.triu! on the score matrix.

source
NeuralAttentionlib.GenericSequenceMaskType
GenericSequenceMask(mask::AbstractArray{Bool}) <: AbstractSequenceMask

Create a sequence mask from an array of Bool.

Example

julia> m = GenericSequenceMask(rand(Bool, 10, 2))
 GenericSequenceMask{3, Array{Bool, 3}}([0 1 … 0 0;;; 1 0 … 1 0])
 
 julia> trues(7, 10, 2) .* m
@@ -136,7 +136,7 @@
 
 [:, :, 2] =
  1  0  1  1  0  1  1  1  1  0
-
source
NeuralAttentionlib.LengthMaskType
LengthMask(len::AbstractArray{Int, N}) <: AbstractSequenceMask

A sequence mask specified by an array of integers that indicate the length dimension size. Can be converted to an attention mask (SymLengthMask, BiLengthMask) with AttenMask.

Example

julia> ones(7, 7, 2) .* LengthMask([3, 5])
 7×7×2 Array{Float64, 3}:
 [:, :, 1] =
  1.0  1.0  1.0  0.0  0.0  0.0  0.0
@@ -155,7 +155,7 @@
  1.0  1.0  1.0  1.0  1.0  0.0  0.0
  1.0  1.0  1.0  1.0  1.0  0.0  0.0
  1.0  1.0  1.0  1.0  1.0  0.0  0.0
-
source
NeuralAttentionlib.LocalMaskType
LocalMask(width::Int) <: AbstractDatalessMask

Attention mask that only allows local (diagonal-like) values to pass.

width should be ≥ 0, and A .* LocalMask(1) is similar to Diagonal(A).

source
NeuralAttentionlib.RandomMaskType
RandomMask(p::Float64) <: AbstractDatalessMask

Attention mask that blocks values randomly.

p specifies the fraction of values to block, e.g. A .* RandomMask(0) is equivalent to identity(A) and A .* RandomMask(1) is equivalent to zero(A).

source
NeuralAttentionlib.RepeatMaskType
RepeatMask(mask::AbstractMask, num::Int) <: AbstractWrapperMask

Attention mask wrapper over an array mask, performing an inner repeat on the last dimension.

Example

julia> m = SymLengthMask([2,3])
+
source
NeuralAttentionlib.LocalMaskType
LocalMask(width::Int) <: AbstractDatalessMask

Attention mask that only allows local (diagonal-like) values to pass.

width should be ≥ 0, and A .* LocalMask(1) is similar to Diagonal(A).

source
NeuralAttentionlib.RandomMaskType
RandomMask(p::Float64) <: AbstractDatalessMask

Attention mask that blocks values randomly.

p specifies the fraction of values to block, e.g. A .* RandomMask(0) is equivalent to identity(A) and A .* RandomMask(1) is equivalent to zero(A).

source
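The two limiting cases of p can be checked with a plain Bernoulli keep-mask (a sketch of the semantics with hypothetical names, not the package implementation):

```julia
# drop(p): true keeps a value, false blocks it; rand() ∈ [0, 1).
A = ones(5, 5)
drop(p) = rand(size(A)...) .>= p
@assert A .* drop(0.0) == A        # p = 0 blocks nothing
@assert A .* drop(1.0) == zero(A)  # p = 1 blocks everything
```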
NeuralAttentionlib.RepeatMaskType
RepeatMask(mask::AbstractMask, num::Int) <: AbstractWrapperMask

Attention mask wrapper over an array mask, performing an inner repeat on the last dimension.

Example

julia> m = SymLengthMask([2,3])
 SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
 
 julia> trues(3,3, 2) .* m
@@ -196,7 +196,7 @@
  1  1  1
  1  1  1
  1  1  1
-
source
NeuralAttentionlib.RevBiLengthMaskType
RevBiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractArrayMask

BiLengthMask, but counting from the end of the array; used for left padding.

Example

julia> bm = RevBiLengthMask([2,3], [3, 5])
+
source
NeuralAttentionlib.RevLengthMaskType
RevLengthMask(len::AbstractArray{Int, N}) <: AbstractSequenceMask

LengthMask, but counting from the end of the array; used for left padding. Can be converted to an attention mask (RevSymLengthMask, RevBiLengthMask) with AttenMask.

Example

julia> ones(7, 7, 2) .* RevLengthMask([3, 5])
 7×7×2 Array{Float64, 3}:
 [:, :, 1] =
  0.0  0.0  0.0  0.0  1.0  1.0  1.0
@@ -233,7 +233,7 @@
  0.0  0.0  1.0  1.0  1.0  1.0  1.0
  0.0  0.0  1.0  1.0  1.0  1.0  1.0
  0.0  0.0  1.0  1.0  1.0  1.0  1.0
-
source
NeuralAttentionlib.SymLengthMaskType
SymLengthMask(len::AbstractArray{Int, N}) <: AbstractArrayMask

Attention mask specified by an array of integers that indicate the length dimension size, assuming the query length and key length are the same.

Example

julia> m = SymLengthMask([2,3])
 SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
 
 julia> trues(3,3, 2) .* m
@@ -261,7 +261,7 @@
  1  1  1
  1  1  1
  1  1  1
-

See also: LengthMask, BiLengthMask, BatchedMask, RepeatMask

source
Base.:!Method
!m::AbstractMask

Boolean negation of an attention mask.

source
Base.:&Method
m1::AbstractMask & m2::AbstractMask

Logical AND of two attention masks.

source
Base.:|Method
m1::AbstractMask | m2::AbstractMask

Logical OR of two attention masks.

source
NeuralAttentionlib.AttenMaskFunction
AttenMask(m::AbstractMask)

Convert a mask into the corresponding attention mask.

AttenMask(q_mask::AbstractSequenceMask, k_mask::AbstractSequenceMask)

Create an attention mask from two sequence masks, specifying the sequence masks for the "query" and the "key".

source
Base.:!Method
!m::AbstractMask

Boolean negation of an attention mask.

source
Base.:&Method
m1::AbstractMask & m2::AbstractMask

Logical AND of two attention masks.

source
Base.:|Method
m1::AbstractMask | m2::AbstractMask

Logical OR of two attention masks.

source
NeuralAttentionlib.AttenMaskFunction
AttenMask(m::AbstractMask)

Convert a mask into the corresponding attention mask.

AttenMask(q_mask::AbstractSequenceMask, k_mask::AbstractSequenceMask)

Create an attention mask from two sequence masks, specifying the sequence masks for the "query" and the "key".

source
NeuralAttentionlib.getmaskFunction
getmask(m::AbstractMask, score, scale = 1)

Convert m into a mask array (an AbstractArray) for score, scaled by scale.

Example

julia> getmask(CausalMask(), randn(7,7), 2)
 7×7 Matrix{Float64}:
  2.0  2.0  2.0  2.0  2.0  2.0  2.0
  0.0  2.0  2.0  2.0  2.0  2.0  2.0
@@ -270,7 +270,7 @@
  0.0  0.0  0.0  0.0  2.0  2.0  2.0
  0.0  0.0  0.0  0.0  0.0  2.0  2.0
  0.0  0.0  0.0  0.0  0.0  0.0  2.0
-
source

Matmul

NeuralAttentionlib.collapsed_sizeFunction
collapsed_size(x, ni, nj [, n])::Dim{3}

Collapse the dimensionality of x into 3 according to ni and nj, where ni and nj specify the number of dimensions taken by the second and third collapsed dimensions.

(X1, X2, ..., Xk, Xk+1, Xk+2, ..., Xk+ni, Xk+ni+1, ..., Xn)
+
source

Matmul

NeuralAttentionlib.collapsed_sizeFunction
collapsed_size(x, ni, nj [, n])::Dim{3}

Collapse the dimensionality of x into 3 according to ni and nj, where ni and nj specify the number of dimensions taken by the second and third collapsed dimensions.

(X1, X2, ..., Xk, Xk+1, Xk+2, ..., Xk+ni, Xk+ni+1, ..., Xn)
  |______dim1___|  |_________ni_________|  |______nj______|

Example

julia> x = randn(7,6,5,4,3,2);
 
 julia> collapsed_size(x, 2, 2, 1)
@@ -284,7 +284,7 @@
 
 julia> collapsed_size(x, 2, 2)
 (42, 20, 6)
-

See also: noncollapsed_size

source
NeuralAttentionlib.matmulFunction
matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)

Equivalent to s .* (a * b) if a and b are Vectors or Matrices. For arrays of higher dimension, it converts a and b to CollapsedDimsArrays, performs batched matrix multiplication, and returns the result as a CollapsedDimsArray. This is useful for preserving dimensionality. If the batch dimensions of a and b have different shapes, the shape of b is picked for the batch dimension. Works with NNlib.batch_transpose and NNlib.batch_adjoint.

Example

# b-dim shape: (6,)
+

See also: noncollapsed_size

source
NeuralAttentionlib.matmulFunction
matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)

Equivalent to s .* (a * b) if a and b are Vectors or Matrices. For arrays of higher dimension, it converts a and b to CollapsedDimsArrays, performs batched matrix multiplication, and returns the result as a CollapsedDimsArray. This is useful for preserving dimensionality. If the batch dimensions of a and b have different shapes, the shape of b is picked for the batch dimension. Works with NNlib.batch_transpose and NNlib.batch_adjoint.

Example

# b-dim shape: (6,)
 julia> a = CollapsedDimsArray(randn(3,4,2,3,6), 2, 1); size(a)
 (12, 6, 6)
 
@@ -302,7 +302,7 @@
 # equivanlent to `batched_mul` but preserve shape
 julia> NNlib.batched_mul(collapseddims(a), collapseddims(b)) == collapseddims(matmul(a, b))
 true
-

See also: CollapsedDimsArray, unwrap_collapse, collapseddims

source
NeuralAttentionlib.noncollapsed_sizeFunction
noncollapsed_size(x, ni, nj [, n])

Collapse the dimensionality of x into 3 according to ni and nj.

(X1, X2, ..., Xk, Xk+1, Xk+2, ..., Xk+ni, Xk+ni+1, ..., Xn)
  |______dim1___|  |_________ni_________|  |______nj______|

But returns the sizes before the collapse, e.g. noncollapsed_size(x, ni, nj, 2) will be (Xk+1, Xk+2, ..., Xk+ni).

Example

julia> x = randn(7,6,5,4,3,2);
 
 julia> noncollapsed_size(x, 2, 2, 1)
@@ -316,4 +316,4 @@
 
 julia> noncollapsed_size(x, 2, 2)
 ((7, 6), (5, 4), (3, 2))
-

See also: collapsed_size

source
NeuralAttentionlib.scaled_matmulFunction
scaled_matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)

Basically equivalent to unwrap_collapse(matmul(a, b, s)), but not differentiable w.r.t. s.

source
+

See also: collapsed_size

source
NeuralAttentionlib.scaled_matmulFunction
scaled_matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)

Basically equivalent to unwrap_collapse(matmul(a, b, s)), but not differentiable w.r.t. s.

source
diff --git a/dev/assets/documenter.js b/dev/assets/documenter.js index f531160..c6562b5 100644 --- a/dev/assets/documenter.js +++ b/dev/assets/documenter.js @@ -4,7 +4,6 @@ requirejs.config({ 'highlight-julia': 'https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/languages/julia.min', 'headroom': 'https://cdnjs.cloudflare.com/ajax/libs/headroom/0.12.0/headroom.min', 'jqueryui': 'https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.13.2/jquery-ui.min', - 'minisearch': 'https://cdn.jsdelivr.net/npm/minisearch@6.1.0/dist/umd/index.min', 'katex-auto-render': 'https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.8/contrib/auto-render.min', 'jquery': 'https://cdnjs.cloudflare.com/ajax/libs/jquery/3.7.0/jquery.min', 'headroom-jquery': 'https://cdnjs.cloudflare.com/ajax/libs/headroom/0.12.0/jQuery.headroom.min', @@ -103,9 +102,10 @@ $(document).on("click", ".docstring header", function () { }); }); -$(document).on("click", ".docs-article-toggle-button", function () { +$(document).on("click", ".docs-article-toggle-button", function (event) { let articleToggleTitle = "Expand docstring"; let navArticleToggleTitle = "Expand all docstrings"; + let animationSpeed = event.noToggleAnimation ? 
0 : 400; debounce(() => { if (isExpanded) { @@ -116,7 +116,7 @@ $(document).on("click", ".docs-article-toggle-button", function () { isExpanded = false; - $(".docstring section").slideUp(); + $(".docstring section").slideUp(animationSpeed); } else { $(this).removeClass("fa-chevron-down").addClass("fa-chevron-up"); $(".docstring-article-toggle-button") @@ -127,7 +127,7 @@ $(document).on("click", ".docs-article-toggle-button", function () { articleToggleTitle = "Collapse docstring"; navArticleToggleTitle = "Collapse all docstrings"; - $(".docstring section").slideDown(); + $(".docstring section").slideDown(animationSpeed); } $(this).prop("title", navArticleToggleTitle); @@ -224,224 +224,465 @@ $(document).ready(function () { }) //////////////////////////////////////////////////////////////////////////////// -require(['jquery', 'minisearch'], function($, minisearch) { - -// In general, most search related things will have "search" as a prefix. -// To get an in-depth about the thought process you can refer: https://hetarth02.hashnode.dev/series/gsoc +require(['jquery'], function($) { -let results = []; -let timer = undefined; +$(document).ready(function () { + let meta = $("div[data-docstringscollapsed]").data(); -let data = documenterSearchIndex["docs"].map((x, key) => { - x["id"] = key; // minisearch requires a unique for each object - return x; + if (meta?.docstringscollapsed) { + $("#documenter-article-toggle-button").trigger({ + type: "click", + noToggleAnimation: true, + }); + } }); -// list below is the lunr 2.1.3 list minus the intersect with names(Base) -// (all, any, get, in, is, only, which) and (do, else, for, let, where, while, with) -// ideally we'd just filter the original list but it's not available as a variable -const stopWords = new Set([ - "a", - "able", - "about", - "across", - "after", - "almost", - "also", - "am", - "among", - "an", - "and", - "are", - "as", - "at", - "be", - "because", - "been", - "but", - "by", - "can", - "cannot", - "could", - 
"dear", - "did", - "does", - "either", - "ever", - "every", - "from", - "got", - "had", - "has", - "have", - "he", - "her", - "hers", - "him", - "his", - "how", - "however", - "i", - "if", - "into", - "it", - "its", - "just", - "least", - "like", - "likely", - "may", - "me", - "might", - "most", - "must", - "my", - "neither", - "no", - "nor", - "not", - "of", - "off", - "often", - "on", - "or", - "other", - "our", - "own", - "rather", - "said", - "say", - "says", - "she", - "should", - "since", - "so", - "some", - "than", - "that", - "the", - "their", - "them", - "then", - "there", - "these", - "they", - "this", - "tis", - "to", - "too", - "twas", - "us", - "wants", - "was", - "we", - "were", - "what", - "when", - "who", - "whom", - "why", - "will", - "would", - "yet", - "you", - "your", -]); - -let index = new minisearch({ - fields: ["title", "text"], // fields to index for full-text search - storeFields: ["location", "title", "text", "category", "page"], // fields to return with search results - processTerm: (term) => { - let word = stopWords.has(term) ? null : term; - if (word) { - // custom trimmer that doesn't strip @ and !, which are used in julia macro and function names - word = word - .replace(/^[^a-zA-Z0-9@!]+/, "") - .replace(/[^a-zA-Z0-9@!]+$/, ""); - } +}) +//////////////////////////////////////////////////////////////////////////////// +require(['jquery'], function($) { - return word ?? null; - }, - // add . as a separator, because otherwise "title": "Documenter.Anchors.add!", would not find anything if searching for "add!", only for the entire qualification - tokenize: (string) => string.split(/[\s\-\.]+/), - // options which will be applied during the search - searchOptions: { - boost: { title: 100 }, - fuzzy: 2, +/* +To get an in-depth about the thought process you can refer: https://hetarth02.hashnode.dev/series/gsoc + +PSEUDOCODE: + +Searching happens automatically as the user types or adjusts the selected filters. 
+To preserve responsiveness, as much as possible of the slow parts of the search are done +in a web worker. Searching and result generation are done in the worker, and filtering and +DOM updates are done in the main thread. The filters are in the main thread as they should +be very quick to apply. This lets filters be changed without re-searching with minisearch +(which is possible even if filtering is on the worker thread) and also lets filters be +changed _while_ the worker is searching and without message passing (neither of which are +possible if filtering is on the worker thread) + +SEARCH WORKER: + +Import minisearch + +Build index + +On message from main thread + run search + find the first 200 unique results from each category, and compute their divs for display + note that this is necessary and sufficient information for the main thread to find the + first 200 unique results from any given filter set + post results to main thread + +MAIN: + +Launch worker + +Declare nonconstant globals (worker_is_running, last_search_text, unfiltered_results) + +On text update + if worker is not running, launch_search() + +launch_search + set worker_is_running to true, set last_search_text to the search text + post the search query to worker + +on message from worker + if last_search_text is not the same as the text in the search field, + the latest search result is not reflective of the latest search query, so update again + launch_search() + otherwise + set worker_is_running to false + + regardless, display the new search results to the user + save the unfiltered_results as a global + update_search() + +on filter click + adjust the filter selection + update_search() + +update_search + apply search filters by looping through the unfiltered_results and finding the first 200 + unique results that match the filters + + Update the DOM +*/ + +/////// SEARCH WORKER /////// + +function worker_function(documenterSearchIndex, documenterBaseURL, filters) { + importScripts( + 
"https://cdn.jsdelivr.net/npm/minisearch@6.1.0/dist/umd/index.min.js" + ); + + let data = documenterSearchIndex.map((x, key) => { + x["id"] = key; // minisearch requires a unique for each object + return x; + }); + + // list below is the lunr 2.1.3 list minus the intersect with names(Base) + // (all, any, get, in, is, only, which) and (do, else, for, let, where, while, with) + // ideally we'd just filter the original list but it's not available as a variable + const stopWords = new Set([ + "a", + "able", + "about", + "across", + "after", + "almost", + "also", + "am", + "among", + "an", + "and", + "are", + "as", + "at", + "be", + "because", + "been", + "but", + "by", + "can", + "cannot", + "could", + "dear", + "did", + "does", + "either", + "ever", + "every", + "from", + "got", + "had", + "has", + "have", + "he", + "her", + "hers", + "him", + "his", + "how", + "however", + "i", + "if", + "into", + "it", + "its", + "just", + "least", + "like", + "likely", + "may", + "me", + "might", + "most", + "must", + "my", + "neither", + "no", + "nor", + "not", + "of", + "off", + "often", + "on", + "or", + "other", + "our", + "own", + "rather", + "said", + "say", + "says", + "she", + "should", + "since", + "so", + "some", + "than", + "that", + "the", + "their", + "them", + "then", + "there", + "these", + "they", + "this", + "tis", + "to", + "too", + "twas", + "us", + "wants", + "was", + "we", + "were", + "what", + "when", + "who", + "whom", + "why", + "will", + "would", + "yet", + "you", + "your", + ]); + + let index = new MiniSearch({ + fields: ["title", "text"], // fields to index for full-text search + storeFields: ["location", "title", "text", "category", "page"], // fields to return with results processTerm: (term) => { let word = stopWords.has(term) ? 
null : term; if (word) { + // custom trimmer that doesn't strip @ and !, which are used in julia macro and function names word = word .replace(/^[^a-zA-Z0-9@!]+/, "") .replace(/[^a-zA-Z0-9@!]+$/, ""); + + word = word.toLowerCase(); } return word ?? null; }, + // add . as a separator, because otherwise "title": "Documenter.Anchors.add!", would not + // find anything if searching for "add!", only for the entire qualification tokenize: (string) => string.split(/[\s\-\.]+/), - }, -}); + // options which will be applied during the search + searchOptions: { + prefix: true, + boost: { title: 100 }, + fuzzy: 2, + }, + }); -index.addAll(data); + index.addAll(data); + + /** + * Used to map characters to HTML entities. + * Refer: https://github.com/lodash/lodash/blob/main/src/escape.ts + */ + const htmlEscapes = { + "&": "&", + "<": "<", + ">": ">", + '"': """, + "'": "'", + }; + + /** + * Used to match HTML entities and HTML characters. + * Refer: https://github.com/lodash/lodash/blob/main/src/escape.ts + */ + const reUnescapedHtml = /[&<>"']/g; + const reHasUnescapedHtml = RegExp(reUnescapedHtml.source); + + /** + * Escape function from lodash + * Refer: https://github.com/lodash/lodash/blob/main/src/escape.ts + */ + function escape(string) { + return string && reHasUnescapedHtml.test(string) + ? string.replace(reUnescapedHtml, (chr) => htmlEscapes[chr]) + : string || ""; + } -let filters = [...new Set(data.map((x) => x.category))]; -var modal_filters = make_modal_body_filters(filters); -var filter_results = []; + /** + * Make the result component given a minisearch result data object and the value + * of the search input as queryString. To view the result object structure, refer: + * https://lucaong.github.io/minisearch/modules/_minisearch_.html#searchresult + * + * @param {object} result + * @param {string} querystring + * @returns string + */ + function make_search_result(result, querystring) { + let search_divider = `
`; + let display_link = + result.location.slice(Math.max(0), Math.min(50, result.location.length)) + + (result.location.length > 30 ? "..." : ""); // To cut-off the link because it messes with the overflow of the whole div + + if (result.page !== "") { + display_link += ` (${result.page})`; + } -$(document).on("keyup", ".documenter-search-input", function (event) { - // Adding a debounce to prevent disruptions from super-speed typing! - debounce(() => update_search(filter_results), 300); + let textindex = new RegExp(`${querystring}`, "i").exec(result.text); + let text = + textindex !== null + ? result.text.slice( + Math.max(textindex.index - 100, 0), + Math.min( + textindex.index + querystring.length + 100, + result.text.length + ) + ) + : ""; // cut-off text before and after from the match + + text = text.length ? escape(text) : ""; + + let display_result = text.length + ? "..." + + text.replace( + new RegExp(`${escape(querystring)}`, "i"), // For first occurrence + '$&' + ) + + "..." + : ""; // highlights the match + + let in_code = false; + if (!["page", "section"].includes(result.category.toLowerCase())) { + in_code = true; + } + + // We encode the full url to escape some special characters which can lead to broken links + let result_div = ` + +
+
${escape(result.title)}
+
${result.category}
+
+

+ ${display_result} +

+
+ ${display_link} +
+
+ ${search_divider} + `; + + return result_div; + } + + self.onmessage = function (e) { + let query = e.data; + let results = index.search(query, { + filter: (result) => { + // Only return relevant results + return result.score >= 1; + }, + }); + + // Pre-filter to deduplicate and limit to 200 per category to the extent + // possible without knowing what the filters are. + let filtered_results = []; + let counts = {}; + for (let filter of filters) { + counts[filter] = 0; + } + let present = {}; + + for (let result of results) { + cat = result.category; + cnt = counts[cat]; + if (cnt < 200) { + id = cat + "---" + result.location; + if (present[id]) { + continue; + } + present[id] = true; + filtered_results.push({ + location: result.location, + category: cat, + div: make_search_result(result, query), + }); + } + } + + postMessage(filtered_results); + }; +} + +// `worker = Threads.@spawn worker_function(documenterSearchIndex)`, but in JavaScript! +const filters = [ + ...new Set(documenterSearchIndex["docs"].map((x) => x.category)), +]; +const worker_str = + "(" + + worker_function.toString() + + ")(" + + JSON.stringify(documenterSearchIndex["docs"]) + + "," + + JSON.stringify(documenterBaseURL) + + "," + + JSON.stringify(filters) + + ")"; +const worker_blob = new Blob([worker_str], { type: "text/javascript" }); +const worker = new Worker(URL.createObjectURL(worker_blob)); + +/////// SEARCH MAIN /////// + +// Whether the worker is currently handling a search. This is a boolean +// as the worker only ever handles 1 or 0 searches at a time. +var worker_is_running = false; + +// The last search text that was sent to the worker. This is used to determine +// if the worker should be launched again when it reports back results. +var last_search_text = ""; + +// The results of the last search. This, in combination with the state of the filters +// in the DOM, is used compute the results to display on calls to update_search. 
+var unfiltered_results = []; + +// Which filter is currently selected +var selected_filter = ""; + +$(document).on("input", ".documenter-search-input", function (event) { + if (!worker_is_running) { + launch_search(); + } }); +function launch_search() { + worker_is_running = true; + last_search_text = $(".documenter-search-input").val(); + worker.postMessage(last_search_text); +} + +worker.onmessage = function (e) { + if (last_search_text !== $(".documenter-search-input").val()) { + launch_search(); + } else { + worker_is_running = false; + } + + unfiltered_results = e.data; + update_search(); +}; + $(document).on("click", ".search-filter", function () { if ($(this).hasClass("search-filter-selected")) { - $(this).removeClass("search-filter-selected"); + selected_filter = ""; } else { - $(this).addClass("search-filter-selected"); + selected_filter = $(this).text().toLowerCase(); } - // Adding a debounce to prevent disruptions from crazy clicking! - debounce(() => get_filters(), 300); + // This updates search results and toggles classes for UI: + update_search(); }); -/** - * A debounce function, takes a function and an optional timeout in milliseconds - * - * @function callback - * @param {number} timeout - */ -function debounce(callback, timeout = 300) { - clearTimeout(timer); - timer = setTimeout(callback, timeout); -} - /** * Make/Update the search component - * - * @param {string[]} selected_filters */ -function update_search(selected_filters = []) { - let initial_search_body = ` -
Type something to get started!
- `; - +function update_search() { let querystring = $(".documenter-search-input").val(); if (querystring.trim()) { - results = index.search(querystring, { - filter: (result) => { - // Filtering results - if (selected_filters.length === 0) { - return result.score >= 1; - } else { - return ( - result.score >= 1 && selected_filters.includes(result.category) - ); - } - }, - }); + if (selected_filter == "") { + results = unfiltered_results; + } else { + results = unfiltered_results.filter((result) => { + return selected_filter == result.category.toLowerCase(); + }); + } let search_result_container = ``; + let modal_filters = make_modal_body_filters(); let search_divider = `
`; if (results.length) { @@ -449,19 +690,23 @@ function update_search(selected_filters = []) { let count = 0; let search_results = ""; - results.forEach(function (result) { - if (result.location) { - // Checking for duplication of results for the same page - if (!links.includes(result.location)) { - search_results += make_search_result(result, querystring); - count++; - } - + for (var i = 0, n = results.length; i < n && count < 200; ++i) { + let result = results[i]; + if (result.location && !links.includes(result.location)) { + search_results += result.div; + count++; links.push(result.location); } - }); + } - let result_count = `
${count} result(s)
`; + if (count == 1) { + count_str = "1 result"; + } else if (count == 200) { + count_str = "200+ results"; + } else { + count_str = count + " results"; + } + let result_count = `
${count_str}
`; search_result_container = `
@@ -490,125 +735,37 @@ function update_search(selected_filters = []) { $(".search-modal-card-body").html(search_result_container); } else { - filter_results = []; - modal_filters = make_modal_body_filters(filters, filter_results); - if (!$(".search-modal-card-body").hasClass("is-justify-content-center")) { $(".search-modal-card-body").addClass("is-justify-content-center"); } - $(".search-modal-card-body").html(initial_search_body); + $(".search-modal-card-body").html(` +
Type something to get started!
+ `); } } /** * Make the modal filter html * - * @param {string[]} filters - * @param {string[]} selected_filters * @returns string */ -function make_modal_body_filters(filters, selected_filters = []) { - let str = ``; - - filters.forEach((val) => { - if (selected_filters.includes(val)) { - str += `${val}`; - } else { - str += `${val}`; - } - }); +function make_modal_body_filters() { + let str = filters + .map((val) => { + if (selected_filter == val.toLowerCase()) { + return `${val}`; + } else { + return `${val}`; + } + }) + .join(""); - let filter_html = ` + return `
Filters: ${str} -
- `; - - return filter_html; -} - -/** - * Make the result component given a minisearch result data object and the value of the search input as queryString. - * To view the result object structure, refer: https://lucaong.github.io/minisearch/modules/_minisearch_.html#searchresult - * - * @param {object} result - * @param {string} querystring - * @returns string - */ -function make_search_result(result, querystring) { - let search_divider = `
`; - let display_link = - result.location.slice(Math.max(0), Math.min(50, result.location.length)) + - (result.location.length > 30 ? "..." : ""); // To cut-off the link because it messes with the overflow of the whole div - - if (result.page !== "") { - display_link += ` (${result.page})`; - } - - let textindex = new RegExp(`\\b${querystring}\\b`, "i").exec(result.text); - let text = - textindex !== null - ? result.text.slice( - Math.max(textindex.index - 100, 0), - Math.min( - textindex.index + querystring.length + 100, - result.text.length - ) - ) - : ""; // cut-off text before and after from the match - - let display_result = text.length - ? "..." + - text.replace( - new RegExp(`\\b${querystring}\\b`, "i"), // For first occurrence - '$&' - ) + - "..." - : ""; // highlights the match - - let in_code = false; - if (!["page", "section"].includes(result.category.toLowerCase())) { - in_code = true; - } - - // We encode the full url to escape some special characters which can lead to broken links - let result_div = ` - -
-
${result.title}
-
${result.category}
-
-

- ${display_result} -

-
- ${display_link} -
-
- ${search_divider} - `; - - return result_div; -} - -/** - * Get selected filters, remake the filter html and lastly update the search modal - */ -function get_filters() { - let ele = $(".search-filters .search-filter-selected").get(); - filter_results = ele.map((x) => $(x).text().toLowerCase()); - modal_filters = make_modal_body_filters(filters, filter_results); - update_search(filter_results); +
`; } }) @@ -635,103 +792,107 @@ $(document).ready(function () { //////////////////////////////////////////////////////////////////////////////// require(['jquery'], function($) { -let search_modal_header = ` - -`; - -let initial_search_body = ` -
Type something to get started!
-`; - -let search_modal_footer = ` - -`; - -$(document.body).append( - ` - + (main output)

The attention operation is essentially a special way to "mix" (or "pick", as it is often described in lectures) the input information. In (probably) the first attention paper, attention is defined as a weighted sum over the input sequence given a word embedding. The idea was further generalized to QKV attention in the first transformer paper.

1. Attention Score

The attention score decides how much each piece of input information contributes to the output value, and also how many entries the attention operation outputs. Any operation that modifies the attention score matrix should be considered part of this block. For example: different attention masks (local attention, random attention, ...), normalization (softmax, l2-norm, ...), and special attention variants that take extra inputs (transformer decoder, relative position encoding, ...).
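As a rough, language-agnostic sketch (plain Python for illustration, not the library's Julia API; the function names here are hypothetical), a dot-product score followed by softmax normalization might look like:

```python
import math

def dot_product_scores(query, keys):
    # One raw score per key: the dot product of the query with that key.
    return [sum(q * k for q, k in zip(query, key)) for key in keys]

def softmax(scores):
    # Normalize raw scores into weights that sum to 1.
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = softmax(dot_product_scores(query, keys))
```

A mask would fit in between the two steps, zeroing out (or setting to negative infinity) the scores of positions that should not be attended to, before normalization.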

2. Mixing

We refer to the operation that takes the attention score and the input value as "mixing". Usually it is just a weighted sum over the input values, using the attention scores as the weights.
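A minimal sketch of this weighted-sum mixing, in plain Python rather than the library's Julia API (the `mix` name is hypothetical):

```python
def mix(weights, values):
    # Weighted sum of value vectors, using the attention weights.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# With all weight on the second value, mixing just picks it out.
out = mix([0.0, 1.0, 0.0], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# out == [3.0, 4.0]
```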

3. Attention Operation

Scoring plus mixing, together with any other pre/post-processing, make up an attention operation. Things like handling multiple heads should happen at this level.
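Putting scoring, normalization, and mixing together gives a single-head attention step. This is a conceptual plain-Python sketch under stated assumptions (dot-product scores, softmax normalization), not the library's implementation:

```python
import math

def attention(query, keys, values):
    # 1. Score: one dot product per key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # 2. Normalize: softmax with a max-shift for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # 3. Mix: weighted sum over the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# A query that strongly matches the first key mostly returns the first value.
out = attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]])
```

Multi-head attention would run this per head on projected queries, keys, and values, then concatenate the outputs.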

Attention Mask

Attention masks are a family of operations that modify the attention score.

1. Dataless mask

We use "dataless" to refer to masks that are independent to the input. For example, CausalMask works the same on each data regardless of the batch size or the data content.

2. Array mask

We call a mask that depends on the input an "array mask". For example, SymLengthMask is used to keep padding tokens from being considered in the attention operation, so each sample in a batch might have a different mask value.
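An array mask takes per-sample data, here the true (unpadded) length, as an extra argument. A plain-Python sketch of the idea behind a symmetric length mask (hypothetical function name, not the library's Julia API):

```python
def sym_length_mask(length, key_idx, query_idx):
    # Array mask: positions at or past the sample's true length are masked
    # out, so padding tokens never contribute to the attention score.
    return key_idx < length and query_idx < length

# Two samples in a batch with different lengths get different masks.
lengths = [2, 3]
masks = [
    [[sym_length_mask(n, k, q) for k in range(3)] for q in range(3)]
    for n in lengths
]
```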