API Reference
Base.:!
Base.:&
Base.:|
NeuralAttentionlib.:$
NeuralAttentionlib.AttenMask
NeuralAttentionlib.GetIndexer
NeuralAttentionlib.alibi_position_embedding
NeuralAttentionlib.apply_mask
NeuralAttentionlib.apply_mask
NeuralAttentionlib.attention_score
NeuralAttentionlib.biased_score
NeuralAttentionlib.collapsed_size
NeuralAttentionlib.collapseddims
NeuralAttentionlib.collapseddims
NeuralAttentionlib.dot_product_score
NeuralAttentionlib.generic_grouped_query_attention
NeuralAttentionlib.generic_multihead_qkv_attention
NeuralAttentionlib.generic_qkv_attention
NeuralAttentionlib.get_sincos_position_embeddings
NeuralAttentionlib.getmask
NeuralAttentionlib.grouped_query_attention
NeuralAttentionlib.layer_norm
NeuralAttentionlib.lengths
NeuralAttentionlib.masked_score
NeuralAttentionlib.matmul
NeuralAttentionlib.merge_head
NeuralAttentionlib.mixing
NeuralAttentionlib.move_head_dim_in
NeuralAttentionlib.move_head_dim_in_perm
NeuralAttentionlib.move_head_dim_out
NeuralAttentionlib.move_head_dim_out_perm
NeuralAttentionlib.multihead_qkv_attention
NeuralAttentionlib.naive_qkv_attention
NeuralAttentionlib.noncollapsed_size
NeuralAttentionlib.normalized_score
NeuralAttentionlib.rms_layer_norm
NeuralAttentionlib.scalar_relative_position_embedding
NeuralAttentionlib.scaled_dot_product_score
NeuralAttentionlib.scaled_matmul
NeuralAttentionlib.split_head
NeuralAttentionlib.t5_bucketed_position_id
NeuralAttentionlib.t5_causal_bucketed_position_id
NeuralAttentionlib.unwrap_collapse
NeuralAttentionlib.weighted_sum_mixing
NeuralAttentionlib.with_rotary_position_embedding
NeuralAttentionlib.AbstractArrayMask
NeuralAttentionlib.AbstractAttenMask
NeuralAttentionlib.AbstractDatalessMask
NeuralAttentionlib.AbstractMask
NeuralAttentionlib.AbstractMaskOp
NeuralAttentionlib.AbstractSeqMask
NeuralAttentionlib.BandPartMask
NeuralAttentionlib.BatchedMask
NeuralAttentionlib.BiLengthMask
NeuralAttentionlib.BiSeqMask
NeuralAttentionlib.CausalGroupedQueryAttenOp
NeuralAttentionlib.CausalGroupedQueryAttenOpWithScore
NeuralAttentionlib.CausalMask
NeuralAttentionlib.CausalMultiheadQKVAttenOp
NeuralAttentionlib.CausalMultiheadQKVAttenOpWithScore
NeuralAttentionlib.CollapsedDimsArray
NeuralAttentionlib.GenericAttenMask
NeuralAttentionlib.GenericSeqMask
NeuralAttentionlib.GroupedQueryAttenOp
NeuralAttentionlib.GroupedQueryAttenOpWithScore
NeuralAttentionlib.Indexer
NeuralAttentionlib.LengthMask
NeuralAttentionlib.LocalMask
NeuralAttentionlib.MultiheadQKVAttenOp
NeuralAttentionlib.MultiheadQKVAttenOpWithScore
NeuralAttentionlib.PrefixedFunction
NeuralAttentionlib.RandomMask
NeuralAttentionlib.RepeatMask
NeuralAttentionlib.RevBiLengthMask
NeuralAttentionlib.RevLengthMask
NeuralAttentionlib.RevSymLengthMask
NeuralAttentionlib.SymLengthMask
Functional
NeuralAttentionlib.alibi_position_embedding
— Function alibi_position_embedding(mask::Union{AbstractMask, Nothing}, score, args...)
Add the non-trainable ALiBi position embedding to the attention score. The ALiBi embedding varies per head, so the attention is assumed to be a multi-head variant; the first batch dimension of the attention score is treated as the head dimension. mask can either be an attention mask or nothing. It is usually needed when there are gaps or prefix paddings in the samples.
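For illustration, a sketch of wiring ALiBi into the generic multihead pipeline (the composition mirrors the naive_qkv_attention equivalence shown below; all sizes are hypothetical):
julia> q = k = v = randn(Float32, 64, 10, 2);
julia> y = generic_multihead_qkv_attention(weighted_sum_mixing,
           normalized_score(NNlib.softmax) $ alibi_position_embedding $ nothing $ scaled_dot_product_score,
           8, q, k, v);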
NeuralAttentionlib.attention_score
— Function attention_score(f, args...) = f(args...)
Attention score API. Can be overloaded to provide a custom implementation for generic_qkv_attention. f is the score function.
See also: generic_qkv_attention, generic_multihead_qkv_attention, mixing
NeuralAttentionlib.biased_score
— Function biased_score(bias, score, args...)
Add a precomputed bias to the attention score. bias should have shape (key length, query length, ...), and size(bias, 1) == size(s, 1) && size(bias, 2) == size(s, 2) && ndims(bias) <= ndims(s) where s = score(args...) must hold.
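For illustration, a sketch of biasing a dot-product score (shapes hypothetical; the score is laid out as (key length, query length, batch)):
julia> q = randn(Float32, 8, 5, 1); k = randn(Float32, 8, 6, 1);
julia> bias = randn(Float32, 6, 5);  # (key length, query length)
julia> s = (biased_score $ bias $ dot_product_score)(q, k);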
NeuralAttentionlib.dot_product_score
— Function dot_product_score(q, k)
Dot-product attention score function. Equivalent to scaled_dot_product_score(q, k, 1).
See also: scaled_dot_product_score
NeuralAttentionlib.generic_grouped_query_attention
— Function generic_grouped_query_attention(mixingf, scoref, head, group, q, k, v, args...)
Generic version of grouped_query_attention. Need to specify the mixing and score functions.
NeuralAttentionlib.generic_multihead_qkv_attention
— Function generic_multihead_qkv_attention(mixingf, scoref, head, q, k, v, args...)
Generic version of multihead_qkv_attention. Need to specify the mixing and score functions.
NeuralAttentionlib.generic_qkv_attention
— Function generic_qkv_attention(mixingf, scoref, q, k, v, args...)
Generic version of naive_qkv_attention. Need to specify the mixing and score functions.
NeuralAttentionlib.get_sincos_position_embeddings
— Function get_sincos_position_embeddings(hidden_size::Integer, normalized::Bool, x)
Sinusoidal (sin/cos) position embeddings. x can be either an integer specifying the length or an array of position indices.
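For illustration, a minimal sketch (assuming the integer form, which builds the table for positions 1 through x):
julia> pe = get_sincos_position_embeddings(64, false, 10);  # 64-dim sin/cos embeddings for 10 positions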
NeuralAttentionlib.grouped_query_attention
— Function grouped_query_attention(head, group, q, k, v, mask=nothing)
Similar to multihead_qkv_attention, but multiple query heads share the same group of keys/values.
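For illustration, a sketch with 8 query heads sharing 2 key/value groups (an assumption on the layout: q carries head × dim features, while k/v carry group × dim):
julia> q = randn(Float32, 64, 10, 2);      # 8 heads × 8 feature dims
julia> k = v = randn(Float32, 16, 10, 2);  # 2 groups × 8 feature dims
julia> y = grouped_query_attention(8, 2, q, k, v);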
NeuralAttentionlib.layer_norm
— Function layer_norm([epsilon = 1e-5,] alpha, beta, x)
Perform layer normalization on x. alpha and beta can each be a Vector, Number, or Nothing.
$layer_norm(α, β, x) = α\frac{(x - μ)}{σ} + β$
If both alpha and beta are Nothing, this is just a standardization applied along the first dimension.
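For illustration, a sketch of the Nothing/Nothing case (plain standardization along the first dimension):
julia> x = randn(Float32, 4, 3);
julia> y = layer_norm(nothing, nothing, x);
julia> all(abs.(sum(y; dims=1)) .< 1f-3)  # each column now has (approximately) zero mean
true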
NeuralAttentionlib.masked_score
— Function masked_score(mask) = masked_score $ mask
masked_score(maskop, mask) = masked_score $ maskop $ mask
masked_score(maskop::AbstractMaskOp, mask::AbstractMask, score, args...)
Masked attention score API. Apply the mask according to maskop on the attention score computed from score(args...).
See also: naive_qkv_attention, SymLengthMask, BiLengthMask
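For illustration, a causal score function composed the same way naive_qkv_attention composes its score (see its docstring below):
julia> scoref = normalized_score(NNlib.softmax) $ masked_score(GenericMaskOp(), CausalMask()) $ scaled_dot_product_score;
julia> y = generic_qkv_attention(weighted_sum_mixing, scoref, randn(Float32, 4, 6, 1), randn(Float32, 4, 6, 1), randn(Float32, 4, 6, 1));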
NeuralAttentionlib.merge_head
— Function merge_head(x)
Merge the head dimension split by split_head.
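For illustration, a round trip through split_head / move_head_dim_out / move_head_dim_in / merge_head (sizes hypothetical):
julia> x = randn(Float32, 32, 10, 4);            # (feature, length, batch)
julia> h = move_head_dim_out(split_head(8, x));  # head dimension moved to the batch side
julia> y = merge_head(move_head_dim_in(h));
julia> size(y) == size(x)
true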
NeuralAttentionlib.mixing
— Function mixing(f, v, g, args...) = f(attention_score(g, args...), v)
Mixing function API. Can be overloaded to provide a custom implementation for generic_qkv_attention. f is the mixing function and g is the score function.
See also: generic_qkv_attention, generic_multihead_qkv_attention, attention_score
NeuralAttentionlib.move_head_dim_in
— Function move_head_dim_in(x::AbstractArray, nobatch=false)
Equivalent to permutedims(x, move_head_dim_in_perm(x, nobatch)).
See also: merge_head, move_head_dim_in_perm
NeuralAttentionlib.move_head_dim_in_perm
— Function move_head_dim_in_perm(x::AbstractArray{T, N}, nobatch=false)
move_head_dim_in_perm(N::Int, nobatch=false)
Dimension order for permutedims to move the head dimension (created by split_head) from the batch dimensions to the feature dimensions (for merge_head). Return a tuple of integers of length N. nobatch specifies whether x is a batch of data.
Example
julia> Functional.move_head_dim_in_perm(5, false)
(1, 4, 2, 3, 5)
julia> Functional.move_head_dim_in_perm(5, true)
(1, 5, 2, 3, 4)
See also: merge_head, move_head_dim_in
NeuralAttentionlib.move_head_dim_out
— Function move_head_dim_out(x::AbstractArray, nobatch=false)
Equivalent to permutedims(x, move_head_dim_out_perm(x, nobatch)).
See also: split_head, move_head_dim_out_perm
NeuralAttentionlib.move_head_dim_out_perm
— Function move_head_dim_out_perm(x::AbstractArray{T, N}, nobatch=false)
move_head_dim_out_perm(N::Int, nobatch=false)
Dimension order for permutedims to move the head dimension (created by split_head) to the batch dimensions. Return a tuple of integers of length N. nobatch specifies whether x is a batch of data.
Example
julia> Functional.move_head_dim_out_perm(5, false)
(1, 3, 4, 2, 5)
julia> Functional.move_head_dim_out_perm(5, true)
(1, 3, 4, 5, 2)
See also: split_head, move_head_dim_out
NeuralAttentionlib.multihead_qkv_attention
— Function multihead_qkv_attention(head, q, k, v, mask=nothing)
Multihead version of naive_qkv_attention. The core operation for implementing a regular transformer layer.
NeuralAttentionlib.naive_qkv_attention
— Function naive_qkv_attention(q, k, v, mask=nothing)
The scaled dot-product attention of a regular transformer layer.
$Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$
It's equivalent to generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ masked_score(GenericMaskOp(), mask) $ scaled_dot_product_score, q, k, v).
Example
julia> fdim, ldim, bdim = 32, 10, 4;
julia> x = randn(fdim, ldim, bdim);
julia> y = naive_qkv_attention(x, x, x);  # simple self-attention, no mask
julia> z = generic_qkv_attention(weighted_sum_mixing, normalized_score(NNlib.softmax) $ scaled_dot_product_score, x, x, x);
julia> y ≈ z
true
See also: generic_qkv_attention
NeuralAttentionlib.normalized_score
— Function normalized_score(norm) = normalized_score $ norm
normalized_score(norm, score, args...)
Normalized attention score API. norm is the normalization function (like softmax) and score is the function that computes the attention score from args....
See also: naive_qkv_attention
NeuralAttentionlib.rms_layer_norm
— Function rms_layer_norm([epsilon = 1e-5,] alpha, x)
Perform root-mean-square layer normalization on x. alpha can be a Vector, Number, or Nothing.
$rms_layer_norm(α, x) = α\frac{x}{\sqrt{\sum_{i=1}^{N} x^2 / N}}$
If alpha is Nothing, this is just a normalization by the root mean square applied along the first dimension.
NeuralAttentionlib.scalar_relative_position_embedding
— Function scalar_relative_position_embedding(relative_position_id_func, embedding_table, score, args...)
A relative position embedding that produces a trainable scalar bias for each value in the attention score. relative_position_id_func is a function that takes the attention score and returns a relative_position_id matrix of the same size as the attention score without batches (normally (key length, query length)). This relative_position_id is used to index (or gather) the embedding_table. embedding_table is an array with multiple dimensions, where the first dimension is the number of possible "id"s and the remaining dimensions give a different value to each head. By default we treat the last dimension of the attention score as the batch dimension and the dimensions between the last and the "length" dimensions as the head dimensions.
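For illustration, a sketch wiring a T5-style bucketed bias into a score function (bucket count, max distance, and head count are hypothetical):
julia> embedding_table = randn(Float32, 32, 8);  # 32 buckets, 8 heads
julia> scoref = normalized_score(NNlib.softmax) $ scalar_relative_position_embedding $ t5_bucketed_position_id(32, 128) $ embedding_table $ scaled_dot_product_score;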
NeuralAttentionlib.scaled_dot_product_score
— Function scaled_dot_product_score(q, k, s = sqrt(inv(size(k, 1))))
The scaled dot-product attention score function of a regular transformer layer.
$Score(Q, K) = \frac{QK^T}{\sqrt{d_k}}$
scaled_dot_product_score(f, q, k)
Apply a transform function f on q/k before the dot-product.
See also: naive_qkv_attention
NeuralAttentionlib.split_head
— Function split_head(head::Int, x)
Split the first dimension into head pieces of smaller vectors. Equivalent to reshape(x, :, head, tail(size(x))...).
NeuralAttentionlib.t5_bucketed_position_id
— Function t5_bucketed_position_id(n_buckets::Int, max_distance::Int)
A relative_position_id_func used in the T5 Transformer model. The relative distances are assigned to logarithmic buckets, and distances beyond max_distance are assigned to the same bucket.
See also: scalar_relative_position_embedding, t5_causal_bucketed_position_id
NeuralAttentionlib.t5_causal_bucketed_position_id
— Function t5_causal_bucketed_position_id(n_buckets::Int, max_distance::Int)
Same as t5_bucketed_position_id but only attends to the past. Should be used with CausalMask.
See also: scalar_relative_position_embedding, t5_bucketed_position_id
NeuralAttentionlib.weighted_sum_mixing
— Function weighted_sum_mixing(s, v)
The mixing function of a regular transformer layer. s is the attention score and v is the value of the QKV attention.
NeuralAttentionlib.with_rotary_position_embedding
— Function with_rotary_position_embedding([size,] x)
Apply rotary position embedding to x. Can take a size argument, in which case the rotary position embedding is only applied to x[1:size, :, ...]. Should be used with scaled_dot_product_score / dot_product_score.
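For illustration, combining it with the transform form of scaled_dot_product_score (sizes hypothetical):
julia> q = randn(Float32, 64, 10, 2); k = randn(Float32, 64, 10, 2);
julia> s = scaled_dot_product_score(with_rotary_position_embedding, q, k);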
NeuralAttentionlib.CausalGroupedQueryAttenOp
— Type struct CausalGroupedQueryAttenOp{F} <: AbstractAttenOp
head::Int
group::Int
p::F
end
Structure holding the parameters of grouped_query_attention.
(op::CausalGroupedQueryAttenOp)(q, k, v, mask = nothing)
Perform grouped query attention where mask is combined with a CausalMask.
NeuralAttentionlib.CausalGroupedQueryAttenOpWithScore
— Type Same as CausalGroupedQueryAttenOp but also return the attention score.
NeuralAttentionlib.CausalMultiheadQKVAttenOp
— Type struct CausalMultiheadQKVAttenOp{F} <: AbstractAttenOp
head::Int # number of heads
p::F # dropout probability
end
Structure holding the parameters of multihead_qkv_attention.
(op::CausalMultiheadQKVAttenOp)(q, k, v, mask = nothing)
Perform multihead attention where mask is combined with a CausalMask.
NeuralAttentionlib.CausalMultiheadQKVAttenOpWithScore
— Type Same as CausalMultiheadQKVAttenOp but also return the attention score.
NeuralAttentionlib.GroupedQueryAttenOp
— Type struct GroupedQueryAttenOp{F} <: AbstractAttenOp
head::Int
group::Int
p::F
end
Structure holding the parameters of grouped_query_attention.
(op::GroupedQueryAttenOp)(q, k, v, mask = nothing)
Perform grouped query attention.
NeuralAttentionlib.GroupedQueryAttenOpWithScore
— Type Same as GroupedQueryAttenOp but also return the attention score.
NeuralAttentionlib.MultiheadQKVAttenOp
— Type struct MultiheadQKVAttenOp{F} <: AbstractAttenOp
head::Int # number of heads
p::F # dropout probability
end
Structure holding the parameters of multihead_qkv_attention.
(op::MultiheadQKVAttenOp)(q, k, v, mask = nothing)
Perform multihead attention.
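For illustration, a sketch constructing and applying the op (assuming the one-argument constructor, which leaves the dropout probability unset):
julia> op = MultiheadQKVAttenOp(4);  # 4 heads
julia> q = k = v = randn(Float32, 32, 10, 2);
julia> y = op(q, k, v);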
NeuralAttentionlib.MultiheadQKVAttenOpWithScore
— Type Same as MultiheadQKVAttenOp but also return the attention score.
NeuralAttentionlib.PrefixedFunction
— Type PrefixedFunction(f, args::NTuple{N}) <: Function
A type representing a partially-applied version of the function f, with the first N arguments fixed to the values args. In other words, PrefixedFunction(f, args) behaves similarly to (xs...) -> f(args..., xs...).
See also: NeuralAttentionlib.:$
NeuralAttentionlib.:$
— Method f $ x
f $ x $ y $ ...
Partial function application. Returns a PrefixedFunction.
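For illustration:
julia> f = (+) $ 1 $ 2;
julia> f(3) == 1 + 2 + 3
true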
Mask
NeuralAttentionlib.AbstractMaskOp
— Type AbstractMaskOp
Trait-like abstract type for holding operation-related arguments, defining how the mask should be applied to the input array.
NeuralAttentionlib.apply_mask
— Method apply_mask(op::GenericMaskOp, mask::AbstractMask, score)
Equivalent to op.apply(score, op.scale .* (op.flip ? .! mask : mask)).
Example
julia> x = randn(10, 10);
julia> m = CausalMask()
CausalMask()
julia> apply_mask(GenericMaskOp(.+, true, -1e9), m, x) == @. x + (!m * -1e9)
true
NeuralAttentionlib.apply_mask
— Method apply_mask(op::NaiveMaskOp, mask::AbstractMask, score)
Directly broadcast-multiply the mask over the attention score, i.e. score .* mask.
NeuralAttentionlib.AbstractArrayMask
— Type AbstractArrayMask <: AbstractAttenMask
Abstract type for masks with array data.
NeuralAttentionlib.AbstractAttenMask
— Type AbstractAttenMask <: AbstractMask
Abstract type for mask data specifically for attention.
NeuralAttentionlib.AbstractDatalessMask
— Type AbstractDatalessMask <: AbstractAttenMask
Abstract type for masks without array data.
NeuralAttentionlib.AbstractMask
— Type AbstractMask
Abstract type for mask data.
NeuralAttentionlib.AbstractSeqMask
— Type AbstractSeqMask <: AbstractMask
Abstract type for mask data specifically for sequences.
NeuralAttentionlib.BandPartMask
— Type BandPartMask(l::Int, u::Int) <: AbstractAttenMask{DATALESS}
Attention mask that only allows band_part values to pass.
Example
julia> trues(10, 10) .* BandPartMask(3, 5)
10×10 BitMatrix:
1 1 1 1 1 1 0 0 0 0
1 1 1 1 1 1 1 0 0 0
⋮
0 0 0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
 0 0 0 0 0 0 1 1 1 1
NeuralAttentionlib.BatchedMask
— Type BatchedMask(mask::AbstractMask) <: AbstractWrapperMask
Attention mask wrapper over an array mask for applying the same mask within the same batch.
Example
julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
⋮
1 1 1
1 1 1
1 1 1
NeuralAttentionlib.BiLengthMask
— Type BiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractAttenMask{ARRAYDATA}
Attention mask specified by two arrays of integers that indicate the length dimension sizes.
Example
julia> bm = BiLengthMask([2,3], [3, 5])
BiLengthMask{1, Vector{Int32}}(Int32[2, 3], Int32[3, 5])
julia> trues(5,5, 2) .* bm
⋮
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
See also: SymLengthMask, BiSeqMask, BatchedMask, RepeatMask
NeuralAttentionlib.BiSeqMask
— Type BiSeqMask(qmask::A1, kmask::A2) where {A1 <: AbstractSeqMask, A2 <: AbstractSeqMask} <: AbstractAttenMask
Take two sequence masks and construct an attention mask.
Example
julia> trues(7, 7, 2) .* Masks.BiSeqMask(Masks.LengthMask([3, 5]), Masks.RevLengthMask([3, 5]))
7×7×2 BitArray{3}:
[:, :, 1] =
0 0 0 0 0 0 0
⋮
1 1 1 1 1 0 0
1 1 1 1 1 0 0
1 1 1 1 1 0 0
 1 1 1 1 1 0 0
See also: BiLengthMask, RevBiLengthMask
NeuralAttentionlib.CausalMask
— Type CausalMask() <: AbstractAttenMask{DATALESS}
Attention mask that blocks the future values. Similar to applying LinearAlgebra.triu! on the score matrix.
Example
julia> trues(10, 10) .* CausalMask()
10×10 BitMatrix:
1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1
⋮
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1
 0 0 0 0 0 0 0 0 0 1
NeuralAttentionlib.GenericAttenMask
— Type GenericAttenMask <: AbstractAttenMask{ARRAYDATA}
Generic attention mask. Just a wrapper over AbstractArray{Bool} for dispatch.
Example
julia> bitmask = rand(Bool, 10, 10)
10×10 Matrix{Bool}:
1 0 1 1 0 0 1 0 1 1
0 0 1 1 0 0 0 1 1 1
⋮
0 0 0 1 1 1 0 1 1 1
1 0 1 0 1 1 1 0 0 1
0 1 0 1 0 0 1 1 0 1
 0 0 0 1 0 1 0 0 0 1
NeuralAttentionlib.GenericSeqMask
— Type GenericSeqMask(mask::AbstractArray{Bool}) <: AbstractSeqMask{ARRAYDATA}
Create a sequence mask from an array of Bool.
Example
julia> m = GenericSeqMask(rand(Bool, 10, 2))
GenericSeqMask{3, Array{Bool, 3}}([0 1 … 0 0;;; 1 0 … 1 0])
julia> trues(7, 10, 2) .* m
⋮
[:, :, 2] =
1 0 1 1 0 1 1 1 1 0
NeuralAttentionlib.Indexer
— Type Indexer(m::AbstractMask, size::Dims{N}) <: AbstractArray{Bool, N}
Indexer(m::AbstractMask, size::Dims{N}, scale::T) <: AbstractArray{T, N}
A lazy array-like object that "materializes" the mask m with size and an optional scale, without size checks.
See also: GetIndexer
NeuralAttentionlib.LengthMask
— Type LengthMask(len::AbstractArray{Int, N}) <: AbstractSeqMask{ARRAYDATA}
A sequence mask specified by an array of integers that indicate the length dimension size. Can be converted to an attention mask (SymLengthMask, BiLengthMask) with AttenMask.
Example
julia> ones(7, 7, 2) .* LengthMask([3, 5])
7×7×2 Array{Float64, 3}:
[:, :, 1] =
1.0 1.0 1.0 0.0 0.0 0.0 0.0
⋮
1.0 1.0 1.0 1.0 1.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0 0.0 0.0
NeuralAttentionlib.LocalMask
— Type LocalMask(width::Int) <: AbstractAttenMask{DATALESS}
Attention mask that only allows local (diagonal-like) values to pass. width should be ≥ 0, and A .* LocalMask(1) is similar to Diagonal(A).
Example
julia> trues(10, 10) .* LocalMask(3)
10×10 BitMatrix:
1 1 1 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0
⋮
0 0 0 0 1 1 1 1 1 0
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1
 0 0 0 0 0 0 0 1 1 1
NeuralAttentionlib.RandomMask
— Type RandomMask(p::Float32) <: AbstractAttenMask{DATALESS}
Attention mask that blocks values randomly. p specifies the percentage of values to block, e.g. A .* RandomMask(0) is equivalent to identity(A) and A .* RandomMask(1) is equivalent to zero(A).
Example
julia> trues(10, 10) .* RandomMask(0.5)
10×10 BitMatrix:
1 1 1 1 1 1 0 1 1 1
0 0 1 0 1 0 0 0 1 0
⋮
1 1 1 0 1 1 1 0 0 0
0 0 1 1 0 0 1 1 1 0
0 1 1 1 1 0 1 0 1 0
 0 0 1 0 0 0 0 1 1 1
NeuralAttentionlib.RepeatMask
— Type RepeatMask(mask::AbstractMask, num::Int) <: AbstractWrapperMask
Attention mask wrapper over an array mask for doing an inner repeat on the last dimension.
Example
julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
⋮
1 1 1
1 1 1
1 1 1
NeuralAttentionlib.RevBiLengthMask
— Type RevBiLengthMask(q_len::A, k_len::A) where {A <: AbstractArray{Int, N}} <: AbstractAttenMask{ARRAYDATA}
BiLengthMask, but counting from the end of the array; used for left padding.
Example
julia> bm = RevBiLengthMask([2,3], [3, 5])
RevBiLengthMask{1, Vector{Int32}}(Int32[2, 3], Int32[3, 5])
julia> trues(5,5, 2) .* bm
⋮
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1
See also: RevLengthMask, RevSymLengthMask, BiSeqMask, BatchedMask, RepeatMask
NeuralAttentionlib.RevLengthMask
— Type RevLengthMask(len::AbstractArray{Int, N}) <: AbstractSeqMask{ARRAYDATA}
LengthMask, but counting from the end of the array; used for left padding. Can be converted to an attention mask (RevSymLengthMask, RevBiLengthMask) with AttenMask.
Example
julia> ones(7, 7, 2) .* RevLengthMask([3, 5])
7×7×2 Array{Float64, 3}:
[:, :, 1] =
0.0 0.0 0.0 0.0 1.0 1.0 1.0
⋮
0.0 0.0 1.0 1.0 1.0 1.0 1.0
0.0 0.0 1.0 1.0 1.0 1.0 1.0
0.0 0.0 1.0 1.0 1.0 1.0 1.0
NeuralAttentionlib.RevSymLengthMask
— Type RevSymLengthMask(len::AbstractArray{Int, N}) <: AbstractAttenMask{ARRAYDATA}
SymLengthMask, but counting from the end of the array; used for left padding.
Example
julia> m = RevSymLengthMask([2,3])
RevSymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
⋮
1 1 1
1 1 1
1 1 1
See also: BiLengthMask, BatchedMask, RepeatMask
NeuralAttentionlib.SymLengthMask
— Type SymLengthMask(len::AbstractArray{Int, N}) <: AbstractAttenMask{ARRAYDATA}
Attention mask specified by an array of integers that indicate the length dimension size, assuming the query length and key length are the same.
Example
julia> m = SymLengthMask([2,3])
SymLengthMask{1, Vector{Int32}}(Int32[2, 3])
julia> trues(3,3, 2) .* m
⋮
1 1 1
1 1 1
1 1 1
See also: LengthMask, BiLengthMask, BatchedMask, RepeatMask
Base.:!
— Method !m::AbstractMask
Boolean not of an attention mask.
Base.:&
— Method m1::AbstractMask & m2::AbstractMask
Logical and of two attention masks.
Base.:|
— Method m1::AbstractMask | m2::AbstractMask
Logical or of two attention masks.
NeuralAttentionlib.AttenMask
— Function AttenMask(m::AbstractMask)
Convert a mask into the corresponding attention mask.
AttenMask(q_mask::AbstractSeqMask, k_mask::AbstractSeqMask)
Create an attention mask from two sequence masks specifying the sequence mask for "query" and "key".
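For illustration, a sketch building an attention mask from separate query/key sequence masks:
julia> m = AttenMask(LengthMask([3, 5]), RevLengthMask([2, 4]));
julia> size(trues(7, 7, 2) .* m)
(7, 7, 2)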
NeuralAttentionlib.GetIndexer
— Function GetIndexer(m::AbstractMask, destsize::Dims{N})
Return the Indexer of m and check that the mask m can be applied to an array of size destsize.
NeuralAttentionlib.getmask
— Function getmask(m::AbstractMask, score, scale = 1)
Convert m into a dense AbstractArray mask for score, scaled by scale.
Example
julia> getmask(CausalMask(), randn(7,7), 2)
7×7 Matrix{Float64}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.0 2.0 2.0 2.0 2.0 2.0 2.0
⋮
0.0 0.0 0.0 0.0 2.0 2.0 2.0
0.0 0.0 0.0 0.0 0.0 2.0 2.0
0.0 0.0 0.0 0.0 0.0 0.0 2.0
NeuralAttentionlib.lengths
— Function lengths(::AbstractSeqMask)
Get the number of trues in each batch of the sequence mask.
Matmul
NeuralAttentionlib.CollapsedDimsArray
— Type CollapsedDimsArray{T}(array, ni::Integer, nj::Integer) <: AbstractArray{T, 3}
Similar to a lazily reshaped array with collapsed_size.
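For illustration, a sketch matching the collapsed_size example below:
julia> x = randn(7, 6, 5, 4, 3, 2);
julia> ca = CollapsedDimsArray(x, 2, 2);
julia> size(ca)
(42, 20, 6)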
NeuralAttentionlib.collapsed_size
— Function collapsed_size(x, ni, nj [, n])::Dims{3}
Collapse the dimensionality of x into 3 according to ni and nj, where ni and nj specify the number of dimensions the second and third collapsed dimensions take.
(X1, X2, ..., Xk, Xk+1, Xk+2, ..., Xk+ni, Xk+ni+1, ..., Xn)
|______dim1___| |_________ni_________| |______nj______|
Example
julia> x = randn(7,6,5,4,3,2);
julia> collapsed_size(x, 2, 2, 1)
⋮
julia> collapsed_size(x, 2, 2)
(42, 20, 6)
See also: noncollapsed_size
NeuralAttentionlib.collapseddims
— Method collapseddims(x::AbstractArray, xi, xj)
Reshape x into a 3-dim array, equivalent to reshape(x, collapsed_size(x, xi, xj)).
See also: collapsed_size
NeuralAttentionlib.collapseddims
— Method collapseddims(ca::CollapsedDimsArray)
Remove the wrapper and actually reshape it.
See also: CollapsedDimsArray, unwrap_collapse
NeuralAttentionlib.matmul
— Function matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)
Equivalent to s .* (a * b) if a and b are Vector or Matrix. For arrays of higher dimension, it converts a and b to CollapsedDimsArray, performs batched matrix multiplication, and returns the result as a CollapsedDimsArray. This is useful for preserving dimensionality. If the batch dimensions of a and b have different shapes, it picks the shape of b for the batch dimensions. Works with NNlib.batched_transpose and NNlib.batched_adjoint.
Example
Example
# b-dim shape: (6,)
julia> a = CollapsedDimsArray(randn(3,4,2,3,6), 2, 1); size(a)
(12, 6, 6)
⋮
# equivalent to `batched_mul` but preserves shape
julia> NNlib.batched_mul(collapseddims(a), collapseddims(b)) == collapseddims(matmul(a, b))
true
See also: CollapsedDimsArray, unwrap_collapse, collapseddims
NeuralAttentionlib.noncollapsed_size
— Function noncollapsed_size(x, ni, nj [, n])
Collapse the dimensionality of x into 3 according to ni and nj.
(X1, X2, ..., Xk, Xk+1, Xk+2, ..., Xk+ni, Xk+ni+1, ..., Xn)
|______dim1___| |_________ni_________| |______nj______|
But return the size before the collapse, e.g. noncollapsed_size(x, ni, nj, 2) will be (Xk+1, Xk+2, ..., Xk+ni).
Example
julia> x = randn(7,6,5,4,3,2);
julia> noncollapsed_size(x, 2, 2, 1)
⋮
julia> noncollapsed_size(x, 2, 2)
((7, 6), (5, 4), (3, 2))
See also: collapsed_size
NeuralAttentionlib.scaled_matmul
— Function scaled_matmul(a::AbstractArray, b::AbstractArray, s::Number = 1)
Basically equivalent to unwrap_collapse(matmul(a, b, s)), but not differentiable w.r.t. s.
NeuralAttentionlib.unwrap_collapse
— Function unwrap_collapse(ca::CollapsedDimsArray)
Return the underlying array of a CollapsedDimsArray; otherwise just return the input.