
update readme
chengchingwen committed Dec 21, 2023
1 parent 2b9a568 commit fbcf4d6
Showing 2 changed files with 29 additions and 1 deletion.
30 changes: 29 additions & 1 deletion README.md
@@ -5,4 +5,32 @@
[![Build Status](https://github.com/chengchingwen/NeuralAttentionlib.jl/workflows/CI/badge.svg)](https://github.com/chengchingwen/NeuralAttentionlib.jl/actions)
[![Coverage](https://codecov.io/gh/chengchingwen/NeuralAttentionlib.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/chengchingwen/NeuralAttentionlib.jl)

Reusable functionality for defining custom attention/transformer layers.
Reusable functionality for defining custom attention/transformer layers. See the [documentation](https://chengchingwen.github.io/NeuralAttentionlib.jl/dev) for more information.

# Design

<div align="center"> <img src="docs/src/assets/overview.png" alt="NeuralAttentionlib_Overview" width="512" /></div>

The core idea of this package is to make the attention operation composable, so that most attention variants can be
defined easily without rewriting the other parts. For example, regular attention applies `softmax` to the attention
scores to normalize the weight of each entry. If you want to replace `softmax` with another normalization function,
such as an L2 norm, the problem is that the two require different ways of masking specific entries such as paddings.
With this package you can simply provide a different `AbstractMaskOp` to `masked_score`, so no copy-pasting is needed.
As another example, some position embeddings add values to the attention scores; with this package you can directly
chain the position embedding function (or use `biased_score`) with other score functions. Moreover, the same
definitions can be used directly for higher-dimensional attention, such as over images or videos.
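To make the composability idea concrete, here is a minimal plain-Julia sketch. It is deliberately NOT
NeuralAttentionlib's actual API: `dot_score`, `masked`, `softmax_normalized`, and `l2_normalized` are hypothetical
helpers that only illustrate how wrapping an inner score function lets masking and normalization be swapped
independently.

```julia
using LinearAlgebra

# Hypothetical helpers illustrating the design, not the package's API.
dot_score(q, k) = k' * q                       # raw scores, size (kv_len, q_len)

# Wrap a score function with masking; the fill value depends on the normalization.
masked(maskval, mask, scoref) = (args...) -> ifelse.(mask, scoref(args...), maskval)

# Two interchangeable normalizations over the key dimension (dims = 1).
softmax_normalized(scoref) = (args...) -> begin
    s = scoref(args...)
    e = exp.(s .- maximum(s; dims = 1))
    e ./ sum(e; dims = 1)
end

l2_normalized(scoref) = (args...) -> begin
    s = scoref(args...)
    s ./ max.(mapslices(norm, s; dims = 1), eps())
end

q, k = randn(4, 3), randn(4, 5)
mask = trues(5, 3); mask[4:5, :] .= false      # pretend the last 2 key positions are padding

# softmax wants masked entries pushed toward -Inf, an L2 norm wants them at 0;
# only the mask "op" changes, the rest of the pipeline is reused as-is.
attn_softmax = softmax_normalized(masked(-Inf, mask, dot_score))(q, k)
attn_l2      = l2_normalized(masked(0.0, mask, dot_score))(q, k)
```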


This package contains 3 submodules: `Matmul`, `Masks`, and `Functional`.

1. `Matmul` defines an array wrapper `CollapsedDimsArray{T}(array, ni::Integer, nj::Integer)` that treats an
n-dimensional array as a 3-dimensional one while preserving the original shape. By explicitly specifying which
dimensions act as the "batch" and "length" dimensions, the attention implementations do not need to worry about the
input dimensionality (see the first sketch after this list).
2. `Masks` provides an interface for defining non-allocating masks that work on both CPU and GPU (through Julia's
broadcast interface), along with many pre-defined masks. For example, `CausalMask()` is just a Julia object and does
NOT allocate an `n^2` attention-score mask on either CPU or GPU. These masks are also composable: you can use `&`/`|`
to combine, for example, a causal mask and a padding mask without extra allocation or extra code (see the second
sketch after this list).
3. `Functional` contains the implementations of the "attention score"s, "mixing"s, and "attention operation"s. The
"attention score" interface lets you chain different score functions together, such as `normalized_score`,
`masked_score`, and `biased_score`, while the "attention operation" interface lets you supply your own score and
mixing functions. The remaining parts, such as the reshaping for multi-head attention, are handled automatically
(see the last sketch after this list).
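First, a rough sketch of `CollapsedDimsArray`. My reading of the constructor is that the last `nj` dimensions are
grouped as "batch", the `ni` dimensions before them as "length", and the rest as "feature"; the expected sizes below
follow from that assumption, and I also assume the wrapper is reachable from the top-level module (otherwise use the
`Matmul` submodule).

```julia
using NeuralAttentionlib: CollapsedDimsArray

# A 5-dimensional input, e.g. (feature, width, height, window, batch) for image-like attention.
x = randn(Float32, 7, 6, 5, 4, 3)

# Group the last 2 dims as "batch" and the 2 dims before them as "length".
ca = CollapsedDimsArray(x, 2, 2)

size(ca)   # expected (7, 6*5, 4*3) == (7, 30, 12) under the assumption above
size(x)    # (7, 6, 5, 4, 3) — the original shape is still available
```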
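Next, a hedged sketch of composing masks. `CausalMask` and the `&` combination come straight from the description
above; the `LengthMask` constructor and the broadcasting application are how I understand the docs and may differ in
detail.

```julia
using NeuralAttentionlib: CausalMask, LengthMask

causal   = CausalMask()            # lazy object; no n^2 mask array is materialized
padding  = LengthMask([3, 5])      # assumed per-sequence valid lengths for a batch of 2
combined = causal & padding        # still lazy, no extra allocation

# Masks plug into Julia's broadcast interface, so applying one to a score array
# is just a broadcast (sketch; real code would pass the mask to `masked_score`):
scores = randn(Float32, 5, 5, 2)
masked_scores = scores .* combined
```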
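Finally, a hedged usage sketch of the `Functional` side. I am assuming `multihead_qkv_attention(head, q, k, v[, mask])`
with arrays laid out as (feature, length, batch) and the names reachable from the top-level module; treat the exact
signature as an assumption and check the docs.

```julia
using NeuralAttentionlib: multihead_qkv_attention, CausalMask

# (feature, length, batch) layout; 64 features split across 4 heads.
q = randn(Float32, 64, 10, 2)
k = randn(Float32, 64, 12, 2)
v = randn(Float32, 64, 12, 2)

# The reshaping to and from the 4 heads is handled internally;
# the mask is the same kind of lazy object as in the `Masks` sketch above.
y = multihead_qkv_attention(4, q, k, v, CausalMask())
size(y)   # expected (64, 10, 2)
```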
Binary file added docs/src/assets/overview.png
