diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 761c765..f2f552a 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-07-06T08:56:26","documenter_version":"1.5.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-07-06T09:07:04","documenter_version":"1.5.0"}}
\ No newline at end of file
diff --git a/dev/examples/custom_loss_layer/index.html b/dev/examples/custom_loss_layer/index.html
index 4967aaf..9863346 100644
--- a/dev/examples/custom_loss_layer/index.html
+++ b/dev/examples/custom_loss_layer/index.html
@@ -60,4 +60,4 @@ model_loss = SimpleChains.add_loss(model, BinaryLogitCrossEntropyLoss(Y)); SimpleChains.valgrad!(gradients, model_loss, X, parameters)

Alternatively, if you just want to train the parameters in full:

epochs = 100
SimpleChains.train_unbatched!(gradients, parameters, model_loss, X, SimpleChains.ADAM(), epochs);
diff --git a/dev/examples/mnist/index.html b/dev/examples/mnist/index.html
index 549cddc..71222a8 100644
--- a/dev/examples/mnist/index.html
+++ b/dev/examples/mnist/index.html
@@ -27,4 +27,4 @@ SimpleChains.accuracy_and_loss(lenetloss, xtrain4, p) SimpleChains.accuracy_and_loss(lenetloss, xtest4, ytest1, p)

Training for an extra 10 epochs should be fast on most systems. Performance is currently known to be poor on the M1 (PRs welcome, otherwise we'll look into this eventually), but should be good/great on systems with AVX2/AVX512:

@time SimpleChains.train_batched!(G, p, lenetloss, xtrain4, SimpleChains.ADAM(3e-4), 10);
SimpleChains.accuracy_and_loss(lenetloss, xtrain4, p)
SimpleChains.accuracy_and_loss(lenetloss, xtest4, ytest1, p)
diff --git a/dev/examples/smallmlp/index.html b/dev/examples/smallmlp/index.html
index 9502d6e..faa3216 100644
--- a/dev/examples/smallmlp/index.html
+++ b/dev/examples/smallmlp/index.html
@@ -68,4 +68,4 @@ LLVM: libLLVM-14.0.5 (ORCJIT, tigerlake)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
diff --git a/dev/index.html b/dev/index.html
index 156e569..ff5a3ea 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,11 +1,11 @@
Home · SimpleChains.jl

SimpleChains

Documentation for SimpleChains.

SimpleChains.AbstractPenalty (Type)
AbstractPenalty

The AbstractPenalty interface requires supporting the following methods:

  1. getchain(::AbstractPenalty)::SimpleChain returns a SimpleChain if it is carrying one.
  2. apply_penalty(::AbstractPenalty, params)::Number returns the penalty.
  3. apply_penalty!(grad, ::AbstractPenalty, params)::Number returns the penalty and adds the penalty's gradient to grad in place.
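To make the contract concrete, here is a standalone sketch of an L2 penalty that follows the semantics described above. It does not extend SimpleChains' own methods. The struct L2PenaltySketch and the functions apply_penalty_sketch and apply_penalty_sketch! are hypothetical names. getchain is omitted because this penalty carries no chain.

```julia
# Hypothetical standalone sketch of the penalty contract (not SimpleChains' own code).
struct L2PenaltySketch{T<:Real}
    λ::T  # penalty strength
end

# Mirrors apply_penalty: return λ * Σ p_i^2.
apply_penalty_sketch(pen::L2PenaltySketch, params) = pen.λ * sum(abs2, params)

# Mirrors apply_penalty!: add ∂penalty/∂p = 2λp to grad in place, then return the penalty.
function apply_penalty_sketch!(grad, pen::L2PenaltySketch, params)
    grad .+= (2 * pen.λ) .* params
    return apply_penalty_sketch(pen, params)
end

params = [1.0, 2.0]
grad = zeros(2)
apply_penalty_sketch!(grad, L2PenaltySketch(0.5), params)  # returns 2.5; grad is now [1.0, 2.0]
```

Accumulating with `.+=` rather than overwriting lets the penalty gradient be added on top of a loss gradient that is already stored in grad.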
SimpleChains.Conv (Type)
Conv(activation, dims::Tuple{Vararg{Integer}}, outputdim::Integer)

Performs a convolution with a kernel of size dims, mapping the input channels to outputdim output channels. It then adds a bias (one per output channel) and applies activation elementwise.

E.g., Conv(relu, (5, 5), 16) performs a 5 × 5 convolution, and maps the input channels to 16 output channels, before adding a bias and applying relu.

Weights are randomly initialized using the (Xavier) Glorot uniform distribution. The bias is zero-initialized.
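As a rough illustration of what Conv(relu, (5, 5), 16) computes, the sketch below implements a naive "valid" 2-D convolution (no padding, stride 1) over a single H × W × C input. It is written as cross-correlation, the usual convention in neural-network libraries, and is followed by a per-channel bias and an elementwise activation. This is plain Julia, not SimpleChains' optimized kernel. The function names, the H × W × C layout, and the fan-in/fan-out formula used for Glorot uniform initialization are all assumptions.

```julia
using Random

# Hypothetical sketch: naive "valid" convolution of an H×W×Cin input with a
# kh×kw×Cin×Cout kernel, one bias per output channel, then activation `act`.
function conv_sketch(act, A::AbstractArray{T,3}, K::AbstractArray{T,4},
                     b::AbstractVector{T}) where {T}
    H, W, Cin = size(A)
    kh, kw, Kin, Cout = size(K)
    @assert Kin == Cin && length(b) == Cout
    out = Array{T}(undef, H - kh + 1, W - kw + 1, Cout)
    for o in 1:Cout, j in axes(out, 2), i in axes(out, 1)
        s = b[o]
        for c in 1:Cin, dj in 1:kw, di in 1:kh
            s += A[i + di - 1, j + dj - 1, c] * K[di, dj, c, o]
        end
        out[i, j, o] = act(s)
    end
    return out
end

# Glorot (Xavier) uniform: U(-r, r) with r = sqrt(6 / (fan_in + fan_out)).
# Assumed fan definitions: fan_in = kh*kw*Cin, fan_out = kh*kw*Cout.
function glorot_uniform_kernel(rng, kh, kw, cin, cout)
    r = sqrt(6 / (kh * kw * cin + kh * kw * cout))
    return r .* (2 .* rand(rng, kh, kw, cin, cout) .- 1)
end

relu(x) = max(zero(x), x)

rng = MersenneTwister(0)
A = rand(rng, 28, 28, 1)                   # one 28×28 single-channel image
K = glorot_uniform_kernel(rng, 5, 5, 1, 16)
b = zeros(16)                              # bias starts at zero
C = conv_sketch(relu, A, K, b)             # size(C) == (24, 24, 16)
```

Each spatial dimension shrinks by the kernel size minus one (28 - 5 + 1 = 24), and the channel dimension becomes outputdim.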

SimpleChains.Dropout (Type)
Dropout(p) # 0 < p < 1

Dropout layer.

When evaluated without gradients, it multiplies inputs by (1 - p). When evaluated with gradients, it randomly zeros a proportion p of the inputs.
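The two modes can be sketched in plain Julia as follows. dropout_eval and dropout_train are hypothetical helpers, not SimpleChains functions. Scaling by (1 - p) at evaluation time makes the deterministic output equal to the expected value of the randomly masked output, since each entry survives with probability 1 - p.

```julia
using Random

# Hypothetical sketch. Evaluation without gradients: deterministic scaling by (1 - p).
dropout_eval(x, p) = x .* (1 - p)

# Evaluation with gradients: zero each entry independently with probability p.
# The mask is returned because a pullback would need it to zero the same entries.
function dropout_train(rng, x, p)
    mask = rand(rng, size(x)...) .>= p
    return x .* mask, mask
end

rng = MersenneTwister(1)
x = ones(10_000)
y_eval = dropout_eval(x, 0.25)               # every entry is 0.75
y_train, mask = dropout_train(rng, x, 0.25)  # about a quarter of the entries are 0.0
```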

SimpleChains.Flatten (Type)
Flatten{N}()

Flattens the first N dimensions. E.g.,

julia> Flatten{2}()(rand(2, 3, 4))
6×4 Matrix{Float64}:
 0.0609115  0.597285  0.279899  0.888223
 0.0667422  0.315741  0.351003  0.805629
 0.678297   0.350817  0.984215  0.399418
 0.125801   0.566696  0.96873   0.57744
 0.331961   0.350742  0.59598   0.741998
 0.26345    0.144635  0.076433  0.330475
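For an ordinary Julia array, the shape change shown above matches a plain reshape that merges the first N dimensions into one. The sketch below uses flatten_sketch, a hypothetical helper that is not part of SimpleChains; it is an analogy only. Because Julia arrays are column-major, column k of the result holds the entries of x[:, :, k] in order.

```julia
# Hypothetical helper: merge the first N dimensions of x into one, keep the rest.
# reshape shares memory with x and preserves column-major order.
flatten_sketch(x::AbstractArray, N::Integer) =
    reshape(x, prod(size(x)[1:N]), size(x)[N+1:end]...)

x = rand(2, 3, 4)
y = flatten_sketch(x, 2)   # 6×4, the same shape as Flatten{2}()(x)
```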
SimpleChains.FrontLastPenalty (Type)
FrontLastPenalty(SimpleChain, frontpen(λ₁...), lastpen(λ₂...))

Applies frontpen to all but the last layer, and applies lastpen to the last layer instead. "Last layer" here ignores the loss function, i.e., if the last element of the chain is a loss layer, then lastpen applies to the layer preceding it.
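The split can be pictured with penalties as plain functions over a list of per-layer weight arrays. l1, l2, and front_last_penalty are hypothetical names used only for illustration, and the layout (one weight array per layer, with a trailing loss layer contributing no weights) is an assumption.

```julia
# Hypothetical penalties: each maps one layer's weight array to a number.
l2(λ) = W -> λ * sum(abs2, W)
l1(λ) = W -> λ * sum(abs, W)

# frontpen on every weight array except the last, lastpen on the last one.
# A trailing loss layer has no weights, so it is simply not part of Ws.
front_last_penalty(frontpen, lastpen, Ws) =
    sum(frontpen, Ws[1:end-1]; init = 0.0) + lastpen(Ws[end])

Ws = [ones(2, 2), ones(3, 3), ones(1, 3)]    # weights of three dense layers
front_last_penalty(l2(0.1), l1(1.0), Ws)     # 0.1*(4 + 9) + 1.0*3 ≈ 4.3
```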

SimpleChains.SimpleChain (Type)
SimpleChain([inputdim::Union{Integer,Tuple{Vararg{Integer}}},] layers)

Construct a SimpleChain. The optional inputdim argument allows SimpleChains to check the size of inputs. Making the input dimensions static allows SimpleChains to infer sizes and loop bounds at compile time. The batch size generally should not be included in inputdim. If inputdim is not specified, some methods, e.g. init_params, will require passing the size as an additional argument, because the number of parameters may be a function of the input size (e.g., for a TurboDense layer).

The layers argument holds various SimpleChains layers, e.g. TurboDense, Conv, Activation, Flatten, Dropout, or MaxPool. It may optionally terminate in an AbstractLoss layer.

These objects are callable, e.g.

c = SimpleChain(...);
p = SimpleChains.init_params(c);
c(X, p) # X are the independent variables, and `p` the parameter vector.
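The calling convention c(X, p), with all parameters stored in one flat vector, can be pictured with the following sketch of a two-layer dense network. Everything here is an assumption for illustration: the DenseSketch type, the packing of each layer's column-major weights followed by its biases, and chain_sketch. SimpleChains' actual parameter layout is not documented in this section.

```julia
# Hypothetical dense layer description: sizes only, parameters live in p.
struct DenseSketch{F}
    act::F
    out::Int
end

# Evaluate layers in order, taking each layer's parameters from the flat vector p
# as views (no copies): first out×in weights (column-major), then out biases.
function chain_sketch(layers, X::AbstractMatrix, p::AbstractVector)
    offset = 0
    A = X
    for L in layers
        nin = size(A, 1)
        W = reshape(view(p, offset+1:offset+L.out*nin), L.out, nin)
        offset += L.out * nin
        b = view(p, offset+1:offset+L.out)
        offset += L.out
        A = L.act.(W * A .+ b)
    end
    @assert offset == length(p) "parameter vector has the wrong length"
    return A
end

layers = (DenseSketch(tanh, 8), DenseSketch(identity, 1))
nparams = 8 * (3 + 1) + 1 * (8 + 1)   # 41 parameters for a 3-dimensional input
p = randn(nparams)
X = randn(3, 5)                       # 5 samples, batch dimension last
Y = chain_sketch(layers, X, p)        # 1×5 output
```

The parameter count depends on the input dimension (3 here), which is why methods such as init_params need that size when it was not given to the chain.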
SimpleChains.TurboDense (Type)
TurboDense{B=true}(activation, outputdim::Integer)

Linear (dense) layer.

  • B specifies whether the layer includes a bias term.
  • The activation function is applied elementwise to the result.
  • outputdim indicates how many dimensions the input is mapped to.

Weights are randomly initialized using the (Xavier) Glorot normal distribution. The bias is zero-initialized.
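A dense layer with Glorot normal initialization can be sketched as follows. The helper names are hypothetical, and passing `nothing` for the bias stands in for B = false. The standard deviation sqrt(2 / (fan_in + fan_out)) is the usual Glorot normal formula; it is assumed, not confirmed here, to match what TurboDense uses.

```julia
using Random

# Glorot (Xavier) normal: N(0, σ²) with σ = sqrt(2 / (fan_in + fan_out)).
glorot_normal(rng, nout, nin) = randn(rng, nout, nin) .* sqrt(2 / (nin + nout))

# Hypothetical dense layer: act.(W*x .+ b), or act.(W*x) when there is no bias.
dense_sketch(act, W, b, x) = b === nothing ? act.(W * x) : act.(W * x .+ b)

rng = MersenneTwister(2)
W = glorot_normal(rng, 4, 3)      # outputdim = 4, input dimension = 3
b = zeros(4)                      # bias is zero-initialized
x = randn(rng, 3, 10)             # 10 samples
y = dense_sketch(tanh, W, b, x)   # 4×10
```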

Base.front (Method)
Base.front(c::SimpleChain)

Returns the chain without its last layer. Useful for popping off a loss layer.

SimpleChains.add_loss (Method)
add_loss(chn, l::AbstractLoss)

Adds the loss function l to the chain chn. The loss function should hold the target you're trying to fit.
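One way to picture a chain with an attached loss is as a tuple whose last element is a loss object carrying the target. In the hypothetical sketch below, SquaredLossSketch plays that role. Base.front, which returns all but the last element of a tuple, pops the loss off again, in the spirit of Base.front(c::SimpleChain). None of these names are SimpleChains' own types.

```julia
# Hypothetical loss "layer" that stores the target it is fitting.
struct SquaredLossSketch{T}
    target::T
end
(l::SquaredLossSketch)(ŷ) = sum(abs2, ŷ .- l.target)

double(x) = 2 .* x                               # stand-in for a layer
chain = (double, SquaredLossSketch([2.0, 4.0]))  # layers..., then the loss
model = Base.front(chain)                        # (double,): the loss is popped off

prediction = foldl((x, f) -> f(x), model; init = [1.0, 3.0])  # [2.0, 6.0]
loss = chain[end](prediction)                                  # (2-2)^2 + (6-4)^2 = 4.0
```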

SimpleChains.alloc_threaded_grad (Method)
alloc_threaded_grad(chn, id = nothing, ::Type{T} = Float32; numthreads = min(Threads.nthreads(), SimpleChains.num_cores()))

Returns a preallocated array for writing gradients, for use with train_batched! and train_unbatched!. If Julia was started with multiple threads, returns a matrix with one column per thread, so that the threads can accumulate gradients in parallel.

Note that the memory is aligned to avoid false sharing.
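The idea of one gradient column per thread, padded against false sharing, can be sketched as below. padded_grad_buffer and accumulate_threaded! are hypothetical. The 64-byte cache-line size is an assumption that holds on common x86-64 CPUs. The padding only keeps columns on separate cache lines if the buffer itself starts on a line boundary.

```julia
using Base.Threads

# Hypothetical: one column per thread, each padded to whole 64-byte cache lines.
function padded_grad_buffer(::Type{T}, nparams, nthreads) where {T}
    per_line = 64 ÷ sizeof(T)                 # elements per cache line
    rows = cld(nparams, per_line) * per_line  # round up to full lines
    return zeros(T, rows, nthreads)
end

# Each iteration t accumulates a toy "gradient" into its own column only,
# so no two tasks write to the same memory; the columns are summed at the end.
function accumulate_threaded!(G, nparams, nsamples)
    nt = size(G, 2)
    @threads for t in 1:nt
        for i in t:nt:nsamples            # samples assigned round-robin
            for k in 1:nparams
                G[k, t] += i              # stand-in for sample i's gradient
            end
        end
    end
    return vec(sum(view(G, 1:nparams, :); dims = 2))
end

G = padded_grad_buffer(Float32, 10, nthreads())
g = accumulate_threaded!(G, 10, 100)   # every entry equals sum(1:100) = 5050
```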

SimpleChains.biases (Function)
biases(sc::SimpleChain, p::AbstractVector, inputdim = nothing)

Returns a tuple of the biases of the SimpleChain sc, as a view of the parameter vector p.

SimpleChains.init_params! (Function)
SimpleChains.init_params!(chn, p, id = nothing)

Randomly initializes the parameter vector p for input dimension id. The input dimension does not need to be specified if it was provided to the chain object itself. See the documentation of the individual layers to see how they are initialized, but it is generally via (Xavier) Glorot uniform or normal distributions.

SimpleChains.init_params (Method)
SimpleChains.init_params(chn[, id = nothing][, ::Type{T} = Float32])

Creates a parameter vector of element type T, sized according to the input dimension id (this argument is not required if the input dimension was provided to the chain object itself). See the documentation of the individual layers to see how they are initialized, but it is generally via (Xavier) Glorot uniform or normal distributions.

SimpleChains.numparam (Method)
numparam(d::Layer, inputdim::Tuple)

Returns a Tuple{Int,S}. The first element is the number of parameters required by the layer given an argument of size inputdim. The second element is the size of the object returned by the layer, which can be fed into numparam of the following layer.
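For a dense layer, the pair can be worked out by hand. A layer with bias that maps an input with d rows to outputdim rows needs outputdim × (d + 1) parameters and produces outputdim rows. The sketch below chains such pairs as described above. dense_numparam and chain_numparam are hypothetical, and they assume this weights-plus-bias count.

```julia
# Hypothetical numparam-style function for a dense layer on inputs of size (d,).
# Returns (number of parameters, size of the output).
function dense_numparam(outputdim::Int, inputdim::Tuple{Int}; bias::Bool = true)
    d = inputdim[1]
    return outputdim * (d + bias), (outputdim,)
end

# Feed each layer's output size into the next, summing the parameter counts.
function chain_numparam(outputdims, inputdim)
    total, dim = 0, inputdim
    for o in outputdims
        n, dim = dense_numparam(o, dim)
        total += n
    end
    return total, dim
end

chain_numparam((32, 16, 4), (4,))   # 32*5 + 16*33 + 4*17 = 756 parameters, output size (4,)
```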

SimpleChains.params (Function)
params(sc::SimpleChain, p::AbstractVector, inputdim = nothing)

Returns a tuple of the parameters of the SimpleChain sc, as a view of the parameter vector p.

SimpleChains.pullback_arg! (Method)
pullback_arg!(dest, layer, C̄, A, p, pu, pu2)

Computes the pullback of layer with respect to A and C̄, storing the result in dest.

pullback_arg!(layer, C̄, A, p, pu, pu2)

Computes the pullback of layer with respect to A and C̄, storing the result in A.
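Consider a linear layer C = W * A. Its pullback with respect to A maps an incoming adjoint C̄ to Ā = Wᵀ C̄, and the parameter adjoint is W̄ = C̄ Aᵀ. The sketch below writes Ā into a caller-supplied dest, in the spirit of the first method, and checks the result against a finite difference. It is plain Julia with no bias or activation, and pullback_linear! is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical pullback of C = W*A: given C̄ = ∂L/∂C, write Ā = ∂L/∂A into dest
# and return W̄ = ∂L/∂W.
function pullback_linear!(dest, W, C̄, A)
    mul!(dest, W', C̄)   # Ā = Wᵀ C̄
    return C̄ * A'       # W̄ = C̄ Aᵀ
end

W, A, C̄ = randn(4, 3), randn(3, 5), randn(4, 5)
Ā = similar(A)
W̄ = pullback_linear!(Ā, W, C̄, A)

# Finite-difference check with the scalar L(A) = sum(C̄ .* (W*A)).
L(A) = sum(C̄ .* (W * A))
h = 1e-6
E = zeros(size(A)); E[2, 3] = 1
fd = (L(A .+ h .* E) - L(A .- h .* E)) / (2h)   # ≈ Ā[2, 3]
```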

SimpleChains.train_batched! (Method)
train_batched!(g::AbstractVecOrMat, p, chn, X, opt, iters; batchsize = nothing)

Train while batching arguments.

Arguments:

  • g pre-allocated gradient buffer. Can be allocated with similar(p) (if you want to run single threaded), or alloc_threaded_grad(chn, size(X)) (size(X) argument is only necessary if the input dimension was not specified when constructing the chain). If a matrix, the number of columns gives how many threads to use. Do not use more threads than batch size would allow.
  • p is the parameter vector. It is updated in place. It should be pre-initialized, e.g. with init_params/init_params!. This allows calling train_batched! several times to train in increments.
  • chn is the SimpleChain. It must include a loss (see SimpleChains.add_loss) containing the target information (dependent variables) you're trying to fit.
  • X is the training data input argument (independent variables).
  • opt is the optimizer. Currently, only SimpleChains.ADAM is supported.
  • iters is how many iterations to train for.
  • batchsize keyword argument: the size of the batches to use. If batchsize = nothing, it'll try to do a half-decent job of picking the batch size for you. However, this is not well optimized at the moment.
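The optimizer and batching loop can be sketched in plain Julia on a toy linear regression. This is not SimpleChains' implementation. AdamSketch, adam_step!, grad_mse!, and train_batched_sketch! are hypothetical names. The loop counts passes over the data (epochs) rather than claiming to match the meaning of iters. The Adam hyperparameters β₁ = 0.9, β₂ = 0.999, ε = 1e-8 are the defaults proposed in the original Adam paper and are assumed here. With batchsize equal to the number of samples, every step uses the full data set, which corresponds to the unbatched case.

```julia
using Random

# Minimal Adam state: moment estimates and a step counter.
mutable struct AdamSketch
    eta::Float64
    beta1::Float64
    beta2::Float64
    epsilon::Float64
    m::Vector{Float64}
    v::Vector{Float64}
    t::Int
end
AdamSketch(n::Int; eta = 1e-2) = AdamSketch(eta, 0.9, 0.999, 1e-8, zeros(n), zeros(n), 0)

# One Adam update of p in place, using gradient g and bias-corrected moments.
function adam_step!(p, g, o::AdamSketch)
    o.t += 1
    o.m .= o.beta1 .* o.m .+ (1 - o.beta1) .* g
    o.v .= o.beta2 .* o.v .+ (1 - o.beta2) .* g .^ 2
    mhat = o.m ./ (1 - o.beta1^o.t)
    vhat = o.v ./ (1 - o.beta2^o.t)
    p .-= o.eta .* mhat ./ (sqrt.(vhat) .+ o.epsilon)
    return p
end

# Gradient of the mean squared error of ŷ = w*x + b on one batch; p = [w, b].
function grad_mse!(g, p, x, y)
    r = p[1] .* x .+ p[2] .- y
    g[1] = 2 * sum(r .* x) / length(x)
    g[2] = 2 * sum(r) / length(x)
    return g
end

# Shuffle once per epoch, then take one Adam step per batch.
function train_batched_sketch!(p, x, y, epochs; batchsize = length(x),
                               rng = Random.default_rng())
    opt = AdamSketch(length(p))
    g = similar(p)
    for _ in 1:epochs
        for batch in Iterators.partition(randperm(rng, length(x)), batchsize)
            grad_mse!(g, p, view(x, batch), view(y, batch))
            adam_step!(p, g, opt)
        end
    end
    return p
end

rng = MersenneTwister(3)
x = 2 .* rand(rng, 256) .- 1   # inputs in [-1, 1)
y = 2 .* x .+ 1                # noiseless target: w = 2, b = 1
p = zeros(2)
train_batched_sketch!(p, x, y, 500; batchsize = 32, rng = rng)   # p ends up close to [2.0, 1.0]
```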
SimpleChains.train_unbatched! (Method)
train_unbatched!([g::AbstractVecOrMat, ]p, chn, X, opt, iters)

Train without batching inputs.

Arguments:

  • g pre-allocated gradient buffer. Can be allocated with similar(p) (if you want to run single threaded), or alloc_threaded_grad(chn, size(X)) (size(X) argument is only necessary if the input dimension was not specified when constructing the chain). If a matrix, the number of columns gives how many threads to use. Do not use more threads than batch size would allow. This argument is optional. If excluded, it will run multithreaded (assuming you started Julia with multiple threads).
  • p is the parameter vector. It is updated in place. It should be pre-initialized, e.g. with init_params/init_params!. This allows calling train_unbatched! several times to train in increments.
  • chn is the SimpleChain. It must include a loss (see SimpleChains.add_loss) containing the target information (dependent variables) you're trying to fit.
  • X is the training data input argument (independent variables).
  • opt is the optimizer. Currently, only SimpleChains.ADAM is supported.
  • iters is how many iterations to train for.
SimpleChains.valgrad! (Method)
valgrad!(g, c::SimpleChain, arg, params)

g can be either an AbstractVector with the same size as params, or a Tuple{A,G}. If g is a tuple, the first element is the gradient with respect to arg, and should either be nothing (for not taking this gradient) or have the same size as arg. The second element is the gradient with respect to params, and should likewise either be nothing or have the same size as params.
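The two accepted forms of g can be mirrored in a small hand-differentiated sketch. valgrad_sketch! and its model loss(x, p) = Σᵢ (p₁xᵢ + p₂)² are hypothetical. Only the calling convention follows the text: either a plain vector for the parameter gradient, or a tuple whose entries may be nothing.

```julia
# Hypothetical model: loss(x, p) = sum((p[1] .* x .+ p[2]) .^ 2).
# g is either a gradient vector for p, or a tuple (gradient for x, gradient for p),
# where either entry may be `nothing` to skip that gradient.
function valgrad_sketch!(g, x::AbstractVector, p::AbstractVector)
    r = p[1] .* x .+ p[2]
    val = sum(abs2, r)
    gx, gp = g isa Tuple ? g : (nothing, g)
    if gx !== nothing
        gx .= 2 .* p[1] .* r      # ∂loss/∂x
    end
    if gp !== nothing
        gp[1] = 2 * sum(r .* x)   # ∂loss/∂p[1]
        gp[2] = 2 * sum(r)        # ∂loss/∂p[2]
    end
    return val
end

x, p = [1.0, 2.0], [3.0, 1.0]
gp = zeros(2)
valgrad_sketch!(gp, x, p)              # returns 65.0; gp == [36.0, 22.0]
gx = zeros(2)
valgrad_sketch!((gx, nothing), x, p)   # gx == [24.0, 42.0]
```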

Allowed destruction:

valgrad_layer!

Accepts the return of the previous layer (B) and returns an output C. If it is an internal layer, it is allowed to destroy B (e.g. a dropout layer).

pullback!

Accepts the adjoint of its return (C̄). It is allowed to destroy this. It is also allowed to destroy the previous layer's return B to produce B̄ (the C̄ that the previous layer receives). Thus, the pullback is not allowed to depend on C, as it may have been destroyed in producing C̄.

SimpleChains.weights (Function)
weights(sc::SimpleChain, p::AbstractVector, inputdim = nothing)

Returns a tuple of the weights (parameters other than biases) of the SimpleChain sc, as a view of the parameter vector p.
