Comparison of existing implementations
| functionality | gpu nd array (python interface) | Theano CudaNdarray | GPUmat GPU (single/double) |
|---|---|---|---|
| backend | cuda/opencl | cuda | cuda |
| dtype | float32, {u}int{8,16,32,64}, complex64 (float64 and complex128 possible) | float32 | float32, complex32, float64, complex64 |
| ndim | generic | generic | generic |
| memory layout | generic | generic | generic |
| contiguous transfer to/from gpu | Yes | Yes | Yes |
| non-contiguous transfer to/from gpu | copy if needed | copy if needed | copy if needed |
| ascontiguousarray | Yes | No | No |
| asfortranarray | Yes | No | No |
| copy | Yes | Yes | Yes: clone() |
| zeros | Yes | Yes | Yes |
| empty | Yes | No | Yes: GPUsingle(); setSize(); GPUallocVector() |
| len | Yes | Yes | Yes: length() |
| subtensor (var[…]) | Yes | Yes | Yes |
| subtensor (var[N]) | Yes | Yes | Yes |
| subtensor (var[slice with step]) | Yes | Yes | Yes |
| subtensor (var[slice with negative start/stop/step]) | Yes | Yes | Yes |
| subtensor (var[tuple mixing slice, integer and numpy.int64]) | Yes | Yes | No |
| elemwise | generic, 1 output, with dimension collapsing and mixed dtypes | as gpu nd array | as gpu nd array |
| elemwise with broadcasting | Yes | Yes | Yes |
| reduction | sum/prod, generic for any ndim and any combination of reduced axes | sum/prod/min/max only with these patterns: 1, 11, 10, 01, 001, 010, 100, 110, 011, 111, 0011, 0101, 0111, 1011, 1111; pattern 1+ uses only 1 block | sum |
| __setitem__ | Yes (with broadcast if necessary) | Theano Op for slice/int/list of int | Yes: subsasgn(), assign() |
| reshape | Yes | Yes (copy if not c_contiguous) | Yes: setSize(), reshape() |
| n-dim transpose | Yes (copy when numpy would copy) | Yes (can add dims of shape 1 at the same time) | No |
| dot/gemm | Yes* | Theano op | Yes: times(), GPUtimes() |
| gemv | Yes* | Theano op | ? |
\* It needs an external BLAS, which is included with CUDA. For the OpenCL back-end you can use clmath, but clmath support is not good on Mac and Windows.
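The subtensor, broadcasting, __setitem__, reduction and transpose rows above are all defined by reference to NumPy behaviour. As a reminder of what each of those operations means, here is a plain NumPy sketch (CPU side only; it makes no claim about the exact GPU APIs being compared):

```python
import numpy as np

a = np.arange(24, dtype=np.float32).reshape(4, 6)

# slicing with negative start/stop/step (the "subtensor" rows): a view, no copy
sub = a[-1:0:-2, ::3]

# tuple mixing a slice, an integer and a numpy.int64
mixed = a[1:3, np.int64(2)]

# elemwise with broadcasting: shape (4, 6) combined with shape (6,)
b = a + np.ones(6, dtype=np.float32)

# __setitem__ with broadcasting of the right-hand side
a[:, :3] = 0

# reduction over an arbitrary combination of axes
s = a.sum(axis=(0, 1))

# n-dim transpose is a view; a copy happens only when one is forced
t = a.T
c = np.ascontiguousarray(t)
```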
Not done but planned in gpu nd array:

| functionality | gpu nd array (python interface) | Theano CudaNdarray | GPUmat GPU |
|---|---|---|---|
| ones | No | Theano op only | Yes |
| subtensor with a list of indices, var[[1,2,3,4]] (part of numpy advanced indexing) | No | Yes | Yes: slice(A, {[1,2,3,4]}) |
| reduction (max, min, argmax) | No | No | No |
| ger | No | Theano op | ? |
| flatten | No (you can use reshape for this) | Yes | ? |
| random | No | mrg, curand | Yes: GPUrand(), GPUrandn() |
| join | No | Theano op | ? |
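For the "not done" rows, the NumPy behaviour being targeted looks like this (again a CPU-side sketch only):

```python
import numpy as np

a = np.arange(20, dtype=np.float32).reshape(4, 5)

# subtensor with a list of indices (numpy advanced indexing): returns a copy
rows = a[[1, 2, 3]]

# flatten can already be emulated through reshape in gpu nd array
flat = a.reshape(-1)

# index-returning reductions such as argmax are still missing everywhere
idx = a.argmax(axis=0)
```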
Other Theano ops: CrossentropySoftmaxArgmax1HotWithBias, CrossentropySoftmax1HotWithBiasDx, Softmax, SoftmaxWithBias, DownsampleFactorMax, GpuImages2Neibs, Dot22Scalar, GpuEye, ErfinvGPU
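These ops live at the Theano graph level; a minimal sketch of how one of them is reached (assuming Theano is installed and configured with device=gpu and floatX=float32; the substitution of the GPU op is done by Theano's optimizer, not by user code):

```python
import theano
import theano.tensor as T

x = T.fmatrix('x')           # float32 matrix input
y = T.nnet.softmax(x)        # graph-level softmax
f = theano.function([x], y)  # on a GPU device this compiles to the GPU Softmax op
```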
gnumpy module functions: as_garray, as_garray_or_scalar, as_numpy_array, tile (the same as numpy?), rand, randn, empty, zeros, ones, seed_rand, dot(0d, *d), dot(1d, 1d), dot(1d, 2d), dot(2d, 1d), dot(2d, 2d), dot(a1.ndim >= 2, a2.ndim >= 2) with reshape and transpose (transpose done by a loop?), outer, concatenate, where, nonzero, newaxis support?, eye, diagflat, tensordot, reduction (all, any, sum, mean, max, min; prod and std cpu only), elemwise (abs, exp, isinf, isnan, log, log_1_plus_exp, logistic, negative, sign, sqrt, tanh; cpu only: log10).

gnumpy.garray methods: as_numpy_array, astype, ravel (calls self.reshape(-1)), item (transfer to cpu), sort (cpu only), reshape_2d, T, transpose, shiftAxesRight, copy, diagflat, diagonal, diag, all_real, isinf, isreal, isnan, isnumber, abs, as_bool, exp, log, log_1_plus_exp, logistic, sigmoid, sign, sqrt, tanh, sum, mean, max, argmax (cpu), argmin (cpu), min, all, any, all2, any2, rand, euclid_norm, dot, where, nonzero, __lt__, __gt__, __le__, __ge__, __ne__, __eq__, __sub__, __div__, __rmul__, __radd__, __rsub__, __rdiv__, __rpow__, __pos__, __neg__, __iadd__, __imul__, __isub__, __idiv__, __imod__, __ipow__, __len__, __getitem__, __iter__, __setitem__.
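A rough usage sketch built only from the gnumpy names listed above (the import name and exact call signatures are assumptions, not verified here):

```python
import numpy as np
import gnumpy as gnp   # import name assumed

x = gnp.randn(3, 4)                                    # random garray on the GPU
w = gnp.as_garray(np.ones((4, 2), dtype=np.float32))   # numpy array -> garray
h = gnp.dot(x, w).logistic()                           # dot(2d, 2d) + elemwise method
total = h.sum()                                        # reduction
back = h.as_numpy_array()                              # transfer back to numpy
```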