CUDA utilities

Device, context and memory management on CuPy.

Chainer uses CuPy (with a very thin wrapper) to exploit the speed of GPU computation. The following modules and classes are imported into the cuda module for convenience (refer to this table when reading Chainer's source code).

imported name             original name
chainer.cuda.cupy         cupy
chainer.cuda.ndarray      cupy.ndarray
chainer.cuda.cupy.cuda    cupy.cuda
chainer.cuda.Device       cupy.cuda.Device
chainer.cuda.Event        cupy.cuda.Event
chainer.cuda.Stream       cupy.cuda.Stream

Chainer replaces the default allocator of CuPy with its own memory pool implementation. This allows device memory to be reused across multiple forward/backward computations, and temporary arrays to be reused across consecutive elementwise operations.

Devices

chainer.cuda.get_device(*args)[source]

Gets the device from a device object, an ID integer or an array object.

This is a convenient utility for selecting the correct device when the type of args is unknown (i.e., it can be used on arrays that may be on either the CPU or the GPU). The returned device object supports Python's context management protocol for the with statement.

Parameters: args – Values to specify a GPU device. The first device object, integer, or cupy.ndarray object is used to select a device. If it is a device object, it is returned as is. If it is an integer, the corresponding device is returned. If it is a CuPy array, the device on which the array resides is returned. If no argument is an integer, a device object, or a CuPy array, a dummy device object representing the CPU is returned.
Returns: Device object specified by the given args.

See also

See cupy.cuda.Device for selecting a device without using arrays.
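The context-manager behavior described above can be exercised even without a GPU, because the dummy CPU device also supports the with statement. A minimal, hedged sketch (the Chainer import is guarded in case the library is not installed):

```python
import numpy

# Use get_device as a context manager. For a NumPy array it returns a
# dummy (CPU) device, so the same code runs whether or not CUDA is
# available. Falls back to plain NumPy if Chainer is not installed.
x = numpy.zeros(3)
try:
    from chainer import cuda
    with cuda.get_device(x):  # dummy device for a CPU array
        y = x + 1             # computed on the device x resides on
except Exception:
    y = x + 1  # Chainer unavailable: compute directly on the CPU
```

The same pattern works unchanged when x is a cupy.ndarray, in which case the with block selects the GPU that holds x.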

CuPy array allocation and copy

Note

As of v1.3.0, the following array construction wrappers are marked as deprecated. Use the corresponding functions of the cupy module instead. The main difference is that the default dtype changes from float32 to float64.

Deprecated function         Recommended function
chainer.cuda.empty          cupy.empty()
chainer.cuda.empty_like     cupy.empty_like()
chainer.cuda.zeros          cupy.zeros()
chainer.cuda.zeros_like     cupy.zeros_like()
chainer.cuda.ones           cupy.ones()
chainer.cuda.ones_like      cupy.ones_like()
chainer.cuda.full           cupy.full()
chainer.cuda.full_like      cupy.full_like()

chainer.cuda.copy(array, out=None, out_device=None, stream=None)[source]

Copies a cupy.ndarray object using the default stream.

This function can copy the device array to the destination array on another device.

Parameters:
  • array (cupy.ndarray) – Array to be copied.
  • out (cupy.ndarray) – Destination array. If it is not None, the out_device argument is ignored.
  • out_device – Destination device specifier. The actual device object is obtained by passing this value to get_device().
  • stream (cupy.cuda.Stream) – CUDA stream.
Returns:

Copied array.

If out is not specified, the array is allocated on the device specified by the out_device argument.

Return type:

cupy.ndarray

chainer.cuda.to_cpu(array, stream=None)[source]

Copies the given GPU array to the host CPU.

Parameters:
  • array – Array to be sent to the CPU.
  • stream (cupy.cuda.Stream) – CUDA stream.
Returns:

Array on the CPU.

If the given array is already on the CPU, this function simply returns array without performing any copy.

Return type:

numpy.ndarray
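The pass-through rule above (no copy for arrays already on the CPU) can be sketched in pure Python. This is a hypothetical helper illustrating the documented behavior, not Chainer's actual implementation:

```python
import numpy

def to_cpu_sketch(array):
    # Sketch of the documented contract of chainer.cuda.to_cpu:
    # a NumPy array is returned as-is (same object, no copy); a device
    # array would be transferred to the host with cupy.asnumpy.
    if isinstance(array, numpy.ndarray):
        return array
    import cupy  # only needed when an actual device array is passed
    return cupy.asnumpy(array)

x = numpy.arange(3)
assert to_cpu_sketch(x) is x  # identity, not a copy
```

The identity check matters in practice: callers can rely on to_cpu being free for host arrays, so it is safe to call unconditionally in CPU/GPU generic code.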

chainer.cuda.to_gpu(array, device=None, stream=None)[source]

Copies the given CPU array to the specified device.

Parameters:
  • array – Array to be sent to the GPU.
  • device – Device specifier.
  • stream (cupy.cuda.Stream) – CUDA stream. If not None, the copy runs asynchronously.
Returns:

Array on the GPU.

If array is already on the GPU, this function simply returns array without performing any copy. Note that this function does not copy a cupy.ndarray to the specified device.

Return type:

cupy.ndarray
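A typical host-to-device round trip combines to_gpu and to_cpu. The sketch below is hedged: the transfer is attempted only when Chainer and CuPy (and a CUDA device) are actually available, and otherwise the array simply stays on the CPU:

```python
import numpy

# Round-trip example: copy a host array to GPU 0 and back. If CUDA is
# unavailable (no CuPy, no device), fall back to the original CPU array.
x = numpy.arange(4, dtype=numpy.float32)
try:
    from chainer import cuda
    x_gpu = cuda.to_gpu(x, device=0)   # host -> device copy
    x_back = cuda.to_cpu(x_gpu)        # device -> host copy
except Exception:
    x_back = x  # no usable GPU: keep the host array
```

Note that x_back is a fresh numpy.ndarray after a real round trip, whereas in the fallback branch it is the original object, matching the no-copy rule of to_cpu.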

Kernel definition utilities

chainer.cuda.memoize(for_each_device=False)[source]

Makes a decorator that memoizes the result of a function for each combination of arguments and device.

This is similar to cupy.memoize(). The difference is that this function can be used at global scope even if CUDA is not available, in which case it does nothing.

Note

This decorator acts as a dummy if CUDA is not available. It cannot be used for general-purpose memoization even if for_each_device is set to False.
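The caching rule described above (one cached result per argument combination, optionally per device) can be sketched in pure Python. This is a hypothetical illustration of the behavior, not Chainer's implementation; the hard-coded device id 0 stands in for the current CUDA device:

```python
import functools

def memoize_sketch(for_each_device=False):
    # Sketch of chainer.cuda.memoize's caching behavior: the cache key is
    # the positional-argument tuple, extended with the device id when
    # for_each_device is True (device 0 is a stand-in here).
    def decorator(f):
        memo = {}
        @functools.wraps(f)
        def wrapper(*args):
            device = 0 if for_each_device else None
            key = (device, args)
            if key not in memo:
                memo[key] = f(*args)
            return memo[key]
        return wrapper
    return decorator

calls = []

@memoize_sketch()
def build(name):
    calls.append(name)       # record every real (non-cached) invocation
    return 'kernel:' + name

build('add')
build('add')                 # second call is served from the cache
assert calls == ['add']
```

Chainer uses this mechanism to compile each kernel at most once per argument combination and device, which is why elementwise() and reduce() below are cheap to call repeatedly.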

chainer.cuda.clear_memo()[source]

Clears the memoized results for all functions decorated by memoize.

This function works like cupy.clear_memo() as a counterpart for chainer.cuda.memoize(). It can be used even if CUDA is not available. In such a case, this function does nothing.

chainer.cuda.elementwise(in_params, out_params, operation, name, **kwargs)[source]

Creates an elementwise kernel function.

This function uses memoize() to cache the kernel object, i.e. the resulting kernel object is cached for each argument combination and CUDA device.

The arguments are the same as those for cupy.ElementwiseKernel, except that the name argument is mandatory.
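A hedged usage sketch, runnable only where CuPy and a CUDA device are present (it is skipped otherwise). The parameter strings follow the cupy.ElementwiseKernel convention; the kernel name ('muladd' here) is the mandatory argument and serves as part of the cache key:

```python
# Build and invoke a small elementwise kernel: y[i] = a * x[i] + 1.
y = None
try:
    import cupy
    from chainer import cuda

    muladd = cuda.elementwise(
        'T x, T a',        # input parameter declarations
        'T y',             # output parameter declaration
        'y = a * x + 1',   # elementwise C expression
        'muladd')          # mandatory kernel name
    x = cupy.arange(5, dtype=cupy.float32)
    y = muladd(x, 2.0)     # output array is allocated and returned
except Exception:
    pass  # CuPy/Chainer or a CUDA device is unavailable
```

Because of the memoize() caching described above, calling cuda.elementwise with the same arguments again returns the already-compiled kernel instead of recompiling it.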

chainer.cuda.reduce(in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name, **kwargs)[source]

Creates a global reduction kernel function.

This function uses memoize() to cache the resulting kernel object, i.e. the resulting kernel object is cached for each argument combination and CUDA device.

The arguments are the same as those for cupy.ReductionKernel, except that the name argument is mandatory.
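A hedged sum-of-squares example, again skipped when CuPy or a CUDA device is unavailable. The argument order matches cupy.ReductionKernel, with the mandatory name last:

```python
# Build and invoke a reduction kernel computing sum(x[i] * x[i]).
y = None
try:
    import cupy
    from chainer import cuda

    sqsum = cuda.reduce(
        'T x',      # input params
        'T y',      # output params
        'x * x',    # map expression, applied to each element
        'a + b',    # reduce expression, combines mapped values
        'y = a',    # post-map expression, writes the final value
        '0',        # identity element of the reduction
        'sqsum')    # mandatory kernel name
    x = cupy.arange(4, dtype=cupy.float32)
    y = sqsum(x)    # 0 + 1 + 4 + 9
except Exception:
    pass  # CuPy/Chainer or a CUDA device is unavailable
```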

CPU/GPU generic code support

chainer.cuda.get_array_module(*args)[source]

Returns the appropriate array module (numpy or cupy) for the given arguments.

This is almost equivalent to cupy.get_array_module(). The only difference is that this function can be used even if CUDA is not available.

Parameters: args – Values to determine whether NumPy or CuPy should be used.
Returns: cupy or numpy, chosen based on the types of the arguments.
Return type: module
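This function is the standard way to write functions that work on both CPU and GPU arrays: fetch the module once, then call every array operation through it. A hedged sketch with a plain-NumPy fallback when Chainer is not installed:

```python
import numpy

try:
    from chainer import cuda
    get_array_module = cuda.get_array_module
except ImportError:
    def get_array_module(*args):
        # CPU-only fallback matching the documented behavior when CUDA
        # is unavailable: always return numpy.
        return numpy

def softplus(x):
    # Generic code: works on numpy.ndarray and cupy.ndarray alike,
    # because xp is whichever module matches the input array.
    xp = get_array_module(x)
    return xp.log1p(xp.exp(x))

y = softplus(numpy.array([0.0]))  # log(1 + e^0) = log 2
```

Passing a cupy.ndarray to softplus would run the same two lines on the GPU, with no branching in the caller's code.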

cuDNN support

chainer.cuda.set_max_workspace_size(size)[source]

Sets the workspace size for cuDNN.

See the cuDNN Library User Guide for details.

Parameters: size – The workspace size for cuDNN.

chainer.cuda.get_max_workspace_size()[source]

Gets the workspace size for cuDNN.

See the cuDNN Library User Guide for details.

Returns: The workspace size for cuDNN.
Return type: int