Optimizer

class chainer.Optimizer[source]

Base class of all numerical optimizers.

This class provides basic features for all optimization methods. It optimizes parameters of a target link. The target link is registered via the setup() method, and then the update() method updates its parameters based on a given loss function.

Each optimizer implementation must be defined as a child class of Optimizer. It must override the update() method. An optimizer can use internal states, each of which is tied to one of the parameters. A state is a dictionary of serializable values (typically arrays of the same size as the corresponding parameters). In order to use state dictionaries, the optimizer must override the init_state() method (or its CPU/GPU versions, init_state_cpu() and init_state_gpu()).

If the optimizer is based on single gradient computation (like most first-order methods), then it should inherit GradientMethod, which adds some features dedicated to first-order methods.

An Optimizer instance also supports hook functions. A hook function is registered by the add_hook() method. Each hook function is called in registration order before the actual parameter update.
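
For illustration, a minimal usage sketch of this workflow is shown below (the toy model, data, and the SGD settings are made up for this example; any Link and loss function would do):

    import numpy as np

    import chainer
    import chainer.functions as F
    import chainer.links as L
    from chainer import optimizers

    # Toy model and data, for illustration only.
    model = L.Linear(3, 2)
    optimizer = optimizers.SGD(lr=0.01)
    optimizer.setup(model)          # register the target link

    x = chainer.Variable(np.random.rand(5, 3).astype(np.float32))
    t = chainer.Variable(np.array([0, 1, 0, 1, 1], dtype=np.int32))

    def lossfun(x, t):
        return F.softmax_cross_entropy(model(x), t)

    # update() zeroes the gradients, calls lossfun, runs backward(), and
    # then applies the update rule to every parameter of the target link.
    optimizer.update(lossfun, x, t)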

Variables:
  • target – Target link object. It is set by the setup() method.
  • t – Number of update steps. It must be incremented by the update() method.
  • epoch – Current epoch. It is incremented by the new_epoch() method.
accumulate_grads(grads)[source]

Accumulates gradients from other sources.

This method just adds the given gradient arrays to the gradients that this optimizer holds. It is typically used in data-parallel optimization, where gradients for different shards are computed in parallel and aggregated by this method. This method correctly handles multiple GPU devices.

Parameters: grads (Iterable) – Iterable of gradient arrays to be accumulated.

Deprecated since version v1.5: Use the chainer.Link.addgrads() method of the target link instead.

add_hook(hook, name=None)[source]

Registers a hook function.

The hook function is typically called right after the gradient computation, though the exact timing depends on the optimization method.

Parameters:
  • hook (function) – Hook function. It accepts the optimizer object.
  • name (str) – Name of the registration. If omitted, hook.name is used by default.
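
For example, a user-defined hook can be any callable that accepts the optimizer (a sketch; the print_step function is made up and reuses the optimizer from the sketch above):

    # Hypothetical hook: it just reports the current step counter.
    def print_step(opt):
        print('step', opt.t)

    optimizer.add_hook(print_step, name='print_step')
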
call_hooks()[source]

Invokes hook functions in registration order.

clip_grads(maxnorm)[source]

Clips the norm of whole gradients up to the threshold.

Parameters: maxnorm (float) – Threshold of the gradient L2 norm.

Deprecated since version v1.5: Use the GradientClipping hook function instead.

compute_grads_norm()[source]

Computes the norm of whole gradients.

Returns: L2 norm of whole gradients, i.e. the square root of the sum of squares of all gradient elements.
Return type: float

Warning

This method returns a CPU-computed value, which means that it synchronizes the CPU and GPU if at least one of the gradients resides on the GPU.

Deprecated since version v1.5.

init_state(param, state)[source]

Initializes the optimizer state corresponding to the parameter.

This method should add the needed items to the state dictionary. Each optimizer implementation that uses its own states should override this method or its CPU/GPU dedicated versions (init_state_cpu() and init_state_gpu()).

Parameters:
  • param (Variable) – Parameter variable.
  • state (dict) – State dictionary.
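
For illustration, a custom optimizer might allocate a per-parameter buffer like this (a sketch; the class name and the state key 'v' are made up, and the update_one_* methods are omitted):

    from chainer import cuda, optimizer

    class MyMomentumSGD(optimizer.GradientMethod):
        # Hypothetical optimizer keeping one momentum buffer per parameter.
        def init_state(self, param, state):
            xp = cuda.get_array_module(param.data)
            state['v'] = xp.zeros_like(param.data)
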
init_state_cpu(param, state)[source]

Initializes the optimizer state on CPU.

This method is called from init_state() by default.

Parameters:
  • param (Variable) – Parameter variable. Its data array is of type numpy.ndarray.
  • state (dict) – State dictionary.

See also

init_state()

init_state_gpu(param, state)[source]

Initializes the optimizer state on GPU.

This method is called from init_state() by default.

Parameters:
  • param (Variable) – Parameter variable. Its data array is of type cupy.ndarray.
  • state (dict) – State dictionary.

See also

init_state()

new_epoch()[source]

Starts a new epoch.

This method increments the epoch count. Note that if the optimizer depends on the epoch count, then the user should call this method at the beginning of each epoch.
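
Continuing the toy sketch above, a training loop might call it like this:

    for epoch in range(20):
        optimizer.new_epoch()            # optimizer.epoch is incremented here
        optimizer.update(lossfun, x, t)  # one (toy) update per epoch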

prepare()[source]

Prepares for an update.

This method initializes missing optimizer states (e.g. for newly added parameters after setup), and copies arrays in each state dictionary to CPU or GPU according to the corresponding parameter array.

remove_hook(name)[source]

Removes a hook function.

Parameters: name (str) – Registered name of the hook function to remove.
serialize(serializer)[source]

Serializes or deserializes the optimizer.

It only saves or loads the following things:

  • Optimizer states
  • Global states (t and epoch)

It does not save or load the parameters of the target link. They should be saved or loaded separately.

Parameters: serializer (AbstractSerializer) – Serializer or deserializer object.
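
In practice the optimizer is usually saved and loaded through a high-level serializer; a sketch using the NPZ serializers (reusing the model and optimizer from the sketch above) could look like this:

    from chainer import serializers

    # The optimizer file holds the optimizer states plus t and epoch;
    # the target link's parameters are saved separately.
    serializers.save_npz('optimizer.npz', optimizer)
    serializers.save_npz('model.npz', model)

    serializers.load_npz('model.npz', model)
    serializers.load_npz('optimizer.npz', optimizer)
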
setup(link)[source]

Sets a target link and initializes the optimizer states.

The given link is set to the target attribute. It also prepares the optimizer state dictionaries corresponding to all parameters in the link hierarchy. The existing states are discarded.

Parameters: link (Link) – Target link object.
update(lossfun=None, *args, **kwds)[source]

Updates the parameters and optimizer states.

This method updates the parameters of the target link and the corresponding optimizer states. The behavior of this method differs depending on whether lossfun is given or not.

If lossfun is given, then this method initializes the gradients to zero, calls it with the given extra arguments, and calls the backward() method of its output to compute the gradients. The implementation might call lossfun more than once.

If lossfun is not given, then this method assumes that the gradients of all parameters are already computed. An implementation that requires multiple gradient computations might raise an error in this case.

In both cases, this method invokes the update procedure for all parameters.

Parameters:
  • lossfun (function) – Loss function. It accepts arbitrary arguments and returns one Variable object that represents the loss (or objective) value. This argument can be omitted for single gradient-based methods. In this case, this method assumes that the gradient arrays are already computed.
  • args, kwds – Arguments for the loss function.
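
The two calling styles could look like this (a sketch, reusing the toy model, lossfun, x and t from the example above):

    # Style 1: pass the loss function; gradients are zeroed, computed and
    # applied inside update().
    optimizer.update(lossfun, x, t)

    # Style 2: compute the gradients yourself, then call update() with no
    # arguments so that only the update rule is applied.
    model.cleargrads()
    loss = lossfun(x, t)
    loss.backward()
    optimizer.update()
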
weight_decay(decay)[source]

Applies weight decay to the parameter/gradient pairs.

Parameters: decay (float) – Coefficient of weight decay.

Deprecated since version v1.5: Use the WeightDecay hook function instead.

zero_grads()[source]

Fills all gradient arrays with zeros.

Deprecated since version v1.5: Use the chainer.Link.cleargrads() method of the target link instead.

class chainer.GradientMethod[source]

Base class of all single gradient-based optimizers.

This is an extension of the Optimizer class. Typical gradient methods, which only require the gradient at the current parameter vector to perform an update, can be implemented as its child classes.

An implementation of a gradient method must override the update_one() method or its CPU/GPU versions (update_one_cpu() and update_one_gpu()); if it uses per-parameter states, it should also override init_state().

Note

It is recommended to call use_cleargrads() after creating a GradientMethod object for efficiency.

update(lossfun=None, *args, **kwds)[source]

Updates parameters based on a loss function or computed gradients.

This method runs in two ways.

  • If lossfun is given, then it is used as a loss function to compute the gradients.
  • Otherwise, this method assumes that the gradients are already computed.

In both cases, the computed gradients are used to update parameters. The actual update routines are defined by the update_one() method (or its CPU/GPU versions, update_one_cpu() and update_one_gpu()).

update_one(param, state)[source]

Updates a parameter based on the corresponding gradient and state.

This method calls the appropriate one of update_one_cpu() and update_one_gpu().

Parameters:
  • param (Variable) – Parameter variable.
  • state (dict) – State dictionary.
update_one_cpu(param, state)[source]

Updates a parameter on CPU.

Parameters:
  • param (Variable) – Parameter variable.
  • state (dict) – State dictionary.
update_one_gpu(param, state)[source]

Updates a parameter on GPU.

Parameters:
  • param (Variable) – Parameter variable.
  • state (dict) – State dictionary.
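
Putting update_one_cpu() and update_one_gpu() together, a minimal plain-SGD-style gradient method could be sketched as follows (the class name and the lr attribute are illustrative; the GPU branch assumes CuPy is available and uses cuda.elementwise):

    from chainer import cuda, optimizer

    class PlainSGD(optimizer.GradientMethod):
        # Hypothetical minimal method: param <- param - lr * grad.
        def __init__(self, lr=0.01):
            self.lr = lr

        def update_one_cpu(self, param, state):
            param.data -= self.lr * param.grad

        def update_one_gpu(self, param, state):
            cuda.elementwise(
                'T grad, T lr', 'T param',
                'param -= lr * grad',
                'plain_sgd')(param.grad, self.lr, param.data)
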
use_cleargrads(use=True)[source]

Enables or disables use of cleargrads() in update().

Parameters: use (bool) – If True, cleargrads() is used in update(); if False, zerograds() is used instead.

Note

Note that update() calls zerograds() by default for backward compatibility. It is recommended to call this method before the first call of update(), because cleargrads() is more efficient than zerograds().
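
A typical call sequence (a sketch, reusing the model from the example above):

    optimizer = optimizers.SGD(lr=0.01)
    optimizer.use_cleargrads()   # opt in to the more efficient cleargrads()
    optimizer.setup(model)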

Hook functions

class chainer.optimizer.WeightDecay(rate)[source]

Optimizer hook function for weight decay regularization.

This hook function adds a scaled parameter to the corresponding gradient. It can be used as a regularization.

Parameters: rate (float) – Coefficient for the weight decay.
Variables: rate (float) – Coefficient for the weight decay.
class chainer.optimizer.Lasso(rate)[source]

Optimizer hook function for Lasso regularization.

This hook function adds the scaled sign of each parameter to the corresponding gradient. It can be used as a regularization.

Parameters: rate (float) – Coefficient for the weight decay.
Variables: rate (float) – Coefficient for the weight decay.
class chainer.optimizer.GradientClipping(threshold)[source]

Optimizer hook function for gradient clipping.

This hook function scales all gradient arrays to fit the defined L2 norm threshold.

Parameters: threshold (float) – L2 norm threshold.
Variables: threshold (float) – L2 norm threshold of the gradients.
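
The built-in hooks above are registered through add_hook(); for instance (a sketch, reusing the model from the earlier example):

    from chainer import optimizer as optimizer_module
    from chainer import optimizers

    opt = optimizers.SGD(lr=0.01)
    opt.setup(model)
    # Hooks run in registration order after the gradients are computed.
    opt.add_hook(optimizer_module.WeightDecay(0.0005))
    opt.add_hook(optimizer_module.GradientClipping(10.0))
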
class chainer.optimizer.GradientNoise(eta, noise_func=<function exponential_decay_noise>)[source]

Optimizer hook function for adding gradient noise.

This hook function simply adds noise generated by the noise_func to the gradient. By default it adds time-dependent annealed Gaussian noise to the gradient at every training step:

\[g_t \leftarrow g_t + N(0, \sigma_t^2)\]

where

\[\sigma_t^2 = \frac{\eta}{(1+t)^\gamma}\]

with \(\eta\) selected from {0.01, 0.3, 1.0} and \(\gamma = 0.55\).
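
A noise function in this spirit could be sketched as below, assuming it receives the array module, the gradient shape and dtype, the hook object and the optimizer (this mirrors the default exponential_decay_noise only for illustration):

    import numpy

    def annealed_gaussian_noise(xp, shape, dtype, hook, opt):
        # sigma_t^2 = eta / (1 + t)^0.55, following the formula above.
        std = numpy.sqrt(hook.eta / numpy.power(1 + opt.t, 0.55))
        return xp.random.normal(0, std, shape).astype(dtype)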

Parameters: