Standard Link implementations¶
Chainer provides many Link implementations in the chainer.links package.
Note
Some of the links are originally defined in the chainer.functions
namespace. They are still left in the namespace for backward compatibility,
though it is strongly recommended to use them via the chainer.links
package.
Learnable connections¶
Bias¶

class chainer.links.Bias(axis=1, shape=None)[source]¶
Broadcasted elementwise summation with learnable parameters.

Computes an elementwise summation as the bias() function does, except that its second input is a learnable bias parameter \(b\) held by the link.

Parameters:
 axis (int) – The first axis of the first input of the bias() function along which its second input is applied.
 shape (tuple of ints) – Shape of the learnable bias parameter. If None, this link does not have learnable parameters, so an explicit bias needs to be given as the second input of its __call__ method.

See also
See bias() for details.

Variables:
 b (Variable) – Bias parameter if shape is given. Otherwise, no attributes.
 axis (int) – The first axis of the first input of the bias() function along which its second input is applied.
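The broadcasting behavior can be sketched with plain NumPy. This is an illustrative sketch of the semantics described above, not the Chainer implementation: the bias is lined up with the input starting at axis and broadcast over the remaining dimensions.

```python
import numpy as np

def broadcast_bias(x, b, axis=1):
    # Reshape b so its dimensions line up with x starting at `axis`,
    # then rely on NumPy broadcasting for the elementwise sum.
    shape = (1,) * axis + b.shape + (1,) * (x.ndim - axis - b.ndim)
    return x + b.reshape(shape)

x = np.zeros((2, 3, 4, 5), dtype=np.float32)
b = np.arange(3, dtype=np.float32)      # applied along axis 1
y = broadcast_bias(x, b, axis=1)
```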
Bilinear¶

class chainer.links.Bilinear(left_size, right_size, out_size, nobias=False, initialW=None, initial_bias=None)[source]¶
Bilinear layer that performs tensor multiplication.

Bilinear is a primitive link that wraps the bilinear() function. It holds parameters W, V1, V2, and b corresponding to the arguments of bilinear().

Parameters:
 left_size (int) – Dimension of input vector \(e^1\) (\(J\))
 right_size (int) – Dimension of input vector \(e^2\) (\(K\))
 out_size (int) – Dimension of output vector \(y\) (\(L\))
 nobias (bool) – If True, parameters V1, V2, and b are omitted.
 initialW (3-D numpy array) – Initial value of \(W\). The shape of this argument must be (left_size, right_size, out_size). If None, \(W\) is initialized from a centered Gaussian distribution properly scaled according to the dimensions of inputs and outputs. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 initial_bias (tuple) – Initial values of \(V^1\), \(V^2\) and \(b\). The length of this argument must be 3. The elements of this tuple must have the shapes (left_size, out_size), (right_size, out_size), and (out_size,), respectively. If None, \(V^1\) and \(V^2\) are initialized from scaled centered Gaussian distributions and \(b\) is set to \(0\). May also be a tuple of callables that take numpy.ndarray or cupy.ndarray and edit their values.

See also
See chainer.functions.bilinear() for details.
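The computation performed by the wrapped bilinear() function can be sketched in NumPy for a single sample as \(y_l = \sum_{jk} e^1_j W_{jkl} e^2_k + \sum_j V^1_{jl} e^1_j + \sum_k V^2_{kl} e^2_k + b_l\). This is an illustrative sketch; the actual link operates on mini-batches.

```python
import numpy as np

J, K, L = 3, 4, 2                     # left_size, right_size, out_size
rng = np.random.RandomState(0)
W = rng.randn(J, K, L)
V1, V2, b = rng.randn(J, L), rng.randn(K, L), rng.randn(L)

e1, e2 = rng.randn(J), rng.randn(K)

# y_l = sum_{jk} e1_j W_{jkl} e2_k + sum_j V1_{jl} e1_j + sum_k V2_{kl} e2_k + b_l
y = np.einsum('j,jkl,k->l', e1, W, e2) + e1 @ V1 + e2 @ V2 + b
```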
Convolution2D¶

class chainer.links.Convolution2D(in_channels, out_channels, ksize, stride=1, pad=0, wscale=1, bias=0, nobias=False, use_cudnn=True, initialW=None, initial_bias=None, deterministic=False)[source]¶
Two-dimensional convolutional layer.

This link wraps the convolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:
 in_channels (int) – Number of channels of input arrays. If None, parameter initialization is deferred until the first forward pass, at which time the size is determined.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
 stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
 pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
 wscale (float) – Scaling factor of the initial weight.
 bias (float) – Initial bias value.
 nobias (bool) – If True, this link does not use the bias term.
 use_cudnn (bool) – If True, this link uses cuDNN if available.
 initialW (4-D array) – Initial weight value. If None, the weight is initialized using wscale. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 initial_bias (1-D array) – Initial bias value. If None, the bias is initialized using bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 deterministic (bool) – The output of this link can be nondeterministic when it uses cuDNN. If this option is True, it forces cuDNN to use a deterministic algorithm. This option is only available for cuDNN version >= v4.

See also
See chainer.functions.convolution_2d() for the definition of two-dimensional convolution.

Variables:
 in_channels (int) – Number of channels of input arrays.
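For reference, the spatial output size follows the standard convolution size formula. This is a sketch of the arithmetic, assuming cover_all=False:

```python
def conv_out_size(size, k, s=1, p=0):
    # Output size along one spatial axis: floor((size + 2p - k) / s) + 1
    return (size + 2 * p - k) // s + 1

same = conv_out_size(28, k=3, s=1, p=1)    # a 3x3 kernel with pad 1 preserves size
halved = conv_out_size(32, k=5, s=2, p=0)
```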
ConvolutionND¶

class chainer.links.ConvolutionND(ndim, in_channels, out_channels, ksize, stride=1, pad=0, initialW=None, initial_bias=None, use_cudnn=True, cover_all=False)[source]¶
N-dimensional convolution layer.

This link wraps the convolution_nd() function and holds the filter weight and bias vector as parameters.

Parameters:
 ndim (int) – Number of spatial dimensions.
 in_channels (int) – Number of channels of input arrays.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or tuple of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k, ..., k) are equivalent.
 stride (int or tuple of ints) – Stride of filter application. stride=s and stride=(s, s, ..., s) are equivalent.
 pad (int or tuple of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p, ..., p) are equivalent.
 initialW – Value used to initialize the filter weight. May be an initializer instance or another value that the init_weight() helper function can take.
 initial_bias – Value used to initialize the bias vector. May be an initializer instance or another value, except None, that the init_weight() helper function can take. If None is given, this link does not use the bias vector.
 use_cudnn (bool) – If True, this link uses cuDNN if available. See convolution_nd() for the exact conditions of cuDNN availability.
 cover_all (bool) – If True, all spatial locations are convoluted into some output pixels. It may make the output size larger. cover_all must be False if you want to use cuDNN.

See also
See convolution_nd() for the definition of N-dimensional convolution. See convolution_2d() for the definition of two-dimensional convolution.
Deconvolution2D¶

class chainer.links.Deconvolution2D(in_channels, out_channels, ksize, stride=1, pad=0, wscale=1, bias=0, nobias=False, outsize=None, use_cudnn=True, initialW=None, initial_bias=None, deterministic=False)[source]¶
Two-dimensional deconvolution function.

This link wraps the deconvolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:
 in_channels (int) – Number of channels of input arrays. If None, parameter initialization is deferred until the first forward pass, at which time the size is determined.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
 stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
 pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
 wscale (float) – Scaling factor of the initial weight.
 bias (float) – Initial bias value.
 nobias (bool) – If True, this function does not use the bias term.
 outsize (tuple) – Expected output size of the deconvolution operation. It should be a pair of height and width \((out_H, out_W)\). The default value is None, in which case the output size is estimated from the input size, stride, and pad.
 use_cudnn (bool) – If True, this function uses cuDNN if available.
 initialW (4-D array) – Initial weight value. If None, the weight is initialized using wscale. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 initial_bias (1-D array) – Initial bias value. If None, the bias is initialized using bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 deterministic (bool) – The output of this link can be nondeterministic when it uses cuDNN. If this option is True, it forces cuDNN to use a deterministic algorithm. This option is only available for cuDNN version >= v4.

The filter weight has four dimensions \((c_I, c_O, k_H, k_W)\), which indicate the number of input channels, the number of output channels, and the height and width of the kernels, respectively. The filter weight is initialized with i.i.d. Gaussian samples, each of which has zero mean and deviation \(\sqrt{1/(c_I k_H k_W)}\) by default. The deviation is scaled by wscale if specified.

The bias vector is of size \(c_O\). Its elements are initialized by the bias argument. If the nobias argument is set to True, this function does not hold the bias parameter.

See also
See chainer.functions.deconvolution_2d() for the definition of two-dimensional deconvolution.

Variables:
 in_channels (int) – Number of channels of input arrays.
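When outsize is None, the output size is estimated by inverting the convolution size formula. This is a sketch of the arithmetic:

```python
def deconv_out_size(size, k, s=1, p=0):
    # Inverse of the convolution output-size formula:
    # out = s * (size - 1) + k - 2 * p
    return s * (size - 1) + k - 2 * p

upsampled = deconv_out_size(14, k=5, s=2, p=0)
restored = deconv_out_size(28, k=3, s=1, p=1)   # inverts a size-preserving convolution
```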
DeconvolutionND¶

class chainer.links.DeconvolutionND(ndim, in_channels, out_channels, ksize, stride=1, pad=0, outsize=None, initialW=None, initial_bias=0, use_cudnn=True)[source]¶
N-dimensional deconvolution function.

This link wraps the deconvolution_nd() function and holds the filter weight and bias vector as its parameters.

Parameters:
 ndim (int) – Number of spatial dimensions.
 in_channels (int) – Number of channels of input arrays.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or tuple of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k, ..., k) are equivalent.
 stride (int or tuple of ints) – Stride of filter application. stride=s and stride=(s, s, ..., s) are equivalent.
 pad (int or tuple of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p, ..., p) are equivalent.
 outsize (tuple of ints) – Expected output size of the deconvolution operation. It should be a tuple of ints that represents the output size of each dimension. The default value is None, in which case the output size is estimated from the input size, stride, and pad.
 initialW – Value used to initialize the filter weight. May be an initializer instance or another value that the init_weight() helper function can take.
 initial_bias – Value used to initialize the bias vector. May be an initializer instance or another value, except None, that the init_weight() helper function can take. If None is supplied, this link does not use the bias vector.
 use_cudnn (bool) – If True, this link uses cuDNN if available.
DilatedConvolution2D¶

class chainer.links.DilatedConvolution2D(in_channels, out_channels, ksize, stride=1, pad=0, dilate=1, wscale=1, bias=0, nobias=False, use_cudnn=True, initialW=None, initial_bias=None)[source]¶
Two-dimensional dilated convolutional layer.

This link wraps the dilated_convolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:
 in_channels (int) – Number of channels of input arrays. If None, parameter initialization is deferred until the first forward pass, at which time the size is determined.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
 stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
 pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
 dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
 wscale (float) – Scaling factor of the initial weight.
 bias (float) – Initial bias value.
 nobias (bool) – If True, this link does not use the bias term.
 use_cudnn (bool) – If True, this link uses cuDNN if available.
 initialW (4-D array) – Initial weight value. If None, the weight is initialized using wscale. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 initial_bias (1-D array) – Initial bias value. If None, the bias is initialized using bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.

See also
See chainer.functions.dilated_convolution_2d() for the definition of two-dimensional dilated convolution.

Variables:
 in_channels (int) – Number of channels of input arrays.
EmbedID¶

class chainer.links.EmbedID(in_size, out_size, initialW=None, ignore_label=None)[source]¶
Efficient linear layer for one-hot input.

This is a link that wraps the embed_id() function. This link holds the ID (word) embedding matrix W as a parameter.

Parameters:
 in_size (int) – Number of different identifiers (a.k.a. vocabulary size).
 out_size (int) – Size of embedding vector.
 initialW (2-D array) – Initial weight value. If None, the matrix is initialized from the standard normal distribution. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 ignore_label (int or None) – If ignore_label is an int value, the i-th column of the return value is filled with 0.

Variables:
 W (Variable) – Embedding parameter matrix.
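The lookup itself amounts to indexing rows of W. The following NumPy sketch illustrates the idea; the ignore_label handling here is an illustrative assumption, not the exact Chainer behavior:

```python
import numpy as np

rng = np.random.RandomState(0)
in_size, out_size = 5, 3              # vocabulary size, embedding size
W = rng.randn(in_size, out_size)

def embed_id(ids, W, ignore_label=None):
    out = W[np.maximum(ids, 0)]       # plain row lookup (fancy indexing copies)
    if ignore_label is not None:
        out[ids == ignore_label] = 0  # zero out the ignored positions
    return out

ids = np.array([0, 3, -1])
y = embed_id(ids, W, ignore_label=-1)
```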
GRU¶

class chainer.links.GRU(n_units, n_inputs=None, init=None, inner_init=None, bias_init=0)[source]¶
Stateless Gated Recurrent Unit function (GRU).

The GRU function has six parameters \(W_r\), \(W_z\), \(W\), \(U_r\), \(U_z\), and \(U\). All of these parameters are \(n \times n\) matrices, where \(n\) is the dimension of the hidden vectors.

Given two inputs, a previous hidden vector \(h\) and an input vector \(x\), GRU returns the next hidden vector \(h'\) defined as

\[\begin{split}r &=& \sigma(W_r x + U_r h), \\ z &=& \sigma(W_z x + U_z h), \\ \bar{h} &=& \tanh(W x + U (r \odot h)), \\ h' &=& (1 - z) \odot h + z \odot \bar{h},\end{split}\]

where \(\sigma\) is the sigmoid function, and \(\odot\) is the elementwise product.

GRU does not hold the value of the hidden vector \(h\), so it is stateless. Use StatefulGRU as a stateful GRU.

See:
 On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [Cho+, SSST2014].
 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling [Chung+NIPS2014 DLWorkshop].
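The recurrence above can be sketched directly in NumPy for a single step and a single sample; the random parameter values are for illustration only:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wr, Wz, W, Ur, Uz, U):
    # One step of the GRU recurrence defined above.
    r = sigmoid(Wr @ x + Ur @ h)          # reset gate
    z = sigmoid(Wz @ x + Uz @ h)          # update gate
    h_bar = np.tanh(W @ x + U @ (r * h))  # candidate hidden state
    return (1 - z) * h + z * h_bar

n = 4
rng = np.random.RandomState(0)
Wr, Wz, W, Ur, Uz, U = (rng.randn(n, n) * 0.1 for _ in range(6))
x, h = rng.randn(n), np.zeros(n)
h_next = gru_step(x, h, Wr, Wz, W, Ur, Uz, U)
```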
Highway¶

class chainer.links.Highway(in_out_size, nobias=False, activate=<function relu>, init_Wh=None, init_Wt=None, init_bh=None, init_bt=-1)[source]¶
Highway module.

In a highway network, two gates are added to the ordinary nonlinear transformation (\(H(x) = activate(W_h x + b_h)\)). One gate is the transform gate \(T(x) = \sigma(W_t x + b_t)\), and the other is the carry gate \(C(x)\). For simplicity, the authors defined \(C = 1 - T\). The Highway module returns \(y\) defined as

\[y = activate(W_h x + b_h) \odot \sigma(W_t x + b_t) + x \odot (1 - \sigma(W_t x + b_t))\]

The output array has the same spatial size as the input. In order to satisfy this, \(W_h\) and \(W_t\) must be square matrices.

Parameters:
 in_out_size (int) – Dimension of input and output vectors.
 nobias (bool) – If True, this function does not use the bias.
 activate – Activation function of the plain array. \(tanh\) is also available.
 init_Wh (2-D array) – Initial weight value of the plain array. If None, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 init_bh (1-D array) – Initial bias value of the plain array. If None, the bias is initialized to the zero vector. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 init_Wt (2-D array) – Initial weight value of the transform array. If None, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 init_bt (1-D array) – Initial bias value of the transform array. The default value is the -1 vector. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value. A negative value is recommended by the authors of the paper (e.g. -1, -3, ...).

See:
 Highway Networks.
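The formula above can be sketched in NumPy for a single sample (an illustrative sketch with random weights, not the Chainer implementation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def highway(x, Wh, bh, Wt, bt, activate=np.tanh):
    h = activate(Wh @ x + bh)     # plain transformation H(x)
    t = sigmoid(Wt @ x + bt)      # transform gate T(x)
    return h * t + x * (1 - t)    # carry gate C(x) = 1 - T(x)

d = 3
rng = np.random.RandomState(0)
Wh, Wt = rng.randn(d, d), rng.randn(d, d)
bh, bt = np.zeros(d), -np.ones(d)  # negative transform bias, as the paper recommends
x = rng.randn(d)
y = highway(x, Wh, bh, Wt, bt)
```

With a very negative transform bias, the gate stays near 0 and the module simply carries its input through, which is why a negative initial bias is recommended.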
Inception¶

class chainer.links.Inception(in_channels, out1, proj3, out3, proj5, out5, proj_pool, conv_init=None, bias_init=None)[source]¶
Inception module of GoogLeNet.

It applies four different functions to the input array and concatenates their outputs along the channel dimension. Three of them are 2D convolutions of sizes 1x1, 3x3 and 5x5. The convolution paths of sizes 3x3 and 5x5 have 1x1 convolutions (called projections) ahead of them. The other path consists of a 1x1 convolution (projection) and 3x3 max pooling.

The output array has the same spatial size as the input. In order to satisfy this, the Inception module uses appropriate padding for each convolution and pooling.

See: Going Deeper with Convolutions.

Parameters:
 in_channels (int) – Number of channels of input arrays.
 out1 (int) – Output size of the 1x1 convolution path.
 proj3 (int) – Projection size of the 3x3 convolution path.
 out3 (int) – Output size of the 3x3 convolution path.
 proj5 (int) – Projection size of the 5x5 convolution path.
 out5 (int) – Output size of the 5x5 convolution path.
 proj_pool (int) – Projection size of the max pooling path.
 conv_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the convolution matrix weights. May be None to use the default initialization.
 bias_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the convolution bias weights. May be None to use the default initialization.
InceptionBN¶

class chainer.links.InceptionBN(in_channels, out1, proj3, out3, proj33, out33, pooltype, proj_pool=None, stride=1, conv_init=None, dtype=<class 'numpy.float32'>)[source]¶
Inception module of the new GoogLeNet with BatchNormalization.

This chain acts like Inception, except that InceptionBN uses BatchNormalization on top of each convolution, the 5x5 convolution path is replaced by two consecutive 3x3 convolution applications, and the pooling method is configurable.

See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

Parameters:
 in_channels (int) – Number of channels of input arrays.
 out1 (int) – Output size of the 1x1 convolution path.
 proj3 (int) – Projection size of the single 3x3 convolution path.
 out3 (int) – Output size of the single 3x3 convolution path.
 proj33 (int) – Projection size of the double 3x3 convolutions path.
 out33 (int) – Output size of the double 3x3 convolutions path.
 pooltype (str) – Pooling type. It must be either 'max' or 'avg'.
 proj_pool (bool) – If True, do projection in the pooling path.
 stride (int) – Stride parameter of the last convolution of each path.
 conv_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the convolution matrix weights. May be None to use the default initialization.
 dtype (numpy.dtype) – Type to use in BatchNormalization.

Variables:
 train (bool) – If True, batch normalization layers are used in training mode. If False, they are used in testing mode.
Linear¶

class chainer.links.Linear(in_size, out_size, wscale=1, bias=0, nobias=False, initialW=None, initial_bias=None)[source]¶
Linear layer (a.k.a. fully-connected layer).

This is a link that wraps the linear() function, and holds a weight matrix W and optionally a bias vector b as parameters.

The weight matrix W is initialized with i.i.d. Gaussian samples, each of which has zero mean and deviation \(\sqrt{1/\text{in_size}}\). The bias vector b is of size out_size. Each element is initialized with the bias value. If the nobias argument is set to True, this link does not hold a bias vector.

Parameters:
 in_size (int) – Dimension of input vectors. If None, parameter initialization is deferred until the first forward pass, at which time the size is determined.
 out_size (int) – Dimension of output vectors.
 wscale (float) – Scaling factor of the weight matrix.
 bias (float) – Initial bias value.
 nobias (bool) – If True, this function does not use the bias.
 initialW (2-D array) – Initial weight value. If None, the weight is initialized using wscale. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
 initial_bias (1-D array) – Initial bias value. If None, the bias is initialized using bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.

Variables:
 in_size (int) – Dimension of input vectors.
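The affine map computed by the wrapped linear() function, including the initialization described above, can be sketched in NumPy (an illustrative sketch, not the Chainer implementation):

```python
import numpy as np

in_size, out_size, batch = 4, 3, 2
rng = np.random.RandomState(0)
# W drawn from N(0, 1/in_size), matching the description above; b starts at 0.
W = rng.randn(out_size, in_size) * np.sqrt(1.0 / in_size)
b = np.zeros(out_size)

x = rng.randn(batch, in_size)
y = x @ W.T + b                     # y = xW^T + b for each row of the mini-batch
```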
LSTM¶

class chainer.links.LSTM(in_size, out_size, **kwargs)[source]¶
Fully-connected LSTM layer.

This is a fully-connected LSTM layer as a chain. Unlike the lstm() function, which is defined as a stateless activation function, this chain holds upward and lateral connections as child links. It also maintains states, including the cell state and the output at the previous time step. Therefore, it can be used as a stateful LSTM.

This link supports variable-length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. The mini-batch size of c and h is determined as that of the first input x. When the mini-batch size of the i-th input is smaller than that of the previous input, this link only updates c[0:len(x)] and h[0:len(x)] and does not change the rest of c and h. So, please sort input sequences in descending order of length before applying the function.

Parameters:
 in_size (int) – Dimension of input vectors. If None, parameter initialization is deferred until the first forward pass, at which time the size is determined.
 out_size (int) – Dimensionality of output vectors.
 lateral_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the lateral connections. May be None to use the default initialization.
 upward_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the upward connections. May be None to use the default initialization.
 bias_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the biases of the cell input, input gate, and output gate of the upward connection. May be a scalar, in which case the bias is initialized by this value. May be None to use the default initialization.
 forget_bias_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the bias of the forget gate of the upward connection. May be a scalar, in which case the bias is initialized by this value. May be None to use the default initialization.

Variables:
 in_size (int) – Dimension of input vectors.
MLPConvolution2D¶

class chainer.links.MLPConvolution2D(in_channels, out_channels, ksize, stride=1, pad=0, wscale=1, activation=<function relu>, use_cudnn=True, conv_init=None, bias_init=None)[source]¶
Two-dimensional MLP convolution layer of Network in Network.

This is an "mlpconv" layer from the Network in Network paper. This layer is a two-dimensional convolution layer followed by 1x1 convolution layers, with activation functions interleaved between them.

Note that it does not apply the activation function to the output of the last 1x1 convolution layer.

Parameters:
 in_channels (int) – Number of channels of input arrays.
 out_channels (tuple of ints) – Tuple of numbers of channels. The i-th integer indicates the number of filters of the i-th convolution.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels) of the first convolution layer. ksize=k and ksize=(k, k) are equivalent.
 stride (int or pair of ints) – Stride of filter applications at the first convolution layer. stride=s and stride=(s, s) are equivalent.
 pad (int or pair of ints) – Spatial padding width for input arrays at the first convolution layer. pad=p and pad=(p, p) are equivalent.
 activation (function) – Activation function for internal hidden units. Note that this function is not applied to the output of this link.
 use_cudnn (bool) – If True, this link uses cuDNN if available.
 conv_init – An initializer of the weight matrices passed to the convolution layers.
 bias_init – An initializer of the bias vectors passed to the convolution layers.

See: Network in Network (http://arxiv.org/abs/1312.4400v3).

Variables:
 activation (function) – Activation function.
Scale¶

class chainer.links.Scale(axis=1, W_shape=None, bias_term=False, bias_shape=None)[source]¶
Broadcasted elementwise product with learnable parameters.

Computes an elementwise product as the scale() function does, except that its second input is a learnable weight parameter \(W\) held by the link.

Parameters:
 axis (int) – The first axis of the first input of the scale() function along which its second input is applied.
 W_shape (tuple of ints) – Shape of the learnable weight parameter. If None, this link does not have a learnable weight parameter, so an explicit weight needs to be given as the second input of its __call__ method.
 bias_term (bool) – Whether to also learn a bias (equivalent to a Scale link + a Bias link).
 bias_shape (tuple of ints) – Shape of the learnable bias. If W_shape is None, this should be given to determine the shape. Otherwise, the bias has the same shape W_shape as the weight parameter, and bias_shape is ignored.

See also
See scale() for details.

Variables:
 axis (int) – The first axis of the first input of the scale() function along which its second input is applied.
StatefulGRU¶

class chainer.links.StatefulGRU(in_size, out_size, init=None, inner_init=None, bias_init=0)[source]¶
Stateful Gated Recurrent Unit function (GRU).

The stateful GRU function has six parameters \(W_r\), \(W_z\), \(W\), \(U_r\), \(U_z\), and \(U\). All of these parameters are \(n \times n\) matrices, where \(n\) is the dimension of the hidden vectors.

Given an input vector \(x\), the stateful GRU returns the next hidden vector \(h'\) defined as

\[\begin{split}r &=& \sigma(W_r x + U_r h), \\ z &=& \sigma(W_z x + U_z h), \\ \bar{h} &=& \tanh(W x + U (r \odot h)), \\ h' &=& (1 - z) \odot h + z \odot \bar{h},\end{split}\]

where \(h\) is the current hidden vector.

As the name indicates, StatefulGRU is stateful, meaning that it also holds the next hidden vector \(h'\) as a state. Use GRU as a stateless version of GRU.

Parameters:
 in_size (int) – Dimension of input vector \(x\).
 out_size (int) – Dimension of hidden vector \(h\).
 init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the GRU's input units (\(W\)). May be None to use the default initialization.
 inner_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the GRU's inner recurrent units (\(U\)). May be None to use the default initialization.
 bias_init – A callable or scalar used to initialize the bias values for both the GRU's inner and input units. May be None to use the default initialization.

Variables:
 h (Variable) – Hidden vector that indicates the state of StatefulGRU.

See also
GRU
StatefulPeepholeLSTM¶

class chainer.links.StatefulPeepholeLSTM(in_size, out_size)[source]¶
Fully-connected LSTM layer with peephole connections.

This is a fully-connected LSTM layer with peephole connections as a chain. Unlike the LSTM link, this chain holds peep_i, peep_f and peep_o as child links besides upward and lateral.

Given an input vector \(x\), the peephole LSTM returns the next hidden vector \(h'\) defined as

\[\begin{split}a &=& \tanh(upward x + lateral h), \\ i &=& \sigma(upward x + lateral h + peep_i c), \\ f &=& \sigma(upward x + lateral h + peep_f c), \\ c' &=& a \odot i + f \odot c, \\ o &=& \sigma(upward x + lateral h + peep_o c'), \\ h' &=& o \odot \tanh(c'),\end{split}\]

where \(\sigma\) is the sigmoid function, \(\odot\) is the elementwise product, \(c\) is the current cell state, \(c'\) is the next cell state and \(h\) is the current hidden vector.

Parameters:
 in_size (int) – Dimension of input vectors.
 out_size (int) – Dimension of hidden vectors.

Variables:
 upward (Linear) – Linear layer of upward connections.
 lateral (Linear) – Linear layer of lateral connections.
 peep_i (Linear) – Linear layer of peephole connections to the input gate.
 peep_f (Linear) – Linear layer of peephole connections to the forget gate.
 peep_o (Linear) – Linear layer of peephole connections to the output gate.
 c (Variable) – Cell states of LSTM units.
 h (Variable) – Output at the current time step.
StatelessLSTM¶

class chainer.links.StatelessLSTM(in_size, out_size, lateral_init=None, upward_init=None, bias_init=0, forget_bias_init=0)[source]¶
Stateless LSTM layer.

This is a fully-connected LSTM layer as a chain. Unlike the lstm() function, this chain holds upward and lateral connections as child links. This link does not keep cell and hidden states.

Variables:
 upward (chainer.links.Linear) – Linear layer of upward connections.
 lateral (chainer.links.Linear) – Linear layer of lateral connections.
Activation/loss/normalization functions with parameters¶
BatchNormalization¶

class chainer.links.BatchNormalization(size, decay=0.9, eps=2e-05, dtype=<class 'numpy.float32'>, use_gamma=True, use_beta=True, initial_gamma=None, initial_beta=None, use_cudnn=True)[source]¶
Batch normalization layer on outputs of linear or convolution functions.

This link wraps the batch_normalization() and fixed_batch_normalization() functions.

It runs in three modes: training mode, fine-tuning mode, and testing mode.

In training mode, it normalizes the input by batch statistics. It also maintains approximated population statistics by moving averages, which can be used for instant evaluation in testing mode.

In fine-tuning mode, it accumulates the input to compute population statistics. In order to correctly compute the population statistics, a user must use this mode to feed mini-batches running through the whole training dataset.

In testing mode, it uses the precomputed population statistics to normalize the input variable. The population statistics are approximated if computed in training mode, or accurate if correctly computed in fine-tuning mode.

Parameters:
 size (int or tuple of ints) – Size (or shape) of channel dimensions.
 decay (float) – Decay rate of the moving average. It is used during training.
 eps (float) – Epsilon value for numerical stability.
 dtype (numpy.dtype) – Type to use in computing.
 use_gamma (bool) – If True, use the scaling parameter. Otherwise, a fixed scale of 1 is used, which has no effect.
 use_beta (bool) – If True, use the shifting parameter. Otherwise, a fixed shift of 0 is used, which has no effect.
 use_cudnn (bool) – If True, this link uses cuDNN if available.

See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Variables:
 gamma (Variable) – Scaling parameter.
 beta (Variable) – Shifting parameter.
 avg_mean (Variable) – Population mean.
 avg_var (Variable) – Population variance.
 N (int) – Count of batches given for fine-tuning.
 decay (float) – Decay rate of the moving average. It is used during training.
 eps (float) – Epsilon value for numerical stability. This value is added to the batch variances.
 use_cudnn (bool) – If True, this link uses cuDNN if available.
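Training mode can be sketched in NumPy: normalize by batch statistics, then update the moving averages that testing mode uses later. This is an illustrative sketch of the idea, not the Chainer implementation:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, avg_mean, avg_var, decay=0.9, eps=2e-5):
    # Normalize the batch by its own statistics ...
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # ... and maintain moving averages of the population statistics.
    avg_mean = decay * avg_mean + (1 - decay) * mean
    avg_var = decay * avg_var + (1 - decay) * var
    return gamma * x_hat + beta, avg_mean, avg_var

rng = np.random.RandomState(0)
x = rng.randn(8, 3) * 5 + 2
gamma, beta = np.ones(3), np.zeros(3)
y, m, v = batch_norm_train(x, gamma, beta, np.zeros(3), np.ones(3))
```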
BinaryHierarchicalSoftmax¶

class chainer.links.BinaryHierarchicalSoftmax(in_size, tree)[source]¶
Hierarchical softmax layer over a binary tree.

In natural language applications, the vocabulary size is too large to use the softmax loss. Instead, hierarchical softmax uses a product of sigmoid functions. On average it costs only \(O(\log(n))\) time, where \(n\) is the vocabulary size.

First, a user needs to prepare a binary tree in which each leaf corresponds to a word in the vocabulary. When a word \(x\) is given, exactly one path from the root of the tree to the leaf of the word exists. Let \(\mbox{path}(x) = ((e_1, b_1), \dots, (e_m, b_m))\) be the path of \(x\), where \(e_i\) is the index of the \(i\)-th internal node, and \(b_i \in \{-1, 1\}\) indicates the direction to move at the \(i\)-th internal node (-1 is left, and 1 is right). Then, the probability of \(x\) is given as below:

\[\begin{split}P(x) &= \prod_{(e_i, b_i) \in \mbox{path}(x)}P(b_i | e_i) \\ &= \prod_{(e_i, b_i) \in \mbox{path}(x)}\sigma(b_i x^\top w_{e_i}),\end{split}\]

where \(\sigma(\cdot)\) is a sigmoid function, and \(w\) is a weight matrix.

This function costs \(O(\log(n))\) time, as the average length of paths is \(O(\log(n))\), and \(O(n)\) memory, as the number of internal nodes equals \(n - 1\).

Parameters:
 in_size (int) – Dimension of input vectors.
 tree – A binary tree made with tuples like ((1, 2), 3).

Variables:
 W (Variable) – Weight parameter matrix.

See: Hierarchical Probabilistic Neural Network Language Model [Morin+, AISTAT2005].

static create_huffman_tree(word_counts)[source]¶
Makes a Huffman tree from a dictionary containing word counts.

This method creates a binary Huffman tree that is required for BinaryHierarchicalSoftmax. For example, {0: 8, 1: 5, 2: 6, 3: 4} is converted to ((3, 1), (2, 0)).

Parameters:
 word_counts (dict of int keys and int or float values) – Dictionary representing counts of words.

Returns:
 Binary Huffman tree with tuples and keys of word_counts.
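The path-probability formula above can be sketched in NumPy. The tree and its paths here are a hand-made example for illustration; note that the leaf probabilities always sum to 1 because each internal node splits its mass with \(\sigma(a) + \sigma(-a) = 1\):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def word_probability(x, path, w):
    # path: sequence of (internal node index e_i, direction b_i in {-1, +1})
    p = 1.0
    for e, b in path:
        p *= sigmoid(b * (x @ w[e]))
    return p

d = 3
rng = np.random.RandomState(0)
w = rng.randn(2, d)               # one weight vector per internal node
x = rng.randn(d)

# Tree ((0, 1), 2): word 2 goes right at the root (node 0);
# words 0 and 1 go left, then branch at the second node (node 1).
paths = {0: [(0, -1), (1, -1)], 1: [(0, -1), (1, 1)], 2: [(0, 1)]}
probs = {word: word_probability(x, path, w) for word, path in paths.items()}
```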
BlackOut¶

class chainer.links.BlackOut(in_size, counts, sample_size)[source]¶
BlackOut loss layer.

See also
black_out() for more detail.

Variables:
 W (Variable) – Weight parameter matrix.
CRF1d¶
PReLU¶

class chainer.links.PReLU(shape=(), init=0.25)[source]¶
Parametric ReLU function as a link.

Parameters:
 shape (tuple of ints) – Shape of the parameter array.
 init (float) – Initial parameter value.

See the paper for details: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Variables:
 W (Variable) – Coefficient of parametric ReLU.
Maxout¶

class chainer.links.Maxout(in_size, out_size, pool_size, wscale=1, initialW=None, initial_bias=0)[source]¶
Fully-connected maxout layer.

Let M, P and N be the input dimension, the pool size, and the output dimension, respectively. For an input vector \(x\) of size M, it computes

\[Y_{i} = \mathrm{max}_{j} (W_{ij\cdot}x + b_{ij}).\]

Here \(W\) is a weight tensor of shape (M, P, N), \(b\) an optional bias vector of shape (M, P), and \(W_{ij\cdot}\) is a sub-vector extracted from \(W\) by fixing its first and second dimensions to \(i\) and \(j\), respectively. The mini-batch dimension is omitted in the above equation.

As for the actual implementation, this chain has a Linear link with a (M * P, N) weight matrix and an optional M * P dimensional bias vector.

Parameters:
 in_size (int) – Dimension of input vectors.
 out_size (int) – Dimension of output vectors.
 pool_size (int) – Number of channels.
 wscale (float) – Scaling factor of the weight matrix.
 initialW (3-D array or None) – Initial weight value. If None, this function uses wscale to initialize.
 initial_bias (2-D array, float or None) – Initial bias value. If it is a float, the initial bias is filled with this value. If None, the bias is omitted.

Variables:
 linear (Link) – The Linear link that performs the affine transformation.

See also
Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout Networks. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1319-1327). URL
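Following the implementation note above, the pooling step can be sketched in NumPy: compute an affine map with pool_size candidates per output unit, then keep the maximum of each group. The exact weight layout of the internal Linear link is an assumption made here for illustration:

```python
import numpy as np

M, P, N = 4, 3, 2                 # in_size, pool_size, out_size
rng = np.random.RandomState(0)
W = rng.randn(N * P, M)           # assumed layout of the internal Linear weight
b = rng.randn(N * P)

x = rng.randn(M)
pre = (W @ x + b).reshape(N, P)   # N groups of P candidate activations
y = pre.max(axis=1)               # maxout: keep the largest candidate per group
```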
NegativeSampling¶

class chainer.links.NegativeSampling(in_size, counts, sample_size, power=0.75)[source]¶
Negative sampling loss layer.

This link wraps the negative_sampling() function. It holds the weight matrix as a parameter. It also builds a sampler internally, given a list of word counts.

See also
negative_sampling() for more detail.

Variables:
 W (Variable) – Weight parameter matrix.
Machine learning models¶
Classifier¶

class
chainer.links.
Classifier
(predictor, lossfun=<function softmax_cross_entropy>, accfun=<function accuracy>)[ソース]¶ A simple classifier model.
This is an example of chain that wraps another chain. It computes the loss and accuracy based on a given input/label pair.
パラメータ: 変数:  predictor (Link) – Predictor network.
 lossfun (function) – Loss function.
 accfun (function) – Function that computes accuracy.
 y (Variable) – Prediction for the last minibatch.
 loss (Variable) – Loss value for the last minibatch.
 accuracy (Variable) – Accuracy for the last minibatch.
 compute_accuracy (bool) – If
True
, compute accuracy on the forward computation. The default value isTrue
.