7. Recommendation Example
[Diagram: user ACTIONs (▶ Play, ✔ Read, ☆ Like) on SOCIAL content (entertainment news, sports news) are mapped to INTERESTs A and B.]
■ Learning what close friends have recently been interested in lets users broaden their own interests
[Diagram: SANTOS (Social-Activity NeTwork Optimization System) — a feedback loop over the social platform (PF), friends, games, communication, impressions, and clicks, with friendship edges weighted by Familiarities:
⁃ Provide opportunities to meet and grow close to users with matching interests (friend recommendation, communication recommendation, etc.) → the user's circle of close friends grows.
⁃ Provide opportunities to discover the games close friends are enjoying (game discovery along the close-friend axis) → users encounter appealing games and can enjoy multiple games together with close companions.]
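To make the loop concrete, here is a toy scoring sketch in Python. This is not the actual SANTOS algorithm, whose details are not given here; the data and the familiarity weights are hypothetical. It simply illustrates the "close-friend axis": rank candidate games by the familiarity-weighted activity of close friends.

```python
# Toy illustration of close-friend-based game recommendation.
# All data and the scoring rule are hypothetical, not the SANTOS algorithm.
from collections import defaultdict

# familiarity[u][v]: how close user u is to user v (e.g., from communication logs)
familiarity = {"alice": {"bob": 0.9, "carol": 0.4}}
# plays[v]: set of games that user v currently plays
plays = {"bob": {"game_A", "game_B"}, "carol": {"game_B", "game_C"}}

def recommend(user, k=2):
    """Rank games by the total familiarity of close friends who play them."""
    scores = defaultdict(float)
    for friend, weight in familiarity[user].items():
        for game in plays.get(friend, set()):
            scores[game] += weight
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice"))  # ['game_B', 'game_A'] — game_B is boosted by two friends
```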
15. The Success of Deep Learning in Image Recognition
■ Deep learning's landslide win at ILSVRC 2012 (an image-recognition competition)
⁃ The team using a deep neural network overwhelmed all other entrants
⁃ This result touched off the recent deep-learning boom
[Chart: number of submissions vs. top-5 error (5 predictions/image) at ILSVRC 2010, 2011, and 2012; best errors of roughly 0.28, 0.26, and 0.16, respectively — the 2012 deep-learning entry cut the error sharply.]
[Figure from Krizhevsky et al.: eight ILSVRC-2010 test images with the model's five most probable labels; the correct label and the probability assigned to it are shown under each image.]
[Figure from Krizhevsky et al.: the CNN architecture, showing the delineation of responsibilities between two GPUs that communicate only at certain layers; the input is 150,528-dimensional, and the remaining layers have 253,440–186,624–64,896–64,896–43,264–… neurons.]
http://www.image-net.org/challenges/LSVRC/
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Deng, J., et al. "Large scale visual recognition challenge 2012." (2012).
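As a rough illustration of the kind of model behind the 2012 result, here is a small convolutional classifier in PyTorch. It is only a sketch in the spirit of the architecture above, far smaller than the paper's two-GPU network; the layer sizes and input resolution are illustrative.

```python
# Minimal CNN image classifier (illustrative; not the actual AlexNet).
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),  # RGB input
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 14 * 14, num_classes)

    def forward(self, x):
        x = self.features(x)                   # (N, 64, 14, 14) for 112x112 input
        return self.classifier(x.flatten(1))   # class logits

logits = SmallCNN()(torch.randn(1, 3, 112, 112))
print(logits.shape)  # torch.Size([1, 1000])
```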
17. Deep Learning × Image Generation
■ Variational Autoencoder (Kingma+, 2014)
⁃ Trains two neural networks simultaneously:
• a network that infers the latent variable z from a real image x
• a network that generates an image x′ from the latent variable z and a label y
Figure 1: (a) Visualisation of handwriting styles learned by the model with 2D z-space, obtained by fixing the class label and varying the 2D latent variable z. (b, c) Analogical reasoning with generative semi-supervised models (MNIST and SVHN analogies) using a high-dimensional z-space: the leftmost columns show images from the test set; the other columns show analogical fantasies of x by the generative model, where the latent variable z of each row is set to the value inferred from the test-set image on the left by the inference network, and each column corresponds to a class label y.
[Tables 2 and 3 (data omitted): semi-supervised classification with 1000 labels on SVHN and NORB, comparing KNN, TSVM, M1+KNN, M1+TSVM, and M1+M2.]
Digits generated by fixing the label and continuously varying the latent variable.
Digits generated by fixing the latent variable and varying the label from 0 through 9.
Kingma, Diederik P., et al. "Semi-supervised learning with deep generative models." Advances in Neural Information Processing Systems. 2014.
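A minimal sketch of the two networks, assuming PyTorch. It shows the conditional setup described on the slide (an inference network for q(z|x) and a generator taking z and a label y), not the paper's full M1+M2 semi-supervised objective; all sizes are illustrative.

```python
# Sketch of the two networks in a conditional VAE (illustrative only).
import torch
import torch.nn as nn

class Encoder(nn.Module):          # infers latent z from a real image x
    def __init__(self, x_dim=784, z_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(256, z_dim)   # log-variance of q(z|x)

    def forward(self, x):
        h = self.net(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

class Decoder(nn.Module):          # generates image x' from z and label y
    def __init__(self, z_dim=2, y_dim=10, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + y_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim), nn.Sigmoid())

    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

# "Analogy" generation as on the slide: fix z, sweep the label y from 0 to 9.
enc, dec = Encoder(), Decoder()
z, _, _ = enc(torch.rand(1, 784))             # z inferred from one test image
for digit in range(10):
    y = torch.eye(10)[digit].unsqueeze(0)
    x_gen = dec(z, y)                         # same style, different digit
```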
18. Deep Learning × Image Generation
■ DCGAN (Radford+, 2015)
⁃ Trains two neural networks simultaneously:
• a network that generates plausible images based on the dataset
• a network that discriminates whether an input image is a real image or a fake produced by the generator
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
Examples of generated images; arithmetic on images also becomes possible.
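The adversarial training described above can be sketched as the following loop (PyTorch; plain MLPs stand in for DCGAN's convolutional generator and discriminator, and the "real" batch is a random stand-in for actual image data).

```python
# Sketch of the GAN training loop (MLPs stand in for DCGAN's conv nets).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                       # toy loop; real data omitted
    real = torch.rand(32, 784) * 2 - 1        # stand-in for a real image batch
    fake = G(torch.randn(32, 64))             # generator proposes fake images

    # Discriminator: tell real images from generated ones.
    loss_d = bce(D(real), torch.ones(32, 1)) + \
             bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator into calling fakes real.
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```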
19. Deep Learning × Natural Language Processing
■ Sequence to Sequence Learning (Sutskever+, 2014)
⁃ Given an input sentence, generates its paired output sentence word by word
■ Neural Conversational Model (Vinyals+, 2015)
⁃ Applies seq2seq, used for translation and similar tasks, to dialogue
(Shown below: excerpted conversations, including an exchange about morality in Conversation 5.)
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
Vinyals, Oriol, and Quoc Le. "A neural conversational model." arXiv:1506.05869 (2015). In ICML 2015.
Conversation 1: Basic (excerpt)
Human: are you a man or a woman ?
Machine: i ’m a woman .
Human: why are we here ?
Machine: i ’m not sure .
Human: okay, bye !
Machine: bye .
Human: see you later .
Machine: bye .
Conversation 2: Simple Q&A (contexts and multiple choice)
Human: my name is david . what is my name ?
Machine: david .
Human: my name is john . what is my name ?
Machine: john .
Human: are you a leader or a follower ?
Machine: i ’m a leader .
Human: are you a follower or a leader ?
Machine: i ’m a leader .
Conversation 3: General knowledge Q&A
Human: who is skywalker ?
Machine: he is a hero .
Human: who is bill clinton ?
Machine: he ’s a billionaire .
Human: is sky blue or black ?
Machine: blue .
Human: does a cat have a tail ?
Machine: yes .
Human: does a cat have a wing ?
Machine: no
Human: can a cat fly ?
Machine: no .
Human: how many legs does a cat have ?
Machine: four , i think .
Human: how many legs does a spider have ?
Machine: three , i think .
Human: how many legs does a centipede have ?
Machine: eight .
Human: what is the color of the sky ?
Machine: blue .
Human: what is the color of water ?
Machine: water .
Human: what is the color of blood ?
Machine: it is the same as a black eye .
Human: what is the usual color of a leaf ?
Machine: it is a green one .
Human: what is the color of a yellow car ?
Machine: yellow .
Human: how much is two plus two ?
Machine: four .
Human: how much is ten minus two ?
Machine: seventy - two .
Conversation 4: Philosophical Q&A
Human: what is the purpose of life ?
Machine: to serve the greater good .
Human: what is the purpose of living ?
Machine: to live forever .
Human: what is the purpose of existence ?
Machine: to find out what happens when we get to the planet earth .
Human: where are you now ?
Machine: i ’m in the middle of nowhere .
Human: what is the purpose of dying ?
Machine: to have a life .
Human: what is the purpose of being intelligent ?
Machine: to find out what it is .
Human: what is the purpose of emotions ?
Machine: i don ’t know .
Conversation 5: Morality
Human: what is moral ?
Machine: what empowered humanity , what intellectual the essence is .
Human: what is immoral ?
Machine: the fact that you have a child .
Human: what is morality ?
Machine: what is altruism ?
Human: what is the definition of altruism ?
Machine: if you don ’t believe in god , then you don ’t know .
Human: ok ... so what is the definition of morality ?
Machine: well , the truth is , you ’re not a believer in god almighty .
Human: tell me the definition of morality , i am quite upset now !
Machine: i ’m not ashamed of being a philosopher !
Human: tell me the definition of morality .
Machine: i don ’t have ethics .
Human: look , i need help , i need to know more about morality ...
Machine: i don ’t know what ethics is .
Human: seriously , what is morality ?
Machine: what is the definition of living ?
Human: why living has anything to do with morality ?
Machine: you ’re not a cop .
Human: okay , i need to know how should i behave morally ...
Machine: i don ’t know how to tell you .
Human: is morality and ethics the same ?
Machine: yes , sir .
Human: what are the things that i do to be immoral ?
Machine: i don ’t know .
Human: give me some examples of moral actions ...
Machine: i ’m not a moralist .
Human: alright , morality ?
Machine: integrity .
Human: what is integrity ?
Machine: i ’m sorry , i don ’t know what else to say .
Human: be moral !
Machine: be a man !
Human: i really like our discussion on morality and ethics ...
Machine: and how i ’m not in the mood for a philosophical debate .
Human: what do you like to talk about ?
Machine: nothing .
Conversation 6: Opinions (notice the model outputs “he” , “she” or “it” depending on the subject being asked)
Human: what do you think about tesla ?
Machine: he ’s a good conductor .
Human: what do you think about bill gates ?
Machine: he ’s a good man .
Human: What do you think about messi ?
Machine: he ’s a great player .
Human: what do you think about cleopatra ?
(Excerpt from Sutskever et al., 2014:) Question answering can also be seen as mapping a sequence of words representing the question to a
sequence of words representing the answer. It is therefore clear that a domain-independent method
that learns to map sequences to sequences would be useful.
Sequences pose a challenge for DNNs because they require that the dimensionality of the inputs and
outputs is known and fixed. In this paper, we show that a straightforward application of the Long
Short-Term Memory (LSTM) architecture [16] can solve general sequence to sequence problems.
The idea is to use one LSTM to read the input sequence, one timestep at a time, to obtain a large fixed-
dimensional vector representation, and then to use another LSTM to extract the output sequence
from that vector (fig. 1). The second LSTM is essentially a recurrent neural network language model
[28, 23, 30] except that it is conditioned on the input sequence. The LSTM’s ability to successfully
learn on data with long range temporal dependencies makes it a natural choice for this application
due to the considerable time lag between the inputs and their corresponding outputs (fig. 1).
There have been a number of related attempts to address the general sequence to sequence learning
problem with neural networks. Our approach is closely related to Kalchbrenner and Blunsom [18]
who were the first to map the entire input sentence to a vector, and is very similar to Cho et al. [5].
Graves [10] introduced a novel differentiable attention mechanism that allows neural networks to
focus on different parts of their input, and an elegant variant of this idea was successfully applied
to machine translation by Bahdanau et al. [2]. The Connectionist Sequence Classification is another
popular technique for mapping sequences to sequences with neural networks, although it assumes a
monotonic alignment between the inputs and the outputs [11].
Figure 1: Our model reads an input sentence “ABC” and produces “WXYZ” as the output sentence. The
model stops making predictions after outputting the end-of-sentence token. Note that the LSTM reads the
input sentence in reverse, because doing so introduces many short term dependencies in the data that make the
optimization problem much easier.
The main result of this work is the following. On the WMT’14 English to French translation task,
we obtained a BLEU score of 34.81 by directly extracting translations from an ensemble of 5 deep
LSTMs (with 380M parameters each) using a simple left-to-right beam-search decoder. This is
by far the best result achieved by direct translation with large neural networks. For comparison,
the BLEU score of a SMT baseline on this dataset is 33.30 [29]. The 34.81 BLEU score was
achieved by an LSTM with a vocabulary of 80k words, so the score was penalized whenever the
reference translation contained a word not covered by these 80k. This result shows that a relatively
unoptimized neural network architecture which has much room for improvement outperforms a
mature phrase-based SMT system.
Finally, we used the LSTM to rescore the publicly available 1000-best lists of the SMT baseline on
the same task [29]. By doing so, we obtained a BLEU score of 36.5, which improves the baseline
by 3.2 BLEU points and is close to the previous state-of-the-art (which is 37.0 [9]).
Surprisingly, the LSTM did not suffer on very long sentences, despite the recent experience of other
researchers with related architectures [26]. We were able to do well on long sentences because we
reversed the order of words in the source sentence but not the target sentences in the training and test
set. By doing so, we introduced many short term dependencies that made the optimization problem
much simpler (see sec. 2 and 3.3). As a result, SGD could learn LSTMs that had no trouble with
long sentences. The simple trick of reversing the words in the source sentence is one of the key
technical contributions of this work.
A useful property of the LSTM is that it learns to map an input sentence of variable length into
a fixed-dimensional vector representation. Given that translations tend to be paraphrases of the
source sentences, the translation objective encourages the LSTM to find sentence representations
that capture their meaning, as sentences with similar meanings are close to each other while different …
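A minimal encoder-decoder sketch in PyTorch along the lines of the excerpt: one LSTM reads the reversed source sentence into a fixed-size state, and a second LSTM conditioned on that state emits the output token by token. The toy vocabulary and greedy decoding are simplifications (the paper uses beam search over an ensemble of deep LSTMs).

```python
# Minimal seq2seq sketch: encoder LSTM -> fixed vector -> decoder LSTM.
import torch
import torch.nn as nn

V, H, EOS = 1000, 128, 0                    # toy vocab size, hidden size, <EOS> id
emb = nn.Embedding(V, H)
encoder = nn.LSTM(H, H, batch_first=True)
decoder = nn.LSTM(H, H, batch_first=True)
readout = nn.Linear(H, V)                   # hidden state -> next-token logits

def translate(src_ids, max_len=20):
    # Reverse the source sentence, as the paper found this eases optimization.
    src = torch.tensor(src_ids[::-1]).unsqueeze(0)
    _, state = encoder(emb(src))            # state is the fixed-size summary
    tok, out = torch.tensor([[EOS]]), []
    for _ in range(max_len):                # greedy decoding (paper: beam search)
        h, state = decoder(emb(tok), state)
        tok = readout(h[:, -1]).argmax(-1, keepdim=True)
        if tok.item() == EOS:
            break
        out.append(tok.item())
    return out

print(translate([5, 17, 42]))               # untrained model: arbitrary tokens
```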
20. Deep Learning × Natural Language Processing
■ Memory Network (Sukhbaatar+, 2015)
⁃ In tasks such as question answering, lets the model consult some resource beyond the question sentence itself when producing an answer
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
http://www.thespermwhale.com/jaseweston/icml2016/icml2016-memnn-tutorial.pdf
Question answering that consults a passage of text
(Excerpt from Sukhbaatar et al., 2015, on language modelling:) … interest in using neural network based models for the task, with RNNs [14] and LSTMs [10, 20]
showing clear performance gains over traditional methods. Indeed, the current state-of-the-art is
held by variants of these models, for example very large LSTMs with Dropout [25] or RNNs with
diagonal constraints on the weight matrix [15]. With appropriate weight tying, our model can be
regarded as a modified form of RNN, where the recurrence is indexed by memory lookups to the
word sequence rather than indexed by the sequence itself.
4 Synthetic Question and Answering Experiments
We perform experiments on the synthetic QA tasks defined in [22] (using version 1.1 of the dataset).
A given QA task consists of a set of statements, followed by a question whose answer is typically
a single word (in a few tasks, answers are a set of words). The answer is available to the model at
training time, but must be predicted at test time. There are a total of 20 different types of tasks that
probe different forms of reasoning and deduction. Here are samples of three of the tasks:
(1) Sam walks into the kitchen. Sam picks up an apple. Sam walks into the bedroom. Sam drops the apple. Q: Where is the apple? A: Bedroom.
(2) Brian is a lion. Julius is a lion. Julius is white. Bernhard is green. Q: What color is Brian? A: White.
(3) Mary journeyed to the den. Mary went back to the kitchen. John journeyed to the bedroom. Mary discarded the milk. Q: Where was the milk before the den? A: Hallway.
Note that for each question, only some subset of the statements contain information needed for
the answer, and the others are essentially irrelevant distractors (e.g. the first sentence in the first
example). In the Memory Networks of Weston et al. [22], this supporting subset was explicitly
indicated to the model during training and the key difference between that work and this one is that
this information is no longer provided. Hence, the model must deduce for itself at training and test
time which sentences are relevant and which are not.
Formally, for one of the 20 QA tasks, we are given example problems, each having a set of $I$ sentences $\{x_i\}$ where $I \le 320$; a question sentence $q$ and answer $a$. Let the $j$th word of sentence $i$ be $x_{ij}$, represented by a one-hot vector of length $V$ (where the vocabulary is of size $V = 177$, reflecting the simplistic nature of the QA language). The same representation is used for the question $q$ and answer $a$. Two versions of the data are used, one that has 1000 training problems per task and a second larger one with 10,000 per task.
4.1 Model Details
Unless otherwise stated, all experiments used a K = 3 hops model with the adjacent weight sharing
scheme. For all tasks that output lists (i.e. the answers are multiple words), we take each possible …
Question answering that consults Wikipedia
Recent Work: New Models for QA on documents
Miller et al. "Key-Value Memory Networks for Directly Reading Documents." arXiv:1606.03126.
2.1 Single Layer
We start by describing our model in the single layer case, which implements a single memory hop
operation. We then show it can be stacked to give multiple hops in memory.
Input memory representation: Suppose we are given an input set $x_1, \ldots, x_i$ to be stored in memory. The entire set of $\{x_i\}$ are converted into memory vectors $\{m_i\}$ of dimension $d$ computed by embedding each $x_i$ in a continuous space, in the simplest case, using an embedding matrix $A$ (of size $d \times V$). The query $q$ is also embedded (again, in the simplest case via another embedding matrix $B$ with the same dimensions as $A$) to obtain an internal state $u$. In the embedding space, we compute the match between $u$ and each memory $m_i$ by taking the inner product followed by a softmax:
$$p_i = \mathrm{Softmax}(u^\top m_i), \tag{1}$$
where $\mathrm{Softmax}(z_i) = e^{z_i} / \sum_j e^{z_j}$. Defined in this way, $p$ is a probability vector over the inputs.
Output memory representation: Each $x_i$ has a corresponding output vector $c_i$ (given in the simplest case by another embedding matrix $C$). The response vector from the memory $o$ is then a sum over the transformed inputs $c_i$, weighted by the probability vector from the input:
$$o = \sum_i p_i c_i. \tag{2}$$
Because the function from input to output is smooth, we can easily compute gradients and backpropagate through it. Other recently proposed forms of memory or attention take this approach, notably Bahdanau et al. [2] and Graves et al. [8], see also [9].
Generating the final prediction: In the single layer case, the sum of the output vector $o$ and the input embedding $u$ is then passed through a final weight matrix $W$ (of size $V \times d$) and a softmax to produce the predicted label:
$$\hat{a} = \mathrm{Softmax}(W(o + u)). \tag{3}$$
The overall model is shown in Fig. 1(a). During training, all three embedding matrices $A$, $B$ and $C$, as well as $W$, are jointly learned by minimizing a standard cross-entropy loss between $\hat{a}$ and the true label $a$. Training is performed using stochastic gradient descent (see Section 4.2 for more details).
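Equations (1)-(3) amount to a soft attention lookup over memory and can be transcribed directly in NumPy. The bag-of-words inputs and random matrices below are illustrative stand-ins, not trained parameters.

```python
# Direct NumPy transcription of equations (1)-(3): one memory hop.
import numpy as np

rng = np.random.default_rng(0)
V, d, n_sent = 177, 20, 5                  # vocab size, embedding dim, #sentences
A = rng.normal(size=(d, V))                # input memory embedding
B = rng.normal(size=(d, V))                # question embedding
C = rng.normal(size=(d, V))                # output memory embedding
W = rng.normal(size=(V, d))                # final readout

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x_bow = rng.integers(0, 2, size=(n_sent, V)).astype(float)  # sentences (bag-of-words)
q_bow = rng.integers(0, 2, size=V).astype(float)            # question

m = x_bow @ A.T                            # memory vectors m_i
c = x_bow @ C.T                            # output vectors c_i
u = B @ q_bow                              # internal state u
p = softmax(m @ u)                         # eq. (1): attention over memories
o = p @ c                                  # eq. (2): weighted sum of outputs
a_hat = softmax(W @ (o + u))               # eq. (3): distribution over answers
print(a_hat.argmax())                      # predicted answer word id
```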
Figure 1: (a): A single layer version of our model. (b): A three layer version of our model. In
practice, we can constrain several of the embedding matrices to be the same (see Section 2.2).
2.2 Multiple Layers
We now extend our model to handle K hop operations. The memory layers are stacked in the
following way:
• The input to layers above the first is the sum of the output $o^k$ and the input $u^k$ from layer $k$ (different ways to combine $o^k$ and $u^k$ are proposed later):
$$u^{k+1} = u^k + o^k. \tag{4}$$
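Stacking hops per eq. (4) is then a short loop. A sketch with untied per-layer embeddings for clarity (the paper's experiments mostly tie adjacent layers' matrices):

```python
# K-hop extension per eq. (4): u^{k+1} = u^k + o^k (untied embeddings, toy data).
import numpy as np

rng = np.random.default_rng(1)
V, d, n_sent, K = 177, 20, 5, 3

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x_bow = rng.integers(0, 2, size=(n_sent, V)).astype(float)
q_bow = rng.integers(0, 2, size=V).astype(float)
B = rng.normal(size=(d, V))
W = rng.normal(size=(V, d))

u = B @ q_bow
for k in range(K):                         # one attention lookup per hop
    A_k = rng.normal(size=(d, V))          # layer-specific embeddings (toy)
    C_k = rng.normal(size=(d, V))
    p = softmax((x_bow @ A_k.T) @ u)       # eq. (1) at hop k
    o = p @ (x_bow @ C_k.T)                # eq. (2) at hop k
    u = u + o                              # eq. (4): feed summed state upward
a_hat = softmax(W @ u)                     # final prediction after K hops
```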
21. Deep Learning × AI That Masters Games
■ Deep Q Network (Mnih+, 2015)
⁃ Masters Breakout, Space Invaders, and similar games through reinforcement learning driven by image features of the game screen
■ AlphaGo (Silver+, 2016)
⁃ Learns Go strategy by combining supervised learning and reinforcement learning
⁃ Drew wide attention by defeating a professional human Go player with no handicap
Deep Q Network
Figure 1: Neural network training pipeline and architecture. a, A fast rollout policy $p_\pi$ and supervised learning (SL) policy network $p_\sigma$ are trained to predict human expert moves in a data-set of positions. A reinforcement learning (RL) policy network $p_\rho$ is initialised to the SL policy network, and is then improved by policy gradient learning to maximize the outcome (i.e. winning more games) against previous versions of the policy network. A new data-set is generated by playing games of self-play with the RL policy network. Finally, a value network $v_\theta$ is trained by regression to predict the expected outcome (i.e. whether the current player wins) in positions from the self-play data-set.
AlphaGo
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
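At the core of DQN is the Q-learning update, with a deep network replacing the Q table. A tabular toy version (the state and action counts and the generic `update`/`act` helpers are hypothetical) shows the update that DQN approximates:

```python
# Tabular toy version of the Q-learning update at DQN's core (no conv net,
# replay buffer, or target network; sizes are hypothetical).
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1         # learning rate, discount, exploration

def update(s, a, r, s_next, done):
    """One step of Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def act(s, rng=np.random.default_rng()):
    """Epsilon-greedy: mostly exploit the current Q, sometimes explore."""
    return rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())

# DQN replaces the table Q with a CNN over raw screen pixels and stabilizes
# training with experience replay and a periodically frozen target network.
update(s=0, a=act(0), r=1.0, s_next=1, done=False)
```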