The dominant paradigm for retrieving information today is search and fetch (e.g. Google). However, reasoning (i.e. manipulating knowledge in response to a question) is starting to come of age.
I’m going to cover a few recently published neural-network approaches to machine reasoning as well as related background:
- Example problems
- Knowledge graphs
- Iterative reasoning networks (specifically, MACnets)
About Octavian:
https://www.octavian.ai/about
https://medium.com/octavian-ai/our-mission-eeb434d8cb91
3. Today I’ll cover:
1. What is machine reasoning?
2. Knowledge
3. Neural reasoning approaches
4. Iterative reasoning with MACnets
4. Goals for this session
• Introduce you to interesting ideas
• Get you excited about neural reasoning!
• We will not cover the full technical details, but I’ve included links to all of the material
6. What is machine reasoning?
• A system that answers questions about knowledge using deduction or induction
• That is, doing something more complex than search-and-retrieve
• E.g. “Who is most likely to win the World Cup?”
• E.g. “Which bus line visits the most pubs?”
• Can be single-shot (e.g. a Google search) or interactive (e.g. a chatbot)
7. What is machine reasoning?
• Many systems do reasoning on a limited set of questions (e.g. Google Maps answers routing questions)
• We’re most interested in general(izable) systems: how can we answer a broad range of questions?
8. Knowledge
• Reasoning requires knowledge
• Many ways to represent knowledge:
• Sequences (e.g. language strings)
• Images
• Vectors
• Graphs
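As a minimal sketch of what those representations look like in practice (the fact and all names here are illustrative, not from the talk), the same piece of knowledge can be stored in each form:

# The same fact, "London Bridge connects to Bank", in several representations

# As a sequence (a language string):
as_sequence = "London Bridge connects to Bank"

# As a vector (e.g. an embedding produced by some trained encoder):
as_vector = [0.12, -0.80, 0.33]  # placeholder values

# As a graph (a node list plus an edge list):
nodes = ["London Bridge", "Bank"]
edges = [("London Bridge", "connects_to", "Bank")]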
10. Knowledge graphs
• Can represent a diverse range of information
• Can be continually extended
• Google’s Knowledge Graph has over 1bn entities and helps answer 30bn monthly searches
• Wikidata contains around 50 million entities and is freely available
12. A brief survey of neural reasoning approaches
• Recurrent cells (LSTM/GRU)
• RNN translation: question → answer, or question → database query
• Neural Turing Machines
• MACnets
• Interactive question answering (e.g. via reinforcement learning)
• MacGraph
Note: these are just selected highlights; there are many, many variations of these ideas in the literature
18. The challenge: answer questions about images
• The CLEVR dataset
• Synthetic
• Question, Answer, Image triples
• Each question comes as English and as a functional program
Image source: Compositional Attention Networks for Machine Reasoning
19. Memory, Attention, and Composition network (MACnet)
• Introduced by Drew Hudson and Christopher Manning at ICLR, April 2018
• Answers questions on the CLEVR dataset to 99% accuracy (humans get 93%)
Image source: Compositional Attention Networks for Machine Reasoning
20. Key idea: use RNN iteration as an instruction cycle (from Neural Turing Machines)
(Diagram: the input passes through a repeated recurrent cell to produce the answer)
21. Key idea: attention over image and text gives interpretability
Image source: Compositional Attention Networks for Machine Reasoning
22. Key idea: use the question words as the instructions
(Diagram: attention over the question words transforms the control state into the next control state)
Image source: Compositional Attention Networks for Machine Reasoning
Can we achieve recursion/algorithms through self-talk?
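As a rough sketch of the control idea above (my own simplification, not the authors’ implementation): the previous control state scores each question word, and the attention-weighted sum of the words becomes the next control state.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_control(control, question_words):
    # control: [d] previous control state; question_words: [n, d] biLSTM outputs
    scores = question_words @ control   # compare the control state to each word
    weights = softmax(scores)           # normalise the scores into attention
    return weights @ question_words     # weighted sum becomes the next control

# Toy usage: 5 question words, 8-dimensional states
words = np.random.randn(5, 8)
c2 = next_control(np.random.randn(8), words)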
23. Key idea: have separate control and memory states
(Diagram: a control sequence c1 c2 c3 c4 and a memory sequence m1 m2 m3 m4 evolve in parallel over time)
24. Key idea: preprocess image and text through existing architectures
• The image is passed through ResNet101
• The text is passed through a biLSTM, giving the “question words” (per-word outputs) and the “question” (final states)
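A hedged sketch of that preprocessing in Keras (shapes and layer sizes are illustrative assumptions, not the paper’s exact configuration):

import tensorflow as tf

# Image path: a frozen ResNet gives a grid of spatial features
resnet = tf.keras.applications.ResNet101(include_top=False, weights="imagenet")
resnet.trainable = False
image = tf.random.uniform([1, 224, 224, 3])   # placeholder image batch
image_features = resnet(image)                # [1, 7, 7, 2048] feature grid

# Text path: a biLSTM gives per-word outputs ("question words")
# and final states that are combined into the "question" vector
embedded = tf.random.uniform([1, 12, 300])    # placeholder embedded question
bilstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(256, return_sequences=True, return_state=True))
question_words, fwd_h, fwd_c, bwd_h, bwd_c = bilstm(embedded)
question = tf.concat([fwd_h, bwd_h], axis=-1)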
25. MAC network performs iterative reasoning
(Diagram: each reasoning step applies attention over the question words and attention over the image)
Image source: Compositional Attention Networks for Machine Reasoning
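Putting the pieces together, one reasoning step looks roughly like this (a heavily simplified sketch of the paper’s control/read/write units, not the authors’ code):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, table):
    return softmax(table @ query) @ table   # score, normalise, weighted sum

def mac_step(control, memory, question_words, image_cells):
    control = attend(control, question_words)          # what to do this step
    retrieved = attend(control * memory, image_cells)  # control-guided image read
    return control, memory + retrieved                 # integrate into memory

# Toy usage: 4 reasoning steps over 5 question words and 16 image cells
d = 8
words, cells = np.random.randn(5, d), np.random.randn(16, d)
control, memory = np.random.randn(d), np.ones(d)
for _ in range(4):
    control, memory = mac_step(control, memory, words, cells)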
32. CLEVR-Graph: answering questions about mass transit graphs
• Synthetic dataset
• Question, Answer, Graph triples
• Each question comes as English, as a functional program and as Cypher
34. Question to Cypher query translation
“How clean is Spoon Street?”
MATCH (var1)
WHERE var1.name = "Spoon Street"
WITH 1 AS foo, var1.cleanliness AS var2
RETURN var2
Result: DIRTY
38. In reality the output elements often derive from specific input elements
Image source: Distill
39. This input–output mapping is hard work for the RNN, since everything is encoded together
Image source: TensorFlow tutorials
40. … therefore use attention
Image source: Distill
FREQUENTLY USED TECHNIQUE
41. … therefore use attention
Softmax becomes the attention mechanism:
• Normalised sum of exponentials
• Result sums to 1.0
• “Increases contrast”
FREQUENTLY USED TECHNIQUE
Image source: Distill
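For reference, softmax in code (the standard definition, written in numpy):

import numpy as np

def softmax(scores):
    exps = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exps / exps.sum()               # normalised: the result sums to 1.0

softmax(np.array([1.0, 2.0, 4.0]))   # -> array([0.042, 0.114, 0.844])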
42. … therefore use attention
Image source: TensorFlow tutorials
43. Seq2seq results
• 100% translation accuracy on (reasonably simple) CLEVR-Graph question–Cypher pairs
• Google: “Human evaluations show that [Seq2Seq] has reduced translation errors by 60% compared to our previous phrase-based system”
50. Attention
1. Compare query to each element in array giving scores
2. Apply softmax to normalise and focus scores
3. Multiply each element by its score
4. Sum all the elements
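Those four steps translate almost line-for-line into numpy (a minimal sketch; the shapes are my own choice):

import numpy as np

def attention(query, elements):
    # query: [d], elements: [n, d] -> a single [d] read vector
    scores = elements @ query                # 1. compare query to each element
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # 2. softmax: normalise and focus
    weighted = elements * weights[:, None]   # 3. multiply each element by its score
    return weighted.sum(axis=0)              # 4. sum all the elements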
51. Neural graph memory
• Store a table of nodes and a table of edges
• Use attention (aka content addressing) to retrieve data
Nodes table: rows of (node_id, node_props)
Edges table: rows of (from_node, edge_props, to_node)
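A minimal sketch of such a read, reusing the attention function from the previous slide (row contents are packed into vectors; all names here are illustrative):

import numpy as np

def attention(query, table):
    scores = table @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ table

# Each row packs an id and its properties (nodes), or from/props/to (edges)
d = 8
node_table = np.random.randn(10, d)   # rows of (node_id, node_props)
edge_table = np.random.randn(20, d)   # rows of (from_node, edge_props, to_node)

# Content addressing: a query vector from the RNN cell retrieves a blend
# of the rows most similar to it
query = np.random.randn(d)
node_read = attention(query, node_table)
edge_read = attention(query, edge_table)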
52. Let the RNN cell read from a memory
(Diagram: the four attention steps applied as a memory read inside the cell)
Image source: Distill
53. What is a neural network?
• A neural network transforms signals through trainable layers
54. What is a neural network?
• Trained via backpropagation of errors and gradient descent
(Diagram: the error signal propagates backwards through the layers)
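In a framework like Keras those two slides amount to a few lines (a generic example, not tied to any model from the talk):

import tensorflow as tf

# Trainable layers transform the input signal...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# ...and fit() runs backpropagation of errors plus gradient descent
model.compile(optimizer="sgd", loss="mse")
x = tf.random.uniform([64, 4])
y = tf.random.uniform([64, 1])
model.fit(x, y, epochs=3, verbose=0)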
55. LSTM cell
(Diagram: the long-term state passes straight through the cell; both a short-term state and a long-term state are carried from step to step)
“If you consider the LSTM cell as a black box, it can be used very much like a basic cell, except it will perform much better; training will converge faster and it will detect long-term dependencies in the data.” -- Safari Books
https://www.safaribooksonline.com/library/view/neural-networks-and/9781492037354/ch04.html
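In code the black-box view is literal: an LSTM layer is a drop-in replacement for a basic recurrent cell (a generic Keras example):

import tensorflow as tf

inputs = tf.random.uniform([2, 10, 4])              # batch of 2 sequences, 10 steps each

# A basic recurrent cell...
basic_out = tf.keras.layers.SimpleRNN(16)(inputs)   # -> [2, 16]

# ...and an LSTM used in exactly the same way; internally it carries both
# a short-term state (h) and a long-term state (c) between steps
lstm_out = tf.keras.layers.LSTM(16)(inputs)         # -> [2, 16]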