This document provides an overview of deep learning 1.0 and discusses potential directions for deep learning 2.0. It summarizes limitations of deep learning 1.0 such as lack of reasoning abilities and discusses how incorporating memory and reasoning capabilities could help address these limitations. The document outlines several approaches being explored for neural memory and reasoning, including memory networks, neural Turing machines, and self-attentive associative memories. It argues that memory and reasoning will be important for developing more human-like artificial general intelligence.
Deep learning 1.0 and Beyond, Part 2
1. 16/11/2020 1
A/Prof Truyen Tran
With contribution from Vuong Le, Hung
Le, Thao Le, Tin Pham & Dung Nguyen
Deakin University
December 2020
Deep learning 1.0 and Beyond
A tutorial
Part II
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
linkedin.com/in/truyen-tran
2.
“[By 2023] …
Emergence of the
generally agreed upon
"next big thing" in AI
beyond deep learning.”
Rodney Brooks
rodneybrooks.com
“[…] general-purpose computer
programs, built on top of far richer
primitives than our current
differentiable layers—[…] we will
get to reasoning and abstraction,
the fundamental weakness of
current models.”
Francois Chollet
blog.keras.io
“Software 2.0 is written in
neural network weights”
Andrej Karpathy
medium.com/@karpathy
3. DL 1.0 has been fantastic, but has serious limitations
(but not always its fault)
DL builds glorified function
approximators using gradient
descent
Great at interpolating. Think GPT-X.
One-step input/output mapping
Requires differentiability
Little systematic generalization
#REF: Marcus, Gary. "Deep learning: A critical appraisal." arXiv preprint arXiv:1801.00631 (2018).
Data hungry to cover all possible
patterns
Computation demanding to process large
data
Energy inefficient
Prohibitive for small labs to compete
Engineering effort is huge → technical debt
A little too much heuristic. Lack of
theory.
4. DL 1.0 has been fantastic, but has serious limitations
(but not always its fault) (cont.)
#REF: Marcus, Gary. "Deep learning: A critical appraisal." arXiv preprint arXiv:1801.00631 (2018).
Lacks a natural mechanism to incorporate prior knowledge, e.g., common sense
Assumes stationarity
Changes cause trouble → expensive retraining
No causality → random correlations can be “learnt”
Sensitive to adversarial attacks
Lack of reasoning
Pure pattern recognizer
Little explainability
Trust issue
To be fair, many of these problems are common issues of statistical learning!
5. DL 1.0 is great, but it struggles to solve many AI/ML problems
Learn to organize and remember ultra-
long sequences
Learn to generate arbitrary objects, with zero support
Reasoning about objects, relations, causality, self and other agents
Imagine scenarios, act on the world and learn from the feedback
Continual learning, never-ending, across
tasks, domains, representations
Learn by socializing
Learn just by observing and self-prediction
Organizing and reasoning about (common-
sense) knowledge
Automated discovery of physical laws
Solve genetics, neuroscience and
healthcare
Automate physical sciences
Automate software engineering
6. Agenda
Deep learning 1.0: Classic models, Transformers, Graph neural networks, Unsupervised learning
Deep learning 2.0: Neural memories, Theory of mind, Neural reasoning, A system view
7. 1960s-1990s
Hand-crafting rules, domain-specific, logic-based
High in reasoning
Can’t scale.
Fails on unseen cases.
1990s-present
Machine learning, general purpose, statistics-based
Low in reasoning
Needs lots of data
Less adaptive
Little explanation
2020s-2030s
Learning + reasoning, general purpose, human-like
Has contextual and common-sense reasoning
Requires less data
Adapts to change
Explainable
Photo credit: DARPA
8. System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2:
Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
• Hypothetical thought
• Decoupled from data rep
Single
Memory
• Facts
• Semantics
• Events and relational
associations
• Working space –
temporal buffer
Pattern
recognition
Reasoning
9. Current neural network offerings
No storage of intermediate results
Little choice over what to compute and what to use
Lack of conditional computation
Little support for complex chained reasoning
Little support for rapid switching of tasks
Credit: hexahedria
10. What is missing? A memory
Use multiple pieces of information
Store intermediate results (RAM like)
Episodic recall of previous tasks (Tape like)
Encode/compress & generate/decompress
long sequences
Learn/store programs (e.g., fast weights)
Store and query external knowledge
Spatial memory for navigation
Rare but important events (e.g., snake
bite)
Needed for complex control
Short-cuts for ease of gradient
propagation = constant path length
Division of labour: program, execution
and storage
Working memory is an indicator of IQ in humans
11. Memory enables reasoning
Expert reasoning was enabled by a large long-term
memory, acquired through experience
Working memory for analytic reasoning
WM is a system to support information binding to a coordinate
system
Reasoning as deliberative hypothesis testing memory-retrieval
based hypothesis generation
Higher order cognition = creating & manipulating relations
representation of premises, temporarily stored in WM.
Reasoning over concepts & relations requires semantic
memory
Memory is critical for episodic future thinking (mental
simulation)
“[…] one cannot hope to
understand reasoning
without understanding the
memory processes […]”
(Thompson and Feeney, 2014)
12. Agenda
Deep learning 1.0: Classic models, Transformers, Graph neural networks, Unsupervised learning
Deep learning 2.0: Neural memories, Theory of mind, Neural reasoning, A system view
13. Recall: Memory networks
Input is a set → load into memory, which is NOT updated.
State is an RNN with attention reading from inputs.
Concepts: Query, key and content + content addressing.
Deep models, but constant path length from input to output.
Equivalent to an RNN with a shared input set.
Sukhbaatar, Sainbayar, Jason Weston, and Rob
Fergus. "End-to-end memory networks." Advances in
neural information processing systems. 2015.
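The read operation described on this slide can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the names `memory_read` and `multi_hop`, the random toy data, and the additive state update between hops are assumptions made for the example. The memory (`keys`, `contents`) is loaded once and never written to; only the query state changes.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def memory_read(query, keys, contents):
    """Content addressing: attend over keys, return a weighted sum of contents."""
    weights = softmax(keys @ query)      # one weight per memory slot
    return weights @ contents, weights

def multi_hop(query, keys, contents, hops=3):
    """Repeated reads over the same, fixed memory; only the state is refined."""
    state = query
    for _ in range(hops):
        read_vector, _ = memory_read(state, keys, contents)
        state = state + read_vector      # assumed additive update between hops
    return state

# Toy data (hypothetical): 5 memory slots with 4-dimensional keys and contents.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))
contents = rng.normal(size=(5, 4))
query = rng.normal(size=4)

first_read, first_weights = memory_read(query, keys, contents)
answer_state = multi_hop(query, keys, contents, hops=3)
```

Every hop sees the whole input set directly, which is why the path length from any input to the output stays constant regardless of the number of inputs.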
14. MANN: Memory-Augmented Neural Networks
(a constant path length)
Long-term dependency
E.g., outcome depends on the far past
Memory is needed (e.g., as in LSTM)
Complex program requires multiple computational steps
Each step can be selective (attentive) to certain memory cells
Operations: Encoding | Decoding | Retrieval
15. Learning a Turing machine
Can we learn a (neural)
program that learns to
program from data?
Visual reasoning is a
specific program of two
inputs (visual, linguistic)
16. Neural Turing machine (NTM)
(simulating a differentiable Turing machine)
A controller that takes
input/output and talks to an
external memory module.
Memory has read/write
operations.
The main issue is where to write,
and how to update the memory
state.
All operations are differentiable.
Source: rylanschaeffer.github.io
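A minimal sketch of differentiable read and write may help here. It uses content-based addressing (softmax over cosine similarity, sharpened by `beta`) and an erase-then-add write, in the style of the original NTM formulation; location-based addressing and the controller are left out, and all function names and toy values are assumptions for illustration.

```python
import numpy as np

def content_address(memory, key, beta):
    """Soft address: softmax over beta-scaled cosine similarity to each row."""
    eps = 1e-8
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    z = np.exp(beta * sim - np.max(beta * sim))
    return z / z.sum()

def read(memory, w):
    """Read vector: address-weighted sum of memory rows."""
    return w @ memory

def write(memory, w, erase, add):
    """Erase then add, both scaled by the address weights (returns a new array)."""
    memory = memory * (1.0 - np.outer(w, erase))
    return memory + np.outer(w, add)

# Toy memory (hypothetical): 4 slots of width 3, one slot already filled.
memory = np.zeros((4, 3))
memory[2] = [1.0, 0.0, 0.0]

w = content_address(memory, key=np.array([1.0, 0.0, 0.0]), beta=10.0)
new_memory = write(memory, w, erase=np.ones(3), add=np.array([0.0, 1.0, 0.0]))
read_vector = read(new_memory, w)
```

Because the address `w` is a soft distribution rather than a hard index, every step is a smooth function of its inputs, which is what lets gradients flow through where to write and how to update.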
18. NTM unrolled in time with LSTM as controller
#Ref: https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315
19. MANN for reasoning
Three steps:
Store data into memory
Read query, process sequentially, consult memory
Output answer
Behind the scenes:
Memory contains data & results of intermediate steps
Drawbacks of current MANNs:
No memory of controllers → less modularity and compositionality when the query is complex
No memory of relations → much harder to chain predicates.
Source: rylanschaeffer.github.io
20. Failures of item-only MANNs for reasoning
Relational representation is NOT stored → can’t reuse it later in the chain
A single memory of items and relations → can’t understand how relational reasoning occurs
The memory-memory relationship is coarse since it is represented as either a dot product or a weighted sum.
21. Self-attentive associative memories (SAM)
Learning relations automatically over time
Hung Le, Truyen Tran, Svetha Venkatesh, “Self-
attentive associative memory”, ICML'20.
24. Agenda
Deep learning 1.0: Classic models, Transformers, Graph neural networks, Unsupervised learning
Deep learning 2.0: Neural memories, Theory of mind, Neural reasoning, A system view
25. A testbed: Visual QA
What color is the thing with the same size as the blue cylinder?
blue
• Requires multi-step reasoning: find blue cylinder ➔ locate other object of the same size ➔ determine its color (green).
26.
Reasoning
Qualitative spatial
reasoning
Relational, temporal
inference
Commonsense
Object recognition
Scene graphs
Computer Vision
Natural Language
Processing
Machine
learning
Visual QA
Parsing
Symbol binding
Systematic generalisation
Learning to classify
entailment
Unsupervised
learning
Reinforcement
learning
Program synthesis
Action graphs
Event detection
Object
discovery
27. Learning to reason
Learning is to improve itself by experiencing ~ acquiring
knowledge & skills
Reasoning is to deduce knowledge from previously acquired knowledge in response to a query (or a cue)
Learning to reason is to improve the ability to decide if a knowledge base entails a predicate.
E.g., given a video f, determine whether the person with the hat turns before singing.
Hypotheses:
Reasoning as just-in-time program synthesis.
It employs conditional computation.
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
(Dan Roth; ACM
Fellow; IJCAI John
McCarthy Award)
28. Why neural reasoning?
Reasoning is not necessarily achieved by making
logical inferences
There is a continuity between [algebraically rich
inference] and [connecting together trainable
learning systems]
Central to reasoning is composition rules to guide
the combinations of modules to address new tasks
“When we observe a visual scene, when
we hear a complex sentence, we are
able to explain in formal terms the
relation of the objects in the scene, or
the precise meaning of the sentence
components. However, there is no
evidence that such a formal analysis
necessarily takes place: we see a scene,
we hear a sentence, and we just know
what they mean. This suggests the
existence of a middle layer, already a
form of reasoning, but not yet formal
or logical.”
Bottou, Léon. "From machine learning to machine
reasoning." Machine learning 94.2 (2014): 133-149.
29. The two approaches to neural reasoning
Implicit chaining of predicates through recurrence:
Step-wise query-specific attention to relevant concepts & relations.
Iterative concept refinement & combination, e.g., through a working
memory.
Answer is computed from the last memory state & question embedding.
Explicit program synthesis:
There is a set of modules, each performing a pre-defined operation.
The question is parsed into a symbolic program.
The program is implemented as a computational graph constructed by chaining separate modules.
The program is executed to compute an answer.
30. MACNet: Composition-Attention-Control
(reasoning by progressive refinement of selected data)
Hudson, Drew A., and Christopher D. Manning.
"Compositional attention networks for machine
reasoning." arXiv preprint arXiv:1803.03067 (2018).
31. LOGNet: Relational object reasoning with language binding
• Key insight: Reasoning is chaining of relational predicates to arrive
at a final conclusion
→ Needs to uncover spatial relations, conditioned on query
→ Chaining is query-driven
→ Objects/language need binding
→ Object semantics is query-dependent
→ Everything is end-to-end differentiable
System 1: visual
representation
System 2: High-level
reasoning
Thao Minh Le, Vuong Le, Svetha Venkatesh, and
Truyen Tran, “Dynamic Language Binding in
Relational Visual Reasoning”, IJCAI’20.
32. Language-binding Object Graph Network for VQA
Thao Minh Le, Vuong Le,
Svetha Venkatesh, and
Truyen Tran, “Dynamic
Language Binding in
Relational Visual
Reasoning”, IJCAI’20.
34. Transformer as implicit reasoning
Reasoning as (free-) energy minimisation
The classic Belief Propagation algorithm is a minimization algorithm for the Bethe free energy!
The Transformer's relational, iterative state refinement makes it a great candidate for implicit relational reasoning.
Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the bethe free
energy." Advances in neural information processing systems. 2003.
Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint
arXiv:2008.02217 (2020).
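To make the link between attention and energy minimisation concrete, here is a small sketch of the modern Hopfield update from Ramsauer et al.: the new state is a softmax-weighted sum of stored patterns, which has the same form as one attention read, and repeating it does not increase the associated energy. The energy below is written up to additive constants; the stored patterns, the starting state and `beta` are toy values assumed for illustration.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def energy(state, patterns, beta):
    """Energy up to constants: -logsumexp(beta * X s) / beta + 0.5 * ||s||^2."""
    scores = beta * (patterns @ state)
    lse = (np.max(scores) + np.log(np.sum(np.exp(scores - np.max(scores))))) / beta
    return -lse + 0.5 * (state @ state)

def hopfield_step(state, patterns, beta):
    """One update: attention over stored patterns (rows), then a weighted sum."""
    return softmax(beta * (patterns @ state)) @ patterns

# Toy setup (hypothetical): three stored patterns and a noisy query state.
patterns = np.eye(3)
state = np.array([0.9, 0.2, 0.1])
beta = 8.0

energies = [energy(state, patterns, beta)]
for _ in range(3):
    state = hopfield_step(state, patterns, beta)   # iterative state refinement
    energies.append(energy(state, patterns, beta))
```

Each pass through `hopfield_step` plays the role of one refinement layer: the state moves toward the stored pattern it most resembles while the energy goes down.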
36.
Anonymous, “Neural spatio-temporal reasoning with object-centric self-
supervised learning”, https://openreview.net/pdf?id=rEaz5uTcL6Q
37. NS-CL: Neuro-Symbolic Concept Learner
Mao, Jiayuan, et al. "The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision." International Conference on Learning Representations. 2019.
Figure label: Question parser
38. Concept learner
Extract object proposals from the image, from which a feature vector is obtained using RoI Align. Each object feature is denoted as $o_i$.
Object concepts of the same attribute are mapped into an embedding space. For example, sphere, cube, and cylinder are mapped into the shape embedding space. This mapping is a classification problem!
$$\Pr[\text{object } o_i \text{ is a cube}] = \sigma\left( \left( \langle \mathrm{ShapeOf}(o_i),\, v^{cube} \rangle - \gamma \right) / \tau \right)$$
where
$\mathrm{ShapeOf}(\cdot)$ is a neural network
$v^{cube}$ is the concept embedding of cube, to be learned
$\sigma$ is the sigmoid function
$\gamma$ and $\tau$ are scaling constants.
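The concept classification above can be sketched in a few lines. This is an illustration under stated assumptions, not the NS-CL code: the similarity ⟨·,·⟩ is taken to be cosine similarity, the input is assumed to already be the output of ShapeOf, and the default values `gamma=0.2` and `tau=0.1` are illustrative choices for the two scaling constants.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def concept_prob(shape_feature, concept_embedding, gamma=0.2, tau=0.1):
    """sigma((<ShapeOf(o_i), v_concept> - gamma) / tau), similarity assumed cosine."""
    return sigmoid((cosine(shape_feature, concept_embedding) - gamma) / tau)

# Toy embeddings (hypothetical): a 2-D shape space with a learned "cube" direction.
v_cube = np.array([1.0, 0.0])
p_cube_like = concept_prob(np.array([0.9, 0.1]), v_cube)   # nearly aligned feature
p_other = concept_prob(np.array([0.0, 1.0]), v_cube)       # orthogonal feature
```

The shift `gamma` sets how similar a feature must be before the concept is more likely than not, and `tau` controls how sharp that decision is.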
39. Program execution
Works on object-based visual representation
An intermediate set of objects is represented by a vector, as an attention mask over all objects in the scene. For example, Filter(Green_cube) outputs a mask (0,1,0,0).
The output mask is fed into the next module (e.g., Relate)
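Mask-passing execution can be sketched with a toy scene of four objects. Everything here is hypothetical and for illustration only: the concept scores in `scene`, the `left_of` relation matrix, and the module names `filter_`, `relate` and `query` stand in for learned modules, and the program is hand-written rather than produced by a question parser.

```python
import numpy as np

# Hypothetical per-object concept scores in [0, 1]; object 1 is the green cube.
scene = {
    "green": np.array([0.0, 1.0, 0.0, 1.0]),
    "red":   np.array([1.0, 0.0, 1.0, 0.0]),
    "cube":  np.array([1.0, 1.0, 0.0, 0.0]),
}
# left_of[i, j] = 1 means object j is to the left of object i (assumed layout).
left_of = np.zeros((4, 4))
left_of[1, 2] = 1.0

def filter_(mask, *concepts):
    """Keep only objects in the mask that match every given concept."""
    out = mask.copy()
    for concept in concepts:
        out = out * scene[concept]
    return out

def relate(mask, relation):
    """Move attention from the selected objects to the objects related to them."""
    return np.clip(mask @ relation, 0.0, 1.0)

def query(mask, concepts):
    """Return the concept that best describes the attended objects."""
    return max(concepts, key=lambda c: float(mask @ scene[c]))

# Program for: "What color is the object to the left of the green cube?"
all_objects = np.ones(4)
green_cube = filter_(all_objects, "green", "cube")
neighbours = relate(green_cube, left_of)
answer = query(neighbours, ["green", "red"])
```

Each module consumes and produces a mask over the same four objects, so modules can be chained in any order the program requires.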
40. Agenda
Deep learning 1.0: Classic models, Transformers, Graph neural networks, Unsupervised learning
Deep learning 2.0: Neural memories, Theory of mind, Neural reasoning, A system view
41. Contextualized recursive reasoning
Thus far, QA tasks are straightforward and
objective:
Questioner: I will ask about what I don’t know.
Answerer: I will answer what I know.
Real life can be tricky, more subjective:
Questioner: I will ask only questions I think
they can answer.
Answerer 1: This is what I think they want from
an answer.
Answerer 2: I will answer only what I think
they think I can.
Source: religious studies project
We need Theory of Mind to function socially.
42. Sally and Anne
1. Sally and Anne (Sally’s basket, Anne’s box)
2. Sally puts her cake into her basket
3. Sally goes out of the room.
4. Anne takes Sally’s cake out of Sally’s basket and puts this cake into Anne’s box
5. Sally comes back to the room
Photo: wikipedia
43. Social dilemma: Stag Hunt games
Difficult decision: individual outcomes (selfish) or group outcomes
(cooperative).
Together hunt Stag (both are cooperative): Both have more meat.
Solely hunt Hare (both are selfish): Both have less meat.
One hunts Stag (cooperative), other hunts Hare (selfish): Only the one who hunts Hare has meat.
Human evidence: Self-interested but considerate of others (cultures vary).
Idea: Belief-based guilt-aversion
One experiences loss if it lets the other down.
Necessitates Theory of Mind: reasoning about the other’s mind.
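The dilemma can be made concrete with a small payoff table. The numbers below are assumptions chosen only to respect the ordering described on the slide (joint Stag gives the most meat, joint Hare gives less, a lone Stag hunter gets nothing); they are not taken from the cited work.

```python
# Hypothetical payoffs: (my_reward, other_reward) keyed by (my_action, other_action).
PAYOFF = {
    ("stag", "stag"): (4, 4),   # both cooperative: both have more meat
    ("stag", "hare"): (0, 1),   # only the one who hunts hare has meat
    ("hare", "stag"): (1, 0),
    ("hare", "hare"): (1, 1),   # both selfish: both have less meat
}
ACTIONS = ("stag", "hare")

def best_response(other_action):
    """Action that maximises my own reward given what the other does."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, other_action)][0])

def pure_nash_equilibria():
    """Action pairs where neither player gains by deviating alone (symmetric game)."""
    return [(a, b) for a in ACTIONS for b in ACTIONS
            if best_response(b) == a and best_response(a) == b]

equilibria = pure_nash_equilibria()
```

Both joint Stag and joint Hare are stable under these payoffs, so the right choice depends entirely on what each hunter believes the other will do, which is where Theory of Mind enters.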
44. A neural theory of mind
Figure labels: successor representations, next-step action probability, goal
Rabinowitz, Neil C., et al.
"Machine theory of
mind." arXiv preprint
arXiv:1802.07740 (2018).
45. Theory of Mind Agent with Guilt Aversion (ToMAGA)
Update Theory of Mind
Predict whether the other’s behaviour is cooperative or uncooperative
Update the zero-order belief (what the other will do)
Update the first-order belief (what the other thinks about me)
Guilt Aversion
Compute the expected material reward of the other based on Theory of Mind
Compute the psychological rewards, i.e., “feeling guilty”
Reward shaping: subtract the expected loss of the other.
Nguyen, Dung, et al. "Theory of Mind with Guilt
Aversion Facilitates Cooperative Reinforcement
Learning." Asian Conference on Machine Learning.
PMLR, 2020.
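The reward-shaping step can be sketched as below. This is a simplified reading of the slide, not the formulation in the cited paper: the payoff numbers, the use of a single probability as the first-order belief, the `guilt_sensitivity` weight and the function names are all assumptions, and the belief updates themselves are omitted.

```python
# Hypothetical payoffs: (my_reward, other_reward) keyed by (my_action, other_action).
PAYOFF = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 1),
    ("hare", "stag"): (1, 0),
    ("hare", "hare"): (1, 1),
}

def expected_other_reward(other_action, first_order_belief):
    """Material reward the other expects, given its belief that I will hunt stag."""
    p = first_order_belief
    return (p * PAYOFF[("stag", other_action)][1]
            + (1.0 - p) * PAYOFF[("hare", other_action)][1])

def shaped_reward(my_action, other_action, first_order_belief, guilt_sensitivity=1.0):
    """Material reward minus guilt, where guilt is the other's expected loss."""
    mine, theirs = PAYOFF[(my_action, other_action)]
    expected = expected_other_reward(other_action, first_order_belief)
    guilt = max(0.0, expected - theirs)      # only letting the other down counts
    return mine - guilt_sensitivity * guilt

# The other hunts stag and believes with probability 0.9 that I will too.
reward_if_cooperate = shaped_reward("stag", "stag", first_order_belief=0.9)
reward_if_defect = shaped_reward("hare", "stag", first_order_belief=0.9)
```

With this shaping, defecting against a trusting partner yields a lower reward than cooperating, so an otherwise self-interested learner is pushed toward joint Stag.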
46. System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2:
Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
• Hypothetical thought
• Decoupled from data rep
Single
Memory
• Facts
• Semantics
• Events and relational
associations
• Working space –
temporal buffer
Pattern
recognition
Reasoning
47. Summary
Deep learning 1.0: Classic models, Transformers, Graph neural networks, Unsupervised learning
Deep learning 2.0: Neural memories, Theory of mind, Neural reasoning, A system view
49. References
Anonymous, “Neural spatio-temporal reasoning with object-centric self-supervised learning”,
https://openreview.net/pdf?id=rEaz5uTcL6Q
Bello, Irwan, et al. "Neural optimizer search with reinforcement learning." arXiv preprint arXiv:1709.07417 (2017).
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE
transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.
Bottou, Léon. "From machine learning to machine reasoning." Machine learning 94.2 (2014): 133-149.
Dehghani, Mostafa, et al. "Universal Transformers." International Conference on Learning Representations. 2018.
Kien Do, Truyen Tran, and Svetha Venkatesh. "Graph Transformation Policy Network for Chemical Reaction
Prediction." KDD’19.
Kien Do, Truyen Tran, Svetha Venkatesh, “Learning deep matrix representations”, arXiv preprint arXiv:1703.01454
Gilmer, Justin, et al. "Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017).
Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." arXiv preprint arXiv:1609.09106 (2016).
Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the bethe free energy." Advances in
neural information processing systems. 2003.
Hudson, Drew A., and Christopher D. Manning. "Compositional attention networks for machine reasoning." arXiv preprint
arXiv:1803.03067 (2018).
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and
variation. arXiv preprint arXiv:1710.10196.
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725.
Hung Le, Truyen Tran, Svetha Venkatesh, “Self-attentive associative memory”, ICML'20.
Hung Le, Truyen Tran, Svetha Venkatesh, “Neural stored-program memory”, ICLR'20.
50. References (cont.)
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language Binding in Relational Visual Reasoning”, IJCAI’20.
Le-Khac, Phuc H., Graham Healy, and Alan F. Smeaton. "Contrastive Representation Learning: A Framework and Review." arXiv preprint arXiv:2010.05113 (2020).
Liu, Xiao, et al. "Self-supervised learning: Generative or contrastive." arXiv preprint arXiv:2006.08218 (2020).
Marcus, Gary. "Deep learning: A critical appraisal." arXiv preprint arXiv:1801.00631 (2018).
Mao, Jiayuan, et al. "The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision." International Conference on Learning Representations. 2019.
Nguyen, Dung, et al. "Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement Learning." Asian Conference on Machine Learning. PMLR, 2020.
Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90.
Pham, Trang, et al. "Column Networks for Collective Classification." AAAI. 2017.
Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).
Rabinowitz, Neil C., et al. "Machine theory of mind." arXiv preprint arXiv:1802.07740 (2018).
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020).
Xie, Tian, and Jeffrey C. Grossman. "Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties." Physical review letters 120.14 (2018): 145301.
You, Jiaxuan, et al. "GraphRNN: Generating realistic graphs with deep auto-regressive models." ICML (2018).