4. Glossary of AI terms
From Roger Parloff, WHY DEEP LEARNING IS SUDDENLY CHANGING YOUR LIFE (Fortune, 2016).
5. Definitions
What is AI?
“Artificial intelligence is that activity devoted to making machines
intelligent, and intelligence is that quality that enables an entity to
function appropriately and with foresight in its environment.”
Nils J. Nilsson, The Quest for Artificial Intelligence: A History of Ideas and Achievements (Cambridge, UK: Cambridge University Press, 2010).
“a computerized system that exhibits behavior that is commonly thought
of as requiring intelligence”
Executive Office of the President, National Science and Technology Council, Committee on Technology: PREPARING FOR THE FUTURE OF ARTIFICIAL INTELLIGENCE (2016).
“any technique that enables computers to mimic human intelligence”
Roger Parloff, WHY DEEP LEARNING IS SUDDENLY CHANGING YOUR LIFE (Fortune, 2016).
6. My diagram of AI terms
(Diagram: an AI system y = f(x) learns from its environment (data, rules, feedback, ...) through teaching, self-learning, or engineering.)
14. 5 Tribes of AI researchers
● Symbolists (rule- and logic-based)
● Connectionists (PDP assumption)
● Bayesians
● Evolutionists
● Analogizers
15. Deep learning has had a long and rich history!
● 3 re-brandings.
○ Cybernetics ( 1940s ~ 1960s )
○ Artificial Neural Networks ( 1980s ~ 1990s)
○ Deep learning ( 2006 ~ )
16. Nothing new!
● AlexNet (2012)
○ based on CNNs (LeCun, 1989)
● AlphaGo
○ based on reinforcement learning and MCTS (Sutton, 1998)
17. So, why now?
● Computing power
● Large labelled datasets
● Algorithmic advances
18. Size of neural networks
From Ian Goodfellow, Deep Learning (MIT Press, 2016).
Singularity or Transcendence?
20.-24. Brief history of deep learning
From Roger Parloff, WHY DEEP LEARNING IS SUDDENLY CHANGING YOUR LIFE (Fortune, 2016).
(Timeline figure, highlighted stage by stage: 1st Boom, 1st Winter, 2nd Boom, 2nd Winter, 3rd Boom.)
25. So, when is the 3rd winter?
Nope!
● Features are mandatory in every AI problem.
● Deep learning is cheap learning!
(Even if someone disproves the PDP assumptions, deep learning remains the best practical tool for representation learning.)
26. Biz trends after Oct. 2012
● 4 big players leading this sector.
● Bloody hiring war.
○ Compensation along the lines of NFL football players.
27. Biz trends after Oct. 2012
● 2 leading research firms.
● 60+ startups.
38. So what can we do with AI?
● Simply put, it's sophisticated software that writes software.
● True personalization at scale!
39. Is AI really necessary?
“a lot of S&P 500 CEOs wished they had started
thinking sooner than they did about their Internet
strategy. I think five years from now there will be
a number of S&P 500 CEOs that will wish
they’d started thinking earlier about their AI
strategy.”
“AI is the new electricity, just as 100 years ago
electricity transformed industry after industry, AI
will now do the same.”
Andrew Ng, Chief Scientist at Baidu Research.
53. Parameters of convolution
● Kernel size
○ (row, col, in_channel, out_channel)
● Padding
○ SAME, VALID, FULL
● Stride
○ If S > 1, use an even kernel size with F > 2S (see the sketch below)
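A minimal sketch of these three parameters, assuming TensorFlow (the (row, col, in_channel, out_channel) kernel layout above matches the tf.nn.conv2d filter convention); the shapes are only illustrative:

import tensorflow as tf

# Illustrative shapes: a 3x3 kernel mapping 3 -> 16 channels,
# applied with stride 2 and the two common padding modes.
x = tf.random.normal([1, 32, 32, 3])        # NHWC input
kernel = tf.random.normal([3, 3, 3, 16])    # (row, col, in_channel, out_channel)

same_out = tf.nn.conv2d(x, kernel, strides=2, padding='SAME')    # -> (1, 16, 16, 16)
valid_out = tf.nn.conv2d(x, kernel, strides=2, padding='VALID')  # -> (1, 15, 15, 16)
print(same_out.shape, valid_out.shape)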
54. 1-dimensional convolution
(Diagram: a 1-D convolution with kernel size F = 3 and padding P = 1, shown with stride S = 1 and stride S = 2; an output-length helper is sketched below.)
● 'SAME' (or 'HALF') pad size = (F - 1) / 2 per side (odd F, stride 1)
● 'VALID' pad size = 0
● 'FULL' pad size = F - 1 per side: not used nowadays
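A small helper, as a sketch, that turns these padding modes into output lengths for a 1-D convolution (names and formulas follow the definitions above):

import math

def conv1d_out_len(n, F, S, padding):
    # Output length of a 1-D convolution over n inputs with kernel F, stride S.
    if padding == 'SAME':      # pad so the output length is ceil(n / S)
        return math.ceil(n / S)
    if padding == 'VALID':     # no padding at all
        return (n - F) // S + 1
    if padding == 'FULL':      # pad F - 1 on each side (rarely used nowadays)
        return (n + F - 2) // S + 1
    raise ValueError(padding)

print(conv1d_out_len(7, F=3, S=1, padding='SAME'))   # 7
print(conv1d_out_len(7, F=3, S=2, padding='VALID'))  # 3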
61. Pooling vs. Striding
● The same in the downsampling aspect
● But different in the location aspect
○ Location information is lost in pooling
○ Location information is preserved in striding
● Nowadays, striding is more popular
○ A kind of learnable pooling (see the sketch below)
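A minimal Keras sketch of the two options, assuming the tf.keras API; both halve the spatial size, but only the strided convolution has weights to learn:

from tensorflow import keras
from tensorflow.keras import layers

x = keras.Input(shape=(32, 32, 64))

# Fixed downsampling: 2x2 max pooling, no parameters, exact location is discarded.
pooled = layers.MaxPooling2D(pool_size=2)(x)                   # -> (16, 16, 64)

# Learnable downsampling: 3x3 convolution with stride 2 ("learnable pooling").
strided = layers.Conv2D(64, 3, strides=2, padding='same')(x)   # -> (16, 16, 64)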
62. Kernel initialization
● Random numbers between -1 and 1
○ Orthogonality (i.i.d.)
○ Uniform or Gaussian random
● Scale is paramount
○ Adjust it so that output (activation) values have mean 0 and variance 1
○ If you encounter NaNs, a badly scaled initialization may be the cause
65. Initialization guide
● Xavier (or Glorot) initialization
○ http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
● He initialization
○ Good for ReLU nonlinearity
○ https://arxiv.org/abs/1502.01852
● Use batch normalization if possible
○ Largely insensitive to ill-scaled initialization
(Both initializers are sketched below.)
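A minimal NumPy sketch of the two initializers, under the usual fan-in/fan-out definitions (dimensions are illustrative):

import numpy as np

def xavier_init(fan_in, fan_out):
    # Glorot/Xavier: variance 2 / (fan_in + fan_out), suits tanh-like units.
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))

def he_init(fan_in, fan_out):
    # He: variance 2 / fan_in, suits ReLU nonlinearities.
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

# Sanity check: with He init, ReLU activations keep a roughly constant scale.
x = np.random.randn(1000, 512)
h = np.maximum(0.0, x @ he_init(512, 512))
print(h.mean(), h.var())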
67. Guide
● Start from a robust baseline
○ 3 choices
■ VGG, Inception-v3, ResNet
● Smaller kernels, deeper networks
● Trend toward getting rid of POOL and the final dense layer
● BN and skip connections are popular
82. Summary
● Start from ResNet-50
● Use He initialization
● Learning rate: 0.001 (with BN), 0.0001 (without BN)
● Use the Adam optimizer (beta_1 should be < beta_2); config sketched below
○ beta_1 = 0.9, beta_2 = 0.999 (for easy training)
○ beta_1 = 0.5, beta_2 = 0.95 (for hard training)
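A Keras sketch of the settings above, reading the slide's alpha/beta as Adam's beta_1/beta_2 (an interpretation, since Adam's own alpha is the learning rate):

from tensorflow import keras

# "Easy" training, network contains batch normalization.
easy_opt = keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)

# "Hard" training (e.g. no BN or an unstable loss): lower lr, smaller betas.
hard_opt = keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.95)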
83. Summary
● Minimize hyper-parameter tuning and architecture modification
○ Deep learning is highly nonlinear and counter-intuitive
○ Grid or random search is expensive
94. Augmentation
● 3 types of augmentation
○ Training data augmentation
○ Evaluation augmentation
○ Label augmentation
● Augmentation is mandatory
○ If you have really big data, then augment the data and increase model capacity
95. Training Augmentation
● Random crop/scale (see the sketch below)
○ Random L in range [256, 480]
○ Resize the training image so that its short side = L
○ Sample a random 224x224 patch
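A TensorFlow sketch of this scale-then-crop augmentation; tf.image is assumed, the image is assumed RGB, and the bounds follow the slide:

import tensorflow as tf

def random_scale_crop(image, crop=224, lo=256, hi=480):
    # Resize so the short side is a random L in [lo, hi], then cut a random
    # crop x crop patch (the training augmentation described above).
    L = tf.random.uniform([], lo, hi + 1, dtype=tf.int32)
    hw = tf.shape(image)[:2]
    scale = tf.cast(L, tf.float32) / tf.cast(tf.reduce_min(hw), tf.float32)
    new_hw = tf.cast(tf.cast(hw, tf.float32) * scale, tf.int32)
    image = tf.image.resize(image, new_hw)
    return tf.image.random_crop(image, [crop, crop, 3])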
99. Testing Augmentation
● Multi-scale testing (see the sketch below)
○ A fully convolutional network is mandatory
○ Random L in range [224, 640]
○ Resize the test image so that its short side = L
○ Average (or max) the scores
● Used in ResNet
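A sketch of multi-scale testing; model is an assumed fully convolutional classifier returning per-image class scores (e.g. after global average pooling), and the scale set is illustrative:

import tensorflow as tf

def multiscale_scores(model, image, scales=(224, 320, 480, 640)):
    # Resize the test image so its short side equals each L, run the model,
    # and average the class scores over all scales.
    scores = []
    for L in scales:
        hw = tf.cast(tf.shape(image)[:2], tf.float32)
        ratio = tf.cast(L, tf.float32) / tf.reduce_min(hw)
        resized = tf.image.resize(image, tf.cast(hw * ratio, tf.int32))
        scores.append(model(resized[tf.newaxis, ...]))
    return tf.reduce_mean(tf.stack(scores), axis=0)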
108. Simple recipe
● CE loss
● L2 (MSE) loss
● Joint learning (multi-task learning) or separate learning (see the sketch below)
From: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
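A minimal sketch of the joint (multi-task) option: one cross-entropy head and one L2/MSE head combined into a single loss; the weighting factor lam and the argument names are assumptions, not part of the recipe above:

import tensorflow as tf

def joint_loss(class_logits, class_labels, reg_pred, reg_target, lam=1.0):
    # Multi-task loss: cross-entropy for the classification head plus a
    # weighted L2 (MSE) term for the regression head.
    ce = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=class_labels, logits=class_logits))
    mse = tf.reduce_mean(tf.square(reg_pred - reg_target))
    return ce + lam * mse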
131. ESPCN (Efficient Sub-pixel CNN)
(Figure: periodic shuffle; see the sketch below.)
Wenzhe Shi et al., Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, 2016.
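The periodic shuffle can be expressed with TensorFlow's depth-to-space rearrangement; a sketch with an assumed upscale factor r = 3 and illustrative shapes:

import tensorflow as tf

# Periodic shuffle: rearrange r*r channels of a low-resolution map into an
# r-times larger spatial grid (sub-pixel convolution, upscale factor r).
r = 3
low_res = tf.random.normal([1, 32, 32, r * r])            # NHWC, r*r channels
high_res = tf.nn.depth_to_space(low_res, block_size=r)    # -> (1, 96, 96, 1)
print(high_res.shape)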
132. L2 loss issue
Christian Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2016.
139. Summary
● Model temporal motion locally ( 3D CONV )
● Model temporal motion globally ( RNN )
● Hybrids of both
● IMHO, RNNs will be replaced by dilated 1D convolutions (atrous convolution); see the sketch below
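A minimal Keras sketch of such a dilated 1-D convolution stack for temporal modelling; the layer widths and dilation schedule are only illustrative:

from tensorflow import keras
from tensorflow.keras import layers

# Stacked dilated (atrous) 1-D convolutions: the receptive field grows
# exponentially with depth, covering long temporal context without an RNN.
seq = keras.Input(shape=(None, 64))            # (time, features)
x = seq
for dilation in (1, 2, 4, 8):
    x = layers.Conv1D(64, kernel_size=3, padding='causal',
                      dilation_rate=dilation, activation='relu')(x)
temporal_model = keras.Model(seq, x)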
150. Results
(From I. J. Goodfellow et al., Generative Adversarial Networks, 2014.)
(From D. P. Kingma et al., Auto-Encoding Variational Bayes, 2013.)
151. Pitfalls of GAN
● Very difficult to train.
○ No guarantee of reaching a Nash equilibrium.
■ Tim Salimans et al., Improved Techniques for Training GANs, 2016.
■ Junbo Zhao et al., Energy-based Generative Adversarial Network, 2016.
● Cannot control the generated data.
○ How can we condition the generating function G(z)?
152. InfoGAN
Xi Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016 (https://arxiv.org/abs/1606.03657)
● Adds a mutual-information regularizer to the original GAN to induce interpretable latent codes (see the sketch below).
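A sketch of that regularizer for one categorical code: the generator receives [z, c] and an auxiliary head Q tries to recover c from the generated sample; the cross-entropy below lower-bounds the mutual information I(c; G(z, c)). Here generator and q_head are assumed Keras models, and the dimensions are illustrative:

import tensorflow as tf

def infogan_mi_loss(generator, q_head, batch_size, z_dim=62, n_codes=10):
    # Sample noise z and a categorical latent code c, generate from [z, c],
    # then penalise Q for failing to recover c from the generated sample.
    z = tf.random.normal([batch_size, z_dim])
    c = tf.random.uniform([batch_size], 0, n_codes, dtype=tf.int32)
    gen_input = tf.concat([z, tf.one_hot(c, n_codes)], axis=1)
    fake = generator(gen_input)
    code_logits = q_head(fake)                 # shape (batch_size, n_codes)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=c, logits=code_logits))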
159. Features of GAN
● Unsupervised
○ No labelled data used
● End-to-end
○ No human feature engineering
○ No hand-crafted priors or assumptions
● High fidelity
○ Automatic, highly non-linear pattern finding
⇒ Currently the SOTA in image generation.