25. References
• (Hoffer et al., 2017): Train Longer, Generalize Better: Closing the
Generalization Gap in Large Batch Training of Neural Networks.
• (Wilson et al., 2017): The Value of Adaptive Gradient Methods in
Machine Learning.
• (Dinh et al., 2017): Sharp Minima Can Generalize for Deep Nets.
• (Keskar et al., 2017): On Large-Batch Training for Deep Learning:
Generalization Gap and Sharp Minima.
• (Chaudhari et al., 2017): Entropy-SGD: Biasing Gradient Descent
into Wide Valleys.
• (Hochreiter & Schmidhuber, 1997): Flat Minima.