1 Introduction and Motivation  1
  1.1 Introduction to Neural Networks  2
    1.1.2 Tasks Where Neural Networks Succeed  3
  1.2 Theoretical Contributions to Neural Networks  4
    1.2.1 Universal Approximation Properties  4
    1.2.2 Vanishing and Exploding Gradients  5
  1.3 Mathematical Representations  7
2 Mathematical Preliminaries  11
  2.1 Linear Maps, Bilinear Maps, and Adjoints  12
  2.3 Parameter-Dependent Maps  15
    2.3.2 Higher-Order Derivatives  16
  2.4 Elementwise Functions  17
    2.4.2 Derivatives of Elementwise Functions  19
    2.4.3 The Softmax and Elementwise Log Functions  20
3 Generic Representation of Neural Networks  23
  3.1 Neural Network Formulation  24
  3.2 Loss Functions and Gradient Descent  25
    3.2.4 Gradient Descent Step Algorithm  28
  3.3 Higher-Order Loss Function  29
    3.3.1 Gradient Descent Step Algorithm  32
4 Specific Network Descriptions  35
  4.1 Multilayer Perceptron  36
    4.1.2 Single-Layer Derivatives  37
    4.1.3 Loss Functions and Gradient Descent  38
  4.2 Convolutional Neural Networks  40
    4.2.1 Single-Layer Formulation  40
    4.2.3 Single-Layer Derivatives  50
    4.2.4 Gradient Descent Step Algorithm  51
    4.3.2 Single-Layer Formulation  53
    4.3.3 Single-Layer Derivatives  54
    4.3.4 Loss Functions and Gradient Descent  55
5 Recurrent Neural Networks  59
  5.1 Generic RNN Formulation  59
    5.1.2 Hidden States, Parameters, and Forward Propagation  60
    5.1.3 Prediction and Loss Functions  62
    5.1.4 Loss Function Gradients  62
    5.2.2 Single-Layer Derivatives  71
    5.2.3 Backpropagation Through Time  72
    5.2.4 Real-Time Recurrent Learning  74
6 Conclusion and Future Work  81
Glossary  83