|
|
Preface |
|
xxvii | |
P.1 Emphasis on Foundations |
|
xxvii | |
P.2 Glimpse of History |
|
xxix | |
P.3 Organization of the Text |
|
xxxi | |
P.4 How to Use the Text |
|
xxxiv | |
P.5 Simulation Datasets |
|
xxxvii | |
P.6 Acknowledgments |
|
xl | |
Notation |
|
xlv | |
|
50 Least-Squares Problems |
|
|
2165 | (56) |
|
|
2165 | (5) |
|
|
2170 | (17) |
|
50.3 Recursive Least-Squares |
|
|
2187 | (8) |
|
|
2195 | (2) |
|
50.5 Commentaries and Discussion |
|
|
2197 | (13) |
|
|
2202 | (8) |
|
50.A Minimum-Norm Solution |
|
|
2210 | (1) |
|
50.B Equivalence in Linear Estimation |
|
|
2211 | (1) |
|
50.C Extended Least-Squares |
|
|
2212 | (9) |
|
|
2217 | (4) |
|
|
2221 | (39) |
|
|
2222 | (3) |
|
|
2225 | (5) |
|
|
2230 | (4) |
|
|
2234 | (8) |
|
51.5 Commentaries and Discussion |
|
|
2242 | (8) |
|
|
2245 | (5) |
|
51.A Constrained Formulations for Regularization |
|
|
2250 | (3) |
|
51.B Expression for LASSO Solution |
|
|
2253 | (7) |
|
|
2257 | (3) |
|
|
2260 | (30) |
|
|
2262 | (3) |
|
|
2265 | (3) |
|
52.3 Performance Guarantee |
|
|
2268 | (2) |
|
|
2270 | (9) |
|
52.5 Commentaries and Discussion |
|
|
2279 | (5) |
|
|
2282 | (2) |
|
52.A Performance of the NN Classifier |
|
|
2284 | (6) |
|
|
2287 | (3) |
|
|
2290 | (23) |
|
|
2290 | (3) |
|
|
2293 | (9) |
|
|
2302 | (8) |
|
53.4 Commentaries and Discussion |
|
|
2310 | (3) |
|
|
2310 | (1) |
|
|
2311 | (2) |
|
|
2313 | (28) |
|
54.1 Trees and Attributes |
|
|
2313 | (4) |
|
54.2 Selecting Attributes |
|
|
2317 | (10) |
|
|
2327 | (8) |
|
54.4 Commentaries and Discussion |
|
|
2335 | (6) |
|
|
2337 | (1) |
|
|
2338 | (3) |
|
55 Naive Bayes Classifier |
|
|
2341 | (16) |
|
55.1 Independence Condition |
|
|
2341 | (2) |
|
55.2 Modeling the Conditional Distribution |
|
|
2343 | (1) |
|
55.3 Estimating the Priors |
|
|
2344 | (7) |
|
55.4 Gaussian Naive Classifier |
|
|
2351 | (1) |
|
55.5 Commentaries and Discussion |
|
|
2352 | (5) |
|
|
2354 | (2) |
|
|
2356 | (1) |
|
56 Linear Discriminant Analysis |
|
|
2357 | (26) |
|
56.1 Discriminant Functions |
|
|
2357 | (3) |
|
56.2 Linear Discriminant Algorithm |
|
|
2360 | (2) |
|
56.3 Minimum Distance Classifier |
|
|
2362 | (3) |
|
56.4 Fisher Discriminant Analysis |
|
|
2365 | (13) |
|
56.5 Commentaries and Discussion |
|
|
2378 | (5) |
|
|
2379 | (2) |
|
|
2381 | (2) |
|
57 Principal Component Analysis |
|
|
2383 | (41) |
|
|
2383 | (2) |
|
57.2 Dimensionality Reduction |
|
|
2385 | (11) |
|
57.3 Subspace Interpretations |
|
|
2396 | (3) |
|
|
2399 | (5) |
|
|
2404 | (7) |
|
57.6 Commentaries and Discussion |
|
|
2411 | (6) |
|
|
2414 | (3) |
|
57.A Maximum Likelihood Solution |
|
|
2417 | (4) |
|
57.B Alternative Optimization Problem |
|
|
2421 | (3) |
|
|
2422 | (2) |
|
|
2424 | (33) |
|
58.1 Learning Under Regularization |
|
|
2425 | (5) |
|
58.2 Learning Under Constraints |
|
|
2430 | (2) |
|
|
2432 | (3) |
|
58.4 Nonnegative Matrix Factorization |
|
|
2435 | (8) |
|
58.5 Commentaries and Discussion |
|
|
2443 | (5) |
|
|
2446 | (2) |
|
58.A Orthogonal Matching Pursuit |
|
|
2448 | (9) |
|
|
2454 | (3) |
|
|
2457 | (42) |
|
|
2457 | (2) |
|
59.2 Logistic Empirical Risk |
|
|
2459 | (5) |
|
59.3 Multiclass Classification |
|
|
2464 | (7) |
|
|
2471 | (5) |
|
|
2476 | (8) |
|
59.6 Commentaries and Discussion |
|
|
2484 | (8) |
|
|
2488 | (4) |
|
59.A Generalized Linear Models |
|
|
2492 | (7) |
|
|
2496 | (3) |
|
|
2499 | (31) |
|
|
2499 | (2) |
|
60.2 Perceptron Empirical Risk |
|
|
2501 | (6) |
|
60.3 Termination in Finite Steps |
|
|
2507 | (2) |
|
|
2509 | (4) |
|
60.5 Commentaries and Discussion |
|
|
2513 | (7) |
|
|
2517 | (3) |
|
|
2520 | (6) |
|
|
2526 | (4) |
|
|
2528 | (2) |
|
61 Support Vector Machines |
|
|
2530 | (27) |
|
|
2530 | (11) |
|
61.2 Convex Quadratic Program |
|
|
2541 | (5) |
|
|
2546 | (5) |
|
61.4 Commentaries and Discussion |
|
|
2551 | (6) |
|
|
2553 | (1) |
|
|
2554 | (3) |
|
|
2557 | (30) |
|
|
2557 | (4) |
|
|
2561 | (11) |
|
|
2572 | (8) |
|
62.4 Commentaries and Discussion |
|
|
2580 | (7) |
|
|
2581 | (3) |
|
|
2584 | (3) |
|
|
2587 | (63) |
|
|
2587 | (3) |
|
|
2590 | (2) |
|
63.3 Polynomial and Gaussian Kernels |
|
|
2592 | (3) |
|
63.4 Kernel-Based Perceptron |
|
|
2595 | (9) |
|
|
2604 | (6) |
|
63.6 Kernel-Based Ridge Regression |
|
|
2610 | (3) |
|
63.7 Kernel-Based Learning |
|
|
2613 | (5) |
|
|
2618 | (5) |
|
63.9 Inference under Gaussian Processes |
|
|
2623 | (11) |
|
63.10 Commentaries and Discussion |
|
|
2634 | (16) |
|
|
2640 | (6) |
|
|
2646 | (4) |
|
|
2650 | (65) |
|
64.1 Curse of Dimensionality |
|
|
2650 | (4) |
|
64.2 Empirical Risk Minimization |
|
|
2654 | (3) |
|
64.3 Generalization Ability |
|
|
2657 | (5) |
|
|
2662 | (1) |
|
64.5 Bias-Variance Trade-off |
|
|
2663 | (4) |
|
64.6 Surrogate Risk Functions |
|
|
2667 | (5) |
|
64.7 Commentaries and Discussion |
|
|
2672 | (14) |
|
|
2679 | (7) |
|
64.A VC Dimension for Linear Classifiers |
|
|
2686 | (2) |
|
|
2688 | (6) |
|
64.C Vapnik-Chervonenkis Bound |
|
|
2694 | (7) |
|
64.D Rademacher Complexity |
|
|
2701 | (14) |
|
|
2711 | (4) |
|
65 Feedforward Neural Networks |
|
|
2715 | (82) |
|
65.1 Activation Functions |
|
|
2716 | (5) |
|
65.2 Feedforward Networks |
|
|
2721 | (7) |
|
65.3 Regression and Classification |
|
|
2728 | (3) |
|
65.4 Calculation of Gradient Vectors |
|
|
2731 | (8) |
|
65.5 Backpropagation Algorithm |
|
|
2739 | (11) |
|
|
2750 | (4) |
|
65.7 Regularized Cross-Entropy Risk |
|
|
2754 | (14) |
|
65.8 Slowdown in Learning |
|
|
2768 | (1) |
|
|
2769 | (7) |
|
65.10 Commentaries and Discussion |
|
|
2776 | (11) |
|
|
2781 | (6) |
|
65.A Derivation of Batch Normalization Algorithm |
|
|
2787 | (10) |
|
|
2792 | (5) |
|
|
2797 | (41) |
|
66.1 Pre-Training Using Stacked Autoencoders |
|
|
2797 | (5) |
|
66.2 Restricted Boltzmann Machines |
|
|
2802 | (7) |
|
66.3 Contrastive Divergence |
|
|
2809 | (11) |
|
66.4 Pre-Training Using Stacked RBMs |
|
|
2820 | (3) |
|
66.5 Deep Generative Model |
|
|
2823 | (7) |
|
66.6 Commentaries and Discussion |
|
|
2830 | (8) |
|
|
2834 | (2) |
|
|
2836 | (2) |
|
67 Convolutional Networks |
|
|
2838 | (67) |
|
|
2839 | (21) |
|
|
2860 | (9) |
|
|
2869 | (7) |
|
|
2876 | (9) |
|
67.5 Commentaries and Discussion |
|
|
2885 | (3) |
|
|
2887 | (1) |
|
67.A Derivation of Training Algorithm |
|
|
2888 | (17) |
|
|
2903 | (2) |
|
|
2905 | (62) |
|
68.1 Variational Autoencoders |
|
|
2905 | (8) |
|
68.2 Training Variational Autoencoders |
|
|
2913 | (17) |
|
68.3 Conditional Variational Autoencoders |
|
|
2930 | (5) |
|
68.4 Generative Adversarial Networks |
|
|
2935 | (8) |
|
|
2943 | (13) |
|
|
2956 | (4) |
|
68.7 Commentaries and Discussion |
|
|
2960 | (7) |
|
|
2963 | (1) |
|
|
2964 | (3) |
|
|
2967 | (75) |
|
69.1 Recurrent Neural Networks |
|
|
2967 | (6) |
|
69.2 Backpropagation Through Time |
|
|
2973 | (22) |
|
69.3 Bidirectional Recurrent Networks |
|
|
2995 | (7) |
|
69.4 Vanishing and Exploding Gradients |
|
|
3002 | (2) |
|
69.5 Long Short-Term Memory Networks |
|
|
3004 | (22) |
|
|
3026 | (8) |
|
69.7 Gated Recurrent Units |
|
|
3034 | (2) |
|
69.8 Commentaries and Discussion |
|
|
3036 | (6) |
|
|
3037 | (3) |
|
|
3040 | (2) |
|
|
3042 | (23) |
|
|
3042 | (4) |
|
70.2 Sensitivity Analysis |
|
|
3046 | (3) |
|
70.3 Gradient × Input Analysis |
|
|
3049 | (1) |
|
|
3050 | (10) |
|
70.5 Commentaries and Discussion |
|
|
3060 | (5) |
|
|
3061 | (1) |
|
|
3062 | (3) |
|
|
3065 | (34) |
|
|
3066 | (4) |
|
71.2 Fast Gradient Sign Method |
|
|
3070 | (5) |
|
71.3 Jacobian Saliency Map Approach |
|
|
3075 | (3) |
|
|
3078 | (10) |
|
|
3088 | (3) |
|
|
3091 | (2) |
|
71.7 Commentaries and Discussion |
|
|
3093 | (6) |
|
|
3095 | (1) |
|
|
3096 | (3) |
|
|
3099 | (50) |
|
|
3099 | (2) |
|
|
3101 | (11) |
|
|
3112 | (6) |
|
|
3118 | (18) |
|
72.5 Commentaries and Discussion |
|
|
3136 | (2) |
|
|
3136 | (2) |
|
|
3138 | (6) |
|
72.B Prototypical Networks |
|
|
3144 | (5) |
|
|
3146 | (3) |
Author Index |
|
3149 | (24) |
Subject Index |
|
3173 | |