Preface |
|
xvii | |
Acknowledgements |
|
xxi | |
1 Introduction |
|
1 | (35) |
|
1.1 Towards Intelligent Machines |
|
|
1 | (4) |
|
1.2 Well-Posed Machine Learning Problems |
|
|
5 | (2) |
|
1.3 Examples of Applications in Diverse Fields |
|
|
7 | (5) |
|
|
12 | (6) |
|
1.4.1 Time Series Forecasting |
|
|
15 | (2) |
|
1.4.2 Datasets for Toy (Unrealistically Simple) and Realistic Problems |
|
|
17 | (1) |
|
1.5 Domain Knowledge for Productive Use of Machine Learning |
|
|
18 | (2) |
|
1.6 Diversity of Data: Structured/Unstructured |
|
|
20 | (1) |
|
|
21 | (4) |
|
1.7.1 Supervised/Directed Learning |
|
|
21 | (1) |
|
1.7.2 Unsupervised/Undirected Learning |
|
|
22 | (1) |
|
1.7.3 Reinforcement Learning |
|
|
22 | (1) |
|
1.7.4 Learning Based on Natural Processes: Evolution, Swarming, and Immune Systems |
|
|
23 | (2) |
|
1.8 Machine Learning and Data Mining |
|
|
25 | (1) |
|
1.9 Basic Linear Algebra in Machine Learning Techniques |
|
|
26 | (8) |
|
1.10 Relevant Resources for Machine Learning |
|
|
34 | (2) |
2 Supervised Learning: Rationale and Basics |
|
36 | (37) |
|
2.1 Learning from Observations |
|
|
36 | (6) |
|
|
42 | (4) |
|
2.3 Why Learning Works: Computational Learning Theory |
|
|
46 | (3) |
|
2.4 Occam's Razor Principle and Overfitting Avoidance |
|
|
49 | (2) |
|
2.5 Heuristic Search in Inductive Learning |
|
|
51 | (5) |
|
2.5.1 Search through Hypothesis Space |
|
|
52 | (1) |
|
|
53 | (2) |
|
2.5.3 Evaluation of a Learning System |
|
|
55 | (1) |
|
2.6 Estimating Generalization Errors |
|
|
56 | (3) |
|
2.6.1 Holdout Method and Random Subsampling |
|
|
56 | (1) |
|
|
57 | (1) |
|
|
58 | (1) |
|
2.7 Metrics for Assessing Regression (Numeric Prediction) Accuracy |
|
|
59 | (2) |
|
|
60 | (1) |
|
2.7.2 Mean Absolute Error |
|
|
60 | (1) |
|
2.8 Metrics for Assessing Classification (Pattern Recognition) Accuracy |
|
|
61 | (7) |
|
2.8.1 Misclassification Error |
|
|
61 | (1) |
|
|
62 | (4) |
|
2.8.3 Comparing Classifiers Based on ROC Curves |
|
|
66 | (2) |
|
2.9 An Overview of the Design Cycle and Issues in Machine Learning |
|
|
68 | (5) |
3 Statistical Learning |
|
73 | (57) |
|
3.1 Machine Learning and Inferential Statistical Analysis |
|
|
73 | (1) |
|
3.2 Descriptive Statistics in Learning Techniques |
|
|
74 | (13) |
|
3.2.1 Representing Uncertainties in Data: Probability Distributions |
|
|
75 | (5) |
|
3.2.2 Descriptive Measures of Probability Distributions |
|
|
80 | (3) |
|
3.2.3 Descriptive Measures from Data Sample |
|
|
83 | (1) |
|
3.2.4 Normal Distributions |
|
|
84 | (1) |
|
|
85 | (2) |
|
3.3 Bayesian Reasoning: A Probabilistic Approach to Inference |
|
|
87 | (15) |
|
|
88 | (5) |
|
3.3.2 Naive Bayes Classifier |
|
|
93 | (5) |
|
3.3.3 Bayesian Belief Networks |
|
|
98 | (4) |
|
3.4 k-Nearest Neighbor (k-NN) Classifier |
|
|
102 | (4) |
|
3.5 Discriminant Functions and Regression Functions |
|
|
106 | (6) |
|
3.5.1 Classification and Discriminant Functions |
|
|
107 | (1) |
|
3.5.2 Numeric Prediction and Regression Functions |
|
|
108 | (1) |
|
3.5.3 Practical Hypothesis Functions |
|
|
109 | (3) |
|
3.6 Linear Regression with Least Square Error Criterion |
|
|
112 | (4) |
|
3.6.1 Minimal Sum-of-Error-Squares and the Pseudoinverse |
|
|
113 | (2) |
|
3.6.2 Gradient Descent Optimization Schemes |
|
|
115 | (1) |
|
3.6.3 Least Mean Square (LMS) Algorithm |
|
|
115 | (1) |
|
3.7 Logistic Regression for Classification Tasks |
|
|
116 | (4) |
|
3.8 Fisher's Linear Discriminant and Thresholding for Classification |
|
|
120 | (6) |
|
3.8.1 Fisher's Linear Discriminant |
|
|
120 | (5) |
|
|
125 | (1) |
|
3.9 Minimum Description Length Principle |
|
|
126 | (4) |
|
3.9.1 Bayesian Perspective |
|
|
127 | (1) |
|
3.9.2 Entropy and Information |
|
|
128 | (2) |
4 Learning With Support Vector Machines (SVM) |
|
130 | (51) |
|
|
130 | (2) |
|
4.2 Linear Discriminant Functions for Binary Classification |
|
|
132 | (4) |
|
|
136 | (5) |
|
4.4 Linear Maximal Margin Classifier for Linearly Separable Data |
|
|
141 | (11) |
|
4.5 Linear Soft Margin Classifier for Overlapping Classes |
|
|
152 | (6) |
|
4.6 Kernel-Induced Feature Spaces |
|
|
158 | (4) |
|
|
162 | (5) |
|
4.8 Regression by Support Vector Machines |
|
|
167 | (7) |
|
|
169 | (3) |
|
4.8.2 Nonlinear Regression |
|
|
172 | (2) |
|
4.9 Decomposing Multiclass Classification Problem Into Binary Classification Tasks |
|
|
174 | (3) |
|
4.9.1 One-Against-All (OAA) |
|
|
175 | (1) |
|
4.9.2 One-Against-One (OAO) |
|
|
176 | (1) |
|
4.10 Variants of Basic SVM Techniques |
|
|
177 | (4) |
5 Learning With Neural Networks (NN) |
|
181 | (64) |
|
5.1 Towards Cognitive Machine |
|
|
181 | (3) |
|
5.1.1 From Perceptrons to Deep Networks |
|
|
182 | (2) |
|
|
184 | (9) |
|
|
184 | (2) |
|
|
186 | (4) |
|
|
190 | (3) |
|
5.3 Network Architectures |
|
|
193 | (7) |
|
5.3.1 Feedforward Networks |
|
|
194 | (5) |
|
|
199 | (1) |
|
|
200 | (6) |
|
5.4.1 Limitations of Perceptron Algorithm for Linear Classification Tasks |
|
|
201 | (1) |
|
5.4.2 Linear Classification using Regression Techniques |
|
|
201 | (2) |
|
5.4.3 Standard Gradient Descent Optimization Scheme: Steepest Descent |
|
|
203 | (3) |
|
5.5 Linear Neuron and the Widrow-Hoff Learning Rule |
|
|
206 | (2) |
|
5.5.1 Stochastic Gradient Descent |
|
|
207 | (1) |
|
5.6 The Error-Correction Delta Rule |
|
|
208 | (5) |
|
5.6.1 Sigmoid Unit: Soft-Limiting Perceptron |
|
|
211 | (2) |
|
5.7 Multi-Layer Perceptron (MLP) Networks and the Error-Backpropagation Algorithm |
|
|
213 | (19) |
|
5.7.1 The Generalized Delta Rule |
|
|
216 | (10) |
|
5.7.2 Convergence and Local Minima |
|
|
226 | (1) |
|
5.7.3 Adding Momentum to Gradient Descent |
|
|
227 | (1) |
|
5.7.4 Heuristic Aspects of the Error-Backpropagation Algorithm |
|
|
228 | (4) |
|
5.8 Multi-Class Discrimination with MLP Networks |
|
|
232 | (3) |
|
5.9 Radial Basis Functions (RBF) Networks |
|
|
235 | (6) |
|
5.9.1 Training the RBF Network |
|
|
239 | (2) |
|
5.10 Genetic-Neural Systems |
|
|
241 | (4) |
6 Fuzzy Inference Systems |
|
245 | (83) |
|
|
245 | (3) |
|
6.2 Cognitive Uncertainty and Fuzzy Rule-Base |
|
|
248 | (5) |
|
6.3 Fuzzy Quantification of Knowledge |
|
|
253 | (24) |
|
|
253 | (4) |
|
|
257 | (10) |
|
6.3.3 Fuzzy Set Operations |
|
|
267 | (1) |
|
|
268 | (9) |
|
6.4 Fuzzy Rule-Base and Approximate Reasoning |
|
|
277 | (24) |
|
6.4.1 Quantification of Rules via Fuzzy Relations |
|
|
281 | (2) |
|
6.4.2 Fuzzification of Input |
|
|
283 | (1) |
|
6.4.3 Inference Mechanism |
|
|
284 | (14) |
|
6.4.4 Defuzzification of Inferred Fuzzy Set |
|
|
298 | (3) |
|
6.5 Mamdani Model for Fuzzy Inference Systems |
|
|
301 | (10) |
|
6.5.1 Mobile Robot Navigation Among Moving Obstacles |
|
|
303 | (5) |
|
6.5.2 Mortgage Loan Assessment |
|
|
308 | (3) |
|
6.6 Takagi-Sugeno Fuzzy Model |
|
|
311 | (6) |
|
6.7 Neuro-Fuzzy Inference Systems |
|
|
317 | (7) |
|
|
318 | (2) |
|
6.7.2 How Does an ANFIS Learn? |
|
|
320 | (4) |
|
|
324 | (4) |
7 Data Clustering and Data Transformations |
|
328 | (76) |
|
7.1 Unsupervised Learning |
|
|
328 | (3) |
|
|
329 | (2) |
|
|
331 | (10) |
|
7.2.1 Exploratory Data Analysis: Learning about What is in the Data |
|
|
333 | (1) |
|
7.2.2 Cluster Analysis: Finding Similarities in the Data |
|
|
334 | (5) |
|
7.2.3 Data Transformations: Enhancing the Information Content of the Data |
|
|
339 | (2) |
|
7.3 Overview of Basic Clustering Methods |
|
|
341 | (11) |
|
7.3.1 Partitional Clustering |
|
|
341 | (3) |
|
7.3.2 Hierarchical Clustering |
|
|
344 | (1) |
|
7.3.3 Spectral Clustering |
|
|
345 | (4) |
|
7.3.4 Clustering using Self-Organizing Maps |
|
|
349 | (3) |
|
|
352 | (4) |
|
7.5 Fuzzy K-Means Clustering |
|
|
356 | (6) |
|
7.6 Expectation-Maximization (EM) Algorithm and Gaussian Mixtures Clustering |
|
|
362 | (10) |
|
|
362 | (3) |
|
7.6.2 Gaussian Mixture Models |
|
|
365 | (7) |
|
7.7 Some Useful Data Transformations |
|
|
372 | (5) |
|
|
372 | (2) |
|
|
374 | (1) |
|
7.7.3 Discretizing Numeric Attributes |
|
|
375 | (2) |
|
7.7.4 Attribute Reduction Techniques |
|
|
377 | (1) |
|
7.8 Entropy-Based Method for Attribute Discretization |
|
|
377 | (5) |
|
7.9 Principal Components Analysis (PCA) for Attribute Reduction |
|
|
382 | (8) |
|
7.10 Rough Sets-Based Methods for Attribute Reduction |
|
|
390 | (14) |
|
7.10.1 Rough Set Preliminaries |
|
|
392 | (5) |
|
7.10.2 Analysis of Relevance of Attributes |
|
|
397 | (2) |
|
7.10.3 Reduction of Attributes |
|
|
399 | (5) |
8 Decision Tree Learning |
|
404 | (41) |
|
|
404 | (2) |
|
8.2 Example of a Classification Decision Tree |
|
|
406 | (5) |
|
8.3 Measures of Impurity for Evaluating Splits in Decision Trees |
|
|
411 | (7) |
|
8.3.1 Information Gain/Entropy Reduction |
|
|
411 | (5) |
|
|
416 | (1) |
|
|
417 | (1) |
|
8.4 ID3, C4.5, and CART Decision Trees |
|
|
418 | (9) |
|
|
427 | (2) |
|
8.6 Strengths and Weaknesses of Decision-Tree Approach |
|
|
429 | (4) |
|
|
433 | (12) |
9 Business Intelligence and Data Mining: Techniques and Applications |
|
445 | (63) |
|
9.1 An Introduction to Analytics |
|
|
445 | (6) |
|
9.1.1 Machine Learning, Data Mining, and Predictive Analytics |
|
|
448 | (1) |
|
9.1.2 Basic Analytics Techniques |
|
|
449 | (2) |
|
9.2 The CRISP-DM (Cross Industry Standard Process for Data Mining) Model |
|
|
451 | (5) |
|
9.3 Data Warehousing and Online Analytical Processing |
|
|
456 | (11) |
|
|
456 | (2) |
|
|
458 | (3) |
|
9.3.3 Data Warehousing: A General Architecture, and OLAP Operations |
|
|
461 | (5) |
|
9.3.4 Data Mining in the Data Warehouse Environment |
|
|
466 | (1) |
|
9.4 Mining Frequent Patterns and Association Rules |
|
|
467 | (12) |
|
|
469 | (2) |
|
9.4.2 Measures of Strength of Frequent Patterns and Association Rules |
|
|
471 | (2) |
|
9.4.3 Frequent Item Set Mining Methods |
|
|
473 | (4) |
|
9.4.4 Generating Association Rules from Frequent Itemsets |
|
|
477 | (2) |
|
9.5 Intelligent Information Retrieval Systems |
|
|
479 | (11) |
|
|
483 | (3) |
|
|
486 | (2) |
|
|
488 | (2) |
|
9.6 Applications and Trends |
|
|
490 | (8) |
|
9.6.1 Data Mining Applications |
|
|
490 | (5) |
|
|
495 | (3) |
|
9.7 Technologies for Big Data |
|
|
498 | (10) |
|
9.7.1 Emerging Analytic Methods |
|
|
500 | (3) |
|
9.7.2 Emerging Technologies for Higher Levels of Scalability |
|
|
503 | (5) |
Appendix A Genetic Algorithm (GA) For Search Optimization |
|
508 | (19) |
|
A.1 A Simple Overview of Genetics |
|
|
510 | (1) |
|
A.2 Genetics on Computers |
|
|
511 | (4) |
|
A.3 The Basic Genetic Algorithm |
|
|
515 | (9) |
|
A.4 Beyond the Basic Genetic Algorithm |
|
|
524 | (3) |
Appendix B Reinforcement Learning (RL) |
|
527 | (22) |
|
|
527 | (3) |
|
B.2 Elements of Reinforcement Learning |
|
|
530 | (5) |
|
B.3 Basics of Dynamic Programming |
|
|
535 | (7) |
|
B.3.1 Finding Optimal Policies |
|
|
538 | (1) |
|
|
539 | (1) |
|
|
540 | (2) |
|
B.4 Temporal Difference Learning |
|
|
542 | (7) |
|
|
544 | (2) |
|
|
546 | (2) |
|
|
548 | (1) |
Datasets from Real-Life Applications for Machine Learning Experiments |
|
549 | (18) |
Problems |
|
567 | (46) |
References |
|
613 | (10) |
Index |
|
623 | |