Contributors |
|
xi | |
Preface |
|
xv | |
|
1 Scaling Up Machine Learning: Introduction |
|
|
1 | (22) |
|
|
|
|
1.1 Machine Learning Basics |
|
|
2 | (1) |
|
1.2 Reasons for Scaling Up Machine Learning |
|
|
3 | (3) |
|
1.3 Key Concepts in Parallel and Distributed Computing |
|
|
6 | (1) |
|
1.4 Platform Choices and Trade-Offs |
|
|
7 | (2) |
|
1.5 Thinking about Performance |
|
|
9 | (1) |
|
1.6 Organization of the Book |
|
|
10 | (7) |
|
|
17 | (6) |
|
|
19 | (4) |
|
Part One Frameworks for Scaling Up Machine Learning |
|
|
|
2 MapReduce and Its Application to Massively Parallel Learning of Decision Tree Ensembles |
|
|
23 | (26) |
|
|
|
|
|
|
24 | (6) |
|
|
30 | (3) |
|
|
33 | (5) |
|
|
38 | (1) |
|
|
39 | (2) |
|
|
41 | (3) |
|
|
44 | (2) |
|
|
46 | (3) |
|
|
47 | (1) |
|
|
47 | (2) |
|
3 Large-Scale Machine Learning Using DryadLINQ |
|
|
49 | (20) |
|
|
|
|
|
|
3.1 Manipulating Datasets with LINQ |
|
|
49 | (3) |
|
|
52 | (1) |
|
3.3 Running LINQ on a Cluster with DryadLINQ |
|
|
53 | (12) |
|
|
65 | (4) |
|
|
57 | (12) |
|
4 IBM Parallel Machine Learning Toolbox |
|
|
69 | (20) |
|
|
|
|
4.1 Data-Parallel Associative-Commutative Computation |
|
|
70 | (1) |
|
4.2 API and Control Layer |
|
|
71 | (5) |
|
4.3 API Extensions for Distributed-State Algorithms |
|
|
76 | (1) |
|
4.4 Control Layer Implementation and Optimizations |
|
|
77 | (2) |
|
4.5 Parallel Kernel k-Means |
|
|
79 | (1) |
|
4.6 Parallel Decision Tree |
|
|
80 | (3) |
|
4.7 Parallel Frequent Pattern Mining |
|
|
83 | (3) |
|
|
86 | (3) |
|
|
87 | (2) |
|
5 Uniformly Fine-Grained Data-Parallel Computing for Machine Learning Algorithms |
|
|
89 | (20) |
|
|
|
|
|
91 | (2) |
|
5.2 Uniformly Fine-Grained Data-Parallel Computing on a GPU |
|
|
93 | (4) |
|
5.3 The k-Means Clustering Algorithm |
|
|
97 | (2) |
|
5.4 The k-Means Regression Clustering Algorithm |
|
|
99 | (3) |
|
5.5 Implementations and Performance Comparisons |
|
|
102 | (3) |
|
|
105 | (4) |
|
|
105 | (4) |
|
Part Two Supervised and Unsupervised Learning Algorithms |
|
|
|
6 PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization |
|
|
109 | (18) |
|
|
|
|
|
|
|
6.1 Interior Point Method with Incomplete Cholesky Factorization |
|
|
112 | (2) |
|
|
114 | (7) |
|
|
121 | (4) |
|
|
125 | (2) |
|
|
125 | (1) |
|
|
125 | (2) |
|
7 Massive SVM Parallelization Using Hardware Accelerators |
|
|
127 | (21) |
|
|
|
|
|
|
|
|
|
128 | (3) |
|
7.2 Implementation of the SMO Algorithm |
|
|
131 | (1) |
|
7.3 Micro Parallelization: Related Work |
|
|
132 | (1) |
|
7.4 Previous Parallelizations on Multicore Systems |
|
|
133 | (3) |
|
7.5 Micro Parallelization: Revisited |
|
|
136 | (1) |
|
7.6 Massively Parallel Hardware Accelerator |
|
|
137 | (8) |
|
|
145 | (1) |
|
|
146 | (2) |
|
|
146 | (2) |
|
8 Large-Scale Learning to Rank Using Boosted Decision Trees |
|
|
148 | (22) |
|
|
|
|
149 | (2) |
|
|
151 | (2) |
|
8.3 Approaches to Distributing LambdaMART |
|
|
153 | (5) |
|
|
158 | (10) |
|
8.5 Conclusions and Future Work |
|
|
168 | (1) |
|
|
169 | (1) |
|
|
169 | (1) |
|
9 The Transform Regression Algorithm |
|
|
170 | (20) |
|
|
|
9.1 Classification, Regression, and Loss Functions |
|
|
171 | (1) |
|
|
172 | (1) |
|
9.3 Motivation and Algorithm Description |
|
|
173 | (4) |
|
9.4 TReg Expansion: Initialization and Termination |
|
|
177 | (7) |
|
9.5 Model Accuracy Results |
|
|
184 | (2) |
|
9.6 Parallel Performance Results |
|
|
186 | (2) |
|
|
188 | (2) |
|
|
189 | (1) |
|
10 Parallel Belief Propagation in Factor Graphs |
|
|
190 | (27) |
|
|
|
|
10.1 Belief Propagation in Factor Graphs |
|
|
191 | (4) |
|
10.2 Shared Memory Parallel Belief Propagation |
|
|
195 | (14) |
|
10.3 Multicore Performance Comparison |
|
|
209 | (1) |
|
10.4 Parallel Belief Propagation on Clusters |
|
|
210 | (4) |
|
|
214 | (3) |
|
|
214 | (1) |
|
|
214 | (3) |
|
11 Distributed Gibbs Sampling for Latent Variable Models |
|
|
217 | (23) |
|
|
|
|
|
|
|
11.1 Latent Variable Models |
|
|
217 | (3) |
|
11.2 Distributed Inference Algorithms |
|
|
220 | (4) |
|
11.3 Experimental Analysis of Distributed Topic Modeling |
|
|
224 | (5) |
|
11.4 Practical Guidelines for Implementation |
|
|
229 | (2) |
|
11.5 A Foray into Distributed Inference for Bayesian Networks |
|
|
231 | (5) |
|
|
236 | (4) |
|
|
237 | (1) |
|
|
237 | (3) |
|
12 Large-Scale Spectral Clustering with MapReduce and MPI |
|
|
240 | (22) |
|
|
|
|
|
|
|
241 | (2) |
|
12.2 Spectral Clustering Using a Sparse Similarity Matrix |
|
|
243 | (2) |
|
12.3 Parallel Spectral Clustering (PSC) Using a Sparse Similarity Matrix |
|
|
245 | (6) |
|
|
251 | (7) |
|
|
258 | (4) |
|
|
259 | (3) |
|
13 Parallelizing Information-Theoretic Clustering Methods |
|
|
262 | (21) |
|
|
|
13.1 Information-Theoretic Clustering |
|
|
264 | (2) |
|
|
266 | (3) |
|
13.3 Sequential Co-clustering |
|
|
269 | (1) |
|
13.4 The DataLoom Algorithm |
|
|
270 | (4) |
|
13.5 Implementation and Experimentation |
|
|
274 | (3) |
|
|
277 | (6) |
|
|
278 | (5) |
|
Part Three Alternative Learning Settings |
|
|
|
14 Parallel Online Learning |
|
|
283 | (24) |
|
|
|
|
|
14.1 Limits Due to Bandwidth and Latency |
|
|
285 | (1) |
|
14.2 Parallelization Strategies |
|
|
286 | (2) |
|
14.3 Delayed Update Analysis |
|
|
288 | (2) |
|
14.4 Parallel Learning Algorithms |
|
|
290 | (8) |
|
|
298 | (4) |
|
|
302 | (1) |
|
|
303 | (4) |
|
|
305 | (2) |
|
15 Parallel Graph-Based Semi-Supervised Learning |
|
|
307 | (24) |
|
|
|
15.1 Scaling SSL to Large Datasets |
|
|
309 | (1) |
|
|
310 | (7) |
|
15.3 Dataset: A 120-Million-Node Graph |
|
|
317 | (2) |
|
15.4 Large-Scale Parallel Processing |
|
|
319 | (8) |
|
|
327 | (4) |
|
|
328 | (3) |
|
16 Distributed Transfer Learning via Cooperative Matrix Factorization |
|
|
331 | (21) |
|
|
|
|
16.1 Distributed Coalitional Learning |
|
|
333 | (10) |
|
16.2 Extension of DisCo to Classification Tasks |
|
|
343 | (7) |
|
|
350 | (2) |
|
|
350 | (2) |
|
17 Parallel Large-Scale Feature Selection |
|
|
352 | (21) |
|
|
|
|
|
353 | (1) |
|
|
354 | (4) |
|
17.3 Parallelizing Feature Selection Algorithms |
|
|
358 | (5) |
|
17.4 Experimental Results |
|
|
363 | (5) |
|
|
368 | (5) |
|
|
368 | (5) |
|
|
|
18 Large-Scale Learning for Vision with GPUs |
|
|
373 | (26) |
|
|
|
|
|
374 | (3) |
|
18.2 Introduction to GPUs |
|
|
377 | (3) |
|
18.3 A Standard Approach Scaled Up |
|
|
380 | (8) |
|
18.4 Feature Learning with Deep Belief Networks |
|
|
388 | (7) |
|
|
395 | (4) |
|
|
395 | (4) |
|
19 Large-Scale FPGA-Based Convolutional Networks |
|
|
399 | (21) |
|
|
|
|
|
|
|
|
19.1 Learning Internal Representations |
|
|
400 | (5) |
|
19.2 A Dedicated Digital Hardware Architecture |
|
|
405 | (11) |
|
|
416 | (4) |
|
|
417 | (3) |
|
20 Mining Tree-Structured Data on Multicore Systems |
|
|
420 | (26) |
|
|
|
20.1 The Multicore Challenge |
|
|
422 | (1) |
|
|
423 | (4) |
|
20.3 Memory Optimizations |
|
|
427 | (4) |
|
20.4 Adaptive Parallelization |
|
|
431 | (6) |
|
20.5 Empirical Evaluation |
|
|
437 | (5) |
|
|
442 | (4) |
|
|
443 | (1) |
|
|
443 | (3) |
|
21 Scalable Parallelization of Automatic Speech Recognition |
|
|
446 | (25) |
|
|
|
|
|
21.1 Concurrency Identification |
|
|
450 | (2) |
|
21.2 Software Architecture and Implementation Challenges |
|
|
452 | (2) |
|
21.3 Multicore and Manycore Parallel Platforms |
|
|
454 | (1) |
|
21.4 Multicore Infrastructure and Mapping |
|
|
455 | (4) |
|
21.5 The Manycore Implementation |
|
|
459 | (3) |
|
21.6 Implementation Profiling and Sensitivity Analysis |
|
|
462 | (2) |
|
21.7 Application-Level Optimization |
|
|
464 | (3) |
|
21.8 Conclusion and Key Lessons |
|
|
467 | (4) |
|
|
468 | (3) |
Subject Index |
|
471 | |