E-book: Scaling up Machine Learning: Parallel and Distributed Approaches

4.00/5 (26 ratings by Goodreads)

Edited by Ron Bekkerman, Mikhail Bilenko, and John Langford

Format: PDF+DRM
Publication date: 30-Dec-2011
Publisher: Cambridge University Press
Language: English
ISBN-13: 9781139210409

Price: 53.52 €*
* This is the final price, i.e., no additional discounts apply.
This e-book is intended for personal use only. E-books cannot be returned, and payments for purchased e-books are not refunded.

DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital Rights Management (DRM)
The publisher has supplied this book in encrypted form, which means that you need to install free software to unlock and read it. To read this e-book, you need to create an Adobe ID. The e-book can be downloaded and read on up to 6 devices (by a single user with the same Adobe ID).

Required software
To read this e-book on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which you probably already have on your computer).

You cannot read this e-book on an Amazon Kindle.

This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by the enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs and constraints of the available options. Solutions presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters, concurrent programming frameworks including CUDA, MPI, MapReduce and DryadLINQ, and learning settings (supervised, unsupervised, semi-supervised and online learning). Extensive coverage of parallelization of boosted trees, SVMs, spectral clustering, belief propagation and other popular learning algorithms, and deep dives into several applications, make the book equally useful for researchers, students and practitioners.

Reviews

'One of the landmark achievements of our time is the ability to extract value from large volumes of data. Engineering and algorithmic developments on this front have gelled substantially in recent years, and are quickly being reduced to practice in widely available, reusable forms. This book provides a broad and timely snapshot of the state of developments in scalable machine learning, which should be of interest to anyone who wishes to understand and extend the state of the art in analyzing data.' Joseph M. Hellerstein, University of California, Berkeley

'This is a book that every machine learning practitioner should keep in their library.' Yoram Singer, Google Inc.

'The contributions in this book run the gamut from frameworks for large-scale learning to parallel algorithms to applications, and contributors include many of the top people in this burgeoning subfield. Overall this book is an invaluable resource for anyone interested in the problem of learning from and working with big datasets.' William W. Cohen, Carnegie Mellon University, Pennsylvania

'This unique, timely book provides a 360 degrees view and understanding of both conceptual and practical issues that arise when implementing leading machine learning algorithms on a wide range of parallel and high-performance computing platforms. It will serve as an indispensable handbook for the practitioner of large-scale data analytics and a guide to dealing with BIG data and making sound choices for efficient applying learning algorithms to them. It can also serve as the basis for an attractive graduate course on parallel/distributed machine learning and data mining.' Joydeep Ghosh, University of Texas

Additional information

This integrated collection covers a range of parallelization platforms, concurrent programming frameworks and machine learning settings, with case studies.

Contributors

Preface

1 Scaling Up Machine Learning: Introduction

Ron Bekkerman, Mikhail Bilenko, John Langford

1.1 Machine Learning Basics

1.2 Reasons for Scaling Up Machine Learning

1.3 Key Concepts in Parallel and Distributed Computing

1.4 Platform Choices and Trade-Offs

1.5 Thinking about Performance

1.6 Organization of the Book

1.7 Bibliographic Notes

References

Part One Frameworks for Scaling Up Machine Learning

2 MapReduce and Its Application to Massively Parallel Learning of Decision Tree Ensembles

Biswanath Panda, Joshua S. Herbach, Sugato Basu, Roberto J. Bayardo

2.1 Preliminaries

2.2 Example of Planet

2.3 Technical Details

2.4 Learning Ensembles

2.5 Engineering Issues

2.6 Experiments

2.7 Related Work

2.8 Conclusions

Acknowledgments

References

3 Large-Scale Machine Learning Using DryadLINQ

Mihai Budiu, Dennis Fetterly, Michael Isard, Frank McSherry, Yuan Yu

3.1 Manipulating Datasets with LINQ

3.2 k-Means in LINQ

3.3 Running LINQ on a Cluster with DryadLINQ

3.4 Lessons Learned

References

4 IBM Parallel Machine Learning Toolbox

Edwin Pednault, Elad Yom-Tov, Amol Ghoting

4.1 Data-Parallel Associative-Commutative Computation

4.2 API and Control Layer

4.3 API Extensions for Distributed-State Algorithms

4.4 Control Layer Implementation and Optimizations

4.5 Parallel Kernel k-Means

4.6 Parallel Decision Tree

4.7 Parallel Frequent Pattern Mining

4.8 Summary

References

5 Uniformly Fine-Grained Data-Parallel Computing for Machine Learning Algorithms

Meichun Hsu, Ren Wu, Bin Zhang

5.1 Overview of a GP-GPU

5.2 Uniformly Fine-Grained Data-Parallel Computing on a GPU

5.3 The k-Means Clustering Algorithm

5.4 The k-Means Regression Clustering Algorithm

5.5 Implementations and Performance Comparisons

5.6 Conclusions

References

Part Two Supervised and Unsupervised Learning Algorithms

6 PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization

Edward Y. Chang, Hongjie Bai, Kaihua Zhu, Hao Wang, Jian Li, Zhihuan Qiu

6.1 Interior Point Method with Incomplete Cholesky Factorization

6.2 PSVM Algorithm

6.3 Experiments

6.4 Conclusion

Acknowledgments

References

7 Massive SVM Parallelization Using Hardware Accelerators

Igor Durdanovic, Eric Cosatto, Hans Peter Graf, Srihari Cadambi, Venkata Jakkula, Srimat Chakradhar, Abhinandan Majumdar

7.1 Problem Formulation

7.2 Implementation of the SMO Algorithm

7.3 Micro Parallelization: Related Work

7.4 Previous Parallelizations on Multicore Systems

7.5 Micro Parallelization: Revisited

7.6 Massively Parallel Hardware Accelerator

7.7 Results

7.8 Conclusion

References

8 Large-Scale Learning to Rank Using Boosted Decision Trees

Krysta M. Svore, Christopher J. C. Burges

8.1 Related Work

8.2 LambdaMART

8.3 Approaches to Distributing LambdaMART

8.4 Experiments

8.5 Conclusions and Future Work

8.6 Acknowledgments

References

9 The Transform Regression Algorithm

Ramesh Natarajan, Edwin Pednault

9.1 Classification, Regression, and Loss Functions

9.2 Background

9.3 Motivation and Algorithm Description

9.4 TReg Expansion: Initialization and Termination

9.5 Model Accuracy Results

9.6 Parallel Performance Results

9.7 Summary

References

10 Parallel Belief Propagation in Factor Graphs

Joseph Gonzalez, Yucheng Low, Carlos Guestrin

10.1 Belief Propagation in Factor Graphs

10.2 Shared Memory Parallel Belief Propagation

10.3 Multicore Performance Comparison

10.4 Parallel Belief Propagation on Clusters

10.5 Conclusion

Acknowledgments

References

11 Distributed Gibbs Sampling for Latent Variable Models

Arthur Asuncion, Padhraic Smyth, Max Welling, David Newman, Ian Porteous, Scott Triglia

11.1 Latent Variable Models

11.2 Distributed Inference Algorithms

11.3 Experimental Analysis of Distributed Topic Modeling

11.4 Practical Guidelines for Implementation

11.5 A Foray into Distributed Inference for Bayesian Networks

11.6 Conclusion

Acknowledgments

References

12 Large-Scale Spectral Clustering with MapReduce and MPI

Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, Edward Y. Chang

12.1 Spectral Clustering

12.2 Spectral Clustering Using a Sparse Similarity Matrix

12.3 Parallel Spectral Clustering (PSC) Using a Sparse Similarity Matrix

12.4 Experiments

12.5 Conclusions

References

13 Parallelizing Information-Theoretic Clustering Methods

Ron Bekkerman, Martin Scholz

13.1 Information-Theoretic Clustering

13.2 Parallel Clustering

13.3 Sequential Co-clustering

13.4 The DataLoom Algorithm

13.5 Implementation and Experimentation

13.6 Conclusion

References

Part Three Alternative Learning Settings

14 Parallel Online Learning

Daniel Hsu, Nikos Karampatziakis, John Langford, Alex J. Smola

14.1 Limits Due to Bandwidth and Latency

14.2 Parallelization Strategies

14.3 Delayed Update Analysis

14.4 Parallel Learning Algorithms

14.5 Global Update Rules

14.6 Experiments

14.7 Conclusion

References

15 Parallel Graph-Based Semi-Supervised Learning

Jeff Bilmes, Amarnag Subramanya

15.1 Scaling SSL to Large Datasets

15.2 Graph-Based SSL

15.3 Dataset: A 120-Million-Node Graph

15.4 Large-Scale Parallel Processing

15.5 Discussion

References

16 Distributed Transfer Learning via Cooperative Matrix Factorization

Evan Xiang, Nathan Liu, Qiang Yang

16.1 Distributed Coalitional Learning

16.2 Extension of DisCo to Classification Tasks

16.3 Conclusion

References

17 Parallel Large-Scale Feature Selection

Jeremy Kubica, Sameer Singh, Daria Sorokina

17.1 Logistic Regression

17.2 Feature Selection

17.3 Parallelizing Feature Selection Algorithms

17.4 Experimental Results

17.5 Conclusions

References

Part Four Applications

18 Large-Scale Learning for Vision with GPUs

Adam Coates, Rajat Raina, Andrew Y. Ng

18.1 A Standard Pipeline

18.2 Introduction to GPUs

18.3 A Standard Approach Scaled Up

18.4 Feature Learning with Deep Belief Networks

18.5 Conclusion

References

19 Large-Scale FPGA-Based Convolutional Networks

Clement Farabet, Yann LeCun, Koray Kavukcuoglu, Berin Martini, Polina Akselrod, Selcuk Talay, Eugenio Culurciello

19.1 Learning Internal Representations

19.2 A Dedicated Digital Hardware Architecture

19.3 Summary

References

20 Mining Tree-Structured Data on Multicore Systems

Shirish Tatikonda, Srinivasan Parthasarathy

20.1 The Multicore Challenge

20.2 Background

20.3 Memory Optimizations

20.4 Adaptive Parallelization

20.5 Empirical Evaluation

20.6 Discussion

Acknowledgments

References

21 Scalable Parallelization of Automatic Speech Recognition

Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer

21.1 Concurrency Identification

21.2 Software Architecture and Implementation Challenges

21.3 Multicore and Manycore Parallel Platforms

21.4 Multicore Infrastructure and Mapping

21.5 The Manycore Implementation

21.6 Implementation Profiling and Sensitivity Analysis

21.7 Application-Level Optimization

21.8 Conclusion and Key Lessons

References

Subject Index

Ron Bekkerman is a computer engineer and scientist whose experience spans across disciplines from video processing to business intelligence. Currently a senior research scientist at LinkedIn, he previously worked for a number of major companies including Hewlett-Packard and Motorola. Bekkerman's research interests lie primarily in the area of large-scale unsupervised learning. He is the corresponding author of several publications in top-tier venues, such as ICML, KDD, SIGIR, WWW, IJCAI, CVPR, EMNLP and JMLR.

Mikhail Bilenko is a researcher in the Machine Learning and Intelligence group at Microsoft Research. His research interests center on machine learning and data mining tasks that arise in the context of large behavioral and textual datasets. Bilenko's recent work has focused on learning algorithms that leverage user behavior to improve online advertising. His papers have been published at KDD, ICML, SIGIR, and WWW among other venues, and he has received best paper awards from SIGIR and KDD.

John Langford is a computer scientist working as a senior researcher at Yahoo! Research. Previously, he was affiliated with the Toyota Technological Institute and IBM T. J. Watson Research Center. Langford's work has been published at conferences and in journals including ICML, COLT, NIPS, UAI, KDD, JMLR and MLJ. He received the Pat Goldberg Memorial Best Paper Award, as well as best paper awards from ACM EC and WSDM. He is also the author of the popular machine learning weblog, hunch.net.

Permanent link: https://www.kriso.lv/db/97811392104092e.html
