
E-book: Contemporary High Performance Computing: From Petascale toward Exascale, Volume 3

Edited by Jeffrey S. Vetter (Oak Ridge National Laboratory, Tennessee, USA)
  • Format: EPUB+DRM
  • Price: 56,34 €*
  • * This is the final price, i.e., no additional discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned, and payments for purchased e-books are non-refundable.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you must install free software to unlock and read it. To read this e-book you need to create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which may already be on your computer).

    This e-book cannot be read on an Amazon Kindle.

Contemporary High Performance Computing: From Petascale toward Exascale, Volume 3 focuses on the ecosystems surrounding the world's leading centers for high performance computing (HPC). It covers many of the important factors involved in each ecosystem: computer architectures, software, applications, facilities, and sponsors.

This third volume continues the two previous volumes and covers additional HPC ecosystems using the same chapter outline: description of a flagship system, major application workloads, facilities, and sponsors.

Features:

  • Describes many prominent, international systems in HPC from 2015 through 2017, including each system's hardware and software architecture
  • Covers facilities for each system, including power and cooling
  • Presents application workloads for each site
  • Discusses historic and projected trends in technology and applications
  • Includes contributions from leading experts

Designed for researchers and students in high performance computing, computational science, and related areas, this book provides a valuable guide to the state-of-the-art research, trends, and resources in the world of HPC.
Preface xix
Editor xxiii
1 Resilient HPC for 24x7x365 Weather Forecast Operations at the Australian Government Bureau of Meteorology 1(30)
Lesley Seebeck
Tim F. Pugh
Damian Aigus
Joerg Henrichs
Andrew Khaw
Tennessee Leeuwenburg
James Mandilas
Richard Oxbrow
Naren Rajasingam
Wojtek Uliasz
John Vincent
Craig West
Rob Bell
1.1 Foreword
3(1)
1.2 Overview
4(3)
1.2.1 Program Background
5(1)
1.2.2 Sponsor Background
6(1)
1.2.3 Timeline
7(1)
1.3 Applications and Workloads
7(6)
1.3.1 Highlights of Main Applications
8(2)
1.3.2 2017 Case Study: From Nodes to News, TC Debbie
10(1)
1.3.3 Benchmark Usage
11(1)
1.3.4 SSP - Monitoring System Performance
11(2)
1.4 System Overview
13(1)
1.4.1 System Design Decisions
13(1)
1.5 Hardware Architecture
14(4)
1.5.1 Australis Processors
14(1)
1.5.2 Australis Node Design
15(1)
1.5.2.1 Australis Service Node
15(1)
1.5.2.2 Australis Compute Node
16(1)
1.5.3 External Nodes
16(1)
1.5.4 Australis Memory
17(1)
1.5.5 Australis Interconnect
17(1)
1.5.6 Australis Storage and Filesystem
17(1)
1.6 System Software
18(3)
1.6.1 Operating System
18(1)
1.6.2 Operating System Upgrade Procedure
18(1)
1.6.3 Schedulers
19(2)
1.6.3.1 SMS
20(1)
1.6.3.2 Cylc
20(1)
1.6.3.3 PBS Professional
20(1)
1.7 Programming System
21(2)
1.7.1 Programming Models
21(1)
1.7.2 Compiler Selection
22(1)
1.7.3 Optimisations
22(1)
1.8 Archiving
23(1)
1.8.1 Oracle Hierarchical Storage Manager (SAM-QFS)
23(1)
1.8.2 MARS/TSM
23(1)
1.9 Data Center/Facility
24(1)
1.10 System Statistics
25(1)
1.10.1 Systems Usage Patterns
25(1)
1.11 Reliability
25(3)
1.11.1 Failover Scenarios
26(1)
1.11.2 Compute Failover
26(1)
1.11.3 Data Mover Failover
26(1)
1.11.4 Storage Failover
26(1)
1.11.4.1 Normal Mode
27(1)
1.11.4.2 Failover Mode
27(1)
1.11.4.3 Recovery Mode
27(1)
1.11.4.4 Isolated Mode
27(1)
1.11.5 SSH File Transfer Failover
27(1)
1.12 Implementing a Product Generation Platform
28(3)
2 Theta and Mira at Argonne National Laboratory 31(32)
Mark R. Fahey
Yuri Alexeev
Bill Allcock
Benjamin S. Allen
Ramesh Balakrishnan
Anouar Benali
Liza Booker
Ashley Boyle
Laural Briggs
Edouard Brooks
Phil Carns
Beth Cerny
Andrew Cherry
Lisa Childers
Sudheer Chunduri
Richard Coffey
James Collins
Paul Coffman
Susan Coghlan
Kathy DiBennardi
Ginny Doyle
Hal Finkel
Graham Fletcher
Marta Garcia
Ira Goldberg
Cheetah Goletz
Susan Gregurich
Kevin Harms
Carissa Holohan
Joseph A. Insley
Tommie Jackson
Janet Jaseckas
Elise Jennings
Derek Jensen
Wei Jiang
Margaret Kaczmarski
Chris Knight
Janet Knowles
Kalyan Kumaran
Ti Leggett
Ben Lenard
Anping Liu
Ray Loy
Preeti Malakar
Avanthi Mantrala
David E. Martin
Guillermo Mayorga
Gordon McPheeters
Paul Messina
Ryan Milner
Vitali Morozov
Zachary Nault
Denise Nelson
Jack O'Connell
James Osborn
Michael E. Papka
Scott Parker
Pragnesh Patel
Saumil Patel
Eric Pershey
Renee Plzak
Adrian Pope
Jared Punzel
Sreeranjani Ramprakash
John Reddy
Paul Rich
Katherine Riley
Silvio Rizzi
George Rojas
Nichols A. Romero
Robert Scott
Adam Scovel
William Scullin
Emily Shemon
Haritha Siddabathuni Som
Joan Stover
Mirek Suliba
Brian Toonen
Tom Uram
Alvaro Vazquez-Mayagoitia
Venkatram Vishwanath
R. Douglas Waldron
Gabe West
Timothy J. Williams
Darin Wills
Laura Wolf
Wanda Woods
Michael Zhang
2.1 ALCF Overview
32(2)
2.1.1 Argonne Leadership Computing Facility
32(1)
2.1.2 Timeline
33(1)
2.1.3 Organization of This Chapter
34(1)
2.2 Facility
34(2)
2.2.1 Mira Facility Improvements
34(1)
2.2.2 Theta Facility Improvements
35(1)
2.3 Theta
36(12)
2.3.1 Architecture
37(4)
2.3.1.1 Processor
38(1)
2.3.1.2 Memory
39(1)
2.3.1.3 Network
39(1)
2.3.1.4 Storage System
40(1)
2.3.2 System Software
41(1)
2.3.2.1 Systems Administration of the Cray Linux Environment
42(1)
2.3.2.2 Scheduler
42(1)
2.3.3 Programming System
42(2)
2.3.3.1 Programming Models
42(1)
2.3.3.2 Languages and Compilers
43(1)
2.3.4 Deployment and Acceptance
44(2)
2.3.4.1 Benchmarks
44(1)
2.3.4.2 Applications
45(1)
2.3.5 Early Science and Transition to Operations
46(2)
2.4 Mira
48(7)
2.4.1 Architecture and Software Summary
49(1)
2.4.2 Evolution of Ecosystem
50(2)
2.4.3 Notable Science Accomplishments
52(2)
2.4.4 System Statistics
54(1)
2.5 Cobalt Job Scheduler
55(2)
2.6 Job Failure Analysis
57(2)
2.7 Acknowledgments
59(4)
3 Zuse Institute Berlin (ZIB) 63(30)
Alexander Reinefeld
Thomas Steinke
Matthias Noack
Florian Wende
3.1 Overview
63(2)
3.1.1 Research Center for Many-Core HPC
64(1)
3.1.2 Timeline
64(1)
3.2 Applications and Workloads
65(2)
3.2.1 VASP
66(1)
3.2.2 GLAT
66(1)
3.2.3 HEOM
67(1)
3.3 System Hardware Architecture
67(3)
3.3.1 Cray TDS at ZIB with Intel Xeon Phi Processors
68(1)
3.3.2 Intel Xeon Phi 71xx
69(1)
3.3.3 Intel Xeon Phi 72xx
69(1)
3.4 Many-Core in HPC: The Need for Code Modernization
70(19)
3.4.1 High-level SIMD Vectorization
71(6)
3.4.2 Offloading over Fabric
77(7)
3.4.3 Runtime Kernel Compilation with KART
84(5)
3.5 Summary
89(4)
4 The Mont-Blanc Prototype 93(30)
Filippo Mantovani
Daniel Ruiz
Leonardo Bautista
Vishal Mehta
Fabio Banchelli
Nikola Rajovic
Eduard Ayguade
Jesus Labarta
Mateo Valero
Alejandro Rico Carro
Alex Ramirez Bellido
Markus Geimer
Daniele Tafani
4.1 Overview
94(2)
4.1.1 Project Context and Challenges
94(2)
4.1.2 Objectives and Timeline
96(1)
4.2 Hardware Architecture
96(4)
4.2.1 Compute Node
96(1)
4.2.2 Blade
97(1)
4.2.3 The Overall System
98(1)
4.2.4 Performance Summary
99(1)
4.3 System Software
100(3)
4.3.1 Development Tools Ecosystem
101(1)
4.3.2 OpenStack
102(1)
4.4 Applications and Workloads
103(5)
4.4.1 Core Evaluation
104(1)
4.4.2 Node Evaluation
105(1)
4.4.3 System Evaluation
106(1)
4.4.4 Node Power Profiling
106(2)
4.5 Deployment and Operational Information
108(2)
4.5.1 Thermal Experiments
109(1)
4.6 Highlights of Mont-Blanc
110(9)
4.6.1 Reliability Study of an Unprotected RAM System
111(3)
4.6.2 Network Retransmission and OS Noise Study
114(3)
4.6.3 The Power Monitoring Tool of the Mont-Blanc System
117(2)
4.7 Acknowledgments
119(4)
5 Chameleon 123(26)
Kate Keahey
Pierre Riteau
Dan Stanzione
Tim Cockerill
Joe Mambretti
Paul Rad
Paul Ruth
5.1 Overview
124(2)
5.1.1 A Case for a Production Testbed
124(1)
5.1.2 Program Background
125(1)
5.1.3 Timeline
126(1)
5.2 Hardware Architecture
126(4)
5.2.1 Projected Use Cases
127(1)
5.2.2 Phase 1 Chameleon Deployment
127(2)
5.2.3 Experience with Phase 1 Hardware and Future Plans
129(1)
5.3 System Overview
130(1)
5.4 System Software
130(5)
5.4.1 Core Services
131(1)
5.4.2 Implementation
132(3)
5.5 Appliances
135(2)
5.5.1 System Appliances
136(1)
5.5.2 Complex Appliances
137(1)
5.6 Data Center/Facility
137(1)
5.6.1 University of Chicago Facility
137(1)
5.6.2 TACC Facility
137(1)
5.6.3 Wide-Area Connectivity
137(1)
5.7 System Management and Policies
138(1)
5.8 Statistics and Lessons Learned
138(3)
5.9 Research Projects Highlights
141(9)
5.9.1 Chameleon Slices for Wide-Area Networking Research
141(1)
5.9.2 Machine Learning Experiments on Chameleon
142(7)
6 CSCS and the Piz Daint System 149(26)
Sadaf R. Alam
Ladina Gilly
Colin J. McMurtrie
Thomas C. Schulthess
6.1 Introduction
150(3)
6.1.1 Program and Sponsor
150(1)
6.1.2 Timeline
151(2)
6.2 Co-designing Piz Daint
153(2)
6.3 Hardware Architecture
155(4)
6.3.1 Overview of the Cray XC50 Architecture
155(1)
6.3.2 Cray XC50 Hybrid Compute Node and Blade
155(1)
6.3.3 Interconnect
156(1)
6.3.4 Scratch File System Configuration
157(2)
6.4 Innovative Features of Piz Daint
159(4)
6.4.1 New Cray Linux Environment (CLE 6.0)
160(1)
6.4.2 Public IP Routing
161(1)
6.4.3 GPU Monitoring
162(1)
6.4.4 System Management and Monitoring
162(1)
6.5 Data Center/Facility
163(4)
6.5.1 Design Criteria for the Facility
163(2)
6.5.2 Lake Water Cooling
165(1)
6.5.3 Cooling Distribution
165(1)
6.5.4 Electrical Distribution
166(1)
6.5.5 Siting the Current Piz Daint System
166(1)
6.5.5.1 Cooling
166(1)
6.5.5.2 Power
166(1)
6.5.5.3 Challenges
166(1)
6.6 Consolidation of Services
167(4)
6.6.1 High Performance Computing Service
167(1)
6.6.2 Visualization and Data Analysis Service
168(1)
6.6.3 Data Mover Service
169(1)
6.6.4 Container Service
170(1)
6.6.5 Cray Urika-XC Analytics Software Suite Services
170(1)
6.6.6 Worldwide Large Hadron Collider (LHC) Computing Grid (WLCG) Services
170(1)
6.7 Acknowledgements
171(4)
7 Facility Best Practices 175(14)
Ladina Gilly
7.1 Introduction
175(1)
7.2 Forums That Discuss Best Practices in HPC
176(1)
7.3 Relevant Standards for Data Centres
176(1)
7.4 Most Frequently Encountered Infrastructure Challenges
177(1)
7.5 Compilation of Best Practices
178(7)
7.5.1 Management Topics
178(1)
7.5.2 Tendering Processes
179(1)
7.5.3 Building Envelope
180(1)
7.5.4 Power Density and Capacity
180(1)
7.5.5 Raised Floor
181(1)
7.5.6 Electrical Infrastructure
182(1)
7.5.7 Cooling
183(1)
7.5.8 Fire Protection
184(1)
7.5.9 Measuring and Monitoring
184(1)
7.5.10 Once in Operation
185(1)
7.6 Limitations and Implications
185(1)
7.7 Conclusion
185(4)
8 Jetstream 189(34)
Craig A. Stewart
David Y. Hancock
Therese Miller
Jeremy Fischer
R. Lee Liming
George Turner
John Michael Lowe
Steven Gregory
Edwin Skidmore
Matthew Vaughn
Dan Stanzione
Nirav Merchant
Ian Foster
James Taylor
Paul Rad
Volker Brendel
Enis Afgan
Michael Packard
Winona Snapp-Childs
8.1 Overview
191(9)
8.1.1 Jetstream Motivation and Sponsor Background
192(3)
8.1.2 Timeline
195(1)
8.1.3 Hardware Acceptance
196(1)
8.1.4 Benchmark Results
197(1)
8.1.5 Cloud Functionality Tests
198(1)
8.1.6 Gateway Functionality Tests
199(1)
8.1.7 Data Movement, Storage, and Dissemination
199(1)
8.1.8 Acceptance by NSF
200(1)
8.2 Applications and Workloads
200(3)
8.2.1 Highlights of Main Applications
201(2)
8.3 System Overview
203(1)
8.4 Hardware Architecture
203(1)
8.4.1 Node Design and Processor Elements
203(1)
8.4.2 Interconnect
204(1)
8.4.3 Storage Subsystem
204(1)
8.5 System Software
204(6)
8.5.1 Operating System
204(2)
8.5.2 System Administration
206(1)
8.5.3 Schedulers and Virtualization
206(1)
8.5.4 Security
207(1)
8.5.5 Storage Software
208(1)
8.5.6 User Authentication
208(1)
8.5.7 Allocation Software and Processes
209(1)
8.6 Programming System
210(3)
8.6.1 Atmosphere
210(1)
8.6.2 Jetstream Plugins for the Atmosphere Platform
211(1)
8.6.2.1 Authorization
211(1)
8.6.2.2 Allocation Sources and Special Allocations
211(1)
8.6.3 Globus Authentication and Data Access
212(1)
8.6.4 The Jetstream OpenStack API
212(1)
8.6.5 VM libraries
212(1)
8.7 Data Center Facilities
213(1)
8.8 System Statistics
214(1)
8.9 Interesting Features
215(2)
8.9.1 Jupyter and Kubernetes
216(1)
8.10 Artificial Intelligence Technology Education
217(1)
8.11 Jetstream VM Image Use for Scientific Reproducibility - Bioinformatics as an Example
217(1)
8.12 Running a Virtual Cluster on Jetstream
218(5)
9 Modular Supercomputing Architecture: From Idea to Production 223(34)
Estela Suarez
Norbert Eicker
Thomas Lippert
9.1 The Julich Supercomputing Centre (JSC)
224(5)
9.2 Supercomputing Architectures at JSC
224(1)
9.2.1 The Dual Supercomputer Strategy
225(2)
9.2.2 The Cluster-Booster Concept
227(1)
9.2.3 The Modular Supercomputing Architecture
228(1)
9.3 Applications and Workloads
229(3)
9.3.1 Co-design Applications in the DEEP Projects
231(1)
9.4 Systems Overview
232(2)
9.4.1 Sponsors
233(1)
9.4.2 Timeline
233(1)
9.5 Hardware Implementation
234(7)
9.5.1 First Generation (DEEP) Prototype
235(3)
9.5.2 Second Generation (DEEP-ER) Prototype
238(1)
9.5.3 JURECA
239(2)
9.6 System Software
241(4)
9.6.1 System Administration
241(1)
9.6.2 Schedulers and Resource Management
242(2)
9.6.3 Network-bridging Protocol
244(1)
9.6.4 I/O Software and File System
244(1)
9.7 Programming Model
245(4)
9.7.1 Inter-module MPI Offloading
245(1)
9.7.2 OmpSs Abstraction Layer
246(1)
9.7.3 Resiliency Software
247(2)
9.8 Cooling and Facility Infrastructure
249(1)
9.9 Conclusions and Next steps
250(1)
9.10 Acknowledgments
251(6)
10 SuperMUC at LRZ 257(18)
Hayk Shoukourian
Arndt Bode
Herbert Huber
Michael Ott
Dieter Kranzlmuller
10.1 Overview
257(3)
10.1.1 Timeline
258(2)
10.2 System Overview
260(1)
10.3 Applications and Workloads
261(4)
10.4 System Stability
265(1)
10.5 Data Center/Facility
266(2)
10.6 R&D on Energy-Efficiency at LRZ
268(7)
11 The NERSC Cori HPC System 275(30)
Katie Antypas
Brian Austin
Deborah Bard
Wahid Bhimji
Brandon Cook
Tina Declerck
Jack Deslippe
Richard Gerber
Rebecca Hartman-Baker
Yun He
Douglas Jacobsen
Thorsten Kurth
Jay Srinivasan
Nicholas J. Wright
11.1 Overview
276(1)
11.1.1 Sponsor and Program Background
276(1)
11.1.2 Timeline
277(1)
11.2 Applications and Workloads
277(2)
11.2.1 Benchmarks
277(2)
11.3 System Overview
279(1)
11.4 Hardware Architecture
280(1)
11.4.1 Node Types and Design
280(1)
11.4.1.1 Xeon Phi "Knights Landing" Compute Nodes
280(1)
11.4.1.2 Xeon "Haswell" Compute Nodes
280(1)
11.4.1.3 Service Nodes
280(1)
11.4.2 Interconnect
281(1)
11.4.3 Storage - Burst Buffer and Lustre Filesystem
281(1)
11.5 System Software
281(4)
11.5.1 System Software Overview
281(1)
11.5.2 System Management Stack
282(1)
11.5.3 Resource Management
282(1)
11.5.4 Storage Resources and Software
283(1)
11.5.5 Networking Resources and Software
284(1)
11.5.6 Containers and User-Defined Images
284(1)
11.6 Programming Environment
285(3)
11.6.1 Programming Models
285(1)
11.6.2 Languages and Compilers
285(1)
11.6.3 Libraries and Tools
286(1)
11.6.4 Building Software for a Heterogeneous System
286(1)
11.6.5 Default Mode Selection Considerations
287(1)
11.6.6 Running Jobs
287(1)
11.7 NESAP
288(7)
11.7.1 Introduction
288(1)
11.7.2 Optimization Strategy and Tools
288(2)
11.7.3 Most Effective Optimizations
290(1)
11.7.4 NESAP Result Overview
291(1)
11.7.5 Application Highlights
291(4)
11.7.5.1 Quantum ESPRESSO
291(2)
11.7.5.2 MFDn
293(2)
11.8 Data Science
295(4)
11.8.1 IO Improvement: Burst Buffer
295(2)
11.8.2 Workflows
297(2)
11.8.2.1 Network Connectivity to External Nodes
298(1)
11.8.2.2 Burst Buffer Filesystem for In-situ Workflows
298(1)
11.8.2.3 Real-time and Interactive Queues for Time Sensitive Analyses
298(1)
11.8.2.4 Scheduler and Queue Improvements to Support Data-intensive Computing
299(1)
11.9 System Statistics
299(2)
11.9.1 System Utilizations
299(1)
11.9.2 Job Completion Statistics
299(2)
11.10 Summary
301(1)
11.11 Acknowledgments
302(3)
12 Lomonosov-2 305(26)
Vladimir Voevodin
Alexander Antonov
Dmitry Nikitenko
Pavel Shvets
Sergey Sobolev
Konstantin Stefanov
Vadim Voevodin
Sergey Zhumatiy
Andrey Brechalov
Alexander Naumov
12.1 Overview
305(4)
12.1.1 HPC History of MSU
305(3)
12.1.2 Lomonosov-2 Supercomputer: Timeline
308(1)
12.2 Applications and Workloads
309(2)
12.2.1 Main Applications Highlights
309(1)
12.2.2 Benchmark Results and Rating Positions
309(1)
12.2.3 Users and Workloads
310(1)
12.3 System Overview
311(2)
12.4 System Software and Programming Systems
313(2)
12.5 Networks
315(2)
12.5.1 Communication Network
315(1)
12.5.2 Auxiliary InfiniBand Network
315(1)
12.5.3 Management and Service Network
316(1)
12.6 Storage
317(1)
12.7 Engineering Infrastructure
318(5)
12.7.1 Infrastructure Support
318(1)
12.7.2 Power Distribution
318(2)
12.7.3 Engineering Equipment
320(1)
12.7.4 Overall Cooling System
320(2)
12.7.5 Cooling Auxiliary IT Equipment
322(1)
12.7.6 Emergency Cooling
322(1)
12.7.7 Efficiency
323(1)
12.8 Efficiency of the Supercomputer Center
323(8)
13 Electra 331(24)
Rupak Biswas
Jeff Becker
Davin Chan
David Ellsworth
Robert Hood
Piyush Mehrotra
Michelle Moyer
Chris Tanner
William Thigpen
13.1 Introduction
332(1)
13.2 NASA Requirements for Supercomputing
333(1)
13.3 Supercomputing Capabilities: Conventional Facilities
333(4)
13.3.1 Computer Systems
333(1)
13.3.2 Interconnect
334(1)
13.3.3 Network Connectivity
334(1)
13.3.4 Storage Resources
335(1)
13.3.5 Visualization and Hyperwall
336(1)
13.3.6 Primary NAS Facility
336(1)
13.4 Modular Supercomputing Facility
337(5)
13.4.1 Limitations of the Primary NAS Facility
337(1)
13.4.2 Expansion and Integration Strategy
337(1)
13.4.3 Site Preparation
338(1)
13.4.4 Module Design
338(1)
13.4.5 Power, Cooling, Network
339(1)
13.4.6 Facility Operations and Maintenance
340(1)
13.4.7 Environmental Impact
341(1)
13.5 Electra Supercomputer
342(2)
13.5.1 Performance
342(1)
13.5.2 I/O Subsystem Architecture
343(1)
13.6 User Environment
344(1)
13.6.1 System Software
344(1)
13.6.2 Resource Allocation and Scheduling
344(1)
13.6.3 User Services
344(1)
13.7 Application Benchmarking and Performance
345(2)
13.8 Utilization Statistics of HECC Resources
347(1)
13.9 System Operations and Maintenance
348(2)
13.9.1 Administration Tools
348(1)
13.9.2 Monitoring, Diagnosis, and Repair Tools
349(1)
13.9.3 System Enhancements and Maintenance
350(1)
13.10 Featured Application
350(2)
13.11 Conclusions
352(3)
14 Bridges: Converging HPC, AI, and Big Data for Enabling Discovery 355(30)
Nicholas A. Nystrom
Paola A. Buitrago
Philip D. Blood
14.1 Overview
356(3)
14.1.1 Sponsor/Program Background
357(1)
14.1.2 Timeline
358(1)
14.2 Applications and Workloads
359(6)
14.2.1 Highlights of Main Applications and Data
360(1)
14.2.2 Artificial Intelligence
361(1)
14.2.3 Genomics
362(1)
14.2.4 Gateways
363(1)
14.2.5 Allocations
364(1)
14.3 System Overview
365(1)
14.4 Hardware Architecture
366(4)
14.4.1 Processors and Accelerators
366(2)
14.4.2 Node Design
368(1)
14.4.3 Memory
369(1)
14.4.4 Interconnect
369(1)
14.4.5 Storage System
369(1)
14.5 System Software
370(2)
14.5.1 Operating System
370(1)
14.5.2 File Systems
371(1)
14.5.3 System Administration
371(1)
14.5.4 Scheduler
372(1)
14.6 Interactivity
372(1)
14.6.1 Virtualization and Containers
372(1)
14.7 User Environment
373(3)
14.7.1 User Environment Customization
373(1)
14.7.2 Programming Models
374(1)
14.7.3 Languages and Compilers
374(1)
14.7.4 Programming Tools
374(1)
14.7.5 Spark and Hadoop
374(1)
14.7.6 Databases
375(1)
14.7.7 Domain-Specific Frameworks and Libraries
375(1)
14.7.8 Gateways, Workflows, and Distributed Applications
375(1)
14.8 Storage, Visualization, and Analytics
376(1)
14.8.1 Community Datasets and Big Data as a Service
376(1)
14.9 Datacenter
376(1)
14.10 System Statistics
377(1)
14.10.1 Reliability and Uptime
377(1)
14.11 Science Highlights: Bridges-Enabled Breakthroughs
377(2)
14.11.1 Artificial Intelligence and Big Data
377(1)
14.11.2 Genomics
378(1)
14.12 Acknowledgments
379(6)
15 Stampede at TACC 385(16)
Dan Stanzione
John West
15.1 Overview
385(3)
15.1.1 Program Background
386(1)
15.1.2 Lessons Learned on the Path to Stampede 2
386(2)
15.2 Workload and the Design of Stampede 2
388(2)
15.2.1 Science Highlights
389(1)
15.3 System Configuration
390(2)
15.3.1 Processors and Memory
390(1)
15.3.2 Interconnect
391(1)
15.3.3 Disk I/O Subsystem
391(1)
15.3.4 Non-volatile Memory
392(1)
15.4 System Software
392(2)
15.4.1 System Performance Monitoring and Administration
392(1)
15.4.2 Job Submission and System Health
393(1)
15.4.3 Application Development Tools
393(1)
15.5 Visualization and Analytics
394(1)
15.5.1 Visualization on Stampede 2
394(1)
15.5.2 Data Analysis
395(1)
15.6 Datacenter, Layout, and Cybersecurity
395(1)
15.6.1 System Layout and Phased Deployment
396(1)
15.6.2 Cybersecurity and Identity Management
396(1)
15.7 Conclusion
396(5)
16 Oakforest-PACS 401(22)
Taisuke Boku
Osamu Tatebe
Daisuke Takahashi
Kazuhiro Yabana
Yuta Hirokawa
Masayuki Umemura
Toshihiro Hanawa
Kengo Nakajima
Hiroshi Nakamura
Tsuyoshi Ichimura
Kohei Fujita
Yutaka Ishikawa
Mitsuhisa Sato
Balazs Gerofi
Masamichi Takagi
16.1 Overview
402(1)
16.2 Timeline
402(1)
16.3 Applications and Workloads
403(4)
16.3.1 GAMERA/GHYDRA
403(1)
16.3.2 ARTED
404(2)
16.3.3 Benchmark Results
406(3)
16.3.3.1 HPL
406(1)
16.3.3.2 HPCG
407(1)
16.4 System Overview
407(1)
16.5 Hardware Architecture
408(1)
16.6 System Software
409(3)
16.6.1 Basic System Software
409(1)
16.6.2 IHK/McKernel
409(3)
16.7 Programming System
412(6)
16.7.1 Basic Programming Environment
412(1)
16.7.2 XcalableMP: A PGAS Parallel Programming Language for Parallel Many-core Processor System
413(10)
16.7.2.1 Overview of XcalableMP
413(1)
16.7.2.2 OpenMP and XMP Tasklet Directive
414(1)
16.7.2.3 Multi-tasking Execution Model in XcalableMP between Nodes
415(1)
16.7.2.4 Preliminary Performance Evaluation on Oakforest-PACS
416(1)
16.7.2.5 Communication Optimization for Many-Core Clusters
417(1)
16.8 Storage System
418(1)
16.9 Data Center/Facility
419(4)
17 CHPC in South Africa 423(28)
Happy M. Sithole
Werner Janse Van Rensburg
Dorah Thobye
Krishna Govender
Charles Crosby
Kevin Colville
Anita Loots
17.1 Overview
423(3)
17.1.1 Sponsor/Program Background
423(1)
17.1.2 Business Case of the Installation of Lengau
424(1)
17.1.3 Timeline
425(1)
17.2 Applications and Workloads
426(10)
17.2.1 Highlights of Main Applications
426(1)
17.2.2 Benchmark Results
427(9)
17.2.2.1 Computational Mechanics
428(2)
17.2.2.2 Earth Sciences
430(1)
17.2.2.3 Computational Chemistry
430(3)
17.2.2.4 Astronomy
433(3)
17.3 System Overview
436(2)
17.4 Storage, Visualisation and Analytics
438(1)
17.5 Data Center/Facility
438(1)
17.6 System Statistics
439(8)
17.7 Square Kilometer Array
447(4)
Index 451
Jeffrey S. Vetter, Ph.D., is a Distinguished R&D Staff Member and the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division of Oak Ridge National Laboratory. Vetter also holds a joint appointment in the Electrical Engineering and Computer Science Department of the University of Tennessee-Knoxville. From 2005 through 2015, Vetter held a joint position at Georgia Institute of Technology, where, from 2009 to 2015, he was the Principal Investigator of the NSF Track 2D Experimental Computing XSEDE Facility, named Keeneland, for large-scale heterogeneous computing using graphics processors, and the Director of the NVIDIA CUDA Center of Excellence.