
E-book: Contemporary High Performance Computing: From Petascale toward Exascale, Volume 3

Edited by Jeffrey S. Vetter (Oak Ridge National Laboratory, Tennessee, USA)
  • Format: EPUB+DRM
  • Price: 56,34 €*
  • * This is the final price, i.e., no additional discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned, and payments for purchased e-books are non-refundable.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you must install free software to unlock and read it. To read this e-book you need to create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which may already be on your computer).

    This e-book cannot be read on an Amazon Kindle.

Contemporary High Performance Computing: From Petascale toward Exascale, Volume 3 focuses on the ecosystems surrounding the world's leading centers for high performance computing (HPC). It covers many of the important factors involved in each ecosystem: computer architectures, software, applications, facilities, and sponsors.

This third volume continues the two previous volumes and covers additional HPC ecosystems using the same chapter outline: description of a flagship system, major application workloads, facilities, and sponsors.

Features:

  • Describes many prominent, international systems in HPC from 2015 through 2017, including each system's hardware and software architecture
  • Covers facilities for each system, including power and cooling
  • Presents application workloads for each site
  • Discusses historic and projected trends in technology and applications
  • Includes contributions from leading experts

Designed for researchers and students in high performance computing, computational science, and related areas, this book provides a valuable guide to the state-of-the-art research, trends, and resources in the world of HPC.
Preface xix
Editor xxiii
1 Resilient HPC for 24x7x365 Weather Forecast Operations at the Australian Government Bureau of Meteorology 1(30)
Lesley Seebeck
Tim F. Pugh
Damian Aigus
Joerg Henrichs
Andrew Khaw
Tennessee Leeuwenburg
James Mandilas
Richard Oxbrow
Naren Rajasingam
Wojtek Uliasz
John Vincent
Craig West
Rob Bell
1.1 Foreword
3(1)
1.2 Overview
4(3)
1.2.1 Program Background
5(1)
1.2.2 Sponsor Background
6(1)
1.2.3 Timeline
7(1)
1.3 Applications and Workloads
7(6)
1.3.1 Highlights of Main Applications
8(2)
1.3.2 2017 Case Study: From Nodes to News, TC Debbie
10(1)
1.3.3 Benchmark Usage
11(1)
1.3.4 SSP - Monitoring System Performance
11(2)
1.4 System Overview
13(1)
1.4.1 System Design Decisions
13(1)
1.5 Hardware Architecture
14(4)
1.5.1 Australis Processors
14(1)
1.5.2 Australis Node Design
15(1)
1.5.2.1 Australis Service Node
15(1)
1.5.2.2 Australis Compute Node
16(1)
1.5.3 External Nodes
16(1)
1.5.4 Australis Memory
17(1)
1.5.5 Australis Interconnect
17(1)
1.5.6 Australis Storage and Filesystem
17(1)
1.6 System Software
18(3)
1.6.1 Operating System
18(1)
1.6.2 Operating System Upgrade Procedure
18(1)
1.6.3 Schedulers
19(2)
1.6.3.1 SMS
20(1)
1.6.3.2 Cylc
20(1)
1.6.3.3 PBS Professional
20(1)
1.7 Programming System
21(2)
1.7.1 Programming Models
21(1)
1.7.2 Compiler Selection
22(1)
1.7.3 Optimisations
22(1)
1.8 Archiving
23(1)
1.8.1 Oracle Hierarchical Storage Manager (SAM-QFS)
23(1)
1.8.2 MARS/TSM
23(1)
1.9 Data Center/Facility
24(1)
1.10 System Statistics
25(1)
1.10.1 Systems Usage Patterns
25(1)
1.11 Reliability
25(3)
1.11.1 Failover Scenarios
26(1)
1.11.2 Compute Failover
26(1)
1.11.3 Data Mover Failover
26(1)
1.11.4 Storage Failover
26(1)
1.11.4.1 Normal Mode
27(1)
1.11.4.2 Failover Mode
27(1)
1.11.4.3 Recovery Mode
27(1)
1.11.4.4 Isolated Mode
27(1)
1.11.5 SSH File Transfer Failover
27(1)
1.12 Implementing a Product Generation Platform
28(3)
2 Theta and Mira at Argonne National Laboratory 31(32)
Mark R. Fahey
Yuri Alexeev
Bill Allcock
Benjamin S. Allen
Ramesh Balakrishnan
Anouar Benali
Liza Booker
Ashley Boyle
Laural Briggs
Edouard Brooks
Phil Carns
Beth Cerny
Andrew Cherry
Lisa Childers
Sudheer Chunduri
Richard Coffey
James Collins
Paul Coffman
Susan Coghlan
Kathy DiBennardi
Ginny Doyle
Hal Finkel
Graham Fletcher
Marta Garcia
Ira Goldberg
Cheetah Goletz
Susan Gregurich
Kevin Harms
Carissa Holohan
Joseph A. Insley
Tommie Jackson
Janet Jaseckas
Elise Jennings
Derek Jensen
Wei Jiang
Margaret Kaczmarski
Chris Knight
Janet Knowles
Kalyan Kumaran
Ti Leggett
Ben Lenard
Anping Liu
Ray Loy
Preeti Malakar
Avanthi Mantrala
David E. Martin
Guillermo Mayorga
Gordon McPheeters
Paul Messina
Ryan Milner
Vitali Morozov
Zachary Nault
Denise Nelson
Jack O'Connell
James Osborn
Michael E. Papka
Scott Parker
Pragnesh Patel
Saumil Patel
Eric Pershey
Renee Plzak
Adrian Pope
Jared Punzel
Sreeranjani Ramprakash
John Reddy
Paul Rich
Katherine Riley
Silvio Rizzi
George Rojas
Nichols A. Romero
Robert Scott
Adam Scovel
William Scullin
Emily Shemon
Haritha Siddabathuni Som
Joan Stover
Mirek Suliba
Brian Toonen
Tom Uram
Alvaro Vazquez-Mayagoitia
Venkatram Vishwanath
R. Douglas Waldron
Gabe West
Timothy J. Williams
Darin Wills
Laura Wolf
Wanda Woods
Michael Zhang
2.1 ALCF Overview
32(2)
2.1.1 Argonne Leadership Computing Facility
32(1)
2.1.2 Timeline
33(1)
2.1.3 Organization of This Chapter
34(1)
2.2 Facility
34(2)
2.2.1 Mira Facility Improvements
34(1)
2.2.2 Theta Facility Improvements
35(1)
2.3 Theta
36(12)
2.3.1 Architecture
37(4)
2.3.1.1 Processor
38(1)
2.3.1.2 Memory
39(1)
2.3.1.3 Network
39(1)
2.3.1.4 Storage System
40(1)
2.3.2 System Software
41(1)
2.3.2.1 Systems Administration of the Cray Linux Environment
42(1)
2.3.2.2 Scheduler
42(1)
2.3.3 Programming System
42(2)
2.3.3.1 Programming Models
42(1)
2.3.3.2 Languages and Compilers
43(1)
2.3.4 Deployment and Acceptance
44(2)
2.3.4.1 Benchmarks
44(1)
2.3.4.2 Applications
45(1)
2.3.5 Early Science and Transition to Operations
46(2)
2.4 Mira
48(7)
2.4.1 Architecture and Software Summary
49(1)
2.4.2 Evolution of Ecosystem
50(2)
2.4.3 Notable Science Accomplishments
52(2)
2.4.4 System Statistics
54(1)
2.5 Cobalt Job Scheduler
55(2)
2.6 Job Failure Analysis
57(2)
2.7 Acknowledgments
59(4)
3 Zuse Institute Berlin (ZIB) 63(30)
Alexander Reinefeld
Thomas Steinke
Matthias Noack
Florian Wende
3.1 Overview
63(2)
3.1.1 Research Center for Many-Core HPC
64(1)
3.1.2 Timeline
64(1)
3.2 Applications and Workloads
65(2)
3.2.1 VASP
66(1)
3.2.2 GLAT
66(1)
3.2.3 HEOM
67(1)
3.3 System Hardware Architecture
67(3)
3.3.1 Cray TDS at ZIB with Intel Xeon Phi Processors
68(1)
3.3.2 Intel Xeon Phi 71xx
69(1)
3.3.3 Intel Xeon Phi 72xx
69(1)
3.4 Many-Core in HPC: The Need for Code Modernization
70(19)
3.4.1 High-level SIMD Vectorization
71(6)
3.4.2 Offloading over Fabric
77(7)
3.4.3 Runtime Kernel Compilation with KART
84(5)
3.5 Summary
89(4)
4 The Mont-Blanc Prototype 93(30)
Filippo Mantovani
Daniel Ruiz
Leonardo Bautista
Vishal Mehta
Fabio Banchelli
Nikola Rajovic
Eduard Ayguade
Jesus Labarta
Mateo Valero
Alejandro Rico Carro
Alex Ramirez Bellido
Markus Geimer
Daniele Tafani
4.1 Overview
94(2)
4.1.1 Project Context and Challenges
94(2)
4.1.2 Objectives and Timeline
96(1)
4.2 Hardware Architecture
96(4)
4.2.1 Compute Node
96(1)
4.2.2 Blade
97(1)
4.2.3 The Overall System
98(1)
4.2.4 Performance Summary
99(1)
4.3 System Software
100(3)
4.3.1 Development Tools Ecosystem
101(1)
4.3.2 OpenStack
102(1)
4.4 Applications and Workloads
103(5)
4.4.1 Core Evaluation
104(1)
4.4.2 Node Evaluation
105(1)
4.4.3 System Evaluation
106(1)
4.4.4 Node Power Profiling
106(2)
4.5 Deployment and Operational Information
108(2)
4.5.1 Thermal Experiments
109(1)
4.6 Highlights of Mont-Blanc
110(9)
4.6.1 Reliability Study of an Unprotected RAM System
111(3)
4.6.2 Network Retransmission and OS Noise Study
114(3)
4.6.3 The Power Monitoring Tool of the Mont-Blanc System
117(2)
4.7 Acknowledgments
119(4)
5 Chameleon 123(26)
Kate Keahey
Pierre Riteau
Dan Stanzione
Tim Cockerill
Joe Mambretti
Paul Rad
Paul Ruth
5.1 Overview
124(2)
5.1.1 A Case for a Production Testbed
124(1)
5.1.2 Program Background
125(1)
5.1.3 Timeline
126(1)
5.2 Hardware Architecture
126(4)
5.2.1 Projected Use Cases
127(1)
5.2.2 Phase 1 Chameleon Deployment
127(2)
5.2.3 Experience with Phase 1 Hardware and Future Plans
129(1)
5.3 System Overview
130(1)
5.4 System Software
130(5)
5.4.1 Core Services
131(1)
5.4.2 Implementation
132(3)
5.5 Appliances
135(2)
5.5.1 System Appliances
136(1)
5.5.2 Complex Appliances
137(1)
5.6 Data Center/Facility
137(1)
5.6.1 University of Chicago Facility
137(1)
5.6.2 TACC Facility
137(1)
5.6.3 Wide-Area Connectivity
137(1)
5.7 System Management and Policies
138(1)
5.8 Statistics and Lessons Learned
138(3)
5.9 Research Projects Highlights
141(9)
5.9.1 Chameleon Slices for Wide-Area Networking Research
141(1)
5.9.2 Machine Learning Experiments on Chameleon
142(7)
6 CSCS and the Piz Daint System 149(26)
Sadaf R. Alam
Ladina Gilly
Colin J. McMurtrie
Thomas C. Schulthess
6.1 Introduction
150(3)
6.1.1 Program and Sponsor
150(1)
6.1.2 Timeline
151(2)
6.2 Co-designing Piz Daint
153(2)
6.3 Hardware Architecture
155(4)
6.3.1 Overview of the Cray XC50 Architecture
155(1)
6.3.2 Cray XC50 Hybrid Compute Node and Blade
155(1)
6.3.3 Interconnect
156(1)
6.3.4 Scratch File System Configuration
157(2)
6.4 Innovative Features of Piz Daint
159(4)
6.4.1 New Cray Linux Environment (CLE 6.0)
160(1)
6.4.2 Public IP Routing
161(1)
6.4.3 GPU Monitoring
162(1)
6.4.4 System Management and Monitoring
162(1)
6.5 Data Center/Facility
163(4)
6.5.1 Design Criteria for the Facility
163(2)
6.5.2 Lake Water Cooling
165(1)
6.5.3 Cooling Distribution
165(1)
6.5.4 Electrical Distribution
166(1)
6.5.5 Siting the Current Piz Daint System
166(1)
6.5.5.1 Cooling
166(1)
6.5.5.2 Power
166(1)
6.5.5.3 Challenges
166(1)
6.6 Consolidation of Services
167(4)
6.6.1 High Performance Computing Service
167(1)
6.6.2 Visualization and Data Analysis Service
168(1)
6.6.3 Data Mover Service
169(1)
6.6.4 Container Service
170(1)
6.6.5 Cray Urika-XC Analytics Software Suite Services
170(1)
6.6.6 Worldwide Large Hadron Collider (LHC) Computing Grid (WLCG) Services
170(1)
6.7 Acknowledgements
171(4)
7 Facility Best Practices 175(14)
Ladina Gilly
7.1 Introduction
175(1)
7.2 Forums That Discuss Best Practices in HPC
176(1)
7.3 Relevant Standards for Data Centres
176(1)
7.4 Most Frequently Encountered Infrastructure Challenges
177(1)
7.5 Compilation of Best Practices
178(7)
7.5.1 Management Topics
178(1)
7.5.2 Tendering Processes
179(1)
7.5.3 Building Envelope
180(1)
7.5.4 Power Density and Capacity
180(1)
7.5.5 Raised Floor
181(1)
7.5.6 Electrical Infrastructure
182(1)
7.5.7 Cooling
183(1)
7.5.8 Fire Protection
184(1)
7.5.9 Measuring and Monitoring
184(1)
7.5.10 Once in Operation
185(1)
7.6 Limitations and Implications
185(1)
7.7 Conclusion
185(4)
8 Jetstream 189(34)
Craig A. Stewart
David Y. Hancock
Therese Miller
Jeremy Fischer
R. Lee Liming
George Turner
John Michael Lowe
Steven Gregory
Edwin Skidmore
Matthew Vaughn
Dan Stanzione
Nirav Merchant
Ian Foster
James Taylor
Paul Rad
Volker Brendel
Enis Afgan
Michael Packard
Winona Snapp-Childs
8.1 Overview
191(9)
8.1.1 Jetstream Motivation and Sponsor Background
192(3)
8.1.2 Timeline
195(1)
8.1.3 Hardware Acceptance
196(1)
8.1.4 Benchmark Results
197(1)
8.1.5 Cloud Functionality Tests
198(1)
8.1.6 Gateway Functionality Tests
199(1)
8.1.7 Data Movement, Storage, and Dissemination
199(1)
8.1.8 Acceptance by NSF
200(1)
8.2 Applications and Workloads
200(3)
8.2.1 Highlights of Main Applications
201(2)
8.3 System Overview
203(1)
8.4 Hardware Architecture
203(1)
8.4.1 Node Design and Processor Elements
203(1)
8.4.2 Interconnect
204(1)
8.4.3 Storage Subsystem
204(1)
8.5 System Software
204(6)
8.5.1 Operating System
204(2)
8.5.2 System Administration
206(1)
8.5.3 Schedulers and Virtualization
206(1)
8.5.4 Security
207(1)
8.5.5 Storage Software
208(1)
8.5.6 User Authentication
208(1)
8.5.7 Allocation Software and Processes
209(1)
8.6 Programming System
210(3)
8.6.1 Atmosphere
210(1)
8.6.2 Jetstream Plugins for the Atmosphere Platform
211(1)
8.6.2.1 Authorization
211(1)
8.6.2.2 Allocation Sources and Special Allocations
211(1)
8.6.3 Globus Authentication and Data Access
212(1)
8.6.4 The Jetstream OpenStack API
212(1)
8.6.5 VM libraries
212(1)
8.7 Data Center Facilities
213(1)
8.8 System Statistics
214(1)
8.9 Interesting Features
215(2)
8.9.1 Jupyter and Kubernetes
216(1)
8.10 Artificial Intelligence Technology Education
217(1)
8.11 Jetstream VM Image Use for Scientific Reproducibility - Bioinformatics as an Example
217(1)
8.12 Running a Virtual Cluster on Jetstream
218(5)
9 Modular Supercomputing Architecture: From Idea to Production 223(34)
Estela Suarez
Norbert Eicker
Thomas Lippert
9.1 The Julich Supercomputing Centre (JSC)
224(5)
9.2 Supercomputing Architectures at JSC
224(1)
9.2.1 The Dual Supercomputer Strategy
225(2)
9.2.2 The Cluster-Booster Concept
227(1)
9.2.3 The Modular Supercomputing Architecture
228(1)
9.3 Applications and Workloads
229(3)
9.3.1 Co-design Applications in the DEEP Projects
231(1)
9.4 Systems Overview
232(2)
9.4.1 Sponsors
233(1)
9.4.2 Timeline
233(1)
9.5 Hardware Implementation
234(7)
9.5.1 First Generation (DEEP) Prototype
235(3)
9.5.2 Second Generation (DEEP-ER) Prototype
238(1)
9.5.3 JURECA
239(2)
9.6 System Software
241(4)
9.6.1 System Administration
241(1)
9.6.2 Schedulers and Resource Management
242(2)
9.6.3 Network-bridging Protocol
244(1)
9.6.4 I/O Software and File System
244(1)
9.7 Programming Model
245(4)
9.7.1 Inter-module MPI Offloading
245(1)
9.7.2 OmpSs Abstraction Layer
246(1)
9.7.3 Resiliency Software
247(2)
9.8 Cooling and Facility Infrastructure
249(1)
9.9 Conclusions and Next steps
250(1)
9.10 Acknowledgments
251(6)
10 SuperMUC at LRZ 257(18)
Hayk Shoukourian
Arndt Bode
Herbert Huber
Michael Ott
Dieter Kranzlmuller
10.1 Overview
257(3)
10.1.1 Timeline
258(2)
10.2 System Overview
260(1)
10.3 Applications and Workloads
261(4)
10.4 System Stability
265(1)
10.5 Data Center/Facility
266(2)
10.6 R&D on Energy-Efficiency at LRZ
268(7)
11 The NERSC Cori HPC System 275(30)
Katie Antypas
Brian Austin
Deborah Bard
Wahid Bhimji
Brandon Cook
Tina Declerck
Jack Deslippe
Richard Gerber
Rebecca Hartman-Baker
Yun He
Douglas Jacobsen
Thorsten Kurth
Jay Srinivasan
Nicholas J. Wright
11.1 Overview
276(1)
11.1.1 Sponsor and Program Background
276(1)
11.1.2 Timeline
277(1)
11.2 Applications and Workloads
277(2)
11.2.1 Benchmarks
277(2)
11.3 System Overview
279(1)
11.4 Hardware Architecture
280(1)
11.4.1 Node Types and Design
280(1)
11.4.1.1 Xeon Phi "Knights Landing" Compute Nodes
280(1)
11.4.1.2 Xeon "Haswell" Compute Nodes
280(1)
11.4.1.3 Service Nodes
280(1)
11.4.2 Interconnect
281(1)
11.4.3 Storage - Burst Buffer and Lustre Filesystem
281(1)
11.5 System Software
281(4)
11.5.1 System Software Overview
281(1)
11.5.2 System Management Stack
282(1)
11.5.3 Resource Management
282(1)
11.5.4 Storage Resources and Software
283(1)
11.5.5 Networking Resources and Software
284(1)
11.5.6 Containers and User-Defined Images
284(1)
11.6 Programming Environment
285(3)
11.6.1 Programming Models
285(1)
11.6.2 Languages and Compilers
285(1)
11.6.3 Libraries and Tools
286(1)
11.6.4 Building Software for a Heterogeneous System
286(1)
11.6.5 Default Mode Selection Considerations
287(1)
11.6.6 Running Jobs
287(1)
11.7 NESAP
288(7)
11.7.1 Introduction
288(1)
11.7.2 Optimization Strategy and Tools
288(2)
11.7.3 Most Effective Optimizations
290(1)
11.7.4 NESAP Result Overview
291(1)
11.7.5 Application Highlights
291(4)
11.7.5.1 Quantum ESPRESSO
291(2)
11.7.5.2 MFDn
293(2)
11.8 Data Science
295(4)
11.8.1 IO Improvement: Burst Buffer
295(2)
11.8.2 Workflows
297(2)
11.8.2.1 Network Connectivity to External Nodes
298(1)
11.8.2.2 Burst Buffer Filesystem for In-situ Workflows
298(1)
11.8.2.3 Real-time and Interactive Queues for Time Sensitive Analyses
298(1)
11.8.2.4 Scheduler and Queue Improvements to Support Data-intensive Computing
299(1)
11.9 System Statistics
299(2)
11.9.1 System Utilizations
299(1)
11.9.2 Job Completion Statistics
299(2)
11.10 Summary
301(1)
11.11 Acknowledgments
302(3)
12 Lomonosov-2 305(26)
Vladimir Voevodin
Alexander Antonov
Dmitry Nikitenko
Pavel Shvets
Sergey Sobolev
Konstantin Stefanov
Vadim Voevodin
Sergey Zhumatiy
Andrey Brechalov
Alexander Naumov
12.1 Overview
305(4)
12.1.1 HPC History of MSU
305(3)
12.1.2 Lomonosov-2 Supercomputer: Timeline
308(1)
12.2 Applications and Workloads
309(2)
12.2.1 Main Applications Highlights
309(1)
12.2.2 Benchmark Results and Rating Positions
309(1)
12.2.3 Users and Workloads
310(1)
12.3 System Overview
311(2)
12.4 System Software and Programming Systems
313(2)
12.5 Networks
315(2)
12.5.1 Communication Network
315(1)
12.5.2 Auxiliary InfiniBand Network
315(1)
12.5.3 Management and Service Network
316(1)
12.6 Storage
317(1)
12.7 Engineering Infrastructure
318(5)
12.7.1 Infrastructure Support
318(1)
12.7.2 Power Distribution
318(2)
12.7.3 Engineering Equipment
320(1)
12.7.4 Overall Cooling System
320(2)
12.7.5 Cooling Auxiliary IT Equipment
322(1)
12.7.6 Emergency Cooling
322(1)
12.7.7 Efficiency
323(1)
12.8 Efficiency of the Supercomputer Center
323(8)
13 Electra 331(24)
Rupak Biswas
Jeff Becker
Davin Chan
David Ellsworth
Robert Hood
Piyush Mehrotra
Michelle Moyer
Chris Tanner
William Thigpen
13.1 Introduction
332(1)
13.2 NASA Requirements for Supercomputing
333(1)
13.3 Supercomputing Capabilities: Conventional Facilities
333(4)
13.3.1 Computer Systems
333(1)
13.3.2 Interconnect
334(1)
13.3.3 Network Connectivity
334(1)
13.3.4 Storage Resources
335(1)
13.3.5 Visualization and Hyperwall
336(1)
13.3.6 Primary NAS Facility
336(1)
13.4 Modular Supercomputing Facility
337(5)
13.4.1 Limitations of the Primary NAS Facility
337(1)
13.4.2 Expansion and Integration Strategy
337(1)
13.4.3 Site Preparation
338(1)
13.4.4 Module Design
338(1)
13.4.5 Power, Cooling, Network
339(1)
13.4.6 Facility Operations and Maintenance
340(1)
13.4.7 Environmental Impact
341(1)
13.5 Electra Supercomputer
342(2)
13.5.1 Performance
342(1)
13.5.2 I/O Subsystem Architecture
343(1)
13.6 User Environment
344(1)
13.6.1 System Software
344(1)
13.6.2 Resource Allocation and Scheduling
344(1)
13.6.3 User Services
344(1)
13.7 Application Benchmarking and Performance
345(2)
13.8 Utilization Statistics of HECC Resources
347(1)
13.9 System Operations and Maintenance
348(2)
13.9.1 Administration Tools
348(1)
13.9.2 Monitoring, Diagnosis, and Repair Tools
349(1)
13.9.3 System Enhancements and Maintenance
350(1)
13.10 Featured Application
350(2)
13.11 Conclusions
352(3)
14 Bridges: Converging HPC, AI, and Big Data for Enabling Discovery 355(30)
Nicholas A. Nystrom
Paola A. Buitrago
Philip D. Blood
14.1 Overview
356(3)
14.1.1 Sponsor/Program Background
357(1)
14.1.2 Timeline
358(1)
14.2 Applications and Workloads
359(6)
14.2.1 Highlights of Main Applications and Data
360(1)
14.2.2 Artificial Intelligence
361(1)
14.2.3 Genomics
362(1)
14.2.4 Gateways
363(1)
14.2.5 Allocations
364(1)
14.3 System Overview
365(1)
14.4 Hardware Architecture
366(4)
14.4.1 Processors and Accelerators
366(2)
14.4.2 Node Design
368(1)
14.4.3 Memory
369(1)
14.4.4 Interconnect
369(1)
14.4.5 Storage System
369(1)
14.5 System Software
370(2)
14.5.1 Operating System
370(1)
14.5.2 File Systems
371(1)
14.5.3 System Administration
371(1)
14.5.4 Scheduler
372(1)
14.6 Interactivity
372(1)
14.6.1 Virtualization and Containers
372(1)
14.7 User Environment
373(3)
14.7.1 User Environment Customization
373(1)
14.7.2 Programming Models
374(1)
14.7.3 Languages and Compilers
374(1)
14.7.4 Programming Tools
374(1)
14.7.5 Spark and Hadoop
374(1)
14.7.6 Databases
375(1)
14.7.7 Domain-Specific Frameworks and Libraries
375(1)
14.7.8 Gateways, Workflows, and Distributed Applications
375(1)
14.8 Storage, Visualization, and Analytics
376(1)
14.8.1 Community Datasets and Big Data as a Service
376(1)
14.9 Datacenter
376(1)
14.10 System Statistics
377(1)
14.10.1 Reliability and Uptime
377(1)
14.11 Science Highlights: Bridges-Enabled Breakthroughs
377(2)
14.11.1 Artificial Intelligence and Big Data
377(1)
14.11.2 Genomics
378(1)
14.12 Acknowledgments
379(6)
15 Stampede at TACC 385(16)
Dan Stanzione
John West
15.1 Overview
385(3)
15.1.1 Program Background
386(1)
15.1.2 Lessons Learned on the Path to Stampede 2
386(2)
15.2 Workload and the Design of Stampede 2
388(2)
15.2.1 Science Highlights
389(1)
15.3 System Configuration
390(2)
15.3.1 Processors and Memory
390(1)
15.3.2 Interconnect
391(1)
15.3.3 Disk I/O Subsystem
391(1)
15.3.4 Non-volatile Memory
392(1)
15.4 System Software
392(2)
15.4.1 System Performance Monitoring and Administration
392(1)
15.4.2 Job Submission and System Health
393(1)
15.4.3 Application Development Tools
393(1)
15.5 Visualization and Analytics
394(1)
15.5.1 Visualization on Stampede 2
394(1)
15.5.2 Data Analysis
395(1)
15.6 Datacenter, Layout, and Cybersecurity
395(1)
15.6.1 System Layout and Phased Deployment
396(1)
15.6.2 Cybersecurity and Identity Management
396(1)
15.7 Conclusion
396(5)
16 Oakforest-PACS 401(22)
Taisuke Boku
Osamu Tatebe
Daisuke Takahashi
Kazuhiro Yabana
Yuta Hirokawa
Masayuki Umemura
Toshihiro Hanawa
Kengo Nakajima
Hiroshi Nakamura
Tsuyoshi Ichimura
Kohei Fujita
Yutaka Ishikawa
Mitsuhisa Sato
Balazs Gerofi
Masamichi Takagi
16.1 Overview
402(1)
16.2 Timeline
402(1)
16.3 Applications and Workloads
403(4)
16.3.1 GAMERA/GHYDRA
403(1)
16.3.2 ARTED
404(2)
16.3.3 Benchmark Results
406(3)
16.3.3.1 HPL
406(1)
16.3.3.2 HPCG
407(1)
16.4 System Overview
407(1)
16.5 Hardware Architecture
408(1)
16.6 System Software
409(3)
16.6.1 Basic System Software
409(1)
16.6.2 IHK/McKernel
409(3)
16.7 Programming System
412(6)
16.7.1 Basic Programming Environment
412(1)
16.7.2 XcalableMP: A PGAS Parallel Programming Language for Parallel Many-core Processor System
413(10)
16.7.2.1 Overview of XcalableMP
413(1)
16.7.2.2 OpenMP and XMP Tasklet Directive
414(1)
16.7.2.3 Multi-tasking Execution Model in XcalableMP between Nodes
415(1)
16.7.2.4 Preliminary Performance Evaluation on Oakforest-PACS
416(1)
16.7.2.5 Communication Optimization for Many-Core Clusters
417(1)
16.8 Storage System
418(1)
16.9 Data Center/Facility
419(4)
17 CHPC in South Africa 423(28)
Happy M. Sithole
Werner Janse Van Rensburg
Dorah Thobye
Krishna Govender
Charles Crosby
Kevin Colville
Anita Loots
17.1 Overview
423(3)
17.1.1 Sponsor/Program Background
423(1)
17.1.2 Business Case of the Installation of Lengau
424(1)
17.1.3 Timeline
425(1)
17.2 Applications and Workloads
426(10)
17.2.1 Highlights of Main Applications
426(1)
17.2.2 Benchmark Results
427(9)
17.2.2.1 Computational Mechanics
428(2)
17.2.2.2 Earth Sciences
430(1)
17.2.2.3 Computational Chemistry
430(3)
17.2.2.4 Astronomy
433(3)
17.3 System Overview
436(2)
17.4 Storage, Visualisation and Analytics
438(1)
17.5 Data Center/Facility
438(1)
17.6 System Statistics
439(8)
17.7 Square Kilometer Array
447(4)
Index 451
Jeffrey S. Vetter, Ph.D., is a Distinguished R&D Staff Member and the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division of Oak Ridge National Laboratory. Vetter also holds a joint appointment in the Electrical Engineering and Computer Science Department of the University of Tennessee-Knoxville. From 2005 through 2015, Vetter held a joint position at Georgia Institute of Technology, where, from 2009 to 2015, he was the Principal Investigator of the NSF Track 2D Experimental Computing XSEDE Facility, named Keeneland, for large-scale heterogeneous computing using graphics processors, and the Director of the NVIDIA CUDA Center of Excellence.