| Preface | ix | |
| | 1 | (2) |
| 2 The Co-array Programming Model | 3 | (6) |
| | 3 | (4) |
| | 7 | (2) |
| | 9 | (10) |
| | 9 | (3) |
| 3.2 Non-uniform partitions | 12 | (2) |
| 3.3 Row-partitioned matrix-vector multiplication | 14 | (2) |
| 3.4 Input/output in the co-array model | 16 | (1) |
| | 17 | (2) |
| 4 Reverse Partition Operators | 19 | (8) |
| 4.1 The partition of unity | 19 | (2) |
| 4.2 Column-partitioned matrix-vector multiplication | 21 | (2) |
| 4.3 The dot-product operation | 23 | (1) |
| 4.4 Extended definition of partition operators | 24 | (1) |
| | 25 | (2) |
| | 27 | (12) |
| | 27 | (3) |
| | 30 | (1) |
| 5.3 The sum-to-all operation | 31 | (1) |
| 5.4 The max-to-all and min-to-all operations | 32 | (1) |
| | 33 | (1) |
| 5.6 Collectives with array arguments | 33 | (1) |
| 5.7 The scatter and gather operations | 34 | (3) |
| 5.8 A cautionary note about functions with side effects | 37 | (1) |
| | 37 | (2) |
| | 39 | (18) |
| 6.1 Execution time for the sum-to-all operation | 40 | (1) |
| 6.2 Execution time for the dot-product operation | 41 | (2) |
| 6.3 Speedup and efficiency | 43 | (1) |
| 6.4 Strong scaling under a fixed-size constraint | 43 | (4) |
| 6.5 Weak scaling under a fixed-time constraint | 47 | (2) |
| 6.6 Weak scaling under a fixed-work constraint | 49 | (1) |
| 6.7 Weak scaling under a fixed-efficiency constraint | 50 | (2) |
| 6.8 Some remarks on computer performance modeling | 52 | (2) |
| | 54 | (3) |
| 7 Partitioned Matrix Classes | 57 | (10) |
| 7.1 The abstract matrix class | 57 | (2) |
| 7.2 Sparse matrix classes | 59 | (2) |
| 7.3 The compressed-sparse-row matrix class | 61 | (3) |
| 7.4 Matrix-vector multiplication for a CSR matrix | 64 | (2) |
| | 66 | (1) |
| 8 Iterative Solvers for Sparse Matrices | 67 | (12) |
| 8.1 The conjugate gradient algorithm | 67 | (3) |
| | 70 | (1) |
| 8.3 Performance analysis for the conjugate gradient algorithm | 71 | (3) |
| | 74 | (2) |
| | 76 | (2) |
| | 78 | (1) |
| | 78 | (1) |
| | 79 | (20) |
| 9.1 Partitioned dense matrices | 80 | (1) |
| 9.2 An abstract class for dense matrices | 80 | (1) |
| 9.3 The dense matrix class | 81 | (3) |
| 9.4 Matrix-matrix multiplication | 84 | (3) |
| | 87 | (5) |
| | 92 | (2) |
| 9.7 Solving triangular systems of equations | 94 | (4) |
| | 98 | (1) |
| 10 The Matrix Transpose Operation | 99 | (12) |
| 10.1 The transpose operation | 99 | (3) |
| 10.2 A row-partitioned matrix transposed to a row-partitioned matrix | 102 | (3) |
| 10.3 The Fast Fourier Transform | 105 | (1) |
| 10.4 Performance analysis | 106 | (2) |
| | 108 | (1) |
| | 109 | (1) |
| | 110 | (1) |
| 11 The Halo Exchange Operation | 111 | (8) |
| 11.1 Finite difference methods | 111 | (3) |
| 11.2 Partitioned finite difference methods | 114 | (2) |
| 11.3 The halo-exchange subroutine | 116 | (2) |
| | 118 | (1) |
| 12 Subpartition Operators | 119 | (8) |
| 12.1 Subpartition operators | 119 | (2) |
| 12.2 Assigning blocks to images | 121 | (1) |
| 12.3 Combined effect of the two partition operations | 122 | (1) |
| 12.4 Permuted distributions | 122 | (2) |
| 12.5 The cyclic distribution | 124 | (1) |
| | 125 | (1) |
| | 126 | (1) |
| 13 Blocked Linear Algebra | 127 | (14) |
| | 127 | (3) |
| 13.2 The block matrix class | 130 | (6) |
| 13.3 Optimization of the LU-decomposition algorithm | 136 | (2) |
| | 138 | (3) |
| 14 The Finite Element Method | 141 | (18) |
| 14.1 Basic ideas from finite element analysis | 142 | (1) |
| 14.2 Nodes, elements and basis functions | 143 | (4) |
| 14.3 Mesh partition operators | 147 | (3) |
| | 150 | (4) |
| 14.5 Integrating the heat equation | 154 | (2) |
| | 156 | (3) |
| | 159 | (12) |
| | 159 | (2) |
| 15.2 The breadth-first search | 161 | (3) |
| | 164 | (1) |
| 15.4 A parallel breadth-first-search algorithm | 165 | (2) |
| 15.5 The Graph 500 benchmark | 167 | (2) |
| | 169 | (2) |
| | 171 | (2) |
| A A Brief Reference Manual for the Co-array Model | 173 | (24) |
| | 174 | (1) |
| A.2 Co-arrays and co-dimensions | 175 | (2) |
| A.3 Relative co-dimension indices | 177 | (1) |
| A.4 Co-array variables with multiple co-dimensions | 178 | (1) |
| A.5 Co-array variables of derived type | 179 | (3) |
| A.6 Allocatable co-array variables | 182 | (1) |
| | 182 | (1) |
| | 183 | (1) |
| | 183 | (1) |
| | 184 | (1) |
| | 185 | (2) |
| A.12 Critical segments and locks | 187 | (2) |
| | 189 | (1) |
| A.14 Command line arguments | 190 | (1) |
| | 191 | (1) |
| | 191 | (1) |
| A.17 Image index functions | 192 | (1) |
| A.18 Execution control statements | 192 | (5) |
| Bibliography | 197 | (10) |
| Index | 207 | |