
E-book: Advanced Backend Code Optimization

  • Format: PDF+DRM
  • Publication date: 03-Jun-2014
  • Publisher: ISTE Ltd and John Wiley & Sons Inc
  • Language: eng
  • ISBN-13: 9781118648940
  • Price: 165.30 €*
  • * This is the final price, i.e. no additional discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned, and no refunds are given for purchased e-books.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you need to install free software in order to unlock and read it. To read this e-book, you must create an Adobe ID. More information here. The e-book can be read and downloaded on up to 6 devices (by one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (this is a free app developed specifically for e-books; it is not the same as Adobe Reader, which you probably already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

This book is a summary of more than a decade of research in the area of backend optimization, and contains the latest fundamental research results in this field. While existing books are often oriented toward Master's students, this book is aimed at professors and researchers, as it covers more advanced subjects.
It is unique in that it contains material not previously covered by other books in the field, with chapters on phase ordering in optimizing compilation; register saturation in instruction-level parallelism; code size reduction for software pipelining; and memory hierarchy effects and instruction-level parallelism.
Other chapters provide the latest research results on well-known topics such as register need, software pipelining and periodic register allocation.

Introduction xiii
Part 1 Prolog: Optimizing Compilation
1(22)
Chapter 1 On the Decidability of Phase Ordering in Optimizing Compilation
3(20)
1.1 Introduction to the phase ordering problem
3(2)
1.2 Background on phase ordering
5(2)
1.2.1 Performance modeling and prediction
5(1)
1.2.2 Some attempts in phase ordering
6(1)
1.3 Toward a theoretical model for the phase ordering problem
7(5)
1.3.1 Decidability results
8(3)
1.3.2 Another formulation of the phase ordering problem
11(1)
1.4 Examples of decidable simplified cases
12(4)
1.4.1 Models with compilation costs
12(1)
1.4.2 One-pass generative compilers
13(3)
1.5 Compiler optimization parameter space exploration
16(4)
1.5.1 Toward a theoretical model
17(2)
1.5.2 Examples of simplified decidable cases
19(1)
1.6 Conclusion on phase ordering in optimizing compilation
20(3)
Part 2 Instruction Scheduling
23(96)
Chapter 2 Instruction Scheduling Problems and Overview
25(14)
2.1 VLIW instruction scheduling problems
25(4)
2.1.1 Instruction scheduling and register allocation in a code generator
25(2)
2.1.2 The block and pipeline VLIW instruction scheduling problems
27(2)
2.2 Software pipelining
29(6)
2.2.1 Cyclic, periodic and pipeline scheduling problems
29(3)
2.2.2 Modulo instruction scheduling problems and techniques
32(3)
2.3 Instruction scheduling and register allocation
35(4)
2.3.1 Register instruction scheduling problem solving approaches
35(4)
Chapter 3 Applications of Machine Scheduling to Instruction Scheduling
39(12)
3.1 Advances in machine scheduling
39(4)
3.1.1 Parallel machine scheduling problems
39(2)
3.1.2 Parallel machine scheduling extensions and relaxations
41(2)
3.2 List scheduling algorithms
43(4)
3.2.1 List scheduling algorithms and list scheduling priorities
43(2)
3.2.2 The scheduling algorithm of Leung, Palem and Pnueli
45(2)
3.3 Time-indexed scheduling problem formulations
47(4)
3.3.1 The non-preemptive time-indexed RCPSP formulation
47(1)
3.3.2 Time-indexed formulation for the modulo RCISP
48(3)
Chapter 4 Instruction Scheduling Before Register Allocation
51(26)
4.1 Instruction scheduling for an ILP processor: case of a VLIW architecture
51(16)
4.1.1 Minimum cumulative register lifetime modulo scheduling
51(3)
4.1.2 Resource modeling in instruction scheduling problems
54(2)
4.1.3 The modulo insertion scheduling theorems
56(2)
4.1.4 Insertion scheduling in a backend compiler
58(2)
4.1.5 Example of an industrial production compiler from STMicroelectronics
60(4)
4.1.6 Time-indexed formulation of the modulo RCISP
64(3)
4.2 Large neighborhood search for the resource-constrained modulo scheduling problem
67(1)
4.3 Resource-constrained modulo scheduling problem
68(3)
4.3.1 Resource-constrained cyclic scheduling problems
68(1)
4.3.2 Resource-constrained modulo scheduling problem statement
69(1)
4.3.3 Solving resource-constrained modulo scheduling problems
70(1)
4.4 Time-indexed integer programming formulations
71(3)
4.4.1 The non-preemptive time-indexed RCPSP formulation
71(1)
4.4.2 The classic modulo scheduling integer programming formulation
72(1)
4.4.3 A new time-indexed formulation for modulo scheduling
73(1)
4.5 Large neighborhood search heuristic
74(2)
4.5.1 Variables and constraints in time-indexed formulations
74(1)
4.5.2 A large neighborhood search heuristic for modulo scheduling
74(1)
4.5.3 Experimental results with a production compiler
75(1)
4.6 Summary and conclusions
76(1)
Chapter 5 Instruction Scheduling After Register Allocation
77(14)
5.1 Introduction
77(2)
5.2 Local instruction scheduling
79(5)
5.2.1 Acyclic instruction scheduling
79(1)
5.2.2 Scoreboard Scheduling principles
80(2)
5.2.3 Scoreboard Scheduling implementation
82(2)
5.3 Global instruction scheduling
84(3)
5.3.1 Postpass inter-region scheduling
84(2)
5.3.2 Inter-block Scoreboard Scheduling
86(1)
5.3.3 Characterization of fixed points
87(1)
5.4 Experimental results
87(2)
5.5 Conclusions
89(2)
Chapter 6 Dealing in Practice with Memory Hierarchy Effects and Instruction Level Parallelism
91(28)
6.1 The problem of hardware memory disambiguation at runtime
92(12)
6.1.1 Introduction
92(1)
6.1.2 Related work
93(1)
6.1.3 Experimental environment
94(1)
6.1.4 Experimentation methodology
95(1)
6.1.5 Precise experimental study of memory hierarchy performance
95(5)
6.1.6 The effectiveness of load/store vectorization
100(3)
6.1.7 Conclusion on hardware memory disambiguation mechanisms
103(1)
6.2 Data preloading and prefetching
104(15)
6.2.1 Introduction
104(1)
6.2.2 Related work
105(2)
6.2.3 Problems of optimizing cache effects at the instruction level
107(2)
6.2.4 Target processor description
109(1)
6.2.5 Our method of instruction-level code optimization
110(6)
6.2.6 Experimental results
116(1)
6.2.7 Conclusion on prefetching and preloading at instruction level
117(2)
Part 3 Register Optimization
119(112)
Chapter 7 The Register Need of a Fixed Instruction Schedule
121(20)
7.1 Data dependence graph and processor model for register optimization
122(1)
7.1.1 NUAL and UAL semantics
122(1)
7.2 The acyclic register need
123(2)
7.3 The periodic register need
125(4)
7.3.1 Software pipelining, periodic scheduling and cyclic scheduling
125(2)
7.3.2 The circular lifetime intervals
127(2)
7.4 Computing the periodic register need
129(3)
7.5 Some theoretical results on the periodic register need
132(7)
7.5.1 Minimal periodic register need versus initiation interval
133(1)
7.5.2 Computing the periodic register sufficiency
133(1)
7.5.3 Stage scheduling under register constraints
134(5)
7.6 Conclusion on the register requirement
139(2)
Chapter 8 The Register Saturation
141(18)
8.1 Motivations on the register saturation concept
141(3)
8.2 Computing the acyclic register saturation
144(9)
8.2.1 Characterizing the register saturation
146(3)
8.2.2 Efficient algorithmic heuristic for register saturation computation
149(2)
8.2.3 Experimental efficiency of Greedy-k
151(2)
8.3 Computing the periodic register saturation
153(4)
8.3.1 Basic integer linear variables
154(1)
8.3.2 Integer linear constraints
154(2)
8.3.3 Linear objective function
156(1)
8.4 Conclusion on the register saturation
157(2)
Chapter 9 Spill Code Reduction
159(18)
9.1 Introduction to register constraints in software pipelining
159(1)
9.2 Related work in periodic register allocation
160(2)
9.3 SIRA: schedule independent register allocation
162(7)
9.3.1 Reuse graphs
162(2)
9.3.2 DDG associated with reuse graph
164(2)
9.3.3 Exact SIRA with integer linear programming
166(2)
9.3.4 SIRA with fixed reuse edges
168(1)
9.4 SIRALINA: an efficient polynomial heuristic for SIRA
169(4)
9.4.1 Integer variables for the linear problem
170(1)
9.4.2 Step 1: the scheduling problem
170(2)
9.4.3 Step 2: the linear assignment problem
172(1)
9.5 Experimental results with SIRA
173(2)
9.6 Conclusion on spill code reduction
175(2)
Chapter 10 Exploiting the Register Access Delays Before Instruction Scheduling
177(14)
10.1 Introduction
177(2)
10.2 Problem description of DDG circuits with non-positive distances
179(1)
10.3 Necessary and sufficient condition to avoid non-positive circuits
180(2)
10.4 Application to the SIRA framework
182(5)
10.4.1 Recall on SIRALINA heuristic
183(1)
10.4.2 Step 1: the scheduling problem for a fixed II
183(1)
10.4.3 Step 2: the linear assignment problem
184(1)
10.4.4 Eliminating non-positive circuits in SIRALINA
184(2)
10.4.5 Updating reuse distances
186(1)
10.5 Experimental results on eliminating non-positive circuits
187(1)
10.6 Conclusion on non-positive circuit elimination
188(3)
Chapter 11 Loop Unrolling Degree Minimization for Periodic Register Allocation
191(40)
11.1 Introduction
191(4)
11.2 Background
195(9)
11.2.1 Loop unrolling after SWP with modulo variable expansion
196(1)
11.2.2 Meeting graphs (MG)
197(3)
11.2.3 SIRA, reuse graphs and loop unrolling
200(4)
11.3 Problem description of unroll factor minimization for unscheduled loops
204(1)
11.4 Algorithmic solution for unroll factor minimization: single register type
205(8)
11.4.1 Fixed loop unrolling problem
206(1)
11.4.2 Solution for the fixed loop unrolling problem
207(2)
11.4.3 Solution for LCM-MIN problem
209(4)
11.5 Unroll factor minimization in the presence of multiple register types
213(8)
11.5.1 Search space for minimal kernel loop unrolling
217(1)
11.5.2 Generalization of the fixed loop unrolling problem in the presence of multiple register types
218(1)
11.5.3 Algorithmic solution for the loop unrolling minimization (LUM, problem 11.1)
219(2)
11.6 Unroll factor reduction for already scheduled loops
221(3)
11.6.1 Improving algorithm 11.4 (LCM-MIN) for the meeting graph framework
224(1)
11.7 Experimental results
224(2)
11.8 Related work
226(2)
11.8.1 Rotating register files
226(1)
11.8.2 Inserting move operations
227(1)
11.8.3 Loop unrolling after software pipelining
228(1)
11.8.4 Code generation for multidimensional loops
228(1)
11.9 Conclusion on loop unroll degree minimization
228(3)
Part 4 Epilog: Performance, Open Problems
231(26)
Chapter 12 Statistical Performance Analysis: The Speedup-Test Protocol
233(24)
12.1 Code performance variation
233(3)
12.2 Background and notations
236(3)
12.3 Analyzing the statistical significance of the observed speedups
239(5)
12.3.1 The speedup of the observed average execution time
239(2)
12.3.2 The speedup of the observed median execution time, as well as individual runs
241(3)
12.4 The Speedup-Test software
244(2)
12.5 Evaluating the proportion of accelerated benchmarks by a confidence interval
246(2)
12.6 Experiments and applications
248(5)
12.6.1 Comparing the performances of compiler optimization levels
249(1)
12.6.2 Testing the performances of parallel executions of OpenMP applications
250(1)
12.6.3 Comparing the efficiency of two compilers
251(2)
12.6.4 The impact of the Speedup-Test protocol over some observed speedups
253(1)
12.7 Related work
253(2)
12.7.1 Observing execution times variability
253(1)
12.7.2 Program performance evaluation in presence of variability
254(1)
12.8 Discussion and conclusion on the Speedup-Test protocol
255(2)
Conclusion 257(6)
Appendix 1 Presentation of the Benchmarks used in our Experiments 263(8)
Appendix 2 Register Saturation Computation on Stand-alone DDG 271(8)
Appendix 3 Efficiency of SIRA on the Benchmarks 279(14)
Appendix 4 Efficiency of Non-Positive Circuit Elimination in the SIRA Framework 293(10)
Appendix 5 Loop Unroll Degree Minimization: Experimental Results 303(12)
Appendix 6 Experimental Efficiency of Software Data Preloading and Prefetching for Embedded VLIW 315(4)
Appendix 7 Appendix of the Speedup-Test Protocol 319(2)
Bibliography 321(24)
Lists of Figures, Tables and Algorithms 345(8)
Index 353
Sid Touati is a full professor at the University of Nice Sophia-Antipolis, France.