|
|
1 | |
|
1.1 The System-on-Chip Era |
|
|
2 | |
|
1.2 Characteristics of Embedded Software |
|
|
5 | |
|
1.3 Context and Motivation |
|
|
9 | |
|
|
12 | |
|
|
13 | |
|
|
15 | |
|
|
15 | |
|
|
16 | |
|
2.1.2 Fixed or Dynamic Priority Scheduling |
|
|
17 | |
|
|
19 | |
|
2.1.4 Real-Time Operating Systems |
|
|
21 | |
|
2.1.5 Comparison with TCM Scheduling |
|
|
22 | |
|
2.2 Low-power Considerations |
|
|
22 | |
|
2.2.1 Dynamic Power Management |
|
|
22 | |
|
2.2.2 Dynamic Voltage Scaling |
|
|
23 | |
|
2.2.3 Battery Life Related |
|
|
28 | |
|
2.2.4 Physical Low-power Implementation |
|
|
28 | |
|
|
28 | |
|
2.2.6 Comparison with TCM Low-power Scheduling |
|
|
29 | |
|
2.3 Platform Issues and Codesign Framework |
|
|
29 | |
|
2.3.1 Processor Architecture |
|
|
29 | |
|
2.3.2 Data Memory Organization |
|
|
30 | |
|
|
30 | |
|
2.3.4 Energy/Performance Estimation |
|
|
30 | |
|
|
31 | |
|
|
31 | |
|
2.3.7 Co-Synthesis Workflows |
|
|
32 | |
|
3 System Model and Work Flow |
|
|
35 | |
|
3.1 Overview of TCM Work flow |
|
|
35 | |
|
|
37 | |
|
3.2.1 Thread Frame Identification |
|
|
38 | |
|
3.2.2 Thread Node Identification |
|
|
40 | |
|
3.3 System Scenario Selection |
|
|
43 | |
|
3.3.1 Handling of Intra-TF Scenarios |
|
|
43 | |
|
3.3.2 Handling of Inter-TF Scenarios |
|
|
45 | |
|
|
45 | |
|
|
48 | |
|
4 Basic Design-Time Scheduling |
|
|
51 | |
|
|
51 | |
|
4.2 Exact Scheduling Algorithms |
|
|
53 | |
|
4.3 Forward Search Algorithm |
|
|
55 | |
|
4.3.1 The Kernel Heuristic |
|
|
55 | |
|
4.3.2 Tuning the Load Calculation |
|
|
59 | |
|
4.3.3 Tuning the Processor Selection Priority |
|
|
60 | |
|
4.3.4 Tuning both Load Calculation and Processor Priority |
|
|
62 | |
|
4.3.5 Improving Node Selection Policy |
|
|
64 | |
|
4.3.6 Comparing ASAPACAP and ASAP |
|
|
67 | |
|
4.3.7 Pruning Techniques for Tie-Breaking |
|
|
69 | |
|
4.3.8 Further Pruning Technique for Tie-Breaking |
|
|
73 | |
|
4.3.9 Extension with Exhaustive Search |
|
|
75 | |
|
4.4 Backward Search Algorithm |
|
|
89 | |
|
4.4.1 Experiment with Fig. 4.11 |
|
|
94 | |
|
4.4.2 Experiment with Fig. 4.35 |
|
|
96 | |
|
|
98 | |
|
4.5 Subplatform Scheduling |
|
|
98 | |
|
4.5.1 Experiment with Fig. 4.11 |
|
|
99 | |
|
4.5.2 Experiment with Fig. 4.35 |
|
|
103 | |
|
|
103 | |
|
4.6 Handling Timing-Constraints |
|
|
104 | |
|
|
107 | |
|
5 Scalable Design-Time Scheduling |
|
|
109 | |
|
|
109 | |
|
|
110 | |
|
5.3 Thread Frame Decomposition |
|
|
114 | |
|
5.3.1 Problem Formulation |
|
|
115 | |
|
5.3.2 Decomposition Guidelines |
|
|
116 | |
|
5.3.3 Valid Decomposition |
|
|
120 | |
|
5.3.4 Thread Frame Decomposition Algorithm |
|
|
122 | |
|
5.4 Thread Partition Clustering |
|
|
123 | |
|
5.4.1 Identifying Thread Partition Clusters |
|
|
123 | |
|
5.4.2 Generating New Pareto Curves |
|
|
123 | |
|
5.5 Thread Partition Interleaving |
|
|
124 | |
|
|
124 | |
|
|
125 | |
|
5.5.3 Interleaving Technique |
|
|
125 | |
|
5.6 Experimental Results and Discussions |
|
|
128 | |
|
|
128 | |
|
5.6.2 Experiments with Random Thread Frames |
|
|
128 | |
|
5.7 Comparison with State of the Art |
|
|
130 | |
|
|
133 | |
|
6 Fast and Scalable Run-time Scheduling |
|
|
135 | |
|
6.1 Two-Phase Task Scheduling: Why and How |
|
|
135 | |
|
6.2 Run-Time Scheduling Algorithm |
|
|
141 | |
|
|
141 | |
|
6.2.2 Problem Formulation |
|
|
142 | |
|
|
143 | |
|
|
146 | |
|
6.3.1 Randomly Generated Test Cases |
|
|
146 | |
|
6.3.2 Real-Life Applications |
|
|
148 | |
|
|
150 | |
|
7 Handling of Multidimensional Pareto Curves |
|
|
151 | |
|
7.1 Overview of the Customized Run-Time Management |
|
|
152 | |
|
7.2 Problem Formulation of Run-Time Operating Point Selector |
|
|
155 | |
|
|
156 | |
|
7.4 MP-SoC Heuristic Description |
|
|
157 | |
|
7.4.1 Pareto Filtering Preprocessing |
|
|
158 | |
|
7.4.2 Multidimension Resource Reduction |
|
|
158 | |
|
7.4.3 Pareto Point Sorting |
|
|
158 | |
|
7.4.4 Greedy Algorithm to Solve MMKP |
|
|
159 | |
|
|
160 | |
|
|
162 | |
|
8 Run-Time Software Multithreading |
|
|
163 | |
|
8.1 Motivation of Run-Time Rescheduling |
|
|
164 | |
|
8.2 Run-Time Interleaving |
|
|
166 | |
|
|
167 | |
|
8.2.2 Interleaving Technique |
|
|
168 | |
|
8.3 Experimental Results and Discussion |
|
|
173 | |
|
8.4 Comparison with State of the Art |
|
|
174 | |
|
8.4.1 Pure Dynamic Scheduling |
|
|
174 | |
|
8.4.2 Hybrid Dynamic Scheduling |
|
|
175 | |
|
|
176 | |
|
9 Fast Source-level Performance Estimation |
|
|
177 | |
|
|
177 | |
|
|
179 | |
|
9.3 Comparison with State of the Art |
|
|
182 | |
|
9.4 Fundamentals of the Estimation Technique |
|
|
184 | |
|
9.4.1 Determining Single-Execution Costs |
|
|
185 | |
|
9.4.2 Determining Execution Counts |
|
|
189 | |
|
|
191 | |
|
|
193 | |
10 Handling of Task-Level Data Communication and Storage |
|
195 | |
|
|
196 | |
|
10.1.1 Multilayer Memory Architecture in a Multiprocessor Environment |
|
|
197 | |
|
10.1.2 Modeling Energy Consumption |
|
|
198 | |
|
10.2 Exploring Thread Node Level Data Reuse |
|
|
200 | |
|
10.3 Data Assignment on L1 Memory Layer |
|
|
201 | |
|
10.3.1 Access Conflicts Degrade Energy and Performance |
|
|
202 | |
|
10.3.2 Removing access conflicts with data assignment |
|
|
204 | |
|
10.3.3 Energy Aware Data Assignment |
|
|
207 | |
|
10.3.4 Evaluating SRAM Energy/Performance Trade-offs |
|
|
209 | |
|
10.4 Bandwidth Aware Scheduling |
|
|
212 | |
|
10.4.1 Bandwidth Aware Thread Node Scheduling |
|
|
213 | |
|
10.4.2 Experimental Results |
|
|
215 | |
|
10.5 Handling inter-TN and inter-TF Data Transfers |
|
|
219 | |
|
10.5.1 Handling inter-TN Accesses Within a Single TF |
|
|
219 | |
|
10.5.2 Inter-Thread Frame Transfers |
|
|
221 | |
|
|
223 | |
11 Demonstration on Heterogeneous Multiprocessor SoCs |
|
225 | |
|
11.1 Motivation for Heterogeneous Multiprocessor Platforms |
|
|
225 | |
|
11.2 Mapping Visual Texture Coding Decoder |
|
|
226 | |
|
11.2.1 Overview of VTC Decoder |
|
|
226 | |
|
11.2.2 Target Platform for VTC Decoder |
|
|
228 | |
|
11.2.3 TCM Methodology Illustrated on VTC Decoder |
|
|
229 | |
|
|
236 | |
12 Conclusions and future research work |
|
239 | |
A Input and output data of scheduling examples in Section 4.3.1 |
|
243 | |
References |
|
249 | |