Preface |
|
vii | |
Editor |
|
xi | |
Contributors |
|
xiii | |
|
1 Algorithms for Bioinformatics |
|
|
1 | (28) |
|
|
|
1 | (1) |
|
1.2 Pairwise Sequence Alignment |
|
|
2 | (12) |
|
1.2.1 Definitions and Notations |
|
|
2 | (4) |
|
1.2.2 DP for Optimal Pairwise Alignment with Linear Gap Penalty Function |
|
|
6 | (4) |
|
1.2.3 DP for Optimal Pairwise Alignment with Affine Gap Penalty Fuction |
|
|
10 | (2) |
|
1.2.4 Computing Alignments in Linear Space Using Divide and Conquer |
|
|
12 | (2) |
|
1.3 Multiple Sequence Alignment |
|
|
14 | (8) |
|
|
14 | (4) |
|
1.3.2 Progressive Alignment |
|
|
18 | (4) |
|
1.4 Database Search and Exact Matching |
|
|
22 | (4) |
|
|
22 | (2) |
|
1.4.2 Suffix Trees and Suffix Arrays |
|
|
24 | (2) |
|
|
26 | (3) |
|
2 Introduction to GPGPUs and Massively Threaded Programming |
|
|
29 | (20) |
|
|
|
29 | (2) |
|
2.2 Massive Multithreading Is the Key |
|
|
31 | (4) |
|
2.3 CUDA Simplifies the Creation of Massively Threaded Software |
|
|
35 | (10) |
|
2.3.1 Step 1: Getting (and Keeping) the Data on the GPU |
|
|
38 | (1) |
|
2.3.2 Step 2: Maximizing the Amount of Work Performed per Call to the GPU |
|
|
39 | (2) |
|
2.3.3 Step 3: Exploiting Internal Resources on the GPU |
|
|
41 | (1) |
|
2.3.3.1 Register and Shared Memory |
|
|
42 | (1) |
|
|
43 | (1) |
|
|
43 | (1) |
|
|
44 | (1) |
|
|
45 | (1) |
|
|
45 | (1) |
|
|
46 | (1) |
|
|
47 | (2) |
|
3 FPGA: Architecture and Programming |
|
|
49 | (10) |
|
|
|
49 | (1) |
|
3.2 The Need for FPGA Computing |
|
|
50 | (2) |
|
3.3 FPGA Computing Architectures |
|
|
52 | (2) |
|
3.4 FPGA Development Tools |
|
|
54 | (2) |
|
|
56 | (1) |
|
|
56 | (3) |
|
4 Parallel Algorithms for Alignments on the Cell Be |
|
|
59 | (26) |
|
|
|
|
61 | (2) |
|
4.2 Sequence Alignments on the Cell Processor |
|
|
63 | (1) |
|
4.3 A Parallel Communication Scheme |
|
|
63 | (5) |
|
4.3.1 Tiling Scheme for Aligning Longer Sequences |
|
|
65 | (1) |
|
4.3.2 Computing the Optimal Alignment Score Using Tiling |
|
|
66 | (1) |
|
4.3.3 Computing an Optimal Alignment Using Tiling |
|
|
67 | (1) |
|
4.4 A Hybrid Parallel Algorithm |
|
|
68 | (8) |
|
4.4.1 Parallel Alignment Scheme Using Prefix Computations |
|
|
68 | (2) |
|
4.4.2 Problem Decomposition Using Wavefornt Scheme |
|
|
70 | (2) |
|
4.4.3 Subproblem Alignment Phase Using Hirschberg's Technique |
|
|
72 | (1) |
|
4.4.4 Further Optimizations: Vectorization and Memory Management |
|
|
73 | (1) |
|
|
74 | (1) |
|
4.4.6 Performance of the Hybrid Algorithm |
|
|
74 | (2) |
|
4.5 Algorithms for Specialized Alignments |
|
|
76 | (6) |
|
|
76 | (2) |
|
4.5.2 Performance of Parallel Spliced Alignment Algorithm |
|
|
78 | (1) |
|
4.5.3 Syntenic Alignments |
|
|
79 | (1) |
|
4.5.4 Performance of Parallel Syntenic Alignment Algorithm |
|
|
80 | (2) |
|
|
82 | (1) |
|
|
82 | (3) |
|
5 Orchestrating the Phylogenetic Likelihood Function on Emerging Parallel Architectures |
|
|
85 | (32) |
|
|
5.1 Phylogenetic Inference |
|
|
86 | (2) |
|
5.2 The Phylogenetic Likelihood Function |
|
|
88 | (9) |
|
5.2.1 Avoiding Numerical Underflow |
|
|
92 | (2) |
|
5.2.2 Memory Requirements |
|
|
94 | (1) |
|
5.2.3 Single or Double Precision? |
|
|
95 | (2) |
|
5.3 Parallelization Strategies |
|
|
97 | (10) |
|
5.3.1 Parallel Programming Paradigms |
|
|
98 | (1) |
|
5.3.2 General Fine-Grain Parallelization |
|
|
98 | (1) |
|
5.3.1 Parallel Programming Paradigms |
|
|
98 | (1) |
|
5.3.2 General Fine-Grain Parallelization |
|
|
98 | (3) |
|
5.3.2.1 A Library for the PLF |
|
|
101 | (1) |
|
5.3.2.2 Scalability Issues |
|
|
102 | (1) |
|
5.3.3 The Real World: Load Balance Issues |
|
|
103 | (4) |
|
5.4 Adaptations to Emerging Parallel Architectures |
|
|
107 | (3) |
|
|
110 | (2) |
|
|
112 | (5) |
|
6 Parallel Bioinformatics Algorithms for CUDA-Enabled GPUs |
|
|
117 | (22) |
|
|
|
|
|
117 | (1) |
|
6.2 Techniques for Many-Core GPUs |
|
|
118 | (3) |
|
6.2.1 Hybrid Computing Framework |
|
|
118 | (1) |
|
6.2.2 Intertask and Intratask Parallelization |
|
|
119 | (1) |
|
6.2.3 Coalesced Subject Sequence Arrangement |
|
|
120 | (1) |
|
6.2.4 Coalesced Global Memory Access |
|
|
120 | (1) |
|
6.2.5 Cell Block Division Method |
|
|
121 | (1) |
|
|
121 | (2) |
|
6.4 Multiple Sequence Alignment |
|
|
123 | (7) |
|
|
130 | (5) |
|
|
135 | (1) |
|
|
136 | (3) |
|
7 CUDA Error Correction Method for High-Throughput Short-Read Sequencing Data |
|
|
139 | (18) |
|
|
|
|
|
139 | (2) |
|
7.2 Spectral Alignment Approach to Error Correction |
|
|
141 | (2) |
|
7.3 Parallel Error Correction with CUDA |
|
|
143 | (6) |
|
7.3.1 Bloom Filter Data Structure and Spectrum Computation |
|
|
143 | (2) |
|
7.3.2 Parallel Error Correction Using CUDA |
|
|
145 | (2) |
|
|
147 | (2) |
|
7.4 Performance Evaluation |
|
|
149 | (5) |
|
7.5 Conclusion and Future Work |
|
|
154 | (1) |
|
|
155 | (2) |
|
8 FPGA Acceleration of Seeded Similarity Searching |
|
|
157 | (24) |
|
|
|
|
|
|
161 | (4) |
|
|
162 | (1) |
|
|
163 | (1) |
|
|
164 | (1) |
|
8.1.4 Execution Profile of the BLAST Algorithm |
|
|
164 | (1) |
|
8.2 A Streaming Hardware Architecture for BLAST |
|
|
165 | (9) |
|
|
166 | (1) |
|
8.2.1.1 Nucleotide Seed Generation Architecture |
|
|
166 | (2) |
|
8.2.1.2 Protein Seed Generation |
|
|
168 | (2) |
|
|
170 | (2) |
|
|
172 | (2) |
|
|
174 | (3) |
|
|
175 | (1) |
|
|
176 | (1) |
|
|
177 | (2) |
|
|
179 | (2) |
|
9 Seed-Based Parallel Protein Sequence Comparison Combining Multithreading, GPU, and FPGA Technologies |
|
|
181 | (22) |
|
|
|
|
181 | (3) |
|
9.2 Principles of the Algorithm |
|
|
184 | (6) |
|
|
184 | (2) |
|
|
186 | (2) |
|
|
188 | (1) |
|
|
188 | (1) |
|
9.2.5 Generic Hardware Implementation |
|
|
189 | (1) |
|
|
190 | (4) |
|
9.3.1 UNGAP Parallelization on GPU |
|
|
191 | (1) |
|
9.3.2 UNGAP Parallelization on FPGA |
|
|
191 | (2) |
|
9.3.3 SMALL GAP Parallelization on GPU |
|
|
193 | (1) |
|
9.4 Comparison of the GPU/FPGA Technologies |
|
|
194 | (3) |
|
|
194 | (1) |
|
|
194 | (1) |
|
9.4.3 Software and Dataset |
|
|
195 | (1) |
|
9.4.4 Comparison of the Execution Times |
|
|
195 | (1) |
|
|
196 | (1) |
|
9.4.6 FPGA Implementation |
|
|
197 | (1) |
|
|
197 | (6) |
|
|
200 | (3) |
|
10 Database Searching with Profile-Hidden Markov Models on Reconfigurable and Many-Core Architectures |
|
|
203 | (20) |
|
|
|
|
|
203 | (1) |
|
|
204 | (5) |
|
10.3 FPGA Parallelization and Results |
|
|
209 | (6) |
|
|
209 | (4) |
|
10.3.2 Performance Evaluation |
|
|
213 | (2) |
|
10.4 GPU Parallelization and Results |
|
|
215 | (5) |
|
|
216 | (1) |
|
|
216 | (1) |
|
10.4.2.1 Database Sorting |
|
|
217 | (1) |
|
10.4.2.2 Memory Layout Optimizations |
|
|
217 | (1) |
|
10.4.2.3 Memory Hierarchy Optimizations |
|
|
218 | (1) |
|
10.4.2.4 Host Optimizations |
|
|
219 | (1) |
|
|
220 | (1) |
|
|
221 | (2) |
|
11 Copacobana: A Massively Parallel FPGA-Based Computer Architecture |
|
|
223 | (40) |
|
|
|
|
|
|
224 | (2) |
|
11.1.1 History of Complexity |
|
|
224 | (1) |
|
11.1.2 Basic Idea of the Copacobana Series |
|
|
225 | (1) |
|
|
226 | (5) |
|
|
227 | (1) |
|
|
228 | (1) |
|
11.2.3 Interface Controller |
|
|
229 | (1) |
|
11.2.4 Application Development |
|
|
230 | (1) |
|
11.3 Cryptanalysis with Copacobana 1000 |
|
|
231 | (11) |
|
11.3.1 Previous Work on DES Breaking |
|
|
232 | (1) |
|
11.3.2 Exhaustive Key Search on DES |
|
|
233 | (2) |
|
11.3.3 Breaking DES-Based Crypto Tokens |
|
|
235 | (1) |
|
11.3.3.1 Basics of Token-Based Data Authentication |
|
|
235 | (2) |
|
11.3.3.2 Cryptanalysis of the ANSI X9.9-Based Challenge-Response Authentication |
|
|
237 | (1) |
|
11.3.3.3 Possible Attack Scenarios on Banking Systems |
|
|
238 | (1) |
|
11.3.3.4 Implementing the Token Attack on Copacobana |
|
|
239 | (3) |
|
|
242 | (6) |
|
11.4.1 Direction toward New Applications |
|
|
242 | (1) |
|
|
242 | (1) |
|
11.4.3 Architecture of Copacobana 5000 |
|
|
243 | (1) |
|
11.4.3.1 Bus Concept and Backplane |
|
|
243 | (1) |
|
|
244 | (2) |
|
11.4.3.3 Interface Controller |
|
|
246 | (1) |
|
11.4.3.4 Power Supply and Cooling Mechanism |
|
|
246 | (1) |
|
11.4.3.5 Application Development |
|
|
247 | (1) |
|
11.5 Applications in Bioinformatics |
|
|
248 | (11) |
|
11.5.1 Sequence Alignment |
|
|
249 | (1) |
|
11.5.1.1 Simth-Waterman Alignment |
|
|
249 | (1) |
|
11.5.1.2 Hardware Implementation |
|
|
250 | (1) |
|
11.5.1.3 Performance on Copacobana 5000 |
|
|
251 | (1) |
|
|
252 | (1) |
|
11.5.2.1 The BMA Alogrithm |
|
|
253 | (1) |
|
11.5.2.2 Implementation of BMA |
|
|
253 | (2) |
|
11.5.2.3 Parallelization of BMA in Hardware |
|
|
255 | (2) |
|
11.5.2.4 Performance Results of BMA |
|
|
257 | (2) |
|
|
259 | (1) |
|
|
259 | (4) |
|
12 Accelerating String Set Matching for Bioinformatics Using FPGA Hardware |
|
|
263 | (22) |
|
|
|
263 | (3) |
|
12.1.1 String Matching Approaches |
|
|
264 | (1) |
|
12.1.2 Use of the ACA in Computational Biology |
|
|
264 | (1) |
|
12.1.3 Use of FPGAs in Computational Biology |
|
|
265 | (1) |
|
12.1.4 Use of String Set Matching in FPGAs in Other Domains |
|
|
265 | (1) |
|
|
266 | (3) |
|
12.2.1 The Aho-Corasick Preprocessing Phase |
|
|
266 | (3) |
|
12.3 FPGA Implementation of the String Set Matching DFA |
|
|
269 | (9) |
|
12.3.1 Bit-Split DFA Architecture |
|
|
270 | (3) |
|
12.3.2 Implementing Bit-Split DFA Tables in FPGAs |
|
|
273 | (3) |
|
12.3.3 Analysis of DFA Storage Utilization Efficiency |
|
|
276 | (1) |
|
|
277 | (1) |
|
12.4.1 Storage Utilization |
|
|
277 | (1) |
|
12.4.2 Implementation Performance |
|
|
278 | (2) |
|
|
280 | (1) |
|
|
281 | (4) |
|
13 Reconfigurable Neural System and Its Application to Dimeric Protein Binding Site Identification |
|
|
285 | (28) |
|
|
|
285 | (2) |
|
13.2 Design of the Neural System |
|
|
287 | (10) |
|
13.2.1 Numerical Representation of DNA Sequences |
|
|
287 | (1) |
|
|
288 | (1) |
|
|
289 | (2) |
|
13.2.4 Adaptation of the HNN |
|
|
291 | (6) |
|
13.3 Reconfigurable DP-HNN |
|
|
297 | (5) |
|
13.3.1 Representation of Numerical Values and Operations on FPGA |
|
|
298 | (1) |
|
13.3.2 Control and Matching Units |
|
|
298 | (2) |
|
13.3.3 Neuron and Memory Units |
|
|
300 | (1) |
|
13.3.4 Operation of DP-HNN |
|
|
301 | (1) |
|
13.4 Application to Dimeric Protein Binding Site Identification |
|
|
302 | (5) |
|
13.4.1 The Biological Problem |
|
|
302 | (1) |
|
13.4.2 Dimeric Structure of HREs |
|
|
303 | (2) |
|
13.4.3 Two-Phase Neural System for HRE Prediction |
|
|
305 | (1) |
|
13.4.4 Performance of the Hardware-Accelerated System |
|
|
306 | (1) |
|
|
307 | (2) |
|
|
309 | (4) |
|
14 Parallel FPGA Search Engine for Protein Identification |
|
|
313 | (24) |
|
|
|
|
|
313 | (2) |
|
14.2 The Reconfigurable Computing Paradigm |
|
|
315 | (2) |
|
14.3 Protein Identification by Sequence Database Searching Using Mass Spectral Fingerprints |
|
|
317 | (5) |
|
14.3.1 Overview of the Approach |
|
|
317 | (1) |
|
14.3.2 Abstract Computational Model |
|
|
318 | (1) |
|
|
318 | (3) |
|
14.3.4 Protein Identification by Spectral Matching |
|
|
321 | (1) |
|
14.4 Reconfigurable Computing Platform |
|
|
322 | (3) |
|
14.5 Protein Sequence Database FPGA Search Engine |
|
|
325 | (6) |
|
|
325 | (1) |
|
14.5.2 Database Search Processor |
|
|
326 | (1) |
|
|
326 | (1) |
|
14.5.2.2 Variable Modifications |
|
|
327 | (1) |
|
|
328 | (3) |
|
14.6 Performance Evaluation |
|
|
331 | (2) |
|
|
333 | (4) |
Index |
|
337 | |