Preface xi
Acknowledgments xiii
About This Book xv
About the Author xix
About the Cover Illustration xx

Part 1 Introduction and Overview 1
|
1 What is transfer learning? 3
|
1.1 Overview of representative NLP tasks 5
|
1.2 Understanding NLP in the context of AI 7
|
    Artificial intelligence (AI) 8
    Machine learning 8
    Natural language processing (NLP) 12
|
1.3 A brief history of NLP advances 14
    General overview 14
    Recent transfer learning advances 16
|
1.4 Transfer learning in computer vision 18
    General overview 18
    Pretrained ImageNet models 19
    Fine-tuning pretrained ImageNet models 20
|
1.5 Why is NLP transfer learning an exciting topic to study now? 21
|
2 Getting started with baselines: Data preprocessing 24
|
2.1 Preprocessing email spam classification example data 27
    Loading and visualizing the Enron corpus 28
    Loading and visualizing the fraudulent email corpus 30
    Converting the email text into numbers 34
|
2.2 Preprocessing movie sentiment classification example data 37
|
2.3 Generalized linear models 39
    Logistic regression 40
    Support vector machines (SVMs) 42
|
3 Getting started with baselines: Benchmarking and optimization 44
|
3.1 Decision-tree-based models 45
    Random forests (RFs) 45
    Gradient-boosting machines (GBMs) 46
|
3.2 Neural network models 50
    Embeddings from Language Models (ELMo) 51
    Bidirectional Encoder Representations from Transformers (BERT) 56
|
3.3 Optimizing performance 59
    Manual hyperparameter tuning 60
    Systematic hyperparameter tuning 61
Part 2 Shallow Transfer Learning and Deep Transfer Learning with Recurrent Neural Networks (RNNs) 65
|
4 Shallow transfer learning for NLP 67
|
4.1 Semisupervised learning with pretrained word embeddings 70
|
4.2 Semisupervised learning with higher-level representations 75
4.3 Multitask learning 76
    Problem setup and a shallow neural single-task baseline 78
    Dual-task experiment 80
4.4 Domain adaptation 81
|
5 Preprocessing data for recurrent neural network deep transfer learning experiments 86
|
5.1 Preprocessing tabular column-type classification data 89
    Obtaining and visualizing tabular data 90
    Preprocessing tabular data 93
    Encoding preprocessed data as numbers 95
|
5.2 Preprocessing fact-checking example data 96
    Special problem considerations 96
    Loading and visualizing fact-checking data 97
|
6 Deep transfer learning for NLP with recurrent neural networks 99
|
6.1 Semantic Inference for the Modeling of Ontologies (SIMOn) 100
    General neural architecture overview 101
    Modeling tabular data 102
    Application of SIMOn to tabular column-type classification data 102
|
6.2 Embeddings from Language Models (ELMo) 110
    ELMo bidirectional language modeling 111
    Application to fake news detection 112
|
6.3 Universal Language Model Fine-Tuning (ULMFiT) 114
    Target task language model fine-tuning 115
    Target task classifier fine-tuning 116
Part 3 Deep Transfer Learning with Transformers and Adaptation Strategies 119
|
7 Deep transfer learning for NLP with the transformer and GPT 121
7.1 The transformer 123
    An introduction to the transformers library and attention visualization 126
    Self-attention 128
    Residual connections, encoder-decoder attention, and positional encoding 132
    Application of pretrained encoder-decoder to translation 134
|
7.2 The Generative Pretrained Transformer 136
    Architecture overview 137
    Transformers pipelines introduction and application to text generation 140
    Application to chatbots 141
|
8 Deep transfer learning for NLP with BERT and multilingual BERT 145
|
8.1 Bidirectional Encoder Representations from Transformers (BERT) 146
    Model architecture 148
    Application to question answering 151
    Application to fill in the blanks and next-sentence prediction tasks 154
|
8.2 Cross-lingual learning with multilingual BERT (mBERT) 156
    Brief JW300 dataset overview 157
    Transfer mBERT to monolingual Twi data with the pretrained tokenizer 158
    mBERT and tokenizer trained from scratch on monolingual Twi data 160
|
9 ULMFiT and knowledge distillation adaptation strategies 162
|
9.1 Gradual unfreezing and discriminative fine-tuning 163
    Pretrained language model fine-tuning 165
    Target task classifier fine-tuning 168
|
9.2 Knowledge distillation 170
    Transfer DistilmBERT to monolingual Twi data with pretrained tokenizer 172
|
10 ALBERT, adapters, and multitask adaptation strategies 177
|
10.1 Embedding factorization and cross-layer parameter sharing 179
    Fine-tuning pretrained ALBERT on MDSD book reviews 180
|
10.2 Multitask fine-tuning 183
    General Language Understanding Evaluation (GLUE) 184
    Fine-tuning on a single GLUE task 186
    Sequential adaptation 188
10.3 Adapters 191
|
|
11 Conclusions 195
11.1 Overview of key concepts 196
|
11.2 Other emerging research trends 203
    RoBERTa 203
    GPT-3 203
    XLNet 205
    BigBird 206
    Longformer 206
    Reformer 206
    T5 207
    BART 208
    XLM 209
    TAPAS 209
|
11.3 Future of transfer learning in NLP 210
|
11.4 Ethical and environmental considerations 212
11.5 Staying up to date 214
    Kaggle and Zindi competitions 214
    arXiv 215
    News and social media (Twitter) 215
11.6 Final words 216
Appendix A Kaggle primer 218
Appendix B Introduction to fundamental deep learning tools 228

Index 237