Preface | v
Authors' Biographies | xiii
1 Introduction to Text Mining with Machine Learning | 1 | (12)
| 1 | (1)
1.2 Relation of Text Mining to Data Mining | 2 | (3)
1.3 The Text Mining Process | 5 | (1)
1.4 Machine Learning for Text Mining | 6 | (3)
1.4.1 Inductive Machine Learning | 8 | (1)
1.5 Three Fundamental Learning Directions | 9 | (2)
1.5.1 Supervised Machine Learning | 9 | (1)
1.5.2 Unsupervised Machine Learning | 9 | (1)
1.5.3 Semi-supervised Machine Learning | 10 | (1)
| 11 | (1)
| 11 | (2)
| 13 | (62)
| 14 | (1)
| 15 | (2)
| 17 | (2)
| 18 | (1)
| 19 | (1)
2.4 Writing and Executing Commands | 19 | (2)
2.5 Variables and Data Types | 21 | (1)
| 22 | (9)
| 25 | (1)
| 26 | (1)
| 27 | (1)
| 28 | (1)
| 29 | (2)
| 31 | (4)
| 35 | (1)
| 36 | (7)
| 36 | (2)
2.9.2 Naming Vector Elements | 38 | (2)
2.9.3 Operations with Vectors | 40 | (2)
2.9.4 Accessing Vector Elements | 42 | (1)
| 43 | (4)
| 47 | (2)
| 49 | (2)
| 51 | (4)
2.14 Functions Useful in Machine Learning | 55 | (6)
2.15 Flow Control Structures | 61 | (4)
2.15.1 Conditional Statement | 61 | (3)
| 64 | (1)
| 65 | (2)
2.16.1 Installing Packages | 66 | (1)
| 67 | (1)
| 67 | (8)
3 Structured Text Representations | 75 | (62)
| 75 | (4)
3.2 The Bag-of-Words Model | 79 | (1)
3.3 The Limitations of the Bag-of-Words Model | 80 | (3)
| 83 | (2)
| 85 | (5)
3.6 Texts in Different Encodings | 90 | (2)
3.7 Language Identification | 92 | (1)
| 92 | (1)
| 93 | (1)
3.10 Filtering Stop Words, Common, and Rare Terms | 94 | (4)
| 98 | (1)
| 99 | (5)
| 99 | (1)
3.12.2 Stemming and Lemmatization | 100 | (2)
3.12.3 Spelling Correction | 102 | (2)
| 104 | (5)
3.13.1 Part of Speech Tagging | 104 | (3)
| 107 | (2)
3.14 Calculating the Weights in the Bag-of-Words Model | 109 | (5)
| 109 | (1)
| 110 | (1)
3.14.3 Normalization Factor | 111 | (3)
3.15 Common Formats for Storing Structured Data | 114 | (9)
3.15.1 Attribute-Relation File Format (ARFF) | 114 | (1)
3.15.2 Comma-Separated Values (CSV) | 115 | (2)
| 117 | (4)
3.15.4 Matrix Files for CLUTO | 121 | (1)
| 121 | (1)
| 122 | (1)
| 123 | (14)
| 137 | (8)
| 137 | (3)
| 140 | (2)
4.3 Classifier Quality Measurement | 142 | (3)
| 145 | (18)
| 145 | (1)
| 146 | (2)
5.3 Optimal Bayes Classifier | 148 | (1)
5.4 Naive Bayes Classifier | 149 | (1)
5.5 Illustrative Example of Naive Bayes | 150 | (3)
5.6 Naive Bayes Classifier in R | 153 | (10)
5.6.1 Running Naive Bayes Classifier in RStudio | 154 | (2)
5.6.2 Testing with an External Dataset | 156 | (2)
5.6.3 Testing with 10-Fold Cross-Validation | 158 | (5)
| 163 | (10)
| 163 | (1)
6.2 Similarity as Distance | 164 | (2)
6.3 Illustrative Example of k-NN | 166 | (2)
| 168 | (5)
| 173 | (20)
| 173 | (1)
7.2 Entropy Minimization-Based c5 Algorithm | 174 | (7)
7.2.1 The Principle of Generating Trees | 174 | (4)
| 178 | (3)
7.3 C5 Tree Generator in R | 181 | (12)
| 181 | (3)
7.3.2 Information Acquired from C5-Tree | 184 | (3)
7.3.3 Using Testing Samples to Assess Tree Accuracy | 187 | (1)
7.3.4 Using Cross-Validation to Assess Tree Accuracy | 188 | (1)
7.3.5 Generating Decision Rules | 189 | (4)
| 193 | (8)
| 193 | (2)
| 193 | (2)
8.1.2 Stability and Robustness | 195 | (1)
8.1.3 Which Tree Algorithm? | 195 | (1)
| 195 | (6)
| 201 | (10)
| 201 | (1)
| 201 | (1)
| 202 | (2)
| 204 | (1)
| 205 | (6)
10 Support Vector Machines | 211 | (12)
| 211 | (2)
10.2 Support Vector Machines Principles | 213 | (4)
10.2.1 Finding Optimal Separation Hyperplane | 213 | (1)
10.2.2 Nonlinear Classification and Kernel Functions | 214 | (1)
10.2.3 Multiclass SVM Classification | 215 | (1)
| 216 | (1)
| 217 | (6)
| 223 | (12)
| 223 | (2)
11.2 Artificial Neural Networks | 225 | (2)
| 227 | (8)
| 235 | (52)
12.1 Introduction to Clustering | 235 | (1)
12.2 Difficulties of Clustering | 236 | (2)
| 238 | (4)
| 239 | (1)
12.3.2 Euclidean Distance | 240 | (1)
12.3.3 Manhattan Distance | 240 | (1)
12.3.4 Chebyshev Distance | 241 | (1)
12.3.5 Minkowski Distance | 241 | (1)
12.3.6 Jaccard Coefficient | 241 | (1)
12.4 Types of Clustering Algorithms | 242 | (4)
12.4.1 Partitional (Flat) Clustering | 242 | (1)
12.4.2 Hierarchical Clustering | 243 | (2)
12.4.3 Graph Based Clustering | 245 | (1)
12.5 Clustering Criterion Functions | 246 | (3)
12.5.1 Internal Criterion Functions | 247 | (1)
12.5.2 External Criterion Function | 248 | (1)
12.5.3 Hybrid Criterion Functions | 248 | (1)
12.5.4 Graph Based Criterion Functions | 248 | (1)
12.6 Deciding on the Number of Clusters | 249 | (2)
| 251 | (1)
| 252 | (1)
12.9 Criterion Function Optimization | 253 | (1)
12.10 Agglomerative Hierarchical Clustering | 253 | (4)
12.11 Scatter-Gather Algorithm | 257 | (2)
12.12 Divisive Hierarchical Clustering | 259 | (1)
12.13 Constrained Clustering | 260 | (1)
12.14 Evaluating Clustering Results | 261 | (9)
12.14.1 Metrics Based on Counting Pairs | 263 | (1)
| 264 | (1)
| 264 | (1)
| 265 | (1)
12.14.5 Normalized Mutual Information | 266 | (1)
| 267 | (2)
12.14.7 Evaluation Based on Expert Opinion | 269 | (1)
| 270 | (1)
| 271 | (16)
| 287 | (14)
| 287 | (2)
13.2 Determining the Context and Word Similarity | 289 | (2)
| 291 | (1)
13.4 Computing Word Embeddings | 291 | (3)
13.5 Aggregation of Word Vectors | 294 | (1)
| 295 | (6)
| 301 | (22)
| 301 | (2)
14.2 Feature Selection as State Space Search | 303 | (1)
14.3 Feature Selection Methods | 304 | (9)
| 306 | (1)
14.3.2 Mutual Information | 307 | (4)
| 311 | (2)
14.4 Term Elimination Based on Frequency | 313 | (1)
| 314 | (1)
| 315 | (1)
14.7 Entropy-Based Ranking | 315 | (1)
| 316 | (1)
| 316 | (7)
References | 323 | (24)
Index | 347