E-book: Text Mining with Machine Learning: Principles and Techniques

4.00/5 (2 ratings by Goodreads)

Arnošt Svoboda (Masaryk University, Czech Republic), František Dařena (Mendel University, Czech Republic), Jan Žižka

Format: 366 pages
Publication date: 31-Oct-2019
Publisher: CRC Press
ISBN-13: 9780429890260


Format: EPUB+DRM
Price: 57,60 €*
* This is the final price; no additional discounts apply.
This e-book is intended for personal use only. E-books cannot be returned, and payments for purchased e-books are not refunded.


DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital Rights Management (DRM)
The publisher has supplied this book in encrypted form, which means that you need to install free software to unlock and read it. To read this e-book, you must create an Adobe ID. The e-book can be downloaded and read on up to 6 devices (by a single user with the same Adobe ID).

Required software
To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android)

To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free application designed specifically for e-books. It is not the same as Adobe Reader, which you may already have on your computer.)

You cannot read this e-book on an Amazon Kindle.

This book provides a perspective on the application of machine learning-based methods to knowledge discovery from natural-language texts. Analysing various data sets reveals conclusions that are not normally evident and that can be used for various purposes and applications. The book explains the principles of time-proven machine learning algorithms applied in text mining, together with step-by-step demonstrations of how to reveal the semantic content of real-world datasets using the popular R language and its implemented machine learning algorithms. The book is aimed not only at IT specialists but also at a wider audience that needs to process big sets of text documents and has basic knowledge of the subject, e.g. e-mail service providers, online shoppers, librarians, etc.

The book starts with an introduction to text-based natural language data processing and its goals and problems. It focuses on machine learning, presenting various algorithms with their uses and possibilities, and reviews their strengths and weaknesses. Beginning with the initial data pre-processing, a reader can follow the steps provided in the R language, including the incorporation of various available plug-ins into the resulting software tool. A big advantage is that R also contains many libraries implementing machine learning algorithms, so a reader can concentrate on the principal target without needing to implement the details of the algorithms personally. To make sense of the results, the book also explains the algorithms, which supports the final evaluation and interpretation of the results. The examples are demonstrated using real-world data from commonly accessible Internet sources.

Preface
Authors' Biographies

1 Introduction to Text Mining with Machine Learning
1.1 Introduction
1.2 Relation of Text Mining to Data Mining
1.3 The Text Mining Process
1.4 Machine Learning for Text Mining
1.4.1 Inductive Machine Learning
1.5 Three Fundamental Learning Directions
1.5.1 Supervised Machine Learning
1.5.2 Unsupervised Machine Learning
1.5.3 Semi-supervised Machine Learning
1.6 Big Data
1.7 About This Book

2 Introduction to R
2.1 Installing R
2.2 Running R
2.3 RStudio
2.3.1 Projects
2.3.2 Getting Help
2.4 Writing and Executing Commands
2.5 Variables and Data Types
2.6 Objects in R
2.6.1 Assignment
2.6.2 Logical Values
2.6.3 Numbers
2.6.4 Character Strings
2.6.5 Special Values
2.7 Functions
2.8 Operators
2.9 Vectors
2.9.1 Creating Vectors
2.9.2 Naming Vector Elements
2.9.3 Operations with Vectors
2.9.4 Accessing Vector Elements
2.10 Matrices and Arrays
2.11 Lists
2.12 Factors
2.13 Data Frames
2.14 Functions Useful in Machine Learning
2.15 Flow Control Structures
2.15.1 Conditional Statement
2.15.2 Loops
2.16 Packages
2.16.1 Installing Packages
2.16.2 Loading Packages
2.17 Graphics

3 Structured Text Representations
3.1 Introduction
3.2 The Bag-of-Words Model
3.3 The Limitations of the Bag-of-Words Model
3.4 Document Features
3.5 Standardization
3.6 Texts in Different Encodings
3.7 Language Identification
3.8 Tokenization
3.9 Sentence Detection
3.10 Filtering Stop Words, Common, and Rare Terms
3.11 Removing Diacritics
3.12 Normalization
3.12.1 Case Folding
3.12.2 Stemming and Lemmatization
3.12.3 Spelling Correction
3.13 Annotation
3.13.1 Part of Speech Tagging
3.13.2 Parsing
3.14 Calculating the Weights in the Bag-of-Words Model
3.14.1 Local Weights
3.14.2 Global Weights
3.14.3 Normalization Factor
3.15 Common Formats for Storing Structured Data
3.15.1 Attribute-Relation File Format (ARFF)
3.15.2 Comma-Separated Values (CSV)
3.15.3 C5 Format
3.15.4 Matrix Files for CLUTO
3.15.5 SVMlight Format
3.15.6 Reading Data in R
3.16 A Complex Example

4 Classification
4.1 Sample Data
4.2 Selected Algorithms
4.3 Classifier Quality Measurement

5 Bayes Classifier
5.1 Introduction
5.2 Bayes' Theorem
5.3 Optimal Bayes Classifier
5.4 Naive Bayes Classifier
5.5 Illustrative Example of Naive Bayes
5.6 Naive Bayes Classifier in R
5.6.1 Running Naive Bayes Classifier in RStudio
5.6.2 Testing with an External Dataset
5.6.3 Testing with 10-Fold Cross-Validation

6 Nearest Neighbors
6.1 Introduction
6.2 Similarity as Distance
6.3 Illustrative Example of k-NN
6.4 k-NN in R

7 Decision Trees
7.1 Introduction
7.2 Entropy Minimization-Based C5 Algorithm
7.2.1 The Principle of Generating Trees
7.2.2 Pruning
7.3 C5 Tree Generator in R
7.3.1 Generating a Tree
7.3.2 Information Acquired from C5-Tree
7.3.3 Using Testing Samples to Assess Tree Accuracy
7.3.4 Using Cross-Validation to Assess Tree Accuracy
7.3.5 Generating Decision Rules

8 Random Forest
8.1 Introduction
8.1.1 Bootstrap
8.1.2 Stability and Robustness
8.1.3 Which Tree Algorithm?
8.2 Random Forest in R

9 Adaboost
9.1 Introduction
9.2 Boosting Principle
9.3 Adaboost Principle
9.4 Weak Learners
9.5 Adaboost in R

10 Support Vector Machines
10.1 Introduction
10.2 Support Vector Machines Principles
10.2.1 Finding Optimal Separation Hyperplane
10.2.2 Nonlinear Classification and Kernel Functions
10.2.3 Multiclass SVM Classification
10.2.4 SVM Summary
10.3 SVM in R

11 Deep Learning
11.1 Introduction
11.2 Artificial Neural Networks
11.3 Deep Learning in R

12 Clustering
12.1 Introduction to Clustering
12.2 Difficulties of Clustering
12.3 Similarity Measures
12.3.1 Cosine Similarity
12.3.2 Euclidean Distance
12.3.3 Manhattan Distance
12.3.4 Chebyshev Distance
12.3.5 Minkowski Distance
12.3.6 Jaccard Coefficient
12.4 Types of Clustering Algorithms
12.4.1 Partitional (Flat) Clustering
12.4.2 Hierarchical Clustering
12.4.3 Graph Based Clustering
12.5 Clustering Criterion Functions
12.5.1 Internal Criterion Functions
12.5.2 External Criterion Function
12.5.3 Hybrid Criterion Functions
12.5.4 Graph Based Criterion Functions
12.6 Deciding on the Number of Clusters
12.7 K-Means
12.8 K-Medoids
12.9 Criterion Function Optimization
12.10 Agglomerative Hierarchical Clustering
12.11 Scatter-Gather Algorithm
12.12 Divisive Hierarchical Clustering
12.13 Constrained Clustering
12.14 Evaluating Clustering Results
12.14.1 Metrics Based on Counting Pairs
12.14.2 Purity
12.14.3 Entropy
12.14.4 F-Measure
12.14.5 Normalized Mutual Information
12.14.6 Silhouette
12.14.7 Evaluation Based on Expert Opinion
12.15 Cluster Labeling
12.16 A Few Examples

13 Word Embeddings
13.1 Introduction
13.2 Determining the Context and Word Similarity
13.3 Context Windows
13.4 Computing Word Embeddings
13.5 Aggregation of Word Vectors
13.6 An Example

14 Feature Selection
14.1 Introduction
14.2 Feature Selection as State Space Search
14.3 Feature Selection Methods
14.3.1 Chi Squared (χ²)
14.3.2 Mutual Information
14.3.3 Information Gain
14.4 Term Elimination Based on Frequency
14.5 Term Strength
14.6 Term Contribution
14.7 Entropy-Based Ranking
14.8 Term Variance
14.9 An Example

References
Index

Jan Žižka is a consultant in machine learning and data mining. He has worked as a system programmer, developer of advanced software systems, and researcher. For the last 25 years, he has devoted himself to AI and machine learning, especially text mining. He has been a faculty member at a number of universities and research institutes. He has authored approximately 100 international publications.

František Dařena is an associate professor and the head of the Text Mining and NLP group at the Department of Informatics, Mendel University, Brno. He has published numerous articles in international scientific journals, conference proceedings, and monographs, and is a member of the editorial boards of several international journals. His research includes text/data mining, intelligent data processing, and machine learning.

Arnošt Svoboda is an expert programmer. His specialties include programming languages and systems such as R, Assembler, Matlab, PL/1, Cobol, Fortran, Pascal, and others. He started as a system programmer. For the last 20 years, Arnošt has also worked as a teacher and researcher at Masaryk University in Brno. His current interests are machine learning and data mining.


Permanent link: https://www.kriso.lv/db/97804298902606e.html
