Knowledge Discovery from Data Streams [Hardback]

4.25/5 (7 ratings by Goodreads)

Joao Gama (University of Porto, Portugal)

Format: Hardback, 258 pages, height x width: 234x156 mm, weight: 476 g, 11 tables (black and white), 62 illustrations (black and white)
Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
Publication date: 25-May-2010
Publisher: Chapman & Hall/CRC
ISBN-10: 1439826110
ISBN-13: 9781439826119

Other books on this topic:

Algorithms & data structures

Hardback
Price: €119.73
Delivery takes 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may be delayed.
Delivery time: 4-6 weeks

Permanent link: https://www.kriso.lv/db/9781439826119.html

Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams.

The book covers the fundamentals essential to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several future challenges for data mining, at a time when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets.

This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they are also valid for other areas of machine learning and data mining.

Reviews

"...this book is the first authored text (that is, not an edited collection) about the area... The book covers a lot of ground in just 200 pages, including discussion of relatively advanced methods such as wavelets, bagging, boosting, dynamic time warping, and symbolic representation of time series. There is also, I was pleased to see, a chapter on evaluating streaming algorithms... Evaluation, in general, deserves more attention than it generally receives, so I was delighted to see the focus on it here... a good introduction to an area of data analysis which is going to be very important indeed." (David J. Hand, International Statistical Review, 2012)

"Gama is one of the leading investigators in the hottest research topic in machine learning and data mining: data streams. This book is the first book to didactically cover in a clear, comprehensive and mathematically rigorous way the main machine learning related aspects of this relevant research field... an up-to-date, broad and useful source of reference for all those interested in knowledge acquisition by learning techniques." (From the Foreword by André Ponce de Leon Ferreira de Carvalho, University of São Paulo, Brazil)

Table of Contents
List of Tables
List of Figures
List of Algorithms
Foreword
Acknowledgments

1 Knowledge Discovery from Data Streams
1.1 Introduction
1.2 An Illustrative Example
1.3 A World in Movement
1.4 Data Mining and Data Streams

2 Introduction to Data Streams
2.1 Data Stream Models
2.1.1 Research Issues in Data Stream Management Systems
2.1.2 An Illustrative Problem
2.2 Basic Streaming Methods
2.2.1 Illustrative Examples
2.2.1.1 Counting the Number of Occurrences of the Elements in a Stream
2.2.1.2 Counting the Number of Distinct Values in a Stream
2.2.2 Bounds of Random Variables
2.2.3 Poisson Processes
2.2.4 Maintaining Simple Statistics from Data Streams
2.2.5 Sliding Windows
2.2.5.1 Computing Statistics over Sliding Windows: The ADWIN Algorithm
2.2.6 Data Synopsis
2.2.6.1 Sampling
2.2.6.2 Synopsis and Histograms
2.2.6.3 Wavelets
2.2.6.4 Discrete Fourier Transform
2.3 Illustrative Applications
2.3.1 A Data Warehouse Problem: Hot-Lists
2.3.2 Computing the Entropy in a Stream
2.3.3 Monitoring Correlations Between Data Streams
2.3.4 Monitoring Threshold Functions over Distributed Data Streams
2.4 Notes

3 Change Detection
3.1 Introduction
3.2 Tracking Drifting Concepts
3.2.1 The Nature of Change
3.2.2 Characterization of Drift Detection Methods
3.2.2.1 Data Management
3.2.2.2 Detection Methods
3.2.2.3 Adaptation Methods
3.2.2.4 Decision Model Management
3.2.3 A Note on Evaluating Change Detection Methods
3.3 Monitoring the Learning Process
3.3.1 Drift Detection Using Statistical Process Control
3.3.2 An Illustrative Example
3.4 Final Remarks
3.5 Notes

4 Maintaining Histograms from Data Streams
4.1 Introduction
4.2 Histograms from Data Streams
4.2.1 K-buckets Histograms
4.2.2 Exponential Histograms
4.2.2.1 An Illustrative Example
4.2.2.2 Discussion
4.3 The Partition Incremental Discretization Algorithm - PiD
4.3.1 Analysis of the Algorithm
4.3.2 Change Detection in Histograms
4.3.3 An Illustrative Example
4.4 Applications to Data Mining
4.4.1 Applying PiD in Supervised Learning
4.4.2 Time-Changing Environments
4.5 Notes

5 Evaluating Streaming Algorithms
5.1 Introduction
5.2 Learning from Data Streams
5.3 Evaluation Issues
5.3.1 Design of Evaluation Experiments
5.3.2 Evaluation Metrics
5.3.2.1 Error Estimators Using a Single Algorithm and a Single Dataset
5.3.2.2 An Illustrative Example
5.3.3 Comparative Assessment
5.3.3.1 The 0-1 Loss Function
5.3.3.2 Illustrative Example
5.3.4 Evaluation Methodology in Non-Stationary Environments
5.3.4.1 The Page-Hinkley Algorithm
5.3.4.2 Illustrative Example
5.4 Lessons Learned and Open Issues
5.5 Notes

6 Clustering from Data Streams
6.1 Introduction
6.2 Clustering Examples
6.2.1 Basic Concepts
6.2.2 Partitioning Clustering
6.2.2.1 The Leader Algorithm
6.2.2.2 Single Pass k-Means
6.2.3 Hierarchical Clustering
6.2.4 Micro Clustering
6.2.4.1 Discussion
6.2.4.2 Monitoring Cluster Evolution
6.2.5 Grid Clustering
6.2.5.1 Computing the Fractal Dimension
6.2.5.2 Fractal Clustering
6.3 Clustering Variables
6.3.1 A Hierarchical Approach
6.3.1.1 Growing the Hierarchy
6.3.1.2 Aggregating at Concept Drift Detection
6.3.1.3 Analysis of the Algorithm
6.4 Notes

7 Frequent Pattern Mining
7.1 Introduction to Frequent Itemset Mining
7.1.1 The Search Space
7.1.2 The FP-growth Algorithm
7.1.3 Summarizing Itemsets
7.2 Heavy Hitters
7.3 Mining Frequent Itemsets from Data Streams
7.3.1 Landmark Windows
7.3.1.1 The LossyCounting Algorithm
7.3.1.2 Frequent Itemsets Using LossyCounting
7.3.2 Mining Recent Frequent Itemsets
7.3.2.1 Maintaining Frequent Itemsets in Sliding Windows
7.3.2.2 Mining Closed Frequent Itemsets over Sliding Windows
7.3.3 Frequent Itemsets at Multiple Time Granularities
7.4 Sequence Pattern Mining
7.4.1 Reservoir Sampling for Sequential Pattern Mining over Data Streams
7.5 Notes

8 Decision Trees from Data Streams
8.1 Introduction
8.2 The Very Fast Decision Tree Algorithm
8.2.1 VFDT - The Base Algorithm
8.2.2 Analysis of the VFDT Algorithm
8.3 Extensions to the Basic Algorithm
8.3.1 Processing Continuous Attributes
8.3.1.1 Exhaustive Search
8.3.1.2 Discriminant Analysis
8.3.2 Functional Tree Leaves
8.3.3 Concept Drift
8.3.3.1 Detecting Changes
8.3.3.2 Reacting to Changes
8.3.4 Final Comments
8.4 OLIN: Info-Fuzzy Algorithms
8.5 Notes

9 Novelty Detection in Data Streams
9.1 Introduction
9.2 Learning and Novelty
9.2.1 Desiderata for Novelty Detection
9.3 Novelty Detection as a One-Class Classification Problem
9.3.1 Autoassociator Networks
9.3.2 The Positive Naive-Bayes
9.3.3 Decision Trees for One-Class Classification
9.3.4 The One-Class SVM
9.3.5 Evaluation of One-Class Classification Algorithms
9.4 Learning New Concepts
9.4.1 Approaches Based on Extreme Values
9.4.2 Approaches Based on the Decision Structure
9.4.3 Approaches Based on Frequency
9.4.4 Approaches Based on Distances
9.5 The Online Novelty and Drift Detection Algorithm
9.5.1 Initial Learning Phase
9.5.2 Continuous Unsupervised Learning Phase
9.5.2.1 Identifying Novel Concepts
9.5.2.2 Attempting to Determine the Nature of New Concepts
9.5.2.3 Merging Similar Concepts
9.5.2.4 Automatically Adapting the Number of Clusters
9.5.3 Computational Cost
9.6 Notes

10 Ensembles of Classifiers
10.1 Introduction
10.2 Linear Combination of Ensembles
10.3 Sampling from a Training Set
10.3.1 Online Bagging
10.3.2 Online Boosting
10.4 Ensembles of Trees
10.4.1 Option Trees
10.4.2 Forest of Trees
10.4.2.1 Generating Forest of Trees
10.4.2.2 Classifying Test Examples
10.5 Adapting to Drift Using Ensembles of Classifiers
10.6 Mining Skewed Data Streams with Ensembles
10.7 Notes

11 Time Series Data Streams
11.1 Introduction to Time Series Analysis
11.1.1 Trend
11.1.2 Seasonality
11.1.3 Stationarity
11.2 Time-Series Prediction
11.2.1 The Kalman Filter
11.2.2 Least Mean Squares
11.2.3 Neural Nets and Data Streams
11.2.3.1 Stochastic Sequential Learning of Neural Networks
11.2.3.2 Illustrative Example: Load Forecast in Data Streams
11.3 Similarity between Time-Series
11.3.1 Euclidean Distance
11.3.2 Dynamic Time-Warping
11.4 Symbolic Approximation - SAX
11.4.1 The SAX Transform
11.4.1.1 Piecewise Aggregate Approximation (PAA)
11.4.1.2 Symbolic Discretization
11.4.1.3 Distance Measure
11.4.1.4 Discussion
11.4.2 Finding Motifs Using SAX
11.4.3 Finding Discords Using SAX
11.5 Notes

12 Ubiquitous Data Mining
12.1 Introduction to Ubiquitous Data Mining
12.2 Distributed Data Stream Monitoring
12.2.1 Distributed Computing of Linear Functions
12.2.1.1 A General Algorithm for Computing Linear Functions
12.2.2 Computing Sparse Correlation Matrices Efficiently
12.2.2.1 Monitoring Sparse Correlation Matrices
12.2.2.2 Detecting Significant Correlations
12.2.2.3 Dealing with Data Streams
12.3 Distributed Clustering
12.3.1 Conquering the Divide
12.3.1.1 Furthest Point Clustering
12.3.1.2 The Parallel Guessing Clustering
12.3.2 DGClust - Distributed Grid Clustering
12.3.2.1 Local Adaptive Grid
12.3.2.2 Frequent State Monitoring
12.3.2.3 Centralized Online Clustering
12.4 Algorithm Granularity
12.4.1 Algorithm Granularity Overview
12.4.2 Formalization of Algorithm Granularity
12.4.2.1 Algorithm Granularity Procedure
12.4.2.2 Algorithm Output Granularity
12.5 Notes

13 Final Comments
13.1 The Next Generation of Knowledge Discovery
13.1.1 Mining Spatial Data
13.1.2 The Time Situation of Data
13.1.3 Structured Data
13.2 Where We Want to Go

Appendix A Resources
A.1 Software
A.2 Datasets
Bibliography
Index

João Gama is an associate professor and senior researcher in the Laboratory of Artificial Intelligence and Decision Support (LIAAD) at the University of Porto in Portugal.
