Atjaunināt sīkdatņu piekrišanu

Knowledge Discovery from Data Streams [Hardback]

(University of Porto, Portugal)
  • Formāts: Hardback, 258 pages, height x width: 234x156 mm, weight: 476 g, 11 Tables, black and white; 62 Illustrations, black and white
  • Sērija : Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
  • Izdošanas datums: 25-May-2010
  • Izdevniecība: Chapman & Hall/CRC
  • ISBN-10: 1439826110
  • ISBN-13: 9781439826119
Citas grāmatas par šo tēmu:
  • Hardback
  • Cena: 119,73 €
  • Grāmatu piegādes laiks ir 3-4 nedēļas, ja grāmata ir uz vietas izdevniecības noliktavā. Ja izdevējam nepieciešams publicēt jaunu tirāžu, grāmatas piegāde var aizkavēties.
  • Daudzums:
  • Ielikt grozā
  • Piegādes laiks - 4-6 nedēļas
  • Pievienot vēlmju sarakstam
  • Bibliotēkām
  • Formāts: Hardback, 258 pages, height x width: 234x156 mm, weight: 476 g, 11 Tables, black and white; 62 Illustrations, black and white
  • Sērija : Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
  • Izdošanas datums: 25-May-2010
  • Izdevniecība: Chapman & Hall/CRC
  • ISBN-10: 1439826110
  • ISBN-13: 9781439826119
Citas grāmatas par šo tēmu:
Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams.

The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges of data mining in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets.

This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they also are valid for different areas of machine learning and data mining.

Recenzijas

this book is the first authored text (that is, not an edited collection) about the area The book covers a lot of ground in just 200 pages, including discussion of relatively advanced methods such as wavelets, bagging, boosting, dynamic time warping, and symbolic representation of time series. There is also, I was pleased to see, a chapter on evaluating streaming algorithms . Evaluation, in general, deserves more attention than it generally receives, so I was delighted to see the focus on it here. a good introduction to an area of data analysis which is going to be very important indeed. David J. Hand, International Statistical Review, 2012

Gama is one of the leading investigators in the hottest research topic in machine learning and data mining: data streams. This book is the first book to didactically cover in a clear, comprehensive and mathematically rigorous way the main machine learning related aspects of this relevant research field. an up-to-date, broad and useful source of reference for all those interested in knowledge acquisition by learning techniques. From the Foreword by André Ponce de Leon Ferreira de Carvalho, University of Sćo Paulo, Brazil

List of Tables
xi
List of Figures
xiii
List of Algorithms
xv
Foreword xvii
Acknowledgments xix
1 Knowledge Discovery from Data Streams
1(6)
1.1 Introduction
1(1)
1.2 An Illustrative Example
2(2)
1.3 A World in Movement
4(1)
1.4 Data Mining and Data Streams
5(2)
2 Introduction to Data Streams
7(26)
2.1 Data Stream Models
7(2)
2.1.1 Research Issues in Data Stream Management Systems
8(1)
2.1.2 An Illustrative Problem
8(1)
2.2 Basic Streaming Methods
9(14)
2.2.1 Illustrative Examples
10(1)
2.2.1.1 Counting the Number of Occurrences of the Elements in a Stream
10(1)
2.2.1.2 Counting the Number of Distinct Values in a Stream
11(1)
2.2.2 Bounds of Random Variables
11(2)
2.2.3 Poisson Processes
13(1)
2.2.4 Maintaining Simple Statistics from Data Streams
14(1)
2.2.5 Sliding Windows
14(2)
2.2.5.1 Computing Statistics over Sliding Windows: The ADWIN Algorithm
16(3)
2.2.6 Data Synopsis
19(1)
2.2.6.1 Sampling
19(1)
2.2.6.2 Synopsis and Histograms
20(1)
2.2.6.3 Wavelets
21(1)
2.2.6.4 Discrete Fourier Transform
22(1)
2.3 Illustrative Applications
23(7)
2.3.1 A Data Warehouse Problem: Hot-Lists
23(1)
2.3.2 Computing the Entropy in a Stream
24(3)
2.3.3 Monitoring Correlations Between Data Streams
27(2)
2.3.4 Monitoring Threshold Functions over Distributed Data Streams
29(1)
2.4 Notes
30(3)
3 Change Detection
33(16)
3.1 Introduction
33(1)
3.2 Tracking Drifting Concepts
34(8)
3.2.1 The Nature of Change
35(1)
3.2.2 Characterization of Drift Detection Methods
36(1)
3.2.2.1 Data Management
37(1)
3.2.2.2 Detection Methods
38(2)
3.2.2.3 Adaptation Methods
40(1)
3.2.2.4 Decision Model Management
41(1)
3.2.3 A Note on Evaluating Change Detection Methods
41(1)
3.3 Monitoring the Learning Process
42(4)
3.3.1 Drift Detection Using Statistical Process Control
42(3)
3.3.2 An Illustrative Example
45(1)
3.4 Final Remarks
46(1)
3.5 Notes
47(2)
4 Maintaining Histograms from Data Streams
49(14)
4.1 Introduction
49(1)
4.2 Histograms from Data Streams
50(3)
4.2.1 K-buckets Histograms
50(1)
4.2.2 Exponential Histograms
51(1)
4.2.2.1 An Illustrative Example
52(1)
4.2.2.2 Discussion
52(1)
4.3 The Partition Incremental Discretization Algorithm - PiD
53(6)
4.3.1 Analysis of the Algorithm
56(1)
4.3.2 Change Detection in Histograms
56(1)
4.3.3 An Illustrative Example
57(2)
4.4 Applications to Data Mining
59(3)
4.4.1 Applying PiD in Supervised Learning
59(2)
4.4.2 Time-Changing Environments
61(1)
4.5 Notes
62(1)
5 Evaluating Streaming Algorithms
63(16)
5.1 Introduction
63(1)
5.2 Learning from Data Streams
64(1)
5.3 Evaluation Issues
65(10)
5.3.1 Design of Evaluation Experiments
66(1)
5.3.2 Evaluation Metrics
67(1)
5.3.2.1 Error Estimators Using a Single Algorithm and a Single Dataset
68(1)
5.3.2.2 An Illustrative Example
68(1)
5.3.3 Comparative Assessment
69(1)
5.3.3.1 The 0-1 Loss Function
70(1)
5.3.3.2 Illustrative Example
71(1)
5.3.4 Evaluation Methodology in Non-Stationary Environments
72(1)
5.3.4.1 The Page-Hinkley Algorithm
72(1)
5.3.4.2 Illustrative Example
73(2)
5.4 Lessons Learned and Open Issues
75(2)
5.5 Notes
77(2)
6 Clustering from Data Streams
79(18)
6.1 Introduction
79(1)
6.2 Clustering Examples
80(10)
6.2.1 Basic Concepts
80(2)
6.2.2 Partitioning Clustering
82(1)
6.2.2.1 The Leader Algorithm
82(1)
6.2.2.2 Single Pass k-Means
82(1)
6.2.3 Hierarchical Clustering
83(2)
6.2.4 Micro Clustering
85(1)
6.2.4.1 Discussion
86(1)
6.2.4.2 Monitoring Cluster Evolution
86(1)
6.2.5 Grid Clustering
87(1)
6.2.5.1 Computing the Fractal Dimension
88(1)
6.2.5.2 Fractal Clustering
88(2)
6.3 Clustering Variables
90(6)
6.3.1 A Hierarchical Approach
91(1)
6.3.1.1 Growing the Hierarchy
91(3)
6.3.1.2 Aggregating at Concept Drift Detection
94(2)
6.3.1.3 Analysis of the Algorithm
96(1)
6.4 Notes
96(1)
7 Frequent Pattern Mining
97(18)
7.1 Introduction to Frequent Itemset Mining
97(4)
7.1.1 The Search Space
98(2)
7.1.2 The FP-growth Algorithm
100(1)
7.1.3 Summarizing Itemsets
100(1)
7.2 Heavy Hitters
101(2)
7.3 Mining Frequent Itemsets from Data Streams
103(7)
7.3.1 Landmark Windows
104(1)
7.3.1.1 The LossyCounting Algorithm
104(1)
7.3.1.2 Frequent Itemsets Using LossyCounting
104(1)
7.3.2 Mining Recent Frequent Itemsets
105(1)
7.3.2.1 Maintaining Frequent Itemsets in Sliding Windows
105(1)
7.3.2.2 Mining Closed Frequent Itemsets over Sliding Windows
106(2)
7.3.3 Frequent Itemsets at Multiple Time Granularities
108(2)
7.4 Sequence Pattern Mining
110(3)
7.4.1 Reservoir Sampling for Sequential Pattern Mining over Data Streams
111(2)
7.5 Notes
113(2)
8 Decision Trees from Data Streams
115(18)
8.1 Introduction
115(1)
8.2 The Very Fast Decision Tree Algorithm
116(3)
8.2.1 VFDT ---The Base Algorithm
116(2)
8.2.2 Analysis of the VFDT Algorithm
118(1)
8.3 Extensions to the Basic Algorithm
119(10)
8.3.1 Processing Continuous Attributes
119(1)
8.3.1.1 Exhaustive Search
119(2)
8.3.1.2 Discriminant Analysis
121(2)
8.3.2 Functional Tree Leaves
123(1)
8.3.3 Concept Drift
124(2)
8.3.3.1 Detecting Changes
126(1)
8.3.3.2 Reacting to Changes
127(1)
8.3.4 Final Comments
128(1)
8.4 OLIN: Info-Fuzzy Algorithms
129(3)
8.5 Notes
132(1)
9 Novelty Detection in Data Streams
133(20)
9.1 Introduction
133(1)
9.2 Learning and Novelty
134(1)
9.2.1 Desiderata for Novelty Detection
135(1)
9.3 Novelty Detection as a One-Class Classification Problem
135(6)
9.3.1 Autoassociator Networks
136(1)
9.3.2 The Positive Naive-Bayes
137(1)
9.3.3 Decision Trees for One-Class Classification
138(1)
9.3.4 The One-Class SVM
138(1)
9.3.5 Evaluation of One-Class Classification Algorithms
139(2)
9.4 Learning New Concepts
141(3)
9.4.1 Approaches Based on Extreme Values
141(1)
9.4.2 Approaches Based on the Decision Structure
142(1)
9.4.3 Approaches Based on Frequency
143(1)
9.4.4 Approaches Based on Distances
144(1)
9.5 The Online Novelty and Drift Detection Algorithm
144(7)
9.5.1 Initial Learning Phase
145(1)
9.5.2 Continuous Unsupervised Learning Phase
146(1)
9.5.2.1 Identifying Novel Concepts
147(2)
9.5.2.2 Attempting to Determine the Nature of New Concepts
149(1)
9.5.2.3 Merging Similar Concepts
149(1)
9.5.2.4 Automatically Adapting the Number of Clusters
150(1)
9.5.3 Computational Cost
150(1)
9.6 Notes
151(2)
10 Ensembles of Classifiers
153(14)
10.1 Introduction
153(2)
10.2 Linear Combination of Ensembles
155(1)
10.3 Sampling from a Training Set
156(4)
10.3.1 Online Bagging
157(1)
10.3.2 Online Boosting
158(2)
10.4 Ensembles of Trees
160(2)
10.4.1 Option Trees
160(1)
10.4.2 Forest of Trees
161(1)
10.4.2.1 Generating Forest of Trees
162(1)
10.4.2.2 Classifying Test Examples
162(1)
10.5 Adapting to Drift Using Ensembles of Classifiers
162(3)
10.6 Mining Skewed Data Streams with Ensembles
165(1)
10.7 Notes
166(1)
11 Time Series Data Streams
167(18)
11.1 Introduction to Time Series Analysis
167(2)
11.1.1 Trend
167(2)
11.1.2 Seasonality
169(1)
11.1.3 Stationarity
169(1)
11.2 Time-Series Prediction
169(8)
11.2.1 The Kalman Filter
170(3)
11.2.2 Least Mean Squares
173(1)
11.2.3 Neural Nets and Data Streams
173(1)
11.2.3.1 Stochastic Sequential Learning of Neural Networks
174(1)
11.2.3.2 Illustrative Example: Load Forecast in Data Streams
175(2)
11.3 Similarity between Time-Series
177(3)
11.3.1 Euclidean Distance
177(1)
11.3.2 Dynamic Time-Warping
178(2)
11.4 Symbolic Approximation - SAX
180(4)
11.4.1 The SAX Transform
180(1)
11.4.1.1 Piecewise Aggregate Approximation (PAA)
181(1)
11.4.1.2 Symbolic Discretization
181(1)
11.4.1.3 Distance Measure
182(1)
11.4.1.4 Discussion
182(1)
11.4.2 Finding Motifs Using SAX
183(1)
11.4.3 Finding Discords Using SAX
183(1)
11.5 Notes
184(1)
12 Ubiquitous Data Mining
185(20)
12.1 Introduction to Ubiquitous Data Mining
185(1)
12.2 Distributed Data Stream Monitoring
186(7)
12.2.1 Distributed Computing of Linear Functions
187(1)
12.2.1.1 A General Algorithm for Computing Linear Functions
188(1)
12.2.2 Computing Sparse Correlation Matrices Efficiently
189(2)
12.2.2.1 Monitoring Sparse Correlation Matrices
191(1)
12.2.2.2 Detecting Significant Correlations
192(1)
12.2.2.3 Dealing with Data Streams
192(1)
12.3 Distributed Clustering
193(4)
12.3.1 Conquering the Divide
193(1)
12.3.1.1 Furthest Point Clustering
193(1)
12.3.1.2 The Parallel Guessing Clustering
193(1)
12.3.2 DGClust - Distributed Grid Clustering
194(1)
12.3.2.1 Local Adaptive Grid
194(1)
12.3.2.2 Frequent State Monitoring
195(1)
12.3.2.3 Centralized Online Clustering
196(1)
12.4 Algorithm Granularity
197(6)
12.4.1 Algorithm Granularity Overview
199(1)
12.4.2 Formalization of Algorithm Granularity
200(1)
12.4.2.1 Algorithm Granularity Procedure
200(1)
12.4.2.2 Algorithm Output Granularity
201(2)
12.5 Notes
203(2)
13 Final Comments
205(4)
13.1 The Next Generation of Knowledge Discovery
205(1)
13.1.1 Mining Spatial Data
206(1)
13.1.2 The Time Situation of Data
206(1)
13.1.3 Structured Data
206(1)
13.2 Where We Want to Go
206(3)
Appendix A Resources
209(2)
A.1 Software
209(1)
A.2 Datasets
209(2)
Bibliography 211(24)
Index 235
Joćo Gama is an associate professor and senior researcher in the Laboratory of Artificial Intelligence and Decision Support (LIAAD) at the University of Porto in Portugal.