E-book: Contrast Data Mining: Concepts, Algorithms, and Applications [Taylor & Francis e-book]

Edited by James Bailey (The University of Melbourne, Victoria, Australia), Edited by Guozhu Dong (Wright State University, Ohio, USA)
  • Taylor & Francis e-book
  • Price: 160,08 €*
  • * this price gives unlimited concurrent access for unlimited time
  • Standard price: 228,69 €
  • Save 30%
"Preface. Contrasting is one of the most basic types of analysis. Contrasting based analysis is routinely employed, often subconsciously, by all types of people. People use contrasting to better understand the world around them and the challenging problems they want to solve. People use contrasting to accurately assess the desirability of important situations, and to help them better avoid potentially harmful situations and embrace potentially beneficial ones. Contrasting involves the comparison of one dataset against another. The datasets may represent data of different time periods, spatial locations, or classes, or they may represent data satisfying different conditions. Contrasting is often employed to compare cases with a desirable outcome against cases with an undesirable one, for example comparing the benign and diseased tissue classes of a cancer, or comparing students who graduate with university degrees against those who do not. Contrasting can identify patterns that capture changes and trends over time or space, or identify discriminative patterns that capture differences among contrasting classes or conditions. Traditional methods for contrasting multiple datasets were often very simple so that they could be performed by hand. For example, one could compare the respective feature means, compare the respective attribute-value distributions, or compare the respective probabilities of simple patterns, in the datasets being contrasted. However, the simplicity of such approaches has limitations, as it is difficult to use them to identify specific patterns that offer novel and actionable insights, and identify desirable sets of discriminative patterns for building accurate and explainable classifiers."
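
The simple comparison the preface mentions, comparing the probability of a pattern in the datasets being contrasted, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than material from the book: the function names `support` and `support_ratio`, the toy records, and the choice to represent each record as a set of items. The ratio of supports computed here is often called the growth rate in the emerging-pattern literature.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def support(pattern: Record, dataset: List[Record]) -> float:
    """Fraction of records in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    hits = sum(1 for record in dataset if pattern <= record)
    return hits / len(dataset)


def support_ratio(pattern: Record, background: List[Record], target: List[Record]) -> float:
    """Support of `pattern` in `target` divided by its support in `background`.

    Returns infinity when the pattern occurs only in `target`,
    and 0.0 when it occurs in neither dataset.
    """
    s_background = support(pattern, background)
    s_target = support(pattern, target)
    if s_background == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_background


# Hypothetical toy data: each record is a set of observed items.
benign = [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"b", "c"}), frozenset({"c"})]
diseased = [frozenset({"a", "b"}), frozenset({"a", "b", "c"}), frozenset({"a", "b"}), frozenset({"c"})]

pattern = frozenset({"a", "b"})
print(support(pattern, benign), support(pattern, diseased))
print(support_ratio(pattern, benign, diseased))
```

A pattern whose ratio is far from 1 separates the two datasets well. As the preface notes, checking hand-picked patterns this way does not by itself find the most informative ones, which is the gap that dedicated contrast mining algorithms aim to fill.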



A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems
Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains.

Learn from Real Case Studies of Contrast Mining Applications
In this volume, researchers from around the world specializing in architecture engineering, bioinformatics, computer science, medicine, and systems engineering focus on the mining and use of contrast patterns. They demonstrate many useful and powerful capabilities of a variety of contrast mining techniques and algorithms, including tree-based structures, zero-suppressed binary decision diagrams, data cube representations, and clustering algorithms. They also examine how contrast mining is used in leukemia characterization, discriminative gene transfer and microarray analysis, computational toxicology, spatial and image data classification, voting analysis, heart disease prediction, crime analysis, understanding customer behavior, genetic algorithms, and network security.

Foreword
Preface
I Preliminaries and Statistical Contrast Measures
1 Preliminaries
Guozhu Dong
1.1 Datasets of Various Data Types
1.2 Data Preprocessing
1.3 Patterns and Models
1.4 Contrast Patterns and Models
2 Statistical Measures for Contrast Patterns
James Bailey
2.1 Introduction
2.1.1 Terminology
2.2 Measures for Assessing Quality of Discrete Contrast Patterns
2.3 Measures for Assessing Quality of Continuous Valued Contrast Patterns
2.4 Feature Construction and Selection: PCA and Discriminative Methods
2.5 Summary
II Contrast Mining Algorithms
3 Mining Emerging Patterns Using Tree Structures or Tree Based Searches
James Bailey, Kotagiri Ramamohanarao
3.1 Introduction
3.1.1 Terminology
3.2 Ratio Tree Structure for Mining Jumping Emerging Patterns
3.3 Contrast Pattern Tree Structure
3.4 Tree Based Contrast Pattern Mining with Equivalence Classes
3.5 Summary and Conclusion
4 Mining Emerging Patterns Using Zero-Suppressed Binary Decision Diagrams
James Bailey, Elsa Loekito
4.1 Introduction
4.2 Background on Binary Decision Diagrams and ZBDDs
4.3 Mining Emerging Patterns Using ZBDDs
4.4 Discussion and Summary
5 Efficient Direct Mining of Selective Discriminative Patterns for Classification
Hong Cheng, Jiawei Han, Xifeng Yan, Philip S. Yu
5.1 Introduction
5.2 DDPMine: Direct Discriminative Pattern Mining
5.2.1 Branch-and-Bound Search
5.2.2 Training Instance Elimination
5.2.2.1 Progressively Shrinking FP-Tree
5.2.2.2 Feature Coverage
5.2.3 Efficiency Analysis
5.2.4 Summary
5.3 Harmony: Efficiently Mining The Best Rules For Classification
5.3.1 Rule Enumeration
5.3.2 Ordering of the Local Items
5.3.3 Search Space Pruning
5.3.4 Summary
5.4 Performance Comparison Between DDPMine and Harmony
5.5 Related Work
5.5.1 MbT: Direct Mining Discriminative Patterns via Model-based Search Tree
5.5.2 NDPMine: Direct Mining Discriminative Numerical Features
5.5.3 uHarmony: Mining Discriminative Patterns from Uncertain Data
5.5.4 Applications of Discriminative Pattern Based Classification
5.5.5 Discriminative Frequent Pattern Based Classification vs. Traditional Classification
5.6 Conclusions
6 Mining Emerging Patterns from Structured Data
James Bailey
6.1 Introduction
6.2 Contrasts in Sequence Data: Distinguishing Sequence Patterns
6.2.1 Definitions
6.2.2 Mining Approach
6.3 Contrasts in Graph Datasets: Minimal Contrast Subgraph Patterns
6.3.1 Terminology and Definitions for Contrast Subgraphs
6.3.2 Mining Algorithms for Minimal Contrast Subgraphs
6.4 Summary
7 Incremental Maintenance of Emerging Patterns
Mengling Feng, Guozhu Dong
7.1 Background & Potential Applications
7.2 Problem Definition & Challenges
7.2.1 Potential Challenges
7.3 Concise Representation of Pattern Space: The Border
7.4 Maintenance of Border
7.4.1 Basic Border Operations
7.4.2 Insertion of New Instances
7.4.3 Removal of Existing Instances
7.4.4 Expansion of Query Item Space
7.4.5 Shrinkage of Query Item Space
7.5 Related Work
7.6 Closing Remarks
III Generalized Contrasts, Emerging Data Cubes, and Rough Sets
8 More Expressive Contrast Patterns and Their Mining
Lei Duan, Milton Garcia Borroto, Guozhu Dong
8.1 Introduction
8.2 Disjunctive Emerging Pattern Mining
8.2.1 Basic Definitions
8.2.2 ZBDD Based Approach to Disjunctive EP Mining
8.3 Fuzzy Emerging Pattern Mining
8.3.1 Advantages of Fuzzy Logic
8.3.2 Fuzzy Emerging Patterns Defined
8.3.3 Mining Fuzzy Emerging Patterns
8.3.4 Using Fuzzy Emerging Patterns in Classification
8.4 Contrast Inequality Discovery
8.4.1 Basic Definitions
8.4.2 Brief Introduction to GEP
8.4.3 GEP Algorithm for Mining Contrast Inequalities
8.4.4 Experimental Evaluation of GEPCIM
8.4.5 Future Work
8.5 Contrast Equation Mining
8.6 Discussion
9 Emerging Data Cube Representations for OLAP Database Mining
Sebastien Nedjar, Lotfi Lakhal, Rosine Cicchetti
9.1 Introduction
9.2 Emerging Cube
9.3 Representations of the Emerging Cube
9.3.1 Representations for OLAP Classification
9.3.1.1 Borders [L; U]
9.3.1.2 Borders [U#; U]
9.3.2 Representations for OLAP Querying
9.3.2.1 L-Emerging Closed Cubes
9.3.2.2 U#-Emerging Closed Cubes
9.3.2.3 Reduced U#-Emerging Closed Cubes
9.3.3 Representation for OLAP Navigation
9.4 Discussion
9.5 Conclusion
10 Relation Between Jumping Emerging Patterns and Rough Set Theory
Pawel Terlecki, Krzysztof Walczak
10.1 Introduction
10.2 Theoretical Foundations
10.3 JEPs with Negation
10.3.1 Negative Knowledge in Transaction Databases
10.3.2 Transformation to Decision Table
10.3.3 Properties
10.3.4 Mining Approaches
10.4 JEP Mining by Means of Local Reducts
10.4.1 Global Condensation
10.4.1.1 Condensed Decision Table
10.4.1.2 Proper Partition Finding as Graph Coloring
10.4.1.3 Discovery Method
10.4.2 Local Projection
10.4.2.1 Locally Projected Decision Table
10.4.2.2 Discovery Method
IV Contrast Mining for Classification & Clustering
11 Overview and Analysis of Contrast Pattern Based Classification
Xiuzhen Zhang, Guozhu Dong
11.1 Introduction
11.2 Main Issues in Contrast Pattern Based Classification
11.3 Representative Approaches
11.3.1 Contrast Pattern Mining and Selection
11.3.2 Classification Strategy
11.3.3 Summary
11.4 Bias Variance Analysis of iCAEP and Others
11.5 Overfitting Avoidance by CP-Based Approaches
11.6 Solving the Imbalanced Classification Problem
11.6.1 Advantages of Contrast Pattern Based Classification
11.6.2 Performance Results of iCAEP
11.7 Conclusion and Discussion
12 Using Emerging Patterns in Outlier and Rare-Class Prediction
Lijun Chen, Guozhu Dong
12.1 Introduction
12.2 EP-length Statistic Based Outlier Detection
12.2.1 EP Based Discriminative Information for One Class
12.2.2 Mining EPs From One-class Data
12.2.3 Defining the Length Statistics of EPs
12.2.4 Using Average Length Statistics for Classification
12.2.5 The Complete OCLEP Classifier
12.3 Experiments on OCLEP on Masquerader Detection
12.3.1 Masquerader Detection
12.3.2 Data Used and Evaluation Settings
12.3.3 Data Preprocessing and Feature Construction
12.3.4 One-class Support Vector Machine (ocSVM)
12.3.5 Experiment Results Using OCLEP
12.3.5.1 SEA Experiment
12.3.5.2 1v49' Experiment
12.3.5.3 Situations When OCLEP is Better
12.3.5.4 Feature Based OCLEP Ensemble
12.4 Rare-class Classification Using EPs
12.5 Advantages of EP-based Rare-class Instance Creation
12.6 Related Work and Discussion
13 Enhancing Traditional Classifiers Using Emerging Patterns
Guozhu Dong, Kotagiri Ramamohanarao
13.1 Introduction
13.2 Emerging Pattern Based Class Membership Score
13.3 Emerging Pattern Enhanced Weighted/Fuzzy SVM
13.3.1 Determining Instance Relevance Weight
13.3.2 Constructing Weighted SVM
13.3.3 Performance Evaluation
13.4 Emerging Pattern Based Weighted Decision Trees
13.4.1 Determining Class Membership Weight
13.4.2 Constructing Weighted Decision Trees
13.4.3 Performance Evaluation
13.4.4 Discussion
13.5 Related Work
14 CPC: A Contrast Pattern Based Clustering Algorithm
Neil Fore, Guozhu Dong
14.1 Introduction
14.2 Related Work
14.3 Preliminaries
14.3.1 Equivalence Classes of Frequent Itemsets
14.3.2 CPCQ: Contrast Pattern Based Clustering Quality Index
14.4 CPC Design and Rationale
14.4.1 Overview
14.4.2 MPQ
14.4.3 The CPC Algorithm
14.4.4 CPC Illustration
14.4.5 Optimization and Implementation Details
14.5 Experimental Evaluation
14.5.1 Datasets and Clustering Algorithms
14.5.2 CPC Parameters
14.5.3 Experiment Settings
14.5.4 Categorical Datasets
14.5.5 Numerical Dataset
14.5.6 Document Clustering
14.5.7 CPC Execution Time and Memory Use
14.5.8 Effect of Pattern Limit on Clustering Quality
14.6 Discussion and Future Work
14.6.1 Alternate MPQ Definition
14.6.2 Future Work
V Contrast Mining for Bioinformatics and Chemoinformatics
15 Emerging Pattern Based Rules Characterizing Subtypes of Leukemia
Jinyan Li, Limsoon Wong
15.1 Introduction
15.2 Motivation and Overview of PCL
15.3 Data Used in the Study
15.4 Discovery of Emerging Patterns
15.4.1 Step 1: Gene Selection and Discretization
15.4.2 Step 2: Discovering EPs
15.5 Deriving Rules from Tree-Structured Leukemia Datasets
15.5.1 Rules for T-All vs Others1
15.5.2 Rules for E2A-PBX1 vs Others2
15.5.3 Rules through Level 3 to Level 6
15.6 Classification by PCL on the Tree-Structured Data
15.6.1 PCL: Prediction by Collective Likelihood of Emerging Patterns
15.6.2 Strengthening the Prediction Method at Levels 1 & 2
15.6.3 Comparison with Other Methods
15.7 Generalized PCL for Parallel Multi-Class Classification
15.8 Performance Using Randomly Selected Genes
15.9 Summary
16 Discriminating Gene Transfer and Microarray Concordance Analysis
Shihong Mao, Guozhu Dong
16.1 Introduction
16.2 Datasets Used in Experiments and Preprocessing
16.3 Discriminating Genes and Associated Classifiers
16.4 Measures for Transferability
16.4.1 Measures for Discriminative Gene Transferability
16.4.2 Measures for Classifier Transferability
16.5 Findings on Microarray Concordance
16.5.1 Concordance Test by Classifier Transferability
16.5.2 Split Value Consistency Rate Analysis
16.5.3 Shared Discriminating Gene Based P-Value
16.6 Discussion
17 Towards Mining Optimal Emerging Patterns Amidst 1000s of Genes
Shihong Mao, Guozhu Dong
17.1 Introduction
17.2 Gene Club Formation Methods
17.2.1 The Independent Gene Club Formation Method
17.2.2 The Iterative Gene Club Formation Method
17.2.3 Two Divisive Gene Club Formation Methods
17.3 Interaction Based Importance Index of Genes
17.4 Computing IBIG and Highest Support EPs for Top IBIG Genes
17.5 Experimental Evaluation of Gene Club Methods
17.5.1 Ability to Find Top Quality EPs from 75 Genes
17.5.2 Ability to Discover High Support EPs and Signature EPs, Possibly Involving Lowly Ranked Genes
17.5.3 High Support Emerging Patterns Mined
17.5.4 Comparison of the Four Gene Club Methods
17.5.5 IBIG vs Information Gain Based Ranking
17.6 Discussion
18 Emerging Chemical Patterns - Theory and Applications
Jens Auer, Martin Vogt, Jurgen Bajorath
18.1 Introduction
18.2 Theory
18.3 Compound Classification
18.4 Computational Medicinal Chemistry Applications
18.4.1 Simulated Lead Optimization
18.4.2 Simulated Sequential Screening
18.4.3 Bioactive Conformation Analysis
18.5 Chemoinformatics Glossary
19 Emerging Patterns as Structural Alerts for Computational Toxicology
Bertrand Cuissart, Guillaume Poezevara, Bruno Cremilleux, Alban Lepailleur, Ronan Bureau
19.1 Introduction
19.2 Frequent Emerging Molecular Patterns as Potential Structural Alerts
19.2.1 Definition of Frequent Emerging Molecular Pattern
19.2.2 Using RPMPs as Condensed Representation of FEMPs
19.2.3 Notes on the Computation
19.2.4 Related Work
19.3 Experiments in Predictive Toxicology
19.3.1 Materials and Experimental Setup
19.3.2 Generalization of the RPMPs
19.4 A Chemical Analysis of RPMPs
19.5 Conclusion
VI Contrast Mining for Special Domains
20 Emerging Patterns and Classification for Spatial and Image Data
Lukasz Kobylinski, Krzysztof Walczak
20.1 Introduction
20.2 Previous Work
20.3 Image Representation
20.4 Jumping Emerging Patterns with Occurrence Counts
20.4.1 Formal Definition
20.4.2 Mining Algorithm
20.4.3 Use in Classification
20.5 Spatial Emerging Patterns
20.6 Jumping Emerging Substrings
20.7 Experimental Results
20.8 Conclusions
21 Geospatial Contrast Mining with Applications on Labeled Spatial Data
Wei Ding, Tomasz F. Stepinski, Josue Salazar
21.1 Introduction
21.2 Related Work
21.3 Problem Formulation
21.4 Identification of Geospatial Discriminative Patterns and Discovery of Optimal Boundary
21.5 Pattern Summarization
21.6 Application on Vegetation Analysis
21.7 Application on Presidential Election Data Analysis
21.8 Application on Biodiversity Analysis of Bird Species
21.9 Conclusion
22 Mining Emerging Patterns for Activity Recognition
Tao Gu, Zhanqing Wu, XianPing Tao, Hung Keng Pung, Jian Lu
22.1 Introduction
22.2 Data Preprocessing
22.3 Mining Emerging Patterns For Activity Recognition
22.3.1 Problem Statement
22.3.2 Mining Emerging Patterns from Sequential Activity Instances
22.4 The epSICAR Algorithm
22.4.1 Score Function for Sequential Activity
22.4.1.1 EP Score
22.4.1.2 Coverage Score
22.4.1.3 Correlation Score
22.4.2 Score Function for Interleaved and Concurrent Activities
22.4.3 The epSICAR Algorithm
22.5 Empirical Studies
22.5.1 Trace Collection and Evaluation Methodology
22.5.2 Experiment 1: Accuracy Performance
22.5.3 Experiment 2: Model Analysis
22.6 Conclusion
23 Emerging Pattern Based Prediction of Heart Diseases and Powerline Safety
Keun Ho Ryu, Dong Gyu Lee, Minghao Piao
23.1 Introduction
23.2 Prediction of Myocardial Ischemia
23.3 Coronary Artery Disease Diagnosis
23.4 Classification of Powerline Safety
23.5 Conclusion
24 Emerging Pattern Based Crime Spots Analysis and Rental Price Prediction
Naoki Katoh, Atsushi Takizawa
24.1 Introduction
24.2 Street Crime Analysis
24.2.1 Studied Area and Databases
24.2.2 Attributes on Visibility
24.2.3 Preparation of the Analysis
24.2.4 Result
24.3 Prediction of Apartment Rental Price
24.3.1 Background and Motivation
24.3.2 Data
24.3.3 Extracting Frequent Subgraphs
24.3.4 Discovering Primary Subgraphs by Emerging Patterns
24.3.5 Rent Price Prediction Model
VII Survey of Other Papers
25 Overview of Results on Contrast Mining and Applications
Guozhu Dong
25.1 General Papers, Events, PhD Dissertations
25.2 Analysis and Measures on Contrasts and Similarity
25.3 Contrast Mining Algorithms
25.3.1 Mining Contrasts and Changes in General Data
25.3.2 Mining Contrasts in Stream, Temporal, Sequence Data
25.3.3 Mining Contrasts in Spatial, Image, and Graph Data
25.3.4 Unusual Subgroup Discovery and Description
25.3.5 Mining Conditional Contrasts and Gradients
25.4 Contrast Pattern Based Classification
25.5 Contrast Pattern Based Clustering
25.6 Contrast Mining and Bioinformatics and Chemoinformatics
25.7 Contrast Mining Applications in Various Domains
25.7.1 Medicine, Environment, Security, Privacy, Activity Recognition
25.7.2 Business, Customer Behavior, Music, Video, Blog
25.7.3 Model Error Analysis, and Genetic Algorithm Improvement
Bibliography
Index

Guozhu Dong is a professor at Wright State University. A senior member of the IEEE and ACM, Dr. Dong holds four U.S. patents and has authored over 130 articles on databases, data mining, and bioinformatics; co-authored Sequence Data Mining; and co-edited Contrast Data Mining and Applications. His research focuses on contrast/emerging pattern mining and applications as well as first-order incremental view maintenance. He has a PhD in computer science from the University of Southern California.

James Bailey is an Australian Research Council Future Fellow in the Department of Computing and Information Systems at the University of Melbourne. Dr. Bailey has authored over 100 articles and is an associate editor of IEEE Transactions on Knowledge and Data Engineering and Knowledge and Information Systems: An International Journal. His research focuses on fundamental topics in data mining and machine learning, such as contrast pattern mining and data clustering, as well as on applications in areas including health informatics and bioinformatics. He has a PhD in computer science from the University of Melbourne.