
E-book: Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale

  • Format: 256 pages
  • Publication date: 08-Dec-2016
  • Publisher: Addison Wesley
  • Language: English
  • ISBN-13: 9780134029726
  • Format: EPUB+DRM
  • Price: 24,41 €*
  • * This is the final price; no additional discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned, and no refunds are issued for purchased e-books.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you must install free software to unlock and read it. To read this e-book, you need to create an Adobe ID. More information is available here. The e-book can be read and downloaded on up to six devices (by a single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android).

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

The Complete Guide to Data Science with Hadoop---For Technical Professionals, Businesspeople, and Students

Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.

The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization.

Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP).

This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.

Learn

  • What data science is, how it has evolved, and how to plan a data science career
  • How data volume, variety, and velocity shape data science use cases
  • Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
  • Data importation with Hive and Spark
  • Data quality, preprocessing, preparation, and modeling
  • Visualization: surfacing insights from huge data sets
  • Machine learning: classification, regression, clustering, and anomaly detection
  • Algorithms and Hadoop tools for predictive modeling
  • Cluster analysis and similarity functions
  • Large-scale anomaly detection
  • NLP: applying data science to human language
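To give a flavor of the feature-generation material covered in the predictive modeling and NLP chapters, here is a minimal bag-of-words sketch in plain Python. This is an illustration only, not code from the book (which builds such features at scale with Spark); the whitespace tokenizer and the `bag_of_words` helper are simplified stand-ins for a real text pipeline.

```python
from collections import Counter

def tokenize(text):
    # Naive lowercase/whitespace tokenizer; real pipelines use richer text processing.
    return text.lower().split()

def bag_of_words(docs):
    """Build a sorted vocabulary and a term-frequency feature matrix from raw documents."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    matrix = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # One feature row per document: the count of each vocabulary term.
        matrix.append([counts.get(tok, 0) for tok in vocab])
    return vocab, matrix

docs = ["spark is fast", "hadoop stores big data", "spark and hadoop"]
vocab, X = bag_of_words(docs)
print(vocab)  # alphabetical vocabulary shared by all rows
print(X)      # one term-frequency row per document
```

The resulting matrix is exactly the kind of "feature matrix" the book's data munging chapter describes: rows are instances (documents), columns are features (terms).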
Table of Contents

Foreword
Preface
Acknowledgments
About the Authors
I Data Science with Hadoop---An Overview
1 Introduction to Data Science
  What Is Data Science?
  Example: Search Advertising
  A Bit of Data Science History
  Statistics and Machine Learning
  Innovation from Internet Giants
  Data Science in the Modern Enterprise
  Becoming a Data Scientist
  The Data Engineer
  The Applied Scientist
  Transitioning to a Data Scientist Role
  Soft Skills of a Data Scientist
  Building a Data Science Team
  The Data Science Project Life Cycle
  Ask the Right Question
  Data Acquisition
  Data Cleaning: Taking Care of Data Quality
  Explore the Data and Design Model Features
  Building and Tuning the Model
  Deploy to Production
  Managing a Data Science Project
  Summary
2 Use Cases for Data Science
  Big Data---A Driver of Change
  Volume: More Data Is Now Available
  Variety: More Data Types
  Velocity: Fast Data Ingest
  Business Use Cases
  Product Recommendation
  Customer Churn Analysis
  Customer Segmentation
  Sales Leads Prioritization
  Sentiment Analysis
  Fraud Detection
  Predictive Maintenance
  Market Basket Analysis
  Predictive Medical Diagnosis
  Predicting Patient Re-admission
  Detecting Anomalous Record Access
  Insurance Risk Analysis
  Predicting Oil and Gas Well Production Levels
  Summary
3 Hadoop and Data Science
  What Is Hadoop?
  Distributed File System
  Resource Manager and Scheduler
  Distributed Data Processing Frameworks
  Hadoop's Evolution
  Hadoop Tools for Data Science
  Apache Sqoop
  Apache Flume
  Apache Hive
  Apache Pig
  Apache Spark
  Python
  Java Machine Learning Packages
  Why Hadoop Is Useful to Data Scientists
  Cost Effective Storage
  Schema on Read
  Unstructured and Semi-Structured Data
  Multi-Language Tooling
  Robust Scheduling and Resource Management
  Levels of Distributed Systems Abstractions
  Scalable Creation of Models
  Scalable Application of Models
  Summary
II Preparing and Visualizing Data with Hadoop
4 Getting Data Into Hadoop
  Hadoop as a Data Lake
  The Hadoop Distributed File System (HDFS)
  Direct File Transfer to Hadoop HDFS
  Importing Data from Files into Hive Tables
  Import CSV Files into Hive Tables
  Importing Data into Hive Tables Using Spark
  Import CSV Files into HIVE Using Spark
  Import a JSON File into HIVE Using Spark
  Using Apache Sqoop to Acquire Relational Data
  Data Import and Export with Sqoop
  Apache Sqoop Version Changes
  Using Sqoop V2: A Basic Example
  Using Apache Flume to Acquire Data Streams
  Using Flume: A Web Log Example Overview
  Manage Hadoop Work and Data Flows with Apache Oozie
  Apache Falcon
  What's Next in Data Ingestion?
  Summary
5 Data Munging with Hadoop
  Why Hadoop for Data Munging?
  Data Quality
  What Is Data Quality?
  Dealing with Data Quality Issues
  Using Hadoop for Data Quality
  The Feature Matrix
  Choosing the "Right" Features
  Sampling: Choosing Instances
  Generating Features
  Text Features
  Time-Series Features
  Features from Complex Data Types
  Feature Manipulation
  Dimensionality Reduction
  Summary
6 Exploring and Visualizing Data
  Why Visualize Data?
  Motivating Example: Visualizing Network Throughput
  Visualizing the Breakthrough That Never Happened
  Creating Visualizations
  Comparison Charts
  Composition Charts
  Distribution Charts
  Relationship Charts
  Using Visualization for Data Science
  Popular Visualization Tools
  R
  Python: Matplotlib, Seaborn, and Others
  SAS
  Matlab
  Julia
  Other Visualization Tools
  Visualizing Big Data with Hadoop
  Summary
III Applying Data Modeling with Hadoop
7 Machine Learning with Hadoop
  Overview of Machine Learning
  Terminology
  Task Types in Machine Learning
  Big Data and Machine Learning
  Tools for Machine Learning
  The Future of Machine Learning and Artificial Intelligence
  Summary
8 Predictive Modeling
  Overview of Predictive Modeling
  Classification Versus Regression
  Evaluating Predictive Models
  Evaluating Classifiers
  Evaluating Regression Models
  Cross Validation
  Supervised Learning Algorithms
  Building Big Data Predictive Model Solutions
  Model Training
  Batch Prediction
  Real-Time Prediction
  Example: Sentiment Analysis
  Tweets Dataset
  Data Preparation
  Feature Generation
  Building a Classifier
  Summary
9 Clustering
  Overview of Clustering
  Uses of Clustering
  Designing a Similarity Measure
  Distance Functions
  Similarity Functions
  Clustering Algorithms
  Example: Clustering Algorithms
  k-means Clustering
  Latent Dirichlet Allocation
  Evaluating the Clusters and Choosing the Number of Clusters
  Building Big Data Clustering Solutions
  Example: Topic Modeling with Latent Dirichlet Allocation
  Feature Generation
  Running Latent Dirichlet Allocation
  Summary
10 Anomaly Detection with Hadoop
  Overview
  Uses of Anomaly Detection
  Types of Anomalies in Data
  Approaches to Anomaly Detection
  Rules-based Methods
  Supervised Learning Methods
  Unsupervised Learning Methods
  Semi-Supervised Learning Methods
  Tuning Anomaly Detection Systems
  Building a Big Data Anomaly Detection Solution with Hadoop
  Example: Detecting Network Intrusions
  Data Ingestion
  Building a Classifier
  Evaluating Performance
  Summary
11 Natural Language Processing
  Natural Language Processing
  Historical Approaches
  NLP Use Cases
  Text Segmentation
  Part-of-Speech Tagging
  Named Entity Recognition
  Sentiment Analysis
  Topic Modeling
  Tooling for NLP in Hadoop
  Small-Model NLP
  Big-Model NLP
  Textual Representations
  Bag-of-Words
  Word2vec
  Sentiment Analysis Example
  Stanford CoreNLP
  Using Spark for Sentiment Analysis
  Summary
12 Data Science with Hadoop---The Next Frontier
  Automated Data Discovery
  Deep Learning
  Summary
A Book Web Page and Code Download
B HDFS Quick Start
  Quick Command Dereference
  General User HDFS Commands
  List Files in HDFS
  Make a Directory in HDFS
  Copy Files to HDFS
  Copy Files from HDFS
  Copy Files within HDFS
  Delete a File within HDFS
  Delete a Directory in HDFS
  Get an HDFS Status Report (Administrators)
  Perform an FSCK on HDFS (Administrators)
C Additional Background on Data Science and Apache Hadoop and Spark
  General Hadoop/Spark Information
  Hadoop/Spark Installation Recipes
  HDFS
  MapReduce
  Spark
  Essential Tools
  Machine Learning
Index
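As a taste of the clustering material in Chapter 9, here is a minimal k-means sketch in plain Python. This is an illustration only, not the book's code (the book applies clustering at scale with Spark); the naive first-k initialization and the toy 2-D points are assumptions made for brevity.

```python
def kmeans(points, k, iters=10):
    """Minimal Lloyd's-algorithm k-means over 2-D points (illustrative sketch)."""
    centers = list(points[:k])  # naive init: take the first k points as centers
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to the mean of its assigned points.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers

pts = [(0.0, 0.1), (0.2, 0.0), (9.8, 10.0), (10.1, 9.9)]
print(sorted(kmeans(pts, 2)))  # one center settles near each of the two point groups
```

The same assign/update loop is what distributed implementations parallelize: the assignment step is embarrassingly parallel across points, and the update step is a per-cluster aggregation.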
Ofer Mendelevitch is Vice President of Data Science at Lendup, where he is responsible for Lendup's machine learning and advanced analytics group. Prior to joining Lendup, Ofer was Director of Data Science at Hortonworks, where he was responsible for helping Hortonworks customers apply data science with Hadoop and Spark to big data across various industries, including healthcare, finance, and retail. Before Hortonworks, Ofer served as Entrepreneur in Residence at XSeed Capital, VP of Engineering at Nor1, and Director of Engineering at Yahoo!.

Casey Stella is a Principal Software Engineer focusing on data science at Hortonworks, which provides an open source Hadoop distribution. Casey's primary responsibility is leading the analytics/data science team for the Apache Metron (Incubating) Project, an open source cybersecurity project. Prior to Hortonworks, Casey was an architect at Explorys, a medical informatics startup spun out of the Cleveland Clinic. In the more distant past, Casey served as a developer at Oracle, a research geophysicist at ION Geophysical, and a poor graduate student in Mathematics at Texas A&M.

Douglas Eadline, PhD, began his career as an analytical chemist with an interest in computer methods. Starting with the first Beowulf how-to document, Doug has written hundreds of articles, white papers, and instructional documents covering many aspects of HPC and Hadoop computing. Prior to starting and editing the popular ClusterMonkey.net website in 2005, he served as editor-in-chief for ClusterWorld Magazine and was senior HPC editor for Linux Magazine. He has practical hands-on experience in many aspects of HPC and Apache Hadoop, including hardware and software design, benchmarking, storage, GPU, cloud computing, and parallel computing. Currently, he is a writer and consultant to the HPC/analytics industry and leader of the Limulus Personal Cluster Project (http://limulus.basement-supercomputing.com). He is author of the Apache Hadoop® Fundamentals LiveLessons and Apache Hadoop® YARN Fundamentals LiveLessons videos from Pearson; co-author of Apache Hadoop® YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 and author of Hadoop® 2 Quick Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem, both from Addison-Wesley; and author of High Performance Computing for Dummies.