E-book: Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale

4.20/5 (26 ratings by Goodreads)

Lars George, Jan Kunigk, Paul Wilkinson, Ian Buss

Format: 636 pages
Publication date: 05-Dec-2018
Publisher: O'Reilly Media
Language: eng
ISBN-13: 9781491969229

Format - EPUB+DRM
Price: 46,20 €*
* this is the final price, i.e., no additional discounts apply
This e-book is for personal use only. E-books cannot be returned, and payments for purchased e-books are not refunded.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital Rights Management (DRM)
The publisher has supplied this book in encrypted form, which means that you need to install free software to unlock and read it. To read this e-book, you need to create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

Required software
To read this e-book on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free app designed specifically for e-books. It is not the same as Adobe Reader, which you may already have on your computer.)

You cannot read this e-book on an Amazon Kindle.

There's a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task not widely covered. With this practical book, you'll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform.

Ideal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You'll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into:

Infrastructure: Look at all component layers in a modern data platform, from the server to the data center, to establish a solid foundation for data in your enterprise

Platform: Understand aspects of deployment, operation, security, high availability, and disaster recovery, along with everything you need to know to integrate your platform with the rest of your enterprise IT

Taking Hadoop to the cloud: Learn the important architectural aspects of running a big data platform in the cloud while maintaining enterprise security and high availability

Foreword
Preface

1 Big Data Technology Primer
  A Tour of the Landscape
  Core Components
  Computational Frameworks
  Analytical SQL Engines
  Storage Engines
  Ingestion
  Orchestration
  Summary

Part I. Infrastructure

2 Clusters
  Reasons for Multiple Clusters
  Multiple Clusters for Resiliency
  Multiple Clusters for Software Development
  Multiple Clusters for Workload Isolation
  Multiple Clusters for Legal Separation
  Multiple Clusters and Independent Storage and Compute
  Multitenancy
  Requirements for Multitenancy
  Sizing Clusters
  Sizing by Storage
  Sizing by Ingest Rate
  Sizing by Workload
  Cluster Growth
  The Drivers of Cluster Growth
  Implementing Cluster Growth
  Data Replication
  Replication for Software Development
  Replication and Workload Isolation
  Summary

3 Compute and Storage
  Computer Architecture for Hadoop
  Commodity Servers
  Server CPUs and RAM
  Nonuniform Memory Access
  CPU Specifications
  RAM
  Commoditized Storage Meets the Enterprise
  Modularity of Compute and Storage
  Everything Is Java
  Replication or Erasure Coding?
  Alternatives
  Hadoop and the Linux Storage Stack
  User Space
  Important System Calls
  The Linux Page Cache
  Short-Circuit and Zero-Copy Reads
  Filesystems
  Erasure Coding Versus Replication
  Discussion
  Guidance
  Low-Level Storage
  Storage Controllers
  Disk Layer
  Server Form Factors
  Form Factor Comparison
  Guidance
  Workload Profiles
  Cluster Configurations and Node Types
  Master Nodes
  Worker Nodes
  Utility Nodes
  Edge Nodes
  Small Cluster Configurations
  Medium Cluster Configurations
  Large Cluster Configurations
  Summary

4 Networking
  How Services Use a Network
  Remote Procedure Calls (RPCs)
  Data Transfers
  Monitoring
  Backup
  Consensus
  Network Architectures
  Small Cluster Architectures
  Medium Cluster Architectures
  Large Cluster Architectures
  Network Integration
  Reusing an Existing Network
  Creating an Additional Network
  Network Design Considerations
  Layer 1 Recommendations
  Layer 2 Recommendations
  Layer 3 Recommendations
  Summary

5 Organizational Challenges
  Who Runs It?
  Is It Infrastructure, Middleware, or an Application?
  Case Study: A Typical Business Intelligence Project
  The Traditional Approach
  Typical Team Setup
  Compartmentalization of IT
  Revised Team Setup for Hadoop in the Enterprise
  Solution Overview with Hadoop
  New Team Setup
  Split Responsibilities
  Do I Need DevOps?
  Do I Need a Center of Excellence/Competence?
  Summary

6 Datacenter Considerations
  Why Does It Matter?
  Basic Datacenter Concepts
  Cooling
  Power
  Network
  Rack Awareness and Rack Failures
  Failure Domain Alignment
  Space and Racking Constraints
  Ingest and Intercluster Connectivity
  Software
  Hardware
  Replacements and Repair
  Operational Procedures
  Typical Pitfalls
  Networking
  Cluster Spanning
  Summary

Part II. Platform

7 Provisioning Clusters
  Operating Systems
  OS Choices
  OS Configuration for Hadoop
  Automated Configuration Example
  Service Databases
  Required Databases
  Database Integration Options
  Database Considerations
  Hadoop Deployment
  Hadoop Distributions
  Installation Choices
  Distribution Architecture
  Installation Process
  Summary

8 Platform Validation
  Testing Methodology
  Useful Tools
  Hardware Validation
  CPU
  Disks
  Network
  Hadoop Validation
  HDFS Validation
  General Validation
  Validating Other Components
  Operations Validation
  Summary

9 Security
  In-Flight Encryption
  TLS Encryption
  SASL Quality of Protection
  Enabling In-Flight Encryption
  Authentication
  Kerberos
  LDAP Authentication
  Delegation Tokens
  Impersonation
  Authorization
  Group Resolution
  Superusers and Supergroups
  Hadoop Service Level Authorization
  Centralized Security Management
  HDFS
  YARN
  ZooKeeper
  Hive
  Impala
  HBase
  Solr
  Kudu
  Oozie
  Hue
  Kafka
  Sentry
  At-Rest Encryption
  Volume Encryption with Cloudera Navigator Encrypt and Key Trustee Server
  HDFS Transparent Data Encryption
  Encrypting Temporary Files
  Summary

10 Integration with Identity Management Providers
  Integration Areas
  Integration Scenarios
  Scenario 1: Writing a File to HDFS
  Scenario 2: Submitting a Hive Query
  Scenario 3: Running a Spark Job
  Integration Providers
  LDAP Integration
  Background
  LDAP Security
  Load Balancing
  Application Integration
  Linux Integration
  Kerberos Integration
  Kerberos Clients
  KDC Integration
  Certificate Management
  Signing Certificates
  Converting Certificates
  Wildcard Certificates
  Automation
  Summary

11 Accessing and Interacting with Clusters
  Access Mechanisms
  Programmatic Access
  Command-Line Access
  Web UIs
  Access Topologies
  Interaction Patterns
  Proxy Access
  Load Balancing
  Edge Node Interactions
  Access Security
  Administration Gateways
  Workbenches
  Hue
  Notebooks
  Landing Zones
  Summary

12 High Availability
  High Availability Defined
  Lateral/Service HA
  Vertical/Systemic HA
  Measuring Availability
  Percentages
  Percentiles
  Operating for HA
  Monitoring
  Playbooks and Postmortems
  HA Building Blocks
  Quorums
  Load Balancing
  Database HA
  Ancillary Services
  General Considerations
  Separation of Master and Worker Processes
  Separation of Identical Service Roles
  Master Servers in Separate Failure Domains
  Balanced Master Configurations
  Optimized Server Configurations
  High Availability of Cluster Services
  ZooKeeper
  HDFS
  YARN
  HBase
  KMS
  Hive
  Impala
  Solr
  Kafka
  Oozie
  Hue
  Other Services
  Autoconfiguration
  Summary

13 Backup and Disaster Recovery
  Context
  Many Distributed Systems
  Policies and Objectives
  Failure Scenarios
  Suitable Data Sources
  Strategies
  Data Types
  Consistency
  Validation
  Summary
  Data Replication
  HBase
  Cluster Management Tools
  Kafka
  Summary
  Hadoop Cluster Backups
  Subsystems
  Case Study: Automating Backups with Oozie
  Restore
  Summary

Part III. Taking Hadoop to the Cloud

14 Basics of Virtualization for Hadoop
  Compute Virtualization
  Virtual Machine Distribution
  Anti-Affinity Groups
  Storage Virtualization
  Virtualizing Local Storage
  SANs
  Object Storage and Network-Attached Storage
  Network Virtualization
  Cluster Life Cycle Models
  Summary

15 Solutions for Private Clouds
  OpenStack
  Automation and Integration
  Life Cycle and Storage
  Isolation
  Summary
  OpenShift
  Automation
  Life Cycle and Storage
  Isolation
  Summary
  VMware and Pivotal Cloud Foundry
  Do It Yourself?
  Automation
  Isolation
  Life Cycle Model
  Summary
  Object Storage for Private Clouds
  EMC Isilon
  Ceph
  Summary

16 Solutions in the Public Cloud
  Key Things to Know
  Cloud Providers
  AWS
  Microsoft Azure
  Google Cloud Platform
  Implementing Clusters
  Instances
  Storage and Life Cycle Models
  Network Architecture
  High Availability
  Summary

17 Automated Provisioning
  Long-Lived Clusters
  Configuration and Templating
  Deployment Phases
  Vendor Solutions
  One-Click Deployments
  Homegrown Automation
  Hooking Into a Provisioning Life Cycle
  Scaling Up and Down
  Deploying with Security
  Transient Clusters
  Sharing Metadata Services
  Summary

18 Security in the Cloud
  Assessing the Risk
  Risk Model
  Environmental Risks
  Deployment Risks
  Identity Provider Options for Hadoop
  Option A: Cloud-Only Self-Contained ID Services
  Option B: Cloud-Only Shared ID Services
  Option C: On-Premises ID Services
  Object Storage Security and Hadoop
  Identity and Access Management
  Amazon Simple Storage Service
  GCP Cloud Storage
  Microsoft Azure
  Auditing
  Encryption for Data at Rest
  Requirements for Key Material
  Options for Encryption in the Cloud
  On-Premises Key Persistence
  Encryption via the Cloud Provider
  Encryption Feature and Interoperability Summary
  Recommendations and Summary for Cloud Encryption
  Encrypting Data in Flight in the Cloud
  Perimeter Controls and Firewalling
  GCP
  AWS
  Azure
  Summary

A Backup Onboarding Checklist

Index

Jan Kunigk has worked on enterprise Hadoop solutions since 2010. Before he joined Cloudera in 2014, Jan built optimized systems architectures for Hadoop at IBM and implemented a Hadoop-as-a-Service offering at T-Systems. In his current role as a Solutions Architect he makes Hadoop projects at Cloudera's enterprise customers successful, covering a wide spectrum from architectural decisions to the implementation of big data applications across all industry sectors on a day-to-day basis.

Ian Buss began his journey into distributed computing with parallel computational electromagnetics whilst studying for a PhD in photonics at the University of Bristol. After simulating LEDs on supercomputers, he made the move from big compute in academia to big data in the public sector, first encountering Hadoop in 2012. After having fun building, deploying, managing and using Hadoop clusters, Ian joined Cloudera as a Solutions Architect in 2014. His day job now involves integrating Hadoop into enterprises and making stuff work in the real world.

Paul Wilkinson has been wrestling with big data in the public sector since before Hadoop existed and was very glad when it arrived in his life in 2009. He became a Cloudera consultant in 2012, advising customers on all things Hadoop: application design, information architecture, cluster management and infrastructure planning, the full stack. After a torrent of professional services work across financial services, cybersecurity, adtech, gaming and government, he's seen it all, warts and all. Or at least, he hopes he has.

Lars George has been involved with Hadoop and HBase since 2007, and became a full HBase committer in 2009. He has spoken at many Hadoop User Group meetings, and at conferences such as Hadoop World, Hadoop Summit, ApacheCon, FOSDEM, and QCon. He also started the Munich OpenHUG meetings. Lars worked for Cloudera for over five years as the EMEA Chief Architect, acting as a liaison between the Cloudera professional services team and customers as well as partners in and around Europe, building the next data-driven solutions. In 2016 he started his own Hadoop advisory firm, building on what he has learned and seen in the field for more than 8 years. He is also the author of O'Reilly's "HBase: The Definitive Guide."

Permanent link: https://www.kriso.lv/db/97814919692296e.html
