
Spark: The Definitive Guide (Reprint Edition)

Book Information

Authors: Bill Chambers (US) and Matei Zaharia

Publisher: Southeast University Press

List price: 128.00 CNY

ISBN: 9787564179816

Publication date: 2018-11-01

Categories: Books, Industry & Professional, Computers, Databases

Product Description

Table of Contents

Preface

Part Ⅰ.Gentle Overview of Big Data and Spark

1.What Is Apache Spark?

Apache Spark's Philosophy

Context: The Big Data Problem

History of Spark

The Present and Future of Spark

Running Spark

Downloading Spark Locally

Launching Spark's Interactive Consoles

Running Spark in the Cloud

Data Used in This Book

2.A Gentle Introduction to Spark

Spark's Basic Architecture

Spark Applications

Spark's Language APIs

Spark's APIs

Starting Spark

The SparkSession

DataFrames

Partitions

Transformations

Lazy Evaluation

Actions

Spark UI

An End-to-End Example

DataFrames and SQL

Conclusion

3.A Tour of Spark's Toolset

Running Production Applications

Datasets: Type-Safe Structured APIs

Structured Streaming

Machine Learning and Advanced Analytics

Lower-Level APIs

SparkR

Spark's Ecosystem and Packages

Conclusion

Part Ⅱ.Structured APIs: DataFrames, SQL, and Datasets

4.Structured API Overview

DataFrames and Datasets

Schemas

Overview of Structured Spark Types

DataFrames Versus Datasets

Columns

Rows

Spark Types

Overview of Structured API Execution

Logical Planning

Physical Planning

Execution

Conclusion

5.Basic Structured Operations

Schemas

Columns and Expressions

Columns

Expressions

Records and Rows

Creating Rows

DataFrame Transformations

Creating DataFrames

select and selectExpr

Converting to Spark Types (Literals)

Adding Columns

……

6.Working with Different Types of Data

7.Aggregations

8.Joins

9.Data Sources

10.Spark SQL

11.Datasets

Part Ⅲ.Low-Level APIs

12.Resilient Distributed Datasets (RDDs)

13.Advanced RDDs

14.Distributed Shared Variables

Part Ⅳ.Production Applications

15.How Spark Runs on a Cluster

16.Developing Spark Applications

17.Deploying Spark

18.Monitoring and Debugging

19.Performance Tuning

Part Ⅴ.Streaming

20.Stream Processing Fundamentals

21.Structured Streaming Basics

22.Event-Time and Stateful Processing

23.Structured Streaming in Production

Part Ⅵ.Advanced Analytics and Machine Learning

24.Advanced Analytics and Machine Learning Overview

25.Preprocessing and Feature Engineering

26.Classification

27.Regression

28.Recommendation

29.Unsupervised Learning

30.Graph Analytics

31.Deep Learning

Part Ⅶ.Ecosystem

32.Language Specifics: Python (PySpark) and R (SparkR and sparklyr)

33.Ecosystem and Community

Index

Synopsis

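To help readers learn how to use, deploy, and maintain Apache Spark, some of the creators of this open-source cluster computing framework wrote this comprehensive guide. Authors Bill Chambers and Matei Zaharia break Spark topics into distinct parts, each with its own goals, while highlighting the improvements and new features in Spark 2.0. You will explore the basic operations and common functions of Spark's Structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and will explore machine learning techniques and deployment scenarios for MLlib, Spark's scalable machine learning library.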

About the Author

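Bill Chambers is a product manager at Databricks, where he focuses on large-scale analytics, strong documentation, and cross-organizational collaboration to help customers succeed with Spark and Databricks.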
