Korea Energy Economics Institute Electronic Library

Monograph

Spark: The Definitive Guide: Big Data Processing Made Simple

Edition
1st Edition
Publication
Sebastopol, CA : O'Reilly Media, 2018
Physical Description
xxvi, 576 p. : illustrations ; 24 cm
Bibliography Note
Includes index
Holdings

Available (1)
  • Registration No.: E207405
    Location / Call No. (Print): Reading Room
    Status / Due Date: Available for loan / -
Book Description

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of this open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine learning library.
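The structured APIs the description refers to are the DataFrame, Dataset, and Spark SQL interfaces. As a rough illustration of that style of work (not an example taken from the book), the short PySpark sketch below builds a toy DataFrame and runs the same aggregation through the DataFrame API and through SQL; the session name, sample rows, and column names are made up for this sketch.

```python
from pyspark.sql import SparkSession

# Minimal local session; the application name is illustrative.
spark = SparkSession.builder.appName("structured-api-sketch").getOrCreate()

# A tiny DataFrame standing in for a real data source (made-up rows and columns).
df = spark.createDataFrame(
    [("solar", 120), ("wind", 95), ("solar", 80)],
    ["source", "output_mw"],
)

# The same aggregation expressed through the DataFrame API...
df.groupBy("source").sum("output_mw").show()

# ...and through Spark SQL against a temporary view.
df.createOrReplaceTempView("generation")
spark.sql(
    "SELECT source, SUM(output_mw) AS total_mw FROM generation GROUP BY source"
).show()

spark.stop()
```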

  • Get a gentle overview of big data and Spark
  • Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
  • Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
  • Understand how Spark runs on a cluster
  • Debug, monitor, and tune Spark clusters and applications
  • Learn the power of Spark’s Structured Streaming and MLlib for machine learning tasks (see the streaming sketch after this list)
  • Explore the wider Spark ecosystem, including SparkR and Graph Analysis
  • Examine Spark deployment, including coverage of Spark in the Cloud
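Tying back to the Structured Streaming item above, the sketch below is a minimal streaming word count. It assumes the toy socket source and console sink commonly used for local experimentation rather than anything specific to this book; the host, port, and output mode are placeholder choices.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("structured-streaming-sketch").getOrCreate()

# Read a stream of text lines from a local socket (toy source; host/port are placeholders).
lines = (
    spark.readStream.format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Emit the complete, continuously updated counts to the console sink.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```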
Table of Contents
Part 1. Gentle overview of big data and Spark
  1. What is Apache Spark?
  2. A gentle introduction to Spark
  3. A tour of Spark's toolset
Part 2. Structured APIs : DataFrames, SQL, and datasets
  4. Structured API overview
  5. Basic structured operations
  6. Working with different types of data
  7. Aggregations
  8. Joins
  9. Data sources
  10. Spark SQL
  11. Datasets
Part 3. Low-level APIs
  12. Resilient distributed datasets (RDDs)
  13. Advanced RDDs
  14. Distributed shared variables
Part 4. Production applications
  15. How Spark runs on a cluster
  16. Developing Spark applications
  17. Deploying Spark
  18. Monitoring and debugging
  19. Performance tuning
Part 5. Streaming
  20. Stream processing fundamentals
  21. Structured streaming basics
  22. Event-time and stateful processing
  23. Structured streaming in production
Part 6. Advanced analytics and machine learning
  24. Advanced analytics and machine learning overview
  25. Preprocessing and feature engineering
  26. Classification
  27. Regression
  28. Recommendation
  29. Unsupervised learning
  30. Graph analytics
  31. Deep learning
Part 7. Ecosystem
  32. Language specifics : Python (PySpark) and R (SparkR and sparklyr)
  33. Ecosystem and community
Index