Korea Energy Economics Institute Electronic Library

Monograph

Spark: The Definitive Guide: Big Data Processing Made Simple

Edition
1st Edition
Publication
Sebastopol, CA : O'Reilly Media, 2018
Physical Description
xxvi, 576 p. : illustrations ; 24 cm
Bibliography Note
Includes index
Holdings

Available (1)
  • Registration No.: E207405
    Location / Call No. (Print): Reading Room
    Status / Due Date: Available for loan / -
Book Description

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of this open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine learning library.
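The structured APIs the description refers to are the DataFrame, Dataset, and Spark SQL interfaces. As a rough illustration of that style of work (not an example taken from the book), the short PySpark sketch below builds a toy DataFrame and runs the same aggregation through the DataFrame API and through SQL; the session name, sample rows, and column names are made up for this sketch.

```python
from pyspark.sql import SparkSession

# Minimal local session; the application name is illustrative.
spark = SparkSession.builder.appName("structured-api-sketch").getOrCreate()

# A tiny DataFrame standing in for a real data source (made-up rows and columns).
df = spark.createDataFrame(
    [("solar", 120), ("wind", 95), ("solar", 80)],
    ["source", "output_mw"],
)

# The same aggregation expressed through the DataFrame API...
df.groupBy("source").sum("output_mw").show()

# ...and through Spark SQL against a temporary view.
df.createOrReplaceTempView("generation")
spark.sql(
    "SELECT source, SUM(output_mw) AS total_mw FROM generation GROUP BY source"
).show()

spark.stop()
```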

  • Get a gentle overview of big data and Spark
  • Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
  • Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
  • Understand how Spark runs on a cluster
  • Debug, monitor, and tune Spark clusters and applications
  • Learn the power of Spark’s Structured Streaming and MLlib for machine learning tasks (see the streaming sketch after this list)
  • Explore the wider Spark ecosystem, including SparkR and Graph Analysis
  • Examine Spark deployment, including coverage of Spark in the Cloud
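Tying back to the Structured Streaming item above, the sketch below is a minimal streaming word count. It assumes the toy socket source and console sink commonly used for local experimentation rather than anything specific to this book; the host, port, and output mode are placeholder choices.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("structured-streaming-sketch").getOrCreate()

# Read a stream of text lines from a local socket (toy source; host/port are placeholders).
lines = (
    spark.readStream.format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Emit the complete, continuously updated counts to the console sink.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```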
Table of Contents
Part 1. Gentle overview of big data and Spark
  1. What is Apache Spark?
  2. A gentle introduction to Spark
  3. A tour of Spark's toolset
Part 2. Structured APIs : DataFrames, SQL, and datasets
  4. Structured API overview
  5. Basic structured operations
  6. Working with different types of data
  7. Aggregations
  8. Joins
  9. Data sources
  10. Spark SQL
  11. Datasets
Part 3. Low-level APIs
  12. Resilient distributed datasets (RDDs)
  13. Advanced RDDs
  14. Distributed shared variables
Part 4. Production applications
  15. How Spark runs on a cluster
  16. Developing Spark applications
  17. Deploying Spark
  18. Monitoring and debugging
  19. Performance tuning
Part 5. Streaming
  20. Stream processing fundamentals
  21. Structured streaming basics
  22. Event-time and stateful processing
  23. Structured streaming in production
Part 6. Advanced analytics and machine learning
  24. Advanced analytics and machine learning overview
  25. Preprocessing and feature engineering
  26. Classification
  27. Regression
  28. Recommendation
  29. Unsupervised learning
  30. Graph analytics
  31. Deep learning
Part 7. Ecosystem
  32. Language specifics : Python (PySpark) and R (SparkR and sparklyr)
  33. Ecosystem and community
Index