Big data analytics : a handy reference guide for data analysts and data scientists to help obtain value from big data analytics using Spark on Hadoop clusters / Venkat Ankam
Material type:
- text
- computer
- online resource
ISBN:
- 9781785884696
LOC classification:
- QA76.9.B45 A542B 2016
Includes index.
About This Book
- This book is based on version 2.0 of Apache Spark and version 2.7 of Hadoop, integrated with the most commonly used tools.
- Learn all the Spark stack components, including recent topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.
- Covers integrations with frameworks such as HDFS and YARN, and with tools such as Jupyter, Zeppelin, NiFi, Mahout, the HBase Spark Connector, GraphFrames, H2O, and Hivemall.

Who This Book Is For
Though this book is primarily aimed at data analysts and data scientists, it will also help architects, programmers, and practitioners. Knowledge of either Spark or Hadoop would be beneficial. It is assumed that you have a basic programming background in Scala, Python, SQL, or R, along with basic Linux experience. Working experience within big data environments is not mandatory.

What You Will Learn
- Discover and implement a wide variety of tools and techniques for big data analytics using Spark on Hadoop clusters
- Understand all the Hadoop and Spark ecosystem components
- Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, Conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX
- Learn to implement batch and real-time data analytics using Spark Core, Spark SQL, and Conventional and Structured Streaming
- Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Mahout Samsara, Hivemall, GraphX, and SparkR
- Get an introduction to the new tools (notebooks: Jupyter and Zeppelin; data flow: Apache NiFi; Spark as a Service: Livy Server) and their integrations with Spark and Hadoop

In Detail
This book aims to provide the fundamentals of Apache Spark and Hadoop, together with the most commonly used tools and techniques, in an accessible way. All the Spark components (Spark Core, Spark SQL, DataFrames, Datasets, Conventional and Structured Streaming, MLlib, and GraphX), as well as HDFS and YARN, are explored in depth with implementation examples on Spark and Hadoop clusters.

Big data analytics is moving away from MapReduce to Spark, so the advantages Spark has over MapReduce are explained in depth to help readers understand its in-memory speed.

Usage patterns of the Dataset, DataFrame, and Data Sources APIs are explained for building Spark SQL based big data analytical applications. Spark Streaming with Apache Kafka and HBase is covered to help build streaming applications. The new Structured Streaming concept is explained with an IoT use case. Machine learning techniques are covered using MLlib, ML Pipelines, Mahout Samsara, H2O, and SparkR, and graph analytics is covered with the GraphX and GraphFrames components of Spark.

This book also introduces the web-based notebooks Jupyter and Apache Zeppelin, the data flow tool Apache NiFi, and offering 'Spark as a Service' using Livy Server.