Big data analytics : a handy reference guide for data analysts and data scientists to help obtain value from big data analytics using Spark on Hadoop clusters / Venkat Ankam

By:

Ankam, Venkat [author]

Material type: Text

Publisher: Birmingham : Packt Publishing, 2016

Description: 1 online resource

Content type:

text

Media type:

computer

Carrier type:

online resource

ISBN:

9781785884696

Subject(s):

Big data

LOC classification:

QA76.9.B45 A542B 2016

Online resources:

Electronic Resources

Summary:

About This Book
This book is based on Apache Spark 2.0 and Hadoop 2.7, integrated with the most commonly used tools.
Learn all Spark stack components, including the latest topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.
Integrations with frameworks such as HDFS and YARN, and tools such as Jupyter, Zeppelin, NiFi, Mahout, the HBase Spark Connector, GraphFrames, H2O, and Hivemall.

Who This Book Is For
Though this book is primarily aimed at data analysts and data scientists, it will also help architects, programmers, and practitioners. Knowledge of either Spark or Hadoop would be beneficial. It is assumed that you have a basic programming background in Scala, Python, SQL, or R, along with basic Linux experience. Working experience with big data environments is not mandatory.

What You Will Learn
Discover and implement a wide variety of tools and techniques for big data analytics using Spark on Hadoop clusters
Understand all the Hadoop and Spark ecosystem components
Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX
Learn to implement batch and real-time data analytics using Spark Core, Spark SQL, and conventional and Structured Streaming
Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Mahout Samsara, Hivemall, GraphX, and SparkR
Get an introduction to the new tools (notebooks: Jupyter and Zeppelin; data flow: Apache NiFi; Spark as a Service: Livy Server) and their integrations with Spark and Hadoop

In Detail
This book aims to present the fundamentals of Apache Spark and Hadoop, along with the most commonly used tools and techniques, in an accessible way. All the Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional and Structured Streaming, MLlib, and GraphX), as well as HDFS and YARN, are explored in depth with implementation examples on Spark and Hadoop clusters.

Big data analytics is moving away from MapReduce to Spark, so the advantages Spark has over MapReduce are explained in depth to help readers understand its in-memory speed. Usage patterns of the Dataset, DataFrame, and Data Sources APIs are explained for building Spark SQL-based big data analytical applications. Spark Streaming with Apache Kafka and HBase is covered to help you build streaming applications, and the new Structured Streaming concept is explained with an IoT use case. Machine learning techniques are covered using MLlib, ML Pipelines, Mahout Samsara, H2O, and SparkR, and graph analytics is covered with the GraphX and GraphFrames components of Spark.

This book also introduces the web-based notebooks Jupyter and Apache Zeppelin, the data flow tool Apache NiFi, and offering "Spark as a Service" using Livy Server.

Includes index.