Author: Tomasz Lelek
Date: November 30, 2018
Discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs
2 hours 26 minutes
Table of Contents
• Transformations and Actions
• Immutable Design
• Avoid Shuffle and Reduce Operational Expenses
• Saving Data in the Correct Format
• Working with Spark Key/Value API
• Testing Apache Spark Jobs
• Leveraging Spark GraphX API
• Compose Spark jobs from actions and transformations
• Create highly concurrent Spark programs by leveraging immutability
• Ways to avoid the most expensive operation in the Spark API: the shuffle
• How to save data for further processing by picking the proper data format for Spark to write
• Parallelize keyed data; learn how to use Spark's Key/Value API
• Re-design your jobs to use reduceByKey instead of groupBy
• Create robust processing pipelines by testing Apache Spark jobs
• Solve recurring problems by leveraging the GraphX API
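The advice above to prefer reduceByKey over groupBy follows from how Spark moves data: reduceByKey combines values for each key inside every partition before the shuffle, while a groupBy-style aggregation sends every record across the network first. The following is a minimal sketch of that difference in plain Python, with no Spark involved. It is not taken from the course material, and the function names (partition, group_then_sum, reduce_by_key) are hypothetical helpers invented for this illustration.

```python
from collections import defaultdict


def partition(records, num_partitions):
    """Split records round-robin into lists that stand in for Spark partitions."""
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts


def group_then_sum(partitions):
    """groupBy-style: every (key, value) record is shuffled, then summed per key."""
    shuffled = [rec for part in partitions for rec in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def reduce_by_key(partitions):
    """reduceByKey-style: sum per key inside each partition, then shuffle the partial sums."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        # Only one record per key per partition crosses the simulated shuffle.
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


records = [("a", 1), ("b", 1), ("a", 1), ("b", 1),
           ("a", 1), ("a", 1), ("b", 1), ("a", 1)]
parts = partition(records, 2)

grouped_totals, grouped_shuffled = group_then_sum(parts)
reduced_totals, reduced_shuffled = reduce_by_key(parts)

print(grouped_totals, grouped_shuffled)
print(reduced_totals, reduced_shuffled)
```

Both approaches produce the same totals, but the reduceByKey-style version moves fewer records through the simulated shuffle. The saving grows with the number of repeated keys per partition.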
Apache Spark has been around for quite some time, but do you really know how to get the most out of it? This course aims to open up new possibilities: you will explore many aspects of Spark, including some you may never have heard of.
In this course you'll learn to implement practical, proven techniques for improving particular aspects of programming and administration in Apache Spark. The 7 sections address different aspects of Spark through 5 specific techniques, with clear instructions and hands-on practice for carrying out different Apache Spark tasks. The techniques are demonstrated using practical examples and best practices.
By the end of this course, you will have learned tips, best practices, and techniques for working with Apache Spark. You will be able to perform tasks and get the data you need out of your databases faster and more easily.
All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Apache-Spark-Tips-Tricks-Techniques
Style and Approach
This fast-paced, step-by-step guide takes a practical approach to techniques you can use to optimize your testing time, speed, and results. It will take your skills to the next level and get you up and running with Spark.
• Speed up your Spark jobs by reducing shuffles
• Leverage the Key/Value API in your big data processing to make your jobs run faster with less network traffic
• Test Spark jobs using unit, integration, and end-to-end techniques to make your data pipeline robust and bulletproof
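One common way to make a job unit-testable is to keep the transformation logic in a pure function that does not mutate its input, so it can be checked without starting a cluster. The following is a minimal sketch of that idea in plain Python. It is not the course's own code, and count_words and test_count_words are hypothetical names chosen for this illustration.

```python
def count_words(lines):
    """Pure transformation: builds and returns a new dict, leaving the input untouched."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def test_count_words():
    """Unit test: checks the result and that the input was not mutated."""
    lines = ["Spark spark", "jobs"]
    result = count_words(lines)
    assert result == {"spark": 2, "jobs": 1}
    assert lines == ["Spark spark", "jobs"]


test_count_words()
```

Because the function depends only on its arguments, the same logic could in principle be passed to a distributed transformation and still be verified locally with a small in-memory input.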
Tomasz Lelek is a software engineer who programs mostly in Java and Scala. He has worked with the core Java language for the past six years. He has developed multiple production Java software projects that work in a reactive way. He is passionate about nearly everything associated with software development and believes that we should always try to consider different solutions and approaches before solving a problem. Recently, he was a speaker at conferences in Poland, at JDD (Java Developers Day), and at Krakow Scala User Group. He has also conducted a live coding session at Geecon Conference. He is a co-founder of initLearn, an e-learning platform that was built with the Java language. He has also written articles about everything related to the Java world.