Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Published on
Holden Karau walks attendees through a number of common mistakes that can keep your Spark programs from scaling and examines solutions and general techniques useful for moving beyond a proof of concept to production.
Topics include:
Working with key/value data
Replacing groupByKey for awesomeness
Key skew: your data probably has it and how to survive
Effective caching and checkpointing
Considerations for noisy clusters
Functional transformations with Spark Datasets: getting the benefits of Catalyst with the ease of functional development
How to make our code testable
Login to see the comments