This document discusses running Apache Spark and Apache Zeppelin in production. It begins by introducing the author and their background. It then covers security best practices for Spark deployments, including authentication using Kerberos, authorization using Ranger/Sentry, encryption, and audit logging. Different Spark deployment modes like Spark on YARN are explained. The document also discusses optimizing Spark performance by tuning executor size and multi-tenancy. Finally, it covers security features for Apache Zeppelin like authentication, authorization, and credential management.
John Doe first authenticates to Kerberos before launching Spark Shell
kinit -kt /etc/security/keytabs/johndoe.keytab johndoe@EXAMPLE.COM
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
The first step of security is network security
The second step of security is Authentication
Most Hadoop echo system projects rely on Kerberos for Authentication
Kerberos – 3 Headed Guard Dog : https://en.wikipedia.org/wiki/Cerberus
John Doe first authenticates to Kerberos before launching Spark Shell
kinit -kt /etc/security/keytabs/johndoe.keytab johndoe@EXAMPLE.COM
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Controlling HDFS Authorization is easy/Done
Controlling Hive row/column level authorization in Spark is WIP
For HDFS as Data Source can use RPC or use SSL with WebHDFS
For NM Shuffle Data – Use YARN SSL
Spark support SSL for FS (Broadcast or File download)
Shuffle Block Transfer supports SASL based encryption – SSL coming
Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)