Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed big data applications like Apache Hadoop and Apache Spark. This session at Strata + Hadoop World in New York City (September 2016) explores various solutions and tips to address the challenges encountered while deploying multi-node Hadoop and Spark production workloads using Docker containers.
Some of these challenges include container life-cycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is "all in” on Docker containers—with a specific focus on big data applications. BlueData has learned firsthand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy big data workloads using Docker.
This session by Thomas Phelan, co-founder and chief architect at BlueData, discusses how to securely network Docker containers across multiple hosts and discusses ways to achieve high availability across distributed big data applications and hosts in your data center. Since we’re talking about very large volumes of data, performance is a key factor, so Thomas shares some of the storage options implemented at BlueData to achieve near bare-metal I/O performance for Hadoop and Spark using Docker as well as lessons learned and some tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52042
Lessons Learned Running Hadoop and Spark in Docker Containers
1. Lessons Learned
Running Hadoop and
Spark in Docker
Thomas Phelan
Chief Architect, BlueData
@tapbluedata
September 29, 2016
2. Outline
• Docker Containers and Big Data
• Hadoop and Spark on Docker: Challenges
• How We Did It: Lessons Learned
• Key Takeaways
• Q & A
3. A Better Way to Deploy Big Data
Traditional Approach
IT
ManufacturingSalesR&DServices
< 30%
utilization
Weeks to
build
each
cluster
Duplication
of dataManagement
complexity
Painful,
complex
upgrades
Hadoop and Spark on Docker
ManufacturingSalesR&DServices
BI/Analytics
Tools
> 90%
utilization
A New Approach
No duplication
of data
Simplified
management
Multi-tenant
Self-service,
on-demand
clusters
Simple,
instant
upgrades
4. Deploying Multiple Big Data Clusters
Data scientists want flexibility:
• Different versions of Hadoop, Spark, et.al.
• Different sets of tools
IT wants control:
• Multi-tenancy
- Data security
- Network isolation
5. Containers = the Future of Big Data
Infrastructure
• Agility and elasticity
• Standardized environments
(dev, test, prod)
• Portability
(on-premises and cloud)
• Higher resource utilization
Applications
• Fool-proof packaging
(configs, libraries, driver
versions, etc.)
• Repeatable builds and
orchestration
• Faster app dev cycles
6. The Journey to Big Data on Docker
Start with a clear goal
in sight
Begin with your Docker
toolbox of a single
container and basic
networking and storage
So you want to run Hadoop and Spark on Docker
in a multi-tenant enterprise deployment?
Beware … there is trouble ahead
7. Traverse the tightrope of
network configurations
Navigate the river of
container managers
• Swarm ?
• Kubernetes ?
• AWS ECS ?
• Mesos ?
• Overlay files ?
• Flocker ?
• Convoy ?
• Docker Networking ?
Calico
• Kubernetes Networking ?
Flannel, Weave Net
Big Data on Docker: Pitfalls
Cross the desert of storage
configurations
8. Big Data on Docker: Challenges
Pass thru the jungle of
software compatibility
Tame the lion of
performance
Finally you get to the top!
Trip down the staircase of
deployment mistakes
9. But for deployment in the enterprise, you are
not even close to being done …
Big Data on Docker: Next Steps?
You still have to climb past:
high availability,
backup/recovery, security,
multi-host, multi-container,
upgrades and patches
10. You realize it’s time
to get some help!
Big Data on Docker: Quagmire
11. How We Did It: Design Decisions
• Run Hadoop/Spark distros and applications
unmodified
- Deploy all services that typically run on a single BM
host in a single container
• Multi-tenancy support is key
- Network and storage security
12. How We Did It: Design Decisions
• Images built to “auto-configure” themselves at
time of instantiation
- Not all instances of a single image run the same set of
services when instantiated
• Master vs. worker cluster nodes
13. How We Did It: Design Decisions
• Maintain the promise of containers
- Keep them as stateless as possible
- Container storage is always ephemeral
- Persistent storage is external to the container
14. How We Did It: Implementation
Resource Utilization
• CPU cores vs. CPU shares
• Over-provisioning of CPU recommended
• No over-provisioning of memory
Network
• Connect containers across hosts
• Persistence of IP address across container restart
• Deploy VLANs and VxLAN tunnels for tenant-level traffic isolation
Noisy neighbors
15. How We Did It: Network Architecture
OVS
Container
Orchestrator
DHCP/DNS
VxLAN tunnel
NIC
Tenant Networks
OVS
NIC
Resource
Manager
Node
Manager
Node Manager SparkMaster
SparkWorker
Zeppelin
16. How We Did It: Implementation
Storage
• Tweak the default size of a container’s /root
- Resizing of storage inside an existing container is tricky
• Mount logical volume on /data
- No use of overlay file systems
• DataTap (version-independent, HDFS-compliant)
connectivity to external storage
Image Management
• Utilize Docker’s image repository
TIP: Mounting block
devices into a container
does not support
symbolic links (IOW:
/dev/sdb will not work,
/dm/… PCI device can
change across host
reboot).
TIP: Docker images can
get large. Use “docker
squash” to save on size.
17. How We Did It: Security Considerations
• Security is essential since containers and host share one kernel
- Non-privileged containers
• Achieved through layered set of capabilities
• Different capabilities provide different levels of isolation and protection
• Add “capabilities” to a container based on what operations are permitted
18. How We Did It: Sample Dockerfile
# Spark-1.5.2 docker image for RHEL/CentOS 6.x
FROM centos:centos6
# Download and extract spark
RUN mkdir /usr/lib/spark; curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.4.tgz | tar -xz -C /usr/lib/spark/
# Download and extract scala
RUN mkdir /usr/lib/scala; curl -s http://www.scala-lang.org/files/archive/scala-2.10.3.tgz | tar xz -C /usr/lib/scala/
# Install zeppelin
RUN mkdir /usr/lib/zeppelin; curl -s http://10.10.10.10:8080/build/thirdparty/zeppelin/zeppelin-0.6.0-incubating-SNAPSHOT-v2.tar.gz|tar xz -C
/usr/lib/zeppelin
RUN yum clean all && rm -rf /tmp/* /var/tmp/* /var/cache/yum/*
ADD configure_spark_services.sh /root/configure_spark_services.sh
RUN chmod -x /root/configure_spark_services.sh && /root/configure_spark_services.sh
19. BlueData Application Image (.bin file)
Application bin file
Docker
image
CentOS
Dockerfile
RHEL
Dockerfile
appconfig
conf Init.d startscript
<app>
logo file
<app>.wb
bdwb
command
clusterconfig, image, role,
appconfig, catalog, service, ..
Sources
Docker file,
logo .PNG,
Init.d
RuntimeSoftware Bits
OR
Development
(e.g. extract .bin and modify to
create new bin)
28. 5 fully managed Docker
containers with persistent
IP addresses
“Dockerized” Spark Standalone
Spark with Zeppelin
Notebook
29. Big Data on Docker: Key Takeaways
• All apps can be “Dockerized”,
including Hadoop & Spark
- Traditional bare-metal approach to Big
Data is rigid and inflexible
- Containers (e.g. Docker) provide a more
flexible & agile model
- Faster app dev cycles for Big Data app
developers, data scientists, & engineers
30. Big Data on Docker: Key Takeaways
• There are unique Big Data pitfalls & challenges with Docker
- For enterprise deployments, you will need to overcome these and more:
Docker base images include Big Data libraries and jar files
Container orchestration, including networking and storage
Resource-aware runtime environment, including CPU and RAM
31. Big Data on Docker: Key Takeaways
• There are unique Big Data pitfalls & challenges with Docker
- More:
Access to Container secured with ssh keypair or PAM module
(LDAP/AD)
Fast access to external storage
Management agents in Docker images
Runtime injection of resource and configuration
information
32. Big Data on Docker: Key Takeaways
• “Do It Yourself” will be costly and time-consuming
- Be prepared to tackle the infrastructure & plumbing challenges
- In the enterprise, the business value is in the applications / data science
• There are other options …
- Public Cloud – AWS
- BlueData - turnkey solution
Overview of Results
Testing revealed that, across the HiBench micro-workloads investigated, the BlueData EPIC software platform enables performance that is comparable or superior to that on bare-metal.
Results are summarized in the above chart. The elapsed or execution times for physical Hadoop were used as the baseline (i.e. 100%) and the corresponding elapsed or execution time for the same test on BlueData EPIC compared to the bare metal execution time.
The elapsed time results show that while BlueData EPIC is twice as fast for write dominant I/O workloads as shown with DFSIO Write as well as Teragen.
The elapsed time results show the BlueData EPIC is comparable to bare metal for read dominant I/O workloads. However, even with lower throughput for DFSIO read, the performance of real world workloads that consist of series of read and write operations over a period of time, would make BlueData performance equivalent or better than bare metal.
This balanced read/write performance is demonstrated by the TeraSort read/write results (shown above). Even with just a single virtual node per host, the performance of virtualized EPIC platform is within 2% of bare metal performance.
The elapsed time results show that BlueData EPIC is 10-15% faster for compute intensive workloads as shown with Wordcount.
The superior performance of the BlueData EPIC software platform for write-dominant workloads is due to the application aware caching enabled by IOBoost technology. The non-persistent memory cache improves the efficiency of access to physical-storage devices by enabling a write-behind cache, optimizing the performance of sequential writes. In general, any write operation, including those outside the scope of these benchmarks will benefit from write-behind cache since Hadoop uses a separate thread for write operations. BlueData IOBoost acknowledges this write request immediately thereby enabling the application to continue data processing without any latency.
BlueData performance on read-dominant workloads is comparable to bare metal in part due to the read-ahead cache implemented in IOBoost. Unlike write operations where a separate thread is used, Hadoop applications have rigid semantics for read operations where in they wait for the read to complete before processing is continued. As a result, the read ahead cache has a lesser contribution to the I/O throughput for typical Hadoop applications.
In summary, the use of a single virtual node per physical host shows that there is minimal to no overhead using virtualized EPIC platform compared to bare metal/physical Hadoop.
Jason to moderate and introduce questions – then field questions with Anant & Tom to answer
Jason: Thank you to everyone for attending this webinar – and a special thanks to our presenters ….
All attendees of this webcast can click on the “Attachments” tab and download a copy of the slides – you’ll also have access to the on-demand replay.
And if you’d like more information about BlueData software, you can visit our website at bluedata.com, contact us directly, or try the free version of our software at bluedata.com/free
Again, thanks and have a great day.