An overview of our experiments at Industrial Light and Magic to build a fully cloud-based pipeline on Mesos and Docker, automated with Ansible.
4. Nomenclature, glossary and other big words
★ VFX: Visual Effects
★ Pipeline: Data -> Process -> Data, repeat!
★ Show: Film
★ Sequence: A thematically linked series of (continuous) scenes
★ Shot: An uninterrupted portion of the sequence
9. What VFX isn’t
★ Rendering and Sims are our ‘Big Data’
★ We’re not crunching analytics in real-time
★ Rendering != MapReduce
★ Apps run on hardware, not in a browser
★ We’re not here to re-write a renderer (not yet...)
Where does the cloud meet VFX?
10. What’s in it for us?
★ Reducing Capital Expenditure
★ Potentially reducing overheads
★ Flexibility
★ Giving power back to developers
13. First, what is rendering!?
★ Take a virtual 3D representation of a scene
○ 3D Models
○ Textures
○ Light sources
○ Static backgrounds (plates)
★ Place a virtual camera in the scene
★ Compute the 2D image that the camera will see
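The last two bullets can be illustrated with a minimal sketch of a pinhole camera projection. This is only the geometric core (where a 3D point lands on the 2D image); a production renderer also handles visibility, shading, textures and lighting. The `Point3D` type, the `project` function and the `focal_length` parameter are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Point3D:
    x: float
    y: float
    z: float  # distance in front of the camera along its viewing axis


def project(point, focal_length=1.0):
    """Project a camera-space point onto the image plane of a pinhole camera."""
    if point.z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide: points further away land closer to the image centre.
    scale = focal_length / point.z
    return (point.x * scale, point.y * scale)


# Four corners of a quad that tilts away from the camera.
corners = [Point3D(-1, -1, 2), Point3D(1, -1, 2), Point3D(1, 1, 4), Point3D(-1, 1, 4)]
image = [project(p) for p in corners]
print(image)
```

The far edge of the quad (z = 4) projects to half the width of the near edge (z = 2), which is the foreshortening the camera "sees".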
14. Rendering in the cloud
★ Low hanging fruit
★ Already happening
★ Typical farm: 30-50k procs
★ Managed by specialist software (Tractor, Deadline, in-house, etc.)
★ VFX has been doing clustered computing for decades
What’s next?
15. Mesos
★ Open Source framework for scheduling
★ Already used at massive scale
★ NOT a job scheduler
★ We can concentrate on the scheduling logic
★ Support for task isolation/containment (e.g. Docker)
16. Automating our Mesos cluster with Docker and Ansible
★ Goals: Quick - Easy - Repeatable
★ Didn’t want to spend time fighting our config manager (or each other)
★ Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying, configuration)
★ Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
★ If something is typed in the terminal, we want to automate and version it
Docker + Ansible was the answer
17. Automating our Mesos cluster with Ansible
★ Heavily using tags and variables in Ansible
★ Cloud agnostic: Some modification of GCE inventory and launch modules
★ Example: Creating a multi-host dynamic Zookeeper configuration

- name: Append the zookeeper server entries
  lineinfile:
    dest: /etc/zookeeper/conf/zoo.cfg
    insertafter: EOF
    line: "server.{{ hostvars[item]['zkid'] }}={{ hostvars[item]['ansible_eth0']['ipv4']['address'] }}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
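To make the effect of the task above concrete, here is a small sketch that renders the same `server.N=address:2888:3888` lines from an inventory-like dictionary. The host names, IDs and IP addresses are made-up sample data, and `zookeeper_server_lines` is a hypothetical helper, not part of Ansible; only the key layout (`zkid`, `ansible_eth0.ipv4.address`) mirrors the task.

```python
def zookeeper_server_lines(hosts, hostvars):
    """Build one zoo.cfg server entry per host, as the lineinfile task does."""
    lines = []
    for host in hosts:
        zkid = hostvars[host]["zkid"]
        address = hostvars[host]["ansible_eth0"]["ipv4"]["address"]
        # 2888 is the quorum (follower) port, 3888 the leader election port.
        lines.append(f"server.{zkid}={address}:2888:3888")
    return lines


# Hypothetical inventory data standing in for Ansible's hostvars and groups.
hostvars = {
    "zk-a": {"zkid": 1, "ansible_eth0": {"ipv4": {"address": "10.0.0.11"}}},
    "zk-b": {"zkid": 2, "ansible_eth0": {"ipv4": {"address": "10.0.0.12"}}},
    "zk-c": {"zkid": 3, "ansible_eth0": {"ipv4": {"address": "10.0.0.13"}}},
}
group = ["zk-a", "zk-b", "zk-c"]

for line in zookeeper_server_lines(group, hostvars):
    print(line)
```

Because the entries are derived from the inventory group, adding a Zookeeper host to the group is enough to extend the ensemble configuration on the next run.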
18. Service Discovery in Mesos
★ No control over where a service or render runs
★ Services may move hosts
★ Can’t guarantee hosts will have same IP
★ Options:
○ Mesos-DNS
○ Homegrown (etcd etc)
○ Consul
19. Mesos and Consul
★ What is Consul?
★ Every host runs an agent
★ All DNS lookups on a host go to its agent
★ Consul servers outside the Mesos cluster
★ Mesos-Consul automates service registry
★ Can be used for services outside the cluster
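As a rough illustration of how a client finds a registered service, the sketch below builds the DNS name that a Consul agent answers for. It assumes Consul's standard `[tag.]<service>.service.<domain>` lookup format with the default `consul` domain; `consul_service_name` is a hypothetical helper, and a deployment with a custom domain would pass its own value.

```python
def consul_service_name(service, tag=None, domain="consul"):
    """Return the DNS name under which Consul exposes a registered service."""
    labels = [service, "service", domain]
    if tag:
        # An optional tag narrows the lookup to instances carrying that tag.
        labels.insert(0, tag)
    return ".".join(labels)


print(consul_service_name("docker-registry"))
print(consul_service_name("docker-registry", tag="v2"))
```

Because every host resolves these names through its local agent, a service that moves to another host keeps the same name and clients need no reconfiguration.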
20. Example - Static service outside the cluster
$ ssh -i mykey.pem username@172.100.121.100
$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION -e REGISTRY_STORAGE_S3_BUCKET \
    -e REGISTRY_STORAGE=s3 --name docker-registry registry:2.1
$ curl -H "Content-Type: application/json" -X PUT \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
21. Example - Static service outside the cluster
- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3

- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body: '{
      "Name": "docker-registry",
      "Tags": [
        "docker-registry",
        "v2"
      ],
      "Port": 5000
      }'
    body_format: json
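The same registration call can be sketched in plain Python, which may help when a service needs to register itself at start-up. The payload and URL are taken from the task above; `build_register_request` and `register` are hypothetical helper names, and `register` is defined but not called here because it needs a running Consul agent on the local host.

```python
import json
import urllib.request

CONSUL_REGISTER_URL = "http://127.0.0.1:8500/v1/agent/service/register"


def build_register_request(name, port, tags, url=CONSUL_REGISTER_URL):
    """Prepare the PUT request that registers a service with the local agent."""
    payload = {"Name": name, "Tags": list(tags), "Port": port}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def register(request, timeout=5.0):
    """Send the request; only works with a reachable Consul agent."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_register_request("docker-registry", 5000, ["docker-registry", "v2"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect and test without touching the network.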
30. Cloud Storage Pros and Cons
★ Managed
★ No more tape archives/backups
But...
★ Getting data into the cloud is expensive
★ Getting data into the cloud is slooow
Is there another way?
31. Work in Progress...
★ Applications need a POSIX filesystem interface
★ Can we cache cloud storage?
○ EFS
○ Avere
○ Homegrown
Can we create content entirely in the cloud?
33. Can we create content entirely in the Cloud?
★ Applications require OpenGL
★ OpenGL requires hardware
★ Hardware needs drivers
Can we do this in Docker?
34. Dockerising OpenGL Applications
★ NVIDIA drivers inside the container must match the host version exactly
★ Driver inside the container must not install the kernel module
★ Container requires access to the GPU device and the X Server
35. Running an OpenGL Docker application
docker run \
    -it \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --device=/dev/dri/card0 \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    -e DISPLAY
36. Scheduling a VFX app on Mesos in the cloud
★ Must use custom Mesos resources/attributes to only schedule on GPU machines
★ Cloud machines have no monitor
★ Remote desktop apps will forward GL calls to the client machine
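The first bullet can be sketched as a filter over resource offers. This is a simplified illustration, not the real Mesos scheduler API: offers are modelled as plain dictionaries, and the `gpu` attribute name and its `true` value are assumptions about how the GPU agents would be labelled (for example through the agent's `--attributes` flag).

```python
def gpu_offers(offers, attribute="gpu", value="true"):
    """Keep only offers from agents that advertise the given attribute."""
    return [
        offer
        for offer in offers
        if offer.get("attributes", {}).get(attribute) == value
    ]


# Hypothetical offers, as a scheduler might see them after simplification.
offers = [
    {"hostname": "render-01", "attributes": {"gpu": "false"}, "cpus": 32},
    {"hostname": "gpu-01", "attributes": {"gpu": "true"}, "cpus": 8},
    {"hostname": "render-02", "attributes": {}, "cpus": 32},
]

for offer in gpu_offers(offers):
    print(offer["hostname"])
```

Offers that fail the filter would simply be declined, so interactive GL applications never land on CPU-only render nodes.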
37. Using VirtualGL
★ Intercepts GLX calls on the host
★ Calls are forwarded to a second (local) 3D X Server
★ Rendering is done on the GPU and the output is forwarded to the 2D (VNC) X Server
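As a minimal sketch of how an application might be launched under VirtualGL, the snippet below assembles a `vglrun` command line. It assumes the GPU-attached X Server is display `:0` and uses `glxgears` as a stand-in application; `vglrun_command` is a hypothetical helper, and the right display value depends on how the host is set up.

```python
import shlex


def vglrun_command(app_argv, gpu_display=":0"):
    """Wrap an application command so VirtualGL renders it on the GPU display."""
    # -d selects the X display that has access to the GPU (the 3D X Server).
    return ["vglrun", "-d", gpu_display, *app_argv]


command = vglrun_command(["glxgears"])
print(shlex.join(command))
```

The wrapped command would be run inside the VNC session, so the 2D X Server only receives finished frames.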