Your Trusted Third Party in the Digital Age™
Scalding on Tez
Twitter HQ, July 14th, 2015
Copyright © 2015 Transparency Rights Management. All rights reserved
Agenda
• Who’s this guy?
• How did we come to use Scalding?
• Scalding on Tez: the Mini-HOWTO
• In practice
• Tips and Tricks
• All aboard: how?
• Performance
WHO’S THIS GUY?
• I’m 39
• My oldest computer is 33
• 8-bit Basic(s), Z80 assembly, Turbo Pascal, C++, Python, Java, ISO CNC, C#, Scala
• Still afraid of Shapeless
Images: Amos Evans / « Rama » / Marcin Wichary // Wikipedia
HOW DID WE COME TO SCALDING?
Why Scalding?
Transparency Rights Management:
• A Trusted Third Party
– Data escrow, controlled execution
– Independent re-computation
– Privacy & Personal Data compliance assessment
• Big Data Services for Entertainment
– Metadata enrichment
– IP use certification
– Dataset analysis as a service
Why Scalding?
« Big Data Services for Entertainment » - a Use Case
[Diagram labels: Digital Service Provider; Report; Copyright Owners / Collective Management Organizations; Data Improvement; Automatic Data Feed (« in your format »); Independent Report; Conformance Report]
Why Scalding?
Dataset analysis (from YouTube monthly reports)
• September 2013: SQL Server overheats
• October 2013: using Lingual (12 SQL steps + bash scripts)
• September 2014: Cascading + Java
• September 28th: tried out Scalding
• November 2014: delivered first results on Scalding
• April 2015: First success on Scalding+Tez
Our system…
[Diagram labels: Jenkins, git, Mesos, Chronos, Marathon, YARN 2.6.0, HDFS 2.6.0, Debian (×5), Ansible, APP, scalding, cascading, YARN RM, APP (WS), Akka, Spray, Artifactory, 4-way Non-Reg, Jenkins Slave]
7 machines, and still a lot of things to discover
SCALDING ON TEZ, THE MINI-HOWTO
Scalding on Tez, the mini-HOWTO
• Step 0: Prerequisites:
– A YARN cluster (2.6.0)
– Cascading 3.0
– TEZ runtime lib in HDFS (0.6.2-SNAPSHOT)
– A version of scalding with fabric selection (0.13.1 + PR1220)
Scalding on Tez, the mini-HOWTO
• Step 1: build.sbt
https://github.com/cchepelov/wcplus/blob/master/build.sbt
Scalding on Tez, the mini-HOWTO
• Step 1: build.sbt (redux)
1. Regain control on what libraries are included
2. Exclude some « long transitive » dependencies that pull in junk
3. Put in the desired fabric, in a configurable way
sbt -DCASCADING_FABRIC=hadoop clean assembly
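A minimal sketch of point 3 in plain Scala: it maps the value of the CASCADING_FABRIC system property to the name of the Cascading artifact that provides that fabric. The object name, the fabric keys other than `hadoop`, and the default value are assumptions made for illustration; the real build.sbt may be organised quite differently.

```scala
// Illustrative only: FabricSelection and its keys are hypothetical names,
// not taken from the wcplus build.
object FabricSelection {
  // Cascading artifact providing each fabric.
  val artifacts: Map[String, String] = Map(
    "local"       -> "cascading-local",
    "hadoop"      -> "cascading-hadoop",
    "hadoop2-mr1" -> "cascading-hadoop2-mr1",
    "hadoop2-tez" -> "cascading-hadoop2-tez"
  )

  // Reads -DCASCADING_FABRIC=..., falling back to "hadoop" (assumed default).
  def selected(props: collection.Map[String, String] = sys.props): String =
    props.getOrElse("CASCADING_FABRIC", "hadoop")

  // Fails fast on a typo rather than silently building the wrong fatjar.
  def artifactFor(fabric: String): String =
    artifacts.getOrElse(fabric, sys.error(s"Unknown Cascading fabric: $fabric"))

  def main(args: Array[String]): Unit =
    println(artifactFor(selected()))
}
```

In a build.sbt, the selected artifact name would then feed libraryDependencies, so that each fatjar carries exactly one fabric.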
Scalding on Tez, the mini-HOWTO
• Step 1bis: assembly.sbt
We’re using fatjars to simplify deployment.
Because of jar hell, we « need » a complicated assembly.sbt
https://github.com/cchepelov/wcplus/blob/master/assembly.sbt
Scalding on Tez, the mini-HOWTO
• Step 2: a few job flags
https://github.com/cchepelov/wcplus/blob/master/src/main/scala/com/transparencyrights/demo/wcplus/CommonJob.scala
• Step 2: a few job flags (continued)
• tez.task.resource.memory.mb
– As large as you can afford to give, per CPU per node
– The more memory, the less Tez needs to spill intermediates to disk
• tez.container.max.java.heap.fraction
– Defaults (1024 MiB * 0.8) assume the JVM’s native memory requirements don’t exceed about 205 MiB
– Scalding + the Scala runtime + Cascading on top of Tez seem to require more. YARN kills offenders swiftly!
– The 460 MiB figure we’re using, (1024+512)*(1-0.7), may be a bit wasteful
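The arithmetic behind these two flags can be sketched in a few lines of plain Scala. This assumes the heap fraction is applied to the container size and that whatever is left over is what the JVM may use outside the heap. `TezMemory` and its methods are hypothetical helpers for illustration, not part of Tez or Scalding.

```scala
// Illustrative arithmetic only; TezMemory is a hypothetical helper, not a Tez API.
object TezMemory {
  // JVM heap that a container of the given size gets for a heap fraction.
  def heapMiB(containerMiB: Int, heapFraction: Double): Double =
    containerMiB * heapFraction

  // What remains for everything outside the heap (JVM native memory).
  def nativeHeadroomMiB(containerMiB: Int, heapFraction: Double): Double =
    containerMiB - heapMiB(containerMiB, heapFraction)

  // The two properties as key/value pairs, using the figures from the slide.
  val tuned: Map[String, String] = Map(
    "tez.task.resource.memory.mb"          -> (1024 + 512).toString,
    "tez.container.max.java.heap.fraction" -> "0.7"
  )

  def main(args: Array[String]): Unit = {
    println(f"defaults: ${nativeHeadroomMiB(1024, 0.8)}%.1f MiB outside the heap")
    println(f"tuned:    ${nativeHeadroomMiB(1536, 0.7)}%.1f MiB outside the heap")
  }
}
```

Raising the container size while lowering the fraction increases both the heap and the native headroom, which is why the two flags are tuned together.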
THAT’S IT. (ALMOST)
IN PRACTICE…
« A VERSION OF SCALDING WITH FABRIC SELECTION » WAIT, WHAT?
In practice
« A version of scalding with fabric selection » Wait, What?
Scalding’s traditional --local and --hdfs flags:
– Use either LocalFlowConnector or HadoopFlowConnector
– Types are hard-coded
Cascading 2.5 introduced a new fabric concept. You can run either with cascading-hadoop or with cascading-hadoop2-mr1. But:
– Incompatible jars (can’t load both)
– Main types visible to Scalding are different
In practice
« A version of scalding with fabric selection » Wait, What?
PR1220:
• No longer hardcodes « either Local or Hadoop 1.X »
• Enables supplying any flow connector implementation, as long as the jar’s around.
• --hdfs to be deprecated as an alias to --hadoop1
• Still built against Cascading 2.6
« STILL BUILT ON CASCADING 2.6 » WHY?
In practice
« Still built on Cascading 2.6 »
Cascading 3.0 has carefully updated some argument types to prepare for the future.
In Java, this is source- and binary-compatible:
[Diagram labels: library, consumer; LibraryV2, same consumer]
Scala enforces generic type safety, and the Cascading 3.0 upgrades are not legal with scalac. But they still are with the JVM…
In practice
… Going to native Cascading 3.0?
Scalding will require some adjustment to become compatible with the java-level source upgrades.
Can this happen without breaking scalding application source code?
GUAVA
GUAVA GUAVA
In practice…
Guava.
• Guava is a nice library… of little use in Scala (?)
• In a Scalding/Cascading/Tez JVM, multiple versions of guava are required. Each layer depends on its own version. About every single version from 11.0 to 16.0.2
• There have been breaking changes (method renames & removals) in guava 13
• These happen on really mundane objects (Closeables, Stopwatch), but they’re major troublemakers
In practice…
Guava Hell: a temporary solution
• Asking Apache to quickly upgrade to guava 18, or Google to re-introduce deprecated interfaces… probably not immediate
• Solution: Frankenguava.
[Diagram labels: Guava 18.0 JAR; Stopwatch & Closeables; Stopwatch & Closeables including deprecated overloads]
In practice…
Frankenguava: howto
• Step 1: Post-prepare the Tez runtime
– Build tez from source
– Unpack runtime jar from tez-dist
– Remove guava
– Put frankenguava
– Repack
– Deploy on HDFS
• Step 2: Enforce the use of the appropriate guava
CASCADING’S TEZ*REGISTRY
In practice…
Cascading’s Tez*Registry
• Cascading 3.0 uses a set of mapping registries to convert cascading patterns into the back-end API. The Tez registries are new, and distinct from the MR registries
• The Tez registries are hardened against Concurrent’s extensive test library, which is built on years of MR experience. Tez has its own trouble spots. Beware of hash joins.
• It works fine now, but getting the scalding test library onboard will go a long way.
In practice…
Cascading’s Tez*Registry
• It works mostly fine now, but getting the scalding test library onboard will go a long way.
Last-minute update: .filterWithValue / .mapWithValue currently crash the Cascading planner (as of 3.0.1) (implementation uses a HashJoin)
AN EXAMPLE
A small test: « wc plus »
Input: 70 books, 1.1M lines, 10M words, 56M bytes
Compute frequencies, ignoring things that are more frequent than 80% of the max word frequency
Outputs:
• Word, relative frequency, deviation from median relative freq
• Two Words, relative frequency, deviation from median relative freq
• …
• Ten Words, relative frequency, deviation from median relative freq
• All Expressions (1-W to 10-W), relative frequency, deviation from median relative freq
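To make the computation concrete, here is a toy in-memory version using plain Scala collections rather than Scalding. It is not the wcplus job itself: `WcPlusSketch` and its members are hypothetical names. It also assumes that frequencies are relative to all n-word expressions of the same length, and that the median is taken over the expressions that survive the 80% cutoff.

```scala
// Plain Scala collections, not Scalding: a toy, in-memory version of the
// computation. WcPlusSketch and its members are hypothetical names.
object WcPlusSketch {
  final case class Stat(expr: String, relFreq: Double, devFromMedian: Double)

  // All runs of n consecutive words in a line, lower-cased.
  def ngrams(line: String, n: Int): Seq[String] = {
    val words = line.toLowerCase.split("\\W+").filter(_.nonEmpty).toList
    if (words.length < n) Nil
    else words.sliding(n).map(_.mkString(" ")).toList
  }

  // Median of an already sorted, non-empty sequence.
  def median(sorted: Seq[Double]): Double = {
    val mid = sorted.length / 2
    if (sorted.length % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2.0
  }

  // Relative frequency and deviation from the median, per n-word expression.
  def stats(lines: Seq[String], n: Int, cutoff: Double = 0.8): Seq[Stat] = {
    val counts: Map[String, Int] =
      lines.flatMap(ngrams(_, n)).groupBy(identity).map { case (k, v) => k -> v.size }
    if (counts.isEmpty) Nil
    else {
      val total = counts.values.sum.toDouble
      val maxCount = counts.values.max
      // Ignore anything more frequent than `cutoff` times the top count.
      val kept = counts.toList.filter { case (_, c) => c <= cutoff * maxCount }
      if (kept.isEmpty) Nil
      else {
        val med = median(kept.map(_._2).sorted.map(_ / total))
        kept
          .sortBy { case (e, c) => (-c, e) }
          .map { case (e, c) => Stat(e, c / total, c / total - med) }
      }
    }
  }

  def main(args: Array[String]): Unit =
    stats(Seq("a a a a a b b b c c d"), 1).foreach(println)
}
```

Calling `stats` once per expression length from 1 to 10 would give the per-length outputs listed above; a distributed version would replace the in-memory `groupBy` with the framework's own grouping.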
A small test: « wc plus »
[Diagram: the same flow, annotated with four « count » steps]
No .filterWithValue / .mapWithValue for now
Image: Roulex45 / Wikipedia
TIPS & TRICKS
Tips & Tricks
0000-step-node-sub-graph.dot
Run your job with
-Dcascading.planner.plan.path=/tmp/path/to/plan.lst
The planner will output a lot of useful files. One of them is
…/$(Job)/4-final-flow-steps/0000-step-node-sub-graph.dot
Run that file through graphviz:
dot -O -Tpdf 0000-step-node-sub-graph.dot
or, if the PDF is illegible, Firefox is great at zooming into SVG files:
dot -O -Tsvg 0000-step-node-sub-graph.dot
Tips & Tricks
0000-step-node-sub-graph.dot
This is how TEZ names our stuff!
Tips & Tricks
Major differences between how a cascading job gets mapped to MR and to TEZ:
MR
– One flow, many (MANY) independent steps
– One or more operators per step
– Step-to-step communications involve disk (HDFS)
– Each step is independent as far as MR is concerned
– Step scheduling managed from outside the cluster, by Cascading
TEZ
– One flow, one DAG. A DAG includes several nodes.
– One or more operators per node
– Node-to-node communications managed by TEZ. Memory, direct network or disk as necessary
– YARN sees one « Application » per flow
– Node scheduling managed by TEZ DAG AppMaster
Tips & Tricks
yarn-swimlanes.sh
• A tool included in the tez source distribution, in tez-tools/swimlanes (bash + python)
• Requires YARN ATS to work
« yarn logs -applicationId application_1345431315_1511 » must work
• Reports, in a Gantt chart, the per-container occupation
Tips & Tricks
yarn-swimlanes.sh (2)
[Figure: application_1435150225179_0474.svg, a swimlane chart with axes « time » and « containers »]
Tips & Tricks
Consider using .forceToDisk to ensure work is balanced within the DAG
[Figures: two runs compared, 890 seconds vs 160 seconds]
Tips & Tricks
• Consider using .forceToDisk to ensure work is balanced within the DAG
• .forceToDisk really means « don’t merge those two TEZ nodes » which implies « manage appropriate data transmission between these two nodes »
• TextFile & other FixedPathSource friends don’t seem to automatically spread out work as well as they used to (huh?)
• YMMV, WIP.
ALL ABOARD: HOW?
All aboard: how?
Smoothing out the UX for us app developers
• A build of scalding against Cascading 3.0.x
– Fabric-switching logic
– Get the test library to pass also on Tez
– Some applications might still uncover new mapping issues → increased community test case experience
– ???
• Getting the « guava mess » fixed
– Ideally all of Apache goes to recent guavas
– Enforced shading of Guava across the whole stack?
– Failing that, automated runtime patcher?
– (my « build stuff » partner makes me write: OSGi/Java9)
– ???
• Except for that, Tez is really easy for a YARN shop. Drop it in, and it runs!
PERFORMANCE
Performance
MR vs TEZ; TO SCALE!!!
MR run time: 14:22 (wall), 12:49 (cluster time), 5:43:26 (total CPU)
TEZ run time: 4:03 (wall), 2:50 (cluster time), 1:25:35 (total CPU)
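For readers who want the ratios rather than the raw durations, the figures above can be converted to seconds and divided. This is a small plain-Scala sketch; `RunTimes` and its members are hypothetical names introduced here, and the only inputs are the six durations from the slide.

```scala
// Illustrative helper for the MR vs TEZ figures; RunTimes is a hypothetical name.
object RunTimes {
  // Parses "m:ss" or "h:mm:ss" into seconds.
  def seconds(t: String): Int =
    t.split(":").map(_.trim.toInt).foldLeft(0)((acc, part) => acc * 60 + part)

  // How many times shorter the TEZ duration is than the MR one.
  def speedup(mr: String, tez: String): Double =
    seconds(mr).toDouble / seconds(tez)

  // Figures copied from the slide: (label, MR, TEZ).
  val runs: Seq[(String, String, String)] = Seq(
    ("wall", "14:22", "4:03"),
    ("cluster time", "12:49", "2:50"),
    ("total CPU", "5:43:26", "1:25:35")
  )

  def main(args: Array[String]): Unit =
    runs.foreach { case (label, mr, tez) =>
      println(f"$label%-12s ${speedup(mr, tez)}%.1fx")
    }
}
```

On these figures the ratios come out at roughly 3.5x (wall), 4.5x (cluster time) and 4.0x (total CPU).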
Performance
Output of tez-tool « yarn-swimlanes.sh »
• 1 « swimlane » per active container
• 1 colour per DAG Vertex (the black dots are actually the Vertex ID)
• Container occupation is pretty good while there is work to do
• (not demonstrated here) containers die when they are idle.
This is good!
CONCLUSION
As a conclusion…
A lot of effort so far… …but worth it!
Images: Nicholas Babaian // Flickr. Marathon du Médoc 2008
THANKS!
For building that tech
For helping out
For your attention today

Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 

Scalding on tez (final)

Editor's Notes

  1. ZX-81: CC-BY-SA 3.0 Amos Evans / WP; TO7-70: CC-BY-SA 2.0 « Rama » / WP; PC-1512: CC-BY-SA 3.0 Marcin Wichary / WP
  2. Meet Jane. Jane loves music. And Jane’s favourite music video platform has all the music Jane loves. So Jane listens to music from the Platform.
  3. After October 2013: moved on to other things; the topic was shelved for a while. September 2014: new model, same concept; built on plain Cascading to simplify some of the hairiest SQL logic (Optiq lacked, and may still lack, analytic functions, so what had been pretty much a single SQL statement in the SQL Server days had to be exploded into the 12 stages). Met guys from Lausanne at the end of September. Was already curious about Scala / Scalding then, and decided to spend two days giving it a spin. Never looked back!
  4. Myriad’s still a wishlist item for now, as it doesn’t seem to play nice with YARN in HA mode.
  5. We REALLY don’t want to misrepresent our maturity level
  6. Tez 0.6.2-SNAPSHOT is required. Warning: the Tez 0.7 runtime is not API-compatible with 0.6 (although the source-level API is quite close). Cascading might change the Tez dependency from time to time…
  7. The typical Hadoop+Tez stack pulls in a Jetty, a Tomcat, a Jersey, multiple Guavas, and the kitchen sink.
  8. We believe our workload requires 270-ish MiB of native memory. When we have time, we’ll either power down for extra sticks of RAM, or attempt to shave 20 MiB of heap per TezChild.
  9. (reportedly)
  10. Prune & Graft
  11. Prune & Graft
  12. Prune & Graft
  13. Why these two steps? The « same » code is getting executed in wildly different CLASSPATH: Cascading driver, TezChild, etc.
  14. « Hash joins » means hash joins proper, but also .filter/mapWithValue, joinWithTiny, etc.
  15. « Hash joins » means hash joins proper, but also .filter/mapWithValue, joinWithTiny, etc.
  16. Who wants to see another « Word Count » ?
  17. Who wants to see another « Word Count » ?
  18. Who wants to see another « Word Count » ?
  19. I’m not going to look into that, fairly standard code except where I’ve been naïve. You get the idea.
  20. « All of Apache goes to recent Guavas… » or drops the library altogether. At the very least, everyone not using the most recent version effing shades it.
  21. CC-BY-SA 2.0