24. Initial approach
• Use Mesos to provide resource guarantees
• Users specify the resources they need as part of topology submission
26. Solution
• Implement a new scheduler that gives production topologies dedicated hardware
• Only the Storm team can configure production topologies
• Left-over machines are used as failover or for in-development topologies
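A minimal sketch of the allocation policy described on this slide, not the actual scheduler: each production topology gets machines of its own, and whatever remains forms a pool for failover and in-development topologies. The function name, the request format, the host names, and the topology names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Assignment:
    # topology name -> machines reserved exclusively for it
    dedicated: dict = field(default_factory=dict)
    # machines left over for failover or in-development topologies
    shared_pool: list = field(default_factory=list)


def assign_machines(machines, production_requests):
    """Give each production topology its own machines; pool the rest.

    machines: list of host names in the cluster (hypothetical).
    production_requests: dict of topology name -> machine count, assumed
        to be maintained only by the team that operates the cluster.
    """
    needed = sum(production_requests.values())
    if needed > len(machines):
        raise ValueError(f"need {needed} machines, cluster has {len(machines)}")

    free = list(machines)
    result = Assignment()
    # Iterate in name order so the same inputs always give the same plan.
    for topology in sorted(production_requests):
        count = production_requests[topology]
        result.dedicated[topology] = free[:count]
        free = free[count:]
    # Everything not reserved stays available as spare capacity.
    result.shared_pool = free
    return result


cluster = [f"host{i}" for i in range(1, 7)]
plan = assign_machines(cluster, {"ads-counting": 2, "search-index": 3})
print(plan.dedicated)
print(plan.shared_pool)
```

Keeping the request table in the hands of one team mirrors the second bullet: a production topology cannot be squeezed by someone else's submission, because nothing outside that table can claim a dedicated machine.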
28. Data Engineering vs Data Science
• Well-defined problems
• No special statistics skills required
• Larger scope
• Not just analytics
29. Open source
• Almost all major Big Data tools are open source (e.g. Hadoop, Storm, Spark, Kafka, Cassandra, HBase)
• Many have commercial support
30. Open source
• Very important for recruiting data engineers
• Strong developers want to work at places where they can be involved with open source
31. Open source
• Develop a technology brand for the company (in conjunction with a tech blog)
• Creating a popular open source project can give you access to lots of strong engineers
32. Open source
• Identify strong engineers in the community whom you may want to recruit
• Learn best practices and get help from the people who know the tools best
• *Do not* expect to get “free work” on your projects
33. Ideal data engineer
• Strong software engineering skills
  • Abstraction
  • Testing
  • Version control
  • Refactoring
36. Ideal data engineer
• Strong software engineering skills
• Strong algorithm skills
• Good at digging into open source code
• Good at stress testing
38. Finding strong data engineers
• Standard “coding on the whiteboard” interviews are nearly useless
• Use take-home projects to gauge general programming ability
• Best of all is to see projects the candidate has done that required data engineering