Presentation on the story of engineering culture at Etsy and the lessons learnt in maintaining a genuine and engaging culture in a rapidly growing technology company.
1. Continuously Deploying Culture 2.0
A Story Of Scaling Engineering Culture at Etsy
Rich Smith
Director of Security, Etsy
@iodboi Agile Iceland 2014
2. Who Am I?
• Rich Smith, Brooklyn NYC
• Director of Security Engineering at Etsy
• Cover the AppSec, NetSec, Risk Engineering teams
• Also building the security organisation & security culture
• Co-Founder of here in Iceland !
3. WTF is Etsy!
• Online community for handcrafted & vintage items
• Human scale manufacturing
• Marketplace for small businesses
4. Now (FY 2013)
• Gross Marketplace Sales (GMS) $1.35 Billion
• 40 million members, 1 million active sellers
• 26 million active listings
• 200+ Countries Performing Transactions
• >615 Employees
• Offices in 8 countries
5. Core Engineering Principles
• Empower the edges
• Trustworthy not trusted
• Every engineer can push to prod at any time
• ‘Just Ship’ - Get things done
• ‘If it moves graph it’ - Let the data lead you
9. Continuously Deploying Culture v1
• Mike Rembetsy & Patrick McDonnell
• Gave a talk at the Velocity conferences in 2011/12
• Etsy’s engineering culture evolution 2006-2011/12
• Slides here: http://slidesha.re/1xYxZrG
• Watch it here: http://vimeo.com/51310058
• Today we are extending those lessons up to the
13. 2006 - 2008 Silos and Barriers
• Etsy grows from a 4 person startup to employ 30 - 35 FTEs
• Around 15 engineers
• A very siloed culture creates barriers to engineering collaboration
• Bred initiatives like Sprouter - ‘Middleware of distrust!’
• Project dedicated to stopping engineers touching databases
14. Management Changes
• Maria Thomas from NPR promoted to CEO
• Brings a clear understanding that community is very important
• Prioritises a culture that supports community
• Chad Dickerson brought on as CTO
• Brings a clearer focus to the engineering team
• ‘This siloed culture cannot work, we need to start over’
15. 2006-2008 Takeaways
• Downtime was an accepted fact of life
• It was even expected to a degree!
• Engineering projects were often low impact
• Community needs to be a technical focus
• Survived the holiday season … just!
16. 2008 End of Year Snapshot
• Employees: 35
• Engineers: 15
• GMS: $87.5 Million
17. 2009 Internal Improvements
• As teams grow, big efforts in good communication
• Daily standups begun
• Much better cross-team collaboration between Ops & others
• Network solidified and provided basis for future growth
• Moved from Downtown Brooklyn to DUMBO
18. 2009 Takeaways
• Big growth year
• Built solid foundations:
• Infrastructure
• Invested in human capital
• ‘DevOps’ culture begins in earnest …..
• A lot of reflection and finding an Engineering identity
22. 2010 Standardisation & Graphs
• Moved to PHP & MySQL for *
• It almost doesn’t matter what you choose, just stick to it
• ‘If it moves Graph it’
• Graphite, Ganglia, FITB, Weathermap, Nagios, Naglite …..
• Starting to use this data for work/life balance as well as technical/systems reasons
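As a rough illustration of ‘If it moves Graph it’, here is a minimal sketch of emitting metrics in the StatsD line format (StatsD is named later in this deck among the tools Etsy open sourced). The metric names, helper function names and the local host/port are assumptions made for the example, not details from the talk.

```python
import socket


def format_counter(name: str, value: int = 1) -> str:
    """Build a StatsD counter line of the form '<name>:<value>|c'."""
    return f"{name}:{value}|c"


def format_timer(name: str, millis: int) -> str:
    """Build a StatsD timer line of the form '<name>:<millis>|ms'."""
    return f"{name}:{millis}|ms"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP (fire-and-forget).

    The host and port are assumptions; 8125 is the conventional StatsD port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, for illustration only.
deploy_line = format_counter("web.deploys")
login_line = format_timer("web.login_time", 320)
```

Because the send is a single UDP datagram, instrumenting a code path costs almost nothing, which is what makes it practical to graph everything that moves.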
25. 2010 Following our ideals …..
Management Ideals
• Blameless PostMortems
• 1:1’s as a core mgmt tool
• Eng career planning (Reverb)
• Accept failures, but not low standards
Engineering Ideals
• Developer on-call
• Use of A/B testing
• Lots of Prototypes
• FeatureFlags & Ramp Up
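To make ‘FeatureFlags & Ramp Up’ concrete, the following is a minimal sketch of a percentage-based ramp up. It is not Etsy's implementation: the function names, the flag name and the hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


def bucket_for(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, ramp_percent: int) -> bool:
    """Return True if this user falls inside the current ramp percentage.

    ramp_percent=0 turns the feature off for everyone and
    ramp_percent=100 turns it on for everyone.
    """
    return bucket_for(flag, user_id) < ramp_percent


# Hypothetical flag name; a given user keeps the same bucket as the ramp grows.
show_new_checkout = is_enabled("new_checkout", "user-42", 10)
```

Hashing the flag name together with the user ID keeps each user's experience stable while the percentage is raised, and avoids the same users always being first in line for every new feature.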
26. 2010 Takeaways
• Reduce number of technologies used in development
• Focus on technical visibility throughout the org
• Developers responsible for code release (start of DevOps)
• Member support rotations for all
• Work hard at work/life balance & have data to support it
28. 2011 Tech highlights
• End of long tail legacy silo holdovers (Sprouter gone!)
• Non-Standard technologies removed from production
• Engineers receive 3 annual goals:
• Speak at a conference
• Write a blog post
• Release open source software
29. 2011 - Organisational Changes
• Snr. management to become more Engineering focused
• Chad to CEO
• Kellan to CTO
• Allspaw to SVP of Operations
• Consolidates importance of engineering culture to the very top of Etsy and increases stability
30. 2011 Takeaways
• Year of the Open Sourced tool
• Statsd, Logster, Deployinator, Supergrep, Schemanator ….
• Overall maturing of engineering - platform & people
• Automation & config management solidified (Chef everything)
• Security starts becoming a 1st class citizen
• ‘Security Culture at Etsy’ begins to be chased & discussed
33. 2011-2012 - A Focus on Security
• Security alongside Dev & Ops as being integral to culture
• Applying our ‘DevOps’ principles & learnings to security
• Emphasis on security being a facilitator not a blocker
• Security often ‘enforced’ with terrible cultural impact
• Build a human and effective security organization
34. 2012 - Growth + Foster Our Values
• Explosive growth in hiring, allow easy transfers
• Some major changes around product
• Increased focus on community
• Internationalisation
• High impact products (Shipping Labels, Gift Cards)
• Became a certified B-Corp - not just the bottom line
35. What’s a B-Corp?
• Aim to use the power of business to solve social & environmental issues
• Impacts engineering in new and interesting ways:
• Waste, Recycling, Compost, Flushes (Yes we graph them!)
• Efficiency of our tech, data centre usage & partners
• ‘Make the world more like Etsy’ - Extending the culture
36. 2012 - Technical Achievements
• Create wholly separate payments environment
• Allows PCI compliance without disrupting the culture
• Interface with the webstack via a restricted Internet facing API
• Get serious on Data Science
• Dedicated Hadoop cluster for full time data scientists
• Taking some chances and broadening the experience of our engineers
37. 2012 Takeaways
• Do what’s needed to sustain long term & not just keep the lights on
• More headcount than required allows us to take chances
• Focus on info exchange, internally & externally with communities
• Open source all of the things
39. 2012 Action Items
• Security is integral to the DevOps lifecycle and culture
• Know when to shift sights from short to longer term goals
• Pursue dynamic engineering resource allocation
• Do not allow increasing org size to dictate culture
40. 2013 - An Interesting Year!
• Had many of the hard engineering wins taken care of …
• Time to focus internally
• No engineer can know everything any longer
• Need to maintain the culture of transparency & trust
• Really was the year of internal tooling to achieve this
41. 2013 - Technical
• Morgue tool created to capture and aid postmortems
• Moved to Vertica for BI data & metrics
• Superbit allowed simple querying of Vertica & big data by anyone who knows SQL
• Catapult launched to relate metrics to experiments
• Begin a refocus on a Mobile/API First product vision
42. 2013 Takeaways
• Democratisation of data is made easier with tooling that levels access and allows interrogation by ALL
• Conscious effort on internal tooling to minimize the pain of large & complex stack
• Engineering invested in transparency & trust
• The world doesn’t wait, mobile is the future
43. 2013 End of Year Snapshot
• Employees: 615
• Engineers: >33%
• GMS: $1.35 Billion
44. 2013 Action Items
• As datasets grow, evaluate how they can be accessed, evaluated and contextualized
• Have you reached a point where no one can know everything?
• While tooling can’t create culture it can help you support it
• Be free to apply your culture in new ways
• Inward focus cannot lead to outward blindness, tech changes fast
46. 2014 - Organisational changes
• Everyone pushes on their 1st day, not just Engineers
• Yearly planning is restructured
• Take account of a growing Etsy
• San Francisco opens as 1st non-Brooklyn Eng hub
• Acquire & integrate A Little Market with Etsy
47. Cultural Acquisition
• As part of growth, Paris based A Little Market acquired
• Integrating another engineering culture can be tough
• Etsy’s culture is ‘different’ & this can be a big step
• Language, timezone and human cultural differences
• Can be very successful, but don’t underestimate the effort
48. 2014 - Technical
• Move away from Splunk and to ElasticSearch/Logstash/Kibana (ELK)
• Mobile CI infrastructure embedded & ramped up
• API First a huge effort and development push
• Mobile First as an increasing product focus
• Technical work for quality of life - on-call sleep tracking
49. Logstash Lessons
• Replacing Splunk with Logstash taught many lessons
• Changing a core tool requires huge comms investment
• Without it enclaves & silos can form to resist change
• Explain the whys not just in terms of technicals or $$
• Fully understand all use cases, not just the main ones
• Don’t settle for a half complete end goal, go the distance
50. API First
• Supporting the Mobile First push & diversity of clients
• No longer assume LAMP, decoupling required
• Adds security & agility
• Embeds fundamental future resilience
• Capacity planning becomes more challenging
51. Mobile First
• Applying your principles and culture to the changing tech landscape is key
• Continuous Deployment hard in the ‘App Store world’
• Continuous Integration still applies of course
• Continuous Deployment becomes Continuous Delivery
• Still use API to enable feature flag driven native apps
52. Continuous Deployment vs Continuous Delivery
Practice                                      | Continuous Deployment | Continuous Delivery
Frequent checkins directly to mainline        | ✓                     | ✓
Automated build & test cycle                  | ✓                     | ✓
Keep the build green, always ready to release | ✓                     | ✓
One button deploys                            | ✓                     | ✓
Business dictates when to deploy              |                       | ✓
Every passing build deployed to prod          | ✓                     |
All enhancements gated by feature flag        | ✓                     | ?
53. Why This Approach?
• Continuous integration, Continuous Delivery
• Build your apps in a reproducible way after each push to git
• Identify bugs, missing dependencies early & often
• Integrate security testing throughout lifecycle
• Improve Mean Time To Recovery
• Stop stressing about releases!
54. Imagine that we’ll write 50K LOC/month
Single release
• Few opportunities for failure
• Wide surface area (50,000 LOC)
• High MTTR
• All of the bugs we’ve written
Many releases
• More opportunities for failure
• Narrow surface area (< 100 LOC)
• Low MTTR
• A fraction of the bugs we’ve written per release
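The trade-off on this slide (50K LOC/month shipped either as a single release or as many releases of under 100 LOC each) can be made concrete with a little arithmetic. This is only a sketch: the function name is hypothetical, and treating lines of code as evenly divisible across releases is a simplifying assumption.

```python
import math


def releases_needed(loc_per_month: int, max_loc_per_release: int) -> int:
    """Number of releases needed to ship a month of code at a given size cap."""
    return math.ceil(loc_per_month / max_loc_per_release)


LOC_PER_MONTH = 50_000  # figure taken from the slide

# One big release: everything changes at once, so any failure has to be
# hunted across the full 50,000 lines.
single = releases_needed(LOC_PER_MONTH, LOC_PER_MONTH)

# Many small releases capped at 99 lines each (i.e. "< 100 LOC"): far more
# deploys, but each one has a tiny surface area to inspect or roll back.
many = releases_needed(LOC_PER_MONTH, 99)
```

Under these assumptions the small-release approach means several hundred deploys a month, which is why one button deploys and a fast automated build and test cycle are prerequisites rather than nice-to-haves.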
55. Sleep Tracking
• Experiment with fitbands & Ops
• Collect sleep data for on-call
• Analyse in a variety of manners
• Sleep lost when on-call/pagerduty
• Alert on VPN/SSH logins while asleep
• Focus on data for quality of life
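A minimal sketch of the ‘alert on VPN/SSH logins while asleep’ idea follows. The talk does not describe the actual data model or tooling, so the record types, field names and matching rule here are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class SleepInterval:
    """Hypothetical record derived from a fitness band: one sleep period."""
    engineer: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical record derived from VPN or SSH logs."""
    engineer: str
    at: datetime
    kind: str  # e.g. "vpn" or "ssh"


def logins_while_asleep(sleeps: List[SleepInterval],
                        logins: List[LoginEvent]) -> List[LoginEvent]:
    """Return logins that fall inside one of the same engineer's sleep periods."""
    flagged = []
    for login in logins:
        for sleep in sleeps:
            same_person = sleep.engineer == login.engineer
            if same_person and sleep.start <= login.at < sleep.end:
                flagged.append(login)
                break  # one matching sleep period is enough to flag the login
    return flagged
```

A flagged login could mean an on-call engineer was woken up, which is quality-of-life data, or that someone else is using their credentials, which is a security signal.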
56. 2014 Takeaways
• Another year of big growth, also now M&A
• Integrating other engineering cultures inside your own is a challenge you should prepare for
• Core tooling changes require great thought & comms
• Mobile focus does not mean the end of always pushing
• Tooling for happiness & W/L balance is a win for all
57. 2014 Action Items
• Culture is still king despite growth or M&A activity
• It takes effort to keep it so, however
• Ensure your API is up to the job of supporting Mobile 1st
• Ensure core tooling changes are understood & embraced by all
• Communicate your Eng culture & history to new hires
59. Conclusions
• Culture doesn’t come for free, it takes continuous work
• Iterate & improve - Even when you think you have ‘it’
• Don’t give in to potential disruptors like growth & security and let them destroy your culture
• Get smart and use them to test, support and improve it
62. Links / References
• Continuously Deploying Culture (Mike Rembetsy, Patrick McDonnell)
  Slides: http://slidesha.re/1xYxZrG
  Video: http://vimeo.com/51310058
• Scaling Etsy, what went wrong, what went right (Ross Snyder)
  Slides & Video: http://bit.ly/po8zIj
• Etsy’s journey to continuous integration for mobile apps (Nassim Kammah)
  Blog post: http://bit.ly/1yiGWwc
• Mean time to sleep (Ryan Frantz, Laurie Denness)
  Slides, Blog post, code: http://ryanfrantz.com/mtts/