Gives a high-level view of the Insight Engineering team at Netflix. We build and operate the observability stack for Netflix. Learn more about the scope and impact of this team.
7. Atlas
● Dimensional time series DB
● Massive scale: 5 billion unique time series every minute
● Flexible and powerful stack language
● UI to explore data
● Streaming, near-real-time, and historical data access
● Instrumentation libraries
● Polyglot support
● Open source
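To make "dimensional time series" and "stack language" concrete, here is a toy Python evaluator. It is a sketch, not Atlas code: each series is a set of tag key/value pairs plus data points, and a query is a comma-separated postfix expression that reuses three operator names (`:eq`, `:and`, `:sum`) from the Atlas stack language. The data, tags, and the `evaluate` function are hypothetical.

```python
from typing import Callable, Dict, List

# A dimensional time series: tag key/value pairs plus data points.
# Hypothetical in-memory data for illustration only.
SERIES: List[dict] = [
    {"tags": {"name": "sps", "cluster": "api-a", "region": "us-east-1"}, "values": [10.0, 12.0, 11.0]},
    {"tags": {"name": "sps", "cluster": "api-a", "region": "eu-west-1"}, "values": [5.0, 6.0, 7.0]},
    {"tags": {"name": "sps", "cluster": "api-b", "region": "us-east-1"}, "values": [1.0, 1.0, 1.0]},
    {"tags": {"name": "cpu", "cluster": "api-a", "region": "us-east-1"}, "values": [0.5, 0.6, 0.7]},
]

TagPredicate = Callable[[Dict[str, str]], bool]


def evaluate(expr: str, series: List[dict]) -> List[float]:
    """Evaluate a comma-separated postfix expression over `series`.

    Supports a toy subset of operators: :eq, :and, :sum.
    """
    stack: list = []
    for token in expr.split(","):
        if token == ":eq":
            value, key = stack.pop(), stack.pop()
            # Default arguments capture this key/value in the closure.
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif token == ":and":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lambda tags, a=lhs, b=rhs: a(tags) and b(tags))
        elif token == ":sum":
            query: TagPredicate = stack.pop()
            matched = [s["values"] for s in series if query(s["tags"])]
            # Point-by-point sum across all matching series.
            stack.append([sum(points) for points in zip(*matched)])
        else:
            stack.append(token)  # anything else is a literal string
    return stack.pop()


if __name__ == "__main__":
    print(evaluate("name,sps,:eq,cluster,api-a,:eq,:and,:sum", SERIES))
    # → [15.0, 18.0, 18.0]
```

Postfix notation keeps the evaluator small: literals are pushed, and each operator pops its operands and pushes a result, so filters compose without a grammar. The real language has many more operators (grouping with `:by`, for example) that this sketch omits.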
9. Chronos
● Helps answer “What changed?”
● Change audit system
● Tracks all deployment changes, configuration updates, and more
● Integrated with AWS CloudTrail
● REST interface and client library
● Multi-tenant deployment: supports multiple indexes
● Backed by Elasticsearch
● Self-serve UI to explore data
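At its core, the "What changed?" question is a time-window filter over change events. Below is a minimal in-memory sketch in Python, assuming a hypothetical `ChangeEvent` record (timestamp, source, target, summary) and a hypothetical `what_changed` helper; it does not reflect Chronos's actual schema or REST API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical schema, not the actual Chronos data model.
    timestamp: datetime
    source: str   # e.g. "deployment", "config", "cloudtrail"
    target: str   # the service or resource that changed
    summary: str


def what_changed(events: List[ChangeEvent], incident: datetime,
                 lookback: timedelta,
                 target: Optional[str] = None) -> List[ChangeEvent]:
    """Return changes in [incident - lookback, incident], newest first."""
    start = incident - lookback
    hits = [e for e in events
            if start <= e.timestamp <= incident
            and (target is None or e.target == target)]
    return sorted(hits, key=lambda e: e.timestamp, reverse=True)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    events = [
        ChangeEvent(t0 - timedelta(minutes=90), "deployment", "api", "api v41 deployed"),
        ChangeEvent(t0 - timedelta(minutes=10), "config", "api", "timeout lowered"),
        ChangeEvent(t0 - timedelta(minutes=5), "deployment", "search", "search v7 deployed"),
    ]
    for e in what_changed(events, t0, timedelta(minutes=30)):
        print(e.timestamp.time(), e.source, e.target, e.summary)
```

In a system backed by a search index, the same filter would be pushed down to the index (Elasticsearch supports range queries on date fields) rather than evaluated in application code.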
10. Tracing
● Distributed tracing solution
● Drives performance analysis
● Builds a holistic SOA graph
● Aggregation capabilities
● Self-service UI
● Migrating towards Zipkin
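One way to derive an SOA graph from tracing data is to link each span to its parent and record an edge whenever the parent and child belong to different services. The sketch below assumes a simplified, hypothetical `Span` type loosely modeled on Zipkin's trace ID, span ID, parent ID, and service name fields. It is an illustration, not the production implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    # Simplified, hypothetical span loosely modeled on Zipkin's fields.
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str


def dependency_graph(spans: List[Span]) -> Counter:
    """Count caller -> callee service edges across all traces."""
    by_id: Dict[Tuple[str, str], Span] = {(s.trace_id, s.span_id): s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue  # root span: no caller
        parent = by_id.get((span.trace_id, span.parent_id))
        # Only cross-service calls become edges in the service graph.
        if parent is not None and parent.service != span.service:
            edges[(parent.service, span.service)] += 1
    return edges


if __name__ == "__main__":
    spans = [
        Span("t1", "a", None, "edge"),
        Span("t1", "b", "a", "api"),
        Span("t1", "c", "b", "ratings"),
        Span("t1", "d", "b", "api"),  # internal span within api: not an edge
        Span("t2", "e", None, "edge"),
        Span("t2", "f", "e", "api"),
    ]
    for (caller, callee), calls in sorted(dependency_graph(spans).items()):
        print(f"{caller} -> {callee}: {calls}")
```

Aggregating edge counts across many traces is what turns individual request paths into a holistic picture of which services call which, and how often.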
11. Lumen Dashboard
● Data visualization platform
● Data-source agnostic (HTTP, SSE, WebSockets, Atlas, Mantis, Druid, Elasticsearch, etc.)
● JSON-config driven
● Multi-tenant
● Fully self-service
● Reusable definitions
● Deep linking and URL friendly
● Powerful and flexible
● Dynamic
● Satisfies thousands of custom dashboard needs across Netflix
12. Alerting
Trend Deviation
● Detects anomalies in operational datasets
● Real-time processing
● Anomaly definition as config
● Threshold-based (dynamic, static)
● Outlier detection
● Self-service UI
● Wizard interface for quick setup
● Templatized best practices
● Versioning, analytics, and audit interfaces
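The static and dynamic threshold styles can be sketched in a few lines of Python. The dynamic variant here is an assumption, one common approach among many: flag a point that deviates more than `k` standard deviations from the mean of the preceding `window` points. Function names and parameters are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def static_breaches(values: List[float], limit: float) -> List[int]:
    """Indexes where the value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > limit]


def dynamic_breaches(values: List[float], window: int, k: float) -> List[int]:
    """Indexes deviating more than k standard deviations from the mean of
    the preceding `window` points (a simple trend-deviation check)."""
    hits = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # With a flat history (sigma == 0), any change from the mean is flagged.
        if abs(values[i] - mu) > k * sigma:
            hits.append(i)
    return hits


if __name__ == "__main__":
    rps = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 160.0, 101.0]
    print(static_breaches(rps, limit=150.0))
    print(dynamic_breaches(rps, window=5, k=3.0))
```

Parameters such as `limit`, `window`, and `k` are the kind of values an anomaly definition held as config would carry, so tuning an alert need not require a code change.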
13. Diagnostics
● Helps reduce mean time to resolve (MTTR)
● Context integrated with existing modes of visualization (Lumen/Atlas UI/email)
● Input for auto-remediation
● Overlays targeted infrastructure events on metrics
● Identifies useful changes in metric relationships at the time of failure
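One hedged way to identify a useful change in metric relationships is to compare the correlation between two metrics before and after a failure point. The sketch below uses Pearson correlation with an arbitrary `min_shift` cutoff; both the choice of technique and the names (`pearson`, `relationship_shift`) are assumptions for illustration.

```python
from math import sqrt
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # treat a flat series as uncorrelated
    return cov / (sx * sy)


def relationship_shift(xs: List[float], ys: List[float], split: int,
                       min_shift: float = 0.5) -> Tuple[float, float, bool]:
    """Compare correlation before vs. after index `split`; flag large changes."""
    before = pearson(xs[:split], ys[:split])
    after = pearson(xs[split:], ys[split:])
    return before, after, abs(after - before) >= min_shift


if __name__ == "__main__":
    requests = [10.0, 20.0, 30.0, 40.0, 50.0, 40.0, 30.0, 20.0]
    errors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(relationship_shift(requests, errors, split=5))
```

Here errors track requests until the split and then keep rising while requests fall, so the correlation flips sign and the pair is flagged as worth a closer look.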
14. Remediate
● Automated actions to mitigate impact
● Diverse actions (Terminate, Reboot, Detach, ...)
● Notification support (email/page/Slack)
● Safety, deduping, and fallback policies
● Custom runbook support
● Runbook lifecycle management
● Auditing, history, versioning, etc.
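Deduping and safety policies can be pictured as a gate in front of every automated action. The toy `Remediator` below (hypothetical names and defaults throughout) suppresses the same action on the same target within a dedupe window, caps the number of actions per rate window, and falls back to paging a human when the cap is hit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Remediator:
    """Toy remediation gate: dedupes repeated requests and caps blast radius.

    All names, defaults, and policies here are hypothetical.
    """
    dedupe_seconds: float = 300.0       # same action on same target is ignored within this window
    max_actions_per_window: int = 3     # safety cap on automated actions
    rate_window_seconds: float = 600.0
    _last_seen: Dict[Tuple[str, str], float] = field(default_factory=dict)
    _history: List[float] = field(default_factory=list)

    def request(self, action: str, target: str, now: float) -> str:
        key = (action, target)
        last = self._last_seen.get(key)
        if last is not None and now - last < self.dedupe_seconds:
            return "deduped"
        # Keep only actions that fall inside the current rate window.
        self._history = [t for t in self._history if now - t < self.rate_window_seconds]
        if len(self._history) >= self.max_actions_per_window:
            # Fallback policy: page a human instead of acting automatically.
            return "fallback:page-oncall"
        self._last_seen[key] = now
        self._history.append(now)
        return f"executed:{action}:{target}"


if __name__ == "__main__":
    gate = Remediator()
    for action, target, t in [("terminate", "i-1", 0), ("terminate", "i-1", 60),
                              ("reboot", "i-2", 61), ("detach", "i-3", 62),
                              ("terminate", "i-4", 63)]:
        print(t, gate.request(action, target, now=t))
```

Checking dedupe before the rate cap means a flood of duplicate alerts for one instance does not consume the safety budget meant for distinct failures.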
16. Open Source links and publications
Open Source projects
● https://github.com/Netflix/atlas
● https://github.com/Netflix/spectator
Publications
● https://tinyurl.com/NetflixWinston
● https://tinyurl.com/NetflixBolt
● https://tinyurl.com/NetflixAtlas
● https://tinyurl.com/NetflixEdda
17. We are hiring!
Join us to build and evolve the observability stack.
● UI Engineer: build interfaces that deliver critical insights and enable self-serve experiences.
● Backend Engineer: build fault-tolerant and scalable distributed systems.
● Engineering Manager: lead the RADAR (Real-time Anomaly Detection and Remediation) team within Insight Engineering.
Contact Vinay Shah
● vshah@netflix.com
● https://www.linkedin.com/in/shahvinay/
Netflix is a unique place to work. Read more about Netflix Culture and learn more at jobs.netflix.com.