Challenges of a Multi-Tenant Kafka Service

Presentation at Seattle Apache Kafka Meetup Apr 18, 2017
Abstract: Microsoft has extensive deployments of Kafka supporting large-scale data streaming. This talk introduces the challenges of building a multi-tenant system for the enterprise and discusses the design approach we have taken.
Speaker: Thomas Alex, Principal Program Manager, Microsoft
Thomas Alex is a Program Manager in the Shared Data team at Microsoft, and has worked on many aspects of big data: data ingestion, data distribution, master data management, orchestration and ETL pipeline management, data virtualization, in-memory databases, business intelligence, and reporting.

  1. Thomas Alex, Principal Program Manager, Microsoft
  2. Introduction; Goals; Solution; Tenant model; Deployment architecture; Open discussion
  3. Siphon: Enterprise Data Bus. Near real-time; compliant; no data dead-ends; hyper scale; reliable; network effects. 8 million events per second peak ingress; 800 TB (10 GB per sec) ingress per day; 1,800 production Kafka brokers; 450 topics; 15 sec 99th percentile latency.
  4. SDK Collector Siphon connector API Management UI Metadata DB
  5. Customer: major car manufacturer. Scenario: connected car telematics. Data producers: millions of cars, routed via cloud gateway to Siphon endpoint. Data consumers: Spark streaming applications; Siphon compute forwards data to blob storage.
  6. Data producers: send data reliably. Customers: manage capacity; manage tenant/topic/subscription; pay for the service. Data consumers: consume data in NRT. Service owners: manage service with SLA. (Diagram labels: UI, Backend, Source systems, Destination systems.)
  7. Managed service; availability; reliability; isolation; low cost; self-service; regulatory compliance; data sharing
  8. Multiple instances, single tenant per instance: Customer A, Customer B, and Customer C, each on its own instance.
  9. Single instance, multiple tenants per instance: Customer A, Customer B, and Customer C on one instance.
  10. Multiple instances, multiple tenants per instance: Customer A, Customer B, and Customer C on two instances.
  11. Siphon Deployment Unit: ingress service (Collector); Kafka cluster; connector (HLC); monitoring. Management Service: metadata; self-serve API; self-serve UI. (Diagram labels: Collector, HLC, API, Metadata DB.)
  12. Tenant: principals (administrators, users). Resources: endpoint; topics; subscriptions. Quota: storage capacity; throughput; threshold for auto-approval. Default limits: topic capacity; retention; partitions. (Diagram labels: Tenant 1, Traffic Manager 1; Tenant 2, Traffic Manager 2; Tenant 3, Traffic Manager 3; Siphon DU 1, Siphon DU 2, Siphon DU 3, each with Collector and HLC.)
  13. Scalability: underlying infra is IaaS. Isolation: availability and latency SLA. Regulatory compliance guarantees: enterprise cloud depends on data security & privacy; regulatory framework for certifications, e.g. SOC, FedRAMP, HIPAA. Data sharing. Manageability: provisioning; monitoring; maintainability.
  14. Comments / Feedback: https://www.linkedin.com/in/tomalex/; tomalex@microsoft.com
  15. Compliance regions: North America, South America, Europe, Asia Pacific. Go Local: Australia, Canada, India, Japan, United Kingdom. Sovereign: Germany, China, Government.
  16. Self-service: tenant creation & management; topic creation & management; topic health & data preview; subscription creation & management. AuthN: Azure AD based for self-service API & UI; cert based for data producers and consumers. AuthZ: Siphon Metadata used to authorize provisioning & management (tenants, topics, etc.); Kafka ACLs for topic-level access control. Throttling: EventServer throttles based on quota limit. Monitoring: operational metrics in a single system (MDM) for monitoring and alerting. Data quality: Audit Trail system for e2e latency and completeness monitoring.
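
The tenant model on slide 12 and the quota-based throttling on slide 16 can be pictured with a small data structure. The following is only a minimal sketch, not Siphon's actual implementation: the class names, field names, units, and return values are all hypothetical, and it assumes that "threshold for auto-approval" means requests at or below a certain size are granted without manual review.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """Per-tenant limits (hypothetical fields and units)."""
    storage_gb: float           # total storage capacity for the tenant
    throughput_mb_per_s: float  # ingress throughput limit
    auto_approve_gb: float      # assumed: requests up to this size skip manual review


@dataclass
class Tenant:
    """A tenant with principals, a quota, and topics (illustrative only)."""
    name: str
    principals: set
    quota: Quota
    topics: dict = field(default_factory=dict)  # topic name -> reserved storage in GB

    def used_storage_gb(self) -> float:
        return sum(self.topics.values())

    def request_topic(self, principal: str, topic: str, storage_gb: float) -> str:
        """Return 'denied', 'rejected', 'pending', or 'approved'."""
        if principal not in self.principals:
            return "denied"    # caller is not a principal of this tenant
        if self.used_storage_gb() + storage_gb > self.quota.storage_gb:
            return "rejected"  # would exceed the tenant's storage quota
        if storage_gb > self.quota.auto_approve_gb:
            return "pending"   # within quota, but needs manual approval
        self.topics[topic] = storage_gb
        return "approved"

    def should_throttle(self, observed_mb_per_s: float) -> bool:
        """True when observed ingress exceeds the tenant's throughput quota."""
        return observed_mb_per_s > self.quota.throughput_mb_per_s


tenant = Tenant("tenant-1", {"alice"}, Quota(100.0, 50.0, 10.0))
print(tenant.request_topic("alice", "telemetry", 5.0))
print(tenant.should_throttle(60.0))
```

Keeping the quota on the tenant rather than on individual topics mirrors the slide's grouping, where quota and default limits are listed as tenant-level attributes.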
