© 2019 TWILIO INC. ALL RIGHTS RESERVED.
Building a Streaming Microservices Architecture
With Apache Spark Structured Streaming & Friends
Scott Haines
Senior Principal Software Engineer, Twilio
@newfront
A little about me.
• I work at Twilio building massive data systems
• I run bi-weekly internal Spark Office Hours, where I offer training and guidance to teams across the company
• >12 years working on large distributed analytics systems
• Published work on Distributed Analytics Systems
Voice Insights
Accountable observability data, plus interactive analytics and insights, for the voice business and its customers.
DATA CENTERS
Virginia, USA
Dublin, Ireland
Singapore
Sydney, Australia
Tokyo, Japan
São Paulo, Brazil
>1 million events per second
Let’s Build a Reliable Data Architecture
Goal: Reliable E2E Streaming Data Pipeline
[Figure: GRPC Client → GRPC Servers (over HTTP/2) → Kafka Brokers → Spark Application → HDFS / S3, with the hops numbered 1 to 9]
Strong and Reliable Data starts at Ingest
>95% of people love JSON*.
{
  "type": "CallEvent",
  "call_sid": "CA123",
  "attributes": [
    {
      "account_sid": "AC123",
      "start_ms": 123,
      "end_ms": "435"
    }
  ]
}
• JSON has structure.
• But JSON isn't strictly structured data: nothing stops "end_ms" from being a string while "start_ms" is a number.
Structured Data
100% of people are saddened by bad data*.
{
  "type": "CallEvent",
  "call_sid": "CA234",
  "attributes": "oops"
}
• JSON offers weak runtime guarantees because of its flexible nature. Optimize for compile-time guarantees instead.
• Debugging corrupt data in a large distributed system ruins hopes and dreams.
Structured Data
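To make the point concrete, here is a small sketch that validates already-parsed JSON (modelled as a `Map[String, Any]`; no particular JSON library is implied) into typed case classes at ingest. `StrictDecoder`, `CallEvent` and `Attribute` are hypothetical names. It rejects the `"attributes":"oops"` payload at the boundary, and it also rejects the first example, whose `end_ms` is the string `"435"`. This is runtime checking, the most a JSON pipeline can do; protobuf moves the same contract into generated types.

```scala
// Hypothetical ingest-time validator: raw JSON-like maps -> typed events or an error.
object StrictDecoder {
  final case class Attribute(accountSid: String, startMs: Long, endMs: Long)
  final case class CallEvent(eventType: String, callSid: String, attributes: List[Attribute])

  private def str(m: Map[String, Any], k: String): Either[String, String] =
    m.get(k) match {
      case Some(s: String) => Right(s)
      case other           => Left(s"'$k' must be a string, got $other")
    }

  private def num(m: Map[String, Any], k: String): Either[String, Long] =
    m.get(k) match {
      case Some(n: Int)  => Right(n.toLong)
      case Some(n: Long) => Right(n)
      case other         => Left(s"'$k' must be a number, got $other")
    }

  private def attribute(v: Any): Either[String, Attribute] = v match {
    case m: Map[String, Any] @unchecked =>
      for {
        acct  <- str(m, "account_sid")
        start <- num(m, "start_ms")
        end   <- num(m, "end_ms")
      } yield Attribute(acct, start, end)
    case other => Left(s"attribute must be an object, got $other")
  }

  // Reports the first (leftmost) invalid attribute, otherwise all of them in order.
  private def attributes(raw: Map[String, Any]): Either[String, List[Attribute]] =
    raw.get("attributes") match {
      case Some(xs: List[_]) =>
        xs.foldRight[Either[String, List[Attribute]]](Right(Nil)) { (x, acc) =>
          for { a <- attribute(x); rest <- acc } yield a :: rest
        }
      case other => Left(s"'attributes' must be a list, got $other")
    }

  def decode(raw: Map[String, Any]): Either[String, CallEvent] =
    for {
      t     <- str(raw, "type")
      sid   <- str(raw, "call_sid")
      attrs <- attributes(raw)
    } yield CallEvent(t, sid, attrs)
}
```

Failing fast here means a malformed event is rejected where it enters the system, rather than surfacing hours later inside a distributed job.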
Protocol Buffers
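syntax = "proto3";

message CallEvent {
  uint64 created_ms = 1;
  string call_sid = 2;
  string account_sid = 3;    // SIDs such as "AC123" are strings, not integers
  EventType event_type = 4;  // enum defined elsewhere in the schema (not shown)
  Region region = 5;         // enum defined elsewhere in the schema (not shown)
}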
• Well-defined events tell their own story
• Type-Safety Rules
• Versioning your API / Pipeline / Data now just means sticking to a version of your schema
• Rely on Releases for versioning
• Interoperable with most major languages (Java, Scala, C++, Go, Objective-C, Node.js, Python, ...)
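Why sticking to a schema version is enough: protobuf readers skip field numbers they do not know, so a producer can add a field without breaking older consumers. The sketch below is a deliberately minimal proto3 wire-format reader (wire types 0 and 2 only, not a full decoder) that keeps just the varint fields it knows. `TolerantReader` is a hypothetical name, and field 9 in the usage is a made-up "newer" field.

```scala
// Minimal, hypothetical proto3 wire-format reader that tolerates unknown fields.
object TolerantReader {
  // Decode a base-128 varint starting at `pos`; returns (value, position after it).
  def readVarint(bytes: Array[Byte], pos: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i) & 0xff
      result |= (b & 0x7fL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    (result, i)
  }

  // Returns the varint fields whose numbers are in `known`. Unknown fields and
  // all length-delimited payloads (strings, nested messages) are skipped.
  def readKnownVarints(bytes: Array[Byte], known: Set[Int]): Map[Int, Long] = {
    val out = Map.newBuilder[Int, Long]
    var pos = 0
    while (pos < bytes.length) {
      val (key, afterKey) = readVarint(bytes, pos)
      val field = (key >>> 3).toInt
      (key & 0x7).toInt match {
        case 0 =>
          val (value, next) = readVarint(bytes, afterKey)
          if (known.contains(field)) out += (field -> value)
          pos = next
        case 2 =>
          val (len, start) = readVarint(bytes, afterKey)
          pos = start + len.toInt
        case other =>
          throw new IllegalArgumentException(s"wire type $other is not handled in this sketch")
      }
    }
    out.result()
  }
}
```

For example, the bytes `08 96 01 12 05 "CA123" 48 07` encode field 1 = 150, field 2 = "CA123" and field 9 = 7; a reader that only knows field 1 still returns `Map(1 -> 150)`.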
• Data Accountability
• Lightning-fast serialization / deserialization
• Plays nicely with gRPC
• Interoperable with Spark SQL
• Like “JSON with Guard Rails”
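Part of why protobuf serialization is fast and compact is its wire format: integers are written as base-128 varints and each field is prefixed by a small key, `(field number << 3) | wire type`. The sketch below implements just those two rules; `WireEncoding` is a hypothetical name, and a real project would use the generated code's own serializers.

```scala
import scala.collection.mutable.ArrayBuffer

object WireEncoding {
  // Base-128 varint: 7 bits per byte, least-significant group first, high bit
  // set on every byte except the last. Small numbers take few bytes.
  def varint(value: Long): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    var v = value
    while ((v & ~0x7fL) != 0L) {
      out += ((v & 0x7fL) | 0x80L).toByte
      v >>>= 7
    }
    out += v.toByte
    out.toArray
  }

  // Field key = (field number << 3) | wire type; wire type 0 means varint.
  def varintField(fieldNumber: Int, value: Long): Array[Byte] =
    varint((fieldNumber.toLong << 3) | 0L) ++ varint(value)
}
```

With `created_ms` as field 1 of `CallEvent`, a value of 150 encodes to the three bytes `08 96 01`, compared with 16 bytes for `"created_ms":150` in compact JSON.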
100% of people like it when things work between releases!
Takeaways
Data Engineers love gRPC
*technically not universally accepted
[Figure: a GRPC Client calling three GRPC Servers over HTTP/2]
gRPC | saves time
val time = "$$$"
GRPC.
GRPC // nutshell
• RPC = remote procedure call. The "g" is usually read as "Google" or "generic" (officially, the gRPC project gives it a different meaning in each release)
• Great for Internal Services
• High Performance
• Compact Binary Exchange Format
• Compile Idiomatic API Definitions
• Capable of Bi-Directional Streaming
• Pluggable HTTP/2 transport
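On the HTTP/2 transport, gRPC sends every message as a length-prefixed frame: a 1-byte compressed flag, a 4-byte big-endian length, then the serialized payload. Because messages are self-delimiting, one HTTP/2 stream can carry a sequence of them, which is what makes streaming RPCs possible. The sketch below shows only that framing rule; `GrpcFraming` is a hypothetical helper, not part of any gRPC library.

```scala
import java.nio.ByteBuffer

object GrpcFraming {
  // Length-prefixed message: compressed flag (0 = uncompressed), 4-byte length, payload.
  def frame(payload: Array[Byte], compressed: Boolean = false): Array[Byte] = {
    val buf = ByteBuffer.allocate(5 + payload.length) // ByteBuffer is big-endian by default
    buf.put(if (compressed) 1.toByte else 0.toByte)
    buf.putInt(payload.length)
    buf.put(payload)
    buf.array()
  }

  // Inverse of `frame` for a single message: returns (compressed flag, payload).
  def unframe(bytes: Array[Byte]): (Boolean, Array[Byte]) = {
    val buf = ByteBuffer.wrap(bytes)
    val compressed = buf.get() == 1
    val len = buf.getInt()
    val payload = new Array[Byte](len)
    buf.get(payload)
    (compressed, payload)
  }
}
```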
GRPC.
• Building a CallEvent Service.
• 1: Define your messages (call.proto)
• 2: Define your services (service.proto)
• 3: Compile your messages and service stubs.
sbt clean compile package publishLocal
• 4: Implement the generated service traits and run!
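Step 4 can be sketched as below. The case classes and the `CallEventService` trait are hand-written, hypothetical stand-ins for generated code, roughly the shape a Scala gRPC code generator such as ScalaPB produces (one `Future`-returning method per rpc). The rpc name `Publish`, the `Ack` message and the `enqueue` callback (standing in for a Kafka producer) are all assumptions for illustration.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-ins for code generated from call.proto and service.proto, e.g.:
//   service CallEventService { rpc Publish (CallEvent) returns (Ack); }
object CallEventGrpcSketch {
  final case class CallEvent(callSid: String, accountSid: String, createdMs: Long)
  final case class Ack(accepted: Boolean, reason: String = "")

  // Shape of a generated async service trait: one Future-returning method per rpc.
  trait CallEventService {
    def publish(request: CallEvent): Future[Ack]
  }

  // Step 4: implement the trait. `enqueue` stands in for a Kafka producer.
  final class CallEventServiceImpl(enqueue: CallEvent => Unit)(implicit ec: ExecutionContext)
      extends CallEventService {
    override def publish(request: CallEvent): Future[Ack] = Future {
      if (request.callSid.isEmpty) Ack(accepted = false, reason = "call_sid is required")
      else {
        enqueue(request)
        Ack(accepted = true)
      }
    }
  }
}
```

Keeping validation inside the service implementation means only well-formed events ever reach the Kafka topic.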
GRPC.
• Building a CallEvent Service.
• Client SDKs essentially write themselves.
• JSON <-> Protobuf conversion is still possible for maintaining customer-facing APIs
Protocol Streams
Common Pattern Emerges
[Figure: a gRPC Protocol Gateway accepts message T<:Event { … } (protobuf @ version) and publishes it to a Kafka Broker]
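The gateway pattern in the figure can be sketched with a Scala upper type bound, `T <: Event`: one gateway per event type and schema version, publishing keyed bytes to Kafka. Everything here is hypothetical: the `Event` trait, the versioned topic naming (`CallEvents.v1`), and the `Send` function standing in for a Kafka producer. `toByteArray` mirrors the method generated protobuf messages expose.

```scala
object ProtocolGatewaySketch {
  // Anything that can be keyed and serialized qualifies as an Event.
  trait Event {
    def partitionKey: String
    def toByteArray: Array[Byte]
  }

  // Stand-in for a Kafka producer: (topic, key, value) => Unit.
  type Send = (String, String, Array[Byte]) => Unit

  // One gateway per event type and schema version (versioned topic name is an assumption).
  final class ProtocolGateway[T <: Event](topicBase: String, schemaVersion: String, send: Send) {
    val topic: String = s"$topicBase.v$schemaVersion"
    def publish(event: T): Unit = send(topic, event.partitionKey, event.toByteArray)
  }

  // Example event keyed by call_sid, matching the Kafka topic layout used later.
  final case class CallEvent(callSid: String, payload: Array[Byte]) extends Event {
    def partitionKey: String = callSid
    def toByteArray: Array[Byte] = payload
  }
}
```

The type bound lets the compiler reject anything that is not an `Event`, so each new flow only has to supply a message type, not a new gateway.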
Still Building…
[Figure: the goal pipeline again. GRPC Client → GRPC Servers (over HTTP/2) → Kafka Brokers → Spark Application → HDFS / S3, with the hops numbered 1 to 9]
Kafka + Protobuf + Spark
Kafka.
• Solve for common Data Pipeline Problems
• Partition keys are important. Use them to your advantage.
• Spark AQE (adaptive query execution) can mitigate hot spots caused by skewed keys. Non-Spark services, not so much.
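A partition key works because the producer maps each key to a partition deterministically: every event for one `call_sid` lands in the same partition, in order. Kafka's default partitioner hashes the serialized key with murmur2 modulo the partition count; the sketch below uses Scala's `MurmurHash3` instead, so its partition numbers will not match Kafka's. It only illustrates the property and a simple skew check. `KeyPartitioning` is a hypothetical name.

```scala
import scala.util.hashing.MurmurHash3

object KeyPartitioning {
  // Illustrative key -> partition mapping (NOT Kafka's exact murmur2 partitioner).
  def partitionFor(key: String, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    (MurmurHash3.stringHash(key) & 0x7fffffff) % numPartitions
  }

  // Records per partition; a lopsided histogram points at hot keys (skew).
  def partitionHistogram(keys: Seq[String], numPartitions: Int): Map[Int, Int] =
    keys.groupBy(partitionFor(_, numPartitions)).map { case (p, ks) => p -> ks.size }
}
```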
Topic: CallEvents
key: call_sid
partitions*: n
* number of partitions: scales with producer throughput, i.e. records/s * avg(record.bytesize)
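The footnote's rule of thumb can be written as a back-of-the-envelope calculation: required bytes/s = records/s × average record size, divided by what one partition can sustain, rounded up. The per-partition budget is an assumption you would have to measure for your own brokers and consumers; it is not a Kafka default. `PartitionSizing` is a hypothetical name.

```scala
object PartitionSizing {
  // partitions = ceil(recordsPerSec * avgRecordBytes / perPartitionBytesPerSec),
  // never fewer than minPartitions. All inputs are assumptions to be measured.
  def partitionsNeeded(
      recordsPerSec: Long,
      avgRecordBytes: Long,
      perPartitionBytesPerSec: Long,
      minPartitions: Int = 1
  ): Int = {
    require(recordsPerSec >= 0 && avgRecordBytes > 0 && perPartitionBytesPerSec > 0)
    val requiredBytesPerSec = recordsPerSec * avgRecordBytes
    val n = (requiredBytesPerSec + perPartitionBytesPerSec - 1) / perPartitionBytesPerSec // integer ceil
    math.max(minPartitions.toLong, n).toInt
  }
}
```

For example, 1,000,000 records/s at an assumed 200 bytes each, with an assumed 10 MB/s (10,000,000 bytes/s) per partition, gives 20 partitions.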
Spark.
• Use ScalaPB’s ExpressionEncoders to natively convert protobuf to Catalyst-optimizable DataFrames
• Marry this with Spark’s Kafka DataSource Reader/Writer
Spark.
• Behind the Scenes…
• Spark is doing magic
Spark.
• End-to-end tests ensure you can press the release button anywhere in the pipeline.
• They can be automated to ensure your Spark apps keep working with any changes to the upstream gRPC services
• They can be used for canary testing updates
Spark.
• Use ScalaPB to read local streams
• Ensure your Spark Apps can continue working with any new protobuf dependencies
• Test simple reads and complex aggregations locally, deploy globally
Spark & Beyond
Spark + Friends.
• Proto to Catalyst DataFrame
• Conversion to Parquet in a Delta Table
• Partitioned by date
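Partitioning by date needs one rule for turning each event's `created_ms` into a partition value. The sketch below pins down one reasonable choice, a UTC calendar day formatted `yyyy-MM-dd`, plus the Hive-style `date=...` directory that date-partitioned Parquet/Delta tables are laid out under. Using UTC and this format is an assumption, and `DatePartition` is a hypothetical name; inside a Spark job the same derivation would normally use built-in column functions so Catalyst can optimize it.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object DatePartition {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)

  // Derive the "date" partition value (UTC day) from created_ms.
  def dateFor(createdMs: Long): String = fmt.format(Instant.ofEpochMilli(createdMs))

  // Hive-style partition directory under a table root.
  def partitionPath(tableRoot: String, createdMs: Long): String =
    s"$tableRoot/date=${dateFor(createdMs)}"
}
```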
Spark + Friends.
• Downstream applications can now easily pick up the data using Parquet Protocol Streams
Rinse and Repeat
Just keep adding new Flows.
1. Define your Structured Data
2. Define your service definitions
3. Emit Data / Enqueue to Kafka
4. Read and drop into HDFS (Delta), or pass along to a new topic
[Figure: a GRPC Server feeding Kafka Topics, read by chained Spark Applications that write to new Kafka Topics and Data Tables]
THANK YOU
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 

Recently uploaded (20)

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 

Building a Streaming Microservices Architecture - Data + AI Summit EU 2020

  • 1. Building a Streaming Microservices Architecture With Apache Spark Structured Streaming & Friends
      Scott Haines, Senior Principal Software Engineer, Twilio (@newfront)
      © 2019 TWILIO INC. ALL RIGHTS RESERVED.
  • 2-3. A little about me.
      • I work at Twilio building massive data systems.
      • I run bi-weekly internal Spark Office Hours, where I offer training and guidance to teams across the company.
      • More than 12 years working on large distributed analytics systems.
      • Published work on distributed analytics systems.
  • 4. Voice Insights
      Accountable observability data and interactive analytics and insights for the voice business and customers.
  • 5. Data centers: Virginia, USA; Dublin, Ireland; Singapore; Sydney, Australia; Tokyo, Japan; Sao Paulo, Brazil.
      Events per second: >1 million.
  • 6. Let's Build a Reliable Data Architecture
  • 7. Goal: A Reliable End-to-End Streaming Data Pipeline
      Diagram (numbered steps 1-9): gRPC client → HTTP/2 → gRPC servers (×3) → Kafka brokers → Spark application → HDFS / S3
  • 8. Strong, reliable data starts at ingest.
  • 9. Structured Data. >95% of people love JSON*.
      {
        "type": "CallEvent",
        "call_sid": "CA123",
        "attributes": [
          {
            "account_sid": "AC123",
            "start_ms": 123,
            "end_ms": "435"
          }
        ]
      }
      • JSON has structure.
      • But JSON isn't strictly structured data: in this event start_ms is a number while end_ms is a string.
  • 10. Structured Data. 100% of people are saddened by bad data*.
      {
        "type": "CallEvent",
        "call_sid": "CA234",
        "attributes": "oops"
      }
      • JSON has poor runtime guarantees due to its flexible nature. Optimize for compile-time guarantees.
      • Debugging corrupt data in a large distributed system ruins hopes and dreams.
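To make the runtime-versus-compile-time point concrete, here is a minimal Scala sketch, assuming a JSON parser has already produced an untyped `Map[String, Any]`. The `Loose` alias, `CallEventDecoder`, and the case-class field names are hypothetical, not from the talk. Decoding into case classes at the ingest boundary turns `"attributes": "oops"` or a stringly typed `end_ms` into an immediate `Left` instead of corrupt data flowing downstream.

```scala
// Hypothetical typed model of the CallEvent from the slides; field names are assumptions.
final case class Attribute(accountSid: String, startMs: Long, endMs: Long)
final case class CallEvent(callSid: String, attributes: List[Attribute])

object CallEventDecoder {
  // Roughly what a schemaless JSON parser hands back: untyped keys and values.
  type Loose = Map[String, Any]

  // Look up `key` and accept it only if `pf` matches the value's runtime type.
  private def field[A](m: Loose, key: String)(pf: PartialFunction[Any, A]): Either[String, A] =
    m.get(key).toRight(s"missing field '$key'").flatMap { v =>
      pf.lift(v).toRight(s"field '$key' has unexpected value: $v")
    }

  // Only real numbers count as timestamps; the string "435" is rejected.
  private val asLong: PartialFunction[Any, Long] = {
    case n: Long => n
    case n: Int  => n.toLong
  }

  private def decodeAttribute(x: Any): Either[String, Attribute] = x match {
    case m: Map[_, _] =>
      val loose = m.asInstanceOf[Loose]
      for {
        account <- field(loose, "account_sid") { case s: String => s }
        start   <- field(loose, "start_ms")(asLong)
        end     <- field(loose, "end_ms")(asLong)
      } yield Attribute(account, start, end)
    case other => Left(s"attribute is not an object: $other")
  }

  // Any bad value rejects the whole event, reporting the first problem found.
  def decode(m: Loose): Either[String, CallEvent] =
    for {
      sid   <- field(m, "call_sid") { case s: String => s }
      raw   <- field(m, "attributes") { case xs: List[_] => xs }
      attrs <- raw.foldRight[Either[String, List[Attribute]]](Right(Nil)) { (x, acc) =>
                 for (a <- decodeAttribute(x); rest <- acc) yield a :: rest
               }
    } yield CallEvent(sid, attrs)
}
```

Protocol Buffers push this check even earlier, to compile time, which is where the next slide goes.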
  • 11. Protocol Buffers
      message CallEvent {
        uint64 created_ms = 1;
        string call_sid = 2;
        string account_sid = 3;
        EventType event_type = 4;
        Region region = 5;
      }
      • Well-defined events tell their own story.
      • Type safety rules.
      • Versioning your API, pipeline, and data now just means sticking to a version of your schema.
      • Rely on releases for versioning.
      • Interoperable with most major languages (Java, Scala, C++, Go, Objective-C, Node.js, Python, ...).
  • 12. Protocol Buffers: Takeaways
      • Data accountability.
      • Lightning-fast serialization and deserialization.
      • Plays nicely with gRPC.
      • Interoperable with Spark SQL.
      • Like "JSON with guard rails."
      100% of people like it when things work between releases!
  • 13. Data engineers love gRPC*.
      *Technically not universally accepted.
  • 14-15. gRPC | saves time
      val time = "$$$"
      Diagram: gRPC client → HTTP/2 → gRPC servers (×3)
  • 16. gRPC in a nutshell
      Diagram: gRPC client → HTTP/2 → gRPC servers (×3)
      • RPC = remote procedure call. The "g" is usually read as "Google"; officially it stands for something different in each release.
      • Great for internal services.
      • High performance.
      • Compact binary exchange format.
      • Compiles idiomatic API definitions.
      • Supports bidirectional streaming.
      • Pluggable HTTP/2 transport.
  • 17-20. gRPC: Building a CallEvent Service
      1: Define your messages (call.proto).
      2: Define your services (service.proto).
      3: Compile your messages and service stubs: sbt clean compile package publishLocal
      4: Implement the generated traits and run!
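Step 4 can be sketched in Scala as follows. ScalaPB's gRPC code generator emits a service trait whose unary methods return a `Future`; the `CallEventService` trait, `EmitResponse`, and the `sink` function below are hypothetical hand-written stand-ins for that generated code and for a Kafka producer, so the example compiles without running protoc.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-ins for classes ScalaPB would generate from call.proto / service.proto.
final case class CallEvent(createdMs: Long, callSid: String, accountSid: String)
final case class EmitResponse(accepted: Boolean, reason: String = "")

// Same shape as a generated unary service trait: one method per rpc, returning a Future.
trait CallEventService {
  def emit(request: CallEvent): Future[EmitResponse]
}

// Step 4: implement the trait. `sink` is a hypothetical stand-in for a Kafka producer.
final class CallEventServiceImpl(sink: CallEvent => Unit)(implicit ec: ExecutionContext)
    extends CallEventService {

  override def emit(request: CallEvent): Future[EmitResponse] = Future {
    if (request.callSid.isEmpty) {
      // Reject at the edge so bad events never reach the pipeline.
      EmitResponse(accepted = false, reason = "call_sid is required")
    } else {
      sink(request)
      EmitResponse(accepted = true)
    }
  }
}
```

With real generated code, the service's companion object typically exposes a `bindService(impl, executionContext)` method for registering the implementation with a grpc-java server; the business logic stays in the trait implementation either way.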
  • 21. gRPC: Building a CallEvent Service
      • Client SDKs essentially write themselves.
      • JSON <-> Protobuf conversion is still possible for maintaining customer-facing APIs.
  • 22. A Common Pattern Emerges: Protocol Streams
      Diagram: gRPC → Protocol Gateway (message T <: Event { … }, protobuf @ version) → Kafka Broker
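The protocol-gateway pattern can be sketched as a gateway that is generic over any `T <: Event`. This is a minimal Scala sketch under stated assumptions: the `Event` trait, its `partitionKey` and `schemaVersion` members, `TopicRecord`, and the version strings are hypothetical, and a real gateway would serialize `T` to protobuf bytes and hand the record to a Kafka producer rather than return it.

```scala
// Hypothetical base trait for every event message (the slide's `message T <: Event`).
trait Event {
  def partitionKey: String   // e.g. call_sid for CallEvents
  def schemaVersion: String  // the "protobuf @ version" the producer compiled against
}

// What the gateway hands to Kafka; in practice `value` would be serialized protobuf bytes.
final case class TopicRecord[T](topic: String, key: String, schemaVersion: String, value: T)

// One gateway per event type: check the schema version, pick the key, route to the topic.
final class ProtocolGateway[T <: Event](topic: String, supportedVersions: Set[String]) {
  def route(event: T): Either[String, TopicRecord[T]] =
    if (!supportedVersions.contains(event.schemaVersion))
      Left(s"unsupported schema version ${event.schemaVersion} for topic $topic")
    else
      Right(TopicRecord(topic, event.partitionKey, event.schemaVersion, event))
}

// A concrete event type plugged into the generic gateway.
final case class CallEvent(callSid: String, schemaVersion: String) extends Event {
  override def partitionKey: String = callSid
}
```

Adding a new stream then means defining a new `Event` subtype and instantiating another gateway, rather than writing new routing code.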
  • 23. Still Building…
      Diagram (numbered steps 1-9): gRPC client → HTTP/2 → gRPC servers (×3) → Kafka brokers → Spark application → HDFS / S3
  • 24. Kafka + Protobuf + Spark
      Diagram (numbered steps 1-9): gRPC client → HTTP/2 → gRPC servers (×3) → Kafka brokers → Spark application
  • 25. Kafka.
      • Solve for common data pipeline problems.
      • Partition keys are important. Use them to your advantage.
      • Spark AQE (Adaptive Query Execution) can smooth over hot spots (skewed partitions) in batch queries. Non-Spark services, not so much.
      Topic: CallEvents
      key: call_sid
      partitions*: n
      * number of partitions: a factor of producer records/s × avg(record.bytesize)
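A worked version of the partition rule of thumb, plus keyed routing, as a Scala sketch. `targetBytesPerPartition` is an assumed per-partition throughput budget you would measure on your own cluster (it is not a Kafka setting), and `partitionFor` uses `String.hashCode` only as a stand-in: Kafka's default partitioner hashes the serialized key with murmur2, so real partition numbers will differ. What carries over is the property that every event with the same `call_sid` maps to the same partition, which preserves per-call ordering.

```scala
object KafkaPartitionSizing {
  // Rule of thumb from the slide: partitions scale with producer bytes/s
  // (records/s * average record size), divided by an assumed per-partition budget.
  def estimatePartitions(recordsPerSec: Long, avgRecordBytes: Long, targetBytesPerPartition: Long): Int = {
    require(recordsPerSec >= 0 && avgRecordBytes > 0 && targetBytesPerPartition > 0)
    val bytesPerSec = recordsPerSec * avgRecordBytes
    // Round up, and never go below one partition.
    math.max(1, math.ceil(bytesPerSec.toDouble / targetBytesPerPartition).toInt)
  }

  // Keyed routing: the same key always yields the same partition.
  // String.hashCode is a stand-in for Kafka's murmur2-based default partitioner.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)
}
```

For example, 1,000,000 records/s at an assumed 500 bytes per record is 500 MB/s; with an assumed budget of 10 MB/s per partition, `estimatePartitions(1000000L, 500L, 10000000L)` gives 50.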
  • 26. Spark.
      • Use ScalaPB's ExpressionEncoders to natively convert protobuf to Catalyst-optimizable DataFrames.
      • Marry this with Spark's Kafka data source reader/writer.
  • 27. Spark.
      • Behind the scenes…
      • Spark is doing magic.
  • 28. Spark.
      • End-to-end tests ensure you can press the release button anywhere in the pipeline.
      • They can be automated to ensure your Spark apps keep working through any changes to the upstream gRPC services.
      • They can also be used for canary testing updates.
  • 29. Spark.
      • Use ScalaPB to read local streams.
      • Ensure your Spark apps keep working with any new protobuf dependencies.
      • Test simple reads and complex aggregations locally, deploy globally.
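One way to automate the "keep working with new protobuf dependencies" check is a schema-compatibility gate in CI. The Scala sketch below uses a simplified, hypothetical model (`ProtoField`, `MessageSchema`: field number to name and type), not a real protobuf descriptor API. It conservatively flags any type change on an existing field number and any removed field that was not reserved, while allowing renames (field names are not part of the binary wire format) and newly added fields.

```scala
// Hypothetical, simplified view of a protobuf message schema (not a real descriptor API).
final case class ProtoField(name: String, protoType: String)
final case class MessageSchema(fields: Map[Int, ProtoField], reserved: Set[Int] = Set.empty)

object SchemaCompatibility {
  // Conservative CI gate: returns human-readable problems, empty if the change looks safe.
  // Field numbers that exist only in `next` are new fields and are always allowed.
  def breakingChanges(current: MessageSchema, next: MessageSchema): List[String] =
    current.fields.toList.sortBy(_._1).flatMap { case (num, field) =>
      next.fields.get(num) match {
        // Same field number, different type: flag it (some types are wire-compatible, but be strict).
        case Some(updated) if updated.protoType != field.protoType =>
          List(s"field $num (${field.name}) changed type ${field.protoType} -> ${updated.protoType}")
        // Same number and type; a rename alone does not change the binary encoding.
        case Some(_) => Nil
        // Removed but reserved, so the number can never be reused with a new meaning.
        case None if next.reserved.contains(num) => Nil
        case None => List(s"field $num (${field.name}) removed without being reserved")
      }
    }
}
```

A non-empty result would block the release, so downstream Spark apps only ever see additive schema changes.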
  • 30. Spark & Beyond
      Diagram (numbered steps 1-9): gRPC client → HTTP/2 → gRPC servers (×3) → Kafka brokers → Spark application → HDFS / S3
  • 31. Spark + Friends.
      • Proto to Catalyst DataFrame.
      • Conversion to Parquet in a Delta table.
      • Partitioned by date.
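Partitioning by date needs a date value per event. A minimal Scala sketch, assuming the partition date is derived from `created_ms` in UTC (the zone is an assumption; pick one and keep it so partitions never shift). In Spark the equivalent column would be added before writing with `partitionBy("date")`, which lays files out Hive-style as `date=YYYY-MM-DD` directories; `partitionDir` only reproduces that naming for illustration, and the bucket path in the usage is hypothetical.

```scala
import java.time.{Instant, ZoneOffset}

object DatePartitioning {
  // Derive the `date` partition value from created_ms (epoch milliseconds), in UTC.
  def dateOf(createdMs: Long): String =
    Instant.ofEpochMilli(createdMs).atZone(ZoneOffset.UTC).toLocalDate.toString

  // Hive-style directory name, as written by partitionBy("date"): <table>/date=YYYY-MM-DD
  def partitionDir(tableRoot: String, createdMs: Long): String =
    s"${tableRoot.stripSuffix("/")}/date=${dateOf(createdMs)}"
}
```

For example, `DatePartitioning.partitionDir("s3://example-bucket/call_events/", 0L)` yields `s3://example-bucket/call_events/date=1970-01-01`, and downstream readers filtering on `date` only touch the matching directories.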
  • 32. Spark + Friends.
      • Now it is easy for downstream applications to pick up the data as Parquet.
      Diagram: Protocol Streams
  • 33. Rinse and Repeat
      Just keep adding new flows.
      1. Define your structured data.
      2. Define your service definitions.
      3. Emit data / enqueue to Kafka.
      4. Read and drop in HDFS (Delta), or pass along to a new topic.
      Diagram: a gRPC server feeding Kafka topics, with several Spark applications reading those topics and writing to further Kafka topics or data tables.