███████╗██╗     ██╗   ██╗██╗  ██╗
██╔════╝██║     ██║   ██║╚██╗██╔╝
█████╗  ██║     ██║   ██║ ╚███╔╝
██╔══╝  ██║     ██║   ██║ ██╔██╗
██║     ███████╗╚██████╔╝██╔╝ ██╗
╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
Apache Storm
Frictionless Topology Configuration & Deployment
P. Taylor Goetz, Hortonworks
@ptgoetz
Storm BoF - Hadoop Summit Brussels 2015
About me…
• VP - Apache Storm
• ASF Member
• Member of Technical Staff, Hortonworks
What is Flux?
• An easier way to configure and deploy Apache Storm topologies
• A YAML DSL for defining and configuring Storm topologies
• And more…
Why Flux?
Because seeing duplication of
effort makes me sad…
What’s wrong here?
public static void main(String[] args) throws Exception {
    String name = "myTopology";
    StormTopology topology = getTopology();
    // logic to determine if we're running locally or not…
    boolean runLocal = shouldRunLocal();
    // create necessary config options…
    Config conf = new Config();
    if (runLocal) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, conf, topology);
    } else {
        StormSubmitter.submitTopology(name, conf, topology);
    }
}
• Configuration tightly coupled with code.
• Changes require recompilation & repackaging.
Wouldn’t this be easier?
storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml
OR…
storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml
Flux allows you to package all your Storm components once.
Then wire, configure, and deploy topologies using a YAML
definition.
Flux Features
• Easily configure and deploy Storm topologies (both the Storm Core and micro-batch/Trident APIs) without embedding configuration in your topology code
• Support for existing topology code
• Define Storm Core API topologies (spouts/bolts) using a flexible YAML DSL
• YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs,
storm-hbase, etc.)
• Convenient support for multi-lang components
• External property substitution/filtering for easily switching between
configurations/environments (similar to Maven-style ${variable.name}
substitution)
Flux YAML DSL
YAML Definition Consists of:
• Topology Name (1)
• Includes (0…*)
• Config Map (0…1)
• Components (0…*)
• Spouts (1…*)
• Bolts (1…*)
• Streams (1…*)
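To make the outline above concrete, here is a minimal sketch of a complete definition containing a name, a config map, one spout, two bolts, and two streams. The ids are made up for illustration, and the class names `backtype.storm.testing.TestWordSpout` and `org.apache.storm.flux.wrappers.bolts.LogInfoBolt` are assumptions about what is available on your classpath for the Storm/Flux version in use.

```yaml
# Minimal word-count style topology (illustrative sketch, not a tuned setup)
name: "yaml-topology"

config:
  topology.workers: 1

spouts:
  - id: "spout-1"
    # assumed test spout that emits a single "word" field
    className: "backtype.storm.testing.TestWordSpout"
    parallelism: 1

bolts:
  - id: "bolt-1"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1
  - id: "bolt-2"
    # assumed logging bolt from the Flux wrappers module
    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
    parallelism: 1

streams:
  - name: "spout-1 --> bolt-1"
    from: "spout-1"
    to: "bolt-1"
    grouping:
      type: FIELDS
      args: ["word"]
  - name: "bolt-1 --> bolt-2"
    from: "bolt-1"
    to: "bolt-2"
    grouping:
      type: SHUFFLE
```

A file like this would be passed as the last argument to the `storm jar ... org.apache.storm.flux.Flux` command shown earlier.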
Flux YAML DSL
Config
A Map-of-Maps (Objects) that will be passed to the topology at
submission time (Storm config).
# topology name
name: "myTopology"

# topology configuration
config:
  topology.workers: 5
  topology.max.spout.pending: 1000
  # ...
Components
• Catalog (list/map) of Objects that can be used/referenced in other
parts of the YAML configuration
• Roughly analogous to Spring beans.
Components
Simple Java class with default constructor:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
Components: Constructor Arguments
Component classes can be instantiated with “constructorArgs” (a list of
class constructor arguments):
# Components
components:
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "localhost:2181"
Components: References
Components can be “referenced” throughout the YAML config and
used as arguments:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
  - id: "stringMultiScheme"
    className: "backtype.storm.spout.SchemeAsMultiScheme"
    constructorArgs:
      - ref: "stringScheme"
Components: Properties
Components can be configured using JavaBean setter methods and
public instance variables:
  - id: "spoutConfig"
    className: "storm.kafka.SpoutConfig"
    properties:
      - name: "forceFromStart"
        value: true
      - name: "scheme"
        ref: "stringMultiScheme"
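The component features shown so far (constructor arguments, references, properties) compose into a chain. The following sketch wires a Kafka spout from the pieces above, assuming the storm-kafka classes of the same era. The topic name, ZooKeeper root, and consumer id are placeholders, not values from the talk.

```yaml
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"

  - id: "stringMultiScheme"
    className: "backtype.storm.spout.SchemeAsMultiScheme"
    constructorArgs:
      - ref: "stringScheme"

  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "localhost:2181"

  - id: "spoutConfig"
    className: "storm.kafka.SpoutConfig"
    constructorArgs:
      # brokerHosts
      - ref: "zkHosts"
      # topic (placeholder)
      - "myKafkaTopic"
      # zkRoot (placeholder)
      - "/kafkaSpout"
      # consumer id (placeholder)
      - "myId"
    properties:
      - name: "forceFromStart"
        value: true
      - name: "scheme"
        ref: "stringMultiScheme"

spouts:
  - id: "kafka-spout"
    className: "storm.kafka.KafkaSpout"
    constructorArgs:
      - ref: "spoutConfig"
    parallelism: 1
```

Each `ref` points at the `id` of a component defined earlier in the catalog, so components are declared before the objects that reference them.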
Components: Config Methods
Call arbitrary methods to configure a component:
  - id: "recordFormat"
    className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
    configMethods:
      - name: "withFieldDelimiter"
        args: ["|"]
References can be used here as well.
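As a sketch of references inside config method arguments, the fragment below configures an HDFS bolt from separately defined components. It assumes the storm-hdfs builder-style methods (`withFsUrl`, `withFileNameFormat`, and so on). The file system URL, output path, and policy values are placeholders.

```yaml
components:
  - id: "syncPolicy"
    className: "org.apache.storm.hdfs.bolt.sync.CountSyncPolicy"
    constructorArgs:
      - 1000

  - id: "rotationPolicy"
    className: "org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy"
    constructorArgs:
      - 30
      - SECONDS

  - id: "fileNameFormat"
    className: "org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat"
    configMethods:
      - name: "withPath"
        args: ["/tmp/flux/"]        # placeholder path
      - name: "withExtension"
        args: [".txt"]

  - id: "recordFormat"
    className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
    configMethods:
      - name: "withFieldDelimiter"
        args: ["|"]

bolts:
  - id: "hdfs-bolt"
    className: "org.apache.storm.hdfs.bolt.HdfsBolt"
    configMethods:
      - name: "withFsUrl"
        args: ["hdfs://localhost:8020"]   # placeholder URL
      # each argument below is a reference to a component id
      - name: "withFileNameFormat"
        args: [ref: "fileNameFormat"]
      - name: "withRecordFormat"
        args: [ref: "recordFormat"]
      - name: "withRotationPolicy"
        args: [ref: "rotationPolicy"]
      - name: "withSyncPolicy"
        args: [ref: "syncPolicy"]
    parallelism: 1
```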
Spouts
A list of objects that implement the IRichSpout interface and an
associated parallelism setting.
# spout definitions
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    # shell spout constructor takes 2 arguments: String[], String[]
    constructorArgs:
      # command line
      - ["node", "randomsentence.js"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
Bolts
A list of objects that implement the IRichBolt or IBasicBolt interface with
an associated parallelism setting.
# bolt definitions
bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line
      - ["python", "splitsentence.py"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
  - id: "count"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1
    # ...
Spout and Bolt definitions are just extensions of
“Component” with a “parallelism” attribute, so all
component features (references, constructor
args, properties, config methods) can be used.
Streams
• Represent Spout-to-Bolt and Bolt-to-Bolt connections
• In graph terms: “edges”
• Also define Stream Groupings:
• ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE,
FIELDS, GLOBAL, or NONE.
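As an illustration of the built-in groupings, the sketch below connects the spout and bolts defined on the earlier slides: a shuffle grouping into the sentence splitter, then a fields grouping on "word" into the counter. The stream names are free-form labels chosen here for readability.

```yaml
streams:
  - name: "sentence-spout --> splitsentence"
    from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE

  - name: "splitsentence --> count"
    from: "splitsentence"
    to: "count"
    grouping:
      # tuples with the same "word" value go to the same bolt task
      type: FIELDS
      args: ["word"]
```

A fields grouping suits the counting bolt because every occurrence of a given word then reaches the same task, which keeps per-word counts consistent.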
Streams
Custom stream grouping:
  - from: "bolt-1"
    to: "bolt-2"
    grouping:
      type: CUSTOM
      customClass:
        className: "backtype.storm.testing.NGrouping"
        constructorArgs:
          - 1
Again, you can use references, properties, and config methods.
Filtering/Variable Substitution
Define properties in an external properties file, and reference them in
YAML using ${} syntax:
  - id: "rotationAction"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        args: ["${hdfs.dest.dir}"]
The ${hdfs.dest.dir} placeholder is replaced with the property's value before the YAML is parsed.
Filtering/Variable Substitution
Environment variables can be referenced in YAML using ${ENV-NAME}
syntax:
  - id: "rotationAction"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        args: ["${ENV-HDFS_DIR}"]
The ${ENV-HDFS_DIR} placeholder is replaced with the value of the $HDFS_DIR environment variable before the YAML is parsed.
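Both kinds of substitution can appear in the same file. The sketch below shows what the input might look like and, in comments, the text Flux would parse after substitution. The properties file name (`dev.properties`), its contents, and the environment value are hypothetical. The `--filter` and `--env-filter` options are the ones listed under the command line options.

```yaml
# Hypothetical inputs:
#   dev.properties contains:   hdfs.dest.dir=/data/archive
#   environment contains:      HDFS_DIR=/data/incoming
# Hypothetical invocation:
#   storm jar mycomponents.jar org.apache.storm.flux.Flux \
#     --local --filter dev.properties --env-filter config.yaml

components:
  - id: "archiveAction"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        # parsed as: args: ["/data/archive"]
        args: ["${hdfs.dest.dir}"]

  - id: "incomingAction"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        # parsed as: args: ["/data/incoming"]
        args: ["${ENV-HDFS_DIR}"]
```

Because substitution happens on the raw text before parsing, the same YAML file can be reused across environments by swapping only the properties file or the environment.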
File Includes and Overrides
Include files/classpath resources and optionally override values:
name: "include-topology"

includes:
  - resource: true
    file: "/configs/shell_test.yaml"
    # otherwise subsequent includes that define 'name' would override
    override: false
Existing Topologies & Trident Topologies
Existing Topologies
• Alternative to YAML Spout/Bolt/Stream DSL
• Same syntax
• Works with transactional/micro-batch (Trident) topologies
• Tell Flux about the class that will produce your topology
• Components, references, constructor args, properties, config
methods, etc. can all be used
Existing Topologies
Provide a class with a public method that returns a StormTopology
instance:
/**
 * Marker interface for objects that can produce `StormTopology` objects.
 *
 * If a `topology-source` class implements the `getTopology()` method, Flux will
 * call that method. Otherwise, it will introspect the given class and look for a
 * similar method that produces a `StormTopology` instance.
 *
 * Note that it is not strictly necessary for a class to implement this interface.
 * If a class defines a method with a similar signature, Flux should be able to find
 * and invoke it.
 *
 */
public interface TopologySource {
    public StormTopology getTopology(Map<String, Object> config);
}
This can be a Spout/Bolt or Trident topology.
Existing Topologies
Define a topologySource to tell Flux how to configure the class that
creates the topology:
# configuration that uses an existing topology that does not implement TopologySource
name: "existing-topology"
topologySource:
  className: "org.apache.storm.flux.test.SimpleTopology"
  methodName: "getTopologyWithDifferentMethodName"
  constructorArgs:
    - "foo"
    - "bar"
Components, references, constructor args, properties,
config methods, etc. can all be used.
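Going by the interface comment above, a class that exposes a `getTopology()` method should not need the `methodName` key. A minimal sketch under that assumption follows. The class name is reused from the example above purely for illustration, and whether that particular class provides such a method is not confirmed by the slides.

```yaml
# Existing topology whose source class is assumed to expose getTopology()
name: "existing-topology"

config:
  topology.workers: 1

topologySource:
  # methodName omitted: Flux is expected to fall back to getTopology()
  className: "org.apache.storm.flux.test.SimpleTopology"
```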
Flux Usage
• Add the Flux dependency to your project.
• Use the Maven shade plugin to create a fat jar file.
• Use the `storm` command to run (locally) or deploy (remotely) your
topology:
storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>
Flux Usage: Command Line Options
usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux [options] <topology-config.yaml>
 -d,--dry-run                 Do not run or deploy the topology. Just build,
                              validate, and print information about the topology.
 -e,--env-filter              Perform environment variable substitution. Keys
                              identified with `${ENV-[NAME]}` will be replaced
                              with the corresponding `NAME` environment value.
 -f,--filter <file>           Perform property substitution. Use the specified
                              file as a source of properties, and replace keys
                              identified with `${[property name]}` with the
                              value defined in the properties file.
 -i,--inactive                Deploy the topology, but do not activate it.
 -l,--local                   Run the topology in local mode.
 -n,--no-splash               Suppress the printing of the splash screen.
 -q,--no-detail               Suppress the printing of topology details.
 -r,--remote                  Deploy the topology to a remote cluster.
 -R,--resource                Treat the supplied path as a class path resource
                              instead of a file.
 -s,--sleep <ms>              When running locally, the amount of time to sleep
                              (in ms) before killing the topology and shutting
                              down the local cluster.
 -z,--zookeeper <host:port>   When running in local mode, use the ZooKeeper at
                              the specified <host>:<port> instead of the
                              in-process ZooKeeper. (Requires Storm 0.9.3 or later.)
With great power comes great
responsibility.
It’s up to you to avoid shooting yourself in the foot!
Feedback/Contributions Welcome
Flux on GitHub: https://github.com/ptgoetz/flux
Thank you! AMA…
P. Taylor Goetz, Hortonworks
@ptgoetz

More Related Content

What's hot

Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) BigDataEverywhere
 
Kotlin coroutines and spring framework
Kotlin coroutines and spring frameworkKotlin coroutines and spring framework
Kotlin coroutines and spring frameworkSunghyouk Bae
 
Python client api
Python client apiPython client api
Python client apidreampuf
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database JonesJohn David Duncan
 
JUnit5 and TestContainers
JUnit5 and TestContainersJUnit5 and TestContainers
JUnit5 and TestContainersSunghyouk Bae
 
Backbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The BrowserBackbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The BrowserHoward Lewis Ship
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Sunghyouk Bae
 
Zabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet MensZabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet MensNETWAYS
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsAndrei Pangin
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select TopicsJay Coskey
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207Jay Coskey
 
XQuery Extensions
XQuery ExtensionsXQuery Extensions
XQuery ExtensionsAaron Buma
 
Developing for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLDeveloping for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLJohn David Duncan
 
Scala ActiveRecord
Scala ActiveRecordScala ActiveRecord
Scala ActiveRecordscalaconfjp
 

What's hot (20)

Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
Kotlin coroutines and spring framework
Kotlin coroutines and spring frameworkKotlin coroutines and spring framework
Kotlin coroutines and spring framework
 
Spring data requery
Spring data requerySpring data requery
Spring data requery
 
Python client api
Python client apiPython client api
Python client api
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database Jones
 
JUnit5 and TestContainers
JUnit5 and TestContainersJUnit5 and TestContainers
JUnit5 and TestContainers
 
Backbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The BrowserBackbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The Browser
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
 
Scala active record
Scala active recordScala active record
Scala active record
 
Zabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet MensZabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet Mens
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap Dumps
 
Java 7 New Features
Java 7 New FeaturesJava 7 New Features
Java 7 New Features
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select Topics
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207
 
XQuery Extensions
XQuery ExtensionsXQuery Extensions
XQuery Extensions
 
Developing for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLDeveloping for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQL
 
Polyglot Persistence
Polyglot PersistencePolyglot Persistence
Polyglot Persistence
 
XQuery Design Patterns
XQuery Design PatternsXQuery Design Patterns
XQuery Design Patterns
 
Scala ActiveRecord
Scala ActiveRecordScala ActiveRecord
Scala ActiveRecord
 

Similar to Flux: Apache Storm Frictionless Topology Configuration & Deployment

Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Jonathon Brouse
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform TrainingYevgeniy Brikman
 
Infrastructure as code deployed using Stacker
Infrastructure as code deployed using StackerInfrastructure as code deployed using Stacker
Infrastructure as code deployed using StackerMessageMedia
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache MesosJoe Stein
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 
Getting started with Clojure
Getting started with ClojureGetting started with Clojure
Getting started with ClojureJohn Stevenson
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentationIlya Bogunov
 
Terraform infraestructura como código
Terraform infraestructura como códigoTerraform infraestructura como código
Terraform infraestructura como códigoVictor Adsuar
 
Testing NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoverageTesting NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoveragemlilley
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsSpeedment, Inc.
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing UpDavid Padbury
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSpark Summit
 

Similar to Flux: Apache Storm Frictionless Topology Configuration & Deployment (20)

Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017
 
Lobos Introduction
Lobos IntroductionLobos Introduction
Lobos Introduction
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform Training
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Infrastructure as code deployed using Stacker
Infrastructure as code deployed using StackerInfrastructure as code deployed using Stacker
Infrastructure as code deployed using Stacker
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Getting started with Clojure
Getting started with ClojureGetting started with Clojure
Getting started with Clojure
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
 
Jstl Guide
Jstl GuideJstl Guide
Jstl Guide
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Terraform infraestructura como código
Terraform infraestructura como códigoTerraform infraestructura como código
Terraform infraestructura como código
 
Testing NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoverageTesting NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoverage
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using Streams
 
Storm
StormStorm
Storm
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSource
 

More from P. Taylor Goetz

From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...P. Taylor Goetz
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormP. Taylor Goetz
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache StormP. Taylor Goetz
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 

More from P. Taylor Goetz (8)

From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Flux: Apache Storm Frictionless Topology Configuration & Deployment

  • 1. Flux
    Apache Storm Frictionless Topology Configuration & Deployment
    P. Taylor Goetz, Hortonworks
    @ptgoetz
    Storm BoF - Hadoop Summit Brussels 2015
  • 2. About me…
      • VP - Apache Storm
      • ASF Member
      • Member of Technical Staff, Hortonworks
  • 3. What is Flux?
      • An easier way to configure and deploy Apache Storm topologies
      • A YAML DSL for defining and configuring Storm topologies
      • And more…
  • 5. Because seeing duplication of effort makes me sad…
  • 6. What’s wrong here?

        public static void main(String[] args) throws Exception {
            String name = "myTopology";
            StormTopology topology = getTopology();

            // logic to determine if we're running locally or not…
            boolean runLocal = shouldRunLocal();

            // create necessary config options…
            Config conf = new Config();

            if (runLocal) {
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology(name, conf, topology);
            } else {
                StormSubmitter.submitTopology(name, conf, topology);
            }
        }

  • 7. What’s wrong here?

        public static void main(String[] args) throws Exception {
            String name = "myTopology";
            StormTopology topology = getTopology();

            // logic to determine if we're running locally or not…
            boolean runLocal = shouldRunLocal();

            // create necessary config options…
            Config conf = new Config();

            if (runLocal) {
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology(name, conf, topology);
            } else {
                StormSubmitter.submitTopology(name, conf, topology);
            }
        }

      • Configuration tightly coupled with code.
      • Changes require recompilation & repackaging.
  • 8. Wouldn’t this be easier?

        storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml

    OR…

        storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml
  • 9. Flux allows you to package all your Storm components once. Then wire, configure, and deploy topologies using a YAML definition.
  • 10. Flux Features
      • Easily configure and deploy Storm topologies (both Storm core and Microbatch API) without embedding configuration in your topology code
      • Support for existing topology code
      • Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
      • YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs, storm-hbase, etc.)
      • Convenient support for multi-lang components
      • External property substitution/filtering for easily switching between configurations/environments (similar to Maven-style ${variable.name} substitution)
  • 11. Flux YAML DSL
    A YAML definition consists of:
      • Topology Name (1)
      • Includes (0…*)
      • Config Map (0…1)
      • Components (0…*)
      • Spouts (1…*)
      • Bolts (1…*)
      • Streams (1…*)
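
To make the list above concrete, here is a rough skeleton of how those sections might sit together in one file. It is only a sketch: the `com.example.*` class names, the ids, and the `common.yaml` include are placeholders invented for illustration, not names from the talk.

```yaml
# Skeleton of a Flux topology definition (illustrative placeholders only).

# Topology Name (1)
name: "example-topology"

# Includes (0…*) - "common.yaml" is a hypothetical file
includes:
  - resource: false
    file: "common.yaml"
    override: false

# Config Map (0…1)
config:
  topology.workers: 1

# Components (0…*)
components:
  - id: "exampleComponent"
    className: "com.example.ExampleComponent"   # hypothetical class

# Spouts (1…*)
spouts:
  - id: "spout-1"
    className: "com.example.ExampleSpout"       # hypothetical class
    parallelism: 1

# Bolts (1…*)
bolts:
  - id: "bolt-1"
    className: "com.example.ExampleBolt"        # hypothetical class
    parallelism: 1

# Streams (1…*)
streams:
  - name: "spout-1 --> bolt-1"
    from: "spout-1"
    to: "bolt-1"
    grouping:
      type: SHUFFLE
```
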
  • 13. Config
    A Map-of-Maps (Objects) that will be passed to the topology at submission time (Storm config).

        # topology name
        name: "myTopology"

        # topology configuration
        config:
          topology.workers: 5
          topology.max.spout.pending: 1000
          # ...
  • 14. Components
      • Catalog (list/map) of Objects that can be used/referenced in other parts of the YAML configuration
      • Roughly analogous to Spring beans.
  • 15. Components
    Simple Java class with default constructor:

        # Components
        components:
          - id: "stringScheme"
            className: "storm.kafka.StringScheme"

  • 16. Components: Constructor Arguments
    Component classes can be instantiated with “constructorArgs” (a list of class constructor arguments):

        # Components
        components:
          - id: "zkHosts"
            className: "storm.kafka.ZkHosts"
            constructorArgs:
              - "localhost:2181"
  • 17. Components: References
    Components can be “referenced” throughout the YAML config and used as arguments:

        # Components
        components:
          - id: "stringScheme"
            className: "storm.kafka.StringScheme"

          - id: "stringMultiScheme"
            className: "backtype.storm.spout.SchemeAsMultiScheme"
            constructorArgs:
              - ref: "stringScheme"

  • 18. Components: Properties
    Components can be configured using JavaBean setter methods and public instance variables:

          - id: "spoutConfig"
            className: "storm.kafka.SpoutConfig"
            properties:
              - name: "forceFromStart"
                value: true
              - name: "scheme"
                ref: "stringMultiScheme"

  • 19. Components: Config Methods
    Call arbitrary methods to configure a component:

          - id: "recordFormat"
            className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
            configMethods:
              - name: "withFieldDelimiter"
                args: ["|"]

    References can be used here as well.
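
As a sketch of what a reference inside a config method could look like: the fragment below passes other components to a bolt's builder-style methods. It assumes components with the ids "recordFormat" and "fileNameFormat" are defined elsewhere in the same file; the bolt id and the HDFS URL are placeholders.

```yaml
# Sketch only: "recordFormat" and "fileNameFormat" are assumed to exist
# under `components`; the URL below is a placeholder.
bolts:
  - id: "hdfs-bolt"
    className: "org.apache.storm.hdfs.bolt.HdfsBolt"
    configMethods:
      # plain value argument
      - name: "withFsUrl"
        args: ["hdfs://localhost:8020"]
      # component references as arguments
      - name: "withRecordFormat"
        args:
          - ref: "recordFormat"
      - name: "withFileNameFormat"
        args:
          - ref: "fileNameFormat"
    parallelism: 1
```
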
  • 20. Spouts
    A list of objects that implement the IRichSpout interface and an associated parallelism setting.

        # spout definitions
        spouts:
          - id: "sentence-spout"
            className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
            # shell spout constructor takes 2 arguments: String[], String[]
            constructorArgs:
              # command line
              - ["node", "randomsentence.js"]
              # output fields
              - ["word"]
            parallelism: 1
            # ...

  • 21. Bolts
    A list of objects that implement the IRichBolt or IBasicBolt interface with an associated parallelism setting.

        # bolt definitions
        bolts:
          - id: "splitsentence"
            className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
            constructorArgs:
              # command line
              - ["python", "splitsentence.py"]
              # output fields
              - ["word"]
            parallelism: 1
            # ...

          - id: "count"
            className: "backtype.storm.testing.TestWordCounter"
            parallelism: 1
            # ...
  • 22. Spout and Bolt definitions are just extensions of “Component” with a “parallelism” attribute, so all component features (references, constructor args, properties, config methods) can be used.
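
For instance, a spout definition might take a previously defined component as a constructor argument. This is a minimal sketch that assumes a component with id "spoutConfig" (such as the `storm.kafka.SpoutConfig` from the Properties slide) is declared under `components`; the spout id is a placeholder.

```yaml
# Sketch: relies on a "spoutConfig" component being defined elsewhere in the file.
spouts:
  - id: "kafka-spout"
    className: "storm.kafka.KafkaSpout"
    constructorArgs:
      - ref: "spoutConfig"   # reference to a component, used as a constructor argument
    parallelism: 1
```
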
  • 23. Streams
      • Represent Spout-to-Bolt and Bolt-to-Bolt connections
      • In graph terms: “edges”
      • Also define Stream Groupings:
          • ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE, FIELDS, GLOBAL, or NONE.
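
A short sketch of two built-in groupings, reusing the spout and bolt ids from the Spouts and Bolts slides ("sentence-spout", "splitsentence", "count"). The stream names are free-form labels chosen here for readability.

```yaml
# Sketch: wires spout -> split bolt -> count bolt.
streams:
  - name: "sentence-spout --> splitsentence"
    from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE          # distribute tuples randomly across bolt tasks

  - name: "splitsentence --> count"
    from: "splitsentence"
    to: "count"
    grouping:
      type: FIELDS           # tuples with the same "word" go to the same task
      args: ["word"]
```
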
  • 24. Streams
    Custom stream grouping:

          - from: "bolt-1"
            to: "bolt-2"
            grouping:
              type: CUSTOM
              customClass:
                className: "backtype.storm.testing.NGrouping"
                constructorArgs:
                  - 1

    Again, you can use references, properties, and config methods.
  • 25. Filtering/Variable Substitution
    Define properties in an external properties file, and reference them in YAML using ${} syntax:

          - id: "rotationAction"
            className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
            configMethods:
              - name: "toDestination"
                args: ["${hdfs.dest.dir}"]

    The placeholder is replaced with the value of the property prior to YAML parsing.
  • 26. Filtering/Variable Substitution
    Environment variables can be referenced in YAML using ${ENV-} syntax:

          - id: "rotationAction"
            className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
            configMethods:
              - name: "toDestination"
                args: ["${ENV-HDFS_DIR}"]

    The placeholder is replaced with the value of the $HDFS_DIR environment variable prior to YAML parsing.
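
To illustrate the effect of both kinds of substitution, the sketch below shows a fragment as written and then as the YAML parser would presumably see it after filtering. The property value, the environment value, and the component ids are made up for the example.

```yaml
# Hypothetical inputs (not from the slides):
#   properties file passed with --filter contains:  hdfs.dest.dir=/tmp/flux/dest
#   environment (with --env-filter) contains:       HDFS_DIR=/tmp/flux/env

# As written in the topology file:
components:
  - id: "moveByProperty"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        args: ["${hdfs.dest.dir}"]
  - id: "moveByEnv"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        args: ["${ENV-HDFS_DIR}"]
---
# After substitution, before YAML parsing:
components:
  - id: "moveByProperty"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        args: ["/tmp/flux/dest"]
  - id: "moveByEnv"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        args: ["/tmp/flux/env"]
```
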
  • 27. File Includes and Overrides
    Include files/classpath resources and optionally override values:

        name: "include-topology"

        includes:
          - resource: true
            file: "/configs/shell_test.yaml"
            override: false #otherwise subsequent includes that define 'name' would override
  • 29. Existing Topologies
      • Alternative to YAML Spout/Bolt/Stream DSL
      • Same syntax
      • Works with transactional/micro-batch (Trident) topologies
      • Tell Flux about the class that will produce your topology
      • Components, references, constructor args, properties, config methods, etc. can all be used
  • 30. Existing Topologies
    Provide a class with a public method that returns a StormTopology instance:

        /**
         * Marker interface for objects that can produce `StormTopology` objects.
         *
         * If a `topology-source` class implements the `getTopology()` method, Flux will
         * call that method. Otherwise, it will introspect the given class and look for a
         * similar method that produces a `StormTopology` instance.
         *
         * Note that it is not strictly necessary for a class to implement this interface.
         * If a class defines a method with a similar signature, Flux should be able to find
         * and invoke it.
         *
         */
        public interface TopologySource {
            public StormTopology getTopology(Map<String, Object> config);
        }

    This can be a Spout/Bolt or Trident topology.
  • 31. Existing Topologies
    Define a topologySource to tell Flux how to configure the class that creates the topology:

        # configuration that uses an existing topology that does not implement TopologySource
        name: "existing-topology"
        topologySource:
          className: "org.apache.storm.flux.test.SimpleTopology"
          methodName: "getTopologyWithDifferentMethodName"
          constructorArgs:
            - "foo"
            - "bar"

    Components, references, constructor args, properties, config methods, etc. can all be used.
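
For contrast, here is a sketch of the simpler case. Going by the interface comment on the previous slide, a class that exposes `getTopology(Map<String, Object> config)` should not need a `methodName` entry. The class name below is a hypothetical placeholder.

```yaml
# Sketch: com.example.MyTopologySource is a hypothetical class that provides
# getTopology(Map<String, Object> config), so no methodName is given.
name: "existing-topology"
topologySource:
  className: "com.example.MyTopologySource"
```
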
  • 32. Flux Usage
      • Add the Flux dependency to your project.
      • Use the Maven shade plugin to create a fat jar file.
      • Use the `storm` command to run (locally) or deploy (remotely) your topology:

        storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>

  • 33. Flux Usage: Command Line Options

        usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux
                        [options] <topology-config.yaml>
         -d,--dry-run                 Do not run or deploy the topology. Just
                                      build, validate, and print information about
                                      the topology.
         -e,--env-filter              Perform environment variable substitution.
                                      Keys identified with `${ENV-[NAME]}` will be
                                      replaced with the corresponding `NAME`
                                      environment value.
         -f,--filter <file>           Perform property substitution. Use the
                                      specified file as a source of properties,
                                      and replace keys identified with
                                      ${[property name]} with the value defined in
                                      the properties file.
         -i,--inactive                Deploy the topology, but do not activate it.
         -l,--local                   Run the topology in local mode.
         -n,--no-splash               Suppress the printing of the splash screen.
         -q,--no-detail               Suppress the printing of topology details.
         -r,--remote                  Deploy the topology to a remote cluster.
         -R,--resource                Treat the supplied path as a class path
                                      resource instead of a file.
         -s,--sleep <ms>              When running locally, the amount of time to
                                      sleep (in ms.) before killing the topology
                                      and shutting down the local cluster.
         -z,--zookeeper <host:port>   When running in local mode, use the
                                      ZooKeeper at the specified <host>:<port>
                                      instead of the in-process ZooKeeper.
                                      (requires Storm 0.9.3 or later)
  • 34. With great power comes great responsibility. It’s up to you to avoid shooting yourself in the foot!
  • 36. Thank you! AMA… P. Taylor Goetz, Hortonworks @ptgoetz