SlideShare a Scribd company logo
1 of 42
Download to read offline
Build Large Scale Applications in YARN with
Henry Saputra (@Kingwulf) - hsaputra@apache.org
Terence Yim (@chtyim) - chtyim@apache.org
Apache Big Data Conference - North America - 2016
Disclaimer
Apache Twill is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by
Incubator. Incubation is required of all newly accepted projects until a further review indicates that the
infrastructure, communications, and decision making process have stabilized in a manner consistent with
other successful ASF projects. While incubation status is not necessarily a reflection of the completeness
or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF
Agenda
● Why Apache Twill?
● Architecture and Components
● Features
● Real World Enterprise Use Cases - CDAP
● Roadmap
● Q & A
Apache Hadoop ® YARN
● MapReduce NextGen aka MRv2
● New ResourceManager manages the global assignment of compute
resources to applications
● Introduce concept of ApplicationMaster per application to communicate
with ResourceManager for compute resource management
● http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-
site/index.html
Apache Hadoop ® YARN Architecture
Developing Application in YARN
● http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-
site/WritingYarnApplications.html
● It is actually not as “simple” as it sounds
● Lots of boilerplates code with very steep learning curve
● Given the power and generic nature of YARN, developing applications
directly on top of YARN could be very difficult
● Standard components:
○ Application Client
○ Application Master
○ Application
Apache Hadoop ® YARN Application
How YARN Application Works
Apache Twill
● Add simplicity to the power of YARN
○ Java thread-like programming model
○ Instead of running in multiple threads, it runs in many containers in YARN
● Incubated at Apache Software Foundation since November 2013
○ Has successfully created seven releases
○ http://twill.incubator.apache.org/index.html
Hello World in Twill
public class HelloWorld {
public static class HelloWorldRunnable extends AbstractTwillRunnable {
@Override
public void run() {
LOG.info("Hello World. My first distributed application.");
}
}
public static void main(String[] args) throws Exception {
TwillRunnerService twillRunner = new YarnTwillRunnerService(new YarnConfiguration(), "localhost:2181");
twillRunner.startAndWait();
TwillController controller = twillRunner.prepare(new HelloWorldRunnable())
.addLogHandler(new PrinterLogHandler(new PrintWriter(System.out, true)))
.start();
try {
controller.awaitTermination();
} catch (Exception ex) {
...
}
}
}
Why Apache Twill
● Apache Twill provides abstraction and virtualization for YARN to reduce
complexity to develop complex and distributed large scale applications
● Apache Twill allows developers to leverage the power of YARN by offering
programming paradigms
● Apache Twill offers commons needs for distributed large scale application
development
○ Lifecycle management
○ Service discovery
○ Distributed coordination and resiliency to failures
○ Real-time Logging
Architecture and Components
Main Apache Projects Used
● Apache Hadoop YARN
● Apache Hadoop HDFS
● Apache Zookeeper
● Apache Kafka
Features - 1
Main Features
1. Real-time logging
2. Resource Report
3. State Recovery
4. Elastic Scaling
5. Command Messages
6. Service Discovery
Features -2
Notable New Features
1. 0.5.0-incubating
a. Placement Policy APIs
b. Submitting to non default YARN Queue
c. Distributed Lock
2. 0.6.0-incubating
a. Restart instances of runnables for Twill applications
b. MapR Extension
c. Remove Guava Dependencies from client APIs
3. 0.7.0-incubating
a. Allow setting environment variable on Twill containers
b. Support for Azure Blob Storage
Real-time Logging
Resource Report - 1
/**
* This interface provides a snapshot of the resources an application is using
* broken down by each runnable.
*/
public interface ResourceReport {
// Get all the run resources being used by all instances of the specified runnable.
Collection<TwillRunResources> getRunnableResources(String runnableName);
// Get all the run resources being used across all runnables.
Map<String, Collection<TwillRunResources>> getResources();
// Get the resources application master is using.
TwillRunResources getAppMasterResources();
// Get the id of the application master.
String getApplicationId();
// Get the id of the application master.
List<String> getServices();
}
Resource Report - 2
/**
* Information about the container the {@link TwillRunnable}
* is running in.
*/
public interface TwillRunResources {
int getInstanceId();
int getVirtualCores();
int getMemoryMB();
String getHost();
String getContainerId();
Integer getDebugPort();
LogEntry.Level getLogLevel();
}
Resource Report - 3
● Client get the resource report from Twill using the TwillController.
getResourceReport API to return resource reporting
public interface TwillController extends ServiceController {
...
/**
* Get a snapshot of the resources used by the application, broken down by each runnable.
*
* @return A {@link ResourceReport} containing information about resources used by the application or
* null in case the user calls this before the application completely starts.
*/
@Nullable
ResourceReport getResourceReport();
...
}
State Recovery
Command Messages
Elastic Scaling
● Ability to add or reduce number of YARN containers to run the application
● Twill API TwillController.changeInstances is used to accomplish this
task
/**
* Changes the number of running instances of a given runnable.
*
* @param runnable The name of the runnable.
* @param newCount Number of instances for the given runnable.
* @return A {@link Future} that will be completed when the number running instances has been
* successfully changed. The future will carry the new count as the result. If there is any error
* while changing instances, it'll be reflected in the future.
*/
Future<Integer> changeInstances(String runnable, int newCount);
Service Discovery
Placement Policy API - 1 (New)
● Expose container placement policy from YARN
● Will allow Twill to allocate containers in specific racks and host based on
DISTRIBUTED deployment mode
Placement Policy API - 2 (New)
/**
* Defines a container placement policy.
*/
interface PlacementPolicy {
enum Type {
DISTRIBUTED, DEFAULT
}
Set<String> getNames();
Type getType();
Set<String> getHosts();
Set<String> getRacks();
}
Restart Instances for Twill Runnables - 1 (New)
● Instances of TwillRunnable will be run in YARN containers
● Each Twill application could have one or more instances of
TwillRunnable
● Twill provides ability to restart particular runnable instance without
affecting other runnables
● This is useful when certain runnables are not running well and you would
need to restart certain instances based on the identifier
Restart Instances for Twill Runnables - 2 (New)
/**
* For controlling a running application.
*/
public interface TwillController extends ServiceController {
...
Future<String> restartAllInstances(String runnable);
Future<Set<String>> restartInstances(Map<String, ? extends Set<Integer>> runnableToInstanceIds);
Future<String> restartInstances(String runnable, int instanceId, int... moreInstanceIds);
...
}
Setting Environment Variables on Containers (New)
● Provides ability to set environment variables on the YARN containers where TwillRunnable
instances are running
/**
* This interface exposes methods to set up the Twill runtime environment and start a Twill application.
*/
public interface TwillPreparer {
...
// Adds the set of environment variables that will be set as container environment variables for all runnables.
TwillPreparer withEnv(Map<String, String> env);
/**
* Adds the set of environment variables that will be set as container environment variables for the given runnable.
* Environment variables set through this method has higher precedence than the one set through {@link #withEnv(Map)}
* if there is a key clash.
*/
TwillPreparer withEnv(String runnableName, Map<String, String> env);
...
}
Real World Enterprise Usages - CDAP
● Cask Data Application Platform (CDAP) - http://cdap.io
○ Open source data application framework
○ Simplifies and enhances data application development and management
■ APIs for simplification, portability and standardization
● Works across wide range of Hadoop versions and all common distros
■ Built-in System services, such as metrics and logs aggregation, dataset
management, and distributed transaction service for common big data applications
needs
○ Extensions to enhance user experience
■ Hydrator - Interactive data pipeline construction
■ Tracker - Metadata discovery and data lineage
CDAP Logical View
Apache Twill in CDAP
● CDAP runs different types of work on YARN
○ Long running daemons
○ Real-time transactional streaming framework
○ REST services
○ Workflow execution
● CDAP only interacts with Twill
○ Greatly simplifies the CDAP code base
○ Just a matter of minutes to add support for new type of work to run on YARN
○ Native support of common needs
■ Service discovery
■ Leader election and distributed locking
■ Elastic scaling
■ Security
CDAP Architecture Diagram
Service Discovery
● CDAP exposes all functionalities through REST
● Almost all CDAP HTTP services are running in YARN
○ No fixed host and port.
○ Bind to ephemeral port
○ Announce the host and port through Twill
■ Unique service name for a given service type
● Router inspects the request URI to derive a service name
○ Uses Twill discovery service client to locate actual host and port
○ Proxy the request and response
Long Running Applications
● All CDAP services on YARN are long running
○ Transaction server, metrics and log processing, real-time data ingestion, …
● Many user applications are long running too
○ Real-time streaming, HTTP service, application daemon
● Not too big of a problem in non-secure cluster
○ Logs not collected, log files may get too big, …
■ Twill build-in log collections can help
● Secure cluster, specifically Kerberos enabled cluster
○ All all Hadoop services use delegation token
■ NN, RM, HBase Master, Hive, KMS, ...
○ YARN containers doesn’t have the keytab, and it should not, hence can’t update the token
Long Running Applications in Twill
● Twill provides native support for updating delegation tokens
○ TwillRunner.scheduleSecureStoreUpdate
● Update delegation tokens from the launcher process (kinit process)
○ Acquires new delegation tokens periodically
○ Serializes tokens to HDFS
○ Notifies all running applications about the update
■ Through command message
○ Each runnable refreshes delegation tokens by reading from HDFS
■ Requires a non-expired HDFS delegation token
● New launcher process will discovery all Twill apps from ZK
○ Can run HA launcher processes using leader election support from Twill
Scalability
● Many components in CDAP are linearly scalable, such as
○ Streaming data ingestion (through REST endpoint)
○ Log processing
■ Reads from Kafka, writes to HDFS
○ Metrics processing
■ Reads from Kafka, writes to timeseries table
○ User real-time streaming DAG
○ User HTTP service
● Twill supports adding/reducing YARN containers for a given TwillRunnable
○ No need to restart application
○ Guarantees a unique instance ID is assigned
■ Application can use it for partitioning
High Availability
● In production environment, it is important to have high availability
● Twill provides couple means to achieve that
○ Running multiple instances of the same TwillRunnable
○ Use dynamic service discovery to route requests
○ Twill Automatic restart of TwillRunnable container if it gets killed / exit abnormally
■ Killed container will be removed from the service discovery
■ Restarted container will be added to the service discovery
○ Built-in leader election support to have active-passive type of redundancy
■ Tephra service use that, as it requires only having one active server
○ Built-in distributed lock to help synchronization
■ Synchronize when there is configuration changes among TwillRunnable instances
Placement Policy
● CDAP can run multiple instances for a given service type
○ Scalability
○ Redundancy for availability
● YARN doesn’t expect applications care where containers run
○ Can provide location hint, but is not guaranteed
○ Depends on the YARN scheduler
● CDAP uses Twill to control container placement
○ Different instances of the same TwillRunnable runs on different host
○ Certain TwillRunnable cannot run on the same host
■ Stream handler, Tephra transaction server
● Both are heavy CPU and IO bound
Performance and Load Testing
● We perform load testing for CDAP components
○ Real-time stream ingestion handler
○ Tephra transaction server
● A scalable load testing framework written using Twill
○ Multiple REST clients in each TwillRunnable
■ One client per thread
○ Can gradually increase number of threads as well as number of containers
■ Use command message to increase threads
■ Use elastic scaling API to increase number of containers
○ Collect metrics through log messages
■ Use the built-in log collection support
Apache Twill in Enterprise
● CDAP, which uses Twill, is being used by large enterprise in production
● Apache Twill is proven framework
○ Has been running on different cluster configurations
■ AWS, Azure, bare metal, VMs
● Compatible with wide range of Hadoop versions
○ Vanilla Hadoop 2.0 - 2.7
○ HDP 2.1 - 2.3
○ CDH 5
○ MapR 4.1 - 5.1
Roadmap
● Expose newly added YARN features
● Smarter containers management
○ Run simple runnable in AM
○ Multiple runnables in one container
● Speedup application launch time
● Fine-grained control of containers lifecycle
○ When to start, stop and restart on failure
● Simple application launching with better classloader isolation
● Smaller footprint
○ Optional Kafka, optional ZooKeeper
● Generalize to run on more frameworks
○ Apache Mesos, Kubernetes
Thank you!
● Twill is Open Source and needs your contributions
○ http://twill.incubator.apache.org
○ dev@twill.incubator.apache.org
○ @ApacheTwill
● Contributions are welcomed!

More Related Content

What's hot

Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itBruno Faria
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링OpenStack Korea Community
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Treasure Data, Inc.
 
NGSI によるデータ・モデリング - FIWARE WednesdayWebinars
NGSI によるデータ・モデリング - FIWARE WednesdayWebinarsNGSI によるデータ・モデリング - FIWARE WednesdayWebinars
NGSI によるデータ・モデリング - FIWARE WednesdayWebinarsfisuda
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeDatabricks
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説
HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説
HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説You_Kinjoh
 
【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮
【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮
【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮Hibino Hisashi
 
EmbulkのGCS/BigQuery周りのプラグインについて
EmbulkのGCS/BigQuery周りのプラグインについてEmbulkのGCS/BigQuery周りのプラグインについて
EmbulkのGCS/BigQuery周りのプラグインについてSatoshi Akama
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方Yoshiyasu SAEKI
 
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)NTT DATA Technology & Innovation
 
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기AWSKRUG - AWS한국사용자모임
 
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかBigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかRyuji Tamagawa
 
kube-system落としてみました
kube-system落としてみましたkube-system落としてみました
kube-system落としてみましたShuntaro Saiba
 
2021 10-13 i ox query processing
2021 10-13 i ox query processing2021 10-13 i ox query processing
2021 10-13 i ox query processingAndrew Lamb
 
ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)
ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)
ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)NTT DATA Technology & Innovation
 
大規模DCのネットワークデザイン
大規模DCのネットワークデザイン大規模DCのネットワークデザイン
大規模DCのネットワークデザインMasayuki Kobayashi
 

What's hot (20)

Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using it
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -
 
NGSI によるデータ・モデリング - FIWARE WednesdayWebinars
NGSI によるデータ・モデリング - FIWARE WednesdayWebinarsNGSI によるデータ・モデリング - FIWARE WednesdayWebinars
NGSI によるデータ・モデリング - FIWARE WednesdayWebinars
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説
HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説
HTML5と WebSocket / WebRTC / Web Audio API / WebGL 技術解説
 
【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮
【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮
【第26回Elasticsearch勉強会】Logstashとともに振り返る、やっちまった事例ごった煮
 
EmbulkのGCS/BigQuery周りのプラグインについて
EmbulkのGCS/BigQuery周りのプラグインについてEmbulkのGCS/BigQuery周りのプラグインについて
EmbulkのGCS/BigQuery周りのプラグインについて
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
 
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
 
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかBigQueryの課金、節約しませんか
BigQueryの課金、節約しませんか
 
kube-system落としてみました
kube-system落としてみましたkube-system落としてみました
kube-system落としてみました
 
2021 10-13 i ox query processing
2021 10-13 i ox query processing2021 10-13 i ox query processing
2021 10-13 i ox query processing
 
ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)
ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)
ポスト・ラムダアーキテクチャの切り札? Apache Hudi(NTTデータ テクノロジーカンファレンス 2020 発表資料)
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
大規模DCのネットワークデザイン
大規模DCのネットワークデザイン大規模DCのネットワークデザイン
大規模DCのネットワークデザイン
 

Similar to Build Large Scale Apps in YARN with Apache Twill

Building Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache TwillBuilding Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache TwillCask Data
 
Microservices and modularity with java
Microservices and modularity with javaMicroservices and modularity with java
Microservices and modularity with javaDPC Consulting Ltd
 
Developing Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideDeveloping Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideMohanraj Thirumoorthy
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flinkRenato Guimaraes
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkDatabricks
 
Jakarta Concurrency: Present and Future
Jakarta Concurrency: Present and FutureJakarta Concurrency: Present and Future
Jakarta Concurrency: Present and FuturePayara
 
Oracle Drivers configuration for High Availability
Oracle Drivers configuration for High AvailabilityOracle Drivers configuration for High Availability
Oracle Drivers configuration for High AvailabilityLudovico Caldara
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 
Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018
Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018
Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018Amazon Web Services
 
Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Mike Villiger
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 
Scaling application with RabbitMQ
Scaling application with RabbitMQScaling application with RabbitMQ
Scaling application with RabbitMQNahidul Kibria
 

Similar to Build Large Scale Apps in YARN with Apache Twill (20)

Building Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache TwillBuilding Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache Twill
 
Microservices and modularity with java
Microservices and modularity with javaMicroservices and modularity with java
Microservices and modularity with java
 
Developing Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideDeveloping Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's Guide
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Arquitecturas de microservicios - Medianet Software
Arquitecturas de microservicios   -  Medianet SoftwareArquitecturas de microservicios   -  Medianet Software
Arquitecturas de microservicios - Medianet Software
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
 
Jakarta Concurrency: Present and Future
Jakarta Concurrency: Present and FutureJakarta Concurrency: Present and Future
Jakarta Concurrency: Present and Future
 
Springboot Microservices
Springboot MicroservicesSpringboot Microservices
Springboot Microservices
 
Oracle Drivers configuration for High Availability
Oracle Drivers configuration for High AvailabilityOracle Drivers configuration for High Availability
Oracle Drivers configuration for High Availability
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 
Java one2013
Java one2013Java one2013
Java one2013
 
Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018
Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018
Get SaaSy with Red Hat OpenShift on AWS (CON305-S) - AWS re:Invent 2018
 
Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOps
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Scaling application with RabbitMQ
Scaling application with RabbitMQScaling application with RabbitMQ
Scaling application with RabbitMQ
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Build Large Scale Apps in YARN with Apache Twill

  • 1. Build Large Scale Applications in YARN with Henry Saputra (@Kingwulf) - hsaputra@apache.org Terence Yim (@chtyim) - chtyim@apache.org Apache Big Data Conference - North America - 2016
  • 2. Disclaimer Apache Twill is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF
  • 3. Agenda ● Why Apache Twill? ● Architecture and Components ● Features ● Real World Enterprise Use Cases - CDAP ● Roadmap ● Q & A
  • 4. Apache Hadoop ® YARN ● MapReduce NextGen aka MRv2 ● New ResourceManager manages the global assignment of compute resources to applications ● Introduce concept of ApplicationMaster per application to communicate with ResourceManager for compute resource management ● http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn- site/index.html
  • 5. Apache Hadoop ® YARN Architecture
  • 6. Developing Application in YARN ● http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn- site/WritingYarnApplications.html ● It is actually not as “simple” as it sounds ● Lots of boilerplates code with very steep learning curve ● Given the power and generic nature of YARN, developing applications directly on top of YARN could be very difficult ● Standard components: ○ Application Client ○ Application Master ○ Application
  • 7. Apache Hadoop ® YARN Application
  • 9. Apache Twill ● Add simplicity to the power of YARN ○ Java thread-like programming model ○ Instead of running in multiple threads, it runs in many containers in YARN ● Incubated at Apache Software Foundation since November 2013 ○ Has successfully created seven releases ○ http://twill.incubator.apache.org/index.html
  • 10. Hello World in Twill public class HelloWorld { public static class HelloWorldRunnable extends AbstractTwillRunnable { @Override public void run() { LOG.info("Hello World. My first distributed application."); } } public static void main(String[] args) throws Exception { TwillRunnerService twillRunner = new YarnTwillRunnerService(new YarnConfiguration(), "localhost:2181"); twillRunner.startAndWait(); TwillController controller = twillRunner.prepare(new HelloWorldRunnable()) .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out, true))) .start(); try { controller.awaitTermination(); } catch (Exception ex) { ... } } }
  • 11. Why Apache Twill ● Apache Twill provides abstraction and virtualization for YARN to reduce complexity to develop complex and distributed large scale applications ● Apache Twill allows developers to leverage the power of YARN by offering programming paradigms ● Apache Twill offers commons needs for distributed large scale application development ○ Lifecycle management ○ Service discovery ○ Distributed coordination and resiliency to failures ○ Real-time Logging
  • 13. Main Apache Projects Used ● Apache Hadoop YARN ● Apache Hadoop HDFS ● Apache Zookeeper ● Apache Kafka
  • 14. Features - 1 Main Features 1. Real-time logging 2. Resource Report 3. State Recovery 4. Elastic Scaling 5. Command Messages 6. Service Discovery
  • 15. Features -2 Notable New Features 1. 0.5.0-incubating a. Placement Policy APIs b. Submitting to non default YARN Queue c. Distributed Lock 2. 0.6.0-incubating a. Restart instances of runnables for Twill applications b. MapR Extension c. Remove Guava Dependencies from client APIs 3. 0.7.0-incubating a. Allow setting environment variable on Twill containers b. Support for Azure Blob Storage
  • 17. Resource Report - 1 /** * This interface provides a snapshot of the resources an application is using * broken down by each runnable. */ public interface ResourceReport { // Get all the run resources being used by all instances of the specified runnable. Collection<TwillRunResources> getRunnableResources(String runnableName); // Get all the run resources being used across all runnables. Map<String, Collection<TwillRunResources>> getResources(); // Get the resources application master is using. TwillRunResources getAppMasterResources(); // Get the id of the application master. String getApplicationId(); // Get the id of the application master. List<String> getServices(); }
  • 18. Resource Report - 2 /** * Information about the container the {@link TwillRunnable} * is running in. */ public interface TwillRunResources { int getInstanceId(); int getVirtualCores(); int getMemoryMB(); String getHost(); String getContainerId(); Integer getDebugPort(); LogEntry.Level getLogLevel(); }
  • 19. Resource Report - 3 ● Client get the resource report from Twill using the TwillController. getResourceReport API to return resource reporting public interface TwillController extends ServiceController { ... /** * Get a snapshot of the resources used by the application, broken down by each runnable. * * @return A {@link ResourceReport} containing information about resources used by the application or * null in case the user calls this before the application completely starts. */ @Nullable ResourceReport getResourceReport(); ... }
  • 22. Elastic Scaling ● Ability to add or reduce number of YARN containers to run the application ● Twill API TwillController.changeInstances is used to accomplish this task /** * Changes the number of running instances of a given runnable. * * @param runnable The name of the runnable. * @param newCount Number of instances for the given runnable. * @return A {@link Future} that will be completed when the number running instances has been * successfully changed. The future will carry the new count as the result. If there is any error * while changing instances, it'll be reflected in the future. */ Future<Integer> changeInstances(String runnable, int newCount);
  • 24. Placement Policy API - 1 (New) ● Expose container placement policy from YARN ● Will allow Twill to allocate containers in specific racks and host based on DISTRIBUTED deployment mode
  • 25. Placement Policy API - 2 (New) /** * Defines a container placement policy. */ interface PlacementPolicy { enum Type { DISTRIBUTED, DEFAULT } Set<String> getNames(); Type getType(); Set<String> getHosts(); Set<String> getRacks(); }
  • 26. Restart Instances for Twill Runnables - 1 (New) ● Instances of TwillRunnable will be run in YARN containers ● Each Twill application could have one or more instances of TwillRunnable ● Twill provides ability to restart particular runnable instance without affecting other runnables ● This is useful when certain runnables are not running well and you would need to restart certain instances based on the identifier
  • 27. Restart Instances for Twill Runnables - 2 (New) /** * For controlling a running application. */ public interface TwillController extends ServiceController { ... Future<String> restartAllInstances(String runnable); Future<Set<String>> restartInstances(Map<String, ? extends Set<Integer>> runnableToInstanceIds); Future<String> restartInstances(String runnable, int instanceId, int... moreInstanceIds); ... }
  • 28. Setting Environment Variables on Containers (New) ● Provides ability to set environment variables on the YARN containers where TwillRunnable instances are running /** * This interface exposes methods to set up the Twill runtime environment and start a Twill application. */ public interface TwillPreparer { ... // Adds the set of environment variables that will be set as container environment variables for all runnables. TwillPreparer withEnv(Map<String, String> env); /** * Adds the set of environment variables that will be set as container environment variables for the given runnable. * Environment variables set through this method has higher precedence than the one set through {@link #withEnv(Map)} * if there is a key clash. */ TwillPreparer withEnv(String runnableName, Map<String, String> env); ... }
  • 29. Real World Enterprise Usages - CDAP ● Cask Data Application Platform (CDAP) - http://cdap.io ○ Open source data application framework ○ Simplifies and enhances data application development and management ■ APIs for simplification, portability and standardization ● Works across wide range of Hadoop versions and all common distros ■ Built-in System services, such as metrics and logs aggregation, dataset management, and distributed transaction service for common big data applications needs ○ Extensions to enhance user experience ■ Hydrator - Interactive data pipeline construction ■ Tracker - Metadata discovery and data lineage
  • 31. Apache Twill in CDAP ● CDAP runs different types of work on YARN ○ Long running daemons ○ Real-time transactional streaming framework ○ REST services ○ Workflow execution ● CDAP only interacts with Twill ○ Greatly simplifies the CDAP code base ○ Just a matter of minutes to add support for new type of work to run on YARN ○ Native support of common needs ■ Service discovery ■ Leader election and distributed locking ■ Elastic scaling ■ Security
  • 33. Service Discovery ● CDAP exposes all functionalities through REST ● Almost all CDAP HTTP services are running in YARN ○ No fixed host and port. ○ Bind to ephemeral port ○ Announce the host and port through Twill ■ Unique service name for a given service type ● Router inspects the request URI to derive a service name ○ Uses Twill discovery service client to locate actual host and port ○ Proxy the request and response
  • 34. Long Running Applications ● All CDAP services on YARN are long running ○ Transaction server, metrics and log processing, real-time data ingestion, … ● Many user applications are long running too ○ Real-time streaming, HTTP service, application daemon ● Not too big of a problem in non-secure cluster ○ Logs not collected, log files may get too big, … ■ Twill build-in log collections can help ● Secure cluster, specifically Kerberos enabled cluster ○ All all Hadoop services use delegation token ■ NN, RM, HBase Master, Hive, KMS, ... ○ YARN containers doesn’t have the keytab, and it should not, hence can’t update the token
  • 35. Long Running Applications in Twill ● Twill provides native support for updating delegation tokens ○ TwillRunner.scheduleSecureStoreUpdate ● Update delegation tokens from the launcher process (kinit process) ○ Acquires new delegation tokens periodically ○ Serializes tokens to HDFS ○ Notifies all running applications about the update ■ Through command message ○ Each runnable refreshes delegation tokens by reading from HDFS ■ Requires a non-expired HDFS delegation token ● New launcher process will discovery all Twill apps from ZK ○ Can run HA launcher processes using leader election support from Twill
  • 36. Scalability ● Many components in CDAP are linearly scalable, such as ○ Streaming data ingestion (through REST endpoint) ○ Log processing ■ Reads from Kafka, writes to HDFS ○ Metrics processing ■ Reads from Kafka, writes to timeseries table ○ User real-time streaming DAG ○ User HTTP service ● Twill supports adding/reducing YARN containers for a given TwillRunnable ○ No need to restart application ○ Guarantees a unique instance ID is assigned ■ Application can use it for partitioning
  • 37. High Availability ● In production environment, it is important to have high availability ● Twill provides couple means to achieve that ○ Running multiple instances of the same TwillRunnable ○ Use dynamic service discovery to route requests ○ Twill Automatic restart of TwillRunnable container if it gets killed / exit abnormally ■ Killed container will be removed from the service discovery ■ Restarted container will be added to the service discovery ○ Built-in leader election support to have active-passive type of redundancy ■ Tephra service use that, as it requires only having one active server ○ Built-in distributed lock to help synchronization ■ Synchronize when there is configuration changes among TwillRunnable instances
  • 38. Placement Policy ● CDAP can run multiple instances for a given service type ○ Scalability ○ Redundancy for availability ● YARN doesn’t expect applications care where containers run ○ Can provide location hint, but is not guaranteed ○ Depends on the YARN scheduler ● CDAP uses Twill to control container placement ○ Different instances of the same TwillRunnable runs on different host ○ Certain TwillRunnable cannot run on the same host ■ Stream handler, Tephra transaction server ● Both are heavy CPU and IO bound
  • 39. Performance and Load Testing ● We perform load testing for CDAP components ○ Real-time stream ingestion handler ○ Tephra transaction server ● A scalable load testing framework written using Twill ○ Multiple REST clients in each TwillRunnable ■ One client per thread ○ Can gradually increase number of threads as well as number of containers ■ Use command message to increase threads ■ Use elastic scaling API to increase number of containers ○ Collect metrics through log messages ■ Use the built-in log collection support
  • 40. Apache Twill in Enterprise ● CDAP, which uses Twill, is being used by large enterprise in production ● Apache Twill is proven framework ○ Has been running on different cluster configurations ■ AWS, Azure, bare metal, VMs ● Compatible with wide range of Hadoop versions ○ Vanilla Hadoop 2.0 - 2.7 ○ HDP 2.1 - 2.3 ○ CDH 5 ○ MapR 4.1 - 5.1
  • 41. Roadmap ● Expose newly added YARN features ● Smarter containers management ○ Run simple runnable in AM ○ Multiple runnables in one container ● Speedup application launch time ● Fine-grained control of containers lifecycle ○ When to start, stop and restart on failure ● Simple application launching with better classloader isolation ● Smaller footprint ○ Optional Kafka, optional ZooKeeper ● Generalize to run on more frameworks ○ Apache Mesos, Kubernetes
  • 42. Thank you! ● Twill is Open Source and needs your contributions ○ http://twill.incubator.apache.org ○ dev@twill.incubator.apache.org ○ @ApacheTwill ● Contributions are welcomed!