Get your hands on implementing a Flink app: A
tutorial
Christos Hadjinikolis & Satyasheel | DataReply.uk
Tutorial Overview:
• What is Apache Flink?
• Why Flink?
• Processing both bounded and unbounded data!
• Anatomy of a Flink App
• Windowing in Flink
• Event time & processing time in Flink
2/22/17C. Hadjinikolis & Satyasheel | DataReply 2
What is Apache Flink?
“A distributed data processing platform…”
Flink is a distributed stream and batch data processing platform
• Stream processing
…the real-time processing of data continuously, concurrently, and in a record-by-record fashion, where data is not static.
• Batch processing
…the execution of a series of programs, each on a set or "batch" of static inputs, rather than a single input (which would instead be a custom job).
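The contrast between the two models can be sketched in a few lines of plain Python. This is not Flink code, and the function names are made up for illustration: the batch function needs the whole static input before it produces one answer, while the streaming function emits an updated result after every record.

```python
from typing import Iterable, Iterator, List


def batch_sum(records: List[int]) -> int:
    """Batch style: the full, static input is available up front."""
    return sum(records)


def streaming_sum(records: Iterable[int]) -> Iterator[int]:
    """Stream style: emit an updated running total after every record."""
    total = 0
    for record in records:
        total += record
        yield total
```

For the input `[3, 1, 4]`, `batch_sum` returns a single value, whereas `list(streaming_sum(...))` contains one running total per record.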
…distributed processing dataset types
• Unbounded
Infinite datasets that are appended to continuously:
  • End users interacting with mobile or web applications
  • Physical sensors providing measurements
  • Financial markets
  • Machine log data
  • Surveillance camera frames
…distributed processing dataset types
• Bounded
Finite, unchanging datasets:
  • Pictures
  • Documents
  • Database tables
Why Flink?
“The world is turning more and more towards stream processing…”
Opt for Flink because it:
• Provides results that are accurate
• Is stateful and fault-tolerant and can seamlessly recover from failures
• Performs at large scale
…exactly-once semantics
• Stateful
…apps can maintain summaries of processed data.
• Checkpointing
…a mechanism that ensures that, in the event of a failure, each event is reflected in the application state exactly once, with no duplicates.
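A minimal sketch of the idea, in plain Python rather than Flink: the state and the source position are snapshotted together, so after a crash both are rolled back and the lost records are replayed without being counted twice. The class and function names are hypothetical, and real checkpointing in a distributed system is considerably more involved.

```python
from collections import Counter
from typing import List, Tuple


class CountingJob:
    """Toy stateful job: counts words and snapshots (offset, state) together."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.offset = 0  # position of the next record to read from the source
        self.checkpoint: Tuple[int, Counter] = (0, Counter())

    def process(self, source: List[str], upto: int) -> None:
        """Process records from the current offset up to (not including) upto."""
        while self.offset < upto:
            self.counts[source[self.offset]] += 1
            self.offset += 1

    def take_checkpoint(self) -> None:
        # Snapshot the state and the source position as one consistent unit.
        self.checkpoint = (self.offset, Counter(self.counts))

    def recover(self) -> None:
        # After a failure, roll back both the state and the source position.
        offset, counts = self.checkpoint
        self.offset, self.counts = offset, Counter(counts)


def run_with_failure(source: List[str]) -> Counter:
    job = CountingJob()
    job.process(source, 2)
    job.take_checkpoint()
    job.process(source, 3)            # this work is lost in the simulated crash
    job.recover()                     # restore the snapshot taken after 2 records
    job.process(source, len(source))  # replay from the checkpointed offset
    return job.counts
```

Even though the third record is processed twice, the final counts match a failure-free run, because its first effect on the state was discarded on recovery.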
…event time semantics
…event-time-based windowing
Event time makes it easy to compute accurate results over streams where events arrive out of order or late.
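To illustrate, here is a plain-Python sketch (not Flink code, with a hypothetical function name) that counts events per window using the timestamp carried by each event. Because the window is derived from the event's own timestamp, the arrival order does not change the result. The sketch ignores the question of when a window can be considered complete, which Flink handles with watermarks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_per_event_time_window(
    events: Iterable[Tuple[int, str]], window_size: int
) -> Dict[int, int]:
    """events are (event_timestamp, payload) pairs in arrival order.

    Returns a mapping {window_start: number_of_events}.
    """
    counts: Dict[int, int] = defaultdict(int)
    for timestamp, _payload in events:
        # The window is chosen from the event's own timestamp, not arrival time.
        window_start = timestamp - (timestamp % window_size)
        counts[window_start] += 1
    return dict(counts)
```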
…flexible windowing
Windows can be customized with flexible triggering conditions to support sophisticated streaming patterns based on:
• Time,
• Count, and
• Sessions.
…lightweight fault tolerance
Recovers from failures with zero data loss, while the tradeoff between reliability and latency is negligible.
…lightweight fault tolerance
Savepoints
• Provide a state versioning mechanism.
• Applications can be updated and can reprocess historic data with no lost state.
…scalable
Designed to run on large-scale clusters with many thousands of nodes.
So, in summary…
Flink is an open-source stream processing framework, which:
• Eliminates the “performance vs. reliability” problem, and
• Performs consistently in both categories.
Processing both bounded & unbounded data!
“Unbounding the boundaries…”
…the streaming model & bounded datasets
• DataStream API → unbounded data
• DataSet API → bounded data
A bounded dataset is handled inside Flink as a “finite stream”, with only a few minor differences from how Flink manages unbounded datasets.
Anatomy of a Flink App
“Let’s get this started…”
…Flink programs transform collections of data
Each program consists of the same basic parts:
• Obtain an execution environment,
• Load/create the initial data,
• Specify transformations on this data,
• Specify where to put the results of your computations,
• Trigger the program execution.
Create execution environment
Load streaming data
Trigger transformations
Specify dumping location
Execute
…lazy evaluation
When the program’s main method is executed, operations are not run immediately:
• Each operation is created and added to the program’s plan.
• Execution is explicitly triggered by an execute() call.
This helps with constructing an optimised data flow as a holistically planned unit.
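The behaviour can be mimicked in plain Python. The `LazyPipeline` class below is a hypothetical stand-in, not Flink's API: each call only records a step in a plan, and nothing touches the data until `execute()` is called.

```python
from typing import Callable, Iterable, List


class LazyPipeline:
    """Hypothetical stand-in for an execution environment plus one data stream."""

    def __init__(self, source: Iterable) -> None:
        self._source = source
        self._plan: List[Callable] = []  # operations are recorded, not run
        self.sink: List = []             # where results end up

    def map(self, fn: Callable) -> "LazyPipeline":
        self._plan.append(lambda items: (fn(x) for x in items))
        return self

    def filter(self, predicate: Callable) -> "LazyPipeline":
        self._plan.append(lambda items: (x for x in items if predicate(x)))
        return self

    def execute(self) -> List:
        """Run the whole recorded plan and write the results to the sink."""
        items = iter(self._source)
        for step in self._plan:
            items = step(items)
        self.sink.extend(items)
        return self.sink
```

Building `LazyPipeline([1, 2, 3, 4]).map(...).filter(...)` leaves `sink` empty; the results only appear once `execute()` is called, which is the point at which a real engine could also optimise the plan as a whole.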
Let’s take 15 mins
…
Windowing in Flink
“…a simple word count app.”
…so what is a window?
• A window is a way to get a snapshot of the streaming data.
• A snapshot can be based on time or other variables.
• One can define the window based on the number of records or other stream-specific variables.
…enough with theory! Give us some code!
A streaming word count example with no windowing
…updating states
• Flink automatically updates its states without the user explicitly doing so.
• To better appreciate this, it is worth contrasting Flink with Spark.
• Spark relies on micro-batches:
  • This means one has to define the batch size either in terms of time or size.
• Flink does not require defining a batch size.
  • It can process each and every new event individually (it is true stream processing!)
Let’s see an example
…
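As a stand-in for the live demo, here is a plain-Python sketch of a word count with no windowing (not Flink code; the function name is made up). The per-word state is updated on every single record, and an updated count is emitted immediately, with no batch size involved.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def streaming_word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, running_count) after every word, record by record."""
    counts: Counter = Counter()  # the state, kept across all records
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]
```

Feeding it the lines `"to be"` and `"or not to be"` emits six updates, and the later occurrences of "to" and "be" carry a count of 2.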
Windowing in Flink
“Don't waste a minute not being happy. If one window closes, run to the next window - or break down a door. …”
…so why use windowing at all?
• Aggregation on a DataStream is different from aggregation on a DataSet.
• One cannot count all the records of an infinite stream.
• DataStream aggregation makes sense on a windowed stream.
…what types of windowing can you use?
• Tumbling Windows:
  • Aligned, fixed-length, non-overlapping windows.
• Sliding Windows:
  • Aligned, fixed-length, overlapping windows.
• Session Windows:
  • Non-aligned, variable-length windows.
• Count Windows:
  • Fixed number of records/events, non-overlapping windows.
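Session windows are the least intuitive of the four, so here is a plain-Python sketch of the grouping rule (not Flink code; the function name and the `gap` parameter are illustrative). A new session starts whenever the time since the previous event reaches the inactivity gap, which is why the resulting windows are non-aligned and of variable length.

```python
from typing import List


def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions separated by at least `gap` of inactivity."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity gap reached: open a new session
    return sessions
```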
…anatomy of the window API
• Three window functions:
  • Window Assigner:
    • Responsible for assigning a given element to a window.
    • Depending on the definition of the window, one element can belong to one or more windows at a time.
  • Trigger:
    • Defines the condition for triggering window evaluation.
    • Controls when a given window created by the window assigner is evaluated.
  • Evictor:
    • An optional function that defines preprocessing before the window operation fires.
…understanding count window
• Window Assigner (for a count-based window → user-defined)
  • The window has no start or end, so it is not time-based.
  • For these windows we use the GlobalWindows window assigner.
  • For a given key, all values are placed into the same window.
keyValue.window(GlobalWindows.create())
  • The window API lets us attach the window assigner to the stream.
  • Every window assigner has a default trigger.
  • For global windows that trigger is NeverTrigger, which never fires.
  • So this window assigner has to be used with a custom trigger.
…understanding count window
• Count trigger
  • Once we have the window assigner, we have to define when the window needs to be triggered, for example:
trigger(CountTrigger.of(2))
  • This results in the window being evaluated every two records.
• Evictor
  • In addition, an evictor can be used for further preprocessing before a window operation fires, e.g. to remove every third element of each window.
  • Some default evictors:
    • CountEvictor, DeltaEvictor, TimeEvictor
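How the three pieces interact can be simulated in plain Python. This is a sketch, not Flink code: the function and parameter names are made up, `keep_last` plays the role of a count-based evictor, and the sketch assumes that a firing trigger does not clear the window, so elements accumulate unless the evictor removes them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def count_window_sums(
    events: Iterable[Tuple[str, int]],
    trigger_count: int,
    keep_last: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Sum the window contents per key every `trigger_count` records."""
    windows: Dict[str, List[int]] = defaultdict(list)    # one global window per key
    since_last_fire: Dict[str, int] = defaultdict(int)   # trigger state per key
    results: List[Tuple[str, int]] = []
    for key, value in events:
        window = windows[key]
        window.append(value)                  # assigner: same window for a key
        since_last_fire[key] += 1
        if since_last_fire[key] == trigger_count:        # count trigger fires
            since_last_fire[key] = 0
            if keep_last is not None and len(window) > keep_last:
                del window[: len(window) - keep_last]    # evictor: keep last N
            results.append((key, sum(window)))           # window function
    return results
```

With `trigger_count=2` and no evictor, the second firing for a key sums all four values seen so far; with `keep_last=2` it sums only the two most recent ones.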
The anatomy of a
window API
…
Tumbling Windows
…
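A tumbling window assignment can be sketched in plain Python (not Flink code; the function name is illustrative). Each timestamp falls into exactly one fixed-size window, and the window boundaries are aligned to multiples of the window size.

```python
from typing import Tuple


def tumbling_window(timestamp: int, size: int) -> Tuple[int, int]:
    """Return the [start, end) window that contains the timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)
```

For a window size of 5, timestamp 7 lands in the window (5, 10) and timestamp 10 opens the next window (10, 15).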
Sliding Windows
…
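A sliding window assignment, sketched in plain Python (not Flink code; the function name is illustrative). Windows have a fixed size but start every `slide` units, so when the slide is smaller than the size a single timestamp belongs to several overlapping windows.

```python
from typing import List, Tuple


def sliding_windows(timestamp: int, size: int, slide: int) -> List[Tuple[int, int]]:
    """Return every [start, end) window of length `size` containing the timestamp."""
    windows: List[Tuple[int, int]] = []
    start = timestamp - (timestamp % slide)  # latest window start <= timestamp
    while start > timestamp - size:          # window still covers the timestamp
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)
```

With a size of 10 and a slide of 5, timestamp 7 belongs to both (0, 10) and (5, 15). When the slide equals the size, the result degenerates to a single tumbling window.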
Let’s take 15 mins
…
Timing in Flink
“The two most powerful warriors are patience and time.”
…the time concept in streaming
• A streaming application is an always-running application.
  • …we need to take snapshots of the stream at various points.
  • …these points can be defined using a time component.
  • …we can group and correlate different events happening in the stream.
• Some constructs, such as windows, rely heavily on the time component.
• Most streaming frameworks support a single notion of time, which is mostly tied to the processing time.
…time in Flink
• When we say the last “t” seconds, what do we mean exactly? In Flink it’s one of three things:
  • Processing Time
“…the records that arrived for processing in the last "t" seconds.”
  • Event Time
“…all the records generated at the source in those last "t" seconds.”
  • Ingestion Time
    • The time at which events are ingested into the system.
    • This time lies between the event time and the processing time.
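The practical consequence is that the same event can land in different windows depending on which notion of time is used. The plain-Python sketch below (not Flink code; the `Event` class and its field names are hypothetical) attaches all three timestamps to one event and computes the one-minute window for each.

```python
from dataclasses import dataclass


@dataclass
class Event:
    payload: str
    event_time: int       # seconds: when the event was generated at the source
    ingestion_time: int   # seconds: when the event entered the system
    processing_time: int  # seconds: when an operator actually handled it


def window_start(event: Event, time_field: str, size: int) -> int:
    """Start of the fixed-size window the event falls into for the chosen time."""
    timestamp = getattr(event, time_field)
    return timestamp - (timestamp % size)
```

An event generated at second 58, ingested at second 61 and processed at second 125 falls into the windows starting at 0, 60 and 120 respectively.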
…time in Flink
Time in Flink
…
Thanks for your attention!

More Related Content

Viewers also liked

Becoming an Influencer: Strategies for Change
Becoming an Influencer: Strategies for ChangeBecoming an Influencer: Strategies for Change
Becoming an Influencer: Strategies for ChangeDr. Ed Cabellon
 
Semestrario Esparza Herrera Ramón
Semestrario Esparza Herrera RamónSemestrario Esparza Herrera Ramón
Semestrario Esparza Herrera RamónRamon Herrera
 
Aplicaciones de Herramientas
Aplicaciones de HerramientasAplicaciones de Herramientas
Aplicaciones de Herramientasdayitagaona08
 
Testing with Python, Pytest and Vim
Testing with Python, Pytest and VimTesting with Python, Pytest and Vim
Testing with Python, Pytest and VimMaximilian Jackson
 
Practical Guide to Product Roadmapping
Practical Guide to Product RoadmappingPractical Guide to Product Roadmapping
Practical Guide to Product RoadmappingJoe Granda
 
Value-Based Payments and Managed Care Contracting - Crash Course Webinar Series
Value-Based Payments and Managed Care Contracting - Crash Course Webinar SeriesValue-Based Payments and Managed Care Contracting - Crash Course Webinar Series
Value-Based Payments and Managed Care Contracting - Crash Course Webinar SeriesEpstein Becker Green
 
Piwik PRO The Real Cost of Data Privacy
Piwik PRO The Real Cost of Data Privacy Piwik PRO The Real Cost of Data Privacy
Piwik PRO The Real Cost of Data Privacy Piwik PRO
 
самостійна робота
самостійна роботасамостійна робота
самостійна роботаslavinskiy
 
Peter Hinssen @ Revolve! UnConference
Peter Hinssen @ Revolve! UnConferencePeter Hinssen @ Revolve! UnConference
Peter Hinssen @ Revolve! UnConferencenexxworks
 
Why Apache Flink is better than Spark by Rubén Casado
Why Apache Flink is better than Spark by Rubén CasadoWhy Apache Flink is better than Spark by Rubén Casado
Why Apache Flink is better than Spark by Rubén CasadoBig Data Spain
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Stephan Ewen
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuSlim Baltagi
 
Streaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the DivideStreaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the DivideBen Stopford
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingRobert Metzger
 

Viewers also liked (20)

TLC
TLCTLC
TLC
 
Becoming an Influencer: Strategies for Change
Becoming an Influencer: Strategies for ChangeBecoming an Influencer: Strategies for Change
Becoming an Influencer: Strategies for Change
 
Ondas y sonido
Ondas y sonidoOndas y sonido
Ondas y sonido
 
Semestrario Esparza Herrera Ramón
Semestrario Esparza Herrera RamónSemestrario Esparza Herrera Ramón
Semestrario Esparza Herrera Ramón
 
Aplicaciones de Herramientas
Aplicaciones de HerramientasAplicaciones de Herramientas
Aplicaciones de Herramientas
 
Testing with Python, Pytest and Vim
Testing with Python, Pytest and VimTesting with Python, Pytest and Vim
Testing with Python, Pytest and Vim
 
Practical Guide to Product Roadmapping
Practical Guide to Product RoadmappingPractical Guide to Product Roadmapping
Practical Guide to Product Roadmapping
 
Scooters for sale
Scooters for saleScooters for sale
Scooters for sale
 
Value-Based Payments and Managed Care Contracting - Crash Course Webinar Series
Value-Based Payments and Managed Care Contracting - Crash Course Webinar SeriesValue-Based Payments and Managed Care Contracting - Crash Course Webinar Series
Value-Based Payments and Managed Care Contracting - Crash Course Webinar Series
 
CUMPLEAÑOS 84 DE LA CIUDAD DE EL TIGRE pdf
CUMPLEAÑOS 84 DE LA CIUDAD DE EL TIGRE pdfCUMPLEAÑOS 84 DE LA CIUDAD DE EL TIGRE pdf
CUMPLEAÑOS 84 DE LA CIUDAD DE EL TIGRE pdf
 
Piwik PRO The Real Cost of Data Privacy
Piwik PRO The Real Cost of Data Privacy Piwik PRO The Real Cost of Data Privacy
Piwik PRO The Real Cost of Data Privacy
 
самостійна робота
самостійна роботасамостійна робота
самостійна робота
 
Peter Hinssen @ Revolve! UnConference
Peter Hinssen @ Revolve! UnConferencePeter Hinssen @ Revolve! UnConference
Peter Hinssen @ Revolve! UnConference
 
Digipak case study
Digipak case studyDigipak case study
Digipak case study
 
Why Apache Flink is better than Spark by Rubén Casado
Why Apache Flink is better than Spark by Rubén CasadoWhy Apache Flink is better than Spark by Rubén Casado
Why Apache Flink is better than Spark by Rubén Casado
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
 
Streaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the DivideStreaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the Divide
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer Checkpointing
 

Similar to Flink meetup

Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Stavros Kontopoulos
 
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowHow to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowLucas Arruda
 
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...tdc-globalcode
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDLionel Mommeja
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Stavros Kontopoulos
 
Tutorial 37 API Coding
Tutorial 37 API CodingTutorial 37 API Coding
Tutorial 37 API CodingMax Kleiner
 
Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16
Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16
Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16AppDynamics
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Timo Walther
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)KafkaZone
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thessaloniki
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
OpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven appsOpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven appsDaniel Krook
 
Cloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
Cloud Operations with Streaming Analytics using Apache NiFi and Apache FlinkCloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
Cloud Operations with Streaming Analytics using Apache NiFi and Apache FlinkDataWorks Summit
 
Serverless apps with OpenWhisk
Serverless apps with OpenWhiskServerless apps with OpenWhisk
Serverless apps with OpenWhiskDaniel Krook
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Big Data Spain
 

Similar to Flink meetup (20)

Apache flink
Apache flinkApache flink
Apache flink
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016
 
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowHow to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
 
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
Tutorial 37 API Coding
Tutorial 37 API CodingTutorial 37 API Coding
Tutorial 37 API Coding
 
Nexmark with beam
Nexmark with beamNexmark with beam
Nexmark with beam
 
Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16
Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16
Getting More Out of the Node.js, PHP, and Python Agents - AppSphere16
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
OpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven appsOpenWhisk - A platform for cloud native, serverless, event driven apps
OpenWhisk - A platform for cloud native, serverless, event driven apps
 
Introduction to Redux.pptx
Introduction to Redux.pptxIntroduction to Redux.pptx
Introduction to Redux.pptx
 
Cloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
Cloud Operations with Streaming Analytics using Apache NiFi and Apache FlinkCloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
Cloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
 
Serverless apps with OpenWhisk
Serverless apps with OpenWhiskServerless apps with OpenWhisk
Serverless apps with OpenWhisk
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...
 

Recently uploaded

The SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsThe SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsDILIPKUMARMONDAL6
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 

Recently uploaded (20)

The SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsThe SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teams
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
US Department of Education FAFSA Week of Action

Flink meetup

  • 1. Get your hands on implementing a Flink app: A tutorial Christos Hadjinikolis & Satyasheel | DataReply.uk
  • 2. Tutorial Overview:
      • What is Apache Flink?
      • Why Flink?
      • Processing both bounded and unbounded data!
      • Anatomy of a Flink App
      • Windowing in Flink
      • Event time & Processing time in Flink
      2/22/17, C. Hadjinikolis & Satyasheel | DataReply
  • 3. What is Apache Flink? “A distributed data processing platform…”
  • 4. Flink is a distributed stream & batch data processing platform
      • Stream processing: …the real-time processing of data continuously, concurrently, and in a record-by-record fashion, where data is not static.
      • Batch processing: …the execution of a series of programs each on a set or "batch" of static inputs, rather than a single input (which would instead be a custom job).
  • 5. …distributed processing dataset types
      • Unbounded: infinite datasets that are appended to continuously:
          • End users interacting with mobile or web applications
          • Physical sensors providing measurements
          • Financial markets
          • Machine log data
          • Surveillance camera frames
  • 6. …distributed processing dataset types
      • Bounded: finite, unchanging datasets:
          • Pictures
          • Documents
          • Database tables
  • 7. Why Flink? “The world is turning more and more towards stream processing…”
  • 8. Opt for Flink because it:
      • Provides results that are accurate
      • Is stateful and fault-tolerant and can seamlessly recover from failures
      • Performs at large scale
  • 9. …exactly-once semantics
      • Stateful: … apps can maintain summaries of processed data.
      • Checkpointing: … a mechanism that ensures that in the event of failure no duplicate re-computation of an event will take place.
  • 10. …event time semantics …event-time-based windowing
      Event time makes it easy to compute accurate results over streams where events arrive out of order and where events may arrive delayed.
  • 11. … flexible windowing
      Windows can be customized with flexible triggering conditions to support sophisticated streaming patterns based on:
      • Time
      • Count
      • Sessions
  • 12. … lightweight fault tolerance
      Recovers from failures with zero data loss while the tradeoff between reliability and latency is negligible.
  • 13. … lightweight fault tolerance: Savepoints
      • Provide a state versioning mechanism.
      • Applications can update and reprocess historic data with no lost state.
  • 14. … Scalable
      Designed to run on large-scale clusters with many thousands of nodes.
  • 15. So, in summary… Flink is an open-source stream processing framework, which:
      • Eliminates the “performance vs. reliability” problem, and
      • Performs consistently in both categories.
  • 16. Processing both bounded & unbounded data! “Unbounding the boundaries…”
  • 17. …the streaming model & bounded datasets
      • DataStream API → unbounded data
      • DataSet API → bounded data
      A bounded dataset is handled inside of Flink as a “finite stream”, with only a few minor differences in how Flink manages bounded vs. unbounded datasets.
  • 18. Anatomy of a Flink App “Let’s get this started…”
  • 19. …Flink programs transform collections of data
      Each program consists of the same basic parts:
      • Obtain an execution environment,
      • Load/create the initial data,
      • Specify transformations on this data,
      • Specify where to put the results of your computations,
      • Trigger the program execution.
  • 20. Create execution environment; Load streaming data; Trigger transformations; Specify dumping location; Execute
  • 21. …Lazy evaluation
      When the program’s main method is executed:
      • Each operation is created and added to the program’s plan.
      • Execution is explicitly triggered by an execute() call.
      This helps with constructing an optimised data-flow as a holistically planned unit.
  • 22. Let's take 15 mins …
  • 23. Windowing in Flink “…a simple word count app.”
  • 24. …so what is a window?
      • A window is a way to get a {snapshot} of the streaming data.
      • A {snapshot} can be based on time or other variables.
      • One can define the window based on the number of records or other stream-specific variables.
  • 25. …enough with theory! Give us some code! A streaming word count example with no windowing
  • 26. …updating states
      • Flink automatically updates its state without the user explicitly doing so.
      • To better appreciate this, it is worth contrasting Flink with Spark.
      • Spark relies on micro-batches:
          • This means one has to define the batch size either in terms of time or size.
      • Flink does not require defining a batch size.
          • It can process each and every new event individually (it is true stream processing!)
  • 27. Let's see an example …
  • 28. Windowing in Flink “Don't waste a minute not being happy. If one window closes, run to the next window - or break down a door. …”
  • 29. …so why use windowing at all?
      • Aggregation on a DataStream is different from aggregation on a DataSet.
      • One cannot count all records on an infinite stream.
      • DataStream aggregation makes sense on a windowed stream.
  • 30. …what types of windowing can you use?
      • Tumbling Windows: aligned, fixed-length, non-overlapping windows.
      • Sliding Windows: aligned, fixed-length, overlapping windows.
      • Session Windows: non-aligned, variable-length windows.
      • Count Windows: fixed number of records/events, non-overlapping windows.
  • 31. …anatomy of the window API
      Three window functions:
      • Window Assigner:
          • Responsible for assigning a given element to a window.
          • Depending upon the definition of the window, one element can belong to one or more windows at a time.
      • Trigger:
          • Defines the condition for triggering window evaluation.
          • This function controls when a given window created by the window assigner is evaluated.
      • Evictor:
          • An optional function which defines the preprocessing before firing window operations.
  • 32. …understanding count window
      • Window Assigner (for a count-based window → user-defined)
          • No start or end to the window, therefore the window is non-time based.
          • For these windows we use the GlobalWindows window assigner.
          • For a given key, all key-values are filled into the same window: keyValue.window(GlobalWindows.create())
      • The window API allows us to add the window assigner to the window.
      • Every window assigner has a default trigger.
          • For global windows that trigger is NeverTrigger, which never triggers.
          • So, this window assigner has to be used with a custom trigger.
  • 33. …understanding count window
      • Count trigger
          • Once we have the window assigner, we have to define when the window needs to be triggered, for example: trigger(CountTrigger.of(2))
          • This results in the window being evaluated every two records.
      • Evictor
          • In addition to these, an evictor can be used for further preprocessing tasks before firing a window operation, e.g. to remove every 3rd element of each window.
          • Some default evictors: CountEvictor, DeltaEvictor, TimeEvictor
  • 34. The anatomy of a window API …
  • 35. Tumbling Windows …
  • 36. Sliding Windows …
  • 37. Let's take 15 mins …
  • 38. Timing in Flink “The two most powerful warriors are patience and time.”
  • 39. …the time concept in streaming
      • A streaming application is an always-running application.
      • ..we need to take snapshots of the stream at various points.
      • ..these points can be defined using a time component.
      • ..we can group and correlate different events happening in the stream.
      • Some constructs, like windows, heavily use the time component.
      • Most streaming frameworks support a single meaning of time, which is mostly tied to the processing time.
  • 40. …time in Flink
      When we say, last “t” seconds, what do we mean exactly? Well, in Flink it’s one of three things:
      • Processing Time: “…the records arrived in last "t" seconds for the processing.”
      • Event Time: “… all the records generated in those last "t" seconds at the source.”
      • Ingestion Time:
          • The time when events are ingested into the system.
          • This time is in between the event time and the processing time.
  • 41. …time in Flink
  • 42. Time in Flink …
  • 43. Thanks for your attention!
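
To make the windowing slides above a little more concrete, here is a minimal sketch in plain Python (standard library only). It is not Flink code and does not use Flink's API: the function names `tumbling_windows`, `sliding_windows` and `count_trigger_sums` are hypothetical, chosen only for illustration. It assumes integer timestamps and mimics two ideas from the slides: a window assigner that maps an element to one window (tumbling) or several windows (sliding), and a count trigger that evaluates a never-ending per-key window every N elements.

```python
from collections import defaultdict


def tumbling_windows(timestamp, size):
    """Assign a timestamp to the single [start, end) window containing it."""
    start = timestamp - (timestamp % size)
    return [(start, start + size)]


def sliding_windows(timestamp, size, slide):
    """Assign a timestamp to every [start, end) window of length `size`,
    started every `slide` units, that contains it (windows overlap)."""
    windows = []
    start = timestamp - (timestamp % slide)  # latest window start <= timestamp
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def count_trigger_sums(events, every):
    """Mimic a per-key global window evaluated by a count trigger.

    Each key has one window with no start or end. The window is evaluated
    (here: summed) each time `every` more elements arrive for that key.
    Elements are never removed, so each result covers everything seen so far.
    """
    buffers = defaultdict(list)
    fired = []
    for key, value in events:
        buffers[key].append(value)
        if len(buffers[key]) % every == 0:
            fired.append((key, sum(buffers[key])))
    return fired


if __name__ == "__main__":
    print(tumbling_windows(7, 5))        # one window
    print(sliding_windows(7, 10, 5))     # two overlapping windows
    words = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("a", 1)]
    print(count_trigger_sums(words, 2))  # fires every second element per key
```

With a window size of 10 and a slide of 5, the timestamp 7 falls into both [0, 10) and [5, 15), which is the "one element can belong to one or more windows" case from slide 31.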

Editor's Notes

  1. Second, two types of execution models. Streaming: processing that executes continuously as long as data is being produced. Batch: processing that is executed and runs to completion in a finite amount of time, releasing computing resources when finished. It’s possible, though not necessarily optimal, to process either type of dataset with either type of execution model. For instance, batch execution has long been applied to unbounded datasets despite potential problems with windowing, state management, and out-of-order data. Flink relies on a streaming execution model, which is an intuitive fit for processing unbounded datasets: streaming execution is continuous processing on data that is continuously produced. And alignment between the type of dataset and the type of execution model offers many advantages with regard to accuracy and performance.
  2. Before we go into detail about Flink, let’s review at a higher level the types of datasets you’re likely to encounter when processing data as well as types of execution models you can choose for processing. These two ideas are often conflated, and it’s useful to clearly separate them. First, two types of datasets. Unbounded: infinite datasets that are appended to continuously. Bounded: finite, unchanging datasets. Many real-world data sets that are traditionally thought of as bounded or “batch” data are in reality unbounded datasets. This is true whether the data is stored in a sequence of directories on HDFS or in a log-based system like Apache Kafka. Examples of unbounded datasets include but are not limited to: end users interacting with mobile or web applications, physical sensors providing measurements, financial markets, and machine log data.
  3. We have all interacted with bounded datasets on our machines, such as pictures, documents of any kind, database tables, etc.
  4. Earlier, we discussed aligning the type of dataset (bounded vs. unbounded) with the type of execution model (batch vs. streaming). Many of the Flink features listed below–state management, handling of out-of-order data, flexible windowing–are essential for computing accurate results on unbounded datasets and are enabled by Flink’s streaming execution model. Flink guarantees exactly-once semantics for stateful computations. ‘Stateful’ means that applications can maintain an aggregation or summary of data that has been processed over time, and Flink’s checkpointing mechanism ensures exactly-once semantics for an application’s state in the event of a failure.
  5. Flink guarantees exactly-once semantics for stateful computations. ‘Stateful’ means that applications can maintain an aggregation or summary of data that has been processed over time, and Flink’s checkpointing mechanism ensures exactly-once semantics for an application’s state in the event of a failure.
  6. Flink supports stream processing and windowing with event time semantics. Event time makes it easy to compute accurate results over streams where events arrive out of order and where events may arrive delayed.
  7. Flink supports flexible windowing based on time, count, or sessions in addition to data-driven windows. Windows can be customized with flexible triggering conditions to support sophisticated streaming patterns. Flink’s windowing makes it possible to model the reality of the environment in which data is created.
  8. … allows the system to maintain high throughput rates and provide exactly-once consistency guarantees at the same time. Flink recovers from failures with zero data loss while the tradeoff between reliability and latency is negligible.
  9. Flink’s savepoints provide a state versioning mechanism, making it possible to update applications or reprocess historic data with no lost state and minimal downtime.
  10. Flink is designed to run on large-scale clusters with many thousands of nodes, and in addition to a standalone cluster mode, Flink provides support for YARN and Mesos.
  11. In summary, Apache Flink is an open-source stream processing framework that eliminates the “performance vs. reliability” tradeoff often associated with open-source streaming engines and performs consistently in both categories.
  12. Earlier in this write-up, we introduced the streaming execution model (“processing that executes continuously, an event-at-a-time”) as an intuitive fit for unbounded datasets. So how do bounded datasets relate to the stream processing paradigm? In Flink’s case, the relationship is quite natural. A bounded dataset can simply be treated as a special case of an unbounded one, so it’s possible to apply all of the same streaming concepts that we’ve laid out above to finite data. This is exactly how Flink’s DataSet API behaves. A bounded dataset is handled inside of Flink as a “finite stream”, with only a few minor differences in how Flink manages bounded vs. unbounded datasets. And so it’s possible to use Flink to process both bounded and unbounded data, with both APIs running on the same distributed streaming execution engine–a simple yet powerful architecture.
  13. Lazy Evaluation: All Flink programs are executed lazily. When the program’s main method is executed, the data loading and transformations do not happen directly. Rather, each operation is created and added to the program’s plan. The operations are actually executed when the execution is explicitly triggered by an execute() call on the execution environment. Whether the program is executed locally or on a cluster depends on the type of execution environment. The lazy evaluation lets you construct sophisticated programs that Flink executes as one holistically planned unit.
  14. For example, if we create a window of 5 seconds, then it will contain all the records which arrived in that time frame. Why do we need windowing? Aggregation on a DataStream is different from aggregation on a DataSet: one cannot count all records on an infinite stream. DataStream aggregation makes sense on a windowed stream.
  15. In Spark, after each batch, the state has to be updated explicitly if you want to keep track of the word count across batches. But in Flink the state is updated implicitly as and when new records arrive.
  16. Most of the window operations are encouraged to be used on a KeyedStream. A KeyedStream is a DataStream which is partitioned by key. This partitioning by key allows windows to be distributed across machines, resulting in good performance.
  17. Triggering: Most of the window operations are encouraged to be used on a KeyedStream. A KeyedStream is a DataStream which is partitioned by key. This partitioning by key allows windows to be distributed across machines, resulting in good performance. Evictor: Like removing the third element in a count window of 10 elements…
  18. Most of the window operations are encouraged to be used on a KeyedStream. A KeyedStream is a DataStream which is partitioned by key. This partitioning by key allows windows to be distributed across machines, resulting in good performance.
  19. **CountEvictor:** keeps up to a user-specified number of elements from the window and discards the remaining ones from the beginning of the window buffer. **DeltaEvictor:** takes a DeltaFunction and a threshold, computes the delta between the last element in the window buffer and each of the remaining ones, and removes the ones with a delta greater than or equal to the threshold. **TimeEvictor:** takes as argument an interval in milliseconds and, for a given window, finds the maximum timestamp max_ts among its elements and removes all the elements with timestamps smaller than max_ts - interval. **Note:** All evictors apply their logic before the window function.
  20. In Flink it depends: it can be one of the following three. Processing Time: Most streaming applications use this concept, and it is the one most familiar to users. This time is tracked using a clock run by the processing engine. So, last "t" seconds means the records that arrived in the last "t" seconds for processing. Processing time is a very good way of keeping track of time, but it is not always helpful. Let's say we want to measure the state of a sensor at a given point in time, so we want to collect the events from that time. But if events arrive late at the processing system for various reasons, we may miss some of them, as the processing clock does not care about the actual time of the events. To address this, Flink supports another kind of time called event time. Event Time: This time is embedded in the data, meaning it comes with the data. So here, last "t" seconds means all the records generated in those last "t" seconds at the source. These may arrive out of order for processing. This time is independent of the clock kept by the processing engine. Event time is extremely useful for handling late-arriving events. Ingestion Time: Ingestion time is the time when events are ingested into the system. This time is in between the event time and the processing time. Normally, with processing time, each machine in the cluster assigns timestamps to track events using its own clock. This may result in a slightly inconsistent view of the data, as there may be time differences across the cluster. With ingestion time, the timestamp is assigned at ingestion, so that all the machines in the cluster have the exact same view. It is useful for calculating results on data that arrives in order at the level of ingestion.
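
The difference between processing time and event time described in the last note can be illustrated with a small plain-Python sketch (standard library only, not Flink's API). The helper `count_per_window` and the sample `readings` are hypothetical and exist only for this illustration. Each reading carries the time it was generated at the source and the time it reached the processor, and the same 10-second tumbling windows are computed under both notions of time. The sketch deliberately ignores how a real engine decides when an event-time window is complete (Flink uses watermarks for that).

```python
from collections import defaultdict


def count_per_window(records, size, time_of):
    """Count records per tumbling window of length `size`.

    `time_of` picks which timestamp of a record defines its window,
    so the same function can model event time or processing time.
    Returns {window_start: count}, ordered by window start.
    """
    counts = defaultdict(int)
    for record in records:
        t = time_of(record)
        counts[t - (t % size)] += 1
    return dict(sorted(counts.items()))


# Hypothetical sensor readings as (event_time, arrival_time), in seconds.
# The third reading was generated at second 4 but arrived late, at second 11.
readings = [(1, 2), (3, 4), (4, 11), (12, 13)]

by_event_time = count_per_window(readings, 10, lambda r: r[0])
by_processing_time = count_per_window(readings, 10, lambda r: r[1])

if __name__ == "__main__":
    print("event time:     ", by_event_time)
    print("processing time:", by_processing_time)
```

Under event time the late reading is still counted in the window starting at 0, where it was generated; under processing time it is counted in the window starting at 10, where it happened to arrive.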