Bay Area Apache Flink Meetup Community Update August 2015
1. Bay Area Apache Flink Meetup #2
Distributed Stream and Graph Processing
Community Update
August 2015
Henry Saputra
Committer and PMC Member
hsaputra@apache.org
@Kingwulf
2. Apache Flink is an open source platform for
scalable batch and stream data processing.
Apache Flink is …
• The core of Apache Flink is a distributed streaming dataflow engine.
• Executing dataflows in parallel on clusters
• Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers
3. One engine for many use cases
Real-time streaming topologies
Machine learning at scale
Graph analysis
Long batch pipelines
4. What happened? - 1
• New PMC member: Maximilian Michels
• New Committer: Chesnay Schepler
• Discussions for a 0.9.1 release have started
• Apache Flink is becoming more popular:
– 1000+ Twitter followers
– 500+ GitHub stars
– Named an open source Big Data project to watch by ZDNet
– Flink Forward schedule with great speakers announced
5. What happened? - 2
• Apache Flink on Wikipedia: https://en.wikipedia.org/wiki/Apache_Flink
• New JobManager Dashboard
• Apache SAMOA 0.3.0-incubating with Flink integration
• New “Features” page
• Contributors list (can you spot your name?): https://cwiki.apache.org/confluence/display/FLINK/List+of+contributors
10. In master (0.10-SNAPSHOT) - 1
• Gelly Scala API
• More improvements and fixes for YARN
• Flink dropped Java 6 support
• Streaming connector for Elasticsearch
• Sampling operation on DataSet API
• A lot of bug fixes:
– Streaming: APIs, general stability, Kafka connector
11. In master (0.10-SNAPSHOT) - 2
• Low watermarks / Event time
• New JobManager (JM) Dashboard
• Akka messages are now aware of leader IDs (for HA)
• ZooKeeper integration (for HA)
• Live accumulators (runtime only)
• Stability improvements
12. Articles and Mentions
• High-throughput, low-latency, and exactly-once stream processing with Apache Flink [1]
• Introducing Gelly: Graph Processing with Apache Flink [2]
• Apache Flink and the case for stream processing [3]
• Crunching Parquet Files with Apache Flink [4]
• The morning paper: Asynchronous Distributed Snapshots for Distributed Dataflows [5]
• Five open source Big Data projects to watch [6]
• Big Data Performance Engineering: Examples from Hadoop, Pig, HBase, Flink and Spark [7]
[1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
[3] http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html
[4] https://medium.com/@istanbul_techie/crunching-parquet-files-with-apache-flink-200bec90d8a7
[5] http://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
[6] http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/
[7] http://www.bigsynapse.com/addressing-big-data-performance
13. New Meetups and Events
• Chicago: Flink Training @ Capital One
• Bay Area: Stream & Graph Processing @ MapR
15. Upcoming
• Sept 15: Washington DC Area Apache Flink Meetup
• Sept 17: StreamProcessing.be meetup
• Sept 28-30: Flink Talks at ApacheCon Big Data Budapest
New Meetup groups:
• New York
• Boston
16. Flink Forward schedule published
• http://flink-forward.org/?post_type=day
• Talks by Google, Data Artisans, Huawei, Capital One, Bouygues, Ericsson, Amadeus, ResearchGate, Red Hat, and many more.
50% off for this meetup’s guests: FlinkMeetupBayArea50