Over the past decade, a number of technologies have revolutionized the way we do analytics in banking. This talk summarizes the journey from classical offline statistical modeling to the latest real-time streaming predictive analytics. In particular, we look at Hadoop and how this distributed computing paradigm has evolved with the advent of in-memory computing and distributed machine learning using Spark. Finally, we describe how to make data science actionable and how to overcome some of the limitations of current batch processing with streaming analytics.
We are living through the big data revolution. But what about fast data? Analytics on recent data is becoming increasingly relevant, since it provides better insight and better models in a world of rapidly changing trends and conditions.
Streaming analytics makes it possible to compute on and process data and events as soon as they enter the data system, providing unprecedented levels of reactiveness. Customers enjoy live, personalized information streams. Companies can be more effective in marketing, security, operational excellence and business process management.
In this talk we start from traditional batch processes, touch upon the latest developments in big data and Hadoop, and then move further into the world of fast-moving data.
We will explore some of the systems and tools used in streaming analytics, such as Spark, Samza, Kafka and Akka, and describe some typical IT architectures and data processing patterns for streaming data. Finally, we will look at how to combine streaming data with an existing offline batch analytical solution.
Presented at Big Data & Analytics Innovation Summit
The Innovation Enterprise, November 11 & 12, London, 2015
Fast data and Streaming analytics
1. Fast Data and
Streaming Analytics
Natalino Busa
Enterprise Data Architect at ING
The Evolution of Data Analytics
2. @natbusa | linkedin.com: Natalino Busa
ING group
http://www.ing.com/About-us/Purpose-Strategy.htm
3.
ING group
http://www.ing.com/About-us/Purpose-Strategy.htm
Clear and Easy; Anytime, Anywhere; Empower; Keep getting better
4.
about:
how to grok data with machines
and keep up with changing times & techs
5.
Analytics goes mainstream (70s, 80s)
● The Relational Database is born!
1972: E.F. Codd relational database model, normalization:
(free from insertion, deletion and update anomalies)
1976: Peter Chen, The entity-relationship model
6.
Exploratory Data Analysis
In 1977, Tukey published Exploratory Data Analysis,
arguing that more emphasis needed to be placed on using
data to suggest hypotheses to test and that Exploratory
Data Analysis and Confirmatory Data Analysis “can—and
should—proceed side by side.”
Analytics goes mainstream (70s, 80s)
7.
● 1995: Amazon
● 1995: eBay
● 1996: HotMail
● 1998: Google
● 1998: PayPal
Internet goes Global (90s)
9.
● Analytics (OLAP):
Long queries, aggregations, data mining, reporting, models
● Operations (OLTP):
Fast transactions, ACID, consistent, available, fault-tolerant
The internet goes global (90s)
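The split between the two workloads on this slide can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in; the table names (`accounts`, `payments`) and the helper functions are hypothetical and not taken from the talk. The transfer is an OLTP-style operation (short, atomic, touching a few rows), while the aggregation is an OLAP-style query (a scan over many rows for reporting).

```python
import sqlite3


def make_bank():
    """Create a toy in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "CREATE TABLE payments ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "src INTEGER, dst INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50), (3, 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """OLTP-style work: one short transaction that either fully applies or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "INSERT INTO payments (src, dst, amount) VALUES (?, ?, ?)",
                (src, dst, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was changed.
        return False


def received_per_account(conn):
    """OLAP-style work: an aggregation over the whole payments history."""
    return conn.execute(
        "SELECT dst, COUNT(*), SUM(amount) FROM payments GROUP BY dst ORDER BY dst"
    ).fetchall()
```

In practice the two workloads usually run on separate systems tuned for each access pattern; a single SQLite database is used here only to keep the sketch self-contained.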
10.
The World goes Social (00s)
Web apps go into hyper-growth
● 2003: LinkedIn
● 2003: Skype
● 2004: Facebook
● 2006: Twitter
16.
● MPP:
for speed and accuracy, well-structured data
● Hadoop:
for size, flexibility, raw files
Hadoop and MPPs (00s)
Diagram from: http://hortonworks.com/
17.
http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/
http://medriscoll.com/post/4740157098/the-three-sexy-skills-of-data-geeks
The Rise of the Data Scientist (00s)
18.
● WhatsApp, in a single day:
● 31 billion messages sent
● 700 million photos sent
Fast Data, API, Mobile and IoT (10s)
23.
New Problems:
● Hadoop is getting too slow (File -> File)
● Productivity of Data Science goes down
● SQL is not enough
● Distributed Machine Learning algorithms?
Fast Data, API, Mobile and IoT (10s)
24.
[Figure. Axes: time (10 yrs, 5 yrs, 1 yr, 1 month, 1 day, 1 hour, 1 min) and population (events, transactions, sessions, customers, etc.). Labels: Customer Journey Analytics, historical big data, recent data, streaming analytics.]
Streaming and Real-Time Analytics (10s)
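To make the idea of analytics on recent data concrete, here is a minimal sketch of a sliding time window that is updated as each event arrives. It is plain Python rather than any of the streaming frameworks named in the talk, the class name `SlidingWindow` is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Running total and mean over the last `window_seconds` of events.

    Assumption: events are added in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0

    def add(self, timestamp, amount):
        """Process one event as soon as it arrives, then drop expired events."""
        self.events.append((timestamp, amount))
        self.total += amount
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_amount = self.events.popleft()
            self.total -= old_amount

    def count(self):
        return len(self.events)

    def mean(self):
        return self.total / len(self.events) if self.events else 0.0
```

Each update costs time proportional to the number of events that expire, so the statistic stays current without rescanning the historical data set.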
25.
[Architecture diagram: Training, Scoring and Exposing models. Labeled elements: Data Sources, Files, DB extracts (Batched Data); Streaming Data; Distributed Data Store; Data Modeling; Fast Analytics; Event Processing; Real Time APIs; Alerts and Notifications; API for mobile and web. Flow labels: read the data, write the model, read the model.]
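The flow named on this slide (read the data, write the model, read the model) can be sketched as follows. This is an illustrative toy, not the architecture used at ING: a Python dict stands in for the distributed data store, the "model" is a per-customer mean and standard deviation, and all function names and the threshold `k` are assumptions made for the example.

```python
import statistics


def train(history):
    """Batch step: read the batched data and fit per-customer statistics.

    `history` is a list of (customer, amount) pairs.
    """
    amounts = {}
    for customer, amount in history:
        amounts.setdefault(customer, []).append(amount)
    return {
        customer: (statistics.mean(values), statistics.pstdev(values))
        for customer, values in amounts.items()
    }


def write_model(store, model):
    """Publish the trained model to the shared store (a dict in this sketch)."""
    store["current_model"] = model


def read_model(store):
    """Fetch the latest published model."""
    return store["current_model"]


def is_unusual(model, event, k=3.0):
    """Scoring step: flag an amount more than k standard deviations above the mean."""
    customer, amount = event
    if customer not in model:
        return False  # no history for this customer, so no alert
    mean, std = model[customer]
    return amount > mean + k * std


def process_stream(store, events):
    """Streaming step: score each event on arrival and collect alerts."""
    alerts = []
    for event in events:
        model = read_model(store)  # re-read so a newly written model is picked up
        if is_unusual(model, event):
            alerts.append(event)
    return alerts
```

Because the streaming side only reads the model from the store, the batch job can retrain and overwrite it on its own schedule without stopping event processing.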
26.
In-memory computing is winning!
Spark is emerging as an improved, faster, better “new” Hadoop.
RAM is the new Disk (10s)
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
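The difference between a file-to-file pipeline and an in-memory one can be shown with a toy word count. The sketch below is plain Python and does not use the Hadoop or Spark APIs; the function names are hypothetical, and it only illustrates the data flow, not the actual performance gap.

```python
import json
import os
import tempfile
from collections import Counter


def tokenize(lines):
    """Stage 1: split lines into lowercase words."""
    return [word.lower() for line in lines for word in line.split()]


def count(words):
    """Stage 2: count occurrences of each word."""
    return dict(Counter(words))


def pipeline_via_files(lines, workdir):
    """File -> File style: stage 1 writes to disk, stage 2 reads it back."""
    path = os.path.join(workdir, "tokens.json")
    with open(path, "w") as f:
        json.dump(tokenize(lines), f)
    with open(path) as f:
        words = json.load(f)
    return count(words)


def pipeline_in_memory(lines):
    """In-memory style: the intermediate result never leaves RAM."""
    return count(tokenize(lines))


if __name__ == "__main__":
    sample = ["Fast data", "fast Data streams"]
    with tempfile.TemporaryDirectory() as workdir:
        print(pipeline_via_files(sample, workdir))
    print(pipeline_in_memory(sample))
```

Both variants produce the same counts; the file-based one pays a serialization and disk round trip between stages, which adds up when a job chains many stages or iterates over the same data.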
32.
Deep Learning to assist doctors treating and classifying cancer
http://www.enlitic.com/
Data Science new trends (10s)
33.
- Deep Learning
Data Science new trends (10s)
DL4J
http://deeplearning4j.org/
Theano
http://deeplearning.net/software/theano/
TensorFlow
http://tensorflow.org/
34.
- Topological Data Analysis
Analyze high-dimensional data, visually
http://datarefiner.com/
Analysis of the Netflix Prize dataset.
Data set statistics:
● 100,480,507 ratings
● 480,189 users
● 17,770 movies
● 2.8 GB CSV file size
Data Science new trends (10s)
35.
1) SQL + Machine Learning
2) Diversity in your team: great asset
3) Data science: R-Scala-Python-Java
Takeaways: Data Science
36.
1) Memory is King
2) The “Event Stream”
3) Spark is the new Hadoop
Takeaways: Techs
37.
It starts and ends with people.
Value the experience, not the tools
Takeaways: Customer’s Journey