Fast Data and Streaming Analytics
Natalino Busa
Enterprise Data Architect at ING
The Evolution of Data Analytics
@natbusa | linkedin.com: Natalino Busa
ING group
http://www.ing.com/About-us/Purpose-Strategy.htm
● Clear and Easy
● Anytime, Anywhere
● Empower
● Keep getting better
about:
how to grok data with machines
and keep up with changing times & techs
Analytics goes mainstream (70s, 80s)
● The Relational Database is born!
1970: E.F. Codd, relational database model; further normalization follows in 1971-72
(free from insertion, deletion and update anomalies)
1976: Peter Chen, the entity-relationship model
Exploratory Data Analysis
In 1977, Tukey published Exploratory Data Analysis,
arguing that more emphasis needed to be placed on using
data to suggest hypotheses to test and that Exploratory
Data Analysis and Confirmatory Data Analysis “can—and
should—proceed side by side.”
Analytics goes mainstream (70s, 80s)
● 1995: Amazon
● 1995: eBay
● 1996: Hotmail
● 1998: Google
● 1998: PayPal
Internet goes Global (90s)
Knowledge Discovery in Databases (KDD, 1996)
● Analytics (OLAP):
Long queries, aggregations, data mining, reporting, models
● Operations (OLTP):
Fast transactions, ACID, consistent, available, fault-tolerant
The internet goes global (90s)
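The OLTP/OLAP split above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `accounts` and `payments` tables and the `transfer` and `payment_report` helpers are hypothetical names chosen for this example, not something taken from the slides. The transfer is a short, all-or-nothing transaction touching a few rows, while the report scans and aggregates a whole table.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, src INTEGER, dst INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style operation: short, touches few rows, all-or-nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "INSERT INTO payments (src, dst, amount) VALUES (?, ?, ?)", (src, dst, amount)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was written.
        return False


def payment_report(conn):
    """OLAP-style query: scans and aggregates the whole payments table."""
    return conn.execute(
        "SELECT src, COUNT(*), SUM(amount) FROM payments GROUP BY src ORDER BY src"
    ).fetchall()


ok = transfer(conn, 1, 2, 30)
rejected = transfer(conn, 2, 1, 500)  # would overdraw account 2
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
report = payment_report(conn)
```

On a single in-memory database both workloads are trivial; the tension the slide points at appears when long aggregations and many small transactions compete for the same system at scale.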
The World goes Social (00s)
Web apps go into hyper-growth
● 2003: LinkedIn
● 2003: Skype
● 2004: Facebook
● 2006: Twitter
more users, events, transactions.
The Advent of MPP OLAPs (Early 00s)
● Massive multi-rack systems
● 100s of Computing Cores
● 100s of Terabytes of Storage
● Distributed computing
● Advanced Query Plans
● Columnar Data Models
● Re-programmable hardware
● Simpler programming paradigm
● Distributed, Replicated File System
Map-Reduce and Hadoop (Early 00s)
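The "simpler programming paradigm" can be sketched as a single-process word count. This is an illustration of the map, shuffle and reduce phases only, not Hadoop's API: the function names are hypothetical, and the distribution, replication and fault tolerance that Hadoop provides are left out.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["fast data", "streaming data", "Fast streaming analytics"])
```

Because each map call sees one record and each reduce call sees one key, both phases can run in parallel across machines. That is the property the paradigm trades expressiveness for.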
Hadoop or MPPs or both?
● MPP
for speed and accuracy,
well structured data
● Hadoop
for size, flexibility, raw files
Hadoop and MPPs (00s)
Diagram from: http://hortonworks.com/
http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/
http://medriscoll.com/post/4740157098/the-three-sexy-skills-of-data-geeks
The Rise of the Data Scientist (00s)
● WhatsApp: in a day
● 31 billion messages sent
● 700 million photos sent
Fast Data, API, Mobile and IoT (10s)
Stream Centric Architectures (10s)
● Streaming events
● Resilient, Scalable
● Publisher and Subscribers
● Distributed Queue
http://kafka.apache.org/
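The publisher/subscriber model over a distributed queue can be pictured as an append-only log that each subscriber reads at its own pace. The toy class below is a single-process sketch under that assumption. `TopicLog`, `publish` and `poll` are hypothetical names, not the Kafka client API, and partitioning, replication and persistence are omitted.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log with per-subscriber offsets (not the Kafka API)."""

    def __init__(self):
        self._events = []                 # ordered, append-only event log
        self._offsets = defaultdict(int)  # next position to read, per subscriber

    def publish(self, event):
        self._events.append(event)
        return len(self._events) - 1      # offset of the stored event

    def poll(self, subscriber, max_events=10):
        """Return unread events for this subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._events[start:start + max_events]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = TopicLog()
for event in ["login", "payment", "logout"]:
    log.publish(event)

alerts_first = log.poll("alerts", max_events=2)
alerts_second = log.poll("alerts", max_events=2)
audit_all = log.poll("audit")
```

Keeping the offset on the subscriber side rather than deleting events on delivery is what lets several independent consumers read the same stream, each at its own speed.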
New Problems:
● Hadoop is getting too slow (File -> File)
● Productivity of Data Science goes down
● SQL is not enough
● Distributed Machine Learning algorithms?
Fast Data, API, Mobile and IoT (10s)
Streaming and Real-Time Analytics (10s)
[Figure: Customer Journey Analytics. Horizontal axis: time (10 yrs, 5 yrs, 1 yr, 1 month, 1 day, 1 hour, 1 min). Vertical axis: population (events, transactions, sessions, customers, etc.). Regions: historical big data; recent data, streaming analytics.]
Training, Scoring and Exposing models
[Diagram labels: Distributed Data Store; Fast Analytics; Event Processing; Real Time APIs; Streaming Data; Data Modeling; Data Sources, Files, DB extracts; Batched Data; Alerts and Notifications; API for mobile and web. Flows: read the model, read the data, write the model.]
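The three flows on this slide (read the data, write the model, read the model) can be sketched with a deliberately trivial model. Everything here is an assumption made for illustration: a plain dict stands in for the distributed data store, the "model" is just a mean and standard deviation, and `train` and `score` are hypothetical names.

```python
from statistics import mean, pstdev

model_store = {}  # hypothetical stand-in for the distributed data store


def train(batched_amounts):
    """Batch side: read the data, fit a trivial model, write the model."""
    model_store["amount_model"] = {
        "mean": mean(batched_amounts),
        "std": pstdev(batched_amounts),
    }


def score(amount, threshold=3.0):
    """Streaming side: read the model and score one incoming event."""
    model = model_store["amount_model"]
    z = abs(amount - model["mean"]) / model["std"]
    return {"amount": amount, "z": z, "alert": z > threshold}


train([10, 12, 11, 9, 13, 10, 12, 11])
results = [score(a) for a in (11, 50)]
alerts = [r["amount"] for r in results if r["alert"]]
```

Training and scoring share nothing but the stored model, so the slow batch path can retrain on its own schedule while the fast path keeps scoring events and raising alerts.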
In-memory computing is winning!
Spark is emerging as an improved, faster, better, “new” Hadoop.
The RAM is the new Disk (10s)
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
Integrated Data Science (10s)
Unified Distributed Computing paradigm:
● SQL
● Statistics
● Machine Learning
● Graph Analytics
Polyglot Programming:
● R
● Python
● Scala
● Java
https://spark.apache.org/
Spark: Hadoop evolved (10s)
[Diagram: Spark modules: Streaming, SQL, MLlib, GraphX (Analytics, Statistics, Data Science, Model Training). Data Sources: HDFS, NoSQL, SQL. Also shown: Map-Reduce, HDFS, Kafka.]
Kafka + Spark + Cassandra + Akka
(noSQL stack, Fast Data)
MPP + HDFS + Spark
(“new” Hadoop / Data Lake)
Popular Operational Analytics Stacks (10s)
Micro-Batch and Event Streaming Analytics
- Micro-Batch (Spark Streaming)
- Log Oriented (Kafka, Samza)
- NewSQL (VoltDB)
- Streaming computing (MillWheel, Flink, Apex)
Kings and new entries (10s, 20s)
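The difference between micro-batch and event-at-a-time processing can be sketched in plain Python. This is a conceptual illustration only: `micro_batches` and `per_event` are hypothetical helpers, not the API of Spark Streaming, Flink or any other engine listed above, and the events are assumed to arrive ordered by timestamp.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed, consecutive time intervals."""
    batch, batch_end = [], None
    for ts, value in events:              # events are assumed ordered by timestamp
        if batch_end is None:
            batch_end = ts - ts % batch_interval + batch_interval
        while ts >= batch_end:            # close every interval that has elapsed
            yield batch                   # may be empty if no event fell in it
            batch, batch_end = [], batch_end + batch_interval
        batch.append(value)
    if batch:
        yield batch


def per_event(events):
    """Event-at-a-time processing: emit a running total after every event."""
    total = 0
    for _, value in events:
        total += value
        yield total


events = [(0, 5), (1, 3), (4, 2), (5, 7), (11, 1)]
batch_sums = [sum(b) for b in micro_batches(events, batch_interval=5)]
running = list(per_event(events))
```

The micro-batch path produces a result only when an interval closes, so its latency is bounded below by the batch interval, while the per-event path updates its result on every arrival.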
- Deep Learning
Data Science new trends (10s)
Deep Learning to assist doctors treating and classifying cancer
http://www.enlitic.com/
Data Science new trends (10s)
DL4J
http://deeplearning4j.org/
Theano
http://deeplearning.net/software/theano/
TensorFlow
http://tensorflow.org/
- Topological Data Analysis
Analyze high-dimensional data, visually
http://datarefiner.com/
Analysis of the Netflix Prize dataset.
Dataset statistics:
● 100,480,507 ratings
● 480,189 users
● 17,770 movies
● 2.8 GB CSV file size
Data Science new trends (10s)
1) SQL + Machine Learning
2) Diversity in your team: great asset
3) Data science: R-Scala-Python-Java
Takeaways: Data Science
1) Memory is King
2) The “Event Stream”
3) Spark is the new Hadoop
Takeaways: Techs
It starts and ends with people.
Value the experience, not the tools.
Takeaways: Customer’s Journey
Distributed Computing, Machine Learning, Statistics, Big/Fast Data, Streaming Computing

More Related Content

Viewers also liked

Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 
Big Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industryBig Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industryRashed Moslem
 
Pervasive Computing : You're Already Knee Deep In It
Pervasive Computing : You're Already Knee Deep In ItPervasive Computing : You're Already Knee Deep In It
Pervasive Computing : You're Already Knee Deep In ItRob Manson
 
Conversational Architecture, CAVE Language, Data Stewardship
Conversational Architecture, CAVE Language, Data StewardshipConversational Architecture, CAVE Language, Data Stewardship
Conversational Architecture, CAVE Language, Data StewardshipLoren Davie
 
Brief lessons from the greatest product managers
Brief lessons from the greatest product managersBrief lessons from the greatest product managers
Brief lessons from the greatest product managersJeffrey T. Pollock
 
Guide Report - Wireless Fundementals v1.0 150114
Guide Report - Wireless Fundementals v1.0 150114Guide Report - Wireless Fundementals v1.0 150114
Guide Report - Wireless Fundementals v1.0 150114Clay Melugin
 
Broadband World Forum Summary 2013
Broadband World Forum Summary 2013Broadband World Forum Summary 2013
Broadband World Forum Summary 2013Alan Quayle
 
Wi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future Trust
Wi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future TrustWi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future Trust
Wi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future TrustTechnicolor
 
Conways Law & Continuous Delivery
Conways Law & Continuous DeliveryConways Law & Continuous Delivery
Conways Law & Continuous Deliveryallan kelly
 
A Random Walk Through Search Research
A Random Walk Through Search ResearchA Random Walk Through Search Research
A Random Walk Through Search ResearchNick Watkins
 
Broadband world forum service delivery framework KPN presentation
Broadband world forum service delivery framework KPN presentationBroadband world forum service delivery framework KPN presentation
Broadband world forum service delivery framework KPN presentationAlan Quayle
 
Pervasive Computing
Pervasive ComputingPervasive Computing
Pervasive ComputingSangeetha Sg
 
Ambient intelligence & Ubiquitous Computing
Ambient intelligence & Ubiquitous ComputingAmbient intelligence & Ubiquitous Computing
Ambient intelligence & Ubiquitous ComputingRohit Arora
 

Viewers also liked (14)

Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Big Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industryBig Data Analytics in Ecommerce industry
Big Data Analytics in Ecommerce industry
 
Rich Mironov Presentation
Rich Mironov PresentationRich Mironov Presentation
Rich Mironov Presentation
 
Pervasive Computing : You're Already Knee Deep In It
Pervasive Computing : You're Already Knee Deep In ItPervasive Computing : You're Already Knee Deep In It
Pervasive Computing : You're Already Knee Deep In It
 
Conversational Architecture, CAVE Language, Data Stewardship
Conversational Architecture, CAVE Language, Data StewardshipConversational Architecture, CAVE Language, Data Stewardship
Conversational Architecture, CAVE Language, Data Stewardship
 
Brief lessons from the greatest product managers
Brief lessons from the greatest product managersBrief lessons from the greatest product managers
Brief lessons from the greatest product managers
 
Guide Report - Wireless Fundementals v1.0 150114
Guide Report - Wireless Fundementals v1.0 150114Guide Report - Wireless Fundementals v1.0 150114
Guide Report - Wireless Fundementals v1.0 150114
 
Broadband World Forum Summary 2013
Broadband World Forum Summary 2013Broadband World Forum Summary 2013
Broadband World Forum Summary 2013
 
Wi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future Trust
Wi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future TrustWi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future Trust
Wi-Fi Doctor: Keeping your WLAN healthy - White Paper - The Future Trust
 
Conways Law & Continuous Delivery
Conways Law & Continuous DeliveryConways Law & Continuous Delivery
Conways Law & Continuous Delivery
 
A Random Walk Through Search Research
A Random Walk Through Search ResearchA Random Walk Through Search Research
A Random Walk Through Search Research
 
Broadband world forum service delivery framework KPN presentation
Broadband world forum service delivery framework KPN presentationBroadband world forum service delivery framework KPN presentation
Broadband world forum service delivery framework KPN presentation
 
Pervasive Computing
Pervasive ComputingPervasive Computing
Pervasive Computing
 
Ambient intelligence & Ubiquitous Computing
Ambient intelligence & Ubiquitous ComputingAmbient intelligence & Ubiquitous Computing
Ambient intelligence & Ubiquitous Computing
 

More from Natalino Busa

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationNatalino Busa
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksNatalino Busa
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networksNatalino Busa
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooksNatalino Busa
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditingNatalino Busa
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friendsNatalino Busa
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analyticsNatalino Busa
 
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Natalino Busa
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayNatalino Busa
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Big data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsBig data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsNatalino Busa
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Natalino Busa
 
Big and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsBig and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsNatalino Busa
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsNatalino Busa
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsNatalino Busa
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesNatalino Busa
 

More from Natalino Busa (20)

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovation
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter Notebooks
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooks
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
Data in Action
Data in ActionData in Action
Data in Action
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analytics
 
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and Spray
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Big data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsBig data solutions for advanced marketing analytics
Big data solutions for advanced marketing analytics
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.
 
Big and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsBig and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analytics
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topics
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologies
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 

Recently uploaded

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

Fast data and Streaming analytics

  • 1. Fast Data and Streaming Analytics Natalino Busa Enterprise Data Architect at ING The Evolution of Data Analytics
  • 2. @natbusa | linkedin.com: Natalino Busa ING group http://www.ing.com/About-us/Purpose-Strategy.htm
  • 3. @natbusa | linkedin: Natalino Busa ING group http://www.ing.com/About-us/Purpose-Strategy.htm Clear and Easy Anytime, Anywhere Empower Keep getting better
  • 4. @natbusa | linkedin.com: Natalino Busa about: how to grok data with machines and keep up with changing times & techs
  • 5. @natbusa | linkedin.com: Natalino Busa Analytics goes mainstream (70s, 80s) ● The Relational Database is born! 1972: E.F. Codd relational database model, normalization: (free from insertion, deletion and update anomalies) 1978: Peter Chen, The entity-relationship model
  • 6. @natbusa | linkedin.com: Natalino Busa Exploratory Data Analysis In 1977, Tukey published Exploratory Data Analysis, arguing that more emphasis needed to be placed on using data to suggest hypotheses to test and that Exploratory Data Analysis and Confirmatory Data Analysis “can—and should—proceed side by side.” Analytics goes mainstream (70s, 80s)
  • 7. @natbusa | linkedin.com: Natalino Busa ● 1995: Amazon ● 1995: eBay ● 1996: HotMail ● 1998: Google ● 1998: Paypal Internet goes Global (90s)
  • 8. @natbusa | linkedin.com: Natalino Busa Knowledge Data in Databases (1996)
  • 9. @natbusa | linkedin.com: Natalino Busa ● Analytics (OLAP): Long queries, aggregations, data mining, reporting, models ● Operations (OLTP): Fast transactions, ACID, consistent, available, fault-tolerant The internet goes global (90s)
  • 10. @natbusa | linkedin.com: Natalino Busa The World goes Social (00s) Web apps go in hyper - growth ● 2003: LinkedIn ● 2003: Skype ● 2004: Facebook ● 2006: Twitter
  • 11. @natbusa | linkedin.com: Natalino Busa more users, events, transactions.
  • 12. @natbusa | linkedin.com: Natalino Busa The Advent of MPP ALAPs (Early 00s) ● Massive multi-rack systems ● 100’s of Computing Cores ● 100’s Terabytes of Storage ● Distributed computing ● Advanced Query Plans ● Columnar Data Models ● Re-programmable hardware
  • 13. @natbusa | linkedin.com: Natalino Busa ● Simpler programming paradigm ● Distributed, Replicated File System Map-Reduce and Hadoop (Early 00s)
  • 14. @natbusa | linkedin.com: Natalino Busa Map-Reduce and Hadoop (Early 00s)
  • 15. @natbusa | linkedin.com: Natalino Busa Hadoop or MPPs or both?
  • 16. @natbusa | linkedin.com: Natalino Busa ● MPP for speed and accuracy, well structured data ● Hadoop for size, flexibility, raw files Hadoop and MPPs (00s) Diagram from: http://hortonworks.com/
  • 17. @natbusa | linkedin.com: Natalino Busa http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/ http://medriscoll.com/post/4740157098/the-three-sexy-skills-of-data-geeks The Rise of the Data Scientist (00s)
  • 18. Fast Data, API, Mobile and IoT (10s): WhatsApp, in a day: ● 31 billion messages sent ● 700 million photos sent
  • 19. Stream Centric Architectures (10s)
  • 20. Stream Centric Architectures (10s)
  • 21. Stream Centric Architectures (10s)
  • 22. Stream Centric Architectures (10s): ● Streaming events ● Resilient, scalable ● Publisher and subscribers ● Distributed queue http://kafka.apache.org/
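
The publisher/subscriber idea behind a stream-centric architecture can be sketched as an append-only log with one read offset per subscriber. This is a toy, in-memory illustration and not the Kafka client API; the class `TopicLog`, its methods and the sample events are all hypothetical.

```python
class TopicLog:
    """Toy stand-in for one partition of a distributed log (not the Kafka API)."""

    def __init__(self):
        self._events = []   # append-only sequence of events
        self._offsets = {}  # next offset to read, per subscriber

    def publish(self, event):
        """Append an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, subscriber, max_events=10):
        """Return the next unread events for this subscriber and advance its offset."""
        start = self._offsets.get(subscriber, 0)
        batch = self._events[start:start + max_events]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = TopicLog()
for event in ["login", "payment", "logout"]:
    log.publish(event)

alerts_first = log.poll("alerts", max_events=2)  # first two events
alerts_next = log.poll("alerts")                 # resumes where it left off
audit_all = log.poll("audit")                    # independent subscriber, reads from the start
print(alerts_first, alerts_next, audit_all)
```

Because events stay in the log and each subscriber only keeps an offset, new consumers can be added without touching the publisher. A real system adds partitioning and replication on top for the resilience and scalability the slide mentions.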
  • 23. Fast Data, API, Mobile and IoT (10s): New problems: ● Hadoop is getting too slow (File -> File) ● Productivity of Data Science goes down ● SQL is not enough ● Distributed Machine Learning algorithms?
  • 24. Streaming and Real-Time Analytics (10s): Customer Journey Analytics. [Figure: population (events, transactions, sessions, customers, etc.) against time, with ticks at 10 yrs, 5 yrs, 1 yr, 1 month, 1 day, 1 hour, 1m; region labels: recent data, streaming analytics, historical, big data]
  • 25. [Diagram labels: Distributed Data Store; Fast Analytics; Event Processing; Real Time APIs; Streaming Data; Data Modeling; Data Sources, Files, DB extracts; Batched Data; Alerts and Notifications; API for mobile and web; Training, Scoring and Exposing models; read the model; read the data; write the model]
  • 26. The RAM is the new Disk (10s): In-memory computing is winning! Spark is emerging as an improved, faster, better, “new” Hadoop. https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
  • 27. Integrated Data Science (10s): Unified distributed computing paradigm: SQL, Statistics, Machine Learning, Graph Analytics. Polyglot programming: R, Python, Scala, Java
  • 28. Spark: Hadoop evolved (10s): https://spark.apache.org/ [Diagram labels: Spark Streaming, SQL, MLlib, GraphX; Analytics, Statistics, Data Science, Model Training; HDFS, NoSQL, SQL; Data Sources; Map-Reduce, HDFS, Kafka]
  • 29. Popular Operational Analytics Stacks (10s): Kafka + Spark + Cassandra + Akka (NoSQL stack, Fast Data); MPP + HDFS + Spark (“new” Hadoop / Data Lake)
  • 30. Kings and new entries (10s, 20s): Micro-Batch and Event Streaming Analytics - Micro-Batch (Spark Streaming) - Log Oriented (Kafka, Samza) - NewSQL (VoltDB) - Streaming computing (MillWheel, Flink, Apex)
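
The difference between micro-batch and event-at-a-time streaming can be sketched as follows. This is plain Python with no streaming engine involved; the function names and the `(timestamp_seconds, amount)` sample events are assumptions made for illustration, and neither function reflects the API of Spark Streaming, Flink or any other system named on the slide.

```python
def per_event(events):
    """Event-at-a-time: update a running total and emit a result after every event."""
    total = 0
    for _, amount in events:
        total += amount
        yield total


def micro_batch(events, batch_seconds):
    """Micro-batch: group events into fixed time windows and emit one sum per window."""
    batches = {}
    for timestamp, amount in events:
        window = int(timestamp // batch_seconds)
        batches[window] = batches.get(window, 0) + amount
    return [batches[w] for w in sorted(batches)]


# Hypothetical events: (timestamp in seconds, amount)
events = [(0.2, 10), (0.7, 5), (1.1, 20), (2.5, 1)]

running = list(per_event(events))                # one output per event
windowed = micro_batch(events, batch_seconds=1)  # one output per 1-second window
print(running, windowed)
```

Per-event processing gives the lowest latency, since a result is available as soon as each event arrives. Micro-batching trades some latency (at most one batch interval) for the throughput and simplicity of running small batch jobs.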
  • 31. Data Science new trends (10s): - Deep Learning
  • 32. Data Science new trends (10s): Deep Learning to assist doctors treating and classifying cancer http://www.enlitic.com/
  • 33. Data Science new trends (10s): - Deep Learning: DL4J http://deeplearning4j.org/ Theano http://deeplearning.net/software/theano/ TensorFlow http://tensorflow.org/
  • 34. Data Science new trends (10s): - Topological Data Analysis: analyze high-dimensional data, visually http://datarefiner.com/ Analysis of the Netflix Prize dataset. Data set statistics: ● 100,480,507 ratings ● 480,189 users ● 17,770 movies ● 2.8 GB CSV file size
  • 35. Takeaways: Data Science 1) SQL + Machine Learning 2) Diversity in your team: great asset 3) Data science: R-Scala-Python-Java
  • 36. Takeaways: Techs 1) Memory is King 2) The “Event Stream” 3) Spark is the new Hadoop
  • 37. Takeaways: Customer’s Journey. It starts and ends with people. Value the experience, not the tools
  • 38. Distributed computing, Machine Learning, Statistics, Big/Fast Data, Streaming Computing