Apache Spark
Mate Gulyas
WHY DO WE DO IT?
54% average
viewability**
36% non-human
visitors*
What percentage of
digital ads reach
people?
Clickbots,
botnets
Invisible,
hidden ads
Transparency in
the market
*http://technorati.com/iab-keynote-36-percent-ad-traffic-from-bots-and-threatening-industry/
**http://www.statista.com/statistics/255061/viewability-rates-for-rich-media-ads-worldwide-by-industry/
JavaScript
Segmentation
Behaviour analysis
WHAT DO WE DO?
Distributed data processing
WHAT DO WE NEED?
Average size client
30 GB / day
900 GB / month
20 average size clients
600 GB / day
18 TB / month
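The figures above follow from simple multiplication. A minimal sketch of that arithmetic, assuming a 30-day month and decimal units (1 TB = 1000 GB); the object and member names are made up for illustration:

```scala
// Back-of-the-envelope data volume estimate using the figures from the slide.
// Assumptions (not stated in the deck): 30-day month, 1 TB = 1000 GB.
object DataVolume {
  val gbPerClientPerDay = 30
  val daysPerMonth = 30
  val clients = 20

  // One average client over a month: 30 GB * 30 days
  def gbPerClientPerMonth: Int = gbPerClientPerDay * daysPerMonth

  // All clients in one day: 30 GB * 20 clients
  def gbPerDay: Int = gbPerClientPerDay * clients

  // All clients over a month, converted from GB to TB
  def tbPerMonth: Double = gbPerDay * daysPerMonth / 1000.0

  def main(args: Array[String]): Unit = {
    println(s"$gbPerClientPerMonth GB / month per client")
    println(s"$gbPerDay GB / day for $clients clients")
    println(s"$tbPerMonth TB / month for $clients clients")
  }
}
```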
Recurring data
transformations
WHAT DO WE NEED?
Interactive / Batch /
Streaming / SQL /
Graph processing
IT’S SO COOL
In-memory
WHY SPARK?
Productive API
WHY SPARK?
Multiple languages
WHY SPARK?
Active community
WHY SPARK?
friendly
WHY SPARK?
developer
analyst
CFO
Resilient Distributed
Dataset (RDD)
ONE THING TO REMEMBER
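To make the RDD idea concrete, here is a minimal sketch, not taken from the talk. It assumes Spark is on the classpath and runs in local mode; the visitor labels and the names RddSketch and countByType are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Spark versions before 1.3, also add: import org.apache.spark.SparkContext._

object RddSketch {
  // Builds an RDD from a local collection and counts records per label.
  // map and reduceByKey are lazy transformations; collect() triggers the job
  // and returns the results to the driver.
  def countByType(sc: SparkContext, visits: Seq[String]): Map[String, Int] =
    sc.parallelize(visits)
      .map(kind => (kind, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, using all available cores
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countByType(sc, Seq("human", "bot", "human", "human", "bot"))
      counts.foreach { case (kind, n) => println(s"$kind: $n") }
    } finally {
      sc.stop()
    }
  }
}
```

The same code should run unchanged on a cluster by dropping setMaster and supplying the master through spark-submit.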
RDD
IT RUNS ON
Mesos
YARN
Standalone
AWS EC2
WHERE DOES IT GET DATA FROM?
Amazon S3, HDFS,
Cassandra, Hive, HBase,
Tachyon, Local Filesystem,
ODBC databases, etc...
Batch processing
THE OLD WAY
Interactive analytics
THE NEW WAY
SPARK WITH IPYTHON
Spark SQL
THE NOT THAT OLD WAY
{"name": "Mate Gulyas", "twitter": "gulyasm"}
{"name": "John Doe", "email": "jdoe@freemail.com"}
{"name": "Jane Doe", "email": "janedoe@citromail.com"}
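// hiveCtx is assumed to be an existing HiveContext (Spark 1.x API)
// Load one JSON record per line; the schema is inferred from the data
val input = hiveCtx.jsonFile("example.json")
// Register the data under the same name the query uses
input.registerAsTable("users")
hiveCtx.sql("SELECT name, twitter FROM users")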
SQL WITH JSON
Spark Streaming
THE LOW LATENCY WAY
DStream
MLlib
THE SKYNET WAY
GraphX
I LOVE GRAPHS
Third party modules
THE OTHERS WAY
On-premises
AWS
Databricks Cloud
BUT… WHERE TO GO?
TAKEAWAY I
Spark can provide one
platform to cover most of
the use-cases in data
analytics
TAKEAWAY II
Productive, fast data
processing framework that
helps you minimize time to
business impact.
MATE GULYAS
gulyasm@enbrite.ly
@gulyasm
@enbritely
THANK YOU!

Apache Spark: The modern data analytics platform