Big Data
Trends, Challenges, and Opportunities
Mohammed Guller
Jan 30, 2015
About Me
• Principal Architect at Glassbeam
• Founded two startups
• Passionate about building products, big data analytics, and machine learning
www.linkedin.com/in/mohammedguller
@MohammedGuller
Available on Amazon
Functional Programming
CPU Trend
• CPU clock speed plateaued around 2004
• Individual cores are no longer getting much faster
• Trend is to add more cores per CPU and more CPUs per system
Challenges
• Multi-threaded programs are required to utilize all cores in a machine
• Writing multi-threaded programs is hard
• Tools provided by traditional languages are primitive
• Problems such as deadlocks, livelocks, starvation, and race conditions are difficult to avoid and detect
Functional Programming (FP)
• Based on theory developed in the 1930s (lambda calculus)
• Program composed of functions
– Executed by evaluating expressions
• Functions are first-class citizens
– Can be passed as an argument to another function
– Can be returned by another function
– Can be defined inside another function
– Can be defined as an unnamed literal similar to a string literal
• Functions do not have side effects
– Always return the same output for a given input
– Order of execution is not important
• Discourages mutable variables
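The properties listed on this slide can be illustrated with a minimal Python sketch. The slides contain no code, so every name here (`apply_twice`, `make_adder`, `square`, `doubled_total`) is hypothetical and chosen only for illustration.

```python
from typing import Callable, List


def apply_twice(f: Callable[[int], int], x: int) -> int:
    """A function passed as an argument to another function."""
    return f(f(x))


def make_adder(n: int) -> Callable[[int], int]:
    """A function defined inside another function and returned by it."""
    def add(x: int) -> int:
        return x + n
    return add


# An unnamed function literal, bound to a name like any other value.
square = lambda x: x * x


def doubled_total(values: List[int]) -> int:
    """A pure function: no side effects, same output for the same input."""
    return sum(v * 2 for v in values)


if __name__ == "__main__":
    print(apply_twice(make_adder(3), 10))
    print(apply_twice(square, 3))
    print(doubled_total([1, 2, 3]))
```

Because `doubled_total` never modifies its argument or any outer state, calls to it can be reordered or repeated without changing the result.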
Benefits of Functional Programming
• Makes it easier to write multi-threaded programs
• Improves developer productivity
• Enables better quality code
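One way to see why side-effect-free functions help with multi-threading is the sketch below. It is not from the talk; `count_words_in_line` and `count_words_parallel` are hypothetical names. Since the worker function touches no shared mutable state, no locks are needed and the result does not depend on thread scheduling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def count_words_in_line(line: str) -> int:
    """Pure function: depends only on its argument."""
    return len(line.split())


def count_words_parallel(lines: Iterable[str], max_workers: int = 4) -> int:
    """Apply the pure function across a thread pool and combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order regardless of which
        # thread finishes first.
        return sum(pool.map(count_words_in_line, lines))


if __name__ == "__main__":
    print(count_words_parallel(["big data", "functional programming is useful"]))
```

In CPython, threads mostly help with I/O-bound work; for CPU-bound work the same structure applies with `ProcessPoolExecutor` swapped in.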
Functional Programming Languages
• Lisp
• Erlang
• Haskell
• Scala
• Swift
Opportunities
• High demand for people who know Scala
– Scala is one of the most popular FP languages
• Shortage of people who know Scala
Big Data
3 Vs of Big Data
• Volume: Scale of Data
• Variety: Diversity of Data
• Velocity: Speed of Data
Amount of Data Generated is Exploding
5x More Connected Things Than People by 2020
Network of objects embedded with software for collecting and exchanging data over the Internet
Big Data Challenges
• Storage
– Traditional SAN and NAS storage devices are expensive
• Processing
– Traditional RDBMSs were not designed to handle big data
• How to get value out of data
• How to do it economically
Open-source Big Data Storage Technologies
• Distributed File Systems
– HDFS
• NoSQL data stores
– Cassandra
– HBase
– MongoDB
– Druid
– Elasticsearch
– SolrCloud
How Much Data Can a Standard Server Process
[Figure: data size scale marked at 100 GB, 1 TB, 10 TB, and 100 TB]
Options For Increasing Data Processing Power
• Scale-up
• Scale-out
Scale-up
• Use a more powerful high-end server
– Faster CPU
– Faster Disk
– Large number of CPUs
– Large amount of memory
• Proprietary
• Expensive
• Limited scalability
Scale-out
• Use a cluster of commodity servers
• Inexpensive
• Economical to scale
• Preferred architecture
Challenges With Scale-out Architecture
• Writing a distributed application is even harder than writing a multi-threaded one
• Many details involved
– Split a workload into chunks that can be distributed across a cluster
– Schedule compute resources among different jobs
– Manage inter-node communication
– Handle network and node failures
• Hardware failures are more common at a cluster level
– Probability of a single node failing is very low
– Probability of at least one node failing in a cluster of thousands of nodes is very high
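The last point can be checked with a short calculation. The sketch assumes node failures are independent and uses a purely hypothetical per-node failure probability of 0.1% over some fixed period; neither assumption comes from the slides. Under those assumptions, the chance that at least one of n nodes fails is 1 - (1 - p)^n.

```python
def prob_any_failure(p_single: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes fails,
    given each fails with probability `p_single` (independence is assumed)."""
    return 1.0 - (1.0 - p_single) ** nodes


if __name__ == "__main__":
    p = 0.001  # hypothetical per-node failure probability, for illustration only
    for n in (1, 100, 1000, 5000):
        print(f"{n:>5} nodes: {prob_any_failure(p, n):.4f}")
```

With these illustrative numbers a single node almost never fails, while a cluster of several thousand nodes is very likely to see at least one failure, which is why a scale-out framework has to treat failure handling as routine.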
Getting Value Out of Data
• Traditional analytics / BI
• Machine Learning
– Predictive analytics
– Train software to do human tasks
Traditional Analytics / BI
• What happened
– Revenue growth for the last month/quarter/year
– Customer growth for the last month/quarter/year
• Why it happened
– Why profit dropped
– Why sales dropped
• Other insights
– What is the breakdown by country of people downloading an app
– How much time people spend in an app
Predictive Analytics
• Ask software to predict
– What product will a customer most likely buy
– What ad will a visitor most likely click
– What movies/songs/books will a customer like
– What are the chances that a patient may have a heart attack
• More interesting and valuable than traditional analytics
Train Software To Do Human Tasks
• Image classification
– Facebook
– Flickr
• Voice recognition and natural language processing
– Siri
• Body movement recognition
– Xbox Kinect
• Self-driving cars
– Google car
• Medical diagnosis
• Anomaly detection
– Fraudulent transactions
– Security attacks
Distributed Data Processing Frameworks
• Batch processing
– MapReduce
• Stream processing
– Samza
– Heron
– Storm
• Batch and stream processing
– Spark
– Flink
– Apex
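To give a feel for the batch model that MapReduce popularized, here is a single-process Python sketch of a word count. It only imitates the programming model: the input is split into chunks, each chunk is mapped independently, and the partial results are reduced into one. It is not the API of Hadoop MapReduce or of any framework listed above, and all function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def split_into_chunks(lines: List[str], num_chunks: int) -> List[List[str]]:
    """Split the input into chunks, one per hypothetical worker."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk: Iterable[str]) -> Counter:
    """Map phase: count words within one chunk, independently of the others."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce phase: merge two partial results into one."""
    return left + right


def count_words(lines: List[str], num_chunks: int = 3) -> Counter:
    """Run the map phase over every chunk, then reduce the partial counts."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(lines, num_chunks)]
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(count_words(["big data", "big spark", "data data"]))
```

In a real cluster the chunks would live on different nodes and the framework would handle scheduling, communication, and retries after failures; the map and reduce functions are the only parts the developer writes.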
Spark
Fast, easy-to-use, and general-purpose cluster computing framework for processing large datasets
Supports a Variety of Data Sources
Spark Benefits
• Makes it easy to write distributed data processing applications
– Expressive API
• Takes care of the messy details of distributed computing
• Allows developers to just focus on the business logic
– Same code works on a single computer or a cluster of nodes
Integrated Libraries for a Variety of Tasks
[Diagram: Spark Core, Spark SQL, GraphX, Spark Streaming, MLlib & Spark ML]
Spark is Fast
• In-memory computation
• Advanced Directed Acyclic Graph (DAG) execution engine
Why In-memory Computation Matters
• HDD: 100 MB/s
• SSD: 500 MB/s
• RAM: 10 GB/s
Read Time Comparison
[Chart: time in minutes (axis 0 to 200) to read 1 TB of data from HDD, SSD, and RAM]
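The read-time chart can be reproduced from the throughput figures with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and assumes the three throughput values map to HDD, SSD, and RAM in ascending order; the function and constant names are hypothetical.

```python
BYTES_PER_TB = 10**12  # decimal terabyte, assumed

# Sequential read throughput in bytes per second, taken from the slide;
# the pairing of value to device is an assumption.
THROUGHPUT = {
    "HDD": 100 * 10**6,
    "SSD": 500 * 10**6,
    "RAM": 10 * 10**9,
}


def read_time_minutes(size_bytes: int, bytes_per_second: int) -> float:
    """Time in minutes to read `size_bytes` at a constant throughput."""
    return size_bytes / bytes_per_second / 60


if __name__ == "__main__":
    for device, rate in THROUGHPUT.items():
        print(f"{device}: {read_time_minutes(BYTES_PER_TB, rate):.1f} min")
```

Under these assumptions, reading 1 TB takes roughly 167 minutes from HDD, 33 minutes from SSD, and under 2 minutes from RAM, which is consistent with the 0 to 200 minute axis on the chart.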
What Are People Using Spark For
Source: Databricks Survey 2015
Top Reasons For Using Spark
Source: Databricks Survey 2015
Adoption of Spark is Growing Rapidly
Opportunities
• Big data will only get bigger
– Everything will be data driven
– New data-driven applications will be invented
– Data will enable us to solve extremely difficult problems
• Spark and other big data technologies are rapidly evolving
• Strong demand for people who know how to store, process, and get value out of big data
