SlideShare a Scribd company logo
1 of 35
Download to read offline
Apache Hama 
Big Data 2nd Generation, 
Big Compute and Big Insight! 
2014 Samsung Open Source Conference
Edward J. Yoon @ 
eddieyoon 
edwardyoon@apache.org
1. Big Compute Platform based on Apache Hama. 
2. Big Data Crowdsourcing Service: datacrowds.com
1. Big Data Trends 
2. What’s Hama? 
3. Future Architecture of Hama
Big Data Analytics 
● Large-scale unstructured data processing 
● Statistics and Data mining 
To mine the valuable insights
HDFS 
Data Gathering 
(Flume) 
SQL-on-Hadoop 
Pig, Hive, Tez, 
Impala, Presto, …, 
etc. 
MapReduce 
? 
Legacy DW 
or OLAP
HDFS 
Real-time Data 
Processing 
(Storm, .., etc) 
In-Memory or Message-Passing 
Spark, Hama, GraphLab, Giraph, Flink, …, etc
2. In-Memory 
and Message-Passing 
Spark, 
Hama, 
Giraph, 
Storm 
1. MapReduce 
and SQL-on-Hadoop 
Mahout, 
Pig, 
Hive, ... 
2007 2010 2014
● 1990 ~ : Web Documents 
Web 2.0 
Blog, Open API 
Smartphone 
Social Network 
● ~ 2014 : Responsive Apps for multi-devices
● 1990 ~ : Server/Web Hosting 
Google Apps 
Cloud Computing 
IaaS, PaaS, SaaS 
● ~ 2014 : Cloud/App Hosting
● 1990 ~ : Text processing and mining 
MapReduce 
SQL-on-Hadoop 
In-memory or Message-Passing 
● ~ 2014 : Matrix, Mining Networks and Graphs
Hama[hɑ́ːma] is a general-purpose 
BSP computing engine on Top of Hadoop
1 / 154
Apache Hama is 
listed on the Best Open 
Source Big Data tools, 
Bossie Awards 2013
Streaming Graph Machine Learning Incremental Learning 
Hama 
(General-purpose BSP) 
O O O O 
Spark (Databricks) 
(In-memory MapReduce) 
O O 
(GraphX) 
O X 
GraphLab 
(Asynchronous graph computing) 
X O O O 
(Limited) 
Giraph 
(BSP-based graph computing) 
X O X X
Think about the Spam Filtering of Google’s Gmail! 
Real-time Data 
Processing Apache Hama 
HDFS
M 
M 
M 
Twitter 
Twitter 
1. Streaming Job: 
Group1: 
Filter 
mentions 
M 
M 
Group2: 
Extract node and 
edge 
iteration 1 iteration 2 … 
G 
G 
G 
Twitter 
Another Example 
2. Graph Job: 
Streaming graph updates while 
computing 
Direct 
Network 
Transfer 
Multi-BSP Job on Apache Hama 
G 
G 
G 
…
Appendix. 
Why all platforms uses BSP-style for graph-parallel?
MR version: Shortest Path 
- A map task receives a node n as a key, and (D, points-to) 
as its value 
- D is the distance to the node from the start 
- points-to is a list of nodes reachable from n 
∀p ∈ points-to, emit (p, D+1) 
- Reduce task gathers possible distances to a given p and 
selects the minimum one
BSP version: Shortest Path 
1: (2, 4) 
2: (1, 3, 4) 
3: (2) 
4: (1, 2, 5, 6) 
5: (4) 
6: (4, 7) 
7: (6) 
1: (2, 4) D(0) 
2: (1, 3, 4) D(1) 
3: (2) D(2) 
4: (1, 2, 5, 6) D(1) 
5: (4) D(2) 
6: (4, 7) D(2) 
7: (6) D(3)
Why Google’s Pregel (graph) 
and DistBelief (deep learning) uses BSP-style?
Apache Hama’s Advanced Analytics Examples: 
● Sparse Matrix-Vector Multiplication 
● Semi-Clustering 
● K-means Clustering 
● Neural Networks 
● Gradient Descent 
● PageRank 
● Single Source Shortest Path 
● Bipartite Matching
Hama Supports: 
Hadoop 1.0 
Hama on Hadoop 2.0 YARN 
Hama on Apache Mesos
Apache Hama 
at Sogou
Sogou.com runs 7,200 cores Hama cluster for SiteRank. 
● SiteRank is the ranking generated by applying the 
classical PageRank algorithm to the graph of Web sites. 
● Dataset is about 400GB, contains 6 Billion edges.
The future features of Apache Hama: 
Kryo serialization, Rootbeer GPU acceleration (Martin 
Illecker, University of Innsbruck), …, etc.
References 
● Hama Website http://hama.apache.org/ 
● Scientific Computing in the Cloud with Apache Hadoop 
and Apache Hama on GPU by Martin Illecker
If you want to be one of us, be one of us. 
Thanks!

More Related Content

What's hot

A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovSpark Summit
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learningMehdi Shibahara
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - HortonworksAvery Ching
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraphsscdotopen
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...rhatr
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processingsscdotopen
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveJoydeep Sen Sarma
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windowsMuhammad Shahid
 

What's hot (20)

A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learning
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Map Reduce introduction
Map Reduce introductionMap Reduce introduction
Map Reduce introduction
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspective
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windows
 

Viewers also liked

The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big dataEdward Yoon
 
What's in your filter bubble? Or, how has the internet censored you today?
What's in your filter bubble? Or, how has the internet censored you today?What's in your filter bubble? Or, how has the internet censored you today?
What's in your filter bubble? Or, how has the internet censored you today?Emily Ford
 
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...Αννα Παππα
 
Web 1.0 2.0 3.0 características, definiciones, ejemplos.
Web 1.0 2.0 3.0 características, definiciones, ejemplos.Web 1.0 2.0 3.0 características, definiciones, ejemplos.
Web 1.0 2.0 3.0 características, definiciones, ejemplos.SantiagoDiazSalamanca
 
Web 3.0 The Semantic Web
Web 3.0 The Semantic WebWeb 3.0 The Semantic Web
Web 3.0 The Semantic WebHatem Mahmoud
 
Web 1.0, Web 2.0 & Web 3.0
Web 1.0, Web 2.0 & Web 3.0Web 1.0, Web 2.0 & Web 3.0
Web 1.0, Web 2.0 & Web 3.0tokey_sport
 

Viewers also liked (9)

The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big data
 
What's in your filter bubble? Or, how has the internet censored you today?
What's in your filter bubble? Or, how has the internet censored you today?What's in your filter bubble? Or, how has the internet censored you today?
What's in your filter bubble? Or, how has the internet censored you today?
 
Why Web 3.0?
Why Web 3.0?Why Web 3.0?
Why Web 3.0?
 
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
 
Web 1.0 2.0 3.0 características, definiciones, ejemplos.
Web 1.0 2.0 3.0 características, definiciones, ejemplos.Web 1.0 2.0 3.0 características, definiciones, ejemplos.
Web 1.0 2.0 3.0 características, definiciones, ejemplos.
 
Web 3.0 Intro
Web 3.0 IntroWeb 3.0 Intro
Web 3.0 Intro
 
Web 3.0
Web 3.0Web 3.0
Web 3.0
 
Web 3.0 The Semantic Web
Web 3.0 The Semantic WebWeb 3.0 The Semantic Web
Web 3.0 The Semantic Web
 
Web 1.0, Web 2.0 & Web 3.0
Web 1.0, Web 2.0 & Web 3.0Web 1.0, Web 2.0 & Web 3.0
Web 1.0, Web 2.0 & Web 3.0
 

Similar to Apache Hama at Samsung Open Source Conference

Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overviewMartin Zapletal
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with PythonDonald Miner
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSandish Kumar H N
 
Python in big data world
Python in big data worldPython in big data world
Python in big data worldRohit
 
Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Anirudh Gangwar
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big pictureJ S Jodha
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangaloreappaji intelhunt
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labImpetus Technologies
 

Similar to Apache Hama at Samsung Open Source Conference (20)

Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with Python
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
 
Data Science
Data ScienceData Science
Data Science
 
Python in big data world
Python in big data worldPython in big data world
Python in big data world
 
Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangalore
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 

More from Edward Yoon

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learningEdward Yoon
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링Edward Yoon
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스Edward Yoon
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introductionEdward Yoon
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsEdward Yoon
 
Apache hama 0.2-userguide
Apache hama 0.2-userguideApache hama 0.2-userguide
Apache hama 0.2-userguideEdward Yoon
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time applicationEdward Yoon
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopEdward Yoon
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear AlgebraEdward Yoon
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And HbaseEdward Yoon
 

More from Edward Yoon (11)

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introduction
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in clouds
 
Apache hama 0.2-userguide
Apache hama 0.2-userguideApache hama 0.2-userguide
Apache hama 0.2-userguide
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time application
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear Algebra
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And Hbase
 
Heart Proposal
Heart ProposalHeart Proposal
Heart Proposal
 

Recently uploaded

(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 

Recently uploaded (20)

(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Apache Hama at Samsung Open Source Conference

  • 1. Apache Hama Big Data 2nd Generation, Big Compute and Big Insight! 2014 Samsung Open Source Conference
  • 2. Edward J. Yoon @ eddieyoon edwardyoon@apache.org
  • 3. 1. Big Compute Platform based on Apache Hama. 2. Big Data Crowdsourcing Service: datacrowds.com
  • 4. 1. Big Data Trends 2. What’s Hama? 3. Future Architecture of Hama
  • 5. Big Data Analytics ● Large-scale unstructured data processing ● Statistics and Data mining To mine the valuable insights
  • 6. HDFS Data Gathering (Flume) SQL-on-Hadoop Pig, Hive, Tez, Impala, Presto, …, etc. MapReduce ? Legacy DW or OLAP
  • 7. HDFS Real-time Data Processing (Storm, .., etc) In-Memory or Message-Passing Spark, Hama, GraphLab, Giraph, Flink, …, etc
  • 8. 2. In-Memory and Message-Passing Spark, Hama, Giraph, Storm 1. MapReduce and SQL-on-Hadoop Mahout, Pig, Hive, ... 2007 2010 2014
  • 9. ● 1990 ~ : Web Documents Web 2.0 Blog, Open API Smartphone Social Network ● ~ 2014 : Responsive Apps for multi-devices
  • 10. ● 1990 ~ : Server/Web Hosting Google Apps Cloud Computing IaaS, PaaS, SaaS ● ~ 2014 : Cloud/App Hosting
  • 11. ● 1990 ~ : Text processing and mining MapReduce SQL-on-Hadoop In-memory or Message-Passing ● ~ 2014 : Matrix, Mining Networks and Graphs
  • 12. Hama[hɑ́ːma] is a general-purpose BSP computing engine on Top of Hadoop
  • 13.
  • 15. Apache Hama is listed on the Best Open Source Big Data tools, Bossie Awards 2013
  • 16.
  • 17. Streaming Graph Machine Learning Incremental Learning Hama (General-purpose BSP) O O O O Spark (Databricks) (In-memory MapReduce) O O (GraphX) O X GraphLab (Asynchronous graph computing) X O O O (Limited) Giraph (BSP-based graph computing) X O X X
  • 18. Think about the Spam Filtering of Google’s Gmail! Real-time Data Processing Apache Hama HDFS
  • 19. M M M Twitter Twitter 1. Streaming Job: Group1: Filter mentions M M Group2: Extract node and edge iteration 1 iteration 2 … G G G Twitter Another Example 2. Graph Job: Streaming graph updates while computing Direct Network Transfer Multi-BSP Job on Apache Hama G G G …
  • 20. Appendix. Why all platforms uses BSP-style for graph-parallel?
  • 21. MR version: Shortest Path - A map task receives a node n as a key, and (D, points-to) as its value - D is the distance to the node from the start - points-to is a list of nodes reachable from n ∀p ∈ points-to, emit (p, D+1) - Reduce task gathers possible distances to a given p and selects the minimum one
  • 22. BSP version: Shortest Path 1: (2, 4) 2: (1, 3, 4) 3: (2) 4: (1, 2, 5, 6) 5: (4) 6: (4, 7) 7: (6) 1: (2, 4) D(0) 2: (1, 3, 4) D(1) 3: (2) D(2) 4: (1, 2, 5, 6) D(1) 5: (4) D(2) 6: (4, 7) D(2) 7: (6) D(3)
  • 23. Why Google’s Pregel (graph) and DistBelief (deep learning) uses BSP-style?
  • 24.
  • 25. Apache Hama’s Advanced Analytics Examples: ● Sparse Matrix-Vector Multiplication ● Semi-Clustering ● K-means Clustering ● Neural Networks ● Gradient Descent ● PageRank ● Single Source Shortest Path ● Bipartite Matching
  • 26. Hama Supports: Hadoop 1.0 Hama on Hadoop 2.0 YARN Hama on Apache Mesos
  • 27.
  • 28. Apache Hama at Sogou
  • 29. Sogou.com runs 7,200 cores Hama cluster for SiteRank. ● SiteRank is the ranking generated by applying the classical PageRank algorithm to the graph of Web sites. ● Dataset is about 400GB, contains 6 Billion edges.
  • 30.
  • 31. The future features of Apache Hama: Kryo serialization, Rootbeer GPU acceleration (Martin Illecker, University of Innsbruck), …, etc.
  • 32.
  • 33.
  • 34. References ● Hama Website http://hama.apache.org/ ● Scientific Computing in the Cloud with Apache Hadoop and Apache Hama on GPU by Martin Illecker
  • 35. If you want to be one of us, be one of us. Thanks!