Apache Hama at Samsung Open Source Conference

Big Data 2nd Generation, Big Compute and Big Insight!

  1. Apache Hama: Big Data 2nd Generation, Big Compute and Big Insight! 2014 Samsung Open Source Conference
  2. Edward J. Yoon, @eddieyoon, edwardyoon@apache.org
  3. (1) Big Compute Platform based on Apache Hama (2) Big Data Crowdsourcing Service: datacrowds.com
  4. (1) Big Data Trends (2) What’s Hama? (3) Future Architecture of Hama
  5. Big Data Analytics ● Large-scale unstructured data processing ● Statistics and data mining, to mine valuable insights
  6. [Diagram labels: HDFS; data gathering (Flume); MapReduce; SQL-on-Hadoop (Pig, Hive, Tez, Impala, Presto, …, etc.); “?”; legacy DW or OLAP]
  7. [Diagram labels: HDFS; real-time data processing (Storm, .., etc.); in-memory or message-passing (Spark, Hama, GraphLab, Giraph, Flink, …, etc.)]
  8. (1) MapReduce and SQL-on-Hadoop: Mahout, Pig, Hive, ... (2) In-Memory and Message-Passing: Spark, Hama, Giraph, Storm [Timeline labels: 2007, 2010, 2014]
  9. ● 1990 ~ : Web documents, Web 2.0, blogs, Open API, smartphones, social networks ● ~ 2014 : Responsive apps for multiple devices
  10. ● 1990 ~ : Server/Web hosting, Google Apps, cloud computing, IaaS, PaaS, SaaS ● ~ 2014 : Cloud/App hosting
  11. ● 1990 ~ : Text processing and mining, MapReduce, SQL-on-Hadoop, in-memory or message-passing ● ~ 2014 : Matrix, mining networks and graphs
  12. Hama [hɑ́ːma] is a general-purpose BSP (Bulk Synchronous Parallel) computing engine on top of Hadoop
  13. 1 / 154
  14. Apache Hama was listed among the best open source big data tools in the Bossie Awards 2013
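  15. Feature comparison (O = supported, X = not supported) ● Hama (general-purpose BSP): Streaming O, Graph O, Machine Learning O, Incremental Learning O ● Spark (Databricks; in-memory MapReduce): Streaming O, Graph O (GraphX), Machine Learning O, Incremental Learning X ● GraphLab (asynchronous graph computing): Streaming X, Graph O, Machine Learning O, Incremental Learning O (limited) ● Giraph (BSP-based graph computing): Streaming X, Graph O, Machine Learning X, Incremental Learning X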
  16. Think about spam filtering in Google’s Gmail! [Diagram labels: real-time data processing; Apache Hama; HDFS]
  17. Another example: Multi-BSP job on Apache Hama (1) Streaming job: Group 1 filters mentions; Group 2 extracts nodes and edges (2) Graph job: streams graph updates while computing [Diagram labels: Twitter; M and G task groups; direct network transfer; iteration 1, iteration 2, …]
  18. Appendix. Why do all platforms use BSP style for graph-parallel computing?
  19. MR version: Shortest Path - A map task receives a node n as its key and (D, points-to) as its value - D is the distance to the node from the start - points-to is a list of nodes reachable from n - ∀p ∈ points-to, emit (p, D+1) - The reduce task gathers the possible distances to a given p and selects the minimum one
  20. BSP version: Shortest Path. Input adjacency lists: 1: (2, 4); 2: (1, 3, 4); 3: (2); 4: (1, 2, 5, 6); 5: (4); 6: (4, 7); 7: (6). Result, with distance D from node 1: 1: D(0); 2: D(1); 3: D(2); 4: D(1); 5: D(2); 6: D(2); 7: D(3)
  21. Why do Google’s Pregel (graph) and DistBelief (deep learning) use BSP style?
  22. Apache Hama’s Advanced Analytics Examples: ● Sparse Matrix-Vector Multiplication ● Semi-Clustering ● K-means Clustering ● Neural Networks ● Gradient Descent ● PageRank ● Single Source Shortest Path ● Bipartite Matching
  23. Hama supports: ● Hadoop 1.0 ● Hama on Hadoop 2.0 YARN ● Hama on Apache Mesos
  24. Apache Hama at Sogou
  25. Sogou.com runs a 7,200-core Hama cluster for SiteRank. ● SiteRank is the ranking generated by applying the classical PageRank algorithm to the graph of Web sites. ● The dataset is about 400 GB and contains 6 billion edges.
  26. Future features of Apache Hama: Kryo serialization, Rootbeer GPU acceleration (Martin Illecker, University of Innsbruck), …, etc.
  27. References ● Hama website: http://hama.apache.org/ ● Martin Illecker, “Scientific Computing in the Cloud with Apache Hadoop and Apache Hama on GPU”
  28. If you want to be one of us, be one of us. Thanks!
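
The shortest-path slides (19 and 20) contrast a MapReduce formulation with a BSP one. The sketch below is a single-process Python simulation of both, run on the seven-node graph from slide 20. It is not Hama's Java API: the names `mr_round`, `mr_shortest_path` and `bsp_shortest_path` are hypothetical, and the step where each node re-emits its own distance in the map phase is an assumption added so that known distances survive the reduce step.

```python
from collections import defaultdict

INF = float("inf")

# Adjacency lists copied from slide 20.
GRAPH = {
    1: [2, 4],
    2: [1, 3, 4],
    3: [2],
    4: [1, 2, 5, 6],
    5: [4],
    6: [4, 7],
    7: [6],
}


def mr_round(graph, dist):
    """One MapReduce job: a single relaxation pass over every node (slide 19)."""
    emitted = defaultdict(list)
    # Map: node n with value (D, points-to) emits (p, D+1) for each p in points-to.
    for n, points_to in graph.items():
        emitted[n].append(dist[n])  # assumption: re-emit own D so it is not lost
        if dist[n] < INF:
            for p in points_to:
                emitted[p].append(dist[n] + 1)
    # Reduce: for each node, keep the minimum of the candidate distances.
    return {n: min(candidates) for n, candidates in emitted.items()}


def mr_shortest_path(graph, source):
    """Driver that chains whole jobs until the distances stop changing."""
    dist = {n: INF for n in graph}
    dist[source] = 0
    jobs = 0
    while True:
        new_dist = mr_round(graph, dist)
        jobs += 1
        if new_dist == dist:
            return dist, jobs
        dist = new_dist


def bsp_shortest_path(graph, source):
    """Vertex-centric BSP: compute, send messages, barrier, repeat."""
    dist = {v: INF for v in graph}
    inbox = {source: [0]}  # messages visible at the start of the first superstep
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        # Local computation: only vertices that received messages do any work.
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                dist[vertex] = best
                # Communication: offer D+1 to every neighbour.
                for neighbour in graph[vertex]:
                    outbox[neighbour].append(best + 1)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = dict(outbox)
        supersteps += 1
    return dist, supersteps
```

Calling `mr_shortest_path(GRAPH, 1)` and `bsp_shortest_path(GRAPH, 1)` both yield the distances listed on slide 20. The difference is in the shape of the work: the MapReduce driver touches every node in every job and needs an extra job just to detect convergence, whereas in the BSP loop the graph and the distances stay in memory across supersteps and only vertices with incoming messages compute.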
