GraphMat: Bridging the Productivity-Performance Gap in Graph Analytics: With increasing interest in large-scale distributed graph analytics for machine learning and data mining, more data scientists and developers are struggling to achieve high performance without sacrificing productivity on large graph problems. In this talk, I will discuss our solution to this problem: GraphMat. Using generalized sparse matrix-based primitives, we are able to achieve performance that is very close to hand-optimized native code, while allowing users to write programs using the familiar vertex-centric programming paradigm. I will show how we optimized GraphMat to achieve this performance on distributed platforms and provide programming examples. We have integrated GraphMat with Apache Spark in a manner that allows the combination to outperform all other distributed graph frameworks. I will explain the reasons for this performance and show that our approach achieves very high hardware efficiency in both single-node and distributed environments using primitives that are applicable to many machine learning and HPC problems. GraphMat is open source software and available for download.
Marc Smith: NodeXL Twitter Network Graphs: CHI2010: https://www.flickr.com/photos/marc_smith/4511844243 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 )
Larry & Teddy Page: Blog webgraph: https://www.flickr.com/photos/igboo/1814232325 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 )
Xavier Gigandet et. al. - Gigandet X, Hagmann P, Kurant M, Cammoun L, Meuli R, et al. (2008) Estimating the Confidence Level of White Matter Connections Obtained with MRI Tractography. PLoS ONE 3(12): e4006. doi:10.1371/journal.pone.0004006 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 )
Machine 12 core IvyTown * 2 sockets (single node -- averaged over 5 algorithms)
Galois has no multinode support
MapGraph does not include CPU-GPU copy time