SlideShare a Scribd company logo
1 of 33
Download to read offline
Apache Mahout
Distributed Matrix Math for Machine Learning
About Me
• Senior Director of Data Science at
Lucidworks (Apache Solr/Lucene, Fusion
search tools)
• Formerly Chief Data Scientist, Technical
Lead of Data Science Practice at Accenture
• Committer and PMC Member, Apache
Mahout
• On Twitter @akm
• Email at akm@apache.org,
andrew.musselman@lucidworks.com
• Adversarial Learning podcast with
@joelgrus at http://adversariallearning.com
Apache Mahout
Recent Trends
in 0.12/0.13
• Simplify and improve performance of
distributed matrix-math programming
• Provide flexible computation options for
software and hardware
• Enable easier and quicker new algorithm
development
• Allow polyglot programming and plotting in
notebooks via Apache Zeppelin
Introduction to Apache Mahout
Apache Mahout is an environment for creating scalable, performant, machine-learning
applications
Apache Mahout provides:
• Mathematically expressive Scala DSL
• A collection of pre-canned math and statistics algorithms
• Interchangeable distributed engines
• Interchangeable native solvers (JVM, CPU, GPU, CUDA, or custom)
Feature Highlights in Recent Releases
• v 0.13.1, Soon — CUDA Solvers, Apache Spark 2.1/Scala 2.11 support
• New web site platform, May 2017 — Moved from ASF CMS system to Markdown and
Jekyll; allows documentation pull requests to be merged in and published
automatically
• v 0.13.0, Apr 2017 — GPU/CPU Solvers, algorithm framework
• v 0.12.2, Nov 2016 — Apache Zeppelin integration for notebooks and visualization
• v 0.12.0, Apr 2016 — Apache Flink backend support
• New Mahout book, Feb 2016 — ‘Apache Mahout: Beyond MapReduce’ by Dmitriy
Lyubimov and Andrew Palumbo
• v 0.10.0 - Apr 2015 - Mahout-Samsara vector-math DSL, MapReduce jobs soft-
deprecated, Spark backend support
Topic Overview
• Mahout-Samsara: Declarative, R-like, domain-specific language (DSL) for matrix math
• Backend-agnostic programming
• Apache Zeppelin notebooks
• Algorithm development framework (modeled after sk-learn)
• Solve on available CPU cores, single or multiple GPUs, or in the JVM
• Next steps, and how to get involved
Mahout-Samsara
Mahout-Samsara
MapReduce is dead; long live the little clip-art blue man!
Mahout-Samsara
• Mahout-Samsara is an easy-to-use domain-specific language (DSL) for large-scale
machine learning on distributed systems like Apache Spark and Flink
• Uses Scala as programming/scripting environment
• Algebraic expression optimizer for distributed linear algebra
• Provides a translation layer to distributed engines
• Support for Spark and Flink DataSets, RDDs
• System-agnostic, R-like DSL; actual formula from (d)spca:
val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
Mahout-Samsara
• Mahout-Samsara computes C = A’A via row-outer-product formulation:
• Executes in a single pass over row-partitioned A
Example of an algebraic optimization
• Logical optimization
• Optimizer rewrites plan to
use logical operator for
Transpose-Times-Self
matrix multiplication
• Single pass: multiply
partitioned rows by
themselves as transposed
columns
• Computation of A’A:
val C = A.t %*% A
• Naïve execution
• 1st pass: transpose A
(requires repartitioning of
A)
• 2nd pass: multiply result
with A (expensive,
potentially requires
repartitioning again)
Mahout-Samsara
• Mahout-Samsara computes C = A’A via row-outer-product formulation:
• Executes in a single pass over row-partitioned A
Example of an algebraic optimization
Mahout-Samsara
• Mahout-Samsara computes C = A’A via row-outer-product formulation:
• Executes in a single pass over row-partitioned A
Example of an algebraic optimization
Mahout-Samsara
• Mahout-Samsara computes C = A’A via row-outer-product formulation:
• Executes in a single pass over row-partitioned A
Example of an algebraic optimization
Mahout-Samsara
• Mahout-Samsara computes C = A’A via row-outer-product formulation:
• Executes in a single pass over row-partitioned A
Example of an algebraic optimization
Backend-Agnostic Programming
Backend-Agnostic Programming
Apache Zeppelin Notebooks
Apache Zeppelin Notebooks
• Notebooks for polyglot programming with all types of data
• Plotting with R and Python off of computed data from other tools in the same notebook
• Share variables between interpreters
• For more: https://zeppelin.apache.org
• Mahout interpreter for Zeppelin released June 2016
• Post by Trevor Grant on how to use it at https://rawkintrevo.org/2016/05/19/visualizing-
apache-mahout-in-r-via-apache-zeppelin-incubating
• https://mahout.apache.org/docs/0.13.1-SNAPSHOT/tutorials/misc/mahout-in-zeppelin/
Apache Zeppelin Notebooks
Add the Mahout Interpreter
Apache Zeppelin Notebooks
Add the Mahout Interpreter, click “Create”
Apache Zeppelin Notebooks
Example usage
Apache Zeppelin Notebooks
Example usage
Apache Zeppelin Notebooks
Hand results to R for plotting
Algorithm Development Framework
Algorithm Development Framework
• Patterned after R and Python (sk-learn) APIs
• Fitter populates a Model, which contains the parameter estimates, fit statistics, a summary, and has a predict()
method
• https://rawkintrevo.org/2017/05/02/introducing-pre-canned-algorithms-apache-mahout
• https://mahout.apache.org/docs/0.13.1-SNAPSHOT/tutorials/misc/contributing-algos
Solve on CPU, GPU, or JVM
Solve on CPU, GPU, or JVM
Current architecture with native CPU and GPU support and unreleased jCUDA bindings
Solve on CPU, GPU, or JVM
Initial benchmarking on latest release
Solve on CPU, GPU, or JVM
Initial benchmarking on latest release
• Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000 density = 0.2, with 5 runs
Mahout JVM Sparse multiplication time: 1501 ms

Mahout jCUDA Sparse multiplication time: 49 ms
30x speedup
• Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .02, with 5 runs
Mahout JVM Sparse multiplication time: 34 ms
Mahout jCUDA Sparse multiplication time: 4 ms
8.5x speedup

• Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .002, with 5 runs
Mahout JVM Sparse multiplication time: 1 ms
Mahout jCUDA Sparse multiplication time: 1 ms
0x speedup
Solve on CPU, GPU, or JVM
• jCUDA work is still in a branch, will be in master in the next couple months
• Currently the modes of compute are JVM, CPU (using all available cores), and single
GPU
• Multi-GPU is next priority
• Currently multiplication takes place in different solvers based on matrix shape
(banding, triangularity, etc.)
• Directing location for data and compute based on shape and density is another priority
• Watch this space for other speedups
Next steps
How to Use Mahout and Get Involved
How to Use Mahout and Get Involved
Web: https://mahout.apache.org
Source code, PRs welcome: https://github.com/apache/mahout
Mailing lists: https://mahout.apache.org/community/mailing-
lists.html
Download, install, embed: https://mahout.apache.org/
downloads.html
Thank You
Q&A
h2ps://mahout.apache.org
h2ps://github.com/apache/mahout

More Related Content

What's hot

Neural networks and google tensor flow
Neural networks and google tensor flowNeural networks and google tensor flow
Neural networks and google tensor flowShannon McCormick
 
Rajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlowRajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlowAI Frontiers
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016MLconf
 
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017MLconf
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15MLconf
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15MLconf
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowPaolo Tomeo
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...MLconf
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlowDarshan Patel
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
Teaching Recurrent Neural Networks using Tensorflow (May 2016)
Teaching Recurrent Neural Networks using Tensorflow (May 2016)Teaching Recurrent Neural Networks using Tensorflow (May 2016)
Teaching Recurrent Neural Networks using Tensorflow (May 2016)Rajiv Shah
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016MLconf
 
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017Ashish Bansal
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewPoo Kuan Hoong
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & PythonLonghow Lam
 

What's hot (20)

Neural networks and google tensor flow
Neural networks and google tensor flowNeural networks and google tensor flow
Neural networks and google tensor flow
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
 
Rajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlowRajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlow
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
 
Machine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlowMachine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlow
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlow
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
 
Tensorflow
TensorflowTensorflow
Tensorflow
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Teaching Recurrent Neural Networks using Tensorflow (May 2016)
Teaching Recurrent Neural Networks using Tensorflow (May 2016)Teaching Recurrent Neural Networks using Tensorflow (May 2016)
Teaching Recurrent Neural Networks using Tensorflow (May 2016)
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & Python
 

Viewers also liked

Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017MLconf
 
Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017MLconf
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016MLconf
 
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016MLconf
 
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...MLconf
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016MLconf
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017MLconf
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
 
Jeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisJeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisMLconf
 
Layla El Asri, Research Scientist, Maluuba
Layla El Asri, Research Scientist, Maluuba Layla El Asri, Research Scientist, Maluuba
Layla El Asri, Research Scientist, Maluuba MLconf
 
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 MLconf
 
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017MLconf
 
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016MLconf
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017MLconf
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...MLconf
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017MLconf
 
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016MLconf
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017MLconf
 

Viewers also liked (20)

Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
 
Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
 
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ at MLconf ATL 2016
 
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Jeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisJeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, Adaptris
 
Layla El Asri, Research Scientist, Maluuba
Layla El Asri, Research Scientist, Maluuba Layla El Asri, Research Scientist, Maluuba
Layla El Asri, Research Scientist, Maluuba
 
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
 
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
 
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
 
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
 

Similar to Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle 2017

Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?MapR Technologies
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnJosef A. Habdank
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Spark Summit
 
BigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexBigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexThomas Weise
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceChin Huang
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Jen Aman
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...Flink Forward
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataData Con LA
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Amazon Web Services
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersJen Aman
 
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016Badri Narayan Bhaskar
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLArnab Biswas
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Summit
 

Similar to Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle 2017 (20)

Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
BigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexBigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache Apex
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph Performance
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Recently uploaded (20)

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle 2017

  • 1. Apache Mahout Distributed Matrix Math for Machine Learning
  • 2. About Me • Senior Director of Data Science at Lucidworks (Apache Solr/Lucene, Fusion search tools) • Formerly Chief Data Scientist, Technical Lead of Data Science Practice at Accenture • Committer and PMC Member, Apache Mahout • On Twitter @akm • Email at akm@apache.org, andrew.musselman@lucidworks.com • Adversarial Learning podcast with @joelgrus at http://adversariallearning.com
  • 3. Apache Mahout Recent Trends in 0.12/0.13 • Simplify and improve performance of distributed matrix-math programming • Provide flexible computation options for software and hardware • Enable easier and quicker new algorithm development • Allow polyglot programming and plotting in notebooks via Apache Zeppelin
  • 4. Introduction to Apache Mahout Apache Mahout is an environment for creating scalable, performant, machine-learning applications Apache Mahout provides: • Mathematically expressive Scala DSL • A collection of pre-canned math and statistics algorithms • Interchangeable distributed engines • Interchangeable native solvers (JVM, CPU, GPU, CUDA, or custom)
  • 5. Feature Highlights in Recent Releases • v 0.13.1, Soon — CUDA Solvers, Apache Spark 2.1/Scala 2.11 support • New web site platform, May 2017 — Moved from ASF CMS system to Markdown and Jekyll; allows documentation pull requests to be merged in and published automatically • v 0.13.0, Apr 2017 — GPU/CPU Solvers, algorithm framework • v 0.12.2, Nov 2016 — Apache Zeppelin integration for notebooks and visualization • v 0.12.0, Apr 2016 — Apache Flink backend support • New Mahout book, Feb 2016 — ‘Apache Mahout: Beyond MapReduce’ by Dmitriy Lyubimov and Andrew Palumbo • v 0.10.0 - Apr 2015 - Mahout-Samsara vector-math DSL, MapReduce jobs soft- deprecated, Spark backend support
  • 6. Topic Overview • Mahout-Samsara: Declarative, R-like, domain-specific language (DSL) for matrix math • Backend-agnostic programming • Apache Zeppelin notebooks • Algorithm development framework (modeled after sk-learn) • Solve on available CPU cores, single or multiple GPUs, or in the JVM • Next steps, and how to get involved
  • 8. Mahout-Samsara MapReduce is dead; long live the little clip-art blue man!
  • 9. Mahout-Samsara • Mahout-Samsara is an easy-to-use domain-specific language (DSL) for large-scale machine learning on distributed systems like Apache Spark and Flink • Uses Scala as programming/scripting environment • Algebraic expression optimizer for distributed linear algebra • Provides a translation layer to distributed engines • Support for Spark and Flink DataSets, RDDs • System-agnostic, R-like DSL; actual formula from (d)spca: val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
  • 10. Mahout-Samsara • Mahout-Samsara computes C = A’A via row-outer-product formulation: • Executes in a single pass over row-partitioned A Example of an algebraic optimization • Logical optimization • Optimizer rewrites plan to use logical operator for Transpose-Times-Self matrix multiplication • Single pass: multiply partitioned rows by themselves as transposed columns • Computation of A’A: val C = A.t %*% A • Naïve execution • 1st pass: transpose A (requires repartitioning of A) • 2nd pass: multiply result with A (expensive, potentially requires repartitioning again)
  • 11. Mahout-Samsara • Mahout-Samsara computes C = A’A via row-outer-product formulation: • Executes in a single pass over row-partitioned A Example of an algebraic optimization
  • 12. Mahout-Samsara • Mahout-Samsara computes C = A’A via row-outer-product formulation: • Executes in a single pass over row-partitioned A Example of an algebraic optimization
  • 13. Mahout-Samsara • Mahout-Samsara computes C = A’A via row-outer-product formulation: • Executes in a single pass over row-partitioned A Example of an algebraic optimization
  • 14. Mahout-Samsara • Mahout-Samsara computes C = A’A via row-outer-product formulation: • Executes in a single pass over row-partitioned A Example of an algebraic optimization
  • 18. Apache Zeppelin Notebooks • Notebooks for polyglot programming with all types of data • Plotting with R and Python off of computed data from other tools in the same notebook • Share variables between interpreters • For more: https://zeppelin.apache.org • Mahout interpreter for Zeppelin released June 2016 • Post by Trevor Grant on how to use it at https://rawkintrevo.org/2016/05/19/visualizing- apache-mahout-in-r-via-apache-zeppelin-incubating • https://mahout.apache.org/docs/0.13.1-SNAPSHOT/tutorials/misc/mahout-in-zeppelin/
  • 19. Apache Zeppelin Notebooks Add the Mahout Interpreter
  • 20. Apache Zeppelin Notebooks Add the Mahout Interpreter, click “Create”
  • 23. Apache Zeppelin Notebooks Hand results to R for plotting
  • 25. Algorithm Development Framework • Patterned after R and Python (sk-learn) APIs • Fitter populates a Model, which contains the parameter estimates, fit statistics, a summary, and has a predict() method • https://rawkintrevo.org/2017/05/02/introducing-pre-canned-algorithms-apache-mahout • https://mahout.apache.org/docs/0.13.1-SNAPSHOT/tutorials/misc/contributing-algos
  • 26. Solve on CPU, GPU, or JVM
  • 27. Solve on CPU, GPU, or JVM Current architecture with native CPU and GPU support and unreleased jCUDA bindings
  • 28. Solve on CPU, GPU, or JVM Initial benchmarking on latest release
  • 29. Solve on CPU, GPU, or JVM Initial benchmarking on latest release • Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000 density = 0.2, with 5 runs Mahout JVM Sparse multiplication time: 1501 ms
 Mahout jCUDA Sparse multiplication time: 49 ms 30x speedup • Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .02, with 5 runs Mahout JVM Sparse multiplication time: 34 ms Mahout jCUDA Sparse multiplication time: 4 ms 8.5x speedup
 • Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000 density = .002, with 5 runs Mahout JVM Sparse multiplication time: 1 ms Mahout jCUDA Sparse multiplication time: 1 ms 0x speedup
  • 30. Solve on CPU, GPU, or JVM • jCUDA work is still in a branch, will be in master in the next couple months • Currently the modes of compute are JVM, CPU (using all available cores), and single GPU • Multi-GPU is next priority • Currently multiplication takes place in different solvers based on matrix shape (banding, triangularity, etc.) • Directing location for data and compute based on shape and density is another priority • Watch this space for other speedups Next steps
  • 31. How to Use Mahout and Get Involved
  • 32. How to Use Mahout and Get Involved Web: https://mahout.apache.org Source code, PRs welcome: https://github.com/apache/mahout Mailing lists: https://mahout.apache.org/community/mailing- lists.html Download, install, embed: https://mahout.apache.org/ downloads.html