Artificial Intelligence Layer
Mahout, MLlib, & other projects
Víctor Sánchez Anguix
Universitat Politècnica de València
MSc. in Artificial Intelligence, Pattern Recognition, and Digital Image
Course 2014/2015
Artificial Intelligence Layer: Mahout, MLlib & other projects. MSc. in Artificial Intelligence, Pattern Recognition and Digital Image
So far...
➢ Core technologies such as a distributed file system (DFS) and MapReduce (MR), e.g., Hadoop
➢ ETL for transforming data (e.g., Pig)
➢ Alternative core/ETL technology (e.g., Spark)
➢ Now we can build AI tools from scratch
Can I save some work with existing code?
Actually, we can...
➢ Write some UDF wrappers for Weka in Pig/Spark
➢ Use connectors to R and Python
➢ Parallelize execution of multiple non-distributed algorithms
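The last option above, running many independent non-distributed algorithms side by side, can be sketched in plain Scala. This is only an illustration under assumptions: `fitLine` is a hypothetical stand-in for any single-machine algorithm (for example, a wrapped Weka call), the data sets are made up, and local `Future`s stand in for the Pig/Spark tasks that would do this on a cluster.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelFitsSketch {
  // Hypothetical single-machine algorithm: ordinary least squares
  // for y = slope * x + intercept. Returns (slope, intercept).
  def fitLine(points: Seq[(Double, Double)]): (Double, Double) = {
    val n = points.size.toDouble
    val meanX = points.map(_._1).sum / n
    val meanY = points.map(_._2).sum / n
    val cov = points.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = points.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val slope = cov / varX
    (slope, meanY - slope * meanX)
  }

  // Each fit is sequential; only the collection of fits runs in parallel.
  def fitAll(dataSets: Map[String, Seq[(Double, Double)]]): Map[String, (Double, Double)] = {
    val futures = dataSets.toSeq.map { case (name, pts) => Future(name -> fitLine(pts)) }
    Await.result(Future.sequence(futures), 1.minute).toMap
  }

  def main(args: Array[String]): Unit = {
    // Made-up data sets, one independent problem per key.
    val dataSets = Map(
      "a" -> Seq((0.0, 1.0), (1.0, 3.0), (2.0, 5.0)),
      "b" -> Seq((0.0, 0.0), (1.0, -1.0), (2.0, -2.0))
    )
    fitAll(dataSets).foreach { case (name, (slope, intercept)) =>
      println(s"$name: slope=$slope intercept=$intercept")
    }
  }
}
```

Note that every individual fit must still fit on one machine, so this approach only scales with the number of problems, not with the size of each one.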
But we can do better!
➢ Reusing existing code is still problematic if a single problem instance is very big
➢ The reused algorithms are not really parallel algorithms
➢ Use parallel algorithms to tackle big problems:
○ Apache Mahout
○ Apache Spark
Apache Mahout
➢ Collection of parallel AI & ML algorithms
➢ Algorithms are moving from MapReduce to Spark
➢ Latest major release: Mahout 0.9 (February 2014)
http://mahout.apache.org/
Apache Mahout: Algorithms
➢ Clustering algorithms:
○ K-means (parallel)
○ Fuzzy K-means (parallel)
○ Spectral K-means (parallel)
➢ Classification algorithms:
○ Logistic regression (non-parallel)
○ Naive Bayes (parallel)
○ Random Forest (parallel)
○ Multilayer perceptron (non-parallel)
Apache Mahout: Algorithms
➢ Dimensionality reduction:
○ Singular Value Decomposition (parallel)
○ PCA (parallel)
○ Lanczos decomposition (parallel)
○ QR decomposition (parallel)
➢ Text algorithms:
○ TF-IDF (parallel)
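As a rough illustration of what the TF-IDF weighting listed above computes, here is a single-machine sketch in plain Scala. It assumes the textbook variant (raw term count times ln(N / document frequency)) and already-tokenized documents; it is not Mahout's implementation, which runs as distributed jobs and may use a different smoothing of the IDF term.

```scala
object TfIdfSketch {
  // docs: each document is a sequence of tokens.
  // Returns, per document, a map term -> TF-IDF weight.
  def tfIdf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val n = docs.size.toDouble
    // Document frequency: number of documents that contain each term.
    val df: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      // Term frequency: raw count of each term inside this document.
      val counts = doc.groupBy(identity).map { case (t, occ) => t -> occ.size }
      counts.map { case (t, c) => t -> (c * math.log(n / df(t))) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up toy corpus.
    val docs = Seq(Seq("spark", "mahout"), Seq("spark", "spark", "hadoop"))
    tfIdf(docs).zipWithIndex.foreach { case (weights, i) => println(s"doc $i: $weights") }
  }
}
```

A term that appears in every document (here "spark") gets weight 0, which is the intended effect: it does not help to tell documents apart.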
Mahout from shell
➢ Just type mahout in the shell
➢ A list of available algorithms will be printed
➢ Typing mahout algorithm_name will print the help for that specific algorithm
➢ Executing distributed algorithms requires Hadoop and DFS
Mahout example: Item-based Collaborative filtering
➢ mahout recommenditembased:
○ --input: file with user_id item_id rows to represent purchases
○ --output: where mahout should store results
○ --usersFile: the users we should compute recommendations for
○ --itemsFile: the items we can recommend
○ -b (--booleanData): true (in our case, binary data)
○ --similarityClassname: SIMILARITY_LOGLIKELIHOOD or SIMILARITY_TANIMOTO_COEFFICIENT (in our case, binary data)
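To make the Tanimoto option above more concrete, the following plain Scala sketch computes the Tanimoto (Jaccard) coefficient between two items from binary purchase data: the number of users who bought both items divided by the number who bought at least one. The purchase rows and the helper names are made up for illustration, and this is not Mahout's distributed implementation.

```scala
object TanimotoSketch {
  // |A ∩ B| / |A ∪ B| for the sets of users who bought two items.
  def tanimoto(a: Set[Int], b: Set[Int]): Double = {
    val unionSize = a.union(b).size
    if (unionSize == 0) 0.0 else a.intersect(b).size.toDouble / unionSize
  }

  // Group (user_id, item_id) purchase rows into item -> set of buyers.
  def buyersByItem(purchases: Seq[(Int, Int)]): Map[Int, Set[Int]] =
    purchases.groupBy(_._2).map { case (item, rows) => item -> rows.map(_._1).toSet }

  def main(args: Array[String]): Unit = {
    // Made-up purchases: item 10 bought by users 1, 2, 3; item 20 by users 2, 3, 4.
    val purchases = Seq((1, 10), (2, 10), (3, 10), (2, 20), (3, 20), (4, 20))
    val buyers = buyersByItem(purchases)
    println(tanimoto(buyers(10), buyers(20)))
  }
}
```

Because only the presence of a purchase matters, the measure suits the binary data used in this example; an item-based recommender then suggests items similar to the ones a user already bought.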
Mahout example: Item-based Collaborative filtering
➢ Execute:
mahout recommenditembased \
  --input data/purchases_mahout.tsv \
  --output mahout_cf \
  --usersFile data/users_mahout.tsv \
  --itemsFile data/valid_products_mahout.tsv \
  --booleanData true \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
MLlib
➢ Machine learning library inside Spark
➢ Completely distributed
➢ It is bundled with Spark!
MLlib: Algorithms
➢ Classification & Regression:
○ Support Vector Machines
○ Logistic Regression
○ Linear Regression
○ Random Forests
➢ Clustering:
○ K-means
MLlib: Algorithms
➢ Dimensionality reduction:
○ Singular Value Decomposition
○ PCA
➢ Clustering:
○ K-means
➢ Collaborative filtering:
○ ALS (alternating least squares) matrix factorization recommender
MLlib: K-Means example
➢ Let us apply K-means on the iris data set
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
// Number of clusters and maximum number of iterations
val numClusters = 3
val numIteration = 20
// Read the CSV from HDFS (sc is the Spark shell's SparkContext) and split each line into fields
val data_iris = sc.textFile("hdfs:///user/sanguix/data/iris.csv").map(l => l.split(",", -1))
// Keep the four numeric features as a dense vector; cache because K-means iterates over the data
val parsedData = data_iris.map(r => Vectors.dense(Array(r(0).toDouble,
  r(1).toDouble, r(2).toDouble, r(3).toDouble))).cache()
val clusters = KMeans.train(parsedData, numClusters, numIteration)
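To give an idea of what the training call above does conceptually, here is a single-machine sketch of Lloyd's algorithm in plain Scala. It is an illustration under assumptions, not MLlib's implementation: the initial centers are passed in explicitly (MLlib chooses them itself), the points are made up, and nothing here is distributed.

```scala
object KMeansSketch {
  type Point = Array[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Index of the center closest to p.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // Lloyd's algorithm: alternate the assignment step and the center update step
  // for a fixed number of iterations.
  def train(points: Seq[Point], initial: Seq[Point], numIterations: Int): Seq[Point] =
    (1 to numIterations).foldLeft(initial) { (centers, _) =>
      val groups = points.groupBy(p => closest(p, centers))
      centers.indices.map { i =>
        groups.get(i) match {
          // New center: coordinate-wise mean of the points assigned to it.
          case Some(members) =>
            members.head.indices.map(d => members.map(_(d)).sum / members.size).toArray
          // A center with no assigned points is left where it is.
          case None => centers(i)
        }
      }
    }

  def main(args: Array[String]): Unit = {
    // Made-up 2-D points forming two obvious groups.
    val points = Seq(Array(1.0, 1.0), Array(1.0, 2.0), Array(8.0, 8.0), Array(9.0, 8.0))
    val centers = train(points, Seq(Array(0.0, 0.0), Array(10.0, 10.0)), 20)
    centers.foreach(c => println(c.mkString("(", ", ", ")")))
  }
}
```

The distributed version follows the same two steps; the assignment step is what parallelizes naturally, since every point can be assigned to its closest center independently.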
Other projects: GraphX
➢ Spark built-in library for graphs
➢ Algorithms:
○ PageRank
○ (Strongly) connected components
○ Label propagation
○ Other basic graph operations
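As a small illustration of the PageRank algorithm listed above, the sketch below runs the textbook power iteration on an edge list in plain Scala. The assumptions are stated in the comments: ranks are normalized to sum to 1, every vertex has at least one outgoing edge, and the graph is made up. It is not GraphX's implementation, which distributes the graph and may normalize ranks differently.

```scala
object PageRankSketch {
  // edges: (source, destination) pairs.
  // Assumption: every vertex has at least one outgoing edge (no dangling vertices).
  def pageRank(edges: Seq[(Int, Int)], numIterations: Int, damping: Double = 0.85): Map[Int, Double] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (s, es) => s -> es.size }
    val n = vertices.size.toDouble
    val initial: Map[Int, Double] = vertices.map(v => v -> (1.0 / n)).toMap
    (1 to numIterations).foldLeft(initial) { (ranks, _) =>
      // Each vertex sends rank / outDegree along every outgoing edge.
      val received = edges
        .map { case (s, d) => d -> (ranks(s) / outDegree(s)) }
        .groupBy(_._1)
        .map { case (d, contribs) => d -> contribs.map(_._2).sum }
      // Damped update: a share of rank is spread uniformly over all vertices.
      vertices.map(v => v -> ((1 - damping) / n + damping * received.getOrElse(v, 0.0))).toMap
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
    val ranks = pageRank(Seq((1, 2), (1, 3), (2, 3), (3, 1)), 30)
    ranks.toSeq.sortBy(_._1).foreach { case (v, r) => println(s"$v: $r") }
  }
}
```

In the made-up graph, vertex 3 receives links from both other vertices and therefore ends up with the highest rank.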
Other projects: Giraph
➢ Graph framework over Hadoop
➢ Specialized for building algorithms for graphs
➢ Latest major release: Giraph 1.1.0 (Nov. 2014)
http://giraph.apache.org/
Other projects: GraphLab
➢ Distributed framework for machine learning
➢ Originally created at Carnegie Mellon
➢ Algorithms:
○ Collaborative filtering
○ Text analysis
○ PageRank
○ Deep learning
➢ Latest release: GraphLab 2.2 (July 2013)
https://github.com/graphlab-code/graphlab
Other projects: H2O
➢ ML library on top of Hadoop/Spark
➢ Algorithms:
○ Random Forests
○ Generalized Linear Model
○ Deep learning
○ K-Means
➢ Latest release: H2O 2.8.4.4 (February 2015)
https://github.com/h2oai/h2o-dev
Other projects: Apache Flink
➢ Large-scale data processing engine in Java/Scala
➢ In-memory collections
➢ Latest release: Flink 0.8.0 (January 2015)
http://flink.apache.org/
Extra information
➢ Mahout in Action. Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Ed. Manning Publications (2011)
➢ Apache Mahout Cookbook. Piero Giacomelli. Ed. Packt Publishing (2013)
➢ StackOverflow

More Related Content

What's hot

Ferruzza g automl deck
Ferruzza g   automl deckFerruzza g   automl deck
Ferruzza g automl deckEric Dill
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityJoshua Shinavier
 
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...Till Blume
 
Comparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP ModelsComparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP Modelssaurav singla
 
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Joshua Shinavier
 
Algebraic Property Graphs
Algebraic Property GraphsAlgebraic Property Graphs
Algebraic Property GraphsAdrian Wilke
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumRalf Gommers
 
Machine learning libraries with python
Machine learning libraries with pythonMachine learning libraries with python
Machine learning libraries with pythonVishalBisht9217
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in PythonMarc Garcia
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoMLNing Jiang
 
Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Yannis Kalfoglou
 

What's hot (15)

Ferruzza g automl deck
Ferruzza g   automl deckFerruzza g   automl deck
Ferruzza g automl deck
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
 
Poster
PosterPoster
Poster
 
Persian MNIST in 5 Minutes
Persian MNIST in 5 MinutesPersian MNIST in 5 Minutes
Persian MNIST in 5 Minutes
 
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...Towards Flexible Indices for  Distributed Graph Data: The Formal Schema-level...
Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level...
 
Comparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP ModelsComparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP Models
 
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
 
Algebraic Property Graphs
Algebraic Property GraphsAlgebraic Property Graphs
Algebraic Property Graphs
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS Forum
 
Tutorial4
Tutorial4Tutorial4
Tutorial4
 
Machine learning libraries with python
Machine learning libraries with pythonMachine learning libraries with python
Machine learning libraries with python
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in Python
 
Python libraries
Python librariesPython libraries
Python libraries
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
 
Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002
 

Similar to Artificial Intelligence Layer: Mahout, MLLib, and other projects

Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistAlexey Zinoviev
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Python Machine Learning - Getting Started
Python Machine Learning - Getting StartedPython Machine Learning - Getting Started
Python Machine Learning - Getting StartedRafey Iqbal Rahman
 
Apache Pig: Making data transformation easy
Apache Pig: Making data transformation easyApache Pig: Making data transformation easy
Apache Pig: Making data transformation easyVictor Sanchez Anguix
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Greg Makowski
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycleDatabricks
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
DeepLearning4J: Open Source Neural Net Platform
DeepLearning4J: Open Source Neural Net PlatformDeepLearning4J: Open Source Neural Net Platform
DeepLearning4J: Open Source Neural Net PlatformTuri, Inc.
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15MLconf
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConfXavier Amatriain
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systemsXavier Amatriain
 
Machine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro NegroMachine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro NegroGraphAware
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is DistributedAlluxio, Inc.
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningHaptik
 
Антон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLabАнтон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLabDiana Dymolazova
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...TigerGraph
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFMLconf
 

Similar to Artificial Intelligence Layer: Mahout, MLLib, and other projects (20)

Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Python Machine Learning - Getting Started
Python Machine Learning - Getting StartedPython Machine Learning - Getting Started
Python Machine Learning - Getting Started
 
Apache Pig: Making data transformation easy
Apache Pig: Making data transformation easyApache Pig: Making data transformation easy
Apache Pig: Making data transformation easy
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
DeepLearning4J: Open Source Neural Net Platform
DeepLearning4J: Open Source Neural Net PlatformDeepLearning4J: Open Source Neural Net Platform
DeepLearning4J: Open Source Neural Net Platform
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
Machine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro NegroMachine Learning Powered by Graphs - Alessandro Negro
Machine Learning Powered by Graphs - Alessandro Negro
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
Антон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLabАнтон Кириллов, ZeptoLab
Антон Кириллов, ZeptoLab
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
 

Recently uploaded

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 

Recently uploaded (20)

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 

Artificial Intelligence Layer: Mahout, MLLib, and other projects

  • 1. Artificial Intelligence Layer Mahout, MLLib, & other projects Víctor Sánchez Anguix Universitat Politècnica de València MSc. In Artificial Intelligence, Pattern Recognition, and Digital Image Course 2014/2015
  • 2. So far...
      ➢ Core technologies like DFS and MR (e.g., Hadoop)
      ➢ ETL for transforming data (e.g., Pig)
      ➢ Alternative core/ETL technology (e.g., Spark)
      ➢ Now we can build AI tools from scratch
  • 3. Can I save some work with existing code?
  • 4. Actually, we can...
      ➢ Write some UDF wrappers for Weka in Pig/Spark
      ➢ Use connectors to R and Python
      ➢ Parallelize execution of multiple non-distributed algorithms
  • 5. But we can do better!
      ➢ Still problematic if algorithm instances are very big
      ➢ They are not really parallel algorithms
      ➢ Use parallel algorithms to tackle big problems:
          ○ Apache Mahout
          ○ Apache Spark
  • 6. Apache Mahout
      ➢ Collection of parallel AI & ML algorithms
      ➢ MapReduce algorithms → Spark
      ➢ Latest major release: Mahout 0.9 (February 2014)
      http://mahout.apache.org/
  • 7. Apache Mahout: Algorithms
      ➢ Clustering algorithms:
          ○ K-means (parallel)
          ○ Fuzzy K-means (parallel)
          ○ Spectral K-means (parallel)
      ➢ Classification algorithms:
          ○ Logistic regression (non parallel)
          ○ Naive Bayes (parallel)
          ○ Random Forest (parallel)
          ○ Multilayer perceptron (non parallel)
  • 8. Apache Mahout: Algorithms
      ➢ Dimensionality reduction:
          ○ Singular Value Decomposition (parallel)
          ○ PCA (parallel)
          ○ Lanczos decomposition (parallel)
          ○ QR decomposition (parallel)
      ➢ Text algorithms:
          ○ TF-IDF (parallel)
  • 9. Mahout from shell
      ➢ Just type mahout in the shell
      ➢ A list of available algorithms will be printed
      ➢ Typing mahout algorithm_name prints the help for that specific algorithm
      ➢ Executing distributed algorithms requires Hadoop and DFS
  • 10. Mahout example: Item-based Collaborative filtering
      ➢ mahout recommenditembased:
          ○ --input: file with user_id item_id rows to represent purchases
          ○ --output: where mahout should store results
          ○ --usersFile: the users we should generate recommendations for
          ○ --itemsFile: the items we can recommend
          ○ --booleanData (-b): true (in our case, binary data)
          ○ --similarityClassname: SIMILARITY_LOGLIKELIHOOD or SIMILARITY_TANIMOTO_COEFFICIENT (in our case, binary data)
  • 11. Mahout example: Item-based Collaborative filtering
      ➢ Execute:
          mahout recommenditembased --input data/purchases_mahout.tsv --output mahout_cf --usersFile data/users_mahout.tsv --itemsFile data/valid_products_mahout.tsv --booleanData --similarityClassname SIMILARITY_LOGLIKELIHOOD
  • 12. MLLib
      ➢ Machine learning library inside Spark
      ➢ Completely distributed
      ➢ It is bundled with Spark!
  • 13. MLLib: Algorithms
      ➢ Classification & Regression:
          ○ Support Vector Machines
          ○ Logistic Regression
          ○ Linear Regression
          ○ Random Forests
      ➢ Clustering:
          ○ K-means
  • 14. MLLib: Algorithms
      ➢ Dimensionality reduction:
          ○ Singular Value Decomposition
          ○ PCA
      ➢ Clustering:
          ○ K-means
      ➢ Collaborative filtering
          ○ ALS item-based recommender
  • 15. MLLib: K-Means example
      ➢ Let us apply K-means on the iris data set

      import org.apache.spark.mllib.clustering.KMeans
      import org.apache.spark.mllib.linalg.Vectors

      val numClusters = 3
      val numIteration = 20
      // sc is the SparkContext provided by the Spark shell.
      // Read the CSV from HDFS and split each line into its fields.
      val data_iris = sc.textFile("hdfs:///user/sanguix/data/iris.csv").map(l => l.split(",", -1))
      // Keep the four numeric features of each row as a dense vector.
      val parsedData = data_iris.map(r =>
        Vectors.dense(Array(r(0).toDouble, r(1).toDouble, r(2).toDouble, r(3).toDouble))
      ).cache()
      // Train K-means with 3 clusters and at most 20 iterations.
      val clusters = KMeans.train(parsedData, numClusters, numIteration)
  • 16. Other projects: GraphX
      ➢ Spark built-in library for graphs
      ➢ Algorithms:
          ○ PageRank
          ○ (Strongly) connected components
          ○ Label propagation
          ○ Other basic graph operations
  • 17. Other projects: Giraph
      ➢ Graph framework over Hadoop
      ➢ Specialized for building algorithms for graphs
      ➢ Latest major release: Giraph 1.1.0 (Nov. 2014)
      http://giraph.apache.org/
  • 18. Other projects: GraphLab
      ➢ Distributed framework for machine learning
      ➢ Originally created at Carnegie Mellon
      ➢ Algorithms:
          ○ Collaborative filtering
          ○ Text analysis
          ○ PageRank
          ○ Deep learning
      ➢ Latest release: GraphLab 2.2 (July 2013)
      https://github.com/graphlab-code/graphlab
  • 19. Other projects: H2O
      ➢ ML library on top of Hadoop/Spark
      ➢ Algorithms:
          ○ Random Forests
          ○ Generalized Linear Model
          ○ Deep learning
          ○ K-Means
      ➢ Latest release: H2O 2.8.4.4 (February 2015)
      https://github.com/h2oai/h2o-dev
  • 20. Other projects: Apache Flink
      ➢ Large scale data processing engine in Java/Scala
      ➢ In-memory collections
      ➢ Latest release: Flink 0.8.0 (January 2015)
      http://flink.apache.org/
  • 21. Extra information
      ➢ Mahout in Action. Sean Owen. Eds. Manning Publications (2011)
      ➢ Apache Mahout Cookbook. Piero Giacomelli. Ed. Packt Publishing (2013)
      ➢ StackOverflow
  • 22. Artificial Intelligence Layer Mahout, MLLib, & other projects Víctor Sánchez Anguix Universitat Politècnica de València MSc. In Artificial Intelligence, Pattern Recognition, and Digital Image Course 2014/2015
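
Slides 10 and 11 choose between two similarity measures for binary purchase data. As a rough illustration of what SIMILARITY_TANIMOTO_COEFFICIENT measures, the sketch below computes the Tanimoto coefficient (shared buyers divided by total distinct buyers) between items in plain Scala. It is not Mahout's implementation, which runs as distributed jobs. The object name, the sample rows, and the tab-separated layout (guessed from the .tsv extension) are assumptions made for this example.

```scala
object TanimotoSketch {
  // Hypothetical purchases in "user_id<TAB>item_id" form (layout assumed from the .tsv extension).
  val sampleRows: Seq[String] = Seq(
    "u1\ti1", "u1\ti2",
    "u2\ti1", "u2\ti2",
    "u3\ti1", "u3\ti3"
  )

  // Group the rows into item -> set of users who bought that item.
  def usersByItem(rows: Seq[String]): Map[String, Set[String]] =
    rows
      .map(_.split("\t", -1))
      .collect { case Array(user, item) => (item, user) }
      .groupBy(_._1)
      .map { case (item, pairs) => item -> pairs.map(_._2).toSet }

  // Tanimoto coefficient on binary data: |A intersect B| / |A union B|.
  def tanimoto(a: Set[String], b: Set[String]): Double = {
    val unionSize = (a union b).size
    if (unionSize == 0) 0.0 else (a intersect b).size.toDouble / unionSize
  }

  def main(args: Array[String]): Unit = {
    val byItem = usersByItem(sampleRows)
    val items = byItem.keys.toSeq.sorted
    // Print the similarity of every unordered pair of items.
    for (a <- items; b <- items if a < b)
      println(f"$a%s vs $b%s: ${tanimoto(byItem(a), byItem(b))}%.3f")
  }
}
```

In the sample data, i1 was bought by three users and i2 by two of those same users, so their similarity is 2/3, while i2 and i3 share no buyers and score 0. An item-based recommender would then suggest to each user the items most similar to those already purchased.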
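
The K-means example on slide 15 stops once the model is trained. A possible next step, sketched below, is to inspect the returned model: list the cluster centers, compute the within-cluster sum of squared errors, and assign a new point to a cluster. The sketch assumes the spark-mllib dependency is on the classpath and runs Spark in local mode. A handful of in-memory rows stands in for iris.csv, and the object and method names are made up for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansInspect {
  // Illustrative stand-in for iris.csv: four numeric features per row.
  val sampleRows: Seq[String] = Seq(
    "5.1,3.5,1.4,0.2", "4.9,3.0,1.4,0.2",
    "7.0,3.2,4.7,1.4", "6.4,3.2,4.5,1.5",
    "6.3,3.3,6.0,2.5", "5.8,2.7,5.1,1.9"
  )

  // Trains K-means and returns (number of centers, within-cluster SSE, cluster of one new point).
  def summarize(rows: Seq[String]): (Int, Double, Int) = {
    // Local mode is an assumption for this sketch; on a cluster the master would differ.
    val conf = new SparkConf().setAppName("kmeans-inspect").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val parsedData = sc.parallelize(rows)
        .map(l => Vectors.dense(l.split(",", -1).take(4).map(_.toDouble)))
        .cache()
      val clusters = KMeans.train(parsedData, 3, 20)

      // One center per cluster, in the same feature space as the input.
      clusters.clusterCenters.foreach(println)
      // Sum of squared distances from each point to its nearest center.
      val cost = clusters.computeCost(parsedData)
      // Index of the center closest to a previously unseen point.
      val label = clusters.predict(Vectors.dense(5.0, 3.4, 1.5, 0.2))
      (clusters.clusterCenters.length, cost, label)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (k, cost, label) = summarize(sampleRows)
    println(s"centers: $k, within-cluster SSE: $cost, new point assigned to cluster $label")
  }
}
```

Cluster indices are arbitrary and can change between runs because the initial centers are chosen at random, so only the grouping of points is meaningful, not the index numbers.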