SlideShare a Scribd company logo
1 of 13
Download to read offline
Developing Machine Learning Algorithms on HPCC/ECL Platform 
Industry/University Cooperative Research 
LexisNexis/Florida Atlantic University 
Victor Herrera – FAU – Fall 2014
Developing ML Algorithms On HPCC/ECL Platform 
LexisNexis/Florida Atlantic University Cooperative Research 
HPCC ECL ML Library 
High Level 
Data Centric Declarative Language 
Scalability Dictionary Approach 
Open Source 
LexisNexis HPCC Platform 
Big Data Management 
Optimized Code 
Parallel Processing 
Machine Learning Algorithms Victor Herrera – FAU – Fall 2014
FAU 
Principal Investigator: Taghi Khoshgoftaar Data Mining and Machine Learning 
Researchers: Victor Herrera Maryam Najafabadi 
LexisNexis 
Flavio Villanustre VP Technology & Product 
Edin Muharemagic Data Scientist and Architect 
Project Team Victor Herrera – FAU – Fall 2014
Victor Herrera – FAU – Fall 2014 
Machine Learning Algorithm 
Classification Model 
Training Data 
Constructor 
Businessman 
Doctor 
Learning 
Classification 
Constructor 
Project Scope 
• 
Supervised Learning for Classification 
• 
Big Data 
• 
Parallel Computing 
Implement Machine Learning Algorithms in HPCC ECL-ML library, focusing on:
ECL-ML Learning Library 
ML 
Supervised Learning 
Classifiers 
•Naïve Bayes 
•Perceptron 
•Logistic Regression 
Regression 
Linear Regression 
Unsupervised Learning 
Clustering 
KMeans 
Association Analysis 
•APrioriN 
•EclatN 
Project Starting Point April 2012 
Victor Herrera – FAU – Fall 2014
Community 
Machine Learning Documentation 
Review Code 
• 
Comment 
• 
Merge 
• 
Commit 
• 
Pull Request 
Process comments 
ML-ECL Library Contributor 
ML-ECL Library 
Administrator 
• 
Use 
• 
Contribute 
ECL – ML Library 
HPCC-ECL Training 
HPCC-ECL Documentation Forums 
ECL-ML Open Source Blueprint Victor Herrera – FAU – Fall 2014
Victor Herrera – FAU – Fall 2014 
Algorithm Complexity 
Naïve Bayes 
Decision Trees 
•KNN 
•Random Forest 
Language Knowledge 
•Transformation 
•Aggregations 
Data Distribution 
Parallel Optimization 
Implementation Complexity 
•Understand Existing Code 
•Fix Existing Modules 
Add Functionality 
Design Modules 
Data 
Categorical (Discrete) 
Continuous 
Mixed (Future Work) 
Experience Progress
Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
// Learning Phase 
minNumObj:= 2; maxLevel := 5; // DecTree Parameters 
learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); 
tmodel:= learner1.LearnC(indepData, depData); // DecTree Model 
// Classification Phase 
results1:= learner1.ClassifyC(indepData, tmodel); 
// Comparing to Original 
compare1:= Classify.Compare(depData, results1); 
OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results 
// Learning Phase 
treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters 
learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); 
rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model 
// Classification Phase 
results2:= trainer2.ClassifyC(indepData, rfmodel); 
// Comparing to Original 
compare2:= Classify.Compare(depData, results2); 
OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results 
A40A4A4112A3A412<= 0.6> 0.6<= 4.7> 4.7<= 1.5> 1.5<= 1.8> 1.8<= 1.7> 1.7 
Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
// Learning Phase 
minNumObj:= 2; maxLevel := 5; // DecTree Parameters 
learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); 
tmodel:= learner1.LearnC(indepData, depData); // DecTree Model 
// Classification Phase 
results1:= learner1.ClassifyC(indepData, tmodel); 
// Comparing to Original 
compare1:= Classify.Compare(depData, results1); 
OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results 
// Learning Phase 
treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters 
learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); 
rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model 
// Classification Phase 
results2:= trainer2.ClassifyC(indepData, rfmodel); 
// Comparing to Original 
compare2:= Classify.Compare(depData, results2); 
OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results 
Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
Victor Herrera – FAU – Fall 2014 
Supervised Learning 
Naïve Bayes 
Maximum Likelihood 
Probabilistic Model 
Decision Tree 
Top-Down Induction 
Recursive Data Partitioning 
Decision Tree Model 
Case Based 
KD-Tree Partitioning 
Nearest Neighbor Search 
No Model – Lazy KNN Algorithm 
Ensemble 
Data Sampling Tree Bagging 
F. Sampling Rdn Subspace 
Ensemble Tree Model 
Classification 
Class Probability 
Missing Values Classification 
Area Under ROC 
Project Contribution
• 
Development of Machine Learning Algorithms in ECL Language, focused on the implementation of Classification Algorithms, Big Data and Parallel Processing. 
• 
Added functionality to Naïve Bayes Classifier and implemented Decision Trees, KNN and Random Forest Classifiers into ECL-ML library. 
• 
Currently working on Big Data Learning & Evaluation : 
– 
Experiment design and implementation of a Big Data CASE Study 
– 
Analysis of Results. 
– 
Classifiers enhancements based upon analysis of results. Victor Herrera – FAU – Fall 2014 
Overall Summary
Thank you. Victor Herrera – FAU – Fall 2014

More Related Content

What's hot

Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositoriesfeiwin
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GANDai-Hai Nguyen
 
Mit102 data & file structures
Mit102  data & file structuresMit102  data & file structures
Mit102 data & file structuressmumbahelp
 
Ml programming with python
Ml programming with pythonMl programming with python
Ml programming with pythonKumud Arora
 
Intellectual technologies
Intellectual technologiesIntellectual technologies
Intellectual technologiesPolad Saruxanov
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorizationmidi
 
AjayBhullar_Resume (5)
AjayBhullar_Resume (5)AjayBhullar_Resume (5)
AjayBhullar_Resume (5)Ajay Bhullar
 
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaTensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaEdureka!
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNINGMLReview
 
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataDedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataVrije Universiteit Amsterdam
 
ModelDR - the tool that untangles complex information
ModelDR - the tool that untangles complex informationModelDR - the tool that untangles complex information
ModelDR - the tool that untangles complex informationSimon Roberts
 
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...Florian Mai
 

What's hot (15)

Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositories
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GAN
 
Mit102 data & file structures
Mit102  data & file structuresMit102  data & file structures
Mit102 data & file structures
 
Ml programming with python
Ml programming with pythonMl programming with python
Ml programming with python
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
Intellectual technologies
Intellectual technologiesIntellectual technologies
Intellectual technologies
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
AjayBhullar_Resume (5)
AjayBhullar_Resume (5)AjayBhullar_Resume (5)
AjayBhullar_Resume (5)
 
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaTensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataDedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
 
ModelDR - the tool that untangles complex information
ModelDR - the tool that untangles complex informationModelDR - the tool that untangles complex information
ModelDR - the tool that untangles complex information
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
 

Viewers also liked

20130313 het abc van sociale media ename
20130313 het abc van sociale media ename20130313 het abc van sociale media ename
20130313 het abc van sociale media enamekwb_eensgezind
 
20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwelskwb_eensgezind
 
INSPIRED Magazine Vol 02 Issue 02
INSPIRED Magazine Vol 02 Issue 02INSPIRED Magazine Vol 02 Issue 02
INSPIRED Magazine Vol 02 Issue 02Amy Bensema
 
jkintl profile
jkintl profilejkintl profile
jkintl profileJaya Kumar
 
2015 HPCC Systems Engineering Summit Community Day Wrap-up
2015 HPCC Systems Engineering Summit Community Day Wrap-up2015 HPCC Systems Engineering Summit Community Day Wrap-up
2015 HPCC Systems Engineering Summit Community Day Wrap-upHPCC Systems
 
Presentación defensa proyecto
Presentación defensa proyecto Presentación defensa proyecto
Presentación defensa proyecto José Kent
 
LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013LEVICK
 
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...HPCC Systems
 
What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17Eyal Doron
 
Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Carlos Duarte
 
第一次 概念發想
第一次 概念發想第一次 概念發想
第一次 概念發想alan03265
 
LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013LEVICK
 
The Laws of the Christian Life
The Laws of the Christian LifeThe Laws of the Christian Life
The Laws of the Christian LifeVictorias Church
 
Leveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRELeveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIREsmespire
 

Viewers also liked (20)

 
Manual
ManualManual
Manual
 
20130313 het abc van sociale media ename
20130313 het abc van sociale media ename20130313 het abc van sociale media ename
20130313 het abc van sociale media ename
 
20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels
 
INSPIRED Magazine Vol 02 Issue 02
INSPIRED Magazine Vol 02 Issue 02INSPIRED Magazine Vol 02 Issue 02
INSPIRED Magazine Vol 02 Issue 02
 
Daily routine
Daily routineDaily routine
Daily routine
 
jkintl profile
jkintl profilejkintl profile
jkintl profile
 
2015 HPCC Systems Engineering Summit Community Day Wrap-up
2015 HPCC Systems Engineering Summit Community Day Wrap-up2015 HPCC Systems Engineering Summit Community Day Wrap-up
2015 HPCC Systems Engineering Summit Community Day Wrap-up
 
Presentación defensa proyecto
Presentación defensa proyecto Presentación defensa proyecto
Presentación defensa proyecto
 
LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013
 
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
 
What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17
 
Starfile0001
Starfile0001Starfile0001
Starfile0001
 
Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency
 
第一次 概念發想
第一次 概念發想第一次 概念發想
第一次 概念發想
 
LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013
 
The Laws of the Christian Life
The Laws of the Christian LifeThe Laws of the Christian Life
The Laws of the Christian Life
 
Linea del tiempo
Linea del tiempoLinea del tiempo
Linea del tiempo
 
Training program
Training programTraining program
Training program
 
Leveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRELeveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRE
 

Similar to HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceJim Dowling
 
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGThamme Gowda
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATANexgen Technology
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...Databricks
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsMaxime Lefrançois
 
Random forest using apache mahout
Random forest using apache mahoutRandom forest using apache mahout
Random forest using apache mahoutGaurav Kasliwal
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...MLconf
 
Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark Yashar
 
Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016Mark Yashar
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Enhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyEnhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyChunhua Liao
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Sujit Pal
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningKai Koenig
 
Basic ideas on keras framework
Basic ideas on keras frameworkBasic ideas on keras framework
Basic ideas on keras frameworkAlison Marczewski
 

Similar to HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2 (20)

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in Finance
 
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developments
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Random forest using apache mahout
Random forest using apache mahoutRandom forest using apache mahout
Random forest using apache mahout
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017
 
Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Enhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyEnhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through Ontology
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Basic ideas on keras framework
Basic ideas on keras frameworkBasic ideas on keras framework
Basic ideas on keras framework
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 

More from HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsHPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingHPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle ChangesHPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index HPCC Systems
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningHPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis ToolHPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2

  • 1. Developing Machine Learning Algorithms on HPCC/ECL Platform Industry/University Cooperative Research LexisNexis/Florida Atlantic University Victor Herrera – FAU – Fall 2014
  • 2. Developing ML Algorithms On HPCC/ECL Platform LexisNexis/Florida Atlantic University Cooperative Research HPCC ECL ML Library High Level Data Centric Declarative Language Scalability Dictionary Approach Open Source LexisNexis HPCC Platform Big Data Management Optimized Code Parallel Processing Machine Learning Algorithms Victor Herrera – FAU – Fall 2014
  • 3. FAU Principal Investigator: Taghi Khoshgoftaar Data Mining and Machine Learning Researchers: Victor Herrera Maryam Najafabadi LexisNexis Flavio Villanustre VP Technology & Product Edin Muharemagic Data Scientist and Architect Project Team Victor Herrera – FAU – Fall 2014
  • 4. Victor Herrera – FAU – Fall 2014 Machine Learning Algorithm Classification Model Training Data Constructor Businessman Doctor Learning Classification Constructor Project Scope • Supervised Learning for Classification • Big Data • Parallel Computing Implement Machine Learning Algorithms in HPCC ECL-ML library, focusing on:
  • 5. ECL-ML Learning Library ML Supervised Learning Classifiers •Naïve Bayes •Perceptron •Logistic Regression Regression Linear Regression Unsupervised Learning Clustering KMeans Association Analysis •APrioriN •EclatN Project Starting Point April 2012 Victor Herrera – FAU – Fall 2014
  • 6. Community Machine Learning Documentation Review Code • Comment • Merge • Commit • Pull Request Process comments ML-ECL Library Contributor ML-ECL Library Administrator • Use • Contribute ECL – ML Library HPCC-ECL Training HPCC-ECL Documentation Forums ECL-ML Open Source Blueprint Victor Herrera – FAU – Fall 2014
  • 7. Victor Herrera – FAU – Fall 2014 Algorithm Complexity Naïve Bayes Decision Trees •KNN •Random Forest Language Knowledge •Transformation •Aggregations Data Distribution Parallel Optimization Implementation Complexity •Understand Existing Code •Fix Existing Modules Add Functionality Design Modules Data Categorical (Discrete) Continuous Mixed (Future Work) Experience Progress
  • 8. Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
  • 9. // Learning Phase minNumObj:= 2; maxLevel := 5; // DecTree Parameters learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); tmodel:= learner1.LearnC(indepData, depData); // DecTree Model // Classification Phase results1:= learner1.ClassifyC(indepData, tmodel); // Comparing to Original compare1:= Classify.Compare(depData, results1); OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results // Learning Phase treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model // Classification Phase results2:= trainer2.ClassifyC(indepData, rfmodel); // Comparing to Original compare2:= Classify.Compare(depData, results2); OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results A40A4A4112A3A412<= 0.6> 0.6<= 4.7> 4.7<= 1.5> 1.5<= 1.8> 1.8<= 1.7> 1.7 Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
  • 10. // Learning Phase minNumObj:= 2; maxLevel := 5; // DecTree Parameters learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); tmodel:= learner1.LearnC(indepData, depData); // DecTree Model // Classification Phase results1:= learner1.ClassifyC(indepData, tmodel); // Comparing to Original compare1:= Classify.Compare(depData, results1); OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results // Learning Phase treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model // Classification Phase results2:= trainer2.ClassifyC(indepData, rfmodel); // Comparing to Original compare2:= Classify.Compare(depData, results2); OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
  • 11. Victor Herrera – FAU – Fall 2014 Supervised Learning Naïve Bayes Maximum Likelihood Probabilistic Model Decision Tree Top-Down Induction Recursive Data Partitioning Decision Tree Model Case Based KD-Tree Partitioning Nearest Neighbor Search No Model – Lazy KNN Algorithm Ensemble Data Sampling Tree Bagging F. Sampling Rdn Subspace Ensemble Tree Model Classification Class Probability Missing Values Classification Area Under ROC Project Contribution
  • 12. • Development of Machine Learning Algorithms in ECL Language, focused on the implementation of Classification Algorithms, Big Data and Parallel Processing. • Added functionality to Naïve Bayes Classifier and implemented Decision Trees, KNN and Random Forest Classifiers into ECL-ML library. • Currently working on Big Data Learning & Evaluation : – Experiment design and implementation of a Big Data CASE Study – Analysis of Results. – Classifiers enhancements based upon analysis of results. Victor Herrera – FAU – Fall 2014 Overall Summary
  • 13. Thank you. Victor Herrera – FAU – Fall 2014