HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2

Presenter: Victor Herrera, PhD Candidate and Research Assistant, Florida Atlantic University

NOTE: This is the 2nd presentation in this session and also the 2nd presentation shown in the accompanying YouTube video


  1. Developing Machine Learning Algorithms on HPCC/ECL Platform. Industry/University Cooperative Research, LexisNexis/Florida Atlantic University. Victor Herrera – FAU – Fall 2014
  2. Developing ML Algorithms on HPCC/ECL Platform – LexisNexis/Florida Atlantic University Cooperative Research. LexisNexis HPCC Platform: Big Data management, optimized code, parallel processing. ECL: high-level, data-centric, declarative language; scalability; dictionary approach; open source. ML Library: machine learning algorithms. Victor Herrera – FAU – Fall 2014
  3. Project Team. FAU – Principal Investigator: Taghi Khoshgoftaar (Data Mining and Machine Learning); Researchers: Victor Herrera, Maryam Najafabadi. LexisNexis – Flavio Villanustre, VP Technology & Product; Edin Muharemagic, Data Scientist and Architect. Victor Herrera – FAU – Fall 2014
  4. Project Scope. Implement machine learning algorithms in the HPCC ECL-ML library, focusing on: • Supervised Learning for Classification • Big Data • Parallel Computing. [Diagram: Training Data → Learning (Constructor) → Classification Model → Classification, e.g. Businessman vs. Doctor.] Victor Herrera – FAU – Fall 2014
  5. Project Starting Point (April 2012) – ECL-ML Learning Library. Supervised Learning – Classifiers: Naïve Bayes, Perceptron, Logistic Regression; Regression: Linear Regression. Unsupervised Learning – Clustering: KMeans; Association Analysis: APrioriN, EclatN. (A usage sketch follows below.) Victor Herrera – FAU – Fall 2014
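     A minimal sketch of how a classifier from this starting-point library is typically invoked; it assumes the NaiveBayes module follows the same learn/classify pattern (here LearnD/ClassifyD for discrete data) as the LearnC/ClassifyC calls shown on slide 9, and the input dataset names are hypothetical.
     IMPORT ML;
     // Hypothetical inputs, assumed already converted to ECL-ML field format:
     //   indepTrain : DATASET(ML.Types.DiscreteField) - discretized features
     //   depTrain   : DATASET(ML.Types.DiscreteField) - class labels
     trainer := ML.Classify.NaiveBayes;                       // choose a classifier module
     nbModel := trainer.LearnD(indepTrain, depTrain);         // learning phase -> model
     predictions := trainer.ClassifyD(indepTrain, nbModel);   // classification phase
     OUTPUT(ML.Classify.Compare(depTrain, predictions).CrossAssignments, ALL);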
  6. ECL-ML Open Source Blueprint. Community: uses and contributes to the ECL-ML library, supported by HPCC-ECL training, HPCC-ECL documentation, forums and machine learning documentation. Workflow between ML-ECL Library Contributor and ML-ECL Library Administrator: review code, comment, merge, commit, pull request, process comments. Victor Herrera – FAU – Fall 2014
  7. Progress and Experience. Algorithm complexity: Naïve Bayes → Decision Trees → KNN → Random Forest. Language knowledge: transformations, aggregations, data distribution, parallel optimization (sketch below). Implementation complexity: understand existing code → fix existing modules → add functionality → design modules. Data: categorical (discrete) → continuous → mixed (future work). Victor Herrera – FAU – Fall 2014
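     The "language knowledge" items above refer to core ECL constructs; below is a minimal, self-contained sketch (record layout and values invented purely for illustration) of the kind of transformation, aggregation and data distribution this implementation work relies on.
     rec := RECORD
       UNSIGNED id;
       REAL8 value;
     END;
     ds := DATASET([{1, 2.0}, {2, 3.5}, {1, 1.5}, {2, 0.5}], rec);
     // Data distribution: spread records across Thor nodes by key so later
     // per-key work can run locally on each node.
     distDs := DISTRIBUTE(ds, HASH32(id));
     // Transformation: derive a new value per record.
     scaled := PROJECT(distDs, TRANSFORM(rec, SELF.value := LEFT.value * 2, SELF := LEFT));
     // Aggregation: per-key sum, computed locally after the DISTRIBUTE.
     sums := TABLE(scaled, {id, REAL8 total := SUM(GROUP, value)}, id, LOCAL);
     OUTPUT(sums);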
  8. Example: Classifying Iris-Setosa – Decision Tree vs. Random Forest (data-preparation sketch below). Victor Herrera – FAU – Fall 2014
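     The code on slide 9 assumes indepData and depData already exist in ECL-ML field format. One plausible preparation sketch is shown here; the logical file name, record layout and field numbering are assumptions for illustration, not details taken from the slides, and it assumes the library's ML.ToField macro and ML.Discretize.ByRounding behave as in the ECL-ML repository.
     IMPORT ML;
     irisRec := RECORD
       UNSIGNED id;
       REAL8 sepal_length;
       REAL8 sepal_width;
       REAL8 petal_length;
       REAL8 petal_width;
       UNSIGNED1 species;                  // e.g. 1 = Iris-Setosa, 0 = other
     END;
     iris := DATASET('~thor::iris', irisRec, THOR);            // hypothetical logical file
     ML.ToField(iris, irisNF);             // convert to ML.Types.NumericField (id/number/value)
     indepData := irisNF(number <= 4);     // the four measurements
     depData := ML.Discretize.ByRounding(irisNF(number = 5));  // class label as DiscreteField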
  9. Example: Classifying Iris-Setosa – Decision Tree vs. Random Forest. Victor Herrera – FAU – Fall 2014
     // Learning Phase
     minNumObj := 2; maxLevel := 5; // DecTree parameters
     learner1 := ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel);
     tmodel := learner1.LearnC(indepData, depData); // DecTree model
     // Classification Phase
     results1 := learner1.ClassifyC(indepData, tmodel);
     // Comparing to Original
     compare1 := ML.Classify.Compare(depData, results1);
     OUTPUT(compare1.CrossAssignments, ALL); // CrossAssignment results
     // Learning Phase
     treeNum := 25; fsNum := 3; Pure := 1.0; // RndForest parameters (maxLevel reused from above)
     learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel);
     rfmodel := learner2.LearnC(indepData, depData); // RndForest model
     // Classification Phase
     results2 := learner2.ClassifyC(indepData, rfmodel);
     // Comparing to Original
     compare2 := ML.Classify.Compare(depData, results2);
     OUTPUT(compare2.CrossAssignments, ALL); // CrossAssignment results
     [Decision tree diagram with splits at thresholds 0.6, 4.7, 1.5, 1.8 and 1.7]
  10. Example: Classifying Iris-Setosa – Decision Tree vs. Random Forest (repeats the ECL code from slide 9). Victor Herrera – FAU – Fall 2014
  11. Project Contribution – Supervised Learning. Naïve Bayes: maximum likelihood, probabilistic model. Decision Tree: top-down induction, recursive data partitioning, decision tree model. KNN Algorithm: case based, KD-Tree partitioning, nearest neighbor search, no model (lazy). Ensemble: data sampling (tree bagging), feature sampling (random subspace), ensemble tree model. Also: classification, class probability, missing values, area under ROC. Victor Herrera – FAU – Fall 2014
  12. Overall Summary. • Development of machine learning algorithms in the ECL language, focused on the implementation of classification algorithms, Big Data and parallel processing. • Added functionality to the Naïve Bayes classifier and implemented Decision Tree, KNN and Random Forest classifiers in the ECL-ML library. • Currently working on Big Data learning & evaluation: – experiment design and implementation of a Big Data case study; – analysis of results; – classifier enhancements based upon analysis of results. Victor Herrera – FAU – Fall 2014
  13. Thank you. Victor Herrera – FAU – Fall 2014
