SlideShare a Scribd company logo
1 of 33
Download to read offline
Hue
A closer Look at Hue
What's on the Menu
● Hue Architecture
  ○ Many interfaces to implement
  ○ How do I list HDFS files, how do I submit a job...?
  ○ SDK
● Hue UI: Dynamic Workflow Editor
  ○   Why improve the user experience?
  ○   How can we improve the user experience?
  ○   Design Considerations
  ○   Design and Code Deep Dive
View from 30 000 feet
Ecosystem
Integrate with the Web
● HTTP, stateless (async queries)
● Frontend / Backend (e.g. different servers,
  pagination)
● Resources (e.g. img, js, callbacks, css, json)
● Browsers, multi techs
● DB (sqlite, MySql, PostGres...)
● i18n
● ...

More on UI later
Integrate Users
● Auth
  ○   Standard
  ○   LDAP
  ○   PAM
  ○   Spnego
  ○   Custom
      (OAuth, Cookie...)
Integrate HDFS
● Interfaces
     ○ Thrift (old)
       ■ NN
     ○ REST
       ■ WebHdfs
       ■ HttpFs (HA, new bugs)


● Uploads to HDFS
class HDFStemporaryUploadedFile(object):
class HDFSfileUploadHandler(FileUploadHandler):
Integrate Hive
   ● Beeswax: embedded Hive CLI
   ● Concurrent executions

   ● Beeswax / Hive Server 2 Thrift interfaces
   ● Hue models, HQL, Impala, DDL
service BeeswaxService {
                                                     service TCLIService {
 QueryHandle query(1:Query query) throws(1:
BeeswaxException error),                              TExecuteStatementResp ExecuteStatement(1:
                                                     TExecuteStatementReq req);
  QueryHandle executeAndWait(1:Query query, 2:
LogContextId clientCtx)                                TGetOperationStatusResp GetOperationStatus
                 throws(1:BeeswaxException error),   (1:TGetOperationStatusReq req);
....                                                 ....
Integrate Hive
Moving to Pluggable interfaces
                                       DBMS
                                      SQL API




                                 Beeswax        HS2




                                       Table




                                  BTable       HS2Table
Integrate Impala

● New app
● Same Beeswax/Hive Server 2 interfaces
● One more moving target..
Integrate Jobs
    ● List, access, kill
    ● aka JobBrowser

    ● JobTracker Thrift Plugin
mapred-site.xml
                                                       More Thrift
<property>
 <name>jobtracker.thrift.address</name>                service Jobtracker extends common.
 <value>0.0.0.0:9290</value>                           HadoopServiceBase {
</property>                                             ThriftJobInProgress getJob(10: common.
<property>                                             RequestContext ctx, 1: ThriftJobID jobID)
 <name>mapred.jobtracker.plugins</name>                      throws(1: JobNotFoundException err),
 <value>
   org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin    ThriftJobList getRunningJobs(10: common.
 </value>                                              RequestContext ctx),
</property>
Integrate Jobs
● Submit jobs (MR, Hive, Java, Pig...)
● Manage workflows
● Schedule workflows


● REST (GET, PUT, POST)
Integrate Shell
● Pig
● HBase
● Sqoop 2

●   Spawning Server
●   Greenlets
●   popen/pty/tty
●   IO (HTTP, DB...)
●   setuid
●   css/js/POST
Integrate YARN
● JobBrowser MR2, Oozie

● No JT, 4 more REST API
● MR to History Server, missing logs...
● MR1/2 API not 100% compatible
  (like Beeswax/HiveServer2, Beeswax
  UI/Impala switches)
Integrate security
● 'hue' superuser                                 ●   One 'hue'
  JT, Shell setuid root:hue                           Kerberos ticket
                                                  ●   Hive Server 2 ?
● 'hue' Proxy User / doAs
  HDFS
  Oozie
      <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
SDK: Integrate Developers
● Set of raw libs         ● Hue models

libs                      apps/
    /hadoop                   /jobbrowser
            /jobtracker       /oozie
            /webhdfs          /...
            /yarn
    /liboozie
    /rest
    /thrift
SDK: Integrate Developers
$ ./build/env/bin/hue create_desktop_app
clouddemo


● Custom: views/model/templates
● Reuse Hue libs

http://cloudera.github.com/hue/docs-2.1.0
/sdk/sdk.html#fast-guide-to-creating-a-new-
hue-application
CloudDemo example
Single click:

●   HTTP
●   HDFS
●   Oozie
●   JT
After the Interfaces...

... now the dynamic UI
    (Oozie App use case)
Why Improve User Experience
● Users like things that are easy to use
● Intuition and ease of use
How to Improve User Experience
● How can we do this for Oozie?
  ○ Hue users are not engineers
  ○ Most users are not familiar with shortcuts and
    command lines
  ○ Windowing systems have taught us drag and drop is
    good




Drag and drop every thing in a Workflow!
Old Hue Windowing System
Fundamentals of Front End Design
● Behavior
  ○ Javascript
  ○ Knockout JS
  ○ JQuery
● Presentation
  ○ CSS
  ○ Bootstrap
● Content
  ○ HTML (Templates)
● MV*
  ○ MVC
  ○ MVP
  ○ MVVM
Design Constraints
● Existing backend from Hue 2.1
  ○ Need to be able to easily migrate from Hue 2.1 to
    Hue 2.2
● Knockout JS and JQuery already chosen
  ○   Rudimentary templating
  ○   Subscription based bindings
  ○   Observables for arrays and Javascript literals only
  ○   Event delegation
● Existing UI from Hue 2.1
  ○ Provides basic node movement through form
    submission (reloads the page)
  ○ Not dynamic
Other Design Considerations
● Serializing should be trivial
● Basic API
   ○ Save a workflow
   ○ Validate a node
   ○ Read a workflow
● Difference in representation between Hue
  2.1 backend and the KnockoutJS way of
  doing things
● New nodes need an ID
Design - High Level Components




● Left out
  ○ Many event bindings and custom events
  ○ Views left out
Purpose of the Node Model
● Provides defaults for data:
var NodeModel = ModelModule($);
$.extend(NodeModel.prototype, {
  id: 0,
  name: '',
  description: '',
  node_type: '',
  workflow: 0,
  child_links: []
});

● Sent over the wire
● Mimics Django models
Model - ModelView Separation
● ModelViews should be the "shield" and
  Models the source of truth.
● Models are more serializable if they do not
  carry extraneous data.
● Subscribed update through KnockoutJS:
$.each(mapping, function(key, value) {
    var key = key;
    if (ko.isObservable(self[key])) {
         self[key].subscribe(function(value) {
             model[key] = ko.mapping.toJS(value);
         });
    }
});
Purpose of the Registry
●   Construction optimization
●   Constant time node lookup
●   Looking towards the future and storage
●   Simple start:
    var self = this;
    self.nodes = {};
    module.prototype.initialize.apply(self, arguments);
    return self;
Purpose of ID Generation
● Unique identifier for new nodes (IE: mapreduce:1).
● Assists in creating parent-child relationships through
   links.
var IdGeneratorModule = function($) {
   return function(options) {
      var self = this;
      $.extend(self, options);
      self.counter = 1;
      self.nextId = function() {
         return ((self.prefix) ? self.prefix + ':' : '') +
self.counter++;
      };
   };
};
Transpose to Show
● KnockoutJS supports 3 kinds of observables
    ○ Observables for literals
    ○ Observable arrays
    ○ Computed Observables
●   DAG received is represented as a tree




● DAG represented as a list of lists when we display...
    MVVM restriction
Other Difficulties
● Decision node representation
● JSON.stringify does not include parent class
  members
● Memory consumption
● Cycles, cycles, cycles
Next steps
● Integrate
  ○   Pig, Hive Server 2
  ○   Oozie Bundles, SLA
  ○   Document model, "Editors", git
  ○   SDK revamp, language agnostic, proxy app
● UX
  ○ Impala real time UI
  ○ Redesign overall layout
● Sqoop 2, HBase? Mahout?...


              Face of Hadoop/CDH

More Related Content

Recently uploaded

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Recently uploaded (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Featured

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Featured (20)

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 

A closer look at hue: how to interface with Hadoop

  • 2. What's on the Menu ● Hue Architecture ○ Many interfaces to implement ○ How do I list HDFS files, how do I submit a job...? ○ SDK ● Hue UI: Dynamic Workflow Editor ○ Why improve the user experience? ○ How can we improve the user experience? ○ Design Considerations ○ Design and Code Deep Dive
  • 3. View from 30 000 feet
  • 5. Integrate with the Web ● HTTP, stateless (async queries) ● Frontend / Backend (e.g. different servers, pagination) ● Resources (e.g. img, js, callbacks, css, json) ● Browsers, multi techs ● DB (sqlite, MySql, PostGres...) ● i18n ● ... More on UI later
  • 6. Integrate Users ● Auth ○ Standard ○ LDAP ○ PAM ○ Spnego ○ Custom (OAuth, Cookie...)
  • 7. Integrate HDFS ● Interfaces ○ Thrift (old) ■ NN ○ REST ■ WebHdfs ■ HttpFs (HA, new bugs) ● Uploads to HDFS class HDFStemporaryUploadedFile(object): class HDFSfileUploadHandler(FileUploadHandler):
  • 8. Integrate Hive ● Beeswax: embedded Hive CLI ● Concurrent executions ● Beeswax / Hive Server 2 Thrift interfaces ● Hue models, HQL, Impala, DDL service BeeswaxService { service TCLIService { QueryHandle query(1:Query query) throws(1: BeeswaxException error), TExecuteStatementResp ExecuteStatement(1: TExecuteStatementReq req); QueryHandle executeAndWait(1:Query query, 2: LogContextId clientCtx) TGetOperationStatusResp GetOperationStatus throws(1:BeeswaxException error), (1:TGetOperationStatusReq req); .... ....
  • 9. Integrate Hive Moving to Pluggable interfaces DBMS SQL API Beeswax HS2 Table BTable HS2Table
  • 10. Integrate Impala ● New app ● Same Beeswax/Hive Server 2 interfaces ● One more moving target..
  • 11. Integrate Jobs ● List, access, kill ● aka JobBrowser ● JobTracker Thrift Plugin mapred-site.xml More Thrift <property> <name>jobtracker.thrift.address</name> service Jobtracker extends common. <value>0.0.0.0:9290</value> HadoopServiceBase { </property> ThriftJobInProgress getJob(10: common. <property> RequestContext ctx, 1: ThriftJobID jobID) <name>mapred.jobtracker.plugins</name> throws(1: JobNotFoundException err), <value> org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin ThriftJobList getRunningJobs(10: common. </value> RequestContext ctx), </property>
  • 12. Integrate Jobs ● Submit jobs (MR, Hive, Java, Pig...) ● Manage workflows ● Schedule workflows ● REST (GET, PUT, POST)
  • 13. Integrate Shell ● Pig ● HBase ● Sqoop 2 ● Spawning Server ● Greenlets ● popen/pty/tty ● IO (HTTP, DB...) ● setuid ● css/js/POST
  • 14. Integrate YARN ● JobBrowser MR2, Oozie ● No JT, 4 more REST API ● MR to History Server, missing logs... ● MR1/2 API not 100% compatible (like Beeswax/HiveServer2, Beeswax UI/Impala switches)
  • 15. Integrate security ● 'hue' superuser ● One 'hue' JT, Shell setuid root:hue Kerberos ticket ● Hive Server 2 ? ● 'hue' Proxy User / doAs HDFS Oozie <property> <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name> <value>*</value> </property> <property> <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name> <value>*</value> </property>
  • 16. SDK: Integrate Developers ● Set of raw libs ● Hue models libs apps/ /hadoop /jobbrowser /jobtracker /oozie /webhdfs /... /yarn /liboozie /rest /thrift
  • 17. SDK: Integrate Developers $ ./build/env/bin/hue create_desktop_app clouddemo ● Custom: views/model/templates ● Reuse Hue libs http://cloudera.github.com/hue/docs-2.1.0 /sdk/sdk.html#fast-guide-to-creating-a-new- hue-application
  • 18. CloudDemo example Single click: ● HTTP ● HDFS ● Oozie ● JT
  • 19. After the Interfaces... ... now the dynamic UI (Oozie App use case)
  • 20. Why Improve User Experience ● Users like things that are easy to use ● Intuition and ease of use
  • 21. How to Improve User Experience ● How can we do this for Oozie? ○ Hue users are not engineers ○ Most users are not familiar with shortcuts and command lines ○ Windowing systems have taught us drag and drop is good Drag and drop every thing in a Workflow!
  • 23. Fundamentals of Front End Design ● Behavior ○ Javascript ○ Knockout JS ○ JQuery ● Presentation ○ CSS ○ Bootstrap ● Content ○ HTML (Templates) ● MV* ○ MVC ○ MVP ○ MVVM
  • 24. Design Constraints ● Existing backend from Hue 2.1 ○ Need to be able to easily migrate from Hue 2.1 to Hue 2.2 ● Knockout JS and JQuery already chosen ○ Rudimentary templating ○ Subscription based bindings ○ Observables for arrays and Javascript literals only ○ Event delegation ● Existing UI from Hue 2.1 ○ Provides basic node movement through form submission (reloads the page) ○ Not dynamic
  • 25. Other Design Considerations ● Serializing should be trivial ● Basic API ○ Save a workflow ○ Validate a node ○ Read a workflow ● Difference in representation between Hue 2.1 backend and the KnockoutJS way of doing things ● New nodes need an ID
  • 26. Design - High Level Components ● Left out ○ Many event bindings and custom events ○ Views left out
  • 27. Purpose of the Node Model ● Provides defaults for data: var NodeModel = ModelModule($); $.extend(NodeModel.prototype, { id: 0, name: '', description: '', node_type: '', workflow: 0, child_links: [] }); ● Sent over the wire ● Mimics Django models
  • 28. Model - ModelView Separation ● ModelViews should be the "shield" and Models the source of truth. ● Models are more serializable if they do not carry extraneous data. ● Subscribed update through KnockoutJS: $.each(mapping, function(key, value) { var key = key; if (ko.isObservable(self[key])) { self[key].subscribe(function(value) { model[key] = ko.mapping.toJS(value); }); } });
  • 29. Purpose of the Registry ● Construction optimization ● Constant time node lookup ● Looking towards the future and storage ● Simple start: var self = this; self.nodes = {}; module.prototype.initialize.apply(self, arguments); return self;
  • 30. Purpose of ID Generation ● Unique identifier for new nodes (IE: mapreduce:1). ● Assists in creating parent-child relationships through links. var IdGeneratorModule = function($) { return function(options) { var self = this; $.extend(self, options); self.counter = 1; self.nextId = function() { return ((self.prefix) ? self.prefix + ':' : '') + self.counter++; }; }; };
  • 31. Transpose to Show ● KnockoutJS supports 3 kinds of observables ○ Observables for literals ○ Observable arrays ○ Computed Observables ● DAG received is represented as a tree ● DAG represented as a list of lists when we display... MVVM restriction
  • 32. Other Difficulties ● Decision node representation ● JSON.stringify does not include parent class members ● Memory consumption ● Cycles, cycles, cycles
  • 33. Next steps ● Integrate ○ Pig, Hive Server 2 ○ Oozie Bundles, SLA ○ Document model, "Editors", git ○ SDK revamp, language agnostic, proxy app ● UX ○ Impala real time UI ○ Redesign overall layout ● Sqoop 2, HBase? Mahout?... Face of Hadoop/CDH