Uber's data science workbench

https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56711

Published in: Engineering

  1. Uber’s Data Science Workbench
     Randy Wei, Peng Du
  2. Mission
     Unleash the productivity of the Data Science community at Uber by providing scalable infrastructure, tools, customization and support.
  3. Tools of the Trade: Jupyter Notebooks
     ● Alternative to traditional CLIs
     ● Interactive tool which combines:
       ○ Prose (HTML, Markdown)
       ○ Code (Py, R, Scala)
       ○ Visualization (charts, maps, tables)
     ● Shareable artifact of knowledge
     ● Hosted webapp
     ● Notebook, Notes, Cells
     ● Each cell is an executable unit of code (one or more lines)
     ● Used for:
       ○ Data exploration, Cleansing, Modeling
       ○ Dashboarding/reporting
     [Figure: notebook screenshot with panels labeled HTML, Code, Output]
  4. Tools of the Trade: RStudio Server
     ● Browser interface to a remote R server
     ● Centrally manage compute infrastructure
     ● IDE for R:
       ○ Syntax highlighting, code completion
       ○ Debugging
       ○ Charts
       ○ File Browser
     ● RStudio also has Notebook functionality
     ● R has a huge library repository
     ● Used mostly for rapid prototyping of models on small datasets (UbeR)
     [Figure: RStudio screenshot with panels labeled Data, Code, Output]
  5. Tools of the Trade: Apache Spark
     SparkR
     ● Distributed statistical computing framework
     ● Run R code without translating it to Java
     ● Choice of the Intelligent Decision, Insurance, etc. teams
     PySpark
     ● Distributed machine learning framework
     ● Easy to integrate with scientific Python libraries
     ● Choice of the Fraud Detection, Sensing and Perception, etc. teams
  6. Requirements
     Productivity
     ● Py, R, Scala interpreters in Jupyter
     ● Hosted RStudio support
     ● Version Control
     ● Custom libraries/environment
     ● Single-pane lifecycle management
     ● PySpark, SparkR
     Scale
     ● Scalable Jupyter Server infrastructure
     ● Large distributed computation backend
     ● Multitenancy
     ● File Persistence
     ● Security
     Ecosystem Integration
     ● Scheduling: Piper
     ● Dashboards: Shiny
     ● Data Exploration: Query engine API
     ● Deploy: Machine learning platform
     ● Chargeback: Monitoring platform
     Social (Knowledge)
     ● Search
     ● Access Controls
     ● Sharing Controls
     ● Publish
     ● Comments & Discussion
  7. State of the Union
     Problem
     ● Data Scientists (DSs) start at Uber with diverse skillsets and backgrounds
     ● Precious time is wasted on infrastructure setup, version control, search, sharing...
     ● Teams are building their own solutions
     Vision
     ● Web-based hub for all Data Scientists at Uber
     ● Ability to centrally:
       ○ provision tools
       ○ leverage the distributed backend
       ○ search, comment, share
       ○ monitor
     ● Integrated with Uber’s data ecosystem
     ● Dedicated SRE
     Opportunity
     ● Find and reuse knowledge
     ● A dedicated team to advocate for and build the tools needed to make DSs hyper-productive
     ● Cloud experience
     ● Chargeback
  8. Similar offerings...
  9. Architecture
     [Figure: architecture diagram; recoverable labels below]
     ● Manage: Management Service (Create, Delete, Search, Share, Publish, Schedule)
     ● Mesos: Uber Mesos Infra running RStudio (Docker) and Jupyter (Docker) containers, with a Shared File System
     ● Spark: MLlib Workers, PySpark Worker, SparkR Worker, Uber Spark debugging toolkit, Uber Spark development toolkit
  10. Architecture
     [Figure: architecture diagram; recoverable labels below]
     ● Web GUI
     ● Application Management Service: session / file management, proxy
     ● Mesos Cluster: Docker Containers running RStudio Server (RStudio) and Jupyter Server (Jupyter notebooks NB1, NB2)
     ● Hadoop Cluster (Hive, Presto, Spark): Distributed Processing
  11. Uber Ecosystem
     [Figure: ecosystem diagram; recoverable labels below]
     ● Data Science Workbench
     ● Uber ML platform
     ● Palette
     ● Hive, Cassandra, HDFS
     ● Spark: Spark SDK, Spark Debug tool, Spark templates
     ● Query Runner
     ● PySpark for ML
     ● Data Visualization
     ● Models
     ● Production
  12. Workflow Demo
  13. Q&A
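
The Jupyter slide describes a notebook as a sequence of cells that mix prose, code, and output. As a rough illustration of that structure, the sketch below builds a simplified notebook document in the style of the Jupyter nbformat 4 JSON layout using only the standard library. The helper name `make_notebook` and the sample cell contents are made up for this example, and real notebooks carry extra metadata (kernel information, for instance) that is omitted here.

```python
import json


def make_notebook(cells):
    """Build a simplified nbformat-4 style notebook from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution counter and a list of outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical content: one prose cell followed by one code cell.
notebook = make_notebook([
    ("markdown", "# Trip duration exploration"),
    ("code", "durations = [12, 7, 31]\nsum(durations) / len(durations)"),
])

# A notebook file (.ipynb) is this structure serialized as JSON.
serialized = json.dumps(notebook, indent=1)
```

Because the notebook is plain JSON, it can be stored on a shared file system, versioned, and searched like any other text artifact, which is what makes it a shareable unit of knowledge.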

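The architecture slides name a management service that can create, delete, search, and share sessions, and that proxies requests to RStudio or Jupyter containers. The deck does not show how this is implemented, so the following is only a toy in-memory sketch of the idea. Every name in it (`Session`, `ManagementService`, `route`, the backend addresses) is hypothetical and not taken from Uber's system.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Session:
    session_id: int
    owner: str
    tool: str      # "jupyter" or "rstudio"
    backend: str   # hypothetical host:port of the container serving this session
    shared_with: Set[str] = field(default_factory=set)

    def visible_to(self, user: str) -> bool:
        return user == self.owner or user in self.shared_with


class ManagementService:
    """Toy registry: tracks sessions and decides where a proxy should send a request."""

    def __init__(self) -> None:
        self._sessions: Dict[int, Session] = {}
        self._ids = itertools.count(1)

    def create(self, owner: str, tool: str, backend: str) -> Session:
        session = Session(next(self._ids), owner, tool, backend)
        self._sessions[session.session_id] = session
        return session

    def delete(self, session_id: int) -> None:
        self._sessions.pop(session_id, None)

    def share(self, session_id: int, user: str) -> None:
        self._sessions[session_id].shared_with.add(user)

    def search(self, user: str) -> List[Session]:
        """Return every session the user owns or has been given access to."""
        return [s for s in self._sessions.values() if s.visible_to(user)]

    def route(self, session_id: int, user: str) -> Optional[str]:
        """Return the backend to proxy to, or None if the user has no access."""
        session = self._sessions.get(session_id)
        if session is None or not session.visible_to(user):
            return None
        return session.backend


# Usage with made-up users and addresses.
service = ManagementService()
nb = service.create("alice", "jupyter", "10.0.0.5:8888")
service.share(nb.session_id, "bob")
```

Keeping the access check inside `route` means the proxy never forwards a request to a container unless the registry confirms the user may see that session, which is one simple way to combine multitenancy with sharing controls.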