SlideShare a Scribd company logo
1 of 12
Download to read offline
DESIGN FOR X
exploring data science product design with apache spark + graphlab {create}
@amcasari @Concur
data science summit 2016, san francisco
nasa
data science via random walks
senior product mgr +
data scientist
@ Concur Labs
control systems
engineering +
robotics + legos
officer in USN
operations research
analyst
wandering dirtbag +
conservation volunteer
EE +
applied math
+ complex systems
underwater robotics
engineer
technology
consultant
SAHM
INSANELY QUICK INTRO TO +
➤ Concur Accelerator Team
➤ Concur Labs
➤ Incubator (still brewing)
850K
Users log into Concur
300K
Expense reports
processed
120K
Trips booked
170M
Trips & expense
reports warehoused
Typical Day at Concur
How do we encourage a culture of innovation
while delivering quality service to our existing
33,000 business clients and 40M users?
DESIGN SPRINTS FOR DATA SCIENCEY PROTOTYPES
courtesy google ventures {we iterated…because data}
INSANELY QUICK INTRO TO
➤ “fast and general engine for large-scale data processing”
➤ advanced cyclic data flow and in-memory computing > runs
10x-100x faster than Hadoop MR
➤ interactive shells in several languages (incl. SQL)
➤ performant + scalable
courtesy databricks
ALMOST AS INSANELY QUICK INTRO TO +
➤ graphlab create is based on a python data science library
developed + (some) os’d by turi
➤ SFrame <<>> Spark DataFrame | SparkRDD
➤ (yes it works with Open Source SFrame and GLC)
courtesy turi
WHAT PROBLEM DO WE WANT TO DATA SCIENCE?
Knowledge
Gaps
IOT
Networks
Bots
Fairness
+
➤ “We could {build this} {answer this better} if….”
➤ Reciprocal Data Applications
DESIGN FOR KNOWLEDGE GAPS
rda rdarda
choose
your data
storage
choose
your data
storage
choose
your data
storage
the app you
really
want to make
➤ “Can we trust our sensors?”
➤ “Has our network been hacked?”
DESIGN FOR IOT NETWORKS
device
device
device
alerts,
notifications,
monitoring
dashboards
data
services
Anomaly Detection Toolkit
TimeSeries <<>> SFrame
➤ “How do we create a conversational interface?”
….nothing new, just the burning question since Turing, 1950
DESIGN FOR BOTS
what NOT to do….
non-creepy
unisex
animal mascot
conversational
ui
choose
or
create
your
framework
choose your data storage
Advanced Deep Learning
Text Analysis Toolkit
Graph Analytics Toolkit
➤ know your biases + limitations
➤ in your data, their data, all the data
➤ in your feature selection
➤ in your algorithm
…..because ethics (these ALL bias your results + communications)
DESIGN FOR FAIRNESS
learn more at data & society’s case studies
+ +
open source. reproducible. transparent.
{THANKS MUCH}
➤ Concur is hiring!
➤ SAP + SAP Ariba are
hiring!
concurlabs.com
github.com/
concurlabs
➤ example notebooks will
be posted on our
github in the future
@amcasari

More Related Content

What's hot

The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahDatabricks
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Databricks
 
Big data-science-oanyc
Big data-science-oanycBig data-science-oanyc
Big data-science-oanycOpen Analytics
 
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...Databricks
 
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixPutting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixJeff Magnusson
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Big Data Spain
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With PythonSarah Guido
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopMichelle Ufford
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardDemai Ni
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaSpark Summit
 
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)Wes McKinney
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlySarah Guido
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoDatabricks
 
WeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew ReyesWeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew ReyesWeCloudData
 
Making it easy to work with data
Making it easy to work with dataMaking it easy to work with data
Making it easy to work with dataCharles Smith
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaData Science Thailand
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster ServicesAdam Doyle
 

What's hot (20)

The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
 
Big data-science-oanyc
Big data-science-oanycBig data-science-oanyc
Big data-science-oanyc
 
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
 
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixPutting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at Netflix
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for Hadoop
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforward
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
 
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
 
WeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew ReyesWeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew Reyes
 
Making it easy to work with data
Making it easy to work with dataMaking it easy to work with data
Making it easy to work with data
 
OSCON 2015
OSCON 2015OSCON 2015
OSCON 2015
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster Services
 

Viewers also liked

Diapositivas casa de la calidad
Diapositivas casa de la calidadDiapositivas casa de la calidad
Diapositivas casa de la calidadEthetson Pineda
 
Design for 'X' and be prepared for anything
Design for 'X' and be prepared for anythingDesign for 'X' and be prepared for anything
Design for 'X' and be prepared for anythingTeam Consulting Ltd
 
Qfd - Despliegue de la función de Calidad
Qfd - Despliegue de la función de CalidadQfd - Despliegue de la función de Calidad
Qfd - Despliegue de la función de CalidadKelly Cuervo
 
Determinación de características qfd
Determinación de características qfdDeterminación de características qfd
Determinación de características qfdSagui Lab
 
Design for x : Design for Manufacturing,Design for Assembly
Design for x : Design for Manufacturing,Design for Assembly  Design for x : Design for Manufacturing,Design for Assembly
Design for x : Design for Manufacturing,Design for Assembly Naseel Ibnu Azeez
 
Metodologia Casa de la Calidad QFD
Metodologia Casa de la Calidad QFDMetodologia Casa de la Calidad QFD
Metodologia Casa de la Calidad QFDJ Carlos Ordoñez
 
Material Selection and Design Standards
Material Selection and Design Standards Material Selection and Design Standards
Material Selection and Design Standards Naseel Ibnu Azeez
 

Viewers also liked (8)

Diapositivas casa de la calidad
Diapositivas casa de la calidadDiapositivas casa de la calidad
Diapositivas casa de la calidad
 
Design for 'X' and be prepared for anything
Design for 'X' and be prepared for anythingDesign for 'X' and be prepared for anything
Design for 'X' and be prepared for anything
 
Qfd - Despliegue de la función de Calidad
Qfd - Despliegue de la función de CalidadQfd - Despliegue de la función de Calidad
Qfd - Despliegue de la función de Calidad
 
Determinación de características qfd
Determinación de características qfdDeterminación de características qfd
Determinación de características qfd
 
Module 4: Design For X
Module 4: Design For XModule 4: Design For X
Module 4: Design For X
 
Design for x : Design for Manufacturing,Design for Assembly
Design for x : Design for Manufacturing,Design for Assembly  Design for x : Design for Manufacturing,Design for Assembly
Design for x : Design for Manufacturing,Design for Assembly
 
Metodologia Casa de la Calidad QFD
Metodologia Casa de la Calidad QFDMetodologia Casa de la Calidad QFD
Metodologia Casa de la Calidad QFD
 
Material Selection and Design Standards
Material Selection and Design Standards Material Selection and Design Standards
Material Selection and Design Standards
 

Similar to Design for X: Exploring Product Design with Apache Spark and GraphLab

Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Demi Ben-Ari
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...DataWorks Summit/Hadoop Summit
 
Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Sandra Garcia
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Hortonworks
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIAmazon Web Services
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceAlex Danvy
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Mark Tabladillo
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure TechnologiesKoray Kocabas
 
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014Le_GFII
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 

Similar to Design for X: Exploring Product Design with Apache Spark and GraphLab (20)

Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 
Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
 
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 

More from Amanda Casari

When Privacy Scales - Intelligent Product Design under GDPR
When Privacy Scales - Intelligent Product Design under GDPRWhen Privacy Scales - Intelligent Product Design under GDPR
When Privacy Scales - Intelligent Product Design under GDPRAmanda Casari
 
Scaling Data Science Products, Not Data Science Teams
Scaling Data Science Products, Not Data Science TeamsScaling Data Science Products, Not Data Science Teams
Scaling Data Science Products, Not Data Science TeamsAmanda Casari
 
Spark Hearts GraphLab Create
Spark Hearts GraphLab CreateSpark Hearts GraphLab Create
Spark Hearts GraphLab CreateAmanda Casari
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
20160512 apache-spark-for-everyone
20160512 apache-spark-for-everyone20160512 apache-spark-for-everyone
20160512 apache-spark-for-everyoneAmanda Casari
 
Feature Engineering for Machine Learning at QConSP
Feature Engineering for Machine Learning at QConSPFeature Engineering for Machine Learning at QConSP
Feature Engineering for Machine Learning at QConSPAmanda Casari
 
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...Amanda Casari
 
PyLadies Seattle - Lessons in Interactive Visualizations
PyLadies Seattle - Lessons in Interactive VisualizationsPyLadies Seattle - Lessons in Interactive Visualizations
PyLadies Seattle - Lessons in Interactive VisualizationsAmanda Casari
 

More from Amanda Casari (8)

When Privacy Scales - Intelligent Product Design under GDPR
When Privacy Scales - Intelligent Product Design under GDPRWhen Privacy Scales - Intelligent Product Design under GDPR
When Privacy Scales - Intelligent Product Design under GDPR
 
Scaling Data Science Products, Not Data Science Teams
Scaling Data Science Products, Not Data Science TeamsScaling Data Science Products, Not Data Science Teams
Scaling Data Science Products, Not Data Science Teams
 
Spark Hearts GraphLab Create
Spark Hearts GraphLab CreateSpark Hearts GraphLab Create
Spark Hearts GraphLab Create
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
20160512 apache-spark-for-everyone
20160512 apache-spark-for-everyone20160512 apache-spark-for-everyone
20160512 apache-spark-for-everyone
 
Feature Engineering for Machine Learning at QConSP
Feature Engineering for Machine Learning at QConSPFeature Engineering for Machine Learning at QConSP
Feature Engineering for Machine Learning at QConSP
 
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...
Understanding Products Driven by Machine Learning and AI: A Data Scientist's ...
 
PyLadies Seattle - Lessons in Interactive Visualizations
PyLadies Seattle - Lessons in Interactive VisualizationsPyLadies Seattle - Lessons in Interactive Visualizations
PyLadies Seattle - Lessons in Interactive Visualizations
 

Recently uploaded

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 

Recently uploaded (20)

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 

Design for X: Exploring Product Design with Apache Spark and GraphLab

  • 1. DESIGN FOR X exploring data science product design with apache spark + graphlab {create} @amcasari @Concur data science summit 2016, san francisco nasa
  • 2. data science via random walks senior product mgr + data scientist @ Concur Labs control systems engineering + robotics + legos officer in USN operations research analyst wandering dirtbag + conservation volunteer EE + applied math + complex systems underwater robotics engineer technology consultant SAHM
  • 3. INSANELY QUICK INTRO TO + ➤ Concur Accelerator Team ➤ Concur Labs ➤ Incubator (still brewing) 850K Users log into Concur 300K Expense reports processed 120K Trips booked 170M Trips & expense reports warehoused Typical Day at Concur How do we encourage a culture of innovation while delivering quality service to our existing 33,000 business clients and 40M users?
  • 4. DESIGN SPRINTS FOR DATA SCIENCEY PROTOTYPES courtesy google ventures {we iterated…because data}
  • 5. INSANELY QUICK INTRO TO ➤ “fast and general engine for large-scale data processing” ➤ advanced cyclic data flow and in-memory computing > runs 10x-100x faster than Hadoop MR ➤ interactive shells in several languages (incl. SQL) ➤ performant + scalable courtesy databricks
  • 6. ALMOST AS INSANELY QUICK INTRO TO + ➤ graphlab create is based on a python data science library developed + (some) os’d by turi ➤ SFrame <<>> Spark DataFrame | SparkRDD ➤ (yes it works with Open Source SFrame and GLC) courtesy turi
  • 7. WHAT PROBLEM DO WE WANT TO DATA SCIENCE? Knowledge Gaps IOT Networks Bots Fairness +
  • 8. ➤ “We could {build this} {answer this better} if….” ➤ Reciprocal Data Applications DESIGN FOR KNOWLEDGE GAPS rda rdarda choose your data storage choose your data storage choose your data storage the app you really want to make
  • 9. ➤ “Can we trust our sensors?” ➤ “Has our network been hacked?” DESIGN FOR IOT NETWORKS device device device alerts, notifications, monitoring dashboards data services Anomaly Detection Toolkit TimeSeries <<>> SFrame
  • 10. ➤ “How do we create a conversational interface?” ….nothing new, just the burning question since Turing, 1950 DESIGN FOR BOTS what NOT to do…. non-creepy unisex animal mascot conversational ui choose or create your framework choose your data storage Advanced Deep Learning Text Analysis Toolkit Graph Analytics Toolkit
  • 11. ➤ know your biases + limitations ➤ in your data, their data, all the data ➤ in your feature selection ➤ in your algorithm …..because ethics (these ALL bias your results + communications) DESIGN FOR FAIRNESS learn more at data & society’s case studies + + open source. reproducible. transparent.
  • 12. {THANKS MUCH} ➤ Concur is hiring! ➤ SAP + SAP Ariba are hiring! concurlabs.com github.com/ concurlabs ➤ example notebooks will be posted on our github in the future @amcasari