Make Cars Smarter with Data Science

How to Make Cars Smarter: A Step Towards Self-Driving Cars
Kaushik K. Das
Esther Vasiete
Pivotal Data Science
October 2016

Today’s presenters
Pivotal Data Science Perspectives
Kaushik K. Das
Head of Data Science, Pivotal
Esther Vasiete
Data Scientist, Pivotal

Agenda
• What do we mean by “smarter cars”?
• How do we apply data science to build smarter cars?
‒ Example 1: Predictive Maintenance
‒ Example 2: Understanding Driver Behavior Patterns
• Demo
• Next Steps

Autonomous Cars will offer many advantages
Call a car whenever you want to go somewhere – sit and relax – and
you are there!
● No stress for you – don’t have to drive in traffic or maintain a car
● Better utilization of cars leading to lower impact on environment
● Fewer accidents and injuries
BUT
there are some issues that still need to be solved, e.g. California law
requires a driver who is ready to take over in case of an emergency

We need to get from Manually Driven Cars to Autonomous Cars

Why not:
Manually Driven Cars → Smart “Augmented” Cars* → Autonomous Cars
* Some people refer to smart augmented cars as semi-autonomous vehicles

Smart Cars offer many of the advantages of automation
Augmentation: a situation in which humans and computers combine to create effective and efficient outcomes*
● You get reduced stress and fewer accidents
● Fewer regulatory / legal barriers
● Easier to implement
* Thomas H. Davenport, Augmentation or Automation?, WSJ, Feb 25, 2015.

Smart System = Sensors + Digital Brain + Actuators
Data Science For Building Models: Problem Formulation → Data Step → Modeling Step → Application Step
Supporting layers: Sensors & Data, Data Lake, Big Data Platform

The Eightfold Path of Data Science – four phases and four differentiating factors
Phase 1: Problem Formulation
Make sure you formulate a problem that is relevant to the goals and pain points of the stakeholders
Phase 2: Data Step
Build the right feature set, making full use of the volume, variety and velocity of all available data
Phase 3: Modeling Step
This is where you move from answering what, where and when to answering why and what if?
Phase 4: Application
Create a framework for integrating the model with decision making processes and taking action using the Internet of Things
Technology Selection
Select the right platform and the right set of tools for solving the problem at hand
Iterative Approach
Perform each phase in an agile manner, team up with domain experts and SMEs, and iterate as required
Creativity
Take the opportunity to innovate at every phase
Building a Narrative
Create a fact-based narrative that clearly communicates insights to stakeholders

Data Science Toolkit
[Figure: toolkit overview with three groups: key languages, key tools, and platform]
KEY TOOLS: MLlib, PL/X, Modeling Tools, Visualization Tools
PLATFORM: Pivotal HDB, Pivotal Greenplum, Spring Cloud Data Flow, Apache Spark, Pivotal HDP

Scalable, In-Database
Machine Learning
• Open source https://github.com/apache/incubator-madlib
• Downloads and docs http://madlib.incubator.apache.org/
• Wiki https://cwiki.apache.org/confluence/display/MADLIB/

Functions
Linear Systems
• Sparse and Dense Solvers
• Linear Algebra
Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank
Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Ordinal Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White),
Clustered Variance, Marginal Effects
Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Support Vector Machines (SVM)
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Prediction Metrics
Descriptive Statistics
Sketch-Based Estimators
• CountMin (Cormode-Muthukrishnan)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)
Correlation and Covariance
Summary
Utility Modules
Array and Matrix Operations
Sparse Vectors
Random Sampling
Probability Functions
Data Preparation
PMML Export
Conjugate Gradient
Stemming
Sessionization
Pivot
Inferential Statistics
Hypothesis Tests
Time Series
• ARIMA
Sept 2016
Path Functions
• Operations on Pattern Matches

Data Science Use-Cases
● Smarter Car
‒ Is the car functioning well?
‒ Do any of the parts need servicing or replacement?
‒ How are the new parts functioning? Are they better than the old parts? How’s their performance
relative to tests?
● Smarter Driver Response
‒ Understand drivers' driving patterns and typical routes, and customize for a better driving experience (Advanced Driver Assistance Systems)
● Smarter Response to Surroundings
‒ How do we improve congestion forecasting and optimize routes better?
‒ How do we improve traffic management?
‒ How can city planning be improved by using very granular driving and traffic information?

There’s a lot of data available
Categories: Consumer Data, Car Data, External
Sources: Initial Sales, Web/Apps Logs, Demographics, CRM, Surveys, Driving Behavior, Sales & Leasing, Dealership Service Data, Parts, Manufacturing, Telemetry Data, Weather, Traffic, Economic, Special Events
(Note: not an exhaustive list)

Preventive Maintenance for Connected Cars
Diagnostic Trouble Codes (DTC)
Unscheduled repairs
AB1029 – Power steering pump replacement
CT3408 – Wheel alignment

Data Sources for Predictive Maintenance
Vehicle Data: VIN, Timestamp, DTC Code, Odometer, Speed, Acceleration, Engine Temperature, Engine Torque, GPS Coordinates, etc.
Car Repairs Data: VIN, Date vehicle in, Date vehicle out, Repair code, Parts replaced, Warranty claims, Repair Comments, etc.

Predicting Job Type from Diagnostic Trouble Codes (DTCs)
[Figure: timeline of DTC observations (categories B, P, C, U) for a vehicle, interleaved with repair visits labeled by Job Type: Transmission; Transmission and Engine; Regular check]
Can the DTCs observed before a repair visit predict its Job Type?
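A minimal sketch of how the question above could be turned into training examples: each repair visit is paired with the DTC categories seen for the same VIN in a preceding time window. The function name, the 30-day window, the tuple layouts and the sample records are all assumptions made for illustration; the deck does not say how the original features were built.

```python
from collections import defaultdict
from datetime import date, timedelta


def dtcs_before_repairs(dtc_events, repairs, window_days=30):
    """Pair each repair with the DTC categories seen in the preceding window.

    dtc_events: iterable of (vin, day, dtc_category)   # hypothetical layout
    repairs:    iterable of (vin, date_in, job_type)   # hypothetical layout
    """
    events_by_vin = defaultdict(list)
    for vin, day, dtc in dtc_events:
        events_by_vin[vin].append((day, dtc))

    examples = []
    for vin, date_in, job_type in repairs:
        start = date_in - timedelta(days=window_days)
        # Only DTCs observed strictly before the vehicle came in count as predictors.
        seen = {dtc for day, dtc in events_by_vin[vin] if start <= day < date_in}
        examples.append((vin, date_in, sorted(seen), job_type))
    return examples


# Made-up records, for illustration only.
dtc_events = [
    ("VIN1", date(2016, 1, 5), "B"),
    ("VIN1", date(2016, 1, 20), "P"),
    ("VIN1", date(2016, 3, 1), "U"),
    ("VIN2", date(2016, 1, 10), "C"),
]
repairs = [
    ("VIN1", date(2016, 1, 25), "Transmission"),
    ("VIN1", date(2016, 3, 10), "Engine"),
]
examples = dtcs_before_repairs(dtc_events, repairs)
```

Each resulting tuple holds the DTC categories as candidate features and the job type as the label. In practice this join would more likely run in the database than in application code.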

Hierarchical Classification Framework
Input: vehicle features, i.e. DTC codes + other features (e.g. mileage, vehicle model, previous repairs, ...)
1st stage: N one-vs-rest logistic regression models
2nd stage: N random forest models
Repair codes shown in the figure: DF1210, DF1215, DF2980, AB1029, AB1622, AB1625, AB8622, CT3402, CT3408, CT3560, CT2409
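A rough sketch of the first stage only, assuming it means one binary logistic regression per class with the highest-scoring class winning. The deck does not show how the two stages are connected, so the random forest stage is left out. The features are multi-hot indicators for DTC categories B, P, C, U, and the labels reuse the code prefixes from the slide (DF, AB, CT) as hypothetical classes; the training rows are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_logreg(rows, labels, epochs=500, lr=0.5):
    """Fit weights (bias stored last) by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi
            grad[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def train_one_vs_rest(rows, class_labels):
    """Train one 'this class vs. everything else' model per class."""
    return {
        c: train_binary_logreg(rows, [1 if label == c else 0 for label in class_labels])
        for c in sorted(set(class_labels))
    }


def predict(models, x):
    """Return the class whose binary model gives the highest probability."""
    def score(weights):
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + weights[-1])
    return max(models, key=lambda c: score(models[c]))


# Invented toy data: columns are indicators for DTC categories B, P, C, U.
rows = [
    [1, 0, 0, 0], [1, 0, 0, 0],
    [0, 1, 0, 0], [0, 1, 0, 0],
    [0, 0, 1, 0], [0, 0, 0, 1],
]
class_labels = ["AB", "AB", "CT", "CT", "DF", "DF"]
models = train_one_vs_rest(rows, class_labels)
```

One-vs-rest keeps each model a simple binary problem, which makes the N models easy to train independently and in parallel.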

Your car will be repaired before you
have a problem!

Example 2 - Smarter Driver
Response

Unsupervised driving behavior analysis
Segmentation: From raw sensor data to driving scenes using HMM.
Feature Distribution: Quantization of physical features observed in each scene
Driving topics: Scenes are represented as a combination of driving topics, which explain driving patterns.
Parallelism using: PL/Python *
* HMM inference from pre-trained model
[T. Bando, K. Takenaka, S. Nagasaka, T. Taniguchi, Unsupervised drive topic finding from driving behavioral data, IEEE Intelligent Vehicles Symposium, 2013]

HMM inference using PL/Python
Note: HMM parameters had been provided to us
and loaded in the database.
hmmlearn library installed in every segment!
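To illustrate what inference from pre-trained HMM parameters involves, here is a plain-Python Viterbi decoder followed by a step that groups the decoded states into scenes. This is a stand-in, not the deck's code: the deck ran hmmlearn inside PL/Python, while this sketch assumes a discrete-emission HMM, and the two states, the two observation symbols and all probabilities are invented.

```python
import math
from itertools import groupby


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-emission HMM (log space)."""
    n_states = len(start_p)
    scores = [_log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev, scores, pointers = scores, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            scores.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][obs]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def to_scenes(path):
    """Collapse runs of the same state into (state, start, end) scenes; end is exclusive."""
    scenes, start = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        scenes.append((state, start, start + length))
        start += length
    return scenes


# Invented parameters: state 0 = "cruising", state 1 = "braking";
# symbol 0 = steady speed, symbol 1 = decelerating.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.9, 0.1], [0.1, 0.9]]
path = viterbi([0, 0, 0, 1, 1, 1], start_p, trans_p, emit_p)
scenes = to_scenes(path)
```

Because decoding one trip does not depend on any other trip, a function like this can be applied per vehicle or per trip on every database segment, which is presumably why the library had to be installed on each of them.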

From time-series driving behavior into natural language
Latent Dirichlet Allocation (LDA)
Document ↔ Scene
Word ↔ Quantized sensor value
[D. Blei, Probabilistic topic models, Communications of the ACM, 2012]
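The document/word analogy can be sketched as follows: each sensor reading in a scene is quantized into a bin, the bin index becomes a "word", and the scene becomes a bag of words that a topic model such as LDA could consume. The feature names, bin edges and sample values are hypothetical; the deck does not specify the quantization scheme.

```python
from bisect import bisect_right
from collections import Counter


def quantize(value, edges):
    """Return the index of the bin that value falls into, given sorted bin edges."""
    return bisect_right(edges, value)


def scene_to_document(scene, bin_edges):
    """Turn one scene (a list of {feature: value} samples) into a bag of words."""
    words = Counter()
    for sample in scene:
        for feature, value in sample.items():
            words[f"{feature}_{quantize(value, bin_edges[feature])}"] += 1
    return words


# Hypothetical features and bin edges.
bin_edges = {"speed": [30, 60, 90], "brake": [0.2, 0.6]}
scene = [
    {"speed": 25, "brake": 0.0},
    {"speed": 28, "brake": 0.1},
    {"speed": 45, "brake": 0.7},
]
document = scene_to_document(scene, bin_edges)
```

The word counts per scene form the document-term input that topic models expect, so driving topics play the role that themes play in a text corpus.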

Operationalization - Pipeline of a Data Science Driven App
[Figure: pipeline diagram with the following components]
Data Feeds
Spring Cloud Data Flow: Ingest, Filter, Enrich, Sink
Data Lake
Greenplum
Model Building, Model Tuning, Continuous Model Improvement
MLlib, PL/X
Apps
Business Levers
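The Ingest, Filter, Enrich, Sink stages named on this slide can be illustrated with chained Python generators. This is only an in-process analogy: Spring Cloud Data Flow composes such stages as separate streaming applications, and the record layout, the speed limits used for filtering and the lookup table below are assumptions.

```python
def ingest(raw_lines):
    """Parse 'vin,speed_kmh' text lines into dicts."""
    for line in raw_lines:
        vin, speed = line.strip().split(",")
        yield {"vin": vin, "speed_kmh": float(speed)}


def keep_valid(events):
    """Filter out readings with physically implausible speeds."""
    for event in events:
        if 0 <= event["speed_kmh"] <= 300:
            yield event


def enrich(events, model_by_vin):
    """Attach the vehicle model, looked up by VIN."""
    for event in events:
        yield {**event, "model": model_by_vin.get(event["vin"], "unknown")}


def sink(events, table):
    """Append events to a list standing in for a database table."""
    for event in events:
        table.append(event)
    return table


# Invented sample feed and lookup table.
raw = ["VIN1,52.0", "VIN2,-5", "VIN3,88.5"]
model_by_vin = {"VIN1": "sedan"}
table = sink(enrich(keep_valid(ingest(raw)), model_by_vin), [])
```

Keeping each stage a small, independent function mirrors the way stream stages are deployed and scaled separately.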

We will be able to improve your
driving experience by preparing your
car for the exact conditions you are
about to encounter.

It’s easy to make cars smarter -
let’s make it happen!

Additional resources & next steps
Read: Pivotal Data Science Blog
https://blog.pivotal.io/channels/data-science-pivotal
Strategic: Pivotal Data Science Analytics Roadmapping Engagement
https://pivotal.io/contact
Tune in: Next data science webinar “How Data Science can help with Fraud
Detection and Cybersecurity” - Q1 2017 (Date TBD)
https://pivotal.io/resources/1/webinars
Hands on:
HDB Sandbox on HDP VM https://network.pivotal.io/products/pivotal-hdb
Greenplum Sandbox https://network.pivotal.io/products/pivotal-gpdb
Apache MADlib (incubating) http://madlib.incubator.apache.org/
