The world of transportation is changing radically. It is an industry with immense technological challenges, most of which are AI related. At the current pace, and with major industry players now active, it will become unrecognisable in the coming years.
In this talk I aim to cover the different fields it includes, the data science problems it poses, and the current state-of-the-art solutions.
The focus of this talk will be smart cities, which multiple teams at Google work on, including my own.
I will present my own work, including hotspot analysis; trajectory tracking from GPS and beacon data using a novel clustering method (patent pending); vehicle identification (classification and clustering); ETA and routing optimisation and personalisation (regression and ranking); driver and rider matching (ranking and classification); and city planning.
I will also cover, without focusing on them, other smart city topics researched by my counterparts on other Google teams and at Uber, such as autonomous vehicles (not a focus here, as the topic is already crowded and appears in many talks), fleet coordination (in a multi-agent system), load distribution (reinforcement learning based), and vehicle syncing.
I will describe problems and solutions, including the algorithm or model most commonly used in the industry today to solve each problem. For specific examples that I have personally researched, I will go into more detail, including research phases, the algorithm's inner workings and experiment results (usually A/B tests) on real user data.
This talk will give the audience an understanding of the tremendous challenges faced when trying to improve the state of transportation, and of how we solve, and plan to solve, them to make the world a better place. It will also give participants a rare glimpse into some of Google's and Waze's ideas, algorithms, research methodologies and future plans for global transportation.
From personal experience of giving talks on transportation and Waze algorithms (though never this one), I have learnt that this is an "emotional" subject for many people, and therefore very exciting for audiences and a source of many questions.
Note that this talk is very different from the one presented last year, which covered multiple fields Waze operates in (e.g. ads, usage, conversion, behavioural analytics). This talk focuses only on transportation, its current state and its future, and on how data science is crucial, and the leading field, in solving many of these problems.
1. Data Science Summit 2018
Towards Smart Transportation
Daniel Marcous
Data Wizard
@dmarcous
Slides : https://www.slideshare.net/DanielMarcous/TowardsSmartTransporationDSS18
7. QUIZ
Which city has the worst traffic?
MINUTES PER KILOMETER (Waze data, 2018)
Cities : Dublin, Paris, Amsterdam, Tel Aviv, London, Zurich
Chart values (order as extracted, not matched to cities) : 2.3, 2.31, 2.33, 1.35, 2.35, 2.67
9. Transportation is changing
ROADS ARE GETTING BUSIER
People are more inclined to share resources (Airbnb, Uber, Lyft, etc.)
Car ownership slowly makes less and less sense
Self-driving cars are just around the corner
19. WIP : Routing Personalization
FINDING THE BEST ROUTE, FOR YOU
My Favourite
Learning to Rank
(pairwise)
Route features
User Features
Context Features
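As a rough illustration of pairwise learning to rank, the sketch below fits a linear scoring function so that the route a user actually took scores higher than an alternative that was offered. The feature layout (normalised travel time, share of the route on highways), the training pairs and all function names are assumptions made for this example, not the features or model used in production.

```python
import math
import random

# Hypothetical training pairs: (features of the route the user took,
# features of an alternative that was offered). Each feature vector is
# (normalised travel time, share of the route on highways).
EXAMPLE_PAIRS = [
    ((0.50, 1.0), (0.45, 0.0)),
    ((0.30, 0.8), (0.30, 0.1)),
    ((0.40, 0.2), (0.90, 0.3)),
]


def train_pairwise_ranker(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Fit linear weights so that score(preferred) > score(other).

    Minimises the pairwise logistic loss log(1 + exp(-margin)), where
    margin = weights . (preferred - other), by stochastic gradient descent.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            margin = sum(w * d for w, d in zip(weights, diff))
            # Derivative of log(1 + exp(-margin)) with respect to margin;
            # the margin is capped to keep exp() from overflowing.
            grad = -1.0 / (1.0 + math.exp(min(margin, 60.0)))
            weights = [w - lr * grad * d for w, d in zip(weights, diff)]
    return weights


def score(weights, features):
    """Linear score of one route; higher means more preferred."""
    return sum(w * f for w, f in zip(weights, features))


def rank_routes(weights, routes):
    """Return route names ordered from best to worst by model score."""
    ordered = sorted(routes, key=lambda name: score(weights, routes[name]),
                     reverse=True)
    return ordered
```

With these toy pairs the learnt weights penalise travel time and reward highway share, so `rank_routes` places a highway route ahead of an equally fast side-street route. User and context features would enter the same way, as extra entries in each feature vector.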
20. ETA
ESTIMATED TIME OF ARRIVAL
Real time traffic conditions
Historic road data
Highway / small street
Rush hours
Holidays
Vehicle
WIP : Personalization - User’s driving patterns
26. Transportation Supportive Events
KEEPING UP WITH REAL TIME CONDITIONS
Do several reports refer to a single event?
Where exactly is the event located?
Do events have trends?
29. Scalable Geospatial Clustering
DISTRIBUTED CLUSTERING USING MODIFIED DBSCAN
1. Read - Geospatial Dataset into Dataset[points]
2. Stage 1 - Data Partitioning (Map only)
3. Stage 2 - Local Clustering (Foreach Partition)
4. Stage 3 - Global Merging (Map-Reduce)
30. Scalable Geospatial Clustering
PHASE 1 - REPORTS TO EVENTS
1. Divide the world into cells (s2)
2. For each cell
a. Local clustering - group similar reports into a single event
3. Merge results
31. Scalable Geospatial Clustering
PHASE 2 - EVENTS TO TRENDS
1. Divide the world into cells (s2)
2. For each cell
a. Local clustering - group similar events over time into an enclosing area
3. Merge results
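The divide, cluster locally, merge pattern from the two phases above can be sketched on a single machine as follows. This is a simplified stand-in, not the modified DBSCAN from the talk: it assumes points are planar (x, y) coordinates in metres, uses a plain square grid instead of S2 cells, treats every point as a core point (DBSCAN with a minimum cluster size of 1), and runs in one process rather than as map-reduce stages. All names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


class DisjointSet:
    """Union-find over point indices, used to record cluster membership."""

    def __init__(self, size):
        self.parent = list(range(size))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        root_i, root_j = self.find(i), self.find(j)
        if root_i != root_j:
            self.parent[root_j] = root_i


def cluster_points(points, eps, cell_size):
    """Group points that are chained together by gaps of at most eps."""
    if cell_size < eps:
        raise ValueError("cell_size must be >= eps so that close points "
                         "always fall in the same or adjacent cells")

    def close(i, j):
        return math.dist(points[i], points[j]) <= eps

    # Stage 1: data partitioning, each point goes to exactly one grid cell.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)

    sets = DisjointSet(len(points))

    # Stage 2: local clustering, only points inside the same cell are compared.
    for members in cells.values():
        for i, j in combinations(members, 2):
            if close(i, j):
                sets.union(i, j)

    # Stage 3: global merging, join clusters that touch across cell borders.
    # Four of the eight neighbours suffice to visit each adjacent pair once.
    for (cx, cy), members in cells.items():
        for dx, dy in ((1, -1), (1, 0), (1, 1), (0, 1)):
            for i in members:
                for j in cells.get((cx + dx, cy + dy), []):
                    if close(i, j):
                        sets.union(i, j)

    clusters = defaultdict(list)
    for idx in range(len(points)):
        clusters[sets.find(idx)].append(idx)
    return sorted(clusters.values())
```

For example, `cluster_points([(0, 0), (5, 0), (98, 0), (103, 0), (500, 500)], eps=10, cell_size=100)` returns `[[0, 1], [2, 3], [4]]`: the pair at x = 98 and x = 103 sits in two different cells and is only joined in the merge stage.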
32. Speed Limit
Adding road speed limits to the Waze map.
Safety as a top priority.
33. Speed Limit Prediction
RANDOM FOREST CLASSIFIER
Road type
Road length
Average / Max / Percentile speed
Speed at night
Driving direction
Urban / Rural area
Accuracy : 70-90%
Reduced driver speed : 2 kilometers / hour
34. Ride Sharing
FINDING THE PERFECT MATCH
Rider Features
Driver Features
Ride Attributes
Ride time
Detour length
Co-ridership
Rider engagement
Driver engagement
ML Model
Reduced time to match : 15 minutes
35. Experiments In Production
A/B TESTS
1. Assign users to groups
2. Serve different feature versions
3. Measure differences
B vs. A : +75% CLICKS
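A minimal sketch of the three A/B steps, assuming users are assigned by hashing their id together with an experiment name (so the same user always sees the same version) and that the measured difference is the relative change in click rate. The function names and the hashing scheme are illustrative assumptions, not a description of the production experiment framework.

```python
import hashlib


def assign_group(user_id, experiment, groups=("A", "B")):
    """Step 1: deterministically assign a user to one of the groups."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def click_rate(clicks, users):
    """Clicks per exposed user within one group."""
    return clicks / users


def relative_lift(control_rate, treatment_rate):
    """Step 3: relative difference of the treatment over the control."""
    return (treatment_rate - control_rate) / control_rate
```

Step 2 is then a lookup: the feature version served is chosen by `assign_group(user_id, "some_experiment")`. With hypothetical click rates of 4% in A and 7% in B, `relative_lift(0.04, 0.07)` is about 0.75, i.e. +75% clicks. A real analysis would also check that the difference is statistically significant.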
39. Future of Ride Sharing
VEHICLE AS A RESOURCE - AUTONOMOUS DRIVERS
Given X cars :
Pick up as many passengers as possible
Optimize for shortest passenger wait time
Optimize for shortest driver detour
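One simple way to make the objective above concrete is a greedy heuristic that repeatedly pairs the closest free car and waiting passenger, using distance as a proxy for wait time. This is only a sketch under strong assumptions (planar coordinates, one passenger per car, straight-line distance); it is not optimal and is not presented as the approach used by any real service.

```python
import math


def greedy_assign(cars, passengers):
    """Match each car to at most one passenger, closest pairs first.

    cars and passengers map an id to an (x, y) position.
    Returns {car_id: (passenger_id, distance)}.
    """
    candidates = sorted(
        (math.dist(car_pos, rider_pos), car, rider)
        for car, car_pos in cars.items()
        for rider, rider_pos in passengers.items()
    )
    used_cars, used_riders, matches = set(), set(), {}
    for distance, car, rider in candidates:
        if car in used_cars or rider in used_riders:
            continue  # one side of this pair is already taken
        matches[car] = (rider, distance)
        used_cars.add(car)
        used_riders.add(rider)
    return matches
```

With two cars at (0, 0) and (10, 0) and passengers at (1, 0), (9, 0) and (50, 50), the two nearby passengers are picked up and the distant one stays unmatched. Minimising total wait exactly would call for an assignment algorithm instead of a greedy pass.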
40. Fleet Distribution
BALANCE SUPPLY & DEMAND OF RESOURCES
Resource = Vehicle
Supply - Fleet of vehicles (Drivers / Taxis / Autonomous)
Demand - Riders that want to get from A to B
41. Traffic Simulation
CURRENTLY USED TO IMPROVE ETA
1. Record drives
2. Replay sequentially :
a. Calculate routes
b. Start drives
c. Adjust traffic
3. Measure KPI - e.g. overall ETA Error
4. Compare to true KPI (quality of simulation)
5. Change baseline (ETA prediction algorithm)
6. Reiterate 2-3
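The measuring and comparing part of this loop (steps 3, 5 and 6) can be sketched as below. The replay itself (routing, starting drives, adjusting traffic) is left out; the sketch assumes each recorded drive is a dict carrying its real duration in an `actual_minutes` field, and that an ETA algorithm is any function from a drive to predicted minutes. Field and function names are hypothetical.

```python
def mean_abs_eta_error(drives, predict_eta):
    """KPI: mean absolute difference between predicted and actual minutes."""
    errors = [abs(predict_eta(drive) - drive["actual_minutes"])
              for drive in drives]
    return sum(errors) / len(errors)


def compare_predictors(drives, baseline, candidate):
    """Evaluate two ETA algorithms on the same recorded drives."""
    return {
        "baseline": mean_abs_eta_error(drives, baseline),
        "candidate": mean_abs_eta_error(drives, candidate),
    }
```

For two toy drives of 10 km in 20 minutes and 5 km in 12 minutes, a baseline of 1.5 minutes per km gives an error of 4.75 minutes, while a candidate of 2.0 minutes per km gives 1.0, so the candidate would become the new baseline.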
42. Fleet Distribution
BALANCE RESOURCES SUPPLY & DEMAND
Solutions :
1. Simulation - rapidly turns “chaotic”: a change in X routes affects traffic conditions (AKA “The Butterfly Effect”)
2. Economic Modeling - balance based on predicted demand
3. Reinforcement Learning - change distribution dynamically based on current supply & demand, with regard to future “reward”
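As a very simple reading of option 2, the sketch below splits a fleet across zones in proportion to predicted demand, rounding with the largest-remainder method so the allocation sums to the fleet size. The zone names, the demand figures and the proportional rule itself are assumptions for illustration; a real economic model would also account for pricing, travel time between zones and so on.

```python
def allocate_fleet(fleet_size, predicted_demand):
    """Split fleet_size vehicles across zones proportionally to demand.

    predicted_demand maps a zone name to a positive demand estimate.
    """
    total = sum(predicted_demand.values())
    exact = {zone: fleet_size * demand / total
             for zone, demand in predicted_demand.items()}
    # Start from the rounded-down share of every zone.
    allocation = {zone: int(share) for zone, share in exact.items()}
    leftover = fleet_size - sum(allocation.values())
    # Hand the remaining vehicles to the zones with the largest remainders.
    by_remainder = sorted(exact, key=lambda zone: exact[zone] - allocation[zone],
                          reverse=True)
    for zone in by_remainder[:leftover]:
        allocation[zone] += 1
    return allocation
```

With 10 vehicles and predicted demand of 50, 30 and 20 rides for three zones, the allocation is 5, 3 and 2 vehicles. The reinforcement learning option differs in that it would adjust such an allocation continuously, based on the observed outcome.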
We have just reached a milestone of 100M users, who spend 8 hours on Waze every month - 90K years of usage every month
23B km driven - 75 round trips to the sun
And with great data come great insights
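The two headline figures can be sanity-checked with a few lines of arithmetic. The only number not taken from the slide is the mean Earth-Sun distance, assumed here to be about 149.6 million km.

```python
USERS = 100_000_000
HOURS_PER_USER_PER_MONTH = 8
HOURS_PER_YEAR = 24 * 365

# Total usage per month, expressed in years.
usage_years = USERS * HOURS_PER_USER_PER_MONTH / HOURS_PER_YEAR

DRIVEN_KM = 23_000_000_000
EARTH_SUN_KM = 149_600_000  # assumed mean distance, not from the slide

# One round trip is twice the Earth-Sun distance.
round_trips = DRIVEN_KM / (2 * EARTH_SUN_KM)
```

This gives roughly 91,000 years of usage and about 77 round trips, in line with the rounded 90K and 75 quoted above.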