MOVI: A Model-Free Approach
to Dynamic Fleet Management
Takuma Oda and Carlee Joe-Wong
Carnegie Mellon University
IEEE INFOCOM 2018/4/19 @Honolulu, HI
Optimization of taxi dispatch/cruising
Reduce passengers’ waiting time
Increase drivers' revenue
Vehicle Dispatch Problem
Real-time (On-demand)
Complexity
Large state space
Demand uncertainty
Coordination in large-scale fleet
Centralized, model-based approach
- F. Miao, et al., "Taxi Dispatch With Real-Time Sensing Data in Metropolitan Areas: A Receding Horizon Control Approach," IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, pp. 463–478, Apr. 2016.
- Limited modeling of vehicle dynamics
- Computationally intractable for real-time applications
Our work: distributed, model-free approach
Challenges
Problem Definition
Environment
Agent
Observation
-Requests
-Vehicle State
-Auxiliary Info
Action
-Dispatch Decision
Reward
-Revenue
-Idle Cruising Cost
Matching
All rides are requested with app
Vehicle state information is available in real time
Requests are rejected if no available vehicles within the
fixed range, e.g., 5 km.
Assumptions
Baseline: Receding Horizon Control (RHC)
Our approach: Deep Q-network (DQN)
Approach
Policy          RHC                          DQN
Formulation     Deterministic Optimization   Reinforcement Learning
Coordination    Centralized                  Distributed
Model           Model-based                  Model-free
Discretization  Taxi Zone                    Grid
Action: number of vehicles to send to each region, each time
Reward: terms for unserved requests and idle cruising cost
Transition Model: terms for leftover vehicles, vehicles sent, and taxis dropping off passengers
RHC Approach
Action: where each taxi should go in the next timeslot
Reward Model:
Optimal Action-Value Function:
Loss Function:
Idle cruising cost
Pickup
DQN Approach
Target value
MOVI Architecture
Agent
Demand
Prediction
RHC/DQN
Policies
Fleet Object
Ride
Requests
Dispatcher
ETA Model
OSM Road
Network
Matching
Dispatch
Route
Trip Time
Environment Simulator
w_{t-1}
F_t
a_t
Datasets
Training data
Test data
DQN Architecture
Fully CNN with auxiliary inputs
Outputs: Q-value for each possible move
Inputs: demand and supply heat maps
DQN Training
[Figures: two training curves; x-axis: Training Step]
Algorithm: Double DQN with experience replay
Exploration: Epsilon greedy with activation rate
Performance Comparison Over a Week
DQN vs. baseline                Reject Rate     Wait Time       Idle Cruising Time
Relative to NO (no dispatching) Reduced by 76%  Reduced by 34%  Increases by 1.3%
Relative to RHC                 Reduced by 20%  Reduced by 12%  Increases by 4%
- DQN outperforms RHC due to its real-time dispatch decisions
  - DQN forward pass < 100 ms
  - RHC computation ~ a few seconds
- DQN is more beneficial for drivers
  - DQN predicts the best action for each individual vehicle
  - More realistic to implement in the real world
Discussion
Utilization Rate
Conclusion
Contribution
Demonstrated the benefits of applying a model-free,
distributed solution to the large-scale taxi dispatch problem
Future Work
Partially Observable Environments
Other Reinforcement Learning Frameworks
Thank you!
takumao@andrew.cmu.edu
MOVI Algorithm
Limited Performance Tradeoffs
[Figure: panels for RHC and DQN; x-axis: Reject Penalty]
Hour-by-Hour Performance Comparison
V. Mnih, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
Q-learning algorithm with function approximation
1. Take some action and observe
2. Set target values
3. Perform a gradient descent step on
Q-learning
Problem Definition
RHC/DQN
Policy Engine
Data Preprocessing
Demand
Prediction
a_t, w_{t-1}
F_t
X_{t:t+T}, W_{t:t+T}
Vehicle/passenger
matching
It
Vehicle State
Past
Demands
Dispatch Center
Dispatch
Decisions
Multi-Agent Double DQN Algorithm
Demand Supply Distribution Mismatch
Inference Algorithm

INFOCOM 2018 Talk: MOVI

Editor's Notes

  1. In traditional taxi networks, individual drivers look for passengers hailing on the street, relying on their own experience and knowledge. This can be inefficient when drivers do not know future demand and are not coordinated. For instance, suppose two vacant taxis on the street cruise, or are dispatched, to these regions, but customers then request rides at those other locations. In this case the dispatch decisions were not optimal for either customers or drivers, and one of the drivers has to spend a lot of time cruising. Modern ride-hailing fleet networks such as Uber and Lyft can track vehicles' GPS locations and passengers' pickup locations in real time. This data can be used to predict future passenger demand and vehicle mobility patterns, which enables proactive dispatch of vehicles to predicted pickup locations. In this way, optimizing taxi dispatch can reduce passengers' waiting time for a ride and increase drivers' revenue.
  2. There are several challenges in this problem. For an on-demand ride-hailing application, it needs to be solved in real time. However, the large state space, uncertain customer demand, and coordination across a large-scale fleet network make it difficult to solve efficiently. Most previous work on fleet management addresses this problem with a model-based approach, which first models vehicle dynamics and interactions with passengers and then solves the dispatch problem optimally given these models. Though model-based approaches can improve performance, modeling the complex dynamics of fleet networks is inherently limited, and solving the problem for a large-scale fleet in real time tends to be computationally intractable. In this work, we propose a model-free, distributed approach to tackle these challenges. Our contributions are to: design and evaluate a distributed, model-free approach to the taxi dispatch problem; compare the model-free, distributed approach with a model-based, centralized approach; and demonstrate the effectiveness of the new approach in a realistic simulated environment.
  3. Let me define the problem more precisely. We assume that there is an environment and an agent. The environment consists of vehicles and passengers with a mobile app. The agent takes an action by dispatching; by dispatch, we mean sending a vacant taxi to another location. The agent observes each vehicle's location and availability status and all passengers' pickup requests. Using this real-time information, the agent determines proactive dispatching for vacant vehicles. Since we focus on optimizing proactive dispatching, we incorporate the matching algorithm between passengers and available vehicles into the environment. The agent's goal is to optimize sequential dispatch decisions so as to maximize accumulated reward.
  4. We assume that all rides are requested with a mobile app, so that the agent can get pickup and drop-off locations in real time. Vehicle state information is available in real time, including location, occupancy status, and destination. Requests are rejected if no vehicle is available within a fixed range. We use 5 km in our experiments.
  5. We used a Receding Horizon Control (RHC) approach as our baseline policy. It is a centralized, model-based approach, formulated as a deterministic optimization problem. Our own approach is distributed and model-free, and uses a popular reinforcement learning framework, the Deep Q-Network (DQN).
  6. The action variable for the baseline is the number of vehicles to send to each region at each time, denoted by u_t. We wish to choose u_t to maximize the reward, defined by a weighted sum of the number of rejects and the vehicles' idle cruising time. The number of vehicles in the next time slot is computed by this transition model. The first term corresponds to the vehicles left over after pickups. The second term is the net number of vehicles dispatched to this region. The last two terms represent occupied vehicles dropping off passengers within time slot t+1. Assuming the future demand is known, we can find the optimal dispatch actions that maximize accumulated reward over a horizon of T steps. At every time step, we solve the RHC problem to determine the actions for the next T steps, but execute only the current action. The first constraint ensures that the total number of vehicles dispatched from the i-th region does not exceed the number of idle vehicles there. The second constraint ensures that we do not dispatch vehicles to regions with travel times that exceed d_t, so that all dispatch movements complete within a time interval. For simplicity, we assume that the u_tij are continuous variables; we can then solve the optimization problem efficiently with linear programming methods.
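     The transition model described in this note can be sketched roughly as follows. This is only an illustration under stated assumptions, not the formulation on the slide: the function name, the array layout, and the choice to fold the two drop-off terms into a single `dropoffs` array are all hypothetical.

```python
import numpy as np

def next_idle_vehicles(idle, demand, dispatch, dropoffs):
    """One step of a simplified per-region vehicle count update.

    idle:     idle vehicles per region at time t, shape (n,)
    demand:   ride requests per region at time t, shape (n,)
    dispatch: dispatch[i, j] = vehicles sent from region i to region j, shape (n, n)
    dropoffs: occupied vehicles finishing trips in each region during t+1, shape (n,)

    Assumption: the two drop-off terms mentioned in the note are merged into `dropoffs`.
    """
    # Vehicles left over after serving as many requests as possible.
    leftover = np.maximum(idle - demand, 0.0)
    # Net dispatch: vehicles arriving in each region minus vehicles leaving it.
    net_dispatch = dispatch.sum(axis=0) - dispatch.sum(axis=1)
    return leftover + net_dispatch + dropoffs

idle = np.array([3.0, 0.0])
demand = np.array([1.0, 2.0])
dispatch = np.array([[0.0, 1.0],
                     [0.0, 0.0]])   # send one vehicle from region 0 to region 1
dropoffs = np.array([0.0, 1.0])
result = next_idle_vehicles(idle, demand, dispatch, dropoffs)
```

     In an RHC loop, an update of this kind would be unrolled over the T-step horizon inside the linear program, with only the first step's dispatch actually executed.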
  7. The action variable is where each taxi should go in the next time slot. Similar to the baseline, we express the reward function for each vehicle as a weighted sum of the pickup reward and the idle cruising cost. We would like to learn the optimal action-value function, which is defined as the maximum expected return achievable by any policy. Since the state space is huge, we use a neural network function approximator for Q. For the loss function, we use the mean squared error, and the target value is computed by a Bellman backup of the current estimate.
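     For reference, the quantities named in this note usually take the following form in the DQN literature (Mnih et al., cited in this deck). The symbols here (discount factor \gamma, online weights \theta, target-network weights \theta^-) are standard notation assumed for illustration; the slide's own equations were not preserved in the text.

```latex
% Optimal action-value function: maximum expected return achievable by any policy
Q^*(s, a) = \max_{\pi} \, \mathbb{E}\left[ \sum_{k \ge 0} \gamma^k r_{t+k} \,\middle|\, s_t = s,\ a_t = a,\ \pi \right]

% Target value: Bellman backup of the current estimate
y = r + \gamma \max_{a'} Q(s', a'; \theta^-)

% Mean squared error loss for the network weights \theta
L(\theta) = \mathbb{E}\left[ \left( y - Q(s, a; \theta) \right)^2 \right]
```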
  8. To evaluate the RHC and DQN policies, we designed and implemented MOVI as a taxi fleet simulator. This diagram shows the MOVI architecture. The fleet object simulates the states of all vehicles. In every time step, MOVI generates ride requests based on real trip records and matches each request to a vehicle with a nearest-neighbor algorithm. Next, the agent observes the current state of the environment, which includes vehicle and request information. The agent then computes the actions, using either the RHC or the DQN policy, and sends dispatch orders to idle vehicles. For each dispatch order, MOVI creates an estimated trajectory to the dispatched location by computing the shortest path in the OSM road network graph. Finally, all vehicles update their states according to their matching and dispatch assignments. The dispatch policy is a separate module and does not affect the other simulator modules, so we can compare different dispatch policies in the same settings.
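     A minimal sketch of nearest-neighbor matching with the fixed rejection range might look like the following. It assumes straight-line (haversine) distance and a greedy, request-by-request assignment; the simulator's actual distance measure and tie-breaking are not described in the talk, and all names here are hypothetical.

```python
import math

MAX_PICKUP_KM = 5.0  # rejection range used in the talk's experiments

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def match_requests(requests, vehicles, max_km=MAX_PICKUP_KM):
    """Greedily assign each request to its nearest available vehicle.

    requests: dict of request_id -> (lat, lon) pickup location
    vehicles: dict of vehicle_id -> (lat, lon) for available vehicles
    Returns (assignments, rejected): a dict request_id -> vehicle_id and
    a list of request ids with no available vehicle within max_km.
    """
    free = dict(vehicles)  # copy so the caller's dict is not modified
    assignments, rejected = {}, []
    for rid, pickup in requests.items():
        best = min(free, key=lambda v: haversine_km(free[v], pickup), default=None)
        if best is None or haversine_km(free[best], pickup) > max_km:
            rejected.append(rid)
            continue
        assignments[rid] = best
        del free[best]  # a matched vehicle is no longer available
    return assignments, rejected

vehicles = {"v1": (40.75, -73.99), "v2": (40.80, -73.95)}
requests = {"r1": (40.751, -73.989), "r2": (40.60, -74.20)}
assignments, rejected = match_requests(requests, vehicles)
```

     Here `r1` is matched to the nearby `v1`, while `r2` is rejected because the only remaining vehicle is more than 5 km away.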
  9. We used NYC taxi trip records for the experiments. These are the regions in our experiment, showing the geographical demand pattern. The area is roughly 40 km x 40 km. We trained the DQN and other machine learning models with one month of data and evaluated the metrics with one week of data. The temporal demand patterns are roughly similar.
  10. We use a fully convolutional neural network with a 15 x 15 output map. Each grid cell corresponds to the Q-value of a possible move from the center location. The fully convolutional network enables faster learning and inference because it has no fully connected layers. The input is the state of the environment: as input features, we use demand and supply heat maps surrounding an agent vehicle. This makes the input size independent of the service area. The larger the input heat maps are, the further ahead agents can see future demand when making decisions, but the more computationally intensive it becomes. To keep the image size small, we also use smoothed heat maps so that agents can easily obtain information from further away. Another key design choice is incorporating other agents' destinations into the input. This helps mitigate environment non-stationarity, because an agent can learn its optimal action conditioned on other agents' current actions.
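     Extracting vehicle-centered heat maps could be sketched as below. This is an assumption-laden illustration: the window size, zero padding at the service-area border, and the function names are hypothetical, and the smoothing and the other-agent destination channel mentioned above are omitted.

```python
import numpy as np

def local_window(heat_map, row, col, size):
    """Return a size x size crop of heat_map centered on (row, col).

    Cells outside the service area are zero-padded, so the crop has the same
    shape wherever the vehicle is. `size` is assumed to be odd.
    """
    half = size // 2
    padded = np.pad(heat_map, half, mode="constant")
    # (row, col) in the original map sits at (row + half, col + half) in padded.
    return padded[row:row + size, col:col + size]

def build_state(demand, supply, row, col, size):
    """Stack demand and supply crops into a (2, size, size) network input."""
    return np.stack([local_window(demand, row, col, size),
                     local_window(supply, row, col, size)])

demand = np.arange(16, dtype=float).reshape(4, 4)
supply = np.ones((4, 4))
state = build_state(demand, supply, row=0, col=0, size=3)
```

     Because the crop is always `size` x `size`, the network input does not depend on how large the full service area is, which is the property the note highlights.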
  11. We trained the DQNs with the double DQN algorithm and experience replay. Network weights and replay memory are shared among agents. We customized the epsilon-greedy exploration method by adding an activation rate, which controls the probability of moving or staying. We found that it contributes to stable and faster training. These graphs show training curves of the average loss and the average max Q during training. The max Q value starts decreasing after it reaches 100. This can be explained by changes in the environment caused by more competition among agents. It also indicates coordination in a distributed manner.
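     The two training ingredients in this note can be sketched as follows. The double DQN target is the standard one (the online network picks the next action, the target network evaluates it). The exploration function is only one plausible reading of "activation rate", namely that a vehicle considers moving with that probability and otherwise stays; the talk does not spell out the exact rule, and all names and default values here are hypothetical.

```python
import numpy as np

def double_dqn_targets(rewards, q_next_online, q_next_target, gamma=0.99):
    """Double DQN targets for a batch of transitions.

    rewards:       shape (batch,)
    q_next_online: online-network Q-values at the next states, shape (batch, actions)
    q_next_target: target-network Q-values at the next states, shape (batch, actions)
    """
    # The online network selects the action, the target network evaluates it.
    best = np.argmax(q_next_online, axis=1)
    return rewards + gamma * q_next_target[np.arange(len(rewards)), best]

def choose_action(q_values, stay_action, epsilon, activation_rate, rng):
    """Epsilon-greedy selection gated by an activation rate (assumed semantics)."""
    if rng.random() >= activation_rate:
        return stay_action                      # not activated: stay in place
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform random move
    return int(np.argmax(q_values))              # exploit: greedy move

targets = double_dqn_targets(
    rewards=np.array([1.0, 0.0]),
    q_next_online=np.array([[1.0, 2.0], [3.0, 0.0]]),
    q_next_target=np.array([[10.0, 20.0], [30.0, 40.0]]),
    gamma=0.5,
)
rng = np.random.default_rng(0)
stay = choose_action(np.array([0.1, 0.9, 0.3]), stay_action=0,
                     epsilon=0.0, activation_rate=0.0, rng=rng)
greedy = choose_action(np.array([0.1, 0.9, 0.3]), stay_action=0,
                       epsilon=0.0, activation_rate=1.0, rng=rng)
```

     With shared weights and replay memory, every agent would call the same functions on its own local state; terminal-state handling is left out of this sketch.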
  12. We ran simulations with the DQN policy, the RHC policy, and a no-dispatching policy, and calculated three metrics for each day of the week: reject rate, passenger wait time, and idle cruising time. All simulations were run with a 1-minute time step and 8000 vehicles. On every day of the week, the DQN policy significantly reduces the reject rate and wait time compared to no dispatching, while idle cruising time stays almost the same. Comparing DQN with RHC, DQN reduces the reject rate by 20% and the wait time by 12%.
  13. Despite the fact that the DQN policy does not make coordinated decisions for idle vehicles, our results show that DQN performs better than RHC. We think that this is due to DQN's faster and distributed dispatch decisions, which allow the dispatch policy to adapt rapidly to the environment state. The DQN forward pass takes less than 100 ms, while the RHC computation takes a few seconds and depends on the number of regions. To investigate the effect of the on-demand, distributed nature of DQN, we simulated a “batch” version of the DQN policy. The results, plotted as DQN*, show that the batch DQN policy performs almost the same as RHC. This indicates that the faster, on-demand computation of DQN contributes to rapid adaptation to the environment state. Another interesting feature of the DQN policy is that it is more beneficial for drivers, because it predicts the best action for each individual vehicle. The figure shows the average and minimum utilization rates over all vehicles. DQN achieves a better minimum utilization rate than RHC. Utilization rate relates strongly to revenue, so the DQN policy may be more realistic to implement in real-world applications.
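  14. Let me conclude our work. Our contribution is to demonstrate the benefits of applying a model-free, distributed solution to the large-scale taxi dispatch problem. There are several possible extensions of our work, such as partially observable environments and other reinforcement learning frameworks.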