Flight Delay Prediction Model
Vishwanath K, Viral Tarpara, Haozhe Wang, Ling Zhou
Business Problem Overview
Flight delays are a challenging problem for all airline companies. They lead to:
● Financial losses.
● Negative impact on business reputation.
Cost of Delays in the US: $32.9B in total
● Cost to Airlines: $8.3B
● Cost to Passengers: $16.7B
● Cost from Lost Demand: $3.9B
● GDP Impact: $4B
Source: Total Cost Impact Study
Business Problem Overview
The model helps airline companies to:
● Predict flight delay
● Optimize operation
● Reduce further loss
Literature Review on Delay Costs
The airline industry incurs an average cost of about $11,300 per delayed flight.
● Based on an average of 61,000 delayed flights per month.
● Excludes costs to passengers and lost demand.
A more accurate delay prediction system can help to identify operational variables that contribute to delays.
While some conditions, such as weather, are not controllable, the way airlines and airports operate and optimize resources in the face of "acts of God" is controllable.
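The slides do not say how the per-flight figure was derived. One plausible reading, offered here only as an assumption, is that it is the $8.3B annual cost to airlines divided by twelve months of 61,000 delayed flights. A quick check of that arithmetic:

```python
# Assumption: per-flight cost = annual cost to airlines / annual delayed flights.
# The slides do not confirm this derivation.
ANNUAL_COST_TO_AIRLINES = 8.3e9      # dollars, from the cost breakdown slide
DELAYED_FLIGHTS_PER_MONTH = 61_000   # monthly average, from this slide


def cost_per_delayed_flight(annual_cost, flights_per_month):
    """Average cost per delayed flight over a 12-month year."""
    return annual_cost / (flights_per_month * 12)


estimate = cost_per_delayed_flight(ANNUAL_COST_TO_AIRLINES, DELAYED_FLIGHTS_PER_MONTH)
print(f"${estimate:,.0f} per delayed flight")
```

Under that assumption the result lands within about $40 of the quoted $11,300.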
Data Understanding
Dataset: On-Time Performance
From the Research and Innovative Technology Administration (RITA), Bureau of Transportation Statistics (BTS)
Data Understanding
Potentially Useful Variables (grouped on the slide under Time, Operation, Geography, and Airline):
● Quarter, Month, Day of Month
● Flight Number
● Origin Airport, Destination Airport
● Departure Block, Arrival Block
● Carrier
● Departure Delay, Arrival Delay
Data Preparation
Training:
● Selected Attributes from 2012 Data
● Derived Attributes from 2011 Data
● Attributes from Additional Dataset
Testing:
● Selected Attributes from 2013 Data
● Derived Attributes from 2012 Data
● Attributes from Additional Dataset
Data Preparation
Selected Attributes:
1. Quarter
2. Month
3. Day of Month
4. FL_NUM: Flight Number
5. Origin: Origin Airport
6. Dest: Destination Airport
7. UniqueCarrier: Unique Carrier Code
8. DepTimeBLK: Departure Time Block, Hourly Intervals
9. ArrTimeBLK: Arrival Time Block, Hourly Intervals
Target: ArrDel: Arrival Delay, 1=Y, 0=N
Note: Removed for the project. To build the full model, these attributes are necessary.
Data Preparation
Derived Attributes:
1. Airline_Delay: the percentage of delayed flights for each airline in one year
2. Flight_Delay: the percentage of delayed flights for each specific flight in one year
3. Day_Delay: the percentage of delayed flights for each day of month, across all flights in one year
4. Origin_Delay: the percentage of delayed flights for each origin airport, across all flights in one year
5. Dest_Delay: the percentage of delayed flights for each destination airport, across all flights in one year
6. Dep_BLK_Delay: the percentage of delayed flights for each departure block, across all flights in one year
7. Arr_BLK_Delay: the percentage of delayed flights for each arrival block, across all flights in one year
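A minimal sketch of how derived attributes like these could be computed from the previous year's records and attached to a current-year flight. The helper name `delay_rate_by` and the sample rows are made up for illustration; the authors' actual procedure is not shown on the slides.

```python
from collections import defaultdict


def delay_rate_by(records, key):
    """Share of delayed flights (ArrDel == 1) for each value of `key`."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        delayed[row[key]] += row["ArrDel"]
    return {value: delayed[value] / totals[value] for value in totals}


# Made-up prior-year rows, for illustration only.
prior_year = [
    {"UniqueCarrier": "AA", "Origin": "ORD", "ArrDel": 1},
    {"UniqueCarrier": "AA", "Origin": "ORD", "ArrDel": 0},
    {"UniqueCarrier": "AA", "Origin": "ATL", "ArrDel": 0},
    {"UniqueCarrier": "DL", "Origin": "ATL", "ArrDel": 1},
]

airline_delay = delay_rate_by(prior_year, "UniqueCarrier")
origin_delay = delay_rate_by(prior_year, "Origin")

# Attach the prior-year rates to a current-year flight as new attributes.
flight = {"UniqueCarrier": "AA", "Origin": "ATL"}
flight["Airline_Delay"] = airline_delay.get(flight["UniqueCarrier"])
flight["Origin_Delay"] = origin_delay.get(flight["Origin"])
```

The function returns fractions between 0 and 1; multiply by 100 for percentages. Taking the rates from the prior year keeps the target year's outcomes out of the features.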
Data Preparation
Additional Dataset: Schedule Employees
From the Research and Innovative Technology Administration (RITA), Bureau of Transportation Statistics (BTS)
Data Preparation
Additional Attributes:
1. Full Time Employees in current month
2. Part Time Employees in current month
3. FTE Employees: Full Time Equivalent Employees in current month (2 part time = 1 full time)
4. Total Employees in current month
We wanted to see whether historical on-time performance and current staffing levels were enough to build a decent model.
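The FTE rule stated above (two part-time employees count as one full-time employee) can be written as a one-line formula. The staffing numbers below are hypothetical.

```python
def fte_employees(full_time, part_time):
    """Full-time equivalents, counting two part-time employees as one full-time."""
    return full_time + part_time / 2


# Hypothetical staffing numbers, for illustration only.
example_fte = fte_employees(full_time=1000, part_time=300)
print(example_fte)  # 1150.0
```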
Data Preparation
Large dataset (2.9GB)
Merge these attributes by month (via Excel VLOOKUP)
Use one month of data, January, to build the model.
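For readers who want to reproduce the merge outside Excel, the lookup step can be sketched as a dictionary join. Keying the staffing table by carrier as well as month is an assumption (the slide only says "by month"), and all rows below are invented.

```python
# Hypothetical monthly staffing table keyed by (carrier, month).
# Keying by carrier is an assumption; the slides only say "merge by month".
staffing = {
    ("AA", 1): {"FullTime": 1000, "PartTime": 300},
    ("DL", 1): {"FullTime": 800, "PartTime": 100},
}

# Made-up flight rows, for illustration only.
flights = [
    {"UniqueCarrier": "AA", "Month": 1, "FL_NUM": 10},
    {"UniqueCarrier": "DL", "Month": 1, "FL_NUM": 20},
    {"UniqueCarrier": "UA", "Month": 1, "FL_NUM": 30},  # no staffing row
]


def merge_staffing(flights, staffing):
    """Left-join staffing onto flights, like an exact-match VLOOKUP."""
    merged = []
    for row in flights:
        extra = staffing.get((row["UniqueCarrier"], row["Month"]), {})
        merged.append({**row, **extra})
    return merged


# Keep only January, mirroring the one-month subset used for modeling.
january = [r for r in merge_staffing(flights, staffing) if r["Month"] == 1]
```

Flights with no matching staffing row keep their original columns and gain no staffing columns, which makes missing matches easy to spot before modeling.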
Modeling
• Naive Bayes
• Decision tree: J48 (with various leaf sizes)
• Logistic Regression “refused” to process in Weka
Modeling
Preprocess
• Convert the types of attributes
• Convert CSV file to ARFF (70MB)
Training:
• Instances: 422539
• Attributes: 19
Testing:
• Instances: 478145
• Attributes: 19
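The preprocessing step above (type conversion plus CSV to ARFF) can be sketched in a few lines. This is not the authors' tooling: the function name, the sample data, and the choice of which columns are nominal are all assumptions. Weka's J48 requires a nominal class attribute, which is presumably why the attribute types had to be converted.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal):
    """Render CSV text as ARFF; columns named in `nominal` become nominal attributes."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    lines = [f"@relation {relation}", ""]
    for col in columns:
        if col in nominal:
            # Nominal attributes list their allowed values in braces.
            values = sorted({row[col] for row in rows})
            lines.append(f"@attribute {col} {{{','.join(values)}}}")
        else:
            lines.append(f"@attribute {col} numeric")
    lines += ["", "@data"]
    lines += [",".join(row[col] for col in columns) for row in rows]
    return "\n".join(lines)


# Made-up sample rows, for illustration only.
sample = "Month,Origin,Airline_Delay,ArrDel\n1,ORD,0.21,1\n1,ATL,0.18,0\n"
arff = csv_to_arff(sample, "flights", nominal={"Origin", "ArrDel"})
print(arff)
```

The sketch does not quote values containing spaces or commas, so it only suits simple codes such as airport and carrier identifiers.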
NaiveBayes Modeling
On Training Data
Confusion Matrix of Naïve Bayes:
a b <-- classified as
333876 28289 | a = 0 (on-time)
45761 14613 | b = 1 (delay)
Model Accuracy ROC Area
Naïve Bayes 82.475% 0.694
Slide callout: high cost, lower is better
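The reported accuracy can be recomputed from the confusion matrix above. The sketch treats "delay" as the positive class, so the 45,761 delayed flights classified as on-time are the false negatives; the function name and the recall metric are additions for illustration.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Summary metrics for a 2x2 confusion matrix with 'delay' as the positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "false_negatives": fn,
        "delay_recall": tp / (tp + fn),
    }


# Rows are actual classes, columns are predicted classes (Weka's layout).
nb = confusion_metrics(tn=333876, fp=28289, fn=45761, tp=14613)
print(f"accuracy = {nb['accuracy']:.3%}")
print(f"delay recall = {nb['delay_recall']:.1%}")
```

The accuracy matches the 82.475% on the slide, while only about a quarter of the actually delayed flights are caught, which is why accuracy alone is a weak guide on this imbalanced data.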
Modeling - snapshot
J48 with different parameters:
MinObjNum Accuracy ROC Area
15 88.4917% 0.85
25 87.7308% 0.791
50 87.3311% 0.774
100 87.0414% 0.767
150 86.8746% 0.761
Modeling - snapshot
Confusion Matrix of J48, 15:
a b <-- classified as
356407 5758 | a = 0
42869 17505 | b = 1
Confusion Matrix of J48, 25:
a b <-- classified as
356570 5595 | a = 0
46247 14127 | b = 1
Confusion Matrix of J48, 50:
a b <-- classified as
356885 5280 | a = 0
48251 12123 | b = 1
Confusion Matrix of J48, 100:
a b <-- classified as
357881 4284 | a = 0
50471 9903 | b = 1
Training Model Performance - J48
[Chart: ROC Area (y-axis, 0.70 to 0.86) versus MinObjNum (x-axis: 15, 25, 50, 100, 150)]
The trend levels off around 0.76.
Model Evaluation
o The evaluation is mainly based on the falsely classified on-time instances: the case where passengers are told with confidence that they will arrive on time but end up being late.
o We choose the training model with the largest AUC and the smallest False Negative value.
MinObjNum Accuracy ROC Area FN Value Result
15 88.4917% 0.85 42869 Reject
25 87.7308% 0.791 46247 Reject
50 87.3311% 0.774 48251 Reject
100 87.0414% 0.767 50471 Reject
150 86.8746% 0.761 51276 Reject
Naive Bayes 82.475% 0.694 45761 Accept
Model Evaluation
Model Performance on Testing Data (Jan 2013)
Model ROC Area FN Value
J48 (MinObjNum=100) 0.512 82442
Naive Bayes 0.583 74058
Deployment
Example: Avoiding the Most Delay-Prone Parts of the System
Schedule your flight without a layover
Avoid the major hubs by using smaller airports
Chicago ORD, New York City (all airports), and Atlanta were the worst in terms of congestion
Early morning departures have better on-time performance
Late afternoon and early evening have the worst on-time performance
"When I can, I try to arrive the night before," says Russell Hayward, a USA TODAY Road Warrior. "But that eats up a whole work day, wasted travel time due to airline uncertainty." (Woodyard, 2001)

Airline flights delay prediction- 2014 Spring Data Mining Project

  • 1. Flight Delay Prediction Model Vishwanath K, Viral Tarpara, Haozhe Wang, Ling Zhou
  • 2. Business Problem Overview Flight delay is a challenging problem for all airline companies, which will lead to ● Financial losses. ● Negative impact on their business reputation. $32.9B $8.3B $16.7B $3.9B $4B Cost of Delays in the US Cost to Airlines Cost to Passengers Cost from Lost Demand GDP Impact Source: Total Cost Impact Study
  • 3. Business Problem Overview Model Predict Flight Delay Optimize operation Reduce further loss Airline Companies Help
  • 4. Literature Review on Delay Costs Airline industry incurs an average cost of about $11,300 per delayed flight. based on 61,000 delayed flights per month average Excludes costs to passengers and lost demand A more accurate delay prediction system can help to identify operational variables that contribute to delays. While some conditions, such as weather, are not controllable factors, the way airlines and airports operate and optimize resources in the face of "acts of god" is controllable.
  • 5. Data Understanding Dataset: On-Time Performance From Research and Innovative Technology Administration,BTS
  • 6. Data Understanding Potentially Useful Variables: Quarter, Month; Day of Month Flight Number Origin Airport; Destination Airport Departure Block; Arrival Block Carrier Departure Delay; Arrival Delay Time Operation Geography Airline
  • 7. Data Preparation Training: Selected Attributes from 2012 Data; Derived Attributes from 2011 Data; Attributes from Additional Dataset. Testing: Selected Attributes from 2013 Data; Derived Attributes from 2012 Data; Attributes from Additional Dataset.
  • 8. Data Preparation Selected Attributes: 1. Quarter 2. Month 3. Day of Month 4. FL_NUM: Flight Number 5. Origin: Origin Airport 6. Dest: Destination Airport 7. UniqueCarrier: Unique Carrier Code 8. DepTimeBLK: Departure Time Block, Hourly Intervals 9. ArrTimeBLK: Arrival Time Block, Hourly Intervals Target: ArrDel: Arrival Delay, 1=Y, 0=N Removed for the project; to build the full model, these attributes are necessary.
  • 9. Data Preparation Derived Attributes: 1. Airline_Delay: the percentage of delay by each airline in one year 2. Flight_Delay: the percentage of delay by each specific flight in one year 3. Day_Delay: the percentage of delay by day of month for all flights in one year 4. Origin_Delay: the percentage of delay by each origin airport for all flights in one year 5. Dest_Delay: the percentage of delay by each destination airport for all flights in one year 6. Dep_BLK_Delay: the percentage of delay by each departure block for all flights in one year 7. Arr_BLK_Delay: the percentage of delay by each arrival block for all flights in one year
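Each derived attribute above is a per-group delay rate computed from the prior year's flights. A minimal sketch of that calculation follows, assuming records are dictionaries keyed by the selected attribute names and that ArrDel is coded 1 for delayed and 0 for on time. The function name `delay_rate_by` and the sample records are hypothetical and only illustrate the idea; the slides do not say how the authors computed these values.

```python
from collections import defaultdict


def delay_rate_by(records, key):
    """Return {group value: share of delayed flights} for one grouping column."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        delayed[row[key]] += row["ArrDel"]  # ArrDel: 1 = delayed, 0 = on time
    return {group: delayed[group] / totals[group] for group in totals}


# Hypothetical prior-year records, for illustration only.
flights_prior_year = [
    {"UniqueCarrier": "AA", "Origin": "ORD", "ArrDel": 1},
    {"UniqueCarrier": "AA", "Origin": "ORD", "ArrDel": 0},
    {"UniqueCarrier": "DL", "Origin": "ATL", "ArrDel": 0},
    {"UniqueCarrier": "DL", "Origin": "ORD", "ArrDel": 1},
]

airline_delay = delay_rate_by(flights_prior_year, "UniqueCarrier")  # Airline_Delay
origin_delay = delay_rate_by(flights_prior_year, "Origin")          # Origin_Delay
```

The same function would cover the other derived attributes by changing the grouping key (flight number, day of month, destination, departure block, arrival block).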
  • 10. Data Preparation Additional Dataset : Schedule Employees From Research and Innovative Technology Administration, BTS
  • 11. Data Preparation Additional Attributes: 1. Full Time Employees in current month 2. Part Time Employees in current month 3. FTE Employees: Full Time Equivalent Employees in current month (2 part time = 1 full time) 4. Total Employees in current month We wanted to see if historical on-time performance and current staffing levels were enough to build a decent model.
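The FTE rule stated above (2 part time = 1 full time) can be written as a one-line formula. The sketch below assumes the rule is applied literally; the function name `fte_employees` and the head counts are hypothetical.

```python
def fte_employees(full_time, part_time):
    """Full-time equivalents: two part-time employees count as one full-time."""
    return full_time + part_time / 2


# Hypothetical carrier-month: 1000 full-time and 200 part-time employees.
example_fte = fte_employees(1000, 200)
```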
  • 12. Data Preparation Large size of dataset (2.9GB) Merge these attributes by month (via Excel VLOOKUP) Use data of one month, January, to build the model.
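The month-based merge was done with Excel VLOOKUP; an equivalent lookup can be sketched in code as a dictionary join. This sketch assumes the staffing data has one row per carrier per month, so the join key is (UniqueCarrier, Month); the slides only say "by month", so the carrier part of the key is an assumption. The function name `merge_by_month`, the column names FullTime and PartTime, and the sample rows are hypothetical.

```python
def merge_by_month(flights, staffing):
    """Attach staffing columns to each flight using (carrier, month) as the key."""
    lookup = {(s["UniqueCarrier"], s["Month"]): s for s in staffing}
    merged = []
    for flight in flights:
        match = lookup.get((flight["UniqueCarrier"], flight["Month"]))
        if match is None:
            continue  # no staffing row for this key; a VLOOKUP would show #N/A
        extra = {k: v for k, v in match.items()
                 if k not in ("UniqueCarrier", "Month")}
        merged.append({**flight, **extra})
    return merged


# Hypothetical rows, for illustration only.
flights = [
    {"UniqueCarrier": "AA", "Month": 1, "FL_NUM": 10},
    {"UniqueCarrier": "DL", "Month": 1, "FL_NUM": 20},
    {"UniqueCarrier": "ZZ", "Month": 1, "FL_NUM": 30},
]
staffing = [
    {"UniqueCarrier": "AA", "Month": 1, "FullTime": 1000, "PartTime": 200},
    {"UniqueCarrier": "DL", "Month": 1, "FullTime": 800, "PartTime": 100},
]

merged = merge_by_month(flights, staffing)
```

Dropping unmatched flights is a choice made for this sketch; keeping them with missing staffing values would be an equally reasonable alternative.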
  • 13. Modeling • Naive Bayes • Decision tree - J48 (with various leaf sizes) • Logistic Regression “refused” to process in Weka
  • 14. Modeling Preprocess • Convert the type of attributes • Convert CSV file to ARFF (70MB) Training: • Instances: 422539 • Attributes: 19 Testing: • Instances: 478145 • Attributes: 19
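The two preprocessing steps above (setting attribute types, then converting CSV to ARFF) can be illustrated with a small converter. This is a sketch of the ARFF layout only, not Weka's own converter: it assumes values contain no spaces, commas, or quotes, and that every column not listed as nominal is numeric. The function name `csv_to_arff`, the relation name, and the sample rows are hypothetical.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal_columns):
    """Convert CSV text with a header row into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        if name in nominal_columns:
            # Nominal attributes list every distinct value seen in the column.
            values = sorted({row[index] for row in data})
            lines.append("@attribute " + name + " {" + ",".join(values) + "}")
        else:
            lines.append("@attribute " + name + " numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical three-column sample, for illustration only.
sample_csv = "Month,UniqueCarrier,ArrDel\n1,AA,1\n1,DL,0\n"
arff_text = csv_to_arff(sample_csv, "flights_jan", {"UniqueCarrier", "ArrDel"})
```

Declaring ArrDel as nominal rather than numeric matters here because the classifiers named in the slides predict a class label.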
  • 15. NaiveBayes Modeling On Training Data Confusion Matrix of Naïve Bayes:
         a      b   <-- classified as
    333876  28289 | a = 0 (on-time)
     45761  14613 | b = 1 (delay)
    Model        Accuracy  ROC Area
    Naïve Bayes  82.475%   0.694
    High cost, lower is better
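The accuracy reported above follows directly from the four confusion-matrix cells. The sketch below recomputes it from the Naïve Bayes counts on the slide, treating delay (b = 1) as the positive class, so the 45761 delayed flights classified as on-time are the false negatives. The function name `summarize` and the extra recall figure are additions for illustration and do not appear in the slides.

```python
def summarize(tn, fp, fn, tp):
    """Return accuracy, false-negative count, and recall on the delay class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "false_negatives": fn,
        "delay_recall": tp / (tp + fn),
    }


# Counts from the Naive Bayes confusion matrix on the training data.
naive_bayes = summarize(tn=333876, fp=28289, fn=45761, tp=14613)
print(f"accuracy = {naive_bayes['accuracy']:.5f}")
print(f"false negatives = {naive_bayes['false_negatives']}")
print(f"delay recall = {naive_bayes['delay_recall']:.3f}")
```

The accuracy works out to about 82.475%, matching the slide, while only about a quarter of the delayed flights are caught, which is why the false-negative count gets separate attention.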
  • 16. Modeling - snapshot J48 with different parameter:
    MinObjNum  Accuracy  ROC Area
    15         88.4917%  0.85
    25         87.7308%  0.791
    50         87.3311%  0.774
    100        87.0414%  0.767
    150        86.8746%  0.761
  • 17. Modeling - snapshot
    Confusion Matrix of J48, 25:
         a      b   <-- classified as
    356570   5595 | a = 0
     46247  14127 | b = 1
    Confusion Matrix of J48, 15:
         a      b   <-- classified as
    356407   5758 | a = 0
     42869  17505 | b = 1
    Confusion Matrix of J48, 100:
         a      b   <-- classified as
    357881   4284 | a = 0
     50471   9903 | b = 1
    Confusion Matrix of J48, 50:
         a      b   <-- classified as
    356885   5280 | a = 0
     48251  12123 | b = 1
  • 18. Training model Performance-J48 [Chart: ROC Area versus MinObjNum (15, 25, 50, 100, 150)] The trend levels off around 0.76
  • 19. Model Evaluation
    o The evaluation is mainly based on the falsely classified on-time instances: this is the case where passengers are led to expect an on-time arrival but end up being late.
    o We choose the training model with the largest AUC and the smallest False Negative value.
    MinObjNum   Accuracy  ROC Area  FN Value  Results
    15          88.4917%  0.85      42869     Reject
    25          87.7308%  0.791     46247     Reject
    50          87.3311%  0.774     48251     Reject
    100         87.0414%  0.767     50471     Reject
    150         86.8746%  0.761     51276     Reject
    NaiveBayes  82.475%   0.694     45761     Accept
  • 20. Model Evaluation Model Performance on Testing Data (Jan 2013)
    Model              ROC Area  FN Value
    J48_minObjNum=100  0.512     82442
    Naive Bayes        0.583     74058
  • 21. Deployment Example: Avoiding the Most Delay-Prone Parts of the System. Schedule your flight without a layover. Avoid the major hubs by using smaller airports. Chicago ORD, New York City (all), and Atlanta were the worst in terms of congestion. Early morning departures have better on-time performance. Late afternoon and early evening have the worst on-time performance.
  • 22. "When I can, I try to arrive the night before," says Russell Hayward, a USA TODAY Road Warrior. "But that eats up a whole work day, wasted travel time due to airline uncertainty." (Woodyard, 2001)