Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Webinar | Using Big Data and Predic... by NICSA 767 views
- Best practices for building and dep... by Kun Le 5332 views
- Airline flights delay prediction- 2... by Haozhe Wang 6983 views
- Flight Delay Prediction by Vyshak Srishylappa 1819 views
- Big Data For Flight Delay Report by JSPM's JSCOE , Pu... 5482 views
- BIG DATA TO AVOID WEATHER RELATED F... by JSPM's JSCOE , Pu... 9502 views

No Downloads

Total views

1,434

On SlideShare

0

From Embeds

0

Number of Embeds

17

Shares

0

Downloads

90

Comments

7

Likes

1

No notes for slide

- Also Extensive K-NN algorithm; known as ENN method which specifies two-way communication of classification; it considers not only who are nearest to test sample, but also who consider the test sample as their nearest neighbors.

Tree methods are nonparametric and nonlinear. The final results of using tree methods for classification or regression can be summarized in a series of (usually few) logical if-then conditions (tree nodes).

Statistical Classifier:

It chose the attribute with highest gain to split the sample sets in subsets of one or more classes. Information gain is used for splitting criteria.

Rweka library is used

SFO: three possible values:

0 means no flight

1 means flights

- 1. Prepared By: Ishani Desai Karan Shah Ketu Shah Shubham Gupta Flight Delay Prediction Model
- 2. Shubham Gupta Karan ShahKetu Shah Ishani Desai
- 3. • Flight Delay has emerged as a prime factor for economic loss for airlines. • Flight Delay has negative impact on business reputation and demand of airlines as well. Business Problem Overview
- 4. • Develop a business model to predict flight delays. • Optimize flight operations. • Reduce further economic loss for airlines. • Lessen inconvenience occurred to passengers. Goal and Objective
- 5. • As of 2007, airline industries incurred average cost of around $11,300 per delayed flight based upon 61,000 delayed flights per month average. • According to latest estimates, the cost of aircraft block time for U.S. passenger airlines was $81.18 per minute. Literature Review
- 6. • The data is taken from United States Bureau of Transportation Statistics. • The dataset represents 4 years of flight delay information for the state of Washington. • Dataset contains over 2500 records and 48 attributes for February month. • Attributes: • Origin Airport • Destination Airport • Flight Number • Airline • Date of Flight • Delay Information Data Source
- 7. Original Dataset
- 8. Selected Attributes
- 9. Derived attributes for Model
- 10. Data Preparation Selected attributes from years 2013, 2014, and 2015 Derived attributes from years 2013, 2014, and 2015 Selected attributes from year 2016 Derived attributes from year 2016 Training Data Testing Data
- 11. • K-Nearest Neighbors (K-NN) • Weighted K-Nearest Neighbors (KK-NN) • Decision Tree: CART • Decision Tree: C4.5 Algorithms Used
- 12. • In Pattern recognition and statistical estimation, the k-nearest neighbors algorithm is used for classification and regression. • For classification, K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a distance matrix. Euclidean Distance Function K-Nearest Neighbors attributestherepresent ,...,,and,,...,,where )(),( 2121 2 Euclidean m yyyxxx yxd mm i ii yx yx
- 13. 86.05 82.68 81.44 81.34 81.36 81.18 78 79 80 81 82 83 84 85 86 87 K=2 K=5 K=10 K=15 K=20 K=25 K-NN Series1 K-NN Graph K Value Result K=2 86.05 K=5 82.68 K=10 81.44 K=15 81.34 K=20 81.36 K=25 81.18
- 14. • This extension is based on the idea, that such observations within the learning set, which are particularly close to the new observation (y, x), should get a higher weight in the decision than such neighbors that are far away from (y, x). • This is not the case with kNN: Indeed only the k nearest neighbors influence the prediction; however, this influence is the same for each of these neighbors, although the individual similarity to (y, x) might be widely different. • To reach this aim, the distances, on which the search for the nearest neighbors is based in the first step, have to be transformed into similarity measures, which can be used as weights Weighted K-Nearest Neighbors
- 15. KK-NN Graph 85.35 84.82 84.87 84.52 83.11 83.07 82.59 81 81.5 82 82.5 83 83.5 84 84.5 85 85.5 86 K=1 K=2 K=5 K=10 K=15 K=20 K=25 KK-NN Series1 K Value Result K=1 85.35 K=2 84.52 K=5 84.87 K=10 84.52 K=15 83.11 K=20 83.07 K=25 82.59
- 16. • The purpose of the analysis via tree-building algorithm is to determine a set of if-then logical split conditions that permit accurate predictions or classification of cases. • A Classification tree will determine a set of logical if-then conditions instead of linear equations for predicting or classifying cases. • The general approach to derive predictions from if-then conditions can also be applied to regression tree as well. • Advantages of CART: • Simplicity • Nonparametric and Nonlinear Classification and Regression Tree
- 17. CART Implementation Predictions 0 1 0 1742 397 1 100 41 Accuracy: 78.20%
- 18. • C4.5 algorithm is used to generate decision tree that can be used for classification and so referred as Statistical Classifier. • C4.5 permits numeric attributes and deals sensibly with missing values. • C4.5 uses attributes with continuous data and different weights. • C4.5 follows Post-pruning approach to deal with noisy data and remove a sub-tree from fully developed decision tree. C4.5 Algorithm
- 19. C4.5 Decision Tree Predictions 0 1 0 1815 431 1 27 7 Accuracy: 79.91%
- 20. Analysis and Statistics of Flight Delays
- 21. Total Delayed for 2016 1.89% 67.23% 5.37% 7.45% 1.07% 2.65% 8.21% 2.46% 3.66% Total Delayed for 2015 Flight Delay [ Airlines ] 1.58% 65.26% 7.97% 6.92% 0.60% 2.41% 9.55% 2.03% 3.68% AA AS B6 DL F9 HA OO UA VX
- 22. Popular Route Delays Information 0 10 20 30 40 50 60 70 80 90 100 NO. OF DELAYS IN POPULAR ROUTES NO. OF DELAY 2015 NO. OF DELAY 2016 0% 1% 2% 3% 4% 5% 6% 7% SEA-LAX SEA-SFO SEA-JFK SEA-ORD SEA-DFW SEA-Boston SEA-ATL % SHARE OF POPULAR ROUTES IN TOTAL DELAY % SHARE IN TOTAL DELAY 2015 % SHARE IN TOTAL DELAY 2016
- 23. 0 20 40 60 80 100 120 1-Feb 2-Feb 3-Feb 4-Feb 5-Feb 6-Feb 7-Feb 8-Feb 9-Feb 10-Feb 11-Feb 12-Feb 13-Feb 14-Feb 15-Feb 16-Feb 17-Feb 18-Feb 19-Feb 20-Feb 21-Feb 22-Feb 23-Feb 24-Feb 25-Feb 26-Feb 27-Feb 28-Feb TOTAL NO. OF DELAYS FOR EACH DAY NO. OF DELAY 2015 NO. OF DELAY 2016
- 24. Outcome: • After studying different models on the dataset, it is observed that KNN provides us the best results with the accuracy of about 86%. Future Scope: • The model accuracy can be increased by taking into the account variables like weather conditions and airline employees efficiency. Application: • Airlines can determine efficient routes with minimum delay possibility. • Opt for secondary airports for particular routes between cities. E.g. SEA- LGA instead of SEA-JFK since SEA-JFK flights are more likely to be delayed. • This model can help passengers to plan layover at particular airport. Outcome-Application-Future Scope
- 25. Thank You!

No public clipboards found for this slide

Login to see the comments