Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Shipment Address Classification in Logistics in
the absence of Geolocation Information
Dr. T. Ravindra Babu,
Data Scientist,
Flipkart
August 1, 2015
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G
Presentation Plan
Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Motivation and Problem Definition
Motivation
Problem Definition
Typical Operations Scenario at Delivery Hub without a model
Inscan of shipments received from Mother Hub
Manual reading of the address; assignment to a Route/FE
Sorting and Delivery
Overview of Proposed Solution
Capturing FEs’ domain knowledge and modelling around it
Classifying an address as belonging to a predefined subarea
Allocating shipments to a Route/FE using a machine-learning-based classifier
Sorting and Delivery
Delivery Hub and Subareas
Insights into Address Data
The number of words in an address ranges from 4 to 75, apart from a few
outliers with more than 100 words.
A word like "Apartments" is spelt in 263 different ways; "Whitefield" in
24 ways, "industrial" in 25, "Bangalore" in 161 and "Karnataka" in
70 ways.
Addresses lack structure even in a city like Bangalore; a few examples
were shown.
Some words are specific to certain places/states, for example:
halli, hobli; bawdi, kuan; society; layout.
Addressing systems across the world: the US, Europe, Korea and
Japan; countries such as Brazil and India
Proposed Model
Preprocessing
An elaborate preprocessing model was needed, one that accounts
for the following:
Retaining only those terms that may help classification
(discriminability)
Merging terms using empirical statistical models as well as
domain-knowledge-based rules, n-grams, abbreviation, etc.
Developing data-dependent dictionaries based on pattern
clustering (machine learning) and forming sets of equivalent terms
Preprocessing reduces the vocabulary size by 65%, as
measured on a large dataset
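The slides do not say how discriminability is measured. A minimal sketch, assuming a term is scored by the entropy of its distribution over subarea labels: a term seen only in one subarea has entropy 0 and is kept, while a term spread evenly across subareas (such as the city name) is dropped. The function names and the `max_entropy` and `min_count` thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_entropy(addresses, subareas):
    """Entropy (bits) of each term's distribution over subarea labels."""
    per_term = defaultdict(Counter)
    for address, label in zip(addresses, subareas):
        # Count each term at most once per address (document frequency).
        for term in set(address.lower().split()):
            per_term[term][label] += 1
    entropy = {}
    for term, label_counts in per_term.items():
        total = sum(label_counts.values())
        entropy[term] = -sum((c / total) * math.log2(c / total)
                             for c in label_counts.values())
    return entropy, per_term


def discriminative_terms(addresses, subareas, max_entropy=0.5, min_count=2):
    """Keep terms that are frequent enough and concentrated in few subareas."""
    entropy, per_term = term_entropy(addresses, subareas)
    return {term for term, h in entropy.items()
            if h <= max_entropy and sum(per_term[term].values()) >= min_count}
```

With four addresses from two subareas, "bangalore" appears equally in both (entropy 1 bit) and is discarded, while locality words survive. Entropy is only one possible criterion; mutual information or chi-square scores would serve the same purpose.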
Preprocessing for Data Compaction
Figure: Impact of Preprocessing
Models in Preprocessing :: Fraud Address Classification -
Address Strings
Sl.No. Address
1 adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
2 gasdfashagadfasmejastic
3 fdgdf
4 hjsdhaddsdsasdsa
5 dsfadafadsasdfsdafsda
6 hjsdhaddsdsasdsa
7 asd
8 lmflvml
9 assasfsafasfsasfsfsafashaphilomena
10 faskjbdasdlkjbsaasd
Models in Preprocessing :: Fraud Address Classification -
Address Strings - Heatmap
Figure: MonkeyType Addresses
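Most strings in the fraud-address table above are built from keys on the QWERTY home row (a, s, d, f, g, h, j, k, l). The talk does not describe its fraud model, so the sketch below is only an assumed heuristic, not the author's method: it scores an address by the fraction of its letters that come from the home row and flags it above a hypothetical `threshold`.

```python
HOME_ROW = set("asdfghjkl")  # QWERTY home-row letters


def home_row_ratio(address):
    """Fraction of alphabetic characters that are home-row keys (digits and spaces ignored)."""
    letters = [c for c in address.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in HOME_ROW for c in letters) / len(letters)


def looks_monkey_typed(address, threshold=0.75):
    """Hypothetical rule: flag an address whose letters are mostly home-row keys."""
    return home_row_ratio(address) >= threshold
```

A genuine address such as "flat 12 koramangala 5th block bangalore" scores about 0.55, whereas "hjsdhaddsdsasdsa" scores 1.0. A single feature is not enough on its own: "lmflvml" from the table scores only about 0.57 and would be missed, so in practice such a ratio would be one input among several to a classifier.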
Models in Preprocessing :: Fraud Address Classification -
Items Bought
Figure: Items bought by such people
Models in Preprocessing :: Probabilistic Separation of
Compound Words
To a large extent, addresses are not amenable to English
dictionaries
Spaces between words are often missing, either because the customer
inadvertently omitted them or because they were removed during
storage/retrieval
Separating such compound words
Compute empirical probabilities of words
Assuming independence of the constituent words, if the empirical
probability of a compound word is less than the product of the
probabilities of its constituent words, separate the words
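The splitting rule above can be sketched as follows, assuming probabilities are plain relative frequencies over a token corpus and that only a single split point is considered. The function names and the `min_len` guard against very short fragments are hypothetical.

```python
from collections import Counter


def build_probabilities(tokens):
    """Empirical probability of each token: count / total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def split_compound(word, prob, min_len=3):
    """Split `word` in two if P(left) * P(right) exceeds P(word).

    Among all valid split points, the one with the largest product wins.
    Unseen compounds have P(word) = 0, so any split into known words applies.
    """
    best, best_score = None, prob.get(word, 0.0)
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in prob and right in prob:
            score = prob[left] * prob[right]
            if score > best_score:
                best, best_score = (left, right), score
    return [word] if best is None else list(best)
```

On a corpus where "koramangala" and "bangalore" are frequent and "koramangalabangalore" is rare, the compound is split; "whitefield" is kept whole because it is far more frequent than "white" times "field". Long run-ons with several missing spaces would need a recursive or dynamic-programming extension of the same rule.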
Models in Preprocessing :: Frequent Pattern Tree for
n-gram Generation
The frequent-pattern (FP) tree is a well-established approach to mining large
datasets
We implement a modified version of the tree to generate
n-grams
Conventional method
New approach
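The talk does not describe how the tree was modified. As a hypothetical stand-in, the sketch below stores every contiguous word sequence of up to `max_n` words in a prefix tree with counts, then reads out the sequences that meet a minimum support. Unlike a conventional FP-tree, which reorders items by frequency, it preserves word order, as n-grams require. Since a node is only incremented together with its parent, a child's count never exceeds its parent's, so subtrees below the support threshold can be pruned safely.

```python
class TrieNode:
    """One node of the n-gram prefix tree: a count plus children keyed by word."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}


def build_ngram_tree(addresses, max_n=3):
    """Insert every contiguous word sequence of length <= max_n into a prefix tree."""
    root = TrieNode()
    for address in addresses:
        words = address.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_n]:
                node = node.children.setdefault(word, TrieNode())
                node.count += 1  # each occurrence of this prefix is counted once
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Return {n-gram: count} for sequences with count >= min_support and length >= min_n."""
    result = {}
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        for word, child in node.children.items():
            if child.count < min_support:
                continue  # descendants can only have smaller counts: prune
            gram = prefix + (word,)
            if len(gram) >= min_n:
                result[" ".join(gram)] = child.count
            stack.append((child, gram))
    return result
```

For the addresses "12 mg road bangalore", "45 mg road bangalore" and "mg road", this yields "mg road" (3), "mg road bangalore" (2) and "road bangalore" (2), candidates that could then be merged into single terms.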
Models in Preprocessing :: Clustering for Equivalent Sets of Words with
Spelling Variations - Ex. koramangala, electronics
koramanagala koromangala kormanagala koramnagala
koramangalato kanamangala koramanagla koremangala
koaramangala koramamgala karamangala tkoramangala
kormangalla koramongala koarmangala korammangala
koramangalla koramangale koramanagal
electronice eclectronic elelctronic eelectronic electronica electroincs
electronics electroninc electrinics electroncis electronincs
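The slides name pattern clustering but not the algorithm or distance. A minimal sketch, assuming leader clustering with a length-normalised Levenshtein distance: words are visited from most to least frequent, so the most common spelling becomes the canonical form of its cluster. The `max_dist` threshold and function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute / match
        prev = cur
    return prev[-1]


def cluster_variants(word_counts, max_dist=0.25):
    """Map each word to a canonical spelling (the first sufficiently close leader)."""
    leaders, canonical = [], {}
    for word in sorted(word_counts, key=lambda w: (-word_counts[w], w)):
        for leader in leaders:
            if levenshtein(word, leader) / max(len(word), len(leader)) <= max_dist:
                canonical[word] = leader
                break
        else:  # no leader close enough: this word starts a new cluster
            leaders.append(word)
            canonical[word] = word
    return canonical
```

With counts favouring "koramangala" and "electronics", the variants "koramanagala" and "koromangala" map to "koramangala" and "electroncis" maps to "electronics". A pure edit-distance threshold can also merge genuinely different place names with similar spellings, which is one reason domain knowledge is needed alongside clustering.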
Models in Preprocessing :: Clustering for ... Spelling Variations
- Ex. Bannerghattaroad (61 variations)
bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad,
bannerughattaroad, bannarghattaroad, banergattaroad,
banneraghattaroad, bannerghettaroad, bannerugattaroad,
bhannerghattaroad, bennerghattaroad, bannerghttaroad,
bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad,
bennarghattaroad, baneerghattaroad, bannergettaroad,
banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad,
benerghattaroad, bannerghattaroadto, bannergataroad,
bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad,
bannnerghattaroad, bannarghettaroad, banerughattaroad,
bannergahttaroad, bhannerughattaroad, bennergattaroad,
bannerghattroad, bannaraghattaroad, bannerhattaroad,
bannerghatharoad, banneerghattaroad, bannaerghattaroad,
baneergattaroad, bhannergattaroad, bhanerghattaroad,
Models in Post-processing :: Semi-Supervised Methods
Discussion
Revisiting The Model
Supervised Classification
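The talk does not name the supervised classifier that maps an address to a subarea. As an illustrative substitute only, here is a multinomial naive Bayes over whitespace tokens with Laplace smoothing; the class name, the `alpha` parameter and the subarea labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSubareaClassifier:
    """Multinomial naive Bayes over address tokens (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, addresses, subareas):
        self.class_counts = Counter(subareas)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for address, label in zip(addresses, subareas):
            tokens = address.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, address):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in address.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * v
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In a full pipeline the input tokens would first pass through the preprocessing steps above (filtering, compound splitting, n-gram merging, spelling normalisation), which shrinks the vocabulary the classifier has to learn.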
Summary
Novelty
The solution is novel and was developed in-house
No similar solution was found in the literature
Thank You
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G

More Related Content

Viewers also liked

Online Student Registration System
Online Student Registration SystemOnline Student Registration System
Online Student Registration SystemSanjana Agarwal
 
Student information system project
Student information system projectStudent information system project
Student information system projectRizwan Ashraf
 
Procedure qualification
Procedure qualificationProcedure qualification
Procedure qualificationvaasuBandaru
 
M02 Uml Overview
M02 Uml OverviewM02 Uml Overview
M02 Uml OverviewDang Tuan
 
Types of Grading and Reporting System
Types of Grading and Reporting System Types of Grading and Reporting System
Types of Grading and Reporting System Cyra Mae Soreda
 
9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)Amani Mrisho
 
Student information-system-project-outline
Student information-system-project-outlineStudent information-system-project-outline
Student information-system-project-outlineAmit Panwar
 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfdUtsav mistry
 
Modeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalModeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalRajani Bhandari
 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case DiagramAshesh R
 

Viewers also liked (12)

Online Student Registration System
Online Student Registration SystemOnline Student Registration System
Online Student Registration System
 
Student information system project
Student information system projectStudent information system project
Student information system project
 
Procedure qualification
Procedure qualificationProcedure qualification
Procedure qualification
 
M02 Uml Overview
M02 Uml OverviewM02 Uml Overview
M02 Uml Overview
 
Types of Grading and Reporting System
Types of Grading and Reporting System Types of Grading and Reporting System
Types of Grading and Reporting System
 
Grading system
Grading systemGrading system
Grading system
 
5 Type Of Architecture Design Process
5 Type Of Architecture Design Process 5 Type Of Architecture Design Process
5 Type Of Architecture Design Process
 
9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)
 
Student information-system-project-outline
Student information-system-project-outlineStudent information-system-project-outline
Student information-system-project-outline
 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfd
 
Modeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalModeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and Functional
 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case Diagram
 

Similar to Shipment Address Classification in Logistics using Machine Learning

Address classification
Address classificationAddress classification
Address classificationNamanChikara1
 
How to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyHow to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyRecruitingDaily.com LLC
 
Big Data in Human Resources
Big Data in Human ResourcesBig Data in Human Resources
Big Data in Human ResourcesMatthias Vallaey
 
Leaderhip dancefloor weminar
Leaderhip dancefloor weminarLeaderhip dancefloor weminar
Leaderhip dancefloor weminarAngel Diaz-Maroto
 
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Talent Solutions
 
Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides SlideTeam
 
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxRubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxjoellemurphey
 
Break Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® ApproachBreak Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® Approachcarlbinder
 
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......ManagementMM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......Managementdr m m bagali, phd in hr
 
Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon Wilder
 
RDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesRDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesConnected Data World
 
Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides SlideTeam
 
Make L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DMake L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DAlexandra Lederer
 
Successful ERP Selection
Successful ERP SelectionSuccessful ERP Selection
Successful ERP SelectionKatie Flanagan
 
Dfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDorothy Beach
 
Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides SlideTeam
 
Planning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesPlanning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesWorkday, Inc.
 

Similar to Shipment Address Classification in Logistics using Machine Learning (20)

Address classification
Address classificationAddress classification
Address classification
 
Vedant Borse
Vedant BorseVedant Borse
Vedant Borse
 
How to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyHow to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI Strategy
 
Big Data in Human Resources
Big Data in Human ResourcesBig Data in Human Resources
Big Data in Human Resources
 
Leaderhip dancefloor weminar
Leaderhip dancefloor weminarLeaderhip dancefloor weminar
Leaderhip dancefloor weminar
 
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
 
4 Steps to Become an HR Analytics Champion
4 Steps to Become an HR Analytics Champion4 Steps to Become an HR Analytics Champion
4 Steps to Become an HR Analytics Champion
 
Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides
 
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxRubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
 
Break Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® ApproachBreak Out of the Training Box with the Six Boxes® Approach
Break Out of the Training Box with the Six Boxes® Approach
 
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......ManagementMM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
 
Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon G Wilder Resume v1
Sharon G Wilder Resume v1
 
RDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesRDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the pieces
 
Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides
 
Make L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DMake L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&D
 
Finding Your Path to Value
Finding Your Path to ValueFinding Your Path to Value
Finding Your Path to Value
 
Successful ERP Selection
Successful ERP SelectionSuccessful ERP Selection
Successful ERP Selection
 
Dfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recap
 
Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides
 
Planning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesPlanning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent Times
 

Recently uploaded

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Recently uploaded (20)

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Shipment Address Classification in Logistics using Machine Learning

  • 1. Motivation, Problem Definition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Shipment Address Classification in Logistics in the absence of Geolocation Information Dr. T. Ravindra Babu, Data Scientist, Flipkart August 1, 2015 Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G
  • 2. Presentation Plan: Motivation, Problem Definition and Solution Overview; Data Challenges, Modeling, Solutions and Deployment; Summary
  • 3. Motivation and Problem Definition: Motivation
  • 4. Motivation and Problem Definition (continued): Problem Definition. Typical operations scenario at a Delivery Hub without a model: in-scan of shipments received from the Mother Hub; manual reading of each address and assignment to a route/FE; sorting and delivery.
  • 5. Motivation and Problem Definition (continued): Overview of Proposed Solution. Capture the FEs' domain knowledge and build a model around it; classify each address as belonging to a predefined subarea; allocate shipments to a route/FE using a machine-learning-based classifier; sorting and delivery.
  • 6. Delivery Hub and Subareas
  • 7. Insights into Address Data: The number of words in an address ranges from 4 to 75, with a few outliers of more than 100 words.
  • 8. Insights into Address Data (continued): A word like "Apartments" is spelled in 263 different ways; "whitefield" in 24, "industrial" in 25, "Bangalore" in 161, "karnataka" in 70, and so on.
  • 9. Insights into Address Data (continued): Addresses lack structure even in a city like Bangalore, as a few examples show.
  • 10. Insights into Address Data (continued): Some words are specific to certain places or states, for example halli, hobli; bawdi, kuan; society; layout.
  • 11. Insights into Address Data (continued): Addressing systems across the world: the US, Europe, Korea and Japan; and countries such as Brazil and India.
  • 12. Proposed Model
  • 13. Preprocessing: An elaborate preprocessing model was necessary, one that accounts for the following: retaining only those terms that may help classification (discriminability); merging terms using empirical statistical models as well as domain-knowledge-based rules, n-grams, abbreviation, etc.; developing data-dependent dictionaries through pattern clustering (machine learning) and forming equivalent sets of words. Preprocessing reduces the vocabulary size by 65%, as measured on a large dataset.
  • 14. Preprocessing for Data Compaction. Figure: Impact of Preprocessing
  • 15. Models in Preprocessing :: Fraud Address Classification - Address Strings
      Sl. No. | Address
      1 | adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
      2 | gasdfashagadfasmejastic
      3 | fdgdf
      4 | hjsdhaddsdsasdsa
      5 | dsfadafadsasdfsdafsda
      6 | hjsdhaddsdsasdsa
      7 | asd
      8 | lmflvml
      9 | assasfsafasfsasfsfsafashaphilomena
      10 | faskjbdasdlkjbsaasd
  • 16. Models in Preprocessing :: Fraud Address Classification - Address Strings Heatmap. Figure: MonkeyType Addresses
  • 17. Models in Preprocessing :: Fraud Address Classification - Items Bought. Figure: Items bought by such people
  • 18. Models in Preprocessing :: Probabilistic Separation of Compound Words: To a large extent, addresses are not amenable to English dictionaries.
  • 19. Models in Preprocessing :: Probabilistic Separation of Compound Words (continued): When writing addresses, customers often inadvertently omit the space between words, or the space is removed during storage or retrieval.
  • 20. Models in Preprocessing :: Probabilistic Separation of Compound Words (continued): Separating such compound words: compute the empirical probabilities of words; then, assuming conditional independence, separate a compound word into its parts if its joint probability is less than the product of the probabilities of the individual words, i.e. split w1w2 when P(w1w2) < P(w1)P(w2).
  • 21. Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation: The frequent pattern tree is a celebrated approach to mining large datasets.
  • 22. Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation (continued): We implement a modified version of the tree to generate n-grams.
  • 23. Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation (continued): Conventional method
  • 24. Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation (continued): New approach
  • 25. Models in Preprocessing :: Clustering for equivalent sets of words with spelling variations, e.g. koramangala and electronics
      koramangala: koramanagala, koromangala, kormanagala, koramnagala, koramangalato, kanamangala, koramanagla, koremangala, koaramangala, koramamgala, karamangala, tkoramangala, kormangalla, koramongala, koarmangala, korammangala, koramangalla, koramangale, koramanagal
      electronics: electronice, eclectronic, elelctronic, eelectronic, electronica, electroincs, electronics, electroninc, electrinics, electroncis, electronincs
  • 26. Models in Preprocessing :: Clustering for ... spelling variations, e.g. Bannerghattaroad (61 variations): bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad, bannerughattaroad, bannarghattaroad, banergattaroad, banneraghattaroad, bannerghettaroad, bannerugattaroad, bhannerghattaroad, bennerghattaroad, bannerghttaroad, bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad, bennarghattaroad, baneerghattaroad, bannergettaroad, banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad, benerghattaroad, bannerghattaroadto, bannergataroad, bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad, bannnerghattaroad, bannarghettaroad, banerughattaroad, bannergahttaroad, bhannerughattaroad, bennergattaroad, bannerghattroad, bannaraghattaroad, bannerhattaroad, bannerghatharoad, banneerghattaroad, bannaerghattaroad, baneergattaroad, bhannergattaroad, bhanerghattaroad, ...
  • 27. Models in Post-processing :: Semi-Supervised Methods. Discussion
  • 28. Revisiting the Model: Supervised Classification
  • 29. Summary: Novelty. The solution is novel and was developed in-house; no similar solution was found in the literature.
  • 30. Thank You
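Slide 13 says preprocessing retains only terms that may help classification (discriminability) but does not say how discriminability is measured. One possible reading, assumed here and not taken from the slides, scores each term as one minus the normalized entropy of its distribution over subareas and keeps terms above a threshold; the vocabulary reduction is then measured as a percentage of distinct terms. The function names, `threshold`, `min_count`, the subarea labels "A"/"B" and the toy addresses are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_discriminability(docs, labels, min_count=2):
    """Score terms by how concentrated they are across subareas (assumed measure).

    score = 1 - H(subarea | term) / log(number of subareas):
    1.0 means the term occurs in one subarea only, 0.0 means it is spread evenly.
    Counts are document frequencies (each term counted once per address).
    """
    per_term = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            per_term[term][label] += 1
    n_classes = len(set(labels))
    scores = {}
    for term, counts in per_term.items():
        total = sum(counts.values())
        if total < min_count:
            continue  # too rare to estimate reliably; dropped
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores[term] = 1.0 - entropy / math.log(n_classes) if n_classes > 1 else 1.0
    return scores


def filter_vocabulary(docs, labels, threshold=0.5, min_count=2):
    """Keep only terms whose discriminability score reaches the threshold."""
    scores = term_discriminability(docs, labels, min_count)
    keep = {t for t, s in scores.items() if s >= threshold}
    return [[t for t in tokens if t in keep] for tokens in docs], keep


def vocab_reduction_pct(before_docs, after_docs):
    """Percentage drop in the number of distinct terms after preprocessing."""
    before = {t for d in before_docs for t in d}
    after = {t for d in after_docs for t in d}
    return 100.0 * (1 - len(after) / len(before))


# Toy usage with hypothetical subarea labels "A" and "B".
docs = [["koramangala", "bangalore"], ["koramangala", "5th", "block", "bangalore"],
        ["whitefield", "bangalore"], ["whitefield", "itpl", "bangalore"]]
labels = ["A", "A", "B", "B"]
filtered, kept = filter_vocabulary(docs, labels)
print(sorted(kept), round(vocab_reduction_pct(docs, filtered), 1))
# → ['koramangala', 'whitefield'] 66.7
```

"bangalore" appears in every subarea and scores 0, so it is dropped even though it is frequent; on real data the threshold and minimum count would need tuning.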
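Slides 15 and 16 show keyboard-mashed ("MonkeyType") address strings but do not describe how they are detected. A minimal sketch, assuming a character-bigram language model trained on genuine addresses, scores a string by its average add-one-smoothed log-probability; strings such as entry 4 in the slide 15 table score lower than real addresses. The `ALPHABET`, the smoothing, the function names, the training strings and the `threshold` value are assumptions; a real threshold would be tuned on labeled data.

```python
import math
from collections import Counter

# Assumed character set; other characters are dropped before scoring.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "


def _clean(text):
    return "".join(ch for ch in text.lower() if ch in ALPHABET)


def train_bigram_model(genuine_addresses):
    """Count character bigrams (and their first characters) in genuine addresses."""
    pairs, firsts = Counter(), Counter()
    for addr in genuine_addresses:
        s = _clean(addr)
        for a, b in zip(s, s[1:]):
            pairs[a + b] += 1
            firsts[a] += 1
    return pairs, firsts


def avg_log_prob(text, model):
    """Mean log P(next char | char) with add-one smoothing; lower = less address-like."""
    pairs, firsts = model
    s = _clean(text)
    bigrams = list(zip(s, s[1:]))
    if not bigrams:
        return float("-inf")
    v = len(ALPHABET)
    total = sum(math.log((pairs[a + b] + 1) / (firsts[a] + v)) for a, b in bigrams)
    return total / len(bigrams)


def looks_monkey_typed(text, model, threshold):
    """Flag strings whose average bigram log-probability falls below the threshold."""
    return avg_log_prob(text, model) < threshold


# Hypothetical training addresses.
model = train_bigram_model([
    "flat 12 koramangala 5th block bangalore",
    "no 45 whitefield main road bangalore",
    "3rd cross indiranagar bangalore karnataka",
    "banashankari 2nd stage bangalore",
])
genuine_score = avg_log_prob("18th main koramangala bangalore", model)
junk_score = avg_log_prob("hjsdhaddsdsasdsa", model)
```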
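The compound-word rule on slide 20 can be sketched directly: estimate unigram probabilities from the corpus and split a token when the product of its parts' probabilities exceeds the token's own probability. Choosing the best-scoring split point, applying the rule recursively to the parts and requiring parts of at least `min_len` characters are assumptions added for this sketch, as are the function names and the toy corpus.

```python
from collections import Counter


def word_probabilities(corpus_tokens):
    """Empirical unigram probabilities P(w) = count(w) / total tokens."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def split_compound(token, probs, min_len=3):
    """Split token when P(left) * P(right) exceeds P(token); recurse on the parts."""
    best_split = None
    best_score = probs.get(token, 0.0)  # probability of keeping the token whole
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        score = probs.get(left, 0.0) * probs.get(right, 0.0)  # independence assumption
        if score > best_score:
            best_split, best_score = (left, right), score
    if best_split is None:
        return [token]
    left, right = best_split
    return split_compound(left, probs, min_len) + split_compound(right, probs, min_len)


# Hypothetical corpus in which the compound form is rare.
corpus = (["road"] * 10 + ["bannerghatta"] * 5 + ["koramangala"] * 5
          + ["main"] * 5 + ["bannerghattaroad"])
probs = word_probabilities(corpus)
print(split_compound("bannerghattaroad", probs))  # → ['bannerghatta', 'road']
```

Here P(bannerghattaroad) = 1/26 is smaller than P(bannerghatta)P(road) = (5/26)(10/26), so the token is separated; an unseen compound such as "koramangalaroad" has probability 0 and is split whenever both parts are known.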
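Slides 21 to 24 mention a modified frequent pattern tree for n-gram generation, but the details of both the conventional and the new approach are in figures that did not survive. The sketch below is only a simplified stand-in under that caveat: a prefix tree (trie) over contiguous token windows, with occurrence counts on each node, mined depth-first with pruning by minimum support. It is not full FP-growth (no header table or conditional trees) and not the author's modified version; `max_n`, `min_support`, `min_n` and the toy addresses are hypothetical.

```python
class Node:
    """Prefix-tree node; count = occurrences of the token sequence ending here."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}


def build_ngram_tree(addresses, max_n=3):
    """Insert every contiguous token window of length <= max_n, sharing prefixes."""
    root = Node()
    for tokens in addresses:
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_n]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Depth-first walk that prunes any branch whose count drops below min_support."""
    found = {}
    stack = [(child, (tok,)) for tok, child in root.children.items()]
    while stack:
        node, path = stack.pop()
        if node.count < min_support:
            continue  # an extension can never occur more often than its prefix
        if len(path) >= min_n:
            found[" ".join(path)] = node.count
        stack.extend((child, path + (tok,)) for tok, child in node.children.items())
    return found


addresses = [
    ["5th", "block", "koramangala", "bangalore"],
    ["6th", "block", "koramangala", "bangalore"],
    ["100", "feet", "road", "indiranagar", "bangalore"],
]
print(frequent_ngrams(build_ngram_tree(addresses)))
```

Sharing prefixes lets "block koramangala" and "block koramangala bangalore" be counted along one path, which is the property that makes tree-based mining compact; frequent n-grams found this way are candidates for merging into single terms.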
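Slides 25 and 26 show clusters of spelling variants but not the clustering algorithm. As an assumed substitute, a leader-style clustering with Levenshtein edit distance produces the same kind of equivalence sets: words are visited from most to least frequent, and each joins the first existing canonical form within a relative distance of `max_ratio`, otherwise it becomes a new canonical form. The ratio, the function names and the word counts below are hypothetical; a loose threshold can also merge genuinely different place names, so the domain-knowledge rules on slide 13 would still be needed.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cluster_spellings(word_counts, max_ratio=0.2):
    """Map each word to a canonical (most frequent) spelling within its cluster."""
    leaders = []    # canonical forms, most frequent first
    canonical = {}  # variant -> canonical form
    for word, _ in sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for lead in leaders:
            if levenshtein(word, lead) <= max_ratio * max(len(word), len(lead)):
                canonical[word] = lead
                break
        else:
            leaders.append(word)
            canonical[word] = word
    return canonical


# Hypothetical frequencies for variants listed on slide 25.
word_counts = {"koramangala": 50, "koramanagala": 7, "koromangala": 4,
               "electronics": 30, "electroincs": 3, "eclectronic": 2}
print(cluster_spellings(word_counts))
```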
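Slides 5 and 28 describe allocating shipments through a supervised classifier that assigns a preprocessed address to a predefined subarea, but the classifier itself is not named. Purely as an illustrative stand-in, the sketch below uses multinomial Naive Bayes with Laplace smoothing over address tokens; the class name `SubareaNB`, the `alpha` value, the subarea IDs "SA-1"/"SA-2" and the training addresses are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SubareaNB:
    """Multinomial Naive Bayes over address tokens (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for c, n_c in self.class_counts.items():
            lp = math.log(n_c / n_docs)  # log prior of the subarea
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore terms never seen in training
                lp += math.log((self.term_counts[c][t] + self.alpha)
                               / (self.total_terms[c] + self.alpha * v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best


docs = [["koramangala", "5th", "block"], ["koramangala", "forum", "mall"],
        ["whitefield", "itpl"], ["whitefield", "brookefield"]]
labels = ["SA-1", "SA-1", "SA-2", "SA-2"]
clf = SubareaNB().fit(docs, labels)
print(clf.predict(["koramangala", "block"]), clf.predict(["itpl", "road"]))
```

The predicted subarea would then be mapped to its route/FE; that mapping is operational data and is not modeled here.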