Valid Statistical Analysis at John Deere and Use of the R Programming Language
Derek Hoffman
Nov-8-2012
A bit about your speaker…

 • BS in Statistics and Material Science @ Winona State University
 • Masters in Statistics @ Iowa State University
 • 5 Years @ John Deere
Forecasting Group in 2012




 •   Improvements due to the science of forecasting
 •   Explosion in value and statistician hiring
 •   Increase in problem-solving flexibility due to use of R
 •   Huge company savings from dropping failed forecasting software
• Revenue of roughly 35 billion, 8.7% profit
• Has been a Fortune 500 company for the last 56 years, roughly 94th in rank.
• Employs about 50,000 people worldwide, roughly 5,000 of them in the Moline headquarters.
Deere & Company – 3 parts

 • Agriculture ~70%
 • Turf ~15%
 • Construction ~15%
Why does Deere hire forecasters?

 • Availability needs to match demand OR you lose market share.
 • Inventory needs to stay low OR you pay lots in taxes and storage costs.
 • New factories need to be built at the right size and time OR you make a multimillion-dollar mistake.
 • Work force needs to be hired/cut depending on production plans OR you lose tons on training and severance.
My group’s reach at John Deere

 Forecasts (center of the diagram) reach:
 • CEO, Presidents, Financials
 • Flexibility of Inventory Next Month
 • New Markets, 10 Years Out
 • Factory Shifts and Production
Why do statisticians love R?

 • Common statistical methods are available as
   packages (advantage over C++)
 • Large support group of users worldwide
 • Credibility due to submission standards and
   university usage.
 • Often the program of choice during education
 • Easy to send results to another person (even
   if just text files for data and code)
Why does Deere love R?

• The cost is right
• Open source – no black box mysteries, no proprietary lock-downs
• Easy to share across the business
• Relatively easy to learn
• Often works better or faster than Microsoft products for data and analysis
• Infinitely customizable to your problem and
  your products – vertical integration
Case Studies at John Deere

•   Short Term Demand Forecasting
•   Crop Forecasting
•   Long Term Demand Forecasting
•   Parts Decision Tree (APO)
•   Order Line Up
•   Data Coordinator
Short Term Demand Forecasting


 Marketing Forecast, Factory Forecast and Estimate Group Forecast combine into the Composite Forecast.

 Potential Good:
 • Multiple view points
 • Buy-in from all players
 • Disciplined in forecast creation

 Potential Bad:
 • Group-think
 • Pressures other than accuracy
 • Poor information digestion
Bad Forecasting Philosophies
 • Executive Override
     – Inputs: News, Experience
     – Method: Experience + Feelings on that Day + Outside pressures
     – Output: “Forecasts” and directives and goals
 • Gut Feel / Art
     – Inputs: News, Experience, Last YR’s #’s
     – Method: Math Comparisons, Financial Forecasting, Experience, Outside forecasts
     – Output: Forecasts
 • Blackbox Forecasts
     – Inputs: History
     – Method: ?
     – Output: Forecasts (NO estimates of accuracy, NO interpretation)
Forecasting Philosophies
 • Statistical Models
     – Inputs: Historical Data (known because it is in the past or current)
     – Method: Data + Math/Statistics as calculated by a trained statistician
     – Output: Forecasts and MEANINGFUL plus/minus intervals (flexibility and bad forecast detection)
 • Assumption Models
     – Inputs: Assumptions (user generated assumptions about the future)
     – Method: Data + Math/Statistics as calculated by a trained statistician
     – Output: Forecasts and Analysis of Forecast Error Contributions by Assumptions
 • Economic Models
     – Inputs: Data, Assumptions, News, ???, Outside Forecasts
     – Method: Data + Economics + ??? as created by a trained economist
     – Output: Forecasts, Outside Forecasts, Current Economic News
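
 To make the idea of plus/minus intervals and bad forecast detection concrete, here is a minimal sketch. It is written in Python rather than the R used in the talk, and it is not the group's actual method: it assumes the simplest possible model (a constant mean with roughly normal, independent errors) and uses made-up demand numbers. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def forecast_with_interval(history, z=1.96):
    """Point forecast plus a rough 95% plus/minus band.

    Assumes a constant-mean model with independent, roughly normal errors.
    The band uses the sample standard deviation widened by sqrt(1 + 1/n)
    to account for uncertainty in the estimated mean.
    """
    n = len(history)
    point = mean(history)
    half_width = z * stdev(history) * math.sqrt(1 + 1 / n)
    return point, point - half_width, point + half_width

def is_suspect(actual, lower, upper):
    """Flag an observation that falls outside the plus/minus band."""
    return not (lower <= actual <= upper)

# Hypothetical monthly demand figures, for illustration only.
history = [100, 104, 98, 102, 96]
point, lower, upper = forecast_with_interval(history)
half_width = upper - point
```

 An actual value inside the band is consistent with the model; one well outside it suggests either an unusual month or a forecast that needs to be revisited.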
Use of Data-Driven Analysis




 Analysis done in my group using R and company data.
Crop Yields Forecasting
Relative Land Area and Use (figure: circle = total land)
Acres in Major World Crops (figure: circle = total crop land)
Crop Yields Forecasting



     History, then 1 Year OUT, 2nd Year OUT, 3rd Year OUT




     The whole time, calculating the valid forecast error and influences.

     A large computational task, heavily using programs written in R.
Changes in Crop Splits
Corn Yields
The Wrong Way – Growth f(t)

 • The problem really is that we are looking at a correlation with time, not a causation. Also, we will always be extrapolating (because the future value of time is outside our historical data set).
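
 The cost of extrapolating on time can be illustrated with the textbook prediction-error formula for a straight-line fit. This is a sketch in Python, not something shown in the talk: under the usual linear regression assumptions, the standard error of a new prediction is the residual standard deviation times sqrt(1 + 1/n + (t_new - t_bar)^2 / Sxx), so it grows as t_new moves away from the historical average t_bar. The function name and the ten-year history are hypothetical.

```python
import math

def prediction_spread_factor(times, t_new):
    """Multiplier on the residual standard deviation for a prediction at t_new.

    Textbook simple-linear-regression result:
    sqrt(1 + 1/n + (t_new - t_bar)^2 / Sxx).
    """
    n = len(times)
    t_bar = sum(times) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    return math.sqrt(1 + 1 / n + (t_new - t_bar) ** 2 / sxx)

# Hypothetical history: observations in years 1 through 10.
history_years = list(range(1, 11))
inside = prediction_spread_factor(history_years, 5.5)   # middle of the data
one_out = prediction_spread_factor(history_years, 11)   # one year ahead
five_out = prediction_spread_factor(history_years, 15)  # five years ahead
```

 Even this widening understates the risk, because the formula still assumes the time trend itself keeps holding, which is exactly what a correlation with time cannot guarantee.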
What are Likely Causes?

 •   Crop Yields
 •   Planted Acres
 •   Crop Prices
 •   Population
 •   Gross Domestic Product
 •   Farm Size
 •   Government
 •   Mechanization Level of Farming
 •   Crop Choices (Corn damages combines faster than
     wheat.)
Example of Calculations




    The whole time, calculating the valid forecast error and influences.

    A large computational task, heavily using programs written in R.
Parts Forecasting

 • Tons of parts; need direction on how best to forecast with SAP.
Parts Forecasting – Trilingual?
Order Scheduling

 Restraint on Feature A: at most 2 per 4 in a row.

 We’re OK!
Order Scheduling

 Restraint on Feature B: at most 1 per 3 in a row.

 We’re OK!
Order Scheduling

 Restraint on Feature A: at most 1 per 3 in a row.

 We’ve got a problem!

 Have to move Matt or Shawn’s tractor to another spot and recheck it all!
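
 The manual check on these slides (no more than a given number of orders with a feature in any run of consecutive build slots) amounts to a sliding-window count. Below is a minimal sketch in Python rather than the R used in the talk. It only detects violations; it does not reorder the lineup, and it is not the program described here. The function name and the example lineups are hypothetical.

```python
def window_violations(has_feature, max_count, window):
    """Return the start position of every run of `window` consecutive
    orders that contains more than `max_count` orders with the feature.

    `has_feature` is a list of 0/1 flags, one per order in build sequence.
    """
    bad = []
    for start in range(len(has_feature) - window + 1):
        if sum(has_feature[start:start + window]) > max_count:
            bad.append(start)
    return bad

# Hypothetical lineups: 1 means the order in that slot has the feature.
feature_a = [1, 0, 1, 0, 0, 1, 1, 0]
feature_b = [0, 1, 0, 1, 0, 0, 0, 1]

a_problems = window_violations(feature_a, max_count=2, window=4)  # at most 2 per 4
b_problems = window_violations(feature_b, max_count=1, window=3)  # at most 1 per 3
```

 An empty list means the lineup satisfies the restraint. Any returned position marks a window where an order has to move, after which every restraint needs to be rechecked, which is why doing this by hand is slow.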
Harvester Lineup – Random Guess
Harvester Lineup – Program Results
Order Scheduling – Time
Order Scheduling = $$$

 •   Old Process
     – Done manually by hand
     – Weekly
     – Duration: 8 Hours
     – Not necessarily perfect
 •   Derek’s Process
     – Automates the process
     – Duration: 1.5-2 hours
     – Human time: 15 mins
     – Saves about 8 hours per week
     – Saves ~$12K per year, per product implementation
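
 As a rough cross-check of these figures, the arithmetic can be written out. This sketch assumes 52 scheduling weeks per year, which the slide does not state, and the implied hourly value is derived here rather than quoted from the talk.

```python
# Figures from the slide.
old_hours_per_week = 8.0        # manual process duration
new_human_hours_per_week = 0.25  # 15 minutes of human time
yearly_savings_dollars = 12_000  # "~$12K per year, per product implementation"

# Assumption not in the slide: the lineup is produced 52 weeks a year.
weeks_per_year = 52

hours_saved_per_week = old_hours_per_week - new_human_hours_per_week
hours_saved_per_year = hours_saved_per_week * weeks_per_year
implied_dollars_per_hour = yearly_savings_dollars / hours_saved_per_year
```

 Under that assumption the human time saved is 7.75 hours a week, in line with the "about 8 hours" on the slide, or roughly 400 hours a year per product implementation.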
Data Coordinator Uses
 • Inputs: multiple data sources and data types (DB2, SQL, DB2, Oracle) over multiple ODBC connections
 • Core: single R source code
 • Execution: batch file execution from scheduled tasks
 • Outputs: export channels (DB2)
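
 The pattern in this diagram (one script that reads from several databases and writes to an export channel) can be sketched as follows. This is an illustration only, in Python rather than the R used in the talk. In-memory SQLite databases stand in for the DB2, SQL and Oracle sources reached over ODBC, a CSV string stands in for the export channel, and the table, column and function names are all hypothetical.

```python
import csv
import io
import sqlite3

def make_source(rows):
    """Build an in-memory database standing in for one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (product TEXT, units INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

def collect(sources):
    """Run the same query against every source and tag rows with the source name."""
    combined = []
    query = "SELECT product, units FROM orders ORDER BY product"
    for name, conn in sources.items():
        for product, units in conn.execute(query):
            combined.append((name, product, units))
    return combined

def export_csv(rows):
    """Write the combined rows to a CSV string (the stand-in export channel)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "product", "units"])
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sources and data.
sources = {
    "source_a": make_source([("tractor", 3)]),
    "source_b": make_source([("combine", 5)]),
}
rows = collect(sources)
report = export_csv(rows)
```

 Because everything lives in a single script, a batch file launched by a scheduled task can run the whole pull-combine-export cycle unattended.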
A forecast of “Analytics”

 • A short history of “cool topics”

 • The future of forecasters

 • The coming data flood and analytics boom

     increase in scalpels ≠ increase in surgeons
The cool word of the year – Dot-com
The cool word of the year – Radiation
The cool word of the year – Big Data


 How can we grow responsibly as data scientists and statisticians?
Signs you are in the hype

 •   Everyone claims it will change the world
 •   It’s taught in business schools
 •   Features on covers of general magazines
 •   TONS of snake-oil salesmen
 •   Legitimate ease in access to the new thing
Cautionary tale:

 • Thousands spent on a weather “forecast”
 • Ridiculous accuracy measures
 • Business users don’t know the shortfalls until it’s too late
Growing Need of Forecasting Professionals


 • A need for educated gatekeepers to weed bad analysis from good.
 • More people are needed to practice
   forecasting as a profession – or the whole
   industry will suffer.
 • More data, more ease, more computing
   needed, with greater need for responsible
   use.
Statistics and R at John Deere

 • John Deere is among the best of the large manufacturers in applying good forecasting methods to demand planning
 • There are still huge areas to grow – nowhere near the data usage of companies like Amazon or Wal-Mart
 • The challenge is to increase usage and
   access while maintaining a good internal and
   external reputation

Valid Statistical Analysis and R Programming at John Deere

  • 1. Valid Statistical Analysis at John Deere and Use of the R Programming Language Derek Hoffman Nov-8-2012
  • 2. A bit about your speaker… • BS in Statistics and Material Science @ Winona State University • Masters in Statistics @ Iowa State University • 5 Years @ John Deere
  • 3. Forecasting Group in 2012 • Improvements due to the science of forecasting • Explosion in value and statistician hiring • Increase in problem solving flexibility due to use of R • Huge company saving with dropping flop forecasting software
  • 4. • Revenue of roughly $35 billion, 8.7% profit • Has been a Fortune 500 company for the last 56 years, ranked roughly 94th • Employs about 50,000 people worldwide, roughly 5,000 of them at the Moline headquarters
  • 5. Deere & Company – 3 parts • Agriculture ~70% • Turf ~15% • Construction ~15%
  • 6. Why does Deere hire forecasters? • Availability needs to match demand OR you lose market share • Inventory needs to stay low OR you pay lots in taxes and storage costs • New factories need to be built at the right size and time OR you make a multimillion-dollar mistake • Work force needs to be hired/cut depending on production plans OR you lose tons on training and severance
  • 7. My group’s reach at John Deere • Forecasts (center of diagram) • CEO, Presidents, Financials • Flexibility of Inventory Next Month • New Markets, 10 Years Out • Factory Shifts and Production
  • 8. My group’s reach at John Deere • Forecasts (center of diagram) • CEO, Presidents, Financials • Flexibility of Inventory Next Month • New Markets, 10 Years Out • Factory Shifts and Production
  • 9. Why do statisticians love R? • Common statistical methods are available as packages (advantage over C++) • Large support group of users worldwide • Credibility due to submission standards and university usage. • Often the program of choice during education • Easy to send results to another person (even if just text files for data and code)
  • 10. Why does Deere love R? • The cost is right • Open source – no black box mysteries, no proprietary lock-downs • Easy to share across the business • Relatively easy to learn • Often works better or faster than Microsoft products for data and analysis • Infinitely customizable to your problem and your products – vertical integration
  • 11. Case Studies at John Deere • Short Term Demand Forecasting • Crop Forecasting • Long Term Demand Forecasting • Parts Decision Tree (APO) • Order Line Up • Data Coordinator
  • 12. Short Term Demand Forecasting • Diagram labels: Marketing Forecast, Factory Forecast, Estimate Group Forecast, Composite Forecast • Potential Good: Multiple viewpoints; Buy-in from all players; Disciplined in forecast creation • Potential Bad: Group-think; Pressures other than accuracy; Poor information digestion
  • 13. Bad Forecasting Philosophies • Executive Override: News, Experience → Experience + Feelings on that Day + Outside pressures → “Forecasts” and directives and goals • Gut Feel / Art: News, Experience, Last YR’s #’s → Math Comparisons, Financial Forecasting, Experience, Outside forecasts → Forecasts • Blackbox Forecasts: History → ? → Forecasts (NO estimates of accuracy, NO interpretation)
  • 14. Forecasting Philosophies • Statistical Models: Historical Data (known because it is in the past or current) → Data + Math/Statistics as calculated by a trained statistician → Forecasts and MEANINGFUL plus/minus intervals (flexibility and bad forecast detection) • Assumption Models: Assumptions (user generated assumptions about the future) → Data + Math/Statistics as calculated by a trained statistician → Forecasts and Forecast Error Contributions by Assumptions • Economic Models: Data, Assumptions, News, ???, Outside Forecasts → Data + Economics + ??? as created by a trained economist → Forecasts, Analysis of Outside Forecasts, Current Economic News
  • 15. Use of Data-Driven Analysis Analysis done in my group using R and company data.
  • 16. Case Studies at John Deere • Short Term Demand Forecasting • Crop Forecasting • Long Term Demand Forecasting • Parts Decision Tree (APO) • Order Line Up • Data Coordinator
  • 18. Relative Land Area and Use Circle = Total Land
  • 19. Acres in Major World Crops Circle = Total Crop Land
  • 21. Crop Yields Forecasting • Chart labels: History, 1 Year OUT, 2nd Year OUT, 3rd Year OUT • The whole time, calculating the valid forecast error and influences. A large computational task, heavily using programs written in R.
  • 22. Changes in Crop Splits
  • 24. Case Studies at John Deere • Short Term Demand Forecasting • Crop Forecasting • Long Term Demand Forecasting • Parts Decision Tree (APO) • Order Line Up • Data Coordinator
  • 25. The Wrong way – Growth f(t) • The problem really is that we are looking at a correlation with time, not a causation. Also, we will always be extrapolating (because the future value of time is outside our historical data set).
  • 26. What are Likely Causes? • Crop Yields • Planted Acres • Crop Prices • Population • Gross Domestic Product • Farm Size • Government • Mechanization Level of Farming • Crop Choices (Corn damages combines faster than wheat.)
  • 27. Example of Calculations The whole time, calculating the valid forecast error and influences. A large computational task, heavily using programs written in R.
  • 28. Case Studies at John Deere • Short Term Demand Forecasting • Crop Forecasting • Long Term Demand Forecasting • Parts Decision Tree (APO) • Order Line Up • Data Coordinator
  • 29. Parts Forecasting • Tons of parts; need direction on how best to forecast them with SAP.
  • 30. Parts Forecasting – Trilingual?
  • 31. Case Studies at John Deere • Short Term Demand Forecasting • Crop Forecasting • Long Term Demand Forecasting • Parts Decision Tree (APO) • Order Line Up • Data Coordinator
  • 33. Order Scheduling Restraint on Feature A: At most 2 per 4 in a row. We’re OK!
  • 34. Order Scheduling Restraint on Feature A: At most 2 per 4 in a row. We’re OK!
  • 35. Order Scheduling Restraint on Feature B: At most 1 per 3 in a row. We’re OK!
  • 36. Order Scheduling Restraint on Feature A: At most 1 per 3 in a row. We’ve got a problem! Have to move Matt’s or Shawn’s tractor to another spot and recheck it all!
  • 37. Harvester Lineup – Random Guess
  • 38. Harvester Lineup – Program Results
  • 40. Order Scheduling = $$$ • Old Process: Done manually by hand; Weekly; Duration: 8 hours • Derek’s Process: Automates the process; Duration: 1.5-2 hours; Human time: 15 mins; Not necessarily perfect; Saves about 8 hours per week; Saves ~$12K per year, per product implementation
  • 41. Case Studies at John Deere • Short Term Demand Forecasting • Crop Forecasting • Long Term Demand Forecasting • Parts Decision Tree (APO) • Order Line Up • Data Coordinator
  • 42. Data Coordinator • Diagram labels: Uses Scheduled Tasks; Batch File execution; Multiple Data sources and types; Multiple ODBC Connections; Single R source Code; Data Export Channels • Databases shown: DB2, SQL, Oracle
  • 43. A forecast of “Analytics” • A short history of “cool topics” • The future of forecasters • The coming data flood and analytics boom: increase in scalpels ≠ increase in surgeons
  • 44. The cool word of the year – Dot-com
  • 45. The cool word of the year - Radiation
  • 46. The cool word of the year – Big Data • How can we grow responsibly as data scientists and statisticians?
  • 47. Signs you are in the hype • Everyone claims it will change the world • It’s taught in business schools • It’s featured on covers of general magazines • TONS of snake-oil salesmen • Legitimate ease in access to the new thing
  • 48. Cautionary tale: • Thousands spent on a weather “forecast” • Ridiculous accuracy measures • Business users don’t know the shortfalls until it’s too late
  • 49. Growing Need of Forecasting Professionals • A need for educated gatekeepers to weed bad analysis from good. • More people are needed to practice forecasting as a profession, or the whole industry will suffer. • More data, more ease, and more computing are needed, with a greater need for responsible use.
  • 50. Statistics and R at John Deere • John Deere is among the best of the large manufacturers at applying good forecasting methods to demand planning • There are still huge areas to grow: nowhere near the data usage of companies like Amazon or Wal-Mart • The challenge is to increase usage and access while maintaining a good internal and external reputation
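
The order-scheduling restraints on slides 33 to 36 ("at most 2 per 4 in a row", "at most 1 per 3 in a row") amount to a sliding-window check over the line-up. The following is a minimal sketch of that check only, written in Python for illustration; the talk's own program was written in R and is not shown in the slides. The function name `window_violations` and the sample line-up are hypothetical, and treating a line-up shorter than the window as a single window is an assumption.

```python
from typing import List, Sequence


def window_violations(has_feature: Sequence[bool], max_count: int, window: int) -> List[int]:
    """Return the start index of every run of `window` consecutive slots
    that holds more than `max_count` orders with the feature."""
    # Assumption: a line-up shorter than the window is checked as one window.
    n_windows = max(len(has_feature) - window + 1, 1)
    violations = []
    for start in range(n_windows):
        if sum(has_feature[start:start + window]) > max_count:
            violations.append(start)
    return violations


# Hypothetical line-up: True means the order in that slot has the feature.
lineup_a = [True, False, True, False, False, True, True, False]

# "At most 2 per 4 in a row": no window breaks the rule.
print(window_violations(lineup_a, max_count=2, window=4))  # → []

# "At most 1 per 3 in a row": windows starting at slots 0, 4 and 5 break it.
print(window_violations(lineup_a, max_count=1, window=3))  # → [0, 4, 5]
```

A scheduler could run a check like this after every proposed swap, which is the "move a tractor and recheck it all" step from slide 36. Finding a good line-up is a separate search problem that this sketch does not attempt.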