BUILDING MODELS QUICKLY
ADDRESSING HOUSING OVERFLOW AT PURDUE
Using Greenplum & XGBoost
March 19, 2019
PURDUE UNIVERSITY
AN INDIANA INSTITUTION
• Located in West Lafayette, IN
• Consists of one main campus and 3 regional campuses
• Over 40,000 students enrolled
o ~30k Undergraduate, ~10k Graduate
• Over 200 majors offered across ten academic colleges
• Part of the Big Ten Conference
DATA SCIENCE & HIGHER EDUCATION
A WORLD OF POSSIBILITY
• Higher education has only just begun using Data Science
• This means lots of new paths to forge
• From the obvious:
o Predicting grades (done)
o Maximizing financial aid through predicting yield
• To the complex:
o Course recommendation engine
o Entry essay neural network
PURDUE’S IDAP SYSTEM
WHAT IS IDAP?
PURDUE DATA SCIENCE & ANALYTICS
• IDAP serves as a gray box for a wide variety of data sources:
o Traditional: Student Information
o Ancillary Data Sources: Degree Requirements, Student Activities
o New: Network Logs, Card Swipes, LMS Clickpath
• Also houses a modelling pipeline with several production models
o At Risk (<2.5 GPA first semester)
o Course GPA (C or worse in a course)
o Yield (Which students will attend)
• Faculty Research
o Secure high-compute server (Raiden)
o Pulls data from Greenplum
IDAP OVERVIEW
MODEL BUILDING AND SCORING PIPELINE
[Diagram: modeling pipeline, summarized below]
• Inputs: structured and unstructured data sources, refreshed by incoming daily/weekly/monthly updates
• Feature generation pipeline produces three feature sets:
o Static features
o Static + time-sensitive LMS features
o Static + time-sensitive LMS + network + card logs features
• Candidate models:
o In-database parallel grid search (XGBoost)
o MADlib Logistic Regression
o Sklearn AdaBoost
o Sklearn RandomForest
• Model selection, then serialize to disk
• Scoring results:
o Student ID
o Feature names, values, importance scores
o Predictions
• Results are cleared by an IDAP Data Scientist, then sent to end users
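The grid-search, model-selection, and serialization stages of the pipeline can be sketched in plain Python. This is a minimal illustration, not the IDAP implementation: the parameter grid, the `evaluate` stand-in, and the output file name are all assumptions. In the pipeline described in the slides, each combination would instead be trained as an XGBoost model inside Greenplum, in parallel.

```python
import itertools
import os
import pickle
import tempfile

# Hypothetical grid. max_depth and learning_rate are real XGBoost
# parameter names, but these values are illustrative only.
PARAM_GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.3],
}


def expand_grid(grid):
    """Return every parameter combination in the grid as a dict."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]


def evaluate(params):
    """Stand-in for training and validating one model (an assumption).

    A real pipeline would fit a model with these parameters and return
    a validation metric. This toy function simply prefers max_depth=5
    and learning_rate=0.1 so that the example is deterministic.
    """
    return (1.0
            - abs(params["max_depth"] - 5) * 0.1
            - abs(params["learning_rate"] - 0.1))


def select_best(candidates, score_fn):
    """Model selection: keep the candidate with the highest score."""
    return max(candidates, key=score_fn)


def serialize(obj, path):
    """Serialize the selected configuration to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


candidates = expand_grid(PARAM_GRID)
best = select_best(candidates, evaluate)
model_path = os.path.join(tempfile.gettempdir(), "best_params.pkl")
serialize(best, model_path)
```

Each grid point is independent of the others, which is what makes this kind of search a natural fit for parallel execution across database segments.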
HOUSING CANCELLATIONS: PIPELINE IN ACTION
THE SITUATION
TOO MANY STUDENTS, TOO LITTLE HOUSING
• Admission to Purdue in Fall 2018 hit historic highs
o 8,357 students in the entering class, on top of historic high enrollment in each of the two prior years
o Nearly 800 more new students than in Fall 2017
• Housing is not being built quickly enough to keep up with demand
• Hundreds more students than usual might be put into temporary and off-campus leased housing at the start of the semester
THE SITUATION
TOO MANY STUDENTS, TOO LITTLE HOUSING
• Typical Problem Amplified
o While temporary housing is normal at many universities, the need goes up with unexpected enrollment
• Limited, Non-Ideal Space
o Temporary space is not unlimited, nor is it ideal for learning
• Off-Campus Leased Housing
o Beyond temporary space, Purdue also leases space to house excess returning students
o This space is not campus-adjacent, and therefore also not ideal; it is also not unlimited
THE SOLUTION
BUILD A MODEL IN XGBOOST USING GREENPLUM
• Build a model - quickly
o The decision was made to try to predict which people coming to Purdue’s housing system would not show up
o The goal: reduce the number of student move disruptions from temporary housing, and maximize on-campus housing space
o From concept to execution, there were less than two months in which to create the model and implement its results
• Blending data
o Housing data was not in the Greenplum system and needed to be pulled in so it could be blended with everything needed for the model
• Two Models
o Divided into two models, for two fundamentally different groups: new students and those returning to campus housing
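The blending step and the two-model split can be sketched as a join followed by a partition. This is a minimal illustration that uses Python's built-in sqlite3 as a stand-in for Greenplum. Every table name, column name, and row below is hypothetical, and the slides do not give the actual definition of a returning student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the existing feature data and the
    -- newly loaded housing data.
    CREATE TABLE student_features (student_id INTEGER PRIMARY KEY, hs_gpa REAL);
    CREATE TABLE housing_contracts (student_id INTEGER, prior_housing INTEGER);
    INSERT INTO student_features VALUES (1, 3.6), (2, 3.1), (3, 3.9);
    INSERT INTO housing_contracts VALUES (1, 0), (2, 1), (3, 1);
""")

# Blend: one row per housing contract, enriched with student features.
rows = conn.execute("""
    SELECT h.student_id, h.prior_housing, f.hs_gpa
    FROM housing_contracts AS h
    JOIN student_features AS f USING (student_id)
    ORDER BY h.student_id
""").fetchall()
conn.close()

# Split: route each student to the population its model is trained on.
# prior_housing = 1 marks a (hypothetical) returning resident.
new_students = [r for r in rows if r[1] == 0]
returning_students = [r for r in rows if r[1] == 1]
```

Keeping the join in the database means the blended table is available to any model in the pipeline without exporting the housing data separately.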
THE SOLUTION
BUILD A MODEL IN XGBOOST AND GREENPLUM
• First Iteration
o The model was put together mostly using features from prior student success models
• Performance & Usage
o Initial performance allowed us to provide a list of students sorted by likelihood to cancel, most likely first
o This list was used to make phone calls to these students and confirm their intent to utilize campus housing
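The sorted call list from the first iteration amounts to ranking students by predicted cancellation probability. A minimal sketch follows. The function name, the student IDs, the probabilities, and the `max_calls` cutoff are hypothetical, and the slides do not say how many students were actually called.

```python
from typing import List, Tuple


def build_call_list(scores: List[Tuple[str, float]],
                    max_calls: int) -> List[Tuple[str, float]]:
    """Sort students by predicted cancellation probability, highest
    first, and keep only as many as the housing staff can call."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[:max_calls]


# Hypothetical student IDs and model outputs, for illustration only.
predicted = [("A101", 0.12), ("B202", 0.87), ("C303", 0.45), ("D404", 0.91)]
call_list = build_call_list(predicted, max_calls=2)
```

Ranking rather than thresholding lets staff work down the list for as long as calling capacity allows.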
Returning Students – Version 1
Cancelled  Precision  Recall  F-Score  Support
0          0.932      0.775   0.846    1833
1          0.225      0.538   0.317    223

New Students – Version 1
Cancelled  Precision  Recall  F-Score  Support
0          0.997      0.956   0.976    2765
1          0.463      0.929   0.618    113
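The F-Score column in the Version 1 tables is consistent with the harmonic mean of precision and recall (F1). The slides do not name the variant, so reading it as F1 is an assumption, though the arithmetic supports it:

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) for the Cancelled = 1 rows
# of the Version 1 tables.
returning_v1 = (0.225, 0.538, 0.317)
new_v1 = (0.463, 0.929, 0.618)

for precision, recall, reported in (returning_v1, new_v1):
    computed = f_score(precision, recall)
    # Allow for rounding in the published three-decimal figures.
    assert abs(computed - reported) < 0.001
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low precision on returning students (0.225) holds that F-score down even though recall is above 0.5.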
INITIAL SUCCESS
MORE EFFICIENT SPACE USAGE
• Typical Year
o Typically, rooms in the Union hotel are reserved as temporary space
o Additionally, other temporary spaces usually house students until after October break
• Fall 2018 Temporary Housing
o Partly due to calling students with a high probability of cancelling, temporary housing actually saw a reduction in strain
o Not only were all students out of temporary housing by October break, but rooms at the PMU were released prior to the start of classes
SUCCESS WITH ISSUES
USEFUL, NEEDS IMPROVEMENT
• There was a cohort of students who were not retained at Purdue, which the model missed
• The model is highly unsure of many students
• This was due, in part, to a bad definition of ‘returner’ and of ‘cancel’ in the model; it needed to be fixed and retrained
RETRAINING
ADDITIONAL FEATURE BUILD
• Tuning & New Features
o New features and further tuning of the model’s parameters substantially improved the model for returning students (F-score for the cancelled class rose from 0.317 to 0.577)
• Impact
o Far more accurate model, fewer calls required to reach the students intending to cancel

Returning Students – Version 2
Cancelled  Precision  Recall  F-Score  Support
0          0.961      0.938   0.949    1880
1          0.524      0.642   0.577    201

New Students – Version 2
Cancelled  Precision  Recall  F-Score  Support
0          0.996      0.965   0.980    2736
1          0.555      0.917   0.691    132
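The claim that fewer calls are required can be checked roughly from the reported metrics. The sketch below backs out approximate counts for returning students from the precision, recall, and support figures in the Version 1 and Version 2 tables. It assumes every flagged student would be called, and the results are approximate because the published metrics are rounded.

```python
def implied_counts(precision: float, recall: float, support: int):
    """Back out approximate counts from one classification-report row.

    true_positives = recall * support      (cancellers the model caught)
    flagged        = true_positives / precision  (students flagged)
    Both are rounded because the published metrics are rounded.
    """
    true_positives = round(recall * support)
    flagged = round(true_positives / precision)
    return true_positives, flagged


# Cancelled = 1 rows for returning students, from the Version 1 and
# Version 2 tables in the slides.
v1_caught, v1_flagged = implied_counts(0.225, 0.538, 223)
v2_caught, v2_flagged = implied_counts(0.524, 0.642, 201)

print(f"Version 1: ~{v1_flagged} flagged to catch ~{v1_caught} cancellations")
print(f"Version 2: ~{v2_flagged} flagged to catch ~{v2_caught} cancellations")
```

Under these assumptions, Version 1 flags roughly 533 returning students to catch about 120 cancellations, while Version 2 flags roughly 246 to catch about 129: less than half as many flags for slightly more cancellations caught.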
NEXT STEPS
FUTURE TUNING & USAGE
• Post-hoc Data Recording
o In Fall 2019, Housing will record which students they call and when, so that the calls can be better matched with actual results when cancellations come in after August
• Potential Future Retraining
o New housing is being built on campus to keep up with the growing population. Once that is online, cancellation patterns may change and require retraining
o Otherwise, keeping up with post-hoc analysis of results should indicate when retraining is next necessary
o Due to the setup of the model in Greenplum, retraining is quick & easy!
APPENDIX
IMPORTANT FEATURES
TOP FEATURES IN XGBOOST MODELING RESULTS
New Students Model
Rank  Feature                       Score
1     star_registration_promptness  272
2     hs_core_gpa                   225
3     population                    223
4     medianfemalebachincome        190
5     medianmalebachincome          174
6     hs_gpa                        167
7     closet_rep_miles              166
8     bach25plus                    159
9     per_capita_income             158
10    mast25plus                    153
11    days_before_start_sign        120
12    highest_satr_ebrw             100
13    highest_satr_total            91
14    highest_satr_math             85
15    ap_avg                        77
16    decision_count                75
17    ap_cnt                        56
18    vstar_ind                     45

Returners Model
Rank  Feature                       Score
1     semester_cdfw_rate            745
2     prior_overall_gpa             743
3     avg_weekly_rectrac_swipes     718
4     hs_core_gpa                   579
5     medianfemalebachincome        563
6     closet_rep_miles              549
7     population                    482
8     bach25plus                    450
9     per_capita_income             432
10    ap_avg                        422
11    medianmalebachincome          413
12    days_before_start_sign        403
13    num_room_changes_last_year    389
14    num_classes_registered        374
15    highest_satr_ebrw             372
16    mast25plus                    363
17    hs_gpa                        345
18    highest_satr_math             318
19    highest_satr_total            304
20    hs_gpa_vs_hs_inst_gpa_diff    280
21    hs_size                       261
22    hs_inst_gpa                   251
23    ap_cnt                        248
24    age                           201
25    roomie_avg_gpa                189
26    age_as_of_semstart            103
27    roomie_gpa_diff               92

Building Models Quickly Addressing Housing Overflow at Purdue - Greenplum Summit 2019

  • 1. BUILDING MODELS QUICKLY ADDRESSING HOUSING OVERFLOW AT PURDUE Using Greenplum & XGBoost March 19, 2019
  • 2. PURDUE UNIVERSITY AN INDIANA INSTITUTION • Located in West Lafayette, IN • Consists of one main campus and 3 regional campuses • Over 40,000 students enrolled o ~30k Undergraduate, 10k Graduate • Over 200 majors offered across ten academic colleges • Part of the Big Ten Conference
  • 3. DATA SCIENCE & HIGHER EDUCATION A WORLD OF POSSIBILITY • Higher education has only just begun using Data Science • This means lots of new paths to forge • From the obvious: o Predict grades (done) o Maximizing financial aid through predicting yield • To the complex: o Course recommendation engine o Entry essay neural network
  • 5. WHAT IS IDAP? PURDUE DATA SCIENCE & ANALYTICS • IDAP serves as a gray box for a wide variety of data sources: o Traditional: Student Information o Ancillary Data Sources: Degree Requirements, Student Activities o New: Network Logs, Card Swipes, LMS Clickpath • Also houses a modelling pipeline with several production models o At Risk (<2.5 GPA first semester) o Course GPA (C or worse in a course) o Yield (Which students will attend) • Faculty Research o Secure high-compute server (Raiden) o Pulls data from Greenplum
  • 7. MODEL BUILDING AND SCORING PIPELINE • Inputs: Structured, unstructured data sources; Refreshed data (incoming daily/weekly/monthly updates) • Feature generation pipeline: Static features; Static + time-sensitive LMS features; Static + time-sensitive LMS + network + card logs features • Modeling pipeline: In-database parallel grid-search (XGBoost); MADlib Logistic Regression; Sklearn AdaBoost; Sklearn RandomForest; Model selection; Serialize to disk • Scoring results: Student ID; Feature names, values, importance scores; Predictions • Cleared by IDAP Data Scientist • Results sent to end users
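One way to picture the "In-database parallel grid-search (XGBoost)" step is as a list of hyperparameter combinations built up front, so that each combination can be fit independently of the others. The sketch below only builds that list in plain Python. The parameter names are common XGBoost ones, but the grid values and the `expand_grid` helper are hypothetical, and the slides do not show how the work is actually distributed inside Greenplum.

```python
from itertools import product

# Hypothetical grid: the values are illustrative, not those used at Purdue.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}

def expand_grid(grid):
    """Return one dict per hyperparameter combination."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combos = expand_grid(param_grid)
print(len(combos))  # 3 * 2 * 2 = 12 independent fits
```

Because no combination depends on another, the list can be split across workers in any order, and model selection then reduces to picking the best-scoring row.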
  • 9. THE SITUATION TOO MANY STUDENTS, TOO LITTLE HOUSING • Admission to Purdue in Fall 2018 hit historic highs o 8,357 students in the entering class, on top of historic high enrollment each of the two prior years o Nearly 800 new students vs Fall 2017 • Housing not being built quickly enough to keep up with demand • Hundreds more students than usual might be put into temporary and off-campus leased housing at the start of semester
  • 10. THE SITUATION TOO MANY STUDENTS, TOO LITTLE HOUSING • Typical Problem Amplified • While temporary housing is normal at many universities, the need goes up with unexpected enrollment • Limited, Non-Ideal Space • Temporary space is not unlimited, nor is it ideal for learning • Off-Campus Leased Housing • Beyond temporary space, Purdue also leases space to house excess returning students • This is not campus-adjacent, and therefore also not ideal. Also not unlimited
  • 11. THE SOLUTION BUILD A MODEL IN XGBOOST USING GREENPLUM • Build a model - quickly • The decision was made to try to predict which people coming to Purdue’s housing system would not show up • The goal – reduce the number of student move disruptions from temporary housing, and maximize on-campus housing space • From concept to execution, there were less than two months in which to create the model and implement its results • Blending data • Housing data was not in the Greenplum system and needed to be pulled in so it could be blended with everything needed for the model • Two Models • Divided into two models, for two fundamentally different groups: new students and those returning to campus housing
  • 12. THE SOLUTION BUILD A MODEL IN XGBOOST AND GREENPLUM • First Iteration • The model was put together mostly using features from prior student success models • Performance & Usage • Initial performance allowed us to provide a list of students sorted by how likely they were to cancel • This list was used to call these students and confirm their intent to use campus housing

      Returning Students – Version 1
      Cancelled  Precision  Recall  F-Score  Support
      0          0.932      0.775   0.846    1833
      1          0.225      0.538   0.317    223

      New Students – Version 1
      Cancelled  Precision  Recall  F-Score  Support
      0          0.997      0.956   0.976    2765
      1          0.463      0.929   0.618    113
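The sorted call list described on this slide can be sketched as a simple ranking by predicted cancellation probability. This is a minimal illustration, not the IDAP implementation: the student IDs, the probabilities, and the `call_list` helper are all hypothetical.

```python
# Hypothetical scored students: (student_id, predicted probability of cancelling).
scores = [
    ("S001", 0.12),
    ("S002", 0.81),
    ("S003", 0.47),
    ("S004", 0.93),
    ("S005", 0.05),
]

def call_list(scored, top_n):
    """Return the top_n student IDs, most likely to cancel first."""
    ranked = sorted(scored, key=lambda row: row[1], reverse=True)
    return [student_id for student_id, _ in ranked[:top_n]]

print(call_list(scores, 3))  # ['S004', 'S002', 'S003']
```

Ranking by probability instead of applying a fixed cutoff lets housing staff work down the list as far as their calling capacity allows.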
  • 13. INITIAL SUCCESS MORE EFFICIENT SPACE USAGE • Typical Year • Typically, rooms in the Union hotel are reserved as temporary space • Additionally, other temporary spaces usually house students until after October break • Fall 2018 Temporary Housing • Partly due to calling students with a high probability of cancelling, temporary housing actually saw a reduction in strain • Not only were all students out of temporary housing by October break, but rooms at the PMU were released prior to the start of classes
  • 14. SUCCESS WITH ISSUES USEFUL, NEEDS IMPROVEMENT • There was a cohort of students that did not retain at Purdue, which the model missed • The model is highly unsure of many students • This was due, in part, to a bad definition of ‘returner’ and of ‘cancel’ in the model – it needed to be fixed and retrained
  • 15. RETRAINING ADDITIONAL FEATURE BUILD • Tuning & New Features • New features and further tuning of the model’s parameters substantially improved the model for returning students (for the cancelled class, precision rose from 0.225 to 0.524 and F-score from 0.317 to 0.577) • Impact • Far more accurate model, fewer calls required to reach the students intending to cancel

      Returning Students – Version 2
      Cancelled  Precision  Recall  F-Score  Support
      0          0.961      0.938   0.949    1880
      1          0.524      0.642   0.577    201

      New Students – Version 2
      Cancelled  Precision  Recall  F-Score  Support
      0          0.996      0.965   0.980    2736
      1          0.555      0.917   0.691    132
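The reported F-scores and the "fewer calls" claim can be checked with a little arithmetic on the cancelled-class rows for returning students. The sketch below recomputes F-score as the harmonic mean of precision and recall, and estimates calls per true cancellation as 1 / precision. That second figure is an assumption: it supposes every student flagged at the reported threshold is called, whereas the slides describe working from a sorted list, so actual call counts depend on how far down the list staff went. The helper names are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

def calls_per_cancellation(precision):
    """Expected flagged students called per true cancellation found."""
    return 1 / precision

# Cancelled-class (precision, recall) for returning students, from the slides.
returning = {
    "Version 1": (0.225, 0.538),
    "Version 2": (0.524, 0.642),
}

for version, (p, r) in returning.items():
    print(version, round(f_score(p, r), 3), round(calls_per_cancellation(p), 1))
# Version 1 0.317 4.4
# Version 2 0.577 1.9
```

Under that assumption, the retrained model needs roughly two calls per cancellation found where the first version needed more than four.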
  • 17. NEXT STEPS FUTURE TUNING & USAGE • Post-hoc Data Recording • In Fall 2019, Housing will record which students they call and when, so that we can better match the calls with the actual results when cancellations come in after August • Potential Future Retraining • New housing is being built on-campus to keep up with the growing population. Once that is online, cancellation patterns may change and require retraining • Otherwise, keeping up with post-hoc analysis of results should indicate when retraining is next necessary • Due to the setup of the model in Greenplum, retraining is quick & easy!
  • 19. IMPORTANT FEATURES TOP FEATURES IN XGBOOST MODELING RESULTS

      New Students Model
      Rank  Feature                        Score
      1     star_registration_promptness   272
      2     hs_core_gpa                    225
      3     population                     223
      4     medianfemalebachincome         190
      5     medianmalebachincome           174
      6     hs_gpa                         167
      7     closet_rep_miles               166
      8     bach25plus                     159
      9     per_capita_income              158
      10    mast25plus                     153
      11    days_before_start_sign         120
      12    highest_satr_ebrw              100
      13    highest_satr_total             91
      14    highest_satr_math              85
      15    ap_avg                         77
      16    decision_count                 75
      17    ap_cnt                         56
      18    vstar_ind                      45

      Returners Model
      Rank  Feature                        Score
      1     semester_cdfw_rate             745
      2     prior_overall_gpa              743
      3     avg_weekly_rectrac_swipes      718
      4     hs_core_gpa                    579
      5     medianfemalebachincome         563
      6     closet_rep_miles               549
      7     population                     482
      8     bach25plus                     450
      9     per_capita_income              432
      10    ap_avg                         422
      11    medianmalebachincome           413
      12    days_before_start_sign         403
      13    num_room_changes_last_year     389
      14    num_classes_registered         374
      15    highest_satr_ebrw              372
      16    mast25plus                     363
      17    hs_gpa                         345
      18    highest_satr_math              318
      19    highest_satr_total             304
      20    hs_gpa_vs_hs_inst_gpa_diff     280
      21    hs_size                        261
      22    hs_inst_gpa                    251
      23    ap_cnt                         248
      24    age                            201
      25    roomie_avg_gpa                 189
      26    age_as_of_semstart             103
      27    roomie_gpa_diff                92