2. PURDUE UNIVERSITY
AN INDIANA INSTITUTION
• Located in West Lafayette, IN
• Consists of one main campus
and 3 regional campuses
• Over 40,000 students enrolled
o ~30k Undergraduate, 10k
Graduate
• Over 200 majors offered
across ten academic colleges
• Part of the Big Ten Conference
3. DATA SCIENCE & HIGHER EDUCATION
A WORLD OF POSSIBILITY
• Higher education has only just
begun using Data Science
• This means lots of new paths
to forge
• From the obvious:
o Predict grades (done)
o Maximizing financial aid
through predicting yield
• To the complex:
o Course recommendation
engine
o Entry essay neural network
5. WHAT IS IDAP?
PURDUE DATA SCIENCE & ANALYTICS
• IDAP serves as a gray box for a wide variety of data sources:
o Traditional: Student Information
o Ancillary Data Sources: Degree Requirements, Student Activities
o New: Network Logs, Card Swipes, LMS Clickpath
• Also houses a modelling pipeline with several production models
o At Risk (<2.5 GPA first semester)
o Course GPA (C or worse in a course)
o Yield (Which students will attend)
• Faculty Research
o Secure high-compute server (Raiden)
o Pulls data from Greenplum
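The production targets listed on this slide are threshold rules on student outcomes. A minimal sketch of how two of them could be expressed as labels follows. The function names, the letter-grade scale, and the handling of plus/minus grades are assumptions for illustration, not IDAP's actual definitions.

```python
# Hypothetical label definitions for two of the targets named on the slide.
# The grade scale and the treatment of plus/minus grades are assumptions.

AT_RISK_GPA_CUTOFF = 2.5  # from the slide: <2.5 GPA in the first semester

# Letter grades from best to worst (assumed scale).
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]


def is_at_risk(first_semester_gpa: float) -> bool:
    """True when the first-semester GPA falls below the 2.5 cutoff."""
    return first_semester_gpa < AT_RISK_GPA_CUTOFF


def is_c_or_worse(letter_grade: str) -> bool:
    """True when the grade is a C or anything ranked below it."""
    return GRADE_ORDER.index(letter_grade) >= GRADE_ORDER.index("C")


print(is_at_risk(2.3), is_at_risk(2.5))          # True False
print(is_c_or_worse("C+"), is_c_or_worse("C-"))  # False True
```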
7. MODEL BUILDING AND SCORING PIPELINE
[Diagram: modeling pipeline]
• Inputs: structured and unstructured data sources; refreshed data (incoming daily/weekly/monthly updates)
• Feature generation pipeline, producing three feature sets:
o Static features
o Static + time-sensitive LMS features
o Static + time-sensitive LMS + network + card logs features
• Candidate models:
o In-database parallel grid-search (XGBoost)
o MADlib Logistic Regression
o Sklearn AdaBoost
o Sklearn RandomForest
• Model selection, then serialize to disk
• Scoring results:
o Student ID
o Feature names, values, importance scores
o Predictions
• Results sent to end-users, cleared by IDAP Data Scientist
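The grid-search, model-selection, and serialize-to-disk steps of the pipeline can be sketched in a few lines. This is only an illustration under stated assumptions: a toy one-parameter threshold rule stands in for the XGBoost, MADlib, and Sklearn candidates, the data is made up, and names such as `grid_search` and `serialize` are hypothetical rather than part of IDAP's code.

```python
import os
import pickle
import tempfile


def predict(params, rows):
    """Toy stand-in model: flag a row when its first feature exceeds the cutoff."""
    return [1 if row[0] > params["cutoff"] else 0 for row in rows]


def f1_positive(y_true, y_pred):
    """F1 score of the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def grid_search(param_grid, rows, labels):
    """Score every parameter setting on validation data and return the best one."""
    scored = [(f1_positive(labels, predict(p, rows)), p) for p in param_grid]
    best_score, best_params = max(scored, key=lambda pair: pair[0])
    return best_params, best_score


def serialize(params, score, path):
    """Write the selected parameters and their validation score to disk."""
    with open(path, "wb") as fh:
        pickle.dump({"params": params, "valid_f1": score}, fh)


# Made-up validation data: one feature per row, label 1 = positive outcome.
valid_rows = [[0.15], [0.3], [0.7], [0.95]]
valid_labels = [0, 0, 1, 1]
param_grid = [{"cutoff": 0.1}, {"cutoff": 0.5}, {"cutoff": 0.9}]

best_params, best_score = grid_search(param_grid, valid_rows, valid_labels)
model_path = os.path.join(tempfile.gettempdir(), "selected_model.pkl")
serialize(best_params, best_score, model_path)
print(best_params, best_score)
```

In the real pipeline the grid search runs in-database and in parallel; the sketch only shows the select-then-persist shape of the step.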
9. THE SITUATION
TOO MANY STUDENTS, TOO LITTLE HOUSING
• Admission to Purdue in Fall 2018 hit historic
highs
o 8,357 students in the entering class, on top of
historic high enrollment each of the two prior
years
o Nearly 800 more new students than in Fall 2017
• Housing is not being built quickly enough to keep
up with demand
• Hundreds more students than usual might be
put into temporary and off-campus leased
housing at the start of semester
10. THE SITUATION
TOO MANY STUDENTS, TOO LITTLE HOUSING
• Typical Problem Amplified
o While temporary housing is normal at many universities, the need goes up with unexpected enrollment
• Limited, Non-Ideal Space
o Temporary space is limited, and it is not ideal for learning
• Off-Campus Leased Housing
o Beyond temporary space, Purdue also leases space to house excess returning students
o This space is not campus-adjacent, so it is not ideal either, and it is also limited
11. THE SOLUTION
BUILD A MODEL IN XGBOOST USING GREENPLUM
• Build a model, quickly
o The decision was made to try to predict which people coming to Purdue’s housing system would not show up
o The goal: reduce the number of student move disruptions from temporary housing, and maximize on-campus housing space
o From concept to execution, there were fewer than two months in which to create the model and implement its results
• Blending data
o Housing data was not in the Greenplum system and needed to be pulled in so it could be blended with everything needed for the model
• Two Models
o Divided into two models, for two fundamentally different groups: new students and those returning to campus housing
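The blending and two-model split described on this slide can be pictured as a join followed by a partition. The sketch below uses plain Python dictionaries; every field name (`student_id`, `returning`, `credit_hours`) and every record is hypothetical, and the actual work was done inside Greenplum rather than in application code.

```python
def blend(housing_rows, student_features):
    """Inner-join housing contracts to student features on student_id."""
    blended = []
    for row in housing_rows:
        features = student_features.get(row["student_id"])
        if features is not None:
            blended.append({**row, **features})
    return blended


def split_cohorts(rows):
    """Separate new students from those returning to campus housing."""
    new = [r for r in rows if not r["returning"]]
    returning = [r for r in rows if r["returning"]]
    return new, returning


# Made-up records; every field name here is hypothetical.
housing_rows = [
    {"student_id": 1, "returning": False},
    {"student_id": 2, "returning": True},
    {"student_id": 3, "returning": True},  # no matching features, dropped by the join
]
student_features = {
    1: {"credit_hours": 15},
    2: {"credit_hours": 12},
}

new_students, returning_students = split_cohorts(blend(housing_rows, student_features))
print(len(new_students), len(returning_students))  # 1 1
```

Each cohort would then feed its own model, since the signals available for a new student differ from those for a returning one.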
12. THE SOLUTION
BUILD A MODEL IN XGBOOST AND GREENPLUM
• First Iteration
o The model was put together mostly using features from prior student success models
• Performance & Usage
o Initial performance allowed us to provide a list of the students most likely to cancel, sorted by likelihood
o This list was used to phone these students and confirm their intent to use campus housing
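The sorted call list described on this slide amounts to ranking students by predicted cancellation probability and taking the top of the list. A minimal sketch, assuming the model's scores are available as a mapping from student ID to probability (the IDs, scores, and function name are made up):

```python
def build_call_list(scores, top_n):
    """Return the top_n student IDs ordered by descending cancel probability."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [student_id for student_id, _ in ranked[:top_n]]


# Made-up scores: student ID -> predicted probability of cancelling.
scores = {"S1": 0.12, "S2": 0.81, "S3": 0.47, "S4": 0.93}
print(build_call_list(scores, top_n=2))  # ['S4', 'S2']
```

Working from a ranked list rather than a hard yes/no prediction lets staff call as far down the list as time allows.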
Returning Students – Version 1
Cancelled   Precision   Recall   F-Score   Support
0           0.932       0.775    0.846     1833
1           0.225       0.538    0.317     223

New Students – Version 1
Cancelled   Precision   Recall   F-Score   Support
0           0.997       0.956    0.976     2765
1           0.463       0.929    0.618     113
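The F-Score values in the tables above are consistent with the usual F1 definition, the harmonic mean of precision and recall. A short sketch that recomputes the "cancelled = 1" rows from the reported precision and recall (the function name is illustrative):

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the "cancelled = 1" rows of the Version 1 tables.
print(round(f_score(0.225, 0.538), 3))  # returning students: 0.317
print(round(f_score(0.463, 0.929), 3))  # new students: 0.618
```

Because cancellations are the rare class, the row for cancelled = 1 says far more about usefulness than the row for cancelled = 0.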
13. INITIAL SUCCESS
MORE EFFICIENT SPACE USAGE
• Typical Year
o Typically, rooms in the Union hotel are reserved as temporary space
o Additionally, other temporary spaces usually house students until after October break
• Fall 2018 Temporary Housing
o Partly because students with a high probability of cancelling were called, the strain on temporary housing actually decreased
o Not only were all students out of temporary housing by October break, but rooms at the PMU were released prior to the start of classes
14. SUCCESS WITH ISSUES
USEFUL, NEEDS IMPROVEMENT
• The model missed a cohort of students who did not stay at Purdue
• The model is highly uncertain about many students
• This was due, in part, to poor definitions of ‘returner’ and ‘cancel’ in the model; the definitions needed to be fixed and the model retrained
15. RETRAINING
ADDITIONAL FEATURE BUILD
• Tuning & New Features
o New features and further tuning of the model’s parameters raised the F-score for cancelling returning students from 0.317 to 0.577
• Impact
o Far more accurate model, fewer calls required to reach the students intending to cancel
Returning Students – Version 2
Cancelled   Precision   Recall   F-Score   Support
0           0.961       0.938    0.949     1880
1           0.524       0.642    0.577     201

New Students – Version 2
Cancelled   Precision   Recall   F-Score   Support
0           0.996       0.965    0.980     2736
1           0.555       0.917    0.691     132
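The "fewer calls" claim can be checked with simple arithmetic on the reported metrics for returning students. If every flagged student is called, roughly 1 / precision calls are needed per true canceller reached, and the number of flagged students is about support × recall / precision. The sketch below assumes exactly that calling policy, and the two versions were evaluated on different hold-out sets, so the comparison is indicative only.

```python
def calls_per_canceller(precision):
    """Expected calls needed to reach one true canceller among flagged students."""
    return 1 / precision


def flagged_students(support, recall, precision):
    """Approximate number of students flagged as likely to cancel.

    True positives = support * recall; flagged = true positives / precision.
    """
    return support * recall / precision


# "Cancelled = 1" rows for returning students, taken from the two tables.
v1 = {"precision": 0.225, "recall": 0.538, "support": 223}
v2 = {"precision": 0.524, "recall": 0.642, "support": 201}

for name, m in (("v1", v1), ("v2", v2)):
    print(
        name,
        round(calls_per_canceller(m["precision"]), 2),
        round(flagged_students(m["support"], m["recall"], m["precision"])),
    )
```

Under these assumptions Version 1 needs about 4.4 calls per canceller reached and Version 2 about 1.9, while Version 2 also catches a larger share of cancellers.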
17. NEXT STEPS
FUTURE TUNING & USAGE
• Post-hoc Data Recording
o In Fall 2019, Housing will record which students they call and when, so that we can better match the calls with the actual results when cancellations come in after August
• Potential Future Retraining
o New housing is being built on campus to keep up with the growing population. Once it is online, cancellation patterns may change and require retraining
o Otherwise, keeping up with post-hoc analysis of results should indicate when retraining is next necessary
o Due to the setup of the model in Greenplum, retraining is quick and easy!
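The post-hoc analysis planned here could be as simple as matching the call log against the cancellations that later arrive and watching the share of called students who cancelled. The sketch below is one hypothetical way to do that. The field names, IDs, and dates are made up, and the 0.10 tolerance is an arbitrary illustrative value, not a threshold used at Purdue.

```python
def call_hit_rate(call_log, cancelled_ids):
    """Share of called students who went on to cancel."""
    called = {entry["student_id"] for entry in call_log}
    if not called:
        return 0.0
    return len(called & cancelled_ids) / len(called)


def needs_retraining(hit_rate, baseline, tolerance=0.10):
    """Flag retraining when the hit rate falls well below the baseline."""
    return hit_rate < baseline - tolerance


# Made-up call log and outcomes; field names and IDs are hypothetical.
call_log = [
    {"student_id": "S1", "called_on": "2019-07-15"},
    {"student_id": "S2", "called_on": "2019-07-16"},
    {"student_id": "S3", "called_on": "2019-07-16"},
    {"student_id": "S4", "called_on": "2019-07-18"},
]
cancelled_ids = {"S2", "S4", "S9"}

rate = call_hit_rate(call_log, cancelled_ids)
# Baseline: Version 2 precision for cancelling returning students.
print(rate, needs_retraining(rate, baseline=0.524))  # 0.5 False
```

Recording the call date alongside the student ID also makes it possible to separate cancellations that came before a call from those that came after it.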