On 2 June, during Textkernel's conference Intelligent Machines and the Future of Recruitment, Colin Lee presented his work on the automated preselection of applicants. For this research he used Connexys data on 441,768 applicants at 48 companies, in combination with Textkernel parsing and normalization, to develop an algorithm that predicts which applicants get invited to a job interview. Colin explains the logic behind his approach and discusses potential future applications.
2. Intuition’s Fall from Grace
Algorithms and data in (Pre-)Selection
- A summary of a study on the prediction of prescreening decisions in Dutch companies
- An explanation of the logic behind the study’s design
- And a discussion of the opportunities highlighted by the study
3. Colin Lee, Ph.D.
Data Collection
Study Design
- 441,769 real applicants
- 48 companies
- >9 industries
4. Colin Lee, Ph.D.
Data Analysis
Study Design
Use of conventional methods problematic:
- Importance of factors varies across jobs, companies, and industries
- Distribution of data is uneven (i.e., skewed) across occupations
- Reliability of data varies across companies
Issues are addressed by the use of “Synthetic Validity”:
- Estimates are based on job characteristics instead of the specific job. This allows:
  - the creation of vacancy-specific equations, while learning from all other vacancies
  - separate adjustments for each company or industry
  - estimates for jobs with very little or no prior data
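The idea of building a vacancy-specific equation from job characteristics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the characteristic names, the applicant attributes, and the weights are hypothetical and do not come from the study, and the study's actual estimation procedure is not shown here.

```python
from typing import Dict

# Hypothetical component weights: how much each applicant attribute matters
# for each job characteristic. The numbers are invented for illustration.
COMPONENT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "customer_contact": {"experience_years": 0.2, "relevant_skills": 0.5},
    "data_analysis": {"experience_years": 0.4, "relevant_skills": 0.3},
}


def vacancy_equation(job_profile: Dict[str, float]) -> Dict[str, float]:
    """Combine component weights into one equation for a single vacancy.

    job_profile maps each job characteristic to its share of the job
    (shares are assumed to sum to 1).
    """
    equation: Dict[str, float] = {}
    for characteristic, share in job_profile.items():
        for attribute, weight in COMPONENT_WEIGHTS[characteristic].items():
            equation[attribute] = equation.get(attribute, 0.0) + share * weight
    return equation


def score(applicant: Dict[str, float], equation: Dict[str, float]) -> float:
    """Weighted sum of the applicant's attributes under the vacancy equation."""
    return sum(w * applicant.get(attr, 0.0) for attr, w in equation.items())


# A vacancy with no prior data of its own still gets an equation,
# because it is described by characteristics that other vacancies share.
new_job = {"customer_contact": 0.75, "data_analysis": 0.25}
equation = vacancy_equation(new_job)
print(equation)
print(score({"experience_years": 4.0, "relevant_skills": 2.0}, equation))
```

Because the weights are attached to characteristics rather than to job titles, a new occupation only needs a profile of characteristics to receive an estimate.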
6. Colin Lee, Ph.D.
Data Analysis
Study Design
The method is not new…
- Synthetic validity has been around since the 1950s (Lawshe, 1952)
- Initially intended for jobs in companies with too little data to run a validity test of their assessment batteries
… but has a newfound purpose in the era of “Big Data”
- The latest improvements, incorporated by Piers Steel and colleagues (2006), factor in the reliability of the data, making poor data quality less problematic
- With enough data, Synthetic Validity allows a “Full Scale” selection system to be built, which works for all jobs and can incorporate large numbers of predictors
Piers Steel
Professor at the University of Calgary
7. Colin Lee, Ph.D.
Accuracy of Prediction
Study Results
- Algorithms trained on the first 90% of applicants per company, tested on the remaining 10%
- Cover letters could not be analyzed
- Results suggest that the approach could allow an accuracy of > 80% when cover letters are included in the analysis
With cover letters (% correct):
- Not invited: 68.78%
- Invited: 69.45%
Without cover letters (% correct):
- Not invited: 80.27%
- Invited: 82.51%
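The evaluation setup described above (train on the first 90% of applicants per company, test on the remaining 10%, report the percentage correct separately for invited and not-invited applicants) could be sketched roughly as follows. The field names `company`, `applied_at`, and `invited` are hypothetical, and the sketch assumes "first" means earliest by application time; the study's own pipeline is not reproduced here.

```python
from collections import defaultdict


def split_per_company(applications, train_share=0.9):
    """Split each company's applications chronologically into train and test."""
    by_company = defaultdict(list)
    for app in applications:
        by_company[app["company"]].append(app)

    train, test = [], []
    for apps in by_company.values():
        apps.sort(key=lambda a: a["applied_at"])  # earliest applications first
        cutoff = int(len(apps) * train_share)
        train.extend(apps[:cutoff])
        test.extend(apps[cutoff:])
    return train, test


def accuracy_per_class(actual, predicted):
    """Share of correct predictions, computed separately for each actual label."""
    correct, total = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        total[a] += 1
        correct[a] += int(a == p)
    return {label: correct[label] / total[label] for label in total}


# Toy data: 10 applications at company A and 20 at company B.
applications = [
    {"company": "A", "applied_at": day, "invited": day % 3 == 0} for day in range(10)
] + [
    {"company": "B", "applied_at": day, "invited": day % 4 == 0} for day in range(20)
]
train, test = split_per_company(applications)
print(len(train), len(test))
```

Splitting by time rather than at random mimics how such a system would be used in practice: it learns from past applicants and is judged on later ones.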
8. Colin Lee, Ph.D.
Key Factors
Study Results
Category / Variable                                       Average Relative Importance
Demographics                                              53.26%
  Age                                                     49.07%
  Gender                                                  0.84%
  Dutch/Non-Dutch                                         0.48%
  Registered Relationship                                 0.54%
  Distance from Company                                   2.33%
Qualifications                                            27.29%
  Experience Years                                        19.33%
  Relevance of experience                                 5.92%
  Undereducated                                           0.36%
  Overeducated                                            0.36%
  Number of relevant skills                               1.32%
Context                                                   18.73%
  Number of other applicants                              5.75%
  External applicant                                      4.25%
  Average percentage of applicants invited by company     3.23%
  Vacancy rate for occupation                             2.81%
  Applied after target was reached                        2.69%
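As a quick arithmetic check on the table, each category figure equals the sum of its variables. The short sketch below only re-adds the numbers shown on the slide; the dictionary layout is an assumption made for illustration.

```python
# Category total and the variable-level percentages listed under it.
IMPORTANCE = {
    "Demographics": (53.26, [49.07, 0.84, 0.48, 0.54, 2.33]),
    "Qualifications": (27.29, [19.33, 5.92, 0.36, 0.36, 1.32]),
    "Context": (18.73, [5.75, 4.25, 3.23, 2.81, 2.69]),
}

for category, (total, parts) in IMPORTANCE.items():
    # Round to two decimals to avoid floating-point noise in the comparison.
    matches = round(sum(parts), 2) == total
    print(f"{category}: listed {total}%, summed {round(sum(parts), 2)}%, match={matches}")

overall = round(sum(total for total, _ in IMPORTANCE.values()), 2)
print(f"All categories together: {overall}%")
```

The three categories together account for 99.28%, so the listed variables cover nearly all of the reported relative importance.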
9. Colin Lee, Ph.D.
Conclusion
Study Discussion
Novel Big Data technologies, including Connexys ATS interoperability and Textkernel CV parsing and normalization, in combination with Synthetic Validity matching methods, enable powerful matching technologies for the job market.
Leveraging these technologies and data could save time in preselection:
- Spend less time on most likely and least likely candidates
- Focus on boundary cases
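One way to picture this time saving is a simple triage on the predicted probability of an invitation. The thresholds of 0.2 and 0.8 and the function name `triage` are hypothetical choices for illustration, not values from the study.

```python
def triage(probability: float, low: float = 0.2, high: float = 0.8) -> str:
    """Sort an applicant into a review bucket based on predicted invitation probability."""
    if probability >= high:
        return "likely invite"   # needs only a quick confirmation
    if probability <= low:
        return "likely reject"   # needs only a quick confirmation
    return "boundary case"       # deserves the recruiter's full attention


for p in (0.95, 0.50, 0.05):
    print(p, triage(p))
```

Where the thresholds sit is a policy decision: widening the middle band sends more applicants to manual review and saves less time.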
Some (slightly disconcerting) insights into the weights of the factors:
- Age accounts for nearly 50% of the relative importance
- Experience (years plus relevance) for about 25%
- Skills and education are surprisingly low in weight
10. Colin Lee, Ph.D.
In Discussion...
[…] we now have a much better understanding
of who gets invited to a job interview. The next
step is to see whether the people invited are
the ones who should be invited.
Study Discussion
11. Colin Lee, Ph.D.
Future Applications
- Several options:
  - Predicting Job performance (i.e., how would the employee be rated by his or her manager?)
  - Create a valuation model. For example:
    Revenue × Time to Turnover − Hiring Costs − Compensation = Value of Candidate
- However:
  - Data from different sources needs to be linked
  - Some parts of the valuation are challenging to estimate (e.g., for many jobs revenue is hard to establish)
  - Legal issues
  - Some coordination within company required
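The valuation model above can be written as a small function. The units are assumptions made for illustration: revenue is taken per year, time to turnover in years, and compensation as the total paid over that period. The example figures are invented.

```python
def candidate_value(revenue_per_year, years_to_turnover, hiring_costs, compensation):
    """Revenue * Time to Turnover - Hiring Costs - Compensation.

    Assumes compensation is the total paid over the whole tenure,
    in the same currency as the other amounts.
    """
    return revenue_per_year * years_to_turnover - hiring_costs - compensation


# Invented example: 100,000 revenue per year, 3 years until turnover,
# 10,000 in hiring costs, 180,000 total compensation.
value = candidate_value(100_000, 3, 10_000, 180_000)
print(value)
```

The formula is trivial to compute; the hard part, as the slide notes, is obtaining credible inputs such as revenue per employee.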
Study Discussion
14. Colin Lee, Ph.D.
Intermediate Solution
1. Get estimates of variable weights from prior research:
- metaBUS (Bosco, Uggerslev, Steel): Curating all reported findings in the field of applied psychology to facilitate meta-analytic research (LAUNCHES TODAY! @ https://beta.metabus.org)
- ReNotate (Lee, Felps, Frasincar, Kobayashi, Kismihók, Mol): Using the highlights and annotations of people reading academic publications to index and curate the findings in those publications (in development)
2. Ask Subject Matter Experts to estimate the variation between occupations
Study Discussion
15. Colin Lee, Ph.D.
Intermediate Solution
- Let’s try this now!
- Well… maybe a simplified version.
- Instructions
- Go to the site below (or use the QR code)
http://synthetic-validity.com/tk15
- Fill out the questions
- And see if we can establish our own selection system.
Study Discussion
16. Colin Lee, Ph.D.
Synthetic Validity Promises
- Accurate selection systems
- Quickly built
- Relatively cheap once initialized
- Updated by a constant stream of data
- Predictions on new occupations
Study Discussion
17. THANK YOU
Colin Lee, Ph.D.
Haskayne School of Business, University of Calgary (Canada)
colin.lee1@ucalgary.ca
Editor's Notes
Algorithms and data
Name, where
Doctorate
Study on (pre-)selection
How I felt
Intuition’s fall from grace -> title inspired by Telegraaf
Summary of study
Logic and methods
Opportunities
440
48 comp
9
Used 90% of the first applicants to predict the selection decision of the final 10%
Conventional statistics
One equation
Some jobs data
Big data => not all data reliable
Synthetic validity
Cut job into activities
Predict on activities
Synthetic equation
Not general, job-specific
+ Company + Industry
Know how varies => Learn from other vacancies
Estimates without job-data
1950’s
Small companies with too little data
Developed into quest
Era of big data highly feasible
First step in study
Prescreening decision, not performance
ATS systems and text extraction
Trained on 90%
Tested on 10%
No cover letter, over 80%
With cover letter 69%
High considering