2. About the Speaker
Vinayak Joglekar
Founder & CTO, Synerzip
- Hires and mentors Agile software development teams
- Over 3 decades of experience in software product development
- Hands-on practitioner of Agile and Lean techniques
- Speaker at the 2008 Agile Conference in Toronto
- Hands-on experience in QA automation, DevOps, UX design and CD
- Blogs about trends in software development
LinkedIn profile: https://www.linkedin.com/in/vinayak-joglekar-b95329/
Confidential
3. Problem statement
• Recruiters in software companies who hire professionals with 3 to 5 years of experience in popular technologies such as JavaScript are inundated with resumes. Build an app that magically parses and ranks hundreds of resumes in a jiffy.
• Build an engaging UX so that recruiters return time and again to use the app. The app should empower recruiters by fitting snugly into their routine tasks.
4. Hook Model to build an engaging UX
• External trigger: the recruiter receives a job requirement
• Action: about 20 to 50 freshly sourced resumes are submitted
• Variable reward: download an Excel tracker with all the resumes ranked in order of suitability
• Investment: repeat use of the application gives more accurate results
Concept courtesy: Nir Eyal
5. Smooth experience - Action
• Point to a folder containing all the resumes received
• The parser extracts important information such as contact details, education, technical expertise and relevant project experience in less than 1 minute
6. Execution Challenges- GATE Server
• The GATE server is single-threaded. How do we build a web application on top of it?
• The GATE server crashes after parsing a few hundred resumes. Ops need to restart it to bring the service back up.
• A rogue resume can take very long to parse and eventually bring down the GATE server.
• Each resume takes a few seconds to parse, so parallel processing is needed to speed up parsing.
7. Challenge-GATE is single threaded
[Diagram labels: GATE + Ontology for User 1; GATE + Ontology for User 2; User 1, User 2, … User n; Web Server; Singleton; Queue]
8. Challenge-Parsing is inherently slow
[Diagram labels: GATE Document Parser; Preprocessing (DOCX to HTML); Timeout 60 seconds; Timeout 90 seconds; Error Message; Input Queue; Output Queue; Dead Letter Queue; Pods, each marked P and G; Instance 1; Instance 2]
9. Solution
• Create a singleton GATE server that works on multiple requests serially by using RabbitMQ
• Create multiple instances of this server by putting each one in a Docker container
• Use AWS to host and K8s to orchestrate
• Circuit breaker for consistent performance
• A container is killed if it times out. The document being processed is put on the dead letter queue.
• Containers restart after servicing a fixed number of requests
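The worker behaviour in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the actual service: in-process `queue.Queue` objects stand in for RabbitMQ, returning a status string stands in for the container being killed or restarted by the orchestrator, and the names `run_worker`, `parse`, `timeout_s` and `max_requests` are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_worker(parse, input_q, output_q, dead_letter_q,
               timeout_s=90.0, max_requests=100):
    """Serve documents one at a time and return the reason the worker stopped."""
    pool = ThreadPoolExecutor(max_workers=1)  # one parse at a time, like the singleton server
    served = 0
    try:
        while served < max_requests:
            try:
                doc = input_q.get_nowait()
            except queue.Empty:
                return "idle"
            future = pool.submit(parse, doc)
            try:
                output_q.put(future.result(timeout=timeout_s))
            except FutureTimeout:
                # Rogue document: park it and stop, so the caller can replace this worker.
                dead_letter_q.put(doc)
                return "timed_out"
            except Exception:
                # Assumption: documents that fail to parse are also parked.
                dead_letter_q.put(doc)
            served += 1
        # Fixed number of requests served: ask to be restarted with a fresh process.
        return "recycle"
    finally:
        pool.shutdown(wait=False)
```

A supervisor (K8s in the slides) would start a fresh worker whenever the status is "timed_out" or "recycle", which keeps a slow or leaking parser from degrading the whole service.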
10. Rewarding experience
• Quickly rank the resumes in order of suitability. Suitability score = sum of the weighted scores of criteria such as education, technical expertise, relevant experience, proximity, notice period and expected compensation.
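As a rough illustration of the weighted-sum score, the sketch below ranks candidates by the sum of weight times per-criterion score. The function names, the dictionary layout and any numeric weights passed in are assumptions for illustration only; the criteria names come from the slide.

```python
def suitability(scores, weights):
    """Weighted sum of per-criterion scores; a missing criterion counts as 0."""
    return sum(w * scores.get(criterion, 0.0) for criterion, w in weights.items())


def rank(candidates, weights):
    """Return candidate names ordered from most to least suitable."""
    return sorted(candidates,
                  key=lambda name: suitability(candidates[name], weights),
                  reverse=True)
```

For example, `rank({"A": {"education": 1, "relevant experience": 0}, "B": {"education": 0, "relevant experience": 1}}, {"education": 2, "relevant experience": 3})` puts B ahead of A, because relevant experience carries the larger weight.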
11. Challenge- Missing Information
All the terms were getting correctly annotated, but they were not getting properly grouped under the correct heading.
[Figure: typical resume layout, top to bottom]
• Name
• Contact Details
• Objective: Target Designation/Role
• Overview: Experience
• Skill Set: List of Technologies
• Education (repeated rows): Institute, Degree, Branch, Year
• Experience (repeated rows): Company, Designation, From, To
• Project 1, Project 2: Client, Project Description, From, To, URL, Responsibilities, Technologies
• Awards and Certifications
• Sports and Hobbies
• Footnote: Address
12. Solution
• We had 3 annotators manually annotate more than 1000 resumes
• We modeled this as an n-way classification problem, with each heading as a class and the terms inside the headings and their relative location in the resume as the features
• We achieved 98% accuracy and 95% recall
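The slides do not say which classifier was used. As one possible illustration of the n-way setup, the sketch below uses a small multinomial naive Bayes model (a swapped-in technique) whose features are the terms in a block plus a coarse bucket for the block's relative position in the resume. `HeadingClassifier`, `features` and the four-bucket position scheme are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(terms, rel_pos):
    """Bag of lowercased terms plus a position bucket (rel_pos: 0.0 = top, 1.0 = bottom)."""
    return [t.lower() for t in terms] + [f"pos_{min(int(rel_pos * 4), 3)}"]


class HeadingClassifier:
    def __init__(self):
        self.class_counts = Counter()            # examples seen per heading
        self.feat_counts = defaultdict(Counter)  # feature counts per heading
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (terms, rel_pos, heading) from annotated resumes."""
        for terms, rel_pos, heading in examples:
            self.class_counts[heading] += 1
            for f in features(terms, rel_pos):
                self.feat_counts[heading][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, terms, rel_pos):
        total = sum(self.class_counts.values())

        def log_prob(heading):
            # Laplace smoothing so unseen features do not zero out a class.
            denom = sum(self.feat_counts[heading].values()) + len(self.vocab)
            lp = math.log(self.class_counts[heading] / total)
            for f in features(terms, rel_pos):
                lp += math.log((self.feat_counts[heading][f] + 1) / denom)
            return lp

        return max(self.class_counts, key=log_prob)
```

Each manually annotated resume contributes one training example per block, so the model has one class per heading, matching the framing on the slide.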
13. Challenge- weightages???
• Ranking depended largely on the weightages assigned to suitability criteria such as education, relevant experience, notice period, compensation and the technical skills needed
• The importance assigned to these factors depended on the seniority of the position, the company and specific project needs
14. Solution
We collected historical information about resumes that were shortlisted for interview in a company for specific projects, and modeled it as a logistic regression problem in which the vector of weightages is theta in the sigmoid function h(x) = 1 / (1 + e^(-theta·x))
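A minimal sketch of how such weightages could be learned, assuming each historical resume is a vector of criterion scores with a leading 1.0 for the bias term, and a label of 1 if it was shortlisted and 0 otherwise. The function names, the plain batch gradient descent and the default learning rate and epoch count are illustrative assumptions, not the team's actual training procedure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weightages(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns theta."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            # Prediction error for this resume: h(x) - y
            err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


def shortlist_probability(theta, x):
    """Estimated probability that a resume with scores x gets shortlisted."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))
```

The learned theta plays the role of the weightage vector: criteria that historically separated shortlisted resumes from the rest receive larger weights, replacing hand-assigned values.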
15. Challenge
• Resumes will contain new words: technologies and technical terms that are unrecognizable
• Candidates will learn new skills that do not exist today (big data analytics, cloud computing and mobile programming didn’t exist 5 years back)
• Recruiters will feel powerless and bored with the application if they can’t teach it to work smarter. They want to achieve mastery.
16. Solution
• Created a “training set” from manually annotated resumes. More resumes processed = bigger training set = smarter parsing of new resumes.
• Offset locations in the training set are modeled as features, and annotations are modeled as their values
• A CNN built with TensorFlow automatically annotates resumes, which empowers the user
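One way to picture the training set described above: each manual annotation is a character span with a label, and each span becomes a row whose offsets are the features and whose annotation is the target value. The row layout and the name `build_training_rows` are assumptions for illustration; the slides do not show the actual data format or the CNN itself.

```python
def build_training_rows(text, annotations):
    """annotations: list of (start, end, label) character offsets into text."""
    rows = []
    for start, end, label in sorted(annotations):
        rows.append({
            "start": start,                   # absolute offset of the span
            "end": end,
            "rel_start": start / len(text),   # position relative to resume length
            "term": text[start:end],
            "label": label,                   # the annotation is the target value
        })
    return rows
```

Every resume a recruiter corrects adds more rows, which is how continued use enlarges the training set.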
17. Challenge
• The most suitable candidate according to the suitability score and quiz score doesn’t always get selected. Sometimes No. 2 or No. 3 is found to be better than No. 1.
• Suitability scores are calculated using weightages assigned to various attributes. These weightages are based on a “hunch”.
18. Smooth experience - Action
As the frequency of use and the number of users increase:
• More terms get added to the ontology and fewer terms need manual annotation
• Accuracy and recall in parsing headings improve
• Weightages used in computing the suitability score become more accurate
19. Un-annotated terms reduce
As more words get added to the ontology, more than 95% of the words are found in the ontology. The drop in unrecognized terms is exponential.
[Chart: number of unrecognized words (y-axis) vs. number of resumes parsed (x-axis)]
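The feedback loop behind this chart can be sketched as below: each resume is checked against the ontology, unknown terms are counted, and those terms are then added so later resumes recognize them. Treating the ontology as a plain set of lowercase strings and adding every unknown term automatically (standing in for the recruiter's manual annotation) are simplifying assumptions, and `parse_with_feedback` is a hypothetical name.

```python
def parse_with_feedback(resumes, ontology):
    """Return the number of unrecognized terms per resume, growing the ontology in place."""
    unrecognized_counts = []
    for terms in resumes:
        unknown = {t.lower() for t in terms} - ontology
        unrecognized_counts.append(len(unknown))
        ontology |= unknown  # stands in for manual annotation of the new terms
    return unrecognized_counts
```

With an ontology that starts as `{"java"}`, the resumes `["Java", "React", "Docker"]`, `["React", "Docker", "Kafka"]` and `["Kafka", "Java"]` yield 2, 1 and 0 unrecognized terms in turn.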
20. Accuracy and Recall improve
As more resumes are parsed, with corrections done manually wherever required, they get added to the training set, and recall and accuracy improve.
[Chart: accuracy and recall in % (y-axis) vs. number of resumes parsed (x-axis); series: Accuracy, Recall]
21. Conclusion- hi-tech for engaging UX
• Machine learning models become smarter with continued use, which keeps the users invested in the application. Past history of usage is the investment in this case.
• Cloud-native containerized microservices provide an opportunity to build magically fast, consistent and reliable responses