Abstract:
The ubiquitous adoption of social media has set high expectations for emergency services to serve the public online. However, the overload of help-seeking requests on social media challenges emergency services. This work introduces a human-AI collaboration framework to assist emergency services in effectively responding to serviceable online requests. In particular, this talk describes solutions to two problems in this framework. The first is how to mitigate potential human errors when annotators give feedback to the active learning model in the system, which we address with a technique inspired by psychological theory. The second is how to optimally select how many requests to present and how often, while accounting for the dynamic constraints of busy service personnel, which we address with an optimization technique.
Biography:
Hemant Purohit, Ph.D. is an assistant professor in the Department of Information Sciences & Technology at the Volgenau School of Engineering, George Mason University. His research interest is to design intelligent systems that augment human capabilities for real-time information processing at the workplace, particularly public services and NGOs, using methods of social & web mining, semantic computing, and human-AI collaboration. He applies this research in disaster informatics to assist communities toward resilience from natural hazards, societal crises (e.g., violence and stereotyping), and man-made crises including cyber attacks. Purohit has received many awards for his disaster informatics work, including the 2014 ITU Young Innovator award from the United Nations agency on information and communication technologies for an open-source technology concept for disaster management. He was an invited academic member of the DHS Science & Technology Directorate's subcommittee on the Social Media Working Group for Emergency Services and Disaster Management. He has given several talks, tutorials, and lectures on social computing for public services, as well as organized workshops. His work has been published in prestigious conferences and journals, and he currently serves on the editorial board of the Elsevier journal Information Processing & Management. His research is supported by various national and international agencies, including the U.S. National Science Foundation.
Contact:
hpurohit@gmu.edu | http://ist.gmu.edu/~hpurohit
Human AI Collaboration for Real-time Data Processing at Emergency Services, Guest Lecture, University of South Carolina
1. Human-AI Collaboration for Real-time Data
Processing Systems at Emergency Services
Hemant Purohit, Ph.D.
Humanitarian Informatics Lab (Human_Info_Lab)
Dept. of Information Sciences & Technology
Mar 5, 2021 @hemant_pt | hpurohit@gmu.edu
Grants:
• IIS #1657379, IIS #1815459
PhD Students:
Rahul Pandey & Yasas Senarath
Special Thanks:
Guest Lecture for CSCE 791: Seminar in Advances in Computing
University of South Carolina
2. Human-AI Collaboration for Next-Generation Emergency Services
Outline
¨ Summary of research thrusts
¨ Focus: social media & city services during crises
¨ Problem 1. Modeling human errors in human-in-the-loop AI
system design
¨ Problem 2. Human workload-aware serviceability ranking
system design
¨ Future directions
2
3. Broad Research Area
3
¨ Human-centered Computing
n Sebe (2010) – “integrating human sciences (e.g. social & cognitive)
and computer science (e.g. machine learning) methods
for the design of computing systems with a human focus,
which should consider the personal, social, and cultural contexts
in which such systems are deployed”
My focus on
Social Media Mining &
Semantic Text Analytics
for real-time processing
systems at city services
4. Lab’s Research Thrusts
4
¨ [Natural Crises] Social Media Mining for Crisis Communication
¤ Extracting actionable posts in a new crisis using transfer & active learning
¤ Ranking serviceable requests for help on social media
¤ Human workload-aware ranking system design
¨ [Societal Crises] Semantic Analysis for Human Behavior Modeling
¤ Defining intent behind harmful behaviors on social media: Stereotyping, Hate
¤ Mining malicious stereotypical behavior against women for negative social construction
¤ Identifying factors affecting diffusion and mitigation of hate and disinformation
¨ [Cyber Crises] Text Comprehension Modeling for Cyber Defense
¤ Manipulating text comprehensibility to generate deceptive content
¤ Estimating believability for deceptive content
5. Outline
¨ Summary of research thrusts
¨ Focus: social media & city services during crises
¨ Problem 1. Modeling human errors in human-in-the-loop AI
system design
¨ Problem 2. Human workload-aware serviceability ranking
system design
¨ Future directions
5
6. Current Work at EM Services
6
World
Events
EM Response &
Decision Making
Human Workers
Information
Processing
CURRENT
Reliable but
small-scale
Accurate but
high workload
Data
Collection
7. Motivation
7
When traditional call-for-help EM services are overwhelmed…
Source: https://www.usatoday.com/story/news/nation-now/2017/08/27/desperate-help-flood-victims-houston-turn-twitter-rescue/606035001/
Help
Offering
Help
Seeking
People resorting to Social Media
How to
discover?
8. Future of Work at EM Services
8
World
Events
EM Response &
Decision Making
Human Worker +
AI agent
Data
Collection
Information
Processing
FUTURE
Noisy but
Large-scale
Faster but
Inaccurate
9. Future of Work at EM Services
9
World
Events
EM Response &
Decision Making
Human Worker +
AI agent
Data
Collection
Information
Processing
FUTURE
How to improve
AI Mental Model
with Worker
Mental Model?
Noisy but
Large-scale
Faster but
Inaccurate
10. Matching Mental Models of Human & AI Agent: How to design a Human-in-the-loop AI system?
10
Human Worker +
AI Agent
Information Processing Tasks
1. Filtering
• Classification Problem
2. Prioritization
• Ranking Problem
..
1. Adapt to
classify relevant
items in a
data stream
2. Adapt to rank
top-K items for
human
intervention
11. Human-in-the-loop AI System Design: Awareness of Human Factors
11
Human Worker +
AI Agent
Information Processing Tasks
1. Filtering
• Classification Problem
2. Prioritization
• Ranking Problem
..
1. Active Learning
for Relevancy
Classification
Depends on
Annotator
Reliability
12. Human-in-the-loop AI System Design: Awareness of Human Factors
12
Human Worker +
AI Agent
Information Processing Tasks
1. Filtering
• Classification Problem
2. Prioritization
• Ranking Problem
..
2. Adaptive
Top-K ranking
alerts for
human
Affects
Human
Workload
1. Active Learning
for Relevancy
Classification
Depends on
Annotator
Reliability
13. Outline
¨ Summary of research thrusts
¨ Focus: social media & city services during crises
¨ Problem 1. Modeling human errors in human-in-the-loop AI
system design
¨ Problem 2. Human workload-aware serviceability ranking
system design
¨ Future directions
13
14. Human-in-the-loop AI System Design: Awareness of Human Factors
14
Human Worker +
AI Agent
1. Active Learning
for Relevancy
Classification
Depends on
Annotator
Reliability
What if
system
causes
human
errors?
15. Problem 1: How to Reduce Annotator Errors
15
Understanding potential human error causes using psychology theories
¨ Annotator burnout (Marshall and Shipman, 2013)
¨ Cognitive bias for answer positions (Burghardt, Hogg, and Lerman, 2018)
¨ Human error in execution (Reason, 1990; Zhang et al., 2004)
Mistakes: errors due to incorrect or incomplete knowledge (faulty heuristics)
Slips: errors in the presence of correct and complete knowledge (loss of activation)
[Pandey, Castillo, & Purohit, ASONAM’19]
16. Problem 1: How to Reduce Annotator Errors
16
Ø Hypothesis: Serial ordering of instances given to the human annotator may cause errors due to a mistake or slip.
Instance-class schedule: {c4, c1, c2, c3, c1, c3, c4, c1, c4, c1, c4, c2, c1, c4, c1, c2, c4, c2, c4, c3}
How likely is an annotator to make an error on this 3rd occurrence due to the potential decay in memory?
Motivation: Memory Decay, the Ebbinghaus Curve (Ebbinghaus, 2013)
[Pandey, Castillo, & Purohit, ASONAM’19]
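The memory-decay intuition behind this hypothesis can be sketched numerically. A minimal illustration: the retention formula R = exp(-t/S) is the classic Ebbinghaus form, but the stability constant and the use of instance gaps as "time" are assumptions for this sketch, not values from the study.

```python
import math

def retention(gap, stability=2.0):
    """Ebbinghaus-style retention R = exp(-t/S): t is the number of
    intervening instances since the class was last annotated; the
    stability constant S = 2.0 is a made-up value for illustration."""
    return math.exp(-gap / stability)

def occurrence_gaps(schedule, label):
    """Gaps (in instances) between successive occurrences of `label`."""
    positions = [i for i, c in enumerate(schedule) if c == label]
    return [b - a for a, b in zip(positions, positions[1:])]

schedule = ["c4", "c1", "c2", "c3", "c1", "c3", "c4", "c1", "c4", "c1",
            "c4", "c2", "c1", "c4", "c1", "c2", "c4", "c2", "c4", "c3"]

# The longer the gap since c3 was last seen, the lower the predicted
# retention, i.e. the higher the chance of a memory-decay error on the
# 3rd occurrence.
for gap in occurrence_gaps(schedule, "c3"):
    print(gap, round(retention(gap), 3))
```

In this schedule, c3's third occurrence comes 14 instances after its second, so its predicted retention is far lower than at the second occurrence.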
17. Problem 1: How to Reduce Annotator Errors
17
Type of Error | Potential Cause | Mitigation Approach
Slips induced by time constraints | Concept forgotten | Show reminder for concept examples
Mistakes induced by serial ordering | Concept not acquired yet or forgotten | Show frequent learning examples
Slips induced by serial ordering | Presence of a high-availability or a low-availability concept | Limit extreme divergence from base rate
Preliminary framework to study human factors in active learning
Ø Hypothesis: Serial ordering of instances given to the human annotator may cause errors due to a mistake or slip.
[Pandey, Castillo, & Purohit, ASONAM’19]
18. Problem 1: How to Reduce Annotator Errors
18
[Chart: average annotation error rate vs. class occurrence position (1st, 2nd, 3rd); p-value 0.005]
¨ Crowdsourcing annotation testing experiment
¤ 20 ordered instances per schedule with specific class positions
¤ 6 such schedules
¤ 10 human annotators per task
Annotation Schedule: {c4, c1, c2, c3, c1, c3, c4, c1, c4, c1, c4, c2, c1, c4, c1, c2, c4, c2, c4, c3}
Forgetting or memory-decay behavior exists.
[Pandey, Castillo, & Purohit, ASONAM’19]
19. Problem 1: How to Reduce Annotator Errors
¨ Sigmoid function to model error probability for an ordered instance
¨ Lab annotation testing
¤ 3 human annotators
¤ 800 ordered instances with the induced error
19
[Pandey, Castillo, & Purohit, ASONAM’19]
Forgetting or memory-decay behavior resembles a sigmoid function.
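The sigmoid error model can be written down directly. A sketch, where the midpoint and steepness parameters are hypothetical rather than the fitted values from the lab study:

```python
import math

def error_probability(gap, midpoint=5.0, steepness=1.0):
    """Sigmoid error model (parameters hypothetical): the probability
    that an annotator slips on an instance grows with `gap`, the number
    of instances since that class was last annotated, saturating near 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (gap - midpoint)))

# Error probability stays low for short gaps, crosses 0.5 at the
# midpoint, and saturates for long gaps.
for gap in (1, 5, 10):
    print(gap, round(error_probability(gap), 3))
```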
20. Problem 1: How to Reduce Annotator Errors
Generate Annotation Schedule: sample instances by
• minimizing the human memory-decay score, and
• maximizing streaming-model performance
1. Sample instances from the decision-boundary range of the active learning model
• Prediction probability in [30%, 70%]
2. Maintain a class label C_discarded for each interval, to avoid samples predicted with C_discarded labels
• Choose C_discarded based on whether
• the class is appearing too frequently
• the class is adding noise to the streaming model
20
Solution: Error-avoiding Annotation Schedule to augment both human & model performance
[Pandey, Castillo, & Purohit, ASONAM’19]
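The two sampling steps above can be sketched in a few lines. This is a simplification: the instance format, `predict_proba`, and the "most frequent recent class" rule for choosing C_discarded are assumptions, not the paper's exact procedure.

```python
from collections import Counter

def sample_annotation_schedule(pool, predict_proba, recent_labels,
                               low=0.30, high=0.70, batch=10):
    """Sketch of the error-avoiding sampler: keep instances near the
    decision boundary (predicted probability in [low, high]) and skip
    the class discarded for this interval."""
    counts = Counter(recent_labels)
    c_discarded = counts.most_common(1)[0][0] if counts else None
    candidates = [x for x in pool
                  if low <= predict_proba(x) <= high
                  and x["predicted_label"] != c_discarded]
    # Prefer the most uncertain instances (closest to the 0.5 boundary).
    candidates.sort(key=lambda x: abs(predict_proba(x) - 0.5))
    return candidates[:batch]

pool = [{"p": 0.90, "predicted_label": "c1"},   # too confident: skipped
        {"p": 0.55, "predicted_label": "c2"},
        {"p": 0.40, "predicted_label": "c1"},   # C_discarded class: skipped
        {"p": 0.65, "predicted_label": "c3"}]
picked = sample_annotation_schedule(pool, lambda x: x["p"],
                                    recent_labels=["c1", "c1", "c2"])
```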
21. Problem 1: How to Reduce Annotator Errors
Solution: Error-avoiding Annotation Schedule to augment both human & model performance
[Pandey, Castillo, & Purohit, ASONAM’19]
21
22. Problem 1: How to Reduce Annotator Errors
22
Human Error-Mitigating Sampling Algorithm outperforms in most cases!
[Pandey, Castillo, & Purohit, ASONAM’19]
23. Outline
¨ Summary of research thrusts
¨ Focus: social media & city services during crises
¨ Problem 1. Modeling human errors in human-in-the-loop AI
system design
¨ Problem 2. Human workload-aware serviceability ranking
system design
¨ Future directions
23
24. Human-in-the-loop AI System Design: Awareness of Human Factors
24
Human Worker +
AI Agent
2. Adaptive
Top-K ranking
alerts for
human Affects
Human
Workload
Can you
increase
human
control or
agency?
25. Problem 2: How to Create a Human Workload-aware Serviceability Ranking System
25
Image: https://blog.bufferapp.com/twitter-timeline-algorithm
BEYOND TIME & CREDIBILITY,
RANK BY
Serviceability
Can you
increase my
control or
agency?
End user
(Servicer)
26. Problem 2: Workload-aware Serviceability Ranking: Designing for human-AI collaboration
26
¨ Problem: how many request alerts to generate, and how often, for a human servicer to respond to (they add to the servicer's workload!)
High Recall can cause
more work for a
time-crunched
Servicer!
Low Recall can cause
missing important
requests for a
Servicer!
[Figure: quadrants of Recall (machine/system metric) vs. Workload (human metric): worst, ineffective, inefficient, and the desired optimal solution]
[Purohit, Castillo, Imran, & Pandey, WI’18]
27. Problem 2: Workload-aware Serviceability Ranking: Designing for human-AI collaboration
27
Streaming
Requests
tij corresponds to
time period - when
to check requests,
e.g., 10 mins.
Row k corresponds
to the selection of
top-k ranked
requests to check
Ranked Requests Performance Metrics Estimation Dynamic Policy Selection
A cell tuple
corresponds to the
attainable
(Recall, Workload)
Choose a config,
e.g., k=10, tij=30,
and (R,W) = (90,20)
[Purohit, Castillo, Imran, & Pandey, WI’18]
28. Problem 2: Workload-aware Serviceability Ranking: Approach summary
28
Serviceability
Categorization
and Ranking
Ranking-
Workload
(RW) Matrix
Generation
Optimal RW
Policy Selection
29. Problem 2. Serviceability Model: using Qualitative Knowledge extracted from domain guides
29
Explicit
Request
E(m)
Answerable
Query
A(m)
Sufficiently
Detailed
D(m)
Correctly
Addressed
C(m)
Serviceability(m) = f ( E(m), A(m), D(m), C(m) )
Explicitly asks for a
resource or service
Explicitly asks a question
that can be answered
Sent to organization or
person who could have
resources or provide the
service, an alarm, or
could answer questions
Specifying contextual
information: time (when),
location (where),
quantity (how much),
resource (which)
[Purohit, Castillo, Imran, &
Pandey, ASONAM’18]
30. Problem 2. Serviceability Model: Quantifying Characteristics
(Anonymized) Message | Explicit | Answerable | Addressed | Detailed
@account1 please, governor, post a phone # for specific info in our local areas | 4.3 | 4.3 | 3.3 | 3.7
@account2 is thr parking at McMahon for volunteer? | 4.0 | 5.0 | 5.0 | 5.0
@account3 how can I help | 1.3 | 4.3 | 4.3 | 1.0
@account4 Plz pray for these families | 1.7 | 1.0 | 1.0 | 1.0
@account5 been working in #LAFlood shelter, we actively monitor SM for feedback | 1.0 | 1.0 | 2.0 | 2.0
@account7 No matter where in the world ur followers live, you can donate from link Plz RT | 1.0 | 1.0 | 1.0 | 1.0
¨ E(m), A(m), C(m), D(m) : Likert Scale Functions [score:1-5]
30
Illustration Table: Average scores of Likert ratings by crowd annotators
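As a toy illustration of combining the four Likert scores into one serviceability value: the actual f is learned with a learning-to-rank model (next slide), so the uniform weights here are purely an illustrative stand-in.

```python
def serviceability(e, a, d, c, weights=(0.25, 0.25, 0.25, 0.25)):
    """Serviceability(m) = f(E(m), A(m), D(m), C(m)). The actual f is
    learned with a learning-to-rank model; this uniform weighted mean
    over the 1-5 Likert scores is only an illustrative stand-in."""
    return sum(w * s for w, s in zip(weights, (e, a, d, c)))

# Likert scores from the illustration table: the parking question ranks
# far above the "Plz pray for these families" message.
high = serviceability(4.0, 5.0, 5.0, 5.0)
low = serviceability(1.7, 1.0, 1.0, 1.0)
print(high, low)
```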
31. Problem 2. Serviceability Model: Learning-to-Rank System Design
31
[Purohit, Castillo, Imran, & Pandey, ASONAM’18]
32. Serviceability Model: Examples of resulting ranked requests
33
Ranked Messages by T (text)+I (Inferred) Modeling Scheme
TOP-2
[Sandy]
- @_USER_ please, governor, post a website or phone# where we can get
specific info for our local areas
- @_USER_ Queens trains aren’t being addressed at all. When can v expect any
service updates for the NQR trains?
BOTTOM-2
[Sandy]
- @_USER_ Romney not going2like that gov christie is being nice about Obama’s
leadership
- @_USER_ HILARIOUS! That’s much needed laughter, I am sure.
TOP-2
[Alberta]
- @_USER_ can you tell me if sanitary pumps are running yet in elbow park?
#yycflood
- @_USER_ plz text with what you need & address. Lots of volunteers in mission
BOTTOM-2
[Alberta]
- @_USER_ thank u calgary police
- @_USER_ Tx for ur time!!
33. Workload-aware Serviceability Ranking: Designing for human-AI collaboration
34
Streaming
Requests
Ranked Requests Performance Metrics Estimation Dynamic Policy Selection
[Purohit, Castillo, Imran, & Pandey, WI’18]
Image: https://commons.wikimedia.org/wiki/File:Front_pareto.svg
Pareto Optimization
à Given the lack of a priori user preference, rely on Pareto optimization for non-dominated sorting
Ranking-Workload (RW) Matrix
34. Workload-aware Serviceability Ranking: Ranking-Workload (RW) Matrix
35
¨ Define RW Matrix to model the relationship between human &
machine performances for a request-set 𝑥𝑖𝑗 in time 𝑡𝑖𝑗
n 𝑅𝑊 (𝑘, 𝑡𝑖𝑗) = ⟨ 𝑀(𝑅(𝑥𝑖𝑗)), 𝑤 𝑡𝑖𝑗, 𝑘 ⟩
¤ Machine Performance Metric (for a ranking system 𝑅(𝑥𝑖𝑗)): 𝑀(𝑅(𝑥𝑖𝑗))
n Recall@k, e.g., no. of relevant requests in top-k
n Precision@k
¤ Human Performance Metric: 𝑤(𝑡𝑖𝑗, 𝑘)
n Cognitive Load, e.g. hourly rate of requests to read
n Time-on-Task
[Purohit, Castillo, Imran, & Pandey, WI’18]
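The RW-matrix definition above can be made concrete in a few lines. A sketch: instantiating the workload metric as an hourly read-rate (k·60/t) and the toy request IDs are assumptions; the paper defines w(t_ij, k) more generally.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant requests that appear in the top-k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def workload(t_minutes, k):
    """Hourly rate of requests the servicer must read: k requests every
    t_minutes (one plausible instantiation of w(t_ij, k))."""
    return k * 60.0 / t_minutes

def rw_matrix(ranked, relevant, ks, ts):
    """RW(k, t_ij) = (Recall@k, w(t_ij, k)) for each configuration."""
    return {(k, t): (recall_at_k(ranked, relevant, k), workload(t, k))
            for k in ks for t in ts}

ranked = ["r3", "r1", "r7", "r2", "r9", "r5"]   # toy ranked request IDs
relevant = {"r1", "r2", "r5"}
matrix = rw_matrix(ranked, relevant, ks=[2, 4, 6], ts=[10, 30])
```

Each cell tuple is the attainable (Recall, Workload) for one configuration, e.g. checking the top 4 every 30 minutes.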
35. Workload-bound Serviceability Ranking: Pareto-Optimal RW Policy Selection
36
Given the lack of a priori user preference, rely on Pareto Optimization (Ross, 1973) for the non-dominated selection
Servicer: "Can you recommend which policy to choose?"
Image: https://commons.wikimedia.org/wiki/File:Front_pareto.svg
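Non-dominated selection over (Recall, Workload) pairs, maximizing recall while minimizing workload, can be sketched as follows (the policy tuples are illustrative):

```python
def dominates(q, p):
    """q dominates p if q is at least as good on both objectives
    (recall higher-is-better, workload lower-is-better) and strictly
    better on at least one."""
    return (q[0] >= p[0] and q[1] <= p[1]) and (q[0] > p[0] or q[1] < p[1])

def pareto_front(policies):
    """Keep only the non-dominated (recall, workload) policies."""
    return [p for p in policies
            if not any(dominates(q, p) for q in policies)]

policies = [(0.90, 20), (0.80, 10), (0.85, 25), (0.95, 40)]
front = pareto_front(policies)
# (0.85, 25) is dominated by (0.90, 20): lower recall AND more workload.
print(front)
```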
36. Workload-bound Serviceability Ranking: Experimental Setup
37
¨ Used relevancy data as alerts from 6 crisis events in our prior work, where relevancy is the ‘serviceability’ of a message for response [Purohit, Castillo, Imran, & Pandey, ASONAM’18]
Event (Year, start day – end day) | Tweets | Relevant | Irrelevant
Hurricane Sandy (2012, 10/27-11/07) | 1,153 | 40% | 60%
Oklahoma Tornado (2013, 05/20-05/29) | 1,513 | 48% | 52%
Alberta Floods (2013, 06/16-06/16) | 2,727 | 28% | 72%
Nepal Earthquake (2015, 04/15-05/15) | 2,222 | 18% | 82%
Louisiana Floods (2016, 10/11-10/31) | 1,369 | 34% | 66%
Hurricane Harvey (2017, 08/29-09/15) | 12,742 | 20% | 80%
37. Workload-bound Serviceability Ranking: Experimental Setup
38
¨ Compared two algorithms for recommending an RW policy:
¤ Periodic algorithm
n process requests posted in the time window of the past H (= 24) hours
n generate a top-k ranking and an RW matrix at the beginning of every hour (e.g., 7am, 8am)
¤ Near-Realtime algorithm
n process requests posted in the time window of the past G (= 60) minutes
n generate a top-k ranking and an RW matrix at the beginning of every minute (e.g., 7:01am, 7:02am)
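The two policies differ only in window length and re-ranking frequency. A minimal sketch of the trailing-window step both algorithms share (timestamps and request texts are invented for illustration):

```python
from datetime import datetime, timedelta

def window_requests(requests, now, window):
    """Requests posted within the trailing window [now - window, now];
    each request is a (timestamp, text) pair (assumed format)."""
    return [r for r in requests if now - window <= r[0] <= now]

# Periodic: re-rank hourly over the past H = 24 hours.
# Near-Realtime: re-rank every minute over the past G = 60 minutes.
now = datetime(2017, 8, 29, 8, 0)
requests = [(datetime(2017, 8, 28, 9, 30), "need rescue, 2 adults"),
            (datetime(2017, 8, 29, 7, 20), "water rising on Elm St"),
            (datetime(2017, 8, 27, 6, 0), "shelter info?")]
periodic = window_requests(requests, now, timedelta(hours=24))
near_rt = window_requests(requests, now, timedelta(minutes=60))
print(len(periodic), len(near_rt))  # 2 1
```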
38. Workload-bound Serviceability Ranking: Experiment 1 – RW trade-off validation analysis
39
[Purohit, Castillo, Imran, & Pandey, WI’18]
Multiple
Recall values
for a given
workload
budget!
39. Problem 2. Workload-aware Serviceability Ranking: Experiment – Greedy-recall baseline comparison
40
Pareto-optimal Periodic RW recommendations give lower workload than the maximum-recall policy.
40. Problem 2. Workload-aware Serviceability Ranking: Experiment – Greedy-workload baseline
41
Pareto-optimal Periodic RW recommendations give higher recall than the minimum-workload policy.
41. Conclusion: Lessons, Limitations, and Future Work
42
¤ A human-AI collaboration approach can enable scalable stream-data processing for emergency services
n Combining human factors + AI systems
¤ Lessons learned:
n Serviceability characteristics of information capture the notion of relevance and serviceability for social media requests to online public services.
n Workload-aware serviceability ranking provides a human-AI collaboration design that seamlessly incorporates user choices into the system.
42. Conclusion: Lessons, Limitations, and Future Work
43
¨ Limitations & opportunities:
¤ Serviceability model
n Study non-English language request messages
n Explore multi-platform rather than single-platform datasets (e.g., Twitter vs. forums)
n Include indirectly addressed requests (i.e. not starting with @user)
¤ Human-AI collaboration
n Extend the human performance metrics in the Ranking-Workload matrix
n Incorporate bias of the performance metrics in the RW matrix
n Adapt the workload-aware serviceability approach to other domains
43. Applications:
CitizenHelper-Adaptive Tool: Expert-augmented Streaming Analytics
System for Emergency Services and Humanitarian Organizations
44
[Pandey & Purohit, ASONAM’18]
44. Applications:
Human-Annotation for Crowdsourcing Work
45
Concept for class c2
not acquired yet
– Mistakes
Imbalanced
presence of class c1
– Slips
45. Applications:
Working with CERTs
46
Assisting regional CERT organizations with rapid social media filtering for COVID-19 response, using the CitizenHelper tool developed under the NSF CRII project, which led to a new NSF RAPID grant!
46. Future Work: Human-AI Collaboration at Workplaces of Various City Services
47
Q2.
How to classify
relevant
content in
online streams
in a new event
domain?
[ECML’20, ASONAM’20,
SBP-BRiMS’18]
Q3.
How to rank &
semantically group
serviceable,
actionable request
content?
[SNAM’20, ASONAM’18]
Q4.
How many & when
to present requests
to a worker with
dynamic workload?
[ASONAM’18, WI’18]
Data
Stream
City
Service
Worker
Filtering Prioritization Human-Machine
Interaction
Q1.
How to sample &
order instances
for human
annotation, to
improve labeled
data quality?
[ASONAM’19, IJHCS (under
review)]
Human
Annotation
An opportunity for fundamental research in AI with Human-Centered Computing
CitizenHelper
Tool
47. More about our research:
http://ist.gmu.edu/~hpurohit/informatics-lab.html
CONTACT: hpurohit@gmu.edu
Acknowledgement:
Image sources, collaborators (especially Prof. Carlos Castillo, Prof. Valerie Shalin, Dr. Muhammad Imran);
U.S. DHS Science & Technology SMWGESDM Researcher-Practitioner Subgroup (especially Steve Peterson),
Human_Info_lab students as well as sponsors:
Questions?
48
Primary grants that supported this work:
• IIS #1657379, IIS #1815459