3. What is Behavioral Big Data (BBD)
Special type of Big Data
• Behavioral: people’s measurable
“everyday” behavior,
interactions, self-reported
opinions, thoughts, feelings
• Human and social aspects:
Intentions, deception,
emotion, reciprocation,
herding,…
When aware of data collection ->
modified behavior
4. BBD vs. Inanimate Big Data
Human Subjects
• Aware, ongoing
interaction with the data
• Can be harmed by BBD
5. BBD vs. Physiological Big Data
Physiological Big Data:
• Individual bodies
• Physical measurements
• Medical systems set data collection timing
• Aware of collection
• Vested interest
BBD:
• Collection of connected people
• Measurable behaviors
• User generated content
• Often unaware of collection
• Not always in user’s best interest
9. Human data from a typical hospital
Patients
• Personal info
• Medical history (visits, tests, medications,…)
• Scheduled events
• Billing, insurance
Physicians
• Scheduled + actual appointments, procedures, prescriptions,…
• Entries of patient info/data
Nurses
• Location, work hours,…
Pharmacy staff
• Speed of service
• Quality of service
Lab staff
• Speed of service
• Quality of service
Other staff
• Finance/accounting
• Cleaning
• Receptionists
• Volunteers
• Food court
Data Collection
Technologies
• Medical devices
• HIT systems
“Smart Hospital”
• Cameras
• Sensors
• GPS
• IoT
11. Interactions between
Patients – doctors/nurses
Doctors – other doctors
Patients – other patients
Patient family – hospital staff
Patients – social network ”friends”
...
New data #1:
Recorded Interactions
12. BBD:
• 90,000 doctor-patient conversations
during clinical visits
• 151 types of medical visits for
different purposes
• conversation sometimes also
including a nurse, or family member
17. Self-logged BBD on apps
Data voluntarily entered by users
health condition, symptoms, feelings, behaviors
(eating, exercise, sleep, sex, parking…)
Passive footprints
app log times, pages browsed, sequence, location…
18. In addition to logging a menstruation and health diary, users can
join a number of different themed groups including weight loss,
clothing, fitness, relationships, and travel. These groups
look and work much like “message board”-style social networks.
To date, Meet You has reportedly accumulated two million daily
active users, 1.2 million daily active users of its social network,
and over 800,000 daily posts.
Big and Behavioral:
Every day, women manually log around 1.4 M new data points
including cycle history, ovulation and pregnancy test results, age,
height, weight, lifestyle statistics about sleep, activity, and nutrition.
In addition, more data comes from wearable devices like Fitbit &
Apple Watch.
21. Mobile health apps and wearable devices
that use artificial intelligence to help
diagnose or even treat medical conditions
pose a new regulatory challenge for the
U.S. Food and Drug Administration
23. “Some hospitals are collecting new information
from patients directly, while others have sought
data from companies that sell consumer and
financial information, or federal agencies that
provide statistics on poverty, housing density
and unemployment”
25. Subjects underwent a standardized
neurocognitive assessment, then went home
with an app that measured the ways they
touched their phone’s display (swipes, taps,
and keyboard typing)
Mindstrong conducted studies to figure
out whether there might be a systemic
measure of cognitive ability—or
disability—hidden in how we use our
phones.
memory problems… can be spotted by looking at
things including how rapidly you type and what errors
you make (such as how frequently you delete
characters), as well as by how fast you scroll down a
list of contacts
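As a rough illustration of the kind of features mentioned here (typing speed and deletion frequency), the sketch below derives them from a hypothetical log of key events. The event format, the field names, and the function name are assumptions made for this example; Mindstrong's actual features and methods are not described in the source.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    timestamp: float  # seconds since session start (hypothetical format)
    key: str          # a character, or "BACKSPACE" for a deletion


def typing_features(events):
    """Return keys per second and the share of key presses that are deletions."""
    if len(events) < 2:
        return {"keys_per_second": 0.0, "deletion_rate": 0.0}
    duration = events[-1].timestamp - events[0].timestamp
    deletions = sum(1 for e in events if e.key == "BACKSPACE")
    return {
        "keys_per_second": len(events) / duration if duration > 0 else 0.0,
        "deletion_rate": deletions / len(events),
    }


# Hypothetical typing session: five key presses, one of them a deletion
session = [
    KeyEvent(0.0, "h"),
    KeyEvent(0.5, "e"),
    KeyEvent(1.0, "k"),
    KeyEvent(1.5, "BACKSPACE"),
    KeyEvent(2.0, "l"),
]
features = typing_features(session)
```

Features like these would only be inputs; relating them to a neurocognitive assessment would need a separate, validated model.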
26. PRIVACY:
“while Mindstrong says it protects users’
data, collecting such data at all could be
a scary prospect for many of the people it
aims to help.
Companies may be interested in, say,
including it as part of an employee
wellness plan, but most of us wouldn’t
want our employers anywhere near our
mental health data”
27. This is where it becomes
ethically challenging:
Who’s collecting the data
and for what purpose?
Are users aware of the data collection
and usage?
What are users’ benefits & risks
from sharing their data?
28. What we’ve learned… is that
we need to take a more
proactive role in a broader
view of our responsibility. It’s
not enough to just build tools,
we need to make sure that
they’re used for good
32. Research Fields Using Health-Related BBD
Operations Researchers and Industrial Engineers
For: Hospital Management and Operations
(staffing, scheduling,…)
Medical/Healthcare Researchers & Clinicians
For: Improved Medical Treatment
(safety, effectiveness,…)
Information Systems Researchers
For: Improved Design & Use of Medical IS
(value of IS, effectiveness, standardization,…)
Marketing
Advertising
Insurance
Machine Learning
Social Science
33. How Do Researchers Get
Health BBD?
1. Open/Publicly Available Data
Constantly refreshed or single data dump
API, web scraping
Hacked data
2. Partner with Company/Organization
• Both parties interested in research question
• Data purchase
• Personal connections, sabbaticals, internships
• Partnership between school and organization
• Third party (WCAI)
3. Crowdsourcing
4. China (!)
34. Research Using New Health BBD: Challenges
Elements: Behavioral Big Data, Researcher, Human Subjects, Research Question
Different (conflicting) Goals:
• Scientific vs. Clinical vs. Commercial
• Explain vs. Predict
• Overall vs. Individual effect
Generalization Challenges:
• Unit of analysis vs. Unit of measurement
• Under/over-coverage
New ethical challenges:
• New risks (privacy, liability, security, HIPAA compliance)
Acquire + analyze data:
• Technical expertise
Data contaminated by:
• Users (self-selection, spill-over, knowledge of allocation, network)
• Company algorithms
New modes of connection & information (social networks, forums, IoT, Apps)
Larger distance
Old Q, new data: Operationalize new variables
New Q: Lack of literature
36. Two examples of high-profile studies
using new health BBD
Emotional contagion in
social networks
Kramer et al. (PNAS, 2014)
Detecting influenza epidemics
using search engine query data
Ginsberg et al. (Nature, 2009)
38. • No Ethics Board Review (IRB)
“[The work] was consistent with Facebook’s Data
Use Policy, to which all users agree prior to
creating an account on Facebook, constituting
informed consent for this research.”
• PNAS editorial Expression of Concern
• Varied response from public, academia, press,
ethicists, corporates
Where do data scientists get ethics training?
39. Example #2
• “Up-to-date influenza estimates may enable
public health officials and health professionals to
better respond to seasonal epidemics”
• BBD: automated search results for 50M
keywords on Google.com (2003-2007). For each
query: {query text, IP address}
• Fit 450M different models, correlating each
query text with CDC data; Combined 45 queries
with highest correlation
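The selection step described above can be sketched roughly as follows. This is a simplified illustration, not Ginsberg et al.'s actual pipeline: the toy series, the names `pearson`, `top_k_queries`, and `combined_signal`, and the choice of a plain average of the selected series are all assumptions made here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(vx * vy)


def top_k_queries(query_series, cdc_series, k):
    """Rank queries by correlation with the CDC series and keep the k best."""
    scored = [(pearson(s, cdc_series), q) for q, s in query_series.items()]
    scored.sort(reverse=True)
    return [q for _, q in scored[:k]]


def combined_signal(query_series, selected):
    """Average the selected query series week by week."""
    weeks = len(query_series[selected[0]])
    return [
        sum(query_series[q][t] for q in selected) / len(selected)
        for t in range(weeks)
    ]


# Hypothetical toy data: weekly query fractions and CDC influenza-like-illness rates
cdc = [1.0, 2.0, 4.0, 3.0, 1.5]
queries = {
    "flu symptoms": [0.10, 0.21, 0.39, 0.30, 0.16],
    "fever remedy": [0.05, 0.09, 0.20, 0.16, 0.07],
    "basketball scores": [0.30, 0.10, 0.20, 0.40, 0.25],
}
selected = top_k_queries(queries, cdc, k=2)
signal = combined_signal(queries, selected)
```

Selecting purely on correlation is what makes the approach vulnerable to terms that merely share a seasonal pattern with influenza.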
40. Researchers: epidemiologists + data science academics
Dalton et al. (2016), “Flutracking weekly online community
survey of influenza-like illness annual report, 2015”
Communicable diseases intelligence quarterly report
Challenge: Acquire data
41. • Algorithm detects “flu” or “winter”?
• Persistent over-estimation
• Performs worse than a projection from lagged
(3-week-old) CDC data
• Google never released the 45 terms used
• Lazer et al. recommend combining/
calibrating GFT with CDC data
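One way to read the comparison with lagged CDC data is as a baseline check: a nowcast is only useful if it beats simply reusing the CDC figure from three weeks earlier. The sketch below illustrates that check on made-up numbers; the series, the function names, and the use of mean absolute error are assumptions for illustration, not the evaluation Lazer et al. performed.

```python
def mean_abs_error(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def lagged_baseline(series, lag):
    """Predict week t with the value observed at week t - lag (lag > 0)."""
    return series[:-lag]


LAG = 3  # weeks of reporting delay assumed for the baseline

# Hypothetical weekly influenza-like-illness rates
cdc = [1.0, 1.2, 1.5, 2.0, 2.6, 3.1, 3.0, 2.4]
# Hypothetical nowcast that persistently over-estimates
nowcast = [1.6, 1.9, 2.4, 3.2, 4.1, 4.9, 4.6, 3.7]

actual = cdc[LAG:]  # weeks for which a lagged baseline value exists
baseline_mae = mean_abs_error(lagged_baseline(cdc, LAG), actual)
nowcast_mae = mean_abs_error(nowcast[LAG:], actual)
nowcast_is_worse = nowcast_mae > baseline_mae
```

On these made-up numbers the over-estimating nowcast has the larger error, which is the pattern the bullets describe.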
But most importantly…
42. Changes made by Google’s search
algorithm to display potential
diagnoses + recommend search for
treatment (more advertising)
-> increased search
44. Uses Google searches to measure sensitive
behaviors/opinions/thoughts on
racism, self-induced abortion, depression,
child abuse, hateful mobs, the science of
humor, sexual preference, anxiety, son
preference, and sexual insecurity, among
many other topics.
47. … and new challenges
48. Behavioral Big Data
& Healthcare
Research
A lecture in memory of Prof. Ayala Cohen,
who led and contributed greatly, and to many,
as a statistician, researcher, lecturer, advisor, and mentor
Faculty of Industrial Engineering and Management, Technion, 4.8.2019
49. • Greene, Shmueli, Ray & Fell (2019), Adjusting to the GDPR: The Impact on Data Scientists and
Behavioral Researchers, Big Data, forthcoming
• Shmueli (2017), Research Dilemmas With Behavioral Big Data, Big Data, vol 5 issue 2, pp. 98-119
• Shmueli (2017), Analyzing Behavioral Big Data: Methodological, Practical, Ethical, and Moral Issues,
with discussion and rejoinder, Quality Engineering, vol 29 no 1, pp. 57-74 and 88-90.
Editor's Notes
Inanimate:
Medical devices and drug manufacturing (quality control, safety)
Laboratory testing
UGC = User Generated Content
Separate research methods & ethics (e.g. deception)
Think of complimentary WiFi as a high-level view of your office. You can better understand the behavior of patients and staff and discover ways to make their experiences better. (https://blogs.spectrio.com/use-free-wifi-to-connect-with-patients-in-your-healthcare-office)
https://www.facebook.com/ayala.cohen.733
https://healthitanalytics.com/news/has-google-cracked-ehr-speech-recognition-for-medical-conversations
Chiu et al. (2017). Speech recognition for medical conversations. arXiv preprint arXiv:1711.07274.
John Hancock, one of the largest and oldest insurers in the United States, has announced it will stop selling traditional life insurance and will only market interactive policies that record the exercise activities and data of health of its customers through wearables such as Fitbit or Apple Watch.
https://www.forbes.com/sites/enriquedans/2018/09/21/insurance-wearables-and-the-future-of-healthcare/#25ddf7441782
https://www.technologyreview.com/s/612266/the-smartphone-app-that-can-tell-youre-depressed-before-you-know-it-yourself/
“thousands of people are using the app, and the company now has five years of clinical study data to confirm its science and technology.”
AOL log data; OKCupid data hacked by Danish researchers; AshleyMadison data
Let’s explore the landscape: what is there, who is using it, and for what?
HHS proposes new IRB exemption criteria for publicly available data (or even purchased data)
Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the status of the dataset… not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research”
How Does Flutracking work?
It takes only 10 - 15 seconds each week. We ask if you have had fever or cough in the last week. This will help us find ways to detect both seasonal influenza and hopefully pandemic influenza and other diseases so we can better protect the community from epidemics.
FluNearYou.org