Human-in-the-Loop: the Web as Foundation for interdisciplinary
Data Science Methods and Research Questions
Stefan Dietze
GESIS - Leibniz Institute for the Social Sciences,
Heinrich-Heine-University Düsseldorf,
L3S Research Center
Interdisciplinary research facilitated by the Web
• Rapidly growing interdisciplinary research exploits the Web to investigate online behavior, e.g. with respect to knowledge construction and exchange, network effects, or virality of disinformation (e.g. Vosoughi et al. 2018)
• Focused on gaining insights (e.g. in the social sciences, psychology) by understanding Web data with the help of computational methods
Understanding & interpreting user behaviour & interactions
• Behaviour and interactions with online platforms (e.g. Web search engines and social media platforms) & online content (e.g. tweets)
• Signals: click-through data, queries, shares, likes, behavioral traces (mouse movements, navigation, eye tracking etc.)
Machine & representation learning, information retrieval, NLP and knowledge-based approaches for:
Understanding & interpreting (user-generated) Web content
• Content: web pages, social media posts, comments etc.
• Extraction, verification, disambiguation of topics, entities, stances, opinions, sentiments (semantics)
• Understanding language complexity, structure or modality of online resources
Overview
Part I: Understanding & interpreting (user-generated) Web content
• Extraction & verification of factual knowledge & claims
• Stance detection of websites
• Understanding discourse/opinions/trends (Twitter)
Part II: Understanding & interpreting user behaviour & interactions
• Understanding competence, information needs, knowledge gain of users from behavioral traces
• Scenarios: Web search, microtask crowdsourcing
Extraction of "long-tail" factual knowledge on the Web?
<"Tim Berners-Lee" s:founderOf "Solid">
• How can entity-centric factual knowledge be extracted from websites?
• Application of NLP/information extraction methods on 60 billion Web pages (Google index)?
• Widespread adoption of embedded web markup (Microdata/RDFa, schema.org): about 40% of all Common Crawl web pages (3.2 billion Web pages) contain markup (about 44 billion "facts")
• Challenges
o Errors: annotation errors and factual errors [Meusel et al., ESWC2015]
o Ambiguity and co-references: e.g. 18,000 markup instances of "iPhone 6" in Common Crawl 2016 & ambiguous literals (e.g. "Apple")
o Redundancies & conflicts: large proportion of equivalent or directly conflicting statements
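To make the notion of markup "facts" concrete, the following is a minimal sketch of reading flat Microdata items from a page with Python's standard `html.parser`. It is an illustration only, not the extraction tooling behind the Common Crawl figures above: the class name `MicrodataFacts` and the sample HTML are assumptions, and nested items, RDFa, and URL-valued properties are deliberately ignored.

```python
from html.parser import HTMLParser


class MicrodataFacts(HTMLParser):
    """Collects (item type, property, value) triples from flat Microdata items.

    Hypothetical helper for illustration; nested itemscopes are not supported.
    """

    def __init__(self):
        super().__init__()
        self.facts = []        # extracted (itemtype, itemprop, value) triples
        self._itemtype = None  # type of the most recently opened itemscope
        self._prop = None      # property whose text content is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self._itemtype = attrs.get("itemtype")
            return
        prop = attrs.get("itemprop")
        if prop and self._itemtype:
            if attrs.get("content") is not None:
                # Value given directly, e.g. <meta itemprop="..." content="...">
                self.facts.append((self._itemtype, prop, attrs["content"]))
            else:
                # Value is the element's text; collect it until the end tag
                self._prop, self._text = prop, []

    def handle_data(self, data):
        if self._prop:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop:
            value = "".join(self._text).strip()
            self.facts.append((self._itemtype, self._prop, value))
            self._prop = None


# Sample page fragment (made up for this sketch)
html = """
<div itemscope itemtype="https://schema.org/Book">
  <span itemprop="name">Brideshead Revisited</span>
  <span itemprop="author">Evelyn Waugh</span>
  <meta itemprop="datePublished" content="1945">
</div>
"""
parser = MicrodataFacts()
parser.feed(html)
facts = parser.facts
```

Each triple in `facts` corresponds to one markup statement; the errors, ambiguity and redundancy listed above arise once such triples are collected from billions of pages.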
KnowMore: data fusion on Web Markup
• 0. Noise: data cleansing (URIs, deduplication etc.)
• 1.a) Scale: blocking with BM25 entity retrieval on Lucene index of markup data
• 1.b) Relevance: supervised resolution of coreferences
• 2.) Quality & redundancy: data fusion with a supervised classifier for all facts (SVM, kNN, CNN, RF, LR, NB), using various feature sets (authority, relevance etc.) of the source (e.g. PageRank), entity description or facts
Figure: pipeline from Web crawl (Common Crawl, 44 bn facts) and Web page markup through 1. blocking & coreference resolution to 2. fusion / fact selection (supervised)
Yu, R., [..], Dietze, S., KnowMore - Knowledge Base Augmentation with Structured Web Markup, Semantic Web Journal 2019 (SWJ2019)
Tempelmeier, N., Demidova, E., Dietze, S., Inferring Missing Categorical Information in Noisy and Sparse Web Markup, The Web Conf. 2018 (WWW2018)
New Query Entities
BBC Audio, type:(Organization)
Chapman & Hall, type:(Publisher)
Put Out More Flags, type:(Book)
Entity Description
author Evelyn Waugh
priorWork Put Out More Flags
ISBN 978031874803074
copyrightHolder Evelyn Waugh
releaseDate 1945
…
Query Entity
Brideshead Revisited, type:(Book)
Candidate Facts
node1 publisher Chapman & Hall
node1 releaseDate 1945
node1 publishDate 1961
node2 country UK
node2 publisher Black Bay Books
node3 country US
node3 copyrightHolder Evelyn Waugh
…
About 5,000 facts for "Brideshead Revisited" (125,000 facts for "iPhone6")
20 correct & non-redundant facts for "Brideshead Revisited"
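The blocking step (1.a) can be sketched as follows. The slides name BM25 retrieval on a Lucene index; the pure-Python scorer below is a stand-in for that index, using the common Okapi BM25 formula with assumed parameters `k1=1.2` and `b=0.75`. The function names and the toy descriptions are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (entity description) against the query with BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many descriptions does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def block(query, docs, top_k=2):
    """Return the indices of the top_k highest-scoring candidates (the block)."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy markup descriptions (made up for this sketch)
descriptions = [
    "Brideshead Revisited Evelyn Waugh Chapman & Hall 1945",
    "Put Out More Flags Evelyn Waugh",
    "iPhone 6 Apple smartphone",
]
candidates = block("Brideshead Revisited", descriptions, top_k=1)
```

Only the candidates returned by `block` would be passed on to the more expensive supervised coreference resolution and fusion steps, which is what makes the approach feasible at the scale of 44 bn facts.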
Data fusion performance
• Experiments for books, films, products
• Baselines: BM25, CBFS [ESWC2015], PrecRecCorr [Pochampally et al., ACM SIGMOD 2014]; results vary widely between entity types
Enriching knowledge graphs / finding new facts?
• On average 60% - 70% of all facts are new (compared to knowledge graphs like Wikidata, Freebase, Wikipedia/DBpedia)
• Experiments for learning categorical characteristics (e.g. film genres or product categories) [WWW2018]
Understanding discourse & opinions on Twitter
Figure: example tweet annotations with entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid, emotions wna:positive-emotion and wna:negative-emotion, and onyx:hasEmotionIntensity "0.75" and "0.0"
• Heterogeneity: multimodal, multilingual, informal, "noisy" language
• Context dependency: interpretation of short tweets requires consideration of context (e.g. time, linked content), e.g. "Dusseldorf" => city or football team
• Representativity & bias: demographic distributions in Twitter archives not known
• Dynamics & scale: e.g. 8,000 tweets per second, plus interactions (retweets etc.) & context (e.g. 25% of all tweets contain URLs)
• Evolution & temporal aspects: evolution of interactions over time is important for most research questions
P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
TweetsKB: a knowledge base of Web-mined societal discourse
P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
https://data.gesis.org/tweetskb/
• Collection & archiving of 10 billion tweets over 7 years (permanent crawl of Twitter 1% API since 2013)
• Information extraction using NLP methods to extract entities and sentiments (distributed batch processing with Hadoop MapReduce)
o Entity linking with Wikipedia/DBpedia (Yahoo's FEL [Blanco et al. 2015]), e.g. "president"/"potus"/"trump" => dbp:DonaldTrump, to disambiguate tweets and link to background knowledge (e.g. US politicians? Republicans?); high precision (.85), poor recall (.39)
o Sentiment analysis with SentiStrength [Thelwall et al., 2017], F1 approx. .80
o Extraction of metadata and lifting into established formats and schemas (SIOC, schema.org), publication using W3C standards (RDF/SPARQL)
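For comparison with the sentiment F1 above, the reported entity linking precision and recall can be combined into a single F1 value. This is a small worked calculation using the standard harmonic-mean definition; the function name is illustrative and the resulting figure is derived here, not reported in the slides.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Entity linking figures from the slide: precision .85, recall .39
entity_linking_f1 = f1_score(0.85, 0.39)  # roughly 0.53
```

The low recall dominates the harmonic mean, so annotations that are present tend to be trustworthy, while many entity mentions remain unlinked.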
TweetsCOV19: a knowledge graph of societal discourse on COVID-19
Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., Dietze, S., TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic, CIKM2020.
https://data.gesis.org/tweetscov19/
• COVID-19 discourse as foundation for interdisciplinary research on solidarity behaviour & societal changes during the pandemic
• 8.1 million tweets since October 2019 (continuously updated), extracted using a COVID-19-specific seed list & the TweetsKB pipeline
• Used as corpus for the CIKM2020 AnalytiCup & by interdisciplinary partners, e.g. the Federal Statistical Office, Media & Communication Studies @ Heinrich-Heine-University, University of Hildesheim, etc.
Understanding claims & stances on the Web
Figure: stance and trustworthiness of the claim?
A hierarchical stance detection classifier
Motivation
• Problem: identifying the stance of web documents (web pages, tweets) on a specific claim (class distribution highly unbalanced)
• Applications: the stance of documents (especially disagreement) is important (a) as a signal for the correctness of a statement and (b) for the classification of sources (Twitter users, pay-level domains (PLDs))
A. Roy, A. Ekbal, S. Dietze, P. Fafalios, Exploiting stance hierarchies for cost-sensitive stance detection of Web documents, preprint/arXiv.
A. Tchechmedjiev, P. Fafalios, K. Boland, S. Dietze, B. Zapilko, K. Todorov, ClaimsKG - A Live Knowledge Graph of fact-checked Claims, ISWC2019
Approach
• Cascading binary classifiers to address problems at each step (e.g. cost of misclassification)
• Features, e.g. text similarity (Word2Vec etc.), sentiments, LIWC
• Best models per step: 1) SVM with class-wise penalty, 2) CNN, 3) SVM with class-wise penalty
• Experiments with the Fake News Challenge benchmark dataset & baselines
Results
• Minor overall performance improvement
• 27% improvement for the disagree class
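The cascade idea can be sketched as three binary decisions applied in sequence. The four labels are those of the Fake News Challenge dataset; the order of the steps (related vs. unrelated, then neutral vs. opinionated, then agree vs. disagree) is an assumption about the hierarchy, and the keyword-based toy classifiers stand in for the SVM and CNN models named above.

```python
from typing import Callable

BinaryStep = Callable[[str, str], bool]


def cascade_stance(claim: str, document: str,
                   is_related: BinaryStep,
                   takes_stance: BinaryStep,
                   agrees: BinaryStep) -> str:
    """Apply three binary classifiers in sequence; stop at the first 'no'."""
    if not is_related(claim, document):
        return "unrelated"
    if not takes_stance(claim, document):
        return "discuss"
    return "agree" if agrees(claim, document) else "disagree"


# Toy stand-ins for the trained per-step models (hypothetical)
OPINION_CUES = {"true", "confirmed", "false", "hoax", "debunked"}
DENIAL_CUES = {"false", "hoax", "debunked"}


def overlap(claim: str, doc: str) -> bool:
    """Related if at least half of the claim's words occur in the document."""
    c, d = set(claim.lower().split()), set(doc.lower().split())
    return len(c & d) / len(c) >= 0.5


def has_cue(claim: str, doc: str) -> bool:
    """Opinionated if the document contains any opinion cue word."""
    return bool(OPINION_CUES & set(doc.lower().split()))


def no_denial(claim: str, doc: str) -> bool:
    """Agreeing if the document contains no denial cue word."""
    return not (DENIAL_CUES & set(doc.lower().split()))


claim = "the moon landing was staged"
label = cascade_stance(claim, "the moon landing was staged is a hoax",
                       overlap, has_cue, no_denial)
```

Because each step is a separate binary problem, each can use its own model and its own misclassification costs, which is where a class-wise penalty for the rare disagree class would be applied.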
Competence & knowledge acquisition of web users
Prediction from in-session behavior?
• Research question: Is it possible to predict the competence and knowledge acquisition of users on the basis of user interactions such as browsing, scrolling, or behavioral traces (mouse movements, keystrokes, eye tracking)?
• Approach: studies and machine learning models in two scenarios: (a) Web search and (b) microtask crowdsourcing on platforms like Amazon Mechanical Turk
• Applications: e.g. classification of web users, improvement of search results, or adaptation in learning and assessment environments
Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys, ACM CHI2015.
Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection, Computer Supported Cooperative Work 28(5): 815-841 (2019)
Acquisition of knowledge during web search?
Challenges & results
• Identifying coherent search missions?
• Identification of "learning" during search: identification of "informational sessions" (as opposed to "transactional" or "navigational" search [Broder, 2002])
o Classification with an F1 score of approx. 75% based on user interactions
• How competent is the user? Predicting and understanding the competence / knowledge level of users based on "in-session" behaviour
• How well does a user achieve his/her learning objective or information need? Predicting the knowledge state/gain during a session
o Correlation of user behaviour (queries, browsing, mouse movements etc.) & knowledge state/gain [CHIIR18]
o Prediction of knowledge state/gain using supervised ML methods [SIGIR18]
Knowledge level & growth vs user behaviour in web search
Data & experimental setup
• Crowdsourcing of behavioral data in search sessions
• 10 topics/information needs (e.g. "altitude sickness", "tornados") plus pre- and post-tests to determine knowledge state and knowledge gain (KS, KG)
• Approx. 1,000 crowd workers; 100 sessions per topic
• Monitoring of user behavior along 76 features in 5 categories: session, query, SERP (search engine result page), browsing, mouse traces
Results
• 70% of users show knowledge gain (KG)
• Negative correlation between KG & topic popularity (avg. accuracy of workers in knowledge tests) (R = -.87)
• Time spent actively on websites explains 7% of knowledge gain
• Query complexity explains 25% of knowledge gain
• Search behavior correlates more strongly with search topic than with KG/KS
Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
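To relate the reported correlation to the "explains x%" statements, the sketch below computes a Pearson coefficient and squares it. It assumes that the reported R is a Pearson correlation and that "explains" refers to the share of variance (R squared); neither is stated on the slide, and the toy data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data: higher topic popularity, lower knowledge gain (perfectly linear)
popularity = [1, 2, 3]
knowledge_gain = [6, 4, 2]
r_toy = pearson_r(popularity, knowledge_gain)

# Under the assumptions above, R = -.87 corresponds to about 76% shared variance
r_squared = round((-0.87) ** 2, 4)
```

Read this way, topic popularity alone would account for far more of the variation in knowledge gain than active time on websites (7%) or query complexity (25%).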
ML models to predict KG/KS during Web search
• Categorisation of the sessions along knowledge state (KS) & knowledge gain (KG) into {low, moderate, high}, with low < (mean ± 0.5 SD) < high
• Supervised multiclass classification (Naive Bayes, Logistic Regression, SVM, Random Forest, Multilayer Perceptron)
• KG prediction performance (after 10-fold cross-validation)
• Feature impact (KG prediction)
Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
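The class labels for the multiclass setup can be derived as sketched below: values below mean - 0.5 SD are "low", values above mean + 0.5 SD are "high", and everything in between is "moderate". The use of the sample standard deviation and the function name are assumptions; the scores are toy values.

```python
from statistics import mean, stdev


def categorise(values, width=0.5):
    """Label each value low / moderate / high relative to mean ± width * SD."""
    m, sd = mean(values), stdev(values)  # sample SD (assumption)
    lower, upper = m - width * sd, m + width * sd
    labels = []
    for v in values:
        if v < lower:
            labels.append("low")
        elif v > upper:
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Toy knowledge-gain scores for five sessions
kg_labels = categorise([0, 2, 4, 6, 8])
```

The resulting labels serve as targets for the classifiers listed above; because the thresholds depend on the score distribution, they are computed separately for KS and KG.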
Ongoing work
• Lab studies necessary for more reliable data (controlled environment, longer sessions) [completed]
• Additional behavioral features (eye tracking) [CHIIR2020, CHI2020]
• Resource features (e.g. complexity, analytic/emotional language, multimodality etc.) as additional signals [IR Journal, under review]
• Improve ranking/retrieval in web search or in digital archives (SALIENT Project, Leibniz Cooperative Excellence; GESIS Data Search platforms)
Other features to predict competence?
Expertise & the "Dunning-Kruger Effect"
• Incompetence in a particular task reduces the ability to recognise one's own incompetence in the task (David Dunning. 2011. The Dunning-Kruger Effect: On Being Ignorant of One's Own Ignorance. Advances in Experimental Social Psychology 44 (2011), 247.)
Research questions
• Self-assessment as an additional feature to predict competence?
• Application in microtask crowdsourcing for the classification of "workers" or in online learning for the classification of learners
Some results
• Self-assessment is a reliable feature for predicting competence/future performance
• More reliable than prior performance in the task alone
• The tendency to overestimate one's own competence grows with increasing task difficulty
Figure: performance ("accuracy") of users classified as "competent" according to (1) prior performance and (2) performance plus self-assessment
Gadiraju, U., Fetahu, B., Kawase, R., Siehndel, P., Dietze, S., Using Worker Self-Assessments for Competence-based Pre-Selection in Crowdsourcing Microtasks. In: ACM Transactions on Computer-Human Interaction (ACM TOCHI), Vol. 24, Issue 4, August 2017.
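One way to combine prior performance with self-assessment for pre-selection is sketched below. This is a hypothetical rule for illustration, not the procedure from the cited paper: a worker counts as "competent" if the accuracy in a qualification task is high enough and the self-estimated accuracy is close to the actual one. Both thresholds and all names are assumptions.

```python
def preselect(workers, min_accuracy=0.7, max_gap=0.2):
    """Select workers who perform well AND judge their own performance well.

    workers maps a worker id to (actual accuracy, self-assessed accuracy),
    both in [0, 1]. Thresholds are illustrative assumptions.
    """
    selected = []
    for worker_id, (accuracy, self_assessed) in workers.items():
        performs_well = accuracy >= min_accuracy
        well_calibrated = abs(self_assessed - accuracy) <= max_gap
        if performs_well and well_calibrated:
            selected.append(worker_id)
    return selected


# Toy qualification results: (actual accuracy, self-assessed accuracy)
workers = {
    "a": (0.9, 0.85),  # accurate and well calibrated
    "b": (0.9, 0.4),   # accurate but strongly underestimates own performance
    "c": (0.5, 0.9),   # overestimates own performance
}
competent = preselect(workers)
```

Worker "c" illustrates the overestimation pattern described above: a performance-only filter with a lower threshold might still admit such a worker, whereas the calibration check does not.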
Knowledge Technologies for the Social Sciences (WTS)
https://www.gesis.org/en/institute/departments/knowledge-technologies-for-the-social-sciences/
Data & Knowledge Engineering @ HHU
https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering.html
@stefandietze
http://stefandietze.net
Acknowledgements
• Erdal Baran (GESIS, Germany)
• Katarina Boland (GESIS, Germany)
• Stefan Conrad (HHU, Germany)
• Gianluca Demartini (Brisbane Uni, Australia)
• Elena Demidova (L3S, Germany)
• Dimitar Dimitrov (GESIS, Germany)
• Ujwal Gadiraju (Delft University, NL)
• Asif Ekbal (IIT Patna, India)
• Pavlos Fafalios (FORTH ICS, Greece)
• Peter Holtz (IWM, Tübingen)
• Ricardo Kawase (Mobile.de, Germany)
• Vasileios Iosifidis (L3S, Germany)
• Eirini Ntoutsi (LUH, Germany)
• Markus Rokicki (L3S, Germany)
• Arjun Roy (IIT Patna, India)
• Patrick Siehndel (L3S, Germany)
• Nicolas Tempelmeier (L3S, Germany)
• Konstantin Todorov (LIRMM, France)
• Ran Yu (GESIS, Germany)
• Benjamin Zapilko (GESIS, Germany)
• Matthäus Zloch (GESIS, Germany)
• Xiaofei Zhu (Chongqing University, China)

More Related Content

What's hot

Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...Sören Auer
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesSciBite Limited
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningPaul Groth
 
Trustworthy AI and Open Science
Trustworthy AI and Open ScienceTrustworthy AI and Open Science
Trustworthy AI and Open ScienceBeth Plale
 
Towards research data knowledge graphs
Towards research data knowledge graphsTowards research data knowledge graphs
Towards research data knowledge graphsStefan Dietze
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Paolo Manghi
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingCarly Strasser
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesLaura Po
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedJoel Azzopardi
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataDongpo Deng
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseRinke Hoekstra
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introductionbutest
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 

What's hot (20)

Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
Trustworthy AI and Open Science
Trustworthy AI and Open ScienceTrustworthy AI and Open Science
Trustworthy AI and Open Science
 
Sanderson Shout It Out: LOUD
Sanderson Shout It Out: LOUDSanderson Shout It Out: LOUD
Sanderson Shout It Out: LOUD
 
Towards research data knowledge graphs
Towards research data knowledge graphsTowards research data knowledge graphs
Towards research data knowledge graphs
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharing
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental Data
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
Ziegler Open Data in Special Collections Libraries
Ziegler Open Data in Special Collections LibrariesZiegler Open Data in Special Collections Libraries
Ziegler Open Data in Special Collections Libraries
 

Similar to Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science Methods and Research Questions

From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...Stefan Dietze
 
AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...Stefan Dietze
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Artificial Intelligence Institute at UofSC
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteShalin Hai-Jew
 
Humanities in the Digital World
Humanities in the Digital WorldHumanities in the Digital World
Humanities in the Digital WorldDavid De Roure
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media suresh sood
 
Open Grid Forum workshop on Social Networks, Semantic Grids and Web
Open Grid Forum workshop on Social Networks, Semantic Grids and WebOpen Grid Forum workshop on Social Networks, Semantic Grids and Web
Open Grid Forum workshop on Social Networks, Semantic Grids and WebNoshir Contractor
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science suresh sood
 
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Digital Methods Initiative
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesStefan Dietze
 
The Networked Creativity in the Censored Web 2.0
The Networked Creativity in the Censored Web 2.0The Networked Creativity in the Censored Web 2.0
The Networked Creativity in the Censored Web 2.0Weiai Wayne Xu
 
"Mass Surveillance" through Distant Reading
"Mass Surveillance" through Distant Reading"Mass Surveillance" through Distant Reading
"Mass Surveillance" through Distant ReadingShalin Hai-Jew
 
Big Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningBig Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningStefan Dietze
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Oscar Corcho
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
Semantic Wiki Based Collaborative Scientific Modeling Infrastructure
Semantic Wiki Based  Collaborative Scientific Modeling Infrastructure Semantic Wiki Based  Collaborative Scientific Modeling Infrastructure
Semantic Wiki Based Collaborative Scientific Modeling Infrastructure Jie Bao
 
Wire Workshop: Overview slides for ArchiveHub Project
Wire Workshop: Overview slides for ArchiveHub ProjectWire Workshop: Overview slides for ArchiveHub Project
Wire Workshop: Overview slides for ArchiveHub Projectmwe400
 
Mining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebMining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebStefan Dietze
 

Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science Methods and Research Questions

  • 1. Human-in-the-Loop: the Web as Foundation for interdisciplinary Data Science Methods and Research Questions
      Stefan Dietze
      GESIS - Leibniz Institute for the Social Sciences, Heinrich-Heine-University Düsseldorf, L3S Research Center
  • 2. Interdisciplinary research facilitated by the Web
      - Rapidly growing interdisciplinary research exploits the Web to investigate online behaviour, e.g. with respect to knowledge construction and exchange, network effects, or virality of disinformation (e.g. Vosoughi et al. 2018)
      - Focused on gaining insights (e.g. in the social sciences, psychology) by understanding Web data with the help of computational methods
      Machine & representation learning, information retrieval, NLP and knowledge-based approaches for:
      Understanding & interpreting user behaviour & interactions
      - Behaviour and interactions with online platforms (e.g. Web search engines and social media platforms) & online content (e.g. tweets)
      - Signals: click-through data, queries, shares, likes, behavioural traces (mouse movements, navigation, eye tracking etc.)
      Understanding & interpreting (user-generated) Web content
      - Content: web pages, social media posts, comments etc.
      - Extraction, verification, disambiguation of topics, entities, stances, opinions, sentiments (semantics)
      - Understanding language complexity, structure or modality of online resources
  • 3. Overview
      Part I: Understanding & interpreting (user-generated) Web content
      - Content: web pages, social media posts, comments etc.
      - Extraction, verification, disambiguation of topics, entities, stances, opinions, sentiments (semantics)
      - Understanding language complexity, structure or modality of online resources
      - Topics: extraction & verification of factual knowledge & claims; stance detection of websites; understanding discourse/opinions/trends (Twitter)
      Part II: Understanding & interpreting user behaviour & interactions
      - Behaviour and interactions with online platforms (e.g. Web search engines and social media platforms) & online content (e.g. tweets)
      - Signals: click-through data, queries, shares, likes, behavioural traces (mouse movements, navigation, eye tracking etc.)
      - Topics: understanding competence, information needs and knowledge gain of users from behavioural traces; scenarios: Web search, microtask crowdsourcing
  • 4. Extraction of "long-tail" factual knowledge on the Web
      ? <"Tim Berners-Lee" s:founderOf "Solid">
      - How can entity-centric factual knowledge be extracted from websites?
      - Application of NLP/information extraction methods on 60 billion Web pages (Google index)?
      - Widespread adoption of embedded Web markup (Microdata/RDFa, schema.org): about 40% of all Common Crawl Web pages (3.2 billion Web pages) contain markup (about 44 billion "facts")
      - Challenges
        o Errors: annotation errors and factual errors [Meusel et al., ESWC2015]
        o Ambiguity and co-references: e.g. 18,000 markup instances of "iPhone 6" in Common Crawl 2016 & ambiguous literals (e.g. "Apple")
        o Redundancies & conflicts: large proportion of equivalent or directly conflicting statements
  • 5. KnowMore: data fusion on Web Markup
      Pipeline: Web crawl (Common Crawl, 44 bn facts) -> Web page markup -> 1. Blocking & coreference resolution -> 2. Fusion / fact selection (supervised)
      - 0. Noise: data cleansing (URIs, deduplication etc.)
      - 1.a) Scale: blocking with BM25 entity retrieval on a Lucene index of markup data
      - 1.b) Relevance: supervised resolution of coreferences
      - 2.) Quality & redundancy: data fusion with a supervised classifier for all facts (SVM, kNN, CNN, RF, LR, NB), using various feature sets (authority, relevance etc.) of the source (e.g. PageRank), the entity description or the facts
      Example
      - Query entity: Brideshead Revisited, type:(Book)
      - Candidate facts: node1 publisher Chapman & Hall; node1 releaseDate 1945; node1 publishDate 1961; node2 country UK; node2 publisher Black Bay Books; node3 country US; node3 copyrightHolder Evelyn Waugh; …
      - Entity description: author Evelyn Waugh; priorWork Put Out More Flags; ISBN 978031874803074; copyrightHolder Evelyn Waugh; releaseDate 1945; …
      - New query entities: BBC Audio, type:(Organization); Chapman & Hall, type:(Publisher); Put Out More Flags, type:(Book)
      - About 5,000 facts for "Brideshead Revisited" (125,000 facts for "iPhone6"); 20 correct & non-redundant facts for "Brideshead Rev."
      References
      - Yu, R., [..], Dietze, S., KnowMore - Knowledge Base Augmentation with Structured Web Markup, Semantic Web Journal 2019 (SWJ2019)
      - Tempelmeier, N., Demidova, S., Dietze, S., Inferring Missing Categorical Information in Noisy and Sparse Web Markup, The Web Conf. 2018 (WWW2018)
  • 6. KnowMore: data fusion on Web Markup
      Data fusion performance
      - Experiments for books, films, products
      - Baselines: BM25, CBFS [ESWC2015], PreRecCorr [Pochampally et al., ACM SIGMOD 2014]; results vary widely between types
      Enriching knowledge graphs / finding new facts?
      - On average 60% - 70% of all facts are new (compared to knowledge graphs like WikiData, Freebase, Wikipedia/DBpedia)
      - Experiments for learning categorical characteristics (e.g. film genres or product categories) [WWW2018]
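The blocking step (1.a) of the KnowMore pipeline described above retrieves a small set of candidate entity descriptions per query entity before the more expensive coreference and fusion steps run. The following is a minimal sketch of that idea only: the slides state that BM25 retrieval runs on a Lucene index, whereas this sketch scores in plain Python. The function names, the whitespace tokenisation, the parameters `k1`, `b` and `top_k`, and the toy descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenised document for a tokenised query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative idf variant (assumption: the "+1" smoothing form).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def block(query, descriptions, top_k=2):
    """Keep the top_k descriptions with a positive score as candidates."""
    tokenised = [desc.lower().split() for desc in descriptions]
    scores = bm25_scores(query.lower().split(), tokenised)
    ranked = sorted(range(len(descriptions)), key=lambda i: scores[i], reverse=True)
    return [descriptions[i] for i in ranked[:top_k] if scores[i] > 0]


# Hypothetical markup descriptions, loosely based on the slide example.
descriptions = [
    "Brideshead Revisited Evelyn Waugh Chapman & Hall",
    "Put Out More Flags Evelyn Waugh",
    "iPhone 6 Apple smartphone",
]
candidates = block("Brideshead Revisited", descriptions)
print(candidates)
```

Only descriptions sharing at least one query term survive, so the supervised steps see a few candidates rather than the whole corpus.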
  • 7. Understanding discourse & opinions on Twitter
      Example annotation (figure): entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid; emotions wna:positive-emotion and wna:negative-emotion with onyx:hasEmotionIntensity values "0.75" and "0.0"
      - Heterogeneity: multimodal, multilingual, informal, "noisy" language
      - Context dependency: interpretation of short tweets requires consideration of context (e.g. time, linked content), "Dusseldorf" => city or football team
      - Representativity & bias: demographic distributions in Twitter archives not known
      - Dynamics & scale: e.g. 8000 tweets per second, plus interactions (retweets etc.) & context (e.g. 25% of all tweets contain URLs)
      - Evolution & temporal aspects: evolution of interactions over time is important for most research questions
      P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
  • 8. TweetsKB: a knowledge base of Web mined societal discourse
      https://data.gesis.org/tweetskb/
      - Collection & archiving of 10 billion tweets over 7 years (permanent crawl of Twitter 1% API since 2013)
      - Information extraction using NLP methods to extract entities and sentiments (distributed batch processing with Hadoop Map/Reduce)
        o Entity linking with Wikipedia/DBpedia (Yahoo's FEL [Blanco et al. 2015]) ("president"/"potus"/"trump" => dbp:DonaldTrump), to disambiguate tweets and link to background knowledge (e.g. US politicians? Republicans?), high precision (.85), poor recall (.39)
        o Sentiment analysis with SentiStrength [Thelwall et al., 2017], F1 approx. .80
        o Extraction of metadata and lifting into established formats and schemas (SIOC, schema.org), publication using W3C standards (RDF/SPARQL)
      P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
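The slide reports precision and recall for entity linking but an F1 score only for sentiment analysis. As a small worked example, the two entity-linking figures can be combined into an F1 score (the harmonic mean of precision and recall) to put both steps on the same scale. The resulting number is derived here from the reported .85 and .39 and is not stated in the slides; the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Entity-linking precision and recall as reported on the slide.
entity_linking_f1 = f1_score(0.85, 0.39)
print(round(entity_linking_f1, 3))  # 0.535
```

The harmonic mean is dominated by the weaker of the two values, which is why the low recall pulls the combined score well below the precision.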
  • 9. TweetsCOV19: a knowledge graph of societal discourse on COVID19
      https://data.gesis.org/tweetscov19/
      - COVID19 discourse as foundation for interdisciplinary research on solidarity behaviour & societal changes during the pandemic
      - 8.1 million tweets since October 2019 (continuously updated), extracted using a COVID-19 specific seed list & the TweetsKB pipeline
      - Used as corpus for the CIKM2020 AnalytiCup & by interdisciplinary partners, e.g. the Federal Statistical Office, Media & Communication Studies @ Heinrich-Heine-University, University of Hildesheim, etc.
      Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., Dietze, S., TweetsCOV19 -- A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic, CIKM2020.
  • 10. Understanding claims & stances on the Web
  • 11. Understanding claims & stances on the Web
      Stance, trustworthiness of the claim?
  • 12. A hierarchical stance detection classifier
      Motivation
      - Problem: identifying the stance of Web documents (web pages, tweets) on a specific claim (class distribution highly unbalanced)
      - Applications: the stance of documents (especially disagreement) is important (a) as a signal for the correctness of a statement and (b) for the classification of sources (Twitter users, PLDs)
      References
      - Roy, A. Ekbal, S. Dietze, P. Fafalios, Exploiting stance hierarchies for cost-sensitive stance detection of Web documents, preprint/Arxiv.
      - A. Tchechmedjiev, P. Fafalios, K. Boland, S. Dietze, B. Zapilko, K. Todorov, ClaimsKG - A Live Knowledge Graph of fact-checked Claims, ISWC2019
  • 13. A hierarchical stance detection classifier
      Approach
      - Cascading binary classifiers to address problems at each step (e.g. cost of misclassification)
      - Features, e.g. text similarity (Word2Vec etc.), sentiments, LIWC
      - Best models per step: 1) SVM with class-wise penalty, 2) CNN, 3) SVM with class-wise penalty
      - Experiments with the Fake News Challenge benchmark dataset & baselines
      Results
      - Minor overall performance improvement
      - 27% improvement for the disagree class
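The cascade of binary classifiers described above can be sketched as a chain of three yes/no decisions, each handled by its own model. This is an illustrative sketch only: the order of the stages and the four labels (assumed here to follow the Fake News Challenge label set: unrelated, discuss, agree, disagree) are assumptions, and the keyword predicates are hypothetical stand-ins for the trained SVM and CNN models named on the slide.

```python
def cascade_stance(claim, document, is_related, takes_position, agrees):
    """Run three binary decisions in sequence and return a stance label."""
    # Stage 1: filter out documents that do not concern the claim at all.
    if not is_related(claim, document):
        return "unrelated"
    # Stage 2: separate neutral reporting from documents taking a position.
    if not takes_position(claim, document):
        return "discuss"
    # Stage 3: decide between the two opinionated classes.
    return "agree" if agrees(claim, document) else "disagree"


# Toy stand-ins for trained classifiers (hypothetical keyword rules).
def is_related(claim, document):
    return bool(set(claim.lower().split()) & set(document.lower().split()))


def takes_position(claim, document):
    return any(w in document.lower() for w in ("true", "false", "confirmed", "hoax"))


def agrees(claim, document):
    return not any(w in document.lower() for w in ("false", "hoax"))


claim = "vaccine causes illness"
documents = [
    "Reports that the vaccine causes illness are a hoax",
    "Officials discuss vaccine rollout",
    "Weather today is sunny",
]
labels = [cascade_stance(claim, d, is_related, takes_position, agrees) for d in documents]
print(labels)
```

Splitting the decision this way lets each stage use its own model and its own misclassification costs, which is what makes a class-wise penalty for a rare class such as disagreement practical.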
  • 14. Overview
      Part I: Understanding & interpreting (user-generated) Web content
      - Content: web pages, social media posts, etc.
      - Extraction, verification, disambiguation of topics, entities, stances, opinions, sentiments (semantics)
      - Understanding language complexity, structure or modality of online resources
      - Topics: extraction & verification of factual knowledge & claims; stance detection of websites; extraction of opinions/trends (Twitter)
      Part II: Understanding & interpreting user behaviour & interactions
      - Behaviour and interactions with online platforms (e.g. Web search engines and social media platforms) & online content (e.g. tweets)
      - Signals: click-through data, queries, shares, likes, behavioural traces (mouse movements, navigation, eye tracking etc.)
      - Topics: understanding competence, information needs and knowledge gain of users from behavioural traces; scenarios: Web search, microtask crowdsourcing
  • 15. Competence & knowledge acquisition of Web users: prediction from in-session behaviour?
      - Research question: is it possible to predict the competence and knowledge acquisition of users on the basis of user interactions such as browsing, scrolling, or behavioural traces (mouse movements, keystrokes, eye tracking)?
      - Approach: studies and machine learning models in two scenarios: (a) Web search and (b) microtask crowdsourcing platforms like Amazon Mechanical Turk
      - Applications: e.g. the classification of Web users, the improvement of search results, or adaptation in learning and assessment environments
      References
      - Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys, ACM CHI2015.
      - Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection, Computer Supported Cooperative Work 28(5): 815-841 (2019)
  • 16. Acquisition of knowledge during Web search?
      Challenges & results
      - Identifying coherent search missions?
      - Identification of "learning" during search: identification of "informational sessions" (as opposed to "transactional" or "navigational" search [Broder, 2002])
        o Classification with an F1 score of approx. 75% based on user interactions
      - How competent is the user? Predicting and understanding the competence / knowledge level of users based on "in-session" behaviour
      - How well does a user achieve his/her learning objective or information need? Predicting the knowledge state/gain during a session
        o Correlation of user behaviour (queries, browsing, mouse movements etc.) & knowledge state/gain [CHIIR18]
        o Prediction of knowledge state/gain using supervised ML methods [SIGIR18]
  • 17. Knowledge level & growth vs user behaviour in Web search
      Data & experimental setup
      - Crowdsourcing of behavioural data in search sessions
      - 10 topics/information needs (e.g. "altitude sickness", "tornados") plus pre- and post-tests to determine knowledge state and knowledge gain (KS, KG)
      - Approx. 1000 crowd workers; 100 sessions per topic
      - Monitoring of user behaviour along 76 features in 5 categories: session, query, SERP (search engine result page), browsing, mouse traces
      Results
      - 70% of users show knowledge gain (KG)
      - Negative correlation between KG & topic popularity (avg. accuracy of workers in knowledge tests) (R = -.87)
      - Time spent actively on websites explains 7% of knowledge gain
      - Query complexity explains 25% of knowledge gain
      - Search behaviour correlates more strongly with search topic than with KG/KS
      Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
  • 18. ML models to predict KG/KS during Web search
      - Categorisation of the sessions along knowledge state (KS) & knowledge gain (KG) into {low, moderate, high}, with low < (mean ± 0.5 SD) < high
      - Supervised multiclass classification (Naive Bayes, Logistic Regression, SVM, Random Forest, Multilayer Perceptron)
      - KG prediction performance (after 10-fold cross-validation)
      - Feature impact (KG prediction)
      Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
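The labelling rule on the slide above (low < mean ± 0.5 SD < high) and the 10-fold evaluation protocol can be sketched as below. This is a minimal standard-library illustration, not the authors' pipeline. The function names are hypothetical, the use of the population standard deviation is an assumption, and so is the choice to treat values exactly on a boundary as "moderate". The classifiers named on the slide are not reproduced here.

```python
from statistics import mean, pstdev


def categorize(values, width=0.5):
    """Label each value low / moderate / high relative to mean ± width * SD."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD; the slide does not specify
    lower, upper = m - width * sd, m + width * sd
    labels = []
    for value in values:
        if value < lower:
            labels.append("low")
        elif value > upper:
            labels.append("high")
        else:
            labels.append("moderate")  # boundary values count as moderate (assumption)
    return labels


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Made-up knowledge-gain values (not data from the study).
example_labels = categorize([0.0, 5.0, 10.0])
print(example_labels)

folds = list(k_fold_indices(25, k=10))
print(len(folds), "folds; first test fold:", folds[0][1])
```

In practice the data would be shuffled (and usually stratified by class) before splitting; the contiguous folds here only keep the sketch short.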
  • 19. ML models to predict KG/KS during the search: ongoing work
      - Lab studies necessary for more reliable data (controlled environment, longer sessions) [completed]
      - Additional behavioral features (eye tracking) [CHIIR2020, CHI2020]
      - Resource features (e.g. complexity, analytic/emotional language, multimodality etc.) as additional signals [IR Journal, under review]
      - Improve ranking/retrieval in web search or in digital archives (SALIENT Project, Leibniz Cooperative Excellence; GESIS Data Search platforms)
  • 20. Other features to predict competence? Expertise & the "Dunning-Kruger Effect"
      - Incompetence in a particular task reduces the ability to recognise one's own incompetence in that task (Dunning, D., The Dunning-Kruger Effect: On Being Ignorant of One's Own Ignorance. Advances in Experimental Social Psychology 44 (2011), 247.)
      Research questions
      - Self-assessment as an additional feature to predict competence?
      - Application in microtask crowdsourcing for the classification of "workers", or in online learning for the classification of learners
      Some results
      - Self-assessment is a reliable feature for predicting competence/future performance
      - More reliable than prior performance in the task alone
      - The tendency to overestimate one's own competence grows with increasing task difficulty
      Figure: Performance ("accuracy") of users classified as "competent" according to (1) prior performance and (2) performance plus self-assessment
      Gadiraju, U., Fetahu, B., Kawase, R., Siehndel, P., Dietze, S., Using Worker Self-Assessments for Competence-based Pre-Selection in Crowdsourcing Microtasks. ACM Transactions on Computer-Human Interaction (ACM TOCHI), Vol. 24, Issue 4, August 2017.
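The two pre-selection strategies compared on the slide above, (1) prior performance only and (2) performance plus self-assessment, can be sketched as follows. The selection rule, the thresholds `min_accuracy` and `max_error`, the `Worker` fields, and the example workers are all hypothetical and only illustrate the idea. The actual criteria are described in the cited TOCHI article.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    actual_accuracy: float         # share of qualification questions answered correctly
    self_assessed_accuracy: float  # share the worker believes they answered correctly


def select_by_performance(workers, min_accuracy=0.6):
    """Strategy (1): keep workers whose prior accuracy reaches the threshold."""
    return [w.worker_id for w in workers if w.actual_accuracy >= min_accuracy]


def select_by_performance_and_self_assessment(workers, min_accuracy=0.6, max_error=0.2):
    """Strategy (2): additionally require an accurate self-assessment.

    A worker passes only if the gap between self-assessed and actual
    accuracy is at most max_error (hypothetical rule).
    """
    selected = []
    for w in workers:
        error = abs(w.self_assessed_accuracy - w.actual_accuracy)
        if w.actual_accuracy >= min_accuracy and error <= max_error:
            selected.append(w.worker_id)
    return selected


# Made-up workers (not data from the study).
workers = [
    Worker("w1", actual_accuracy=0.9, self_assessed_accuracy=0.8),
    Worker("w2", actual_accuracy=0.7, self_assessed_accuracy=1.0),  # overestimates
    Worker("w3", actual_accuracy=0.3, self_assessed_accuracy=0.9),  # strongly overestimates
]

by_performance = select_by_performance(workers)
by_both = select_by_performance_and_self_assessment(workers)
print(by_performance, by_both)
```

Under these assumptions the second strategy is stricter: a worker who passes the accuracy threshold but clearly overestimates their own result is filtered out.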
  • 21. Contact & acknowledgements
      Knowledge Technologies for the Social Sciences (WTS): https://www.gesis.org/en/institute/departments/knowledge-technologies-for-the-social-sciences/
      Data & Knowledge Engineering @ HHU: https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering.html
      @stefandietze
      http://stefandietze.net
      Acknowledgements
      - Erdal Baran (GESIS, Germany)
      - Katarina Boland (GESIS, Germany)
      - Stefan Conrad (HHU, Germany)
      - Gianluca Demartini (Brisbane Uni, Australia)
      - Elena Demidova (L3S, Germany)
      - Dimitar Dimitrov (GESIS, Germany)
      - Ujwal Gadiraju (Delft University, NL)
      - Asif Ekbal (IIT Patna, India)
      - Pavlos Fafalios (FORTH ICS, Greece)
      - Peter Holtz (IWM, Tübingen)
      - Ricardo Kawase (Mobile.de, Germany)
      - Vasileios Iosifidis (L3S, Germany)
      - Eirini Ntoutsi (LUH, Germany)
      - Vasilis Iosifidis (L3S, Germany)
      - Markus Rokicki (L3S, Germany)
      - Arjun Roy (IIT Patna, India)
      - Patrick Siehndel (L3S, Germany)
      - Nicolas Tempelmeier (L3S, Germany)
      - Konstantin Todorov (LIRMM, France)
      - Ran Yu (GESIS, Germany)
      - Benjamin Zapilko (GESIS, Germany)
      - Matthäus Zloch (GESIS, Germany)
      - Xiaofei Zhu (Chongqing University, China)