Almost from its very beginning, the Web has been ambivalent.
It has facilitated freedom for information, but this also included the freedom to spread misinformation. It has facilitated intelligent personalization, but at the cost of intrusion into our private lives. It has included more people than any other system before, but at the risk of exploiting them.
The Web is full of such ambivalences, and the use of artificial intelligence threatens to amplify them further. To further the good and to contain the negative consequences, we need a research agenda for studying and engineering the Web, as well as numerous activities by societies at large. In this talk, I will present and discuss a joint effort by an interdisciplinary team of Web Scientists to prepare and pursue such an agenda.
1. Steffen Staab 1
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
Web Futures
Inclusive, Intelligent, Sustainable
This is really about Web Science
Steffen Staab
@ststaab
https://de.slideshare.net/steffenstaab
2. Steffen Staab 2
• Why wasn‘t the Semantic Web working?
(Staab et al 2019)
• Why did the Web work?
– Why did some Web encyclopedias flourish and others not?
– Why did some social networks grow and others stall?
– Why do open source projects grow and why do companies
put money into it?
• What could threaten the Web?
– Privacy (then!)
– Monopoly/Oligopoly (internet.org)
The origins of Web Science in 2004
3. Steffen Staab 3
What is Web Science?
(Staab, 2013) Co-constitution of Technology and Society
4. Steffen Staab 4
Less than the union, more than the intersection
Web Science Butterfly – How do you see the Web?
not loved by anyone and still the best depiction
5. Steffen Staab 5
Manifesto (Berendt et al) in preparation with most
significant contributions by
• Bettina Berendt, Fabien Gandon, Susan Halford,
Katharina Kinder-Kurlanda, Eirini Ntoutsi
Dagstuhl Seminar:
Web Science 10 years – closing the Loop
6. Steffen Staab 6
What is Web Science?
(Staab, 2013) Co-constitution of Technology and Society
7. Steffen Staab 7
Social determinism
Society defines technology and its effects.
Example:
EU politics defines what technology
„is able to do“ –
upload filters in copyright law
private? commercial?
Tower in Paris or Las Vegas?
Permanent nightly lighting vs light show?
France? Germany? USA?
Science and Technology Studies: Co-constitution
By Benh LIEU SONG - own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=6926930
8. Steffen Staab 8
Technological determinism
Technology alone determines
usage
Example about privacy intrusion
by technology:
"If you have something that you
don't want anyone to know, maybe
you shouldn't be doing it in the first
place.“ – ERIC SCHMIDT:
gawker.com
Science and Technology Studies: Co-constitution
By Hecker / MSC -
https://www.securityconference.de/mediathek/munich-
security-conference-2018/image/eric-schmidt/filter/image/,
CC BY 3.0 de,
https://commons.wikimedia.org/w/index.php?curid=69416818
9. Steffen Staab 10
Co-constitution
Technology and society determine each other
Example:
• Technology: SMS
• Usage: hdgdl lol rofl
• Derived technologies: Twitter, Whatsapp
• Usage: #
• Derived technologies: Instagram, Snapchat
• Usage:
Politics,
Influencing as
business model
Science and Technology Studies: Co-constitution
10. Steffen Staab 11
The Web as a boundary object
• „data“
• „algorithms“
• „artificial intelligence“
• „gold standard“
• „communities“
• „survey“
• „data protection“
By Illustrator unknown - From Charles Maurice Stebbins & Mary H. Coolidge, Golden Treasury Readers: Primer, American Book Co.
(New York), p. 89., Public Domain, https://commons.wikimedia.org/w/index.php?curid=4581171
12. Steffen Staab 13
• Germany:
„Freie Fahrt für freie Bürger“ ("Free driving for free citizens") – no speed limit
• USA:
„The right to carry weapons“ – self-defense
• UK Brexit:
„Take back control“ – no delegation of power
Society is not good at dealing with ambivalences
Technical solutions exist to improve social welfare –
but they are of no use if people disagree
14. Steffen Staab 15
Latest example
• „CNN Refuses to Run Trump Campaign's Biden Ad“
because of factual incorrectness
• Facebook accepts same video as part of its ad
business
• Fact checking as a way out?
– IFCN
• >100 fact checking
organizations
https://www.poynter.org/ifcn
• Facebook partners
• (Brandtzaeg and Følstad 2017)
Ambivalence:
Information Freedom vs Information Quality
Fact
checking
(Mis-)Trusting
the Fact
Checkers
15. Steffen Staab 16
RumourEval 2019 (Gorrell et al 2019)
Veracity: False
Stance: Comment
Stance: Comment
Stance: Deny
Stance: Deny
Subtask A:
• classify each comment as
support, deny, query, or
comment towards the
statement in the post
Subtask B:
• classify the statement
expressed in the thread’s
source post as true, false, or
unverified
Labelled collection of comment
threads from Twitter and Reddit
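The two subtasks can be pictured as labels attached to a comment thread. The following is a minimal Python sketch, assuming hypothetical class and field names (`Thread`, `Reply`, `source_text`); it is not the official RumourEval data format, only an illustration of the label sets named on the slide.

```python
from dataclasses import dataclass, field
from typing import List

# Label sets as named on the slide.
STANCE_LABELS = {"support", "deny", "query", "comment"}  # Subtask A
VERACITY_LABELS = {"true", "false", "unverified"}        # Subtask B


@dataclass
class Reply:
    text: str
    stance: str  # Subtask A label for this reply

    def __post_init__(self):
        if self.stance not in STANCE_LABELS:
            raise ValueError(f"unknown stance label: {self.stance}")


@dataclass
class Thread:
    source_text: str  # the statement made in the source post
    veracity: str     # Subtask B label for that statement
    replies: List[Reply] = field(default_factory=list)

    def __post_init__(self):
        if self.veracity not in VERACITY_LABELS:
            raise ValueError(f"unknown veracity label: {self.veracity}")


# Invented toy thread that mirrors the label pattern shown on the slide.
example = Thread(
    source_text="(some rumourous claim)",
    veracity="false",
    replies=[
        Reply("interesting...", "comment"),
        Reply("hm, ok", "comment"),
        Reply("that is not true", "deny"),
        Reply("this has been debunked", "deny"),
    ],
)
```

Validating labels at construction time keeps mislabelled rows from silently entering a training set, which matters for a small and skewed collection.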
16. Steffen Staab 17
CLEARumor (Baris et al 2019)
• Pre-trained ELMo embeddings
• CNN-based model with auxiliary
input features
• Second place in Subtask B
(Veracity)
Idea: Users' stance towards a claim
could be a clue for debunking the claim at
an early stage.
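The idea can be illustrated with a deliberately naive heuristic. This is a sketch under stated assumptions, not the CLEARumor model (which, per the slide, is a CNN on ELMo embeddings with auxiliary features): the function name `guess_veracity` and both thresholds are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable


def guess_veracity(stances: Iterable[str], min_votes: int = 3, margin: float = 0.6) -> str:
    """Guess 'true', 'false', or 'unverified' from the stances of the replies.

    Only 'support' and 'deny' replies count as votes; 'query' and 'comment'
    are ignored. Both thresholds are illustrative assumptions.
    """
    counts = Counter(stances)
    votes = counts["support"] + counts["deny"]
    if votes < min_votes:
        return "unverified"  # too little evidence so far
    deny_share = counts["deny"] / votes
    if deny_share >= margin:
        return "false"
    if deny_share <= 1 - margin:
        return "true"
    return "unverified"
```

A rule like this is easy to game: a coordinated group of accounts can flood a thread with "support" replies, which is one reason stance can only be a clue and not a verdict.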
17. Steffen Staab 18
• Data sets: size, balance
• Real-time
• Astroturfing campaigns
• No journalistic knowledge on fact checking:
just canned knowledge
⇒ Community oriented approach with FactCheckingNI
Failures of current misinfo detection
18. Steffen Staab 19
• No obvious solution
• No stopping condition
– Campaigns: Trolls / electronic armies / astroturfing
• Involving many stakeholders with diverging interests
• Needs technology, behavior, education, policy,
media/journalism
Misinformation is a Wicked Web Science Problem
An AI filter can be a
building block, but not
the eventual solution
for dealing with the
ambivalence of
filtering
Mor Naaman 2019
21. Steffen Staab 22
Future of Online Participation?
https://de.wikipedia.org/wiki/Boris_Johnson#/media/Datei:Johnson_(48791303991)_(cropped).jpg
Deal or No Deal?
22. Steffen Staab 23
Exclusion:
• Not everyone has internet
• (functional) illiterates
Self-censorship:
• dyslexic?
• Do not reveal themselves
Gender discrimination
• Offline: gender quota for
presenters at Green Party
• Online: women trolled
more than men
Online Participation: Always great?
23. Steffen Staab 24
Exclusion:
• Not everyone has internet
• (functional) illiterates
Self-censorship:
• dyslexic?
• Do not reveal themselves
Gender discrimination
• Offline: gender quota for
presenters at Green Party
• Online: women trolled
more than men
Online Participation: Always great?
Renate Künast, Member of Bundestag, Green Party
Olaf Kosinsky - own work, CC BY-SA 3.0 de,
https://commons.wikimedia.org/w/index.php?curid=62640921
Attacked with
speech so abusive
that I cannot repeat it here.
Considered o.k. by court,
because she is a politician
24. Steffen Staab 25
Exclusion:
• Not everyone has internet
• (functional) illiterates
Self-censorship:
• dyslexic?
• Do not reveal themselves
Gender discrimination
• Offline: gender quota for
presenters at Green Party
• Online: women trolled
more than men
Communication limited to
elites?
Online Participation: Always great?
Influence by
the Masses
Manipulation
of the
Masses
25. Steffen Staab 26
Typology of Online Participation
Main target
group in our
project
Not all
participation
is positive
26. Steffen Staab 27
• Mobilization
Party members become more active through online participation
• Replacement
Members replace traditional activity with online activity
• Reinforcement
Active members reinforce their participation relatively
• No Use
No change of behavior
Online participation model of Thuermer
Gefion Thuermer PhD thesis „The effect of
the introduction of online participation
processes in the Green Party Germany“
27. Steffen Staab 28
Survey Measurements
(in the social science sense)
                Activity   Tool Use   Activity Increase
Mobilisation       -          +              +
Reinforcement      +          +              +
Replacement        x          +              -
Non-Use            x          -              x
Activity relates to institutional activity (various factors)
Tool Use - use of various tools;
Activity Increase compares actual against intended change of behavior
- indicates either no difference or negative correlation;
+ indicates a difference or positive correlation;
x indicates either (as it does not make a difference to the model).
(Thuermer 2018)
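The table can be read as a set of sign patterns. The following is a minimal Python sketch, assuming each member's three measurements (Activity, Tool Use, Activity Increase) have already been reduced to '+' or '-'; how that reduction is done is not specified on the slide, and the function name `matching_types` is hypothetical.

```python
from typing import List, Tuple

# Rows of the table, in column order (Activity, Tool Use, Activity Increase).
# 'x' means the value makes no difference to the model.
PATTERNS = {
    "Mobilisation":  ("-", "+", "+"),
    "Reinforcement": ("+", "+", "+"),
    "Replacement":   ("x", "+", "-"),
    "Non-Use":       ("x", "-", "x"),
}


def matching_types(observed: Tuple[str, str, str]) -> List[str]:
    """Return every typology row whose pattern is compatible with `observed`."""
    return [
        name
        for name, pattern in PATTERNS.items()
        if all(p == "x" or p == o for p, o in zip(pattern, observed))
    ]
```

With the 'x' entries treated as wildcards, every combination of '+' and '-' matches exactly one row, so the four patterns partition the eight possible sign combinations.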
29. Steffen Staab 30
“I think the surveys were very accessible, so that everyone,
including old people, could participate. It stated clearly to
‚now click this link in the next line.’”
“Online participation is one way to improve inclusion. For
example, we have older people who are less mobile who
could participate through this route.”
„Our assemblies always happen at children’s bedtime. It sounds
trivial, but this highly specifically excludes parents. For polls,
discussions and so on, online participation would be really
great.”
“[Online Participation] is really good because it allows easy
access independent of people’s life and circumstances. For
example, shift workers who work at night and can then go
online and participate when they have the time.”
(Thuermer et al. WebSci 2018)
31. Steffen Staab 32
• Online participation generates data
• Data impact reality:
– Ex.: Online EU survey on whether to keep time change
• Which data do you want to have?
– Democracy requires representativity
• Representative for party members (more college degrees)
• Representative for voters (more elderly)
• Representative for the people (who represents kids?)
Online participation makes data become an actor
in the sense of actor-network theory
My optimistic point of view:
Many questions about „Correctness of
data“ are put forward, because they can
be put forward now!
Research task:
De-biasing as a task of
Computer Science, ...
and society
33. Steffen Staab 34
Data:
- Linux Kernel Mailing List
2014
- LK Github Repository
2014-2015
Questions:
• What shares do hobbyists vs firm-sponsored
developers have?
• Is there a trend over the years?
• Are firm-sponsored developers more effective?
What motivates companies to Open Source?
Inclusive
and Fair
Exploitative
(Homscheid et al. 2016)
35. Steffen Staab 36
Sponsorship an important moderator
• Structural capital: degree (in+out)
• Relational capital: tie strength / weighted degree
• Cognitive: involvement on different mailing lists
(Homscheid 2019)
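The three measures can be sketched on a reply network built from mailing-list traffic. This is a minimal illustration under stated assumptions, not Homscheid's actual operationalization: the input format `(sender, recipient, mailing_list)` and the function name `social_capital` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Message = Tuple[str, str, str]  # (sender, recipient, mailing_list), hypothetical format


def social_capital(messages: Iterable[Message]) -> Dict[str, Dict[str, int]]:
    """Compute three simple per-developer measures from reply messages."""
    out_nb = defaultdict(set)    # distinct people a developer wrote to
    in_nb = defaultdict(set)     # distinct people who wrote to a developer
    weighted = defaultdict(int)  # number of messages sent or received
    lists = defaultdict(set)     # mailing lists a developer posted on

    for sender, recipient, mailing_list in messages:
        out_nb[sender].add(recipient)
        in_nb[recipient].add(sender)
        weighted[sender] += 1
        weighted[recipient] += 1
        lists[sender].add(mailing_list)

    people = set(out_nb) | set(in_nb)
    return {
        p: {
            "structural": len(in_nb[p]) + len(out_nb[p]),  # degree (in + out)
            "relational": weighted[p],                      # weighted degree
            "cognitive": len(lists[p]),                     # distinct mailing lists
        }
        for p in people
    }
```

Sponsorship could then be examined as a moderator by comparing these measures between firm-sponsored developers and hobbyists; that comparison is left out of the sketch.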
37. Steffen Staab 38
Preferential Attachment over Time
Prediction by Bianconi-Barabasi
Impossible to catch up? (Sun et al 18)
38. Steffen Staab 39
...and in reality: Time invariance
system time = network size ≠ real time
It is well possible to have a popular paper / Web site,
even if you come later
39. Steffen Staab 40
• Preferential attachment with decay of relevance $R(t - t_i)$
– Fewer citations as the paper gets older
– $\Pi_i \sim \eta_i \, k_i \, R(t - t_i)$
• Exponential growth of the network size
– Holds empirically for publications!
– $s = e^{\alpha t}$
• Power-law degree growth of nodes with regard to their
ages.
– Holds empirically (see previous slide)
Core idea
age
fitness
degree
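The core idea can be tried out in a small simulation. This is a minimal Python sketch, not the model code of Sun et al.; it assumes (none of this is stated on the slide) that each new node adds exactly one link, that fitness is drawn uniformly from (0, 1], and that relevance decays exponentially in real time, R(τ) = exp(−decay · τ). The function name `simulate` and its parameters are hypothetical.

```python
import math
import random
from typing import List


def simulate(n_nodes: int = 500, alpha: float = 1.0, decay: float = 1.0, seed: int = 0) -> List[int]:
    """Grow a network by preferential attachment with fitness and relevance decay."""
    rng = random.Random(seed)
    # The n-th node (1-based) arrives at real time t = ln(n) / alpha,
    # so that the network size grows as s = e^(alpha * t).
    birth = [0.0]
    fitness = [1.0 - rng.random()]  # uniform on (0, 1]
    degree = [1]                    # seed node starts with degree 1 so it can be chosen

    for n in range(2, n_nodes + 1):
        t = math.log(n) / alpha
        # Attachment weight: Pi_i ~ eta_i * k_i * R(t - t_i)
        weights = [
            fitness[i] * degree[i] * math.exp(-decay * (t - birth[i]))
            for i in range(len(degree))
        ]
        target = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        degree[target] += 1
        birth.append(t)
        fitness.append(1.0 - rng.random())
        degree.append(1)            # the new node's own link
    return degree
```

Under these assumptions the relevance of node i when node n arrives equals (n_i / n)^(decay / alpha): an exponential decay in real time becomes a power law in system time (network size), which is one way to see why the two notions of time must be kept apart.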
40. Steffen Staab 41
fitness
Network growth: exponential or constant
Growth
$g \sim e^{\sigma t}$
$\sigma \in \mathbb{R}$
When (exponential)
growth cannot be
upheld, some
assumptions will
have to give
Academic reward
Economics
Etc.
42. Steffen Staab 43
India: Net neutrality extended
• No internet.org (by Facebook)
• Online shops must not sell their own stock
Inclusive and fair to achieve sustainability
• Also for the latecomers
Sustainability
44. Steffen Staab 45
• Documents
• Data
• Services
• Things
• People
• Assemblies of items
Toward a Web of Everything
(Berendt et al. 2019)
45. Steffen Staab 46
Somewhat intelligent, maybe
• biased
• unreliable
• unavailable
• unsafe
• insecure
• unaccountable
Intelligences hosted on the Web
(Berendt et al. 2019)
Failure of the socio-technical system
47. Steffen Staab 48
• Web → AI
– ImageNet
• AI → Web
– Chatbots
– Virtual
Assistants
• Learning
from Web dialogues
• Safeguarding
Web and Artificial Intelligence
49. Steffen Staab 50
• Digital observation of everyday routines
• How data is created depends on
– technology, communities, ownership, markets, regulations,
rights, ...
• Just to describe these processes
and what they mean, we need
– Computer scientists, lawyers, political scientists,
sociologists, ...
Sociotechnical challenges: Datafication
Computer science tends to
underestimate the descriptive dimension
(Staab, Halford, Hall 2019)
50. Steffen Staab 51
• 3 billion people without internet
• no normative propositions
„The Web is `good‘ for everyone“
• Digital divide:
Internet benefitting the privileged
rather than the impoverished?
• Digital literacy: Understanding the Web as a system
– Everyone must know how a meme grows to avoid
misinformation!
– Not the case now!
Sociotechnical challenges: Digital Divide
(Staab, Halford, Hall 2019)
51. Steffen Staab 52
• Google, Facebook, Amazon,...
• 245,000 US$ to train XLNet
https://syncedreview.com/2019/06/27/the-staggering-cost-of-training-sota-ai-models/
• Who decides the bias?
– Representation of women on Wikipedia
• Edit-a-thons
• GNU for AI?
– (https://voice.mozilla.org/)
Who owns AI that steers the Web?
52. Steffen Staab 53
• Where are we now? What works well? And doesn‘t?
For whom, when and why?
• What are the possible futures for specific AI applications?
• What would have to happen to get us there?
• Diversifying the vision of the common good
– „good for someone else“
• Empowering participation in the future
• Bringing people back in - not as users or consumers, or
in terms of impact
but as part of the world we are building
Susan Halford (Exauguration speech 2019)
Democratizing futures: A utopia
54. Steffen Staab 55
Ipek Baris, Lukas Schmelzeisen, Steffen Staab (2019). CLEARumor at SemEval-2019 Task 7:
ConvoLving ELMo Against Rumors.
Bettina Berendt, Fabien Gandon, Susan Halford, Jim Hendler, Katharina Kinder-Kurlanda, Eirini
Ntoutsi, Steffen Staab. 10 Years of Web Science — Dagstuhl Manifesto, Manifesto from
Dagstuhl Perspectives Workshop 18262, in preparation 2019.
Petter Bae Brandtzaeg and Asbjørn Følstad (2017). Trust and distrust in online fact-checking services.
Communications of the ACM, 60(9), 65-71. doi: 10.1145/3122803.
Gorrell et al (2019). RumourEval 2019: Determining Rumour Veracity and Support for Rumours.
Dirk Homscheid, Mario Schaarschmidt, Steffen Staab: Firm-Sponsored Developers in Open Source
Software Projects: a Social Capital Perspective. ECIS 2016: Research-in-Progress Paper 12
Dirk Homscheid. Firm-Sponsored Developers in Open Source Software Projects: A Social Capital
Perspective. PhD thesis submitted at Universität Koblenz-Landau, 2019.
Steffen Staab, Susan Halford, Wendy Hall: Web science in Europe: beyond boundaries. Commun. ACM
62(4): 74 (2019)
Jun Sun, Steffen Staab, Fariba Karimi, Decay of Relevance in Exponentially Growing Networks. In:
Proc. of ACM WebSci ‘18. ACM.
Jun Sun, Matus Medo, Steffen Staab. Time-invariant degree growth in preferential attachment network
models. Submitted 2019.
Gefion Thuermer „The effect of the introduction of online participation processes in the Green Party
Germany“, PhD thesis, University of Southampton 2019.
Thuermer, G., Roth, S., O'Hara, K., & Staab, S. Everybody thinks online participation is great – for
somebody else: A qualitative and quantitative analysis of perceptions and expectations of online
participation in the Green Party Germany. In Proc. of ACM WebSci 2018, pp. 287-296.
References
55. Steffen Staab 56
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
Thanks to my many, many collaborators!
Ipek Baris, Lukas Schmelzeisen,
Jun Sun, Dirk Homscheid, Mario
Schaarschmidt (U Koblenz)
Gefion Thuermer (U of Southampton)
and all the others!
Editor's Notes
When I read the welcome address I thought: this is Web Science!!!
Web Intelligence (WI) aims to achieve a multi-disciplinary balance between research advances in the fields of collective intelligence, data science, human-centric computing, knowledge management, and network science. It is committed to addressing research that deepens the understanding of computational, logical, cognitive, physical as well as business and social foundations of the future Web, and enables the development and application of intelligent technologies. WI’19 features high-quality, original research papers and real-world applications in all theoretical and technological areas that make up the field of WI. WI’19 welcomes research, application as well as Industry/Demo-Track paper submissions. Tutorial, Workshop and Special-Session proposals and papers are also welcome.
Encyclopedia -> ambiguity information quality -> misinformation/Ipek
Social Networks -> ???
-> Jun Sun
-> Gefion Thuermer
OSS -> ambiguity ??? -> Dirk Homscheid
Web Science is an interdisciplinary field of research dealing with the investigation of large, networked, socio-technical systems, in particular the World Wide Web.
Web Science investigates the relationship between humans and technology, the way society and technology constitute each other, and the effects of this constitution on society in the broader sense.
Web Science combines research from disciplines as diverse as sociology, computer science, economics, mathematics, and physics.
The Web is a boundary object
What do people see, when they see it
computer scientist: protocols
social scientist: communities
STS: actor-network theory
physicist: networks
law: just another domain
etc.
https://netzpolitik.org/2018/das-eu-parlament-legt-einen-schleier-ueber-das-internet-votum-fuer-upload-filter-und-leistungsschutzrecht/
Some digital commissioners and members of the European Parliament wish for a technological reality
and then build laws on it that cannot be implemented in any sensible way
Eric Schmidt suggests you alter your scandalous behavior before you complain about his company invading your privacy. That's what the Google CEO told Maria Bartiromo during CNBC's big Google special last night, an extraordinary pronouncement for such a secretive guy.
The generous explanation for Schmidt's statement is that he's revolutionized his thinking since 2005, when he blacklisted CNET for publishing info about him gleaned from Google searches, including salary, neighborhood, hobbies and political donations. In that case, the married CEO must not mind all the coverage of his various reputed girlfriends; it's odd he doesn't clarify what's going on with the widely-rumored extramarital dalliances, though.
Schmidt's philosophy is clear with Bartiromo in the clip below: "If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place." The philosophy that secrets are useful mainly to indecent people is awfully convenient for Schmidt as the CEO of a company whose value proposition revolves around info-hoarding. Convenient, that is, as long as people are smart enough not to apply the "secrets suck" philosophy to their Google passwords, credit card numbers and various other secrets they need to put money in Google's pockets.
1- The RumourEval dataset was too small and its labels were too skewed. We also observed errors in the annotations. Twitter samples were the majority. Overall, the competition was not well prepared.
2- The stance detection model of CLEARumor is topic-specific, and we did not perform well on this task either. For new events/topics it overfits and does not predict well, as we observed in the real-time implementation. The state-of-the-art (SotA) method for stance detection uses a time-series approach that first extracts conversation branches from the tree-structured thread and then feeds them into RNNs. This method is also not feasible in a real-time scenario: a popular user such as Trump can get hundreds of responses within an hour, and it is impossible to process all of that data. Additionally, popular users use fake accounts to make their statements go viral. If we process only a small percentage of the conversations, the sample will be noisy because of the fake users. So in my opinion, the SotA RNN method is not good and leads to wrong predictions.
3- We used user popularity in the veracity model. This was easy for the Twitter data, but for Reddit it was not clear to us how to do it, so we neglected that part for Reddit. For this reason, in our evaluations we predicted well on Twitter, which took us to second place. But this is not a domain-independent approach.
Currently I am working on: masking named entities and learning event-invariant features, represented with BERT, word2vec, and bag of words. (WIP)
Considering the following as future work:
For stance detection, I will add additional data, namely stance detection between the headline and the body of fake news (fake news detection dataset). Although the task is also stance detection, the annotation scheme and the data differ from the RumourEval dataset. But if we can find a similarity/correlation between the RumourEval and the fake news detection datasets, it would be useful for learning topic/domain dependent features.
Refer to Mor Naaman
And who is everyone?
How about the more than 1 million Brits living in Europe, but not in UK – a formal procedure was missed and hence they were not asked for the Leave referendum
How about the students not being at their registered homes during summer break – they were less likely to vote
Swiss ask themselves whether they should do this, because of their direct democracy!
In this case the Twitter user apologized, probably also because he faced a shitstorm
Activity is: degree of involvement
Agency: self-determined or not
Valence: positive or negative
For activities are defined according to columns
Table 23 in thesis:
For each combination of tools (major rows) and effects (major columns), the set of variables
moderating the effect is given on the left. The details given on the right of each variable signify the
type of effect, with ‘V’ indicating the variable value, and ‘E’ the effect. ‘Age + +‘ can be read as
‘when Age is higher, the effect is more likely’.
H5b: can be confirmed: There was evidence for all four effects for all three tools, but affecting
different, in some instances contradicting groups.
H6a: can be partially confirmed: Behaviour did differ by age for the Befragung, with younger
members indeed showing reinforcement; however, for Antragsgrün, younger members were
more likely to show replacement behaviour, while older members showed non‐use.
H6b: can be confirmed in principle, but must be rejected in particulars: IT skills were not significant
for any of the identified behaviours. While the views of the tools and their benefit were
indeed highly significant, their effects work in different directions: Familiarity indicated
replacement for the Befragung, but reinforcement for the Begehren. Less familiarity
indicated non‐use for all tools.
H6c: must be rejected: Neither income nor occupation were significant predictors for any of the
tools, let alone effects.
H6d: can be confirmed in principle, but must be rejected in particulars: While participation
preferences were indeed linked to a mobilisation effect in the Begehren, this was members
who preferred the maximum participation intensity rather than the participation type of
votes.
Web Intelligence conference is also about Network Science,
Hence I thought that I would throw in some of that perspective, too
Right figure:
Same popularity of all three nodes
The nodes coming later need 1 and 2 orders of magnitude longer, respectively, to reach the same degree
Intelligent monitoring – warning system was shortcut, because it produced too many false warnings –
Until one warning was true
How we operate our password protected environments
https://www.tagesspiegel.de/gesellschaft/panorama/rheinland-pfalz-technischer-defekt-loeste-brand-in-ice-aus-zugstrecke-gesperrt/23183362.html
One friend said: there was probably no woman in the team who built Tay