Information Retrieval and Social Media
Prof.dr.ir. Arjen P. de Vries
arjen@acm.org
Lecture for the User-Centred Social Media Summer School
Duisburg, September 19, 2017
Social Media
Noun
social media (uncountable)
Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet.
The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.
Social Media
• “Social bookmarking” sites
• “User generated content”
- Images (Flickr) and videos (YouTube, Vimeo), but also blogs, Wikipedia, etc.
• Social network services
- Twitter, Facebook, Instagram, Snapchat
Not just one beast!
User contributed content
Permission based tagging, Set model
Bag model
Global Content
Free for all tagging
Social Media to help improve IR (1)
‘Co-creation’
• Social Media:
- Consumer becomes a co-creator
- Many ‘data consumption’ traces in social media are public
Richer information representations
• User profiles
- User name, full name, description, image, homepage URL, etc.
• Connections between users
- Networks of friends, followers, etc.
• Comments/reactions
• Endorsing and sharing
E.g., Twitter
• Bio
- Often includes a geo-location of the profile
• Friends
• Followers
• Lists
- A list groups followed Twitter accounts; lists can themselves be followed
• Hashtags
• Mentions
User Demographics
• Gender from Tweet author’s first name
• Geographic location from profile
Diaz, Gamon, Hofman, Kiciman, Rothschild. Online and Social Media as an Imperfect Continuous Panel Survey. In PLOS ONE, 2016
Detailed User Characteristics…
de Volkskrant, March 13, 2013
Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS 2013.
Youyou, W., Kosinski, M. & Stillwell, D. (2015) Computer-based personality judgments are more accurate than those made by humans. PNAS 2015.
… in Search
• Age and gender, and perhaps also political and religious views
• Maps both Page Likes from the myPersonality dataset and search results onto a common space of ODP categories
• Learning approach to overcome the difference in distribution between myPersonality data and search data
- E.g., their FB dataset is 63% female, vs. only 47% in Bing
Bi, Kosinski, Shokouhi, Graepel. Inferring the Demographics of Search Users. WWW 2013
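The slide does not say how the learning approach corrects for the distribution difference. Purely as a generic illustration (not the method of Bi et al.), instance reweighting would give each group in the source sample a weight so that its share matches the target population. The percentages are the ones on the slide; the male shares are simply their complements, and the function name is hypothetical.

```python
def reweight(source_share, target_share):
    """Per-group weights that make a source sample match target group shares."""
    return {group: target_share[group] / source_share[group] for group in source_share}

# Shares taken from the slide; the complements are an assumption of this sketch.
source = {"female": 0.63, "male": 0.37}   # myPersonality (Facebook) sample
target = {"female": 0.47, "male": 0.53}   # Bing search users

weights = reweight(source, target)

# After weighting, each group's share in the source sample equals the target share.
weighted_share = {group: source[group] * weights[group] for group in source}
```

Female users are down-weighted (weight below 1) and male users up-weighted, so a model trained on the weighted Facebook sample sees roughly the gender mix of the search population.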
Many Opportunities for IR
• Expand content representation
• Reduce the vocabulary gap(s) between creators of content (the indexers) and consumers of content (the users)
• More diverse views on the same content
LibraryThing
• Items
• People
• Tags
• Ratings
Synonyms
Dissimilar users…
… with similar items
(Pearson Correlation)
Note: this representation ignored the item ratings
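A minimal sketch of the comparison named on this slide, assuming each user is represented as a binary vector over items (1 if the item is in the user's library), so that ratings are ignored as the note says. The toy data and the function name are illustrative, not taken from the lecture.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant vectors; treated as "no correlation" here
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: one position per item, 1 = the user has the item.
alice = [1, 1, 0, 0, 1]
bob = [1, 1, 0, 0, 1]
carol = [0, 0, 1, 1, 0]

same = pearson(alice, bob)        # identical libraries
opposite = pearson(alice, carol)  # complementary libraries
```

Identical libraries give a correlation of 1 and fully complementary ones give -1; real users fall in between.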
Examples
• Humour
• Classic
IR to help improve Social Media
LibraryThing – beyond terms
• Items
• People
• Tags
• Ratings
Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders. The task dependent effect of tags and ratings on social media access. TOIS 28, 4, article 21 (November 2010), 42 pages.
Search with Random Walk
• Present nodes ranked by the estimated probability that a random walk starting from the (task-dependent) starting nodes would end at this node
Tagging Relationships
Note: this representation used the item ratings in the user – item transitions
An item recommendation walk
Personalized Search
• Assume a user who types a single tag as query
• A soft clustering effect smoothly relates similar concepts before converging to the background probability
• Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user
- So, content that matches the user’s preference is more likely to be found first
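The random-walk search described on the preceding slides can be sketched as follows. This is a simplified illustration: it assumes uniform transition probabilities over neighbours and a fixed number of steps, whereas the cited work uses tag and rating information in the transitions. The graph, node names and function name are hypothetical.

```python
def random_walk(graph, start_nodes, steps):
    """Return the probability of being at each node after `steps` steps.

    graph: dict mapping node -> list of neighbour nodes (uniform transitions assumed).
    start_nodes: nodes among which the initial probability mass is split evenly.
    """
    prob = {node: 0.0 for node in graph}
    for node in start_nodes:
        prob[node] += 1.0 / len(start_nodes)
    for _ in range(steps):
        nxt = {node: 0.0 for node in graph}
        for node, mass in prob.items():
            neighbours = graph[node]
            for nb in neighbours:
                nxt[nb] += mass / len(neighbours)
        prob = nxt
    return prob

# Hypothetical toy graph of users (u*), one tag (t_java) and items (i*).
graph = {
    "u1": ["i1", "t_java"],
    "u2": ["i2", "t_java"],
    "t_java": ["u1", "u2", "i1", "i2"],
    "i1": ["u1", "t_java"],
    "i2": ["u2", "t_java"],
}

# Personalised query: start in both the query tag and the target user.
scores = random_walk(graph, start_nodes=["t_java", "u1"], steps=3)
ranking = sorted((n for n in graph if n.startswith("i")), key=scores.get, reverse=True)
```

Both items carry the tag, but the walk that also starts at user u1 ranks i1 (the item linked to u1) ahead of i2, which is the disambiguation effect the slide describes. With many more steps the scores drift towards the background distribution, so the number of steps acts as a smoothing parameter.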
Expert Finding on Twitter
• Empirical evidence demonstrates that a mix of tweet text, friends, followers and lists is most effective to infer expertise
• Expertise ground truth taken from Quora, where (many) users list their expertise and their social media accounts
Xu, Zhou and Lawless. Inferring your expertise from Twitter: combining multiple types of user activity. WI 2017
Multiple Social Networks
• Accounts linked via services like about.me and Quora
• Users explicitly list their multiple accounts in one profile
• Missing data addressed via non-negative matrix factorization (NMF)
- E.g., 57% list school in FB, 81% in LinkedIn
• Applied to various prediction tasks, e.g., topics users are interested in
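To illustrate how NMF can cope with missing profile fields, here is a minimal pure-Python sketch of masked NMF with multiplicative updates: the factors are fitted on observed cells only and then used to estimate the missing ones. The exact formulation in the work summarised above is not given on the slide, so this is an assumption; the matrix values, rank and function names are hypothetical.

```python
import random

def masked_nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factorise V ~ W x H with non-negative factors, using observed cells only.

    V: list of rows; None marks a missing value.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def predict(i, j):
        return sum(W[i][k] * H[k][j] for k in range(rank))

    for _ in range(iters):
        # Multiplicative update for W, summing over observed cells only.
        for i in range(n):
            for k in range(rank):
                num = sum(V[i][j] * H[k][j] for j in range(m) if V[i][j] is not None)
                den = sum(predict(i, j) * H[k][j] for j in range(m) if V[i][j] is not None)
                W[i][k] *= num / (den + eps)
        # Same update for H.
        for k in range(rank):
            for j in range(m):
                num = sum(V[i][j] * W[i][k] for i in range(n) if V[i][j] is not None)
                den = sum(predict(i, j) * W[i][k] for i in range(n) if V[i][j] is not None)
                H[k][j] *= num / (den + eps)
    return W, H

def observed_error(V, W, H):
    """Sum of squared errors over observed cells only."""
    total = 0.0
    for i, row in enumerate(V):
        for j, value in enumerate(row):
            if value is not None:
                pred = sum(W[i][k] * H[k][j] for k in range(len(H)))
                total += (value - pred) ** 2
    return total

# Hypothetical user x attribute matrix; None = field not filled in on that network.
V = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, 4, 4],
    [None, 1, 5, 4],
]
W, H = masked_nmf(V, rank=2)
filled = sum(W[1][k] * H[k][2] for k in range(2))  # estimate for a missing cell
```

Because missing cells never enter the sums, they neither pull the factors towards zero nor need to be imputed beforehand; the product of the fitted factors supplies the estimate.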
Social Media to help improve IR (2)
Relevant for Search… (1/4)
• Wikipedia contains semantically very rich annotations:
- Wikipedia Categories, Lists
- Times (1930, 1931, 1932, etc.)
- Disambiguation pages
- Edit history
- Etc.
Note: DBpedia is “just” Wikipedia
Relevant for Search… (2/4)
• “Twanchor text”
- Tweets citing online media can be used as additional resources describing the content, just like anchor text
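The idea can be sketched in a few lines: collect, for each cited URL, the text of the tweets that cite it, and index that text as an extra field of the document. This is an illustration only; the regular expression, the function name and the sample tweets are assumptions, and real tweets would additionally need their shortened links expanded.

```python
import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")

def build_twanchor_text(tweets):
    """Map each cited URL to the concatenated text of the tweets citing it."""
    texts = defaultdict(list)
    for tweet in tweets:
        urls = URL_PATTERN.findall(tweet)
        # Keep the tweet's own words, minus the links themselves.
        words = URL_PATTERN.sub(" ", tweet).split()
        for url in urls:
            texts[url].append(" ".join(words))
    return {url: " | ".join(parts) for url, parts in texts.items()}

# Hypothetical sample tweets.
tweets = [
    "Great overview of random walks for tag search http://example.org/paper",
    "Reading this on personalised retrieval: http://example.org/paper #IR",
    "Nothing to see here",
]
twanchor = build_twanchor_text(tweets)
```

The resulting string plays the same role as anchor text on the Web: a description of the page written by people other than its author.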
Relevant for Search… (3/4)
• Geotags / POIs
- Recommend geo-locations to people
- Recommend people to geo-locations
- Predict a user’s whereabouts (or “trails”)
Relevant for Search… (4/4)
• Timestamps
- Help reveal trends, e.g., which documents went viral?
- Allow searching “in the past”
Searching the Social Web
• Goal: not to improve Web search with social annotations, but to improve search within social media
• Builds on the observation in prior work (Goel et al., 2016) that virality is really different from popularity
- The most viral content is often distinct from the most popular content being shared online
- Can we surface that content more easily?
Alonso, Kandylas, Tremblay, Hofman, Sen. What’s Happening and What Happened: Searching the Social Web. WebSci ’17.
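One way to make the distinction concrete is the structural virality measure of Goel et al.: the average shortest-path distance between all pairs of nodes in a diffusion tree. The sketch below compares two hypothetical cascades of equal popularity (five nodes each); whether the system described here uses exactly this measure is not stated on the slides.

```python
from collections import deque
from itertools import combinations

def structural_virality(edges):
    """Mean shortest-path distance over all node pairs of a diffusion tree."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    def distances_from(src):
        # Breadth-first search gives shortest paths in an unweighted tree.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        return dist

    nodes = list(adj)
    all_dist = {node: distances_from(node) for node in nodes}
    pairs = list(combinations(nodes, 2))
    return sum(all_dist[a][b] for a, b in pairs) / len(pairs)

# Two hypothetical cascades with the same popularity (5 nodes each).
broadcast = [("src", "u0"), ("src", "u1"), ("src", "u2"), ("src", "u3")]  # one big share
chain = [("src", "u0"), ("u0", "u1"), ("u1", "u2"), ("u2", "u3")]         # person to person

sv_broadcast = structural_virality(broadcast)
sv_chain = structural_virality(chain)
```

The broadcast scores 1.6 and the chain 2.0 even though both reached the same number of accounts, which is why ranking by popularity alone can hide content that spread person to person.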
Pipeline
• Content selection:
- Select tweets that contain links and satisfy simple user, content and time range criteria
• User selection:
- Extract and normalize links and select those that have been shared by a minimum number of trusted users
• Link selection:
- Clean up links, compute link virality and popularity, cluster similar links, and apply heuristic criteria to select good quality links
• Annotations:
- Generate metadata for the selected links from the associated tweets
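A much simplified sketch of the first stages of this pipeline, assuming tweets arrive as dictionaries with hypothetical `user`, `text` and `urls` fields and that a set of trusted users is given. Virality scoring, clustering and the heuristic quality criteria are left out, and the normalisation rule is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Lower-case the host and drop query string, fragment and trailing slash."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), "", ""))

def select_links(tweets, trusted_users, min_trusted=2):
    """Return {link: [tweet texts]} for links shared by enough trusted users."""
    sharers = defaultdict(set)
    annotations = defaultdict(list)
    for tweet in tweets:
        if not tweet["urls"]:  # content selection: keep only tweets with links
            continue
        for url in tweet["urls"]:
            link = normalize(url)
            annotations[link].append(tweet["text"])  # raw material for metadata
            if tweet["user"] in trusted_users:
                sharers[link].add(tweet["user"])
    return {link: annotations[link]
            for link, users in sharers.items() if len(users) >= min_trusted}

# Hypothetical input.
tweets = [
    {"user": "alice", "text": "Must read", "urls": ["http://Example.org/story/?utm=1"]},
    {"user": "bob", "text": "Interesting take", "urls": ["http://example.org/story"]},
    {"user": "mallory", "text": "Buy now", "urls": ["http://spam.example/x"]},
    {"user": "carol", "text": "No link here", "urls": []},
]
selected = select_links(tweets, trusted_users={"alice", "bob", "carol"})
```

Normalising before counting matters: the two shares of the same story only add up to the threshold once tracking parameters and case differences are removed.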
Collecting Data
API Blues
Bit.ly API used in my own research:
/v3/link/content
deprecated
Note: This endpoint was deprecated on 10/15/2014.
API Blues
• The combination of rate limits and Terms of Service of most social media platforms complicates our life
• Not to mention volume
- TREC Microblog collection of 2013 “Tweets2013” consists of 107 GB compressed (for only 2 months of data!)
• Did I mention ToS?
- Mandatory continual processing of deletions…
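What continual processing of deletions means in practice can be sketched as below. The sketch assumes a stream of JSON lines in which a deletion notice is shaped like `{"delete": {"status": {"id": ...}}}`, as in Twitter's streaming API at the time; treat that message shape, like the function and variable names, as an assumption to check against the platform documentation.

```python
import json

def apply_stream(lines, store):
    """Update `store` (tweet id -> tweet dict) from a stream of JSON lines."""
    for line in lines:
        message = json.loads(line)
        if "delete" in message:
            # Deletion notice: drop the tweet if we hold it.
            store.pop(message["delete"]["status"]["id"], None)
        elif "id" in message:
            store[message["id"]] = message
    return store

# Hypothetical stream: two tweets followed by a deletion of the first one.
lines = [
    '{"id": 1, "text": "hello"}',
    '{"id": 2, "text": "world"}',
    '{"delete": {"status": {"id": 1, "user_id": 42}}}',
]
store = apply_stream(lines, {})
```

A production version would also have to handle notices that arrive before the tweet they refer to, and propagate deletions to any derived indexes or shared copies.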
Good News for Twitter
• The Internet Archive distributes two collections from 2013 that can be used as a drop-in replacement for evaluation purposes
• Deletions seem to affect non-relevant documents more than relevant documents
Sequiera and Lin. Finally, a Downloadable Test Collection of Tweets. SIGIR 2017.
Social Media as Panel Survey
• Online population is a non-representative sample of the off-line world
• Demographic skew and user participation are non-stationary and difficult to predict over time
- E.g., women are underrepresented in the raw volume of tweets, but tweet more often about politics than men
- Half of the activity on a specific debate came from individuals who had not previously posted about the election
Diaz, Gamon, Hofman, Kiciman, Rothschild. Online and Social Media as an Imperfect Continuous Panel Survey. In PLOS ONE, 2016
Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM 2013
API Blues
Take home message(s)
• Social media give access to a rich resource of context
- Including time & location!
• The academic’s alternative to click data?
• A big open research question:
Can one theory (about matching users and content) address the
complete spectrum of IR tasks that arise in social media?

More Related Content

What's hot

WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.
WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.
WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.`Shweta Bhavsar
 
Information retrieval 16 latent semantic indexing model
Information retrieval 16  latent semantic indexing modelInformation retrieval 16  latent semantic indexing model
Information retrieval 16 latent semantic indexing modelVaibhav Khanna
 
INTERNET AND E-MAIL IN ‎LIBRARIES
INTERNET AND E-MAIL IN ‎LIBRARIESINTERNET AND E-MAIL IN ‎LIBRARIES
INTERNET AND E-MAIL IN ‎LIBRARIESLibcorpio
 
Η έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματεία
Η έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματείαΗ έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματεία
Η έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματείαDr. Georgios Gaitanos
 
Library and Information Science As a Career Choice
Library and Information Science As a Career ChoiceLibrary and Information Science As a Career Choice
Library and Information Science As a Career ChoiceDavid Nzoputa Ofili
 
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data:  Technical Introduction to BigSheets for InfoSphere BigInsightsBig Data:  Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsightsCynthia Saracco
 
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...Zainul Sayed
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process Shuvra Ghosh
 
4. Με τη λατρεία εκφράζουμε την πίστη μας
4. Με τη λατρεία εκφράζουμε την πίστη μας4. Με τη λατρεία εκφράζουμε την πίστη μας
4. Με τη λατρεία εκφράζουμε την πίστη μαςΠΕ 01 ΜΠΑΛΤΟΣ ΙΩΑΝΝΗΣ
 
Technical services presentation
Technical services presentation Technical services presentation
Technical services presentation Ali Hassan Maken
 

What's hot (20)

WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.
WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.
WHAT ARE METADATA STANDARDS? EXPLAIN DUBLIN CORE IN DETAIL.
 
Precis
PrecisPrecis
Precis
 
デジタル田園都市国家構想推進交付金について
デジタル田園都市国家構想推進交付金についてデジタル田園都市国家構想推進交付金について
デジタル田園都市国家構想推進交付金について
 
Information retrieval 16 latent semantic indexing model
Information retrieval 16  latent semantic indexing modelInformation retrieval 16  latent semantic indexing model
Information retrieval 16 latent semantic indexing model
 
Lec 4,5
Lec 4,5Lec 4,5
Lec 4,5
 
INTERNET AND E-MAIL IN ‎LIBRARIES
INTERNET AND E-MAIL IN ‎LIBRARIESINTERNET AND E-MAIL IN ‎LIBRARIES
INTERNET AND E-MAIL IN ‎LIBRARIES
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Dbms
DbmsDbms
Dbms
 
Η έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματεία
Η έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματείαΗ έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματεία
Η έννοια του όρου "θρησκεία" στην πρωτοχριστιανική γραμματεία
 
Information Retrieval Evaluation
Information Retrieval EvaluationInformation Retrieval Evaluation
Information Retrieval Evaluation
 
POPSI
POPSIPOPSI
POPSI
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
 
Library and Information Science As a Career Choice
Library and Information Science As a Career ChoiceLibrary and Information Science As a Career Choice
Library and Information Science As a Career Choice
 
National information policy
National information policyNational information policy
National information policy
 
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data:  Technical Introduction to BigSheets for InfoSphere BigInsightsBig Data:  Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
 
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...
Intelligent Hiring with Resume Parser and Ranking using Natural Language Proc...
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process
 
4. Με τη λατρεία εκφράζουμε την πίστη μας
4. Με τη λατρεία εκφράζουμε την πίστη μας4. Με τη λατρεία εκφράζουμε την πίστη μας
4. Με τη λατρεία εκφράζουμε την πίστη μας
 
Technical services presentation
Technical services presentation Technical services presentation
Technical services presentation
 
Term weighting
Term weightingTerm weighting
Term weighting
 

Similar to Information Retrieval and Social Media

Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Social media in Research, friend or foe?
Social media in Research, friend or foe?Social media in Research, friend or foe?
Social media in Research, friend or foe?Fiona Still-Drewett
 
Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...
Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...
Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...Manolo Farci
 
Utilizing Social Media to Understand People
Utilizing Social Media to Understand PeopleUtilizing Social Media to Understand People
Utilizing Social Media to Understand PeopleBenjamin Smithee
 
WEBINAR: Joining the "buzz": the role of social media in raising research vi...
WEBINAR:  Joining the "buzz": the role of social media in raising research vi...WEBINAR:  Joining the "buzz": the role of social media in raising research vi...
WEBINAR: Joining the "buzz": the role of social media in raising research vi...HELIGLIASA
 
ESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social MediaESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social MediaArjen de Vries
 
Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...Eileen Shepherd
 
A review for the online social networks literature
A review for the online social networks literatureA review for the online social networks literature
A review for the online social networks literatureAlexander Decker
 
A review for the online social networks literature
A review for the online social networks literatureA review for the online social networks literature
A review for the online social networks literatureAlexander Decker
 
The Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational ObjectivesThe Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational Objectivestheijes
 
Research-Open Access-Social Media: a winning combination
Research-Open Access-Social Media: a winning combinationResearch-Open Access-Social Media: a winning combination
Research-Open Access-Social Media: a winning combinationRhodes University Library
 
Science communication via social media
Science communication via social mediaScience communication via social media
Science communication via social mediaSimon Schneider
 
Science communication via social media
Science communication via social mediaScience communication via social media
Science communication via social mediaSimon Schneider
 
Impact & Interaction: social media as part of communication strategy for rese...
Impact & Interaction: social media as part of communication strategy for rese...Impact & Interaction: social media as part of communication strategy for rese...
Impact & Interaction: social media as part of communication strategy for rese...Esther De Smet
 
Vu M Kloos 20071116
Vu M Kloos 20071116Vu M Kloos 20071116
Vu M Kloos 20071116Martin Kloos
 
The Social Mind Study
The Social Mind StudyThe Social Mind Study
The Social Mind StudyDon Bulmer
 
The Social Mind Research Study
The Social Mind Research StudyThe Social Mind Research Study
The Social Mind Research StudyLeader Networks
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Biasgloriakt
 

Similar to Information Retrieval and Social Media (20)

Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Social media in Research, friend or foe?
Social media in Research, friend or foe?Social media in Research, friend or foe?
Social media in Research, friend or foe?
 
Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...
Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...
Il laboratorio aperto: limiti e possibilità dell’uso di Facebook, Twitter e Y...
 
Utilizing Social Media to Understand People
Utilizing Social Media to Understand PeopleUtilizing Social Media to Understand People
Utilizing Social Media to Understand People
 
WEBINAR: Joining the "buzz": the role of social media in raising research vi...
WEBINAR:  Joining the "buzz": the role of social media in raising research vi...WEBINAR:  Joining the "buzz": the role of social media in raising research vi...
WEBINAR: Joining the "buzz": the role of social media in raising research vi...
 
ESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social MediaESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social Media
 
Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...Joining the ‘buzz’ : the role of social media in raising research visibility ...
Joining the ‘buzz’ : the role of social media in raising research visibility ...
 
Helig webinar 6 nov_2014
Helig webinar 6 nov_2014Helig webinar 6 nov_2014
Helig webinar 6 nov_2014
 
A review for the online social networks literature
A review for the online social networks literatureA review for the online social networks literature
A review for the online social networks literature
 
A review for the online social networks literature
A review for the online social networks literatureA review for the online social networks literature
A review for the online social networks literature
 
The Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational ObjectivesThe Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational Objectives
 
A research paper on Twitter_Intrinsic versus image related utility in social ...
A research paper on Twitter_Intrinsic versus image related utility in social ...A research paper on Twitter_Intrinsic versus image related utility in social ...
A research paper on Twitter_Intrinsic versus image related utility in social ...
 
Research-Open Access-Social Media: a winning combination
Research-Open Access-Social Media: a winning combinationResearch-Open Access-Social Media: a winning combination
Research-Open Access-Social Media: a winning combination
 
Science communication via social media
Science communication via social mediaScience communication via social media
Science communication via social media
 
Science communication via social media
Science communication via social mediaScience communication via social media
Science communication via social media
 
Impact & Interaction: social media as part of communication strategy for rese...
Impact & Interaction: social media as part of communication strategy for rese...Impact & Interaction: social media as part of communication strategy for rese...
Impact & Interaction: social media as part of communication strategy for rese...
 
Vu M Kloos 20071116
Vu M Kloos 20071116Vu M Kloos 20071116
Vu M Kloos 20071116
 
The Social Mind Study
The Social Mind StudyThe Social Mind Study
The Social Mind Study
 
The Social Mind Research Study
The Social Mind Research StudyThe Social Mind Research Study
The Social Mind Research Study
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Bias
 

More from Arjen de Vries

Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen) Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen) Arjen de Vries
 
Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Arjen de Vries
 
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Arjen de Vries
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineArjen de Vries
 
Information Retrieval intro TMM
Information Retrieval intro TMMInformation Retrieval intro TMM
Information Retrieval intro TMMArjen de Vries
 
ACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsArjen de Vries
 
Data Science Master Specialisation
Data Science Master SpecialisationData Science Master Specialisation
Data Science Master SpecialisationArjen de Vries
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big DataArjen de Vries
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part IIArjen de Vries
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with SparkArjen de Vries
 
TREC 2016: Looking Forward Panel
TREC 2016: Looking Forward PanelTREC 2016: Looking Forward Panel
TREC 2016: Looking Forward PanelArjen de Vries
 
The personal search engine
The personal search engineThe personal search engine
The personal search engineArjen de Vries
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationArjen de Vries
 
Better Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain KnowledgeBetter Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain KnowledgeArjen de Vries
 
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Arjen de Vries
 
Looking beyond plain text for document representation in the enterprise
Looking beyond plain text for document representation in the enterpriseLooking beyond plain text for document representation in the enterprise
Looking beyond plain text for document representation in the enterpriseArjen de Vries
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Arjen de Vries
 
Searching Political Data by Strategy
Searching Political Data by StrategySearching Political Data by Strategy
Searching Political Data by StrategyArjen de Vries
 
How to Search Annotated Text by Strategy?
How to Search Annotated Text by Strategy?How to Search Annotated Text by Strategy?
How to Search Annotated Text by Strategy?Arjen de Vries
 

More from Arjen de Vries (20)

Doing a PhD @ DOSSIER
Doing a PhD @ DOSSIERDoing a PhD @ DOSSIER
Doing a PhD @ DOSSIER
 
Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen) Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen)
 
Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6)
 
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search Engine
 
Information Retrieval intro TMM
Information Retrieval intro TMMInformation Retrieval intro TMM
Information Retrieval intro TMM
 
ACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC Chairs
 
Data Science Master Specialisation
Data Science Master SpecialisationData Science Master Specialisation
Data Science Master Specialisation
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big Data
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part II
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with Spark
 
TREC 2016: Looking Forward Panel
TREC 2016: Looking Forward PanelTREC 2016: Looking Forward Panel
TREC 2016: Looking Forward Panel
 
The personal search engine
The personal search engineThe personal search engine
The personal search engine
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
 
Better Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain KnowledgeBetter Contextual Suggestions by Applying Domain Knowledge
Better Contextual Suggestions by Applying Domain Knowledge
 
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
 
Looking beyond plain text for document representation in the enterprise
Looking beyond plain text for document representation in the enterpriseLooking beyond plain text for document representation in the enterprise
Looking beyond plain text for document representation in the enterprise
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?
 
Searching Political Data by Strategy
Searching Political Data by StrategySearching Political Data by Strategy
Searching Political Data by Strategy
 
How to Search Annotated Text by Strategy?
How to Search Annotated Text by Strategy?How to Search Annotated Text by Strategy?
How to Search Annotated Text by Strategy?
 

Recently uploaded

G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Information Retrieval and Social Media

  • 1. Information Retrieval and Social Media Prof.dr.ir. Arjen P. de Vries arjen@acm.org Lecture for the User-Centred Social Media Summer School Duisburg, September 19, 2017
  • 2. Social Media
      Noun: social media (uncountable)
      Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet.
      The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.
  • 3. Social Media
      - “Social bookmarking” sites
      - “User generated content”
          - Images (flickr) and videos (youtube, vimeo), but also blogs, Wikipedia, etc.
      - Social network services
          - Twitter, facebook, instagram, snapchat
  • 8. Not just one beast!
  • 11. Bag model Global Content Free for all tagging
  • 12. Social Media to help improve IR (1)
  • 13. ‘Co-creation’
      - Social Media:
          - Consumer becomes a co-creator
          - Many ‘data consumption’ traces in social media are public
  • 15. Richer information representations
      - User profiles
          - User name, full name, description, image, homepage url, etc.
      - Connections between users
          - Networks of friends, followers, etc.
      - Comments/reactions
      - Endorsing and sharing
  • 16. E.g., Twitter
      - Bio
          - Often includes a geo-location of the profile
      - Friends
      - Followers
      - Lists
          - Groups followed Twitter accounts; lists can be followed
      - Hashtags
      - Mentions
  • 17. User Demographics
      - Gender from Tweet author’s first name
      - Geographic location from profile
      Diaz, Gamon, Hofman, Kiciman, Rothschild. Online and Social Media as an Imperfect Continuous Panel Survey. PLOS ONE, 2016.
  • 18. Detailed User Characteristics…
      de Volkskrant, March 13, 2013
      Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS 2013.
      Youyou, W., Kosinski, M. & Stillwell, D. Computer-based personality judgments are more accurate than those made by humans. PNAS 2015.
  • 19. … in Search
      - Age and Gender, and perhaps also political and religious views
      - Maps both Page Likes from the myPersonality dataset and search results onto a common space of ODP categories
      - Learning approach to overcome the difference in distribution between myPersonality data and Search data
          - E.g., their FB dataset is 63% female, vs. only 47% in Bing
      Bi, Kosinski, Shokouhi, Graepel. Inferring the Demographics of Search Users. WWW 2013.
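To make the distribution gap concrete, here is a minimal sketch of importance reweighting by gender. This is a generic correction for a skewed training sample, not necessarily the learning approach used by Bi et al. The function name `importance_weights` is hypothetical, and the male shares are an assumption obtained by taking one minus the female share quoted on the slide.

```python
def importance_weights(source_share, target_share):
    """Weight per group so that the reweighted source matches the target.

    Each training example from group g gets weight target[g] / source[g].
    """
    return {g: target_share[g] / source_share[g] for g in source_share}


# Shares from the slide: 63% female in the FB data, 47% in Bing.
# The male shares are assumed to be the complement (a simplification).
fb_share = {"female": 0.63, "male": 0.37}
bing_share = {"female": 0.47, "male": 0.53}

weights = importance_weights(fb_share, bing_share)

# After reweighting, the effective share of each group in the FB data
# equals its share among Bing users.
reweighted = {g: fb_share[g] * weights[g] for g in fb_share}
```

With these numbers, examples from female users are down-weighted (weight below 1) and examples from male users are up-weighted (weight above 1).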
  • 20. Many Opportunities for IR
      - Expand content representation
      - Reduce the vocabulary gap(s) between creators of content (the indexers) and consumers of content (the users)
      - More diverse views on the same content
  • 23. Synonyms
      Dissimilar users … with similar items (Pearson Correlation)
      Note: this representation ignored the item ratings
  • 26. IR to help improve Social Media
  • 27. LibraryThing – beyond terms
      - Items
      - People
      - Tags
      - Ratings
  • 28. Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders. The task dependent effect of tags and ratings on social media access. TOIS 28, 4, article 21 (November 2010), 42 pages.
  • 29. Search with Random Walk
      - Present nodes according to the estimated probability that a random walk starting from (task-dependent) starting nodes would end at this node
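The ranking idea on this slide can be sketched on a toy graph as follows. This is an illustrative simplification, not the model from the TOIS paper: the graph is unweighted and undirected, the walk length is fixed, and all node names and the function `random_walk_scores` are hypothetical.

```python
from collections import defaultdict


def random_walk_scores(edges, start_nodes, steps=1):
    """Probability of being at each node after `steps` uniform random steps.

    The walk starts at one of `start_nodes`, chosen uniformly at random.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    starts = set(start_nodes)
    prob = {node: 1.0 / len(starts) for node in starts}

    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            out = neighbours[node]
            if not out:
                nxt[node] += p  # isolated node: the walk stays put
                continue
            share = p / len(out)
            for m in out:
                nxt[m] += share
        prob = dict(nxt)
    return prob


# Toy user-item and tag-item graph (hypothetical data).
edges = [
    ("user:ann", "item:java_programming_book"),
    ("user:ann", "item:python_cookbook"),
    ("tag:java", "item:java_programming_book"),
    ("tag:java", "item:java_travel_guide"),
]

# Task-dependent start: the query tag and the target user.
scores = random_walk_scores(edges, ["tag:java", "user:ann"], steps=1)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the walk starts from both the query tag and the user, the programming book, which is reachable from both, receives more probability mass than the travel guide, which is reachable from the tag only.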
  • 31. Note: this representation used the item ratings in the user – item transitions
  • 33. Personalized Search
      - Assume a user who types a single tag as query
  • 34. A soft clustering effect smoothly relates similar concepts before converging to the background probability
  • 35. Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user
      - So, content that matches the user’s preference is more likely to be found first
  • 36. Expert Finding on Twitter
      - Empirical evidence demonstrates that a mix of tweet text, friends, followers and lists is most effective for inferring expertise
      - Expertise ground truth taken from Quora, where (many) users list their expertise and their social media accounts
      Xu, Zhou and Lawless. Inferring your expertise from Twitter: combining multiple types of user activity. WI 2017.
  • 37. Multiple Social Networks
      - Accounts linked via services like about.me and Quora
      - Users explicitly list their multiple accounts in one profile
      - Missing data addressed via non-negative matrix factorization (NMF)
          - E.g., 57% list school in FB, 81% in LinkedIn
      - Applied to various prediction tasks, e.g., topics users are interested in
  • 38. Social Media to help improve IR (2)
  • 39. Relevant for Search… (1/4)
      - Wikipedia contains semantically very rich annotations:
          - Wikipedia Categories, Lists
          - Times (1930, 1931, 1932, etc.)
          - Disambiguation pages
          - Edit history
          - Etc.
      Note: DBpedia is “just” Wikipedia ☺
  • 40. Relevant for Search… (2/4)
      - “Twanchor text”
          - Tweets citing online media can be used as additional resources describing the content, just like anchor text
  • 41. Relevant for Search… (3/4)
      - Geotags / POIs
          - Recommend geo-locations to people
          - Recommend people to geo-locations
          - Predict a user’s whereabouts (or “trails”)
  • 42. Relevant for Search… (4/4)
      - Timestamps
          - Help reveal trends, e.g., which documents went viral?
          - Allow searching “in the past”
  • 43. Searching the Social Web
      - Do not improve Web search with social annotations, but improve search in Social
      - Builds on the observation in prior work (Goel et al., 2016) that virality is really different from popularity
          - The most viral content is often distinct from the most popular content being shared online
          - Can we surface that content more easily?
      Alonso, Kandylas, Tremblay, Hofman, Sen. What’s Happening and What Happened: Searching the Social Web. WebSci ’17.
  • 45. Pipeline
      - Content selection:
          - Select tweets that contain links and satisfy simple user, content and time range criteria
      - User selection:
          - Extract and normalize links and select those that have been shared by a minimum number of trusted users
      - Link selection:
          - Clean up links, compute link virality and popularity, cluster similar links, and apply heuristic criteria to select good quality links
      - Annotations:
          - Generate metadata for the selected links from the associated tweets
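A minimal sketch of the link-normalization and trusted-user steps, with the tweet texts collected as raw material for annotations. It rests on several assumptions: the tweet dictionaries, the `trusted_users` set, the `min_trusted` threshold and both function names are hypothetical, and the normalization (dropping query string and fragment) is deliberately crude. Virality, popularity and link clustering are not covered.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url):
    """Crude normalization: lower-case the host, drop query and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def select_links(tweets, trusted_users, min_trusted=2):
    """Keep links shared by at least `min_trusted` distinct trusted users.

    Returns a dict mapping each selected link to the texts of the
    trusted tweets that shared it.
    """
    sharers = defaultdict(set)
    texts = defaultdict(list)
    for tweet in tweets:
        if tweet["user"] not in trusted_users:
            continue
        for url in tweet["links"]:
            link = normalize_link(url)
            sharers[link].add(tweet["user"])
            texts[link].append(tweet["text"])
    return {
        link: texts[link]
        for link, users in sharers.items()
        if len(users) >= min_trusted
    }


# Hypothetical input data.
tweets = [
    {"user": "alice", "links": ["http://Example.org/story/?utm=1"], "text": "Great read"},
    {"user": "bob", "links": ["http://example.org/story"], "text": "Worth your time"},
    {"user": "carol", "links": ["http://example.org/other"], "text": "Look at this"},
    {"user": "alice", "links": ["http://example.org/other"], "text": "Hmm"},
]
selected = select_links(tweets, trusted_users={"alice", "bob"})
```

Here only the story link survives: after normalization both trusted users shared the same URL, while the other link was shared by a single trusted user.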
  • 48. API Blues
      Bit.ly API used in my own research: /v3/link/content deprecated
      Note: This endpoint was deprecated on 10/15/2014.
  • 49. API Blues
      - The combination of rate limits and Terms of Service of most social media platforms complicates our life
      - Not to mention volume
          - The TREC Microblog collection of 2013, “Tweets2013”, is 107 GB compressed (for only 2 months of data!)
      - Did I mention ToS?
          - Mandatory continual processing of deletions…
  • 50. Good News for Twitter
      - The Internet Archive distributes two collections from 2013 that can be used as drop-in replacements for evaluation purposes
      - Deletions seem to affect non-relevant documents more than relevant documents
      Sequiera and Lin. Finally, a Downloadable Test Collection of Tweets. SIGIR 2017.
  • 51. Social Media as Panel Survey
      - The online population is a non-representative sample of the off-line world
      - Demographic skew and user participation are non-stationary and difficult to predict over time
          - E.g., women are underrepresented in the raw volume of tweets, but tweet more often about politics than men
          - Half of the activity on a specific debate came from individuals who had not previously posted about the election
      Diaz, Gamon, Hofman, Kiciman, Rothschild. Online and Social Media as an Imperfect Continuous Panel Survey. PLOS ONE, 2016.
  • 52. API Blues
      Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM 2013.
  • 56. Take home message(s)
      - Social media give access to a rich resource of context
          - Including time & location!
      - The academic’s alternative to click data?
      - A big open research question: Can one theory (about matching users and content) address the complete spectrum of IR tasks that arise in social media?