Slides presented at ICWE 2011: Learning Semantic Relationships between Entities in Twitter
Supporting web site: http://wis.ewi.tudelft.nl/icwe2011/relation-learning/
Powerful Google developer tools for immediate impact! (2023-24 C)
Learning Semantic Relationships between Entities in Twitter
1. Learning Semantic Relationships between Entities in Twitter ICWE, Cyprus, June 22, 2011 IlknurCelik, Fabian Abel, Geert-Jan Houben Web Information Systems, TU Delft
2. What we do: Science and Engineering for the Personal Web domains: news social mediacultural heritage public datae-learning Personalized Recommendations Personalized Search Adaptive Systems Analysis and User Modeling Semantic Enrichment, Linkage and Alignment user/usage data Social Web
11. Page 60!! Music Artist Next Saturday @thatsimpsonguyaka Guilty Simpson will be performing at Area51 in my hometwonEindhoven. #realliveshit #iwillspinrecords about 9 hours ago via Blackberry tweet I was looking for Locations
12. Is there an easier way?Faceted Search can help Current Query: Expand Query: Results: Yskiddd: Next saturday@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit#iwillspinrecords2 Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents Eindhoven Music Locations more... Events more... Music Artists: + Guilty Simpson + Bryan Adams + Elton John + Golden Earring + Rihanna + The eagles + 3 Doors Down more...
13. Location: Eindhoven Music Artist: Guilty Simpson Location: Area51 Semantic relationships between entities are essential to realize such applications.
17. Relation Learning Strategies entities time period Relation: relation(e1, e2, type, tstart, tend, weight) RelationLearningstrategy: Input:entity e1 and e2, time period (tstart, tend) Challenge:inferweightand type of the relationfor the given Weightingaccording to co-occurrence frequency: Tweet-based: count co-occurrence in tweets News-based: count co-occurrence in news Tweet-News-based: count co-occurrence in both tweets and news type/label of relation relatedness
18. Research Questions Which strategy performs best in detecting relationships between entities? Does the accuracy depend on the type of entities which are involved in a relation? How do the strategies perform for discovering relationships which have temporal constraints (trending relationships)?
19. Dataset more than: 20,000 Twitter users 2 months 10,000,000 WikiLeaks founder, Julian Assange, under arrest in London tweets 75,000 news time Dec 15 Jan 15 Nov 15
21. Tweets and news articles per day 50,000-400,000 tweets per day 100-1000 news articles per day
22. Entities referenced per day 10,000-100,000 entity ref. in tweets per day 5,000-20,000 entity ref. in news per day ~40% tweets do not mention any (recognizable) entity 72.6% of the top 1000 mentioned entities in Twitter are also mentioned in the mainstream news media 99.3% of the news articles mention at least one (recognizable) entity
25. Our Ground Truth of true relations Based on DBpedia: We mapped entities to their corresponding DBpedia resources No appropriate DBpedia URIs for more than 35% of the entities We analyzed whether there is a direct relation between two entities Based on user study: Participants judged whether two entities are really: related (62.6% were rated as related) related in the given time period (57.3% were rated as related) Overall: 2588 judgments Thank you!
26. 1. Which strategy performs best in detecting relationships between entities?
27. Accuracy of relation discovery Combining both tweet-based and news-based strategies allows for highest accuracy Based on user study Based on DBpedia
29. 2. Does the accuracy depend on the type of entities which are involved in a relation?
30. Does the accuracy depend on the type of entities? 87% precision 92% Relationships which involve events can be discovered with high precision 26% precision 23%
31. Does the accuracy depend on the type of entities? (cont.) Relationships between events can be detected with highest precision. Relationships between persons/groups are difficult to detect.
32. 3. How do the strategies perform for discovering relationships which have temporal constraints?
33. Relationships with temporal constraints Tweet-based strategy performs better in discovering relationships that are valid only for a specific period in time
34. Where do relationships emerge faster? Speed of strategies is domain-dependent time difference (in days) of first occurrence of relationship News is faster Twitter is faster
35. Conclusions and Future Work What we did: relation discovery framework based on Twitter Findings: Strategy that considers both tweets and (linked) news articles allows for highest accuracy Performance varies for different domains (e.g. event-relationships can be detected with highest precision) Tweet-based strategy allows for detecting relationships, which have a restricted temporal validity, with high precision (and fast) Ongoing work: Adaptive Faceted Search on Twitter http://wis.ewi.tudelft.nl/tweetum/
36. Relation Discovery for Adaptive Faceted Search Current Query: 2. Analyze (temporal)relationships of entities that appear in the user profile to adapt facet ranking. Expand Query: Results: Yskiddd: Next saturday@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit#iwillspinrecords2 Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents Eindhoven Music Locations more... Events more... Music Artists: + Guilty Simpson + Bryan Adams + Elton John + Golden Earring + Rihanna + The eagles + 3 Doors Down more... 1. Analyze (temporal)relationships of entities of the “current query” to adapt facet ranking. user
38. The Social Web Help me to tackle the information overload! Who is this? What are his personal demands? How can we make him happy? Recommend me news articles that now interest me! Help me to find interesting (social) media! Give me personalized support when I do my online training! Personalize my Web experience! Do not bother me with advertisements that are not interesting for me!
entity extraction and semantic enrichment and relation discovery.
large dataset of more than 10 million tweets and 70,000 news articles
100-1000 news articles per day50,000 and 400,000 tweets per dayTwo of the minima were caused by temporary unavailability of the Twitter monitoring service.
approximately 10,000-100,000 entity references per day for tweetsapproximately 5,000-20,000 entity references per day for News~40% of the tweets had no entities99.3% of the news articles had at least one entityoverlap of entities: 72.6% of the top 1000 mentioned entities in Twitter are also mentioned in the news media.
39 different entity typesPersons, locations and organizations were mentioned most often, followed by movies, music albums, sport events and political eventswe analyzed specific types of relations such as relationships between persons and locations or organizations and events in detail
Person/Group-Event relationships cover relations between persons and political events, persons and sport events, organizations and sport events, etcinteresting to see that the Tweet+News-based strategy discovers relationships, between persons/groups and events with higher precision 0.92 and 0.87 regarding P@10 and P@20 than people's relations to products (0.23 and 0.26) or locations (0.73 and 0.6).
relations between entities that are of the same typerelationships between two events can also be discovered with high precision, followed by relations between locations...
Twitter is more appropriate for inferring relationships, which have temporal constraints, than the news media.Tweet-based strategy improves precision (P@5) by 22.7%
relationships between persons and movies or music albums emerge much faster (14.7 and 5.1 days respectively) in Twitter than in the traditional news media.
Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We:investigated the precision and recall of the relation detection strategies,analyzed how the strategies perform for each type of relationships andWhich strategy performs best in detecting relationships between entities?Does the accuracy depend on the type of entities which are involved in a relation?How do the strategies perform for discovering relationships which have temporal constraints, and how fast can the strategies detect (trending) relationships?evaluated the quality and speed for discovering trending relationships that possibly have a limited temporal validity.