Discovering Context: Classifying tweets through a semantic transform based on Wikipedia
Yegin Genc, Yasuaki Sakamoto, and Jeffrey V. Nickerson

  "So I'm told by a reputable person they have killed Osama Bin Laden. …"
  "I hate how my phone has this stupid … spell check …"

Twitter can function as a large sensor system and can increase our awareness of our surroundings.

Why classify?

  "So I'm told by a reputable person they have killed Osama Bin Laden. …"
      Topic: Terrorism (important)

  "I hate how my phone has this stupid … spell check …"
      Topic: (?) Irritating technology (NOT IMPORTANT)
How to classify?

  message1 → transform → T(message1)
  message2 → transform → T(message2)

  distance(T(m1), T(m2))

d(message1, message2) ∝ d(T(message1), T(message2))
A Two-Step Approach

STEP 1: FINDING WIKI PAGES
  Tweet 1 → Wiki Page 1 (WP1)
  Tweet 2 → Wiki Page 2 (WP2)
  Tweet 3 → Wiki Page 3 (WP3)
  ...
  Tweet n → Wiki Page n (WPn)

STEP 2: CALCULATING DISTANCE
  Pairwise distances between the Wiki pages WP1, WP2, WP3, ..., WPn:
  d12, d13, d1n, d32, d2n, d3n, ...
Step – 1: Finding Wiki Pages

Tweet 1 → Word-Set (WS) = {Word11, Word12, Word13, ..., Word1n}

  Word11 → Candidate Pages (word11)
  Word12 → Candidate Pages (word12)
  Word13 → Candidate Pages (word13)
  ...
  Word1n → Candidate Pages (word1n)

Wiki Page 1 = the candidate page (CP) with the max overlap between WS and CP content
Tweet:
  RT ashajayy Rest in peace JD Salinger Catcher in the Rye is one of my absolute favourite books Sad day

Words:
  Rest, peace, JD, Salinger, Catcher, Rye, absolute, favourite, books, Sad, day

Candidate Pages                          Hits
//en.wikipedia.org/wiki/J.D._Salinger    290
//en.wikipedia.org/wiki/J._D._Salinger   289
//en.wikipedia.org/wiki/books            145
//en.wikipedia.org/wiki/Doris_Day        138
//en.wikipedia.org/wiki/peace            131
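
The selection in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not say how "Hits" are counted or how candidate pages are fetched, so the sketch assumes a hit is one occurrence of any word-set word in the page text. The names `DROP`, `TOY_PAGES`, `word_set`, `count_hits`, and `best_page` are hypothetical, and the page texts are invented toy data rather than real Wikipedia content.

```python
import re

# Assumption: tokens to drop (retweet marker, user name, common stopwords).
DROP = {"rt", "ashajayy", "in", "the", "is", "one", "of", "my"}

# Assumption: toy stand-ins for candidate page content (not real Wikipedia text).
TOY_PAGES = {
    "J._D._Salinger": "J. D. Salinger wrote The Catcher in the Rye. "
                      "Salinger published few books.",
    "Peace": "Peace is a state of harmony without conflict.",
    "Doris_Day": "Doris Day was an American actress and singer.",
}


def word_set(tweet, drop):
    """Return the tweet's distinct words, in order, minus dropped tokens."""
    seen, words = set(), []
    for token in re.findall(r"[A-Za-z]+", tweet):
        key = token.lower()
        if key not in drop and key not in seen:
            seen.add(key)
            words.append(token)
    return words


def count_hits(words, page_text):
    """Count occurrences of any word-set word in the page text."""
    targets = {w.lower() for w in words}
    return sum(1 for t in re.findall(r"[a-z]+", page_text.lower()) if t in targets)


def best_page(words, candidates):
    """Pick the candidate page whose content overlaps most with the word set."""
    return max(candidates, key=lambda title: count_hits(words, candidates[title]))


tweet = ("RT ashajayy Rest in peace JD Salinger Catcher in the Rye "
         "is one of my absolute favourite books Sad day")
words = word_set(tweet, DROP)
print(words)
print(best_page(words, TOY_PAGES))
```

With the toy pages, the Salinger page wins because several word-set words occur in it, which mirrors the ranking by hits shown on the slide.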
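Step – 2: Calculating the Distance

[Figure: the pages around Wiki Page 1 and Wiki Page 2, drawn level by level (WP1 L1, WP1 L2, WP1 L3 and WP2 L1, WP2 L2, WP2 L3). One node is labeled both "WP1 L3" and "WP2 L2". Steps are numbered 1, 2, 3, and the result is d12 = 3.]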
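
One plausible reading of the diagram is that the distance between two Wiki pages is the number of link hops on the shortest path between them. Under that assumption, which the slides do not state explicitly, a breadth-first search gives the distance. The function name `link_distance`, the choice to treat links as undirected, and the toy graph are all hypothetical.

```python
from collections import deque


def link_distance(links, start, goal):
    """Shortest number of link hops between two pages, or None if unreachable.

    Assumption: links are treated as undirected.
    """
    if start == goal:
        return 0
    neighbours = {}
    for page, targets in links.items():
        for target in targets:
            neighbours.setdefault(page, set()).add(target)
            neighbours.setdefault(target, set()).add(page)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in neighbours.get(page, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy graph: WP1 -> A -> Shared <- WP2, so WP1 and WP2 are three hops apart.
toy_links = {"WP1": ["A"], "A": ["Shared"], "WP2": ["Shared"]}
print(link_distance(toy_links, "WP1", "WP2"))
```

In the toy graph the shared page is two hops from WP1 and one hop from WP2, matching the node labeled both "WP1 L3" and "WP2 L2" if level L1 is taken to be the page itself.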
Method

Tweets → Distance Matrix → MDS → (X, Y) coordinates → Discriminant Analysis → Accuracy Rate

Tweets:
  - T1 (Topic 1)
  - T2 (Topic 1)
  - T3 (Topic 2)
  ...

Distance Matrix:
        T1    T2    T3
  T1    0     d12   d13
  T2    d21   0     d23
  T3    d31   d32   0

MDS output:
        X     Y
  T1    t1x   t1y
  T2    t2x   t2y
  T3    t3x   t3y

The same pipeline is run for each technique:
  SED        → DSED   → Acc. SED
  LSA        → DLSA   → Acc. LSA
  Wikipedia  → DWIKI  → Acc. WIKI
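
The MDS stage of the pipeline, which turns a distance matrix into X and Y coordinates, can be sketched as below. The slides do not say which MDS variant was used, so this is an assumption: classical (Torgerson) MDS, implemented with the standard library only and power iteration for the eigenvectors. The function name `classical_mds` is hypothetical, the distance matrix is assumed symmetric, and the discriminant analysis step is not shown.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 0:
            break  # no further positive eigenvalue; remaining coordinates stay 0
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy distance matrix: a 3-4-5 right triangle.
toy = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
for point in classical_mds(toy):
    print(point)
```

For distances that are exactly Euclidean, as in the toy triangle, the embedding reproduces the pairwise distances. Distances from SED, LSA, or Wikipedia links need not be Euclidean, so a real embedding would only approximate them.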
Other Techniques

String Edit Distance (SED)
  Minimum number of edits needed to transform one string into the other.
  Example: Kitten → sitten (substitution of 's' for 'k'), SED = 1

Latent Semantic Analysis (LSA)
  Natural language processing technique for classification based on term occurrences in documents.
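
The string edit distance defined on this slide can be computed with the standard dynamic-programming recurrence (Levenshtein distance). The sketch below assumes the allowed edits are insertion, deletion, and substitution, each with cost 1; the slides do not list the edit operations, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of single-character edits turning string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitten"))
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.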
Data

Without Noise
  Category         Count
  J.D. Salinger    15
  iPad             15
  Haiti            15
  TOTAL            45

With Noise
  Category         Count
  J.D. Salinger    15
  iPad             15
  Haiti            15
  Random           55
  TOTAL            100

J.D. Salinger:
  RT @ashajayy Rest in peace, JD Salinger. Catcher in the Rye is one of my absolute favourite books. Sad day.
  @JMNelis I fear I may have killed him because I talked about how I hate "Catcher." (1/2)
  'Catcher In The Rye' Author J.D. Salinger Dies At 91 - The author of The Catcher in the Rye died of natural causes,... http//ow.ly/16rETF

iPad:
  iPad..not so appealing to me (Yet!) It's basically the MacBook&iPhone combined.I have both so don't think i'll be getting the iPad soon.
  Have u seen it?Apple iPad Tablet Steve Jobs Unveils Visionary Computer http//bit.ly/9IslTP
  The new Apple formula Hype

Haiti:
  What Yall think about me buying a whole bunch of sour patch kids and giving them to haiti i bet they would be HAPPY!
  Please ReTweet (http//caltweet.com/4gx ) - Lets ALL really AID Haiti
  RT @UNC_Health_Care Video Want to help the #Haitian patients at #UNC Hospitals? Here's how. http//bit…

Random:
  @Alitas_Way naw im kiddin but ma'am it really looks great on u
  Please come to our Legal Studies Open House on Tuesday February 2nd from 6-730pm.Please call for exact location and to RSVP …
  Most impressive stat for Warner is he holds the top 3 most passing yards in a superbowl. Three games three most passing yards in 40
Tweets without noise:

[Figure: three scatter plots of the tweets, one per technique (SED, LSA, Wiki). Axes: Coordinate 1 vs. Coordinate 2. Legend: J.D. Salinger, iPad, Haiti.]

Accuracy rate by technique and category:

  Technique                   J. D. Salinger   iPad   Haiti
  String Edit Distance        .67              .13    .60
  Latent Semantic Analysis    .67              .73    .80
  Wikipedia                   .93              .87    .80
Tweets with noise:

[Figure: three scatter plots of the tweets, one per technique (SED, LSA, Wiki). Axes: Coordinate 1 vs. Coordinate 2. Legend: J.D. Salinger, iPad, Haiti, Random.]

Accuracy rate by technique and category:

  Technique                   J. D. Salinger   iPad   Haiti
  Latent Semantic Analysis    .60              .60    .20
  Wikipedia                   .93              .87    .73
Conclusion

Wikipedia Space shows promising results in
     defining similarity of short text

– Socially constructed
– Large space
– Immune to noise
Future Work
• Adaptive classification
  – What we consider as noise may contain useful
    information depending on the context


• Improved mapping and distance calculations

• Utilizing other social aspects of Wikipedia
Thank you!

   Q&A

More Related Content

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Discovering Context

  • 1. Discovering Context: Classifying tweets through a semantic transform based on Wikipedia Yegin Genc, Yasuaki Sakamoto, and Jeffrey V. Nickerson
  • 2. "So I'm told by a reputable person they have killed Osama Bin Laden. …"
       "I hate how my phone has this stupid … spell check …"
       Twitter can function as a large sensor system and can increase our awareness of our surroundings
  • 3. Discovering Context: Classifying tweets through a semantic transform based on Wikipedia Why classify?
  • 4. "So I'm told by a reputable person they have killed Osama Bin Laden. …" → Terrorism: important
       "I hate how my phone has this stupid … spell check …" → (?) Irritating technology: NOT IMPORTANT
  • 5. How to classify?
       message → transform → T(m1); message → transform → T(m2); then distance(T(m1), T(m2))
       d(message1, message2) ∝ d(T(message1), T(message2))
  • 6. A Two-Step Approach
       STEP 1: FINDING WIKI PAGES. Tweet 1, Tweet 2, Tweet 3, …, Tweet n are mapped to Wiki Page 1 (WP1), Wiki Page 2 (WP2), Wiki Page 3 (WP3), …, Wiki Page n (WPn)
       STEP 2: CALCULATING DISTANCE. Pairwise distances d12, d13, d32, d1n, d2n, d3n between WP1, WP2, WP3, …, WPn
  • 7. Step – 1: Finding Wiki Pages
       Tweet 1 → Word-Set (WS) = Word11, Word12, Word13, …, Word1n
       Each word → Candidate Pages (word11), Candidate Pages (word12), Candidate Pages (word13), …, Candidate Pages (word1n)
       Wiki Page 1 = candidate page (CP) with max overlap btw. WS and CP content
  • 8. Tweet: RT ashajayy Rest in peace JD Salinger Catcher in the Rye is one of my absolute favourite books Sad day
       Words: Rest, peace, JD, Salinger, Catcher, Rye, absolute, favourite, books, Sad, day
       Candidate Pages                            Hits
       //en.wikipedia.org/wiki/J.D._Salinger      290
       //en.wikipedia.org/wiki/J._D._Salinger     289
       //en.wikipedia.org/wiki/books              145
       //en.wikipedia.org/wiki/Doris_Day          138
       //en.wikipedia.org/wiki/peace              131
  • 9. Step – 2: Calculating the Distance
       [Diagram: Wiki Page 1 and Wiki Page 2, each with nodes labeled L1, L2, and L3 (WP1 L1 to WP1 L3, WP2 L1 to WP2 L3); steps numbered 1, 2, 3; d12 = 3]
  • 10. Method
        [Diagram: Tweets (T1 and T2 in Topic 1, T3 in Topic 2) → Distance Matrix → MDS → X/Y coordinates → Discriminant Analysis → Accuracy Rate]
        Distance Matrix:
              T1    T2    T3
        T1    0     d12   d13
        T2    d21   0     d23
        T3    d31   d32   0
        MDS coordinates (X, Y): T1 (t1x, t1y), T2 (t2x, t2y), T3 (t3x, t3y)
        Per technique: SED: DSED → Acc. SED; LSA: DLSA → Acc. LSA; Wikipedia: DWIKI → Acc. WIKI
  • 11. Other Techniques
        String Edit Distance (SED): Minimum number of edits needed to transform one string into the other. Kitten → sitten (subst. of 's' for 'k'), SED = 1
        Latent Semantic Analysis (LSA): Natural language processing technique for classification based on term occurrences in documents
  • 12. Data
        Without Noise                With Noise
        Category        Count        Category        Count
        J.D. Salinger   15           J.D. Salinger   15
        iPad            15           iPad            15
        Haiti           15           Haiti           15
        TOTAL           45           Random          55
                                     TOTAL           100
        Sample tweets:
        – RT @ashajayy Rest in peace, JD Salinger. Catcher in the Rye is one of my absolute favourite books. Sad day.
        – @JMNelis I fear I may have killed him because I talked about how I hate "Catcher." (1/2)
        – 'Catcher In The Rye' Author J.D. Salinger Dies At 91 - The author of The Catcher in the Rye died of natural causes,... http//ow.ly/16rETF
        – iPad..not so appealing to me (Yet!) It's basically the MacBook&iPhone combined.I have both so don't think i'll be getting the iPad soon.
        – Have u seen it?Apple iPad Tablet Steve Jobs Unveils Visionary Computer http//bit.ly/9IslTP
        – The new Apple formula Hype
        – What Yall think about me buying a whole bunch of sour patch kids and giving them to haiti i bet they would be HAPPY!
        – Please ReTweet (http//caltweet.com/4gx ) - Lets ALL really AID Haiti
        – RT @UNC_Health_Care Video Want to help the #Haitian patients at #UNC Hospitals? Here's how. http//bit…
        – @Alitas_Way naw im kiddin but ma'am it really looks great on u
        – Please come to our Legal Studies Open House on Tuesday February 2nd from 6-730pm.Please call for exact location and to RSVP …
        – Most impressive stat for Warner is he holds the top 3 most passing yards in a superbowl. Three games three most passing yards in 40
  • 13. Tweets without noise
        [Figure: three scatter plots (SED, LSA, Wiki), Coordinate 1 vs. Coordinate 2; legend: J.D. Salinger, iPad, Haiti]
        Technique                   J. D. Salinger   iPad   Haiti
        String Edit Distance        .67              .13    .60
        Latent Semantic Analysis    .67              .73    .80
        Wikipedia                   .93              .87    .80
  • 14. Tweets with noise
        [Figure: three scatter plots (SED, LSA, Wiki), Coordinate 1 vs. Coordinate 2; legend: J.D. Salinger, iPad, Haiti, Random]
        Technique                   J. D. Salinger   iPad   Haiti
        Latent Semantic Analysis    .60              .60    .20
        Wikipedia                   .93              .87    .73
  • 15. Conclusion
        Wikipedia Space shows promising results in defining similarity of short text
        – Socially constructed
        – Large space
        – Immune to noise
  • 16. Future Work
        – Adaptive classification: what we consider as noise may contain useful information depending on the context
        – Improved mapping and distance calculations
        – Utilizing other social aspects of Wikipedia
  • 17. Thank you! Q&A
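
Slide 11 defines String Edit Distance as the minimum number of edits needed to transform one string into the other. A minimal Python sketch of that baseline follows, assuming the standard Levenshtein formulation (insertions, deletions, and substitutions each cost 1); the slides do not say which edit operations the authors allowed, and the function name string_edit_distance is only illustrative.

```python
def string_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed formulation, unit costs)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                      # delete ch_a
                current[j - 1] + 1,                   # insert ch_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# The slide's own example: one substitution of 's' for 'k'.
print(string_edit_distance("kitten", "sitten"))
```

Applied to every pair of tweets, a function like this would fill the SED distance matrix that slide 10 feeds into MDS.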

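Slides 7 and 8 describe choosing, for each tweet, the candidate page whose content overlaps most with the tweet's word-set. A minimal Python sketch of that selection follows. It assumes the "Hits" score is the total number of occurrences of the tweet's words in a page's text, which the slides do not define. The page texts, the stopword list, and the names word_set, overlap_hits, and best_page are hypothetical, and no real Wikipedia lookup is performed.

```python
import re
from collections import Counter


def word_set(tweet: str, stopwords: frozenset) -> set:
    """Lowercased alphabetic tokens of the tweet, minus stopwords."""
    tokens = re.findall(r"[A-Za-z]+", tweet)
    return {t.lower() for t in tokens if t.lower() not in stopwords}


def overlap_hits(words: set, page_text: str) -> int:
    """Assumed score: how often the tweet's words occur in the page text."""
    counts = Counter(re.findall(r"[a-z]+", page_text.lower()))
    return sum(counts[w] for w in words)


def best_page(tweet: str, candidate_pages: dict, stopwords: frozenset = frozenset()) -> str:
    """Title of the candidate page with the largest overlap score."""
    words = word_set(tweet, stopwords)
    return max(candidate_pages, key=lambda title: overlap_hits(words, candidate_pages[title]))


# Hypothetical toy data standing in for real Wikipedia page content.
tweet = "Rest in peace JD Salinger Catcher in the Rye is one of my absolute favourite books"
stopwords = frozenset({"in", "the", "is", "one", "of", "my", "rt"})
pages = {
    "J._D._Salinger": "Salinger wrote The Catcher in the Rye. Salinger published books and short stories.",
    "Peace": "Peace is a concept of societal friendship and harmony.",
    "Books": "Books are a medium for recording information.",
}

print(best_page(tweet, pages, stopwords))
```

In the deck, the candidate pages come from looking up each word of the tweet separately; here they are supplied directly as a dictionary to keep the sketch self-contained.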
Editor's Notes

  1. We study how we can categorize messages streaming through Twitter. These messages, called tweets, come in at a rate of more than 600 a second [2], and are often cryptic. By recognizing new and useful topics in this noisy environment, we may provide automated tools with pragmatic uses: Twitter functions as a large sensor system and can increase our awareness of our surroundings. Humans are experts in recognizing new and useful messages while ignoring others. They do this by extracting meaning from messages, categorizing messages with related meaning into the same topics, and noticing information that does not fit any existing categories. Attempts to automate this fundamental ability of cognition using semantic models still leave room for improvement (e.g. [1]).
  2. Accuracy (hit plus correct rejection) of classifying 45 tweets with known categories when 55 randomly sampled tweets are added.
  3. Forty-five tweets with known categories mapped onto two-dimensional planes using multidimensional scaling of the between-tweet distances based on String Edit Distance, LSA and Wikipedia. An x is a tweet about J. D. Salinger and a triangle is a tweet about the iPad.
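
Editor's note 2 describes accuracy as hit plus correct rejection. A minimal Python sketch of that measure for a single category follows, assuming it is normalized by the total number of tweets classified (the notes do not state the denominator). The labels below are made-up examples, not the authors' data, and the function name accuracy is illustrative.

```python
def accuracy(true_labels: list, predicted_labels: list, category: str) -> float:
    """(hits + correct rejections) / total, for one category (assumed normalization)."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(true_labels, predicted_labels))
    # Hit: a tweet from the category that was assigned to the category.
    hits = sum(1 for t, p in pairs if t == category and p == category)
    # Correct rejection: a tweet outside the category that was kept out of it.
    correct_rejections = sum(1 for t, p in pairs if t != category and p != category)
    return (hits + correct_rejections) / len(pairs)


# Hypothetical labels for five tweets, two of them random noise.
true_labels = ["Salinger", "Salinger", "iPad", "Random", "Random"]
predicted = ["Salinger", "iPad", "iPad", "Random", "Salinger"]

print(accuracy(true_labels, predicted, "Salinger"))
```

Counting correct rejections matters once the 55 random tweets are added, because a classifier can then fail either by missing a category tweet or by pulling a noise tweet into the category.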