SEASR and UIMA
             Mike Haberman
          mikeh@ncsa.uiuc.edu

National Center for Supercomputing Applications
   University of Illinois at Urbana-Champaign
UIMA
Unstructured Information Management Architecture
UIMA to SEASR 




                  SEASR
UIMA + P.O.S. tagging

Four analysis engines process a document and
 record part-of-speech (POS) information:

•  OpenNLP Tokenizer
•  OpenNLP PosTagger
•  OpenNLP SentenceDetector
•  POSWriter

            Serialization of the UIMA CAS
UIMA Structured data
•  POSWriter is a CAS Consumer
  –  Extracted data from the CAS
  –  Ready for import into SEASR
UIMA + P.O.S. tagging: step 1
UIMA + P.O.S. tagging: step 2
UIMA + P.O.S. tagging: step 3
UIMA + P.O.S. tagging: step 4
UIMA Structured data
•  Two SEASR examples using UIMA POS data
  –  Frequent patterns (association rules) on nouns
     (fpgrowth)
  –  Sentiment analysis on adjectives
UIMA to SEASR: Experiment I
•  Finding patterns
SEASR + UIMA: Frequent Patterns
Frequent Pattern Analysis on nouns
•  Goal:
   –  Discover a cast of characters within the text
   –  Discover nouns that frequently occur together
      •  character relationships
Frequent Patterns: nouns
•  Use of item sets in fpgrowth
•  What’s new:
   –  handling sparse item sets


    Transaction Id   Item A   Item B   Item C   •••
    1                0        1        1
    2                1        1        1
    3                1        0        1
    4                1        0        0


Frequent Patterns: nouns
•  What’s new:
  –  handling sparse item sets

   Transaction
   {A,B,C}
   {X,Y}
   {F,E,A,C,E}
   {A,Z,X,U,I,O}
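
The difference between the dense table and the sparse item sets can be sketched in a few lines of Python. This is only an illustration, not the SEASR component itself; the function name `to_item_sets` is hypothetical, and the data is the four-row table from the previous slide.

```python
# Dense table from the slide: rows are transactions, columns are items A, B, C.
items = ["A", "B", "C"]
dense = {
    1: [0, 1, 1],
    2: [1, 1, 1],
    3: [1, 0, 1],
    4: [1, 0, 0],
}

def to_item_sets(dense_rows, item_names):
    """Convert rows of 0/1 flags into sets holding only the items present."""
    return {
        tid: {name for name, flag in zip(item_names, flags) if flag}
        for tid, flags in dense_rows.items()
    }

sparse = to_item_sets(dense, items)
for tid in sorted(sparse):
    print(tid, sorted(sparse[tid]))
```

With many distinct nouns and few nouns per window, the sparse form stores only what occurs instead of one column per possible item.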

Frequent Patterns: nouns
SEASR Flow
•  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl
   (similar to fpgrowth demo)
•  Reads UIMA’s CAS consumer output
   –  url of the UIMA data source
   –  http://repository.seasr.org/Datasets/POS/
      tomSawyer.NN.is, tomSawyer.NNP.is
      uncleTom.NN.is, uncleTom.NNP.is
•  Enter number of sentences to group
•  Enter support: 10%
•  Sample item sets:
   {word=tom}
   {word=answer}
   {word=tom}
   {word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,wor
   {word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}
   {word=aunt,word=polly,word=moment,word=laugh}
   {word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=
Frequent Patterns: visualization
             Analysis of Tom Sawyer
                 10 paragraph window
                 Support set to 10%
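
What a support threshold means for windows of nouns can be sketched as follows. This is not FP-growth and not the SEASR flow; it is a brute-force count over single items and pairs, which should give the same frequent sets on tiny inputs. The function name, the sample windows, and the 40% threshold are all assumptions chosen so the toy data produces a result.

```python
from collections import Counter
from itertools import combinations

def frequent_item_sets(transactions, min_support, max_size=2):
    """Return {item_set: support} for item sets of up to max_size items whose
    support (fraction of transactions containing them) is >= min_support."""
    counts = Counter()
    for transaction in transactions:
        unique = sorted(set(transaction))  # duplicates in a window count once
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    total = len(transactions)
    return {
        combo: count / total
        for combo, count in counts.items()
        if count / total >= min_support
    }

# Hypothetical windows of nouns, loosely modeled on the sample item sets.
windows = [
    ["tom"],
    ["answer"],
    ["aunt", "polly", "moment", "laugh"],
    ["tom", "aunt", "polly"],
    ["boy", "tricks", "fools", "fools"],
]
result = frequent_item_sets(windows, min_support=0.4)
for combo in sorted(result):
    print(combo, result[combo])
```

Here the pair ("aunt", "polly") survives the threshold because it appears in two of the five windows, which is the kind of co-occurrence the character-relationship goal is after.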
Frequent Patterns: nouns
•  Recap: SEASR flow information
•  The repository location is: 
   –  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl

•  Reads UIMA’s CAS consumer output
   –  Select file/url of the UIMA data source
   –  http://repository.seasr.org/Datasets/POS
      tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is


•  Similar to fpgrowth demo
UIMA + SEASR: Frequent Patterns
•  Extensions
   –  Analysis for separate chapters
      •  Discover new relationships that occur over small windows

   –  Adjectives, Adverbs
      •  Common, repeating word usage, phrases

   –  Entity Extraction: Dates, Locations, Geo
UIMA to SEASR: Experiment II
•  Sentiment Analysis
UIMA + SEASR: Sentiment Analysis
•  Classifying text based on its sentiment
   –  Determining the attitude of a speaker or a writer
   –  Determining whether a review is positive/negative
UIMA + SEASR: Sentiment Analysis
•  Ask: What emotion is being conveyed within a
   body of text?
  –  Look at only adjectives (UIMA POS)
     •  lots of issues, challenges, and “but …” caveats
UIMA + SEASR: Sentiment Analysis
•  Need to Answer:
  –  What emotions to track?
  –  How to map an adjective to one of the
     selected emotions?
  –  How to visualize the results?
UIMA + SEASR: Sentiment Analysis
•  Which emotions:
   –  http://en.wikipedia.org/wiki/List_of_emotions
   –  http://changingminds.org/explanations/emotions/basic%20emotions.htm
   –  http://www.emotionalcompetency.com/recognizing.htm

•  Parrott’s classification (2001)
   –  six core emotions
   –  Love, Joy, Surprise, Anger, Sadness, Fear
UIMA + SEASR: Sentiment Analysis
UIMA + SEASR: Sentiment Analysis
•  How to classify adjectives:
   –  Lots of metrics we could use …
      •  Lists of adjectives already classified
          –  http://www.derose.net/steve/resources/emotionwords/ewords.html

          –  Need a “nearness” metric for missing adjectives

   –  How about the thesaurus game ?
UIMA + SEASR: Sentiment Analysis

              •  Using only a thesaurus, find
                 a path between two words
                 –  no antonyms
                 –  no colloquialisms or slang
UIMA + SEASR: Sentiment Analysis

         •  How to get from delightful to rainy?
           ['delightful', 'fair', 'balmy', 'moist', 'rainy']

         •  sexy to joyless?
           ['sexy', 'provocative', 'blue', 'joyless']

         •  bitter to lovable?
           ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
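
The thesaurus game amounts to a path search in a graph of synonyms. A minimal sketch using breadth-first search is shown here; the tiny `thesaurus` dictionary is hypothetical, assembled only from two of the example paths above, and `find_path` is an illustrative name rather than anything from SEASR.

```python
from collections import deque

# Hypothetical thesaurus fragment: each word maps to the synonyms
# listed in its own entry (links are one-directional here).
thesaurus = {
    "delightful": ["fair"],
    "fair": ["balmy"],
    "balmy": ["moist"],
    "moist": ["rainy"],
    "bitter": ["acerbic"],
    "acerbic": ["tangy"],
    "tangy": ["sweet"],
    "sweet": ["lovable"],
}

def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of words or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        for synonym in graph.get(word, []):
            if synonym not in seen:
                seen.add(synonym)
                queue.append(path + [synonym])
    return None

print(find_path(thesaurus, "delightful", "rainy"))
print(find_path(thesaurus, "bitter", "lovable"))
```

Breadth-first search is used because it finds a path with the fewest links first, which matches the idea of treating path length as distance.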
UIMA + SEASR: Sentiment Analysis

         •  Use this game as a metric for
            measuring how close a given adjective
            is to each of the six emotions.
           •  Assume the longer the path, the “farther
              away” the two words are.
              •  addresses some of the issues
UIMA + SEASR: Sentiment Analysis

         •  SynNet: a traversable graph of
            synonyms (adjectives)
SynNet: rainy to pleasant
UIMA + SEASR: Sentiment Analysis

         •  SynNet Metrics
           •  Common nodes
           •  Path length
           •  Symmetry: a->b->c vs. c->b->a
           •  Link strength: 
              •  tangy->sweet

              •  sweet->lovable
              •  Use of slang or informal usage
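
One possible reading of the symmetry metric is that a path counts as symmetric when it can also be walked in reverse, since a thesaurus entry for one word does not always list the word that pointed to it. The sketch below assumes that reading; the graph, the function names, and the choice of which links are one-directional are all hypothetical.

```python
def is_valid_path(graph, path):
    """True if every consecutive pair in path is a link in the directed graph."""
    return all(b in graph.get(a, ()) for a, b in zip(path, path[1:]))

def is_symmetric(graph, path):
    """True if the path is valid both forwards and backwards."""
    return is_valid_path(graph, path) and is_valid_path(graph, path[::-1])

# Hypothetical directed synonym links: a<->b<->c go both ways,
# tangy->sweet->lovable only go forwards.
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "tangy": {"sweet"},
    "sweet": {"lovable"},
}

print(is_symmetric(graph, ["a", "b", "c"]))
print(is_symmetric(graph, ["tangy", "sweet", "lovable"]))
```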
UIMA + SEASR: Sentiment Analysis

                •  Common Nodes
                  •  depth of common
UIMA + SEASR: Sentiment Analysis
•  Symmetry of path in common nodes
UIMA + SEASR: Sentiment Analysis

         •  Find the shortest path between
            adjective and each emotion:
            •  ['delightful', 'beatific', 'joyful']
            •  ['delightful', 'ineffable', 'unspeakable',
               'fearful']

         •  Pick the emotion with shortest path
            length
            •  tie breaking procedures
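
The labeling step can be sketched as a shortest-path comparison. This is a sketch under stated assumptions: the graph contains only the two example paths above, the target word chosen for each emotion is a guess, only two of the six emotions are included, and the alphabetical tie-break is a placeholder because the slides do not describe the actual tie-breaking procedures.

```python
from collections import deque

# Hypothetical synonym graph built only from the two example paths above.
graph = {
    "delightful": ["beatific", "ineffable"],
    "beatific": ["joyful"],
    "ineffable": ["unspeakable"],
    "unspeakable": ["fearful"],
}

# Assumed target adjective for each emotion (only two of the six shown).
emotion_words = {"joy": "joyful", "fear": "fearful"}

def path_length(graph, start, goal):
    """Number of links on a shortest path from start to goal, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def label_adjective(graph, adjective, emotion_words):
    """Pick the emotion whose target word is nearest; None if unreachable."""
    scored = []
    for emotion, target in emotion_words.items():
        dist = path_length(graph, adjective, target)
        if dist is not None:
            scored.append((dist, emotion))
    # Placeholder tie-break: alphabetical emotion name (an assumption).
    return min(scored)[1] if scored else None

print(label_adjective(graph, "delightful", emotion_words))
```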
UIMA + SEASR: Sentiment Analysis

•  Not a perfect solution
   –  still need context to get quality results
      •  Vain
          –  ['vain', 'insignificant', 'contemptible', 'hateful']
          –  ['vain', 'misleading', 'puzzling', 'surprising']
      •  Animal
          –  ['animal', 'sensual', 'pleasing', 'joyful']
          –  ['animal', 'bestial', 'vile', 'hateful']
          –  ['animal', 'gross', 'shocking', 'fearful']
          –  ['animal', 'gross', 'grievous', 'sorrowful']
      •  Negation
          –  “My mother was not a hateful person.”
UIMA + SEASR: Sentiment Analysis

•  A word about WordNet
  •  http://wordnetweb.princeton.edu/
  •  English nouns, verbs, adjectives and adverbs
     organized into sets of synonyms (synsets)
UIMA + SEASR: Sentiment Analysis

•  Adjective islands
  •  There is no path from delightful to happy

  •  happy: {beaming, beamy, effulgent, felicitous, glad, happy,
     radiant, refulgent, well-chosen}
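
Adjective islands correspond to connected components of the synonym graph: two words in different components can never be joined by a path. The sketch below finds components with a depth-first traversal; the graph is a hypothetical fragment, using a few members of the "happy" set above and one of the "delightful" paths from an earlier slide.

```python
def islands(graph):
    """Group words into connected components, treating links as undirected."""
    neighbors = {}
    for word, synonyms in graph.items():
        neighbors.setdefault(word, set()).update(synonyms)
        for syn in synonyms:
            neighbors.setdefault(syn, set()).add(word)
    seen, components = set(), []
    for word in neighbors:
        if word in seen:
            continue
        stack, component = [word], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical fragment of the synonym graph.
graph = {
    "happy": ["glad", "felicitous", "radiant"],
    "delightful": ["beatific"],
    "beatific": ["joyful"],
}
components = islands(graph)
for component in components:
    print(sorted(component))
```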
UIMA + SEASR: Sentiment Analysis

•  Process Overview
  •  Extract the adjectives (UIMA POS analysis)
  •  Read in adjectives (SEASR library)
  •  Label each adjective (SynNet)
  •  Summarize windows of adjectives
     •  lots of experimentation here

  •  Visualize the windows
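
The window-summary step might look like the following sketch, which counts emotion labels in consecutive fixed-size windows. The window size, the label sequence, and the function name are assumptions; the slides note that this step is still subject to experimentation.

```python
from collections import Counter

def summarize_windows(labels, window_size):
    """Split a sequence of emotion labels into consecutive windows and count
    how often each emotion appears in each window."""
    return [
        Counter(labels[start:start + window_size])
        for start in range(0, len(labels), window_size)
    ]

# Hypothetical labels, one per adjective, in document order.
labels = ["joy", "joy", "fear", "anger", "joy", "sadness", "fear"]
summary = summarize_windows(labels, window_size=3)
for index, counts in enumerate(summary):
    print(index, dict(counts))
```

The per-window counts form a sequence that a visualization component could plot over the course of the text.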
UIMA + SEASR: Sentiment Analysis

•  Visualization
   •  New SEASR visualization component
      •  Based on flare ActionScript Library
          •  http://flare.prefuse.org/

      •  Still in development

      •  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
UIMA + SEASR: Sentiment Analysis
UIMA + SEASR: Sentiment Analysis

•  Extensions
   •  Adverbs, nouns, verbs
   •  Analysis of metrics, etc.

•  Goal and Relevancy
   •  Two new components
      •  SynNet
      •  Flash-based visualization of sequential data
