This document discusses using UIMA (Unstructured Information Management Architecture) to preprocess text and extract part-of-speech tags, then importing that structured data into SEASR (Software Environment for the Advancement of Scholarly Research) for further analysis. It gives two examples: (1) frequent pattern mining on nouns to discover relationships between characters, and (2) sentiment analysis on adjectives to classify emotions in text. It outlines the overall workflow and its challenges, such as building a thesaurus-based method for assigning adjectives to emotions.
4. UIMA + P.O.S. tagging
Four Analysis Engines analyze a document and record POS information:
• OpenNLP SentenceDetector
• OpenNLP Tokenizer
• OpenNLP PosTagger
• POSWriter: serialization of the UIMA CAS
5. UIMA Structured data
• POSWriter is a CAS Consumer
– Extracted data from the CAS
– Ready for import into SEASR
10. UIMA Structured data
• Two SEASR examples using UIMA POS data
– Frequent patterns (rule associations) on nouns
(fpgrowth)
– Sentiment analysis on adjectives
12. SEASR + UIMA: Frequent Patterns
Frequent Pattern Analysis on nouns
• Goal:
– Discover a cast of characters within the text
– Discover nouns that frequently occur together
• character relationships
13. Frequent Patterns: nouns
• Use of item sets in fpgrowth
• What’s new:
– handling sparse item sets
Transaction Id   Item A   Item B   Item C   ...
1                0        1        1
2                1        1        1
3                1        0        1
4                1        0        0
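To make the item-set idea concrete, here is a minimal sketch that counts support for every item set in the transaction table above. It uses brute-force enumeration, not the FP-growth algorithm itself, so it only suits tiny examples; the function name `frequent_item_sets` and the 50% support threshold are assumptions chosen for illustration.

```python
from itertools import combinations

# Transactions from the table above: each set holds the items marked 1.
transactions = [
    {"B", "C"},        # transaction 1
    {"A", "B", "C"},   # transaction 2
    {"A", "C"},        # transaction 3
    {"A"},             # transaction 4
]

def frequent_item_sets(transactions, min_support):
    """Return {item set: support} for item sets with support >= min_support.

    Brute force over all candidate item sets (hypothetical helper,
    not FP-growth and not part of SEASR).
    """
    items = sorted(set().union(*transactions))
    total = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = fraction of transactions containing every item.
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / total
            if support >= min_support:
                result[candidate] = support
    return result

frequent = frequent_item_sets(transactions, min_support=0.5)
for item_set, support in frequent.items():
    print(item_set, support)
```

Storing each transaction as a set of the items present, rather than as a full 0/1 row, is one simple way to represent sparse item sets.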
15. Frequent Patterns: nouns
Reads UIMA’s CAS consumer output
SEASR Flow (similar to fpgrowth demo)
• http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl
• UIMA data source: URL of the POS data
– http://repository.seasr.org/Datasets/POS/
– tomSawyer.NN.is, tomSawyer.NNP.is
– uncleTom.NN.is, uncleTom.NNP.is
• Enter number of sentences to group
• Enter support: 10%
• Item sets shown in the output:
{word=tom}
{word=answer}
{word=tom}
{word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,wor
{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}
{word=aunt,word=polly,word=moment,word=laugh}
{word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=
17. Frequent Patterns: nouns
• Recap: SEASR flow information
• The repository location is:
– http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl
• Reads UIMA’s CAS consumer output
– Select file/url of the UIMA data source
– http://repository.seasr.org/Datasets/POS
tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is
• Similar to fpgrowth demo
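The flow groups a chosen number of sentences into one transaction before mining. A minimal sketch of that grouping step follows, assuming the tagged text is already available as a list of sentences, each a list of (token, tag) pairs. That input shape, the function name `noun_transactions`, and the toy sentences are all assumptions for illustration; they are not the actual format of the .is files or the SEASR component.

```python
def noun_transactions(tagged_sentences, group_size):
    """Group consecutive sentences and collect the nouns in each group.

    tagged_sentences: list of sentences, each a list of (token, tag) pairs
    (an assumed in-memory shape, not the UIMA output format).
    group_size: number of sentences per transaction.
    """
    transactions = []
    for start in range(0, len(tagged_sentences), group_size):
        group = tagged_sentences[start:start + group_size]
        nouns = {
            token.lower()
            for sentence in group
            for token, tag in sentence
            if tag in ("NN", "NNP")  # common and proper singular nouns
        }
        if nouns:
            transactions.append(nouns)
    return transactions

# Toy input, invented for illustration only.
tagged = [
    [("Aunt", "NNP"), ("Polly", "NNP"), ("laughed", "VBD")],
    [("The", "DT"), ("boy", "NN"), ("ran", "VBD")],
    [("Tom", "NNP"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN")],
]
result = noun_transactions(tagged, group_size=2)
print(result)
```

Each resulting set can then be treated as one transaction for frequent pattern mining, with the support threshold applied across all groups.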
18. UIMA + SEASR: Frequent Patterns
• Extensions
– Analysis for separate chapters
• Discover new relationships that occur over small windows
– Adjectives, Adverbs
• Common, repeating word usage, phrases
– Entity Extraction: Dates, Locations, Geo
20. UIMA + SEASR: Sentiment Analysis
• Classifying text based on its sentiment
– Determining the attitude of a speaker or a writer
– Determining whether a review is positive/negative
21. UIMA + SEASR: Sentiment Analysis
• Ask: What emotion is being conveyed within a
body of text?
– Look at only adjectives (UIMA POS)
• Lots of issues, challenges, and “but …” caveats
22. UIMA + SEASR: Sentiment Analysis
• Need to Answer:
– What emotions to track?
– How to measure an adjective and classify it as one of the
selected emotions?
– How to visualize the results?
25. UIMA + SEASR: Sentiment Analysis
• How to classify adjectives:
– Lots of metrics we could use …
• Lists of adjectives already classified
– http://www.derose.net/steve/resources/emotionwords/ewords.html
– Need a “nearness” metric for missing adjectives
– How about the thesaurus game ?
26. UIMA + SEASR: Sentiment Analysis
• Using only a thesaurus, find
a path between two words
– no antonyms
– no colloquialisms or slang
27. UIMA + SEASR: Sentiment Analysis
• How to get from delightful to rainy ?
['delightful', 'fair', 'balmy', 'moist', 'rainy']
• sexy to joyless?
['sexy', 'provocative', 'blue', 'joyless']
• bitter to lovable?
['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
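One way to play the thesaurus game in code is a breadth-first search over a synonym graph. The sketch below assumes a toy thesaurus built only from the three example paths above, stored as one-directional links; `THESAURUS` and `find_path` are hypothetical names, and nothing here is claimed to be the search the authors used.

```python
from collections import deque

# Toy thesaurus built only from the example paths above (an assumption);
# a real thesaurus would have many more entries and links.
THESAURUS = {
    "delightful": ["fair"],
    "fair": ["balmy"],
    "balmy": ["moist"],
    "moist": ["rainy"],
    "sexy": ["provocative"],
    "provocative": ["blue"],
    "blue": ["joyless"],
    "bitter": ["acerbic"],
    "acerbic": ["tangy"],
    "tangy": ["sweet"],
    "sweet": ["lovable"],
}

def find_path(thesaurus, start, goal):
    """Breadth-first search: return a shortest synonym path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        for synonym in thesaurus.get(word, []):
            if synonym not in visited:
                visited.add(synonym)
                queue.append(path + [synonym])
    return None

print(find_path(THESAURUS, "delightful", "rainy"))
print(find_path(THESAURUS, "rainy", "delightful"))
```

Because the toy links run in one direction only, the reverse lookup returns None, which hints at why path symmetry matters as a metric.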
28. UIMA + SEASR: Sentiment Analysis
• Use this game as a metric for
measuring how close a given adjective is
to each of the six emotions.
• Assume that the longer the path, the “farther
away” the two words are.
• Address some of the issues
29. UIMA + SEASR: Sentiment Analysis
• SynNet: a traversable graph of
synonyms (adjectives)
31. UIMA + SEASR: Sentiment Analysis
• SynNet Metrics
• Common nodes
• Path length
• Symmetric: a->b->c and c->b->a
• Link strength:
• tangy->sweet
• sweet->lovable
• Use of slang or informal usage
32. UIMA + SEASR: Sentiment Analysis
• Common Nodes
• depth of common
33. UIMA + SEASR: Sentiment Analysis
• Symmetry of path in common nodes
34. UIMA + SEASR: Sentiment Analysis
• Find the shortest path between the
adjective and each emotion:
• ['delightful', 'beatific', 'joyful']
• ['delightful', 'ineffable', 'unspeakable',
'fearful']
• Pick the emotion with the shortest path
length
• tie-breaking procedures
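The selection step can be sketched as follows, assuming a toy synonym graph built only from the two example paths above and only the two emotions they mention. The names `SYNNET`, `path_length`, and `classify` are hypothetical, and the tie-breaking rule used here (the emotion listed first wins) is an assumption, since the slides do not say which procedure was used.

```python
from collections import deque

# Toy synonym graph built only from the two example paths above.
SYNNET = {
    "delightful": ["beatific", "ineffable"],
    "beatific": ["joyful"],
    "ineffable": ["unspeakable"],
    "unspeakable": ["fearful"],
}
EMOTIONS = ["joyful", "fearful"]  # only the emotions named in the example

def path_length(graph, start, goal):
    """Number of links on a shortest path from start to goal, or None."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in graph.get(word, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def classify(graph, adjective, emotions):
    """Return (emotion, path length) for the nearest emotion, or None.

    Ties go to the emotion listed first (an assumed tie-breaker).
    """
    best = None
    for emotion in emotions:
        dist = path_length(graph, adjective, emotion)
        if dist is not None and (best is None or dist < best[1]):
            best = (emotion, dist)
    return best

print(classify(SYNNET, "delightful", EMOTIONS))
```

An adjective with no path to any emotion yields None, so callers need to decide how to treat unlabeled words.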
35. UIMA + SEASR: Sentiment Analysis
• Not a perfect solution
– still need context to get quality results
• Vain
– ['vain', 'insignificant', 'contemptible', 'hateful']
– ['vain', 'misleading', 'puzzling', 'surprising']
• Animal
– ['animal', 'sensual', 'pleasing', 'joyful']
– ['animal', 'bestial', 'vile', 'hateful']
– ['animal', 'gross', 'shocking', 'fearful']
– ['animal', 'gross', 'grievous', 'sorrowful']
• Negation
– “My mother was not a hateful person.”
36. UIMA + SEASR: Sentiment Analysis
• A word about WordNet
• http://wordnetweb.princeton.edu/
• English nouns, verbs, adjectives and adverbs
organized into sets of synonyms (synsets)
37. UIMA + SEASR: Sentiment Analysis
• Adjective islands
• There is no path from delightful to happy
• happy: {beaming, beamy, effulgent, felicitous, glad, happy,
radiant, refulgent, well-chosen}
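Islands can be detected by computing the connected components of the synonym graph. The sketch below treats synonym links as undirected, which is an assumption, and uses a toy graph that combines a few members of the happy synset above with one of the delightful paths from earlier. `SYNONYMS` and `islands` are hypothetical names, and the toy graph is not a claim about how WordNet links these words.

```python
# Toy graph for illustration only; not real WordNet data.
SYNONYMS = {
    "happy": ["glad", "felicitous", "radiant"],
    "delightful": ["beatific"],
    "beatific": ["joyful"],
}

def islands(graph):
    """Return the connected components of an undirected synonym graph."""
    # Build a symmetric adjacency map so links work in both directions.
    adjacency = {}
    for word, synonyms in graph.items():
        adjacency.setdefault(word, set())
        for synonym in synonyms:
            adjacency[word].add(synonym)
            adjacency.setdefault(synonym, set()).add(word)

    seen = set()
    components = []
    for word in adjacency:
        if word in seen:
            continue
        # Depth-first traversal collects everything reachable from word.
        stack = [word]
        component = set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(component)
    return components

components = islands(SYNONYMS)
print(components)
```

Two words in different components have no path between them, so a path-length metric cannot compare them without extra links.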
38. UIMA + SEASR: Sentiment Analysis
• Process Overview
• Extract the adjectives (UIMA POS analysis)
• Read in adjectives (SEASR library)
• Label each adjective (SynNet)
• Summarize windows of adjectives
• lots of experimentation here
• Visualize the windows
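The "summarize windows" step could look something like the sketch below, which counts emotion labels in consecutive, non-overlapping windows. The window scheme, the function name `summarize_windows`, and the sample labels are assumptions for illustration; the slides note that this step involved a lot of experimentation and do not describe the final approach.

```python
from collections import Counter

def summarize_windows(labels, window_size):
    """Count emotion labels in consecutive, non-overlapping windows.

    labels: emotion label per adjective in document order, with None for
    adjectives that could not be labeled (an assumed representation).
    """
    summaries = []
    for start in range(0, len(labels), window_size):
        window = labels[start:start + window_size]
        summaries.append(Counter(label for label in window if label is not None))
    return summaries

# Invented sample labels, for illustration only.
labels = ["joyful", "joyful", "fearful", None, "hateful", "hateful", "joyful"]
summaries = summarize_windows(labels, window_size=3)
for index, summary in enumerate(summaries):
    print(index, dict(summary))
```

Each per-window count is the kind of sequential data a visualization component could plot over the course of a text.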
39. UIMA + SEASR: Sentiment Analysis
• Visualization
• New SEASR visualization component
• Based on flare ActionScript Library
• http://flare.prefuse.org/
• Still in development
• http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
41. UIMA + SEASR: Sentiment Analysis
• Extensions
• Adverbs, nouns, verbs
• Analysis of metrics, etc
• Goal and Relevancy
• Two new components
• SynNet
• Flash-based visualization of sequential data