Measuring the Similarity and
Relatedness of Concepts :
a MICAI 2013 Tutorial
Ted Pedersen, Ph.D.
University of Minnesota
Department of Computer Science, Duluth
http://www.d.umn.edu/~tpederse
tpederse@d.umn.edu
What (I hope) you will learn!
● The distinction between semantic similarity and relatedness (and why both are useful)
● How to measure using information from ontologies, definitions, and corpora
● How to use freely available software
● How to conduct experiments using freely available reference standards
● Some applications where these measures are used or could be useful

November 25, 2013

MICAI-2013 Tutorial

2
Orientation
● We focus on methods that measure similarity and relatedness using information found in an ontology, which may be augmented with statistics from corpora or other resources
● We will not discuss purely distributional methods
  – They are very interesting and useful, and deserve their own separate tutorial

Just a few distributional methods
● Latent Semantic Analysis
  – http://lsa.colorado.edu
● SenseClusters
  – http://senseclusters.sourceforge.net
● Clustering by Committee
  – http://demo.patrickpantel.com
● Disco
  – http://www.linguatools.de/disco/disco_en.html

Outline
●

Measures of Similarity and Relatedness
–

●

Using Open Source Software
–

●

75 minutes + 10 minutes of questions
45 minutes + 10 minutes of questions

Similarity and Relatedness in the Wild
–

60 minutes + 10 minutes of questions

Measures of Semantic
Similarity and Relatedness
(without tears)

What are we measuring?
● Concept pairs (word senses)
  – Assign a numeric value that quantifies how similar or related two concepts are
● Not words
  – Cold may be temperature or illness
    ● Word Sense Disambiguation
  – This tutorial assumes senses assigned
    ● But, can also use these measures for WSD!

Why?
● Being able to organize concepts by their similarity or relatedness to each other is a fundamental operation in the human mind, and in many problems in Natural Language Processing and Artificial Intelligence
● If we know a lot about X, and if we know Y is similar to X, then a lot of what we know about X may apply to Y
  – Use X to explain or categorize Y

What is lefse?

Well, it's like a tortilla, except made with potatoes.

Lefse is a traditional soft, Norwegian flatbread.
Lefse is made out of flour, and milk or cream (or
sometimes lard), and cooked on a griddle.
Traditional lefse does not include potato, but it is
commonly added to make a thicker dough that is
easier to work with. Special tools are available for
lefse baking, including long wooden turning sticks
and special rolling pins with deep grooves.

[Figure: Lefse]

Similar or Related?
● Similarity based on is-a relations
  – How much is X like Y?
  – Share ancestor in is-a hierarchy

[Figure: Is-a hierarchy]

Similar or Related?
● Similarity based on is-a relations
  – Share ancestor in is-a hierarchy
    ● LCS : least common subsumer
    ● Closer / deeper the ancestor the more similar
  – A miter saw and a sander are similar
    ● both are kinds-of power tools (LCS)

[Figure: Least Common Subsumer (LCS)]
[Figure: Saws!]
[Figure: Sanders!]

Similar or Related?
● Relatedness more general
  – How much is X related to Y?
  – Many ways to be related
    ● is-a, part-of, treats, affects, symptom-of, ...
● Hammer and nail are related but they really aren't similar
  – (use hammer to drive nails)
● All similar concepts are related, but not all related concepts are similar

“Standard” Measures of Similarity
● Path Based
  – Rada et al., 1989 (path)
● Path + Depth
  – Wu & Palmer, 1994 (wup)
  – Leacock & Chodorow, 1998 (lch)
● Path + Information Content
  – Resnik, 1995 (res)
  – Jiang & Conrath, 1997 (jcn)
  – Lin, 1998 (lin)

Path Based Measures
● Distance between concepts (nodes) in tree intuitively appealing
● Spatial orientation, good for networks or maps but not is-a hierarchies
  – Assumes all paths have same “weight”
  – Reasonable approximation sometimes
  – But, more specific (deeper) paths tend to travel less semantic distance
● Shortest path a good start, needs corrections

Shortest is-a Path
$$\mathrm{path}(a,b) = \frac{1}{\text{shortest is-a path}(a,b)}$$

We count nodes...
● Maximum = 1
  – self similarity
  – path(miter saw, miter saw) = 1
● Minimum = 1 / (longest path in is-a tree)
  – path(miter saw, claw hammer) = 1/7
  – path(table saw, flat tip screwdriver) = 1/7
  – etc...

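
The node-counting path measure can be sketched in a few lines of Python. The toy hierarchy below is an assumption: the slide's figure is not reproduced in this text, so the parent links were chosen only to be consistent with the scores quoted here (1, 1/7, and .25). It is an illustration, not the WordNet::Similarity implementation.

```python
# Toy is-a hierarchy as child -> parent links.
# ASSUMPTION: these links are reconstructed to match the slide scores.
PARENT = {
    "hand tool": "tool",
    "power tool": "tool",
    "hammer": "hand tool",
    "screwdriver": "hand tool",
    "claw hammer": "hammer",
    "flat tip screwdriver": "screwdriver",
    "saw": "power tool",
    "sander": "power tool",
    "miter saw": "saw",
    "table saw": "saw",
}


def ancestors(concept):
    """Return [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Count the nodes on the shortest is-a path between a and b, inclusive."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            # edges from a to the common ancestor + edges from b + 1 node
            return steps_a + up_b.index(node) + 1
    raise ValueError("no common ancestor")


def path_similarity(a, b):
    return 1.0 / path_length(a, b)


print(path_similarity("miter saw", "miter saw"))
print(path_similarity("miter saw", "sander"))
print(path_similarity("hammer", "power tool"))
```

Because the hierarchy is a tree, the first shared node found while walking up from `a` is the least common subsumer, so the path through it is the shortest one.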
path(miter saw, sander) = .25
[Figure: miter saw & sander]

path(hammer, power tool) = .25
[Figure: hammers and power tools]

?
● Are hammer and power tool similar to the same degree as are miter saw and sander?
● The path measure reports “yes, they are.”

Path + Depth
● Path only doesn't account for specificity
● Deeper concepts more specific
● Paths between deeper concepts travel less semantic distance

Wu and Palmer, 1994
$$\mathrm{wup}(a,b) = \frac{2 \cdot \mathrm{depth}(\mathrm{LCS}(a,b))}{\mathrm{depth}(a) + \mathrm{depth}(b)}$$

● depth(x) = shortest is-a path(root, x)

wup(miter saw, sander) = (2*2)/(4+3) = .57

wup(hammer, power tool) = (2*1)/(2+3) = .4

?
● Wu and Palmer reports that sander and miter saw (.57) are more similar than are power tool and hammer (.4)
● Path reports that sander and miter saw (.25) are equally similar as are power tool and hammer (.25)
● Note that the measures are scaled differently, so compare relative rankings between measures (and not exact scores)

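
A small Python sketch of the Wu and Palmer computation follows. The child-to-parent links are an assumption reconstructed from the depths quoted on the slides (the figure itself is not in this text), with depth counted in nodes so that the root has depth 1.

```python
# ASSUMPTION: toy is-a links chosen to reproduce the depths used on the slides.
PARENT = {
    "hand tool": "tool",
    "power tool": "tool",
    "hammer": "hand tool",
    "saw": "power tool",
    "sander": "power tool",
    "miter saw": "saw",
}


def ancestors(concept):
    """Return [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    up_b = set(ancestors(b))
    for node in ancestors(a):
        if node in up_b:
            return node
    raise ValueError("no common ancestor")


def wup(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(round(wup("miter saw", "sander"), 2))
print(round(wup("hammer", "power tool"), 2))
```

Unlike the path score, the two pairs now receive different values because the least common subsumer of miter saw and sander sits deeper in the hierarchy.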
Information Content
● ic(concept) = -log p(concept) [Resnik 1995]
  – Need to count concepts
  – Term frequency + Inherited frequency
  – p(concept) = (tf + if) / N
● Depth shows specificity but not frequency
● Low frequency concepts often much more specific than high frequency ones

[Figure: Information Content, term frequency (tf)]
[Figure: Information Content, inherited frequency (if)]
[Figure: Information Content, IC = -log(f/N), final count (f = tf + if, N = 365,820)]

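
The counting scheme (own term frequency plus frequency inherited from descendants) can be sketched as below. Every count here is hypothetical, invented only for illustration; the natural logarithm is also an assumption, since the slides do not state the base.

```python
import math

# ASSUMPTION: toy hierarchy and made-up term frequencies (not the slide data).
PARENT = {
    "hand tool": "tool",
    "power tool": "tool",
    "saw": "power tool",
    "sander": "power tool",
    "miter saw": "saw",
}
TERM_FREQ = {
    "tool": 10,
    "hand tool": 30,
    "power tool": 5,
    "saw": 8,
    "sander": 4,
    "miter saw": 3,
}


def total_counts(parent, tf):
    """f = tf + if: credit each concept's own count to itself and every ancestor."""
    totals = dict.fromkeys(tf, 0)
    for concept, count in tf.items():
        node = concept
        while node is not None:
            totals[node] += count
            node = parent.get(node)
    return totals


def information_content(parent, tf):
    """IC(c) = -log(f(c) / N), with N taken as the sum of all term frequencies."""
    totals = total_counts(parent, tf)
    n = sum(tf.values())
    return {concept: -math.log(f / n) for concept, f in totals.items()}


ic = information_content(PARENT, TERM_FREQ)
for concept in ("tool", "power tool", "saw", "miter saw"):
    print(concept, round(ic[concept], 2))
```

The root inherits every count, so its information content is zero, and values grow as concepts become rarer and more specific.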
Lin, 1998
$$\mathrm{lin}(a,b) = \frac{2 \cdot \mathrm{IC}(\mathrm{LCS}(a,b))}{\mathrm{IC}(a) + \mathrm{IC}(b)}$$

● Look familiar?

Wu & Palmer, 1994
$$\mathrm{wup}(a,b) = \frac{2 \cdot \mathrm{depth}(\mathrm{LCS}(a,b))}{\mathrm{depth}(a) + \mathrm{depth}(b)}$$

● wup and lin are identical except that lin uses info content instead of depth
  – Info content provides a measure of depth (based on specificity)

lin(miter saw, sander) = 2 * 2.26 / (3.60 + 3.66) = 0.62

lin(hammer, power tool) = 2 * 0.71 / (2.26 + 2.81) = 0.28

?
● Lin : miter saw and sander (.62) more similar than hammer and power tool (.28)
● Wu and Palmer : miter saw and sander (.57) more similar than hammer and power tool (.4)
● Path : miter saw and sander (.25) equally similar to hammer and power tool (.25)

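
The Lin arithmetic can be checked with a short Python sketch. The information content values are the ones quoted on the slides; the function name and argument order are illustrative choices, not part of any package.

```python
def lin(ic_a, ic_b, ic_lcs):
    """Lin similarity from precomputed information content values."""
    denominator = ic_a + ic_b
    if denominator == 0:
        # Both concepts carry no information (e.g. both are the root).
        return 0.0
    return 2.0 * ic_lcs / denominator


# IC values as quoted on the slides.
miter_saw_sander = lin(ic_a=3.60, ic_b=3.66, ic_lcs=2.26)
hammer_power_tool = lin(ic_a=2.26, ic_b=2.81, ic_lcs=0.71)

print(round(miter_saw_sander, 2))   # 0.62
print(round(hammer_power_tool, 2))  # 0.28
```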
What about concepts not connected
via is-a relations?
● Connected via other relations?
  – Part-of, treatment-of, causes, etc.
● Not connected at all?
  – In different sections (axes) of an ontology
  – In different ontologies entirely
● Relatedness!
  – Use definition information
  – No is-a relations so can't be similarity

Measures of relatedness
● Path based
  – Hirst & St-Onge, 1998 (hso)
● Definition based
  – Lesk, 1986
  – Adapted lesk (lesk)
    ● Banerjee & Pedersen, 2003
● Definition + corpus
  – Gloss Vector (vector, vector_pairs)
    ● Patwardhan & Pedersen, 2006

Path based relatedness
● Ontologies include relations other than is-a
● These can be used to find shortest paths between concepts
  – However, a path made up of different kinds of relations can lead to big semantic jumps
  – A hammer is used to drive nails which are made of iron which comes from mines in Minnesota
    ● …. so hammer and Minnesota are related ??

Measuring relatedness with definitions
● Related concepts defined using many of the same terms
● But, definitions are short, inconsistent
● Concepts don't need to be connected via relations or paths to measure them
  – Lesk, 1986
  – Adapted Lesk, Banerjee & Pedersen, 2003

[Figure: Two separate ontologies...]
[Figure: Could join them together … ?]
[Figure: Each concept has definition]

Overlaps
● Claw hammer and carpenter
  – Related by working with wood
    ● Can't see this in structure of is-a hierarchies
    ● Claw hammer and iron worker just as similar
● Ball peen hammer and claw hammer
  – Reflects structure of is-a hierarchies
  – If you start with text like this maybe you can build is-a hierarchies automatically!
    ● Another tutorial...

Lesk and Adapted Lesk
● Lesk, 1986 : measure overlaps in definitions to assign senses to words
  – The more overlaps between two senses (concepts), the more related
● Banerjee & Pedersen, 2003, Adapted Lesk
  – Augment definition of each concept with definitions of related concepts
    ● Build a super gloss
  – Increase chance of finding overlaps

The problem with definitions ...
● Definitions contain variations of terminology that make it impossible to find exact overlaps
● spatula : an instrument for spreading material
● spreader : a hand tool for smoothing compounds
● No matches??! How can we see that “hand tool” and “instrument” are similar, as are “spreading material” and “smoothing compound” ?

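
The overlap idea, and why exact matching fails on these two definitions, can be sketched in Python. This is a simplified word-overlap count, not the published Adapted Lesk scoring (which also rewards longer shared phrases). The stop list, the `RELATED` links, and the glosses for "instrument" and "hand tool" are hypothetical, added only to show how a super gloss can surface overlaps.

```python
STOPWORDS = {"a", "an", "the", "for", "of", "and", "or", "to", "in", "with"}

GLOSSES = {
    # These two definitions come from the slide.
    "spatula": "an instrument for spreading material",
    "spreader": "a hand tool for smoothing compounds",
    # ASSUMPTION: invented glosses for related concepts.
    "instrument": "a tool used for a particular task",
    "hand tool": "a tool used with the hands",
}

# ASSUMPTION: hypothetical related-concept links used to build super glosses.
RELATED = {"spatula": ["instrument"], "spreader": ["hand tool"]}


def content_words(gloss):
    """Lowercase the gloss and drop stop words."""
    return {word for word in gloss.lower().split() if word not in STOPWORDS}


def overlap_score(gloss_a, gloss_b):
    """Count content words shared by two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def super_gloss(concept):
    """Concatenate a concept's gloss with the glosses of its related concepts."""
    parts = [GLOSSES[concept]] + [GLOSSES[r] for r in RELATED.get(concept, [])]
    return " ".join(parts)


plain = overlap_score(GLOSSES["spatula"], GLOSSES["spreader"])
extended = overlap_score(super_gloss("spatula"), super_gloss("spreader"))
print(plain, extended)
```

With the plain definitions nothing matches once stop words are removed; the extended glosses share "tool" and "used".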
Gloss Vector Measure
of Semantic Relatedness
● Rely on co-occurrences of terms
  – Terms that occur within some given number of terms of each other
● Allows for a fuzzier notion of matching
● Exploits second order co-occurrences
  – Friend of a friend relation

Gloss Vector Measure
of Semantic Relatedness
● Friend of a friend relation
  – Suppose hand tool and instrument don't occur in text with each other. But, suppose that “repair” occurs with each.
  – Hand tool and instrument are second order co-occurrences via “repair”

Gloss Vector Measure
of Semantic Relatedness
● Replace words or terms in definitions with vector of co-occurrences observed in corpus
● Defined concept now represented by an averaged vector of co-occurrences
● Measure relatedness of concepts via cosine between their respective vectors
● Patwardhan and Pedersen, 2006
  – Inspired by Schütze, 1998

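
The three steps above (look up a co-occurrence vector per definition word, average, compare by cosine) can be sketched as follows. The co-occurrence counts and the three context dimensions are hypothetical; a real gloss vector would be built from corpus statistics with far more dimensions.

```python
import math

# ASSUMPTION: made-up co-occurrence counts over three context words
# (repair, kitchen, wood), in that order.
COOCCURRENCE = {
    "instrument": [4.0, 1.0, 0.0],
    "spreading": [0.0, 3.0, 1.0],
    "material": [1.0, 1.0, 2.0],
    "hand": [2.0, 0.0, 1.0],
    "tool": [5.0, 0.0, 2.0],
    "smoothing": [1.0, 1.0, 3.0],
    "compounds": [1.0, 0.0, 1.0],
}


def gloss_vector(gloss):
    """Average the co-occurrence vectors of the gloss words that have one."""
    vectors = [COOCCURRENCE[w] for w in gloss.lower().split() if w in COOCCURRENCE]
    if not vectors:
        raise ValueError("no known words in gloss")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


spatula = gloss_vector("an instrument for spreading material")
spreader = gloss_vector("a hand tool for smoothing compounds")
relatedness = cosine(spatula, spreader)
print(round(relatedness, 3))
```

The two definitions share no content words, yet the cosine is well above zero because "instrument" and "tool" both co-occur with "repair" in the toy counts, which is the friend-of-a-friend effect.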
Experimental Results
● Vector > Lesk > Info Content > Depth > Path
  – Clear trend across various studies
● Big differences in intrinsic evaluations (Vector > Lesk >> Info Content > Depth > Path)
  – Banerjee and Pedersen, 2003 (IJCAI)
  – Pedersen, et al. 2007 (JBI)
● Smaller differences in extrinsic evaluations
  – Human raters mix up similarity & relatedness?

Questions?

References
● S. Banerjee and T. Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805-810, Acapulco, August 2003. (lesk)
● J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, pages 19-33, Taiwan, 1997. (jcn)
● C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In C. Fellbaum, editor, WordNet: An electronic lexical database, pages 265-283. MIT Press, 1998. (lch)
● M.E. Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation, pages 24-26. ACM Press, 1986.
● D. Lin. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning, Madison, August 1998. (lin)
● S. Patwardhan and T. Pedersen. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts. In Proceedings of the EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together, pages 1-8, Trento, Italy, April 2006. (vector, vector_pairs)
● R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1):17-30, 1989. (path)
● P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal, August 1995. (res)
● H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97-123, 1998.

Using Open Source Software

● Packages providing the “standard” measures
● Implementations of specific measures
● Overview of WordNet::Similarity usage

Software that provides
some of the “standard” measures

WordNet::Similarity
● Similarity and Relatedness for WordNet
  – http://wordnet.princeton.edu
● Written in Perl (starting in 2002)
● Offers command line, web interface, and API
  – http://wn-similarity.sourceforge.net
● We'll come back to this for some examples

ws4j
● Java Re-implementation of WordNet::Similarity
  – https://code.google.com/p/ws4j/
● Includes path, depth, info content, hso, and lesk measures
● Online demo
  – http://ws4jdemo.appspot.com/

NLTK
● Natural Language Toolkit
  – Includes path, depth, and information content measures
● Written in Python
  – General purpose NLP toolkit
    ● Parsers, part of speech taggers, and more
  – http://nltk.org/

DKPro Similarity
● Semantic similarity using vector space models like LSA and ESA, and also WordNet
  – Implemented using UIMA
  – https://code.google.com/p/dkpro-similarity-asl/
● Part of the much larger DKPro project, which provides UIMA wrappers for many existing tools and models
● Supports measuring similarity of short texts and concept pairs

Semilar
● Semantic similarity using WordNet and LSA
  – http://semanticsimilarity.org
● Supports measuring similarity of short texts and concept pairs
● Provides many pre-built models using LSA
● Includes a web service and API in addition to downloadable libraries

UMLS::Similarity
● Ports WordNet::Similarity to the UMLS
  – Unified Medical Language System from NLM, a data warehouse of medical sources
    ● Freely available, license required
  – http://umls-similarity.sourceforge.net
● Perl and MySQL

ProteInOn
● Computes Semantic Similarity for the Gene Ontology (GO) using path and information content measures
  – http://geneontology.org/
● Protein Interactions and Ontology
  – http://lasige.di.fc.ul.pt/webtools/proteinon/

Measure Specific Software

UKB
● Graph based similarity and relatedness measures, using WordNet
  – http://ixa2.si.ehu.es/ukb/
● Applies Personalized PageRank to semantic similarity and relatedness measures, as well as to word sense disambiguation

WMFVEC
● High dimensional vector approach using definitions from WordNet and Wiktionary
  – http://www.cs.columbia.edu/~weiwei/code.html#wmfvec
● Supports similarity measurements of short texts and concept pairs

olesk
● Shortest path in weighted semantic network
  – http://olesk.com/#SemanticRelatedness
● Supports measuring similarity of short texts and concept pairs

Illinois WNSim
● WordNet-based Similarity Metric
  – https://cogcomp.cs.illinois.edu/page/software_view/Illinois%20WNSim
  – Also provides Java version
  – https://cogcomp.cs.illinois.edu/page/software_view/Illinois%20WNSim%20(Java)
● Measures similarity of short texts, provides support for similarity of named entities

WordNet::Similarity Usage

WordNet::Similarity
● Install WordNet (http://wordnet.princeton.edu)
  – Make sure to set $WNHOME
● Install WordNet::QueryData
  – cpan
  – > install WordNet::QueryData
● Install WordNet::Similarity
  – cpan
  – > install WordNet::Similarity

Command line
● similarity.pl --type WordNet::Similarity::path dog cat
  – dog#n#1 cat#n#1 0.2
● similarity.pl --type WordNet::Similarity::path dog#n cat#n
  – dog#n#1 cat#n#1 0.2
● similarity.pl --type WordNet::Similarity::path dog#n#2 cat#n#3
  – dog#n#2 cat#n#3 0.125
● similarity.pl --type WordNet::Similarity::path dog cat#n#3
  – dog#n#3 cat#n#3 0.142857142857143

Command line
● similarity.pl --type WordNet::Similarity::path dog cat --allsenses
  – dog#n#1 cat#n#1 0.2
  – dog#n#3 cat#n#2 0.2
  – dog#n#1 cat#n#7 0.2
  – dog#n#7 cat#n#5 0.166666666666667
  – dog#n#4 cat#n#2 0.142857142857143
  – dog#n#3 cat#n#3 0.142857142857143
  – dog#v#1 cat#v#2 0.142857142857143
  – dog#n#4 cat#n#3 0.142857142857143
  – dog#n#6 cat#n#5 0.142857142857143
  – dog#v#1 cat#v#1 0.142857142857143
  – dog#n#2 cat#n#2 0.125
  – Etc...

WordNet senses
● wn cat -over

Overview of noun cat

The noun cat has 8 senses (first 1 from tagged texts)

1. (18) cat, true cat -- (feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats)
2. guy, cat, hombre, bozo -- (an informal term for a youth or man; "a nice guy"; "the guy's only doing it for some doll")
3. cat -- (a spiteful woman gossip; "what a cat she is!")
4. kat, khat, qat, quat, cat, Arabian tea, African tea -- (the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant; "in Yemen kat is used daily by 85% of adults")
5. cat-o'-nine-tails, cat -- (a whip with nine knotted cords; "British sailors feared the cat")
6. Caterpillar, cat -- (a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work)
7. big cat, cat -- (any of several large cats typically able to roar and living in the wild)
8. computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography, CAT -- (a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis)

Overview of verb cat

The verb cat has 2 senses (no senses from tagged texts)

1. cat -- (beat with a cat-o'-nine-tails)
2. vomit, vomit up, purge, cast, sick, cat, be sick, disgorge, regorge, retch, puke, barf, spew, spue, chuck, upchuck, honk, regurgitate, throw up -- (eject the contents of the stomach through the mouth; "After drinking too much, the students vomited"; "He purged continuously"; "The patient regurgitated the food we gave him last night")

Measure type
(--type WordNet::Similarity::X)
● Similarity
  – Shortest Path
    ● path
  – Depth
    ● wup, lch
  – Information Content
    ● res, lin, jcn
● Relatedness
  – Definition
    ● lesk
  – Definition + Corpus
    ● vector
    ● vector_pairs
  – Path finding
    ● hso
● Random
  – rand
  – Very useful for experimental comparisons
    ● How do your results compare with a random measure of similarity or relatedness??

Similarity measures don't
cross part of speech tags
● similarity.pl --type WordNet::Similarity::path dog#n cat#v
  – Warning (WordNet::Similarity::path::parseWps()) - dog#n and cat#v belong to different parts of speech.
  – dog#n#2 cat#v#1 -1000000

Relatedness measures
cross POS boundaries
● similarity.pl --type WordNet::Similarity::lesk dog#n#1 cat#v#1
  – dog#n#1 cat#v#1 20
● similarity.pl --type WordNet::Similarity::vector dog#n#1 cat#v#1
  – dog#n#1 cat#v#1 0.146581307649965

API
use strict;
use warnings;
use WordNet::QueryData;
use WordNet::Similarity::wup;

# Load WordNet, then build the Wu & Palmer measure on top of it
my $wn = WordNet::QueryData->new();
my $wup = WordNet::Similarity::wup->new($wn);

# Concepts are given as word#pos#sense
my $value = $wup->getRelatedness('dog#n#1', 'cat#n#1');

# Check for errors before using the score
my ($error, $errorString) = $wup->getError();
die $errorString if $error;

print "dog (sense 1) <-> cat (sense 1) = $value\n";

Output:
dog (sense 1) <-> cat (sense 1) = 0.866666666666667

API
use strict;
use warnings;
use WordNet::QueryData;
use WordNet::Similarity::PathFinder;

my $wn = WordNet::QueryData->new;
my $obj = WordNet::Similarity::PathFinder->new ($wn);

my $wps1 = 'winston_churchill#n#1';
my $wps2 = 'england#n#1';

# Each returned element is a [length, path] pair
my @paths = $obj->getShortestPath($wps1, $wps2, 'n', 'wps');
my ($length, $path) = @{shift @paths};
defined $path or die "No path between synsets";

print "shortest path between $wps1 and $wps2 is $length edges long\n";
print "@$path\n";

Output:
shortest path between winston_churchill#n#1 and england#n#1 is 14 edges long
winston_churchill#n#1 writer#n#1 communicator#n#1 person#n#1 causal_agent#n#1 physical_entity#n#1 object#n#1 location#n#1 region#n#3 district#n#1 administrative_district#n#1 country#n#2 European_country#n#1 england#n#1

Web Interface
● If you like the web interface, you can run your own version!
  – similarity_server.pl
  – All necessary html and cgi files included

Other Utilities
● Build new information content files – by default counts come from SemCor
  – BNCFreq.pl
  – brownFreq.pl
  – treebankFreq.pl
  – rawtextFreq.pl
● compounds.pl – list all WordNet compounds
● wnDepths.pl – list all WordNet depths

Questions?

References
● Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca and Aitor Soroa. 2009. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Proceedings of NAACL-HLT 09. Boulder, USA. (ukb)
● Daniel Bär, Torsten Zesch, and Iryna Gurevych. DKPro Similarity: An Open Source Framework for Text Similarity, in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 121-126, August 2013, Sofia, Bulgaria. (dkpro-similarity)
● Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. (nltk)
● Q. Do and D. Roth and M. Sammons and Y. Tu and V. Vydiswaran, Robust, Light-weight Approaches to compute Lexical Similarity. Computer Science Research and Technical Reports, University of Illinois (2009) (Illinois WNSim)
● Weiwei Guo and Mona Diab. "Improving Lexical Semantics for Sentential Semantics: Modeling Selectional Preference and Similar Words in a Latent Variable Model". In Proceedings of NAACL, 2013, Atlanta, Georgia, USA. (wmfvec)
● Bridget McInnes, Ted Pedersen, and Serguei Pakhomov, UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 14-18, 2009, pp. 431-435, San Francisco, CA (umls-similarity)
● Ted Pedersen, Siddharth Patwardhan and Jason Michelizzi, WordNet::Similarity - Measuring the Relatedness of Concepts - Appears in the Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), pp. 38-41, May 3-5, 2004, Boston, MA. (wordnet-similarity)
● Rus, V., Lintean, M., Banjade, R., Niraula, N., and Stefanescu, D. (2013). SEMILAR: The Semantic Similarity Toolkit. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, Sofia, Bulgaria. (semilar)
● Reda Siblini and Leila Kosseim (2013). Using a Weighted Semantic Network for Lexical Semantic Relatedness. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2013), September, Hissar, Bulgaria. (olesk)

Similarity and Relatedness in the Wild :
How do we know it's working?

Intrinsic Evaluation
● Develop your own measure
● Score it using pairs for which human reference standard is available
● Compare correlation between your measure and established measures
  – Spearman's rank correlation often used
  – rank.pl in Ngram Statistics Package
    ● http://ngram.sourceforge.net

Intrinsic Evaluation
● Replication proves to be very difficult!
● Many factors, see ACL 2013 paper
  – Offspring from Reproduction Problems: What Replication Failure Teaches Us (Fokkens, van Erp, Postma, Pedersen, Vossen, and Freire) - Appears in the Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, pp. 1691-1701, Sofia, Bulgaria.
  – http://aclweb.org/anthology//P/P13/P13-1166.pdf

Reference Standards
● Rubenstein and Goodenough, 1965
  – 65 pairs
  – Assessed by 50 undergraduate students
  – http://www.d.umn.edu/~tpederse/Data/rubenstein-goodenough-1965.txt
● Miller and Charles, 1991
  – 30 pair subset of R&G
  – Re-assessed by 38 undergraduate students
  – http://www.d.umn.edu/~tpederse/Data/millercharles-1991.txt

Rubenstein and Goodenough pairs
Scale: no similarity (0.0) – synonyms (4.0)

Pair                 Rubenstein & Goodenough   Miller & Charles
gem, jewel           3.94                      3.84
car, automobile      3.92                      3.92
brother, lad         2.41                      1.66
crane, implement     2.37                      1.68
rooster, voyage      0.04                      0.08
noon, string         0.04                      0.08

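
Spearman's rank correlation, the statistic usually reported in intrinsic evaluations, can be sketched in plain Python on the six pairs listed above, comparing the Rubenstein & Goodenough scores with the Miller & Charles scores. Tied scores receive the average of their ranks; the function names are illustrative and this is not the `rank.pl` implementation.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Order: gem/jewel, car/automobile, brother/lad, crane/implement,
# rooster/voyage, noon/string
rubenstein_goodenough = [3.94, 3.92, 2.41, 2.37, 0.04, 0.04]
miller_charles = [3.84, 3.92, 1.66, 1.68, 0.08, 0.08]

rho = spearman(rubenstein_goodenough, miller_charles)
print(round(rho, 3))
```

To evaluate a new measure, one of the two lists would be replaced by that measure's scores for the same pairs.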
Reference Standards
● WordSim-353, 2002
  – 153 pairs assessed by 13 subjects
  – 200 pairs assessed by 16 subjects
  – Includes the Miller and Charles pairs (reassessed)
● http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/

Reference Standards
● Yang and Powers, 2006
● 130 verb pairs
  – Assessed by 2 academic staff and 4 graduate students
  – How related in meaning is the pair?
    ● 0 for not at all
    ● 4 for inseparably related
  – http://david.wardpowers.info/Research/AI/papers/200601-GWC-130verbpairs.txt

Reference standards
● Mturk 771, 2012
● 771 word pairs scored for relatedness by Mechanical Turkers
● At least 20 judgements per pair
  – 1 for not related, 5 for highly related
  – 50 ratings per Turker
  – http://www2.mta.ac.il/~gideon/mturk771.html

Reference Standards
● MWE-300, 2012
  – 300 pairs where 216 are multi-word expressions and 84 are word pairs
  – Assessed by 5 native speakers on scale of 0 to 1
  – http://adapt.seiee.sjtu.edu.cn/similarity/

Reference Standards
● Rel-122, 2013
● Relatedness scores for 122 noun pairs, created at University of Central Florida
  – Each pair assessed by at least 20 undergraduate students
  – 0 for completely unrelated, 4 for strongly related
  – http://www.cs.ucf.edu/~seansz/rel-122/

Reference standards
● MayoSRS, 2007
  – 101 pairs of medical concepts
  – Assessed by 13 medical coders and 3 physicians, all from Mayo Clinic
    ● 1 for not at all related, 4 for nearly synonymous
  – MiniMayoSRS – a highly reliable subset of 29 pairs
● http://rxinformatics.umn.edu/SemanticRelatednessResources.html

Reference standards
● UMNSRS, 2010
  – 566 pairs of medical concepts assessed for similarity by 8 medical students / residents
  – 587 pairs of medical concepts assessed for relatedness by 8 medical students / residents
● Assessed on a continuous scale (0 – 1500)
● http://rxinformatics.umn.edu/SemanticRelatednessResources.html

Reference Standards
● Lexical & Distributional Semantics Evaluation Benchmarks, maintained by Manaal Faruqui
– http://www.cs.cmu.edu/~mfaruqui/suite.html
● ACL Wiki (various datasets for related tasks)
– http://aclweb.org/aclwiki/index.php?title=Similarity_(State_of_the_art)
– http://aclweb.org/aclwiki/index.php?title=Knowledge_collections_and_datasets_(English)
● SemEval (many related tasks with data)
– http://aclweb.org/aclwiki/index.php?title=SemEval_Portal

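Reference standards like those listed above are typically used by comparing a measure's scores with the human ratings for the same pairs. Below is a minimal sketch, assuming Spearman's rank correlation as the comparison (a common choice, not something these slides specify). The ratings and scores are made-up illustrative numbers, not taken from any of the data sets, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative (invented) values for four concept pairs.
human_ratings = [4.0, 3.5, 1.0, 0.5]
measure_scores = [0.9, 0.7, 0.2, 0.3]
rho = spearman(human_ratings, measure_scores)
print(round(rho, 3))
```

A rank correlation is used here because human ratings and measure scores usually live on different scales (for example 0 to 4 versus 0 to 1), so only the ordering of pairs is compared.
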
Extrinsic Evaluation
● Synonym Tests
● Word Sense Disambiguation
● Semantic Textual Similarity
● Recognizing Textual Entailment

ESL Synonym Tests
● Provide one target word in context
● Select “closest” synonym from a list of 4
● Used in previous versions of TOEFL and other standardized tests
● http://aclweb.org/aclwiki/index.php?title=ESL_Synonym_Questions_(State_of_the_art)
● 50 question data set available from Peter Turney
– http://www.apperceptual.com/

ESL Synonym Tests
● Stem: "A rusty nail is not as strong as a clean, new one."
● Choices:
– (a) corroded
– (b) black
– (c) dirty
– (d) painted

ESL Synonym Tests
● similarity.pl --type WordNet::Similarity::vector --file rusty.txt
● rusty#a#1 dirty#a#1 0.175879883782967
● rusty#a#1 painted#a#1 0.0844532311114079
● rusty#a#1 black#a#2 0.0656019491836669
● rusty#a#1 corroded#a#1 0.0357324093083641
– :(

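A minimal sketch (not part of the tutorial software) of how output like the lines above could be turned into an answer to the synonym question: parse each "sense sense score" line and select the highest-scoring choice. The helper name pick_answer is hypothetical; the scores are the ones shown on the slide.

```python
# Output lines copied from the slide, in the form "<target> <choice> <score>".
OUTPUT = """\
rusty#a#1 dirty#a#1 0.175879883782967
rusty#a#1 painted#a#1 0.0844532311114079
rusty#a#1 black#a#2 0.0656019491836669
rusty#a#1 corroded#a#1 0.0357324093083641
"""


def pick_answer(output):
    """Return the (choice, score) pair with the highest score (hypothetical helper)."""
    scored = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a three-field result line
        _target, choice, score = parts
        scored.append((choice, float(score)))
    return max(scored, key=lambda pair: pair[1])


best_choice, best_score = pick_answer(OUTPUT)
print(best_choice, best_score)
```

With these scores the vector measure selects dirty#a#1 and ranks corroded#a#1 last, which is the outcome the slide marks with ":(".
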
TOEFL Synonym Tests
● Rusty and other words are adjectives
● Must use a relatedness measure
– lesk
– vector
– vector_pairs
– hso
● Should do word sense disambiguation first

Word Sense Disambiguation
● The meanings of words that occur together in a context will likely be related
– If a word has multiple senses, it will most likely be used in the sense that is most related to the senses of its neighbors
– Relatedness seems to matter more than similarity, unless you have a list
  ● I have a horse, a cat and a cow at my farm.

Word Sense Disambiguation
● SenseRelate Hypothesis : Most words in text will have multiple possible senses and will often be used with the sense most related to those of surrounding words
– He either has a cold or the flu
  ● Cold not likely to mean air temperature

SenseRelate
● In coherent text words will be used in similar or related senses, and these will also be related to the overall topic or mood of a text
● First applied to WSD in 2002
– Banerjee and Pedersen, 2002 (WordNet)
– Patwardhan et al., 2003 (WordNet)
– Pedersen and Kolhatkar, 2009 (WordNet)
– McInnes et al., 2011 (UMLS)

Implementations
● WordNet::SenseRelate
– AllWords, TargetWord, WordToSet
– http://senserelate.sourceforge.net
  ● Includes command line, API, and web interface
● UMLS::SenseRelate
– AllWords
– http://search.cpan.org/dist/UMLS-SenseRelate/

Web Interface
[two screenshot slides; images not preserved]

SenseRelate for WSD
● Assign each word the sense which is most similar or related to one or more of its neighbors
– Pairwise
– 2 or more neighbors
● Pairwise algorithm results in a trellis much like in HMMs
– More neighbors adds lots of information and a lot of computational complexity

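A minimal sketch of one simple reading of the pairwise idea, using the "cold or the flu" example from the earlier slide: each candidate sense of the target is scored by its best relatedness to the senses of each neighbor, and the top-scoring sense wins. The sense labels, the relatedness values, and the function names are all hypothetical stand-ins; this is not the WordNet::SenseRelate implementation or its API.

```python
# Hypothetical sense inventory, for illustration only.
SENSES = {
    "cold": ["cold#illness", "cold#temperature"],
    "flu": ["flu#illness"],
}

# Hypothetical symmetric relatedness scores between sense pairs.
RELATEDNESS = {
    frozenset(["cold#illness", "flu#illness"]): 0.9,
    frozenset(["cold#temperature", "flu#illness"]): 0.1,
}


def relatedness(sense_a, sense_b):
    """Look up a toy relatedness score; unknown pairs score 0."""
    return RELATEDNESS.get(frozenset([sense_a, sense_b]), 0.0)


def disambiguate(target, neighbors):
    """Pick the sense of `target` most related to the senses of its neighbors."""
    best_sense, best_score = None, float("-inf")
    for sense in SENSES[target]:
        # For each neighbor, take its most related sense, then sum over neighbors.
        score = sum(
            max(relatedness(sense, n_sense) for n_sense in SENSES[n])
            for n in neighbors
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate("cold", ["flu"]))
```

The nested loops make the cost visible: each added neighbor multiplies the number of sense pairs to compare, which matches the slide's remark about computational complexity.
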
SenseRelate - pairwise
[figure not preserved]

SenseRelate – 2 neighbors
[figure not preserved]

General Observations on WSD Results
● Nouns more accurate; verbs, adjectives, and adverbs less so
● Increasing the window size nearly always improves performance
● Jiang-Conrath measure often a high performer for nouns (e.g., Patwardhan et al. 2003)
● Vector and lesk have coverage advantage
– handle mixed (cross part-of-speech) pairs while others don't

SenseRelate Sentiment Classification
● The underlying sentiment of a text can be discovered by determining which emotion is most related to the words in that text.
– Similar to happy? : joyful, ecstatic, ...
– Related to happy? : love, food, success, ...
– Pairwise comparisons between emotion and senses of words in context
● Same form as Naive Bayesian model
– WordNet::SenseRelate::WordToSet

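A minimal sketch of the idea above, assuming each emotion is scored by summing its relatedness to the words in the text and the highest total is chosen. The scores are invented, word senses are ignored for brevity, and the function names are hypothetical; this does not reproduce WordNet::SenseRelate::WordToSet.

```python
# Hypothetical relatedness scores between an emotion and a context word.
TOY_SCORES = {
    ("happy", "love"): 0.8,
    ("happy", "success"): 0.7,
    ("happy", "funeral"): 0.1,
    ("sad", "love"): 0.3,
    ("sad", "success"): 0.1,
    ("sad", "funeral"): 0.9,
}


def relatedness(emotion, word):
    """Look up a toy relatedness score; unknown pairs score 0."""
    return TOY_SCORES.get((emotion, word), 0.0)


def classify(context_words, emotions):
    """Return the emotion with the highest total relatedness, plus all totals."""
    totals = {
        emotion: sum(relatedness(emotion, word) for word in context_words)
        for emotion in emotions
    }
    best = max(totals, key=totals.get)
    return best, totals


label, totals = classify(["love", "success"], ["happy", "sad"])
print(label, totals)
```

The structure resembles a Naive Bayes style decision in that one class is picked by aggregating independent per-word scores, although here the scores are relatedness values rather than probabilities.
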
SenseRelate - WordToSet
[figure not preserved]

Experimental Results
● Sentiment classification results in 2011 i2b2 suicide notes challenge were disappointing (Pedersen, 2012)
– Suicide notes not very emotional!
– In many cases reflect a decision made and focus on settling affairs

Semantic Textual Similarity (STS)
● How similar (semantically) are 2 texts?
– The Senate Select Committee on Intelligence is preparing a blistering report on prewar intelligence on Iraq.
– American intelligence leading up to the war on Iraq will be criticized by a powerful US Congressional committee due to report soon, officials said today
● http://www-nlp.stanford.edu/wiki/STS

Semantic Textual Similarity (STS)
● Combined distributional and WordNet information to learn a model from training data
– UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures, Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch, Semeval 2012
● LSA Boosted with WordNet
– UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems, Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield, and Johnathan Weese, *Sem 2013

Recognizing Textual Entailment (RTE)
● A text entails a hypothesis if a human reading the text would infer that the hypothesis is true
– Text : The Christian Science Monitor named a US journalist kidnapped in Iraq as freelancer Jill Carroll.
– Hypothesis: Jill Carroll was abducted in Iraq.
– Hypothesis: The Christian Science Monitor kidnapped a freelancer.

RTE methods and data
● Long series of shared tasks
– 2004 to present
– http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool
● Recognizing that T and H are similar is helpful, although it does not really solve the problem
– Hybrid approaches (like with STS)

Applications
● Semantic similarity and relatedness are important components of many NLP applications
– Crucial building blocks
– Interesting to study in their own right

Thank you!
If you have any suggestions for content that
should be added to or changed in this tutorial,
please let me know! Any other comments are
welcome too.
tpederse@d.umn.edu
Questions?
References
● S. Banerjee and T. Pedersen. An adapted Lesk algorithm for word sense disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pages 136-145, Mexico City, February 2002. (wsd result)
● D. Faria, C. Pesquita, F. M. Couto, and A. Falcão. ProteInOn: A Web Tool for Protein Semantic Similarity. Technical Report, Department of Informatics, University of Lisbon, 2007. (proteinon)
● L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan and G. Wolfman (2002). Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1), 116-131. (wordsim-353)
● B. McInnes, T. Pedersen, Y. Liu, G. Melton and S. Pakhomov. Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity. In Proceedings of the Annual Symposium of the American Medical Informatics Association, pages 895-904, Washington, DC, October 2011. (wsd result)

References
● G. A. Miller and W. G. Charles (1991). Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1), 1-28.
● S. Pakhomov, B. McInnes, T. Adam, Y. Liu, T. Pedersen, and G. Melton. Semantic Similarity and Relatedness between Clinical Terms : An Experimental Study. In Proceedings of the Annual Symposium of the American Medical Informatics Association, pages 572-576, Washington, DC, November 13-17, 2010. (umnsrs)
● S. Patwardhan, S. Banerjee, and T. Pedersen. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pages 241-257, Mexico City, February 2003. (wsd result)
● S. Patwardhan and T. Pedersen. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts. In Proceedings of the EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together, pages 1-8, Trento, Italy, April 2006. (wsd result)

References
● T. Pedersen and V. Kolhatkar. WordNet :: SenseRelate :: AllWords - a broad coverage word sense tagger that maximizes semantic relatedness. In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies 2009 Conference, pages 17-20, Boulder, CO, June 2009. (wsd result)
● T. Pedersen, S. Pakhomov, S. Patwardhan, and C. Chute. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics, 40(3) : 288-299, June 2007. (mayosrs)
● T. Pedersen. Rule-based and lightly supervised methods to predict emotions in suicide notes. Biomedical Informatics Insights, 2012:5 (Suppl. 1):185-193, January 2012. (sentiment result)
● H. Rubenstein and J. B. Goodenough (1965). Contextual Correlates of Synonymy. Communications of the ACM, 8(10), 627-633.

References
● E. Agirre, D. Cer, M. Diab, and A. Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. Semeval 2012. (sts shared task)
● S. Szumlanski, F. Gomez and V.K. Sims (2013). A New Set of Norms for Semantic Relatedness Measures. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 890-895). Sofia, Bulgaria. (rel-122)
● P. D. Turney (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502. (toefl synonyms)
● W. Wu, H. Li, H. Wang, and K. Q. Zhu. Probase: a probabilistic taxonomy for text understanding. In Proceedings of SIGMOD'12, pages 481-492, 2012. (mwe-300)
● D. Yang and D.M. W. Powers (2006). Verb Similarity on the Taxonomy of WordNet. Proceedings of the Third International WordNet Conference (GWC-06) (pp. 121-128). Jeju Island, Korea.

The End!

November 25, 2013

MICAI-2013 Tutorial

147

More Related Content

Viewers also liked

Evaluation in Information Retrieval
Evaluation in Information RetrievalEvaluation in Information Retrieval
Evaluation in Information RetrievalDishant Ailawadi
 
Pass4sure 640-864 Questions Answers
Pass4sure 640-864 Questions AnswersPass4sure 640-864 Questions Answers
Pass4sure 640-864 Questions AnswersRoxycodone Online
 
Document similarity with vector space model
Document similarity with vector space modelDocument similarity with vector space model
Document similarity with vector space modeldalal404
 
Initial Configuration of Router
Initial Configuration of RouterInitial Configuration of Router
Initial Configuration of RouterKishore Kumar
 
Cisco router command configuration overview
Cisco router command configuration overviewCisco router command configuration overview
Cisco router command configuration overview3Anetwork com
 
Router configuration in packet tracer
Router configuration in packet  tracerRouter configuration in packet  tracer
Router configuration in packet tracerAnabia Anabia
 
Day 25 cisco ios router configuration
Day 25 cisco ios router configurationDay 25 cisco ios router configuration
Day 25 cisco ios router configurationCYBERINTELLIGENTS
 
De-Risk Data Center Projects With Cisco Services
De-Risk Data Center Projects With Cisco ServicesDe-Risk Data Center Projects With Cisco Services
De-Risk Data Center Projects With Cisco ServicesCisco Canada
 

Viewers also liked (11)

Evaluation in Information Retrieval
Evaluation in Information RetrievalEvaluation in Information Retrieval
Evaluation in Information Retrieval
 
Pass4sure 640-864 Questions Answers
Pass4sure 640-864 Questions AnswersPass4sure 640-864 Questions Answers
Pass4sure 640-864 Questions Answers
 
10 More Quotes for Entrepreneurs
10 More Quotes for Entrepreneurs10 More Quotes for Entrepreneurs
10 More Quotes for Entrepreneurs
 
Document similarity with vector space model
Document similarity with vector space modelDocument similarity with vector space model
Document similarity with vector space model
 
Lesson 1 slideshow
Lesson 1 slideshowLesson 1 slideshow
Lesson 1 slideshow
 
Initial Configuration of Router
Initial Configuration of RouterInitial Configuration of Router
Initial Configuration of Router
 
Cisco router command configuration overview
Cisco router command configuration overviewCisco router command configuration overview
Cisco router command configuration overview
 
Router configuration in packet tracer
Router configuration in packet  tracerRouter configuration in packet  tracer
Router configuration in packet tracer
 
Day 25 cisco ios router configuration
Day 25 cisco ios router configurationDay 25 cisco ios router configuration
Day 25 cisco ios router configuration
 
Troubleshooting basic networks
Troubleshooting basic networksTroubleshooting basic networks
Troubleshooting basic networks
 
De-Risk Data Center Projects With Cisco Services
De-Risk Data Center Projects With Cisco ServicesDe-Risk Data Center Projects With Cisco Services
De-Risk Data Center Projects With Cisco Services
 

More from University of Minnesota, Duluth

Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...University of Minnesota, Duluth
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? University of Minnesota, Duluth
 
Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?University of Minnesota, Duluth
 
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection University of Minnesota, Duluth
 
Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...University of Minnesota, Duluth
 
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...University of Minnesota, Duluth
 
Puns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and wearyPuns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and wearyUniversity of Minnesota, Duluth
 
The horizon isn't found in a dictionary : Identifying emerging word senses a...
The horizon isn't found in a  dictionary : Identifying emerging word senses a...The horizon isn't found in a  dictionary : Identifying emerging word senses a...
The horizon isn't found in a dictionary : Identifying emerging word senses a...University of Minnesota, Duluth
 
Duluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of LexicographyDuluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of LexicographyUniversity of Minnesota, Duluth
 
What it's like to do a Master's thesis with me (Ted Pedersen)
What it's like to do a Master's thesis with me (Ted Pedersen)What it's like to do a Master's thesis with me (Ted Pedersen)
What it's like to do a Master's thesis with me (Ted Pedersen)University of Minnesota, Duluth
 

More from University of Minnesota, Duluth (20)

Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
Muslims in Machine Learning workshop (NeurlPS 2021) - Automatically Identifyi...
 
Automatically Identifying Islamophobia in Social Media
Automatically Identifying Islamophobia in Social MediaAutomatically Identifying Islamophobia in Social Media
Automatically Identifying Islamophobia in Social Media
 
What Makes Hate Speech : an interactive workshop
What Makes Hate Speech : an interactive workshopWhat Makes Hate Speech : an interactive workshop
What Makes Hate Speech : an interactive workshop
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it?
 
Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?Algorithmic Bias : What is it? Why should we care? What can we do about it?
Algorithmic Bias : What is it? Why should we care? What can we do about it?
 
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
Duluth at Semeval 2017 Task 6 - Language Models in Humor Detection
 
Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...Who's to say what's funny? A computer using Language Models and Deep Learning...
Who's to say what's funny? A computer using Language Models and Deep Learning...
 
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
Duluth at Semeval 2017 Task 7 - Puns upon a Midnight Dreary, Lexical Semantic...
 
Puns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and wearyPuns upon a midnight dreary, lexical semantics for the weak and weary
Puns upon a midnight dreary, lexical semantics for the weak and weary
 
The horizon isn't found in a dictionary : Identifying emerging word senses a...
The horizon isn't found in a  dictionary : Identifying emerging word senses a...The horizon isn't found in a  dictionary : Identifying emerging word senses a...
The horizon isn't found in a dictionary : Identifying emerging word senses a...
 
Screening Twitter Users for Depression and PTSD
Screening Twitter Users for Depression and PTSDScreening Twitter Users for Depression and PTSD
Screening Twitter Users for Depression and PTSD
 
Duluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of LexicographyDuluth : Word Sense Discrimination in the Service of Lexicography
Duluth : Word Sense Discrimination in the Service of Lexicography
 
Pedersen masters-thesis-oct-10-2014
Pedersen masters-thesis-oct-10-2014Pedersen masters-thesis-oct-10-2014
Pedersen masters-thesis-oct-10-2014
 
What it's like to do a Master's thesis with me (Ted Pedersen)
What it's like to do a Master's thesis with me (Ted Pedersen)What it's like to do a Master's thesis with me (Ted Pedersen)
What it's like to do a Master's thesis with me (Ted Pedersen)
 
Pedersen naacl-2013-demo-poster-may25
Pedersen naacl-2013-demo-poster-may25Pedersen naacl-2013-demo-poster-may25
Pedersen naacl-2013-demo-poster-may25
 
Pedersen semeval-2013-poster-may24
Pedersen semeval-2013-poster-may24Pedersen semeval-2013-poster-may24
Pedersen semeval-2013-poster-may24
 
Talk at UAB, April 12, 2013
Talk at UAB, April 12, 2013Talk at UAB, April 12, 2013
Talk at UAB, April 12, 2013
 
Feb20 mayo-webinar-21feb2012
Feb20 mayo-webinar-21feb2012Feb20 mayo-webinar-21feb2012
Feb20 mayo-webinar-21feb2012
 
Ihi2012 semantic-similarity-tutorial-part1
Ihi2012 semantic-similarity-tutorial-part1Ihi2012 semantic-similarity-tutorial-part1
Ihi2012 semantic-similarity-tutorial-part1
 
Pedersen ACL Disco-2011 workshop
Pedersen ACL Disco-2011 workshopPedersen ACL Disco-2011 workshop
Pedersen ACL Disco-2011 workshop
 

Recently uploaded

Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 

Recently uploaded (20)

TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 

MICAI 2013 Tutorial Slides - Measuring the Similarity and Relatedness of Concepts

  • 1. Measuring the Similarity and Relatedness of Concepts : a MICAI 2013 Tutorial Ted Pedersen, Ph.D. University of Minnesota Department of Computer Science, Duluth http://www.d.umn.edu/~tpederse tpederse@d.umn.edu
  • 2. What (I hope) you will learn! ● ● ● ● ● The distinction between semantic similarity and relatedness (and why both are useful) How to measure using information from ontologies, definitions, and corpora How to use freely available software How to conduct experiments using freely available reference standards Some applications where these measures are used or could be useful November 25, 2013 MICAI-2013 Tutorial 2
  • 3. Orientation ● ● We focus on methods that measure similarity and relatedness using information found in an ontology, which may be possibly augmented with statistics from corpora or other resources We will not discuss purely distributional methods – November 25, 2013 Very interesting and useful, and deserve their own separate tutorial MICAI-2013 Tutorial 3
  • 4. Just a few distributional methods ● Latent Semantic Analysis – ● SenseClusters – ● http://senseclusters.sourceforge.net Clustering by Committee – ● http://lsa.colorado.edu http://demo.patrickpantel.com Disco – November 25, 2013 http://www.linguatools.de/disco/disco_en.html MICAI-2013 Tutorial 4
  • 5. Outline ● Measures of Similarity and Relatedness – ● Using Open Source Software – ● 75 minutes + 10 minutes of questions 45 minutes + 10 minutes of questions Similarity and Relatedness in the Wild – November 25, 2013 60 minutes + 10 minutes of questions MICAI-2013 Tutorial 5
  • 6. Measures of Semantic Similarity and Relatedness (without tears) November 25, 2013 MICAI-2013 Tutorial 6
  • 7. What are we measuring? ● Concept pairs (word senses) – ● Assign a numeric value that quantifies how similar or related two concepts are Not words – Cold may be temperature or illness ● – This tutorial assumes senses assigned ● November 25, 2013 Word Sense Disambiguation But, can also use these measures for WSD! MICAI-2013 Tutorial 7
  • 8. Why? ● Being able to organize concepts by their similarity or relatedness to each other is a fundamental operation in the human mind, and in many problems in Natural Language Processing and Artificial Intelligence ● If we know a lot about X, and if we know Y is similar to X, then a lot of what we know about X may apply to Y – Use X to explain or categorize Y
  • 9. What is lefse? November 25, 2013 MICAI-2013 Tutorial 9
  • 10. Well, it's like a tortilla, except made with potatoes. November 25, 2013 MICAI-2013 Tutorial 10
  • 11. Lefse is a traditional soft, Norwegian flatbread. Lefse is made out of flour, and milk or cream (or sometimes lard), and cooked on a griddle. Traditional lefse does not include potato, but it is commonly added to make a thicker dough that is easier to work with. Special tools are available for lefse baking, including long wooden turning sticks and special rolling pins with deep grooves. November 25, 2013 MICAI-2013 Tutorial 11
  • 13. Similar or Related? ● Similarity based on is-a relations – How much is X like Y? – Share ancestor in is-a hierarchy
  • 14. Is-a hierarchy November 25, 2013 MICAI-2013 Tutorial 14
  • 15. Similar or Related? ● Similarity based on is-a relations – Share ancestor in is-a hierarchy (LCS : least common subsumer; the closer / deeper the ancestor, the more similar) – A miter saw and a sander are similar (both are kinds of power tools, their LCS)
  • 16. Least Common Subsumer (LCS) November 25, 2013 MICAI-2013 Tutorial 16
  • 19. Similar or Related? ● Relatedness more general – How much is X related to Y? – Many ways to be related (is-a, part-of, treats, affects, symptom-of, ...) ● Hammer and nail are related but they really aren't similar – (use hammer to drive nails) ● All similar concepts are related, but not all related concepts are similar
  • 20. “Standard” Measures of Similarity ● Path Based – Rada et al., 1989 (path) ● Path + Depth – Wu & Palmer, 1994 (wup) – Leacock & Chodorow, 1998 (lch) ● Path + Information Content – Resnik, 1995 (res) – Jiang & Conrath, 1997 (jcn) – Lin, 1998 (lin)
  • 21. Path Based Measures ● Distance between concepts (nodes) in tree intuitively appealing ● Spatial orientation, good for networks or maps but not is-a hierarchies – Reasonable approximation sometimes – Assumes all paths have same “weight” – But, more specific (deeper) paths tend to travel less semantic distance ● Shortest path a good start, needs corrections
  • 22. Shortest is-a Path ● path(a,b) = 1 / shortest is-a path(a,b)
  • 23. We count nodes... ● Maximum = 1 – self similarity – path(miter saw, miter saw) = 1 ● Minimum = 1 / (longest path in is-a tree) – path(miter saw, claw hammer) = 1/7 – path(table saw, flat tip screwdriver) = 1/7 – etc...
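The node-counting path measure above can be sketched in a few lines of Python. This is a minimal sketch, not the tutorial's software: the hierarchy below is an assumption reconstructed from the scores quoted on the slides (the figures themselves are not in this transcript), and intermediate node names such as "tool", "hand tool", "saw", and "screwdriver" are hypothetical.

```python
# Toy is-a hierarchy (child -> parent). The shape is an assumption chosen so
# that it reproduces the scores quoted on the slides; it is not WordNet.
PARENT = {
    "power tool": "tool",
    "hand tool": "tool",
    "saw": "power tool",
    "sander": "power tool",
    "miter saw": "saw",
    "table saw": "saw",
    "hammer": "hand tool",
    "screwdriver": "hand tool",
    "claw hammer": "hammer",
    "ball peen hammer": "hammer",
    "flat tip screwdriver": "screwdriver",
}


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Number of nodes on the shortest is-a path between a and b."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            # Count nodes (not edges), so a concept is 1 node from itself.
            return steps_a + up_b.index(node) + 1
    raise ValueError("concepts do not share an ancestor")


def path_similarity(a, b):
    """path(a,b) = 1 / number of nodes on the shortest is-a path."""
    return 1.0 / path_length(a, b)


for pair in [("miter saw", "miter saw"), ("miter saw", "sander"),
             ("hammer", "power tool"), ("miter saw", "claw hammer")]:
    print(pair, path_similarity(*pair))
```

Because every is-a link counts the same, the sibling-like pair (miter saw, sander) and the cross-branch pair (hammer, power tool) receive the same score, which is the weakness the depth-based measures try to correct.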
  • 24. path(miter saw, sander) = .25 November 25, 2013 MICAI-2013 Tutorial 24
  • 25. miter saw & sander November 25, 2013 MICAI-2013 Tutorial 25
  • 26. path (hammer, power tool) = .25 November 25, 2013 MICAI-2013 Tutorial 26
  • 27. hammers and power tools November 25, 2013 MICAI-2013 Tutorial 27
  • 28. ? ● Are hammer and power tool similar to the same degree as are miter saw and sander? ● The path measure reports “yes, they are.”
  • 29. Path + Depth ● Path only doesn't account for specificity ● Deeper concepts more specific ● Paths between deeper concepts travel less semantic distance November 25, 2013 MICAI-2013 Tutorial 29
  • 30. Wu and Palmer, 1994 ● wup(a,b) = 2 * depth (LCS (a,b)) / (depth (a) + depth (b)) ● depth(x) = shortest is-a path(root,x)
  • 31. wup(miter saw, sander) = (2*2)/(4+3) = .57 November 25, 2013 MICAI-2013 Tutorial 31
  • 32. wup (hammer, power tool) = (2*1)/(2+3) = .4 November 25, 2013 MICAI-2013 Tutorial 32
  • 33. ? ● Wu and Palmer reports that sander and miter saw (.57) are more similar than are power tool and hammer (.4) ● Path reports that sander and miter saw (.25) are exactly as similar as power tool and hammer (.25) ● Note that the measures are scaled differently, so compare relative rankings between measures (and not exact scores)
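As a rough illustration of the Wu and Palmer computation, the sketch below measures depth in nodes from the root (root depth = 1), which matches the worked numbers on the slides. The small hierarchy is an assumption reconstructed from those numbers, and the node names "tool", "hand tool", and "saw" are hypothetical.

```python
# Toy is-a hierarchy (child -> parent); an assumed reconstruction, not WordNet.
PARENT = {
    "power tool": "tool",
    "hand tool": "tool",
    "saw": "power tool",
    "sander": "power tool",
    "miter saw": "saw",
    "hammer": "hand tool",
}


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Nodes on the is-a path from the root to the concept (root = 1)."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    shared = set(ancestors(b))
    for node in ancestors(a):  # walks upward, so the first hit is deepest
        if node in shared:
            return node
    raise ValueError("concepts do not share an ancestor")


def wup(a, b):
    """wup(a,b) = 2 * depth(LCS(a,b)) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(lcs("miter saw", "sander"), round(wup("miter saw", "sander"), 2))
print(lcs("hammer", "power tool"), round(wup("hammer", "power tool"), 2))
```

Unlike the path measure, the score now depends on how deep the shared ancestor sits, so the two pairs are no longer tied.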
  • 34. Information Content ● ic(concept) = -log p(concept) [Resnik 1995] – Need to count concepts – Term frequency + Inherited frequency – p(concept) = (tf + if) / N ● Depth shows specificity but not frequency ● Low frequency concepts often much more specific than high frequency ones
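The counting scheme above (term frequency plus frequency inherited from descendants) can be sketched as follows. All counts here are invented for illustration, the hierarchy is hypothetical, and the natural logarithm is an assumption, since the slides do not state the base.

```python
import math

# Hypothetical is-a hierarchy (child -> parent) and invented corpus counts.
PARENT = {
    "power tool": "tool",
    "hand tool": "tool",
    "saw": "power tool",
    "sander": "power tool",
    "hammer": "hand tool",
}
TERM_FREQ = {
    "tool": 50, "power tool": 20, "hand tool": 30,
    "saw": 40, "sander": 10, "hammer": 50,
}


def propagated_counts(term_freq, parent):
    """f = tf + if: each concept's own count plus counts of all descendants."""
    total = dict(term_freq)
    for concept, count in term_freq.items():
        node = concept
        while node in parent:  # push this count up to every ancestor
            node = parent[node]
            total[node] = total.get(node, 0) + count
    return total


def information_content(term_freq, parent):
    """IC(c) = -log(f(c) / N), where N is the total number of counted terms."""
    counts = propagated_counts(term_freq, parent)
    n = sum(term_freq.values())
    return {c: -math.log(f / n) for c, f in counts.items()}


ic = information_content(TERM_FREQ, PARENT)
for concept in ("tool", "power tool", "sander"):
    print(concept, round(ic[concept], 2))
```

The root inherits every count, so its probability is 1 and its information content is 0; rarer, more specific concepts get higher values.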
  • 35. Information Content term frequency (tf) November 25, 2013 MICAI-2013 Tutorial 35
  • 36. Information Content inherited frequency (if) November 25, 2013 MICAI-2013 Tutorial 36
  • 37. Information Content: IC = -log (f/N), final count (f = tf + if, N = 365,820)
  • 38. Information Content: IC = -log (f/N), final count (f = tf + if, N = 365,820)
  • 39. Lin, 1998 ● lin(a,b) = 2 * IC (LCS (a,b)) / (IC (a) + IC (b)) ● Look familiar?
  • 40. Wu & Palmer, 1994 ● wup(a,b) = 2 * depth (LCS (a,b)) / (depth (a) + depth (b)) ● wup and lin are identical except that lin uses info content instead of depth – Info content provides a measure of depth (based on specificity)
  • 41. lin (miter saw, sander) = 2 * 2.26 / (3.60 + 3.66) = 0.62 November 25, 2013 MICAI-2013 Tutorial 41
  • 42. lin (hammer, power tool) = 2 * 0.71 / (2.26+2.81) = 0.28 November 25, 2013 MICAI-2013 Tutorial 42
  • 43. ? ● Lin : miter saw and sander (.62) more similar than hammer and power tool (.28) ● Wu and Palmer : miter saw and sander (.57) more similar than hammer and power tool (.4) ● Path : miter saw and sander (.25) equally similar to hammer and power tool (.25)
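The Lin scores quoted above can be checked with a short sketch that plugs in the information content values shown on the slides. The `res` and `jcn` functions follow the forms usually given for the Resnik and Jiang-Conrath measures; the slides name those measures but do not spell out their formulas, so treat them as assumptions here.

```python
def lin(ic_a, ic_b, ic_lcs):
    """lin(a,b) = 2 * IC(LCS(a,b)) / (IC(a) + IC(b))."""
    return 2.0 * ic_lcs / (ic_a + ic_b)


def res(ic_lcs):
    """Resnik similarity as usually stated: the IC of the LCS itself."""
    return ic_lcs


def jcn(ic_a, ic_b, ic_lcs):
    """Jiang-Conrath as usually stated: inverse of the IC-based distance."""
    return 1.0 / (ic_a + ic_b - 2.0 * ic_lcs)


# IC values copied from the slides: (IC(a), IC(b), IC(LCS)).
SAW_SANDER = (3.60, 3.66, 2.26)
HAMMER_POWER_TOOL = (2.26, 2.81, 0.71)

print("lin(miter saw, sander)   =", round(lin(*SAW_SANDER), 2))
print("lin(hammer, power tool)  =", round(lin(*HAMMER_POWER_TOOL), 2))
print("jcn(miter saw, sander)   =", round(jcn(*SAW_SANDER), 3))
```

Passing the three IC values explicitly keeps the sketch independent of any particular hierarchy or corpus.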
  • 44. What about concepts not connected via is-a relations? ● Connected via other relations? – Part-of, treatment-of, causes, etc. ● Not connected at all? – In different sections (axes) of an ontology – In different ontologies entirely ● Relatedness! – Use definition information – No is-a relations so can't be similarity
  • 45. Measures of relatedness ● Path based – Hirst & St-Onge, 1998 (hso) ● Definition based – Lesk, 1986 – Adapted lesk (lesk), Banerjee & Pedersen, 2003 ● Definition + corpus – Gloss Vector (vector, vector_pairs), Patwardhan & Pedersen, 2006
  • 46. Path based relatedness ● Ontologies include relations other than is-a ● These can be used to find shortest paths between concepts – However, a path made up of different kinds of relations can lead to big semantic jumps – A hammer is used to drive nails which are made of iron which comes from mines in Minnesota (... so hammer and Minnesota are related ??)
  • 47. Measuring relatedness with definitions ● Related concepts defined using many of the same terms ● But, definitions are short, inconsistent ● Concepts don't need to be connected via relations or paths to measure them – Lesk, 1986 – Adapted Lesk, Banerjee & Pedersen, 2003
  • 48. Two separate ontologies... November 25, 2013 MICAI-2013 Tutorial 48
  • 49. Could join them together … ? November 25, 2013 MICAI-2013 Tutorial 49
  • 50. Each concept has definition November 25, 2013 MICAI-2013 Tutorial 50
  • 51. Each concept has definition November 25, 2013 MICAI-2013 Tutorial 51
  • 52. Each concept has definition November 25, 2013 MICAI-2013 Tutorial 52
  • 53. Overlaps ● Claw hammer and carpenter – Related by working with wood (can't see this in structure of is-a hierarchies; claw hammer and iron worker just as similar) ● Ball peen hammer and claw hammer – Reflects structure of is-a hierarchies – If you start with text like this maybe you can build is-a hierarchies automatically! (Another tutorial...)
  • 54. Lesk and Adapted Lesk ● Lesk, 1986 : measure overlaps in definitions to assign senses to words – The more overlaps between two senses (concepts), the more related ● Banerjee & Pedersen, 2003, Adapted Lesk – Augment definition of each concept with definitions of related concepts (build a super gloss) – Increase chance of finding overlaps
  • 55. The problem with definitions ... ● Definitions contain variations of terminology that make it impossible to find exact overlaps ● spatula : an instrument for spreading material ● spreader : a hand tool for smoothing compounds ● No matches??! How can we see that “hand tool” and “instrument” are similar, as are “spreading material” and “smoothing compound” ? November 25, 2013 MICAI-2013 Tutorial 55
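The "no matches" problem can be seen with a very small overlap counter applied to the two definitions quoted above. This is a simplified sketch: it counts shared content words only, the stopword list is an assumption, and it does not implement the phrase-level overlap scoring of Adapted Lesk.

```python
# A tiny stopword list; an assumption made for this sketch.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and"}


def content_words(gloss):
    """Lowercased, whitespace-split words of a gloss minus stopwords."""
    return {word for word in gloss.lower().split() if word not in STOPWORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Number of content words the two glosses have in common."""
    return len(content_words(gloss_a) & content_words(gloss_b))


# Definitions taken from the slide.
SPATULA = "an instrument for spreading material"
SPREADER = "a hand tool for smoothing compounds"

print(sorted(content_words(SPATULA)))
print(sorted(content_words(SPREADER)))
print("overlap:", gloss_overlap(SPATULA, SPREADER))
```

Exact string matching finds nothing in common between the two glosses even though the concepts are clearly related, which motivates the fuzzier vector-based matching.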
  • 56. Gloss Vector Measure of Semantic Relatedness ● Rely on co-occurrences of terms – Terms that occur within some given number of terms of each other ● Allows for a fuzzier notion of matching ● Exploits second order co-occurrences – Friend of a friend relation
  • 57. Gloss Vector Measure of Semantic Relatedness ● Friend of a friend relation – Suppose hand tool and instrument don't occur in text with each other. But, suppose that “repair” occurs with each. – Hand tool and instrument are second order co-occurrences via “repair” November 25, 2013 MICAI-2013 Tutorial 57
  • 58. Gloss Vector Measure of Semantic Relatedness ● Replace words or terms in definitions with vector of co-occurrences observed in corpus ● Defined concept now represented by an averaged vector of co-occurrences ● Measure relatedness of concepts via cosine between their respective vectors ● Patwardhan and Pedersen, 2006 – Inspired by Schütze, 1998
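The three steps above (replace each definition word with its co-occurrence vector, average, compare by cosine) can be sketched as below. The co-occurrence counts and the three context dimensions are invented for illustration; a real implementation would derive them from a corpus.

```python
import math

# Invented first-order co-occurrence counts. Each vector holds how often the
# word co-occurs with the (hypothetical) context terms: repair, paint, surface.
COOCCURRENCE = {
    "instrument": [3, 1, 0],
    "spreading":  [0, 4, 1],
    "material":   [1, 2, 2],
    "hand":       [2, 0, 1],
    "tool":       [4, 1, 0],
    "smoothing":  [0, 3, 1],
    "compounds":  [0, 2, 3],
}


def gloss_vector(words):
    """Average the co-occurrence vectors of the words in a definition."""
    vectors = [COOCCURRENCE[w] for w in words if w in COOCCURRENCE]
    size = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


spatula = gloss_vector(["instrument", "spreading", "material"])
spreader = gloss_vector(["hand", "tool", "smoothing", "compounds"])
print("relatedness:", cosine(spatula, spreader))
```

The two definitions share no words, yet their averaged vectors point in similar directions because their words co-occur with the same context terms, which is the friend-of-a-friend effect described on the previous slides.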
  • 59. Experimental Results ● Vector > Lesk > Info Content > Depth > Path – Clear trend across various studies ● Big differences in intrinsic evaluations (Vector > Lesk >> Info Content > Depth > Path) – Banerjee and Pedersen, 2003 (IJCAI) – Pedersen, et al. 2007 (JBI) ● Smaller differences in extrinsic evaluations – Human raters mix up similarity & relatedness?
  • 61. References ● S. Banerjee and T. Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805-810, Acapulco, August 2003. (lesk) ● J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, pages 19-33, Taiwan, 1997. (jcn) ● C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In C. Fellbaum, editor, WordNet: An electronic lexical database, pages 265-283. MIT Press, 1998. (lch)
  • 62. References ● M.E. Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation, pages 24-26. ACM Press, 1986. ● D. Lin. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning, Madison, August 1998. (lin) ● S. Patwardhan and T. Pedersen. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts. In Proceedings of the EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together, pages 1-8, Trento, Italy, April 2006. (vector, vector_pairs)
  • 63. References ● R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1):17-30, 1989. (path) ● P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal, August 1995. (res) ● H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97-123, 1998.
  • 64. Using Open Source Software November 25, 2013 MICAI-2013 Tutorial 64
  • 65. Using Open Source Software ● Packages providing the “standard” measures ● Implementations of specific measures ● Overview of WordNet::Similarity usage November 25, 2013 MICAI-2013 Tutorial 65
  • 66. Software that provides some of the “standard” measures November 25, 2013 MICAI-2013 Tutorial 66
  • 67. WordNet::Similarity ● Similarity and Relatedness for WordNet – http://wordnet.princeton.edu ● Written in Perl (starting in 2002) ● Offers command line, web interface, and API – http://wn-similarity.sourceforge.net ● We'll come back to this for some examples
  • 68. ws4j ● Java Re-implementation of WordNet::Similarity – https://code.google.com/p/ws4j/ ● Includes path, depth, info content, hso, and lesk measures ● Online demo – http://ws4jdemo.appspot.com/
  • 69. NLTK ● Natural Language Toolkit – Includes path, depth, and information content measures ● Written in Python – General purpose NLP toolkit (parsers, part of speech taggers, and more) – http://nltk.org/
  • 70. DKPro Similarity ● Semantic similarity using vector space models like LSA and ESA, and also WordNet – Implemented using UIMA – https://code.google.com/p/dkpro-similarity-asl/ ● Part of the much larger DKPro project, which provides UIMA wrappers for many existing tools and models ● Supports measuring similarity of short texts and concept pairs
  • 71. Semilar ● Semantic similarity using WordNet and LSA – http://semanticsimilarity.org ● Supports measuring similarity of short texts and concept pairs ● Provides many pre-built models using LSA ● Includes a web service and API in addition to downloadable libraries
  • 72. UMLS::Similarity ● Ports WordNet::Similarity to the UMLS – Unified Medical Language System from NLM, a data warehouse of medical sources (freely available, license required) – http://umls-similarity.sourceforge.net ● Perl and MySQL
  • 73. ProteInOn ● Computes Semantic Similarity for the Gene Ontology (GO) using path and information content measures – http://geneontology.org/ ● Protein Interactions and Ontology – http://lasige.di.fc.ul.pt/webtools/proteinon/
  • 74. Measure Specific Software November 25, 2013 MICAI-2013 Tutorial 74
  • 75. UKB ● Graph based similarity and relatedness measures, using WordNet – http://ixa2.si.ehu.es/ukb/ ● Applies Personalized PageRank to semantic similarity and relatedness measures, as well as to word sense disambiguation
  • 76. WMFVEC ● High dimensional vector approach using definitions from WordNet and Wiktionary – http://www.cs.columbia.edu/~weiwei/code.html#wmfvec ● Supports similarity measurements of short texts and concept pairs
  • 77. olesk ● Shortest path in weighted semantic network – http://olesk.com/#SemanticRelatedness ● Supports measuring similarity of short texts and concept pairs
  • 78. Illinois WNSim ● WordNet-based Similarity Metric – https://cogcomp.cs.illinois.edu/page/software_view/Illinois%20WNSim – Also provides Java version – https://cogcomp.cs.illinois.edu/page/software_view/Illinois%20WNSim%20(Java) ● Measures similarity of short texts, provides support for similarity of named entities
  • 79. WordNet::Similarity Usage November 25, 2013 MICAI-2013 Tutorial 79
  • 80. WordNet::Similarity ● Install WordNet (http://wordnet.princeton.edu) – Make sure to set $WNHOME ● Install WordNet::QueryData – cpan – > install WordNet::QueryData ● Install WordNet::Similarity – cpan – > install WordNet::Similarity
  • 82. Command line ● similarity.pl --type WordNet::Similarity::path dog cat – dog#n#1 cat#n#1 0.2 ● similarity.pl --type WordNet::Similarity::path dog#n cat#n – dog#n#1 cat#n#1 0.2 ● similarity.pl --type WordNet::Similarity::path dog#n#2 cat#n#3 – dog#n#2 cat#n#3 0.125 ● similarity.pl --type WordNet::Similarity::path dog cat#n#3 – dog#n#3 cat#n#3 0.142857142857143
  • 83. Command line ● similarity.pl --type WordNet::Similarity::path dog cat --allsenses – dog#n#1 cat#n#1 0.2 – dog#n#3 cat#n#2 0.2 – dog#n#1 cat#n#7 0.2 – dog#n#7 cat#n#5 0.166666666666667 – dog#n#4 cat#n#2 0.142857142857143 – dog#n#3 cat#n#3 0.142857142857143 – dog#v#1 cat#v#2 0.142857142857143 – dog#n#4 cat#n#3 0.142857142857143 – dog#n#6 cat#n#5 0.142857142857143 – dog#v#1 cat#v#1 0.142857142857143 – dog#n#2 cat#n#2 0.125 – Etc...
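When the command-line output above is collected for an experiment, it usually needs to be parsed into word, part of speech, sense number, and score. The following is a small sketch that assumes each output line has exactly the three whitespace-separated fields shown on the slides; the helper names are hypothetical and are not part of WordNet::Similarity.

```python
def parse_wps(wps):
    """Split a word#pos#sense string such as 'dog#n#1' into its parts."""
    word, pos, sense = wps.split("#")
    return word, pos, int(sense)


def parse_line(line):
    """Parse one output line of the form 'dog#n#1 cat#n#1 0.2'."""
    first, second, score = line.split()
    return parse_wps(first), parse_wps(second), float(score)


# Sample lines copied from the slides.
OUTPUT = """dog#n#1 cat#n#1 0.2
dog#n#2 cat#n#3 0.125
dog#n#3 cat#n#3 0.142857142857143"""

pairs = [parse_line(line) for line in OUTPUT.splitlines()]
pairs.sort(key=lambda item: item[2], reverse=True)  # highest score first

for first, second, score in pairs:
    print(first, second, score)
```

Sorting by score mirrors the ordering that the `--allsenses` option displays.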
  • 84. WordNet senses ● wn cat -over ● Overview of noun cat ● The noun cat has 8 senses (first 1 from tagged texts) ● 1. (18) cat, true cat -- (feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats) ● 2. guy, cat, hombre, bozo -- (an informal term for a youth or man; "a nice guy"; "the guy's only doing it for some doll") ● 3. cat -- (a spiteful woman gossip; "what a cat she is!") ● 4. kat, khat, qat, quat, cat, Arabian tea, African tea -- (the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant; "in Yemen kat is used daily by 85% of adults")
  • 85. wn cat -over ● 5. cat-o'-nine-tails, cat -- (a whip with nine knotted cords; "British sailors feared the cat") ● 6. Caterpillar, cat -- (a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work) ● 7. big cat, cat -- (any of several large cats typically able to roar and living in the wild) ● 8. computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography, CAT -- (a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis)
  • 86. wn cat -over ● Overview of verb cat ● The verb cat has 2 senses (no senses from tagged texts) ● 1. cat -- (beat with a cat-o'-nine-tails) ● 2. vomit, vomit up, purge, cast, sick, cat, be sick, disgorge, regorge, retch, puke, barf, spew, spue, chuck, upchuck, honk, regurgitate, throw up -- (eject the contents of the stomach through the mouth; "After drinking too much, the students vomited"; "He purged continuously"; "The patient regurgitated the food we gave him last night") November 25, 2013 MICAI-2013 Tutorial 86
  • 87. Measure type (--type WordNet::Similarity::X) ● Similarity – Shortest Path (path) – Depth (wup, lch) – Information Content (res, lin, jcn)
  • 88. Measure type (--type WordNet::Similarity::X) ● Relatedness – Definition (lesk) – Definition + Corpus (vector, vector_pairs) – Path finding (hso)
  • 89. Measure type (--type WordNet::Similarity::X) ● Random – rand – Very useful for experimental comparisons (how do your results compare with a random measure of similarity or relatedness?)
  • 90. Similarity measures don't cross part of speech tags ● similarity.pl --type WordNet::Similarity::path dog#n cat#v – Warning (WordNet::Similarity::path::parseWps()) - dog#n and cat#v belong to different parts of speech. – dog#n#2 cat#v#1 -1000000 November 25, 2013 MICAI-2013 Tutorial 90
  • 91. Relatedness measures cross POS boundaries ● similarity.pl --type WordNet::Similarity::lesk dog#n#1 cat#v#1 – dog#n#1 cat#v#1 20 ● similarity.pl --type WordNet::Similarity::vector dog#n#1 cat#v#1 – dog#n#1 cat#v#1 0.146581307649965
  • 92. API ● use WordNet::Similarity::wup; ● use WordNet::QueryData; ● my $wn = WordNet::QueryData->new(); ● my $wup = WordNet::Similarity::wup->new($wn); ● my $value = $wup->getRelatedness('dog#n#1', 'cat#n#1'); ● my ($error, $errorString) = $wup->getError(); ● die $errorString if $error; ● print "dog (sense 1) <-> cat (sense 1) = $value\n"; ● Output: dog (sense 1) <-> cat (sense 1) = 0.866666666666667
  • 93. API ● use WordNet::QueryData; ● my $wn = WordNet::QueryData->new; ● use WordNet::Similarity::PathFinder; ● my $obj = WordNet::Similarity::PathFinder->new ($wn); ● my $wps1 = 'winston_churchill#n#1'; ● my $wps2 = 'england#n#1'; ● my @paths = $obj->getShortestPath($wps1, $wps2, 'n', 'wps'); ● my ($length, $path) = @{shift @paths}; ● defined $path or die "No path between synsets"; ● print "shortest path between $wps1 and $wps2 is $length edges long\n"; ● print "@$path\n"; ● Output: shortest path between winston_churchill#n#1 and england#n#1 is 14 edges long ● winston_churchill#n#1 writer#n#1 communicator#n#1 person#n#1 causal_agent#n#1 physical_entity#n#1 object#n#1 location#n#1 region#n#3 district#n#1 administrative_district#n#1 country#n#2 European_country#n#1 england#n#1
  • 94. Web Interface November 25, 2013 MICAI-2013 Tutorial 94
  • 95. Web Interface November 25, 2013 MICAI-2013 Tutorial 95
  • 96. Web Interface November 25, 2013 MICAI-2013 Tutorial 96
  • 97. Web Interface November 25, 2013 MICAI-2013 Tutorial 97
  • 98. Web Interface November 25, 2013 MICAI-2013 Tutorial 98
  • 99. Web Interface November 25, 2013 MICAI-2013 Tutorial 99
  • 100. Web Interface ● If you like the web interface, you can run your own version! – similarity_server.pl – All necessary html and cgi files included November 25, 2013 MICAI-2013 Tutorial 100
  • 101. Other Utilities ● Build new information content files – by default counts come from SemCor – BNCFreq.pl – brownFreq.pl – treebankFreq.pl – rawtextFreq.pl ● compounds.pl – list all WordNet compounds ● wnDepths.pl – list all WordNet depths November 25, 2013 MICAI-2013 Tutorial 101
  • 103. References ● Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca and Aitor Soroa. 2009. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Proceedings of NAACL-HLT 09. Boulder, USA. (ukb) ● Daniel Bär, Torsten Zesch, and Iryna Gurevych. DKPro Similarity: An Open Source Framework for Text Similarity, in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 121-126, August 2013, Sofia, Bulgaria. (dkpro-similarity) ● Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. (nltk) ● Q. Do, D. Roth, M. Sammons, Y. Tu, and V. Vydiswaran. Robust, Light-weight Approaches to compute Lexical Similarity. Computer Science Research and Technical Reports, University of Illinois (2009). (Illinois WNSim)
  • 104. References ● Weiwei Guo and Mona Diab. "Improving Lexical Semantics for Sentential Semantics: Modeling Selectional Preference and Similar Words in a Latent Variable Model". In Proceedings of NAACL, 2013, Atlanta, Georgia, USA. (wmfvec) ● Bridget McInnes, Ted Pedersen, and Serguei Pakhomov, UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity - Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 14-18, 2009, pp. 431-435, San Francisco, CA (umls-similarity) ● Ted Pedersen, Siddharth Patwardhan and Jason Michelizzi, WordNet::Similarity - Measuring the Relatedness of Concepts - Appears in the Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), pp. 38-41, May 3-5, 2004, Boston, MA. (wordnet-similarity)
  • 105. References ● Rus, V., Lintean, M., Banjade, R., Niraula, N., and Stefanescu, D. (2013). SEMILAR: The Semantic Similarity Toolkit. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, Sofia, Bulgaria. (semilar) ● Reda Siblini and Leila Kosseim (2013). Using a Weighted Semantic Network for Lexical Semantic Relatedness. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2013), September, Hissar, Bulgaria. (olesk)
  • 106. Similarity and Relatedness in the Wild : How do we know it's working? November 25, 2013 MICAI-2013 Tutorial 106
  • 107. Intrinsic Evaluation ● Develop your own measure ● Score it using pairs for which human reference standard is available ● Compare correlation between your measure and established measures – Spearman's rank correlation often used – rank.pl in Ngram Statistics Package (http://ngram.sourceforge.net)
  • 108. Intrinsic Evaluation ● Replication proves to be very difficult! ● Many factors, see ACL 2013 paper – Offspring from Reproduction Problems: What Replication Failure Teaches Us (Fokkens, van Erp, Postma, Pedersen, Vossen, and Freire) - Appears in the Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, pp. 1691-1701, Sofia, Bulgaria. – http://aclweb.org/anthology//P/P13/P13-1166.pdf November 25, 2013 MICAI-2013 Tutorial 108
  • 109. Reference Standards ● Rubenstein and Goodenough, 1965 – 65 pairs – Assessed by 50 undergraduate students – http://www.d.umn.edu/~tpederse/Data/rubenstein-goodenough-1965.txt ● Miller and Charles, 1991 – 30 pair subset of R&G – Re-assessed by 38 undergraduate students – http://www.d.umn.edu/~tpederse/Data/millercharles-1991.txt
  • 110. Rubenstein and Goodenough pairs, from no similarity (0.0) to synonyms (4.0) ● Rubenstein & Goodenough – 3.94 gem jewel – 3.92 automobile car – 2.41 brother lad – 2.37 crane implement – 0.04 rooster voyage – 0.04 noon string ● Miller & Charles – 3.92 car automobile – 3.84 gem jewel – 1.66 lad brother – 1.68 crane implement – 0.08 rooster voyage – 0.08 noon string
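Spearman's rank correlation, mentioned earlier for intrinsic evaluation, can be sketched in plain Python using the six pairs listed on this slide. Here the two human ratings (Rubenstein & Goodenough and Miller & Charles) are compared with each other; in an actual evaluation one of the lists would hold the scores of the measure being tested. Tied values receive averaged ranks, which is the usual convention and is an assumption about how a given tool handles ties.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of the tied positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Pair order: gem-jewel, car-automobile, brother-lad, crane-implement,
# rooster-voyage, noon-string (values copied from the slide).
RG = [3.94, 3.92, 2.41, 2.37, 0.04, 0.04]
MC = [3.84, 3.92, 1.66, 1.68, 0.08, 0.08]

print("Spearman rho:", spearman(RG, MC))
```

Because only ranks matter, two measures scaled very differently can still be compared directly against the same reference standard.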
  • 111. Reference Standards ● WordSim-353, 2002 – 153 pairs assessed by 13 subjects – 200 pairs assessed by 16 subjects – Includes the Miller and Charles pairs (reassessed) – http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
  • 112. Reference Standards ● Yang and Powers, 2006 ● 130 verb pairs – Assessed by 2 academic staff and 4 graduate students – How related in meaning is the pair? (0 for not at all, 4 for inseparably related) – http://david.wardpowers.info/Research/AI/papers/200601-GWC-130verbpairs.txt
  • 113. Reference standards ● Mturk 771, 2012 ● 771 word pairs scored for relatedness by Mechanical Turkers ● At least 20 judgements per pair – 1 for not related, 5 for highly related – 50 ratings per Turker – http://www2.mta.ac.il/~gideon/mturk771.html
  • 114. Reference Standards ● MWE-300, 2012 – 300 pairs where 216 are multi-word expressions and 84 are word pairs – Assessed by 5 native speakers on scale of 0 to 1 – http://adapt.seiee.sjtu.edu.cn/similarity/
  • 115. Reference Standards ● Rel-122, 2013 ● Relatedness scores for 122 noun pairs, created at University of Central Florida – Each pair assessed by at least 20 undergraduate students – 0 for completely unrelated, 4 for strongly related – http://www.cs.ucf.edu/~seansz/rel-122/
  • 116. Reference standards ● MayoSRS, 2007 – 101 pairs of medical concepts – Assessed by 13 medical coders and 3 physicians, all from Mayo Clinic (1 for not at all related, 4 for nearly synonymous) – MiniMayoSRS : a highly reliable subset of 29 pairs ● http://rxinformatics.umn.edu/SemanticRelatednessResources.html
  • 117. Reference standards ● UMNSRS, 2010 – 566 pairs of medical concepts assessed for similarity by 8 medical students / residents – 587 pairs of medical concepts assessed for relatedness by 8 medical students / residents ● Assessed on a continuous scale (0 – 1500) ● http://rxinformatics.umn.edu/SemanticRelatednessResources.html
  • 118. Reference Standards ● Lexical & Distributional Semantics Evaluation Benchmarks, maintained by Manaal Faruqui – http://www.cs.cmu.edu/~mfaruqui/suite.html ● ACL Wiki (various datasets for related tasks) – http://aclweb.org/aclwiki/index.php?title=Similarity_(State_of_the_art) – http://aclweb.org/aclwiki/index.php?title=Knowledge_collections_and_datasets_(English) ● SemEval (many related tasks with data) – http://aclweb.org/aclwiki/index.php?title=SemEval_Portal
  • 119. Extrinsic Evaluation ● Synonym Tests ● Word Sense Disambiguation ● Semantic Textual Similarity ● Recognizing Textual Entailment November 25, 2013 MICAI-2013 Tutorial 119
  • 120. ESL Synonym Tests ● Provide one target word in context ● Select “closest” synonym from a list of 4 ● Used in previous versions of TOEFL and other standardized tests ● http://aclweb.org/aclwiki/index.php?title=ESL_Synonym_Questions_(State_of_the_art) ● 50 question data set available from Peter Turney – http://www.apperceptual.com/
  • 121. ESL Synonym Tests ● Stem: "A rusty nail is not as strong as a clean, new one." ● Choices: – (a) corroded – (b) black – (c) dirty – (d) painted
  • 122. ESL Synonym Tests ● similarity.pl --type WordNet::Similarity::vector --file rusty.txt ● rusty#a#1 dirty#a#1 0.175879883782967 ● rusty#a#1 painted#a#1 0.0844532311114079 ● rusty#a#1 black#a#2 0.0656019491836669 ● rusty#a#1 corroded#a#1 0.0357324093083641 – :(
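Answering a synonym question with a relatedness measure amounts to choosing the candidate with the highest score. The sketch below uses the vector scores shown on this slide and takes "corroded" as the answer key for "rusty"; the helper names are hypothetical.

```python
# Vector relatedness scores between rusty#a#1 and each choice (from the slide).
SCORES = {
    "dirty": 0.175879883782967,
    "painted": 0.0844532311114079,
    "black": 0.0656019491836669,
    "corroded": 0.0357324093083641,
}
ANSWER_KEY = "corroded"  # the intended synonym of "rusty"


def pick_synonym(scores):
    """Return the candidate with the highest relatedness score."""
    return max(scores, key=scores.get)


def rank_of(choice, scores):
    """1-based position of a choice when candidates are sorted by score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(choice) + 1


guess = pick_synonym(SCORES)
print("guess:", guess, "| correct:", guess == ANSWER_KEY)
print("rank of the correct answer:", rank_of(ANSWER_KEY, SCORES))
```

On this question the measure prefers "dirty", which explains the unhappy face on the slide; accuracy over a full question set is the proportion of questions where the guess matches the key.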
  • 123. TOEFL Synonym Tests ● Rusty and other words are adjectives ● Must use a relatedness measure – lesk – vector – vector_pairs – hso ● Should do word sense disambiguation first
  • 124. Word Sense Disambiguation ● The meanings of words that occur together in a context will likely be related – If a word has multiple senses, it will most likely be used in the sense that is most related to the senses of its neighbors – Relatedness seems to matter more than similarity, unless you have a list (I have a horse, a cat and a cow at my farm.)
  • 125. Word Sense Disambiguation ● SenseRelate Hypothesis : Most words in text will have multiple possible senses and will often be used with the sense most related to those of surrounding words – He either has a cold or the flu (cold not likely to mean air temperature)
  • 126. SenseRelate ● In coherent text words will be used in similar or related senses, and these will also be related to the overall topic or mood of a text ● First applied to WSD in 2002 – Banerjee and Pedersen, 2002 (WordNet) – Patwardhan et al., 2003 (WordNet) – Pedersen and Kolhatkar 2009 (WordNet) – McInnes et al., 2011 (UMLS)
  • 127. Implementations ● WordNet::SenseRelate – AllWords, TargetWord, WordToSet – http://senserelate.sourceforge.net ● Includes command line, API, and web interface ● UMLS::SenseRelate – AllWords – http://search.cpan.org/dist/UMLS-SenseRelate/
  • 128. Web Interface
  • 129. Web Interface
  • 130. SenseRelate for WSD ● Assign each word the sense which is most similar or related to one or more of its neighbors – Pairwise – 2 or more neighbors ● Pairwise algorithm results in a trellis much like in HMMs – More neighbors add a lot of information and a lot of computational complexity
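The idea of picking the sense most related to the neighbors can be sketched as follows. This is a minimal Python illustration, not the WordNet::SenseRelate implementation: the sense labels, the `TOY_RELATEDNESS` table, and the functions `relatedness` and `disambiguate` are all hypothetical, and the scores are made-up toy values rather than output from any real measure.

```python
# Hypothetical, symmetric relatedness scores between sense pairs (toy values).
TOY_RELATEDNESS = {
    frozenset(["cold#illness", "flu#illness"]): 0.9,
    frozenset(["cold#temperature", "flu#illness"]): 0.1,
}


def relatedness(sense_a, sense_b):
    """Look up a toy relatedness score; unknown pairs score 0.0."""
    return TOY_RELATEDNESS.get(frozenset([sense_a, sense_b]), 0.0)


def disambiguate(target_senses, neighbors):
    """Pick the target sense with the highest total relatedness.

    `neighbors` is a list with one list of candidate senses per neighboring
    word. For each neighbor, only its best-matching sense is counted.
    """
    def score(sense):
        return sum(
            max((relatedness(sense, n) for n in neighbor_senses), default=0.0)
            for neighbor_senses in neighbors
        )

    return max(target_senses, key=score)


# "He either has a cold or the flu": one neighbor (flu) with one sense.
best = disambiguate(["cold#temperature", "cold#illness"], [["flu#illness"]])
print(best)
```

Each extra neighbor adds another inner loop over its senses, which is where the added computational cost mentioned on the slide comes from.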
  • 131. SenseRelate - pairwise
  • 132. SenseRelate – 2 neighbors
  • 133. General Observations on WSD Results ● Nouns more accurate; verbs, adjectives, and adverbs less so ● Increasing the window size nearly always improves performance ● Jiang-Conrath measure often a high performer for nouns (e.g., Patwardhan et al. 2003) ● Vector and lesk have coverage advantage – handle mixed pairs while others don't
  • 134. SenseRelate Sentiment Classification ● The underlying sentiment of a text can be discovered by determining which emotion is most related to the words in that text. – Similar to happy?: joyful, ecstatic, ... – Related to happy?: love, food, success, ... ● Pairwise comparisons between emotion and senses of words in context – Same form as Naive Bayesian model – WordNet::SenseRelate::WordToSet
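A rough sketch of this word-to-set style of classification is given below. It is an illustration in Python under stated assumptions, not the WordNet::SenseRelate::WordToSet API: the `TOY_SCORES` table and the functions `relatedness` and `classify` are hypothetical, the numbers are invented, and summing the pairwise scores per emotion is one simple way to combine them that is assumed here.

```python
# Hypothetical (emotion, word) relatedness values; toy numbers only.
TOY_SCORES = {
    ("happy", "love"): 0.8,
    ("happy", "success"): 0.7,
    ("sad", "love"): 0.3,
    ("sad", "loss"): 0.9,
}


def relatedness(emotion, word):
    """Look up a toy relatedness score; unknown pairs score 0.0."""
    return TOY_SCORES.get((emotion, word), 0.0)


def classify(words, emotions):
    """Return the emotion with the highest summed relatedness to the words."""
    totals = {e: sum(relatedness(e, w) for w in words) for e in emotions}
    return max(totals, key=totals.get), totals


label, totals = classify(["love", "success"], ["happy", "sad"])
print(label, totals)
```

The structure, one score per class accumulated over the words of the text followed by an argmax, is what makes it resemble a Naive Bayesian classifier in form.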
  • 135. SenseRelate - WordToSet
  • 136. Experimental Results ● Sentiment classification results in the 2011 i2b2 suicide notes challenge were disappointing (Pedersen, 2012) – Suicide notes not very emotional! – In many cases they reflect a decision made and focus on settling affairs
  • 137. Semantic Textual Similarity (STS) ● How similar (semantically) are 2 texts? – The Senate Select Committee on Intelligence is preparing a blistering report on prewar intelligence on Iraq. – American intelligence leading up to the war on Iraq will be criticized by a powerful US Congressional committee due to report soon, officials said today ● http://www-nlp.stanford.edu/wiki/STS
  • 138. Semantic Textual Similarity (STS) ● Combined distributional and WordNet information to learn a model from training data – UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures, Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch, SemEval 2012 ● LSA Boosted with WordNet – UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems, Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield, and Johnathan Weese, *SEM 2013
  • 139. Recognizing Textual Entailment (RTE) ● A text entails a hypothesis if a human reading the text would infer that the hypothesis is true – Text: The Christian Science Monitor named a US journalist kidnapped in Iraq as freelancer Jill Carroll. – Hypothesis: Jill Carroll was abducted in Iraq. – Hypothesis: The Christian Science Monitor kidnapped a freelancer.
  • 140. RTE methods and data ● Long series of shared tasks – 2004 to present – http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool ● Recognizing that T and H are similar is helpful, although it does not really solve the problem – Hybrid approaches (like with STS)
  • 141. Applications ● Semantic similarity and relatedness are important components of many NLP applications – Crucial building blocks – Interesting to study in their own right
  • 142. Thank you! If you have any suggestions for content that should be added to or changed in this tutorial, please let me know! Any other comments are welcome too. tpederse@d.umn.edu Questions?
  • 143. References ● S. Banerjee and T. Pedersen. An adapted Lesk algorithm for word sense disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pages 136-145, Mexico City, February 2002. (wsd result) ● D. Faria, C. Pesquita, F. M. Couto, and A. Falcão. ProteInOn: A Web Tool for Protein Semantic Similarity. Technical Report, Department of Informatics, University of Lisbon, 2007. (proteinon) ● L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, and G. Wolfman (2002). Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1), 116-131. (wordsim-353) ● B. McInnes, T. Pedersen, Y. Liu, G. Melton, and S. Pakhomov. Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity. In Proceedings of the Annual Symposium of the American Medical Informatics Association, pages 895-904, Washington, DC, October 2011. (wsd result)
  • 144. References ● G. A. Miller and W. G. Charles (1991). Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1), 1-28. ● S. Pakhomov, B. McInnes, T. Adam, Y. Liu, T. Pedersen, and G. Melton. Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. In Proceedings of the Annual Symposium of the American Medical Informatics Association, pages 572-576, Washington, DC, November 13-17, 2010. (umnsrs) ● S. Patwardhan, S. Banerjee, and T. Pedersen. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pages 241-257, Mexico City, February 2003. (wsd result) ● S. Patwardhan and T. Pedersen. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts. In Proceedings of the EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together, pages 1-8, Trento, Italy, April 2006. (wsd result)
  • 145. References ● T. Pedersen and V. Kolhatkar. WordNet::SenseRelate::AllWords - a broad coverage word sense tagger that maximizes semantic relatedness. In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies 2009 Conference, pages 17-20, Boulder, CO, June 2009. (wsd result) ● T. Pedersen, S. Pakhomov, S. Patwardhan, and C. Chute. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics, 40(3):288-299, June 2007. (mayosrs) ● T. Pedersen. Rule-based and lightly supervised methods to predict emotions in suicide notes. Biomedical Informatics Insights, 2012:5 (Suppl. 1):185-193, January 2012. (sentiment result) ● H. Rubenstein and J. B. Goodenough (1965). Contextual Correlates of Synonymy. Communications of the ACM, 8(10), 627-633.
  • 146. References ● E. Agirre, D. Cer, M. Diab, and A. Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. SemEval 2012. (sts shared task) ● S. Szumlanski, F. Gomez, and V. K. Sims (2013). A New Set of Norms for Semantic Relatedness Measures. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 890-895, Sofia, Bulgaria. (rel-122) ● P. D. Turney (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), pp. 491-502, Freiburg, Germany. (toefl synonyms) ● W. Wu, H. Li, H. Wang, and K. Q. Zhu. Probase: a probabilistic taxonomy for text understanding. In Proceedings of SIGMOD'12, pages 481-492, 2012. (mwe-300) ● D. Yang and D. M. W. Powers (2006). Verb Similarity on the Taxonomy of WordNet. Proceedings of the Third International WordNet Conference (GWC-06), pp. 121-128, Jeju Island, Korea.
  • 147. The End!