STL: A Similarity Measure Based on Semantic, Terminological and Linguistic Information. Nitish Aggarwal, joint work with Tobias Wunner and Mihael Arcan. DERI, NUI Galway. firstname.lastname@deri.org. DERI Friday Meeting, Friday 19th Aug, 2011.
Overview. Motivation & Applications; Why STL? (Semantic, Terminology, Linguistic); Evaluation; Conclusion and future work.
Motivation & Applications: Semantic Annotation. Similarity between corpus data and ontology concepts. Example: the sentence “SAP AG held €1615 million in short-term liquid assets (2009)” is annotated with “dbpedia:SAP_AG”, “xEBR:LiquidAssets” at “dbpedia:year:2009”.
Motivation & Applications: Semantic Search. Similarity between query and index object. Example queries: “SAP liquid asset in 2010”, “Current asset of SAP last year”, “Net cash of SAP in 2010”, “SAP total amount received in 2010”. Index object: “dbpedia:SAP_AG” “xEBR:liquid asset” at “dbpedia:year:2010”.
Motivation & Applications: Ontology Matching & Alignment. Similarity between ontology concepts. [Diagram: two concept hierarchies side by side, with candidate concept pairs marked “Similarity = ?”. IFRS concepts shown: ifrs:StatementOfFinancialPosition, ifrs:Assets, ifrs:BiologicalAssets, ifrs:CurrentAssets, ifrs:NonCurrentAssets, ifrs:PropertyPlantAndEquipment, ifrs:CashAndCashEquivalents, ifrs:TradeAndOtherCurrentReceivables, ifrs:Inventories. xEBR concepts shown: xebr:KeyBalanceSheet, Assets, xebr:SubscribedCapitalUnpaid, xebr:FixedAssets, xebr:CurrentAssets, xebr:TangibleFixedAssets, xebr:IntangibleFixedAssets, xebr:Amount Receivable, xebr:Liquid Assets.]
Classical Approaches. String similarity: Levenshtein distance, Dice coefficient. Corpus-based: LSA, ESA, Google distance, vector-space model. Ontology-based: path distance, information content. Syntax similarity: word order, part of speech.
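As a point of reference for the string-similarity family named above, here is a minimal sketch of Levenshtein distance and a Dice coefficient. Computing Dice over word tokens (rather than character n-grams) is an assumption made for illustration; the talk does not say which variant was used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def dice(a: str, b: str) -> float:
    """Dice coefficient over sets of lower-cased word tokens (assumed variant)."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    print(levenshtein("tangible", "intangible"))
    print(dice("Tangible Fixed Assets", "Intangible Fixed Assets"))
```

Both measures look only at surface strings, which is why near-identical labels with opposite meanings can still score as highly similar.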
Why STL? Semantic: semantic structure and relations. Terminology: complex terms expressing the same concept. Linguistic: phrase and dependency structure.
STL Definition. A linear combination of semantic (S), terminological (T) and linguistic (L) similarity scores, with the weights obtained by linear regression. Formula used: STL = w1*S + w2*T + w3*L + Constant, where w1, w2 and w3 represent the contribution of each component.
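One way such weights could be obtained is ordinary least squares against human similarity judgements. The sketch below is only an illustration under that assumption: the function names are hypothetical, the regression procedure actually used in the talk is not specified, and the data in the usage example is synthetic.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_stl_weights(features: Sequence[Tuple[float, float, float]],
                    gold: Sequence[float]) -> Tuple[float, ...]:
    """Return (w1, w2, w3, constant) minimising squared error against gold scores."""
    rows = [list(f) + [1.0] for f in features]  # trailing 1.0 models the constant
    k = 4
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, gold)) for i in range(k)]
    return tuple(solve_linear_system(xtx, xty))


if __name__ == "__main__":
    # Synthetic (S, T, L) scores and gold values, NOT data from the talk.
    feats = [(0.1, 0.9, 0.3), (0.8, 0.2, 0.5), (0.4, 0.4, 0.9),
             (0.7, 0.6, 0.1), (0.2, 0.3, 0.6), (0.9, 0.8, 0.7)]
    gold = [0.2 * s + 0.5 * t + 0.1 * l + 0.1 for s, t, l in feats]
    print(fit_stl_weights(feats, gold))
```

In practice a numerical library would replace the hand-written solver; it is spelled out here only to keep the sketch dependency-free.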
Semantic. Wu-Palmer: 2*depth(MSCA) / (depth(c1) + depth(c2)), where MSCA is the most specific concept that subsumes both c1 and c2. Resnik’s Information Content: IC(c) = -log p(c). Intrinsic Information Content (Pirro09): avoids the need to analyse large corpora.
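The Wu-Palmer measure can be sketched on a small hierarchy as follows. The parent links below are a toy fragment assumed for illustration, and counting the root as depth 1 is also an assumption; neither is taken from the actual xEBR taxonomy files.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Assets": None,
    "Subscribed Capital Unpaid": "Assets",
    "Fixed Assets": "Assets",
    "Current Assets": "Assets",
    "Tangible Fixed Assets": "Fixed Assets",
    "Stocks": "Current Assets",
    "Amount Receivable": "Current Assets",
}


def path_to_root(concept: str) -> List[str]:
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1 (assumed convention)."""
    return len(path_to_root(concept))


def msca(c1: str, c2: str) -> str:
    """Most specific concept that subsumes both c1 and c2."""
    ancestors = set(path_to_root(c1))
    for node in path_to_root(c2):  # walks upward, so the first hit is the deepest
        if node in ancestors:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(c1: str, c2: str) -> float:
    return 2 * depth(msca(c1, c2)) / (depth(c1) + depth(c2))


if __name__ == "__main__":
    print(wu_palmer("Tangible Fixed Assets", "Amount Receivable"))
    print(wu_palmer("Stocks", "Amount Receivable"))
```

Siblings under a deep common parent score higher than concepts that only meet at the root, which is the behaviour the formula is designed to capture.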
Cont. Intrinsic information content (iIC): [equation not preserved in this transcript], where sub(c) is the number of sub-concepts of a given concept c. Pirro_Similarity: [equation not preserved in this transcript].
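The slide's own equations did not survive extraction. For orientation only, a commonly cited formulation of intrinsic information content (Seco et al., 2004) is iIC(c) = 1 - log(sub(c) + 1) / log(N), where N is the total number of concepts in the taxonomy. The sketch below implements that formulation; whether the talk used exactly this variant, and which value of N it used, is an assumption.

```python
import math


def intrinsic_ic(num_subconcepts: int, total_concepts: int) -> float:
    """Intrinsic IC in the Seco et al. (2004) form (assumed variant).

    Leaves (no sub-concepts) get 1.0; a root subsuming every other
    concept gets 0.0. No corpus statistics are needed.
    """
    if total_concepts < 2:
        raise ValueError("need at least two concepts")
    if not 0 <= num_subconcepts < total_concepts:
        raise ValueError("sub-concept count out of range")
    return 1.0 - math.log(num_subconcepts + 1) / math.log(total_concepts)


if __name__ == "__main__":
    N = 269  # hypothetical taxonomy size chosen for the demo
    for subs in (0, 6, 9, 48):
        print(subs, round(intrinsic_ic(subs, N), 3))
```

The key property is monotonicity: the more sub-concepts a concept has, the more general it is and the lower its information content.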
Cont. Worked example on the xEBR asset hierarchy. MSCA (Assets): subconcepts = 48, IC = 0.32 (labelled “IC (TFA)” on the slide). Amount Receivable (AR): subconcepts = 6, IC (AR) = 0.69. Tangible Fixed Assets (TFA): subconcepts = 9, IC (TFA) = 0.60. The diagram marks Pirro_Sim = 0.33 and Pirro_Sim = ? for concept pairs. [Diagram: Assets with Subscribed Capital Unpaid, Fixed Assets and Current Assets; below these Stocks, Tangible Fixed Assets and Amount Receivable. Lower-level concepts shown: Amount Receivable [total], Amount Receivable within one year, Amount Receivable after more than one year, Trade Debtors, Other Debtors, Other Tangible Fixed Assets, Property, Plant and Equipment, Payments on account and asset in construction, Furniture Fixture and Equipment, Other Fixture, Land and Building, Plant and Machinery, Other Property, Plant and Equipment, Property, Plant and Equipment [Total].]
Limitation. Does semantic structure reflect a good similarity? Not necessarily. For example, xEBR uses the parent-child relation to describe the layout of concepts: “Work in progress” is not a type of asset, although the two are linked via a parent-child relationship.
Terminology. Definition: common naming convention. N-grams vs. subterms: in the financial domain, the bigram “Intangible Fixed” is a substring of “Other Intangible Fixed Assets” but not a subterm. Terminological similarity: maximal subterm overlap.
Cont. Example: “Trade Debts Payable After More Than One Year” decomposes into [[Trade][Debts]][Payable][After More Than One Year], with subterms found in external term sources: [SAP:Payable], [Ifrs:After More Than One Year], [Investoword:Debt], [FinanceDict:Trade Debts], [Investopedia:Trade]. “Financial Debts Payable After More Than One Year” decomposes into Financial[Debts][Payable][After More Than One Year].
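The subterm idea can be sketched as a lookup of contiguous word sequences in a lexicon of known terms. Everything below is an illustrative assumption: the lexicon contents are hypothetical, and scoring overlap with a Dice coefficient over subterm sets is a stand-in, since the talk only says "maximal subterm overlap" without giving a formula.

```python
from typing import Set

# Hypothetical lexicon of known financial terms (lower-cased).
LEXICON: Set[str] = {
    "trade", "debts", "trade debts", "payable", "after more than one year",
}


def subterms(term: str, lexicon: Set[str]) -> Set[str]:
    """All contiguous word sequences of `term` that are known terms.

    Nested matches are kept, so "trade debts" also yields "trade" and "debts".
    """
    words = term.lower().split()
    found = set()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            candidate = " ".join(words[i:j])
            if candidate in lexicon:
                found.add(candidate)
    return found


def subterm_overlap(t1: str, t2: str, lexicon: Set[str]) -> float:
    """Dice coefficient over the two subterm sets (assumed scoring)."""
    a, b = subterms(t1, lexicon), subterms(t2, lexicon)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    t1 = "Trade Debts Payable After More Than One Year"
    t2 = "Financial Debts Payable After More Than One Year"
    print(sorted(subterms(t1, LEXICON)))
    print(sorted(subterms(t2, LEXICON)))
    print(subterm_overlap(t1, t2, LEXICON))
```

Because candidates must appear in the lexicon, a bigram such as "intangible fixed" is never returned as a subterm unless some term source actually lists it, which is the distinction from plain n-gram overlap.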
Multilingual Subterms. Translated subterms: available in other languages. Advantage: they reflect terminological similarities that may be available in one language but not in others. Example: “Property Plant and Equipment”@en, “Sachanlagen”@de, “Tangible Fixed Asset”@en.
Linguistic. Syntactic information beyond simple word order: phrase structure and dependency structure. Phrase structure: Intangible fixed : adj adj > ??; Intangible fixed assets : adj adj n > NP. Dependency structure: Amounts receivable : N Adv : receive:mod, amounts:head; Received amounts : V N : receive:mod, amounts:head.
Evaluation. Data set: xEBR finance vocabulary, 269 terms (concept labels), 72,361 (269*269) term pairs. Benchmarks: SimSem59, a sample of 59 term pairs; SimSem200, a sample of 200 term pairs (under construction).
Experiment. An overview of similarity measures. [Figure not preserved in this transcript.]
Experiment Results (SimSem59). STL formula used: STL = 0.1531 * S + 0.5218 * T + 0.1041 * L + 0.1791. [Chart: correlation between similarity scores and SimSem59, with the semantic, terminology and linguistic contributions.]
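Applying the reported formula is a one-liner. The sketch below uses the coefficients quoted above; the function name is hypothetical, and treating each component score as lying in [0, 1] is an assumption the slides do not state.

```python
# Coefficients as reported for the SimSem59 experiment.
W_S, W_T, W_L, CONSTANT = 0.1531, 0.5218, 0.1041, 0.1791


def stl(s: float, t: float, l: float) -> float:
    """Combine semantic (s), terminological (t) and linguistic (l) scores."""
    return W_S * s + W_T * t + W_L * l + CONSTANT


if __name__ == "__main__":
    print(stl(0.0, 0.0, 0.0))  # only the constant remains
    print(stl(1.0, 1.0, 1.0))  # all three components at their assumed maximum
    # A 0.1 increase in T moves the score about five times as much
    # as the same increase in L, reflecting the larger weight on T.
    print(stl(0.5, 0.6, 0.5) - stl(0.5, 0.5, 0.5))
    print(stl(0.5, 0.5, 0.6) - stl(0.5, 0.5, 0.5))
```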
Conclusion. STL outperforms more traditional similarity measures. The largest contribution comes from T (terminological analysis). Multilingual subterms perform better than monolingual ones.
Future work. Evaluation on larger data sets and vocabularies (IFRS): 3000+ terms, 9M term pairs. Richer set of linguistic operations: “recognise” => “recognition” by derivation rule verb_lemma+“ion”. Similarity between subterms: “Staff Costs” and “Wages And Salaries”.
