Panos Alexopoulos
Data and Knowledge Technologies
Professional
http://www.panosalexopoulos.com
p.alexopoulos@gmail.com
@PAlexop
How many truths can you handle?
Strategies and techniques for handling vagueness in
conceptual data models
Talk
Identity
Conceptual Data Models
Semantics
Semantic gap
Vagueness
Topics covered
● UNDERSTANDING VAGUENESS: What it is and how it differs from other phenomena
● VAGUENESS RAMIFICATIONS: Why you should care
● DETECTING VAGUENESS: Guidelines and (automatic) techniques
● MEASURING VAGUENESS: Metrics and methods
● TACKLING VAGUENESS: Approaches and trade-offs
Understanding
Vagueness
What it is and what it is not
The Sorites Paradox
● 1 grain of wheat does not make a heap.
● If 1 grain doesn’t make a heap, then 2 grains
don’t.
● If 2 grains don’t make a heap, then 3 grains
don’t.
● …
● If 999,999 grains don’t make a heap, then 1
million grains don’t.
● Therefore, 1 million grains don’t make a
heap!
What is vagueness
“Vagueness is a semantic
phenomenon where predicates admit
borderline cases, namely cases where
it is not determinately true that the
predicate applies or not”
—Shapiro 2006
What is not vagueness
AMBIGUITY
E.g., “Last week I visited
Tripoli”
INEXACTNESS
E.g., “My height is
between 165 and 175
cm”
UNCERTAINTY
E.g., “The temperature
in Amsterdam right now
might be 15 degrees”
Vagueness Types
QUANTITATIVE
Borderline cases stem from the
lack of precise boundaries along
some measurable dimension
(e.g. “Bald”, “Tall”, “Near”)
QUALITATIVE
Borderline cases stem from not
being able to decide which
dimensions and conditions are
sufficient and/or necessary for
the predicate to apply. (e.g.,
“Religion”, “Expert”)
Vagueness
Ramifications
Why should we care
Miscommunication
Disagreements
How would you model this?
Problematic Scenarios
USING
VAGUE DATA
REUSING
VAGUE DATA
INTEGRATING
VAGUE DATA
Detecting
Vagueness
Where and what to look
How to detect vagueness
● Identify which of your data model’s
elements are potentially vague
● Investigate whether these elements are
indeed vague.
● Investigate and determine potential
dimensions and applicability contexts.
Where to look
● Classes: E.g. “Tall Person”, “Strategic
Customer”, “Experienced Researcher”
● Relations and attributes: E.g., “hasGenre”,
“hasIdeology”
● Attribute values: E.g., the “price” of a
restaurant could take as values the vague
terms “cheap”, “moderate” and “expensive”
What to look for
● Vague terms in names and definitions
● Disagreements and inconsistencies among
data modelers, domain experts, and data
stewards during model development and
maintenance
● Disagreements and inconsistencies in user
feedback during model application.
Examples from Wordnet
Vague senses:
● Yellowish: of the color intermediate between green and orange in the color spectrum, of something resembling the color of an egg yolk.
● Impenitent: impervious to moral persuasion.
● Notorious: known widely and usually unfavorably.
Non-vague senses:
● Compound: composed of more than one part.
● Biweekly: occurring every two weeks.
● Outermost: situated at the farthest possible point from a center.
Examples from the Citation Ontology
Vague relations:
● plagiarizes: A property indicating that the author of the citing entity plagiarizes the cited entity, by including textual or other elements from the cited entity without formal acknowledgement of their source.
● citesAsAuthority: The citing entity cites the cited entity as one that provides an authoritative description or definition of the subject under discussion.
● supports: The citing entity provides intellectual or factual support for statements, ideas or conclusions presented in the cited entity.
Non-vague relations:
● sharesAuthorInstitutionWith: Each entity has at least one author that shares a common institutional affiliation with an author of the other entity.
● retracts: The citing entity constitutes a formal retraction of the cited entity.
● includesExcerptFrom: The citing entity includes one or more excerpts from the cited entity.
Measuring
Vagueness
Key metrics
Vagueness spread
● The ratio of vague model elements (classes,
relations, datatypes, etc.) to all model elements
● A data model with a high vagueness spread
is less explicit and shareable than one
with a low spread.
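As a minimal sketch of the spread metric, assuming each model element has already been flagged as vague or not (the element names and flags below are illustrative, not taken from a real model):

```python
def vagueness_spread(elements: dict) -> float:
    """Return the fraction of model elements that are flagged as vague.

    `elements` maps an element name to True (vague) or False (non-vague).
    """
    if not elements:
        raise ValueError("the model has no elements")
    return sum(1 for is_vague in elements.values() if is_vague) / len(elements)


# Hypothetical model: the flags are assumptions made for illustration only.
model = {
    "TallPerson": True,
    "Adult": False,
    "StrategicCustomer": True,
    "hasGenre": True,
    "hasBirthDate": False,
}

print(f"Vagueness spread: {vagueness_spread(model):.2f}")
```

Three of the five elements are flagged, so the spread of this toy model is 0.6.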
Vagueness intensity
● The degree to which the model’s users disagree
on the validity of the (potential) instances of the
elements.
● The higher this disagreement is for an element,
the more problems the element is likely to cause.
● Calculation:
○ Consider a sample set of vague element
instances
○ Have human judges denote whether and to
what extent they believe these instances are
valid
○ Measure the inter-rater agreement between the judges
(e.g. by using Cohen’s kappa for two judges, or Fleiss’ kappa for more)
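The calculation above could be sketched as follows for two judges, using the standard definition of Cohen's kappa (observed agreement corrected for chance agreement). The judgments are made-up sample data, and reading a low kappa as high vagueness intensity is an interpretation of the slide, not a formula it gives.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always used the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments: is each candidate instance a valid "Tall Person"?
judge_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
judge_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]

kappa = cohens_kappa(judge_1, judge_2)
print(f"kappa = {kappa:.2f}")  # lower agreement suggests higher vagueness intensity
```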
Tackling
Vagueness
Approaches and trade-offs
Three (complementary)
techniques
VAGUENESS
AWARENESS
TRUTH
CONTEXTUALIZATION
TRUTH
FUZZIFICATION
Vagueness-aware data models
Data models whose vague elements
are accompanied by meta-information
that describes the nature and
characteristics of their vagueness in
an explicit way.
What to make explicit
● VAGUENESS EXISTENCE: E.g. “Tall Person” is vague and “Adult” is non-vague.
● VAGUENESS TYPE: E.g. “Low Budget” has quantitative vagueness and “Expert Consultant” qualitative.
● VAGUENESS DIMENSIONS: E.g. “Strategic Client” is vague in the dimension of the generated revenue.
● VAGUENESS PROVENANCE: E.g. “Strategic Client” is vague in the dimension of the generated revenue according to the Financial Manager.
● APPLICABILITY CONTEXTS: E.g. “Strategic Client” is vague in the dimension of the generated revenue in the context of Financial Reporting.
A Vagueness Metamodel
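The metamodel figure itself is not part of this transcript, so the following is only a rough sketch of how the five kinds of meta-information (existence, type, dimensions, provenance, applicability contexts) might be attached to a model element. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class VaguenessType(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"


@dataclass
class VaguenessAnnotation:
    """Meta-information describing the vagueness of one model element."""
    element: str
    is_vague: bool                                     # vagueness existence
    vagueness_type: Optional[VaguenessType] = None     # vagueness type
    dimensions: List[str] = field(default_factory=list)             # vagueness dimensions
    provenance: Optional[str] = None                   # who considers it vague
    applicability_contexts: List[str] = field(default_factory=list)  # where this holds


# The slide's examples, expressed with the hypothetical structure above.
strategic_client = VaguenessAnnotation(
    element="Strategic Client",
    is_vague=True,
    vagueness_type=VaguenessType.QUANTITATIVE,
    dimensions=["generated revenue"],
    provenance="Financial Manager",
    applicability_contexts=["Financial Reporting"],
)
adult = VaguenessAnnotation(element="Adult", is_vague=False)

print(strategic_client)
print(adult)
```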
Truth contextualization
● The same statement in the data model can be true
in some contexts and false in other contexts.
● E.g., “Stephen Curry is short” is true in the context
of “Basketball Playing” but false in all others.
● Potential contexts:
○ Cultures
○ Locations
○ Industries
○ Processes
○ Demographics
○ ...
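One simple way to picture contextualization is to key each statement's truth value by context instead of storing a single global value. This is a sketch under that assumption; the class name and the "General Population" context are invented for illustration.

```python
from typing import Dict, Optional, Tuple


class ContextualizedStatements:
    """Stores the truth value of a statement separately for each context."""

    def __init__(self) -> None:
        self._truth: Dict[Tuple[str, str], bool] = {}

    def assert_truth(self, statement: str, context: str, value: bool) -> None:
        self._truth[(statement, context)] = value

    def is_true(self, statement: str, context: str) -> Optional[bool]:
        # None means the model says nothing about this statement in this context.
        return self._truth.get((statement, context))


store = ContextualizedStatements()
store.assert_truth("Stephen Curry is short", "Basketball Playing", True)
store.assert_truth("Stephen Curry is short", "General Population", False)

print(store.is_true("Stephen Curry is short", "Basketball Playing"))
print(store.is_true("Stephen Curry is short", "General Population"))
print(store.is_true("Stephen Curry is short", "Horse Racing"))
```

Returning `None` for an unknown context keeps "not stated" distinct from "false", which matters when an application has to decide whether it can handle a given context at all.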
Contextualized poverty
When to contextualize?
● When vagueness intensity is high and consensus
is impossible
● When you are able to identify truth contexts
● When the applications that use the model
can actually handle the contexts.
● When contextualization actually manages to
reduce disagreements and has a positive effect
on the model’s applications.
● When the contextualization benefits outweigh the
context management overhead.
Truth fuzzification
● The basic idea is that we can assign a real number
to a vague statement, within a range from 0 to 1.
○ A value of 1 would mean that the statement
is completely true
○ A value of 0 that it is completely false
○ Any value in between that it is “partly true” to
a given, quantifiable extent.
● For example:
○ “John is an instance of YoungPerson to a
degree of 0.8”
○ “Google has Competitor Microsoft to a
degree of 0.4”.
● The premise is that fuzzy degrees can reduce the
disagreements around the truth of a vague
statement.
Truth degrees are not probabilities
● A probability statement quantifies the
likelihood that events or facts whose truth conditions
are well defined will come true
○ e.g., “it will rain tomorrow with a probability of
0.8”
● A fuzzy statement quantifies the extent to
which events or facts whose truth conditions are
not precisely defined are perceived as true.
○ e.g., “It’s now raining to a degree of 0.6”
● That’s why they are supported by different
mathematical frameworks, namely probability theory
and fuzzy logic
What fuzzification involves
1. Detect and analyze all vague elements in your
model
2. Decide how to fuzzify each element
3. Harvest truth degrees
4. Assess fuzzy model quality
5. Represent fuzzy degrees
6. Apply the fuzzy model
Fuzzification options
● The number and kind of fuzzy degrees you
need to acquire for your model’s vague
elements depend on the latter’s vagueness
type and dimensions.
● If your element has quantitative vagueness
in one dimension, then all you need is a
fuzzy membership function that maps
numerical values of the dimension to fuzzy
degrees in the range [0,1]
Fuzzy membership functions
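The membership-function figures are not included in this transcript. As a stand-in, here is a minimal sketch of one common shape, a linear ramp, applied to the class "Tall Person" over the single dimension of height. The 170 cm and 190 cm thresholds are arbitrary assumptions, not values from the talk.

```python
def ramp_up(x: float, low: float, high: float) -> float:
    """Linear ramp: 0 at or below `low`, 1 at or above `high`, linear in between."""
    if low >= high:
        raise ValueError("low must be smaller than high")
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm: float) -> float:
    """Degree to which a person of the given height is 'tall' (thresholds assumed)."""
    return ramp_up(height_cm, low=170.0, high=190.0)


for height in (165, 180, 186, 195):
    print(f"{height} cm -> tall to a degree of {tall(height):.2f}")
```

With these assumed thresholds, a height of 186 cm maps to a degree of 0.8 and anything at or above 190 cm maps to 1.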
Fuzzification options
● If an element has quantitative vagueness in
more than one dimension, then you can
either:
○ Define a multivariate fuzzy
membership function
○ Define one membership function per
dimension and then combine these via
some fuzzy logic operation, like fuzzy
conjunction or fuzzy disjunction
Multivariate fuzzy membership function
Fuzzy conjunction and disjunction
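The second option, one membership function per dimension combined by a fuzzy operation, could look like the sketch below. It uses the minimum and maximum operators for conjunction and disjunction and the product as an alternative conjunction; which operator suits a given model is a design decision the slides leave open. The two per-dimension degrees are invented inputs.

```python
import math


def fuzzy_and(*degrees: float) -> float:
    """Fuzzy conjunction using the minimum t-norm."""
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    """Fuzzy disjunction using the maximum t-conorm."""
    return max(degrees)


def product_and(*degrees: float) -> float:
    """Alternative fuzzy conjunction using the product t-norm."""
    return math.prod(degrees)


# Hypothetical per-dimension degrees for one "Strategic Client" candidate.
revenue_degree = 0.7       # from a membership function over generated revenue
relationship_degree = 0.4  # from a membership function over years as a client

print(fuzzy_and(revenue_degree, relationship_degree))
print(fuzzy_or(revenue_degree, relationship_degree))
print(product_and(revenue_degree, relationship_degree))
```

The minimum keeps the combined degree as low as the weakest dimension, whereas the product penalizes every dimension that falls short of 1.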
Fuzzification options
● A third option is to just define one direct degree
per statement.
○ “John is tall to a degree of 0.8”
○ “Maria is expert in data modeling to a degree
of 0.6”
● This approach makes sense when:
○ Your element is vague in too many
dimensions and you cannot find a proper
membership function, or
○ Your element’s vagueness is qualitative
and, thus, you have no dimensions to use.
● The drawback is that you will have to harvest a lot
of degrees!
Harvesting truth degrees
● Remember that vague statements provoke
disagreements and debates among people or
even among people and systems.
● To generate fuzzy degrees for these statements
you practically need to capture and quantify these
disagreements.
● How to capture:
○ Ask people directly
○ Ask people indirectly
○ Mine from data
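For the "ask people directly" route, one simple way to turn disagreement into a degree is to take the share of judges who accept the statement. This is only one possible aggregation, offered as a sketch; the votes are made-up data.

```python
from typing import Dict, List


def degree_from_votes(votes: List[bool]) -> float:
    """Truth degree as the share of judges who consider the statement true."""
    if not votes:
        raise ValueError("at least one vote is needed")
    return sum(votes) / len(votes)


# Hypothetical survey: ten judges per statement, True means "I agree".
survey: Dict[str, List[bool]] = {
    "John is an instance of YoungPerson": [True] * 8 + [False] * 2,
    "Maria is expert in data modeling": [True] * 6 + [False] * 4,
}

degrees = {statement: degree_from_votes(votes) for statement, votes in survey.items()}
for statement, degree in degrees.items():
    print(f"{statement}: {degree:.1f}")
```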
Explanation and feedback based harvesting
Multiple fuzzy truths
● Even with fuzzification you still may be getting
disagreements
● This can be an indication of context-dependence
● Different contexts may require different fuzzy
degrees or membership functions
● In other words, contextualization and fuzzification
are orthogonal approaches.
Fuzzy model quality
● Main questions you need to consider:
○ Have I fuzzified the correct elements?
○ Are the truth degrees consistent?
○ Are the truth degrees accurate?
○ Is the provenance of the truth degrees well
documented?
● Both accuracy and consistency are best treated
not as binary metrics but rather as distances
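Treating accuracy as a distance could be sketched as the mean absolute difference between the model's truth degrees and a set of reference degrees, for example from a trusted panel. Both the choice of this particular distance and the numbers below are assumptions for illustration.

```python
from typing import Dict


def mean_absolute_distance(degrees: Dict[str, float], reference: Dict[str, float]) -> float:
    """Average absolute gap between model degrees and reference degrees.

    Only statements present in both dictionaries are compared.
    """
    shared = degrees.keys() & reference.keys()
    if not shared:
        raise ValueError("no statements in common")
    return sum(abs(degrees[s] - reference[s]) for s in shared) / len(shared)


# Hypothetical degrees in the model versus degrees from a reference panel.
model_degrees = {"John is tall": 0.8, "Maria is expert in data modeling": 0.6}
panel_degrees = {"John is tall": 0.7, "Maria is expert in data modeling": 0.9}

distance = mean_absolute_distance(model_degrees, panel_degrees)
print(f"mean absolute distance: {distance:.2f}")
```

A distance of 0 would mean the model's degrees match the reference exactly; larger values indicate a less accurate fuzzy model.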
Fuzzy model representation
● To represent a truth degree for a relation you
simply need to define a relation attribute named
“truth degree” or similar.
● This is straightforward if you work with E-R
models or property graphs, but also possible in
RDF or OWL, even if these languages do not
directly support relation attributes.
● Things can become more difficult when you need
to represent fuzzy membership functions or more
complex fuzzy rules and axioms, along with their
necessary reasoning support.
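To illustrate the difference between the two situations, the sketch below first stores the degree as a plain relation attribute, as a property graph or E-R model would, and then shows one way to express the same thing with triples only, using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `ex:` names, including `ex:truthDegree`, are hypothetical, and reification is only one of several possible RDF patterns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FuzzyRelation:
    """A relation instance that carries its truth degree as an attribute."""
    subject: str
    predicate: str
    obj: str
    truth_degree: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.truth_degree <= 1.0:
            raise ValueError("truth degree must lie in [0, 1]")


def to_reified_triples(rel: FuzzyRelation, statement_id: str) -> List[Tuple]:
    """Express the relation and its degree as plain triples via RDF reification."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", rel.subject),
        (statement_id, "rdf:predicate", rel.predicate),
        (statement_id, "rdf:object", rel.obj),
        (statement_id, "ex:truthDegree", rel.truth_degree),  # hypothetical property
    ]


competitor = FuzzyRelation("ex:Google", "ex:hasCompetitor", "ex:Microsoft", 0.4)
triples = to_reified_triples(competitor, "ex:statement1")
for triple in triples:
    print(triple)
```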
Fuzzy model application
● This last step might not look like a semantic
modeling task, yet it is a crucial one if you want
your fuzzification effort to pay off
● A fuzzy data model can be helpful in:
○ Semantic tagging and disambiguation
○ Semantic search and match
○ Decision support systems
○ Conversational agents (aka chatbots)
● In all cases proper design and adaptation of
the underlying algorithms is needed
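As a small illustration of what adapting an algorithm can mean, here is a sketch of a search function that uses truth degrees to filter and rank results instead of treating every statement as equally true. The facts, the threshold, and the function name are all hypothetical.

```python
from typing import List, Tuple

Fact = Tuple[str, str, str, float]  # (subject, predicate, object, truth degree)


def fuzzy_search(facts: List[Fact], predicate: str, obj: str,
                 min_degree: float = 0.5) -> List[Tuple[str, float]]:
    """Return subjects matching (predicate, obj), ranked by truth degree."""
    hits = [
        (subject, degree)
        for subject, pred, target, degree in facts
        if pred == predicate and target == obj and degree >= min_degree
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fuzzy facts.
facts = [
    ("Bob", "instanceOf", "YoungPerson", 0.3),
    ("Anna", "instanceOf", "YoungPerson", 0.55),
    ("John", "instanceOf", "YoungPerson", 0.8),
]

results = fuzzy_search(facts, "instanceOf", "YoungPerson")
print(results)
```

The threshold is a tuning knob: lowering it trades precision for recall, which a crisp model cannot offer.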
When to fuzzify?
● Questions you need to consider:
○ Which elements in your model are
unavoidably vague?
○ How severe and impactful are the
disagreements you (expect to) get on the
veracity of these vague elements?
○ Are these disagreements caused by
vagueness or other factors?
When to fuzzify?
● Questions you need to consider:
○ If your model’s elements had fuzzy degrees,
would you get less disagreement?
○ Are the applications that use the model able
to exploit and benefit from truth degrees?
○ Can you develop a scalable way to get and
maintain fuzzy degrees that costs less than
the benefits they bring you?
How would you tackle this?
Take Aways
Data and information quality can be negatively affected by vagueness
● (Perceived) inaccuracy
● Disagreements and misinterpretations
● Reduced semantic interoperability
Treating vagueness as noise doesn’t help
● It’s how we think and communicate
● Insisting on crispness is unproductive
● But leaving things as-is is also bad.
Three complementary weapons to tackle vagueness
● Make your data models Vagueness-Aware
● Contextualize truth
● Fuzzify truth
Currently writing a book on
semantic data modeling
To be published by O’Reilly in
September 2020
Early release expected at O’Reilly
Learning Platform in December
2019
To get news about the book’s
progress and a free preview
chapter, send me an email at
p.alexopoulos@gmail.com

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

How many truths can you handle?

  • 1. Panos Alexopoulos Data and Knowledge Technologies Professional http://www.panosalexopoulos.com p.alexopoulos@gmail.com @PAlexop How many truths can you handle? Strategies and techniques for handling vagueness in conceptual data models
  • 7. Topics covered ● UNDERSTANDING VAGUENESS: What it is and how it differs from other phenomena ● DETECTING VAGUENESS: Guidelines and (automatic) techniques ● TACKLING VAGUENESS: Approaches and trade-offs ● VAGUENESS RAMIFICATIONS: Why you should care ● MEASURING VAGUENESS: Metrics and methods
  • 9. The Sorites Paradox ● 1 grain of wheat does not make a heap. ● If 1 grain doesn’t make a heap, then 2 grains don’t. ● If 2 grains don’t make a heap, then 3 grains don’t. ● … ● If 999,999 grains don’t make a heap, then 1 million grains don’t. ● Therefore, 1 million grains don’t make a heap!
  • 10. What is vagueness “Vagueness is a semantic phenomenon where predicates admit borderline cases, namely cases where it is not determinately true that the predicate applies or not” —Shapiro 2006
  • 11. What is not vagueness ● AMBIGUITY: E.g., “Last week I visited Tripoli” ● INEXACTNESS: E.g., “My height is between 165 and 175 cm” ● UNCERTAINTY: E.g., “The temperature in Amsterdam right now might be 15 degrees”
  • 12. Vagueness Types ● QUANTITATIVE: Borderline cases stem from the lack of precise boundaries along some measurable dimension (e.g., “Bald”, “Tall”, “Near”) ● QUALITATIVE: Borderline cases stem from not being able to decide which dimensions and conditions are sufficient and/or necessary for the predicate to apply (e.g., “Religion”, “Expert”)
  • 17. How would you model this?
  • 21. How to detect vagueness ● Identify which of your data model’s elements are potentially vague ● Investigate whether these elements are indeed vague. ● Investigate and determine potential dimensions and applicability contexts.
  • 22. Where to look ● Classes: E.g. “Tall Person”, “Strategic Customer”, “Experienced Researcher” ● Relations and attributes: E.g., “hasGenre”, “hasIdeology” ● Attribute values: E.g., the “price” of a restaurant could take as values the vague terms “cheap”, “moderate” and “expensive”
  • 23. What to look for ● Vague terms in names and definitions ● Disagreements and inconsistencies among data modelers, domain experts, and data stewards during model development and maintenance ● Disagreements and inconsistencies in user feedback during model application.
  • 24. Examples from WordNet ● Vague senses: ○ Yellowish: of the color intermediate between green and orange in the color spectrum, of something resembling the color of an egg yolk. ○ Impenitent: impervious to moral persuasion. ○ Notorious: known widely and usually unfavorably. ● Non-vague senses: ○ Compound: composed of more than one part. ○ Biweekly: occurring every two weeks. ○ Outermost: situated at the farthest possible point from a center.
  • 25. Examples from the Citation Ontology ● Vague relations: ○ plagiarizes: A property indicating that the author of the citing entity plagiarizes the cited entity, by including textual or other elements from the cited entity without formal acknowledgement of their source. ○ citesAsAuthority: The citing entity cites the cited entity as one that provides an authoritative description or definition of the subject under discussion. ○ supports: The citing entity provides intellectual or factual support for statements, ideas or conclusions presented in the cited entity. ● Non-vague relations: ○ sharesAuthorInstitutionWith: Each entity has at least one author that shares a common institutional affiliation with an author of the other entity. ○ retracts: The citing entity constitutes a formal retraction of the cited entity. ○ includesExcerptFrom: The citing entity includes one or more excerpts from the cited entity.
  • 27. Vagueness spread ● The ratio of model elements (classes, relations, datatypes, etc.) that are vague ● A data model with a high vagueness spread is less explicit and shareable than a model with a low one.
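As a rough illustration of the spread metric, the sketch below computes the ratio of vague elements among all model elements. The element names and their vague/non-vague labels are hypothetical; in practice the labels would come from the detection step.

```python
def vagueness_spread(elements):
    """Return the fraction of model elements flagged as vague.

    `elements` maps an element name to True (vague) or False (not vague).
    """
    if not elements:
        raise ValueError("the model has no elements")
    vague = sum(1 for is_vague in elements.values() if is_vague)
    return vague / len(elements)


# Hypothetical model: the labels below are illustrative assumptions.
model = {
    "TallPerson": True,
    "Adult": False,
    "StrategicCustomer": True,
    "hasGenre": True,
    "retracts": False,
}
spread = vagueness_spread(model)  # 3 vague elements out of 5
```

A single number like this is only as good as the labels behind it, so it is probably best read alongside the per-element intensity.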
  • 28. Vagueness intensity ● The degree to which the model’s users disagree on the validity of the (potential) instances of the elements. ● The higher this disagreement is for an element, the more problems the element is likely to cause. ● Calculation: ○ Consider a sample set of vague element instances ○ Have human judges denote whether and to what extent they believe these instances are valid ○ Measure the agreement between the judges (e.g., by using Cohen’s kappa)
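The calculation steps above can be sketched with a plain-Python version of Cohen's kappa for two judges. The judgments below are made up for illustration. Cohen's kappa covers exactly two raters; with more judges, a multi-rater statistic such as Fleiss' kappa would be one option.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two judges who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items on which the judges gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each judge's own label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments on six candidate instances of a vague class.
judge_1 = ["valid", "valid", "invalid", "valid", "invalid", "invalid"]
judge_2 = ["valid", "invalid", "invalid", "valid", "valid", "invalid"]
kappa = cohens_kappa(judge_1, judge_2)
```

Low kappa means low agreement, which under the definition above suggests high vagueness intensity for the element.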
  • 31. Vagueness-aware data models Data models whose vague elements are accompanied by meta-information that describes the nature and characteristics of their vagueness in an explicit way.
  • 32. What to make explicit ● VAGUENESS EXISTENCE: E.g., “Tall Person” is vague and “Adult” is non-vague. ● VAGUENESS DIMENSIONS: E.g., “Strategic Client” is vague in the dimension of the generated revenue. ● VAGUENESS PROVENANCE: E.g., “Strategic Client” is vague in the dimension of the generated revenue according to the Financial Manager. ● VAGUENESS TYPE: E.g., “Low Budget” has quantitative vagueness and “Expert Consultant” qualitative. ● APPLICABILITY CONTEXTS: E.g., “Strategic Client” is vague in the dimension of the generated revenue in the context of Financial Reporting.
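One possible way to carry this meta-information is a small annotation record per model element. The sketch below is an assumption for illustration, not a standard vocabulary: the class name and field names are hypothetical, and the values come from the examples on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VaguenessAnnotation:
    """Hypothetical meta-information attached to one model element."""
    element: str
    is_vague: bool                          # vagueness existence
    vagueness_type: Optional[str] = None    # "quantitative" or "qualitative"
    dimensions: List[str] = field(default_factory=list)
    provenance: Optional[str] = None        # who considers the element vague
    contexts: List[str] = field(default_factory=list)  # applicability contexts


annotations = [
    VaguenessAnnotation("Adult", is_vague=False),
    VaguenessAnnotation(
        "StrategicClient",
        is_vague=True,
        dimensions=["generated revenue"],
        provenance="Financial Manager",
        contexts=["Financial Reporting"],
    ),
    VaguenessAnnotation("LowBudget", is_vague=True, vagueness_type="quantitative"),
]

# Example query: which elements need special handling downstream?
vague_elements = [a.element for a in annotations if a.is_vague]
```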
  • 34. Truth contextualization ● The same statement in the data model can be true in some contexts and false in other contexts. ● E.g., “Stephen Curry is short” is true in the context of “Basketball Playing” but false in all others. ● Potential contexts: ○ Cultures ○ Locations ○ Industries ○ Processes ○ Demographics ○ ...
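A minimal sketch of contextualized truth, assuming a simple in-memory store keyed by statement and context. The "General Population" context and the helper name `is_true` are hypothetical, added only for illustration.

```python
# Statement -> context -> truth value. Contexts and values are illustrative.
contextual_truth = {
    ("Stephen Curry", "isShort"): {
        "Basketball Playing": True,
        "General Population": False,  # hypothetical contrasting context
    },
}


def is_true(statement, context, store=contextual_truth):
    """Return True or False if the statement has a value in the context, else None."""
    return store.get(statement, {}).get(context)


in_basketball = is_true(("Stephen Curry", "isShort"), "Basketball Playing")
in_general = is_true(("Stephen Curry", "isShort"), "General Population")
unknown = is_true(("Stephen Curry", "isShort"), "Golf")
```

Returning `None` for an unknown context keeps "no truth value recorded" distinct from "false", which matters when an application cannot tell which context it is in.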
  • 36. When to contextualize? ● When vagueness intensity is high and consensus is impossible ● When you are able to identify truth contexts ● When the applications that use the model can actually handle the contexts. ● When contextualization actually manages to reduce disagreements and has a positive effect on the model’s applications. ● When the contextualization benefits outweigh the context management overhead.
  • 37. Truth fuzzification ● The basic idea is that we can assign a real number to a vague statement, within a range from 0 to 1. ○ A value of 1 would mean that the statement is completely true ○ A value of 0 that it is completely false ○ Any value in between that it is “partly true” to a given, quantifiable extent. ● For example: ○ “John is an instance of YoungPerson to a degree of 0.8” ○ “Google has Competitor Microsoft to a degree of 0.4”. ● The premise is that fuzzy degrees can reduce the disagreements around the truth of a vague statement.
  • 38. Truth degrees are not probabilities ● A probability statement quantifies the likelihood that events or facts whose truth conditions are well defined will come true ○ e.g., “It will rain tomorrow with a probability of 0.8” ● A fuzzy statement quantifies the extent to which events or facts whose truth conditions are not well defined are perceived as true ○ e.g., “It’s now raining to a degree of 0.6” ● That is why they are supported by different mathematical frameworks, namely probability theory and fuzzy logic.
  • 39. What fuzzification involves 1. Detect and analyze all vague elements in your model 2. Decide how to fuzzify each element 3. Harvest truth degrees 4. Assess fuzzy model quality 5. Represent fuzzy degrees 6. Apply the fuzzy model
  • 40. Fuzzification options ● The number and kind of fuzzy degrees you need to acquire for your model’s vague elements depend on the latter’s vagueness type and dimensions. ● If your element has quantitative vagueness in one dimension, then all you need is a fuzzy membership function that maps numerical values of the dimension to fuzzy degrees in the range [0,1]
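For the one-dimensional quantitative case, a membership function can be as simple as a linear ramp between two thresholds. The sketch below assumes a "Tall" predicate over height in centimeters; the thresholds 160 and 190 are arbitrary illustrative values, not recommendations.

```python
def linear_membership(low, high):
    """Build a function that rises linearly from 0 at `low` to 1 at `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")

    def degree(value):
        if value <= low:
            return 0.0
        if value >= high:
            return 1.0
        return (value - low) / (high - low)

    return degree


# Hypothetical thresholds for "Tall" along the height dimension (cm).
tall = linear_membership(160.0, 190.0)
degree_175 = tall(175.0)  # halfway between the two thresholds
```

Other shapes (trapezoidal, sigmoid) follow the same pattern; the choice of shape and thresholds is itself something the model's users may disagree on.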
  • 43. Fuzzification options ● If an element has quantitative vagueness in more than one dimension, then you can either: ○ Define a multivariate fuzzy membership function ○ Define one membership function per dimension and then combine these via some fuzzy logic operation, like fuzzy conjunction or fuzzy disjunction
  • 45. Fuzzy conjunction and disjunction
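The slide content is not in the transcript, so the sketch below shows two commonly used operator pairs from fuzzy logic as an assumption about what is meant: minimum/maximum and product/probabilistic sum. The two dimensions and their degrees are hypothetical.

```python
def fuzzy_and_min(a, b):
    """Conjunction as the minimum of the two degrees."""
    return min(a, b)


def fuzzy_or_max(a, b):
    """Disjunction as the maximum of the two degrees."""
    return max(a, b)


def fuzzy_and_product(a, b):
    """Conjunction as the product of the two degrees."""
    return a * b


def fuzzy_or_probsum(a, b):
    """Disjunction as the probabilistic sum a + b - a*b."""
    return a + b - a * b


# Hypothetical per-dimension degrees for one "Strategic Client" candidate.
revenue_degree = 0.8
loyalty_degree = 0.5

strategic_min = fuzzy_and_min(revenue_degree, loyalty_degree)
strategic_max = fuzzy_or_max(revenue_degree, loyalty_degree)
strategic_product = fuzzy_and_product(revenue_degree, loyalty_degree)
strategic_probsum = fuzzy_or_probsum(revenue_degree, loyalty_degree)
```

The operator pairs give different combined degrees for the same inputs, so the choice should be agreed on and documented along with the membership functions.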
  • 46. Fuzzification options ● A third option is to just define one direct degree per statement. ○ “John is tall to a degree of 0.8” ○ “Maria is an expert in data modeling to a degree of 0.6” ● This approach makes sense when: ○ Your element is vague in too many dimensions and you cannot find a proper membership function ○ The element’s vagueness is qualitative and, thus, you have no dimensions to use ● The drawback is that you will have to harvest a lot of degrees!
  • 47. Harvesting truth degrees ● Remember that vague statements provoke disagreements and debates among people or even among people and systems. ● To generate fuzzy degrees for these statements, you practically need to capture and quantify these disagreements. ● How to capture: ○ Ask people directly ○ Ask people indirectly ○ Mine from data
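One simple way to turn directly collected judgments into a degree, offered here as an assumption rather than the talk's method, is to take the share of judges who consider the statement true. The votes below are invented.

```python
from statistics import mean


def degree_from_votes(votes):
    """Fraction of judges who consider the statement true (True/False votes)."""
    if not votes:
        raise ValueError("at least one vote is required")
    return mean(1.0 if vote else 0.0 for vote in votes)


# Hypothetical yes/no answers from five judges per statement.
votes_by_statement = {
    "John is tall": [True, True, True, False, True],
    "Maria is an expert in data modeling": [True, False, True, False, True],
}
degrees = {
    statement: degree_from_votes(votes)
    for statement, votes in votes_by_statement.items()
}
```

Judges could also be asked for graded answers instead of yes/no, in which case the same averaging applies to their numeric ratings.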
  • 48. Explanation and feedback based harvesting
  • 49. Multiple fuzzy truths ● Even with fuzzification you still may be getting disagreements ● This can be an indication of context-dependence ● Different contexts may require different fuzzy degrees or membership functions ● In other words, contextualization and fuzzification are orthogonal approaches.
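Combining the two approaches can be sketched as one set of membership parameters per context. The contexts, thresholds, and height below are hypothetical values chosen only to show that the same measurement can yield different degrees.

```python
# Hypothetical (low, high) height thresholds in cm for "Tall", per context.
tall_thresholds = {
    "Basketball Playing": (185.0, 210.0),
    "General Population": (165.0, 190.0),
}


def tall_degree(height_cm, context):
    """Linear ramp from 0 at `low` to 1 at `high`, using the context's thresholds."""
    low, high = tall_thresholds[context]
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


height = 190.0
degree_basketball = tall_degree(height, "Basketball Playing")
degree_general = tall_degree(height, "General Population")
```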
  • 50. Fuzzy model quality ● Main questions you need to consider: ○ Have I fuzzified the correct elements? ○ Are the truth degrees consistent? ○ Are the truth degrees accurate? ○ Is the provenance of the truth degrees well documented? ● Both accuracy and consistency are best treated not as a binary metric but rather as a distance
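Treating accuracy as a distance could look like the sketch below, which uses the mean absolute difference between the model's degrees and a set of reference degrees. The metric choice, the function name, and all numbers are assumptions for illustration.

```python
def mean_absolute_distance(model_degrees, reference_degrees):
    """Average |model - reference| over the statements present in both mappings."""
    shared = model_degrees.keys() & reference_degrees.keys()
    if not shared:
        raise ValueError("no statements in common")
    total = sum(abs(model_degrees[s] - reference_degrees[s]) for s in shared)
    return total / len(shared)


# Hypothetical degrees stored in the model vs. degrees from a fresh judgment round.
model_degrees = {"John is tall": 0.8, "Maria is an expert": 0.6}
reference_degrees = {"John is tall": 0.7, "Maria is an expert": 0.9}
distance = mean_absolute_distance(model_degrees, reference_degrees)
```

A distance of 0 means the model matches the reference exactly; what counts as an acceptable distance depends on how the applications use the degrees.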
  • 51. Fuzzy model representation ● To represent a truth degree for a relation you simply need to define a relation attribute named “truth degree” or similar. ● This is straightforward if you work with E-R models or property graphs, but also possible in RDF or OWL, even if these languages do not directly support relation attributes. ● Things can become more difficult when you need to represent fuzzy membership functions or more complex fuzzy rules and axioms, along with their necessary reasoning support.
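The sketch below illustrates both cases with plain Python data: a property-graph style edge that carries the degree as an attribute, and an RDF-style workaround that uses the standard reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The `http://example.org/` namespace and the `truthDegree` property are hypothetical; reification is only one of several possible RDF patterns.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for illustration

# Property-graph style: the relation itself carries the attribute.
edge = {
    "source": "Google",
    "type": "hasCompetitor",
    "target": "Microsoft",
    "truth_degree": 0.4,
}


def reify_with_degree(statement_id, subject, predicate, obj, degree):
    """Describe a statement as a resource so a degree can be attached to it."""
    node = EX + statement_id
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", EX + subject),
        (node, RDF + "predicate", EX + predicate),
        (node, RDF + "object", EX + obj),
        (node, EX + "truthDegree", degree),  # hypothetical property
    ]


triples = reify_with_degree("stmt1", "Google", "hasCompetitor", "Microsoft", 0.4)
```

The reified form needs five triples for one weighted relation, which gives a sense of the overhead compared with a relation attribute.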
  • 52. Fuzzy model application ● This last step might not look like a semantic modeling task, yet it is a crucial one if you want your fuzzification effort to pay off ● A fuzzy data model can be helpful in: ○ Semantic tagging and disambiguation ○ Semantic search and match ○ Decision support systems ○ Conversational agents (aka chatbots) ● In all cases, proper design and adaptation of the underlying algorithms is needed
  • 53. When to fuzzify? ● Questions you need to consider: ○ Which elements in your model are unavoidably vague? ○ How severe and impactful are the disagreements you (expect to) get on the veracity of these vague elements? ○ Are these disagreements caused by vagueness or other factors?
  • 54. When to fuzzify? ● Questions you need to consider: ○ If your model’s elements had fuzzy degrees, would you get less disagreement? ○ Are the applications that use the model able to exploit and benefit from truth degrees? ○ Can you develop a scalable way to get and maintain fuzzy degrees that costs less than the benefits they bring you?
  • 55. How would you tackle this?
  • 56. How would you tackle this?
  • 58. Take Aways ● Data and information quality can be negatively affected by vagueness ○ (Perceived) inaccuracy ○ Disagreements and misinterpretations ○ Reduced semantic interoperability ● Treating vagueness as noise doesn’t help ○ It’s how we think and communicate ○ Insisting on crispness is unproductive ○ But leaving things as-is is also bad ● Three complementary weapons to tackle vagueness ○ Make your data models vagueness-aware ○ Contextualize truth ○ Fuzzify truth
  • 60. Currently writing a book on semantic data modeling, to be published by O’Reilly in September 2020. Early release expected on the O’Reilly Learning Platform in December 2019. To get news about the book’s progress and a free preview chapter, send an email to p.alexopoulos@gmail.com.