Presentation Title:
owl:sameAs Considered Harmful to Provenance
Presentation Abstract:
GOTO was once a standard operation in most computer programming languages. Edsger Dijkstra argued in 1968 that GOTO is a low-level operation that is not appropriate for higher-level programming languages, and advocated structured programming in its place. Arguably, owl:sameAs in its current usage may be poised to go through a similar discussion and transformation period. In biomedical research, the provenance of information gathered is nearly as important as, and sometimes even more important than, the information itself. owl:sameAs allows someone to state that two separate descriptions really refer to the same entity. Currently that means that operational systems merge the descriptions and, at the same time, merge the provenance information, thus losing the ability to retrieve where each individual description came from. This merging of provenance can be problematic or even catastrophic in biomedical applications that demand access to provenance information. Based on our knowledge of data integration issues in biomedicine, we give examples of this issue as use cases in biospecimen management and experimental metadata representations. We suggest that systems using any construct like owl:sameAs must provide an option to preserve the provenance of the entities in question and of the ground assertions related to those entities.
1. owl:sameAs Considered Harmful to Provenance
James McCusker and Deborah L. McGuinness
Tetherless World Constellation, Rensselaer Polytechnic Institute
http://tw.rpi.edu
11. Background
For science, provenance is:
The origin or source from which something comes,
intention for use,
who/what generated it,
manner of manufacture,
history of subsequent owners,
sense of place and time of manufacture,
and production or discovery,
all in sufficient detail for reproducibility.
17. Provenance of this Talk
• Discussions from the International Semantic Web Conference (ISWC) Workshop on the Role of Semantic Web in Provenance Management
• Pat Hayes’ ISWC talk on Blogic
• Web Science 2010 Talk: An Empirical Study of owl:sameAs Use in Linked Data, Ding et al.
• Semantic web data warehousing for caGrid, McCusker et al.
• Other discussions about sameAs.
21. SameAs and Provenance collide in experiments
A scientist has Datasets D and E, and wants to link them, but D and E refer to different instances of the same cell line.
• Patient A: Visit Date 7/8/09, DOB 2/3/45, Dx Melanoma
• Specimen T: Type Tumor, Created on 7/8/09, Quantity 5 g; derived from Patient A
• Specimen LB: Type Cell Line, Created on 8/31/09, Quantity 5 g, Passage 0; derived from Specimen T; used by Dataset D
• Specimen LA: Type Cell Line, Created on 9/20/09, Quantity 10 g, Passage 10; derived from Specimen LB; used by Dataset E
22. SameAs and Provenance collide in experiments
The natural inclination is to state (LA owl:sameAs LB). Then D and E can refer to the same specimens.
• Patient A: Visit Date 7/8/09, DOB 2/3/45, Dx Melanoma
• Specimen T: Type Tumor, Created on 7/8/09, Quantity 5 g; derived from Patient A
• Specimen LB: Type Cell Line, Created on 8/31/09, Quantity 5 g, Passage 0; derived from Specimen T; used by Dataset D
• Specimen LA: Type Cell Line, Created on 9/20/09, Quantity 10 g, Passage 10; derived from Specimen LB; used by Dataset E
• Specimen LA owl:sameAs Specimen LB
23. SameAs and Provenance collide in experiments
Oops! Now the specimen has multiple values for some important properties, and LA appears to have been derived from itself.
• Patient A: Visit Date 7/8/09, DOB 2/3/45, Dx Melanoma
• Specimen T: Type Tumor, Created on 7/8/09, Quantity 5 g; derived from Patient A
• Specimen LB (owl:sameAs Specimen LA): Type Cell Line, Created on 8/31 or 9/20, Quantity 5 or 10 g, Passage 0 or 10; derived from Specimen T and from itself; used by Dataset D and Dataset E
• Specimen LA (owl:sameAs Specimen LB): Type Cell Line, Created on 9/20 or 8/31, Quantity 10 or 5 g, Passage 10 or 0; derived from Specimen T and from itself; used by Dataset D and Dataset E
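The collision on slides 22 and 23 can be reproduced in a few lines. This is a minimal sketch, not the behavior of any particular reasoner or triple store: it assumes a plain list of triples, made-up property names (createdOn, quantity, passage, derivedFrom, used), and a naive merge that rewrites every mention of LB as LA.

```python
from collections import defaultdict

# Values come from the slide's diagram; the property names are hypothetical.
triples = [
    ("LB", "type", "Cell Line"),
    ("LB", "createdOn", "8/31/09"),
    ("LB", "quantity", "5 g"),
    ("LB", "passage", "0"),
    ("LB", "derivedFrom", "T"),
    ("LA", "type", "Cell Line"),
    ("LA", "createdOn", "9/20/09"),
    ("LA", "quantity", "10 g"),
    ("LA", "passage", "10"),
    ("LA", "derivedFrom", "LB"),
    ("DatasetD", "used", "LB"),
    ("DatasetE", "used", "LA"),
]


def merge_same_as(triples, keep, drop):
    """Naively apply (keep owl:sameAs drop): rewrite every mention of drop as keep."""
    def canon(node):
        return keep if node == drop else node
    return {(canon(s), p, canon(o)) for s, p, o in triples}


def describe(triples, subject):
    """Collect the set of values asserted for each property of one subject."""
    values = defaultdict(set)
    for s, p, o in triples:
        if s == subject:
            values[p].add(o)
    return dict(values)


merged = merge_same_as(triples, keep="LA", drop="LB")
description = describe(merged, "LA")
for prop in sorted(description):
    print(prop, sorted(description[prop]))
```

After the merge, LA carries two creation dates, two quantities and two passage numbers, it is listed as derived from itself, and nothing in the merged set records which original description each value came from.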
31. Now try to answer:
• Experiment Analysis:
– The data doesn't look right. What were the methods and protocols, and how consistent were they, going back to surgical resection?
– Did the “same cell line” actually come from the same tumor, or just from the same patient? Or even different patients?
– What originally seemed to be a primary breast cancer or lung cancer is now a metastasized melanoma. Now what?
• Biospecimen Management:
– Is a histology slide made from a tumor the same as the tumor? What about the tissue microarray, the cell culture, or the isolated molecular material?
None of this is important, until it turns out to be.
37. Issues and Requirements
• owl:sameAs is a powerful construct.
• Need an alternative way of representing a portion of the owl:sameAs relationship.
• Could be something that is possibly weaker or decomposable.
• Could be a domain-specific best practice modeling option.
• We need to understand what values came from where.
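One way to read the last requirement is that every statement should keep a pointer to the description that asserted it. The sketch below is only an illustration under that assumption: the four-part statements, the property names, and the source labels ("record-for-LB", "record-for-LA") are hypothetical and not part of the talk.

```python
from collections import defaultdict

# Each statement is (subject, property, value, source); the source labels are hypothetical.
quads = [
    ("LB", "passage", "0", "record-for-LB"),
    ("LB", "quantity", "5 g", "record-for-LB"),
    ("LA", "passage", "10", "record-for-LA"),
    ("LA", "quantity", "10 g", "record-for-LA"),
]


def values_with_sources(quads, subjects, prop):
    """Return {value: set of sources} for prop across a group of linked subjects."""
    found = defaultdict(set)
    for s, p, o, source in quads:
        if s in subjects and p == prop:
            found[o].add(source)
    return dict(found)


# Even when LA and LB are treated as a group, each value stays tied to its origin.
passages = values_with_sources(quads, {"LA", "LB"}, "passage")
for value in sorted(passages):
    print(value, sorted(passages[value]))
```

The two passage numbers still show up together, but each one can be traced back to the record that stated it.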
46. Possible Fixes
• Deprecate owl:sameAs? X
• Weaken owl:sameAs? X
• Less liberal use of owl:sameAs?
• A domain-literate modeling of a weakened notion of owl:sameAs?
• A transitive, reflexive version of skos:exactMatch?
• ?x skos:exactMatch ?y.
  ?y propertyOfInterest ?value.
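The query pattern in the last bullet can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the match relation is treated as reflexive, symmetric and transitive, the property names are made up, and the helper names match_closure and property_of_interest are hypothetical.

```python
def match_closure(pairs):
    """Map each individual to its match group (reflexive, symmetric, transitive)."""
    groups = {}
    for a, b in pairs:
        merged = groups.get(a, {a}) | groups.get(b, {b})
        for member in merged:
            groups[member] = merged
    return groups


def property_of_interest(x, prop, matches, triples):
    """Evaluate '?x exactMatch ?y . ?y prop ?value' and return (y, value) pairs."""
    group = matches.get(x, {x})  # an unmatched individual still matches itself
    return sorted((s, o) for s, p, o in triples if s in group and p == prop)


# Values from the slide's diagram; the property names are hypothetical.
triples = [
    ("LB", "passage", "0"),
    ("LA", "passage", "10"),
    ("LB", "derivedFrom", "T"),
    ("LA", "derivedFrom", "LB"),
]
matches = match_closure([("LA", "LB")])

result = property_of_interest("LA", "passage", matches, triples)
print(result)
```

Unlike a merge, the result keeps LA and LB as separate individuals: asking about LA returns the passage numbers of both specimens, each paired with the specimen it was stated for, and LA is still derived from LB rather than from itself.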
52. Conclusions
• Provenance is critical for understanding
linked data in scientific applications.
• Using owl:sameAs can result in the
confusion of provenance and ground
truths.
• We are exploring some of the potential
solutions.
53. Acknowledgements & References
• Tetherless World Constellation: Jim Hendler, Deborah McGuinness, Peter Fox, Li Ding, and the rest.
• Carole Goble (for the title)
• P. Hayes, “Blogic,” International Semantic Web Conference, 2009. http://www.slideshare.net/PatHayes/blogic-iswc-2009-invited-talk
• L. Ding, J. Shinavier, T. Finin, and D. L. McGuinness, “An Empirical Study of owl:sameAs Use in Linked Data,” Web Science 2010. http://tw.rpi.edu/wiki/An_Empirical_Study_of_owl:sameAs_Use_in_Linked_Data
• L. Moreau, “The Foundations for Provenance on the Web,” Nov. 2009. http://eprints.ecs.soton.ac.uk/18176
• J. McCusker, J. Phillips, A. Beltran, A. Finkelstein, and M. Krauthammer, “Semantic web data warehousing for caGrid,” BMC Bioinformatics, vol. 10, 2009, p. S2.
57. skos:exactMatch Example
select ?ds ?dx
where {
  ?ds used ?spec.
  ?spec matches ?x.
  ?x diagnosis ?dx.
}
• Patient A: Visit Date 7/8/09, DOB 2/3/45, Dx Melanoma
• Specimen T: Type Tumor, Created on 7/8/09, Quantity 5 g; derived from Patient A
• Specimen LB: Type Cell Line, Created on 8/31/09, Quantity 5 g, Passage 0; derived from Specimen T; used by Dataset D
• Specimen LA: Type Cell Line, Created on 9/20/09, Quantity 10 g, Passage 10; derived from Specimen LB; used by Dataset E
• Specimen LB matches Specimen LA, and Specimen LA matches Specimen LB
Editor's Notes
There is a growing appreciation in the Linked Data community for owl:sameAs, but a growing apprehension and concern for it in the provenance and reasoning communities.
What is passage?
DON’T RAMBLE!!!
Any time a particular cell line is mentioned in an experiment, it is the “same as” itself, because we want to compare, for instance, kinase phosphorylation with gene expression.
A cell line is an abstract concept. We both have “YUMAC” cells, but I have many different colonies of YUMAC, since I have been growing them for a while.
In fact, I sent you one of my colonies so you can do other research on them.
The provenance of one colony shouldn't affect that of the others. Each biospecimen needs its own provenance trace.
Better explanation for the breast cancer to melanoma issue.
DON’T RAMBLE!!!
owl:sameAs allows reasoners to infer useful equality relationships.
DON’T RAMBLE!!!
Deprecate owl:sameAs?
Maybe, but we need an alternative. owl:sameAs is very useful for linked data.
Weaken owl:sameAs?
This would disrupt the semantics that are relied on in existing applications.
A transitive, reflexive skos:exactMatch?
Would provide a list of individuals that can be treated the same as the original individuals, while still distinguishing among them.