
Linked Data, Cultural Heritage & the Karma Mapping Software

Introduction to publishing cultural heritage data in the Linked Open Data cloud, and using Karma to map data to the CIDOC CRM ontology

  1. Linked Data & Cultural Heritage
     Pedro Szekely and Craig Knoblock, USC/Information Sciences Institute
     pszekely@isi.edu, knoblock@isi.edu
     http://isi.edu/integration/karma
     February 2015
  2. Outline
     • Problem
     • Linked Data
     • Karma
     • Reconciliation
     • Next steps
     CC-By 2.0, USC Information Sciences Institute
  3. CURRENT STATE OF CULTURAL HERITAGE DATA
  4. Humans Browsing the Web: Crystal Bridges Museum of American Art, Dallas Museum of Art, Indianapolis Museum of Art, The Metropolitan Museum of Art, National Portrait Gallery, Smithsonian American Art Museum
  5. WHAT WE SEE
  6. WHAT THE COMPUTER SEES: the same page as an undifferentiated wall of text ("blah blah blah ...")
  7. WEB PAGES ARE UNUSABLE FOR CREATING INNOVATIVE APPLICATIONS USING THE DATA
  8. SOLUTION: Linked Open Data, “web pages for computers” that use W3C standards for publishing data
  9. Tim Berners-Lee on Linked Open Data: http://youtu.be/OM6XIICm_qo
  10. Humans Browsing the Web (diagram: the six museums listed in slide 4)
  11. RAW DATA NOW
  12. Publish Your Raw Data (diagram: the six museums listed in slide 4)
  13. Examples of Raw Data Now: https://github.com/cooperhewitt/collection, https://github.com/IMAmuseum/ima-collection
  14. Convert Data to CRM (2 star) (diagram: the six museums listed in slide 4)
  15. Linked Museum Data (3 star) (diagram: the six museums listed in slide 4)
  16. Linked Cultural Heritage Data (4 star)
  17. Represent Resources Using URIs
     http://szekelys.com/family#pedro
     http://xmlns.com/foaf/0.1/firstName
     "Pedro"
  18. Represent Information as Triples
     Subject (the resource being described): http://szekelys.com/family#pedro
     Predicate (a property of the resource): http://xmlns.com/foaf/0.1/firstName
     Object (the value of the property): "Pedro"
  19. RDF Graphs
     http://szekelys.com/family#pedro  foaf:firstName  "Pedro"
     http://szekelys.com/family#pedro  rdf:type        foaf:Person
     http://szekelys.com/family#pedro  foaf:homepage   http://isi.edu/~szekely
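The graph on slides 18 and 19 can be written down directly as data. A minimal sketch in plain Python, assuming the standard expansions of the foaf: and rdf: prefixes, that serializes the slide's three triples as N-Triples; the hand-rolled escaping is only for illustration, and a real pipeline would use an RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PEDRO = "http://szekelys.com/family#pedro"

# The graph from the slide as (subject, predicate, object) tuples.
# URIs are plain strings; literals are ("literal", value) pairs.
graph = [
    (PEDRO, FOAF + "firstName", ("literal", "Pedro")),
    (PEDRO, RDF_TYPE, FOAF + "Person"),
    (PEDRO, FOAF + "homepage", "http://isi.edu/~szekely"),
]

def term(t):
    """Write a URI as <uri> and a literal as a quoted, escaped string."""
    if isinstance(t, tuple):
        value = t[1].replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{value}"'
    return f"<{t}>"

def to_ntriples(triples):
    """N-Triples: one 'subject predicate object .' line per triple."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

print(to_ntriples(graph))
```

The first output line is `<http://szekelys.com/family#pedro> <http://xmlns.com/foaf/0.1/firstName> "Pedro" .`: every term is spelled out in full, which is what makes N-Triples easy to split and stream.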
  20. Linked Open Data
  21. Steps to Create Linked Open Data
  22. Steps to Create Linked Open Data
     • Publish the raw data: get the data out of the proprietary database
     • Select ontologies that define classes and properties for our data
     • Define a URI scheme: identifiers for your resources
     • Convert data to RDF: map from the data sources to the ontologies
     • Identify links to other Linked Data datasets (aka reconciliation, entity resolution, …)
  23. CIDOC CRM: http://www.cidoc-crm.org/
     • Select ontologies that define classes and properties for our data
  24. • Define a URI scheme: identifiers for your resources
  25. • Define a URI scheme: identifiers for your resources. Example URIs:
     http://edan.si.edu/saam/person-institution/8
     http://edan.si.edu/saam/person-institution/8/id
     http://edan.si.edu/saam/person-institution/8/appellation/displayname
     http://edan.si.edu/saam/object/12
     http://edan.si.edu/saam/object/12/title
     http://edan.si.edu/saam/object/12/id
     http://edan.si.edu/saam/object/12/acquisition
     http://edan.si.edu/saam/object/12/production
     http://edan.si.edu/saam/object/12/production/date
     http://edan.si.edu/saam/thesauri/nationality/American
     http://edan.si.edu/saam/thesauri/classification/Photography
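The URIs above follow a regular pattern: a base, an entity type, a stable identifier, and optional sub-paths for intermediate nodes. A minimal sketch of minting them, where `mint_uri` is a hypothetical helper and the base is taken from the slide's examples; each segment is percent-encoded so values containing spaces or slashes cannot break the URI.

```python
from urllib.parse import quote

BASE = "http://edan.si.edu/saam"  # base URI taken from the slide's examples

def mint_uri(*segments):
    """Hypothetical helper: join path segments under BASE, percent-encoding
    each segment so that spaces or slashes in source values stay inside it."""
    return "/".join([BASE] + [quote(str(s), safe="") for s in segments])

# An entity URI, then URIs for intermediate nodes hanging off entities.
print(mint_uri("object", 12))
print(mint_uri("object", 12, "production", "date"))
print(mint_uri("person-institution", 8, "appellation", "displayname"))
print(mint_uri("thesauri", "nationality", "American"))
```

Because the URIs are derived only from stable source identifiers, re-running the conversion yields the same URIs, which is what lets other datasets link to them.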
  26. • Convert data to RDF: map from the data sources to the ontologies
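A minimal sketch of the conversion step for one object record, following the object/{id}, object/{id}/title and object/{id}/production pattern of slide 25. The source field names (`ObjectID`, `Title`) are hypothetical; the CRM names are from the CIDOC CRM versions current in 2015 (later versions rename E22_Man-Made_Object to E22_Human-Made_Object); attaching the title text with rdf:value is one common convention, assumed here. This is hand-written mapping code (the "custom code" option of slide 27), not Karma's mapping format.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://edan.si.edu/saam"  # base URI from the URI-scheme slide

def record_to_crm(record):
    """Map one raw collection record to CIDOC CRM triples.
    Field names are hypothetical; literals are ("literal", value) pairs
    so they stay distinguishable from URIs."""
    obj = f"{BASE}/object/{record['ObjectID']}"
    production = f"{obj}/production"
    triples = [
        (obj, RDF + "type", CRM + "E22_Man-Made_Object"),
        (obj, CRM + "P108i_was_produced_by", production),
        (production, RDF + "type", CRM + "E12_Production"),
    ]
    # The title becomes its own E35_Title node (object/{id}/title);
    # skip it when the source field is missing or empty.
    if record.get("Title"):
        title = f"{obj}/title"
        triples += [
            (obj, CRM + "P102_has_title", title),
            (title, RDF + "type", CRM + "E35_Title"),
            (title, RDF + "value", ("literal", record["Title"])),
        ]
    return triples

# Hypothetical input record.
for triple in record_to_crm({"ObjectID": 12, "Title": "Example title"}):
    print(triple)
```

Even this small record produces six triples and two intermediate nodes, which is why CRM mappings are tedious to write by hand.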
  27. RDF Mapping Tools
     • Custom code. Shortcomings: labor intensive, error prone. Benefits: flexible.
     • R2RML. Shortcomings: difficult to learn, only SQL databases. Benefits: W3C standard, good documentation, multiple vendors.
     • OpenRefine. Shortcomings: no guidance, only tabular data. Benefits: graphical user interface, support for reconciliation, open source.
     • Karma. Shortcomings: university product. Benefits: easy to use, flexible, multiple data formats, multiple deployment databases, scalable, R2RML compatible, open source.
  28. Karma: an interactive tool for rapidly extracting, cleaning, transforming, integrating and publishing linked data in multiple formats
     Diagram labels: XML/JSON, Services, SQL/CSV, BigData, RDF, JSON, …, Ontology
  29. KARMA DEMO: http://youtu.be/h3_yiBhAJIc
  30. Easy To Use
     Karma features: easy to use, flexible, multiple data formats, multiple deployment databases, scalable, R2RML compatible, open source
     CLEAR DEPICTION OF MAPPING
  31. LEARNS TO MAP YOUR DATA
  32. SUGGESTS CORRECT ADJUSTMENTS
  33. EMBEDDED PYTHON SCRIPTING
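Karma's Python transformations read the current row through a `getValue` accessor. In this sketch `getValue` is a stub over a plain dict so the code runs on its own, and the column name `dateOfBirth` is borrowed from the SAAM record on slide 41. The cleaning step itself (zero-padding dates such as "1856-1-12" to ISO 8601) is a hypothetical example, not one taken from the talk.

```python
import re

# Stand-in for Karma's getValue(column) accessor so the sketch runs
# standalone; inside Karma the current row's cell value would be used.
row = {"dateOfBirth": "1856-1-12"}

def getValue(column):
    return row[column]

def transform():
    """Zero-pad month and day so '1856-1-12' becomes '1856-01-12'."""
    value = getValue("dateOfBirth").strip()
    m = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", value)
    if not m:
        return value  # leave year-only or unparseable values untouched
    year, month, day = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"

print(transform())
```

Year-only values such as "1856" pass through unchanged, so the transformation never invents precision the source does not have.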
  34. IMPORT POPULAR DATA FORMATS
  35. OUTPUT RDF IN MULTIPLE FORMATS: N-Triples, JSON, Avro; SPARQL, ElasticSearch, GitHub, …; Hadoop, BigData
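The same triples can be emitted in different shapes. A small sketch, assuming plain-string terms (URIs and literals are not distinguished, to keep it short), that groups the CB record from slide 41 into one JSON document per subject, borrowing JSON-LD's `@id` key. Repeated predicates become lists, so multi-valued properties such as the two recorded birth years survive.

```python
import json
from collections import defaultdict

def group_by_subject(triples):
    """Turn (s, p, o) triples into one JSON-ready dict per subject.
    A predicate with several values becomes a list; a single value stays scalar."""
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        docs[s][p].append(o)
    out = []
    for s, props in docs.items():
        doc = {"@id": s}
        for p, values in props.items():
            doc[p] = values[0] if len(values) == 1 else values
        out.append(doc)
    return out

# The cb:12_4567 record from slide 41.
triples = [
    ("cb:12_4567", "skos:prefLabel", "John Singer Sargent"),
    ("cb:12_4567", "ont0:dateOfBirth", "1879"),
    ("cb:12_4567", "ont0:dateOfBirth", "1885"),
    ("cb:12_4567", "ont0:dateOfDeath", "1925"),
]
print(json.dumps(group_by_subject(triples), indent=2))
```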
  36. 40 million documents, 1 billion triples: larger than all AAC museums combined
  37. Periodic update (every hour, every day) and continuous update as new records come in
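One common way to keep periodic or continuous updates cheap is to re-convert only the records that changed. A sketch under stated assumptions: each source record has a stable `id` field (hypothetical), and a fingerprint table persists between runs. This illustrates the general technique and is not a description of Karma's internals.

```python
import hashlib
import json

def fingerprint(record):
    """Stable hash of a record; keys are sorted so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_records(records, seen):
    """Yield records that are new or modified since the last run.
    `seen` maps record id -> fingerprint and is updated in place."""
    for rec in records:
        h = fingerprint(rec)
        if seen.get(rec["id"]) != h:
            seen[rec["id"]] = h
            yield rec

# Hypothetical records: the first run sees both, the second run sees neither.
seen = {}
batch = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
print(len(list(changed_records(batch, seen))))
print(len(list(changed_records(batch, seen))))
```

Deletions are not detected by this sketch; a full solution would also compare the set of ids seen in each run.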
  38. Karma is compatible with R2RML tools
  39. Karma is open source
  40. URI RECONCILIATION
  41. Multiple “John Singer Sargent”
     ima:Singer_Sargent_John a aac:Person ;
         dct:date "1856-1925" ;
         foaf:name "John Singer Sargent" .
     saam:person_4253 a aac:Person ;
         saam:associatedPlace saam:SaamPlace_1357324439768t1r13950_0,
             saam:SaamPlace_1357324439768t1r13951_0 ;
         saam:constituentId "4253" ;
         rdaGr2:biographicalInformation "Painter. Sargent traveled …" ;
         rdaGr2:dateAssociatedWithThePerson "1990-10-1", "1995-5-8" ;
         rdaGr2:dateOfBirth "1856-1-12" ;
         rdaGr2:dateOfDeath "1925-4-15" ;
         rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ;
         rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ;
         skos:altLabel "John S. Sargent" ;
         skos:prefLabel "John Singer Sargent" .
     cb:12_4567 a aac:Person ;
         ont0:dateOfBirth "1879", "1885" ;
         ont0:dateOfDeath "1925" ;
         skos:prefLabel "John Singer Sargent" .
     met:person_1893_3819 a aac:Person ;
         ont0:placeOfResidence "North and Central America", "United States" ;
         foaf:name "John Singer Sargent" .
     dma:person_John_Singer_Sargent a aac:Person ;
         ont0:dateOfBirth "1856" ;
         ont0:dateOfDeath "1925" ;
         foaf:name "John Singer Sargent" .
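Deciding which of these records describe the same person is the reconciliation problem. The sketch below is a deliberately simple, hypothetical heuristic, not Karma's matching algorithm: require identical normalized names, then measure how much the years each record mentions overlap (Jaccard similarity). The records are transcribed from the slide; the Met record carries no dates, so dates can neither confirm nor rule it out.

```python
import re

def norm_name(name):
    """Lowercase and drop punctuation: 'John S. Sargent' -> 'john s sargent'."""
    return " ".join(re.findall(r"[a-z]+", name.lower()))

def years(dates):
    """Four-digit years found in date strings such as '1856-1-12' or '1856-1925'."""
    return {int(y) for d in dates for y in re.findall(r"\d{4}", d)}

def date_agreement(a, b):
    """Jaccard overlap of the years two records mention; None if either has none."""
    ya, yb = years(a["dates"]), years(b["dates"])
    if not ya or not yb:
        return None
    return len(ya & yb) / len(ya | yb)

def is_candidate(a, b):
    """Same normalized name is required; dates, when present, must overlap."""
    if norm_name(a["name"]) != norm_name(b["name"]):
        return False
    agreement = date_agreement(a, b)
    return agreement is None or agreement > 0

records = {
    "ima:Singer_Sargent_John": {"name": "John Singer Sargent", "dates": ["1856-1925"]},
    "saam:person_4253": {"name": "John Singer Sargent", "dates": ["1856-1-12", "1925-4-15"]},
    "cb:12_4567": {"name": "John Singer Sargent", "dates": ["1879", "1885", "1925"]},
    "met:person_1893_3819": {"name": "John Singer Sargent", "dates": []},
    "dma:person_John_Singer_Sargent": {"name": "John Singer Sargent", "dates": ["1856", "1925"]},
}

anchor = records["saam:person_4253"]
for uri, rec in records.items():
    if uri != "saam:person_4253":
        print(uri, is_candidate(anchor, rec), date_agreement(anchor, rec))
```

On this data the CB record's conflicting birth years lower its date agreement with the SAAM record to 0.25 (only 1925 is shared), the kind of case worth flagging for human review rather than linking blindly.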
  42. John Singer Sargent
     ima:SaamPerson_John_Singer_Sargent a aac:Person ;
         dct:date "1856-1925" ;
         foaf:name "John Singer Sargent" .
     aac:Person_4253 a aac:Person ;
         saam:associatedPlace saam:SaamPlace_1357324439768t1r13950_0,
             saam:SaamPlace_1357324439768t1r13951_0 ;
         saam:constituentId "4253" ;
         rdaGr2:biographicalInformation "Painter. Sargent traveled …" ;
         rdaGr2:dateAssociatedWithThePerson "1990-10-1", "1995-5-8" ;
         rdaGr2:dateOfBirth "1856-1-12" ;
         rdaGr2:dateOfDeath "1925-4-15" ;
         rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ;
         rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ;
         skos:altLabel "John S. Sargent" ;
         skos:prefLabel "John Singer Sargent" .
     cb:SaamPerson_John_Singer_Sargent a aac:Person ;
         ont0:dateOfBirth "1879", "1885" ;
         ont0:dateOfDeath "1925" ;
         skos:prefLabel "John Singer Sargent" .
     met:SaamPerson_John_Singer_Sargent a aac:Person ;
         ont0:placeOfResidence "North and Central America", "United States" ;
         foaf:name "John Singer Sargent" .
     dallas:SaamPerson_John_Singer_Sargent a aac:Person ;
         ont0:dateOfBirth "1856" ;
         ont0:dateOfDeath "1925" ;
         foaf:name "John Singer Sargent" .
  43. Reconciled “John Singer Sargent” URIs
     saam:person_4253
         owl:sameAs cb:12_4567 ;
         owl:sameAs dma:person_John_Singer_Sargent ;
         owl:sameAs ima:Singer_Sargent_John ;
         owl:sameAs met:SaamPerson_John_Singer_Sargent ;
         owl:sameAs dbpedia:John_Singer_Sargent ;
         owl:sameAs nytimes/N49129220686803623753 ;
         owl:sameAs w-flick/John_Singer_Sargent ;
         ... .
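owl:sameAs is symmetric and transitive, so sameAs links partition URIs into clusters that all denote the same entity, namely the connected components of the link graph. A minimal union-find sketch that computes those clusters. The Sargent links are taken from the slide, with prefixed names kept as opaque strings; the `ex:` links are hypothetical and only show that a chain a to b to c lands in one cluster even though a and c are never linked directly.

```python
def same_as_clusters(links):
    """Group URIs connected by owl:sameAs links into clusters (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())

links = [
    ("saam:person_4253", "cb:12_4567"),
    ("saam:person_4253", "dma:person_John_Singer_Sargent"),
    ("saam:person_4253", "ima:Singer_Sargent_John"),
    ("saam:person_4253", "dbpedia:John_Singer_Sargent"),
    # Hypothetical chain: ex:a and ex:c are never linked directly.
    ("ex:a", "ex:b"),
    ("ex:b", "ex:c"),
]
for cluster in same_as_clusters(links):
    print(sorted(cluster))
```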
  44. URI Reconciliation in Karma
  45. Results of Automatic Linking: 99% are correct, 6% are missing
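Reading "99% are correct" as precision $P$ and "6% are missing" as the share of true links the linker did not find (so recall $R = 94\%$) is an interpretation of the slide rather than something it states. Under that reading the usual combined score works out as:

```latex
P = 0.99, \qquad R = 1 - 0.06 = 0.94, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2(0.99)(0.94)}{0.99 + 0.94} = \frac{1.8612}{1.93} \approx 0.964
```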
  46. Steps to Create Linked Open Data (the same five steps as slide 22)
  47. TMS to CRM easy?
  48. TMS to CRM easy? NO
  49. COMMUNITY EFFORT
     • Publish the raw data: get the data out of the proprietary database
     • Select ontologies that define classes and properties for our data
     • Define a URI scheme: identifiers for your resources
     • Convert data to RDF: map from the data sources to the ontologies
     • Identify links to other Linked Data datasets (aka reconciliation, entity resolution, …)
  50. Radical Ideas
     • ULAN in Wikipedia or Wikidata
     • ULAN in GitHub
     • Collection data in GitHub
     • Community-created CRM mappings in GitHub
     • CRM in JSON-LD in GitHub
     • Tools to export from TMS to GitHub
  51. STORING AND MAINTAINING THE DATA
  52. Deployment Options
     • SPARQL endpoint. Shortcomings: low reliability, esoteric, slow. Benefits: sophisticated query language.
     • RDF dump. Shortcomings: no query capability, esoteric. Benefits: flexibility (clients can download the data and use it in applications), easy to publish.
     • JSON-LD + ElasticSearch. Shortcomings: restricted query language. Benefits: very high performance, mainstream technology, easy to publish.
     Karma supports all three options.
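For the JSON-LD + ElasticSearch option, documents are typically loaded through Elasticsearch's `_bulk` API, whose request body is newline-delimited JSON: an action line followed by the document, for each document, with a trailing newline. A minimal sketch that only builds that body (sending the HTTP request is left out), using the person from slide 19 as a JSON-LD document; the index name "people" is hypothetical.

```python
import json

def bulk_body(docs, index):
    """Build an Elasticsearch _bulk body: for each document an action line
    ({"index": ...}) and then the document, one JSON object per line.
    The body must end with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["@id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [{
    "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
    "@id": "http://szekelys.com/family#pedro",
    "@type": "foaf:Person",
    "foaf:firstName": "Pedro",
}]
print(bulk_body(docs, "people"))  # index name is hypothetical
```

Using the resource URI as the Elasticsearch `_id` means re-publishing a record overwrites its previous version instead of creating a duplicate.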
  53. Federation: everyone publishes their data with their own URIs
     Aggregation: an aggregator republishes everyone's data with new URIs
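In the aggregation model the aggregator mints its own URIs. A minimal sketch, assuming reconciled clusters (as on slide 43) as input and a hypothetical aggregator namespace: one new URI per cluster, with an owl:sameAs link back to every publisher's original URI so the federated identifiers are not lost. Deriving the key from the alphabetically first member keeps re-runs stable, but the key changes if a URI that sorts earlier joins the cluster later; a persistent ID table would avoid that.

```python
import hashlib

AGG_BASE = "http://aggregator.example.org/person/"  # hypothetical namespace
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def aggregate(clusters):
    """Mint one aggregator URI per cluster of reconciled URIs and link it
    to every original publisher URI with owl:sameAs."""
    triples = []
    for cluster in clusters:
        members = sorted(cluster)
        key = hashlib.sha1(members[0].encode("utf-8")).hexdigest()[:12]
        new_uri = AGG_BASE + key
        triples += [(new_uri, OWL_SAME_AS, uri) for uri in members]
    return triples

sargent = {"saam:person_4253", "cb:12_4567",
           "dma:person_John_Singer_Sargent", "ima:Singer_Sargent_John"}
for triple in aggregate([sargent]):
    print(triple)
```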
  54. Thanks for your attention! https://github.com/usc-isi-i2/Web-Karma (open source, Apache 2 License)
