
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan


The linked data paradigm makes it possible for any data to link, or be linked, to structured information, both internally and externally. To improve the current cultural service of the Union Catalog of Digital Archives Taiwan (catalog.digitalarchives.tw), we developed a linked data prototype that extends the Art & Architecture Thesaurus (AAT) to support a machine-understandable catalog service.
However, knowledge engineering is time- and labor-consuming, especially for an archive that is non-Western in culture and multidisciplinary in nature. This makes mapping the data semantics of the Union Catalog (UCdaT) to international standards and vocabularies extremely challenging.
At this stage, the triple store is an experimental addition to the existing Union Catalog of Digital Archives Taiwan architecture; it provides semantic links to target collections for related suggestions. This will guide us in creating a future technical architecture that scales to the whole archive, follows learning-by-doing guidelines, and preserves even data that is difficult to understand fully at present, so that it can at least be linked by others who may contribute their own third-party understanding for their own reuse.

Published in: Technology


  1. A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan. Museum Computing: An Approach to Bridging Cultures, Communities and Science. The 21st PNC Annual Conference and Joint Meetings, October 21-23, 2014, National Palace Museum, Taipei, Taiwan. Keh-Jiann Chen, Tyng-Ruey Chuang, Andrea Wei-Ching Huang, Chung-Hsi Hung, and Wan-Jung Shu. Institute of Information Science, Academia Sinica, Taipei, Taiwan. The corresponding author is Andrea Wei-Ching Huang at {andreahg}@iis.sinica.edu.tw
  2. Outline: 1. Introduction & Motivation; 2. Digital Archives Thesaurus (dat); 3. A Chinese Bottle in the Prototype; 4. The dat Ontology & Prototype System; 5. Conclusion & Future Works; 6. Reference
  3. Speaker (1): Wan-Jung Shu. Introduction / Motivation: Union Catalog of Digital Archives Taiwan; Why linked data; Datasets we use for the experiment. Digital Archives Thesaurus (dat): Overview; AAT hierarchy adaption; Disambiguation skill.
  4. Introduction & Motivation. 1. Union Catalog of Digital Archives Taiwan: collections from more than 12 institutions. 17 topic subjects: 1. Anthropology; 2. Archaeology; 3. Archeology; 4. Archives; 5. Biology; 6. Chinese Artifacts; 7. Chinese Paintings & Calligraphy; 8. Full Text of Rare Chinese Books; 9. Geology; 10. Language; 11. Map & Remote Sensing; 12. Multimedia; 13. News; 14. Rare & Manuscript Collections; 15. Research Reusing; 16. Resource Integration for Applications; 17. Stone Rubbing. Metadata (DC 15 elements): 5,214,602. Image: 4,032,112. Audio & Video Media: 48,591. Institutions: Academia Sinica; Academia Historica; National Museum of Natural Science; National Central Library; National Taiwan University; National Palace Museum; Taiwan Historica; National Museum of History; Chinese Taipei Film Archive; Hakka Affairs Council; National Archives Administration; Council of Indigenous Peoples; Open Requests for Proposals Projects; …
  5. Introduction & Motivation: Why Linked (Open) Data? Reason 1: International Trends. Catalogs in Web Context: need to be open; need to be linkable; need to provide links; must be part of the network; cannot be an end in itself; allow for hackability. Commonsense Cataloging: a 2014 survey indicates that over 36.6% of keywords in Google search results include Schema snippets; pages using schema.org markup have higher Google rankings; library users visit sites such as Google, Wikipedia and social networks daily. Modernize Catalogs: improve visibility, discoverability and findability; link outside the catalog; share metadata; move from a document-based model to a data-centric description model (e.g. MARC-based to BIBFRAME). MARC, MARC 21, BIBFRAME: For Linking, For Sharing, For Finding.
  6. Introduction & Motivation: Why Linked (Open) Data? Reason 2: Semantic value added for data. Data Semantics: Thesaurus, Vocabulary, Ontology. What to wear depends on what applications need. Old Data, New Meaning, New Value.
  7. Introduction & Motivation: Datasets we use for this prototype experiment. 1) For Digital Archives Thesaurus: Chinese Artifacts: 32,044; Concepts: 1,667. 2) For Linked Data Prototype: 5 sub-categories of the Chinese Artifacts / 25 examples; No. of Concepts: 167; No. of Triples: 225 … Chinese Artifacts sub-categories: bamboo/wood lacquerware; ceramic artifacts; enamelware and glass artifacts; jade/stone artifacts; metal artifacts.
  8. Digital Archives Thesaurus: overview. Chinese Art and Artifact Subsets: [concepts and guide terms: 3,088] / [terms: 4,538]. Digital Archive Thesaurus: Concept 1, Concept 2, Concept N; Term 1, Term 2, Term 3, Term 4, Term 5, Term n. Inputs: Union Catalog Keyword Dictionary (over 100,000 keywords); AAT hierarchy adaption; related terms of Chinese Artifacts. Source of related terms: art dictionaries, textbooks, journal papers.
  9. Digital Archives Thesaurus: AAT hierarchy adaption. Chinese Art and Artifact Subsets: [concepts and guide terms: 3,088] / [terms: 4,538]. Contribution to AAT. Equivalence relation: AAT, dat.
  10. Digital Archives Thesaurus: knowledge extraction from Chinese text. CKIP segmentation process: 銀鍍 金 纍絲 點翠 珠寶 花蝶 簪. Term extraction; tagged term; Digital Archive Thesaurus.
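The extraction step on this slide can be sketched as a lookup of segmented tokens against thesaurus terms. This is a minimal Python sketch under stated assumptions, not the project's implementation: the `THESAURUS_TERMS` subset and the function name `extract_terms` are hypothetical, and the token list is the segmented title shown on the slide.

```python
# Hypothetical subset of thesaurus terms; the real dat term list is not
# given in the slides.
THESAURUS_TERMS = {"纍絲", "點翠", "簪"}


def extract_terms(tokens, thesaurus_terms):
    """Return the tokens that match a thesaurus term, in title order."""
    return [token for token in tokens if token in thesaurus_terms]


# Segmented title from the slide (the kind of output a word segmenter
# such as CKIP produces).
tokens = ["銀鍍", "金", "纍絲", "點翠", "珠寶", "花蝶", "簪"]
tagged_terms = extract_terms(tokens, THESAURUS_TERMS)
print(tagged_terms)
```

A plain set lookup is enough for exact matches; ambiguous tokens such as 金 would still need the disambiguation rules described on the following slides.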
  11. Digital Archives Thesaurus: concept-terms-object. Concept N; tag n. 洋彩 瓷胎洋彩 瓶 (bottle) 紙槌瓶 蕉葉紋 番蓮紋 開光 內填琺瑯 (champlevé) 如意雲紋 磁胎 銅胎 錦地
  12. Digital Archives Thesaurus: disambiguation. Disambiguation skills: homograph distinguished by prefix; subject restriction; DC elements restriction. Example of ambiguity: 金 in Chinese may represent Metal (material), Gold (material), Golden (color), or Jin Dynasty (styles and periods).
  13. Digital Archives Thesaurus: Disambiguation, Part I. Homograph distinguished by prefix: 青花 (blue-and-white porcelain) as a type of object; 青花 (ching-hwa glaze) as glazing material. 青花 with a prefix character 以, 用 or 由 (meaning 'use') → use ching-hwa glaze.
  14. Digital Archives Thesaurus: Disambiguation, Part II. Subject restriction: 琉璃 as a kind of glazed pottery is tagged in the 'Pottery' category; 琉璃 as glass material is tagged in the 'Enamel and Glassware' category.
  15. Digital Archives Thesaurus: Disambiguation, Part III. DC element restriction: 1. DC elements used: title, type, date, subject, description. 2. Example in title: an object type term (簪 = hairpin) must be the last word of the title.
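The three disambiguation rules above can be sketched as small functions. This is an illustrative Python sketch only: the function names and the returned sense labels are hypothetical, the input strings in the usage note are made up for illustration (except the hairpin title, which comes from the data list), and the real system presumably works on CKIP-segmented text rather than raw strings.

```python
# Prefix characters meaning "use", as listed on the slide.
USE_PREFIXES = ("以", "用", "由")


def sense_of_qinghua(text):
    """Prefix rule: 青花 directly preceded by 以/用/由 is the glaze material."""
    index = text.find("青花")
    if index == -1:
        return None  # the term does not occur in the text
    if index > 0 and text[index - 1] in USE_PREFIXES:
        return "glazing material"
    return "object type"


def sense_of_liuli(category):
    """Subject restriction: the record's category selects the sense of 琉璃."""
    senses = {
        "Pottery": "glazed pottery",
        "Enamel and Glassware": "glass material",
    }
    return senses.get(category)  # None for any other category


def is_object_type_in_title(title, term):
    """DC element restriction: an object type term must end the title."""
    return title.endswith(term)
```

For example, `sense_of_qinghua("用青花描繪")` takes the glaze reading because 用 precedes the term, while `is_object_type_in_title("清銀鍍金纍絲點翠嵌珠寶花蝶簪", "簪")` accepts 簪 as the object type because it closes the title.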
  16. Speaker (2): Andrea Wei-Ching Huang. A Prototype System: framework overview. How is a Chinese Bottle semantically represented in RDF triples? A beta dat ontology: domain knowledge representation of the Chinese Artifacts; descriptions for curation and publication of the Artifacts; data reusing and the use of the R4R Ontology.
  17. A Chinese Bottle in the Prototype. 作者不詳(-)。[銅琺瑯方瓶]。《數位典藏與數位學習聯合目錄》。 http://catalog.digitalarchives.tw/item/00/30/e5/f1.html

      @prefix prv: <http://purl.org/net/provenance/ns#> .
      @prefix dc: <http://purl.org/dc/terms/> .
      @prefix r4r: <http://guava.iis.sinica.edu.tw/r4r/> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix sp: <http://spinrdf.org/sp#> .
      @prefix xhtml: <http://www.w3.org/1999/xhtml/vocab/#> .
      @prefix void: <http://rdfs.org/ns/void#> .
      @prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
      @prefix dat: <http://dat.digitalarchives.tw/ontology.html#> .
      @prefix schema: <http://schema.org/> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix dcat: <http://www.w3.org/ns/dcat#> .
      @prefix prvTypes: <http://purl.org/net/provenance/types#> .
      @prefix dct: <http://purl.org/dc/terms/> .
      @prefix d2r: <http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/config.rdf#> .
      @prefix aat: <http://vocab.getty.edu/aat/> .
      @prefix owl: <http://www.w3.org/2002/07/owl#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix map: <http://dat.digitalarchives.tw/resource/#> .
      @prefix dbpedia: <http://dbpedia.org/resource/> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix uc: <http://dat.digitalarchives.tw/resource/uc/> .
      @prefix doap: <http://usefulinc.com/ns/doap#> .

      <http://dat.digitalarchives.tw/data/Artifact/3204593>
          a foaf:Document , prv:DataItem ;
          dct:date "2014-11-24T06:55:37.329Z"^^xsd:dateTime ;
          prv:containedBy <http://dat.digitalarchives.tw/dataset> ;
          void:inDataset <http://dat.digitalarchives.tw/dataset> ;
          foaf:primaryTopic <http://dat.digitalarchives.tw/resource/Artifact/3204593> .

      <http://dat.digitalarchives.tw/resource/Artifact/3204593>
          rdfs:isDefinedBy <http://dat.digitalarchives.tw/data/Artifact/3204593> ;
          dat:artifactType <http://vocab.getty.edu/aat/300010898> ,
              <http://dat.digitalarchives.tw/Concept/800000632> ;
          dat:componentForm <http://dat.digitalarchives.tw/Concept/800001205> ,
              <http://dat.digitalarchives.tw/Concept/800001103> ,
              <http://dat.digitalarchives.tw/Concept/800000915> ,
              <http://dat.digitalarchives.tw/Concept/800000886> ,
              <http://dat.digitalarchives.tw/Concept/800000913> ;
          dat:decorationSubject <http://dat.digitalarchives.tw/Concept/800000295> ;
          r4r:hasProvenance <http://trdf.sourceforge.net/provenance/ns.html#DataCreation> ;
          r4r:isPartOf <http://dat.digitalarchives.tw/data/Dataset/10000001> ;
          dct:created "unavailable " ;
          dct:instructionalMethod <http://vocab.getty.edu/aat/300053778> ;
          dct:title "銅琺瑯方瓶" ;
          schema:url <http://catalog.digitalarchives.tw/item/00/30/e5/f1.html> ;
          foaf:page <http://dat.digitalarchives.tw/page/Artifact/3204593> .
  18. http://dat.digitalarchives.tw/concept/800000295
  19. Search Concept (800000295: Shanshui): the result is one artifact, numbered 3204593.
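The concept search on this slide amounts to finding every subject whose triples point at the concept URI. The sketch below illustrates this in plain Python over three triples copied from the bottle example on slide 17; it is an in-memory stand-in rather than the prototype's triple store, and the function name `artifacts_for_concept` is hypothetical.

```python
DAT = "http://dat.digitalarchives.tw/ontology.html#"
ARTIFACT = "http://dat.digitalarchives.tw/resource/Artifact/3204593"
SHANSHUI = "http://dat.digitalarchives.tw/Concept/800000295"

# (subject, predicate, object) triples taken from the slide 17 example.
TRIPLES = [
    (ARTIFACT, DAT + "artifactType", "http://vocab.getty.edu/aat/300010898"),
    (ARTIFACT, DAT + "artifactType",
     "http://dat.digitalarchives.tw/Concept/800000632"),
    (ARTIFACT, DAT + "decorationSubject", SHANSHUI),
]


def artifacts_for_concept(triples, concept_uri):
    """Return the sorted, de-duplicated subjects linked to concept_uri."""
    return sorted({s for s, _p, o in triples if o == concept_uri})


print(artifacts_for_concept(TRIPLES, SHANSHUI))
```

Against a real SPARQL endpoint the same lookup would be a single triple pattern with the concept URI in the object position.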
  20. How is this Chinese Bottle semantically represented in RDF triples through our prototype?
  21. The Prototype - I: Overview of a Linked Data Prototype System using dat (Digital Archives Thesaurus) & dat Ontology. Describing and representing, for publishing, the concept relations between the Chinese Artifacts of the Digital Archives Taiwan and the Digital Archives Thesaurus. Components: Union Catalog Metadata; Digital Archives Thesaurus; the dat beta ontology; Chinese Artifacts Relational Database; Semantic Browsing. The dat concept making process: Chinese Knowledge and Information Processing (CKIP) Chinese Word Segmentation System; Segmented Keyword List; Keyword Extraction; Tag Extensions; Binary Relation.
  22. Before the Prototype. Union Catalog Metadata; Chinese Knowledge and Information Processing (CKIP) Chinese Word Segmentation System; Segmented Keyword List; Keyword Extraction; landscape; landscape; Digital Archives Thesaurus; Tag Extensions; Binary Relation.
  23. The Prototype - II: Overview of a Linked Data Prototype System using dat (Digital Archives Thesaurus). Union Catalog Metadata; Chinese Knowledge and Information Processing (CKIP) Chinese Word Segmentation System; Segmented Keyword List; Keyword Extraction; landscape; landscape; Digital Archives Thesaurus; Tag Extensions; Binary Relation; landscape; shanshui; Shanshui.
  24. The Prototype - III: Overview of a Linked Data Prototype System using the dat Ontology. Describing and representing, for publishing, the concept relations between the Chinese Artifacts of the Digital Archives Taiwan and the Digital Archives Thesaurus. Components: Union Catalog Metadata; Digital Archives Thesaurus; the dat beta ontology; Chinese Artifacts Relational Database; Semantic Browsing; Binary Relation. Every artifact item has been assigned a dat URI.
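The per-artifact URIs can be illustrated with the pattern visible in the bottle example on slide 17, where one identifier yields a resource URI, a data document URI, and an HTML page URI. The Python sketch below assumes that the same pattern holds for every artifact, which the slides do not state; the function name `artifact_uris` is hypothetical.

```python
BASE = "http://dat.digitalarchives.tw"


def artifact_uris(artifact_id):
    """Build the three URIs seen in the slide 17 example for one artifact.

    Assumption: every artifact follows the resource/data/page pattern
    shown for artifact 3204593.
    """
    return {
        "resource": f"{BASE}/resource/Artifact/{artifact_id}",
        "data": f"{BASE}/data/Artifact/{artifact_id}",
        "page": f"{BASE}/page/Artifact/{artifact_id}",
    }


uris = artifact_uris(3204593)
print(uris["resource"])
```

Keeping the thing itself (resource) apart from the documents that describe it (data, page) is the usual linked data publishing layout, and it matches the `rdfs:isDefinedBy` and `foaf:page` links in the example.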
  25. The dat ontology - I. The Core Ontology: intellectual semantics of the Chinese Artifacts. Resources: Artifact; Concept (dat); Concept (aat); Tag. Properties: dat:artifactType, dat:componentForm, dat:decorationSubject, dat:describedSubject, dat:designElement, dct:created, dct:instructionalMethod, dct:medium, schema:color, dat:hasTag, skos:narrower. Legend: black ovals are the main modeling resources; white ovals are resources defined by local class definitions; grey ovals are resources defined by external class definitions; dashed lines indicate mapping relation tasks not completed.
  26. The dat ontology - II. Curation & Publication: descriptions of the modelling objects. Resources: dcat:Dataset; Artifact; ObjectName; Source; UnionCatalog; dat:Provenance. Properties: rdfs:subClassOf, dct:title, schema:url. We use popular vocabularies such as DC terms and schema.org to relate Artifact to its preservation and technical descriptions.
  27. The dat ontology - III. Reusing: descriptions relating the modelling object to associated publications and the policy used. Resources: dcat:Dataset; Artifact; r4r:RRObject; dat:Provenance. Properties: r4r:isPartOf, r4r:hasProvenance, rdfs:subClassOf. We do not use common vocabularies to describe Artifact and Dataset relations, because we wish to publish the dataset so that it can be reused by others. In particular, we wish to publish a URI for this resource that can support dynamic contexts: (1) Metadata: ready or not ready? (2) Publish only, or can it be reused? (3) Joint publications such as article, data and code.
  28. The dat ontology. Beta: An Ontology for Publishing Chinese Artifacts as Linked Data Using the Digital Archives Thesaurus (dat). Resources: dcat:Dataset; Artifact; Concept (dat); ObjectName; Source; Concept (aat); Tag; UnionCatalog; skos:Concept; r4r:RRObject; dat:Provenance. Properties: dat:artifactType, dat:componentForm, dat:decorationSubject, dat:describedSubject, dat:designElement, dct:created, dct:instructionalMethod, dct:medium, schema:color, r4r:isPartOf, rdfs:subClassOf, dat:hasTag, dct:title, schema:url, r4r:hasProvenance, skos:narrower, skos:broader, skos:related. Preservation & technical descriptions of modelling objects. Legend: black ovals are the main modeling resources; white ovals are resources defined by local class definitions; grey ovals are resources defined by external class definitions; dashed lines indicate mapping relation tasks not completed.
  29. Conclusion & Future Works
  30. http://dat.digitalarchives.tw/concept/ http://dat.digitalarchives.tw/ontology http://dat.digitalarchives.tw/
  31. Future Works - I. Resources: dcat:Dataset; Artifact; Concept (dat); ObjectName; Source; Concept (aat); Tag; UnionCatalog; skos:Concept; r4r:RRObject; dat:Provenance; Wikipedia. Properties: dat:artifactType, dat:componentForm, dat:decorationSubject, dat:describedSubject, dat:designElement, dct:created, dct:instructionalMethod, dct:medium, schema:color, r4r:isPartOf, rdfs:subClassOf, dat:hasTag, dct:title, schema:url, r4r:hasProvenance, skos:narrower, skos:broader, skos:related.
  32. Future Works - II. Resources: dcat:Dataset; Artifact; Concept (dat); ObjectName; Source; Concept (aat); Tag; UnionCatalog; skos:Concept; r4r:RRObject; dat:Provenance; Wikipedia; Concept (Place); Concept (People). Properties: dat:artifactType, dat:componentForm, dat:decorationSubject, dat:describedSubject, dat:designElement, dct:created, dct:instructionalMethod, dct:medium, schema:color, r4r:isPartOf, rdfs:subClassOf, dat:hasTag, dct:title, schema:url, r4r:hasProvenance, skos:narrower.
  33. Future Works - III. Resources: dcat:Dataset; 16 other catalogs; Concept (domain local); Source; ObjectName; Concept (domain external); Tag; UnionCatalog; skos:Concept; r4r:RRObject; dat:Provenance; Wikipedia; cross-domain dat thesaurus. Properties: r4r:isPartOf, rdfs:subClassOf, dat:hasTag, dct:title, schema:url, r4r:hasProvenance, skos:narrower, skos:broader, skos:related.
  34. Reference
      Article:
      Bizer, Christian, and Richard Cyganiak. "D2r server-publishing relational databases on the semantic web." Poster at the 5th International Semantic Web Conference. 2006.
      Bizer, Chris, Richard Cyganiak, and Tom Heath. "How to publish linked data on the web." (2007).
      Huang, Andrea Wei-Ching, and Tyng-Ruey Chuang. "Relations for Reusing (R4R) in a Shared Context: An Exploration on Research Publications and Cultural Objects." Proc. of the 4th International Workshop on Semantic Digital Archives (SDA), in conjunction with International Digital Libraries Conference (DL2014), London, 8th-12th September 2014.
      Malmsten, Martin. "Making a library catalogue part of the semantic web." Universitätsverlag Göttingen (2008): 146.
      OCLC Linked Data, http://oclc.org/developer/develop/linked-data.en.html
      LC Linked Data Service: Authorities and Vocabularies, http://id.loc.gov/
      Code:
      D2R, Database to RDF mapping engine and SPARQL server, http://d2rq.org/, https://github.com/d2rq/d2rq
      Huang, Andrea Wei-Ching, and Tyng-Ruey Chuang. Relations for Reusing (R4R) Ontology, http://guava.iis.sinica.edu.tw/r4r
      Huang, Andrea Wei-Ching, Chung-Hsi Hung, Wan-Jung Shu, Keh-Jiann Chen, and Tyng-Ruey Chuang. Beta: An Ontology for Publishing Chinese Artifacts as Linked Data Using the Digital Archives Thesaurus (dat), http://dat.digitalarchives.tw/ontology/
  35. Reference
      Data:
      作者不詳(-)。[銅琺瑯方瓶]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/30/e5/f1.html
      作者不詳(2500 B.C.-2200 B.C.)。[良渚文化晚期玉琮]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/0c/c0/4e.html
      作者不詳(1199 B.C.-1000 B.C.)。[商後期□父丁方鼎]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/0c/be/f7.html
      作者不詳(960 A.D.-1279 A.D.)。[宋官窯翠青琮式瓶]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/0c/c4/b5.html
      作者不詳(960 A.D.-1279 A.D.)。[宋定窯劃花蓮花葵瓣口盤]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/5f/a4/74.html
      作者不詳(1601 A.D.-1700 A.D.)。[明末清初銅胎琺瑯獸面紋方鼎式爐]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/33/0b/96.html
      作者不詳(1601 A.D.-1700 A.D.)。[明十七世紀嵌玉石花鳥圓盒]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/33/0e/2d.html
      作者不詳(1644 A.D.-1911 A.D.)。[清內填琺瑯纍絲瓜形盒]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/59/d2/1c.html
      作者不詳(1644 A.D.-1911 A.D.)。[清玉香盒]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/11/1c/d8.html
      作者不詳(1644 A.D.-1911 A.D.)。[清玉鎖環]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/33/48/ef.html
      作者不詳(1644 A.D.-1911 A.D.)。[清伽南香手串(十八子)]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/5f/a0/ef.html
      作者不詳(1644 A.D.-1911 A.D.)。[清周樂元玻璃內繪行旅圖鼻煙壺]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/59/d0/6a.html
      作者不詳(1644 A.D.-1911 A.D.)。[清青玉琱花爐]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/11/0b/3f.html
      作者不詳(1644 A.D.-1911 A.D.)。[清剔彩耕作圓瓣式盒]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/33/49/38.html
      作者不詳(1644 A.D.-1911 A.D.)。[清留青竹雕臂擱]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/10/b4/73.html
      作者不詳(1644 A.D.-1911 A.D.)。[清瑪瑙葵瓣口碗]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/1c/c6/06.html
      作者不詳(1644 A.D.-1911 A.D.)。[清銀鍍金嵌珠鳳蝶牡丹鈿花]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/5f/a2/79.html
      作者不詳(1644 A.D.-1911 A.D.)。[清銀鍍金纍絲點翠嵌珠寶花蝶簪]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/5f/a2/85.html
      作者不詳(1644 A.D.-1911 A.D.)。[清銅鎏金葫蘆式執壺]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/42/b7/6d.html
      作者不詳(1644 A.D.-1911 A.D.)。[清燒藍竹桃蘭芝花籃形銀片]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/59/d1/3c.html
      作者不詳(1736 A.D.-1795 A.D.)。[清乾隆內填琺瑯番蓮紋瓶]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/0c/c5/bf.html
      作者不詳(1736 A.D.-1795 A.D.)。[清乾隆青花荔枝桃實執壺]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/42/b4/00.html
      作者不詳(1736 A.D.-1795 A.D.)。[清乾隆(1736-1795)剔彩山人水物四瓣式套盒]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/33/0e/34.html
      作者不詳(1741 A.D.-)。[清乾隆六年磁胎畫琺瑯八哥膽瓶]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/42/b7/d0.html
      作者不詳(1742 A.D.-)。[清乾隆窯琺瑯彩藍地開光花卉瓶]。《數位典藏與數位學習聯合目錄》。http://catalog.digitalarchives.tw/item/00/33/49/cf.html
  36. ADVERTISEMENT http://summit2015.lodlam.net/
