This document summarizes Reem Weda's presentation on connecting the Iconclass thesaurus and the Art & Architecture Thesaurus (AAT) through their concepts and keywords. Weda analyzed mappings between Iconclass keywords and AAT concepts, finding around 35% matches initially and 50% when using additional tools. Connecting the thesauri in this way through linked open data could provide multilingual access to AAT descriptions and disambiguate Iconclass keywords. However, exact matches are limited by Iconclass' standardized but not strictly conceptual keywords and language variations between terms.
3. Exploring connections
Iconclass and AAT
Questions:
Connect on a concept level?
or
Connect on keyword level?
Data structure issues?
What are the benefits?
4. Henri van de Waal (1910-1972)
Started in the 1950s
Computerized editions: 1990-2001 (CD-ROM)
Online browser: 2004; updated in 2009 (www.iconclass.org)
Alphanumeric codes with text
Hierarchical, like a thesaurus
Multilingual
LOD publication (URI, API and full set)
5. Hierarchical structure
7 – Bible
71 – Old Testament
71C – Genesis: the patriarchs
71C2 – story of Isaac
71C21 – Rebekah (Rebecca) sought in marriage
(Genesis 24)
71C217 – marriage of Isaac and Rebekah
Keys: Eliezer (servant of Abraham) · Genesis · Genesis 24 ·
Isaac · Old Testament · Rebekah · bible · marriage · patriarch
· proposal
Rembrandt, Portret van een echtpaar als Isaak en Rebecca, 1662-1666
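The hierarchy above can be read directly off the notation: each extra character adds one level. A minimal Python sketch of that idea, assuming a plain notation such as 71C217 (the function name is illustrative; bracketed text, doubled letters and keys such as (+3) are deliberately not handled):

```python
def ancestors(notation: str) -> list[str]:
    """Return the notation and all broader notations, broadest first.

    Assumption: a plain notation such as "71C217", where each extra
    character adds one hierarchical level.
    """
    if not notation.isalnum():
        raise ValueError(f"unsupported notation: {notation!r}")
    return [notation[:i] for i in range(1, len(notation) + 1)]


print(ancestors("71C217"))
# ['7', '71', '71C', '71C2', '71C21', '71C217']
```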
6. Example of saints
Mary Magdalene
• 11HH(MARY MAGDALENE)(+3) the penitent
harlot Mary Magdalene… (+ angel(s))
• Keywords: Christian religion · Mary
Magdalene (St.) · angel · book · crown ·
crown of thorns · crucifix · jar · mirror ·
musical instrument · ointment · palm-
branch · religion · rosary · saint · scourge ·
scroll · supernatural · woman
11. AAT: Art & Architecture Thesaurus
ULAN: Union List of Artists Names
TGN: Thesaurus of Geographic Names
CONA: Cultural Object Name Authority
Getty vocabularies
They are all constructed as thesauri
12. Developed since 1980
43,000 concepts and 300,000 terms
Generic concepts for describing objects and images in
several languages (multilingual)
No names of persons or unique events
‘Cathedral’ yes, but not ‘Chartres Cathedral’ (CONA)
Strict editorial control, validation by sources
Thesaurus construction follows ISO 25964
international (even global) scope
14. Getty web service
• The base URI is http://vocab.getty.edu/
• SPARQL-endpoint
• Standard presentation is SKOS or SKOS-XL
• Mappings and ontology based on RDF/XML and Turtle
• Published versions of lookup lists (e.g., languages, roles,
nationalities, place types, and bibliographic sources)
ODC-By licence
This [title or report or article or dataset] contains information from Art & Architecture
Thesaurus (AAT)® which is made available under the ODC Attribution License
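As an illustration of the SPARQL route, the sketch below builds a query for the preferred labels of one AAT concept and the corresponding request URL. It only constructs strings and sends nothing. The endpoint path (/sparql under the base URI), the use of plain skos:prefLabel, and the function names are assumptions, not taken from the slides.

```python
from urllib.parse import urlencode

# Assumed endpoint path under the base URI given on the slide.
SPARQL_ENDPOINT = "http://vocab.getty.edu/sparql"


def label_query(aat_id: str) -> str:
    """Build a SPARQL query for the preferred labels of one AAT concept."""
    if not aat_id.isdigit():
        raise ValueError("AAT subject IDs are numeric")
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label WHERE {\n"
        f"  <http://vocab.getty.edu/aat/{aat_id}> skos:prefLabel ?label .\n"
        "}"
    )


def request_url(aat_id: str) -> str:
    """Return a GET URL carrying the query as a URL-encoded parameter."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": label_query(aat_id)})


print(label_query("300162391"))  # the 'cloth' concept from the slides
```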
15. Compare Iconclass - AAT
IC notations (text) can be equivalent to AAT concept, but more often not
41A71 ‘furniture to put things on’
IC keywords for ‘textile fabric, cloth’:
civilization · cloth · craft · culture · industry · occupations · society ·
textile industry
These keywords can be mapped more easily because they are more
standardized and generic!
Iconclass notation: 47H6 textile fabric, cloth
Has equivalents with AAT: cloth (300162391)
AND textile materials (300231565)
16. Mapping results
Tools: Excel, Open Refine, Cultuurlink
After removing duplicates from the keywords, the result was
about 35% matches overall
When using Cultuurlink for the full set, the percentage
rose to around 50%
17. Mapping keywords to AAT
• http://vocab.getty.edu/aat/300055806 civilization
• http://vocab.getty.edu/aat/300162391 cloth
• http://vocab.getty.edu/aat/300054704 craft
• http://vocab.getty.edu/aat/300055768 culture
• http://vocab.getty.edu/aat/300055718 industry
• http://vocab.getty.edu/aat/300263369 occupations
• http://vocab.getty.edu/aat/300026009 society
• NO MATCH: textile industry
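The matching step can be sketched in Python as a deduplicated, case-insensitive exact lookup. This is only an illustration of the idea, not the OpenRefine or Cultuurlink procedure: the lookup table holds just the seven URIs from this slide, the function name is made up, and the extra "Cloth" entry is added here to show the deduplication.

```python
# Labels and URIs as listed on the slide, lower-cased for comparison.
AAT_BY_LABEL = {
    "civilization": "http://vocab.getty.edu/aat/300055806",
    "cloth": "http://vocab.getty.edu/aat/300162391",
    "craft": "http://vocab.getty.edu/aat/300054704",
    "culture": "http://vocab.getty.edu/aat/300055768",
    "industry": "http://vocab.getty.edu/aat/300055718",
    "occupations": "http://vocab.getty.edu/aat/300263369",
    "society": "http://vocab.getty.edu/aat/300026009",
}


def reconcile(keywords):
    """Map each distinct keyword to an AAT URI, or None without an exact label match."""
    result = {}
    for keyword in keywords:
        key = keyword.strip().lower()
        if key and key not in result:  # skip blanks and duplicates
            result[key] = AAT_BY_LABEL.get(key)
    return result


keywords = ["civilization", "cloth", "craft", "culture", "industry",
            "occupations", "society", "textile industry", "Cloth"]
mapping = reconcile(keywords)
matched = sum(uri is not None for uri in mapping.values())
print(f"{matched}/{len(mapping)} matched ({matched / len(mapping):.1%})")
# 7/8 matched (87.5%)
```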
18. Mapping results 2
Named entities do not match!
Cleopatra, Arthur, Ophelia, Odin, Mars, Paris,
Marseille
Iconographical (biblical) subjects often do not match
‘Man of Sorrows’, ‘hand of God’ and ‘Jacob's ladder’
Unusual compound terms do not match
‘apparent death’, ‘love unrequited’ and ‘not too poor,
not too rich’
20. Issues with Iconclass keywords
• They are standardized following some conventions, but they
are not concepts
• Keywords in the LOD publication are not always symmetrically structured
between the language variants.
• Keyword 1 in French = ‘Victoria’
• Keyword 1 in English = ‘idea’
My experiment focused on mapping the English keywords: no
automatic mapping for the other languages yet.
21. The value
• Keywords become concepts with URIs
• Gives access to AAT multilingual content
• Provides scope notes
• Creates a bridge between AAT and Iconclass collections
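One way to picture "keywords become concepts with URIs" is a SKOS mapping statement per matched keyword. The Python sketch below writes such a statement as an N-Triples line. The keyword URI base is a hypothetical placeholder (the slides do not give URIs for keywords), and skos:exactMatch is an illustrative choice; skos:closeMatch may be more appropriate where equivalence is uncertain.

```python
from urllib.parse import quote

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
# Hypothetical placeholder: not a real Iconclass keyword namespace.
KEYWORD_BASE = "http://example.org/iconclass-keyword/"


def mapping_triple(keyword: str, aat_id: str) -> str:
    """Return one N-Triples line linking a keyword to an AAT concept."""
    subject = KEYWORD_BASE + quote(keyword, safe="")
    target = f"http://vocab.getty.edu/aat/{aat_id}"
    return f"<{subject}> <{SKOS_EXACT_MATCH}> <{target}> ."


print(mapping_triple("cloth", "300162391"))
```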
23. Thank you for listening!
RKDexplore https://rkd.nl/en/
Iconclass
http://www.iconclass.org
LOD http://www.iconclass.org/help/lod
Getty vocabularies
http://www.getty.edu/research/tools/vocabularies/index.html
LOD http://vocab.getty.edu/
Open Refine http://openrefine.org/
Cultuurlink http://cultuurlink.beeldengeluid.nl/app/#/
Editor's Notes
Good afternoon. I am honored and happy to be here.
During my talk I would like to elaborate a bit on the idea of connecting two major cultural thesauri to each other in a meaningful way. These are Iconclass, which is now maintained by my institute, the RKD (Netherlands Institute for Art History), and the Art & Architecture Thesaurus, developed by Getty Vocabularies in partnership with the RKD for the Dutch translation.
Both Iconclass and AAT are widely used multilingual systems to describe and annotate works of art. The content and scope of AAT and Iconclass differ noticeably, but they also overlap. I want to explore and present a solution in which the overlap between the two is linked at the keyword level, by mapping Iconclass keywords to AAT concepts.
Here we see a cloud representation of some heritage datasets. Not complete and possibly outdated. The red circles are RKD, Blue are Getty Vocabularies, Brown are the Rijksmuseum datasets, Green from the Louvre.
Now, considering connecting knowledge, the hope on the horizon lies with the promise of LOD. It provides a method for connecting concepts between knowledge systems. The only problem is that this cannot be achieved fully automatically. We, the domain experts and data experts, have to make the connections ourselves first. It might be impossible to connect everything to everything. But there is already a lot to be gained if parts of the terminologies between datasets are mapped. We have to start somewhere. More effective exploration and use of collections; ‘semantic glue’ for LOD. So find more of what you want and be guided to what you need.
Many collections and projects use the same thesauri: Getty Vocabularies and Iconclass. They give authoritative information and strengthen access to databases.
These vocabularies are not unified (although overlap does exist), so browsing many collections in an interoperable way becomes difficult.
Metadata and vocabularies must be expressed in RDF and/or OWL. Iconclass and AAT are published as LOD.
Forming semantic links between different resources is called ontology mapping.
What is Iconclass? A subject-specific classification system.
Iconclass was developed by Henri van de Waal (1910-1972), Professor of Art History at the University of Leiden (photo). It was published online in 2004 and updated in 2009 in partnership with RKD. RKD has been the maintaining institute since 2009.
The system is one of the largest classification systems for cultural content and possibly the largest for visual arts content. Wikipedia describes it as ‘a highly complex way of classifying the content of images’, and unfortunately the system is often perceived this way. It might not be all that bad, seeing that libraries use coded classifications all the time.
Here you see the annotation for the event occurring in the story of Isaac, from the Old Testament: the marriage of Isaac and Rebekah, here depicted by Rembrandt.
To sum up the key elements: a classification system for iconographical topics with 28,000 hierarchically ordered definitions. It consists of alphanumeric codes with text correlates in several languages. It is hierarchically ordered, so it works like a thesaurus.
I want to point out that it also integrates 14,000 keywords linked to the subjects, which can help guide users to the subject they are looking for. They could also provide enough overlap with AAT concepts to make it worthwhile to map them.
Here is an example of a more complex IC annotation: Mary Magdalene with angels can be described as such. There are some more built-in functions in the system that improve the accuracy of a notation systematically. (+3) means the presence of angels in this case.
I want to focus on the keywords that are linked to the annotation. They offer extra associated terms and are used for every notation that is related. In the browser you can click these terms to retrieve all associated notations.
Keywords are added to the notations as separate elements and they aid information retrieval from the Iconclass system. They are inspired by the textual correlate, but they mostly reflect the hierarchical relations that exist between the notations: they are hereditary. By this we mean that a keyword that has been assigned to a certain notation is a valid keyword for that notation and all notations on a lower hierarchical level.
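The hereditary behaviour can be sketched in Python as a union over a notation and all of its broader notations. The assignments below are hypothetical: the slides list the keywords of 71C217 but do not say at which level each was assigned. The prefix-based walk also assumes plain notations without brackets or keys.

```python
# Hypothetical direct assignments, for illustration only.
DIRECT_KEYWORDS = {
    "7": {"bible"},
    "71": {"Old Testament"},
    "71C": {"Genesis", "patriarch"},
    "71C2": {"Isaac"},
    "71C21": {"Rebekah", "proposal"},
    "71C217": {"marriage"},
}


def effective_keywords(notation: str) -> set[str]:
    """Collect keywords assigned to the notation or to any broader notation."""
    keywords: set[str] = set()
    for i in range(1, len(notation) + 1):
        # Each prefix of a plain notation is one of its broader notations.
        keywords |= DIRECT_KEYWORDS.get(notation[:i], set())
    return keywords


print(sorted(effective_keywords("71C2")))
# ['Genesis', 'Isaac', 'Old Testament', 'bible', 'patriarch']
```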
The multilingual Iconclass Browser serves as a search tool that helps an indexer find the concepts to tag an image. It can be used to establish the correct meaning of a notation. The Iconclass browser lets you do simple and sophisticated keyword searches, and browse the Iconclass schedules, because it is structured as a thesaurus.
You can make lists of often-used notations and copy-paste them as a whole. When logged in as a user, you can add comments on notations, and the idea is that these comments will be processed by the editors of Iconclass.
Demonstrate: clipboard functionality. Browse history.
32B for comments and 32B341 Eskimos for a concept under revision. Posting your own comments and proposing new notations. More Options and Help. The Iconclass forum and blog.
Available as JSON and RDF/SKOS. JSON is JavaScript Object Notation, and the concepts can be found online through URIs with the addition of .json or .rdf. You can query using the API or download the full set of raw data.
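A small Python sketch of the ".json or .rdf" convention, building the address for a notation without fetching it. The base URI and the percent-encoding of brackets and spaces are assumptions, and the function name is illustrative.

```python
from urllib.parse import quote

# Assumed base URI for Iconclass notations.
BASE = "http://iconclass.org/"


def notation_url(notation: str, fmt: str) -> str:
    """Return the address of a notation in the requested serialization."""
    if fmt not in ("json", "rdf"):
        raise ValueError("fmt must be 'json' or 'rdf'")
    # Percent-encode everything that is not a letter, digit or one of _.-~
    return f"{BASE}{quote(notation, safe='')}.{fmt}"


print(notation_url("71C217", "json"))
# http://iconclass.org/71C217.json
```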
Getty Vocabularies consist of the Art & Architecture Thesaurus (AAT), the Thesaurus of Geographic Names (TGN), the Cultural Objects Name Authority (CONA), and the Union List of Artist Names (ULAN). These give structured vocabulary for names, descriptions, titles, biographies, and other information on art, architecture, important places, works of art and artists.
The AAT contains generic terms; it contains no iconographic subjects and no proper names. That is, each concept is a case of many (a generic thing), not a case of one (a specific thing). For example, the generic term cathedral is in the AAT, but the specific proper name Chartres Cathedral is out of scope for the AAT (it would be included in one of the other vocabularies instead).
The multilingual AAT contains more than 42,000 concepts and 300,000 terms, organized in 8 facets, about material heritage such as art and architecture, but also techniques, materials and living creatures.
It has strict editorial control. The AAT is a compiled resource; it is not comprehensive. The AAT grows through contributions. Among others, the RKD (Netherlands Institute for Art History) coordinates a full Dutch translation and proposes new concepts coming from the Dutch heritage field.
It follows ISO 25964, the international standard for thesauri and interoperability with other vocabularies.
Since 2014 the AAT has been published as LOD. It strives to have an international, even global, scope.
The AAT is a thesaurus in compliance with ISO and NISO standards. The focus of each AAT record is a concept. In the database, each concept's record (also called a subject) is identified by a unique numeric ID. Linked to each concept record are terms, related concepts, a parent (that is, a position in the hierarchy), sources for the data, and notes. The temporal coverage of the AAT ranges from Antiquity to the present and the scope is global.
AAT concepts can contain multiple equivalent terms that have equal authority. The terms connected to the concept all have warrant from authoritative sources. Users do not have to use the ‘preferred term’ the AAT chose; if needed they can use an equivalent.
The AAT is a hierarchical database; its trees branch from a root called Top of the AAT hierarchies (Subject_ID: 300000000). There may be multiple broader contexts, making AAT polyhierarchical. In addition to the hierarchical relationships, the AAT has equivalence and associative relationships.
It is possible to link the concepts in Iconclass and AAT. For instance, 47H6 textile fabric, cloth links with cloth (AAT: 300162391), but this Iconclass notation also maps to textile materials (AAT: 300231565), which is a broader term of ‘cloth’ in the AAT. The hierarchical position of the concept also differs a lot. Linking concepts on this level is probably difficult and mostly not possible.
For other notations it is difficult to determine the exact meaning, and they are often out of scope. See the example.
Conclusion: a concept in Iconclass is a somewhat different thing than in AAT. Iconclass annotations do not always describe one single topic. Linking on this level will only work for a relatively small number of instances because of the differences described. However, connecting the Iconclass keywords with AAT concepts will deliver more results.
For this test I collected all the keywords from the Iconclass data and put them in a spreadsheet. The keywords that are used in Iconclass are grouped into 5 directories, combining two to four facets in one directory. For each group I first deleted all the duplicates. Then I uploaded the remaining list into Google Refine to reconcile with the AAT. By reconciling I mean linking or mapping the data to structured databases online, in this case the AAT.
OpenRefine is a tool developed by Google, now open source, for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data. http://openrefine.org/index.html
CultuurLINK is developed in the Netherlands for cultural heritage institutions to help them link their vocabularies, such as thesauri and term lists, with the Dutch cultural heritage Hub. With CultuurLINK you upload your vocabulary, select a target from the Hub and build your unique link strategy.
I put around 24000 keywords into Cultuurlink, and with the strategy around 12000 were linked. These are not necessarily all correct!
Example of successful mapping.
The value of integrating? Creating LOD links to a standard; URIs turn keywords into concepts.
Possibly get scope notes and more descriptors or equivalent terms for the concepts. Use the AAT structure to search hierarchically within the keywords.
Possible end-user search assistant; visualize links between concepts as well as IC notations.
Keywords show a considerably larger degree of standardization than the words in the textual correlates; they also follow certain conventions:
With a few exceptions, the singular form of nouns is preferred.
There are keywords that are verbal nouns, denoting activities and processes (e.g. burning, walking, singing, laughing).
Modified letters (diacritics) in the textual correlates (é, ö, ü, ï, etcetera) have been replaced by their unmodified equivalents when they are part of a keyword.
This was done in the early 1990s to enhance searching. Today this would no longer be necessary.
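The diacritics convention can be reproduced with Python's standard library. This is a sketch of the general technique (Unicode decomposition, then dropping combining marks), not the procedure actually used for Iconclass; letters that have no decomposition, such as ø, pass through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Replace accented letters by their unaccented base letters."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("é ö ü ï"))
# e o u i
```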
The Iconclass keywords are a powerful means of retrieving information from the Iconclass system, but they are not (yet) fully linked semantic elements within the LOD publication. Mapping and structuring these keywords would benefit the publication.
More associative terms and information. Many more languages. Wikidata is also a good option!