7. Guidelines
Literature review
Main elements of a
thesaurus
Terms and concepts
Relationships (USE, UF, BT,
NT, RT)
Notes, node labels and
arrays
Facet analysis
Construction process
LAI CMG Annual Seminar, November 8 2013
20. Term Selection I
Vocabulary Resources
A Handbook of Irish Folklore by Seán Ó Súilleabháin
Béaloideas: Journal of the Folklore of Ireland Society
21. Term Selection II
Form of Entry
Nouns: count nouns (cows, dogs) in the plural,
non-count (livestock, milk) in the singular.
Verbs: gerund or verbal, no infinitive.
Adjectives: avoid unless significant.
Adverbs: avoid.
Articles (the, a): avoid.
23. Systematic Structure
Hierarchical (facet) or classified (subject) display
Fundamental facets as top concepts (TT).
Easily updated structure.
Good demonstration of the ISO 25964 hierarchy
rules.
Thing/kind
Whole/part
Particular instances of a class
24. Facet Analysis
ISO: “grouping of concepts of the same inherent
category”
Objects, materials, people, places, etc.
Ranganathan, 1920s and 1930s
Personality, Matter, Energy, Space, Time
Classification Research Group, 1960s
Thing, kind, part, property, material, process,
operation, agent, patient, product, by-product, space
and time
25. Time
Place / Space / Environment
Products
Activities
Processes and Phenomena
Events
Agents
Objects
Materials
Attributes and Properties
Parts
Genre
Abstract Entities and Concepts
32. Current and Future Work
Expansion of the pilot thesaurus to approximately
2,000 preferred terms
Feasibility study into representation in SKOS
Potential Future Work
Multilingual thesaurus (Irish and English)
Representation in SKOS
Mapping to other vocabularies
MoTIF is a collaborative project undertaken by the Digital Repository of Ireland and the National Library of Ireland. It produced a set of guidelines on thesaurus construction as a resource for librarians, archivists and other information professionals who wish to organise and annotate their content for improved search and retrieval. The guidelines are accompanied by a pilot thesaurus of Irish folklore, which serves as an illustrative example and a visual demonstration of the principles and processes outlined in the guidelines. Both the guidelines and the pilot have been submitted for review and will be published in December 2013.
Controlled vocabularies are restricted lists of terms used to provide consistency across search, remove ambiguity between terms and improve search precision. They may, but need not, contain equivalence relationships such as USE and Use For (UF), which work like "see" references.
Taxonomies are controlled vocabularies with hierarchical relationships, which can be used for browsing up and down a tree or navigating a website.
Thesauri are controlled vocabularies with hierarchical, associative and equivalence relationships. They offer all the benefits described above, but they can also make more connections between terms using associative relationships, allow searches on non-preferred terms using equivalence relationships, and clarify the meaning of terms using definitions and scope notes. The definitions can overlap, but broadly speaking, that is how they work.
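To make the equivalence idea concrete, here is a minimal Python sketch, assuming a simple in-memory table. The terms and the function names `resolve` and `use_for` are hypothetical illustrations, not part of the MoTIF guidelines or any particular thesaurus software. It maps a non-preferred search term to its preferred term (USE) and derives the reciprocal Use For (UF) lists from the same data.

```python
from collections import defaultdict

# Hypothetical equivalence data (USE): non-preferred term -> preferred term.
USE = {
    "hounds": "dogs",
    "kine": "cows",
}

def use_for(use_table):
    """Derive the reciprocal UF (Use For) lists from the USE table."""
    uf = defaultdict(list)
    for non_preferred, preferred in use_table.items():
        uf[preferred].append(non_preferred)
    return {term: sorted(terms) for term, terms in uf.items()}

def resolve(query, use_table):
    """Map a search term to its preferred term, so that searching on a
    non-preferred term retrieves records indexed with the preferred one."""
    q = query.strip().lower()
    return use_table.get(q, q)

print(resolve("Hounds", USE))  # dogs
print(resolve("dogs", USE))    # dogs
print(use_for(USE))            # {'dogs': ['hounds'], 'cows': ['kine']}
```

Storing only the USE direction and deriving UF keeps the two sides of the relationship from drifting apart.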
Following the DRI report Digital Archiving in Ireland, an opportunity was identified to produce guidelines that would give professionals the advice they need to improve their own data practices by adhering to international standards and best practices.
The guidelines act as a basic introduction to thesauri, including the main elements of a thesaurus, facet analysis, and the construction process. They also briefly cover planning, multilingual thesauri, mapping thesauri and the relationship between thesauri and the Semantic Web.
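The slides list representation in SKOS, the W3C Simple Knowledge Organization System, as future work. As a rough, hypothetical sketch of what that might involve, the Python below serialises a few concept records to Turtle using real SKOS properties (`skos:prefLabel`, `skos:altLabel`, `skos:scopeNote`, `skos:broader`, `skos:topConceptOf`, `skos:inScheme`). The base URI, concept identifiers, labels and the `to_turtle` helper are placeholders, and the sketch assumes labels contain no quotation marks (it does no escaping).

```python
# Placeholder namespace: an assumption for illustration, not a published URI.
BASE = "http://example.org/folklore/"

# Hypothetical concept records keyed by local identifier.
CONCEPTS = {
    "agents": {"pref": "Agents", "top": True},
    "animals": {"pref": "Animals", "broader": "agents"},
    "dogs": {"pref": "Dogs", "alt": ["Hounds"], "broader": "animals",
             "note": "Domestic dogs, including named individual dogs."},
}

def to_turtle(concepts, scheme="scheme"):
    """Serialise concept records as SKOS in Turtle syntax (no label escaping)."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{BASE}> .",
        "",
        f"ex:{scheme} a skos:ConceptScheme .",
    ]
    for cid, c in concepts.items():
        lines.append(f"ex:{cid} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{c["pref"]}"@en ;')
        for alt in c.get("alt", []):
            # Non-preferred (UF) terms become alternative labels.
            lines.append(f'    skos:altLabel "{alt}"@en ;')
        if "note" in c:
            lines.append(f'    skos:scopeNote "{c["note"]}"@en ;')
        if "broader" in c:
            # BT maps to skos:broader; skos:narrower is its inverse.
            lines.append(f"    skos:broader ex:{c['broader']} ;")
        if c.get("top"):
            lines.append(f"    skos:topConceptOf ex:{scheme} ;")
        # Each concept's statement ends with scheme membership and a full stop.
        lines.append(f"    skos:inScheme ex:{scheme} .")
    return "\n".join(lines)

print(to_turtle(CONCEPTS))
```

For the multilingual (Irish and English) thesaurus mentioned as potential future work, an Irish label could sit alongside the English one using the `@ga` language tag.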
The construction process in brief:
Step 1: selecting and recording terms.
Step 2: determining the structure and display of the thesaurus, i.e. whether it will be organised by subject or by facet.
Step 3: the facet analysis itself.
Step 4: creating relationships and notes within the now complete structure.
Step 5: creating an alphabetical list (if desired).
These steps are then followed by expert review and documentation.
The correct form of these terms was then chosen. ISO 25964-1 sets out guidelines on the form that terms should take when entered into a thesaurus. Nouns are the most common form you will encounter. Verbs are the next most common and take the gerund or verbal noun form (usually ending in -ing). Adjectives, adverbs (usually ending in -ly) and articles are to be avoided. Some adjectives were included in the pilot thesaurus because they emerged as significant in the literature. For example, the significance of wearing red on a particular day might be discussed as part of the lore of a particular area.
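Some of these form-of-entry rules can be screened mechanically before human review. The sketch below is a crude heuristic in Python, assuming whitespace-separated English terms; `form_warnings` is a hypothetical helper, and it cannot tell whether a noun should be plural (count noun) or singular (non-count noun), which still needs an editor's judgement.

```python
ARTICLES = {"a", "an", "the"}

def form_warnings(term):
    """Return heuristic warnings for a candidate term.
    A rough screen for the form-of-entry rules, not a linguistic parser."""
    words = term.lower().split()
    warnings = []
    if words and words[0] in ARTICLES:
        warnings.append("leading article")
    if words and words[0] == "to":
        # Verbs should be entered as gerunds/verbal nouns, not infinitives.
        warnings.append("infinitive verb form; prefer the gerund (-ing)")
    if any(w.endswith("ly") for w in words):
        warnings.append("possible adverb (-ly)")
    return warnings

for candidate in ["The banshees", "to churn", "churning", "holy wells"]:
    print(candidate, form_warnings(candidate))
```

A legitimate term such as "holy wells" is flagged as a possible adverb, so the output is a prompt for review rather than a verdict.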
The initial list of terms is all over the place and a systematic structure needs to be developed to order them in a logical way.
A systematic structure can be thought of as a hierarchical or classified display with subjects or facets at the top of the hierarchies. At this stage a hierarchical display was chosen, with fundamental facets as the main divisions or top concepts, because this structure is more easily updated and it is a good demonstration of the ISO standard rules for hierarchical arrangement. Under those rules, a broader term and its narrower terms should stand in one of three types of relationship: a thing/kind relationship, where, for example, every concept in the Objects facet is a kind or type of object; a whole/part relationship, where, for example, human anatomy has hands, heads and so on; or an instance relationship, where the narrower term is a particular instance of the broader term, so the class 'dogs' would have the narrower term 'Spot'. It is important to emphasise that we did not structure the thesaurus at this stage; we only decided on the design of the scaffolding. More detailed elements of the systematic structure were determined only during the vocabulary analysis.
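The three permitted hierarchy types can be recorded explicitly on each broader/narrower link. In this hypothetical Python sketch the tags BTG/NTG (generic), BTP/NTP (partitive) and BTI/NTI (instance) follow the abbreviations commonly used for ISO 25964's optional distinction between them; a thesaurus may equally display all three as plain BT/NT. The example links and the `narrower`/`broader` helpers are illustrations only.

```python
from enum import Enum

class HierType(Enum):
    """Hierarchy types, tagged with the optional refinements of BT/NT."""
    GENERIC = "BTG/NTG"    # thing / kind of thing
    PARTITIVE = "BTP/NTP"  # whole / part
    INSTANCE = "BTI/NTI"   # class / particular instance

# Hypothetical (broader, narrower, type) links, for illustration only.
LINKS = [
    ("objects", "tools", HierType.GENERIC),
    ("human anatomy", "hands", HierType.PARTITIVE),
    ("human anatomy", "heads", HierType.PARTITIVE),
    ("dogs", "Spot", HierType.INSTANCE),
]

def narrower(term, links):
    """Return (narrower term, relationship tag) pairs for a term."""
    return [(n, t.value) for b, n, t in links if b == term]

def broader(term, links):
    """Return (broader term, relationship tag) pairs for a term."""
    return [(b, t.value) for b, n, t in links if n == term]

print(narrower("human anatomy", LINKS))
print(broader("Spot", LINKS))
```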
We used the method of facet analysis, which is the analysis of a subject area into its constituent concepts, which are then grouped into facets. The ISO thesaurus standard defines a facet as a 'grouping of concepts of the same inherent category'. Objects, materials, people, places and so on are known as fundamental facets. These fundamental categories were first devised by Ranganathan as part of a library classification scheme in the 1920s and 1930s. Ranganathan proposed five categories, Personality, Matter, Energy, Space and Time (PMEST), which could cover all aspects of a discipline or subject. These were later expanded by Brian Vickery for the Classification Research Group (CRG), based on the Aristotelian fundamental categories: thing, kind, part, property, material, process, operation, agent, patient, product, by-product, space and time. The CRG went on to state that these categories act as guides to analysis and should not be imposed on subjects. Ultimately, the choice of facets will depend on the subject matter and what is most practical.
What was most practical for the pilot were the facets listed above. So how do you organise a jumble of words into lists of basic, coherent facets?
For example, in the literature, an agent is a person or piece of equipment that carries out transitive actions, i.e. actions that require a direct object. Following this, animals, fish and people were placed under the Agents facet, as these are living creatures that can perform actions and have an effect on the environment around them. The category also includes supernatural beings and creatures. Other living organisms were originally located under a separate Living Entities facet. In the end, the decision was taken to include all living organisms, from people through to mythical beings and plants, under the Agents facet, as it made more sense to keep all living entities together. It is also arguable that, in folklore, some plants, trees and other such living entities have the potential to perform actions or have an effect on others. That made sense in the context of folklore; it may not in another. To avoid confusion, equipment was then placed under the Objects facet. The guidelines work through a few more tricky decisions and also outline the scope of each fundamental facet as defined for the pilot thesaurus.
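The grouping step can be pictured as a lookup from each term to the facet an analyst has assigned it. In this hypothetical Python sketch the assignments are invented examples that mirror the decisions described above (living organisms, including plants and supernatural beings, under Agents; equipment under Objects), and `group_by_facet` is an illustrative helper. Terms the analyst has not yet placed are returned separately for review.

```python
from collections import defaultdict

# Hypothetical facet assignments recorded during analysis.
ASSIGNMENTS = {
    "cows": "Agents",
    "fairies": "Agents",          # supernatural beings grouped with Agents
    "hawthorn trees": "Agents",   # plants kept with other living entities
    "churns": "Objects",          # equipment placed under Objects
    "butter": "Products",
    "milk": "Materials",
    "May Eve": "Time",
}

def group_by_facet(terms, assignments):
    """Sort a jumble of terms into facet lists; report unplaced terms."""
    facets = defaultdict(list)
    unassigned = []
    for term in terms:
        facet = assignments.get(term)
        if facet is None:
            unassigned.append(term)
        else:
            facets[facet].append(term)
    return {f: sorted(ts) for f, ts in facets.items()}, unassigned

groups, missing = group_by_facet(["milk", "cows", "fairies", "holy wells"], ASSIGNMENTS)
print(groups)
print(missing)
```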
Facet analysis II
Once the initial analysis had been completed and all terms grouped, the facets were divided further, using node labels to split the facets into sub-facets and to organise them according to principles, or characteristics, of division. In the example above, the Agents facet has the sub-facets people, animals, other living organisms and supernatural beings. The animals sub-facet is then organised according to its characteristics of division, in this case animals by function, by species and so on. This is exactly the kind of division you would see on, say, a fashion website, where shirts are organised by size, by colour and so on.
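Node labels organise a hierarchy but are not themselves terms, so they should not be used for indexing. The Python sketch below is a hypothetical illustration of that distinction: each node is a `(label, is_node_label, children)` tuple, node labels are printed in angle brackets (one common display convention, assumed here), and `indexable_terms` skips them. The tree contents are invented examples.

```python
# Each node: (label, is_node_label, children). Contents are hypothetical.
TREE = ("Agents", False, [
    ("animals", False, [
        ("animals by function", True, [
            ("livestock", False, []),
            ("working animals", False, []),
        ]),
        ("animals by species", True, [
            ("cows", False, []),
            ("dogs", False, []),
        ]),
    ]),
    ("supernatural beings", False, []),
])

def render(node, depth=0):
    """Indented hierarchical display; node labels shown in angle brackets."""
    label, is_node_label, children = node
    text = f"<{label}>" if is_node_label else label
    lines = ["  " * depth + text]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

def indexable_terms(node):
    """Collect terms usable for indexing, skipping node labels."""
    label, is_node_label, children = node
    terms = [] if is_node_label else [label]
    for child in children:
        terms.extend(indexable_terms(child))
    return terms

print("\n".join(render(TREE)))
print(indexable_terms(TREE))
```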
Once the hierarchies were completed and entered into the thesaurus management software, associative relationships were added. This is the order recommended by the ISO standard: the most useful associative relationships usually run across hierarchies, so they are easier to create once those hierarchies have been established. The examples here are the most common types of relationship created across hierarchies: agents related to their activities, materials to their products, objects to their parts, and so on. These relationships are reciprocal, so each term refers back to the other.
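Reciprocity is easiest to guarantee if every relationship is written in both directions at once. This is a minimal Python sketch under that assumption; the `Thesaurus` class, its methods and the example terms are hypothetical and do not describe the software used for the pilot. RT is its own inverse, BT pairs with NT, and USE pairs with UF.

```python
from collections import defaultdict

# Each relationship tag and the tag recorded in the opposite direction.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}

class Thesaurus:
    def __init__(self):
        # term -> relationship tag -> set of related terms
        self.rels = defaultdict(lambda: defaultdict(set))

    def relate(self, a, tag, b):
        """Record a relationship and its reciprocal in one step."""
        self.rels[a][tag].add(b)
        self.rels[b][INVERSE[tag]].add(a)

    def get(self, term, tag):
        """Return related terms for a tag without creating empty entries."""
        return sorted(self.rels.get(term, {}).get(tag, set()))

t = Thesaurus()
t.relate("milk", "RT", "butter")           # material <-> product
t.relate("blacksmiths", "RT", "forging")   # agent <-> activity
t.relate("dogs", "BT", "animals")          # dogs has the broader term animals
print(t.get("butter", "RT"))
print(t.get("animals", "NT"))
```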
After that, example scope notes were added to the pilot thesaurus to explain concepts. Like the relationships, the scope notes in the pilot thesaurus should be treated as illustrative examples, as this was as much as the project's time frame allowed.
Two lists, alphabetical and hierarchical, were then generated within the software and exported. These formed the basis of the print version of the thesaurus. An electronic version of the thesaurus also exists; it contains both hierarchical and alphabetical displays, which can be browsed, and it can also be searched by keyword.
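Thesaurus management software produced the pilot's two displays; the sketch below is not that software, only a hypothetical Python illustration of how both lists can be derived from the same underlying data. It assumes each preferred term records a single broader term (`None` for a top concept), marks hierarchy levels with leading dots (a convention assumed here for printed hierarchical displays), and shows BT, NT, UF and USE entries in the alphabetical list. All terms and function names are invented.

```python
# Hypothetical data: preferred term -> broader term (None = top concept).
BROADER = {
    "Agents": None,
    "animals": "Agents",
    "dogs": "animals",
    "cows": "animals",
    "Objects": None,
    "churns": "Objects",
}
USE = {"hounds": "dogs"}  # non-preferred -> preferred

def hierarchical(broader):
    """Hierarchical display: one dot per level below the top concept."""
    children = {}
    for term, parent in broader.items():
        children.setdefault(parent, []).append(term)
    lines = []
    def walk(parent, depth):
        for term in sorted(children.get(parent, []), key=str.lower):
            lines.append(". " * depth + term)
            walk(term, depth + 1)
    walk(None, 0)
    return lines

def alphabetical(broader, use):
    """Alphabetical display with BT/NT/UF for preferred terms, USE otherwise."""
    narrower, uf = {}, {}
    for term, parent in broader.items():
        if parent:
            narrower.setdefault(parent, []).append(term)
    for non_pref, pref in use.items():
        uf.setdefault(pref, []).append(non_pref)
    lines = []
    for term in sorted(list(broader) + list(use), key=str.lower):
        lines.append(term)
        if term in use:
            lines.append(f"  USE {use[term]}")
            continue
        if broader[term]:
            lines.append(f"  BT {broader[term]}")
        for n in sorted(narrower.get(term, []), key=str.lower):
            lines.append(f"  NT {n}")
        for u in sorted(uf.get(term, []), key=str.lower):
            lines.append(f"  UF {u}")
    return lines

print("\n".join(hierarchical(BROADER)))
print("\n".join(alphabetical(BROADER, USE)))
```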