Metadata management for data
storage spaces
Contributors:
François Ehrenmann (UMR BioGECO)
Philippe Chaumeil (UMR BioGECO)
Daniel Jacob (UMR BFP)
INRAE - Indexator – October 2022
Data Management Plan
• Implementing a Data Management Plan (DMP) has prerequisites, such as moving the data to be preserved out of the users' own disk space.
• This concerns not only published data but all data produced during the course of a project.
• This is even more necessary when temporary staff (doctoral students, post-docs, trainees, fixed-term contracts) are involved in the production of data.
How to encourage the structures (Units, Platforms, ...) to better manage their data?
Your data repository
• The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around.
• There is a concern about how these storage spaces are organised.
• Should they be harmonised, i.e. should good practices be imposed, such as i) folder and file naming, ii) folder structure (docs, data, scripts, etc.), iii) the use of README files, iv) etc.?
• Using at least a README file seems the simplest and least restrictive option. But what should be put in it?
• How can these files be used effectively when you want to find information? With what vocabulary?
Project data storage space: put a metadata file (JSON format) describing the project data within each subdirectory.
The JSON format was chosen because it is well suited to describing metadata and is readable by both humans and machines.
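As an illustration of "the metadata go to the data", the following minimal Python sketch writes a JSON metadata file inside a project directory and reads it back. The `PGD_<directory name>.json` naming and the `title` / `keywords` keys are assumptions made for this example only; the actual file layout is defined by the tool.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(project_dir: Path, metadata: dict) -> Path:
    """Write the metadata as a JSON file inside the project directory."""
    # Hypothetical naming convention: PGD_<directory name>.json
    target = project_dir / f"PGD_{project_dir.name}.json"
    target.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


def read_metadata(path: Path) -> dict:
    """Read a metadata file back, as a program scanning the storage would."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        project = Path(root) / "Atacama"
        project.mkdir()
        # Illustrative keys only; real keys come from the configured vocabulary.
        path = write_metadata(project, {"title": "Atacama", "keywords": ["omics"]})
        print(path.name)
        print(read_metadata(path))
```

Because the file sits next to the data it describes, moving or copying the directory carries the metadata with it.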
Since producing JSON files by hand is tricky for users, a web interface makes it possible to generate the metadata file (JSON), which is then deposited in the data storage.
The data storage is then scanned so that, from the web interface, you can search datasets based on some metadata, find the projects and/or data corresponding to your criteria, and view their metadata.
Questions immediately raised:
What metadata?
How to specify it?
From which vocabulary?
How to generate a JSON file?
• Given the diversity of domains, the approach chosen is to be both as flexible and as
pragmatic as possible by allowing each collective to choose its own (controlled) vocabulary
corresponding to the reality of its field and activities.
• The main idea is to be able to "capture" the user's metadata as easily as possible using their
vocabulary.
The web interface
must therefore correspond to the scientific and experimental context
of the collective (research unit, project, platform, ...)
Web interface for metadata entry (screenshot)
The entry form generates the metadata file (JSON). It is organised into:
• Sections
• Fields
• Type of each field: textbox, dropbox, checkbox
• Features of each field, e.g. width=350px, width=500px, open
• Predefined terms
Definition of metadata: config_terms.txt
All the metadata to be entered can be fully configured using a single configuration file (TSV format).
• Terminology definition file in Tabulation-Separated Values (TSV) format
• Based on the (controlled) vocabulary specified by the data manager of a collective (research unit, platform, …)
• Columns include Fields, Sections, Type, Features and Predefined terms
The whole terminology can be defined using a spreadsheet.
Structure of the terminology definition file (config_terms.txt)
• column 1 - Field: short name of the field
• column 2 - Section: short name of the section
• column 3 - Search: indicates whether the field can be used as a search criterion ('Y') or not ('N')
• column 4 - Shortview: indicates, with ordered numbers, whether the field appears in the overview table shown after a search (empty by default)
• column 5 - Type: indicates how the field is entered via the web interface (possible values: textbox, dropbox, checkbox and areabox)
• column 6 - Features: depending on the Type value, some specific features can be specified. Several features must be separated by commas:
    • for checkbox: open=0 or open=1 indicates whether the selection is open or not
    • for textbox & checkbox: autocomplete=item; the corresponding JavaScript file (e.g. edam_data.js for autocomplete=edam_data) must be present under web/js/autocomplete
    • for textbox & dropbox: width=NNNpx specifies the width of the box. Useful if you want to put several fields on the same line
    • for areabox: row=NN and cols=NN specify the row and column size of the textarea
• column 7 - Label: label of the field as it appears in the web interface
• column 8 - Predefined terms: for fields of type 'checkbox' or 'dropbox', a list of terms separated by commas
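The eight-column structure described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the sample rows are invented for illustration, the sample has no header line, and the dictionary keys (`field`, `section`, ...) are names chosen here, not names used by the tool.

```python
import csv
import io

# Column order as listed in the slide; the key names are illustrative.
COLUMNS = ["field", "section", "search", "shortview",
           "type", "features", "label", "terms"]


def parse_features(raw: str) -> dict:
    """Turn 'width=350px,open=1' into {'width': '350px', 'open': '1'}."""
    features = {}
    for item in filter(None, (part.strip() for part in raw.split(","))):
        key, _, value = item.partition("=")
        features[key] = value
    return features


def parse_terms_file(text: str) -> list[dict]:
    """Parse the tab-separated terminology definition, one dict per field."""
    rows = []
    for cells in csv.reader(io.StringIO(text), delimiter="\t"):
        if not cells:
            continue  # skip blank lines
        cells += [""] * (len(COLUMNS) - len(cells))  # pad missing trailing columns
        row = dict(zip(COLUMNS, cells))
        row["search"] = row["search"] == "Y"
        row["features"] = parse_features(row["features"])
        row["terms"] = [t.strip() for t in row["terms"].split(",") if t.strip()]
        rows.append(row)
    return rows


# Invented sample rows, for illustration only.
SAMPLE = (
    "title\tdefinition\tY\t1\ttextbox\twidth=350px\tTitle\t\n"
    "status\tdefinition\tY\t2\tdropbox\twidth=350px\tStatus\tIn progress,Finished\n"
    "datatype\tdescription\tN\t\tcheckbox\topen=1\tData type\tImage,Sequence\n"
)

if __name__ == "__main__":
    for row in parse_terms_file(SAMPLE):
        print(row["field"], row["type"], row["features"], row["terms"])
```

Keeping the whole definition in one flat table is what allows a data manager to maintain it in a spreadsheet and export it as TSV.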
Architecture diagram (overview)
• Docker containers: web server (web interface), MongoDB, scan (cron)
• Input / output files: config_terms.txt, config_terms.json, pgd-mmdt-schema.json, PGD_XXXXX.json
• Data storage
• Configuration / initialization steps: the web interface (config) generates the JSON configuration files from the terminology definition file config_terms.txt (Tabulation-Separated Values), and initdb initialises MongoDB.
• Normal operating mode: the web interface creates the PGD_XXXXX.json metadata file, which is deposited in the data storage; the scan (cron) inserts the metadata into MongoDB; the web interface is then used to search and view the metadata.
• Other diagram labels: linked, options
Architecture diagram: configuration / initialization steps
• http://mysite.org/pgd-mmdt/config: the web interface (config) takes the terminology definition file config_terms.txt (Tabulation-Separated Values) and generates config_terms.json and pgd-mmdt-schema.json under web/json.
• initdb initialises MongoDB.
• Important: the terminology definition file must be defined in the first step and then no longer changed.
Architecture diagram: metadata entry
• Using config_terms.json (web/json), the web interface creates the PGD_XXXXX.json metadata file, which is linked to pgd-mmdt-schema.json.
• The PGD_XXXXX.json file is then deposited in the data storage.
• Other diagram label: options
Architecture diagram: project search
• The scan (cron) reads the data storage and inserts the metadata into MongoDB.
• Using config_terms.json (web/json), the web interface searches MongoDB and lets you view the metadata.
• Other diagram label: options
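The scan-then-search flow of the architecture can be sketched in Python as follows. This is a simplified illustration, not the tool's implementation: it keeps the index in a plain list instead of MongoDB, and the matching rule (exact match, or membership for list values) and the metadata keys are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def scan(storage_root: Path) -> list[dict]:
    """Collect every PGD_*.json metadata file found under the storage root."""
    index = []
    for path in sorted(storage_root.rglob("PGD_*.json")):
        try:
            metadata = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skipped in this sketch
        index.append({"path": str(path.parent), "metadata": metadata})
    return index


def search(index: list[dict], **criteria) -> list[dict]:
    """Return entries matching all criteria (equality, or membership in a list)."""
    def matches(metadata: dict) -> bool:
        for key, wanted in criteria.items():
            value = metadata.get(key)
            if isinstance(value, list):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [entry for entry in index if matches(entry["metadata"])]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        project = Path(root) / "Atacama"
        project.mkdir()
        # Illustrative metadata keys only.
        (project / "PGD_Atacama.json").write_text(
            json.dumps({"title": "Atacama", "keywords": ["omics"]}),
            encoding="utf-8",
        )
        for hit in search(scan(Path(root)), keywords="omics"):
            print(hit["path"], hit["metadata"]["title"])
```

Running the scan periodically (the diagram mentions cron) keeps the index in step with whatever metadata files have been deposited since the last pass.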
Web interface for search: http://mysite.org/pgd-mmdt/search
Web interface for search, results (Short View): http://mysite.org/pgd-mmdt/search#results
Web interface for metadata: http://mysite.org/pgd-mmdt/metadata/Atacama
Web interface: add new predefined terms
• The first time a new term is needed, it is entered in the web interface and saved in the PGD_XXXXX.json file.
• Once this file has been deposited in the data storage and scanned (scan, cron), the new term is available in the web interface for other users / datasets.
(Screenshots: web interface and terminology definition file.)
Web interface: autocompletion
Example with the API « Découpage administratif » (Administrative division)
(Screenshots: web interface and terminology definition file.)
web/js/autocomplete/cities.js:
    // Fill the cities array with the names ('nom') of the French communes
    var cities=[];
    $.getJSON("https://geo.api.gouv.fr/communes", function (data) {
        $.each(data, function (index, value) { cities.push(value['nom']); });
    });
Web interface: autocompletion
Example with the EDAM ontology in BioPortal: https://bioportal.bioontology.org/ontologies/EDAM/?p=classes
web/js/autocomplete/edam_data.js:
    // Get all descendant classes of the 'Data' class
    edam_data=[];
    get_terms_from_bioportal('EDAM', 'http://edamontology.org/data_0006', 'edam_data');
For information about the BioPortal API, see https://data.bioontology.org/documentation
web/json/config_terms.json:
    "datatype":{
        "titre":"Data type",
        "autocomplete":"edam_data",
        "width":"350px"
    }
In the web interface, the autocompletion then lets the user choose from 947 terms.
Web interface: autocompletion
Example with the Thesaurus INRAE: https://vocabulaires-ouverts.inrae.fr/a-propos-du-thesaurus-inrae/
API: https://consultation.vocabulaires-ouverts.inrae.fr/api/
(Screenshot: terminology definition file.)
web/js/autocomplete/VOvocab.js:
    keywords = [
        'data', 'report', 'simulation', 'model', 'image', 'script',
        'omics', 'statistic', 'scientific', 'research', 'document',
        'experiment', 'video', 'spatial', 'instrument'
    ]
    VOvocab=[];
    get_terms_from_voinrae(keywords,'VOvocab')
The autocompletion then lets the user choose from 405 terms.
Web interface: Resources
(Screenshot: terminology definition file.)
The "description" field should make it possible to better annotate the data, while the "location" field should make it possible to:
1) extend the perimeter of the data beyond the local space,
2) eventually become independent of the local space when one wishes to disseminate the metadata alone.
A location can be anything: a text, an absolute path in a tree, a URL link, ...
We can thus put a link to a publication: Type=article, link=DOI
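Since a location "can be anything", a consumer of the metadata may want to tell the three cases apart. The Python sketch below shows one possible heuristic; the function name, the returned labels and the resource keys (`type`, `description`, `location`) are assumptions for illustration, and the DOI is a placeholder, not a real reference.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlparse


def classify_location(location: str) -> str:
    """Return 'url', 'path' or 'text' for a resource location string."""
    parsed = urlparse(location)
    if parsed.scheme in ("http", "https", "ftp") and parsed.netloc:
        return "url"
    # Accept absolute paths in either POSIX or Windows notation.
    if PurePosixPath(location).is_absolute() or PureWindowsPath(location).is_absolute():
        return "path"
    return "text"


if __name__ == "__main__":
    # Hypothetical resources; the keys mirror the fields named in the slide.
    resources = [
        {"type": "article", "description": "Related publication",
         "location": "https://doi.org/10.0000/placeholder"},
        {"type": "data", "description": "Raw files on the NAS",
         "location": "/mnt/nas/project/raw"},
        {"type": "sample", "description": "Physical samples",
         "location": "Freezer B, room 12"},
    ]
    for resource in resources:
        print(resource["type"], classify_location(resource["location"]))
```

A viewer could use such a classification to render URLs as clickable links while showing paths and free text verbatim.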
Resource example 1: Atacama (screenshots: creation, JSON metadata file, metadata viewer)
Resource example 2: Link to NextCloud
Put a NextCloud link pointing to the data repository.
Access is thus limited to those who have rights!
Resource example 3: Indicate the path on an external storage
If providing a URL is not possible, nevertheless give clear indications of where the data are located.
Installation: Local, Remote or Mixed
• VM hosting the web server and the data storage: 2 CPU, 2 GB RAM, 10 GB HD
• Local VM or remote VM (datacenter)
• Storage located on the VM, local storage mounted on the VM (NAS server), or Google Drive
• Other diagram labels: VPN (GlobalProtect), WinSCP
• Successful testing
scan: NextCloud storage mounted with rclone
rclone configuration:
    [ncloud]
    type = webdav
    url = https://nextcloud.inrae.fr/remote.php/webdav/
    vendor = nextcloud
    user = XXXXX
    pass = XXXXX
Mount command:
    rclone mount ncloud:MTH2-PF-Bordeaux/DATA/ /mnt/ncloud/ \
        --allow-other --vfs-cache-mode minimal \
        --read-only --no-checksum --no-modtime \
        --daemon --daemon-wait 15s
Data: https://nextcloud.inrae.fr/apps/files/?dir=/MTH2-PF-Bordeaux/DATA
Search: https://pmb-bordeaux.fr/ncloud/search
Pre-fill a dataset in the INRAE DATA dataverse (via API)
Web interface: creation of the JSON file (metadata JSON file + JSON Schema pgd-mmdt-schema.json), which can also be converted into a metadata JSON-LD file.
Mapping of the JSON file sections/terms with the metadata structure in DATA INRAE, then push.
• A good approach is to use only controlled vocabulary, i.e. a relevant and sufficient vocabulary used as a reference in the field concerned, so that users can describe a project and its context without having to add additional terms.
• A mapping of terms based on controlled vocabulary can thus be done more easily to generate formats corresponding to different standards (MIAPPE, JSON-LD, ...).
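The mapping step can be pictured as a lookup table from local "section.field" names to the field names of a target structure. The Python sketch below is purely illustrative: every name in `FIELD_MAP` and in the sample metadata is hypothetical and does not reflect the actual Dataverse field names or the tool's own mapping.

```python
# Hypothetical mapping: local "section.field" -> field name in a target format.
FIELD_MAP = {
    "definition.title": "title",
    "definition.description": "description",
    "description.keywords": "keywords",
}


def map_metadata(local: dict, field_map: dict) -> tuple[dict, list]:
    """Map local values to target fields and list the fields left unmapped."""
    target, unmapped = {}, []
    for section, fields in local.items():
        for field, value in fields.items():
            key = f"{section}.{field}"
            if key in field_map:
                target[field_map[key]] = value
            else:
                unmapped.append(key)
    return target, unmapped


if __name__ == "__main__":
    # Invented metadata, organised by section as in the entry form.
    local_metadata = {
        "definition": {"title": "Atacama", "description": "Example project"},
        "description": {"keywords": ["omics"], "instrument": "NMR"},
    }
    target, unmapped = map_metadata(local_metadata, FIELD_MAP)
    print(target)
    print("unmapped:", unmapped)
```

Reporting the unmapped fields makes it visible which local terms still lack an equivalent in the target standard, which is where a controlled vocabulary helps most.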
Example of mapping from a controlled vocabulary based on an ontology in BioPortal
• Autocompletion: the terms are obtained (get terms) from the BioPortal ontology API / EDAM, under http://edamontology.org/data_0006
• Mapping: a term is then looked up (search, get) with the BioPortal Search API:
https://data.bioontology.org/search?q=Gene%20expression%20profile&ontology=EDAM&subtree_root_id=http%3A%2F%2Fedamontology.org%2Fdata_0006&apikey=….
• The result is used to pre-fill a dataset in the INRAE DATA dataverse (via API).
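The search URL shown in the slide can be assembled programmatically. The Python sketch below only builds the URL, with the same query parameters as the slide, and performs no network request; the function name is hypothetical and the API key is a placeholder to be replaced by your own.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "https://data.bioontology.org/search"


def bioportal_search_url(term: str, ontology: str, subtree_root: str, apikey: str) -> str:
    """Build a BioPortal search URL restricted to a subtree of an ontology."""
    params = {
        "q": term,
        "ontology": ontology,
        "subtree_root_id": subtree_root,
        "apikey": apikey,
    }
    # quote_via=quote encodes spaces as %20 (rather than '+'), as in the slide.
    return SEARCH_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(
        bioportal_search_url(
            "Gene expression profile",
            "EDAM",
            "http://edamontology.org/data_0006",
            "MY_KEY",  # placeholder, not a real key
        )
    )
```

Restricting the search with `subtree_root_id` keeps the lookup inside the same branch of the ontology that feeds the autocompletion list.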
Example of mapping from a controlled vocabulary based on the Thesaurus INRAE
• Autocompletion: the terms are obtained (get terms) from the Thesaurus INRAE API: https://consultation.vocabulaires-ouverts.inrae.fr/api/
• Mapping: a term is then looked up (search, get) with the Thesaurus INRAE API:
https://consultation.vocabulaires-ouverts.inrae.fr/rest/v1/search?vocab=thesaurus-inrae&lang=en&type=skos%3AConcept&query=metabolomics&offset=0
• The result is used to pre-fill a dataset in the INRAE DATA dataverse (via API).
"Machine-Actionable Metadata" (data life cycle diagram)
• Create the project: descriptive metadata (Project) entered with the web-based metadata entry tool and saved as JSON with a Schema (PGD_XXX.json)
• Create the data: observations, samples, experimentation, instrumentation (TSV)
• Data analysis
• Preserving data: storage space for the project (NAS) associated with the metadata file
    • Adding resources
    • Adding new metadata
    • Saving data with their metadata
    • Convert to a suitable format (JSON-LD)
• Access to data: metadata query (web interface and/or API)
• Reuse of data: mapping to pre-fill a dataset in the INRAE DATA dataverse (via API); push (JSON-LD) to national and international data repositories
Conclusion
The "INDEXATOR" tool allows a collective to:
• Gain visibility of what is produced within the collective: data sets, software, databases, images, sounds, videos, analyses, codes, ...
• Use a controlled vocabulary specific to the domain of the collective, with mapping to other formats embedding ontologies to be done downstream as required,
• Offer an alternative/complement to external data repositories or other thematic warehouses, in order to have knowledge of and access to ALL data, not only those that are published,
• Favour FAIR (at least the Findable & Accessible criteria) within the collective,
• Make newcomers and students aware of the need to better describe what they produce.
Metadata Management for Storage Spaces: metadata aggregation & indexation
Source code: https://github.com/inrae/pgd-mmdt
Thank you for your attention

Coefficient of Thermal Expansion and their Importance.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 

Indexator_oct2022.pdf

  • 1. Metadata management for data storage spaces Contributors: François Ehrenmann (UMR BioGECO) Philippe Chaumeil (UMR BioGECO) Daniel Jacob (UMR BFP)
  • 2. INRAE - Indexator – October 2022 • Implementing a Data Management Plan (DMP) has prerequisites, such as storing the data to be preserved outside the users' own disk space. • This concerns not only published data but all data produced during the course of a project. • This is even more necessary when temporary staff (doctoral students, post-docs, trainees, fixed-term contracts) are involved in producing the data. Data Management Plan How to encourage the structures (Units, Platforms, ...) to better manage their data?
  • 3. Data storage • The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around. Metadata How to encourage the structures (Units, Platforms, ...) to better manage their data Your data repository • There is a concern about how these storage spaces are organised. • Should they be harmonised, i.e. should good practices be imposed, such as i) folder and file naming, ii) folder structure (docs, data, scripts, etc.), iii) the use of README files, etc.? • Using a README file at least seems the simplest and least restrictive option. What should go in it? • How can README files be used effectively when you want to find information? With what vocabulary?
  • 4. Data storage Project data storage space: put a metadata file (JSON format) describing the project data within each subdirectory • The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around. JSON was chosen because it is well suited to describing metadata and is readable by both humans and machines. How to encourage the structures (Units, Platforms, ...) to better manage their data Your data repository
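As an illustration of a metadata file travelling with the data, here is a minimal Node.js sketch that writes a JSON file into a project subdirectory and reads it back. The function names, the default file name `metadata.json` and the metadata fields are all hypothetical; the slides do not specify them at this point.

```javascript
// Sketch only: file name and metadata fields are illustrative assumptions.
const fs = require('fs');
const path = require('path');

// Write the metadata object as pretty-printed JSON inside the project directory.
function writeMetadata(projectDir, metadata, fileName = 'metadata.json') {
  fs.mkdirSync(projectDir, { recursive: true });
  const target = path.join(projectDir, fileName);
  // Two-space indentation keeps the file readable by humans as well as machines.
  fs.writeFileSync(target, JSON.stringify(metadata, null, 2) + '\n', 'utf8');
  return target;
}

// Read the metadata file back as a JavaScript object.
function readMetadata(projectDir, fileName = 'metadata.json') {
  return JSON.parse(fs.readFileSync(path.join(projectDir, fileName), 'utf8'));
}
```

Calling `writeMetadata('/data/projectA', { title: 'Demo', keywords: ['omics'] })` would leave the description next to the data it describes, which is the point made on the slide.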
  • 5. Generate the metadata file (JSON) Data storage Web interface Project data storage space: put a metadata file (JSON format) describing the project data within each subdirectory • The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around. JSON was chosen because it is well suited to describing metadata and is readable by both humans and machines. Because producing JSON files by hand is tricky for users, a web interface is provided to create them. How to encourage the structures (Units, Platforms, ...) to better manage their data deposit
  • 6. View Metadata Generate the metadata file (JSON) Search datasets based on some metadata deposit scan Data storage Web interface Project data storage space : Put a metadata file (JSON format) describing the project data within each subdirectory Then, find projects and/or data corresponding to your criteria • The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around. How to encourage the structures (Units, Platforms,...) to better manage their data
  • 7. How to encourage the structures (Units, Platforms,...) to better manage their data What metadata? How to specify it? From which vocabulary? How to generate a JSON file? Questions immediately raised
  • 8. • Given the diversity of domains, the approach chosen is to be both as flexible and as pragmatic as possible by allowing each collective to choose its own (controlled) vocabulary corresponding to the reality of its field and activities. • The main idea is to be able to "capture" the user's metadata as easily as possible using their vocabulary. How to encourage the structures (Units, Platforms,...) to better manage their data What metadata? How to specify it? From which vocabulary? How to generate a JSON file? Questions immediately raised
  • 9. • The main idea is to be able to "capture" the user's metadata as easily as possible using their vocabulary. How to encourage the structures (Units, Platforms,...) to better manage their data The web interface must therefore correspond to the scientific and experimental context of the collective (research unit, project, platform, ...) What metadata? How to specify it? From which vocabulary? How to generate a JSON file? Questions immediately raised • Given the diversity of domains, the approach chosen is to be both as flexible and as pragmatic as possible by allowing each collective to choose its own (controlled) vocabulary corresponding to the reality of its field and activities.
  • 10. … Web interface for metadata entry Generate the metadata file (JSON)
  • 11. Sections … Web interface for metadata entry Generate the metadata file (JSON)
  • 12. … Web interface for metadata entry Generate the metadata file (JSON) Sections Fields
  • 13. … Web interface for metadata entry textbox dropbox textbox checkbox dropbox textbox textbox checkbox Generate the metadata file (JSON) Type Sections Fields
  • 14. … Web interface for metadata entry textbox dropbox textbox checkbox dropbox textbox textbox checkbox Generate the metadata file (JSON) Predefined terms Sections Fields Type
  • 15. Sections Predefined terms … Web interface for metadata entry Fields width=350px width=350px width=350px width=500px open textbox dropbox textbox checkbox dropbox textbox textbox checkbox Generate the metadata file (JSON) Features Type
  • 16. … Fields Sections Type Features Predefined terms config_terms.txt Definition of metadata • Terminology definition file in Tab-Separated Values (TSV) format • Based on the (controlled) vocabulary specified by the data manager of a collective (research unit, platform, …), all the metadata to be entered can be fully configured using a single configuration file (TSV format). The whole terminology can be defined using a spreadsheet.
  • 17. Definition of metadata: structure of the terminology definition file (config_terms.txt). All the metadata to be entered can be fully configured using a single configuration file (TSV format). • column 1 - Field: short name of the field • column 2 - Section: short name of the section • column 3 - Search: indicates whether the field can be used as a search criterion ('Y') or not ('N') • column 4 - Shortview: ordered numbers indicating whether, and in which position, the field appears in the overview table shown after a search (empty by default) • column 5 - Type: indicates how the field is entered via the web interface (possible values: textbox, dropbox, checkbox and areabox) • column 6 - Features: depending on the Type value, specific features can be set; several features must be separated by commas • for checkbox: open=0 or open=1 indicates whether the selection is open or not • for textbox & checkbox: autocomplete=item; the items.js file must be present under web/js/autocomplete • for textbox & dropbox: width=NNNpx sets the width of the box, which is useful for putting several fields on the same line • for areabox: row=NN and cols=NN set the number of rows and columns of the textarea • column 7 - Label: label of the field as it appears in the web interface • column 8 - Predefined terms: for fields of type 'checkbox' or 'dropbox', a comma-separated list of terms.
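To make the eight-column layout concrete, the following sketch parses such a TSV text into JavaScript objects. It is not the tool's own parser: the function names and output property names are hypothetical, and it assumes the file has no header row and that every line carries the columns in the order listed on the slide.

```javascript
// Column order as listed on the slide: Field, Section, Search, Shortview,
// Type, Features, Label, Predefined terms. Property names are illustrative.
const COLUMNS = ['field', 'section', 'search', 'shortview', 'type', 'features', 'label', 'terms'];

// Turn "width=350px,open=1" into { width: '350px', open: '1' }.
function parseFeatures(text) {
  const features = {};
  for (const part of text.split(',')) {
    const item = part.trim();
    if (item === '') continue;
    const eq = item.indexOf('=');
    if (eq === -1) features[item] = true;
    else features[item.slice(0, eq).trim()] = item.slice(eq + 1).trim();
  }
  return features;
}

// Parse the whole TSV text; assumes no header row (an assumption, not from the slides).
function parseTerms(tsvText) {
  const rows = [];
  for (const line of tsvText.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines
    const cells = line.split('\t');
    const raw = {};
    COLUMNS.forEach((name, i) => { raw[name] = (cells[i] || '').trim(); });
    rows.push({
      field: raw.field,
      section: raw.section,
      searchable: raw.search === 'Y',
      shortview: raw.shortview === '' ? null : Number(raw.shortview),
      type: raw.type,
      features: parseFeatures(raw.features),
      label: raw.label,
      terms: raw.terms === '' ? [] : raw.terms.split(',').map((t) => t.trim()),
    });
  }
  return rows;
}
```

A line such as `status<TAB>definition<TAB>N<TAB><TAB>dropbox<TAB>width=350px<TAB>Status<TAB>ongoing,completed` (a made-up example) would yield a non-searchable dropbox field with two predefined terms.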
  • 18. Architecture diagram config_terms.json initdb search Configuration / Initialization steps Normal operating mode pgd-mmdt-schema.json Terminology definition file (Tab-Separated Values) Important: Must be defined in the first step and then no longer changed. Web interface (config) config_terms.txt generate generate generate linked MongoDB Web interface create insert PGD_XXXXX.json options scan cron Data storage deposit scan View Metadata Docker Containers Input / Output files Data storage Web server
  • 19. Architecture diagram config_terms.json initdb pgd-mmdt-schema.json Terminology definition file (Tab-Separated Values) Important: Must be defined in the first step and then no longer changed. Web interface (config) config_terms.txt generate generate MongoDB http://mysite.org/pgd-mmdt/config Docker Containers Input / Output files Configuration / Initialization steps web/json
  • 20. Architecture diagram config_terms.json Web interface create PGD_XXXXX.json pgd-mmdt-schema.json linked options Data storage deposit Metadata entry Docker Containers Input / Output files web/json
  • 21. Architecture diagram config_terms.json Web interface search insert MongoDB options scan cron Data storage scan View Metadata Docker Containers Input / Output files web/json Project search
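The scan and search steps of the diagram can be pictured with the sketch below. It deliberately replaces the MongoDB index of the real architecture with a plain in-memory array, so it only illustrates the idea. The function names, the `PGD_*.json` file-name pattern (inferred from the `PGD_XXXXX.json` label in the diagram) and the flat key/value metadata layout are assumptions.

```javascript
// Sketch only: in-memory stand-in for the scan + MongoDB search shown in the diagram.
const fs = require('fs');
const path = require('path');

// Recursively collect metadata files whose name matches `pattern` (assumed pattern).
function scanMetadata(rootDir, pattern = /^PGD_.*\.json$/) {
  const found = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const full = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      found.push(...scanMetadata(full, pattern));
    } else if (pattern.test(entry.name)) {
      try {
        found.push({ path: full, metadata: JSON.parse(fs.readFileSync(full, 'utf8')) });
      } catch (err) {
        // Skip unreadable or malformed files rather than aborting the whole scan.
      }
    }
  }
  return found;
}

// Keep the records whose metadata satisfies every criterion.
// A list-valued field matches when it contains the wanted term.
function searchMetadata(records, criteria) {
  return records.filter(({ metadata }) =>
    Object.entries(criteria).every(([key, wanted]) => {
      const value = metadata[key];
      return Array.isArray(value) ? value.includes(wanted) : value === wanted;
    })
  );
}
```

In the tool itself the scan is run periodically (the "cron" label) and the results are inserted into MongoDB; here `searchMetadata(scanMetadata('/data'), { keywords: 'omics' })` would simply filter the scanned records.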
  • 22. … http://mysite.org/pgd-mmdt/search Web interface for search
  • 23. http://mysite.org/pgd-mmdt/search#results Web interface for search Short View
  • 24. http://mysite.org/pgd-mmdt/metadata/Atacama Web interface for metadata …
  • 25. PGD_XXXXX.json deposit scan Web interface options scan cron Web interface: add new predefined terms Terminology definition file The first time this new term is needed This new term is now available for other users / datasets Data storage
  • 26. Web interface: autocompletion. Example with the API « Découpage administratif » (Administrative division). Terminology definition file. web/js/autocomplete/cities.js: var cities=[]; $.getJSON("https://geo.api.gouv.fr/communes", function (data) { $.each(data, function (index, value) { cities.push(value['nom']); }); });
  • 27. Web interface: autocompletion. Example with https://bioportal.bioontology.org/ontologies/EDAM/?p=classes web/js/autocomplete/edam_data.js: /* Get all descendant classes from the 'Data' class */ edam_data=[]; get_terms_from_bioportal('EDAM', 'http://edamontology.org/data_0006', 'edam_data'); For information about the BioPortal API: https://data.bioontology.org/documentation web/json/config_terms.json: "datatype":{ "titre":"Data type", "autocomplete":"edam_data", "width":"350px" } Choose from 947 terms
  • 28. Web interface: autocompletion. Example with the INRAE Thesaurus: https://vocabulaires-ouverts.inrae.fr/a-propos-du-thesaurus-inrae/
  • 29. Web interface: autocompletion. Example with https://consultation.vocabulaires-ouverts.inrae.fr/api/ Terminology definition file. web/js/autocomplete/VOvocab.js: keywords = [ 'data', 'report', 'simulation', 'model', 'image', 'script', 'omics', 'statistic', 'scientific', 'research', 'document', 'experiment', 'video', 'spatial', 'instrument' ]; VOvocab=[]; get_terms_from_voinrae(keywords,'VOvocab'); Choose from 405 terms
  • 30. Web interface: Resources Terminology definition file The "description" field should make it possible to annotate the data better, while the "location" field should make it possible to 1) extend the scope of the data beyond the local storage space, and 2) possibly do without the local space altogether when only the metadata is to be disseminated. A location can be anything: text, an absolute path in a directory tree, a URL, ... A link to a publication can thus be given: Type=article, link=DOI
  • 31. Creation JSON metadata file metadata viewer Resource example 1: Atacama
  • 32. Resource example 2: Link to NextCloud. Put a NextCloud link pointing to the data repository. Access is thus limited to those who have the rights.
  • 33. Resource example 2: Link to NextCloud. Put a NextCloud link pointing to the data repository. Access is thus limited to those who have the rights. Resource example 3: Indicate the path on an external storage. If providing a URL is not possible, nevertheless give clear indications of where the data is located.
  • 34. VM Data storage Web server Storage located on the VM Installation: Local, Remote or Mixed Local storage mounted on the VM NAS Server VPN GlobalProtect WinSCP Successful testing Local VM Remote VM (Datacenter) 2 CPUs, 2 GB RAM, 10 GB disk
  • 35. VM Data storage Web server Local VM Remote VM (Datacenter) Storage located on the VM Google Drive 2 CPUs, 2 GB RAM, 10 GB disk Installation: Local, Remote or Mixed Local storage mounted on the VM NAS Server VPN GlobalProtect WinSCP Successful testing
  • 36. scan [ncloud] type = webdav url = https://nextcloud.inrae.fr/remote.php/webdav/ vendor = nextcloud user = XXXXX pass = XXXXX rclone mount ncloud:MTH2-PF-Bordeaux/DATA/ /mnt/ncloud/ --allow-other --vfs-cache-mode minimal --read-only --no-checksum --no-modtime --daemon --daemon-wait 15s https://pmb-bordeaux.fr/ncloud/search https://nextcloud.inrae.fr/apps/files/?dir=/MTH2-PF-Bordeaux/DATA
  • 37. Web Interface Creation of the JSON file Mapping of JSON file sections/terms with the metadata structure in DATA INRAE Pre-fill a dataset in the INRAE DATA dataverse (via API) JSON Schema Metadata JSON file + pgd-mmdt-schema.json JSON-LD Metadata JSON-LD file • A good approach is to use only a controlled vocabulary, i.e. a relevant and sufficient vocabulary used as a reference in the field concerned, so that users can describe a project and its context without having to add extra terms. • Terms based on a controlled vocabulary can then be mapped more easily to generate formats corresponding to different standards (MIAPPE, JSON-LD, ...) Push
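The kind of term mapping described here can be sketched as a simple lookup table that renames local fields into the properties of a target vocabulary. In the example below the target is schema.org's `Dataset` type, which is purely an illustrative choice and not the mapping used by the tool; the local field names, the mapping table and the function name are hypothetical.

```javascript
// Hypothetical mapping from local field names to schema.org Dataset properties.
const MAPPING = { title: 'name', description: 'description', keywords: 'keywords' };

// Build a JSON-LD document from a flat metadata object.
// Fields without a mapping are reported instead of being silently dropped.
function toJsonLd(metadata, mapping = MAPPING) {
  const doc = { '@context': 'https://schema.org', '@type': 'Dataset' };
  const unmapped = [];
  for (const [key, value] of Object.entries(metadata)) {
    if (Object.prototype.hasOwnProperty.call(mapping, key)) {
      doc[mapping[key]] = value;
    } else {
      unmapped.push(key);
    }
  }
  return { doc, unmapped };
}
```

The list of unmapped fields shows why a controlled vocabulary helps: when every local term has a known counterpart, the list stays empty and the conversion needs no manual intervention.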
  • 38. Example of mapping from a controlled vocabulary based on an ontology in BioPortal autocompletion http://edamontology.org/data_0006 API BioPortal ontology / EDAM get terms Pre-fill a dataset in the INRAE DATA dataverse (via API)
  • 39. API BioPortal Search https://data.bioontology.org/search ?q=Gene%20expression%20profile&ontology=EDAM&subtree_root_id=http%3A%2F%2Fedamontology.org%2Fdata_0006&apikey=…. Example of mapping from a controlled vocabulary based on an ontology in BioPortal autocompletion http://edamontology.org/data_0006 API BioPortal ontology / EDAM get terms search Pre-fill a dataset in the INRAE DATA dataverse (via API) Mapping get
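The search request shown on this slide can be assembled programmatically. The sketch below only builds the URL, using the query parameters visible in the slide (`q`, `ontology`, `subtree_root_id`, `apikey`); the function name is hypothetical and the API key must be supplied by the caller. Sending the request and reading the response are left out.

```javascript
// Build a BioPortal search URL from the parameters shown on the slide.
// URL and URLSearchParams take care of percent-encoding the values.
function buildBioportalSearchUrl(term, { ontology, subtreeRootId, apikey }) {
  const url = new URL('https://data.bioontology.org/search');
  url.searchParams.set('q', term);
  url.searchParams.set('ontology', ontology);
  url.searchParams.set('subtree_root_id', subtreeRootId);
  url.searchParams.set('apikey', apikey);
  return url.toString();
}
```

For instance, `buildBioportalSearchUrl('Gene expression profile', { ontology: 'EDAM', subtreeRootId: 'http://edamontology.org/data_0006', apikey: 'YOUR_KEY' })` restricts the search to the EDAM 'Data' subtree, as in the slide.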
  • 40. Example of mapping from a controlled vocabulary based on the Thesaurus INRAE https://consultation.vocabulaires-ouverts.inrae.fr/api/ API Thesaurus INRAE get terms Pre-fill a dataset in the INRAE DATA dataverse (via API) autocompletion
  • 41. Example of mapping from a controlled vocabulary based on the Thesaurus INRAE https://consultation.vocabulaires-ouverts.inrae.fr/api/ API Thesaurus INRAE get terms Pre-fill a dataset in the INRAE DATA dataverse (via API) autocompletion https://consultation.vocabulaires-ouverts.inrae.fr/rest/v1/search ?vocab=thesaurus-inrae&lang=en&type=skos%3AConcept &query=metabolomics &offset=0 API Thesaurus INRAE search get Mapping
  • 42. Create the project Descriptive metadata (Project) Preserving data Web-based metadata entry tool Storage space for the project associated with the metadata file Data analysis •Adding new metadata •Saving data with their metadata •Convert to a suitable format (JSON-LD) Access to data Reuse of data Metadata query (Web interface and/or API) Observations, Samples, Experimentation, Instrumentation Push JSON-LD JSON with a Schema Adding Resources NAS National and international data repositories TSV PGD_XXX.json … TSV XXX “Machine-Actionable Metadata" Create the data JSON with a Schema Pre-fill a dataset in the INRAE DATA dataverse (via API) Mapping
  • 43. Conclusion The "INDEXATOR" tool allows a collective to: • Gain visibility into what is produced within the collective: data sets, software, databases, images, sounds, videos, analyses, code, ... • Use a controlled vocabulary specific to the domain of the collective, with mapping to other formats embedding ontologies done downstream as required, • Offer an alternative or complement to external data repositories or other thematic warehouses, so as to know about and have access to ALL data, not only published data, • Promote FAIR principles (at least the Findable & Accessible criteria) within the collective, • Make newcomers and students aware of the need to better describe what they produce.
  • 44. https://github.com/inrae/pgd-mmdt Thank you for your attention Metadata Management for Storage Spaces Metadata aggregation & indexation Source code