Batch metadata assignment to archival photograph collections using facial recognition software
Kyle Banerjee
banerjek@ohsu.edu
Why should anyone care?
Current methods for assigning metadata are:
•Slow
•Difficult
•Error Prone
•Incomplete
Filing code stencil cards at the W. Atlas Burpee Company
Library of Congress Prints and Photographs Division
A few challenges
• Libraries and archives use external
systems to maintain metadata
• Archival images are huge and clunky to
work with
• Metadata standards for image files are
implemented inconsistently and weren’t
designed with library needs in mind
Automation
• Process in bulk
• Use metadata embedded within the image
• Use the file system
• Use consumer grade software as a force multiplier
• Improve search engine visibility and simplify migrations
Fran Bilas Spence and Jean Jennings Bartik work on ENIAC
ARL Technical Library
What you need to get started
• A computer with the operating system of
your choice
• Modest scripting ability in any language (mad programming skilz not required)
Image metadata demystified
$ head lovejoy-moskovetz_1923.tif
II▒▒▒@d▒▒F▒(1▒2▒▒ ▒▒]BI▒▒ ▒Ci▒Black and white photograph of Esther Pohl
Lovejoy and Doctors Elliot and Moskovetz in Athens in 1923.▒▒['▒▒['Adobe Photoshop
CS2 Windows2012:04:10 14:16:16<?xpacket begin=""
id="W5M0MpCehiHzreSzNTczkc9d"?>
[a few lines deleted here]
<rdf:Description rdf:about=""
xmlns:tiff="http://ns.adobe.com/tiff/1.0/">
<tiff:ImageWidth>6046</tiff:ImageWidth>
<tiff:ImageLength>4880</tiff:ImageLength>
[a few more lines deleted]
<dc:subject>
<rdf:Bag>
<rdf:li>Lovejoy</rdf:li>
<rdf:li>Moskovetz</rdf:li>
</rdf:Bag>
</dc:subject>
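The dump above shows that the XMP block sits in the file as plain text, so a script can read it without any imaging library. A minimal sketch, assuming the packet is UTF-8 and contains a single rdf:RDF element; the function names and the sample bytes are illustrative and not from the slides:

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_rdf(data):
    """Return the first rdf:RDF block found in raw file bytes, or None."""
    match = re.search(rb"<rdf:RDF.*?</rdf:RDF>", data, re.DOTALL)
    # Assumption: the packet is UTF-8 encoded.
    return match.group(0).decode("utf-8", errors="replace") if match else None


def dc_subjects(rdf_xml):
    """List the dc:subject entries stored in an rdf:Bag."""
    root = ET.fromstring(rdf_xml)
    path = f".//{{{DC_NS}}}subject/{{{RDF_NS}}}Bag/{{{RDF_NS}}}li"
    return [li.text for li in root.findall(path) if li.text]


# Illustrative stand-in for an image file: binary bytes around an XMP packet.
SAMPLE = (
    b"II*\x00\x08\x00binary image data"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:subject><rdf:Bag>"
    b"<rdf:li>Lovejoy</rdf:li><rdf:li>Moskovetz</rdf:li>"
    b"</rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF>"
    b'<?xpacket end="w"?>more binary data'
)

subjects = dc_subjects(extract_rdf(SAMPLE))
```

For a real file, the bytes would come from open(path, "rb").read(). For anything beyond a quick look, ExifTool is the safer reader.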
Facial recognition
• People are an important access point
• Provides authority control by nature
• Identification of individuals helps determine other details
• Extraction of faces simplifies manual identification
• Non-specialist staff can do more metadata work
Facial recognition primer
WPI Transformations
Useful software
• Free Picasa software works great
• Stores person info in a combination of
contacts.xml and .picasa.ini files
Since I know you’re wondering, it’s no good
for…
.picasa.ini
[lovejoy-esther_portrait_nd.jpg]
faces=rect64(135a175de074cd8b),c0ef2256901bfbb6
backuphash=23375
[matarrazo-joseph_2001.jpg]
faces=rect64(3407026fe607ac00),c2c65f903b3150cb
backuphash=33
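The file above is ordinary INI syntax, with one section per image, so a standard INI parser can read it. A minimal sketch follows. The rect64 decoding is an assumption based on community descriptions of the format (four 16-bit values for left, top, right and bottom, each a fraction of the image size), not something stated in the slides, and the function names are illustrative:

```python
import configparser


def parse_rect64(token):
    """Decode 'rect64(...)' into (left, top, right, bottom) fractions.

    Assumption: the 16 hex digits are four 16-bit values scaled to 0..65535.
    """
    hex_digits = token[len("rect64("):-1].rjust(16, "0")
    return tuple(int(hex_digits[i:i + 4], 16) / 65535 for i in range(0, 16, 4))


def read_faces(ini_text):
    """Map each image filename to a list of (contact_id, rectangle) pairs."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    faces = {}
    for filename in parser.sections():
        entries = parser.get(filename, "faces", fallback="")
        found = []
        # Assumption: multiple faces in one image are separated by ';'.
        for entry in filter(None, entries.split(";")):
            rect, contact_id = entry.split(",", 1)
            found.append((contact_id, parse_rect64(rect)))
        if found:
            faces[filename] = found
    return faces


SAMPLE_INI = """\
[lovejoy-esther_portrait_nd.jpg]
faces=rect64(135a175de074cd8b),c0ef2256901bfbb6
backuphash=23375
[matarrazo-joseph_2001.jpg]
faces=rect64(3407026fe607ac00),c2c65f903b3150cb
backuphash=33
"""

faces = read_faces(SAMPLE_INI)
```

The contact IDs recovered here are the keys that contacts.xml resolves to names.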
contacts.xml
<contact id="c0ef2256901bfbb6" name="Esther Pohl Lovejoy" modified_time="2012-11-26T09:48:04-08:00" local_contact="1"/>
<contact id="c2c65f903b3150cb" name="Joseph Matarazzo" modified_time="2012-11-30T15:02:10-08:00" local_contact="1"/>
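The id attribute in contacts.xml matches the hash that follows each rect64 entry in .picasa.ini, so joining the two files gives the people in each image. A minimal sketch, assuming the contact elements sit under a single root element (the contacts wrapper below is an assumption, since the slide shows only the contact lines); the function names are illustrative:

```python
import xml.etree.ElementTree as ET


def load_contacts(xml_text):
    """Map Picasa contact IDs to display names."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): c.get("name") for c in root.iter("contact")}


def people_in_images(face_ids_by_file, contacts):
    """Replace contact IDs with names, skipping IDs with no contact entry."""
    return {
        filename: [contacts[cid] for cid in ids if cid in contacts]
        for filename, ids in face_ids_by_file.items()
    }


# The <contacts> root element is an assumption made for this sketch.
SAMPLE_XML = """\
<contacts>
 <contact id="c0ef2256901bfbb6" name="Esther Pohl Lovejoy"
  modified_time="2012-11-26T09:48:04-08:00" local_contact="1"/>
 <contact id="c2c65f903b3150cb" name="Joseph Matarazzo"
  modified_time="2012-11-30T15:02:10-08:00" local_contact="1"/>
</contacts>
"""

contacts = load_contacts(SAMPLE_XML)
people = people_in_images(
    {
        "lovejoy-esther_portrait_nd.jpg": ["c0ef2256901bfbb6"],
        "matarrazo-joseph_2001.jpg": ["c2c65f903b3150cb"],
    },
    contacts,
)
```

Names come out in the form entered in Picasa. Converting them to an inverted heading such as "Lovejoy, Esther Pohl" is a separate step that is best checked by hand.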
Adding metadata en masse
• ExifTool (available for all platforms) is incredibly handy
exiftool -XMP-dc:Subject+='My new heading' myimage.tif
exiftool -XMP-iptcExt:PersonInImage+='Doe, John' myimage.tif
• Notice the Dublin Core subject tag
• DC doesn’t define people explicitly as subjects, so we used the IPTC Extension PersonInImage tag here
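The two commands above can be generated for a whole collection from a mapping of image files to names. A minimal sketch, assuming exiftool is on the PATH; the function names and the dry_run switch are illustrative. Arguments are passed as a list, so names containing commas or spaces need no shell quoting:

```python
import subprocess


def build_exiftool_args(image_path, people, subjects=()):
    """Build one exiftool call that appends people and subject headings."""
    args = ["exiftool"]
    args += [f"-XMP-iptcExt:PersonInImage+={name}" for name in people]
    args += [f"-XMP-dc:Subject+={heading}" for heading in subjects]
    args.append(image_path)
    return args


def tag_images(assignments, dry_run=True):
    """Apply tags to many images; with dry_run, only print the commands."""
    for image_path, people in assignments.items():
        args = build_exiftool_args(image_path, people)
        if dry_run:
            print(" ".join(args))
        else:
            subprocess.run(args, check=True)


# Dry run: shows what would be executed without touching any file.
tag_images({"myimage.tif": ["Doe, John"]})
```

By default ExifTool keeps a copy of each modified file with an _original suffix, which gives a fallback while testing on a small sample.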
ExifTool is useful for reading metadata
• Exif stores excellent technical metadata, so it’s nuts to hand key this into other systems
• Usage is dead simple
exiftool filename (Labeled display)
exiftool -X filename (XML)
exiftool -T filename (Tab delimited)
• Many powerful options
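The tab-delimited output is the easiest form to load into another system. A minimal sketch that requests a few named tags with -T and turns each output line into a record; the tag list and function names are illustrative, and the sketch assumes exiftool is on the PATH. With -T, ExifTool prints "-" for a tag a file lacks, which the parser maps to None:

```python
import subprocess

# Illustrative choice of tags; any ExifTool tag names can be listed here.
TAGS = ["FileName", "ImageWidth", "ImageHeight", "Software"]


def build_read_args(path, tags=TAGS):
    """Build an exiftool call that prints one tab-delimited row per file."""
    return ["exiftool", "-T"] + [f"-{tag}" for tag in tags] + [path]


def parse_table(output, tags=TAGS):
    """Turn tab-delimited exiftool output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        values = line.split("\t")
        # "-" marks a missing tag in -T output.
        rows.append({t: (None if v == "-" else v) for t, v in zip(tags, values)})
    return rows


def read_metadata(path):
    """Run exiftool on a file or directory and return parsed rows."""
    result = subprocess.run(
        build_read_args(path), capture_output=True, text=True, check=True
    )
    return parse_table(result.stdout)
```

A tag whose real value is a single hyphen cannot be told apart from a missing one in this form. The XML output from -X avoids that ambiguity.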
You need 3 image metadata standards
• Exif for technical metadata
• IPTC for many descriptive fields
• XMP for specialized information needed by
archivists and librarians
A glimpse into the future
• Social metadata
• Union catalogs contain better metadata than
local catalogs
• Create richer and more accurate metadata
much faster and cheaper than is otherwise
possible
Before going nuts on your photos…
• Picasa can mess up existing metadata if you let it write tags (facial recognition doesn’t use tags)
• You can create new tags, but don’t expect other software to read them
• Facial recognition is a handy tool, but don’t use it as a crutch
• Always test before performing batch metadata modifications or you may wind up blasting out existing metadata
Takeaways from this presentation
1. Facial recognition is easy with Picasa
2. ExifTool is incredibly useful for reading and writing image metadata
3. Learning to use embedded metadata is easy and makes too much sense not to do
Thank You!
Kyle Banerjee
banerjek@ohsu.edu
