Batch metadata assignment to archival photograph collections using facial recognition software
1. Batch metadata assignment to archival photograph collections using facial recognition software
Kyle Banerjee
banerjek@ohsu.edu
2. Why should anyone care?
Current methods for assigning metadata are:
• Slow
• Difficult
• Error-prone
• Incomplete
Filing code stencil cards at the W. Atlas Burpee Company
Library of Congress Prints and Photographs Division
3. A few challenges
• Libraries and archives use external
systems to maintain metadata
• Archival images are huge and clunky to
work with
• Metadata standards for image files are
implemented inconsistently and weren’t
designed with library needs in mind
4. Automation
• Process in bulk
• Use metadata embedded
within the image
• Use the file system
• Use consumer grade software as a force multiplier
• Improve search engine visibility and simplify
migrations
Fran Bilas Spence and Jean Jennings Bartik work on ENIAC
ARL Technical Library
5. What you need to get started
• A computer with the operating system of
your choice
• Modest scripting ability in any language
(mad programming skilz not required)
6. Image metadata demystified
$ head lovejoy-moskovetz_1923.tif
II▒▒▒@d▒▒F▒(1▒2▒▒ ▒▒]BI▒▒ ▒Ci▒Black and white photograph of Esther Pohl
Lovejoy and Doctors Elliot and Moskovetz in Athens in 1923.▒▒['▒▒['Adobe Photoshop
CS2 Windows2012:04:10 14:16:16<?xpacket begin=""
id="W5M0MpCehiHzreSzNTczkc9d"?>
[a few lines deleted here]
<rdf:Description rdf:about=""
xmlns:tiff="http://ns.adobe.com/tiff/1.0/">
<tiff:ImageWidth>6046</tiff:ImageWidth>
<tiff:ImageLength>4880</tiff:ImageLength>
[a few more lines deleted]
<dc:subject>
<rdf:Bag>
<rdf:li>Lovejoy</rdf:li>
<rdf:li>Moskovetz</rdf:li>
</rdf:Bag>
</dc:subject>
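The dump above shows that the XMP packet is plain XML sitting inside the binary file, so a script can pull it out without any imaging library. The following is a minimal sketch, not a robust parser: it assumes the packet contains a single `rdf:RDF` element encoded as UTF-8, and the function name `extract_dc_subjects` and the `SAMPLE` bytes are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_dc_subjects(data):
    """Return the dc:subject values found in the first rdf:RDF block of `data` (bytes)."""
    match = re.search(rb"<rdf:RDF\b.*?</rdf:RDF>", data, re.DOTALL)
    if match is None:
        return []  # no embedded XMP found
    root = ET.fromstring(match.group(0))
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [item.text or "" for item in items]


# Hypothetical stand-in for an image file: binary bytes around an XMP fragment.
SAMPLE = (
    b"II*\x00 binary image data "
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:subject><rdf:Bag>"
    b"<rdf:li>Lovejoy</rdf:li><rdf:li>Moskovetz</rdf:li>"
    b"</rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF> more binary data"
)

print(extract_dc_subjects(SAMPLE))  # prints ['Lovejoy', 'Moskovetz']
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. Archival TIFFs are large, so a dedicated tool such as Exiftool is usually the better choice for routine work.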
7. Facial recognition
• People are an important access point
• Provides authority control by nature
• Identification of individuals helps
determine other details
Facial recognition primer
WPI Transformations
• Extraction of faces simplifies manual
identification
• Non-specialist staff can do more metadata work
8. Useful software
• Free Picasa software works
great
• Stores person info in a combination of
contacts.xml and .picasa.ini files
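To reuse the people Picasa has identified, a script has to join the two files. The sketch below assumes a layout the slides do not spell out: each `.picasa.ini` has one section per image file with a `faces=` entry of the form `rect64(<hex>),<contact id>` (several faces separated by semicolons), and `contacts.xml` holds `<contact id="..." name="..."/>` elements. Check this against your own Picasa files first. The function names and sample values are hypothetical.

```python
import configparser
import re
import xml.etree.ElementTree as ET

# Assumed face entry format: rect64(<hex rectangle>),<hex contact id>
FACE_RE = re.compile(r"rect64\(([0-9a-fA-F]+)\),([0-9a-fA-F]+)")


def load_contacts(contacts_xml):
    """Map contact id -> name from the text of a contacts.xml file."""
    root = ET.fromstring(contacts_xml)
    return {c.get("id"): c.get("name") for c in root.iter("contact")}


def people_by_image(picasa_ini, contacts):
    """Map image file name -> list of names, from the text of a .picasa.ini file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case as written
    parser.read_string(picasa_ini)
    result = {}
    for section in parser.sections():
        faces = parser.get(section, "faces", fallback="")
        names = [contacts[cid] for _, cid in FACE_RE.findall(faces) if cid in contacts]
        if names:
            result[section] = names
    return result


# Hypothetical sample data in the assumed layout.
SAMPLE_CONTACTS = (
    "<contacts>"
    '<contact id="8e62398ebda8c1a5" name="Lovejoy, Esther Pohl"/>'
    "</contacts>"
)
SAMPLE_INI = (
    "[lovejoy-moskovetz_1923.tif]\n"
    "faces=rect64(3f845bcb59418507),8e62398ebda8c1a5\n"
)

print(people_by_image(SAMPLE_INI, load_contacts(SAMPLE_CONTACTS)))
```

The resulting mapping of file name to names is the input a batch tagging step needs.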
13. Adding metadata en masse
• Exiftool (available for all platforms) is incredibly
handy
exiftool -XMP-dc:Subject+='My new heading' myimage.tif
exiftool -XMP-iptcExt:PersonInImage+='Doe, John' myimage.tif
• Notice the Dublin Core subject tag
• DC doesn’t define people explicitly as subjects
so we used IPTC extensions here
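To apply the two commands above across a collection, a short script can generate one Exiftool call per image. This is a sketch under stated assumptions: `build_exiftool_command` and `tag_images` are illustrative names, the input is a mapping of file path to names that you have already assembled, and `exiftool` is on the PATH. By default the script only returns the commands so they can be reviewed before anything is written.

```python
import shutil
import subprocess


def build_exiftool_command(image_path, people=(), subjects=()):
    """Build the argument list for one exiftool call (nothing is executed here)."""
    args = ["exiftool"]
    args += [f"-XMP-iptcExt:PersonInImage+={name}" for name in people]
    args += [f"-XMP-dc:Subject+={heading}" for heading in subjects]
    args.append(str(image_path))
    return args


def tag_images(assignments, dry_run=True):
    """assignments maps image path -> list of names. Returns the commands built."""
    commands = [
        build_exiftool_command(path, people=names)
        for path, names in assignments.items()
    ]
    if not dry_run:
        if shutil.which("exiftool") is None:
            raise RuntimeError("exiftool not found on PATH")
        for command in commands:
            # Passing a list avoids shell quoting problems with names like "Doe, John".
            subprocess.run(command, check=True)
    return commands


for command in tag_images({"myimage.tif": ["Doe, John"]}):
    print(command)
```

Call `tag_images(assignments, dry_run=False)` only after checking the printed commands against a few test copies.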
15. Exiftool is useful for reading
metadata
• Exif stores excellent technical metadata so
it’s nuts to hand key this into other systems
• Usage is brain dead
exiftool filename (Labeled display)
exiftool -X filename (XML)
exiftool -T filename (Tab delimited)
• Many powerful options
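The tab-delimited output is the easiest to load into another system. The sketch below assumes you ask for specific tags (for example `exiftool -T -FileName -ImageWidth -ImageHeight somefolder`), so that each output line has one value per requested tag in the same order. The helper names and the `TAGS` list are illustrative choices.

```python
import csv
import io

# Tags requested from exiftool, in the order their values appear on each line.
TAGS = ["FileName", "ImageWidth", "ImageHeight"]


def table_command(tags, target):
    """Build the exiftool -T argument list for the given tags and file or folder."""
    return ["exiftool", "-T"] + [f"-{tag}" for tag in tags] + [str(target)]


def parse_table_output(text, tags):
    """Turn tab-delimited exiftool output into one dict per image."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(tags, row)) for row in reader if row]


# Hypothetical captured output for one image.
sample_output = "lovejoy-moskovetz_1923.tif\t6046\t4880\n"
for record in parse_table_output(sample_output, TAGS):
    print(record)
```

In practice `sample_output` would be the captured standard output of running `table_command(TAGS, folder)` with `subprocess.run(..., capture_output=True, text=True)`.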
16. You need 3 image metadata
standards
• Exif for technical metadata
• IPTC for many descriptive fields
• XMP for specialized information needed by
archivists and librarians
17. A glimpse into the future
• Social metadata
• Union catalogs contain better metadata than
local catalogs
• Create richer and more accurate metadata
much faster and cheaper than is otherwise
possible
19. Before going nuts on your photos…
• Picasa can mess up existing metadata if you let it
write tags (facial recognition doesn’t use tags)
• You can create new tags, but don’t expect other
software to read them
• Facial recognition is a handy tool, but don’t use it
as a crutch
• Always test before performing batch metadata
modifications or you may wind up blasting out
existing metadata
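One simple way to test first is to run the batch job against scratch copies of a few images and inspect the results before touching the originals. This is a minimal sketch: `stage_test_copies` is a hypothetical helper, and the sample size of five is an arbitrary choice.

```python
import shutil
import tempfile
from pathlib import Path


def stage_test_copies(image_paths, sample_size=5):
    """Copy the first few images into a new scratch folder and return the copies."""
    scratch = Path(tempfile.mkdtemp(prefix="metadata-test-"))
    copies = []
    for source in list(image_paths)[:sample_size]:
        destination = scratch / Path(source).name
        shutil.copy2(source, destination)  # copy2 also preserves file timestamps
        copies.append(destination)
    return copies
```

Run the metadata commands against the returned copies, read the tags back with `exiftool filename`, and compare them with the originals before modifying the real collection.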
20. Takeaways from this presentation
1. Facial recognition is easy with Picasa
2. Exiftool is incredibly useful for reading
and writing image metadata
3. Learning to use embedded metadata is
easy and makes too much sense not to
do