CREATING A DATABASE OF
SHIP CITATIONS:
THE CHALLENGES ENCOUNTERED
IN SHIPINDEX.ORG
The Charleston Conference, 3 Nov 2010
Peter McCracken
Co-Founder & Director of Content
and Business Development,
ShipIndex.org
What kinds of ships are these?
Bark (or barque); Ship; Brigantine; Barquentine; Topsail Schooner; Schooner
Serials :: Ships
 Publication pattern (or format?) :: Vessel type
 Serial title :: Ship name
 ISSN :: IMO
  Ship research :: Any other historical research
Ships :: Other historical research
 Problems with ships are the same as problems
with personal names, geographic descriptors,
etc.
 Can also apply to concepts, as well as things
 Also ‘non-unique’ items, like a car model
Data challenges – personal names
 Innumerable works by “Anonymous”
 Names are often shortened
 Pablo Picasso’s full name was Pablo Diego José
Francisco de Paula Juan Nepomuceno María de
los Remedios Cipriano de la Santísima Trinidad
Ruiz y Picasso
 Names have strange limitations
 Some must be unique – Consider Michael J. Fox
 Some are very common – Consider Adam Smith
Data challenges – geographic
names
 Numerous variations: Köln; Cologne; Keulen;
Colonia; Colònia; Kolín nad Rýnem; Cwlen;
Κολωνία; Kolonjo; كولونيا; Кьолн; Ķelne; Кёльн
 Name changes
 Hot Springs, NM -> Truth or Consequences, NM
 Halfway, OR -> Half.com, OR
 Clark, TX -> DISH, TX
 St. Petersburg -> Petrograd -> Leningrad ->
St. Petersburg (“Petersburg,” or “Piter”)
A “meaning-less” identifier
 Regardless of the topic, some meaning-less
identifier can provide significant assistance
 “Meaning-less” in the sense of a one-to-many
relationship between the identifier and the
data
 The identifier doesn’t change, but the data can
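A minimal sketch of this idea, using the St. Petersburg example from the previous slide: one opaque identifier stays fixed while a list of dated names hangs off it. The identifier value `place-0001`, the `NAMES` table, and the `name_in` helper are hypothetical and not part of ShipIndex.org; the years are the commonly cited dates of the renamings.

```python
from typing import Optional

# One opaque identifier -> many (name, first_year, last_year) records.
# "place-0001" is invented for illustration and deliberately carries no meaning.
NAMES = {
    "place-0001": [
        ("St. Petersburg", 1703, 1914),
        ("Petrograd", 1914, 1924),
        ("Leningrad", 1924, 1991),
        ("St. Petersburg", 1991, None),  # None means the name is still in use
    ],
}


def name_in(identifier: str, year: int) -> Optional[str]:
    """Return the name attached to the identifier in a given year, if any.

    A renaming year is assigned to the newer name.
    """
    for name, start, end in NAMES.get(identifier, []):
        if start <= year and (end is None or year < end):
            return name
    return None
```

A citation that stores `place-0001` stays correct through every renaming; only the name table needs to be updated.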
Overview of ShipIndex.org
 A database of citations –
 >1.42 million citations, from >200 resources
 >140,000 citations are freely available
 Changes how one does maritime research
 Far more content can be researched more quickly
 Opens up maritime research to everyone
 No need for inside knowledge on where to start
searching
 Uncovers many hidden resources
 Locates free, but hidden, web resources
Maritime access points
 Vessel name
 Vessel number
 IMO numbers are new; hull numbers change
 Captain name
 Captains change between voyages, and sometimes die
during them
 Rig or vessel type
 Ships are rebuilt; definitions change; “ship” is itself a rig type
 ALSO: Port of registration; crew members;
others
Vessel names – this is easy!
 “What does the
stern say?”
1872, American Lloyd’s Register of American and Foreign
Shipping
1867, American Lloyd’s Register of American and Foreign
Shipping
Sources of errors – primary sources
 Mistakes in primary sources are very common,
and forgivable
 Digitized version of Lloyd’s List of 1812
Ships called “Adolph & Fredericka”
Sources of errors – transcribers,
indexers, OCR operators, etc.
 Transcription errors are very easy to make –
whether through incorrect assumptions, or
just mistakes
 “Earnets” for “Earnest”; “Elizaneth” for
“Elizabeth”, etc.
 Some files are much tougher to manage than
others
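One way such slips could be flagged for review is approximate string matching against a list of names already in the index. The sketch below uses Python's standard `difflib`; the `KNOWN_NAMES` list, the `suggest_name` helper, and the 0.8 cutoff are assumptions made for illustration, not a description of how ShipIndex.org handles transcription errors.

```python
import difflib
from typing import Optional

# A tiny stand-in for the set of vessel names already in the index (assumed).
KNOWN_NAMES = ["Earnest", "Elizabeth", "Mary", "Maria", "Union"]


def suggest_name(transcribed: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(transcribed, KNOWN_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A suggestion is only a hint for a human editor. Genuinely different names can be just as close as a typo ("Mary" and "Maria", for instance), so automatic merging would be risky.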
More challenges
 How do we locate Elizabeth? Or Mary?
 Elizabeth = 1899 citations
 Mary = 2614 citations
 Top ten ship names, for no good reason: Mary, Maria,
Elizabeth, Anna, Union, Victoria, Hope, Flora,
Emma, America
 Try to limit result sets?
 by time period
 by vessel rig (maybe?)
 by location(?)
 by nationality
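A rough sketch of what limiting a result set might look like, assuming each citation carries optional year, rig, location, and nationality fields. The `Citation` class, the `limit_results` function, and the sample records are invented for illustration. Here a citation with an unknown value is dropped whenever that filter is applied, which is a design choice rather than a given.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Citation:
    ship_name: str
    source: str
    year: Optional[int] = None
    rig: Optional[str] = None
    location: Optional[str] = None
    nationality: Optional[str] = None


def limit_results(
    citations: List[Citation],
    ship_name: str,
    years: Optional[Tuple[int, int]] = None,
    rig: Optional[str] = None,
    location: Optional[str] = None,
    nationality: Optional[str] = None,
) -> List[Citation]:
    """Keep citations for ship_name that satisfy every filter that was supplied."""

    def keep(c: Citation) -> bool:
        if c.ship_name != ship_name:
            return False
        if years is not None and (c.year is None or not years[0] <= c.year <= years[1]):
            return False
        if rig is not None and c.rig != rig:
            return False
        if location is not None and c.location != location:
            return False
        if nationality is not None and c.nationality != nationality:
            return False
        return True

    return [c for c in citations if keep(c)]


# Invented sample records, not real ShipIndex.org data.
SAMPLE = [
    Citation("Mary", "Register A", year=1867, rig="Bark", nationality="American"),
    Citation("Mary", "Register B", year=1872, rig="Schooner", nationality="British"),
    Citation("Mary", "Index C"),
    Citation("Elizabeth", "Register A", year=1867, rig="Ship", nationality="American"),
]
```

For example, `limit_results(SAMPLE, "Mary", years=(1860, 1870))` keeps only the "Register A" record. Many historical citations lack this metadata, so a real interface might instead show unknowns in a separate group.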
Changing vessel names
 What do we do when a vessel changes its
name?
 A person researching a vessel wants to know the
life of a ship; at present they need to know its
previous or subsequent names
 This can only be done when we have unique
vessel identifiers – otherwise, how do you know
which Elizabeth became Hogwarts Belle?
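To illustrate why the identifier matters here, the following sketch keys each vessel's chronological name list by a unique identifier. The identifiers, the `NAME_HISTORY` table, and both helper functions are made up for this example.

```python
from typing import Dict, List

# identifier -> names in chronological order (all values invented).
NAME_HISTORY: Dict[str, List[str]] = {
    "000-00001-2": ["Elizabeth", "Hogwarts Belle"],
    "000-00002-4": ["Elizabeth"],
    "000-00003-6": ["Mary", "Elizabeth"],
}


def vessels_named(name: str) -> List[str]:
    """All identifiers of vessels that carried this name at any time."""
    return sorted(vid for vid, names in NAME_HISTORY.items() if name in names)


def which_became(old_name: str, new_name: str) -> List[str]:
    """Identifiers of vessels whose name changed directly from old_name to new_name."""
    found = []
    for vid, names in NAME_HISTORY.items():
        # Compare each name with the one that immediately followed it.
        if any(a == old_name and b == new_name for a, b in zip(names, names[1:])):
            found.append(vid)
    return sorted(found)
```

Searching by name alone returns three candidate vessels called Elizabeth; only the identifier singles out the one that became Hogwarts Belle, and from there the full life of the ship can be followed.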
Existing vessel identifiers
 Hull Identification Number – Only US; any
powered boat
 USCG Documentation Number – Only US; >5
net tons
 IMO Number – Assigned by Lloyd’s/Fairplay;
international; passenger ships >100 gross
tons, and cargo ships >300 gross tons;
mandatory from 1996
 Naval Identifiers – e.g., PT-109, CV-42, BB-18,
DD-793, D118, etc.
 Lloyd’s numbers, and many more…
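As an aside on how an existing identifier protects itself against typing errors: an IMO number is seven digits, and the last digit is a check digit computed from the first six (each multiplied by weights 7 down to 2, summed, and reduced modulo 10). The slides do not discuss this; the sketch below, with its hypothetical function names, only shows the arithmetic.

```python
def imo_check_digit(first_six: str) -> int:
    """Weighted sum of the first six digits (weights 7, 6, 5, 4, 3, 2), modulo 10."""
    weights = range(7, 1, -1)
    return sum(int(d) * w for d, w in zip(first_six, weights)) % 10


def is_valid_imo(number: str) -> bool:
    """True if the string is seven ASCII digits and the last one is the check digit."""
    if len(number) != 7 or not (number.isascii() and number.isdigit()):
        return False
    return imo_check_digit(number[:6]) == int(number[6])
```

For 9074729 the weighted sum is 63 + 0 + 35 + 16 + 21 + 4 = 139, so the check digit is 9 and the number validates.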
Unique historical vessel identifiers
 Need an easy way to differentiate between
“Mary,” “Mary,” and “Mary”
 Needs to be unique and unchanging (unlike
name, naval identifier, etc.)
 Identifier itself has no meaning – no
indication within it of size, nationality, etc.
 Identifier is quickly & automatically assigned
 Identification is coordinated with multiple
organizations
Creating an identifier
 Could be done through a standards-creation
process, via NISO or another organization
 Or informally, with publicly-defined
guidelines, such as (just as examples):
 Nine-digit number; ddd-ddddd-c (c=check digit)
 Allow individuals to easily request identifiers for
their vessels or their citations
 Need ability to easily combine/split/modify
 User-managed is likely the most cost-effective solution
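The ddd-ddddd-c pattern can be sketched as follows. The slide gives only the layout, not the check-digit algorithm, so the weighted sum used here (weights 9 down to 2, modulo 10, modelled loosely on the IMO number scheme) is purely an assumption, as are the function names.

```python
def check_digit(payload: str) -> int:
    """Assumed scheme: weight the eight payload digits 9..2, sum, take modulo 10."""
    if len(payload) != 8 or not (payload.isascii() and payload.isdigit()):
        raise ValueError("payload must be exactly eight digits")
    return sum(int(d) * w for d, w in zip(payload, range(9, 1, -1))) % 10


def format_identifier(serial: int) -> str:
    """Turn a sequential number into ddd-ddddd-c; the serial itself carries no meaning."""
    if not 0 <= serial < 10**8:
        raise ValueError("serial must fit in eight digits")
    payload = f"{serial:08d}"
    return f"{payload[:3]}-{payload[3:]}-{check_digit(payload)}"


def is_valid(identifier: str) -> bool:
    """Check the ddd-ddddd-c layout and recompute the check digit."""
    parts = identifier.split("-")
    if [len(p) for p in parts] != [3, 5, 1]:
        return False
    joined = "".join(parts)
    if not (joined.isascii() and joined.isdigit()):
        return False
    return check_digit(joined[:8]) == int(joined[8])
```

Assigning serials sequentially keeps the identifier free of meaning, and the check digit lets a form reject many mistyped identifiers before any lookup. A simple weighted sum like this does not catch every error, so a formal standard might choose a stronger scheme.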
Creating an identifier
 Must have buy-in from many groups
 Should be easy to implement
 Should be easy to use; available to many
individuals and resources
 Pre-populate as much as possible, open
editing to all
 Maintain advisory group to address concerns,
disagreements, etc.
Defining <ShipIdentifier>
<OtherIdentifiers>
<IdentifierType>
<IdentifierNumber>
<ShipName>
<DateNameStartedInUse>
<DateNameEndedInUse>
<PreviousShipName>
<SubsequentShipName>
<RigType> - defined list of types, & “other”
<VoyageIdentifier> - multiple
More <ShipIdentifier>
<MilitaryUsage?> - yes/no/unclear
<Nationality>
<ServiceBranch>
<HullIdentifier>
<VesselMeasurements>
<MeasurementType> - list of options
<MeasurementValue>
Defining <VoyageIdentifier>
<ShipIdentifier>
<Captain>
<Crew> - multiple positions, multiple names
<CrewPosition>
<CrewmemberName>
<OtherVoyageIdentifiers>
<OtherVoyageDatabase>
<OtherVoyageDbId>
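The element lists on the last three slides can be read as a small data model. The sketch below renders them as Python dataclasses; the nesting, the field types, and the two helper methods are one interpretation of the slides, not a published schema. Previous and subsequent names are derived from an ordered name list rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OtherIdentifier:           # <IdentifierType>, <IdentifierNumber>
    identifier_type: str
    identifier_number: str


@dataclass
class ShipName:                  # <ShipName> with its dates of use
    name: str
    date_started_in_use: Optional[str] = None
    date_ended_in_use: Optional[str] = None


@dataclass
class Measurement:               # <MeasurementType>, <MeasurementValue>
    measurement_type: str
    measurement_value: float


@dataclass
class CrewMember:                # <CrewPosition>, <CrewmemberName>
    position: str
    name: str


@dataclass
class Voyage:                    # <VoyageIdentifier>
    voyage_identifier: str
    ship_identifier: str
    captain: Optional[str] = None
    crew: List[CrewMember] = field(default_factory=list)
    # (database, id) pairs: <OtherVoyageDatabase>, <OtherVoyageDbId>
    other_voyage_identifiers: List[OtherIdentifier] = field(default_factory=list)


@dataclass
class Ship:                      # <ShipIdentifier>
    ship_identifier: str
    names: List[ShipName] = field(default_factory=list)  # chronological order
    other_identifiers: List[OtherIdentifier] = field(default_factory=list)
    rig_type: Optional[str] = None
    military_usage: str = "unclear"                      # yes / no / unclear
    nationality: Optional[str] = None
    service_branch: Optional[str] = None
    hull_identifier: Optional[str] = None
    measurements: List[Measurement] = field(default_factory=list)
    voyage_identifiers: List[str] = field(default_factory=list)

    def _index(self, name: str) -> int:
        return [n.name for n in self.names].index(name)

    def previous_name(self, name: str) -> Optional[str]:
        """<PreviousShipName>: the name used immediately before the given one."""
        i = self._index(name)
        return self.names[i - 1].name if i > 0 else None

    def subsequent_name(self, name: str) -> Optional[str]:
        """<SubsequentShipName>: the name used immediately after the given one."""
        i = self._index(name)
        return self.names[i + 1].name if i + 1 < len(self.names) else None
```

A voyage points back to its ship through `ship_identifier`, so crew and captain data stay attached to the voyage while the ship record stays stable across name changes.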
Expanding to other fields
 Makes discovery more manageable
 Makes linking possible
 Use the same concept for other areas of
research, linking everything together
 People
 Places
 Manufactured items
 Artwork
 Everything
Thoughts, questions, more?
Thank you –
Peter McCracken
peter@shipindex.org

Editor's Notes

  1. German; English; Spanish; Afrikaans; Catalan; Welsh; Greek; Esperanto; Arabic; Bulgarian; Latvian; Russian
  2. “organization, however, can be a challenge” – how we are applying it to ships can be used in nearly any other area, I believe
  3. IMO mandatory from Jan 1996; for propelled ocean-going ships of >100GT
  4. CV-42: Franklin D. Roosevelt; BB-18: USS Connecticut; DD-793: USS Cassin Young; D118: HMS Coventry