I've written before about ships in WorldCat Identities, but never tried to find all the ships for which there are Identity pages. Recently Peter McCracken suggested that ShipIndex.org might link to appropriate Identities pages (and vice versa), so we went ahead and tried to locate all the ships in WorldCat.
Not surprisingly, identifying all the ships took a little work. The task reminds me of trying to find fictitious characters in WorldCat, and we used a similar approach, looking at the parenthetical qualifier that describes the name. For instance, the Confederate sloop Alabama is controlled as Alabama (Screw sloop). The trouble is that there are quite a few different types of ships and boats, from the Albatross (Steamer) to the Zingarella (Bark). Here is the basic list we are currently using:
- barge
- bark
- barque
- boat
- brig
- carrier
- catamaran
- corvette
- cruiser
- destroyer
- dredge
- freighter
- frigate
- galleon
- ironclad
- ketch
- packet
- schooner
- ship
- sloop
- steamer
- submarine
- sweeper
- tender
- trawler
- trimaran
- tug
- vessel
- whaler
- yacht
I'm sure there are more words that describe ships, so please let me know if you notice anything missing.
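As a rough illustration of the qualifier approach, here is a minimal sketch that pulls the trailing parenthetical qualifier out of a heading and checks it against the word list above. The helper names (`QUALIFIER_RE`, `qualifier`, `ship_words_in`) are hypothetical, and the sketch assumes the qualifier is the last parenthesized group in the heading. It matches whole words only, which is not what the regular expressions later in this post do.

```python
import re

# The basic list of ship-type words from this post.
SHIP_WORDS = [
    "barge", "bark", "barque", "boat", "brig", "carrier", "catamaran",
    "corvette", "cruiser", "destroyer", "dredge", "freighter", "frigate",
    "galleon", "ironclad", "ketch", "packet", "schooner", "ship", "sloop",
    "steamer", "submarine", "sweeper", "tender", "trawler", "trimaran",
    "tug", "vessel", "whaler", "yacht",
]

# Hypothetical helper: assume the qualifier is the text inside the last
# pair of parentheses at the end of the heading.
QUALIFIER_RE = re.compile(r"\(([^()]*)\)\s*$")


def qualifier(heading):
    """Return the trailing parenthetical qualifier, or None if there is none."""
    m = QUALIFIER_RE.search(heading)
    return m.group(1) if m else None


def ship_words_in(heading):
    """Return the ship-type words that appear as whole words in the qualifier."""
    q = qualifier(heading)
    if q is None:
        return []
    words = set(re.findall(r"[a-z]+", q.lower()))
    return [w for w in SHIP_WORDS if w in words]


print(ship_words_in("Alabama (Screw sloop)"))
```

Whole-word matching avoids false hits such as "(Airship)" or "(Township)", but it also misses compounds like "(Steamship)" and plurals like "(Barges)". Substring patterns combined with an explicit skip list, as used below, catch those at the cost of needing the skips.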
Of course, you have to be careful not to include airships or boat seaplanes or ship replicas or townships or Steamboat Springs, so I also have a list of patterns to skip, and then a list of overrides for the skips.
Altogether I found just under 40,000 Identity pages for ships in WorldCat Identities. About 6,700 of them have an LC/NACO authority record associated with the name. Approximately 11,000 have two or more citations, and 6,000 have three or more.
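Tallies like these can be computed in a single pass over the matched identities. The sketch below assumes a hypothetical record shape (`ShipIdentity` with `has_naco` and `citations` fields); the actual Identities data structures are not described here, and the sample records are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ShipIdentity:
    # Hypothetical record shape, assumed for this sketch.
    name: str
    has_naco: bool  # whether an LC/NACO authority record is linked
    citations: int  # number of citations for the identity


def summarize(identities):
    """Count identities overall, with LC/NACO records, and by citation threshold."""
    return {
        "total": len(identities),
        "with_naco": sum(1 for i in identities if i.has_naco),
        "two_plus_citations": sum(1 for i in identities if i.citations >= 2),
        "three_plus_citations": sum(1 for i in identities if i.citations >= 3),
    }


# Invented sample records, not real Identities data.
sample = [
    ShipIdentity("Alabama (Screw sloop)", True, 5),
    ShipIdentity("Albatross (Steamer)", False, 2),
    ShipIdentity("Zingarella (Bark)", False, 1),
]
print(summarize(sample))
```

The citation thresholds nest: every identity with three or more citations is also counted among those with two or more. That is why the second figure can never exceed the first.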
How successful we will be in linking the databases remains to be seen, but it was interesting to try to find all the ships.
Here are the regular expressions (in Python):
import re

# Always accept these (overrides for the skip patterns):
simpleRE = re.compile(r'\(battle.?ships?\)|\(.*cargo.?ships?\)|\(cruise.?ships?\)|\(motor.?ships?\)|\(passenger.?ships?\)|\(steam.?ships?\)|\(war.?ships?\)|\(ships?\)', re.IGNORECASE)
# Skip these:
skipRE = re.compile(r'\(airship\)|\(air.ship\)|\(.*boat seaplane.*\)|\(.*boat work.*\)|\(.*fellowship.*\)|\(.*friendship.*\)|\(.*govenership.*\)|\(.* key.*\)|\(.*partnership.*\)|\(.* replica.*\)|\(.*ship[a-z].*\)|\(.*springs.*\)|\(.*township.*\)|\(.*twonship.*\)|\(.*voivodeship.*\)', re.IGNORECASE)
# The main regular expression
shipsRE = re.compile(
r'\(.*barges?\b\)|\(.*bark\b\)|\(.*barkentine.*\)|\(.*barque.*\)|\(.*boat.*\)|\(.*brig\b\)|\(.*carrier.*\)|\(.*catamaran.*\)|\(.*corvette.*\)|\(.*cruiser.*\)|\(.*destroyer.*\)|\(.*dredge.*\)|\(.*freighter.*\)|\(.*frigate.*\)|\(.*galleon.*\)|\(.*ironclad.*\)|\(.*ketch.*\)|\(.*packet.*\)|\(.*schooner.*\)|\(.*ship.*\)|\(.*sloop.*\)|\(.*steamer.*\)|\(.*submarine.*\)|\(.*sweeper.*\)|\(.*tender.*\)|\(.*trawler.*\)|\(.*trimaran.*\)|\(.*tugs?\b\)|\(.*vessel.*\)|\(.*whaler.*\)|\(.*yacht.*\)', re.IGNORECASE)
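The three patterns still have to be combined into one decision per heading. One reading of the description (skips, then overrides for the skips) is: accept if an override matches, otherwise reject if a skip matches, otherwise accept if the main pattern matches. The sketch below assumes that precedence. The function name `is_ship` and the sample heading "Example (Ketch)" are hypothetical. The patterns are copied from above, using `\(.*ketch.*\)` (the form `\(.ketch.*\)` would not match a bare "(Ketch)") and `motor.?ships?`.

```python
import re

# Overrides: always accept these.
simpleRE = re.compile(r'\(battle.?ships?\)|\(.*cargo.?ships?\)|\(cruise.?ships?\)|\(motor.?ships?\)|\(passenger.?ships?\)|\(steam.?ships?\)|\(war.?ships?\)|\(ships?\)', re.IGNORECASE)

# Skips: reject these unless an override matched.
skipRE = re.compile(r'\(airship\)|\(air.ship\)|\(.*boat seaplane.*\)|\(.*boat work.*\)|\(.*fellowship.*\)|\(.*friendship.*\)|\(.*govenership.*\)|\(.* key.*\)|\(.*partnership.*\)|\(.* replica.*\)|\(.*ship[a-z].*\)|\(.*springs.*\)|\(.*township.*\)|\(.*twonship.*\)|\(.*voivodeship.*\)', re.IGNORECASE)

# Main pattern: any qualifier containing a ship-type word.
shipsRE = re.compile(
    r'\(.*barges?\b\)|\(.*bark\b\)|\(.*barkentine.*\)|\(.*barque.*\)|\(.*boat.*\)|\(.*brig\b\)|\(.*carrier.*\)|\(.*catamaran.*\)|\(.*corvette.*\)|\(.*cruiser.*\)|\(.*destroyer.*\)|\(.*dredge.*\)|\(.*freighter.*\)|\(.*frigate.*\)|\(.*galleon.*\)|\(.*ironclad.*\)|\(.*ketch.*\)|\(.*packet.*\)|\(.*schooner.*\)|\(.*ship.*\)|\(.*sloop.*\)|\(.*steamer.*\)|\(.*submarine.*\)|\(.*sweeper.*\)|\(.*tender.*\)|\(.*trawler.*\)|\(.*trimaran.*\)|\(.*tugs?\b\)|\(.*vessel.*\)|\(.*whaler.*\)|\(.*yacht.*\)', re.IGNORECASE)


def is_ship(heading):
    """Classify a heading, assuming overrides beat skips and skips beat the main pattern."""
    if simpleRE.search(heading):
        return True
    if skipRE.search(heading):
        return False
    return bool(shipsRE.search(heading))


for h in ["Alabama (Screw sloop)", "Hindenburg (Airship)", "Hanover (Pa. : Township)"]:
    print(h, is_ship(h))
```

The ordering matters for plurals: a qualifier such as "(Steamships)" is caught by the `\(.*ship[a-z].*\)` skip, which is intended for words like "shipyard", but the `\(steam.?ships?\)` override accepts it first.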
--Th