Islandora Online, 4 August 2020

Case Study on Archives Central

Presented by Jonathan Hunt (@kayakr)


Case Study on Archives Central

  • Archives Central (about), 9 councils
  • Roughly the 5th Islandora 8 site in the world
  • ICA Records in Contexts (Conceptual Model v0.2, ontology)
  • Migrated Records, Still Images, Documents from Kete
  • Migrated supporting data incl. trackable items


Catalyst Map

Records in Contexts

  • ICA Records in Contexts (Conceptual Model v0.2, ontology)
  • Agents: Corporate Body (RiC-E11), Family (RiC-E10), Person (RiC-E08) terms
  • Place (RiC-E22) terms
  • Accessions node
  • Record Sets (Series) (RiC-E03) node
  • Records (RiC-E04) node, Media, File
  • Containers and Locations node
  • also Rights, Newsletter, Basic Page...

Records in Contexts mapping

  • via rdf.mapping.node.*.yml, rdf.mapping.term.*.yml, etc.
  • RiC mapping is partial, e.g.
    • Not using Event (RiC-E14), Rule (RiC-E16), Mandate (RiC-E17), etc.
    • skip over Group (RiC-E09) to Family (RiC-E10), Corporate Body (RiC-E11)
    • Date is EDTF string, not class
    • relation in Islandora is from Instantiation (RiC-E06) media to Record node (RiC-R025i instantiates)
    • relations are not classes
    • Might use Record Part (RiC-E05) in future
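
The partial mapping above can be pictured as a lookup from Drupal entity type and bundle to a RiC entity code. The following is a minimal illustration in Python, not the rdf.mapping.*.yml files themselves; the bundle machine names (corporate_body, record_set, and so on) are assumptions, and only the RiC codes come from the slides.

```python
# RiC entity codes as listed on the slides. The bundle machine names
# are hypothetical; the real site may name them differently.
RIC_ENTITY = {
    ("taxonomy_term", "person"): "RiC-E08",
    ("taxonomy_term", "family"): "RiC-E10",
    ("taxonomy_term", "corporate_body"): "RiC-E11",
    ("taxonomy_term", "place"): "RiC-E22",
    ("node", "record_set"): "RiC-E03",
    ("node", "record"): "RiC-E04",
}


def ric_entity(entity_type, bundle):
    """Return the RiC entity code for a bundle, or None if the slides give none."""
    if entity_type == "media":
        # Media of any bundle is treated as an Instantiation (RiC-E06).
        return "RiC-E06"
    return RIC_ENTITY.get((entity_type, bundle))
```

Bundles for which the slides give no RiC code (for example Accessions) simply return None in this sketch.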


JSON-LD fragment

Location control

  • Location: Building, Room, Shelf, Row, etc.
  • Container: London Box, Envelope, etc.

Location model diagram
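
As a rough sketch of how typed locations and containers could relate, the Python below nests locations through a parent reference and places each container in one location. The parent/child nesting and all names and labels are assumptions for illustration; the diagram on the slide is the authoritative model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    kind: str                      # e.g. "Building", "Room", "Shelf", "Row"
    name: str
    parent: Optional["Location"] = None   # assumed nesting, not from the slides

    def path(self) -> str:
        """Walk up the parents and return the full path, outermost first."""
        parts = []
        node = self
        while node is not None:
            parts.append(f"{node.kind} {node.name}")
            node = node.parent
        return " > ".join(reversed(parts))


@dataclass(frozen=True)
class Container:
    kind: str                      # e.g. "London Box", "Envelope"
    label: str
    location: Location             # where the container currently sits


# Hypothetical example data.
building = Location("Building", "A")
room = Location("Room", "1", parent=building)
shelf = Location("Shelf", "3", parent=room)
box = Container("London Box", "LB-12", location=shelf)
```

Here box.location.path() gives "Building A > Room 1 > Shelf 3".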

Migration overview

  • 210,000 records including ~7,000 images and 600 PDF documents
  • Used Drupal Migrate framework
  • Sources: SQL (local snapshot of Kete MySQL db), CSV, embedded json

Migration sequence

  • User to User user
  • Kete basket to Group group
  • Kete notional to Basic Page node
  • License to Rights node + embedded_data
  • Agencies to Agents (corporate body) term
  • Accession to Accession node
  • Series to Record Set (Record Set Type=Series) node
  • Item to Record node
  • Still Image (jpeg), Document (pdf) to File file
  • Still Image, Document to Media media
  • Trackable Items to TISL table to Locations, Containers node
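
The sequence matters because later migrations look up entities created by earlier ones. A minimal Python sketch of deriving such an order from declared dependencies follows; the dependency edges and short names are assumptions inferred from the list above, not the project's actual migration_dependencies.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each migration maps to the set of migrations it is assumed to need first.
DEPENDS_ON = {
    "user": set(),
    "group": {"user"},
    "basic_page": {"user"},
    "rights": set(),
    "agent": set(),
    "accession": {"agent"},
    "record_set": {"accession", "agent"},
    "record": {"record_set", "rights"},
    "file": set(),
    "media": {"file", "record"},
    "location_container": {"record"},
}


def migration_order(depends_on):
    """Return one valid run order in which dependencies always come first."""
    return list(TopologicalSorter(depends_on).static_order())


order = migration_order(DEPENDS_ON)
```

Any order this returns is valid for the assumed edges; the sequence on the slide is one such order.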

Data challenges

  • Long titles (1,284 chars) -> Full title field
  • inconsistent values for the same meaning, e.g. Consult Archivist, See Archivist, Refer to Archivist -> OpenRefine text facets
  • inconsistent capitalisation, e.g. Ian Matheson City archives, Ian Matheson CIty Archives
  • Numerous legacy ids: tried a typed identifier field, ended up mapping them individually
  • modelling: coordinates, series no, box no fields on Record
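
Two of these clean-ups are easy to show in miniature. The Python sketch below is illustrative only (the project used OpenRefine and Drupal Migrate): the canonical labels are hypothetical choices, and 255 characters is assumed as the title limit because that is Drupal's default node title length.

```python
TITLE_LIMIT = 255  # assumed limit: Drupal's default node title length

# Variant spellings, keyed in lower case. The canonical labels on the
# right are hypothetical choices for this example.
CANONICAL = {
    "consult archivist": "Consult Archivist",
    "see archivist": "Consult Archivist",
    "refer to archivist": "Consult Archivist",
    "ian matheson city archives": "Ian Matheson City Archives",
}


def normalise(value):
    """Collapse whitespace and case, then map known variants to one label."""
    key = " ".join(value.split()).casefold()
    return CANONICAL.get(key, value)


def split_title(title, limit=TITLE_LIMIT):
    """Return (short_title, full_title); the short title fits the limit."""
    if len(title) <= limit:
        return title, title
    return title[: limit - 3].rstrip() + "...", title
```

With this, normalise("Ian Matheson CIty Archives") and normalise("Ian Matheson City archives") return the same label, and a 1,284-character title is shortened for the title field while the full text is kept for the Full title field.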

Migration implementation

  • sort source rows
  • inconsistent values mapped via a yml map: (e.g. the static_map process plugin) or hook_migrate_MIGRATION_ID_prepare_row()
  • onPostRowSave() event subscribers for redirect, group assignment
  • composite keys for locations, containers
  • ORDER BY CAST(id AS unsigned)
  • source:
      has_redirects: true
      batch_size: 100
      track_changes: true
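
Two points from this list, numeric ordering of ids and composite keys, can be shown with a small Python sketch. It is an illustration rather than the migration code, and the column names (building, room, shelf) are hypothetical.

```python
def sort_ids_numerically(ids):
    """Order string ids by numeric value, like ORDER BY CAST(id AS unsigned)."""
    return sorted(ids, key=int)


def composite_key(row, fields):
    """Build a tuple key from several columns of a source row."""
    return tuple(row[f] for f in fields)


ids = ["10", "9", "100", "2"]
lexical = sorted(ids)                 # string order: "10" sorts before "2"
numeric = sort_ids_numerically(ids)   # numeric order: 2, 9, 10, 100

# Hypothetical source rows; the first two describe the same location.
rows = [
    {"building": "A", "room": "1", "shelf": "3"},
    {"building": "A", "room": "1", "shelf": "3"},
    {"building": "A", "room": "2", "shelf": "3"},
]
unique_locations = {
    composite_key(r, ("building", "room", "shelf")) for r in rows
}
```

Sorting ids as strings would process row "10" before row "2", which is why the cast is needed when ids are stored as text. The composite key collapses the three rows into two distinct locations.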