| Publication Type | presentation |
| School or College | University Libraries |
| Department | Digital Library Services |
| Creator | Neatrour, Anna; McBride, Brian; Brunsvik, Matt; Maringanti, Harish; Myntti, Jeremy; Witkowski, Alan |
| Title | Supercharged Digital Collections: Moving to the fast lane with scalable open source infrastructure |
| Date | 2017-05-19 |
| Description | Presentation given at the Utah Library Association Conference, Sandy, UT. |
| Type | Text |
| Publisher | University of Utah |
| Subject | Digital libraries; Systems migration |
| Language | eng |
| Conference Title | Utah Library Association Conference |
| Rights Management | © Anna Neatrour, Brian McBride, Matt Brunsvik, Harish Maringanti, Jeremy Myntti, Alan Witkowski |
| Format Medium | application/pdf |
| ARK | ark:/87278/s68h2hhv |
| Setname | ir_uspace |
| ID | 1283557 |
| OCR Text | Slide 1: Supercharged Digital Collections: Moving to the fast lane with scalable open source infrastructure. Photos from University of Utah, Multimedia Archives Photo Collection, Gary Dean Brown
Slide 2: Presenters: Anna Neatrour, Matt Brunsvik, Brian McBride, Harish Maringanti. Other presenters/contributors: Jeremy Myntti, Alan Witkowski
Slide 3: What we'll cover today ▸ History ▸ SIMP demo ▸ Migration ▸ Lessons learned
Slide 4: History of UU Marriott Digital Library: 2001, launched CONTENTdm
Slide 5: History of UU Marriott Digital Library: 2002, launched Utah Digital Newspapers (UDN)
Slide 6: History of UU Marriott Digital Library: 2003, UDN receives IMLS National Leadership Grant (NLG)
Slide 7: History of UU Marriott Digital Library: 2010, UDN reaches the 1 million page milestone
Slide 8: History of UU Marriott Digital Library: 2012 • Purchased Rosetta for long-term archiving • SIMP development started
Slide 9: History of UU Marriott Digital Library: 2013 • SIMP launched • DAMS Review Committee report
Slide 10: History of UU Marriott Digital Library: 2015, upgraded server and storage infrastructure
Slide 11: History of UU Marriott Digital Library: 2016, UDN migrated (Jan-June); all other digital collections migrated (June-Dec)
Slide 12: Marriott Library Digital Collections ▸ 269 collections ▸ 717,858 IEs (items) ▸ 139,583 compound objects ▸ 12 file types (images, videos, PDFs, ...) ▸ 3 TB storage ▸ Over 50 partners (internal and external) hosting collections with the Marriott Library https://collections.lib.utah.edu/
Slide 13: Utah Digital Newspapers (UDN) ▸ ~1.926 million pages ▸ ~21.64 million articles ▸ ~138 newspaper titles ▸ Article-level newspaper data ▸ Many unique papers not in Chronicling America https://digitalnewspapers.org/
Slide 14: SIMP Tool for metadata management ▸ Initially designed to facilitate management between CONTENTdm and Rosetta, our preservation system ▸ Platform-agnostic and modular; can work with other DAMS and preservation systems ▸ Options for automatically extracting some metadata (format, OCR text) ▸ Updated in 2016 to support Solphal. More details are available in the D-Lib article: http://www.dlib.org/dlib/july14/neatrour/07neatrour.html
Slide 15: DAMS Review in 2013
Slide 16: DAMS Review in 2013
Slide 17: DAMS Review: requirements criteria
Patron ease of use: • End-user experience • Search options and accuracy (discoverability of the content) • Handle all formats in a similar fashion for all browsers • ADA compliance
Metadata administration: • Metadata creation, editing, and maintenance options • Faceting • Copyright/embargo • Search engine optimization (SEO)
Types of content supported: • Major content categories (e.g. IR, newspapers, EAD) • File types and datasets • Tiers of content (both object and item level) • Media streaming
Integration with other library platforms: • Discovery layer (e.g. Primo) • Digital preservation system (e.g. Rosetta) • Locally developed tools (e.g. SIMP Tool)
Issues related to ingest/conversion/exit: • Workflows for adding new content • Converting existing collection data • Barrier(s) to exit
Technical infrastructure/administration: • Scalability of collections/content • Hardware/server specs • API support • System and server installation, configuration, and upgrades • Internal resources required to run the system
Collection administration: • Interface design • User permissions • Website configuration • Statistics/reporting/logs
Technical support: • Training • Help desk
Future and strategic directions
Slide 18: DAMS Review outcome: Solphal
Slide 19: Hardware requirements, CONTENTdm vs. Solphal
CONTENTdm: • 2 servers (UDN and Digital Collections) • Digital Collections: 2x 12-core 3.0 GHz, 64 GB memory; UDN: 2x 8-core 2.4 GHz, 96 GB memory • Scalability issues • Limited customizations
Solphal: • 3 VMs (indexing services, UDN, and Digital Collections) • VMs average 2 cores, 3 GB memory • Extremely small footprint (hardware and power savings) • Highly scalable and sustainable • Fully customizable • Easy replication and cloud-centric
Slide 20: Benefits of Solphal • In-house expertise and support • Student, patron, and partner convenience: faster response times, advanced search functions, front-end metadata editing, customized interface • Substantial power, hardware, and licensing cost savings • Improved workflows and QA/QC for all teams • Staff time savings (~1 fewer FTE supporting servers) • Demonstrating and sharing open source solutions with other institutions
Slide 21: Indexing improvements
Utah Digital Newspapers: • 55 GB of metadata • 21,648,462 records • CONTENTdm: full re-index ~1440 hrs (site online), ~240 hrs (site offline) • Apache Solr indexing: initial index ~2.75 hrs, incremental index ~instant
Digital Collections: • 6 GB of metadata • 2,248,922 records • CONTENTdm: full re-index ~144 hrs (site online), ~72 hrs (site offline) • Apache Solr indexing: initial index ~25 min, incremental index ~instant
Slide 22: User interface improvements
Utah Digital Newspapers: • CONTENTdm: average page load time 10 s; PDF display issues • Solphal: average page load time under 500 ms
Digital Collections: • CONTENTdm: average page load time 5 s; PDF display issues • Solphal: custom HTML5 image viewer; average HTML response time under 50 ms; average complete page load time under 500 ms
Slide 23: Sample Solr ingest file
Slide 24: UDN migration vs. Digital Collections migration
Utah Digital Newspapers: • Newspaper metadata more standardized, but not perfect • Inconsistencies in title, date, paper name, and type fields • Clean-up of nonstandard characters and newspaper name issues • Previous architecture had one newspaper title split into multiple collections (scalability issues) • Consistent field names and metadata meant that migration happened sooner and faster • No external partners to contact
Digital Collections: • More communication needed with both internal and external partners during migration • Problem or missing metadata values in older collections not fixed (yet) • Field label standardization • Standardization of type and format • Lots of clean-up work still to do in the future
Slide 25: Metadata issues: corrections. Metadata standardization: establish the core fields needed for faceting and queries in the new DAMS, and change varying fields to the following: • Creator • Date • Subject • Rights • Spatial Coverage • Type • Title
Slide 26: Metadata fields gone wild!
About the Art About the Artist Abstract Accession number Accession Number Access restrictions Access Restrictions Access rights Accompanying Material Acoustic Zone Added Title Added title 2 Additional information Additional Information Additional Resources Additonal notes Address Affiliation Ages Alt # 1 Alt # 2 Alternate Name(s) Alternate title AlternateTitle Alternate Title Alternative formats available Alternative Formats Available Alternative title Amendments and updates Anatomy Annotation Archival file Archival Resolution ark ARK Article Title Article Title(s) Artist Artist/Manufacturer Artist's Notation Athlete's Name Audience Audio Clip Instructions Author Author/Editor Author/Editor Bio Author(s) AV Item Number Awards A-Z assignment Band Banner Batch Batch # Batch and Box # Batch number bibliographicCitation Bibliographic Citation Binding Birth Date Birthplace Bit depth Bit-depth Bit Depth Box and Folder Number browse order Building Business/building names B/W Illustrations Calendar Calendar Month Call Number Camp CA+P Accession No. 
CA+P Course Caption Caption on slide Cataloged by Cataloger Catalog Number Catalog Transcription Catalogue Catalogued By Category Cloth Discount Cloth ISBN Cloth Price CDM 5 URL Cohort Number Century Collection Children of Deceased Collection aliases citatation_issn Collection compiler Citation Collection Compiler citation_author Collection contact information citation_conference_title Collection information citation_date Collection Information citation_dissertation_institution Collection, Is part of citation_dissertation_name Collection Name citation_doi Collection Name and Number citation_firstpage Collection number citation_inbook_title Collection Number citation_isbn Collection number and name citation_issn Collection Number and Name citation_issue Collection Owning Institution citation_journal_title Collection type citation_keywords Color Illustrations citation_language Comment Citation_lastpage Comments citation_other_author Commissioned By citation_publisher Common Name citation_technical_report_institution Company citation_technical_report_numbe Company/Business citation_title Complete Catalog citation_volume Completion Date City Compressed file editing softwar Class Compressed File Extent Classes Compressed file operating syste Classification Compressed file quality Click here for an immersive experience! Compressed file size Click here for a 'page-flip' view Compression Click here for details Conference Click here to go to AFRC 2012 Winter Conference Materials Contact information CONTENTdm file name Click here to go to link CONTENTdm file name v.5 Click here to view photo set! CONTENTdm file path v.5 Click to Play CONTENTdm number Clinical CONTENTdm number v.5 Clinical Signs Context Context URL Contibuting Institution Continent
Contrib. Institution Contributing Institution Contributing.Institution Contributing Institutution Contributing Organization Contributor Contributor Primary Contributor Primary Contributor Secondary Conversion specifications ConversionSpecifications Conversion Specifications Copy ID Copyright Copyright Date Copyright notice Corporate name Corporate Name Country County County Name Coverage Coverage spatial Coverage-spatial Coverage Spatial Coverage-Spatial Coverage - Temporal Coverage-Temporal Covrage-spatial Creation Date Creation methodology Creation Methodology Creator Creator Dates Creator Nationality Creator of Media Recording Creator(s)/Author(s) Credit Line Culture Current address Current Address Curriculum Data Description Date Date created Date-Created Date created v.5 Date Delivered Date digital Date.digital Date Digital Date-Digital Date.Digital Date.Earliest Date Entered Date issued Date.Latest Date modified Date modified v.5 Date of photograph Date of Photograph Date.original Date Original Date-Original Date.Original Date Original DF Date or Period Date Scanned DC.citation.epage DC.citation.issue DC.citation.spage DC.citation.volume DC.relation.ispartof Death Date Deathplace Deceased Age Deceased Name Degree Degree Granting Institution Department Department email Department name Depth (3D Objects) Desciption Anecdotal Description Description2 Dewey Decimal Number Digital Archive Digital Collection Digital File Location Digital Format Digitization Specification Digitization specifications Digitization Specifications Digitization Specs Digitized by Digitizing Device DigitizingTechnician Dimensions Full NatureServe Report Full resolution Full Resolution 2 Full resolution folder path Full resolution v.5 Full test Full text Full Text Full text field Full Text Field Full Text Search Full Transcript Function Funding Funding/Fellowship Further Information Game Website Gender Genre Genre (AAT) Genre or Form Genus Geographical names Geographical Names Geographic coordinates
Geographic Coordinates Geographic Location Geo-locale Geopolitical place Get Media Global Positioning System Coordinat Grade Level Habitat Type Has Part Height, Framed Height Unframed Hidden Description Historic address Historic Address Historic place name Historic Place Name History Holding institution Holding Institution Holding.Institution Holding Location Hosting institution ID Identifier Image Capture
Slide 27: Variations in title field name
Slide 28: Controlled vocabularies
Type (DCMI Type vocabulary): • Collection • Dataset • Event • Image • Image/MovingImage • Image/StillImage • InteractiveResource • PhysicalObject • Service • Software • Sound • Text
Format (Internet Media Type): • application/msword • application/pdf • application/vnd.google-earth.kmz • application/vnd.ms-powerpoint • application/xml • application/x-shockwave-flash • application/zip • audio/mpeg • image/jpeg • image/png • text/html • video/mp4
Language (ISO 639-2): • ara • chi • eng • fre • ger • heb • ita • kor • lat • nav • rus • spa
Slide 29: Delete unneeded legacy collections
Slide 30: Future metadata corrections: next steps • Review 441 fields manually and note issues (field used in only one collection, field synonyms) • Develop a spreadsheet to cluster possible fixes by partner; write up metadata recommendations (ongoing process). To do: • Explore possibilities of assessment scripts against Solr to review metadata values • Explore visualization tools
Slide 31: Future metadata corrections: rights statements
Slide 32: Future metadata corrections: MWDL application profile
Slide 33: Future metadata corrections: subject faceting
Slide 34: Roadmap for future development
Solphal (front-end) development, large projects: • International Image Interoperability Framework (IIIF) • RDFa and schema.org support • IMLS grants: Newspapers in Hydra (LG-70-170043-17); Western Name Authority File (LG72-16-0002-16) • Born-digital content • Partner projects • Data repository
SIMP Tool enhancements: • Workflow automation • Large updates monthly, small updates as needed
Slide 36: People, Platforms, Processes
Slide 37: Lessons learned: Wild wild west of metadata wrangling. Don't keep shoveling while it is still snowing! Images: Shooting of Patrick Coughlin https://collections.lib.utah.edu/details?id=962118; Unidentified man holding snow shovel in front of his house (family, friend or acquaintance of Norman D. Nevills) https://collections.lib.utah.edu/details?id=984034
Slide 38: Thank you! Contact us at: Jeremy Myntti (jeremy.myntti@utah.edu), Anna Neatrour (anna.neatrour@utah.edu), Harish Maringanti (harish.maringanti@utah.edu), Brian McBride (brian.mcbride@utah.edu), Alan Witkowski (alan.witkowski@utah.edu), Matt Brunsvik (matt.brunsvik@utah.edu). https://collections.lib.utah.edu/ https://newspapers.lib.utah.edu/ |
| Reference URL | https://collections.lib.utah.edu/ark:/87278/s68h2hhv |
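
The "Sample Solr Ingest File" slide survives only as a title, and the indexing slide notes that incremental Solr indexing is near-instant. The sketch below shows one way an ingest payload and an incremental change could be expressed for Solr's JSON update handler. It is not the Solphal ingest format: the field names (`title_t`, `date_s`, `subject_ss`, and so on), the use of an ARK as the document `id`, and the helper functions are all hypothetical, chosen to follow the dynamic-field suffixes of Solr's default configset. The incremental case uses Solr's atomic-update `set` syntax, which assumes the core has the update log enabled and stores (or keeps docValues for) its fields.

```python
import json
import urllib.request


def to_solr_doc(record):
    """Map a flat metadata record to a Solr document (hypothetical field names)."""
    return {
        "id": record["ark"],                       # assumption: ARK used as unique key
        "title_t": record["title"],                # *_t: tokenized text field
        "date_s": record.get("date", ""),          # *_s: single string field
        "type_s": record.get("type", ""),
        "subject_ss": record.get("subjects", []),  # *_ss: multivalued strings
        "collection_s": record["collection"],
    }


def build_add_payload(records):
    """Solr's JSON update handler accepts a JSON array of documents to add."""
    return json.dumps([to_solr_doc(r) for r in records])


def build_atomic_update(doc_id, field, value):
    """Incremental change: replace one field without resending the whole document."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def post_update(core_url, payload):
    """POST a JSON payload to <core_url>/update; commitWithin asks Solr to commit within 1 s."""
    req = urllib.request.Request(
        core_url + "/update?commitWithin=1000",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    record = {"ark": "example-0001", "title": "Example photograph", "collection": "example"}
    print(build_add_payload([record]))
    print(build_atomic_update("example-0001", "title_t", "Corrected title"))
```

`post_update` is only defined, never called here; pointing it at a core URL such as `http://localhost:8983/solr/<core>` (hypothetical) would send either payload.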

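The "Metadata fields gone wild!" and title-variation slides list labels that differ only in case, punctuation, or spelling (for example "Coverage spatial", "Coverage-Spatial", "Covrage-spatial"), and the next-steps slide mentions clustering field synonyms. A minimal sketch of that clustering, assuming a simple rule of our own (lowercase, drop non-alphanumerics, then merge keys whose `difflib` similarity ratio is at least 0.9), could look like the following. The function names and the cutoff are hypothetical; the output is a list of suggestions for the manual review the slides describe, not an automatic fix.

```python
import re
from collections import defaultdict
from difflib import get_close_matches


def normalize_label(label):
    """Lowercase a field label and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def cluster_labels(labels, cutoff=0.9):
    """Group label variants; near-identical keys (likely typos) are merged."""
    exact = defaultdict(list)
    for label in labels:
        exact[normalize_label(label)].append(label)

    merged = {}
    # Visit the biggest groups first so typo variants fold into the common spelling.
    for key in sorted(exact, key=lambda k: len(exact[k]), reverse=True):
        match = get_close_matches(key, list(merged), n=1, cutoff=cutoff)
        if match:
            merged[match[0]].extend(exact[key])
        else:
            merged[key] = list(exact[key])
    return merged


if __name__ == "__main__":
    sample = ["Coverage spatial", "Coverage-spatial", "Coverage Spatial",
              "Coverage-Spatial", "Covrage-spatial",
              "Date.Digital", "Date Digital", "Date-Digital"]
    for key, variants in cluster_labels(sample).items():
        print(key, variants)
```

A higher cutoff keeps distinct fields apart (for instance "Coverage-Temporal" stays separate from "Coverage Spatial"); a lower one catches more typos but needs more human checking.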

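The controlled-vocabularies slide fixes the allowed values for Type, Format, and Language, and the to-do list mentions assessment scripts that review metadata values. As a sketch of such a check, the allowed sets below are copied from that slide, while the record field names (`type`, `format`, `language`) and the function name are hypothetical. How records are pulled from the index (for example, a Solr query export) is not shown and is left as an assumption.

```python
# Allowed values copied from the "Controlled vocabularies" slide.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "Image/MovingImage",
    "Image/StillImage", "InteractiveResource", "PhysicalObject",
    "Service", "Software", "Sound", "Text",
}
MEDIA_TYPES = {
    "application/msword", "application/pdf",
    "application/vnd.google-earth.kmz", "application/vnd.ms-powerpoint",
    "application/xml", "application/x-shockwave-flash", "application/zip",
    "audio/mpeg", "image/jpeg", "image/png", "text/html", "video/mp4",
}
ISO_639_2 = {"ara", "chi", "eng", "fre", "ger", "heb",
             "ita", "kor", "lat", "nav", "rus", "spa"}

# Hypothetical field names; map them to whatever the index actually uses.
VOCABULARIES = {"type": DCMI_TYPES, "format": MEDIA_TYPES, "language": ISO_639_2}


def find_vocabulary_violations(record):
    """Return (field, value) pairs whose value is not in that field's vocabulary."""
    problems = []
    for field, allowed in VOCABULARIES.items():
        values = record.get(field, [])
        if isinstance(values, str):  # single-valued field
            values = [values]
        for value in values:
            if value.strip() not in allowed:
                problems.append((field, value))
    return problems


if __name__ == "__main__":
    sample = {"type": "Image/StillImage", "format": "image/jpeg",
              "language": ["eng", "English"]}
    print(find_vocabulary_violations(sample))  # prints [('language', 'English')]
```

Fields missing from a record are skipped rather than reported; a stricter check could flag missing core fields as well.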

