Please join the ALCTS CaMMS Catalog Management Interest Group meeting at the ALA Annual Conference in Washington, DC, on Saturday, June 22, 2019, 1:00-2:00 pm, at the Marriott Marquis. The meeting will feature presentations and Q&A on experiences with batch processing and metadata reconciliation projects completed using MarcEdit, OpenRefine, Excel, and the Python scripting language to facilitate record processing and enhance catalog management.

PRESENTATION 1

Presentation Title: Does Working in Batch Mean Sacrificing Quality Metadata? How tools like MarcEdit, OpenRefine, Excel, and Python can help improve access and discovery

Name and position of presenter: Jennifer Eustis, Metadata Librarian, University of Massachusetts Amherst

Abstract: Batch processing metadata for electronic resources means working with records of varying quality. Common issues include inconsistent title capitalization, missing information such as publication data, URLs, or fixed field data, lack of information needed for local best practices, and inconsistent vendor and/or OCLC numbers. These issues can be daunting and can require significant cleanup that slows batch processing down or makes it ineffective. To help process title sets of records, I have begun using a suite of tools that includes MarcEdit, OpenRefine, Excel, and Python. These tools help me address common issues and implement local practices in batch. The results are better quality metadata records that facilitate access and discovery. My presentation will highlight how I use these tools, with examples. My hope is that attendees can learn from these examples and use these tools in their own batch processing.

PRESENTATION 2

Presentation Title: Title-level De-Duplication of Vendor Records Using OpenRefine

Name and position of presenter: Elizabeth Miraglia, Assistant Program Director, Metadata Services, Head, Books & Serials Metadata, UC San Diego Library

Abstract: UC San Diego participates in several patron-driven acquisition programs, often relying on vendor-supplied records for discovery in its catalog. After the merger of two large e-book vendors, we found that we had a large amount of content in one particular DDA pool that duplicated content that was either already licensed elsewhere or that existed in another DDA or EBA program. Selectors were interested in removing the duplicate content in order to prevent purchasing titles we didn’t actually want or need to buy. However, because vendor records vary in quality and often do not include OCLC numbers, this de-duplication had to happen at the title level and wasn’t possible in the past. Over the course of several months, UC San Diego metadata staff developed a process using OpenRefine to compare and de-duplicate our DDA titles against licensed content. We were able to provide our acquisitions team with identifiers for deactivating the titles in the vendor’s platform and create a much cleaner catalog. In addition, we estimate that the library ultimately saved a substantial amount of money and prevented about 1,000 duplicate purchases. This presentation will outline the process that we developed and how we’ve been able to re-purpose it for smaller scale projects and some ongoing maintenance.
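
For readers unfamiliar with title-level matching, the general idea can be sketched in Python. This is only an illustration and not the process UC San Diego developed. It assumes a fingerprint-style key similar in spirit to OpenRefine's key collision clustering, and the helper names (title_key, find_duplicates) and sample titles are hypothetical.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Build a normalized match key from a title (hypothetical helper)."""
    # Strip accents so "Café" and "Cafe" compare equal.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then replace punctuation with spaces.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", ascii_title.lower())
    # Sort and de-duplicate tokens so word order and repeats do not matter.
    return " ".join(sorted(set(cleaned.split())))


def find_duplicates(dda_titles, licensed_titles):
    """Return DDA titles whose key matches any licensed title's key."""
    licensed_keys = {title_key(t) for t in licensed_titles}
    return [t for t in dda_titles if title_key(t) in licensed_keys]


# Made-up sample data, for illustration only.
dda = ["The Art of Cataloging.", "Metadata Basics", "Café Culture: A History"]
licensed = ["Art of cataloging, The", "Cafe culture : a history"]
duplicates = find_duplicates(dda, licensed)
print(duplicates)
```

A key this aggressive can also match titles that are genuinely different works (for example, two editions with the same title), so candidate matches would normally be reviewed or compared on a second field, such as author or ISBN, before anything is deactivated.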

PRESENTATION 3

Presentation Title: Connecting Crowdsourced Audio Recording Metadata with MARC Records

Name and position of presenter: Brian Rennick, Associate University Librarian for Library Information Technology

Abstract: A large collection of vinyl long-play (LP) records was cataloged years ago without specifying any music genre or style metadata. In an effort to improve discovery of these LPs, a project is underway to pull metadata from the Discogs user-built database of music. The enrichment project employs batch processing using a combination of OpenRefine and Python scripting. Thousands of Discogs volunteers have contributed high-quality metadata for more than 11 million audio recordings. The Discogs XML data is freely available under the CC0 No Rights Reserved license and has proven to be suitable for adding genre and style metadata to the audio recordings in our library catalog. This presentation will share lessons learned from the project so far and will describe the techniques and algorithms used for matching MARC records with the Discogs data.

Jeanette Sewell and Vesselina Stoytcheva, Catalog Management IG co-chairs 2018-2019

Dan Tam Do and Marina Morgan, Catalog Management IG vice co-chairs 2018-2019, co-chairs 2019-2020

Best regards,

Vesselina Stoytcheva

Systems Librarian

Office of Management – Human Capital

Office of the Comptroller of the Currency

Phone: (202) 649-7120

Cell: (202) 731-5383