Becker Archives Digital Content Organization Plan

Stephen Logsdon
Archivist
Washington University School of Medicine
logsdons@wustl.edu

The Becker Archives Digital Content Organization Plan (BADCOP) outlines the file-naming convention used for all digital content maintained by the Bernard Becker Medical Library Archives at the Washington University School of Medicine. To explain how it works, I want to first draw your attention to the ornate document labeled Number 1, which is the US Army commission given to Dr. William Beaumont during the War of 1812. This document can be found in the William Beaumont Papers at the Becker Library. President James Madison signed this commission appointing Dr. Beaumont as a surgeon in the Sixth Regiment of Infantry in the US Army on December 2, 1812.

Imagine that a patron wanted a scanned copy of this document in PDF format. Once you scan it for them, you’ll need to provide a filename for the PDF on a screen that looks similar to the image labeled Number 2. What filename do you give it? Should the filename begin with “William Beaumont” or “Beaumont-William”? Should you only say it’s a commission, or should you be more specific and indicate it’s a surgeon’s commission in the US Army? Should James Madison’s name be in the filename anywhere? Should you include the date of the document in the filename? All of these questions are important to consider when choosing a filename.

The Becker Archives Digital Content Organization Plan, with the unfortunate acronym BADCOP, takes the guesswork out of assigning filenames because it centers on a methodical file-naming system. The basic premise of BADCOP is that the organization of digital content should follow the principle of archival arrangement (the organization and sequence of items within a collection). Every filename assigned using this method is a series of letters and numbers that encodes the scanned file’s place in the arrangement of a collection. The BADCOP-compliant filename that I would assign to this document is shown in the image labeled Number 3: PC012-S05-B20-F03.pdf.

Looking briefly at this filename, you’ll see that it does not say it’s a surgeon’s commission, it does not include William Beaumont’s name or James Madison’s, and it does not even contain the date of the document. However, if you look closer, the filename points to all of that information. The filename PC012-S05-B20-F03.pdf is a code, and you can see how that code breaks down into identifiable pieces in the much-abbreviated view of the finding aid to the William Beaumont Papers shown in the image labeled Number 4.

PC012 is the collection code for Personal Collection #12, the William Beaumont Papers. S05 stands for Series #5, which is the series in which the commissions are located. B20 is Box #20. F03 is Folder #3, which contains the 1812 surgeon’s commission signed by President Madison.
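As a rough illustration of that breakdown, the sketch below splits a filename of this shape into its parts. The pattern is inferred only from the single example above; the field widths, the uppercase prefix rule, and the names `BadcopLocation` and `parse_badcop` are assumptions for illustration, not part of the published plan, which may define additional levels.

```python
import re
from typing import NamedTuple

# Pattern inferred from the one example in the text (PC012-S05-B20-F03.pdf).
# Assumption: a letter prefix, a 3-digit collection number, and 2-digit
# series, box, and folder numbers.
BADCOP_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]+)(?P<collection>\d{3})"
    r"-S(?P<series>\d{2})-B(?P<box>\d{2})-F(?P<folder>\d{2})"
    r"\.(?P<extension>[A-Za-z0-9]+)$"
)


class BadcopLocation(NamedTuple):
    """Hypothetical container for the parts of a BADCOP-style filename."""
    prefix: str
    collection: int
    series: int
    box: int
    folder: int
    extension: str


def parse_badcop(filename: str) -> BadcopLocation:
    """Split a BADCOP-style filename into its arrangement components."""
    match = BADCOP_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a BADCOP-style filename: {filename!r}")
    parts = match.groupdict()
    return BadcopLocation(
        prefix=parts["prefix"],
        collection=int(parts["collection"]),
        series=int(parts["series"]),
        box=int(parts["box"]),
        folder=int(parts["folder"]),
        extension=parts["extension"],
    )


location = parse_badcop("PC012-S05-B20-F03.pdf")
print(location.collection, location.series, location.box, location.folder)
```

A check like this could also flag scans that were saved with a non-conforming name before they get lost among other files.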

There are numerous justifications for using BADCOP, but the most important reason to implement this file-naming convention is to answer this question: Once you have scanned this document, and you have assigned it the filename PC012-S05-B20-F03.pdf, how are you ever going to find that PDF again? The answer to that question is the beauty of BADCOP. Let’s say several years from now, a different patron asks you for a PDF of that exact same surgeon’s commission. How would you find it amongst the thousands of digitized images on your computer, server, or wherever you store your digital content?

You would find the PDF of the surgeon’s commission in exactly the same way as you would if you were looking for the original physical copy of it: by using the finding aid for the William Beaumont Papers. Don’t start this search with your digital files. Instead, go to the finding aid first and search for the description of the item you are looking for, which in this case is the 1812 surgeon’s commission. Once you find it, you have also identified the BADCOP filename because you know its organizational location in the collection. It’s the third folder of Box 20 in Series 5 of the Beaumont Papers. You can then create the corresponding filename on the fly while you’re looking at the finding aid: PC012-S05-B20-F03.pdf.
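Going from the finding aid to the filename is mechanical enough to sketch in a few lines. The function name `badcop_filename`, the default "PC" prefix, and the zero-padding widths are assumptions drawn from the one example in this post rather than from the full plan.

```python
def badcop_filename(collection: int, series: int, box: int, folder: int,
                    prefix: str = "PC", extension: str = "pdf") -> str:
    """Build a BADCOP-style filename from a location read off a finding aid.

    Assumed widths: 3 digits for the collection, 2 digits for series,
    box, and folder, matching the example PC012-S05-B20-F03.pdf.
    """
    return (
        f"{prefix}{collection:03d}"
        f"-S{series:02d}-B{box:02d}-F{folder:02d}.{extension}"
    )


# Personal Collection 12, Series 5, Box 20, Folder 3
print(badcop_filename(collection=12, series=5, box=20, folder=3))
```

The zero padding matters: it is what keeps the names in finding-aid order when a file browser sorts them alphabetically.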

Now that you know the filename you need, you are prepared to find it amongst all your digital content. The ease of finding the correct digitized file is illustrated by the filenames listed in the image labeled Number 5. In this example, only six documents from the collection have been scanned, so picking out the filename you need is easy.

Imagine that instead of six scanned documents, you had scanned 600 documents from this collection. If you have assigned BADCOP-compliant filenames to each file, all 600 scans will line up in your file directory in exactly the same order as your finding aid lists them. All of your scanned documents from Series 3 are going to follow all of those from Series 1 and Series 2. All of the scans from Box 13 are going to be found after all the scans from Box 1 through Box 12. This means there is no need to open random files from this collection to check whether each one is the document you want. Because you have the filename in hand, you know the exact file you are looking for. So whether there are six, 600, or 6000 PDFs from this collection, finding the exact file you need takes only seconds, and that’s what makes BADCOP such an effective tool to use.
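The ordering described above falls out of an ordinary alphabetical sort, as this small sketch suggests. Apart from PC012-S05-B20-F03.pdf, the filenames here are made up for illustration and do not describe real folders in the Beaumont Papers.

```python
# Hypothetical scans, listed in no particular order.
scans = [
    "PC012-S05-B20-F03.pdf",
    "PC012-S01-B01-F02.pdf",
    "PC012-S03-B13-F01.pdf",
    "PC012-S02-B12-F10.pdf",
    "PC012-S01-B01-F10.pdf",
    "PC012-S03-B13-F02.pdf",
]

# A plain string sort puts them in series, then box, then folder order,
# because every number is zero-padded to the same width.
ordered = sorted(scans)
for name in ordered:
    print(name)

# Without padding, the same sort misplaces Box 13 ahead of Box 2.
unpadded = sorted(["B2", "B13", "B1"])
print(unpadded)  # ['B1', 'B13', 'B2']

# With the filename in hand, locating the scan is a direct lookup.
target = "PC012-S05-B20-F03.pdf"
print(target in set(scans))
```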

For more information about the BADCOP file-naming convention, visit:

https://becker.wustl.edu/resources/arb/policies/becker-archives-digital-content-organization-plan

Commission signed by President James Madison appointing Dr. William Beaumont as a surgeon in the Sixth Regiment of Infantry in the US Army on December 2, 1812. Personal Collection #12, William Beaumont Papers, Bernard Becker Medical Library Archives, Washington University School of Medicine.

Launching a Digital Preservation Program as a Solo Archivist

Mike Satalof
Archivist and Digital Collections Librarian
Bard Graduate Center
mike.satalof@bgc.bard.edu

Bard Graduate Center (BGC) is a graduate research institute in New York City. Founded in 1993, it examines the decorative arts, design history, and material culture through academic programs, research forums, and exhibitions. As part of a small Library staff, and as its first archivist, I have found rewarding opportunities and challenges in planning, advocating for, and—finally!—beginning to acquire and preserve the institution’s born-digital collections.

In 2014, soon after the BGC’s 20th anniversary, the Library began undertaking an initiative to roll out a modest institutional archive. Though at the start of this initiative our familiarity with digital preservation concepts was low-to-moderate, we knew it would be important to build in sound policies and infrastructure to preserve born-digital materials. To solo archivists taking on such a project for the first time, my advice would be to take an approach that fosters a shared sense of purpose with your stakeholders; think broadly in gathering information about how your institution’s digital assets are created and managed; and aim for a scale within the limitations of your resources.

We made communication an essential part of this project from the beginning, focusing on outreach and advocacy while preparing for an institution-wide inventory. To make our case and propose a game plan, we convened a Digital Preservation Committee that included stakeholders, departmental representatives, and IT. During the inventory process, I held more than 20 meetings with staff from 13 different departments to compile technical data about their digital materials, recording information such as file formats, size, and storage locations. I found that along with gathering data, these “bring out your dead” meetings were also useful for gathering the stories behind those digital files directly from the staff members most familiar with them: Which were of especially high value? Which were lost when a long-time employee left? What kind of anxieties did staff feel about managing, storing, and preserving their department’s electronic records? Combining institutional memory with an in-depth inventory provided a detailed map of the landscape of our digital assets and informed the next phase: prioritizing the materials most at risk for preservation.

With the inventories completed, I reported back to the departments with findings and recommended next steps, including a series of proposed pilot preservation projects. We aimed to identify “low-hanging fruit”: high-value digital materials already in danger of becoming lost or unreadable, including exhibition records, publications, public programming materials, and thesis projects. While planning several discrete pilots, I also began drafting policies to formalize our archive and its mission.

To plan for a digital repository, we were able to secure funding to hire a great part-time digital archives consultant to provide recommendations for a repository that could be managed at our scale (a solo archivist and small institutional IT staff). The consultant and I worked with IT to identify a solution under $5,000 that could be administered by a lone archivist and monitored by IT with little maintenance. We selected a three-copy solution with a server that would be used both to store collections and serve as the “drop-off” point for other departments to transfer digital materials to the archives. The consultant and I produced workflow documentation for the accessioning, transfer, processing and description of materials (in ArchivesSpace), and he provided a script that leverages the LoC’s BagIt tool to monitor file fixity. With IT’s assistance in setting up the server, I have been able to begin processing and preserving pilot collections this fall, with hopes to complete processing and revise documentation in the coming months.
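For readers unfamiliar with fixity monitoring, the sketch below shows the general idea using only the Python standard library. It is not the consultant’s script and it does not use BagIt; it simply records a checksum for each file and later reports anything that has gone missing or changed, which is the same kind of comparison a BagIt manifest supports. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def fixity_problems(root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare files under root against a stored manifest.

    Returns one message per file that is missing or whose checksum
    no longer matches; an empty list means every file passed.
    """
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {relative}")
    return problems
```

In practice the manifest would be saved alongside the collection at accession time and the comparison run on a schedule, so that silent corruption is noticed while a good copy still exists elsewhere.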

One interesting thing that emerged as I completed the inventory was the number of staff members raising questions about records management and feeling unsure about how to take on some of these seemingly overwhelming tasks (like cleaning up a large backlog of project files or creating procedures for disposition). While a large-scale records management program is beyond our scope, it became clear that for many departments, these issues represent a more tangible priority than getting materials into an archive. I have tried to provide some records cleanup recommendations in response to specific questions, and it is clear that records management training will be a key activity that the archives is in the best position to offer as a service. In the future, I would like to explore outreach to staff and faculty through best practices workshops and an online documentation portal (via Google Docs), and by providing individualized consultation on request.

I’m grateful that this project has been well received and to have the support of administrators, library colleagues, and allies in other departments who have given thoughtful direction to our still-nascent archives. In a role that allows me to work with the institution’s many different constituencies, I am eager to make sure the archives is inclusive, transparent, and trusted as a repository for the record of their most important achievements and efforts—digital or nondigital.

Thanks to the LART for providing a wonderful platform for solo archivists to share our experiences and resources.

Slide for: Preserving in Digital Formats: Challenges and Solutions in Small Archives, SAA Annual Meeting, Atlanta, Georgia, August 3, 2016.