An Institutional Perspective of Digitization
Fred Saunderson, The National Library of Scotland
Fred Saunderson, An Institutional Perspective of Digitization, in Andrea Wallace and Ronan Deazley, eds, Display At Your Own Risk: An experimental exhibition of digital cultural heritage, 2016.
One third of the National Library of Scotland’s collections will be held in digital format by 2025, the centenary of the Library’s foundation (National Library of Scotland 2015). As a strategic objective, this goal shows how significant ‘digital’ has become to contemporary collecting institutions. It also speaks to the scale and richness of analogue collections. The Library’s collections come in all shapes, sizes and varieties, from bestselling paperbacks to archives of correspondence. The collections number well into the tens of millions, although counting the exact number of material objects is not feasible. All this in a single library which, in its current form as Scotland’s national library, is less than a century old.
As rapid as the growth of the physical collections has been, the expansion of the National Library’s digital holdings promises to be more extreme. The Library’s physical collections currently number around 25 million material objects. So far, we have digitized around 170,000 paper-based objects, generating about 5.2 million digital images, and we hold around 1.3 million digital journal articles and 50,000 ebooks, collected in accordance with the Legal Deposit Libraries (Non-Print Works) Regulations 2013. Assuming our cumulative holdings rise to around 28 million by 2025, we anticipate that 33% – more than nine million – of these objects will be digital. The Library currently receives around 4,000 new physical objects every week. While the flow of print material is unlikely to abate in the next decade, the number of born-digital items available for collection will undoubtedly also rise, which in turn will contribute to the expansion of our digital holdings. However, as much as our born-digital collections will increase, this growth alone will not deliver the Library’s aspirational one-third threshold. To ‘plug the gap’, many of the material objects in our collection will need to be transformed from analogue items into the 1s and 0s of machine-readable (digital) data.
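The scale of that gap can be sketched with some rough arithmetic. The short Python example below uses only the approximate figures quoted above. It assumes, purely for illustration, that each digitized object, journal article and ebook counts as one ‘object’ towards the target; the Library’s own method of counting may well differ, and the variable names are hypothetical.

```python
# Approximate figures quoted in the essay.
projected_total_holdings = 28_000_000  # anticipated cumulative holdings by 2025
digitized_objects = 170_000            # paper-based objects digitized so far
digital_journal_articles = 1_300_000   # collected under legal deposit
ebooks = 50_000                        # collected under legal deposit

# Assumption (not from the Library): each item counts as one object.
current_digital = digitized_objects + digital_journal_articles + ebooks

# Integer division keeps the arithmetic exact.
one_third_target = projected_total_holdings // 3
gap = one_third_target - current_digital

print(f"Current digital holdings: {current_digital:,}")
print(f"One-third target:         {one_third_target:,}")
print(f"Gap to close:             {gap:,}")
```

On these assumptions, current digital holdings stand at roughly 1.5 million objects, leaving something in the region of 7.8 million to be added through born-digital collecting and digitization combined.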
This process of digitization presents cultural institutions like the National Library with a significant set of opportunities and challenges. This essay will explore these, through examination of the nature of physical collections, the benefits of transforming these into digital form, and the associated problems and risks. The paper’s aim is to describe some of the factors involved and dispel any idea that digitization, as a process, involves nothing more than ‘scan and publish’.
MATERIAL OBJECTS AND ANALOGUE INFORMATION
Information is recorded and conveyed by a multiplicity of carriers and forms. Excluding information that is stored as machine-readable 1s and 0s (digital information), these carriers can broadly be categorized as ‘analogue’ materials. Cultural and collecting organizations have long been in the business of acquiring, preserving and providing access to analogue content. From paintings, sculptures, and scrolls to jewels, stuffed animals, and airplanes, the array of analogue materials is extensive. As a library, the National Library naturally focuses on analogue materials that specifically function as ‘information’ carriers; however, its collections extend beyond printed books: photographs, maps, drawings, etchings, plans, charts, letters, diaries, journals, ledgers, films and television programmes are just some of the other information-holding formats that we curate.
Analogue information is indelibly valued and valuable. The physical characteristics of an object are frequently central to its significance. The ink pressed into the paper of a letter, the binding wrapped around a first edition book, or the postmarking on an envelope are all physical traits that distinguish and enliven analogue materials (see Fig. 1). Every analogue object in the National Library’s collections carries information, often (but not always) in written form, and these objects also carry information within their own characteristics, form, nature, and provenance. Such characteristics are rooted in, shaped by, or exposed through physical form.
Although we all engage with and consume increasing amounts of digital data, it is also true that we have done comparatively little by way of ending our millennia-old habits of recording and consuming physical information. We still live in a world that is full of analogue information carriers. Books are published, newspapers are printed, and physical postcards are sent. We still put up posters, write lists, and stare at advertising billboards from cars, buses, and trains. We even convert data ‘back’ from digital to analogue, for example, when we print articles or emails. It is obvious, perhaps intuitive, that the tangible nature of physical information carriers gives them particular qualities that cannot be replicated by the power and flexibility of 1s and 0s.
WHY WE DIGITIZE
Despite their durability, popularity, and persistent ubiquity, analogue information carriers have weaknesses. Fortunately, digital technologies are good at addressing many of these weaknesses. From an institutional perspective, two key limitations stem from holding collections solely as physical information carriers. The first is preservation. The second is access.
Physical objects are often more durable than their digital counterparts, and so can be easier to preserve. Analogue materials rarely require an interface for their information to be extracted. Books (language and, perhaps, print size notwithstanding) can be read without any interface between eyes and the page. The mere preservation of the page maintains the ability of the text to be digested. Humans, conversely, require an interface in order to interpret digital information: it’s no use looking directly at the millions of 1s and 0s that make up a digital image or block of text. In spite of these facts, digitization offers important benefits to collecting organizations’ overarching responsibility to preserve the information for which we are caretakers.
Digitization is necessarily concerned with copying. The process involves transformation: from analogue, non-interface ingestible content, to digital, machine-readable data. At a basic level, therefore, digitization is a patently useful preservation mechanism. If you can digitize an item, you obtain the ability to preserve multiple copies. Unlike re-pressing a printed book, you won’t, through digitization, create a like-for-like format copy of an object. From a tangible value perspective, therefore, you risk the potential for information- and value-loss in the copy. However, the ‘core’ information – the text, the look of the image – is preserved as 1s and 0s. These can easily be replicated over and over again. Digital information, unlike finite physical objects, can be stored in multiple places ‘at once’, protecting information from calamitous events and other potential dangers.
A related preservation benefit of digitization is a reduction in stress, handling, and exposure to which the material object is subjected. If an institution has only one physical copy of an object, in order for the information within that object to be interpreted and used, the object inevitably must face a degree of stress. Books and manuscripts need to be handled, with all the associated pulling, moving and potential for ripping or folding (see Fig. 2). To counter such stresses, almost any appropriately cared-for cultural object can be ‘dark’ stored, locked away from such dangers and threats. However, an outcome of this is that the information, preserved as it is in a tangible sense, remains unpreserved in an intellectual sense. No one can get to the information, because of the way in which the carrier is being protected. There is therefore a preservation dichotomy.
Digitization can address this in part, allowing a preservation and institutional win-win. With a digital surrogate, an organization has a potentially fragile material object and a durable, copyable, portable surrogate. The analogue object can be sent into ‘dark’ storage, better ensuring its physical characteristics are retained for the future. The digital copy, meanwhile, can be kept in circulation, standing in for the original and allowing the intellectual content to be preserved at the same time as the material object.
Intellectual preservation is associated with the other principal benefit of digitization for cultural institutions: improved access. This is perhaps the most apparent and obvious benefit, particularly for users. Much as a material object can only be protected in a limited sense, by virtue of being tied to a single location at any one time, so too can that object only be accessed in a limited way. Physical access to analogue information carriers may be superior in terms of access to the cumulative total of the information available. Both the intellectual content and the data stored within the physical carrier are available, from an ink impression to a postmark. However, the object itself is available only to one person, or a limited group of people, at a time. Crucially, object and audience must be in the same location. This is a significant limitation on the object’s potential.
The creation of a digital surrogate allows physically and geographically bounded objects to become potentially accessible to anyone, anywhere and at any time. In this way, digitization allows collections to ‘break free’ from cultural institutions in ways that were not previously viable. Older techniques of preservation and access supplementation, such as the decades-old practice of microphotography (for example, copying newspapers onto microfiche), bring similar preservation and access benefits. Significantly, however, copies made through such processes remain analogue, and largely constrained to certain physical access locations. Digital surrogates and born-digital collections are not so constrained. They can be delivered to large, dispersed audiences, far larger than any institution could realistically attract or accommodate on-site. However, as the next section explores, digital conversion is by no means the final hurdle to entirely free, open and global access.
THE CHALLENGES OF DIGITIZATION
Cultural institutions face various challenges when digitizing collections. Financing is a significant hurdle, although one that is naturally not unique to this endeavour. Other, more specific hurdles come in the form of selection, standards, storage, and technological sustainability. It’s worth exploring these in turn.
Just as digitization is not a straightforward technical process of ‘scan and move on’, determining what to digitize is not a simple activity. A key challenge in developing appropriate selection criteria is to understand why an institution wants to digitize in the first place. If, for example, the key driver is preservation, then it seems logical to prioritize unique, fragile and at-risk objects. However, these may not be the best collections to start with if wider access and audience development are key priorities. Equally, the very process of digitizing fragile materials, as a preservation technique, can endanger the physical integrity of a work, whether it is a tightly bound volume or a set of crumbling papers. In order to preserve intellectual content through digital capture, institutions are often necessarily guided by the real risk of damage to the material object. To digitize a fragile item is a tough call to make, if the process is likely to have lasting or irreversible negative effects on the original. Selection may be heavily determined, therefore, by usability and (perceived) interest in the content, as well as condition of the material.
Legal issues also present hurdles. Copyright is a comparatively minor challenge to institutions like the National Library, however. There are copyright exceptions, notably the preservation exception in the UK (Copyright, Designs and Patents Act 1988, s.42), which enable digitization for preservation purposes (although enabling access to the digitized, preservation copy is something else again). A more significant legal complexity relates to ownership of the objects, as well as the digital copies and any rights therein. Frequently, an institution’s most significant, fragile, and in-demand collections are deposits (cared for and preserved by the institution but owned by a third party). Squaring the circle of ownership, when seeking to copy huge quantities of material, can be a steep challenge.
Once content has been selected, institutions need to agree and implement suitable standards. This means more than simply capturing high quality digital images, film, or sound files. A pivotal element of any digitization effort is metadata. Improving metadata – the data about data – can become a task almost ad infinitum for institutions with wide and deep collections. Effectively, it is pointless to digitize material without developing appropriate metadata. There is little point in expending time, effort and expense in creating digital copies if those copies cannot be identified, linked, stored or used. However, it’s often impractical to dedicate time to generating highly detailed metadata about every aspect of each object (or, indeed, each ‘capture element’ of each object, such as each page of a digitized book). Therefore, determining standards – of capture quality and of metadata quality – before digitization is essential. These decisions can significantly impact the amount of time and expense required for capture, which can subsequently impact selection, scale and budget.
Once the material for digitization has been selected and the relevant standards determined, there is the comparatively straightforward – and brief – task of capture. Unless an institution plans to divest itself of the originals after capture, storage requirements in fact expand after capture. A shelf or drawer is still needed for the material object(s), even if in cheaper ‘dark’ storage, and servers are now required for the digital surrogates. Capture is likely to be at a high ‘preservation’ standard, which means high demands in terms of digital storage capacity and quality. To comply with robust digital preservation standards, the storage of digital surrogates across multiple locations is often also required, which can impact costs and planning. Unlike one-off matters of selection and standards, storage requirements (and digital preservation) place ongoing obligations on institutions. Once a few million TIFF files are created they must be maintained and serviced into the future. Over-zealous digitization, perhaps in response to a glut of immediate funding, has the potential to lead to future storage headaches. In this respect, storage forms a fundamental, and largely hidden, consideration for institutions when digitizing collections.
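A rough sense of how such storage obligations scale can be sketched in a few lines of Python. Only the image count (about 5.2 million, quoted earlier in this essay) comes from the Library. The average file size and the number of replicated copies are placeholder assumptions chosen for illustration, not Library figures, and the function name is hypothetical.

```python
def estimate_storage_tb(image_count: int, avg_image_mb: float, copies: int) -> float:
    """Return the total storage needed in decimal terabytes (1 TB = 1,000,000 MB)."""
    if image_count < 0 or avg_image_mb < 0 or copies < 1:
        raise ValueError("counts and sizes must be non-negative, with at least one copy")
    total_mb = image_count * avg_image_mb * copies
    return total_mb / 1_000_000


IMAGES_SO_FAR = 5_200_000    # approximate figure quoted in the essay
ASSUMED_AVG_TIFF_MB = 50.0   # placeholder assumption, not a Library figure
ASSUMED_COPIES = 3           # placeholder assumption for multi-location storage

single_copy_tb = estimate_storage_tb(IMAGES_SO_FAR, ASSUMED_AVG_TIFF_MB, 1)
replicated_tb = estimate_storage_tb(IMAGES_SO_FAR, ASSUMED_AVG_TIFF_MB, ASSUMED_COPIES)

print(f"One copy of every image: {single_copy_tb:,.0f} TB")
print(f"Replicated across {ASSUMED_COPIES} locations: {replicated_tb:,.0f} TB")
```

Whatever the real values, the point of the sketch is that the requirement multiplies: every additional image, every increase in capture quality, and every extra preservation copy raises a cost that recurs year after year.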
A final challenge concerns the sustainability of collections more broadly conceived. This demands taking into account future developments in technology, something that is particularly testing. Sizeable investment goes into digitization, meaning that the viable lifespan of digital surrogates must be considered alongside the pace of technological development. For example, in the current environment the growth and future role of 3D scanning should be weighed alongside more ‘traditional’ 2D capture. How quickly might a high quality digitization come to be seen as a low quality or insufficient copy? Crucially, by digitizing today can an institution be certain it won’t need to re-digitize an item in three or five years? These questions are far harder to address, largely because the answers – inasmuch as there are any – are difficult or impossible to calculate. Nevertheless, given the investment and risks involved, it’s essential that institutions give real thought to the sustainability of both their digitization efforts and their digitization outputs from the outset.
CONCLUSION
Digitization of cultural heritage and information collections is a many-headed beast. Fundamentally, there is far more to digitization than a simple decision to capture. The payoffs and benefits of digitization for cultural institutions and their users can be significant. Digitization affords the possibility of making physical collections infinitely more accessible and considerably better secured. Conversely, it presents a host of not-inconsiderable challenges. Beyond the obvious limitations of funding and the pressures placed on institutions by introducing major new work streams, digitization also brings more nuanced challenges. Chief among these are considerations of selection, standards, storage, and sustainability.
Overall, digitization is valuable for institutions like the National Library. There is little to be said, irrespective of the challenges, against improved preservation and access, which can come readily from digitization. Allowing analogue, unique collections like ours to ‘burst free’ of their physical premises and constraints without inordinate risk to the original material objects holds out tremendous potential that should not be undersold or underestimated. The National Library’s ambition to be ‘one-third digital’ within the decade is testament to the importance and value of digitization to cultural institutions.
REFERENCES
Copyright, Designs and Patents Act 1988 (as amended), available at: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/462194/Copyright_Designs_and_Patents_Act_1988.pdf
Legal Deposit Libraries (Non-print Works) Regulations 2013, available at: http://www.legislation.gov.uk/ukdsi/2013/9780111533703/contents
National Library of Scotland, The way forward: Library strategy 2015-2020 (2015), available at: http://www.nls.uk/about-us/corporate-documents/strategy-2015-2020-text