Can I Just Google That? Orphan Works and Image Recognition Tools
Kerry Patterson, CREATe, University of Glasgow
Kerry Patterson, Can I Just Google That? Orphan Works and Image Recognition Tools, in Andrea Wallace and Ronan Deazley, eds, Display At Your Own Risk: An experimental exhibition of digital cultural heritage, 2016.
INTRODUCTION
The problem that orphan works pose for cultural institutions engaging in digitization initiatives has been well documented in recent years. In 2014 European copyright law introduced a specific copyright exception to help address this issue. Under this exception cultural institutions can digitize and make orphan works available online, so long as they have engaged in a diligent search for the copyright owner. Within the UK, this pan-European exception is supplemented by an innovative Orphan Works Licensing Scheme (OWLS), enabling anyone to apply for a licence to make use of an orphan work whether for commercial or noncommercial purposes. Like the European exception, OWLS is also contingent on conducting a diligent search.
For the cultural heritage sector these represent positive developments, but the burden of diligent search presents problems for mass digitization initiatives. Consider, for example, the Edwin Morgan Scrapbooks, held in the Special Collections department of the University of Glasgow Library. Morgan is one of the most distinguished Scottish poets of the 20th century, although as a younger man he also harboured ambitions to be an artist. His scrapbooks, made between the 1930s and the 1960s, have enormous visual appeal (see Fig.1). Morgan described them as ‘a mixture of autobiography, documentary, and art. I do not think there is anything quite like them.’[1]

Figure 1: Extract from Edwin Morgan’s Scrapbook 12, pp2239c-2240, MS Morgan 917/12. Images of Scrapbook 12 are © The Estate of Edwin Morgan and appear courtesy of The Edwin Morgan Trust and the University of Glasgow Library.
As a digitization project, the scrapbooks – 16 volumes in all – present considerable challenges in terms of copyright compliance. Morgan rarely gives a source for the images he uses, meaning that the scrapbooks contain tens of thousands of images with no information on their origins. That is, the scrapbooks contain tens of thousands of orphan works. When faced with an image with no caption or clue to its context, image recognition technology is an attractive and easy-to-use research option. The UK Intellectual Property Office has recognized the potential usefulness of these tools by including image recognition sites in its Orphan Works Diligent Search Guidelines which accompanied the launch of the OWLS scheme in 2014.[2] This paper explores the features and functionality of some commonly used image recognition tools (IRTs), and considers how they might be of use to cultural heritage institutions.
INTRODUCING IRTS
IRTs allow the user to upload an image which the tool attempts to match with images available online or in its databases. The IPO Guidelines referred to above include only Tineye (www.tineye.com) and PicScout (www.picscout.com), but other sites are available, such as Image Raider (www.imageraider.com). An image search function is also built into general search engines. For example, Google has offered a reverse image search function since June 2011, allowing you to upload an image to be compared to visually similar images. Bing offers a similar Image Match function.
Both Tineye and PicScout are free and can be used without registration, features that likely influenced their inclusion in the IPO’s Guidelines. For this reason they are attractive to the user who is only searching a few images and doesn’t wish to sign up to a site or have to pay. Tineye is free for noncommercial users and includes extensions that allow for easy searching in a web browser toolbar. The PicScout Platform is aimed at commercial users. Its search tool is designed to ‘enable image buyers to identify and license the images they’d like to use,’ and it has ‘200 million owner-contributed image fingerprints.’[3] As a subsidiary of Getty Images, PicScout would seem an obvious choice when searching for commercial photography.
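None of the tools discussed here documents exactly how it matches images, but the ‘image fingerprints’ that PicScout refers to suggest the general idea: reduce each image to a compact signature and compare signatures rather than pixels. The Python sketch below illustrates one simple, widely described fingerprinting technique (an average hash). It is an assumption made purely for explanation, not a description of how Tineye, PicScout or Google actually work, and every function name in it is hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (illustrative representation)


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging equal blocks.

    Assumes the image height and width are exact multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [
                pixels[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical test images: a 16x16 diagonal gradient and two variants of it.
original = [[(x + y) * 8 for x in range(16)] for y in range(16)]
brighter = [[v + 10 for v in row] for row in original]   # same picture, lighter scan
inverted = [[240 - v for v in row] for row in original]  # tonally reversed picture

print("original vs brighter:", hamming(average_hash(original), average_hash(brighter)))
print("original vs inverted:", hamming(average_hash(original), average_hash(inverted)))
```

A small distance between two fingerprints suggests the same picture, which is why a uniformly lighter scan can still be matched. A fingerprint this crude would be expected to cope poorly with heavy cropping, so it should be read only as a conceptual aid.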
Image Raider relies on Google, Bing and Yandex to get results. It offers a long-term image monitoring service and allows the user to run multiple searches concurrently, features attractive to photographers who wish to monitor potential copyright violations of their work. It uses a credit model, where users can purchase credits or earn credits by tweeting about the site. Unfortunately I was never able to get it to properly perform searches for any images at all, despite trying across a period of several weeks. As such, my observations in this essay draw upon my experiences of using Google, Tineye and PicScout only.
IMAGE SECURITY
Cultural institutions carrying out diligent search will rightly be concerned about copyright and image security when uploading images from their collection, and this will no doubt influence their choice of search tool. On this issue, different IRTs adopt different approaches. For example, Google’s Help Forum states: ‘When you search using an image, any images or URLs that you upload will be stored by Google. Google only uses these images and URLs to make our products and services better.’[4] This somewhat vague statement will certainly be undesirable for some users of the service, particularly for a mass digitization project.
Compare, however, the approach adopted by Tineye:
Images uploaded to TinEye are not added to the search index, nor are they made accessible to other users. Copyright for all images submitted to TinEye remains with the original owner/author.
Search images submitted by unregistered users are automatically discarded after 72 hours. Links to these searches will stop working after 72 hours, unless a registered user happens to save the same image.[5]
Bing’s privacy statement does not specifically mention what happens to images,[6] and I was unable to find information relating to this on PicScout or Image Raider.
PARTIAL AND CROPPED IMAGES
So how useful are these search tools? The results, when searching for orphan images from the Scrapbooks, were variable, especially when dealing with partial or cropped images. Within the Scrapbooks, Morgan often cropped down images from their original state in newspapers, magazines and books. These irregular-shaped items tend to decrease the likelihood of an uploaded image search yielding beneficial results, although identification of partial images is still possible. One example of a successful search is this image of an oil painting, taken from Scrapbook 12 (Fig. 2).

Figure 2: Extract from Edwin Morgan’s Scrapbook 12, image from p2239c, MS Morgan 917/12. Images of Scrapbook 12 are © The Estate of Edwin Morgan and appear courtesy of The Edwin Morgan Trust and the University of Glasgow Library.
Despite the fact that Morgan had cropped the image, Google Images and Tineye were both able to point to sources identifying the cutting as the centre third of the oil painting Villa Doria Pamphili, Rome (Souvenir d’une Villa) 1838-39 by Alexandre Gabriel Decamps (1803-60). Naturally, the key to the success of the search tools is the fact that Decamps’ painting can be found on multiple websites. The more ubiquitous the image is online, the greater the chance of identifying it using an IRT. PicScout, however, was unable to identify the painting.
The kind of hit rate you can expect to get from image search will, of course, vary. In road-testing these tools, I selected two pages at random from Scrapbook 12, incorporating a total of 14 viable images. From this small sample, I found that Google provided the best results, followed by Tineye, with PicScout unable to provide anything at all. But even with Google the success rate was very modest, identifying only two images from my sample, both of which were 20th century artworks by a well-known artist. That said, an example of a useful outcome came from the image search of an advert that originally featured in The New Statesman. One result identified the issue in which it originally featured as containing spoof publisher adverts and in-jokes, which was not evident when the advert was removed from that context. This demonstrates that image recognition tools might offer benefits beyond the identification of a possible rights owner: they can help us contextualise and better understand the material within our collections.
However, there are limits to the scope of these tools. They do not have universal reach to every image available on the internet. (Even Google Images falls short on this front.) For example, searching for information on a black and white portrait photograph of a boy (Fig. 3), I found nothing through image recognition tools. However, where the IRTs failed, serendipity prevailed. Looking through Twitter for new accounts to follow, I happened to see the same image being used as a Twitter avatar. This led to a conversation with the user which revealed the name of the book in which he found the image, and subsequently the 1950s magazine source from which Morgan likely cropped the portrait.

Figure 3: Extract from Edwin Morgan’s Scrapbook 12, p2241, MS Morgan 917/12. Images of Scrapbook 12 are © The Estate of Edwin Morgan and appear courtesy of The Edwin Morgan Trust and the University of Glasgow Library.
IRTS AND DILIGENT SEARCH
As IRTs are an IPO-approved method for engaging in diligent search in accordance with the EU and UK orphan works regime, I was interested to explore the IPO’s response to the use of these tools as the primary means of diligent search in an application made to the OWLS scheme. For this purpose I chose an original photographic work, rather than one cut from a magazine or other source. It is a black and white studio photograph of a male bodybuilder-type figure. The photograph likely dates to the 1950s (the Scrapbook in which it was found was made between 1954 and 1960) but there is no supporting material to give any information about its origins. I used Google, PicScout, and Tineye to search for the image with no results, and then submitted an application to the OWLS scheme on the basis of just those three searches.
The response of the IPO was that the requirements of the scheme would be satisfied by a further search with three additional sources: the Association of Photographers, British Association of Picture Libraries, and British Institute of Professional Photographers. This involved my sending an email to each contact and did not result in identification of the work. This result should encourage cultural heritage institutions that intend to apply to OWLS: a diligent search carried out using these tools can form a significant part of their application.
SEARCHING FOR DAYOR
In preparing this paper for the Display At Your Own Risk exhibition, it seemed only appropriate that I should deploy the tools I have been discussing on the exhibition photographs of the material surrogates. Would these search tools link those digital images back to the cultural institution that created the digital surrogates and made them available online? What other information might be revealed? A sample of eight works generated some interesting results, with details provided in an Appendix to this paper.
Similar to my experience with the Morgan Scrapbooks, Google Images emerged as the tool that was most likely to generate links to the holding institution’s website, although more often than not Wikipedia was the number one source identified for the images concerned. By contrast, on Tineye, personal blogs featured highly as a source of images (and typically did not link back to the organisational source) as well as Shutterstock images. Interestingly, on Shutterstock, the user ‘Everett-Art’ asserts copyright claims over two of the images I selected from the exhibition – Da Vinci’s Mona Lisa and Jan van Eyck’s Portrait of Giovanni Arnolfini & His Wife – as well as many other famous paintings by Van Gogh and more (Figs 4 and 5).[7]

Figure 4: Shutterstock account for Everett – Art, accessed: 22 April 2016.

Figure 5: Shutterstock account for Everett – Art, accessed: 22 April 2016.
While PicScout was unable to identify half of the images in the sample, the images it did identify were all linked to photo agency sources such as Getty Images and the Press Association. As PicScout is a subsidiary of Getty Images, perhaps this should not surprise. In one instance, however, the search returned ‘Friends of San Diego Architecture’ as the rightsholder for van Eyck’s Portrait (Fig. 6).[8] Registering with their site allows you to download a free low-resolution image, which is watermarked: ‘Copyright Protected.’ Van Eyck’s painting, of course, is part of the collection of The National Gallery, London.

Figure 6: Friends of San Diego Architecture, accessed: 22 April 2016.
CONCLUDING THOUGHTS
Reverse image search technology can certainly be beneficial to cultural heritage institutions. Image recognition tools have a role to play in helping identify any very ‘obvious’ works which are still in copyright. By this I mean those which are usually by well-known creators and with potentially litigious rightsholders. Works which are out of copyright are also more likely to be found, as they present less risk for users to use and share online. Of course, simply finding an image may not answer the copyright questions you have about the work, but it is a start.
Image searches work much less well on anything from a more obscure source or which has been cropped too much from its original state, although they can still reveal interesting supplementary information about the item. In the case of the Scrapbooks, the majority of images are fairly unremarkable photographs, or sections from photographs, taken from the contemporary press. These types of images are unlikely to be identified using IRTs at the current time.
There are also practical considerations to bear in mind when using these tools. Preparing images to upload for search may involve considerable effort that is not scalable when engaging in a mass digitization project. For example, I have estimated that the Edwin Morgan Scrapbooks contain some 42,000 orphan works, a significant proportion of which are images. Engaging in any form of search on this scale – whether technically assisted or not – is simply impractical in terms of both time and resource. Institutions should also consider which tool is the most appropriate for their needs. Although I found that Google was the most likely to provide results, individuals and organizations may have understandable reservations about uploading large numbers of images to Google Image Search due to security concerns, or to other sites where the terms are unclear.
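To give a rough sense of why searching at this scale is impractical, the short Python sketch below multiplies the estimated 42,000 orphan works by an assumed time per item. The per-item timings are hypothetical figures chosen for illustration; they were not measured during this project. The calculation also treats every work as needing a search, although only a proportion of the 42,000 are images.

```python
def search_effort_hours(items: int, minutes_per_item: float) -> float:
    """Total hours needed to prepare and search `items` works one at a time."""
    return items * minutes_per_item / 60


ORPHAN_WORKS = 42_000  # estimate given in the text for the Morgan Scrapbooks

# Hypothetical per-item timings (crop, upload, review results); not measured values.
for minutes in (2, 5, 10):
    hours = search_effort_hours(ORPHAN_WORKS, minutes)
    print(f"{minutes} min per item: about {hours:,.0f} hours")
```

Even the most optimistic of these assumed timings amounts to many months of full-time work for one person, before any follow-up enquiries are made.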
Ultimately, in the case of the Scrapbooks, the nature of the de-contextualised works means that in some cases, IRTs form one of the main ways of conducting an IPO-approved diligent search. The technology is continually developing and improving, and it seems likely that the usefulness of image recognition tools for cultural heritage institutions engaging in digitization and rights clearance activities will only increase in the future.
[1] Letter from Edwin Morgan to his publisher Michael Schmidt (15 December 1988).
[2] Available at: https://www.gov.uk/government/publications/orphan-works-diligent-search-guidance-for-applicants (accessed: 4 April 2016).
[3] Available at: http://www.picscout.com/about-us/faqs/ (accessed: 8 April 2016).
[4] Available at: https://support.google.com/websearch/answer/1325808?hl=en (accessed: 8 April 2016).
[5] Available at: http://www.tineye.com/faq#uploading (accessed: 8 April 2016).
[6] Available at: https://privacy.microsoft.com/en-gb/privacystatement/ (accessed: 8 April 2016).
[7] Available at: http://www.shutterstock.com/en/portfolio/search.mhtml?gallery_id=2713483&page=1&gallery_landing=1 (accessed: 22 April 2016).
[8] Available at: http://friendsofsdarch.photoshelter.com/image/I00007O1uCGiPgsk (accessed: 22 April 2016).