The New York Times (NYT) has detailed plans to undertake an ambitious project to digitise its historical photo library. The archive library, affectionately known inside the NYT as the morgue, began clipping and saving articles in the 1870s.
The library now holds millions of photographs, along with tens of millions of historical news clippings, microfilm records and other archival materials. The technology partner selected for this project is Google Cloud.
“We’ve always known that we were sitting on a trove of historical photos and now, cloud technology allows us to not only preserve this archival source, but easily search and pull photos to provide even more historical context,” said Monica Drake, assistant managing editor, The New York Times. “Ultimately, this digitalisation will equip Times journalists with useful tools to make it easier to tell even more visual stories.”
“Google Cloud technologies like Cloud Storage, Cloud Pub/Sub, and Cloud Vision API are helping to preserve this priceless history and give journalists a new way to search, access, and analyse millions of historic photos and give them new life,” said Brian Stevens, chief technology officer, Google Cloud. “Cloud technology is allowing The Times to protect one of their most unique assets migrating from steel filing cabinets to a cloud-based platform where journalists can bring visual storytelling to a whole new level.”
NYT plans to use the Google Cloud Vision API to apply machine learning to the archive. This will allow staff to identify, classify and automatically organise its photos.
A significant challenge
The challenge is not just capturing the images but also indexing them so that they can be accessed and used. One of the first problems will be the fragility of many of the older images. They will need to be handled so as not to cause any more damage than they have already suffered.
There are also challenges when it comes to scanning different formats. This is not just sticking a negative on a lightbox or putting a photo in a scanner. It may require some interesting engineering to capture the images.
Once scanned, many of the older images will need to be digitally restored. This will include repairing the damage that age has wrought on the images and even adding colour. The latter, in particular, would offer a new view of the past.
Indexing and attributing the images is also going to be a problem. Many of the older images have not been catalogued. Even where they have, the indexes are patchy and incomplete. This means that time will have to be spent researching the images. Unfortunately, the NYT plans to keep these away from the public eye. That’s a mistake. In circumstances like this, taking advantage of crowdsourced help makes sense.
The archive has been called the “history of the world through the eyes of the New York Times”. You have to admire the scale of the task ahead of the NYT and Google teams who will have to pull it off. The archive holds over 100 years of history in photographs, many of which will never have been seen before being digitised.
It is easy to write this off as just another Google customer win. It is not. The major newspapers all have vast photo and image libraries waiting for digitisation. The success of this project will be closely watched as others think about the future.