But digital storage, with its eternally morphing and data-orphaning formats, was not then and is not now an accepted archival-storage medium. A true archive must be able to tolerate years of relative inattention; scanned copies of little-used books, however, demand constant refreshment, software-revision-upgrading, and new machinery, the long-term costs of which are unknowable but high. The relatively simple substitution of electronic databases for paper card catalogs, and the yearly maintenance of these databases, have very nearly blown the head gaskets of many libraries. They have smiled bravely through their pain, while hewing madly away at staffing and book-buying budgets behind the scenes; and there is still greater pain to come. Since an average book, whose description in an online catalog takes up less than a page’s worth of text, is about two hundred pages long, a fully digitized library collection requires a live data-swamp roughly two hundred times the size of its online catalog. And that’s just for an old-fashioned full-text ASCII digital library — not one that captures the appearance of the original typeset pages. If you want to see those old pages as scanned images, the storage and transmission requirements are going to be, say, twenty-five times higher than those of plain ASCII text — Lesk says it’s a hundred times higher, but let’s assume advances in compression and the economies of shared effort — which means that the overhead cost of a digital library that delivers the look (if not the feel) of former pages at medium resolution is going to run about five thousand times the overhead of the digital catalog. If your library spends three hundred thousand dollars per year to maintain its online catalog, it will have to come up with $1.5 billion a year to maintain copies of those books on its servers in the form of remotely accessible scanned files. If you want color scans, as people increasingly do, because they feel more attuned to the surrogate when they can see the particular creamy hue of the paper or the brown tint of the ink, it’ll cost you a few billion more than that. These figures are very loose and undoubtedly wrong — but the truth is that nobody has ever overestimated the cost of any computer project, and the costs will be yodelingly high in any case. “Our biggest misjudgment was underestimating the cost of automation,” William Welsh told an interviewer in 1984. “Way back when a consultant predicted the cost of an automated systems approach, we thought it was beyond our means. Later, we went ahead, not realizing that even the first cost predictions were greatly underestimated. The costs of software and maintenance just explode the totals.”
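
The scaling in that paragraph can be restated as a rough back-of-the-envelope calculation. The sketch below, in Python, uses only the figures given above (two hundred pages per book, a twenty-five-fold premium for page images over plain ASCII, a $300,000-a-year catalog budget); the assumption that storage and maintenance costs scale linearly with data volume is mine, not the text's.

```python
# Back-of-the-envelope restatement of the scaling argument above.
# Figures are from the passage; linear cost scaling is an assumption.

pages_per_book = 200        # average book vs. its one-page catalog record
image_premium = 25          # assumed multiplier: scanned page images vs. plain ASCII
catalog_budget = 300_000    # dollars per year spent maintaining the online catalog

ascii_multiple = pages_per_book                  # full-text ASCII library: ~200x the catalog
image_multiple = pages_per_book * image_premium  # page-image library: ~5,000x the catalog

annual_cost = catalog_budget * image_multiple
print(image_multiple)   # 5000
print(annual_cost)      # 1500000000, i.e. roughly $1.5 billion a year
```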

Things that cost a lot, year after year, are subject, during lean decades, to deferred maintenance or outright abandonment. If you put some books and papers in a locked storage closet and come back fifteen years later, the documents will be readable without the typesetting systems and printing presses and binding machines that produced them; if you lock up computer media for the same interval (some once-standard eight-inch floppy disks from the mid-eighties, say), the documents they hold will be extremely difficult to reconstitute. We will certainly get more adept at long-term data storage, but even so, a collection of live book-facsimiles on a computer network is like a family of elephants at a zoo: if the zoo runs out of money for hay and bananas, for vets and dung-trucks, the elephants will sicken and die.

This is an alternative route to a point that Walt Crawford and Michael Gorman make very well in their snappy 1995 book Future Libraries: Dreams, Madness, and Reality. It would take, Crawford and Gorman estimate, about 168 gigabytes of memory, after compression, to store one year’s worth of page-images of The New Yorker, scanned at moderate resolution, in color; thus, if you wanted to make two decades of old New Yorkers accessible in an electronic archive, you would consume more memory than OCLC uses to hold its entire ASCII bibliographic database. “No amount of handwaving, mumbo-jumbo, or blithe assumptions that the future will answer all problems can disguise the plain fact that society cannot afford anything even approaching universal conversion,” Crawford and Gorman write. “We have not the money or time to do the conversion and cannot provide the storage.”
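
Crawford and Gorman's figure can be extended the same way. The sketch below takes their 168 gigabytes per year at face value and multiplies it out over the two decades mentioned above; the per-page average is only implied, under a page count I am assuming (roughly forty-seven issues of a hundred-odd pages each), and is not theirs.

```python
# Extending Crawford and Gorman's estimate over two decades.
# 168 GB/year and the 20-year span are from the text; the page count is an assumption.

gb_per_year = 168                # compressed, moderate-resolution color page images
years = 20
total_gb = gb_per_year * years   # 3360 GB, about 3.4 terabytes for one magazine's run

assumed_pages_per_year = 4_700   # hypothetical: ~47 issues of ~100 pages each
mb_per_page = gb_per_year * 1000 / assumed_pages_per_year  # implied average, ~36 MB per page

print(total_gb, round(mb_per_page, 1))
```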

E-futurists of a certain sort — those who talk dismissively of books as tree-corpses — sometimes respond to observations about digital expense and impermanency by shrugging and saying that if people want to keep reading some electronic copy whose paper source was trashed, they’ll find the money to keep it alive on whatever software and hardware wins out in the market. This is the use-it-or-lose-it argument, and it is a deadly way to run a culture. Over a few centuries, library books (and newspapers and journals) that were ignored can become suddenly interesting, and heavily read books, newspapers, and journals can drop way down in the charts; one of the important functions, and pleasures, of writing history is that of cultural tillage, or soil renewal: you trowel around in unfashionable holding places for things that have lain untouched for decades to see what particularities they may yield to a new eye. We mustn’t model the digital library on the day-to-day operation of a single human brain, which quite properly uses-or-loses, keeps uppermost in mind what it needs most often, and does not refresh, and eventually forgets, what it very infrequently considers — after all, the principal reason groups of rememberers invented writing and printing was to record accurately what they sensed was otherwise likely to be forgotten.

Mindful of the unprovenness of long-term digital storage, yet eager to spend large amounts of money right away, Lesk, Lynn, Battin, and the Technology Assessment Advisory Committee adopted Warren Haas’s position: microfilm strenuously in the short term, digitize from the microfilm (rather than from originals) in the fullness of time. “Turn the pages once” was the TAAC’s motto. Microfilm has, Stuart Lynn noted in 1992, higher resolution and superior archival quality, and we can convert later to digital images at “only a small increment of the original cost” of the microfilming. He sums up: “The key point is, either way, we can have our cake and eat it, too.”

As ill luck would have it, the cake went stale quickly: people just don’t want to scan from microfilm if they can avoid it. It isn’t cheap, for one thing: Stuart Lynn’s “small incremental cost” is somewhere around $40 per roll — that is, to digitize one white box of preexisting microfilm, without any secondary OCR processing, you are going to spend half as much again to convert from the film to the digital file as it cost you to produce the film in the first place. If you must manually adjust for variations in the contrast of the microfilm or in the size of the images, the cost climbs dramatically from there. And resolution is, as always, an obstacle: if you want to convert a newspaper page that was shrunk on film to a sixteenth of its original size, your scanner, lasering gamely away on each film-frame, is going to have to resolve to 9,600 dots per inch in order to achieve an “output resolution” of six hundred dots per inch. This is at or beyond the outer limits of microfilm scanners now.
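
The resolution requirement is simple multiplication: whatever was shrunk by a given linear reduction ratio must be scanned at that many times the desired output resolution. Here is a minimal sketch using the 16:1 ratio and 600-dpi target from the passage; the function name is mine.

```python
# Scanner resolution needed on the film frame to recover a given output resolution.
# scan_dpi = output_dpi * reduction_ratio; 16:1 and 600 dpi are the figures from the text.

def required_film_scan_dpi(output_dpi: int, reduction_ratio: int) -> int:
    """Dots per inch the scanner must resolve on the microfilm frame itself."""
    return output_dpi * reduction_ratio

print(required_film_scan_dpi(600, 16))   # 9600 dpi, at or beyond current microfilm scanners
```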

And six hundred dots per inch doesn’t do justice to the tiny printing used on the editorial pages of nineteenth-century newspapers anyway. In an experiment called Project Open Book, Paul Conway demonstrated that it was possible to scan and reanimate digitally two thousand shrunken microfilm copies of monographs from Yale’s diminished history collection (1,000 volumes of Civil War history, 200 volumes of Native American history, 400 volumes on the history of Spain before the Civil War, and 400 volumes having to do with the histories of communism, socialism, and fascism) — but Conway was working from post-1983, preservation-quality microfilm made at the relatively low reduction-ratios employed for books. “We’ve pretty much figured out how to do books and serials and things up to about the size of, oh, eleven by seventeen, in various formats, whether it’s microfilm or paper,” Conway says. “We’ve kind of got that one nailed down, and the affordable technology is there to support digitization from either the original document or from its microfilm copy. But once you get larger than that, the technology isn’t there yet, [and] the testing of the existing technology to find out where it falls off is not there.” Conway hasn’t been able to put these scanned-from-microfilm books on the Web yet. “The files are not available now,” he wrote me,