The OCLC database, on the other hand, was, until quite recently, intolerant of deviation. Authors get married, they receive honorific titles, they die and have a year put to the right of the hyphen. Or suddenly The New York Times starts spelling Mao Tse-tung “Mao Zedong.” In the face of all this bewildering variability, the object of a catalog, as Charles Cutter himself suggested in his Rules for a Printed Dictionary Catalogue, is to group together, or collocate, all the works by a given writer, and all the editions of a given work by a given writer, and all the works about a given writer’s work, and all the biographies of a given writer, in the proper groups and subgroups, rationally.

For instance, we would prefer (this example is from a search of Harvard’s HOLLIS, which I did in October 1993), when attempting to view the books written by Alfred Tennyson, that they weren’t arbitrarily distributed under three separately alphabetized, unpunctuated headings: TENNYSON ALFRED TENNYSON BARON 1809 1892 and TENNYSON ALFRED TENNYSON 1ST BARON 1809 1892 and TENNYSON ALFRED TENNYSON 1809 1892. Moreover, it would be nice if the first work listed as by TENNYSON ALFRED TENNYSON BARON 1809 1892 (in response to the command “Find Au Tennyson”) were in fact a work by Alfred Tennyson, and not a work by Tuningius, Gerardus (1566–1610), called Apophthegmata graeca, latina, italica, gallica, hispanica (“Imperfect: title-page slightly mutilated”), that happens to be autographed on the front endpaper by Tennyson. And we would prefer that the second work listed as by Alfred Tennyson were not The Kraken: for solo trombone, by Deborah Barnekow, 7 pp. (1978). (Ms. Barnekow is right, though: if Tennyson’s sea monster played an instrument, it probably would be the trombone.) It would be nice, too, if Neuronal Information Transfer, co-edited by Virginia Tennyson, didn’t intrude between several books published by the Tennyson Society and a tempting entry for a work called “Tennysoniana,” an entry that, when I accepted it, plucked me from the Tennyson list and dropped me into a list of twenty-three books by SHEPHERD RICHARD HERNE 1842–1895, none of which was “Tennysoniana.” (Many of these oddities mysteriously disappeared shortly before this article went to press, but there are thousands more. A quick check of HOLLIS on March 21, 1994, revealed that Bolingbroke, Villiers de L’Isle-Adam, Edward Bulwer-Lytton, and Bernard Berenson all have works wrongly segregated under at least three different forms of their names. Charles George Lamb’s Alternating Currents and Charles W. Lamb, Jr.’s The Market for Guayule Rubber come between editions of Charles Lamb’s Essays of Elia. And 462 records for works by Thomas Macaulay are separately alphabetized under eight versions of his name.) I have no doubt that Dale Flecker believed what he was saying when he told me that “the machine catalog is in almost no cases worse and in most cases better than the card catalog was.” But in my experience, five minutes with any online catalog is sufficient time to uncover states of disorder that simply would not have arisen in what library administrators call a “paper environment.”

When I visited OCLC, some of the staff freely admitted to me that card catalogs currently do a better job of collocation than online catalogs do. “We’re only partway there,” Barbara Strauss, then a senior product support specialist at OCLC, told me. (Ms. Strauss “knows cataloging like your tongue knows the inside of your mouth,” one of her colleagues said.) Her boss, Martin Dillon, the director of OCLC’s Library Resources Management Division, recently told an interviewer that browsing the OCLC database using “keyword indexes, author indexes, and subject-term indexes sheds a harsh light on misspellings and errors of all types.” A random sample in one 1989 OCLC study found a hundred and ten separate records for Tobias Smollett’s The Expedition of Humphry Clinker in the database, nearly half of which were potential duplicates, kept separate by minuscule variations and typos. You have to feel sorry for the sophomore accounting major who is hired as a part-time “copy-cataloger” by his university’s library, given a week’s training, and handed an old edition of Clinker left to his university’s library by an alumnus; you have to forgive him when, having drifted for a time through some of the seemingly endless, code-disfigured series of records, looking for a hit, he swears, gives up, and decides that it’s faster just to make up another record on the fly, further cluttering the system with the hundred and eleventh “edition” of The Expedition of Humphry Clinker.

In the past few years, fortunately, OCLC has done a lot of automated cleanup. (The cleanup has to be automated, for, laments Martin Dillon, “when databases get as large as ours the contribution of individual humans is severely limited. The task is so large that no practical number of humans could handle it.”) OCLC’s “DDR” software (“Duplicate Detection and Resolution”), which was first installed in 1991, compares two records at as many as fourteen points and decides whether they stand for the same book, and thus should be fused, or not: if they differ only by an ellipsis (…) at the end of a truncated subtitle, say, or if one calls the publisher “Wiley” and the other calls it “John Wiley & Sons,” the two become one. Common but hard-to-see typos like “Great Britian” and “Untied States” no longer force fictional duplicates. Over six hundred thousand redundant records are gone as a result of this work. And OCLC is now refining authority-control software that becomes more experienced (crisscrossed by more specific links among separate forms of the same person’s name, for example) the more it works through new data. Millions of orphaned records have been united since 1990. (There have been a few embarrassments along the way, naturally: “Madonna” was globally altered by OCLC to “Mary, Blessed Virgin, Saint” as part of an authority-control routine, a change that, before it was corrected, caused problems for libraries interested in cataloging the recent work of Ms. Ciccone.)
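For readers who want a concrete picture of this kind of matching, here is a minimal sketch in Python. It is not OCLC’s DDR algorithm, whose fourteen comparison points are not spelled out here; the four field names, the alias table, the typo table, and the function names are all assumptions made up for illustration, seeded only with the Wiley, ellipsis, and typo cases mentioned above.

```python
import re

# Hypothetical alias table: only the Wiley pair comes from the essay.
PUBLISHER_ALIASES = {"john wiley & sons": "wiley"}

# Hypothetical typo table, seeded with the two typos the essay cites.
KNOWN_TYPOS = {"britian": "britain", "untied states": "united states"}


def normalize(value: str) -> str:
    """Lowercase, drop a trailing ellipsis, collapse spaces, fix known typos."""
    text = value.lower().strip()
    # Remove "..." or the single-character ellipsis at the end of the field.
    text = re.sub(r"(\.\.\.|…)\s*$", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    for typo, fix in KNOWN_TYPOS.items():
        text = text.replace(typo, fix)
    return text


def normalize_publisher(value: str) -> str:
    """Normalize a publisher name and map known variants to one form."""
    name = normalize(value)
    return PUBLISHER_ALIASES.get(name, name)


# Assumed comparison points; a real system would use many more.
FIELD_NORMALIZERS = {
    "title": normalize,
    "author": normalize,
    "publisher": normalize_publisher,
    "year": normalize,
}


def is_duplicate(a: dict, b: dict) -> bool:
    """Return True when every compared field matches after normalization."""
    return all(
        norm(a.get(field, "")) == norm(b.get(field, ""))
        for field, norm in FIELD_NORMALIZERS.items()
    )
```

Under these assumptions, two invented records that differ only by a trailing ellipsis, a “Britian” typo, and the Wiley variant would be judged the same book, while a difference in year would keep them apart. The strictness is deliberate: fusing two records that are in fact different editions is a worse error than leaving a duplicate in place.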

But no matter how clever and successful OCLC’s new quality-control efforts are, they mainly benefit those libraries that have not yet converted their catalogs. Countless old errors and inconsistencies are out there still, doing indolent mischief to scholarship in the local online catalogs of the libraries that “reconned” early on. Having paid millions in fees to OCLC for the use of its database, university libraries must now scrape up the cash to pay for authority-control software just so that their online catalog will perform the minimal tasks that Charles Cutter expected of the card catalog as a matter of course. The University of Chicago, even as it pays OCLC’s RETROCON department to convert cards relating to the classics, philosophy, and American literature (at a cost of around two dollars a card), is contemplating paying Blackwell North America, Inc., a database processor, at least a hundred and fifty thousand dollars for a onetime authority-control grooming. (Errors and inconsistencies that appear after Blackwell is finished will persevere, of course.) The CARL Corporation, in Denver, Colorado, charges in the six figures to license its authority-control application to a major university library. This can all suddenly seem very unfuturistic and sad — sad because the cost of technology now consumes nearly 30 percent of the typical American library’s budget, according to one 1992 estimate, forcing it to cut book purchases, reference staff, and skilled catalogers, and sad because the technology that libraries are actually buying turns out to be remedial software meant to correct the hash that earlier technologies have made of information once safely stored on paper.