Johns Hopkins University
Fall/Winter 2007
Vol. 5, No. 1

COVER STORY

Of Bytes and Books and Databases

In their quest to create the library of the future, Winston Tabb and his band of forward-looking technologists are partnering with scholars to dramatically advance the way research is done.

For French scholar Stephen Nichols, poring over a previously unexplored version of Roman de la Rose is a bit like a zoologist observing a new subspecies. With its tales of courtly love and thoughtful digressions on topics of the day, the story illuminates a world inching toward the Renaissance. And like the maiden who inspired the poet who wrote it, Roman de la Rose entices Nichols and scholars like him—it is the Holy Grail of medieval French texts.

"It's an incredibly rich work," says Nichols, chair of the Department of German and Romance Languages and Literatures. "It is the major vernacular work of the period. It's what Dante is to Italian."

There are 250 unique versions of Roman de la Rose scattered all around the world, penned by various authors and illustrators over a 300-year period. And each handmade version has its own unique story to tell. For decades, humanities scholars like Nichols, limited by geography and the costs of time and travel, have been stymied in their efforts to see and compare how each version differs from the others.

But no more.

Through a decade-long effort by librarians and engineers at Hopkins' Milton S. Eisenhower (MSE) Library, scholars from around the world will soon have online access to nearly 150 versions of the 800-year-old story. The transformation of the antique texts to bits and bytes, a process called "digitization," has electrified humanities scholars: not just those who study and teach language and literature, but also scholars in art, art history, philosophy, and sociology.

"A shock went through medievalists around the world after they saw these texts go online," says Nichols, the James M. Beall Professor of French and the Humanities. "We've been able to take this project that was very small and turn it into a paradigm worldwide for digital scholarship. It has changed the scale for how people think about these things. There will be a similar effort to put Dante's work online for comparative study because of this."

Nichols credits Winston Tabb, dean of the university's Sheridan Libraries, with ramping up and accelerating Johns Hopkins' digitizing of texts. It was Tabb who encouraged Nichols three years ago to apply for foundation grants and make arrangements with libraries in France to put dozens of versions of Roman de la Rose online, rather than settle for a mere four or five of them. "He scaled up the whole thing," Nichols says.

Since arriving at Hopkins in 2002, the Harvard-educated Tabb has transformed his love of books—which he first experienced as a 10-year-old "book page" at a library in his hometown of Tulsa—into energy that converts words to e-files. With the help of the Internet Archive, a group backed by Microsoft and Yahoo, Hopkins will turn some of its nearly 3 million analog texts into e-books and make them available free to scholars around the world. The Internet Archive project started this summer at Hopkins' Applied Physics Laboratory, when it began digitizing many of the university's special collections.

But devising ways of turning yellowing pages into computer-ready files is not all that Tabb, formerly a high-ranking administrator at the Library of Congress, has to take on. As Hopkins scholars find new ways to use databases, mine information, sort it out, and disseminate it, Tabb and his staff act as partners. They help scholars create and access disparate databases, use them, and store them. Tabb and his crew are key players in the university's digital future—whatever that turns out to be.

"The challenge for me is to be able to predict and imagine what it will take to make this work," says Tabb, a balding, soft-spoken gentleman who, despite his conservative librarian looks, is sparkly and animated. "We have to be supple and flexible. If you were anal about this job, it would drive you crazy."

Tabb pooh-poohs the notion that he has developed a "vision" for the libraries (which include the Garrett Library at Evergreen, the Hutzler Reading Room, MSE, and the Peabody Library). A vision would be too static to be useful, he says. But he has taken the lead on modernizing them as chief of Hopkins' University Library Council, the body that oversees all libraries at the university and the medical institutions.

He taps 30 years of experience in various positions at the Library of Congress, where he was a major player in the American Memory Project, the first large-scale digitization program undertaken in the United States. Since 1994, the project has converted 5 million items—including articles, books, sheet music, and the papers of George Washington and other historical luminaries.

"It has made libraries across the country aware of what could be done," Tabb says.

Besides cutting down on the travel costs of scholars, digitization offers several dramatic advantages to researchers, including:

Availability. Projects like the Internet Archive and another bankrolled by Google aim to link Web users anywhere in the world with materials from some of the top research universities in the United States. Efforts by Johns Hopkins to create datasets and merge disparate ones to make them more useful are also aimed at reaching a worldwide scholarly audience. "In 10 years' time, everything that has been printed will be available digitally," says Tabb.

Speed. Because of its immediacy, the Internet can transmit researchers' most recent findings. More than half of all academic journals are now in electronic form—up from 6 percent in 2001—a boon to scholars seeking the best available information quickly.

Better communication with fellow researchers. Because Web pages are interactive, researchers can write on each other's blogs or attach their own work on a subject to that of other researchers. Scholars such as Nichols tout as particularly valuable the ability to use hypertext to link their own research papers and other relevant information to online documents. Layers of scholarship from various disciplines on one subject can be collected in one place online, available to anyone who is plugged in to the Internet.

Editing capability. Instead of producing an entirely new paper on a subject, researchers can update or add to an existing one online, then ask their fellow researchers for feedback on the changes.

Comparative scholarship. By viewing a body of work, such as the 150 or so versions of Roman de la Rose, instantly as one dataset, scholars can more easily make comparisons between items, then immediately share their thoughts with others.

The march to digital at Hopkins and elsewhere has been accompanied by grand slogans for the brave new library worlds that would result from it. Paperless libraries. Libraries without walls. Digital laboratories. To Tabb, they are so much bunk.

"All of us are trying to figure out what the library of the future is," he says. "There is no such thing as 'a paperless library.' We'll always have books of some kind. We're really trying to create a hybrid."

A composite kind of library sounds like a great idea to university presidents who are looking to cut costs. And to scholars, who have come to enjoy the benefits of technology. And especially to students, for whom computers are as ubiquitous as air.

But across the country, university libraries have suffered growing pains as they shift from an emphasis on perfect-bound pieces of paper to a dizzying plethora of digital formats. Each step in modernization brings its own set of headaches, says Thomas J. Mann, author of The Oxford Guide to Library Research.

"It's like trying to pin down a warped piece of linoleum," says Mann. "Flattening a bulge in one area immediately causes other bulges to pop up elsewhere."

The biggest bulge is money. A digitizing scanner, called a Kirtas machine, can run into the tens of thousands of dollars. (Hopkins and the Internet Archive project have offset the cost of one such machine through a foundation grant.) Although electronic journals can produce savings by eliminating the need for shelf and storage space, the cost of e-journals is rising at a rate more than twice that of print journals. Scholarly databases made up of historic collections are particularly expensive.

Another ongoing concern for research librarians is the technological treadmill of the digital world. As soon as everything is saved in one format, a new, better technology emerges. And costly upgrades are only part of the problem: The dirty little secret among research librarians is that digital materials wear out faster than paper.

Meanwhile, Google's efforts to digitize the entire collections of major universities, including those of Michigan and Stanford, have created legal headaches. Many authors and scholars complain that Google is developing a card catalog on steroids—one that violates copyright laws by putting their work online for free. To sidestep this issue, the Internet Archive and Hopkins will scan and make available materials that are solely in the public domain.

A final wrinkle involves students who, thanks to the ease of technology, become lazy scholars, glomming together research materials copied from Web sites at the last minute.

"It's a problem," acknowledges David Bell, the Andrew W. Mellon Professor in the Humanities and dean of faculty at the Krieger School. Bell, who teaches history, writes regularly on library modernization issues. "Often, students cut and paste stuff together in two hours. Ordinarily, it would take them 20 hours, and they'd have to think about whatever it is they're writing about."

Tabb and his staff are paid to think their way through these thickets.

Soon after coming to Homewood five years ago, Tabb realized he needed help navigating his way through them. He couldn't answer or anticipate all of the questions that came with huge advances in technology, nor could his librarians.

He sought out Sayeed Choudhury, a graduate of the Whiting School of Engineering, who began working a library technology job at MSE in 1994 as a way for the then graduate student to pay off a loan he had taken out to buy World Cup soccer tickets.

Trained as a civil engineer, Choudhury signed on for library work after noticing that MSE had next to no digital information on natural disasters, his area of study.

Choudhury's first year featured the digitization of the university's Lester Levy Collection (30,000 pieces of popular sheet music from the 19th century), which allowed scholars to view the art and lyrics of songs from the period in one place and helped them teach classes in history and music.

Within a few years, Choudhury was named director of the Digital Knowledge Center, a research and development arm within MSE that was created to craft ways for scholars to use and share e-data. "Given that we were the most research-centered organization in the country, it seemed natural we should be out in front on this," he says.

Before long, Tabb saw the need for even more engineering expertise. He required people who understood how to develop systems that would store gargantuan files of information and make them available to scholars from a wide range of disciplines.

Technology now makes it possible for astronomers to view complete series of galactic images from a decade ago and for scholars to see far-flung illustrated poems that go back centuries. But it takes specialized expertise to figure out how to gather the stuff, present it, manufacture the space to store it, and maintain it. Data files often take up terabytes of space—one terabyte equals more than 1,000 gigabytes (enough to collect all 350-plus episodes of The Simpsons)—making storage, particularly in the long term, an expensive and mind-consuming endeavor.

In addition to tapping Choudhury's expertise, Tabb began to recruit Whiting School students and educators from areas outside of library science. Currently, more than half of Tabb's digital collections staff of 10 is made up of engineers, mathematicians, and scientists.

Choudhury, now an associate director for digital programs at the newly renamed Digital Research and Curation Center, says that his job rarely deals with digitization anymore. "We're moving from the idea of digitizing collections to creating a wide range of data sets," he says. "By doing that, we can help support interdisciplinary research and teaching that might not occur otherwise. We're creating new ways to interact with data."

Tabb notes that digital information, properly conflated and made available, can lead to new avenues of multidisciplinary collaborations among scholars, as has been the case with Roman de la Rose, which has attracted interest from art historians, literary critics, and philosophers, among others. He adds that Hopkins is one of only a handful of universities to practice what he calls "digital curation"—which moves beyond mere preservation by maintaining and adding value to a trusted body of digital information for use now and later.

Perhaps the largest project Choudhury and crew are working on is a digital archive and data set for the National Virtual Observatory. The program, started six years ago by Krieger School astronomer Alexander Szalay, doesn't generate new data. Instead, it's an attempt to make all the astronomy data in the world easy to access. The NVO collects databases of telescopic photos of the skies made by observatories across the globe and around it (on Earth and in orbit) so that researchers can have a seamlessly unified data set—a complete and uninterrupted view of the skies and how they move through time.

Last fall, the Sheridan Libraries used a $185,000 grant from the Institute of Museum and Library Services to collect and preserve the enormous, memory-eating, digital NVO files, so that astronomers worldwide would have access to files that provide a complete spectrum of images of the cosmos and the physical and chemical properties that govern them.

It's a "huge and groundbreaking" undertaking, Choudhury says, one that's not without its logistical headaches. "Sometimes, data sets aren't of interest for 10 years or so. They're huge. They're cumbersome. They cost money to store and preserve. But you just can't throw them out."

And the potential payoff is enormous. With this new access to huge amounts of data from all wavelengths of the electromagnetic spectrum, scientists will be able to better understand the distribution of stars in our galaxy, and how and why stars and galaxies change with time, Choudhury notes. Already, new NVO search tools have found several new brown dwarfs, and astronomers are confident they'll find more rare objects, including faint quasars and gamma-ray bursts.

With the NVO project and others as yet unknown, Tabb and his staff (or their successors) will provide the intellectual heavy lifting behind even more intellectual heavy lifting: the research tools that help other researchers make new connections and discoveries.

"We are here as a support function to the students and faculty," Tabb says. "It's important to remember that and not take things on just because we think they're interesting. Without a faculty champion like Steve Nichols, we won't do a project."

Choudhury adds that creating a new data set often means coming up with something entirely new, something that must be worked through without the aid of a model or guidebook.

"We're trying to build scaffolding, not structure," he says. "We can't predict what scholarship or research will take place. So, the infrastructure has to remain open and flexible."

For his part, Nichols is more than happy that his instigation led to a breakthrough in how humanities research is done. Getting prized old texts online opens up a world of possibilities, Nichols says. Scholars no longer need to work in isolation.

"This is the Copernican revolution," says Nichols. "It's changing the way we look at things."

Michael Anft is a freelance writer based in Baltimore.