American Council of Learned Societies
Occasional Paper No. 37



Information Technology in Humanities Scholarship:
Achievements, Prospects, and Challenges—
The United States Focus



II. INFORMATION TECHNOLOGY AND SCHOLARSHIP

Introduction

Humanists have used computers since the 1950s, but until the 1980s usage could be described as occasional. Initially computers were perceived as number-crunchers, and most data were numerically coded for input and analysis. These data were often subject to purely statistical analysis, and eventually the "quantitative paradigm" took hold in some humanities disciplines, such as history and archaeology. The gradual realization that computer hardware and software could manipulate symbolic as well as numeric data had a remarkable impact on the kinds of projects for which computers were used. Today, most scholars recognize that data need not be numeric to find a place in humanities information processing.

To some extent the discussion of computer-based projects is best viewed from the vantage of the raw materials of scholarship, which include text, data, images, and sound. At least two additional resources must be considered: electronic communications, which permit the transmission of information; and combined sources (hypermedia/multimedia), which provide a platform for working with several different types of raw material simultaneously. The distinction between text and data, too, is somewhat artificial. Historians and archaeologists often work with kinds of information that are most effectively collected, manipulated, stored, and output in structured form. Some of the new tools for working with textual data have blurred the distinction between text and data, but many different types of information are still better handled as discrete units by statistical and database management software. To assess the impact of new technology on humanities scholarship, we concluded that it would be helpful if the developments were viewed in terms of the type of information involved.

If the use of computers for word processing, electronic mail, and simple searching of online databases or catalogs is excluded, the computer-based work done by humanists can be divided into five categories: the provision of general resources, such as library catalogs, dictionaries, and bibliographies; the retrospective conversion of manuscript or printed sources into machine-readable form; the creation of specific research tools, such as databases and image banks; the extraction of summary data from larger electronic resources, such as population censuses and tithe surveys; and the computer-assisted investigation of hypotheses and testing of models. The first three categories provide the basic resources for further research in the humanities and lay the foundation for the last two categories.

The use of computers to investigate hypotheses that must be tested with large and complex datasets has led to advances in archaeology, history, literary studies, and philosophy. Sir Anthony Kenny's study of the Aristotelian Ethics, which demonstrated that books ascribed to both the Nicomachean Ethics and the Eudemian Ethics actually belonged to the latter text and not the former, would have been difficult, and perhaps impossible, if carried out by manual investigation. Computers have allowed humanists to extend the agenda of inquiry. While computer applications assist in establishing the identities and relationships of primary sources, computerized texts and databases are also vital in rethinking the history of ideas and particular works. Keith Baker traced the evolution of opinion publique in the French Enlightenment using the American and French Research on the Treasury of the French Language (ARTFL) text collection: he developed a tentative chronology of the use of the term in eighteenth-century France and showed how the traditional associations of opinion with uncertainty, instability, and disorder gave way to the rational authority of opinion publique in the late eighteenth century.

The projects described in this report illustrate the overall impact of new technology on scholarship. The coverage is not comprehensive; the projects cited are offered as significant examples of what has already been accomplished and what is already underway or planned for the future. Inevitably there are gaps. The discussion of sound, for instance, touches on sound cognition by musicologists, but excludes research into the understanding of text by linguists and literary scholars. The computerization of materials in museums, libraries, and archives is also not treated exhaustively. Major institutions, such as the Library of Congress, the National Archives and Records Administration, and research universities, have computerization programs, but smaller institutions often lack the resources, equipment, and training necessary to undertake such programs. We discuss in the concluding section some ways to address problems facing these institutions. Surveys by Michelson and Rothenberg and by Gould, the European Science Foundation's survey edited by Genet and Zampolli, and the Humanities Information Review Panel report in the United Kingdom (Feeney and Ross) have charted some of this work.

Electronic Communication

Developments in communications technologies and facilities have had a significant impact on the way scholars use and exchange information. The Internet came to the forefront as a powerful communications system for the higher education community in the 1980s. Since then, its use has grown dramatically and has extended beyond electronic mail and day-to-day communication to the exchange of papers, the launch of collaborative ventures, and the provision of online access to resources. In this context it is important to emphasize that the use of electronic mail (e-mail) and online information services requires a personal computer or intelligent terminal on every scholar's desk. In humanities departments such provision is by no means assured, and it must improve if all scholars are to reach the growing range of Internet services.

The use of e-mail has led to the formation of discussion lists and bulletin boards. Discussion lists allow users to exchange ideas with other scholars and to distribute information to groups in their field. Users can "post" information to electronic bulletin boards and log in to read what interests them.

HUMANIST, operated jointly by the Center for Electronic Texts in the Humanities at Rutgers University and King's College London, is a forum for scholars working in humanities computing or interested in the applications of technology to the humanities. The list includes over 1,000 registered users, a rather small percentage of the scholarly community. INTERDIS <INTERDIS@MIAMIU.MUOHIO.EDU>, another general discussion group for the humanities, offers announcements, conference notes, and informational queries rather than discussion of issues.

Most scholars use bulletin boards and discussion groups specific to their field of study. Specialized discussion groups and bulletin boards tend to promote discussion rather than distribute announcements. H-Net is an extensive series of discussion lists in history, with a rich variety of subfields offered for specialists in those areas. PHILOS-L <PHILOS-L@UK.AC.LIVERPOOL> was created to distribute information about jobs, conferences, and the occasional query, but users report it often hosts lively scholarly debates. ARCH-L <ARCH-L@LISTSERV.TAMU.EDU>, run from Texas A & M University, is both a discussion list for archaeologists and a distribution service for data and software.

Other discussion groups are interdisciplinary, seeking to reinvigorate scholarship in traditional fields through new modes of interaction. PHIL-LIT <phil-lit-request@tamvm1.tamu.edu> is a forum for queries, information sharing, and previews of articles and reviews. The group promotes the exchange of ideas, owing no allegiance to a particular school or style of criticism, and is open to anyone with a serious interest in philosophical interpretations of literature, literary investigations of classic works of philosophy, philosophy of language, and literary theory. PSYCHE-D <PSYCHE-D@iris.rfmh.org> aims to encourage discussion and the exchange of information on research in cognitive science, neuroscience, philosophy, and other related disciplines, with the goal of understanding the nature, function, and underlying mechanisms of consciousness. Discussion lists and bulletin boards may also focus on the work of an individual author or theorist, thereby bringing together scholars in different disciplines. While DESCARTES-L <DESCARTES-L@bucknell.edu> draws its audience mainly from scholars in philosophy, other discussion lists centered on individual theorists, such as DELEUZE-GUATTARI, attract scholars from almost every field in the humanities. AUSTEN-L <AUSTEN-L@vm1.mcgill.ca> serves as a discussion group for scholars of English literature, history, and gender studies, but also provides valuable interaction between scholars and interested lay readers.

In addition to discussion lists and bulletin boards, Internet Relay Chat (IRC) groups and multiple-user spaces (MUDs or MOOs) afford simultaneous textual exchange. One of the earliest projects of this nature is PMC-MOO, which hosts a real-time virtual discussion of recent articles in the electronic journal Postmodern Culture and of general issues related to contemporary cultural theory. Recently, the VRoma Project garnered funding from NEH to enhance students' study of Roman culture. VRoma is a multi-user, networked environment built upon a spatial and cultural metaphor of ancient Rome; faculty and students can meet, interact, collaborate, hold classes, and access databases, texts, images, and teaching materials.

Discussion groups, bulletin boards, and IRC are not without limitations. Impermanence is one of the more intractable problems of the Internet, and services often migrate, wither, or disappear altogether. At times a small group or an individual will tend to monopolize a discussion, which undermines the dialogic aspect of networked communication. The anonymity associated with bulletin boards and discussion groups can give participants a license for mischief that face-to-face interaction might inhibit; on the other hand, anonymity can also lead to more honest and creative exchanges.

Networked information, however, consists of more than just discussion lists, bulletin boards, and Internet Relay Chat. The Internet provides immediate access to electronic archives, automated library catalogs, and numerous information services. The Library of Congress, government publications distributed by the Government Printing Office (GPO), congressional reports, full-text books and journals, shareware, census data, other numerical databases, and projects designed specifically for the scholarly community (such as the Dartmouth Dante Project <Telnet://baker.dartmouth.edu:1835/>)—all are available to online academic researchers. Items once available only in print, such as the Institute for Scientific Information (ISI) Arts & Humanities Search database, are now accessible over the Internet. As more and more institutions have established a presence on the Internet, the number of information sources has expanded enormously, giving rise to the concept of the "virtual library."

Indeed, as the Internet gained in speed and richness of available resources, librarians, scholars, and information technology specialists began to create new applications of considerable significance to humanities scholars. These included new publishing opportunities; the digitization of picture and slide collections and of sound archives; the scanning and delivery of high-quality facsimiles of manuscripts, archives, and rare books; and the production of multimedia courseware and interactive learning sessions.

These applications came into being sooner than most had expected with the appearance of the World Wide Web. The Web, as it is usually termed, allows for multimedia file transfer and access to millions of sites around the globe. A scholar can perform research on the Web using search engines such as Lycos, Infoseek, and Webcrawler. He or she can consult digital image collections from the Vatican or the Louvre, digital text archives at major universities, film reviews, online journals, bibliographies, information on scholarly societies, syllabi for thousands of academic courses, moving images, online catalogs, dissertations, and other works pertinent to the humanities.

Specialized Web search tools are rapidly developing. Argos, for example, is the first peer-reviewed, limited-area search engine (LASE) on the World Wide Web. It has been designed to cover ancient and medieval cultures. Quality is controlled by a system of hyperlinked Internet indices which are managed by an international group of scholars serving as associate editors of the project. Argos is managed by the University of Evansville.

TORGO, maintained by the Department of English at the University of California at Santa Barbara, is another search tool designed to facilitate and promote scholarly research on the Web. TORGO searches abstracts of online journals and papers which have been screened for quality and usefulness. Web-Cite, a commercial venture, also collects information from online journals and returns search results by electronic mail.

Guides to Internet resources in the humanities, usually created by scholars in a given field, are another effective means of locating Web resources. Voice of the Shuttle, maintained by Alan Liu at the University of California at Santa Barbara, is one of the most comprehensive and popular guides to humanities resources on the Internet in fields ranging from anthropology to women and gender studies.

Other Internet subject guides focus on a particular field. Literary Resources on the Net, a large collection of references to Internet sites in English and American literature, was one of the earliest specialized subject guides for the Internet and is maintained by Jack Lynch, a doctoral student at the University of Pennsylvania. REESWeb at the University of Pittsburgh is a comprehensive index of electronic resources on Central Asia and the former Soviet Union.

In truth, however, the Web is still a fairly raw technology. It points to resources in an elegant and even flashy way, but still suffers from considerable drawbacks which include missing, defective, or outdated links; difficulties in ascertaining the authority behind most Web sites; the misleading titles of many sites; the burgeoning incursion of commercial ventures onto the Web; the sheer amount of material available; and the lack of direct access to the texts and other resources in some databases. Because the Web is a highly accessible mass publishing environment, many academics have embraced this new medium. The concept of an arena for public discourse appeals to most scholars. However, on the level of day-to-day work, the openness of the Web has led to a saturation of communication space that at times seems to erase the possibility of coherent discourse.

The potential of network-based presentation for humanities scholarship is immense but, at present, open to question. The Web today is in fact a more passive, anonymous tool than originally envisioned by its creators, researchers at the European particle physics laboratory CERN, who saw the Web as a means for intensive collaboration. In order to transform the Web into a genuinely collaborative space as opposed to a "surfable," alternate world, a community of scholars must be formed to bring to bear upon the Web the evaluative standards and professional discernment that are employed in more traditional scholarly endeavors.

Text

Humanities instruction and scholarship generally endeavor to reinterpret and reevaluate our textual legacy in an evolving understanding of larger historical, social, and cultural contexts. Since work in the humanities has been largely text-based, electronic text projects are becoming increasingly important as academic resources. Electronic texts include scholarly editions, corpora of contemporary writings and transcriptions, reference works, and instructional hypertexts. The most prolific electronic text producers combine the expertise of scholars, librarians, and humanities computing consultants. Electronic texts from manuscripts and printed sources are now available in different formats and from a number of different venues. Some projects endeavor to provide comprehensive coverage while others focus on a particular corpus of texts. Some are combined with integrated software for analysis and can be accessed only by means of that software; others offer plain text files which can be processed by software chosen by individual scholars.

Generally speaking, though, the primary purpose of a text archive is to ensure that machine-readable texts remain available to the academic community. Because the texts come from so many different sources, they vary considerably in format, accuracy, and type of coding. Discussion is ongoing about establishing text-encoding standards so that these archives remain broadly usable.

Major text archives include the Electronic Text Library at the University of Virginia; the Center for Electronic Texts in the Humanities, jointly sponsored by Rutgers and Princeton Universities; the Humanities Text Initiative at the University of Michigan, Ann Arbor; and ALEX: A Catalog of Electronic Texts on the Internet, supported by North Carolina State University. In the United Kingdom, the Oxford Text Archive was established in response to the growing number of electronic texts prepared by individual scholars, major research projects, and publishers.

Advances in computer storage and retrieval have made the construction and use of text corpora much easier, and this has in turn widened their usefulness for research. A corpus differs from an archive in that it consists of a collection of texts gathered according to particular principles for a specific purpose. Several of the largest electronic text projects support inquiry across several traditional fields, facilitating interdisciplinary work. One of the best known, the Thesaurus Linguae Graecae (TLG), hosted by the University of California at Irvine, provides in machine-readable form the work of 3,157 authors who wrote in Greek from the time of Homer to CE 600, with historiographical, lexicographical, and scholastic texts from the period between CE 600 and 1453. It has been claimed that the 57 million words in the corpus represent 99 percent of the surviving Greek literature. The TLG accommodates different approaches to textual analysis, and this has encouraged the production of a number of text-handling packages designed specifically for work with this corpus. The advantages of electronic corpora, first demonstrated among classical scholars, are now perceived by scholars in other disciplines. The Dictionary of Old English (DOE) project <healey@doe.utoronto.ca>, compiled at the University of Toronto, has converted the whole corpus of Old English texts to machine-readable form; the corpus is now available on magnetic tape and for online searching via the Internet.
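The basic operation such text-handling packages provide, the keyword-in-context (KWIC) concordance, is simple to sketch. The following Python fragment is a minimal illustration only, not taken from any of these packages; the file name is invented.

    # Minimal keyword-in-context (KWIC) concordance, the basic operation
    # behind corpus search tools. File name and window size are illustrative.
    import re

    def kwic(text, keyword, width=30):
        """Yield each occurrence of `keyword` with `width` characters of context."""
        for match in re.finditer(re.escape(keyword), text, re.IGNORECASE):
            start, end = match.start(), match.end()
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            yield f"{left:>{width}} [{match.group()}] {right:<{width}}"

    corpus = open("corpus_sample.txt", encoding="utf-8").read()  # hypothetical file
    for line in kwic(corpus, "thalassa"):
        print(line)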

Another corpus of significant value is the American and French Research on the Treasury of the French Language (ARTFL), maintained at the University of Chicago and co-sponsored by the Centre National de la Recherche Scientifique (CNRS). The project, originally initiated by the French government for the creation of a new dictionary of the French language, developed a corpus totaling some 150 million words, representing a broad range of written French. In most cases standard scholarly editions were used in converting the text into machine-readable form, and the data contain page references to these authoritative editions. ARTFL makes accessible approximately 2,000 digitized texts from the medieval period to the twentieth century. They include literary works, political tracts, philosophical theses, and technical writings, and such genres as novels, verse, theater, journalism, essays, correspondence, and treatises.

The American Verse Project, part of the Humanities Text Initiative, assembles an electronic archive of volumes of American poetry spanning the eighteenth to the twentieth centuries. The full text of each volume is converted into digital form and coded in Standard Generalized Markup Language (SGML) using the Text Encoding Initiative (TEI) Guidelines. The archive, available over the Internet, may be searched in a variety of ways. A second goal of the project is to provide a service to scholars by advancing their ability to use Web documents in their work. Currently, the Internet does not have well-established mechanisms for authors seeking to integrate complete texts, or parts of texts, into their scholarship. This project will allow someone writing about Dickinson, for example, to embed links in his or her electronic text pointing the reader to various poems, stanzas, or lines that are part of the project without having to replicate the material within his or her own document.
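A toy example may convey what such encoding adds. The fragment below is a sketch only, not project data: a stanza marked up with TEI-style elements (an XML-compatible subset, so that Python's standard library can parse it) and a search that uses the markup to address individual lines rather than raw text.

    # A toy TEI-style stanza (illustrative markup, not project data) and a
    # search that exploits the structure to report stanza and line numbers.
    import xml.etree.ElementTree as ET

    POEM = """
    <lg type="stanza" n="1">
      <l n="1">Because I could not stop for Death,</l>
      <l n="2">He kindly stopped for me;</l>
    </lg>
    """

    stanza = ET.fromstring(POEM)
    for line in stanza.findall("l"):
        if "Death" in line.text:
            print(f'stanza {stanza.get("n")}, line {line.get("n")}: {line.text}')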

Smaller projects, focused on a particular subject area, are also flourishing. The Dartmouth Dante Project database <Telnet://baker.dartmouth.edu:1835/> combines computer technology with more than 600 years of commentary tradition on Dante Alighieri's major poem, the Divina Commedia. This gives scholars easier access to the full texts of important critical works, many of which are rare and difficult to obtain. Likewise, the Wesleyan Confucian Project produces electronic editions of Confucian and Confucian-inspired texts from the eleventh century CE to the present. Recent improvements in optical character recognition (OCR) technology and in the encoding of non-Latin character sets have made possible digitization projects for Chinese, Japanese, Arabic, Hebrew, and Russian works.

Linguistics is another area where electronic text corpora are being developed on a large scale. The results are proving invaluable not only in lexicography and the preparation of language reference works, but also in the development of speech recognition for computers, machine translation, computer-assisted language learning, and "intelligent" word processing software.

Because human language is so complex, computer programs for processing it must be fed enormous amounts of data—including speech, text, lexicons, and grammars—to be robust and effective. Shared resources permit replication of published results, support fair comparison of alternative algorithms or systems, and permit the research community to benefit from corrections and additions provided by individual users. The Linguistic Data Consortium (LDC) is an open consortium of universities, companies, and government research laboratories supported by grants from the Advanced Research Projects Agency (ARPA) and the National Science Foundation. It creates, collects, and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC's host institution. Its collections include texts, audio files of telephone speech, and audio and video files of broadcast data, and they support work on such computer-based linguistic technologies as speech recognition and understanding, optical and pen-based character recognition, text retrieval and understanding, and machine translation.

Many archive and corpus projects derive from academically funded work, but it is an indication of the growth of this area that commercial publishers are now investing in electronic texts, assisting with online publication, and creating and distributing CD-ROM, diskette, or print versions of academic electronic text projects. The Brown University Women Writers Project is creating a textbase of pre-Victorian women's writing in English. This is a sizable body of material which has been largely inaccessible to scholars and students. Despite its considerable historical and literary interest, its lack of availability has seriously distorted our view of the role of women in Western literary and cultural history. The project also explores the educational advantages of integrating students into a technology-intensive interdisciplinary research project. Graduate and undergraduate students learn the techniques of literary text encoding, scholarly editing, book production, and traditional and electronic publishing by working in close collaboration with humanities computing specialists and literary scholars. The textbase is used to support a variety of products in different formats, including publication of selected works in a 30-volume print series and eventual online publication with Oxford University Press.

Many other commercial publishers are exploring the possibilities of new media. Chadwyck-Healey Ltd. offers a number of machine-readable full-text databases on CD-ROM, including the Database of African-American Poetry, which contains over 2,500 poems written by African-American poets between 1760 and 1900; Goethes Werke, an electronic version of the Weimar Edition of Goethe's works, originally published between 1887 and 1919; and the Corpus des œuvres de philosophie en langue française, which collects major works of post-Renaissance French philosophy and is developed under the direction of Michel Serres of l'Académie Française. Recently Chadwyck-Healey has assembled its English and American literature databases for inclusion in an online service, Literature Online (LION). LION brings together nine of Chadwyck-Healey's full-text literary databases. Together these comprise more than 208,000 poems, 4,000 plays, 290 works of prose fiction, and 21 versions of the English Bible. This service combines texts, electronic journals, discussion groups, reference works, bibliographies, and catalogs in English and American literature; it also provides hypertext links to relevant resources. Also included will be electronic bookshops for new and antiquarian books and journal articles, and a printing service for instructors who require bound copies of texts they have found in the database.

Scholarly effort and commercial funding have also gone into the provision of reference works in electronic form. The recently completed Oxford English Dictionary (OED) Second Edition on CD-ROM, published by Oxford University Press, makes use of specially designed search and retrieval software. This electronic version of the 20-volume dictionary contains 60 million words and allows researchers to carry out sophisticated searches not feasible using only the print version. For example, the system supports both browsing and searching through quotations, definitions, and etymologies. Researchers interested in etymology can find all words derived from a particular language, such as that of the Blackfoot Indians. A search combining date, author, publication, or word will find specific quotations. The Stanford Encyclopedia of Philosophy, sponsored by the American Philosophical Association and the Philosophy Documentation Center at Bowling Green State University, is a dynamic database of information on philosophers to which scholars from all over the world contribute.

Machine-readable text offers further possibilities which have been realized in the concept of hypertext. Hypertext has emerged as a way of exploring and manipulating text that is non-linear and non-numerical and that invites readers to discover and create their own paths through material. It provides a means of linking texts in an associative way using "nodes" and "links," an electronic version of footnotes and cross-references. The links can be preserved to permit others to follow predefined paths or to define their own paths. This technique is particularly suited to reference works and collaborative writing.
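The underlying structure is easily sketched: nodes hold content, links are directed connections between nodes, and a saved sequence of nodes can be replayed as a predefined path. A minimal Python illustration, with all names invented:

    # A minimal hypertext: nodes hold content, links are directed edges,
    # and a stored path can be replayed like a predefined trail.
    nodes = {
        "tennyson": "Alfred, Lord Tennyson (1809-1892) ...",
        "in-memoriam": "In Memoriam A.H.H. (1850) ...",
        "faith-doubt": "Victorian crises of faith and doubt ...",
    }
    links = {
        "tennyson": ["in-memoriam"],
        "in-memoriam": ["faith-doubt", "tennyson"],
        "faith-doubt": ["in-memoriam"],
    }

    def follow(path):
        """Replay a predefined path, printing each node and its outgoing links."""
        for node_id in path:
            print(node_id, "->", ", ".join(links.get(node_id, [])))
            print("   ", nodes[node_id])

    follow(["tennyson", "in-memoriam", "faith-doubt"])  # a saved trail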

The Victorian Web, for example, is the Web version of Brown University's Context 61, which serves as a resource for courses in Victorian literature. The materials originally derive from Context 32, a hypertext developed with Intermedia software that provided contextual information for English 32, a survey of English literature from 1700 to the present. The Victorian Web includes hyperlinked information about individual authors, Biblical typology, Romantic and Victorian timelines, feminism and literary theory, public health, race and class issues, and anti-Catholic sentiment in Victorian England. This collection of materials on nineteenth-century British culture continues to grow as students and faculty at Brown University and other institutions contribute new essays, questions, and images.

Electronic text can also be used together with text analysis software—whether integrated or separate—to carry out complex textual analysis, providing the researcher with both microscopic and macroscopic views of the text, from small-scale features of an individual work to searches across an entire corpus. These tools can be used to create critical editions, carry out stylistic comparisons or lexical analysis, and attribute authorship. An example is the examination by Gerard Ledger of the works of Plato. By taking the simplest possible feature of language—the occurrence of particular letters of the alphabet—and subjecting it to a complex multivariate analysis, Ledger was able to draw new conclusions about the authenticity of dubious dialogues and the chronology of the entire corpus.
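The first stage of such an analysis is easy to reproduce. The sketch below computes letter-frequency profiles and projects them onto two principal components with NumPy; it stands in for Ledger's considerably more elaborate procedure, and the texts are placeholders.

    # Letter-frequency profiles for a set of texts, projected onto two
    # principal components. A sketch of the approach, not Ledger's method.
    import string
    import numpy as np

    def letter_profile(text):
        """Relative frequency of each letter a-z in `text`."""
        text = text.lower()
        counts = np.array([text.count(c) for c in string.ascii_lowercase], float)
        return counts / max(counts.sum(), 1.0)

    texts = {"dialogue_a": "...", "dialogue_b": "...", "dialogue_c": "..."}  # placeholders
    X = np.array([letter_profile(t) for t in texts.values()])
    X -= X.mean(axis=0)                      # center the profiles
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:2].T                    # project onto first two components
    for name, (x, y) in zip(texts, coords):
        print(f"{name}: ({x:+.3f}, {y:+.3f})")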

Donald Foster recently received national attention for his work establishing, with wide critical acceptance, William Shakespeare as the author of a little-known Renaissance elegy. The hermeneutic facet of his research cannot be minimized, but the stylistic analysis performed by the computer was integral to his conclusions. Foster performed a frequency-of-use match of words in the elegy against the full corpus of Shakespeare's writings and the works of other contemporary authors. From this analysis Foster could conclude that selected words and their contextual phrasing were unique to Shakespeare; enough of these appeared in this "lost" elegy to strongly suggest the Bard's authorship.
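In outline, such a comparison resembles the following sketch, in which words frequent in one author's corpus but absent from a reference corpus of contemporaries are treated as candidate markers. The file names are invented, and the actual analysis weighed far richer evidence than this.

    # Sketch of a frequency-of-use comparison: words frequent in one corpus
    # but absent from a reference corpus are treated as candidate "markers."
    from collections import Counter

    def tokens(text):
        return [w.strip(".,;:!?'\"").lower() for w in text.split() if w.strip()]

    shakespeare = Counter(tokens(open("shakespeare_corpus.txt").read()))        # hypothetical
    contemporaries = Counter(tokens(open("contemporaries_corpus.txt").read()))  # hypothetical
    elegy = set(tokens(open("funeral_elegy.txt").read()))                       # hypothetical

    markers = [w for w in elegy
               if shakespeare[w] >= 5 and contemporaries[w] == 0]
    print(f"{len(markers)} words common in Shakespeare but absent in contemporaries")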

Computer-based textual analysis can also assist scholars in the production of critical editions. The Canterbury Tales Project, in which Sheffield and Oxford Universities collaborate, draws upon the traditional methods of textual criticism and new software to reveal unseen patterns in the multiple manuscripts of Chaucer's Canterbury Tales. The main objective of the project is to produce a library of resources which can assist in the creation of critical editions of the Tales, language studies, and historical and cultural analysis. The manuscript tradition for The Canterbury Tales includes some 83 manuscripts, some of which are fragmentary. Although the Tales are well understood, study of the entire manuscript tradition was not feasible before the advent of computer systems capable of handling significant quantities of data.

Data

The use of computers for handling data is well established in historical and archaeological research, which require the scholar to extract data from source material, formalize them, and organize them for analysis. In these areas the individual dataset, collected and organized for specific projects, is predominant. While some projects do rest on the analysis of textual information, structured data is the basic source for many studies. History projects have spearheaded the creation and dissemination of electronic datasets, though these datasets are becoming increasingly important to scholars working in cultural studies, linguistics, and gender studies.

One example is the Dumbarton Oaks Hagiography Project <mcgrathm@doaks.org>, which assembles information on Byzantine culture and society drawn from Greek hagiographical texts. The database includes over 100,000 data entries. Additionally, text files of the vitae provide immediate access to the chapter of the text from which data have been extracted. The database is accompanied by brief introductions to each vita with summary biographical and chronological information.

Other computer datasets have emerged out of counterparts in print. Ethnologue, sponsored by the Summer Institute of Linguistics, is a catalog of the world's languages. The Summer Institute of Linguistics specializes in the world's lesser-known languages, developing programs in partnership with host governments, universities, churches, and local people to promote linguistic research, language development, literacy, translation, and other educational and research projects. The database includes information on number of speakers, location, dialects, linguistic affiliation, and other sociolinguistic and demographic matters, and corresponds to the twelfth edition of the printed Ethnologue. The data may be browsed by country or language family, and the site includes interactive maps, a language distribution chart, and search capabilities.

Another category of very large online datasets is the combined library catalog. One of the largest in the world is the Online Computer Library Center (OCLC), based in Dublin, Ohio, and containing over 32,000,000 bibliographic records. A rival project is the Research Libraries Information Network (RLIN), based in California, with over 77,000,000 records from over 250 sources describing books, journal articles, dissertations, and rare materials in over 365 languages. These undertakings are of fundamental importance to libraries around the world, with access by individual scholars increasingly a part of each organization's marketing strategy.

An important question for the providers of large datasets and archival material is how the database should be organized to optimize access for users. The automation of archive catalogs has led in some cases to a reconsideration of the way this material is designed and prepared. The Berkeley Finding Aid Project, funded by a grant from the Department of Research and Development and sponsored by the Commission on Preservation and Access, provides finding aids in a standard, platform-independent electronic form. Finding aids are inventories, registers, indices, or guides to collections held by archives, manuscript repositories, libraries, and museums. The Berkeley finding aids provide detailed descriptions of collections, their intellectual organization, and—at varying levels of analysis—of individual items in the collections. Access to the finding aid allows scholars to explore the content of a collection and determine whether it is likely to satisfy their research needs. The Berkeley project makes finding aids for a number of American archives available online. The project is developing sophisticated searching and sorting capabilities so that information may be retrieved according to individual need.

Some specialized datasets are commercially available on optical storage media. One example is the CD-ROM version of the Eighteenth-Century Short Title Catalogue published by the British Library. Those who study eighteenth-century history, literature, and culture can identify the current location of printed material by particular authors or with particular titles from among the 305,000 records. They can also investigate where works were published and which printers handled particular categories of books.

The Trans-Atlantic Slave Trade Database, of the W.E.B. Du Bois Institute for Afro-American Research at Harvard University, aims to disseminate computerized data on most of the slave voyages that sailed from Africa to the Americas from the sixteenth century to the late nineteenth century. The project has gathered records covering 75 percent of all the slave ships sailing under British, French, Spanish, and Dutch flags between 1662 and 1860. The data detail mortality, age, sex, crew membership, conditions on slave ships, duration of voyages, the nature of slave resistance, the business organization of slave traders, and the age and physical characteristics of vessels. When the project is completed, data will be published on CD-ROM by Cambridge University Press. The core set of more than 20,000 transatlantic slave voyages will constitute the largest data source for the long-distance movement of peoples before the twentieth century. Refined demographic data on the volume of the trade (and thus of pre-colonial African populations) and on the spatial distribution of African peoples in the Atlantic world will allow scholars to assess more accurately questions of African state formation, agricultural and ecological change, African cultural survivals, and the development of the Atlantic economies.

In historical research, scholars who use computers usually create their own datasets. A large number of machine-readable historical data files have been created over the years, not only by historians, but also by geographers, anthropologists, genealogists, family history groups, and sociologists. The Arts and Humanities Data Service (AHDS) was recently established in the United Kingdom by the Joint Information Systems Committee to facilitate the creation and use of electronic resources in the arts and humanities. To achieve this aim, the AHDS will collect, describe, catalog, preserve, and provide subject-specific user support for digital resources that are created as a product of scholarly research; facilitate collaboration between arts-based user communities and the commercial or non-profit sectors; and promote standards and guidelines for the creation, description, preservation, and scholarly use of electronic information.

Scholars engaged in historical research make use of an increasingly wide range of software packages in order to assemble, organize, analyze, and display their source material. Statistical packages, in use for many years, have been augmented by programs for relational database management, nominal record linkage, mapping, and hypermedia. Developments in software that allow data to be input in a way that preserves their complexity and irregularities facilitate source criticism, which is fundamental to historical study. The British Academy's Prosopography of the Byzantine Empire <UDLC052@uk.ac.kcl.cc.elm>, developed and housed at King's College London, was established to create a database of all documented persons in the Byzantine Empire from CE 641-1260. The information includes names of the individuals, their responsibilities, first and last date mentioned, sex, career titles, topographical details, sources of the information, and a short article about each person. The inquiries made possible by the database have surprised even its creators; it is possible, for example, to find all bishops whose brothers were also bishops, or the names of all individuals who appeared at the court of a particular emperor, along with information on religious sects, languages, and patrons.
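Queries of the kind just described are straightforward once the material is in relational form. The following schematic example, whose table layout is invented rather than taken from the project, uses Python's built-in sqlite3 module to find bishops whose brothers were also bishops.

    # Schematic relational query in the spirit of the prosopography:
    # find bishops whose brothers were also bishops. The schema is invented.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, office TEXT);
    CREATE TABLE kinship (person INTEGER, relative INTEGER, relation TEXT);
    INSERT INTO persons VALUES (1, 'Ioannes', 'bishop'), (2, 'Petros', 'bishop'),
                               (3, 'Georgios', 'monk');
    INSERT INTO kinship VALUES (1, 2, 'brother'), (1, 3, 'brother');
    """)
    rows = db.execute("""
        SELECT p.name, r.name
        FROM kinship k
        JOIN persons p ON p.id = k.person
        JOIN persons r ON r.id = k.relative
        WHERE k.relation = 'brother'
          AND p.office = 'bishop' AND r.office = 'bishop'
    """).fetchall()
    for a, b in rows:
        print(f"{a} and his brother {b} were both bishops")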

Visualization of information has long been a valuable explanatory tool among historians and archaeologists. Computer mapping systems in general and geographical information systems in particular have increased the options for generating presentations of this kind. The Alexandria Project represents a consortium of researchers, developers, and educators, spanning the academic, public, and private sectors, who are exploring a variety of issues related to a digital library for geographically-referenced information. All of the objects in the library will be associated with one or more regions on the surface of the earth. The Alexandria Digital Library is an online information system that provides access over the World Wide Web to a subset of the holdings, as well as other geographic datasets. It is sponsored by the Map and Imagery Laboratory in the Davidson Library at the University of California, Santa Barbara.

More recently, visualization techniques have also been applied to textual information. The Language Visualization and Multilayer Text Analysis Project, sponsored by the Cornell Theory Center, created a prototype tool for the study of language and discourse phenomena in three-dimensional space. The tool allows a researcher to conduct interactive research in the structures and typologies of discursive formation in large samples of textual data. Using this resource, scholars can develop new techniques for reading and interpreting text space.

Images

One promising facet of digital technology is that it makes visual, textual, and numeric information both more accessible and easier to handle. Archaeologists, art historians, geographers, and historians are making increasing use of digital image processing, image enhancement, and graphics. Current projects incorporate three-dimensional modeling, enhanced data identification, high-resolution scanning techniques, and online exhibitions of visual primary sources.

The rapidly-developing field of computer graphics is beginning to play a key role in archaeological data processing. Such systems bring to archaeology new mechanisms for analyzing data. Using graphical representations, archaeologists can explore a range of different configurations and interpretations of evidence, and thus take a "second look."

Computer modeling systems are also being used to build reconstructions of historical spaces. The Rossetti Room is a Virtual Reality Modeling Language (VRML) model of the studio of the pre-Raphaelite painter and poet Dante Gabriel Rossetti. Users can select one of a series of paintings to be placed in the room, which is then created from existing files by means of a simple program. The virtual room recreates the work environment of Rossetti. Each picture provides a link back to the two-dimensional HTML page in the archive. This project is a part of Jerome McGann's The Complete Writings and Pictures of Dante Gabriel Rossetti: A Hypermedia Research Archive at the University of Virginia's Institute for Advanced Technology in the Humanities. The archive is a structured database holding digitized images of Rossetti's works in their original documentary forms. Rossetti's poetical manuscripts, early printed texts—including proofs and first editions—drawings, and paintings are stored in the archive in full color. The materials are marked up for electronic search and analysis and supplied with full scholarly annotations and notes.
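The "simple program" behind such a model need not be elaborate. A Python sketch, with an invented template and file names, that splices a chosen painting into a minimal VRML scene might read:

    # Generate a minimal VRML world with a user-chosen painting as a texture.
    # Template and file names are invented for illustration.
    VRML_TEMPLATE = """#VRML V2.0 utf8
    Shape {{
      appearance Appearance {{
        texture ImageTexture {{ url "{painting}" }}
      }}
      geometry Box {{ size 2 1.5 0.05 }}   # a framed canvas on the wall
    }}
    """

    def build_room(painting_file, out_file="rossetti_room.wrl"):
        with open(out_file, "w") as out:
            out.write(VRML_TEMPLATE.format(painting=painting_file))

    build_room("proserpine.jpg")  # hypothetical image file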

In addition to three-dimensional analysis and solid-modeling, images captured by digital cameras or by digitizing conventional photographs offer scholars new tools. Museums and art galleries have recognized digital technology as a way of providing enhanced access to their collections and have established a number of computerized systems as permanent or temporary displays. These applications and other educational packages incorporating digitized images have been designed with interactivity as a prime concern.

The Museum Educational Site Licensing Project (MESL) is a collaboration of seven collecting institutions and seven universities that define the terms for the educational use of digitized museum images. MESL participants include the Fowler Museum of Cultural History; the George Eastman House; Harvard University Art Museums; the Library of Congress; the Museum of Fine Arts, Houston; the National Gallery of Art; the National Museum of American Art; American University; Columbia University; Cornell University; the University of Illinois at Urbana-Champaign; the University of Maryland at College Park; the University of Michigan; and the University of Virginia. This group develops a model educational site license, evaluates procedures for the collection and distribution of museums' digital images and information, and assesses the impact of this distribution in both technical and economic terms. At the end of this experiment, the participants will propose a broadly-based system that could support ongoing distribution and educational use of museum images and text.

Most major museums host online exhibitions—for example, the Smithsonian and the Whitney—but now smaller university museums are able to share images of their collections over the network. The Oriental Institute Virtual Museum showcases the history, art, and archaeology of the ancient Near East. An integral part of the University of Chicago's Oriental Institute, which supports research and archaeological excavation in the Near East, the Museum exhibits major collections of antiquities from Egypt, Mesopotamia, Iran, Syria, Palestine, and Anatolia. This museum uses a series of panoramic movies to guide visitors through a virtual tour of the galleries.

The techniques of image storage and analysis developed in scientific disciplines, including crystallography, astronomy, and medicine, clearly have potential for the humanities. The Image Understanding Environment (IUE), developed with the support of NASA, provides a rich set of tools for working with imaging data that opens new possibilities to humanists. Standards are currently under development for recording high-resolution digital images of works of art and encoding information about these images, whether through textual description or metadata in the image file itself. The Digital Image Access Project, sponsored by OCLC and the Coalition for Networked Information, is developing standards for image archiving, compression, representation, and description. The American Memory Project of the Library of Congress was one of the first sites to implement developing standards in digital image creation and transmission, offering background papers and technical information that describe the digital imaging process.

Images accompanied by textual data are often the only way to provide flexible and dynamic access to collections. SPIRO, the visual online public access catalog for the University of California at Berkeley's Architecture Slide Library, covers a collection of approximately 200,000 35mm slides. SPIRO can be accessed using either ImageQuery, a powerful database retrieval package, or the World Wide Web. ImageQuery 2.0 was developed by Berkeley's Information Systems and Technology and the Museum Informatics Project; it permits research by 10 different fields: period, place, creator name, object name, view type, subject terms (from the Art and Architecture Thesaurus), source of image, creation dates, classification number, and image identification number. Digital surrogates of the slides help users identify the exact image for which they are searching. SPIRO currently contains 16,000 records linked to images. The Web version of SPIRO supports research by five fields: period, place, creator name, object name, and subject terms.

Similarly, a.k.a., developed by the Getty Information Institute, is an experimental searching tool that uses the Institute's vocabularies to provide enhanced access to databases of cultural information. The service allows users to search through thousands of records from several Getty databases, including the Avery Index to Architectural Periodicals, the International Repertory of the Literature of Art (RILA), and the Provenance Index Databases. a.k.a. uses the Art & Architecture Thesaurus (AAT) and the Union List of Artist Names (ULAN) vocabularies to enhance searches. The Clearinghouse of Image Databases at the University of Arizona, which includes the IMAGELIB listserv archives, is currently the most comprehensive listing of image projects in libraries and archives.

The digitized image, combined with computer technology, offers the art historian the same kind of opportunity for retrieval and manipulation as that enjoyed by the classicist working with the Thesaurus Linguae Graecae. Most access to images is still via textual description of the image, but this is not necessarily the most useful means of access. Even on modest computers, programs manipulating digitized descriptive codes derived from existing art reproductions can make accurate distinctions between images that appear similar to the human eye. But this is only one possible application of such a tool; a more useful function will be in the searching of archives of visual images to find similar and related compositions. This type of system offers possible applications in the areas of identifying, referencing, classifying, and analyzing images. As hardware with large storage devices, faster processors, and enhanced resolution and graphics capabilities becomes more widely available, art historians will have increased opportunities to perform research of this kind.
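One simple realization of such content-based searching reduces each image to a numeric descriptor and ranks the archive by similarity to a query image. In the sketch below a coarse grayscale histogram stands in for the "descriptive codes" mentioned above; the Pillow imaging library is assumed, and the file names are invented.

    # Content-based image search in miniature: compare coarse brightness
    # histograms by cosine similarity. Pillow (PIL) is an assumed dependency.
    import numpy as np
    from PIL import Image

    def descriptor(path, bins=32):
        """A crude 'descriptive code': a normalized grayscale histogram."""
        img = np.asarray(Image.open(path).convert("L"))
        hist, _ = np.histogram(img, bins=bins, range=(0, 255))
        return hist / hist.sum()

    def similarity(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = descriptor("query_composition.jpg")             # hypothetical files
    archive = ["panel_01.jpg", "panel_02.jpg", "panel_03.jpg"]
    ranked = sorted(archive, key=lambda p: similarity(query, descriptor(p)),
                    reverse=True)
    print("closest compositions:", ranked)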

Sound

Digitized sound, not unlike images and data, can be created, manipulated, and analyzed by computer. Computers can generate or modify musical signals and serve as a control device for electronic musical instruments. Electronically-generated music, or electroacoustic music, revolves around the Musical Instrument Digital Interface (MIDI). Originally MIDI allowed electronic keyboards and associated sound processors to interact in the studio or on the concert platform. Since then it has emerged as a protocol for information exchange through the development of software which allows performers to record and edit performances and composers to build up complex pieces of music. Since MIDI files mainly contain sequencing instructions for peripheral devices rather than sound itself, sophisticated sequencing software can run on low-cost microcomputers. Composers thus have access to powerful tools to create, edit, mix, process, and filter sound and can record their compositions to disk.
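A few lines of code can illustrate how little a MIDI file actually stores. The following sketch assumes the third-party mido library (any MIDI toolkit would serve) and writes a four-note sequence, a few hundred bytes of event data that any MIDI-capable synthesizer can render.

    # Write a four-note MIDI phrase: events, not audio, are stored.
    # Uses the third-party mido library, an assumed choice of toolkit.
    import mido

    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)

    for note in (60, 62, 64, 67):                      # C, D, E, G
        track.append(mido.Message("note_on", note=note, velocity=64, time=0))
        track.append(mido.Message("note_off", note=note, velocity=64, time=480))

    mid.save("phrase.mid")  # the entire sequence in a few hundred bytes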

Performance of electroacoustic music rarely involves computation taking place on stage, although projects based at the Massachusetts Institute of Technology (MIT) and the Institut de Recherche et Coordination Acoustique/Musique (IRCAM) in France are exploring this avenue with the aid of artificial intelligence (AI) techniques. The goal of the MIT project is to produce what may be called an automatic accompanying machine: at present, a human performer can respond in live performance to the computer's output, but the computer cannot reciprocate. For a computer to embody the characteristics of a performing musician, it must be able to analyze sound into coherent and discrete pitches of determinate duration and spectrum (timbre), and then use that analysis of the acoustic signal as a basis for interpreting a musical signal. It must generate a plausible model of what the live performer is doing, and then use this information to structure its own performance. To build such a machine requires an understanding of the psychoacoustical and psychological processes involved in listening to and performing music. The MIT team has had to incorporate some "engineering," rather than cognitive, solutions in its system: this indicates the limitations of present understanding of the cognitive processes underlying musical perception and performance.

Advances in compression schemes for audio files have opened possibilities for the networked delivery of music performance. The Center for Research in Electronic Art Technology (CREATE) at the University of California at Santa Barbara supports online production of electroacoustic and computer music and electronic media art technology. The CREATE facility consists of multi-track digital recording and monitoring studios, a collection of multimedia workstations, and a network of stereo and quadraphonic digital synthesis studios. The main focus of its research is wide-area network-based media.

The ability to deliver sound files across a network enhances online scholarship in music theory, as well. The Journal of Seventeenth-Century Music provides a refereed forum for scholarly studies of the musical cultures of the seventeenth century. The areas of concern include historical and archival studies, performance practice, music theory, aesthetics, dance, and theater. The Journal emphasizes audio examples, which are presented as monaural audio files and can be played on most systems with audio hardware and software. Likewise, Ethnomusicology Online features peer-reviewed articles; reviews of audio, video, and multimedia titles; and enhanced Ph.D. dissertations, all accompanied by illustrative audio files and multimedia.

Combined Sources/Multimedia/WWW

Hypermedia systems, also known as multimedia, use hypertext techniques to link other media—images, graphics, animation, sound, and video—to text. Information linked in this way becomes a flexible reference tool, effectively functioning as an automated encyclopedia, but with the advantage of giving visual proximity to conceptual connections. Hypermedia systems are interactive, allowing users to find their own path through the material; they have obvious potential for both research and learning.

One of the best established hypermedia products is Perseus, a multimedia database designed to aid the study of archaic and classical Greece. Perseus expands the ways in which ancient Greek literature, history, art, and archaeology can be investigated. In addition to complete literary texts, Perseus contains a lexicon, morphological databases, an extensive archaeological catalog with accompanying illustrations, several atlases, site plans, an illustrated catalog of Greek vases, a classical encyclopedia, and a historical overview of the classical period. Like many projects (including the American Verse Project, the Early American Fiction Archive, and the Dante Gabriel Rossetti Archive), Perseus is available in two versions: a large collection on CD-ROM and a smaller subset of resources available without charge on the Web.

Another important multimedia project is American Memory: Historical Collections for the National Digital Library, undertaken by the Library of Congress. American Memory consists of primary source and archival materials relating to American culture and history, combining film (early moving images of New York City); photographs (from the Office of War Information and the Civil War); sound recordings (World War II newsreels); and manuscripts (from the WPA Federal Writers' Project of the 1930s). The project encompasses more than 210,000 images from over 20 of the Library's collections. Similarly, the Valley of the Shadow: Two Communities in the American Civil War, a project sponsored by the Institute for Advanced Technology in the Humanities and the National Endowment for the Humanities, draws on resources including government documents, newspapers, photographs, personal papers, maps, and rosters. The collection examines the communities, daily life, politics, and religious and racial conflicts surrounding the Civil War. It also traces the course of events from impending crisis to secession, and is projected to continue through the war and into the postbellum era.

Hypermedia projects have the potential to become powerful teaching tools. The Labyrinth, created at Georgetown University, provides organized access to electronic resources in medieval studies, including an electronic library of texts and images, online forums, professional directories and news, online bibliographies, an online "university" of teachers and scholars available for electronic conferencing, and an archive of pedagogical tools. This project will not only provide an organizational structure for electronic resources in medieval studies, but will also serve as a model for similar, collaborative projects in other fields of study.

The Electronic Beowulf Project takes advantage of advanced imaging technology to assemble a database of digital images of the Beowulf manuscript and related texts. The archive already includes fiber-optic readings of hidden letters and ultraviolet readings of erased text in the early eleventh-century manuscript; full electronic facsimiles of the indispensable eighteenth-century transcripts of the manuscript; and selections from important nineteenth-century collations, editions, and translations. Future additions will include images of contemporary manuscript illuminations and material culture, and links with the Toronto Dictionary of Old English project and with the comprehensive Anglo-Saxon bibliographies of the Old English Newsletter.

Retroconversion Projects

Libraries, in cooperation with scholars, are engaging in a number of large-scale retroconversion projects to enhance access to rare texts or other materials that are brittle, damaged, or not easily accessible. Preserving unique and rare materials has been a principal function of research libraries, but a costly part of their mission. Security concerns also require that physical access to rare books be restricted. The Internet offers the possibility of greatly expanded access to rare and delicate materials. And while computer images of rare book pages can serve only as pointers to the original artifacts, the combination of searchable text and high-resolution color images provides a detailed and flexible view of the material to teacher and scholar alike.

The Victorian Women Writers Project at Indiana University produces highly accurate transcriptions of rare, often unpublished, literary works by British women writers of the late nineteenth century, encoded using SGML. The works include anthologies, novels, political pamphlets, and volumes of poetry and verse drama along with bibliographical descriptions. While the Victorian Women Writers Project focuses on converting the text, the Making of America Project (MOA) creates high-quality digital page images of important materials on the history of the United States. The Cornell University and University of Michigan libraries are cooperating in the initial phase of MOA by selecting complementary journals and monographs to ensure full capture of all significant information. The University of Virginia Library's Early American Fiction Archive will provide both text files and page images of 582 volumes of early American fiction. The project also presents the opportunity to study scholarly use of rare books and their computer simulacra, and to determine the extent to which electronic texts of rare books can serve scholars.

While many retroconversion projects focus on books, others incorporate collections of historical data. The University of Michigan Papyrus Collection spans a broad range of materials: textbooks, lectures, private notes and accounts, letters, invitations, decrees of kings and emperors, official petitions, contracts and agreements of every kind, purchase orders, checks, medical recipes and prescriptions, receipts, tax lists and declarations, court proceedings, and other legal texts. The documents exist on papyrus fragments currently preserved in glass casings; the most important are being digitized at high resolution for archival purposes and at lower resolution for the Web.

JSTOR, originally a project of the Andrew W. Mellon Foundation, is an independent organization that is building a comprehensive archive of scholarly journals. The project has important potential cost savings for libraries, and will be closely studied for its impact on the scholarly research process in the humanities and other disciplines.

Microfilm, long used as a means of preserving brittle books, is also under consideration for digitization. Yale University Library's Project Open Book, supported by the Commission on Preservation and Access and the National Endowment for the Humanities, is a research and development program that is exploring the feasibility and costs of large-scale conversion of preserved material from microfilm to digital imagery. Project Open Book aims to create a 10,000-volume digital image library and to enhance access to the converted volumes through the creation of document structure and page number indices. Furthermore, the project will provide greatly enhanced access to these materials over the Internet.

Archival holdings are the focus of some retroconversion projects. The Center for Electronic Texts in the Humanities sponsors several pilot projects, among them an experiment in encoding archival materials in order to link the finding aid with the full text of documents. The William Elliot Griffis Collection, part of Rutgers University Special Collections, is a major grouping of nineteenth- and early twentieth-century print, manuscript, photographic, and ephemeral materials relating to the early history of Japan-U.S. relations. This pilot project demonstrates the feasibility of a networked electronic access tool for collections of rare documents, according to the guidelines of the TEI (Text Encoding Initiative) and of the EAD (Encoded Archival Description). The finding aid is used as a "frame" within which some of the rare manuscript materials held in the collection may be accessed. Listings in the finding aid carry hypertext links to electronic editions of the manuscripts, which in turn provide transcriptions and images of the manuscript pages.

Archives of photographs and slides form the basis of one of the largest retroconversion projects. The Research Libraries Group's Digital Image Access Project (DIAP) involved nine academic institutions in partnership with Stokes Imaging of Austin, Texas. This consortium experimented with an online image management system to find economical ways to catalog and index large photographic collections. Approximately 9,000 images related to the theme "The Urban Landscape" were digitized and included in the project database along with related index records. Collections include many images from the Avery Library at Columbia University, images of the Bronx, construction photographs of the Empire State Building, and architectural drawings from the Aviador Collection.

Original and Creative Works

A growing number of writers, artists, and scholars are turning to the Internet and the Web for creative expression and using hypertext and hypermedia as methods of publishing original works. Michael Joyce's hypertext novel Afternoon was among the first to engage the reader specifically through hypertext, with the software integral to the experience and interpretation of the novel. A number of writers, sometimes referred to as the "Eastgate School" after the name of the prominent hypertext publisher, are creating hypertext fictions. Other recent works in this genre include Stuart Moulthrop's Victory Garden and Shelley Jackson's Patchwork Girl.

The Electronic Poetry Center of the State University of New York at Buffalo serves as a central gateway to electronic resources in poetry and poetics. A collaborative effort of the University Libraries, the Faculty of Arts and Letters, and the Poetics Program, the Center represents one of the first electronic publishing projects joining the academic and creative communities. The Center makes a wide range of contemporary experimental and innovative poetry available online. Resources include collections of texts, a directory of contemporary poets, small press announcements, poetry events, spoken word archives, electronic journals, and gallery areas. The Center also sponsors online collaborative writing spaces. Selected online texts are cataloged and made available nationally and internationally through major bibliographic databases.

Today many poetry journals, including RIF/T and the Mississippi Review, are published online using hypertext links, audio archives, and accompanying images. In the classroom, students contribute to hypertext novels, diaries, and journals that form new kinds of participatory original works (Slatin). This virtual colloquium engages students in ways that differ substantively from traditional concepts of classroom, course assignments, and the iterative accumulation of knowledge.

Educators recognized the value of hypertext and multimedia for pedagogic purposes during the 1980s. This gave rise to such early projects as the focus on Shakespeare at Stanford, the emblem book at the Memorial University of Newfoundland, and the project for Robert Browning at Cornell. Perseus, mentioned earlier, also began at this time and has evolved with the technology in compelling ways. Hypertext and multimedia allow for deeply nested fields of associations; a line in Browning's "Caliban upon Setebos" can be connected to other works of Browning and Shakespeare, to various contemporary treatises on art, and to images of related plastic arts. A recording of the lines being read, as well as moving images, can also be called up to help contextualize the poem. Critics of this approach feel that while the variety and breadth of allusions can create an effective teaching tool, the secondary material is necessarily limited to a particular perspective while possibly giving the student the illusion of thoroughness. The Brown Storyspace Cluster, a collection of several hundred hypertext and hypermedia Webs, gathers informational materials, fiction, and poetry created by Brown faculty members, including Robert Arellano, Robert Coover, George Landow, Massimo Riva, and students in their courses from 1992 to the present.

As poets, novelists, and scholars continue to work in hypertext and networked publications, future critics and readers will need to become increasingly familiar with this technology. Primary networked sources will thus pressure the academy to undertake new kinds of training in order to access the new sources of creative expression.

Electronic Publication

Several years ago Stevan Harnad, then at Princeton, founded a ground-breaking journal called Psycoloquy, designed to be a peer-reviewed journal of the highest caliber but available only online. The reasons for this were widely broadcast by Harnad: journals in his field were terribly expensive; the field in question, cognitive neuroscience, was of interest to very few researchers worldwide; review of submitted articles could take months, often dating an article by the time of publication; printed journals were difficult to come by in schools without large budgets; and the nature of printed journals required that they accumulate in discrete volumes, rendering difficult a more dialogic interaction of scholars with the ideas presented in their pages.

Psycoloquy has been a reasonably successful venture. The turnaround time for editorial review of submitted articles is six weeks; the scholars vetting submissions are distinguished in their fields; the subscription is free to anyone with an Internet connection; and the articles are indexed. Scholars can respond to individual articles, and their commentaries are attached to the original article to stimulate further discussion. Scholarly electronic publishing continues to show strong growth, with faculty collaborating to disseminate new research online (Bailey).

Many electronic journals are published by individual scholars or by scholarly societies rather than by academic publishers. Some journals, like Critical Inquiry, an interdisciplinary journal for the arts and humanities, publish only tables of contents and excerpts of current and back issues. Others focus on one aspect of the traditional scholarly journal: History Reviews On-Line is a quarterly electronic journal devoted to reviewing books in all fields of history. The International Philosophical Preprint Exchange is a service provided by the Department of Philosophy, Chiba University, and an international working group comprising scholars at the University of Toronto, York University, the University of Missouri, and the University of Alberta. The Exchange provides preprints of articles and voluntary peer review, as well as information on eventual publication or presentation at conferences.

More and more journals are offering the complete text of articles online. The Slavic Review publishes an electronic post-print edition of the print journal, with articles appearing on its Web site several months after print publication. Bryn Mawr Classical Review, founded by James O'Donnell at the University of Pennsylvania and designed for publication on the Internet, publishes the complete text of articles. Early Modern Literary Studies, a refereed journal for English literature, literary culture, and language during the sixteenth and seventeenth centuries, features complete articles as well as an online readers' forum and links to related Internet resources in the field. The Journal of Buddhist Ethics is also designed specifically for online publication; in addition to publishing peer-reviewed scholarship, it hosts online conferences and offers the full text of the Pali Canon online to which scholars may link in their articles.

As of this writing, the number of electronic journals exceeds 1,700 and is increasing rapidly each year. Many, like Psycoloquy and History Reviews On-Line, are new ventures created in reaction to slow and costly printed serial publication. Publishers of traditional journals have been slow to warm to the idea of electronic publishing because of the uncertainty of revenue and because the medium is new, with a predisposition for transience: machine and software obsolescence, uncertainty of funding, and the uncertain status of electronic publishing as legitimate work for tenure consideration all contribute to their doubts.

Nonetheless, academic presses are beginning to offer full-text journals online for a subscription fee, and it is likely that the future of serial publication will be predominantly a networked enterprise. Project Muse provides worldwide, networked, subscription access to the full text of over 40 Johns Hopkins University Press scholarly journals in the humanities, social sciences, and mathematics. JSTOR proposes to build a reliable and comprehensive archive of important scholarly journal literature, improve access to these journals, fill gaps in library collections of journal back issues, address preservation issues such as mutilated pages and long-term deterioration of paper copy, and assist scholarly associations and publishers in making the transition to electronic modes of publication.

Tools

A remarkable amount of commercial software is currently available. Tools for handling text, data, and images can often be tailored to support a range of humanities research projects. BRS/Search and similar programs offer powerful free-text search capabilities; they are proving useful in many areas, including literary and linguistic scholarship. IdeaList, a free-text database manager, is popular among historians; Access and other database packages are also widely used. Additional programs, including Collate and Concorder, have emerged from the innovations of individual scholars. An ongoing critical appraisal of software would be of significant value to humanist scholars; the number of available packages and diversity of application areas make this task well beyond the scope of this publication.

Many historical applications demand the power of relational databases to handle the complex interrelation of the data. A recognized standard language for such databases, the Structured Query Language (SQL), is used in such management systems as INGRES and Oracle. Numerical analysis was one of the first applications of computers in historical and archaeological research; the Statistical Package for the Social Sciences (SPSS) continues to be widely used and is well supported by computing services, with microcomputer versions now also available. A wide variety of modeling and graphical packages are also in use, some of which, such as ARC/Info, integrate advanced mapping with relational database facilities. Commercial software, even when it can be customized, does not meet all the needs of scholars, and in some cases it is important to write special software for humanities applications.
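The kind of structured query that such relational systems support can be sketched in a few lines. The following example uses Python's built-in sqlite3 module; the table, column names, and census figures are hypothetical, invented for illustration, and a real historical database would involve far richer schemas.

    import sqlite3

    # Hypothetical schema for a small historical dataset: households
    # drawn from a nineteenth-century census.
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE household
                   (id INTEGER PRIMARY KEY, county TEXT,
                    year INTEGER, members INTEGER)""")
    con.executemany(
        "INSERT INTO household (county, year, members) VALUES (?, ?, ?)",
        [("Augusta", 1860, 6), ("Augusta", 1870, 4), ("Franklin", 1860, 5)])

    # SQL states the historian's question declaratively:
    # average household size per county in 1860.
    for row in con.execute("""SELECT county, AVG(members) FROM household
                              WHERE year = 1860 GROUP BY county"""):
        print(row)

The declarative style is the point: the scholar specifies what is wanted, and the database system determines how to retrieve it.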

The exchange of textual information poses problems, some of which can be alleviated by the use of standard conventions. Many electronic texts are now encoded using Standard Generalized Markup Language (SGML), the International Organization for Standardization (ISO) convention for the description of electronic documents. Before this standard was accepted, electronic texts were often encoded using idiosyncratic systems, which meant that they could not be exchanged among scholars. The pace of conversion of text to electronic form has accelerated rapidly, and the Text Encoding Initiative was established in order to address the problem of a proliferation of differently encoded texts (McCarty). The objective of this international project is to develop guidelines for the preparation and interchange of machine-readable texts for scholarly research. The first guidelines, produced in 1990, recommended that scholars adopt SGML for the encoding of electronic text. SGML provides a syntactic framework within which markup or encoding tags can be defined, using only characters that can be transmitted over networks and read by most computers. The application of SGML is not limited to textual resources and can be extended to other disciplines; for instance, it is already in use for the encoding of music. If these standards gain general acceptance, the encoding of new data and the interchange of existing data will be revolutionized. Encoded Archival Description (EAD) is being developed to work in tandem with SGML. Unicode, a platform-independent scheme for languages using non-Latin characters (including Greek, Russian, Arabic, Hebrew, Japanese, Korean, and Chinese), is also under development as an international standard to enhance access to computerized information.
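A brief sketch may make the idea of descriptive markup concrete. The Python example below embeds a hypothetical TEI-style fragment and extracts its tagged verse lines; strict SGML permits shorthand that XML parsers do not accept, so the sketch assumes an XML-conformant encoding, and the fragment itself is invented for illustration.

    import xml.etree.ElementTree as ET

    # A hypothetical TEI-style fragment: the tags describe the logical
    # structure of the document (a poem, its title, numbered lines)
    # rather than its appearance on a page.
    sample = """<text><body>
      <div type="poem">
        <head>Caliban upon Setebos</head>
        <l n="1">'Will sprawl, now that the heat of day is best,</l>
        <l n="2">Flat on his belly in the pit's much mire,</l>
      </div>
    </body></text>"""

    root = ET.fromstring(sample)
    for line in root.iter("l"):
        print(line.get("n"), line.text)

Because the markup records structure rather than typography, the same encoded text can drive a printed edition, an on-screen display, or a search index.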

SGML has sparked the creation of a new generation of text-handling software. Dynatext, which makes use of SGML to provide sophisticated textual analysis and manipulation, is typical of these recent products: it accepts SGML-encoded text directly and creates electronic texts, automatically building a full-text index and establishing hypertext links for tables, figures, footnotes, and cross-references. A browsing facility allows users to read, query, and annotate the text; full-text searching is also supported. Style sheets use SGML tagging to define the format for on-screen display or printed output. This system is not without its limitations; chief among them, for humanists, is cost. Many of the most powerful software tools are priced far beyond the means of most scholars, even when generous educational discounts are provided. Panorama, a Web browser for SGML-encoded texts, provides much the same functionality as Dynatext for documents published over the Internet.
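The full-text indexing that such systems build automatically rests on a simple idea: an inverted index mapping each word to the text units in which it occurs. The sketch below illustrates the general technique only, not Dynatext's actual implementation, and the sample passages are invented.

    from collections import defaultdict

    def build_index(units):
        # Map each (lowercased, lightly normalized) word to the set
        # of unit identifiers in which it appears.
        index = defaultdict(set)
        for unit_id, text in units.items():
            for word in text.lower().split():
                index[word.strip(".,;:!?")].add(unit_id)
        return index

    units = {"p1": "The manuscript was damaged by fire.",
             "p2": "Digital images preserve the damaged manuscript."}
    index = build_index(units)
    print(sorted(index["manuscript"]))   # ['p1', 'p2']

Production indexers add stemming, stop lists, and positional information, but the underlying structure is the same.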

Researchers use text-handling software to carry out complex analyses that would be tedious and time-consuming if attempted manually. Many such packages are now available to scholars: one of the best known is the Oxford Concordance Program (OCP) <OCP@VAX.OX.AC.UK>, originally developed for mainframe computers but now available in a PC version. OCP is a general-purpose tool suitable for such applications as stylistic analysis, preparation of language courses, vocabulary acquisition, dictionary making, text editing, and content analysis. It is widely used in literary research and in other disciplines (history, for example) where textual analysis is required.
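The heart of such a program is the keyword-in-context (KWIC) display, which a few lines of Python can illustrate. This is a sketch of the general technique, not a reconstruction of OCP, and the sample sentence is invented.

    import re

    def kwic(text, keyword, width=30):
        # Print each occurrence of the keyword with a fixed number of
        # characters of context on either side.
        for match in re.finditer(r"\b%s\b" % re.escape(keyword),
                                 text, re.IGNORECASE):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(text))
            print("..." + text[start:end].replace("\n", " ") + "...")

    sample = ("Scholars use computers to study texts, and the texts "
              "scholars study repay the effort.")
    kwic(sample, "scholars")

Real concordance packages add lemmatization, sorting by left or right context, and frequency tables, but all build on this basic operation.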

Investigation of manuscript traditions and the creation of critical editions from multiple sources require specialized software. Collate, a program developed by the Computers and Manuscripts project at the Oxford University Computing Services, helps scholars collate up to a hundred texts simultaneously; it provides facilities for adjustment of the collation by the user and can generate output in many different formats, including several recommended by the Text Encoding Initiative.
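The pairwise comparison on which collation rests can be suggested with Python's standard difflib module. The two "witnesses" below are hypothetical variant readings invented for illustration; Collate itself performs far more sophisticated alignment across many witnesses and allows scholarly adjustment of the result.

    import difflib

    # Two hypothetical witnesses to the same verse line, tokenized
    # into words for word-level comparison.
    witness_a = "Hwaet we Gardena in geardagum".split()
    witness_b = "Hwaet we Gar-Dena in gear-dagum".split()

    # Report only the points at which the witnesses diverge.
    matcher = difflib.SequenceMatcher(a=witness_a, b=witness_b)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op != "equal":
            print(op, witness_a[a1:a2], "->", witness_b[b1:b2])

The output lists each divergent reading, the raw material from which an apparatus criticus is assembled.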

A number of simple-to-use hypertext authoring systems are now available, such as HyperCard from Apple. Instructors can use these programs to compile their own hypertext courseware. Authoring tools that offer a "shell" for the production of multimedia materials are also widely used: these include Director (Macromedia), Storyspace (Eastgate), and Toolbook (Asymetrix).

Summary and Outlook

This synopsis indicates that much research is underway involving the creation of new resources, the conversion of conventional sources into machine-readable form, and the analysis of material to answer scholarly questions. Computers serve different functions at different stages in the process of research. The current model of raw materials of scholarship includes text, data, sound, and images, but there has been a growing attempt by humanists to use computers to model knowledge. Expert systems are being used by some researchers along with other Artificial Intelligence techniques (neural networks, for example) to formalize knowledge and automate the process of interpretation. While projects in this area have generated lively debate, few have produced useful results to date. However, they do show promise and are likely to become increasingly important in the coming decade.
