American Council of Learned Societies
Occasional Paper No. 37
Information Technology in Humanities Scholarship:
Achievements, Prospects, and Challenges
The United States Focus
II. INFORMATION TECHNOLOGY AND SCHOLARSHIP
Humanists have used computers since the 1950s, but until the 1980s usage could be described
as occasional. Initially computers were perceived as number-crunchers, and most data were
numerically coded for input and analysis. These data were often subject to purely statistical analysis,
and eventually the "quantitative paradigm" took hold in some humanities disciplines, such as history
and archaeology. The gradual realization that computer hardware and software could
manipulate symbolic as well as numeric data had a remarkable impact on the kinds of projects for
which computers were used. Today, most scholars recognize that data need not be numeric to find a
place in humanities information processing.
To some extent the discussion of computer-based projects is best viewed from the vantage of
the raw materials of scholarship, which include text, data, images, and sound. At least two
additional resources must be considered: electronic communications, which permit the transmission
of information; and combined sources (hypermedia/multimedia), which provide a platform
for working with several different types of raw material simultaneously. The distinction between
text and data, too, is somewhat artificial. Historians and archaeologists often work with kinds
of information that are most effectively collected, manipulated, stored, and output in structured
form. Some of the new tools for working with textual data have blurred the distinction between text
and data, but many different types of information are still better handled as discrete units by
statistical and database management software. To assess the impact of new technology on
humanities scholarship, we concluded that it would be helpful if the developments were viewed in terms of
the type of information involved.
If the use of computers for word processing, electronic mail, and simple searching of
online databases or catalogs is excluded, the computer-based work done by humanists can be divided
into five categories: the provision of general resources, such as library catalogs, dictionaries,
and bibliographies; the retrospective conversion of manuscript or printed sources into
machine-readable form; the creation of specific research tools, such as databases and image banks; the extraction
of summary data from larger electronic resources, such as population censuses and tithe surveys;
and the computer-assisted investigation of hypotheses and testing of models. The first three
categories provide the basic resources for further research in the humanities and lay the foundation for the
last two categories.
The use of computers to investigate hypotheses that must be tested with large and
complex datasets has led to advances in archaeology, history, literary studies, and philosophy. Sir
Anthony Kenny's study of the Aristotelian Ethics, which demonstrated that books ascribed to both
the Nicomachean Ethics and the Eudemian Ethics actually belonged to the latter text and not the
former, would have been difficult, and perhaps impossible, if carried out by manual investigation.
Computers have allowed humanists to extend the agenda of inquiry. While computer applications assist
in establishing the identities and relationships of primary sources, computerized texts and
databases are also vital in rethinking the history of ideas and particular works. Keith Baker traced the
evolution of opinion publique in the French Enlightenment using the American and French Research on
the Treasury of the French Language (ARTFL) text collection: he developed a tentative chronology
of the use of the term in eighteenth-century France and showed how the traditional associations
of opinion with uncertainty, instability, and disorder gave way to the rational authority of
opinion publique in the late eighteenth century.
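As a rough illustration of the kind of chronology described above, the following Python sketch counts how often a term occurs per decade in a small set of dated passages. It is not ARTFL's software or Baker's actual procedure; the function name term_counts_by_decade and the three sample sentences are invented for this example.

```python
from collections import Counter


def term_counts_by_decade(documents, term):
    """Count case-insensitive occurrences of `term` per decade.

    `documents` is an iterable of (year, text) pairs.
    """
    needle = term.lower()
    counts = Counter()
    for year, text in documents:
        decade = (year // 10) * 10
        # Adding zero still records the decade, so gaps in usage stay visible.
        counts[decade] += text.lower().count(needle)
    return dict(sorted(counts.items()))


# Invented sample passages (not drawn from ARTFL).
sample = [
    (1751, "L'opinion est incertaine."),
    (1782, "L'opinion publique gouverne; l'opinion publique juge."),
    (1788, "Le tribunal de l'opinion publique."),
]

chronology = term_counts_by_decade(sample, "opinion publique")
print(chronology)
```

A real study would also need to read each occurrence in context, since raw counts say nothing about how the associations of a term change over time.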
The projects described in this report illustrate the overall impact of new technology on
scholarship. The coverage is not comprehensive; the projects cited are offered as significant examples of
what has already been accomplished and what is under way or planned for the future.
Inevitably there are gaps. The discussion of sound, for instance, touches on sound cognition by
musicologists, but excludes research into the understanding of text by linguists and literary scholars.
The computerization of materials in museums, libraries, and archives is also not treated
exhaustively. Major institutions, such as the Library of Congress, the National Archives and Records
Administration, and research universities, have computerization programs, but smaller institutions often lack
the resources, equipment, and training necessary to undertake such programs. We discuss in
the concluding section some ways to address problems facing these institutions. Surveys by
Michelson and Rothenberg and by Gould, the European Science Foundation's survey edited by Genet
and Zampoli, and the Humanities Information Review Panel report in the United Kingdom (Feeney
and Ross) have charted some of this work.
Developments in communications technologies and facilities have had a significant impact on
the way scholars use and exchange information. The Internet came to the forefront as a
powerful communications system for the higher education community in the 1980s. Since then, its use
has grown dramatically and has extended beyond electronic mail and day-to-day communication to
the exchange of papers, the launch of collaborative ventures, and the provision of online access
to resources. In this context it is important to emphasize that the use of electronic mail (e-mail)
and online information services requires a personal computer or intelligent terminal on every
scholar's desk. In humanities departments such provision is by no means assured. Better provision is
essential to ensure that all scholars have access to the growing range of Internet services.
The use of e-mail has led to the formation of discussion lists and bulletin boards. Discussion
lists allow users to exchange ideas with other scholars and to distribute information to groups in
their field. Users can "post" information to electronic bulletin boards and log in to read what interests them.
HUMANIST, operated jointly by the Center for Electronic Texts in the Humanities at Rutgers University and King's College London, is a forum for scholars working in humanities computing
or interested in the applications of technology to the humanities. The list includes over 1,000
registered users, a rather small percentage of the scholarly community. INTERDIS <INTERDIS@MIAMIU.MUOHIO.EDU>, another general
discussion group for the humanities, offers announcements, conference notes, and informational queries
rather than discussion of issues.
Most scholars use bulletin boards and discussion groups specific to their field of study.
Specialized discussion groups and bulletin boards tend to promote discussion rather than distribute
announcements. H-Net is an extensive series of discussion lists in history, with a rich variety of subfields
offered for specialists in those areas. PHILOS-L <PHILOS-L@UK.AC.LIVERPOOL>
was created to distribute information about jobs,
conferences, and the occasional query, but users report it often hosts lively scholarly debates. ARCH-L <ARCH-L@LISTSERV.TAMU.EDU>, run
from Texas A&M University, is both a discussion list for archaeologists and a distribution service for
data and software.
Other discussion groups are interdisciplinary, seeking to reinvigorate scholarship in
traditional fields through new modes of interaction. PHIL-LIT is a forum for queries, information sharing,
and previews of articles and reviews. The group promotes the exchange of ideas, owing no
allegiance to a particular school or style of criticism, and is open to anyone with a serious interest
in philosophical interpretations of literature, literary investigations of classic works of
philosophy, philosophy of language, and literary theory. PSYCHE-D <PSYCHE-D@iris.rfmh.org> aims to encourage discussion and
the exchange of information on research in cognitive science, neuroscience, philosophy, and
other related disciplines with the aim of understanding the nature, function, and underlying
mechanisms of consciousness. Discussion lists and bulletin boards may also focus on the work of an
individual author or theorist, thereby bringing together scholars in different disciplines. While
DESCARTES-L <DESCARTES-L@bucknell.edu> draws its audience mainly from scholars in philosophy, other discussion lists centered on
individual theorists, such as DELEUZE-GUATTARI, attract scholars from almost every field in the
humanities. AUSTEN-L <AUSTEN-L@vm1.mcgill.ca> serves as a discussion group for scholars of English literature, history, and gender
studies, but also provides valuable interaction between scholars and interested lay readers.
In addition to discussion lists and bulletin boards, Internet Relay Chat (IRC) groups and
multiple-user spaces (MUDs or MOOs) afford simultaneous textual exchange. One of the earliest projects
of this nature is PMC-MOO, which hosts a real-time virtual discussion of recent articles in the
electronic journal Postmodern Culture and of general issues related to contemporary cultural theory.
Recently, the VRoma Project garnered funding from NEH to enhance students' study of Roman culture.
VRoma is a multi-user, networked environment built upon a spatial and cultural metaphor of ancient
Rome; faculty and students can meet, interact, collaborate, hold classes, and access databases, texts,
images, and teaching materials.
Discussion groups, bulletin boards, and IRC are not without limitations. Impermanence is one
of the more intractable problems of the Internet, and services often migrate, wither, or
disappear altogether. At times a small group or an individual will tend to monopolize a discussion,
which undermines the dialogic aspect of networked communication. The anonymity associated
with bulletin boards and discussion groups can give participants a license for mischief that
face-to-face interaction might inhibit; on the other hand, anonymity can also lead to
more honest and creative exchanges.
Networked information, however, consists of more than just discussion lists, bulletin boards,
and Internet Relay Chat. The Internet provides immediate access to electronic archives, automated
library catalogs, and numerous information services. The Library of Congress, government
publications distributed by the Government Printing Office (GPO), congressional reports, full-text books and
journals, shareware, census data, other numerical databases, and projects designed specifically for
the scholarly community (such as the Dartmouth Dante Project <Telnet://baker.dartmouth.edu:1835/>) are all available to online
academic researchers. Items once available only in print, such as the Institute for Scientific Information (ISI) Arts & Humanities Search
database, are now accessible over the Internet. As more and more institutions have established a presence
on the Internet, the number of information sources has expanded enormously, giving rise to the
concept of the "virtual library."
Indeed, as the Internet gained in speed and richness of available resources, librarians,
scholars, and information technology specialists began to create new applications of considerable
significance to humanities scholars. These included new publishing opportunities; the digitization of picture
and slide collections and of sound archives; the scanning and delivery of high-quality facsimiles
of manuscripts, archives, and rare books; and the production of multimedia courseware and
interactive learning sessions.
These applications came into being sooner than most had expected with the appearance of
the World Wide Web. The Web, as it is usually termed, allows for multimedia
file transfer and access to millions of sites around the globe. A scholar can perform research on the Web using search
engines such as Lycos, Infoseek, and Webcrawler. He or she can consult digital image
collections from the Vatican or the Louvre, digital text archives at major universities, film reviews, online
journals, bibliographies, information on scholarly societies, syllabi for thousands of academic courses,
moving images, online catalogs, dissertations, and other works pertinent to the humanities.
Specialized Web search tools are rapidly developing.
Argos, for example, is the first peer-reviewed, limited-area search engine (LASE) on the World Wide Web. It has been designed to
cover ancient and medieval cultures. Quality is controlled by a system of hyperlinked Internet
indices which are managed by an international group of scholars serving as associate editors of the
project. Argos is managed by the University of Evansville.
TORGO, maintained by the Department of English at the University of California at Santa
Barbara, is another search tool designed to facilitate and promote scholarly research on the Web.
TORGO searches abstracts of online journals and papers which have been screened for quality
and usefulness. Web-Cite, a commercial venture, also collects information from online journals and returns search results by electronic mail.
Guides to Internet resources in the humanities, usually created by scholars in a given field,
are another effective means of locating Web resources. Voice of the Shuttle, maintained by Alan Liu at the University of California at Santa Barbara, is one of the most comprehensive and popular
guides to humanities resources on the Internet in fields ranging from anthropology to women and
Other Internet subject guides focus on a particular field. Literary Resources on the Net, a large collection of references to Internet sites in English and American literature, was one of the
earliest specialized subject guides for the Internet and is maintained by Jack Lynch, a doctoral student at
the University of Pennsylvania. REESWeb at the University of Pittsburgh is a comprehensive index of electronic resources on Central Asia and the former Soviet Union.
In truth, however, the Web is still a fairly raw technology. It points to resources in an elegant
and even flashy way, but still suffers from considerable drawbacks which include missing, defective,
or outdated links; difficulties in ascertaining the authority behind most Web sites; the misleading
titles of many sites; the burgeoning incursion of commercial ventures onto the Web; the sheer
amount of material available; and the lack of direct access to the texts and other resources in some
databases. Because the Web is a highly accessible mass publishing environment, many academics
have embraced this new medium. The concept of an arena for public discourse appeals to most
scholars. However, on the level of day-to-day work, the openness of the Web has led to a saturation
of communication space that at times seems to erase the possibility of coherent discourse.
The potential of network-based presentation for humanities scholarship is immense, but,
at present, open to question. The Web today is in fact a more passive, anonymous tool than
originally envisioned by its creators, researchers at CERN in Switzerland who saw the Web as a means for intensive
collaboration. In order to transform the Web into a genuinely collaborative space as opposed to a
"surfable," alternate world, a community of scholars must be formed to bring to bear upon the Web the
evaluative standards and professional discernment that are employed in more traditional scholastic endeavors.
Humanities instruction and scholarship generally endeavor to reinterpret and reevaluate our
textual legacy in an evolving understanding of larger historical, social, and cultural contexts. Since work
in the humanities has been largely text-based, electronic text projects are becoming
increasingly important as academic resources. Electronic texts include scholarly editions, corpora of
contemporary writings and transcriptions, reference works, and instructional hypertexts. The most
prolific electronic text producers combine the expertise of scholars, librarians, and humanities
computing consultants. Electronic texts from manuscripts and printed sources are now available in
different formats and from a number of different venues. Some projects endeavor to provide
comprehensive coverage while others focus on a particular corpus of texts. Some are combined with
integrated software for analysis and can be accessed only by means of that software; others offer plain text
files which can be processed by software chosen by individual scholars.
Generally speaking, though, the primary purpose of a text archive is to ensure that
machine-readable texts remain available to the academic community. Because the texts come from so
many different sources, they vary considerably in format, accuracy, and type of coding. Discussion
is ongoing on establishing standards of text encoding so that these archives remain
Major text archives include the Electronic Text Library at the University of Virginia; the Center for Electronic Texts in the Humanities, jointly sponsored by Rutgers and Princeton Universities; the Humanities Text Initiative at the University of Michigan, Ann Arbor; and ALEX: A Catalog of
Electronic Texts on the Internet, supported by North Carolina State University. In the United Kingdom,
the Oxford Text Archive was established in response to the growing number of electronic texts prepared by individual scholars, major research projects, and publishers.
Advances in computer storage and retrieval have made the construction and use of text
corpora much easier, and this has in turn widened their usefulness for research. A corpus differs from
an archive in that it consists of a collection of texts gathered according to particular principles for
a specific purpose. Several of the largest electronic text projects support inquiry across
several traditional fields, facilitating interdisciplinary work. One of the best known, the Thesaurus Linguae Graecae (TLG), hosted by the University of California at Irvine, provides in machine-readable
form the work of 3,157 authors who wrote in Greek from the time of Homer to CE 600,
together with historiographical, lexicographical, and scholiastic texts from the period between CE 600 and
1453. It has been claimed that the 57 million words in the corpus represent 99 percent of the surviving
Greek literature. The TLG accommodates different approaches to textual analysis, and this has
encouraged the production of a number of text-handling packages designed specifically for work with
this corpus. The advantages of electronic corpora, first demonstrated among classical scholars, are
now perceived by scholars in other disciplines. The Dictionary of Old English (DOE) project,
compiled at the University of Toronto, has converted the whole corpus of Old English texts to
machine-readable form; the corpus is now available on magnetic tape or for searching online via the Internet.
Another corpus of significant value is the American and French Research on the Treasury of
the French Language (ARTFL), maintained at the University of Chicago and co-sponsored by the
Centre National de la Recherche Scientifique (CNRS). The project, originally initiated by the
French government for the creation of a new dictionary of the French language, developed a corpus
totaling some 150 million words, representing a broad range of written French. In most cases
standard scholarly editions were used in converting the text into machine-readable form, and the data
contain page references to these authoritative editions. ARTFL makes accessible approximately
2,000 digitized texts from the medieval period to the twentieth century. They include literary
works, political tracts, philosophical theses, and technical writings, and such genres as novels, verse,
theater, journalism, essays, correspondence, and treatises.
The American Verse Project, of the Humanities Text Initiative, assembles an electronic archive of volumes of American poetry spanning the eighteenth to the twentieth centuries. The full text of
each volume is converted into digital form and coded in Standard Generalized Markup Language
(SGML) using the Text Encoding Initiative (TEI) Guidelines. The archives, available over the Internet,
may be searched in a variety of ways. A second goal of the project is to provide a service to scholars
by advancing their ability to use Web documents in their work. Currently, the Internet does not
have well-established mechanisms for authors seeking to integrate complete texts, or parts of texts,
into their scholarship. This project will allow someone writing about Dickinson, for example, to
embed links in his or her electronic text pointing the reader to various poems, stanzas, or lines that are
part of the project without having to replicate the material within his or her own document.
Smaller projects, focused in a particular subject area, are also flourishing. The Dartmouth
Dante Project database <Telnet://baker.dartmouth.edu:1835/> combines computer technology with more than 600 years of commentary tradition on Dante Alighieri's major poem, the Divina Commedia. This gives scholars easier access to the
full texts of important critical works, many of which are rare and difficult to obtain. Likewise,
the Wesleyan Confucian Project produces electronic versions of Confucian and Confucian-inspired
texts from the eleventh century (CE) to the present. Recent improvements in optical
character recognition (OCR) technology and in the encoding of non-Latin characters have made possible
digitization projects for Chinese, Japanese, Arabic, Hebrew, and Russian works.
Linguistics is another area where electronic text corpora are being developed on a large scale.
The results are proving invaluable not only in lexicography and the preparation of language
reference works, but also in the development of speech recognition for computers, machine
translation, computer-assisted language learning, and "intelligent" word processing software.
Because human language is so complex, computer programs for processing it must be
fed enormous amounts of data (including speech, text, lexicons, and grammars) to be robust
and effective. Shared resources permit replication of published results, support fair comparison
of alternative algorithms or systems, and permit the research community to benefit from corrections
and additions provided by individual users. The Linguistic Data Consortium (LDC) is an open consortium of universities, companies, and government research laboratories supported by grants from
the Advanced Research Projects Agency (ARPA) and the National Science Foundation. It creates,
collects, and distributes speech and text databases, lexicons, and other resources for research
and development purposes. The University of Pennsylvania is the LDC's host institution. Its
collections include texts, audio files of telephone speech, and audio and video files of broadcast data,
for use with such computer-based linguistic technologies as speech recognition and understanding, optical
and pen-based character recognition, text retrieval and understanding, and machine translation.
Many archive and corpora projects derive from academically funded work, but it is an
indication of the growth of this area that commercial publishers are now investing in electronic texts,
assisting with online publication, and creating and distributing CD-ROM, diskette, or print versions
of academic electronic text projects. The Brown University Women Writers Project is creating a textbase of pre-Victorian women's writing in English. This is a sizable body of material which has been
largely inaccessible to scholars and students. Despite its considerable historical and literary interest, its
lack of availability has seriously distorted our view of the role of women in Western literary and
cultural history. This project explores the educational advantages of integrating students into a
technology-intensive interdisciplinary research project. Graduate and undergraduate students learn
the techniques of literary text encoding, scholarly editing, book production, and traditional
and electronic publishing by working in close collaboration with humanities computing specialists
and literary scholars. The textbase is used to support a variety of products in different formats,
including publication of selected works through a 30-volume print series and eventual online publication
with Oxford University Press.
Many other commercial publishers are exploring the possibilities of new media.
Chadwyck-Healey Ltd. offers a number of machine-readable full-text databases on CD-ROM, including
the Database of African-American Poetry, which contains over 2,500 poems written by
African-American poets between 1760 and 1900; Goethes Werke, an electronic version of the Weimar Edition
of Goethe's works, originally published between 1887 and 1919; and the Corpus des œuvres de philosophie en langue française,
which collects major works of post-Renaissance French philosophy and is developed under the direction of
Michel Serres of l'Académie Française. Recently
Chadwyck-Healey has assembled its English and American literature databases for inclusion in an online
service, Literature Online (LION). LION brings together nine of Chadwyck-Healey's full-text literary databases. Together these comprise more than 208,000 poems, 4,000 plays, 290 works of
prose fiction, and 21 versions of the English Bible. This service combines texts, electronic
journals, discussion groups, reference works, bibliographies, and catalogs in English and American
literature; it also provides hypertext links to relevant resources. Also included will be electronic
bookshops for new and antiquarian books and journal articles, and a printing service for instructors who
require bound copies of texts they have found in the database.
Scholarly effort and commercial funding have also gone into the provision of reference works
in electronic form. The recently completed Oxford English Dictionary (OED) Second Edition on
CD-ROM, published by Oxford University Press, makes use of specially designed search and
retrieval software. This electronic version of the 20-volume dictionary contains 60 million words and
allows researchers to carry out sophisticated searches not feasible using only the print version. For
example, the system supports both browsing and searching through quotations, definitions, and
etymologies. Researchers interested in etymology can find all words derived from a particular language, such
as that of the Blackfoot Indians. A search combining date, author, publication, or word will find
specific quotations. The Stanford Encyclopedia of Philosophy, sponsored by the American
Philosophical Association and the Philosophy Documentation Center at Bowling Green State University, represents
a dynamic database of information on philosophers to which scholars from all over the world contribute.
Machine-readable text offers further possibilities which have been realized in the concept
of hypertext. Hypertext has emerged as a way of exploring and manipulating text that is
non-linear, non-numerical, and conducive to allowing readers to discover and create their own paths
through material. It provides a means of linking texts in an associative way using "nodes" and "links,"
an electronic version of footnotes and cross-references. The links can be preserved to permit others
to follow predefined paths or to define their own paths. This technique is particularly suited to
reference works and collaborative writing.
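The node-and-link idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any hypertext system named in this report; the class names Node and Hypertext, the method names, and the three sample nodes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit of text plus its outgoing links (label -> target title)."""
    title: str
    text: str
    links: dict = field(default_factory=dict)


class Hypertext:
    """A collection of nodes joined by labeled, one-way links."""

    def __init__(self):
        self.nodes = {}

    def add_node(self, title, text):
        self.nodes[title] = Node(title, text)

    def add_link(self, source, label, target):
        if target not in self.nodes:
            raise KeyError(f"unknown target node: {target}")
        self.nodes[source].links[label] = target

    def follow(self, start, labels):
        """Follow a sequence of link labels and return the titles visited."""
        path = [start]
        current = self.nodes[start]
        for label in labels:
            current = self.nodes[current.links[label]]
            path.append(current.title)
        return path


# Hypothetical sample nodes.
web = Hypertext()
web.add_node("Author", "A short biography.")
web.add_node("Poem", "The text of a poem.")
web.add_node("Context", "Notes on the historical background.")
web.add_link("Author", "works", "Poem")
web.add_link("Poem", "background", "Context")
web.add_link("Author", "background", "Context")

# A path laid out in advance by an editor, and a shorter one chosen by a reader.
predefined_path = web.follow("Author", ["works", "background"])
reader_path = web.follow("Author", ["background"])
print(predefined_path, reader_path)
```

Because a path is just a stored list of link labels, it can be saved and handed to another reader, which corresponds to preserving links so that others can follow predefined paths.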
The Victorian Web, for example, is the Web version of Brown University's Context 61, which serves as a resource for courses in Victorian literature. The materials originally derive from Context 32, a hypertext developed with Intermedia software that provided contextual information for
English 32, a survey of English literature from 1700 to the present. The Victorian Web includes
hyperlinked information about individual authors, Biblical typology, Romantic and Victorian timelines,
feminism and literary theory, public health, race and class issues, and anti-Catholic sentiment in
Victorian England. This collection of materials on nineteenth-century British culture continues to grow
as students and faculty at Brown University and other institutions contribute new essays, questions,
Electronic text can also be used together with text analysis software, whether integrated
or separate, to carry out complex textual analysis, providing the researcher with both microscopic
and macroscopic views of the text, from small-scale features of an individual work to searches across
an entire corpus. These tools can be used to create critical editions, carry out stylistic comparisons
or lexical analysis, and attribute authorship. An example is the examination by Gerard Ledger of
the works of Plato. By taking the simplest possible feature of language (the occurrence of
particular letters of the alphabet) and subjecting it to a complex multivariate analysis, Ledger was able to
draw new conclusions about the authenticity of dubious dialogues and the chronology of the
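The feature Ledger started from, the occurrence of particular letters, can be illustrated with a short Python sketch. This is not Ledger's method: his multivariate analysis is replaced here by a plain Euclidean distance between letter-frequency profiles, and the sketch assumes unaccented Latin letters rather than Greek. The function names letter_profile and profile_distance are hypothetical.

```python
import math
import string
from collections import Counter


def letter_profile(text):
    """Return the relative frequency of each letter a-z in `text`."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    return {c: (counts[c] / total if total else 0.0)
            for c in string.ascii_lowercase}


def profile_distance(a, b):
    """Euclidean distance between two letter profiles (a simplification)."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in string.ascii_lowercase))


# Toy strings standing in for texts; real work would use whole dialogues.
profile_one = letter_profile("abba")
profile_two = letter_profile("baab")
profile_three = letter_profile("zzzz")

print(profile_distance(profile_one, profile_two))
print(profile_distance(profile_one, profile_three))
```

A smaller distance means two texts use letters in more similar proportions. On its own this says little about authorship, which is why a fuller statistical treatment and scholarly judgment remain necessary.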
Donald Foster recently received national attention for his work in establishing, with wide
critical acceptance, the author of a little-known Renaissance elegy to be William Shakespeare.
The hermeneutic facet of his research cannot be minimized, but the stylistic analysis performed by
the computer was integral to his conclusions. Foster performed a frequency-of-use match of words
in the elegy against the full corpus of Shakespeare's writings and other contemporary authors.
From this analysis Foster could conclude that selected words and their contextual phrasing were
unique to Shakespeare; enough of these appeared in this "lost" elegy to strongly suggest the
Computer-based textual analysis can also assist scholars in the production of critical editions.
The Canterbury Tales Project, in which Sheffield and Oxford Universities collaborate, draws upon
the traditional methods of textual criticism and new software to reveal unseen patterns in the
multiple manuscripts of Chaucer's Canterbury Tales. The main objective of the project is to produce a
library of resources which can assist in the creation of critical editions of the Tales, language studies,
and historical and cultural analysis. The manuscript tradition for
The Canterbury Tales includes some 83 manuscripts, some of which are fragmentary. Although the
Tales are well understood, study of the entire manuscript tradition was not feasible before the advent of computer systems capable
of handling significant quantities of data.
The use of computers for handling data is well established in historical and archaeological
research, which requires the scholar to extract data from source material, formalize them, and organize
them for analysis. In these areas the individual dataset, collected and organized for specific projects,
is predominant. While some projects do rest on the analysis of textual information, structured data
are the basic source for many studies. History projects have spearheaded the creation and
dissemination of electronic datasets, though these datasets are becoming increasingly important to
scholars working in cultural studies, linguistics, and gender studies.
One example is the Dumbarton Oaks Hagiography Project, which assembles information
on Byzantine culture and society drawn from Greek hagiographical texts. The database includes
over 100,000 data entries. Additionally, text files of the vitae provide immediate access to the chapter
of the text from which data have been extracted. The database is accompanied by brief
introductions to each vita with summary biographical and chronological information.
Other computer datasets have emerged out of counterparts in print publications.
Ethnologue, sponsored by the Summer Institute of Linguistics, is a catalog of the world's languages. The
Summer Institute of Linguistics specializes in work with languages spoken by the world's
lesser-known linguistic groups by developing programs in partnership with host governments,
universities, churches, and local people to promote linguistic research, language development,
literacy, translation, and other educational and research projects. The database includes information
on number of speakers, location, dialects, linguistic affiliation, and other sociolinguistic and
demographic issues. This database represents the twelfth edition of the print version of
Ethnologue. The data may be browsed by country or language family, and the database includes interactive maps, a
language distribution chart, and search capabilities.
Another category of very large online datasets is the combined library catalog. One of the
largest in the world is the Online Computer Library Center (OCLC), based in Dublin, Ohio,
and containing over 32,000,000 bibliographic records. A rival project is the Research Libraries
Information Network (RLIN), based in California, with over 77,000,000 records from over 250 sources
describing books, journal articles, dissertations, and rare materials in over 365 languages. These
undertakings are of fundamental importance to libraries around the world, with access by individual
scholars increasingly a part of each organization's marketing strategy.
An important question for the providers of large datasets and archival material is how the
database should be organized to optimize access for users. The automation of archive catalogs has led in
some cases to a reconsideration of the way this material is designed and prepared. The Berkeley Finding
Aid Project, funded by a grant from the Department of Research and Development
and sponsored by the Commission on Preservation and Access, provides finding aids in a
standard, platform-independent electronic form. Finding aids are inventories, registers, indices, or guides
to collections held by archives, manuscript repositories, libraries, and museums. The Berkeley
finding aids provide detailed descriptions of collections, their intellectual organization, and, at
varying levels of analysis, of individual items in the collections. Access to the finding aid allows
scholars to explore the content of a collection and determine whether it is likely to satisfy their
research needs. The Berkeley project makes finding aids for a number of American archives available
online. The project is developing sophisticated searching and sorting capabilities so that information
may be retrieved according to individual need.
Some specialized datasets are commercially available on optical storage media. One example
is the CD-ROM version of the Eighteenth-Century Short Title Catalogue published by the British
Library. Those who study eighteenth-century history, literature, and culture can identify the current
location of printed material by particular authors or having particular titles from among the 305,000
records. They can also investigate where works were published and which printers handled
particular categories of books.
The Trans-Atlantic Slave Trade Database, of the W.E.B. Du Bois Institute for
Afro-American Research at Harvard University, aims to disseminate computerized data on most of the slave
voyages that sailed from Africa to the Americas from the sixteenth century to the late nineteenth century.
The project has gathered records from 75 percent of all the slave ships sailing under British,
French, Spanish, and Dutch flags between 1662 and 1860. The data details mortality, age, sex,
crew membership, conditions on slave ships, duration of voyages, nature of slave resistance,
business organization of slave traders, and age and physical characteristics of vessels. When the project
is completed, data will be published on CD-ROM by Cambridge University Press. The core set of
more than 20,000 transatlantic slave voyages will constitute the largest data source for the
long-distance movement of peoples before the twentieth century. Refined demographic data on the volume of
the trade (and thus of pre-colonial African populations) and on the spatial distribution of African
peoples in the Atlantic world will allow scholars to assess more accurately questions of African state
formation, agricultural and ecological change, African cultural survivals, and the development of the
In historical research, scholars who use computers usually create their own datasets. A
large number of machine-readable historical data files have been created over the years, not only
by historians, but also by geographers, anthropologists, genealogists, family history groups,
and sociologists. The Arts and Humanities Data Service (AHDS) was recently established in the
United Kingdom by the Joint Information Systems Committee to facilitate the creation and use of
electronic resources in the arts and humanities. To achieve this aim, the AHDS will collect, describe,
catalog, preserve, and provide subject-specific user support for digital resources that are created as a
product of scholarly research; facilitate collaboration between arts-based user communities and
the commercial or non-profit sectors; and promote standards and guidelines for the creation,
description, preservation, and scholarly use of electronic information.
Scholars engaged in historical research make use of an increasingly wide range of
software packages in order to assemble, organize, analyze, and display their source material.
Statistical packages, in use for many years, have been augmented by programs for relational
database management, nominal record linkage, mapping, and hypermedia. Developments in software
that allow data to be input in a way that preserves their complexity and irregularities facilitate
source criticism, which is fundamental to historical study. The British Academy's Prosopography of
the Byzantine Empire <UDLC052@uk.ac.kcl.cc.elm>, developed and housed at King's College London, was established to create
a database of all documented persons in the Byzantine Empire from CE 641-1260. The
information includes names of the individuals, their responsibilities, first and last date mentioned, sex,
career titles, topographical details, sources of the information, and a short article about each person.
The inquiries made possible by the database have surprised even its creators; it is possible, for
example, to find all bishops whose brothers were also bishops, or the names of all individuals who
appeared at the court of a particular emperor, along with information on religious sects, languages, and patrons.
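
The kind of inquiry described above, such as finding all bishops whose brothers were also bishops, is a natural fit for a relational join. The following is a minimal sketch only: the table names, column names, and sample rows are hypothetical and do not reflect the actual structure or contents of the Prosopography of the Byzantine Empire database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE office (person_id INTEGER REFERENCES person(id),
                             title TEXT NOT NULL);
        CREATE TABLE kinship (person_id INTEGER REFERENCES person(id),
                              relative_id INTEGER REFERENCES person(id),
                              relation TEXT NOT NULL);
        """
    )
    # Invented placeholder individuals, not historical persons.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, "Person A"), (2, "Person B"), (3, "Person C"), (4, "Person D")],
    )
    conn.executemany(
        "INSERT INTO office VALUES (?, ?)",
        [(1, "bishop"), (2, "bishop"), (3, "bishop"), (4, "strategos")],
    )
    conn.executemany(
        "INSERT INTO kinship VALUES (?, ?, ?)",
        [(1, 2, "brother"), (2, 1, "brother"),
         (3, 4, "brother"), (4, 3, "brother")],
    )
    return conn


def bishops_with_bishop_brothers(conn):
    """Return names of bishops who have at least one brother who was a bishop."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM person AS p
        JOIN office  AS o  ON o.person_id = p.id AND o.title = 'bishop'
        JOIN kinship AS k  ON k.person_id = p.id AND k.relation = 'brother'
        JOIN office  AS bo ON bo.person_id = k.relative_id AND bo.title = 'bishop'
        ORDER BY p.name
        """
    ).fetchall()
    return [name for (name,) in rows]


print(bishops_with_bishop_brothers(build_demo_db()))
```

In this invented sample, Person C is a bishop whose brother held a different office, so only Person A and Person B are returned. Keeping offices and kinship in separate tables is what allows questions the compilers did not anticipate to be asked later.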
Visualization of information has long been a valuable explanatory tool among historians
and archaeologists. Computer mapping systems in general and geographical information systems
in particular have increased the options for generating presentations of this kind. The Alexandria
Project represents a consortium of researchers, developers, and educators, spanning the academic,
public, and private sectors, who are exploring a variety of issues related to a digital library for
geographically-referenced information. All of the objects in the library will be associated with one or more
regions on the surface of the earth. The Alexandria Digital Library is an online information system
that provides access over the World Wide Web to a subset of the holdings, as well as other
geographic datasets. It is sponsored by the Map and Imagery Laboratory in the Davidson Library at the
University of California, Santa Barbara.
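
Associating every object with one or more regions of the earth's surface makes it possible to retrieve material by location rather than by title or subject. The sketch below illustrates the idea with simple latitude/longitude bounding boxes. The class names, the sample holdings, and the coordinates are all hypothetical, and nothing here describes how the Alexandria Digital Library is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A bounding box in decimal degrees (hypothetical representation)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        # Two boxes overlap when they overlap on both axes.
        # Boxes that cross the 180-degree meridian are not handled here.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass(frozen=True)
class LibraryObject:
    title: str
    regions: tuple  # one or more Region instances


# Invented holdings for illustration only.
HOLDINGS = [
    LibraryObject("Hypothetical map sheet A",
                  (Region(-120.5, 34.0, -119.5, 35.0),)),
    LibraryObject("Hypothetical aerial photograph B",
                  (Region(10.0, 40.0, 15.0, 45.0),)),
]


def find_objects(holdings, query):
    """Return titles of objects with at least one region overlapping the query."""
    return [obj.title for obj in holdings
            if any(region.intersects(query) for region in obj.regions)]


print(find_objects(HOLDINGS, Region(-120.0, 34.3, -119.6, 34.6)))
```

A production system would presumably use a spatial index rather than a linear scan, but the overlap test itself stays the same.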
More recently, visualization techniques have also been applied to textual information.
The Language Visualization and Multilayer Text Analysis Project, sponsored by the Cornell
Theory Center, created a prototype tool for the study of language and discourse phenomena in
three-dimensional space. The tool allows a researcher to conduct interactive research in the structures
and typologies of discursive formation in large samples of textual data. Using this resource, scholars
can develop new techniques for reading and interpreting text space.
One promising facet of digital technology is that it makes visual, textual, and numeric
information both more accessible and easier to handle. Archaeologists, art historians, geographers, and
historians are making increasing use of digital image processing, image enhancement, and graphics.
Current projects incorporate three-dimensional modeling, enhanced data identification,
high-resolution scanning techniques, and online exhibitions of visual primary sources.
The rapidly-developing field of computer graphics is beginning to play a key role in
archaeological data processing. Such systems bring to archaeology new mechanisms for analyzing data.
Using graphical representations, archaeologists can explore a range of different configurations
and interpretations of evidence, and thus take a "second look."
Computer modeling systems are also being used to build reconstructions of historical spaces.
The Rossetti Room is a Virtual Reality Modeling Language (VRML) model of the studio of the
pre-Raphaelite painter and poet Dante Gabriel Rossetti. Users can select one of a series of paintings
to be placed in the room, which is then created from existing files by means of a simple program.
The virtual room recreates the work environment of Rossetti. Each picture provides a link back to
the two-dimensional HTML page in the archive. This project is a part of Jerome McGann's The
Complete Writings and Pictures of Dante Gabriel Rossetti: A Hypermedia Research Archive at the
University of Virginia's Institute for Advanced Technology in the Humanities. The archive is a
structured database holding digitized images of Rossetti's works in their original documentary forms.
Rossetti's poetical manuscripts, early printed texts (including proofs and first editions), drawings,
and paintings are stored in the archive in full color. The materials are marked up for electronic
search and analysis and supplied with full scholarly annotations and notes.
In addition to three-dimensional analysis and solid-modeling, images captured by digital
cameras or by digitizing conventional photographs offer scholars new tools. Museums and art galleries
have recognized digital technology as a way of providing enhanced access to their collections and
have established a number of computerized systems as permanent or temporary displays.
These applications and other educational packages incorporating digitized images have been
designed with interactivity as a prime concern.
The Museum Educational Site Licensing Project (MESL) is a collaboration of seven
collecting institutions and seven universities that define the terms for the educational use of digitized
museum images. MESL participants include the Fowler Museum of Cultural History; the George
Eastman House; Harvard University Art Museums; the Library of Congress; the Museum of Fine Arts,
Houston; the National Gallery of Art; the National Museum of American Art; American University;
Columbia University; Cornell University; the University of Illinois at Urbana-Champaign; the University
of Maryland at College Park; the University of Michigan; and the University of Virginia. This
group develops a model educational site license, evaluates procedures for the collection and
distribution of museums' digital images and
information, and assesses the impact of this distribution in both
technical and economic terms. At the end of this experiment, the participants will propose a broadly-based
system that could support ongoing distribution and educational use of museum images and text.
Most major museums host online exhibitions (for example, the Smithsonian and the
Whitney), but now smaller university museums are able to share images of their collections over the
network. The Oriental Institute Virtual Museum showcases the history, art, and archaeology of the ancient Near
East. An integral part of the University of Chicago's Oriental Institute, which supports research
and archaeological excavation in the Near East, the Museum exhibits major collections of antiquities
from Egypt, Mesopotamia, Iran, Syria, Palestine, and Anatolia. This museum uses a series of
panoramic movies to guide visitors through a virtual tour of the galleries.
The techniques of image storage and analysis developed in scientific disciplines,
including crystallography, astronomy, and medicine, clearly have potential for the humanities. The
Image Understanding Environment (IUE), developed with the support of NASA, provides a rich set of
tools for working with imaging data that opens new possibilities to humanists. Standards are
currently under development for recording high-resolution digital images of works of art and
encoding information about these images, whether through textual description or metadata in the image
file itself. The Digital Image Access Project, sponsored by OCLC and the Coalition for
Networked Information, is developing standards for image archiving, compression, representation,
and description. The American Memory Project of the Library of Congress was one of the first sites
to implement developing standards in digital image creation and transmission, offering
background papers and technical information that describe the digital imaging process.
Images accompanied by textual data are often the only way to provide flexible and dynamic
access to collections. SPIRO, the visual online public access catalog for the University of California
at Berkeley's Architecture Slide Library, comprises approximately 200,000 35mm slides. SPIRO can
be accessed using either ImageQuery, a powerful database retrieval package, or the World Wide
Web. ImageQuery 2.0 was developed by Berkeley's Information Systems and Technology and
the Museum Informatics Project; it permits research by 10 different fields: period, place, creator
name, object name, view type, subject terms (from the Art and Architecture Thesaurus), source of
image, creation dates, classification number, and image identification number. Digital surrogates of
the slides help users identify the exact image for which they are searching. SPIRO currently
contains 16,000 records linked to images. The Web version of SPIRO supports research by five fields:
period, place, creator name, object name, and subject terms.
Similarly, a.k.a., developed by the Getty Information Institute, is an experimental searching
tool that uses the Institute's vocabularies to provide enhanced access to databases of cultural
information. The service allows users to search through thousands of records from several Getty
databases, including the Avery Index to Architectural Periodicals, the International Repertory of the
Literature of Art (RILA), and the Provenance Index Databases. The a.k.a. tool uses the Art & Architecture
Thesaurus (AAT) and the Union List of Artist Names (ULAN) vocabularies to enhance searches.
The Clearinghouse of Image Databases at the University of Arizona, which includes the IMAGELIB
listserv archives, is currently the most comprehensive listing of image projects in libraries and archives.
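
The vocabulary-assisted searching described above for a.k.a. can be illustrated as a form of query expansion: the term a user enters is replaced by the full cluster of equivalent names before the records are searched. This is a minimal sketch under stated assumptions. The cluster and the records below are invented sample data, not drawn from ULAN, the AAT, or any Getty database, and the function names are hypothetical.

```python
# Invented sample data for illustration only.
NAME_CLUSTERS = [
    {"el greco", "domenikos theotokopoulos"},
]

RECORDS = [
    {"id": 1, "creator": "El Greco"},
    {"id": 2, "creator": "Domenikos Theotokopoulos"},
    {"id": 3, "creator": "Anonymous"},
]


def expand(term, clusters):
    """Return every name equivalent to the term, or the term alone if unknown."""
    key = term.casefold().strip()
    for cluster in clusters:
        if key in cluster:
            return set(cluster)
    return {key}


def search(term, records, clusters):
    """Return ids of records whose creator matches any equivalent name."""
    names = expand(term, clusters)
    return [r["id"] for r in records if r["creator"].casefold() in names]


print(search("El Greco", RECORDS, NAME_CLUSTERS))
```

Without the vocabulary, a search for "El Greco" would find only the first record; with it, the record cataloged under the artist's birth name is retrieved as well.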
The digitized image, combined with computer technology, offers the art historian the same
kind of opportunity for retrieval and manipulation as that enjoyed by the classicist working with
the Thesaurus Linguae Graecae. Most access to images is still via textual description of the image,
but this is not necessarily the most useful means of access. Even on modest computers,
programs manipulating digitized descriptive codes derived from existing art reproductions can make
accurate distinctions between images that appear similar to the human eye. But this is only one
possible application of such a tool; a more useful function will be in the searching of archives of visual
images to find similar and related compositions. This type of system offers possible applications in the
areas of identifying, referencing, classifying, and analyzing images. As hardware with large storage
devices, faster processors, and enhanced resolution and graphics capabilities becomes more widely
available, art historians will have increased opportunities to perform research of this kind.
Digitized sound, not unlike images and data, can be created, manipulated, and analyzed
by computer. Computers can generate or modify musical signals and serve as a control device
for electronic musical instruments. Electronically-generated music, or electroacoustic music,
revolves around the Musical Instrument Digital Interface (MIDI). Originally MIDI allowed
electronic keyboards and associated sound processors to interact in the studio or on the concert platform.
Since then it has emerged as a protocol for information exchange through the development of
software which allows performers to record and edit performances and composers to build up complex
pieces of music. Since MIDI files mainly contain information about sequences on peripheral devices, it
is possible for sophisticated sequencing software to run on low-cost microcomputers. Composers
thus have access to powerful tools to create, edit, mix, process, and filter sound and can record
their compositions to disk.
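
The reason MIDI sequencing places such modest demands on hardware is that a MIDI message records an instruction to play a note, not the sound itself. The sketch below builds the raw bytes for a Note On and a Note Off message and compares their size with one second of uncompressed audio at compact-disc quality. The function names are illustrative only; timing information, which a MIDI file stores separately, is omitted.

```python
NOTE_ON = 0x90   # status value for a Note On channel message
NOTE_OFF = 0x80  # status value for a Note Off channel message


def _check(channel, note, velocity):
    """Validate the ranges allowed by the MIDI channel-message format."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= note <= 127:
        raise ValueError("note must be 0-127")
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be 0-127")


def note_on(channel, note, velocity):
    """Three bytes: status plus channel, note number, velocity."""
    _check(channel, note, velocity)
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Three bytes releasing a previously sounded note."""
    _check(channel, note, velocity)
    return bytes([NOTE_OFF | channel, note, velocity])


def cd_audio_bytes(seconds, sample_rate=44100, bytes_per_sample=2, channels=2):
    """Size of uncompressed 16-bit stereo audio at 44.1 kHz."""
    return int(seconds * sample_rate * bytes_per_sample * channels)


# Middle C (note number 60) on the first channel, struck and released.
middle_c = note_on(0, 60, 100) + note_off(0, 60)
print(middle_c.hex())     # 903c64803c00
print(len(middle_c))      # 6
print(cd_audio_bytes(1))  # 176400
```

Six bytes describe a note that would occupy roughly 176,000 bytes as one second of recorded sound, which is why the synthesis can be left to the attached instruments while the computer handles only the sequence of events.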
Performance of electroacoustic music rarely involves computation taking place on stage,
although projects based at the Massachusetts Institute of Technology (MIT) and the Institut de Recherche
et Coordination Acoustique/Musique (IRCAM) in France are exploring this avenue with the aid
of artificial intelligence (AI) techniques. The goal of the MIT project is to produce what may be
called an automatic accompanying machine. At present, in a live performance a human performer can respond to
the computer's output, but the computer cannot reciprocate. For a computer to embody
the characteristics of a performing musician, it must be able to analyze sound into coherent and
discrete pitches of determinate duration and spectrum (timbre), and then use that analysis of the
acoustic signal as a basis for interpreting a musical signal. It must generate a plausible model of what the
live performer is doing, and then use this information to structure its own performance. To build
such a machine requires an understanding of the psychoacoustical and psychological processes
involved in listening to and performing music. The MIT team has had to incorporate some "engineering,"
rather than cognitive, solutions in their system: this indicates the limitations of present understanding
of the cognitive processes underlying musical perception and performance.
Advances in compression schemes for audio files have opened possibilities for the
networked delivery of music performance. The Center for Research in Electronic Art Technology (CREATE)
at the University of California at Santa Barbara supports online production of electroacoustic
and computer music and electronic media art technology. The CREATE facility consists of
multi-track digital recording and monitoring studios, a collection of multimedia workstations, and a
network of stereo and quadraphonic digital synthesis studios. The main focus of its research is
wide-area network-based media.
The ability to deliver sound files across a network enhances online scholarship in music
theory, as well. The Journal of Seventeenth-Century Music provides a refereed forum for scholarly
studies of the musical cultures of the seventeenth century. The areas of concern include historical
and archival studies, performance practice, music theory, aesthetics, dance, and theater. The
Journal emphasizes audio examples which are presented as monaural audio files and can be played on
most systems with audio hardware and software. Likewise, Ethnomusicology Online features
peer-reviewed articles; reviews of audio, video, and multimedia titles; and enhanced Ph.D.
dissertations, all accompanied by illustrative audio files and multimedia.
Hypermedia systems, also known as multimedia, use hypertext techniques to link other
media (images, graphics, animation, sound, and video) to text. Information linked in this way becomes
a flexible reference tool, effectively functioning as an automated encyclopedia, but with the
advantage of giving visual proximity to conceptual connections. Hypermedia systems are interactive,
allowing users to find their own path through the material; they have obvious potential for both research
One of the best established hypermedia products is Perseus, a multimedia database designed to aid the study of archaic and classical Greece. Perseus expands the ways in which ancient Greek literature, history, art, and archaeology can be investigated. In addition to complete literary
texts, Perseus contains a lexicon, morphological databases, an extensive archaeological catalog
with accompanying illustrations, several atlases, site plans, an illustrated catalog of Greek vases, a
classical encyclopedia, and a historical overview of the classical period. Like many
projects (including the American Verse Project, the Early American Fiction Archive, and the Dante
Gabriel Rossetti Archive), Perseus is available in two versions: a large collection on CD-ROM and a
smaller subset of resources available without charge on the Web.
Another important multimedia project is American Memory: Historical Collections for the National Digital Library, undertaken by the Library of Congress. American Memory consists of primary
source and archival materials relating to American culture and history, combining film (early moving
images of New York City); photographs (from the Office of War Information and the Civil War);
sound recordings (World War II newsreels); and manuscripts (the WPA Federal Writers' Project during
the 1930s). The project encompasses more than 210,000 images from over 20 of the Library's
collections. Similarly, the Valley of the Shadow: Two Communities in the American Civil War, a project
sponsored by the Institute for Advanced Technology in the Humanities and the National Endowment for
the Humanities, draws on resources including government documents, newspapers,
photographs, personal papers, maps, and rosters. The collection examines
the communities, daily life, politics, and religious and racial conflicts surrounding the Civil War.
The collection also traces the course of events from impending crisis to the secession and is
projected to continue through the war and into the postbellum era.
Hypermedia projects have the potential to become powerful teaching tools. The Labyrinth,
created at Georgetown University, provides organized access to electronic resources in medieval
studies, including an electronic library of texts and images, online forums, professional directories and
news, online bibliographies, an online "university" of teachers and scholars available for
electronic conferencing, and an archive of pedagogical tools. This project will not only provide
an organizational structure for electronic resources in medieval studies, but will also serve as a
model for similar, collaborative projects in other fields of study.
The Electronic Beowulf Project takes advantage of advanced imaging technology to assemble
a database of digital images of the Beowulf manuscript and related texts. The archive already
includes fiber-optic readings of hidden letters and ultraviolet readings of erased text in the early
eleventh-century manuscript; full electronic facsimiles of the indispensable eighteenth-century transcripts
of the manuscript; and selections from important nineteenth-century collations, editions, and
translations. Future additions will include images of contemporary manuscript illuminations and
material culture, and links with the Toronto Dictionary of Old English project and with the
comprehensive Anglo-Saxon bibliographies of the Old English Newsletter.
Libraries, in cooperation with scholars, are engaging in a number of large-scale
retroconversion projects to enhance access to rare texts or other materials that are brittle, damaged, or not
easily accessible. Preserving unique and rare materials has been a principal function of research
libraries but a costly part of their mission. Security concerns also require that physical access to rare
books be restricted. The Internet offers the possibility of greatly expanded access to rare and
delicate materials. And while computer images of rare book pages can serve only as pointers to the
original artifacts, the combination of searchable text and high-resolution color images provides a
detailed and flexible view of the material to teacher and scholar alike.
The Victorian Women Writers Project at Indiana University produces highly accurate
transcriptions of rare, often unpublished, literary works by British women writers of the late nineteenth
century, encoded using SGML. The works include anthologies, novels, political pamphlets, and volumes
of poetry and verse drama along with bibliographical descriptions. While the Victorian Women
Writers Project focuses on converting the text, the Making of America Project (MOA) creates
high-quality digital page images of important materials on the history of the United States. The Cornell
University and University of Michigan libraries are cooperating in the initial phase of MOA by
selecting complementary journals and monographs to ensure full capture of all significant information.
The University of Virginia Library's Early American Fiction Archive will provide both text files and
page images of 582 volumes of early American fiction. The project also presents the opportunity to
study scholarly use of rare books and their computer simulacra, and to determine the extent to
which electronic texts of rare books can serve scholars.
While many retroconversion projects focus on books, others incorporate collections of
historical data. The University of Michigan Papyrus Collection spans a broad range of materials:
textbooks, lectures, private notes and accounts, letters, invitations, decrees of kings and emperors,
official petitions, contracts and agreements of every kind, purchase orders, checks, medical recipes
and prescriptions, receipts, tax lists and declarations, court proceedings, and other legal texts.
The documents exist on papyrus fragments currently preserved in glass casings; the most important
are being digitized at high resolution for archival purposes and at lower resolution for the Web.
JSTOR, originally a project of the Andrew W. Mellon Foundation, is an independent
organization that is building a comprehensive archive of scholarly journals. The project has important
potential cost savings for libraries, and will be closely studied for its impact on the scholarly research
process in the humanities and other disciplines.
Microfilm, long used as a means of preserving brittle books, is also under consideration
for digitization. Yale University Library's Project Open Book, supported by the Commission
on Preservation and Access and the National Endowment for the Humanities, is a research
and development program that is exploring the feasibility and costs of large-scale conversion of
preserved material from microfilm to digital imagery. Project Open Book aims to create a 10,000-volume
digital image library and to enhance access to the converted volumes through the creation of
document structure and page number indices. Furthermore, the project will provide greatly enhanced
access to these materials over the Internet.
Archival holdings are the focus of some retroconversion projects. The Center for Electronic Texts in the Humanities sponsors several pilot projects, among them an experiment in encoding
archival materials in order to link the finding aid with the full text of documents. The William Elliot
Griffis Collection, part of Rutgers University Special Collections, is a major grouping of
nineteenth- and early twentieth-century print, manuscript, photographic, and ephemeral materials relating to
the early history of Japan-U.S. relations. This pilot project demonstrates the feasibility of a
networked electronic access tool for collections of rare documents, according to the guidelines of the TEI
(Text Encoding Initiative) and of the EAD (Encoded Archival Description). The finding aid is used as
a "frame" within which some of the rare manuscript materials held in the collection may be
accessed. Listings in the finding aid are provided with hypertext links to electronic editions of the
manuscripts, which in turn provide two views: the text of the essays and images of the manuscript pages.
Archives of photographs and slides form the basis of one of the largest retroconversion
projects. The Research Libraries Group's Digital Image Access Project (DIAP) involved nine
academic institutions in partnership with Stokes Imaging of Austin, Texas. This consortium experimented
with an online image management system to find economical ways to catalog and index
large photographic collections. Approximately 9,000 images related to the theme "The Urban
Landscape" were digitized and included in the project database along with related index records.
Collections include many images from the Avery Library at Columbia University, images of the
Bronx, construction photographs of the Empire State Building, and architectural drawings from the
Original and Creative Works
A growing number of writers, artists, and scholars are turning to the Internet and the Web for
creative expression and using hypertext and hypermedia as methods of publishing original works.
Michael Joyce's hypertext novel Afternoon was among the first to engage the reader specifically
through hypertext, where the software was integral to the experience and interpretation of the novel.
A number of writers, sometimes referred to as the "Eastgate School" after the name of the
prominent hypertext publisher, are creating hypertext fictions. Other recent works in this genre include
Stuart Moulthrop's Victory Garden and Jane Yellowlees Douglas'
The Electronic Poetry Center of the State University of New York at Buffalo serves as a
central gateway to electronic resources in poetry and poetics. A collaborative effort of the
University Libraries, the Faculty of Arts and Letters, and the Poetics Program, the Center represents one of
the first electronic publishing projects joining the academic and creative communities. The Center
makes a wide range of contemporary experimental and innovative poetry available online.
Resources include collections of texts, a directory of contemporary poets, small press announcements,
poetry events, spoken word archives, electronic journals, and gallery areas. The Center also sponsors
online collaborative writing spaces. Selected online texts are cataloged and made available nationally
and internationally through major bibliographic databases.
Today many poetry journals, including
RIF/T and the Mississippi Review, are published
online using hypertext links, audio archives, and accompanying images. In the classroom,
students contribute to hypertext novels, diaries, and journals that form new kinds of participatory
original works (Slatin). This virtual colloquium engages students in ways that differ substantively
from traditional concepts of classroom, course assignments, and the iterative accumulation of knowledge.
Educators recognized the value of hypertext and multimedia for pedagogic purposes during
the 1980s. This gave rise to such earlier projects as the focus on Shakespeare at Stanford, the
emblem book at the Memorial University of Newfoundland, and the project for Robert Browning at
Cornell. Perseus, mentioned earlier, also began at this time and has evolved with the technology
in compelling ways. Hypertext and multimedia allow for deeply nested fields of associations; a line
in Browning's "Caliban Upon Setebos" can be connected to other works of Browning and
Shakespeare, to various contemporary treatises on art, and to images of related plastic arts. The sound of
someone reading the lines and moving images can also be called up to help contextualize the poem.
Critics of this approach feel that while the variety and breadth of allusions can create an effective
teaching tool, the secondary material is necessarily limited to a particular perspective while possibly
giving the student the illusion of thoroughness. The Brown Storyspace Cluster, a collection of
several hundred hypertext and hypermedia Webs, gathers informational materials, fiction, and
poetry created by Brown faculty members, including Robert Arellano, Robert Coover, George
Landow, Massimo Riva, and students in their courses between 1992 and the present.
As poets, novelists, and scholars continue to work in hypertext and networked publications,
future critics and readers will need to become increasingly familiar with this technology. Primary
networked sources will thus pressure the academy to undertake new kinds of training in order to access the
new sources of creative expression.
Several years ago Stevan Harnad, then at Princeton, founded a ground-breaking journal
called Psycoloquy, designed to be a peer-reviewed journal of the highest caliber but available only
online. The reasons for this were widely broadcast by Harnad: journals in his field were terribly
expensive; the field in question, cognitive neuroscience, was of interest to very few researchers
worldwide; review of submitted articles could take months, with this delay often dating the article by the
time of publication; printed journals were difficult to come by in schools without large budgets; and
the nature of printed journals required that they accumulate in discrete volumes, rendering difficult
a more dialogic interaction of scholars with the ideas presented in their pages.
Psycoloquy has been a reasonably successful venture. The turnaround time for editorial
review of submitted articles is six weeks; the scholars vetting submissions are distinguished in their
fields; the subscription is free to anyone with an Internet connection; the articles are indexed, and
scholars can respond to individual articles with their responses attached to the original article to
stimulate additional responses. Scholarly electronic publishing continues to show strong growth, with
faculty collaborating to disseminate new research online (Bailey).
Many electronic journals are published by individual scholars or by scholarly societies rather
than by academic publishers. Some journals, like
Critical Inquiry, an interdisciplinary journal for the
arts and humanities, publish only tables of contents and excerpts of current and back issues. Others
focus on one aspect of the traditional scholarly journal:
History Reviews On-Line is a quarterly
electronic journal devoted to reviewing books on all fields of history. The International Philosophical Preprint Exchange is a service provided by the Department of Philosophy, Chiba University, and
an international working group comprised of scholars at the University of Toronto, York University,
the University of Missouri, and the University of Alberta. The Exchange provides preprints of articles
and voluntary peer review, as well as information on eventual publication or presentations.
More and more journals are offering the complete text of articles online. The
Slavic Review publishes an electronic post-print edition of the print journal, with articles appearing on its Web
site several months after print publication. Bryn Mawr Classical Review, founded by James O'Donnell at the University of Pennsylvania and designed for publication on the Internet, publishes
the complete text of articles. Early Modern Literary
Studies, a refereed journal for English
literature, literary culture, and language during the sixteenth and seventeenth centuries, features
complete articles as well as an online readers' forum and links to related Internet resources in the field.
The Journal of Buddhist Ethics is also designed specifically for online publication; in addition
to publishing peer-reviewed scholarship, it hosts online conferences and offers the full text of the
Pali Canon online to which scholars may link in their articles.
As of this writing, the number of electronic journals exceeds 1,700 and is increasing at a rapid
pace each year. Many, like Psycoloquy and
History Reviews On-Line, are new entries in reaction to the slow
and costly printed serial publications. Publishers of traditional journals have been slow to warm to
the idea of electronic publishing because of the uncertainty of revenue and also because the
medium is new, with a predisposition for transience: machine and software obsolescence, uncertainty
of funding, and the uncertain status of electronic publishing as legitimate work for tenure
consideration all contribute to their doubts.
Nonetheless, academic presses are beginning to offer full-text journals online for a
subscription fee, and it is likely that the future of serial publication will be predominantly a networked
enterprise. Project Muse provides worldwide, networked, subscription access to the full text of over 40
Johns Hopkins University Press scholarly journals in the humanities, social sciences, and
mathematics. JSTOR proposes to build a reliable and comprehensive archive of important scholarly
journal literature, improve access to these journals, fill gaps in library collections of journal back
issues, address preservation issues such as mutilated pages and long-term deterioration of paper copy,
and assist scholarly associations and publishers in making the transition to electronic modes
A remarkable amount of commercial software is currently available. Tools for handling text,
data, and images can often be tailored to support a range of humanities research projects. BRS/Search
and similar programs offer powerful free-text search capabilities; they are proving useful in many
areas, including literary and linguistic scholarship. IdeaList, a free-text database manager, is popular
among historians; Access and other database packages are also widely used. Additional programs,
including Collate and Concorder, have emerged from the innovations of individual scholars. An
ongoing critical appraisal of software would be of significant value to humanist scholars; the number
of available packages and diversity of application areas make this task well beyond the scope of
Many historical applications demand the power of relational databases to handle the
complex interrelation of the data. A recognized standard language for such databases, the Structured
Query Language (SQL), is used in such management systems as INGRES and Oracle. Numerical
analysis was one of the first applications of computers in historical and archaeological research; the
Statistical Package for the Social Sciences (SPSS) continues to be widely used and is well supported
by computing services, with microcomputer versions now also available. A wide variety of
modelling and graphical packages are also in use, some of which, such as ARC/Info, integrate
advanced mapping with relational database facilities. Commercial software, even when it can be
customized, does not meet all the needs of scholars and in some cases it is important to write special
software for humanities applications.
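The kind of relational query described above can be illustrated with a minimal sketch. It uses the sqlite3 module from the Python standard library as a stand-in for systems such as INGRES or Oracle; it is not drawn from any project discussed here. The table names, column names, and records are hypothetical and were invented solely for illustration.

```python
import sqlite3


def build_sample_database():
    """Create an in-memory database with two related tables.

    All table names, column names, and records are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parishes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE persons (
            id INTEGER PRIMARY KEY,
            surname TEXT NOT NULL,
            occupation TEXT NOT NULL,
            parish_id INTEGER NOT NULL REFERENCES parishes(id)
        );
    """)
    conn.executemany(
        "INSERT INTO parishes (id, name) VALUES (?, ?)",
        [(1, "St. Mary"), (2, "St. Giles")],
    )
    conn.executemany(
        "INSERT INTO persons (surname, occupation, parish_id) VALUES (?, ?, ?)",
        [("Ash", "weaver", 1), ("Birch", "weaver", 1),
         ("Cole", "smith", 1), ("Dunn", "weaver", 2)],
    )
    conn.commit()
    return conn


def occupations_by_parish(conn):
    """Join the two tables and count persons per parish and occupation."""
    query = """
        SELECT parishes.name, persons.occupation, COUNT(*) AS n
        FROM persons
        JOIN parishes ON persons.parish_id = parishes.id
        GROUP BY parishes.name, persons.occupation
        ORDER BY parishes.name, persons.occupation
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in occupations_by_parish(build_sample_database()):
        print(row)
```

The join is what distinguishes this from a flat file: each person is recorded once, each parish once, and the relation between them is expressed by a shared key rather than by repeating the parish name in every record.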
The exchange of textual information poses problems, some of which can be alleviated by the
use of standard conventions. Many electronic texts are now encoded using Standard
Generalized Markup Language (SGML), which is the International Organization for Standardization (ISO) convention for the
description of electronic documents. Before this standard was accepted, electronic texts were often
encoded using idiosyncratic systems, which meant that they could not be exchanged among scholars.
The pace of conversion of text to electronic form has accelerated rapidly, and the Text Encoding Initiative was established in order to address the problem of a proliferation of differently encoded
texts (McCarty). The objective of this international project is to develop guidelines for the preparation
and interchange of machine-readable texts for scholarly research. The first guidelines, produced in
1990, suggested that SGML be adopted by scholars for the encoding of electronic text. SGML provides
a syntactic framework within which markup or encoding tags can be defined using only
those characters that can be transmitted over networks and read by most computers. The
application of SGML is not limited to textual resources and can be extended to other disciplines; for
instance, it is already in use for the encoding of music. If these standards gain general acceptance, the
encoding of new data and the interchange of existing data will be positively revolutionized. Encoded
Archival Description (EAD) is being developed to work in tandem with SGML. Unicode, a
platform-independent character-encoding scheme that accommodates languages written in non-Latin scripts (including Greek, Russian,
Arabic, Hebrew, Japanese, Korean, and Chinese), is also under development as an international
standard to enhance access to computerized information.
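A brief sketch may make the idea of descriptive markup more concrete. The Python standard library has no SGML parser, so the example below uses XML, a later and more restricted profile of SGML, as a stand-in. The element names are hypothetical, only loosely modelled on verse markup, and should not be read as the Text Encoding Initiative guidelines themselves; the sample text is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tags describe what each unit of text is
# (a stanza, a line), not how it should look on screen or on paper.
SAMPLE = """<poem>
  <title>Sample poem</title>
  <lg n="1">
    <l n="1">First line of a placeholder stanza</l>
    <l n="2">Second line of a placeholder stanza</l>
  </lg>
  <lg n="2">
    <l n="3">A line from a second placeholder stanza</l>
  </lg>
</poem>"""


def extract_lines(markup):
    """Return (stanza number, line number, text) for every tagged line."""
    root = ET.fromstring(markup)
    result = []
    for stanza in root.iter("lg"):
        for line in stanza.findall("l"):
            result.append((stanza.get("n"), line.get("n"), line.text.strip()))
    return result


if __name__ == "__main__":
    for stanza_no, line_no, text in extract_lines(SAMPLE):
        print(f"stanza {stanza_no}, line {line_no}: {text}")
```

Because the tags consist only of ordinary characters, the same file can be exchanged between machines and processed by any program that understands the tag set, which is the point of a shared encoding convention.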
SGML has sparked the creation of a new generation of text-handling software. Dynatext,
which makes use of SGML to provide sophisticated textual analysis and manipulation, is typical of
these recent products: it accepts SGML-encoded text directly and creates electronic texts,
automatically building a full-text index and establishing hypertext links for tables, figures, footnotes, and
cross-references. A browsing facility allows users to read, query, and annotate the text; full-text
searching is also supported. Style sheets use SGML tagging to define the format for on-screen display or
printed documents. This system is not without its limitations; chief among them, for humanists, is cost.
Many of the most powerful and advantageous software tools are priced far beyond the means of
most scholars, even when generous educational discounts are provided. Panorama, a Web
browser for SGML-encoded texts, provides the same functionality as Dynatext for documents
published over the Internet.
Researchers use text-handling software to carry out complex analyses that would be tedious
and time-consuming if attempted manually. Many such packages are now available to scholars: one
of the best known is the Oxford Concordance Program (OCP) <OCP@VAX.OX.AC.UK>, originally developed for
mainframe computers but now available in a PC version. OCP is a general-purpose tool suitable for
such applications as stylistic analysis, preparation of language courses, vocabulary acquisition,
dictionary making, text editing, and content analysis. It is widely used in literary research and in other
disciplines (history, for example) where textual analysis is required.
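The basic operation behind a concordance can be sketched in a few lines. The following Python example produces a simple keyword-in-context listing; it is an illustration of the general technique only and does not reproduce the interface or algorithms of OCP. The function name, its parameters, and the sample sentence are our own assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return keyword-in-context entries as (left, keyword, right) tuples.

    `width` is the number of words of context kept on each side.
    Matching ignores case; punctuation is discarded by the tokenizer.
    """
    words = re.findall(r"[A-Za-z']+", text)
    entries = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            entries.append((left, word, right))
    return entries


if __name__ == "__main__":
    sample = "The cat sat on the mat and the dog sat by the door"
    for left, word, right in kwic(sample, "sat", width=2):
        print(f"{left:>12} [{word}] {right}")
```

A production tool would add lemmatization, sorting of entries by context, and reference locations (act, scene, line), all of which are omitted here for brevity.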
Investigation of manuscript traditions and the creation of critical editions from multiple
sources require specialized software. Collate, a program developed by the Computers and
Manuscripts project at the Oxford University Computing Services, helps scholars collate up to a hundred
texts simultaneously; it provides facilities for adjustments of the collation by the user and can
generate output in many different formats, including several recommended by the Text Encoding Initiative.
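The core of collation, aligning two versions of a text and recording where they differ, can be sketched with the difflib module from the Python standard library. This is a pairwise, word-level illustration under our own assumptions; it is not the method used by Collate, which handles many witnesses at once and allows the scholar to adjust the alignment. The function name and the sample readings are hypothetical.

```python
import difflib


def collate(base, witness):
    """List word-level variants between a base text and one witness.

    Each variant is (kind, base reading, witness reading), where kind is
    'replace', 'delete', or 'insert' as reported by difflib.
    """
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


if __name__ == "__main__":
    base = "the quick brown fox jumps"
    witness = "the quick red fox jumps"
    for kind, base_reading, witness_reading in collate(base, witness):
        print(f"{kind}: '{base_reading}' -> '{witness_reading}'")
```

Extending this to a full apparatus would require normalizing spelling before comparison and merging the pairwise results for all witnesses, decisions that are editorial as much as computational.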
A number of simple-to-use hypertext authoring systems are now available, such as
HyperCard from Apple. Instructors can use these programs to compile their own hypertext
courseware. Authoring tools which offer a "shell" for the production of multimedia materials are also widely
used: these include Macromedia, Storyspace (Eastgate), and Toolbook (Asymetrix).
Summary and Outlook
This synopsis indicates that much research is underway involving the creation of new resources,
the conversion of conventional sources into machine-readable form, and the analysis of material
to answer scholarly questions. Computers serve different functions at different stages in the process
of research. The current model of raw materials of scholarship includes text, data, sound, and
images, but there has been a growing attempt by humanists to use computers to model knowledge.
Expert systems are being used by some researchers along with other Artificial Intelligence
techniques (neural networks, for example) to formalize knowledge and automate the process of
interpretation. While projects in this area have generated lively debate, few have produced useful results to
date. However, they do show promise and are likely to become increasingly important in the