American Council of Learned Societies
Occasional Paper No. 41


Computing and the Humanities:
Summary of a Roundtable Meeting


III. SOFTWARE AND STANDARDS DEVELOPMENT


Michael Joyce contrasted conventional and computer-based interactions: "Here we are visible within an invisible technology. Elsewhere we are invisible within a visible technology." Such characterizations help to frame questions about why humanists want software, what they want it to do, and how they want it to operate.

Computing and communications can affect what humanists work with—the presentation, representation, and cumulation of content—and how they do their work—the tools and methodologies. In humanities as elsewhere, software and standards drive available capabilities. Bruce Schatz noted that humanities applications benefit from the propensity of humanists to read, illustrated by a history of efforts to build electronic corpora and concordances and the building of systems to support analysis across multiple sources. He pointed to the emergence of more potent capabilities: "The era of comparative literature being technically possible for large bodies of humanistic material is coming very fast. It is not being pushed by humanities at all, but they are likely to be the greatest users."

Roundtable participants agreed on the need for more and better tools for creating, browsing, storing, finding, and retrieving complex objects with varying degrees of structure, from document-like objects to databases and combinations of the two. For example, Joseph Busch remarked that "databases are not necessarily oriented to functioning in new ways, and software is needed to make databases interoperate more with document-like objects." Databases, networks, and storage systems can enable wider overall participation in humanities activities.

Michael Neuman pointed to the need for access to the components of works under examination, not merely to the works as items.

My background is in English literature. In Othello, as the play moves to its tragic conclusion, there is a stage direction that says "enter Othello and Desdemona in her bed." Well, a scholar asks how, on a stage without curtains, does a character enter in a bed? In what other plays can we find beds that appear in stage directions? So we would need not only access to a rich corpus of dramatic data, but also the ability to search for the word "bed" within the context of a structural component, namely the stage direction. Simple text searching will not be sufficient. The number of hits on the word "bed" would be monumental. So we need something greater than item-level access. We need component-level access.
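Neuman's contrast between simple text searching and component-level access can be illustrated with a minimal sketch. It assumes a corpus encoded in a simplified, TEI-like XML in which stage directions are marked with a stage element and verse lines with an l element. The sample markup, the function name find_in_component, and the absence of XML namespaces are all assumptions made for illustration; they do not describe any particular edition or tool discussed at the roundtable.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, hand-made sample in simplified TEI-like markup.
# Real TEI files use a namespace and far richer structure.
SAMPLE = """
<text>
  <body>
    <div type="scene" n="5.2">
      <stage>Enter Othello, and Desdemona in her bed.</stage>
      <sp who="Othello">
        <l>It is the cause, it is the cause, my soul.</l>
      </sp>
      <sp who="Desdemona">
        <l>Will you come to bed, my lord?</l>
      </sp>
    </div>
  </body>
</text>
"""


def find_in_component(xml_text, tag, word):
    """Return the text of every <tag> element containing `word` as a whole word."""
    root = ET.fromstring(xml_text.strip())
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for element in root.iter(tag):
        # Collect nested text and normalize runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if pattern.search(text):
            hits.append(text)
    return hits


# Component-level search: only stage directions are considered.
stage_hits = find_in_component(SAMPLE, "stage", "bed")

# The same word inside spoken lines is a separate result set.
line_hits = find_in_component(SAMPLE, "l", "bed")

# Simple text search over the whole sample cannot tell the two apart.
plain_hits = len(re.findall(r"\bbed\b", SAMPLE, re.IGNORECASE))

print(stage_hits)
print(line_hits)
print(plain_hits)
```

Restricting the search to one structural component is what keeps the number of hits manageable: the plain search counts every occurrence of the word, while the component search returns only those that fall inside stage directions.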

Visual and audio elements round out the potential represented by textual material and may drive new directions in scholarship. Mused Stephen Franklin:

If you go back 150 years, you can argue that the predominant forms of literature are textual: plays and poetry. I think that when the infrastructure gets there, Ed Ayers is going to start having oral material directly, rather than its representation in text. There is an enormous emotional impact. [There have been problems in ethnomusicology, for example, where] people could not sell enough CDs to justify even private pressing. So I do see, along with Chuck Henry, an enormous opportunity for democratization in terms of the publication.

Beyond consideration of tools and treatment for specific kinds of media is telepresence, which promises to eliminate costs of transport and storage, as well as the ability to control ideas and material. Other complex systems will shape perceptions of what is possible, moving expectations beyond those that can be characterized as "automation." Willard McCarty acknowledged the difficulty of conceptualizing new possibilities.

The idea that a humanist should be able to get in the car and drive from point A to point B, as he or she has always done, rather than first fix the engine and perhaps manufacture a part so that it can work before it can drive from A to B, is fundamentally mistaken. What happens when you have a new form of transportation is you do not drive from A to B anymore because all of a sudden you can go to C, D, E, and F. So the whole world changes around you. While we still have people who only want to go from A to B, there must also be people who are losing time in that sense of struggling, of observing, of musing about and experiencing the most important thing of all, how the world has been changed by the technology. The derivation of a common language is from a group of people who march into the open spaces that Michael Joyce talked about and set up camp there—not from people who sit in one armed camp or another and receive envoys and go back and forth between these different realms, but from people who set up in the new realm and begin to operate in the common language.

At issue in this discussion is evolving thinking about digital libraries and the nature of information systems generally. The digital library concept has broad appeal; in part because of the practical realities that digital library projects represent, they are the focus of a major United States government research and demonstration initiative, and they represent a logical, comprehensible advance over conventional libraries.[5] Thus Joseph Busch declared that "operating in the information society means having access to a content-rich and context-rich distributed digital library." Library research today illustrates the difficulties in bridging electronic and paper worlds. As Jerome Saltzer noted, "if half of your material is on the computer and the other half is in the form of paper, that does not work nearly as well as working entirely on paper or entirely on a computer system." More sophisticated tools, such as those for story telling, pose even greater challenges. Beyond finding and reading the bits, problems of interpreting them are common across different kinds of data, remarked Jerome Saltzer, and are emblematic of the historian's challenge.

Shared artifacts and information were part of the rationale for building digital libraries. Future projects may involve spaces for collaboration as well as for artifacts, and evolving perspectives about what people want to do and how this will affect the design and architecture of systems.

Edward Fox proposed a framework for assessing system and software possibilities:

Fundamentally, we deal with four constructs here: streams, structures, spaces, and scenarios. In the world of streams, we have spent a long time talking about text. Now we are moving to multimedia, oral, and other kinds of representations. In both the "real" world and virtual new worlds, we have analog streams and bit streams. Structures is another area. We sometimes talk about syntax as an example, and we have spent a lot of time focusing on databases. In recent years there has been a focus on what is called unstructured information which is really not unstructured: an example is the book, which along with other kinds of literary works has tremendous structure. We call them unstructured because we do not know how to deal with them properly. The third aspect here is spaces. We are shifting, in part, from physical spaces to virtual spaces—in part, from term spaces to concept spaces. We have also heard today about methodological spaces, emotional spaces, collaborational spaces, and cultural spaces. The fourth aspect is scenarios. This gets into the shift from passive to active. On the computer side, we talk about functions and tasks. But in the humanities we speak of stories and of interactions, and these are other kinds of scenarios.

The very concept of structure implies decision-making—what kind of structure will be designed—which in turn raises questions about who makes what decisions and who interprets what for whom. Humanists underscored the importance of finding ways to engage users of humanities information in building structures for content and analysis, observing that this did not happen often enough in the past. Today technology is making it easier for users to populate, update, and maintain databases, and infrastructures are needed to support such user roles.

Pros, Cons, and Experience with Standards

Standards can define functions, processes, or services. Humanist applications have advanced with the use of encoding and cataloguing/classification standards (Text Encoding Initiative [TEI], Standard Generalized Markup Language [SGML], Hypertext Markup Language [HTML], and Z39.50[6]), but these same standards have limitations recognized by both humanists and technologists. Standards can extend to a variety of procedural or administrative functions, including the management of rights and permissions associated with intellectual property protection and more generally the use of content and services for which there are restrictions, fees, or other conditions.

Standards are a necessary evil or a mixed blessing, depending on one's particular perspective. Humanists, like other users of information technology, asked for standards in the hope of lessening the distractions of figuring out how to work with the technology. Technologists, mindful of how fast information technologies evolve (and, possibly, more tolerant of such change), argue for less emphasis on standards. Standards have the obvious appeal of facilitating the sharing of information. They also minimize the learning burden. At the same time, standards pose the risk of confining users to an obsolete technology or a lowest common denominator because of the difficulties and delays associated with setting standards.

Waves of innovation and ensuing system obsolescence in consumer and professional electronics make clear the challenge of preserving access to humanities material over time. Part of the problem relates to the changing base of support for digital formats. Michael Lesk, for example, reported that only half of the ten word-processing programs advertised in Byte magazine in 1985 remained on the market in 1995. In the field of music, Camilla Cai related the demise of reel-to-reel recording equipment to the problem of maintaining access to recorded information over time after those recorded die or become otherwise unavailable. The problem is compounded, noted Joseph Busch, by the intrinsic concern of the humanities with history: the age of documents, for example, has been a long-standing area of concern. William Wulf explained that some scientific data also have properties of uniqueness—certain satellite data, for example, are not reproducible—and tend to be more concentrated than historical data.

The prospect of an evolving technology base implies a need to prepare to move beyond a given standard. In the library and archiving context, observed Lesk, the task of preserving information and content over time implies planning to refresh formats and other elements subject to standards. It is unreasonable to expect a single standard to endure. In the context of technological support for working with digital content, Mary Shaw suggested planning for technology to reconcile conflicting standards as an accompaniment to technology for helping to find content. "There are going to be different standards whether we like it or not. While it is worth investing in shared standards because that reduces the mayhem, we must, at the same time, learn to live with different standards because that is clearly unavoidable."

Edward Fox suggested considering actual experiences:

The TEI [Text Encoding Initiative] was a noble effort. What has happened to it? Where is it going to go? That has to be something that this group debates. It was the most rigorous investigation and representation of information that we have had in the last decade. It has not filtered into the computer science world with thinking and knowledge representation.

Michael Neuman offered a more positive characterization of the TEI process but acknowledged a difficulty: "[the] determination of the components of text-based genres and how they could be represented using Standard Generalized Mark-up Language so that those details, given a large database, could be ferreted out, combined, juxtaposed." Neuman indicated that regardless of how well TEI served to encode humanities material, its value would not be realized so long as the tools currently available to deal with it remain intractable. He seemed to question whether the TEI as an application of SGML had any useful application to computer science work with knowledge representation, and what the prospects were for scientists and humanists working together to produce more usable tools for making TEI more widely available.

Usefulness: Connect Capabilities to Real People

How much of a given content is available in a system is a critical element of its usefulness. As Joseph Busch observed about the objects of humanist activity,

. . . the most important condition necessary for humanists to work is a real critical mass of digital content, [which] consists not just of text but of pictures, sounds, and multimedia source materials, and the links between them.

Computer scientists and humanists alike talk of getting at the "good stuff," but appraising and deciding on what that includes may be subjective and variable over time. The selection challenge is compounded: not only content, but format, tools, and approaches for finding, filtering, and indexing all involve selection. Especially subjective may be selection of what gets preserved in a context of prolific production of content made easier via computer technology. Information technology offers humanists and others support for more inclusive collections as well as the threat of overload. Although technological support for information finding and filtering is growing, there are philosophical issues about the design of useful information repositories that affect both the resources available to humanists and the design and architecture of systems by computer scientists.

Bruce Schatz noted that

. . . the hardest thing about doing information-style projects like digital libraries is getting enough data coverage. . . . The problem with most of the electronic projects is that the information they can get electronically in an appropriate form is such a narrow segment of the actual information that people use that the system is illustrative but it is not actually useful.

Other elements of usefulness pertain to how a system can be used. There are lessons to be learned from other domains. Bruce Schatz, for example, described how an innovative system he developed for molecular biologists studying nematode worms generated elements echoed in the larger human genome projects. "But the system itself withered and died because the things it did were illustrative of what the future was like but not immediately useful in the present." Schatz noted that objectives and activities may differ depending on the target users: practitioners, professionals, students, the general public.

Camilla Cai lamented the practical problems of learning how to use computer-based systems. Both the initial learning of a new system and the periodic need to learn additional systems add up to time lost. Cai observed that some of her colleagues avoided using computers in order to avoid "wasting time." Other humanists echoed this concern with anecdotes about personal or departmental frustrations in adopting or upgrading information technology. These observations attest, in part, to a broad need for better user interfaces, which would facilitate adoption of computer-based technology by humanists for scholarship and for outreach to the general public (for example, systems aimed at museum patrons).


