The introduction of any kind of new technology is often a painful and time-consuming process, at least for those who must incorporate it into their everyday lives. This is particularly true of computing technology, where the learning curve can be steep, what is learned changes rapidly, and ever more new and exciting things seem to be perpetually on the horizon. How can the providers and consumers of electronic information make the best use of this new medium and ensure that the information they create and use will outlast the current system on which it is used? In this chapter we examine some of these issues, concentrating on the humanities, where the nature of the information studied by scholars can be almost anything and where the information can be studied for almost any purpose.
Today's computer programs are not sophisticated enough to process raw data sensibly. This situation will remain true until artificial intelligence and natural language processing research has made much more progress. Early on in my days as a humanities computing specialist, I saw a library catalog that had been typed into the computer without anything to separate the fields in the information. There was no way of knowing what was the author, title, publisher, or call number of any of the items. The catalog could be printed out, but the titles could not be searched at all, nor could the items in the catalog be sorted by author name. Although a human can tell which is the author or title from reading the catalog, a computer program cannot. Something must be inserted in the data to give the program more information. This is a very simple example of the need for markup, or encoding, which makes computers work better for us. Since we are so far from having the kind of intelligence we really need in computer programs, we must put that intelligence in the data so that computer programs can be informed by it. The more intelligence there is in our data, the better our programs will perform. But what should that intelligence look like? How can we ensure that we make the right decisions in creating it so that computers can really do what we want? Some scholarly communication and digital library projects are beginning to provide answers to these questions.
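The catalog problem described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the records are invented, and the field names and the "|" and "=" delimiters are a hypothetical markup scheme, not the format of any real catalog. It shows only that once the fields are marked, a program can search titles and sort by author, which it cannot do with the unmarked line.

```python
# An unmarked entry: a human can guess the fields, a program cannot.
# (Invented record, for illustration only.)
UNMARKED = "Smith Ann A History of Printing Example Press Z100 .S65"

# The same kind of entry with hypothetical markup: each field is named,
# fields are separated by "|", and names are separated from values by "=".
MARKED_UP = [
    "author=Smith, Ann|title=A History of Printing|publisher=Example Press|call_number=Z100 .S65",
    "author=Jones, Mary|title=Reading Medieval Manuscripts|publisher=Sample Books|call_number=Z105 .J66",
    "author=Brown, Tom|title=Printing and the Scholar|publisher=Example Press|call_number=Z124 .B76",
]


def parse_record(line):
    """Split one marked-up line into a dict mapping field name to value."""
    return dict(field.split("=", 1) for field in line.split("|"))


def search_titles(records, word):
    """Return the records whose title contains the word, ignoring case."""
    word = word.lower()
    return [r for r in records if word in r["title"].lower()]


def sort_by_author(records):
    """Return the records ordered alphabetically by author name."""
    return sorted(records, key=lambda r: r["author"])


if __name__ == "__main__":
    records = [parse_record(line) for line in MARKED_UP]
    for record in sort_by_author(records):
        print(record["author"], "/", record["title"])
    print(len(search_titles(records, "printing")), "titles mention printing")
```

The design point is that the intelligence sits in the data rather than in the program: the functions are trivial, and they work only because each field has been explicitly identified.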
New Technology or Old?
Many current technology and digital library projects use the new technology as an access mechanism to deliver the old technology. These projects rest on the assumption that the typical scholarly product is an article or monograph and that it will be read in a sequential fashion as indeed we have done for hundreds of years, ever since these products began to be produced on paper and be bound into physical artifacts such as books. The difference is that instead of going only to the library or bookstore to obtain the object, we access it over the network, and then almost certainly have to print a copy of it in order to read it. Of course there is a tremendous saving of time for those who have instant access to the network, can find the material they are looking for easily, and have high-speed printers. I want to argue here that delivering the old technology via the new is only a transitory phase and that it must not be viewed as an end in itself. Before we embark on the large-scale compilation of electronic information, we must consider how future scholars might use this information and what are the best ways of ensuring that the information will last beyond the current technology.
The old (print) technology developed into a sophisticated model over a long period of time.1 Books consist of pages bound up in sequential fashion, delivering the text in a single linear sequence. Page numbers and running heads are used for identification purposes. Books also often include other organizational aids, such as tables of contents and back-of-the-book indexes, which are conventionally placed at the beginning and end of the book respectively. Footnotes, bibliographies, illustrations, and so forth, provide additional methods of cross-referencing. A title page provides a convention for identifying the book and its author and publication details. The length of a book is often determined by publishers' costs or requirements rather than by what the author really wants to say about the subject. Journal articles exhibit similar characteristics, also being designed for reproduction on pieces of paper. Furthermore, the ease of reading printed books and journals is determined by their typography, which is designed to help the reader by reinforcing what the author wants to say. Conventions of typography (headings, italic, bold, etc.) make things stand out on the page.
When we put information into electronic form, we find that we can do many more things with it than we can with a printed book. We can still read it, though not as well as we can read a printed book. The real advantage of the electronic medium is that we can search and manipulate the information in many different ways. We are no longer dependent on the back-of-the-book index to find things within the text, but can search for any word or phrase using retrieval software. We no longer need the whole book to look up one paragraph but can just access the piece of information we need. We can also access several different pieces of information at the same time and make links between them. We can find a bibliographic reference and go immediately to the place to which it points. We can merge different representations of the same material into a coherent whole and we can count instances of features within the information. We can thus begin to think of the material we want as "information objects."2
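Two of the operations just mentioned, searching for any word or phrase and counting instances of a feature, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular retrieval software: the sample passage is invented, and the function names are hypothetical.

```python
import re

# Invented sample passage standing in for a searchable electronic text.
SAMPLE_TEXT = (
    "The library holds printed books. The books are bound. "
    "Scholars search the library catalog for books."
)


def find_phrase(text, phrase):
    """Return the character offsets where the phrase occurs, ignoring case."""
    pattern = re.escape(phrase)
    return [match.start() for match in re.finditer(pattern, text, re.IGNORECASE)]


def count_word(text, word):
    """Count whole-word occurrences of the word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))


if __name__ == "__main__":
    print(find_phrase(SAMPLE_TEXT, "the library"))
    print(count_word(SAMPLE_TEXT, "books"))
```

Neither operation depends on an index prepared in advance by the author or publisher, which is the contrast with the back-of-the-book index drawn above.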
To reinforce the arguments I am making here, I call electronic images of printed pages "dead text" and use the term "live text" for searchable representations of text.3 For dead text we can use only those retrieval tools that were designed for finding printed items, and even then this information must be added as searchable live text, usually in the form of bibliographic references or tables of contents. Of course most of the dead text produced over the past fifteen or so years began its life as live text in the form of word-processed documents. The obvious question is, how can the utility of that live text be retained and not lost forever?
Electronic Text and Data Formats
Long before digital libraries became popular, live electronic text was being created for many different purposes, most often, as we have seen, with word-processing or typesetting programs. Unfortunately this kind of live electronic text is normally searchable only by the word-processing program that produced it and then only in a very simple way. We have all encountered the problems involved in moving from one word-processing program to another. Although some of these problems have been solved in more recent versions of the software, maintaining an electronic document as a word-processing file is not a sensible option for the long term unless the creator of the document is absolutely sure that this document will be needed only in the short-term future and only for the purposes of...