THE SCOPE OF VIRTUAL LIBRARIES
SPECIFIC MATERIALS IN VIRTUAL LIBRARIES
CREATING AND MAINTAINING VIRTUAL LIBRARIES
SUPPORT TOOLS IN VIRTUAL LIBRARIES
The creation of the Information Highway presents
libraries with a critical challenge. The dream of the librarian, to offer
information on all subjects, to all people, has suddenly come true, on
an infinite scale. Any person with access to Internet has the same possibility
of finding information as any other; age, education or location are no
longer relevant. It is also easy to create a home page and publish worldwide.
Not only is Internet highly democratic, it is also still largely free.
Internet constitutes a major social revolution, on the level of printing
or television. It has arisen suddenly; WWW protocols were only established
in 1992. A new vocabulary has sprung up: URLs, home pages, search engines
and Virtual Libraries. There is, however, no consensus as to the use of
the latter term. In some libraries, Internet Access Rooms are called Virtual
Libraries. At the other extreme, some consider the Internet itself a huge
Virtual Library. This paper attempts a satisfactory definition of a Virtual
Library, which should maintain the characteristics which enabled traditional
libraries to serve society so well. It should be easy to access and free
of direct usage charges. It should offer carefully selected documents,
organized for ease of use. Materials for which libraries are already acknowledged
sources will continue to be the principal components of virtual libraries.
Libraries offer books, periodicals, theses, patents, multimedia, CD-ROMs,
data bases, information services etc. These are evaluated in terms of
the new media. The perfect document for a virtual library would be of great
interest to a wide range of people, relatively brief, already available
in electronic form, and free of copyright or similar restrictions, but
only a small number of documents can now fulfill all those requirements.
Certain materials should change significantly due to the impact of Internet;
scientific journals, for instance, may be transformed within a few years.
Electronic media offer facilities unknown in traditional libraries, notably
interactivity and modifiable documents; these factors will also be evaluated.
Although Internet is still basically free for users, a Virtual Library
is relatively expensive to create and maintain. Texts have to be selected,
scanned, verified and indexed. Copyright holders have to be contacted and
rights obtained. A powerful server is required, supported by specialized
software and personnel. The site has to be attractive and organized for
ease of use. The server must operate all day, every day. Just like a traditional
library, the virtual counterpart should constantly add new materials. The
Information Highway is international, but computers are located in countries
with different laws, customs, and attitudes towards free expression. The
conditions under which Virtual Libraries will be best able to serve their
societies will be delineated and existing services evaluated. Libraries
have long been the best public source of reliable, recorded information,
available for the creation of other documents and the expansion of knowledge.
These principles must be successfully transferred to the Information Highway,
so that Virtual Libraries can serve society through the coming millennium.
The creation of the Information Highway presents libraries and information services with a critical challenge as the world enters the new millennium. The dream of the librarian, to offer information on all subjects, to all people, has suddenly come true, on an infinite scale. The Internet offers an infinity of information sources, many of which are attractively packaged and easy to use. Any person with access to Internet has the same possibility of finding information as any other; considerations of age, education or location are no longer relevant. The dream of the publisher, to be able to publish without boundaries, has also become reality. Anybody can have a site on the Internet and publish more or less whatever they want to readers worldwide. And, as if this super-democracy were not enough, Internet can still be used largely free of charge and users are strongly opposed to tariffs. There is not even an initial consensus as to the best way to impose charges, let alone how to divide up the proceeds.
Internet clearly constitutes a major social revolution,
certainly of equal impact to the invention of newspapers or television,
quite possibly on the level of printing and the industrial revolution.
It has come upon us very suddenly; World Wide Web protocols were only established
in 1992, the first graphical browsers the year after. Within this brief
time span professionals have been forced to absorb a new vocabulary: URLs,
home pages, search engines and gateways.
Amongst the vocabulary of greatest interest to professionals in the library and information science field are terms such as "Digital Library", "Electronic Library" and "Virtual Library", which provide a starting point for this discussion paper (Birdsall, 1994; Chepesiuk, 1997; Saunders, 1996). These are all relatively new and cover a variety of activities, but they basically denote services which offer electronic access to texts. A library which offered traditional hard copy books and periodicals could not be described in these terms, even if it had an automated catalog or even an automated union catalog. There is a certain difficulty in distinguishing between the three terms, "Digital / Electronic / Virtual Library", which often seem to be used interchangeably (Saffady, 1995). Logically, it could be suggested that a digital library should contain traditional documents which have been digitized for electronic use, whereas an electronic library contains documents which were created directly in electronic form. This distinction may sound attractive at first sight, but is clearly difficult to maintain in practice. One is reminded of the knots into which catalogers tied themselves, trying to distinguish between original, microfilm and xerographic copies of the same text. It seems fairly clear that "Digital Library" is the preferred term in the United States and Canada, whereas "Electronic Library" is more generally used in Britain (Rowley, 1996). It is sad that local usage of this nature should arise so rapidly in relation to services which can be accessed world-wide. One might have hoped that Internet would promote a standardization of terminology, bringing people together, rather than creating additional divisions between them. The most sensible thing would be to avoid this type of problem altogether by adopting the term "Virtual Library", which would presumably include documents of both types, generated in areas of different linguistic influence.
Again, terminological problems arise, as there are various shades of meaning to the term "Virtual Library". Some libraries have created special rooms from which patrons are able to use Internet; access to the library's CD-ROM collection is usually also offered (Commings, 1997). These rooms are often called "Virtual Libraries", but it would clearly be more precise to use the term "Internet Access Room" for services of this type.
At the other extreme, some internauts consider the World Wide Web itself a huge Virtual Library. They check its catalog at gateways such as Yahoo or Infoseek, then go on to use its indexes at search engines such as Altavista or Lycos. A variation on this is to consider Yahoo a Virtual Library, because it organizes and selects its sites. There is a definite logic in this usage, because libraries, Internet and Yahoo can all be broadly categorized as large-scale public information access systems. Librarians will be flattered that a segment of the public might want to make such a strong parallel between the most modern large-scale public information access systems, Internet or Yahoo, and the most established, the library. However attractive this usage might be, it does raise a specific problem. If the Internet as a whole is considered a "Virtual Library", what name should be given to an Internet site which offers access to materials and services of a type similar to those found in traditional libraries? If a site offers electronic versions of books, periodicals and other materials traditionally offered by libraries, as well as reference and information services, it would clearly be best to term it a "Virtual Library". As Internet as a whole has its own, highly distinctive and fully established name, so distinctive that it is normally written with a capital letter, it would be better to use "Internet" for the entire system and reserve "Virtual Library" for library-style services within Internet. The same applies to Yahoo, for which distinctive terms, gateway or directory, already exist.
This also solves any possible problems which might arise in distinguishing between Virtual Libraries and Virtual Archives. Up to now documents originally produced in multiple copies, by printing, were stored in libraries, while documents produced in one copy, in manuscript, went to an archive. But electronic systems function in a totally different manner; one electronic copy is produced and placed on a server, and a copy is sent to a client's machine whenever the text is requested. Distinctions between numbers of copies which were valid between printed and manuscript works are clearly irrelevant to Internet. But it is both valid and useful to reserve the name "Virtual Library" for a site which offers electronic versions of documents traditionally found in libraries, and "Virtual Archive" for sites which offer electronic access to the type of document traditionally found in archives.
It is therefore valid to recommend the term "Virtual Library" from the range of alternatives available. It is also clear that a Virtual Library must maintain the principal characteristics which have enabled traditional libraries to contribute so much to society over the course of several centuries. A good library should be easy to access and free of direct usage charges; it should offer documents which have been professionally selected and organized for ease of use. These basic principles clearly apply both to traditional and virtual libraries; the traditional library should be in a fixed physical location, whereas the virtual library should be accessed via the Information Highway. The materials for which libraries are already acknowledged sources will for some time doubtless continue to be the principal components of virtual libraries. Traditional libraries contain a wide variety of materials, including books, periodicals, theses, patents, multimedia, CD-ROMs etc., and offer access to data bases and information services. Patrons will expect a similar range of services from sites which proclaim themselves as Virtual Libraries (Hurt, 1997; Norbie, 1994; Riggs, 1995).
It is now possible to present a full definition:
a Virtual Library offers, via the Information Highway, easy access, without
direct charges, to professionally selected, organized and processed electronic
texts, emphasizing those documents traditionally found in libraries, such
as books, periodicals and similar materials, together with quality information
services.
THE SCOPE OF VIRTUAL LIBRARIES
The presentation of a list of documents, originally produced in paper formats, which could possibly be included in virtual libraries, such as books, periodicals, theses, patents etc. immediately raises the question as to whether all these items are equally suitable for electronic presentation. If some types of document are more suitable than others, what criteria can be used to rank their suitability for an electronic environment? Here it is necessary to examine some general concepts. Any person, anywhere in the world, would ideally be able to access materials made available in a Virtual Library, but specific institutions incur significant expense in setting up these services and maintaining them on the information highway. It would clearly be more efficient in those circumstances to give a certain priority to documents for which there is a definite, wide interest. These documents would probably be more modern documents, as there is generally more interest in recent books and periodicals, rather than older materials.
The mention of the cost of maintaining virtual libraries brings us to another consideration; costs of converting documents to electronic form and maintaining them on a server are proportional to the length of the document, so there is some advantage in placing briefer documents onto the Information Highway. Computer users do not in general like to read lengthy documents on computer screens. One of the most striking features of the World Wide Web is that the majority of documents available on Internet are quite brief and many users report that their attention span has fallen, as they click from site to site. In these terms it would be sensible to concentrate on brief documents, such as periodical articles and reports, rather than lengthy books and theses.
In order to place a document into a Virtual Library, it must be available in electronic form. In the case of documents currently available on paper, this means submitting the document to a lengthy process of scanning (or even keyboarding), verification, and indexing. The format of the document has to be altered in many cases; traditional documents were not presented with links or jumps in mind, and must be reorganized to take full advantage of the potential of hypertext media. There is therefore a significant advantage in putting modern documents, which already exist in electronic form, into Virtual Libraries. Certain categories of documents are already routinely available in this form: academic documents, scientific and technical periodicals, press agency reports, newspapers and magazines are all electronically generated today. By now the great majority of newly published books in developed countries will also be produced by electronic publishing programs of one type or another.
The considerations presented up to now have all been inherent in the document, but there are other considerations which are imposed by society, of which the most strongly formalized is copyright. Copyright has applied to texts for several hundred years, and the fact that the Information Highway is new does not mean that existing texts can be made available through it without regard to the interests of existing copyright owners. In fact these owners are both organized and vociferous in the defense of their rights, which they hope to see further consolidated within an electronic environment. All indications are that copyright restrictions within an electronic environment will be at least as strict as those in the world of paper based books and magazines. Rights normally only terminate fifty or seventy years after the death of the author; therefore the majority of books written in this century are still covered by copyright restrictions. To place them into a Virtual Library, it is necessary to locate the copyright owner, normally the publisher, and negotiate a fee. There is as yet no standard licensing fee or charge for these rights, but publishers are businessmen who show no tendency to dispose of their rights at reduced prices.
The publishing industry has for the past five hundred years, beginning with the Gutenberg Bible, been based firmly on the production of physical objects for subsequent sale. Publishers have a highly developed sense of commercial awareness and are not going to abandon a tradition of half a millennium because of a system which has less than a decade of existence. At the moment there is no real mechanism for charging users of electronic documents or of Internet in general; in fact many Internet users are firmly against such charges. If charges were instituted, it is not certain how they would be divided and what proportion would accrue to the holder of the copyright. Until questions such as these are settled, it is safe to assume that those who currently gain significant sums of money from selling documents will maintain only a token presence on the Information Highway.
Virtual Libraries will be used by people who are familiar with traditional libraries; therefore they will naturally want to give preference to materials normally found in existing libraries. In practical terms, books, periodicals and similar documents would be prime candidates for Virtual Libraries. Resumes and course notes would be best located on the home pages of specific individuals; virtual libraries could offer links to these materials, but should not be fully responsible for them. Product manuals are important documents, excellent candidates for hypertext presentation and electronic access, but their primary location would be on the sites of the companies responsible for the products.
Summing up, we can say that the perfect document
for a virtual library should be of great interest to a wide range of people,
relatively brief and already available in electronic form. It should also
be free of copyright restrictions; the person responsible for the document
should perceive that a definite advantage will be gained from placing it
onto Internet, and it should be the type of document which a user would
normally expect to find in an environment termed a library. Clearly, only
a small number of documents can fulfill all those requirements, but all
must be considered in those terms.
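The criteria summed up above can be read as a simple checklist. The following is a minimal Python sketch of such a checklist, not a method proposed in this paper: the class name, the field names, the equal weighting of the six criteria and the two sample documents are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A document being considered for a virtual library (hypothetical model)."""
    title: str
    wide_interest: bool    # of great interest to a wide range of people
    brief: bool            # relatively brief
    electronic: bool       # already available in electronic form
    copyright_free: bool   # free of copyright restrictions
    owner_benefits: bool   # the person responsible gains from placing it online
    library_type: bool     # a document users expect to find in a library


def suitability(doc: Candidate) -> int:
    """Count how many of the six criteria the document meets (0 to 6).

    Equal weights are an assumption; a real selection committee might
    weight copyright status far more heavily than length, for example.
    """
    return sum([
        doc.wide_interest,
        doc.brief,
        doc.electronic,
        doc.copyright_free,
        doc.owner_benefits,
        doc.library_type,
    ])


# Two invented examples, ranked from most to least suitable.
candidates = [
    Candidate("recent commercial novel", True, False, True, False, False, True),
    Candidate("university press report", False, True, True, True, True, True),
]
ranked = sorted(candidates, key=suitability, reverse=True)

for doc in ranked:
    print(f"{suitability(doc)}/6  {doc.title}")
```

In this sketch the university press report ranks above the commercial novel, which matches the argument that non-commercial material is, for the moment, the easier candidate.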
SPECIFIC MATERIALS IN VIRTUAL LIBRARIES
Books are the most traditional library materials and can be examined first in terms of these criteria. Clearly they are by definition lengthy, rather than brief documents, a negative factor. Older books are free of copyright restrictions, but readers generally are more interested in modern books, where copyright is normally held by a publisher, who will be unwilling to permit an electronic version without guarantee of receiving adequate payment. Older books have to be scanned or otherwise converted to electronic form, whereas modern books are often already available in electronic form, due to the penetration of modern printing methods. Therefore modern books would be more appropriate for electronic access, but in fact these are exactly the books whose rights are controlled by publishers. There is a clear danger that publishers will develop their own methods of distributing electronic versions of books and gaining a financial return from those who access them, while virtual libraries will be left with texts which are older and offer no prospect of commercial gain.
These arguments apply to commercial publishing in general, but various other subsets can be identified within the world of books. Although Internet could be considered more suitable for works of a wider interest, it would be economically viable to transfer much non-commercial publishing to electronic form. Books currently published by university presses and government institutions would be prime candidates, as these could be placed on institutional servers. Texts which have significant cultural, national or religious content come into the same category. Under these conditions the "publisher" would rarely have to pay the full cost of the computational element. Poetry represents a similar special area: for centuries poets have paid to publish thin volumes of their own verse. Many could arrange sufficient space on the Information Highway to make electronic publication possible, guaranteeing wider distribution for their work.
After books, the library material most familiar to readers is periodicals. They are briefer than books, therefore would appear more suitable for Internet access; considerations of age, interest and copyright are roughly similar. Electronic versions of scientific and professional journals are already becoming common. But in fact an unusual situation arises with these materials, because they are basically controlled by a small group of specialized publishers which have for many years been able to increase prices in excess of inflation and, presumably, receive steady profits. Scientific journals maintain an unusual relationship with the academic world: they are written, refereed and edited by professors and researchers, who normally undertake all these activities without payment. The contents are in effect donated to a small group of publishers, who then earn significant profits by selling the printed journals back, for substantial subscriptions, to the libraries of the institutions where these same professors and researchers work. Worse, there is no reason to believe that electronic journals will be very much cheaper than the print versions that preceded them. The publishers claim that overhead costs are high and link electronic and print versions together in restrictive licensing agreements.
Now that the Information Highway has the capacity to place all scientific papers produced in the world a few keystrokes from the computer of any professor or researcher, this system has become anachronistic; its only advantage is to provide easy benchmarks for the promotion of university staff. Numerous other procedures could be adopted to grade university and research production: university sites could publish the papers of their professors and researchers directly. Major teaching or research institutions, scientific or professional associations could then set up review sites and establish commented links to worthwhile papers. Quality of work would be measured by the number and quality of review sites which establish links to the papers of a researcher.
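The idea of measuring quality by "the number and quality of review sites" can be made concrete with a small Python sketch. It is only one possible reading of that sentence: the function name, the site names, the numeric weights and the rule that each review site counts once per paper are all assumptions introduced here for illustration.

```python
def paper_score(linking_sites, site_weight):
    """Sum the weights of the distinct review sites that link to a paper.

    linking_sites: iterable of review-site names that link to the paper.
    site_weight:   mapping from site name to a quality weight (assumed to be
                   assigned by some external process, not defined in the paper).
    Unknown sites contribute nothing; duplicate links from one site count once.
    """
    return sum(site_weight.get(site, 0.0) for site in set(linking_sites))


# Hypothetical review sites and weights.
site_weight = {
    "national-academy": 3.0,
    "professional-assoc": 2.0,
    "dept-page": 1.0,
}

# Hypothetical papers and the review sites that link to them.
links_to_paper = {
    "paper-A": ["national-academy", "dept-page"],
    "paper-B": ["dept-page", "dept-page"],
}

scores = {paper: paper_score(sites, site_weight)
          for paper, sites in links_to_paper.items()}
print(scores)
```

Counting each site once is a deliberate choice in this sketch: it keeps a single enthusiastic site from inflating a score, so that breadth of recognition matters as well as the standing of the reviewer.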
It is also easy to imagine formats which would use electronic media to even greater effect; for example an electronic conference or Electronic Topic Node, where an institution invites contributions on a specific theme, continuously placing those of value on a server. Comments which are judged suitable could also be added, linked by hypertext to the relevant points of the principal texts. Interested persons would be informed automatically every time a new contribution or note was added to the conference site, which would rapidly become an important and reliable source for its subject. It would of course need an adequate electronic infrastructure, notably an index and links to other relevant sites. Systems of this type would be much more appropriate to the electronic age than periodical formats, burdened by concepts relevant only to printed documents, such as issue number and frequency. The whole process of production and dissemination of scientific information would then center around researchers and their computers. The same computer will be used to obtain, analyze and comment upon information, also to produce new informational documents, then post them directly on the Information Highway. This will result in an information supernova which will exponentially overshadow the current information explosion.
Although scientific and professional publications are of vital importance to professors and researchers, they are only a small subset of periodical publications in general. Bulletins and newsletters of professional associations are already moribund and will be replaced by electronic discussion lists. Modern newspapers are brief, of wide general interest and available in electronic form. But it is probable that newspaper owners will prefer to keep them under their control, on the site of the newspaper which produced them, rather than within virtual libraries. Older newspapers may be made available to virtual libraries, but these are normally of very limited interest. Weekly publications, on the lines of Time and Newsweek, and popular periodicals, will probably also be offered by their publishers, rather than through virtual library sites. But virtual libraries will doubtless become popular locations for what were previously small-circulation printed or mimeographed literary periodicals, publishing poetry and creative writing.
Theses are documents traditionally found in academic libraries; in their case, however, the prospects for transfer to electronic media are poor. They are usually relatively lengthy documents, and of limited interest. Authors may also show little interest in placing their theses in a virtual library. They will probably be more interested in attempting formal publication via a traditional publisher, which would preclude prior electronic publication. On the positive side, most modern theses are available in electronic form, and virtual libraries in university contexts would be suitable locations for materials of this type.
Patents and Standards are brief documents which are now generated in electronic form and would be of great interest to a large number of people. From that point of view they should be prime candidates for electronic libraries. But specialized agencies earn large sums of money from the sale of these documents, so it is difficult to imagine that they will be placed free of charge on the web in the near future. Reports are also brief and generated in electronic form; they are of more limited interest than patents and standards, but the information they contain has less commercial value and those responsible for their production could well have greater interest in placing them into virtual libraries. The distinctive codes traditionally placed on reports would make them easy to locate via search engines.
The above covers most current document access activities in virtual libraries. CD-ROMs can be offered in secure networks, but those produced commercially will not be licensed for use in open access virtual libraries. Reference services will doubtless maintain commented links to relevant sites, which will require constant updating and attention from library staff (Mitchell, 1996). FAQ (Frequently Asked Questions) files are already familiar tools in the electronic environment and constitute excellent resources for virtual libraries. Search interface selection tools, which help users choose an appropriate search engine and formulate questions correctly, should also be offered. Traditional libraries offer reference desk services, which could easily be mimicked by e-mail services. This would work well within an institution, such as a university or a company, but would be difficult to operate on a wider level. Reference services are labor intensive, require highly qualified staff and are normally offered only to a specific community. Public library reference services are theoretically open to all, but in fact most users come personally to the library or call from a nearby location. Few people go to the expense of phoning long-distance to use a public library reference service in a different city. Public libraries are supported by local taxes and it is natural that their reference services should be basically used by local people. But an e-mail could come from the same city, from the other side of the country or even from overseas; the library would rarely be able to verify its origin. If a virtual public library was to obtain a reputation for handling reference questions correctly, it would attract a large number of requests, which would place a strain on its resources. One solution would be for reference services to place their answered questions on a FAQ file.
Electronic media offer facilities unknown in traditional
libraries, notably multimedia, interactivity and documents which can be
constantly modified, by the producer or by the Internet user. It is difficult
to evaluate the possible impact of these media on virtual libraries, because
they go beyond the previous experience of librarians. Multimedia may
be valuable in virtual libraries, but much of the collection will consist
of electronic versions of relatively traditional documents. A multimedia
infrastructure would not seem to offer significant advantages under these
circumstances. Interactive services are given less attention than they
deserve, but will doubtless become very important on Internet. At the moment
they help people plan routes, select pets and lay out gardens. But these
are specialized services, best offered by specialized agencies. The main
demand for interactive systems in virtual libraries would be to aid in
information retrieval activities. Texts which can be constantly modified
are totally outside the experience of librarians, who are accustomed to
dealing only with fixed content documents and carefully distinguish between
the different editions of the books in their collections. It is difficult
to forecast library penetration in these areas, but it is possible that
it will be weak. Comparisons can be drawn with CD-ROMs; they offer multimedia
and interactive possibilities, but are normally produced by specialized agencies.
Librarians are normally only responsible for CD-ROMs when these contain
products with a strong information retrieval or library element, such as
databases, cataloging manuals etc.
CREATING AND MAINTAINING VIRTUAL LIBRARIES
Although Internet is still basically free, or low cost, for users, Virtual Libraries are relatively expensive to create and maintain (Logan, 1997). First, texts have to be selected, and this requires careful consideration and bibliographic research. There is generally little point in offering access to an old edition when a new edition is available; nor should a poor quality text be offered when better alternatives exist. Selection is a professional task, which will probably be undertaken by a committee of professionals, and they will require appropriate payment, directly or indirectly.
Once a text has been selected, the person or institution responsible for it has to be identified and contacted. In many cases this will be very difficult; it can take a long time to determine who owns rights to an old text, from a publisher which went out of business twenty years ago, by an author who has since retired. Even when contacted, authors or publishers will not always want to have their texts placed on the information highway. They may consider the text inappropriate for republication, or they may not wish to prejudice current sales of paper copies. Many publishers will be wary of placing their backlists on the information highway at this time. They may consider that mechanisms whereby publishers can offer their own electronic texts and receive adequate financial returns will be perfected in the next few years. It is economically sensible for them to hold on to the electronic publication rights to their backlists until that time. The solution may not be purely virtual but could be hybrid. Banks already have ATMs (Automatic Teller Machines), where the user engages in a brief dialog with the machine, swipes a card and receives money. It is possible to envisage an Automatic Book Machine (ABM), where the reader selects from an electronic catalog, reading reviews and a sample of the text, then swipes a credit card to order a hard copy, which is printed immediately at the machine. The technology for this is available, but at the moment it is too bulky and unreliable for routine use. Machinery to develop photographs, however, has recently become much smaller and more efficient, which suggests that book-printing machinery could follow the same path. An Automatic Book Machine system would offer publishers the advantage of being able to continue to do what they have been doing for the last half millennium, selling hard copies of books. The electronic version of the text would be transmitted over a safe channel, i.e. the publisher would retain control over it. The reader would have a hard copy book.
It is interesting that people still prefer to read lengthy texts in hard copy, which can be done in any place, under any conditions. There is a fascinating paradox at the moment: if a group of educated people is asked whether the book will cease to exist, most will immediately answer yes. If the same group is asked to identify people who have actually read a lengthy text on a computer screen, almost nobody comes forward. A hybrid system might provide a satisfactory solution to this situation.
When the person or institution responsible for the text is willing to permit it to be placed in a virtual library, it is still necessary to negotiate the exact terms of the license. This is a lengthy process, because of the lack of established rules for this field. Should the virtual library have exclusive rights, so that the text could not later be placed on the information highway by any other source or means? Should the rights holder cede electronic publication rights in perpetuity? Of immediate significance, how much should the library pay for the rights? Publishers have been known to ask up to US$25 per page for electronic rights; they may also charge heavily for using specific illustrations. From what source could a virtual library draw significant resources for these expenses?
When the text has been selected and the rights obtained, an electronic version has to be produced. Some documents are already available in electronic form, but most of the books and periodicals published up to now exist only in paper and will have to be converted. Even in the short history of virtual libraries there are at least three different presentations in common use. The simplest and the original format is the text file, or ASCII file, which has the advantage that it can be read on any computer and occupies minimal computer space. The original and still the best-known site for materials of this type is Project Gutenberg, http://promo.net/pg/. The disadvantages of this presentation include loss of illustrations, original page formatting and different type sizes; differential spacing, e.g. in poetic texts, may also suffer. The system is clearly biased towards Roman alphabet materials, especially English language texts which have few diacritical marks. The final result is usually a plain DOS-style text, unattractive to those who have grown accustomed to Windows-style systems. Originally texts were keyboarded to obtain the ASCII version; this is a laborious task, which requires careful verification. Scanning has now replaced keyboarding as the standard entry method for virtual library texts. The basic input process is much quicker and easier, but it still requires a good clean copy of the original, a careful human operator and final verification. The success rate is, naturally, higher with languages with few diacritical marks.
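The bias of plain ASCII towards texts with few diacritical marks can be illustrated with a short Python sketch. It is an illustration only, not a description of how any site named here prepares its files; the function names ascii_report and strip_diacritics are invented for the example.

```python
import unicodedata


def ascii_report(text):
    """Return (characters outside ASCII, total characters) for a text."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii, len(text)


def strip_diacritics(text):
    """Reduce a text to plain ASCII, as a crude conversion might.

    Accented letters are decomposed into a base letter plus a combining
    mark; the marks, and any character with no ASCII equivalent, are
    then dropped. Information in the original is lost.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

An English text will normally report few or no characters outside ASCII, while a text in a language with many accented letters will report many, each of which is altered or lost when the file is reduced to ASCII.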
The next step up from plain ASCII text is a hypertext document. The basic text can be generated by a scanner and an Optical Character Reader, but specific procedures will have to be followed to convert it into hypertext mode. The University of Virginia Electronic Text Center is a good example of a site of this type; it can be consulted at http://etext.lib.virginia.edu. Hypertext documents are usually fairly short; books, for instance, will be presented a chapter at a time. A contents list must be established and links created to the various chapters, also from each chapter to the following chapter, back to the contents list, etc. Special coding must be inserted to mark paragraphs, passages in bold, italic, larger type sizes, etc. Headings and lists require special treatment. Illustrations must be treated separately and inserted in the text at the appropriate point. Hypertext documents which include numerous large illustrations are slow to load; such texts should be organized in short sections, or the illustrations stored separately and accessed via a specific link. Hypertext documents are somewhat larger than ASCII texts, but are still quite compact. The hypertext version is generally pleasant to read, but is a version, rather than a perfect image, of the original.
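The linking work described above (a contents list, links to each chapter, and links from each chapter to the next and back to the contents list) can be sketched in Python. This is a minimal illustration under stated assumptions: the function name build_pages and the file names contents.html and chapterN.html are invented for the example, and the markup is deliberately bare.

```python
from html import escape


def build_pages(title, chapters):
    """Return {file name: HTML} for a contents page plus one page per chapter.

    `chapters` is a list of (heading, list of paragraphs) tuples.
    The file names are hypothetical.
    """
    pages = {}
    names = [f"chapter{i}.html" for i in range(1, len(chapters) + 1)]

    # Contents list with a link to every chapter.
    items = "\n".join(
        f'<li><a href="{name}">{escape(heading)}</a></li>'
        for name, (heading, _) in zip(names, chapters)
    )
    pages["contents.html"] = (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n<ol>\n{items}\n</ol>\n</body></html>\n"
    )

    for i, (heading, paragraphs) in enumerate(chapters):
        # Paragraph markup; special characters are escaped.
        body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
        # Navigation: back to contents, previous chapter, next chapter.
        nav = ['<a href="contents.html">Contents</a>']
        if i > 0:
            nav.append(f'<a href="{names[i - 1]}">Previous chapter</a>')
        if i + 1 < len(chapters):
            nav.append(f'<a href="{names[i + 1]}">Next chapter</a>')
        pages[names[i]] = (
            f"<html><head><title>{escape(heading)}</title></head><body>\n"
            f"<h2>{escape(heading)}</h2>\n{body}\n"
            f"<p>{' | '.join(nav)}</p>\n</body></html>\n"
        )
    return pages
```

Presenting one chapter per file keeps each page short, in line with the practice described above; illustrations, bold and italic passages, and lists would each need further markup that this sketch omits.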
To be true to the original the virtual library should
offer an image of the original document (Lamolinara, 1996). A growing number
of sites now offer images; one of the most interesting is Project Muse,
offering quality images of scholarly journals from Johns Hopkins University:
http://muse.jhu.edu/muse.html. Scanning will generate an image relatively
easily, but this will occupy much more space than an ASCII or hypertext
file. An ASCII file uses one byte for each character in the original. A
hypertext version of the same text carries some overhead for markup, links
etc., often about 20%; a 50% overhead would be unusual for a hypertext
document without illustrations. A page of printed text would rarely occupy
more than about a thousand characters, or one kilobyte (KB). An image,
however, occupies a much larger computer file, typically around 40 KB per
page. An average book may have three hundred pages, which as images would
occupy perhaps 12 megabytes. To merit the title of library a significant
number of books must be offered; virtual libraries, like traditional libraries,
will only attract readers when their collections attain a certain critical
mass. So to use imaging technology significant computer storage must be
available. An additional problem is that a variety of image file formats
is currently in use in this area, known by acronyms such as PDF, GIF, PICT
and TGA. Full compatibility between these formats, or even between different
versions of specific imaging software, has not been achieved, which creates
another area of difficulty. The use of compression will reduce computer
storage needs, but compressing an item with high resolution images too
heavily, for example, could reduce the value of that item to potential
users (Besser, 1995).
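The storage figures in this paragraph can be checked with a short calculation. The Python sketch below uses only the rough estimates given in the text (about 1,000 characters per page, 20% hypertext overhead, 40 KB per page image, 300 pages per book); the function name book_sizes_kb is invented for the example, and 1 KB is taken as 1,000 bytes for simplicity.

```python
CHARS_PER_PAGE = 1_000          # upper estimate for a printed page
PAGES_PER_BOOK = 300            # "an average book"
HYPERTEXT_OVERHEAD_PERCENT = 20  # markup, links etc.
IMAGE_KB_PER_PAGE = 40          # typical page image


def book_sizes_kb(pages=PAGES_PER_BOOK):
    """Approximate size of one book, in KB, in each presentation."""
    ascii_kb = pages * CHARS_PER_PAGE // 1_000   # one byte per character
    hypertext_kb = ascii_kb * (100 + HYPERTEXT_OVERHEAD_PERCENT) // 100
    image_kb = pages * IMAGE_KB_PER_PAGE
    return {"ascii": ascii_kb, "hypertext": hypertext_kb, "image": image_kb}


sizes = book_sizes_kb()
for presentation, kb in sizes.items():
    print(f"{presentation:10s} {kb:6d} KB")
```

On these assumptions the image version of a book is about forty times the size of the ASCII version, which is why a collection large enough to attract readers needs significant storage if it is held as images.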
SUPPORT TOOLS IN VIRTUAL LIBRARIES
So far this discussion has been in terms of texts, but a virtual library does not simply place plain text onto the Information Highway. It is also necessary to offer support tools. The first of these is the index; printed books frequently have indexes on the final pages, but in an electronic environment it would be much more sensible to produce a complete computerized index to all words in the text. This should permit Boolean and, above all, proximity searches. But this index will only be effective if the text has been correctly converted, as a spelling error in the electronic version will create a false index entry. ASCII and hypertext versions of printed documents require careful verification, to ensure that they have been correctly converted. Modern texts will normally be run through a computerized spelling checker. For an image-based system, verification and spelling checks are not necessary at the image stage, but will be necessary for the generation of an index. A scanner and Optical Character Reader, using clean English-language text, should attain an accuracy of around 99.95%; this will mean an error, e.g. mistaking "clean" for "dean" or "modern" for "modem", every 2,000 characters, or every four pages or so. Only highly educated and attentive human intervention will note and correct mistakes of this type. An alternative with image-based systems would be to produce a "gray", unverified index and advise the user to verify the results against the image of the text. Whatever the methodology, the index will have to be mounted on the computer, together with a search box and appropriate links. Index preparation not only represents additional, high-level work for the organizers of the virtual library, but the index also has to be stored electronically. This represents another overhead, perhaps between twenty and forty percent of the size of the ASCII text, depending on the nature of the text, depth of indexing, number of stopwords etc.
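The error rate and index overhead quoted above can be worked through in Python. The sketch uses exact fractions so that 99.95% yields exactly one error per 2,000 characters; the function names and the 300,000-character sample text are assumptions made for the illustration.

```python
from fractions import Fraction


def chars_per_error(accuracy):
    """Average number of characters between errors.

    `accuracy` is given as a string such as "0.9995" so that the
    arithmetic is exact.
    """
    return 1 / (1 - Fraction(accuracy))


def expected_errors(total_chars, accuracy):
    """Expected number of wrongly read characters in a text."""
    return total_chars * (1 - Fraction(accuracy))


def index_size_range_kb(ascii_kb, low_percent=20, high_percent=40):
    """Index storage, in KB, at the 20% to 40% overhead given in the text."""
    return ascii_kb * low_percent // 100, ascii_kb * high_percent // 100


print(chars_per_error("0.9995"))           # characters between errors
print(expected_errors(300_000, "0.9995"))  # errors in a 300,000-character text
print(index_size_range_kb(300))            # index size for 300 KB of ASCII
```

Even at this accuracy a text of 300,000 characters would be expected to contain some 150 misread characters, each a potential false index entry, which is why verification remains necessary before an index is built.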
Note that in the case of image-based systems, the normal procedure would be to make an image of the text, produce an ASCII file from the image, generate the index from the ASCII file, then mount the image and the index on the Information Highway, while discarding the ASCII file of the text. A virtual library can combine its indexes to the words in specific texts into a combined index of all the significant words in all the texts it offers; this is now available at several sites, e.g. at the University of Virginia Electronic Text Center cited earlier (http://etext.lib.virginia.edu). This feature, if offered in conjunction with adequate proximity searching, would offer a powerful tool, undreamed of in a traditional library. Its major drawback is that it could share the faults of current Internet search engines. That is, except when used for very specific searches, it might throw up much more information than a reader could usefully absorb.
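A combined index of this kind, supporting Boolean and proximity searches, can be sketched in Python as a positional index: for each word it records the texts in which the word occurs and the word positions within each text. This is a minimal illustration, not the system used at any site named here; the function names and the stopword list are assumptions made for the example.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # hypothetical list


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(texts):
    """Map each significant word to {text id: [word positions]}.

    Positions count every word, including stopwords, so distances in
    proximity searches match the running text.
    """
    index = defaultdict(lambda: defaultdict(list))
    for text_id, text in texts.items():
        for position, word in enumerate(tokenize(text)):
            if word not in STOPWORDS:
                index[word][text_id].append(position)
    return index


def search_and(index, *words):
    """Boolean AND: ids of the texts that contain every word."""
    found = [set(index.get(word.lower(), {})) for word in words]
    return set.intersection(*found) if found else set()


def search_near(index, first, second, distance):
    """Proximity: texts where the two words occur within `distance` words."""
    first, second = first.lower(), second.lower()
    hits = set()
    for text_id in search_and(index, first, second):
        if any(
            abs(p - q) <= distance
            for p in index[first][text_id]
            for q in index[second][text_id]
        ):
            hits.add(text_id)
    return hits
```

A proximity search returns fewer texts than the corresponding Boolean search, which is one way of limiting the flood of results mentioned above.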
A further essential tool is the bibliographic citation. In an electronic library the reader cannot simply flip back to the title page of the book to check the author and title, then go to the end of the text to verify the number of pages, or look quickly at the wrapper to see whether the book is published in a series. A full and correct citation must be prepared by a professional librarian and made available where it can be consulted easily, e.g. via a prominently-displayed link.
It is clear that an electronic version must reproduce the basic text of the original faithfully, but certain supporting materials, notably the original contents list and index, will be irrelevant in many electronic presentations. Other information, such as that on the verso of the title page or on the wrapper, will also no longer be completely accurate. Principles have to be established here; all relevant parts of the original must be transmitted in a way that will fully support any future use or study.
A library has a front door and entrance hall; a virtual library will also have to have an opening page, a general explanation, a basic guide to content and philosophy, a list of persons involved, etc. The site has to be attractive and organized so that users can go rapidly to the information they require. Just like its traditional counterpart, a virtual library will require feedback from its readers to enable it to improve its services. In an electronic environment this involves making an electronic message form available, then replying to, analyzing and acting upon user comments.
So a Virtual Library will be based upon a significant quantity of complex, inter-linked files, containing data of different types. These will have to be placed on a server, a powerful, reliable microcomputer which will probably need the support of a separate, high-capacity disc drive. These need to be linked, over adequate bandwidth, to the Information Highway, 24 hours a day, seven days a week. Specialized software will be necessary, as well as a maintenance contract for the equipment. The site must be supported by computer personnel familiar with Internet. These are still relatively few in number and command significant salaries. Just like its traditional counterpart, a Virtual Library needs to be dynamic, constantly adding new materials. So the institution which attempts to set up a Virtual Library must plan for significant expense and be able to place professional, technical and clerical staff at the disposal of the new service.
Some further factors need to be briefly discussed. A traditional book does not require equipment to permit reading; an old book can usually be read as easily as a new book. Old computer files are not, however, in the same category. Modern computer users have gone through various file conversions; e.g. from Word 4 (a DOS level system) to Word for Windows under Windows 3.0, then to Word 6 under Windows 95 and probably by now to Word 97 in Office 97. A similar situation has arisen with microforms; opaque microforms, notably of UN documents, are now of little use because the equipment to read them is no longer easy to obtain. It is too early to say what impact processes of this type will have on the virtual library field, but already there is reduced interest in documents in plain ASCII text. It is entirely possible that, as technology develops, those responsible for Virtual Libraries will have to regularly transfer electronic texts and images from one system to another, a process known as migration. There is a new danger here, that at one stage in the chain a technician may lose all or part of an image. The text will then have to be regenerated from an original, but for this the Virtual Library will have to be able to guarantee continuing access to the original document. This will also be useful should it be necessary in the future to verify whether all aspects of the original have been successfully transferred to electronic format, in the correct order and properly indexed. Some systems may lose electronic documents as a result of fires, natural disasters, electrical surges or the action of hackers or vandals. One interesting possibility in this context is to microfilm the document and scan the microfilm. As microfilm equipment is cheaper and better developed than scanners, this can be easier and cheaper than direct scanning, and the microfilm can be kept in a safe location as a backup.
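One way to detect the loss of all or part of a file during migration is to record a checksum for every file before the transfer and compare it afterwards. The paper does not prescribe this technique; the Python sketch below is an assumption offered for illustration, with invented function names, and it applies only where files are copied unchanged from one system to another. A conversion to a new file format alters the bytes and would need a different check.

```python
import hashlib
from pathlib import Path


def checksum(path):
    """SHA-256 digest of a file, read in blocks to handle large images."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def make_manifest(directory):
    """Record {relative path: checksum} for every file before migration."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_migration(manifest, new_directory):
    """Compare migrated files with the manifest.

    Returns two lists: files that are missing, and files whose
    contents differ from the recorded checksum.
    """
    root = Path(new_directory)
    missing, damaged = [], []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            missing.append(relative)
        elif checksum(target) != expected:
            damaged.append(relative)
    return missing, damaged
```

Any file reported as missing or damaged would then have to be regenerated from the original document or from a backup, as discussed above.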
Some type of superstructure for Virtual Libraries
needs to be developed, notably to enable potential readers to identify
and locate virtual texts. Libraries were quick to enter the automated age,
with automated catalogs and databases. Because their catalogs were automated,
major university libraries, especially in the United States, were able
to make them accessible via Telnet, becoming major players at that technological
level. But, in part because they are already available via Telnet, catalogs
are taking longer to transfer to the World Wide Web. A striking feature
is the lack of coordination between library catalogs and Internet search
engines. Anyone with Internet access can within seconds locate home pages
which deal with a specific author; a simple additional search will reveal
newsgroup messages which mention that person. But locating libraries which
hold copies of books by that author is a complex task, requiring specialized
software or lengthy searches. Large scale, public access, union catalogs
of library materials are still embryonic. But in the virtual world multi-level
catalogs will be necessary, showing not only where the hard copy can
be found, but also where virtual texts can be accessed. In order to be
effective, Virtual Libraries will require the support of professionally
produced union catalogs. At the moment a text can be available in three
hard copy presentations: original, xerox copy or microfilm; there are also
three electronic systems: ASCII, hypertext and image. Note that there are
various formats for microfilm, also various systems for computer imaging,
with varying compression ratios. Electronic texts can be supported by carefully
verified word-by-word indexing or by less complete indexing systems. Future
catalogs will be very complex and require a high level of professional
skill. Remember that there is no mechanism for charging for catalogs, which
have always been available for free access.
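The complexity of such a catalog entry can be suggested by a small Python data structure, in which one record for a text lists every presentation in which it is held. The class and field names are invented for this sketch, the "xerox copy" of the text is recorded as "photocopy", and the sample values are placeholders rather than real holdings.

```python
from dataclasses import dataclass, field

HARD_COPY = ("original", "photocopy", "microfilm")
ELECTRONIC = ("ascii", "hypertext", "image")
INDEXING = ("none", "unverified", "verified")


@dataclass
class Holding:
    presentation: str       # one of HARD_COPY or ELECTRONIC
    location: str           # holding library, or address of the virtual text
    indexing: str = "none"  # depth of word indexing for electronic texts

    def __post_init__(self):
        if self.presentation not in HARD_COPY + ELECTRONIC:
            raise ValueError(f"unknown presentation: {self.presentation}")
        if self.indexing not in INDEXING:
            raise ValueError(f"unknown indexing level: {self.indexing}")


@dataclass
class CatalogRecord:
    author: str
    title: str
    year: int
    holdings: list = field(default_factory=list)

    def add(self, holding):
        self.holdings.append(holding)

    def hard_copy(self):
        """Holdings that a reader must visit or borrow."""
        return [h for h in self.holdings if h.presentation in HARD_COPY]

    def electronic(self):
        """Holdings that can be read over the network."""
        return [h for h in self.holdings if h.presentation in ELECTRONIC]


record = CatalogRecord("Author, A.", "Sample title", 1900)
record.add(Holding("microfilm", "Example Library"))
record.add(Holding("image", "http://example.org/sample", indexing="unverified"))
```

A real union catalog would also need to record microfilm format, imaging system and compression ratio, which this sketch leaves out.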
Libraries are vital instruments for cultural transmission because they simultaneously preserve and disseminate, with minimal restrictions and modifications, the most significant aspects of the culture of specific societies. In this role libraries are naturally heavily influenced by the societies in which they are located. The Information Highway is international, but computers are located in countries with different laws, scripts, languages, customs, and attitudes towards free expression of ideas and circulation of information. It is necessary to consider to what extent these elements may influence virtual libraries. Legislation covering the World Wide Web is still in its infancy. It is not even certain who can be held responsible if illegal or objectionable material, or merely anti-government comment, is placed on Internet. For instance: is a university legally responsible for everything that appears on the home pages of its professors? What is the responsibility of the university in relation to student home pages? Can an Internet Service Provider (ISP) be sued for material its subscribers place on their home pages? If objectionable material is placed on the web, do the authorities have the right to impound the computer of the ISP, effectively closing it? Can a computer used to make an objectionable home page be held as evidence? Some countries permit anti-government statements and propaganda, others do not; few permit support for the violent overthrow of the government and none openly condone terrorism. In the book and newspaper world some lines have already been drawn; in Western Europe and North America minors are permitted to enter libraries and bookshops, even if they contain some items unsuitable for them. Newspapers are considered to have wide rights to criticize governments, although radio and especially TV are often more restricted.
Societies where persons and organizations are considered to have a relatively wide right to disseminate information should be more receptive to the circulation of ideas and to the establishment of virtual libraries. Many countries consider that religion, morality or community traditions are more important than a vague general right to information. Many other countries have additional technical problems when establishing virtual libraries, because the principal software, OCR, indexing systems etc. are intended for use with Roman alphabet texts. Where professionals can set up virtual libraries with more confidence and easily include a wider range of materials, they will be able to appeal to a wider group of users and make their systems more viable.
It is common today to encounter people who look down
upon library services as being outdated and irrelevant to modern society.
But it is necessary to remember that for centuries libraries have been
the major public source of reliable, recorded, organized information that
could be freely cited and used as a basis for the creation of other documents.
They therefore hold a pivotal position in the process of the expansion
of knowledge. In the future information- and education-based society, services
of this type will be even more essential; the major difference will be
that the community will expect to receive these services via electronic
means. The chief task of the next generation of librarians will be to build
up a network of virtual libraries, as strong and as reputable as the existing
complex of traditional libraries. There are in existence tens of millions
of books and other printed texts, of which only a minute proportion have
so far been converted to electronic form. The major task facing our profession
today is to ensure that the finest traditions of library and information
science professionals are successfully transferred to the Information Highway,
so that Virtual Libraries can serve society through the next millennium.
Besser, Howard (1995) Getting the picture on image databases: the basics. Database, 18, (2), 12-19
Birdsall, William F (1994) The myth of the electronic library: librarianship and social change in America. Westport, CT: Greenwood (Contributions in Librarianship and Information Science, 82)
Bishop, Ann Peterson and Star, Susan Leigh (1996) Social informatics of digital library use and infrastructure. In: Williams, Martha E, ed. Annual Review of Information Science and Technology. Medford, NJ: Information Today; 1996; v. 31 pp. 301-401
Chepesiuk, Ron (1997) The future is here: America's libraries go digital. American Libraries, 27, (1), 47-49
Commings, Karen (1997) Virtual library offers the latest in information technology. Computers in Libraries, 17, (2), 20-21
Crawford, Susan Y; Hurd, Julie M, and Weller, Ann C (1996) From Print to Electronic: the Transformation of Scientific Communication. Washington, DC: American Society for Information Science
Hart, Keith (1997) Electronic publishing on CD-ROM and the Internet: the experience of two special libraries. Aslib Proceedings, 49, (6), 159-161
Henthorne, Eileen (1995) Digitization and the creation of virtual libraries: the Princeton University image card catalog: reaping the benefits of imaging. Information Technology and Libraries, 14, (1), 38-40
Hurt, Charlene (1997) Building libraries in the virtual age. College & Research Libraries News, 58, (2), 75-76, 91
Jacso, Peter (1996) The Internet as a CD-ROM alternative? Information Today, 13, (3), 29-31
Kleiner, Jane P (1993) The electronic library: the hub of the future's information networks. In: Huang, Samuel T, ed. Modern library technology and reference services. New York: Haworth Press; pp. 131-139
LaGuardia, Cheryl (1995) Virtual dreams give way to digital reality. Library Journal, 120, (14), 42
Lamolinara, Guy (1996) Metamorphosis of a national treasure: the Library of Congress's National Digital Library Program. American Libraries, 27, (3), 31-33
Logan, Elisabeth and Gluck, Myke (1997) Electronic publishing: applications and implications. Washington, DC: American Society for Information Science
Lyman, Peter (1991) The emerging electronic library. Australian Academic and Research Libraries, 22, (3), 159-166
Lyman, Peter (1996) What is a digital library? Technology, intellectual property and the public interest. Daedalus, 125, 1-33
Marcum, Deanna B (1997) Digital libraries: for whom, for what? The Journal of Academic Librarianship, 23, (2), 81-84
Martin, Lowell A (1996) The electronic library. In: Martin, Lowell A. Organizational structure of libraries. Revised ed. Lanham, MD: Scarecrow Press, pp. 293-301. (Library Administration Series; v. 5)
Mitchell, Steve and Mooney, Margaret (1996) INFOMINE: a model web-based academic virtual library. Information Technology and Libraries, 15, (1), 20-25
Moffatt, Malcolm (1996) An EEVL solution to engineering information on the Internet. Aslib Proceedings, 48, (6), 147-150
Norbie, Dorothy (1994) The electronic library emerges at U.S West. Special Libraries, 85, (4), 274-276
Riggs, Donald E (1995) Digital libraries: assumptions and characteristics. Library Hi Tech, 13, (4), 59-60
Rosenthal, J. A (1991) Crumbling walls: the impact of the electronic age on libraries and their clienteles. Journal of Library Administration, 14, (1), 9-17
Rowley, Jennifer (1996) Libraries and the electronic information marketplace. Library Review, 45, (7), 6-42
Saffady, William (1995) Digital library concepts and technologies for the management of library collections: an analysis of methods and costs. Library Technology Reports, 31, 221-250
Saunders, Laverna M., ed. (1993) The virtual library: visions and reality. Westport, CT: Meckler
Saunders, Laverna M., ed. (1996) The evolving virtual library: visions and case studies. Medford, NJ: Information Today
Warner, Beth Forrest and Barber, David (1994) Building the digital library: the University of Michigan's UMLib Text Project. Information Technology and Libraries, 13, (1), 20-24
Note: this paper was prepared while
I was teaching in the Library and Information Science Program, College
of Graduate Studies, Kuwait University.