Funding the Digital Library: the British Experience

by Derek Law


UK higher education has taken a number of bold steps to deliver network services through national planning. We have begun to build and deliver a distributed national electronic collection. This paper not only describes the funding and management mechanisms for the programme but also considers the theoretical underpinning for our work. Underlying these pragmatically organised services is a philosophical approach to information provision and central to that philosophy are two truths which we hold to be self-evident; firstly, that services should be free at the point of use and that we have a duty to the nation to turn out graduates who are not only eager to use electronic services, but have been taught the skills to take the fullest advantage of them; and secondly that fair use exists for electronic media as it does for print media.

We are conscious that libraries have been given a funding window not previously seen and unlikely to be repeated during our professional careers. As a consequence a group of highly motivated individuals are driving the process ahead very quickly and fairly ruthlessly. There is a determination to deliver and embed products and services, and this is being done at the expense of democracy and consensus within the wider community. Librarians, library system suppliers and libraries have proved remarkably reluctant to accept the reality of end-user access and continue to devise systems in which librarians act as intermediaries as of right. This attempt to channel information exclusively through the library is doomed to failure and we would do well to recognise that. In the UK the Joint Information Systems Committee of the Higher Education Funding Councils (hereafter "we") is attempting to devise national structures for end-user access in higher education based on this premise. In describing those structures I hope to give some view of the underlying principles which are taking us forward to create the Distributed National Electronic Collection, which is being funded as part of our running budget but also through additional funds for the so-called Follett programme.


I want to begin by making four assertions which have in part guided us in policy issues in this area:

1. Cataloguing the Internet is a grand and necessary ambition, but it may prove as necessary, and as impossible, as IFLA's programme of Universal Bibliographic Control.

2. It is important that we do not allow the STM model to dominate our thinking. Mediaeval historians and economists are as important as chemists.

3. Librarians have not been notably successful in joint collection building, in part due to histories going back hundreds of years. This despite an often expressed and genuinely felt wish to cooperate.

4. That said, the start of the electronic era allows us an opportunity for a fresh start in which electronic collections can be shared. This requires planning and intervention at a national level, since librarians have failed at a local level.


Three or four years ago it was popular to lecture about the then fashionable concept of moving from the Ptolemaic to the Copernican World. You will remember that the library centred world was compared to Ptolemy's where the concerns of the library were to do with money and meetings, with salesmen and acquisitions - and only peripherally with readers. The sun moved round the earth. Many became excited by the new Copernican world view, the user centred view where the proper symmetry of the earth revolving round the sun was seen and understood. Now the library was only one of a range of information sources required by the user, who had access to video, research results, correspondence with colleagues, their own books and journals and from time to time the library. This was seen as providing a model on which the next generations of user centred systems could be built. But libraries remain recidivists and they and their suppliers continue to cherish the notion that they can act as some kind of filter and/or guide to information, even when this is a demonstrable vanity.

But the popular view, fostered by press and politicians, is equally vain. The popular view is that the user sits at the terminal, buys or acquires a connection to the Internet and then passes through some cloud of unknowing to the resources of the world. It would be charitable to call that naive and there is a need to create and develop a major infrastructure to make this a reality. A much more sensible reaction is that of the American sociology professor exposed to networked information for the first time, who talked of the "howling wastes of the Internet".

My own view is that the Internet is much more like one of those jigsaw puzzles where the pieces have no picture and one can use only tireless patience and logic to put the shape together. I am not sure that cataloguing the pieces in traditional ways is any solution although it may provide destitute cataloguers with a form of charitable outwork for the deserving poor. In the UK our preferred approach is to provide a core of essential resources which will be the first resort of network users and attempt to use that as a magnet for other high quality locally created resources which can be linked through metadata. Rather than set out to complete the jigsaw and catalogue the universe, we are assuming that Bradford's Law of Distribution (the 80:20 rule) applies and that the majority of user needs can be met from a limited range of resources. So we are putting together a smaller catalogue of resources which will be centrally funded, centrally provided, but held in distributed locations. We will settle for the earth rather than the universe.


General Background

After many years of working with data we are quite clear that the major costs of electronic services are those of ownership rather than acquisition. It is therefore in areas such as training, centralisation of data handling, documentation and support that the greatest economies are to be made. We are clear that this is best done through a nationally planned strategy.

Of course a library cannot give an open-ended commitment to provide an infinite amount of support. The concept of core and value added services has then begun to emerge. A basic minimum level of service is defined which will meet the needs of most people most of the time. This is the core service. For additional services - value added services - an appropriate charge is then made. Thus, for example, to write down from a computer screen the information retrieved by a search might be seen as core, while to print out the result (which costs the library paper, ink etc.) may be charged for.

We also firmly believe that the state rather than the commercial marketplace has a responsibility not just to provide a universally accessible network, but to have a critical involvement in issues such as standards and content, thus providing the core infrastructure which enables everyone to have access to a common set of basic facilities. At the very least this would imply that the state should require such universal provision from carriers by regulation or statute. In that sense the Bangemann Report for the European Union was a great disappointment since it wishes to leave the development of networks entirely to the market. Since the market has no sense of social responsibility and is interested only in profit this approach may well disenfranchise all but the affluent members of the community. Already in Europe we can see a huge discrepancy in the quality and availability of networks. Instead of enfranchising less favoured regions we run the risk of reinforcing existing discrepancies if the Bangemann approach is adopted.

It is important to remember that a growing number of classes of information are available only in electronic and digital formats. Satellite data, film, television and radio are obvious examples of this. But the range is growing; in advanced countries the census is available only in machine readable form; weather and crop data, medical and even archaeological data now exist only in electronic form.

The final consideration is the position of publishing and publishers. Many academics can perceive an emerging split between academic and mass market publishing. There is a growing change in the way research is conducted and the results transmitted. A multi-national electronically based future is emerging and while publishers act as though research exists to support publishing (when the opposite is true) it is not clear that they have a long term future in disseminating the results of scholarship. We believe that the paradigm of scholarly communication is changing, a theme picked up by Richard Heseltine.


UK Higher Education

The JANET network and its services are funded centrally from the grant to Higher Education made by the government. The sum is tiny - some £30 million - compared with the total higher education budget of £8 billion, or about 0.4% of the total. However it is large enough to provide significantly greater benefit than we would gain from giving each university a few thousand more pounds. About £23 million of the money is spent on the physical network, connecting every university and research institute and providing the international links to other countries. That leaves some £7 million for the provision of services and for research and development. A further £8 million is being provided under the Follett programme to encourage experiments in the digital library field. Links to the United States and Europe are both relatively low speed and expensive to upgrade. This may be expressed starkly as giving us a choice to spend our money on content or bandwidth. We have therefore developed a two-pronged strategy of increasing the capacity to cache data and of building mirror sites and, as a corollary, of protecting the data we create within the UK. Cache sites simply capture the international traffic and store it for a brief period. This assumes that the best guide to what will be used is what has been used. Data is kept for a few days and future requests simply look there first before using the international link. A mirror site takes a deliberately chosen piece of data and keeps a permanently updated copy in the country. Perhaps the best example of this is the Visible Human Project. These images are very large, but much in demand by medical and health science students. We are therefore discussing with the National Library of Medicine setting up a mirror service in the UK, simply to keep transatlantic traffic levels within bounds.
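
The caching behaviour described above (keep what has recently been fetched for a few days, and look there before using the international link) can be sketched in a few lines of Python. This is only an illustration of the idea and not a description of the software actually run at the national cache sites: the class name NationalCache, the fetch_remote callable standing in for the international link, and the three-day retention period are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class NationalCache:
    """Toy cache site: keeps recently fetched items for a fixed period.

    Hypothetical sketch; names and the retention period are assumptions.
    """

    def __init__(
        self,
        fetch_remote: Callable[[str], bytes],
        retention_seconds: float = 3 * 24 * 3600,  # "a few days" (assumed: three)
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_remote = fetch_remote  # stands in for the international link
        self._retention = retention_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.remote_fetches = 0  # how often the expensive link was used

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None:
            stored_at, data = entry
            if now - stored_at < self._retention:
                # Still within the retention period: serve the local copy.
                return data
        # Missing or expired: use the international link and remember the result.
        data = self._fetch_remote(url)
        self.remote_fetches += 1
        self._store[url] = (now, data)
        return data
```

A mirror site differs in that the copy is chosen deliberately and refreshed on a schedule, rather than being filled by whatever users happened to request; in terms of this sketch, a mirror would populate the store in advance instead of waiting for a first request.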

Protection of existing data is important. Computing media have gone through astonishing transformations in the last thirty years and unless there is a systematic attempt to "future-proof" research results they may effectively be lost. We have therefore set up centres to deal with this issue as far as research results are concerned. Concern remains about commercially produced material, since there is not yet electronic copyright despite repeated calls from national libraries. Compared with universities, publishing houses are ephemeral and generally have a sense of social responsibility which bypasses their wallets. We shall be considering the issue of archiving the collective record of the academy at a seminar this autumn, with a view to being more interventionist in the process of preservation of electronic media.

As part of this whole process we are also determined to ensure that we have an adequate national skills base. Dealing with very large datasets of all sorts will be a key skill in future and we are determined that the UK should not be reliant on others for those key skills.


UK Higher Education Networked Services

Let me briefly describe these services, principally so that you can see how far beyond the traditional boundaries of the library they go. The first four services provide the infrastructure, support and training which underpins much of the activity.

AGOCG. The Advisory Group on Computer Graphics provides a single national focus for computer graphics, visualisation and multimedia. Based at Loughborough it carries out software and hardware evaluations, runs workshops and seminars and assists sites in the introduction of key technologies. It offers a useful "technology watch" service.

BUBL. The BUBL Information Service offers an Internet current awareness service, together with organised, user-friendly access to Internet resources and services with the combined gopher/WWW subject tree being a particular feature. It is organised from Strathclyde University.

MAILBASE. Based at the University of Newcastle this organises the Listserv activity in the United Kingdom. Its brief is wider however and it also sets out to organise the communities which will operate listservers. It has had notable success in this field, not least with university administrators.

UKOLN. The Office for Library Networking acts as a sort of strategic thinktank and research and development centre. It also acts as the UK Gopher National entry point. The BUBL service is also physically housed here since the network address of BUBL.BATH proved irresistible.

There is in addition a substantial and growing range of dataservices.

BIDS. Based at the University of Bath, this is the only substantial commercial service. It provides access to a range of bibliographic datasets, including the ISI citation indexes, Embase and Compendex. The International Bibliography of the Social Sciences has also just been added.

DATALIB. This new centre at the University of Edinburgh has been set up, initially with three datasets: Biosis Previews, Palmer's Index to the Times and Periodicals Contents Index. It is hoped that this last will open up the periodical literature for humanities scholars in the manner taken for granted by scientists.

ESRC DATA ARCHIVE. The Archive is jointly funded by the ESRC (Economic and Social Research Council), the JISC (Joint Information Systems Committee) and the University of Essex. The oldest national centre, founded in 1967, its function is to acquire and preserve research data in the social sciences and humanities and to make them available for analysis and teaching. About 5000 datasets are held currently.

HENSA. This is the shareware archive. It is in two parts with UNIX numerical and statistical software offered from the University of Kent and PC software from Lancaster University. At Kent, Internet searches may also be performed using the archie server and Kent is becoming the national centre for caching.

NISS. This set of services is based at the University of Bath and concentrates on current information ranging from yellow pages to newspapers. It aims to promote an electronic information culture through providing access to useful collections of information. It also acts as a gateway to other services and resources and provides information through the NISS Bulletin Board.

MIDAS. Based at Manchester University, this service offers very large datasets, most notably the UK 1981 and 1991 Census, continuous government surveys such as the General Household Survey, macro-economic time series databanks and scientific datasets. There is a full range of support services for the data.

AHDS. An Arts & Humanities Data Service has just been authorised and will be based at King's College London. This follows a major feasibility study and the service will broadly be based on the experience of the Essex Archive. It will be a distributed service with the centrally based Executive co-ordinating standards, training, support, evaluation and publicity. Richard Heseltine describes this initiative more fully.

Work has just begun on defining a national image centre. Higher education produces thousands of images each year ranging from medical and dental through to art & design. We are concerned that these should be retained within and made available to the wider academic community. It is hoped that the plan for such an image service will emerge within about one year. The feasibility study has just been completed and is out for consultation. This area is explored more fully by Richard Heseltine.

Negotiations have been completed for the creation of a national higher education OPAC linking the library catalogues of the collections of the major academic research libraries which form the CURL (Consortium of University Research Libraries) group. This will have great value for researchers, but the intention is to link it to new distributed document delivery services which will serve different parts of the country or different subject areas and ensure that maximum value is obtained from the investment that higher education makes in its library collections. Such document delivery services are expected to operate at marginal cost and under existing fair-use provision, which will make them very cheap for the user. Some £30 million has been made available for the cataloguing and preservation of collections in the humanities. It is then intended that the catalogue records so created will be added to the CURL database, enriching it gradually to become a single entry point for all the major research and special humanities collections in the UK. Coupled with document delivery this will provide a significant democratisation of research materials.

A second consequence of this humanities funding will be a major initiative in the area of manuscripts and archives. It has been persuasively argued that archival records do not sit happily in the MARC record structure and that, unlike library collections, it is not self-evident where, say, the Gladstone or Roosevelt Archives are held. Standards are also in a state of confusion. It is hoped to test the model of a national archives server this autumn and to work with international colleagues on setting standards, trusting that bribery will succeed where democracy has failed.

We have also set up a number of subject based resource discovery services to explore the issues of sustainable resources. These are in the areas of social sciences, biomedicine, engineering, history, art and design, and conflict studies. All of them are committed to defining a set of relevant resources on the Internet, cataloguing them, ensuring access to them and providing training and support. This fulfils the philosophy described above of guaranteed access to limited but high quality resources. Different balances of content provision, content description and community involvement are being explored.

A review study of CNIDR (Clearinghouse for Networked Information and Resource Discovery) and of InterNIC has been commissioned to consider how we might use these American ideas in a UK context to make generally available information on network developments and standards and to provide advice and leadership on local system design.

We also wish to pursue the question of whether the paradigm of scholarly communication is changing. Some funds are being spent on developing peer-reviewed electronic journals. However we are also considering proposals to develop the ideas propounded by Paul Ginsparg with his Los Alamos archive of high energy physics papers. One possible step is to launch a pre-print archive in the cognitive sciences. As a related step and partly in response to publishers' refusal to acknowledge that fair dealing exists in electronic media, discussions are going on with US and Australian colleagues on copyright and intellectual property considering whether retaining copyright within institutions (Reclaim the Right!) is a feasible approach to hastening changes in scholarly communication. Finally we are embarking on a digitisation programme which will make available resources on the network. Various models are proposed, some commercial ventures, some partnerships with small publishers and some for heavily used out of copyright material. The intention is to cover a wide range of disciplines. The first materials covered are eighteenth century journals such as The Gentleman's Magazine.


Principles

It is also worth considering some of the policy issues which have been exposed in developing our services. Firstly, it is a cardinal principle that information must be free at the point of use. Where commercial information is provided it is either paid for from central funds or by the institution or by some combination of the two, but never by the end-user. We want to encourage and stimulate use as a strategic national goal. On the whole suppliers do not lose. There is already anecdotal evidence of increased downstream use. As students become employees they are beginning to seek the same electronic resources they used daily at university. We have had and do have major debate over the price to be charged to institutions for such services but always on the premise that services are free at the point of use. In practice most are wholly free and are paid for by "top-slicing" the higher education budget as described above. Only for the commercial bibliographic products are sites required to make a payment.

Secondly, we are committed to subscription based or licensing models and will not fund transaction based models. There is always another alternative product and only the most arrogant of publishers believe that they have a true monopoly. In fact there is some evidence that our policy is beginning to affect the use of products from those publishers who are not willing to accept this model.

Thirdly is the commonality of interfaces. The concept of a common command language for material as varied as the census, wordprocessing software and bibliographic data is an evident nonsense. However by grouping material together in locations by type, whether bibliographic, full text or numeric, we have been able to go some way towards providing common interfaces to the various datasets. Perhaps the next major challenge for the policy is, however, to encourage better and more friendly interfaces. To this end the bibliographic service providers are to begin an evaluation of OCLC's SiteSearch software so that at least a common look and feel can be given to national services.

Fourthly is community involvement. It is a central tenet that resources are to be provided for all disciplines. A Datasets Steering Group has been set up to conduct a planned programme of procurements for all subject areas and it is already planning up to two years ahead. That group conducts product evaluations which involve the relevant academic and library communities in identifying the "best buys" for the subject. It is specifically charged with covering all subject areas to provide a balanced set of resources.

The last point to mention is our present policy of delivering information to everyone. This means delivering to the poorest sort of terminal, currently defined as a VT100. Inevitably this frustrates users with more powerful equipment. As a result we are about to conduct a census of terminals in UK higher education to decide whether it is now time to move the definition upwards without disenfranchising significant numbers of users with old equipment.

Perhaps the greatest challenge remaining is that of mass instruction. Librarians are used to giving individual or small group support to users. However we now see that we must change and be in a position to pass on information management skills to perhaps 5000 students a year. This will require a major shift of attitude, skills and ambitions.

And so this leads us to the underlying goal of the distributed national electronic collection. It is clearly at this point incomplete and it will take several years to have all the elements in place. Some services will succeed and others will fail; we shall have disappointments along the way. But the objective is clear, to create a central core of material which is centrally defined but meets user needs in all disciplines. The user will then have a limited need to search for materials outside the core. We will spend our resources on developing that core rather than on cataloguing anything that might ever be used on the Internet. In doing this we hope to provide a variant of Gresham's Law. While bad money may drive out good, we hope that quality assured data, available reliably and with excellent nationally prepared documentation will remove the need to use unknown data of unknown validity available intermittently and unreliably.


Conclusion

The analogy is perhaps unfortunate, but what we are consciously doing is the equivalent of giving away drugs in the playground. We see it as our responsibility to create graduate students who are dependent on electronic information and who will go out into the industry and commerce of our country spreading the electronic revolution. Well that just about describes what we are trying to achieve, but with a slightly forced link I thought I might summarise what we are aiming for with a small tribute to someone the hundredth anniversary of whose birth last year seems to have gone largely unremarked. His modern fables provide one of the guiding principles of my life and of this programme of work - "Don't get it right get it written". He was an amazing cartoonist and illustrator and gave us one of the mythic characters of the twentieth century, Walter Mitty. Our work has perhaps unimaginatively been named the e-Lib Programme. However I'm tempted to rename it Excelsior after his illustrations for the Longfellow poem. They neatly describe some of the problems we face. We will not be diverted by the warnings of the old man who predicts great difficulties ahead and warns:

"Try not the pass the old man said,
Dark lowers the tempest overhead."

Nor are we ready to stop for the sake of comfort and popularity -

"O stay the maiden said and rest
Thy weary head upon this breast".

Instead we shall press on up our mountain of ambition carrying "that banner with the strange device - Excelsior", as fully paid up members of the James Thurber School of Networking.