Date: Fri, 29 Oct 1999 19:31:37 +0000 From: Stevan Harnad <harnad@COGLIT.ECS.SOTON.AC.UK> Subject: Report on Santa Fe Initiative (Interoperable Eprint Archives) The following is the official press release describing the proceedings of the UPS Initiative's very important meeting last week in Santa Fe. It is followed at the end by some unofficial addenda by me, particularly about how the newly agreed upon Santa Fe standards will be applied to new, generic archiving software that will be created here at Southampton in the next 6 months and will then be given away free to all universities worldwide who wish to establish Eprint Archives for the research papers of all their Faculty, with sectors devoted to each of their academic discipline. (The name "UPS" will shortly be changed to reflect the fact that the initiative is decidedly not just about "Preprints" but about creating interoperable archives for "Eprints," which includes unrefereed, unpublished preprints, refereed, published reprints, and related kinds of research documents and data.) ------------------------------------------------------------------------ First meeting of the Universal Preprint Service Initiative UPS Initiative: Paul Ginsparg, Rick Luce, Herbert Van de Sompel <http://vole.lanl.gov/ups/ups1-press.htm> Meeting: * Location: Santa Fe, New Mexcio, US, October 21-22 1999 * Sponsors: Council on Library and Information Resources, the Digital Library Federation, the Scholarly Publishing and Academic Resources Coalition, Association of Research Libraries, the Research Library of the Los Alamos National Laboratory. * Meeting moderators: Clifford Lynch & Don Waters. * Represented institutions/organizations: American Physical Society, Andrew W. Mellon Foundation, Association of Research Libraries, California Institute of Technology, Coalition for Networked Information, Cornell University, Council on Library and Information Resources, Digital Library Federation, Harvard University, HighWire Press, Library of Congress, Los Alamos National Laboratory, Massachusetts Institute of Technology, NASA Langley, Old Dominion University, the Scholarly Publishing and Academic Resources Coalition, Stanford Linear Accelerator Center, University of California, University of Ghent, University of Southampton, University of Surrey, Vanderbilt University, Virginia Tech and Washington University. * Represented eprint-initiatives: arXiv.org (=xxx), CogPrints, NDLTD, RePEc, EconWPA, NCSTRL, NTRS * Participants: see seperate list Executive Summary The Universal Preprint Service initiative has been set up to create a forum to discuss and solve matters of interoperability between author self-archiving solutions, as a way to promote their global acceptance (see http://vole.lanl.gov/ups/ups.htm ). The first, largest and most important such archive is the Los Alamos National Laboratory (LANL) Physics Archive. Founded by Paul Ginsparg in 1991, LANL now houses over 100,000 papers, mirrored worldwide in 15 countries with over 50,000 users daily and still growing (see http://arXiv.org/cgi-bin/show_stats ). Other disciplines and institutions have begun to create public research archives along the lines of LANL but what is needed are conventions that archives could adopt to ensure that they work together so that any paper in any of these archives could be found from anyone's desktop worldwide, as if it were all in one virtual public library. The participants in the meeting were digital librarians and computer scientists specializing in archiving, metadata, and interoperability, and they included the founders of the principal public research archives that exist so far. The participants were diverse in their underlying motivations, but entirely unified in their objective of paving the way for universal public archiving of the scientific and scholarly research literature on the Web. The group agreed on minimal technical requirements for archives. These will be published seperately as the "Santa Fe Conventions" and, in the next six months, will be implemented in the existing archives. Technical Summary The first meeting concentrated on the creation of cross-archive end-user services. The aim was to try and identify general architectural and technical characteristics of archive solutions, that would facilitate the creation of such services. These characteristics could then be used as recommendations for existing and upcoming initiatives. The meeting started off with a presentation and demonstration by a team consisting of Herbert Van de Sompel (University of Ghent and Los Alamos National Laboratory), Michael Nelson (NASA Langley and Old Dominion University) and Thomas Krichel (University of Surrey and RePEc initiative). This group had built an experimental end-user service providing access to data originating from main archive initiatives (arXiv, RePEc, NCSTRL, NDLTD, NTRS). A variety of technologies were used in the project, including NCSTRL+ as the digital library service, intelligent objects called buckets as a means to store the archive metadata and the SFX linking solution as a means to interlink the eprint data with the traditional scholarly communication mechanism. The presentation identified problems that arose during the project, and discussion of those served to launch the UPS meeting. This presentation was followed by position papers on interoperability issues presented by Carl Lagoze (Cornell University), Kurt Maly (Old Dominion University), Ed Fox (Virginia Tech) and Carolyne Arms (Library of Congress). Following the initial presentations, there was a panel discussion in which Paul Ginsparg (Los Alamos National Laboratory), Paul Gherman (Vanderbilt University), Eric Van de Velde (CalTech) and John Ober (University of California) expressed their opinion on the possible pros and cons of institutional versus discipline-oriented archive initiatives. The UPS group concluded that many different archive initiatives were likely to emerge, with different conceptual, organizational and technical foundations. In order for such initiatives to successfully become part of the scholarly communication system, interoperability was seen as a crucial factor. The UPS group agreed that interoperability hinges on a fundamental distinction between the archive-functions, which include data-collection and maintenance and end-user functions, like the cross-system search and linking prototype service described in the opening session. Although archive initiatives can implement their own end-user services, it is essential that the archives remain "open" in order to allow others to equally create such services. This concept was formalized in the distinction between providers of data (the archive initiatives) and implementers of data services (the initiatives that want to create end-user services for archive initiatives). Stimulated by a presentation by Thomas Krichel, the UPS group agreed that an essential feature of the Santa Fe Conventions would be that providers of data use a standard mechanism to state the conditions under which their datasets can be used by implementers of data services. Similarly, the implementers of data services could describe the use they make of archive data. This organizational argument was followed by a discussion on the technicalities of creating end-user services for data originating from different archives. The group recognized that there are basically two ways to implement these: a distributed searching approach and a harvesting approach. The former would require archives to implement a joint distributed search protocol, which is not considered to be a low-entry requirement. Moreover, the technical experts recognized that there are important problems of scale when implementing such distributed search solutions, in light of the possible emergence of thousands of institutional and/or subject-oriented archives worldwide. As such, the group decided this was not a realistic approach at this point in time. Therefore, as in the experimental project presented at the beginning of the meeting, a harvesting solution was proposed. Such a harvesting solution would allow trusted parties - the ones that subscribe to the Santa Fe Conventions - to selectively collect data from different archives. It was identified that such a technique requires an understanding regarding: * Protocols to selectively harvest data; * Criteria that can be used to selectively harvest data; * Metadata formats that are used by archive solutions to respond to harvesting requests. It was recognized that providers of data could describe the details of these interfaces in standard ways thus enabling implementers of data to create archive-specific harvesters. Still, the UPS group decided to go one step further and to highly recommend the following: * Protocols to selectively harvest data: implementation of part of the Dienst protocol in order to achieve a uniform way to poll an archive for its logical division(s) (subarchives) and to selectively harvest data from these divisions. * Criteria that can be used to selectively harvest data: there should at least be support for a bulk harvest of all data from an archive, as well as a mechanism to harvest based on accession date. Other harvesting criteria that were thought to be important included author affiliation, subject, publication type. * Metadata formats that are used by archive solutions to respond to harvesting requests: It is recognized that archives will use (an) internal metadata format(s) best suited to deal with the material to be described. Still, the UPS group decided to propose a minimal Dublin Core compliant metadata set, called the Santa Fe Set, that should be made available by all archives. It is desirable that archives are able to respond to harvesting requests with data delivered in both the internal metadata format as in the Santa Fe Set format. The representatives of existing archive initiatives at the meeting as well as those from institutions that are in the process of setting up archive initiatives agreed to comply to those guidelines. The Dienst protocol will be enhanced to allow for the functions mentioned above and a minimal Dienst release facilitating the process of making an archive compliant to the required aspects of Dienst will be made available. A transport format for MARC-formatted metadata will be proposed, as well as an XML DTD for the description of the Santa Fe Set. The recommendations will be extensively documented on a Web site. Adoption of the recommendations will be promoted worldwide. The way forward * The minimal Dienst protocol set will be implemented for all archives that were represented at the meeting. This will allow for a first round of experimentation with the creation of end-user services layered over existing archives. * The group identified the urgent need to discuss the mechanisms used to submit material to archives. * Paul Ginsparg suggested that a next meeting should be held in Europe, in the first quarter of next year. * It was also thought to be important to have a presentation and/or workshop on the UPS Initiative at the ACM 2000 Conference on Digital Libraries as well as at the European ECDLC. * The experimental, non-productional prototype that was presented at the meeting will temporarily be available for exploration at the beginning of November 1999 at http://ups.cs.odu.edu . The representatives of Old Dominion University, the Research Library of the Los Alamos National Laboratory and the University of Ghent expressed their interest in continuing this prototyping work. * The UPS Initiative will soon be given a new name and Web site. __________________ October 29th 1999 get in touch with the UPS initiative by contacting herbert.vandesompel@rug.ac.be ------------------------------------------------------------ Unofficial addenda and application to generalization of CogPrints Archive to all academic desciplines (Stevan Harnad): The Santa Fe meeting was about public archiving of scientific and scholarly research on the Web. <http://vole.lanl.gov/ups/ups.htm> What follows is my own (nonofficial) summary; first, the context: The first, largest and most important such archive is the Los Alamos National Laboratory (LANL) Physics Archive. Founded by Paul Ginsparg in 1991, LANL now houses over 100, 000 papers, mirrored worldwide in 15 countries with over 50,000 users daily and still growing. <http://xxx.lanl.gov/cgi-bin/show_monthly_submissions> Other disciplines and institutions have begun to found public research archives along the lines of LANL but what was needed to promote this public archiving intitiative was conventions that could be jointly adopted to ensure that the archives will be mutually "interoperable," which means that they can be integrated seamlessly into one globally navigable archive (so one need not know where to look for what in advance). With suitable "metadata" tagging that they all share (for example, by title, author, subject, date), any paper in any of these archives could be found from anyone's desktop worldwide, as if it were all in one virtual public library. The participants in the meeting were digital librarians and computer scientists specializing in archiving, metadata, and interoperability, and they included the founders of the principal public research archives that exist so far. Standards and protocols were agreed upon, and in the next six months these will be implemented in the existing archives, and put forward with a name something like "The Santa Fe Agreement," recommended for adoption globally. The Santa Fe participants were diverse in their underlying motivations, but entirely unified in their objective of paving the way for universal public archiving of the scientific and scholarly research literature on the Web. All agreed that this literature is currently being held hostage and needs to be freed. All wanted to free it from (1) the access barriers of the paper medium; most wanted also to free it from (2) the access barriers of journal subscription prices; some wanted to go on and free it also from (3) the access barriers of journal peer review. But the necessary condition for any of these is an interoperable digital literature; the Santa Fe protocol should go a long way toward bringing that to pass. For my own part, I am a confirmed advocate of (1) and (2) and an equally confirmed opponent of (3), but there was no difficulty making common cause with all the parties on interoperability. My own CogPrints archive will now be rewritten according to the agreed upon Santa Fe conventions in such a way as to turn it into generic archive software, ready to be mounted by any university so all its researchers in each of its departments can publicly archive all their papers. http://cogprints.soton.ac.uk/ The new archive software should be available for adoption for free for all within six months. If a significant number of universities worldwide mount and use it, the freeing of the research literature in a global public archive could in principle take place before the end of the year 2000. (The rest is only a question, having led them to the waters of public archiving, of what will induce the research cavalry to drink; the push from the institutional library serials budget crisis, together with the pull from the prospect of universal barrier-free access and impact just might do the trick, now that interoperability is ensured.) See: Harnad, S. (1998) On-Line Journals and Financial Fire-Walls. Nature 395(6698): 127-128 http://www.cogsci.soton.ac.uk/~harnad/nature.html Harnad, S. (1998) The invisible hand of peer review. Nature [online] (5 Nov. 1998) http://helix.nature.com/webmatters/invisible.html -------------------------------------------------------------------- Stevan Harnad harnad@cogsci.soton.ac.uk Professor of Cognitive Science harnad@princeton.edu Department of Electronics and phone: +44 23-80 592-582 Computer Science fax: +44 23-80 592-865 University of Southampton http://www.cogsci.soton.ac.uk/~harnad/ Highfield, Southampton http://www.princeton.edu/~harnad/ SO17 1BJ UNITED KINGDOM NOTE: A complete archive of this ongoing discussion of "Freeing the Refereed Journal Literature Through Online Self-Archiving" is available at the American Scientist September Forum (98 & 99): http://amsci-forum.amsci.org/archives/september98-forum.html