Arxiv's Funding Pains May Be A Wake-Up Call: Distributed Versus Central Archiving

Stevan Harnad 14 Aug 2011 13:34 UTC
*** Apologies for Cross-Posting ***
Hyperlinked version:
http://openaccess.eprints.org/index.php?/archives/831-guid.html

Comments on:
Ginsparg, Paul (2011) Arxiv at 20. Nature 476: 145–147 doi:10.1038/476145a
http://www.nature.com/nature/journal/v476/n7359/full/476145a.html

&

Fischman, Josh (2011) Anonymous FTP Archives. The First Free
Research-Sharing Site, arXiv, Turns 20 With an Uncertain Future.
Chronicle of Higher Education August 10, 2011
http://chronicle.com/blogs/wiredcampus/the-first-free-research-sharing-site-arxiv-turns-20/32778#disqus_thread

Anonymous FTP archives:
Arxiv (1991) was an invaluable milestone on the road to Open Access.
But it was not the first free research-sharing site: That began in the
1970s with the internet itself, with authors making their papers
freely accessible to all users net-wide by self-archiving them in
their own local institutional "anonymous FTP archives" --
http://www.w3.org/Protocols/rfc959/2_Overview.html

Distributed local websites:
With the creation of the world wide web in 1990, HTTP sites began
replacing FTP sites for the self-archiving of papers on authors'
institutional websites. FTP and HTTP sites were mostly local and
distributed, but
accessible free for all, webwide. Arxiv was the first important
central HTTP site for research self-archiving, with physicists webwide
all depositing their papers in one central locus (first hosted at Los
Alamos). Arxiv's remarkable growth and success were due to both its
timeliness and the fact that it had emerged from a widespread practice
among high energy physicists that predated the web,
namely, to share hard copies of their papers before publication by
mailing them to central preprint distribution sites such as SLAC and
CERN.

Central harvesting and search:
At the same time, while physicists were taking to central
self-archiving, in other disciplines (particularly computer science),
distributed self-archiving continued to grow. Later web developments,
notably Google and webwide harvesting and search engines, continued to
make distributed self-archiving more and more powerful and attractive.
Meanwhile, under the stimulus of Arxiv itself, the Open Archives
Initiative (OAI) was created in 1999. Its metadata-harvesting protocol
(OAI-PMH) made all distributed OAI-compliant websites interoperable, as
if their distributed local contents were all in one global, searchable
archive.
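
As a rough illustration of what OAI-compliant harvesting involves, here
is a minimal Python sketch, assuming the OAI-PMH ListRecords verb with
Dublin Core (oai_dc) metadata. The repository address and the sample
response are hypothetical, and a real harvester would also have to
handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response from one institutional repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1234</identifier>
        <datestamp>2011-08-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical postprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


# The same request works against any compliant repository; only the
# base URL changes. No network call is made in this sketch.
print(list_records_url("http://repository.example.edu/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

The point of the sketch is that the harvester needs to know nothing
about a repository beyond its base URL, which is what lets many local
archives behave like one.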

No need for direct central deposit in Google:
Together, Google and OAI probably marked the end of the need for
central archives. The cost and effort can instead be distributed
across institutions, with all the essential search and retrieval
functionality provided by automated central "overlay" services for
harvesting, indexing, search and retrieval (e.g., OAIster, Scirus,
BASE and Google Scholar). Arxiv continues to flourish, because two
decades of invaluable service to the physics community have left several
generations of users deeply committed to it. But no other dedicated
central archive has arisen since. Computer scientists self-archive
locally, with central harvesting by CiteSeerX; economists, likewise,
self-archive institutionally, with central harvesting by RePEc.

Mandating self-archiving:
In biomedicine, PubMed Central looks to be an exception, with direct
central depositing rather than local. But PubMed Central was not a
direct author initiative, like anonymous FTP, author websites or
Arxiv. It was designed by NLM, deposit was mandated by NIH, and
deposit is done not only by authors but by publishers.

Institutions are the universal research providers:
Open Access is still growing far more slowly than it might, and one of
the factors holding it back might be notional conflicts between
institutional and central archiving. It is clear that Open Access
self-archiving will have to be universally mandated, if all
disciplines are to enjoy its benefits (maximized research access,
uptake, usage and impact, minimized costs). The universal providers of
all research paper output, funded and unfunded, are the world's
universities and research institutions, distributed globally across
all scholarly and scientific disciplines, all languages, and all
national boundaries.

Deposit institutionally, harvest centrally:
Hence funder self-archiving mandates like NIH's and institutional
self-archiving mandates like Harvard's need to join forces to
reinforce one another rather than to compete for the same papers, and
the most natural, efficient and economical way to do this is for both
institutions and funders to mandate that all self-archiving should be
done locally, in the author's institutional OAI-compliant repository.
The contents of the institutional repositories can then be harvested
automatically by central OAI-compliant repositories such as PubMed
Central (as well as by Google and other central harvesters) for global
indexing and search.
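
The division of labour described here can be sketched in a few lines of
Python. This is a toy model under stated assumptions, not a description
of how PubMed Central or any real harvester works: the repository
names, identifiers and titles are hypothetical, and harvesting is
simulated with in-memory lists rather than network requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str   # globally unique identifier assigned locally
    title: str
    repository: str   # institutional repository holding the deposit


def harvest(repositories):
    """Merge metadata from many local repositories into one central index.

    `repositories` maps a repository name to a list of
    (identifier, title) pairs. Only metadata is copied; the papers
    themselves stay where they were deposited.
    """
    index = {}
    for name, records in repositories.items():
        for identifier, title in records:
            # Keep the first copy seen, so re-harvesting adds no duplicates.
            index.setdefault(identifier, Record(identifier, title, name))
    return index


def search(index, term):
    """Return indexed records whose title contains `term`, ignoring case."""
    term = term.lower()
    return [r for r in index.values() if term in r.title.lower()]


# Hypothetical deposits in two institutional repositories.
repositories = {
    "university-a": [("oai:a.example.edu:1", "Open access and citation impact")],
    "university-b": [
        ("oai:b.example.edu:7", "Preprint sharing in physics"),
        ("oai:b.example.edu:8", "Mandating open access deposit"),
    ],
}

index = harvest(repositories)
hits = search(index, "open access")
for record in hits:
    print(record.repository, record.identifier, record.title)
```

Each institution bears only the cost of its own deposits, while the
central service holds nothing but an index that points back to them.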

Distribute the archiving, rather than the cost:
In this light, Arxiv's self-funding pains may be a wake-up call: Why
should Cornell University (or a "wealthy donor") subsidize a cost that
institutions can best "sponsor" by each doing (and mandating) their
own distributed archiving locally (thereby reducing total cost, to
boot)? After all, no one deposits directly in Google…

Stevan Harnad
EnablingOpenScholarship
http://www.openscholarship.org/

"How to Integrate University and Funder Open Access Mandates"
http://openaccess.eprints.org/index.php?/archives/369-guid.htm

SUMMARY:
Research funder open-access mandates (such as NIH's) and university
open-access mandates (such as Harvard's) are complementary. There is a
simple way to integrate them to make them synergistic and mutually
reinforcing:

Universities' own Institutional Repositories (IRs) are the natural
locus for the direct
deposit of their own research output: Universities (and research
institutions) are the universal research providers of all research
(funded and unfunded, in all fields) and have a direct interest in
archiving, monitoring, measuring, evaluating, and showcasing their own
research assets -- as well as in maximizing their uptake, usage and
impact.

Both universities and funders should accordingly mandate deposit of
all peer-reviewed final drafts (postprints), in each author's own
university IR, immediately upon acceptance for publication, for
institutional and funder record-keeping purposes. Access to that
immediate postprint deposit in the author's university IR may be set
immediately as Open Access if copyright conditions allow; otherwise
access can be set as Closed Access, pending copyright negotiations or
embargoes. All the rest of the conditions described by universities
and funders should accordingly apply only to the timing and copyright
conditions for setting open access to those deposits, not to the
depositing itself, its locus or its timing.

As a result, (1) there will be a common deposit locus for all
research output worldwide; (2) university mandates will reinforce and
monitor compliance with funder mandates; (3) funder mandates will
reinforce university mandates; (4) legal details concerning
open-access provision, copyright and
embargoes will be applied independently of deposit itself, on a case
by case basis, according to the conditions of each mandate; (5)
opt-outs will apply only to copyright negotiations, not to deposit
itself, nor its timing; and (6) any central OA repositories can then
harvest the postprints from the authors' IRs under the agreed
conditions at the agreed time, if they wish.
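
The separation between depositing and access-setting proposed above can
be made concrete with a small Python sketch. It is only an illustration
under assumptions: the field names are hypothetical, and the rule that
a closed deposit opens once its embargo has elapsed is one reading of
"pending copyright negotiations or embargoes", not a rule stated by any
particular mandate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    title: str
    accepted_on: date                     # deposit happens on acceptance
    oa_permitted: bool                    # do copyright conditions allow OA now?
    embargo_ends: Optional[date] = None   # hypothetical: end of any embargo


def access_status(deposit, today):
    """Decide the access setting of a deposit that has already been made.

    Depositing is unconditional; only the access setting depends on
    copyright conditions and embargoes.
    """
    if deposit.oa_permitted:
        return "open"
    # Assumption: an elapsed embargo allows access to be opened.
    if deposit.embargo_ends is not None and today >= deposit.embargo_ends:
        return "open"
    return "closed"


# Hypothetical postprints, all deposited immediately upon acceptance.
free_paper = Deposit("Paper A", date(2011, 8, 1), oa_permitted=True)
embargoed = Deposit("Paper B", date(2011, 8, 1), oa_permitted=False,
                    embargo_ends=date(2012, 2, 1))
negotiating = Deposit("Paper C", date(2011, 8, 1), oa_permitted=False)

print(access_status(free_paper, date(2011, 8, 14)))
print(access_status(embargoed, date(2011, 8, 14)))
print(access_status(embargoed, date(2012, 2, 1)))
print(access_status(negotiating, date(2012, 2, 1)))
```

In this model an opt-out or an embargo changes only the value returned
by access_status; it never prevents the deposit record from existing.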