Re: Institutional vs. Central Repositories Stevan Harnad 25 Nov 2009 17:10 UTC

On Wed, Nov 25, 2009 at 10:22 AM, Simeon Warner (Arxiv, Cornell) wrote:

> SW: Lots of money is being spent on institutional repositories and,
> so far, the return on that investment is quite low.

Compared to what? It is undeniable that most of the thousands of
institutional repositories are languishing near empty. The only
exceptions are the fewer than a hundred mandated ones.

But that's the point. What's needed is more mandates, not more "investment."
Mandates are what will bring the return on the investment.

And there is another crucial point, constantly overlooked: Most
central repositories are languishing near-empty too! The only reason
it looks otherwise is that usually a subject repository has more
content than an institutional repository. But the reason for that is
quite simple:

The annual worldwide output of an entire field is incomparably bigger
than the annual output of any single institution. So when an
institution contains no more than the usual low baseline for annual
unmandated self-archiving (c. 15% of total annual research output) it
has a much smaller absolute number of annual deposits than a central
repository (even though that too contains only the very same low
baseline 15% of the annual output in the *field as a whole*, across
all institutions, worldwide).
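
To make the arithmetic concrete, here is a minimal sketch. The 15% baseline is the figure cited above; the two annual output figures are purely hypothetical round numbers chosen for illustration, not measured data.

```python
# Baseline rate for unmandated self-archiving (c. 15%, as cited in the text).
BASELINE_RATE = 0.15


def annual_deposits(annual_output: int, deposit_rate: float = BASELINE_RATE) -> int:
    """Expected number of deposits per year at a given deposit rate."""
    return round(annual_output * deposit_rate)


# Hypothetical annual outputs (assumptions for illustration only).
institution_output = 2_000   # papers per year from one institution
field_output = 100_000       # papers per year from one field, worldwide

ir_deposits = annual_deposits(institution_output)
cr_deposits = annual_deposits(field_output)

print(f"Institutional repository: {ir_deposits} deposits "
      f"({ir_deposits / institution_output:.0%} of output)")
print(f"Central repository:       {cr_deposits} deposits "
      f"({cr_deposits / field_output:.0%} of output)")
```

Under these assumed figures the central repository holds fifty times as many items, yet both repositories are missing exactly the same 85% of their target content.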

Yes, I know Arxiv is an exception (with an incomparably higher
unmandated central deposit rate for several of its subfields). But
that's the point: Arxiv is, and has been, an exception for nearly 20
years now. No point continuing to hold our breath and hope that the
longstanding spontaneous (unmandated) self-archiving practices of
(some fields of) physics will be adopted by other fields. It's not
happening, and 20 years is an awfully long time.

PubMedCentral might -- and I say might, because no one has done the
calculation -- possibly be doing better than the 15% baseline, but
that's because it is mandated, not because it is central! (Indeed, my
whole point is that the NIH and kindred biomedical self-archiving
mandates would get incomparably more bang for the buck if they
mandated institutional deposit -- and then just harvested/imported to
PMC -- rather than needlessly insisting on direct PMC deposit. For if
NIH mandated institutional deposit, it would help stir the Slumbering
Giant -- the universal providers of all research, funded and unfunded,
in all fields, namely, the world's universities and research
institutes -- into mandating deposit for all the rest of their annual
research output too.)

> SW: I am still optimistic that institutional repositories
> will become more useful but for that to happen there need to be useful
> worldwide (not just UK or European focused because that doesn't match
> research communities) disciplinary services and portals built on top of
> them. The Catch 22 here is that disciplinary services have exactly the
> same funding and sustainability issues that disciplinary repositories
> have.

What institutional repositories need is deposit mandates, so they can
have content that is worth building services on top of. It's not the
potential (or the funding) for services that's missing, it's the
content (85%). And to get that content deposited, we need (convergent)
institutional and funder deposit mandates.

> SW: My group manages both Cornell's eCommons institutional repository and
> the arXiv.org disciplinary repository. The effective cost per item (*)
> submitted is more than 10 times higher for the institutional
> repository than the disciplinary repository and the
> benefit/utility/visibility is lower. However, I know exactly who
> should and will fund eCommons (Cornell), and that nicely matches the
> vested interest (Cornell). The community benefit from arXiv.org is
> enormous and the effective cost per new item very low (<$7/item), but
> given 60k new items per year that is a significant cost and
> sustainability is a challenge.

The cost-per-item stats are funny-money. Cornell's problem is not that
it costs too much per item to deposit, it's that the deposits are not
being done, because Cornell has no mandate. That makes the ratio of IR
costs to IR items unsatisfying, of course, but you are missing the
real cause!

Moreover, if all institutions had mandates, the (equally small) cost
per deposited item would be distributed across the planet's 10K
institutional repositories, instead of concentrated on a few central
repositories (most near-empty, like Cornell's, and a few
serendipitously overstocked, like Arxiv).

> SW: I think the best example of a disciplinary service over institutional
> repositories is RePEc in economics. This predates OAI and our current
> conception of IRs but fits the model: institutions (typically
> economics departments (**)) host articles and expose metadata/data via
> a standard interface. The institutionally held content is genuinely
> useful to the economics community because of the disciplinary
> services.

All true. Except again we have here a community that has been
self-archiving (spontaneously, and institutionally) unmandated for
almost as long as Arxiv users. And again, this admirable practice has
not generalized to other fields.

What physicists and economists seem to have in common is that they
find the practice of publicly disseminating working papers --
unrefereed preprints -- useful and productive. That is splendid. I do
too. But the majority of fields do not find it useful. And you can't
mandate making authors' unrefereed drafts public; in some biomedical
fields that might even be dangerous.

But you *can* mandate making refereed, *published* drafts public: they
already are public, since they're published. So all you need to do is
mandate that they also be made freely accessible online, so not only
subscribers can access and use them: so all potential users can.

And that is what OA is about.

> SW: At the end of the day, researchers want and will use disciplinary
> services (look at usage stats for arXiv, ADS, SPIRES, RePEc, PMC, SSRN
> vs IRs). They probably don't care whether the items themselves are
> stored centrally or institutionally.

Correct, for *users*. But users do care whether the items are
accessible at all. And that's what deposit mandates are for. And authors
*do* care about whether they need to do multiple deposits; and
institutions *do* care about whether they host their own research
output.

So it does matter whether deposit is mandated institutionally or
centrally by institutions and funders.

The difference is not in functionality, but in content. And you have
no functionality if you have no content.

> SW: Some of Stevan's arguments miss key points:
>
>> sh: (1) Institutions are the universal providers of all research output --
>> funded and unfunded, across all subjects, all institutions, and all
>> nations.
>
> SW: Not true, researchers are the universal providers of research
> output. They often work in teams that span multiple institutions and
> their first allegiance is often to their discipline rather than their
> institution.

That is (sometimes) true, but trivial. Researchers are answerable to
their own institutions (employers) when it comes to the tallying of
their research output for research performance assessment. (You may be
more loyal to "Physics" than to Cornell University, but it is Cornell,
not "Physics," that hires you, pays your salary, and evaluates your
productivity; it is "for" Cornell that you "publish or perish" even if
your heart belongs to "Physics.")

>> sh: (3) OAI-compliant Repositories are all interoperable.
>> sh: (7) The metadata and/or full-text deposits of any OAI compliant
>> repository can be harvested, exported or imported to any OAI compliant
>> repository.
>
> SW: Interoperable to a point, and I say that as one of the creators of
> OAI-PMH. There is plenty of experience showing how hard it is to
> maintain large harvested collections and merge varying metadata
> (e.g. OAIster, NSDL). Institutional repositories are often managed
> with scant attention to maintaining interoperability, managers change
> the OAI-PMH base URL on a whim or do not monitor for errors. Full-text
> often has copyright/license issues preventing import into other
> repositories.

All extremely minor (and readily remediable) points, compared to the
real problem of institutional repositories, which is not that they are
errorful but that they are EMPTY. (No point even fixing the errors
while content is so impoverished. And once content is rich enough,
there's the motivation to clean up errors and maximize
interoperability -- and services.)
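
For readers unfamiliar with what harvesting involves, here is a minimal sketch of how a central service might form an OAI-PMH request against an institutional repository. The verb and argument names (ListRecords, metadataPrefix, from, set) come from the OAI-PMH protocol; the base URL and the helper function name are hypothetical, and the sketch only builds the request URL without contacting any server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec     # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL; a real repository advertises its own.
url = list_records_url("https://repository.example.edu/oai", from_date="2009-01-01")
print(url)
```

The request itself is the easy part. As the quoted objections note, the practical effort goes into stable base URLs and consistent metadata, and none of that matters until there are records to harvest.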

>> sh: (11) The solution is to fix the funder locus-of-deposit specs,
>> not to switch to central locus of deposit.
>
> SW: The solution is to build disciplinary services (either on disciplinary
> repositories or over harvested content) that are sufficiently useful
> to motivate researchers to submit of their own free will.

The solution to what problem? The problem I am addressing (lo these
nearly 20 years) is the absence of the target content over which the
putative services are built. Arxiv does not suffer from this problem,
and saints be praised for that, but that doesn't help the rest of us.

Yes, all kinds of powerful new services would be more than welcome
(and will come) -- but they are useless in the absence of the content
on which they are meant to operate. And it is not researchers as
*users* that are the problem. It is researchers as *authors* -- hence
providers, depositors -- that is the problem. The reason they are
failing to deposit is *not* -- let me save you the trouble of waiting
more years to find this is so -- because the user-services (or even
the author services) are not spiffy enough yet.

They are failing to deposit because their fingers are paralyzed (for
at least 34 reasons).

Harnad, S. (2006) Opening Access by Overcoming Zeno's Paralysis, in
Jacobs, N., Eds. Open Access: Key Strategic, Technical and Economic
Aspects, chapter 8. Chandos.   http://eprints.ecs.soton.ac.uk/12094/

And the cure for that paralysis is deposit mandates: "keystroke
mandates" from their institutions and funders.

And one of the (many) things holding up the adoption of those
keystroke mandates is funders needlessly competing with institutions
for their researchers' keystrokes by mandating central deposit, hence
paralyzed authors' (rightful) resistance to the prospect of divergent
multiple deposit at central sites instead of convergent one-time local
deposit.

> SW: (*) I think effective cost per new item is a good measure of
> repository cost because almost all effort beyond relatively fixed
> costs of keeping the system going tends to be dealing with new
> items. I calculate as operating budget over some period divided by
> number of new items in that period.

But surely you also see that the cost per item deposited depends on
the overall number of items deposited!
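
A minimal sketch of that dependence, using the formula as stated in the footnote (operating budget over a period divided by new items in that period). The budget here is only the upper bound implied by the quoted "<$7/item" and "60k new items per year"; the smaller item counts are hypothetical, and holding the budget fixed is an assumption made for illustration.

```python
def cost_per_new_item(operating_budget: float, new_items: int) -> float:
    """Effective cost per new item: operating budget / new items in the period."""
    if new_items <= 0:
        raise ValueError("new_items must be positive")
    return operating_budget / new_items


# Upper bound implied by the quoted figures: under $7/item at 60k items/year.
budget = 7 * 60_000

# Same (assumed fixed) budget, progressively fewer deposits (hypothetical counts).
costs = {n: cost_per_new_item(budget, n) for n in (60_000, 6_000, 600)}
for n, cost in costs.items():
    print(f"{n:>6} new items -> ${cost:,.2f} per item")
```

With largely fixed running costs, a tenfold drop in deposits is by itself enough to produce a tenfold rise in cost per item, whether the repository is central or institutional.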

> SW: (**) I'm pleased to say that the section of arXiv that overlaps with
> RePEc -- Quantitative Finance (q-fin) -- is also included in RePEc
> (http://ideas.repec.org/s/arx/papers.html).

Splendid. And I wish both Arxiv and RePEc all the best in taking their
very useful place among (many) central collections and service-providers.

But let the one-time locus of deposit be where it belongs: in the
researcher's own local institutional repository. And let that be the
convergent locus of deposit for both institutional and funder
mandates.

Amen

Stevan Harnad