On Wed, Nov 25, 2009 at 10:22 AM, Simeon Warner (Arxiv, Cornell) wrote:

> SW: Lots of money is being spent on institutional repositories and,
> so far, the return on that investment is quite low.

Compared to what? It is undeniable that most of the thousands of institutional repositories are languishing near empty. The only exceptions are the fewer than a hundred mandated ones. But that's the point. What's needed is more mandates, not more "investment." Mandates are what will bring the return on the investment.

And there is another crucial point, constantly overlooked: Most central repositories are languishing near-empty too! The only reason it looks otherwise is that usually a subject repository has more content than an institutional repository. But the reason for that is quite simple: The annual worldwide output of an entire field is incomparably bigger than the annual output of any single institution. So when an institution contains no more than the usual low baseline for annual unmandated self-archiving (c. 15% of total annual research output) it has a much smaller absolute number of annual deposits than a central repository (even though that too contains only the very same low baseline 15% of the annual output in the *field as a whole*, across all institutions, worldwide).

Yes, I know Arxiv is an exception (with an incomparably higher unmandated central deposit rate for several of its subfields). But that's the point: Arxiv is, and has been, an exception for nearly 20 years now. No point continuing to hold our breath and hope that the longstanding spontaneous (unmandated) self-archiving practices of (some fields of) physics will be adopted by other fields. It's not happening, and 20 years is an awfully long time.

PubMedCentral might -- and I say might, because no one has done the calculation -- possibly be doing better than the 15% baseline, but that's because it is mandated, not because it is central!
(Indeed, my whole point is that the NIH and kindred biomedical self-archiving mandates would get incomparably more bang for the buck if they mandated institutional deposit -- and then just harvested/imported to PMC -- rather than needlessly insisting on direct PMC deposit. For if NIH mandated institutional deposit, it would help stir the Slumbering Giant -- the universal providers of all research, funded and unfunded, in all fields, namely, the world's universities and research institutes -- into mandating deposit for all the rest of their annual research output too.)

> SW: I am still optimistic that institutional repositories
> will become more useful but for that to happen there need to be useful
> worldwide (not just UK or European focused because that doesn't match
> research communities) disciplinary services and portals built on top of
> them. The Catch 22 here is that disciplinary services have exactly the
> same funding and sustainability issues that disciplinary repositories
> have.

What institutional repositories need is deposit mandates, so they can have content that is worth building services on top of. It's not the potential (or the funding) for services that's missing, it's the content (85%). And to get that content deposited, we need (convergent) institutional and funder deposit mandates.

> SW: My group manages both Cornell's eCommons institutional repository and
> the arXiv.org disciplinary repository. The effective cost per item (*)
> submitted is more than 10 times higher for the institutional
> repository than the disciplinary repository and the
> benefit/utility/visibility is lower. However, I know exactly who
> should and will fund eCommons (Cornell), and that nicely matches the
> vested interest (Cornell). The community benefit from arXiv.org is
> enormous and the effective cost per new item very low (<$7/item), but
> given 60k new items per year that is a significant cost and
> sustainability is a challenge.

The cost-per-item stats are funny-money.
Cornell's problem is not that it costs too much per item to deposit, it's that the deposits are not being done, because Cornell has no mandate. That makes the ratio of IR costs to IR items unsatisfying, of course, but you are missing the real cause! Moreover, if all institutions had mandates, the (equally small) cost per deposited item would be distributed across the planet's 10K institutional repositories, instead of concentrated on a few central repositories (most near-empty, like Cornell's, and a few serendipitously overstocked, like Arxiv).

> SW: I think the best example of a disciplinary service over institutional
> repositories is RePEc in economics. This predates OAI and our current
> conception of IRs but fits the model: institutions (typically
> economics departments (**)) host articles and expose metadata/data via
> a standard interface. The institutionally held content is genuinely
> useful to the economics community because of the disciplinary
> services.

All true. Except again we have here a community that has been self-archiving (spontaneously, and institutionally) unmandated for almost as long as Arxiv users. And again, this admirable practice has not generalized to other fields.

What physicists and economists seem to have in common is that they find the practice of publicly disseminating working papers -- unrefereed preprints -- useful and productive. That is splendid. I do too. But the majority of fields do not find it useful. And you can't mandate making authors' unrefereed drafts public; in some biomedical fields that might even be dangerous.

But you *can* mandate making refereed, *published* drafts public: they already are public, since they're published. So all you need to do is mandate that they also be made freely accessible online, so not only subscribers can access and use them: so all potential users can. And that is what OA is about.
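
To put rough numbers on the cost-per-item point above, here is a minimal sketch in Python. It uses the effective cost per item discussed above (operating budget over a period divided by the number of new items in that period). Only the 15% baseline, the <$7/item figure and the 60k items/year come from this thread. The budget and the institution's annual output are hypothetical figures chosen purely for illustration, the budget is assumed not to change with deposit volume (a simplification), and the 100% deposit rate under a mandate is an assumption, not a measured value.

```python
def cost_per_item(operating_budget: float, new_items: int) -> float:
    """Effective cost per new item: operating budget for a period
    divided by the number of new items deposited in that period."""
    if new_items <= 0:
        raise ValueError("new_items must be positive")
    return operating_budget / new_items


# Hypothetical institution (illustrative numbers, not from the thread).
ANNUAL_BUDGET = 100_000.0   # assumed IR operating budget per year, in dollars
ANNUAL_OUTPUT = 10_000      # assumed number of papers the institution publishes per year

BASELINE_RATE = 0.15        # unmandated self-archiving baseline cited in the thread
MANDATED_RATE = 1.00        # assumption: a mandate brings deposits close to 100%

unmandated_deposits = round(ANNUAL_OUTPUT * BASELINE_RATE)
mandated_deposits = round(ANNUAL_OUTPUT * MANDATED_RATE)

unmandated_cost = cost_per_item(ANNUAL_BUDGET, unmandated_deposits)
mandated_cost = cost_per_item(ANNUAL_BUDGET, mandated_deposits)

print(f"unmandated: {unmandated_deposits} deposits, ${unmandated_cost:.2f}/item")
print(f"mandated:   {mandated_deposits} deposits, ${mandated_cost:.2f}/item")

# Upper bound on the annual budget implied by the figures quoted for arXiv:
# under $7 per item at 60k new items per year.
arxiv_budget_upper_bound = 7 * 60_000
print(f"arXiv implied budget: under ${arxiv_budget_upper_bound:,}/year")
```

Under these assumptions the same budget yields a several-fold lower cost per item once deposits rise from the baseline to the mandated level, which is the sense in which the ratio reflects how many items are deposited rather than what a deposit costs.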

> SW: At the end of the day, researchers want and will use disciplinary
> services (look at usage stats for arXiv, ADS, SPIRES, RePEc, PMC, SSRN
> vs IRs). They probably don't care whether the items themselves are
> stored centrally or institutionally.

Correct, for *users*. But users do care whether the items are accessible at all. And that's what deposit mandates are for. And authors *do* care about whether they need to do multiple deposits; and institutions *do* care about whether they host their own research output. So it does matter whether deposit is mandated institutionally or centrally by institutions and funders. The difference is not in functionality, but in content. And you have no functionality if you have no content.

> SW: Some of Stevan's arguments miss key points:
>
>> sh: (1) Institutions are the universal providers of all research output --
>> funded and unfunded, across all subjects, all institutions, and all
>> nations.
>
> SW: Not true, researchers are the universal providers of research
> output. They often work in teams that span multiple institutions and
> their first allegiance is often to their discipline rather than their
> institution.

That is (sometimes) true, but trivial. Researchers are answerable to their own institutions (employers) when it comes to the tallying of their research output for research performance assessment. (You may be more loyal to "Physics" than to Cornell University, but it is Cornell, not "Physics," that hires you, pays your salary, and evaluates your productivity; it is "for" Cornell that you "publish or perish" even if your heart belongs to "Physics.")

>> sh: (3) OAI-compliant Repositories are all interoperable.
>> sh: (7) The metadata and/or full-text deposits of any OAI compliant
>> repository can be harvested, exported or imported to any OAI compliant
>> repository.
>
> SW: Interoperable to a point, and I say that as one of the creators of
> OAI-PMH.
> There is plenty of experience showing how hard it is to
> maintain large harvested collections and merge varying metadata
> (e.g. OAIster, NSDL). Institutional repositories are often managed
> with scant attention to maintaining interoperability, managers change
> the OAI-PMH base URL on a whim or do not monitor for errors. Full-text
> often has copyright/license issues preventing import into other
> repositories.

All extremely minor (and readily remediable) points, compared to the real problem of institutional repositories, which is not that they are errorful but that they are EMPTY. (No point even fixing the errors while content is so impoverished. And once content is rich enough, there's the motivation to clean up errors and maximize interoperability -- and services.)

>> sh: (11) The solution is to fix the funder locus-of-deposit specs,
>> not to switch to central locus of deposit.
>
> SW: The solution is to build disciplinary services (either on disciplinary
> repositories or over harvested content) that are sufficiently useful
> to motivate researchers to submit of their own free will.

The solution to what problem? The problem I am addressing ('lo these nearly 20 years) is the absence of the target content over which the putative services are built. Arxiv does not suffer from this problem, and saints be praised for that, but that doesn't help the rest of us.

Yes, all kinds of powerful new services would be more than welcome (and will come) -- but they are useless in the absence of the content on which they are meant to operate. And it is not researchers as *users* that are the problem. It is researchers as *authors* -- hence providers, depositors -- that is the problem.

The reason they are failing to deposit is *not* -- let me save you the trouble of waiting more years to find this is so -- because the user-services (or even the author services) are not spiffy enough yet.
They are failing to deposit because their fingers are paralyzed (for at least 34 reasons).

    Harnad, S. (2006) Opening Access by Overcoming Zeno's Paralysis,
    in Jacobs, N., Eds. Open Access: Key Strategic, Technical and
    Economic Aspects, chapter 8. Chandos.
    http://eprints.ecs.soton.ac.uk/12094/

And the cure for that paralysis is deposit mandates: "keystroke mandates" from their institutions and funders. And one of the (many) things holding up the adoption of those keystroke mandates is funders needlessly competing with institutions for their researchers' keystrokes by mandating central deposit, hence paralyzed authors' (rightful) resistance to the prospect of divergent multiple deposit at central sites instead of convergent one-time local deposit.

> SW: (*) I think effective cost per new item is a good measure of
> repository cost because almost all effort beyond relatively fixed
> costs of keeping the system going tends to be dealing with new
> items. I calculate as operating budget over some period divided by
> number of new items in that period.

But surely you also see that the cost per item deposited depends on the overall number of items deposited!

> SW: (**) I'm pleased to say that the section of arXiv that overlaps with
> RePEc -- Quantitative Finance (q-fin) -- is also included in RePEc
> (http://ideas.repec.org/s/arx/papers.html).

Splendid. And I wish both Arxiv and RePEc all the best in taking their very useful place among (many) central collections and service-providers. But let the one-time locus of deposit be where it belongs: in the researcher's own local institutional repository. And let that be the convergent locus of deposit for both institutional and funder mandates.

Amen

Stevan Harnad