Fwd: Ranking Web of Repositories: July 2010 Edition

Stevan Harnad, 09 Jul 2010 18:24 UTC
---------- Forwarded message ----------
From: Leslie Carr <lac -- ecs.soton.ac.uk>
Date: Fri, Jul 9, 2010 at 1:04 PM
Subject: Re: Ranking Web of Repositories: July 2010 Edition
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@listserver.sigmaxi.org

On 9 Jul 2010, at 08:12, Isidro F. Aguillo wrote:

> However, perhaps you will like this page we prepared for the University rankings, related to UK universities' commitment to OA:
> http://www.webometrics.info/openac.html

Thanks for preparing the page - it is very informative and helpful in
answering questions about the interpretation of the IR ranking
relating to the discrepancy between the relative ordering of
institutions in the IR list and other (independent) research rankings.

As you point out, much of the difference is explained by the relative
"openness" of each institution's literature. Since 50% of the score is
devoted to in-links, and there is little motivation to link to an
empty bibliographic record, a high proportion of OA papers will tend
to attract more links, more traffic and hence a more "impactful"
repository.

Some institutions have therefore benefited from their efforts to
deposit OA papers, becoming more visible and hence more highly rated.
Others are seeing the opposite effect - institutions that would
normally be at the top of any research list are much lower down than
expected. Some of these institutions don't have very effective
repositories and some do but hide them behind firewalls. Either way
the net effect is the same - not much visible public literature to
attract links or traffic.

I hope that the effect of this "league table" will be to encourage
institutions to redouble their efforts in regard to Open Access. I
also hope that it will be possible to have further public dialogue so
that the process can be increasingly open and the community can better
understand, verify and trust your metrics.

Thanks again for your contribution!
--
Les Carr

On 9 Jul 2010, at 08:12, Isidro F. Aguillo wrote:

> Dear Stevan:
>
> A lot of interesting stuff to think about. We are already working on some of those proposals but it is not easy. However, perhaps you will like this page we prepared for the University rankings, related to UK universities' commitment to OA:
>
> http://www.webometrics.info/openac.html
>
> Thanks for your useful comments,
>
>
>
> On 08/07/2010 18:34, Stevan Harnad wrote:
>> On 2010-07-08, at 4:43 AM, Isidro F. Aguillo wrote:
>>
>>> Dear Hélène:
>>>
>>> Thank you for your message, but I disagree with your proposal. We are not measuring only contents but contents AND visibility on the web.
>> Dear Isidro,
>>
>> If I may intervene with some comments too, as this discussion has some wider implications:
>>
>> Yes, you are measuring both contents and visibility, but presumably you want the difference between (1) the ranking of the top 800 repositories and (2) the ranking of the top 800 *institutional* repositories to be based on the fact that the latter are institutional repositories whereas the former are all repositories (central, i.e., multi-institutional, as well as institutional).
>>
>> Moreover, if you list redundant repositories (some being the proper subsets of others) in the very same ranking, it seems to me the meaning of the ranking becomes rather vague.
>>
>>> Certainly HyperHAL covers the contents of all its participants, but the impact of these contents depends on other factors. Probably researchers prefer to link to the paper in INRIA because of the prestige of this institution, the affiliation of the author or the marketing of their institutional repository.
>> All true, but perhaps the significance and usefulness of the rankings would be greater if you either changed the weight of the factors (volume of full-text content, number of links) or, alternatively, you designed the rankings so the user could select and weight the criteria on which the rankings are displayed.
>>
>> Otherwise your weightings become like the "h-index" -- an a-priori combination of untested, unvalidated weights that many users may not be satisfied with, or fully informed by...
>>
>>> But here is a more important aspect. If I were the president of INRIA I would prefer people using my institutional repository instead of CCSD. No problem with the latter, they are doing a great job and increasing the reach of INRIA, but the papers deposited are a very important (the most important?) asset of INRIA.
>> But how much INRIA papers are linked, downloaded and cited is not necessarily (or even probably) a function of their direct locus!
>>
>> What is important for INRIA (and all institutions) is that as much as possible of their paper output should be OA, simpliciter, so that it can be linked, downloaded, read, applied, used and cited. It is entirely secondary, for INRIA (and all institutions), *where* their papers are OA, compared to the necessary condition *that* they are OA (and hence freely accessible, usable, harvestable).
>>
>> Hence (in my view) by far the most important ranking factor for institutional repositories is how much of their full-text institutional paper output is indeed deposited and OA. INRIA would have no reason to be disappointed if the locus from which its content is searched, retrieved and linked is some other, multi-institutional harvester. INRIA still gets the credit and benefits from all the links, downloads and citations of INRIA content!
>>
>> (Having said that, locus of deposit *does* matter, very much, for deposit mandates. Deposit mandates are necessary in order to generate OA content. And, for strategic reasons that are elaborated in my reply to Chris Armbruster, it makes a big practical difference for success in agreeing on the adoption of a mandate that both institutional and funder mandates should require convergent *institutional* deposit, rather than divergent and competing institutional vs. institution-external deposit. Here too, your repository rankings would be much more helpful and informative if they gave a greater weight to the relative size of each institutional repository's content and eliminated multi-institutional repositories from the institutional repository rankings -- or at least allowed institutional repositories to be ranked independently on content vs links.)
>>
>> I think you are perhaps being misled here by the analogy with your sister rankings (RWWU, http://www.webometrics.info/) of universities rather than their repositories. In university rankings, the links to the university site itself matter a lot. But in repository rankings links matter much less than *how much institutional content is accessible*. For the degree of usage of that content, harvester sites may be more relevant measures, and, after all, downloads and citations, unlike links, carry their credits (to the authors and institutions) with them no matter where the transaction happens to occur...
>>
>>> Regarding the other comments, we are going to correct those with mistakes, but it is very difficult for us to realize that Virginia Tech University is "faking" its institutional repository with contents authored by external scholars.
>> I have called Gail McMillan at Virginia Tech about this, and she has explained it to me. The question was never whether Virginia Tech was "faking"! They simply host content over and above Virginia Tech content -- for example, OA journals whose content originates from other institutions.
>>
>> As such, the Virginia Tech repository, besides providing access to Virginia Tech content, is also a conduit or portal for accessing the content of those other institutions. The "credit" for providing the conduit goes to Virginia Tech, of course. But the credit for the links, usage and citations goes to those other institutions! (When an institutional repository is also used as a portal for other institutions, its function becomes a hybrid one -- both an aggregator and a provider. I think it's far more useful and important to try to keep those functions separate, in both the rankings and the weightings.)
>>
>> Best wishes,
>>
>> Stevan
>>
>>> On 07/07/2010 23:03, Hélène.Bosc wrote:
>>>> Isidro,
>>>> Thank you for your Ranking Web of World Repositories and for informing us about the best quality repositories!
>>>>
>>>>
>>>> Being French, I am delighted to see HAL so well ranked and I take this opportunity to congratulate Franck Laloe for having set up such a good national repository as well as the CCSD team for continuing to maintain and improve it.
>>>>
>>>> Nevertheless, there is a problem in your ranking that I have already had occasion to point out to you in private messages.
>>>> May I remind you that:
>>>>
>>>> Correction for the top 800 ranking:
>>>>
>>>>
>>>> The ranking should either index HyperHAL alone, or index both HAL/INRIA and HAL/SHS, but not all three repositories at the same time: HyperHAL includes both HAL/INRIA and HAL/SHS.
>>>>
>>>> Correction for the ranking of institutional repositories:
>>>>
>>>>
>>>> Not only does HyperHAL (#1) include both HAL/INRIA (#3) and HAL/SHS (#5), as noted above, but HyperHAL is a multidisciplinary repository, intended to collect all French research output, across all institutions. Hence it should not be classified and ranked against individual institutional repositories but as a national, central repository. Indeed, even HAL/SHS is multi-institutional in the usual sense of the word: single universities or research institutions. The classification is perhaps being misled by the polysemous use of the word "institution."
>>>>
>>>>
>>>> Not to seem to be biassed against my homeland, I would also point out that, among the top 10 of the top 800 "institutional repositories," CERN (#2) is to a certain extent hosting multi-institutional output too, and is hence not strictly comparable to true single-institution repositories. In addition, "California Institute of Technology Online Archive of California" (#9) is misnamed -- it is the Online Archive of California http://www.oac.cdlib.org/ (CDLIB, not CalTech) and as such it too is multi-institutional. And Digital Library and Archives Virginia Tech University (#4) may also be anomalous, as it includes the archives of electronic journals with multi-institutional content. Most of the multi-institutional anomalies in the "Top 800 Institutional" seem to be among the top 10 -- as one would expect if multiple institutional content is inflating the apparent size of a repository. Beyond the top 10 or so, the repositories look to be mostly true institutional ones.
>>>>
>>>>
>>>> I hope that this will help in improving the next release of your increasingly useful ranking!
>>>>
>>>>
>>>> Best wishes
>>>> Hélène Bosc
>>>>
>>>> ----- Original Message ----- From: "Stevan Harnad"<harnad@ECS.SOTON.AC.UK>
>>>> To:<AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@LISTSERVER.SIGMAXI.ORG>
>>>> Sent: Tuesday, July 06, 2010 6:07 PM
>>>> Subject: Fwd: Ranking Web of Repositories: July 2010 Edition
>>>>
>>>>
>>>>
>>>> Begin forwarded message:
>>>>
>>>> From: "Isidro F. Aguillo"<isidro.aguillo@CCHS.CSIC.ES>
>>>> Date: July 6, 2010 11:13:58 AM EDT
>>>> To: SIGMETRICS@listserv.utk.edu
>>>> Subject: [SIGMETRICS] Ranking Web of Repositories: July 2010 Edition
>>>>
>>>> Ranking Web of Repositories: July 2010 Edition
>>>>
>>>> The second edition of 2010 Ranking Web of Repositories has been published the same day OR2010 started here in Madrid. The ranking is available from the following URL:
>>>>
>>>> http://repositories.webometrics.info/
>>>>
>>>> The main novelty is the substantial increase in the number of repositories analyzed (close to 1000). The Top 800 are ranked according to their web presence and visibility. As usual, thematic repositories (CiteSeer, RePEc, Arxiv) lead the Ranking, but the French research institutes (CNRS, INRIA, SHS) using HAL are very close. Two issues have changed from previous editions from a methodological point of view: the use of Bing's engine data has been discarded due to irregularities in the figures obtained, and MS Excel files have been excluded again.
>>>>
>>>> At the end of July the new edition of the Rankings of universities, research centers and hospitals will be published.
>>>>
>>>> Comments, suggestions and additional information are greatly appreciated.
>>>>
>>>
>>> --
>>> ===========================
>>>
>>> Isidro F. Aguillo, HonPhD
>>> Cybermetrics Lab (3C1)
>>> IPP-CCHS-CSIC
>>> Albasanz, 26-28
>>> 28037 Madrid. Spain
>>>
>>>
>>> Editor of the Rankings Web
>>> ===========================