On Mon, 11 Aug 2003, Matthew Cockerill wrote:

>sh> "The use one makes of those full texts is to read them,
>sh> print them off, quote/comment them, cite them, and use
>sh> their *contents* in further research, building on them.
>sh> What is "re-use"? And what is "redistribution" (when
>sh> everyone on the planet with access to the web has access
>sh> to the full-text of every such article)?"
>
> Having free access to articles on the publisher's website would certainly
> offer progress compared to the current status quo. But it would not offer
> anything like the benefits of true open access.

Free access to the current 20,000 journals (2 million articles yearly) would be like the difference between night and day. Compared to that, the difference between "free" and "true open" access amounts to just a few degrees of luminosity.

But let me agree at once that if free access were gerrymandered so all the user could do was to browse the text on-screen, without being able to download, save, grep, or print-off, then that would indeed arbitrarily limit free access's usefulness.

How many (if any) of the several million free-access refereed-journal articles currently on the web, however -- whether BOAI-1, BOAI-2, or otherwise -- are gerrymandered in that way? If (as I suspect) the answer is "very few" or even "none that I know of," then this hypothetical constraint is not worth another moment's thought or energy diverted from the real task at hand, which is to turn night into day, as soon as possible.

> Here are just some of the
> reasons why re-use and re-distribution rights are vital to open access:
>
> (1) Digital permanence - it is not enough for the publisher to be the only
> body which curates the full archive of published research content. To ensure
> long term digital permanence of the scientific record, it is vital that
> articles should be deposited with multiple archives, and redistributable
> from and between those archives.
It seems to me that this is conflating (arbitrarily) two completely independent matters. One is toll-free online *access* to the articles in the 20K journals that are currently only accessible via tolls. The other is the *preservation* of that toll-based corpus.

Well, preservation of that toll-based corpus was always a concern, in on-paper days as in on-line days, and the concern has nothing whatsoever to do with free (or open) access! We could have a failsafe preservation system without free access, or we could have a failsafe preservation system with free access; or we could have an uncertain preservation system without free access (as we do now) or an uncertain preservation system with free access (bringing the present system out into the light of day).

The preservation burden has to be (and will be, and is being) faced in any case. Why on earth should that entirely orthogonal longterm task be coupled in *any way* to the immediate and urgent problem of free access today? And why should "open access" be linked with or defined in terms of the eventual solution to the preservation problem, one way or the other?

(This is not an argument for indifference to preservation: it is an argument for decoupling two completely independent desiderata.)

> (2) A flexible choice of tools for searching and browsing
> The reason that Google exists is because the web is free for anyone to
> download and index. As a result, there is competition among search engines,
> and Google had the incentive to develop a better system for indexing web
> pages, which has since driven other search engine companies to improve the
> tools they offer.
>
> Compare this with the situation with scientific research. If the research
> resides only on the publisher's site, you don't have a free choice of what
> tools you use to search and browse it - you are stuck with what that
> particular publisher provides you with.

We are quite squarely in the domain of hypotheticals here.
(Which publisher's free-access corpus, inaccessible to Google, are we talking about?) But let us suppose that a publisher provides free access -- not gerrymandered free access, but free access that allows downloading, saving, grepping and printing:

First, I will bet that such a publisher will want to maximize the visibility and impact of his contents by allowing at least the indexing metadata to be harvested, both by Google, and by the OAI search engines specializing in the refereed journal literature.

But even if we get doubly hypothetical here, and suppose the publisher does *not* disclose the metadata to harvesters, there is still a super-simple solution: Every author has an online CV. Their CV will contain the metadata for every one of their journal publications. (Such CVs can and will be OAI-compliant: http://paracite.eprints.org/cgi-bin/rae_front.cgi ). Add the URL for the free-access full-text on the publisher's website to your CV entry and the circle is closed. (Better still, also self-archive the full text in your own institutional OAI-compliant repository!) End of story.

> This ties in with developments in Grid computing (e.g.
> http://www.escience-grid.org.uk/ ). With open access, published research
> would be available "on tap" via the grid, and scientists would be able to
> use their preferred choice of grid tools to access the data, rather than
> being stuck with the tools provided by the publisher.

As stated above, the CV/OAI gambit already trivially takes care of closing the circle. I agree, though, that for many research purposes, it is beneficial to have not just the metadata but the full-text inverted and indexed, as well as agent-harvestable. Again, if the publisher's free-access site doesn't do this, the author's institutional site certainly can and will. In fact, authors and their institutions are the ones with the most direct interest in making sure their own research output is maximally usable in this way.
http://www.ecs.soton.ac.uk/~harnad/Temp/unto-others.html

Let us not, however, conflate article-text archiving with data-archiving. Data-archiving is important too, but it is an extra: an independent new bonus of the online era, having nothing to do with the question of toll-free access to article-texts. In the paper era, raw data were not published, just summarized in what was published.

Eventually data will no doubt be incorporated into online publications in some way, but until then there is certainly no need for authors to wait! They can publish their article, as before, and, in addition, self-archive the data on which their article is based in their own OAI-compliant institutional research repository (the same repository in which the full-text of their article can and should be self-archived too, whether it appears in an open-access journal, a toll-access journal, or a toll-access journal that offers toll-free access too).

Again, the online CV can close the circle, if it is not already closed of its own accord. And this way, although it is functionally independent, data-archiving can help speed the progress toward toll-free full-text access too.

> (3) Datamining
>
> With a million or so biomedical research articles being published each year,
> the sheer volume of output is an obstacle to the comprehension and synthesis
> of the results reported in that research. If the XML of the articles can be
> brought together in one place then the tools of datamining can be applied to
> it to extract useful but non-obvious information.

Agreed. See above. But before we get carried away with the potential perks, let's not forget the still absent basics: Let there be Light (toll-free full-text access), now! Leave the Solar-Energy and Club-Med projects for when we already have our daily fill of photons.
> The simplest type of datamining is citation analysis
>
> Currently you need to pay ISI a lot of money to find out what cites what,
> but with true open access, citation analysis becomes trivial.

Perhaps not quite trivial. (There's still the problem of parsing, identifying and linking the citations for all those articles without the ultimate mark-up. But we're working on it: http://opcit.eprints.org/ ).

But again, this is an independent perk, because you could have universal citation linking and analysis even *without* toll-free full-text access! For an article's reference list, like its indexing metadata (and its accompanying empirical data), can be self-archived by the author (guess where?).

We are in fact promoting this solution for royalty-based books, whose authors, unlike journal article-authors, are unlikely to want to make their full-texts accessible toll-free. Their metadata and reference lists, however, are another matter, and can (and will) be tucked into the institutional OAI-compliant repository too, with a new indicator of global book citation impact as the harvestable reward.

http://www.ariadne.ac.uk/issue35/harnad/

> So, for example, if you view a PubMed record:
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11667947&dopt=Abstract
> you already get links to all the full text articles in PubMed Central which
> cite that PubMed item
> http://www.pubmedcentral.gov/tocrender.fcgi?action=cited&tool=pubmed&pubmedid=11667947

And if you look at citebase, you will see how this generalizes to the entire OAI-compliant literature: http://citebase.eprints.org/cgi-bin/search

> The more true open access research that is published and archived at PubMed
> Central, the more useful this becomes for biomedical researchers.
> [Sure,
> "screen-scraping" HTML from free articles displayed on publisher sites could
> give some citation information, but with nothing like the ease, accuracy and
> reliability with which it can be obtained with the use of XML data, as at PubMed
> Central].

Fine. But I'd rather have toll-free access to all 20K journals right now than wait for these XML perks -- wouldn't you? Again, toll-free access is one thing -- and extremely important, already reachable, and already overdue -- and potential perks such as citation-based navigation are another. Let there be light first; then we can worry about calibrating the photometers on our Yashicas.

> Beyond citation analysis, there are many other forms of datamining that are
> possible:
> For more information see:
> http://www.biomedcentral.com/info/about/datamining/
>
> e.g. Research articles can be mined for details of protein interactions
> http://bioinfo.mshri.on.ca/prebind/

See above. It is an indisputable fact that open-access publishing (BOAI-2) is today the solution only for the 5% of the literature (of 20K journals) that has a suitable open-access journal. The immediate solution for all the rest is self-archiving (BOAI-1), rather than continuing to wait for more open-access journals to spawn and grow.

(If, in the meantime, toll-access publishers also want to help hasten things along by providing free access, they are certainly welcome to do so! I still regret -- for the sake of open access -- that the BOAI http://www.soros.org/openaccess/sign2.shtml?o was not ready to count it as publisher support of open access if a toll-access journal supported author self-archiving of their articles http://www.ecs.soton.ac.uk/~harnad/Temp/rcoptable.gif : *Of course* that is publisher support for open access! By the same token, I would certainly consider it as publisher support for open access if a toll-access journal made its full-text contents publicly accessible online toll-free.
Even if it was gerrymandered full-text access -- as long as they also supported self-archiving!)

> And as scientific content is increasingly marked up using richer forms of
> semantically meaningful XML (e.g. CML for chemical structures, MathML for
> equations), the value of datamining will continue to increase.

All true. And it will all prevail eventually. But we need free access *now*.

http://www.ecs.soton.ac.uk/~harnad/Temp/che.htm

> The BioLINK group are using BioMed Central's open access corpus as the raw
> material for a datamining competition, designed to stimulate progress in the
> development of tools for biological datamining.
> http://www.pdg.cnb.uam.es/BioLINK/BioCreative_task2.html

That is commendable and welcome. But it must not be forgotten what percentage of the annual biological journal literature that sample actually represents. We must not be held back to that small percentage because we are informed that mere free access is not good enough -- not "true open access." Such rarefied fussiness does not serve the cause of either free or open access at this point.

> (4) Derivative works and compilations
> Say that a scientist performs a meta-analysis on a group of published
> clinical trials, and wants to make available the conclusions of that
> research. Or perhaps a datamining researcher has taken a corpus of 1000
> articles on breast cancer, and established some interesting conclusions.

All very welcome and valuable (indeed, inevitable) developments in the online age. But I'd rather that progress toward free access for all 20K did not wait for these perks. Indeed, the sooner we have free access, the sooner the rest will come too.

> In a true open access environment, each is free to post the results of their
> research, *along with* the actual corpus of data which the research was
> based on (effectively, the raw data for that research).
> But in a non-open access environment, that raw data (i.e.
> the research
> articles) cannot be redistributed, which makes it far more difficult than it
> needs to be for other scientists to reproduce, critique and follow up the
> work.

I am afraid I have to disagree. As already noted above, authors are as free to self-archive (in their institutional repositories) the empirical data underlying their toll-access publications as they are to do so with the data underlying their open-access publications. Data-archiving is another thing for which there is no point sitting around awaiting the era of universal open-access publishing. Data-archiving will encourage article self-archiving, and both will hasten the era of universal open-access.

> Similarly, a scientist may wish to make a point by assembling a collection
> of certain articles or article fragments (perhaps they wish to assemble a
> comparison of the methods used for a certain technique).
> In an open access world, as long as they cite the sources, they are
> completely free to create and redistribute that compilation. Such a
> selective compilation may in itself be an extremely useful contribution to
> science.

I can't follow this at all. A compilation is a list of articles, whether online or on-paper, whether toll-access or open-access. If the full-texts of the articles are *free* access, all the compilation need list is their URLs. (Ditto for article "fragments": try section number, paragraph number, or even [yech!] PDF page number.)

> (5) Print redistribution rights - the National Health Service, for example,
> should be able to redistribute thousands of printed copies of an important
> research article (which it may have funded) to its doctors if it wishes to
> do so. It should not have to pay a hefty copyright fee for the privilege.

I have no views on this, but it has nothing to do with open access, which even in the strict BOAI definition refers to online access, not to multiple printing and redistribution rights.
Besides, this is all becoming moot in the online era: Why distribute print copies instead of URLs, if the texts are publicly accessible online toll-free? (I think it is a big mistake, and clouds the issue, to try to link online toll-free access arguments with paper-printing rights. Don't forget that those worthy paper-based arguments would have been just as worthy in the paper era. So surely they are *not* what has changed in the online era.)

> Certainly, print redistribution will likely become less significant in the
> future, but there is no logical reason that the scientific community should
> not be free to exchange and distribute the research that it has created in
> print form, as well as online.

The case for multiple printing rights is *much* weaker than the case for toll-free online access. Please let us not needlessly weaken the case for free access by handicapping it with such needless extra burdens. Free access will erode the need to print, even as it erodes publisher opposition to printing. But now, all fussing about print "redistribution" rights does is provoke needless opposition, to no good purpose.

Keep it light, till everyone sees the light.

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open access to the peer-reviewed research literature online is available at the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):

http://amsci-forum.amsci.org/archives/september98-forum.html
or
http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Discussion can be posted to: september98-forum@amsci-forum.amsci.org