Date: Tue, 12 Aug 2003 09:14:45 +0100
From: Jan Velterop <jan@biomedcentral.com>
Subject: RE: Free Access vs. Open Access

Posting on behalf of Matt Cockerill:

Stevan asks: "The use one makes of those full texts is to read them, print them off, quote/comment them, cite them, and use their *contents* in further research, building on them. What is "re-use"? And what is "redistribution" (when everyone on the planet with access to the web has access to the full-text of every such article)?"

Having free access to articles on the publisher's website would certainly be an improvement on the status quo. But it would not offer anything like the benefits of true open access. Here are just some of the reasons why re-use and redistribution rights are vital to open access:

(1) Digital permanence

It is not enough for the publisher to be the only body that curates the full archive of published research content. To ensure the long-term digital permanence of the scientific record, it is vital that articles be deposited with multiple archives, and be redistributable from and between those archives.

(2) A flexible choice of tools for searching and browsing

The reason that Google exists is that the web is free for anyone to download and index. As a result there is competition among search engines: Google had the incentive to develop a better system for indexing web pages, which has since driven other search engine companies to improve the tools they offer.

Compare this with the situation for scientific research. If the research resides only on the publisher's site, you do not have a free choice of tools with which to search and browse it - you are stuck with whatever that particular publisher provides.

This ties in with developments in Grid computing (e.g. http://www.escience-grid.org.uk/ ). With open access, published research would be available "on tap" via the grid, and scientists would be able to use their preferred choice of grid tools to access the data, rather than being stuck with the tools provided by the publisher.

(3) Datamining

With a million or so biomedical research articles being published each year, the sheer volume of output is an obstacle to the comprehension and synthesis of the results reported in that research. If the XML of the articles can be brought together in one place, the tools of datamining can be applied to it to extract useful but non-obvious information.

The simplest type of datamining is citation analysis. Currently you need to pay ISI a lot of money to find out what cites what, but with true open access, citation analysis becomes trivial. So, for example, if you view a PubMed record:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11667947&dopt=Abstract

you already get links to all the full-text articles in PubMed Central which cite that PubMed item:

http://www.pubmedcentral.gov/tocrender.fcgi?action=cited&tool=pubmed&pubmedid=11667947

The more true open access research that is published and archived at PubMed Central, the more useful this becomes for biomedical researchers.

[Sure, "screen-scraping" HTML from free articles displayed on publisher sites could give some citation information, but with nothing like the ease, accuracy and reliability with which it can be obtained from XML data, as at PubMed Central.]

Beyond citation analysis, many other forms of datamining are possible (for more information, see http://www.biomedcentral.com/info/about/datamining/ ). For example, research articles can be mined for details of protein interactions, as at PreBIND:

http://bioinfo.mshri.on.ca/prebind/
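To make the citation-analysis point above concrete, here is a minimal sketch (in Python) of how trivial "what cites what" becomes once the article XML can be gathered in one place. The <pmid> and <cites> elements are a hypothetical simplified schema, not the actual PubMed Central DTD, and the corpus/*.xml path is likewise illustrative; real article XML is much richer, but the principle is the same.

    import glob
    import xml.etree.ElementTree as ET
    from collections import defaultdict

    # Build a "cited-by" index from a local corpus of open access
    # article XML. The schema here is hypothetical and minimal:
    #   <article>
    #     <pmid>11667947</pmid>
    #     <cites>...</cites>  (one element per cited PubMed ID)
    #   </article>
    cited_by = defaultdict(set)

    for path in glob.glob("corpus/*.xml"):
        article = ET.parse(path).getroot()
        pmid = article.findtext("pmid")
        for ref in article.findall("cites"):
            cited_by[ref.text].add(pmid)

    # Which articles in the corpus cite PubMed record 11667947?
    print(sorted(cited_by["11667947"]))

With open access to the XML, the whole exercise is a dozen lines and a loop over files; without it, it is a licensing negotiation.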
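In the same spirit, the protein-interaction mining just mentioned can be caricatured in a few lines. Real systems such as PreBIND rely on statistical and linguistic models trained on curated data; the toy sketch below merely flags sentences in which two names from a purely illustrative protein list co-occur.

    import re
    from itertools import combinations

    # Illustrative protein list; a real system would use a curated
    # lexicon and far more sophisticated methods than co-occurrence.
    PROTEINS = {"p53", "MDM2", "BRCA1", "RAD51"}

    def candidate_interactions(text):
        """Return protein pairs that co-occur within a sentence."""
        pairs = set()
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            found = sorted(p for p in PROTEINS
                           if re.search(r"\b%s\b" % re.escape(p), sentence))
            pairs.update(combinations(found, 2))
        return pairs

    print(candidate_interactions(
        "MDM2 binds p53 and inhibits its activity. "
        "BRCA1 interacts directly with RAD51."))
    # flags the pairs (MDM2, p53) and (BRCA1, RAD51)

Crude as it is, even this only works if the full text is free to download, store and process in bulk - which is exactly the re-use right at issue.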
And as scientific content is increasingly marked up using richer forms of semantically meaningful XML (e.g. CML for chemical structures, MathML for equations), the value of datamining will continue to increase. The BioLINK group are using BioMed Central's open access corpus as the raw material for a datamining competition, designed to stimulate progress in the development of tools for biological datamining:

http://www.pdg.cnb.uam.es/BioLINK/BioCreative_task2.html

(4) Derivative works and compilations

Say that a scientist performs a meta-analysis on a group of published clinical trials and wants to make the conclusions of that research available. Or perhaps a datamining researcher has taken a corpus of 1,000 articles on breast cancer and established some interesting conclusions. In a true open access environment, each is free to post the results of their research *along with* the actual corpus of articles on which the research was based (effectively, the raw data for that research). But in a non-open-access environment, that raw data (i.e. the research articles) cannot be redistributed, which makes it far more difficult than it needs to be for other scientists to reproduce, critique and follow up the work.

Similarly, a scientist may wish to make a point by assembling a collection of certain articles or article fragments (perhaps a comparison of the methods used for a certain technique). In an open access world, as long as they cite the sources, they are completely free to create and redistribute that compilation. Such a selective compilation may in itself be an extremely useful contribution to science.

(5) Print redistribution rights

The National Health Service, for example, should be able to redistribute thousands of printed copies of an important research article (which it may have funded) to its doctors if it wishes to do so. It should not have to pay a hefty copyright fee for the privilege. Print redistribution will likely become less significant in the future, but there is no logical reason why the scientific community should not be free to exchange and distribute the research it has created in print form as well as online.

Matt Cockerill

==
Matthew Cockerill Ph.D.
Technical Director
BioMed Central Limited (http://www.biomedcentral.com)
34-42 Cleveland Street
London W1T 4LB
Email: matt@biomedcentral.com