Archival/Retrieval of Electronic Journals Stevan Harnad 19 Oct 1991 16:51 UTC

The following is a detailed reply by PSYCOLOQUY Assistant Editor
Malcolm Bauer to a query about electronic journal archiving and
retrieval:

Date: Fri, 18 Oct 91 13:01:01 EDT
From: Malcolm Bauer <malcolm@clarity.princeton.edu>

  >Date:         Fri, 11 Oct 1991 08:36:00 EDT
  >Sender: Association of Electronic Scholarly Journals
  >              <AESJ-L@ALBNYVM1.BITNET>
  >From: JOANNA <WRIGHTJ%UNCWIL.BITNET@ncsuvm.cc.ncsu.edu>
  >Subject:      Re: Reply to Library management of online journal
  >
  > I am very disappointed in OCLC's proposed method of distribution and
  > pricing for CURRENT CLINICAL TRIALS. I was hoping for article access
  > through an ftp site. Can't OCLC provide something like this? We would
  > prefer to pay a much reduced subscription rate for access to the
  > channel plus a charge per article retrieved. Viewing articles online
  > can run up telecommunications charges real quick! Downloading articles
  > can take a lot of time, particularly if you have to do this for several
  > faculty! The idea of having "offline prints" mailed or faxed to us is a
  > step backwards! We should be able to retrieve e-journal articles within
  > a couple of minutes as netdata files, ship them electronically to our
  > students and faculty, and let them download the articles for viewing or
  > printing.
  >Joanna Wright Randall Library Univ. of North Carolina at Wilmington

There are many possible ways to access electronic journals. It may be
some time before we determine the best means of distribution and
access. Different means of distribution and access may be appropriate
for different journals. I think it's important to view each new
electronic journal as an experiment - to determine which elements of it
are good and which elements could be improved, and to understand why
the decisions in format and distribution were made. In that way we can
improve the quality of electronic journals as a whole.

As the assistant editor of PSYCOLOQUY, a peer-reviewed electronic
journal, I thought I'd describe the way we currently distribute
articles and discuss some software we are developing to access our
archives. I do this in order to present another example or "experiment"
in how electronic journals can be distributed and accessed.

PSYCOLOQUY is available free to anyone who has an Internet or Bitnet
email address. Issues are currently sent directly to about 2500
individual subscribers and redistribution lists worldwide through a
Bitnet LISTSERV. It is also available through Usenet's netnews as the
newsgroup sci.psychology.digest, which reaches readers at thousands of
host sites and over 100,000 individual computers worldwide.

Back issues can currently be retrieved interactively by anonymous ftp
from the Internet, by BITFTP from Bitnet, and by various other ftp-like,
off-line, email-driven services on JANET and elsewhere. We are in the
process of developing a more powerful and convenient means of accessing
the PSYCOLOQUY archive that may serve as a model for other electronic
journals. Ftp is a quick and convenient way to retrieve journal
articles, provided one already knows which issue or article one wants.
However, if one has only a general idea of the kinds of articles one is
interested in, there are better approaches than perusing an index file
retrieved by ftp.

Ideally, articles should be intelligently accessible; the archives
should be able to respond to various types of queries - from general
descriptions of topics to requests for specific articles. I view the
archives less as a collection of back issues and more as a
database-like research tool that people can use to find useful
information. The archives should be able to assist the user in
narrowing and focusing the search, or in expanding the range of
retrieved articles when necessary. The archiving software we are
developing in conjunction with Tom Landauer and Sue Dumais at Bellcore
and Peter Foltz at the University of Colorado will have these capabilities.

Researchers will be able to communicate with the archives through an
e-mail query system. The e-mail system was written by Peter Foltz for
use with HCIBIB, a database of Human-Computer Interaction abstracts,
and we will adapt it as needed for PSYCOLOQUY in collaboration with
Mr. Foltz. We considered allowing interactive access, but it is
difficult to construct a system that all researchers can use, given
the wide variety of computer systems and networks to which people have
access. Also, because PSYCOLOQUY is housed on university computers,
the number of lines available for interactive use is somewhat limited.
For now, an e-mail query system with iterative search seems the best
solution to both of these problems: it strikes a balance between cost,
availability and ease of use. That balance will undoubtedly shift, and
PSYCOLOQUY's means of distribution will change accordingly.
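To make the idea of an iterative e-mail search concrete, here is a minimal sketch of how a mail server might pull query commands out of an incoming message. The command words (`find`, `like`, `get`), the function name `parse_query_mail`, and the message contents are hypothetical illustrations; the post does not describe the actual syntax of the HCIBIB mail system. In this sketch a reader narrows a search by mailing a new `find` with more words, or expands it by asking for articles `like` a numbered hit from the previous reply.

```python
from email import message_from_string

# Hypothetical command vocabulary, for illustration only; the real HCIBIB
# mail server's syntax is not described in the post.
COMMANDS = {"find", "like", "get"}

def parse_query_mail(raw: str):
    """Return (sender, [(command, args), ...]) from a plain-text query message."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a str for a simple, non-multipart message
    requests = []
    for line in body.splitlines():
        words = line.split()
        # Ignore blank lines and anything that is not a known command.
        if words and words[0].lower() in COMMANDS:
            requests.append((words[0].lower(), words[1:]))
    return msg["From"], requests

# One round of an iterative search: the reader refines the topic with "find"
# and expands from hit number 3 of the previous reply with "like".
raw = (
    "From: reader@example.edu\n"
    "Subject: archive query\n"
    "\n"
    "find short-term memory rehearsal\n"
    "like 3\n"
)
sender, requests = parse_query_mail(raw)
```

Because each request is a self-contained message, a reader needs nothing more than a mail account to take part, which is the availability argument made above.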

The archives will store individual articles separately instead of
keeping them grouped by issue (the issue heading will be included with
each article for reference purposes). From a retrieval standpoint it
makes little sense to group articles together simply because they were
published at the same time. It will still be possible, however, to
reconstruct an issue by retrieving articles by publication date or by
volume and issue number.
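The storage scheme described above, with one record per article that carries its own issue heading, can be sketched as a small data structure. The field names, the heading format, and the toy records below are hypothetical illustrations, not real PSYCOLOQUY issues; the point is only that an issue becomes a query over individually stored articles rather than a stored unit.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Article:
    """One archived article, stored on its own rather than inside an issue file.

    Field names and heading format are hypothetical illustrations.
    """
    volume: int
    issue: int
    published: date
    title: str
    text: str

    def issue_heading(self) -> str:
        # The issue heading travels with each article for reference purposes.
        return f"Volume {self.volume}, Issue {self.issue} ({self.published.isoformat()})"

def reconstruct_issue(archive, volume, issue):
    """Reassemble an issue by volume and issue number."""
    return [a for a in archive if (a.volume, a.issue) == (volume, issue)]

def published_on(archive, day):
    """Alternative reconstruction: every article with a given publication date."""
    return [a for a in archive if a.published == day]

# Toy records (invented for illustration).
archive = [
    Article(1, 1, date(1991, 1, 7), "article A", "..."),
    Article(1, 1, date(1991, 1, 7), "article B", "..."),
    Article(1, 2, date(1991, 1, 14), "article C", "..."),
]
issue_titles = [a.title for a in reconstruct_issue(archive, 1, 1)]
```

Here `issue_titles` is `['article A', 'article B']`, and each article can still report where it originally appeared through `issue_heading()`.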

The retrieval technique we are using is Latent Semantic Indexing (LSI),
developed at Bellcore. LSI applies a singular value decomposition to the
term-by-document matrix and represents each document and each term as a
vector of values on a reduced set of orthogonal factors, derived from
the association of words within the documents. During retrieval, the
list of search terms is likewise represented as a vector in the same
factor space, and documents are ranked by the similarity of their
vectors to the query vector. Because the context in which words are
used is taken into account, this technique tends to retrieve more
relevant articles than standard exact keyword matching. It also allows
more flexible searches. For example, researchers can ask the archives
for articles similar to ones they found particularly relevant (Peter
Foltz has already implemented this in HCIBIB). For more information on
LSI, see Deerwester, Dumais, Furnas, Landauer, and Harshman (1990),
"Indexing by latent semantic analysis," Journal of the American Society
for Information Science (JASIS), 41(6), 391-407; or Dumais, Furnas,
Landauer, Deerwester, and Harshman (1988), "Using latent semantic
analysis to improve access to textual information," in Proceedings of
CHI '88.
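The LSI steps described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under simplifying assumptions, not Bellcore's implementation: it uses raw term counts with no term weighting, keeps only k = 2 factors for a toy six-term, four-article collection, and places the query in the factor space by projecting its term counts onto the k left singular vectors (the same projection applied to each document). Function names are hypothetical.

```python
import numpy as np

def build_lsi(term_doc, k):
    """Truncated SVD of a terms x documents count matrix, A ~ U_k S_k V_k^T.

    Each document is represented by its projection a_d^T U_k, which equals
    the corresponding row of V_k S_k.
    """
    U, _s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = term_doc.T @ U_k  # shape: (n_docs, k)
    return U_k, doc_vecs

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0  # e.g. a query made only of unknown words
    return float(a @ b / (na * nb))

def search(query_counts, U_k, doc_vecs):
    """Fold the query into the factor space and rank documents by cosine."""
    q = query_counts @ U_k  # same projection used for the documents
    scores = [cosine(q, d) for d in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [(i, scores[i]) for i in order]

def similar_to(doc_index, doc_vecs):
    """Rank the other documents by similarity to one the reader found relevant."""
    target = doc_vecs[doc_index]
    scores = [(i, cosine(target, d)) for i, d in enumerate(doc_vecs) if i != doc_index]
    return sorted(scores, key=lambda pair: -pair[1])

# Toy collection: rows are terms, columns are four hypothetical articles.
terms = ["memory", "recall", "rehearsal", "neuron", "synapse", "firing"]
A = np.array([
    [1, 0, 0, 0],  # memory
    [1, 1, 0, 0],  # recall
    [0, 1, 0, 0],  # rehearsal
    [0, 0, 1, 0],  # neuron
    [0, 0, 1, 1],  # synapse
    [0, 0, 0, 1],  # firing
], dtype=float)
U_k, doc_vecs = build_lsi(A, k=2)

query = np.zeros(len(terms))
query[terms.index("memory")] = 1.0
results = search(query, U_k, doc_vecs)  # list of (article index, score)
like_article_0 = similar_to(0, doc_vecs)
```

In this toy run, article 1 never uses the word "memory", yet it scores as highly as article 0 because the two share "recall"; an exact keyword match would have missed it. `similar_to` corresponds to asking the archive for more articles like one already found.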

We will add some features and drop others as the needs of our readers
become apparent. It is too soon in the development of this new medium
to become locked into particular formats and retrieval strategies. I
believe that our system will provide a useful, quick and inexpensive
way to retrieve relevant articles from the PSYCOLOQUY archive (and from
the archives of other journals that adopt this model).

Again, I view each electronic journal as an experiment. PSYCOLOQUY is
one; OCLC's CURRENT CLINICAL TRIALS is another. Other experiments are
under way as well: for example, David Rodgers at the American
Mathematical Society is working on a related project. For the sake of
improving scholarly communication, it is important that we learn as
much as we can from each other.

                                Malcolm Bauer
                                Assistant Editor, PSYCOLOQUY
                                Department of Psychology
                                Princeton University
                                Princeton NJ 08544
                                malcolm@clarity.princeton.edu