Date: Sun, 29 Jun 1997 15:03:56 -0500 (CDT)
From: Gerry McKiernan <JL.GJM@ISUMVS.IASTATE.EDU>
Subject: Reconstitution of Meaning: MARC Fields as Morphemes
_Reconstitution of Meaning: MARC Fields as Morphemes_
In considering the possibilities of making better use of
the intellectual content embedded in MARC fields in my review
of the potential application of Data Mining and Knowledge
Discovery in Databases (KDD), it has occurred to me that
such an investigation might prove useful if MARC fields
were viewed as _morphemes_ [no, not morphine [:->]]. The morpheme
is considered by (many) linguists to be a basic unit of meaning
within a language.
In the cataloging process, meaning is embedded within a
defined structure using acceptable rules of grammar (e.g.
AACR2) - syntax if you will [:->]. In such a process,
a message about an individual work is conveyed, using this
grammar and an associated lexicon. Here the physical 'meaning'
and intellectual 'meaning' of an item are translated into
a message that is intended to describe the item and its content.
While this process of bibliographic control has enabled users
to identify 'meaningful' items relevant to an information need,
most existing and (even) New Age OPACs I've identified and compiled
in my Onion Patch (sm) clearinghouse at URL
http://www.public.iastate.edu/~CYBERSTACKS/Onion.htm
do not, I believe, make full use of the meaning explicit or
implicit within these records.
To identify items that are most relevant to users
[BTW: 'Relevance' is a 'meaning-full' concept [:->]],
we need to contemplate the creation of OPACs that
provide users with the means (or allow users) to 'reconstitute'
the meaning within these records. We need to develop systems
that can present users with items (i.e., records of
cataloged items within the OPAC) that best meet their
needs using an 'optimal syntax' determined by the
'meaningful' associations uncovered by a Data Mining or a
KDD process, or provide users with the ability to select
a different syntax (e.g. subject and publisher associations),
to identify that 'good book' on the subject. Likewise,
we need to provide users with the ability to 'cross-tabulate'
associations within MARC fields so that they are provided with
a ranked listing of items by publisher-author-call number, or
call number-publisher, or subject heading-publisher, or other
potentially meaningful association of their choosing.
[I have sketched out a mock-up interface for this function
and will certainly let the list(s) know when it's
available.]
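[As a rough illustration only of what such a 'cross-tabulation' might
involve, here is a minimal Python sketch. Everything in it is an
assumption made for the example: the records are hand-made and
hypothetical, the dictionary keys merely stand in for MARC fields
(e.g. 260 $b for publisher, 650 for topical subject), and scoring an
item by the frequency of its field pairing is just one possible
'syntax', not a recommended ranking method.]

```python
from collections import Counter

# Hypothetical, hand-made records. The keys are illustrative labels
# standing in for MARC fields (260 $b publisher, 650 topical subject);
# this is not real catalog data.
RECORDS = [
    {"title": "Book A", "publisher": "Alpha Press", "subjects": ["Linguistics"]},
    {"title": "Book B", "publisher": "Alpha Press", "subjects": ["Linguistics", "Cataloging"]},
    {"title": "Book C", "publisher": "Beta Books", "subjects": ["Linguistics"]},
    {"title": "Book D", "publisher": "Beta Books", "subjects": ["Cataloging"]},
]


def _values(record, field):
    """Return the field's values as a list (fields may repeat, like 650)."""
    value = record.get(field, [])
    return value if isinstance(value, list) else [value]


def cross_tabulate(records, field_a, field_b):
    """Count how often each (field_a value, field_b value) pair co-occurs."""
    counts = Counter()
    for record in records:
        for a in _values(record, field_a):
            for b in _values(record, field_b):
                counts[(a, b)] += 1
    return counts


def rank_records(records, field_a, field_b):
    """Order records by the frequency of their most common field pairing."""
    counts = cross_tabulate(records, field_a, field_b)

    def score(record):
        return max(
            (counts[(a, b)]
             for a in _values(record, field_a)
             for b in _values(record, field_b)),
            default=0,
        )

    # sorted() is stable, so ties keep their original catalog order.
    return sorted(records, key=score, reverse=True)


if __name__ == "__main__":
    for record in rank_records(RECORDS, "subjects", "publisher"):
        print(record["title"], record["publisher"], record["subjects"])
```

[Passing a different pair of field names (say, "publisher" and a
hypothetical "call_number" key) would give the user a different
'syntax' over the same records.]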
In addition to associations revealed in the application
of Data Mining and KDD to an appropriate catalog database
(e.g., the OCLC cataloging database) or selected local
OPAC database of peer groups (e.g. RLG), as well as the desired
associations of users themselves, comprehensive log
data should also reveal useful associations that might
provide a new syntax, or enhance one already considered.
[Here circulation data would be very important, as would
OPAC transaction log data, as Larson has demonstrated in
his study of subject access in OPACs]
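[To make the idea of associations drawn from log data a little more
concrete, here is a small Python sketch of pairwise association
rules (support and confidence, as used in basic association-rule
mining). The sessions are invented for illustration, each one
standing for the set of subject headings a user searched or borrowed
under; the 0.5 support threshold is likewise an arbitrary assumption,
and a real study would need far more data and care.]

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each set holds the subject headings seen
# together in one user's session (invented data, not a real log).
SESSIONS = [
    {"Linguistics", "Cataloging"},
    {"Linguistics", "Cataloging", "Data mining"},
    {"Linguistics", "Data mining"},
    {"Cataloging"},
]


def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) tuples.

    support    = share of sessions containing both headings
    confidence = share of sessions with the antecedent that also
                 contain the consequent
    """
    total = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        item_counts.update(session)
        # Sort so each unordered pair is always counted under one key.
        pair_counts.update(combinations(sorted(session), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))

    # Strongest rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    for a, b, support, confidence in pair_rules(SESSIONS):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```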
One could envision the application of the methods of
Computational Linguistics to MARC records, or
even (perhaps) Transformational/Generative Grammar [:->]
[Long Live Noam Chomsky!]
Once again, as always, any reactions to such musings
would be most welcome. [In particular, I am interested in
any literature relating to the application of linguistic
theories/practices to bibliographic and MARC record structure.]
Regards,
Gerry McKiernan
Curator, CyberStacks(sm)
Iowa State University
Ames IA 50011
gerrymck@iastate.edu
http://www.public.iastate.edu/~CYBERSTACKS/
"Oh No!, Not Another Project"
P.S. One could certainly apply these envisioned methodologies to
any Metadata regime (e.g. The Dublin Core, TEI, etc.).