ALA 2004 Annual ALCTS CCS CCRDG meeting notes Janet Lee-Smeltzer 19 Jul 2004 17:46 UTC

ALCTS CCS- Cataloging & Classification Research Discussion Group, ALA
2004

- Session Recorder: Amanda Wilson, Metadata Librarian at The Ohio State
University and AMeGA project task force member
- Recorder Assistant: Michelle Mascaro, soon-to-be cataloger at Utah
State University and Research Assistant on the AMeGA project

Optimizing Metadata Generation Practices
Discussion Leader: Jane Greenberg, Associate Professor,
School of Information and Library Science, UNC-Chapel Hill (SILS/UNC-CH)

The discussion group covered the topic of metadata generation and two
research projects that Prof. Greenberg is directing at SILS/UNC-CH.  The
session began with a discussion on the motivations underlying metadata
generation research - the explosive growth of digital repositories, the
need to enhance resource discovery options, and the current "metadata
bottleneck."  Prof. Greenberg then moved on to talk specifically about
the research questions, design, methods, and results of the Metadata
Generation Research (MGR) Project
(http://ils.unc.edu/mrc/mgr_index.htm).  The MGR Project is a
collaboration with the National Institute of Environmental Health
Sciences (NIEHS) through which they are exploring different means of
metadata generation (author, cataloger, and automatic metadata
generation).

Prof. Greenberg then reported on the AMeGA (Automatic Metadata
Generation Applications) Project (http://ils.unc.edu/mrc/amega.htm),
which is being conducted in connection with Section 4.2 of the Library
of Congress Bibliographic Control Action Plan.  The goal of this project
is to identify and recommend functionalities for applications supporting
automatic metadata generation in the library/bibliographic control
community.  Greenberg discussed project goals, presented some
preliminary results, announced that the survey for the AMeGA project is
now live, and requested participation from those present and their
colleagues.  To participate in the study, please go to
http://ils.unc.edu/mrc/amega_survey.htm.

After these reports, Prof. Greenberg asked two questions of the
attendees:
1. How else can we gather data to learn about metadata generation and
improve a) the process, b) the quality of the metadata being generated,
and c) the applications supporting metadata generation?
2. How can we bring communities interested in metadata together to
improve metadata creation?

Prof. Greenberg then referred back to question one to start the
discussion.

Response [participant]:  The National Science Digital Library (NSDL)
project is using INFOMINE's iVia tool and UKOLN's DC-dot tool to create
collection-level metadata records:  a person is presented with
automatically generated metadata from the iVia or DC-dot tools and can
then choose which information to use.  The NSDL project is also
investigating human-guided metadata generation.
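
To make that workflow concrete, below is a minimal Python sketch of the
DC-dot-style step it describes: harvest candidate Dublin Core values
from a page's <meta> tags so a person can then choose which ones to
keep.  This is an illustration only, not NSDL or UKOLN code, and the
sample page is invented.

    from html.parser import HTMLParser

    class DCMetaHarvester(HTMLParser):
        """Collect <meta name="DC.x" content="..."> pairs plus the <title>."""
        def __init__(self):
            super().__init__()
            self.candidates = {}      # DC element -> list of candidate values
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "meta" and a.get("name", "").lower().startswith("dc."):
                element = a["name"][3:].lower()   # e.g. "title", "subject"
                self.candidates.setdefault(element, []).append(a.get("content", ""))
            elif tag == "title":
                self._in_title = True

        def handle_data(self, data):
            if self._in_title and data.strip():
                # Fall back on the HTML <title> as a candidate DC title.
                self.candidates.setdefault("title", []).append(data.strip())

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

    harvester = DCMetaHarvester()
    harvester.feed('<html><head><title>Soil Ecology Primer</title>'
                   '<meta name="DC.subject" content="soil ecology">'
                   '</head></html>')
    # A human reviewer would now pick which candidate values to accept.
    for element, values in harvester.candidates.items():
        print(element, "->", values)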

Response [participant]: In the University of Pennsylvania's pilot
institutional repository, there is a major concern about using subject
headings in metadata because faculty and authors are encouraged to
create metadata about their digital resources, and for them that
process can appear too difficult.  As a solution, the project offers a
browsable list of subject headings specific to their subject areas, so
that a heading may simply be selected from a list.  This is seen as an
easier task for creators because they do not have to make up their own
subject terms.

Question [Greenberg]:  How are the terms offered to authors and faculty
chosen?

Response [participant continued from U.Penn]: Staff searched the Library
of Congress Subject Headings (LCSH) to identify terms for use in a
"general" category.  Another category of terms, "other," was also
developed with terms specific to different disciplines.  The
faculty-generated metadata is then reviewed by a cataloger before
acceptance into the institutional repository collection.
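
As a rough illustration of that pick-list-plus-review workflow, here is
a minimal Python sketch.  The heading values and function names are
hypothetical; only the shape of the process (choose from the offered
lists, then hold for cataloger review) comes from the discussion above.

    from dataclasses import dataclass

    GENERAL_HEADINGS = {"Biology", "Economics", "Linguistics"}  # drawn from LCSH
    OTHER_HEADINGS = {"Econometrics", "Phonology"}  # discipline-specific terms

    @dataclass
    class SubmittedRecord:
        title: str
        subjects: list
        reviewed: bool = False   # set True only after a cataloger signs off

    def submit(title, chosen_subjects):
        """Accept only headings drawn from the offered lists."""
        allowed = GENERAL_HEADINGS | OTHER_HEADINGS
        rejected = [s for s in chosen_subjects if s not in allowed]
        if rejected:
            raise ValueError(f"Not in the offered heading lists: {rejected}")
        return SubmittedRecord(title, list(chosen_subjects))

    record = submit("Working paper on phonology", ["Linguistics", "Phonology"])
    # record.reviewed stays False until a cataloger approves the record.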

Comment [participant]:  We should be careful about how we look at the
dichotomy of metadata generation between professional catalogers and
others creating metadata because the "metadata bottleneck" (or
cataloging backlogs) is immense and insurmountable by professional
catalogers alone.  When discussing metadata creation it is important
not to couch the discussion in terminology that only librarians can
understand.  We should try to come up with tools that produce metadata
in bulk because creating metadata records for one resource at a time is
too slow.  For instance, what about a tool that would expose
automatically generated metadata and human-generated metadata about the
same resource to others, allowing them to determine how much they trust
the information (e.g., providing data about who made the metadata
record, what process was used to make it, etc.)?
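
A minimal sketch of what such a provenance-carrying record might look
like follows.  This is an invented data shape, not an existing tool,
and all of the field names are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class ProvenancedValue:
        value: str
        created_by: str   # e.g. "cataloger", "author", "extractor v1.2"
        process: str      # e.g. "manual", "harvested from <meta> tags"

    record = {
        "title": [
            ProvenancedValue("Soil Ecology Primer", "author", "manual"),
            ProvenancedValue("soil_primer.html", "extractor v1.2",
                             "harvested from <meta> tags"),
        ],
    }

    # A consumer applies its own trust policy, e.g. prefer human-made values:
    def preferred(values):
        manual = [v for v in values if v.process == "manual"]
        return (manual or values)[0]

    print(preferred(record["title"]).value)   # -> Soil Ecology Primer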

Comment [Greenberg]: Catalogers should also be in a different place in
the metadata generation process, and not just serve as evaluators of
the metadata.  There is no point in having catalogers evaluate the
accuracy of URLs; subject headings are a more important element for
catalogers to evaluate.

Comment [participant]:  Also, more than one way exists to evaluate
metadata.  For example, an institution can establish a certain level of
confidence in different methods of metadata creation.  Then, the level
of acceptable metadata can be ascertained - "Some metadata is better
than none."  If we don't continue to find additional ways to evaluate
metadata, we place too many limits on where metadata can go.

Question [Greenberg]:  How can we get the people creating metadata to
know more about the expertise and skills needed to create good metadata?
How do we get beyond word of mouth?

Response [participant]:  Having catalogers review just subject headings
is not scalable.  However, we should not burden the user with
evaluating quality.  We are dealing with the "Google user."  Federated
searching is important to consider given the increasing growth of web
resources.  And in terms of subject headings, many local or other
controlled vocabularies exist for different communities, while a lot of
communities have no controlled vocabularies at all.

Comment [Greenberg]:  Authors in the MGR project just looked at keywords
when thinking about subjects.  Catalogers are better at specificity and
exhaustivity in subject access.

Comment [participant]:  Keep in mind that lots of people are creating
metadata, including publishers and vendors.  They don't know about
libraries and subject access - and for that matter, controlled
vocabularies.

Comment [participant]:  We can't scale up what we currently do to deal
with the tremendous amount of digital assets produced.

Comment [Greenberg]:  In a follow-up survey for the MGR project, we
found that some scientists liked assigning metadata to their resources
and grasped the importance, while others were very reluctant to create
metadata.  There were responses like "this is just one more task to add
to my list of one more tasks."  We need to educate resource creators
about the value of metadata in resource discovery.  The motivational
and behavioral issues connected to metadata creation are also important
to consider.

Question [participant]:  What is the cost of metadata creation?

Response/comment/question [Greenberg]:  Good question.  In the MGR
project's comparative study, we found that 65% of the subject analysis
conducted by scientists was of fair to high quality, while 97% of the
subject analysis conducted by professional catalogers was of fair to
high quality.  What is the cost here?  NIEHS scientists are presumably
paid more than metadata professionals, so the time that a scientist
spends creating metadata for their own work is more costly compared to
the time/cost ratio of a professional cataloger.

Comment [participant]:  Another factor in cost is getting digital asset
creators to know about us [catalogers].

Comment [Greenberg]:  One more factor in cost is the overall metadata
creation time.  According to the logs of the comparative phase of the
MGR project, catalogers took between two and three times as long as
resource authors to create metadata records.  Expediency is a factor
in the cost of metadata creation.
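
A back-of-the-envelope Python sketch of that trade-off follows.  The
hourly rates and per-record times are purely hypothetical; only the
two-to-three-times ratio comes from the MGR logs reported above.

    scientist_rate = 70.0   # $/hour, hypothetical
    cataloger_rate = 30.0   # $/hour, hypothetical
    author_minutes = 10.0   # author time per record, hypothetical
    cataloger_minutes = 2.5 * author_minutes   # midpoint of the 2-3x ratio

    author_cost = scientist_rate * author_minutes / 60
    cataloger_cost = cataloger_rate * cataloger_minutes / 60
    print(f"author record:    ${author_cost:.2f}")     # -> $11.67
    print(f"cataloger record: ${cataloger_cost:.2f}")  # -> $12.50

Under these assumed numbers the per-record costs come out comparable,
which is why the quality difference (65% vs. 97% fair-to-high subject
analysis) matters so much in the comparison.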

Question [participant]:  One bothersome issue in metadata creation is
the fact that most technical metadata could be extracted automatically,
yet most systems do not do this.

Response/comment [Greenberg]:  Indeed, and this is where we need
research.  This is a goal of the AMeGA project.  Please participate in
the AMeGA survey.  Our goal is to have a master RFP that will address
these issues.  (http://ils.unc.edu/mrc/amega_survey.htm)
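
As a small illustration of the point about technical metadata, much of
it can be read straight off the file system with no human effort.  The
following generic Python sketch is not an AMeGA deliverable, just an
example of the kind of extraction most systems skip.

    import mimetypes, os, time

    def technical_metadata(path):
        """Derive format, extent, and date metadata from the file itself."""
        st = os.stat(path)
        mime, _ = mimetypes.guess_type(path)
        return {
            "format": mime or "application/octet-stream",
            "extent": f"{st.st_size} bytes",
            "modified": time.strftime("%Y-%m-%d", time.gmtime(st.st_mtime)),
        }

    print(technical_metadata(__file__))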

Question [participant]:  What was the impact of the MGR project on
metadata creation at NIEHS?  Does the library still have one Technical
Services librarian?

Response [Greenberg]:  NIEHS is looking at automatic metadata
generation and the use of all three methods (1. author-created
metadata, 2. cataloger-created metadata, and 3. automatically generated
metadata).  NIEHS has not instituted an author-generated metadata
program.  One of the obstacles is that the scientists did not feel that
there was a benefit to creating metadata for their records.  They still
have trouble with the idea of searching metadata elements rather than
using Google.

Question [participant]:  Are the documents the authors created metadata
for easier to locate?

Response [Greenberg]:  This is not yet known; the original project did
not have a mechanism set up to track the web pages that carry richer
metadata.

Question [participant]:  There is an enormous demand for increased
quality in metadata generation; and humans are excellent at it, while
automatic generators are only OK.  Why is there no demand for human
metadata specialists at these research institutions (like NIEHS), and
why are LIS schools not training armies to fill the need (in the past,
these people were called catalogers)?

Response [Greenberg]:  I see a trend, perhaps a slight trend, counter
to this.  Students are recruited from UNC's School of Information and
Library Science to do this type of work; however, the vocabulary used
in describing these positions is often different from what we are
familiar with in library science, even though the functions are the
same.  For example, job postings may include words like taxonomist or
ontologist.  I get calls for taxonomists, but after I talk to the
person doing the recruiting for a while, I find out what they really
want is a cataloger/metadata person.  The demand for library science
people for such positions isn't as high as I would like to see, but I
believe a change is taking place, where library science skills are
seen as invaluable beyond the library environment.

Comment [participant]:  Many people are coming to these jobs from
another background and realizing that LIS skills are necessary.

Comment [participant]:  The same trend was noted in the recent NISO
Metadata workshop.  There were publishers and vendors among the
attendees as well as LIS professionals.

Session closed.

****************************************************
Janet Lee-Smeltzer
Database Management Librarian/LIMS Coordinator
Albin O. Kuhn Library & Gallery
University of Maryland, Baltimore County
1000 Hilltop Circle, Baltimore, MD 21250
Phone: 410-455-6814
Fax: 410-455-1598
Email: jleesme@umbc.edu