ALCTS CCS Cataloging & Classification Research Discussion Group, ALA 2004

Session Recorder: Amanda Wilson, Metadata Librarian at The Ohio State University and AMeGA project task force member
Recorder Assistant: Michelle Mascaro, soon-to-be cataloger at Utah State University and Research Assistant on the AMeGA project

Optimizing Metadata Generation Practices
Discussion Leader: Jane Greenberg, Associate Professor, School of Information and Library Science, UNC-Chapel Hill (SILS/UNC-CH)

The discussion group covered the topic of metadata generation and two research projects that Prof. Greenberg is directing at SILS/UNC-CH. The session began with a discussion of the motivations underlying metadata generation research: the explosive growth of digital repositories, the need to enhance resource discovery options, and the current "metadata bottleneck."

Prof. Greenberg then spoke specifically about the research questions, design, methods, and results of the Metadata Generation Research (MGR) Project (http://ils.unc.edu/mrc/mgr_index.htm). The MGR Project is a collaboration with the National Institute of Environmental Health Sciences (NIEHS) exploring different means of metadata generation (author, cataloger, and automatic metadata generation).

Prof. Greenberg then reported on the AMeGA (Automatic Metadata Generation Applications) Project (http://ils.unc.edu/mrc/amega.htm), which is being conducted in connection with Section 4.2 of the Library of Congress Bibliographic Control Action Plan. The goal of this project is to identify and recommend functionalities for applications supporting automatic metadata generation in the library/bibliographic control community. Greenberg discussed project goals, presented some preliminary results, announced that the survey for the AMeGA project is now live, and requested participation from those present and their colleagues. To participate in the study, please go to http://ils.unc.edu/mrc/amega_survey.htm.

After these reports, Prof. Greenberg asked two questions of the attendees:

1. How else can we gather data to learn about metadata generation and improve (a) the process, (b) the quality of metadata being generated, and (c) the applications supporting metadata generation?

2. How can we bring communities interested in metadata together to improve metadata creation?

Prof. Greenberg then referred back to question one to start the discussion.

Response [participant]: The National Science Digital Library (NSDL) project is using Infomine's Ivia tool and UKOLN's DC.dot tool to create collection-level metadata records: a person is presented with automatically generated metadata from the Ivia or DC.dot tools and can then choose which information to use. The NSDL project is also investigating human-guided metadata generation. (A sketch of this kind of harvest-then-review workflow appears below.)

Response [participant]: In the University of Pennsylvania's pilot institutional repository, there is a major concern about using subject headings in metadata because faculty and authors are encouraged to create metadata about their digital resources, and for them the process can appear too difficult. As a solution, the project offers a browsable list of subject headings specific to their subject areas, so that a heading can simply be selected from a list. This is seen as an easier approach for creators because they do not have to come up with their own subject terms.

Question [Greenberg]: How are the terms offered to authors and faculty chosen?
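The harvest-then-review workflow described above can be illustrated with a minimal sketch: candidate Dublin Core values are pulled from an HTML document's <title> and <meta> tags, then left for a human reviewer to accept or reject. The field mappings and sample HTML below are illustrative assumptions, not taken from Ivia or DC.dot.

# A minimal sketch of meta-tag harvesting in the style of tools such
# as DC.dot: candidate Dublin Core values are extracted from an HTML
# document and left for a human reviewer to accept or reject.
# The mappings here are illustrative, not either tool's actual rules.
from html.parser import HTMLParser

# Illustrative mapping from common HTML <meta> names to Dublin Core.
META_TO_DC = {
    "keywords": "dc:subject",
    "description": "dc:description",
    "author": "dc:creator",
}

class CandidateExtractor(HTMLParser):
    """Collects candidate Dublin Core values from <title> and <meta>."""

    def __init__(self):
        super().__init__()
        self.candidates = {}  # DC element -> list of candidate values
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content")
            if name in META_TO_DC and content:
                self.candidates.setdefault(META_TO_DC[name], []).append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.candidates.setdefault("dc:title", []).append(data.strip())

# Hypothetical sample document, standing in for a fetched web page.
html_doc = """<html><head>
  <title>Arsenic and Drinking Water</title>
  <meta name="keywords" content="arsenic; groundwater; toxicology">
  <meta name="description" content="Fact sheet on arsenic exposure.">
</head><body>...</body></html>"""

extractor = CandidateExtractor()
extractor.feed(html_doc)

# The "human-guided" step: each candidate is shown to a reviewer,
# who keeps or discards it before the record is saved.
for element, values in extractor.candidates.items():
    for value in values:
        print(f"{element}: {value}   [accept? y/n]")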
Response [participant, continued from U.Penn]: Staff searched the Library of Congress Subject Headings (LCSH) to identify terms for use in a "general" category. Another category of terms, "other," was also developed with terms specific to different disciplines. The faculty-generated metadata is then reviewed by a cataloger before acceptance into the institutional repository collection.

Comment [participant]: We should be careful about how we frame the dichotomy between professional catalogers and others creating metadata, because the "metadata bottleneck" (or cataloging backlog) is immense and insurmountable by professional catalogers alone. When discussing metadata creation, it is important not to couch the discussion in terminology that only librarians can understand. We should try to come up with tools that produce metadata in bulk, because creating metadata records one resource at a time is too slow. For instance, what about a tool that would expose automatically generated metadata and human-generated metadata about the same resource to others, allowing them to determine how much they trust the information (e.g., by providing data about who made the metadata record, what process was used to make it, etc.)? (A sketch of such a record appears below.)

Comment [Greenberg]: Catalogers should also be in a different place in the metadata generation process, and not just serve as evaluators of the metadata. There is no point in having catalogers evaluate the accuracy of URLs; subject headings are a more important element for catalogers to evaluate.

Comment [participant]: Also, more than one way exists to evaluate metadata. For example, an institution can establish a certain level of confidence in different methods of metadata creation, and from that the level of acceptable metadata can be ascertained: "Some metadata is better than none." If we don't continue to find additional ways to evaluate metadata, we limit where metadata can go.

Question [Greenberg]: How can we get the people creating metadata to know more about the expertise and skills needed to create good metadata? How do we get beyond word of mouth?

Response [participant]: Having catalogers review just subject headings is not scalable. However, we should not burden the user with evaluating quality; we are dealing with the "Google user." Federated searching is important to consider given the continuing growth of web resources. And in terms of subject headings, many local or other controlled vocabularies exist for different communities, while many communities have no controlled vocabularies at all.

Comment [Greenberg]: Authors in the MGR project just looked at keywords when thinking about subjects. Catalogers are better at specificity and exhaustivity in subject access.

Comment [participant]: Keep in mind that lots of people are doing metadata, including publishers and vendors. They don't know about libraries and subject access, or for that matter controlled vocabularies.

Comment [participant]: We can't scale up what we currently do to deal with the tremendous amount of digital assets produced.

Comment [Greenberg]: In a follow-up survey for the MGR project, we found that some scientists liked assigning metadata to their resources and grasped its importance, while others were very reluctant to create metadata. There were responses like "this is just one more task to add to my list of one more tasks." We need to educate resource creators about the value of metadata in resource discovery.
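The "trust" tool proposed above implies that every metadata record carries provenance: who produced it and by what process. A minimal sketch, with hypothetical field names and sample data, of how records from different sources for the same resource might be stored and exposed side by side:

# A minimal sketch of the "trust" idea: each metadata record carries
# provenance (who produced it and by what process), and records from
# different sources for the same resource can be shown side by side.
# All field names and sample values here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Provenance:
    agent: str    # e.g. "cataloger", "author", "harvester"
    method: str   # e.g. "manual", "meta-tag extraction"
    created: str  # ISO 8601 date

@dataclass
class MetadataRecord:
    resource: str                # URL or identifier of the resource
    provenance: Provenance
    elements: dict = field(default_factory=dict)  # DC element -> value

records = [
    MetadataRecord(
        resource="http://example.org/report42",
        provenance=Provenance("cataloger", "manual", "2004-06-26"),
        elements={"dc:subject": "Water -- Arsenic content"},
    ),
    MetadataRecord(
        resource="http://example.org/report42",
        provenance=Provenance("harvester", "meta-tag extraction", "2004-06-26"),
        elements={"dc:subject": "arsenic; groundwater"},
    ),
]

# Expose both versions so a consumer can judge which to trust.
for r in records:
    print(f"{r.provenance.agent} ({r.provenance.method}): {r.elements}")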
Comment [Greenberg, continued]: The motivational and behavioral issues connected to metadata creation are also important to consider.

Question [participant]: What is the cost of metadata creation?

Response [Greenberg]: Good question. In the MGR project's comparative study, we found that 65% of the subject analysis conducted by scientists was of fair to high quality, while 97% of the subject analysis conducted by professional catalogers was of fair to high quality. What is the cost here? NIEHS scientists are presumably paid more than metadata professionals, so the time a scientist spends creating metadata for his or her own work is more costly than the equivalent time of a professional cataloger.

Comment [participant]: Another factor in cost is getting digital asset creators to know about us [catalogers].

Comment [Greenberg]: One more factor in cost is overall metadata creation time. According to the logs from the comparative phase of the MGR project, catalogers took between two and three times as long to create metadata records as resource authors did. Expediency is a factor in the cost of metadata creation.

Question [participant]: One bothersome issue in metadata creation is the fact that most technical metadata can be extracted automatically, yet most systems do not do this. (A sketch of such extraction appears below.)

Response [Greenberg]: Indeed, and this is where we need research. This is a goal of the AMeGA project. Please participate in the AMeGA survey; our goal is to produce a master RFP that will address these issues. (http://ils.unc.edu/mrc/amega_survey.htm)

Question [participant]: What was the impact of the MGR project on metadata creation at NIEHS? Does the library still have one Technical Services librarian?

Response [Greenberg]: NIEHS is looking at automatic metadata generation and the use of all three methods: (1) author-generated metadata, (2) cataloger-generated metadata, and (3) automatic metadata generation. NIEHS has not instituted an author-generated metadata program. One of the obstacles is that the scientists did not feel there was a benefit to creating metadata for their records; they still have trouble with the idea of searching metadata elements rather than using Google.

Question [participant]: Are the documents the authors created metadata for easier to locate?

Response [Greenberg]: This is not known yet; the original project did not have a mechanism set up to track whether web pages with richer metadata are easier to find.

Question [participant]: There is an enormous demand for increased quality in metadata generation, and humans are excellent at this while automatic generators are merely adequate. Why is there no demand for human metadata specialists at research institutions like NIEHS, and why are LIS schools not training armies to fill the need? (In days past, these people were called catalogers.)

Response [Greenberg]: I see a trend, perhaps a slight one, counter to this. Students are recruited from UNC's School of Information and Library Science to do this type of work; however, the vocabulary used in describing these positions is often different from what we are familiar with in library science, even though the functions are the same. For example, job postings may include the words "taxonomist," "ontologist," etc. I get calls for taxonomists, but after I talk with the person doing the recruiting for a while, I find out what they really want is a cataloger/metadata person.
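On the point raised earlier about technical metadata: properties such as format, size, and modification date can be read from the resource itself rather than keyed by hand. A minimal sketch using only Python standard-library calls follows; the Dublin Core element names chosen for the output are illustrative assumptions.

# A minimal sketch of automatic technical metadata extraction:
# file size, format, and modification date are pulled from the
# resource itself. The output element names are illustrative.
import mimetypes
import os
from datetime import datetime, timezone

def technical_metadata(path):
    """Return automatically extractable technical metadata for a file."""
    stat = os.stat(path)
    mime, _ = mimetypes.guess_type(path)
    return {
        "dc:format": mime or "application/octet-stream",
        "dcterms:extent": f"{stat.st_size} bytes",
        "dcterms:modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }

if __name__ == "__main__":
    # Demonstrate on this script itself.
    for element, value in technical_metadata(__file__).items():
        print(f"{element}: {value}")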
Response [Greenberg, continued]: The demand for library science people for such positions isn't as high as I would like to see, but I believe a change is taking place, in which library science skills are seen as valuable beyond the library environment.

Comment [participant]: Many people are coming to these jobs from other backgrounds and realizing that LIS skills are necessary.

Comment [participant]: The same trend was noted at the recent NISO Metadata workshop, where publishers and vendors were among the attendees along with LIS professionals.

Session closed.