What can be accomplished at the catalogue end and what needs harmonized metadata? CESSDA Data Catalogue as an example case
Taina Jääskeläinen (CESSDA)
CESSDA is the Consortium of European Social Science Data Archives. The CESSDA Data Catalogue provides metadata on research datasets in various languages, harvested from 20+ data repositories located in different countries. All repositories use the same DDI metadata standard, but this alone is not sufficient for a functional user interface (UI): sometimes the metadata content itself needs to be harmonized. For instance, country and data collection time filters in search interfaces rely on machine-actionable codes in the metadata. The big question is: are there alternatives to this type of resource-intensive metadata harmonization? Are there open source tools that would map all variants of country and other geographical names in different languages to standardized codes, or match different ways of documenting date information with an ISO code? The social sciences would benefit from dialogue with other research domains on this. Another big issue is how to design the search interface so that users can navigate confidently between the language(s) of the metadata, the actual data files and the user interface, where the languages may be the same or different. The CESSDA Catalogue team decided to reduce choice by providing the interface in English only but the metadata in different languages. According to our testing, multilingual search gave incomplete results in some languages. Are there open tools whose language analyzers are good enough to handle search across languages correctly?
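To make the country-name question concrete, the Python sketch below shows one simple way such a mapping could work at the catalogue end: name variants are normalized (case, accents, whitespace) and looked up in a table keyed by ISO 3166-1 alpha-2 codes. The table `COUNTRY_VARIANTS` and the function names are hypothetical and hand-made for illustration only; a real table would be far larger and could be populated from a resource such as GeoNames. This is not a description of how the CESSDA Data Catalogue works.

```python
import unicodedata

# Hypothetical, hand-made sample of name variants per ISO 3166-1 alpha-2 code.
# A production table would need many more countries, languages and spellings.
COUNTRY_VARIANTS = {
    "FI": ["Finland", "Suomi", "Finnland", "Finlande"],
    "DE": ["Germany", "Deutschland", "Saksa", "Allemagne"],
    "SE": ["Sweden", "Sverige", "Ruotsi", "Schweden", "Suède"],
}


def normalize(name: str) -> str:
    """Strip accents, ignore case and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


# Reverse index: normalized variant -> ISO code.
LOOKUP = {
    normalize(variant): code
    for code, variants in COUNTRY_VARIANTS.items()
    for variant in variants
}


def to_iso_code(name: str):
    """Return the ISO code for a country name variant, or None if it is unknown."""
    return LOOKUP.get(normalize(name))


if __name__ == "__main__":
    for raw in ["  SUOMI ", "Suede", "Deutschland", "Atlantis"]:
        print(repr(raw), "->", to_iso_code(raw))
```

Exact lookup after normalization only covers variants that are already in the table. Misspellings, historical names and sub-national regions would still need curation or fuzzy matching, which is part of the resource question raised above.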
The aim of the workshop is to discuss possibilities for multilingual data catalogues and their user-facing interfaces in cases where there are no resources for AI enhancements, to exchange information on useful tools and best practices, and to explore potential cooperation ideas.
Possible discussion items are:
– Geographical location information
  - GeoNames.org: an open source geographical database providing 27 million geographical names, which matches country names in different languages to ISO codes, continents, administrative divisions, etc.
– EU Semantic Interoperability Catalogue
– Search interface functionalities and language choices
– Language analyzers
Keywords: Multilinguality, search interface, harmonised metadata, discoverability
Workshop leader: Taina Jääskeläinen (CESSDA)
Taina Jääskeläinen is Senior Specialist at the Finnish Social Science Data Archive. Her areas of expertise include translation, multilinguality, cross-national vocabularies and metadata harmonization. She has been involved in CESSDA activities and service building for a long time. She served as the Service Owner for the CESSDA Data Catalogue and CESSDA Vocabulary Service in 2021-2022.