On November 17, 2015, the fourth part of the DINI–nestor workshop series dedicated to the management and preservation of research data was hosted by the University of Duisburg-Essen (program in German; presentation slides will be published here as well).
The workshop looked at the selection and appraisal of research data from the perspective of different (predominantly German) actors in the research data infrastructure, specifically data-holding institutions such as universities and university libraries, “traditional” archives and discipline-specific services, and the German National Library.
Jens Ludwig (Staatsbibliothek zu Berlin – Preußischer Kulturbesitz) started off the day with a general introduction to the topic of selection and appraisal of research data. In his presentation he specifically argued against the defeatist stance that research data management and preservation are a pointless undertaking because ‘we will never have the resources to adequately manage, appraise, and preserve all the data generated’. While it is true that the means and measures at our disposal are not adequate to preserve everything, using this as an excuse to remain inactive means throwing out the figurative baby with the bathwater. If we cannot preserve everything forever, we must instead develop strategies that help us identify more reliably what should be kept and for how long.
One approach to this is the definition of selection processes and criteria, for example based on the question of “what for?”: for which purpose do we want to manage and keep data? Possible answers to this question are:
- because doing so is an important element of good research practice and enables the replication of research results;
- or because we assume the data will be useful to other researchers in the future and can help the scientific community to gain new insights.
Preserving data is not an end in itself. It serves a purpose, and both the selection of data and the definition of selection criteria for preservation must take this purpose into account.
The introduction was followed by presentations shedding light on practical aspects of data appraisal. These included examples of how data is selected for preservation and dissemination in larger organizations, namely the GESIS Data Archive for the Social Sciences (Natascha Schumann), the German National Library (Sabine Schrimpf), and the Landesarchiv Baden-Württemberg (Christian Keitel). The approaches of these three organizations differ considerably as a result of their different mandates, both in terms of the types of selection criteria employed (e.g. formal vs. content-based, or a combination of the two) and with regard to the “rigor” of the selection process, reflected among other things in the “acceptance rate” for submitted objects.
A second set of presentations focused on universities and on the “producer” side of the process of selecting research data for preservation and sharing. Kerstin Helbig discussed support services for research data management at Humboldt University Berlin. Specifically, she looked at the information needs of researchers who are preparing to submit data to an institutional repository or discipline-specific archive. For them it can be difficult to select, for preservation and/or dissemination, the “correct” subset of all the data and information they collected and created throughout the research process. In the context of research data management planning it is therefore important to know and understand the selection criteria employed by the repository or archive.
Documenting and analyzing the questions that researchers have about this process can help archives and repositories to better understand producers’ needs and to communicate with them more effectively. The same purpose can be served by the results of a survey carried out among researchers at Austrian universities (Paolo Budroni, Barbara Sánchez Solís), presented at the end of the paper sessions.
Break-out sessions and final discussion
The paper sessions were followed by break-out sessions for practical exercises on data appraisal and for discussions focusing on the different stakeholders involved in the process of selection and appraisal and the general structure of an appraisal process for research data. These nicely complemented the presentations and offered a good opportunity for in-depth discussions and for hands-on learning about the appraisal and selection process.
The workshop concluded with an open discussion that highlighted a perspective which, unfortunately, was not strongly represented in the preceding presentations: that of (smaller) institutional, multi-disciplinary repositories in universities. Often, these have a mandate to (or are expected to) accept everything created by the researchers of the university. They perform a kind of “catch basin” function for the “long tail” of research data that needs to be retained for a certain period, e.g. for replication purposes.
It is obvious that selection and appraisal processes in such repositories differ from those of a larger, discipline-specific service. However, even when “everything” has to be accepted, criteria are required – for example, to determine if and how the data will be curated and disseminated (e.g. in light of legal and ethical aspects), or whether a given dataset should be offered to a subject-specific archive or repository.
One envisioned output of the workshop is a German-language guideline on the selection and appraisal of research data, to be created collaboratively over the next year and building on the presentations and discussions of the workshop. One challenge for this guideline will certainly be to include the perspectives of both larger, discipline-specific services and smaller institutional repositories.