DINI/nestor Workshop: #RDM Tools

On June 17, 2016, the fifth DINI/nestor workshop on research data and data management took place in Kiel, Germany. It addressed tools for research data and their integration into the research and data management process.

Tools for data management and data handling are important for (at least) two reasons: 1) They help to standardize research data management and its procedures. 2) They can help to foster the adoption of research data management practices by reducing the effort and time researchers have to invest when implementing RDM measures.

The biggest challenge in this regard is, however, to develop tools that are actually used – as presenters pointed out repeatedly, tools have to integrate with the research process as seamlessly as possible to stand a chance of being adopted. A tool that requires researchers to go out of their way and does not produce an immediate, tangible benefit will end up not being used.

The workshop’s presentations and breakout sessions introduced participants to an assortment of different data management tools already available or currently being developed, ranging from Virtual Research Environments to tools for the creation and publication of metadata, and packaging tools for the submission of data and documentation to a repository.

Among the questions discussed during the sessions was that of generic vs. discipline- (or method-) specific tools. While it makes sense to offer certain services – such as secure storage, document sharing, project management and communication – centrally, many aspects of data management are very specific to the different disciplines and methods used in the research. This includes, for example, how and with which metadata the research process and data are best documented, the degree of automation of measurement and analysis processes, or typical ways of collaboration.

Below is an overview and short description of the tools presented during the workshop. All presentations (in German) are available for download from the workshop page.

Overview of tools presented

Tool | Discipline | Institutional | Collaboration | Storage | Metadata | Data handling | Publication
DataWiz | Psychology | o | o | o | x | o | x
LZA Lite | generic | x | (x) | x | x | o | (x)
MASI | generic, Applied Sciences | x | o | x | x | o | x
Replay DH | Digital Humanities | o | x | o | x | x | o
VRE GEOMAR | Marine science | x | x | x | x | x | x
VRE U Kiel | generic | x | x | x | x | x | x
ZBW Journal Data Archive | Economics | o | o | (x) | x | o | x

DataWiz is a data management tool currently being developed at the ZPID, the Leibniz Center for Psychology Information. It supports data management planning and implementation in the field of psychology, as well as the documentation of the research process and the data with the help of metadata. It will be possible to submit the data and the documentation created with the tool directly to the ZPID research data center PsychData. In the future, the tool will also support the pre-registration of research.

GEOMAR data management portal: This integrated data management system for marine research is a collaborative effort of several large-scale marine research projects begun in 2009. The objective was to create a common working platform and a common approach to research data management rather than addressing the associated challenges in each project separately. Today the platform incorporates tools for data collection, archiving, and publication, as well as for information exchange and collaboration.

LZA Lite is a cooperation between three German universities. It is a Fedora-based platform supporting the secure storage of both administrative records and research data and their enrichment with metadata. There are plans to expand the platform with solutions for collaborative work and long-term preservation. The production system is scheduled to launch in 2017.

MASI – Metadata Management for Applied Sciences is a tool currently being developed at TU Dresden in cooperation with several other higher education institutions. Intended as a research data repository for “living data”, it will integrate functions and services for the (automated) description of data with metadata, data storage, and the publication and re-use of data.

Replay DH: This project, carried out at the universities of Ulm and Stuttgart, is building a Git-based versioning tool for research data in the digital humanities. A GUI and fields for standardized metadata description will be created for Git, and DOI registration (including for not-yet-final versions of the data) will be implemented.

Virtual Research Environment at the University of Kiel: Based on existing infrastructure, the project is currently establishing a generic VRE combining tools for data storage, collaboration, and publication with central services such as identity management and with discipline-specific tools. The VRE is embedded in an organizational setting dedicated to fostering RDM practices at the University of Kiel and offering, among other things, face-to-face support for researchers.

ZBW Journal Data Archive: This portal was developed at the ZBW – German National Library of Economics as part of the EDaWaX project to support the replicability of research in economics. Based on CKAN, it allows data underlying empirical research articles to be described with metadata and published in accordance with journal data policies. The data is securely stored in the SOEP Research Data Center while the metadata is managed by the ZBW.




DINI/nestor workshop: Appraisal and selection of research data

On November 17, 2015, the fourth part of the DINI/nestor workshop series dedicated to the management and preservation of research data was hosted by the University of Duisburg-Essen (program in German; presentation slides will be published here as well).

The workshop looked at the selection and appraisal of research data from the perspective of different (predominantly German) actors in the research data infrastructure, specifically data holding institutions such as universities and university libraries, “traditional” archives and discipline-specific services, and the German National Library.


Jens Ludwig (Staatsbibliothek zu Berlin – Preußischer Kulturbesitz) started off the day with a general introduction to the topic of selection and appraisal of research data. In his presentation he specifically argued against the defeatist stance that claims research data management and preservation are a pointless undertaking because ‘we will never have the resources to adequately manage, appraise, and preserve all the data generated’. While it is true that the means and measures at our disposal are not adequate to preserve everything, using this as an excuse to remain inactive means throwing out the proverbial baby with the bathwater. If we cannot preserve everything forever, we must develop strategies to help us identify more reliably what should be kept and for how long.

One approach to this is the definition of selection processes and criteria, for example based on the question of “what for?”: For which purpose do we want to manage and keep data? Possible answers to this question are:

  • because doing so is an important element of good research practice and enables the replication of research results;
  • or because we assume the data will be useful to other researchers in the future and can help the scientific community to gain new insights.

Preserving data is not an end in itself. It serves a purpose, and selecting data for preservation, or defining the selection criteria, must take this purpose into account.

Paper sessions

The introduction was followed by presentations shedding light on practical aspects of data appraisal. These included examples of how data is selected for preservation and dissemination in larger organizations, namely, the GESIS Data Archive for the Social Sciences (Natascha Schumann), the German National Library (Sabine Schrimpf), and the Landesarchiv Baden-Württemberg (Christian Keitel). The approaches of these three organizations differ considerably as a result of their different mandates, both in terms of the types of selection criteria employed (e.g. formal vs. content-based or a combination of the two) and in regard to the “rigor” of the selection process, reflected among other things in the “acceptance rate” for submitted objects.

A second set of presentations focused on universities and on the “producer” side of the process of selecting research data for preservation and sharing. Kerstin Helbig discussed support services for research data management at Humboldt University Berlin. Specifically, she looked at information needs of researchers who prepare to submit data to an institutional repository or discipline-specific archive. For them it can be difficult to select the “correct” subset of all the data and information they collected and created throughout the entire research process for preservation and/or dissemination. In the context of research data management planning it is important to know and understand the selection criteria employed by the repository or archive.

Documenting and analyzing the questions that researchers have about this process can help archives and repositories to better understand producers’ needs and to communicate with them more efficiently. The same purpose can be served by the results of a survey carried out among researchers in Austrian universities (Paolo Budroni, Barbara Sánchez Solís), presented at the end of the paper sessions.

Break-out sessions and final discussion

The paper sessions were followed by break-out sessions for practical exercises on data appraisal and for discussions focusing on the different stakeholders involved in the process of selection and appraisal and the general structure of an appraisal process for research data. These nicely complemented the presentations and offered a good opportunity for in-depth discussions and for hands-on learning about the appraisal and selection process.

The workshop concluded with an open discussion that highlighted a perspective that, unfortunately, was not strongly represented in the preceding presentations: that of (smaller) institutional, multi-disciplinary repositories in universities. Often, these have a mandate to (or are expected to) accept everything created by the researchers of the university. They perform a kind of “catch basin” function for the “long tail” of research data that needs to be retained for a certain period, e.g. for replication purposes.

It is obvious that selection and appraisal processes in such repositories differ from those of a bigger, discipline-specific service. However, even when “everything” has to be accepted, criteria are required – for example, to determine if and how the data will be curated and disseminated (e.g. legal, ethical aspects), or whether a given dataset should be offered to a subject-specific archive or repository.
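To make this point more concrete, here is a minimal sketch of what such criteria might look like in an accept-everything repository. It is only an illustration, not a procedure presented at the workshop: the `Dataset` fields, the `appraise` function, and the mapping of disciplines to archives are all hypothetical and would have to be replaced by a repository's actual policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical description of a deposited dataset (all fields are assumptions)."""
    title: str
    has_personal_data: bool = False   # stands in for ethical aspects
    license_cleared: bool = True      # stands in for legal aspects
    discipline: Optional[str] = None  # e.g. "psychology"


# Illustrative mapping only; a real repository would maintain its own list.
SUBJECT_ARCHIVES = {
    "psychology": "PsychData",
    "social sciences": "GESIS Data Archive",
}


def appraise(ds: Dataset) -> dict:
    """Accept every dataset, but decide how it is handled afterwards."""
    can_disseminate = ds.license_cleared and not ds.has_personal_data
    return {
        "accept": True,  # "catch basin" mandate: nothing is rejected
        "disseminate": can_disseminate,
        # None means no subject-specific archive is known for this discipline
        "offer_to": SUBJECT_ARCHIVES.get(ds.discipline),
    }
```

Even in this simplified form, acceptance is unconditional, while the decisions about dissemination and about offering the data to a subject-specific archive still depend on explicit criteria.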

One envisioned output of the workshop is a German-language guideline on the selection and appraisal of research data, to be created collaboratively over the next year and building on the presentations and discussions of the workshop. One challenge for this guideline will certainly be to include the perspectives of both larger, discipline-specific services and smaller institutional repositories.