DINI/nestor Workshop: #RDM Tools

On June 17, 2016, the fifth DINI/nestor workshop on research data and data management took place in Kiel, Germany. It addressed tools for research data and their integration into the research and data management process.

Tools for data management and data handling are important for (at least) two reasons: 1) They help to standardize research data management and its procedures. 2) They can help to foster the adoption of research data management practices by reducing the effort and time researchers have to invest when implementing RDM measures.

The biggest challenge in this regard, however, is to develop tools that are actually used. As presenters pointed out repeatedly, tools have to integrate with the research process as seamlessly as possible to stand a chance of being adopted. A tool that requires researchers to go out of their way and does not produce an immediate, tangible benefit will end up unused.

The workshop’s presentations and breakout sessions introduced participants to an assortment of different data management tools already available or currently being developed, ranging from Virtual Research Environments to tools for the creation and publication of metadata, and packaging tools for the submission of data and documentation to a repository.

Among the questions discussed during the sessions was that of generic vs. discipline- (or method-) specific tools. While it makes sense to offer certain services centrally, such as secure storage, document sharing, project management and communication, many aspects of data management are specific to individual disciplines and research methods. These include, for example, how and with which metadata the research process and data are best documented, the degree of automation of measurement and analysis processes, and typical forms of collaboration.

Below is an overview and short description of the tools presented during the workshop. All presentations (in German) are available for download from the workshop page.

Overview of tools presented

Tool | Discipline | Institutional | Collaboration | Storage | Metadata | Data handling | Publication
DataWiz | Psychology | o | o | o | x | o | x
LZA Lite | generic | x | (x) | x | x | o | (x)
MASI | generic, Applied Sciences | x | o | x | x | o | x
Replay DH | Digital Humanities | o | x | o | x | x | o
VRE GEOMAR | Marine science | x | x | x | x | x | x
VRE U Kiel | generic | x | x | x | x | x | x
ZBW Journal Data Archive | Economics | o | o | (x) | x | o | x

DataWiz is a data management tool currently being developed at ZPID, the Leibniz Center for Psychology Information. It supports data management planning and implementation in psychology, as well as the documentation of the research process and the data with metadata. It will be possible to submit the data and the documentation created with the tool directly to ZPID's research data center PsychData. In the future, the tool will also support the pre-registration of research.

GEOMAR data management portal: This integrated data management system for marine research is a collaborative effort, begun in 2009, of several large-scale marine research projects. The objective was to create a common working platform and shared research data management rather than addressing the associated challenges separately in each project. Today the platform incorporates tools for data collection, archiving, and publication, as well as for information exchange and collaboration.

LZA Lite is a cooperation between three German universities. It is a Fedora-based platform supporting the secure storage of both administrative records and research data, and their enrichment with metadata. The platform is to be expanded with solutions for collaborative work and for long-term preservation. The production system will be launched in 2017.

MASI (Metadata Management for Applied Sciences) is a tool currently being developed at TU Dresden in cooperation with several other higher education institutions (HEIs). Intended as a research data repository for “living data”, it will integrate functions and services for the (automated) description of data with metadata, for data storage, and for the publication and re-use of data.

Replay DH: This project, carried out at the universities of Ulm and Stuttgart, is building a git-based versioning tool for research data in the digital humanities. It will add a GUI and fields for standardized metadata description to git, and it will implement DOI registration, including for not-yet-final versions of the data.

Virtual Research Environment at the University of Kiel: Building on existing infrastructure, the project is currently establishing a generic VRE that combines tools for data storage, collaboration, and publication with central services such as identity management and with discipline-specific tools. The VRE is embedded in an organizational setting that, among other things, fosters RDM practices at the University of Kiel and offers face-to-face support for researchers.

ZBW Journal Data Archive: This portal was developed at the ZBW – German National Library of Economics as part of the EDaWaX project to support the replicability of research in economics. Based on CKAN, it allows the data underlying empirical research articles to be described with metadata and published in accordance with journal data policies. The data is securely stored in the SOEP Research Data Center, while the metadata is managed by the ZBW.


CESSDA Training goes GEBF

As part of the annual conference of the Gesellschaft für Empirische Bildungsforschung (GEBF) at the beginning of March 2016 in Berlin, CESSDA Training held a workshop on research data management.

Data management and the re-usability of research data are becoming increasingly important in empirical social and educational research in Germany. Accordingly, the German Federal Ministry of Education and Research has begun to make data management, data archiving and data re-usability a precondition for funding in the field of educational research. In response, three German research institutes, namely the Deutsches Institut für Internationale Pädagogische Forschung (DIPF), the Institut für Qualitätsentwicklung im Bildungswesen (IQB) and GESIS, established the Verbund Forschungsdaten Bildung (VFDB).

This research infrastructure supports researchers in educational research in managing and archiving their data. The VFDB web portal (in German) offers best practice guidelines and templates on all relevant topics of research data management in empirical social and educational research. These include, among other things, legal and ethical issues, data documentation, data security, and data archiving.

To support the VFDB’s primary objective, CESSDA Training and the IQB jointly held the GEBF workshop to introduce participants to the overall field of research data management. During the two-hour session, we talked about the relevance of long-term accessibility of research data and discussed the elements of good data management. Given the limited time available, the workshop could of course only provide a first introduction to the topic and raise participants’ awareness of issues that might compromise data quality and re-usability. However, this is an important first step towards making research data management and sharing a matter of course for researchers in the empirical social and educational science community.

KE Workshop on RDM training and skills

Co-authored with Laurence Horton, London School of Economics and Political Science

Knowledge Exchange (KE) is an international collaboration of infrastructure bodies from Germany, Denmark, the Netherlands, Finland and the UK that aims to enable open scholarship.

Focusing on Research Data Management (RDM) training, this London event on 9-10 February 2016 combined presentations with group discussions on broader themes. The results of the workshop, as well as the results of a survey KE carried out among RDM training providers before the workshop, will be published as a Knowledge Exchange Report this spring.

Presentations

Ellen Verbakel (TU Delft) introduced the Essentials 4 Data Support course, targeted at those who support researchers in data management. The course is organized along the research data life cycle and can be taken online for information only, or for certification through blended learning that includes assignments and face-to-face teaching. One lesson learned was that the website needed short texts, images and video.

Jonas Recker (GESIS) talked about CESSDA Training's introductory workshops on RDM. Evaluations show that participants appreciate the opportunity for questions and discussion. However, there is demand for practical examples as well as for guidance on informed consent, anonymisation and data protection.

UK Data Service’s Libby Bishop also reported on the value of using “real” data in workshops. Again, the argument was: less PowerPoint, more exercises. Libby also mentioned challenges that are making life harder, including “cloud” storage, encryption, “big data”, and research using data from non-academic sources.

An institutional perspective came from Gareth Knight (LSHTM), who identified demand for RDM support in emerging areas, such as training on mobile devices for data capture and advanced training in anonymisation and encryption.

Stéphane Goldstein (InformAll) described the KE survey of RDM training. It found that training audiences consist almost exclusively of PhD students and postdocs. The data also suggested discrepancies between the learning aims of the training and its impact.

Reporting on the DCC's train-the-trainers project, Joy Davidson (DCC) identified groups missing from RDM training and explained why reaching them is important. A wide range of institutional support staff, including archivists, finance staff and legal officers, touch on RDM areas through their roles but receive no support on how their work fits into enabling data reuse and preservation.

Two presentations from Denmark showed how a smaller nation is tackling RDM. Henrik Pedersen (University of Southern Denmark) outlined the use of a national forum to ground local expertise in national coordination, and Karsten Kryger (Aalborg University Library) sketched the library's flexible training master plan, emphasizing that training was a minor but important part of an overall strategy.

Christian Jämsen of the National Institute for Health and Welfare in Finland described how the institute manages extensive, sensitive patient record data and makes it available for research. Part of the challenge for the institute was knowing what data it holds and what it can use.

The presentations are available from this Dropbox link.

Discussions

Breakout sessions on broader themes focused on lessons learnt, challenges in training, cross-national insights, and success criteria for RDM. Summaries of the discussions will be available from KE.

Several recurring themes emerged and were discussed throughout the workshop:

Level of training and scalability

Most of the training delivered by workshop participants appears to have been at introductory level. This could be because it is still “early days” in implementing RDM procedures and RDM training in the research routine. However, as a community we should start thinking about more advanced training for certain target groups.

Linked to this is the question of how to make RDM training scale. The KE survey revealed that in 2014 over 30% of the training offered reached fewer than 50 participants, and another 19% reached fewer than 99 participants. One reason may be the small number of dedicated staff; another is that in-depth training going beyond very general introductory remarks on RDM appears to require smaller groups and room to discuss subject- and project-specific questions.

Impact and success

The question of impact and success is twofold. How do we measure the impact of RDM training, i.e. how do we know that training has actually improved something? And how do we measure the success of RDM itself?

Metadata and repositories for training resources

It became clear in the discussions that there is demand to improve the discoverability and accessibility of RDM training materials. As Laura Molloy and Kevin Ashley rightly pointed out, suitable repositories and metadata already exist. It seems that we need to raise awareness that these tools are already out there and possibly discuss how to make them a good fit for a broad spectrum of training resources.

DINI/nestor workshop: Appraisal and selection of research data

On November 17th, 2015, the fourth part of the DINI/nestor workshop series dedicated to the management and preservation of research data was hosted by the University of Duisburg-Essen (program in German; presentation slides will be published here as well).

The workshop looked at the selection and appraisal of research data from the perspective of different (predominantly German) actors in the research data infrastructure, specifically data-holding institutions such as universities and university libraries, “traditional” archives and discipline-specific services, and the German National Library.

Introduction

Jens Ludwig (Staatsbibliothek Berlin – Preußischer Kulturbesitz) started off the day with a general introduction to the selection and appraisal of research data. In his presentation he specifically argued against the defeatist stance that research data management and preservation are a pointless undertaking because ‘we will never have the resources to adequately manage, appraise, and preserve all the data generated’. While it is true that the means and measures at our disposal are not adequate to preserve everything, using this as an excuse to remain inactive means throwing out the figurative baby with the bath water. If we cannot preserve everything forever, we must instead develop strategies that help us identify more reliably what should be kept and for how long.

One approach to this is to define selection processes and criteria, for example based on the question “what for?”: For which purpose do we want to manage and keep data? Possible answers include:

  • because doing so is an important element of good research practice and enables the replication of research results;
  • or because we assume the data will be useful to other researchers in the future and can help the scientific community to gain new insights.

Preserving data is not an end in itself. It serves a purpose, and selecting data / defining selection criteria for preservation must take this purpose into account.

Paper sessions

The introduction was followed by presentations shedding light on practical aspects of data appraisal. These included examples of how data is selected for preservation and dissemination in larger organizations, namely the GESIS Data Archive for the Social Sciences (Natascha Schumann), the German National Library (Sabine Schrimpf), and the Landesarchiv Baden-Württemberg (Christian Keitel). The approaches of these three organizations differ considerably as a result of their different mandates, both in the types of selection criteria employed (e.g. formal vs. content-based, or a combination of the two) and in the “rigor” of the selection process, reflected among other things in the “acceptance rate” for submitted objects.

A second set of presentations focused on universities and on the “producer” side of the process of selecting research data for preservation and sharing. Kerstin Helbig discussed support services for research data management at Humboldt University Berlin. Specifically, she looked at the information needs of researchers preparing to submit data to an institutional repository or discipline-specific archive. For them it can be difficult to select the “correct” subset, for preservation and/or dissemination, of all the data and information they collected and created throughout the research process. In the context of research data management planning, it is therefore important to know and understand the selection criteria employed by the repository or archive.

Documenting and analyzing the questions that researchers have about this process can help archives and repositories to better understand producers’ needs and to communicate with them more efficiently. The results of a survey carried out among researchers at Austrian universities (Paolo Budroni, Barbara Sánchez Solís), presented at the end of the paper sessions, can serve the same purpose.

Break-out sessions and final discussion

The paper sessions were followed by break-out sessions for practical exercises on data appraisal and for discussions focusing on the different stakeholders involved in the process of selection and appraisal and the general structure of an appraisal process for research data. These nicely complemented the presentations and offered a good opportunity for in-depth discussions and for hands-on learning about the appraisal and selection process.

The workshop concluded with an open discussion that highlighted a perspective unfortunately not strongly represented in the preceding presentations: that of (smaller) institutional, multi-disciplinary repositories at universities. Often, these have a mandate (or are expected) to accept everything created by the university's researchers. They act as a kind of “catch basin” for the “long tail” of research data that needs to be retained for a certain period, e.g. for replication purposes.

Clearly, selection and appraisal processes in such repositories differ from those of a larger, discipline-specific service. However, even when “everything” has to be accepted, criteria are required, for example to determine whether and how the data will be curated and disseminated (e.g. in light of legal and ethical aspects), or whether a given dataset should be offered to a subject-specific archive or repository.

One envisioned output of the workshop is a German-language guideline on the selection and appraisal of research data, to be created collaboratively over the coming year and building on the workshop's presentations and discussions. One challenge for this guideline will certainly be to include the perspectives of both larger, discipline-specific services and smaller institutional repositories.