Building a German infrastructure for RDM – RfII Recommendations

In June 2016 the RfII published a position paper with recommendations on the structure, processes, and funding of research data management (RDM) in Germany. It analyses the current RDM landscape in Germany and recommends how to foster an infrastructure for data management, preservation, and sharing.

The German Council for Scientific Information Infrastructures (RfII – Rat für Informationsinfrastrukturen) consists of 24 members from the scientific community. It was established in 2014 by the state and federal governments to support “the strategic development of a contemporary infrastructure for access to scientific information” (RfII Webpage).

The German national RDM landscape is marked by “an overall absence of coordination, and . . . parallel, project-based initiatives” (RfII 2016, p. 5). The establishment of RDM services, procedures, and policies is currently driven mainly by individual academic institutions and research organisations and by a small number of learned societies (e.g. in psychology). Research funders, especially the Federal Ministry of Education and Research (BMBF) and the German Research Foundation (DFG), promote RDM and data sharing and support the development of infrastructures for data management. However, there is no general mandate to manage and share data from funded research projects.

The current situation in Germany can thus be summarised as follows:

  • infrastructure for RDM is being built in a largely uncoordinated, bottom-up process (p. 5; p. 22);
  • there is no suitable political framework and no nationwide, coordinated information infrastructure, the latter understood as comprehensive services and procedures that support the management and use of data in all phases of the data lifecycle (p. 12);
  • there are numerous position papers, problem analyses, and recommendations for action on the one hand, but an “implementation deficit” on the other (p. 15, our translation).

Contributing to this situation is the fact that the German constitution limits the possibilities for cooperation between the federal states (“Länder”) and the national government (“Bund”) in the field of higher education and research. Universities are funded by the Länder, and funding initiatives spanning several or all federal states are restricted (see also p. 22).

Another important factor that has to be taken into account, especially when thinking about RDM-related policies and mandates, is the freedom of science guaranteed by the German constitution and specified in the Framework Act for Higher Education. This freedom gives researchers control over how they conduct their research and publish its results.

These limiting factors must be acknowledged when discussing the German RDM landscape. As the RfII observes, it will not be possible to develop and implement measures in a top-down fashion. However, “top-down impulses” are required to start processes leading towards a “dynamic integration of distributed knowledge” (p. 34, our translation).

The core recommendations made in the position paper to this end are:

  1. implement long-term funding mechanisms for RDM infrastructures;
  2. establish a national research data infrastructure (NFDI);
  3. foster a “responsible data culture”;
  4. invest in the development of human resources for data management;
  5. strengthen international cooperation;
  6. establish mechanisms for actively steering the transition process.

In the following, we focus on the proposed NFDI, which is of particular interest in light of the report on the European Open Science Cloud published just a few days before the RfII position paper.

A National Research Data Infrastructure

The RfII position paper introduces the vision of a National Research Data Infrastructure as follows:

“Many aspects of research data management are of a generic and hence transferable nature . . . . With an eye to cost and efficiency, generic services can and should be established and offered in a shared manner. The RfII suggests the establishment of a consortium which bundles existing competences and provides basic storage infrastructure and services and ensures a fast transfer of competences within the science system. This National Research Data Infrastructure (NFDI) should take the form of a network spanning disciplines and communities. It should include existing big information infrastructures, the national level of ESFRI projects as well as those repositories serving sufficiently homogeneous user groups”. (p. 40, our translation)

The envisioned infrastructure focuses not only on storage but also on enabling access to and use of the data. The recommendations thus paint the picture of an infrastructure that fosters interdisciplinary research by enabling the combination and analysis of data from different communities and fields (p. 41). Services to support this include an access portal with access rights management, services for data registration and publication, search engines supporting semantic search with automatic translation of terms into community-specific terminology, and support for data analysis and visualization (see appendix D.3).

To achieve this, the following challenges need to be addressed:

  • overarching minimum standards for quality management in data description and storage;
  • the development of generic procedures for data analysis;
  • the development of generic data services and data storage;
  • (continuing) professional education (p. 41).

Among the many questions that will have to be answered to realize the vision of the NFDI, two seem especially relevant:

1. On the national level, what will be the role of discipline-specific infrastructures and services within the NFDI?

The RfII position paper suggests that the NFDI will consist of different layers: it will integrate smaller archives, libraries or computing centers as well as bigger, community-specific infrastructures in addition to the cross-disciplinary, generic services. It is crucial to define the distribution of responsibilities between these different players.

For example, the RfII paper states that the NFDI should be responsible for digital preservation. This would be an important step towards greater sustainability in this area. However, many aspects of digital preservation beyond storage and backup will still have to be handled by discipline- or community-specific infrastructures, especially description with metadata, format migrations, and supporting users in working with and understanding the data. At the same time, the RfII recommends that infrastructure nodes should compete for (financial) resources and users (p. 43). This risks reintroducing uncertainty about long-term funding through the back door, which is inherently problematic for digital preservation services.

2. How do we ensure compatibility and interoperability of the German NFDI with the European Open Science Cloud?

On June 20, 2016, the Commission High Level Expert Group on the European Open Science Cloud (EOSC) published its first report, “A Cloud on the 2020 Horizon”, which “aims to lay out a high level, living roadmap for the realisation of the European Open Science Cloud” (no pag.).

The EOSC report calls for a “complex eco-system of infrastructures” in response to the fact that

“the challenges of ever bigger data can no longer be solved only by ever bigger infrastructure. . . . With the growth of data in more and more disciplines outpacing the increase of transfer speed, many comprehensive datasets are simply too big to move efficiently from one location to another. Moreover, data are in many cases so privacy sensitive that legislation effectively precludes their moving outside the environment in which they have been collected. Therefore, relatively lightweight workflows (e.g. process virtual machines) . . . increasingly visit data where they reside, with supporting reference data and transporting only conclusions outside the safe data vault. . . . Centralised supercomputing locations that are crucial for solving high capacity HPC scientific challenges alone will not adequately support this irreversible trend. Complementary infrastructures are needed”. (no pag.)

The EOSC report and the RfII position paper appear to adhere to a similar infrastructure “paradigm”. Both argue for a decentralised network of interoperable nodes that offer generic or specialized services, are embedded in a framework of suitable policies, and draw on community- or discipline-specific knowledge and practices.

The EOSC report puts strong emphasis on (technical) protocols, comparable to those on which the Internet is built, to achieve interoperability between the complementary infrastructures, whereas the RfII paper identifies standards for data description and generic procedures for data analysis as challenges to be addressed with high priority. It is crucial that the German national development be closely coordinated with developments at the European level, but given that it is still early days for both the EOSC and the NFDI, this does not seem an insurmountable task.