With the spread of digital approaches, the question of how to deal with so-called "research data" has recently come to the fore. The underlying ideas appear to derive from practice in the natural sciences, where a common scenario is that large amounts of measurement data are first collected and then analysed in interpretative texts. This produces a seemingly clear division in which only the measurement data are referred to as "research data". It may have been, or may still be, customary to regard research data as ephemeral and not worth preserving permanently. The objective of research data management is to preserve in the long term not only the interpretative texts but also the "research data" on which the interpretation is based, and to keep them available for reuse.
The topic of "research data management" (German: Forschungsdatenmanagement, FDM) is currently (2018) being strongly promoted in Germany at both federal and state level, with a number of relevant ventures already underway. These activities were launched with the aim of establishing a European Open Science Cloud (EOSC) at EU level. At the supra-regional, nationwide level in Germany, three initiatives are worth mentioning: the recommendation of the "Council for Information Infrastructures" (RfII) to establish a "National Research Data Infrastructure" (NFDI), the NFDI Working Group of the Union of German Academies (with a focus on the humanities), and the interdisciplinary project "Generic Research Data Infrastructure" (GeRDI), funded by the DFG since 2016. Examples of FDM initiatives at state level are HeFDI ("Hessian Research Data Infrastructures") in Hesse and the "eHumanities – interdisziplinär" project funded by the Bavarian Ministry of Science.
From the perspective of the humanities, the supposedly clear distinction between research data and interpretative data or texts (which may be possible in isolated cases in the natural sciences) is highly problematic. VerbaAlpina, at any rate, does not make such a distinction, but considers all data collected and produced by the project to be an inseparably interwoven whole whose individual parts are related to each other in many ways. Accordingly, in terms of "research data management", VerbaAlpina declares the totality of its digital data distributed across the VA_DB, VA_WEB and VA_MT modules (i.e. language data, comments, glossary entries, computer code, media data, etc.) to be a single research datum that must be preserved faithfully in accordance with the FAIR principles. In doing so, the project is guided by the relevant recommendations of the RfII (RfII 2016, Annex A, p. A-13). As a pilot project, VerbaAlpina is integrated into the projects GeRDI and "eHumanities – interdisziplinär" mentioned above.
An essential aspect of research data management is guaranteeing interoperability, in the sense that persistent links between subsets of different projects' datasets are possible. DOIs ("Digital Object Identifiers") play an important role here. They are the technical prerequisite for the permanent, URL-independent addressability of "digital objects" and can be generated for any electronic content that can be accessed via a URL. In the library environment, DOIs were initially used for the persistent identification of electronic book publications (e.g. https://doi.org/10.5282/ubm/epub.25627) or even entire websites (e.g. http://dx.doi.org.emedien.ub.uni-muenchen.de/10.5282/asica). Interoperability between separately developed and managed datasets, however, requires a much finer level of granularity than this practice provides.
To this end, VA is creating a series of files, accessible on the Internet via URLs, which contain the collected linguistic material grouped by morpho-lexical types, concepts, municipalities of origin and individual records. The files are named using the IDs assigned by VA for the respective data category: file names in the category "Municipality" begin with "A", "C" marks concepts and "L" indicates morpho-lexical types. The number following the letter is the ID assigned by VA (cf. "Identifiers"). These data can be accessed via the API.
The DOIs are initially assigned by the university library of the LMU within the framework of the "eHumanities – interdisziplinär" project. The university library will also transfer the data to its own database, where they will be indexed in depth using a suitable metadata schema and procedures that are still being developed. In addition to making the research data available in the repository, the aim is to integrate the fine-grained VA data into the library catalogues and make them easy to retrieve.
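The file-naming convention described above (a category letter followed by the numeric VA ID) can be illustrated with a minimal Python sketch. This is not VerbaAlpina's own code: the function name `parse_va_identifier`, the category labels and the assumption that a name consists of exactly one letter plus digits, without a file extension, are hypothetical choices made for illustration. The text does not state a prefix for individual records, so none is included here.

```python
import re
from typing import NamedTuple

# Prefix letters as described in the text; the labels are illustrative wording.
CATEGORY_BY_PREFIX = {
    "A": "municipality",
    "C": "concept",
    "L": "morpho-lexical type",
}

# Assumption: one category letter directly followed by the numeric VA ID.
_ID_PATTERN = re.compile(r"^([ACL])(\d+)$")


class VaIdentifier(NamedTuple):
    prefix: str    # category letter, e.g. "C"
    category: str  # readable category label
    number: int    # numeric ID assigned by VA


def parse_va_identifier(name: str) -> VaIdentifier:
    """Split a name such as 'C123' into its category and numeric ID."""
    match = _ID_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised VA identifier: {name!r}")
    prefix, digits = match.groups()
    return VaIdentifier(prefix, CATEGORY_BY_PREFIX[prefix], int(digits))


if __name__ == "__main__":
    # The IDs below are made-up examples, not real VA records.
    for example in ("A42", "C123", "L7"):
        print(parse_va_identifier(example))
```

Rejecting anything that does not match the pattern keeps such a helper conservative: a name with an unknown prefix raises an error instead of being silently assigned to a category.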
From the holdings of the university library of the LMU, the VA data will also be transferred to the index of the DFG project GeRDI and thus made available for reuse in interdisciplinary contexts.
Since May 2021, the VerbaAlpina data of versions 19/1 and 19/2 have also been accessible at the finest level of granularity via "Discover", the research data portal of the university library of the LMU.
see also "Standards Data"