General information about the GND
The indexing of library content essentially rests on two pillars:
- Classifications allow a (rough) categorisation of the content (e.g., DDC). They can also be used to organise the shelving of literature in open-access holdings (e.g., RVK).
- By assigning keywords or keyword chains (no longer en vogue), the content of a publication can be described in more detail (GND).
In 2012, the previous authority files PND (Name Authority File), GKD (Corporate Bodies Authority File), SWD (Subject Headings Authority File) and EST (Uniform Title File of the German Music Archive) were merged into the GND. The earlier differentiation between authority files for formal indexing and authority files for subject indexing has thus been abandoned. Today there is one data set per entity that can be used in both contexts.
On indexing and allocation practices
- Formal indexers, i.e., librarians who record the formal metadata of a resource (author, title, year of publication, etc.), are encouraged to link at least the persons associated with the resource (e.g., author, editor, person honoured) to an entry in the GND. In this way, the person is unambiguously identified. If a person is not yet registered in the GND, a new person entity is created. For this purpose, a predefined set of identifying information (e.g., dates of birth and death, occupation, affiliated institution) is recorded and, if possible, taken from the resource at hand. A curriculum vitae published on an institution's website may also serve as a source.
- Subject indexers are librarians who index the content of a resource. In doing so, they rely on the title of the resource, but not exclusively: it is not uncommon for resources to have quite elaborate titles that give no indication of the actual content. Subject indexers therefore usually get an overview of the content by looking at the title, the blurb, the table of contents, the preface, the introduction, the conclusion, etc. They then summarise it in a handful of keywords. The OGND, for example, is a good place to search for suitable keywords.
Meanwhile, the DNB is testing procedures that can be used to assign keywords automatically.
On the GND in the context of normalisation and data exchange
Libraries began exchanging their indexing data with each other quite early on. This requires a uniform (exchange) format (MARC) and a vocabulary (GND) that standardises designations and at the same time solves the problem of synonyms, homonyms, etc.
For some years now, data has been exchanged not only between libraries, but also between different culture and knowledge institutions. In the course of this, the GND, as a source of authority data, is also increasingly used by archives, museums etc. It has thus become fundamentally relevant for the digital humanities. (Cf. the GND4C project: https://www.dnb.de/DE/Professionell/ProjekteKooperationen/Projekte/GND4C/gnd4c.html)
The use of authority data, especially the GND, enables data aggregators such as the German Digital Library or bavarikon to link objects from different fields and thus improve their retrievability.
The advantage that the GND offers in this context can be illustrated with a (fictitious) example:
In bavarikon, there is a portrait of Martin Luther alongside a coin bearing Martin Luther's portrait. Both objects have Martin Luther as their "subject". However, the system can only relate them to each other (in a simple way) if in both cases the field dc:subject contains not just a string but also a unique identifier, such as the GND-ID (118575449). If strings are used instead of an identifier, it is quite possible that they differ from each other, i.e., the same person is meant, but the name strings are different. A look at the column "Other Names" of the GND data set makes it clear that this is not at all unlikely:
http://d-nb.info/gnd/118575449. It is not difficult for a human to merge the (slightly) different strings, but for a machine this is a greater obstacle.
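The difference between matching on strings and matching on an identifier can be sketched in a few lines of Python. This is only an illustration: the two records, their titles, and the name variants are invented, and the field names `subject_label` and `gnd_id` are assumptions, not bavarikon's actual data model. Only the GND-ID 118575449 is taken from the text above.

```python
from collections import defaultdict

# Hypothetical records: titles and name variants are invented for illustration.
records = [
    {"title": "Portrait of Martin Luther",
     "subject_label": "Luther, Martin",
     "gnd_id": "118575449"},
    {"title": "Coin with portrait of Martin Luther",
     "subject_label": "Martinus Luther",
     "gnd_id": "118575449"},
]

def group_by(items, key):
    """Group record titles by the value found under `key`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item["title"])
    return dict(groups)

by_label = group_by(records, "subject_label")  # naive string comparison
by_gnd = group_by(records, "gnd_id")           # comparison via identifier

print(len(by_label))  # 2: the differing strings keep the objects apart
print(len(by_gnd))    # 1: the shared GND-ID brings them together
```

Grouping by the string yields two unrelated entries, whereas grouping by the identifier yields a single entry containing both objects.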
On the GND in the context of Linked Data
Although the GND is now increasingly used outside of libraries, the format of the GND records, MARC, is strongly domain-specific and is not used outside of the library world. The GND Ontology represents an attempt to close this gap in order to make the GND usable for the Semantic Web as well, because:
"The need for name disambiguation and entries having an authoritative character is an issue that concerns a lot more communities than the library world. In a growing information society the unique identification and linking of persons, places and other authorities becomes more and more important. The GND Ontology aims to transfer the made experience from libraries to the web community by providing a vocabulary for the description of conferences or events, corporate bodies, places or geographic names, differentiated persons, undifferentiated persons (name of undifferentiated persons), subject headings, and works."
An ontology consists of the following components:
- concepts/classes group real-world instances with common properties, e.g., "keyword";
- instances/individuals represent the actual objects, e.g., butter, identified by the global URI http://d-nb.info/gnd/4009236-7;
- relations link concepts and instances together. For example, Butter is identified as an object of the class "SubjectHeadingSensoStricto" (a subclass of the class "keyword") via the following construct: <rdf:Description rdf:about="http://d-nb.info/gnd/4009236-7"><rdf:type rdf:resource="http://d-nb.info/standards/elementset/gnd#SubjectHeadingSensoStricto"/></rdf:Description> (cf. http://d-nb.info/gnd/4009236-7/about/rdf).
One advantage of Linked Data is that the encoded information is language-independent. In the example above, the object represented by the term Butter, or in other words, the real world object BUTTER, is described in more detail by properties. The string Butter also appears in the RDF file, but only as a property of the resource Butter:
<gndo:preferredNameForTheSubjectHeading rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Butter</gndo:preferredNameForTheSubjectHeading>.
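To make the triple structure behind these RDF/XML fragments visible, the following minimal sketch reads them with Python's standard XML parser and lists the resulting subject, predicate, and object values. The snippet is assembled from the two fragments quoted above and wrapped in an `rdf:RDF` element, which is an assumption made for the sake of a well-formed example; the real record at the URL above contains many more properties. A dedicated RDF library would normally be the better tool, and the standard library is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GNDO_NS = "http://d-nb.info/standards/elementset/gnd#"

# Reduced snippet built from the two fragments quoted in the text.
snippet = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:gndo="{GNDO_NS}">
  <rdf:Description rdf:about="http://d-nb.info/gnd/4009236-7">
    <rdf:type rdf:resource="{GNDO_NS}SubjectHeadingSensoStricto"/>
    <gndo:preferredNameForTheSubjectHeading rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Butter</gndo:preferredNameForTheSubjectHeading>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(snippet)
description = root.find(f"{{{RDF_NS}}}Description")
subject = description.get(f"{{{RDF_NS}}}about")

triples = []
for child in description:
    # ElementTree writes tags as "{namespace}local"; join both into one URI.
    predicate = child.tag[1:].replace("}", "", 1)
    # The object is either another resource (rdf:resource) or a literal (text).
    obj = child.get(f"{{{RDF_NS}}}resource") or child.text
    triples.append((subject, predicate, obj))

for triple in triples:
    print(triple)
```

The first triple states the class membership of the resource, the second carries the string Butter as a property value.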
In a use case where one would need the Italian equivalent in addition to the German term Butter, one could simply form another triple (RDF is based on triples) for this, e.g., consisting of the resource http://d-nb.info/gnd/4009236-7 as subject, rdfs:label as predicate and the literal (string) burro, tagged with the language it (xml:lang="it"), as object.
Assuming that the Biblioteca Nazionale Centrale di Firenze proceeded with its Nuovo Soggettario thesaurus in a similar way as the DNB does with the GND, the resource Butter in the GND could be linked to the resource burro in the Nuovo Soggettario thesaurus, e.g., via the property owl:sameAs, in order to express that in both cases the same real-world object BUTTER is described.
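The two additional statements described above (an Italian label and a link to an equivalent resource) can be written down as plain triples. The sketch below represents them as Python tuples and prints them in N-Triples notation. The URI used for the Italian resource is purely hypothetical (`example.org`); no real identifier from the Florentine thesaurus is implied, and the small `Literal` helper is an illustrative assumption rather than part of any library.

```python
from typing import NamedTuple, Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

BUTTER = "http://d-nb.info/gnd/4009236-7"
# Hypothetical URI, invented for illustration only.
BURRO = "http://example.org/thesaurus/burro"

class Literal(NamedTuple):
    """A string value with an optional language tag."""
    value: str
    lang: Optional[str] = None

triples = [
    (BUTTER, RDFS_LABEL, Literal("burro", "it")),  # Italian label
    (BUTTER, OWL_SAME_AS, BURRO),                  # same real-world object
]

def to_ntriples(subject, predicate, obj):
    """Serialise one triple as an N-Triples line (no escaping of special characters)."""
    if isinstance(obj, Literal):
        rendered = f'"{obj.value}"' + (f"@{obj.lang}" if obj.lang else "")
    else:
        rendered = f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."

lines = [to_ntriples(*t) for t in triples]
print("\n".join(lines))
```

Note that the language tag is attached to the literal, not to the predicate: the predicate remains rdfs:label regardless of the language of the label.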
With the property <skos:broadMatch rdf:resource="http://zbw.eu/stw/descriptor/14957-0"/>, for example, the GND resource Butter is related to the ZBW resource coating fat.