DHO Knowledge Graph Data Integration

Automated Pipelines and Mappings to integrate new data into the Digital Heraldry Knowledge Graph

Directory Structure

  • data/
    • input/ New data to be integrated into the Knowledge Graph
    • rdf-output/ RDF files created by the transformation pipelines
  • src/
    • rdf-mappings/ Mapping scripts to transform data into RDF
  • config/ Contains JSON files with information on how to run the scripts. Each config file has the corresponding script name embedded in its file name as well as in its content.

Changelog

Changes between versions of all ontologies are documented in CHANGELOG.md.

Pipeline

  • Visualisation of the complete pipeline with GitHub Mermaid

Usage

  • Add usage instructions when the pipeline is complete

Current order of mappings

  • Automate calling all mappings in a single pipeline (until then, the steps below can be chained manually, as in the sketch after this list)
  • Call map-tblBranch.py
  • Call map-tblArmItems.py
  • Call merge_rdf_files_into_kg.py with the RDF files to be merged
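
A minimal sketch of such a manual chain, using subprocess. The output paths handed to merge_rdf_files_into_kg.py are placeholders, and passing the RDF files as positional arguments is an assumption; check python3 merge_rdf_files_into_kg.py -h for the actual interface.

import subprocess

# Run the mapping scripts in the order listed above.
subprocess.run(["python3", "map-tblBranch.py"], check=True)
subprocess.run(["python3", "map-tblArmItems.py"], check=True)

# Merge the resulting RDF files into one Knowledge Graph.
# The two paths are placeholders for the preferred output_files of the mappings.
subprocess.run(
    ["python3", "merge_rdf_files_into_kg.py",
     "data/rdf-output/tblBranch.ttl",
     "data/rdf-output/tblArmItems.ttl"],
    check=True,
)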

Namespaces

All namespaces must be defined in dho_namespaces.py. Each mapping script can bind all of these namespaces to its local graph by calling the function bind_namespaces().
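
A minimal usage sketch, assuming bind_namespaces() takes the local graph as its argument:

from rdflib import Graph
from dho_namespaces import bind_namespaces  # project module defining all DHO namespaces

g = Graph()
bind_namespaces(g)  # assumed signature: binds every defined DHO prefix to this local graph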

Individual Mapping Scripts

Map descriptions of Coats of Arms to RDF

Uses the descriptions from the OMA table tblBranch. Mapping is done by the script map-tblBranch.py. The script can be configured through the file config/config-map-tblBranch.json. This config file contains the following keys (an example is sketched after the list):

  • csv_input_path: Source file from which the coat of arms descriptions are mapped.
  • initial_ontology_definitions: Decides whether classes and properties are defined before new data is added to the knowledge graph. Can be set to a Python file which contains a number of class and property definitions, executed by rdflib. These definitions are then executed in map-tblBranch.py before any data is mapped from tblBranch (set in csv_input_path). If null is given as a value for initial_ontology_definitions, no classes or properties are added in advance.
  • existing_ontology: File link to an existing knowledge graph. If set, this KG is loaded before any new data is added. The old data, including UUIDs, is then not overwritten when map-tblBranch.py is run.
  • term_mappings: Mapping table resolving abbreviations for heraldic terms that are used in tblBranch.
  • concepts_with_multiple_inheritance: List of heraldic concepts that are used as synonyms. This covers the special case in which a heraldic concept is used in different contexts, e.g. "per bend" may be used as a Pattern as well as an Arrangement. In this case, new_class_name has to be differentiated by its subclass, e.g. by transforming it to ArrangedPerPale or PatternedPerPale.
  • add_metadata: Boolean value. States whether the metadata defined in dho_metadata.py is to be added as ontology metadata.
  • output_files: List of output files and the corresponding formats into which the results are to be serialized. The first output object in the list is considered the preferred one and is therefore used by subsequent steps in the pipeline.
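
A hypothetical example of config/config-map-tblBranch.json, expressed here as a Python dict and serialized to JSON. Every value, the abbreviation in term_mappings, and the key names inside output_files are placeholders, not taken from the project.

import json

# Hypothetical config/config-map-tblBranch.json -- every value is a placeholder.
config = {
    "csv_input_path": "data/input/tblBranch.csv",
    "initial_ontology_definitions": "src/rdf-mappings/initial_definitions.py",  # or None (null) to skip
    "existing_ontology": "data/rdf-output/digital-heraldry-ontology.ttl",       # or None (null)
    "term_mappings": {"p.b.": "per bend"},                                      # assumed dict shape
    "concepts_with_multiple_inheritance": ["per bend", "per pale"],
    "add_metadata": True,
    "output_files": [
        {"file": "data/rdf-output/tblBranch.ttl", "format": "turtle"},  # first entry is the preferred one
    ],
}

print(json.dumps(config, indent=2))  # what the JSON file would look like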

Map occurrences of coats of arms in manuscripts to RDF

Uses the list of occurrences of coats of arms in manuscripts from the OMA table tblArmItems. Mapping is done by the script map-tblArmItems.py. The script can be configured through the file config/config-map-tblArmItems.json. This config file contains:

  • csv_input_path: Source file from which the coat of arms descriptions are mapped.
  • initial_ontology_definitions: Decides whether classes and properties are defined before new data is added to the knowledge graph. Can be set to a Python file which contains a number of class and property definitions, executed by rdflib. These definitions are then executed in map-tblArmItems.py before any data is mapped from tblArmItems (set in csv_input_path). If null is given as a value for initial_ontology_definitions, no classes or properties are added in advance.
  • existing_ontology: File link to an existing knowledge graph. If set, this KG is loaded before any new data is added. The old data, including UUIDs, is then not overwritten when map-tblArmItems.py is run.
  • output_files: List of output files and the corresponding formats into which the results are to be serialized. The first output object in the list is considered the preferred one and is therefore used by subsequent steps in the pipeline.
  • include_armcodes: You can also map only specifically selected manuscripts (identified through their ArmCode) to RDF. To do so, set include_armcodes to a list containing the ArmCodes. If you want to map the whole OMA database to RDF, set this to null (see the sketch after this list).
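
The two variants of include_armcodes, as a short sketch with placeholder ArmCodes:

include_armcodes_subset = ["ArmCode001", "ArmCode002"]  # placeholder ArmCodes: map only these manuscripts
include_armcodes_all = None                             # null in the JSON config: map the whole OMA database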

If you call map-tblArmItems.py with "-t" as a parameter, only a small test dataset is created.

Merge multiple RDF files into one single Knowledge Graph

Merging is done by the script merge_rdf_files_into_kg.py. The input is given as terminal parameters; for more information call python3 merge_rdf_files_into_kg.py -h. The output, and whether the content of an existing graph is to be overwritten, is set in the configuration file config-merge_rdf_files_into_kg.json. This config file contains:

  • existing_ontology: File link to an existing knowledge graph. If set, this KG is loaded before any new data is added. The old data, including UUIDs, is then not overwritten when merge_rdf_files_into_kg.py is run.
  • metadata_file: File link to a table with manuscript metadata. Necessary to create complete IDs for dhor:CoatOfArmsRepresentations and dhoo:Manuscripts.
  • output_files: List of output files and the corresponding formats into which the results are to be serialized. The first output object in the list is considered the preferred one and is therefore used by subsequent steps in the pipeline.
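
For example, assuming the RDF files are passed as positional arguments (check the -h help for the actual interface) and using the placeholder paths from the pipeline sketch above, a call might look like

python3 merge_rdf_files_into_kg.py data/rdf-output/tblBranch.ttl data/rdf-output/tblArmItems.ttl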

Integrate metadata

The script integrate_manuscript_metadata_into_kg.py creates entities for the manuscripts in the Knowledge Graph and integrates their metadata. The script is only a preliminary version; its configuration is hard-coded into the script.

Create ontology documentation

The content of the documentation of all classes and properties is stored as TSV files in data/input/documentation. To integrate the whole content of the documentation directory into an RDF file, call the script update_documentation.py with the RDF file as a command line parameter (in most cases, this will be digital-heraldry-ontology.ttl).
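
For example, using the output path referenced elsewhere in this README:

python3 update_documentation.py data/rdf-output/digital-heraldry-ontology.ttl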

The HTML documentation of the ontology is created with WIDOCO. To create the first version of the HTML documentation (without prior versions), call

java -jar widoco-1.4.16-jar-with-dependencies.jar -ontFile "data/rdf-output/digital-heraldry-ontology.ttl" -outFolder documentation -getOntologyMetadata -rewriteAll -htaccess -webVowl -includeAnnotationProperties

To create additional versions, call

java -jar widoco-1.4.16-jar-with-dependencies.jar -ontFile "data/rdf-output/digital-heraldry-ontology.ttl" -outFolder documentation -getOntologyMetadata -rewriteAll -webVowl -includeAnnotationProperties -licensius

To update an existing documentation while preserving prior versions, call:

  • Add command that takes versioning into account

  • Automate upload of the HTML documentation to the server

Testing changes

For testing purposes (e.g. to check whether changes made to scripts or ontologies apply correctly in the data), the pipeline in test-dataset-creation-pipeline.ipynb may be used. Note that you may have to adapt the file paths in the config files in the config directory.

Important Dependencies

License

  • Add License