# DHO Knowledge Graph Data Integration

Automated pipelines and mappings to integrate new data into the Digital Heraldry Knowledge Graph.

## Directory Structure

* `data/`
    * `input/` New data to be integrated into the Knowledge Graph
    * `rdf-output/` RDF files created by the transformation pipelines
* `src/`
    * `rdf-mappings` Mapping scripts to transform data into RDF
    * `config/` JSON files describing how to run the scripts. Each config file has the name of the corresponding script embedded in its file name as well as in its content.

## Pipeline

- [ ] Visualisation of the complete pipeline with GitHub mermaid

## Usage

- [ ] Add usage instructions when the pipeline is complete

### Current order of mappings

- [ ] Automate calling all mappings in a single pipeline

* Call `map-tblBranch.py`
* Call `map-tblArmItems.py`
* Call `merge_rdf_files_into_kg.py` with the RDF files to be merged

### Namespaces

All namespaces must be defined in `dho_namespaces.py`. Each mapping script can bind all of these namespaces to its local graph by calling the function `bind_namespaces()`.

### Individual Mapping Scripts

#### Map descriptions of Coats of Arms to RDF

Uses the descriptions from the OMA table `tblBranch`. Mapping is done by the script `map-tblBranch.py`.

The script can be configured through the file `config/config-map-tblBranch.json`. This config file contains:

* `csv_input_path`: Source file from which the coat of arms descriptions are mapped.
* `initial_ontology_definitions`: Determines whether classes and properties are defined before new data is added to the knowledge graph. Can be set to a Python file that contains a number of class and property definitions, executed by rdflib. These definitions are executed in `map-tblBranch.py` before any data is mapped from `tblBranch` (set in `csv_input_path`). If the value of `initial_ontology_definitions` is `null`, no classes or properties are added in advance.
* `existing_ontology`: Path to the file of an existing knowledge graph. If set, this KG is loaded before any new data is added. The old data, including UUIDs, is then not overwritten when `map-tblBranch.py` is run.
* `term_mappings`: Mapping table that resolves the abbreviations for heraldic terms used in `tblBranch`.
* `add_metadata`: Boolean value. States whether the metadata defined in `dho_metadata.py` is added as ontology metadata.
* `output_files`: List of output files and the corresponding formats into which the results are serialized. The first output object in the list is considered the preferred one and is therefore used by the following steps in the pipeline.

#### Map occurrences of coats of arms in manuscripts to RDF

Uses the list of occurrences of coats of arms in manuscripts from the OMA table `tblArmItems`. Mapping is done by the script `map-tblArmItems.py`.

The script can be configured through the file `config/config-map-tblArmItems.json`. This config file contains:

* `csv_input_path`: Source file from which the occurrences of coats of arms are mapped.
* `initial_ontology_definitions`: Determines whether classes and properties are defined before new data is added to the knowledge graph. Can be set to a Python file that contains a number of class and property definitions, executed by rdflib. These definitions are executed in `map-tblArmItems.py` before any data is mapped from `tblArmItems` (set in `csv_input_path`). If the value of `initial_ontology_definitions` is `null`, no classes or properties are added in advance.
* `existing_ontology`: Path to the file of an existing knowledge graph. If set, this KG is loaded before any new data is added. The old data, including UUIDs, is then not overwritten when `map-tblArmItems.py` is run.
* `output_files`: List of output files and the corresponding formats into which the results are serialized. The first output object in the list is considered the preferred one and is therefore used by the following steps in the pipeline.
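
The config keys listed above can be illustrated with a small, hedged sketch. It is not the code used by `map-tblArmItems.py`; it only shows how such a config could be parsed and how the first entry of `output_files` could be picked as the preferred output. The CSV and output paths, the `path` and `format` keys inside each output object, and the helper names `parse_config` and `preferred_output` are assumptions made for illustration.

```python
import json

# Hypothetical config content. The paths and the "path"/"format" keys of the
# output objects are assumptions, not taken from the repository.
EXAMPLE_CONFIG = """
{
  "csv_input_path": "data/input/tblArmItems.csv",
  "initial_ontology_definitions": null,
  "existing_ontology": "data/rdf-output/digital-heraldry-ontology.ttl",
  "output_files": [
    {"path": "data/rdf-output/tblArmItems.ttl", "format": "turtle"},
    {"path": "data/rdf-output/tblArmItems.xml", "format": "xml"}
  ]
}
"""


def parse_config(text):
    """Parse the JSON config and check the keys this sketch relies on."""
    config = json.loads(text)
    for key in ("csv_input_path", "output_files"):
        if key not in config:
            raise KeyError(f"missing config key: {key}")
    if not config["output_files"]:
        raise ValueError("output_files must list at least one output")
    return config


def preferred_output(config):
    """Return the first output object, which later pipeline steps would use."""
    return config["output_files"][0]


config = parse_config(EXAMPLE_CONFIG)
# JSON null becomes None: no classes or properties are defined in advance.
define_first = config.get("initial_ontology_definitions") is not None
print(preferred_output(config)["path"], define_first)
```

Treating `existing_ontology` and `initial_ontology_definitions` as optional (read with `config.get`) mirrors the "if set" wording above; the actual scripts may handle missing keys differently.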
#### Merge multiple RDF files into one single Knowledge Graph

Merging is done by the script `merge_rdf_files_into_kg.py`. The input files are passed as command line parameters. For more information, call `python3 merge_rdf_files_into_kg.py -h`.

The output, and whether the content of an existing graph is overwritten, is set in the configuration file `config-merge_rdf_files_into_kg.json`. This config file contains:

* `existing_ontology`: Path to the file of an existing knowledge graph. If set, this KG is loaded before any new data is added. The old data, including UUIDs, is then not overwritten when `merge_rdf_files_into_kg.py` is run.
* `output_files`: List of output files and the corresponding formats into which the results are serialized. The first output object in the list is considered the preferred one and is therefore used by the following steps in the pipeline.

#### Integrate metadata

The script `integrate_manuscript_metadata_into_kg.py` creates entities for the manuscripts in the Knowledge Graph and integrates their metadata. The script is only a preliminary version; its configuration is hard-coded in the script.

### Create ontology documentation

The content of the documentation of all classes and properties is stored as TSV files in `data/input/documentation`. To integrate the whole content of the documentation directory into an RDF file, call the script `update_documentation.py` with the RDF file as a command line parameter (in most cases, this will be `digital-heraldry-ontology.ttl`).

The HTML documentation of the ontology is created with [WIDOCO](https://github.com/dgarijo/Widoco).
To create the first version of the HTML documentation (without prior versions), call

````
java -jar widoco-1.4.16-jar-with-dependencies.jar -ontFile "data/rdf-output/digital-heraldry-ontology.ttl" -outFolder documentation -getOntologyMetadata -rewriteAll -htaccess -webVowl -includeAnnotationProperties
````

To create additional versions, call

````
java -jar widoco-1.4.16-jar-with-dependencies.jar -ontFile "data/rdf-output/digital-heraldry-ontology.ttl" -outFolder documentation -getOntologyMetadata -rewriteAll -webVowl -includeAnnotationProperties -licensius
````

To update an existing documentation while preserving prior versions, call:

- [ ] Add command to take versioning into account
- [ ] Automate upload of HTML documentation to server

## Important Dependencies

* [rdflib](https://github.com/RDFLib/rdflib)
* [Pandas](https://pandas.pydata.org/)
* [WIDOCO](https://github.com/dgarijo/Widoco)

## License

- [ ] Add License