# DHO Knowledge Graph Data Integration

Automated pipelines and mappings for integrating new data into the Digital Heraldry Knowledge Graph.

## Directory Structure
* `data/`
    * `input/` New data to be integrated into the Knowledge Graph
    * `rdf-output/` RDF files created by the transformation pipelines
* `src/`
    * `rdf-mappings` Mapping scripts that transform data into RDF
* `config/` JSON files that describe how the scripts are run. Each config file carries the name of its script both in its file name and in its content.

## Pipeline
- [ ] Visualisation of the complete pipeline with GitHub Mermaid

## Usage
- [ ] Add usage instructions when the pipeline is complete

### Current order of mappings
- [ ] Automate calling all mappings in a single pipeline
* Call `map-tblBranch.py`
* Call `map-tblArmItems.py`
* Call `merge_rdf_files_into_kg.py` with the RDF files to be merged
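
Until the pipeline is automated, the order above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the helper names `build_pipeline_commands` and `run_pipeline` are hypothetical, the scripts are assumed to be reachable from the current working directory, and the merge script is assumed to take the RDF files as positional arguments (check `python3 merge_rdf_files_into_kg.py -h` for the actual parameters).

```python
import subprocess
import sys

# Order taken from the list above; the script locations are an assumption.
MAPPING_SCRIPTS = ["map-tblBranch.py", "map-tblArmItems.py"]
MERGE_SCRIPT = "merge_rdf_files_into_kg.py"


def build_pipeline_commands(rdf_files, python=sys.executable):
    """Return the commands for one pipeline run, in execution order."""
    commands = [[python, script] for script in MAPPING_SCRIPTS]
    # Assumption: the merge script accepts the RDF files as positional arguments.
    commands.append([python, MERGE_SCRIPT, *rdf_files])
    return commands


def run_pipeline(rdf_files):
    """Run every step in order and stop at the first failing script."""
    for command in build_pipeline_commands(rdf_files):
        subprocess.run(command, check=True)
```

Building the command list separately from running it makes the order easy to inspect or print before anything is executed.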

### Namespaces
All namespaces must be defined in `dho_namespaces.py`. Each mapping script can bind all of these namespaces to its local graph by calling the function `bind_namespaces()`.

### Individual Mapping Scripts
#### Map descriptions of Coats of Arms to RDF
Uses the descriptions from the OMA table `tblBranch`. The mapping is done by the script `map-tblBranch.py`, which is configured through the file `config/config-map-tblBranch.json`. This config file contains:
* `csv_input_path`: Source file from which the coat of arms descriptions are mapped.
* `initial_ontology_definitions`: Determines whether classes and properties are defined before new data is added to the knowledge graph. Can be set to a Python file that contains a number of class and property definitions, executed by rdflib. These definitions are executed in `map-tblBranch.py` before any data is mapped from `tblBranch` (set in `csv_input_path`). If the value is `null`, no classes or properties are added in advance.
* `existing_ontology`: File link to an existing knowledge graph. If set, this KG is loaded before any new data is added. The existing data, including UUIDs, is then not overwritten when `map-tblBranch.py` is run.
* `term_mappings`: Mapping table that resolves the abbreviations for heraldic terms used in `tblBranch`.
* `add_metadata`: Boolean value. States whether the metadata defined in `dho_metadata.py` is added as ontology metadata.
* `output_files`: List of output files and the corresponding formats into which the results are serialized. The first output object in the list is treated as the preferred one and is therefore used by the following steps in the pipeline.
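
To illustrate the options above, the following sketch parses a made-up configuration and picks the preferred output. Only the top-level key names come from this README. All paths, the type of `term_mappings`, the use of `null` for `existing_ontology`, and the `path` and `format` keys inside each `output_files` entry are assumptions for illustration; the real config file may be structured differently.

```python
import json

# Hypothetical example content. Paths and the keys inside the
# "output_files" entries ("path", "format") are assumptions.
EXAMPLE_CONFIG = """
{
  "csv_input_path": "data/input/tblBranch.csv",
  "initial_ontology_definitions": null,
  "existing_ontology": null,
  "term_mappings": "data/input/term-mappings.csv",
  "add_metadata": true,
  "output_files": [
    {"path": "data/rdf-output/tblBranch.ttl", "format": "turtle"},
    {"path": "data/rdf-output/tblBranch.xml", "format": "xml"}
  ]
}
"""

# Top-level keys named in the list above.
REQUIRED_KEYS = {
    "csv_input_path",
    "initial_ontology_definitions",
    "existing_ontology",
    "term_mappings",
    "add_metadata",
    "output_files",
}


def load_config(text):
    """Parse the JSON text and check that all documented keys are present."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    return config


def preferred_output(config):
    """Return the first entry of output_files, the one later steps use."""
    return config["output_files"][0]


config = load_config(EXAMPLE_CONFIG)
print(preferred_output(config))
```

A JSON `null` arrives in Python as `None`, so a check such as `config["initial_ontology_definitions"] is None` is enough to skip the optional definition step.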

#### Map occurrences of coats of arms in manuscripts to RDF
Uses the list of occurrences of coats of arms in manuscripts from the OMA table `tblArmItems`. The mapping is done by the script `map-tblArmItems.py`, which is configured through the file `config/config-map-tblArmItems.json`. This config file contains:
* `csv_input_path`: Source file from which the occurrences of coats of arms are mapped.
* `initial_ontology_definitions`: Determines whether classes and properties are defined before new data is added to the knowledge graph. Can be set to a Python file that contains a number of class and property definitions, executed by rdflib. These definitions are executed in `map-tblArmItems.py` before any data is mapped from `tblArmItems` (set in `csv_input_path`). If the value is `null`, no classes or properties are added in advance.
* `existing_ontology`: File link to an existing knowledge graph. If set, this KG is loaded before any new data is added. The existing data, including UUIDs, is then not overwritten when `map-tblArmItems.py` is run.
* `output_files`: List of output files and the corresponding formats into which the results are serialized. The first output object in the list is treated as the preferred one and is therefore used by the following steps in the pipeline.
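
The effect of `existing_ontology` on UUIDs can be pictured with a small sketch. It does not show how the mapping scripts actually work; it assumes, purely for illustration, that each entity can be looked up by a stable source key, and the function name `assign_uuids` is hypothetical.

```python
import uuid


def assign_uuids(existing, source_keys):
    """Return a key-to-UUID mapping that keeps every UUID already in `existing`.

    `existing` stands in for the identifiers found in a previously
    generated knowledge graph; new UUIDs are minted only for unknown keys.
    """
    result = dict(existing)
    for key in source_keys:
        if key not in result:
            result[key] = str(uuid.uuid4())
    return result


# Hypothetical keys and UUID, for illustration only.
known = {"arm-item-1": "123e4567-e89b-12d3-a456-426614174000"}
updated = assign_uuids(known, ["arm-item-1", "arm-item-2"])
```

In this sketch, rerunning the function with the same existing mapping leaves `arm-item-1` untouched and only adds an identifier for `arm-item-2`.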

#### Merge multiple RDF files into one single Knowledge Graph
Merging is done by the script `merge_rdf_files_into_kg.py`. The input is given as terminal parameters. For more information, call `python3 merge_rdf_files_into_kg.py -h`. The output, and whether the content of an existing graph is overwritten, are set in the configuration file `config-merge_rdf_files_into_kg.json`. This config file contains:
* `existing_ontology`: File link to an existing knowledge graph. If set, this KG is loaded before any new data is added. The existing data, including UUIDs, is then not overwritten when `merge_rdf_files_into_kg.py` is run.
* `output_files`: List of output files and the corresponding formats into which the results are serialized. The first output object in the list is treated as the preferred one and is therefore used by the following steps in the pipeline.

#### Integrate metadata
The script `integrate_manuscript_metadata_into_kg.py` creates entities for the manuscripts in the Knowledge Graph and integrates their metadata. The script is only a preliminary version; its configuration is hard-coded.

### Create ontology documentation
The documentation of all classes and properties is stored as TSV files in `data/input/documentation`. To integrate the whole content of the documentation directory into an RDF file, call the script `update_documentation.py` with the RDF file as a command line parameter (in most cases, this will be `digital-heraldry-ontology.ttl`).

The HTML documentation of the ontology is created with [WIDOCO](https://github.com/dgarijo/Widoco). To create the first version of the HTML documentation (without prior versions), call:

`java -jar widoco-1.4.16-jar-with-dependencies.jar -ontFile "data/rdf-output/digital-heraldry-ontology.ttl" -outFolder documentation -getOntologyMetadata -rewriteAll -htaccess -webVowl -includeAnnotationProperties`

To create additional versions, call:

`java -jar widoco-1.4.16-jar-with-dependencies.jar -ontFile "data/rdf-output/digital-heraldry-ontology.ttl" -outFolder documentation -getOntologyMetadata -rewriteAll -webVowl -includeAnnotationProperties -licensius`
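
The two WIDOCO calls differ only in `-htaccess` (first version) and `-licensius` (additional versions). If the calls are scripted, that difference could be captured in a small helper like the one below. This is a sketch only: the function name `widoco_command` is hypothetical, and the jar is assumed to be in the current working directory; all flags are copied from the commands above.

```python
import shlex

WIDOCO_JAR = "widoco-1.4.16-jar-with-dependencies.jar"
ONTOLOGY_FILE = "data/rdf-output/digital-heraldry-ontology.ttl"


def widoco_command(first_version):
    """Build the WIDOCO call for the first or for an additional version."""
    command = [
        "java", "-jar", WIDOCO_JAR,
        "-ontFile", ONTOLOGY_FILE,
        "-outFolder", "documentation",
        "-getOntologyMetadata",
        "-rewriteAll",
    ]
    if first_version:
        # Only the first version is generated with -htaccess.
        command.append("-htaccess")
    command += ["-webVowl", "-includeAnnotationProperties"]
    if not first_version:
        # Additional versions are generated with -licensius instead.
        command.append("-licensius")
    return command


# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(widoco_command(first_version=True)))
```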

To update existing documentation while preserving prior versions, call:
- [ ] Add command that takes versioning into account

- [ ] Automate upload of the HTML documentation to the server

## Important Dependencies
* [rdflib](https://github.com/RDFLib/rdflib)
* [Pandas](https://pandas.pydata.org/)
* [WIDOCO](https://github.com/dgarijo/Widoco)

## License
- [ ] Add License