LAUDATIO requires metadata for the corpus, its documents and its annotations. For each component, you need to upload a TEI XML metadata file. LAUDATIO uses a TEI customization for each component, which is based on a Metamodel for Corpus Metadata. This metamodel defines which metadata need to be recorded for each version of a corpus. The TEI customization is published on Zenodo (https://zenodo.org/record/2543455#.XVui7ntCSMo).
TEI customization
The TEI customization uses the teiHeader to record the metadata. In all three customizations, the basic teiHeader structure contains fileDesc (with titleStmt, publicationStmt and sourceDesc), encodingDesc and revisionDesc.
Corpus metadata
The TEI XML file for corpus metadata provides information about the corpus title, the corpus editors, the annotators and researchers involved in processing the data (infrastructure tasks), and the project context. It further refers to all documents and lists the annotations, including their values and descriptions (guidelines), for each corpus format.
titleStmt contains the corpus title, the editors, the annotators and further responsibilities:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader type="CorpusHeader">
<fileDesc>
<titleStmt>
<title>XYZ</title>
<!-- add a title -->
<editor n="1" role="CorpusEditor"><!-- add more editors if necessary -->
<persName><!-- optional but recommended: add authority references such as ORCIDs as attributes,
add @key and @ref, e.g.: key="orcid" ref="https://orcid.org/1234-1234-1234-" -->
<forename>Jane</forename>
<surname>Doe</surname>
</persName>
<affiliation>
<orgName type="Department">Department of Linguistics</orgName>
<orgName type="Institution">XYZ</orgName><!-- e.g. university -->
</affiliation>
</editor>
<author n="1" role="Annotator"><!-- add more annotators if necessary, count in attribute @n -->
<persName>
<forename>John</forename>
<surname>Doe</surname>
</persName>
<affiliation>
<orgName type="Department">Department of History</orgName>
<orgName type="Institution">University</orgName><!-- e.g. university -->
</affiliation>
</author>
<respStmt>
<resp>Metadata</resp>
<persName><!-- add more if necessary -->
<forename>John</forename>
<surname>Doe</surname>
</persName>
<orgName type="Department">Department of History</orgName>
<orgName type="Institution">University</orgName>
<!-- e.g. university -->
</respStmt>
</titleStmt>
<!-- ... -->
</fileDesc>
</teiHeader>
</TEI>
fileDesc contains metadata that describe the number of tokens, the publication context, the corpus licence and the list of documents:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader type="CorpusHeader">
<fileDesc>
<titleStmt/>
<extent type="Tokens">123456789</extent>
<publicationStmt>
<authority>Hamburg University</authority>
<!-- e.g. your university -->
<idno>xyz</idno>
<!-- add identifiers if available -->
<availability status="free">
<licence target="http://creativecommons.org/licenses/by/4.0/"/>
<!-- e.g. http://creativecommons.org/licenses/by/4.0/ -->
<p>The corpus is published under a CC BY 4.0 licence.</p>
<!-- prose description of the licence -->
</availability>
<date type="CorpusRelease" when="2019">First complete corpus release.</date>
<!-- short description of the release type -->
</publicationStmt>
<sourceDesc>
<list type="CorpusDocument">
<!-- each document header contains an ID in <fileDesc xml:id="document1">, list the references here -->
<item corresp="document1" n="1"/>
<item corresp="document2" n="2"/>
</list>
</sourceDesc>
</fileDesc>
<!-- ... -->
</teiHeader>
</TEI>
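The link between the document list in the corpus header and the xml:id of each document header can be checked before upload. The following is a minimal sketch in Python, not part of LAUDATIO itself. It assumes that each item/@corresp carries the bare ID of a document; the function names and the sample headers are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

# Hypothetical sample data; real headers would be read from files.
CORPUS_HEADER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="CorpusHeader">
    <fileDesc>
      <titleStmt/>
      <sourceDesc>
        <list type="CorpusDocument">
          <item corresp="document1" n="1"/>
          <item corresp="document2" n="2"/>
        </list>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""

DOCUMENT_HEADERS = [
    """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="DocumentHeader"><fileDesc xml:id="document1"/></teiHeader>
</TEI>""",
    """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="DocumentHeader"><fileDesc xml:id="document3"/></teiHeader>
</TEI>""",
]


def listed_document_ids(corpus_xml):
    """Return the @corresp values of the CorpusDocument list, in order."""
    root = ET.fromstring(corpus_xml)
    items = root.findall(".//tei:list[@type='CorpusDocument']/tei:item", TEI)
    return [item.get("corresp") for item in items]


def document_id(document_xml):
    """Return the xml:id of fileDesc in a document header, or None."""
    root = ET.fromstring(document_xml)
    file_desc = root.find(".//tei:fileDesc", TEI)
    return None if file_desc is None else file_desc.get(XML_ID)


def compare_ids(corpus_xml, document_xmls):
    """Return (IDs listed without a header, header IDs that are not listed)."""
    listed = listed_document_ids(corpus_xml)
    present = [document_id(doc) for doc in document_xmls]
    missing_headers = [i for i in listed if i not in present]
    unlisted = [i for i in present if i not in listed]
    return missing_headers, unlisted


missing, unlisted = compare_ids(CORPUS_HEADER, DOCUMENT_HEADERS)
print("listed but no document header:", missing)
print("document header but not listed:", unlisted)
```

In the sample data, one listed document has no header and one header is not listed, so both result lists are non-empty.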
profileDesc contains metadata about languages in the documents of the corpus:
<profileDesc>
<langUsage>
<language ident="de" style="Language">Early New High German</language>
<language ident="de" style="LanguageArea">Southern dialects</language>
<language ident="de" style="LanguageType">Bavarian</language>
</langUsage>
</profileDesc>
encodingDesc contains annotation keys and values, including a description (guidelines) for each. Annotation keys are given in namespace and their values in tagUsage, both within tagsDecl. Use @xml:id the first time an annotation is named and @corresp for every subsequent reference to the same annotation. Each encodingDesc contains the metadata for one corpus format, since not every annotation is necessarily realized in every format of the corpus. The annotations are classified with a closed list of categories in the attribute @rend of namespace:
- Transcription
- Lexical
- Morphological
- Syntactic
- Graphical
- MarkUp
- Meta
- Other
A corpus header can therefore have more than one encodingDesc. The format is specified in appInfo. revisionDesc contains metadata about corpus revisions:
<?xml version='1.0' encoding='utf-8'?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<!-- "xyz" marks places where you can add prose text or values: replace xyz where you want to fill something in, and remove it otherwise -->
<teiHeader type="CorpusHeader">
<fileDesc/>
<profileDesc/>
<encodingDesc n="1">
<!-- each <encodingDesc> describes the annotations that are released in one format of the corpus; add more <encodingDesc> elements if the corpus has more than one format -->
<appInfo>
<application ident="EXMARaLDA" version="3.0">
<label>EXMARaLDA XML for Partitur editor.</label>
</application>
</appInfo>
<projectDesc>
<p>
<ref target="www.xyz.de"/>Data annotation was carried out in our project:
project description.</p>
</projectDesc>
<editorialDecl>
<segmentation>
<p>Annotation 'dipl' has an independent segmentation. Every other annotation is
based on the segmentation of 'dipl'.</p>
</segmentation>
<normalization>
<p>No normalization is applied.</p>
</normalization>
</editorialDecl>
<tagsDecl>
<namespace name="dipl" rend="Transcription" xml:id="d">
<tagUsage gi="STring">Diplomatic, character-based transcription.</tagUsage>
</namespace>
<namespace name="POS" rend="Lexical" xml:id="pos">
<tagUsage gi="DET">Determiner.</tagUsage>
<tagUsage gi="N">Noun.</tagUsage>
<tagUsage gi="V">Verb.</tagUsage>
<tagUsage gi="P">Punctuation.</tagUsage>
<tagUsage gi="PRON">Pronoun.</tagUsage>
</namespace>
</tagsDecl>
</encodingDesc>
<revisionDesc>
<change n="1.0" type="CorpusRelease" when="2019" who="xyz">xyz</change>
</revisionDesc>
</teiHeader>
<text/>
</TEI>
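Because the categories in @rend form a closed list, a corpus header can be checked for misspelled or unknown categories before upload. The sketch below is one possible check in Python, not an official LAUDATIO validator. The function name and the sample header, including its deliberate typo "Lexcial", are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# The closed list of annotation categories allowed in namespace/@rend.
CATEGORIES = {
    "Transcription", "Lexical", "Morphological", "Syntactic",
    "Graphical", "MarkUp", "Meta", "Other",
}

# Hypothetical sample header; "Lexcial" is a deliberate typo.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="CorpusHeader">
    <encodingDesc n="1">
      <appInfo>
        <application ident="EXMARaLDA" version="3.0"/>
      </appInfo>
      <tagsDecl>
        <namespace name="dipl" rend="Transcription" xml:id="d"/>
        <namespace name="POS" rend="Lexcial" xml:id="pos"/>
      </tagsDecl>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def invalid_categories(corpus_xml):
    """Return (format, annotation name, rend) for every namespace whose
    @rend is not one of the allowed categories."""
    root = ET.fromstring(corpus_xml)
    problems = []
    for enc in root.findall(".//tei:encodingDesc", TEI):
        # The corpus format of this encodingDesc is named in appInfo.
        app = enc.find("tei:appInfo/tei:application", TEI)
        fmt = app.get("ident") if app is not None else None
        for ns in enc.findall("tei:tagsDecl/tei:namespace", TEI):
            if ns.get("rend") not in CATEGORIES:
                problems.append((fmt, ns.get("name"), ns.get("rend")))
    return problems


for fmt, name, rend in invalid_categories(SAMPLE):
    print(f"format {fmt}: annotation {name} has unknown category {rend!r}")
```

The check runs per encodingDesc, so a header with several corpus formats reports each problem together with the format in which it occurs.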
Document metadata
The TEI XML file for document metadata provides information about the document title, the document editors and the publication history. It further refers to all annotations contained in the document. LAUDATIO requires a TEI XML file for each document in the corpus. Each document needs an XML ID, which must be specified in the attribute xml:id of fileDesc. Best practice: use the xml:id of each document as its file name.
<?xml version='1.0' encoding='utf-8'?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader style="Herbology" type="DocumentHeader">
<fileDesc xml:id="document1">
<titleStmt>
<title>The great story of the plants</title>
<author>
<persName key="gnd" ref="http...">
<forename>Max</forename>
<surname>Mustermann</surname>
</persName>
</author>
<editor>
<persName>
<forename>Sabine</forename>
<surname>Schulz</surname>
</persName>
</editor>
<respStmt>
<resp>Metadata</resp>
<persName>
<forename>Jane</forename>
<surname>Doe</surname>
</persName>
<orgName type="Department">Department of Linguistics</orgName>
<orgName type="Institution">XYZ</orgName>
</respStmt>
</titleStmt>
<extent type="Tokens">1234</extent>
<publicationStmt/>
<seriesStmt/>
<sourceDesc/>
</fileDesc>
<!-- ... -->
</teiHeader>
<text/>
</TEI>
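The file-naming best practice can be verified with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes that the file name without its extension should equal the xml:id of fileDesc.

```python
from pathlib import PurePath
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

# Hypothetical, shortened document header.
HEADER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="DocumentHeader">
    <fileDesc xml:id="document1"/>
  </teiHeader>
</TEI>"""


def name_matches_id(filename, document_xml):
    """True if the file name (without extension) equals fileDesc/@xml:id."""
    root = ET.fromstring(document_xml)
    file_desc = root.find(".//tei:fileDesc", TEI)
    if file_desc is None:
        return False
    doc_id = file_desc.get(XML_ID)
    return doc_id is not None and PurePath(filename).stem == doc_id


print(name_matches_id("document1.xml", HEADER))
print(name_matches_id("plants.xml", HEADER))
```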
publicationStmt and seriesStmt provide bibliographic metadata. If the document is based on a historical source, provide metadata for this source in sourceDesc (see below).
<?xml version='1.0' encoding='utf-8'?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<!-- "xyz" marks places where you can add prose text or values: replace xyz where you want to fill something in, and remove it otherwise -->
<teiHeader style="Herbology" type="DocumentHeader">
<fileDesc xml:id="document1">
<titleStmt/>
<publicationStmt>
<publisher>ABC Druck</publisher>
<pubPlace>Berlin</pubPlace>
<idno></idno>
<date when="1675">1675</date>
<biblScope>pp. 1-80</biblScope>
</publicationStmt>
<seriesStmt>
<title>Zeitschrift für Pflanzen</title>
<editor></editor>
<biblScope unit="vol">2</biblScope>
</seriesStmt>
<sourceDesc/>
</fileDesc>
<!-- ... -->
</teiHeader>
<text/>
</TEI>
sourceDesc provides metadata of the historical original, if applicable and known.
<?xml version='1.0' encoding='utf-8'?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader style="Herbology" type="DocumentHeader">
<fileDesc xml:id="document1">
<titleStmt/>
<publicationStmt/>
<seriesStmt/>
<sourceDesc n="1">
<msDesc>
<msIdentifier>
<msName>1234d</msName>
<altIdentifier>
<repository>Staatsbibliothek</repository>
<collection></collection>
<idno>ST1</idno>
</altIdentifier>
</msIdentifier>
<history>
<origin>
<objectType>Manuskript</objectType>
<origDate notAfter-custom="1550" notBefore-custom="1500" precision="high">1500-1550</origDate>
<origPlace>Tübingen</origPlace>
<title>Pflanzenbuch</title>
<locus/>
</origin>
</history>
</msDesc>
<recordHist>
<source facs="">
<ref target="http://reader.html">Staatsbibliothek ABC</ref>
</source>
</recordHist>
</sourceDesc>
</fileDesc>
<!-- ... -->
</teiHeader>
<text/>
</TEI>
profileDesc contains metadata about languages in the document:
<profileDesc>
<langUsage>
<language ident="de" style="Language">Early New High German</language>
<language ident="de" style="LanguageArea">Southern dialects</language>
<language ident="de" style="LanguageType">Bavarian</language>
</langUsage>
</profileDesc>
encodingDesc lists the annotations that are contained in the document. Documents of the same corpus may contain different annotations.
<?xml version='1.0' encoding='utf-8'?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader style="Herbology" type="DocumentHeader">
<fileDesc xml:id="document1"/>
<!-- ... -->
<encodingDesc>
<schemaSpec ident="AnnotationKey">
<elementSpec ident="Transcription">
<valList>
<valItem corresp="dipl" ident="dipl"/>
</valList>
</elementSpec>
<elementSpec ident="Lexical">
<valList>
<valItem corresp="POS" ident="POS"/>
</valList>
</elementSpec>
<elementSpec ident="Syntactic">
<valList>
<valItem corresp="s" ident="s"/>
</valList>
</elementSpec>
<elementSpec ident="Meta">
<valList>
<valItem corresp="date" ident="date"/>
</valList>
</elementSpec>
</schemaSpec>
</encodingDesc>
</teiHeader>
<text/>
</TEI>
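Since documents may differ in the annotations they contain, it can help to read the annotation keys of each document header into a simple overview. The following Python sketch is illustrative only. It assumes the schemaSpec/elementSpec/valItem layout of the example above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Shortened document header following the example layout.
DOCUMENT_HEADER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="DocumentHeader">
    <fileDesc xml:id="document1"/>
    <encodingDesc>
      <schemaSpec ident="AnnotationKey">
        <elementSpec ident="Transcription">
          <valList><valItem corresp="dipl" ident="dipl"/></valList>
        </elementSpec>
        <elementSpec ident="Lexical">
          <valList><valItem corresp="POS" ident="POS"/></valList>
        </elementSpec>
        <elementSpec ident="Syntactic">
          <valList><valItem corresp="s" ident="s"/></valList>
        </elementSpec>
        <elementSpec ident="Meta">
          <valList><valItem corresp="date" ident="date"/></valList>
        </elementSpec>
      </schemaSpec>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def annotations_by_category(document_xml):
    """Map each category (elementSpec/@ident) to its annotation keys."""
    root = ET.fromstring(document_xml)
    path = ".//tei:schemaSpec[@ident='AnnotationKey']/tei:elementSpec"
    result = {}
    for spec in root.findall(path, TEI):
        keys = [item.get("ident") for item in spec.findall(".//tei:valItem", TEI)]
        result[spec.get("ident")] = keys
    return result


for category, keys in annotations_by_category(DOCUMENT_HEADER).items():
    print(category, keys)
```

Running the function on every document header and comparing the results shows at a glance which annotations are missing from which documents.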
Annotation metadata
The TEI XML file for annotation metadata provides information about the annotation title, the editors, the annotators and the preparation steps. LAUDATIO requires a TEI XML file for each annotation in the corpus. Each annotation needs an XML ID, which must be specified in the attribute xml:id in the corpus header. This ID is referenced in the attribute @corresp of title (e.g. corresp="POS"). Best practice: name each annotation file after the xml:id that the corpus header provides for that annotation.
titleStmt contains the annotation title, editors and annotators:
<?xml version='1.0' encoding='utf-8'?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader type="PreparationHeader">
<fileDesc>
<titleStmt>
<title corresp="POS" type="AnnotationKey">POS</title>
<editor n="1" role="CorpusEditor"><!-- add more editors if necessary -->
<persName><!-- optional but recommended: add authority references such as ORCIDs as attributes,
add @key and @ref, e.g.: key="orcid" ref="https://orcid.org/1234-1234-1234-" -->
<forename>Jane</forename>
<surname>Doe</surname>
</persName>
<affiliation>
<orgName type="Department">Department of Linguistics</orgName>
<orgName type="Institution">XYZ</orgName><!-- e.g. university -->
</affiliation>
</editor>
<author n="1" role="Annotator"><!-- add more annotators if necessary, count in attribute @n -->
<persName>
<forename>John</forename>
<surname>Doe</surname>
</persName>
<affiliation>
<orgName type="Department">Department of History</orgName>
<orgName type="Institution">University</orgName><!-- e.g. university -->
</affiliation>
</author>
<respStmt>
<resp>Metadata</resp>
<persName><!-- add more if necessary -->
<forename>John</forename>
<surname>Doe</surname>
</persName>
<orgName type="Department">Department of History</orgName>
<orgName type="Institution">University</orgName>
<!-- e.g. university -->
</respStmt>
</titleStmt>
<publicationStmt/>
<sourceDesc/>
</fileDesc>
<encodingDesc/>
<!-- ... -->
</teiHeader>
<text/>
</TEI>
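Because LAUDATIO requires one annotation header per annotation, it is useful to confirm that none is missing. The Python sketch below reads title/@corresp from each annotation header and compares it with a list of expected annotation keys. The function names and sample data are hypothetical, and the caller supplies the expected keys, so the sketch makes no assumption about which corpus-header attribute they are taken from.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, shortened annotation header for the annotation "POS".
POS_HEADER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="PreparationHeader">
    <fileDesc>
      <titleStmt>
        <title corresp="POS" type="AnnotationKey">POS</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def annotation_key(annotation_xml):
    """Return title/@corresp of an annotation header, or None."""
    root = ET.fromstring(annotation_xml)
    title = root.find(".//tei:titleStmt/tei:title[@type='AnnotationKey']", TEI)
    return None if title is None else title.get("corresp")


def annotations_without_header(expected_keys, annotation_xmls):
    """Return the expected keys for which no annotation header was found."""
    found = {annotation_key(xml) for xml in annotation_xmls}
    return [key for key in expected_keys if key not in found]


print(annotations_without_header(["dipl", "POS"], [POS_HEADER]))
```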