TEI class att.lexicographic.normalized


att.lexicographic.normalized provides attributes for usage within word-level elements in the analysis module and within lexicographic microstructure in the dictionaries module.

Module

analysis — Simple Analytic Mechanisms

Members

att.lexicographic [case colloc def entryFree etym form gen gram gramGrp hom hyph iType lang lbl mood number oRef orth pRef per pos pron re sense subc syll tns usg xr] att.linguistic [pc w]

Attributes

norm

(normalized) provides the normalized/standardized form of information present in the source text in a non-normalized form.

Status	Optional
Datatype	teidata.text
Normalization of part-of-speech information within a dictionary entry.

  <gramGrp>
   <pos norm="noun">n</pos>
  </gramGrp>

Normalization of a source form in a tokenized historical corpus.

  <s>
   <w>for</w>
   <w norm="virtue's">vertues</w>
   <w>sake</w>
  </s>

  <s>
   <w norm="persuasion">perswasion</w>
   <w>of</w>
   <w norm="Unity">Vnitie</w>
  </s>

Example of normalization from Aviso. Relation oder Zeitung. Wolfenbüttel, 1609. In: Deutsches Textarchiv.

  <s>
   <w norm="freiwillig">freywillig</w>
   <pc norm="," join="left">/</pc>
   <w norm="unbedrängt">vnbedraͤngt</w>
   <w norm="und">vnd</w>
   <w norm="unverhindert">vnuerhindert</w>
  </s>

  <w norm="Teil">Theyll</w>

  <w norm="Freude">Frewde</w>
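The norm examples suggest a simple reading rule: take the norm value where present, otherwise the transcribed content. The Python sketch below applies that rule to the Aviso sentence using the standard library's xml.etree.ElementTree. It is an illustration under stated assumptions, not a TEI-provided processor: the helper names local, token_text and read are hypothetical; the sample omits the TEI namespace (tags are compared by local name, so namespaced input also works); and spacing follows the join attribute, with "left" or "both" suppressing the space before a token and "right" or "both" suppressing the space after it.

```python
import xml.etree.ElementTree as ET

# The Aviso sentence from the example above (no TEI namespace, for brevity).
SAMPLE = """<s>
  <w norm="freiwillig">freywillig</w>
  <pc norm="," join="left">/</pc>
  <w norm="unbedrängt">vnbedraͤngt</w>
  <w norm="und">vnd</w>
  <w norm="unverhindert">vnuerhindert</w>
</s>"""


def local(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def token_text(el, layer):
    """Text of one token: norm (falling back to content) or the raw content."""
    content = "".join(el.itertext())
    return el.get("norm", content) if layer == "norm" else content


def read(sentence, layer="norm"):
    """Join the <w>/<pc> children of a sentence, using join to decide spacing."""
    parts = []
    glue_next = False  # True if the previous token had join="right" or "both"
    for el in sentence:
        if local(el.tag) not in ("w", "pc"):
            continue
        join = el.get("join", "")
        if parts and not glue_next and join not in ("left", "both"):
            parts.append(" ")
        parts.append(token_text(el, layer))
        glue_next = join in ("right", "both")
    return "".join(parts)


s = ET.fromstring(SAMPLE)
print(read(s, "norm"))     # freiwillig, unbedrängt und unverhindert
print(read(s, "content"))  # freywillig/ vnbedraͤngt vnd vnuerhindert
```

Falling back to the content when norm is absent reflects the examples above, where tokens such as <w>of</w> are left unannotated because they need no normalization.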

orig

(original) gives the original string or is the empty string when the element does not appear in the source text.

Status	Optional
Datatype	teidata.text
Example from a language documentation project on the Mixtepec-Mixtec language (ISO 639-3: 'mix'). Here a speaker has spelled a word incorrectly, but the spelling should be preserved for a number of reasons, so the use of orig is essential: it lets speakers see their past mistakes and gives researchers insight into how untrained speakers write their language instinctively (in contrast to prescribed conventions), among other uses:

  <w orig="ntsa sia'i">ntsasia'i</w>
Example from the EarlyPrint project. Fragment of text where obvious errors have been corrected but the original forms remain recorded:

  <w lemma="he" pos="pns" xml:id="b1afj-003-a-0950">he</w>
  <w lemma="have" pos="vvz" xml:id="b1afj-003-a-0960">hath</w>
  <w lemma="bring" pos="vvn" xml:id="b1afj-003-a-0970">brought</w>
  <w lemma="forth" pos="av" xml:id="b1afj-003-a-0980" orig="sorth">forth</w>

An example from the EarlyPrint project showing the use of both norm and orig. The orig attribute preserves the original version (sometimes with spelling errors, often with printer abbreviations), the element content resolves printer abbreviations but retains the original orthography, and the norm attribute holds normalized values:

  <w lemma="commandment" pos="n1" norm="commandment" xml:id="b9avr-018-a-7720" orig="commandemēt">commandement</w>
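The EarlyPrint examples describe up to three layers per token: the original string (orig), the element content, and the normalized form (norm). The Python sketch below recovers all three with xml.etree.ElementTree. The sentence is made up from the tokens above, and the <w> for "the" is hypothetical, added only to show the empty-string case of orig. Two further assumptions are labeled in the code: a missing orig is read as "the content is already the original", and a missing norm as "the content is already normalized". The helper name layers is hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up sentence assembled from the EarlyPrint tokens above.
# The <w> for "the" is hypothetical: orig="" marks a token absent from the source.
SAMPLE = """<s>
  <w lemma="bring" pos="vvn">brought</w>
  <w lemma="forth" pos="av" orig="sorth">forth</w>
  <w lemma="the" orig="">the</w>
  <w lemma="commandment" pos="n1" norm="commandment" orig="commandemēt">commandement</w>
</s>"""


def layers(w):
    """Return (original, content, normalized) for one <w> token.

    Assumptions: a missing orig means the content is already the original
    form; orig="" means the token is not in the source (reported as None);
    a missing norm means the content is already normalized.
    """
    content = "".join(w.itertext())
    orig = w.get("orig")
    if orig is None:
        original = content
    elif orig == "":
        original = None
    else:
        original = orig
    return original, content, w.get("norm", content)


s = ET.fromstring(SAMPLE)
# Compare local names so that TEI-namespaced input would also match.
rows = [layers(w) for w in s.iter() if w.tag.rsplit("}", 1)[-1] == "w"]
source_text = " ".join(o for o, _, _ in rows if o is not None)
normalized_text = " ".join(n for _, _, n in rows)
print(source_text)      # brought sorth commandemēt
print(normalized_text)  # brought forth the commandment
```

Returning None rather than an empty string for orig="" keeps "not in the source" distinct from "present but empty", so the reconstructed source text skips the token instead of emitting a stray space.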

Note

The two attributes in this class are intended strictly for lexicographic and linguistic use, not for editorial interventions. Editorial interventions should instead be encoded with the mechanism based on the choice, orig, and reg elements.
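For contrast, the Python sketch below shows what the editorial mechanism looks like when a project decides that a particular normalization is really an editorial intervention: the transcribed content moves into an orig element, the norm value into a reg element, both wrapped in choice inside the w token. The helper to_editorial is hypothetical, and whether choice may appear inside w depends on the schema a project actually uses, so check it against your own customization before relying on this.

```python
import xml.etree.ElementTree as ET


def to_editorial(w):
    """Re-encode <w norm="X">Y</w> as <w><choice><orig>Y</orig><reg>X</reg></choice></w>.

    Hypothetical helper. Modifies the element in place and returns it;
    tokens without a norm attribute are returned unchanged.
    """
    norm = w.get("norm")
    if norm is None:
        return w
    original = "".join(w.itertext())  # transcribed content, including any child text
    for child in list(w):
        w.remove(child)
    w.text = None
    del w.attrib["norm"]
    choice = ET.SubElement(w, "choice")
    ET.SubElement(choice, "orig").text = original
    ET.SubElement(choice, "reg").text = norm
    return w


token = ET.fromstring('<w norm="Unity">Vnitie</w>')
print(ET.tostring(to_editorial(token), encoding="unicode"))
# <w><choice><orig>Vnitie</orig><reg>Unity</reg></choice></w>
```

The sketch only changes the encoding, not the decision behind it: whether a given normalization counts as an editorial intervention remains a project-level judgment.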

TEI: Guidelines for Electronic Text Encoding and Interchange
