Revised 1 March 2001
Under current instructions, the only place you need to use the LANG attribute is on the <FOREIGN> tag, which we use chiefly to mark the location of text in a non-Roman alphabet (e.g. Greek or Hebrew); in that case its only content is a <GAP> tag.
[Many other tags, e.g. <Q>, have the LANG attribute but we have not been consistently using it except for this one purpose.]
[<FOREIGN> can of course be used to tag any text in a language other than the predominant language of the document, but we have not generally been marking such text unless it is in a non-Roman alphabet. It does no harm to tag (say) Spanish or Latin quotations as <FOREIGN>, but since we make no use of the information at this point, it is more efficient not to record it.]
Every LANG value must have a corresponding ID declared in the same document. Our way of handling this is to insert a <LANGUSAGE> tag into the <PROFILEDESC> of the <TEIHEADER>. Insert the tag near the beginning of the <PROFILEDESC>, which itself should appear at the end of the <TEIHEADER>. See the moa-tei.dtd or the green books (or the online TEI guidelines) for the exact placement rules.
The <LANGUSAGE> element contains one or more <LANGUAGE> tags. The content of the <LANGUAGE> tag is the name of the language; the ID attribute of the <LANGUAGE> tag is an abbreviation for the language. Use the same abbreviation for the ID attribute here as you used for the LANG attribute of the <FOREIGN> tag.
The actual language abbreviations that you use should be the three-letter codes listed in the table of Library of Congress MARC codes for languages, found at:
http://lcweb.loc.gov/marc/languages/
Say that in one of your books you find a line in Greek and another in Tibetan. Since we do not attempt to capture Greek (or Tibetan) characters, simply record the presence of the foreign text with <FOREIGN> tags around a <GAP> tag: <FOREIGN LANG="grc"><GAP></FOREIGN> and <FOREIGN LANG="tib"><GAP></FOREIGN>.
("grc" is the MARC code for ancient, as opposed to modern, Greek: use the most general code that fits. Don't try to distinguish between periods of a language or between different dialects of a language unless the code list requires you to do so. In this case there is no general "Greek" code, so we have to decide whether it's ancient or modern. Being a quotation from Plutarch, it's ancient.)
Since you've now inserted two attributes in the document that take a declared value of "IDREFS", the document won't parse unless there are corresponding IDs in the same document.
Insert this in the header, at the beginning of the <PROFILEDESC>:
<PROFILEDESC>
<LANGUSAGE>
<LANGUAGE ID="grc">Ancient Greek</LANGUAGE>
<LANGUAGE ID="tib">Tibetan</LANGUAGE>
</LANGUSAGE>
...
</PROFILEDESC>
with a <LANGUAGE> tag for each language used and tagged within that document. The document will now validate.
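Before running the parser, it can be handy to check that every LANG value has its <LANGUAGE> declaration. What follows is only a rough sketch in Python, not part of our toolchain: the function name undeclared_languages is made up for illustration, and it assumes that tag names are written as in the examples above, that attribute values are in double quotes, and that LANG and ID values are spelled identically (same case). It uses regular expressions rather than a real SGML parser, so treat its answer as a hint and let the parser have the last word.

```python
import re

# LANG value on <FOREIGN> start tags, e.g. <FOREIGN LANG="grc">.
FOREIGN_LANG = re.compile(r'<FOREIGN\b[^>]*\bLANG\s*=\s*"([^"]+)"', re.IGNORECASE)
# ID value on <LANGUAGE> start tags, e.g. <LANGUAGE ID="grc">.
# (<LANGUSAGE> is spelled differently, so it is not matched here.)
LANGUAGE_ID = re.compile(r'<LANGUAGE\b[^>]*\bID\s*=\s*"([^"]+)"', re.IGNORECASE)


def undeclared_languages(sgml_text):
    """Return, sorted, the LANG codes used on <FOREIGN> that have no <LANGUAGE ID>."""
    used = set(FOREIGN_LANG.findall(sgml_text))
    declared = set(LANGUAGE_ID.findall(sgml_text))
    return sorted(used - declared)


if __name__ == "__main__":
    # Hypothetical document fragment: Greek is declared, Tibetan is not.
    sample = (
        '<PROFILEDESC><LANGUSAGE>'
        '<LANGUAGE ID="grc">Ancient Greek</LANGUAGE>'
        '</LANGUSAGE></PROFILEDESC>'
        '<FOREIGN LANG="grc"><GAP></FOREIGN>'
        '<FOREIGN LANG="tib"><GAP></FOREIGN>'
    )
    print(undeclared_languages(sample))  # prints ['tib']
```

An empty list means every code used on a <FOREIGN> tag has a matching <LANGUAGE> tag in the header; any codes listed still need one.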