Encoding and Presenting Interlinear Text Using XML Technologies


Paper at ALTW2003 (December 2003, Melbourne)



  1. Encoding and Presenting Interlinear Text Using XML Technologies
     Baden Hughes, Steven Bird, Catherine Bow
     University of Melbourne
     Australasian Language Technology Workshop, December 10, 2003
  2. Introduction / Outline
     • What is interlinear text?
     • EMELD Interlinear Text Model
     • XML Representation
     • Interlinear Text Styles
     • XSL Rendering
     • Prototype & Implementation
     • Future Research
  3. What is interlinear text?
     • A standard presentational form for displaying a source text aligned with a variety of linguistic annotations
       • may include phonological, morphological, syntactic analyses, glosses, translations, comments
     • Variations in structure, alignment, display styles, mapping, wrapping, etc.
     • Typical example of three-line text: Yidinj (Dixon 1977)
  4. Interlinear text samples (contd)
     • Nivkh (Comrie 1981)
     [Figure labels: text; metadata; notes; free translation]
  5. Interlinear Text Samples
     South Efate (Namaf, 2001)

     sh v3.0 485 SE Text
     itm t Story from tape 20001bx told by Kalsarap Namaf.
     aud as 0 ae 13.0002
     x    Akit     tumaui            tae    esan   ipi,        go
     mr   akit     tu- mau           tae    esan   i - pi      go
     mg   1plincS  1plincRS- all     know   place  3sgRS - be  and
     POS  pron     pron- quantifier  vambi  n      pron - v    conj
     fg   We all know that place, and this Litrapong…
     fgb  Yumi evriwan isave ples ia. Mo Litrapong (Lisepsep) ia.
  7. XML Representation
       <interlinear-text>
         <item type="user-defined">
           Content at the text level, such as metadata,
           or an unaligned transcription of the entire text,
           or a pointer to an unaligned audio file
         </item>
         <phrases>
           Nested XML content to represent the phrasal
           constituents of the text
         </phrases>
       </interlinear-text>
     • Each level is considered an element in an XML document
  8. XML Representation – Yidinj text
       <interlinear-text>
         <item type="title">A Yidinj Story</item>
         <phrases>
           <phrase>
             <item type="number">99</item>
             <item type="gls">Where have you come from?</item>
             <words>
               <word>
                 <item type="txt">nundu</item>
                 <morphs>
                   <morph>
                     <item type="gls">you-SA</item>
                   </morph>
                 </morphs>
               </word>
               <word>
                 <item type="txt">wandam</item>
                 <morphs>
                   <morph>
                     <item type="gls">where-ABL</item>
                   </morph>
                 </morphs>
               </word>
             </words>
           </phrase>
         </phrases>
       </interlinear-text>
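As a rough illustration of how the nested phrase/word/morph encoding can be consumed, the following Python sketch parses the Yidinj sample and prints a three-line interlinear display. It is not part of the original slides: the helper names `item_text` and `render_phrase`, the two-space column gap, and joining multiple morph glosses with a hyphen are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The Yidinj sample from the slide, with straight quotes and the phrase closed.
SAMPLE = """\
<interlinear-text>
  <item type="title">A Yidinj Story</item>
  <phrases>
    <phrase>
      <item type="number">99</item>
      <item type="gls">Where have you come from?</item>
      <words>
        <word>
          <item type="txt">nundu</item>
          <morphs><morph><item type="gls">you-SA</item></morph></morphs>
        </word>
        <word>
          <item type="txt">wandam</item>
          <morphs><morph><item type="gls">where-ABL</item></morph></morphs>
        </word>
      </words>
    </phrase>
  </phrases>
</interlinear-text>
"""


def item_text(node, item_type):
    """Return the text of the direct <item> child with this type, or ''."""
    for item in node.findall("item"):
        if item.get("type") == item_type:
            return (item.text or "").strip()
    return ""


def render_phrase(phrase):
    """Render one <phrase> as [text row, gloss row, free translation]."""
    columns = []
    for word in phrase.findall("words/word"):
        text = item_text(word, "txt")
        # Assumption: several morph glosses in one word are joined with "-".
        gloss = "-".join(item_text(m, "gls") for m in word.findall("morphs/morph"))
        columns.append((text, gloss))
    # Each column is as wide as its longest cell so the rows stay aligned.
    widths = [max(len(t), len(g)) for t, g in columns]
    text_row = "  ".join(t.ljust(w) for (t, _), w in zip(columns, widths)).rstrip()
    gloss_row = "  ".join(g.ljust(w) for (_, g), w in zip(columns, widths)).rstrip()
    return [text_row, gloss_row, item_text(phrase, "gls")]


root = ET.fromstring(SAMPLE)
lines = render_phrase(root.find("phrases/phrase"))
print("\n".join(lines))
```

Because the display is derived from the structure rather than stored in it, the same data could equally be rendered with the free translation first or with the gloss row omitted.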
  9. Interlinear Text Styles
     • Row display
     • Row styles
     • Row ordering
     • Grouping of content

     (1) Tundra Nenets (Paakkan, 1997)
         TEXT  nyewøxi       nyenæcyøje           q            syo            q
         MNG   (noun+Ø+vbs)  (noun+n/j+acpl+cbs)  (suf+genpl)  (noun+jo+gbs)  (suf+nompl)
         BASE  nyewøxi       nyenæcyøh                         syo
         MITA  Traditional folk songs.

     (2) Tundra Nenets (Susoi 1990)
         Nyewºxiº            nyenecyøyeq        syoq.
         ancient.ABS.NOM.SG  person.ABS.GEN.PL  song.ABS.NOM.PL
         Traditional folk songs.
  10. Line-wrapping
      • Key presentation challenge of interlinear text
      • Complexity due to relative length of analysis & source
      • Implications for rendering technology

      Nyewºxiº            nyenecyøyeq        syoq
      ancient.ABS.NOM.SG  person.ABS.GEN.PL  song.ABS.NOM.PL
      Traditional folk songs

      [Figure labels: hypothetical line length; hypothetical line wrapping; correct line wrapping]
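The wrapping problem can be made concrete with a small Python sketch. It assumes one simple strategy, greedy packing of whole word/gloss columns, so that a word is never separated from its annotation when a line breaks. The slides do not say which algorithm the prototype uses; the function name `wrap_interlinear`, the `width` and `gap` parameters, and the use of `len()` as display width are assumptions for illustration.

```python
def wrap_interlinear(words, glosses, width, gap=2):
    """Greedily pack aligned (word, gloss) columns into blocks of at most `width`.

    Returns a list of (text_row, gloss_row) pairs, one pair per wrapped block.
    A single column wider than `width` is placed in a block of its own.
    Assumption: len() approximates display width (no combining characters).
    """
    if len(words) != len(glosses):
        raise ValueError("words and glosses must be aligned one-to-one")

    blocks, current, used = [], [], 0
    for word, gloss in zip(words, glosses):
        col = max(len(word), len(gloss))  # column is as wide as its longest cell
        needed = col if not current else used + gap + col
        if current and needed > width:
            # The column does not fit: close this block and start a new one.
            blocks.append(current)
            current, needed = [], col
        current.append((word, gloss, col))
        used = needed
    if current:
        blocks.append(current)

    sep = " " * gap
    return [
        (
            sep.join(w.ljust(c) for w, _, c in block).rstrip(),
            sep.join(g.ljust(c) for _, g, c in block).rstrip(),
        )
        for block in blocks
    ]


# The Tundra Nenets example from the slide, wrapped at 40 characters.
words = ["Nyewºxiº", "nyenecyøyeq", "syoq"]
glosses = ["ancient.ABS.NOM.SG", "person.ABS.GEN.PL", "song.ABS.NOM.PL"]
blocks = wrap_interlinear(words, glosses, width=40)
for text_row, gloss_row in blocks:
    print(text_row)
    print(gloss_row)
    print()
```

Wrapping each row independently at the line length would leave glosses stranded under the wrong words; treating the column as the unit of wrapping keeps the alignment intact across line breaks.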
  11. XSL Rendering
      • Transforms XML docs into other formats
      • Generate a variety of useful formats for
        • human consumption (e.g. HTML, PDF, JPG)
        • machine consumption (e.g. to another XML format)
      • Two stages:
        • Convert XML to format specifying grouping, row ordering, styles
        • Convert XML into formatting instructions of another language
          • Conversion to XSL Formatting Objects (XSL-FO)
          • Rendering into delivery format
  12. XSL Formatting Objects
      • XSL-FO is an XML application that describes how pages will look when presented to a reader
      [Diagram: XML + XSL → XSL-FO → OUTPUT. Labels: abstract representation; stylesheet transformation; abstract presentational format; rendered version: XML, PDF, JPG, etc.]
  13. XSL Implementation
      [Diagram labels: XML UR (abstract representation); XSL 1, XSL 2, XSL 3; XML SR (surface representation); XSL FO; XML FO; XSL PUB; delivery: PDF, HTML, RTF, SVG, JPEG; rendered in XML]
  14. XSL Example - phrase
        <xsl:template match="phrase">
          <phrase>
            <xsl:apply-templates select="words"/>
            <xsl:apply-templates select="item"/>
          </phrase>
        </xsl:template>
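The effect of this template is row reordering: the `words` children of a phrase are emitted before its `item` children, whatever order they had in the input. A loose Python analogue is sketched below. It copies the children verbatim, which assumes the remaining templates pass their input through unchanged; the function name `reorder_phrase` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def reorder_phrase(phrase):
    """Return a new <phrase> whose <words> children precede its <item> children."""
    out = ET.Element("phrase")
    # Deep copies keep the input tree untouched.
    out.extend(copy.deepcopy(child) for child in phrase.findall("words"))
    out.extend(copy.deepcopy(child) for child in phrase.findall("item"))
    return out


phrase = ET.fromstring(
    '<phrase>'
    '<item type="gls">Where have you come from?</item>'
    '<words><word><item type="txt">nundu</item></word></words>'
    '</phrase>'
)
reordered = reorder_phrase(phrase)
print(ET.tostring(reordered, encoding="unicode"))
```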
  15. XSL Example - document
        <xsl:template match="document">
          <document>
            <interlinear-text>
              <phrases>
                <xsl:for-each
                    select="interlinear-text/phrases/phrase/words/word">
                  <xsl:sort select="."/>
                  <phrase>
                    <words>
                      <xsl:copy-of select="."/>
                    </words>
                  </phrase>
                </xsl:for-each>
              </phrases>
            </interlinear-text>
          </document>
        </xsl:template>
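This template turns a text into a sorted wordlist: every word is pulled out of its phrase, sorted by its string value, and wrapped in a one-word phrase of its own. The Python sketch below mirrors that logic with the standard library. It is an approximation rather than an equivalent: Python sorts by code point, whereas the ordering used by `xsl:sort` depends on the processor and language settings. The function name `build_wordlist` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def build_wordlist(document):
    """Build a <document> holding each <word> in its own phrase, sorted by text."""
    out_doc = ET.Element("document")
    text = ET.SubElement(out_doc, "interlinear-text")
    phrases = ET.SubElement(text, "phrases")

    words = document.findall("interlinear-text/phrases/phrase/words/word")
    # Sort key: concatenated text of the word element, like the XPath string value.
    for word in sorted(words, key=lambda w: "".join(w.itertext())):
        phrase = ET.SubElement(phrases, "phrase")
        holder = ET.SubElement(phrase, "words")
        holder.append(copy.deepcopy(word))
    return out_doc


source = ET.fromstring(
    '<document><interlinear-text><phrases><phrase><words>'
    '<word><item type="txt">wandam</item></word>'
    '<word><item type="txt">nundu</item></word>'
    '</words></phrase></phrases></interlinear-text></document>'
)
wordlist = build_wordlist(source)
sorted_texts = [
    w.findtext("item")
    for w in wordlist.findall("interlinear-text/phrases/phrase/words/word")
]
print(sorted_texts)
```

Since the output keeps the same element structure as the input, the wordlist can be passed through the same rendering stages as an ordinary interlinear text.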
  16. Example: Nenets interlinear (Susoi)
  17. Example: Nenets (Susoi) structure
  18. Example: Nenets (Susoi) wordlist
  19. Prototype
      • Underlying Data
      • Surface Display
      • Variant Display
        • Simple display types
          • Free translation as separate block
          • or separate frame for synchronised scrolling and linking
        • Complex display types
          • Metastructural display
          • Row re-ordering
          • Optional row display
          • Wordlist linkage
          • Concordance linkage
  20. Implementation
      • User Interface
        • Select input text, display types, output format
      • Parameterisation Logic
        • Processed by script to determine display type and result type
      • Rendering Engine
        • Combines source and option parameters to generate appropriate output type for browser to display
  22. Future Research
      • Architectural Extensions
        • Linguistic ontologies
        • Text mining and retrieval
        • Compatibility with other schemata
      • API for interlinear text manipulation
      • Embedding interlinear functionality in application instances
        • e.g. AGTK
  23. Conclusion
      • Interlinear text as a pervasive data type in linguistics
        • Various tools available to create and edit
        • Outputs tied to particular implementations
      • Need for open extensible model
        • Allows reuse of interlinear text in different output formats
        • XML-based structural encoding allows for manipulation and querying