3. XML Document Structure
• An XML document consists of a number of discrete components
• Not all the sections of an XML document may be necessary,
– But their inclusion helps to make for a well-structured XML document
• A well-structured XML document can
– Easily be transported between systems and devices
4. Major portions of an XML document
• The major portions of an XML document include the following:
– The XML declaration
– The Document Type Declaration (DOCTYPE)
– The element data
– The attribute data
– The character data or XML content
5. XML Declaration
• The XML declaration is a definite way of stating exactly
– What the document contains: the XML version, the character encoding, and whether the document is standalone.
• An XML document can optionally have an XML declaration
– If present, it must be the first statement of the XML document
• The XML declaration has the form of a processing instruction:
<?xml ...?>
6. Components of XML Declaration
Component            Meaning
<?xml                Begins the declaration
version="xxx"        Specifies the version of XML being used (required)
encoding="xxx"       Indicates the character encoding that the document uses.
                     If omitted, parsers assume UTF-8 (or UTF-16 when a byte order mark is present)
standalone="xxx"     States whether the document depends on external markup
                     declarations ("yes" or "no")
Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
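The declaration components can be read back after parsing. This is a minimal sketch using Python's standard `xml.dom.minidom`; the one-element `<shirt/>` document and the helper name `declaration_must_come_first` are assumptions made for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical one-element document; only the declaration matters here.
DOC = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><shirt/>'

doc = minidom.parseString(DOC)
# The parser exposes the three declaration components on the document node.
print(doc.version, doc.encoding, doc.standalone)


def declaration_must_come_first() -> bool:
    """Return True if a declaration preceded by whitespace is rejected."""
    try:
        minidom.parseString(b" " + DOC)
    except ExpatError:
        return True
    return False


print(declaration_must_come_first())
```

Even a single space before `<?xml` makes the parser reject the document, which is why the declaration has to be the very first thing in the file.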
7. Document Type Declaration (DOCTYPE)
• DOCTYPE
– Gives a name to the XML content, and
– Provides a means to guarantee the document’s validity,
• Either by including a Document Type Definition (DTD) or by specifying a link to one.
• DOCTYPE is optional in XML
• Valid XML documents must declare the document type with which they
comply
8. General Form of DOCTYPE
• General forms of the Document Type Declaration
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>
• The first form refers to
– A document that uses only an externally defined DTD subset.
• The second form
– Uses only an internal subset defined within the document.
• The last form provides
– A place for an internally defined DTD subset between the square brackets
while also making use of an external subset.
9. Example on DOCTYPE
• Example of the first form
<!DOCTYPE shirt SYSTEM "shirt.dtd">
– The root (first) element in the document will be the <shirt> element
– The DTD is saved in a file named shirt.dtd
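The three general forms can be compared by parsing one small document per form. This is a sketch with Python's standard `xml.dom.minidom`; the documents in `FORMS` and the helper `describe` are hypothetical, and the file shirt.dtd is only named, since this parser does not load it.

```python
from xml.dom import minidom

# Hypothetical documents, one per general form of the DOCTYPE.
FORMS = {
    "external": '<!DOCTYPE shirt SYSTEM "shirt.dtd"><shirt/>',
    "internal": '<!DOCTYPE shirt [ <!ELEMENT shirt EMPTY> ]><shirt/>',
    "both": '<!DOCTYPE shirt SYSTEM "shirt.dtd" [ <!ELEMENT shirt EMPTY> ]><shirt/>',
}


def describe(xml_text):
    """Return (DOCTYPE name, system identifier, root element name)."""
    doc = minidom.parseString(xml_text)
    return doc.doctype.name, doc.doctype.systemId, doc.documentElement.tagName


for label, text in FORMS.items():
    print(label, describe(text))
```

In each case the DOCTYPE name matches the root element; the system identifier is absent only in the purely internal form.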
11. Markup and Content
• XML documents are composed of markup and content.
• In general, six kinds of markup can occur in an XML document:
– elements,
– entity references,
– comments,
– processing instructions,
– marked sections, and
– Document Type Declarations.
12. Elements
• XML elements are
– Either a matched pair of XML tags or a single “self-closing” XML tag.
• For example,
– A shirt element begins with <shirt> and ends with </shirt>.
• When an element does not come as a pair,
– The element name is followed by a forward slash, as in <on_sale/>.
• Such “unmatched” elements are known as empty elements
• Elements can be nested to any depth within other elements,
– As long as each element closes before its parent does
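The element rules above can be checked with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<size>` child and the helper `is_well_formed` are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical shirt document: a matched pair, a nested child, one empty element.
root = ET.fromstring("<shirt><size>M</size><on_sale/></shirt>")
child_tags = [child.tag for child in root]
print(child_tags)

# An empty element and an open/close pair with no content are equivalent.
same = ET.tostring(ET.fromstring("<on_sale></on_sale>")) == ET.tostring(
    ET.fromstring("<on_sale/>")
)
print(same)


def is_well_formed(xml_text):
    """Return True if the text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b></b></a>"))  # properly nested
print(is_well_formed("<a><b></a></b>"))  # overlapping tags
```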
13. Attributes
• Within elements,
– Additional information can be communicated to XML processors
– That modifies the nature of the encapsulated content.
• Attributes are name/value pairs contained within the start tag
– That can specify text strings that modify the context of the element.
• Example:
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
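Attributes reach the application as name/value pairs on the element. A small sketch with Python's standard `xml.etree.ElementTree`; the price value 19.99 is a made-up placeholder for the elided content, and the `discount` attribute is hypothetical.

```python
import xml.etree.ElementTree as ET

# 19.99 is a made-up value standing in for the elided element content.
price = ET.fromstring('<price currency="USD">19.99</price>')
on_sale = ET.fromstring('<on_sale start_date="10-15-2001"/>')

print(price.get("currency"))          # attribute value by name
print(on_sale.attrib)                 # all attributes as a dictionary
print(price.get("discount", "none"))  # fallback for a missing attribute
```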
14. Entity References
• Some characters, such as < and &, have a special meaning in XML,
– So they cannot appear literally in content.
• Entity references indicate to XML-processing applications
– That a special text string follows that will be replaced with a different literal value.
• Entity references are delimited by
– An ampersand at the beginning and
– A semicolon at the end.
• Ex: Inserting a > sign in our text
<descript>The following says 8 is greater than 5</descript>
<equation>8 &gt; 5</equation>
Major Entity References    Character
&lt;                       <
&gt;                       >
&amp;                      &
&quot;                     "
&apos;                     '
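Entity references are replaced on the way in and have to be produced on the way out. A sketch using Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils`; the sample strings are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Parsing replaces the entity reference with the literal character.
equation = ET.fromstring("<equation>8 &gt; 5</equation>")
print(equation.text)

# Going the other way: escape raw text before placing it in a document.
raw = "4 < 5 & 6 > 5"
escaped = escape(raw)
print(escaped)

# Quotes are escaped only on request, through an extra mapping.
quoted = escape('say "hi"', {'"': "&quot;"})
print(quoted)

# unescape() reverses the default replacements.
round_trip = unescape(escaped)
```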
15. Comments
• Comments can be placed almost anywhere in a document outside other markup, and
– They are not considered to be part of the textual content of an XML document.
• The character sequence <!-- begins a comment and --> ends it.
• Between these two delimiters,
– Any text at all can be written, including valid XML markup.
• The only restriction is that
– Comment delimiters cannot be used; neither can the literal string --.
• Example:
<!-- The element below describes an elephant I once owned... -->
<animal>Elephant</animal>
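How a comment is treated depends on the processor. This sketch compares two parsers from Python's standard library; the `<zoo>` wrapper element and the helper `comment_is_legal` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SOURCE = (
    "<zoo><!-- The element below describes an elephant -->"
    "<animal>Elephant</animal></zoo>"
)

# ElementTree's default parser drops comments: only <animal> remains.
tags = [child.tag for child in ET.fromstring(SOURCE)]
print(tags)

# minidom keeps the comment as its own node, separate from text content.
first = minidom.parseString(SOURCE).documentElement.firstChild
is_comment = first.nodeType == first.COMMENT_NODE
comment_text = first.data
print(is_comment, repr(comment_text))


def comment_is_legal(body):
    """Return True if the body can appear between <!-- and -->."""
    try:
        ET.fromstring(f"<zoo><!--{body}--></zoo>")
    except ET.ParseError:
        return False
    return True


print(comment_is_legal(" <animal>markup is fine</animal> "))
print(comment_is_legal(" a -- b "))  # the literal string -- is rejected
```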
16. Processing Instructions (PIs)
• PIs are not a textual part of an XML document
– But provide information to applications as to how the content should be processed.
• Unlike comments, XML processors are required to pass along PIs.
• Processing instructions have the following form:
<?instruction options?>
• The instruction name is called the PI target
– It is a special identifier that the processing application is intended to understand.
• Any information following the target is optional and is passed to the application as-is
• Example: <?send-message "process complete"?>
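A parser hands the PI to the application split into its target and the remaining data. A minimal sketch with Python's standard `xml.dom.minidom`; the `<root/>` element is an assumption added so the example is a complete document.

```python
from xml.dom import minidom

# <root/> is a hypothetical element added to make a complete document.
doc = minidom.parseString('<?send-message "process complete"?><root/>')

pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
print(is_pi)
print(pi.target)  # the PI target
print(pi.data)    # everything after the target, uninterpreted
```

The parser does not interpret the data part; deciding what `"process complete"` means is left to the application that recognizes the target.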
17. Marked CDATA Sections
• Some documents contain large amounts of text full of markup characters
– That an XML processor should not interpret, but pass to the application unchanged.
• These are known as character data (or CDATA) sections.
• Within an XML document, a CDATA section instructs the parser
– To ignore all markup characters except the ]]> sequence that ends the CDATA section.
• This allows for a section of XML code to be “escaped”
– So that it doesn’t inadvertently disrupt XML processing.
• CDATA sections follow this general form:
<![CDATA[content]]>
18. Marked CDATA Sections
• All content contained in the CDATA section is
– Passed as string literals directly to the application without interpretation
• Example:
<object_code>
<![CDATA[
function master(poltice integer) {
  if poltice<=3 then {
    Mas=poltice+IntToString(FindElement("<chicken>"));
  }
}
]]>
</object_code>
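The effect of a CDATA section can be seen by parsing the same text written two ways. A sketch with Python's standard `xml.etree.ElementTree`; the shortened one-line snippet is a hypothetical reduction of the example above.

```python
import xml.etree.ElementTree as ET

# Same content twice: once inside CDATA, once with entity references.
with_cdata = ET.fromstring(
    '<object_code><![CDATA[if poltice<=3 then FindElement("<chicken>")]]></object_code>'
)
with_entities = ET.fromstring(
    '<object_code>if poltice&lt;=3 then FindElement("&lt;chicken&gt;")</object_code>'
)

print(with_cdata.text)
print(with_cdata.text == with_entities.text)

# "<chicken>" was passed through as text, not parsed as a child element.
print(len(with_cdata))
```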
19. Document Type Definitions (DTD)
• Don’t confuse the DOCTYPE with the DTD.
• A DOCTYPE and a DTD serve very different, although related, purposes.
– DOCTYPE is used to identify and name the XML content
– DTD is used to validate the structure of the markup contained within.
• A DTD describes the specific form of XML (elements, attributes, and their nesting)
– That is allowable in an XML document.
• DTDs and XML Schema are the means for defining the validity constraints
on XML documents
20. XML Content
• XML content can consist of any data, including binary data once it is encoded as text,
– As long as it doesn’t violate rules that would confuse the content with valid XML
markup.
• XML content can contain nearly any characters,
– Including any valid Unicode and international characters (most control characters are excluded).
• XML content can be as long as necessary
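One common way to carry binary data as XML content is to encode it as text first. This sketch uses Base64, which is an assumption and not something the slides prescribe; the `<blob>` element name and the sample bytes are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload containing bytes that are not legal XML characters.
payload = bytes([0, 255, 16, 128])

# Encode to plain ASCII text before placing it in the element.
elem = ET.Element("blob")
elem.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)

# The receiving side parses the document and decodes the text again.
restored = base64.b64decode(ET.fromstring(xml_text).text)
print(restored == payload)
```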
22. XML document with an internal DTD
• A DTD defines the structure and the legal elements and attributes of an XML
document.
• An application can use a DTD to verify that XML data is valid.
• If the DTD is declared inside the XML file,
– It must be wrapped inside the <!DOCTYPE> declaration.
• The Document Type Declaration (DOCTYPE) gives a name to the XML
content
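A document with an internal DTD might look like the string below. This is a sketch with Python's standard `xml.dom.minidom`; the note document and its element names are hypothetical. Note that this parser is non-validating: it reads the declarations but does not enforce them, so checking validity needs a validating parser.

```python
from xml.dom import minidom

# Hypothetical note document carrying its own DTD inside the DOCTYPE.
NOTE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Tove</to><body>Hello</body></note>"""

doc = minidom.parseString(NOTE)
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName
print(doctype_name, root_name)

# Removing <to> breaks the DTD's content model, yet a non-validating
# parser still accepts the document because it is well-formed.
invalid = NOTE.replace("<to>Tove</to>", "")
still_parses = minidom.parseString(invalid).documentElement.tagName == "note"
print(still_parses)
```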