Learning about the basic constructs of XML

A few weeks ago I wrote about relational data models and databases and some of the basic principles which I learned about in an online Stanford course. As part of the same course I recently learned about “XML”, which stands for Extensible Markup Language, a standard for data representation and exchange. Having worked in the entertainment industry for a few years now, I’ve often found myself looking at metadata in the form of XML, but it was great to get a good refresher as part of my course.

These are some of the basic constructs of XML:

  • Tagged elements (nested) – an opening and a closing tag
  • Attributes – each attribute consists of a name (unique within its element), an equals sign (=) and a quoted attribute value
  • Text – also known as “character data”, this can be as simple as “Marc Abraham” or “123456”
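To make these three constructs a bit more concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The bookstore document and its element and attribute names are made up for illustration; they don't come from the course.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<bookstore>
  <book isbn="123456" price="25">
    <title>Learning XML</title>
    <author>Marc Abraham</author>
  </book>
</bookstore>
"""

root = ET.fromstring(DOC)
book = root.find("book")

# Tagged elements: each element has a tag name and can nest child elements.
child_tags = [child.tag for child in book]

# Attributes: name/value pairs on the opening tag, exposed as a mapping.
attributes = dict(book.attrib)

# Text: the character data between an opening and a closing tag.
author_text = book.findtext("author")

print(child_tags)   # ['title', 'author']
print(attributes)   # {'isbn': '123456', 'price': '25'}
print(author_text)  # Marc Abraham
```

Note that attribute values always come back as strings; XML itself doesn't know that "25" is meant to be a number.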

The instructor, Jennifer Widom, then went on to explain the differences between the relational data model and XML:

Relational data (e.g. SQL):

  1. Structure: Tables
  2. Schema: Fixed in advance
  3. Queries: Simple, nice language
  4. Ordering: None, unordered
  5. Implementation: Native

XML:

  1. Structure: Hierarchical tree, graph
  2. Schema: Flexible, “self-describing”
  3. Queries: Less simple, more complex
  4. Ordering: Implied ordering
  5. Implementation: Add-on
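The "hierarchical" and "implied ordering" points can be sketched in a few lines of Python. The playlist document below is a hypothetical example of my own; the point is only that child elements come back in the order in which they appear in the document, whereas rows in a relational table have no inherent order unless a query asks for one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<playlist>
  <track>Intro</track>
  <track>Main theme</track>
  <track>Credits</track>
</playlist>
"""

root = ET.fromstring(DOC)

# Implied ordering: findall() returns children in document order.
titles = [track.text for track in root.findall("track")]


def depth(element):
    """Return the number of nesting levels, counting the element itself."""
    return 1 + max((depth(child) for child in element), default=0)


print(titles)       # ['Intro', 'Main theme', 'Credits']
print(depth(root))  # 2
```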

With XML, validation is a key aspect. In an oversimplified way, it comes down to taking an XML document and validating it against a set “XSD” (XML Schema Definition). This process determines whether the XML document is valid or invalid (see Fig. 1 below). During the class, Jennifer also highlighted that validation involves two files: first, a schema file which contains the XSD, and second, the actual data file.
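The overall shape of validation (a document plus a set of rules goes in, "valid" or "invalid" comes out) can be sketched in Python. A big caveat: Python's standard library doesn't include an XSD validator, so this is a hand-rolled stand-in and not real XSD validation. The REQUIRED_ATTRIBUTES table and the validate function are names I made up; a real setup would rely on a schema-aware library such as lxml instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: element name -> required attribute names.
REQUIRED_ATTRIBUTES = {"order": {"orderno", "date"}}


def validate(xml_text, required=REQUIRED_ATTRIBUTES):
    """Return a list of problems; an empty list means valid for this toy check."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A document that cannot be parsed at all cannot be valid either.
        return [f"not well-formed: {exc}"]

    problems = []
    for element in root.iter():
        missing = required.get(element.tag, set()) - set(element.attrib)
        for name in sorted(missing):
            problems.append(f"<{element.tag}> is missing attribute '{name}'")
    return problems


good = '<orders><order orderno="id1" date="4/15/96"/></orders>'
bad = '<orders><order orderno="id2"/></orders>'

print(validate(good))  # []
print(validate(bad))   # ["<order> is missing attribute 'date'"]
```

A real XSD covers far more than required attributes (types, ordering, occurrence counts), but the input and output of the process look much the same.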

I then struggled a bit when Jennifer talked about “DTDs”. I subsequently learned that “DTD” stands for ‘Document Type Definition’ and is a set of markup declarations which defines the legal building blocks of an XML document.

There are four features of an XML schema which aren’t present in DTDs:

  • Key declarations – In DTDs, document or item IDs have to be globally unique. An XML ID can be specified through an attribute value only. This means that you can’t index elements in the XML based on a parent-child relationship (see Fig. 2 below). Key declarations in XML Schema aim to overcome such limitations.
  • Type values – XML Schema has a lot of built-in data types. The most common types are string, decimal, integer, boolean, date and time. I’ve found some useful examples of ‘simple type’ and ‘complex type’ XML schema definitions (see Fig. 3 below).
  • References – References can refer to already defined keys (see my previous point about key declarations) or so-called “typed pointers”. A typed pointer must point to a specific element of the XML (e.g. a string) which in turn must conform to the specification as laid out in the pointer.
  • Occurrence constraints – In XML Schema one can specify how many times an element type is allowed to occur, by giving a minimum and a maximum number of occurrences.
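The last point in the list, limiting how often an element may occur, is easy to sketch in Python. The check_occurrences function and the Employee document are hypothetical names of my own. The defaults of 1 mirror XML Schema, where minOccurs and maxOccurs both default to 1, and None stands in for maxOccurs="unbounded".

```python
import xml.etree.ElementTree as ET


def check_occurrences(parent, child_tag, min_occurs=1, max_occurs=1):
    """Check that parent has between min_occurs and max_occurs such children.

    max_occurs=None plays the role of maxOccurs="unbounded".
    """
    count = len(parent.findall(child_tag))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


# Hypothetical sample data: one Name, two Departments, no Phone.
employee = ET.fromstring(
    "<Employee>"
    "<Name>Marc</Name>"
    "<Department>Product</Department>"
    "<Department>Design</Department>"
    "</Employee>"
)

print(check_occurrences(employee, "Name"))                 # True
print(check_occurrences(employee, "Department"))           # False, 2 > 1
print(check_occurrences(employee, "Department", 1, None))  # True
print(check_occurrences(employee, "Phone", 0, 1))          # True, optional
```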

Main learning point: In her online video on the basics of XML, Jennifer Widom provided a useful overview of XML. Even though I had looked at XML schema before, it was good to understand more about some of the foundations behind XML and XML validation.

Fig. 1 – Sample XML validator – Taken from: http://www.deitel.com/articles/xml_tutorials/20060401/XMLStructuringData/XMLStructuringData_Page4.html


Fig. 2 – Sample XML, highlighting XML ID requirement – Taken from: http://msdn.microsoft.com/en-us/library/aa302297.aspx

<?xml version="1.0"?>
<!DOCTYPE orders [
  <!ELEMENT order ANY>  
  <!ATTLIST order
    orderno ID #REQUIRED   
  >  
]>
<orders>
  <order orderno="id10952" date="4/15/96" shipAddress="Obere Str. 57"/>
  <order orderno="id10535" date="6/13/95" shipAddress="Mataderos  2312"/>
</orders>
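The "IDs must be unique" requirement from Fig. 2 can be sketched as a small Python check. This is a simplification under a few assumptions: the duplicate_ids function is my own, it ignores the DTD and takes the name of the ID attribute as a parameter, and it only looks at that one attribute, whereas a real DTD validator checks uniqueness across all ID-typed attributes in the document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The <orders> data from Fig. 2, without the DTD part.
DOC = """
<orders>
  <order orderno="id10952" date="4/15/96" shipAddress="Obere Str. 57"/>
  <order orderno="id10535" date="6/13/95" shipAddress="Mataderos  2312"/>
</orders>
"""


def duplicate_ids(root, id_attribute):
    """Return the ID values that occur more than once anywhere in the tree."""
    counts = Counter(
        element.get(id_attribute)
        for element in root.iter()
        if element.get(id_attribute) is not None
    )
    return sorted(value for value, seen in counts.items() if seen > 1)


root = ET.fromstring(DOC)
before = duplicate_ids(root, "orderno")

# Add a third order that reuses an existing ID to trigger a clash.
ET.SubElement(root, "order", orderno="id10952")
after = duplicate_ids(root, "orderno")

print(before)  # []
print(after)   # ['id10952']
```

Because the check walks the whole tree, the clash is reported no matter which parent the duplicate sits under. That is the "globally unique" behaviour which key declarations in XML Schema relax.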

 

Fig. 3 – Examples of simple type and complex type XML schema – Taken from: http://www.xmlmaster.org/en/article/d01/c05/

Simple Type Example

<xs:element name="Department" type="xs:string" />

Here, “xs:string” is a simple type that is built into XML Schema. This example defines the data type of the element called “Department” as a text string.
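To show what a built-in type means in practice, here is a rough Python sketch that converts element text into typed values. The CONVERTERS table and the convert function are hypothetical names of my own, and the mapping is only an approximation: Python's int, for instance, accepts some spellings that xs:integer would reject.

```python
from datetime import date
from decimal import Decimal


def parse_boolean(text):
    """xs:boolean allows the lexical forms true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Approximate mapping from XML Schema built-in types to Python converters.
CONVERTERS = {
    "xs:string": str,
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:boolean": parse_boolean,
    "xs:date": date.fromisoformat,
}


def convert(type_name, text):
    """Convert element text to a Python value; raises ValueError on bad input."""
    return CONVERTERS[type_name](text)


print(convert("xs:string", "Marketing"))  # Marketing
print(convert("xs:integer", "42"))        # 42
print(convert("xs:boolean", "0"))         # False
print(convert("xs:date", "2014-11-30"))   # 2014-11-30
```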

Complex Type Example

<xs:complexType name="EmployeeType">
  <xs:sequence maxOccurs="unbounded">
    <xs:element ref="Name" />
    <xs:element ref="Department" />
  </xs:sequence>
</xs:complexType>
<xs:element name="Name" type="xs:string" />
<xs:element name="Department" type="xs:string" />

In this case the type name “EmployeeType” is designated by the name attribute of the complexType element. A model group (which designates the order of occurrence of the child elements; here, xs:sequence) is specified as a child element of complexType.

New types are created by placing restrictions on or extending simple or complex types.
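Since a schema is itself an XML document, the complex type example can be read with the same tools as any other XML. The Python sketch below assumes the fragment is wrapped in an xs:schema root element that binds the xs prefix to the XML Schema namespace; that wrapper is my addition, as the fragment above doesn't show one.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as {namespace-uri}localname.
XS = "{http://www.w3.org/2001/XMLSchema}"

# The complex type example, wrapped in an assumed xs:schema root element.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="EmployeeType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element ref="Name" />
      <xs:element ref="Department" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Name" type="xs:string" />
  <xs:element name="Department" type="xs:string" />
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)

# The model group (xs:sequence) fixes the order of the child elements.
employee_type = schema.find(f"{XS}complexType")
sequence = employee_type.find(f"{XS}sequence")
child_order = [el.get("ref") for el in sequence.findall(f"{XS}element")]

# Top-level element declarations and their simple types.
declared = {el.get("name"): el.get("type") for el in schema.findall(f"{XS}element")}

print(employee_type.get("name"))  # EmployeeType
print(child_order)                # ['Name', 'Department']
print(declared)                   # {'Name': 'xs:string', 'Department': 'xs:string'}
```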

Related links for further learning:

  1. http://www.rpbourret.com/xml/XMLAndDatabases.htm
  2. http://stackoverflow.com/questions/966901/modeling-xml-vs-relational-database
  3. http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.1.0/com.ibm.db2.udb.apdv.embed.doc/doc/c0023811.htm
  4. http://www.deitel.com/articles/xml_tutorials/20060401/XMLStructuringData/XMLStructuringData_Page4.html
  5. http://en.wikipedia.org/wiki/Document_type_definition
  6. http://www.w3.org/TR/xmlschema-1/
  7. http://msdn.microsoft.com/en-us/library/aa302297.aspx
  8. http://www.w3schools.com/schema/schema_simple.asp
  9. http://www.xmlmaster.org/en/article/d01/c05/
  10. http://www.w3.org/TR/xptr-framework/
  11. http://www.xmlnews.org/docs/xml-basics.html

 

 
