Deriving DataSet from XML Schema

How XML Schema Authoring Styles Influence DataSet Derivation

QTAssistant provides an easy to use feature to derive and visualize an ADO.NET DataSet, from an XML or an XML Schema (XSD) file. The core functionality comes from the .NET Framework and is documented on MSDN. This tool is available as an ADO.NET DataSet tab page in XML and XSD editors.

Using this method to understand an XSD, or to facilitate writing SQL queries for analysis of XML files, sometimes yields unexpected results; the most common issue is missing relations between tables.

This topic demonstrates how two different XSD authoring styles, describing the same XML structure, affect the inference of a DataSet. This should help users understand the options they have to overcome what may be seen as a limitation in the .NET framework.

 

The XML Document

For illustration we are using a simplified document, derived from the W3C Purchase Order (PO) example XML Schema.

<?xmlversion="1.0"?>
<purchaseOrder xmlns="http://tempuri.org/po.xsd">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

 

PO XML Diagram
PO XML Diagram
(Click to Enlarge)

The expectation of most users is to see a relational structure having five interconnected tables. However, the ADO.NET DataSet view in the QTAssistant XML Editor shows a different layout. A major drawback, often pointed out, is that such a relational structure would not be able to store the content of more than one XML document. For instance, none of the top level entities have an unique identifier to associate an entry with an XML document.

PO DataSet with missing relations
PO DataSet with missing relations relations

Figure 1.

 

XML Schema - Version 1

The inferred XML Schema for the above XML document is shown below.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://tempuri.org/po.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
<xs:element name="purchaseOrder">
   
<xs:complexType>
     
<xs:sequence>
       
<xs:element name="shipTo">
         
<xs:complexType>
           
<xs:sequence>
             
<xs:element name="name" type="xs:string" />
             
<xs:element name="street" type="xs:string" />
             
<xs:element name="city" type="xs:string" />
             
<xs:element name="state" type="xs:string" />
             
<xs:element name="zip" type="xs:unsignedInt" />
           
</xs:sequence>
           
<xs:attribute name="country" type="xs:string" use="required" />
         
</xs:complexType>
       
</xs:element>
       
<xs:element name="billTo">
         
<xs:complexType>
           
<xs:sequence>
             
<xs:element name="name" type="xs:string" />
             
<xs:element name="street" type="xs:string" />
             
<xs:element name="city" type="xs:string" />
             
<xs:element name="state" type="xs:string" />
             
<xs:element name="zip" type="xs:unsignedInt" />
           
</xs:sequence>
           
<xs:attribute name="country" type="xs:string" use="required" />
         
</xs:complexType>
       
</xs:element>
       
<xs:element name="items">
         
<xs:complexType>
           
<xs:sequence>
             
<xs:element maxOccurs="unbounded" name="item">
               
<xs:complexType>
                 
<xs:sequence>
                   
<xs:element name="productName" type="xs:string" />
                   
<xs:element name="quantity" type="xs:unsignedByte" />
                   
<xs:element name="USPrice" type="xs:decimal" />
                   
<xs:element minOccurs="0" name="shipDate" type="xs:date" />
                   
<xs:element minOccurs="0" name="comment" type="xs:string" />
                 
</xs:sequence>
                 
<xs:attribute name="partNum" type="xs:string" use="required" />
               
</xs:complexType>
             
</xs:element>
           
</xs:sequence>
         
</xs:complexType>
       
</xs:element>
     
</xs:sequence>
   
</xs:complexType>
 
</xs:element>
</xs:schema>

PO XSD Diagram
PO XSD Diagram
(Click to Enlarge)

 

The derived relational structure is the same as Figure 1.

XML Schema - Version 2

One way to force the creation of a table for the document element (here purchaseOrder) is to make its type global.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns="http://tempuri.org/po.xsd" targetNamespace="http://tempuri.org/po.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
<xs:element name="purchaseOrder" type="purchaseOrder"/>
   
<xs:complexType name="purchaseOrder">
     
<xs:sequence>
       
<xs:element name="shipTo">
         
<xs:complexType>
           
<xs:sequence>
             
<xs:element name="name" type="xs:string" />
             
<xs:element name="street" type="xs:string" />
             
<xs:element name="city" type="xs:string" />
             
<xs:element name="state" type="xs:string" />
             
<xs:element name="zip" type="xs:unsignedInt" />
           
</xs:sequence>
           
<xs:attribute name="country" type="xs:string" use="required" />
         
</xs:complexType>
       
</xs:element>
       
<xs:element name="billTo">
         
<xs:complexType>
           
<xs:sequence>
             
<xs:element name="name" type="xs:string" />
             
<xs:element name="street" type="xs:string" />
             
<xs:element name="city" type="xs:string" />
             
<xs:element name="state" type="xs:string" />
             
<xs:element name="zip" type="xs:unsignedInt" />
           
</xs:sequence>
           
<xs:attribute name="country" type="xs:string" use="required" />
         
</xs:complexType>
       
</xs:element>
       
<xs:element name="items">
         
<xs:complexType>
           
<xs:sequence>
             
<xs:element maxOccurs="unbounded" name="item">
               
<xs:complexType>
                 
<xs:sequence>
                   
<xs:element name="productName" type="xs:string" />
                   
<xs:element name="quantity" type="xs:unsignedByte" />
                   
<xs:element name="USPrice" type="xs:decimal" />
                   
<xs:element minOccurs="0" name="shipDate" type="xs:date" />
                   
<xs:element minOccurs="0" name="comment" type="xs:string" />
                 
</xs:sequence>
                 
<xs:attribute name="partNum" type="xs:string" use="required" />
               
</xs:complexType>
             
</xs:element>
           
</xs:sequence>
         
</xs:complexType>
       
</xs:element>
     
</xs:sequence>
   
</xs:complexType>
</xs:schema>

For the inferred DataSet, the relations are now displayed as expected.

 

PO DataSet with relations
PO DataSet with relations

 

Conclusion

This paper demonstrated the impact the XML Schema authoring styles have on deriving a ADO.NET DataSet by showing how changing the type of the root element from anonymous to global creates new tables and relationships in the DataSet.

References

Inferring Relationships (ADO.NET)  

Summary of the DataSet Schema Inference Process