Combining XSDs

The XML Schema specification (XSD) supports schema composition through the include, import and redefine elements. Several large models are released as sets of files whose references form complex graphs, sometimes with cycles and confusing redefinitions. To complicate things further, some products and tools (e.g. Microsoft SQL Server 2012) do not support include and redefine.

Many then ask how to combine XSD files using some sort of automation; XSLT, C#, Java code or XML Schema editors with built-in functionality are the typical answers. A common issue is that the proposed solutions are too simple for the problem they're supposed to solve.

Below we explore the subject of combining multiple XSD files and provide illustrations based on capabilities found in our QTAssistant product. We also assume that an XSD file is what most people are used to seeing: an XML document whose document element is a {http://www.w3.org/2001/XMLSchema}schema element.

Background  

The following attributes of the schema element matter when combining XSD files and should always be considered: attributeFormDefault, blockDefault, elementFormDefault, finalDefault and targetNamespace. Others, such as id and xml:lang, may be ignored.

<schema
  attributeFormDefault = (qualified | unqualified) : unqualified
  blockDefault = (#all | List of (extension | restriction | substitution)) : ''
  elementFormDefault = (qualified | unqualified) : unqualified
  finalDefault = (#all | List of (extension | restriction | list | union)) : ''
  id = ID
  targetNamespace = anyURI
  version = token
  xml:lang = language
  {any attributes with non-schema namespace . . .}>
  Content: ((include | import | redefine | annotation)*, (((simpleType | complexType | group | attributeGroup) | element | attribute | notation), annotation*)*)
</schema>

The schema element is the document element of what is commonly understood as an XSD file. By itself, then, an XSD file can target only one XML namespace, since a schema element carries only one targetNamespace attribute.

The XSD specification does not define an "XSD" serialization (file) format, so the above inference is open to disagreement. One could define a format (see the XSV Demo) in which multiple schema elements coexist in the same file; the WSDL specification, for instance, allows multiple schema elements under its types section. This note assumes the common meaning and format of XSD files as understood by most tools, with the understanding that this format is neither endorsed nor described by the W3C XSD specification, and is therefore not a limitation of it.
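As an illustration of the WSDL case, the fragment below sketches a WSDL 1.1 types section that holds two schema elements, each with its own target namespace. The namespaces urn:example:service, urn:example:a and urn:example:b, and the element names, are made up for this example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical WSDL 1.1 fragment; all urn:example:* names are illustrative only. -->
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                  targetNamespace="urn:example:service">
  <wsdl:types>
    <!-- First schema element: targets urn:example:a -->
    <xsd:schema targetNamespace="urn:example:a">
      <xsd:element name="a" type="xsd:string"/>
    </xsd:schema>
    <!-- Second schema element in the same file: targets urn:example:b -->
    <xsd:schema targetNamespace="urn:example:b">
      <xsd:element name="b" type="xsd:int"/>
    </xsd:schema>
  </wsdl:types>
</wsdl:definitions>
```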

An include differs from an import:

  • An include must reference an XSD file that has the same target namespace, or no target namespace at all.
  • An import must reference an XSD file that has a different target namespace.

A redefine is similar to an include in that the target namespaces must match, or the redefined schema must have no target namespace.
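The sketch below shows an include and an import side by side. The file names orders-types.xsd and common.xsd, the namespaces, and the type names OrderType and CustomerType are hypothetical; the two types are assumed to be defined in the referenced files.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical example: file names, namespaces and type names are illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="urn:example:orders"
            xmlns:c="urn:example:common"
            targetNamespace="urn:example:orders"
            elementFormDefault="qualified">
  <!-- include: the referenced file has the same target namespace, or none at all -->
  <xsd:include schemaLocation="orders-types.xsd"/>
  <!-- import: the referenced file has a different target namespace -->
  <xsd:import namespace="urn:example:common" schemaLocation="common.xsd"/>
  <!-- OrderType is assumed to come from the included file -->
  <xsd:element name="order" type="OrderType"/>
  <!-- CustomerType is assumed to come from the imported namespace -->
  <xsd:element name="customer" type="c:CustomerType"/>
</xsd:schema>
```

Only the included file is a candidate for merging into this one; the imported namespace still needs a schema element of its own.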

The above reasons lead to the following conclusion:

A multi-namespace set of schema files must be composed from several XSD files.

The minimum number of XSD files depends on whether there are global elements without a namespace (which can appear as roots in XML documents), and on the number of other namespaces:

TRUE (such elements exist): 1 (for an XSD with no target namespace) + the number of other referenced XML namespaces;

FALSE: the number of other referenced XML namespaces.

For example, a model with two target namespaces and a global element in no namespace needs at least three XSD files.

It is important to remember the difference between a global element without a namespace and a local element (contained by another element) without a namespace; the same goes for attributes. From an XSD authoring perspective, this distinction can be controlled using two mechanisms:

  • Through defaults at the schema file level, using the elementFormDefault and attributeFormDefault attributes of the schema element.
  • Local overrides, using the form attribute when applied to an element or attribute local definition.
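A minimal sketch of both mechanisms follows. The namespace urn:example:forms and all element and attribute names are invented for the illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical schema; urn:example:forms and all names are illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="urn:example:forms"
            targetNamespace="urn:example:forms"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <xsd:element name="parent">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Follows the schema-level default: in namespace urn:example:forms -->
        <xsd:element name="inNamespace" type="xsd:string"/>
        <!-- Local override: this element is in no namespace -->
        <xsd:element name="noNamespace" type="xsd:string" form="unqualified"/>
      </xsd:sequence>
      <!-- Local override: this attribute is in namespace urn:example:forms -->
      <xsd:attribute name="code" type="xsd:string" form="qualified"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance document matching this sketch would qualify parent, inNamespace and the code attribute, and leave noNamespace unprefixed:

```xml
<f:parent xmlns:f="urn:example:forms" f:code="x">
  <f:inNamespace>a</f:inNamespace>
  <noNamespace>b</noNamespace>
</f:parent>
```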

XSD files composed using include are candidates for being combined into a single XSD file. But things get tricky when the files use different values for the schema element attributes discussed above.

Simple Scenario: Strongly Connected XSD

A strongly connected XSD is a file from which every other referenced XSD can be reached by traversing external references. The diagram below shows a simple example of a typical layout; the only strongly connected XSD file is br-2.xsd.

Strongly connected XSD

Complex Scenario: Combining XSDs with different global settings

As an example, the form of elements and attributes can be controlled using two mechanisms:

  • Globally, through elementFormDefault and attributeFormDefault attributes of the schema element.
  • Locally, using the form attribute that applies to an element or attribute definition.

XSD files composed using include are candidates for being combined into a single XSD file; things get tricky when the files use different values for the schema element attributes discussed above that deal with the form of elements or attributes, or with the defaults for block and final.

To demonstrate a case where combining XSDs typically fails, the following XSD files were crafted.  

BaseXsd

<?xml version="1.0" encoding="utf-8"?>
<!--XML Schema generated by QTAssistant/XSD Module (http://www.paschidev.com)-->
<xsd:schema xmlns="urn:tempuri-org" attributeFormDefault="unqualified" elementFormDefault="unqualified"
targetNamespace="urn:tempuri-org" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="reused">
                <xsd:complexType>
                        <xsd:sequence>
                                <xsd:element name="some1" type="xsd:string"/>
                                <xsd:element name="some2" type="xsd:string"/>
                                <xsd:element ref="someref"/>
                        </xsd:sequence>
                </xsd:complexType>
        </xsd:element>
        <xsd:element name="someref" type="xsd:int"/>
</xsd:schema>

TopXsd 

<?xml version="1.0" encoding="utf-8"?>
<!--XML Schema generated by QTAssistant/XSR Module (http://www.paschidev.com)-->
<xsd:schema xmlns="urn:tempuri-org" attributeFormDefault="unqualified"
elementFormDefault="qualified" targetNamespace="urn:tempuri-org" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:include schemaLocation="BaseXsd.xsd"/>
        <xsd:element name="root">
                <xsd:complexType>
                        <xsd:sequence>
                                <xsd:element ref="reused"/>
                        </xsd:sequence>
                </xsd:complexType>
        </xsd:element>
</xsd:schema>
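To make the effect of the two different defaults concrete, here is a sketch of an instance document that should be valid against TopXsd with BaseXsd included. The prefix t and the element values are arbitrary choices made for this illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<t:root xmlns:t="urn:tempuri-org">
  <t:reused>
    <!-- Local elements declared in BaseXsd (elementFormDefault="unqualified"): no namespace -->
    <some1>text</some1>
    <some2>text</some2>
    <!-- Reference to a global element: always in the target namespace -->
    <t:someref>1</t:someref>
  </t:reused>
</t:root>
```

Any combined XSD that is equivalent to the original pair must accept this document as well.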

For example, a C# script posted on TechNet generates the following XSD:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:tns="urn:tempuri-org" attributeFormDefault="unqualified"
elementFormDefault="qualified" targetNamespace="urn:tempuri-org" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="reused">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="some1" type="xs:string" />
        <xs:element name="some2" type="xs:string" />
        <xs:element ref="tns:someref" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="someref" type="xs:int" />
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:reused" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

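The generated XSD is incorrect because the two files disagree on the default element form at the schema level. BaseXsd declares elementFormDefault="unqualified", so its local elements some1 and some2 are in no namespace; the generated file keeps only the elementFormDefault="qualified" of TopXsd, which silently moves them into urn:tempuri-org and changes the set of documents the schema accepts.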
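One way to preserve the original semantics is to push the default of the included file down onto each of its local element declarations with an explicit form attribute. The listing below is a hand-written sketch of that idea, based on the generated XSD above; it is not the output of any particular tool.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:tns="urn:tempuri-org" attributeFormDefault="unqualified"
elementFormDefault="qualified" targetNamespace="urn:tempuri-org" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="reused">
    <xs:complexType>
      <xs:sequence>
        <!-- From BaseXsd: keep these local elements in no namespace -->
        <xs:element name="some1" type="xs:string" form="unqualified" />
        <xs:element name="some2" type="xs:string" form="unqualified" />
        <!-- References to global elements are unaffected by form defaults -->
        <xs:element ref="tns:someref" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="someref" type="xs:int" />
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:reused" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The same approach applies to attributes when attributeFormDefault differs between files, and the block and final defaults need a similar per-declaration treatment.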