XSD Design Quality Check

Many organizations developing large XML Schema specifications do so based on formal design guidelines. This approach pursues consistent deliverables, automation and interoperable implementations.

However, many complain about the effort required to ensure adherence to the guidelines, and too often the outcome still deviates from them.

Industry-wide specifications tend to be developed independently by consortia over longer periods of time and are based on more formal engineering principles; proprietary metamodels and governance, often specific to the industry and the geographic area the specification targets, prevail. In-house commissioned models, on the other hand, are developed under tighter timelines, with little time or consideration for ongoing maintenance, and do not undergo the same level or diversity of scrutiny as public standards. This, together with the diversity of tools and preferences for authoring or generating XSDs, doesn't help consumers.

Consumers are typically left to their own devices to deal with the specifications, albeit from different perspectives. The problem becomes even more interesting when considering staff turnover and the learning curve associated with the understanding and application of design guidelines.

Our approach to XSD design quality checks is similar to static code analysis: it uses Query XSD Analysis to make guidelines executable and, by extension, to make models more testable through automation. We intend to define and publish a set of queries and templates that allow our users to apply out-of-the-box profiles or implement their own design quality checks.

Below we look at the solutions we provide to support quality checks for some of the most common guideline areas, which are defined in virtually all surveyed specifications.

Naming Conventions

Naming conventions define a system of rules for constructing the names used to reference XSD components. These rules constrain allowed characters, words, abbreviations and acronyms, grammar and composition, capitalization (e.g. lower/upper camel case) and the use of separators, to name just a few.

Query XSD supports

  • String and regular expression functions, which allow compact queries to be written and make it easy to support virtually any kind of naming convention.
  • Spell check functions, including support for custom dictionaries. English, French, Spanish, and any other language for which a Hunspell, Ispell or custom dictionary can be created are equally supported.

The sample below shows how to load some of the built-in dictionaries and use them to spell check words using DataSet-SQL.

DECLARE @dictkey NVARCHAR
DECLARE @optionsspelling INT
SET @optionsspelling = SpellCheckOptionsMask(0, 0, 0, 0, 0, 0)
 
SET @dictkey = SpellCheckAddDictionary(NULL, NULL, NULL, NULL, 'en-CA', @optionsspelling)
SELECT SpellCheckIsValid(@dictkey, 'colour')
SET @dictkey = SpellCheckAddDictionary(NULL, NULL, NULL, NULL, 'en-US', @optionsspelling)
SELECT SpellCheckIsValid(@dictkey, 'color')
SET @dictkey = SpellCheckAddDictionary(NULL, NULL, NULL, NULL, 'fr-moderne', @optionsspelling)
SELECT SpellCheckIsValid(@dictkey, 'élève')
SET @dictkey = SpellCheckAddDictionary(NULL, NULL, NULL, NULL, 'es-ES', @optionsspelling)
SELECT SpellCheckIsValid(@dictkey, 'español')
--// X:\path-to\WordsBeyondInc.dic - full path to a text file
--// Add your own words, one per line.
SET @dictkey = SpellCheckAddDictionary('custom', 'X:\path-to\WordsBeyondInc.dic', NULL, NULL, 'en-CA', @optionsspelling)
SELECT SpellCheckIsValid(@dictkey, 'dayliner')
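
Outside Query XSD, the regular-expression side of a naming rule can be illustrated with a minimal Python sketch. The UpperCamelCase rule, the pattern and both function names are assumptions made for illustration; they are not taken from any of the surveyed guidelines. The word splitting shows how a compound name could be broken into words before each word is handed to a spell checker.

```python
import re

# Assumed convention: UpperCamelCase, letters and digits only, no separators.
# Note that this simple pattern rejects acronym runs such as "POBox".
UPPER_CAMEL = re.compile(r"(?:[A-Z][a-z0-9]+)+")


def is_upper_camel_case(name):
    """Return True when the whole name follows the assumed UpperCamelCase rule."""
    return UPPER_CAMEL.fullmatch(name) is not None


def split_camel_words(name):
    """Split a camel-case name into its words, e.g. for spell checking."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", name)


if __name__ == "__main__":
    for candidate in ("PurchaseOrder", "purchaseOrder", "purchase_order"):
        print(candidate, is_upper_camel_case(candidate), split_camel_words(candidate))
```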

Namespace Use

It is common to see schema profiles that prescribe specific patterns for namespace URIs. Regular expressions, optionally including data extracted from other sources, can be used to validate namespace URIs.

The example below checks that each target namespace URI is a URN whose NID part is paschidev-com.

DECLARE @mask INT
SET @mask = RegexMask(1, 0, 1, 0, 0, 0, 1)
SELECT XSSchema.TargetNamespace, RegexIsMatch(XSSchema.TargetNamespace, '^urn:paschidev-com:([a-z0-9()+,\-.:=@;$_!*'']|%[0-9a-f]{2})+$', @mask)
FROM XSSchema
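
For readers who want to try the pattern outside Query XSD, the same check can be sketched in Python. The function name is hypothetical, and case-insensitive matching is an assumption: the meaning of the individual RegexMask arguments is not described here.

```python
import re

# Same pattern as in the query above; the doubled quote ('') is only needed
# inside a SQL string literal, so a single quote is used here.
URN_PATTERN = re.compile(
    r"^urn:paschidev-com:([a-z0-9()+,\-.:=@;$_!*']|%[0-9a-f]{2})+$",
    re.IGNORECASE,  # assumption: the mask in the query enables case-insensitive matching
)


def is_valid_target_namespace(uri):
    """Return True when uri is a URN in the paschidev-com namespace."""
    if uri is None:  # a schema without a target namespace fails the check
        return False
    return URN_PATTERN.fullmatch(uri) is not None


if __name__ == "__main__":
    for uri in ("urn:paschidev-com:orders:1.0", "http://example.com/orders", None):
        print(uri, is_valid_target_namespace(uri))
```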

While not an XML requirement, some profiles even constrain the use of aliases.  

--// List of XML namespaces and aliases.
SELECT *
FROM XSObjectNamespaces

The chameleon design can also be very easily detected using the query below:

SELECT *
FROM XSSchema S1 INNER JOIN XSSchema S2 on S1.SourceUri = S2.SourceUri
WHERE S1.TargetNamespace IS NULL AND S2.TargetNamespace IS NOT NULL

Modularity

Modularity is also referred to as "composition". The query below reports the use of prohibited composition patterns (e.g. xsd:redefine or xsd:include).

SELECT *
FROM XSExternal
WHERE XSExternal.Type in ('XmlSchemaRedefine', 'XmlSchemaInclude')
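
To make the intent of this check concrete without the Query XSD tables, here is a minimal Python sketch that scans a single schema document with the standard library. The policy (include and redefine are prohibited, import is allowed), the function name and the sample schema are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
PROHIBITED = ("include", "redefine")  # assumed policy, adjust to the profile in use

SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="urn:example:a" schemaLocation="a.xsd"/>
  <xsd:include schemaLocation="b.xsd"/>
  <xsd:redefine schemaLocation="c.xsd"/>
</xsd:schema>"""


def find_prohibited_composition(xsd_text):
    """Return (kind, schemaLocation) pairs for each prohibited composition element."""
    root = ET.fromstring(xsd_text)
    hits = []
    for kind in PROHIBITED:
        for node in root.iter(XSD + kind):
            hits.append((kind, node.get("schemaLocation")))
    return hits


if __name__ == "__main__":
    for kind, location in find_prohibited_composition(SAMPLE):
        print(kind, location)
```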

Extensibility

Extensibility of an XSD comes in many flavors from a pattern perspective, yet most of them seem to rely on the use of wildcard elements/attributes and/or substitution groups. Things get tricky when a schema makes use of inheritance (e.g. through extensions).

For example, this query displays the file, line number and position of every xsd:any element wildcard.

Select XSObject.SourceUri,
  XSObject.LineNumber,
  XSObject.LinePosition,
  XSAny.Namespace,
  XSAny.ProcessContents
From XSAny
  Inner Join XSObject On XSObject.RowId = XSAny.RowId

This query displays all the places where a reference to a known head of a substitution group is found.

Select XSObject.SourceUri,
  XSObject.LineNumber,
  XSObject.LinePosition,
  XSElement1.LocalName,
  XSElement1.Namespace
From XSElement
  Inner Join XSElement XSElement1 On XSElement1.RefXSElementRowId =
    XSElement.RowId
  Inner Join XSObject On XSObject.RowId = XSElement1.RowId
Where (XSElement.IsHeadSubstitution = 1) Or
  (XSElement.IsAbstract = 1)

Unless blocked, all references to elements that are of a global type are substitutable. A query against XSElement may indicate what combinations are allowed, etc.
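
As a rough illustration of the two extensibility mechanisms discussed above, the following Python sketch lists the element wildcards and the substitution group heads declared in one schema document. It is a simplification under stated assumptions: the function name and sample are hypothetical, head names are reported as written (prefixed QNames are not resolved), and attribute wildcards (xsd:anyAttribute) are ignored.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="payment" abstract="true"/>
  <xsd:element name="cardPayment" substitutionGroup="payment"/>
  <xsd:complexType name="ExtensionType">
    <xsd:sequence>
      <xsd:any namespace="##other" processContents="lax"/>
      <xsd:any/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def extensibility_points(xsd_text):
    """Return (wildcards, heads) found in a single schema document."""
    root = ET.fromstring(xsd_text)
    # XML Schema defaults: namespace="##any", processContents="strict".
    wildcards = [
        (node.get("namespace", "##any"), node.get("processContents", "strict"))
        for node in root.iter(XSD + "any")
    ]
    # A head is any element named in some other element's substitutionGroup.
    heads = sorted(
        {
            node.get("substitutionGroup")
            for node in root.iter(XSD + "element")
            if node.get("substitutionGroup")
        }
    )
    return wildcards, heads


if __name__ == "__main__":
    print(extensibility_points(SAMPLE))
```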

Type Hierarchies

Use of extension/restriction mechanisms in XSDs is sometimes controversial. Some tools have issues in dealing with multiple extensions/restrictions in the same hierarchy.

The example below reports the file and line number of each xsd:restriction used in complex content, together with the name of the complex type that contains it.

Select XSObject.SourceUri,
  XSObject.LineNumber,
  XSComplexType.LocalName
From XSComplexType
  Inner Join XSComplexContent On XSComplexContent.RowId =
    XSComplexType.ContentModelRowId
  Inner Join XSComplexContentRestriction On XSComplexContentRestriction.RowId =
    XSComplexContent.ContentRowId
  Inner Join XSObject On XSObject.RowId = XSComplexContentRestriction.RowId

Figure: Detecting use of xsd:restriction

The AllDescedents table contains the complete list of relationships between types. It could be used to determine all type hierarchies, and to ensure that within a hierarchy or a branch of it, only one type of inheritance is used (either extension or restriction).  
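
The "only one type of inheritance per branch" rule can be sketched independently of the tool. The Python below assumes the type relationships have already been extracted into a dictionary mapping each derived type to its base type and derivation method; that input shape and the function name are assumptions for illustration and do not reflect the layout of the table mentioned above.

```python
def mixed_derivation_types(derivations):
    """Return the types whose path to the root mixes extension and restriction.

    derivations maps a derived type name to a (base type name, method) pair,
    where method is "extension" or "restriction".
    """
    mixed = []
    for name in derivations:
        methods = set()
        seen = set()
        current = name
        # Walk up the branch until a type with no recorded base is reached.
        while current in derivations and current not in seen:
            seen.add(current)  # guards against accidental cycles in the input
            base, method = derivations[current]
            methods.add(method)
            current = base
        if len(methods) > 1:
            mixed.append(name)
    return sorted(mixed)


if __name__ == "__main__":
    sample = {
        "B": ("A", "extension"),
        "C": ("B", "restriction"),  # restriction on top of an extension
        "D": ("A", "extension"),
    }
    print(mixed_derivation_types(sample))
```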

Documentation

Use of xsd:annotation can be easily checked against an organization's policy, which may range from requiring annotations to be empty (or absent) to ensuring that certain annotations are present, in the form of xsd:documentation, xsd:appinfo, etc.

The example below reports where xsd:appinfo elements are used.

Select XSAnnotationItems.Type,
  XSObject.SourceUri As SourceUri1,
  XSObject.LineNumber As LineNumber1
From XSAnnotationItems
  Inner Join XSObject On XSAnnotationItems.XSObjectRowId = XSObject.RowId
Where XSAnnotationItems.Type = 'XmlSchemaAppInfo'

Authoring Styles

Many organizations/standards prefer a Venetian Blind style XSD (all types are global; elements are local, except for document elements and those needed to support substitution groups) while others prefer a Salami Slice style (all elements global, types local).

The query below lists all the types that are not global.

Select XSObject.SourceUri,
  XSObject.LineNumber
From XSObject
  Inner Join XSType On XSObject.RowId = XSType.RowId
Where XSObject.IsGlobal = 0
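
The notion of a non-global type can also be shown with a minimal Python sketch over a single schema document. Here a type is treated as global only when it is a direct child of xsd:schema; the function name and sample are hypothetical, and the sketch reports the name of the construct that owns each local type rather than a file and line number.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
TYPE_TAGS = (XSD + "complexType", XSD + "simpleType")

SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AddressType"/>
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="item">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string"/>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def local_type_owners(xsd_text):
    """Return the name attribute of the parent of every non-global type."""
    root = ET.fromstring(xsd_text)
    owners = []

    def walk(node, depth):
        for child in node:
            # depth 0 means a direct child of xsd:schema, i.e. a global type.
            if child.tag in TYPE_TAGS and depth > 0:
                owners.append(node.get("name"))
            walk(child, depth + 1)

    walk(root, 0)
    return owners


if __name__ == "__main__":
    print(local_type_owners(SAMPLE))
```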

Versioning, Forward/Backward Compatibility

More than one XML Schema set can be loaded in a Query XSD engine. This allows for side-by-side comparison of data structures. In addition, QTAssistant supports XML Schema comparison; combining the results of a comparison with the ability to query across both sets at the same time makes it possible to define sophisticated change detection patterns.
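
A very simple change detection pattern can be sketched in Python to show the idea: compare the named global components of two versions of a schema document and report what was removed or added. The function names and samples are hypothetical, and treating a removed global component as a backward-compatibility risk is an assumed policy, not a rule taken from the tools described here.

```python
import xml.etree.ElementTree as ET

OLD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="order" type="OrderType"/>
  <xsd:element name="note" type="xsd:string"/>
  <xsd:complexType name="OrderType"/>
</xsd:schema>"""

NEW = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="order" type="OrderType"/>
  <xsd:element name="invoice" type="xsd:string"/>
  <xsd:complexType name="OrderType"/>
</xsd:schema>"""


def global_components(xsd_text):
    """Return the set of (kind, name) pairs for named children of xsd:schema."""
    root = ET.fromstring(xsd_text)
    return {
        (child.tag.split("}")[1], child.get("name"))
        for child in root
        if child.get("name")
    }


def compare_versions(old_text, new_text):
    """Return (removed, added) global components between two schema versions."""
    old, new = global_components(old_text), global_components(new_text)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    removed, added = compare_versions(OLD, NEW)
    print("removed (assumed to be a compatibility risk):", removed)
    print("added:", added)
```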

References

We use this section to collect a list of references to public XML Schema design guidelines which we intend to provide support for and/or implementation guidance using our tools.

Kulvatunyou, Boonserm and K.C. Morris. XML Schema Design Quality Test Requirement Document. 2004. NIST - National Institute of Standards and Technology.

MedBiquitous Technical Steering Committee. MedBiquitous XML Schema Design Guidelines. 2004. MedBiquitous.

UN/CEFACT. XML Naming and Design Rules. 2006. UN/CEFACT.

Minakawa, Garet, Satish Ramanathan and Michael Rowell. OAGi - OAGIS 9.0 XML Naming and Design Rules Standard. 2005. Open Applications Group.