XML provides a simple and efficient way of storing relational data, especially for the purposes of transporting it from one RDBMS to another. Besides the "natural" technique, where the structure of the XML document is used to represent the relationships, XML Schema also provides three special data types that can be used to define the relationships inside an XML document. I've already discussed the former in my chapter of SQL Server MVP Deep Dives, so in this article we will examine the latter.
The xs:ID
, xs:IDREF
and xs:IDREFS
XML Schema Data Types
The ID
data type is a derived XML Schema data type:
The top-most type above is the string
data type, a primitive XML Schema data type, and the types following it are derived from string
by restriction – each derivate is more restrictive in respect to its properties and use than its base type.
While a valid string
node can contain any character, a valid ID
must begin with a letter or an underscore ("_") character, may contain any numeral and/or letter, but may only contain a restricted set of punctuation characters (the dot ".", the dash "-", and the underscore "_"), and it must not contain any blank characters (spaces, tabs, carriage-return or line-feed characters).1
The purpose of the ID
data type is to serve as a key – to uniquely identify each individual element of an XML document. Therefore, each ID
value inside an individual XML document must be unique. This also serves a second purpose, namely to allow each uniquely identified element to be referenced by one or more related elements of the same document – via an IDREF
or an IDREFS
attribute.
The IDREF
data type is derived from the ID
type – also by restriction: IDREF
nodes can only contain valid ID
values that exist inside the same XML document. For instance, in the following XML document, the address/@address_id
attribute (typed as ID
) represents a unique key of each address
element, and the person/@address_id
attribute (typed as IDREF
) represents a reference to an address
element. However, in this particular example only the first person
element contains a valid reference, while the second one does not:
An address
element with @address_id
of "A3" does not exist in the above document.
The IDREFS
data type is derived from the IDREF
data type by list – it may contain a single ID
reference or a space-delimited list of ID
references. For instance, in the following example, the person/@address_id
attribute (this time typed as IDREFS
) contains multiple references:
NB: If the examples in this article aren't displayed correctly, and you're using Internet Explorer 9, please, use Compatibility View.
To illustrate the use of XML to store relational data by implementing these special XML Schema data types, I've created two XML documents from the data available in the pubs database, which is (or rather, used to be) a simplistic, yet robust data sample used by SQL Server documentation. The sample database is (still) available online.
For those of you who never worked with the pubs sample: it represents a database used in a library application used in the management of information about books, authors, publishers, etc. In this article I use two particular relationships from the pubs data model:
- Each book is published by exactly one publisher, and each publisher can publish zero, one or more books;
- Each book is written by one or more authors, and each author has written one or more books.2
Publishers and Titles
The relationship between the publishers and the books is shown in the following XML document:
The document contains a set of publishers (publisherCollection
), each with a unique identifier stored in the @publisherId
attribute (publisher/@publisherId
), and a set of books (titleCollection
), each with a valid reference to a publisher (title/@publisherId
).
The following XML Schema governs the integrity of the "Publishers and Titles" XML document:
I prepared a script that you can use to create the corresponding XML Schema Collection in your copy of the pubs database. You will need to do so in order to try out the examples in this article. (NB: you can safely remove both XML Schema Collections presented in this article after you're done experimenting.)
I used the following T-SQL query to compose the first XML document, demonstrating the relationship between the Publishers and the Titles:
begin with xmlnamespaces ( 'http://www.w3.org/2001/XMLSchema' as xs ,'http://schemas.milambda.net/pubs' as bp ) select cast(( select ( select [@bp:publisherId] = N'Pub_' + publishers.pub_id ,publishers.pub_name as [bp:name] ,publishers.city as [bp:city] ,publishers.[state] as [bp:state] ,publishers.country as [bp:country] from dbo.publishers where (exists ( select * from dbo.titles where (titles.title_id like 'BU%') and (titles.pub_id = publishers.pub_id) )) for xml path('bp:publisher'), root('bp:publisherCollection'), elements xsinil, type ) ,( select [@bp:titleId] = N'Titl_' + titles.title_id ,[@bp:publisherId] = N'Pub_' + titles.pub_id ,titles.title as [bp:title] ,titles.[type] as [bp:type] ,titles.price as [bp:price] from dbo.titles where (titles.title_id like 'BU%') for xml path('bp:title'), root('bp:titleCollection'), elements xsinil, type ) for xml path(''), root('bp:pubs'), type) as xml(dbo.BookPublisher)) end
The query above produces a typed XML document. Proper typing is needed, not only to preserve data integrity, but also to provide the XML processor with information about the relationships that exist inside the XML document.
Titles and Authors
The relationship between the authors and the books is shown in the following XML document:
This document contains a set of authors (authorCollection
), each with a unique identifier stored in the @authorId
attribute (author/@authorId
), and a set of books (titleCollection
), each with a valid set of references to one or more authors (title/@authorIds
).
The "Titles and Authors" XML document is also governed by an XML Schema:
You can use this script to create the corresponding XML Schema Collection in the pubs database.
This XML document was also composed using T-SQL, and is also typed accordingly:
begin with xmlnamespaces ( 'http://www.w3.org/2001/XMLSchema' as xs ,'http://schemas.milambda.net/pubs' as ba ) select cast(( select ( select [@ba:authorId] = N'Auth_' + authors.au_id ,authors.au_lname as [ba:lastName] ,authors.au_fname as [ba:firstName] from dbo.authors where (exists ( select * from dbo.titleauthor inner join dbo.titles on titles.title_id = titleauthor.title_id where (titles.title_id like 'BU%') and (titleauthor.au_id = authors.au_id) )) for xml path('ba:author'), root('ba:authorCollection'), type ) ,( select [@ba:titleId] = N'Titl_' + titles.title_id ,[@ba:authorIds] = ( select [authorId] = N'Auth_' + titleauthor.au_id from dbo.titleauthor where (titleauthor.title_id = titles.title_id) for xml path('titleAuthor'), type ).query('distinct-values(titleAuthor/authorId)').value('.', 'nvarchar(max)') ,titles.title as [ba:title] ,titles.[type] as [ba:type] ,titles.price as [ba:price] from dbo.titles where (titles.title_id like 'BU%') for xml path('ba:title'), root('ba:titleCollection'), elements xsinil, type ) for xml path(''), root('ba:pubs'), type) as xml(dbo.BookAuthor)) end
Since more than one author could have collaborated on a single book, there could be multiple references from a Title to the Authors. SQL Server implements the fn:distinct-values
XPath function that I've used in the query above to list multiple references, representing the many-to-many relationship between a Title and its Authors.
The function accepts a single argument, namely an XPath expression pointing to one or more nodes, and returns the atomic values from the resulting node set in form of a space-separated string, without duplicates, which is exactly what we need in our case – a set of references, to be placed in an IDREFS
attribute.
The fn:id
XPath Function
To query the XML documents composed in this article – for instance, to retrieve the Publisher who published a particular Book, or to retrieve a list of Authors who wrote it – we can use the fn:id
XPath function, introduced in SQL Server 2005.
The current SQL Server implementation of this function accepts a single argument, namely an XPath expression pointing to an xs:IDREF
(or xs:IDREFS
) node containing a reference (or a list of references). The function returns a set of related XML nodes, containing the corresponding ID
values.
- To retrieve a list of publishers whose books cost less than $11 from the "Publishers and Titles" XML document:
begin with xmlnamespaces ( 'http://www.w3.org/2001/XMLSchema' as xs ,'http://schemas.milambda.net/pubs' as bp ) select Pubs.Publisher.query('bp:name').value ('.', 'varchar(40)') as PublisherName from @pubs.nodes (' id(data(/bp:pubs/bp:titleCollection/bp:title [bp:price < 11]/@bp:publisherId)) ') Pubs (Publisher) end
Result:PublisherName ---------------------------------------- New Moon Books (1 row(s) affected)
- To retrieve a list of authors whose books cost less than $12 from the "Titles and Authors" XML Document:
begin with xmlnamespaces ( 'http://www.w3.org/2001/XMLSchema' as xs ,'http://schemas.milambda.net/pubs' as ba ) select Pubs.Author.query('ba:lastName').value ('.', 'varchar(40)') as LastName ,Pubs.Author.query('ba:firstName').value ('.', 'varchar(20)') as FirstName from @pubs.nodes (' id(data(/ba:pubs/ba:titleCollection/ba:title [ba:price < 12]/@ba:authorIds)) ') Pubs (Author) end
Result:LastName FirstName ---------------------------------------- -------------------- Green Marjorie O'Leary Michael MacFeather Stearns (3 row(s) affected)
Can you think of another alternative for each of these queries? Is it more or less complex than the one with the fn:id
function?
The fn:idref
XPath function
The fn:idref
XPath function should do exactly the opposite – for instance, return a list of Books published by a particular Publisher, or a list of Books written by a particular Author. I say should, because unfortunately the fn:idref
function is not implemented in SQL Server.
To query a relationship from the perspective of the key, we have to resort to slightly more elaborate techniques. For example, using XQuery:
- To retrieve a list of books published by a publisher from Berkeley:
begin with xmlnamespaces ( 'http://www.w3.org/2001/XMLSchema' as xs ,'http://schemas.milambda.net/pubs' as bp ) select Titles.Title.query('bp:title').value('.', 'varchar(80)') as Title from @pubs.nodes (' for $p in /bp:pubs/bp:publisherCollection/bp:publisher [bp:city = "Berkeley"]/@bp:publisherId return /bp:pubs/bp:titleCollection/bp:title[@bp:publisherId = $p] ') Titles (Title) end
Result:Title -------------------------------------------------------------------------------- The Busy Executive's Database Guide Cooking with Computers: Surreptitious Balance Sheets Straight Talk About Computers (3 row(s) affected)
- To retrieve a list of books published by an author with the last name of Green:
begin with xmlnamespaces ( 'http://www.w3.org/2001/XMLSchema' as xs ,'http://schemas.milambda.net/pubs' as ba ) select Titles.Title.query('ba:title').value('.', 'varchar(80)') as Title from @pubs.nodes (' for $a in /ba:pubs/ba:authorCollection/ba:author [ba:lastName = "Green"]/@ba:authorId return /ba:pubs/ba:titleCollection/ba:title [contains(string(@ba:authorIds), $a)] ') Titles (Title) end
Result:Title -------------------------------------------------------------------------------- The Busy Executive's Database Guide You Can Combat Computer Stress! (2 row(s) affected)
Relational Data and XML
There you have it – yet another method of using XML as a means of transporting relational data. This time enforced in composition by dedicated XML Schema data types, and assisted in retrieval by the corresponding XPath function(s). Why wait until the data has reached its destination to verify whether relationships have been preserved accordingly?
Update: If the pubs database is not available in your environment, and if you're merely interested in the retrieval part of this exercise, you can find a simplified variation of the samples in another post.
ML
1 You can find all the details in the XML Schema Datatypes W3C Recommendation.
2 How would we call an author who hasn't written any books? ;)
1 comment:
This is great stuff! Thank you for a really informative post.
--Brad
Post a Comment