Wednesday, June 01, 2011

XPath ID for People Avoiding Pubs

I discussed the ID XPath function in a recent post. I composed the data samples used in that post from the data available in the pubs sample database. Since this particular database may not be a very popular commodity these days, I've also prepared a "pubs-free" alternative – using the same data as before, but with fewer obstacles.

The sample consists of two scripts showcasing the examples I've provided in my principal post – the retrieval of relational data from an XML document using special XML Schema data types. You can execute the scripts in your favorite testing database, regardless of whether the pubs database is present on your system or not. Of course, you should review the code before attempting to execute it, and follow the comments describing each individual execution step provided for you in the scripts.

The examples above demonstrate the retrieval part of the operation, XML composition is demonstrated in the earlier post.

Enjoy!


ML


p.s. This may not be the right time of year to avoid pubs, though. ;)

Monday, May 09, 2011

SQL Server: The XPath ID Function

XML provides a simple and efficient way of storing relational data, especially for the purposes of transporting it from one RDBMS to another. Besides the "natural" technique, where the structure of the XML document is used to represent the relationships, XML Schema also provides three special data types that can be used to define the relationships inside an XML document. I've already discussed the former in my chapter of SQL Server MVP Deep Dives, so in this article we will examine the latter.


The xs:ID, xs:IDREF and xs:IDREFS XML Schema Data Types

The ID data type is a derived XML Schema data type:

The top-most type above is the string data type, a primitive XML Schema data type, and the types following it are derived from string by restriction – each derivate is more restrictive in respect to its properties and use than its base type.

While a valid string node can contain any character, a valid ID must begin with a letter or an underscore ("_") character, may contain any numeral and/or letter, but may only contain a restricted set of punctuation characters (the dot ".", the dash "-", and the underscore "_"), and it must not contain any blank characters (spaces, tabs, carriage-return or line-feed characters).1

The purpose of the ID data type is to serve as a key – to uniquely identify each individual element of an XML document. Therefore, each ID value inside an individual XML document must be unique. This also serves a second purpose, namely to allow each uniquely identified element to be referenced by one or more related elements of the same document – via an IDREF or an IDREFS attribute.

...
ID
IDREF ... IDREFS

The IDREF data type is derived from the ID type – also by restriction: IDREF nodes can only contain valid ID values that exist inside the same XML document. For instance, in the following XML document, the address/@address_id attribute (typed as ID) represents a unique key of each address element, and the person/@address_id attribute (typed as IDREF) represents a reference to an address element. However, in this particular example only the first person element contains a valid reference, while the second one does not:

An address element with @address_id of "A3" does not exist in the above document.

The IDREFS data type is derived from the IDREF data type by list – it may contain a single ID reference or a space-delimited list of ID references. For instance, in the following example, the person/@address_id attribute (this time typed as IDREFS) contains multiple references:

NB: If the examples in this article aren't displayed correctly, and you're using Internet Explorer 9, please, use Compatibility View.

To illustrate the use of XML to store relational data by implementing these special XML Schema data types, I've created two XML documents from the data available in the pubs database, which is (or rather, used to be) a simplistic, yet robust data sample used by SQL Server documentation. The sample database is (still) available online.

For those of you who never worked with the pubs sample: it represents a database used in a library application used in the management of information about books, authors, publishers, etc. In this article I use two particular relationships from the pubs data model:

  1. Each book is published by exactly one publisher, and each publisher can publish zero, one or more books;
  2. Each book is written by one or more authors, and each author has written one or more books.2

Publishers and Titles

The relationship between the publishers and the books is shown in the following XML document:

The document contains a set of publishers (publisherCollection), each with a unique identifier stored in the @publisherId attribute (publisher/@publisherId), and a set of books (titleCollection), each with a valid reference to a publisher (title/@publisherId).

The following XML Schema governs the integrity of the "Publishers and Titles" XML document:

I prepared a script that you can use to create the corresponding XML Schema Collection in your copy of the pubs database. You will need to do so in order to try out the examples in this article. (NB: you can safely remove both XML Schema Collections presented in this article after you're done experimenting.)

I used the following T-SQL query to compose the first XML document, demonstrating the relationship between the Publishers and the Titles:

begin
 with xmlnamespaces
  (
  'http://www.w3.org/2001/XMLSchema' as xs
  ,'http://schemas.milambda.net/pubs' as bp
  )
 select cast((
  select (
   select [@bp:publisherId]
     = N'Pub_'
     + publishers.pub_id
    ,publishers.pub_name as [bp:name]
    ,publishers.city as [bp:city]
    ,publishers.[state] as [bp:state]
    ,publishers.country as [bp:country]
    from dbo.publishers
    where (exists (
      select *
       from dbo.titles
       where (titles.title_id like 'BU%')
        and (titles.pub_id = publishers.pub_id)
      ))
    for xml path('bp:publisher'), root('bp:publisherCollection'),
        elements xsinil, type
   )
   ,(
   select [@bp:titleId]
     = N'Titl_'
     + titles.title_id
    ,[@bp:publisherId]
     = N'Pub_'
     + titles.pub_id
    ,titles.title as [bp:title]
    ,titles.[type] as [bp:type]
    ,titles.price as [bp:price]
    from dbo.titles
    where (titles.title_id like 'BU%')
    for xml path('bp:title'), root('bp:titleCollection'),
        elements xsinil, type
   )
  for xml path(''), root('bp:pubs'), type) as xml(dbo.BookPublisher))
end

The query above produces a typed XML document. Proper typing is needed, not only to preserve data integrity, but also to provide the XML processor with information about the relationships that exist inside the XML document.


Titles and Authors

The relationship between the authors and the books is shown in the following XML document:

This document contains a set of authors (authorCollection), each with a unique identifier stored in the @authorId attribute (author/@authorId), and a set of books (titleCollection), each with a valid set of references to one or more authors (title/@authorIds).

The "Titles and Authors" XML document is also governed by an XML Schema:

You can use this script to create the corresponding XML Schema Collection in the pubs database.

This XML document was also composed using T-SQL, and is also typed accordingly:

begin
 with xmlnamespaces
  (
  'http://www.w3.org/2001/XMLSchema' as xs
  ,'http://schemas.milambda.net/pubs' as ba
  )
 select cast((
  select (
   select [@ba:authorId]
     = N'Auth_'
     + authors.au_id
    ,authors.au_lname as [ba:lastName]
    ,authors.au_fname as [ba:firstName]
    from dbo.authors
    where (exists (
      select *
       from dbo.titleauthor
        inner join dbo.titles
          on titles.title_id = titleauthor.title_id
       where (titles.title_id like 'BU%')
        and (titleauthor.au_id = authors.au_id)
      ))
    for xml path('ba:author'), root('ba:authorCollection'), type
   )
   ,(
   select [@ba:titleId]
     = N'Titl_'
     + titles.title_id
    ,[@ba:authorIds]
     = (
     select [authorId]
       = N'Auth_'
       + titleauthor.au_id
      from dbo.titleauthor
      where (titleauthor.title_id = titles.title_id)
      for xml path('titleAuthor'), type
     ).query('distinct-values(titleAuthor/authorId)').value('.', 'nvarchar(max)')
    ,titles.title as [ba:title]
    ,titles.[type] as [ba:type]
    ,titles.price as [ba:price]
    from dbo.titles
    where (titles.title_id like 'BU%')
    for xml path('ba:title'), root('ba:titleCollection'),
        elements xsinil, type
   )
  for xml path(''), root('ba:pubs'), type) as xml(dbo.BookAuthor))
end

Since more than one author could have collaborated on a single book, there could be multiple references from a Title to the Authors. SQL Server implements the fn:distinct-values XPath function that I've used in the query above to list multiple references, representing the many-to-many relationship between a Title and its Authors.

The function accepts a single argument, namely an XPath expression pointing to one or more nodes, and returns the atomic values from the resulting node set in form of a space-separated string, without duplicates, which is exactly what we need in our case – a set of references, to be placed in an IDREFS attribute.


The fn:id XPath Function

To query the XML documents composed in this article – for instance, to retrieve the Publisher who published a particular Book, or to retrieve a list of Authors who wrote it – we can use the fn:id XPath function, introduced in SQL Server 2005.

The current SQL Server implementation of this function accepts a single argument, namely an XPath expression pointing to an xs:IDREF (or xs:IDREFS) node containing a reference (or a list of references). The function returns a set of related XML nodes, containing the corresponding ID values.

  1. To retrieve a list of publishers whose books cost less than $11 from the "Publishers and Titles" XML document:
    begin
     with xmlnamespaces
      (
      'http://www.w3.org/2001/XMLSchema' as xs
      ,'http://schemas.milambda.net/pubs' as bp
      )
     select Pubs.Publisher.query('bp:name').value
                ('.', 'varchar(40)') as PublisherName
      from @pubs.nodes
        ('
        id(data(/bp:pubs/bp:titleCollection/bp:title
            [bp:price < 11]/@bp:publisherId))
        ') Pubs (Publisher)
    end
    Result:
    PublisherName
    ----------------------------------------
    New Moon Books
    
    (1 row(s) affected)
  2. To retrieve a list of authors whose books cost less than $12 from the "Titles and Authors" XML Document:
    begin
     with xmlnamespaces
      (
      'http://www.w3.org/2001/XMLSchema' as xs
      ,'http://schemas.milambda.net/pubs' as ba
      )
     select Pubs.Author.query('ba:lastName').value
                ('.', 'varchar(40)') as LastName
      ,Pubs.Author.query('ba:firstName').value
                ('.', 'varchar(20)') as FirstName
      from @pubs.nodes
        ('
        id(data(/ba:pubs/ba:titleCollection/ba:title
            [ba:price < 12]/@ba:authorIds))
        ') Pubs (Author)
    end
    Result:
    LastName                                 FirstName
    ---------------------------------------- --------------------
    Green                                    Marjorie
    O'Leary                                  Michael
    MacFeather                               Stearns
    
    (3 row(s) affected)

Can you think of another alternative for each of these queries? Is it more or less complex than the one with the fn:id function?


The fn:idref XPath function

The fn:idref XPath function should do exactly the opposite – for instance, return a list of Books published by a particular Publisher, or a list of Books written by a particular Author. I say should, because unfortunately the fn:idref function is not implemented in SQL Server.

To query a relationship from the perspective of the key, we have to resort to slightly more elaborate techniques. For example, using XQuery:

  1. To retrieve a list of books published by a publisher from Berkeley:
    begin
     with xmlnamespaces
      (
      'http://www.w3.org/2001/XMLSchema' as xs
      ,'http://schemas.milambda.net/pubs' as bp
      )
     select Titles.Title.query('bp:title').value('.', 'varchar(80)') as Title
      from @pubs.nodes
        ('
        for $p in /bp:pubs/bp:publisherCollection/bp:publisher
            [bp:city = "Berkeley"]/@bp:publisherId
        return /bp:pubs/bp:titleCollection/bp:title[@bp:publisherId = $p]
        ') Titles (Title)
    end
    Result:
    Title
    --------------------------------------------------------------------------------
    The Busy Executive's Database Guide
    Cooking with Computers: Surreptitious Balance Sheets
    Straight Talk About Computers
    
    (3 row(s) affected)
  2. To retrieve a list of books published by an author with the last name of Green:
    begin
     with xmlnamespaces
      (
      'http://www.w3.org/2001/XMLSchema' as xs
      ,'http://schemas.milambda.net/pubs' as ba
      )
     select Titles.Title.query('ba:title').value('.', 'varchar(80)') as Title
      from @pubs.nodes
        ('
        for $a in /ba:pubs/ba:authorCollection/ba:author
            [ba:lastName = "Green"]/@ba:authorId
        return /ba:pubs/ba:titleCollection/ba:title
            [contains(string(@ba:authorIds), $a)]
        ') Titles (Title)
    end
    Result:
    Title
    --------------------------------------------------------------------------------
    The Busy Executive's Database Guide
    You Can Combat Computer Stress!
    
    (2 row(s) affected)

Relational Data and XML

There you have it – yet another method of using XML as a means of transporting relational data. This time enforced in composition by dedicated XML Schema data types, and assisted in retrieval by the corresponding XPath function(s). Why wait until the data has reached its destination to verify whether relationships have been preserved accordingly?

Update: If the pubs database is not available in your environment, and if you're merely interested in the retrieval part of this exercise, you can find a simplified variation of the samples in another post.


ML


1 You can find all the details in the XML Schema Datatypes W3C Recommendation.

2 How would we call an author who hasn't written any books? ;)

Tuesday, April 05, 2011

SQL Server Integration Services, Execute SQL with Complex Parameters

If you're familiar with SQL Server Integration Services (SSIS) then you've probably, at one time or another, run into problems with the integration bit – however weird that may sound. I can understand the fact that SSIS is supposed to be generic and universal and platform independent, and therefore not favor a particular DBMS – not even the one that it's a part of. Nonetheless, for years now I have wished for SSIS to support SQL Server just a little better, and I'm still waiting... And as far as I know, so is pretty much everybody else I've talked to about this, ever since SSIS first came out.

In this post I discuss one particular issue with SQL Server Integration Services that has been the source of many headaches for me and I bet for a lot of you as well: complex parameters and the Execute SQL Task.


Rise above Primitive

Just what is a complex parameter? Well, a parameter of a complex data type, of course. For example:


I. XML

Imagine a stored procedure with an XML parameter... Or better yet, take a look at these two:

  • The first one returns the current date and time in an output XML parameter:
    create proc dbo.GetDate_asXml
     (
     @dateAsXml xml   = null  output
     )
    as
    begin
     with xmlnamespaces
      (
      'http://schemas.testing-ground.com' as tg
      )
     select @dateAsXml
       = (
       select getdate() as [@tg:date]
        for xml path('tg:element'), root('tg:entity'), type
       )
     ;
    end
    go
  • The second one accepts an XML input parameter of the type returned by the first procedure, then extracts a date/time value from it, and returns it in a result set:
    create proc dbo.ExtractDate_asXml
     (
     @dateAsXml  xml
     )
    as
    begin
     with xmlnamespaces
      (
      'http://schemas.testing-ground.com' as tg
      )
     select @dateAsXml.query
       ('
       data(/tg:entity[1]/tg:element[1]/@tg:date)
       ').value
        (
        '.'
        ,'datetime'
        ) as ExtractedDateTime
     ;
    end
    go

How do we configure an SSIS Execute SQL Task to successfully execute both these procedures?


Data Source Providers and Data Types

The first choice we need to make is between the two data providers available to the task in question (we're connecting to SQL Server):

  • OLE DB; or
  • ADO.Net.

Next, we need to select the appropriate data types for:

  • the SSIS Package variable(s); and
  • the procedure's parameter(s).

Olé, DB!

According to the SSIS/SQL/OLE DB data type mapping, documented in the MSDN Library (unfortunately, not all of it in the same place, but that's another story), the SQL Server XML data type should map to the NVARCHAR OLE DB data type. Unfortunately, this is only half-true. None of the OLE DB data types, available to the Execute SQL Task in SQL Server Integration Services can be used for XML output parameters! To add insult to injury, the NVARCHAR data type can be used for XML data type input parameters.

This is the exception raised by the Execute SQL Task when trying to use the NVARCHAR data type for an XML output parameter:

Error: 0xC002F210 at %task name%, Execute SQL Task: Executing the query "%query 
text%" failed with the following error: "Implicit conversion from data type xml 
to nvarchar(max) is not allowed. Use the CONVERT function to run this query.". 
Possible failure reasons: Problems with the query, "ResultSet" property 
not set correctly, parameters not set correctly, or connection not established 
correctly.

Much Ado about .Net

The ADO.Net provider supports the XML data type, so there should be no problems here. Eventually. In fact, I've run into a different problem (explained later in this post), but was able to fix that in the end.


Variables

SSIS supports a subset of .Net data types for the SSIS Package Variables. XML is not supported explicitly, although either String or Object will do if the variable is to be used for Execute SQL Task's parameters – either using the OLE DB or the ADO.Net provider.


How Big is Your XML?

I've mentioned a problem earlier, haven't I? For output parameters of the XML data type, ADO.Net expects the parameter size to be set. The magic number that always seems to be accepted is 2147483647 (2 GB or 2^31 - 1 Bytes), otherwise the following exception will be raised by the Execute SQL Task (even if parameter size is left at -1, which is the default, and apparently does not stand for unlimited):

Error: 0xC002F210 at %task name%, Execute SQL Task: Executing the query "%query 
text%" failed with the following error: "String[0]: the Size property has an 
invalid size of 0.". Possible failure reasons: Problems with the query, 
"ResultSet" property not set correctly, parameters not set correctly, or 
connection not established correctly.

II. User-defined CLR Types

In this test I've used the Point User-defined CLR Type (CLR UDT) used as an example in Books Online.

According to documentation, OLE DB as well as ADO.Net generally do support CLR UDTs via their UDT or Udt data types, respectively. However, neither of these types is available to the Execute SQL Task in SQL Server Integration Services. Using the usual candidates, Object or String, fails as well.

On the other hand, every CLR UDT implements the ToString() and the Parse() methods, making it possible to use string representations of the UDT in the Execute SQL Task. This workaround is both: a lot of work and a long way around the problem. Why? First of all, they require a rewrite of the SQL query or the procedure, or the creation of a "wrapper" procedure that executes the actual procedure, and is executed from SSIS instead of the "real" one. Second, CLR UDTs usually implement additional accessors and operators – without them a complex type is incomplete, and its usability limited.


III. Table-valued Parameters

Introduced with SQL Server 2008, Table-valued Parameters (TVPs) provide a way of passing a set (a table) to a SQL Server module (procedure or function). Before TVPs the only way to pass a set of values to a SQL Server module, using Transact-SQL or another programming language, would be to pack the set up into a delimited string, to use XML, or to rely on other, usually significantly more elaborate means.1

SSIS has seen its share of changes for SQL Server 2008, but... long story short: SQL Server Integration Services do not support Table-valued Parameters.

According to documentation (MSDN Library), TVPs are supported by OLE DB via its Object data type (I've never confirmed this, though), and by ADO.Net via its Structured data type (confirmed). However, none of these are available to the Execute SQL Task in SQL Server Integration Services.


Mental Aggregation

And here they are, all the conclusions, neatly packed in a table for your benefit:

  Provider Variable Output Parameter Input Parameter
Data Type Data Type Size Data Type Size
XML OLE DB String or Object N/A NVARCHAR -1 or 2147483647
ADO.Net Xml 2147483647 Xml -1 or 2147483647
CLR UDT N/A
TVP N/A

And the conclusion of all conclusions? XML via ADO.Net seems to be the only fully supported complex parameter in the SSIS Execute SQL Task.

I really hope this changes in the future...


ML


1 Erland Sommarskog has written a few very useful articles dealing with the subject of exchanging data sets between modules.