Tuesday, April 01, 2008

SQL Server: XQuery/XPath, Predicate Functions

Previously we have looked at the supported XQuery/XPath retrieval functions implemented in SQL Server 2005 with a few examples of use, and we have mentioned the fact that some functions are restricted to use in XPath predicates. Today we will see which XQuery/XPath functions are subject to this restriction.

Any attempt at using these restricted functions to retrieve data from the XML would result in the following exception:

Msg 2371, Level 16, State 1, Line 302
XQuery [query()]: '%function%' can only be used within a predicate or XPath 
selector

Where do we begin?

First of all, you'll need the XML sample (right-click + Save target) and the XML Schema that we've used in the previous post. Review each script and execute against your favorite testing database.

Of course, the XML Schema script must be executed first.


Predicate predicaments

The purpose of predicates is to restrict the result set returned by the XQuery or the XPath expression. Think of the XPath predicate as being to XQuery as the WHERE clause is to the T-SQL query.

The XPath predicate is an optional part of the XPath expression. It must follow the XPath reference to an XML node and must be enclosed in square brackets. For instance, if we wanted to retrieve the values of the Price element of all the books in our XML sample, we would use the following XPath expression:

/Books/Book/Price

However, to retrieve only the price for the third book in the sample we would also need the XPath predicate:

/Books/Book[3]/Price

...which is short for:

/Books/Book[position() = 3]/Price

Which, incidentally, brings us to our first predicate function.

NB: *any* supported XQuery/XPath function can be used inside an XPath predicate in SQL Server 2005 XML queries.


Only two?

position()

To retrieve the title of the third book in our XML we'd use either of these queries:

select Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as TitleOfThirdBook
 from @Books.nodes('/Books/Book[position() = 3]') Books (Book)

select Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as TitleOfThirdBook
 from @Books.nodes(
  '/Books/Book[3]'
  ) Books (Book)

Of course it would also be quite usefull to actually retrieve the position of a given node, wouldn't it? In SQL Server 2005 there's little we can do with the position() function directly, but by using a table of numbers we can circumvent this limitation:

declare @NumbersOnTheFly table
 (
 Number int primary key
 )

-- Use the spt_values table in the master database to create a table of numbers
insert @NumbersOnTheFly
 (
 Number
 )
 select Number
   = row_number()
    over
    (
    order by [name]
    )
  from master.dbo.spt_values

select NumbersOnTheFly.Number as Position
 ,Books.Book.query(
  'Book[position() = sql:column("NumbersOnTheFly.Number")]/Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as Title
 from @Books.nodes('/Books') Books (Book)
  cross join @NumbersOnTheFly NumbersOnTheFly
 where (NumbersOnTheFly.Number <= Books.Book.query(
   'count(Book)'
   ).value(
    '.'
    ,'int'
    ))

Using the sql:column function (look it up in Books Online) we can correlate the results of the XQuery with the results of the T-SQL query. The initial correlation is achieved by issuing another XQuery in the WHERE clause (i.e. using count(Book) to get the total number of Book elements), thus restricting the rows of the numbers table to the actual number of nodes targeted by the main XQuery expression passed to the nodes XML method in the FROM clause, followed by the "data-driven" query XML method in the SELECT statement.

If the position() XQuery/XPath function was not restricted to predicate use, we could have used something much simpler:

select Books.Book.query(
  'position()'
  ).value(
   '.'
   ,'int'
   ) as Position
 ,Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as Title
 from @Books.nodes(
  '/Books/Book'
  ) Books (Book)

...not in SQL Server 2005, though.

Take a look at the execution plan for the workaround, and imagine the execution plan if the workaround were not needed. ;)


last()

The last() XQuery/XPath function is closely connected with the position() XQuery/XPath function; it returns the index of the last node in the node set returned by a given XPath expression. To retrieve the title of the last book in the XML we can use either of these queries:

select Books.Book.query(
  'Title').value(
   '.'
   ,'varchar(80)'
   ) as TitleOfLastBook
 from @Books.nodes(
  '/Books/Book[last()]'
  ) Books (Book)

select Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as TitleOfLastBook
 from @Books.nodes(
  '/Books/Book[position() = last()]'
  ) Books (Book)

Why is there no first() XQuery/XPath function? Think about it: the first index can always be known in advance (= 1). :) On the other hand, knowing the last index would require counting the nodes (e.g. using the count() function) and additional processing that can be avoided thanks to the last() function.


Mental aggregation

So, here they are – the two XQuery/XPath functions the use of which is restricted to XPath predicates. Both of them are essentially in conflict with one of the principal concepts of the set theory, namely that sets are not ordered. In XML, on the other hand, order is preserved (though not guaranteed), so functions like these make sense.

And we've also seen one more use for the table of numbers (for the time being).

Anyway, there are still a few XQuery/XPath functions in SQL Server 2005 that we haven't discussed yet – so stay tuned.


ML