Monday, March 10, 2008

SQL Server: XQuery/XPath, Retrieval Functions

Even after all this time (since the release of SQL Server 2005 which introduced the XML data type) I still notice users having trouble with XQuery/XPath functions when processing (or rather trying to process) XML data in SQL Server, that's why I've decided to dedicate a few more posts to the subject. The ulterior motive here is me learning and testing what (I think) I know. ;)

I've introduced the SQL Server 2005 implementation of XQuery/XPath functions in an earlier post.

In this post we'll be looking at "pure" retrieval functions – in contrast to predicate-only functions. The difference between the former and the latter comes from the limitations emerging from the way the XQuery/XPath standard(s) have been implemented in SQL Server 2005. As most users have already found out (the hard way) the use of some functions is limited to XPath predicates. This time we'll be focusing on those that do not suffer from this limitation. You can read more about the supported XQuery/XPath functions in Books Online. Also note that all retrieval functions can also be used inside XPath predicates.

Have you ever encountered the following exception when using XQuery/XPath functions in SQL Server 2005?

XQuery [query()]: The XQuery syntax '/function()' is not supported.

Take a good look at the examples below – you'll notice that no XPath expression uses the XPath/function() syntax. Although this is valid syntax for XML queries according to the W3C XQuery/XPath Recommendations it is not currently allowed in SQL Server unless used inside XPath predicates (e.g. this is allowed: XPath[function() = 'value']). But as mentioned before – we'll discuss predicates later.


Are we ready?

Almost. We need an XML sample (right-click + Save target) (built from the pubs sample database), and – since I assume most of us have left kindergarten a long time ago – an XML Schema that will provide us with what every data type needs: domain integrity.

Remember: untyped XML is synonymous with unnecessary problems.

Oh, and make sure the XML Schema Collection has been created in your favorite testing database *before* attempting to do the rest of this exercise.


I. Numeric functions

number(), ceiling(), floor(), round()

These four functions provide the basic functionality related to numerical data. If an XPath expression is passed as the sole argument to one of these functions then this XPath expression must evaluate to a singleton.

select Books.BooksCollection.query(
  'Price'
  ).value(
   'number(.)'
   ,'money'
   ) as Price
 ,Books.BooksCollection.query(
  'Price'
  ).value(
   'ceiling(.)'
   ,'money'
   ) as CeilingPrice
 ,Books.BooksCollection.query(
  'Price'
  ).value(
   'floor(.)'
   ,'money'
   ) as FloorPrice
 ,Books.BooksCollection.query(
  'Price'
  ).value(
   'round(.)'
   ,'money'
   ) as RoundPrice
 from @Books.nodes(
    '/Books/Book'
    ) Books (BooksCollection)

A word of caution: absent and/or nilled nodes return an empty sequence that cannot be implicitly converted to numerical data types – hence the use of the number() function for the Price column. Try removing the function and observe the consequences.

The number() function converts the data returned by the XPath expression to a value of type xs:double (equivalent to float in SQL Server), and in contrast with the data() function (discussed further down) properly returns null to the caller when the XPath expression evaluates to an empty sequence.


II. Aggregate functions

Contrary to the W3C XPath Recommendation aggregate XPath functions implemented in SQL Server 2005 do not accept multiple arguments. Only a constant or an XPath expression can thus be passed as the sole argument to any of the aggregate functions, and the XPath expression must evaluate to a singleton or a node set.

sum(), avg(), count()

select Books.BooksCollection.query(
  'sum(Book/Price)'
  ).value(
   '.'
   ,'money'
   ) as PriceTotal
 ,Books.BooksCollection.query(
  'avg(Book/Price)'
  ).value(
   '.'
   ,'money'
   ) as AveragePrice
 ,Books.BooksCollection.query(
  'count(Book/Price)'
  ).value(
   '.'
   ,'int'
   ) as BookCount
 from @Books.nodes(
   '/Books'
   ) Books (BooksCollection)

Consider the XPath expression used for the BookCount column. Does it return the actual number of books in the XML sample?


max()

select Books.BooksCollection.query(
  'max(Book/Royalty)'
  ).value(
   '.'
   ,'int'
   ) as LargestRoyalty
 from @Books.nodes('/Books') Books (BooksCollection)

min()

select Books.BooksCollection.query(
  'min(Book/YtdSales)'
  ).value(
   '.'
   ,'int'
   ) as SmallestYtdSales
 from @Books.nodes(
   '/Books'
   ) Books (BooksCollection)

III. String functions

What can I say about these five functions? Their names pretty much speak for themselves, don't they?

string()

select Books.BooksCollection.query(
  'string(Price)'
  ).value(
   '.'
   ,'nvarchar(20)'
   ) as LocalName
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

Be careful when using the string() function: when the XPath expression passed as the argument evaluates to a node set, the data from *all* elements of the node set is concatenated into a single string.


string-length()

select Books.BooksCollection.query(
  'Title'
  ).value(
   '.'
   ,'nvarchar(80)'
   ) as Title
 ,Books.BooksCollection.query(
  'string-length(Title)'
  ).value(
   '.'
   ,'int'
   ) as TitleLength
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

substring()

select Books.BooksCollection.query(
  'substring(Title, 4, 12)'
  ).value(
   '.'
   ,'nvarchar(12)'
   ) as TitleSubstring
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

concat()

The concat() function is used to concatenate values into a single string. Note that at least two arguments are mandatory, otherwise an exception is raised. Empty sequences (e.g. returned by XPath expressions when no node is matched) are treated as zero-length strings.

select Books.BooksCollection.query(
  'concat(Title, " (", @TitleId, ")")'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as TitleAndTitleId
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

contains()

I would expect the contains() function to be restricted to predicate use, but apparently it can also be used to retrieve data. Of course, it either returns true if the string passed as the second argument is found inside the string passed as the first argument, or false if it isn't.

select Books.BooksCollection.query(
  'Title'
  ).value(
   '.'
   ,'nvarchar(80)'
   ) as Title
 ,Books.BooksCollection.query(
  'contains(Title, "Computer")'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as Contains_Computer_InTitle
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

Warning: if the string passed as the second argument is longer than 4000 characters, contains() returns an empty sequence. This is yet another limitation of the SQL Server 2005 XQuery/XPath implementation.


Update (2009-03-07)

SQL Server 2008 introduces two additional XPath functions that are not supported by SQL Server 2005.

lower-case() and upper-case()

These two functions can make it just a little easier to circumvent live with the fact that XML is case-sensitive.

select Books.Book.query
  (
  'lower-case(Title)'
  ).value
   (
   '.'
   ,'nvarchar(80)'
   ) as LowerCaseTitle
 ,Books.Book.query
  (
  'upper-case(Title)'
  ).value
   (
   '.'
   ,'nvarchar(80)'
   ) as UpperCaseTitle
 from @books.nodes
   ('
   /Books/Book
   ') Books (Book)

IV. Miscellaneous functions

data()

We've mentioned the data() function before. It returns the data from the node(s) specified in the XPath expression in the appropriate type (if the node is typed). For details see the Remarks section in this Books Online article.

select Books.BooksCollection.query(
  'data(@TitleId)'
  ).value(
   '.'
   ,'varchar(6)'
   ) as TitleId
 ,Books.BooksCollection.query(
  'data(@PubDate)'
  ).value(
   '.'
   ,'datetime'
   ) as PubDate
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

I've also mentioned the data() function when discussing the retrieval of XML attributes.


empty()

One might think this function would also be restricted to predicates (testing whether a node is empty or not), but how would we then make a node empty when modifying XML instances?

select Books.BooksCollection.query(
  'empty(Price)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as EmptyPrice
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

Books Online also suggests using the empty() function together with not() to compensate for the lack of the exists XPath function. I'll give an example when we discuss predicates.


true(), false()

Once again, not only are we allowed to test the state of a bolean type node in XML instances, but we can also modify its value using these two functions.

Both are mentioned here only as a reference.


not()

Essential to logical operations, wouldn't you agree?

select Books.BooksCollection.query(
  'not(Price)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as NotPrice
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

The query above returns true for books with no Price element.


local-name()

Now this is a function that causes many users migraines. But there's nothing actually wrong with it – provided the syntax limitations (mentioned earlier) are taken under consideration.

select Books.BooksCollection.query(
  'local-name(.)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as LocalName
 from @Books.nodes(
   '/Books/Book[1]/*'
   ) Books (BooksCollection)

Look closely at the XQuery expression passed to the nodes function and the ones passed to the query and value functions. This is how the node name(s) can be retrieved in SQL Server 2005.


distinct-values()

This can be a very useful function, as demonstrated in another post.

select Books.BooksCollection.query(
  'distinct-values(Book/PubId)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as PubId
 from @Books.nodes(
   '/Books'
   ) Books (BooksCollection)

Well, there are five more XQuery/XPath retrieval functions implemented in SQL Server 2005, but they deserve special attention, and we'll discuss them later. For instance, the predicate-only functions.


Mental aggregation

So, here they are, (most of) the XQuery/XPath functions implemented in SQL Server 2005 with examples of use.

Ergo:

  • Not all functions have been implemented;
  • Some functions are restricted to predicate use;
  • The XPath/function() syntax is not currently supported in SQL Server 2005;
  • Additional limitations may apply – Books Online is your friend.

ML


p.s. Regarding the BookCount column in the query demonstrating aggregate functions: count(Book/Price) only counts Book elements where the Price element is present, and does not return the number of *all* Book elements in the XML sample.