Friday, August 18, 2006

XPath/XQuery functions in SQL Server 2005

One of the key improvements in SQL Server 2005 is the implementation of XML as a native data type. At first glance this may not seem like such a big task (after all, it's just one more data type, right, and it's been a standard for years) but after considering the amount of additional features involved, the task of implementing it seems far from trivial:

  • XML Schema - a proper data type requires a constrainable domain and the XML data type even comes with its own validation standard;
  • XML Indexes - what good is a column data type that can't be indexed?
  • Full-text indexing - ok, the XML word breaker already existed in SQL 2000, nonetheless it's a feature worth mentioning;
  • XML Methods - complex data types (in contrast to primitive data types) usually come with native data access methods and the SQL XML implementation is no exception;
  • XML DML - would the implementation of XML in SQL be of any use without appropriate additions to the data modification language?
  • XPath - the XPath implementation is vital to the usability of XML: it's used in all XML retrieval and DML methods;
  • XQuery - in contrast to other familiar complex data types the XML data type comes with its own querying language. Currently the standard is still in development, but numerous implementations are already available.

Of course when I say native data type, there are a few limitations to consider, and its size cannot exceed 2 GB.

The XPath and the XQuery language standards come with a variety of functions and operators - currently 179 of them according to the XQuery 1.0 and XPath 2.0 Functions and Operators W3C Candidate Recommendation. Obviously not every function in this Candidate Recommendation has been implemented in SQL Server 2005, but the W3C list serves as a valid point of reference.

The Microsoft SQL Server implementation of the XPath and the XQuery standards follows the XML Path Language (XPath) Version 1.0 W3C Recommendation and the XQuery 1.0 W3C Working Draft. As the development of the XQuery standard continues several changes have been made by the Working Group since the SQL Server implementation. The latest is the Candidate Recommendation of June, 8th 2006.

Taken from the W3C Glossary and Dictionary:

  • Working Draft (WD)
    A Working Draft is a document that W3C has published for review by the community, including W3C Members, the public, and other technical organizations.
  • Candidate Recommendation (CR)
    A Candidate Recommendation is a document that W3C believes has been widely reviewed and satisfies the Working Group's technical requirements. W3C publishes a Candidate Recommendation to gather implementation experience.
  • W3C Recommendation (REC)
    A W3C Recommendation is a specification or set of guidelines that, after extensive consensus-building, has received the endorsement of W3C Members and the Director. W3C recommends the wide deployment of its Recommendations. Note: W3C Recommendations are similar to the standards published by other organizations.

The decision to implement a W3C Working Draft in the release of SQL Server 2005 is a bold step forward for Microsoft. It also explains why only the very elementary XQuery functions have been included so far.

Since not all XPath and XQuery functions have been included in the SQL Server 2005 RTM, and as their implementation most probably exceeds the extent of changes and improvements reserved for Service Releases it's safe to assume that more functions will not be available until the next release of SQL Server. Another question is the evolution of the XQuery standard itself - will a W3C Recommendation be available before the next release of SQL Server?

That's why I've also decided to keep track of implementations as the functions are added in upcoming releases of SQL Server. I'm also keeping an eye on the XQuery Working Group to track any additions to the collection of functions.

Below is a list of all XPath XQuery Functions that are part of the W3C Candidate Recommendation mentioned above. The second column shows the release of SQL Server that first implemented them.

Function name Release of first implementation
fn:abs -
fn:adjust-date-to-timezone -
fn:adjust-dateTime-to-timezone -
fn:adjust-time-to-timezone -
fn:avg SQL 2005 RTM more...
fn:base-uri -
fn:boolean -
fn:ceiling SQL 2005 RTM more...
fn:codepoint-equal -
fn:codepoints-to-string -
fn:collection -
fn:compare -
fn:concat SQL 2005 RTM more...
fn:contains SQL 2005 RTM more...
fn:count SQL 2005 RTM more...
fn:current-date -
fn:current-dateTime -
fn:current-time -
fn:data SQL 2005 RTM more...
fn:dateTime -
fn:day-from-date -
fn:day-from-dateTime -
fn:days-from-duration -
fn:deep-equal -
fn:default-collation -
fn:distinct-values SQL 2005 RTM more...
fn:doc -
fn:doc-available -
fn:document-uri -
fn:empty SQL 2005 RTM more...
fn:encode-for-uri -
fn:ends-with -
fn:error -
fn:escape-html-uri -
fn:exactly-one -
fn:exists -
fn:false SQL 2005 RTM more...
fn:floor SQL 2005 RTM more...
fn:hours-from-dateTime -
fn:hours-from-duration -
fn:hours-from-time -
fn:id SQL 2005 RTM more...
fn:idref -
fn:implicit-timezone -
fn:in-scope-prefixes -
fn:index-of -
fn:insert-before -
fn:iri-to-uri -
fn:lang -
fn:last SQL 2005 RTM more...
fn:local-name SQL 2005 RTM more...
fn:local-name-from-QName SQL 2005 RTM
fn:lower-case SQL 2008 RTM more...
fn:matches -
fn:max SQL 2005 RTM more...
fn:min SQL 2005 RTM more...
fn:minutes-from-dateTime -
fn:minutes-from-duration -
fn:minutes-from-time -
fn:month-from-date -
fn:month-from-dateTime -
fn:months-from-duration -
fn:name -
fn:namespace-uri SQL 2005 RTM
fn:namespace-uri-for-prefix -
fn:namespace-uri-from-QName SQL 2005 RTM
fn:nilled -
fn:node-name -
fn:normalize-space -
fn:normalize-unicode -
fn:not SQL 2005 RTM more...
fn:number SQL 2005 RTM more...
fn:one-or-more -
fn:position SQL 2005 RTM more...
fn:prefix-from-QName -
fn:QName -
fn:remove -
fn:replace -
fn:resolve-QName -
fn:resolve-uri -
fn:reverse -
fn:root -
fn:round SQL 2005 RTM more...
fn:round-half-to-even -
fn:seconds-from-dateTime -
fn:seconds-from-duration -
fn:seconds-from-time -
fn:starts-with -
fn:static-base-uri -
fn:string SQL 2005 RTM more...
fn:string-join -
fn:string-length SQL 2005 RTM more...
fn:string-to-codepoints -
fn:subsequence -
fn:substring SQL 2005 RTM more...
fn:substring-after -
fn:substring-before -
fn:sum SQL 2005 RTM more...
fn:timezone-from-date -
fn:timezone-from-dateTime -
fn:timezone-from-time -
fn:tokenize -
fn:trace -
fn:translate -
fn:true SQL 2005 RTM more...
fn:unordered -
fn:upper-case SQL 2008 RTM more...
fn:year-from-date -
fn:year-from-dateTime -
fn:years-from-duration -
fn:zero-or-one -

Update (2009-03-07)

In SQL Server 2008 two aditional functions have been added to the collection:


ML

p.s. If you're interested in more information regarding the implementation of XML in SQL 2005 continue here.

7 comments:

Tommy Vernieri said...

I see that translate, upper-case, and lower-case are not available. It doesn't seem like ms:string-compare is available either. Is there any way to do a case-insensitive string comparison in a SQL 2005 XQuery predicate?

I'm trying to do something like:

//MyNode[fn:lower-case(@MyAttribute) = fn:lower-case(sql:variable("@TestValue"))]

Matija Lah said...

XML is by definition case-sensitive, thus the binary representations of XML values are compared. This means that until appropriate XPath functions are implemented in the database engine case-insensitive search won't be possible in SQL Server.

However, there are three possible workarounds:

1) Store a lower-case (or upper-case) copy of the XML in a separate column - might double your storage space;

2) Use FTS to narrow down your search - might not help at all;

3) Use XQuery methods in views to extract data and search the view rather than the XML - XML methods must be encapsulated in a UDF in order for the view to support indexing, might double your storage space.

IMHO the best approach to deal with this problem is to use XML solely for transport purposes, but store the data (properly normalized) in tables.

ML

Anonymous said...

hmmm. these would be good if you could actually use them instead of just getting "The XQuery syntax '/function()' is not supported." anytime you try to do anything vaguely useful

Matija Lah said...

Anonymous,

Can you provide a repro (e.g. post it to a newsgroup)?

ML

seecNaz said...

i would guess that Anonymous means is that unfortunately you cannot use the position() function (or any other function) outside of the [ ] brackets.
if you wanted to return the position of a element from a list of siblings...well you can't!


quite a big limitiation and a dead end for many programming projects.

Matija Lah said...

Yes, currently the XQuery implementation in SQL Server is limited, and there are functions whose use is limited to predicates.

And yes, the limitations can sometimes be a PITA (nothing to do with a common dessert).

I intend to show how the functions can be used in a future post.


ML

Anonymous said...

You can try using the likw option but I am not sure how fast its going to be.

SELECT pk, xCol
FROM docs CROSS APPLY xCol.nodes('/book/title/text()') title(tRef)
WHERE CONTAINS (xCol, 'Secure')
AND title.tRef.value ('.', 'NVARCHAR(MAX)') LIKE '%Secure%'

Ref: http://msdn.microsoft.com/en-us/library/ms345117.aspx