Monday, November 27, 2006

XQuery 1.0 Becomes a Proposed Recommendation

On November 22nd the World Wide Web Consortium announced that the XQuery 1.0, XPath 2.0 and XSLT 2.0 became Proposed Recommendations. In other words, it's safe to assume that in the early 2007 we should expect the work on current versions of these three XML management standards to end with proper W3C Recommendations. Obviously, the R&D activities of the XML Query Working Group will not seize, only the current versions of the specifications will have been made final.

From the W3C Glossary:

  • Proposed Recommendation (PR)
    A Proposed Recommendation is a mature technical report that, after wide review for technical soundness and implementability, W3C has sent to the W3C Advisory Committee for final endorsement.
  • W3C Recommendation (REC)
    A W3C Recommendation is a specification or set of guidelines that, after extensive consensus-building, has received the endorsement of W3C Members and the Director. W3C recommends the wide deployment of its Recommendations. Note: W3C Recommendations are similar to the standards published by other organizations.

The specifications are available at the W3C Website:

The accompainying specifications are also avaliable:

The XQuery 1.0 and XPath 2.0 Functions and Operators (W3C Proposed Recommendation 21 November 2006) specification has also been made available, and – as far as the XPath/XQuery functions are concerned – remains pretty much unchanged from the Candidate Recommendation. This means that so far, the list of XPath/XQuery functions does not need any updating.

The question that remains is whether the RTM version of SQL Server 2005 Service Pack 2 will reflect any of this. I expect no changes in the SQL Server database engine regarding the XML Query Language implementation nor any additional support of the XPath/XQuery functions in the current version of the product, although the CTP of the SP2 already contains certain breaking changes that have received some negative reactions from the community. I guess we'll see more XPath/XQuery support in the next full version of SQL Server.


ML

Friday, November 24, 2006

SQL Server 2005 XML Methods, Part One, Basic Retrieval

As mentioned in one of the previous posts XML became a native data type with SQL Server 2005. Opinions on its usefulness in a relational database are divided – "fragmented", even. ;) The purpose of this series of posts is to shed some light on how the XML data type could be utilized efficiently, and what are the caveats that should be avoided. In this first part specifically, we'll take a look at the built-in data retrieval methods concerning the XML data type.

So, you've decided on using the XML data type in your SQL Server database. Now your main concern is how to access parts of that XML in your queries. There are three data retrieval XML methods available in SQL Server 2005:

  • query() – returns the results of an XML Query (the entire result of the XML Query in one row). The result is untyped XML;
  • nodes() – returns a rowset of nodes returned by an XML Query (one row for each node matched by the XML Query expression). The result is a rowset of untyped XML values;
  • value() – returns the result of an XML Query as a value in the specified data type (the XML Query statement and the name of the data type for the result are passed as parameters). The result of the XML Query used in this function must be a singleton, otherwise an error is raised. The final result is value in the data type specified at the function's invocation.

Which one to use and when?

If values are needed rather than XML nodes then the value function should be used to access the data (obvious, isn't it?). When designing queries it is vital to take into account that the XQuery results evaluate to singletons before the value function can be applied.

Yes, an example might be of help here:

Let's start with the following XML:

declare @xml xml

set @xml = N'<?xml version="1.0" encoding="utf-8"?>
<attributeCollection>
 <attribute>
  <name>firstName</name>
  <value>Catherine</value>
 </attribute>
 <attribute>
  <name>middleName</name>
  <value>R.</value>
 </attribute>
 <attribute>
  <name>lastName</name>
  <value>Abel</value>
 </attribute>
</attributeCollection>'

Executing either of these two queries:

  • Query #1
    select @xml.query(
                   '//attribute'
                   ).value(
                           'value'
                           ,'varchar(max)'
                           ) as AttributeValue
    ...or:
  • Query #2
    select Attributes.Attribute.value(
                    'value'
                    ,'varchar(max)'
                    ) as AttributeValue
     from @xml.nodes(
            '//attribute'
            ) Attributes (Attribute)

...leads to the following exception:

Msg 2389, Level 16, State 1, Line #
XQuery [value()]: 'value()' requires a singleton (or empty sequence),
found operand of type 'xdt:untypedAtomic *'

The road to a valid XML singleton lies within the XPath expression – it must evaluate to a single XML node. If we correct the XPath expression in both queries, the exception won't occur:

  • Query #3
    select @xml.query(
                    '//attribute/value'
                    ).value(
                            '.'
                            ,'varchar(max)'
                            ) as AttributeValue
    ...and:
  • Query #4
    select Attributes.Attribute.value(
                    '.'
                    ,'varchar(max)'
                    ) as AttributeValue
     from @xml.nodes(
                '//attribute/value'
                ) Attributes (Attribute)

This brings us to the first consideration regarding the choice of XML data retrieval methods. The principal question we need to answer is purely conceptual and deals with the business purpose of a particular XML: what relationships exist inside the particular XML? And don't say none. ;) The goal of queries against the XML should be clear – clarify it by deciding whether the relationships inside the XML are of value to the relational model.

For example: Query #3 returns a concatenation of values from all nodes corresponding to the XPath expression, while Query #4 returns a set of values:

  • Results for Query #3:
    AttributeValue
    -------------------
    CatherineR.Abel
    
    (1 row(s) affected)
  • Results for Query #4:
    AttributeValue
    -------------------
    Catherine
    R.
    Abel
    
    (3 row(s) affected)

Usually, if the relationships in a particular XML model significantly influence the data use then it would be more appropriate (especially performance-wise) if they were properly catered for in the database model directly (stored separately, relationally constrained, and indexed if applicable) rather than left to dwell within the depths of the XML.

However, there are cases when it is more appropriate (from a business perspective) to leave data as XML entities even in the data store; e.g. document management systems (where each document is stored as an XML entity and a single data model is used to store several classes or types of documents), information platforms (where on the business layer the solution needs to cater to seemingly highly diverse requirements, often leaving the end-users to have uninhibitted control over the structure of their entities, while the solution actually allows for unified physical storage), etc.

In fact, once again there is no straight answer – the actual implementation depends on the requirements of the case in hand. This influences the design of the data model, and the latter influences the choice of appropriate retrieval methods.


Is that it?

Well, this has been an incredibly brief introduction to the subject of XML retrieval methods provided by SQL Server 2005, but we're not done yet! Next time we'll be looking at a few examples where the three built-in XML methods we've seen today are used in various combinations. And we might just as well make a few surprising discoveries. ;)


ML

Friday, November 10, 2006

NOLOCK no good

Query hints should be used with caution, and a special care should be given to locking hints; in Books Online this subject is fully covered, and several SQL professionals out there have added their own views and experience, which is also very evident online. I've also used some examples of how query hints work when discussing optimistic concurrency (and some other places) – purely for illustration purposes.

Too often are locking hints missused, or at least used inappropriately – either when developers ignore the possibility of negative consequences altogether or when they focus purely on performance, rather than balancing the latter with data consistency and accuracy. Visit any SQL Server related newsgroup and you'll find really good examples of really bad query hint abuse. Luckily, these users (still) represent only a less significant minority. But on the other hand, when sensitive data is concerned a single member of this minority with their *special performance imporvements* is enough to cause mayhem. Joe Celko might also say: murder.

Tony Rogerson posted a very interesting article on his blog today dealing with the possibility of truly unpleasant consequences when the NOLOCK hint is used. Take a look at Tony's example, reproduce it on your system, and see for yourself how devastating such possibilities really can be. Great article!

Don't get me wrong, I'm not saying (and I don't think Tony is) that the NOLOCK hint should not be used at all – far from it: sometimes it really is the only way to go, but never should it be used without first considering the alternatives (if any) and possible consequences. In fact, if you ask me, the same goes for any query hint. Test, re-test, double-test, and – just to make sure – test again.

Further more, Tony has also announced more articles on the subject of concurrency and consistency, so keep an eye on his blog. I know I will.


ML