As mentioned in one of the previous posts XML became a native data type with SQL Server 2005. Opinions on its usefulness in a relational database are divided – "fragmented", even. ;) The purpose of this series of posts is to shed some light on how the XML data type could be utilized efficiently, and what are the caveats that should be avoided. In this first part specifically, we'll take a look at the built-in data retrieval methods concerning the XML data type.

So, you've decided on using the XML data type in your SQL Server database. Now your main concern is how to access parts of that XML in your queries. There are three data retrieval XML methods available in SQL Server 2005:

  • query() – returns the results of an XML Query (the entire result of the XML Query in one row). The result is untyped XML;
  • nodes() – returns a rowset of nodes returned by an XML Query (one row for each node matched by the XML Query expression). The result is a rowset of untyped XML values;
  • value() – returns the result of an XML Query as a value in the specified data type (the XML Query statement and the name of the data type for the result are passed as parameters). The result of the XML Query used in this function must be a singleton, otherwise an error is raised. The final result is value in the data type specified at the function's invocation.

Which one to use and when?

If values are needed rather than XML nodes then the value function should be used to access the data (obvious, isn't it?). When designing queries it is vital to take into account that the XQuery results evaluate to singletons before the value function can be applied.

Yes, an example might be of help here:

Let's start with the following XML:

declare @xml xml

set @xml = N'<?xml version="1.0" encoding="utf-8"?>
<attributeCollection>
 <attribute>
  <name>firstName</name>
  <value>Catherine</value>
 </attribute>
 <attribute>
  <name>middleName</name>
  <value>R.</value>
 </attribute>
 <attribute>
  <name>lastName</name>
  <value>Abel</value>
 </attribute>
</attributeCollection>'

Executing either of these two queries:

  • Query #1
    select @xml.query(
                   '//attribute'
                   ).value(
                           'value'
                           ,'varchar(max)'
                           ) as AttributeValue
    ...or:
  • Query #2
    select Attributes.Attribute.value(
                    'value'
                    ,'varchar(max)'
                    ) as AttributeValue
     from @xml.nodes(
            '//attribute'
            ) Attributes (Attribute)

...leads to the following exception:

Msg 2389, Level 16, State 1, Line #
XQuery [value()]: 'value()' requires a singleton (or empty sequence),
found operand of type 'xdt:untypedAtomic *'

The road to a valid XML singleton lies within the XPath expression – it must evaluate to a single XML node. If we correct the XPath expression in both queries, the exception won't occur:

  • Query #3
    select @xml.query(
                    '//attribute/value'
                    ).value(
                            '.'
                            ,'varchar(max)'
                            ) as AttributeValue
    ...and:
  • Query #4
    select Attributes.Attribute.value(
                    '.'
                    ,'varchar(max)'
                    ) as AttributeValue
     from @xml.nodes(
                '//attribute/value'
                ) Attributes (Attribute)

This brings us to the first consideration regarding the choice of XML data retrieval methods. The principal question we need to answer is purely conceptual and deals with the business purpose of a particular XML: what relationships exist inside the particular XML? And don't say none. ;) The goal of queries against the XML should be clear – clarify it by deciding whether the relationships inside the XML are of value to the relational model.

For example: Query #3 returns a concatenation of values from all nodes corresponding to the XPath expression, while Query #4 returns a set of values:

  • Results for Query #3:
    AttributeValue
    -------------------
    CatherineR.Abel
    
    (1 row(s) affected)
  • Results for Query #4:
    AttributeValue
    -------------------
    Catherine
    R.
    Abel
    
    (3 row(s) affected)

Usually, if the relationships in a particular XML model significantly influence the data use then it would be more appropriate (especially performance-wise) if they were properly catered for in the database model directly (stored separately, relationally constrained, and indexed if applicable) rather than left to dwell within the depths of the XML.

However, there are cases when it is more appropriate (from a business perspective) to leave data as XML entities even in the data store; e.g. document management systems (where each document is stored as an XML entity and a single data model is used to store several classes or types of documents), information platforms (where on the business layer the solution needs to cater to seemingly highly diverse requirements, often leaving the end-users to have uninhibitted control over the structure of their entities, while the solution actually allows for unified physical storage), etc.

In fact, once again there is no straight answer – the actual implementation depends on the requirements of the case in hand. This influences the design of the data model, and the latter influences the choice of appropriate retrieval methods.


Is that it?

Well, this has been an incredibly brief introduction to the subject of XML retrieval methods provided by SQL Server 2005, but we're not done yet! Next time we'll be looking at a few examples where the three built-in XML methods we've seen today are used in various combinations. And we might just as well make a few surprising discoveries. ;)


ML