Friday, July 06, 2007

SQL Server 2005 XML Methods, Part Four, The Existential Quandary

We've seen SQL Server 2005 XML data retrieval methods in previous posts. In particular we've seen how they perform compared to T-SQL retrieval methods, and – quite frankly – the results were somewhat disappointing.

There is an XML retrieval method that we haven't discussed yet. There are two reasons why not:

  • It's not really a retrieval method as much as it is a way of testing whether nodes and/or data exists in a given XML instance; and
  • It's not as flexible as the other methods since it takes testing query conditions out of the »good and safe« relational world and into the depths of the XML (under)world. In other words: as XML is by definition case-sensitive, so is the method we'll be looking at in this post.

Cut to the chase!

I'm of course referring to the exist() XML method. Its purpose is to test whether a given XQuery expression against a given XML instance returns a node set or not.

The method accepts an XQuery expression as its sole parameter, and returns a scalar value:

  • 1 if the XQuery expression returns a nonempty result;
  • 0 if it returns an empty result; or
  • null if executed against a null XML instance.

You can find all the details in Books Online. Incidentally, have you tried the new SQL Server 2005 Books Online Scoped Search yet?

To provide a better understanding of what our example is aimed at here's the original query:

select Person.Contact.ContactID as ContactID
 from Person.Contact
 where (Person.Contact.MiddleName = @middleName)

Our objective is to retrieve a list of ContactID values for contacts in the Person.Contact table of the AdventureWorks database with »Z.« as their MiddleName.

One row corresponds to the given criteria (that's 1 row out of 19972 in 559 pages).

First of all, let's compare the performance of the query against the original table with only the clustered primary key and then with a covering index. The definition of the covering index is available in the footnotes of this post (just follow the links from the table).

Query Performance
Primary Key Only Covering Index
CPU Reads CPU Reads
Relational 6 107 0 3

Excellent. On both counts. Would anyone expect anything else on a table of this size?


War of the worlds?

Let's attempt at achieving our objective where the data actually resides. In Part Two of the series we've used this script to create two tables of customer data from the AdventureWorks database with each customer's first, middle and last names packed up in two different XML instances. Two tables will thus be created:

  • One with the data in an EAV XML model: 19972 rows in 12519 pages (299580 rows in 2020 pages used by XML indexes);
  • The other with the data in a strongly typed XML model: 19972 rows in 6930 pages (119832 rows in 733 pages used by XML indexes).

We'll be using two pairs of queries (one pair for each XML type): a pair where only the value of the sought node is parameterized, and another pair of queries with both the value as well as the name of the target node supplied through parameters. Once again, the queries are located at the bottom of this post. Oh, and you'll need this, too:

declare @elementName sysname
declare @middleName varchar(8)

set @elementName = 'middleName'
set @middleName = 'Z.'

Now, witness the power of exist.

Query Performance
All XML Indexes
CPU Reads
#1 4 19
#2 4 19
#3 0 7
#4 4 13

Not bad at all! Mind you that we've supplied the sought value in the correct (upper) case – »z.« (lower-case) would not have returned anything.

Judging by the number of reads, data access is minimal – processing can stop as soon as the condition is met, and no data actually needs to be retrieved from the XML. With appropriate XML indexes the XML instances do not have to be accessed at all.

Observe, how XML indexes have been used in this case by looking at the execution plans:

NB: I use the following prefixes for XML index names:

  • pxx_* – primary XML index;
  • axx_* – secondary XML index for path;
  • vxx_* – secondary XML index for value; and
  • rxx_* – secondary XML index for property.


The price of case-insensitivity

The exist method operates in the XML world, which means that the value supplied to the XQuery expression must be in the correct case. Supplying node names should not present a problem in this regard because we can rely on the given XML model (i.e. using XML Schema is highly recommended – we do want domain integrity, don't we?). This, however, most probably won't be enough to help us supply the data in the correct case. So, what alternatives do we have if we decide on using XML to store data in our SQL Server 2005 database?

  1. Limit the use of the exist method to tests where the case of the data is either unambiguous or irrelevant (i.e. only for numeric values);
  2. Supply the value in any case and then use an appropriate XPath function in the XPath/XQuery expression to compensate for the lack of proper casing;
  3. Keep a lower-case (or upper-case) copy of each XML instance (e.g. in a separate column or table or in an [indexed] view); or
  4. Move condition testing out of the XML (i.e. use the value XML method to extract data and test the conditions in T-SQL).

Let's forget about option 1 for a moment as it is as ridiculous as it sounds. Option 2 is not even possible at this time, since none of the appropriate XPath functions (lower-case(), upper-case() or translate()) are currently supported by SQL Server 2005. This leaves us with options 3 and 4 – both come at a price.

Keeping a lower-case copy of each XML instance as suggested by option 3 will inevitably increase the size of the data store, so we must ask ourselves whether the increased performance of searching XML instances using the exist XML method is worth the growth of the database. And there's also the question of maintenance: how will having to keep two copies of the same data affect inserts, updates and deletes (both the logic behind data modifications as well as the performance)?

The answers to both questions depend on the actual case, and may even in the near future become irrelevant if we consider the probability that the XPath/XQuery W3C Recommendations will be implemented in a future release of SQL Server (or be introduced through a service release).

Of course, we can always bring search out of XML and into the relational world, as suggested by option 4. Compare the following counters (and queries) with those presented earlier:

Query Performance
All XML Indexes
CPU Reads
#5 14176 514491
#6 17391 635503
#7 6882 147733
#8 12620 207649

"Isn't that nice..." he mumbled to himself sarcastically. Case-insensitivity doesn't come cheap.

Care to compare execution plans?

Pay special attention to the size of data in each step (also signified by the weight of the arrows). Heavy stuff.


Mental aggregation

Obviously, the exist XML method has merit – heck, it even outperforms pure T-SQL (in the absence of appropriate indexes). Nevertheless, the nature of XML, namely its case-sensitivity, quite significantly limits the usability of this otherwise highly efficient method.

And – I must say this, because I always do – SQL Server seems to prefer Typed XML (compared to the EAV XML model).


ML


  • Covering index on the Person.Contact table of the AdventureWorks database:
    create nonclustered index x_Person_Contact_MiddleName
     on Person.Contact
      (
      MiddleName
      )
  • Query #1: EAV, fixed XPath, parameterized value, test in XML:
    select ANV.AttributeId
     from dbo.AttributesNameValue ANV
     where (ANV.Attribute.exist(
       '/attributeCollection/attribute[data(name) = "middleName"]
        /value[data(.) = sql:variable("@middleName")]'
       ) = 1)
  • Query #2: EAV, parameterized XPath and value, test in XML:
    select ANV.AttributeId
     from dbo.AttributesNameValue ANV
     where (ANV.Attribute.exist(
       '/attributeCollection/attribute[data(name) = sql:variable("@elementName")]
        /value[data(.) = sql:variable("@middleName")]'
       ) = 1)
  • Query #3: Typed, fixed XPath, parameterized value, test in XML:
    select AT.AttributeId
     from dbo.AttributesTyped AT
     where (AT.Attribute.exist(
       '/attributeCollection/middleName[data(.) = sql:variable("@middleName")]'
       ) = 1)
  • Query #4: Typed, parameterized XPath and value, test in XML:
    select AT.AttributeId
     from dbo.AttributesTyped AT
     where (AT.Attribute.exist(
       '/attributeCollection/*[local-name() = sql:variable("@elementName")]
        [data(.) = sql:variable("@middleName")]'
       ) = 1)
  • Query #5: EAV, fixed XPath, parameterized value, test in SQL:
    select ANV.AttributeId
     from dbo.AttributesNameValue ANV
     where (ANV.Attribute.query(
       '/attributeCollection/attribute[data(name) = "middleName"]/value'
       ).value('.', 'varchar(32)') = @middleName)
  • Query #6: EAV, parameterized XPath and value, test in SQL:
    select ANV.AttributeId
     from dbo.AttributesNameValue ANV
     where (ANV.Attribute.query(
       '/attributeCollection/attribute[data(name) = sql:variable("@elementName")]/value'
       ).value('.', 'varchar(32)') = @middleName)
  • Query #7: Typed, fixed XPath, parameterized value, test in SQL:
    select AT.AttributeId
     from dbo.AttributesTyped AT
     where (AT.Attribute.query(
       '/attributeCollection/middleName'
       ).value(
        '.'
        ,'varchar(32)'
        ) = @middleName)
  • Query #8: Typed, parameterized XPath and value, test in SQL:
    select AT.AttributeId
     from dbo.AttributesTyped AT
     where (AT.Attribute.query(
       '/attributeCollection/*[local-name() = sql:variable("@elementName")]'
       ).value(
        '.'
        ,'varchar(32)'
        ) = @middleName)

1 comment:

Anonymous said...

You have been tagged

http://sqlblog.com/blogs/denis_gobo/archive/2007/07/16/1746.aspx