In this uncertain world of ours one thing remains pretty certain.
2008 is definitely coming. The year, that is. ;)
Have a good one!
Ever since the problem with the missing iFilters for Office 2007 documents first arose, the official response from the vendor has been to install Office 2007 (or at least Word 2007) on the server. I believe we all agree that this simply isn't good practice.
First of all, why would anyone need an end-user application such as Microsoft Word on a server?
Obviously Word would never even have to be started on the server (unless the DBA would use it for documenting ;), but the question of how it would compete with SQL Server services still needs to be considered if we're planning on putting it there. Ok, at least it deserves a thought.
Then there's the question of security. Does MS Word 2007 with all of its modern data-sharing-collaborating-globally functionalities provide enough security to even be "allowed" to share the machine with SQL Server in a production environment? I'm not saying Word is IIS, I'm simply saying that the security aspect needs proper consideration. Remember all those Office-related security issues from previous versions?
Frankly, I see no reason for any end-user application (together with several server applications) to exist on the same machine with SQL Server. Another battle for resources is something any DBA should prevent. Ok, maybe we can have SSMS on the server, but that one's really more an integral part of the package than just another application. ;)
Anyway, all these questions have since become irrelevant, as Microsoft just made the iFilters available as a stand-alone download.
The files can be found here:
...and the corresponding KB article can be found here:
Now go, and add support for Microsoft Office 2007 documents to your SQL Server 2005's Full-text Search.
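Installing the filter pack alone is not quite the whole story: the instance still needs to pick up the newly registered filters. A minimal sketch of the usual registration steps (check the sp_fulltext_service topic in Books Online for your build before running this):

```sql
-- load the word breakers, stemmers and iFilters registered with the operating system
exec sp_fulltext_service 'load_os_resources', 1

-- optionally, skip signature verification of the filter binaries
exec sp_fulltext_service 'verify_signature', 0
```

Afterwards, restart the full-text service (MSFTESQL) and repopulate the affected full-text catalogs.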
Once again it's time to update the local copy of SQL Server 2005 Books Online.
It's the end of November and the September 2007 Update has just been made available for download. The files are available through Windows Update, but in case you've missed them (or want full control over the installation) you can also find them here:
The latest version of the
AdventureWorks database is available on CodePlex:
I've been using SQL Server Analysis Services for large data analysis (analyzing texts) recently, and I've encountered an intriguing problem, the investigation of which has led me to some unexpected discoveries (and a workaround).
To reproduce the problem start SQL Server Business Intelligence Studio, then follow these 6 simple steps:
select
 sys.objects.[name] as ObjectName
 ,sys.objects.[object_id] as ObjectId
 ,object_definition(sys.objects.[object_id]) as ObjectDefinition
from sys.objects
where (sys.objects.[type] not in ('F', 'IT', 'PK', 'S', 'SQ', 'U', 'UQ'))

This query returns the definitions of certain database objects; its purpose in this case is to provide an OLAP dimension using the nvarchar(max) data type for its members.
I should have said *attempt* to deploy the project. ;) Yes, the deployment fails with the following error message:
Errors in the back-end database access module. The size specified for a binding was too small, resulting in one or more column values being truncated.
Upon further investigation (i.e. by processing each object individually) we can see that our »Object Definitions« dimension is the source of the problem. It would be quite reasonable to expect this error if we had altered the data source view or the underlying objects after the dimensions had been created, and if this modification had resulted in an increase of the size of any of the data source view columns, as is the case in this issue reported on Microsoft Connect:
The problem reported by Pieter M. does not apply to our case, though, as we've only just created the objects. Let's look at the dimension's definition (right-click the »Object Definitions« dimension in Solution Explorer and select »View Code«) to continue our investigation. Look for the Attribute named »Object Definition« in the XML (
Dimension/Attributes/Attribute/Name). Under the
KeyColumn element you can see the properties of the Object Definition column.
<Attribute dwd:design-time-name="28921594-bbde-434a-b855-8b76cccf1e8a">
 <ID>Object Definition</ID>
 <Name>Object Definition</Name>
 <EstimatedCount>191</EstimatedCount>
 <KeyColumns>
  <KeyColumn dwd:design-time-name="0377ce84-9c52-40ac-8149-64f2cf826009">
   <DataType>WChar</DataType>
   <Source xsi:type="ColumnBinding" dwd:design-time-name="2509b813-0e0f-40e5-aa48-317a60cfe881">
    <TableID>ObjectDefinitions</TableID>
    <ColumnID>ObjectDefinition</ColumnID>
   </Source>
  </KeyColumn>
 </KeyColumns>
 <OrderBy>Key</OrderBy>
 <InstanceSelection>List</InstanceSelection>
</Attribute>
From the DataType element we can see that the data type is correctly mapped (i.e. WChar); what we cannot see is the DataSize element. But, hold on, do we really need one? After all, nvarchar(max) is supposed to be unlimited, isn't it? Well, it is limited to 2 GBytes (2,147,483,647 Bytes, to be exact).
How large is the data in the Object Definition column, in fact? A quick check in SQL Server Management Studio...
select
 max(datalength(object_definition(sys.objects.[object_id]))) as MaxSize
from sys.objects
where (sys.objects.[type] not in ('F', 'IT', 'PK', 'S', 'SQ', 'U', 'UQ'))
...reveals that the largest value is "only" 12,844 Bytes. Ok, let's take on the challenge, and add a DataSize element to the dimension's definition, as a sibling following the DataType element (as documented in Books Online). If we now process the dimension, the processing finishes successfully, and we can deploy the project. There it is!
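For reference, the amended KeyColumn element would then look roughly like this (the DataSize value of 12844 matches the largest value measured above; any size that accommodates your data will do):

```xml
<KeyColumn dwd:design-time-name="0377ce84-9c52-40ac-8149-64f2cf826009">
 <DataType>WChar</DataType>
 <DataSize>12844</DataSize>
 <Source xsi:type="ColumnBinding" dwd:design-time-name="2509b813-0e0f-40e5-aa48-317a60cfe881">
  <TableID>ObjectDefinitions</TableID>
  <ColumnID>ObjectDefinition</ColumnID>
 </Source>
</KeyColumn>
```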
So, if we can predict the size of our
[n]varchar data, we're all set. But what if we can't?
BTW, if you're thinking of explicitly setting the size of
[n]varchar data source view columns, bear in mind that even this won't work every time. For instance: casting the values to
nvarchar(4000) results in the value for the
DataSize element being set to 1,024 Bytes. How weird is that!?
I must admit I'm somewhat baffled by how the mapping between Database Engine and Analysis Services data types is implemented. After all, both the
nvarchar(max) and the
WChar data types are based on the
String data type (
System.String) of the .Net framework, which means that the upper limit of their sizes is supposed to be 2 GBytes. Why would the actual size then still need to be specified? And why isn't it then specified automatically or – dare I say it – assumed? And how can 1024 be equal to 4000?
I have tried specifying the maximum value of 2147483647 – since one can't always predict the size of
nvarchar(max) data, don't you agree? Isn't that what the *MAX* is for? Anyway, I was greeted with this wonderful error message:
Memory error: Allocation failure : Not enough storage is available to process this command.
To respond to this particular exception the documentation suggests closing programs, reducing paging, adding more RAM etc., which all seem pretty absurd given the situation, but – as you probably know – it's just a generic operating system exception.
So, if we can't specify the said maximum value, what can we do? The most logical solution would be to analyze existing data and perhaps come up with an approximation of how large the data is going to grow in the future. Which is a fine strategy, until it breaks. Well, I've been playing around a bit with this option and have found what seems to be the actual upper limit for the
WChar data type. And it's a weird one, to say the least. Does the size of 163,315,555 Bytes mean anything to you?
Yes, here it is (for you to copy and paste to your project)...

<DataSize>163315555</DataSize>

...the actual limit that still allows the deployment (at least the deployment of this particular SSAS project).
WTF? If you have any clue, please, leave a comment. And if you can deploy the project using a larger value, please, provide a repro. I really would like to know.
So, why all the fuss? Who needs to analyze large pieces of text? Well, I do. :) Honestly, I've come across this little column truncation "inconvenience" merely by chance. I don't think the texts that I need to run analyses against would ever exceed a thousand Bytes in size but one never knows... And now I know they should never exceed 156 MBytes. :)
p.s. I've reported this issue on Connect as well, so, please, vote on it if you think it should be resolved.
SSAS: varchar/nvarchar column size incorrectly mapped between data source views and cube dimensions (#298085)
Take a look at the picture below.
Somewhere in that lovely mess, on one shred of paper – or more? – there is an eleven digit number that I need.
Luckily for me, I've managed to find all the pieces. And it only took about 20 minutes. :) Which is still a long time compared to about three seconds that it originally took to cause the problem...
»Become a better developer in six months.« Define 'better'. :)
(Self-)education is the first thing that comes to mind, and the majority of the tag-attack "victims" agree that reading and learning (and writing about it) directly lead to being better at development; myself included. However, the kind of literature that I consider important (in view of my current work) may have nothing or at least very little to do with software or database development. As some of you know, my principal calling is law and legal science, so if I plan on being better at SQL development at the moment then I'll have to brush up on these:
But the key (IMHO) to improving the professional skill-set lies elsewhere. »Staying hungry« is an old principle that I find highly beneficial to self-development.
What am I talking about? Allow me to illustrate. Every boxing coach will work hard at keeping "the champ" fit to fight and at or near his peak. But a good coach will also keep the boxer from reaching the peak of his skills *before* the big match – for instance: stopping a practice fight before time, changing sparring partners to distract the boxer, or doing other more or less annoying things. In other words: keeping the champ "hungry" as well as fit.
Law? Hunger? Boxing? What do these have to do with development and being better at it?
All I'm really trying to say is that spending time away from the workplace, away from work, and – most importantly – away from the PC screen actually provides a distraction that might seem insignificant or even counter-productive but is in effect a great way of getting the creative juices flowing. I'm not saying the brain should be switched off – not at all! Keep your brain active, but do something different. And NO! Watching TV simply won't do.
I see Adam is thinking along the same lines.
I've mentioned before (I believe it was in one of the newsgroups) that I keep a guitar next to my desk. What do *you* do while waiting for SSMS or SSIS to finish, especially with longer operations?
I'm not going to tag anyone (I tend not to be a menace ;), but I'd still be interested in what you think or what you'd do to become better at whatever you do professionally.
We've seen SQL Server 2005 XML data retrieval methods in previous posts. In particular we've seen how they perform compared to T-SQL retrieval methods, and – quite frankly – the results were somewhat disappointing.
There is an XML retrieval method that we haven't discussed yet. There are two reasons why not:
I'm of course referring to the
exist() XML method. Its purpose is to test whether a given XQuery expression against a given XML instance returns a node set or not.
The method accepts an XQuery expression as its sole parameter, and returns a scalar value: 1 if the expression returns a non-empty node set, 0 if it returns an empty one, and NULL if the XML instance itself is NULL.
You can find all the details in Books Online. Incidentally, have you tried the new SQL Server 2005 Books Online Scoped Search yet?
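Before we move on, a quick, self-contained illustration of the return values (runnable in any database):

```sql
declare @x xml
set @x = N'<document><title>Hamsters</title></document>'

select @x.exist('/document/title') as TitleExists     -- 1: the node set is not empty
select @x.exist('/document/summary') as SummaryExists -- 0: no such node

set @x = null
select @x.exist('/document/title') as NullInstance    -- NULL: the XML instance itself is NULL
```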
To provide a better understanding of what our example is aimed at here's the original query:
select
 Person.Contact.ContactID as ContactID
from Person.Contact
where (Person.Contact.MiddleName = @middleName)
Our objective is to retrieve a list of
ContactID values for contacts in the
Person.Contact table of the AdventureWorks database with »Z.« as their MiddleName.
One row corresponds to the given criteria (that's 1 row out of 19972 in 559 pages).
First of all, let's compare the performance of the query against the original table with only the clustered primary key and then with a covering index. The definition of the covering index is available in the footnotes of this post (just follow the links from the table).
(Performance counters: Primary Key Only vs. Covering Index.)
Excellent. On both counts. Would anyone expect anything else on a table of this size?
Let's attempt to achieve our objective where the data actually resides. In Part Two of the series we've used this script to create two tables of customer data from the AdventureWorks database, with each customer's first, middle and last names packed up in two different XML instances. Two tables will thus be created:
We'll be using two pairs of queries (one pair for each XML type): a pair where only the value of the sought node is parameterized, and another pair of queries with both the value as well as the name of the target node supplied through parameters. Once again, the queries are located at the bottom of this post. Oh, and you'll need this, too:
declare @elementName sysname
declare @middleName varchar(8)

set @elementName = 'middleName'
set @middleName = 'Z.'
Now, witness the power of exist():
(Performance counters: All XML Indexes.)
Not bad at all! Mind you, we've supplied the sought value in the correct (upper) case – »z.« (lower-case) would not have returned anything.
Judging by the number of reads, data access is minimal – processing can stop as soon as the condition is met, and no data actually needs to be retrieved from the XML. With appropriate XML indexes the XML instances do not have to be accessed at all.
Observe how XML indexes have been used in this case by looking at the execution plans:
NB: I use the following prefixes for XML index names:
pxx_* – primary XML index;
axx_* – secondary XML index for path;
vxx_* – secondary XML index for value; and
rxx_* – secondary XML index for property.
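For completeness, this is roughly how such a set of XML indexes would be created on the dbo.AttributesTyped table from the script mentioned earlier (the index names are illustrative, following the prefixes above; a primary XML index is a prerequisite for any of the secondary ones):

```sql
create primary xml index pxx_AttributesTyped_Attribute
on dbo.AttributesTyped (Attribute)

create xml index axx_AttributesTyped_Attribute
on dbo.AttributesTyped (Attribute)
using xml index pxx_AttributesTyped_Attribute for path

create xml index vxx_AttributesTyped_Attribute
on dbo.AttributesTyped (Attribute)
using xml index pxx_AttributesTyped_Attribute for value

create xml index rxx_AttributesTyped_Attribute
on dbo.AttributesTyped (Attribute)
using xml index pxx_AttributesTyped_Attribute for property
```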
The exist method operates in the XML world, which means that the value supplied to the XQuery expression must be in the correct case. Supplying node names should not present a problem in this regard because we can rely on the given XML model (i.e. using XML Schema is highly recommended – we do want domain integrity, don't we?). This, however, most probably won't be enough to help us supply the data in the correct case. So, what alternatives do we have if we decide on using XML to store data in our SQL Server 2005 database?
1. Restrict the use of the exist method to tests where the case of the data is either unambiguous or irrelevant (e.g. only for numeric values);
2. normalize the case within the XQuery expression itself, using the appropriate XPath functions;
3. keep an additional, lower-case copy of each XML instance to run the tests against; or
4. use the value XML method to extract data and test the conditions in T-SQL.
Let's forget about option 1 for a moment as it is as ridiculous as it sounds. Option 2 is not even possible at this time, since none of the appropriate XPath functions (
translate()) are currently supported by SQL Server 2005. This leaves us with options 3 and 4 – both come at a price.
Keeping a lower-case copy of each XML instance as suggested by option 3 will inevitably increase the size of the data store, so we must ask ourselves whether the increased performance of searching XML instances using the
exist XML method is worth the growth of the database. And there's also the question of maintenance: how will having to keep two copies of the same data affect inserts, updates and deletes (both the logic behind data modifications as well as the performance)?
The answers to both questions depend on the actual case, and may even in the near future become irrelevant if we consider the probability that the XPath/XQuery W3C Recommendations will be implemented in a future release of SQL Server (or be introduced through a service release).
Of course, we can always bring search out of XML and into the relational world, as suggested by option 4. Compare the following counters (and queries) with those presented earlier:
(Performance counters: All XML Indexes.)
"Isn't that nice..." he mumbled to himself sarcastically. Case-insensitivity doesn't come cheap.
Care to compare execution plans?
Pay special attention to the size of data in each step (also signified by the weight of the arrows). Heavy stuff.
The exist XML method has merit – heck, it even outperforms pure T-SQL (in the absence of appropriate indexes). Nevertheless, the nature of XML, namely its case-sensitivity, quite significantly limits the usability of this otherwise highly efficient method.
And – I must say this, because I always do – SQL Server seems to prefer Typed XML (compared to the EAV XML model).
The covering index on the Person.Contact table of the AdventureWorks database:
create nonclustered index x_Person_Contact_MiddleName
on Person.Contact (MiddleName)
select
 ANV.AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.exist(
  '/attributeCollection/attribute[data(name) = "middleName"]
   /value[data(.) = sql:variable("@middleName")]'
  ) = 1)

select
 ANV.AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.exist(
  '/attributeCollection/attribute[data(name) = sql:variable("@elementName")]
   /value[data(.) = sql:variable("@middleName")]'
  ) = 1)

select
 AT.AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.exist(
  '/attributeCollection/middleName[data(.) = sql:variable("@middleName")]'
  ) = 1)

select
 AT.AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.exist(
  '/attributeCollection/*[local-name() = sql:variable("@elementName")]
   [data(.) = sql:variable("@middleName")]'
  ) = 1)

select
 ANV.AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query(
  '/attributeCollection/attribute[data(name) = "middleName"]/value'
  ).value('.', 'varchar(32)') = @middleName)

select
 ANV.AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query(
  '/attributeCollection/attribute[data(name) = sql:variable("@elementName")]/value'
  ).value('.', 'varchar(32)') = @middleName)

select
 AT.AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query(
  '/attributeCollection/middleName'
  ).value('.', 'varchar(32)') = @middleName)

select
 AT.AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query(
  '/attributeCollection/*[local-name() = sql:variable("@elementName")]'
  ).value('.', 'varchar(32)') = @middleName)
SQL Server 2005 Best Practices Analyzer was released on July 2nd, and is available at this site:
I am extremely pleased to tell you, dear reader(s), that on July 1st I was presented with the 2007 Microsoft MVP award in the »Windows Server System – SQL Server« category.
I would like to take this opportunity to thank everybody at the MVP Program for considering my participation in the communities to be worthy of this award.
I would certainly like to thank Dejan Sarka for introducing me to the depths of this ocean called SQL.
And of course I thank you, the readers of this blog and the members of the communities, for keeping me up late at night – learning. ;)
In case you've missed it, the May 2007 update of SQL Server 2005 Books Online was made available for download on June 12th.
The 135 MB setup file can be downloaded from this site:
Of course, the online edition has also been updated accordingly.
Also, Windows Live Search now supports SQL Server 2005 Books Online Scoped Search, introducing a much needed way of focusing on relevant subject matter, rather than simply relying on users to provide (or not) additional criteria to the search string (e.g. "SQL Server", "SQL 2005", etc.). Compare scoped search with the 'general' MSDN search.
The PowerPoint slideshow of the presentation I gave at this year's NT Conference is ready for download. The presentation is in Slovene, but I'm willing to translate it into English (or German or Latin – if you give me some time ;) – leave a comment on this blog if you're interested.
The presentation provides an overview of cryptographic capabilities of SQL Server 2005, focusing on the Database Engine. I also provide a brief (and simplified) general overview of cryptography as a way to improve the security of data exchange.
The demonstration of how digital signatures can be used to allow cross-database access without resorting to cross-database ownership chaining is a simplified version of the demonstration provided by Raul Garcia in his blog, and is also available for download. I've included an additional script for you to play with.
If you've decided on improving SQL Server security using encryption then perhaps this is the first thing to consider: encrypting database files. Microsoft Windows (versions 2000 and later) supports file encryption on NTFS drives – a reliable way of preventing access to database files on illegally acquired devices (e.g. stolen notebooks and/or disks).
If the database has not been created yet then follow these steps:
Encrypting files and folders in Windows NTFS.
If the database already exists then do this:
After completing the steps above the files are encrypted, and any user with sufficient permissions on the SQL Server instance (e.g. ALTER DATABASE) can create encrypted database files provided the correct folder is specified as the file path.
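For the existing-database scenario, a rough sketch of the sequence (the database name and file paths are hypothetical; the encryption step itself happens in Windows, performed while logged on as the SQL Server service account):

```sql
-- 1. detach the database
exec sp_detach_db 'MyDatabase'

-- 2. in Windows, encrypt the folder holding the .mdf/.ldf files
--    (file properties > Advanced > Encrypt contents to secure data),
--    logged on as the SQL Server service account

-- 3. re-attach the database
exec sp_attach_db 'MyDatabase'
 ,N'D:\Data\MyDatabase.mdf'
 ,N'D:\Data\MyDatabase_log.ldf'
```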
Simple, isn't it – yet so rarely used.
As mentioned before, cryptography is a new functionality in SQL Server 2005. Most aspects of data encryption, digital signing of modules, and the use of cryptographic objects for authentication and authorization are covered in Books Online. Additionally, here are the links to a couple of blogs dedicated to cryptography in SQL Server 2005 (and SQL Server security in general) that I found very helpful while researching this new functionality:
I will be presenting most of what I've discovered regarding cryptography in SQL Server 2005 at this year's NT Conference in Portorož (Slovenia) on Wednesday, May 16th 2007. If you're there, you're very welcome to attend – you might even learn something. ;)
If you're interested in SQL Server 2005 security in general, though, you really should check out the SQL Server 2005 Security Workshop by Dejan Sarka on the same day. In fact, if you're administering or designing solutions on SQL Server you really should not miss this workshop. You can find more details at the NT Conference web site.
But now... a topic that I don't intend to discuss in detail at the conference.
Symmetric keys are recommended for data encryption by Books Online as they provide better performance when encrypting and/or decrypting data in SQL Server 2005 compared to asymmetric keys and certificates (at least according to Microsoft). Cryptographic functions provided by the platform (in general) are not entirely problem-free (e.g. data-size limitations discussed in the previous post), but let's not get into all that this time. Symmetric keys in SQL Server 2005 support the following cryptographic algorithms: DES, TRIPLE_DES, RC2, RC4, RC4_128, DESX, AES_128, AES_192, and AES_256. The actual availability of these depends on the operating system (e.g. AES is not available on Windows 2000 or XP).
Encrypting data with symmetric keys is done using the
EncryptByKey system function. This method requires the symmetric key to be opened prior to the function's invocation using the
OPEN SYMMETRIC KEY statement.
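A minimal, self-contained sketch of the whole sequence (all object names and the password are illustrative; a database master key must exist before the certificate can be created):

```sql
-- one-time setup
create master key encryption by password = 'Str0ng!Passphrase'

create certificate DemoCert with subject = 'Demo encryption certificate'

create symmetric key DemoSymKey
 with algorithm = triple_des
 encryption by certificate DemoCert

-- encrypting a value: the key must be open at the time of the call
open symmetric key DemoSymKey decryption by certificate DemoCert

select encryptByKey(key_guid('DemoSymKey'), N'a secret value') as EncryptedValue

close symmetric key DemoSymKey
```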
Similarly, data decryption is done pretty much in the same way, using the
DecryptByKey system function (after opening the key). There are two alternatives to this method, though:
DecryptByKeyAutoCert – can be used for data decryption by symmetric keys protected by certificates; and
DecryptByKeyAutoAsymKey – can be used for data decryption by symmetric keys protected by asymmetric keys.
Both these functions open the symmetric key implicitly.
I was curious whether explicit or implicit opening of the symmetric key influences the performance of these functions in any way. For this purpose I've prepared a rather simplistic set of encrypted values available in this script that you can execute in your own favorite testing database.
The script creates a table and fills it up with short strings (in order not to influence execution significantly). In addition, a certificate is created and then used to protect a newly created symmetric key. This symmetric key uses the DESX cryptographic algorithm.
To see how decryption functions perform on this rather small but IMHO fairly representative set of encrypted values I first traced the execution of two retrieval queries against the data without decrypting the values. This was then followed by two pairs of queries using the
OPEN SYMMETRIC KEY/
DecryptByKey method and then using the
decryptByKeyAutoCert method. BTW: the script produced 148877 rows on 1128 pages with 44 Bytes of encrypted data per row. The only index is the clustered primary key on
StringId – the question of how encrypted data could or should be indexed will not be answered in this article.
These are the results (the queries can be found at the bottom of this post – just follow the links in the Query column):
In contrast to queries #1, #2 and #3 (unrestricted) the result sets from queries #4, #5 and #6 are restricted to the »first« 100 rows (i.e.
StringId < 101). Queries #1 and #4 retrieve unencrypted values, queries #2 and #5 use the
OPEN SYMMETRIC KEY/
DecryptByKey method, and queries #3 and #6 use the DecryptByKeyAutoCert method.
Apparently opening keys automatically (i.e. implicitly) outperforms opening keys manually (i.e. explicitly). Who would have thought...? Although the advantage is only slight (5–10%) it may prove significant on larger sets of data. Can you reproduce this on your system?
Of course decrypting values puts additional stress on the CPU (compared to the retrieval of encrypted data), but with only 2 additional reads per query – these are needed to retrieve (and decrypt) the symmetric key. Well, there you have it. In one of the next posts we'll try to discover why asymmetric keys and certificates are said to be less desirable for data encryption (performance-wise) than symmetric keys.
select
 dbo.Strings.StringId
 ,dbo.Strings.EncryptedString
from dbo.Strings

open symmetric key CryptographyPerformanceSymKey
 decryption by certificate CryptographyPerformanceCert

select
 dbo.Strings.StringId
 ,cast(decryptByKey(dbo.Strings.EncryptedString) as nvarchar(max)) as String
from dbo.Strings

close symmetric key CryptographyPerformanceSymKey

select
 dbo.Strings.StringId
 ,cast(decryptByKeyAutoCert(
   cert_id('CryptographyPerformanceCert')
   ,null
   ,dbo.Strings.EncryptedString
   ) as nvarchar(max)) as String
from dbo.Strings

select
 dbo.Strings.StringId
 ,dbo.Strings.EncryptedString
from dbo.Strings
where (dbo.Strings.StringId < 101)

open symmetric key CryptographyPerformanceSymKey
 decryption by certificate CryptographyPerformanceCert

select
 dbo.Strings.StringId
 ,cast(decryptByKey(dbo.Strings.EncryptedString) as nvarchar(max)) as String
from dbo.Strings
where (dbo.Strings.StringId < 101)

close symmetric key CryptographyPerformanceSymKey

select
 dbo.Strings.StringId
 ,cast(decryptByKeyAutoCert(
   cert_id('CryptographyPerformanceCert')
   ,null
   ,dbo.Strings.EncryptedString
   ) as nvarchar(max)) as String
from dbo.Strings
where (dbo.Strings.StringId < 101)
Although SP2 for SQL Server was released more than a week ago, I haven't yet had the time to upgrade. Which could also be interpreted as good fortune, as SP2a was released yesterday, bringing a fix to the »original« SP2. You can read more about it in this KB article.
In brief: if you use History Cleanup tasks or Maintenance Cleanup tasks in your SQL Server 2005 maintenance plans (or IS packages) and you've already upgraded to SP2 using an installation package downloaded *before* March 5th 2007, you should either:
AFAIK »SP2a« is also not the official name of this release.
The build number of the Critical Update mentioned above is 3050 (bringing the full version number to 9.00.3050), the build number of the corrected SP2 remains unchanged at 3042 (full version: 9.00.3042). Confusing? Well...
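With build numbers this easy to confuse, it helps to check what an instance is actually running; serverproperty is the documented way to do it:

```sql
select
 serverproperty('ProductVersion') as ProductVersion -- e.g. 9.00.3042.00
 ,serverproperty('ProductLevel') as ProductLevel    -- e.g. SP2
```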
Finally, don't forget to replace the old SP2 files in your local repository with the new ones.
After the December 2006 CTP of SQL Server 2005 SP2, the second Service Pack RTM'ed today and is ready for download at the official site.
The official site contains links to individual downloads, here are a few essential short cuts:
The build number of SP2 is 3042 (the full version number is therefore 9.00.3042).
I guess this means that Windows Vista installations can finally be improved with a state-of-the-art RDBMS product. ;)
We've discussed the basics of XML in SQL, we've seen a few examples of XQueries and their performance; I guess we're ready to go a bit deeper. In this article we'll look more closely at XPath expressions and see how they behave in SQL Server 2005 XML Queries.
First of all there are several ways of formulating a valid XPath expression – for a comprehensive overview of the subject please consult the W3C Recommendations on XPath (mentioned in previous posts) and other sources (e.g. the XPath Tutorial at W3Schools Online Web Tutorials).
Generally, we have two options (with examples later in this article):

1. specifying the exact (absolute or relative) path to the target node; or
2. using an approximate path containing one or more wildcards ("//" – anywhere, "element::*" or "*" – any element, "attribute::*" or "@*" – any attribute; of course more on the subject can be found in the W3C Recommendations).

Although wildcards can also be used as part of the absolute or the relative path, such XPath expressions cannot be considered exact.
Specifying the exact path is generally the fastest, but it requires quite an intimate knowledge of the XML in question, and may require existing XPath expressions (e.g. used in T-SQL modules) to be corrected if the XML structure is subsequently modified for whatever reason. Approximate paths, on the other hand, are less structure-dependent, but they suffer from a higher performance cost, especially in complex structures.
Of course neither of the two options may lead to the desired node if more nodes exist in a given XML that correspond to a particular XPath expression. It is therefore necessary to extend the expression in such cases with an appropriate XPath predicate.
If we were to compare XPath/XQuery expressions with T-SQL queries we could say that the XPath expression corresponds to the T-SQL
FROM clause, and the XPath predicate functions as the T-SQL
WHERE clause. An appropriate XPath predicate is practically indispensable in situations where the XPath expression should point to a specific node – e.g. if a singleton is needed (as is the case with the
value() XML method; we've seen such a case before).
For instance, using the following XML:
declare @example xml

set @example = N'<document>
 <title>Hamsters prefer pumpkin seeds</title>
 <paragraph>George B. Quick is a golden hamster.</paragraph>
 <paragraph>He''s quick when curious about his surroundings and even quicker when hungry.</paragraph>
 <paragraph>Although he''s presented with a wide variety of tasty and nutritious goodies each day, George prefers shiny pumpkin seeds.</paragraph>
</document>'
...all the following queries return the value of the
title element using three different XPath expressions:
-- Exact (absolute) path
select @example.query('/document/title').value('.', 'varchar(max)') as Title

-- Approximate path using the "anywhere" wildcard
select @example.query('//title').value('.', 'varchar(max)') as Title

-- Approximate path using the "any element" wildcard
select @example.query('/*/title').value('.', 'varchar(max)') as Title
No predicates are necessary in this particular example as there is only a single
title element in our XML. However, if we wanted to extract the value of a particular
paragraph element we might end up with unexpected results:
select @example.query('/document/paragraph').value('.', 'varchar(max)') as Paragraph
select @example.query('//paragraph').value('.', 'varchar(max)') as Paragraph
select @example.query('/*/paragraph').value('.', 'varchar(max)') as Paragraph
All three queries return a concatenation of values from all
paragraph elements. While this may be what some queries need, it certainly isn't valid for all of them. What can we do? Well, with an appropriate predicate we can restrict the XML Query result. For instance, we can qualify the XPath expression using the
paragraph element's position:
-- Extract the value of the second paragraph...
-- ...using the full syntax for the predicate:
select @example.query('//paragraph[position() = 2]').value('.', 'varchar(max)') as Paragraph

-- ...using shorthand:
select @example.query('//paragraph[2]').value('.', 'varchar(max)') as Paragraph
Although this method is frequently used in printed and online references on the subject of querying XML data (including Books Online!), I strongly advise against it, as it opens the door to a whole world of potential problems. According to the XML specification, the original order of elements need not be preserved (when several elements of the same name are allowed in a given XML document), meaning that it is perfectly legal to change element positions, to insert additional elements anywhere inside the existing element island(s), and so on. Even an XML Schema won't prevent this.
When presented with an unfortunate XML structure like the one in our example, there is little we can do in our XML queries to prevent unexpected or incorrect results caused by changes in the XML. If, on the other hand, the XML structure includes a means of uniquely identifying individual nodes, we can use more resilient XPath expressions:
declare @example xml
set @example = N'<document>
  <title>Hamsters prefer pumpkin seeds</title>
  <paragraph id="1">George B. Quick is a golden hamster.</paragraph>
  <paragraph id="2">He''s quick when curious about his surroundings and even quicker when hungry.</paragraph>
  <paragraph id="3">Although he''s presented with a wide variety of tasty and nutritious goodies each day, George prefers shiny pumpkin seeds.</paragraph>
</document>'

-- Exact (absolute) path using the appropriate predicate
select @example.query('/document/paragraph[@id = 2]').value('.', 'varchar(max)') as Paragraph

-- Approximate path using the "anywhere" wildcard and the appropriate predicate
select @example.query('//paragraph[@id = 2]').value('.', 'varchar(max)') as Paragraph

-- Approximate path using the "any element" wildcard and the appropriate predicate
select @example.query('//*[@id = 2]').value('.', 'varchar(max)') as Paragraph
In the example above we've modified the XML by adding attributes that uniquely identify each repeated element; this way the XPath expressions no longer depend on element positions, but rather on more reliable properties. Of course, this is not the only possible method, nor is it necessarily the most appropriate one in every situation. Quite clearly, data integrity starts in the XML itself, and typing the XML (e.g. with an XML Schema) is only one of the many steps we have to take to maintain it.
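The fragility of positional predicates, and the resilience of attribute-based ones, can also be demonstrated outside SQL Server. Here is a minimal sketch in Python's ElementTree (illustrative only; the XPath subset it supports behaves the same way here):

```python
# Positional vs. attribute predicates when element order changes
# (Python illustration; the article's own examples are T-SQL).
import xml.etree.ElementTree as ET

original = ET.fromstring(
    '<document>'
    '<paragraph id="1">first</paragraph>'
    '<paragraph id="2">second</paragraph>'
    '</document>')

# Both predicates find the same node in the original document...
by_position = original.find('paragraph[2]').text         # 'second'
by_attribute = original.find("paragraph[@id='2']").text  # 'second'

# ...but inserting a new paragraph at the front is perfectly legal XML:
reordered = ET.fromstring(
    '<document>'
    '<paragraph id="0">brand new</paragraph>'
    '<paragraph id="1">first</paragraph>'
    '<paragraph id="2">second</paragraph>'
    '</document>')

# The positional predicate now silently returns a different node,
# while the attribute predicate still finds the intended one.
print(reordered.find('paragraph[2]').text)          # first
print(reordered.find("paragraph[@id='2']").text)    # second
```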
Undoubtedly, data integrity is important, but we should not neglect the performance of data retrieval.
In our performance test [2] we will focus on the use of XML retrieval methods in query conditions, i.e. a comparison of various XPath expressions in the
WHERE clause and their effect on performance. The queries used in this test are listed in the footnotes at the bottom of this article. Please refer to Part Two of the series for the DDL and sample data. Note that both tables are XML-indexed and that appropriate XML schema collections are used to type the XML columns.
(All XML indexes)
| Query | EAV Schema | Query | Typed Schema |
As we've already seen in previous articles, the performance of XML queries compared to that of T-SQL queries is quite poor. However, it should by now be apparent that we can improve the performance of XML retrieval simply by using more appropriate XPath expressions.
For the EAV XML type, approximate XPath expressions seem to be less resource-intensive, while for the strongly typed XML, an exact path yields better performance. Once again, it seems that SQL Server prefers typed XML.
Anyway, we're not quite done yet. Believe it or not, there still remains some room for improvement. And it won't require any changes to the XML or the XML Schema – the XPath expressions can be further improved. How about halving CPU time and decimating reads?
(All XML indexes)
| Query | EAV Schema | Query | Typed Schema |
By using even more exact XPath expressions we can cut CPU time by a factor of 2 to 3 and the number of reads by a factor of 3 to 8! This *is* without a doubt a significant improvement.
But where's the catch?
Well, where does the value of a simple (or mixed) XML element actually reside? In its
text node, that's where. It is therefore only logical to look for it right there and nowhere else, and apparently this is exactly how XML data retrieval works in SQL Server: by specifying an XPath expression that is as exact as possible, we allow the database engine to navigate past other possible locations (e.g. child nodes) and go directly to the intended target (the bull's-eye).
Even though our target element holds no descendants (and, according to the XML schema collections being used, isn't allowed to hold any), the "/attributeCollection/firstName" XPath expression actually points to the entire element (i.e. a node-set, not just a single node!), while the "/attributeCollection/firstName/text()" XPath expression points to a single, final node. At least this is what I've observed; after all, I haven't designed the darn thing. ;) It makes perfect sense, though.
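The distinction between an element node and its text child can be made tangible outside SQL Server as well. Here is a minimal sketch using Python's DOM implementation (illustrative only; the node types mirror what the two XPath expressions above address):

```python
# An element node vs. its text-node child, shown with xml.dom.minidom
# (Python illustration; not part of the original T-SQL article).
from xml.dom import minidom

doc = minidom.parseString(
    '<attributeCollection><firstName>Karen</firstName></attributeCollection>')

element = doc.getElementsByTagName('firstName')[0]
text_node = element.firstChild

# '/attributeCollection/firstName' addresses the element node...
print(element.nodeType == element.ELEMENT_NODE)    # True
# ...while '/attributeCollection/firstName/text()' addresses its text child
print(text_node.nodeType == text_node.TEXT_NODE)   # True
print(text_node.nodeValue)                         # Karen
```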
What have we learned?
Next time we'll take a look at something rather unexpected and quite useful (in some respects).
declare @firstName varchar(8)
set @firstName = 'Karen'
select ANV.AttributeId as AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query('/attributeCollection/attribute[name = "firstName"]/value').value('.', 'varchar(max)') = @firstName)

select ANV.AttributeId as AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query('//attribute[name = "firstName"]/value').value('.', 'varchar(max)') = @firstName)

select ANV.AttributeId as AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query('//value[../name = "firstName"]').value('.', 'varchar(max)') = @firstName)

select ANV.AttributeId as AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query('/attributeCollection/attribute[name/text() = "firstName"]/value/text()').value('.', 'varchar(max)') = @firstName)

select ANV.AttributeId as AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query('//attribute[name/text() = "firstName"]/value/text()').value('.', 'varchar(max)') = @firstName)

select ANV.AttributeId as AttributeId
from dbo.AttributesNameValue ANV
where (ANV.Attribute.query('//value[../name/text() = "firstName"]/text()').value('.', 'varchar(max)') = @firstName)
select AT.AttributeId as AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query('/attributeCollection/firstName').value('.', 'varchar(max)') = @firstName)

select AT.AttributeId as AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query('//firstName').value('.', 'varchar(max)') = @firstName)

select AT.AttributeId as AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query('//*[local-name() = "firstName"]').value('.', 'varchar(max)') = @firstName)

select AT.AttributeId as AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query('/attributeCollection/firstName/text()').value('.', 'varchar(max)') = @firstName)

select AT.AttributeId as AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query('//firstName/text()').value('.', 'varchar(max)') = @firstName)

select AT.AttributeId as AttributeId
from dbo.AttributesTyped AT
where (AT.Attribute.query('//*[local-name() = "firstName"]/text()').value('.', 'varchar(max)') = @firstName)