snaps & snippets: August 2006

Monday, August 28, 2006

[snap] Cryptography limitations

As mentioned before SQL Server 2005 brings several security improvements – one of them being the support for cryptography. The purpose of the built-in cryptiographic functions is to enhance server and/or database security and to provide additional methods in data security (by allowing the encryption of values in columns and variables). The SQL Server encryption hierarchy is well documented in Books Online, and so are the built-in cryptographic functions. What Books Online fail to mention, however, (even after being updated for the third time since the release) is the fact that there are limitations to consider regarding encryption in SQL Server 2005.

One limitation is the size of the clear text that can be encrypted. A limitation that practically renders the built-in functions useless for large values (larger than the available maximum length based on the length of the key), unless the value is encrypted in chunks. The behaviour of the SQL built-in functions is dependent on the CryptoAPI cryptographic functions, specifically the CryptEncrypt function.

Even before the release of SQL Server 2005 these limitations have been discussed, and the author of the article also posted appropriate T-SQL user-defined functions to serve as workarounds. These user-defined functions split the clear text into digestible chunks allowing the use of the built-in functions for larger values (and vice-versa). Which is nice. And – judging by the test scripts provided in that post – reliable as well. But are these functions as efficient as they should be? One thing is almost certain – T-SQL is not as efficient as CLR when it comes to operations on strings, so appropriate SQL CLR functions will perform better. This, of course, is not meant as a critique of the article mentioned above, it's rather a simple statement of facts – the built-in functions are limited in their usability, and the workarounds come with their own limitations as well.

In this Microsoft feedback article the issue regarding the encryptByCert built-in function is resolved as "by design".

Which brings us straight to the point of this post – while the built-in cryptographic functions work well with shorter values (e.g. encryption keys, passwords, credit card numbers, salaries etc.) it would be better to let the client application handle the encryption and decryption of larger values (e.g. longer text values). The encryption keys, however, could still be safely stored (encrypted, even) on the server.

Friday, August 18, 2006

XPath/XQuery functions in SQL Server 2005

One of the key improvements in SQL Server 2005 is the implementation of XML as a native data type. At first glance this may not seem like such a big task (after all, it's just one more data type, right, and it's been a standard for years) but after considering the amount of additional features involved, the task of implementing it seems far from trivial:

XML Schema - a proper data type requires a constrainable domain and the XML data type even comes with its own validation standard;
XML Indexes - what good is a column data type that can't be indexed?
Full-text indexing - ok, the XML word breaker already existed in SQL 2000, nonetheless it's a feature worth mentioning;
XML Methods - complex data types (in contrast to primitive data types) usually come with native data access methods and the SQL XML implementation is no exception;
XML DML - would the implementation of XML in SQL be of any use without appropriate additions to the data modification language?
XPath - the XPath implementation is vital to the usability of XML: it's used in all XML retrieval and DML methods;
XQuery - in contrast to other familiar complex data types the XML data type comes with its own querying language. Currently the standard is still in development, but numerous implementations are already available.

Of course when I say native data type, there are a few limitations to consider, and its size cannot exceed 2 GB.

The XPath and the XQuery language standards come with a variety of functions and operators - currently 179 of them according to the XQuery 1.0 and XPath 2.0 Functions and Operators W3C Candidate Recommendation. Obviously not every function in this Candidate Recommendation has been implemented in SQL Server 2005, but the W3C list serves as a valid point of reference.

The Microsoft SQL Server implementation of the XPath and the XQuery standards follows the XML Path Language (XPath) Version 1.0 W3C Recommendation and the XQuery 1.0 W3C Working Draft. As the development of the XQuery standard continues several changes have been made by the Working Group since the SQL Server implementation. The latest is the Candidate Recommendation of June, 8^th 2006.

Taken from the W3C Glossary and Dictionary:

Working Draft (WD)
A Working Draft is a document that W3C has published for review by the community, including W3C Members, the public, and other technical organizations.
Candidate Recommendation (CR)
A Candidate Recommendation is a document that W3C believes has been widely reviewed and satisfies the Working Group's technical requirements. W3C publishes a Candidate Recommendation to gather implementation experience.
W3C Recommendation (REC)
A W3C Recommendation is a specification or set of guidelines that, after extensive consensus-building, has received the endorsement of W3C Members and the Director. W3C recommends the wide deployment of its Recommendations. Note: W3C Recommendations are similar to the standards published by other organizations.

The decision to implement a W3C Working Draft in the release of SQL Server 2005 is a bold step forward for Microsoft. It also explains why only the very elementary XQuery functions have been included so far.

Since not all XPath and XQuery functions have been included in the SQL Server 2005 RTM, and as their implementation most probably exceeds the extent of changes and improvements reserved for Service Releases it's safe to assume that more functions will not be available until the next release of SQL Server. Another question is the evolution of the XQuery standard itself - will a W3C Recommendation be available before the next release of SQL Server?

That's why I've also decided to keep track of implementations as the functions are added in upcoming releases of SQL Server. I'm also keeping an eye on the XQuery Working Group to track any additions to the collection of functions.

Below is a list of all XPath XQuery Functions that are part of the W3C Candidate Recommendation mentioned above. The second column shows the release of SQL Server that first implemented them.

Function name	Release of first implementation
fn:abs	-
fn:adjust-date-to-timezone	-
fn:adjust-dateTime-to-timezone	-
fn:adjust-time-to-timezone	-
fn:avg	SQL 2005 RTM more...
fn:base-uri	-
fn:boolean	-
fn:ceiling	SQL 2005 RTM more...
fn:codepoint-equal	-
fn:codepoints-to-string	-
fn:collection	-
fn:compare	-
fn:concat	SQL 2005 RTM more...
fn:contains	SQL 2005 RTM more...
fn:count	SQL 2005 RTM more...
fn:current-date	-
fn:current-dateTime	-
fn:current-time	-
fn:data	SQL 2005 RTM more...
fn:dateTime	-
fn:day-from-date	-
fn:day-from-dateTime	-
fn:days-from-duration	-
fn:deep-equal	-
fn:default-collation	-
fn:distinct-values	SQL 2005 RTM more...
fn:doc	-
fn:doc-available	-
fn:document-uri	-
fn:empty	SQL 2005 RTM more...
fn:encode-for-uri	-
fn:ends-with	-
fn:error	-
fn:escape-html-uri	-
fn:exactly-one	-
fn:exists	-
fn:false	SQL 2005 RTM more...
fn:floor	SQL 2005 RTM more...
fn:hours-from-dateTime	-
fn:hours-from-duration	-
fn:hours-from-time	-
fn:id	SQL 2005 RTM more...
fn:idref	-
fn:implicit-timezone	-
fn:in-scope-prefixes	-
fn:index-of	-
fn:insert-before	-
fn:iri-to-uri	-
fn:lang	-
fn:last	SQL 2005 RTM more...
fn:local-name	SQL 2005 RTM more...
fn:local-name-from-QName	SQL 2005 RTM
fn:lower-case	SQL 2008 RTM more...
fn:matches	-
fn:max	SQL 2005 RTM more...
fn:min	SQL 2005 RTM more...
fn:minutes-from-dateTime	-
fn:minutes-from-duration	-
fn:minutes-from-time	-
fn:month-from-date	-
fn:month-from-dateTime	-
fn:months-from-duration	-
fn:name	-
fn:namespace-uri	SQL 2005 RTM
fn:namespace-uri-for-prefix	-
fn:namespace-uri-from-QName	SQL 2005 RTM
fn:nilled	-
fn:node-name	-
fn:normalize-space	-
fn:normalize-unicode	-
fn:not	SQL 2005 RTM more...
fn:number	SQL 2005 RTM more...
fn:one-or-more	-
fn:position	SQL 2005 RTM more...
fn:prefix-from-QName	-
fn:QName	-
fn:remove	-
fn:replace	-
fn:resolve-QName	-
fn:resolve-uri	-
fn:reverse	-
fn:root	-
fn:round	SQL 2005 RTM more...
fn:round-half-to-even	-
fn:seconds-from-dateTime	-
fn:seconds-from-duration	-
fn:seconds-from-time	-
fn:starts-with	-
fn:static-base-uri	-
fn:string	SQL 2005 RTM more...
fn:string-join	-
fn:string-length	SQL 2005 RTM more...
fn:string-to-codepoints	-
fn:subsequence	-
fn:substring	SQL 2005 RTM more...
fn:substring-after	-
fn:substring-before	-
fn:sum	SQL 2005 RTM more...
fn:timezone-from-date	-
fn:timezone-from-dateTime	-
fn:timezone-from-time	-
fn:tokenize	-
fn:trace	-
fn:translate	-
fn:true	SQL 2005 RTM more...
fn:unordered	-
fn:upper-case	SQL 2008 RTM more...
fn:year-from-date	-
fn:year-from-dateTime	-
fn:years-from-duration	-
fn:zero-or-one	-

Update (2009-03-07)

In SQL Server 2008 two aditional functions have been added to the collection:

lower-case(); and
upper-case().

p.s. If you're interested in more information regarding the implementation of XML in SQL 2005 continue here.

Monday, August 14, 2006

Certificate dependency

Knowing whether a certificate exists and/or is in use by a dependent object is good, knowing how it's used is slightly better, but knowing exactly which objects are using a certificate is what is really useful.

This post is a follow-up to a previous post concerning the use of certificates, so without further ado...

...here's the function:

create function dbo.fnList_Certificate_Dependencies
 (
 @certificateName sysname  = null
 )
returns table
as
return (
 select sys.certificates.[name] as CertificateName
  ,sys.certificates.certificate_id as CertificateId
  ,'database principal' as DependentObjectType
  ,sys.database_principals.[name] as DependentObjectName
  ,sys.database_principals.principal_id as DependentObjectId
  from sys.certificates
   inner join sys.database_principals
     on sys.database_principals.sid = sys.certificates.sid
  where ((sys.certificates.[name] = @certificateName)
    or (@certificateName is null))
 union
 select sys.certificates.[name]
  ,sys.certificates.certificate_id
  ,'symmetric key'
  ,sys.symmetric_keys.[name]
  ,sys.symmetric_keys.symmetric_key_id
  from sys.certificates
   inner join sys.key_encryptions
     on sys.key_encryptions.thumbprint = sys.certificates.thumbprint
     inner join sys.symmetric_keys
       on sys.symmetric_keys.symmetric_key_id = sys.key_encryptions.key_id
  where ((sys.certificates.[name] = @certificateName)
    or (@certificateName is null))
 union
 select sys.certificates.[name]
  ,sys.certificates.certificate_id
  ,coalesce(lower(replace(sys.objects.type_desc, '_', ' ')), 'unknown')
  ,coalesce(sys.objects.[name], 'unknown')
  ,sys.objects.[object_id]
  from sys.certificates
   inner join sys.crypt_properties
     on sys.crypt_properties.thumbprint = sys.certificates.thumbprint
     left join sys.objects
       on sys.objects.[object_id] = sys.crypt_properties.major_id
  where ((sys.certificates.[name] = @certificateName)
    or (@certificateName is null))
 )
go

The function lists the name and the id of the certificate in question along with the names, ids and a description of the dependent object's type. Possible dependent objects are discussed in this post.

Two examples of use:

to list dependencies of a specific certificate (e.g. 'SalesProxy'):
```
select *
 from dbo.fnList_Certificate_Dependencies('SalesProxy')
```

to list dependencies of all certificates:

select *
 from dbo.fnList_Certificate_Dependencies(null)

Warning!
Note that neither of the functions posted here detect whether a certificate has been used to encrypt data, so backup your certificates before dropping them and unintentionally encrypting your data permanently!

Thursday, August 10, 2006

Is a certificate in use?

SQL 2005 brought several important security improvements in regard to previous versions, one of which is the Certificate. It has three purposes in SQL 2005:

Authentication – subjects (persons, devices, services) are granted access to the server (e.g. through HTTP endpoints, as database principals) after a valid certification path has been verified;
Module signatures – procedures, functions, triggers and assemblies can be signed with certificates; and
Encryption – data and symmetric keys can be encrypted using certificates.

The principal benefit of certificates is the possibility of separating security management from data management one stage further. In previous versions security needed to be managed either on the data server itself or inside the Windows NT domain. In SQL 2005 after establishing a trust between the data provider and the certificate issuer, the latter becomes the actual security manager. The host trusts that the issuer has verified the identity of the certificate subjects and that the issuer has fully resolved any matters regarding the use of certificates – which certificates can be used by which subjects etc.

This post is dedicated to a very specific case regarding certificates: knowing whether a certificate is being used (and how).

The function below uses four catalog views to determine whether a certificate is in use in one or more of the following situations:

a user in the current database that maps to a certificate (sys.database_principals);
a symmetric key encrypted by a certificate (sys.key_encryptions); or
a cryptographic property of a securable (e.g. module signatures – sys.crypt_properties).

One thing is missing, obviously – a test of whether a certificate has been used to encrypt data. At this time I haven't had any luck in discovering whether this information is stored at all. The real question is: should it be saved? Do the principles of data security allow the data server to track which values have been encrypted by which method and/or by which means? I believe not - 'tis the eternal conflict of security (inaccessibility) vs. accessibility (insecurity). It seems to be an unsupported feature – if knowing whether a certificate has been used to encrypt data is vital to your business you should design your own tracking solution.

While SQL 2005 will stop us with an error message if we decide to drop a certificate that is in use in one of the three situations mentioned above, we are free to drop a certificate if it's used solely to encrypt data.

Which makes it impossible to decrypt it afterwards.

Incidentally, dropping Symmetric keys which were used to encrypt data will also succeed – followed by an error message when trying to use (open) them afterwards. Obviously, the data can no longer be decrypted in this case as well.

Pretty much the same goes for Asymmetric keys, however, since they do not have to be opened prior to being used, no error is ever raised, only the data remains "permanently encrypted".

This may sound ridiculous, but on the other hand it's simply the case of the principles of data security (once again) prevailing over the principles of data accessibility. Which isn't all bad, actually.

I guess it's good practice to backup certificates.

But I'm drifting...

Anyway, here's the function:

create function dbo.fnGet_Certificate_IsUsed
 (
 @certificateName sysname
 )
returns tinyint
/*
Return value Meaning
null  Certificate by that name does not exist
0  Certificate by that name exists but is not used
1  Database User is mapped to certificate
2  Entities signed or encrypted by certificate
3  1 + 2
4  Securables associated to certificate
5  1 + 4
6  2 + 4
7  3 + 4
*/
as
begin
 declare @state tinyint 

 if (exists (
   select *
    from sys.certificates
    where (sys.certificates.[name] = @certificateName)
   ))
  begin
   set @state = 0

   if (exists (
     select *
      from sys.certificates
       inner join sys.database_principals
         on sys.database_principals.sid = sys.certificates.sid
      where (sys.certificates.[name] = @certificateName)
     ))
    begin
     set @state = @state + 1
    end

   if (exists (
     select *
      from sys.certificates
       inner join sys.key_encryptions
         on sys.key_encryptions.thumbprint = sys.certificates.thumbprint
      where (sys.certificates.[name] = @certificateName)
     ))
    begin
     set @state = @state + 2
    end

   if (exists (
     select *
      from sys.certificates
       inner join sys.crypt_properties
         on sys.crypt_properties.thumbprint = sys.certificates.thumbprint
      where (sys.certificates.[name] = @certificateName)
     ))
    begin
     set @state = @state + 4
    end
  end

 return @state
end
go

Return values (bitmap):

Value	Meaning
null	certificate does not exist
0	certificate exists but isn't used
1	a user is mapped to the certificate
2	one or more entities are signed or encrypted with the certificate
4	one or more modules are signed with the certificate

An example of use (after you've created the function in the AdventureWorks database):

use AdventureWorks
go

create certificate JethroTull
 encryption by password = 'JT4Ever'
 with subject = 'Jethro Tull fans'
go

create symmetric key JethroTullSymmetricKey
 with algorithm = des -- or any of the supported algorithms
 encryption by certificate JethroTull
go

create user Jethro
 for certificate JethroTull
go

select dbo.fnGet_Certificate_IsUsed('JethroTull') as CertificateState
go

P.S. Coming soon: a function to list certificate dependencies.

Tuesday, August 08, 2006

Full synchronisation

In contrast to Unidirectional synchronisation (discussed in a previous post) also better described as source-to-target data propagation, the purpose of Full synchronisation is to consolidate data at both locations – following the meaning of the word 'synchronisation' more closely: resulting in current historical events (e.g. business transactions) at both locations.

From Greek: syn- (together [with]), chronos (time) – synchronos (at the same time, in the same period).

The rules sound fairly simple:

Insert all rows from one location to the other if they don't already exist at the destination;
Update the rows at one location with the values from the other location that were changed at a later time.

What about deleted rows? Deleting rows from one location that do not also exist at the other might sound right but would quite obviously be in conflict with the insert rule. Each synchronisation would simply result in rows being un-deleted. One quite simple alternative is to use a Deleted flag – e.g. a bit column for which the value 1 designates a 'deleted' row and the default value of 0 a 'regular' row. In addition the delete operation could be prevented altogether through the use of an INSTEAD OF DELETE trigger.

As far as the synchronisation is concerned the Deleted flag allows the update operation to handle 'deleted' rows: rows flagged as deleted at one location are in turn flagged as deleted at the other.

These two tables represent the two locations:

create table dbo.RemoteSales
 (
 SaleId  int  identity (1, 1) not null
 ,DateChanged datetime   not null
 ,ProductName varchar(128)   not null
 ,CustomerName varchar(128)   not null
 ,SaleDate datetime   not null
 ,Quantity decimal(18,2)   not null
 ,Deleted bit    not null
 )
go

alter table dbo.RemoteSales
 add constraint pk_RemoteSales_SaleDate_ProductName_CustomerName
   primary key nonclustered
   (
   SaleDate
   ,ProductName
   ,CustomerName
   )
go

create unique clustered index ux_RemoteSales_SaleId
 on dbo.RemoteSales
  (
  SaleId
  )
go

alter table dbo.RemoteSales
 add constraint df_RemoteSales_SaleDate
   default (getdate())
   for SaleDate
   with values
go

alter table dbo.RemoteSales
 add constraint df_RemoteSales_DateChanged
   default (getdate())
   for DateChanged
   with values
go 

alter table dbo.RemoteSales
 add constraint df_RemoteSales_Deleted
   default (0)
   for Deleted
   with values
go

create table dbo.LocalSales
 (
 SaleId  int  identity (1, 1) not null
 ,DateChanged datetime   not null
 ,ProductName varchar(128)   not null
 ,CustomerName varchar(128)   not null
 ,SaleDate datetime   not null
 ,Quantity decimal(18,2)   not null
 ,Deleted bit    not null
 )
go

alter table dbo.LocalSales
 add constraint pk_LocalSales_SaleDate_ProductName_CustomerName
   primary key nonclustered
   (
   SaleDate
   ,ProductName
   ,CustomerName
   )
go

create unique clustered index ux_LocalSales_SaleId
 on dbo.LocalSales
  (
  SaleId
  )
go

alter table dbo.LocalSales
 add constraint df_LocalSales_SaleDate
   default (getdate())
   for SaleDate
   with values
go

alter table dbo.LocalSales
 add constraint df_LocalSales_DateChanged
   default (getdate())
   for DateChanged
   with values
go 

alter table dbo.LocalSales
 add constraint df_LocalSales_Deleted
   default (0)
   for Deleted
   with values
go

Notice the primary key, it's vital to the operation. It's a composite key, spanning three columns (SaleDate, ProductName and CustomerName) that uniquely identify each sale transaction. Of course in a real-life situation the customer name and the product name would not be used here, but rather be represented by foreign keys referencing a table of customers and a table of products respectively.

Now the values:

-- Some initial transactions
insert dbo.RemoteSales
 (
 DateChanged
 ,ProductName
 ,CustomerName
 ,Quantity
 )
 select dateadd(w, -5, getdate()) as DateChanged
  ,'Granny Smith' as ProductName
  ,'Jim' as CustomerName
  ,27 as Quantity
 union all
 select dateadd(w, -4, getdate())
  ,'Granny Smith'
  ,'Bob'
  ,62
 union all
 select dateadd(w, -4, getdate())
  ,'Golden delicious'
  ,'Bob'
  ,12
go

insert dbo.LocalSales
 (
 DateChanged
 ,ProductName
 ,CustomerName
 ,Quantity
 )
 select dateadd(w, -7, getdate()) as DateChanged
  ,'Granny Smith' as ProductName
  ,'Sam' as CustomerName
  ,28 as Quantity
 union all
 select dateadd(w, -12, getdate())
  ,'Granny Smith'
  ,'Lana'
  ,6
 union all
 select dateadd(w, -14, getdate())
  ,'Golden delicious'
  ,'Rob'
  ,72
go

-- A couple of conflicting rows
insert dbo.LocalSales
 (
 DateChanged
 ,ProductName
 ,CustomerName
 ,SaleDate
 ,Quantity
 )
 select top 1
  getdate() as DateChanged
  ,dbo.RemoteSales.ProductName as ProductName
  ,dbo.RemoteSales.CustomerName as CustomerName
  ,dbo.RemoteSales.SaleDate as SaleDate
  ,120 as Quantity
  from dbo.RemoteSales
  order by DateChanged
go

insert dbo.RemoteSales
 (
 DateChanged
 ,ProductName
 ,CustomerName
 ,SaleDate
 ,Quantity
 ,Deleted
 )
 select getdate() as DateChanged
  ,dbo.LocalSales.ProductName as ProductName
  ,dbo.LocalSales.CustomerName as CustomerName
  ,dbo.LocalSales.SaleDate as SaleDate
  ,dbo.LocalSales.Quantity as Quantity
  ,1
  from dbo.LocalSales
  where (dbo.LocalSales.CustomerName = 'Lana')
go

Before designing the synchronisation script let's take a closer look at the rules:

The insert rule is as simple as possible: all transactions must be stored at both locations.

The update would be difficult without the DateChanged column. With it the job is also quite simple – 'the latest state prevails'. All latest changes to the rows must exist at both locations – including the Deleted column values to accurately represent cancelled or annulled transactions.

By using a FULL OUTER JOIN query we can observe the differences between the remote and the local table:

select *
 from dbo.RemoteSales
  full join dbo.LocalSales
    on (dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName)
     and (dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName)
     and (dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate)
 order by dbo.RemoteSales.DateChanged desc

Some rows exist at one location but not at the other while some rows have been changed at one location but the values not yet propagated to the other.

For the purposes of this rather simplified example we'll leave out any concurrency issues, and perhaps deal with them in another post.

As mentioned the insert part is simple:

The propagation of values from the Local table to the Remote:

insert dbo.LocalSales
 (
 DateChanged
 ,ProductName
 ,CustomerName
 ,SaleDate
 ,Quantity
 ,Deleted
 )
 select dbo.RemoteSales.DateChanged as DateChanged
  ,dbo.RemoteSales.ProductName as ProductName
  ,dbo.RemoteSales.CustomerName as CustomerName
  ,dbo.RemoteSales.SaleDate as SaleDate
  ,dbo.RemoteSales.Quantity as Quantity
  ,dbo.RemoteSales.Deleted as Deleted
  from dbo.RemoteSales
  where (not exists(
    select *
     from dbo.LocalSales
     where (dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName)
      and (dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName)
      and (dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate)
    ))

And the opposite direction:

insert dbo.RemoteSales
 (
 DateChanged
 ,ProductName
 ,CustomerName
 ,SaleDate
 ,Quantity
 ,Deleted
 )
 select dbo.LocalSales.DateChanged as DateChanged
  ,dbo.LocalSales.ProductName as ProductName
  ,dbo.LocalSales.CustomerName as CustomerName
  ,dbo.LocalSales.SaleDate as SaleDate
  ,dbo.LocalSales.Quantity as Quantity
  ,dbo.LocalSales.Deleted as Deleted
  from dbo.LocalSales
  where (not exists(
    select *
     from dbo.RemoteSales
     where (dbo.RemoteSales.ProductName = dbo.LocalSales.ProductName)
      and (dbo.RemoteSales.CustomerName = dbo.LocalSales.CustomerName)
      and (dbo.RemoteSales.SaleDate = dbo.LocalSales.SaleDate)
    ))

By storing the time of data modifications in the DateChanged column it's also really easy to know which rows need to be updated. Of course the default constraint will only work when inserting data as long as the value is not set by the insert statement, and the value would have to be set explicitly when updating data.

The propagation of changes from the Local table to the Remote:

update dbo.LocalSales
 set dbo.LocalSales.DateChanged = dbo.RemoteSales.DateChanged
  ,dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName
  ,dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName
  ,dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate
  ,dbo.LocalSales.Quantity = dbo.RemoteSales.Quantity
  ,dbo.LocalSales.Deleted = dbo.RemoteSales.Deleted
 from dbo.RemoteSales
  inner join dbo.LocalSales
    on (dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName)
     and (dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName)
     and (dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate)
 where (dbo.RemoteSales.DateChanged > dbo.LocalSales.DateChanged)

And the opposite direction:

update dbo.RemoteSales
 set dbo.RemoteSales.DateChanged = dbo.LocalSales.DateChanged
  ,dbo.RemoteSales.ProductName = dbo.LocalSales.ProductName
  ,dbo.RemoteSales.CustomerName = dbo.LocalSales.CustomerName
  ,dbo.RemoteSales.SaleDate = dbo.LocalSales.SaleDate
  ,dbo.RemoteSales.Quantity = dbo.LocalSales.Quantity
  ,dbo.RemoteSales.Deleted = dbo.LocalSales.Deleted
 from dbo.LocalSales
  inner join dbo.RemoteSales
    on (dbo.RemoteSales.ProductName = dbo.LocalSales.ProductName)
     and (dbo.RemoteSales.CustomerName = dbo.LocalSales.CustomerName)
     and (dbo.RemoteSales.SaleDate = dbo.LocalSales.SaleDate)
 where (dbo.LocalSales.DateChanged > dbo.RemoteSales.DateChanged)

If we now execute the FULL OUTER JOIN query mentioned above we can now see that the data has been synchronised appropriately between the two locations.

What are the typical use cases that could benefit from this example? As mentioned before most real-life scenarios can be quite easily handled by Replication – which, on the other hand, is not available in all editions of SQL Server. Wherever replication cannot be used to synchronise datasources other methods will have to be used, and once again a set-based solution is the simplest to design and will generally outperform any other method.

SQL Server 2005 Books Online – July 2006 Update

In case you missed it - like I have (I've been offline for a while) - the Books Online collection was updated July 20^th.

The files can be downloaded from this location.

The online version has also been updated.